Cap web/backend Node heaps from the container memory limit #1291
Draft
brendan-kellam wants to merge 1 commit into
The web and backend Node processes share the container's memory limit with zoekt, but V8's default old-space heap doesn't track the cgroup limit, so `web` caps near ~4GB regardless of a larger container and OOMs there once its working set grows (supervisord respawns it, but the liveness probe can escalate the gap into a full pod restart).

entrypoint.sh now derives a per-process --max-old-space-size from the container's memory limit (cgroup v2 then v1): web 55%, backend 20% by default. The values are wired into the web/backend commands in supervisord.conf (per-process, so the two heaps don't over-commit the shared cgroup). Falls back to V8's default (0) when no limit is detected.

Overridable via WEB_HEAP_PERCENT / BACKEND_HEAP_PERCENT or absolute WEB_MAX_OLD_SPACE_SIZE / BACKEND_MAX_OLD_SPACE_SIZE; the Dockerfile sets 0 defaults as a backstop for supervisord interpolation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Note
Draft — numbers (55% / 20%) are an initial estimate and should be validated against measured per-process RSS under load before merging.
Problem
The `web` and `backend` Node processes share the container's memory limit with `zoekt`, but V8's default old-space heap doesn't track the cgroup limit. So `web` caps near ~4 GB regardless of a larger container and OOMs there once its working set grows.

supervisord respawns `web` (`autorestart=true`), but while it's down `/api/health` fails and an aggressive liveness probe can escalate the blip into a full pod restart. Observed ~14 restarts over ~5 days on a 16 GiB pod.

Fix
`entrypoint.sh` derives a per-process `--max-old-space-size` from the container's memory limit (cgroup v2 → v1) before launching supervisord, and `supervisord.conf` passes it to the `web`/`backend` commands. Per-process (not a global `NODE_OPTIONS`) so the two heaps can't over-commit the shared cgroup.

- Defaults: `web` 55%, `backend` 20% of the detected limit.
- Falls back to `0` (V8 default = current behavior) when no limit is detected (unlimited / non-cgroup).
- Overridable via `WEB_HEAP_PERCENT` / `BACKEND_HEAP_PERCENT`, or absolute `WEB_MAX_OLD_SPACE_SIZE` / `BACKEND_MAX_OLD_SPACE_SIZE`. The `Dockerfile` sets `0` defaults as a backstop so supervisord's `%(ENV_...)s` interpolation always resolves.

On a 16 GiB container this gives `web` ~9 GiB (up from V8's ~4 GiB) and `backend` ~3.2 GiB, leaving ~25% for `zoekt` + OS.
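The derivation described above can be sketched roughly as follows. This is a minimal POSIX `sh` sketch under stated assumptions, not the PR's actual `entrypoint.sh`: the function names (`detect_limit_bytes`, `heap_mb`, `resolve_heap`) are hypothetical, the 19-digit cutoff for cgroup v1's "unlimited" sentinel is a heuristic, and treating an absolute override of `0` as "unset" is inferred from the `Dockerfile` defaulting those variables to `0`.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from the PR.

# Print the container memory limit in bytes, or nothing when no limit is found.
# $1 optionally overrides the cgroup root (useful for testing).
detect_limit_bytes() {
  root="${1:-/sys/fs/cgroup}"
  # cgroup v2 first, then cgroup v1.
  for f in "$root/memory.max" "$root/memory/memory.limit_in_bytes"; do
    [ -r "$f" ] || continue
    v=$(cat "$f")
    case "$v" in
      max|''|*[!0-9]*) return 0 ;;  # v2 reports "max" when unlimited
    esac
    # Assumption: v1 reports a very large sentinel when unlimited;
    # treat any value of 19+ digits as "no limit".
    [ "${#v}" -ge 19 ] && return 0
    echo "$v"
    return 0
  done
  return 0
}

# heap_mb LIMIT_BYTES PERCENT: heap size in MiB, or 0 when the limit is unknown.
heap_mb() {
  [ -n "$1" ] || { echo 0; return 0; }
  echo $(( $1 / 1048576 * $2 / 100 ))
}

# resolve_heap ABSOLUTE PERCENT LIMIT_BYTES: a non-zero absolute value wins,
# otherwise fall back to the percentage of the detected limit.
resolve_heap() {
  if [ -n "$1" ] && [ "$1" != 0 ]; then
    echo "$1"
  else
    heap_mb "$3" "$2"
  fi
}

limit=$(detect_limit_bytes)
WEB_MAX_OLD_SPACE_SIZE=$(resolve_heap "${WEB_MAX_OLD_SPACE_SIZE:-0}" "${WEB_HEAP_PERCENT:-55}" "$limit")
BACKEND_MAX_OLD_SPACE_SIZE=$(resolve_heap "${BACKEND_MAX_OLD_SPACE_SIZE:-0}" "${BACKEND_HEAP_PERCENT:-20}" "$limit")
export WEB_MAX_OLD_SPACE_SIZE BACKEND_MAX_OLD_SPACE_SIZE
echo "web=${WEB_MAX_OLD_SPACE_SIZE} backend=${BACKEND_MAX_OLD_SPACE_SIZE}"
```

With a 16 GiB limit (17179869184 bytes), the integer arithmetic gives 9011 MiB for `web` at 55% and 3276 MiB for `backend` at 20%. Exporting the values before supervisord starts is what would let `supervisord.conf` pick them up through `%(ENV_...)s`.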
Validation
- … `0` safely.
- `--max-old-space-size=0` confirmed to mean "V8 default", not a zero heap.
- `sh -n entrypoint.sh` clean.

Notes / follow-ups
- `web`'s ~8–9 h ramp to OOM may indicate a leak; raising the cap extends the cycle rather than ending it. A separate liveness-probe change (chart side) stops the OOM-and-recover blip from counting as a pod restart.
- `zoekt` (Go) is intentionally untouched: its dominant footprint is reclaimable mmap'd shard page cache, which a heap flag wouldn't bound.

🤖 Generated with Claude Code
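The two checks listed under Validation could be reproduced along these lines. This is a hedged sketch: `check_syntax` and `print_heap_limit_mb` are hypothetical helper names, and the Node check assumes a `node` binary on `PATH` (it is skipped otherwise). It relies only on `sh -n`, which parses a script without executing it, and on `v8.getHeapStatistics().heap_size_limit` from Node's built-in `v8` module.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Parse a script without running it; exits non-zero on a syntax error.
check_syntax() {
  sh -n "$1"
}

# Print V8's effective heap limit in MiB for a given --max-old-space-size.
# Returns 1 without printing anything when node is not installed.
print_heap_limit_mb() {
  command -v node >/dev/null 2>&1 || return 1
  node --max-old-space-size="$1" -e \
    'console.log(Math.round(require("v8").getHeapStatistics().heap_size_limit / 1048576))'
}

# Only check the entrypoint when it is present in the working directory.
if [ -f entrypoint.sh ]; then
  check_syntax entrypoint.sh && echo "entrypoint.sh: syntax ok"
fi

# A value of 0 should report V8's default limit rather than a zero-sized heap.
print_heap_limit_mb 0 || echo "node not found; skipping heap check"
```

Comparing the printed limit for `0` against a run with an explicit value (for example `print_heap_limit_mb 4096`) shows whether the flag is being honored.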