
Cap web/backend Node heaps from the container memory limit #1291

Draft
brendan-kellam wants to merge 1 commit into main from per-process-node-heap-caps

@brendan-kellam

Contributor

Note

Draft — numbers (55% / 20%) are an initial estimate and should be validated against measured per-process RSS under load before merging.

Problem

The web and backend Node processes share the container's memory limit with zoekt, but V8's default old-space heap limit doesn't track the cgroup limit. As a result, web's heap caps at ~4 GB however large the container is, and the process OOMs once its working set reaches that cap:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
... web (terminated by SIGABRT (core dumped); not expected)

supervisord respawns web (autorestart=true), but while it's down /api/health fails and an aggressive liveness probe can escalate the blip into a full pod restart. Observed ~14 restarts over ~5 days on a 16 GiB pod.

Fix

entrypoint.sh derives a per-process --max-old-space-size from the container's memory limit (reading cgroup v2 first, then falling back to v1) before launching supervisord, and supervisord.conf passes it to the web/backend commands. The caps are set per process rather than through a global NODE_OPTIONS, since a single shared value would apply to both processes and let their combined heaps over-commit the shared cgroup.

  • Defaults: web 55%, backend 20% of the container limit.
  • Falls back to 0 (V8 default = current behavior) when no limit is detected (unlimited / non-cgroup).
  • Overridable: WEB_HEAP_PERCENT / BACKEND_HEAP_PERCENT, or absolute WEB_MAX_OLD_SPACE_SIZE / BACKEND_MAX_OLD_SPACE_SIZE. The Dockerfile sets 0 defaults as a backstop so supervisord's %(ENV_...)s interpolation always resolves.

On a 16 GiB container this gives web ~8.8 GiB (up from V8's ~4 GiB) and backend ~3.2 GiB, leaving ~25% (~4 GiB) for zoekt + OS.
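For reviewers who want to sanity-check the arithmetic, here is a minimal sketch of how such a derivation could look. It is not the PR's actual entrypoint.sh: the function names, the CGROUP_ROOT override, the "more than 15 digits means unlimited" heuristic for cgroup v1, and the override precedence (non-zero absolute value wins over the percentage) are all assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative sketch only; function names and CGROUP_ROOT are hypothetical,
# not taken from the PR's entrypoint.sh.

# Print the container memory limit in bytes, or nothing when none is detected.
detect_limit_bytes() {
    root="${CGROUP_ROOT:-/sys/fs/cgroup}"
    # Try the cgroup v2 path first, then the cgroup v1 path.
    for f in "$root/memory.max" "$root/memory/memory.limit_in_bytes"; do
        [ -r "$f" ] || continue
        v=$(cat "$f")
        case "$v" in
            ''|*[!0-9]*) continue ;;  # cgroup v2 reports "max" when unlimited
        esac
        # cgroup v1 reports "unlimited" as a huge number; assume that more
        # than 15 digits means no real limit is set.
        if [ "${#v}" -gt 15 ]; then continue; fi
        printf '%s\n' "$v"
        return 0
    done
}

# Convert a byte limit and a percentage to MiB; print 0 when no limit is known.
heap_mb() {
    if [ -z "$1" ]; then echo 0; return 0; fi
    echo $(( $1 / 1048576 * $2 / 100 ))
}

# Assumed precedence: a non-zero absolute override wins, else percent of limit.
# Arguments: <absolute MiB or 0> <percent> <limit in bytes or empty>
resolve_heap_mb() {
    if [ "${1:-0}" != "0" ]; then echo "$1"; else heap_mb "$3" "$2"; fi
}

limit=$(detect_limit_bytes)
WEB_MAX_OLD_SPACE_SIZE=$(resolve_heap_mb "${WEB_MAX_OLD_SPACE_SIZE:-0}" "${WEB_HEAP_PERCENT:-55}" "$limit")
BACKEND_MAX_OLD_SPACE_SIZE=$(resolve_heap_mb "${BACKEND_MAX_OLD_SPACE_SIZE:-0}" "${BACKEND_HEAP_PERCENT:-20}" "$limit")
export WEB_MAX_OLD_SPACE_SIZE BACKEND_MAX_OLD_SPACE_SIZE
echo "web=${WEB_MAX_OLD_SPACE_SIZE} backend=${BACKEND_MAX_OLD_SPACE_SIZE}"
```

With a 16 GiB limit (17179869184 bytes) and the default percentages, `heap_mb` yields 9011 MiB for web and 3276 MiB for backend; with no detectable limit both values stay at 0.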

Validation

  • entrypoint shell logic unit-tested across scenarios (16 GiB / 4 GiB / unlimited / override / custom %) — computes expected values and falls back to 0 safely.
  • --max-old-space-size=0 confirmed to mean "V8 default", not a zero heap.
  • sh -n entrypoint.sh clean.
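One way to reproduce the `--max-old-space-size=0` check locally is to compare the heap limit V8 reports with and without the flag. The sketch below is an assumption about how that could be done, not the procedure used for this PR; `heap_limit_mb` is a hypothetical helper, and it relies only on Node's built-in `v8.getHeapStatistics()`.

```shell
#!/bin/sh
# Hypothetical helper: print V8's reported heap size limit in MiB.
# Any arguments are passed to node as flags ahead of the -p script.
heap_limit_mb() {
    node "$@" -p 'Math.round(require("v8").getHeapStatistics().heap_size_limit / 1048576)'
}

if command -v node >/dev/null 2>&1; then
    echo "no flag:  $(heap_limit_mb) MiB"
    echo "flag=0:   $(heap_limit_mb --max-old-space-size=0) MiB"
    echo "flag=512: $(heap_limit_mb --max-old-space-size=512) MiB"
else
    echo "node not found; skipping" >&2
fi
```

If 0 really means "V8 default", the first two lines should report the same limit, while the third should be noticeably smaller. The reported limit includes more than old space, so it is not expected to equal the flag value exactly.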

Notes / follow-ups

  • The 55/20 split is a starting estimate — validate against real per-process RSS and retune (the env overrides make this a config change, not a rebuild).
  • web's ~8–9 h ramp to OOM may indicate a leak; raising the cap extends the cycle rather than ending it. A separate liveness-probe change (chart side) stops the OOM-and-recover blip from counting as a pod restart.
  • zoekt (Go) is intentionally untouched — its dominant footprint is reclaimable mmap'd shard page cache, which a heap flag wouldn't bound.

🤖 Generated with Claude Code

The web and backend Node processes share the container's memory limit with
zoekt, but V8's default old-space heap doesn't track the cgroup limit — so
`web` caps near ~4GB regardless of a larger container and OOMs there once its
working set grows (supervisord respawns it, but the liveness probe can escalate
the gap into a full pod restart).

entrypoint.sh now derives a per-process --max-old-space-size from the
container's memory limit (cgroup v2 then v1): web 55%, backend 20% by default.
The values are wired into the web/backend commands in supervisord.conf
(per-process, so the two heaps don't over-commit the shared cgroup). Falls back
to V8's default (0) when no limit is detected. Overridable via
WEB_HEAP_PERCENT / BACKEND_HEAP_PERCENT or absolute
WEB_MAX_OLD_SPACE_SIZE / BACKEND_MAX_OLD_SPACE_SIZE; the Dockerfile sets 0
defaults as a backstop for supervisord interpolation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

