
fix(core): restore live-migration throughput via CAP_IPC_LOCK in rootless virt-launcher #2537

Closed
fl64 wants to merge 3 commits into main from fix/virt-launcher-multifd-zerocopy-ipc-lock

fl64 commented Jun 25, 2026

Member

Description

Point the build at a 3p-kubevirt feature branch carrying a one-line patch:
the rootless virt-launcher now grants virtqemud/QEMU CAP_IPC_LOCK in addition
to CAP_NET_BIND_SERVICE (pkg/virt-launcher/virtwrap/util/libvirt_helper.go).
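For readers unfamiliar with how a Go process hands capabilities to a child: the change described above extends the AmbientCaps list used when spawning virtqemud. The sketch below is a minimal, Linux-only illustration under that assumption, not the actual libvirt_helper.go code; the helper name newVirtqemudCmd is hypothetical, and the capability numbers are copied from <linux/capability.h>. It only builds the command and never starts it. Note that a later commit in this PR reports that this approach failed with EPERM for the non-root launcher.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// Capability numbers from <linux/capability.h>, defined locally so the sketch
// only needs the standard library.
const (
	capNetBindService = 10 // CAP_NET_BIND_SERVICE
	capIPCLock        = 14 // CAP_IPC_LOCK
)

// newVirtqemudCmd is a hypothetical helper. It builds, but does not start,
// a command whose child process is asked to receive the listed capabilities
// as ambient capabilities (SysProcAttr.AmbientCaps is Linux-only).
func newVirtqemudCmd(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// Before the patch: only CAP_NET_BIND_SERVICE.
		// After the patch: CAP_IPC_LOCK is requested as well.
		AmbientCaps: []uintptr{capNetBindService, capIPCLock},
	}
	return cmd
}

func main() {
	cmd := newVirtqemudCmd("virtqemud")
	fmt.Println(cmd.SysProcAttr.AmbientCaps) // [10 14]
}
```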

installCacheVersion is added to images/virt-artifact/werf.inc.yaml to force
a virt-artifact rebuild so the patch actually lands in the image.

Why do we need it, and what problem does it solve?

After the rootless virt-launcher switch (qemu uid 107 -> uid 64535), live
migration throughput collapsed from ~10-12 Gbps to ~200-300 Mbps, regardless of
link speed (reproduced on both 1 Gbps and 20 Gbps networks).

Suspected root cause: the rootless QEMU ends up with CapEff = 0x400
(cap_net_bind_service only). Multifd zero-copy migration send (MSG_ZEROCOPY,
which needs page pinning) is therefore unavailable, and QEMU silently falls back
to copy-send, throttling the migration channel. Measured on an idle VM at
DirtyRate=0, MemoryBps stays at ~300 Mbps, so the channel itself (not the dirty
rate, the network, or the disks) is the bottleneck. The migration data-path code
(migration-source.go, live-migration-source.go, migration-proxy.go) is
byte-for-byte identical between v12n.25.3 and v12n.43.2; the only runtime change
is the capability drop.
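The CapEff value can be decoded by hand: each set bit is one capability number, so 0x400 = 1 << 10 is CAP_NET_BIND_SERVICE alone, and adding CAP_IPC_LOCK (bit 14) would give 0x4400. A minimal Go sketch of that decoding, assuming the hex mask is taken from the CapEff: line of /proc/<pid>/status; decodeCaps is a hypothetical helper and its name table covers only bits 0 to 15:

```go
package main

import (
	"fmt"
	"strconv"
)

// capNames lists capability names by bit number (partial table, bits 0-15,
// from <linux/capability.h>).
var capNames = []string{
	"cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
	"cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid",
	"cap_setpcap", "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
	"cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
}

// decodeCaps parses a hex capability mask such as "0000000000000400" and
// returns the names of the set bits in ascending bit order.
func decodeCaps(hexMask string) ([]string, error) {
	mask, err := strconv.ParseUint(hexMask, 16, 64)
	if err != nil {
		return nil, err
	}
	var names []string
	for bit := 0; bit < 64; bit++ {
		if mask&(uint64(1)<<uint(bit)) == 0 {
			continue
		}
		if bit < len(capNames) {
			names = append(names, capNames[bit])
		} else {
			// Bits outside the partial table are reported by number.
			names = append(names, fmt.Sprintf("cap_%d", bit))
		}
	}
	return names, nil
}

func main() {
	before, _ := decodeCaps("0000000000000400") // CapEff reported in this PR
	after, _ := decodeCaps("0000000000004400")  // expected with CAP_IPC_LOCK added
	fmt.Println(before) // [cap_net_bind_service]
	fmt.Println(after)  // [cap_net_bind_service cap_ipc_lock]
}
```

On a node with libcap installed, capsh --decode=0x400 performs the same decoding.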

This PR is cluster-test scaffolding for the patch. If the measured peak
MemoryBps returns to multi-gigabit on the patched image, the assumption is
confirmed and the patch will be merged into 3p-kubevirt as a proper
v1.6.2-v12n.N tag (this PR will then be re-pointed at the tag and the
installCacheVersion line removed).

What is the expected result?

  1. The virt-artifact image rebuilds against the feature branch.
  2. Deckhouse rolls out new virt-handler pods and live-migrates VMs onto a node
    with the new virt-launcher.
  3. On a freshly migrated VM, CapEff of the QEMU process becomes
    cap_net_bind_service + cap_ipc_lock.
  4. Migrating an idle VM shows multi-gigabit peak MemoryBps via
    vlctl domain stats -o json (MigrateDomainJobInfo.MemoryBps), and
    query-migrate-parameters shows zero-copy send enabled.
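To compare peak MemoryBps against the multi-gigabit target in step 4, the value has to be converted to bits per second. This is a sketch under stated assumptions: the JSON shape of vlctl domain stats -o json modelled below is hypothetical (only the MigrateDomainJobInfo.MemoryBps path comes from this PR), and MemoryBps is assumed to be in bytes per second, as libvirt's memory_bps job statistic is. The sample value is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// domainStats is a hypothetical, minimal view of the vlctl JSON output;
// only the field named in the PR is modelled, and its nesting is assumed.
type domainStats struct {
	MigrateDomainJobInfo struct {
		// Assumed to be bytes per second (libvirt memory_bps).
		MemoryBps uint64 `json:"MemoryBps"`
	} `json:"MigrateDomainJobInfo"`
}

// memoryMbps extracts MemoryBps and converts bytes/s to megabits/s.
func memoryMbps(raw []byte) (float64, error) {
	var s domainStats
	if err := json.Unmarshal(raw, &s); err != nil {
		return 0, err
	}
	return float64(s.MigrateDomainJobInfo.MemoryBps) * 8 / 1e6, nil
}

func main() {
	sample := []byte(`{"MigrateDomainJobInfo":{"MemoryBps":37500000}}`) // made-up sample
	mbps, err := memoryMbps(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.0f Mbps\n", mbps) // 300 Mbps
}
```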

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: core
type: fix
summary: "Restored live-migration throughput lost after the rootless virt-launcher switch by granting QEMU CAP_IPC_LOCK for multifd zero-copy send."
impact_level: low

Point versions.yml at the fix/virt-launcher-multifd-zerocopy-ipc-lock
branch and add installCacheVersion to force a virt-artifact rebuild, so
the rootless CAP_IPC_LOCK patch can be validated in the cluster.

Test-only change; will be replaced by a proper v1.6.2-v12n.N tag once
the patch merges into v1.6.2-virtualization.

Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
fl64 force-pushed the fix/virt-launcher-multifd-zerocopy-ipc-lock branch from 284c1cf to 55aa486 on June 25, 2026 13:17
fl64 added 2 commits June 25, 2026 16:58
Rebuild virt-artifact against the updated 3p-kubevirt branch that also
adds CAP_IPC_LOCK to the d8v-compute container securityContext
(required so the ambient cap actually reaches QEMU).

Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
…d exec)

The libvirt_helper.go AmbientCaps change caused virt-launcher to fail to
start virtqemud with EPERM (ambient caps need the inheritable set, which
is empty for a non-root process). Reverted; keeping only the container
securityContext bounding-set change, which QEMU inherits anyway.

Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
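The EPERM reported in this commit is consistent with the kernel rule described in capabilities(7): a capability can be raised into the ambient set only if it is already in both the permitted and the inheritable sets. The following is a small Go model of that check, not the kernel's or the Go runtime's actual code; canRaiseAmbient is a hypothetical helper, and the SECBIT_NO_CAP_AMBIENT_RAISE securebit is ignored.

```go
package main

import "fmt"

const capIPCLock = 14 // CAP_IPC_LOCK from <linux/capability.h>

// canRaiseAmbient reports whether capNum could be raised into the ambient set,
// given 64-bit permitted and inheritable masks: the bit must be set in both.
func canRaiseAmbient(permitted, inheritable uint64, capNum uint) bool {
	bit := uint64(1) << capNum
	return permitted&bit != 0 && inheritable&bit != 0
}

func main() {
	// Even if CAP_IPC_LOCK were permitted, an empty inheritable set (the
	// non-root case described above) blocks the ambient raise.
	fmt.Println(canRaiseAmbient(1<<capIPCLock, 0, capIPCLock)) // false
	// With the bit in both sets, the raise is allowed.
	fmt.Println(canRaiseAmbient(1<<capIPCLock, 1<<capIPCLock, capIPCLock)) // true
}
```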

fl64 commented Jun 25, 2026

Member Author

Closing: the root cause of the live-migration throughput regression is the Go toolchain change (greenteagc / go1.25.11 from the ALT 20260119 CVE-mitigation toolchain), not the virt-launcher capabilities. The CAP_IPC_LOCK hypothesis was disproven (measured MemoryBps unchanged, ~140 Mbps). Investigation moved to PR #2539 (greenteagc A/B).

fl64 closed this Jun 25, 2026
fl64 deleted the fix/virt-launcher-multifd-zerocopy-ipc-lock branch June 25, 2026 15:20
1 participant