Add PCIe RX/TX load to the utilization charts#481

Open
Basavaraja-MS wants to merge 1 commit into Syllo:master from Basavaraja-MS:pcie-load-graph
Basavaraja-MS commented Jun 18, 2026

Summary

Closes #268.

Adds PCIe RX and TX load as plottable metrics in the line charts, alongside the existing GPU and memory utilization graphs.

  • The load is derived from the already-collected pcie_rx / pcie_tx counters (KB/s) and normalized against the link's theoretical peak bandwidth (the per-lane rate for max_pcie_gen multiplied by max_pcie_link_width), so it fits the charts' fixed 0–100% scale.
  • Two new metrics, pcieRxRate and pcieTxRate, are selectable per GPU from the F2 → Chart setup window and persist in the configuration file.
  • The normalization math lives in a dedicated, unit-tested translation unit (src/pcie_utilization.c) rather than inline in interface.c.
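A minimal sketch of that normalization, assuming the counters are in KB/s with 1 MB = 1000 KB, and that a generation or width of 0 stands in for a field the backend did not report. The names `lane_mbps_by_gen`, `pcie_peak_mbps` and `pcie_load_percent` are hypothetical and are not the actual API of src/pcie_utilization.c.

```c
/* Sketch only: hypothetical helpers, not the PR's actual pcie_utilization.c API. */

/* Effective per-lane throughput in MB/s, indexed by PCIe generation (1..6);
 * index 0 is a placeholder for "unknown generation". */
static const double lane_mbps_by_gen[] = {0.0, 250.0, 500.0, 984.6, 1969.0, 3938.0, 7563.0};

/* Peak link bandwidth in MB/s. A generation or width of 0 is treated as
 * "not reported by the backend" (assumption), which yields 0. */
static double pcie_peak_mbps(unsigned gen, unsigned width) {
  if (gen == 0 || gen > 6 || width == 0)
    return 0.0;
  return lane_mbps_by_gen[gen] * (double)width;
}

/* Converts a KB/s throughput counter into a 0-100 load percentage,
 * assuming 1 MB = 1000 KB. Returns 0 when the peak is unknown. */
static unsigned pcie_load_percent(unsigned long long kbps, unsigned gen, unsigned width) {
  double peak_kbps = pcie_peak_mbps(gen, width) * 1000.0;
  if (peak_kbps <= 0.0)
    return 0;
  double pct = (double)kbps / peak_kbps * 100.0;
  if (pct > 100.0)
    pct = 100.0;                /* clamp samples above the theoretical peak */
  return (unsigned)(pct + 0.5); /* round to the nearest whole percent */
}
```

For the Gen5 x16 case in the test plan, `pcie_load_percent(60000000, 5, 16)` works out to 95 (60 GB/s over a 63,008 MB/s peak); anything above the peak is clamped to 100, and an unknown generation or width yields 0.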

Design note: max vs. current link

The metric is normalized against the maximum link bandwidth, not the currently negotiated link. GPUs down-train PCIe when idle (often to Gen1/x4), so normalizing against the live link would make tiny idle transfers look like high load, and the denominator would shift every time the link retrains. Using the maximum gives a stable "% of peak PCIe capability," which matches the "spot the bottleneck" intent of #268. (The issue phrased it as "current link speed"; this is flagged as a deliberate choice.)
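To make the moving-denominator problem concrete, the hypothetical helper below (`load_pct`, not part of the PR) expresses the same traffic as a share of two different links, using the per-lane rates listed under "Per-lane bandwidth basis". The 100 MB/s traffic figure in the usage note is an illustrative assumption, not a measurement.

```c
/* Hypothetical helper for illustration: share of a link's bandwidth used by
 * the given traffic. Traffic and per-lane rate are both in MB/s. */
static double load_pct(double traffic_mbps, double lane_mbps, unsigned width) {
  double peak_mbps = lane_mbps * (double)width;
  /* An unknown link (zero peak) reports 0 rather than dividing by zero. */
  return peak_mbps > 0.0 ? traffic_mbps / peak_mbps * 100.0 : 0.0;
}
```

With 100 MB/s of idle traffic, `load_pct(100, 250, 4)` gives about 10% of a down-trained Gen1 x4 link, while `load_pct(100, 3938, 16)` gives about 0.16% of the Gen5 x16 peak. Normalizing against the maximum keeps the second, stable reading.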

Per-lane bandwidth basis

Effective per-lane data rates after PCIe encoding overhead (8b/10b for gen 1/2, 128b/130b for gen 3–5, FLIT for gen 6): 250 / 500 / 984.6 / 1969 / 3938 / 7563 MB/s.
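These rates follow from the raw transfer rate (2.5, 5, 8, 16, 32 and 64 GT/s for gen 1 to 6) multiplied by the encoding efficiency and divided by 8 bits per byte. The sketch below reproduces them; `per_lane_mbps` is a hypothetical name, and the FLIT payload ratio of 242/256 for gen 6 is an assumption chosen because it yields the 7563 MB/s figure.

```c
/* Sketch: effective per-lane PCIe throughput in MB/s for generations 1..6.
 * MB/s = GT/s * encoding efficiency * 1000 / 8 (one bit per transfer per lane).
 * Returns 0 for an unknown generation. */
static double per_lane_mbps(unsigned gen) {
  static const double gt_per_s[] = {0.0, 2.5, 5.0, 8.0, 16.0, 32.0, 64.0};
  double efficiency;
  if (gen == 0 || gen > 6)
    return 0.0;
  if (gen <= 2)
    efficiency = 8.0 / 10.0;    /* 8b/10b line code */
  else if (gen <= 5)
    efficiency = 128.0 / 130.0; /* 128b/130b line code */
  else
    efficiency = 242.0 / 256.0; /* FLIT mode; 242/256 is this sketch's assumption */
  return gt_per_s[gen] * efficiency * 1000.0 / 8.0;
}
```

Unrounded, this gives 250, 500, about 984.6, 1969.2, 3938.5 and 7562.5 MB/s, matching the list above once rounded.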

Scope / limitations

  • No layout or header changes — zero impact on users who don't enable the metrics.
  • Only meaningful where the vendor backend reports both PCIe throughput and max_pcie_gen/max_pcie_link_width (NVIDIA, AMD, etc.); otherwise the line stays at 0, consistent with other unsupported metrics.
  • Still bound by MAX_LINES_PER_PLOT (4 metrics per plot).

Test plan

  • cmake -B build -DBUILD_TESTING=ON && cmake --build build — clean, no warnings.
  • ctest — 8/8 pass, including 4 new PcieUtilization cases (per-gen bandwidth, width scaling, unknown-gen/missing-field → 0, percent scaling + clamp to 100%).
  • Manual: ran nvbandwidth -t 0 -t 1 -b 256 -i 6 in a loop on an H200 NVL (Gen5 x16); enabled both metrics via F2. The GPU0 PCIe RX% / PCIe TX% lines peaked at ~95–99% under load and returned to 0 when idle, matching nvidia-smi dmon -s t (~60 GB/s of the 63 GB/s max).
  • Config round-trips: pcieRxRate / pcieTxRate written to / read from the config file via F12.

[Screenshot of the chart with the PCIe lines: nvtop-output]

Made with Cursor

Plot the PCIe receive and transmit throughput as a percentage of the
maximum link bandwidth, alongside the existing GPU and memory
utilization graphs. The load is derived from the already-collected
pcie_rx/pcie_tx counters normalized against the link's theoretical peak
(max gen x width), so it shares the charts' 0-100% scale.

The two new metrics (pcieRxRate / pcieTxRate) can be toggled per GPU from
the F2 setup window and persist in the configuration file.

The normalization helpers live in a dedicated, unit-tested translation
unit (pcie_utilization.c).

Closes Syllo#268

Co-authored-by: Cursor <cursoragent@cursor.com>
Development: successfully merging this pull request may close [Feature Request] Graph for PCIe bus load.
