[CI][DO-NOT-MERGE] Modularize the per-file test runner into planner/executor/aggregator#6160
Draft
hujc7 wants to merge 25 commits into
Draft
[CI][DO-NOT-MERGE] Modularize the per-file test runner into planner/executor/aggregator#6160hujc7 wants to merge 25 commits into
hujc7 wants to merge 25 commits into
Conversation
Most test callers pass both ``sim_cfg=`` and ``device=`` to
:func:`isaaclab.sim.build_simulation_context`, implicitly expecting the
``device`` kwarg to win. The helper previously dropped the kwarg silently
when ``sim_cfg`` was provided, causing warp kernel-launch device
mismatches on non-default GPUs: the test fixture allocated ``env_ids``
on the requested device while the articulation's ``self.device``
resolved from the untouched ``sim_cfg`` default (``cuda:0``), and
``wp.launch(..., device=self.device)`` failed with::
RuntimeError: Error launching kernel 'set_root_link_pose_to_sim_index',
trying to launch on device='cuda:0',
but input array for argument 'env_ids' is on device=cuda:2.
Change ``device``'s default to ``None`` (sentinel) and apply it as an
override after sim_cfg construction in both branches. The one test that
asserted the old "sim_cfg overrides everything" contract is updated to
cover the new override semantics.
Add an ISAACLAB_PIN_KIT_GPU env var to AppLauncher. When truthy, it appends Kit command-line overrides that pin the renderer to a single GPU (renderer.multiGpu.enabled=False, autoEnable=False, maxGpuCount=1) and disable the fabric GPU-interop path (physics.fabricUseGPUInterop= false), so each Kit process touches only its assigned GPU instead of enumerating every visible GPU at startup. Used by the multi-GPU CI workflow to avoid a shared GPU-interop context across concurrent sibling shards, which otherwise surfaces as "Stage X already attached" errors and SimulationApp.close hangs (see isaac-sim#3475). Off by default; single-GPU and user-facing rendering paths are unchanged.
Adds the scope-intersect-runtime device selection helper (test_devices) and its unit test, so unit tests can declare the devices they are valid on and the multi-GPU lane can narrow them via ISAACLAB_TEST_DEVICES.
Pins torch and Warp to the target device before allocations and scopes the CUDA graph capture to it, so Newton runs correctly on cuda:1+ (issue isaac-sim#5132).
…test tools/conftest.py gains a directory-rename work queue (claim/inflight/done) for work-stealing across shards and a per-file report slug to avoid JUnit collisions between same-basename files. A workspace-root conftest skips non-device-parametrized tests on non-default cuda shards, since single-GPU CI already covers them on cuda:0.
…ions Adds an extra-env-vars input so the multi-GPU workflow can inject ISAACLAB_TEST_DEVICES / ISAACLAB_SIM_DEVICE into the test container.
Adds a workflow that runs the unit-test suite across non-default cuda devices in one container with N parallel pytest shards pulling from a shared work queue, plus the inside-container shard runner it mounts.
Switches device-parametrized unit tests to test_devices() so they also run on the non-default GPUs in the multi-GPU lane. Mechanical scope change only; no test logic changes.
…cher When the caller does not pass an explicit device, AppLauncher reads ISAACLAB_SIM_DEVICE and uses it as the device. Lets the multi-GPU CI lane boot Kit on a non-default GPU without editing every test's AppLauncher() call site.
Speeds up iteration on the multi-GPU lane: forces run_docker_tests=false in build.yaml and gates docs.yaml / install-ci.yml behind a DO-NOT-MERGE PR-title check. Revert this commit before the PR is merged.
The test job's runs-on carried the `gpu` label, which in the self-hosted fleet tags single-GPU runners. Requiring both `gpu` and `multi-gpu` routed the job onto a single-GPU box, so the shard runner aborted with "Need at least 2 visible devices; found 1"; the over-constrained label also left the job queued for hours. Drop `gpu` so the job targets the multi-GPU pool.
Move the host-orchestration bash (symlink, work-queue seed, MIG
detection, docker run, reconciler) and the JUnit-XML aggregation Python
out of the workflow's inline run: blocks into version-controlled,
lint-able scripts under .github/actions/multi-gpu/, alongside the
existing multi_gpu_shard_runner.sh. The workflow drops from 456 to 181
lines and each step becomes a one-line call.
Behavior-preserving: data still flows via step env, $GITHUB_OUTPUT,
$GITHUB_ENV, and $GITHUB_STEP_SUMMARY; the container --name now uses
the runner built-ins $GITHUB_RUN_ID/$GITHUB_RUN_ATTEMPT in place of the
YAML-only ${{ github.run_id }} templating. Also documents why the test
job must not carry the gpu label.
The ECR cache repo is resolved per runner pool (single-GPU runners -> gitci-docker-cache; multi-GPU runners -> multigpu-docker-cache). Building on [self-hosted, gpu] pushed the image to gitci-docker-cache, which the multi-GPU test job cannot see, so it rebuilt from scratch (~27 min) on the multi-GPU runner inside its pull step. Build on the multi-GPU pool so the image lands in multigpu-docker-cache and the test job's pull hits.
Move non-default-shard device selection out of a repo-root conftest.py into a scoped pytest plugin (mgpu_shard_select) that tools/conftest.py injects per file only on multi-GPU shards. The plugin keys off ISAACLAB_TEST_DEVICES so keep/drop matches each test's device parametrization, deselects out-of-scope variants (cpu, cuda:0, other-index), and maps the all-deselected NO_TESTS_COLLECTED exit to OK. Confines the behavior to the lane instead of a global root conftest.
Annotate the uncommon bash idioms in the two multi-GPU orchestration scripts for non-bash readers. Remove the now-dead py-spy/gdb hang-capture remnants -- the SYS_PTRACE cap-add and the py-spy pip install -- along with the matching changelog bullet, since the conftest-side capture they supported was removed.
…ests actions" This reverts commit 6cef107.
# Conflicts: # source/isaaclab_ovphysx/test/assets/test_articulation.py # source/isaaclab_ovphysx/test/assets/test_rigid_object.py # source/isaaclab_ovphysx/test/assets/test_rigid_object_collection.py # tools/conftest.py
Introduce tools/_device_plan.py: a pure planner that turns a runner's files plus its concrete runtime device mask into (file, mask) work units, splitting a device_isolated file into one single-device unit per device only when the runtime spans more than one device. Includes is_isolated() marker detection by source regex (no import). Covered by 14 unit tests. This is the first extraction of the multi-GPU / device-split runner refactor; nothing is wired into conftest yet, so runner behavior is unchanged.
Replace the device_split (-k) and mgpu_shard_select mechanisms with one model: the planner emits (file, mask) units and the executor sets the unit's mask as ISAACLAB_TEST_DEVICES so test_devices() does all device-variant selection. A device_isolated file (renamed from device_split) splits into one single-device unit per device only when the runtime spans more than one; an mgpu shard, with a single-device runtime, runs each file once. - Add tools/_device_exec.py: the executor (subprocess lifecycle, timeouts, retries, JUnit parsing) plus run_unit(); conftest's run_individual_tests is now a thin planner+executor driver. - Add tools/_agnostic_select.py: a dumb plugin that drops device-agnostic tests, injected by the executor only for cpu-less masks (shards and the cuda unit of a split). Replaces mgpu_shard_select's agnostic-drop; its variant filtering is now handled by test_devices()+mask. - Remove tools/_device_split.py (DEVICE_SPLIT_PASSES, -k) and .github/actions/multi-gpu/mgpu_shard_select.py. - Rename the device_split marker to device_isolated across the ovphysx tests and the pytest marker registration. Behavior is preserved: the same tests run on the same devices on both lanes; the single-GPU lane still uses no Kit GPU pin and mgpu still pins via the shell.
Pull the per-unit command/env construction out of run_unit into a pure _build_unit_cmd() and cover it with deterministic tests (no subprocess, GPU, or Kit): the unit's mask becomes ISAACLAB_TEST_DEVICES, the _agnostic_select plugin is injected only for a cpu-less mask, -k is never used, and split units get a suffixed report name while single units keep the aggregator-friendly one. This is the behavior-equivalence guard for the planner+executor refactor.
The first refactor assumed test_devices()+mask did all device-variant selection,
so the plugin only dropped device-agnostic tests. But tests with a literal device
list (e.g. parametrize("device", ["cuda:0", "cpu"])) ignore the mask, so their
out-of-scope variant leaked into the wrong split unit and tripped ovphysx's
process-global device lock (RuntimeError: locked to 'gpu', cannot switch to 'cpu').
Restore per-unit device filtering: rename the plugin to _device_select and have it
keep only the tests whose device param is in the unit's mask (plus the agnostic
rule), reading the structured param so it catches literal lists too. The executor
injects it for both halves of a split and for shards. This reproduces the old
-k 'cpu or not cuda' / -k 'cuda' selection exactly.
Begin rebuilding the per-file CI runner as a tools/test_runner/ package, one module per lifecycle stage, driven by a single RunnerConfig that turns the scattered constants and env reads into configuration: - planning.py: RunnerConfig (with from_env), Unit, Planner (device_isolated split). - process.py: ProcessCapture (subprocess + hang/timeout + diagnostics), JUnitReport. - execution.py: ExecContext, UnitRunner (build cmd, drive capture, classify, retry), merge_statuses. - selector.py: DeviceSelect, the in-process device-mask plugin as a class with documented hooks and a one-line registrar. Additive only — conftest still uses the current modules; the wiring and removal of the old _device_* modules follow. 25 unit tests, ruff clean.
conftest.py is now two documented hooks that build a RunnerConfig from the environment and run the Session lifecycle (collect -> plan -> execute -> report), exiting with the runner's code. Add session.py (WorkQueue, Collector, Reporter, Session) completing the framework, and delete the flat _device_plan/_device_exec/ _device_select modules and their tests. Behavior is preserved: the same tests run on the same devices on both lanes, verified by a literal-device-param e2e (the cpu unit drops cuda variants, the cuda unit drops cpu + agnostic) and 32 framework unit tests. ruff clean.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
tools/conftest.py) into three single-purpose modules and unifies device selection ontotest_devices()+ the device mask, removing the parallel-k(device_split) andmgpu_shard_selectmechanisms.tools/conftest.pyshrinks ~1240 → 499 lines; the runner becomes a thin plan→execute driver.1. Modules
tools/_device_plan.py— pure planner: turns (files, runtime mask, marker) into(file, mask)units. Adevice_isolatedfile splits into one single-device unit per device only when the runtime spans more than one device.tools/_device_exec.py— executor: per-unit subprocess lifecycle (timeouts, startup-hang/crash retries, JUnit parsing) plusrun_unit. Sets the unit's mask asISAACLAB_TEST_DEVICES.tools/_agnostic_select.py— a small plugin that drops device-agnostic tests, injected by the executor only for a cpu-less mask (mgpu shards and the cuda unit of a split). Replacesmgpu_shard_select's agnostic-drop; its variant filtering is now done bytest_devices()+ mask.2. Removed / renamed
tools/_device_split.py(DEVICE_SPLIT_PASSES,-k) and.github/actions/multi-gpu/mgpu_shard_select.py.device_splitmarker todevice_isolatedacross the ovphysx tests and the pytest marker registration.3. Behavior preservation
ISAACLAB_TEST_DEVICESexplicitly (equivalent to the prior unset default of cpu + cuda:0); the Kit GPU pin stays mgpu-only via the shell, unchanged.-kis gone; device-variant selection is the mask viatest_devices(). The agnostic-drop plugin reproduces the prior cpu-pass-keeps / cuda-and-shard-drops behavior.4. Validation
run_unitdrives a real pytest subprocess, parses its report, and the agnostic plugin deselects paramless tests on a cpu-less mask.