Add thread-safety contention tests for ISOProber stats and Scheduler …#77

Merged
wpak-ai merged 4 commits into develop from test/thread-safety-contention on Jun 24, 2026

henry0816191 (Collaborator) commented Jun 24, 2026

Summary

  • Add tests/test_thread_safety.py with OS-thread stress tests for lock-protected paths in ISOProber and Scheduler.
  • Verify ISOProber._stats_lock under concurrent _bump_stat() / _reset_stats() / snapshot_stats() access.
  • Verify Scheduler._health_lock under concurrent _publish_health_snapshot() (single writer) and health_snapshot() reads (multiple reader threads), matching the production event-loop writer / health-server reader model.
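The single-writer, multiple-reader model described in the last bullet can be sketched roughly as follows. This is a minimal illustration only: `HealthPublisher`, its field names, and `run_contention` are hypothetical stand-ins and do not reproduce the real `Scheduler` API. One writer swaps in a complete snapshot under a lock while readers copy it under the same lock and check that the derived rate matches the counters it was computed from.

```python
import threading


class HealthPublisher:
    """Hypothetical stand-in for a scheduler; not the real Scheduler API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._snapshot = {"poll_count": 0, "hits": 0, "misses": 0, "success_rate": None}

    def publish(self, poll_count: int, hits: int, misses: int) -> None:
        total = hits + misses
        # Build the complete dict first, then swap it in under the lock so
        # readers never observe a half-updated snapshot.
        fresh = {
            "poll_count": poll_count,
            "hits": hits,
            "misses": misses,
            "success_rate": hits / total if total else None,
        }
        with self._lock:
            self._snapshot = fresh

    def snapshot(self) -> dict:
        with self._lock:
            return dict(self._snapshot)


def run_contention(n_readers: int = 6, iterations: int = 500) -> list:
    """Run one writer against n_readers readers; return collected errors."""
    publisher = HealthPublisher()
    errors: list = []
    errors_lock = threading.Lock()
    writer_done = threading.Event()

    def record(exc: Exception) -> None:
        with errors_lock:
            errors.append(exc)

    def writer() -> None:
        try:
            for i in range(iterations):
                publisher.publish(poll_count=i + 1, hits=i, misses=1)
        except Exception as exc:
            record(exc)
        finally:
            writer_done.set()

    def reader() -> None:
        try:
            while True:
                snap = publisher.snapshot()
                total = snap["hits"] + snap["misses"]
                expected = snap["hits"] / total if total else None
                # The derived field must agree with the counters it came from.
                assert snap["success_rate"] == expected
                if writer_done.is_set():
                    break
        except Exception as exc:
            record(exc)

    threads = [threading.Thread(target=writer)]
    threads += [threading.Thread(target=reader) for _ in range(n_readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10.0)
    assert not any(t.is_alive() for t in threads), "a worker did not finish"
    return errors
```

An empty list from `run_contention()` means no reader saw a snapshot whose rate disagreed with its own counters.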

Context

T13 (week 4). ISOProber._stats and Scheduler._health_snapshot are guarded by threading.Lock, but previously had no contention tests. Existing coverage was either async-only (test_run_cycle_stats_integrity_under_concurrency) or used a HTTP stand-in (test_health_snapshot_consistent_under_concurrent_updates) rather than the real Scheduler.

No production code changes.

Tests added

| Test | What it verifies |
| --- | --- |
| test_concurrent_bump_stat_totals | 32 barrier-synchronized threads × 100 bumps → miss == 3200 |
| test_snapshot_stats_consistent_under_concurrent_reset | Resetter + snapshotters + bumpers (1000 iterations each); every snapshot has full key set and non-negative ints |
| test_health_snapshot_consistent_under_concurrent_publish | 1 writer + 6 readers; each snapshot has required keys and probe_success_rate == _compute_probe_success_rate(probe_stats) |
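The arithmetic behind the first row (32 threads × 100 bumps = 3200) can be illustrated with a small self-contained sketch. `StatsCounter` and `hammer` are hypothetical names chosen for this example, not the project's `ISOProber`; the sketch only assumes a dict of counters guarded by a `threading.Lock`, with a `threading.Barrier` releasing all workers at once.

```python
import threading


class StatsCounter:
    """Hypothetical lock-protected counter dict (not the real ISOProber)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stats = {"hit": 0, "miss": 0}

    def bump(self, key: str) -> None:
        # The read-modify-write happens entirely under the lock.
        with self._lock:
            self._stats[key] += 1

    def snapshot(self) -> dict:
        with self._lock:
            return dict(self._stats)


def hammer(n_threads: int = 32, bumps: int = 100) -> int:
    """Bump "miss" from n_threads threads and return the final total."""
    counter = StatsCounter()
    barrier = threading.Barrier(n_threads)

    def worker() -> None:
        # All workers block here until the last one arrives, so the bump
        # loops start together instead of one after another.
        barrier.wait(timeout=10.0)
        for _ in range(bumps):
            counter.bump("miss")

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10.0)
    assert not any(t.is_alive() for t in threads), "a worker did not finish"
    return counter.snapshot()["miss"]
```

If any increment were lost to a race, the returned total would fall short of `n_threads * bumps`.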

Test plan

  • uv run pytest tests/test_thread_safety.py -v
  • uv run pytest tests/ --cov=paperscout --cov-fail-under=90

Related issues

Summary by CodeRabbit

  • Tests
    • Added OS-thread contention stress tests to validate thread safety for probe statistics and scheduler health snapshotting.
    • Exercises concurrent stat increments, resets, and repeated snapshot reads to confirm no thread exceptions and stable, schema-consistent results.
    • Verifies aggregated probe count totals and that health snapshot metrics (including success-rate) remain consistent with snapshot calculations during concurrent publishing.

@henry0816191 henry0816191 self-assigned this Jun 24, 2026
@henry0816191 henry0816191 requested a review from wpak-ai as a code owner June 24, 2026 18:24

coderabbitai Bot commented Jun 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f8343ac4-9f7b-4e22-9e36-56ddae46d650

📥 Commits

Reviewing files that changed from the base of the PR and between 01a6b1e and b4f5a92.

📒 Files selected for processing (1)
  • tests/test_thread_safety.py

Walkthrough

A new test module adds multithreaded stress tests for ISOProber and Scheduler. It includes shared helpers and constants, then verifies concurrent stat updates, stat resets with snapshots, and health snapshot consistency under concurrent publish and read activity.

Changes

Thread Safety Contention Tests

| Layer | File(s) | Summary |
| --- | --- | --- |
| Test infrastructure and invariants | tests/test_thread_safety.py | Defines stress-loop constants, mock-based factories for ISOProber and Scheduler, a bounded thread-join helper, and invariant checks for probe stats and health snapshots. |
| ISOProber stats contention | tests/test_thread_safety.py | Runs concurrent _bump_stat("miss") calls and concurrent _reset_stats() plus snapshot_stats() access, then checks aggregated totals and snapshot invariants across threads. |
| Scheduler health snapshot contention | tests/test_thread_safety.py | Runs a writer thread that mutates scheduler health state and calls _publish_health_snapshot() while reader threads call health_snapshot() and validate snapshot schema and value consistency. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 Hop, hop—threads race in a ring,
Locks hold steady, snapshots sing.
Misses count and health stays true,
In every thread, the tests march through.
A cozy warren, safe and bright,
With contention tamed just right.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.75% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is concise and accurately describes the new thread-safety contention tests. |
| Description check | ✅ Passed | The PR description matches the template with Summary, Test plan, and Related issues sections filled in. |
| Linked Issues check | ✅ Passed | The new tests satisfy #74 by covering concurrent bump totals, reset/snapshot consistency, and health_snapshot from a non-event-loop thread. |
| Out of Scope Changes check | ✅ Passed | The changes stay within scope: a single new test module plus helper code, with no production changes. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_thread_safety.py`:
- Around line 121-122: The worker exception handling is too broad and catches
fatal control-flow exceptions; in the thread worker blocks in
test_thread_safety.py, replace the BaseException handlers with Exception so
KeyboardInterrupt and SystemExit are not swallowed, and update the error
collection type annotation used by record_error/its error list to match
Exception rather than BaseException in each affected worker section.
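
The distinction this comment relies on can be shown in a few lines. The sketch below is illustrative only: `run_worker`, `boom`, `quit_now`, and `collect_worker_errors` are hypothetical names, not taken from the test module. A handler written as `except Exception` records ordinary failures but lets `SystemExit` and `KeyboardInterrupt` propagate, because both derive from `BaseException` directly.

```python
import threading


def run_worker(fn, errors: list, lock: threading.Lock) -> None:
    """Run fn and record ordinary failures; hypothetical helper for this sketch."""
    try:
        fn()
    except Exception as exc:  # does NOT catch SystemExit / KeyboardInterrupt
        with lock:
            errors.append(exc)


def boom() -> None:
    raise ValueError("worker failed")


def quit_now() -> None:
    raise SystemExit(1)


def collect_worker_errors() -> list:
    """Run boom() on a thread and return what the handler recorded."""
    errors: list = []
    lock = threading.Lock()
    t = threading.Thread(target=run_worker, args=(boom, errors, lock))
    t.start()
    t.join(timeout=5.0)
    return errors


# Facts about the built-in exception hierarchy that the comment depends on.
assert issubclass(ValueError, Exception)
assert not issubclass(SystemExit, Exception)
assert not issubclass(KeyboardInterrupt, Exception)
```

Calling `run_worker(quit_now, [], threading.Lock())` directly raises `SystemExit` to the caller instead of appending it to the error list, which is the behaviour the review asks for.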

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1050a0ce-15b6-465d-9e7a-eeebd37d5bd3

📥 Commits

Reviewing files that changed from the base of the PR and between f0766ed and ebbdcd5.

📒 Files selected for processing (1)
  • tests/test_thread_safety.py

Comment thread tests/test_thread_safety.py Outdated
@henry0816191

@coderabbitai full review again

coderabbitai Bot commented Jun 24, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== tests/test_thread_safety.py ==="
cat -n tests/test_thread_safety.py

echo ""
echo "=== Finding ISOProber source ==="
fd -e py | xargs grep -l "class ISOProber" 2>/dev/null || true

echo ""
echo "=== Finding Scheduler source ==="
fd -e py | xargs grep -l "class Scheduler" 2>/dev/null || true

echo ""
echo "=== Listing all py files ==="
fd -e py | sort

Length of output: 10487


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== conftest.py ==="
fd conftest.py | xargs cat -n 2>/dev/null || true

echo ""
echo "=== existing tests ==="
fd -e py . tests/ | sort | head -30

Length of output: 17744


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== ISOProber relevant parts ==="
ast-grep outline src/paperscout/sources.py

echo ""
echo "=== Scheduler relevant parts ==="
ast-grep outline src/paperscout/monitor.py

Length of output: 2804


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== ISOProber: _stats_lock, _bump_stat, _reset_stats, snapshot_stats, _STATS_TEMPLATE ==="
rg -n "_stats_lock|_bump_stat|_reset_stats|snapshot_stats|_STATS_TEMPLATE" src/paperscout/sources.py -A3 -B1

echo ""
echo "=== Scheduler: _health_lock, _publish_health_snapshot, health_snapshot ==="
rg -n "_health_lock|_publish_health_snapshot|health_snapshot|_compute_probe_success_rate" src/paperscout/monitor.py -A5 -B1

echo ""
echo "=== _compute_probe_success_rate ==="
rg -n "_compute_probe_success_rate" src/paperscout/monitor.py -A10

Length of output: 7576



@henry0816191 Here is a full review of the PR.


Review of tests/test_thread_safety.py

The approach is sound: all three tests exercise the real lock code paths under OS-thread concurrency, the stat/health invariants are well-chosen, and the helper structure is clean. Several things deserve attention before merging.


1. Missing threading.Barrier in test_snapshot_stats_consistent_under_concurrent_reset (Lines 140–148)

Without a barrier, the 5 threads start sequentially. The resetter may finish before the snapshotters or bumpers have even begun, which reduces peak contention and makes the test less likely to catch a real race. Adding a barrier maximises overlap:

def test_snapshot_stats_consistent_under_concurrent_reset(self, fake_pool):
    prober = _make_prober(fake_pool)
    errors: list[Exception] = []
    errors_lock = threading.Lock()
    n_threads = 5
    barrier = threading.Barrier(n_threads)   # ← add

    def resetter() -> None:
        try:
            barrier.wait()                   # ← add to each worker
            for _ in range(RESET_ITERATIONS):
                prober._reset_stats()
        except Exception as exc:
            record_error(exc)
    # ... same pattern for snapshotter and bumper

test_concurrent_bump_stat_totals already does this correctly with its threading.Barrier(n_threads).


2. _assert_health_snapshot_consistent leaves several fields unchecked (Lines 78–85)

last_cycle_status, last_cycle_error, and probe_success_rate are all present in _HEALTH_SNAPSHOT_KEYS (so the key-set assertion fires), but their types are never validated:

def _assert_health_snapshot_consistent(snap: dict) -> None:
    assert set(snap.keys()) == _HEALTH_SNAPSHOT_KEYS
    assert isinstance(snap["poll_count"], int) and snap["poll_count"] >= 0
    assert isinstance(snap["probe_stats"], dict)
    assert all(isinstance(v, int) for v in snap["probe_stats"].values())
    assert snap["probe_success_rate"] == _compute_probe_success_rate(snap["probe_stats"])
    last_updated = snap["last_updated"]
    assert last_updated is None or (isinstance(last_updated, str) and len(last_updated) > 0)
    # Missing:
    assert snap["probe_success_rate"] is None or isinstance(snap["probe_success_rate"], float)
    assert snap["last_cycle_status"] is None or isinstance(snap["last_cycle_status"], CycleStatus)
    # last_cycle_error: None or str
    assert snap["last_cycle_error"] is None or isinstance(snap["last_cycle_error"], str)

3. Writer directly mutates private scheduler attributes (Lines 159–163, 175–189)

The test bypasses any API and writes to scheduler._poll_count, scheduler._last_probe_stats, etc. directly from a thread. This is fine for today's internals, but it means:

  • If Scheduler.__init__ changes a field name, the test silently continues passing (it just won't exercise the new path).
  • Under free-threaded CPython 3.13+, these bare attribute writes are still safe: the writer thread performs them and then calls _publish_health_snapshot() itself, so the writes and the reads are sequentially ordered within that one thread. It is still worth a comment explaining why they are intentionally unguarded.

Consider adding an inline comment:

# Writer and _publish_health_snapshot() run on the same thread, so these
# unguarded attribute writes happen-before the corresponding reads in
# _publish_health_snapshot(). The lock under test only guards _health_snapshot
# assignment/read between writer and reader threads.
scheduler._poll_count = i + 1

4. THREAD_JOIN_TIMEOUT = 5.0 may be too short in constrained CI (Line 33)

5 seconds is tight when 32 threads × 100 bumps contend for a single lock under the GIL, especially on heavily loaded CI runners. Consider raising it to 10.0 or more. If the tests are expected to be fast, add a CI-friendly note explaining what to do when a join times out.


5. Missing slow/stress test marker (module level)

These tests spawn up to 32 OS threads and run thousands of lock acquisitions. Developers running the fast feedback loop might prefer to skip them. Consider adding a pytest.mark:

pytestmark = pytest.mark.thread_safety   # register in pyproject.toml/pytest.ini

or at a minimum document in the module docstring that they can be filtered with -m "not thread_safety".
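
Registering the marker could look roughly like the sketch below, written as a `conftest.py` hook instead of a `pyproject.toml` entry. This is an assumption about how the project might wire it up, not something the PR contains; the marker name `thread_safety` comes from the suggestion above and the description string is made up for illustration. `pytest_configure` and `config.addinivalue_line` are standard pytest hooks and calls.

```python
# conftest.py (sketch; assumes the marker name suggested in this review)


def pytest_configure(config) -> None:
    """Register the custom marker so pytest does not warn about it."""
    config.addinivalue_line(
        "markers",
        "thread_safety: OS-thread contention stress tests",
    )
```

With the marker registered and applied to the module, the stress tests can be skipped with `uv run pytest -m "not thread_safety"` or run on their own with `-m thread_safety`.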


6. FakePool is not thread-safe (informational)

FakePool._FakeCursor and _FakeStore use plain Python dicts with no locking. In the current tests no pool methods are called concurrently (stat/health operations never touch the pool), so this is not a bug today. If future tests extend this pattern to probe-cycle contention, they will need a thread-safe pool. A docstring note would prevent a future false-sense-of-safety issue:

# NOTE: FakePool is not thread-safe. These tests never call pool methods
# concurrently; they only stress the _stats_lock and _health_lock paths.

Nits

  • Line 101: prober._bump_stat("miss") only — exercising a single stat key is enough to prove lock safety, but bumping a variety of keys (each thread a different key) would additionally guard against accidental dict key creation bugs.
  • Line 20: ISOProber._STATS_TEMPLATE is accessed at module import time. If that class attribute is ever made an instance attribute this will silently return the wrong thing; a short comment noting this assumption helps.
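
The first nit (bumping a variety of keys) might be exercised along these lines. The key names in `KEYS` and the function `bump_varied_keys` are hypothetical and chosen only for this sketch; the real stat keys live in `ISOProber._STATS_TEMPLATE` and are not reproduced here. Each thread bumps one fixed key, and the result is compared against an exact expected dict, so both lost updates and accidentally created keys would show up.

```python
import threading

# Hypothetical key set for illustration; not the project's real stat keys.
KEYS = ("hit", "miss", "error", "timeout")


def bump_varied_keys(n_threads: int = 8, bumps: int = 100) -> dict:
    """Each thread bumps one key, assigned round-robin; return final counts."""
    stats = dict.fromkeys(KEYS, 0)
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker(key: str) -> None:
        barrier.wait(timeout=10.0)
        for _ in range(bumps):
            with lock:
                # Plain indexing raises KeyError for an unknown key instead
                # of silently creating it.
                stats[key] += 1

    threads = [
        threading.Thread(target=worker, args=(KEYS[i % len(KEYS)],))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10.0)
    assert not any(t.is_alive() for t in threads), "a worker did not finish"
    with lock:
        return dict(stats)
```

With the defaults, 8 threads spread over 4 keys gives 2 threads per key, so every key should end at 200.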

Summary

| # | Severity | Item |
| --- | --- | --- |
| 1 | Medium | Missing threading.Barrier in test_snapshot_stats_consistent_under_concurrent_reset |
| 2 | Low | _assert_health_snapshot_consistent leaves last_cycle_status, last_cycle_error, probe_success_rate type-unchecked |
| 3 | Low | Direct mutation of private scheduler fields needs an explanatory comment |
| 4 | Low | THREAD_JOIN_TIMEOUT = 5.0 may be insufficient on slow CI |
| 5 | Low | No pytest.mark to allow filtering stress tests |
| 6 | Info | FakePool is not thread-safe; fine today, needs a note for future contributors |

The most impactful fix is item 1: without a barrier, the resetter/snapshot/bump contention test has much lower adversarial overlap than intended.

✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (4)
tests/test_thread_safety.py (4)

33-33: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider a more generous join timeout for slower CI.

A 5s join budget can be tight on heavily loaded or free-threaded CI runners executing thousands of contended iterations, leading to flaky "did not finish" failures rather than real defects. Bumping to ~15–30s (or sourcing from an env var) keeps the safety net while reducing spurious failures.
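
One way to source the timeout from the environment, paired with a bounded join, is sketched below. The environment variable name `THREAD_JOIN_TIMEOUT` and the helpers `join_timeout` and `join_all` are assumptions made for this example; the module's actual join helper is not shown in this thread and may differ.

```python
import os
import threading
import time


def join_timeout(default: float = 5.0) -> float:
    """Read the join budget from a (hypothetical) THREAD_JOIN_TIMEOUT env var."""
    raw = os.environ.get("THREAD_JOIN_TIMEOUT")
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default  # ignore unparsable overrides
    return value if value > 0 else default


def join_all(threads: list, timeout: float) -> None:
    """Join every thread within one shared deadline, then fail loudly."""
    deadline = time.monotonic() + timeout
    for t in threads:
        # Each join gets only the time remaining in the overall budget.
        t.join(timeout=max(0.0, deadline - time.monotonic()))
    stuck = [t.name for t in threads if t.is_alive()]
    assert not stuck, f"threads did not finish within {timeout}s: {stuck}"
```

A CI job could then export a larger value for slow runners while local runs keep the short default.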

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_thread_safety.py` at line 33, The thread join timeout in the
thread-safety test is too tight for slower CI environments. Update the
THREAD_JOIN_TIMEOUT constant in test_thread_safety.py to a more generous value,
or make it configurable via an environment variable, so the join logic in the
test remains stable under heavy load without causing flaky “did not finish”
failures.

172-189: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add a brief note on the direct private-attribute writes.

The writer thread deliberately bypasses the normal cycle flow to mutate _poll_count/_last_probe_stats/_last_cycle_status/_last_successful_poll and call _publish_health_snapshot() directly. A one-line comment stating this intentionally emulates the event-loop writer will prevent future readers from mistaking it for a test reaching into internals by accident.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_thread_safety.py` around lines 172 - 189, Add a brief inline note
in the writer helper to explain that it intentionally writes the private
scheduler state directly and then calls _publish_health_snapshot() to emulate
the event-loop writer path; mention the specific private attributes _poll_count,
_last_probe_stats, _last_cycle_status, and _last_successful_poll so future
readers know this is deliberate and not accidental test coupling.

78-86: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Strengthen the snapshot invariant checks.

The helper validates poll_count, probe_stats, probe_success_rate, and last_updated, but skips last_cycle_status, last_cycle_error, and last_successful_poll. Since the writer mutates these under contention (lines 185-188), asserting their types would catch a torn/inconsistent snapshot that the current checks would miss.

♻️ Suggested additional assertions
     assert snap["probe_success_rate"] == _compute_probe_success_rate(snap["probe_stats"])
     last_updated = snap["last_updated"]
     assert last_updated is None or (isinstance(last_updated, str) and len(last_updated) > 0)
+    assert snap["last_cycle_status"] is None or isinstance(snap["last_cycle_status"], CycleStatus)
+    assert snap["last_cycle_error"] is None or isinstance(snap["last_cycle_error"], str)
+    last_poll = snap["last_successful_poll"]
+    assert last_poll is None or isinstance(last_poll, (int, float, str))

Adjust the expected types to match health_snapshot()'s actual output contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_thread_safety.py` around lines 78 - 86, The snapshot validator in
_assert_health_snapshot_consistent only checks part of the health snapshot
contract, so extend it to assert the types of last_cycle_status,
last_cycle_error, and last_successful_poll as well. Update the expectations to
match health_snapshot()’s actual output shape by verifying these fields are
present and have the correct types (including allowed None cases where
applicable), alongside the existing checks for poll_count, probe_stats,
probe_success_rate, and last_updated.

140-149: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Add a threading.Barrier so workers actually contend.

Threads are started sequentially, and the resetter does the least work per iteration, so it can finish (or mostly finish) its RESET_ITERATIONS before the snapshotters/bumpers ramp up, sharply reducing the overlap this test is meant to exercise. Gating all five workers behind a barrier (as in test_concurrent_bump_stat_totals) guarantees simultaneous contention on _stats_lock.

♻️ Suggested change
         prober = _make_prober(fake_pool)
         errors: list[Exception] = []
         errors_lock = threading.Lock()
+        barrier = threading.Barrier(5)

         def record_error(exc: Exception) -> None:
             with errors_lock:
                 errors.append(exc)

         def resetter() -> None:
             try:
+                barrier.wait()
                 for _ in range(RESET_ITERATIONS):
                     prober._reset_stats()
             except Exception as exc:
                 record_error(exc)

         def snapshotter() -> None:
             try:
+                barrier.wait()
                 for _ in range(SNAPSHOT_ITERATIONS):
                     _assert_valid_probe_stats(prober.snapshot_stats())
             except Exception as exc:
                 record_error(exc)

         def bumper() -> None:
             try:
+                barrier.wait()
                 for _ in range(BUMPER_ITERATIONS):
                     prober._bump_stat("miss")
             except Exception as exc:
                 record_error(exc)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_thread_safety.py` around lines 140 - 149, The concurrency setup in
the test does not force real overlap, so the workers may not contend on
_stats_lock as intended. Update the thread worker flow in test_thread_safety,
using threading.Barrier the same way as test_concurrent_bump_stat_totals, so
resetter, snapshotter, and bumper threads all wait and start together before
entering their loops. Keep the change localized to the thread setup and worker
start logic so the existing resetter, snapshotter, bumper, and _join_threads
structure remains intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 01ad4a36-6c25-4e43-a607-c6f2903402ae

📥 Commits

Reviewing files that changed from the base of the PR and between f0766ed and 01a6b1e.

📒 Files selected for processing (1)
  • tests/test_thread_safety.py

…so reset/snapshot/bump threads start concurrently.

- Extended _assert_health_snapshot_consistent with str/None checks for last_cycle_status, last_cycle_error, and last_successful_poll, plus a float check for probe_success_rate (using str for status, not CycleStatus, matching production).

- Documented that same-thread attribute writes happen-before _publish_health_snapshot() reads; _health_lock only guards cross-thread snapshot access.

- Added module docstring NOTE that FakePool is not thread-safe and these tests don't call pool methods concurrently.
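
The optional-type checks described in this commit message might look something like the sketch below. The field names and the str/None and float/None expectations are taken from the message above; the helper names `is_optional` and `check_optional_fields` are hypothetical, and the real `_assert_health_snapshot_consistent` may be structured differently.

```python
from __future__ import annotations

from typing import Any


def is_optional(value: Any, expected: type | tuple[type, ...]) -> bool:
    """True when value is None or an instance of the expected type(s)."""
    return value is None or isinstance(value, expected)


def check_optional_fields(snap: dict[str, Any]) -> None:
    """Assert the nullable health-snapshot fields carry the expected types."""
    # Per the commit message, status is a plain str here, not CycleStatus.
    for key in ("last_cycle_status", "last_cycle_error", "last_successful_poll"):
        assert is_optional(snap[key], str), f"{key} must be str or None"
    assert is_optional(snap["probe_success_rate"], float), (
        "probe_success_rate must be float or None"
    )
```

Keeping the None-or-type test in one helper keeps each field's contract to a single line and makes a mismatch name the offending key.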
@wpak-ai wpak-ai merged commit 3a7ef30 into develop Jun 24, 2026
10 checks passed
@wpak-ai wpak-ai deleted the test/thread-safety-contention branch June 24, 2026 20:04

Development

Successfully merging this pull request may close these issues.

Thread safety contention tests (3pt)

2 participants