Persist failed-job tray watermark across restart (both apps) #1173

Merged: erikdarlingdata merged 1 commit into dev from feature/failed-job-watermark-persistence on Jun 19, 2026

@erikdarlingdata (Owner)

Problem

The failed-Agent-job tray watermark (_lastAlertedFailedJobTime) was an in-memory dictionary in both apps and never persisted. Blocking/deadlock watermarks got restart persistence in #1145; the failed-job path never did. So on reopen the watermark was empty, every failure still inside the lookback window looked "new," and the toast re-fired for alerts the user had already seen and dismissed before the restart.

Fix (both apps, parity)

Persist the exact server-local newestFailure value and re-seed it on startup, mirroring #1145. The persisted value and the in-session comparison value (FailedJobInfo.RunDateTime) always share a time basis by construction: the UTC alert_time is never mixed in. That is the trap to avoid, because RunDateTime is server-local while alert_time is host UTC.
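
To make the trap concrete, here is a minimal Python sketch. It is not the apps' own .NET code, and every name in it is hypothetical. It assumes a server running five hours behind UTC and shows that a watermark taken from the server-local RunDateTime correctly admits a later failure, while a watermark taken from the host-UTC alert_time silently suppresses it.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_new_failure(run_date_time: datetime, watermark: Optional[datetime]) -> bool:
    """A failure alerts only if it is strictly newer than the watermark.

    Both arguments are expected to be naive server-local times (the RunDateTime basis).
    """
    return watermark is None or run_date_time > watermark


# Hypothetical server running five hours behind UTC.
SERVER_UTC_OFFSET = timedelta(hours=-5)

first_failure = datetime(2026, 6, 19, 10, 0)        # server-local RunDateTime
alert_time_utc = first_failure - SERVER_UTC_OFFSET  # same instant on the host UTC clock (15:00)
next_failure = datetime(2026, 6, 19, 11, 0)         # a genuinely new failure, server-local

good_watermark = first_failure   # same basis as the value it is compared against
bad_watermark = alert_time_utc   # the trap: UTC watermark vs. server-local comparand

print(is_new_failure(next_failure, good_watermark))  # True: the new failure alerts
print(is_new_failure(next_failure, bad_watermark))   # False: wrongly suppressed
```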

Lite (DuckDB): nullable watermark_time column on the existing config_edge_trigger_watermarks table (schema v30→v31 migration + fresh CREATE); Save/LoadFailedJobWatermarksAsync; seed in SeedEdgeTriggerWatermarksAsync, persist on fire. Count-watermark load filtered by watermark_time IS NULL so count and time rows stay cleanly separated.
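
The shape of that change can be sketched in Python with the standard-library sqlite3 module standing in for DuckDB. This is a deliberate swap so the sketch runs without extra packages; the real store presumably uses a proper DuckDB timestamp column rather than ISO text. Only the table name and the watermark_time column come from this PR; server_id, trigger_name, watermark_value, the "failed_job" key, and the function names are hypothetical. The point is the split: count rows keep watermark_time NULL, time rows set it, and each loader filters on that column.

```python
import sqlite3
from datetime import datetime

# v30 table shape. Only the table name is from the PR; the columns are hypothetical.
V30_DDL = """
CREATE TABLE config_edge_trigger_watermarks (
    server_id       TEXT NOT NULL,
    trigger_name    TEXT NOT NULL,
    watermark_value INTEGER,
    PRIMARY KEY (server_id, trigger_name)
)"""

FAILED_JOB_KEY = "failed_job"  # hypothetical trigger_name used for time rows


def migrate_v30_to_v31(conn: sqlite3.Connection) -> None:
    # Nullable column: existing count rows stay valid with watermark_time = NULL.
    conn.execute("ALTER TABLE config_edge_trigger_watermarks ADD COLUMN watermark_time TEXT")


def save_failed_job_watermark(conn: sqlite3.Connection, server_id: str, newest_failure: datetime) -> None:
    # Upsert: one time row per server, overwritten each time an alert fires.
    conn.execute(
        """INSERT INTO config_edge_trigger_watermarks
               (server_id, trigger_name, watermark_value, watermark_time)
           VALUES (?, ?, NULL, ?)
           ON CONFLICT (server_id, trigger_name)
           DO UPDATE SET watermark_time = excluded.watermark_time""",
        (server_id, FAILED_JOB_KEY, newest_failure.isoformat()),
    )


def load_failed_job_watermarks(conn: sqlite3.Connection) -> dict[str, datetime]:
    rows = conn.execute(
        "SELECT server_id, watermark_time FROM config_edge_trigger_watermarks "
        "WHERE watermark_time IS NOT NULL"
    ).fetchall()
    # isoformat/fromisoformat round-trip the naive server-local value to the microsecond.
    return {server: datetime.fromisoformat(text) for server, text in rows}


def load_count_watermarks(conn: sqlite3.Connection) -> dict[tuple[str, str], int]:
    # Filtering on IS NULL keeps time rows from bleeding into the count watermarks.
    rows = conn.execute(
        "SELECT server_id, trigger_name, watermark_value FROM config_edge_trigger_watermarks "
        "WHERE watermark_time IS NULL"
    ).fetchall()
    return {(server, trigger): value for server, trigger, value in rows}
```

Making the column nullable rather than adding a separate table is what lets one migration step and one upsert path serve both kinds of watermark; the cost is that every count-watermark query must remember the IS NULL filter.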

Dashboard (JSON prefs): FailedJobAlertWatermarkTicks on UserPreferences stores the server-local time as DateTime.Ticks, which is basis-exact across JSON with no DateTimeKind drift. The value is lazy-seeded on the first sweep and persisted on fire via the existing SavePreferences pattern.
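
Storing ticks instead of a serialized date sidesteps timezone handling: an integer carries no kind or offset for a JSON serializer to reinterpret. A minimal Python sketch of the round trip, assuming the .NET definition of DateTime.Ticks (100-nanosecond intervals since 0001-01-01 00:00) and assuming legacy preference files simply lack the key. The helper names are hypothetical; only the FailedJobAlertWatermarkTicks key name comes from this PR.

```python
import json
from datetime import datetime, timedelta
from typing import Optional

DOTNET_EPOCH = datetime(1, 1, 1)  # DateTime.MinValue, where Ticks start counting
TICKS_PER_MICROSECOND = 10        # one tick is 100 ns


def to_ticks(local_time: datetime) -> int:
    # Whole microseconds since the epoch, scaled to ticks.
    return (local_time - DOTNET_EPOCH) // timedelta(microseconds=1) * TICKS_PER_MICROSECOND


def from_ticks(ticks: int) -> datetime:
    # Python stops at microseconds, so any sub-microsecond ticks are truncated.
    return DOTNET_EPOCH + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)


def save_prefs(prefs: dict, watermark: Optional[datetime]) -> str:
    out = dict(prefs)
    if watermark is not None:
        out["FailedJobAlertWatermarkTicks"] = to_ticks(watermark)
    return json.dumps(out)


def load_watermark(json_text: str) -> Optional[datetime]:
    # Legacy JSON without the key yields None, i.e. "no watermark yet".
    ticks = json.loads(json_text).get("FailedJobAlertWatermarkTicks")
    return None if ticks is None else from_ticks(ticks)


print(to_ticks(datetime(2000, 1, 1)))  # 630822816000000000, the same as new DateTime(2000, 1, 1).Ticks
```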

Result: after a reopen, already-alerted failures stay suppressed; only a genuinely new failure (including one that happened while the app was closed) alerts.

Tests

  • Lite store round-trip: exact-value preservation, upsert, no bleed between count and time rows.
  • Dashboard prefs round-trip, plus legacy JSON that lacks the key.
  • Lite 545/545, Dashboard 491/491, both apps build clean.

Live verification (recommended post-merge)

The persistence layer is unit-tested; the seed→suppress path lives in MainWindow. Behavioral check: fire a failed-job toast → dismiss → close + reopen → confirm it does not re-fire for that failure, while a new failure still does.
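
The same behavioral check can be expressed as a small simulation. This is a hypothetical Python model of the seed→suppress path, not the MainWindow code. A dict stands in for the persisted store (DuckDB or JSON prefs), the watermark is seeded from it on construction (standing in for app startup), and it is written back whenever a toast fires.

```python
from datetime import datetime
from typing import Optional


class FailedJobAlerter:
    """Hypothetical model of the tray alert path for a single server."""

    def __init__(self, store: dict):
        self.store = store                                           # stands in for the persisted store
        self.watermark: Optional[datetime] = store.get("watermark")  # re-seed on startup
        self.toasts: list[datetime] = []

    def sweep(self, failures_in_lookback: list[datetime]) -> None:
        new = [f for f in failures_in_lookback
               if self.watermark is None or f > self.watermark]
        if new:
            self.toasts.extend(new)
            self.watermark = max(new)                 # newestFailure, server-local
            self.store["watermark"] = self.watermark  # persist on fire


store: dict = {}
seen = datetime(2026, 6, 19, 9, 30)
session1 = FailedJobAlerter(store)
session1.sweep([seen])                 # toast fires and the user dismisses it

session2 = FailedJobAlerter(store)     # close + reopen
session2.sweep([seen])                 # still inside the lookback window: suppressed
later = datetime(2026, 6, 19, 10, 45)  # e.g. a failure while the app was closed
session2.sweep([seen, later])          # only the new failure alerts
print(session2.toasts)                 # [datetime.datetime(2026, 6, 19, 10, 45)]
```

Constructing a FailedJobAlerter with an empty dict instead reproduces the original bug: its first sweep toasts `seen` again.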

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
erikdarlingdata merged commit 59da155 into dev on Jun 19, 2026
2 checks passed
erikdarlingdata deleted the feature/failed-job-watermark-persistence branch on June 19, 2026 at 22:58