Persist failed-job tray watermark across restart (both apps) #1173
Merged
The failed-Agent-job tray watermark (_lastAlertedFailedJobTime) was an in-memory dictionary in both apps and was never persisted. Blocking and deadlock watermarks got restart persistence in #1145, but the failed-job path never did. So on reopen the watermark was empty, every failure still inside the lookback window looked "new," and the toast re-fired for alerts the user had already seen and dismissed before the restart.

Fix: persist the exact server-local newestFailure value and re-seed it on startup, mirroring #1145. The persisted value and the in-session compared value (FailedJobInfo.RunDateTime) always share a basis by construction -- the UTC alert_time is never mixed in.

Lite (DuckDB): add a nullable watermark_time column to the existing config_edge_trigger_watermarks table (schema v30->v31 migration + fresh CREATE); add Save/LoadFailedJobWatermarksAsync; seed in SeedEdgeTriggerWatermarksAsync and persist on fire. The count-watermark load is now filtered by watermark_time IS NULL so the count and time rows stay cleanly separated.

Dashboard (JSON prefs): add FailedJobAlertWatermarkTicks to UserPreferences (server-local time as DateTime.Ticks -- basis-exact across JSON, no DateTimeKind drift); lazy-seed on first sweep and persist on fire via the existing SavePreferences pattern.

Result: after a reopen, already-alerted failures stay suppressed; only a genuinely new failure (including one that occurred while the app was closed) alerts.

Tests: Lite store round-trip (exact-value preservation, upsert, no bleed between count and time rows); Dashboard prefs round-trip + legacy-JSON. Lite 545/545, Dashboard 491/491, both apps build clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
The failed-Agent-job tray watermark (_lastAlertedFailedJobTime) was an in-memory dictionary in both apps and never persisted. Blocking/deadlock watermarks got restart persistence in #1145; the failed-job path never did. So on reopen the watermark was empty, every failure still inside the lookback window looked "new," and the toast re-fired for alerts the user had already seen and dismissed before the restart.
Fix (both apps, parity)
Persist the exact server-local newestFailure value and re-seed it on startup, mirroring #1145. The persisted value and the in-session compared value (FailedJobInfo.RunDateTime) always share a basis by construction: the UTC alert_time is never mixed in (the trap: RunDateTime is server-local, alert_time is host-UTC).
Lite (DuckDB): nullable watermark_time column on the existing config_edge_trigger_watermarks table (schema v30→v31 migration + fresh CREATE); Save/LoadFailedJobWatermarksAsync; seed in SeedEdgeTriggerWatermarksAsync, persist on fire. Count-watermark load filtered by watermark_time IS NULL so count and time rows stay cleanly separated.
Dashboard (JSON prefs): FailedJobAlertWatermarkTicks on UserPreferences (server-local time as DateTime.Ticks, which is basis-exact across JSON with no DateTimeKind drift); lazy-seed on first sweep, persist on fire via the existing SavePreferences pattern.
Result: after a reopen, already-alerted failures stay suppressed; only a genuinely new failure (including one that happened while the app was closed) alerts.
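For reviewers who want to see why raw ticks keep the basis exact, here is a minimal sketch of the Dashboard-side round trip. It is not the code in this PR: UserPreferencesSketch, FailedJobWatermarkSketch, and the RoundTrip, Seed and ShouldAlert helpers are hypothetical names, and the strict "newer than" comparison is an assumption about how the sweep decides what counts as new. Only the FailedJobAlertWatermarkTicks property name comes from the description above.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for UserPreferences; only the property name
// FailedJobAlertWatermarkTicks is taken from the PR description.
public sealed class UserPreferencesSketch
{
    // Server-local newestFailure stored as raw ticks.
    // null means no failed-job alert has been recorded (e.g. legacy JSON).
    public long? FailedJobAlertWatermarkTicks { get; set; }
}

public static class FailedJobWatermarkSketch
{
    // Serialize then deserialize, as a save followed by an app reopen would.
    public static UserPreferencesSketch RoundTrip(UserPreferencesSketch prefs)
    {
        string json = JsonSerializer.Serialize(prefs);
        return JsonSerializer.Deserialize<UserPreferencesSketch>(json)
               ?? new UserPreferencesSketch();
    }

    // Rebuild the watermark from ticks. A long carries no DateTimeKind or
    // offset, so JSON cannot shift the value; Kind is left Unspecified and
    // the value is only ever compared against other server-local times.
    public static DateTime? Seed(UserPreferencesSketch prefs)
    {
        if (prefs.FailedJobAlertWatermarkTicks is long ticks)
        {
            return new DateTime(ticks, DateTimeKind.Unspecified);
        }
        return null;
    }

    // Assumption: a failure alerts only when it is strictly newer than the
    // watermark. DateTime comparison operators compare ticks and ignore Kind.
    public static bool ShouldAlert(DateTime runDateTime, DateTime? watermark)
    {
        return watermark is null || runDateTime > watermark.Value;
    }
}
```

Under these assumptions, saving newestFailure.Ticks, calling RoundTrip and then Seed yields a watermark with exactly the original ticks. ShouldAlert then returns false for the failure that already fired and true for any later RunDateTime, including one recorded while the app was closed. Deserializing legacy JSON without the property leaves the watermark null, which in this sketch means the next failure alerts.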
Tests
Live verification (recommended post-merge)
The persistence layer is unit-tested; the seed→suppress path lives in MainWindow. Behavioral check: fire a failed-job toast → dismiss → close + reopen → confirm it does not re-fire for that failure, while a new failure still does.
🤖 Generated with Claude Code