Add finding lifecycle provenance events (3/3) #15152
Open
devGregA wants to merge 2 commits into
New append-only model Finding_Lifecycle_Event records the SEMANTIC
transitions in a finding's life - the "why" behind a large class of
support questions:
- created: by which import/reimport (only findings actually created)
- closed: by close_old_findings / re-upload, with the reason
- reopened: reactivated by a re-upload
- marked_duplicate: of which original (covers batch dedupe and
transitive re-points)
- pushed_jira: as which issue key
This complements, and deliberately does not duplicate, existing
history: pghistory captures field-level diffs and
Test_Import_Finding_Action records per-import actions; neither can
express why a transition happened.
Exposed read-only at /api/v2/findings/{id}/lifecycle_events/.
Performance design: transition-only writes (a matched-unchanged
reimport writes zero rows - tested), bulk_create at the existing
1000-finding import batch boundaries, no signals, detail values
truncated. The FK carries no DB constraint with on_delete=DO_NOTHING
so bulk finding deletion never touches this table; orphans and old
events are swept by a nightly retention purge task
(DD_FINDING_LIFECYCLE_EVENTS_RETENTION_DAYS, default 540). Kill
switch: DD_FINDING_LIFECYCLE_EVENTS_ENABLED. Two indexes: the
(finding, created) timeline read and (created) for the purge.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The provenance ledger writes exactly one query per transition, and the baselines now document that cost precisely:
- +1 on steps that create findings (one bulk_create of CREATED events per import batch)
- +0 on unchanged-match reimports (transition-only discipline, now enforced by the perf baselines as well as the unit test)
- +1 per finding closed by close_old_findings (CLOSED event inside mitigate_finding, alongside the ~10 queries a close already costs)
- +1 per duplicate marked by dedupe (MARKED_DUPLICATE event in set_duplicate)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
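As a rough check on that accounting, the query cost the ledger adds to one import step can be sketched as a sum of the per-transition terms. The function and parameter names here are hypothetical and not part of the PR; only the +1 and +0 costs are taken from the commit message.

```python
def extra_ledger_queries(batches_with_created: int, closed: int, marked_duplicate: int) -> int:
    """Estimate the queries the ledger adds to one import or reimport step.

    Hypothetical helper for illustration only; the per-transition costs
    are the ones listed in the commit message.
    """
    if min(batches_with_created, closed, marked_duplicate) < 0:
        raise ValueError("counts must be non-negative")
    # One bulk_create of CREATED events per import batch that creates findings.
    created_cost = batches_with_created
    # One CLOSED event per finding closed by close_old_findings.
    closed_cost = closed
    # One MARKED_DUPLICATE event per duplicate marked by dedupe.
    duplicate_cost = marked_duplicate
    # Unchanged-match reimports add nothing: no transition, no row, no query.
    return created_cost + closed_cost + duplicate_cost
```

For a single-batch import that only creates findings this gives 1, which is consistent with EXPECTED_ZAP_IMPORT_V2 moving from 287 on dev to 288 with this PR alone.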
This PR and #15150 both update the pinned query-count baselines in unittests/test_importers_performance.py and unittests/test_tag_inheritance_perf.py, each measured against dev without the other. They are functionally independent and can merge in either order, but whichever merges second needs a rebase with re-measured baselines, because the deltas add up. Example: EXPECTED_ZAP_IMPORT_V2 is 287 on dev, 288 in each PR alone, and 289 once both are in (+1 for this PR's created-events batch, +1 for #15150's batch stamp). I'll push the combined-baseline rebase on whichever PR lands second.
Description
Adds a finding lifecycle provenance ledger: an append-only record of the semantic transitions in a finding's life, answering the questions behind a large class of support tickets — "why did this finding close?", "why is this a duplicate?", "when was this ticketed?"
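To make the idea concrete, here is a minimal, framework-free sketch of an append-only transition ledger. It is not the PR's Django model: the class names, the `latest` helper, the "importer" actor value and the detail keys are all hypothetical. Only the field names and the action names (created, closed, reopened, marked_duplicate, pushed_jira) are taken from this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class LifecycleEvent:
    """One semantic transition; frozen so a recorded event is never mutated."""
    finding_id: int
    actor_type: str
    action: str
    detail: dict
    created: datetime


class Ledger:
    """Append-only store: events are added and read, never updated."""

    def __init__(self) -> None:
        self._events: list[LifecycleEvent] = []

    def record(self, finding_id: int, actor_type: str, action: str,
               detail: dict, created: Optional[datetime] = None) -> LifecycleEvent:
        # Copy the detail dict so later changes by the caller cannot alter history.
        event = LifecycleEvent(finding_id, actor_type, action, dict(detail),
                               created or datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def timeline(self, finding_id: int) -> list[LifecycleEvent]:
        # Newest first, matching the ordering the PR describes for the API.
        own = [e for e in self._events if e.finding_id == finding_id]
        return sorted(own, key=lambda e: e.created, reverse=True)

    def latest(self, finding_id: int, action: str) -> Optional[LifecycleEvent]:
        # Answers questions like "why did this finding close?".
        for event in self.timeline(finding_id):
            if event.action == action:
                return event
        return None
```

With such a ledger, "why did this finding close?" reduces to reading the detail of the most recent closed event for that finding, rather than reconstructing the cause from field-level diffs.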
New model Finding_Lifecycle_Event(finding, actor_type, action, detail JSON, created), written at five capture points:
- created
- closed: mitigate_finding (close_old_findings / re-upload)
- reopened
- marked_duplicate: set_duplicate (covers batch dedupe; transitive re-points record their own event)
- pushed_jira: add_jira_issue success
This complements, and deliberately does not duplicate, existing history: pghistory triggers capture field-level diffs, and Test_Import_Finding_Action records per-import actions. Neither can express why: which key matched, which re-upload closed it, what it's a duplicate of. That's what this table records.
API: GET /api/v2/findings/{id}/lifecycle_events/: the finding's provenance timeline, newest first (read-only).
Performance / operational design (this table must never become a problem):
- bulk_create events at the existing 1,000-finding batch boundaries; no signals, no per-row saves; detail values truncated to 256 chars.
- db_constraint=False + on_delete=DO_NOTHING, so bulk finding deletion never touches this table (no ORM cascade collection, no DB cascade). Orphans are swept by retention.
- Indexes: (finding, created) for the timeline read; (created) for the purge.
- Retention: DD_FINDING_LIFECYCLE_EVENTS_RETENTION_DAYS (default 540), batched deletes.
- Kill switch: DD_FINDING_LIFECYCLE_EVENTS_ENABLED (default true); setting it to false turns all writes into no-ops.
- Query-count baselines in test_importers_performance.py and test_tag_inheritance_perf.py are updated accordingly and now double as a regression tripwire for this table's write discipline.
Test results
New module unittests/test_finding_lifecycle_events.py (5 tests), including a full reimport cycle over the semgrep close-old fixtures (created → matched-with-zero-events → closed with reason on unique-id change → reactivated), dedupe originals, the API endpoint, retention purge, and the kill switch.
Regression: test_importers_closeold, test_importers_deduplication, test_deduplication_logic (135 tests) and test_rest_framework.FindingsTest (27 tests) all pass against PostgreSQL via the unit-test compose image. makemigrations --check clean; ruff (0.15.20, repo config) clean.
Not covered by an automated test: the pushed_jira event (requires a mocked JIRA stack; the capture is three lines on the existing success path). Happy to extend a JIRA test if maintainers prefer.
Documentation
Additive feature; the API action is schema-annotated. Happy to add a docs page (finding lifecycle events + settings reference) in this PR or as a follow-up, whichever maintainers prefer.
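As an illustration of what the retention setting controls, here is a minimal sketch of an age-based purge with batched deletes. It uses an in-memory SQLite table as a stand-in, not the PR's Django task or PostgreSQL schema: the table name, the helper names, the integer epoch timestamps and the purge batch size are assumptions, and the orphan sweep is left out. Only the 540-day default comes from this PR.

```python
import sqlite3

RETENTION_DAYS = 540      # default of DD_FINDING_LIFECYCLE_EVENTS_RETENTION_DAYS per the PR
PURGE_BATCH_SIZE = 1000   # hypothetical; the PR does not state the purge batch size
SECONDS_PER_DAY = 86_400


def make_demo_db() -> sqlite3.Connection:
    """Create a stand-in events table with an index on created for the purge."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lifecycle_event ("
        "id INTEGER PRIMARY KEY, finding_id INTEGER, action TEXT, created INTEGER)"
    )
    conn.execute("CREATE INDEX idx_event_created ON lifecycle_event (created)")
    return conn


def purge_old_events(conn: sqlite3.Connection, now: int,
                     retention_days: int = RETENTION_DAYS,
                     batch_size: int = PURGE_BATCH_SIZE) -> int:
    """Delete events older than the retention window, one small batch at a time.

    `now` and `created` are epoch seconds. Returns the number of rows deleted.
    """
    cutoff = now - retention_days * SECONDS_PER_DAY
    deleted = 0
    while True:
        # Each statement removes at most batch_size rows, so no single
        # delete holds locks for long on a large table.
        cur = conn.execute(
            "DELETE FROM lifecycle_event WHERE id IN "
            "(SELECT id FROM lifecycle_event WHERE created < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount
```

Bounding each delete keeps individual transactions short, which is the usual reason for batching a purge that may touch many rows.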
🤖 Generated with Claude Code