
Add finding lifecycle provenance events (3/3)#15152

Open
devGregA wants to merge 2 commits into DefectDojo:dev from devGregA:devgrega/finding-lifecycle-events

devGregA (Contributor) commented Jul 4, 2026

Part 3/3 of a visibility series: #15150 · #15151 · #15152 (this PR)

⚠️ Merge-order note (interacts with #15150)

This PR and #15150 both update the pinned query-count baselines in unittests/test_importers_performance.py and unittests/test_tag_inheritance_perf.py, each measured against dev without the other. They are functionally independent and can merge in either order — but whichever merges second needs a rebase with re-measured baselines, because the deltas add up. Example: EXPECTED_ZAP_IMPORT_V2 is 287 on dev, 288 in each PR alone, and 289 once both are in (+1 for this PR's created-events batch, +1 for #15150's batch stamp). I'll push the combined-baseline rebase on whichever PR lands second.
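The additive rule in the note can be checked with a small Python sketch. combined_baseline is a hypothetical helper, not part of the test suite. It assumes, as the note says, that each PR's baseline was measured against dev independently, so the per-PR deltas simply add.

```python
def combined_baseline(dev: int, *pr_baselines: int) -> int:
    """Expected query count once every PR is merged.

    Hypothetical helper: each PR baseline was measured against dev alone,
    so the combined value is dev plus the sum of each PR's delta.
    """
    return dev + sum(pr - dev for pr in pr_baselines)


# EXPECTED_ZAP_IMPORT_V2: 287 on dev, 288 with this PR alone, 288 with #15150 alone.
print(combined_baseline(287, 288, 288))  # 289
```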


Description

Adds a finding lifecycle provenance ledger: an append-only record of the semantic transitions in a finding's life, answering the questions behind a large class of support tickets — "why did this finding close?", "why is this a duplicate?", "when was this ticketed?"

New model Finding_Lifecycle_Event (finding, actor_type, action, detail JSON, created), written at five capture points:

| Event | Where | Detail |
|---|---|---|
| created | importer + reimporter (only findings actually created) | test id, scan type, import vs reimport |
| closed | mitigate_finding (close_old_findings / re-upload) | close reason, test id |
| reopened | reimporter reactivation | reason, test id |
| marked_duplicate | set_duplicate (covers batch dedupe; transitive re-points record their own event) | original finding id, hash_code |
| pushed_jira | add_jira_issue success | JIRA issue key |
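For a concrete picture of a single ledger row, here is a framework-free Python sketch of the event shape and the five actions in the table. It is not the Django model from this PR: LifecycleEvent, make_event, truncate_detail, and the "system" actor_type value are hypothetical. It also assumes that the 256-character truncation applies only to string values in detail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any

MAX_DETAIL_LEN = 256  # detail values are truncated to 256 chars


class Action(str, Enum):
    """The five capture points from the table."""
    CREATED = "created"
    CLOSED = "closed"
    REOPENED = "reopened"
    MARKED_DUPLICATE = "marked_duplicate"
    PUSHED_JIRA = "pushed_jira"


@dataclass(frozen=True)
class LifecycleEvent:
    """One append-only ledger row (a stand-in for the real model)."""
    finding_id: int
    actor_type: str
    action: Action
    detail: dict[str, Any]
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def truncate_detail(detail: dict[str, Any]) -> dict[str, Any]:
    """Cap string values so no single event carries an unbounded payload."""
    return {k: v[:MAX_DETAIL_LEN] if isinstance(v, str) else v for k, v in detail.items()}


def make_event(finding_id: int, actor_type: str, action: Action, **detail: Any) -> LifecycleEvent:
    """Build an event with its detail values already truncated."""
    return LifecycleEvent(finding_id, actor_type, action, truncate_detail(detail))


# Example: a dedupe pass marks finding 42 as a duplicate of finding 7.
event = make_event(42, "system", Action.MARKED_DUPLICATE, original_finding_id=7, hash_code="ab12")
print(event.action.value, event.detail)  # marked_duplicate {'original_finding_id': 7, 'hash_code': 'ab12'}
```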

This complements — deliberately does not duplicate — existing history: pghistory triggers capture field-level diffs, and Test_Import_Finding_Action records per-import actions. Neither can express why: which key matched, which re-upload closed it, what it's a duplicate of. That's what this table records.

API: GET /api/v2/findings/{id}/lifecycle_events/ — the finding's provenance timeline, newest first (read-only).
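Here is a minimal client sketch for the endpoint that uses only the standard library. It assumes DefectDojo's usual API v2 token header (Authorization: Token <key>). The host and key are placeholders, and lifecycle_events_request and fetch_lifecycle_events are hypothetical names. The response shape (paginated or a bare list) isn't specified here, so the sketch just returns the decoded JSON.

```python
import json
import urllib.request
from typing import Any


def lifecycle_events_request(base_url: str, finding_id: int, api_key: str) -> urllib.request.Request:
    """Build the read-only GET request for one finding's provenance timeline."""
    url = f"{base_url.rstrip('/')}/api/v2/findings/{finding_id}/lifecycle_events/"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {api_key}", "Accept": "application/json"},
        method="GET",
    )


def fetch_lifecycle_events(base_url: str, finding_id: int, api_key: str, timeout: float = 30.0) -> Any:
    """Send the request and return the decoded JSON body (newest event first, per the PR)."""
    req = lifecycle_events_request(base_url, finding_id, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_lifecycle_events("https://defectdojo.example.com", 123, "<api-key>") would return the timeline for finding 123. Because the action is read-only, the client only ever issues GET.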

Performance / operational design (this table must never become a problem):

  • Transition-only writes: a reimport that matches findings unchanged writes zero rows (covered by an explicit test). Event volume tracks churn, not scan cadence.
  • Batched: importers bulk_create events at the existing 1,000-finding batch boundaries; no signals, no per-row saves; detail values truncated to 256 chars.
  • Delete-safe: the FK has db_constraint=False + on_delete=DO_NOTHING, so bulk finding deletion never touches this table (no ORM cascade collection, no DB cascade). Orphans are swept by retention.
  • Two indexes only: (finding, created) for the timeline read; (created) for the purge.
  • Retention: a nightly beat task purges events older than DD_FINDING_LIFECYCLE_EVENTS_RETENTION_DAYS (default 540) in batched deletes.
  • Kill switch: DD_FINDING_LIFECYCLE_EVENTS_ENABLED (default true) turns all writes into no-ops.
  • Measured cost, pinned by the perf baselines: +1 query per import batch (bulk-created CREATED events), +0 on unchanged-match reimports, +1 per finding closed by close_old_findings, +1 per duplicate marked by dedupe. The query-count baselines in test_importers_performance.py and test_tag_inheritance_perf.py are updated accordingly and now double as a regression tripwire for this table's write discipline.
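The transition-only and batching rules, together with the kill switch, can be sketched in plain Python. The sketch models only the CREATED path. Closed and duplicate events are written one per transition elsewhere. Several pieces are assumptions for illustration: import_findings, bulk_write, the (finding_id, action) tuple, and the truthy-string parse of DD_FINDING_LIFECYCLE_EVENTS_ENABLED. In the PR, the write is a Django bulk_create and the flag is read through DefectDojo's settings. The sketch also assumes that a batch which created nothing skips the write entirely.

```python
import os
from itertools import islice
from typing import Callable, Iterable, Iterator

BATCH_SIZE = 1000  # the importers' existing batch boundary
Event = tuple[int, str]  # (finding_id, action): stand-in for a model row


def lifecycle_events_enabled() -> bool:
    """Kill switch; assumed truthy-string parse, enabled by default."""
    raw = os.environ.get("DD_FINDING_LIFECYCLE_EVENTS_ENABLED", "true")
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def chunked(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def import_findings(
    results: Iterable[tuple[int, bool]],
    bulk_write: Callable[[list[Event]], None],
    enabled: bool = True,
) -> int:
    """Process (finding_id, was_created) pairs; return the number of event writes."""
    writes = 0
    for batch in chunked(results, BATCH_SIZE):
        # ... the importer's own per-batch work would happen here ...
        if not enabled:
            continue  # kill switch: event writes become no-ops
        # Transition-only: findings matched unchanged produce no event.
        events = [(fid, "created") for fid, was_created in batch if was_created]
        if events:
            bulk_write(events)  # stands in for one bulk_create, i.e. one query
            writes += 1
    return writes
```

Feeding it 2,500 newly created findings yields three writes (1,000 + 1,000 + 500). A reimport in which every finding matched unchanged yields none.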

Test results

New module unittests/test_finding_lifecycle_events.py (5 tests) covers a full reimport cycle over the semgrep close-old fixtures (created → matched with zero events → closed with a reason on unique-id change → reactivated), dedupe (duplicate events recording their originals), the API endpoint, the retention purge, and the kill switch.

Regression: test_importers_closeold, test_importers_deduplication, test_deduplication_logic (135 tests) and test_rest_framework.FindingsTest (27 tests) all pass against PostgreSQL via the unit-test compose image. makemigrations --check clean; ruff (0.15.20, repo config) clean.

Not covered by an automated test: the pushed_jira event (requires a mocked JIRA stack; the capture is three lines on the existing success path). Happy to extend a JIRA test if maintainers prefer.

Documentation

Additive feature; the API action is schema-annotated. Happy to add a docs page (finding lifecycle events + settings reference) in this PR or as a follow-up, whichever maintainers prefer.

🤖 Generated with Claude Code

New append-only model Finding_Lifecycle_Event records the SEMANTIC
transitions in a finding's life - the "why" behind a large class of
support questions:

- created: by which import/reimport (only findings actually created)
- closed: by close_old_findings / re-upload, with the reason
- reopened: reactivated by a re-upload
- marked_duplicate: of which original (covers batch dedupe and
  transitive re-points)
- pushed_jira: as which issue key

This complements, and deliberately does not duplicate, existing
history: pghistory captures field-level diffs and
Test_Import_Finding_Action records per-import actions; neither can
express why a transition happened.

Exposed read-only at /api/v2/findings/{id}/lifecycle_events/.

Performance design: transition-only writes (a matched-unchanged
reimport writes zero rows - tested), bulk_create at the existing
1000-finding import batch boundaries, no signals, detail values
truncated. The FK carries no DB constraint with on_delete=DO_NOTHING
so bulk finding deletion never touches this table; orphans and old
events are swept by a nightly retention purge task
(DD_FINDING_LIFECYCLE_EVENTS_RETENTION_DAYS, default 540). Kill
switch: DD_FINDING_LIFECYCLE_EVENTS_ENABLED. Two indexes: the
(finding, created) timeline read and (created) for the purge.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
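The batched-delete shape of the retention purge can be illustrated with the standard library's sqlite3 module. This is not the PR's beat task. The table and column names, the purge batch size, and storing created as a Unix timestamp are all assumptions. The purge filters only on created, the column backed by the dedicated index. As a result, events whose finding was already deleted age out the same way as any other event, which is how the orphans left behind by the constraint-free FK get swept.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_DAYS = 540  # default of DD_FINDING_LIFECYCLE_EVENTS_RETENTION_DAYS
PURGE_BATCH = 1000    # hypothetical batch size


def create_schema(conn: sqlite3.Connection) -> None:
    """Illustrative table; `created` is a Unix timestamp in seconds."""
    conn.execute(
        "CREATE TABLE lifecycle_event ("
        "id INTEGER PRIMARY KEY, finding_id INTEGER NOT NULL, "
        "action TEXT NOT NULL, created REAL NOT NULL)"
    )
    # Mirrors the (created) index that keeps the purge cheap.
    conn.execute("CREATE INDEX lifecycle_event_created ON lifecycle_event (created)")


def purge_old_events(
    conn: sqlite3.Connection,
    now: Optional[datetime] = None,
    retention_days: int = RETENTION_DAYS,
    batch_size: int = PURGE_BATCH,
) -> int:
    """Delete events older than the retention window, one bounded batch per transaction."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).timestamp()
    deleted = 0
    while True:
        cur = conn.execute(
            "DELETE FROM lifecycle_event WHERE id IN ("
            "SELECT id FROM lifecycle_event WHERE created < ? ORDER BY id LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # keep each transaction short
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount
```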
github-actions bot added the New Migration (Adding a new migration file. Take care when merging.), settings_changes (Needs changes to settings.py based on changes in settings.dist.py included in this PR) and unittests labels Jul 4, 2026
The provenance ledger adds one query per transition, except for CREATED
events, which are bulk-created in one query per import batch. The
baselines now document that cost precisely:

- +1 on steps that create findings (one bulk_create of CREATED events
  per import batch)
- +0 on unchanged-match reimports (transition-only discipline, now
  enforced by the perf baselines as well as the unit test)
- +1 per finding closed by close_old_findings (CLOSED event inside
  mitigate_finding, alongside the ~10 queries a close already costs)
- +1 per duplicate marked by dedupe (MARKED_DUPLICATE event in
  set_duplicate)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
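The cost model in this list reduces to a simple sum, sketched below. ImportStep and ledger_query_cost are hypothetical names. The field batches_with_new_findings assumes that a batch creating no findings adds no query. The list does not state a cost for reopened or pushed_jira events, so the sketch leaves them out.

```python
from dataclasses import dataclass


@dataclass
class ImportStep:
    """Counts observed during one import/reimport step (hypothetical)."""
    batches_with_new_findings: int = 0  # import batches that created at least one finding
    findings_closed: int = 0            # findings closed by close_old_findings
    duplicates_marked: int = 0          # findings marked duplicate by dedupe


def ledger_query_cost(step: ImportStep) -> int:
    """Extra queries the provenance ledger adds to a step."""
    return (
        step.batches_with_new_findings  # one bulk_create of CREATED events per batch
        + step.findings_closed          # one CLOSED event per mitigate_finding call
        + step.duplicates_marked        # one MARKED_DUPLICATE event per set_duplicate call
    )


print(ledger_query_cost(ImportStep()))  # 0: unchanged-match reimport
print(ledger_query_cost(ImportStep(batches_with_new_findings=1, findings_closed=3)))  # 4
```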
devGregA changed the title from "Add finding lifecycle provenance events" to "Add finding lifecycle provenance events (3/3)" Jul 4, 2026