feat(tables): background jobs (delete/export/backfill on trigger.dev) + tenant-scoped query performance#4915
feat(tables): background jobs (delete/export/backfill on trigger.dev) + tenant-scoped query performance#4915TheodoreSpeaks wants to merge 35 commits into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryHigh Risk Overview Background operations: New delete-async (filter + Workflow & selection: Cancel-runs and column run accept filter / excludeRowIds for select-all semantics. Table detail/list responses expose Client: Select-all tracks exclusions; bulk delete/run/stop use filter scope without loading every row; large exports go async with tray + SSE auto-download. Shared SSE: Reviewed by Cursor Bugbot for commit 3546723. Bugbot is set up for automated code reviews on this repo. Configure here. |
Greptile SummaryThis PR adds background job infrastructure for Tables: async filtered row deletes, async CSV exports, and output-column backfill — all running on Trigger.dev (with an in-process
Confidence Score: 4/5Safe to merge with one targeted fix: the Trigger.dev dispatch path in both delete-async and import-async can leave a ghost running job that blocks the table's write slot for up to ~15 minutes on any transient Trigger.dev error. The ghost-job issue in the Trigger.dev dispatch path is a concrete, observable failure: after a successful markTableJobRunning, if tasks.trigger throws, the table's one-write-job-per-table slot stays occupied until the stale-job janitor runs, blocking all subsequent imports and deletes on that table. Adding a try/catch that calls releaseJobClaim before re-throwing is a small, targeted fix. Everything else — the keyset pagination, ownership-gated workers, planner hints, migration data migration — looks well-considered and tested. apps/sim/app/api/table/[tableId]/delete-async/route.ts and apps/sim/app/api/table/[tableId]/import-async/route.ts — the Trigger.dev dispatch section in each needs a try/catch to release the job claim on trigger failure. Important Files Changed
Sequence DiagramsequenceDiagram
participant Client
participant Route as delete-async / export-async
participant DB as table_jobs (DB)
participant TDev as Trigger.dev
participant Worker as delete-runner / export-runner
participant SSE as Table SSE stream
Client->>Route: POST /delete-async (filter, excludeRowIds)
Route->>DB: markTableJobRunning (INSERT, partial-unique gate)
DB-->>Route: "claimed=true / false (409 on false)"
Route->>TDev: tasks.trigger(payload)
Note over Route,TDev: If trigger throws here, job row stays running (write slot locked ~15 min)
TDev->>Worker: run(payload)
loop Each keyset page
Worker->>DB: updateJobProgress (ownership check)
DB-->>Worker: "owns=true / false (stop if false)"
Worker->>DB: deletePageByIds / uploadFile
Worker->>SSE: appendTableEvent(progress)
end
Worker->>DB: markJobReady / markJobFailed
Worker->>SSE: appendTableEvent(ready/failed)
SSE-->>Client: job terminal event (invalidate queries)
Reviews (4): Last reviewed commit: "Merge remote-tracking branch 'origin/sta..." | Re-trigger Greptile |
| const excludeRowIds = rowSel.excluded ? [...rowSel.excluded] : [] | ||
| const estimatedCount = Math.max(0, tableRowCountRef.current - excludeRowIds.length) | ||
| onRequestDeleteAllByFilter({ excludeRowIds, estimatedCount }) |
There was a problem hiding this comment.
The
estimatedCount uses tableRowCountRef.current, which is the TOTAL table row count (tableData.rowCount), not the count matching the active filter. When a filter is active, the confirmation dialog will say "Delete 10,000 rows matching the current filter" even though only a small subset of those rows match the filter and will actually be deleted. This could alarm users or make them distrust the UI. The filtered count should come from the rows query's metadata (or at minimum the dialog should not show a count when a filter is active).
| const excludeRowIds = rowSel.excluded ? [...rowSel.excluded] : [] | |
| const estimatedCount = Math.max(0, tableRowCountRef.current - excludeRowIds.length) | |
| onRequestDeleteAllByFilter({ excludeRowIds, estimatedCount }) | |
| const excludeRowIds = rowSel.excluded ? [...rowSel.excluded] : [] | |
| // When a filter is active, tableRowCountRef.current is the *total* table count, not the | |
| // filtered count — pass undefined so the dialog avoids showing a misleading number. | |
| const estimatedCount = queryOptions?.filter | |
| ? undefined | |
| : Math.max(0, tableRowCountRef.current - excludeRowIds.length) | |
| onRequestDeleteAllByFilter({ excludeRowIds, estimatedCount }) |
| await queryClient.cancelQueries({ queryKey: tableKeys.rowsRoot(tableId) }) | ||
| const previousQueries = queryClient.getQueriesData<InfiniteData<TableRowsResponse, number>>({ | ||
| queryKey: tableKeys.rowsRoot(tableId), | ||
| }) | ||
| const previousDetail = queryClient.getQueryData<TableDefinition>(tableKeys.detail(tableId)) | ||
| const keep = new Set(excludeRowIds ?? []) | ||
| queryClient.setQueriesData<InfiniteData<TableRowsResponse, number>>( | ||
| { queryKey: tableKeys.rowsRoot(tableId), exact: false }, | ||
| (old) => | ||
| old | ||
| ? { | ||
| ...old, | ||
| pages: old.pages.map((page) => ({ | ||
| ...page, | ||
| rows: page.rows.filter((r) => keep.has(r.id)), | ||
| })), | ||
| } | ||
| : old |
There was a problem hiding this comment.
Optimistic update over-clears row caches
setQueriesData with exact: false updates every cached row query for this table, not just the one that matches the current filter. If the user previously loaded the table without a filter (or with a different filter), those cached pages are also cleared. Since onSuccess only invalidates tableKeys.lists() (not row queries), the stale no-op rows stay empty until the SSE terminal event fires — which may be minutes later for a large delete. The fix is to restrict the optimistic update to the exact query key that matches the current filter.
…ed optimistic clear, Cmd+A select-all, hide delete from tray)
…row-deletes # Conflicts: # apps/sim/app/workspace/[workspaceId]/tables/[tableId]/table.tsx # apps/sim/app/workspace/[workspaceId]/tables/components/import-csv-dialog/import-csv-dialog.tsx # apps/sim/app/workspace/[workspaceId]/tables/tables.tsx # apps/sim/lib/table/import-runner.ts # apps/sim/lib/table/service.ts # packages/db/migrations/meta/0227_snapshot.json # packages/db/migrations/meta/_journal.json
…sk, keyset index + autovacuum tuning
…ith in-process fallback
…ancel, and download
…instead of the import tray
… toasts instead of the import tray" This reverts commit 1ea5871.
…n one spinner tray
service.ts kept a private DbTransaction alias and two inline typeof db | DbTransaction unions after planner.ts began exporting the canonical DbTransaction/DbExecutor — import those instead. From the /simplify review of the perf series; no behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@greptile review |
The local vi.mock('@sim/db/schema') stubbed only document/knowledgeBase,
but the file's import graph reaches lib/table/service whose module scope
now references tableJobs. The global schema mock already covers all of
it — rely on it per the testing rules instead of re-mocking.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses the open Bugbot/Greptile findings on filtered select-all: - Filtered runs no longer cancel the whole table: cancelWorkflowGroupRuns takes a filter — it stops only dispatches with that exact filter scope and only in-flight cells on matching rows (semi-join); whole-table and differently-scoped dispatches keep running, their cancelled cells skipped via cancelledAt > requestedAt. - Stop on a filtered select-all sends the filter through cancel-runs (contract + route + mutation) instead of a table-wide cancel. - runColumnBodySchema rejects rowIds + filter together (mirrors deleteTableRowsBodySchema). - Select-all delete clears the selection in onSuccess, not at click, so a failed kickoff restores both rows and selection. - Clipboard copy/cut estimates use the filter-aware total (rowTotal) instead of the whole-table rowCount. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@greptile review |
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…row-deletes Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| // paths can mask the doomed rows while the job runs (see `pendingDeleteMask`). | ||
| const jobId = generateId() | ||
| const payload: TableDeleteJobPayload = { filter, excludeRowIds, cutoff: cutoff.toISOString() } | ||
| const claimed = await markTableJobRunning(tableId, jobId, 'delete', payload) | ||
| if (!claimed) { | ||
| return NextResponse.json( | ||
| { error: 'A job is already in progress for this table' }, | ||
| { status: 409 } | ||
| ) | ||
| } | ||
|
|
||
| if (isTriggerDevEnabled) { | ||
| // Trigger.dev runs the delete outside the web container (survives deploys) and retries — | ||
| // safe: the keyset + cutoff walk just deletes whatever remains. | ||
| const [{ tableDeleteTask }, { tasks }] = await Promise.all([ | ||
| import('@/background/table-delete'), | ||
| import('@trigger.dev/sdk'), | ||
| ]) | ||
| await tasks.trigger<typeof tableDeleteTask>( | ||
| 'table-delete', | ||
| { jobId, tableId, workspaceId, filter, excludeRowIds, cutoff: cutoff.toISOString() }, | ||
| { tags: [`tableId:${tableId}`, `jobId:${jobId}`] } | ||
| ) | ||
| } else { | ||
| runDetached('table-delete', () => | ||
| runTableDelete({ | ||
| jobId, | ||
| tableId, | ||
| workspaceId, | ||
| filter, | ||
| excludeRowIds, |
There was a problem hiding this comment.
Ghost running job on trigger dispatch failure
If tasks.trigger (or its dynamic imports) throws — e.g., the Trigger.dev service is temporarily unreachable — the handler exits with a 500, but the table_jobs row created by markTableJobRunning remains with status='running'. This ghost job holds the table's one-write-job-per-table slot (the partial-unique index) until the stale-job janitor fires, which can take up to STALE_THRESHOLD_MINUTES (~15–20 min). During that window every subsequent import and delete for this table returns 409.
Wrap the dispatch block in a try/catch and release the claim before re-throwing:
try {
await tasks.trigger<typeof tableDeleteTask>(...)
} catch (err) {
await releaseJobClaim(tableId, jobId).catch(() => {})
throw err
}The same gap exists in apps/sim/app/api/table/[tableId]/import-async/route.ts for the Trigger.dev path added in this PR.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
If tasks.trigger (or its dynamic imports) throws after markTableJobRunning, the ghost running row held the table's one-write-job slot until the stale-job janitor fired (~15-20 min of 409s). All four kickoff routes now release the claim and rethrow; the backfill runner releases and warns (a failed backfill never fails the schema change). Greptile P1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Both are branch-only — no environment has applied them through the migration ledger yet — so the tenant-scoped GIN (btree_gin extension, index swap) folds into 0231_table_jobs_and_keyset. Snapshot chain re-pointed; drizzle-kit generate confirms zero drift. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Under select-all the context menu counted only the loaded page ("Delete
1000 rows" on a 999k-row table) while the action correctly deletes every
matching row via the background job. Delete now gets its own count from
the filter-aware total minus deselections; the run-action labels keep the
loaded-row count since those actions act on loaded rows only.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Follow-up to the label fix: under select-all the context menu's Run / Re-run / Stop only acted on the loaded page of rows. They now route through the same scopes as the action bar — runs dispatch by filter (whole table when unfiltered), Stop uses the filter-scoped cancel — and all labels share one true count (filter-aware total minus deselections, locale-formatted). Like the action bar, filter-scoped runs ignore deselections (the run API has no exclusion set). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Select-all minus deselected rows now means exactly that for every bulk action, not just delete. runColumnBodySchema and cancelTableRunsBodySchema accept excludeRowIds (bounded by MAX_EXCLUDE_ROW_IDS, select-all scope only); the dispatch scope persists it and the dispatcher window walk, eager bulk-clear, pre-run cancel, and filter/table-scoped cancel all skip excluded rows. Client threads exclusions from the selection through the action bar and the grid context menu, including the optimistic stamps. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 3546723. Configure here.
| // table's one-write-job slot until the stale-job janitor fires. | ||
| const { releaseJobClaim } = await import('@/lib/table/service') | ||
| await releaseJobClaim(table.id, importId).catch(() => {}) | ||
| throw error |
There was a problem hiding this comment.
Orphan table after trigger failure
Medium Severity
When TRIGGER_DEV_ENABLED is on, a failed tasks.trigger for create-mode import calls releaseJobClaim and rethrows, but the placeholder table from createTable is left in the workspace with no running job and no import worker—clients get a 500 while an empty table remains.
Reviewed by Cursor Bugbot for commit 3546723. Configure here.
| // their cells cancelled below are skipped via `cancelledAt > requestedAt`. | ||
| if (!rowId) { | ||
| await markActiveDispatchesCancelled(tableId) | ||
| await markActiveDispatchesCancelled(tableId, options?.filter) |
There was a problem hiding this comment.
Stop cancels unrelated dispatches
Medium Severity
Select-all Stop with excludeRowIds but no active filter routes through cancelWorkflowGroupRuns with filter undefined, so markActiveDispatchesCancelled cancels every active dispatch on the table—not just the whole-table selection—while per-row exclusions only skip individual cells.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 3546723. Configure here.


Summary
Background jobs, async exports, and a systemic performance pass for Tables.
Background row deletes +
table_jobs{ filter, excludeRowIds }and delete in keyset-paginated background batches; rows inserted mid-job are spared via acreated_atcutoff.table_jobstable (one row per job,type= import | delete | export | backfill). Concurrency gate: partial-unique index on(table_id) WHERE status='running' AND type <> 'export'— one writing job per table; read-only exports run concurrently.Jobs on trigger.dev
apps/sim/background/) with an in-processrunDetachedfallback — jobs survive deploys/restarts.CSV export
Performance (measured on a 1M-row table in a 12M-row shared relation)
(table_id, id)index (0231)(order_key, id)cursor seek$inRoot cause for most of these: jsonb predicates are unestimatable, so the planner picked parallel seq scans over the entire shared relation (every tenant's rows). Two mechanisms fix it:
withSeqscanOff(lib/table/planner.ts): transaction-scopedSET LOCAL enable_seqscan = offaround the queries with unindexable predicates (ILIKE/range/lower()/sort keys).datawithbtree_gin (table_id, data jsonb_path_ops)— containment lookups resolve tenant-scoped inside the index (and the index is smaller, 529MB vs 694MB on dev).Fit & finish
Migrations
0231_table_jobs_and_keyset:table_jobs+ import-state data migration +(table_id, id)index + autovacuum tuning +import_*column drops.0232_tenant_scoped_data_gin:btree_ginextension + tenant-scoped containment index (replacesuser_table_rows_data_gin_idx).Test plan
bunx vitest run lib/table app/api/table app/api/v1 hooks/queries(395 tests)bun run lint:check,tsc --noEmit,bun run check:api-validationTRIGGER_DEV_ENABLED=true; EXPLAIN ANALYZE before/after for each perf claim.🤖 Generated with Claude Code