fix(tracking): S3 tracking server fails on first start with 'no such table: project' (#574) by vaquarkhan · Pull Request #828 · apache/burr

vaquarkhan · 2026-06-30T03:30:17Z

Root Cause
Bug 1 — no such table: project:

On a fresh container/EKS deploy with no pre-existing snapshot in S3, the tracking server crashes immediately when the indexer runs:

sqlite3.OperationalError: no such table: project

RegisterTortoise(app, config=..., add_exception_handlers=True) does not generate schemas by default (generate_schemas defaults to False). On first start there is no snapshot DB to download, so the SQLite file is created empty (no tables). When sync_index() calls backend.update() → _update_projects() → Project.all(), it hits a nonexistent table.
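The failure mode can be reproduced without Tortoise or S3. The following is a minimal sketch using only the standard-library sqlite3 module; the query_projects helper is hypothetical and merely stands in for the ORM's first read of the project table.

```python
import sqlite3


def query_projects(conn):
    """Stand-in for the indexer's first read of the project table."""
    return conn.execute("SELECT * FROM project").fetchall()


# A brand-new SQLite database contains no tables at all, which mirrors
# the first start without a snapshot to download.
conn = sqlite3.connect(":memory:")

try:
    query_projects(conn)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(error_message)  # no such table: project
```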

Bug 2 — indexer silently drops old logs:

The max_paths batch cap in _gather_paths_to_update relied on a break that only exited the inner for loop over a single S3 page, not the outer paginator loop. The paginator kept fetching further pages, so the batch grew well beyond max_paths files. The watermark then advanced to the last file in this oversized batch, so on subsequent cycles the files between position max_paths and the actual end of the batch were permanently skipped.
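To illustrate the control-flow problem, here is a simplified sketch, not the project's actual code. It assumes an append-then-check loop body; gather_paths_buggy, the in-memory pages, and the single-letter keys are all hypothetical stand-ins for the S3 paginator and its object keys.

```python
def gather_paths_buggy(pages, max_paths):
    """Collect keys page by page with a break that only leaves the inner loop."""
    paths = []
    for page in pages:              # outer loop: one iteration per listing page
        for key in page:            # inner loop: keys within that page
            paths.append(key)
            if len(paths) >= max_paths:
                break               # leaves the inner loop only
    return paths


# Three pages of lexicographically ordered keys, as a paginator would return.
pages = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]

batch = gather_paths_buggy(pages, max_paths=2)
watermark = batch[-1]  # the indexer resumes listing after this key

# Keys at or before the watermark that never made it into the batch
# are not listed again on later cycles.
skipped = [k for page in pages for k in page if k <= watermark and k not in batch]

print(batch)      # ['a', 'b', 'd', 'g']
print(watermark)  # g
print(skipped)    # ['c', 'e', 'f']
```

In this sketch the batch exceeds the cap of 2, and the keys c, e, and f sit behind the watermark without ever having been indexed.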

Fix
Bug 1: Call Tortoise.generate_schemas(safe=True) inside the lifespan, after the RegisterTortoise context has been entered. safe=True uses CREATE TABLE IF NOT EXISTS, so it:

Creates tables on first start (no snapshot)
Is a no-op when tables already exist from a downloaded snapshot
Never clobbers existing data
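The three properties listed above follow from the semantics of CREATE TABLE IF NOT EXISTS. The sketch below demonstrates them with plain sqlite3 rather than Tortoise; the ensure_schema helper and the two-column project table are hypothetical simplifications, not the real schema.

```python
import sqlite3

# Hypothetical, simplified stand-in for the real project table.
SAFE_DDL = "CREATE TABLE IF NOT EXISTS project (id INTEGER PRIMARY KEY, name TEXT)"


def ensure_schema(conn):
    """Create the table only if it is missing; otherwise do nothing."""
    conn.execute(SAFE_DDL)


# Case 1: first start, empty database. The table is created.
fresh = sqlite3.connect(":memory:")
ensure_schema(fresh)
fresh_rows = fresh.execute("SELECT * FROM project").fetchall()

# Case 2: database restored from a snapshot that already has table and data.
snapshot = sqlite3.connect(":memory:")
snapshot.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT)")
snapshot.execute("INSERT INTO project (name) VALUES ('demo')")
ensure_schema(snapshot)  # no-op: existing rows are left untouched
snapshot_rows = snapshot.execute("SELECT name FROM project").fetchall()

print(fresh_rows)     # []
print(snapshot_rows)  # [('demo',)]
```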
Bug 2: Add a cap_reached flag that breaks the outer paginator loop when the inner break fires. The batch is now truly capped at max_paths, and the watermark only advances to the last file in a correctly-sized batch.
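A minimal sketch of the flag-based double break follows, under the assumption that the listing resumes strictly after the watermark on each cycle. Only the cap_reached name comes from this description; gather_paths_capped, list_pages, and the in-memory keys are hypothetical stand-ins for the S3 paginator.

```python
def gather_paths_capped(pages, max_paths):
    """Collect at most max_paths keys, stopping both loops at the cap."""
    paths = []
    cap_reached = False
    for page in pages:
        for key in page:
            paths.append(key)
            if len(paths) >= max_paths:
                cap_reached = True
                break               # leave the inner loop
        if cap_reached:
            break                   # leave the outer paginator loop as well
    return paths


def list_pages(all_keys, start_after, page_size):
    """Hypothetical stand-in for a paginated listing that starts after a key."""
    keys = [k for k in sorted(all_keys) if k > start_after]
    return [keys[i:i + page_size] for i in range(0, len(keys), page_size)]


all_keys = list("abcdefghi")
watermark = ""
indexed = []

# Run indexing cycles until a listing comes back empty.
while True:
    batch = gather_paths_capped(list_pages(all_keys, watermark, page_size=3), max_paths=2)
    if not batch:
        break
    indexed.extend(batch)
    watermark = batch[-1]  # advance only to the last key actually collected

print(indexed)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
```

Because each batch stops exactly at the cap, the watermark never moves past a key that was not collected, and every key is eventually indexed.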

Files Changed
backend.py: schema generation in lifespan (Bug 1); paginator break fix (Bug 2)
test_s3_backend_bug574.py: regression tests (moto-based)
Tests
test_generate_schemas_safe_true_creates_tables — verifies tables are created on empty DB
test_generate_schemas_safe_true_does_not_clobber_existing — verifies safe=True preserves snapshot data
test_gather_paths_respects_max_paths_cap — verifies batch cap works (requires moto)
test_watermark_advances_only_to_last_indexed_file — verifies watermark correctness (requires moto)
Bug 1 tests pass locally. Bug 2 tests require moto (marked skip if not installed; will run in CI).

Fixes #574

…napshot (apache#574)

Bug 1: On a fresh container/EKS deploy with no pre-existing snapshot, the S3 tracking server crashed with 'no such table: project' because RegisterTortoise does not generate schemas by default.
Fix: Call Tortoise.generate_schemas(safe=True) after RegisterTortoise enters the context. safe=True uses CREATE TABLE IF NOT EXISTS, so it is a no-op when tables already exist from a downloaded snapshot.

Bug 2: The max_paths batch cap in _gather_paths_to_update had a broken break that only exited the inner for-loop, not the outer paginator loop. This caused unbounded file collection, advancing the watermark past files that should have been indexed in subsequent cycles.
Fix: Add a cap_reached flag that breaks the outer paginator loop when the inner break fires.

Tests: moto-based regression tests for both bugs (schema creation without snapshot, schema coexistence with snapshot, batch cap enforcement, watermark boundary correctness).

github-actions bot added the area/storage (Persisters, state storage) and area/tracking (Telemetry, tracing, OpenTelemetry) labels on Jun 30, 2026

fix(tests): set AWS env vars in mock_s3 fixture for aiobotocore compa…

c78e32d

…tibility Use moto test credentials in the mock_s3 fixture so aiobotocore can connect through the mocked S3 client during regression tests.

vaquarkhan wants to merge 2 commits into apache:main from vaquarkhan:issue-574-s3-tracking-bug
