
fix: grow posting_lists before indexed access in FTS with_position builder #7330

Open
Siriapps wants to merge 2 commits into lance-format:main from Siriapps:fix-issue-7313

@Siriapps


What does this PR do?

Fixes an index out of bounds panic in IndexWorker::process_batch() when FTS indexing runs with with_position: true (the default). The with_position branch now grows posting_lists with resize_with(token_idx + 1, ...) before indexing by token_id, matching the pattern already used in the non-position branch and in merge_from.

Why was this PR needed?

When tokens.add() returns a token_id greater than posting_lists.len() (for example, after optimize_indices loads a legacy FTS partition with a stale next_id), the old code appended a posting list only when token_id == posting_lists.len(). Any larger token_id leaves the vector too short, so the subsequent posting_lists[token_id] access panics.

Reported in production with posting_lists.len()=1731 and token_id=4456 (#7313). Changing == to >= with a single push is insufficient for that gap; resize_with(token_idx + 1, ...) is required.
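To make the difference concrete, here is a minimal sketch of the three growth strategies discussed above. It is not the lance-index code: PostingList, record_old, record_ge_push, and record_resize are hypothetical stand-ins, and the first two use get_mut so that the out-of-bounds case is reported as None instead of panicking.

```rust
/// Hypothetical stand-in for the builder's per-token posting list.
#[derive(Debug, Default, Clone, PartialEq)]
struct PostingList {
    doc_ids: Vec<u32>,
}

/// Old behaviour as described in this PR: grow only on exact equality.
/// Returns None where `posting_lists[token_id]` would have panicked.
fn record_old(posting_lists: &mut Vec<PostingList>, token_id: usize, doc_id: u32) -> Option<()> {
    if token_id == posting_lists.len() {
        posting_lists.push(PostingList::default());
    }
    posting_lists.get_mut(token_id)?.doc_ids.push(doc_id);
    Some(())
}

/// The insufficient variant: `>=` with a single push adds only one slot,
/// so any gap larger than one is still out of bounds.
fn record_ge_push(posting_lists: &mut Vec<PostingList>, token_id: usize, doc_id: u32) -> Option<()> {
    if token_id >= posting_lists.len() {
        posting_lists.push(PostingList::default());
    }
    posting_lists.get_mut(token_id)?.doc_ids.push(doc_id);
    Some(())
}

/// The pattern this PR adopts: fill every missing slot up to and
/// including `token_id` before indexing.
fn record_resize(posting_lists: &mut Vec<PostingList>, token_id: usize, doc_id: u32) {
    if token_id >= posting_lists.len() {
        posting_lists.resize_with(token_id + 1, PostingList::default);
    }
    posting_lists[token_id].doc_ids.push(doc_id);
}
```

With a vector of 1731 lists and token_id 4456, the first two functions return None (the second after growing the vector by only one slot), while record_resize grows the vector to 4457 entries and records the document.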

What are the relevant issue numbers?

Closes #7313

Does this PR meet the acceptance criteria?

Per CONTRIBUTING.md:

  • Tests added for new/changed behavior (test_process_batch_with_position_handles_token_id_gaps)
  • All tests passing (cargo test -p lance-index)
  • Follows project style guide (cargo fmt --all)
  • Conventional Commits PR title (fix:)
  • No breaking changes introduced
  • Documentation updated — N/A (internal bug fix)

Suggested label: critical-fix (crash during optimize_indices on FTS indexes with with_position: true)

Siriapps and others added 2 commits June 17, 2026 14:17
…7313

Add a unit test that mirrors the production failure (posting_lists.len=1731,
token_id=4456) when with_position indexing encounters a stale next_id.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ilder

When token_id exceeds posting_lists.len() during with_position indexing
(e.g. stale next_id from legacy FTS partitions), resize posting_lists to
token_id + 1 before access instead of growing only on exact equality.

Fixes lance-format#7313

Co-authored-by: Cursor <cursoragent@cursor.com>
@Siriapps

Author

Hi @sinianluoye, this addresses #7313. The fix grows posting_lists before indexed access when token_id exceeds the current vector length (stale next_id from legacy partitions). I would appreciate a review when you have time.


Labels

A-index (Vector index, linalg, tokenizer)
bug (Something isn't working)


Development

Successfully merging this pull request may close these issues.

bug: index out of bounds panic in inverted index builder when token_id > posting_lists.len()
