feat: add SpillStore trait with local-disk implementation #7311

Open: wjones127 wants to merge 1 commit into lance-format:main from wjones127:worktree-piped-wiggling-metcalfe

wjones127 (Contributor) commented Jun 17, 2026

Summary

Adds a generic SpillStore — reclaimable RAII scratch storage for intermediate state that overflows memory (e.g. index-build posting lists, shuffle runs, BTree pages). Mechanism only; consumer migration (IVF shuffler) is a follow-up.

  • SpillStore / SpillFile (lance-io): create_spill_file() vends a write-once RAII handle whose writer() / reader() hand back Box<dyn Writer> / Box<dyn Reader>, so callers feed spill files straight into FileWriter::try_new and a v2 FileReader without leaking an ObjectStore + path. Dropping the handle deletes the file and releases its bytes back to the store's budget.
  • LocalSpillStore: writes to an OS temp directory; with_cap enforces an optional byte budget shared across all handles, returning a typed Error::DiskCapExceeded instead of silently filling the disk. Enforcement lives entirely in the spill store — the spill file decorates the writer with a QuotaWriter (reserve-on-write, release-on-drop-by-stat) rather than threading a field through ObjectStore and every provider, so it works for any backend the store opens.
  • From<io::Error> recovers a wrapped lance Error, so typed errors such as DiskCapExceeded survive the AsyncWrite boundary.
  • ScanScheduler::open_reader builds a FileScheduler over an already-open Reader (no path/size lookup), bridging a bare spill reader into the v2 reader path.
  • Session gains a spill_store field (default: uncapped LocalSpillStore), a with_spill_store() builder for injection, and a spill_store() accessor.
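The reserve-on-write, release-on-drop-by-stat behavior described in the bullets above can be illustrated with a small standalone sketch. It uses only the Rust standard library and synchronous `std::io::Write`; `Budget`, `SketchStore`, `SketchFile`, and `cap_demo` are hypothetical names invented for illustration and are not the types added by this PR, whose real writer is async and returns a typed `Error::DiskCapExceeded`.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Byte budget shared by every file a store hands out (hypothetical type).
struct Budget {
    cap: Option<u64>,
    used: AtomicU64,
}

impl Budget {
    /// Atomically add `n` bytes, refusing if the total would pass the cap.
    fn reserve(&self, n: u64) -> io::Result<()> {
        let cap = self.cap;
        self.used
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |used| {
                let next = used.checked_add(n)?;
                if cap.map_or(false, |c| next > c) { None } else { Some(next) }
            })
            .map(|_| ())
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "spill cap exceeded"))
    }

    /// Give bytes back; saturating so a size mismatch cannot wrap around.
    fn release(&self, n: u64) {
        let _ = self.used.fetch_update(Ordering::AcqRel, Ordering::Acquire, |u| Some(u.saturating_sub(n)));
    }
}

/// Hypothetical store: hands out files in `dir` that all draw on one budget.
pub struct SketchStore {
    dir: PathBuf,
    budget: Arc<Budget>,
    next_id: AtomicU64,
}

impl SketchStore {
    pub fn new(dir: PathBuf, cap: Option<u64>) -> Self {
        let budget = Arc::new(Budget { cap, used: AtomicU64::new(0) });
        Self { dir, budget, next_id: AtomicU64::new(0) }
    }

    pub fn used(&self) -> u64 { self.budget.used.load(Ordering::Acquire) }

    pub fn create_file(&self) -> io::Result<SketchFile> {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        let path = self.dir.join(format!("spill-sketch-{}-{}.tmp", std::process::id(), id));
        let file = File::create(&path)?;
        Ok(SketchFile { path, file, budget: Arc::clone(&self.budget) })
    }
}

/// RAII handle: writes reserve budget; drop deletes the file and releases it.
pub struct SketchFile {
    path: PathBuf,
    file: File,
    budget: Arc<Budget>,
}

impl Write for SketchFile {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Reserve first, so an over-budget write fails before touching disk.
        self.budget.reserve(buf.len() as u64)?;
        // Simplification: a failed disk write keeps its reservation here.
        self.file.write_all(buf)?;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> { self.file.flush() }
}

impl Drop for SketchFile {
    fn drop(&mut self) {
        // Release by stat: trust the on-disk size rather than a running count.
        let size = fs::metadata(&self.path).map(|m| m.len()).unwrap_or(0);
        let _ = fs::remove_file(&self.path);
        self.budget.release(size);
    }
}

/// Usage: (bytes used after one write, second write rejected, bytes used after drop, file still exists).
pub fn cap_demo() -> io::Result<(u64, bool, u64, bool)> {
    let store = SketchStore::new(std::env::temp_dir(), Some(8));
    let mut file = store.create_file()?;
    file.write_all(b"12345")?;
    let used = store.used();
    let rejected = file.write_all(b"6789").is_err(); // 5 + 4 > 8
    let path = file.path.clone();
    drop(file);
    Ok((used, rejected, store.used(), path.exists()))
}
```

Because the budget lives in the store and the file wrapper, nothing below the wrapper needs to know about the cap, which matches the design argument in the summary for keeping enforcement out of `ObjectStore`.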

Closes #7300

🤖 Generated with Claude Code

github-actions bot added the enhancement (New feature or request) label Jun 17, 2026
github-actions bot added the A-encoding (Encoding, IO, file reader/writer) label Jun 17, 2026
wjones127 force-pushed the worktree-piped-wiggling-metcalfe branch from 2a5a2b4 to ad19afe, June 17, 2026 20:56
Adds a `SpillStore` trait on `Session` providing uniform, reclaimable
scratch space for intermediate state that overflows memory (e.g. index
build posting lists, shuffle runs, BTree pages).

- `SpillStore` / `SpillFile` (lance-io): `create_spill_file()` vends a
  RAII handle; `writer()` / `reader()` hand back `Box<dyn Writer>` /
  `Box<dyn Reader>` so callers feed spill files directly into
  `FileWriter::try_new` and a v2 `FileReader` without leaking an
  `ObjectStore` + path. The file is deleted on drop and its bytes are
  released back to the store's usage counter.
- `LocalSpillStore`: writes to an OS temp directory; optionally enforces
  a byte cap. Enforcement lives entirely in the spill store: the spill
  file decorates the writer with a quota-enforcing `QuotaWriter`
  (reserve-on-write, release-on-drop-by-stat) rather than threading a
  field through `ObjectStore` and every provider, so it works for any
  backend the store opens.
- `From<io::Error>` recovers a wrapped lance `Error`, so typed errors
  such as `DiskCapExceeded` survive the `AsyncWrite` boundary.
- `ScanScheduler::open_reader` builds a `FileScheduler` over an
  already-open `Reader` (no path/size lookup).
- `Session` gains a `spill_store` field (defaults to uncapped
  `LocalSpillStore`), a `with_spill_store()` builder, and a
  `spill_store()` accessor so callers and tests can inject alternatives.
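
One plausible way a typed error can survive a round trip through `io::Error`, as the `From<io::Error>` item above describes, is to box it as the `io::Error` payload and downcast it on the way back. The sketch below is an assumption about the general technique, not the code in this change; `SketchError` and `round_trip` are hypothetical names.

```rust
use std::fmt;
use std::io;

/// Hypothetical stand-in for a crate-level error enum.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    DiskCapExceeded { requested: u64, cap: u64 },
    Io(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::DiskCapExceeded { requested, cap } => {
                write!(f, "write of {requested} bytes exceeds spill cap of {cap} bytes")
            }
            Self::Io(msg) => write!(f, "io error: {msg}"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Going in: box the typed error as the payload of an `io::Error`.
impl From<SketchError> for io::Error {
    fn from(e: SketchError) -> Self {
        io::Error::new(io::ErrorKind::Other, e)
    }
}

/// Coming out: if the payload is one of ours, unwrap it instead of
/// flattening it into a string.
impl From<io::Error> for SketchError {
    fn from(e: io::Error) -> Self {
        let is_ours = e.get_ref().map_or(false, |inner| inner.is::<SketchError>());
        if !is_ours {
            return SketchError::Io(e.to_string());
        }
        match e.into_inner().map(|inner| inner.downcast::<SketchError>()) {
            Some(Ok(ours)) => *ours,
            _ => unreachable!("payload type was checked above"),
        }
    }
}

/// Usage: push a typed error through `io::Error` and recover it intact.
pub fn round_trip() -> SketchError {
    let as_io: io::Error = SketchError::DiskCapExceeded { requested: 4, cap: 8 }.into();
    SketchError::from(as_io)
}
```

Checking the payload type with `get_ref` before calling `into_inner` keeps the original `io::Error` available for the fallback branch, so ordinary OS errors still carry their message.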

Mechanism only; consumer migration (IVF shuffler) is a follow-up.

Closes lance-format#7300

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
wjones127 force-pushed the worktree-piped-wiggling-metcalfe branch from ad19afe to b2c0c08, June 17, 2026 21:10
wjones127 marked this pull request as ready for review June 17, 2026 22:01
wjones127 requested a review from westonpace June 17, 2026 22:01
codecov bot commented Jun 17, 2026

Codecov Report

❌ Patch coverage is 88.92508% with 34 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-io/src/spill.rs | 85.77% | 23 Missing and 8 partials ⚠️ |
| rust/lance-core/src/error.rs | 93.10% | 1 Missing and 1 partial ⚠️ |
| rust/lance/src/session.rs | 96.96% | 0 Missing and 1 partial ⚠️ |


Labels

A-encoding (Encoding, IO, file reader/writer), enhancement (New feature or request)

Development

Successfully merging this pull request may close these issues.

Add SpillStore trait with local-disk implementation

1 participant