feat(mem_wal): warm flushed generations into shared caches before query#7284
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
… seam Decompose flushed-generation caching into two roles so warming can be plugged in by the consumer without lance owning it. - `DatasetCache` trait (impl'd by the by-path `FlushedMemTableCache`): `get_or_open` + a now self-contained `get_or_build_pk_hashes` (opens + scans internally, so an out-of-crate warmer needs none of lance's PK-hashing internals) + `retain_paths`. - `GenerationWarmer` trait: the seam lance fires; the consumer (e.g. the WAL pod) implements it. No lance impl ships. - Two warm seams wired to `Option<Arc<dyn GenerationWarmer>>`, `None` by default: pre-commit in `MemTableFlusher` (via `ShardWriterConfig.warmer`) and warm-on-open in `open_flushed_dataset` (threaded through the LSM scanner/planners). Both no-op without a warmer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tCache Change the LSM scanner/planners, block-list, open_flushed_dataset, and prewarm_mem_wal to take `Arc<dyn DatasetCache>` instead of the concrete `Arc<FlushedMemTableCache>`, and re-export `scan_pk_hashes`. This lets a consumer (the sophon WAL node) supply its own DatasetCache implementation — e.g. one that injects a read-through object-store byte-cache wrapper at open — instead of being locked to the built-in cache. `FlushedMemTableCache` remains the default impl; callers pass it by value and it coerces. No behavior change: the default path still uses FlushedMemTableCache. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # rust/lance/src/dataset/mem_wal/scanner/builder.rs # rust/lance/src/dataset/mem_wal/scanner/planner.rs
|
This PR abstracts away some of the caching mechanisms in |
westonpace
left a comment
There was a problem hiding this comment.
Attaching Claude's feedback. On a separate matter, how are you planning on benchmarking things like this? This PR is to improve performance but how do we avoid regressing something like this going forwards?
PR #7284 Review: feat(mem_wal): warm flushed generations into shared caches before query
Overview
This PR adds proactive cache warming for mem_wal flushed generations so the first query sees zero cold reads. It introduces two new traits (DatasetCache, GenerationWarmer), abstracts FlushedMemTableCache behind DatasetCache, and wires warming into the flush path (pre-commit, best-effort). A fire-and-forget warm-on-open backstop is also added via open_flushed_dataset.
Correctness Issues
open_flushed_dataset warm-on-open uses empty pk_columns
The warm-on-open backstop in flushed_cache.rs fires warmer.warm(&path, &[]) with an empty pk_columns slice. If the GenerationWarmer implementation (e.g. in sophon) calls get_or_build_pk_hashes with an empty column list, it will either silently build a useless empty hash set or panic depending on how scan_pk_hashes handles it. The actual columns are available at flush time (warm_generation) but not at open_flushed_dataset call sites. Either:
- Remove the backstop warm from
open_flushed_datasetsince the primary path (pre-commit inMemTableFlusher) covers it, or - Add
pk_columns: &[String]as a parameter toopen_flushed_datasetand thread it through.
The comment says "the warmer dedups already-warm paths" but this only works correctly at the flush site where pk_columns is populated. The backstop may produce a bad state.
Warm-on-open fires for all 3 query planners on every dataset open
open_flushed_dataset is called from LsmScanPlanner, LsmPointLookupPlanner, LsmFtsSearchPlanner, and LsmVectorSearchPlanner. If warmer is threaded into all four, and a query fans out across multiple planners for the same generation, multiple concurrent warm tasks spawn. The idempotency gate mentioned in the PR summary should be in the GenerationWarmer impl, but the contract isn't enforced or documented at the trait level — add a doc note to GenerationWarmer::warm stating it must be idempotent.
Design Concerns
DatasetCache::retain_paths is sync in an async trait
retain_paths is the only sync method on an otherwise async trait. The PR description says it's "now async and actively evicts retired generations' index objects," but the trait signature is fn retain_paths(&self, live_paths: &HashSet<String>) — sync. If future implementations need async eviction (e.g. calling Session::invalidate_index_prefix), this will require a breaking trait change. Consider making it async fn retain_paths(...) now while it's cheap to do so.
GenerationWarmer trait is opaque about its contract
warm(&path, &[]) with empty pk_columns from the open-path backstop is a footgun (see above). The trait doc says "prewarm its indexes and optionally pre-build the vector block-list" but doesn't specify what pk_columns = &[] means — skip pk-hash build, or error? This should be documented explicitly.
Duplicate with_warmer pattern across 5 planners
LsmScanPlanner, LsmPointLookupPlanner, LsmFtsSearchPlanner, LsmVectorSearchPlanner, and LsmScanner all get the identical warmer: Option<Arc<dyn GenerationWarmer>> field + with_warmer builder + propagation boilerplate. This is 5 copies of the same 3 lines. A small PlannerConfig struct or a PlannerState that combines session, flushed_cache, and warmer would cut the surface significantly. This isn't blocking but it's the kind of pattern that compounds as more options are added.
Test Coverage
- The
pk_hashes_cached_reuses_first_buildtest was correctly updated to use a real dataset instead of a closure — good. - There is no test for the warm-on-open backstop (empty
pk_columnspath). - There is no test covering the scenario where a warm fails and the flush still proceeds (best-effort path in
warm_generation). - No test verifies that the warmer fires exactly once when
open_flushed_datasetis called concurrently (idempotency contract).
Minor
flush.rs:68-70comment uses⇒(Unicode arrow) instead of=>— inconsistent with Rust doc style elsewhere in the file.- The bench file change (
warmer: None) is correct but worth noting that any futureShardWriterConfigfield additions will require touching the bench again since it uses struct literal syntax.
Summary
The core idea is sound. The main blocking issue is the empty pk_columns in the warm-on-open backstop — this will silently misfire. The sync retain_paths on an async trait is a design debt worth fixing before it ships. Everything else is style or test coverage gaps.
# Conflicts: # rust/lance/src/dataset/mem_wal/memtable/flush.rs # rust/lance/src/dataset/mem_wal/scanner.rs # rust/lance/src/dataset/mem_wal/scanner/block_list.rs # rust/lance/src/dataset/mem_wal/scanner/flushed_cache.rs # rust/lance/src/dataset/mem_wal/scanner/fts_search.rs # rust/lance/src/dataset/mem_wal/scanner/planner.rs # rust/lance/src/dataset/mem_wal/scanner/point_lookup.rs # rust/lance/src/dataset/mem_wal/scanner/vector_search.rs
Post-lance-format#7067 a warmer touches only path-keyed state (opened dataset, its secondary indexes, its PK dedup sidecar), so pk_columns was vestigial — the warm-on-open backstop passed an empty slice. Drop the parameter and document that warm must be idempotent and cheap when already warm, since it is fired fire-and-forget from all four planners plus pre-commit flush. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lets an out-of-crate cache evict retired generations' index objects (e.g. Session::invalidate_index_prefix) during retain without a later breaking trait change. FlushedMemTableCache's own eviction stays synchronous (moka invalidate_entries_if). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- test_flusher_commits_when_warm_fails: a warmer that errors pre-commit must not gate the flush; the generation still commits and the warm fires exactly once. - test_open_flushed_dataset_fires_warm_on_open: the fire-and-forget backstop warms a generation on cold open. Also replaces two Unicode `⇒` with `=>` in warm doc comments. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The DatasetCache trait doc linked [`super::block_list::open_pk_index`], but open_pk_index is private and the path does not resolve, failing rustdoc under -D warnings. Demote it to a plain code reference. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@westonpace appreciate the review! I think I've addressed the claude review comments, but I'll follow up here.
This is a great question. Unfortunately I'm just doing manual testing right now, but we will need to build something more automated moving forwards. IMO the two options we have are (1) extending the current PR Comments
We removed the
Updated the docs on
Made everything async here.
We removed the
This duplicates like 3 LoC for each planner. The workaround would be to consolidate this into a single struct rather than have each parameter be it's own function. Because it's not clear that we would want the configuration of each of these options to be bundled with the others I've left it as is. I think if this becomes unmanageable in the future it would make sense to refactor. |
# Conflicts: # rust/lance/src/dataset/mem_wal/scanner.rs # rust/lance/src/dataset/mem_wal/scanner/block_list.rs # rust/lance/src/dataset/mem_wal/scanner/builder.rs
westonpace
left a comment
There was a problem hiding this comment.
Sorry, meant for previous review to be approved. No need to hold this up while we figure out regression benchmarking.
Summary
Evolve
FlushedMemTableCacheinto the unified warm/open interface for mem_wal flushed generations, and populate the caches before a generation is queryable so the first query sees zero cold reads.FlushedMemTableCachenow owns a requiredSession(the indexCacheBackendseam) and an optional read-throughWrappingObjectStore(page cache), threading both into every open.get_or_open(path)drops its per-call session arg.warm(path, pk_columns): open +prewarm_all_indexes(FTS) +get_or_build_pk_hashes(vector block-list), bounded by a semaphore and idempotent via awarmedgate.open_flushed_datasetfires a warm-on-open backstop.retain_pathsis now async and actively evicts retired generations' index objects via the newSession::invalidate_index_prefix; the byte cache is left to LRU.MemTableFlusherwarms each generation pre-commit, best-effort (logged on error, never blocksupdate_manifest), threaded viaShardWriterConfig.flushed_cache.This is the Lance-side building block for WAL-pod flushed-generation caching (consumed by sophon, which supplies the backed
Session+ read-through pool).Test plan
cargo test -p lance --lib mem_wal::scanner::flushed_cache(7 tests, incl. warm/idempotency/pk-hash/retain) — passcargo test -p lance --lib mem_wal::memtable::flush(8 tests) — passcargo clippy -p lance --tests --benches— cleancargo fmt --all🤖 Generated with Claude Code