
feat(mem_wal): warm flushed generations into shared caches before query #7284

Merged
hamersaw merged 9 commits into lance-format:main from hamersaw:perf/wal-cache
Jun 18, 2026

@hamersaw

Contributor

Summary

Evolve FlushedMemTableCache into the unified warm/open interface for mem_wal flushed generations, and populate the caches before a generation is queryable so the first query sees zero cold reads.

  • FlushedMemTableCache now owns a required Session (the index CacheBackend seam) and an optional read-through WrappingObjectStore (page cache), threading both into every open. get_or_open(path) drops its per-call session arg.
  • New warm(path, pk_columns): open + prewarm_all_indexes (FTS) + get_or_build_pk_hashes (vector block-list), bounded by a semaphore and idempotent via a warmed gate. open_flushed_dataset fires a warm-on-open backstop.
  • retain_paths is now async and actively evicts retired generations' index objects via the new Session::invalidate_index_prefix; the byte cache is left to LRU.
  • MemTableFlusher warms each generation pre-commit, best-effort (logged on error, never blocks update_manifest), threaded via ShardWriterConfig.flushed_cache.

This is the Lance-side building block for WAL-pod flushed-generation caching (consumed by sophon, which supplies the backed Session + read-through pool).
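
As a rough illustration of the "idempotent via a warmed gate" behaviour in the summary, here is a minimal synchronous sketch. `WarmGate` and `warm_once` are hypothetical names, not part of lance; the real `warm` is async and is also bounded by a semaphore, which this sketch leaves out.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical stand-in for the "warmed gate": remembers which generation
/// paths have already been warmed so repeat calls become cheap no-ops.
#[derive(Default)]
pub struct WarmGate {
    warmed: Mutex<HashSet<String>>,
}

impl WarmGate {
    /// Runs `warm_fn` the first time `path` is seen and returns `Ok(true)`.
    /// Returns `Ok(false)` without calling it if the path is already claimed.
    /// If `warm_fn` fails, the claim is released so a later call can retry.
    pub fn warm_once<F>(&self, path: &str, warm_fn: F) -> Result<bool, String>
    where
        F: FnOnce(&str) -> Result<(), String>,
    {
        // Claim the path before doing the work so concurrent callers skip it.
        // The lock guard is dropped at the end of this statement.
        let newly_claimed = self.warmed.lock().unwrap().insert(path.to_string());
        if !newly_claimed {
            return Ok(false);
        }
        if let Err(e) = warm_fn(path) {
            // Release the claim; the generation is still cold.
            self.warmed.lock().unwrap().remove(path);
            return Err(e);
        }
        Ok(true)
    }
}
```

Claiming the path before running the work keeps a second caller from starting a duplicate warm. Whether a failed warm should release the claim is a design choice this sketch makes on its own; the PR does not say.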

Test plan

  • cargo test -p lance --lib mem_wal::scanner::flushed_cache (7 tests, incl. warm/idempotency/pk-hash/retain) — pass
  • cargo test -p lance --lib mem_wal::memtable::flush (8 tests) — pass
  • cargo clippy -p lance --tests --benches — clean
  • cargo fmt --all

🤖 Generated with Claude Code

@github-actions github-actions Bot added the enhancement New feature or request label Jun 15, 2026
… seam

Decompose flushed-generation caching into two roles so warming can be
plugged in by the consumer without lance owning it.

- `DatasetCache` trait (impl'd by the by-path `FlushedMemTableCache`):
  `get_or_open` + a now self-contained `get_or_build_pk_hashes`
  (opens + scans internally, so an out-of-crate warmer needs none of
  lance's PK-hashing internals) + `retain_paths`.
- `GenerationWarmer` trait: the seam lance fires; the consumer (e.g. the
  WAL pod) implements it. No lance impl ships.
- Two warm seams wired to `Option<Arc<dyn GenerationWarmer>>`, `None` by
  default: pre-commit in `MemTableFlusher` (via `ShardWriterConfig.warmer`)
  and warm-on-open in `open_flushed_dataset` (threaded through the LSM
  scanner/planners). Both no-op without a warmer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
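
To make the "`None` by default, no-op without a warmer" seam concrete, here is a small sketch. `WarmerSketch`, `WarmSeam`, and `CountingWarmer` are hypothetical stand-ins; the real `GenerationWarmer` trait is async and its exact signature is not shown in this PR conversation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Simplified, synchronous stand-in for a warmer trait (assumed signature).
pub trait WarmerSketch: Send + Sync {
    fn warm(&self, path: &str) -> Result<(), String>;
}

/// A seam that holds an optional warmer, `None` by default.
#[derive(Default)]
pub struct WarmSeam {
    warmer: Option<Arc<dyn WarmerSketch>>,
}

impl WarmSeam {
    /// Builder-style setter, mirroring the `with_warmer` pattern.
    pub fn with_warmer(mut self, warmer: Arc<dyn WarmerSketch>) -> Self {
        self.warmer = Some(warmer);
        self
    }

    /// Fires the warmer if one is configured; returns whether it was invoked.
    pub fn fire(&self, path: &str) -> bool {
        match &self.warmer {
            Some(w) => {
                // Best-effort: a warm error is ignored here, never propagated.
                let _ = w.warm(path);
                true
            }
            None => false,
        }
    }
}

/// Example consumer-side warmer that only counts how often it was called.
#[derive(Default)]
pub struct CountingWarmer {
    pub calls: AtomicUsize,
}

impl WarmerSketch for CountingWarmer {
    fn warm(&self, _path: &str) -> Result<(), String> {
        self.calls.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }
}
```

With no warmer set, `fire` does nothing, which is the default behaviour the commit message describes.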
…tCache

Change the LSM scanner/planners, block-list, open_flushed_dataset, and
prewarm_mem_wal to take `Arc<dyn DatasetCache>` instead of the concrete
`Arc<FlushedMemTableCache>`, and re-export `scan_pk_hashes`. This lets a
consumer (the sophon WAL node) supply its own DatasetCache implementation
— e.g. one that injects a read-through object-store byte-cache wrapper at
open — instead of being locked to the built-in cache. `FlushedMemTableCache`
remains the default impl; callers pass it by value and it coerces.

No behavior change: the default path still uses FlushedMemTableCache.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
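
The "pass it by value and it coerces" point relies on Rust's unsized coercion from `Arc<Concrete>` to `Arc<dyn Trait>`. A minimal sketch, with hypothetical names (`CacheSketch`, `DefaultCache`, `Planner`) and a `String` standing in for an opened dataset:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical, synchronous stand-in for a dataset-cache trait.
pub trait CacheSketch: Send + Sync {
    fn get_or_open(&self, path: &str) -> Arc<String>;
}

/// Default implementation: opens each path once and shares the handle.
#[derive(Default)]
pub struct DefaultCache {
    opened: Mutex<HashMap<String, Arc<String>>>,
}

impl CacheSketch for DefaultCache {
    fn get_or_open(&self, path: &str) -> Arc<String> {
        let mut opened = self.opened.lock().unwrap();
        opened
            .entry(path.to_string())
            .or_insert_with(|| Arc::new(format!("dataset@{}", path)))
            .clone()
    }
}

/// A planner that depends only on the trait object, not the concrete cache.
pub struct Planner {
    cache: Arc<dyn CacheSketch>,
}

impl Planner {
    pub fn new(cache: Arc<dyn CacheSketch>) -> Self {
        Planner { cache }
    }

    pub fn open(&self, path: &str) -> Arc<String> {
        self.cache.get_or_open(path)
    }
}
```

A caller holding `Arc<DefaultCache>` can write `Planner::new(cache.clone())` with no explicit cast, and a consumer can substitute its own `CacheSketch` implementation without touching `Planner`.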
@hamersaw hamersaw marked this pull request as ready for review June 16, 2026 17:36
# Conflicts:
#	rust/lance/src/dataset/mem_wal/scanner/builder.rs
#	rust/lance/src/dataset/mem_wal/scanner/planner.rs
@hamersaw

Contributor Author

This PR abstracts some of the caching mechanisms in mem_wal behind two new traits, DatasetCache and GenerationWarmer. The goal is a pluggable interface that lets users decide when and how data (both dataset bytes and indexes) gets cached when using mem_wal. DatasetCache is implemented by the existing FlushedMemTableCache and is responsible for handling Lance dataset opens. The GenerationWarmer is tasked with prewarming Lance datasets so that queries are performant. It is primarily used when flushing the active memtable to the l0 cache: we write to the object store and then make sure the data is in memory in the Lance Session cache so that it can be queried efficiently.

@westonpace westonpace left a comment

Member


Attaching Claude's feedback. On a separate matter, how are you planning on benchmarking things like this? This PR is to improve performance but how do we avoid regressing something like this going forwards?


PR #7284 Review: feat(mem_wal): warm flushed generations into shared caches before query

Overview

This PR adds proactive cache warming for mem_wal flushed generations so the first query sees zero cold reads. It introduces two new traits (DatasetCache, GenerationWarmer), abstracts FlushedMemTableCache behind DatasetCache, and wires warming into the flush path (pre-commit, best-effort). A fire-and-forget warm-on-open backstop is also added via open_flushed_dataset.


Correctness Issues

open_flushed_dataset warm-on-open uses empty pk_columns

The warm-on-open backstop in flushed_cache.rs fires warmer.warm(&path, &[]) with an empty pk_columns slice. If the GenerationWarmer implementation (e.g. in sophon) calls get_or_build_pk_hashes with an empty column list, it will either silently build a useless empty hash set or panic depending on how scan_pk_hashes handles it. The actual columns are available at flush time (warm_generation) but not at open_flushed_dataset call sites. Either:

  • Remove the backstop warm from open_flushed_dataset since the primary path (pre-commit in MemTableFlusher) covers it, or
  • Add pk_columns: &[String] as a parameter to open_flushed_dataset and thread it through.

The comment says "the warmer dedups already-warm paths" but this only works correctly at the flush site where pk_columns is populated. The backstop may produce a bad state.

Warm-on-open fires for all 4 query planners on every dataset open

open_flushed_dataset is called from LsmScanPlanner, LsmPointLookupPlanner, LsmFtsSearchPlanner, and LsmVectorSearchPlanner. If warmer is threaded into all four, and a query fans out across multiple planners for the same generation, multiple concurrent warm tasks spawn. The idempotency gate mentioned in the PR summary should be in the GenerationWarmer impl, but the contract isn't enforced or documented at the trait level — add a doc note to GenerationWarmer::warm stating it must be idempotent.


Design Concerns

DatasetCache::retain_paths is sync in an async trait

retain_paths is the only sync method on an otherwise async trait. The PR description says it's "now async and actively evicts retired generations' index objects," but the trait signature is fn retain_paths(&self, live_paths: &HashSet<String>) — sync. If future implementations need async eviction (e.g. calling Session::invalidate_index_prefix), this will require a breaking trait change. Consider making it async fn retain_paths(...) now while it's cheap to do so.

GenerationWarmer trait is opaque about its contract

warm(&path, &[]) with empty pk_columns from the open-path backstop is a footgun (see above). The trait doc says "prewarm its indexes and optionally pre-build the vector block-list" but doesn't specify what pk_columns = &[] means — skip pk-hash build, or error? This should be documented explicitly.

Duplicate with_warmer pattern across 5 planners

LsmScanPlanner, LsmPointLookupPlanner, LsmFtsSearchPlanner, LsmVectorSearchPlanner, and LsmScanner all get the identical warmer: Option<Arc<dyn GenerationWarmer>> field + with_warmer builder + propagation boilerplate. This is 5 copies of the same 3 lines. A small PlannerConfig struct or a PlannerState that combines session, flushed_cache, and warmer would cut the surface significantly. This isn't blocking but it's the kind of pattern that compounds as more options are added.


Test Coverage

  • The pk_hashes_cached_reuses_first_build test was correctly updated to use a real dataset instead of a closure — good.
  • There is no test for the warm-on-open backstop (empty pk_columns path).
  • There is no test covering the scenario where a warm fails and the flush still proceeds (best-effort path in warm_generation).
  • No test verifies that the warmer fires exactly once when open_flushed_dataset is called concurrently (idempotency contract).

Minor

  • flush.rs:68-70 comment uses ⇒ (Unicode arrow) instead of => — inconsistent with Rust doc style elsewhere in the file.
  • The bench file change (warmer: None) is correct but worth noting that any future ShardWriterConfig field additions will require touching the bench again since it uses struct literal syntax.

Summary

The core idea is sound. The main blocking issue is the empty pk_columns in the warm-on-open backstop — this will silently misfire. The sync retain_paths on an async trait is a design debt worth fixing before it ships. Everything else is style or test coverage gaps.

hamersaw and others added 5 commits June 17, 2026 14:53
# Conflicts:
#	rust/lance/src/dataset/mem_wal/memtable/flush.rs
#	rust/lance/src/dataset/mem_wal/scanner.rs
#	rust/lance/src/dataset/mem_wal/scanner/block_list.rs
#	rust/lance/src/dataset/mem_wal/scanner/flushed_cache.rs
#	rust/lance/src/dataset/mem_wal/scanner/fts_search.rs
#	rust/lance/src/dataset/mem_wal/scanner/planner.rs
#	rust/lance/src/dataset/mem_wal/scanner/point_lookup.rs
#	rust/lance/src/dataset/mem_wal/scanner/vector_search.rs
Post-lance-format#7067 a warmer touches only path-keyed state (opened dataset, its
secondary indexes, its PK dedup sidecar), so pk_columns was vestigial —
the warm-on-open backstop passed an empty slice. Drop the parameter and
document that warm must be idempotent and cheap when already warm, since
it is fired fire-and-forget from all four planners plus pre-commit flush.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lets an out-of-crate cache evict retired generations' index objects
(e.g. Session::invalidate_index_prefix) during retain without a later
breaking trait change. FlushedMemTableCache's own eviction stays
synchronous (moka invalidate_entries_if).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
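
For readers unfamiliar with the retain step, the sketch below shows the eviction semantics only: drop every cached generation that is not in the live set, along with its index objects by path prefix. It is synchronous for brevity and uses plain `HashMap`s; `GenerationCacheSketch` and its fields are hypothetical, and the real cache uses moka and an async trait method.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical cache: opened datasets keyed by generation path, plus index
/// objects keyed by "<generation path>/<index name>".
#[derive(Default)]
pub struct GenerationCacheSketch {
    pub datasets: HashMap<String, String>,
    pub index_objects: HashMap<String, Vec<u8>>,
}

impl GenerationCacheSketch {
    /// Keeps only generations listed in `live`; returns how many were evicted.
    pub fn retain_paths(&mut self, live: &HashSet<String>) -> usize {
        // Collect retired paths first so the map is not mutated while iterating.
        let retired: Vec<String> = self
            .datasets
            .keys()
            .filter(|p| !live.contains(*p))
            .cloned()
            .collect();
        for path in &retired {
            self.datasets.remove(path);
            // Prefix-based eviction of the retired generation's index objects.
            let prefix = format!("{}/", path);
            self.index_objects
                .retain(|key, _| !key.starts_with(prefix.as_str()));
        }
        retired.len()
    }
}
```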
- test_flusher_commits_when_warm_fails: a warmer that errors pre-commit
  must not gate the flush; the generation still commits and the warm
  fires exactly once.
- test_open_flushed_dataset_fires_warm_on_open: the fire-and-forget
  backstop warms a generation on cold open.

Also replaces two Unicode `⇒` with `=>` in warm doc comments.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
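
The best-effort property that `test_flusher_commits_when_warm_fails` checks can be sketched as follows. `flush_generation` is a hypothetical function with closures standing in for the warmer and the manifest commit; it is not the `MemTableFlusher` code.

```rust
/// Hypothetical flush step: warm first, then commit regardless of the warm
/// result. Returns the warm error (if any) as a warning alongside success.
pub fn flush_generation<W, C>(
    path: &str,
    warm: W,
    commit: C,
) -> Result<Option<String>, String>
where
    W: FnOnce(&str) -> Result<(), String>,
    C: FnOnce(&str) -> Result<(), String>,
{
    // Best-effort: a warm failure is logged and kept as a warning only.
    let warm_warning = warm(path).err();
    if let Some(e) = &warm_warning {
        eprintln!("warm failed for {}: {}", path, e);
    }
    // Commit errors, by contrast, do propagate to the caller.
    commit(path)?;
    Ok(warm_warning)
}
```

The asymmetry is the point: only the commit can fail the flush, so a broken or slow cache cannot block durability.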
The DatasetCache trait doc linked [`super::block_list::open_pk_index`],
but open_pk_index is private and the path does not resolve, failing
rustdoc under -D warnings. Demote it to a plain code reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@hamersaw

Contributor Author

@westonpace appreciate the review! I think I've addressed the Claude review comments, but I'll follow up here.

> On a separate matter, how are you planning on benchmarking things like this? This PR is to improve performance but how do we avoid regressing something like this going forwards?

This is a great question. Unfortunately I'm just doing manual testing right now, but we will need to build something more automated going forward. IMO the two options we have are (1) extending the current mem_wal benches to enable this caching or (2) testing at a higher level where this is already integrated. If it sounds reasonable, I would prefer to take this on as a follow-up.

PR Comments

open_flushed_dataset warm-on-open uses empty pk_columns

We removed pk_columns entirely. This was possible because we updated the active memtable dedup mechanism to use Lance indexes rather than a HashMap of primary keys, so in effect another PR let us bypass this issue.

Warm-on-open fires for all 4 query planners on every dataset open

Updated the docs on GenerationWarmer::warm to be explicit about the idempotency contract. The intent of this trait is to be extremely optimistic about warming datasets, so we try to warm in multiple places: (1) when we flush from the active memtable to the l0 cache, which is the normal happy path, and (2) when we load a dataset, which covers WAL node restarts.

DatasetCache::retain_paths is sync in an async trait

Made everything async here.

GenerationWarmer trait is opaque about its contract

We removed the pk_columns parameter, so I think this is the same fix as above.

Duplicate with_warmer pattern across 5 planners

This duplicates about 3 LoC per planner. The alternative would be to consolidate these options into a single struct rather than giving each parameter its own function. Because it's not clear that we would want the configuration of each of these options bundled with the others, I've left it as is. If this becomes unmanageable in the future, it would make sense to refactor.

@hamersaw hamersaw requested a review from westonpace June 18, 2026 11:46
# Conflicts:
#	rust/lance/src/dataset/mem_wal/scanner.rs
#	rust/lance/src/dataset/mem_wal/scanner/block_list.rs
#	rust/lance/src/dataset/mem_wal/scanner/builder.rs

@westonpace westonpace left a comment

Member


Sorry, I meant for the previous review to be an approval. No need to hold this up while we figure out regression benchmarking.

@hamersaw hamersaw merged commit d0d8dad into lance-format:main Jun 18, 2026
31 checks passed
@hamersaw hamersaw deleted the perf/wal-cache branch June 18, 2026 16:03

Labels

enhancement New feature or request


2 participants