[FEA] Out-of-core K-means improvements

We have received the following feedback from a user:
1. cuVS 26.6.0 host-resident multi-GPU streaming is a meaningful improvement because it allows the dataset to remain in host RAM and stream to GPUs, instead of requiring the full dataset to fit in GPU memory.

2. The RAFT comms path performed close to the cuML MNMG baseline in a benchmark setup: 16.29s per iteration versus 14.64s per iteration for cuML MNMG, about +11.3%.

3. The SNMG path was much slower in the user's test: about 256s per iteration on 8 GPUs, versus an estimated 30s per iteration for RAFT comms on 8 GPUs (roughly 8.5x slower).

4. The user suspects the SNMG gap is mainly from lack of compute-transfer overlap, OMP thread scheduling overhead, batch-level barriers, and memory management overhead.

5. Their profiling also suggests the shared CUTLASS fused distance kernel is significantly slower than their cuBLAS GEMM baseline for this large-K workload.
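
For reference, the GEMM-based formulation mentioned in item 5 relies on the expansion ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2. The ||x||^2 term is constant per sample, so the nearest centroid can be found from one matrix multiply plus a row-wise argmin. The NumPy sketch below only illustrates that math on the CPU; it is not the cuVS kernel or the user's cuBLAS code, and the function names are hypothetical.

```python
import numpy as np


def assign_gemm_argmin(X, C):
    """Nearest-centroid labels from one GEMM followed by a row-wise argmin."""
    # Squared norm of every centroid, shape (k,).
    c_sq = np.einsum("kd,kd->k", C, C)
    # ||x||^2 is the same for every centroid in a row, so it cannot change
    # the argmin and is left out of the score.
    scores = c_sq[None, :] - 2.0 * (X @ C.T)
    return np.argmin(scores, axis=1)


def assign_bruteforce(X, C):
    """Reference: explicit pairwise squared distances (O(n*k*d) memory)."""
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d, axis=1)


rng = np.random.default_rng(0)
X = rng.standard_normal((512, 16))   # samples
C = rng.standard_normal((32, 16))    # centroids
labels = assign_gemm_argmin(X, C)
```

On a GPU the `X @ C.T` product would map to a GEMM call and the argmin to a separate reduction, which is the trade-off against a fused kernel: an extra pass over the n x k score matrix in exchange for a highly tuned matrix multiply.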

Suggestions from the user:

- [ ] Add pinned (page-locked) memory support for streaming, or use internally allocated pinned staging buffers, to avoid the bounce copy through pageable memory.
- [ ] Add double buffering or pipelined H2D transfer in SNMG so the next batch can transfer while the current batch is computing.
- [ ] Expose a native Python API path for RAFT comms resources, instead of requiring a duck-typing bridge around `pylibraft.common.Handle`.
- [ ] Consider a cuBLAS GEMM plus argmin path for large-K distance computation, or otherwise optimize the current CUTLASS fused distance kernel.
- [ ] Improve API usability around return-type consistency, adaptive `streaming_batch_size`, and fit progress reporting.
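
To make the double-buffering suggestion concrete, here is a minimal CPU-only sketch of the control flow: while batch i is being processed, batch i+1 is already being staged by a background worker. Everything in it is an assumption for illustration. `stage_batch` stands in for an asynchronous H2D copy into a pinned staging buffer, `nearest` stands in for the distance kernel, and none of the names correspond to cuVS or RAFT APIs.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def stage_batch(data, start, stop):
    # Stand-in for an async host-to-device copy of one batch.
    return np.array(data[start:stop], copy=True)


def nearest(batch, centroids):
    # Stand-in for the distance + argmin kernel.
    d = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d, axis=1)


def pipelined_pass(data, centroids, batch_size):
    """One k-means accumulation pass with the next batch prefetched."""
    k, dim = centroids.shape
    sums = np.zeros((k, dim))
    counts = np.zeros(k, dtype=np.int64)
    bounds = [(s, min(s + batch_size, len(data)))
              for s in range(0, len(data), batch_size)]
    if not bounds:
        return sums, counts
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(stage_batch, data, *bounds[0])
        for i in range(len(bounds)):
            batch = pending.result()          # wait for the staged batch
            if i + 1 < len(bounds):           # start staging the next one
                pending = pool.submit(stage_batch, data, *bounds[i + 1])
            labels = nearest(batch, centroids)
            np.add.at(sums, labels, batch)    # per-centroid running sums
            counts += np.bincount(labels, minlength=k)
    return sums, counts


rng = np.random.default_rng(1)
data = rng.standard_normal((1000, 8))
centroids = rng.standard_normal((5, 8))
sums, counts = pipelined_pass(data, centroids, batch_size=128)
```

In a GPU implementation the same structure would typically use two device buffers and separate copy and compute streams, with pinned host memory so the copy can actually run asynchronously. The only synchronization point per batch is the wait on the staged buffer, which is what removes the batch-level barrier between transfer and compute.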
