Make Shared transparent to parent kernels#8576
Conversation
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Polar Signals Profiling ResultsLatest Run
Powered by Polar Signals Cloud |
|
I think we caused some weird race in github, but benchmarks seem to be running https://github.com/vortex-data/vortex/actions/runs/28109422384 |
Benchmarks: PolarSignals Profiling (base)Vortex (geomean): 0.963x ➖ How to read Verdict and Engines
datafusion / vortex-file-compressed (0.963x ➖, 1↑ 0↓)
No file size changes detected. |
Benchmarks: TPC-H SF=1 on NVME (base)Verdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.882x ✅, 16↑ 0↓)
datafusion / parquet (0.906x ➖, 10↑ 0↓)
datafusion / arrow (0.844x ✅, 19↑ 0↓)
duckdb / vortex-file-compressed (0.891x ✅, 16↑ 0↓)
duckdb / parquet (0.935x ➖, 3↑ 0↓)
File Size Changes (17 files changed, -44.4% overall, 5↑ 12↓)
Totals:
|
Benchmarks: FineWeb NVMe (base)Verdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.104x ❌, 0↑ 3↓)
datafusion / parquet (1.084x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.112x ❌, 0↑ 7↓)
duckdb / parquet (1.074x ➖, 0↑ 1↓)
File Size Changes (3 files changed, -46.3% overall, 0↑ 3↓)
Totals:
|
Benchmarks: TPC-DS SF=1 on NVME (base)Verdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.947x ➖, 17↑ 3↓)
datafusion / parquet (0.912x ➖, 31↑ 0↓)
duckdb / vortex-file-compressed (0.926x ➖, 29↑ 0↓)
duckdb / parquet (0.951x ➖, 2↑ 1↓)
File Size Changes (31 files changed, -43.5% overall, 2↑ 29↓)
Totals:
|
Benchmarks: FineWeb S3 (base)Verdict: No clear signal (environment too noisy confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.592x ✅, 6↑ 0↓)
datafusion / parquet (0.861x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.786x ➖, 3↑ 0↓)
duckdb / parquet (0.941x ➖, 0↑ 0↓)
|
Benchmarks: Statistical and Population Genetics (base)Verdict: No clear signal (low confidence) How to read Verdict and Engines
duckdb / vortex-file-compressed (1.003x ➖, 0↑ 0↓)
duckdb / parquet (0.997x ➖, 0↑ 0↓)
File Size Changes (3 files changed, -32.3% overall, 0↑ 3↓)
Totals:
|
Benchmarks: Clickbench Sorted on NVME (base)Verdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.064x ➖, 1↑ 3↓)
datafusion / parquet (1.001x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.963x ➖, 2↑ 0↓)
duckdb / parquet (0.993x ➖, 0↑ 0↓)
File Size Changes (201 files changed, -42.6% overall, 40↑ 161↓)
Totals:
|
Benchmarks: TPC-H SF=10 on NVME (base)Verdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.020x ➖, 0↑ 2↓)
datafusion / parquet (1.007x ➖, 0↑ 0↓)
datafusion / arrow (1.012x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.003x ➖, 0↑ 0↓)
duckdb / parquet (1.002x ➖, 0↑ 0↓)
File Size Changes (47 files changed, -44.5% overall, 14↑ 33↓)
Totals:
|
Benchmarks: Clickbench on NVME (base)Verdict: No clear signal (low confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (0.986x ➖, 1↑ 4↓)
datafusion / parquet (0.983x ➖, 1↑ 2↓)
duckdb / vortex-file-compressed (0.969x ➖, 4↑ 3↓)
duckdb / parquet (0.967x ➖, 1↑ 0↓)
File Size Changes (201 files changed, -39.1% overall, 54↑ 147↓)
Totals:
|
Benchmarks: TPC-H SF=1 on S3 (base)Verdict: No clear signal (environment too noisy confidence) How to read Verdict and Engines
datafusion / vortex-file-compressed (1.086x ➖, 1↑ 5↓)
datafusion / parquet (1.352x ❌, 0↑ 12↓)
duckdb / vortex-file-compressed (0.996x ➖, 0↑ 0↓)
duckdb / parquet (0.976x ➖, 3↑ 2↓)
|
Merging this PR will degrade performance by 10.66%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | bitwise_not_vortex_buffer_mut[128] |
244.4 ns | 273.6 ns | -10.66% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ngates/onpair-split-1-shared-parent-kernels (9b988d7) with develop (2a19323)
Footnotes
-
4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
| ) -> VortexResult<Option<ArrayRef>> { | ||
| if let Some(result) = execute_parent_for_exact_child(parent, child, slot_idx, kernels, ctx)? { | ||
| return Ok(Some(result)); | ||
| } |
| let mut current = child.clone(); | ||
| while let Some(source) = current | ||
| .as_opt::<Shared>() | ||
| .map(|shared| shared.current_array_ref().clone()) | ||
| { | ||
| if let Some(result) = | ||
| execute_parent_for_exact_child(parent, &source, slot_idx, kernels, ctx)? | ||
| { | ||
| return Ok(Some(result)); | ||
| } | ||
| current = source; | ||
| } | ||
|
|
||
| Ok(None) |
There was a problem hiding this comment.
Is there a way to register a kernel for shared for all parents? Then we can avoid this
Rational for this change
SharedArrayis a cache wrapper, but parent-kernel execution currently treats it as an ordinary child encoding. That can force canonicalization before the wrapped array gets a chance to handle the parent operation itself. This makesSharedArraytransparent for parent reductions and parent execution, so existing encoding-specific kernels still fire through shared wrappers.No tracked issue.
What changes are included in this PR?
SharedArraynow delegates parent reduction to its current wrapped array. Parent execution also checks the wrapped array before falling back to ordinary execution, avoiding unnecessary cache population and canonicalization.What APIs are changed? Are there any user-facing changes?
No public API changes. Query results are unchanged; this only changes internal execution dispatch.