perf(datafusion): push down list_length expression #8600
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | bitwise_not_vortex_buffer_mut[128] | 186.1 ns | 215.3 ns | -13.55% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] | 224.5 µs | 259.6 µs | -13.51% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(100, 100)] | 271.1 µs | 306.3 µs | -11.51% |
| ❌ | Simulation | bitwise_not_vortex_buffer_mut[1024] | 246.4 ns | 275.6 ns | -10.58% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 205.5 µs | 168.8 µs | +21.72% |
| ⚡ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 349.2 µs | 299.4 µs | +16.63% |
| ⚡ | Simulation | rebuild_naive | 111 µs | 98.6 µs | +12.5% |
Comparing mk/datafusion-list-length-pushdown (2083090) with mk/list-length (f4b87bb)
Footnotes
- 4 benchmarks were skipped, so the baseline results were used instead.
Force-pushed from 2759b92 to 916f77f
Signed-off-by: Matt Katz <mhkatz97@gmail.com>
Force-pushed from 916f77f to e7cbe87
Polar Signals Profiling Results
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.041x ➖
datafusion / vortex-file-compressed (1.041x ➖, 0↑ 2↓)
No file size changes detected.
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.973x ➖, 0↑ 0↓)
datafusion / parquet (1.011x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.970x ➖, 0↑ 0↓)
duckdb / parquet (1.007x ➖, 0↑ 0↓)
File Size Changes (3 files changed, -46.3% overall, 1↑ 2↓)
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.981x ➖, 0↑ 0↓)
datafusion / parquet (0.989x ➖, 1↑ 1↓)
datafusion / arrow (1.012x ➖, 1↑ 1↓)
duckdb / vortex-file-compressed (0.994x ➖, 0↑ 0↓)
duckdb / parquet (1.001x ➖, 1↑ 0↓)
File Size Changes (17 files changed, -44.6% overall, 4↑ 13↓)
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.040x ➖, 0↑ 4↓)
datafusion / parquet (1.033x ➖, 0↑ 5↓)
duckdb / vortex-file-compressed (1.036x ➖, 0↑ 4↓)
duckdb / parquet (1.016x ➖, 1↑ 3↓)
File Size Changes (31 files changed, -43.5% overall, 4↑ 27↓)
Benchmarks: Clickbench Sorted on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.966x ➖, 1↑ 1↓)
datafusion / parquet (0.953x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.025x ➖, 1↑ 2↓)
duckdb / parquet (0.991x ➖, 0↑ 0↓)
File Size Changes (201 files changed, -42.6% overall, 56↑ 145↓)
Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (0.909x ➖, 0↑ 0↓)
datafusion / parquet (0.851x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.986x ➖, 0↑ 0↓)
duckdb / parquet (0.948x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (0.939x ➖, 1↑ 0↓)
duckdb / parquet (0.946x ➖, 0↑ 0↓)
File Size Changes (3 files changed, -32.3% overall, 0↑ 3↓)
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.015x ➖, 0↑ 0↓)
datafusion / parquet (1.029x ➖, 0↑ 2↓)
datafusion / arrow (0.998x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.005x ➖, 0↑ 0↓)
duckdb / parquet (0.999x ➖, 0↑ 0↓)
File Size Changes (47 files changed, -44.5% overall, 5↑ 42↓)
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.985x ➖, 3↑ 1↓)
datafusion / parquet (0.995x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.995x ➖, 2↑ 6↓)
duckdb / parquet (0.988x ➖, 0↑ 0↓)
File Size Changes (201 files changed, -39.1% overall, 54↑ 147↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (0.952x ➖, 1↑ 2↓)
datafusion / parquet (0.886x ➖, 5↑ 3↓)
duckdb / vortex-file-compressed (0.937x ➖, 0↑ 0↓)
duckdb / parquet (0.952x ➖, 0↑ 0↓)
Pushes DataFusion's `array_length(expr)` and `array_length(expr, 1)` into the Vortex scan as `vortex.list.length`, computed from list offsets without materializing the element values. This is the DataFusion analogue of the DuckDB `array_length` pushdown.

Semantics
The rewrite is exact: both functions return `UInt64`, yield `0` for a non-null empty list, and yield `null` for a null list.
The two-argument `array_length(expr, dim)` form where `dim > 1` has multidimensional semantics that `list_length` does not model, so it is explicitly rejected.

Changes
- `datafusion-functions-nested` dependency (home of the `ArrayLength` UDF).

Stacked on #8495
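
For reviewers who want the semantics spelled out, here is a minimal standalone sketch of computing list lengths from offsets alone. It is not the Vortex or DataFusion implementation: `list_lengths` and `is_pushable_dim` are hypothetical names, and the layout (an `i64` offsets buffer with `n + 1` entries plus a per-row boolean validity slice) is an assumption made for illustration.

```rust
/// Hypothetical helper (not a Vortex API): per-row list lengths derived
/// purely from the offsets buffer, never touching the element values.
///
/// Assumed layout: `offsets` holds `n + 1` non-decreasing entries for `n`
/// lists, and `validity[i] == false` marks row `i` as a null list.
fn list_lengths(offsets: &[i64], validity: &[bool]) -> Vec<Option<u64>> {
    assert_eq!(
        offsets.len(),
        validity.len() + 1,
        "offsets must have one more entry than there are rows"
    );
    offsets
        .windows(2)
        .zip(validity)
        // A null list yields null; a non-null empty list yields Some(0).
        .map(|(w, &valid)| valid.then(|| (w[1] - w[0]) as u64))
        .collect()
}

/// Hypothetical gate mirroring the description: only `array_length(expr)`
/// (no dimension) and `array_length(expr, 1)` are eligible for pushdown.
fn is_pushable_dim(dim: Option<i64>) -> bool {
    matches!(dim, None | Some(1))
}
```

With offsets `[0, 3, 3, 5, 5]` and validity `[true, true, true, false]`, `list_lengths` returns `[Some(3), Some(0), Some(2), None]`: the second row is a non-null empty list and the fourth is a null list. `is_pushable_dim(Some(2))` returns `false`, matching the rejected `dim > 1` case.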