Improve clickbench-sorted to better reflect sorted data #8584
Polar Signals Profiling Results
Benchmarks: PolarSignals Profiling
Vortex (geomean): 0.972x ➖
datafusion / vortex-file-compressed (0.972x ➖, 0↑ 0↓)
No file size changes detected.
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.986x ➖, 0↑ 0↓)
datafusion / parquet (0.992x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.992x ➖, 0↑ 0↓)
duckdb / parquet (0.984x ➖, 0↑ 0↓)
File Size Changes (3 files changed, -46.3% overall, 0↑ 3↓)
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.970x ➖, 0↑ 0↓)
datafusion / parquet (0.979x ➖, 2↑ 0↓)
datafusion / arrow (0.960x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.036x ➖, 0↑ 2↓)
duckdb / parquet (0.968x ➖, 1↑ 0↓)
File Size Changes (17 files changed, -44.4% overall, 3↑ 14↓)
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.988x ➖, 1↑ 3↓)
datafusion / parquet (0.965x ➖, 6↑ 0↓)
duckdb / vortex-file-compressed (0.965x ➖, 20↑ 10↓)
duckdb / parquet (0.983x ➖, 2↑ 1↓)
File Size Changes (31 files changed, -43.4% overall, 1↑ 30↓)
Merging this PR will improve performance by 11.11%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 273.6 ns | 244.4 ns | +11.93% |
| ⚡ | Simulation | encode_varbin[(1000, 2)] | 157.6 µs | 142.9 µs | +10.29% |
Comparing adamg/improve-codspeed-sorted (b3bb1f2) with develop (51752c8) [2]
Footnotes
1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
2. No successful run was found on develop (aeae579) during the generation of this report, so 51752c8 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (0.994x ➖, 0↑ 0↓)
duckdb / parquet (0.997x ➖, 0↑ 0↓)
File Size Changes (3 files changed, -32.3% overall, 0↑ 3↓)
Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.047x ➖, 1↑ 1↓)
datafusion / parquet (1.116x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.081x ➖, 0↑ 0↓)
duckdb / parquet (1.037x ➖, 0↑ 0↓)
Benchmarks: Clickbench Sorted on NVME
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.052x ➖, 1↑ 2↓)
datafusion / parquet (1.009x ➖, 1↑ 2↓)
duckdb / vortex-file-compressed (0.990x ➖, 1↑ 1↓)
duckdb / parquet (1.073x ➖, 0↑ 3↓)
File Size Changes (301 files changed, -42.6% overall, 100↑ 201↓)
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.175x ❌, 0↑ 19↓)
datafusion / parquet (1.145x ❌, 0↑ 19↓)
datafusion / arrow (1.180x ❌, 0↑ 21↓)
duckdb / vortex-file-compressed (1.168x ❌, 0↑ 20↓)
duckdb / parquet (1.087x ➖, 0↑ 10↓)
File Size Changes (47 files changed, -44.4% overall, 12↑ 35↓)
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.067x ➖, 2↑ 18↓)
datafusion / parquet (1.114x ❌, 0↑ 25↓)
duckdb / vortex-file-compressed (1.083x ➖, 4↑ 23↓)
duckdb / parquet (1.050x ➖, 0↑ 11↓)
File Size Changes (201 files changed, -39.1% overall, 46↑ 155↓)
Rationale for this change
The current setup is too sorted: the files' names sort in the same order as their data, which makes any sort pushdown inconsequential.
What changes are included in this PR?
The files stay globally sorted (now only by EventTime), but their names are shuffled deterministically, so the lexicographic order of the names no longer matches the order of the data.
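As a rough illustration of the idea (not the code in this PR), a deterministic name shuffle could look like the sketch below. The `hits_NNN.parquet` naming scheme, the `shuffled_names` helper, and the seed value are all assumptions made for the example; the only point is that a fixed seed gives the same name assignment on every run while breaking the link between name order and data order.

```python
import random


def shuffled_names(num_files: int, seed: int = 42) -> list[str]:
    """Return one file name per sorted chunk, listed in data order.

    Chunk i (in EventTime order) is written under the name at index i.
    Hypothetical helper: the name pattern and seed are illustrative only.
    """
    indices = list(range(num_files))
    # A seeded generator makes the permutation reproducible across runs.
    random.Random(seed).shuffle(indices)
    return [f"hits_{idx:03d}.parquet" for idx in indices]


names = shuffled_names(100)

# Same seed, same assignment: the shuffle is consistent between runs.
assert names == shuffled_names(100)

# Listing the names lexicographically no longer walks the data in EventTime order.
assert sorted(names) != names
```

A reader that opens files in name order then sees chunks out of order, so a sort pushdown has real work to do.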
What APIs are changed? Are there any user-facing changes?
Just benchmarks.