feat: add CollectLeft partition mode to SortMergeJoinExec #23111

Draft
Dandandan wants to merge 2 commits into apache:main from Dandandan:feat/sort-merge-join-collect-left

@Dandandan (Contributor) commented Jun 23, 2026

Which issue does this PR close?

Rationale for this change

SortMergeJoinExec currently always runs in a symmetric, hash-partitioned mode:
both inputs are hash-partitioned on the join keys and sorted, and partition i
of the left is merge-joined with partition i of the right.

When one side is small, hash-repartitioning the large side is wasteful. For
hash joins this is already handled by PartitionMode::CollectLeft (collect the
small build side once, broadcast it, don't repartition the probe side). This PR
brings the same idea to sort-merge joins.

What changes are included in this PR?

Add PartitionMode::CollectLeft support to SortMergeJoinExec:

  • The left side is collected into a single, fully-sorted run (shared once
    across all right partitions via OnceAsync) and the right side is left
    un-repartitioned — each right partition merge-joins the full collected left,
    producing one output partition per right partition.
  • required_input_distribution becomes [SinglePartition, Unspecified] (the
    right keeps whatever partitioning it has; both sides still require their sort
    ordering), and output partitioning follows the right side.
  • Supported only for join types whose output is determined per right partition:
    Inner, Right, RightSemi, RightAnti, RightMark. Left-side joins
    (Left/LeftSemi/LeftAnti/LeftMark/Full) would require tracking
    left-row matches across all right partitions and are rejected by with_mode.
  • The JoinSelection physical-optimizer rule switches a Partitioned SMJ to
    CollectLeft when the join type is supported and the left side is estimated
    to be small enough, reusing the existing
    hash_join_single_partition_threshold / hash_join_single_partition_threshold_rows
    thresholds. EnsureRequirements then collapses the left to one sorted run
    (SortExec / SortPreservingMergeExec) and leaves the right un-hash-partitioned.
  • The mode is shown in EXPLAIN (only when CollectLeft, to avoid churning
    existing Partitioned plans) and is preserved across with_new_children and
    projection pushdown; partition_statistics is made mode-aware (mirroring
    HashJoinExec).
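
The join-type restriction and the size-based selection described in the list above can be sketched roughly as follows. This is only an illustration, not the PR's code: the enums are local stand-ins, `supports_collect_left`, `choose_mode`, and `LeftEstimate` are hypothetical names, and the rule that an unknown estimate stays `Partitioned` is an assumption.

```rust
/// Local stand-in for the join types named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinType {
    Inner, Left, Right, Full,
    LeftSemi, RightSemi, LeftAnti, RightAnti, LeftMark, RightMark,
}

/// Local stand-in for the two modes discussed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionMode {
    Partitioned,
    CollectLeft,
}

/// Join types whose output can be decided independently per right partition.
pub fn supports_collect_left(join_type: JoinType) -> bool {
    matches!(
        join_type,
        JoinType::Inner
            | JoinType::Right
            | JoinType::RightSemi
            | JoinType::RightAnti
            | JoinType::RightMark
    )
}

/// Hypothetical size estimate for the left input; `None` means unknown.
pub struct LeftEstimate {
    pub bytes: Option<usize>,
    pub rows: Option<usize>,
}

/// Pick a mode: CollectLeft only for supported join types with a small left side.
/// Assumption: prefer the byte estimate, fall back to rows, and stay
/// Partitioned when nothing is known.
pub fn choose_mode(
    join_type: JoinType,
    left: &LeftEstimate,
    max_bytes: usize,
    max_rows: usize,
) -> PartitionMode {
    if !supports_collect_left(join_type) {
        return PartitionMode::Partitioned;
    }
    let small = match (left.bytes, left.rows) {
        (Some(bytes), _) => bytes < max_bytes,
        (None, Some(rows)) => rows < max_rows,
        (None, None) => false,
    };
    if small {
        PartitionMode::CollectLeft
    } else {
        PartitionMode::Partitioned
    }
}
```

With illustrative thresholds of 1 MiB and 131072 rows, an `Inner` join with a 1000-byte left estimate would pick `CollectLeft`, while a `Left` join with the same estimate would stay `Partitioned`.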

The existing k-way merge / streamed-vs-buffered execution is reused unchanged:
the collected left is replayed as the left input to each right partition's join.
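
To make the replay idea concrete, here is a minimal sketch of an inner merge join in which one sorted left run is joined against each right partition independently. It is a toy model over in-memory slices, not the operator's streaming implementation; `Row`, `merge_join_inner`, and `collect_left_join` are hypothetical names, and both inputs are assumed to be sorted by key already.

```rust
/// (join key, payload) rows; both inputs are assumed sorted by key.
pub type Row = (i32, &'static str);
pub type Joined = (i32, &'static str, &'static str);

/// Inner merge join of two key-sorted slices.
pub fn merge_join_inner(left: &[Row], right: &[Row]) -> Vec<Joined> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        let (lk, rk) = (left[i].0, right[j].0);
        if lk < rk {
            i += 1;
        } else if lk > rk {
            j += 1;
        } else {
            // Find the run of equal keys on each side, then emit their cross product.
            let i_end = i + left[i..].iter().take_while(|r| r.0 == lk).count();
            let j_end = j + right[j..].iter().take_while(|r| r.0 == lk).count();
            for l in &left[i..i_end] {
                for r in &right[j..j_end] {
                    out.push((lk, l.1, r.1));
                }
            }
            i = i_end;
            j = j_end;
        }
    }
    out
}

/// CollectLeft sketch: the single sorted left run is shared, and each right
/// partition is joined against it, giving one output partition per right partition.
pub fn collect_left_join(left: &[Row], right_partitions: &[Vec<Row>]) -> Vec<Vec<Joined>> {
    right_partitions
        .iter()
        .map(|partition| merge_join_inner(left, partition))
        .collect()
}
```

The same toy model hints at why left-side join types are excluded: a left row can be unmatched in one right partition and matched in another, so no single partition can decide on its own whether to emit it as unmatched.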

Are these changes tested?

Yes:

  • Unit tests (physical-plan): CollectLeft output equals the default
    Partitioned output for all supported join types over a multi-partition
    right; unsupported join types are rejected; mode-aware partition_statistics
    over a multi-partition right.
  • Integration tests (core): JoinSelection selects CollectLeft for a small
    left, and stays Partitioned for a big left or an unsupported join type.
  • sqllogictest: sort_merge_join.slt / joins.slt exercise the full pipeline
    (JoinSelection → EnsureRequirements → SanityCheckPlan → execution) with
    prefer_hash_join = false; results are unchanged, only EXPLAIN plans change
    (no hash repartition on the join key; left collapsed to one sorted run). A new
    regression in sort_merge_join.slt covers a narrowing projection pushed into a
    CollectLeft join over a multi-partition right.

Are there any user-facing changes?

EXPLAIN output now shows mode=CollectLeft on sort-merge joins that use the
new mode, and such plans avoid hash-repartitioning the right input. No public
API breakage (the mode defaults to Partitioned; with_mode is additive). No
new configuration options.

🤖 Generated with Claude Code

Add `PartitionMode::CollectLeft` for `SortMergeJoinExec`. The left side is
collected into a single sorted run shared across all right partitions, and the
right side is left un-repartitioned (one output partition per right partition).
This avoids hash-repartitioning the right side and is beneficial when the left
is small and the right is large (analogous to a broadcast hash join, but for
sort-merge).

Supported only for join types whose output is determined per right partition:
Inner, Right, RightSemi, RightAnti, RightMark. Left-side joins would require
tracking left-row matches across all right partitions and are not supported.

The `JoinSelection` optimizer rule now switches a Partitioned SMJ to CollectLeft
when the join type is supported and the left side is estimated small enough
(reusing `hash_join_single_partition_threshold[_rows]`).

- exec.rs: `mode` + shared `left_fut` (OnceAsync); `with_mode()` validation;
  collect-once + replay-per-partition in `execute()`; `required_input_distribution`
  = [SinglePartition, Unspecified]; asymmetric output partitioning; mode-aware
  `partition_statistics`; mode shown in EXPLAIN; mode preserved through
  `with_new_children` and projection pushdown.
- join_selection.rs: `try_collect_left_sort_merge_join` branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Dandandan

Contributor Author

run benchmarks

env:
PREFER_HASH_JOIN: false

@github-actions (bot) added the labels optimizer (Optimizer rules), core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), and physical-plan (Changes to the physical-plan crate) on Jun 23, 2026
@Dandandan Dandandan requested a review from mbutrovich June 23, 2026 09:05
@Dandandan Dandandan marked this pull request as ready for review June 23, 2026 09:05
@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4777525377-620-c2l4b 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/sort-merge-join-collect-left (f7412c8) to 3f4bcf1 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4777525377-621-j7llh 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

Comparing feat/sort-merge-join-collect-left (f7412c8) to 3f4bcf1 (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4777525377-622-l8qzz 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

Comparing feat/sort-merge-join-collect-left (f7412c8) to 3f4bcf1 (merge-base) diff using: tpch
Results will be posted here when complete


@Dandandan Dandandan requested a review from neilconway June 23, 2026 09:10
Resolve conflict in sort_merge_join/exec.rs: upstream replaced
`partition_statistics(Option<usize>)` with `statistics_with_args(&StatisticsArgs)`.
Re-apply the CollectLeft mode-aware logic in the new API (mirroring HashJoinExec):
CollectLeft uses the full left stats + per-partition right stats; Partitioned
uses per-partition stats for both. Update the SMJ statistics regression test to
call `statistics_with_args`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
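
The mode-aware statistics rule in the commit message above (full left stats plus per-partition right stats for CollectLeft, per-partition stats on both sides for Partitioned) can be pictured with a small sketch. It models statistics as plain row counts; `InputStats` and `join_input_rows` are hypothetical names and do not reflect the `statistics_with_args` signature.

```rust
/// Local stand-in for the two modes discussed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionMode {
    Partitioned,
    CollectLeft,
}

/// Hypothetical stand-in for input statistics: estimated rows per partition.
pub struct InputStats {
    pub rows_per_partition: Vec<usize>,
}

impl InputStats {
    /// Estimate for the whole input.
    pub fn total(&self) -> usize {
        self.rows_per_partition.iter().sum()
    }

    /// Estimate for one partition.
    pub fn partition(&self, index: usize) -> usize {
        self.rows_per_partition[index]
    }
}

/// Returns the (left, right) row estimates that would feed the join estimate
/// for output partition `p`.
pub fn join_input_rows(
    mode: PartitionMode,
    left: &InputStats,
    right: &InputStats,
    p: usize,
) -> (usize, usize) {
    match mode {
        // Every right partition sees the whole collected left side.
        PartitionMode::CollectLeft => (left.total(), right.partition(p)),
        // Partition p of the left is joined with partition p of the right.
        PartitionMode::Partitioned => (left.partition(p), right.partition(p)),
    }
}
```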
@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_sort-merge-join-collect-left
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃ feat_sort-merge-join-collect-left ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 37.99 / 39.48 ±2.64 / 44.75 ms │    38.04 / 40.45 ±3.69 / 47.81 ms │ no change │
│ QQuery 2  │ 18.57 / 18.92 ±0.30 / 19.43 ms │    18.68 / 19.31 ±0.54 / 20.32 ms │ no change │
│ QQuery 3  │ 30.82 / 32.04 ±1.02 / 33.55 ms │    33.31 / 33.43 ±0.09 / 33.54 ms │ no change │
│ QQuery 4  │ 17.15 / 17.65 ±0.65 / 18.91 ms │    17.29 / 17.88 ±0.54 / 18.59 ms │ no change │
│ QQuery 5  │ 37.95 / 39.96 ±1.53 / 41.63 ms │    37.55 / 39.54 ±1.52 / 41.31 ms │ no change │
│ QQuery 6  │ 15.96 / 16.22 ±0.15 / 16.37 ms │    16.09 / 16.39 ±0.40 / 17.17 ms │ no change │
│ QQuery 7  │ 44.45 / 45.36 ±1.11 / 47.51 ms │    45.09 / 46.84 ±1.84 / 50.16 ms │ no change │
│ QQuery 8  │ 42.53 / 42.90 ±0.30 / 43.44 ms │    42.47 / 43.27 ±0.92 / 45.06 ms │ no change │
│ QQuery 9  │ 49.04 / 50.13 ±0.70 / 51.02 ms │    49.15 / 50.06 ±0.77 / 51.29 ms │ no change │
│ QQuery 10 │ 41.92 / 42.34 ±0.29 / 42.79 ms │    42.09 / 42.59 ±0.69 / 43.96 ms │ no change │
│ QQuery 11 │ 13.31 / 13.41 ±0.11 / 13.61 ms │    13.10 / 13.31 ±0.23 / 13.72 ms │ no change │
│ QQuery 12 │ 23.87 / 24.22 ±0.24 / 24.62 ms │    23.64 / 24.39 ±0.47 / 25.00 ms │ no change │
│ QQuery 13 │ 33.08 / 35.72 ±2.47 / 39.36 ms │    34.21 / 36.15 ±1.54 / 38.31 ms │ no change │
│ QQuery 14 │ 23.27 / 24.09 ±0.65 / 25.24 ms │    23.48 / 23.64 ±0.18 / 23.94 ms │ no change │
│ QQuery 15 │ 30.72 / 31.67 ±1.07 / 33.07 ms │    31.14 / 31.46 ±0.26 / 31.84 ms │ no change │
│ QQuery 16 │ 13.86 / 14.04 ±0.17 / 14.29 ms │    14.02 / 14.24 ±0.15 / 14.41 ms │ no change │
│ QQuery 17 │ 73.99 / 74.86 ±0.84 / 76.40 ms │    74.43 / 75.13 ±0.55 / 75.61 ms │ no change │
│ QQuery 18 │ 59.15 / 61.49 ±1.74 / 64.34 ms │    59.29 / 60.32 ±0.62 / 61.02 ms │ no change │
│ QQuery 19 │ 32.66 / 33.56 ±1.00 / 35.41 ms │    32.83 / 33.18 ±0.39 / 33.92 ms │ no change │
│ QQuery 20 │ 31.46 / 31.95 ±0.38 / 32.41 ms │    32.15 / 32.67 ±0.39 / 33.16 ms │ no change │
│ QQuery 21 │ 54.80 / 57.28 ±1.99 / 60.18 ms │    55.96 / 56.30 ±0.27 / 56.58 ms │ no change │
│ QQuery 22 │ 13.99 / 14.18 ±0.14 / 14.41 ms │    13.97 / 14.14 ±0.13 / 14.35 ms │ no change │
└───────────┴────────────────────────────────┴───────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                                ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 761.49ms │
│ Total Time (feat_sort-merge-join-collect-left)   │ 764.70ms │
│ Average Time (HEAD)                              │  34.61ms │
│ Average Time (feat_sort-merge-join-collect-left) │  34.76ms │
│ Queries Faster                                   │        0 │
│ Queries Slower                                   │        0 │
│ Queries with No Change                           │       22 │
│ Queries with Failure                             │        0 │
└──────────────────────────────────────────────────┴──────────┘

Resource Usage

tpch

| Metric      | base (merge-base) | branch    |
|-------------|-------------------|-----------|
| Wall time   | 5.0s              | 5.0s      |
| Peak memory | 1.2 GiB           | 1.2 GiB   |
| Avg memory  | 529.6 MiB         | 521.7 MiB |
| CPU user    | 22.1s             | 22.4s     |
| CPU sys     | 1.6s              | 1.5s      |
| Peak spill  | 0 B               | 0 B       |

@Dandandan

Contributor Author

run benchmark tpch10

env:
PREFER_HASH_JOIN: false

@adriangbot

Hi @Dandandan, your benchmark configuration could not be parsed (#23111 (comment)).

Error: invalid configuration: unknown field PREFER_HASH_JOIN, expected one of env, baseline, changed at line 3 column 1

Usage:

run benchmark <name>           # run specific benchmark(s)
run benchmarks                 # run default suite
run benchmarks <name1> <name2> # run specific benchmarks

Any benchmark name is accepted: bench.sh suite names (e.g. tpch, clickbench_partitioned, wide_schema) and Criterion bench targets (e.g. sql_planner) are resolved automatically. A name that matches neither fails on the runner.

Per-side configuration (run benchmark tpch followed by):

env:
    # shared env is inherited by BOTH the build and the run, so build
    # flags go here. Builds default to no debuginfo for speed; opt back
    # in for hung-job gdb dumps and cap jobs to stay within memory:
    CARGO_PROFILE_RELEASE_DEBUG: "1"
    CARGO_BUILD_JOBS: "1"
baseline:
    ref: v45.0.0
    env:
        # per-side env only reaches the benchmark run, not the build
        DATAFUSION_RUNTIME_MEMORY_LIMIT: 1G
changed:
    ref: v46.0.0
    env:
        DATAFUSION_RUNTIME_MEMORY_LIMIT: 2G

@Dandandan

Contributor Author

run benchmark tpch10

env:
    PREFER_HASH_JOIN: false

@Dandandan

Contributor Author

run benchmarks

env:
    PREFER_HASH_JOIN: false

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4777704671-624-gv68c 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

Comparing feat/sort-merge-join-collect-left (93f94c1) to 46b508e (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and feat_sort-merge-join-collect-left
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃     feat_sort-merge-join-collect-left ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.19 / 3.37 ±4.28 / 11.94 ms │          1.21 / 3.38 ±4.24 / 11.87 ms │     no change │
│ QQuery 1  │        12.73 / 12.90 ±0.10 / 13.00 ms │        12.95 / 13.15 ±0.15 / 13.31 ms │     no change │
│ QQuery 2  │        35.56 / 36.06 ±0.38 / 36.61 ms │        35.97 / 36.20 ±0.24 / 36.66 ms │     no change │
│ QQuery 3  │        29.88 / 30.62 ±0.74 / 31.90 ms │        30.18 / 30.32 ±0.12 / 30.48 ms │     no change │
│ QQuery 4  │     221.36 / 226.11 ±3.10 / 230.51 ms │     222.53 / 227.27 ±3.16 / 231.60 ms │     no change │
│ QQuery 5  │     268.61 / 272.34 ±2.10 / 274.83 ms │     270.35 / 274.23 ±2.34 / 277.50 ms │     no change │
│ QQuery 6  │           1.21 / 1.37 ±0.24 / 1.84 ms │           1.23 / 1.37 ±0.22 / 1.81 ms │     no change │
│ QQuery 7  │        13.74 / 13.86 ±0.12 / 14.09 ms │        14.11 / 14.19 ±0.07 / 14.31 ms │     no change │
│ QQuery 8  │     322.23 / 324.84 ±2.49 / 328.97 ms │     320.83 / 322.78 ±1.80 / 325.39 ms │     no change │
│ QQuery 9  │     453.77 / 458.43 ±3.19 / 463.79 ms │     454.90 / 463.14 ±6.11 / 469.37 ms │     no change │
│ QQuery 10 │        70.11 / 70.96 ±0.90 / 72.70 ms │        68.80 / 71.37 ±3.84 / 79.01 ms │     no change │
│ QQuery 11 │        81.41 / 82.45 ±0.97 / 84.19 ms │        80.68 / 81.65 ±0.63 / 82.42 ms │     no change │
│ QQuery 12 │     266.27 / 271.18 ±3.10 / 274.74 ms │     264.60 / 272.70 ±6.96 / 284.81 ms │     no change │
│ QQuery 13 │     366.90 / 377.14 ±8.90 / 390.88 ms │     367.28 / 375.90 ±8.95 / 390.81 ms │     no change │
│ QQuery 14 │     279.71 / 284.51 ±3.31 / 288.26 ms │    280.08 / 289.79 ±12.61 / 314.41 ms │     no change │
│ QQuery 15 │     270.55 / 278.07 ±9.12 / 295.37 ms │     272.53 / 279.79 ±5.15 / 287.26 ms │     no change │
│ QQuery 16 │     611.04 / 625.05 ±9.88 / 635.20 ms │     612.04 / 621.54 ±7.09 / 632.82 ms │     no change │
│ QQuery 17 │     615.94 / 619.48 ±2.27 / 621.86 ms │     617.22 / 629.03 ±6.74 / 638.08 ms │     no change │
│ QQuery 18 │ 1248.19 / 1270.07 ±11.29 / 1278.89 ms │ 1253.14 / 1276.79 ±30.59 / 1336.67 ms │     no change │
│ QQuery 19 │        27.96 / 31.93 ±7.76 / 47.46 ms │        28.13 / 32.03 ±7.18 / 46.36 ms │     no change │
│ QQuery 20 │     515.28 / 524.56 ±8.65 / 539.34 ms │     522.33 / 532.22 ±7.43 / 540.58 ms │     no change │
│ QQuery 21 │     506.75 / 519.33 ±6.52 / 524.40 ms │     519.69 / 522.71 ±2.52 / 525.87 ms │     no change │
│ QQuery 22 │   969.70 / 989.82 ±11.11 / 1002.21 ms │  997.79 / 1027.79 ±16.48 / 1045.15 ms │     no change │
│ QQuery 23 │ 3149.34 / 3228.33 ±43.78 / 3280.53 ms │ 3161.99 / 3197.90 ±30.59 / 3237.80 ms │     no change │
│ QQuery 24 │        40.59 / 41.34 ±0.81 / 42.80 ms │        40.82 / 41.09 ±0.26 / 41.55 ms │     no change │
│ QQuery 25 │     110.25 / 112.31 ±2.01 / 115.97 ms │     111.42 / 114.93 ±4.54 / 123.83 ms │     no change │
│ QQuery 26 │        41.24 / 42.80 ±2.34 / 47.41 ms │        41.51 / 41.92 ±0.35 / 42.51 ms │     no change │
│ QQuery 27 │     662.90 / 671.46 ±7.54 / 684.74 ms │     673.14 / 675.75 ±1.60 / 678.18 ms │     no change │
│ QQuery 28 │  3017.78 / 3028.97 ±9.64 / 3046.34 ms │ 3031.12 / 3066.74 ±42.53 / 3149.80 ms │     no change │
│ QQuery 29 │       41.53 / 57.53 ±13.96 / 74.00 ms │        40.41 / 50.46 ±8.07 / 58.30 ms │ +1.14x faster │
│ QQuery 30 │    299.06 / 308.15 ±10.47 / 327.86 ms │     300.61 / 310.58 ±8.32 / 325.49 ms │     no change │
│ QQuery 31 │    286.20 / 295.45 ±11.55 / 317.91 ms │    287.84 / 298.38 ±11.15 / 319.75 ms │     no change │
│ QQuery 32 │   941.29 / 976.95 ±36.04 / 1045.92 ms │     950.00 / 962.61 ±7.32 / 972.57 ms │     no change │
│ QQuery 33 │ 1440.61 / 1479.90 ±29.08 / 1518.89 ms │ 1471.19 / 1527.72 ±34.92 / 1571.63 ms │     no change │
│ QQuery 34 │ 1472.77 / 1508.41 ±24.48 / 1537.79 ms │ 1482.65 / 1536.15 ±41.23 / 1592.40 ms │     no change │
│ QQuery 35 │    287.78 / 318.88 ±53.21 / 425.08 ms │    282.18 / 300.40 ±13.10 / 320.60 ms │ +1.06x faster │
│ QQuery 36 │        65.04 / 67.82 ±1.94 / 70.33 ms │        65.70 / 73.57 ±4.00 / 76.51 ms │  1.08x slower │
│ QQuery 37 │        35.50 / 37.36 ±2.31 / 41.90 ms │        35.46 / 38.16 ±3.36 / 44.53 ms │     no change │
│ QQuery 38 │        40.84 / 45.23 ±3.61 / 51.52 ms │        39.31 / 42.50 ±5.05 / 52.46 ms │ +1.06x faster │
│ QQuery 39 │     135.69 / 148.04 ±8.19 / 157.30 ms │     144.50 / 150.60 ±4.85 / 159.08 ms │     no change │
│ QQuery 40 │        13.77 / 14.05 ±0.29 / 14.57 ms │        13.91 / 14.37 ±0.36 / 14.89 ms │     no change │
│ QQuery 41 │        14.16 / 19.00 ±4.11 / 24.15 ms │        13.37 / 13.66 ±0.16 / 13.83 ms │ +1.39x faster │
│ QQuery 42 │        12.82 / 14.05 ±1.92 / 17.88 ms │        12.91 / 15.80 ±5.38 / 26.55 ms │  1.12x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 19740.49ms │
│ Total Time (feat_sort-merge-join-collect-left)   │ 19872.62ms │
│ Average Time (HEAD)                              │   459.08ms │
│ Average Time (feat_sort-merge-join-collect-left) │   462.15ms │
│ Queries Faster                                   │          4 │
│ Queries Slower                                   │          2 │
│ Queries with No Change                           │         37 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
| --- | --- |
| Wall time | 100.0s |
| Peak memory | 10.2 GiB |
| Avg memory | 4.1 GiB |
| CPU user | 1010.4s |
| CPU sys | 69.2s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
| --- | --- |
| Wall time | 100.0s |
| Peak memory | 11.3 GiB |
| Avg memory | 4.5 GiB |
| CPU user | 1011.4s |
| CPU sys | 70.8s |
| Peak spill | 0 B |

File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4777704671-625-nprg5 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/sort-merge-join-collect-left (93f94c1) to 46b508e (merge-base) diff using: tpcds
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4777704671-626-6ll8s 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

Comparing feat/sort-merge-join-collect-left (93f94c1) to 46b508e (merge-base) diff using: tpch
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4777699451-623-4xq5f 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

Comparing feat/sort-merge-join-collect-left (93f94c1) to 46b508e (merge-base) diff using: tpch10
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_sort-merge-join-collect-left
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                   HEAD ┃      feat_sort-merge-join-collect-left ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │         35.98 / 36.82 ±1.21 / 39.21 ms │         37.21 / 37.97 ±0.98 / 39.89 ms │     no change │
│ QQuery 2  │         81.93 / 82.11 ±0.18 / 82.44 ms │         81.33 / 81.68 ±0.25 / 82.10 ms │     no change │
│ QQuery 3  │         51.47 / 53.05 ±0.91 / 53.96 ms │       77.57 / 85.55 ±14.89 / 115.31 ms │  1.61x slower │
│ QQuery 4  │     803.84 / 820.56 ±11.00 / 832.04 ms │  1451.18 / 1485.05 ±27.63 / 1526.19 ms │  1.81x slower │
│ QQuery 5  │      107.03 / 112.61 ±3.83 / 118.35 ms │      104.41 / 108.55 ±2.11 / 110.13 ms │     no change │
│ QQuery 6  │         74.89 / 75.91 ±0.98 / 77.56 ms │      138.85 / 144.46 ±3.98 / 150.11 ms │  1.90x slower │
│ QQuery 7  │      119.91 / 128.02 ±8.68 / 144.45 ms │      117.72 / 126.49 ±7.05 / 137.79 ms │     no change │
│ QQuery 8  │         68.06 / 68.93 ±0.58 / 69.67 ms │         71.74 / 72.93 ±1.07 / 74.55 ms │  1.06x slower │
│ QQuery 9  │         52.23 / 54.77 ±2.73 / 59.73 ms │         51.89 / 54.52 ±2.30 / 58.62 ms │     no change │
│ QQuery 10 │      140.86 / 143.95 ±3.83 / 150.38 ms │      167.21 / 176.15 ±6.63 / 184.17 ms │  1.22x slower │
│ QQuery 11 │     424.84 / 446.37 ±17.84 / 470.39 ms │      786.10 / 794.62 ±6.56 / 804.99 ms │  1.78x slower │
│ QQuery 12 │         32.57 / 32.96 ±0.31 / 33.50 ms │         32.21 / 32.45 ±0.25 / 32.90 ms │     no change │
│ QQuery 13 │      123.69 / 124.49 ±0.56 / 125.35 ms │      125.29 / 129.16 ±3.98 / 136.42 ms │     no change │
│ QQuery 14 │     885.94 / 899.91 ±10.50 / 917.71 ms │      903.22 / 909.67 ±5.43 / 916.61 ms │     no change │
│ QQuery 15 │         54.62 / 55.22 ±0.48 / 56.05 ms │         54.99 / 57.56 ±2.58 / 62.05 ms │     no change │
│ QQuery 16 │         77.95 / 79.90 ±2.33 / 84.39 ms │         79.80 / 81.68 ±1.75 / 84.56 ms │     no change │
│ QQuery 17 │      173.14 / 178.48 ±6.95 / 192.14 ms │      173.75 / 175.67 ±1.69 / 178.69 ms │     no change │
│ QQuery 18 │      156.03 / 163.16 ±6.43 / 174.20 ms │      154.78 / 159.37 ±3.94 / 166.05 ms │     no change │
│ QQuery 19 │         76.08 / 78.57 ±2.12 / 81.58 ms │      100.01 / 100.66 ±0.63 / 101.74 ms │  1.28x slower │
│ QQuery 20 │         44.99 / 46.10 ±1.21 / 48.43 ms │         44.88 / 47.27 ±3.63 / 54.45 ms │     no change │
│ QQuery 21 │      465.58 / 473.31 ±7.79 / 485.18 ms │      470.69 / 482.78 ±8.04 / 494.08 ms │     no change │
│ QQuery 22 │      129.38 / 136.01 ±4.80 / 143.46 ms │      128.96 / 133.97 ±5.92 / 145.58 ms │     no change │
│ QQuery 23 │      616.07 / 619.36 ±2.43 / 623.06 ms │     619.23 / 632.26 ±13.50 / 657.94 ms │     no change │
│ QQuery 24 │      356.85 / 362.81 ±6.65 / 374.66 ms │      361.78 / 368.41 ±3.84 / 371.97 ms │     no change │
│ QQuery 25 │      189.30 / 194.38 ±2.66 / 196.80 ms │      191.28 / 198.93 ±7.31 / 212.02 ms │     no change │
│ QQuery 26 │      100.02 / 103.08 ±2.49 / 105.92 ms │      100.92 / 106.18 ±5.29 / 115.89 ms │     no change │
│ QQuery 27 │      121.93 / 128.37 ±6.08 / 138.07 ms │      122.24 / 125.74 ±2.93 / 129.89 ms │     no change │
│ QQuery 28 │         55.78 / 59.00 ±2.48 / 61.22 ms │         56.93 / 59.68 ±2.24 / 62.05 ms │     no change │
│ QQuery 29 │      173.00 / 175.16 ±3.09 / 181.22 ms │      172.56 / 177.41 ±5.97 / 188.50 ms │     no change │
│ QQuery 30 │         46.20 / 47.43 ±0.90 / 48.52 ms │         45.76 / 47.01 ±1.65 / 50.16 ms │     no change │
│ QQuery 31 │      249.12 / 252.17 ±4.95 / 262.05 ms │      199.71 / 203.34 ±2.20 / 206.27 ms │ +1.24x faster │
│ QQuery 32 │         70.51 / 73.19 ±2.94 / 78.89 ms │         68.66 / 71.78 ±2.67 / 75.44 ms │     no change │
│ QQuery 33 │         72.67 / 74.59 ±1.46 / 77.01 ms │         72.82 / 74.99 ±1.63 / 77.60 ms │     no change │
│ QQuery 34 │         71.15 / 73.99 ±3.86 / 81.59 ms │         72.43 / 73.02 ±0.90 / 74.80 ms │     no change │
│ QQuery 35 │      133.26 / 135.01 ±1.88 / 138.61 ms │      166.08 / 172.14 ±4.60 / 178.68 ms │  1.28x slower │
│ QQuery 36 │         88.23 / 90.99 ±2.02 / 93.52 ms │         88.98 / 91.25 ±2.80 / 96.67 ms │     no change │
│ QQuery 37 │         42.64 / 45.32 ±1.81 / 48.01 ms │         60.69 / 63.95 ±4.85 / 73.54 ms │  1.41x slower │
│ QQuery 38 │      105.15 / 105.24 ±0.10 / 105.40 ms │      105.02 / 106.60 ±2.18 / 110.92 ms │     no change │
│ QQuery 39 │ 7712.35 / 8824.84 ±634.74 / 9523.69 ms │ 7900.01 / 8861.03 ±546.56 / 9430.64 ms │     no change │
│ QQuery 40 │    117.61 / 188.64 ±124.31 / 436.19 ms │    114.86 / 217.28 ±185.90 / 588.43 ms │  1.15x slower │
│ QQuery 41 │        11.66 / 24.21 ±24.41 / 73.03 ms │         11.38 / 11.60 ±0.27 / 12.08 ms │ +2.09x faster │
│ QQuery 42 │         54.95 / 56.96 ±2.00 / 60.31 ms │         74.92 / 76.12 ±0.91 / 77.61 ms │  1.34x slower │
│ QQuery 43 │         59.00 / 61.62 ±1.36 / 62.72 ms │        89.29 / 93.20 ±5.13 / 102.93 ms │  1.51x slower │
│ QQuery 44 │         12.25 / 16.23 ±7.23 / 30.68 ms │         11.34 / 13.65 ±3.79 / 21.20 ms │ +1.19x faster │
│ QQuery 45 │         48.77 / 50.32 ±1.76 / 52.80 ms │         49.75 / 51.92 ±1.97 / 54.44 ms │     no change │
│ QQuery 46 │      130.54 / 138.03 ±6.78 / 148.68 ms │     126.94 / 138.98 ±10.22 / 157.96 ms │     no change │
│ QQuery 47 │     407.84 / 424.26 ±18.30 / 458.74 ms │     719.46 / 737.43 ±22.64 / 778.75 ms │  1.74x slower │
│ QQuery 48 │     124.55 / 136.12 ±14.99 / 165.77 ms │      122.94 / 135.40 ±9.94 / 147.75 ms │     no change │
│ QQuery 49 │         80.67 / 83.04 ±2.01 / 86.71 ms │         81.40 / 82.56 ±0.86 / 83.95 ms │     no change │
│ QQuery 50 │        94.17 / 98.49 ±4.21 / 105.79 ms │       93.91 / 100.83 ±5.31 / 107.29 ms │     no change │
│ QQuery 51 │      129.83 / 136.75 ±7.69 / 150.33 ms │      127.93 / 129.62 ±1.03 / 131.10 ms │ +1.06x faster │
│ QQuery 52 │         53.46 / 55.25 ±1.33 / 56.64 ms │       76.84 / 86.23 ±11.52 / 107.71 ms │  1.56x slower │
│ QQuery 53 │         61.36 / 63.69 ±2.61 / 68.39 ms │      107.11 / 109.97 ±1.83 / 112.12 ms │  1.73x slower │
│ QQuery 54 │      107.78 / 113.76 ±4.94 / 119.41 ms │     130.60 / 144.39 ±18.62 / 180.83 ms │  1.27x slower │
│ QQuery 55 │         52.39 / 54.64 ±1.60 / 56.35 ms │         72.00 / 74.37 ±1.54 / 76.54 ms │  1.36x slower │
│ QQuery 56 │         75.99 / 81.28 ±3.96 / 87.29 ms │         78.20 / 80.60 ±1.49 / 82.37 ms │     no change │
│ QQuery 57 │      236.52 / 246.57 ±7.86 / 259.42 ms │     580.49 / 606.40 ±16.95 / 629.10 ms │  2.46x slower │
│ QQuery 58 │      202.87 / 210.19 ±8.57 / 226.56 ms │      212.67 / 219.60 ±6.61 / 227.61 ms │     no change │
│ QQuery 59 │      127.56 / 136.63 ±8.65 / 148.64 ms │      125.96 / 129.92 ±4.44 / 138.54 ms │     no change │
│ QQuery 60 │         78.52 / 82.33 ±4.22 / 89.04 ms │         76.24 / 81.57 ±6.21 / 92.56 ms │     no change │
│ QQuery 61 │      106.99 / 115.17 ±9.50 / 133.49 ms │     112.78 / 126.53 ±14.96 / 147.34 ms │  1.10x slower │
│ QQuery 62 │         72.49 / 73.96 ±0.96 / 75.41 ms │         71.64 / 80.27 ±7.14 / 89.39 ms │  1.09x slower │
│ QQuery 63 │         65.02 / 67.05 ±2.53 / 72.00 ms │      108.26 / 110.86 ±1.85 / 113.58 ms │  1.65x slower │
│ QQuery 64 │     741.17 / 785.52 ±38.50 / 833.75 ms │     762.38 / 779.93 ±16.86 / 809.10 ms │     no change │
│ QQuery 65 │      118.90 / 123.83 ±4.50 / 130.92 ms │      117.76 / 127.87 ±8.84 / 141.43 ms │     no change │
│ QQuery 66 │      177.10 / 187.65 ±6.18 / 194.96 ms │      188.96 / 194.99 ±3.52 / 198.71 ms │     no change │
│ QQuery 67 │     141.86 / 155.22 ±10.83 / 169.87 ms │      142.39 / 145.97 ±3.62 / 152.29 ms │ +1.06x faster │
│ QQuery 68 │      146.64 / 155.18 ±7.10 / 163.50 ms │     140.81 / 152.19 ±11.64 / 173.77 ms │     no change │
│ QQuery 69 │     154.75 / 164.04 ±11.00 / 180.41 ms │     177.35 / 191.22 ±17.93 / 225.22 ms │  1.17x slower │
│ QQuery 70 │      231.36 / 239.33 ±4.74 / 244.04 ms │      224.05 / 232.56 ±8.75 / 245.99 ms │     no change │
│ QQuery 71 │         68.70 / 73.23 ±4.73 / 79.23 ms │         68.81 / 73.06 ±2.16 / 74.70 ms │     no change │
│ QQuery 72 │  8108.85 / 8161.75 ±83.53 / 8327.01 ms │ 7563.96 / 8152.15 ±374.57 / 8726.55 ms │     no change │
│ QQuery 73 │      71.94 / 131.66 ±90.08 / 305.54 ms │     72.73 / 131.92 ±114.38 / 360.65 ms │     no change │
│ QQuery 74 │     295.70 / 346.55 ±72.32 / 483.26 ms │     584.77 / 603.06 ±16.43 / 624.28 ms │  1.74x slower │
│ QQuery 75 │     222.78 / 238.21 ±23.58 / 284.89 ms │     223.59 / 235.24 ±14.15 / 262.84 ms │     no change │
│ QQuery 76 │         38.05 / 39.38 ±1.16 / 41.21 ms │         37.73 / 38.55 ±0.54 / 39.11 ms │     no change │
│ QQuery 77 │      107.30 / 113.93 ±3.64 / 118.31 ms │       94.98 / 103.11 ±5.13 / 109.68 ms │ +1.10x faster │
│ QQuery 78 │     289.72 / 304.69 ±16.25 / 333.52 ms │     285.86 / 301.91 ±13.40 / 322.06 ms │     no change │
│ QQuery 79 │     126.32 / 135.84 ±10.88 / 156.24 ms │      129.16 / 136.39 ±7.46 / 150.09 ms │     no change │
│ QQuery 80 │     233.28 / 245.20 ±15.11 / 274.55 ms │      231.08 / 242.38 ±9.84 / 260.63 ms │     no change │
│ QQuery 81 │         44.17 / 45.93 ±2.64 / 51.18 ms │         39.55 / 42.25 ±2.83 / 46.53 ms │ +1.09x faster │
│ QQuery 82 │        56.79 / 67.03 ±11.20 / 88.86 ms │         73.19 / 77.91 ±3.49 / 82.52 ms │  1.16x slower │
│ QQuery 83 │         55.50 / 56.54 ±1.20 / 58.13 ms │         55.24 / 56.64 ±0.74 / 57.41 ms │     no change │
│ QQuery 84 │         50.92 / 52.22 ±2.26 / 56.74 ms │         79.23 / 83.12 ±4.42 / 90.51 ms │  1.59x slower │
│ QQuery 85 │      136.90 / 145.77 ±5.72 / 153.34 ms │      136.24 / 140.30 ±3.53 / 146.29 ms │     no change │
│ QQuery 86 │         27.69 / 29.10 ±1.43 / 30.88 ms │         26.60 / 27.27 ±0.37 / 27.66 ms │ +1.07x faster │
│ QQuery 87 │      107.67 / 109.77 ±1.65 / 111.98 ms │      108.34 / 116.83 ±7.65 / 127.03 ms │  1.06x slower │
│ QQuery 88 │      187.75 / 193.44 ±4.34 / 200.07 ms │      189.20 / 194.60 ±4.91 / 203.13 ms │     no change │
│ QQuery 89 │         70.58 / 72.84 ±2.14 / 76.38 ms │      116.19 / 117.96 ±2.08 / 122.02 ms │  1.62x slower │
│ QQuery 90 │         27.63 / 28.50 ±0.88 / 30.09 ms │         27.91 / 29.86 ±3.00 / 35.76 ms │     no change │
│ QQuery 91 │         52.60 / 53.78 ±1.35 / 56.36 ms │         62.14 / 63.28 ±0.71 / 64.04 ms │  1.18x slower │
│ QQuery 92 │         48.64 / 52.80 ±3.54 / 58.04 ms │         48.72 / 50.00 ±0.74 / 50.98 ms │ +1.06x faster │
│ QQuery 93 │      222.52 / 229.25 ±5.09 / 238.31 ms │      223.44 / 228.67 ±5.04 / 236.99 ms │     no change │
│ QQuery 94 │         61.61 / 65.58 ±3.94 / 72.80 ms │         61.71 / 70.75 ±6.34 / 78.68 ms │  1.08x slower │
│ QQuery 95 │      159.26 / 171.11 ±6.05 / 176.21 ms │      152.13 / 160.54 ±5.15 / 166.12 ms │ +1.07x faster │
│ QQuery 96 │         50.06 / 50.78 ±0.41 / 51.24 ms │         48.61 / 53.84 ±6.03 / 65.49 ms │  1.06x slower │
│ QQuery 97 │         82.16 / 85.86 ±2.45 / 88.99 ms │         83.95 / 85.27 ±1.49 / 87.76 ms │     no change │
│ QQuery 98 │         66.80 / 71.61 ±2.70 / 74.30 ms │         66.01 / 68.60 ±1.68 / 70.73 ms │     no change │
│ QQuery 99 │      149.50 / 153.42 ±4.10 / 160.36 ms │     149.30 / 157.20 ±10.66 / 178.19 ms │     no change │
└───────────┴────────────────────────────────────────┴────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 31707.83ms │
│ Total Time (feat_sort-merge-join-collect-left)   │ 34218.63ms │
│ Average Time (HEAD)                              │   320.28ms │
│ Average Time (feat_sort-merge-join-collect-left) │   345.64ms │
│ Queries Faster                                   │         10 │
│ Queries Slower                                   │         30 │
│ Queries with No Change                           │         59 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘
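The Change column in the table above appears to be derived from the ratio of the two mean execution times (the middle value in each `min / mean ±stddev / max` cell). The sketch below reproduces that labelling under two assumptions that are inferred from the rows shown, not confirmed by the benchmark runner itself: the comparison uses the means, and ratios within 5% of 1.0 are reported as "no change". The function name `classify_change` and the `noise_threshold` parameter are hypothetical.

```python
def classify_change(head_mean_ms: float, branch_mean_ms: float,
                    noise_threshold: float = 0.05) -> str:
    """Label a query from its two mean times.

    Assumption: the 5% noise band and the use of means are inferred from
    the table rows, not taken from the benchmark runner's source.
    """
    ratio = branch_mean_ms / head_mean_ms
    if (1.0 - noise_threshold) <= ratio <= (1.0 + noise_threshold):
        return "no change"
    if ratio < 1.0:
        # Branch is quicker: report how many times faster it is.
        return f"+{1.0 / ratio:.2f}x faster"
    # Branch is slower: report the slowdown factor directly.
    return f"{ratio:.2f}x slower"


# Means copied from the tpcds_sf1 table (Query 1, 3, 41, 57).
print(classify_change(36.82, 37.97))    # no change
print(classify_change(53.05, 85.55))    # 1.61x slower
print(classify_change(24.21, 11.60))    # +2.09x faster
print(classify_change(246.57, 606.40))  # 2.46x slower
```

Under these assumptions a row such as Query 41 (±24.41 ms on HEAD) can be labelled "faster" purely because of one noisy baseline run, so labels on high-variance rows are worth reading alongside the stddev.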

Resource Usage

tpcds — base (merge-base)

| Metric | Value |
| --- | --- |
| Wall time | 160.0s |
| Peak memory | 19.5 GiB |
| Avg memory | 6.0 GiB |
| CPU user | 791.2s |
| CPU sys | 76.4s |
| Peak spill | 0 B |

tpcds — branch

| Metric | Value |
| --- | --- |
| Wall time | 175.0s |
| Peak memory | 19.1 GiB |
| Avg memory | 5.5 GiB |
| CPU user | 796.9s |
| CPU sys | 77.9s |
| Peak spill | 0 B |

File an issue against this benchmark runner

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_sort-merge-join-collect-left
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃ feat_sort-merge-join-collect-left ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 37.96 / 38.71 ±1.06 / 40.80 ms │    37.94 / 38.65 ±1.00 / 40.63 ms │ no change │
│ QQuery 2  │ 19.12 / 19.31 ±0.16 / 19.57 ms │    19.01 / 19.55 ±0.35 / 20.12 ms │ no change │
│ QQuery 3  │ 31.32 / 33.79 ±1.85 / 37.06 ms │    30.98 / 33.16 ±2.95 / 38.80 ms │ no change │
│ QQuery 4  │ 17.37 / 18.00 ±0.71 / 19.35 ms │    17.22 / 17.52 ±0.33 / 18.11 ms │ no change │
│ QQuery 5  │ 38.07 / 40.78 ±1.86 / 43.91 ms │    37.73 / 39.82 ±1.31 / 41.18 ms │ no change │
│ QQuery 6  │ 16.25 / 16.57 ±0.28 / 17.05 ms │    16.40 / 17.20 ±1.00 / 19.16 ms │ no change │
│ QQuery 7  │ 43.75 / 44.70 ±0.62 / 45.70 ms │    43.46 / 45.62 ±1.28 / 47.08 ms │ no change │
│ QQuery 8  │ 42.91 / 43.46 ±0.62 / 44.62 ms │    43.32 / 43.83 ±0.58 / 44.92 ms │ no change │
│ QQuery 9  │ 48.81 / 49.86 ±0.88 / 50.92 ms │    49.17 / 49.90 ±0.78 / 51.40 ms │ no change │
│ QQuery 10 │ 41.94 / 42.24 ±0.23 / 42.50 ms │    42.30 / 42.50 ±0.21 / 42.86 ms │ no change │
│ QQuery 11 │ 13.18 / 13.43 ±0.19 / 13.71 ms │    13.44 / 14.04 ±0.63 / 14.86 ms │ no change │
│ QQuery 12 │ 23.66 / 24.16 ±0.28 / 24.52 ms │    24.20 / 24.46 ±0.27 / 24.90 ms │ no change │
│ QQuery 13 │ 32.27 / 34.19 ±2.03 / 38.03 ms │    32.77 / 33.68 ±1.06 / 35.16 ms │ no change │
│ QQuery 14 │ 23.48 / 23.88 ±0.22 / 24.12 ms │    23.73 / 24.34 ±0.43 / 25.08 ms │ no change │
│ QQuery 15 │ 31.01 / 31.88 ±0.70 / 32.78 ms │    31.07 / 31.92 ±0.95 / 33.41 ms │ no change │
│ QQuery 16 │ 14.39 / 14.46 ±0.10 / 14.64 ms │    13.88 / 14.08 ±0.14 / 14.23 ms │ no change │
│ QQuery 17 │ 73.09 / 73.58 ±0.49 / 74.45 ms │    72.57 / 73.76 ±0.99 / 75.28 ms │ no change │
│ QQuery 18 │ 58.23 / 61.02 ±3.10 / 67.05 ms │    58.04 / 59.35 ±0.77 / 60.29 ms │ no change │
│ QQuery 19 │ 32.93 / 33.62 ±1.32 / 36.26 ms │    32.93 / 33.25 ±0.32 / 33.85 ms │ no change │
│ QQuery 20 │ 31.61 / 31.83 ±0.16 / 32.08 ms │    31.38 / 32.03 ±0.62 / 33.02 ms │ no change │
│ QQuery 21 │ 54.95 / 56.35 ±1.13 / 58.35 ms │    54.97 / 57.29 ±1.42 / 59.01 ms │ no change │
│ QQuery 22 │ 13.68 / 14.06 ±0.24 / 14.42 ms │    13.58 / 13.73 ±0.12 / 13.91 ms │ no change │
└───────────┴────────────────────────────────┴───────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                                ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 759.88ms │
│ Total Time (feat_sort-merge-join-collect-left)   │ 759.70ms │
│ Average Time (HEAD)                              │  34.54ms │
│ Average Time (feat_sort-merge-join-collect-left) │  34.53ms │
│ Queries Faster                                   │        0 │
│ Queries Slower                                   │        0 │
│ Queries with No Change                           │       22 │
│ Queries with Failure                             │        0 │
└──────────────────────────────────────────────────┴──────────┘
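To put the three Benchmark Summary tables side by side, the totals can be reduced to a relative change per suite. This is a small worked calculation using only the "Total Time" figures reported above; the helper name `relative_change_pct` and the dictionary layout are illustrative, and the suite labels are taken from the Resource Usage headings.

```python
# (HEAD total, branch total) in milliseconds, copied from the summaries above.
TOTALS_MS = {
    "clickbench_partitioned": (19740.49, 19872.62),
    "tpcds": (31707.83, 34218.63),
    "tpch": (759.88, 759.70),
}


def relative_change_pct(head_ms: float, branch_ms: float) -> float:
    """Percentage change of the branch total relative to HEAD (positive = slower)."""
    return (branch_ms - head_ms) / head_ms * 100.0


for suite, (head_ms, branch_ms) in TOTALS_MS.items():
    print(f"{suite}: {relative_change_pct(head_ms, branch_ms):+.2f}%")
# clickbench_partitioned: +0.67%
# tpcds: +7.92%
# tpch: -0.02%
```

On these totals only the tpcds run moves noticeably, which matches its per-query table (30 queries slower versus 10 faster).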

Resource Usage

tpch — base (merge-base)

| Metric | Value |
| --- | --- |
| Wall time | 5.0s |
| Peak memory | 1.1 GiB |
| Avg memory | 497.7 MiB |
| CPU user | 21.9s |
| CPU sys | 1.6s |
| Peak spill | 0 B |

tpch — branch

| Metric | Value |
| --- | --- |
| Wall time | 5.0s |
| Peak memory | 1.2 GiB |
| Avg memory | 524.1 MiB |
| CPU user | 21.9s |
| CPU sys | 1.6s |
| Peak spill | 0 B |

File an issue against this benchmark runner

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_sort-merge-join-collect-left
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                               HEAD ┃  feat_sort-merge-join-collect-left ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │  315.74 / 317.19 ±1.25 / 319.47 ms │  313.02 / 315.05 ±1.41 / 316.72 ms │ no change │
│ QQuery 2  │  103.74 / 106.55 ±1.93 / 108.96 ms │  101.38 / 105.69 ±4.16 / 113.49 ms │ no change │
│ QQuery 3  │  229.63 / 236.98 ±4.71 / 242.62 ms │  235.40 / 237.73 ±2.64 / 241.20 ms │ no change │
│ QQuery 4  │  113.52 / 114.73 ±1.02 / 116.09 ms │  113.62 / 115.68 ±1.45 / 117.99 ms │ no change │
│ QQuery 5  │  350.81 / 356.59 ±5.10 / 364.96 ms │  353.67 / 359.42 ±6.27 / 367.92 ms │ no change │
│ QQuery 6  │  127.44 / 129.85 ±1.80 / 132.24 ms │  127.14 / 129.91 ±5.22 / 140.34 ms │ no change │
│ QQuery 7  │  464.54 / 467.26 ±2.35 / 470.40 ms │  455.32 / 462.74 ±8.29 / 477.04 ms │ no change │
│ QQuery 8  │  379.34 / 382.83 ±2.89 / 386.65 ms │  386.35 / 390.14 ±3.47 / 394.70 ms │ no change │
│ QQuery 9  │  554.67 / 558.97 ±4.45 / 567.52 ms │  553.55 / 563.54 ±5.75 / 569.80 ms │ no change │
│ QQuery 10 │ 296.28 / 309.63 ±10.07 / 320.31 ms │  295.09 / 307.64 ±8.50 / 320.86 ms │ no change │
│ QQuery 11 │   84.74 / 92.08 ±10.25 / 112.35 ms │     88.60 / 91.53 ±3.48 / 96.79 ms │ no change │
│ QQuery 12 │  180.43 / 184.91 ±3.02 / 188.86 ms │  177.99 / 181.59 ±2.96 / 186.63 ms │ no change │
│ QQuery 13 │  283.25 / 295.64 ±6.46 / 302.18 ms │ 284.46 / 298.73 ±10.67 / 313.85 ms │ no change │
│ QQuery 14 │  175.06 / 180.17 ±5.33 / 187.06 ms │  175.76 / 179.83 ±4.48 / 186.84 ms │ no change │
│ QQuery 15 │  308.22 / 312.23 ±2.79 / 316.57 ms │  310.12 / 311.98 ±1.07 / 313.42 ms │ no change │
│ QQuery 16 │     63.90 / 67.06 ±1.85 / 69.33 ms │     64.53 / 66.34 ±1.46 / 68.25 ms │ no change │
│ QQuery 17 │  629.57 / 636.63 ±5.84 / 645.73 ms │  633.23 / 643.90 ±5.55 / 648.94 ms │ no change │
│ QQuery 18 │ 668.31 / 686.01 ±13.93 / 707.86 ms │ 668.19 / 695.76 ±16.51 / 719.84 ms │ no change │
│ QQuery 19 │ 250.82 / 269.23 ±14.97 / 291.15 ms │ 250.86 / 270.22 ±24.12 / 310.01 ms │ no change │
│ QQuery 20 │  279.99 / 291.27 ±8.84 / 303.17 ms │  274.48 / 288.84 ±8.17 / 299.26 ms │ no change │
│ QQuery 21 │  651.38 / 660.66 ±7.15 / 672.62 ms │  658.82 / 668.20 ±9.42 / 682.67 ms │ no change │
│ QQuery 22 │     59.55 / 63.26 ±3.26 / 68.94 ms │     58.43 / 61.21 ±1.64 / 62.61 ms │ no change │
└───────────┴────────────────────────────────────┴────────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 6719.74ms │
│ Total Time (feat_sort-merge-join-collect-left)   │ 6745.66ms │
│ Average Time (HEAD)                              │  305.44ms │
│ Average Time (feat_sort-merge-join-collect-left) │  306.62ms │
│ Queries Faster                                   │         0 │
│ Queries Slower                                   │         0 │
│ Queries with No Change                           │        22 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘

Resource Usage (tpch10)

Metric        base (merge-base)   branch
Wall time     35.0s               35.0s
Peak memory   4.4 GiB             5.1 GiB
Avg memory    1.4 GiB             1.5 GiB
CPU user      339.4s              340.6s
CPU sys       21.6s               21.6s
Peak spill    0 B                 0 B

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_sort-merge-join-collect-left
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃     feat_sort-merge-join-collect-left ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.24 / 3.96 ±5.37 / 14.69 ms │          1.20 / 3.92 ±5.36 / 14.65 ms │     no change │
│ QQuery 1  │        12.92 / 13.25 ±0.18 / 13.42 ms │        12.62 / 12.88 ±0.18 / 13.12 ms │     no change │
│ QQuery 2  │        36.41 / 36.68 ±0.16 / 36.90 ms │        35.89 / 36.14 ±0.18 / 36.42 ms │     no change │
│ QQuery 3  │        30.55 / 31.19 ±0.67 / 32.47 ms │        30.50 / 30.79 ±0.23 / 31.01 ms │     no change │
│ QQuery 4  │     222.19 / 227.09 ±2.97 / 230.64 ms │     222.49 / 226.77 ±3.62 / 231.15 ms │     no change │
│ QQuery 5  │     273.75 / 275.04 ±0.75 / 275.83 ms │     268.22 / 272.39 ±2.87 / 276.66 ms │     no change │
│ QQuery 6  │           1.30 / 1.44 ±0.22 / 1.87 ms │           1.27 / 1.42 ±0.23 / 1.87 ms │     no change │
│ QQuery 7  │        14.28 / 14.49 ±0.19 / 14.79 ms │        13.72 / 13.89 ±0.10 / 13.99 ms │     no change │
│ QQuery 8  │     319.70 / 325.14 ±3.22 / 329.24 ms │     319.97 / 326.34 ±4.46 / 332.02 ms │     no change │
│ QQuery 9  │     454.91 / 461.74 ±7.01 / 475.05 ms │     451.48 / 458.35 ±5.79 / 466.56 ms │     no change │
│ QQuery 10 │        68.72 / 70.36 ±2.28 / 74.85 ms │        67.90 / 69.21 ±0.97 / 70.84 ms │     no change │
│ QQuery 11 │        79.89 / 81.31 ±1.23 / 83.30 ms │        79.60 / 82.65 ±3.74 / 89.92 ms │     no change │
│ QQuery 12 │     263.40 / 269.49 ±4.09 / 274.91 ms │     265.22 / 271.40 ±4.72 / 277.47 ms │     no change │
│ QQuery 13 │    361.82 / 377.41 ±13.98 / 396.97 ms │     363.89 / 375.23 ±7.29 / 384.40 ms │     no change │
│ QQuery 14 │     282.60 / 289.23 ±4.21 / 295.42 ms │     278.45 / 283.36 ±4.95 / 292.88 ms │     no change │
│ QQuery 15 │     273.36 / 281.61 ±7.55 / 295.59 ms │     273.00 / 279.96 ±4.62 / 284.93 ms │     no change │
│ QQuery 16 │     613.28 / 619.57 ±5.68 / 627.35 ms │    612.11 / 626.89 ±14.38 / 653.84 ms │     no change │
│ QQuery 17 │     619.39 / 626.59 ±4.69 / 632.86 ms │     614.68 / 626.57 ±7.15 / 636.75 ms │     no change │
│ QQuery 18 │ 1254.53 / 1279.54 ±22.65 / 1319.57 ms │ 1275.77 / 1287.47 ±11.86 / 1305.36 ms │     no change │
│ QQuery 19 │        28.05 / 32.35 ±5.49 / 42.56 ms │       28.04 / 36.60 ±12.57 / 60.43 ms │  1.13x slower │
│ QQuery 20 │    515.44 / 538.63 ±14.27 / 555.41 ms │     516.46 / 523.61 ±4.81 / 529.82 ms │     no change │
│ QQuery 21 │     516.45 / 520.82 ±2.40 / 523.42 ms │     520.77 / 524.13 ±2.68 / 527.74 ms │     no change │
│ QQuery 22 │   988.12 / 998.23 ±10.19 / 1016.34 ms │    984.53 / 996.32 ±8.61 / 1007.85 ms │     no change │
│ QQuery 23 │ 3009.93 / 3065.35 ±40.12 / 3125.89 ms │ 3040.57 / 3058.59 ±11.58 / 3070.44 ms │     no change │
│ QQuery 24 │       41.17 / 50.28 ±11.17 / 71.18 ms │        41.37 / 44.25 ±4.13 / 52.42 ms │ +1.14x faster │
│ QQuery 25 │     111.40 / 113.26 ±1.66 / 116.12 ms │     110.77 / 112.61 ±1.36 / 114.37 ms │     no change │
│ QQuery 26 │        42.07 / 43.69 ±2.29 / 48.22 ms │        41.57 / 42.25 ±0.73 / 43.40 ms │     no change │
│ QQuery 27 │     663.45 / 668.70 ±4.17 / 675.08 ms │     668.83 / 676.88 ±5.61 / 686.21 ms │     no change │
│ QQuery 28 │ 3013.30 / 3038.43 ±17.79 / 3064.28 ms │ 3034.67 / 3068.91 ±25.05 / 3104.81 ms │     no change │
│ QQuery 29 │        41.64 / 45.26 ±6.67 / 58.58 ms │        40.34 / 41.01 ±0.81 / 42.53 ms │ +1.10x faster │
│ QQuery 30 │    300.23 / 315.88 ±13.52 / 340.21 ms │     302.07 / 308.73 ±6.56 / 321.21 ms │     no change │
│ QQuery 31 │    281.26 / 292.43 ±10.20 / 304.84 ms │     275.75 / 287.54 ±8.03 / 298.19 ms │     no change │
│ QQuery 32 │    918.01 / 946.74 ±20.98 / 973.85 ms │    935.72 / 974.08 ±23.23 / 997.17 ms │     no change │
│ QQuery 33 │ 1445.17 / 1485.06 ±39.64 / 1552.28 ms │ 1459.83 / 1484.80 ±15.69 / 1508.12 ms │     no change │
│ QQuery 34 │ 1456.89 / 1504.16 ±34.88 / 1541.68 ms │ 1468.88 / 1506.32 ±21.92 / 1529.51 ms │     no change │
│ QQuery 35 │    272.51 / 302.97 ±45.71 / 393.96 ms │    281.80 / 313.20 ±39.51 / 379.62 ms │     no change │
│ QQuery 36 │        68.36 / 75.06 ±5.88 / 85.17 ms │        64.49 / 71.08 ±5.29 / 78.75 ms │ +1.06x faster │
│ QQuery 37 │        35.88 / 40.26 ±4.83 / 48.84 ms │        36.05 / 41.66 ±7.02 / 55.09 ms │     no change │
│ QQuery 38 │        41.81 / 43.68 ±1.05 / 45.06 ms │        41.69 / 47.05 ±5.75 / 58.17 ms │  1.08x slower │
│ QQuery 39 │     137.33 / 146.06 ±5.31 / 151.73 ms │     141.18 / 150.28 ±5.27 / 155.13 ms │     no change │
│ QQuery 40 │        14.25 / 15.91 ±2.43 / 20.71 ms │        14.04 / 18.21 ±7.43 / 33.04 ms │  1.14x slower │
│ QQuery 41 │        14.06 / 15.12 ±1.61 / 18.30 ms │        13.51 / 13.83 ±0.20 / 14.12 ms │ +1.09x faster │
│ QQuery 42 │        13.61 / 13.76 ±0.08 / 13.81 ms │        12.92 / 15.58 ±4.92 / 25.40 ms │  1.13x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 19597.21ms │
│ Total Time (feat_sort-merge-join-collect-left)   │ 19643.10ms │
│ Average Time (HEAD)                              │   455.75ms │
│ Average Time (feat_sort-merge-join-collect-left) │   456.82ms │
│ Queries Faster                                   │          4 │
│ Queries Slower                                   │          4 │
│ Queries with No Change                           │         35 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘

Resource Usage (clickbench_partitioned)

Metric        base (merge-base)   branch
Wall time     100.0s              100.0s
Peak memory   11.6 GiB            11.9 GiB
Avg memory    4.4 GiB             4.6 GiB
CPU user      1008.1s             1007.2s
CPU sys       71.3s               71.1s
Peak spill    0 B                 0 B

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_sort-merge-join-collect-left
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                   HEAD ┃      feat_sort-merge-join-collect-left ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │         37.30 / 37.91 ±1.00 / 39.88 ms │         38.99 / 40.21 ±1.14 / 42.23 ms │  1.06x slower │
│ QQuery 2  │         82.63 / 82.82 ±0.13 / 82.97 ms │         83.26 / 84.14 ±0.56 / 84.78 ms │     no change │
│ QQuery 3  │         51.97 / 53.47 ±1.19 / 54.76 ms │         81.08 / 81.34 ±0.29 / 81.89 ms │  1.52x slower │
│ QQuery 4  │     824.82 / 846.90 ±15.76 / 870.97 ms │  1504.40 / 1545.15 ±35.87 / 1605.45 ms │  1.82x slower │
│ QQuery 5  │      107.39 / 110.51 ±2.72 / 115.06 ms │     110.80 / 117.75 ±10.65 / 138.84 ms │  1.07x slower │
│ QQuery 6  │         75.40 / 80.12 ±4.82 / 88.50 ms │      139.84 / 140.30 ±0.38 / 140.73 ms │  1.75x slower │
│ QQuery 7  │      121.82 / 127.52 ±7.77 / 142.58 ms │      121.34 / 127.59 ±3.88 / 132.59 ms │     no change │
│ QQuery 8  │         69.20 / 71.10 ±3.04 / 77.12 ms │         73.30 / 74.00 ±0.39 / 74.50 ms │     no change │
│ QQuery 9  │         54.52 / 55.65 ±1.20 / 57.91 ms │         54.15 / 58.97 ±2.60 / 62.02 ms │  1.06x slower │
│ QQuery 10 │      142.40 / 152.40 ±9.76 / 169.94 ms │      170.13 / 176.14 ±4.40 / 183.04 ms │  1.16x slower │
│ QQuery 11 │     436.96 / 456.16 ±15.86 / 476.01 ms │     815.89 / 837.88 ±23.12 / 870.20 ms │  1.84x slower │
│ QQuery 12 │         33.35 / 33.70 ±0.34 / 34.27 ms │         34.83 / 36.25 ±0.93 / 37.56 ms │  1.08x slower │
│ QQuery 13 │      126.65 / 130.88 ±7.11 / 145.06 ms │      128.51 / 131.53 ±4.75 / 140.95 ms │     no change │
│ QQuery 14 │     906.20 / 921.57 ±12.71 / 939.93 ms │     922.43 / 930.15 ±11.95 / 953.72 ms │     no change │
│ QQuery 15 │         55.37 / 55.91 ±0.47 / 56.47 ms │         56.08 / 56.90 ±0.69 / 57.96 ms │     no change │
│ QQuery 16 │         80.75 / 86.65 ±6.31 / 98.49 ms │         81.27 / 85.28 ±4.60 / 90.93 ms │     no change │
│ QQuery 17 │      177.05 / 178.12 ±1.30 / 180.64 ms │      175.87 / 179.79 ±4.53 / 188.58 ms │     no change │
│ QQuery 18 │      155.13 / 161.39 ±5.69 / 169.60 ms │      157.98 / 162.78 ±6.05 / 174.40 ms │     no change │
│ QQuery 19 │         77.80 / 79.98 ±1.23 / 81.47 ms │      104.03 / 108.25 ±3.31 / 112.80 ms │  1.35x slower │
│ QQuery 20 │         45.99 / 46.51 ±0.29 / 46.81 ms │         47.18 / 48.41 ±1.68 / 51.75 ms │     no change │
│ QQuery 21 │      477.05 / 481.66 ±4.39 / 488.70 ms │      474.09 / 483.08 ±9.09 / 500.37 ms │     no change │
│ QQuery 22 │      117.11 / 125.25 ±8.82 / 136.31 ms │      118.24 / 122.99 ±7.85 / 138.58 ms │     no change │
│ QQuery 23 │      626.21 / 635.82 ±7.36 / 647.15 ms │      632.07 / 641.43 ±9.94 / 658.58 ms │     no change │
│ QQuery 24 │      361.36 / 370.28 ±7.16 / 379.81 ms │     370.47 / 379.96 ±10.49 / 399.32 ms │     no change │
│ QQuery 25 │      193.98 / 197.82 ±3.33 / 202.62 ms │      192.77 / 198.90 ±5.49 / 209.18 ms │     no change │
│ QQuery 26 │      102.69 / 107.30 ±5.96 / 118.45 ms │      107.21 / 115.15 ±7.44 / 127.30 ms │  1.07x slower │
│ QQuery 27 │      122.63 / 124.70 ±1.73 / 127.62 ms │      122.47 / 127.75 ±5.72 / 138.13 ms │     no change │
│ QQuery 28 │         58.69 / 63.19 ±3.22 / 68.74 ms │         57.75 / 63.11 ±4.21 / 70.70 ms │     no change │
│ QQuery 29 │      178.12 / 181.66 ±6.37 / 194.38 ms │      172.06 / 178.05 ±4.59 / 186.28 ms │     no change │
│ QQuery 30 │         51.46 / 52.79 ±1.67 / 56.09 ms │         47.90 / 48.74 ±1.21 / 51.15 ms │ +1.08x faster │
│ QQuery 31 │      258.12 / 262.38 ±5.42 / 272.93 ms │      205.91 / 213.26 ±6.33 / 223.65 ms │ +1.23x faster │
│ QQuery 32 │         71.02 / 71.59 ±0.41 / 72.09 ms │         71.39 / 72.04 ±0.37 / 72.40 ms │     no change │
│ QQuery 33 │         76.66 / 77.06 ±0.35 / 77.69 ms │         75.02 / 81.29 ±3.91 / 85.80 ms │  1.05x slower │
│ QQuery 34 │         72.88 / 73.43 ±0.50 / 74.27 ms │         74.54 / 75.47 ±0.95 / 76.91 ms │     no change │
│ QQuery 35 │     134.67 / 141.29 ±11.32 / 163.88 ms │      170.45 / 175.37 ±4.71 / 182.25 ms │  1.24x slower │
│ QQuery 36 │         90.23 / 92.02 ±1.24 / 93.83 ms │        90.50 / 95.71 ±4.10 / 100.31 ms │     no change │
│ QQuery 37 │         43.51 / 49.00 ±6.68 / 62.14 ms │         62.46 / 64.18 ±1.33 / 66.34 ms │  1.31x slower │
│ QQuery 38 │      107.47 / 108.20 ±0.67 / 109.28 ms │      107.64 / 108.48 ±0.68 / 109.67 ms │     no change │
│ QQuery 39 │ 8076.09 / 9139.69 ±595.13 / 9907.50 ms │ 8222.70 / 9193.11 ±573.64 / 9783.96 ms │     no change │
│ QQuery 40 │    117.86 / 268.22 ±224.04 / 698.82 ms │    116.07 / 265.48 ±247.05 / 752.74 ms │     no change │
│ QQuery 41 │         12.42 / 12.65 ±0.19 / 12.88 ms │         12.45 / 12.65 ±0.14 / 12.82 ms │     no change │
│ QQuery 42 │         52.86 / 56.06 ±1.84 / 58.32 ms │         79.52 / 80.54 ±1.07 / 81.90 ms │  1.44x slower │
│ QQuery 43 │         59.53 / 62.22 ±1.47 / 63.80 ms │        91.94 / 97.10 ±8.47 / 113.98 ms │  1.56x slower │
│ QQuery 44 │         13.34 / 13.48 ±0.11 / 13.66 ms │         12.90 / 13.06 ±0.18 / 13.39 ms │     no change │
│ QQuery 45 │        49.39 / 58.20 ±10.44 / 78.71 ms │         54.41 / 56.24 ±1.20 / 58.00 ms │     no change │
│ QQuery 46 │      130.92 / 136.95 ±4.71 / 142.54 ms │     133.94 / 145.33 ±12.05 / 166.79 ms │  1.06x slower │
│ QQuery 47 │     411.20 / 434.47 ±25.69 / 470.94 ms │     735.45 / 769.66 ±23.55 / 798.19 ms │  1.77x slower │
│ QQuery 48 │     126.52 / 141.31 ±19.38 / 179.26 ms │     134.28 / 142.50 ±10.06 / 162.07 ms │     no change │
│ QQuery 49 │         84.44 / 87.16 ±3.00 / 92.27 ms │         85.49 / 86.95 ±1.11 / 88.68 ms │     no change │
│ QQuery 50 │        95.51 / 99.08 ±2.81 / 103.46 ms │        94.30 / 99.89 ±3.03 / 103.46 ms │     no change │
│ QQuery 51 │      128.71 / 139.21 ±9.27 / 155.76 ms │     130.26 / 141.07 ±12.96 / 166.30 ms │     no change │
│ QQuery 52 │         55.28 / 57.83 ±3.04 / 63.79 ms │         80.18 / 81.41 ±0.94 / 82.90 ms │  1.41x slower │
│ QQuery 53 │         62.99 / 64.38 ±1.02 / 66.00 ms │     111.11 / 130.67 ±34.37 / 199.35 ms │  2.03x slower │
│ QQuery 54 │      114.84 / 119.92 ±4.48 / 127.64 ms │      135.52 / 140.63 ±5.28 / 150.78 ms │  1.17x slower │
│ QQuery 55 │         52.79 / 53.65 ±0.99 / 55.07 ms │       79.20 / 96.12 ±19.29 / 124.32 ms │  1.79x slower │
│ QQuery 56 │        79.01 / 85.54 ±8.83 / 103.00 ms │         78.21 / 82.10 ±3.29 / 87.71 ms │     no change │
│ QQuery 57 │     247.80 / 262.00 ±11.18 / 279.26 ms │     606.92 / 625.62 ±17.47 / 658.32 ms │  2.39x slower │
│ QQuery 58 │     203.49 / 219.25 ±14.55 / 246.78 ms │     204.15 / 223.31 ±15.04 / 250.11 ms │     no change │
│ QQuery 59 │     129.47 / 142.60 ±15.42 / 170.40 ms │     139.03 / 151.67 ±13.55 / 169.13 ms │  1.06x slower │
│ QQuery 60 │         80.29 / 82.24 ±2.51 / 87.18 ms │         80.36 / 82.58 ±1.16 / 83.65 ms │     no change │
│ QQuery 61 │     113.94 / 131.73 ±19.53 / 168.57 ms │     116.91 / 124.47 ±11.20 / 146.43 ms │ +1.06x faster │
│ QQuery 62 │         73.39 / 78.16 ±6.14 / 89.96 ms │         71.77 / 73.56 ±2.00 / 77.44 ms │ +1.06x faster │
│ QQuery 63 │         63.56 / 68.35 ±5.44 / 78.89 ms │      114.11 / 118.99 ±5.03 / 128.69 ms │  1.74x slower │
│ QQuery 64 │     771.79 / 809.23 ±27.06 / 855.10 ms │      797.34 / 808.59 ±9.04 / 822.48 ms │     no change │
│ QQuery 65 │     120.16 / 132.37 ±13.86 / 158.08 ms │      122.92 / 126.68 ±5.21 / 136.87 ms │     no change │
│ QQuery 66 │      186.30 / 192.34 ±4.10 / 196.63 ms │      193.35 / 198.61 ±9.13 / 216.86 ms │     no change │
│ QQuery 67 │     146.74 / 160.72 ±15.36 / 183.65 ms │     147.79 / 159.53 ±11.22 / 178.84 ms │     no change │
│ QQuery 68 │      142.13 / 150.10 ±5.79 / 155.52 ms │     148.31 / 165.13 ±14.61 / 190.80 ms │  1.10x slower │
│ QQuery 69 │     150.71 / 166.93 ±18.76 / 202.23 ms │     181.03 / 206.76 ±21.91 / 241.33 ms │  1.24x slower │
│ QQuery 70 │     229.03 / 241.72 ±16.58 / 273.97 ms │      229.47 / 243.04 ±9.76 / 259.95 ms │     no change │
│ QQuery 71 │         71.69 / 76.93 ±4.33 / 83.20 ms │         70.50 / 73.02 ±1.98 / 75.65 ms │ +1.05x faster │
│ QQuery 72 │ 8215.23 / 8456.22 ±159.42 / 8717.87 ms │ 8252.91 / 8427.19 ±177.89 / 8684.08 ms │     no change │
│ QQuery 73 │       73.29 / 89.78 ±32.24 / 154.26 ms │     72.91 / 150.41 ±141.64 / 433.29 ms │  1.68x slower │
│ QQuery 74 │    297.62 / 372.94 ±125.12 / 622.07 ms │     597.50 / 623.29 ±21.78 / 660.21 ms │  1.67x slower │
│ QQuery 75 │     228.64 / 251.42 ±26.68 / 303.74 ms │     236.75 / 245.83 ±10.88 / 267.02 ms │     no change │
│ QQuery 76 │         41.07 / 41.51 ±0.36 / 42.03 ms │         41.53 / 45.73 ±7.13 / 59.94 ms │  1.10x slower │
│ QQuery 77 │     111.53 / 124.12 ±19.72 / 162.96 ms │      100.31 / 101.64 ±1.89 / 105.22 ms │ +1.22x faster │
│ QQuery 78 │     299.90 / 319.68 ±28.06 / 374.66 ms │      293.30 / 310.88 ±9.76 / 320.07 ms │     no change │
│ QQuery 79 │     128.31 / 143.93 ±19.41 / 181.14 ms │      129.58 / 137.70 ±6.46 / 149.38 ms │     no change │
│ QQuery 80 │      244.34 / 247.17 ±3.06 / 251.39 ms │      238.09 / 246.64 ±5.70 / 255.58 ms │     no change │
│ QQuery 81 │         46.12 / 47.46 ±1.72 / 50.87 ms │         41.73 / 42.34 ±0.58 / 43.37 ms │ +1.12x faster │
│ QQuery 82 │       58.94 / 80.28 ±32.10 / 143.51 ms │         75.59 / 79.61 ±3.68 / 86.32 ms │     no change │
│ QQuery 83 │         58.79 / 59.77 ±0.75 / 60.97 ms │         60.30 / 64.82 ±2.99 / 69.01 ms │  1.08x slower │
│ QQuery 84 │         52.29 / 53.92 ±0.91 / 54.86 ms │         82.01 / 83.72 ±2.80 / 89.29 ms │  1.55x slower │
│ QQuery 85 │      144.44 / 150.00 ±6.49 / 162.41 ms │      141.19 / 149.09 ±5.37 / 157.78 ms │     no change │
│ QQuery 86 │         29.11 / 29.86 ±0.56 / 30.67 ms │         29.68 / 30.32 ±0.69 / 31.28 ms │     no change │
│ QQuery 87 │     109.36 / 118.01 ±11.80 / 141.38 ms │      111.57 / 114.45 ±3.50 / 121.32 ms │     no change │
│ QQuery 88 │      193.87 / 203.40 ±8.41 / 215.47 ms │      195.01 / 198.26 ±3.84 / 205.27 ms │     no change │
│ QQuery 89 │         69.33 / 70.57 ±1.07 / 72.13 ms │      122.21 / 124.14 ±1.36 / 125.81 ms │  1.76x slower │
│ QQuery 90 │         28.73 / 29.40 ±0.47 / 29.91 ms │         28.86 / 29.61 ±0.54 / 30.47 ms │     no change │
│ QQuery 91 │         54.42 / 54.63 ±0.16 / 54.87 ms │         63.82 / 67.05 ±3.41 / 73.06 ms │  1.23x slower │
│ QQuery 92 │        54.53 / 64.34 ±15.22 / 94.67 ms │         53.55 / 57.42 ±3.42 / 62.48 ms │ +1.12x faster │
│ QQuery 93 │      222.77 / 229.39 ±3.70 / 234.09 ms │      225.95 / 231.40 ±5.00 / 239.56 ms │     no change │
│ QQuery 94 │         66.05 / 68.77 ±2.21 / 71.23 ms │         65.54 / 68.55 ±3.93 / 76.10 ms │     no change │
│ QQuery 95 │      166.90 / 176.34 ±6.70 / 187.68 ms │      166.40 / 176.03 ±5.84 / 184.65 ms │     no change │
│ QQuery 96 │         50.59 / 51.64 ±0.87 / 52.93 ms │         51.23 / 52.00 ±0.70 / 53.17 ms │     no change │
│ QQuery 97 │        86.83 / 90.85 ±5.22 / 100.39 ms │         87.08 / 90.90 ±3.62 / 96.59 ms │     no change │
│ QQuery 98 │         69.03 / 72.56 ±3.31 / 76.52 ms │         67.84 / 70.86 ±2.73 / 75.86 ms │     no change │
│ QQuery 99 │     152.65 / 165.84 ±10.66 / 177.38 ms │      153.19 / 160.49 ±6.19 / 169.03 ms │     no change │
└───────────┴────────────────────────────────────────┴────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 32793.20ms │
│ Total Time (feat_sort-merge-join-collect-left)   │ 35464.19ms │
│ Average Time (HEAD)                              │   331.24ms │
│ Average Time (feat_sort-merge-join-collect-left) │   358.22ms │
│ Queries Faster                                   │          8 │
│ Queries Slower                                   │         34 │
│ Queries with No Change                           │         57 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘
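The Change column above can be reproduced from the middle figure in each cell, which appears to be the mean time. The sketch below is an illustration only: the function name `classify_change` and the 5% noise band are assumptions inferred from the rows shown here, not taken from the benchmark runner's source.

```rust
/// Classify a branch-vs-HEAD timing the way the "Change" column appears to.
/// ASSUMPTION: ratios within +/-5% are reported as "no change"; this band is
/// inferred from the table rows, not read from the benchmark runner.
fn classify_change(base_ms: f64, branch_ms: f64, noise_threshold: f64) -> String {
    let ratio = branch_ms / base_ms;
    if ratio >= 1.0 - noise_threshold && ratio <= 1.0 + noise_threshold {
        "no change".to_string()
    } else if ratio < 1.0 {
        // Branch is faster: report the speedup factor.
        format!("+{:.2}x faster", 1.0 / ratio)
    } else {
        // Branch is slower: report the slowdown factor.
        format!("{:.2}x slower", ratio)
    }
}

fn main() {
    // (query, HEAD mean ms, branch mean ms), copied from the tpcds_sf1 table.
    let rows = [
        (2, 82.82, 84.14),
        (3, 53.47, 81.34),
        (30, 52.79, 48.74),
        (57, 262.00, 625.62),
    ];
    for (query, base, branch) in rows {
        println!("Query {query}: {}", classify_change(base, branch, 0.05));
    }

    // Overall slowdown from the Benchmark Summary totals (branch / HEAD).
    let total_ratio = 35464.19_f64 / 32793.20_f64;
    println!("Total time ratio: {total_ratio:.3}");
}
```

Applied to the summary totals, the same ratio puts the branch at roughly 8% slower than HEAD on tpcds_sf1 overall, with the regression concentrated in a subset of queries rather than spread evenly.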

Resource Usage (tpcds)

Metric        base (merge-base)   branch
Wall time     165.0s              180.0s
Peak memory   19.4 GiB            19.2 GiB
Avg memory    6.3 GiB             5.6 GiB
CPU user      819.0s              818.4s
CPU sys       79.1s               82.8s
Peak spill    0 B                 0 B

@Dandandan marked this pull request as draft June 23, 2026 09:50
@Dandandan

Dandandan commented Jun 23, 2026

Contributor Author

Hmm, seems quite a bit slower in most cases (probably because merging is slow/single-threaded at the moment)

Labels

core (Core DataFusion crate), optimizer (Optimizer rules), physical-plan (Changes to the physical-plan crate), sqllogictest (SQL Logic Tests (.slt))

Development

Successfully merging this pull request may close these issues.

Add CollectLeft partition mode to SortMergeJoinExec

2 participants