Chunked dict values pullup #8577

Merged
gatesn merged 1 commit into develop from ngates/onpair-split-3-chunked-dict-values-pullup
Jun 25, 2026

@gatesn gatesn commented Jun 24, 2026

Contributor

Rationale for this change

Repeated dictionary chunks can share the exact same values array while keeping separate codes arrays. In that shape, later predicate and scalar-function pushdown works better if the common values array is above the chunked codes rather than repeated under every chunk. This adds that optimizer rewrite with a strict pointer-identity precondition.
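As a rough illustration of why the shape matters for predicate pushdown, here is a minimal sketch using plain slices and vectors rather than Vortex arrays. Every name in it (`dict_mask`, `mask_per_chunk`, `mask_pulled_up`, `compare_shapes`) is hypothetical and not part of this PR. It counts how often a predicate runs over the dictionary values in each shape.

```rust
use std::cell::Cell;

/// Test each distinct value once, then map the result through the codes.
fn dict_mask(values: &[&str], codes: &[u32], pred: &dyn Fn(&str) -> bool) -> Vec<bool> {
    let value_mask: Vec<bool> = values.iter().map(|&v| pred(v)).collect();
    codes.iter().map(|&c| value_mask[c as usize]).collect()
}

/// Chunked<Dict<codes_i, values>> shape: `values` is re-scanned for every chunk.
fn mask_per_chunk(values: &[&str], chunks: &[Vec<u32>], pred: &dyn Fn(&str) -> bool) -> Vec<bool> {
    chunks.iter().flat_map(|codes| dict_mask(values, codes, pred)).collect()
}

/// Dict<Chunked<codes_i>, values> shape: `values` is scanned once in total.
fn mask_pulled_up(values: &[&str], chunks: &[Vec<u32>], pred: &dyn Fn(&str) -> bool) -> Vec<bool> {
    let value_mask: Vec<bool> = values.iter().map(|&v| pred(v)).collect();
    chunks.iter().flatten().map(|&c| value_mask[c as usize]).collect()
}

/// Returns (predicate calls in the per-chunk shape,
///          predicate calls in the pulled-up shape,
///          whether both shapes produced the same mask).
fn compare_shapes() -> (usize, usize, bool) {
    let values = ["apple", "banana", "cherry"];
    let chunks: Vec<Vec<u32>> = vec![vec![0, 1, 1], vec![2, 0], vec![1, 2, 2, 0]];
    let calls = Cell::new(0usize);
    let pred = |v: &str| {
        calls.set(calls.get() + 1);
        v.starts_with('b')
    };
    let per_chunk = mask_per_chunk(&values, &chunks, &pred);
    let per_chunk_calls = calls.replace(0);
    let pulled_up = mask_pulled_up(&values, &chunks, &pred);
    (per_chunk_calls, calls.get(), per_chunk == pulled_up)
}
```

With three chunks over three distinct values, the per-chunk shape calls the predicate nine times and the pulled-up shape three times, while both produce the same row mask.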

No tracked issue.

What changes are included in this PR?

Adds an optimizer rule that rewrites Chunked<Dict<codes_i, values>> into Dict<Chunked<codes_i>, values> when all chunks are dictionaries with the same values allocation and compatible code dtypes. It also registers the parent kernel needed for dictionary-over-chunked execution and adds tests for both the shared-values rewrite and the distinct-values no-op case.
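The rewrite and its precondition can be sketched with toy types. This is not the Vortex implementation: `ToyDict`, `ToyDictOverChunked`, `pull_up_values`, and the decode helpers are hypothetical stand-ins, `Arc::ptr_eq` is assumed here as the way to model "same values allocation", and the code-dtype compatibility check is omitted because every toy code is a `u32`.

```rust
use std::sync::Arc;

/// Toy stand-in for one dictionary chunk: `codes` index into `values`.
#[derive(Clone, Debug)]
struct ToyDict {
    codes: Vec<u32>,
    values: Arc<Vec<String>>,
}

/// Toy stand-in for the rewritten shape: chunked codes over one values array.
#[derive(Debug)]
struct ToyDictOverChunked {
    chunked_codes: Vec<Vec<u32>>,
    values: Arc<Vec<String>>,
}

/// Pull the values above the chunks only when every chunk points at the same
/// allocation. Equal contents in a different allocation do not qualify.
fn pull_up_values(chunks: &[ToyDict]) -> Option<ToyDictOverChunked> {
    let first = chunks.first()?;
    if !chunks.iter().all(|c| Arc::ptr_eq(&c.values, &first.values)) {
        return None; // no-op case: leave the chunked shape untouched
    }
    Some(ToyDictOverChunked {
        chunked_codes: chunks.iter().map(|c| c.codes.clone()).collect(),
        values: Arc::clone(&first.values),
    })
}

/// Logical contents of the original shape.
fn decode_chunks(chunks: &[ToyDict]) -> Vec<String> {
    chunks
        .iter()
        .flat_map(|c| c.codes.iter().map(move |&i| c.values[i as usize].clone()))
        .collect()
}

/// Logical contents of the rewritten shape.
fn decode_pulled(d: &ToyDictOverChunked) -> Vec<String> {
    d.chunked_codes
        .iter()
        .flatten()
        .map(|&i| d.values[i as usize].clone())
        .collect()
}
```

Comparing pointers rather than contents keeps the check cheap, at the cost of skipping chunks whose values are equal but separately allocated. In the sketch, decoding either shape yields the same logical values.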

What APIs are changed? Are there any user-facing changes?

No public API changes. Optimized array shape may change internally, but logical array values are unchanged.

Note: I don't expect this to affect current develop, since we do not return ChunkedArrays from scans that use SplitBy::Layout.

@gatesn gatesn requested a review from a team June 24, 2026 15:19
@gatesn gatesn changed the title Ngates/onpair split 3 chunked dict values pullup Chunked dict values pullup Jun 24, 2026
@gatesn gatesn added the changelog/performance A performance improvement label Jun 24, 2026
Signed-off-by: Nicholas Gates <nick@nickgates.com>
@gatesn gatesn force-pushed the ngates/onpair-split-3-chunked-dict-values-pullup branch from 192484b to 7b1f5ed Compare June 24, 2026 15:26
codspeed-hq Bot commented Jun 24, 2026

Merging this PR will degrade performance by 12.69%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

⚡ 1 improved benchmark
❌ 4 regressed benchmarks
✅ 1584 untouched benchmarks
⏩ 4 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | chunked_bool_canonical_into[(1000, 10)] | 16.3 µs | 26.7 µs | -39.01% |
| Simulation | chunked_varbinview_canonical_into[(100, 100)] | 224.2 µs | 259.4 µs | -13.57% |
| Simulation | chunked_varbinview_into_canonical[(100, 100)] | 271.4 µs | 306.6 µs | -11.49% |
| Simulation | bitwise_not_vortex_buffer_mut[128] | 244.4 ns | 273.6 ns | -10.66% |
| Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 205.6 µs | 168.9 µs | +21.71% |

Comparing ngates/onpair-split-3-chunked-dict-values-pullup (7b1f5ed) with develop (2a19323)

Footnotes

  1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

pub(crate) fn initialize(session: &VortexSession) {
    let kernels = session.kernels();
    kernels.register_execute_parent_kernel(Binary.id(), Dict, CompareExecuteAdaptor(Dict));
    kernels.register_execute_parent_kernel(Dict.id(), Chunked, TakeExecuteAdaptor(Chunked));

Contributor

Did this kernel exist before but was never registered?

@gatesn gatesn merged commit 3493ecb into develop Jun 25, 2026
87 of 88 checks passed
@gatesn gatesn deleted the ngates/onpair-split-3-chunked-dict-values-pullup branch June 25, 2026 13:06
Labels

changelog/performance A performance improvement

3 participants