Chunked dict values pullup by gatesn · Pull Request #8577 · vortex-data/vortex

gatesn · 2026-06-24T15:19:23Z

Rationale for this change

Repeated dictionary chunks can share the exact same values array while keeping separate codes arrays. In that shape, later predicate and scalar-function pushdown works better if the common values array sits above the chunked codes rather than being repeated under every chunk. This PR adds that optimizer rewrite, guarded by a strict pointer-identity precondition.
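To make the pushdown argument concrete, here is a minimal Rust sketch under a toy model: a dictionary column is a slice of `&str` values plus `u32` codes. The helpers (`eval_values`, `gather`, `filter_per_chunk`, `filter_pulled_up`) are hypothetical and are not Vortex APIs. The sketch counts predicate evaluations: with the values repeated under every chunk, a pushed-down predicate runs once per chunk per value; with the values pulled up, it runs once per value.

```rust
// Toy model (hypothetical, not Vortex APIs): a dictionary column is a slice of
// values plus u32 codes that index into it.

/// Evaluate `pred` once per dictionary value, producing a mask over the values.
fn eval_values(values: &[&str], pred: impl Fn(&str) -> bool) -> Vec<bool> {
    values.iter().map(|&v| pred(v)).collect()
}

/// Map a value-level mask to a row-level mask through the codes.
fn gather(value_mask: &[bool], codes: &[u32]) -> Vec<bool> {
    codes.iter().map(|&c| value_mask[c as usize]).collect()
}

/// Shape Chunked<Dict<codes_i, values>>: the predicate is pushed into every
/// chunk, so the shared values are re-evaluated per chunk.
/// Returns the row mask and the number of predicate evaluations.
fn filter_per_chunk(
    values: &[&str],
    chunks: &[Vec<u32>],
    pred: impl Fn(&str) -> bool,
) -> (Vec<bool>, usize) {
    let mut mask = Vec::new();
    let mut evals = 0;
    for codes in chunks {
        let value_mask = eval_values(values, &pred);
        evals += values.len();
        mask.extend(gather(&value_mask, codes));
    }
    (mask, evals)
}

/// Shape Dict<Chunked<codes_i>, values>: the predicate runs once over the
/// shared values and the result is gathered through each chunk of codes.
fn filter_pulled_up(
    values: &[&str],
    chunks: &[Vec<u32>],
    pred: impl Fn(&str) -> bool,
) -> (Vec<bool>, usize) {
    let value_mask = eval_values(values, pred);
    let mask: Vec<bool> = chunks.iter().flat_map(|codes| gather(&value_mask, codes)).collect();
    (mask, values.len())
}

fn main() {
    let values = ["apple", "banana", "cherry"];
    let chunks = vec![vec![0, 1, 2, 1], vec![2, 2, 0], vec![1, 0]];
    let starts_with_b = |v: &str| v.starts_with('b');

    let (mask_a, evals_a) = filter_per_chunk(&values, &chunks, starts_with_b);
    let (mask_b, evals_b) = filter_pulled_up(&values, &chunks, starts_with_b);

    assert_eq!(mask_a, mask_b); // same logical result in both shapes
    // prints: per-chunk evaluations: 9, pulled-up evaluations: 3
    println!("per-chunk evaluations: {evals_a}, pulled-up evaluations: {evals_b}");
}
```

With three chunks over a three-entry dictionary, the per-chunk shape evaluates the predicate nine times and the pulled-up shape three times, while producing the same row mask.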

No tracked issue.

What changes are included in this PR?

Adds an optimizer rule that rewrites Chunked<Dict<codes_i, values>> into Dict<Chunked<codes_i>, values> when all chunks are dictionaries with the same values allocation and compatible code dtypes. It also registers the parent kernel needed for dictionary-over-chunked execution and adds tests for both the shared-values rewrite and the distinct-values no-op case.
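The rewrite and its precondition can be sketched in Rust as follows. This is a toy model, not Vortex's array types or optimizer API: `Array`, `CodeDType`, `pull_up_dict_values`, and `decode` are hypothetical, "same values allocation" is modeled with `Arc::ptr_eq`, and "compatible code dtypes" is simplified to "identical code dtypes".

```rust
use std::sync::Arc;

/// Simplified code dtype (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CodeDType {
    U8,
    U16,
}

/// Toy array model (hypothetical, not Vortex's array types).
#[derive(Debug)]
enum Array {
    Codes { dtype: CodeDType, data: Vec<u64> },
    Values(Vec<String>),
    Dict { codes: Arc<Array>, values: Arc<Array> },
    Chunked(Vec<Arc<Array>>),
}

/// Rewrite Chunked<Dict<codes_i, values>> into Dict<Chunked<codes_i>, values>.
/// Fires only if every chunk is a Dict, all chunks point at the same values
/// allocation (pointer identity, not value equality), and all code dtypes match.
/// Returns None, meaning "leave the array as-is", otherwise.
fn pull_up_dict_values(chunks: &[Arc<Array>]) -> Option<Array> {
    let mut shared_values: Option<&Arc<Array>> = None;
    let mut code_dtype: Option<CodeDType> = None;
    let mut codes = Vec::with_capacity(chunks.len());

    for chunk in chunks {
        let Array::Dict { codes: c, values: v } = &**chunk else {
            return None; // a non-dictionary chunk blocks the rewrite
        };
        match shared_values {
            None => shared_values = Some(v),
            Some(s) if Arc::ptr_eq(s, v) => {}
            Some(_) => return None, // distinct values allocation: no-op
        }
        let Array::Codes { dtype, .. } = &**c else { return None };
        match code_dtype {
            None => code_dtype = Some(*dtype),
            Some(d) if d == *dtype => {}
            Some(_) => return None, // mismatched code dtypes: no-op
        }
        codes.push(Arc::clone(c));
    }

    // `?` also covers the empty case: no chunks means nothing to pull up.
    let values = Arc::clone(shared_values?);
    Some(Array::Dict { codes: Arc::new(Array::Chunked(codes)), values })
}

/// Logical decode, used only to check that the rewrite preserves values.
fn decode(a: &Array) -> Vec<String> {
    match a {
        Array::Values(v) => v.clone(),
        Array::Dict { codes, values } => {
            let vals = decode(values);
            decode_codes(codes).iter().map(|&c| vals[c as usize].clone()).collect()
        }
        Array::Chunked(chunks) => chunks.iter().flat_map(|c| decode(c)).collect(),
        Array::Codes { .. } => panic!("codes only have meaning under a Dict"),
    }
}

fn decode_codes(a: &Array) -> Vec<u64> {
    match a {
        Array::Codes { data, .. } => data.clone(),
        Array::Chunked(chunks) => chunks.iter().flat_map(|c| decode_codes(c)).collect(),
        _ => panic!("expected codes"),
    }
}

fn main() {
    let values = Arc::new(Array::Values(vec!["a".into(), "b".into()]));
    let dict = |data: Vec<u64>, v: &Arc<Array>| {
        Arc::new(Array::Dict {
            codes: Arc::new(Array::Codes { dtype: CodeDType::U8, data }),
            values: Arc::clone(v),
        })
    };

    // Chunks share one values allocation: the rule fires and logical values are unchanged.
    let shared = vec![dict(vec![0, 1], &values), dict(vec![1, 1, 0], &values)];
    let rewritten = pull_up_dict_values(&shared).expect("shared values should be pulled up");
    assert_eq!(decode(&rewritten), decode(&Array::Chunked(shared.clone())));

    // Equal contents in a different allocation: pointer identity fails, so no-op.
    let copy = Arc::new(Array::Values(vec!["a".into(), "b".into()]));
    let distinct = vec![dict(vec![0], &values), dict(vec![1], &copy)];
    assert!(pull_up_dict_values(&distinct).is_none());
}
```

Checking allocation identity rather than value equality keeps the precondition O(1) per chunk in this sketch, and two equal-but-distinct values arrays are left alone, matching the distinct-values no-op case described above.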

What APIs are changed? Are there any user-facing changes?

No public API changes. Optimized array shape may change internally, but logical array values are unchanged.

Note: I don't expect this to impact the current develop branch, since we do not return ChunkedArrays from scans that use SplitBy::Layout.

Signed-off-by: Nicholas Gates <nick@nickgates.com>

codspeed-hq · 2026-06-24T15:27:43Z

Merging this PR will degrade performance by 12.69%

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.


⚡ 1 improved benchmark
❌ 4 regressed benchmarks
✅ 1584 untouched benchmarks
⏩ 4 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Status	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`chunked_bool_canonical_into[(1000, 10)]`	16.3 µs	26.7 µs	-39.01%
❌	Simulation	`chunked_varbinview_canonical_into[(100, 100)]`	224.2 µs	259.4 µs	-13.57%
❌	Simulation	`chunked_varbinview_into_canonical[(100, 100)]`	271.4 µs	306.6 µs	-11.49%
❌	Simulation	`bitwise_not_vortex_buffer_mut[128]`	244.4 ns	273.6 ns	-10.66%
⚡	Simulation	`chunked_varbinview_into_canonical[(1000, 10)]`	205.6 µs	168.9 µs	+21.71%
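The figures in this table are consistent with Efficiency = BASE / HEAD - 1 (so a positive value means HEAD is faster), and the 12.69% headline matches the geometric mean of the five changed benchmarks' ratios. Both relationships are inferred from the numbers here, not taken from CodSpeed documentation; the Rust sketch below only reproduces that arithmetic.

```rust
/// (benchmark, BASE, HEAD, reported Efficiency %) copied from the table above.
/// BASE and HEAD share a unit within each row (µs, except the ns row).
const ROWS: [(&str, f64, f64, f64); 5] = [
    ("chunked_bool_canonical_into[(1000, 10)]", 16.3, 26.7, -39.01),
    ("chunked_varbinview_canonical_into[(100, 100)]", 224.2, 259.4, -13.57),
    ("chunked_varbinview_into_canonical[(100, 100)]", 271.4, 306.6, -11.49),
    ("bitwise_not_vortex_buffer_mut[128]", 244.4, 273.6, -10.66),
    ("chunked_varbinview_into_canonical[(1000, 10)]", 205.6, 168.9, 21.71),
];

/// Assumed definition: relative speed change, BASE / HEAD - 1, in percent.
fn efficiency_pct(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}

/// Geometric mean of the per-benchmark ratios (1 + pct / 100), as a percent change.
fn geomean_change_pct(pcts: &[f64]) -> f64 {
    let mean_log = pcts.iter().map(|&p| (1.0 + p / 100.0).ln()).sum::<f64>() / pcts.len() as f64;
    (mean_log.exp() - 1.0) * 100.0
}

fn main() {
    for &(name, base, head, reported) in ROWS.iter() {
        // Table times are rounded, so the computed value can differ from the
        // reported one by up to about 0.1 percentage points.
        println!("{name}: computed {:+.2}%, reported {:+.2}%", efficiency_pct(base, head), reported);
    }
    let reported: Vec<f64> = ROWS.iter().map(|r| r.3).collect();
    // prints: geometric mean of reported changes: -12.69%
    println!("geometric mean of reported changes: {:+.2}%", geomean_change_pct(&reported));
}
```

A geometric mean weights a 2x slowdown and a 2x speedup symmetrically, which is why the single +21.71% improvement only partly offsets the four regressions in the headline figure.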

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

Comparing ngates/onpair-split-3-chunked-dict-values-pullup (7b1f5ed) with develop (2a19323)

¹ 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

robert3005 · 2026-06-25T10:22:42Z

 pub(crate) fn initialize(session: &VortexSession) {
    let kernels = session.kernels();
    kernels.register_execute_parent_kernel(Binary.id(), Dict, CompareExecuteAdaptor(Dict));
+    kernels.register_execute_parent_kernel(Dict.id(), Chunked, TakeExecuteAdaptor(Chunked));


Did this exist before and was not registered?
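The question hinges on the difference between a kernel existing and a kernel being registered. A minimal sketch of that distinction, assuming a registry keyed by (parent encoding, child encoding) with a fallback path: `KernelRegistry`, `register`, `execute_parent`, `chunked_take_kernel`, and the string encoding IDs are all hypothetical and only mirror the shape of the call in the diff, not Vortex's actual kernel types.

```rust
use std::collections::HashMap;

/// Hypothetical encoding identifier; Vortex's real IDs and kernel types differ.
type EncodingId = &'static str;

/// A kernel that executes a parent array on behalf of one of its children.
/// Returning None means "decline", so the caller falls back to default execution.
type ParentKernel = fn(parent: EncodingId, child: EncodingId) -> Option<String>;

#[derive(Default)]
struct KernelRegistry {
    parent_kernels: HashMap<(EncodingId, EncodingId), ParentKernel>,
}

impl KernelRegistry {
    /// Make `kernel` discoverable for a (parent encoding, child encoding) pair.
    fn register(&mut self, parent: EncodingId, child: EncodingId, kernel: ParentKernel) {
        self.parent_kernels.insert((parent, child), kernel);
    }

    /// Dispatch on the (parent, child) pair. A kernel that exists but was never
    /// registered is invisible here, so execution silently takes the fallback.
    fn execute_parent(&self, parent: EncodingId, child: EncodingId) -> String {
        self.parent_kernels
            .get(&(parent, child))
            .and_then(|kernel| kernel(parent, child))
            .unwrap_or_else(|| format!("default execution of {parent}"))
    }
}

/// Stand-in for a take-style kernel implemented by the chunked encoding.
fn chunked_take_kernel(parent: EncodingId, child: EncodingId) -> Option<String> {
    Some(format!("{parent} executed via {child} take kernel"))
}

fn main() {
    let mut registry = KernelRegistry::default();
    // The kernel function exists, but until it is registered it is never used.
    assert_eq!(registry.execute_parent("dict", "chunked"), "default execution of dict");

    registry.register("dict", "chunked", chunked_take_kernel);
    assert_eq!(registry.execute_parent("dict", "chunked"), "dict executed via chunked take kernel");
}
```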

gatesn requested a review from a team June 24, 2026 15:19

gatesn changed the title from "Ngates/onpair split 3 chunked dict values pullup" to "Chunked dict values pullup" Jun 24, 2026

gatesn added the changelog/performance A performance improvement label Jun 24, 2026

Pull up shared dictionary values from chunks

7b1f5ed

Signed-off-by: Nicholas Gates <nick@nickgates.com>

gatesn force-pushed the ngates/onpair-split-3-chunked-dict-values-pullup branch from 192484b to 7b1f5ed June 24, 2026 15:26

lwwmanning approved these changes Jun 24, 2026

robert3005 reviewed Jun 25, 2026

robert3005 approved these changes Jun 25, 2026

gatesn merged commit 3493ecb into develop Jun 25, 2026
87 of 88 checks passed

gatesn deleted the ngates/onpair-split-3-chunked-dict-values-pullup branch June 25, 2026 13:06

gatesn merged 1 commit into develop from ngates/onpair-split-3-chunked-dict-values-pullup