Chunked dict values pullup#8577
Conversation
Signed-off-by: Nicholas Gates <nick@nickgates.com>
192484b to
7b1f5ed
Compare
Merging this PR will degrade performance by 12.69%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] |
16.3 µs | 26.7 µs | -39.01% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] |
224.2 µs | 259.4 µs | -13.57% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(100, 100)] |
271.4 µs | 306.6 µs | -11.49% |
| ❌ | Simulation | bitwise_not_vortex_buffer_mut[128] |
244.4 ns | 273.6 ns | -10.66% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] |
205.6 µs | 168.9 µs | +21.71% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ngates/onpair-split-3-chunked-dict-values-pullup (7b1f5ed) with develop (2a19323)
Footnotes
-
4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
| pub(crate) fn initialize(session: &VortexSession) { | ||
| let kernels = session.kernels(); | ||
| kernels.register_execute_parent_kernel(Binary.id(), Dict, CompareExecuteAdaptor(Dict)); | ||
| kernels.register_execute_parent_kernel(Dict.id(), Chunked, TakeExecuteAdaptor(Chunked)); |
There was a problem hiding this comment.
Did this exist before and was not registered?
Rational for this change
Repeated dictionary chunks can share the exact same values array while keeping separate codes arrays. In that shape, later predicate and scalar-function pushdown works better if the common values array is above the chunked codes rather than repeated under every chunk. This adds that optimizer rewrite with a strict pointer-identity precondition.
No tracked issue.
What changes are included in this PR?
Adds an optimizer rule that rewrites
Chunked<Dict<codes_i, values>>intoDict<Chunked<codes_i>, values>when all chunks are dictionaries with the same values allocation and compatible code dtypes. It also registers the parent kernel needed for dictionary-over-chunked execution and adds tests for both the shared-values rewrite and the distinct-values no-op case.What APIs are changed? Are there any user-facing changes?
No public API changes. Optimized array shape may change internally, but logical array values are unchanged.
Note: I don't expect this to impact current develop since we do not return ChunkedArrays from scans that use SplitBy::Layout