GH-50140: [C++][Gandiva] Fix castVARCHAR(decimal128) native memory corruption / SIGSEGV on allocation failure #50141

Open
lriggs wants to merge 2 commits into apache:main from lriggs:DX-116032-gandiva-castvarchar-decimal-crash

@lriggs lriggs commented Jun 9, 2026

Contributor

Rationale for this change

The Gandiva castVARCHAR_decimal128_int64 path could corrupt native memory and
crash the process (SIGSEGV) when the output-string arena allocation failed
(e.g. CAST(decimal AS VARCHAR) under memory pressure). Three independent
defects combined to cause this:

  1. The castVARCHAR decimal128 registry entry was missing
    NativeFunction::kCanReturnErrors, so generated code skipped the error check
    and ignored any error the function reported.
  2. gdv_fn_dec_to_string set the output length to a positive value before
    checking whether the allocation succeeded, then returned nullptr — leaving
    the caller to copy from an invalid buffer with a positive length.
  3. castVARCHAR_decimal128_int64 did not validate a negative requested output
    length and did not handle an upstream allocation failure.
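
The second defect is an ordering problem, which a small sketch can make concrete. The code below is not the Gandiva source: `SketchArena`, `ToStringLengthFirst`, and `ToStringCheckedFirst` are hypothetical names, and the allocator is a stand-in that can be told to fail. It only contrasts publishing the length before the allocation check with publishing it after.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical allocator that can be told to fail; NOT Gandiva's arena API.
struct SketchArena {
  bool fail = false;
  char storage[64] = {};

  char* Alloc(int32_t size) {
    if (fail || size < 0 || size > static_cast<int32_t>(sizeof(storage))) {
      return nullptr;
    }
    return storage;
  }
};

// Ordering from defect 2: the length is published before the allocation is
// checked, so a failure leaves the caller with nullptr and a positive length.
inline const char* ToStringLengthFirst(SketchArena& arena, const std::string& text,
                                       int32_t* out_len) {
  *out_len = static_cast<int32_t>(text.size());
  char* buf = arena.Alloc(*out_len);
  if (buf == nullptr) return nullptr;
  std::memcpy(buf, text.data(), text.size());
  return buf;
}

// Ordering described in the fix: the length is written only after a
// successful allocation; on failure the caller gets length 0 and "".
inline const char* ToStringCheckedFirst(SketchArena& arena, const std::string& text,
                                        int32_t* out_len) {
  const int32_t needed = static_cast<int32_t>(text.size());
  char* buf = arena.Alloc(needed);
  if (buf == nullptr) {
    *out_len = 0;
    return "";
  }
  std::memcpy(buf, text.data(), text.size());
  *out_len = needed;
  return buf;
}
```

With the first ordering, a caller that trusts the length would copy 5 bytes from a null pointer; with the second, the length and the pointer always agree.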

What changes are included in this PR?

  • function_registry_string.cc: Add NativeFunction::kCanReturnErrors to the
    castVARCHAR decimal128 entry so the generated code checks for and
    propagates errors instead of assuming the function never fails.

  • gdv_function_stubs.cc (gdv_fn_dec_to_string): Only write the output
    length after a successful allocation. On allocation failure, set
    *dec_str_len = 0 and return an empty string so callers never copy from an
    invalid buffer using a stale, positive length.

  • precompiled/decimal_wrapper.cc (castVARCHAR_decimal128_int64):

    • Reject a negative output length with a graceful error
      ("Output buffer length can't be negative") instead of using it as a copy
      size.
    • Bail out safely (zero length, empty string) if the upstream
      gdv_fn_dec_to_string call failed, since the error has already been set.
  • tests/decimal_test.cc: Add TestCastVarCharDecimalNegativeLength, a
    regression test that casts a decimal to varchar with a negative output length
    and asserts the query fails gracefully with the expected error message rather
    than crashing. This also exercises the kCanReturnErrors flag — without it the
    error would not propagate and the test would fail.
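
The two caller-side guards listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in decimal_wrapper.cc: `SketchContext`, `UpstreamToString`, and `CastToVarcharSketch` are hypothetical names, the upstream error text is invented, and truncating the result to the requested length is assumed behavior. Only the "Output buffer length can't be negative" message comes from the PR description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical error holder standing in for the execution context.
struct SketchContext {
  std::string error;
  bool has_error() const { return !error.empty(); }
};

// Hypothetical upstream conversion. When `simulate_oom` is true it records an
// error (invented message), sets the length to 0, and returns nullptr.
inline const char* UpstreamToString(SketchContext& ctx, bool simulate_oom,
                                    int32_t* len) {
  static const char kText[] = "12.34";
  if (simulate_oom) {
    ctx.error = "allocation failed (hypothetical message)";
    *len = 0;
    return nullptr;
  }
  *len = 5;
  return kText;
}

inline const char* CastToVarcharSketch(SketchContext& ctx, bool simulate_oom,
                                       int64_t requested_len, int32_t* out_len) {
  // Guard 1: never use a negative length as a copy size.
  if (requested_len < 0) {
    ctx.error = "Output buffer length can't be negative";
    *out_len = 0;
    return "";
  }
  int32_t full_len = 0;
  const char* text = UpstreamToString(ctx, simulate_oom, &full_len);
  // Guard 2: upstream already set the error; return an empty result.
  if (text == nullptr) {
    *out_len = 0;
    return "";
  }
  // Assumed behavior: truncate to the requested length.
  *out_len = static_cast<int32_t>(std::min<int64_t>(requested_len, full_len));
  return text;
}
```

In this sketch the caller is expected to check `has_error()` after the call, which is the role the PR attributes to generated code once kCanReturnErrors is set on the registry entry.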

Behavior change

Queries such as CAST(decimal AS VARCHAR) that previously crashed the process
(SIGSEGV) under memory pressure now fail gracefully with an error message about
the allocation failure / invalid length, and the rest of the system is
unaffected.

Are these changes tested?

Yes, unit tests.

Are there any user-facing changes?

No.

github-actions Bot commented Jun 9, 2026

⚠️ GitHub issue #50140 has been automatically assigned in GitHub to PR creator.

@kou kou changed the title from "GH-50140 Fix castVARCHAR(decimal128) native memory corruption / SIGSEGV on allocation failure" to "GH-50140: [C++][Gandiva] Fix castVARCHAR(decimal128) native memory corruption / SIGSEGV on allocation failure" Jun 10, 2026
@@ -423,11 +423,23 @@ FORCE_INLINE
char* castVARCHAR_decimal128_int64(int64_t context, int64_t x_high, uint64_t x_low,

The function name is somewhat misleading. Intuitively, I thought it cast a varchar to a decimal, but it looks like it converts a decimal and an int64 to a varchar.

if (dec_str == nullptr) {
  // Allocation failed upstream; error message is already set. Avoid copying from
  // an invalid buffer with a non-zero length.
  *out_length = 0;

Will the result be NULL? I'm curious whether this is the case for CAST(CAST(NULL AS DECIMAL) AS VARCHAR).

Contributor Author

Yes, it will be null. This nullptr is set when the memory allocation for the output string fails. The null part is actually handled at another level using the validity bits.

Null decimal input is handled correctly, and the handling is actually cleaner now than before.

Tracing the slow path (now active because we added kCanReturnErrors) in llvm_generator.cc:797-832:

For a null input, is_valid is false, so BuildIfElse takes the else_lambda (lines 821-829):

  • castVARCHAR_decimal128_int64 is not called at all.
  • The else branch returns a dummy value: else_value = NullConstant(...) and, because utf8() is binary-like, else_value_len = i32_constant(0).
  • Separately, the result validity is computed by the validity dex as the AND of the input validities (kResultNullIfNull, per expr_decomposer.cc:92). A null input → result validity bit = 0 → the output row is marked null, and the dummy value/length are masked out.

So the consumer sees a proper null. Two things worth noting:

  1. Correctness was the same before our fix: the fast path also produced a null output for null rows (via the validity mask); the only difference was that it ran the function on garbage x_high/x_low and threw the result away. Now it's genuinely skipped.

  2. No risk from the 0-length dummy: the else branch returns a null pointer with length 0, and since the row is invalid in the bitmap, nothing reads it. This is unrelated to the arena 0-size behavior we just discussed (no arena_malloc call happens on this path at all).

So: null input → function skipped → output null, exactly as expected.
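
As a rough illustration of the null handling traced above, here is a simplified sketch. It is not generated LLVM code or any Gandiva API: `SketchRow`, `SketchResult`, `CountedFn`, and `EvaluateSkippingNulls` are hypothetical names, and plain integers stand in for decimal and varchar values. It shows the function being skipped for invalid rows while the validity bit, not the dummy value, decides what the consumer sees.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical input row: a validity flag plus a value that may be garbage
// when the row is null.
struct SketchRow {
  bool valid;
  int64_t value;
};

struct SketchResult {
  std::vector<bool> validity;   // result validity, copied from input validity
  std::vector<int64_t> values;  // dummy 0 for null rows
  int calls = 0;                // how many times the function actually ran
};

// Stand-in for the per-row function; counts its invocations.
inline int64_t CountedFn(int64_t v, int* calls) {
  ++*calls;
  return v * 2;
}

inline SketchResult EvaluateSkippingNulls(const std::vector<SketchRow>& rows) {
  SketchResult out;
  for (const SketchRow& row : rows) {
    // With a single input, the AND of input validities is the input validity.
    out.validity.push_back(row.valid);
    if (row.valid) {
      out.values.push_back(CountedFn(row.value, &out.calls));
    } else {
      // Else branch: function not called, dummy value emitted and masked
      // out by the validity bit.
      out.values.push_back(0);
    }
  }
  return out;
}
```

A consumer that honors the validity vector never reads the dummy value, so its content does not matter.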

@github-actions github-actions Bot added the "awaiting committer review" label and removed the "awaiting review" label Jun 10, 2026