fix: out-of-bounds write in fsst decompress when data compresses better than 3:1 #7323

Closed
LuciferYang wants to merge 1 commit into lance-format:main from LuciferYang:fix/fsst-decompress-oob-7266
@LuciferYang

Contributor

Problem

fsst::decompress enforces a minimum output buffer of 3 * input, but that bound is unsound. A code can decode to an 8-byte symbol, and decompress_bulk writes a full 8-byte word per code through raw pointers, so the decoded output can be as large as 8 * input. When data compresses better than 3:1, decoding into a buffer sized at the documented minimum runs past the end of the allocation: a segfault under a strict allocator, an out-of-bounds write under Miri/ASan.

In-tree callers happen to allocate 8 * input, so they were never affected. Only the public contract was wrong.
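The expansion bound described above can be checked with a toy model. This is a minimal sketch, not the crate's decoder: `toy_decode` and its symbol table are hypothetical, and it assumes only what the description states, namely that one input code byte can expand to a symbol of up to 8 bytes.

```rust
/// Hypothetical toy decoder: code `i` expands to `symbols[i]`.
/// Every symbol is assumed to be at most 8 bytes long.
fn toy_decode(codes: &[u8], symbols: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for &code in codes {
        out.extend_from_slice(symbols[code as usize]);
    }
    out
}

fn main() {
    // One 8-byte symbol, repeated: the best case for compression and
    // the worst case for the size of the decoded output.
    let symbols: [&[u8]; 1] = [b"abcdefgh"];
    let codes = vec![0u8; 100];

    let decoded = toy_decode(&codes, &symbols);

    // The output is 8x the input, far past a 3x buffer.
    assert_eq!(decoded.len(), 8 * codes.len());
    assert!(decoded.len() > 3 * codes.len());
}
```

With 100 code bytes the decoded output is 800 bytes, so a buffer sized at the old 3x minimum (300 bytes) would be overrun by any decoder that writes through raw pointers without bounds checks.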

Fix

decompress now grows its (owned) output buffer to 8 * in_buf.len() + 8 before decoding, instead of requiring 3x. The + 8 is one word of headroom for the final speculative write, which lands at out_curr + 8 even when the last symbol is shorter than 8 bytes. The buffer is shrunk back to the exact decoded length afterwards, so callers see no change.
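The grow, decode, shrink sequence can be sketched in safe Rust. This is an illustration under stated assumptions rather than the patched function: `decode_speculative` and the `(word, len)` table layout are hypothetical, and the full-word write per code only imitates what the description says `decompress_bulk` does through raw pointers.

```rust
/// Hypothetical sketch of the sizing strategy: grow the owned buffer to
/// `8 * input + 8`, write a full 8-byte word per code, advance by the real
/// symbol length, then shrink to the decoded length.
/// Each table entry is (symbol padded to 8 bytes, real length in 1..=8).
fn decode_speculative(codes: &[u8], table: &[([u8; 8], usize)], out: &mut Vec<u8>) {
    // Worst case (8 bytes per code) plus one word of headroom.
    let needed = codes
        .len()
        .checked_mul(8)
        .and_then(|n| n.checked_add(8))
        .expect("output size overflows usize");
    if out.len() < needed {
        out.resize(needed, 0);
    }

    let mut pos = 0;
    for &code in codes {
        let (word, len) = table[code as usize];
        // Speculative write: always a full word, even for short symbols.
        out[pos..pos + 8].copy_from_slice(&word);
        pos += len;
    }

    // Shrink back so the caller sees only the decoded bytes.
    out.truncate(pos);
}

fn main() {
    let table = [(*b"abcdefgh", 8), (*b"xy\0\0\0\0\0\0", 2)];
    let mut out = Vec::new();
    decode_speculative(&[0, 1, 0], &table, &mut out);
    assert_eq!(out, b"abcdefghxyabcdefgh");
}
```

Because the slice writes here are bounds-checked, an undersized buffer would panic instead of corrupting memory; the unchecked pointer writes described in the PR have no such safety net, which is why the bound has to be established before decoding starts.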

That bound only holds while every symbol is at most 8 bytes. Since the symbol table is read from untrusted on-disk data, init now rejects any table carrying a symbol length above 8 (InvalidData) rather than letting it break the bound.
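A validation step of this kind might look like the sketch below. The function name, the slice-of-lengths input, and the use of `std::io::Error` are assumptions for illustration; the PR only states that `init` rejects lengths above 8 with `InvalidData`.

```rust
use std::io::{Error, ErrorKind, Result};

/// Largest symbol length the 8x output bound can tolerate.
const MAX_SYMBOL_LEN: u8 = 8;

/// Hypothetical check over the symbol-length bytes of a table read from disk.
/// Returns `InvalidData` for the first length that exceeds the maximum.
fn validate_symbol_lengths(lens: &[u8]) -> Result<()> {
    for (index, &len) in lens.iter().enumerate() {
        if len > MAX_SYMBOL_LEN {
            return Err(Error::new(
                ErrorKind::InvalidData,
                format!(
                    "symbol {} has length {}, maximum is {}",
                    index, len, MAX_SYMBOL_LEN
                ),
            ));
        }
    }
    Ok(())
}

fn main() {
    assert!(validate_symbol_lengths(&[1, 8, 3]).is_ok());

    // A corrupted length byte is rejected before any decoding happens.
    let err = validate_symbol_lengths(&[1, 9]).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::InvalidData);
}
```

Rejecting the table up front keeps the size computation and the validity check in one place, so the decoder's inner loop can rely on the 8-byte limit without rechecking it per code.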

Tests

  • test_decompress_better_than_3x_does_not_overflow and its _64_bit_offsets variant compress highly repetitive input (about 8:1) and decode it into a 3x buffer, checking a correct round trip for both 32- and 64-bit offsets. Both fail before the fix.
  • test_decompress_rejects_oversized_symbol_length corrupts a symbol length byte and expects InvalidData.

cargo test -p fsst, cargo clippy -p fsst --tests -- -D warnings, and cargo fmt --all pass.

Closes #7266.

@github-actions github-actions Bot added the bug Something isn't working label Jun 17, 2026
@github-actions

Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

…er than 3:1

`decompress` required the output buffer to be at least 3x the input, but a
code can decode to an 8-byte symbol and `decompress_bulk` writes a full
8-byte word per code, so the decoded output can be up to 8x the input.
Inputs that compress better than 3:1 overran the buffer (segfault under a
strict allocator, out-of-bounds write under Miri/ASan). In-tree callers
already allocate 8x, so only the public API contract was wrong.

Grow the owned output buffer to `8 * in_buf.len() + 8` before decoding (the
8x worst case plus one word of headroom for the final speculative write)
instead of enforcing the 3x minimum; it is shrunk back to the decoded
length afterwards, so callers see no change. Also reject symbol tables
whose symbol length exceeds 8 in `init`: the table is read from untrusted
on-disk data and a larger length would break the same bound.
@LuciferYang LuciferYang force-pushed the fix/fsst-decompress-oob-7266 branch from 0727889 to 5f238ff Compare June 17, 2026 16:03
@LuciferYang LuciferYang changed the title fix: FSST decompress writes out of bounds when data compresses better than 3:1 fix: out-of-bounds write in fsst decompress when data compresses better than 3:1 Jun 17, 2026
@codecov

codecov Bot commented Jun 17, 2026


Codecov Report

❌ Patch coverage is 96.47059% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/compression/fsst/src/fsst.rs | 96.47% | 2 Missing and 1 partial ⚠️ |

@LuciferYang

Contributor Author

Closing this one, since #7267 was opened before it.
