fix: out-of-bounds write in fsst decompress when data compresses better than 3:1 by LuciferYang · Pull Request #7323 · lance-format/lance

LuciferYang · 2026-06-17T16:00:09Z

Problem

`fsst::decompress` enforces a minimum output buffer of `3 * input`, but that bound is unsound. A code can decode to an 8-byte symbol, and `decompress_bulk` writes a full 8-byte word per code through raw pointers, so the decoded output can be as large as `8 * input`. When data compresses better than 3:1, decoding into a buffer sized at the documented minimum runs past the end of the allocation: a segfault under a strict allocator, an out-of-bounds write under Miri/ASan.

In-tree callers happen to allocate 8 * input, so they were never affected. Only the public contract was wrong.
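To make the arithmetic concrete, here is a minimal sketch in safe Rust. It is not the crate's decoder: `toy_decode`, `exceeds_3x_bound`, and the one-byte-per-code layout are hypothetical names and simplifications chosen for illustration. It only shows that four one-byte codes can expand to 32 bytes, which is more than a 3x buffer (12 bytes) can hold but still within the 8x worst case.

```rust
/// Maximum symbol length assumed in this sketch (the PR states symbols are at most 8 bytes).
const MAX_SYMBOL_LEN: usize = 8;

/// Hypothetical toy decoder: every input byte is a code that indexes `symbols`.
/// It uses a growable Vec, so it cannot overrun; it only measures the expansion.
fn toy_decode(codes: &[u8], symbols: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for &code in codes {
        out.extend_from_slice(symbols[code as usize]);
    }
    out
}

/// Returns true when a buffer of `3 * input_len` bytes is too small for the decoded data.
fn exceeds_3x_bound(input_len: usize, decoded_len: usize) -> bool {
    decoded_len > 3 * input_len
}
```

With a single 8-byte symbol and four codes, `toy_decode` returns 32 bytes and `exceeds_3x_bound(4, 32)` is true, while `32 <= MAX_SYMBOL_LEN * 4` still holds.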

Fix

`decompress` now grows its (owned) output buffer to `8 * in_buf.len() + 8` before decoding, instead of requiring 3x. The `+ 8` is one word of headroom for the final speculative write, which lands at `out_curr + 8` even when the last symbol is shorter than 8 bytes. The buffer is shrunk back to the exact decoded length afterwards, so callers see no change.
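The grow, decode, shrink sequence can be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: the `Symbol` struct and `decode_with_headroom` are hypothetical, escape codes are ignored, and the speculative 8-byte write is emulated with a bounds-checked slice copy rather than raw pointers.

```rust
/// Hypothetical symbol entry: up to 8 bytes stored in a fixed word, plus its real length.
#[derive(Clone, Copy)]
struct Symbol {
    word: [u8; 8],
    len: usize, // assumed to be in 0..=8
}

/// Sketch of decoding with a worst-case buffer (assumption: one code per input byte).
fn decode_with_headroom(in_buf: &[u8], table: &[Symbol], out: &mut Vec<u8>) {
    // Worst case is 8 bytes per code, plus one extra word of headroom.
    let needed = 8 * in_buf.len() + 8;
    if out.len() < needed {
        out.resize(needed, 0);
    }

    let mut out_curr = 0usize;
    for &code in in_buf {
        let sym = table[code as usize];
        // Speculative write: always copy the full 8-byte word,
        // then advance only by the symbol's real length.
        out[out_curr..out_curr + 8].copy_from_slice(&sym.word);
        out_curr += sym.len;
    }

    // Shrink back to the exact decoded length so callers see only real data.
    out.truncate(out_curr);
}
```

Because the buffer is resized inside the function, a caller that passes an empty or 3x-sized `Vec` gets the same result as one that pre-allocates 8x.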

That bound only holds while every symbol is at most 8 bytes. Since the symbol table is read from untrusted on-disk data, `init` now rejects any table carrying a symbol length above 8 (`InvalidData`) rather than letting it break the bound.
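A validation step of this kind might look like the sketch below. It assumes that `InvalidData` refers to `std::io::ErrorKind::InvalidData` and that symbol lengths are available as a slice of bytes; neither is confirmed by the description above, and `validate_symbol_lengths` and `MAX_SYMBOL_BYTES` are hypothetical names.

```rust
use std::io::{Error, ErrorKind};

/// Largest symbol length the 8x output bound can tolerate.
const MAX_SYMBOL_BYTES: u8 = 8;

/// Rejects a table whose symbol lengths would break the `8 * input` bound.
/// Assumption: lengths arrive as one byte per symbol, read from untrusted data.
fn validate_symbol_lengths(lengths: &[u8]) -> Result<(), Error> {
    for (i, &len) in lengths.iter().enumerate() {
        if len > MAX_SYMBOL_BYTES {
            return Err(Error::new(
                ErrorKind::InvalidData,
                format!("symbol {i} has length {len}, maximum is {MAX_SYMBOL_BYTES}"),
            ));
        }
    }
    Ok(())
}
```

Checking once at table load time keeps the per-code decode loop free of length checks while still guaranteeing the bound it relies on.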

Tests

- `test_decompress_better_than_3x_does_not_overflow` and its `_64_bit_offsets` variant compress highly repetitive input (about 8:1) and decode it into a 3x buffer, checking a correct round trip for both 32- and 64-bit offsets. Both fail before the fix.
- `test_decompress_rejects_oversized_symbol_length` corrupts a symbol length byte and expects `InvalidData`.

`cargo test -p fsst`, `cargo clippy -p fsst --tests -- -D warnings`, and `cargo fmt --all` pass.

Closes #7266.

github-actions · 2026-06-17T16:00:37Z

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

…er than 3:1

`decompress` required the output buffer to be at least 3x the input, but a code can decode to an 8-byte symbol and `decompress_bulk` writes a full 8-byte word per code, so the decoded output can be up to 8x the input. Inputs that compress better than 3:1 overran the buffer (segfault under a strict allocator, out-of-bounds write under Miri/ASan). In-tree callers already allocate 8x, so only the public API contract was wrong.

Grow the owned output buffer to `8 * in_buf.len() + 8` before decoding (the 8x worst case plus one word of headroom for the final speculative write) instead of enforcing the 3x minimum; it is shrunk back to the decoded length afterwards, so callers see no change.

Also reject symbol tables whose symbol length exceeds 8 in `init`: the table is read from untrusted on-disk data and a larger length would break the same bound.

codecov · 2026-06-17T16:45:28Z

Codecov Report

❌ Patch coverage is 96.47059% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/compression/fsst/src/fsst.rs | 96.47% | 2 Missing and 1 partial ⚠️ |


LuciferYang · 2026-06-18T06:50:11Z

Closing this one, since #7267 came before it.

github-actions Bot added the bug (Something isn't working) label Jun 17, 2026

LuciferYang force-pushed the fix/fsst-decompress-oob-7266 branch from 0727889 to 5f238ff June 17, 2026 16:03

LuciferYang changed the title ~~fix: FSST decompress writes out of bounds when data compresses better than 3:1~~ fix: out-of-bounds write in fsst decompress when data compresses better than 3:1 Jun 17, 2026

LuciferYang closed this Jun 18, 2026

LuciferYang wants to merge 1 commit into lance-format:main from LuciferYang:fix/fsst-decompress-oob-7266
