fix: out-of-bounds write in fsst decompress when data compresses better than 3:1 #7323
Closed
LuciferYang wants to merge 1 commit into
ACTION NEEDED The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action.
…er than 3:1 `decompress` required the output buffer to be at least 3x the input, but a code can decode to an 8-byte symbol and `decompress_bulk` writes a full 8-byte word per code, so the decoded output can be up to 8x the input. Inputs that compress better than 3:1 overran the buffer (segfault under a strict allocator, out-of-bounds write under Miri/ASan). In-tree callers already allocate 8x, so only the public API contract was wrong. Grow the owned output buffer to `8 * in_buf.len() + 8` before decoding (the 8x worst case plus one word of headroom for the final speculative write) instead of enforcing the 3x minimum; it is shrunk back to the decoded length afterwards, so callers see no change. Also reject symbol tables whose symbol length exceeds 8 in `init`: the table is read from untrusted on-disk data and a larger length would break the same bound.
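The sizing argument in the commit message can be sketched with a toy decoder. This is a minimal illustration, not the fsst crate's code: `worst_case_capacity`, `toy_decode`, and the `(word, length)` table layout are hypothetical names invented here, and the toy uses bounds-checked slices where the commit message describes raw-pointer writes.

```rust
/// Longest symbol a single code can expand to, per the commit message.
const MAX_SYMBOL_LEN: usize = 8;

/// Hypothetical helper (not part of the fsst crate): the 8x worst case
/// plus one 8-byte word of headroom, with overflow reported as `None`.
fn worst_case_capacity(compressed_len: usize) -> Option<usize> {
    compressed_len
        .checked_mul(MAX_SYMBOL_LEN)?
        .checked_add(MAX_SYMBOL_LEN)
}

/// Toy decoder: each code byte indexes a table of (8-byte word, true length).
/// It always copies the whole word, then advances by the true length.
/// Unknown codes and over-long symbols are rejected with `None`.
fn toy_decode(codes: &[u8], table: &[([u8; 8], usize)]) -> Option<Vec<u8>> {
    let mut out = vec![0u8; worst_case_capacity(codes.len())?];
    let mut pos = 0;
    for &code in codes {
        let (word, len) = *table.get(code as usize)?;
        if len > MAX_SYMBOL_LEN {
            return None;
        }
        // Full-word write. In this toy it never passes 8 * codes.len();
        // the extra word in the buffer only mirrors the formula above.
        out[pos..pos + 8].copy_from_slice(&word);
        pos += len;
    }
    out.truncate(pos); // shrink back to the decoded length
    Some(out)
}

fn main() {
    let table: [([u8; 8], usize); 2] = [(*b"abcdefgh", 8), (*b"xy\0\0\0\0\0\0", 2)];
    let codes = [0u8, 0, 0, 1];
    let decoded = toy_decode(&codes, &table).expect("valid codes");
    // 26 bytes from 4 codes: more than a 3x buffer (12 bytes) could hold.
    assert_eq!(decoded.len(), 26);
    assert!(decoded.len() > 3 * codes.len());
    println!("{} code bytes -> {} decoded bytes", codes.len(), decoded.len());
}
```

With four codes the toy produces 26 bytes, which already exceeds a buffer sized at three times the input.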
0727889 to 5f238ff
Closing this one, since #7267 came before it.
Problem
`fsst::decompress` enforces a minimum output buffer of `3 * input`, but that bound is unsound. A code can decode to an 8-byte symbol, and `decompress_bulk` writes a full 8-byte word per code through raw pointers, so the decoded output can be as large as `8 * input`. When data compresses better than 3:1, decoding into a buffer sized at the documented minimum runs past the end of the allocation: a segfault under a strict allocator, an out-of-bounds write under Miri/ASan.
In-tree callers happen to allocate `8 * input`, so they were never affected. Only the public contract was wrong.
Fix
`decompress` now grows its (owned) output buffer to `8 * in_buf.len() + 8` before decoding, instead of requiring 3x. The `+ 8` is one word of headroom for the final speculative write, which lands at `out_curr + 8` even when the last symbol is shorter than 8 bytes. The buffer is shrunk back to the exact decoded length afterwards, so callers see no change.
That bound only holds while every symbol is at most 8 bytes. Since the symbol table is read from untrusted on-disk data, `init` now rejects any table carrying a symbol length above 8 (`InvalidData`) rather than letting it break the bound.
Tests
`test_decompress_better_than_3x_does_not_overflow` and its `_64_bit_offsets` variant compress highly repetitive input (about 8:1) and decode it into a 3x buffer, checking a correct round trip for both 32- and 64-bit offsets. Both fail before the fix.
`test_decompress_rejects_oversized_symbol_length` corrupts a symbol length byte and expects `InvalidData`.
`cargo test -p fsst`, `cargo clippy -p fsst --tests -- -D warnings`, and `cargo fmt --all` pass.
Closes #7266.
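The symbol-length check described under Fix might look roughly like the sketch below. It assumes the lengths have already been parsed into a byte slice and that `InvalidData` refers to `std::io::ErrorKind::InvalidData`; both are assumptions, and `validate_symbol_lengths` is a hypothetical name, not the crate's `init`.

```rust
use std::io::{Error, ErrorKind, Result};

/// Largest symbol length the 8x output bound can tolerate.
const MAX_SYMBOL_LEN: u8 = 8;

/// Hypothetical validator: rejects a table if any symbol length exceeds 8,
/// so that untrusted data cannot invalidate the output-size bound.
fn validate_symbol_lengths(lengths: &[u8]) -> Result<()> {
    for (code, &len) in lengths.iter().enumerate() {
        if len > MAX_SYMBOL_LEN {
            return Err(Error::new(
                ErrorKind::InvalidData,
                format!(
                    "symbol {} has length {}, maximum is {}",
                    code, len, MAX_SYMBOL_LEN
                ),
            ));
        }
    }
    Ok(())
}

fn main() {
    // A well-formed table passes.
    assert!(validate_symbol_lengths(&[1, 2, 8]).is_ok());

    // A corrupted length byte is rejected with InvalidData.
    let err = validate_symbol_lengths(&[1, 9]).unwrap_err();
    assert_eq!(err.kind(), ErrorKind::InvalidData);
    println!("rejected: {err}");
}
```

Rejecting the table up front keeps the decoder's hot loop free of per-code length checks, which is one plausible reason to validate at load time rather than during decoding.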