feat(blob-v2): configure packed blob file max size with field metadata#7322

Open
jo-migo wants to merge 2 commits into lance-format:main from jo-migo:fix-packed-storage-filesize

@jo-migo jo-migo commented Jun 17, 2026

Fixes #7292

Problem

There is currently no way to encode a maximum size threshold for packed blob sidecar files in a dataset, the way the inline and dedicated blob size thresholds can be encoded via lance-encoding:blob-inline-size-threshold and lance-encoding:blob-dedicated-size-threshold, respectively.

Solution

Add a new lance-encoding:blob-pack-file-size-threshold field-level metadata key, which tells the blob writer to start a new packed file only when the current packed file reaches that threshold.
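
To illustrate the rollover rule described above, here is a minimal sketch in plain Python. It is not the Lance blob writer: the function name `assign_to_packed_files` is hypothetical, and the choice to check the threshold before appending each blob is an assumption made for illustration.

```python
from typing import List


def assign_to_packed_files(blob_sizes: List[int], threshold: int) -> List[List[int]]:
    """Group blob sizes into packed files, rolling over at the threshold.

    Hypothetical illustration only (not Lance's implementation). A new file
    is started once the current file's size has reached `threshold`, so a
    file may exceed the threshold by up to one blob.
    """
    files: List[List[int]] = []
    current_size = 0
    for size in blob_sizes:
        # Open the first file, or roll over once the current file is full.
        if not files or current_size >= threshold:
            files.append([])
            current_size = 0
        files[-1].append(size)
        current_size += size
    return files
```

Under this assumed rule, five 40-byte blobs with a threshold of 100 would land in two packed files, the first holding three blobs and the second holding two.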

The metadata value can be overridden by passing a different value via the existing blob_pack_file_size_threshold parameter of the write_dataset function.
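
The precedence between the write-time parameter and the field metadata might be sketched as follows. This is an assumption-laden illustration rather than the code in this PR: `resolve_pack_file_size_threshold` and `ASSUMED_DEFAULT_THRESHOLD` are hypothetical names, the fallback value is made up, and the metadata value is assumed to be stored as a decimal string.

```python
from typing import Mapping, Optional

PACK_FILE_SIZE_KEY = "lance-encoding:blob-pack-file-size-threshold"
# Hypothetical fallback used only in this sketch; not Lance's actual default.
ASSUMED_DEFAULT_THRESHOLD = 1024 * 1024 * 1024


def resolve_pack_file_size_threshold(
    field_metadata: Mapping[str, str],
    write_param: Optional[int] = None,
) -> int:
    """Pick the packed-file size threshold (hypothetical helper).

    Order assumed here: explicit write parameter, then field metadata,
    then the fallback.
    """
    if write_param is not None:
        # An explicit value supplied at write time overrides the metadata.
        return write_param
    raw = field_metadata.get(PACK_FILE_SIZE_KEY)
    if raw is not None:
        # Assumes the metadata value is a decimal string of bytes.
        return int(raw)
    return ASSUMED_DEFAULT_THRESHOLD
```

With this ordering, a dataset can carry its own threshold in the schema while a caller can still choose a different value for a single write.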

@github-actions github-actions Bot added the A-python (Python bindings) and enhancement (New feature or request) labels Jun 17, 2026
@github-actions

Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

Fixes: lance-format#7292

Problem:

There is currently no way to encode a maximum size threshold for packed
blob sidecar files in a dataset, the way the inline and dedicated blob
size thresholds can be encoded via `lance-encoding:blob-inline-size-threshold`
and `lance-encoding:blob-dedicated-size-threshold`, respectively.

Solution:

Add a new `lance-encoding:blob-pack-file-size-threshold` field-level
metadata key, which tells the blob writer to start a new packed file
only when the current packed file reaches that threshold.

It can be overridden by supplying another value via the existing `blob_pack_file_size_threshold`
parameter to the `write_dataset` function.
@jo-migo jo-migo force-pushed the fix-packed-storage-filesize branch from e6b4c40 to 08c0e06 Compare June 17, 2026 14:28
@jo-migo jo-migo changed the title feat(blob-v2): Configure Packed Blob File Max Size with Field Metadata feat(blob-v2): configure packed blob file max size with field metadata Jun 17, 2026