Skip to content

allow roundtripping the BytesCodec#3417

Merged
d-v-b merged 20 commits into
zarr-developers:mainfrom
keewis:endian-roundtrip
Jun 18, 2026
Merged

allow roundtripping the BytesCodec#3417
d-v-b merged 20 commits into
zarr-developers:mainfrom
keewis:endian-roundtrip

Conversation

@keewis

@keewis keewis commented Aug 30, 2025

Copy link
Copy Markdown
Contributor

closes #3416

Basically, we're changing the default for endian when constructing BytesCodec using from_dict.

(I also renamed test_endian.py to test_bytes.py as mentioned in #3416 (comment))

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions Bot added the needs release notes Automatically applied to PRs which haven't added release notes label Aug 30, 2025
@github-actions github-actions Bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Aug 30, 2025
@keewis

keewis commented Aug 30, 2025

Copy link
Copy Markdown
Contributor Author

looks like some tests actually relied on this behavior, i.e. choosing the default endian if none was specified for multi-byte dtypes (even though this technically violates the spec).

Since this might be potentially user facing (I'm not certain, but at least virtualizarr uses ArrayV3Metadata directly) someone might be relying on this behavior?

What we could do is start a deprecation cycle by checking for this in ArrayV3Metadata.from_dict and warning, then adding the system default. I'll leave the decision to you.

d-v-b and others added 8 commits September 1, 2025 22:26
…zarr-developers#176)

Bumps the actions group with 8 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [prefix-dev/setup-pixi](https://github.com/prefix-dev/setup-pixi) | `0.9.5` | `0.9.6` |
| [codecov/codecov-action](https://github.com/codecov/codecov-action) | `6.0.0` | `6.0.1` |
| [github/issue-metrics](https://github.com/github/issue-metrics) | `4.2.2` | `4.2.7` |
| [j178/prek-action](https://github.com/j178/prek-action) | `2.0.3` | `2.0.4` |
| [actions/upload-artifact](https://github.com/actions/upload-artifact) | `7.0.0` | `7.0.1` |
| [actions/download-artifact](https://github.com/actions/download-artifact) | `7.0.0` | `8.0.1` |
| [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) | `1.13.0` | `1.14.0` |
| [zizmorcore/zizmor-action](https://github.com/zizmorcore/zizmor-action) | `0.5.3` | `0.5.6` |



Updates `prefix-dev/setup-pixi` from 0.9.5 to 0.9.6
- [Release notes](https://github.com/prefix-dev/setup-pixi/releases)
- [Commits](prefix-dev/setup-pixi@1b2de7f...5185adf)

Updates `codecov/codecov-action` from 6.0.0 to 6.0.1
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@57e3a13...e79a696)

Updates `github/issue-metrics` from 4.2.2 to 4.2.7
- [Release notes](https://github.com/github/issue-metrics/releases)
- [Commits](github-community-projects/issue-metrics@c9e9838...1e38d5e)

Updates `j178/prek-action` from 2.0.3 to 2.0.4
- [Release notes](https://github.com/j178/prek-action/releases)
- [Commits](j178/prek-action@6ad8027...bdca6f1)

Updates `actions/upload-artifact` from 7.0.0 to 7.0.1
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v7...043fb46)

Updates `actions/download-artifact` from 7.0.0 to 8.0.1
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@v7...3e5f45b)

Updates `pypa/gh-action-pypi-publish` from 1.13.0 to 1.14.0
- [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases)
- [Commits](pypa/gh-action-pypi-publish@v1.13.0...cef2210)

Updates `zizmorcore/zizmor-action` from 0.5.3 to 0.5.6
- [Release notes](https://github.com/zizmorcore/zizmor-action/releases)
- [Commits](zizmorcore/zizmor-action@b1d7e1f...5f14fd0)

---
updated-dependencies:
- dependency-name: prefix-dev/setup-pixi
  dependency-version: 0.9.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: codecov/codecov-action
  dependency-version: 6.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: github/issue-metrics
  dependency-version: 4.2.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: j178/prek-action
  dependency-version: 2.0.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: actions/upload-artifact
  dependency-version: 7.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: actions/download-artifact
  dependency-version: 8.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
- dependency-name: pypa/gh-action-pypi-publish
  dependency-version: 1.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions
- dependency-name: zizmorcore/zizmor-action
  dependency-version: 0.5.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@clbarnes

Copy link
Copy Markdown
Contributor

I agree that this should change (see #4073). Ideally the BytesCodec wouldn't default to the native byte order on regular instantiation but I can see that being a more breaking change; fixing it in from_dict seems a good compromise which would resolve my issue at least.

@d-v-b

d-v-b commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

@keewis do you mind if I push fixes to resolve the conflicts?

@keewis

keewis commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

not at all, go ahead (and the same for any future PRs I submit)

d-v-b and others added 3 commits June 18, 2026 15:10
Resolve conflicts from main's BytesCodec rewrite (deprecated Endian enum,
introduced EndianLiteral/ENDIAN):

- bytes.py: keep main's rewrite plus the branch's from_dict fix
  (setdefault "endian" to None so {"name": "bytes"} round-trips to
  endian=None instead of sys.byteorder).
- test_bytes.py (add/add): union the branch's renamed test_endian.py
  content (integration + sync tests, which the rename would otherwise
  drop) with main's deprecation/evolve suite, plus the branch's
  to_dict/from_dict/roundtrip tests covering the endian=None case.
  Adapt expected values to literal strings rather than the now-deprecated
  Endian members.
- changes/3417.bugfix: convert .rst -> .md to match main's changelog
  migration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 18, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.47%. Comparing base (6c36245) to head (2454c90).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3417   +/-   ##
=======================================
  Coverage   93.47%   93.47%           
=======================================
  Files          90       90           
  Lines       11966    11967    +1     
=======================================
+ Hits        11185    11186    +1     
  Misses        781      781           
Files with missing lines Coverage Δ
src/zarr/codecs/bytes.py 98.71% <100.00%> (+0.01%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

d-v-b and others added 3 commits June 18, 2026 18:21
test_endian and test_endian_write tested the same property — the bytes
codec serializes multi-byte data correctly given an input configuration —
differing only in the input array's byte order, which test_endian fixed to
native (i.e. a slice of test_endian_write's matrix). Merge them into a single
test parameterized over the configuration: input dtype byte order
(>u2 / <u2) x stored byte order (big / little).

Also strengthen the assertion: the originals only checked np.array_equal
round-trips, which a symmetric encode+decode bug would pass. Disable
compression (compressors=None) so the stored chunk is the codec's raw output,
and assert its byte layout matches the configured store endianness directly,
then check the round-trip.

Drop the v2 chunk_key_encoding: it was a vestige of the zarrita original,
where it forced the chunk key to "0.0" to compare byte-for-byte against a
zarr-v2 reference array. That comparison is gone; the encoding is unrelated
to endianness, so use the default key.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- to_dict and from_dict are inverses over the same (endian, wire-dict)
  table; express that with a single shared list of `Expect` cases instead
  of two duplicated parametrize tables.
- rejects_unknown_endian now uses `ExpectFail`, matching the error-table
  idiom used elsewhere (e.g. test_chunk_grids).
- Move the annotation-only `Store` import into a TYPE_CHECKING block
  (TC001), a latent issue from combining `from __future__ import
  annotations` with the integration tests during the earlier merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Clarify that the fix alters BytesCodec.from_dict: a missing `endian`
configuration is now interpreted as endian=None (what to_dict emits)
rather than the system's native byte order.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@d-v-b

d-v-b commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

@keewis have a look

@keewis keewis left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me.

The only comment I have is, what to do with native endianness? Though maybe that's not an issue because zarr doesn't support it (I didn't check). It certainly wouldn't be easy to support for reading on a machine with different endianness.

@d-v-b

d-v-b commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

looks good to me.

The only comment I have is, what to do with native endianness? Though maybe that's not an issue because zarr doesn't support it (I didn't check). It certainly wouldn't be easy to support for reading on a machine with different endianness.

there shouldn't be any issues with native endianness because the bytes codec describes the endianness that was used for writing bytes. Any other reader that understands the bytes codec can then interpret the bytes as they were written, no matter what the native endianness of the platform. Does that make sense? This is also why endianness is not part of zarr v3 data types.

@keewis

keewis commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

yes, it does, thanks.

@d-v-b d-v-b merged commit ed60e13 into zarr-developers:main Jun 18, 2026
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BytesCodec's endian does not roundtrip

3 participants