gh-151289: Add a wide int fast path for add/sub by KRRT7 · Pull Request #151290 · python/cpython

KRRT7 · 2026-06-10T22:31:46Z


This adds a separate fast path for exact PyLong add/sub operands that fit in signed 64-bit integers, while preserving the existing compact-int specialization.

This keeps the compact-int hot path unchanged and avoids broad opcode churn there, while allowing wide exact ints to bypass the slower generic long arithmetic path.
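As a rough illustration of how operands would be routed among the compact, wide, and generic paths described above, here is a small Python model. It is a sketch, not the C implementation: the names `classify` and `pick_path` are hypothetical, and it assumes 30-bit digits and that "compact" means a magnitude that fits in a single digit.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def classify(value, bits_per_digit=30):
    """Label an operand as 'compact', 'wide', or 'generic' (hypothetical labels)."""
    if type(value) is not int:  # exact int only: bool and other subclasses are excluded
        return "generic"
    if abs(value) < (1 << bits_per_digit):  # assumed meaning of "compact"
        return "compact"
    if INT64_MIN <= value <= INT64_MAX:
        return "wide"
    return "generic"


def pick_path(left, right, bits_per_digit=30):
    """Choose the path for a binary add/sub based on both operands."""
    kinds = {classify(left, bits_per_digit), classify(right, bits_per_digit)}
    if kinds == {"compact"}:
        return "compact"  # existing specialization, left unchanged
    if "generic" in kinds:
        return "generic"  # generic long arithmetic
    return "wide"         # at least one non-compact operand, both fit in int64
```

For example, `pick_path(1, 10_000_000_000)` selects the wide path in this model, while `pick_path(1, 2)` stays on the compact path.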

Performance: representative interpreter-only results with JIT disabled:

add_wide:
sub_wide:
add_compact/sub_compact:

Related issue:

Optimize int add/sub for wide exact ints #151289

…declaration

Add inline infrastructure to pycore_long.h for the upcoming wide int addition fast path:

- _PY_LONG_MAX_DIGITS_FOR_INT64: macro for the maximum digit count that can still fit in int64_t (2 on 30-bit builds, 5 on 15-bit)
- _PyLong_FitsInt64(): cheap tag-based check; fast-paths compact and small-digit ints before inspecting the boundary digit
- _PyLong_CheckExactAndFitsInt64(): exact-type + fits-int64 guard for use in specialization guards
- _PyLong_TryAsInt64Exact(): no-exception int64 extraction; special-cases the ndigits==2/30-bit path for the common case
- PyAPI_FUNC declaration for _PyCompactLong_AddWide()
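The tiered check described for `_PyLong_FitsInt64()` can be modelled in Python as follows. This is only a sketch of the idea (answer from the digit count when possible, and look at the actual value only at the boundary digit count); `digit_count` and `fits_int64` are hypothetical names, and the thresholds are derived here from the digit width rather than taken from the patch.

```python
def digit_count(value, bits_per_digit=30):
    """Number of base-2**bits_per_digit digits in abs(value); zero has none."""
    return -(-abs(value).bit_length() // bits_per_digit)  # ceiling division


def fits_int64(value, bits_per_digit=30):
    """Model of a tiered 'fits in int64_t' check."""
    safe_digits = 63 // bits_per_digit  # this many digits can never overflow int64
    n = digit_count(value, bits_per_digit)
    if n <= safe_digits:
        return True    # cheap accept: the magnitude is below 2**63
    if n > safe_digits + 1:
        return False   # cheap reject: the magnitude needs more than 64 bits
    # Boundary digit count: the answer depends on the actual magnitude.
    limit = (1 << 63) if value < 0 else (1 << 63) - 1
    return abs(value) <= limit
```

With 30-bit digits, values of up to two digits are accepted without further inspection, and three-digit values are the boundary case; with 15-bit digits the corresponding counts are four and five.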

Add three new micro-ops and update the BINARY_OP_ADD_INT macro to use them, replacing the compact-only path:

- _GUARD_TOS_INT_WIDE / _GUARD_NOS_INT_WIDE: type guards that accept any exact int fitting in int64_t (via _PyLong_CheckExactAndFitsInt64)
- _BINARY_OP_ADD_INT_WIDE: calls _PyCompactLong_AddWide; EXIT_IF on int64 overflow (deopt), ERROR_IF on OOM

The existing _GUARD_TOS_INT / _GUARD_NOS_INT compact guards are kept unchanged; they are still used by BINARY_OP_SUBTRACT_INT, BINARY_OP_MULTIPLY_INT, COMPARE_OP_INT, and all subscr ops.

Regenerate: generated_cases.c.h, executor_cases.c.h, optimizer_cases.c.h, pycore_opcode_metadata.h, pycore_uop_ids.h, pycore_uop_metadata.h, test_cases.c.h
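The control flow of the guard and add micro-ops can be approximated in Python. This is a hedged model, not the generated interpreter code: `Outcome`, `guard_wide`, and `add_int_wide` are hypothetical names, and the out-of-memory error path (ERROR_IF) is not modelled. The point it shows is that a deopt never changes the result, only which path computes it.

```python
from enum import Enum

INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


class Outcome(Enum):
    FAST = "fast"    # the wide path produced the result
    DEOPT = "deopt"  # a guard failed or the sum left int64; the generic path takes over


def guard_wide(value):
    """Accept only exact ints that fit in a signed 64-bit integer."""
    return type(value) is int and INT64_MIN <= value <= INT64_MAX


def add_int_wide(left, right):
    """Return (outcome, result) for a specialized add in this model."""
    if not (guard_wide(right) and guard_wide(left)):
        return Outcome.DEOPT, left + right  # guard failure
    total = left + right
    if not (INT64_MIN <= total <= INT64_MAX):
        return Outcome.DEOPT, total         # int64 overflow
    return Outcome.FAST, total
```

For instance, `add_int_wide(INT64_MAX, 1)` reports `Outcome.DEOPT` but still yields `2**63`, which is what the generic path would return.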

Change the add specialization condition from _PyLong_CheckExactAndCompact to _PyLong_CheckExactAndFitsInt64 so that exact int operands in the full int64 range (not just compact/single-digit values) are specialized to BINARY_OP_ADD_INT. Subtract and multiply retain their compact-only conditions.

BINARY_OP_ADD_INT now specializes for non-compact int64-range operands (e.g. 10_000_000_000). Update the test accordingly:

- Assert BINARY_OP_ADD_INT is used for wide int add
- Keep the assertions that BINARY_OP_SUBTRACT_INT and BINARY_OP_MULTIPLY_INT are not used for non-compact ints

…Exact

Verify that _PyLong_TryAsInt64Exact correctly handles INT64_MIN (abs_val == INT64_MAX + 1 with negative sign), INT64_MAX, and that values outside the int64 range gracefully fall back to the slow path.
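The INT64_MIN boundary is easy to get wrong because the magnitude 2^63 is valid only with a negative sign. The following Python model of a sign-and-magnitude extraction shows the asymmetry. It is a sketch under assumptions: `to_digits` and `try_as_int64` are hypothetical helpers, digits are stored least significant first, and `None` stands in for "fall back to the slow path". Python ints are unbounded, so the unsigned 64-bit wraparound that C code must guard against does not arise here.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def to_digits(value, bits=30):
    """Split an int into (sign, little-endian digit list)."""
    sign = (value > 0) - (value < 0)
    magnitude = abs(value)
    mask = (1 << bits) - 1
    digits = []
    while magnitude:
        digits.append(magnitude & mask)
        magnitude >>= bits
    return sign, digits


def try_as_int64(sign, digits, bits=30):
    """Return the int64 value, or None if it does not fit (no exception raised)."""
    magnitude = 0
    for digit in reversed(digits):  # most significant digit first
        magnitude = (magnitude << bits) | digit
        if magnitude > (1 << 63):
            return None             # too large for either sign
    if sign < 0:
        return -magnitude           # magnitude == 2**63 is exactly INT64_MIN
    if magnitude > INT64_MAX:
        return None                 # +2**63 does not fit
    return magnitude
```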

Non-compact (2-digit) int results previously bypassed the freelist and called PyObject_Malloc directly. Add an `ints2` freelist alongside the existing `ints` (1-digit) freelist.

- `long_alloc(2)` checks `ints2` before `PyObject_Malloc`
- `_PyLong_ExactDealloc` and `long_dealloc` recycle exact 2-digit ints to `ints2` instead of immediately freeing them
- `_PyObject_ClearFreeLists` clears `ints2` the same way as `ints`
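The freelist pattern in the list above can be illustrated with a toy Python model. This is not how CPython's freelists are implemented; `TwoDigitFreelist` and its `max_entries` cap are hypothetical, and a two-element list stands in for a 2-digit int object. It only shows the allocate, recycle, and clear cycle.

```python
class TwoDigitFreelist:
    """Toy model of a per-size freelist for 2-digit int objects."""

    def __init__(self, max_entries=100):
        self._free = []                  # recycled blocks awaiting reuse
        self._max_entries = max_entries  # assumed cap on retained blocks
        self.allocated = 0               # fresh allocations (the PyObject_Malloc analogue)
        self.reused = 0                  # allocations served from the freelist

    def alloc(self):
        if self._free:                   # check the freelist first
            self.reused += 1
            return self._free.pop()
        self.allocated += 1
        return [0, 0]                    # stand-in for a new 2-digit object

    def dealloc(self, block):
        if len(self._free) < self._max_entries:
            self._free.append(block)     # recycle instead of freeing
        # otherwise the block is simply dropped, that is, freed

    def clear(self):
        self._free.clear()               # analogue of clearing the freelists
```

In a loop that repeatedly creates and discards 2-digit results, `allocated` stays at 1 while `reused` grows, which is the saving the freelist is meant to provide.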

Extends the ints2 freelist pattern to 3-digit objects, which cover the range [2^60, 2^63-1] (positive) and [-2^63, -2^60] (negative) on 30-bit builds, including INT64_MAX, INT64_MIN, and nanosecond-precision timestamps. Also fuses the two _PyLong_IsCompact + _PyLong_DigitCount checks in long_dealloc under a single PyLong_CheckExact branch.

Benchmark (5M ops, 30-bit build):

- 2-digit+2-digit -> 3-digit result: 19.6 ns -> 17.0 ns (-13%)
- 3-digit+compact -> 3-digit result: 18.3 ns -> 15.4 ns (-16%)
- INT64_MAX + 0: 18.2 ns -> 15.9 ns (-13%)
- INT64_MIN + 0: 18.1 ns -> 16.2 ns (-10%)
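The 3-digit range quoted above can be checked with a few lines of Python. This is a quick sanity check assuming 30-bit digits; `digits_needed` is a hypothetical helper, and the timestamp is an arbitrary example value in nanoseconds since the epoch.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def digits_needed(value, bits=30):
    """Digits required to store abs(value) in base 2**bits."""
    return -(-abs(value).bit_length() // bits)  # ceiling division


# An example nanosecond-precision timestamp (a date in late 2023).
EXAMPLE_TIMESTAMP_NS = 1_700_000_000_000_000_000

three_digit_values = [1 << 60, INT64_MAX, INT64_MIN, -(1 << 60), EXAMPLE_TIMESTAMP_NS]
two_digit_values = [(1 << 60) - 1, 1 << 30, -(1 << 60) + 1]
```

Every entry in `three_digit_values` needs three digits, and every entry in `two_digit_values` needs two, so results such as current nanosecond timestamps fall in the 3-digit size class.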

…T-free

- Remove the dead `_BINARY_OP_ADD_INT` micro-op (no longer referenced by the macro); remove its abstract op from optimizer_bytecodes.c.
- Annotate `_GUARD_TOS_INT_WIDE`, `_GUARD_NOS_INT_WIDE`, and `_BINARY_OP_ADD_INT_WIDE` as `tier1`-only so the JIT executor and optimizer generator skip them entirely. The JIT defers to tier 1 for any `BINARY_OP_ADD_INT` trace; no new JIT code paths are introduced.
- Add a compact fast-path to `_PyCompactLong_AddWide` so compact-only int addition retains its original `medium_value` cost and avoids the int64-extraction overhead.
- Use `__builtin_add_overflow` in `_Py_i64_add_overflow` on GCC/Clang (single instruction on x86-64 / ARM64).
- Peel the last loop iteration in `_PyLong_TryAsInt64Exact` to hoist the max-digit overflow-guard out of the inner loop body.
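For readers unfamiliar with `__builtin_add_overflow`, the semantics of a checked signed 64-bit add can be modelled in Python: the builtin stores the wrapped two's-complement sum and returns whether overflow occurred. The sketch below uses hypothetical names (`add_overflow_i64`, `overflowed_by_sign`) and also shows a sign-based test of the kind a portable fallback could use; whether the patch's fallback is written this way is an assumption.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def add_overflow_i64(a, b):
    """Return (wrapped_sum, overflowed) for a signed 64-bit add of a and b."""
    total = a + b
    # Wrap into [-2**63, 2**63 - 1] as two's-complement hardware would.
    wrapped = (total + (1 << 63)) % (1 << 64) - (1 << 63)
    return wrapped, wrapped != total


def overflowed_by_sign(a, b):
    """Sign test: overflow iff both operands differ in sign from the wrapped sum."""
    wrapped, _ = add_overflow_i64(a, b)
    return ((a ^ wrapped) & (b ^ wrapped)) < 0
```

For example, `add_overflow_i64(INT64_MAX, 1)` wraps to `INT64_MIN` with the overflow flag set, which is the condition under which the fast path would deopt.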

Change the subtract specialization condition to accept exact ints in the full int64 range, matching the widened add path while keeping multiply compact-only.

bedevere-app · 2026-06-10T22:31:54Z

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.


KRRT7 added 11 commits June 10, 2026 19:10

test(longobject): add INT64_MIN boundary tests for _PyLong_TryAsInt64…

5b69f64


perf(specialize): widen BINARY_OP_SUBTRACT_INT to full int64 range

81713ed


perf(longobject): keep wide int helper local

a4b3e95

perf(longobject): add wide int fast path

8d2d3c9

KRRT7 requested review from Fidget-Spinner, ZeroIntensity, ericsnowcurrently, markshannon, savannahostrowski and tomasr8 as code owners June 10, 2026 22:31

bedevere-app Bot added the awaiting review label Jun 10, 2026

bedevere-app Bot mentioned this pull request Jun 10, 2026

Optimize int add/sub for wide exact ints #151289

Open

Merge remote-tracking branch 'upstream/main' into wide-int-accel

d8b9f3f

KRRT7 added 5 commits June 10, 2026 17:41

Misc/NEWS: add blurb for wide int fast path

05023f4

regen opcode cases for wide int fast path

c1a95ef

perf(longobject): restore JIT optimizer cases for wide ints

540d96c

test: make wide int benchmark more stable

0f42443

test: make wide int benchmark import-safe

4864750

KRRT7 wants to merge 17 commits into python:main from KRRT7:wide-int-accel
