Skip to content

JIT can trigger assert(uop_buffer_remaining_space(trace) > space_needed); for widened uops #149335

@eendebakpt

Description

@eendebakpt

Bug report

Bug description:

The assert(uop_buffer_remaining_space(trace) > space_needed); in _PyJit_translate_single_bytecode_to_trace can be triggered if a uop is widened. Found when the CI for #148271 was failing (there TO_BOOL gets an extra _RECORD_TOS_TYPE). @markshannon

Reporting the issue already, I will try to create a simple reproducer and more concise explanation.

Full explanation by Claude

JIT trace-buffer assertion: full explanation

The assertion

In _PyJit_translate_single_bytecode_to_trace (Python/optimizer.c), each bytecode appended to a tier-2 trace is gated by a Py_DEBUG assertion:

assert(uop_buffer_remaining_space(trace) > space_needed);

where

space_needed = expansion->nuops + needs_guard_ip + 2 + (!OPCODE_HAS_NO_SAVE_IP(opcode))

expansion->nuops is the number of uops the bytecode's macro expands to. The + 2 covers the _CHECK_VALIDITY uop emitted at the start of every bytecode plus a reserved _DEOPT slot for it. + needs_guard_ip covers the _GUARD_IP_* slot for opcodes that need it. + (!OPCODE_HAS_NO_SAVE_IP) covers the _SET_IP slot.

Just before the assertion, trace->end has already been decremented to reserve the side-exit/error/deopt/guard-ip stubs at the buffer's tail:

trace->end -= 2 + has_exit + has_error + has_deopt + needs_guard_ip

So if R is the buffer's true remaining space at the start of the bytecode, the assertion is checking R - tail > space_needed, i.e.

R > 4 + has_exit + has_error + has_deopt + 2*needs_guard_ip + nuops + has_save_ip

The fitness mechanism

_PyJit_translate_single_bytecode_to_trace also runs a fitness gate before the assertion:

if (fitness < eq) goto done;

eq is compute_exit_quality(opcode). For specializable opcodes (like TO_BOOL, CALL_*) it returns EXIT_QUALITY_SPECIALIZABLE, defined as FITNESS_INITIAL / 80. With FITNESS_INITIAL == UOP_MAX_TRACE_LENGTH == 1000 in debug builds, that's 12.

Per bytecode, fitness is debited by slots_used = remaining_before − remaining_after, i.e. exactly the number of buffer slots the bytecode just consumed (uops emitted + tail reservations made). So fitness and remaining_space are intended to track in lockstep, with branch/frame penalties only making fitness drop faster (which would only ever make the assertion safer).

The intended invariant is therefore fitness <= remaining_space. If that held, then fitness >= eq = 12 would imply remaining_space >= 12 too, and 12 slots is enough for every existing space_needed.

Where the invariant breaks

Two accounting holes push fitness above remaining_space:

  1. Trace prelude not charged. _PyJit_StartTracing emits _START_EXECUTOR and _MAKE_WARM into the buffer, costing 2 slots. Fitness is initialized to FITNESS_INITIAL (1000) right after, with no debit. So before the first bytecode: fitness = 1000, remaining = 998. Drift = +2.

  2. needs_guard_ip under-reserved. When needs_guard_ip is true (e.g. after _PUSH_FRAME), the code emits up to three uops (_RECORD_CODE, the guard_ip uop itself, and guard_code_version_uop if the called object is a code object). But space_needed only adds + needs_guard_ip (i.e. 1). Each such occurrence under-reserves by 1–2 slots, but those slots are still consumed by next++, so slots_used does include them and fitness drops correctly. The mismatch is between the predicted space_needed and the actual emission, not between fitness and buffer — but it tightens the assertion further.

I confirmed the drift directly. With a release build with UOP_MAX_TRACE_LENGTH hacked down to 1000 (matching debug) and a printf inserted at the trace-finalize site, a long branchless reproducer produced:

TRACE_LEN: length=666 remaining=14 fitness=18 UOP_MAX=1000

fitness = 18, remaining = 14 — fitness sits 4 slots higher than the buffer.

Why this PR exposes it

The PR for gh-143732 changes the TO_BOOL macro from

macro(TO_BOOL) = _SPECIALIZE_TO_BOOL + unused/2 + _TO_BOOL;

to

macro(TO_BOOL) = _SPECIALIZE_TO_BOOL + _RECORD_TOS_TYPE + unused/2 + _TO_BOOL;

That bumps expansion->nuops for TO_BOOL from 1 to 2.

Concrete space_needed for TO_BOOL (no exit/deopt/guard_ip, has_error, has_save_ip):

nuops needs_guard_ip +2 has_save_ip space_needed
main 1 0 2 1 4
PR 2 0 2 1 5

After tail reservations (trace->end -= 2 + has_error = 3), the post-decrement remaining must exceed space_needed. So the true-remaining R we need is:

tail space_needed R required
main 3 4 > 7
PR 3 5 > 8

Worst case at the fitness gate: fitness == 12 and the drift is fitness − remaining = +4, so R = 8. On main, 8 > 7 ✓. On the PR, 8 > 8 ✗ — assertion fires.

Empirical confirmation

Debug + JIT build, 30 runs of ./python -m test test_strtod --multiprocess 0 --verbose2 --verbose3:

Build TO_BOOL macro Pass Fail
main original 30 0
main + PR's macro change with _RECORD_TOS_TYPE 28 2

Failures are always in test_short_halfway_cases, which uses random.randrange, so the failing trace shape varies and the failure is flaky (~5–10%). The C backtrace lands in _PyJit_translate_single_bytecode_to_trace reached from the test's if e + q.bit_length() > max_exp: line — so the bytecode that actually trips the assertion is plausibly a wide CALL (e.g. CALL_METHOD_DESCRIPTOR_NOARGS for q.bit_length()), not TO_BOOL itself; widening TO_BOOL just shifts trace timing enough to expose it.

Bottom line

It's a thin-margin coupling, not a PR bug. The intended invariant fitness <= remaining_space doesn't actually hold today; in some traces fitness is ~4 slots above remaining. With EXIT_QUALITY_SPECIALIZABLE = 12, the worst-case assertion threshold is right at the boundary for the PR's wider TO_BOOL, so any future macro widening can resurface this. Fixes (in increasing principled-ness): (a) graceful goto done instead of asserting; (b) raise EXIT_QUALITY_SPECIALIZABLE to ~25 to leave room for the widest opcode plus the +4 drift; (c) fix the two accounting holes so the invariant actually holds.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Metadata

Metadata

Assignees

No one assigned

    Labels

    interpreter-core(Objects, Python, Grammar, and Parser dirs)topic-JITtype-bugAn unexpected behavior, bug, or error

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions