gh-143732: add specialization for FOR_ITER #148745

Open
NekoAsakura wants to merge 12 commits into python:main from NekoAsakura:gh-143732/for-iter-type-recording
Contributor

@NekoAsakura NekoAsakura commented Apr 19, 2026

| Benchmark | main | branch | ratio |
|---|---|---|---|
| for_iter_dict_items | 110.23 ms ± 1.74 ms | 99.29 ms ± 1.24 ms | 1.11× faster |
| for_iter_dict_keys | 85.20 ms ± 1.03 ms | 76.20 ms ± 1.02 ms | 1.12× faster |
| for_iter_dict_values | 84.91 ms ± 1.15 ms | 76.27 ms ± 1.24 ms | 1.11× faster |
| for_iter_set | 96.88 ms ± 1.05 ms | 87.46 ms ± 1.16 ms | 1.11× faster |
| for_iter_reversed | 79.99 ms ± 0.77 ms | 72.03 ms ± 0.79 ms | 1.11× faster |
| for_iter_enumerate | 123.65 ms ± 1.94 ms | 110.14 ms ± 1.82 ms | 1.12× faster |
| for_iter_zip | 131.47 ms ± 1.38 ms | 122.17 ms ± 2.87 ms | 1.08× faster |
| for_iter_list | 59.21 ms ± 1.17 ms | 59.48 ms ± 1.06 ms | 1.00× slower |
| for_iter_tuple | 53.78 ms ± 0.61 ms | 53.77 ms ± 0.74 ms | 1.00× faster |
| for_iter_range | 72.62 ms ± 0.80 ms | 72.55 ms ± 1.08 ms | 1.00× faster |
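
The ratio column can be re-derived by dividing the main mean by the branch mean. The sketch below does this for a few rows copied from the benchmark results; the `timings_ms` dictionary and the `speedup` helper are illustrative names only, and the ± spreads are ignored.

```python
# Mean timings in milliseconds copied from the benchmark results: (main, branch).
timings_ms = {
    "for_iter_dict_items": (110.23, 99.29),
    "for_iter_enumerate": (123.65, 110.14),
    "for_iter_zip": (131.47, 122.17),
    "for_iter_list": (59.21, 59.48),
}


def speedup(main_ms: float, branch_ms: float) -> float:
    """Return main/branch; a value above 1 means the branch is faster."""
    return main_ms / branch_ms


# Round to two decimals, matching the precision used in the ratio column.
ratios = {name: round(speedup(m, b), 2) for name, (m, b) in timings_ms.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

A value that rounds to 1.00, as for for_iter_list, is well inside the reported spread, which matches the reading that the already-specialized list, tuple, and range paths are unchanged.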

Member

@cocolato cocolato left a comment

LGTM, thanks for doing this!

Comment thread Python/bytecodes.c Outdated
}

-macro(FOR_ITER) = _SPECIALIZE_FOR_ITER + _FOR_ITER;
+macro(FOR_ITER) = _SPECIALIZE_FOR_ITER + _RECORD_NOS_GEN_FUNC + _RECORD_NOS_TYPE + _FOR_ITER;
Member

case JIT_SYM_RECORDED_GEN_FUNC_TAG:
    return &PyGen_Type;

We don't need _RECORD_NOS_GEN_FUNC here, because the GEN_FUNC_TAG recorded here does not contribute to the current optimization.

Contributor Author

Every FOR_ITER specialisation's record list must be a prefix of FOR_ITER's.
_RECORD_NOS_GEN_FUNC writes a gen func or NULL to slot 0, matching what FOR_ITER_GEN reads from it.
https://github.com/python/cpython/actions/runs/24623062259/job/71997168333
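
One way to picture the prefix rule: if each instruction's recorders are written as an ordered list, a specialisation can only rely on slots that the unspecialised FOR_ITER fills in the same order. The sketch below is a toy check, not the cases generator's actual code; `is_prefix` and the three lists are hypothetical, and FOR_ITER_GEN's record list is inferred from the comment above.

```python
def is_prefix(member: list, head: list) -> bool:
    """Return True if `member` equals the first len(member) entries of `head`."""
    return head[: len(member)] == member


# Assumed record lists, written in slot order (slot 0 first).
for_iter_with_gen_func = ["_RECORD_NOS_GEN_FUNC", "_RECORD_NOS_TYPE"]
for_iter_without_gen_func = ["_RECORD_NOS_TYPE"]
for_iter_gen = ["_RECORD_NOS_GEN_FUNC"]

# With _RECORD_NOS_GEN_FUNC first, FOR_ITER_GEN finds its value in slot 0.
ok_with = is_prefix(for_iter_gen, for_iter_with_gen_func)

# Without it, slot 0 would hold a type where FOR_ITER_GEN expects a gen func.
ok_without = is_prefix(for_iter_gen, for_iter_without_gen_func)

print(ok_with, ok_without)
```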

PyType_Watch(TYPE_WATCHER_ID, (PyObject *)probable);
_Py_BloomFilter_Add(dependencies, probable);
sym_set_type(iter, probable);
int32_t orig_target = (this_instr - 1)->target;
Member

@cocolato cocolato Apr 19, 2026

We should add an assert to make sure the last uop is _RECORD_NOS_TYPE.

Member

@markshannon markshannon left a comment

I've a few suggestions inline.

Comment thread Python/optimizer_bytecodes.c Outdated
Comment thread Python/optimizer_bytecodes.c Outdated
Comment thread Python/bytecodes.c
}

tier2 op(_ITER_NEXT_INLINE, (iternext_fn/4, iter, null_or_index -- iter, null_or_index, next)) {
    volatile iternextfunc iternext_v = (iternextfunc)iternext_fn;
Member

@markshannon markshannon May 1, 2026

Why volatile? It shouldn't be necessary.
Also, function pointers may not be the same size as normal pointers.
Can you add assert(sizeof(iternextfunc) == sizeof(uintptr_t)); to be on the safe side?
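
The suggested assert guards against platforms where a function pointer and uintptr_t differ in size. As a rough illustration of what it checks, the same comparison can be made for the current platform from Python with ctypes. This is only a sketch: `NextFunc` is a hypothetical stand-in for iternextfunc, and `c_void_p` is used as a proxy for uintptr_t.

```python
import ctypes

# Hypothetical stand-in for `PyObject *(*iternextfunc)(PyObject *)`.
NextFunc = ctypes.CFUNCTYPE(ctypes.py_object, ctypes.py_object)

# Size of a function pointer versus a data pointer on this platform.
func_ptr_size = ctypes.sizeof(NextFunc)
data_ptr_size = ctypes.sizeof(ctypes.c_void_p)  # proxy for uintptr_t

same_size = func_ptr_size == data_ptr_size
print(f"function pointer: {func_ptr_size} bytes, data pointer: {data_ptr_size} bytes")
```

The C standard does not guarantee the two sizes match, which is why the check belongs in the C source rather than being assumed.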

Contributor Author

Because the mmap'd pages usually sit too far from tp_iternext for a 32-bit relative offset to reach, we have to use volatile to force the compiler to emit callq *%rax (target read from a register) instead of callq <rel32> (target baked in as a fixed offset). Otherwise the call jumps to the wrong address and segfaults.
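
To make the reach problem concrete: an x86-64 `call rel32` is 5 bytes long and encodes a signed 32-bit displacement measured from the end of the instruction, so the target must lie within roughly ±2 GiB of the call site. The sketch below checks that condition; `fits_rel32` is a hypothetical helper and the addresses are made up to resemble a typical layout, not taken from a real process.

```python
REL32_MIN = -(2**31)
REL32_MAX = 2**31 - 1
CALL_REL32_LEN = 5  # opcode byte E8 plus a 4-byte displacement


def fits_rel32(call_site: int, target: int) -> bool:
    """Return True if `target` is reachable by `call rel32` placed at `call_site`."""
    # The displacement is relative to the address of the next instruction.
    displacement = target - (call_site + CALL_REL32_LEN)
    return REL32_MIN <= displacement <= REL32_MAX


# Made-up addresses: a callee in the executable's text segment...
callee = 0x5555_5560_0000
# ...a call site in the same image, and one in a far-away mmap'd region.
near_site = 0x5555_5555_0000
far_site = 0x7F00_0000_0000

near = fits_rel32(near_site, callee)
far = fits_rel32(far_site, callee)
print(near, far)
```

When the check fails, the target has to be loaded into a register and called indirectly, which is the `callq *%rax` form described above.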

Member

Does this fail just for x86 or for AArch64 as well?

This is a bug in the JIT, and we should fix it rather than add workarounds.
I think we should be adding a trampoline here, but it seems that we are not.
If you're looking at the stencils, can you see how it is being patched (what patch function is being called)?

@brandtbucher

Contributor Author

Both. (aarch64 was tested under qemu-user)
The trampoline path doesn't appear to be designed for _JIT_* symbols:

def symbol_to_value(symbol: str) -> tuple[HoleValue, str | None]:
    """
    Convert a symbol name to a HoleValue and a symbol name.
    Some symbols (starting with "_JIT_") are special and are converted to their
    own HoleValues.
    """
    if symbol.startswith("_JIT_"):
        try:
            return HoleValue[symbol.removeprefix("_JIT_")], None
        except KeyError:
            pass
    return HoleValue.ZERO, symbol

if (
    hole.kind
    in {"R_AARCH64_CALL26", "R_AARCH64_JUMP26", "ARM64_RELOC_BRANCH26"}
    and hole.value is HoleValue.ZERO
    and hole.symbol not in self.symbols

# x86_64 Darwin trampolines for external symbols
elif (
    hole.kind == "X86_64_RELOC_BRANCH"
    and hole.value is HoleValue.ZERO
    and hole.symbol not in self.symbols

Even if we extended it to those, we'd need a stub per instance. I don't think it gains us anything.
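
The interaction between the two excerpts can be reproduced in isolation. The sketch below copies `symbol_to_value` from the excerpt and pairs it with a toy `HoleValue` enum (a two-member subset chosen for illustration; `OPERAND0` is assumed to be one of the real members) and a hypothetical `wants_trampoline` helper that mirrors only the AArch64 condition quoted above.

```python
import enum


class HoleValue(enum.Enum):
    # Toy subset for illustration; the real enum has more members.
    OPERAND0 = enum.auto()
    ZERO = enum.auto()


def symbol_to_value(symbol):
    """Map "_JIT_<NAME>" to its own HoleValue; leave other symbols as ZERO."""
    if symbol.startswith("_JIT_"):
        try:
            return HoleValue[symbol.removeprefix("_JIT_")], None
        except KeyError:
            pass
    return HoleValue.ZERO, symbol


AARCH64_BRANCH_KINDS = {"R_AARCH64_CALL26", "R_AARCH64_JUMP26", "ARM64_RELOC_BRANCH26"}


def wants_trampoline(kind, symbol, known_symbols):
    """Hypothetical helper mirroring the quoted AArch64 trampoline condition."""
    value, name = symbol_to_value(symbol)
    return (
        kind in AARCH64_BRANCH_KINDS
        and value is HoleValue.ZERO
        and name not in known_symbols
    )


external = wants_trampoline("R_AARCH64_CALL26", "PyObject_GetIter", set())
jit_operand = wants_trampoline("R_AARCH64_CALL26", "_JIT_OPERAND0", set())
print(external, jit_operand)
```

Because a `_JIT_*` symbol never maps to `HoleValue.ZERO`, a call through one is filtered out before the trampoline branch is considered, which is consistent with the observation that the path was not written with those symbols in mind.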

Member

We only need a trampoline if the jitted code is too far from the executable. See #148822

bedevere-app Bot commented May 1, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

read-the-docs-community Bot commented May 2, 2026

Documentation build overview

📚 cpython-previews | 🛠️ Build #32515427 | 📁 Comparing 49baba9 against main (4b33308)

  🔍 Preview build  

97 files changed · + 1 added · ± 96 modified

@markshannon
Member

OK, well let's go with your approach for now. We already do the same here

I've created #149316 so we won't need the "volatile"

@markshannon markshannon self-requested a review May 3, 2026 09:54
Comment thread Python/bytecodes.c Outdated

macro(FOR_ITER_VIRTUAL) =
    unused/1 + // Skip over the counter
    _RECORD_NOS +
Member

Is this necessary?

Contributor Author

Yeah, this is by design from #148730. When picking the family-wide recorder, the cases generator only looks at members, not the head:

member_records = [instruction_records[m.name] for m in family_members]
all_member_names = {n for names in member_records for n in names}

So without _RECORD_NOS on a member somewhere, _RECORD_NOS_TYPE on FOR_ITER and _RECORD_NOS_GEN_FUNC on FOR_ITER_GEN will clash. I can add a comment about this.
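
The two quoted lines can be run on toy data to see why the head's recorders never enter the picture. In the sketch below, `Member`, the contents of `instruction_records`, and the family layout are all assumptions made for illustration; only the two comprehension lines come from the generator excerpt above, and how the generator resolves the resulting set is not modelled.

```python
from typing import NamedTuple


class Member(NamedTuple):
    # Minimal stand-in for whatever object the generator uses; only .name is needed.
    name: str


# Assumed recorders per instruction. FOR_ITER is the family head.
instruction_records = {
    "FOR_ITER": ["_RECORD_NOS_TYPE"],
    "FOR_ITER_GEN": ["_RECORD_NOS_GEN_FUNC"],
    "FOR_ITER_VIRTUAL": ["_RECORD_NOS"],
}

# Members only: the head, FOR_ITER, is deliberately absent.
family_members = [Member("FOR_ITER_GEN"), Member("FOR_ITER_VIRTUAL")]

# The two lines quoted from the cases generator.
member_records = [instruction_records[m.name] for m in family_members]
all_member_names = {n for names in member_records for n in names}

print(sorted(all_member_names))
```

On this data the set contains `_RECORD_NOS` only because FOR_ITER_VIRTUAL supplies it, and `_RECORD_NOS_TYPE` from the head does not appear at all.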

Comment thread Python/optimizer_bytecodes.c Outdated
op(_FOR_ITER_TIER_TWO, (iter, null_or_index -- iter, null_or_index, next)) {
    PyTypeObject *type = sym_get_type(iter);
    if (type != NULL && type != &PyGen_Type && type->tp_iternext != NULL) {
        ADD_OP(_ITER_NEXT_INLINE, 0, (uintptr_t)type->tp_iternext);
Member

We need the watcher here as well.

What we do elsewhere is something like this:

bool definite = true;
PyTypeObject *type = sym_get_type(iter);
if (type == NULL) {
    type = sym_get_probable_type(iter);
    definite = false;
}
if (type != NULL) {
    // Add if not definite, otherwise NOP
    // Add watcher(s)
}
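
The control flow of that pattern can be modelled outside the optimizer. The sketch below is a plain-Python model, not CPython code: `Plan` and `plan_specialization` are hypothetical names, type names stand in for `PyTypeObject *`, and it assumes that "Add if not definite" means emitting a runtime type guard when the type is only probable.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    type_name: Optional[str]  # type to specialize on, or None to do nothing
    needs_guard: bool         # assumed meaning of "Add if not definite"
    add_watcher: bool         # "Add watcher(s)"


def plan_specialization(definite_type: Optional[str],
                        probable_type: Optional[str]) -> Plan:
    """Prefer a definite type; fall back to a probable one plus a guard."""
    definite = True
    chosen = definite_type
    if chosen is None:
        chosen = probable_type
        definite = False
    if chosen is None:
        # Nothing known about the iterator: leave the generic path alone.
        return Plan(None, False, False)
    # A watcher is added in both cases so a later change to the type is noticed.
    return Plan(chosen, needs_guard=not definite, add_watcher=True)


known = plan_specialization("dict_itemiterator", None)
guessed = plan_specialization(None, "dict_itemiterator")
unknown = plan_specialization(None, None)
print(known, guessed, unknown)
```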
