[mypyc] Enable incremental self-compilation #21369
VaggelisD wants to merge 1 commit into python:master
Six fixes on top of python#21299, all required to compile mypy itself or to install a separate=True wheel via pip.

- mypyc/build.py: pip invokes setup.py twice when building a wheel. On the second invocation mypy's incremental cache is fully warm, so we generate no new C source for any group; the resulting extensions ship without their entry points and import as stubs. Fix: when a group emits no C source, reuse the .c file from the previous pass.
- mypyc/codegen/{emit,emitfunc}.py: when code in one compiled group reads an attribute on an object whose class lives in another group, the generated cast depends on that other group's struct definitions. We weren't recording the dependency, so the C compiler couldn't see the layout and the build failed. Fix: register the dependency at the cast site.
- mypyc/codegen/emitmodule.py + mypyc/build.py: when mypy compiles itself, a generated shim file can share a basename with a runtime C file. The C compiler resolves the runtime include relative to the shim's directory and picks up the shim instead. Fix: emit those includes with the <> form so the search uses -I paths only. The `get_header_deps` regex was changed to match both quote styles (otherwise headers in <> form drop out of Extension.depends and incremental rebuilds miss layout changes).
- mypyc/lib-rt/misc_ops.c: each compiled module gets its own shared library next to it in the package tree. The runtime was computing the module's file path as if a single shared library sat above the whole package, which doubled the package prefix and broke submodule lookups. Fix: detect the per-module case and use only the module's leaf name.
- mypyc/irbuild/prepare.py: traits and builtin-derived classes don't get a real C constructor emitted. A clean build sidesteps that, but a fully cached rebuild was taking the direct-call path and producing C that referenced a constructor that doesn't exist. Fix: skip the registration the same way a clean build does.
- mypyc/build.py: on every build_ext, setuptools rewrites every compiled .so in the source tree even when nothing changed. On macOS this invalidates the OS signature cache, so every import on the next run pays a re-verification cost. Fix: skip the copy when source and destination already match. This takes a rebuild after a 1-line edit from ~72s to ~6s.

setup.py also gets a MYPYC_SEPARATE env knob so CI can exercise the codegen path against mypy itself.
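The last fix, skipping the copy when source and destination already match, can be sketched roughly as below. This is a minimal illustration and not the PR's code: `copy_if_changed` is a hypothetical helper name, and treating "match" as a byte-for-byte comparison via `filecmp.cmp(..., shallow=False)` is an assumption.

```python
import filecmp
import os
import shutil


def copy_if_changed(src: str, dst: str) -> bool:
    """Copy src to dst unless dst already has identical contents.

    Returns True if a copy happened. Hypothetical helper, for illustration only.
    """
    if os.path.exists(dst) and filecmp.cmp(src, dst, shallow=False):
        # Leave dst untouched so the file on disk is not rewritten.
        return False
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    shutil.copy2(src, dst)
    return True
```

Comparing contents costs a read of both files, but that is cheap next to rewriting every extension and forcing the OS to re-verify each one on the next import.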
22d5351 to 5aea6ec
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
    # Trait/builtin-base classes have an ir.ctor FuncDecl
    # but no emitted CPyDef_<ctor>, so a cross-group direct
    # call would hit an undefined symbol. Mirror the same
    # skip in prepare_ext_class_def.
    if not ir.is_trait and not ir.builtin_base:
        mapper.func_to_decl[node.node] = ir.ctor
I can't find `prepare_ext_class_def`; I think the skip mentioned in the comment is actually in `prepare_init_method`.
Would it be possible to make `ClassIR.ctor` optional instead? If we don't generate it in this case anyway, setting it to `None` explicitly could help prevent bugs like this one in the future.
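A rough sketch of what the optional-constructor suggestion could look like. The classes below are simplified stand-ins and not mypyc's real `ClassIR` or `FuncDecl`; `make_class_ir` and `register_ctor` are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FuncDecl:
    """Stand-in for mypyc's FuncDecl; only the name matters here."""
    name: str


@dataclass
class ClassIR:
    """Simplified stand-in for mypyc's ClassIR (not the real class)."""
    name: str
    is_trait: bool = False
    builtin_base: Optional[str] = None
    # None means "no C constructor is emitted for this class".
    ctor: Optional[FuncDecl] = None


def make_class_ir(name: str, is_trait: bool = False,
                  builtin_base: Optional[str] = None) -> ClassIR:
    ir = ClassIR(name, is_trait, builtin_base)
    # Only classes that get a real C constructor carry a ctor declaration.
    if not ir.is_trait and not ir.builtin_base:
        ir.ctor = FuncDecl(name)
    return ir


def register_ctor(func_to_decl: dict, ir: ClassIR) -> None:
    # A single None check replaces repeating the trait/builtin-base
    # condition at every place that wants to call the constructor directly.
    if ir.ctor is not None:
        func_to_decl[ir.name] = ir.ctor
```

With an `Optional` field, a type checker would flag any call site that uses `ir.ctor` without first checking for `None`, which is the kind of protection the comment is asking about.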
    PyObject *last_segment = last_dot >= 0
        ? PyUnicode_Substring(module_name, last_dot + 1, name_len)
        : (Py_INCREF(module_name), module_name);
I think this is too complicated for a single ternary expression; splitting it into if / else would make it more legible.
    PyObject *actual_basename = sep >= 0
        ? PyUnicode_Substring(shared_lib_file, sep + 1, sf_len)
        : (Py_INCREF(shared_lib_file), shared_lib_file);
    # Fully-cached SCC (e.g. pip's second setup.py invoke for the
    # wheel phase): mypyc returns empty ctext but the previous run's
    # .c file is still on disk. Reuse it so we don't link with
    # sources=[].
    if not cfilenames and group_name is not None:
        from mypyc.codegen.emitmodule import group_dir as _group_dir

        short_suffix = "_" + exported_name(group_name.split(".")[-1])
        existing = os.path.join(
            compiler_options.target_dir, _group_dir(group_name), f"__native{short_suffix}.c"
        )
        if os.path.exists(existing):
            cfilenames.append(existing)
In multi-file compilation we generate multiple .c files, so we might need to append all of them here. Also, should we somehow handle cases where `group_name` is None?
I wonder if it would be possible to add the file paths to the cache instead of reconstructing them here; reconstructing them seems error-prone.
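To make the multi-file concern concrete, here is a minimal sketch that collects every previously generated .c file from a group's output directory instead of rebuilding one expected filename. `existing_c_sources` is a hypothetical helper, and the sketch assumes that all of a group's sources sit directly in one directory and that every `.c` file there belongs to that group. Neither assumption is confirmed by the diff, and storing the paths in the cache, as suggested above, would avoid both.

```python
import glob
import os


def existing_c_sources(group_output_dir: str) -> list[str]:
    """Return all .c files directly inside group_output_dir, sorted.

    Hypothetical helper: it trusts whatever is on disk, so stale files
    from an older build layout would be picked up too.
    """
    # glob.escape protects against directory names containing [, ] or *.
    pattern = os.path.join(glob.escape(group_output_dir), "*.c")
    return sorted(glob.glob(pattern))
```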
      headers: set[str] = set()
      for _, contents in cfiles:
    -     headers.update(re.findall(r'#include "(.*)"', contents))
    +     headers.update(re.findall(r'#include [<"]([^>"]+)[>"]', contents))
Could this add stuff like `<Python.h>`?
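The two patterns from the diff can be compared directly on a small input. The sample source text and the `header_names` helper below are made up for illustration; only the two regular expressions come from the diff.

```python
import re

# Patterns copied from the removed and added lines of the diff.
OLD_INCLUDE = re.compile(r'#include "(.*)"')
NEW_INCLUDE = re.compile(r'#include [<"]([^>"]+)[>"]')

# Hypothetical generated C source mixing both include forms.
SAMPLE = '#include <Python.h>\n#include "CPy.h"\n#include <misc_ops.c>\n'


def header_names(pattern: "re.Pattern[str]", contents: str) -> set:
    """Collect the captured include targets, as the loop in the diff does."""
    return set(pattern.findall(contents))


print(sorted(header_names(OLD_INCLUDE, SAMPLE)))
print(sorted(header_names(NEW_INCLUDE, SAMPLE)))
```

On this input the old pattern captures only `CPy.h`, while the new one also captures `Python.h` and `misc_ops.c`, so at the regex level the answer is yes. Whether system headers actually reach `Extension.depends` depends on how the result is filtered afterwards, which this excerpt does not show.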
Six fixes on top of #21299, all required to self-compile mypy or to install a `separate=True` wheel via pip.

- `mypyc/build.py`: pip invokes `setup.py` twice when building a wheel. On the second invocation mypy's incremental cache is fully warm, so we generate no new C source for any group; the resulting extensions ship without their entry points and import as stubs.
- `mypyc/codegen/{emit,emitfunc}.py`: when code in one compiled group reads an attribute on an object whose class lives in another group, the generated cast depends on that other group's struct definitions. We weren't recording the dependency, so the C compiler couldn't see the layout and the build failed.
- `mypyc/codegen/emitmodule.py`: when mypy compiles itself, a generated shim file can share a basename with a runtime C file. The C compiler resolves the runtime include relative to the shim's directory and picks up the shim instead.
- `mypyc/lib-rt/misc_ops.c`: each compiled module gets its own shared library next to it in the package tree. The runtime was computing the module's file path as if a single shared library sat above the whole package, which doubled the package prefix and broke submodule lookups.
- `mypyc/irbuild/prepare.py`: traits and builtin-derived classes don't get a real C constructor emitted. A clean build sidesteps that, but a fully cached rebuild was taking the direct-call path and producing C that referenced a constructor that doesn't exist.
- `mypyc/build.py`: on every build_ext, setuptools rewrites every compiled .so in the source tree even when nothing changed. On macOS this invalidates the OS signature cache, so every import on the next run pays a re-verification cost. setuptools limitation though (relevant mypy issue ?)

I also added a `MYPYC_SEPARATE` env knob so CI can exercise the codegen path against mypy itself.

Benchmarks

Mypy self-compile on macOS, `MYPYC_OPT_LEVEL=0`, `-j 11`. Three scenarios: