Add numpy + tensorflow thread-affinity regression suite #62
Merged
Conversation
The v3.0 fix for numpy/torch/tensorflow segfaults was per-context worker pthreads with stable thread affinity (commit 8a7a68c). py_thread_affinity_SUITE checks threading.get_native_id invariants in isolation; this new suite drives real numpy and tensorflow operations through exec / eval / call and across multiple Erlang processes hitting the same context. Each case spins up a fresh py_context for isolation. Library imports use a Python try/except that flags import status into a variable, so missing modules surface as a clean skip while genuine errors still propagate. TensorFlow stays user-skipped on CI; numpy is now installed on every CT-running leg (Linux/macOS matrix, ASan, free-threaded 3.13t, FreeBSD via py<ver>-numpy). numpy 2.4 still rejects subinterpreter loading, so numpy_owngil_basic self-skips with that ImportError; the case will start running once numpy adds OWN_GIL support.
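The try/except import probe described above can be sketched in a few lines of Python. This is an illustration of the pattern, not the suite's actual source; the `probe` string and `_import_status` variable name are invented for the example:

```python
# Sketch of the import-probe pattern (illustrative names only).
# The probe stores the import outcome in a variable instead of letting
# ImportError escape, so the caller can tell "module not installed"
# (skip the test case) apart from a genuine fault (which propagates,
# because only ImportError is caught).
probe = """
try:
    import numpy
    _import_status = "ok"
except ImportError:
    _import_status = "missing"
"""

# Stand-in for running the probe through the context's exec: execute it
# in a namespace, then read the flag back out.
ns = {}
exec(probe, ns)
print(ns["_import_status"])  # "ok" if numpy is installed, else "missing"
```

Because only `ImportError` is caught, a numpy that is installed but crashes in its native extension still raises, which is exactly the failure this suite exists to catch.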
py312-numpy isn't published in FreeBSD 14.1's pkg repository yet, so pkg install fails the job before the build even starts. py_ml_libs_SUITE already self-skips when numpy isn't importable, so the right move is to log a warning and move on rather than block CI on an upstream package gap.
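A tolerant install step along those lines might look like the following GitHub Actions fragment. This is a sketch only: the step name and the `PY_PKG_VER` variable are assumptions, not the workflow's actual contents:

```yaml
# Sketch: best-effort numpy install for the FreeBSD leg. If the package
# isn't in the pkg repo yet, emit a workflow warning instead of failing;
# the ML suite self-skips when numpy can't be imported.
- name: Install numpy (best effort)
  run: |
    if ! pkg install -y "py${PY_PKG_VER}-numpy"; then
      echo "::warning::py${PY_PKG_VER}-numpy not in the pkg repo yet; py_ml_libs_SUITE will self-skip"
    fi
```

The `::warning::` prefix is GitHub Actions' workflow-command syntax for surfacing an annotation on the run without affecting its status.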
Summary
The v3.0 simplification fixed numpy / torch / tensorflow segfaults caused by the executor pool moving calls across OS threads. The fix is per-context worker pthreads with stable thread affinity. So far the regression contract has been indirect:
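The invariant being protected can be illustrated with a pure-Python toy (no numpy required): if every request to a context executes on one dedicated worker thread, `threading.get_native_id()` is identical across calls, which is the property thread-local native-extension state depends on. This is an illustration of the design, not the project's implementation; the `Context` class here is invented for the sketch:

```python
import queue
import threading

class Context:
    """Toy per-context worker: every submitted job runs on one dedicated
    thread, mimicking the per-context worker pthread that gives
    numpy/tensorflow stable thread affinity. Illustrative only."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._loop, daemon=True)
        self._worker.start()

    def _loop(self):
        while True:
            fn, reply = self._jobs.get()
            if fn is None:
                return
            reply.put(fn())  # job runs on the worker thread

    def call(self, fn):
        reply = queue.Queue()
        self._jobs.put((fn, reply))
        return reply.get()

    def close(self):
        self._jobs.put((None, None))

ctx = Context()
seen = []
lock = threading.Lock()

def client():
    # Each client thread (standing in for an Erlang process) targets the
    # same context; the job still executes on that context's one worker.
    nid = ctx.call(threading.get_native_id)
    with lock:
        seen.append(nid)

clients = [threading.Thread(target=client) for _ in range(4)]
for t in clients: t.start()
for t in clients: t.join()
ctx.close()

assert len(set(seen)) == 1  # one native id across all calls: stable affinity
```

An executor pool would hand jobs to whichever thread is free, so `seen` would contain several native ids; that thread-bouncing is what the real suites would turn into a crash or a wrong result.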
py_thread_affinity_SUITE only checks threading.get_native_id() invariants without exercising any library that has the thread-local state we care about.
This branch adds a test/py_ml_libs_SUITE.erl that drives real numpy and tensorflow operations through py_context:exec / eval / call and across multiple Erlang processes targeting the same context. If a future change re-introduces thread-bouncing, these tests crash or return wrong results.
Each case spins up a fresh py_context so heavy module state doesn't leak between cases and a single failure stays localized. The import probe writes the result into a Python variable, which lets us tell "module not installed" (skip) apart from a real native-extension fault (let it propagate).
The CI workflow now installs numpy on every leg that runs CT: the OTP/Python matrix, the ASan matrix, the free-threaded 3.13t job, and the FreeBSD job (via py<ver>-numpy derived from the existing python_pkg). TensorFlow remains user-skipped on CI; the cases would run if a developer chooses to install it locally.
Notes
- numpy_owngil_basic currently self-skips with a clear ImportError. The case will start running once numpy adds OWN_GIL support.
- TF_CPP_MIN_LOG_LEVEL=3 is set from the suite before any TF import.
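The TF log-level note refers to TensorFlow's standard environment knob, which only takes effect if it is in the environment before the first tensorflow import:

```python
import os

# TF_CPP_MIN_LOG_LEVEL must be set before tensorflow is first imported;
# "3" filters INFO, WARNING, and ERROR output from the C++ runtime, so
# CT logs stay readable when the TF cases do run.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

# import tensorflow as tf  # would now start without C++ log noise
```

Setting it after the import has no effect, which is why the suite has to do this before any TF case touches the module.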