Add numpy + tensorflow thread-affinity regression suite #62
Merged
Conversation
The v3.0 fix for numpy/torch/tensorflow segfaults was per-context worker pthreads with stable thread affinity (commit 8a7a68c). py_thread_affinity_SUITE checks threading.get_native_id invariants in isolation; this new suite drives real numpy and tensorflow operations through exec / eval / call and across multiple Erlang processes hitting the same context. Each case spins up a fresh py_context for isolation. Library imports use a Python try/except that flags import status into a variable, so missing modules surface as a clean skip while genuine errors still propagate. TensorFlow stays user-skipped on CI; numpy is now installed on every CT-running leg (Linux/macOS matrix, ASan, free-threaded 3.13t, FreeBSD via py<ver>-numpy). numpy 2.4 still rejects subinterpreter loading, so numpy_owngil_basic self-skips with that ImportError; the case will start running once numpy adds OWN_GIL support.
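The try/except import probe described above can be sketched in a few lines of Python. This is an illustration of the pattern, not the suite's actual source; the `probe` string and `_import_status` variable name are invented for the example:

```python
# Sketch of the import-probe pattern (illustrative names only).
# The probe stores the import outcome in a variable instead of letting
# ImportError escape, so the caller can tell "module not installed"
# (skip the test case) apart from a genuine fault (which propagates,
# because only ImportError is caught).
probe = """
try:
    import numpy
    _import_status = "ok"
except ImportError:
    _import_status = "missing"
"""

# Stand-in for running the probe through the context's exec: execute it
# in a namespace, then read the flag back out.
ns = {}
exec(probe, ns)
print(ns["_import_status"])  # "ok" if numpy is installed, else "missing"
```

Because only `ImportError` is caught, a numpy that is installed but crashes in its native extension still raises, which is exactly the failure this suite exists to catch.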
py312-numpy isn't published in FreeBSD 14.1's pkg repository yet, so pkg install fails the job before the build even starts. py_ml_libs_SUITE already self-skips when numpy isn't importable, so the right move is to log a warning and move on rather than block CI on an upstream package gap.
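A tolerant install step along those lines might look like the following GitHub Actions fragment. This is a sketch only: the step name and the `PY_PKG_VER` variable are assumptions, not the workflow's actual contents:

```yaml
# Sketch: best-effort numpy install for the FreeBSD leg. If the package
# isn't in the pkg repo yet, emit a workflow warning instead of failing;
# the ML suite self-skips when numpy can't be imported.
- name: Install numpy (best effort)
  run: |
    if ! pkg install -y "py${PY_PKG_VER}-numpy"; then
      echo "::warning::py${PY_PKG_VER}-numpy not in the pkg repo yet; py_ml_libs_SUITE will self-skip"
    fi
```

The `::warning::` prefix is GitHub Actions' workflow-command syntax for surfacing an annotation on the run without affecting its status.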
Summary
The v3.0 simplification fixed numpy / torch / tensorflow segfaults caused by the executor pool moving calls across OS threads. The fix is per-context worker pthreads with stable thread affinity. So far the regression contract has been indirect:
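The invariant being protected can be illustrated with a pure-Python toy (no numpy required): if every request to a context executes on one dedicated worker thread, `threading.get_native_id()` is identical across calls, which is the property thread-local native-extension state depends on. This is an illustration of the design, not the project's implementation; the `Context` class here is invented for the sketch:

```python
import queue
import threading

class Context:
    """Toy per-context worker: every submitted job runs on one dedicated
    thread, mimicking the per-context worker pthread that gives
    numpy/tensorflow stable thread affinity. Illustrative only."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._loop, daemon=True)
        self._worker.start()

    def _loop(self):
        while True:
            fn, reply = self._jobs.get()
            if fn is None:
                return
            reply.put(fn())  # job runs on the worker thread

    def call(self, fn):
        reply = queue.Queue()
        self._jobs.put((fn, reply))
        return reply.get()

    def close(self):
        self._jobs.put((None, None))

ctx = Context()
seen = []
lock = threading.Lock()

def client():
    # Each client thread (standing in for an Erlang process) targets the
    # same context; the job still executes on that context's one worker.
    nid = ctx.call(threading.get_native_id)
    with lock:
        seen.append(nid)

clients = [threading.Thread(target=client) for _ in range(4)]
for t in clients: t.start()
for t in clients: t.join()
ctx.close()

assert len(set(seen)) == 1  # one native id across all calls: stable affinity
```

An executor pool would hand jobs to whichever thread is free, so `seen` would contain several native ids; that thread-bouncing is what the real suites would turn into a crash or a wrong result.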
py_thread_affinity_SUITE only checks threading.get_native_id() invariants without exercising any library that has the thread-local state we care about.
This branch adds a test/py_ml_libs_SUITE.erl that drives real numpy and tensorflow operations through py_context:exec / eval / call and across multiple Erlang processes targeting the same context. If a future change re-introduces thread-bouncing, these tests crash or return wrong results.
Each case spins up a fresh py_context so heavy module state doesn't leak between cases and a single failure stays localized. The import probe writes the result into a Python variable, which lets us tell "module not installed" (skip) apart from a real native-extension fault (let it propagate).
The CI workflow now installs numpy on every leg that runs CT: the OTP/Python matrix, the ASan matrix, the free-threaded 3.13t job, and the FreeBSD job (via py<ver>-numpy derived from the existing python_pkg). TensorFlow remains user-skipped on CI; the cases would run if a developer chooses to install it locally.
Notes
- numpy_owngil_basic currently self-skips with a clear ImportError. The case will start running once numpy adds OWN_GIL support.
- TF_CPP_MIN_LOG_LEVEL=3 is set from the suite before any TF import.
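The TF log-level note refers to TensorFlow's standard environment knob, which only takes effect if it is in the environment before the first tensorflow import:

```python
import os

# TF_CPP_MIN_LOG_LEVEL must be set before tensorflow is first imported;
# "3" filters INFO, WARNING, and ERROR output from the C++ runtime, so
# CT logs stay readable when the TF cases do run.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

# import tensorflow as tf  # would now start without C++ log noise
```

Setting it after the import has no effect, which is why the suite has to do this before any TF case touches the module.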