Add numpy + tensorflow thread-affinity regression suite #62

Merged
benoitc merged 2 commits into main from
feature/ml-libs-tests
May 3, 2026
Conversation


@benoitc benoitc commented May 3, 2026

Summary

The v3.0 simplification fixed numpy / torch / tensorflow segfaults caused by the executor pool moving calls across OS threads. The fix is per-context worker pthreads with stable thread affinity. So far the regression contract has been indirect: py_thread_affinity_SUITE only checks threading.get_native_id() invariants without exercising any library that has the thread-local state we care about.
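To make the affinity invariant concrete, here is a minimal Python sketch of the property the per-context worker-pthread design guarantees: all work submitted to one context executes on a single OS thread, so repeated calls observe the same threading.get_native_id(). The WorkerContext class is a hypothetical stand-in for a per-context worker pthread, not the actual py_context implementation.

```python
import queue
import threading

class WorkerContext:
    """Hypothetical stand-in: one dedicated worker thread per context."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # Every submitted callable runs on this one thread, giving
        # stable thread affinity for thread-local native state.
        while True:
            fn, reply = self._tasks.get()
            if fn is None:
                break
            reply.put(fn())

    def call(self, fn):
        reply = queue.Queue()
        self._tasks.put((fn, reply))
        return reply.get()

    def close(self):
        self._tasks.put((None, None))
        self._thread.join()

ctx = WorkerContext()
ids = {ctx.call(threading.get_native_id) for _ in range(10)}
assert len(ids) == 1                        # one stable native thread id
assert threading.get_native_id() not in ids  # and it is not the caller's
ctx.close()
```

A thread pool that bounces calls across workers would fail the first assertion, which is exactly the regression the new suite is meant to catch through numpy and tensorflow state rather than bare thread ids.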

This branch adds test/py_ml_libs_SUITE.erl, which drives real numpy and tensorflow operations through py_context:exec / eval / call and across multiple Erlang processes targeting the same context. If a future change re-introduces thread-bouncing, these tests will crash or return wrong results.

Each case spins up a fresh py_context so heavy module state doesn't leak between cases and a single failure stays localized. The import probe writes its result into a Python variable, which lets us distinguish "module not installed" (clean skip) from a real native-extension fault (which is left to propagate).
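The probe pattern looks roughly like the following (a sketch; the flag name _np_ok is hypothetical, and in the real suite the snippet would run through py_context:exec rather than a local exec):

```python
# Store the import outcome in a variable instead of letting ImportError
# escape, so the caller can read the flag and decide to skip; any other
# exception type is not caught and still propagates as a real failure.
probe = """
try:
    import numpy
    _np_ok = True
except ImportError:
    _np_ok = False
"""

ns = {}
exec(probe, ns)  # stand-in for py_context:exec against the context

if ns["_np_ok"]:
    print("numpy importable, version", ns["numpy"].__version__)
else:
    print("numpy missing -> the CT case returns a skip")
```

Catching only ImportError is the important part: a segfault-adjacent native-extension error (or any non-import exception) is not swallowed into a skip.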

The CI workflow now installs numpy on every leg that runs CT: the OTP/Python matrix, the ASan matrix, the free-threaded 3.13t job, and the FreeBSD job (via py<ver>-numpy derived from the existing python_pkg). TensorFlow remains user-skipped on CI; the cases would run if a developer chooses to install it locally.

Notes

  • numpy 2.4's C extension still rejects subinterpreter loading, so numpy_owngil_basic currently self-skips with a clear ImportError. The case will start running once numpy adds OWN_GIL support.
  • TensorFlow's chatty C++ logging is suppressed by setting TF_CPP_MIN_LOG_LEVEL=3 from the suite before any TF import.
  • Local run on Python 3.14: 3 numpy cases pass, 1 numpy case skipped (subinterpreter limitation), 2 TensorFlow cases skipped (TF not installed locally). Full CT: 532 / 8 / 0.
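The logging suppression from the notes has one ordering constraint worth spelling out: TensorFlow's C++ logger reads TF_CPP_MIN_LOG_LEVEL at module load, so the variable must be in the environment before the first tensorflow import. A minimal sketch:

```python
import os

# Level 3 filters INFO, WARNING, and ERROR from the C++ side,
# leaving only FATAL. Must be set before tensorflow is imported.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

# Only after this would the suite attempt the (guarded) import;
# TensorFlow stays user-skipped on CI, so it is commented out here.
# import tensorflow as tf
```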

benoitc added 2 commits May 3, 2026 09:55
The v3.0 fix for numpy/torch/tensorflow segfaults was per-context worker
pthreads with stable thread affinity (commit 8a7a68c).
py_thread_affinity_SUITE checks threading.get_native_id invariants in
isolation; this new
suite drives real numpy and tensorflow operations through exec / eval /
call and across multiple Erlang processes hitting the same context.

Each case spins up a fresh py_context for isolation. Library imports use
a Python try/except that flags import status into a variable so missing
modules surface as a clean skip while genuine errors still propagate.
TensorFlow stays user-skipped on CI; numpy is now installed on every
CT-running leg (Linux/macOS matrix, ASan, free-threaded 3.13t, FreeBSD
via py<ver>-numpy).

numpy 2.4 still rejects subinterpreter loading, so numpy_owngil_basic
self-skips with that ImportError; the case will start running once
numpy adds OWN_GIL support.
py312-numpy isn't published in FreeBSD 14.1's pkg repository yet, so
pkg install fails the job before the build even starts. The ML SUITE
already self-skips when numpy isn't importable, so the right move is
to log a warning and move on rather than block CI on an upstream
package gap.
@benoitc benoitc merged commit c74bc39 into main May 3, 2026
27 of 28 checks passed
