Merged
3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
@@ -92,6 +92,9 @@ jobs:
- name: Run xref
run: rebar3 xref

- name: Lint docs
run: escript scripts/lint_doc_snippets.escript

# FreeBSD test using cross-platform action
test-freebsd:
name: FreeBSD 14 / Python ${{ matrix.python }}
20 changes: 20 additions & 0 deletions Makefile
@@ -0,0 +1,20 @@
.PHONY: all compile test lint-docs clean

all: compile

compile:
rebar3 compile

test:
rebar3 ct --readable=compact

# Validate fenced code blocks in README.md and docs/*.md.
# Erlang `py:Fn(...)` calls must reference a real export at the right
# arity; Python blocks must parse (IndentationError tolerated for
# tutorial fragments). Mark a block to skip with `<!-- skip-lint -->`
# on the line immediately above the opening fence.
lint-docs: compile
escript scripts/lint_doc_snippets.escript

clean:
rebar3 clean
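The `lint-docs` comment above pins down the checker's contract: walk the markdown, collect fenced blocks, honour `<!-- skip-lint -->` on the line immediately above an opening fence, and tolerate `IndentationError` in Python fragments. The real checker is `scripts/lint_doc_snippets.escript` (Erlang); the sketch below is a hypothetical Python rendering of the same scan, with all names illustrative:

```python
import re

# An opening or closing fence: three backticks plus an optional language tag.
FENCE = re.compile(r"^```(\w*)\s*$")

def collect_blocks(text):
    """Yield (lang, body, skipped) for each fenced block in markdown text.

    A block is skipped when the line immediately above its opening
    fence is exactly '<!-- skip-lint -->'.
    """
    lines = text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        m = FENCE.match(lines[i])
        if m and m.group(1):  # opening fence with a language tag
            skipped = i > 0 and lines[i - 1].strip() == "<!-- skip-lint -->"
            body = []
            i += 1
            while i < len(lines) and not FENCE.match(lines[i]):
                body.append(lines[i])
                i += 1
            blocks.append((m.group(1), "\n".join(body), skipped))
        i += 1
    return blocks

def lint_python(body):
    """Return an error string for unparsable Python, or None if it passes."""
    try:
        compile(body, "<doc>", "exec")
    except IndentationError:
        pass  # tolerated: tutorial fragments are often mid-function
    except SyntaxError as exc:
        return str(exc)
    return None
```

The Erlang side of the rule (checking `py:Fn(...)` calls against real exports and arities) needs the compiled beam files, which is why the Makefile target depends on `compile`.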
9 changes: 6 additions & 3 deletions docs/migration.md
@@ -38,6 +38,7 @@ application:set_env(erlang_python, context_mode, owngil).

**`py:num_executors/0`** - Removed. Contexts now use per-context worker threads.

<!-- skip-lint -->
```erlang
%% v2.x - check executor count
N = py:num_executors().
@@ -254,6 +255,7 @@ N = py_context_router:num_contexts().
The function for non-blocking Python calls has been renamed to follow gen_server conventions:

**Before (v1.8.x):**
<!-- skip-lint -->
```erlang
Ref = py:call_async(math, factorial, [100]),
{ok, Result} = py:await(Ref).
@@ -355,6 +357,7 @@ For more sophisticated web framework integration, consider the [Reactor API](rea
The process-binding functions have been removed. The new architecture uses `py_context_router` for automatic scheduler-affinity routing.

**Before (v1.8.x):**
<!-- skip-lint -->
```erlang
ok = py:bind(),
ok = py:exec(<<"x = 42">>),
@@ -760,9 +763,9 @@ ImportError: module does not support subinterpreters
```

Options:
1. Use Python < 3.12 (falls back to multi_executor mode)
2. Check if the library has a subinterpreter-compatible version
3. Isolate the library usage to a single context
1. Use Python 3.12 or 3.13: the runtime falls back to `worker` mode (subinterpreters require Python 3.14+).
2. Check if the library has a subinterpreter-compatible version.
3. Isolate the library usage to a single context.

### Python 3.14: `erlang_loop_import_failed`

61 changes: 46 additions & 15 deletions docs/owngil_internals.md
@@ -425,22 +425,53 @@ class EchoProtocol(reactor.Protocol):

## Performance Characteristics

| Operation | Shared-GIL | OWN_GIL |
|-----------|-----------|---------|
| Operation | Worker (shared GIL) | OWN_GIL |
|-----------|--------------------|---------|
| Call overhead | ~2.5μs | ~10μs |
| Throughput (single) | 400K/s | 100K/s |
| Parallelism | None | True |
| Resource usage | Lower | Higher (1 pthread per context) |

Use OWN_GIL when:
- CPU-bound Python work that benefits from parallelism
- Long-running computations
- Need true concurrent Python execution

Use worker mode when:
- I/O-bound or short operations
- High call frequency
- Resource constraints
| Throughput (single context) | ~400K/s | ~100K/s |
| Parallelism (N contexts) | GIL-bound | Linear up to N cores |
| Resource usage | One pthread per context | One pthread + one subinterpreter per context |
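The table's figures can be sanity-checked with a back-of-envelope model. The helper below is an illustration of the scaling claim, not part of the library: the 2.5μs/10μs inputs come from the table above, while the core count and the treatment of worker mode as fully GIL-serialised are assumptions (free-threaded builds change the worker-mode picture).

```python
def projected_throughput(overhead_us, contexts=1, cores=8, parallel=True):
    """Rough calls/sec implied by per-call overhead alone.

    Worker mode (parallel=False) serialises on the main GIL, so extra
    contexts add no throughput; OWN_GIL scales linearly up to the core
    count. Real workloads add Python execution time on top of this.
    """
    per_context = 1_000_000 / overhead_us  # calls/sec for one context
    effective = min(contexts, cores) if parallel else 1
    return per_context * effective

# The single-context rows fall out directly:
# worker:  1e6 / 2.5 -> 400_000 calls/s
# OWN_GIL: 1e6 / 10  -> 100_000 calls/s
```

The crossover is visible immediately: four OWN_GIL contexts already out-run a single worker context on CPU-bound work, despite the 4x per-call penalty.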

## Pros and Cons

### Pros

- **True CPU parallelism.** Each context owns its GIL, so N contexts run on N cores at once. Worker mode serialises on the main GIL unless Python is built free-threaded (3.13t+).
- **Crash isolation.** A C-level fault in one subinterpreter leaves the others alive. Worker mode shares the main interpreter, so a corrupt module state can take everything down.
- **Clean namespace per context.** Each subinterpreter has its own `sys.modules`, so module-level state cannot bleed between contexts. Useful when running adversarial or untrusted code paths side by side.
- **Predictable scheduling.** Requests are dispatched via a mutex/condvar handoff, not dirty schedulers, so OWN_GIL contexts will not be starved by other dirty NIF traffic.

### Cons

- **Python 3.14+ only.** Earlier versions have C-extension global-state bugs (`_decimal`, `numpy`, etc.) that crash inside subinterpreters. See [cpython#106078](https://github.com/python/cpython/issues/106078).
- **Higher per-call latency.** ~4x the round-trip cost of worker mode (~10μs vs ~2.5μs) because every call crosses a mutex/condvar handoff to the dedicated thread.
- **Higher memory.** Each subinterpreter imports its own copy of every module. A 50 MB module set across 8 contexts is ~400 MB resident, not 50 MB.
- **C-extension compatibility is not universal.** Extensions must opt in via the multi-phase init protocol (PEP 489) and `Py_mod_multiple_interpreters`. Pure-Python and well-behaved C extensions work; older ones fail at import inside the subinterpreter.
- **No shared Python state.** Module globals, class definitions, and cached objects are per-interpreter. Use `py:state_store/2` (ETS-backed) or `erlang.send` for cross-context data.
- **Callback re-entry is restricted.** When Python in an OWN_GIL context calls `erlang.call`, the callback runs on a thread worker, not back on the OWN_GIL thread (which cannot suspend). Re-entrant Python -> Erlang -> *same* OWN_GIL context calls will not work; use a different context for the nested call, or use `erlang.async_call` from asyncio code.
- **Process-local envs do not span interpreters.** A `py_env_resource_t` is bound to the interpreter that created it. Reusing one across contexts returns `{error, env_wrong_interpreter}`.
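The memory bullet above is simple multiplication, but it is the number that most often surprises people when sizing a deployment. A quick budgeting helper; the 50 MB x 8 example is the doc's own figure, and ignoring per-interpreter overhead beyond the module set is a simplifying assumption:

```python
def resident_mb(module_set_mb, contexts, mode="owngil"):
    """Rough resident-memory estimate for module imports only.

    Worker mode shares one interpreter, so the module set is loaded
    once; OWN_GIL loads a private copy per subinterpreter, so usage
    grows linearly with context count.
    """
    copies = contexts if mode == "owngil" else 1
    return module_set_mb * copies
```

With a 50 MB module set, `resident_mb(50, 8)` gives the ~400 MB cited above, versus 50 MB for worker mode with any context count.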

### When to Use Each

Use **OWN_GIL** when:

- The workload is CPU-bound Python (ML inference, numpy/torch compute, parsing, codecs) and you want N-way parallelism per BEAM scheduler.
- You can pin the per-context memory budget and the modules in use are subinterpreter-safe.
- You are on Python 3.14+.

Use **worker** (default) when:

- You are on Python 3.12 or 3.13.
- Calls are short and frequent (every microsecond of overhead matters).
- You are running modules that are not subinterpreter-safe (some scientific stacks, older C extensions).
- You are already running free-threaded Python (3.13t+); worker mode gets parallelism for free without the per-interpreter memory cost.

### Common Pitfalls

- **Importing once is not enough.** Imports happen per subinterpreter. Pre-warming a worker context will not pre-warm the OWN_GIL contexts; do it inside each `py_context`.
- **Sharing Python objects across contexts.** Passing a `PyObject*` reference (via `py_state` or otherwise) between OWN_GIL contexts is undefined behaviour. Round-trip through Erlang terms or ETS-backed state.
- **Long-running tasks block the dispatcher.** A single OWN_GIL context processes one request at a time. If you have a 30-second compute job, parallelise across contexts; do not queue everything onto context 1.
- **Callback storms.** Heavy `erlang.call` use inside an OWN_GIL context routes to thread workers, which is fine, but the round-trip cost is then worker-style on top of OWN_GIL dispatch. For tight callback loops, prefer worker mode end-to-end.
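The "one request at a time" pitfall can be modelled with plain stdlib executors: each `ThreadPoolExecutor(max_workers=1)` below stands in for one OWN_GIL context's dispatch queue. This is a toy analogy, not the `py` API; the round-robin policy is one assumed strategy for spreading work rather than anything the router guarantees.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

class ContextPool:
    """Toy model of N OWN_GIL contexts.

    Each context processes one request at a time, so submissions are
    spread round-robin across N single-worker queues instead of
    piling everything onto context 1.
    """

    def __init__(self, n):
        self._queues = [ThreadPoolExecutor(max_workers=1) for _ in range(n)]
        self._rr = itertools.cycle(range(n))

    def submit(self, fn, *args):
        # Next context in rotation gets the job; a slow job only
        # delays requests queued behind it on the same context.
        return self._queues[next(self._rr)].submit(fn, *args)

    def shutdown(self):
        for q in self._queues:
            q.shutdown()
```

With four contexts, four 30-second compute jobs finish in roughly 30 seconds instead of two minutes; queued onto one context they serialise.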

## Benchmarking

8 changes: 6 additions & 2 deletions docs/scalability.md
@@ -108,6 +108,10 @@ Ctx = py:context(1),
- Higher memory usage (each interpreter loads modules separately)
- Some C extensions don't support subinterpreters
- Requires Python 3.14+
- Higher per-call latency (~4x worker)
- Callback re-entry to the same context is restricted (`erlang.call` from inside an OWN_GIL context routes to a thread worker, not back to that context)

For a fuller breakdown of OWN_GIL tradeoffs, common pitfalls, and a usage decision guide, see [OWN_GIL Internals: Pros and Cons](owngil_internals.md#pros-and-cons).

## Subinterpreter Architecture

@@ -144,7 +148,7 @@ Ctx = py:context(1),
│ │ └──────────┘ │ │ └──────────┘ │ │ └──────────┘ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ Each thread owns its interpreter's GIL (Py_GIL_OWN)
│ Each thread owns its GIL (PyInterpreterConfig_OWN_GIL)
│ No GIL contention between threads │
└─────────────────────────────────────────────────────────────────┘
```
@@ -155,7 +159,7 @@ Ctx = py:context(1),

**py_context_process**: Gen_server that owns a Python context reference and handles call/eval/exec operations.

**Subinterpreter Thread Pool (C)**: Manages N threads, each with its own Python subinterpreter created with `Py_NewInterpreterFromConfig()` and `Py_GIL_OWN`.
**Subinterpreter Thread Pool (C)**: Manages N threads, each with its own Python subinterpreter created with `Py_NewInterpreterFromConfig()` and `PyInterpreterConfig_OWN_GIL`.

### Request Flow

2 changes: 2 additions & 0 deletions docs/security.md
@@ -42,6 +42,7 @@ This provides defense-in-depth - even if Python code tries to import `os` or `su

When blocked operations are attempted, you'll see:

<!-- skip-lint -->
```python
>>> import subprocess
>>> subprocess.run(['ls'])
@@ -50,6 +51,7 @@ fork()/exec() would corrupt the Erlang runtime.
Use Erlang ports (open_port/2) for subprocess management.
```
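The defence-in-depth described above amounts to replacing process-spawning entry points before user code runs. A simplified sketch of the idea: the real enforcement happens below the interpreter (so it also covers C extensions), the function names patched here are only the obvious Python-level ones, and the message text is taken from the error shown above.

```python
import os
import subprocess

_BLOCK_MSG = ("fork()/exec() would corrupt the Erlang runtime. "
              "Use Erlang ports (open_port/2) for subprocess management.")

def _blocked(*_args, **_kwargs):
    raise RuntimeError(_BLOCK_MSG)

def install_guards():
    """Replace process-spawning entry points so attempts fail loudly.

    A pure-Python approximation of the runtime's guard: any call to a
    patched entry point raises instead of forking under the BEAM.
    """
    os.fork = _blocked
    os.execv = _blocked
    subprocess.run = _blocked
    subprocess.Popen = _blocked
```

A pure-Python guard like this is advisory only; code can re-import or reach the syscall via C, which is why the library enforces the block at a lower layer.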

<!-- skip-lint -->
```python
>>> import os
>>> os.fork()
```
17 changes: 11 additions & 6 deletions docs/shared-dict.md
@@ -279,16 +279,21 @@ ok = py:shared_dict_destroy(Session).
%% Create shared cache
{ok, Cache} = py:shared_dict_new(),

%% Python can populate the cache
%% Inject the handle into Python globals (py:exec/1 has no locals
%% argument, so we stash it via py:eval with a side effect).
{ok, _} = py:eval(
<<"(globals().__setitem__('_cache_handle', handle), None)[-1]">>,
#{handle => Cache}),

%% Python can now populate the cache
ok = py:exec(<<"
from erlang import SharedDict
cache = SharedDict(handle)
cache['computed'] = expensive_computation()
">>,
ok = py:eval(<<"1">>, #{<<"handle">> => Cache}),
cache = SharedDict(_cache_handle)
cache['computed'] = 42
">>),

%% Erlang can read cached values
CachedValue = py:shared_dict_get(Cache, <<"computed">>).
42 = py:shared_dict_get(Cache, <<"computed">>).
```

## See Also