cuda.core: Cythonize GraphBuilder and Graph with handle-layer cleanup#2008

Open
Andy-Jost wants to merge 4 commits into NVIDIA:main from Andy-Jost:graph-builder-refactor

@Andy-Jost Andy-Jost commented May 1, 2026

Summary

Convert GraphBuilder and Graph from Python classes (using _MembersNeededForFinalize + weakref.finalize) to Cython cdef class objects backed by typed C++ resource handles.

This does two things. First, it lays groundwork for step 3 of #1330 (graph updates) by giving graph objects the same handle-based ownership pattern as the rest of cuda.core. Second, it clarifies GraphBuilder's state machine: what used to be a tangle of implicit flags and conditional cleanup paths is now two orthogonal enums — _BuilderKind (PRIMARY/FORKED/CONDITIONAL_BODY) describing how the builder was created, and _CaptureState (CAPTURE_NOT_STARTED/CAPTURING/CAPTURE_ENDED) tracking the capture lifecycle. Methods can now check exactly the state they care about, illegal transitions are detectable, and __dealloc__ has a single, well-defined condition for ending capture.
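
The two-enum design can be sketched in plain Python. This is only an illustrative model, not the Cython code from the PR: the class `BuilderModel`, its method bodies, and the cleanup rule in `needs_end_capture_on_cleanup` are assumptions made for the example. Only the enum member names are taken from the description above.

```python
from enum import Enum, auto


class BuilderKind(Enum):
    """How the builder was created (member names from the PR description)."""
    PRIMARY = auto()
    FORKED = auto()
    CONDITIONAL_BODY = auto()


class CaptureState(Enum):
    """Where the builder is in the capture lifecycle."""
    CAPTURE_NOT_STARTED = auto()
    CAPTURING = auto()
    CAPTURE_ENDED = auto()


class BuilderModel:
    """Hypothetical pure-Python stand-in; makes no CUDA calls."""

    def __init__(self, kind):
        self.kind = kind
        self.state = CaptureState.CAPTURE_NOT_STARTED

    def begin_building(self):
        # Assumed rule: capture may only start once.
        if self.state is not CaptureState.CAPTURE_NOT_STARTED:
            raise RuntimeError(f"cannot begin building in state {self.state.name}")
        self.state = CaptureState.CAPTURING

    def end_building(self):
        # Assumed rule: capture may only end while it is in progress.
        if self.state is not CaptureState.CAPTURING:
            raise RuntimeError(f"cannot end building in state {self.state.name}")
        self.state = CaptureState.CAPTURE_ENDED

    def needs_end_capture_on_cleanup(self):
        # Assumed cleanup condition: only an in-progress capture must be ended.
        return self.state is CaptureState.CAPTURING
```

Because the kind and the capture state are separate fields, a method that only cares about the lifecycle never has to inspect how the builder was created, and an illegal transition surfaces as an explicit error instead of a silently inconsistent flag.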

Changes

  • Add GraphExecHandle to the resource-handle layer (_cpp/resource_handles.{hpp,cpp}, _resource_handles.{pxd,pyx}), wrapping CUgraphExec with a cuGraphExecDestroy-based deleter run under GILReleaseGuard.
  • GraphBuilder becomes a cdef class with the explicit _BuilderKind/_CaptureState enums described above. Live-API methods (begin_building, end_building, embed, split, join, etc.) move to nogil cydriver paths where practical, and end-of-capture in __dealloc__ runs against the cached StreamHandle rather than reaching into a possibly-cleared Stream attribute.
  • Graph becomes a cdef class holding GraphExecHandle _h_graph_exec directly; update/upload/launch move to nogil cydriver. weakref.finalize is gone.
  • Device.create_graph_builder and Stream.create_graph_builder cimport GraphBuilder and call its _init factory; quoted forward-reference annotations are removed (clears Cython "Strings should no longer be used for type declarations" warnings).
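
For context, the `weakref.finalize` pattern that these changes remove looks roughly like the following. This is a guess at the general shape, not the actual cuda.core code: `_Members`, `OldStyleGraph`, `fake_destroy`, and the integer handles are hypothetical stand-ins, and no driver calls are made.

```python
import weakref

destroyed = []  # records which fake handles have been released


def fake_destroy(handle):
    """Hypothetical stand-in for a driver destroy call."""
    destroyed.append(handle)


class _Members:
    """Holds the state the finalizer needs, separately from the owner object."""

    __slots__ = ("handle",)

    def __init__(self, owner, handle):
        self.handle = handle
        # The callback references this helper, not the owner, so the owner
        # can still be garbage collected and trigger the finalizer.
        weakref.finalize(owner, self.close)

    def close(self):
        # Idempotent: a second call finds the handle already cleared.
        if self.handle is not None:
            fake_destroy(self.handle)
            self.handle = None


class OldStyleGraph:
    """Python-level owner that delegates cleanup to the helper object."""

    def __init__(self, handle):
        self._members = _Members(self, handle)

    def close(self):
        self._members.close()
```

With a typed handle that carries its own deleter, the helper object and the finalizer registration are no longer needed: releasing the last reference to the handle performs the cleanup.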

Related work

Andy-Jost added 3 commits May 1, 2026 14:52
…hine

Refactor GraphBuilder from a Python class using _MembersNeededForFinalize
to a cdef class with explicit _BuilderKind (PRIMARY/FORKED/CONDITIONAL_BODY)
and _CaptureState (NOT_STARTED/CAPTURING/ENDED) tracking. Cleanup moves
into __dealloc__/close, and the builder now uses GraphHandle/StreamHandle
from _resource_handles instead of holding raw driver objects. Drop the
is_stream_owner flag now that StreamHandle owns the lifetime.

End-capture paths in __dealloc__ and close guard on _h_stream so cleanup
is safe even if _init* fails before completing assignment.

Made-with: Cursor
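
The guard on `_h_stream` described in the commit message above can be illustrated with a small pure-Python model. Everything here is an assumption for the sake of the example: `FakeStreamHandle`, `BuilderSketch`, the `_capturing` flag, and the two-phase `_init` are hypothetical and only mimic the idea that cleanup must tolerate a partially initialized object.

```python
class FakeStreamHandle:
    """Hypothetical stand-in for a stream handle; counts end-of-capture calls."""

    def __init__(self):
        self.end_capture_calls = 0

    def end_capture(self):
        self.end_capture_calls += 1


class BuilderSketch:
    def __init__(self):
        # Defaults are set first so close() is always safe to call.
        self._h_stream = None
        self._capturing = False

    def _init(self, make_stream):
        # Second-phase init: make_stream may raise before _h_stream is assigned.
        self._h_stream = make_stream()
        self._capturing = True
        return self

    def close(self):
        # Guard on the handle: do nothing if initialization never completed.
        if self._h_stream is not None and self._capturing:
            self._h_stream.end_capture()
        self._capturing = False
        self._h_stream = None


def failing_factory():
    """Simulates an initialization step that fails."""
    raise RuntimeError("stream creation failed")
```

Calling `close()` on an object whose `_init` raised is a no-op, and calling it twice on a fully initialized object ends capture only once.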
Add a GraphExecHandle to the resource-handle layer (parallel to
GraphHandle) wrapping CUgraphExec with RAII cleanup via
cuGraphExecDestroy on shared_ptr release. Convert Graph from a Python
class using _MembersNeededForFinalize to a cdef class holding a typed
_h_graph_exec attribute, dropping the weakref.finalize machinery.
update/upload/launch move to nogil cydriver paths consistent with the
GraphBuilder rewrite.

Also drop quoted forward-reference annotations on create_graph_builder
and _instantiate_graph/complete now that GraphBuilder is cimported in
_device.pyx and _stream.pyx and Cython accepts the in-module forward
reference to Graph. Clears the related "Strings should no longer be
used for type declarations" warnings.

Made-with: Cursor

The cdef-class member declarations live in the .pxd, so the .pyx does
not need to re-cimport GraphExecHandle, GraphHandle, or StreamHandle.

Made-with: Cursor
@Andy-Jost Andy-Jost added this to the cuda.core v1.0.0 milestone May 1, 2026
@Andy-Jost Andy-Jost added the enhancement (Any code-related improvements), P1 (Medium priority - Should do), and cuda.core (Everything related to the cuda.core module) labels May 1, 2026
@Andy-Jost Andy-Jost self-assigned this May 1, 2026
… cycle

cimport-ing GraphBuilder at the top of _stream.pyx and _device.pyx made
Cython emit a Python-level import of cuda.core.graph._graph_builder
during _stream module init. That triggered the chain
graph -> _graph_node -> _kernel_arg_handler -> _memory._buffer
-> _device, which then re-entered the still-initializing _stream module
via "from cuda.core._stream import IsStreamT", failing with
ImportError: cannot import name IsStreamT.

Restore the original lazy "import GraphBuilder" inside
create_graph_builder (Stream and Device) and Stream_accept. The return
annotations stay as bare names; "from __future__ import annotations" in
both files defers their evaluation, so they need not resolve at
function-definition time.

Made-with: Cursor
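
The combination of a lazy import and deferred annotations described in the commit message above can be demonstrated in isolation. This is a minimal sketch, not the cuda.core code: the module name `demo_stream` is made up, and `collections.OrderedDict` stands in for the real `GraphBuilder` class. The module source is run through `exec` only so that the example stays self-contained.

```python
import types

# Source of a hypothetical module; the names are illustrative stand-ins.
MODULE_SOURCE = '''
from __future__ import annotations


def create_graph_builder() -> GraphBuilder:
    # Lazy import: resolved on the first call, not at module import time.
    # collections.OrderedDict stands in for the real GraphBuilder class.
    from collections import OrderedDict as GraphBuilder

    return GraphBuilder()
'''

demo = types.ModuleType("demo_stream")
exec(MODULE_SOURCE, demo.__dict__)

# The bare-name annotation is stored as a string, so defining the function
# succeeds even though GraphBuilder is not a module-level name.
annotation = demo.create_graph_builder.__annotations__["return"]

# The import inside the function body only runs when the function is called.
builder = demo.create_graph_builder()
```

Because the import happens inside the function body, importing the module itself never pulls in the builder module, which is what breaks the cycle.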