cuda.core: Cythonize GraphBuilder and Graph with handle-layer cleanup#2008
Open
Andy-Jost wants to merge 4 commits intoNVIDIA:mainfrom
Open
cuda.core: Cythonize GraphBuilder and Graph with handle-layer cleanup#2008Andy-Jost wants to merge 4 commits intoNVIDIA:mainfrom
Andy-Jost wants to merge 4 commits intoNVIDIA:mainfrom
Conversation
…hine Refactor GraphBuilder from a Python class using _MembersNeededForFinalize to a cdef class with explicit _BuilderKind (PRIMARY/FORKED/CONDITIONAL_BODY) and _CaptureState (NOT_STARTED/CAPTURING/ENDED) tracking. Cleanup moves into __dealloc__/close, and the builder now uses GraphHandle/StreamHandle from _resource_handles instead of holding raw driver objects. Drop the is_stream_owner flag now that StreamHandle owns the lifetime. End-capture paths in __dealloc__ and close guard on _h_stream so cleanup is safe even if _init* fails before completing assignment. Made-with: Cursor
Add a GraphExecHandle to the resource-handle layer (parallel to GraphHandle) wrapping CUgraphExec with RAII cleanup via cuGraphExecDestroy on shared_ptr release. Convert Graph from a Python class using _MembersNeededForFinalize to a cdef class holding a typed _h_graph_exec attribute, dropping the weakref.finalize machinery. update/upload/launch move to nogil cydriver paths consistent with the GraphBuilder rewrite. Also drop quoted forward-reference annotations on create_graph_builder and _instantiate_graph/complete now that GraphBuilder is cimported in _device.pyx and _stream.pyx and Cython accepts the in-module forward reference to Graph. Clears the related "Strings should no longer be used for type declarations" warnings. Made-with: Cursor
The cdef-class member declarations live in the .pxd, so the .pyx does not need to re-cimport GraphExecHandle, GraphHandle, or StreamHandle. Made-with: Cursor
… cycle cimport-ing GraphBuilder at the top of _stream.pyx and _device.pyx made Cython emit a Python-level import of cuda.core.graph._graph_builder during _stream module init. That triggered the chain graph -> _graph_node -> _kernel_arg_handler -> _memory._buffer -> _device, which then re-entered the still-initializing _stream module via "from cuda.core._stream import IsStreamT", failing with ImportError: cannot import name IsStreamT. Restore the original lazy "import GraphBuilder" inside create_graph_builder (Stream and Device) and Stream_accept. The return annotations stay as bare names; "from __future__ import annotations" in both files defers their evaluation, so they need not resolve at function-definition time. Made-with: Cursor
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Convert
GraphBuilderandGraphfrom Python classes (using_MembersNeededForFinalize+weakref.finalize) to Cythoncdef classobjects backed by typed C++ resource handles.This does two things. First, it lays groundwork for step 3 of #1330 (graph updates) by giving graph objects the same handle-based ownership pattern as the rest of
cuda.core. Second, it clarifiesGraphBuilder's state machine: what used to be a tangle of implicit flags and conditional cleanup paths is now two orthogonal enums —_BuilderKind(PRIMARY/FORKED/CONDITIONAL_BODY) describing how the builder was created, and_CaptureState(CAPTURE_NOT_STARTED/CAPTURING/CAPTURE_ENDED) tracking the capture lifecycle. Methods can now check exactly the state they care about, illegal transitions are detectable, and__dealloc__has a single, well-defined condition for ending capture.Changes
GraphExecHandleto the resource-handle layer (_cpp/resource_handles.{hpp,cpp},_resource_handles.{pxd,pyx}), wrappingCUgraphExecwith acuGraphExecDestroy-based deleter run underGILReleaseGuard.GraphBuilderbecomes acdef classwith the explicit_BuilderKind/_CaptureStateenums described above. Live-API methods (begin_building,end_building,embed,split,join, etc.) move tonogilcydriverpaths where practical, and end-of-capture in__dealloc__runs against the cachedStreamHandlerather than reaching into a possibly-clearedStreamattribute.Graphbecomes acdef classholdingGraphExecHandle _h_graph_execdirectly;update/upload/launchmove tonogilcydriver.weakref.finalizeis gone.Device.create_graph_builderandStream.create_graph_buildercimportGraphBuilderand call its_initfactory; quoted forward-reference annotations are removed (clears Cython "Strings should no longer be used for type declarations" warnings).Related work