-
Notifications
You must be signed in to change notification settings - Fork 278
cuda.core: require explicit stream for stream-scheduling APIs #2020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
9943b1e
23d8d8b
7c4debe
c2d4dfe
5fd0958
91a3211
ca2f5f7
91807ee
5e9a5ea
e1d3b8c
b88ebc6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,7 +24,7 @@ from cuda.core._resource_handles cimport ( | |
| ) | ||
| from cuda.core.typing import DevicePointerType | ||
|
|
||
| from cuda.core._stream cimport Stream, Stream_accept | ||
| from cuda.core._stream cimport Stream, Stream_accept, default_stream | ||
| from cuda.core._utils.cuda_utils cimport HANDLE_RETURN, _parse_fill_value | ||
|
|
||
| import sys | ||
|
|
@@ -49,12 +49,24 @@ cdef void _mr_dealloc_callback( | |
| size_t size, | ||
| const StreamHandle& h_stream, | ||
| ) noexcept: | ||
| """Called by the C++ deleter to deallocate via MemoryResource.deallocate.""" | ||
| """Called by the C++ deleter to deallocate via MemoryResource.deallocate. | ||
|
|
||
| This is the C++ teardown path: there is no Python caller frame from | ||
| which to obtain a stream. If the device-pointer handle was created | ||
| without ``set_deallocation_stream`` being called (e.g. buffers minted | ||
| via ``Buffer.from_handle(ptr, size, mr=mr)`` from DLPack import, | ||
| third-party adapters, or other foreign sources), ``h_stream`` is | ||
| empty here. Stream-ordered MR ``deallocate`` overrides reject | ||
| ``stream=None`` (issue #2001), so without a fallback the destructor | ||
| would print a warning and leak the allocation. Fall back to the | ||
| legacy/per-thread default stream so the free still happens; this is | ||
| the unique exception to the "no implicit default-stream fallback" | ||
| policy because the teardown has no other source of truth. | ||
| """ | ||
| cdef Stream stream | ||
| try: | ||
| stream = None | ||
| if h_stream: | ||
| stream = Stream._from_handle(Stream, h_stream) | ||
| mr.deallocate(int(ptr), size, stream) | ||
| stream = Stream._from_handle(Stream, h_stream) if h_stream else default_stream() | ||
| mr.deallocate(int(ptr), size, stream=stream) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My apology. It seems my bot was hallucinating? I thought we guaranteed already that we always have a deallocation stream. There should be no way that we would need the default stream here. @Andy-Jost could you confirm? Below is bot-generated: For pool allocations via _MemPool.allocate(): the invariant holds — h_stream is always non-zero. In deviceptr_alloc_from_pool() (resource_handles.cpp:719), the stream is stored in the DevicePtrBox at creation and captured by the shared_ptr deleter. Same for Buffer.from_ipc_descriptor(). So for the normal async MR allocation path, the default_stream() fallback is unreachable. The one path that breaks the invariant: Buffer.from_handle(ptr, size, mr=some_mr). This calls deviceptr_create_with_mr() (resource_handles.cpp:847) which creates the box with StreamHandle{} (empty). No stream parameter is accepted by from_handle(). So if someone constructs a buffer via from_handle(mr=pool_mr), the callback sees h_stream=0. So the question is: is Buffer.from_handle(mr=pool_mr) with an async MR a legitimate use case? If not, the default_stream() fallback papers over something that should be an error. If yes, then from_handle() arguably should accept an optional stream when mr is provided. For the test Andy added (test_mr_dealloc_callback_falls_back_to_default_stream), it constructs a buffer via Buffer.from_handle(1, 1024, mr=mr) — which is exactly the from_handle path, not the pool allocation path. This confirms the fallback only matters for from_handle, not for normal pool allocations. Should this be flagged in the review? The options seem to be:
|
||
| except Exception as exc: | ||
| print(f"Warning: mr.deallocate() failed during Buffer destruction: {exc}", | ||
| file=sys.stderr) | ||
|
|
@@ -119,7 +131,11 @@ cdef class Buffer: | |
|
|
||
| @staticmethod | ||
| def _reduce_helper(mr, ipc_descriptor): | ||
| return Buffer.from_ipc_descriptor(mr, ipc_descriptor) | ||
| # The parent process's stream is not portable across processes, so the | ||
| # pickle path cannot thread an explicit stream through. Seed the | ||
| # imported buffer's deallocation with the current context's default | ||
| # stream; the receiver can override via buffer.close(stream). | ||
| return Buffer.from_ipc_descriptor(mr, ipc_descriptor, stream=default_stream()) | ||
|
leofang marked this conversation as resolved.
|
||
|
|
||
| def __reduce__(self): | ||
| # Must not serialize the parent's stream! | ||
|
|
@@ -158,9 +174,20 @@ cdef class Buffer: | |
| @classmethod | ||
| def from_ipc_descriptor( | ||
| cls, mr: DeviceMemoryResource | PinnedMemoryResource, ipc_descriptor: IPCBufferDescriptor, | ||
| stream: Stream = None | ||
| *, stream: Stream | ||
| ) -> Buffer: | ||
| """Import a buffer that was exported from another process.""" | ||
| """Import a buffer that was exported from another process. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| mr : :obj:`~_memory.DeviceMemoryResource` | :obj:`~_memory.PinnedMemoryResource` | ||
| The IPC-enabled memory resource matching the exporting process. | ||
| ipc_descriptor : :obj:`~_memory.IPCBufferDescriptor` | ||
| The descriptor exported from another process. | ||
| stream : :obj:`~_stream.Stream` | ||
| Keyword-only. The stream used for asynchronous deallocation when | ||
| the buffer is closed or garbage collected. | ||
| """ | ||
| return _ipc.Buffer_from_ipc_descriptor(cls, mr, ipc_descriptor, stream) | ||
|
|
||
| @property | ||
|
|
@@ -215,7 +242,7 @@ cdef class Buffer: | |
| if self._memory_resource is None: | ||
| raise ValueError("a destination buffer must be provided (this " | ||
| "buffer does not have a memory_resource)") | ||
| dst = self._memory_resource.allocate(src_size, s) | ||
| dst = self._memory_resource.allocate(src_size, stream=s) | ||
|
|
||
| cdef size_t dst_size = dst._size | ||
| if dst_size != src_size: | ||
|
|
@@ -490,17 +517,17 @@ cdef class MemoryResource: | |
| resource's respective property.) | ||
| """ | ||
|
|
||
| def allocate(self, size_t size, stream: Stream | GraphBuilder | None = None) -> Buffer: | ||
| def allocate(self, size_t size, *, stream: Stream | GraphBuilder) -> Buffer: | ||
| """Allocate a buffer of the requested size. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| size : int | ||
| The size of the buffer to allocate, in bytes. | ||
| stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder`, optional | ||
| The stream on which to perform the allocation asynchronously. | ||
| If None, it is up to each memory resource implementation to decide | ||
| and document the behavior. | ||
| stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder` | ||
| Keyword-only. The stream on which to perform the allocation | ||
| asynchronously. Must be passed explicitly; pass | ||
| ``device.default_stream`` to use the default stream. | ||
|
|
||
| Returns | ||
| ------- | ||
|
|
@@ -510,7 +537,7 @@ cdef class MemoryResource: | |
| """ | ||
| raise TypeError("MemoryResource.allocate must be implemented by subclasses.") | ||
|
|
||
| def deallocate(self, ptr: DevicePointerType, size_t size, stream: Stream | GraphBuilder | None = None): | ||
| def deallocate(self, ptr: DevicePointerType, size_t size, *, stream: Stream | GraphBuilder): | ||
| """Deallocate a buffer previously allocated by this resource. | ||
|
|
||
| Parameters | ||
|
|
@@ -519,10 +546,10 @@ cdef class MemoryResource: | |
| The pointer or handle to the buffer to deallocate. | ||
| size : int | ||
| The size of the buffer to deallocate, in bytes. | ||
| stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder`, optional | ||
| The stream on which to perform the deallocation asynchronously. | ||
| If None, it is up to each memory resource implementation to decide | ||
| and document the behavior. | ||
| stream : :obj:`~_stream.Stream` | :obj:`~graph.GraphBuilder` | ||
| Keyword-only. The stream on which to perform the deallocation | ||
| asynchronously. Must be passed explicitly; pass | ||
| ``device.default_stream`` to use the default stream. | ||
| """ | ||
| raise TypeError("MemoryResource.deallocate must be implemented by subclasses.") | ||
|
|
||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.