
Move all cuda.core.system enums into cuda.core.system.typing #2022

Merged
mdboom merged 13 commits into NVIDIA:main from mdboom:re-expose-enums3 on May 5, 2026

Conversation

@mdboom (Contributor) commented May 5, 2026

This is a follow-on to #2014, and based on a comment in #2016 suggesting that all of these new enums should go in a separate typing module dedicated to enums and type annotations.

For cuda.core.system, we decided to put the enums in cuda.core.system.typing rather than cuda.core.typing because cuda.core.system is deliberately designed to be a little bit independent of CUDA. (It could become its own package someday, or be under a different namespace etc.)

This also addresses a few small bugs in the test_enum_coverage.py tests that were discovered while working on #2016. Otherwise, this PR is exclusively moving content and updating imports and doc references accordingly.

@mdboom mdboom self-assigned this May 5, 2026
@github-actions bot added the cuda.core (Everything related to the cuda.core module) label May 5, 2026
@mdboom mdboom added the P0 (High priority - Must do!) and breaking (Breaking changes are introduced) labels May 5, 2026
@mdboom mdboom added this to the cuda.core v1.0.0 milestone May 5, 2026
@github-actions

This comment has been minimized.

@mdboom mdboom force-pushed the re-expose-enums3 branch from f1a36d6 to b9797a8 on May 5, 2026 17:32
@rwgk (Contributor) commented May 5, 2026

I have a naming concern: When I see typing, I expect Python type hints, protocols, or type aliases. This module is mostly runtime enums (plus related public types like FieldId / DeviceArch), so typing feels a little misleading. If the goal is "public enums for the system API", enum_types.py seems clearer. If the goal is the broader "public named types used by cuda.core.system", then types.py feels like a better umbrella than typing.py.

@mdboom (Contributor, Author) commented May 5, 2026

I have a naming concern: When I see typing, I expect Python type hints, protocols, or type aliases. This module is mostly runtime enums (plus related public types like FieldId / DeviceArch), so typing feels a little misleading. If the goal is "public enums for the system API", enum_types.py seems clearer. If the goal is the broader "public named types used by cuda.core.system", then types.py feels like a better umbrella than typing.py.

@leofang: What do you think? I know you sort of see enums as a type-checking feature (though they are a bit more than that). I'm on the fence. If we change here we should also change cuda.core.typing.

@rwgk (Contributor) commented May 5, 2026

This is a pure agent review. I'm posting it here before drilling down myself, for visibility because of our approaching deadlines.


Cursor GPT-5.4 Extra High Fast

Findings

  • High cuda_core/cuda/core/system/__init__.py:21 now eagerly imports typing before the existing compatibility gate in cuda_core/cuda/core/system/_system.pyx:13. But cuda_core/cuda/core/system/typing.py:329 only suppresses ImportError, while it still dereferences version-specific NVML symbols like cuda_core/cuda/core/system/typing.py:352. So an older or otherwise incompatible cuda.bindings install can still fail at import cuda.core.system with AttributeError instead of cleanly falling back to CUDA_BINDINGS_NVML_IS_COMPATIBLE = False. The fresh commits look like they addressed a CI/import cleanup issue, but I don’t think they closed this compatibility hole.

  • Medium cuda_core/cuda/core/system/_device.pyx:1121 still removes Utilization from _device.__all__, which means cuda.core.system.Utilization disappears from the public package surface. cuda_core/tests/system/test_system_device.py:783 was updated to assert against _device.Utilization instead, so the tests now normalize the API drop rather than catch it. Unless the intent is to demote that type to private, this looks like unrelated public API churn.

  • Medium cuda_core/tests/system/test_system_device.py:503 now asserts isinstance(near_device, typing.Device), but cuda_core/cuda/core/system/typing.py does not define Device. That is a real test bug on the current tip, even if CI may not hit it today because the test is hardware/topology dependent.

  • Low cuda_core/tests/test_enum_coverage.py:25 and cuda_core/tests/test_enum_coverage.py:256 narrow test_all_str_enums_in_cases() from package-wide discovery to a hand-maintained _MODULES list. That fixes the immediate missing-test problem, but it weakens the original guardrail: future StrEnum additions elsewhere in cuda.core will no longer be discovered automatically.

Open Questions

  • Was the new from cuda.core.system import typing in cuda_core/cuda/core/system/__init__.py:21 actually required? If the goal is just to support from cuda.core.system import typing, Python already resolves submodules there, so dropping the eager import may avoid the compatibility regression entirely.
  • Is demoting Utilization from cuda.core.system.Utilization to _device.Utilization intentional? The PR description still reads like a pure enum move plus test fixes, so that API change stands out.

Change Summary

  • I re-reviewed the current PR head 82c5f98. Since the earlier pass, two follow-up commits landed: Fix missing StrEnum test and Fix imports.
  • The CI picture is better now: static checks are green, several build legs have completed successfully, and the rest of the matrix is still in progress. So the branch is in a healthier state than before.
  • My updated read is that the earlier low-level CI issue likely was addressed, but the main compatibility concern still looks live, and there is now also a concrete test typo in test_get_nearest_gpus().

@mdboom (Contributor, Author) commented May 5, 2026

High cuda_core/cuda/core/system/__init__.py:21 now eagerly imports typing before the existing compatibility gate in cuda_core/cuda/core/system/_system.pyx:13

This is totally fine. The enum in question (DeviceArch) has been in cuda.bindings.nvml from the beginning.
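For reference, the mechanism behind the original finding can be sketched in isolation (the class and attribute names below are hypothetical stand-ins for an older cuda.bindings install, not real symbols): `contextlib.suppress(ImportError)` lets an `AttributeError` escape, so a gate written that way only protects against missing modules, not missing symbols.

```python
import contextlib

class _OldNvml:
    """Stand-in for an older bindings module that lacks newer symbols."""

compatible = True
try:
    with contextlib.suppress(ImportError):
        _OldNvml().some_newer_symbol  # raises AttributeError, not ImportError
except AttributeError:
    # suppress(ImportError) did not catch it; a gate that also needs to
    # tolerate missing symbols would have to suppress
    # (ImportError, AttributeError) to fall back cleanly.
    compatible = False

assert compatible is False
```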

Medium cuda_core/cuda/core/system/_device.pyx:1121 still removes Utilization from _device.__all__, which means cuda.core.system.Utilization disappears from the public package surface.

This was on purpose. It was unintentionally public before. It is now private like all other "helper" classes in cuda.core.system.

Medium cuda_core/tests/system/test_system_device.py:503 now asserts isinstance(near_device, typing.Device), but cuda_core/cuda/core/system/typing.py does not define Device

Good catch. Fixed.

Low cuda_core/tests/test_enum_coverage.py:25 and cuda_core/tests/test_enum_coverage.py:256 narrow test_all_str_enums_in_cases() from package-wide discovery to a hand-maintained _MODULES list. That fixes the immediate missing-test problem, but it weakens the original guardrail: future StrEnum additions elsewhere in cuda.core will no longer be discovered automatically.

This is fine. The sort of import hooks this once used are pretty broken by our "megapackage" approach. I think it's good enough to just declare the places where we might find public enums.
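A minimal sketch of the hand-maintained approach described here (the _MODULES entries and function name are illustrative, not the actual contents of test_enum_coverage.py):

```python
import enum
import importlib

# Illustrative stand-in; the real test would list the cuda.core modules
# where public enums may appear.
_MODULES = ["enum"]

def discover_str_enums(module_names):
    """Collect str-based Enum classes exposed by the listed modules."""
    found = []
    for name in module_names:
        mod = importlib.import_module(name)
        for attr in vars(mod).values():
            if (isinstance(attr, type)
                    and issubclass(attr, str)
                    and issubclass(attr, enum.Enum)):
                found.append(attr)
    return found
```

The trade-off, as noted, is that str-enum definitions in modules outside the list go untested until someone updates the list.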

Was the new from cuda.core.system import typing in cuda_core/cuda/core/system/__init__.py:21 actually required? If the goal is just to support from cuda.core.system import typing, Python already resolves submodules there, so dropping the eager import may avoid the compatibility regression entirely.

Yes, this is required to get around a cyclical import issue.

Is demoting Utilization from cuda.core.system.Utilization to _device.Utilization intentional? The PR description still reads like a pure enum move plus test fixes, so that API change stands out.

Yes.

@mdboom mdboom requested a review from rwgk May 5, 2026 19:02
@rwgk (Contributor) commented May 5, 2026

Logging a somewhat unusual observation:

https://github.com/NVIDIA/cuda-python/actions/runs/25394895050/job/74482106980?pr=2022

failed without logging any error message:

2026-05-05T18:55:12.3160422Z Downloading single artifact
2026-05-05T18:55:12.5147791Z Post job cleanup.

Agent take:

...
  So the best read is:

  • this was a failure inside actions/download-artifact,
  • very early in the download flow,
  • likely a transient GitHub artifact service / runner networking / action hiccup,
  • not a deterministic failure caused by your code changes.

  Why it looks like it “just stops”:
  • In the failed job, the log reaches Downloading single artifact and then immediately jumps to Post job cleanup.
  • In a successful sibling job, the next lines are:
    • Preparing to download the following artifacts:
    • artifact ID / expected digest
    • redirect to blob URL
    • Artifact download completed successfully.
  • Your failed job never gets that far, so it likely died before or during artifact metadata resolution, and GitHub didn’t preserve a
    useful Error: line or annotation for it.
...
  So my conclusion is: most likely transient infrastructure/action failure, not related to the enum PR. Rerunning just that failed job
  would be the normal next move.

@mdboom mdboom enabled auto-merge (squash) May 5, 2026 20:49
@mdboom mdboom merged commit 38c032f into NVIDIA:main May 5, 2026
180 of 182 checks passed
@github-actions

This comment has been minimized.

@github-actions bot commented May 6, 2026

Doc Preview CI
Preview removed because the pull request was closed or merged.
