Benchmarks: cuda.core by danielfrg · Pull Request #2005 · NVIDIA/cuda-python

danielfrg · 2026-05-01T18:59:32Z

Description

This ports the benchmarks we have been running for cuda.bindings over to cuda.core.

I guess it's up for discussion whether we need these and what we want to compare them against.

Right now it's basically trying to measure the extra latency of the cuda.core layer by comparing against the cuda.bindings benchmarks, with benchmark IDs matched 1:1 to that suite.
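Here is a minimal sketch of the kind of side-by-side comparison described above. It assumes each suite's results have already been loaded into a dict mapping benchmark ID to mean time in nanoseconds. The names `delta_table`, `unmatched`, and `format_table` are hypothetical; they are not taken from the PR's compare.py.

```python
# Hypothetical sketch: pair cuda.bindings and cuda.core results by benchmark ID.
from typing import Dict, List, Tuple

Row = Tuple[str, float, float, float, float]


def delta_table(bindings: Dict[str, float], core: Dict[str, float]) -> List[Row]:
    """Rows of (id, bindings_ns, core_ns, delta_ns, ratio) for IDs in both suites."""
    rows: List[Row] = []
    for bench_id in sorted(bindings.keys() & core.keys()):
        b, c = bindings[bench_id], core[bench_id]
        ratio = c / b if b else float("inf")  # guard against a zero baseline
        rows.append((bench_id, b, c, c - b, ratio))
    return rows


def unmatched(bindings: Dict[str, float], core: Dict[str, float]) -> List[str]:
    """IDs present in only one suite, i.e. breaks in the 1:1 mapping."""
    return sorted(bindings.keys() ^ core.keys())


def format_table(rows: List[Row]) -> str:
    """Render rows as a fixed-width text table."""
    lines = [
        f"{'benchmark':<32}{'bindings ns':>12}{'core ns':>12}{'delta ns':>12}{'ratio':>8}"
    ]
    for bench_id, b, c, d, r in rows:
        lines.append(f"{bench_id:<32}{b:>12.1f}{c:>12.1f}{d:>+12.1f}{r:>8.2f}")
    return "\n".join(lines)
```

Listing unmatched IDs separately keeps the 1:1 mapping honest. A benchmark added to one suite but not the other shows up explicitly instead of silently dropping out of the table.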

The main open question, I think, is the "caching" we get from cuda.core on Device. Device instances are singletons, so after the first call Device(0) doesn't hit the driver. There are probably other similar cases.

I guess we could also introduce some sort of cleanup or spawn fresh processes, but that would come with its own latencies.
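To illustrate why the Device caching matters for the numbers, here is a self-contained simulation with no GPU involved. `FakeDriver`, `Device`, and `clear_cache` are hypothetical stand-ins. The per-thread cache is an assumption loosely modeled on the singleton behavior described above, not cuda.core's actual implementation. The warm loop reaches the "driver" only once, so it mostly times a cache lookup. The cold loop forces a driver call on every iteration, but it also times the cleanup itself.

```python
import threading
import timeit


class FakeDriver:
    """Hypothetical stand-in for the CUDA driver; counts how often it is queried."""

    def __init__(self):
        self.calls = 0

    def device_get(self, ordinal):
        self.calls += 1
        return ordinal  # pretend device handle


DRIVER = FakeDriver()
_tls = threading.local()  # assumption: one cache per thread


class Device:
    """Hypothetical singleton-per-ordinal class, not cuda.core's Device."""

    def __new__(cls, ordinal=0):
        cache = getattr(_tls, "devices", None)
        if cache is None:
            cache = _tls.devices = {}
        dev = cache.get(ordinal)
        if dev is None:  # only a cache miss reaches the "driver"
            dev = super().__new__(cls)
            dev.handle = DRIVER.device_get(ordinal)
            cache[ordinal] = dev
        return dev


def clear_cache():
    """Hypothetical cleanup hook: drop this thread's cached instances."""
    _tls.devices = {}


if __name__ == "__main__":
    warm = timeit.timeit(lambda: Device(0), number=10_000)
    print(DRIVER.calls)  # 1: only the first call missed the cache
    cold = timeit.timeit(lambda: (clear_cache(), Device(0)), number=10_000)
    print(DRIVER.calls)  # 10001: every cold iteration missed the cache
    print(f"warm {warm:.4f}s, cold {cold:.4f}s")
```

The cold variant measures the driver path plus the cleanup overhead, which is exactly the trade-off mentioned above. Spawning a fresh process per sample avoids the shared cache but adds far larger startup costs.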

copy-pr-bot · 2026-05-01T18:59:36Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

rwgk · 2026-05-01T23:34:22Z

Do you have a side-by-side bindings-vs-core delta table that you could post here?

Quick "Low" findings from Cursor GPT-5.4 Extra High Fast

Low: benchmarks/cuda_core/compare.py and benchmarks/cuda_core/benchmarks/bench_ctx_device.py tell readers to consult BENCHMARK_PLAN.md, but there is no BENCHMARK_PLAN.md under benchmarks/cuda_core or elsewhere in the repo. The starred-row legend is useful, but the referenced deeper rationale document is missing.
Low: benchmarks/cuda_core/benchmarks/bench_ctx_device.py says Device() with no args returns the TLS-cached current device, but cuda_core/cuda/core/_device.pyx actually resolves that case by calling cuCtxGetDevice() when a context is active. The benchmark behavior itself is fine, and benchmarks/cuda_core/compare.py already treats that row as a different code path, but the benchmark comment is misleading about what work is really being measured.

danielfrg added 4 commits May 1, 2026 12:50

cuda.core benchmarks (ed099ad)
cuda.core benchmarks (a711361)
cuda.core benchmarks (2144446)
cuda.core benchmarks (c25b82f)

danielfrg self-assigned this May 1, 2026

danielfrg added the cuda.bindings (Everything related to the cuda.bindings module) and performance labels May 1, 2026

danielfrg added this to the cuda.core v1.0.0 milestone May 1, 2026

danielfrg requested review from leofang, mdboom and rwgk May 1, 2026 19:00

danielfrg wants to merge 4 commits into main from benchmarks-cuda-core