
Starting with CUDA 10.0, a new set of metric APIs is available for devices with compute capability 7.0 and higher. These APIs provide low and deterministic profiling overhead on the target system. They are supported on all CUDA-supported platforms except Android, and are not supported under MPS (Multi-Process Service), Confidential Compute, or SLI-configured systems. To determine whether a device is compatible with this API, CUDA 11.5 introduces the function cuptiProfilerDeviceSupported, which exposes overall Profiling API support and the specific requirements for a given device. The Profiling API must be initialized by calling cuptiProfilerInitialize before testing device support.
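
As a minimal sketch of this check, assuming the parameter-struct and enum names from the cupti_profiler_target.h header (fields may vary slightly across CUPTI versions; error checking is omitted for brevity):

  #include <cstdio>
  #include <cuda.h>
  #include <cupti_profiler_target.h>

  int main() {
      cuInit(0);
      CUdevice cuDevice;
      cuDeviceGet(&cuDevice, 0);

      // The Profiling API must be initialized before testing device support.
      CUpti_Profiler_Initialize_Params initializeParams =
          { CUpti_Profiler_Initialize_Params_STRUCT_SIZE };
      cuptiProfilerInitialize(&initializeParams);

      // Query overall Profiling API support for the chosen device.
      CUpti_Profiler_DeviceSupported_Params supportedParams =
          { CUpti_Profiler_DeviceSupported_Params_STRUCT_SIZE };
      supportedParams.cuDevice = cuDevice;
      cuptiProfilerDeviceSupported(&supportedParams);

      if (supportedParams.isSupported == CUPTI_PROFILER_CONFIGURATION_SUPPORTED)
          printf("Profiling API is supported on device 0\n");
      else
          printf("Profiling API is not supported on this device/configuration\n");

      CUpti_Profiler_DeInitialize_Params deInitializeParams =
          { CUpti_Profiler_DeInitialize_Params_STRUCT_SIZE };
      cuptiProfilerDeInitialize(&deInitializeParams);
      return 0;
  }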

This section covers the performance profiling Host and Target APIs for CUDA. Broadly, the profiling APIs are divided into the following four categories:
  • Enumeration (Host)
  • Configuration (Host)
  • Collection (Target)
  • Evaluation (Host)
Host APIs provide a metric interface for enumeration, configuration, and evaluation; they do not require a compute (GPU) device and can also be used in an offline mode. The profiler host utility, found under extensions in the samples section, covers the usage of the host APIs. Target APIs are used to collect the metric data and require a compute (GPU) device. Refer to the autorange_profiling and userrange_profiling samples for the usage of the profiling APIs.
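
For illustration, a host-side metric enumeration sketch, loosely modeled on the profiler host utility sample, is shown below. It assumes the NVPW_CUDA_MetricsContext_* and NVPW_MetricsContext_GetMetricNames_* entry points from nvperf_cuda_host.h and nvperf_host.h; exact struct fields may differ between CUPTI versions, and error checking is omitted:

  #include <cstdio>
  #include <nvperf_host.h>
  #include <nvperf_cuda_host.h>

  // Enumerate the metric names available for a given chip name.
  // This is host-only work: no GPU is required once the chip name is known.
  void ListMetrics(const char* pChipName) {
      NVPW_InitializeHost_Params initializeHostParams =
          { NVPW_InitializeHost_Params_STRUCT_SIZE };
      NVPW_InitializeHost(&initializeHostParams);

      NVPW_CUDA_MetricsContext_Create_Params createParams =
          { NVPW_CUDA_MetricsContext_Create_Params_STRUCT_SIZE };
      createParams.pChipName = pChipName;
      NVPW_CUDA_MetricsContext_Create(&createParams);

      NVPW_MetricsContext_GetMetricNames_Begin_Params getNamesParams =
          { NVPW_MetricsContext_GetMetricNames_Begin_Params_STRUCT_SIZE };
      getNamesParams.pMetricsContext = createParams.pMetricsContext;
      NVPW_MetricsContext_GetMetricNames_Begin(&getNamesParams);

      for (size_t i = 0; i < getNamesParams.numMetrics; ++i)
          printf("%s\n", getNamesParams.ppMetricNames[i]);

      NVPW_MetricsContext_GetMetricNames_End_Params endParams =
          { NVPW_MetricsContext_GetMetricNames_End_Params_STRUCT_SIZE };
      endParams.pMetricsContext = createParams.pMetricsContext;
      NVPW_MetricsContext_GetMetricNames_End(&endParams);

      NVPW_MetricsContext_Destroy_Params destroyParams =
          { NVPW_MetricsContext_Destroy_Params_STRUCT_SIZE };
      destroyParams.pMetricsContext = createParams.pMetricsContext;
      NVPW_MetricsContext_Destroy(&destroyParams);
  }

On the target, the chip name for a device can be obtained with cuptiDeviceGetChipName; because enumeration only needs the chip name, it can also be performed on a machine without a GPU.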

The list of metrics has been overhauled relative to the earlier generation of metric and event APIs, and now follows a standard naming convention based upon unit__(subunit?)_(pipestage?)_quantity_qualifiers.
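
For example, taking a couple of common metric base names as illustrative (availability varies by chip):

  dram__bytes_read       unit: dram                     quantity: bytes_read
  smsp__inst_executed    unit: smsp (SM sub-partition)  quantity: inst_executed

Complete metric names additionally carry rollup suffixes such as .sum or .avg on top of the base name.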