GPU basics

Smallest GPU programs: single-kernel, single-session, single-tenant.

When to read this sub-group

You have a GPU available and want to run something on it. You are not yet sharing the GPU with other tenants, not yet running a model engine, not yet worried about revocation. You want the smallest program that proves the lease layer holds together for compute.

Suggested order

  1. Running a CUDA Kernel on a Leased GPU — one kernel, one tenant, one session. Establishes the acquire → submit → drop shape.
  2. GPU as a Pure Function Call — calls a kernel as if it were a function. Establishes the input/output model.
  3. A Development Environment That Leases a GPU for Five Minutes — the dev-loop variant. Short TTL, fast iteration.
  4. Borrowed GPU Studio — the interactive variant. Hold the lease across multiple kernels in one session.
  5. Leasing a Slice of a GPU for a Multi-Kernel Workload — production-shape persistent session.
  6. Parallel GPU Sessions for Multi-Kernel Burst — burst compute with parallel sessions.
  7. 1000 GPUs for One Second — the burst variant at scale. Multi-cloud is optional; the recipe shows both the single-cloud and the multi-cloud setup.
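
The acquire → submit → drop shape that the first recipe establishes can be sketched without a GPU at all. The block below is only an illustration under stated assumptions: `Lease`, `acquire`, `submit`, and `ttl_seconds` are hypothetical names invented for this sketch, not the lease layer's real API, and the "kernel" is a plain Python callable standing in for a device launch.

```python
import time
from contextlib import contextmanager


class Lease:
    """Hypothetical stand-in for a GPU lease; not the real lease-layer API."""

    def __init__(self, ttl_seconds):
        self.expires_at = time.monotonic() + ttl_seconds
        self.released = False

    def submit(self, kernel, *args):
        # Refuse work once the lease has been dropped or its TTL has passed.
        if self.released or time.monotonic() >= self.expires_at:
            raise RuntimeError("lease is no longer valid")
        # A real lease layer would launch on the device; here the kernel
        # is an ordinary callable that runs on the host.
        return kernel(*args)


@contextmanager
def acquire(ttl_seconds=300.0):
    """Acquire a (hypothetical) lease and always drop it on exit."""
    lease = Lease(ttl_seconds)
    try:
        yield lease
    finally:
        lease.released = True  # the "drop" step


def saxpy(a, xs, ys):
    """Host-side stand-in for a kernel: a * x + y, element by element."""
    return [a * x + y for x, y in zip(xs, ys)]


# One kernel, one tenant, one session.
with acquire(ttl_seconds=300.0) as lease:
    result = lease.submit(saxpy, 2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])

print(result)  # [12.0, 24.0, 36.0]
```

Using a context manager for the lease means the drop step runs even if the submitted work raises, which mirrors the single-session shape the early recipes share; the later recipes vary how long the lease is held and how many kernels run inside it.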

What’s not here

  - Inference engines. See shared inference.
  - Engine correctness measurement. See correctness and memory.
  - Mid-decode revocation handling. See revocation and recovery.