GPU basics
Smallest GPU programs: single-kernel, single-session, single-tenant.
When to read this sub-group
You have a GPU available and want to run something on it. You are not yet sharing the GPU with other tenants, not yet running a model engine, not yet worried about revocation. You want the smallest program that proves the lease layer holds together for compute.
Suggested order
- Running a CUDA Kernel on a Leased GPU — one kernel, one tenant, one session. Establishes the acquire → submit → drop shape.
- GPU as a Pure Function Call — calls a kernel as if it were a function. Establishes the input/output model.
- A Development Environment That Leases a GPU for Five Minutes — the dev-loop variant. Short TTL, fast iteration.
- Borrowed GPU Studio — the interactive variant. Hold the lease across multiple kernels in one session.
- Leasing a Slice of a GPU for a Multi-Kernel Workload — production-shape persistent session.
- Parallel GPU Sessions for Multi-Kernel Burst — burst compute with parallel sessions.
- 1000 GPUs for One Second — the burst variant at scale. Multi-cloud is optional; the recipe shows both.
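
The acquire → submit → drop shape that the first recipe establishes can be sketched in a few lines. This is a minimal illustration, not the lease layer's actual API: `GpuLease`, `acquire`, `submit`, and the `ttl_seconds` parameter are hypothetical names, and the "kernel" is an ordinary Python function standing in for real GPU work. The sketch only shows the lifecycle: the lease is acquired, work is submitted while it is held, and the lease is dropped when the block exits, even on error.

```python
import time
from dataclasses import dataclass, field


@dataclass
class GpuLease:
    """Hypothetical stand-in for a leased GPU; not a real API."""
    ttl_seconds: float
    acquired_at: float = field(default_factory=time.monotonic)
    released: bool = False

    def submit(self, kernel, *args):
        # Refuse work once the lease has been dropped or has outlived its TTL.
        if self.released:
            raise RuntimeError("lease already dropped")
        if time.monotonic() - self.acquired_at > self.ttl_seconds:
            raise RuntimeError("lease expired")
        return kernel(*args)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Drop: runs on normal exit and when the body raises.
        self.released = True
        return False  # do not swallow exceptions


def acquire(ttl_seconds=300.0):
    # Hypothetical acquire step; a real lease layer would contact a scheduler.
    return GpuLease(ttl_seconds=ttl_seconds)


def saxpy(a, xs, ys):
    # Plain-Python placeholder for a kernel: computes a * x + y elementwise.
    return [a * x + y for x, y in zip(xs, ys)]


# acquire -> submit -> drop
with acquire(ttl_seconds=300.0) as lease:
    result = lease.submit(saxpy, 2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```

Under these assumptions, the later entries in the list vary the same three steps: the dev-loop variant shortens the TTL, the interactive variant calls `submit` several times before the drop, and the burst variants hold many leases at once.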
What’s not here
- Inference engines. See shared inference.
- Engine correctness measurement. See correctness and memory.
- Mid-decode revocation handling. See revocation and recovery.