Revocation & recovery
Mid-decode revocation, FENCED state detection, hot-rebind continuity, preemptible training.
When to read this sub-group
You are running an inference engine on a shared GPU and need to handle a lease being reclaimed mid-decode (a tenant’s TTL expires, a quota threshold trips, or an operator drains a node) without crashing the engine, leaking memory, or affecting the other tenants currently being served.
This is the load-bearing operational story for production fabric-leased inference. Read these in order.
Suggested order
- Handling Mid-Kernel Lease Revocation in a Decode Loop — the engine-side primitive. How the kernel-launch poll catches a revoked lease in sub-millisecond wall-clock, returning a typed error before launching against unowned memory.
- Detecting a FENCED Lease State After Revocation — the application-side primitive. How the typed error propagates up so the harness can match on it and route to recovery.
- Hot-Rebind Inference Continuity After a Lease Revocation — the multi-tenant variant. How surviving tenants keep serving while the revoked one is re-admitted on a fresh lease.
- Clean Preemptible GPU Training Job — the training variant. A long-running job that checkpoints often enough to lose only seconds when preempted.
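As a rough orientation before reading the pages above, the first three steps can be sketched as one toy loop. This is an illustrative simulation only, not the fabric's actual API: `Lease`, `LeaseState`, `LeaseFencedError`, `launch_decode_step`, and `serve` are all hypothetical names, and the "kernel launch" is a plain Python function that returns a string.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class LeaseState(Enum):
    ACTIVE = auto()
    FENCED = auto()  # hypothetical stand-in for the post-revocation state


class LeaseFencedError(RuntimeError):
    """Hypothetical typed error raised when a launch targets a revoked lease."""

    def __init__(self, tenant: str):
        super().__init__(f"lease for tenant {tenant!r} is FENCED")
        self.tenant = tenant


@dataclass
class Lease:
    tenant: str
    state: LeaseState = LeaseState.ACTIVE


def launch_decode_step(lease: Lease, step: int) -> str:
    # Poll the lease before every launch so nothing runs against unowned memory.
    if lease.state is LeaseState.FENCED:
        raise LeaseFencedError(lease.tenant)
    return f"{lease.tenant}:tok{step}"  # placeholder for a real kernel launch


def serve(leases: dict[str, Lease], steps: int, revoke_at: dict[str, int]):
    """Decode `steps` tokens per tenant; `revoke_at` simulates revocations."""
    outputs: dict[str, list[str]] = {tenant: [] for tenant in leases}
    rebinds: list[tuple[str, int]] = []
    for step in range(steps):
        for tenant in list(leases):
            if revoke_at.get(tenant) == step:
                leases[tenant].state = LeaseState.FENCED  # simulated revocation
            try:
                outputs[tenant].append(launch_decode_step(leases[tenant], step))
            except LeaseFencedError as err:
                # The harness matches on the typed error and re-admits only the
                # revoked tenant on a fresh lease; other tenants are untouched.
                leases[tenant] = Lease(err.tenant)
                rebinds.append((err.tenant, step))
                outputs[tenant].append(launch_decode_step(leases[tenant], step))
    return outputs, rebinds
```

Calling `serve({"a": Lease("a"), "b": Lease("b")}, 3, {"a": 1})` revokes tenant `a` at step 1. The typed error is caught, `a` is rebound to a fresh lease, and tenant `b` decodes all three steps without interruption.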
What’s not here
- Per-request audit attribution for the revocation event. See audit and attribution.
- Cross-cloud lease migration. See Placement, Scaling & Operations.