Revocation & recovery

Mid-decode revocation, FENCED state detection, hot-rebind continuity, preemptible training.

When to read this sub-group

You are running an inference engine on a shared GPU and need to handle a lease being reclaimed mid-decode (a tenant’s TTL expires, a quota threshold trips, or an operator drains a node) without crashing the engine, leaking memory, or affecting the other tenants currently being served.

This is the load-bearing operational story for production fabric-leased inference. Read these in order.

Suggested order

  1. Handling Mid-Kernel Lease Revocation in a Decode Loop — the engine-side primitive. How the kernel-launch poll catches a revoked lease in under a millisecond of wall-clock time and returns a typed error before anything launches against unowned memory.
  2. Detecting a FENCED Lease State After Revocation — the application-side primitive. How the typed error propagates up so the harness can match on it and route to recovery.
  3. Hot-Rebind Inference Continuity After a Lease Revocation — the multi-tenant variant. How surviving tenants keep serving while the revoked one is re-admitted on a fresh lease.
  4. Clean Preemptible GPU Training Job — the training variant. A long-running job that checkpoints often enough to lose only seconds when preempted.
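
The first three guides share one control flow: poll the lease before each launch, raise a typed error if it has been fenced, and confine recovery to the revoked tenant. The sketch below is a toy model of that flow in plain Python and is not the fabric's actual API. Every name in it (`Lease`, `LeaseState`, `LeaseFencedError`, `decode_step`, `serve_round`) is hypothetical, and the "decode step" is a stand-in that only appends an integer.

```python
from dataclasses import dataclass
from enum import Enum, auto


class LeaseState(Enum):
    ACTIVE = auto()
    FENCED = auto()  # lease was revoked; its memory is no longer owned


class LeaseFencedError(Exception):
    """Hypothetical typed error raised when a step targets a fenced lease."""

    def __init__(self, tenant: str) -> None:
        super().__init__(f"lease for tenant {tenant!r} is FENCED")
        self.tenant = tenant


@dataclass
class Lease:
    tenant: str
    state: LeaseState = LeaseState.ACTIVE


def decode_step(lease: Lease, tokens: list[int]) -> None:
    # Poll the lease before "launching": fail fast instead of touching
    # memory that may already belong to someone else.
    if lease.state is LeaseState.FENCED:
        raise LeaseFencedError(lease.tenant)
    tokens.append(len(tokens))  # stand-in for one real decode step


def serve_round(leases: dict[str, Lease], outputs: dict[str, list[int]]) -> list[str]:
    """Run one decode step per tenant; return the tenants that were rebound."""
    rebound = []
    for tenant in list(leases):
        try:
            decode_step(leases[tenant], outputs[tenant])
        except LeaseFencedError as err:
            # Recovery is scoped to the revoked tenant: swap in a fresh
            # lease and let the other tenants continue untouched.
            leases[err.tenant] = Lease(err.tenant)
            rebound.append(err.tenant)
    return rebound
```

In this model the revoked tenant loses only the step that hit the fence and resumes on the next round, while the surviving tenant never sees the error. A real engine would also need to release or rebuild per-tenant state such as KV caches, which the guides above cover.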

What’s not here

Per-request audit attribution for the revocation event: see audit and attribution.

Cross-cloud lease migration: see Placement, Scaling & Operations.