Placement, Scaling & Operations

Placement policy, multi-cloud deploy, drain / migration, scaling under load, borrowed hardware.

When to read this section

You are deciding where workloads run, how they survive node failures, and how to redeploy without losing work. These are the “where” and “when” recipes — placement policy, anti-affinity, multi-cloud deploy, provider migration, scaling under load. The grafOS scheduler does the routing; these recipes show how to instruct it.

Suggested order

Database Buffer Pool That Borrows Memory From the Network and A Build System That Scales to 100 Cores in 200 Milliseconds — the smallest scaling recipes. Lease whatever capacity you need, drop it when you’re done.
Content Delivery With Automatic Popularity-Based Scaling — the cost-aware scaling variant.
Deterministic Latency Audio Over Borrowed Hardware — the placement-constrained variant. Real-time SLOs from leased compute.
Data-Affinity Placement and Anti-Affinity For Failure Domains — placement policy basics.
Pop-Up Supercomputer in 30 Seconds — multi-cloud burst.
Live Event Mode (Flash-Crowd Autopilot) — the autopilot variant for sudden load.
Workload on Node Drain — graceful response to operator-initiated drains.
Multi-Cloud Deploy — the production multi-cloud baseline.
Cross-Cloud Order Pipeline — the production multi-cloud pipeline.
Provider Migration Without Losing Work — the recovery variant. Read when you have multiple providers and need to fail one over.

What’s not here

GPU-specific placement (generation targeting, MIG profiles). See GPU & Inference / shared inference.
Tenant-quota fair-share scheduling. See Security, Audit & Cost.