Resource Isolation And Exclusivity Semantics
This note describes a more explicit way to model performance isolation in fabricBIOS and grafOS.
The core idea is:
- capacity answers “how much hardware is reserved?”
- execution mode answers “how does the workload run on that hardware?”
- isolation / exclusivity policy answers “how much interference from other workloads is allowed?”
These concerns are related, but they are not the same thing. The repo already separates lease width from tasklet execution width for CPU tasklets. This document argues that isolation should also be represented explicitly instead of being inferred indirectly from lease size.
Related documents:
- `docs/spec/resource-types.md`
- `docs/tasklet-profile-v0.md`
- `docs/grafos/tasklet-parallelism-model.md`
- `docs/grafos/shared-memory-tasklet-model.md`
1. Problem statement
Today it is easy for a user to infer the wrong meaning from a request such as “lease 4 CPU cores.”
That request could mean several different things:
- “run my workload in parallel across 4 workers”
- “reserve enough capacity for 4 separate tasklets”
- “give my single-threaded tasklet stronger isolation from other tenants”
- “reserve scheduler headroom because this stage may later fan out”
Those are different intents. A single numeric lease width is a poor proxy for all of them.
The repo has already addressed part of this problem for CPU tasklets:
- `num_cores` reserves CPU capacity
- tasklet execution width stays separate
- shared-memory tasklets are an explicit execution mode, not an accidental consequence of `.cores(n)`
That same style should be carried further. Users who want performance predictability should be able to ask for isolation directly rather than smuggling that request through lease width.
2. Design goals
This model should:
- make user intent explicit
- reduce accidental over-reservation of hardware
- preserve the lease-width vs execution-width separation
- work across CPU and GPU resources
- let schedulers reason about density vs predictability
- expose operator-visible policy in inventory and logs
This model should not:
- turn fabricBIOS into a process scheduler
- imply POSIX-like threading semantics
- promise hardware isolation properties that the runtime cannot actually enforce
3. Common model
Every resource request should be decomposable into three axes.
3.1 Capacity
How much of the resource is reserved.
Examples:
- CPU: `num_cores = 4`
- MEM: `mem_bytes = 256 MiB`
- GPU: `vram_bytes = 8 GiB`, `compute_slices = 1`
3.2 Execution mode
How software is allowed to execute against that capacity.
Examples:
- CPU tasklet profile v0: single-threaded tasklet execution
- CPU tasklet profile v1: explicit `shared_memory_tasklet`
- GPU: stateless kernel submit vs persistent session vs future graph/session execution modes
3.3 Isolation / exclusivity policy
How much cross-tenant sharing is allowed for the reserved hardware.
Examples:
- best-effort sharing
- whole-core exclusivity
- strict isolated placement
- GPU session-exclusive
- GPU device-exclusive
The important rule is:
- capacity does not imply execution mode
- capacity does not imply isolation
- isolation does not imply parallel execution
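The rule above can be sketched as a plain data model in which the three axes are independent fields. Everything below is illustrative; none of these type or field names are taken from the actual fabricBIOS API:

```rust
// Hypothetical sketch of the three independent request axes.
// These names are illustrative, not the repo's real types.

#[derive(Debug, Clone, Copy, PartialEq)]
enum ExecutionMode {
    SingleThreadedTasklet,
    SharedMemoryTasklet,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum IsolationPolicy {
    BestEffort,
    WholeCore,
    StrictIsolated,
}

#[derive(Debug)]
struct CpuRequest {
    num_cores: u32,             // capacity: how much hardware is reserved
    mode: ExecutionMode,        // execution mode: how the workload runs
    isolation: IsolationPolicy, // isolation: how much interference is allowed
}

fn main() {
    // Capacity does not imply isolation: a single-core, single-threaded
    // request can still ask for whole-core exclusivity.
    let quiet_single = CpuRequest {
        num_cores: 1,
        mode: ExecutionMode::SingleThreadedTasklet,
        isolation: IsolationPolicy::WholeCore,
    };
    // And wide capacity does not imply strong isolation.
    let wide_best_effort = CpuRequest {
        num_cores: 8,
        mode: ExecutionMode::SharedMemoryTasklet,
        isolation: IsolationPolicy::BestEffort,
    };
    println!("{quiet_single:?}\n{wide_best_effort:?}");
}
```

Because the axes are separate fields, no combination is privileged; the scheduler reads each axis on its own rather than inferring one from another.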
4. CPU interpretation
For CPUs, the following distinctions matter.
4.1 Single-threaded but isolation-sensitive work
Some workloads cannot parallelize well, but still benefit from reduced interference:
- low-jitter control loops
- latency-sensitive parsers or codecs
- cache-sensitive single-threaded compute
- workloads harmed by SMT sibling contention
Those workloads may want:
- execution width = 1
- capacity = small
- isolation = strong
That intent is better expressed as an isolation policy than as “lease more cores and hope that implies exclusivity.”
4.2 CPU policy classes
The current Linux path already uses CPU isolation policies such as:
`BestEffort`, `WholeCore`, `StrictIsolated`
That is the right general shape. Over time, CPU requests should read more like:
- capacity: 1 core
- execution mode: single-threaded tasklet
- isolation: `whole_core`

or:

- capacity: 8 cores
- execution mode: `shared_memory_tasklet`
- isolation: `strict_isolated`
4.3 Bare-metal consequence
On runtimes that still execute tasklets single-threaded, leasing multiple CPU cores should not be the only way to ask for stronger performance isolation.
If the runtime cannot turn wider capacity into useful execution width, the user should still have a clear way to ask for:
- exclusive core ownership
- no cross-tenant sibling sharing
- stronger cache/topology constraints
That argues for explicit isolation/exclusivity fields rather than only wider leases.
Bare-metal semantics locked: see `docs/spec/bare-metal-cpu-lease-semantics.md`. Bare-metal runtimes reject wider-than-execution-width CPU leases for single-threaded tasklets; clients use `TLV_LEASE_CPU_ISOLATION` or rely on the daemon-wide `--cpu-isolation-policy` default. Linux is unaffected.
5. GPU interpretation
The same conceptual split applies to GPUs.
5.1 Capacity
Examples:
- VRAM bytes
- compute partitions / slices
- queue slots or session slots
5.2 Execution mode
Examples:
- one-shot kernel submit
- persistent GPU session
- future multi-stage session or command graph mode
5.3 Isolation / exclusivity
Examples:
- shared: device may multiplex other tenants
- session-exclusive: a session has exclusive residency/state for its lifetime
- device-exclusive: one tenant gets the whole accelerator
- future partition-exclusive: one tenant gets an isolated hardware partition if the device exposes one
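A minimal sketch of those classes together with a fail-closed capability check; all names below are hypothetical, not the actual GPU session types:

```rust
// Hypothetical GPU exclusivity classes and a fail-closed capability check.

#[derive(Debug, Clone, Copy, PartialEq)]
enum GpuExclusivity {
    Shared,             // device may multiplex other tenants
    SessionExclusive,   // exclusive residency/state for a session lifetime
    DeviceExclusive,    // one tenant gets the whole accelerator
    PartitionExclusive, // future: only if the device exposes partitions
}

struct GpuDeviceCaps {
    supports_partitions: bool,
}

// Fail closed: an unsupported class is an error, never a silent downgrade.
fn check_exclusivity(
    caps: &GpuDeviceCaps,
    class: GpuExclusivity,
) -> Result<(), String> {
    match class {
        GpuExclusivity::PartitionExclusive if !caps.supports_partitions => {
            Err("device exposes no hardware partitions".to_string())
        }
        _ => Ok(()),
    }
}

fn main() {
    let caps = GpuDeviceCaps { supports_partitions: false };
    assert!(check_exclusivity(&caps, GpuExclusivity::SessionExclusive).is_ok());
    assert!(check_exclusivity(&caps, GpuExclusivity::PartitionExclusive).is_err());
}
```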
This is more honest than trying to encode GPU isolation indirectly through raw VRAM or “number of GPUs” alone.
GPU wire shape locked: see `docs/spec/gpu-exclusivity-wire-format.md`. Per-lease GPU exclusivity is carried as `TLV_LEASE_GPU_EXCLUSIVITY` (0x0903) on the existing `LeaseAllocRequest` params blob. An absent TLV inherits the daemon-wide `--gpu-share-mode`. Unsupported classes fail closed. Daemon mode is a permission envelope, not a default-only hint.
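For illustration, encoding that TLV might look like the sketch below. The type code 0x0903 is from the spec; the concrete layout assumed here (u16 type, u16 length, little-endian, one-byte class value) is an assumption made for the example only, not the locked wire format:

```rust
// Illustrative TLV encoding for TLV_LEASE_GPU_EXCLUSIVITY (0x0903).
// The layout (u16 type, u16 length, little-endian, one-byte value) is
// an assumption for this sketch, not the actual wire format.

const TLV_LEASE_GPU_EXCLUSIVITY: u16 = 0x0903;

fn encode_gpu_exclusivity_tlv(class: u8) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&TLV_LEASE_GPU_EXCLUSIVITY.to_le_bytes());
    out.extend_from_slice(&1u16.to_le_bytes()); // value length = 1 byte
    out.push(class); // e.g. 0 = shared, 1 = session-exclusive (illustrative)
    out
}

fn main() {
    let tlv = encode_gpu_exclusivity_tlv(1);
    // Type 0x0903 little-endian, length 1, value 1.
    assert_eq!(tlv, vec![0x03, 0x09, 0x01, 0x00, 0x01]);
    println!("{tlv:?}");
}
```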
6. Inventory and request consequences
If this model is adopted, the system should expose and consume isolation more explicitly.
6.1 Inventory / discovery
Inventory should advertise:
- what isolation classes exist for each resource type
- which class is currently active by default
- whether the runtime can enforce the requested class or only best effort
For CPU, some of this already exists through isolation/topology flags. Equivalent GPU advertisement will be needed as GPU sessions become richer.
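A sketch of what such an advertisement could carry, with all names hypothetical rather than the actual inventory schema:

```rust
// Hypothetical inventory advertisement: which isolation classes a node
// supports, which is the default, and whether enforcement is real or
// best effort. Names are illustrative.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Enforcement {
    Enforced,   // runtime can actually honor the class
    BestEffort, // runtime will try, but cannot guarantee
}

#[derive(Debug)]
struct IsolationAdvert {
    class: &'static str,
    enforcement: Enforcement,
}

#[derive(Debug)]
struct CpuInventory {
    default_class: &'static str,
    classes: Vec<IsolationAdvert>,
}

fn main() {
    let inv = CpuInventory {
        default_class: "best_effort",
        classes: vec![
            IsolationAdvert { class: "best_effort", enforcement: Enforcement::Enforced },
            IsolationAdvert { class: "whole_core", enforcement: Enforcement::Enforced },
            IsolationAdvert { class: "strict_isolated", enforcement: Enforcement::BestEffort },
        ],
    };
    // A scheduler can check support before admitting a request.
    assert!(inv.classes.iter().any(|a| a.class == "whole_core"));
    println!("{inv:#?}");
}
```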
6.2 Lease requests
Lease requests should be able to express:
- capacity
- execution mode where relevant
- requested isolation / exclusivity class
A request should fail closed when the node cannot honor the requested isolation/exclusivity semantics.
CPU wire shape locked: see `docs/spec/cpu-isolation-wire-format.md`. Per-lease CPU isolation is carried as `TLV_LEASE_CPU_ISOLATION` (0x0902) on the existing `LeaseAllocRequest` params blob, mirroring `TLV_LEASE_INTENT_KV_CACHE`. Unsupported classes fail closed. Implementation is a separate wave; see §7 of that note.
6.3 Scheduler / admission
Schedulers should be able to trade:
- density
- fairness
- predictability
without guessing user intent from lease width alone.
Scheduler policy locked: see `docs/spec/scheduler-isolation-policy.md`. Adopts a filter→score→adapt pipeline shared with Phase 48.7 affinity. Isolation is filter-only (binary), not scored. Tenant priority is orthogonal to per-lease isolation. Rejection reasons are structured and distinguish permanent (no node supports the class) from transient (contended) from policy (daemon-mode conflict). Inventory advertisement is a hard prerequisite: until nodes advertise supported classes, the scheduler fails closed on any request stricter than BestEffort/Shared.
7. SDK consequences
The SDK should not force users to communicate isolation indirectly via lease width if that is not what they mean.
For example, the long-term CPU API should trend toward something like:
```rust
CpuBuilder::new()
    .single_core()
    .isolation(IsolationPolicy::WholeCore)
    .lease_secs(60)
    .acquire()?;
```

rather than encouraging:

```rust
CpuBuilder::new()
    .cores(4) // hoping this means "give my single thread a quieter core"
    .lease_secs(60)
    .acquire()?;
```

Likewise for GPU, session exclusivity should eventually be requested directly instead of inferred from larger capacity asks.
8. Recommended direction
The repo should evolve toward a common resource model in which:
- capacity, execution mode, and isolation are distinct axes
- CPU isolation is requested explicitly instead of inferred from wider single-threaded leases
- GPU exclusivity follows the same pattern
- unsupported isolation classes fail closed
This does not require every resource type to implement the same policies. It only requires the system to expose the concept consistently and honestly.
9. Initial follow-on work
The next concrete steps are:
- define the common vocabulary in spec/docs
- decide how CPU lease requests expose isolation separately from execution width
- define the analogous GPU exclusivity vocabulary
- align inventory and scheduler reporting with that model
- update SDK builders so examples express isolation explicitly
This should be tracked as a dedicated follow-on phase rather than being folded implicitly into unrelated CPU or GPU work.