Resource Isolation And Exclusivity Semantics
This note describes a more explicit way to model performance isolation in fabricBIOS and grafOS.
The core idea is:
- capacity answers “how much hardware is reserved?”
- execution mode answers “how does the workload run on that hardware?”
- isolation / exclusivity policy answers “how much interference from other workloads is allowed?”
These concerns are related, but they are not the same thing. The repo already separates lease width from tasklet execution width for CPU tasklets. This document argues that isolation should also be represented explicitly instead of being inferred indirectly from lease size.
Related documents:
- `docs/spec/resource-types.md`
- `docs/tasklet-profile-v0.md`
- `docs/grafos/tasklet-parallelism-model.md`
- `docs/grafos/shared-memory-tasklet-model.md`
1. Problem statement
Today it is easy for a user to infer the wrong meaning from a request such as “lease 4 CPU cores.”
That request could mean several different things:
- “run my workload in parallel across 4 workers”
- “reserve enough capacity for 4 separate tasklets”
- “give my single-threaded tasklet stronger isolation from other tenants”
- “reserve scheduler headroom because this stage may later fan out”
Those are different intents. A single numeric lease width is a poor proxy for all of them.
The repo has already addressed part of this problem for CPU tasklets:
- `num_cores` reserves CPU capacity
- tasklet execution width stays separate
- shared-memory tasklets are an explicit execution mode, not an accidental consequence of `.cores(n)`
That same style should be carried further. Users who want performance predictability should be able to ask for isolation directly rather than smuggling that request through lease width.
2. Design goals
This model should:
- make user intent explicit
- reduce accidental over-reservation of hardware
- preserve the lease-width vs execution-width separation
- work across CPU and GPU resources
- let schedulers reason about density vs predictability
- expose operator-visible policy in inventory and logs
This model should not:
- turn fabricBIOS into a process scheduler
- imply POSIX-like threading semantics
- promise hardware isolation properties that the runtime cannot actually enforce
3. Common model
Every resource request should be decomposable into three axes.
3.1 Capacity
How much of the resource is reserved.
Examples:
- CPU: `num_cores = 4`
- MEM: `mem_bytes = 256 MiB`
- GPU: `vram_bytes = 8 GiB`, `compute_slices = 1`
3.2 Execution mode
How software is allowed to execute against that capacity.
Examples:
- CPU tasklet profile v0: single-threaded tasklet execution
- CPU tasklet profile v1: explicit `shared_memory_tasklet`
- GPU: stateless kernel submit vs persistent session vs future graph/session execution modes
3.3 Isolation / exclusivity policy
How much cross-tenant sharing is allowed for the reserved hardware.
Examples:
- best-effort sharing
- whole-core exclusivity
- strict isolated placement
- GPU session-exclusive
- GPU device-exclusive
The important rule is:
- capacity does not imply execution mode
- capacity does not imply isolation
- isolation does not imply parallel execution
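The rule above can be sketched as a plain data model in which the three axes are independent fields. Everything below is illustrative; none of these type or field names are taken from the actual fabricBIOS API:

```rust
// Hypothetical sketch of the three independent request axes.
// These names are illustrative, not the repo's real types.

#[derive(Debug, Clone, Copy, PartialEq)]
enum ExecutionMode {
    SingleThreadedTasklet,
    SharedMemoryTasklet,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum IsolationPolicy {
    BestEffort,
    WholeCore,
    StrictIsolated,
}

#[derive(Debug)]
struct CpuRequest {
    num_cores: u32,             // capacity: how much hardware is reserved
    mode: ExecutionMode,        // execution mode: how the workload runs
    isolation: IsolationPolicy, // isolation: how much interference is allowed
}

fn main() {
    // Capacity does not imply isolation: a single-core, single-threaded
    // request can still ask for whole-core exclusivity.
    let quiet_single = CpuRequest {
        num_cores: 1,
        mode: ExecutionMode::SingleThreadedTasklet,
        isolation: IsolationPolicy::WholeCore,
    };
    // And wide capacity does not imply strong isolation.
    let wide_best_effort = CpuRequest {
        num_cores: 8,
        mode: ExecutionMode::SharedMemoryTasklet,
        isolation: IsolationPolicy::BestEffort,
    };
    println!("{quiet_single:?}\n{wide_best_effort:?}");
}
```

Because the axes are separate fields, no combination is privileged; the scheduler reads each axis on its own rather than inferring one from another.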
4. CPU interpretation
For CPUs, the following distinctions matter.
4.1 Single-threaded but isolation-sensitive work
Some workloads cannot parallelize well, but still benefit from reduced interference:
- low-jitter control loops
- latency-sensitive parsers or codecs
- cache-sensitive single-threaded compute
- workloads harmed by SMT sibling contention
Those workloads may want:
- execution width = 1
- capacity = small
- isolation = strong
That intent is better expressed as an isolation policy than as “lease more cores and hope that implies exclusivity.”
4.2 CPU policy classes
The current Linux path already uses CPU isolation policies such as:
`BestEffort`, `WholeCore`, `StrictIsolated`
That is the right general shape. Over time, CPU requests should read more like:
- capacity: 1 core
- execution mode: single-threaded tasklet
- isolation: `whole_core`

or:

- capacity: 8 cores
- execution mode: `shared_memory_tasklet`
- isolation: `strict_isolated`
4.3 Bare-metal consequence
On runtimes that still execute tasklets single-threaded, leasing multiple CPU cores should not be the only way to ask for stronger performance isolation.
If the runtime cannot turn wider capacity into useful execution width, the user should still have a clear way to ask for:
- exclusive core ownership
- no cross-tenant sibling sharing
- stronger cache/topology constraints
That argues for explicit isolation/exclusivity fields rather than only wider leases.
Bare-metal semantics locked: see `docs/spec/bare-metal-cpu-lease-semantics.md`. Bare-metal runtimes reject wider-than-execution-width CPU leases for single-threaded tasklets; clients use `TLV_LEASE_CPU_ISOLATION` or rely on the daemon-wide `--cpu-isolation-policy` default. Linux is unaffected.
5. GPU interpretation
The same conceptual split applies to GPUs.
5.1 Capacity
Examples:
- VRAM bytes
- compute partitions / slices
- queue slots or session slots
5.2 Execution mode
Examples:
- one-shot kernel submit
- persistent GPU session
- future multi-stage session or command graph mode
5.3 Isolation / exclusivity
Examples:
- shared: device may multiplex other tenants
- session-exclusive: a session has exclusive residency/state for its lifetime
- device-exclusive: one tenant gets the whole accelerator
- future partition-exclusive: one tenant gets an isolated hardware partition if the device exposes one
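A minimal sketch of those classes together with a fail-closed capability check; all names below are hypothetical, not the actual GPU session types:

```rust
// Hypothetical GPU exclusivity classes and a fail-closed capability check.

#[derive(Debug, Clone, Copy, PartialEq)]
enum GpuExclusivity {
    Shared,             // device may multiplex other tenants
    SessionExclusive,   // exclusive residency/state for a session lifetime
    DeviceExclusive,    // one tenant gets the whole accelerator
    PartitionExclusive, // future: only if the device exposes partitions
}

struct GpuDeviceCaps {
    supports_partitions: bool,
}

// Fail closed: an unsupported class is an error, never a silent downgrade.
fn check_exclusivity(
    caps: &GpuDeviceCaps,
    class: GpuExclusivity,
) -> Result<(), String> {
    match class {
        GpuExclusivity::PartitionExclusive if !caps.supports_partitions => {
            Err("device exposes no hardware partitions".to_string())
        }
        _ => Ok(()),
    }
}

fn main() {
    let caps = GpuDeviceCaps { supports_partitions: false };
    assert!(check_exclusivity(&caps, GpuExclusivity::SessionExclusive).is_ok());
    assert!(check_exclusivity(&caps, GpuExclusivity::PartitionExclusive).is_err());
}
```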
This is more honest than trying to encode GPU isolation indirectly through raw VRAM or “number of GPUs” alone.
GPU wire shape locked: see `docs/spec/gpu-exclusivity-wire-format.md`. Per-lease GPU exclusivity is carried as `TLV_LEASE_GPU_EXCLUSIVITY` (0x0903) on the existing `LeaseAllocRequest` params blob. An absent TLV inherits the daemon-wide `--gpu-share-mode`. Unsupported classes fail closed. Daemon mode is a permission envelope, not a default-only hint.
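For illustration, encoding that TLV might look like the sketch below. The type code 0x0903 is from the spec; the concrete layout assumed here (u16 type, u16 length, little-endian, one-byte class value) is an assumption made for the example only, not the locked wire format:

```rust
// Illustrative TLV encoding for TLV_LEASE_GPU_EXCLUSIVITY (0x0903).
// The layout (u16 type, u16 length, little-endian, one-byte value) is
// an assumption for this sketch, not the actual wire format.

const TLV_LEASE_GPU_EXCLUSIVITY: u16 = 0x0903;

fn encode_gpu_exclusivity_tlv(class: u8) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&TLV_LEASE_GPU_EXCLUSIVITY.to_le_bytes());
    out.extend_from_slice(&1u16.to_le_bytes()); // value length = 1 byte
    out.push(class); // e.g. 0 = shared, 1 = session-exclusive (illustrative)
    out
}

fn main() {
    let tlv = encode_gpu_exclusivity_tlv(1);
    // Type 0x0903 little-endian, length 1, value 1.
    assert_eq!(tlv, vec![0x03, 0x09, 0x01, 0x00, 0x01]);
    println!("{tlv:?}");
}
```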
6. Inventory and request consequences
If this model is adopted, the system should expose and consume isolation more explicitly.
6.1 Inventory / discovery
Inventory should advertise:
- what isolation classes exist for each resource type
- which class is currently active by default
- whether the runtime can enforce the requested class or only best effort
For CPU, some of this already exists through isolation/topology flags. Equivalent GPU advertisement will be needed as GPU sessions become richer.
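A sketch of what such an advertisement could carry, with all names hypothetical rather than the actual inventory schema:

```rust
// Hypothetical inventory advertisement: which isolation classes a node
// supports, which is the default, and whether enforcement is real or
// best effort. Names are illustrative.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Enforcement {
    Enforced,   // runtime can actually honor the class
    BestEffort, // runtime will try, but cannot guarantee
}

#[derive(Debug)]
struct IsolationAdvert {
    class: &'static str,
    enforcement: Enforcement,
}

#[derive(Debug)]
struct CpuInventory {
    default_class: &'static str,
    classes: Vec<IsolationAdvert>,
}

fn main() {
    let inv = CpuInventory {
        default_class: "best_effort",
        classes: vec![
            IsolationAdvert { class: "best_effort", enforcement: Enforcement::Enforced },
            IsolationAdvert { class: "whole_core", enforcement: Enforcement::Enforced },
            IsolationAdvert { class: "strict_isolated", enforcement: Enforcement::BestEffort },
        ],
    };
    // A scheduler can check support before admitting a request.
    assert!(inv.classes.iter().any(|a| a.class == "whole_core"));
    println!("{inv:#?}");
}
```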
6.2 Lease requests
Lease requests should be able to express:
- capacity
- execution mode where relevant
- requested isolation / exclusivity class
A request should fail closed when the node cannot honor the requested isolation/exclusivity semantics.
CPU wire shape locked: see `docs/spec/cpu-isolation-wire-format.md`. Per-lease CPU isolation is carried as `TLV_LEASE_CPU_ISOLATION` (0x0902) on the existing `LeaseAllocRequest` params blob, mirroring `TLV_LEASE_INTENT_KV_CACHE`. Unsupported classes fail closed. Implementation is a separate wave; see §7 of that note.
6.3 Scheduler / admission
Schedulers should be able to trade:
- density
- fairness
- predictability
without guessing user intent from lease width alone.
Scheduler policy locked: see `docs/spec/scheduler-isolation-policy.md`. Adopts a filter→score→adapt pipeline shared with Phase 48.7 affinity. Isolation is filter-only (binary), not scored. Tenant priority is orthogonal to per-lease isolation. Rejection reasons are structured and distinguish permanent (no node supports the class) from transient (contended) from policy (daemon-mode conflict). Inventory advertisement is a hard prerequisite: until nodes advertise supported classes, the scheduler fails closed on any request stricter than BestEffort/Shared.
7. SDK consequences
The SDK should not force users to communicate isolation indirectly via lease width if that is not what they mean.
For example, the long-term CPU API should trend toward something like:
```rust
CpuBuilder::new()
    .single_core()
    .isolation(IsolationPolicy::WholeCore)
    .lease_secs(60)
    .acquire()?;
```

rather than encouraging:

```rust
CpuBuilder::new()
    .cores(4) // hoping this means "give my single thread a quieter core"
    .lease_secs(60)
    .acquire()?;
```

Likewise for GPU, session exclusivity should eventually be requested directly instead of inferred from larger capacity asks.
8. Recommended direction
The repo should evolve toward a common resource model in which:
- capacity, execution mode, and isolation are distinct axes
- CPU isolation is requested explicitly instead of inferred from wider single-threaded leases
- GPU exclusivity follows the same pattern
- unsupported isolation classes fail closed
This does not require every resource type to implement the same policies. It only requires the system to expose the concept consistently and honestly.
9. Initial follow-on work
The next concrete steps are:
- define the common vocabulary in spec/docs
- decide how CPU lease requests expose isolation separately from execution width
- define the analogous GPU exclusivity vocabulary
- align inventory and scheduler reporting with that model
- update SDK builders so examples express isolation explicitly
This should be tracked as a dedicated follow-on phase rather than being folded implicitly into unrelated CPU or GPU work.