
Bare-Metal CPU Lease Semantics

Status: design decision. Commits to what bare-metal runtimes do with CPU leases wider than one core for single-threaded tasklets, once per-lease CPU isolation exists.

Addendum to: docs/spec/resource-isolation-and-exclusivity.md §4.3. Builds on: docs/spec/cpu-isolation-wire-format.md.


1. Problem

Once TLV_LEASE_CPU_ISOLATION exists, a client can ask for isolation directly. The question this note answers: what should fabricbios-rpi-baremetal and other bare-metal runtimes do when a client submits a single-threaded tasklet under a wider-than-1-core CPU lease?

Today, a “4-core lease running a single-threaded tasklet” on bare metal silently reserves 4 cores, 3 of which sit idle. That accidentally provides isolation (no other tenant runs on the other 3), which encourages the exact anti-pattern that design note §4.3 is trying to eliminate.

2. Decision

Fail closed. Bare-metal runtimes MUST reject a CPU lease whose capacity exceeds the declared execution width of the tasklet it will run, regardless of whether TLV_LEASE_CPU_ISOLATION is present.

Rejection uses LeaseError::InvalidArgs with a rejection-reason TLV naming the specific mismatch:

wider CPU lease not accepted: capacity=4 execution_width=1;
use TLV_LEASE_CPU_ISOLATION to request isolation explicitly,
or reduce capacity to match execution width.
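A minimal Rust sketch of the fail-closed check and the rejection reason it produces. Only LeaseError::InvalidArgs and the message text come from this note; the function name check_cpu_lease_width, the String payload standing in for the rejection-reason TLV, and the u32 core counts are assumptions for illustration.

```rust
use std::fmt;

/// Hypothetical error type: only the InvalidArgs variant name comes from this note.
#[derive(Debug, PartialEq)]
enum LeaseError {
    /// Carries the text that would travel in the rejection-reason TLV (assumed encoding).
    InvalidArgs(String),
}

impl fmt::Display for LeaseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LeaseError::InvalidArgs(reason) => write!(f, "InvalidArgs: {reason}"),
        }
    }
}

/// Fail-closed check: reject a CPU lease wider than the tasklet's execution width.
/// The isolation TLV is deliberately not a parameter; the check ignores it.
fn check_cpu_lease_width(capacity: u32, execution_width: u32) -> Result<(), LeaseError> {
    if capacity > execution_width {
        return Err(LeaseError::InvalidArgs(format!(
            "wider CPU lease not accepted: capacity={capacity} execution_width={execution_width}; \
             use TLV_LEASE_CPU_ISOLATION to request isolation explicitly, \
             or reduce capacity to match execution width."
        )));
    }
    Ok(())
}

fn main() {
    // A 1-core lease for a single-threaded tasklet passes.
    assert!(check_cpu_lease_width(1, 1).is_ok());
    // A 4-core lease for the same tasklet is rejected with the reason text.
    match check_cpu_lease_width(4, 1) {
        Err(e) => println!("{e}"),
        Ok(()) => println!("accepted"),
    }
}
```

Keeping the isolation class out of the signature mirrors the decision that the rejection applies whether or not TLV_LEASE_CPU_ISOLATION is present.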

2.1 Migration path for existing clients

Clients that rely on today’s implicit wider-lease-as-isolation pattern have two clean paths forward:

  1. Emit TLV_LEASE_CPU_ISOLATION with WholeCore (0x01) or StrictIsolated (0x02) and request capacity=1. This is the preferred long-term shape per design note §7.

  2. Operator sets --cpu-isolation-policy WholeCore on the daemon. An absent TLV inherits the daemon-wide default, so clients continue to receive whole-core isolation without code changes. This is the zero-client-change migration lever.

Neither path requires the client to keep asking for wider capacity than it can execute.
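Both migration paths converge on the same effective isolation for a capacity=1 lease. The sketch below assumes a hypothetical effective_isolation helper in which an explicit TLV value wins and an absent TLV falls back to the daemon-wide --cpu-isolation-policy default (modeled as None when the operator set nothing). The wire values 0x01 and 0x02 are from this section; other classes defined in the wire-format spec are left out, and rejecting an unrecognized value (rather than treating it as absent) is an assumption consistent with fail-closed handling.

```rust
/// Isolation classes named in this note, with their wire values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IsolationClass {
    WholeCore,      // 0x01
    StrictIsolated, // 0x02
}

impl IsolationClass {
    /// Decode the TLV_LEASE_CPU_ISOLATION value byte (hypothetical decoder).
    fn from_wire(byte: u8) -> Option<Self> {
        match byte {
            0x01 => Some(IsolationClass::WholeCore),
            0x02 => Some(IsolationClass::StrictIsolated),
            // Other values are governed by the wire-format spec, not this sketch.
            _ => None,
        }
    }
}

/// Resolve the isolation class for one lease. An unrecognized TLV value comes
/// back as Err(byte) so the caller can reject it instead of silently
/// substituting the daemon default (assumed fail-closed handling).
fn effective_isolation(
    tlv_value: Option<u8>,
    daemon_default: Option<IsolationClass>,
) -> Result<Option<IsolationClass>, u8> {
    match tlv_value {
        // Explicit TLV present: it wins over the daemon default.
        Some(byte) => IsolationClass::from_wire(byte).map(Some).ok_or(byte),
        // TLV absent: inherit --cpu-isolation-policy (None if unset).
        None => Ok(daemon_default),
    }
}

fn main() {
    // Path 1: client emits TLV_LEASE_CPU_ISOLATION = WholeCore (0x01), capacity=1.
    let path1 = effective_isolation(Some(0x01), None);
    // Path 2: client sends no TLV; daemon runs with --cpu-isolation-policy WholeCore.
    let path2 = effective_isolation(None, Some(IsolationClass::WholeCore));
    assert_eq!(path1, path2);
    println!("{path1:?}");
}
```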

2.2 Why not the alternatives

Four options were considered. This note picks option 1 (fail closed) and rejects the others.

  • Silent narrow (option 2) delivers weaker isolation than the client expected — violates the design note §6.2 fail-closed requirement.
  • Honor as idle reservation (option 3) preserves the exact anti-pattern §4.3 names. Backwards-compatible but dishonest.
  • Accept but warn (option 4) lets clients ignore the warning indefinitely and never migrate. Structured events are not a substitute for a semantic commit.

Fail-closed with the --cpu-isolation-policy daemon-wide override gives operators a migration lever that doesn’t depend on clients reading warning logs.

3. Hard rules

  1. Single-threaded tasklets (Profile v0): capacity > 1 core is rejected unconditionally on bare metal. There is no bare-metal runtime today that can make extra cores useful for a single-threaded tasklet.
  2. Shared-memory tasklets (when they land on bare metal): capacity must equal the declared max_threads. Capacity > max_threads is rejected, and so is capacity < max_threads.
  3. Isolation TLV absence: treated as “use daemon default”. The fail-closed check above applies to the capacity vs. execution width mismatch and is independent of the isolation class.
  4. Linux runtime is unaffected. Linux keeps honoring wider leases because cgroup enforcement is meaningful there. The distinction is explicit: bare metal lacks a scheduler to which wider capacity could be delegated.
  5. No silent downgrade. If a runtime cannot enforce the requested isolation class (e.g. bare metal asked for StrictIsolated with no NUMA topology to isolate against), it rejects per the CPU isolation wire format §3, not this note.

4. Scheduler interaction

The scheduler does not need to know about this rule today: placement still sees a CPU capacity ask and routes by capacity. The rejection happens in the node-side lease handler after placement and is converted into a scheduler-level retry on a different cell. This matches the existing failure model for “cell capacity changed since summary.”
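To illustrate the failure model above, the sketch below assumes a hypothetical placement loop that routes on capacity alone and treats a node-side InvalidArgs as a signal to try the next candidate cell. The cell names, the free_cores field, and the one-attempt-per-cell retry bound are assumptions, not the scheduler's real data model.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CellKind {
    Linux,
    BareMetal,
}

#[derive(Debug)]
struct Cell {
    name: &'static str,
    kind: CellKind,
    free_cores: u32,
}

#[derive(Debug)]
enum NodeError {
    /// Bare-metal width mismatch (LeaseError::InvalidArgs on the node).
    InvalidArgs,
}

/// Node-side lease handler, run after placement. Only bare-metal cells
/// apply the width rule; Linux keeps honoring wider leases.
fn node_accept(cell: &Cell, capacity: u32, execution_width: u32) -> Result<(), NodeError> {
    if cell.kind == CellKind::BareMetal && capacity > execution_width {
        return Err(NodeError::InvalidArgs);
    }
    Ok(())
}

/// Placement looks only at capacity; a node-side rejection becomes a retry
/// on the next candidate. Each cell is tried at most once.
fn place(cells: &[Cell], capacity: u32, execution_width: u32) -> Option<&'static str> {
    cells
        .iter()
        .filter(|c| c.free_cores >= capacity)
        .find(|c| node_accept(c, capacity, execution_width).is_ok())
        .map(|c| c.name)
}

fn main() {
    let cells = [
        Cell { name: "rpi-0", kind: CellKind::BareMetal, free_cores: 4 },
        Cell { name: "linux-0", kind: CellKind::Linux, free_cores: 8 },
    ];
    // A 4-core ask for a single-threaded tasklet: rpi-0 rejects, the retry lands on linux-0.
    println!("{:?}", place(&cells, 4, 1));
}
```

Because placement never inspects the width rule, the 4-core ask first matches the bare-metal cell on capacity, is rejected there, and is retried onto the Linux cell, which keeps honoring wider leases per Rule 4.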

5. Inventory advertisement

Once the wire format ships, inventory (GET_INVENTORY) should advertise a per-resource flag CPU_REJECTS_WIDE_SINGLE_THREADED so clients can discover the policy without trial and error. This is deferred and will be picked up alongside the broader isolation-class inventory advertisement.
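Once the flag is advertised, a client could check it before submitting instead of learning the policy from a rejection. This sketch assumes a hypothetical bitmask encoding for per-resource flags; only the flag name CPU_REJECTS_WIDE_SINGLE_THREADED comes from this note, and the bit position and the helper lease_would_be_rejected are invented for illustration.

```rust
/// Hypothetical per-resource flag bit; the name is from this note, the bit value is not.
const CPU_REJECTS_WIDE_SINGLE_THREADED: u32 = 1 << 0;

/// Client-side pre-check: would this lease trip the bare-metal rejection?
fn lease_would_be_rejected(resource_flags: u32, capacity: u32, single_threaded: bool) -> bool {
    let policy_advertised = (resource_flags & CPU_REJECTS_WIDE_SINGLE_THREADED) != 0;
    policy_advertised && single_threaded && capacity > 1
}

fn main() {
    let bare_metal_flags = CPU_REJECTS_WIDE_SINGLE_THREADED;
    let linux_flags = 0;
    println!("{}", lease_would_be_rejected(bare_metal_flags, 4, true)); // true
    println!("{}", lease_would_be_rejected(linux_flags, 4, true)); // false
}
```

A client that sees true here would fall back to capacity=1 plus TLV_LEASE_CPU_ISOLATION, the preferred shape from §2.1.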

6. Doc updates required when this lands

  • docs/grafos/tasklet-parallelism-model.md — add a “bare-metal semantics” subsection: wider single-threaded leases are rejected; use .isolation() instead.
  • docs/grafos/parallelism-decision-ladder.md — at Rung 1 (single-threaded tasklet), add a note that wider leases do NOT confer isolation and will be rejected on bare-metal targets.
  • docs/grafos-std-guide.md — the existing “CPU isolation policy” subsection should gain a pointer to this note.

7. What this note does NOT commit to

  • Implementation. This is the design decision. The capacity != execution_width rejection path in the bare-metal runtimes lands as a follow-on.
  • Scheduler awareness of the rejection policy — currently unsupported.
  • Inventory advertisement — deferred.
  • Shared-memory tasklet bare-metal support — a separate bring-up question. Rule 2 above only applies if/when that code path exists.

8. Related documents

  • docs/spec/resource-isolation-and-exclusivity.md §4.3
  • docs/spec/cpu-isolation-wire-format.md
  • docs/grafos/tasklet-parallelism-model.md — single-threaded default that this note reinforces
  • docs/grafos/parallelism-decision-ladder.md — decision ladder that this note adds a bare-metal footnote to