Bare-Metal CPU Lease Semantics
Status: design decision. Commits to what bare-metal runtimes do with wider-than-1-core CPU leases for single-threaded tasklets once per-lease CPU isolation exists.
Addendum to: docs/spec/resource-isolation-and-exclusivity.md §4.3. Builds on: docs/spec/cpu-isolation-wire-format.md.
1. Problem
Once TLV_LEASE_CPU_ISOLATION exists, a client can ask for isolation
directly. The question this note answers: what should
fabricbios-rpi-baremetal and other bare-metal runtimes do when a
client submits a single-threaded tasklet under a wider-than-1-core CPU
lease?
Today, a “4-core lease running a single-threaded tasklet” on bare metal silently reserves 4 cores, of which 3 sit idle. That accidentally provides isolation (no other tenant runs on the other 3), which encourages the exact anti-pattern that design note §4.3 is trying to eliminate.
2. Decision
Fail closed. Bare-metal runtimes MUST reject a CPU lease whose
capacity exceeds the declared execution width of the tasklet it will
run, regardless of whether TLV_LEASE_CPU_ISOLATION is present.
Rejection uses LeaseError::InvalidArgs with a rejection-reason TLV
naming the specific mismatch:
    wider CPU lease not accepted: capacity=4 execution_width=1;
    use TLV_LEASE_CPU_ISOLATION to request isolation explicitly,
    or reduce capacity to match execution width.
2.1 Migration path for existing clients
Clients that rely on today’s implicit wider-lease-as-isolation pattern have two clean paths forward:
- Emit TLV_LEASE_CPU_ISOLATION with WholeCore (0x01) or StrictIsolated (0x02) and request capacity=1. This is the preferred long-term shape per design note §7.
- The operator sets --cpu-isolation-policy WholeCore on the daemon. An absent TLV inherits the daemon-wide default, so clients continue to receive whole-core isolation without code changes. This is the zero-client-change migration lever.
Neither path requires the client to keep asking for wider capacity than it can execute.
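A minimal sketch of how a bare-metal lease handler might resolve the effective isolation class under the two migration paths. Only the WholeCore (0x01) and StrictIsolated (0x02) codes and the rule that an absent TLV inherits the daemon-wide default come from this note; the enum, the function names, the hypothetical Shared variant (standing in for non-isolated behavior), and rejecting unknown codes are assumptions. The wire-format spec governs the real decoding.

```rust
// Sketch only. WholeCore (0x01) and StrictIsolated (0x02) come from this
// note; the enum, the Shared variant, and the function names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IsolationClass {
    /// Hypothetical stand-in for non-isolated behavior.
    Shared,
    WholeCore,
    StrictIsolated,
}

/// Decode the class byte carried in TLV_LEASE_CPU_ISOLATION.
fn decode_class(code: u8) -> Option<IsolationClass> {
    match code {
        0x01 => Some(IsolationClass::WholeCore),
        0x02 => Some(IsolationClass::StrictIsolated),
        _ => None,
    }
}

/// A present TLV wins; an absent TLV inherits the daemon-wide default
/// configured with --cpu-isolation-policy.
fn effective_class(
    tlv_code: Option<u8>,
    daemon_default: IsolationClass,
) -> Result<IsolationClass, String> {
    match tlv_code {
        None => Ok(daemon_default),
        // Assumption: unknown codes are rejected rather than ignored.
        Some(code) => decode_class(code)
            .ok_or_else(|| format!("unknown isolation class 0x{code:02x}")),
    }
}

fn main() {
    // Path 2: daemon runs with --cpu-isolation-policy WholeCore; client sends no TLV.
    let inherited = effective_class(None, IsolationClass::WholeCore);
    // Path 1: client explicitly requests StrictIsolated (0x02) with capacity=1.
    let explicit = effective_class(Some(0x02), IsolationClass::Shared);
    println!("{inherited:?} {explicit:?}");
}
```

Resolving the class is kept separate from the capacity check on purpose: neither path changes whether a wide lease is accepted, only which isolation the width-1 lease receives.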
2.2 Why not the alternatives
Four options were considered. This note picks option 1 (fail closed) and rejects the others.
- Silent narrow (option 2), which quietly shrinks the lease to the tasklet's execution width, delivers weaker isolation than the client expected, violating the fail-closed requirement in design note §6.2.
- Honor as idle reservation (option 3) preserves the exact anti-pattern §4.3 names. Backwards-compatible but dishonest.
- Accept but warn (option 4) lets clients ignore the warning indefinitely and never migrate. Structured events are not a substitute for a semantic commit.
Fail-closed with the --cpu-isolation-policy daemon-wide override
gives operators a migration lever that doesn’t depend on clients
reading warning logs.
3. Hard rules
- Single-threaded tasklets (Profile v0): capacity > 1 core is rejected unconditionally on bare metal. There is no bare-metal runtime today that can make extra cores useful for a single-threaded tasklet.
- Shared-memory tasklets (when they land on bare metal): capacity must match the declared max_threads. Capacity > max_threads is rejected; capacity < max_threads is rejected.
- Isolation TLV absence: treated as “use daemon default”. The fail-closed check above applies to the capacity vs. execution-width mismatch, independent of the isolation class.
- The Linux runtime is unaffected. Linux keeps honoring wider leases because cgroup enforcement is meaningful there. The distinction is explicit: bare metal lacks a scheduler to which wider capacity could be delegated.
- No silent downgrade. If a runtime cannot enforce the requested isolation class (e.g. bare metal asked for StrictIsolated with no NUMA topology to isolate against), it rejects per the CPU isolation wire format §3, not this note.
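The capacity rules above condense into a single node-side check. This is a sketch, not the runtimes' implementation: Runtime, Tasklet, check_cpu_lease, and the narrower-lease message are hypothetical, and the reason travels as a String rather than a rejection-reason TLV. LeaseError::InvalidArgs and the wider-lease message text follow §2. The isolation class is deliberately absent from the signature, matching the rule that the check is independent of it.

```rust
// Sketch only: Runtime, Tasklet, and check_cpu_lease are hypothetical.
// LeaseError::InvalidArgs and the wider-lease message follow §2; the reason
// travels as a String here instead of a rejection-reason TLV.
#[derive(Debug, PartialEq, Eq)]
enum LeaseError {
    InvalidArgs(String),
}

#[derive(Clone, Copy)]
enum Runtime {
    BareMetal,
    Linux,
}

#[derive(Clone, Copy)]
enum Tasklet {
    /// Profile v0: execution width is always 1.
    SingleThreaded,
    /// Future bare-metal shape: execution width is the declared max_threads.
    SharedMemory { max_threads: u32 },
}

impl Tasklet {
    fn execution_width(self) -> u32 {
        match self {
            Tasklet::SingleThreaded => 1,
            Tasklet::SharedMemory { max_threads } => max_threads,
        }
    }
}

/// The isolation class is intentionally not an input: the mismatch check
/// applies whether or not TLV_LEASE_CPU_ISOLATION is present.
fn check_cpu_lease(runtime: Runtime, capacity: u32, tasklet: Tasklet) -> Result<(), LeaseError> {
    let width = tasklet.execution_width();
    match runtime {
        // Linux keeps honoring wider leases; this note adds no constraint there.
        Runtime::Linux => Ok(()),
        Runtime::BareMetal if capacity > width => Err(LeaseError::InvalidArgs(format!(
            "wider CPU lease not accepted: capacity={capacity} execution_width={width}; \
             use TLV_LEASE_CPU_ISOLATION to request isolation explicitly, \
             or reduce capacity to match execution width."
        ))),
        // Hypothetical wording: the note fixes the rule, not this message.
        Runtime::BareMetal if capacity < width => Err(LeaseError::InvalidArgs(format!(
            "CPU lease narrower than execution width: capacity={capacity} execution_width={width}"
        ))),
        Runtime::BareMetal => Ok(()),
    }
}

fn main() {
    let cases = [
        (Runtime::BareMetal, 4, Tasklet::SingleThreaded),
        (Runtime::BareMetal, 4, Tasklet::SharedMemory { max_threads: 4 }),
        (Runtime::Linux, 4, Tasklet::SingleThreaded),
    ];
    for (runtime, capacity, tasklet) in cases {
        match check_cpu_lease(runtime, capacity, tasklet) {
            Ok(()) => println!("accepted"),
            Err(LeaseError::InvalidArgs(reason)) => println!("rejected: {reason}"),
        }
    }
}
```

Expressing Profile v0 as "execution width is always 1" lets the single-threaded and shared-memory rules share one comparison instead of two special cases.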
4. Scheduler interaction
The scheduler does not need to know about this rule today — placement still sees a CPU capacity ask and routes by capacity. The rejection happens at the node-side lease handler after placement, which converts to a scheduler-level retry on a different cell. This matches the existing failure model for “cell capacity changed since summary.”
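A simplified sketch of that failure model: placement filters candidates by capacity alone, and a node-side rejection moves the request to the next cell, the same way a stale capacity summary would. Cell, place_with_retry, and the node_accepts callback are hypothetical stand-ins for the real scheduler and lease handler.

```rust
// Hypothetical stand-ins for the real scheduler and node-side handler.
#[derive(Debug)]
struct Cell {
    name: &'static str,
    free_cores: u32,
}

fn place_with_retry<F>(cells: &[Cell], capacity: u32, mut node_accepts: F) -> Option<&'static str>
where
    F: FnMut(&Cell) -> bool,
{
    cells
        .iter()
        // Placement sees only the capacity ask, not the bare-metal rule.
        .filter(|c| c.free_cores >= capacity)
        // The node-side lease handler runs after placement; a rejection
        // just moves on to the next candidate cell.
        .find(|&c| node_accepts(c))
        .map(|c| c.name)
}

fn main() {
    let cells = [
        Cell { name: "baremetal-0", free_cores: 4 },
        Cell { name: "linux-0", free_cores: 8 },
    ];
    // Assumed node behavior for a 4-core single-threaded lease: the
    // bare-metal cell rejects it (§3), the Linux cell honors it.
    let placed = place_with_retry(&cells, 4, |c| !c.name.starts_with("baremetal"));
    println!("{placed:?}");
}
```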
5. Inventory advertisement
Once the wire format ships, inventory (GET_INVENTORY) should
advertise a per-resource flag CPU_REJECTS_WIDE_SINGLE_THREADED so
clients can discover the policy without trial-and-error. This is
deferred and picked up alongside the broader isolation-class inventory
advertisement.
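Once the flag exists, a client could consult it before submitting instead of discovering the policy through rejections. This sketch assumes the flag is a bit in a per-resource u32 flags word returned by GET_INVENTORY. The bit value, the flags representation, and the helper name are hypothetical, since the advertisement itself is deferred; only the flag name comes from this note.

```rust
// Hypothetical: the bit position and u32 flags word are assumptions;
// only the flag name CPU_REJECTS_WIDE_SINGLE_THREADED comes from the note.
const CPU_REJECTS_WIDE_SINGLE_THREADED: u32 = 0x1;

/// Pick the CPU capacity to request for a single-threaded tasklet, given
/// the flags a GET_INVENTORY entry advertised for the target resource.
fn capacity_for_single_threaded(requested: u32, resource_flags: u32) -> u32 {
    if resource_flags & CPU_REJECTS_WIDE_SINGLE_THREADED != 0 {
        // The node would reject capacity > 1; ask for width 1 and express
        // isolation through TLV_LEASE_CPU_ISOLATION instead.
        1
    } else {
        // No advertised policy (e.g. a Linux node, which honors wider
        // leases): keep the original ask.
        requested
    }
}

fn main() {
    let bare_metal_flags = CPU_REJECTS_WIDE_SINGLE_THREADED;
    let linux_flags = 0;
    println!(
        "bare metal: {}, linux: {}",
        capacity_for_single_threaded(4, bare_metal_flags),
        capacity_for_single_threaded(4, linux_flags)
    );
}
```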
6. Doc updates required when this lands
- docs/grafos/tasklet-parallelism-model.md: add a “bare-metal semantics” subsection: wider single-threaded leases are rejected; use .isolation() instead.
- docs/grafos/parallelism-decision-ladder.md: at Rung 1 (single-threaded tasklet), add a note that wider leases do NOT confer isolation and will be rejected on bare-metal targets.
- docs/grafos-std-guide.md: the existing “CPU isolation policy” subsection should gain a pointer to this note.
7. What this note does NOT commit to
- Implementation. This is the design decision. The capacity != execution_width rejection path in the bare-metal runtimes lands as a follow-on.
- Scheduler awareness of the rejection policy: currently unsupported.
- Inventory advertisement: deferred.
- Shared-memory tasklet bare-metal support: a separate bring-up question. Rule 2 above applies only if and when that code path exists.
8. Cross-links
- docs/spec/resource-isolation-and-exclusivity.md §4.3
- docs/spec/cpu-isolation-wire-format.md
- docs/grafos/tasklet-parallelism-model.md: single-threaded default that this note reinforces
- docs/grafos/parallelism-decision-ladder.md: decision ladder that this note adds a bare-metal footnote to