fabricBIOS vs grafOS
The system has two layers, and it’s worth keeping them straight because they have different authorities and different failure modes.
fabricBIOS — the substrate
fabricBIOS is a firmware specification, not an operating system.
It defines how nodes on a disaggregated fabric:
- discover each other (signed ANNOUNCE/WITHDRAW/SOLICIT)
- bootstrap trust (pinned controller key + TOFU)
- exchange capability tokens
- create lease-bound bindings to standard data planes (RDMA, NVMe-oF, CXL, GPU fabrics)
That’s all. fabricBIOS does not schedule, does not place workloads, does not run user code, does not own policy. It exposes resources and enforces lease-bound access. Policy lives above.
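The trust bootstrap in the list above (pinned controller key + TOFU) can be pictured with a small sketch. This is not the fabricBIOS implementation: `TrustStore`, `NodeId`, `Key`, and `TrustDecision` are hypothetical names, keys are opaque byte arrays, and signature verification is left out entirely. It only shows the pinning logic, assuming the controller key is configured ahead of time and every other node's key is pinned the first time it is seen.

```rust
use std::collections::HashMap;

/// Hypothetical opaque public key; real key formats and signature checks are out of scope.
type Key = [u8; 32];
/// Hypothetical node identifier.
type NodeId = u64;

#[derive(Debug, PartialEq, Eq)]
enum TrustDecision {
    /// The key matches the pre-pinned controller key.
    Controller,
    /// First sighting of this node; its key is now pinned (trust on first use).
    PinnedOnFirstUse,
    /// The key matches the one pinned earlier for this node.
    Known,
    /// The key differs from the pinned one; the caller should reject the message.
    Mismatch,
}

struct TrustStore {
    controller_key: Key,
    seen: HashMap<NodeId, Key>,
}

impl TrustStore {
    fn new(controller_key: Key) -> Self {
        Self { controller_key, seen: HashMap::new() }
    }

    /// Classify a (node, key) pair and pin the key if the node is new.
    fn observe(&mut self, node: NodeId, key: Key) -> TrustDecision {
        if key == self.controller_key {
            return TrustDecision::Controller;
        }
        match self.seen.get(&node).copied() {
            None => {
                self.seen.insert(node, key);
                TrustDecision::PinnedOnFirstUse
            }
            Some(pinned) if pinned == key => TrustDecision::Known,
            Some(_) => TrustDecision::Mismatch,
        }
    }
}
```

Under these assumptions, a node that later presents a different key is reported as `Mismatch` rather than silently re-pinned, which is the property TOFU is meant to give you.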
The reference implementation is in Rust under crates/fabricbios-*, with a normative wire spec at /spec/fabricbios-wire-encoding-v0 and golden vectors that any conforming implementation must reproduce byte-for-byte.
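To make "byte-for-byte" concrete, here is a rough sketch of what a golden-vector check looks like. The message layout, the tag byte, and the names `ToyAnnounce`, `encode`, `TOY_GOLDEN`, and `first_mismatch` are all invented for illustration; the real layouts live in the wire spec and are not reproduced here.

```rust
/// Toy message; NOT a real fabricBIOS message layout.
struct ToyAnnounce {
    node_id: u32,
    epoch: u16,
}

/// Toy encoding: one tag byte, then big-endian fields. An assumption for illustration only.
fn encode(msg: &ToyAnnounce) -> Vec<u8> {
    let mut out = Vec::with_capacity(7);
    out.push(0x01); // hypothetical message tag
    out.extend_from_slice(&msg.node_id.to_be_bytes());
    out.extend_from_slice(&msg.epoch.to_be_bytes());
    out
}

/// Hypothetical golden vector for `ToyAnnounce { node_id: 42, epoch: 3 }`.
const TOY_GOLDEN: &[u8] = &[0x01, 0x00, 0x00, 0x00, 0x2A, 0x00, 0x03];

/// Returns the offset of the first differing byte, or `None` if the
/// two byte strings are identical in both content and length.
fn first_mismatch(actual: &[u8], golden: &[u8]) -> Option<usize> {
    if let Some(i) = actual.iter().zip(golden).position(|(a, g)| a != g) {
        return Some(i);
    }
    if actual.len() != golden.len() {
        // One side is a strict prefix of the other.
        return Some(actual.len().min(golden.len()));
    }
    None
}
```

A conformance run in this style would encode each fixture and fail on any `Some(offset)`. Reporting the offset rather than a bare pass/fail makes endianness and padding mistakes quicker to spot.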
grafOS — the resource-graph runtime
grafOS is the operating system that runs on top of fabricBIOS. It treats the fabric as a substrate and adds:
- a resource graph (grafos-core): typed nodes with ports, and edges that represent leased connections between resource ports
- a scheduler that admits programs and places leases on cells
- a lease lifecycle engine with renewal, revocation, fencing, and event delivery
- WASM tasklets as the unit of compute the fabric admits
- typed SDK crates for memory, block, network, GPU, CPU
- higher-level abstractions: distributed collections, sync primitives, KV, streams, RPC, object storage, message queues, etc.
The grafOS scheduler is where placement and policy decisions happen. The grafOS runtime is what programs link against. The grafOS CLI is how developers talk to all of it.
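The resource-graph idea (typed nodes with ports, edges that stand for leased connections) can be sketched in a few lines. None of these types come from grafos-core: `ResourceGraph`, `PortRef`, `LeasedEdge`, and the rule that a port carries at most one lease at a time are assumptions made for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResourceKind { Memory, Block, Network, Gpu, Cpu }

/// Hypothetical handle to one port on one node.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PortRef { node: usize, port: usize }

struct Node { kind: ResourceKind, ports: usize }

/// An edge exists only while the lease that created it is held.
struct LeasedEdge { from: PortRef, to: PortRef, lease_id: u64 }

#[derive(Default)]
struct ResourceGraph { nodes: Vec<Node>, edges: Vec<LeasedEdge> }

impl ResourceGraph {
    /// Adds a typed node with `ports` ports and returns its index.
    fn add_node(&mut self, kind: ResourceKind, ports: usize) -> usize {
        self.nodes.push(Node { kind, ports });
        self.nodes.len() - 1
    }

    fn kind_of(&self, node: usize) -> Option<ResourceKind> {
        self.nodes.get(node).map(|n| n.kind)
    }

    fn port_exists(&self, p: PortRef) -> bool {
        self.nodes.get(p.node).map_or(false, |n| p.port < n.ports)
    }

    fn port_in_use(&self, p: PortRef) -> bool {
        self.edges.iter().any(|e| e.from == p || e.to == p)
    }

    /// Records a leased connection between two free, existing ports.
    fn connect(&mut self, from: PortRef, to: PortRef, lease_id: u64) -> Result<(), &'static str> {
        if !self.port_exists(from) || !self.port_exists(to) {
            return Err("unknown port");
        }
        if self.port_in_use(from) || self.port_in_use(to) {
            return Err("port already leased");
        }
        self.edges.push(LeasedEdge { from, to, lease_id });
        Ok(())
    }

    /// Drops every edge held under `lease_id`; returns how many were removed.
    fn release(&mut self, lease_id: u64) -> usize {
        let before = self.edges.len();
        self.edges.retain(|e| e.lease_id != lease_id);
        before - self.edges.len()
    }
}
```

Tying edges to a lease id means a revocation is a single `release` call that removes the connection from the graph, so the graph never shows a connection that the lease no longer backs.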
Why two layers
It would have been simpler to make grafOS a monolith. The two-layer split exists because:
- Different lifecycles. Hardware vendors implement fabricBIOS in firmware; that firmware updates on hardware refresh cycles (years). grafOS updates per release (weeks). Splitting the spec from the runtime means a grafOS upgrade does not require a firmware reflash.
- Different authorities. fabricBIOS owns the trust bootstrap and the data-plane binding lifecycle. grafOS owns scheduling and program execution. If you find a bug in scheduling you can ship a grafOS patch without re-attesting any firmware.
- Different conformance surfaces. Anyone can implement fabricBIOS and pass the golden-vector test suite — the spec is small and explicit. grafOS is the reference runtime; you don’t reimplement it, you use it.
In practice, this means: when you hold a lease and call read(), you’re issuing a fabricBIOS data-plane op that the cell’s fabricBIOS implementation enforces. When the lease expires and read() returns LeaseRevoked, both layers have done their part: fabricBIOS tore down the binding, and grafOS surfaced the typed event.
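The division of labor on an expired lease can be sketched as two small types, one per layer. This is an illustration under stated assumptions, not the SDK: `Binding`, `SubstrateError`, `Lease`, and `ReadError` are hypothetical, time is a plain tick counter passed in by the caller, and the "data plane" is an in-memory buffer.

```rust
/// Substrate-level outcome: it knows about bindings, not about leases or policy.
#[derive(Debug, PartialEq, Eq)]
enum SubstrateError { BindingGone, OutOfRange }

/// Stand-in for a fabricBIOS-enforced data-plane binding (hypothetical).
struct Binding { data: Vec<u8>, expires_at: u64, torn_down: bool }

impl Binding {
    fn read(&mut self, now: u64, offset: usize, len: usize) -> Result<Vec<u8>, SubstrateError> {
        if now >= self.expires_at {
            self.torn_down = true; // enforcement happens here, below any policy
        }
        if self.torn_down {
            return Err(SubstrateError::BindingGone);
        }
        let end = offset.checked_add(len).ok_or(SubstrateError::OutOfRange)?;
        self.data
            .get(offset..end)
            .map(|s| s.to_vec())
            .ok_or(SubstrateError::OutOfRange)
    }
}

/// Runtime-level typed error, as a program would see it (hypothetical shape).
#[derive(Debug, PartialEq, Eq)]
enum ReadError { LeaseRevoked { lease_id: u64 }, OutOfRange }

/// Stand-in for the grafOS-side lease handle wrapping the binding.
struct Lease { id: u64, binding: Binding }

impl Lease {
    fn read(&mut self, now: u64, offset: usize, len: usize) -> Result<Vec<u8>, ReadError> {
        let id = self.id;
        // Translate the substrate's untyped failure into a lease-aware error.
        self.binding.read(now, offset, len).map_err(|e| match e {
            SubstrateError::BindingGone => ReadError::LeaseRevoked { lease_id: id },
            SubstrateError::OutOfRange => ReadError::OutOfRange,
        })
    }
}
```

In this sketch the teardown is sticky: once the binding is gone, later reads keep failing with `LeaseRevoked` even if the caller passes an earlier tick, because the lower layer, not the caller, is what decides whether access is still valid.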
What you, the developer, see
Most of the time: just grafOS. The SDK and CLI hide the substrate. You’ll only think about fabricBIOS if you’re:
- Writing a non-Rust tasklet (you may need to read the wire spec to debug encoding issues).
- Implementing a new provider cell (you need to host fabricBIOS in your firmware path).
- Auditing the trust model end-to-end (you need to follow signatures from the controller key down through ANNOUNCE and into per-lease tokens).
For everything else, the two-layer split is an implementation detail. You write programs against the grafOS SDK; they run on cells that speak fabricBIOS underneath.
Where to next
- Lease primitive — the lease state machine the two layers cooperate on.
- Trust model — what each layer signs and what each layer verifies.
- The fabricBIOS normative spec lives in /spec/fabricBIOS-design-document. Publication on this site is pending an editorial pass (Phase 212.h).