fabricBIOS Architecture
This document provides a single-page overview of the fabricBIOS system architecture: components, protocols, resource lifecycle, and trust model.
What is fabricBIOS?
fabricBIOS is a minimal firmware specification for disaggregated computing fabrics. It enables nodes to advertise hardware resources, establish trust, exchange capability tokens, and create lease-based bindings to standard data planes (RDMA, NVMe-oF, GPU fabrics, CXL).
fabricBIOS is not an operating system. It exposes resources and enforces access control and lease expiry; policy, scheduling, and placement decisions live above (e.g., in grafOS).
See Premium Dataplane Methodology for the canonical reference on fabricBIOS’s premium dataplane model (RDMA, NVMe-oF, SR-IOV, GPU, CXL).
Component Map
      ┌──────────────────────────────────────────────────┐
      │                Application Layer                 │
      │  grafos-store, grafos-mq, grafos-registry,       │
      │  grafos-dashboard, grafos-cli, grafos-store-cli  │
      └────────────────────┬─────────────────────────────┘
                           │
      ┌────────────────────┴─────────────────────────────┐
      │            grafOS Standard Libraries             │
      │  High: collections, tensor, stream, kv, fs,      │
      │        batch, dsp, jobs                          │
      │  Mid:  rpc, sync, net, observe, cache,           │
      │        securestore, pipeline, profile            │
      │  Foundation: grafos-std, leasekit, fence,        │
      │        locator, testkit                          │
      └────────────────────┬─────────────────────────────┘
                           │
      ┌────────────────────┴─────────────────────────────┐
      │               grafOS Runtime Layer               │
      │  grafos-core (graph model, rewrite plans)        │
      │  grafos-runtime (engine, adapters, scenarios)    │
      │  grafos-sdk (node authoring helpers)             │
      │  grafos-posix (WASI/ELF program execution)       │
      └────────────────────┬─────────────────────────────┘
                           │ QUIC 5701 + FBMU/FBBU
      ┌────────────────────┴─────────────────────────────┐
      │             fabricbios-core (no_std)             │
      │  wire, codec, discovery, tokens, leases,         │
      │  FBMU, FBBU, QUIC adapter, inventory,            │
      │  bindings, cap_token                             │
      └──┬───────────────┬───────────────┬───────────────┘
         │               │               │
┌────────┴──────┐ ┌──────┴───────┐  ┌────┴──────────────┐
│ fabricbiosd   │ │ platform-    │  │ platform-rpi-     │
│ (Linux        │ │ linux        │  │ baremetal         │
│ daemon)       │ │              │  │ (Pi5 firmware)    │
└───────────────┘ └──────────────┘  └───────────────────┘
Crate Ecosystem (~45 crates)
fabricBIOS Protocol and Platform
| Crate | Role |
|---|---|
| fabricbios-core | Portable protocol logic (wire, codec, discovery, tokens, leases, FBMU/FBBU, QUIC adapter, inventory). Supports no_std+alloc for bare-metal targets. |
| fabricbios-platform-traits | no_std trait abstractions (Clock, Entropy, UdpSocket, TcpStream, KeyValueStore, Logger). |
| fabricbios-platform-linux | Linux/std implementation: persistence, time, QUIC/UDP data-plane helpers, resource auto-detection. |
| fabricbios-platform-rpi-baremetal | Pi5 bare-metal: RP1 GEM Ethernet, BCM2712 PCIe, NVMe, DTB parsing, QUIC server, DMA, cache, heap, serial. |
| fabricbios-platform-x86-baremetal | x86_64 bare-metal: serial, PCI, virtio, NIC (e1000), entropy, identity, storage. |
| fabricbiosd | Linux daemon: node, relay, solicit, control-server, control-client, simulate subcommands. QUIC server, UDP discovery, resource auto-detection. |
| fabricbios-pi5-bringup | Pi5 bare-metal entry point. Boots via TFTP/netboot, runs QUIC server + FBMU/FBBU data planes. |
| fabricbios-qemu-virt | QEMU aarch64 virt-machine bare-metal target. |
| fabricbios-quic-interop | QUIC interop test client: client (control ops), fbmu (memory data-plane), fbbu (block data-plane), gen-client-cert (mTLS). |
| fabricbios-quic-crypto | QUIC packet protection (header protection, packet number encryption) for no_std. |
| fabricbios-sim | Deterministic network simulator (SimNet, SimClock, SimEntropy). |
| fabricbios-harness | Integration test harness: simulated nodes, relays, controllers. Tests churn, replay, partitions, 100-node scale. |
grafOS Runtime
| Crate | Role |
|---|---|
| grafos-core | Graph data model: Node, Port, Edge, Capability, LeaseRef, Binding, RewritePlan. |
| grafos-runtime | Runtime engine: graph store, rewrite engine, event queue, adapters (sim/live/QUIC), Pi5 demo scenarios, mixed-fleet orchestration, dashboard demo. |
| grafos-sdk | Re-exports from grafos-core + grafos-runtime. Helper functions for graph-native nodes. |
| grafos-posix | POSIX-ish layer: ELF64 loader, WASI runtime, aarch64 initial stack builder. |
| grafos-posix-programs | WASI test programs (smoke tests, filesystem walker, HTTP echo). |
grafOS Standard Libraries — Foundation
| Crate | Role |
|---|---|
| grafos-std | Typed access to fabric memory, block, GPU, and CPU resources; resource RAII; builder APIs. |
| grafos-leasekit | Lease renewal and TTL budgeting (poll-driven, no_std compatible). |
| grafos-fence | Typed epoch/fencing helpers for stale-write rejection and leader fencing. |
| grafos-locator | Typed locators and rendezvous/handoff records for fabric resource discovery. |
| grafos-testkit | Testing utilities and harness helpers for grafOS libraries. |
grafOS Standard Libraries — Mid-Level
| Crate | Role |
|---|---|
| grafos-collections | Distributed data structures (FabricVec, FabricHashMap, FabricQueue) backed by leased memory. |
| grafos-sync | Distributed synchronization primitives (FabricMutex, FabricBarrier, FabricWatch) with lease-backed timeouts. |
| grafos-net | Network-aware programming (FabricSocket, FabricListener, bandwidth-aware routing). |
| grafos-rpc | RPC framework where the hot path is lease-backed shared memory instead of TCP. |
| grafos-observe | Fabric observability: metrics, events, distributed tracing, structured logging, OpenTelemetry (OTLP) export, Prometheus format. |
| grafos-observe-macros | Proc macros for grafos-observe (#[grafos::instrument] etc.). |
| grafos-cache | Lease-backed caching with tiered eviction. |
| grafos-securestore | Encrypted storage over fabric resources. |
| grafos-pipeline | Multi-stage data processing pipelines across nodes. |
| grafos-profile | Program-level resource profiler: flame graphs, lease timelines, data-flow diagrams, waste reports. |
grafOS Standard Libraries — High-Level
| Crate | Role |
|---|---|
| grafos-tensor | Tensor/ndarray operations on disaggregated memory and GPU. |
| grafos-stream | Stream processing / dataflow with cross-node pipeline stages. |
| grafos-kv | Key-value store abstraction over leased memory with block-storage spillover. |
| grafos-fs | Distributed filesystem abstraction over leased block storage. |
| grafos-batch | Batch job / task graph executor with automatic retry and cleanup. |
| grafos-dsp | Signal processing pipelines (FFT, FIR/IIR, mixer, resample) with deterministic latency from lease-based reservation. |
| grafos-jobs | Idempotent burst compute and retry scaffolding. |
Application Infrastructure
| Crate | Role |
|---|---|
| grafos-store | Universal object storage convention: MemObjectStore, BlockObjectStore, TieredObjectStore with CRC32 checksumming. |
| grafos-store-cli | CLI tool for grafOS fabric object stores (grafos-store-cli binary). |
| grafos-mq | Lease-based message queue: partitioned topics with ring-buffer storage, consumer groups, dead-letter routing. |
| grafos-registry | Fabric-wide service registry: register, discover, and watch services backed by leased fabric memory. |
| grafos-dashboard | Real-time monitoring dashboard: topology map, utilization heatmap, lease churn, contention, alerts. |
| grafos-cli | Unified operational CLI (grafos binary): inspect nodes, manage leases, object storage, profiling, health checks. |
| grafos-rpc-macros | Proc macros for grafos-rpc service definitions. |
Protocol Stack
┌─────────────────────────────────────────────────────┐
│                  Application Layer                  │
│  Control ops: PING, GET_IDENTITY, GET_INVENTORY,    │
│  LEASE_ALLOC, LEASE_FREE, LEASE_RENEW, LEASE_QUERY, │
│  CAP_REQUEST, CAP_REFRESH, CAP_REVOKE               │
├─────────────────────────────────────────────────────┤
│                   Transport Layer                   │
│  QUIC / TLS 1.3 (port 5701) -- control plane        │
│  UDP (port 5700) -- discovery (ANNOUNCE/SOLICIT)    │
│  UDP (port 5702) -- FBMU memory data plane          │
│  UDP (port 5703) -- FBBU block data plane           │
├─────────────────────────────────────────────────────┤
│                   Security Layer                    │
│  Ed25519 signatures (discovery, tokens)             │
│  TLS 1.3 mTLS (QUIC control)                        │
│  HMAC-SHA256 (FBMU/FBBU per-lease dp_key)           │
│  HMAC-SHA256 capability tokens (cap-tokens feature) │
├─────────────────────────────────────────────────────┤
│                     Wire Format                     │
│  Big-endian, TLV extensible, FRAG_V2                │
│  See docs/spec/fabricbios-wire-encoding-v0.md       │
└─────────────────────────────────────────────────────┘
Port Assignments
| Port | Protocol | Purpose |
|---|---|---|
| 5700/UDP | fabricBIOS discovery | ANNOUNCE, SOLICIT, WITHDRAW |
| 5701/QUIC | fabricBIOS control | Lease management, capability tokens, identity |
| 5702/UDP | FBMU | Memory data-plane (read/write with HMAC auth) |
| 5703/UDP | FBBU | Block data-plane (read_block/write_block with HMAC auth) |
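The port table can be captured as a small lookup type. The sketch below is illustrative only: the `Endpoint` and `Transport` names are hypothetical and are not taken from fabricbios-core; only the port numbers and transports come from the table.

```rust
/// Transport carrying a fabricBIOS endpoint.
/// QUIC itself runs over UDP; it is kept separate here to mirror the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transport {
    Udp,
    Quic,
}

/// The four well-known endpoints from the port table (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endpoint {
    Discovery,
    Control,
    Fbmu,
    Fbbu,
}

impl Endpoint {
    pub const ALL: [Endpoint; 4] = [
        Endpoint::Discovery,
        Endpoint::Control,
        Endpoint::Fbmu,
        Endpoint::Fbbu,
    ];

    /// Port number as listed in the table.
    pub const fn port(self) -> u16 {
        match self {
            Endpoint::Discovery => 5700,
            Endpoint::Control => 5701,
            Endpoint::Fbmu => 5702,
            Endpoint::Fbbu => 5703,
        }
    }

    /// Only the control plane uses QUIC; the rest are plain UDP.
    pub const fn transport(self) -> Transport {
        match self {
            Endpoint::Control => Transport::Quic,
            _ => Transport::Udp,
        }
    }

    /// Reverse lookup: which endpoint, if any, owns this port?
    pub fn from_port(port: u16) -> Option<Endpoint> {
        Endpoint::ALL.iter().copied().find(|e| e.port() == port)
    }
}
```

Keeping the mapping in one place means firewall rules, tests, and log decoding can all derive from the same source.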
Data Plane Landscape
fabricBIOS supports multiple data planes, selected by resource type and deployment context:
| Data Plane | Protocol | Resource | Status |
|---|---|---|---|
| FBMU | UDP 5702 | Memory | Production (Pi5 bare-metal + fabricbiosd) |
| FBBU | UDP 5703 | Block storage | Production (Pi5 NVMe HAT + SD + fabricbiosd) |
| RDMA | soft-RoCE (RXE) | High-perf memory | Dev/CI (Linux soft-RoCE) |
| NVMe-oF | nvmet loop | Premium block binding | Dev/CI (Linux loop-backed nvmet target lifecycle; steady-state remote I/O unproven) |
| macvlan network | macvlan / SR-IOV VF | Network interfaces | Linux fabricbiosd |
| QUIC streams | QUIC bidi | Control + data | All targets (primary transport) |
grafOS applications use these data planes transparently through the library stack. grafos-std provides typed lease handles; higher-level libraries (collections, tensor, store) map operations to the appropriate data plane automatically.
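As a rough illustration of what "map operations to the appropriate data plane" could look like, the sketch below picks a data plane from the resource type and the set of planes a node offers. The type names, the `select` function, and the premium-first ordering are assumptions made for this example; the real libraries may choose differently.

```rust
/// Resource types as named in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResourceType {
    Mem,
    Block,
    Net,
    Cpu,
    Gpu,
}

/// Data planes from the table above (QUIC streams omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataPlane {
    Fbmu,
    Fbbu,
    Rdma,
    NvmeOf,
    Macvlan,
}

/// Candidate planes per resource type.
/// Assumption: premium planes are preferred when a node offers them.
pub fn candidates(resource: ResourceType) -> &'static [DataPlane] {
    match resource {
        ResourceType::Mem => &[DataPlane::Rdma, DataPlane::Fbmu],
        ResourceType::Block => &[DataPlane::NvmeOf, DataPlane::Fbbu],
        ResourceType::Net => &[DataPlane::Macvlan],
        // The data-plane table lists no dedicated plane for CPU or GPU.
        ResourceType::Cpu | ResourceType::Gpu => &[],
    }
}

/// Pick the first candidate plane that the node actually advertises.
pub fn select(resource: ResourceType, available: &[DataPlane]) -> Option<DataPlane> {
    candidates(resource)
        .iter()
        .copied()
        .find(|plane| available.contains(plane))
}
```

With this shape, a Pi5 node that offers only FBMU and FBBU resolves a memory request to FBMU, while a Linux node with soft-RoCE resolves the same request to RDMA.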
Mixed-Fleet Orchestration
grafOS supports heterogeneous fabrics where bare-metal Pi5 nodes and Linux fabricbiosd nodes participate in a single resource graph:
              grafOS runtime
                    │
        ┌───────────┴───────────┐
        │                       │
Pi5 bare-metal (QUIC)     fabricbiosd (QUIC)
 - FBMU memory             - FBMU memory
 - FBBU block (NVMe/SD)    - FBBU block
 - 2-4 GB DRAM             - macvlan network
 - no OS, direct HW        - RDMA (soft-RoCE)
                           - NVMe-oF
                           - auto-detected resources

The mixed-fleet scenario exercises this: discovering nodes of both types, allocating resources across them, and performing data-plane I/O regardless of the underlying platform. Resource types (MEM, BLOCK, NET, CPU, GPU) are uniform across node types; only the available data planes differ.
Observability Stack
┌────────────────────────────────────────────┐
│              grafos-dashboard              │
│ topology map, heatmap, lease churn, alerts │
├────────────────────────────────────────────┤
│               grafos-profile               │
│ flame graphs, lease timelines, waste       │
├────────────────────────────────────────────┤
│               grafos-observe               │
│ metrics, events, tracing, OTLP, Prometheus │
├────────────────────────────────────────────┤
│           grafos-observe-macros            │
│ #[grafos::instrument] proc macro           │
└────────────────────────────────────────────┘

Every lease acquisition, data-plane operation, and rewrite plan execution is an observable event. The stack exports to standard formats (OpenTelemetry OTLP, Prometheus) and provides grafOS-specific views (resource flame graphs, lease timelines, data-flow diagrams). The dashboard provides real-time visualization with QUIC polling against live nodes.
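To make the "observable event to Prometheus" path concrete, here is a minimal sketch that counts the three event kinds named above and renders them in the Prometheus text exposition format. It does not use grafos-observe; `EventKind`, `EventCounters`, and the metric name `grafos_events_total` are hypothetical names chosen for this example.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// The three event kinds named in the text (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum EventKind {
    LeaseAcquired,
    DataPlaneOp,
    RewritePlanExecuted,
}

impl EventKind {
    fn label(self) -> &'static str {
        match self {
            EventKind::LeaseAcquired => "lease_acquired",
            EventKind::DataPlaneOp => "data_plane_op",
            EventKind::RewritePlanExecuted => "rewrite_plan_executed",
        }
    }
}

/// Monotonic per-kind counters; BTreeMap keeps the output order stable.
#[derive(Default)]
pub struct EventCounters {
    counts: BTreeMap<EventKind, u64>,
}

impl EventCounters {
    pub fn record(&mut self, kind: EventKind) {
        *self.counts.entry(kind).or_insert(0) += 1;
    }

    /// Render all counters in the Prometheus text exposition format.
    pub fn render_prometheus(&self) -> String {
        let mut out = String::from("# TYPE grafos_events_total counter\n");
        for (kind, count) in &self.counts {
            // Writing to a String cannot fail, so the result is ignored.
            let _ = writeln!(out, "grafos_events_total{{kind=\"{}\"}} {}", kind.label(), count);
        }
        out
    }
}
```

A scrape endpoint would simply return the rendered string; an OTLP exporter could read the same counters.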
Resource Lifecycle
A resource passes through the following stages:
 Advertise          Discover           Lease              Bind
┌──────────┐      ┌──────────┐      ┌──────────┐      ┌──────────┐
│ Node     │─────>│ Relay    │─────>│ Client   │─────>│ Data     │
│ ANNOUNCE │      │ answers  │      │ ALLOC    │      │ Plane    │
│ (signed) │      │ SOLICIT  │      │ (QUIC)   │      │ I/O      │
└──────────┘      └──────────┘      └──────────┘      └──────────┘
                                                            │
                                                            v
                                                      ┌──────────┐      ┌──────────┐
                                                      │ Expire   │─────>│ Fence    │
                                                      │ or       │      │ (if      │
                                                      │ Revoke   │      │ teardown │
                                                      └──────────┘      │ fails)   │
                                                                        └──────────┘

- Advertise: The node sends a signed ANNOUNCE to the relay periodically (30s default). The message includes the resource inventory (MEM, BLOCK, CPU, GPU, NET), locality, and health flags.
- Discover: The client sends SOLICIT to the relay. The relay responds with aggregated ANNOUNCE payloads, optionally filtered by resource type, node ID, or locality.
- Lease: The client connects to the node via QUIC (port 5701) and issues LEASE_ALLOC (or another resource-specific op) to create a time-bounded lease. The response includes binding TLVs (lease_id, dp_key, endpoint, limits).
- Bind: The client uses the binding credentials to perform data-plane I/O. The data plane depends on the resource: FBMU for memory (UDP 5702), FBBU for block storage (UDP 5703), RDMA for high-performance memory, and NVMe-oF for premium block bindings where target/session support is available.
- Use: The client reads and writes via the data plane. Each request carries the lease_id, a nonce (for replay protection), and an HMAC auth_tag (keyed by dp_key).
- Expire/Revoke: Leases have mandatory expiry. On expiry or explicit FREE, the node tears down data-plane authorization. Subsequent data-plane ops return NO_LEASE.
- Fence: If teardown fails (hardware fault, driver error), the resource enters the FENCED state. No new leases are granted, and the resource is reported as FENCED in discovery until remediated.
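The lease, expiry, and fencing stages above can be read as a small state machine on the node side. The sketch below is one way to model it under simplifying assumptions: one exclusive lease per resource, time as plain seconds, and a `teardown_ok` flag standing in for the real data-plane teardown. All type and method names are hypothetical.

```rust
/// Node-side view of one resource (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResourceState {
    Free,
    Leased { lease_id: u64, expires_at: u64 },
    Fenced,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    Busy,    // assumption: at most one lease per resource in this sketch
    Fenced,  // no new leases until remediated
    NoLease, // data-plane op without a valid, unexpired lease
}

pub struct Resource {
    state: ResourceState,
    next_lease_id: u64,
}

impl Resource {
    pub fn new() -> Self {
        Resource { state: ResourceState::Free, next_lease_id: 1 }
    }

    pub fn state(&self) -> ResourceState {
        self.state
    }

    /// LEASE_ALLOC: every lease gets an explicit expiry.
    pub fn lease_alloc(&mut self, now: u64, ttl_secs: u64) -> Result<u64, Error> {
        match self.state {
            ResourceState::Free => {
                let lease_id = self.next_lease_id;
                self.next_lease_id += 1;
                self.state = ResourceState::Leased { lease_id, expires_at: now + ttl_secs };
                Ok(lease_id)
            }
            ResourceState::Leased { .. } => Err(Error::Busy),
            ResourceState::Fenced => Err(Error::Fenced),
        }
    }

    /// Explicit FREE or detected expiry: tear down, or fence if teardown failed.
    pub fn end_lease(&mut self, teardown_ok: bool) {
        if let ResourceState::Leased { .. } = self.state {
            self.state = if teardown_ok { ResourceState::Free } else { ResourceState::Fenced };
        }
    }

    /// Periodic expiry check driven by the caller's clock.
    pub fn tick(&mut self, now: u64, teardown_ok: bool) {
        if let ResourceState::Leased { expires_at, .. } = self.state {
            if now >= expires_at {
                self.end_lease(teardown_ok);
            }
        }
    }

    /// Gate for every data-plane request.
    pub fn check_io(&self, lease_id: u64, now: u64) -> Result<(), Error> {
        match self.state {
            ResourceState::Leased { lease_id: id, expires_at }
                if id == lease_id && now < expires_at => Ok(()),
            _ => Err(Error::NoLease),
        }
    }
}
```

Note that `check_io` compares against the deadline itself rather than relying on `tick` having run, so an expired lease is rejected even if the expiry sweep is late.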
Trust Model
fabricBIOS uses a layered trust model that progresses from initial contact to full mutual authentication:
Trust Bootstrap
┌───────────────┐
│     TOFU      │  First contact: pin server cert hash
│   (default)   │  Subsequent: verify pinned hash
└───────┬───────┘
        │ upgrade
        v
┌───────────────┐
│     mTLS      │  Client and server present certificates
│  (fbmu-auth)  │  Node verifies client cert (TOFU pin)
└───────┬───────┘
        │ add
        v
┌───────────────┐
│  Capability   │  HMAC-SHA256 tokens for resource access
│    Tokens     │  Audience-bound, short TTL, attenuable
│ (cap-tokens)  │
└───────────────┘

- TOFU (Trust On First Use): Default for QUIC connections. Client pins server certificate hash on first connection; subsequent connections verify the pin. Simple but effective for small fabrics.
- mTLS (Mutual TLS): Enabled via fbmu-auth feature flag. Both client and server present TLS certificates. Provides bidirectional authentication for the control plane.
- Capability Tokens: Enabled via cap-tokens feature flag. HMAC-SHA256 tokens minted by LEASE_ALLOC (via CAP_REQUEST). Tokens are audience-bound, have short TTL (default max 300s), and can be attenuated. Validated on every data-plane operation.
- FBMU/FBBU Auth Tags: Per-lease dp_key generated at allocation time. Every data-plane request carries an HMAC-SHA256 auth tag computed over the request header. Provides per-operation authentication without TLS overhead on the data plane.
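The TOFU step is the simplest layer to illustrate. The following sketch shows a pin store that records a certificate hash on first contact and verifies it afterwards. `PinStore` and `PinResult` are hypothetical names, and the 32-byte digest size is an assumption (the text does not say which hash is used).

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Outcome of a TOFU check for one connection.
#[derive(Debug, PartialEq, Eq)]
pub enum PinResult {
    /// First contact: the hash was recorded.
    Pinned,
    /// The presented hash matches the stored pin.
    Matched,
    /// The presented hash differs: the connection should be refused.
    Mismatch,
}

/// In-memory pin store keyed by node ID (hypothetical type).
/// Assumption: certificate hashes are 32-byte digests.
#[derive(Default)]
pub struct PinStore {
    pins: HashMap<String, [u8; 32]>,
}

impl PinStore {
    pub fn check(&mut self, node_id: &str, cert_hash: [u8; 32]) -> PinResult {
        match self.pins.entry(node_id.to_string()) {
            Entry::Vacant(slot) => {
                slot.insert(cert_hash);
                PinResult::Pinned
            }
            Entry::Occupied(slot) => {
                // A mismatch never overwrites the existing pin.
                if *slot.get() == cert_hash {
                    PinResult::Matched
                } else {
                    PinResult::Mismatch
                }
            }
        }
    }
}
```

A real deployment would persist the pins (for example through a key-value store) so that they survive restarts; otherwise every restart is another "first use".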
Discovery Trust
- ANNOUNCE and WITHDRAW messages are Ed25519-signed.
- Relays can operate in pinned trust bundle mode (verify signatures against known node keys).
- Replay protection via nonce (timestamp or random) + bounded replay cache.
- Rate limiting for unsigned messages (default 10/sec/source).
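A bounded replay cache of the kind mentioned above can be sketched as a set plus a FIFO queue. This is a minimal illustration, not the fabricBIOS implementation: the `ReplayCache` name, the `u64` nonce type, and FIFO eviction are all assumptions.

```rust
use std::collections::{HashSet, VecDeque};

/// Remembers the most recent `capacity` nonces and rejects repeats.
pub struct ReplayCache {
    capacity: usize,
    seen: HashSet<u64>,
    order: VecDeque<u64>, // insertion order, oldest at the front
}

impl ReplayCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "replay cache needs room for at least one nonce");
        ReplayCache {
            capacity,
            seen: HashSet::with_capacity(capacity),
            order: VecDeque::with_capacity(capacity),
        }
    }

    /// Returns true if the nonce is fresh (and records it), false on replay.
    pub fn check_and_insert(&mut self, nonce: u64) -> bool {
        if !self.seen.insert(nonce) {
            return false; // already seen: replay
        }
        self.order.push_back(nonce);
        if self.order.len() > self.capacity {
            // Evict the oldest nonce to keep memory bounded.
            if let Some(oldest) = self.order.pop_front() {
                self.seen.remove(&oldest);
            }
        }
        true
    }
}
```

Because eviction forgets old nonces, a bounded cache alone cannot reject arbitrarily old replays; pairing it with a timestamp window, as the nonce description suggests, closes that gap.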
Deployment Targets
| Target | Transport | Status |
|---|---|---|
| Linux daemon (fabricbiosd) | QUIC 5701 (default), UDP 5700 | Production-ready for dev/test |
| Raspberry Pi 5 bare-metal | QUIC 5701, FBMU 5702, FBBU 5703 | 3-node fleet operational |
| x86_64 bare-metal (QEMU) | QUIC 5701 target default, UDP 5700 | Bringup/validation target (not primary deployment path) |
| grafOS runtime | QUIC client to above targets | Sim + live modes |
Key Design Decisions
- QUIC-first: QUIC 5701 is the normative default control transport. All TCP transport code has been removed.
- Fail-closed: Unknown protocol versions, unknown flag bits, missing signatures, and ambiguous identities are all rejected.
- Lease-mandatory: All data-plane bindings require leases with explicit lifetimes. No permanent access grants.
- Signature before decompress: Signature verification always precedes decompression or deep parsing, preventing DoS via malformed compressed payloads.
- Minimal TCB: Core protocol logic is no_std compatible. Platform-specific code is isolated in platform crates.
- Mixed-fleet native: The same grafOS runtime and library stack operates uniformly across bare-metal and Linux nodes in a single graph.
- Layered libraries: Applications build on progressively higher-level grafOS libraries (std -> collections -> store) rather than raw wire protocol, keeping application code portable across data planes.
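The fail-closed rule can be illustrated with a header check that rejects anything it does not recognize. The 4-byte layout below (version, flags, big-endian payload length), the flag values, and the version number are invented for this sketch; the actual layout is defined in docs/spec/fabricbios-wire-encoding-v0.md.

```rust
// Hypothetical header: [version: u8][flags: u8][payload_len: u16 big-endian]
pub const SUPPORTED_VERSION: u8 = 1; // hypothetical value
pub const FLAG_SIGNED: u8 = 0b0000_0001; // hypothetical bit
pub const FLAG_COMPRESSED: u8 = 0b0000_0010; // hypothetical bit
const KNOWN_FLAGS: u8 = FLAG_SIGNED | FLAG_COMPRESSED;
const HEADER_LEN: usize = 4;

#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    Truncated,
    UnknownVersion(u8),
    UnknownFlags(u8),
    MissingSignature,
    LengthMismatch,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Header {
    pub flags: u8,
    pub payload_len: u16,
}

/// Validate the header before touching the payload. Anything unknown is rejected.
pub fn parse_header(frame: &[u8]) -> Result<Header, Reject> {
    if frame.len() < HEADER_LEN {
        return Err(Reject::Truncated);
    }
    let (version, flags) = (frame[0], frame[1]);
    if version != SUPPORTED_VERSION {
        return Err(Reject::UnknownVersion(version));
    }
    let unknown = flags & !KNOWN_FLAGS;
    if unknown != 0 {
        return Err(Reject::UnknownFlags(unknown));
    }
    if flags & FLAG_SIGNED == 0 {
        return Err(Reject::MissingSignature);
    }
    let payload_len = u16::from_be_bytes([frame[2], frame[3]]);
    if frame.len() - HEADER_LEN != payload_len as usize {
        return Err(Reject::LengthMismatch);
    }
    // Signature verification would happen next, before any decompression.
    Ok(Header { flags, payload_len })
}
```

Returning a specific `Reject` reason keeps the rejection observable without ever falling back to a permissive default.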
Related Documents
- docs/spec/fabricBIOS-design-document.md: Normative specification
- docs/spec/fabricbios-wire-encoding-v0.md: Wire format reference
- docs/grafos-design-document.md: grafOS resource graph design
- docs/grafos/README.md: grafOS documentation index
- docs/grafos-libraries-overview.md: Library stack layering and guides
- docs/runbooks/getting-started.md: Quick start guide
- docs/spec/resource-types.md: Resource type reference
- docs/runbooks/pi5-bringup-summary.md: Pi5 bare-metal status