fabricBIOS Specification v0.1

Abstract

fabricBIOS is a minimal firmware specification for disaggregated computing fabrics. It enables nodes to advertise hardware resources, establish trust, exchange capability tokens, and create lease-based bindings to standard data planes (RDMA, NVMe-oF, vendor GPU fabrics, and future fabrics such as CXL).

fabricBIOS is not an operating system. It exposes resources and enforces access control and lease expiry; policy, scheduling, and placement decisions live above.

The name reflects its role: a “BIOS” for the fabric—the layer below the OS that exports hardware resources over the network.

See Premium Dataplane Methodology for the canonical reference on fabricBIOS’s premium dataplane model (RDMA, NVMe-oF, SR-IOV, GPU, CXL).


1. Design Principles

  1. Minimal TCB: Small enough to verify, audit, and run in firmware/DPU, or behind a small trusted proxy.
  2. Mechanism, not policy: Exposes capabilities and enforces safety-critical limits only. Scheduling, isolation, paging, fairness, and optimization are OS concerns.
  3. Leverage existing data planes: Discover and authorize RDMA/NVMe-oF; do not reimplement bulk transfer.
  4. Node-addressed, resource-identified: Nodes have IPv6 addresses; resources have stable UUIDs.
  5. Strong security: Cryptographic signatures, audience binding, anti-replay, and lease-based revocation.
  6. Lease-oriented: Data-plane bindings are leases with explicit lifetime, renewal, and mandatory teardown.
  7. Interop-first: Unambiguous wire format, byte order, signature rules, fragmentation behavior, and replay handling.
  8. Profiles for feasibility: Multiple secure transport profiles accommodate both DPU-class devices and minimal endpoints.

2. Scope

2.1 Responsibilities

| Responsibility | Description |
|---|---|
| Discovery | Advertise node identity, locality, and resource inventory |
| Trust bootstrap | Identity, attestation hooks, fabric enrollment |
| Capability exchange | Mint, attenuate, validate, revoke capability tokens |
| Lease management | Create, renew, revoke leases; enforce expiry |
| Data plane binding | Provide endpoints/credentials for RDMA/NVMe-oF/vendor protocols |
| Safety enforcement | Rate-limit unauthenticated traffic; enforce lease teardown; reject invalid tokens |

2.2 Non-responsibilities

| Not fabricBIOS | Why |
|---|---|
| CPU execution / scheduling | Policy; requires isolation, preemption, accounting |
| Memory paging | Policy; OS decides faulting, caching, placement |
| Filesystem semantics | Policy; fabricBIOS exposes block endpoints |
| QoS fairness | Policy; OS/fabric decides allocation; fabricBIOS may enforce safety limits only |
| Process isolation | OS/hypervisor concern |
| Global optimization | OS planners/optimizers |

CPU resources: fabricBIOS advertises CPU topology/capacity so a higher-layer OS can schedule compute across nodes, but fabricBIOS does not execute workloads.


3. Addressing Model

3.1 Node Addressing

Each node has one primary IPv6 address for the fabricBIOS control endpoint:

fd00:FABRIC:SITE:NODE::1/64

Conventions:

  • Operator allocates a /48 ULA prefix (e.g., fd00:abcd::/48)
  • Sites/racks get /56 or /60
  • Nodes get /64; ::1 is the fabricBIOS control endpoint

Example:

fd00:abcd:0001:0001::1 ← Site 1, Node 1
fd00:abcd:0001:0002::1 ← Site 1, Node 2
fd00:abcd:0002:0001::1 ← Site 2, Node 1
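
The example addresses above can be derived mechanically. The following is a minimal sketch, assuming FABRIC, SITE, and NODE each occupy one 16-bit group exactly as in the example; control_address is a hypothetical helper name, not something defined by this specification.

```python
import ipaddress

def control_address(fabric, site, node):
    """Build fd00:FABRIC:SITE:NODE::1 from three 16-bit values (assumed layout)."""
    for name, value in (("fabric", fabric), ("site", site), ("node", node)):
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"{name} must fit in 16 bits")
    # Upper 64 bits: fd00, FABRIC, SITE, NODE. Lower 64 bits: interface ID ::1.
    high = (0xFD00 << 48) | (fabric << 32) | (site << 16) | node
    return ipaddress.IPv6Address((high << 64) | 1)

print(control_address(0xABCD, 1, 2))  # Site 1, Node 2 -> fd00:abcd:1:2::1
```

The printed form is the compressed spelling of fd00:abcd:0001:0002::1; both denote the same address.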

3.2 Resource Identification

Resources are identified by 128-bit UUIDs carried in protocol payloads.

  • UUIDs are transmitted as 16 bytes in canonical RFC 4122 byte order.
  • A structured UUID layout is recommended but not required:
Resource UUID (128 bits):
┌────────────────┬────────────────┬────────────────┐
│ Type (16)      │ Node (48)      │ Instance (64)  │
└────────────────┴────────────────┴────────────────┘
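
As an illustration of the recommended layout, the sketch below packs and unpacks the three fields. It assumes Type occupies the most significant 16 bits (reading the diagram left to right); make_resource_uuid and split_resource_uuid are hypothetical helper names. The sketch does not set RFC 4122 version or variant bits.

```python
import uuid

def make_resource_uuid(rtype, node, instance):
    """Pack Type (16 bits), Node (48 bits), Instance (64 bits) into a 128-bit UUID."""
    if not (0 <= rtype < 1 << 16 and 0 <= node < 1 << 48 and 0 <= instance < 1 << 64):
        raise ValueError("field out of range")
    return uuid.UUID(int=(rtype << 112) | (node << 64) | instance)

def split_resource_uuid(u):
    """Inverse of make_resource_uuid: return (type, node, instance)."""
    v = u.int
    return v >> 112, (v >> 64) & ((1 << 48) - 1), v & ((1 << 64) - 1)

rid = make_resource_uuid(0x0004, 0x0000ABCD0001, 7)  # NVME resource, instance 7
wire = rid.bytes  # the 16 bytes carried on the wire, most significant byte first
```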

4. Wire Format

4.1 Common Header

All fabricBIOS messages share a common header.

All multi-byte integer fields are big-endian (network byte order).

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Version    |   Msg Type    |             Flags             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Payload Length                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                        Request ID (64)                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                          Nonce (64)                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Payload (variable)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                   Signature (Ed25519, 64B)                    |
|                       (if SIGNED flag)                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field definitions:

| Field | Size | Description |
|---|---|---|
| Version | 8 bits | Protocol version (0x01) |
| Msg Type | 8 bits | Message type (§4.2) |
| Flags | 16 bits | Flags (§4.3) |
| Payload Length | 32 bits | Payload bytes only (excludes signature) |
| Request ID | 64 bits | Correlation ID; responses echo this |
| Nonce | 64 bits | Anti-replay; semantics depend on flags (§4.5) |
| Frag Offset | 32 bits | If FRAG_V2: byte offset within full payload |
| Frag Total Len | 32 bits | If FRAG_V2: total unfragmented payload size |
| Payload | var | Type-specific payload |
| Signature | 64 bytes | Ed25519 signature if SIGNED flag set |

If FRAG_V2 is unset, Frag Offset and Frag Total Len are omitted from the header.
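
The header layout can be exercised with a short sketch. It assumes that flag bit 0 is the least significant bit of the 16-bit Flags field and that the fragment fields directly follow Nonce, as the field table orders them; pack_header and unpack_header are hypothetical helper names.

```python
import struct

BASE = struct.Struct(">BBHIQQ")  # version, msg_type, flags, payload_len, request_id, nonce
FRAG = struct.Struct(">II")      # frag_offset, frag_total_len (present only with FRAG_V2)
FLAG_SIGNED = 1 << 0             # assumption: bit 0 is the least significant bit
FLAG_FRAG_V2 = 1 << 5

def pack_header(msg_type, flags, payload_len, request_id, nonce, frag=None, version=0x01):
    """Serialize the common header in network byte order."""
    if bool(flags & FLAG_FRAG_V2) != (frag is not None):
        raise ValueError("fragment metadata must be present exactly when FRAG_V2 is set")
    head = BASE.pack(version, msg_type, flags, payload_len, request_id, nonce)
    return head + (FRAG.pack(*frag) if frag is not None else b"")

def unpack_header(buf):
    """Return (fields, header_length); raise ValueError on a short buffer."""
    if len(buf) < BASE.size:
        raise ValueError("short header")
    version, msg_type, flags, payload_len, request_id, nonce = BASE.unpack_from(buf, 0)
    offset, frag = BASE.size, None
    if flags & FLAG_FRAG_V2:
        if len(buf) < offset + FRAG.size:
            raise ValueError("short fragment metadata")
        frag = FRAG.unpack_from(buf, offset)
        offset += FRAG.size
    fields = dict(version=version, msg_type=msg_type, flags=flags,
                  payload_len=payload_len, request_id=request_id, nonce=nonce, frag=frag)
    return fields, offset
```

Under these assumptions the fixed header is 24 bytes, or 32 bytes when FRAG_V2 is set.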

4.1.1 Canonical variable-length encoding (v0)

  • bytes := u32 len + len bytes (reject over-limit).
  • list := u16 count + repeated items (reject over-limit).
  • tlv := u8 type + u16 len + len bytes; unknown TLVs MUST be skippable.
  • Optional fields use a u8 present flag (0=absent, 1=present) followed by the field bytes.
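
The four encodings above can be sketched as follows. MAX_BYTES and MAX_ITEMS are illustrative limits chosen for this example, not values from this specification, and all function names are hypothetical.

```python
import struct

MAX_BYTES = 64 * 1024  # illustrative over-limit threshold (assumption)
MAX_ITEMS = 1024       # illustrative over-limit threshold (assumption)

def enc_bytes(data):
    """bytes := u32 len + len bytes."""
    if len(data) > MAX_BYTES:
        raise ValueError("bytes field over limit")
    return struct.pack(">I", len(data)) + data

def enc_list(encoded_items):
    """list := u16 count + repeated (already encoded) items."""
    if len(encoded_items) > MAX_ITEMS:
        raise ValueError("list over limit")
    return struct.pack(">H", len(encoded_items)) + b"".join(encoded_items)

def enc_tlv(tlv_type, value):
    """tlv := u8 type + u16 len + len bytes."""
    return struct.pack(">BH", tlv_type, len(value)) + value

def enc_optional(field):
    """u8 present flag (0=absent, 1=present) followed by the field bytes."""
    return b"\x00" if field is None else b"\x01" + field

def iter_tlvs(buf, known_types):
    """Yield (type, value) for known TLVs and skip unknown ones."""
    pos = 0
    while pos < len(buf):
        if pos + 3 > len(buf):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from(">BH", buf, pos)
        pos += 3
        if pos + length > len(buf):
            raise ValueError("truncated TLV value")
        if tlv_type in known_types:
            yield tlv_type, buf[pos:pos + length]
        pos += length  # unknown TLVs are skipped by their declared length
```

Because every TLV carries its own length, a parser can step over types it does not recognize without understanding their contents.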

4.2 Message Types

| Code | Name | Direction | Signed |
|---|---|---|---|
| 0x01 | ANNOUNCE | Node → Fabric | Required |
| 0x02 | SOLICIT | Client → Fabric/Relay | Optional (recommended in routed fabrics) |
| 0x03 | WITHDRAW | Node → Fabric | Required |
| 0x10 | REQUEST | Client → Node | Required (for any op beyond discovery) |
| 0x11 | RESPONSE | Node → Client | Required |
| 0x20 | REVOKE_BROADCAST | Node → Fabric | Required |

4.3 Flags

| Bit | Name | Meaning |
|---|---|---|
| 0 | SIGNED | Signature trailer present |
| 1 | COMPRESSED | Payload is zstd-compressed |
| 2 | CONTINUED | Fragmented message: more fragments follow |
| 3 | FINAL | Fragmented message: last fragment |
| 4 | NONCE_IS_TIMESTAMP | Nonce is UNIX seconds (u64). If unset, Nonce is random u64 |
| 5 | FRAG_V2 | Fragment metadata present (offset + total length) |
| 6–15 | Reserved | MUST be zero |

4.4 Signature and Compression Rules

  • If COMPRESSED is set, payload bytes are compressed before signing.
  • Receivers MUST verify the signature before attempting decompression or deep parsing.
  • Signature is computed over the exact on-wire bytes of header fields + payload (excluding the signature bytes).
  • If fragmented, each fragment is signed independently over that fragment’s header+payload.
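
The ordering these rules require (verify first, decompress second) can be sketched as below. The verify and decompress callables are injected so the example stays self-contained; the demo uses SHA-512 and zlib purely as stand-ins with the right shapes, which are NOT Ed25519 or zstd. open_frame and the flag constants are hypothetical names, and bit 0 is assumed to be the least significant bit.

```python
import hashlib
import zlib

SIG_LEN = 64
FLAG_SIGNED = 1 << 0      # assumption: bit 0 is the least significant bit
FLAG_COMPRESSED = 1 << 1

def open_frame(frame, flags, header_len, verify, decompress):
    """Return the plain payload, checking the signature before decompressing."""
    body_end = len(frame)
    if flags & FLAG_SIGNED:
        if len(frame) < header_len + SIG_LEN:
            raise ValueError("frame too short for signature trailer")
        body_end -= SIG_LEN
        # The signature covers the exact on-wire header + payload bytes.
        if not verify(frame[:body_end], frame[body_end:]):
            raise ValueError("bad signature")
    payload = frame[header_len:body_end]
    if flags & FLAG_COMPRESSED:
        payload = decompress(payload)  # reached only after verification succeeded
    return payload

# Demo with stand-ins: a 64-byte digest plays the role of the signature trailer.
fake_sign = lambda msg: hashlib.sha512(msg).digest()
fake_verify = lambda msg, sig: hashlib.sha512(msg).digest() == sig
header = b"H" * 24
body = zlib.compress(b"inventory")  # compress first, then sign the compressed bytes
frame = header + body + fake_sign(header + body)
plain = open_frame(frame, FLAG_SIGNED | FLAG_COMPRESSED, 24, fake_verify, zlib.decompress)
```

A real implementation would pass an Ed25519 verifier and a zstd decompressor in place of the stand-ins.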

4.5 Anti-Replay and Nonce Handling

fabricBIOS supports two nonce modes:

Timestamp mode (NONCE_IS_TIMESTAMP=1):

  • Nonce is UNIX seconds.
  • Receiver MUST reject messages outside an allowed skew window (default ±300s).
  • Receiver SHOULD keep a small replay cache keyed by (sender_id, request_id, nonce) that covers the skew window, so that duplicates are rejected.

Random mode (NONCE_IS_TIMESTAMP=0):

  • Nonce is random u64.
  • Receiver MUST maintain a bounded replay cache per sender, retaining each nonce for at least REPLAY_WINDOW seconds (default 300s) or until the cache reaches MAX_NONCES entries (implementation-defined), after which the oldest entries are evicted first.
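
A minimal sketch of both nonce modes follows. For brevity it uses one shared bounded cache rather than a separate bound per sender, and it retains entries for twice the window so that a timestamp accepted at the early edge of the skew window cannot be replayed at the late edge; both choices are simplifications made for this example. ReplayCache is a hypothetical name.

```python
import time
from collections import OrderedDict

class ReplayCache:
    def __init__(self, window=300, max_entries=4096):
        self.window = window
        self.max_entries = max_entries
        self._seen = OrderedDict()  # key -> time first seen, oldest first

    def check(self, sender_id, request_id, nonce, is_timestamp, now=None):
        """Return True if the message is fresh, False if stale or replayed."""
        now = time.time() if now is None else now
        if is_timestamp and abs(now - nonce) > self.window:
            return False  # outside the allowed skew window
        # Expire old entries; insertion order equals age order.
        while self._seen:
            oldest_key, seen_at = next(iter(self._seen.items()))
            if now - seen_at <= 2 * self.window:
                break
            self._seen.popitem(last=False)
        key = (sender_id, request_id, nonce) if is_timestamp else (sender_id, nonce)
        if key in self._seen:
            return False  # duplicate
        self._seen[key] = now
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # bounded: evict the oldest entry
        return True
```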

4.6 Anti-DoS Requirements

Implementations MUST:

  1. Rate-limit unsigned messages (≤10/sec per source IP by default).
  2. For signed messages, perform cheap header sanity checks (version, flags, payload length bounds), then validate signature before parsing payload.
  3. Reject invalid or replayed nonces (§4.5).
  4. Drop malformed headers without processing payload.
  5. Enforce a maximum UDP payload size for discovery/control; large responses MUST use fragmentation flags (§5.6).
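
Requirement 1 can be met with a per-source token bucket. The sketch below is one possible approach rather than a mandated algorithm; UnsignedRateLimiter and its parameters are hypothetical, and eviction of idle sources is omitted for brevity (the table simply fails closed when full).

```python
import time

class UnsignedRateLimiter:
    def __init__(self, rate=10.0, burst=10.0, max_sources=4096):
        self.rate = rate                # tokens added per second
        self.burst = burst              # bucket capacity
        self.max_sources = max_sources  # bound on tracked source addresses
        self._buckets = {}              # source_ip -> (tokens, last_update)

    def allow(self, source_ip, now=None):
        """Return True if an unsigned message from source_ip may be processed."""
        now = time.monotonic() if now is None else now
        if source_ip not in self._buckets and len(self._buckets) >= self.max_sources:
            return False  # table full: fail closed instead of growing without bound
        tokens, last = self._buckets.get(source_ip, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self._buckets[source_ip] = (tokens, now)
            return False
        self._buckets[source_ip] = (tokens - 1.0, now)
        return True
```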

5. Discovery Protocol

5.1 Transport

Discovery uses UDP port 5700.

5.2 Multicast Groups

fabricBIOS defines well-known IPv6 multicast group IDs (private-by-spec). The group ID encodes ASCII “fbio” (0x66 62 69 6f) plus suffix 0x0001.

  • Link-local: ff02::6662:696f:0001
  • Site-local: ff05::6662:696f:0001
  • Organization-local: ff08::6662:696f:0001
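
The three group addresses follow directly from the group ID. A small sketch, in which discovery_group is a hypothetical helper name:

```python
import ipaddress

# ASCII "fbio" (0x66 62 69 6f) followed by the 16-bit suffix 0x0001.
GROUP_ID = (int.from_bytes(b"fbio", "big") << 16) | 0x0001

def discovery_group(scope):
    """Return ff0<scope>:: with the fabricBIOS group ID in the low 48 bits."""
    if not 0 <= scope <= 0xF:
        raise ValueError("scope is a 4-bit value")
    return ipaddress.IPv6Address(((0xFF00 | scope) << 112) | GROUP_ID)

link_local = discovery_group(0x2)
site_local = discovery_group(0x5)
org_local = discovery_group(0x8)
```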

In routed fabrics, multicast may be unavailable; discovery relays are recommended and often required (§5.5).

5.3 ANNOUNCE

ANNOUNCE is sent on boot, periodically (default 30s), and on resource change. Under the normative default profiles in §8.1-§8.4, ANNOUNCE MUST be signed. The only exception is the explicit trusted-fabric exception profile in §8.5.

ANNOUNCE Payload:

node_id : u128
node_addr : 16B IPv6
fabric_id : u64
sequence : u64 (monotonic per node)
locality : LocalityInfo
attestation_tlv: TLV (optional)
resources : ResourceSummary[]

LocalityInfo:

rack_id : u32
row_id : u32
site_id : u32
geo_hash : u64 (optional)
custom : 32B operator-defined

ResourceSummary:

resource_id : u128
type : u16
flags : u16
capacity : u64
available : u64
descriptors : DescriptorTLV[]
endpoints : EndpointTLV[] (optional; may be omitted and fetched via GET_INVENTORY)

5.4 SOLICIT

SOLICIT queries for resources. Signing SOLICIT is RECOMMENDED outside a single trusted L2 domain.

SOLICIT Payload:

query_type : u8 (0=all, 1=by_type, 2=by_node, 3=by_locality)
filters : Filter[]

Filter:

field : u8
op : u8 (EQ, GT, LT, CONTAINS)
value : 32B

5.4.1 Filter Field Registry (v0)

Filter field IDs are u8 with the following ranges:

  • 0x00 reserved
  • 0x01..=0x3F core registry (this document)
  • 0x40..=0x7F experimental/extension
  • 0x80..=0xFF vendor-specific

Defined fields:

| Field | Name | Value encoding | Ops |
|---|---|---|---|
| 0x01 | RESOURCE_TYPE | value[0..2] = u16 BE resource type code | EQ |
| 0x02 | NODE_ID | value[0..16] = u128 BE | EQ |
| 0x03 | SITE_ID | value[0..4] = u32 BE | EQ, GT, LT |
| 0x04 | ROW_ID | value[0..4] = u32 BE | EQ, GT, LT |
| 0x05 | RACK_ID | value[0..4] = u32 BE | EQ, GT, LT |
| 0x06 | LOCALITY_CUSTOM | value[0..32] opaque | EQ, CONTAINS |
| 0x07 | RESOURCE_FLAGS | value[0..2] = u16 BE bitmask | EQ, CONTAINS |

CONTAINS for LOCALITY_CUSTOM treats the non-zero bytes of the filter value as required matches. RESOURCE_FLAGS uses a bitmask where FENCED=0x0001 and DEGRADED=0x0002; CONTAINS matches only when every bit set in the filter value is also set in the resource's flags.
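
The two CONTAINS rules can be sketched as below. The LOCALITY_CUSTOM version assumes matching is positional (a non-zero filter byte must equal the byte at the same index, and zero bytes act as wildcards), which is one reading of the rule above rather than something this section states outright. Function names are hypothetical.

```python
FENCED = 0x0001
DEGRADED = 0x0002

def contains_custom(filter_value, custom):
    """LOCALITY_CUSTOM: every non-zero filter byte must match at the same position."""
    if len(filter_value) != 32 or len(custom) != 32:
        raise ValueError("both values are 32 bytes")
    return all(f == 0 or f == c for f, c in zip(filter_value, custom))

def contains_flags(filter_value, resource_flags):
    """RESOURCE_FLAGS: all bits of the u16 BE mask in value[0..2] must be set."""
    mask = int.from_bytes(filter_value[0:2], "big")
    return (resource_flags & mask) == mask

fenced_filter = FENCED.to_bytes(2, "big") + bytes(30)  # 32-byte filter value
```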

5.5 Discovery Relay

Multicast is frequently unreliable in routed leaf-spine networks. fabricBIOS supports a unicast discovery relay.

  • Anycast relay address (operator-configured): fd00:FABRIC::ffff:1
  • Nodes periodically unicast ANNOUNCE to the relay.
  • Clients unicast SOLICIT to the relay.
  • Relay responds with RESPONSE messages containing aggregated ANNOUNCE payloads (may be chunked).

In routed fabrics, a discovery relay SHOULD be considered required unless the operator provisions and validates multicast routing reliability.

5.5.1 Relay Discovery Profile (RESPONSE)

The relay discovery profile defines how relays encode inventory responses using MsgType::RESPONSE. This keeps RESPONSE as the message type while specifying a fixed payload layout for interop.

Payload encoding:

count : u16
repeated count times:
announce_payload_bytes : bytes (u32 length + bytes of AnnouncePayload encoding)

Rules:

  • Each list entry MUST be a complete AnnouncePayload encoded per this spec.
  • Relays MAY split inventory across multiple RESPONSE frames to honor MTU guidance (§5.6).
  • Inventory ordering is implementation-defined; clients MUST treat the list as an unordered snapshot and MAY sort by node_id for stable presentation.
  • Relays SHOULD include at most one entry per node_id, choosing the most recently observed ANNOUNCE (or highest sequence when available).
  • Inventory may be truncated to fit relay caps; absence of a node in a RESPONSE does not imply withdrawal or fencing.
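
The payload layout above can be encoded and decoded as follows. This is a sketch only: the AnnouncePayload entries are treated as opaque byte strings, max_entry is an illustrative bound, and both function names are hypothetical.

```python
import struct

def encode_inventory(announce_payloads):
    """count : u16, then for each entry a u32 length followed by the bytes."""
    if len(announce_payloads) > 0xFFFF:
        raise ValueError("too many entries for a u16 count")
    parts = [struct.pack(">H", len(announce_payloads))]
    for payload in announce_payloads:
        parts.append(struct.pack(">I", len(payload)) + payload)
    return b"".join(parts)

def decode_inventory(buf, max_entry=1 << 20):
    """Return the list of encoded AnnouncePayload entries; reject malformed input."""
    if len(buf) < 2:
        raise ValueError("missing count")
    (count,) = struct.unpack_from(">H", buf, 0)
    pos, entries = 2, []
    for _ in range(count):
        if pos + 4 > len(buf):
            raise ValueError("truncated entry length")
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        if length > max_entry or pos + length > len(buf):
            raise ValueError("entry over limit or truncated")
        entries.append(buf[pos:pos + length])
        pos += length
    if pos != len(buf):
        raise ValueError("trailing bytes after last entry")
    return entries
```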

5.5.2 Relay verification requirements

Relays enforce the same replay protections as any other receiver:

  • Relays MUST reject replayed or invalid nonces for ANNOUNCE, WITHDRAW, and SOLICIT.
  • Relays SHOULD support a pinned trust bundle mode for verifying ANNOUNCE/WITHDRAW signatures. When configured with a trust bundle, relays MUST reject unsigned or invalidly signed frames.

5.5.3 Relay limits and defaults

Relays MUST bound in-memory caches and rate-limit unauthenticated traffic. Recommended defaults (subject to tuning):

  • max_nodes_cached=4096
  • max_inventory_bytes=1MiB
  • replay_max_entries=4096
  • rate_limit_max_senders=4096
  • unsigned_per_sender_per_sec=10
  • sig_verifies_per_sec=2000

Control servers SHOULD use comparable replay and rate-limit defaults for control-plane requests.

5.6 Chunking and MTU Guidance

To avoid IP fragmentation, implementations SHOULD limit UDP payloads to ≤1200 bytes. For larger inventories/responses:

  • Use CONTINUED and FINAL fragmentation flags plus FRAG_V2 metadata.
  • When FRAG_V2 is set, each fragment includes Frag Offset + Frag Total Len.
  • FINAL MUST be set on the fragment where offset + len == total_len.
  • Receivers MUST reassemble fragments keyed by (sender, request_id) and reject overlaps.
  • Fragments with the same key MUST agree on msg_type, nonce, and flags with the per-fragment CONTINUED/FINAL bits cleared (and on total len when present); a mismatch causes the in-progress reassembly to be dropped.
  • Implementations SHOULD bound in-flight reassembly and drop incomplete entries after a short timeout to avoid unbounded memory growth.
  • Capability negotiation: responders MAY emit FRAG_V2 fragments only when the requester advertises FRAG_V2 support; otherwise they fall back to legacy CONTINUED/FINAL fragments.
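
A minimal FRAG_V2 reassembly sketch following the rules above is shown here. It omits the timeout and the bound on in-flight entries that the guidance recommends, and it does not compare flags; Reassembler and max_total are hypothetical names chosen for the example.

```python
class Reassembler:
    def __init__(self, max_total=1 << 20):
        self.max_total = max_total
        self._inflight = {}  # (sender, request_id) -> {"meta": ..., "parts": {offset: data}}

    def add(self, sender, request_id, msg_type, nonce, offset, total_len, data):
        """Add one fragment; return the full payload once complete, else None."""
        key = (sender, request_id)
        if not data or total_len > self.max_total or offset + len(data) > total_len:
            self._inflight.pop(key, None)
            raise ValueError("fragment empty or out of bounds")
        meta = (msg_type, nonce, total_len)
        state = self._inflight.setdefault(key, {"meta": meta, "parts": {}})
        if state["meta"] != meta:
            del self._inflight[key]  # mismatch drops the in-progress reassembly
            raise ValueError("fragment metadata mismatch")
        for other_offset, other in state["parts"].items():
            if offset < other_offset + len(other) and other_offset < offset + len(data):
                del self._inflight[key]
                raise ValueError("overlapping fragment")
        state["parts"][offset] = data
        # No overlaps and everything in bounds, so full length means full coverage.
        if sum(len(d) for d in state["parts"].values()) == total_len:
            del self._inflight[key]
            return b"".join(d for _, d in sorted(state["parts"].items()))
        return None
```

Fragments may arrive in any order; the payload is returned by whichever call completes the byte range.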

6. Resource Types

| Code | Type | Capacity Unit | Notes |
|---|---|---|---|
| 0x0001 | CPU | Core count | Advertised only; no remote exec |
| 0x0002 | MEM | Bytes | RDMA-accessible memory pool |
| 0x0003 | GPU | Compute units | Vendor fabric / RDMA-based |
| 0x0004 | NVME | Bytes | Block storage via NVMe-oF |
| 0x0005 | FPGA | Slots | Vendor-defined binding |
| 0x0006 | PMEM | Bytes | Persistent memory pool |
| 0x0007 | CXL_MEM | Bytes | CXL memory pooling/switch binding (extension) |
| 0x00FF | VENDOR | Vendor-defined | Extension space |

7. Trust and Identity

7.1 Node Identity

NodeIdentity:

node_id : u128
public_key : 32B (Ed25519)
certificate : Certificate

Certificate:

subject : u128 (node_id)
issuer : u128 (CA id)
issued_at : u64
expires_at : u64
public_key : 32B
extensions : ExtensionTLV[]
signature : 64B (Ed25519 by CA)

Canonical identifier formats:

  • Hex string: 0x + 32 lowercase hex digits (zero-padded).
  • URN string: urn:fabricbios:node:0x<32-hex>.
  • Tooling MAY accept bare hex input, but MUST normalize to the canonical 0x form for display.
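
The canonical forms and the normalization rule can be sketched as below; the function names are hypothetical.

```python
URN_PREFIX = "urn:fabricbios:node:"

def format_node_id(node_id):
    """Canonical hex form: 0x + 32 lowercase hex digits, zero-padded."""
    if not 0 <= node_id < 1 << 128:
        raise ValueError("node_id must fit in 128 bits")
    return f"0x{node_id:032x}"

def node_urn(node_id):
    """Canonical URN form, as used in the TLS SAN URI."""
    return URN_PREFIX + format_node_id(node_id)

def parse_node_id(text):
    """Accept URN, 0x-prefixed, or bare hex input and return the integer node_id."""
    s = text.strip().lower()
    if s.startswith(URN_PREFIX):
        s = s[len(URN_PREFIX):]
    if s.startswith("0x"):
        s = s[2:]
    if not s or len(s) > 32 or any(c not in "0123456789abcdef" for c in s):
        raise ValueError("not a valid node_id")
    return int(s, 16)

canonical = format_node_id(parse_node_id("ABCD01"))  # bare hex in, canonical form out
```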

TLS identity mapping (all profiles using TLS/mTLS):

  • The peer certificate SAN URI MUST include urn:fabricbios:node:0x<32-hex>.
  • Implementations MUST fail closed if the SAN URI is missing or ambiguous.

7.2 Controller Discovery and Trust Bootstrap

Enrollment requires locating and trusting the fabric controller before a fabric certificate exists. Operators choose one:

A) Pinned Fabric CA (recommended):

  • Node firmware contains fabric CA public key (or hash).
  • Controller presents a chain anchored at that CA.

B) Pinned Controller Key (recommended for small fabrics):

  • Node firmware contains controller public key (or hash).

C) TOFU (allowed only if explicitly enabled):

  • Node pins controller key on first contact.
  • Subsequent enrollment requires same key.
  • Operators SHOULD disable TOFU in high-security deployments.

Controller anycast address may be configured, e.g. fd00:FABRIC::ffff:2, or provided by DHCPv6 option/local config.

7.3 Fabric Enrollment

  1. Node boots with manufacturer identity.
  2. Node contacts controller, verifies identity per §7.2.
  3. Controller verifies manufacturer identity and optional attestation.
  4. Controller issues fabric certificate binding node_id to public_key.
  5. Node signs ANNOUNCE/WITHDRAW/REVOKE_BROADCAST and control responses using fabric identity.

7.4 Attestation (Optional)

Attestation is carried as a TLV:

type : u8 (0=none, 1=TPM2, 2=SGX, 3=SEV, 4=TDX)
evidence : bytes

Higher layers may impose policy using attestation evidence.


8. Control Plane Transports and Profiles

Lease management operations require an authenticated reliable transport. Default policy (2026-02-19): QUIC/TLS 1.3 is the standard control-plane transport for general-purpose nodes (including Pi-class bare metal and Linux nodes). Legacy profiles remain documented for migration only.

8.1 Profile FULL (normative default)

  • Discovery: UDP 5700
  • Control + lease ops: QUIC 5701 (TLS 1.3, mTLS)

8.2 Profile COMPAT (legacy migration only)

  • Discovery: UDP 5700
  • Control + lease ops: TCP 5701 + TLS 1.3 (mTLS)
  • Status: retained only for migration/backward compatibility; SHOULD NOT be used for new deployments.

8.3 Profile PROXIED (constrained bring-up only)

  • Firmware supports discovery signing + a minimal local interface to a trusted on-node proxy.
  • Proxy terminates QUIC/TLS or TCP/TLS and performs lease ops + device programming.
  • Status: implementation aid for constrained bring-up; SHOULD NOT be the steady-state profile for Pi/Linux nodes.

8.4 Normative Requirement

Lease management operations (ALLOC/BIND/RENEW/FREE) MUST use an authenticated reliable transport, satisfied by:

  • QUIC/TLS 1.3, or
  • TCP/TLS 1.3, or
  • a trusted proxy that provides one of the above.

General-purpose nodes SHOULD implement QUIC/TLS 1.3 directly.

UDP is acceptable for discovery and limited low-risk control only.

8.5 Trusted-Fabric Exception Profile (non-default)

This is an explicit exception profile for physically isolated trusted segments under a single administrative domain. It does not replace the normative default in §8.1-§8.4. Operators MUST opt in deliberately and document the accepted risks.

Allowed relaxations in this exception profile are intentionally narrow:

  • Control + lease ops MAY omit TLS only on the trusted segment, using the same transport framing and lease semantics, when the operator provides equivalent physical/network isolation.
  • ANNOUNCE and SOLICIT MAY omit signature requirements on that trusted segment.
  • Local east-west firewalling between fabric nodes MAY be relaxed when the segment itself is dedicated and isolated.

The following remain mandatory even in this exception profile:

  • capability token validation,
  • lease TTL enforcement,
  • revoke/expiry teardown,
  • fencing on teardown failure,
  • audit logging,
  • anti-replay for compatibility dataplanes where it exists.

WITHDRAW, REVOKE_BROADCAST, and the general secure default for Pi/Linux-class nodes remain governed by the normative posture above unless an operator has explicitly enabled and documented the trusted-fabric exception.


9. Capability Tokens

9.1 Token Structure

version : u8
token_id : u128
resource_id : u128
audience : u128 (node_id or service_id)
permissions : u32
issued_at : u64
expires_at : u64 (default max TTL 300s)
issuer : u128 (issuer node_id)
caveats : CaveatTLV[]
signature : 64B (Ed25519 by issuer)

9.2 Audience Binding (Normative)

A token is valid only when presented by its audience.

  • Over QUIC/TLS or TCP/TLS: audience binding is satisfied by the authenticated peer identity mapped to node_id/service_id.
  • Over UDP: REQUEST MUST include presenter proof (see §11.2).

audience = 0 indicates a bearer token. Bearer tokens MUST be restricted to a short TTL and SHOULD include SOURCE_IP caveats.

9.3 Permissions

| Bit | Name |
|---|---|
| 0 | READ |
| 1 | WRITE |
| 2 | ADMIN |
| 3 | DELEGATE |
| 4 | EXCLUSIVE |

Reserved bits MUST be zero.

9.4 Caveats (TLV)

Caveat TLV:

type : u8
length : u16
data : bytes

Caveat types:

  • TIME_BOUND
  • SOURCE_IP
  • RANGE (offset/length for MEM/NVME)
  • RATE_LIMIT (safety limit; MAY be enforced by fabricBIOS)
  • DEPTH (max delegation depth)
  • AUDIENCE (additional audience constraint)

9.5 Delegation (Attenuation)

Derived tokens can only restrict:

  • add caveats
  • narrow permissions
  • set a new audience

Verifier checks parent validity, DELEGATE permission, delegator identity, and restriction-only semantics.
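
The restriction-only part of that check can be sketched as below. It covers only permissions and caveats; parent signature validity and delegator identity are out of scope for this example. Caveats are modeled as (type, data) tuples, bit 0 is assumed to be the least significant bit, and is_valid_attenuation is a hypothetical name.

```python
READ, WRITE, ADMIN, DELEGATE, EXCLUSIVE = (1 << bit for bit in range(5))

def is_valid_attenuation(parent_perms, parent_caveats, child_perms, child_caveats):
    """Return True if the child token only restricts what the parent allows."""
    if not parent_perms & DELEGATE:
        return False  # parent may not delegate at all
    if child_perms & ~parent_perms:
        return False  # child asks for a permission the parent lacks
    # Every parent caveat must still be present; the child may add more.
    return set(parent_caveats) <= set(child_caveats)

parent = READ | WRITE | DELEGATE
ok = is_valid_attenuation(parent, [(1, b"ttl")], READ, [(1, b"ttl"), (3, b"range")])
```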

9.6 Token Revocation

  • Tokens SHOULD have short TTL (default max 300s).
  • Immediate revocation can be broadcast:

REVOKE_BROADCAST Payload:

issuer : u128
token_ids : u128[]
until : u64

Receivers cache each revocation until the time given in its until field and reject any token whose token_id is listed while the entry is cached.
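
A receiver-side revocation cache might look like the following sketch. Keying entries by (issuer, token_id) is an assumption made for the example, and RevocationCache is a hypothetical name.

```python
class RevocationCache:
    def __init__(self):
        self._revoked = {}  # (issuer, token_id) -> until (UNIX seconds)

    def apply(self, issuer, token_ids, until):
        """Record one REVOKE_BROADCAST payload."""
        for token_id in token_ids:
            key = (issuer, token_id)
            # Keep the later deadline if the token was already revoked.
            self._revoked[key] = max(until, self._revoked.get(key, 0))

    def is_revoked(self, issuer, token_id, now):
        """Return True while a revocation for this token is still cached."""
        key = (issuer, token_id)
        until = self._revoked.get(key)
        if until is None:
            return False
        if now >= until:
            del self._revoked[key]  # entry has aged out
            return False
        return True
```

Because tokens carry a short TTL, an entry only needs to outlive the token it revokes, which keeps the cache small.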


10. Lease Management

10.1 Lease Model

Any data-plane binding (e.g., LEASE_ALLOC, NVME_BIND, GPU_BIND, CXL_BIND) creates a lease.

A token authorizes control operations; a lease governs data-plane lifetime.

Lease:

lease_id : u128
resource_id : u128
holder : u128 (node_id/service_id)
granted_at : u64
expires_at : u64
binding : DataPlaneBinding

10.2 Timing Defaults

  • duration: 60s (range 10s–3600s)
  • renewal window: last 20%
  • grace: 10s (range 0–60s)
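
As a worked example of these defaults, a 60s lease granted at t=0 enters its renewal window at t=48 (the last 20% of 60s is 12s) and expires at t=60. The sketch below computes those points; treating grace as a period that follows expires_at is an assumption, since this section does not define where the grace interval sits. lease_schedule is a hypothetical helper.

```python
def lease_schedule(granted_at, duration=60, renew_percent=20, grace=10):
    """Return the key timestamps (in seconds) for a lease."""
    if not 10 <= duration <= 3600:
        raise ValueError("duration must be within 10s to 3600s")
    if not 0 <= grace <= 60:
        raise ValueError("grace must be within 0s to 60s")
    expires_at = granted_at + duration
    # Integer arithmetic avoids floating-point rounding on the 20% window.
    renew_from = expires_at - (duration * renew_percent) // 100
    return {
        "renew_from": renew_from,
        "expires_at": expires_at,
        "grace_ends": expires_at + grace,  # assumption: grace follows expiry
    }

schedule = lease_schedule(granted_at=0)
```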

10.3 Teardown (Normative)

On expiry or revoke:

  • fabricBIOS MUST tear down the data-plane authorization such that subsequent data-plane operations fail.

Examples:

  • RDMA: invalidate/rotate rkey and/or destroy/poison QP; deregister memory
  • NVMe-oF: disconnect controller session; revoke auth material
  • GPU fabric: revoke endpoint/session credentials
  • CXL: remove/disable mapping window or decoder rule (per platform)

10.4 Teardown Failure: FENCED State

If teardown fails or hardware enters an unsafe state, fabricBIOS MUST fence the resource:

  • No new leases are granted for that resource.
  • Existing leases on that resource are treated as invalid at the control plane.
  • Resource is reported as FENCED/DEGRADED in discovery (ResourceSummary flags).
  • fabricBIOS SHOULD attempt remediation (reset session/device) if supported.

Recommended status code:

  • RESOURCE_FENCED — resource is fenced due to teardown failure or hardware fault.

11. Control Plane Operations

11.1 Message Types and Ports

  • UDP 5700: discovery + limited control
  • QUIC 5701: control + lease management (required default)
  • TCP 5701: legacy migration path only (optional)

11.2 REQUEST/RESPONSE Payloads

Normative wire encoding for REQUEST/RESPONSE is defined in docs/spec/fabricbios-wire-encoding-v0.md. The summary below is informative.

REQUEST Payload:

op : u16
resource_id : u128
token : CapabilityToken
params : bytes
presenter_id : u128 (REQUIRED on UDP; OPTIONAL on QUIC — peer identity is TLS-authenticated)
presenter_sig : 64B (REQUIRED on UDP; OPTIONAL on QUIC — peer identity is TLS-authenticated)

On UDP, presenter_sig proves possession of the private key for presenter_id and binds the request to the token. The signature MUST cover the on-wire REQUEST payload with the presenter_sig field zeroed. On QUIC, the TLS-authenticated peer identity satisfies the same binding requirement.

RESPONSE Payload:

status : u16
op : u16 (echo)
result : bytes

11.3 Status Codes

| Code | Name |
|---|---|
| 0x0000 | OK |
| 0x0001 | INVALID_TOKEN |
| 0x0002 | INSUFFICIENT_PERM |
| 0x0003 | RESOURCE_NOT_FOUND |
| 0x0004 | RESOURCE_BUSY |
| 0x0005 | CAPACITY_EXCEEDED |
| 0x0006 | LEASE_EXPIRED |
| 0x0007 | RATE_LIMITED |
| 0x0008 | RESOURCE_FENCED |
| 0x00FF | INTERNAL_ERROR |

11.4 Operations

Node-level (resource_id = 0):

  • PING → uptime
  • GET_INVENTORY → full inventory (chunked as needed)

Capability:

  • CAP_REQUEST → mint token with perms, duration, audience, caveats
  • CAP_REFRESH → refresh token expiry
  • CAP_REVOKE → revoke token_id

Leases (illustrative; resource-specific):

  • LEASE_ALLOC, LEASE_FREE, LEASE_RENEW, LEASE_QUERY
  • NVME_BIND, NVME_UNBIND, NVME_RENEW
  • GPU_BIND, GPU_UNBIND, GPU_RENEW
  • CXL_BIND, CXL_UNBIND, CXL_RENEW (extension)

12. Data Plane Bindings

fabricBIOS returns binding credentials and endpoint descriptors. It does not implement bulk transfer.

12.1 RDMA Binding (Example)

transport : u8
gid : 16B
qp_type : u8
qp_num : u32
psn : u32
mtu : u16
rkey : u32
remote_addr : u64
length : u64
vendor_data : bytes (opaque)

remote_addr is a remote address in the RNIC registration context, not a CPU virtual address.

12.2 NVMe-oF Binding (Example)

transport : u8
address : 16B IPv6
port : u16
nqn : bytes
controller_id : u16
namespace_id : u32
auth_key : bytes (optional)

12.3 GPU Binding (Vendor)

Vendor protocol endpoint + metadata blob.

12.4 CXL Binding (Extension)

A CXL binding MAY include:

  • switch/port identifiers
  • mapping window identifiers
  • decoder configuration references
  • required authentication material (if any)

Exact format is platform-specific until standardized.


13. Security Model

13.1 Threats and Mitigations

  • Spoofed ANNOUNCE/WITHDRAW → signed + CA verification
  • Token forgery → Ed25519 signatures
  • Replay across principals → audience binding
  • Replay in time → nonce checks + replay caches
  • Stale data-plane access → lease expiry + teardown
  • MITM control → authenticated reliable transport for leases
  • Unauthenticated DoS → rate limiting + signature verification before decompression

13.2 Mandatory Requirements

A conforming implementation MUST:

  1. Sign ANNOUNCE/WITHDRAW/REVOKE_BROADCAST
  2. Verify tokens, audience binding, and expiry on control ops
  3. Verify signatures before decompression
  4. Enforce nonce validity and replay cache behavior
  5. Enforce lease expiry teardown; fence on teardown failure
  6. Support one of the secure control profiles (FULL is the normative default; COMPAT/PROXIED only where constrained)

14. Discovery Scaling Guidance

  • Small L2 fabrics (≤100 nodes): link-local multicast may be sufficient.
  • Routed fabrics (typical leaf-spine): relays are strongly recommended and often required.
  • Large fabrics (1000+): deploy relays (often per rack/ToR), aggregate, and answer SOLICIT; do not rely on multicast as primary.

15. Relationship to Operating System (Compute Scheduling Across Nodes)

fabricBIOS is designed to support an OS that schedules compute across nodes. It makes CPU capacity, topology, and locality discoverable, and it provides secure discovery and trust bootstrap so that the OS can deploy its own compute control plane.

A composed OS (e.g., a resource-graph OS) can:

  • discover CPUs and locality
  • deploy OS agents to nodes
  • schedule work across nodes using its own execution model
  • bind remote memory/storage/accelerators via leases
  • react to withdrawals and fenced resources deterministically

16. Implementation Guidance

16.1 Size Expectations (Typical)

| Platform | Typical Size |
|---|---|
| Reference daemon | 1–5 MB |
| DPU firmware | 200 KB–1 MB |
| Minimal embedded | 100–500 KB |

16.2 Deployment Targets

Linux daemon (dev/test), DPU (primary), BMC (FULL preferred; PROXIED where constrained), FPGA (research).

16.3 MVP Milestones

  1. ANNOUNCE/SOLICIT + relay
  2. CAP_REQUEST/CAP_REFRESH
  3. LEASE_ALLOC with RDMA binding
  4. Lease renewal + expiry teardown + fencing on failure
  5. NVME_BIND next

Appendix A: Constants

Ports

  • 5700/UDP: discovery + limited control
  • 5701/QUIC: secure control (FULL profile, normative default)
  • 5701/TCP: legacy migration only (COMPAT profile)

Multicast Groups

  • ff02::6662:696f:0001 (link-local)
  • ff05::6662:696f:0001 (site-local)
  • ff08::6662:696f:0001 (org-local)

Resource Types

  • CPU 0x0001
  • MEM 0x0002
  • GPU 0x0003
  • NVME 0x0004
  • FPGA 0x0005
  • PMEM 0x0006
  • CXL_MEM 0x0007
  • VENDOR 0x00FF