fabricBIOS Resource Types

This document describes each resource type supported by fabricBIOS: wire encoding, enforcement mechanisms, lease lifecycle, and failure modes.

Overview

fabricBIOS defines a set of resource types that nodes advertise in their inventory. Each resource has a type code, capacity unit, and a set of operations for lease management and data-plane access.

See Premium Dataplane Methodology for the canonical reference on fabricBIOS’s premium dataplane model (RDMA, NVMe-oF, SR-IOV, GPU, CXL).

| Code | Constant | Type | Capacity Unit | Status |
| --- | --- | --- | --- | --- |
| 0x0001 | RES_TYPE_CPU | CPU | Core count | Advertised; Linux topology-aware reservation implemented |
| 0x0002 | RES_TYPE_MEM | MEM | Bytes | Full data-plane (FBMU, RDMA) |
| 0x0003 | RES_TYPE_BLOCK | BLOCK | Bytes (or sectors) | Full data-plane (FBBU); premium target/session bindings for NVMe-oF |
| 0x0004 | RES_TYPE_NET | NET | Bits/sec | Advertised; Linux macvlan and SR-IOV support implemented |
| 0x0005 | RES_TYPE_GPU | GPU | VRAM bytes / compute units | Advertised; HIP/ROCm path implemented, CUDA discovery partial |
| 0x0010 | RES_TYPE_SCHEDULER | SCHEDULER | HTTP port | Service discovery via ANNOUNCE; capacity=port, available=1 |
| 0x00FF | | VENDOR | Vendor-defined | Extension space |

Wire Encoding

Flat Inventory Format

Resources are advertised in GET_INVENTORY responses using a flat wire format:

count : u16
repeated count times:
resource_id : u128 (16 bytes)
resource_type : u16 (2 bytes)
flags : u16 (2 bytes)
capacity : u64 (8 bytes)
available : u64 (8 bytes)

Each entry is 36 bytes. All integers are big-endian.
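
The layout above can be exercised with a short Python sketch. The field order, widths, and big-endian byte order come from the format description; the `FlatResource` class and both function names are illustrative only and are not part of fabricBIOS.

```python
import struct
from dataclasses import dataclass
from typing import List

ENTRY_SIZE = 36  # 16 (resource_id) + 2 + 2 + 8 + 8 bytes


@dataclass
class FlatResource:
    # Hypothetical container; field names mirror the wire layout.
    resource_id: int    # u128
    resource_type: int  # u16
    flags: int          # u16
    capacity: int       # u64
    available: int      # u64


def encode_flat_inventory(resources: List[FlatResource]) -> bytes:
    """Serialize a u16 count followed by 36-byte big-endian entries."""
    out = bytearray(struct.pack(">H", len(resources)))
    for r in resources:
        out += r.resource_id.to_bytes(16, "big")
        out += struct.pack(">HHQQ", r.resource_type, r.flags,
                           r.capacity, r.available)
    return bytes(out)


def parse_flat_inventory(buf: bytes) -> List[FlatResource]:
    """Parse a flat inventory payload, rejecting truncated input."""
    if len(buf) < 2:
        raise ValueError("buffer too short for count field")
    (count,) = struct.unpack_from(">H", buf, 0)
    if len(buf) < 2 + count * ENTRY_SIZE:
        raise ValueError("buffer truncated")
    resources = []
    for i in range(count):
        off = 2 + i * ENTRY_SIZE
        rid = int.from_bytes(buf[off:off + 16], "big")
        rtype, flags, cap, avail = struct.unpack_from(">HHQQ", buf, off + 16)
        resources.append(FlatResource(rid, rtype, flags, cap, avail))
    return resources
```

A payload carrying one MEM entry (type 0x0002) is 38 bytes long: 2 bytes of count plus one 36-byte entry.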

Structured Inventory Format

The ResourceInventory structure provides typed fields per resource category:

node_id : [u8; 32]
epoch : u64
health : u32
mem_arenas : list of MemArena
block_devs : list of BlockDev
cpu_caps : list of CpuCap
nics : list of NicInfo
gpus : list of GpuInfo

Health and Flag Bits

Resource flags (flat format, flags field):

| Bit | Name | Meaning |
| --- | --- | --- |
| 0 | FENCED | Resource is fenced; no new leases granted |
| 1 | DEGRADED | Resource is usable but with reduced guarantees |

Health constants (structured format):

| Value | Constant | Meaning |
| --- | --- | --- |
| 0x00 | HEALTH_OK | Resource is healthy |
| 0x01 | HEALTH_DEGRADED | Degraded but usable |
| 0x02 | HEALTH_FENCED | Fenced, not leasable |
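
The two encodings are easy to confuse: in the flat format FENCED is bit 0 (mask 0x1), while in the structured format HEALTH_FENCED is the value 0x02. A minimal Python sketch of both checks follows. The helper names are hypothetical, and the document does not say whether `health` is an enumeration or a bitmask, so the bitwise test is chosen because it gives the same answer under either reading.

```python
# Flat-format flag masks (bit positions from the table above).
FLAG_FENCED = 1 << 0
FLAG_DEGRADED = 1 << 1

# Structured-format health constants.
HEALTH_OK = 0x00
HEALTH_DEGRADED = 0x01
HEALTH_FENCED = 0x02


def describe_flat_flags(flags: int) -> list:
    """Return the names of the flat-format flag bits that are set."""
    names = []
    if flags & FLAG_FENCED:
        names.append("FENCED")
    if flags & FLAG_DEGRADED:
        names.append("DEGRADED")
    return names


def leasable_from_flags(flags: int) -> bool:
    """Flat format: a resource is leasable unless the FENCED bit is set."""
    return not (flags & FLAG_FENCED)


def leasable_from_health(health: int) -> bool:
    """Structured format: leasable unless HEALTH_FENCED is reported.

    Assumption: testing the 0x02 bit is valid whether `health` is an
    enum value or a bitmask.
    """
    return not (health & HEALTH_FENCED)
```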

MEM (0x0002) — Memory

Description

Memory resources represent addressable memory regions that can be read and written by remote clients. On Linux, these are backed by mmap’d anonymous memory or file-backed regions. On Pi5 bare-metal, these are DMA arenas carved from physical DRAM.

Inventory Fields (MemArena)

resource_id : u128
base : u64 (base address within node)
len : u64 (size in bytes)
align : u32 (alignment requirement)
max_read : u32 (maximum read size per operation)
max_write : u32 (maximum write size per operation)
health : u32

Enforcement Mechanisms

| Platform | Backing | Enforcement |
| --- | --- | --- |
| Linux (fabricbiosd) | mmap anonymous / file | Lease lookup per request; dp_key HMAC verification |
| Pi5 bare-metal | Static 64KB DMA arena (8KB per lease slot, 8 slots) | Per-lease dp_key + nonce replay cache (64-entry window) |
| Future: RDMA | ibverbs memory region | rkey rotation on expiry; QP teardown |

Lease Lifecycle

  1. LEASE_ALLOC (QUIC control): Creates a lease with specified duration and grace period. Returns binding TLVs: lease_id, dp_key, endpoint (IP:port), limits (offset + length).
  2. LEASE_RENEW (QUIC control): Extends the lease expiry. Must be called within the renewal window (the last 20% of the lease duration).
  3. LEASE_QUERY (QUIC control): Returns current lease status (ACTIVE, EXPIRED, REVOKED) and expiry time.
  4. Data-plane I/O (FBMU on UDP 5702):
    • HELLO: Presents lease_id, receives resource_len and I/O limits.
    • READ: offset + length -> data.
    • WRITE: offset + length + data -> status.
    • Each request carries lease_id, nonce, auth_tag (HMAC-SHA256 of header using dp_key).
  5. LEASE_FREE (QUIC control): Explicitly releases the lease. Subsequent data-plane ops return NO_LEASE.
  6. Expiry: Background tick_leases scan evicts expired leases. Data-plane ops on expired leases return NO_LEASE.
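
Two parts of this lifecycle lend themselves to a small Python sketch: the renewal window check and the per-request auth_tag. The 20% window and the use of HMAC-SHA256 over the header with dp_key come from the steps above. The header layout (opcode, lease_id, nonce, offset, length), the function names, and the millisecond timestamps are assumptions made for illustration; the real FBMU header format is not specified in this document.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32  # HMAC-SHA256 digest size in bytes


def in_renewal_window(granted_ms: int, duration_ms: int, now_ms: int) -> bool:
    """True when now_ms falls in the last 20% of the lease duration."""
    expiry_ms = granted_ms + duration_ms
    window_start_ms = expiry_ms - duration_ms // 5
    return window_start_ms <= now_ms < expiry_ms


def pack_header(opcode: int, lease_id: int, nonce: int,
                offset: int, length: int) -> bytes:
    """Hypothetical header layout: u8 opcode, u128 lease_id, u64 nonce,
    u64 offset, u32 length, all big-endian. Not the real FBMU layout."""
    return (struct.pack(">B", opcode)
            + lease_id.to_bytes(16, "big")
            + struct.pack(">QQI", nonce, offset, length))


def sign_request(dp_key: bytes, header: bytes) -> bytes:
    """Append auth_tag = HMAC-SHA256(dp_key, header) to the header."""
    return header + hmac.new(dp_key, header, hashlib.sha256).digest()


def verify_request(dp_key: bytes, packet: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    header, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(dp_key, header, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```

For a 100-second lease granted at t=0, renewal is accepted from t=80 s up to, but not including, t=100 s. Integer arithmetic avoids floating-point rounding at the window boundary.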

Failure Modes

  • EXPIRED: Lease TTL elapsed. Data-plane returns FBMU_STATUS_NO_LEASE.
  • REPLAY: Duplicate nonce detected. Data-plane returns FBMU_STATUS_REPLAY.
  • OUT_OF_RANGE: Read/write exceeds allocated region. Returns FBMU_STATUS_RANGE.
  • INVALID: Bad auth_tag or unknown lease. Returns FBMU_STATUS_INVALID.
  • FENCED: Teardown failure. Resource reports FENCED in discovery; no new leases.
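
The failure modes above suggest a server-side check order, sketched here in Python. The status names come from the list; `"OK"`, the `Lease` and `ReplayCache` classes, and the ordering of the checks are assumptions. In particular, this sketch treats a lease_id that was never issued as INVALID and a known but expired lease as NO_LEASE, and it records the nonce only after the tag verifies so that unauthenticated packets cannot fill the replay window. The default window of 64 entries mirrors the Pi5 replay cache size.

```python
import hashlib
import hmac
from collections import deque
from dataclasses import dataclass, field


class ReplayCache:
    """Remembers the most recent `window` nonces (sliding window)."""

    def __init__(self, window: int = 64):
        self._window = window
        self._order = deque()
        self._seen = set()

    def check_and_record(self, nonce: int) -> bool:
        """Return False for a duplicate nonce, else record it."""
        if nonce in self._seen:
            return False
        self._order.append(nonce)
        self._seen.add(nonce)
        if len(self._order) > self._window:
            self._seen.discard(self._order.popleft())
        return True


@dataclass
class Lease:
    # Hypothetical lease record for illustration.
    dp_key: bytes
    expiry_ms: int
    length: int  # size of the leased region in bytes
    replay: ReplayCache = field(default_factory=ReplayCache)


def validate(leases, lease_id, header, tag, nonce, offset, length, now_ms):
    """Map a data-plane request to one of the documented statuses."""
    lease = leases.get(lease_id)
    if lease is None:
        return "FBMU_STATUS_INVALID"      # unknown lease
    if now_ms >= lease.expiry_ms:
        return "FBMU_STATUS_NO_LEASE"     # TTL elapsed
    expected = hmac.new(lease.dp_key, header, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return "FBMU_STATUS_INVALID"      # bad auth_tag
    if not lease.replay.check_and_record(nonce):
        return "FBMU_STATUS_REPLAY"       # duplicate nonce
    if offset < 0 or length < 0 or offset + length > lease.length:
        return "FBMU_STATUS_RANGE"        # outside allocated region
    return "OK"
```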

BLOCK (0x0003) — Block Storage

Description

Block storage resources expose a fixed-size block device with sector-aligned read/write access. On Linux, these are backed by files or raw block devices with O_DIRECT. On Pi5 bare-metal, the NVMe HAT (PCIe1) provides a real NVMe SSD, or an SD card can serve as block backing.

Inventory Fields (BlockDev)

resource_id : u128
block_size : u32 (bytes per block, typically 512)
capacity : u64 (total capacity in bytes or sectors)
flags : u32 (BLOCK_RO = bit 0, BLOCK_REMOVABLE = bit 1)
health : u32

Enforcement Mechanisms

| Platform | Backing | Enforcement |
| --- | --- | --- |
| Linux (fabricbiosd) | File (O_DIRECT) or raw block device | Lease lookup per request; dp_key HMAC verification |
| Pi5 bare-metal (NVMe) | NVMe SSD via PCIe1 | Per-lease dp_key + bounds check (LBA range) |
| Pi5 bare-metal (SD) | SD card via EMMC/SDHCI | Per-lease dp_key + bounds check |
| Future: NVMe-oF | nvmet kernel target | Target/session lifecycle managed on Linux; steady-state remote I/O proof still pending |

Lease Lifecycle

  1. LEASE_ALLOC with BLOCK type (QUIC control): Allocates a block lease. Returns binding TLVs with endpoint (IP:port for FBBU) and limits (LBA start + count).
  2. Data-plane I/O (FBBU on UDP 5703):
    • HELLO: Presents lease_id, receives block_size, device_block_cnt, max_blocks_per_io.
    • READ_BLOCK: block_index + block_count -> block_data.
    • WRITE_BLOCK: block_index + block_count + block_data -> status.
    • Each request carries lease_id, nonce, auth_tag.
  3. LEASE_FREE: Releases the block lease.
  4. Expiry: Same as MEM — background scan evicts expired leases.

Failure Modes

  • Same status codes as FBMU (NO_LEASE, RANGE, REPLAY, INVALID, FENCED).
  • BLOCK_RO: Write operations fail on read-only devices.
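
The bounds and read-only checks for block I/O can be sketched as follows. The LBA start + count limits, max_blocks_per_io, and the BLOCK_RO bit come from this section. Everything else is an assumption: the function names, the reading of block_index as a device-absolute LBA, and the `"RANGE"`, `"READ_ONLY"`, and `"OK"` result strings, which stand in for FBBU status codes that this document does not enumerate.

```python
BLOCK_RO = 1 << 0         # BlockDev.flags bit 0
BLOCK_REMOVABLE = 1 << 1  # BlockDev.flags bit 1


def check_block_io(is_write: bool, block_index: int, block_count: int,
                   lba_start: int, lba_count: int,
                   max_blocks_per_io: int, dev_flags: int) -> str:
    """Validate one READ_BLOCK/WRITE_BLOCK request against lease limits."""
    # Reject empty or oversized transfers.
    if block_count <= 0 or block_count > max_blocks_per_io:
        return "RANGE"
    # The whole request must lie inside [lba_start, lba_start + lba_count).
    if block_index < lba_start:
        return "RANGE"
    if block_index + block_count > lba_start + lba_count:
        return "RANGE"
    # Writes fail on read-only devices.
    if is_write and (dev_flags & BLOCK_RO):
        return "READ_ONLY"
    return "OK"


def byte_span(block_index: int, block_count: int, block_size: int = 512):
    """Convert a block request into a (byte_offset, byte_length) pair."""
    return block_index * block_size, block_count * block_size
```

With 512-byte blocks, a request for 8 blocks starting at block 100 covers bytes 51200 through 55295.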

CPU (0x0001) — Compute

Description

CPU resources advertise compute capacity, topology, and CPU lease policy. fabricBIOS does not execute workloads on CPUs directly, but on Linux it does enforce CPU leases with topology-aware cpuset.cpus placement so higher layers such as grafOS can rely on whole-core isolation defaults.

Inventory Fields (CpuCap)

resource_id : u128
cores : u16
threads : u16
max_mhz : u32
flags : u32 (bits 16..23: arch, bits 0..15: features, bits 24..27: isolation/topology policy)
health : u32

cores is the number of physical core groups exported under the current CPU lease policy. threads is the number of online logical CPUs.

Architecture Flags (bits 16..23)

| Value | Constant | Architecture |
| --- | --- | --- |
| 0x01 | ARCH_AARCH64 | AArch64 (ARM64) |

Feature Flags (bits 0..15)

| Bit | Constant | Feature |
| --- | --- | --- |
| 0 | CPU_FEAT_TASKLET_HOST | Can host WASM/ELF tasklets |
| 1 | CPU_FEAT_NEON | ARM NEON SIMD |
| 2 | CPU_FEAT_CRC32 | CRC32 hardware acceleration |
| 3 | CPU_FEAT_TOPOLOGY_AWARE_LEASING | CPU leases are enforced with topology-aware placement |

Isolation / Topology Flags (bits 24..27)

| Bits | Constant | Meaning |
| --- | --- | --- |
| 24..26 | CPU_ISOLATION_BEST_EFFORT | Lease allocation may split SMT siblings; density preferred over isolation |
| 24..26 | CPU_ISOLATION_WHOLE_CORE | Lease allocation uses physical-core groups and keeps SMT siblings together |
| 24..26 | CPU_ISOLATION_STRICT | Lease allocation requires single-thread-per-core groups; fails closed on SMT-sharing hosts |
| 27 | CPU_TOPOLOGY_COMPLETE | Sibling topology was detected for all online CPUs |
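
The three bit ranges of the CpuCap flags word can be separated with shifts and masks, as in this Python sketch. The bit positions and constant names are taken from the tables above. The function name and the returned dictionary shape are illustrative. The numeric values that bits 24..26 take for each isolation policy are not listed in this document, so the sketch returns that field as a raw integer rather than guessing a mapping.

```python
ARCH_NAMES = {0x01: "ARCH_AARCH64"}

FEATURE_BITS = {
    0: "CPU_FEAT_TASKLET_HOST",
    1: "CPU_FEAT_NEON",
    2: "CPU_FEAT_CRC32",
    3: "CPU_FEAT_TOPOLOGY_AWARE_LEASING",
}


def decode_cpu_flags(flags: int) -> dict:
    """Split a CpuCap flags word into its documented fields."""
    features = flags & 0xFFFF        # bits 0..15
    arch = (flags >> 16) & 0xFF      # bits 16..23
    isolation = (flags >> 24) & 0x7  # bits 24..26 (raw policy value)
    return {
        "features": [name for bit, name in sorted(FEATURE_BITS.items())
                     if features & (1 << bit)],
        "arch": ARCH_NAMES.get(arch, "unknown(0x%02x)" % arch),
        "isolation_raw": isolation,
        "topology_complete": bool(flags & (1 << 27)),  # bit 27
    }
```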

Enforcement Mechanisms

| Platform | Mechanism | Status |
| --- | --- | --- |
| Linux (fabricbiosd) | Topology-aware cpuset.cpus leasing with explicit isolation policy | Implemented |
| Pi5 bare-metal | CPU reservation (advertise-only) | Advertised, no enforcement yet |

Lease Lifecycle

On Linux, CPU leases reserve CPU capacity and create a per-lease cpuset cgroup. The default policy is whole_core, which allocates whole physical-core groups and prevents different leases from landing on SMT siblings of the same core. best_effort is an explicit opt-in density mode, while strict fails closed unless the host exposes single-thread-per-core groups.

If topology data is incomplete, whole_core and strict deny new CPU leases rather than silently degrading to flat logical-CPU placement. best_effort remains available as the explicit degraded policy.
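
The three policies can be illustrated with a small placement function. This is a sketch of the rules stated above, not the fabricbiosd implementation. The names are hypothetical, and it assumes that a request counts physical-core groups under whole_core and strict but counts logical CPUs under best_effort.

```python
from typing import List, Optional, Sequence, Set, Tuple


def place_cpu_lease(core_groups: Sequence[Tuple[int, ...]],
                    free: Set[int], units: int, policy: str,
                    topology_complete: bool) -> Optional[List[int]]:
    """Pick logical CPUs for a lease, or return None when denied.

    core_groups: one tuple of logical CPU ids per physical core.
    free: logical CPUs not held by any lease.
    """
    if policy not in ("whole_core", "strict", "best_effort"):
        raise ValueError("unknown policy: %r" % policy)
    # whole_core and strict deny rather than degrade on incomplete topology.
    if policy != "best_effort" and not topology_complete:
        return None
    # strict fails closed unless every core exposes exactly one thread.
    if policy == "strict" and any(len(g) != 1 for g in core_groups):
        return None
    if policy == "best_effort":
        # Density mode: any free logical CPU, siblings may be split.
        picked = sorted(free)[:units]
        return picked if len(picked) == units else None
    # whole_core / strict: take only groups whose siblings are all free.
    picked: List[int] = []
    taken = 0
    for group in core_groups:
        if taken == units:
            break
        if all(cpu in free for cpu in group):
            picked.extend(group)
            taken += 1
    return sorted(picked) if taken == units else None


def cpuset_string(cpus: Sequence[int]) -> str:
    """Format a CPU list as a comma-separated cpuset.cpus value."""
    return ",".join(str(c) for c in cpus)
```

On a 4-core, 8-thread host with sibling pairs (0,4), (1,5), (2,6), (3,7) and CPU 4 already leased, a whole_core request for two cores skips core 0 entirely and yields `1,2,5,6`, so no lease shares a physical core with the holder of CPU 4.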


GPU (0x0005) — Accelerator

Description

GPU resources advertise accelerator capacity (VRAM, compute units). On Linux, these can be backed by AMD GPUs via HIP/ROCm (behind gpu-hip feature flag). NVIDIA discovery is implemented via NVML; the broader CUDA compute path remains partial.

Inventory Fields (GpuInfo)

resource_id : u128
vram_bytes : u64 (VRAM capacity in bytes)
compute_units : u32 (CUDA cores / stream processors)
flags : u32 (GPU_CUDA = bit 0, GPU_ROCM = bit 1)
health : u32

GPU Flags

| Bit | Constant | Vendor |
| --- | --- | --- |
| 0 | GPU_CUDA | NVIDIA CUDA |
| 1 | GPU_ROCM | AMD ROCm |

Share Modes

GPU resources support two lease sharing policies, controlled by the --gpu-share-mode flag on fabricbiosd control-server:

| Mode | CLI value | Behavior | Default |
| --- | --- | --- | --- |
| Exclusive | exclusive | At most one active lease per GPU. Second LEASE_ALLOC returns CapacityExceeded. | Yes |
| Fractional | fractional | Multiple concurrent leases subject to VRAM capacity accounting. | No |

Default: exclusive. Non-partitioned GPUs (RX 6600 class) do not provide hardware isolation between tenants, so exclusive leasing is the only mode that provides strong isolation guarantees. Fractional mode is intended for future use when hardware partitioning (MIG/vGPU/SR-IOV-class semantics) is available.

When a LEASE_ALLOC is denied due to exclusive mode, the response status is CapacityExceeded (same as VRAM exhaustion). The client should free the existing lease before attempting a new allocation.
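
The difference between the two share modes comes down to one extra admission check, as this Python sketch shows. The mode names and the CapacityExceeded status come from this section; the `GpuLeaseTable` class and the `"Ok"` success string are hypothetical.

```python
class GpuLeaseTable:
    """Admission control for a single GPU (illustrative only)."""

    def __init__(self, vram_bytes: int, share_mode: str = "exclusive"):
        if share_mode not in ("exclusive", "fractional"):
            raise ValueError("unknown share mode: %r" % share_mode)
        self.vram_bytes = vram_bytes
        self.share_mode = share_mode
        self.active = {}  # lease_id -> VRAM bytes reserved

    def alloc(self, lease_id: int, vram_request: int) -> str:
        # Exclusive mode: any active lease blocks a second allocation.
        if self.share_mode == "exclusive" and self.active:
            return "CapacityExceeded"
        # Both modes: the request must fit in the remaining VRAM.
        if sum(self.active.values()) + vram_request > self.vram_bytes:
            return "CapacityExceeded"
        self.active[lease_id] = vram_request
        return "Ok"

    def free(self, lease_id: int) -> None:
        # Releasing an unknown lease is a no-op in this sketch.
        self.active.pop(lease_id, None)
```

Because both denial paths return the same status, a client cannot distinguish an exclusive-mode conflict from VRAM exhaustion by the status alone.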

Enforcement Mechanisms

| Platform | Mechanism | Status |
| --- | --- | --- |
| Linux (fabricbiosd, gpu-hip) | HIP device submit | Implemented behind feature flag |
| Linux (fabricbiosd, gpu-cuda) | NVML discovery / partial CUDA path | Discovery implemented; compute path partial |
| Pi5 bare-metal | N/A | No GPU on Pi5 |

Lease Lifecycle

GPU lease management follows the same pattern as MEM: allocate, bind, use, expire/revoke. The data-plane binding is vendor-specific (HIP/ROCm API calls). Lease expiry triggers session/context teardown.


NET (0x0004) — Network Interface

Description

Network resources advertise NIC capacity (link speed, MTU). On Linux, these can be backed by macvlan or SR-IOV virtual functions.

Inventory Fields (NicInfo)

resource_id : u128
speed_bps : u64 (link speed in bits per second)
mtu : u32
flags : u32 (NIC_SRIOV = bit 0, NIC_LINK_UP = bit 1)
health : u32

NIC Flags

| Bit | Constant | Feature |
| --- | --- | --- |
| 0 | NIC_SRIOV | SR-IOV capable |
| 1 | NIC_LINK_UP | Link is up |

Enforcement Mechanisms

| Platform | Mechanism | Status |
| --- | --- | --- |
| Linux (fabricbiosd) | macvlan / SR-IOV VF assignment | Implemented; real SR-IOV hardware validation still pending |
| Pi5 bare-metal | N/A | Single NIC, not subdivided |

Lease Lifecycle

NET leases assign a VF or macvlan interface for the lease duration on Linux. Expiry tears down the virtual interface. End-to-end SR-IOV passthrough validation on real hardware remains a future milestone.


Common Patterns

Binding TLV Format

All data-plane bindings use a common TLV encoding:

TLV_LEASE_ID (0x0101) : u128 lease identifier
TLV_DP_KEY (0x0102) : [u8; 32] per-lease HMAC key
TLV_UDP_ENDPOINT (0x0103) : [u8; 16] IPv6 addr + u16 port
TLV_LIMITS (0x0104) : u64 offset + u64 length

RDMA bindings add:

TLV_RDMA_RKEY (0x0201) : u32 remote key
TLV_RDMA_REMOTE_ADDR (0x0202) : u64 remote address
TLV_RDMA_QP_NUM (0x0203) : u32 queue pair number
TLV_RDMA_GID (0x0204) : [u8; 16] GID
TLV_RDMA_PORT (0x0205) : u8 port number
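
A Python sketch of building and parsing the four common binding TLVs follows. The type codes and value layouts are taken from the list above. The TLV header itself is not specified in this document, so the sketch assumes a u16 type followed by a u16 length, both big-endian; treat that framing, and all function names, as hypothetical.

```python
import ipaddress
import struct

TLV_LEASE_ID = 0x0101
TLV_DP_KEY = 0x0102
TLV_UDP_ENDPOINT = 0x0103
TLV_LIMITS = 0x0104


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Assumed framing: u16 type, u16 length, then the value bytes."""
    return struct.pack(">HH", tlv_type, len(value)) + value


def encode_binding(lease_id: int, dp_key: bytes, addr: str, port: int,
                   offset: int, length: int) -> bytes:
    """Encode lease_id, dp_key, UDP endpoint, and limits as TLVs."""
    if len(dp_key) != 32:
        raise ValueError("dp_key must be 32 bytes")
    ip = ipaddress.IPv6Address(addr).packed  # 16-byte IPv6 address
    return b"".join([
        encode_tlv(TLV_LEASE_ID, lease_id.to_bytes(16, "big")),
        encode_tlv(TLV_DP_KEY, dp_key),
        encode_tlv(TLV_UDP_ENDPOINT, ip + struct.pack(">H", port)),
        encode_tlv(TLV_LIMITS, struct.pack(">QQ", offset, length)),
    ])


def decode_tlvs(buf: bytes) -> dict:
    """Return {type: value}, rejecting truncated records."""
    out = {}
    off = 0
    while off < len(buf):
        if off + 4 > len(buf):
            raise ValueError("truncated TLV header")
        tlv_type, n = struct.unpack_from(">HH", buf, off)
        off += 4
        if off + n > len(buf):
            raise ValueError("truncated TLV value")
        out[tlv_type] = buf[off:off + n]
        off += n
    return out
```

A decoder written this way skips unknown type codes naturally, which would let RDMA TLVs (0x0201 to 0x0205) ride alongside the common set without breaking older clients.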

Fencing

All resource types follow the same fencing contract:

  1. Lease expires or is revoked.
  2. Node attempts data-plane teardown (invalidate rkey, close session, destroy QP, etc.).
  3. If teardown succeeds: resource is available for new leases.
  4. If teardown fails: resource enters FENCED state.
    • No new leases granted.
    • Resource reports FENCED flag in discovery.
    • Remediation (reset, power cycle) required to clear fenced state.
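
The fencing contract is a small state machine, sketched here in Python. The `Resource` class and its method names are hypothetical; the behavior follows the numbered steps: a failed teardown sets the FENCED flag, blocks new leases, and stays that way until an explicit remediation step clears it. Treating an exception from the teardown callback as a failure is an assumption of this sketch.

```python
FLAG_FENCED = 1 << 0  # flat-format flags bit 0


class Resource:
    """Minimal model of one leasable resource."""

    def __init__(self):
        self.flags = 0
        self.leased = False

    def can_lease(self) -> bool:
        return not self.leased and not (self.flags & FLAG_FENCED)

    def alloc(self) -> bool:
        if not self.can_lease():
            return False
        self.leased = True
        return True

    def end_lease(self, teardown) -> bool:
        """Run data-plane teardown after expiry or revocation.

        `teardown` is a callable returning True on success.
        """
        try:
            ok = bool(teardown())
        except Exception:
            ok = False
        self.leased = False
        if not ok:
            self.flags |= FLAG_FENCED  # reported in discovery
        return ok

    def remediate(self) -> None:
        """Clear the fenced state after a reset or power cycle."""
        self.flags &= ~FLAG_FENCED
```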

Auto-Detection (Linux)

When fabricbiosd control-server --auto-detect is used, the Linux platform detects:

  • Memory: Total system memory from /proc/meminfo.
  • CPU: Core/thread count, frequency from /proc/cpuinfo and /sys/devices/system/cpu/.
  • NICs: Network interfaces from /sys/class/net/, including speed, MTU, SR-IOV VF count.
  • GPUs: AMD GPUs via HIP (when gpu-hip feature enabled), NVIDIA GPUs via sysfs.
  • Block devices: From /sys/block/, including size and read-only status.
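
As a rough illustration of the memory and block-device probes, the following Python sketch parses the relevant file contents. It relies on two Linux conventions: `/proc/meminfo` reports `MemTotal` in units labeled kB that are 1024 bytes each, and `/sys/block/<dev>/size` counts 512-byte sectors regardless of the device's logical block size. The function names are hypothetical, and how fabricbiosd itself reads these files is not described in this document.

```python
def parse_memtotal_bytes(meminfo_text: str) -> int:
    """Extract MemTotal from /proc/meminfo contents, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Example line: "MemTotal:       16384256 kB"
            parts = line.split()
            return int(parts[1]) * 1024
    raise ValueError("MemTotal not found")


def block_capacity_bytes(size_text: str) -> int:
    """Convert /sys/block/<dev>/size contents (512-byte sectors) to bytes."""
    return int(size_text.strip()) * 512


def block_is_read_only(ro_text: str) -> bool:
    """Interpret /sys/block/<dev>/ro contents ("0" or "1")."""
    return ro_text.strip() == "1"
```

Keeping the parsers separate from file access makes them testable with captured text; a caller would pass in the result of reading the corresponding file.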