fabricBIOS wire encoding (v0)
This document defines the normative v0 wire encoding used by the Rust reference implementation. It removes ambiguity around variable-length fields, fragmentation metadata, and version/flag handling so independent implementations can interoperate.
Conventions
All integers are big-endian.
Variable-length fields (normative v0)
- bytes := u32 len + len bytes (reject if len exceeds implementation limit).
- list := u16 count + repeated items (reject if count exceeds implementation limit).
- tlv := u8 type + u16 len + len bytes.
  - Unknown TLVs must be skippable (do not treat as fatal).
- Optional fields use a u8 present flag (0=absent, 1=present) followed by the field bytes.
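A minimal sketch of how a decoder might read a bytes field and step over a TLV under the framing above. The names `DecodeError`, `MAX_BYTES_LEN`, `read_bytes`, and `read_tlv` are hypothetical and not taken from the reference implementation, and the 64 KiB cap only stands in for an implementation-defined limit.

```rust
/// Illustrative error type; the reference implementation may differ.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    Truncated,
    TooLong,
}

/// Hypothetical per-field cap standing in for the implementation limit.
pub const MAX_BYTES_LEN: usize = 64 * 1024;

/// Reads a `bytes` field (u32 big-endian length + data) from the front of
/// `buf`, returning the field data and the remaining input.
pub fn read_bytes(buf: &[u8]) -> Result<(&[u8], &[u8]), DecodeError> {
    if buf.len() < 4 {
        return Err(DecodeError::Truncated);
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if len > MAX_BYTES_LEN {
        return Err(DecodeError::TooLong);
    }
    let rest = &buf[4..];
    if rest.len() < len {
        return Err(DecodeError::Truncated);
    }
    Ok(rest.split_at(len))
}

/// Reads one TLV (u8 type + u16 big-endian length + data) and returns
/// (type, value, remaining input). A caller that does not recognise the
/// type ignores the value and continues with the remaining input.
pub fn read_tlv(buf: &[u8]) -> Result<(u8, &[u8], &[u8]), DecodeError> {
    if buf.len() < 3 {
        return Err(DecodeError::Truncated);
    }
    let len = u16::from_be_bytes([buf[1], buf[2]]) as usize;
    let rest = &buf[3..];
    if rest.len() < len {
        return Err(DecodeError::Truncated);
    }
    let (value, tail) = rest.split_at(len);
    Ok((buf[0], value, tail))
}
```

Because `read_tlv` always consumes exactly the declared length, skipping an unknown type needs no knowledge of its contents.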
Wire version and flags (normative v0)
- Unknown protocol versions MUST be rejected (fail-closed).
- Unknown flag bits MUST be rejected (fail-closed).
- Fragmentation metadata is present only when FRAG_V2 is set (see below).
Fragmentation (normative v0)
Fragmentation uses CONTINUED/FINAL plus the FRAG_V2 flag and fragment metadata fields.
When FRAG_V2 is set, the header includes:
- frag_offset : u32 (byte offset within the full payload)
- frag_total_len : u32 (total unfragmented payload length)
This extends the header by 8 bytes; payload length remains the fragment payload only.
Rules:
- Fragments are keyed by (sender, request_id).
- Fragments with the same key MUST agree on msg_type, nonce, cleared flags, and frag_total_len (if present); otherwise the reassembly MUST be dropped.
- frag_offset + payload_len MUST NOT exceed frag_total_len.
- FINAL MUST be set on the fragment where frag_offset + payload_len == frag_total_len.
- Overlapping fragments MUST be rejected.
- CONTINUED and FINAL MUST NOT both be set in the same frame.
- Implementations SHOULD bound in-flight reassembly and drop incomplete entries after a short timeout to avoid unbounded memory growth.
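The per-frame rules above can be sketched as a small validator. `FragError`, `check_fragment`, and `overlaps` are hypothetical names, not part of the reference implementation. The sketch checks only the rules stated here: it does not decide whether FINAL on an earlier fragment is an error, and it leaves the cross-fragment agreement checks (msg_type, nonce, flags) to the caller.

```rust
#[derive(Debug, PartialEq)]
pub enum FragError {
    BothFlags,
    PastEnd,
    MissingFinal,
}

/// Checks one FRAG_V2 fragment against the per-frame rules.
pub fn check_fragment(
    frag_offset: u32,
    payload_len: u32,
    frag_total_len: u32,
    continued: bool,
    is_final: bool,
) -> Result<(), FragError> {
    if continued && is_final {
        return Err(FragError::BothFlags);
    }
    // Widen to u64 so the sum cannot wrap.
    let end = u64::from(frag_offset) + u64::from(payload_len);
    let total = u64::from(frag_total_len);
    if end > total {
        return Err(FragError::PastEnd);
    }
    // The fragment that reaches the end of the payload must carry FINAL.
    if end == total && !is_final {
        return Err(FragError::MissingFinal);
    }
    Ok(())
}

/// Returns true if [offset, offset + len) overlaps any (offset, len) range
/// already received for the same reassembly key.
pub fn overlaps(received: &[(u32, u32)], offset: u32, len: u32) -> bool {
    let start = u64::from(offset);
    let end = start + u64::from(len);
    received.iter().any(|&(o, l)| {
        let s = u64::from(o);
        let e = s + u64::from(l);
        start < e && s < end
    })
}
```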
Capability negotiation / fallback:
- A requester advertises support by setting FRAG_V2 on its request frame.
- A responder MUST use FRAG_V2 fragments only if the requester advertised support; otherwise it falls back to legacy CONTINUED/FINAL fragments with no offset metadata.
Relay discovery profile (v0)
Relays answering SOLICIT MUST use MsgType::RESPONSE with the following payload encoding:
- count : u16
- repeated count times: announce_payload_bytes : bytes (u32 length + bytes of AnnouncePayload encoding)
Rules:
- Inventory ordering is implementation-defined; clients MUST treat the list as an unordered snapshot and MAY sort by node_id for stable presentation.
- Relays SHOULD include at most one entry per node_id, choosing the most recently observed ANNOUNCE (or highest sequence when available).
- Inventory may be truncated to fit relay caps; absence of a node in a RESPONSE does not imply withdrawal or fencing.
WITHDRAW payload
| Field | Type | Bytes | Description |
|---|---|---|---|
| node_id | u128 | 16 | Node being withdrawn |
| sequence | u64 | 8 | Monotonic sequence number |
| reason | u8 | 1 | Reason code (see below) |
Total: 25 bytes.
Reason codes:
| Code | Name |
|---|---|
| 0x00 | GRACEFUL_SHUTDOWN |
| 0x01 | MAINTENANCE |
| 0x02 | RESOURCE_EXHAUSTION |
| 0x03 | POLICY |
For backwards compatibility, decoders SHOULD accept 16-byte payloads (old format) as node_id only with sequence=0, reason=GRACEFUL_SHUTDOWN.
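A minimal sketch of a decoder that accepts both the 25-byte layout and the 16-byte legacy layout described above. `Withdraw` and `decode_withdraw` are hypothetical names. The sketch does not validate the reason code, since this section does not say how unknown codes are handled.

```rust
#[derive(Debug, PartialEq)]
pub struct Withdraw {
    pub node_id: u128,
    pub sequence: u64,
    pub reason: u8,
}

pub const GRACEFUL_SHUTDOWN: u8 = 0x00;

fn be_u128(b: &[u8]) -> u128 {
    let mut a = [0u8; 16];
    a.copy_from_slice(b);
    u128::from_be_bytes(a)
}

fn be_u64(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(b);
    u64::from_be_bytes(a)
}

/// Decodes a WITHDRAW payload; returns None for any other length.
pub fn decode_withdraw(payload: &[u8]) -> Option<Withdraw> {
    match payload.len() {
        // Current format: node_id || sequence || reason.
        25 => Some(Withdraw {
            node_id: be_u128(&payload[0..16]),
            sequence: be_u64(&payload[16..24]),
            reason: payload[24],
        }),
        // Legacy format: node_id only, with the documented defaults.
        16 => Some(Withdraw {
            node_id: be_u128(payload),
            sequence: 0,
            reason: GRACEFUL_SHUTDOWN,
        }),
        _ => None,
    }
}
```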
SOLICIT filters (v0)
SOLICIT includes a query_type and a list of fixed-size filters (field, op, value[32]).
Filter field ID ranges:
- 0x00 reserved
- 0x01..=0x3F core registry (this doc)
- 0x40..=0x7F experimental/extension
- 0x80..=0xFF vendor-specific
Core registry fields:
- field=1 RESOURCE_TYPE: value[0..2] = BE u16 (e.g., CPU=0x0001, MEM=0x0002); op EQ
- field=2 NODE_ID: value[0..16] = BE u128; op EQ
- field=3 SITE_ID: value[0..4] = BE u32; ops EQ/GT/LT
- field=4 ROW_ID: value[0..4] = BE u32; ops EQ/GT/LT
- field=5 RACK_ID: value[0..4] = BE u32; ops EQ/GT/LT
- field=6 LOCALITY_CUSTOM: value[0..32] is an opaque 32-byte value; ops EQ/CONTAINS (for CONTAINS, non-zero bytes act as a position-wise mask and must match)
- field=7 RESOURCE_FLAGS: value[0..2] = BE u16 bitmask; ops EQ/CONTAINS (FENCED=0x0001, DEGRADED=0x0002; CONTAINS requires all bits set)
Current reference semantics:
- Filters are applied with AND semantics.
- RESOURCE_TYPE is evaluated against the ResourceSummary[] list (at least one resource must match all resource-scoped filters).
- NODE_ID / locality filters are evaluated against the ANNOUNCE payload.
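The two CONTAINS variants in the core registry can be sketched as follows. The function names are hypothetical. The LOCALITY_CUSTOM version reads "non-zero bytes must match" as byte-wise equality at every position where the filter byte is non-zero, which is one plausible reading rather than confirmed reference behaviour.

```rust
/// CONTAINS for RESOURCE_FLAGS: every bit set in the filter value
/// (big-endian u16 in value[0..2]) must also be set on the resource.
pub fn flags_contains(resource_flags: u16, filter_value: &[u8; 32]) -> bool {
    let wanted = u16::from_be_bytes([filter_value[0], filter_value[1]]);
    (resource_flags & wanted) == wanted
}

/// CONTAINS for LOCALITY_CUSTOM: zero filter bytes are wildcards; non-zero
/// filter bytes must equal the locality byte at the same position.
pub fn locality_contains(locality: &[u8; 32], filter_value: &[u8; 32]) -> bool {
    locality
        .iter()
        .zip(filter_value.iter())
        .all(|(have, want)| *want == 0 || have == want)
}
```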
ResourceSummary flags (implementation note)
ResourceSummary.flags is a u16 bitfield.
- bit0 FENCED: resource is fenced and must not be leased.
- bit1 DEGRADED: resource is usable but degraded.
All other bits are reserved and must be zero in the reference implementation.
Capability token caveats (implementation note)
The spec names caveat types but does not assign numeric type codes. The current reference assigns:
- TIME_BOUND = 1: data = u64 start_be || u64 end_be
- SOURCE_IP = 2: data = [u8; 16] (IPv6) or [u8; 4] (IPv4)
- RANGE = 3: data = u64 offset_be || u64 len_be
- RATE_LIMIT = 4: data = u32 max_per_sec_be (exposed as a requirement for the caller to enforce)
- DEPTH = 5: data = u8 max_delegation_edges
- AUDIENCE = 6: data = u128 audience_be (additional audience constraint)
Unknown caveat types are rejected by the reference verifier.
Revocation broadcast payload (implementation note)
The spec defines REVOKE_BROADCAST as:
- issuer : u128
- token_ids : u128[]
- until : u64
The reference uses list encoding count:u16 + repeated u128.
Certificate encoding (implementation note)
fabricbios_core::identity::Certificate is encoded as:
- subject : u128
- issuer : u128
- issued_at : u64
- expires_at : u64
- public_key : [u8; 32]
- extensions : tlv[] (list encoding with `count:u16`)
- signature : [u8; 64]
Control plane ops (implementation note)
For early development, the reference originally implemented a minimal control-plane on TCP using the common
Frame header and these payload encodings.
Current Pi5 bare-metal direction uses QUIC over UDP for the control plane (no TCP). The message payload schemas remain useful, but the transport framing is QUIC-stream based. See:
docs/platform/pi5/design-doc-1-quic-controlplane-udp-dataplanes.md
REQUEST payload
This is the normative v0 wire encoding for control REQUEST payloads. Design-level transport
and authentication requirements are in docs/spec/fabricBIOS-design-document.md (Section 11)
and docs/spec/fabricBIOS-design-document-v1.1.md (Section 12).
- op : u16
- resource_id : u128
- token : bytes
- params : bytes
- presenter_id : u128 (REQUIRED on UDP control; OPTIONAL on QUIC, where peer identity is TLS-authenticated)
- presenter_sig : [u8; 64] (REQUIRED on UDP control; OPTIONAL on QUIC, where peer identity is TLS-authenticated)
RESPONSE payload
- status : u16
- op : u16
- result : bytes
Dev op codes and params
These are not yet part of the normative spec; they are a dev scaffold:
- PING (0x0001): params empty, result u64 uptime_secs
- GET_IDENTITY (0x0003): params empty, result u128 node_id || [u8;32] controller_pubkey
- GET_INVENTORY (0x0002): params empty, result ResourceSummary[] (list encoding)
- ENROLL_REQUEST (0x0004): params u128 node_id || [u8;32] public_key, result Certificate bytes
- CAP_REQUEST (0x0100): params u32 perms || u32 ttl_secs || u128 audience, result CapabilityToken bytes
- CAP_REFRESH (0x0101): params u32 ttl_secs, request token is the token to refresh, result CapabilityToken bytes
- CAP_REVOKE (0x0102): params u128 token_id || u32 ttl_secs, request token must authorize revocation, result RevokeBroadcast bytes
- LEASE_ALLOC (0x0200): params u32 duration_secs || u32 grace_secs [|| u16 resource_type [|| u64 requested_bytes]], result u128 lease_id || u64 expires_at || binding TLVs
  - resource_type (optional, present when params are 10+ bytes): 0x0001=CPU, 0x0002=MEM, 0x0003=BLOCK, 0x0004=NET, 0x0005=GPU, 0x0010=SCHEDULER. Default: MEM.
  - RES_TYPE_SCHEDULER (0x0010): used in ANNOUNCE payloads by the grafos-scheduler-service. Capacity field carries the HTTP API port. Discovery is visibility only; trust requires an explicit --scheduler-url.
  - requested_bytes (optional, present when params are 18+ bytes): desired allocation size in bytes. Server uses its default (1 MiB) if absent.
  - Response binding TLVs include TLV_LIMITS (0x0104) whose len field reflects the actual granted region size.
- LEASE_RENEW (0x0202): params u128 lease_id || u32 duration_secs
- LEASE_FREE (0x0201): params u128 lease_id
- LEASE_QUERY (0x0203): params u128 lease_id, result u8 lease_status || u64 expires_at
  - lease_status values: 0=ACTIVE, 1=EXPIRED, 2=REVOKED
- GET_THERMAL (0x0006): params empty, result TLV sequence (extensible)
  - Response TLV tags (0x0900 range):
    - 0x0900 (4 bytes, u32) age_ms: milliseconds since the sensor was last sampled, measured at response-generation time
    - 0x0901 (4 bytes, i32) soc_temp_milli_c: SoC temperature in milli-°C
    - 0x0902 (1 byte, u8) soc_status: sensor status (0=OK, 1=Unavailable, 2=Stale)
    - 0x0903 (1 byte, u8) throttled_bits: bit 0 = thermal throttled, remaining bits reserved
    - 0x0904 (4 bytes, i32) nvme_temp_milli_c: NVMe temperature in milli-°C (optional)
    - 0x0905 (1 byte, u8) nvme_status: NVMe sensor status (same values as soc_status)
  - Missing TLVs indicate “not supported on this platform” (not a protocol error).
  - Unknown TLV tags must be skipped by consumers (forward-compatible).
  - Temperature validity bounds: -40 000 to 150 000 milli-°C. Values outside this range are malformed.
  - Older nodes without GET_THERMAL return a standard error response; consumers must handle this gracefully.
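The positional optional fields in LEASE_ALLOC params can be sketched as an encoder. `encode_lease_alloc_params` and `RES_TYPE_MEM` are hypothetical names. Because requested_bytes can only follow resource_type, this sketch assumes a caller that supplies only a size wants the documented default type (MEM) written explicitly.

```rust
pub const RES_TYPE_MEM: u16 = 0x0002;

/// Builds LEASE_ALLOC params:
/// u32 duration_secs || u32 grace_secs [|| u16 resource_type [|| u64 requested_bytes]]
pub fn encode_lease_alloc_params(
    duration_secs: u32,
    grace_secs: u32,
    resource_type: Option<u16>,
    requested_bytes: Option<u64>,
) -> Vec<u8> {
    let mut out = Vec::with_capacity(18);
    out.extend_from_slice(&duration_secs.to_be_bytes());
    out.extend_from_slice(&grace_secs.to_be_bytes());
    match (resource_type, requested_bytes) {
        // Neither optional field: the 8-byte base form.
        (None, None) => {}
        (ty, size) => {
            // Assumption: fall back to MEM when only a size is given.
            out.extend_from_slice(&ty.unwrap_or(RES_TYPE_MEM).to_be_bytes());
            if let Some(bytes) = size {
                out.extend_from_slice(&bytes.to_be_bytes());
            }
        }
    }
    out
}
```

The three possible output lengths (8, 10, and 18 bytes) correspond to the length thresholds a server uses to detect which optional fields are present.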
Scheduler fencing ops
These ops allow an external scheduler to install and query epoch fences on
managed nodes. Once a fence is installed, CAP_REQUEST on that node requires
a matching epoch trailer.
- SCHEDULER_FENCE_SET (0x0009): params u64 new_epoch
  - Installs the exact epoch value new_epoch on the node.
  - The node stores the epoch verbatim (not current + 1).
  - If a fence is already installed, new_epoch MUST be strictly greater than the installed epoch; otherwise the node returns STALE_EPOCH (0x0009).
  - If no fence is installed, any non-zero epoch is accepted as the initial installation.
  - Result: empty on success.
- SCHEDULER_FENCE_GET (0x000A): params empty
  - Returns the current fence state of the node.
  - Result: u8 installed || u64 epoch
    - installed: 0 = no scheduler fence installed, 1 = fence installed.
    - epoch: the currently installed epoch value. 0 when installed = 0.
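The fence rules above amount to a small state machine, sketched here. `FenceState` and `FenceError` are hypothetical names. The text does not say which status a node returns for an initial epoch of zero, so the `ZeroEpoch` variant is an assumption standing in for whatever error the node reports.

```rust
#[derive(Debug, PartialEq)]
pub enum FenceError {
    /// Maps to STALE_EPOCH on the wire.
    StaleEpoch,
    /// Assumption: zero is refused as an initial epoch; the status code
    /// used for this case is not specified in this section.
    ZeroEpoch,
}

#[derive(Default)]
pub struct FenceState {
    installed: Option<u64>,
}

impl FenceState {
    /// SCHEDULER_FENCE_SET: store new_epoch verbatim if it is acceptable.
    pub fn set(&mut self, new_epoch: u64) -> Result<(), FenceError> {
        match self.installed {
            Some(current) if new_epoch <= current => Err(FenceError::StaleEpoch),
            None if new_epoch == 0 => Err(FenceError::ZeroEpoch),
            _ => {
                self.installed = Some(new_epoch);
                Ok(())
            }
        }
    }

    /// SCHEDULER_FENCE_GET result: u8 installed || u64 epoch (big-endian).
    pub fn get(&self) -> [u8; 9] {
        let mut out = [0u8; 9];
        if let Some(epoch) = self.installed {
            out[0] = 1;
            out[1..].copy_from_slice(&epoch.to_be_bytes());
        }
        out
    }
}
```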
Lease list op
LEASE_QUERY (0x0203) is an existing per-lease targeted probe that returns
the status of a single lease by ID. LEASE_LIST_ACTIVE is a new bulk
snapshot operation that returns all active leases on the node.
- LEASE_LIST_ACTIVE (0x0208): params empty
  - Returns all active leases as a list.
  - Result encoding:
    - count : u16
    - repeated count times:
      - lease_id : u128
      - resource_id : u128
      - holder : u128
      - expires_at : u64
      - alloc_bytes : u64
  - Each entry is 64 bytes. The list uses standard u16 count prefix encoding.
  - Nodes SHOULD return a consistent snapshot (no new leases created or expired mid-response). If the node cannot guarantee atomicity, the response is best-effort and the caller MUST tolerate minor staleness.
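Since every entry is a fixed 64 bytes, a decoder can validate the total length up front. The sketch below assumes the result contains nothing after the list; `ActiveLease` and `decode_lease_list` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub struct ActiveLease {
    pub lease_id: u128,
    pub resource_id: u128,
    pub holder: u128,
    pub expires_at: u64,
    pub alloc_bytes: u64,
}

const ENTRY_LEN: usize = 64;

fn be_u128(b: &[u8]) -> u128 {
    let mut a = [0u8; 16];
    a.copy_from_slice(b);
    u128::from_be_bytes(a)
}

fn be_u64(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(b);
    u64::from_be_bytes(a)
}

/// Decodes a LEASE_LIST_ACTIVE result; returns None if the length does not
/// match the declared count.
pub fn decode_lease_list(result: &[u8]) -> Option<Vec<ActiveLease>> {
    if result.len() < 2 {
        return None;
    }
    let count = u16::from_be_bytes([result[0], result[1]]) as usize;
    let body = &result[2..];
    if body.len() != count * ENTRY_LEN {
        return None;
    }
    Some(
        body.chunks_exact(ENTRY_LEN)
            .map(|e| ActiveLease {
                lease_id: be_u128(&e[0..16]),
                resource_id: be_u128(&e[16..32]),
                holder: be_u128(&e[32..48]),
                expires_at: be_u64(&e[48..56]),
                alloc_bytes: be_u64(&e[56..64]),
            })
            .collect(),
    )
}
```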
Control-plane status codes
| Code | Name | Description |
|---|---|---|
| 0x0000 | OK | Success |
| 0x0001 | INVALID_TOKEN | Token is malformed, expired, or revoked |
| 0x0002 | INSUFFICIENT_PERM | Token lacks required permission |
| 0x0003 | RESOURCE_NOT_FOUND | Target resource does not exist |
| 0x0004 | RESOURCE_BUSY | Resource is in use and cannot be modified |
| 0x0005 | CAPACITY_EXCEEDED | Node cannot satisfy the requested allocation |
| 0x0006 | LEASE_EXPIRED | The referenced lease has expired |
| 0x0007 | RATE_LIMITED | Too many requests from this sender |
| 0x0008 | RESOURCE_FENCED | Resource is fenced (teardown failure or admin action) |
| 0x0009 | STALE_EPOCH | The scheduler epoch in the request is older than the installed fence |
| 0x000A | FENCE_REQUIRED | The node requires a scheduler fence but none is installed |
| 0x00FF | INTERNAL_ERROR | Unrecoverable internal error |
Scheduler substrate state model
The scheduler maintains the following state categories. These are not on-wire types but are referenced by the protocol extensions above and by scheduler service implementations.
- PendingGrant: The scheduler has issued authority (reserved capacity and quota) for a lease, but no real lease exists on the fabric yet. The client has not yet called LEASE_ALLOC or has not confirmed the result. Pending grants expire if confirmation does not arrive within the token TTL window.
- ConfirmedLease: The client has reported back a real lease_id from a successful LEASE_ALLOC. The scheduler can now attribute this lease to a tenant and policy domain, and it becomes eligible for preemption and lifecycle tracking.
- UnattributedLease: A node reports an active lease (via LEASE_LIST_ACTIVE) that the scheduler cannot map to any known pending grant or confirmed lease. This arises from direct client allocations, stale scheduler state, or crash recovery gaps. Unattributed leases consume node capacity but are excluded from tenant quota and automatic preemption until they are explicitly attributed or cleared by an operator.
- LeaderEpoch: A monotonically increasing u64 epoch counter identifying the current scheduler leader’s term. Each promotion increments the epoch and installs it on all managed nodes via SCHEDULER_FENCE_SET. Token minting on managed nodes requires the request to carry the currently installed epoch.
- SchedulerRole:
  - Standby: Read-only. The scheduler replays its WAL and reconciles state but does not serve mutating APIs (lease admission, token minting, preemption). A standby can be promoted to active via manual intervention.
  - Active: Serves all mutating APIs. Promotion requires WAL replay, reconciliation, and successful epoch fence installation on all healthy managed nodes.
Dev-only service/network scaffold (lease-bound ingress TCP proxy):
- SVC_TCP_BIND (0x0400): params u32 duration_secs || u16 backend_port, result u128 lease_id || u64 expires_at || u16 ingress_port
  - ingress_port is the leased TCP listener port on localhost (127.0.0.1) that proxies to backend_port while the lease is active.
  - The request token must include WRITE permission.
- SVC_TCP_QUERY (0x0403): params u128 lease_id, result u8 lease_status || u64 expires_at || u16 ingress_port
  - ingress_port is 0 if unknown/not tracked by the implementation (the socketed simulator tracks it while the gate is active).
  - lease_status values: 0=ACTIVE, 1=EXPIRED, 2=REVOKED
Tasklet execution ops
These ops submit and manage WASM tasklet invocations on fabric nodes. The tasklet executor runs a sandboxed WASM module with fuel-metered execution, capability-scoped resource access, and configurable limits.
Tasklet status codes (u8):
| Code | Name |
|---|---|
| 0 | OK |
| 1 | INVALID |
| 2 | UNAUTHORIZED |
| 3 | FUEL_EXHAUSTED |
| 4 | OOM |
| 5 | EXEC_FAILED |
| 6 | HOSTCALL_LIMIT |
| 7 | UNSUPPORTED_PROFILE |
| 8 | MODULE_NOT_FOUND |
| 9 | UNKNOWN |
| 10 | NOT_FOUND |
| 11 | UNSUPPORTED |
| 12 | FINISHED |
| 13 | LISTENER_REVOKED |
| 14 | LISTENER_FENCED |
| 15 | MAX_SESSIONS |
| 16 | SESSION_CLOSED |
Auxiliary structures:
CapEntry (20 bytes):
| Field | Type | Bytes | Description |
|---|---|---|---|
| cap_token | u128 | 16 | Capability token granting access to a lease resource |
| rights | u32 | 4 | Bitmask of rights granted for this token |
Rights bitmask bits:
| Bit | Name |
|---|---|
| 0 | QUERY |
| 1 | REVOKE |
| 2 | FBMU_ALLOC |
| 3 | FBMU_LEASE_IO |
| 4 | FBBU_ALLOC |
| 5 | FBBU_LEASE_IO |
| 6 | SVC_LISTEN |
| 7 | SVC_SESSION_IO |
TaskletLimits (20 bytes):
| Field | Type | Bytes | Description |
|---|---|---|---|
| max_fuel | u64 | 8 | WASM fuel units (instruction budget) |
| max_linear_memory | u32 | 4 | Max WASM linear memory in bytes |
| max_output_bytes | u32 | 4 | Max output bytes from tasklet_run |
| max_hostcalls | u32 | 4 | Total hostcall invocations allowed |
- TASKLET_SUBMIT (0x0500): Submit a WASM tasklet for execution.
  Request params:
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | resource_id | u128 | 16 | MEM resource (WASM linear memory arena) |
  | cpu_resource_id | u128 | 16 | CPU resource (compute time); zero = legacy |
  | module_sha256 | [u8; 32] | 32 | SHA-256 of the WASM module |
  | wasm | bytes | 4 + len | WASM module bytes (empty = submit-by-hash) |
  | caps | list(CapEntry) | 2 + N*20 | Capability entries granted to the tasklet |
  | input | bytes | 4 + len | Input message (typically TaskletInputV0) |
  | limits | TaskletLimits | 20 | Resource limits for this invocation |
  Response result:
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | negotiated_profile_version | u16 | 2 | Profile version the runtime negotiated |
  | status | u8 | 1 | Tasklet status code |
  | tasklet_id | u64 | 8 | Node-assigned invocation ID |
  | output | bytes | 4 + len | Output from tasklet_run |
  | log_count | u16 | 2 | Optional trailer: number of fb_log records; absent in legacy responses |
  | logs | list(LogRecord) | log_count entries | Optional trailer entries; omitted when log_count is absent |
  LogRecord entries are encoded as:
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | level | u8 | 1 | fb_log level supplied by the tasklet |
  | message | bytes | 4 + len | Raw message bytes after hostcall-boundary clamping |
  Decoders MUST accept legacy TASKLET_SUBMIT responses that end immediately after output and treat them as log_count = 0. Encoders in this version SHOULD include the trailer, even when log_count = 0, so downstream cell schedulers can persist first-class run log artifacts without inferring logs from the tasklet output payload.
- TASKLET_STATUS (0x0501): Query the execution status of a submitted tasklet.
  Request params:
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | tasklet_id | u64 | 8 | Tasklet invocation ID |
  Response result:
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | status | u8 | 1 | Tasklet status code |
- TASKLET_FETCH_RESULT (0x0502): Fetch the output of a completed tasklet.
  Request params:
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | tasklet_id | u64 | 8 | Tasklet invocation ID |
  Response result:
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | status | u8 | 1 | Tasklet status code |
  | output | bytes | 4 + len | Tasklet output data |
- TASKLET_CANCEL (0x0503): Cancel a running tasklet.
  Request params:
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | tasklet_id | u64 | 8 | Tasklet invocation ID |
  Response result:
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | status | u8 | 1 | Tasklet status code |
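The legacy-compatibility rule for the TASKLET_SUBMIT log trailer can be sketched as a reader over whatever bytes remain after output has been decoded. `read_log_trailer` is a hypothetical helper name, and the sketch applies no per-message size cap, which a real decoder would add.

```rust
/// Parses the optional log trailer that follows `output` in a
/// TASKLET_SUBMIT response. An empty remainder is a legacy response and is
/// treated as log_count = 0. Returns (level, message) pairs.
pub fn read_log_trailer(mut rest: &[u8]) -> Option<Vec<(u8, Vec<u8>)>> {
    if rest.is_empty() {
        return Some(Vec::new());
    }
    if rest.len() < 2 {
        return None;
    }
    let count = u16::from_be_bytes([rest[0], rest[1]]);
    rest = &rest[2..];
    let mut logs = Vec::new();
    for _ in 0..count {
        // Each LogRecord is u8 level + bytes message (u32 length + data).
        if rest.len() < 5 {
            return None;
        }
        let level = rest[0];
        let len = u32::from_be_bytes([rest[1], rest[2], rest[3], rest[4]]) as usize;
        rest = &rest[5..];
        if rest.len() < len {
            return None;
        }
        logs.push((level, rest[..len].to_vec()));
        rest = &rest[len..];
    }
    Some(logs)
}
```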
GPU operations
These ops submit compiled GPU kernels for execution on leased GPU devices.
GPU status codes (u8):
| Code | Name |
|---|---|
| 0 | OK |
| 1 | INVALID |
| 2 | UNAUTHORIZED |
| 3 | LOAD_FAILED |
| 4 | LAUNCH_FAILED |
| 5 | UNSUPPORTED |
| 6 | READ_FAILED |
- GPU_SUBMIT (0x0600): Load and launch a compiled GPU kernel (HSACO/PTX) on a leased GPU device.
  Request params:
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | resource_id | u128 | 16 | GPU resource from GET_INVENTORY |
  | binary | bytes | 4 + len | Compiled GPU binary (HSACO for ROCm, PTX/cubin for CUDA) |
  | kernel_name | u16 len + UTF-8 | 2 + len | Kernel function name within the module |
  | grid_dim | [u32; 3] | 12 | Grid dimensions (x, y, z) |
  | block_dim | [u32; 3] | 12 | Block dimensions (x, y, z) |
  | args | bytes | 4 + len | Serialized kernel arguments (packed contiguously) |
  | arg_sizes | list(u32) | 2 + N*4 | Per-argument byte sizes (sum must equal args length) |
  | output_offset | u64 | 8 | Offset in lease GPU memory to read after kernel |
  | output_size | u32 | 4 | Bytes to copy device-to-host (0 = no output) |
  Response result:
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | status | u8 | 1 | GPU status code |
  | submit_id | u64 | 8 | Node-assigned monotonic submit ID |
  | exit_code | i32 | 4 | Kernel exit/error code (0 = success); wire-encoded as u32 |
  | output | bytes | 4 + len | Output data (device-to-host results) |
Composite tasklet lease ops
These ops allocate and free composite leases that bundle CPU + MEM resources with affinity constraints, providing a single lease covering both compute and memory for tasklet execution.
Affinity values (u8):
| Code | Name | Description |
|---|---|---|
| 0 | SAME_NODE | CPU + MEM on the same physical node (default) |
| 1 | SAME_RACK | CPU on one node, MEM on a nearby node in the same rack (future) |
| 2 | ANY | Scheduler picks the cheapest combination (future) |
- TASKLET_LEASE_ALLOC (0x0800): Allocate a composite CPU + MEM lease.
  Request params (20 bytes, fixed):
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | duration_secs | u32 | 4 | Requested lease duration in seconds |
  | grace_secs | u32 | 4 | Grace period before teardown after expiry |
  | num_cores | u16 | 2 | Number of CPU cores requested |
  | mem_bytes | u64 | 8 | Memory size in bytes |
  | affinity | u8 | 1 | TaskletAffinity value |
  | max_threads | u8 | 1 | Advisory only in Tasklet Profile v0; current runtimes keep execution width fixed at 1 |
  Response result (66 bytes, fixed):
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | lease_id | u128 | 16 | Composite lease identifier |
  | expires_at | u64 | 8 | Lease expiry timestamp (seconds since epoch) |
  | mem_resource_id | u128 | 16 | Allocated MEM resource ID |
  | cpu_resource_id | u128 | 16 | Allocated CPU resource ID |
  | arena_size | u64 | 8 | Actual granted memory arena size in bytes |
  | num_cores | u16 | 2 | Actual granted core count |
- TASKLET_LEASE_FREE (0x0801): Free a composite tasklet lease.
  Request params:
  | Field | Type | Bytes | Description |
  |---|---|---|---|
  | lease_id | u128 | 16 | Composite lease to free |
  Response: standard control-plane RESPONSE with empty result on success.
WASM host ABI (implementation note, v0 in-place update)
The fabricbios_fbmu_v0 and fabricbios_fbbu_v0 WASM import modules keep the
legacy hello/read/write/get-size calls and now also expose lease-aware calls
in-place under the same v0 module names.
fabricbios_fbmu_v0 additions:
- fbmu_alloc(min_bytes, lease_secs) -> lease_id + expires_at + arena_size
- fbmu_query(lease_id) -> lease_status + expires_at + arena_size
- fbmu_renew(lease_id, duration_secs) -> expires_at
- fbmu_free(lease_id)
- fbmu_write_lease(lease_id, offset, ptr, len)
- fbmu_read_lease(lease_id, offset, len, out_ptr)
fabricbios_fbbu_v0 additions:
- fbbu_alloc(min_blocks, lease_secs) -> lease_id + expires_at + num_blocks + block_size
- fbbu_query(lease_id) -> lease_status + expires_at + num_blocks + block_size
- fbbu_renew(lease_id, duration_secs) -> expires_at
- fbbu_free(lease_id)
- fbbu_write_block_lease(lease_id, lba, ptr)
- fbbu_read_block_lease(lease_id, lba, out_ptr)
Lease status values used by query calls:
- 0 = ACTIVE
- 1 = EXPIRED
- 2 = REVOKED
bytes
Encoded as:
- len : u32
- data : [u8; len]
Lists
Encoded as:
- count : u16
- items : repeated `count` times
TLV
Encoded as:
- type : u8
- length : u16
- data : [u8; length]
Optional TLV fields
Encoded as:
- present : u8 (0/1)
- tlv : TLV (only if present=1)
RDMA dataplane binding TLVs
RDMA (RoCE v2) binding credentials are returned to the client after lease creation. The client uses these fields to set up its own QP and perform RDMA READ/WRITE operations against the server’s registered memory region.
Tag range: 0x02xx.
| Tag | Name | Length | Description |
|---|---|---|---|
| 0x0201 | RDMA_RKEY | 4 | Remote key for RDMA access (big-endian u32) |
| 0x0202 | RDMA_REMOTE_ADDR | 8 | Virtual address of the registered memory region (big-endian u64) |
| 0x0203 | RDMA_QP_NUM | 4 | Queue Pair number on the server (big-endian u32) |
| 0x0204 | RDMA_GID | 16 | GID (Global Identifier) of the RDMA port (128-bit, IPv6-format) |
| 0x0205 | RDMA_PORT | 2 | RDMA port number on the server HCA (big-endian u16) |
Encoded using the standard TLV framing (u16 tag + u16 length + data).
Unknown TLV tags in the 0x02xx range MUST be skipped (not treated as fatal).
NVMe-oF dataplane binding TLVs
NVMe-oF binding credentials are returned to the client after lease creation.
The client uses these fields to connect to the NVMe-oF target subsystem
via nvme connect.
Tag range: 0x03xx.
| Tag | Name | Length | Description |
|---|---|---|---|
| 0x0301 | NVMEOF_NQN | variable | NVMe Qualified Name of the target subsystem (UTF-8) |
| 0x0302 | NVMEOF_TRADDR | variable | Transport address (IP address, UTF-8) |
| 0x0303 | NVMEOF_TRSVCID | 2 | Transport service ID (TCP port, big-endian u16) |
| 0x0304 | NVMEOF_TRTYPE | 1 | Transport type: 0=tcp, 1=rdma, 2=loop |
NQN format: nqn.2026-02.io.fabricbios:lease:<lease_id_hex_32>.
Encoded using the standard TLV framing (u16 tag + u16 length + data).
Unknown TLV tags in the 0x03xx range MUST be skipped (not treated as fatal).
The initial transport is TCP-only (trtype=0); RDMA transport support (trtype=1) is
available but optional and requires rdma-core on both sides.
SR-IOV dataplane binding TLVs
SR-IOV binding credentials identify a Virtual Function and its parent Physical Function. Used when a lease allocates an SR-IOV VF for direct device assignment.
Tag range: 0x05xx.
| Tag | Name | Length | Description |
|---|---|---|---|
| 0x0501 | SRIOV_VF_PCI_ADDR | 4 | PCI BDF address of the VF ([domain_hi, domain_lo, bus, devfn] where devfn = (device << 3) \| function) |
| 0x0502 | SRIOV_VF_INDEX | 2 | VF index within the PF (big-endian u16) |
| 0x0503 | SRIOV_PF_PCI_ADDR | 4 | PCI BDF address of the parent PF (same encoding as SRIOV_VF_PCI_ADDR) |
Encoded using the standard TLV framing (u16 tag + u16 length + data).
Unknown TLV tags in the 0x05xx range MUST be skipped (not treated as fatal).
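The 4-byte PCI address value used by SRIOV_VF_PCI_ADDR and SRIOV_PF_PCI_ADDR can be sketched as below. `encode_pci_addr` is a hypothetical helper name. Masking device to 5 bits and function to 3 bits follows the usual PCI devfn layout and is an assumption beyond what the table states.

```rust
/// Packs a PCI address as [domain_hi, domain_lo, bus, devfn],
/// where devfn = (device << 3) | function.
pub fn encode_pci_addr(domain: u16, bus: u8, device: u8, function: u8) -> [u8; 4] {
    let d = domain.to_be_bytes();
    // Assumption: device is 5 bits and function is 3 bits, as in standard PCI.
    let devfn = ((device & 0x1F) << 3) | (function & 0x07);
    [d[0], d[1], bus, devfn]
}
```

For example, address 0000:3b:02.1 packs to the bytes 00 00 3B 11.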
OFI (libfabric) dataplane binding TLVs
OFI binding credentials are returned to the client after lease creation when the server supports OpenFabrics Interfaces (libfabric). The binding specifies the access model (one-sided RMA or two-sided messaging) and endpoint address.
Tag range: 0x06xx.
| Tag | Name | Length | Description |
|---|---|---|---|
| 0x0601 | OFI_PROVIDER | variable | Provider name (UTF-8, e.g. “efa”, “tcp”, “verbs”) |
| 0x0602 | OFI_FABRIC_NAME | variable | Fabric name from fi_fabric_attr (UTF-8) |
| 0x0603 | OFI_EP_TYPE | 1 | Endpoint type: 1=FI_EP_MSG, 2=FI_EP_DGRAM, 3=FI_EP_RDM |
| 0x0604 | OFI_ADDR | variable | Endpoint address (opaque, from fi_getname) |
| 0x0605 | OFI_ACCESS_MODEL | 1 | Access model: 0x01=RMA (one-sided), 0x02=MSG (two-sided) |
| 0x0606 | OFI_CAPABILITY_FLAGS | 8 | Capability flags from fi_info->caps (big-endian u64) |
| 0x0607 | OFI_MAX_MSG_SIZE | 8 | Maximum message size (big-endian u64) |
| 0x0608 | OFI_PROTOCOL_VERSION | 1 | Two-sided message framing version (currently 1) |
| 0x0609 | OFI_LEASE_ID | 8 | Lease ID this binding is associated with (big-endian u64) |
Encoded using the standard TLV framing (u16 tag + u16 length + data).
Unknown TLV tags in the 0x06xx range MUST be skipped (not treated as fatal).
Two-sided messaging protocol (access_model=0x02)
When OFI_ACCESS_MODEL is 0x02 (MSG), data-plane operations use framed send/recv
with server-side lease validation on every request.
Request header (32 bytes, big-endian):
| Offset | Size | Field |
|---|---|---|
| 0 | 8 | lease_id |
| 8 | 8 | generation (high byte = opcode: 0x01=WRITE, 0x02=READ) |
| 16 | 8 | offset |
| 24 | 8 | len |
For WRITE requests, len bytes of payload follow the header.
Response (1 + N bytes):
| Offset | Size | Field |
|---|---|---|
| 0 | 1 | status: 0x00=OK, 0x10=LEASE_REVOKED, 0x11=LEASE_EXPIRED, 0x12=STALE_GENERATION |
| 1 | N | Payload (READ response data; empty for WRITE) |
The server MUST validate (lease_id, generation) against its lease table before
servicing any request. Rejection codes:
- LEASE_REVOKED (0x10): lease has been revoked.
- LEASE_EXPIRED (0x11): lease has expired.
- STALE_GENERATION (0x12): generation counter does not match (indicates a stale client).
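A minimal sketch of building the 32-byte request header for the two-sided protocol. The function and constant names are hypothetical. The table states only that the high byte of the generation field carries the opcode, so treating the remaining 56 bits as the generation counter is an assumption.

```rust
pub const OP_WRITE: u8 = 0x01;
pub const OP_READ: u8 = 0x02;

/// Builds the big-endian request header:
/// lease_id || generation (opcode in the high byte) || offset || len.
pub fn encode_request_header(
    lease_id: u64,
    opcode: u8,
    generation: u64,
    offset: u64,
    len: u64,
) -> [u8; 32] {
    // Assumption: the low 56 bits hold the generation counter.
    let gen_word = (u64::from(opcode) << 56) | (generation & 0x00FF_FFFF_FFFF_FFFF);
    let mut hdr = [0u8; 32];
    hdr[0..8].copy_from_slice(&lease_id.to_be_bytes());
    hdr[8..16].copy_from_slice(&gen_word.to_be_bytes());
    hdr[16..24].copy_from_slice(&offset.to_be_bytes());
    hdr[24..32].copy_from_slice(&len.to_be_bytes());
    hdr
}
```

For a WRITE, the caller would send this header followed by len bytes of payload.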
Request param TLV tags
Tag range: 0x04xx.
| Tag | Name | Size | Description |
|---|---|---|---|
| 0x0401 | REQUESTED_BYTES | 8 | Desired allocation size in bytes (big-endian u64) |
Used in LEASE_ALLOC request params as an optional extension. Currently encoded as a flat params extension at byte offset 10 rather than as a TLV wrapper. The tag constant is reserved for future TLV-based param encoding.
0x08xx — Inventory Tier TLVs
These TLVs extend the resource inventory with tier-level memory and GPU attachment
metadata. They use the standard TLV framing (u16 tag + u16 length + value).
Unknown tags in the 0x08xx range MUST be skipped (not treated as fatal).
TLV_MEM_TIER_INFO (0x0801)
Describes a single memory tier exported by a node.
TLV header:
- tag : u16 = 0x0801
- length : u16 (value length in bytes)
Value payload (all big-endian):
| Offset | Field | Type | Description |
|---|---|---|---|
| 0 | tier_kind | u16 | Memory tier classification (see below) |
| 2 | capacity_bytes | u64 | Total capacity in bytes |
| 10 | latency_class_ns | u32 | Approximate local access latency in nanoseconds |
| 14 | bandwidth_hint_bps | u64 | Approximate bandwidth hint in bits per second |
| 22 | page_granule_bytes | u32 | Minimum allocation granularity in bytes (e.g. 4096) |
| 26 | share_mode | u16 | Sharing mode (see below) |
| 28 | attach_domains_count | u16 | Number of attach domain entries |
| 30 | attach_domains | [u128; N] | NUMA/CXL/PCIe domain identifiers (N = attach_domains_count) |
| 30+N*16 | flags | u32 | Tier-specific flags (see below) |
Total value length: 34 + attach_domains_count * 16 bytes.
tier_kind values:
| Value | Name | Description |
|---|---|---|
| 0 | Dram | Standard DDR DRAM |
| 1 | Hbm | High Bandwidth Memory (HBM2/HBM3) |
| 2 | CxlExpander | CXL Type 3 memory expander |
| 3 | Pmem | Persistent memory (Intel Optane, CXL PM) |
| 4 | VramVisibleHost | GPU VRAM visible to host CPU via BAR |
| 5 | Block | Block storage (cold cache tier only, not CPU-addressable) |
Unknown tier_kind values MUST be rejected (InvalidValue).
share_mode values:
| Value | Name |
|---|---|
| 0 | EXCLUSIVE |
| 1 | SHARED_RO |
| 2 | SHARED_RW |
flags bits:
| Bit | Name | Description |
|---|---|---|
| 0 | MEM_TIER_ECC | ECC protection enabled |
| 1 | MEM_TIER_HOTPLUG | Hotpluggable memory |
| 2 | MEM_TIER_INTERLEAVED | Interleaved across channels |
TLV_GPU_ATTACH_INFO (0x0802)
Describes how a GPU resource attaches to the memory fabric.
TLV header:
- tag : u16 = 0x0802
- length : u16 (value length in bytes)
Value payload (all big-endian):
| Offset | Field | Type | Description |
|---|---|---|---|
| 0 | gpu_resource_id | u128 | Resource ID of the GPU |
| 16 | peer_group_id | u128 | Peer group for P2P transfers (e.g. NVLink domain) |
| 32 | vram_bytes | u64 | VRAM capacity in bytes |
| 40 | supports_host_map | u8 | 1 = GPU VRAM is mappable by host CPU, 0 = not |
| 41 | supports_remote_map | u8 | 1 = GPU VRAM is remotely accessible via fabric, 0 = not |
| 42 | preferred_mem_tiers_count | u16 | Number of preferred tier entries |
| 44 | preferred_mem_tiers | [u16; N] | MemTierKind values ordered by preference (N = count) |
Total value length: 44 + preferred_mem_tiers_count * 2 bytes.
0x09xx — Lease Intent TLVs
Lease intent TLVs are advisory placement hints carried in LEASE_ALLOC request
params. The allocator uses them for placement scoring but the resulting lease is
still a standard memory/block lease. Unknown or unsupported intent TLVs are silently
ignored (forward-compatible).
TLV_LEASE_INTENT_KV_CACHE (0x0901)
Advisory placement hint indicating the lease will be used as a KV cache for LLM inference.
TLV header:
- tag : u16 = 0x0901
- length : u16 (value length: 41 or 57)
Value payload (all big-endian):
| Offset | Field | Type | Description |
|---|---|---|---|
| 0 | model_fingerprint | [u8; 32] | SHA-256 fingerprint of the model |
| 32 | preferred_tier | u16 | Preferred memory tier (see below) |
| 34 | page_bytes | u32 | Page size the runtime will use for cache pages |
| 38 | sharing_mode | u16 | Sharing mode (see below) |
| 40 | has_gpu | u8 | 0x00 = no GPU, 0x01 = GPU resource ID follows |
| 41 | gpu_resource_id | u128 | (only present when has_gpu = 0x01) GPU to co-locate with |
Total value length: 41 bytes (no GPU) or 57 bytes (with GPU).
preferred_tier values:
| Value | Name |
|---|---|
| 0x0001 | HBM — High-bandwidth memory |
| 0x0002 | DDR — Standard DDR4/DDR5 |
| 0x0003 | CXL — CXL-attached memory |
| 0x0004 | PMEM — Persistent memory |
sharing_mode values:
| Value | Name | Description |
|---|---|---|
| 0x0000 | EXCLUSIVE | Single consumer, no sharing |
| 0x0001 | READ_SHARED | Multiple readers, single writer |
| 0x0002 | COPY_ON_WRITE | Readers see snapshot; writer gets private copy |
Invalid has_gpu values (not 0x00 or 0x01) MUST be rejected.
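The two value lengths (41 and 57) follow directly from whether the GPU resource ID is present, as this encoder sketch shows. `encode_kv_cache_intent` is a hypothetical name, and the u16 tag + u16 length header follows the standard binding-TLV framing used in this document.

```rust
pub const TLV_LEASE_INTENT_KV_CACHE: u16 = 0x0901;

/// Encodes a TLV_LEASE_INTENT_KV_CACHE entry (header + value).
pub fn encode_kv_cache_intent(
    model_fingerprint: &[u8; 32],
    preferred_tier: u16,
    page_bytes: u32,
    sharing_mode: u16,
    gpu_resource_id: Option<u128>,
) -> Vec<u8> {
    // 32 + 2 + 4 + 2 + 1 = 41 bytes, plus 16 when a GPU ID follows.
    let value_len: u16 = if gpu_resource_id.is_some() { 57 } else { 41 };
    let mut out = Vec::with_capacity(4 + usize::from(value_len));
    out.extend_from_slice(&TLV_LEASE_INTENT_KV_CACHE.to_be_bytes());
    out.extend_from_slice(&value_len.to_be_bytes());
    out.extend_from_slice(model_fingerprint);
    out.extend_from_slice(&preferred_tier.to_be_bytes());
    out.extend_from_slice(&page_bytes.to_be_bytes());
    out.extend_from_slice(&sharing_mode.to_be_bytes());
    match gpu_resource_id {
        Some(id) => {
            out.push(0x01); // has_gpu = 1, GPU resource ID follows
            out.extend_from_slice(&id.to_be_bytes());
        }
        None => out.push(0x00), // has_gpu = 0
    }
    out
}
```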
TLV_LEASE_CPU_ISOLATION (0x0902)
Per-lease CPU isolation policy. See docs/spec/cpu-isolation-wire-format.md
for the design rationale.
TLV header:
- tag : u16 = 0x0902
- length : u16 = 0x0001
Value payload:
| Offset | Field | Type | Description |
|---|---|---|---|
| 0 | class | u8 | CPU isolation class |
class values:
| Value | Name | Description |
|---|---|---|
| 0x00 | BestEffort | No pinning beyond cgroup quota (equivalent to absent TLV) |
| 0x01 | WholeCore | Lease owns a full core; no SMT sibling sharing |
| 0x02 | StrictIsolated | WholeCore + topology/NUMA constraints |
| 0x03–0xFE | reserved | MUST be rejected |
| 0xFF | reserved sentinel | Never valid on the wire |
When the TLV is absent, the lease inherits the daemon-wide default set by
--cpu-isolation-policy. Unknown class bytes and incorrect TLV lengths
(anything other than 1) MUST be rejected with InvalidIntent.
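A slice pattern can enforce the length rule and the class range together. This is a sketch under assumed names (CpuIsolationClass, InvalidIntent, and parse_cpu_isolation are illustrative, not taken from the reference implementation):

```rust
/// CPU isolation classes that are valid on the wire.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CpuIsolationClass {
    BestEffort = 0x00,
    WholeCore = 0x01,
    StrictIsolated = 0x02,
}

/// Stand-in for the InvalidIntent rejection named in the text.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidIntent;

/// Parse the value of TLV_LEASE_CPU_ISOLATION (tag and length already stripped).
/// Any length other than 1 and any class byte outside 0x00..=0x02 is rejected.
pub fn parse_cpu_isolation(value: &[u8]) -> Result<CpuIsolationClass, InvalidIntent> {
    match value {
        [0x00] => Ok(CpuIsolationClass::BestEffort),
        [0x01] => Ok(CpuIsolationClass::WholeCore),
        [0x02] => Ok(CpuIsolationClass::StrictIsolated),
        // Covers reserved bytes (0x03..=0xFE), the 0xFF sentinel, and bad lengths.
        _ => Err(InvalidIntent),
    }
}
```

The caller is expected to apply the daemon-wide default itself when the TLV is absent; this function only sees a TLV that was actually present.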
TLV_LEASE_GPU_EXCLUSIVITY (0x0903)
Per-lease GPU exclusivity class. See docs/spec/gpu-exclusivity-wire-format.md
for the design rationale.
TLV header:
tag : u16 = 0x0903
length : u16 = 0x0001
Value payload:
| Offset | Field | Type | Description |
|---|---|---|---|
| 0 | class | u8 | GPU exclusivity class |
class values:
| Value | Name | Description |
|---|---|---|
| 0x00 | Shared | Device may multiplex other tenants |
| 0x01 | SessionExclusive | Exclusive residency for session lifetime |
| 0x02 | DeviceExclusive | Whole device for lease lifetime |
| 0x03 | PartitionExclusive | Reserved for future MIG/partition isolation |
| 0x04–0xFE | reserved | MUST be rejected |
| 0xFF | reserved sentinel | Never valid on the wire |
When the TLV is absent, the lease inherits the daemon-wide default set by
--gpu-share-mode. The daemon mode acts as a permission envelope (not a
ceiling): clients cannot escape a tighter daemon mode by requesting a looser
class. Unknown class bytes and incorrect TLV lengths MUST be rejected with
InvalidIntent.
TLV_LEASE_AFFINITY (0x0910)
Per-lease affinity constraint. Multiple entries may appear in the same params
blob — one per constraint. See docs/spec/affinity-request-model.md for
the design rationale and docs/grafos/affinity-taxonomy.md for the taxonomy.
TLV header:
tag : u16 = 0x0910
length : u16 (variable)
Value payload:
| Offset | Field | Type | Description |
|---|---|---|---|
| 0 | category | u8 | Affinity category |
| 1 | strength | u8 | Strength (bits 0:6) + anti-affinity flag (bit 7) |
| 2 | target_type | u8 | Target type |
| 3 | target_len | u16 BE | Length of target bytes |
| 5 | target | [u8; target_len] | Target value |
category values:
| Value | Name |
|---|---|
| 0x01 | Resource — co-locate with a specific resource |
| 0x02 | State — co-locate with a data shard or lease |
| 0x03 | Topology — topology / failure-domain placement |
| 0x04 | Trust — trust domain / attestation requirement |
| 0x05 | Facility — reserved (deferred) |
strength byte layout: bits 0:6 = strength value, bit 7 = anti-affinity flag.
| Value (masked) | Name |
|---|---|
| 0x01 | Required — filter stage, fail-closed |
| 0x02 | Preferred — score stage, soft ranking |
| 0x03 | Adaptive — reserved (deferred) |
target_type values:
| Value | Name | target_len |
|---|---|---|
| 0x01 | NodeId | 16 (u128 BE) |
| 0x02 | ResourceId | 16 (u128 BE) |
| 0x03 | LeaseId | 16 (u128 BE) |
| 0x04 | ServiceId | 16 (u128 BE) |
| 0x05 | TrustDomain | variable (UTF-8) |
| 0x06 | RackId | 4 (u32 BE) |
Unknown category, strength, or target_type values MUST be rejected with
InvalidIntent. Required affinity that cannot be satisfied results in
an empty placement (no candidates pass the filter).
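The strength byte packs two fields, so a decoder has to mask before validating. The sketch below shows one way to parse a single affinity entry. The names are hypothetical. It also assumes that a target_len disagreeing with the target_type table, or with the remaining bytes, is rejected with InvalidIntent, and that the reserved values Facility (0x05) and Adaptive (0x03) are accepted by the parser and left to later stages.

```rust
/// Stand-in for the InvalidIntent rejection named in the text.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidIntent;

/// One decoded TLV_LEASE_AFFINITY entry (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct Affinity<'a> {
    pub category: u8,
    pub strength: u8,        // bits 0:6 of the strength byte
    pub anti_affinity: bool, // bit 7 of the strength byte
    pub target_type: u8,
    pub target: &'a [u8],
}

pub fn parse_affinity<'a>(value: &'a [u8]) -> Result<Affinity<'a>, InvalidIntent> {
    if value.len() < 5 {
        return Err(InvalidIntent);
    }
    let category = value[0];
    let strength = value[1] & 0x7F;
    let anti_affinity = (value[1] & 0x80) != 0;
    let target_type = value[2];
    let target_len = u16::from_be_bytes([value[3], value[4]]) as usize;
    let target = &value[5..];
    // Assumption: the target must fill the rest of the value exactly.
    if target.len() != target_len {
        return Err(InvalidIntent);
    }
    if !(0x01..=0x05).contains(&category) || !(0x01..=0x03).contains(&strength) {
        return Err(InvalidIntent);
    }
    let target_ok = match target_type {
        0x01..=0x04 => target_len == 16, // NodeId, ResourceId, LeaseId, ServiceId
        0x05 => std::str::from_utf8(target).is_ok(), // TrustDomain
        0x06 => target_len == 4, // RackId
        _ => false,
    };
    if !target_ok {
        return Err(InvalidIntent);
    }
    Ok(Affinity { category, strength, anti_affinity, target_type, target })
}
```

For example, a strength byte of 0x81 decodes to Required with the anti-affinity flag set.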
TLV_NODE_AFFINITY_META (0x0804)
Node-level affinity metadata: topology locality and trust domain.
TLV header:
tag : u16 = 0x0804
length : u16 (variable)
Value payload (all big-endian):
| Offset | Field | Type | Description |
|---|---|---|---|
| 0 | rack_id | u32 | Rack identifier |
| 4 | row_id | u32 | Row/aisle identifier |
| 8 | site_id | u32 | Site/datacenter identifier |
| 12 | geo_hash | u64 | Geographic hash |
| 20 | trust_domain_len | u16 | Length of trust domain name |
| 22 | trust_domain | UTF-8 | Trust domain name (empty = not advertised) |
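A decoder for this value reads a 22-byte fixed prefix and then a length-prefixed string. The following sketch uses hypothetical names and makes two assumptions not stated above: bytes after the trust domain are ignored, and a trust domain that is not valid UTF-8 makes the whole value unparseable.

```rust
/// Decoded TLV_NODE_AFFINITY_META value (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct NodeAffinityMeta {
    pub rack_id: u32,
    pub row_id: u32,
    pub site_id: u32,
    pub geo_hash: u64,
    /// None when the trust domain is empty (not advertised).
    pub trust_domain: Option<String>,
}

fn be_u32(b: &[u8], at: usize) -> u32 {
    u32::from_be_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

pub fn parse_node_affinity_meta(value: &[u8]) -> Option<NodeAffinityMeta> {
    // Fixed part: three u32 fields, one u64, and the u16 length prefix.
    if value.len() < 22 {
        return None;
    }
    let mut geo = [0u8; 8];
    geo.copy_from_slice(&value[12..20]);
    let td_len = u16::from_be_bytes([value[20], value[21]]) as usize;
    // get() returns None when the declared length runs past the value.
    let td = value.get(22..22 + td_len)?;
    let name = String::from_utf8(td.to_vec()).ok()?;
    Some(NodeAffinityMeta {
        rack_id: be_u32(value, 0),
        row_id: be_u32(value, 4),
        site_id: be_u32(value, 8),
        geo_hash: u64::from_be_bytes(geo),
        trust_domain: if name.is_empty() { None } else { Some(name) },
    })
}
```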
TLV_SUPPORTED_ISOLATION (0x0805)
Advertises which CPU isolation and GPU exclusivity classes this node honors per-lease.
TLV header:
tag : u16 = 0x0805
length : u16 (variable)
Value payload:
| Offset | Field | Type | Description |
|---|---|---|---|
| 0 | cpu_count | u8 | Number of supported CPU isolation classes |
| 1 | cpu_classes | [u8; cpu_count] | CpuIsolationClass values (0x00–0x02) |
| 1+cpu_count | gpu_count | u8 | Number of supported GPU exclusivity classes |
| 2+cpu_count | gpu_classes | [u8; gpu_count] | GpuExclusivityClass values (0x00–0x03) |
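Because the second list starts at an offset that depends on cpu_count, the value has to be walked in order. A minimal sketch follows. The function name is hypothetical, and rejecting trailing bytes and out-of-range class values is an assumption of this example, not a rule stated above.

```rust
/// Parse TLV_SUPPORTED_ISOLATION into (cpu_classes, gpu_classes).
/// Returns None on truncation, trailing bytes, or out-of-range class values.
pub fn parse_supported_isolation(value: &[u8]) -> Option<(Vec<u8>, Vec<u8>)> {
    let cpu_count = *value.first()? as usize;
    let cpu = value.get(1..1 + cpu_count)?;
    let gpu_count = *value.get(1 + cpu_count)? as usize;
    let gpu_start = 2 + cpu_count;
    let gpu = value.get(gpu_start..gpu_start + gpu_count)?;
    // Assumption: the two lists account for the whole value.
    if value.len() != gpu_start + gpu_count {
        return None;
    }
    // CPU classes are 0x00..=0x02, GPU classes are 0x00..=0x03.
    if cpu.iter().any(|&c| c > 0x02) || gpu.iter().any(|&g| g > 0x03) {
        return None;
    }
    Some((cpu.to_vec(), gpu.to_vec()))
}
```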
Page Operations (FBMKV_PAGE_V0)
Profile identifier: PROFILE_FBMKV_PAGE_V0 = 0x0A01.
This profile provides a page-based abstraction over leased fabric memory. Each lease is divided into fixed-size pages with tracked state (dirty, access count, tier residency, advisory hints). Six operations form the profile.
Operation codes
| Code | Name | Description |
|---|---|---|
| 0x10 | PAGE_READ | Read bytes from a page |
| 0x11 | PAGE_WRITE | Write bytes to a page |
| 0x12 | PAGE_COPY | Copy a page between leases (or within one) |
| 0x13 | PAGE_ZERO | Zero a page |
| 0x14 | PAGE_ADVISE | Set advisory hints on a page |
| 0x15 | PAGE_QUERY | Query current state of a page |
PAGE_READ (0x10) request
lease_id : u128
page_idx : u32
offset : u32
length : u32
PAGE_READ response:
status : u8
data : [u8; length] (only if status == OK)
PAGE_WRITE (0x11) request
lease_id : u128
page_idx : u32
offset : u32
length : u32
data : [u8; length]
PAGE_WRITE response:
status : u8
PAGE_COPY (0x12) request
src_lease_id : u128
src_page_idx : u32
dst_lease_id : u128
dst_page_idx : u32
Cross-tenant copies are denied. Both leases must be active.
PAGE_COPY response:
status : u8
PAGE_ZERO (0x13) request
lease_id : u128
page_idx : u32
PAGE_ZERO response:
status : u8
PAGE_ADVISE (0x14) request
lease_id : u128
page_idx : u32
advice_flags : u32
PAGE_ADVISE response:
status : u8
PAGE_QUERY (0x15) request
lease_id : u128
page_idx : u32
PAGE_QUERY response:
status : u8
page_idx : u32
resident_tier : u8
dirty : u8 (0 = clean, 1 = dirty)
access_count : u32
last_access_ms : u64
advice_flags : u32
MemTierKind (page tier classification)
| Value | Name |
|---|---|
| 0 | Dram — Fast local DRAM |
| 1 | Hbm — High-bandwidth memory |
| 2 | Pmem — Persistent memory |
| 3 | Remote — Remote fabric memory (RDMA/NVMe-oF/CXL) |
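Taken together, the PAGE_QUERY response fields occupy 23 bytes (1 + 4 + 1 + 1 + 4 + 8 + 4), with resident_tier interpreted through the MemTierKind table. The sketch below decodes that layout. All names are hypothetical, and it assumes that on a non-OK status only the status byte is meaningful, which the text does not state explicitly for PAGE_QUERY.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemTierKind { Dram, Hbm, Pmem, Remote }

#[derive(Debug, PartialEq, Eq)]
pub struct PageState {
    pub page_idx: u32,
    pub resident_tier: MemTierKind,
    pub dirty: bool,
    pub access_count: u32,
    pub last_access_ms: u64,
    pub advice_flags: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum QueryError {
    /// Server returned a non-OK page status code.
    Status(u8),
    /// Response did not match the expected layout.
    Malformed,
}

fn be_u32(b: &[u8], at: usize) -> u32 {
    u32::from_be_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

pub fn parse_page_query_response(buf: &[u8]) -> Result<PageState, QueryError> {
    let status = *buf.first().ok_or(QueryError::Malformed)?;
    if status != 0x00 {
        return Err(QueryError::Status(status));
    }
    if buf.len() != 23 {
        return Err(QueryError::Malformed);
    }
    let resident_tier = match buf[5] {
        0 => MemTierKind::Dram,
        1 => MemTierKind::Hbm,
        2 => MemTierKind::Pmem,
        3 => MemTierKind::Remote,
        _ => return Err(QueryError::Malformed),
    };
    let dirty = match buf[6] {
        0 => false,
        1 => true,
        _ => return Err(QueryError::Malformed),
    };
    let mut ms = [0u8; 8];
    ms.copy_from_slice(&buf[11..19]);
    Ok(PageState {
        page_idx: be_u32(buf, 1),
        resident_tier,
        dirty,
        access_count: be_u32(buf, 7),
        last_access_ms: u64::from_be_bytes(ms),
        advice_flags: be_u32(buf, 19),
    })
}
```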
PageAdvice flags
| Bit | Value | Name | Description |
|---|---|---|---|
| 0 | 0x01 | WILL_NEED | Page will be accessed soon |
| 1 | 0x02 | DONT_NEED | Page is no longer needed |
| 2 | 0x04 | SEQUENTIAL | Sequential access pattern expected |
| 3 | 0x08 | RANDOM | Random access pattern expected |
| 4 | 0x10 | EVICT_PRIORITY_LOW | Low eviction priority |
| 5 | 0x20 | EVICT_PRIORITY_HIGH | High eviction priority |
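The advice bits combine with bitwise OR into the advice_flags field of a PAGE_ADVISE request. As an illustration, the sketch below builds the 24-byte request payload (lease_id, page_idx, advice_flags). The constant and function names are assumptions of this example, and the opcode and outer framing are out of scope.

```rust
pub const WILL_NEED: u32 = 0x01;
pub const DONT_NEED: u32 = 0x02;
pub const SEQUENTIAL: u32 = 0x04;
pub const RANDOM: u32 = 0x08;
pub const EVICT_PRIORITY_LOW: u32 = 0x10;
pub const EVICT_PRIORITY_HIGH: u32 = 0x20;

/// Encode the PAGE_ADVISE request payload: u128 + u32 + u32, all big-endian.
pub fn encode_page_advise(lease_id: u128, page_idx: u32, advice_flags: u32) -> Vec<u8> {
    let mut out = Vec::with_capacity(24);
    out.extend_from_slice(&lease_id.to_be_bytes());
    out.extend_from_slice(&page_idx.to_be_bytes());
    out.extend_from_slice(&advice_flags.to_be_bytes());
    out
}
```

A call such as encode_page_advise(lease, 2, WILL_NEED | SEQUENTIAL) carries advice_flags = 0x05 in the last four bytes.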
Page error status codes
0x00 OK
0x01 LEASE_NOT_FOUND
0x02 LEASE_NOT_ACTIVE
0x03 LEASE_EXPIRED
0x04 LEASE_REVOKED
0x05 LEASE_FENCED
0x06 PAGE_OUT_OF_RANGE
0x07 INSUFFICIENT_CAPACITY
0x08 SRC_LEASE_NOT_FOUND
0x09 DST_LEASE_NOT_FOUND
0x0A CROSS_TENANT_COPY_DENIED
Lease Export/Import/Relocate
Three QUIC control-plane opcodes for exporting lease handles to other runtimes, importing them, and relocating data between leases.
Op codes
| Code | Name | Description |
|---|---|---|
| 0x0020 | OP_LEASE_EXPORT_HANDLE | Export a lease handle for another runtime |
| 0x0021 | OP_LEASE_IMPORT_HANDLE | Import a previously exported handle |
| 0x0022 | OP_LEASE_RELOCATE | Relocate data between leases |
BindingType registry
The binding type is a u8 discriminant describing how the leased resource is
accessed on the data plane. Unknown binding types MUST be rejected (fail-closed).
| Value | Name | binding_desc layout |
|---|---|---|
| 0x01 | PcieBar | u64 bar_base + u64 bar_len |
| 0x02 | RdmaRkey | u32 rkey + u64 remote_addr |
| 0x03 | LinuxDmabuf | u32 fd + u64 offset + u64 len |
| 0x04 | VramOffset | u64 vram_offset + u64 len |
| 0x05 | FabricPage | u128 page_id + u64 offset + u64 len |
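Each binding type implies a fixed binding_desc size, which gives a receiver a cheap fail-closed check before interpreting the descriptor. The sketch below assumes the listed fields are packed back to back with no padding. The registry does not state this explicitly, and the function names are hypothetical.

```rust
/// Expected binding_desc length in bytes for a known binding type,
/// assuming the fields in the registry are packed without padding.
pub fn binding_desc_len(binding_type: u8) -> Option<usize> {
    match binding_type {
        0x01 => Some(8 + 8),      // PcieBar: bar_base + bar_len
        0x02 => Some(4 + 8),      // RdmaRkey: rkey + remote_addr
        0x03 => Some(4 + 8 + 8),  // LinuxDmabuf: fd + offset + len
        0x04 => Some(8 + 8),      // VramOffset: vram_offset + len
        0x05 => Some(16 + 8 + 8), // FabricPage: page_id + offset + len
        _ => None,                // unknown binding types are rejected
    }
}

/// Fail closed on unknown types and on descriptors of the wrong size.
pub fn check_binding_desc(binding_type: u8, desc: &[u8]) -> Result<(), &'static str> {
    match binding_desc_len(binding_type) {
        None => Err("unknown binding type"),
        Some(n) if n != desc.len() => Err("binding_desc length mismatch"),
        Some(_) => Ok(()),
    }
}
```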
Target runtime kinds
| Value | Name |
|---|---|
| 0 | GPU_WORKER |
| 1 | TASKLET |
| 2 | NATIVE_SERVICE |
| 3 | ACCELERATOR |
Rights bitmask
| Bit | Value | Name |
|---|---|---|
| 0 | 1 | RIGHT_READ |
| 1 | 2 | RIGHT_WRITE |
| 2 | 4 | RIGHT_APPEND |
RelocateMode values
| Value | Name | Description |
|---|---|---|
| 0 | Copy | Full copy; both source and destination remain valid |
| 1 | Move | Copy data then mark source for deferred free |
| 2 | CowSeed | Create reference without immediate copy |
Unknown mode values MUST be rejected.
LEASE_EXPORT_HANDLE (0x0020)
Request payload (all big-endian):
lease_id : u64
target_runtime_kind : u16
target_identity : u128
rights : u32
ttl_secs : u32
Total: 34 bytes.
Response payload:
export_handle : u128
expires_at : u64
binding_type : u8
binding_desc : bytes (u32 length + data; layout depends on binding_type)
LEASE_IMPORT_HANDLE (0x0021)
Request payload:
export_handle : u128
Total: 16 bytes.
Response payload:
local_attach_id : u64
rights : u32
expires_at : u64
Total: 20 bytes.
LEASE_RELOCATE (0x0022)
Request payload:
src_lease_id : u64
dst_lease_id : u64
mode : u8 (RelocateMode)
has_preserve : u8 (0 = absent, 1 = present)
preserve_until_ms : u64 (only if has_preserve == 1)
Total: 18 bytes (without preserve) or 26 bytes (with preserve).
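The request layout above maps naturally onto an Option for the preserve deadline. A minimal encoder sketch, with a hypothetical function name:

```rust
/// Encode a LEASE_RELOCATE request payload (all big-endian).
/// mode follows the RelocateMode table: 0 = Copy, 1 = Move, 2 = CowSeed.
pub fn encode_relocate_request(
    src_lease_id: u64,
    dst_lease_id: u64,
    mode: u8,
    preserve_until_ms: Option<u64>,
) -> Result<Vec<u8>, &'static str> {
    if mode > 2 {
        return Err("unknown RelocateMode");
    }
    let mut out = Vec::with_capacity(26);
    out.extend_from_slice(&src_lease_id.to_be_bytes());
    out.extend_from_slice(&dst_lease_id.to_be_bytes());
    out.push(mode);
    match preserve_until_ms {
        Some(ms) => {
            out.push(1); // has_preserve = present
            out.extend_from_slice(&ms.to_be_bytes());
        }
        None => out.push(0), // has_preserve = absent
    }
    Ok(out)
}
```

The output is 18 bytes when preserve_until_ms is None and 26 bytes when it is Some, matching the totals stated above.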
Response payload:
bytes_transferred : u64
status : u8 (0 = Complete, 1 = InProgress, 2 = Failed)
reason : bytes (u32 length + UTF-8 data; only if status == 2)
Constrained dataplane bindings (v0 draft)
These bindings are transport-specific protocols used by constrained nodes that cannot implement RDMA/NVMe-oF yet. They are part of the v0 wire contract once stabilized; until then they are treated as draft profiles.
Default direction for general-purpose nodes is a secure QUIC control plane plus QUIC dataplane transport. UDP shims (FBMU/FBBU) are a constrained compatibility profile for bring-up and limited environments. See:
- docs/platform/pi5/draft-dataplane-shims-v0.md (draft, UDP shims)
- docs/platform/pi5/design-doc-1-quic-controlplane-udp-dataplanes.md (architecture)
- Golden vectors: vectors/v0/fbmu/manifest.txt and vectors/v0/fbbu/manifest.txt
Note: The TCP constrained dataplane profiles below (FBMT/FBBT) are deprecated.
The intended replacement direction is QUIC streams carrying the same application framing
(READ/WRITE, READ_BLOCK/WRITE_BLOCK). Until the QUIC dataplane profile is fully specified,
UDP shims remain a draft-only compatibility transport. The forthcoming QUIC dataplane profile will be
documented alongside docs/platform/pi5/quic-profile-v0.md.
TCP memory protocol (FBMT)
Deprecated: kept for historical reference; do not extend.
This protocol runs over TCP and exposes a leased memory region using simple read/write ops.
All integers are big-endian.
Header
Each message begins with a fixed 16-byte header:
magic : [u8; 4] = "FBMT"
version : u8 = 1
op : u8
flags : u16 (must be 0 for v1)
payload_len : u32
request_id : u32
payload_len counts only the payload bytes following the header. request_id
is client-chosen; for IO operations it MUST be non-zero. Unknown magic,
unsupported version, or non-zero flags MUST result in the connection being
closed.
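The header checks above can be expressed as a single parse step that returns nothing when the connection should be closed. This is a sketch with hypothetical names. Treating a zero request_id on READ or WRITE as a close-worthy violation is an assumption of this example; the text requires the value to be non-zero but does not name the consequence.

```rust
/// Fields a caller needs after the fixed header has been validated.
#[derive(Debug, PartialEq, Eq)]
pub struct FbmtHeader {
    pub op: u8,
    pub payload_len: u32,
    pub request_id: u32,
}

/// Parse the 16-byte FBMT header. None means "close the connection".
pub fn parse_fbmt_header(h: &[u8; 16]) -> Option<FbmtHeader> {
    let flags = u16::from_be_bytes([h[6], h[7]]);
    // Unknown magic, unsupported version, or non-zero flags are fatal.
    if &h[..4] != b"FBMT" || h[4] != 1 || flags != 0 {
        return None;
    }
    let op = h[5];
    let payload_len = u32::from_be_bytes([h[8], h[9], h[10], h[11]]);
    let request_id = u32::from_be_bytes([h[12], h[13], h[14], h[15]]);
    // READ (0x10) and WRITE (0x20) are the IO requests in this profile.
    let is_io_request = op == 0x10 || op == 0x20;
    if is_io_request && request_id == 0 {
        return None;
    }
    Some(FbmtHeader { op, payload_len, request_id })
}
```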
Opcodes
0x01 HELLO
0x02 HELLO_ACK
0x10 READ
0x11 READ_RESP
0x20 WRITE
0x21 WRITE_RESP
0x30 PING
0x31 PONG
0x7F ERROR
Handshake
Client MUST send HELLO immediately after connect. Server replies with
HELLO_ACK and either accepts or rejects the lease.
HELLO payload:
lease_id : u128
HELLO_ACK payload:
status : u8
reserved : u8
reserved : u16
resource_len : u64
max_read_len : u32
max_write_len : u32
If status != 0, the server SHOULD close the connection after sending the
response.
IO operations
READ payload:
offset : u64
length : u32
READ_RESP payload:
status : u8
reserved : u8
reserved : u16
length : u32
data : [u8; length]
WRITE payload:
offset : u64
length : u32
data : [u8; length]
WRITE_RESP payload:
status : u8
reserved : u8
reserved : u16
Rules:
- offset + length MUST be within resource_len from HELLO_ACK.
- length MUST be > 0 and <= max_{read,write}_len.
- Payload sizes MUST match length; otherwise return BAD_LENGTH.
- Server MAY process requests out of order; client matches responses by request_id.
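As an illustration of the rules above, a server-side check for a WRITE request might look like the sketch below. The function name is hypothetical. Two status choices are assumptions of this example: a length of zero or above max_write_len is reported as BAD_LENGTH, and a range violation as OUT_OF_RANGE. The text fixes BAD_LENGTH only for payload size mismatches.

```rust
pub const OK: u8 = 0x00;
pub const OUT_OF_RANGE: u8 = 0x04;
pub const BAD_LENGTH: u8 = 0x05;

/// Validate a WRITE request against the limits from HELLO_ACK.
/// payload_len is the value from the message header.
pub fn validate_write(
    offset: u64,
    length: u32,
    payload_len: u32,
    resource_len: u64,
    max_write_len: u32,
) -> u8 {
    if length == 0 || length > max_write_len {
        return BAD_LENGTH;
    }
    // WRITE payload = offset (8) + length (4) + `length` data bytes.
    if u64::from(payload_len) != 12 + u64::from(length) {
        return BAD_LENGTH;
    }
    // checked_add guards against offset + length wrapping around.
    match offset.checked_add(u64::from(length)) {
        Some(end) if end <= resource_len => OK,
        _ => OUT_OF_RANGE,
    }
}
```

Using checked_add matters here: a client-supplied offset near u64::MAX would otherwise wrap and pass the range check.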
Status codes
0x00 OK
0x01 INVALID_LEASE
0x02 EXPIRED
0x03 FENCED
0x04 OUT_OF_RANGE
0x05 BAD_LENGTH
0x06 BUSY
0x07 INTERNAL
For non-OK statuses on IO requests, responses MUST contain no data bytes.
PING/PONG
PING and PONG have empty payloads and allow a client to test liveness.
Servers MAY ignore PING requests.
Connection lifecycle
- Server MUST close the connection when the lease expires or is revoked.
- Client SHOULD treat connection close as lease invalidation.
- Server SHOULD close the connection on any protocol violation.
TCP block storage protocol (FBBT)
Deprecated: kept for historical reference; do not extend.
This protocol runs over TCP and exposes a fixed-size block device using lease-gated access.
All integers are big-endian.
Header
Each message begins with a fixed 16-byte header:
magic : [u8; 4] = "FBBT"
version : u8 = 1
op : u8
flags : u16 (must be 0 for v1)
payload_len : u32
request_id : u32
payload_len counts only the payload bytes following the header. request_id
is client-chosen; for IO operations it MUST be non-zero. Unknown magic,
unsupported version, or non-zero flags MUST result in the connection being
closed.
Opcodes
0x01 HELLO
0x02 HELLO_ACK
0x10 READ_BLOCK
0x11 READ_BLOCK_RESP
0x20 WRITE_BLOCK
0x21 WRITE_BLOCK_RESP
0x30 PING
0x31 PONG
0x7F ERROR
Handshake
Client MUST send HELLO immediately after connect. Server replies with
HELLO_ACK and either accepts or rejects the lease.
HELLO payload:
lease_id : u128
HELLO_ACK payload:
status : u8
reserved : u8
reserved : u16
block_size : u32
device_block_cnt : u64
max_blocks_per_io : u32
If status != 0, the server SHOULD close the connection after sending the
response.
IO operations
READ_BLOCK payload:
block_index : u64
block_count : u32
READ_BLOCK_RESP payload:
status : u8
reserved : u8
reserved : u16
block_count : u32
block_data : [u8; block_count * block_size]
WRITE_BLOCK payload:
block_index : u64
block_count : u32
block_data : [u8; block_count * block_size]
WRITE_BLOCK_RESP payload:
status : u8
reserved : u8
reserved : u16
Rules:
- block_index + block_count MUST be within device_block_cnt from HELLO_ACK.
- block_count MUST be > 0 and <= max_blocks_per_io.
- Payload sizes MUST match block_count * block_size; otherwise return BAD_LENGTH.
- Server MAY process requests out of order; client matches responses by request_id.
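For block IO the data size is a product, so the check is easiest in 64-bit arithmetic, where block_count * block_size cannot overflow. The sketch below validates a READ_BLOCK request and returns the number of data bytes the response would carry. The function name is hypothetical. As in the memory profile, reporting an out-of-bounds block_count as BAD_LENGTH and a range violation as OUT_OF_RANGE is an assumption of this example.

```rust
pub const OUT_OF_RANGE: u8 = 0x04;
pub const BAD_LENGTH: u8 = 0x05;

/// Validate a READ_BLOCK request against the limits from HELLO_ACK.
/// Ok(n) is the number of block_data bytes in a successful response;
/// Err(status) is the status code to return instead.
pub fn validate_read_block(
    block_index: u64,
    block_count: u32,
    block_size: u32,
    device_block_cnt: u64,
    max_blocks_per_io: u32,
) -> Result<u64, u8> {
    if block_count == 0 || block_count > max_blocks_per_io {
        return Err(BAD_LENGTH);
    }
    // checked_add guards against block_index + block_count wrapping.
    match block_index.checked_add(u64::from(block_count)) {
        Some(end) if end <= device_block_cnt => {}
        _ => return Err(OUT_OF_RANGE),
    }
    // u32 * u32 always fits in u64, so this multiplication cannot overflow.
    Ok(u64::from(block_count) * u64::from(block_size))
}
```

For a WRITE_BLOCK request, the same byte count plus the 12 bytes of block_index and block_count would be compared against payload_len from the header.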
Status codes
0x00 OK
0x01 INVALID_LEASE
0x02 EXPIRED
0x03 FENCED
0x04 OUT_OF_RANGE
0x05 BAD_LENGTH
0x06 BUSY
0x07 INTERNAL
For non-OK statuses on IO requests, responses MUST contain no data bytes.
PING/PONG
PING and PONG have empty payloads and allow a client to test liveness.
Servers MAY ignore PING requests.
Connection lifecycle
- Server MUST close the connection when the lease expires or is revoked.
- Client SHOULD treat connection close as lease invalidation.
- Server SHOULD close the connection on any protocol violation.