Control Plane TLS Integration Plan (Draft)
This document outlines a pragmatic path to add authenticated encryption (TLS 1.3 / mTLS semantics) to the fabricBIOS control plane without blocking ongoing correctness work.
Current direction:
- Pi5 bare metal: control plane runs on QUIC over UDP (no TCP). QUIC uses TLS 1.3 for its handshake and key schedule, so “TLS enablement” means selecting/wiring a QUIC implementation with an acceptable TLS stack and certificate validation model.
- Dev/Linux: a proxy is allowed during early bring-up (e.g., terminate QUIC using a standard library on Linux, forward to the bare-metal node via a smaller trusted interface).
Goals
- Encrypt and authenticate control traffic (mTLS semantics).
- Keep core logic transport-agnostic.
- Minimize implementation churn by keeping the control plane as a framed request/response protocol regardless of the underlying transport.
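As a rough illustration of the framed request/response goal, the sketch below shows length-delimited framing that never touches the transport. The 4-byte big-endian length prefix, the MAX_FRAME_LEN bound, and the names encode_frame and decode_frame are assumptions made for this example, not the project's wire format.

```rust
/// Hypothetical upper bound on one control message (assumption for this sketch).
pub const MAX_FRAME_LEN: usize = 64 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    /// The payload (or the declared length) exceeds MAX_FRAME_LEN.
    TooLarge,
}

/// Append one frame to `out`: 4-byte big-endian length, then the payload.
pub fn encode_frame(payload: &[u8], out: &mut Vec<u8>) -> Result<(), FrameError> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(FrameError::TooLarge);
    }
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    Ok(())
}

/// Try to split one complete frame off the front of `buf`.
/// Returns Ok(None) when more bytes are needed, otherwise the payload
/// and the total number of bytes consumed (header included).
pub fn decode_frame(buf: &[u8]) -> Result<Option<(&[u8], usize)>, FrameError> {
    if buf.len() < 4 {
        return Ok(None);
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    // Reject oversized declarations before waiting for (or buffering) the body.
    if len > MAX_FRAME_LEN {
        return Err(FrameError::TooLarge);
    }
    let end = 4 + len;
    if buf.len() < end {
        return Ok(None);
    }
    Ok(Some((&buf[4..end], end)))
}
```

Because both functions operate only on byte slices, the same code can sit behind a QUIC stream in direct mode or behind the proxy's forwarding interface.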
Library Choice
We intentionally do not commit to a single QUIC stack in this document; the choice is gated by:
- no_std viability (direct-on-bare-metal mode) vs proxy mode,
- certificate validation needs (mTLS identity mapping),
- memory footprint and API shape (state machine driven by UDP I/O).
Pragmatic approach:
- Proxy mode (Linux/QEMU): use a well-supported QUIC stack and a conventional TLS 1.3 implementation (e.g., rustls via the QUIC library) so we can validate identities and message formats early.
- Direct mode (bare metal): select a QUIC stack that can be integrated as a state machine with fixed pools, or keep proxy mode until a suitable direct stack is available.
Certificate and Key Sources
Dev (Linux/QEMU)
- Server keypair, client keypair, and CA bundle are provisioned per node out of dev tooling state directories. The exact filesystem layout is an implementation detail of the dev harness.
Bare metal (later)
- Inject via local provisioning path (config partition, provisioning tool, or relay-assisted enrollment).
- Store in a small key-value area (flash/SD partition) or in a dedicated secure blob region.
Handshake Identity Mapping
- Map the peer certificate SAN URI to node_id using: urn:fabricbios:node:0x<32-hex>.
- Server fails closed if the SAN URI is missing or ambiguous (or the chain is not anchored in the configured trust set).
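A minimal sketch of the mapping rule, assuming the SAN URIs have already been extracted from a chain-validated certificate. Representing node_id as a u128, accepting both hex cases, and treating any second node URN (even an identical one) as ambiguous are assumptions of this example; the names IdentityError, parse_node_urn, and node_id_from_sans are hypothetical.

```rust
/// URN scheme from the plan: urn:fabricbios:node:0x<32-hex>.
const NODE_URN_NAMESPACE: &str = "urn:fabricbios:node:";
const NODE_URN_PREFIX: &str = "urn:fabricbios:node:0x";

#[derive(Debug, PartialEq, Eq)]
pub enum IdentityError {
    /// No node URN was present among the SAN URIs.
    Missing,
    /// More than one node URN was present.
    Ambiguous,
    /// A node URN was present but did not carry exactly 32 hex digits.
    Malformed,
}

/// Parse one URI of the form urn:fabricbios:node:0x<32-hex> into a 128-bit id.
pub fn parse_node_urn(uri: &str) -> Result<u128, IdentityError> {
    let hex = uri
        .strip_prefix(NODE_URN_PREFIX)
        .ok_or(IdentityError::Malformed)?;
    // Check digits explicitly: from_str_radix alone would also accept a leading '+'.
    if hex.len() != 32 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(IdentityError::Malformed);
    }
    u128::from_str_radix(hex, 16).map_err(|_| IdentityError::Malformed)
}

/// Map a certificate's SAN URIs to a node id, failing closed on
/// missing, malformed, or multiple node URNs. Unrelated URIs are ignored.
pub fn node_id_from_sans(san_uris: &[&str]) -> Result<u128, IdentityError> {
    let mut found: Option<u128> = None;
    for uri in san_uris {
        if !uri.starts_with(NODE_URN_NAMESPACE) {
            continue;
        }
        let id = parse_node_urn(uri)?;
        if found.is_some() {
            return Err(IdentityError::Ambiguous);
        }
        found = Some(id);
    }
    found.ok_or(IdentityError::Missing)
}
```

Every error variant should lead to rejecting the connection; there is deliberately no fallback identity.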
Implementation Steps
- Define a control-plane framing format that is independent of transport (length-delimited RPC messages).
- Implement identity mapping + fail-closed certificate policy at the control-plane boundary.
- Proxy mode:
- Terminate QUIC on Linux and forward framed control messages to/from the bare-metal node (for bring-up).
- Direct mode (when feasible):
- Wire a QUIC state machine into the bare-metal UDP stack (no TCP).
- Provide monotonic time, CSPRNG, and fixed-pool allocation.
- Add integration tests:
- identity mapping success/failure,
- request idempotency and replay-safety (0-RTT disabled),
- lease lifecycle correctness under retries.
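For the direct-mode step that calls for fixed-pool allocation, one possible shape is a statically sized pool of packet buffers handed out by index. This is a sketch only: FixedPool and its methods are hypothetical names, and a real integration would follow whatever buffer API the chosen QUIC state machine expects. The code uses only core-compatible features, so nothing in it depends on a heap.

```rust
/// Fixed-capacity pool of N equally sized buffers; never grows after construction.
pub struct FixedPool<const N: usize, const SIZE: usize> {
    bufs: [[u8; SIZE]; N],
    in_use: [bool; N],
}

impl<const N: usize, const SIZE: usize> FixedPool<N, SIZE> {
    /// Create a pool with all slots free and zeroed.
    pub const fn new() -> Self {
        Self {
            bufs: [[0u8; SIZE]; N],
            in_use: [false; N],
        }
    }

    /// Claim the lowest free slot, or None when the pool is exhausted.
    pub fn alloc(&mut self) -> Option<usize> {
        let idx = self.in_use.iter().position(|used| !*used)?;
        self.in_use[idx] = true;
        Some(idx)
    }

    /// Borrow a claimed slot; None for free or out-of-range indices.
    pub fn get_mut(&mut self, idx: usize) -> Option<&mut [u8; SIZE]> {
        if *self.in_use.get(idx)? {
            self.bufs.get_mut(idx)
        } else {
            None
        }
    }

    /// Release a slot and zero it. Returns false if it was not claimed.
    pub fn free(&mut self, idx: usize) -> bool {
        match self.in_use.get_mut(idx) {
            Some(used) if *used => {
                *used = false;
                self.bufs[idx] = [0u8; SIZE];
                true
            }
            _ => false,
        }
    }
}
```

Exhaustion surfaces as None rather than a panic, so the caller can drop or defer a packet under memory pressure.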
Security Notes
- Require TLS 1.3 semantics (QUIC handshake).
- Reject peers without mapped identity or unknown CA.
- Keep token audience checks even with authenticated transport (defense-in-depth).
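The defense-in-depth point about audience checks can be illustrated with a small claim check that runs after the token's MAC has already been verified. The Claims layout, the use of seconds for time, and the bitmask permission encoding are all assumptions of this sketch; the actual token format is not specified here.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum TokenError {
    Expired,
    WrongAudience,
    MissingPermission,
}

/// Hypothetical decoded token claims (MAC verification is assumed to have
/// succeeded before this struct is constructed).
pub struct Claims {
    /// Hash of the client certificate the token was minted for.
    pub audience: [u8; 32],
    /// Expiry as seconds on the node's clock.
    pub expires_at: u64,
    /// Granted permissions as a bitmask.
    pub permissions: u32,
}

/// Check TTL, audience binding, and permissions against the peer that
/// actually completed the handshake on this connection.
pub fn check_claims(
    claims: &Claims,
    peer_cert_hash: &[u8; 32],
    now: u64,
    required: u32,
) -> Result<(), TokenError> {
    if now >= claims.expires_at {
        return Err(TokenError::Expired);
    }
    // A valid token presented over a different authenticated connection is still rejected.
    if &claims.audience != peer_cert_hash {
        return Err(TokenError::WrongAudience);
    }
    if claims.permissions & required != required {
        return Err(TokenError::MissingPermission);
    }
    Ok(())
}
```

The audience comparison is what keeps a token stolen from one client from being replayed by another peer that also holds a valid certificate.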
Current Status
Implemented
- Pi5 bare-metal direct mode: QUIC v1 + TLS 1.3 runs on bare metal using vendored rustls + rustls-rustcrypto (patched for no_std + alloc). The bare-metal QUIC state machine includes retransmission.
- mTLS (TOFU): Server cert pinning (first-connect) and client cert verification (TOFU SHA-256 pin) are implemented via the bare-metal TOFU verifier.
- Capability tokens: The CAP_REQUEST handler mints HMAC-SHA256 tokens; LEASE_ALLOC validates HMAC, TTL, audience (client cert hash), and permissions. Behind the cap-tokens feature flag.
- Identity mapping: Client identity is extracted from the mTLS client cert hash (TOFU pin) and used as the audience for token binding.
- QUIC interop: The host-side Quinn client interops with the bare-metal server, including power-cycle soak.
- grafOS QUIC client: QuicControlClient in grafos-runtime (behind quic-transport) supports static discovery and control-plane ops.
- fabricbiosd QUIC server: Quinn listener serves all control ops over QUIC.
- TCP transport: Removed. QUIC is the sole default transport across active implementations. The COMPAT (TCP) profile is retained in documentation for historical compatibility context only.
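The trust-on-first-use behaviour listed under mTLS (TOFU) can be sketched as a pin table keyed by a peer label. This is not the bare-metal verifier itself: TofuStore, PinDecision, the string peer key, and the in-memory BTreeMap are hypothetical, and the SHA-256 digest of the certificate is assumed to be computed elsewhere and passed in.

```rust
use std::collections::BTreeMap;

/// SHA-256 digest of a peer certificate, computed by the caller.
pub type CertHash = [u8; 32];

#[derive(Debug, PartialEq, Eq)]
pub enum PinDecision {
    /// First contact: the presented hash was recorded as the pin.
    PinnedNow,
    /// The presented hash equals the stored pin.
    Matched,
    /// The presented hash differs from the stored pin; reject the peer.
    Mismatch,
}

#[derive(Default)]
pub struct TofuStore {
    pins: BTreeMap<String, CertHash>,
}

impl TofuStore {
    /// Apply the TOFU rule for `peer`. A mismatch never overwrites the pin.
    pub fn check(&mut self, peer: &str, presented: &CertHash) -> PinDecision {
        match self.pins.get(peer).copied() {
            None => {
                self.pins.insert(peer.to_string(), *presented);
                PinDecision::PinnedNow
            }
            Some(pinned) if pinned == *presented => PinDecision::Matched,
            Some(_) => PinDecision::Mismatch,
        }
    }
}
```

Pins held only in memory are lost on power cycle, which is one reason persistent storage matters for any later rotation or re-enrollment scheme.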
Known limitations
- Proxy mode: Not needed in practice, because direct QUIC on bare metal is the deployment path. Proxy mode remains documented as a fallback but is not a current target.
- CA-signed certificates: Not supported yet; trust is currently TOFU-only. A proper enrollment and CA workflow is future work.
- Key rotation: Not yet implemented. Requires persistent storage on bare metal.
Future Work
- Rotate keys and support short-lived certs.
- Add OCSP/CRL once a CA workflow exists.
- Keep transport docs aligned with QUIC-default policy and avoid reintroducing stale TCP-first wording.