Recipe 3: A Database Buffer Pool That Borrows Memory From the Network
Situation
Databases are often constrained by buffer pool size. If the working set grows beyond RAM, performance falls off a cliff as disk I/O increases.
Traditional options:
- Move to a bigger machine (restart, cost).
- Add cache nodes (complexity).
- Add replicas (not the same thing as more buffer pool).
In a disaggregated fabric, the “buffer pool” is not limited to local RAM. You can lease memory on other nodes and use it as a second-tier cache.
The goal: add capacity now, without moving the primary process.
What You Build
A two-tier buffer pool abstraction:
- Tier 1: low-latency “near” memory lease.
- Tier 2: a set of additional memory leases (shards) on other nodes.
Reads:
- Check tier 1.
- If miss, check tier 2.
- If miss, go to disk / storage.
Writes:
- Write-through or write-back depending on your system.
Building Blocks
- MemBuilder and MemLease for acquiring memory.
- FabricHashMap (or FabricVec) to store pages/blocks.
- Typed errors: FabricError::Disconnected, FabricError::LeaseExpired.
Related API docs:
- grafos-std memory API (source)
- FabricHashMap implementation (source)
- grafos-collections guide
- grafos-std README
Design
Key Choice
In a DB, the key is typically (file_id, page_no) or similar. For the recipe:
- Use a u128 page identifier, or serialize a tuple.
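One way to get a u128 page identifier from a (file_id, page_no) pair is to pack the two halves side by side. The sketch below assumes both parts fit in 64 bits; the function names and the bit layout are illustrative and not part of grafos-std.

```rust
/// Pack a (file_id, page_no) pair into one u128 key.
/// Assumption: both parts fit in 64 bits; this layout is hypothetical.
pub fn page_key(file_id: u64, page_no: u64) -> u128 {
    // file_id occupies the high 64 bits, page_no the low 64 bits.
    ((file_id as u128) << 64) | (page_no as u128)
}

/// Recover the (file_id, page_no) pair from a packed key.
pub fn split_page_key(key: u128) -> (u64, u64) {
    ((key >> 64) as u64, key as u64)
}
```

With this layout the low 64 bits hold only the page number, so any routing step that truncates the key to 64 bits ignores file_id. Mixing both halves before routing avoids sending page 0 of every file to the same shard.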
Value Choice
The value is a fixed-size page (e.g. 4 KiB, 8 KiB). FabricHashMap stores serialized values; fixed-size pages are a good fit because:
- You know the stride.
- You can avoid variable allocation.
Tier 2 Sharding
Tier 2 can be:
- A single large shard (simple, hotspot risk).
- Many shards (better parallelism and failure isolation).
Routing by hash is fine.
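As a rough sketch of hash routing, the function below folds both halves of the u128 key together, applies a splitmix64-style finalizer, and reduces the result modulo the shard count. The name shard_index and the choice of mixer are assumptions made for illustration; any well-distributed hash would do.

```rust
/// Pick a tier-2 shard for a page id.
/// Returns None when there are no shards, so the caller can go straight to storage.
/// Hypothetical helper; not part of grafos-std.
pub fn shard_index(id: u128, shard_count: usize) -> Option<usize> {
    if shard_count == 0 {
        return None;
    }
    // Fold the high half into the low half so both contribute to the hash.
    let mut x = (id as u64) ^ ((id >> 64) as u64).rotate_left(32);
    // splitmix64 finalizer: spreads nearby ids across the 64-bit space.
    x ^= x >> 30;
    x = x.wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x ^= x >> 27;
    x = x.wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^= x >> 31;
    Some((x % shard_count as u64) as usize)
}
```

The mapping depends on shard_count, so adding or removing a shard sends most keys to a different shard. For a cache that only costs extra misses until the shards warm up again.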
Consistency
This recipe is about capacity, not transactional semantics. You can treat the buffer pool as a cache:
- Stale reads are unacceptable for most DBs, so your DB engine still owns correctness.
- The buffer pool holds copies of pages; the source of truth remains storage.
Walkthrough (Implementation Sketch)
1. Tier 1
Acquire a lease and build a map:
    let l1 = MemBuilder::new().min_bytes(128 * 1024).acquire()?;
    let mut tier1: FabricHashMap<u128, [u8; 4096]> = FabricHashMap::new(l1, 16, 4096)?;

(The stride choices here are illustrative; match your actual serialization/layout.)
2. Tier 2
Acquire multiple leases:
    let mut tier2 = Vec::new();
    for _ in 0..4 {
        let lease = MemBuilder::new().min_bytes(256 * 1024).acquire()?;
        let map: FabricHashMap<u128, [u8; 4096]> = FabricHashMap::new(lease, 16, 4096)?;
        tier2.push(map);
    }

3. Read Path
    // The tiers are passed in explicitly: a Rust fn cannot capture surrounding locals.
    fn read_page(
        tier1: &mut FabricHashMap<u128, [u8; 4096]>,
        tier2: &mut [FabricHashMap<u128, [u8; 4096]>],
        id: u128,
    ) -> Result<[u8; 4096]> {
        if let Some(p) = tier1.get(&id)? {
            return Ok(p);
        }

        let idx = (id as usize) % tier2.len();
        if let Some(p) = tier2[idx].get(&id)? {
            // Promote to tier1 if desired.
            let _ = tier1.insert(&id, &p)?;
            return Ok(p);
        }

        // Fallback: storage.
        let p = disk_read(id)?;
        let _ = tier1.insert(&id, &p)?;
        Ok(p)
    }

4. Handling Remote Failure
Tier 2 is remote-ish. Failures happen. Decide behavior:
- If a tier2 shard is Disconnected, treat it as a miss and go to disk.
- Optionally reacquire a new lease and rebuild that shard.
In a real system, you would also account for latency differences: tier 2 is higher latency than tier 1, but still often lower than disk.
Failure Modes
- Disconnected: treat as a miss, rebuild shard opportunistically.
- LeaseExpired: shard died; drop it from tier2 and replace later.
Because leases have a TTL, a crash does not leave “remote shm segments” lying around.
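The two failure modes can be turned into explicit actions before the read path falls through to storage. The sketch below is a minimal illustration: FabricError here is a local stand-in with only the two variants named in this recipe, not the real grafos-std type, and Tier2Outcome and classify_tier2 are hypothetical names.

```rust
/// Local stand-in for the grafos-std error type (assumption: only these two variants matter here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FabricError {
    Disconnected,
    LeaseExpired,
}

/// What the read path should do after a tier-2 lookup. Hypothetical type.
#[derive(Debug, PartialEq, Eq)]
pub enum Tier2Outcome<P> {
    /// Page found in the shard.
    Hit(P),
    /// Shard is healthy but does not hold the page.
    Miss,
    /// Shard unreachable: read from storage now, rebuild the shard opportunistically.
    RebuildShard,
    /// Lease is gone: read from storage now, drop the shard and replace it later.
    DropShard,
}

/// Map the result of a tier-2 `get` onto an action instead of propagating the error.
pub fn classify_tier2<P>(result: Result<Option<P>, FabricError>) -> Tier2Outcome<P> {
    match result {
        Ok(Some(page)) => Tier2Outcome::Hit(page),
        Ok(None) => Tier2Outcome::Miss,
        Err(FabricError::Disconnected) => Tier2Outcome::RebuildShard,
        Err(FabricError::LeaseExpired) => Tier2Outcome::DropShard,
    }
}

impl<P> Tier2Outcome<P> {
    /// True whenever the caller has to fall through to storage.
    pub fn needs_disk_read(&self) -> bool {
        !matches!(self, Tier2Outcome::Hit(_))
    }
}
```

Classifying the result this way keeps a remote failure from surfacing as a read error: every outcome other than Hit ends in a storage read, and the shard maintenance can happen off the hot path.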
Observability
Track:
- l1 hit rate
- l2 hit rate
- l2 latency percentiles
- l2 shard health (disconnect count)
- total leased bytes
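A small counter struct is enough to derive the numbers above. The following is a sketch using only the standard library; PoolStats and its fields are hypothetical names, and keeping every latency sample in a Vec is a simplification (a production system would more likely use a histogram).

```rust
use std::time::Duration;

/// Counters for the two-tier pool. Hypothetical type; not part of grafos-std.
#[derive(Debug, Default)]
pub struct PoolStats {
    pub l1_hits: u64,
    pub l2_hits: u64,
    pub disk_reads: u64,
    pub l2_disconnects: u64,
    pub leased_bytes: u64,
    l2_latencies: Vec<Duration>,
}

impl PoolStats {
    /// Record the latency of one tier-2 lookup.
    pub fn record_l2_latency(&mut self, d: Duration) {
        self.l2_latencies.push(d);
    }

    /// Fraction of all reads served by tier 1; None before any read.
    pub fn l1_hit_rate(&self) -> Option<f64> {
        let total = self.l1_hits + self.l2_hits + self.disk_reads;
        if total == 0 { None } else { Some(self.l1_hits as f64 / total as f64) }
    }

    /// Fraction of tier-1 misses served by tier 2; None before any tier-1 miss.
    pub fn l2_hit_rate(&self) -> Option<f64> {
        let l1_misses = self.l2_hits + self.disk_reads;
        if l1_misses == 0 { None } else { Some(self.l2_hits as f64 / l1_misses as f64) }
    }

    /// Nearest-rank percentile of tier-2 latency, for p in (0, 100].
    pub fn l2_latency_percentile(&self, p: f64) -> Option<Duration> {
        if self.l2_latencies.is_empty() || !(p > 0.0 && p <= 100.0) {
            return None;
        }
        let mut sorted = self.l2_latencies.clone();
        sorted.sort();
        let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
        Some(sorted[rank.clamp(1, sorted.len()) - 1])
    }
}
```

Note that the tier-2 hit rate is measured against tier-1 misses rather than all reads, so it shows how much disk traffic tier 2 is absorbing.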
Variations
- Adaptive tier2: add or remove shards based on miss rate.
- Locality-aware: prefer nodes in the same rack for tier2 leases.
- Compression: store compressed pages in tier2.
- Write-back: hold dirty pages in tier1 and flush to storage asynchronously.
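For the adaptive tier2 variation, the resize decision can be a small pure function with two thresholds, so the pool does not oscillate around a single cutoff. This is only a sketch: the names and the threshold values (20% and 2%) are placeholders chosen for illustration, not recommendations.

```rust
/// Resize decision for tier 2. Hypothetical type.
#[derive(Debug, PartialEq, Eq)]
pub enum Resize {
    Grow,
    Shrink,
    Hold,
}

/// Decide whether to add or remove a tier-2 shard from the observed miss rate.
/// `miss_rate` is the fraction of reads that reached storage, in [0, 1].
pub fn plan_tier2_resize(
    miss_rate: f64,
    shards: usize,
    min_shards: usize,
    max_shards: usize,
) -> Resize {
    // Placeholder thresholds; the gap between them provides hysteresis.
    const GROW_ABOVE: f64 = 0.20;
    const SHRINK_BELOW: f64 = 0.02;

    if miss_rate > GROW_ABOVE && shards < max_shards {
        Resize::Grow
    } else if miss_rate < SHRINK_BELOW && shards > min_shards {
        Resize::Shrink
    } else {
        Resize::Hold
    }
}
```

If shards are selected by hash modulo the shard count, each resize remaps most keys, so expect a temporary rise in miss rate right after a change and avoid resizing again until it settles.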