Recipe 3: A Database Buffer Pool That Borrows Memory From the Network

Situation

Databases are often constrained by buffer pool size. If the working set grows beyond RAM, performance falls off a cliff as disk I/O increases.

Traditional options:

Move to a bigger machine (restart, cost).
Add cache nodes (complexity).
Add replicas (not the same thing as more buffer pool).

In a disaggregated fabric, the “buffer pool” is not limited to local RAM. You can lease memory on other nodes and use it as a second-tier cache.

The goal: add capacity now, without moving the primary process.

What You Build

A two-tier buffer pool abstraction:

Tier 1: low-latency “near” memory lease.
Tier 2: a set of additional memory leases (shards) on other nodes.

Reads:

Check tier 1.
If miss, check tier 2.
If miss, go to disk / storage.

Writes:

Write-through (update storage on every write) or write-back (keep dirty pages in the pool and flush them later), depending on your system.

Building Blocks

MemBuilder and MemLease for acquiring memory.
FabricHashMap (or FabricVec) to store pages/blocks.
Typed errors: FabricError::Disconnected, FabricError::LeaseExpired.

Design

Key Choice

In a DB, the key is typically (file_id, page_no) or similar. For the recipe:

Use a u128 page identifier, or serialize a tuple.
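One way to get a u128 identifier from a (file_id, page_no) pair is to pack the two halves. This is a minimal sketch that assumes both fields fit in a u64; the names page_key and split_page_key are hypothetical helpers, not part of the fabric API.

```rust
/// Hypothetical key layout: file_id in the high 64 bits, page_no in the low 64 bits.
pub fn page_key(file_id: u64, page_no: u64) -> u128 {
    ((file_id as u128) << 64) | page_no as u128
}

/// Inverse of `page_key`: recover (file_id, page_no) from a packed key.
pub fn split_page_key(key: u128) -> (u64, u64) {
    ((key >> 64) as u64, key as u64)
}
```

Packing keeps the key at a fixed 16 bytes, so no tuple serialization is needed on the lookup path.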

Value Choice

The value is a fixed-size page (e.g. 4 KiB, 8 KiB). FabricHashMap stores serialized values; fixed-size pages are a good fit because:

You know the stride.
You can avoid variable allocation.
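Knowing the stride also makes capacity planning simple arithmetic. The sketch below gives an upper bound on how many pages fit in a lease, assuming each entry costs one key stride plus one value stride and ignoring any metadata the map keeps (that overhead is not documented here, so treat the result as a ceiling). max_pages and the two constants are hypothetical names.

```rust
/// Illustrative strides: a u128 key and one 4 KiB page.
pub const KEY_STRIDE: usize = 16;
pub const PAGE_STRIDE: usize = 4096;

/// Upper bound on the number of pages a lease of `lease_bytes` can hold.
/// Assumes one key stride plus one value stride per entry and no map metadata.
pub fn max_pages(lease_bytes: usize, key_stride: usize, value_stride: usize) -> usize {
    let entry = key_stride + value_stride;
    if entry == 0 {
        return 0; // avoid dividing by zero on degenerate strides
    }
    lease_bytes / entry
}
```

Under these assumptions, a 128 KiB lease holds at most 31 pages and a 256 KiB lease at most 63, so the sizes used in the walkthrough are toy values.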

Tier 2 Sharding

Tier 2 can be:

A single large shard (simple, hotspot risk).
Many shards (better parallelism and failure isolation).

Routing by a hash of the page id is fine; the walkthrough uses a plain modulo for brevity.
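A sketch of hash routing, assuming ids are u128 values. It mixes the bits with a splitmix64-style finalizer before taking the modulo, so ids that differ only in the high half (for example, only in file id) do not all land on the same shard. mix64 and shard_for are hypothetical helpers.

```rust
/// splitmix64-style finalizer: spreads input bits across the whole output word.
fn mix64(mut x: u64) -> u64 {
    x ^= x >> 30;
    x = x.wrapping_mul(0xbf58476d1ce4e5b9);
    x ^= x >> 27;
    x = x.wrapping_mul(0x94d049bb133111eb);
    x ^= x >> 31;
    x
}

/// Pick a tier-2 shard for a page id. Returns None when there are no shards.
pub fn shard_for(id: u128, shard_count: usize) -> Option<usize> {
    if shard_count == 0 {
        return None;
    }
    let hi = (id >> 64) as u64;
    let lo = id as u64;
    // Mix the high half first so (hi, lo) and (lo, hi) do not collide trivially.
    let h = mix64(lo ^ mix64(hi));
    Some((h % shard_count as u64) as usize)
}
```

The mapping is deterministic, so the same id always routes to the same shard as long as the shard count does not change.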

Consistency

This recipe is about capacity, not transactional semantics. You can treat the buffer pool as a cache:

Stale reads are unacceptable for most DBs, so your DB engine still owns correctness: it must update or invalidate cached copies whenever a page changes.
The buffer pool holds copies of pages; the source of truth remains storage.
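To make the cache contract concrete, here is a minimal write-through sketch. It uses std HashMap values as stand-ins for both tiers and for storage (it does not call the fabric API), and the Pool type and its methods are hypothetical. On every write it updates storage first, refreshes tier 1, and invalidates the tier-2 copy so a later read cannot return a stale page.

```rust
use std::collections::HashMap;

type Page = [u8; 4096];

/// Stand-in pool: HashMaps play the role of tier 1, the tier-2 shards, and storage.
pub struct Pool {
    pub tier1: HashMap<u128, Page>,
    pub tier2: Vec<HashMap<u128, Page>>,
    pub storage: HashMap<u128, Page>,
}

impl Pool {
    pub fn new(shards: usize) -> Self {
        Pool {
            tier1: HashMap::new(),
            tier2: vec![HashMap::new(); shards],
            storage: HashMap::new(),
        }
    }

    /// Write-through: storage stays the source of truth, caches never hold older data.
    pub fn write_page(&mut self, id: u128, page: Page) {
        self.storage.insert(id, page); // 1. source of truth
        self.tier1.insert(id, page);   // 2. refresh the near copy
        if !self.tier2.is_empty() {
            // 3. drop any remote copy instead of paying a remote write
            let idx = (id % self.tier2.len() as u128) as usize;
            self.tier2[idx].remove(&id);
        }
    }
}
```

Invalidating tier 2 rather than updating it is a design choice: it keeps the write path to one remote operation, at the cost of a tier-2 miss on the next read after a tier-1 eviction.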

Walkthrough (Implementation Sketch)

1. Tier 1

Acquire a lease and build a map:

let l1 = MemBuilder::new().min_bytes(128 * 1024).acquire()?;
// Strides: 16 bytes per u128 key, 4096 bytes per page value.
let mut tier1: FabricHashMap<u128, [u8; 4096]> = FabricHashMap::new(l1, 16, 4096)?;

(The stride choices here are illustrative; match your actual serialization/layout.)

2. Tier 2

Acquire multiple leases:

let mut tier2 = Vec::new();
for _ in 0..4 {
    let lease = MemBuilder::new().min_bytes(256 * 1024).acquire()?;
    let map: FabricHashMap<u128, [u8; 4096]> = FabricHashMap::new(lease, 16, 4096)?;
    tier2.push(map);
}

3. Read Path

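type Page = [u8; 4096];
type PageMap = FabricHashMap<u128, Page>;

// The tiers are passed in explicitly; a free function cannot see local variables.
fn read_page(tier1: &mut PageMap, tier2: &mut [PageMap], id: u128) -> Result<Page> {
    // Tier 1: the low-latency lease.
    if let Some(p) = tier1.get(&id)? { return Ok(p); }

    // Tier 2: reduce in u128 first so the high bits of the id are not truncated.
    if !tier2.is_empty() {
        let idx = (id % tier2.len() as u128) as usize;
        // Note: `?` propagates shard errors; see step 4 for treating them as a miss.
        if let Some(p) = tier2[idx].get(&id)? {
            // Promote to tier1 if desired.
            let _ = tier1.insert(&id, &p)?;
            return Ok(p);
        }
    }

    // Fallback: storage.
    let p = disk_read(id)?;
    let _ = tier1.insert(&id, &p)?;
    Ok(p)
}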

4. Handling Remote Failure

Tier 2 is remote-ish. Failures happen. Decide behavior:

If a tier2 shard is Disconnected, treat it as a miss and go to disk.
Optionally reacquire a new lease and rebuild that shard.

In a real system, you would also account for latency differences: tier 2 has higher latency than tier 1 but is often still faster than disk.

Failure Modes

Disconnected: treat as a miss, rebuild shard opportunistically.
LeaseExpired: shard died; drop it from tier2 and replace later.
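The two failure rules can be sketched as a tier-2 lookup that never returns an error to the caller. This is a self-contained simulation: FabricError here is a local stand-in enum with the two variants named in this recipe, Shard wraps a std HashMap with an injectable fault, and Tier2 is a hypothetical wrapper. An expired shard is replaced with an empty slot rather than removed from the vector, an assumption made so that modulo routing for the other shards does not shift.

```rust
use std::collections::HashMap;

type Page = [u8; 4096];

/// Local stand-in for the fabric's typed errors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FabricError { Disconnected, LeaseExpired }

/// Stand-in shard; `fault` simulates a remote failure on every access.
pub struct Shard {
    pub pages: HashMap<u128, Page>,
    pub fault: Option<FabricError>,
}

impl Shard {
    pub fn get(&self, id: &u128) -> Result<Option<Page>, FabricError> {
        match self.fault {
            Some(e) => Err(e),
            None => Ok(self.pages.get(id).copied()),
        }
    }
}

/// Tier 2 with stable slots: a dead shard becomes None until it is replaced.
pub struct Tier2 {
    pub slots: Vec<Option<Shard>>,
    pub disconnects: u64,
}

impl Tier2 {
    /// Returns the page on a hit. Every failure degrades to a miss (None),
    /// so the caller simply falls through to storage.
    pub fn lookup(&mut self, id: u128) -> Option<Page> {
        if self.slots.is_empty() {
            return None;
        }
        let idx = (id % self.slots.len() as u128) as usize;
        let result = match &self.slots[idx] {
            Some(shard) => shard.get(&id),
            None => return None, // dropped earlier, not yet replaced
        };
        match result {
            Ok(hit) => hit,
            Err(FabricError::Disconnected) => {
                self.disconnects += 1; // keep the slot; rebuild opportunistically
                None
            }
            Err(FabricError::LeaseExpired) => {
                self.slots[idx] = None; // shard died; replace later
                None
            }
        }
    }
}
```

A background task could scan for None slots and fill them with freshly acquired leases; the replacement starts empty, so it can only produce misses, never stale hits.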

Because leases have TTL, a crash does not leave “remote shm segments” lying around.

Observability

Track:

l1 hit rate
l2 hit rate
l2 latency percentiles
l2 shard health (disconnect count)
total leased bytes
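A minimal sketch of a counters struct covering the list above. PoolStats and its fields are hypothetical names, latencies are kept in a plain Vec for simplicity (a production system would more likely use a histogram), and the percentile uses the nearest-rank method.

```rust
/// Hypothetical counters for the two-tier pool.
#[derive(Debug, Default)]
pub struct PoolStats {
    pub l1_hits: u64,
    pub l2_hits: u64,
    pub disk_reads: u64,
    pub l2_disconnects: u64,
    pub leased_bytes: u64,
    l2_latencies_us: Vec<u64>,
}

fn ratio(num: u64, den: u64) -> f64 {
    if den == 0 { 0.0 } else { num as f64 / den as f64 }
}

impl PoolStats {
    pub fn record_l2_latency(&mut self, micros: u64) {
        self.l2_latencies_us.push(micros);
    }

    /// Every read ends in exactly one of: tier-1 hit, tier-2 hit, disk read.
    pub fn lookups(&self) -> u64 {
        self.l1_hits + self.l2_hits + self.disk_reads
    }

    /// Fraction of all lookups served by tier 1.
    pub fn l1_hit_rate(&self) -> f64 {
        ratio(self.l1_hits, self.lookups())
    }

    /// Fraction of tier-1 misses served by tier 2.
    pub fn l2_hit_rate(&self) -> f64 {
        ratio(self.l2_hits, self.l2_hits + self.disk_reads)
    }

    /// Nearest-rank percentile (p in 0.0..=100.0) of recorded tier-2 latencies.
    pub fn l2_latency_percentile(&self, p: f64) -> Option<u64> {
        if self.l2_latencies_us.is_empty() {
            return None;
        }
        let mut sorted = self.l2_latencies_us.clone();
        sorted.sort_unstable();
        let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
        Some(sorted[rank.clamp(1, sorted.len()) - 1])
    }
}
```

Measuring the tier-2 hit rate against tier-1 misses only (rather than all lookups) shows directly how much disk traffic the leased memory is absorbing.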

Variations

Adaptive tier2: add or remove shards based on miss rate.
Locality-aware: prefer nodes in the same rack for tier2 leases.
Compression: store compressed pages in tier2.
Write-back: hold dirty pages in tier1 and flush to storage asynchronously.
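As an illustration of the adaptive variation, the sketch below picks a new shard count from the observed miss rate (the fraction of reads that reach disk). The function name and both thresholds are hypothetical; tune them against your own workload.

```rust
/// Decide the next tier-2 shard count from the observed miss rate.
/// Grows by one shard when misses are high, shrinks by one when they are rare.
/// Panics if `min > max` (a property of `usize::clamp`).
pub fn target_shards(current: usize, miss_rate: f64, min: usize, max: usize) -> usize {
    const GROW_ABOVE: f64 = 0.20;   // hypothetical threshold
    const SHRINK_BELOW: f64 = 0.02; // hypothetical threshold

    let next = if miss_rate > GROW_ABOVE {
        current + 1
    } else if miss_rate < SHRINK_BELOW {
        current.saturating_sub(1)
    } else {
        current
    };
    next.clamp(min, max)
}
```

With plain modulo routing, changing the shard count remaps most page ids, so expect a burst of tier-2 misses after each resize. It is probably safest to clear the shards when the count changes so that no copy stays reachable under an old mapping.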