Recipe 4: Real-Time Collaboration With Lease-Based Conflict Resolution
Situation
Two or more services must coordinate access to a shared resource: a scheduler, a coordinator, an index writer, a “single writer” for some domain.
Distributed locks are famously hard:
- If the lock holder crashes, someone must detect it.
- Heartbeats and timeouts are hard to tune.
- A separate coordination service (ZooKeeper/etcd) becomes a reliability dependency.
The fabric lease model changes the shape of the problem: holding the lock means holding a lease. If the holder dies, the lease expires, and the lock state is reclaimed.
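The lease idea can be illustrated without any fabric at all. The following is a minimal std-only sketch, not the grafos API: `LeaseLock`, `Lease`, `try_acquire`, and `renew` are hypothetical names, and time is a caller-supplied logical tick so the behaviour is deterministic. It shows that a holder that stops renewing loses the lock once its lease runs out, with no separate crash detector.

```rust
/// Hypothetical stand-in for a lease-backed lock; not the grafos API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Lease {
    holder: u128,
    expires_at: u64, // logical tick at which the lease stops being valid
}

#[derive(Debug, Default)]
struct LeaseLock {
    current: Option<Lease>,
}

impl LeaseLock {
    /// Take the lock at tick `now`. Succeeds if the lock is free or if the
    /// previous lease has expired (implicit reclaim).
    fn try_acquire(&mut self, holder: u128, now: u64, ttl: u64) -> bool {
        match self.current {
            Some(lease) if now < lease.expires_at => false,
            _ => {
                self.current = Some(Lease { holder, expires_at: now + ttl });
                true
            }
        }
    }

    /// Extend the lease. Fails if the caller no longer holds a live lease.
    fn renew(&mut self, holder: u128, now: u64, ttl: u64) -> bool {
        match self.current {
            Some(lease) if lease.holder == holder && now < lease.expires_at => {
                self.current = Some(Lease { holder, expires_at: now + ttl });
                true
            }
            _ => false,
        }
    }
}

fn main() {
    let mut lock = LeaseLock::default();
    assert!(lock.try_acquire(1, 0, 10)); // node 1 holds the lock until tick 10
    assert!(lock.renew(1, 8, 10)); // renewed: now valid until tick 18
    // Node 1 "crashes" here and never renews again.
    assert!(!lock.try_acquire(2, 17, 10)); // lease still live, node 2 must wait
    assert!(lock.try_acquire(2, 18, 10)); // lease expired, node 2 takes over
    assert!(!lock.renew(1, 19, 10)); // the old holder cannot renew any more
    println!("final state: {:?}", lock.current);
}
```

Note that expiry alone does not tell the old holder it has lost the lock; that gap is what fencing closes.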
What You Build
A leader-election and coordination pattern using:
- grafos_sync::FabricMutex<T> to select a leader and protect shared leader state.
- grafos_sync::watch() to broadcast leader identity/epoch.
- grafos_sync::FabricBarrier to coordinate phase transitions (optional).
The key part: leadership is treated as fenced, not “exactly once”.
Building Blocks
- MemBuilder to acquire memory leases for sync primitives.
- FabricMutex<T> (requires T: Copy) to protect a small leader record.
- watch() for fan-out notifications.
- grafos_fence::{FenceEpoch, FenceGuard, Fenced<T>} for epoch-based fencing (source)
See also:
- grafos-sync mutex demo
- grafos-sync guide
- grafos-sync README
- FabricMutex implementation (source)
- watch() implementation (source)
Design
The Mutex Value
Because FabricMutex<T> stores a Copy value, store only small metadata:
- leader_id: u128 (or hash)
- leader_epoch: FenceEpoch
The mutex protects the authoritative epoch increment.
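To see why the increment has to happen under the lock, here is a minimal sketch that models the record with `std::sync::Mutex` instead of `FabricMutex<T>` (an assumption made so the example runs with the standard library only; `claim_leadership` and `run_election` are hypothetical helpers, and the epoch is a plain `u64`). Every contender that gets the lock observes a distinct, strictly larger epoch.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Small Copy record, mirroring the shape described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LeaderRecord {
    epoch: u64,
    leader: u128,
}

/// Take the lock, bump the epoch, and record the caller as leader.
/// Returns the epoch this contender now owns.
fn claim_leadership(record: &Mutex<LeaderRecord>, node_id: u128) -> u64 {
    let mut guard = record.lock().unwrap();
    guard.epoch += 1;
    guard.leader = node_id;
    guard.epoch
}

/// Run `n` contenders concurrently. Returns the epochs they obtained
/// (sorted) and the final record.
fn run_election(n: u128) -> (Vec<u64>, LeaderRecord) {
    let record = Arc::new(Mutex::new(LeaderRecord { epoch: 0, leader: 0 }));
    let handles: Vec<_> = (1..=n)
        .map(|id| {
            let record = Arc::clone(&record);
            thread::spawn(move || claim_leadership(&record, id))
        })
        .collect();
    let mut epochs: Vec<u64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    epochs.sort();
    let final_record = *record.lock().unwrap();
    (epochs, final_record)
}

fn main() {
    let (epochs, record) = run_election(4);
    // No two contenders ever share an epoch, whatever the interleaving.
    println!("epochs handed out: {:?}", epochs);
    println!("last writer: node {} at epoch {}", record.leader, record.epoch);
}
```

Which node ends up as the last writer depends on thread scheduling; the set of epochs handed out does not.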
Fencing Rule
Any write performed “as leader” must include its FenceEpoch.
Workers hold a FenceGuard that rejects stale epochs via FenceGuard::check(). This addresses:
- delayed messages
- retries
- partitions where an old leader continues running
This is the key complement to lease TTL. TTL gets you bounded automatic release; fencing gets you safety when the world is messy.
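The rule can be sketched with a tiny worker-side check. This is a std-only stand-in, not `grafos_fence`: `EpochFence`, `StaleEpoch`, and `admit` are hypothetical names, epochs are plain `u64` values, and the sketch folds "check" and "advance" into one call for brevity.

```rust
/// Hypothetical worker-side fence; a stand-in for illustration only.
#[derive(Debug)]
struct EpochFence {
    highest_seen: u64,
}

#[derive(Debug, PartialEq, Eq)]
struct StaleEpoch {
    got: u64,
    current: u64,
}

impl EpochFence {
    fn new(epoch: u64) -> Self {
        Self { highest_seen: epoch }
    }

    /// Accept a write tagged with `epoch` unless it is older than the newest
    /// epoch seen so far. Newer epochs are remembered, so anything the
    /// previous leader sends afterwards is rejected.
    fn admit(&mut self, epoch: u64) -> Result<(), StaleEpoch> {
        if epoch < self.highest_seen {
            return Err(StaleEpoch { got: epoch, current: self.highest_seen });
        }
        self.highest_seen = epoch;
        Ok(())
    }
}

/// Apply (epoch, payload) writes in arrival order; return the accepted payloads.
fn apply_writes(fence: &mut EpochFence, writes: &[(u64, &'static str)]) -> Vec<&'static str> {
    let mut accepted = Vec::new();
    for &(epoch, payload) in writes {
        match fence.admit(epoch) {
            Ok(()) => accepted.push(payload),
            Err(e) => println!("rejected {:?}: epoch {} < {}", payload, e.got, e.current),
        }
    }
    accepted
}

fn main() {
    let mut fence = EpochFence::new(1);
    // The old leader (epoch 1) is partitioned; its last message arrives late,
    // after the new leader (epoch 2) has already written.
    let writes = [(1, "a"), (2, "b"), (1, "late-a"), (2, "retry-b")];
    let accepted = apply_writes(&mut fence, &writes);
    println!("accepted: {:?}", accepted);
}
```

A retry at the current epoch is still admitted here; fencing rejects stale leaders, while deduplicating retries from the current leader is a separate concern.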
Walkthrough (Implementation Sketch)
1. Acquire Lease and Create Mutex
    use grafos_sync::FabricMutex;
    use grafos_std::mem::MemBuilder;
    use grafos_fence::{FenceEpoch, FenceGuard};

    #[derive(Copy, Clone)]
    struct LeaderRecord {
        epoch: FenceEpoch,
        leader: u128,
    }

    let lease = MemBuilder::new().min_bytes(4096).acquire()?;
    let mtx = FabricMutex::new(
        lease,
        0,
        LeaderRecord { epoch: FenceEpoch::new(0), leader: 0 },
    )?;
2. Leader Election
Each contender tries to lock:
    let holder_id = my_node_id;
    let mut guard = mtx.lock(holder_id, 100)?;
    guard.epoch = guard.epoch.bump();
    guard.leader = my_node_id;
    let my_epoch = guard.epoch;
    drop(guard);
At this point, you have:
- leader identity
- leader epoch (as FenceEpoch)
3. Broadcast Leader Info
Use watch() with a shared arena offset:
- The leader publishes (leader_id, leader_epoch) whenever it changes.
- Workers subscribe and update their current view.
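The publish/subscribe shape can be sketched as a versioned cell that subscribers poll. This is a std-only model and an assumption about the general pattern, not a description of how `grafos_sync::watch()` is implemented; `WatchCell`, `LeaderView`, `publish`, and `poll` are hypothetical names.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LeaderView {
    leader_id: u128,
    leader_epoch: u64,
}

/// Hypothetical versioned cell: the leader overwrites the value, and each
/// subscriber remembers the last version it has seen.
struct WatchCell {
    // (version, latest value); the version grows by one on every publish.
    state: Mutex<(u64, LeaderView)>,
}

impl WatchCell {
    fn new(initial: LeaderView) -> Self {
        Self { state: Mutex::new((0, initial)) }
    }

    /// Leader side: replace the published view.
    fn publish(&self, view: LeaderView) {
        let mut state = self.state.lock().unwrap();
        state.0 += 1;
        state.1 = view;
    }

    /// Worker side: return the latest view if it changed since `last_seen`.
    fn poll(&self, last_seen: &mut u64) -> Option<LeaderView> {
        let state = self.state.lock().unwrap();
        if state.0 > *last_seen {
            *last_seen = state.0;
            Some(state.1)
        } else {
            None
        }
    }
}

fn main() {
    let cell = Arc::new(WatchCell::new(LeaderView { leader_id: 0, leader_epoch: 0 }));
    let worker_cell = Arc::clone(&cell);
    let mut seen = 0u64;

    assert_eq!(worker_cell.poll(&mut seen), None); // nothing published yet
    cell.publish(LeaderView { leader_id: 7, leader_epoch: 1 });
    println!("worker sees: {:?}", worker_cell.poll(&mut seen));
    assert_eq!(worker_cell.poll(&mut seen), None); // no change since last poll
}
```

In this model a worker that polls rarely sees only the latest view, not every intermediate one. That is acceptable for leader identity, because only the newest epoch matters.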
4. Workers Enforce Fencing
Each worker holds a FenceGuard initialized from the last known epoch:
    let mut fence = FenceGuard::new(current_epoch);

    // On each leader message:
    match fence.check(msg.epoch) {
        Ok(()) => { /* accept: epoch is current */ }
        Err(_stale) => { /* reject: stale epoch */ }
    }
If the worker sees a newer epoch, it calls fence.advance() to update its view:
    let new_epoch = fence.advance();
5. Failure and Recovery
If the leader crashes:
- It stops renewing the lease (or loses connectivity).
- The lock becomes available after TTL / reclaim.
- Another contender locks, increments epoch, and becomes leader.
Once workers observe the new epoch, they reject any message that still carries the old one.
Failure Modes
- Lease expires while leader is active (renewal bug): leadership may flap.
  - Observability should catch LeaseExpired events; renew early (60-80% of TTL).
- Partition: old leader continues running but cannot renew / cannot reach fabric.
  - Fencing protects correctness.
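The "renew early" guidance reduces to a small piece of arithmetic. The sketch below is a hypothetical helper, not part of grafos: `renew_after`, `next_action`, and `LeaseAction` are illustrative names. Stepping down once the full TTL has elapsed without a successful renewal is an assumption about sensible leader behaviour, not something the fabric is known to enforce.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
enum LeaseAction {
    Wait,     // lease is fresh, nothing to do
    Renew,    // past the early-renewal point, renew now
    StepDown, // TTL elapsed without renewal: assume leadership is lost
}

/// Delay after the last successful acquire/renew at which to renew,
/// expressed as a percentage of the TTL (for example 60 to 80).
fn renew_after(ttl: Duration, percent: u32) -> Duration {
    assert!((1..100).contains(&percent), "percent must be in 1..100");
    ttl * percent / 100
}

/// Decide what the leader should do `elapsed` after its last renewal.
fn next_action(elapsed: Duration, ttl: Duration, percent: u32) -> LeaseAction {
    if elapsed >= ttl {
        LeaseAction::StepDown
    } else if elapsed >= renew_after(ttl, percent) {
        LeaseAction::Renew
    } else {
        LeaseAction::Wait
    }
}

fn main() {
    let ttl = Duration::from_secs(10);
    for secs in [3, 7, 10] {
        let action = next_action(Duration::from_secs(secs), ttl, 70);
        println!("t+{}s -> {:?}", secs, action);
    }
}
```

Renewing at 70% of a 10-second TTL leaves 3 seconds of slack for a slow or retried renewal before the lease lapses.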
Observability
Track:
- leader_epoch gauge
- leader_changes_total
- mutex_lock_attempts_total / lock_fail_total
- stale_epoch_reject_total
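One way to hold these values in-process is a struct of atomic counters, sketched below with the standard library only. The metric names come from the list above; the `LeaderMetrics` type and its `record_*` methods are hypothetical, and exporting the values to a metrics backend is out of scope here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical in-process holder for the metrics listed above.
#[derive(Debug, Default)]
struct LeaderMetrics {
    leader_epoch: AtomicU64, // gauge: last epoch this process observed
    leader_changes_total: AtomicU64,
    mutex_lock_attempts_total: AtomicU64,
    lock_fail_total: AtomicU64,
    stale_epoch_reject_total: AtomicU64,
}

impl LeaderMetrics {
    /// Call after every lock attempt, successful or not.
    fn record_lock_attempt(&self, acquired: bool) {
        self.mutex_lock_attempts_total.fetch_add(1, Ordering::Relaxed);
        if !acquired {
            self.lock_fail_total.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Call when a new leader epoch is observed.
    fn record_leader_change(&self, new_epoch: u64) {
        self.leader_epoch.store(new_epoch, Ordering::Relaxed);
        self.leader_changes_total.fetch_add(1, Ordering::Relaxed);
    }

    /// Call when a worker rejects a message with a stale epoch.
    fn record_stale_reject(&self) {
        self.stale_epoch_reject_total.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let metrics = LeaderMetrics::default();
    metrics.record_lock_attempt(false);
    metrics.record_lock_attempt(true);
    metrics.record_leader_change(5);
    metrics.record_stale_reject();
    println!("{:?}", metrics);
}
```

A rising `leader_changes_total` with a steady `lock_fail_total` is one signal of flapping caused by late renewals.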
Variations
- Write-ahead leader log: persist epoch transitions to block storage.
- Multiple leaders: partition by keyspace, one mutex per partition.
- Barrier phases: use FabricBarrier to coordinate “drain then switch” transitions.
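The "one mutex per partition" variation can be sketched as follows. This is a std-only model: `std::sync::Mutex` stands in for `FabricMutex<T>`, and `PartitionedLeaders`, `partition_for`, and `fnv1a` are hypothetical names. FNV-1a is used only because it is deterministic across processes; any stable hash would do.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, Default)]
struct LeaderRecord {
    epoch: u64,
    leader: u128,
}

/// 64-bit FNV-1a: a simple hash that gives the same result in every process.
fn fnv1a(key: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &byte in key {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Map a key to one of `partitions` slots.
fn partition_for(key: &str, partitions: usize) -> usize {
    assert!(partitions > 0, "need at least one partition");
    (fnv1a(key.as_bytes()) % partitions as u64) as usize
}

/// One independently locked leader record per partition.
struct PartitionedLeaders {
    slots: Vec<Mutex<LeaderRecord>>,
}

impl PartitionedLeaders {
    fn new(partitions: usize) -> Self {
        Self {
            slots: (0..partitions).map(|_| Mutex::new(LeaderRecord::default())).collect(),
        }
    }

    /// Claim leadership of the partition that owns `key`.
    /// Returns (partition index, new epoch for that partition).
    fn claim(&self, key: &str, node_id: u128) -> (usize, u64) {
        let idx = partition_for(key, self.slots.len());
        let mut record = self.slots[idx].lock().unwrap();
        record.epoch += 1;
        record.leader = node_id;
        (idx, record.epoch)
    }

    /// Current leader of the partition that owns `key`.
    fn leader_for(&self, key: &str) -> u128 {
        let idx = partition_for(key, self.slots.len());
        self.slots[idx].lock().unwrap().leader
    }
}

fn main() {
    let leaders = PartitionedLeaders::new(4);
    let (partition, epoch) = leaders.claim("users/42", 1);
    println!("node 1 leads partition {} at epoch {}", partition, epoch);
    println!("leader for users/42: {}", leaders.leader_for("users/42"));
}
```

Epochs are per partition in this sketch, so a fenced message needs to carry the partition index alongside its epoch.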