Recipe 4: Real-Time Collaboration With Lease-Based Conflict Resolution

Situation

Two or more services must coordinate access to a shared resource: a scheduler, a coordinator, an index writer, a “single writer” for some domain.

Distributed locks are famously hard:

  • If the lock holder crashes, someone must detect it.
  • Heartbeats and timeouts are hard to tune.
  • A separate coordination service (ZooKeeper/etcd) becomes a reliability dependency.

The fabric lease model changes the shape of the problem: holding the lock means holding a lease. If the holder dies, the lease expires, and the lock state is reclaimed.
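To make the lease model concrete, here is a minimal std-only sketch with a logical clock. `LeaseTable`, `try_acquire`, and `renew` are hypothetical stand-ins, not the grafos API. The holder keeps the resource only while it renews before the TTL elapses, and once a lease has expired anyone may claim it.

```rust
/// Hypothetical single-resource lease table (not the grafos API).
/// Time is a logical tick counter so the example is deterministic.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Lease {
    holder: u128,
    expires_at: u64, // tick at which the lease lapses
}

struct LeaseTable {
    ttl: u64,
    current: Option<Lease>,
}

impl LeaseTable {
    fn new(ttl: u64) -> Self {
        LeaseTable { ttl, current: None }
    }

    /// Grants the lease if it is free or expired; otherwise returns None.
    fn try_acquire(&mut self, holder: u128, now: u64) -> Option<Lease> {
        match self.current {
            Some(l) if now < l.expires_at => None, // still held by someone
            _ => {
                let lease = Lease { holder, expires_at: now + self.ttl };
                self.current = Some(lease);
                Some(lease)
            }
        }
    }

    /// Extends the lease, but only for the current, unexpired holder.
    fn renew(&mut self, holder: u128, now: u64) -> bool {
        match self.current {
            Some(l) if l.holder == holder && now < l.expires_at => {
                self.current = Some(Lease { holder, expires_at: now + self.ttl });
                true
            }
            _ => false,
        }
    }
}

fn main() {
    let mut table = LeaseTable::new(10);
    assert!(table.try_acquire(1, 0).is_some()); // node 1 holds [0, 10)
    assert!(table.try_acquire(2, 5).is_none()); // node 2 must wait
    assert!(table.renew(1, 8)); // node 1 renews: now [8, 18)
    // Node 1 crashes and stops renewing; from tick 18 the lease is reclaimable.
    assert!(table.try_acquire(2, 17).is_none());
    let lease = table.try_acquire(2, 18).unwrap();
    println!("node {} took over, expires at tick {}", lease.holder, lease.expires_at);
}
```

No heartbeat protocol is needed to detect the crash: the holder's silence is enough, because the lease simply stops being extended.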

What You Build

A leader-election and coordination pattern using:

  • grafos_sync::FabricMutex<T> to select a leader and protect shared leader state.
  • grafos_sync::watch() to broadcast leader identity/epoch.
  • grafos_sync::FabricBarrier to coordinate phase transitions (optional).

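The key idea: leadership is fenced, not “exactly once”. Two nodes may briefly both believe they are leader, but only writes tagged with the newest epoch are accepted.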

Building Blocks

  • MemBuilder to acquire memory leases for sync primitives.
  • FabricMutex<T> (requires T: Copy) to protect a small leader record.
  • watch() for fan-out notifications.
  • grafos_fence::{FenceEpoch, FenceGuard, Fenced<T>} for epoch-based fencing

Design

The Mutex Value

Because FabricMutex<T> stores a Copy value, store only small metadata:

  • leader: u128 (the leader's node ID, or a hash of it)
  • epoch: FenceEpoch

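The mutex serializes the epoch increment: each contender reads, bumps, and writes the epoch while holding the lock, so every new leader receives an epoch strictly greater than its predecessor's.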
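The epoch only works as a fencing token if its increment is serialized; otherwise two contenders could both read epoch N and both claim N+1. The sketch below models this with `std::sync::Mutex` standing in for `FabricMutex` and a plain `u64` standing in for `FenceEpoch` (both assumptions). `take_leadership` is a hypothetical helper, and eight threads play the contenders.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Small Copy record, mirroring the walkthrough's LeaderRecord.
/// `epoch` is a plain u64 standing in for FenceEpoch (assumption).
#[derive(Copy, Clone, Debug)]
struct LeaderRecord {
    epoch: u64,
    leader: u128,
}

/// Hypothetical helper: bump the epoch and record ourselves as leader,
/// all while holding the lock, then return the epoch we were granted.
fn take_leadership(mtx: &Mutex<LeaderRecord>, node_id: u128) -> u64 {
    let mut rec = mtx.lock().unwrap();
    rec.epoch += 1;
    rec.leader = node_id;
    rec.epoch
} // guard dropped here, releasing the lock

fn main() {
    let mtx = Arc::new(Mutex::new(LeaderRecord { epoch: 0, leader: 0 }));
    let handles: Vec<_> = (1..=8u128)
        .map(|id| {
            let mtx = Arc::clone(&mtx);
            thread::spawn(move || take_leadership(&mtx, id))
        })
        .collect();
    let mut epochs: Vec<u64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    epochs.sort();
    // Every contender got a distinct epoch: 1..=8 in some order.
    assert_eq!(epochs, (1..=8).collect::<Vec<u64>>());
    let rec = *mtx.lock().unwrap();
    println!("last leader {} at epoch {}", rec.leader, rec.epoch);
}
```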

Fencing Rule

Any write performed “as leader” must include its FenceEpoch.

Workers hold a FenceGuard that rejects stale epochs via FenceGuard::check(). This addresses:

  • delayed messages
  • retries
  • partitions where an old leader continues running

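This is the key complement to lease TTL. The TTL bounds how long a dead holder can block everyone else; fencing keeps the system safe when messages are delayed or retried, or when a deposed leader has not yet noticed that it lost the lease.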
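The fencing rule itself needs no fabric to illustrate. In this hypothetical sketch, `FencedStore` (not a grafos type) remembers the highest epoch it has accepted and refuses writes tagged with a lower one, so a delayed write from a deposed leader cannot overwrite its successor's state. Accepting equal epochs, so that one leader can issue many writes, is a design choice of the sketch; `grafos_fence` may behave differently.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum WriteError {
    StaleEpoch { got: u64, current: u64 },
}

/// Hypothetical fenced key-value store: every write carries the writer's epoch.
struct FencedStore {
    highest_epoch: u64,
    data: HashMap<String, String>,
}

impl FencedStore {
    fn new() -> Self {
        FencedStore { highest_epoch: 0, data: HashMap::new() }
    }

    fn write(&mut self, epoch: u64, key: &str, value: &str) -> Result<(), WriteError> {
        if epoch < self.highest_epoch {
            // A newer leader has already written here: reject the old one.
            return Err(WriteError::StaleEpoch { got: epoch, current: self.highest_epoch });
        }
        self.highest_epoch = epoch; // equal or newer: accept and remember it
        self.data.insert(key.to_string(), value.to_string());
        Ok(())
    }
}

fn main() {
    let mut store = FencedStore::new();
    store.write(1, "owner", "node-a").unwrap(); // leader A, epoch 1
    store.write(2, "owner", "node-b").unwrap(); // leader B takes over, epoch 2
    // A's retry arrives late, still tagged with epoch 1.
    let late = store.write(1, "owner", "node-a");
    assert_eq!(late, Err(WriteError::StaleEpoch { got: 1, current: 2 }));
    println!("owner = {}", store.data["owner"]); // node-b
}
```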

Walkthrough (Implementation Sketch)

1. Acquire Lease and Create Mutex

use grafos_sync::FabricMutex;
use grafos_std::mem::MemBuilder;
use grafos_fence::{FenceEpoch, FenceGuard};

// Small Copy record stored inside the mutex.
#[derive(Copy, Clone)]
struct LeaderRecord {
    epoch: FenceEpoch,
    leader: u128,
}

// Lease a small region of fabric memory to back the mutex.
let lease = MemBuilder::new().min_bytes(4096).acquire()?;
let mtx = FabricMutex::new(
    lease,
    0, // offset within the leased region (assumed)
    LeaderRecord { epoch: FenceEpoch::new(0), leader: 0 },
)?;

2. Leader Election

Each contender tries to lock:

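let holder_id = my_node_id;
let mut guard = mtx.lock(holder_id, 100)?;
// While holding the lock: bump the epoch and record ourselves as leader.
guard.epoch = guard.epoch.bump();
guard.leader = my_node_id;
let my_epoch = guard.epoch; // keep a copy to tag every leader write
drop(guard); // release the mutex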

At this point, you have:

  • leader identity
  • leader epoch (as FenceEpoch)

3. Broadcast Leader Info

Use watch() with a shared arena offset:

  • The leader publishes (leader_id, leader_epoch) whenever it changes.
  • Workers subscribe and update their current view.
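The watch semantics this step relies on (subscribers care about the latest value, not every intermediate one) can be modelled in std Rust. `Watch`, `Subscriber`, and `LeaderInfo` are hypothetical names, and an in-process `Arc<Mutex<_>>` stands in for the shared arena; this is not the `grafos_sync::watch()` API.

```rust
use std::sync::{Arc, Mutex};

#[derive(Copy, Clone, Debug, PartialEq)]
struct LeaderInfo {
    leader_id: u128,
    epoch: u64,
}

/// Hypothetical watch cell: stores only the latest value plus a version counter.
struct Watch<T: Copy> {
    inner: Arc<Mutex<(u64, T)>>, // (version, value)
}

struct Subscriber<T: Copy> {
    inner: Arc<Mutex<(u64, T)>>,
    seen_version: u64,
}

impl<T: Copy> Watch<T> {
    fn new(initial: T) -> Self {
        Watch { inner: Arc::new(Mutex::new((0, initial))) }
    }

    /// Overwrites the value and bumps the version.
    fn publish(&self, value: T) {
        let mut slot = self.inner.lock().unwrap();
        let next = slot.0 + 1;
        *slot = (next, value);
    }

    fn subscribe(&self) -> Subscriber<T> {
        let seen_version = self.inner.lock().unwrap().0;
        Subscriber { inner: Arc::clone(&self.inner), seen_version }
    }
}

impl<T: Copy> Subscriber<T> {
    /// Returns the latest value if it changed since the last call. Values
    /// published in between are skipped, which is fine for leader info.
    fn changed(&mut self) -> Option<T> {
        let (version, value) = *self.inner.lock().unwrap();
        if version == self.seen_version {
            return None;
        }
        self.seen_version = version;
        Some(value)
    }
}

fn main() {
    let watch = Watch::new(LeaderInfo { leader_id: 0, epoch: 0 });
    let mut worker = watch.subscribe();
    assert_eq!(worker.changed(), None); // nothing new yet
    watch.publish(LeaderInfo { leader_id: 1, epoch: 1 });
    watch.publish(LeaderInfo { leader_id: 2, epoch: 2 });
    // The worker sees only the latest leader, not every transition.
    assert_eq!(worker.changed(), Some(LeaderInfo { leader_id: 2, epoch: 2 }));
    assert_eq!(worker.changed(), None);
}
```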

4. Workers Enforce Fencing

Each worker holds a FenceGuard initialized from the last known epoch:

let mut fence = FenceGuard::new(current_epoch);
// On each leader message:
match fence.check(msg.epoch) {
    Ok(()) => { /* accept: epoch is current */ }
    Err(_stale) => { /* reject: stale epoch */ }
}

If the worker sees a newer epoch, it calls fence.advance() to update its view:

let new_epoch = fence.advance();
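A worker's message loop might combine the check and the advance as follows. `WorkerFence` and `Verdict` are hypothetical std-only stand-ins for `FenceGuard`; unlike the `fence.advance()` call above, this `observe` method takes the message's epoch, since the stand-in has no other way to learn the new value. It also counts rejections, which is what the `stale_epoch_reject_total` metric listed under Observability would report.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Accept,
    AdvanceAndAccept,
    RejectStale,
}

/// Hypothetical worker-side fence (stand-in for grafos_fence::FenceGuard).
struct WorkerFence {
    current: u64,
    stale_rejects: u64, // feeds a stale_epoch_reject_total counter
}

impl WorkerFence {
    fn new(current: u64) -> Self {
        WorkerFence { current, stale_rejects: 0 }
    }

    fn observe(&mut self, msg_epoch: u64) -> Verdict {
        if msg_epoch < self.current {
            self.stale_rejects += 1;
            Verdict::RejectStale
        } else if msg_epoch > self.current {
            self.current = msg_epoch; // newer leader: move our view forward
            Verdict::AdvanceAndAccept
        } else {
            Verdict::Accept
        }
    }
}

fn main() {
    let mut fence = WorkerFence::new(1);
    // Arrival order: an epoch-1 retry shows up after epoch 2 has taken over.
    for epoch in [1, 2, 1, 2, 3] {
        println!("epoch {} -> {:?}", epoch, fence.observe(epoch));
    }
    assert_eq!((fence.current, fence.stale_rejects), (3, 1));
}
```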

5. Failure and Recovery

If the leader crashes:

  • It stops renewing the lease (or loses connectivity).
  • The lock becomes available after TTL / reclaim.
  • Another contender locks, increments epoch, and becomes leader.

Once workers have observed the new epoch, they reject any message that still carries the old one.
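The recovery sequence can be simulated end to end with a logical clock. Everything here is a hypothetical stand-in: `Coordinator` folds the lease and the leader record into one struct, and the TTL, tick values, and node IDs are illustrative. Leader A renews for the last time at tick 5, B takes over once A's lease lapses, and A's delayed message is rejected because it carries the old epoch.

```rust
/// Hypothetical coordinator combining a TTL lease with the leader epoch.
struct Coordinator {
    ttl: u64,
    holder: Option<(u128, u64)>, // (node id, lease expiry tick)
    epoch: u64,
}

impl Coordinator {
    /// Takes leadership if the lease is free or expired; returns the new epoch.
    fn try_lead(&mut self, node: u128, now: u64) -> Option<u64> {
        if let Some((_, expires_at)) = self.holder {
            if now < expires_at {
                return None;
            }
        }
        self.epoch += 1; // every takeover gets a strictly larger epoch
        self.holder = Some((node, now + self.ttl));
        Some(self.epoch)
    }

    /// Extends the lease for the current, unexpired holder only.
    fn renew(&mut self, node: u128, now: u64) -> bool {
        match self.holder {
            Some((n, exp)) if n == node && now < exp => {
                self.holder = Some((node, now + self.ttl));
                true
            }
            _ => false,
        }
    }
}

fn main() {
    let mut c = Coordinator { ttl: 10, holder: None, epoch: 0 };
    let epoch_a = c.try_lead(0xA, 0).unwrap(); // A leads with epoch 1
    assert!(c.renew(0xA, 5)); // last renewal: lease now ends at tick 15
    // A crashes. B keeps trying until the lease lapses.
    assert_eq!(c.try_lead(0xB, 14), None);
    let epoch_b = c.try_lead(0xB, 15).unwrap(); // B leads with epoch 2

    // A worker that has seen B's epoch rejects A's delayed message.
    let worker_epoch = epoch_b;
    let a_message_accepted = epoch_a >= worker_epoch;
    assert!(!a_message_accepted);
    println!("A epoch {}, B epoch {}, A accepted: {}", epoch_a, epoch_b, a_message_accepted);
}
```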

Failure Modes

  • Lease expires while the leader is still active (renewal bug): leadership may flap.
    • Make sure observability catches LeaseExpired events, and renew early (at 60-80% of the TTL).
  • Partition: the old leader keeps running but cannot renew its lease or reach the fabric.
    • Fencing protects correctness: the old leader's writes carry a stale epoch and are rejected once the new epoch is known.
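The renew-early guidance is simple arithmetic. A hypothetical helper that schedules the next renewal at a fraction of the TTL (0.7 here, inside the 60-80% band) might look like this; with a 10 s TTL the leader renews 7 s after each grant, leaving 3 s of slack for a slow or retried renewal.

```rust
use std::time::Duration;

/// Hypothetical helper: how long after a grant to renew, as a fraction of the TTL.
/// Panics if the fraction is outside the recommended 60-80% band.
fn renewal_delay(ttl: Duration, fraction: f64) -> Duration {
    assert!((0.6..=0.8).contains(&fraction), "renew at 60-80% of TTL");
    ttl.mul_f64(fraction)
}

fn main() {
    let ttl = Duration::from_secs(10);
    let delay = renewal_delay(ttl, 0.7);
    let slack = ttl - delay; // time left to retry a failed renewal
    println!("renew after {:?}, slack {:?}", delay, slack);
}
```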

Observability

Track:

  • leader_epoch gauge
  • leader_changes_total
  • mutex_lock_attempts_total / lock_fail_total
  • stale_epoch_reject_total
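One way to hold these counters in-process is a struct of atomics. The field names mirror the list above, but `ElectionMetrics` and its methods are hypothetical, and exporting the values to a metrics backend is out of scope. In this sketch `leader_changes_total` only increments when the observed epoch actually moves forward.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical in-process metrics for the leader-election recipe.
#[derive(Default)]
struct ElectionMetrics {
    leader_epoch: AtomicU64, // gauge: highest epoch observed
    leader_changes_total: AtomicU64,
    mutex_lock_attempts_total: AtomicU64,
    lock_fail_total: AtomicU64,
    stale_epoch_reject_total: AtomicU64,
}

impl ElectionMetrics {
    /// Records an observed epoch; counts a leader change only if it moved forward.
    fn observe_epoch(&self, epoch: u64) {
        let prev = self.leader_epoch.fetch_max(epoch, Ordering::Relaxed);
        if epoch > prev {
            self.leader_changes_total.fetch_add(1, Ordering::Relaxed);
        }
    }

    fn record_lock_attempt(&self, succeeded: bool) {
        self.mutex_lock_attempts_total.fetch_add(1, Ordering::Relaxed);
        if !succeeded {
            self.lock_fail_total.fetch_add(1, Ordering::Relaxed);
        }
    }

    fn record_stale_reject(&self) {
        self.stale_epoch_reject_total.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let m = ElectionMetrics::default();
    m.record_lock_attempt(false);
    m.record_lock_attempt(true);
    m.observe_epoch(1);
    m.observe_epoch(1); // same epoch: not a leader change
    m.observe_epoch(2);
    m.record_stale_reject();
    println!(
        "epoch={} changes={} attempts={} fails={} stale={}",
        m.leader_epoch.load(Ordering::Relaxed),
        m.leader_changes_total.load(Ordering::Relaxed),
        m.mutex_lock_attempts_total.load(Ordering::Relaxed),
        m.lock_fail_total.load(Ordering::Relaxed),
        m.stale_epoch_reject_total.load(Ordering::Relaxed),
    );
}
```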

Variations

  • Write-ahead leader log: persist epoch transitions to block storage.
  • Multiple leaders: partition by keyspace, one mutex per partition.
  • Barrier phases: use FabricBarrier to coordinate “drain then switch” transitions.
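For the multiple-leaders variation, every node must map a given key to the same partition, and therefore to the same mutex and epoch sequence. The sketch below uses FNV-1a rather than std's `DefaultHasher`, whose algorithm is unspecified and may change between Rust releases. `partition_for` and `PartitionLeader` are hypothetical names, and each partition keeps its own independent epoch.

```rust
/// FNV-1a (64-bit): a hash with a fixed definition, so every node maps a
/// given key to the same partition regardless of build or platform.
fn fnv1a(key: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in key {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical routing: which partition (and so which leader mutex) owns a key.
fn partition_for(key: &str, partitions: u64) -> u64 {
    fnv1a(key.as_bytes()) % partitions
}

/// One leader record per partition, each with its own independent epoch.
#[derive(Copy, Clone, Debug, Default)]
struct PartitionLeader {
    leader: u128,
    epoch: u64,
}

fn main() {
    let partitions = 4;
    let mut leaders = vec![PartitionLeader::default(); partitions as usize];
    // Node 7 wins the election for whichever partition owns "user:42".
    let p = partition_for("user:42", partitions) as usize;
    leaders[p].epoch += 1;
    leaders[p].leader = 7;
    println!(
        "user:42 -> partition {} led by node {} at epoch {}",
        p, leaders[p].leader, leaders[p].epoch
    );
}
```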