Recipe 17: Live Event Mode (Flash-Crowd Autopilot)

Situation

Traffic spikes are unpredictable and short-lived. The worst outcomes are:

manual incident response to add capacity
capacity lingering after the event

You want the system to scale out and back in automatically.

What You Build

A hot-object replication pattern:

start with one cached copy
when load/latency exceeds thresholds, acquire more leases and replicate
route reads across replicas
stop renewing extra replicas when demand drops; let TTL expire

Building Blocks

MemBuilder leases for replicas
FabricDns for routing endpoints
grafos_observe for p99, lease churn, replica counts

Design

Control Loop

Inputs:

QPS
p95/p99 latency
error rate

Actions:

add replica (acquire lease, copy bytes, register)
remove replica (stop renewing)

Safety

Avoid thrash:

hysteresis thresholds (scale out above an upper threshold, scale in only below a lower one)
cool-down period: a minimum time between scale actions

Also add:

hard cap on replicas per object
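
The safety rules above can be combined into a single decision function. The following is a minimal sketch, assuming p99 latency is the only input; the names `ScalePolicy`, `ObjectState`, and `decide`, and every threshold value, are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalePolicy:
    # All values are illustrative assumptions, not recommended defaults.
    scale_out_p99_ms: float = 50.0  # upper threshold: add a replica above this
    scale_in_p99_ms: float = 20.0   # lower threshold: drop a replica below this
    cooldown_s: float = 30.0        # minimum time between scale actions
    max_replicas: int = 8           # hard cap per object
    min_replicas: int = 1           # the original cached copy always stays


@dataclass
class ObjectState:
    replicas: int = 1
    last_action_at: Optional[float] = None


def decide(policy: ScalePolicy, state: ObjectState, p99_ms: float, now: float) -> str:
    """Return "scale_out", "scale_in", or "hold", updating state on a scale action."""
    in_cooldown = (
        state.last_action_at is not None
        and now - state.last_action_at < policy.cooldown_s
    )
    if in_cooldown:
        return "hold"
    if p99_ms > policy.scale_out_p99_ms and state.replicas < policy.max_replicas:
        state.replicas += 1
        state.last_action_at = now
        return "scale_out"
    if p99_ms < policy.scale_in_p99_ms and state.replicas > policy.min_replicas:
        state.replicas -= 1
        state.last_action_at = now
        return "scale_in"
    # Between the two thresholds (the dead band) nothing happens.
    return "hold"
```

The gap between the two thresholds is what prevents oscillation: a latency reading that sits between them never triggers an action in either direction.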

Replica Placement (Locality)

Even in a fabric, locality matters:

replicas in the same rack reduce tail latency for a rack-local flash crowd
a remote replica may help throughput but add latency

Placement is policy, not mechanism. The recipe assumes you can either prefer nearby leases when acquiring them, or adapt by observing latency and keeping the best-performing replica set over time.
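
One way to adapt by observation is to keep a smoothed latency per replica and prefer the lowest. The sketch below uses an exponentially weighted moving average; `LatencyTracker`, the smoothing factor, and the endpoint names are hypothetical and not part of any API named in this recipe.

```python
from typing import Dict, List


class LatencyTracker:
    """Tracks an exponentially weighted moving average (EWMA) of latency per replica."""

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.ewma_ms: Dict[str, float] = {}

    def observe(self, replica: str, latency_ms: float) -> None:
        prev = self.ewma_ms.get(replica)
        if prev is None:
            self.ewma_ms[replica] = latency_ms
        else:
            self.ewma_ms[replica] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def best(self, k: int) -> List[str]:
        """Return up to k replicas with the lowest smoothed latency."""
        ranked = sorted(self.ewma_ms, key=lambda r: (self.ewma_ms[r], r))
        return ranked[:k]
```

A rack-local replica would typically rise to the top of `best()` without the client knowing anything about topology.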

Routing Model

You need a routing layer that can:

discover the current replica set
distribute reads across replicas

Simple approach:

a FabricDns name (shared, or one name per object) resolves to the list of replica endpoints
client selects replica by hash (request id) or least-loaded measurement

More advanced:

coordinator pushes replica set updates to clients via watch-like broadcasts

Walkthrough

1. Detect Hot Object

Detect with any cheap signal:

QPS over the last N seconds
p95/p99 latency increase
origin fetch rate
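
As an illustration of the first signal, a sliding-window counter gives QPS over the last N seconds. This is a minimal sketch; `SlidingWindowRate` and `is_hot` are hypothetical names, and the window and threshold are assumptions to be tuned.

```python
from collections import deque
from typing import Deque


class SlidingWindowRate:
    """Counts requests in the last window_s seconds to estimate QPS."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self.timestamps: Deque[float] = deque()

    def record(self, now: float) -> None:
        self.timestamps.append(now)

    def qps(self, now: float) -> float:
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_s
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps) / self.window_s


def is_hot(rate: SlidingWindowRate, now: float, qps_threshold: float) -> bool:
    return rate.qps(now) > qps_threshold
```

Storing one timestamp per request is simple but grows with traffic; for very hot objects, per-second bucket counters would serve the same purpose with bounded memory.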

2. Allocate Replica Leases

When scaling out:

acquire a new memory lease for the object bytes
copy object bytes into that lease
register the replica endpoint in discovery (FabricDns or equivalent)
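
The ordering of these three steps matters: a replica should become discoverable only after its bytes are in place. The sketch below uses in-memory stand-ins to show that ordering; `InMemoryLeasePool`, `InMemoryDiscovery`, and `add_replica` are hypothetical and do not represent the MemBuilder or FabricDns APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Lease:
    lease_id: str
    size: int
    expires_at: float
    data: bytearray = field(default_factory=bytearray)


class InMemoryLeasePool:
    """Hypothetical stand-in for a memory lease service."""

    def __init__(self, ttl_s: float) -> None:
        self.ttl_s = ttl_s
        self._next = 0

    def acquire(self, size: int, now: float) -> Lease:
        self._next += 1
        return Lease(f"lease-{self._next}", size, now + self.ttl_s, bytearray(size))


class InMemoryDiscovery:
    """Hypothetical stand-in for discovery: object name -> replica endpoints."""

    def __init__(self) -> None:
        self.records: Dict[str, List[str]] = {}

    def register(self, name: str, endpoint: str) -> None:
        self.records.setdefault(name, []).append(endpoint)


def add_replica(pool: InMemoryLeasePool, discovery: InMemoryDiscovery,
                name: str, payload: bytes, now: float) -> Lease:
    lease = pool.acquire(len(payload), now)   # 1. acquire a lease sized for the object
    lease.data[:] = payload                   # 2. copy the object bytes into it
    discovery.register(name, lease.lease_id)  # 3. register only after the copy
    return lease
```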

3. Renew While Hot

Renew replica leases while demand is above threshold. Avoid per-request renewal:

renew when remaining TTL < 25% of the lease duration
apply jitter so all replicas do not renew at once
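
A minimal sketch of this renewal rule follows. It assumes jitter is applied by giving each replica its own threshold between 25% and 35% of the TTL, so jitter only ever makes renewal earlier, never later; the function names and the 10% jitter width are hypothetical.

```python
import random


def renew_threshold_s(ttl_s: float, rng: random.Random,
                      base_fraction: float = 0.25,
                      jitter_fraction: float = 0.10) -> float:
    """Remaining-TTL value at or below which this replica renews.

    Each replica draws its threshold once, so replicas created together
    renew at slightly different times.
    """
    return ttl_s * (base_fraction + rng.uniform(0.0, jitter_fraction))


def should_renew(expires_at: float, now: float,
                 threshold_s: float, is_hot: bool) -> bool:
    """Renew only while the object is hot and the lease is close to expiry."""
    remaining = expires_at - now
    # Assumption: an already expired lease cannot be renewed, only re-acquired.
    return is_hot and 0.0 < remaining <= threshold_s
```

Calling `should_renew` from a periodic timer rather than from the request path keeps renewal cost independent of QPS.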

4. Route Reads

Clients select among replicas:

consistent hashing (stable distribution)
random choice (stateless; spreads load evenly on average)
least-loaded (requires feedback)
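
To illustrate the first option, here is a small consistent-hash ring with virtual nodes. It is a sketch under assumptions: `HashRing`, the SHA-256 based hash, and the virtual-node count are illustrative choices, not something this recipe prescribes.

```python
import bisect
import hashlib
from typing import List, Tuple


def _h(key: str) -> int:
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, replicas: List[str], vnodes: int = 64) -> None:
        # Each replica is placed at several points to even out its share.
        points: List[Tuple[int, str]] = []
        for r in replicas:
            for i in range(vnodes):
                points.append((_h(f"{r}#{i}"), r))
        points.sort()
        self._keys = [p[0] for p in points]
        self._owners = [p[1] for p in points]

    def pick(self, request_key: str) -> str:
        """Return the replica owning the first ring point after the key's hash."""
        if not self._keys:
            raise LookupError("no replicas available")
        idx = bisect.bisect_right(self._keys, _h(request_key)) % len(self._keys)
        return self._owners[idx]
```

When a replica leaves the set, only the keys it owned move to other replicas; everything else keeps its assignment, which is the "stable distribution" property.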

5. Scale Back In

When demand drops below the lower threshold for long enough:

stop renewing extra replicas
optionally deregister their endpoints
let leases expire naturally
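
Scale-in therefore needs no explicit release call. The sketch below marks extra replicas as no longer renewing, removes them from a discovery table, and lets expiry do the rest; `Replica`, `scale_in`, `live`, and the dictionary used as a registry are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Replica:
    endpoint: str
    expires_at: float
    renewing: bool = True


def scale_in(replicas: List[Replica], registry: Dict[str, List[str]],
             name: str, keep: int) -> List[Replica]:
    """Stop renewing all but the first `keep` replicas and deregister them.

    No lease is released explicitly; draining replicas simply age out.
    """
    drained = replicas[keep:]
    for r in drained:
        r.renewing = False
        if r.endpoint in registry.get(name, []):
            registry[name].remove(r.endpoint)
    return drained


def live(replicas: List[Replica], now: float) -> List[Replica]:
    """Replicas whose lease has not yet expired at time `now`."""
    return [r for r in replicas if r.expires_at > now]
```

Deregistering before expiry shortens the window in which clients can still be routed to a replica that is about to disappear.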

Failure Modes

Replica lease expires unexpectedly: client falls back to another replica or origin.
Discovery staleness: clients may try replicas that no longer exist; fail over quickly and retry against another replica.
Thrash: fix with hysteresis and cooldown.
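
The first two failure modes share one client-side remedy: try another replica, then fall back to origin. A minimal sketch, assuming a fetch function that raises on a dead replica; `ReplicaUnavailable`, `read_with_failover`, and the attempt limit are hypothetical.

```python
from typing import Callable, Iterable


class ReplicaUnavailable(Exception):
    """Raised by a fetch function when a replica is dead or its lease has expired."""


def read_with_failover(
    replicas: Iterable[str],
    fetch: Callable[[str], bytes],
    fetch_origin: Callable[[], bytes],
    max_attempts: int = 2,
) -> bytes:
    """Try up to max_attempts replicas in order, then fall back to origin."""
    for attempt, endpoint in enumerate(replicas):
        if attempt >= max_attempts:
            break
        try:
            return fetch(endpoint)
        except ReplicaUnavailable:
            continue  # stale discovery entry or expired lease: try the next one
    return fetch_origin()
```

Bounding the number of replica attempts keeps worst-case read latency predictable when discovery is badly out of date.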

Observability

Track:

replicas_active{object_id=...}
replica_scale_out_total, replica_scale_in_total
per-object hit rate and origin fetch rate
lease churn and renewal errors

Variations

stripe an object across multiple leases for parallel reads (for very large objects)
multi-region: maintain independent replica sets per locality domain