Recipe 17: Live Event Mode (Flash-Crowd Autopilot)

Situation

Traffic spikes are unpredictable and short-lived. The two outcomes to avoid are:

  • manual incident response to add capacity
  • capacity lingering after the event

You want the system to scale out and back in automatically.

What You Build

A hot-object replication pattern:

  • start with one cached copy
  • when load/latency exceeds thresholds, acquire more leases and replicate
  • route reads across replicas
  • stop renewing extra replicas when demand drops; let TTL expire

Building Blocks

  • MemBuilder leases for replicas
  • FabricDns for routing endpoints
  • grafos_observe for p99, lease churn, replica counts

Design

Control Loop

Inputs:

  • QPS
  • p95/p99 latency
  • error rate

Actions:

  • add replica (acquire lease, copy bytes, register)
  • remove replica (stop renewing)

Safety

Avoid thrash:

  • hysteresis thresholds
  • minimum time between scale actions

Also add:

  • hard cap on replicas per object
  • cool-down period after a scale action
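
The safety rules above can be combined into a single decision function. The following is a minimal sketch, not part of any fabric API: the names ScalePolicy and ScaleController, and every threshold value, are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalePolicy:
    # All values are illustrative assumptions, not recommendations.
    high_p99_ms: float = 50.0   # scale out above this latency
    low_p99_ms: float = 20.0    # scale in below this; the gap is the hysteresis band
    cooldown_s: float = 30.0    # minimum time between scale actions
    max_replicas: int = 8       # hard cap on replicas per object
    min_replicas: int = 1       # the original cached copy


class ScaleController:
    """Hypothetical per-object controller: hysteresis, cool-down, hard cap."""

    def __init__(self, policy: ScalePolicy):
        self.policy = policy
        self.last_action_at = float("-inf")

    def decide(self, p99_ms: float, replicas: int, now: float) -> str:
        """Return "out", "in", or "hold" for one control-loop tick."""
        p = self.policy
        if now - self.last_action_at < p.cooldown_s:
            return "hold"  # still cooling down from the previous action
        if p99_ms > p.high_p99_ms and replicas < p.max_replicas:
            self.last_action_at = now
            return "out"
        if p99_ms < p.low_p99_ms and replicas > p.min_replicas:
            self.last_action_at = now
            return "in"
        return "hold"  # inside the hysteresis band, or at a replica limit
```

Because the scale-out and scale-in thresholds differ, a latency reading that hovers around a single value cannot trigger alternating actions; the cool-down bounds how often any action can fire.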

Replica Placement (Locality)

Even in a fabric, locality matters:

  • replicas in the same rack reduce tail latency for a rack-local flash crowd
  • a remote replica may help throughput but add latency

Placement is a matter of policy, not mechanism. The recipe assumes you can either prefer nearby leases up front or adapt over time by observing latency and keeping the best-performing replica set.
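
One way to adapt by observation is to keep a smoothed latency per replica and prefer the lowest. This is a sketch under assumptions: LatencyTracker is a hypothetical helper, and the exponentially weighted moving average (EWMA) is one possible smoothing choice, not something the recipe prescribes.

```python
class LatencyTracker:
    """Hypothetical helper: tracks an EWMA of observed latency per replica."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha   # weight given to the newest sample
        self.ewma_ms = {}    # endpoint -> smoothed latency in milliseconds

    def observe(self, endpoint: str, latency_ms: float) -> None:
        prev = self.ewma_ms.get(endpoint)
        if prev is None:
            self.ewma_ms[endpoint] = latency_ms
        else:
            self.ewma_ms[endpoint] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def best(self, k: int) -> list:
        """Return up to k endpoints with the lowest smoothed latency."""
        return sorted(self.ewma_ms, key=self.ewma_ms.get)[:k]
```

A controller could call best(k) when deciding which replicas to keep renewing, so that rack-local replicas naturally survive a rack-local flash crowd.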

Routing Model

You need a routing layer that can:

  • discover the current replica set
  • distribute reads across replicas

Simple approach:

  • a FabricDns name resolves to the list of replica endpoints (one shared name, or one name per object)
  • client selects replica by hash (request id) or least-loaded measurement

More advanced:

  • coordinator pushes replica set updates to clients via watch-like broadcasts

Walkthrough

1. Detect Hot Object

Detect with any cheap signal:

  • QPS over the last N seconds
  • p95/p99 latency increase
  • origin fetch rate
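
As an illustration of the first signal, a sliding-window request counter is enough to flag a hot object. This is a minimal sketch; HotObjectDetector, its parameters, and the threshold are assumptions for illustration.

```python
from collections import deque


class HotObjectDetector:
    """Hypothetical detector: an object is hot when its request rate over
    the last `window_s` seconds exceeds `qps_threshold`."""

    def __init__(self, window_s: float, qps_threshold: float):
        self.window_s = window_s
        self.qps_threshold = qps_threshold
        self._timestamps = deque()  # request arrival times, oldest first

    def record(self, now: float) -> None:
        self._timestamps.append(now)

    def qps(self, now: float) -> float:
        cutoff = now - self.window_s
        # Drop requests that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) / self.window_s

    def is_hot(self, now: float) -> bool:
        return self.qps(now) > self.qps_threshold
```

Storing one timestamp per request is simple but grows with traffic; at high request rates, per-second bucket counts give the same signal with bounded memory.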

2. Allocate Replica Leases

When scaling out:

  1. acquire a new memory lease for the object bytes
  2. copy object bytes into that lease
  3. register the replica endpoint in discovery (FabricDns or equivalent)
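
The ordering of these steps matters: registering last means clients never discover a replica that does not yet hold the full object. The sketch below uses in-memory stand-ins; FakeLease, FakeDiscovery, and add_replica are hypothetical names and do not reflect the MemBuilder or FabricDns APIs.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLease:
    """Stand-in for a memory lease (NOT the MemBuilder API)."""
    endpoint: str
    capacity: int
    data: bytes = b""

    def write(self, payload: bytes) -> None:
        if len(payload) > self.capacity:
            raise ValueError("object does not fit in lease")
        self.data = payload


@dataclass
class FakeDiscovery:
    """Stand-in for discovery (NOT the FabricDns API): name -> endpoints."""
    records: dict = field(default_factory=dict)

    def register(self, name: str, endpoint: str) -> None:
        self.records.setdefault(name, []).append(endpoint)


def add_replica(name, payload, acquire_lease, discovery):
    """Acquire, copy, then register. If the copy fails, nothing is
    registered and the unrenewed lease simply expires."""
    lease = acquire_lease(len(payload))        # step 1: acquire
    lease.write(payload)                       # step 2: copy (may raise)
    discovery.register(name, lease.endpoint)   # step 3: register
    return lease
```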

3. Renew While Hot

Renew replica leases while demand is above threshold. Avoid per-request renewal:

  • renew when remaining TTL < 25%
  • apply jitter so all replicas do not renew at once
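
One way to express this rule is to schedule each renewal once per lease cycle instead of checking on every request. In the sketch below, next_renewal_delay and the jitter_frac parameter are assumptions; the jitter only ever moves the renewal earlier, so the lease is never renewed later than the 25% mark.

```python
import random


def next_renewal_delay(ttl_s: float, rng: random.Random,
                       jitter_frac: float = 0.10) -> float:
    """Seconds after a lease grant (or renewal) at which to renew again.

    The base point is 75% of the TTL elapsed, i.e. 25% remaining. A random
    amount of up to `jitter_frac` of the TTL is subtracted so that replicas
    created at the same moment do not all renew at the same moment.
    """
    base = 0.75 * ttl_s
    return base - rng.uniform(0.0, jitter_frac) * ttl_s
```

With a 60-second TTL and the default jitter, each replica renews somewhere between 39 and 45 seconds after its last grant.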

4. Route Reads

Clients select among replicas:

  • consistent hashing (stable distribution)
  • random choice (good enough)
  • least-loaded (requires feedback)
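
For the stable-distribution option, the sketch below uses rendezvous (highest-random-weight) hashing, a close relative of ring-based consistent hashing that needs no ring state. The function name pick_replica and the choice of a client or request key are assumptions for illustration.

```python
import hashlib
from typing import Sequence


def pick_replica(key: str, replicas: Sequence[str]) -> str:
    """Pick the replica with the highest hash score for this key.

    When a replica is removed, only the keys that mapped to it move;
    every other key keeps its previous replica.
    """
    if not replicas:
        raise ValueError("no replicas available")

    def score(replica: str) -> int:
        digest = hashlib.sha256(f"{replica}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(replicas, key=score)
```

Hashing on a client id keeps each client pinned to one replica while the set is stable; hashing on a request id spreads a single client's reads across all replicas.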

5. Scale Back In

When demand drops below the lower threshold for long enough:

  • stop renewing extra replicas
  • optionally deregister their endpoints
  • let leases expire naturally
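
The "for long enough" condition can be sketched as a dwell timer that resets whenever demand recovers. ScaleInGate and its parameters are hypothetical names used only for this illustration.

```python
class ScaleInGate:
    """Hypothetical gate: opens only after demand has stayed below
    `low_qps` continuously for at least `dwell_s` seconds."""

    def __init__(self, low_qps: float, dwell_s: float):
        self.low_qps = low_qps
        self.dwell_s = dwell_s
        self._below_since = None  # time demand first dropped below low_qps

    def update(self, qps: float, now: float) -> bool:
        """Feed one measurement; return True when scale-in is allowed."""
        if qps >= self.low_qps:
            self._below_since = None  # demand recovered: restart the timer
            return False
        if self._below_since is None:
            self._below_since = now
        return now - self._below_since >= self.dwell_s
```

When the gate opens, the controller simply stops renewing the extra replicas; no explicit teardown is needed because the leases expire on their own.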

Failure Modes

  • Replica lease expires unexpectedly: client falls back to another replica or origin.
  • Discovery staleness: clients may try replicas that no longer exist; implement quick failover and retry.
  • Thrash (rapid alternation between scale-out and scale-in): fix with hysteresis and a cool-down period.
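
The first two failure modes share one client-side remedy: try a bounded number of replicas, then fall back to origin. The sketch below is an assumption-laden illustration; read_with_failover, fetch_replica, and fetch_origin are hypothetical callables supplied by the caller.

```python
def read_with_failover(replicas, fetch_replica, fetch_origin, max_attempts=2):
    """Try up to `max_attempts` replicas in order, then fall back to origin.

    `fetch_replica(endpoint)` is expected to raise ConnectionError or
    TimeoutError when a replica is gone (expired lease, stale discovery).
    """
    for endpoint in replicas[:max_attempts]:
        try:
            return fetch_replica(endpoint)
        except (ConnectionError, TimeoutError):
            continue  # dead or slow replica: move on quickly
    return fetch_origin()
```

Bounding the attempts keeps worst-case read latency predictable when discovery is badly out of date.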

Observability

Track:

  • replicas_active{object_id=...}
  • replica_scale_out_total, replica_scale_in_total
  • per-object hit rate and origin fetch rate
  • lease churn and renewal errors

Variations

  • stripe an object across multiple leases for parallel reads (for very large objects)
  • multi-region: maintain independent replica sets per locality domain