Cache-specific observability metrics for LLM inference KV caches.
Defines the 12 metric names from Section 7.5 of the design document and provides
a CacheMetrics struct that aggregates all cache-related counters, gauges,
and histograms. KvCacheManager and other cache infrastructure can emit metrics
through the CacheMetrics::global() singleton.
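To illustrate the singleton pattern described above, here is a minimal sketch of how a lazily initialized, process-wide metrics struct could be built with the standard library. The type `SketchCacheMetrics`, its fields, and its methods are hypothetical stand-ins; the real CacheMetrics fields and method names are not shown on this page and may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for `CacheMetrics` (assumption: the real struct
/// holds more metrics and may use a metrics library instead of raw atomics).
#[derive(Debug, Default)]
pub struct SketchCacheMetrics {
    hit_total: AtomicU64,      // counter: only ever incremented
    miss_total: AtomicU64,     // counter: only ever incremented
    resident_bytes: AtomicU64, // gauge: overwritten with the latest value
}

impl SketchCacheMetrics {
    /// Returns the process-wide instance, creating it on first use.
    pub fn global() -> &'static SketchCacheMetrics {
        static GLOBAL: OnceLock<SketchCacheMetrics> = OnceLock::new();
        GLOBAL.get_or_init(SketchCacheMetrics::default)
    }

    /// Records one cache hit.
    pub fn record_hit(&self) {
        self.hit_total.fetch_add(1, Ordering::Relaxed);
    }

    /// Records one cache miss.
    pub fn record_miss(&self) {
        self.miss_total.fetch_add(1, Ordering::Relaxed);
    }

    /// Sets the resident-bytes gauge to the given value.
    pub fn set_resident_bytes(&self, bytes: u64) {
        self.resident_bytes.store(bytes, Ordering::Relaxed);
    }

    /// Reads the resident-bytes gauge.
    pub fn resident_bytes(&self) -> u64 {
        self.resident_bytes.load(Ordering::Relaxed)
    }

    /// Hit ratio over all recorded lookups, or `None` if nothing was recorded.
    pub fn hit_ratio(&self) -> Option<f64> {
        let hits = self.hit_total.load(Ordering::Relaxed);
        let misses = self.miss_total.load(Ordering::Relaxed);
        let total = hits + misses;
        if total == 0 {
            None
        } else {
            Some(hits as f64 / total as f64)
        }
    }
}
```

In this sketch, a caller such as a cache manager would write `SketchCacheMetrics::global().record_hit()` on each hit. Relaxed ordering is used because each metric is an independent value and does not synchronize other memory.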
Structs§
- CacheMetrics - Aggregated cache observability metrics.
Constants§
- ALL_CACHE_METRIC_NAMES - All 12 cache metric names as a slice, for validation.
- CACHE_ATTACH_FAILURE_TOTAL - Counter: total attach failures, labeled by reason.
- CACHE_DECODE_FAR_FROM_CACHE - Counter: decode requests placed far from cache (low cache_locality score).
- CACHE_DECODE_LATENCY_US - Histogram: steady-state per-token decode latency in microseconds.
- CACHE_FIRST_TOKEN_LATENCY_US - Histogram: time to first token in microseconds.
- CACHE_FORK_TOTAL - Counter: total cache forks.
- CACHE_HIT_TOTAL - Counter: total cache hits, labeled by cache_class.
- CACHE_MISS_TOTAL - Counter: total cache misses, labeled by cache_class.
- CACHE_PREFILL_LATENCY_US - Histogram: prefill latency in microseconds.
- CACHE_RECLAIM_TOTAL - Counter: total cache reclaims, labeled by cause (expired, revoked, evicted).
- CACHE_RESIDENT_BYTES - Gauge: resident bytes, labeled by tier (VRAM, DRAM, CXL).
- CACHE_SPILL_BYTES_TOTAL - Counter: total bytes spilled between tiers, labeled by (src_tier, dst_tier).
- CACHE_WARMUP_BYTES_TOTAL - Counter: total bytes warmed between tiers, labeled by (src_tier, dst_tier).
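
As an illustration of the validation use mentioned for ALL_CACHE_METRIC_NAMES, the sketch below checks a slice of metric names for the expected count, a conservative character set, and duplicates. The string values in `SKETCH_METRIC_NAMES` are assumptions (the lowercased constant identifiers); this page does not show the actual values, and `validate_metric_names` is a hypothetical helper, not part of the documented API.

```rust
use std::collections::HashSet;

/// Assumed metric name strings: the lowercased constant identifiers.
/// The real constant values are not shown on this page.
pub const SKETCH_METRIC_NAMES: &[&str] = &[
    "cache_attach_failure_total",
    "cache_decode_far_from_cache",
    "cache_decode_latency_us",
    "cache_first_token_latency_us",
    "cache_fork_total",
    "cache_hit_total",
    "cache_miss_total",
    "cache_prefill_latency_us",
    "cache_reclaim_total",
    "cache_resident_bytes",
    "cache_spill_bytes_total",
    "cache_warmup_bytes_total",
];

/// Hypothetical helper: checks that `names` has exactly `expected` entries,
/// that each is non-empty and uses only `[a-z0-9_]`, and that none repeats.
pub fn validate_metric_names(names: &[&str], expected: usize) -> Result<(), String> {
    if names.len() != expected {
        return Err(format!("expected {} names, found {}", expected, names.len()));
    }
    let mut seen = HashSet::new();
    for name in names {
        let well_formed = !name.is_empty()
            && name
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
        if !well_formed {
            return Err(format!("malformed metric name: {:?}", name));
        }
        // `insert` returns false when the name was already present.
        if !seen.insert(*name) {
            return Err(format!("duplicate metric name: {:?}", name));
        }
    }
    Ok(())
}
```

A check like this could run in a unit test, for example `validate_metric_names(SKETCH_METRIC_NAMES, 12)`, so that a newly added or renamed constant that is missing from the slice gets caught early.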