Recipe 59: Cost Attribution With Accounting Tags
Situation
A team needs to know what each piece of their workload actually costs. Not “what’s the tenant invoice” — that’s the operator’s view. The team wants to roll up cost by their own budget group (e.g. “production inference” vs “research” vs “internal tools”) and by their own request class (e.g. “realtime”, “interactive”, “batch”) so they can answer questions like:
- How much did production inference cost this week?
- Within production inference, how much was realtime vs batch?
- Is the synthetic monitoring probe inflating any of these numbers?
In grafOS, every lease can carry a fixed-size AccountingTag with
five canonical fields: tenant, workload, request_class,
budget_group, flags. The fields enter the audit-chain canonical
bytes deterministically. A tenant’s local bookkeeping and the
operator’s billing path consume the same struct, so the tenant’s
roll-up matches what the invoice would compute.
What You Build
A typed cost-attribution helper that:
- Builds AccountingTags at the call site without hiding which integer field is which (tag(tenant, workload, request_class, budget_group, flags));
- Carries lease-held seconds plus a tenant-defined per-second rate;
- Rolls up cost by budget_group and by (budget_group, request_class);
- Surfaces a typed FLAG_SYNTHETIC_PROBE bit and an is_synthetic_probe predicate so probes can be filtered out of production roll-ups without magic numbers;
- Uses saturating arithmetic so a pathological rate doesn’t panic.
The compiled recipe lives in
cookbook/recipe-59-cost-attribution-tags.
Core grafOS API Path
    use grafos_core::AccountingTag;

    let prod_realtime = AccountingTag {
        tenant: 1,
        workload: 42,
        request_class: 1,   // tenant-defined: 1 = realtime
        budget_group: 100,  // tenant-defined: 100 = production-inference
        flags: 0,
    };

    // Round-trip through canonical bytes: the same encoding the
    // audit chain consumes when this tag accompanies a lease.
    let mut buf = [0u8; AccountingTag::ENCODED_LEN];
    prod_realtime.encode(&mut buf)?;
    let back = AccountingTag::decode(&buf)?;
    assert_eq!(back, prod_realtime);
    # Ok::<(), Box<dyn std::error::Error>>(())

Program
    use cookbook_recipe_59_cost_attribution_tags::{
        is_synthetic_probe, tag, BudgetByClassRollup, BudgetGroupRollup,
        LeaseCostRecord, FLAG_SYNTHETIC_PROBE,
    };

    // Three production leases + one synthetic probe.
    let records = vec![
        LeaseCostRecord {
            tag: tag(1, /*workload=*/ 1, /*class=*/ 1, /*budget=*/ 100, 0),
            seconds_held: 3600,
            rate_micros_per_sec: 1_000,
        },
        LeaseCostRecord {
            tag: tag(1, 2, 2, 100, 0),
            seconds_held: 1800,
            rate_micros_per_sec: 2_000,
        },
        LeaseCostRecord {
            tag: tag(1, 3, 1, 200, 0),
            seconds_held: 7200,
            rate_micros_per_sec: 1_500,
        },
        LeaseCostRecord {
            tag: tag(1, 99, 1, 100, FLAG_SYNTHETIC_PROBE),
            seconds_held: 60,
            rate_micros_per_sec: 1_000,
        },
    ];

    // Production-only roll-up: drop probes.
    let production_only: Vec<_> = records
        .iter()
        .copied()
        .filter(|r| !is_synthetic_probe(r.tag))
        .collect();
    let by_budget = BudgetGroupRollup::from_records(production_only.iter().copied());
    let by_budget_class = BudgetByClassRollup::from_records(production_only);

    for (budget, total) in &by_budget.totals_micros {
        println!("budget_group {budget}: {total} micro-units");
    }
    for ((budget, class), total) in &by_budget_class.totals_micros {
        println!(" budget={budget} class={class}: {total}");
    }

Design
The AccountingTag shape is deliberately five fixed-size fields
and nothing else:
| Field | Type | Purpose |
|---|---|---|
| tenant | u64 | Issuing tenant / namespace owner. |
| workload | u64 | Program / service / run identifier. |
| request_class | u32 | Policy-defined latency / priority bucket. |
| budget_group | u32 | Policy-defined cost bucket. |
| flags | u32 | Policy-defined bitfield; bit 0 reserved for “synthetic / probe edge” by convention. |
Two consequences:
- No free-form strings, no PII. The fields are u64s and u32s that the scheduler does not interpret. The tenant’s policy layer decides what request_class = 1 means; the scheduler carries the number through unchanged. SIEM streams and billing roll-ups operate on integers, not strings, so aggregation is deterministic.
- Fixed canonical bytes. AccountingTag::ENCODED_LEN is 28 bytes. The encoding is byte-stable across versions, so a recorded audit row from yesterday round-trips through today’s decoder unchanged.
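To make the 28-byte figure concrete, here is a minimal sketch of a fixed-width encoder. It is not the grafos_core implementation: the TagSketch type, the helper names, the field order (taken from the table) and the little-endian byte order are all assumptions for illustration. The authoritative wire shape is whatever docs/spec/audit-chain-canonical-bytes.md defines.

```rust
/// Encoded size: 8 + 8 + 4 + 4 + 4 = 28 bytes.
const ENCODED_LEN: usize = 28;

/// Hypothetical stand-in for `AccountingTag`; not the grafos_core type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TagSketch {
    tenant: u64,
    workload: u64,
    request_class: u32,
    budget_group: u32,
    flags: u32,
}

/// Copy 8 bytes starting at `at` and read them as a little-endian u64.
fn read_u64(buf: &[u8], at: usize) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&buf[at..at + 8]);
    u64::from_le_bytes(raw)
}

/// Copy 4 bytes starting at `at` and read them as a little-endian u32.
fn read_u32(buf: &[u8], at: usize) -> u32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&buf[at..at + 4]);
    u32::from_le_bytes(raw)
}

impl TagSketch {
    /// ASSUMPTION: fields in table order, little-endian, no padding.
    fn encode(&self) -> [u8; ENCODED_LEN] {
        let mut buf = [0u8; ENCODED_LEN];
        buf[0..8].copy_from_slice(&self.tenant.to_le_bytes());
        buf[8..16].copy_from_slice(&self.workload.to_le_bytes());
        buf[16..20].copy_from_slice(&self.request_class.to_le_bytes());
        buf[20..24].copy_from_slice(&self.budget_group.to_le_bytes());
        buf[24..28].copy_from_slice(&self.flags.to_le_bytes());
        buf
    }

    /// Inverse of `encode` under the same layout assumption.
    fn decode(buf: &[u8; ENCODED_LEN]) -> TagSketch {
        TagSketch {
            tenant: read_u64(buf, 0),
            workload: read_u64(buf, 8),
            request_class: read_u32(buf, 16),
            budget_group: read_u32(buf, 20),
            flags: read_u32(buf, 24),
        }
    }
}

fn main() {
    let tag = TagSketch { tenant: 1, workload: 42, request_class: 1, budget_group: 100, flags: 0 };
    let bytes = tag.encode();
    assert_eq!(TagSketch::decode(&bytes), tag);
    println!("encoded {} bytes", bytes.len());
}
```

Because every field has a fixed width and a fixed offset, two encoders that agree on the layout produce identical bytes for identical tags, which is the property the audit-chain hash relies on.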
The synthetic-probe flag is the one bit the recipe surfaces with a named constant. Monitoring tooling that emits synthetic traffic (latency probes, capacity checks, canary requests) sets bit 0 so production cost roll-ups can filter it out without sampling or heuristics.
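As an illustration of the bit-0 convention, the constant and predicate could look roughly like this. This is a sketch, not the recipe's source: here the predicate takes the raw flags word (an assumption made to keep the example self-contained), whereas the recipe's is_synthetic_probe is called with a tag.

```rust
/// Bit 0 of `flags`: "synthetic / probe edge".
const FLAG_SYNTHETIC_PROBE: u32 = 1 << 0;

/// True when bit 0 is set, whatever the other (tenant-defined) bits hold.
/// ASSUMPTION: takes raw flags; the recipe's helper takes a tag.
fn is_synthetic_probe(flags: u32) -> bool {
    (flags & FLAG_SYNTHETIC_PROBE) != 0
}

fn main() {
    // (flags, cost in micro-units): two production rows and one probe row.
    let rows = [
        (0u32, 3_600_000u64),
        (FLAG_SYNTHETIC_PROBE, 60_000),
        (0b10, 500), // some other tenant-defined bit; still production
    ];

    // Sum only the rows that are not probes.
    let production_total: u64 = rows
        .iter()
        .filter(|(flags, _)| !is_synthetic_probe(*flags))
        .map(|(_, cost)| *cost)
        .sum();

    println!("production total: {production_total} micro-units");
}
```

Testing the bit with a mask rather than comparing flags to 1 keeps the predicate correct once a tenant starts using other bits.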
BudgetGroupRollup keys on budget_group alone, the typical
operator-facing view (“what did production inference cost?”).
BudgetByClassRollup keys on (budget_group, request_class) for
the deeper “within production inference, how much was realtime vs
batch?” view. Both use BTreeMap, which iterates in sorted key order, so
roll-up output is byte-identical across runs given the same input.
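The two roll-ups can be sketched with plain BTreeMaps. The Record type and function names below are hypothetical, and the use of saturating addition when accumulating is an assumption; the recipe's BudgetGroupRollup and BudgetByClassRollup may be structured differently. The costs reuse the three production leases from the Program section (seconds held times rate).

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened record: only the fields the roll-ups need.
#[derive(Debug, Clone, Copy)]
struct Record {
    budget_group: u32,
    request_class: u32,
    cost_micros: u64,
}

/// Total cost per budget_group.
fn rollup_by_budget(records: &[Record]) -> BTreeMap<u32, u64> {
    let mut totals = BTreeMap::new();
    for r in records {
        let slot = totals.entry(r.budget_group).or_insert(0u64);
        // ASSUMPTION: accumulate with saturating addition.
        *slot = slot.saturating_add(r.cost_micros);
    }
    totals
}

/// Total cost per (budget_group, request_class) pair.
fn rollup_by_budget_and_class(records: &[Record]) -> BTreeMap<(u32, u32), u64> {
    let mut totals = BTreeMap::new();
    for r in records {
        let slot = totals.entry((r.budget_group, r.request_class)).or_insert(0u64);
        *slot = slot.saturating_add(r.cost_micros);
    }
    totals
}

fn main() {
    let records = [
        Record { budget_group: 200, request_class: 1, cost_micros: 7200 * 1_500 },
        Record { budget_group: 100, request_class: 2, cost_micros: 1800 * 2_000 },
        Record { budget_group: 100, request_class: 1, cost_micros: 3600 * 1_000 },
    ];

    // Keys come back in sorted order no matter how the input was ordered.
    for (budget, total) in &rollup_by_budget(&records) {
        println!("budget_group {budget}: {total} micro-units");
    }
    for ((budget, class), total) in &rollup_by_budget_and_class(&records) {
        println!(" budget={budget} class={class}: {total}");
    }

    // Reordering the input does not change the result.
    let mut reversed = records.to_vec();
    reversed.reverse();
    assert_eq!(rollup_by_budget(&records), rollup_by_budget(&reversed));
}
```

A HashMap would give the same totals, but its iteration order is not guaranteed, so printed or serialized output could differ between runs.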
Failure Modes
- Misattributed lease: a lease created without an AccountingTag (or with AccountingTag::ZERO) aggregates into the all-zeros bucket. The tenant’s local discipline catches this, and the operator’s billing path also surfaces the zeroed rows, so neither side hides them.
- Probe inflation: probes that don’t set FLAG_SYNTHETIC_PROBE get rolled up into production. The recipe ships the constant so probe tooling can set it consistently.
- Rate overflow: LeaseCostRecord::total_micros uses saturating multiplication. A pathological rate (e.g. u64::MAX from a misconfigured rate-card) saturates at u64::MAX instead of panicking, and the saturated value is visible at the roll-up; it does not silently roll into a neighboring bucket.
- Tag tampering after seal: the audit-chain hash includes the tag bytes (28 bytes per record). Tampering with any of the five fields after seal breaks the chain hash; the reference collector at crates/grafos-audit-collector catches this on ingest.
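The rate-overflow behaviour can be sketched with the standard library's saturating_mul. CostSketch is a hypothetical stand-in for LeaseCostRecord, and the u64 field types are an assumption; only the field names come from the Program section.

```rust
/// Hypothetical stand-in for `LeaseCostRecord` (tag omitted).
/// ASSUMPTION: both fields are u64.
#[derive(Debug, Clone, Copy)]
struct CostSketch {
    seconds_held: u64,
    rate_micros_per_sec: u64,
}

impl CostSketch {
    /// seconds * rate, clamped to u64::MAX instead of overflowing.
    fn total_micros(&self) -> u64 {
        self.seconds_held.saturating_mul(self.rate_micros_per_sec)
    }
}

fn main() {
    let normal = CostSketch { seconds_held: 3600, rate_micros_per_sec: 1_000 };
    let broken = CostSketch { seconds_held: 60, rate_micros_per_sec: u64::MAX };

    println!("normal lease: {} micro-units", normal.total_micros());

    // A plain `*` here would panic in debug builds and wrap in release
    // builds; the saturated value is obviously wrong, hence easy to spot.
    if broken.total_micros() == u64::MAX {
        println!("saturated: check the rate-card for this lease");
    }
}
```

Saturation trades an exact answer for a visible sentinel: a total of u64::MAX in a roll-up points straight at the misconfigured rate.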
Tests
Run it with:
    cargo test -p cookbook-recipe-59-cost-attribution-tags

Six tests cover budget-group aggregation across request classes,
budget+class split rollup, synthetic-probe flag predicates,
canonical-bytes round-trip for AccountingTag (proves the recipe
matches what audit-chain canonical bytes carry), rollup stability
under record reordering, and saturating arithmetic on pathological
rates.
Adaptation Notes
- Currency: this recipe uses tenant-defined “micro-units per second” so it doesn’t pick a currency. Production callers wire the per-second rate from grafos admin fair-share-policy get / the scheduler’s rate-card so tenant and operator agree.
- Field semantics: request_class and budget_group are policy-defined u32s. Pick a convention (e.g. request_class = 1 for realtime) and document it in the tenant’s own runbook; the scheduler treats both as opaque identifiers.
- Flag bits beyond bit 0: the flags field is policy-defined except for bit 0 (“synthetic / probe edge”). Tenants may define additional bits (bit 1 = “speculative cancel-on-loss”, bit 2 = “tenant-internal idle pre-warm”, etc.). Keep the meaning stable and never reuse a retired bit.
- Cross-tenant rollup: not allowed at the tenant API surface. A tenant rolling up only sees its own tagged leases. Operator-side rollups across tenants are visible through the billing endpoints (/api/v1/billing/invoice) which carry the same AccountingTag shape.
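One way to keep a field convention honest is to mirror the runbook in a small tenant-side type. The sketch below is hypothetical throughout: only “1 = realtime” comes from the example above, while the values for interactive and batch, the enum, and its method names are made up for illustration. The scheduler still sees only the raw u32.

```rust
/// Tenant-side names for request_class values.
/// ASSUMPTION: 2 = interactive and 3 = batch are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RequestClass {
    Realtime,
    Interactive,
    Batch,
}

impl RequestClass {
    /// The opaque number that goes into the tag.
    fn as_raw(self) -> u32 {
        match self {
            RequestClass::Realtime => 1,
            RequestClass::Interactive => 2,
            RequestClass::Batch => 3,
        }
    }

    /// Map a recorded number back to a name; None if the runbook
    /// does not define it.
    fn from_raw(raw: u32) -> Option<RequestClass> {
        match raw {
            1 => Some(RequestClass::Realtime),
            2 => Some(RequestClass::Interactive),
            3 => Some(RequestClass::Batch),
            _ => None,
        }
    }
}

fn main() {
    for raw in [1u32, 3, 7] {
        match RequestClass::from_raw(raw) {
            Some(class) => println!("request_class {raw} = {class:?}"),
            None => println!("request_class {raw} is not in the runbook"),
        }
    }
}
```

Returning None for unknown values, rather than defaulting to some class, keeps unrecognised rows visible in a roll-up instead of folding them into a real bucket.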
See also:
- crates/grafos-core/src/accounting.rs: AccountingTag, canonical encode/decode.
- docs/operations/scheduler-features.md § “Accounting tags- project scopes”.
- docs/spec/audit-chain-canonical-bytes.md: wire shape for the audit-chain bytes that carry these tags.
- Recipe 57 (Per-Project Fair-Share Policy): pairs naturally with this one when tagging is keyed off a project_id used in both surfaces.