Recipe 43 - Provider Migration Without Losing Work
Situation
You started with replicas in AWS and GCP. Later you want to add Azure, let it catch up, and move some consumers there. The dangerous version of this is a cloud-by-cloud migration script that rewrites queue names, copies offsets, and quietly sends consumers to a provider the program never authorized.
What You Build
A provider migration where:
- accepted work is already in a replicated queue;
- the replica-set locator advances to a new generation that includes Azure;
- a catch-up consumer proves the new provider can read the committed stream;
- active consumers move to Azure only through an explicit placement policy;
- the original consumer group keeps its placement and cursor.
The compiled recipe lives in cookbook/recipe-43-provider-migration.
use cookbook_recipe_43_provider_migration::{ azure_region, ProviderMigrationPipeline,};
let mut pipeline = ProviderMigrationPipeline::new()?;pipeline.publish("order-1", "created")?;pipeline.publish("order-2", "paid")?;
let migration = pipeline.migrate_to_azure()?;assert_eq!(migration.added_domain, azure_region());assert_eq!(migration.catch_up_count, 2);
let azure_group = pipeline.shift_consumers_to_azure()?;let deliveries = pipeline.poll_shifted_from_azure(&azure_group, 10)?;assert_eq!(deliveries.len(), 2);# Ok::<(), grafos_replicated::ReplicatedError>(())Core grafOS API Path
The migration uses the queue and locator APIs directly. The initial resource is AWS+GCP:
use fabricbios_core::lease::FenceEpoch;use grafos_replicated::{ ConsumerGroupName, DeliverySemantics, LogicalResourceName, ReplicaHealth, ReplicaId, ReplicaLocator, ReplicaRole, ReplicaSetLocator, ReplicatedFabricQueue, ResourceGeneration, SchemaId,};use cookbook_recipe_43_provider_migration::{ aws_gcp_replica_policy, aws_gcp_worker_placement, aws_zone, gcp_region, WorkItem,};
let generation = ResourceGeneration(1);let replicas = aws_gcp_replica_policy();let locator = ReplicaSetLocator::new( generation, vec![ ReplicaLocator { replica_id: ReplicaId::new("orders-aws-a"), domain: aws_zone(), role: ReplicaRole::Voter, health: ReplicaHealth::Healthy, epoch: FenceEpoch(1), content_generation: generation.0, }, ReplicaLocator { replica_id: ReplicaId::new("orders-gcp-central"), domain: gcp_region(), role: ReplicaRole::Voter, health: ReplicaHealth::Healthy, epoch: FenceEpoch(1), content_generation: generation.0, }, ],);let mut queue = ReplicatedFabricQueue::<WorkItem>::new( LogicalResourceName::new("migration-orders"), SchemaId::new("migration-work.v1"), FenceEpoch(1), replicas.clone(), locator,)?;
let active_group = queue.create_consumer_group( ConsumerGroupName::new("active-workers"), aws_gcp_worker_placement(), DeliverySemantics::EffectivelyOnceWithIdempotency, FenceEpoch(1),)?;# let _ = active_group;# Ok::<(), grafos_replicated::ReplicatedError>(())Migration advances the locator generation before any Azure consumer is allowed to own active work:
use cookbook_recipe_43_provider_migration::{ aws_gcp_azure_worker_placement, azure_region,};
let new_generation = ResourceGeneration(generation.0 + 1);queue.observe_replica_health( azure_region(), grafos_replicated::ReplicaHealth::Healthy, grafos_replicated::HealthAuthorityId::new("scheduler-a"), grafos_replicated::HealthObservationSignature::test_signature(43), 1_000,)?;queue.set_locator(ReplicaSetLocator::new( new_generation, vec![ ReplicaLocator { replica_id: ReplicaId::new("orders-aws-a"), domain: aws_zone(), role: ReplicaRole::Voter, health: ReplicaHealth::Healthy, epoch: FenceEpoch(1), content_generation: new_generation.0, }, ReplicaLocator { replica_id: ReplicaId::new("orders-gcp-central"), domain: gcp_region(), role: ReplicaRole::Voter, health: ReplicaHealth::Healthy, epoch: FenceEpoch(1), content_generation: new_generation.0, }, ReplicaLocator { replica_id: ReplicaId::new("orders-azure-eastus"), domain: azure_region(), role: ReplicaRole::Voter, health: ReplicaHealth::Healthy, epoch: FenceEpoch(1), content_generation: new_generation.0, }, ],))?;
let catchup = queue.create_consumer_group( ConsumerGroupName::new("azure-catchup"), aws_gcp_azure_worker_placement(), DeliverySemantics::EffectivelyOnceWithIdempotency, FenceEpoch(1),)?;let deliveries = queue.poll(&catchup, &azure_region(), usize::MAX, FenceEpoch(1))?;# let _ = deliveries;# Ok::<(), grafos_replicated::ReplicatedError>(())Design
The recipe separates replica membership from consumer placement:
- The queue starts with an AWS+GCP replica policy and locator.
- Work is appended through
ReplicatedFabricQueue. - Azure polling through the original consumer group fails with
DomainUnavailable. - Migration publishes a new locator generation containing the Azure replica.
- A catch-up consumer group with Azure in its placement reads the committed stream from offset zero.
- Active consumers move to Azure by creating a new consumer group whose placement explicitly permits Azure.
- The original consumer group remains AWS+GCP only; migration does not widen it behind the caller’s back.
Why It Is Useful
Provider migration is often where hidden fallback sneaks into systems. A program starts as “AWS only,” then an operations script starts sending work to GCP or Azure under pressure. This recipe keeps the boundaries visible:
- replica membership changes are generationed;
- consumer placement changes are explicit program policy;
- catch-up is tested by reading committed offsets;
- work is neither dropped nor double-finalized during the provider shift.
Failure Modes
- Azure consumer before migration: rejected with
DomainUnavailable. - Replica generation does not advance: the locator publication fails closed instead of pretending migration happened.
- Catch-up cannot read committed records: migration does not proceed to active consumer shift.
- Original group after migration: still rejects Azure, proving provider expansion was not a hidden fallback.
Tests
Run it with:
cargo test -p cookbook-recipe-43-provider-migrationThe tests cover:
- Azure rejection before it is authorized;
- new replica generation and catch-up from the committed queue;
- explicit consumer shift to Azure;
- original consumer placement and cursor preservation.
See also:
- Recipe 39: Cross-Cloud Order Pipeline
- Recipe 42: Mirrored ETL Pipeline
crates/grafos-replicated