Recipe 35: Web Server With Leased Ingress and Replicas

The Idea

A web server on grafOS is not a host process that happens to use fabric resources. It is a native service: a set of tasklet replicas, each holding a leased listener port, leased CPU, and leased memory, placed and managed by the scheduler.

This recipe shows how to define, place, run, and fail over a replicated HTTP echo service using native grafOS primitives — no ProgramRuntime, no POSIX sockets, no ambient host networking.

What You Need

  • A fabric with at least two nodes (sim mode works fine)
  • grafos-scheduler with the scheduler feature
  • grafos-std for the tasklet SDK
  • A tasklet WASM module implementing the HTTP echo handler

Step 1: Define and Deploy the Service

A native service starts with a ServiceSpec — the full description of what to run, how to replicate it, and what resources each instance needs.

use grafos_core::ResourceKind;
use grafos_scheduler::{
    NodeConstraint, Priority, ReplicationMode, ResourceRequirement, ServiceId,
    ServiceSpec, Strategy, TenantId,
};

let spec = ServiceSpec {
    service_id: ServiceId {
        name: "echo-http".into(),
        tenant_id: TenantId(7),
    },
    version: "1.0.0".into(),
    module_hash: echo_module_hash,
    wasm: echo_module_bytes,
    replication: ReplicationMode::ActiveActive { replica_count: 2 },
    priority: Priority::Standard,
    strategy: Strategy::Spread,
    node_constraint: NodeConstraint::Any,
    anti_affinity_services: vec![],
    resources_per_instance: vec![
        ResourceRequirement { resource_type: ResourceKind::Cpu, capacity: 1 },
        ResourceRequirement { resource_type: ResourceKind::Mem, capacity: 64 * 1024 * 1024 },
    ],
    listener_port: 8080,
    max_sessions: 64,
    drain_deadline_secs: 30,
    required_rights: 0,
};

// Deploy through the orchestrator (acquires leases, places instances).
let (service_id, events) = orchestrator.deploy(spec, &capacity_ledger, now)?;

Core grafOS API Path

The scheduler-facing path passes a ServiceSpec to ServiceOrchestrator::deploy. The orchestrator uses the service transport to acquire a CPU lease, a listener lease, and memory leases, and then submits the tasklet with service capabilities. Routing then reads the resulting ServiceTopology:

use grafos_scheduler::{RoutingPolicy, ServiceResolver};

let (service_id, _events) = orchestrator.deploy(spec, &capacity_ledger, now)?;
let topology = orchestrator
    .get_topology(&service_id)
    .expect("service was just deployed");
let mut resolver = ServiceResolver::new();
let endpoint = resolver.resolve(topology, &RoutingPolicy::RoundRobin)?;

Each placed instance gets:

  • A listener lease — exclusive authority over port 8080 on that node
  • A CPU lease — execution capacity for the tasklet
  • A memory lease — working memory for request buffers and state
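
To make the placement step more concrete, here is a toy model of spreading replicas across nodes. It is a sketch under assumptions, not the scheduler's algorithm: NodeCapacity and place_spread are hypothetical names, and it assumes that Strategy::Spread means one instance per distinct node with enough free CPU, memory, and an unleased listener port.

```rust
use std::collections::BTreeMap;

/// Hypothetical free capacity on one node (not the real grafOS capacity ledger).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeCapacity {
    free_cpu: u64,
    free_mem: u64,
    port_free: bool, // is the requested listener port still unleased here?
}

/// Choose `replicas` distinct nodes that can each host one instance.
/// Returns `None` when fewer than `replicas` nodes qualify.
fn place_spread(
    ledger: &BTreeMap<u32, NodeCapacity>,
    replicas: usize,
    cpu: u64,
    mem: u64,
) -> Option<Vec<u32>> {
    let mut candidates: Vec<(u32, u64)> = ledger
        .iter()
        .filter(|(_, c)| c.free_cpu >= cpu && c.free_mem >= mem && c.port_free)
        .map(|(id, c)| (*id, c.free_mem))
        .collect();
    // Most free memory first; ties fall back to the lower node id.
    candidates.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    if candidates.len() < replicas {
        return None;
    }
    Some(candidates.into_iter().take(replicas).map(|(id, _)| id).collect())
}
```

Because every chosen node id is distinct, losing one node can take out at most one replica in this model.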

Step 2: Submit the Tasklet

Each replica runs the same WASM module. The tasklet is submitted through a CpuLease, which manages execution capacity.

use grafos_std::cpu::CpuBuilder;

// Acquire a CPU lease, then submit the tasklet module.
let cpu_lease = CpuBuilder::new()
    .cores(1)
    .lease_secs(300)
    .acquire()?;
let result = cpu_lease.cpu()
    .submit(&echo_module_bytes)
    .fuel(1_000_000)
    .input(b"")
    .launch()?;

Inside the tasklet, the handler uses service hostcalls from grafos_svc_v0 (see docs/grafos/service-abi-v0.md for the full ABI):

// Inside the WASM tasklet
// svc_listen requires a cap_handle with RIGHTS_SVC_LISTEN for the port.
let listener = svc_listen(cap_handle, 8080, 64)?;
loop {
    let session = svc_accept(listener)?;
    if session < 0 { continue; } // no pending connection
    // svc_read / svc_write operate on session handles, not file descriptors.
    let mut buf = [0u8; 1024];
    let n = svc_read(session, &mut buf)?;
    svc_write(session, b"HTTP/1.1 200 OK\r\n\r\necho")?;
    svc_close(session)?;
}

No POSIX sockets. No bind() / accept() / read() / write(). The service hostcalls operate on leased listener handles, not file descriptors.
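
The handler above replies with a fixed body. If the tasklet should echo what it read, the response bytes can be built by a pure function and then handed to svc_write. The sketch below is an illustration only: build_echo_response is a hypothetical helper, not part of grafos_svc_v0, and it assumes the whole request fits in the buffer that svc_read filled.

```rust
/// Build an HTTP/1.1 response whose body is a copy of the request bytes.
/// Hypothetical helper for illustration; it uses no grafOS hostcalls.
fn build_echo_response(request: &[u8]) -> Vec<u8> {
    // Content-Length lets the client know where the body ends.
    let head = format!(
        "HTTP/1.1 200 OK\r\n\
         Content-Type: application/octet-stream\r\n\
         Content-Length: {}\r\n\
         Connection: close\r\n\
         \r\n",
        request.len()
    );
    let mut out = head.into_bytes();
    out.extend_from_slice(request);
    out
}
```

Inside the accept loop, the bytes returned for the first n bytes of buf would replace the literal response passed to svc_write.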

Step 3: Clients Discover and Route

A client finds the service by resolving its topology, not by node address.

use grafos_scheduler::service_resolver::{ServiceResolver, RoutingPolicy};

let mut resolver = ServiceResolver::new();
// The topology comes from the orchestrator's service state.
let topology = orchestrator.get_topology(&service_id).expect("service topology");
let endpoint = resolver.resolve(
    &topology,
    &RoutingPolicy::RoundRobin,
)?;
// endpoint.node_id, endpoint.listener_port, endpoint.instance_id
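
For intuition about what round-robin resolution over a topology involves, here is a self-contained model. The types are stand-ins invented for this sketch (only the endpoint field names come from the snippet above), and it assumes that only Active instances are eligible for routing.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InstanceState {
    Active,
    Draining,
    Fenced,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Endpoint {
    node_id: u32,
    instance_id: u32,
    listener_port: u16,
}

/// Stand-in for a service topology: every instance with its current state.
struct Topology {
    instances: Vec<(Endpoint, InstanceState)>,
}

/// Round-robin picker that skips anything not Active.
struct RoundRobin {
    next: usize,
}

impl RoundRobin {
    fn new() -> Self {
        Self { next: 0 }
    }

    fn resolve(&mut self, topo: &Topology) -> Result<Endpoint, &'static str> {
        let active: Vec<Endpoint> = topo
            .instances
            .iter()
            .filter(|(_, state)| *state == InstanceState::Active)
            .map(|(endpoint, _)| *endpoint)
            .collect();
        if active.is_empty() {
            return Err("no healthy replica");
        }
        // Rotate through the currently active set.
        let chosen = active[self.next % active.len()];
        self.next = self.next.wrapping_add(1);
        Ok(chosen)
    }
}
```

The client addresses the service, and the picker decides which node receives the request, so a fenced replica drops out of rotation without any change on the client side.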

The ResolvedEndpointCache (5s TTL, generation-aware) can wrap the resolver to avoid re-resolving on every call. See native-service-routing-model.md section 3a for cache semantics.
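
The two invalidation rules mentioned here, a TTL and a generation check, can be modelled in a few lines. This is a sketch of the idea only: EndpointCache and its methods are hypothetical and do not reproduce the ResolvedEndpointCache API, whose exact semantics are described in the routing model document.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Endpoint {
    node_id: u32,
    listener_port: u16,
}

struct CachedEndpoint {
    endpoint: Endpoint,
    generation: u64,
    stored_at: Instant,
}

/// Single-entry cache: a hit requires a fresh entry AND a matching generation.
struct EndpointCache {
    ttl: Duration,
    entry: Option<CachedEndpoint>,
}

impl EndpointCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entry: None }
    }

    fn put(&mut self, endpoint: Endpoint, generation: u64, now: Instant) {
        self.entry = Some(CachedEndpoint { endpoint, generation, stored_at: now });
    }

    fn get(&self, current_generation: u64, now: Instant) -> Option<Endpoint> {
        let cached = self.entry.as_ref()?;
        let fresh = now.saturating_duration_since(cached.stored_at) < self.ttl;
        if fresh && cached.generation == current_generation {
            Some(cached.endpoint)
        } else {
            None // caller falls back to a full resolve
        }
    }
}
```

With a 5 second TTL, a failover that bumps the topology generation invalidates the entry at once, while the TTL bounds staleness if the generation is not observed.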

Step 4: Survive a Node Failure

When a node dies, the scheduler detects the lease expiry and triggers failover:

  1. The dead replica’s listener lease expires → instance transitions to Fenced.
  2. The ServiceOrchestrator starts a replacement on another node.
  3. The replacement acquires a new listener lease and starts the tasklet.
  4. The resolver’s cached entry is invalidated by the generation bump.
  5. New client requests route to the surviving replica immediately; the replacement starts receiving traffic once it is Active.

No DNS update. No load balancer reconfiguration. No process restart. The lease model handles it.

// Trigger failover for a failed instance.
let events = orchestrator.failover(
    &service_id,
    failed_instance_id,
    &capacity_ledger,
    now,
)?;
// events: [FailoverStarted, InstanceProvisioned, ...]
// orchestrator.tick() advances the state machine to completion.
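
The failover sequence can be traced with a small model. Everything in it is hypothetical (Service, fence_expired, replace, plain integer timestamps); it only illustrates the ordering described above: lease expiry fences the instance, a replacement lands on a node that hosts no other instance, and each change bumps the generation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Active,
    Fenced,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Instance {
    id: u32,
    node: u32,
    lease_expires_at: u64, // seconds on an arbitrary clock
    state: State,
}

struct Service {
    generation: u64,
    next_id: u32,
    instances: Vec<Instance>,
}

impl Service {
    /// Fence every active instance whose listener lease has expired.
    fn fence_expired(&mut self, now: u64) -> Vec<u32> {
        let mut fenced = Vec::new();
        for inst in self.instances.iter_mut() {
            if inst.state == State::Active && now >= inst.lease_expires_at {
                inst.state = State::Fenced;
                fenced.push(inst.id);
            }
        }
        if !fenced.is_empty() {
            self.generation += 1; // invalidates cached endpoints
        }
        fenced
    }

    /// Swap a fenced instance for a new one on a node hosting no instance.
    fn replace(&mut self, failed_id: u32, nodes: &[u32], now: u64, lease_secs: u64) -> Option<u32> {
        let idx = self
            .instances
            .iter()
            .position(|i| i.id == failed_id && i.state == State::Fenced)?;
        // The fenced instance still counts, so the dead node is never reused.
        let node = *nodes
            .iter()
            .find(|n| !self.instances.iter().any(|i| i.node == **n))?;
        self.instances.remove(idx);
        let id = self.next_id;
        self.next_id += 1;
        self.instances.push(Instance {
            id,
            node,
            lease_expires_at: now + lease_secs,
            state: State::Active,
        });
        self.generation += 1;
        Some(id)
    }
}
```

Between the two calls, the surviving replica is the only Active instance, which matches step 5: traffic continues on the survivor until the replacement is Active.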

Step 5: Planned Cutover

To move the service to a new node (rolling deploy, hardware maintenance):

  1. Call cutover() on the instance to be replaced.
  2. The orchestrator provisions a replacement on a different node.
  3. The old instance drains: stops accepting new sessions, waits for in-flight sessions to complete (bounded by drain_deadline_secs).
  4. Old listener lease is revoked after drain completes.
  5. Generation bumps at each step keep the resolver’s cache fresh.

// Initiate a planned cutover for a specific instance.
let events = orchestrator.cutover(
    &service_id,
    old_instance_id,
    &capacity_ledger,
    now,
)?;
// events: [CutoverStarted, InstanceProvisioned, ...]
// orchestrator.tick() drives: Drain → Revoke → Replace → CutoverCompleted
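
The drain step is a bounded wait, which the following sketch models. DrainingInstance and DrainStatus are hypothetical names, timestamps are plain seconds, and the treatment of an exceeded deadline as a terminal state is an assumption; the source only says the drain is bounded by drain_deadline_secs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DrainStatus {
    Draining,         // still waiting on in-flight sessions
    Drained,          // all sessions finished; lease can be revoked
    DeadlineExceeded, // bound reached with sessions still open
}

struct DrainingInstance {
    in_flight: u32,
    drain_started_at: u64,
    drain_deadline_secs: u64,
}

impl DrainingInstance {
    /// A draining instance accepts no new sessions.
    fn accepts_new_sessions(&self) -> bool {
        false
    }

    /// Record that one in-flight session has completed.
    fn session_closed(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }

    fn status(&self, now: u64) -> DrainStatus {
        let elapsed = now.saturating_sub(self.drain_started_at);
        if self.in_flight == 0 {
            DrainStatus::Drained
        } else if elapsed >= self.drain_deadline_secs {
            DrainStatus::DeadlineExceeded
        } else {
            DrainStatus::Draining
        }
    }
}
```

A driver loop would poll status on each tick and move on to revoking the old listener lease once the result is no longer Draining.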

Why This Matters

In a traditional system, “deploy a web server with failover” means: write a server binary, package it, configure a process manager, set up a load balancer, write health checks, configure DNS failover, and hope the pieces agree about what “healthy” means.

In grafOS, the service is the lease graph. The listener lease is the authority to accept connections. The CPU lease is the authority to execute. The memory lease is the authority to allocate buffers. When any lease expires, the resource is fenced and the scheduler replaces it. There is no gap between “the service is running” and “the resources are leased.”

Failure Modes

Failure                | Behavior
Node dies              | Listener lease expires → fenced → failover to replacement
Tasklet crashes        | CPU lease intact; scheduler can resubmit on same node
Listener lease revoked | Instance transitions to Fenced; new sessions rejected
All replicas down      | Resolver returns error; client gets explicit “no healthy replica”
Network partition      | Lease renewal fails → expiry → fenced; partition heals → re-place

Testing This Recipe

In sim mode:

use grafos_testkit::SimFabric;
let mut fabric = SimFabric::new(4); // 4 simulated nodes
// Place service, submit tasklets, verify routing...
// Partition node_a, verify failover...
// Heal partition, verify re-placement...

See Also

  • docs/grafos/native-service-model.md — service primitive definition
  • docs/grafos/native-service-topology-model.md — topology/failover model
  • docs/grafos/native-service-routing-model.md — discovery/routing/cache
  • docs/grafos/service-abi-v0.md — service hostcall ABI reference
  • docs/runbooks/service-routing-runbook.md — operational guidance
  • Recipe 7 (zero-copy microservices) — memory-transport RPC, not service placement
  • Recipe 36 (stateful KV with fabric storage) — adds durable state to this pattern