Runner

What it does

The Runner is the self-hosted process that executes the work the Server hands out. When a Module Job is dispatched, a Runner picks it up over its SignalR connection, fetches the Module source, materialises Inputs and Extra Files into the working directory, executes the configured engine commands (init, plan, apply, destroy, output), streams logs back to the Server in real time, and reports the final status.

The Runner is where credentials with real blast radius live: cloud provider keys, Kubernetes service accounts, on-prem API tokens. By deploying Runners with narrowly scoped permissions and binding Modules to them via Runner Supplies, you control which Modules can act on which infrastructure. The Server itself never holds these credentials.

The Runner is bundled with Snap CD in both editions and is not license-gated — Cloud-edition customers run their own Runners against snapcd.io, and Self-Hosted customers run them against their own Server.

Prerequisite Server Resources

The following resources must exist on the Server before a Runner can connect and pick up Jobs:

Resource	Notes
Service Principal	The identity the Runner authenticates as. The Runner uses its `ClientId` and `ClientSecret` to obtain JWTs from `/connect/token`
Runner record	The Runner binds to this record. Its `Id` is what the Runner reports via `Runner.Id`, and its `ServicePrincipalId` must reference the Service Principal above
Runner Supply	At least one, covering every Module the Runner is to handle. Created at Stack / Namespace / Module scope or, as a shortcut, by setting `is_supplied_to_all_modules` on the Runner record

Other Prerequisites

What must be present on the Runner host or its network:

Prerequisite	Notes
Engine binary	`terraform`, `tofu` or `pulumi` must be present on `PATH` (or via `Engine.AdditionalBinaryPaths`) for whichever Engines you intend to use
Network egress to the Server	A long-lived outbound HTTPS / WebSocket connection to the Server’s `/runnerhub`. No inbound ports are required on the Runner host
Provider credentials	Whatever the Modules running on this Runner need — cloud credentials, kubeconfig, on-prem tokens. Bound to the host (env vars, instance metadata, mounted kubeconfig)
Writable working directory	Path for fetched Module source and engine state. Defaults to `~/.snapcd/runner`

Deployment

See Deployment > Guide > Runner for the reference deployment repositories (Docker, Kubernetes, local) and the minimum Compose shape.

Connection model

A Runner connects to the Server’s /runnerhub with a JWT obtained via client_credentials against /connect/token. The connection is long-lived and bidirectional:

Server → Runner. Job dispatches, cancellation signals, and configuration updates
Runner → Server. Log envelopes, step status updates, terminal Job results

When the Runner connects, it announces its Runner.Instance name. The Server records the connection in the database. Job dispatch then targets either:

Any available Runner against this record (the default — first to respond handles the Job)
A specific Runner, when the Module sets runner_instance_name to pin to one by name

Reconnect and outages

The SignalR client reconnects automatically. While disconnected:

Outgoing log envelopes buffer in-process; on reconnect, the buffer flushes
The Runner does not pull new Jobs (it can’t — dispatch is push-based)
Jobs already in-flight continue executing locally; their terminal status posts on reconnect

The Server treats a Runner as offline once its RunnerConnection row is gone. A Runner that crashes hard mid-Job will resume reporting on its next start; the Server reconciles by treating any Job whose owning Runner has disconnected as eligible for dispatch to the next available Runner.

Multiple Runners per record

When the Runner record has allow_multiple_instances = true, you can run replicas (for example a Kubernetes StatefulSet with replicas: 3). Job dispatch follows the Runner Selection model — by default the Server broadcasts each Job to all connected Runners and the first to respond handles it. Each replica must report a distinct Runner.Instance name.

Operations & observability

The Runner emits two kinds of logs:

Runtime / diagnostic logs: standard Microsoft.Extensions.Logging (ILogger) output to stdout, filtered by the Logging.LogLevel section. These cover Runner startup, connection events, Job pickup and so on
Job logs: the engine output for each Job step, shipped to the Server in batches over the /runnerhub connection and visible in the Dashboard’s Jobs view. These are not written to stdout

Job-log shipping is batched: the Runner accumulates events for JobLogStream.PeriodSeconds (default 5) or up to JobLogStream.BatchSizeLimit (default 50), whichever comes first, then ships a single batch. The first event of each batch ships immediately when JobLogStream.EagerlyEmitFirstEvent is true (default), keeping initial Job output responsive on the Dashboard.

The Runner has no built-in dashboard. Operator-facing visibility is the Server’s Dashboard:

The Runners page shows each Runner record with its connected processes and a live online / offline badge
The Jobs view shows in-flight Jobs and streams their logs as they arrive

For the Runner host itself, treat it as a standard container workload: stdout to your log aggregator, container metrics to your usual collector.

Runner Environment Variables

The RunnerEnvVars section in the Runner’s appsettings.json defines environment variables that are injected into every engine process the Runner executes. These are merged with any environment variables resolved by the Server (e.g. from Module Inputs or Namespace Inputs with input_kind = "EnvVar"). When both sources define the same variable, the Server-resolved value takes precedence.

{
  "RunnerEnvVars": {
    "SNAPCD_CLIENT_ID": "default",
    "SNAPCD_CLIENT_SECRET": "default"
  }
}

Use this for any environment variable that should be available to every engine process on this Runner — cloud credentials, feature flags, or authentication tokens. For example, the State Store HTTP backend uses SNAPCD_CLIENT_ID and SNAPCD_CLIENT_SECRET from this section to authenticate against the State Store API.

Settings

The Runner reads its settings from the standard layered pipeline described in Deployment > Settings. Production deployments typically source Runner.Credentials.ClientSecret from a vault via the External Settings provider rather than placing it in plain-text settings.

The settings reference below is generated from the Runner’s published JSON Schema, the same schema operators reference via "$schema" in their appsettings.json to get editor IntelliSense.

Engine object

Discovery hints for the engine binaries (terraform, tofu, pulumi) the Runner invokes per Job. The Runner looks for binaries on PATH first; entries here extend that search.

AdditionalBinaryPaths array of string

Extra directories prepended to the Runner's binary-search path. Supports leading ~ expansion. Useful when an engine ships in a non-standard location — for example ~/.pulumi/bin for a per-user Pulumi install.
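
As a rough illustration, an Engine section that adds two extra search directories might look like the fragment below. The ~/.pulumi/bin entry comes from the description above; /opt/tofu/bin is a hypothetical path chosen only for the example, and the nesting is inferred from the Engine.AdditionalBinaryPaths setting name.

```json
{
  "Engine": {
    "AdditionalBinaryPaths": [
      "~/.pulumi/bin",
      "/opt/tofu/bin"
    ]
  }
}
```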

HooksPreapproval object

Optional content-based allowlist for Hook scripts the Runner is permitted to execute. When enabled, every Hook a Job tries to run must match (by SHA256) a file in the allowlist directory or it is refused. Intended for security-sensitive deployments where the set of shippable Hooks must be reviewed out-of-band.

Enabled boolean

Enable or disable hook pre-approval validation. When enabled, all incoming hooks must match a pre-approved hook from the PreapprovedHooksDirectory.

PreapprovedHooksDirectory string

Directory containing pre-approved hook scripts. Each file in this directory is considered a pre-approved hook. File names do not matter; only file content is used for validation.
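
A minimal sketch of an enabled allowlist, assuming the two fields nest directly under HooksPreapproval as listed above. The directory path is hypothetical; any directory readable by the Runner process should serve the same purpose.

```json
{
  "HooksPreapproval": {
    "Enabled": true,
    "PreapprovedHooksDirectory": "/etc/snapcd/preapproved-hooks"
  }
}
```

With a configuration along these lines, a Hook whose content does not match the SHA256 of any file in that directory would be refused, regardless of its file name.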

JobLogStream object

Tunables for the per-Job log shipping pipeline that streams engine output back to the Server over SignalR. Defaults are sensible for typical workloads; tune BatchSizeLimit and PeriodSeconds together if you need lower per-log latency at the cost of more frequent network round-trips.

BatchSizeLimit integer

Default: 50

Maximum number of log events to ship in a single batch. The PeriodicBatchingSink will flush early when this size is reached even before PeriodSeconds elapses.

EagerlyEmitFirstEvent boolean

Default: true

When true, the first event in a fresh batch is emitted immediately rather than waiting for the period or size threshold. Keeps initial job output responsive.

PeriodSeconds integer

Default: 5

Maximum wall-clock interval, in seconds, between batch flushes. A batch ships whenever either BatchSizeLimit or this period is reached.
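
To illustrate the latency trade-off described for this section, the fragment below lowers both thresholds together. The values 20 and 1 are illustrative assumptions, not recommendations; the defaults are 50 and 5.

```json
{
  "JobLogStream": {
    "BatchSizeLimit": 20,
    "PeriodSeconds": 1,
    "EagerlyEmitFirstEvent": true
  }
}
```

With these values a batch would ship after 1 second or 20 events, whichever comes first, at the cost of more frequent round-trips to the Server.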

Logging object

Standard .NET Logging configuration. See https://learn.microsoft.com/dotnet/core/extensions/logging-configuration for the full reference. Provider-specific sub-blocks (Console, Debug, EventSource, etc.) are accepted but not enumerated here.

LogLevel object

Map of log category names (or category prefixes) to minimum log levels. 'Default' applies when no more-specific category matches; longer keys override shorter ones (Microsoft.AspNetCore beats Microsoft beats Default).

additional keys → string

Allowed values: Trace, Debug, Information, Warning, Error, Critical, None
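
The prefix-matching rule can be seen in a small example. The category names below are the ones used in the description above; the levels chosen are illustrative only.

```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.AspNetCore": "Error"
    }
  }
}
```

Under this configuration, a category such as Microsoft.AspNetCore.Routing is filtered at Error, other Microsoft.* categories at Warning, and everything else at Information.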

Runner object

Identity, organisation and credentials that bind this Runner process to a Runner record on the Server. All four fields are required for the Runner to authenticate and connect.

Credentials object

Service Principal credentials the Runner authenticates with. The Service Principal referenced here must be the one bound to the Runner record via service_principal_id.

ClientId string

The Service Principal's client identifier. Supply only the raw client ID here; the Runner automatically prefixes it with the Organization ID when it calls the token endpoint.

ClientSecret string

The Service Principal's client secret. Sensitive — production deployments should source this via the External Settings provider rather than committing it to appsettings.json.

Id string (uuid)

Identifier of the Runner record on the Server this process binds to.

Instance string

Name this Runner reports when it connects, used to distinguish replicas when allow_multiple_instances is set on the Runner record. Visible in the Dashboard's Runners page next to the parent record.

OrganizationId string (uuid)

Identifier of the Organization this Runner belongs to. Must match the Organization in which the Runner record identified by Id was created.
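
Put together, a Runner section with all four fields might be shaped as follows. Every value is a placeholder: the UUIDs, the instance name and the credential strings are hypothetical, and in production the ClientSecret would normally come from the External Settings provider rather than from this file.

```json
{
  "Runner": {
    "Id": "00000000-0000-0000-0000-000000000001",
    "Instance": "runner-0",
    "OrganizationId": "00000000-0000-0000-0000-000000000002",
    "Credentials": {
      "ClientId": "<raw-client-id>",
      "ClientSecret": "<sourced-from-external-settings>"
    }
  }
}
```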

Server object

Coordinates of the Snap CD Server the Runner connects to.

Url string

Base URL of the Snap CD Server, including scheme and port. The Runner opens its SignalR connection to {Url}/runnerhub and obtains JWTs from {Url}/connect/token.
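
For example, assuming a Self-Hosted Server reachable at the hypothetical address snapcd.example.com on port 443:

```json
{
  "Server": {
    "Url": "https://snapcd.example.com:443"
  }
}
```

Given that value, the Runner would open its SignalR connection to https://snapcd.example.com:443/runnerhub and request tokens from https://snapcd.example.com:443/connect/token.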

WorkingDirectory object

Filesystem locations the Runner uses for fetched Module source and ephemeral state. Both paths support leading ~ expansion to the host user's home directory.

TempDirectory string

Directory for ephemeral per-Job scratch space. Cleaned between Jobs. Typically ~/.snapcd/runner/.temp.

WorkingDirectory string

Root directory under which the Runner persists fetched Module source, engine state and per-Job outputs. Must be writable by the Runner process. Typically ~/.snapcd/runner.
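
A sketch using the typical paths mentioned above, assuming both fields nest under a WorkingDirectory section of the same name:

```json
{
  "WorkingDirectory": {
    "WorkingDirectory": "~/.snapcd/runner",
    "TempDirectory": "~/.snapcd/runner/.temp"
  }
}
```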

See the Resources area for per-resource semantics (Hooks, Engine, and so on).

Last updated on July 8, 2026