Runner

What it does

The Runner is the self-hosted process that executes the work the Server hands out. When a Module Job is dispatched, a Runner picks it up over its SignalR connection, fetches the Module source, materialises Inputs and Extra Files into the working directory, executes the configured engine commands (init, plan, apply, destroy, output), streams logs back to the Server in real time, and reports the final status.

The Runner is where credentials with real blast radius live: cloud provider keys, Kubernetes service accounts, on-prem API tokens. By deploying Runners with narrowly scoped permissions and binding Modules to them via Runner Assignments, you control which Modules can act on which infrastructure. The Server itself never holds these credentials.

The Runner is bundled with Snap CD in both editions and is not license-gated — Cloud-edition customers run their own Runners against snapcd.io, and Self-Hosted customers run them against their own Server.

Prerequisite Server Resources

The following resources must exist on the Server before a Runner can connect and pick up Jobs:

Resource | Notes
Service Principal | The identity the Runner authenticates as. The Runner uses its ClientId and ClientSecret to obtain JWTs from /connect/token
Runner record | The Runner binds to this record. Its Id is what the Runner reports via Runner.Id, and its ServicePrincipalId must reference the Service Principal above
Runner Assignment | At least one, covering every Module the Runner is to handle. Created at Stack / Namespace / Module scope or, as a shortcut, by setting is_assigned_to_all_modules on the Runner record

Other Prerequisites

What must be present on the Runner host or its network:

Prerequisite | Notes
Engine binary | terraform, tofu or pulumi must be present on PATH (or in a directory listed in Engine.AdditionalBinaryPaths) for whichever Engines you intend to use
Network egress to the Server | A long-lived outbound HTTPS / WebSocket connection to the Server’s /runnerhub. No inbound ports are required on the Runner host
Provider credentials | Whatever the Modules running on this Runner need: cloud credentials, kubeconfig, on-prem tokens. Bound to the host (env vars, instance metadata, mounted kubeconfig)
Writable working directory | Path for fetched Module source and engine state. Defaults to ~/.snapcd/runner

Deployment

See Deployment > Guide > Runner for the reference deployment repositories (Docker, Kubernetes, local) and the minimum Compose shape.

Connection model

A Runner connects to the Server’s /runnerhub with a JWT obtained via client_credentials against /connect/token. The connection is long-lived and bidirectional:

  • Server → Runner. Job dispatches, cancellation signals, and configuration updates
  • Runner → Server. Log envelopes, step status updates, terminal Job results
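
The token step above is a standard OAuth2 client_credentials request. The following is a minimal sketch of how such a request could be assembled, not the Runner's actual implementation. It builds the request without sending it. The host name, client ID and secret are placeholders, and sending the credentials as form fields (rather than HTTP Basic auth) is an assumption. The Organization ID prefix the Runner adds to the client ID is not reproduced here because its format is not described in this page.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(server_url: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a client_credentials request to {Url}/connect/token."""
    # Assumption: credentials travel as form fields in the POST body.
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return Request(
        url=server_url.rstrip("/") + "/connect/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values, for illustration only.
request = build_token_request("https://snapcd.example.com/", "my-client", "s3cret")
print(request.get_method(), request.full_url)
```

The JWT returned by the token endpoint would then be presented when opening the /runnerhub connection.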

When the Runner connects, it announces its Runner.Instance name. The Server records the connection in the database. Job dispatch then targets either:

  • Any available Runner against this record (the default — first to respond handles the Job)
  • A specific Runner, when the Module sets runner_instance_name to pin to one by name

Reconnect and outages

The SignalR client reconnects automatically. While disconnected:

  • Outgoing log envelopes buffer in-process; on reconnect, the buffer flushes
  • The Runner does not pull new Jobs (it can’t — dispatch is push-based)
  • Jobs already in-flight continue executing locally; their terminal status posts on reconnect
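
The buffering behaviour in the list above can be pictured with a small sketch. The class and method names are hypothetical, and the unbounded in-memory queue is an assumption, since this page does not say whether the Runner caps its buffer.

```python
from collections import deque
from typing import Callable, Deque


class BufferedSender:
    """Sends envelopes while connected, holds them in memory otherwise."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self._buffer: Deque[str] = deque()  # assumption: unbounded buffer
        self.connected = True

    def emit(self, envelope: str) -> None:
        if self.connected:
            self._send(envelope)
        else:
            self._buffer.append(envelope)

    def on_disconnected(self) -> None:
        self.connected = False

    def on_reconnected(self) -> None:
        # Flush in arrival order so the Server sees logs in sequence.
        self.connected = True
        while self._buffer:
            self._send(self._buffer.popleft())


received = []
sender = BufferedSender(received.append)
sender.emit("line 1")
sender.on_disconnected()
sender.emit("line 2")
sender.emit("line 3")
sender.on_reconnected()
print(received)
```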

The Server treats a Runner as offline once its RunnerConnection row is gone. A Runner that crashes hard mid-Job will resume reporting on its next start; the Server reconciles by treating any Job whose owning Runner has disconnected as eligible for dispatch to the next available Runner.

Multiple Runners per record

When the Runner record has allow_multiple_instances = true, you can run replicas (for example a Kubernetes StatefulSet with replicas: 3). Job dispatch follows the Runner Selection model — by default the Server broadcasts each Job to all connected Runners and the first to respond handles it. Each replica must report a distinct Runner.Instance name.

Operations & observability

The Runner emits two kinds of logs:

  • Runtime / diagnostic logs: standard Microsoft.Extensions.Logging (MEL) ILogger output to stdout, filtered by the Logging.LogLevel section. These cover Runner startup, connection events, Job pickup and so on
  • Job logs: the engine output for each Job step, shipped to the Server in batches over the /runnerhub connection and visible in the Dashboard’s Jobs view. These are not written to stdout

Job-log shipping is batched: the Runner accumulates events for JobLogStream.PeriodSeconds (default 5) or up to JobLogStream.BatchSizeLimit (default 50), whichever comes first, then ships a single batch. The first event of each batch ships immediately when JobLogStream.EagerlyEmitFirstEvent is true (default), keeping initial Job output responsive on the Dashboard.
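
The interaction between the three settings can be sketched as a small simulation. This is an illustration of the flush rules described above, not the Runner's actual sink. The class name, the tick method and the injected clock are hypothetical. It also assumes that "first event" means the first event the batcher sees for a Job; the wording could equally be read as applying to every batch.

```python
import time
from typing import Callable, List


class LogBatcher:
    """Ships a batch when the size limit or the period is reached, whichever is first."""

    def __init__(
        self,
        ship: Callable[[List[str]], None],
        period_seconds: float = 5.0,
        batch_size_limit: int = 50,
        eagerly_emit_first_event: bool = True,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ship = ship
        self._period = period_seconds
        self._limit = batch_size_limit
        self._eager = eagerly_emit_first_event
        self._clock = clock
        self._pending: List[str] = []
        self._window_start = 0.0
        self._first_emitted = False

    def add(self, event: str) -> None:
        if self._eager and not self._first_emitted:
            # Assumption: only the very first event bypasses batching.
            self._first_emitted = True
            self._ship([event])
            return
        if not self._pending:
            self._window_start = self._clock()
        self._pending.append(event)
        if len(self._pending) >= self._limit:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes once the period has elapsed."""
        if self._pending and self._clock() - self._window_start >= self._period:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            batch, self._pending = self._pending, []
            self._ship(batch)
```

Lowering the period or the size limit makes log lines reach the Dashboard sooner, at the cost of more batches sent over the connection.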

The Runner has no built-in dashboard. Operator-facing visibility is the Server’s Dashboard:

  • The Runners page shows each Runner record with its connected processes and a live online / offline badge
  • The Jobs view shows in-flight Jobs and streams their logs as they arrive

For the Runner host itself, treat it as a standard container workload: stdout to your log aggregator, container metrics to your usual collector.

Settings

The Runner reads its settings from the standard layered pipeline described in Deployment > Settings. Production deployments typically source Runner.Credentials.ClientSecret from a vault via the External Settings provider rather than placing it in plain-text settings.

The settings reference below is generated from the Runner’s published JSON Schema, the same schema operators reference via "$schema" in their appsettings.json to get editor IntelliSense.

Engine object

Discovery hints for the engine binaries (terraform, tofu, pulumi) the Runner invokes per Job. The Runner looks for binaries on PATH first; entries here extend that search.

AdditionalBinaryPaths array of string

Extra directories prepended to the Runner's binary-search path. Supports leading ~ expansion. Useful when an engine ships in a non-standard location — for example ~/.pulumi/bin for a per-user Pulumi install.
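
As a rough illustration of this lookup, the sketch below expands a leading ~ in each extra directory and searches those directories together with PATH. The function name is hypothetical, and placing the extra directories ahead of PATH follows the "prepended" wording of this field; the Runner's real search order may differ.

```python
import os
import shutil
from typing import Iterable, Optional


def find_engine_binary(name: str, additional_binary_paths: Iterable[str] = ()) -> Optional[str]:
    """Return the full path of an engine binary, or None if it is not found."""
    # Expand a leading ~ to the current user's home directory.
    extra = [os.path.expanduser(p) for p in additional_binary_paths]
    # Assumption: extra directories are searched before PATH.
    search_path = os.pathsep.join(extra + [os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


# Example: look for pulumi in a per-user install as well as on PATH.
print(find_engine_binary("pulumi", ["~/.pulumi/bin"]))
```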

HooksPreapproval object

Optional content-based allowlist for Hook scripts the Runner is permitted to execute. When enabled, every Hook a Job tries to run must match (by SHA256) a file in the allowlist directory or it is refused. Intended for security-sensitive deployments where the set of shippable Hooks must be reviewed out-of-band.

Enabled boolean

Enable or disable hook pre-approval validation. When enabled, all incoming hooks must match a pre-approved hook from the PreapprovedHooksDirectory.

PreapprovedHooksDirectory string

Directory containing pre-approved hook scripts. Each file in this directory is considered a pre-approved hook. File names don't matter - only file content is used for validation.
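
A content-based allowlist of this kind can be sketched as follows. The function names are hypothetical, and hashing the raw bytes of each file is an assumption: this page does not say whether the Runner normalises line endings or encoding before computing the SHA256.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def load_preapproved_digests(directory: str) -> set[str]:
    """Hash every file in the directory; file names are ignored."""
    root = Path(directory).expanduser()
    return {sha256_hex(p.read_bytes()) for p in root.iterdir() if p.is_file()}


def is_hook_allowed(hook_content: bytes, enabled: bool, preapproved: set[str]) -> bool:
    """Allow everything when validation is off, otherwise require a digest match."""
    if not enabled:
        return True
    return sha256_hex(hook_content) in preapproved
```

Because only the digest is compared, changing a single byte of a Hook script, including whitespace, means the approved copy in the directory must be updated as well.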

JobLogStream object

Tunables for the per-Job log shipping pipeline that streams engine output back to the Server over SignalR. Defaults are sensible for typical workloads; tune BatchSizeLimit and PeriodSeconds together if you need lower per-log latency at the cost of more frequent network round-trips.

BatchSizeLimit integer

Default: 50

Maximum number of log events to ship in a single batch. The PeriodicBatchingSink will flush early when this size is reached even before PeriodSeconds elapses.

EagerlyEmitFirstEvent boolean

Default: true

When true, the first event in a fresh batch is emitted immediately rather than waiting for the period or size threshold. Keeps initial job output responsive.

PeriodSeconds integer

Default: 5

Maximum wall-clock interval, in seconds, between batch flushes. A batch ships whenever either BatchSizeLimit or this period is reached.

Logging object

Standard .NET Logging configuration. See https://learn.microsoft.com/dotnet/core/extensions/logging-configuration for the full reference. Provider-specific sub-blocks (Console, Debug, EventSource, etc.) are accepted but not enumerated here.

LogLevel object

Map of log category names (or category prefixes) to minimum log levels. 'Default' applies when no more-specific category matches; longer keys override shorter ones (Microsoft.AspNetCore beats Microsoft beats Default).

additional keys string

Allowed values: Trace, Debug, Information, Warning, Error, Critical, None
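
The "longer keys override shorter ones" rule can be illustrated with a simplified resolver. This is an approximation for explanation only: the real .NET matching has further rules (for example per-provider overrides) that are not modelled here, and the category name SnapCd.Runner is a made-up example.

```python
from typing import Dict, Optional


def resolve_min_level(category: str, log_levels: Dict[str, str]) -> Optional[str]:
    """Pick the level of the longest key that is a prefix of the category."""
    best_key = None
    for key in log_levels:
        if key == "Default":
            continue
        if category.startswith(key) and (best_key is None or len(key) > len(best_key)):
            best_key = key
    if best_key is not None:
        return log_levels[best_key]
    # Fall back to Default when no more-specific key matches.
    return log_levels.get("Default")


levels = {
    "Default": "Information",
    "Microsoft": "Warning",
    "Microsoft.AspNetCore": "Error",
}
print(resolve_min_level("Microsoft.AspNetCore.Routing", levels))
print(resolve_min_level("SnapCd.Runner", levels))
```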

Runner object

Identity, organisation and credentials that bind this Runner process to a Runner record on the Server. All four fields are required for the Runner to authenticate and connect.

Credentials object

Service Principal credentials the Runner authenticates with. The Service Principal referenced here must be the one bound to the Runner record via service_principal_id.

ClientId string

The Service Principal's raw client identifier. The Runner automatically prefixes it with the Organization ID when calling the token endpoint, so supply only the raw client ID here.

ClientSecret string

The Service Principal's client secret. Sensitive — production deployments should source this via the External Settings provider rather than committing it to appsettings.json.

Id string (uuid)

Identifier of the Runner record on the Server this process binds to.

Instance string

Name this Runner reports when it connects, used to distinguish replicas when allow_multiple_instances is set on the Runner record. Visible in the Dashboard's Runners page next to the parent record.

OrganizationId string (uuid)

Identifier of the Organization this Runner belongs to. Must match the Organization in which the Runner record identified by Id was created.
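
To make the shape of the Runner section concrete, the sketch below mirrors it as a Python dictionary and checks that the required values are present. The helper name is hypothetical and every value is a placeholder. Checking both ClientId and ClientSecret inside Credentials is an assumption based on the Runner needing both to obtain a token.

```python
from typing import Any, Dict, List

# Placeholder values only; the key names follow the settings listed in this section.
example_settings: Dict[str, Any] = {
    "Runner": {
        "Id": "00000000-0000-0000-0000-000000000001",
        "Instance": "runner-0",
        "OrganizationId": "00000000-0000-0000-0000-000000000002",
        "Credentials": {
            "ClientId": "my-client",
            "ClientSecret": "replace-me",
        },
    }
}


def missing_runner_settings(settings: Dict[str, Any]) -> List[str]:
    """Return the dotted paths of required Runner settings that are absent or empty."""
    runner = settings.get("Runner") or {}
    credentials = runner.get("Credentials") or {}
    checks = {
        "Runner.Id": runner.get("Id"),
        "Runner.Instance": runner.get("Instance"),
        "Runner.OrganizationId": runner.get("OrganizationId"),
        "Runner.Credentials.ClientId": credentials.get("ClientId"),
        "Runner.Credentials.ClientSecret": credentials.get("ClientSecret"),
    }
    return [path for path, value in checks.items() if not value]


print(missing_runner_settings(example_settings))
```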

Server object

Coordinates of the Snap CD Server the Runner connects to.

Url string

Base URL of the Snap CD Server, including scheme and port. The Runner opens its SignalR connection to {Url}/runnerhub and obtains JWTs from {Url}/connect/token.

WorkingDirectory object

Filesystem locations the Runner uses for fetched Module source and ephemeral state. Both paths support leading ~ expansion to the host user's home directory.

TempDirectory string

Directory for ephemeral per-Job scratch space. Cleaned between Jobs. Typically ~/.snapcd/runner/.temp.

WorkingDirectory string

Root directory under which the Runner persists fetched Module source, engine state and per-Job outputs. Must be writable by the Runner process. Typically ~/.snapcd/runner.

See the Resources area for per-resource semantics (Hooks, Engine, and so on).
