# Proposal: Runtime Interface for Spindle Engines

Status: Draft · Author: @evanjarrett · Date: 2025-12-09
## Summary

Extract a Runtime interface from the nixery engine to enable:

- A new `engine: docker` that accepts user-specified images
- Support for multiple container runtimes (Docker, Podman)
- Downstream implementations (e.g., Kubernetes in Loom)
## Motivation

Currently, the spindle engine architecture tightly couples workflow execution logic with Docker-specific container management in the nixery engine. This creates several limitations:

- **No user-specified images**: Users must declare Nix dependencies; they can't use pre-built Docker images like `node:20` or `golang:1.22`
- **Single runtime**: Only the Docker daemon is supported; there is no path to Podman or other OCI runtimes
- **Downstream friction**: Loom (the Kubernetes-based spindle) must reimplement the entire Engine interface rather than reusing workflow parsing and step execution logic
## Proposal

### New Runtime Interface

Create `spindle/models/runtime.go`:
package models
import (
"context"
"io"
)
// Runtime abstracts container/job execution environments.
// Implementations: Docker, Podman, (downstream: Kubernetes)
type Runtime interface {
// Setup creates the execution environment and returns a handle.
// The environment should be ready for Exec calls after Setup returns.
Setup(ctx context.Context, opts SetupOpts) (Handle, error)
// Exec runs a command in the environment.
// For container runtimes, this is typically `exec` into a running container.
// For job-based runtimes (K8s), this may stream logs from a pre-defined job.
Exec(ctx context.Context, h Handle, opts ExecOpts) (*ExecResult, error)
// Destroy tears down the environment and releases resources.
Destroy(ctx context.Context, h Handle) error
}
// SetupOpts configures the execution environment.
type SetupOpts struct {
// Image is the container image to use (e.g., "node:20", "nixery.dev/shell/git")
Image string
// WorkflowID uniquely identifies this workflow run (used for labeling/naming)
WorkflowID WorkflowId
// WorkDir is the working directory inside the container (e.g., "/tangled/workspace")
WorkDir string
// Labels for the container/job (e.g., {"sh.tangled.pipeline/workflow_id": "..."})
Labels map[string]string
// Security options
DropAllCaps bool
AddCaps []string // e.g., ["DAC_OVERRIDE", "CHOWN", "FOWNER", "SETUID", "SETGID"]
// Architecture hint for multi-arch scheduling (used by K8s runtime)
Architecture string
}
// ExecOpts configures a single command execution.
type ExecOpts struct {
// Command to run (e.g., ["bash", "-c", "npm install"])
Command []string
// Environment variables
Env []string
// Output streams (nil = discard)
Stdout io.Writer
Stderr io.Writer
}
// ExecResult contains the outcome of an Exec call.
type ExecResult struct {
ExitCode int
OOMKilled bool
}
// Handle is an opaque reference to an execution environment.
type Handle interface {
// ID returns a unique identifier for this environment (container ID, job name, etc.)
ID() string
}
// RuntimeMode indicates how the runtime executes steps.
type RuntimeMode int
const (
// RuntimeModeExec means steps are executed one at a time via Exec calls.
// Used by Docker/Podman where we exec into a running container.
RuntimeModeExec RuntimeMode = iota
// RuntimeModeBatch means all steps run in a single invocation.
// Used by Kubernetes where a Job runs all steps and the engine streams logs.
// In this mode, Exec() streams logs rather than executing commands.
RuntimeModeBatch
)
// RuntimeInfo provides metadata about a runtime implementation.
type RuntimeInfo interface {
Mode() RuntimeMode
}
### Refactored Engine Structure

Engines become thin wrappers focused on image resolution and step generation:
// engines/base/engine.go - shared logic
package base
type Engine struct {
Runtime models.Runtime
Logger *slog.Logger
Handles map[string]models.Handle
mu sync.Mutex
}
func (e *Engine) SetupWorkflow(ctx context.Context, wid models.WorkflowId, wf *models.Workflow) error {
data := wf.Data.(WorkflowData)
handle, err := e.Runtime.Setup(ctx, models.SetupOpts{
Image: data.Image,
WorkflowID: wid,
WorkDir: "/tangled/workspace",
Labels: map[string]string{"sh.tangled.pipeline/workflow_id": wid.String()},
DropAllCaps: true,
AddCaps: []string{"DAC_OVERRIDE", "CHOWN", "FOWNER", "SETUID", "SETGID"},
})
if err != nil {
return err
}
e.mu.Lock()
e.Handles[wid.String()] = handle
e.mu.Unlock()
// Create workspace directories
_, err = e.Runtime.Exec(ctx, handle, models.ExecOpts{
Command: []string{"mkdir", "-p", "/tangled/workspace", "/tangled/home"},
})
return err
}
func (e *Engine) RunStep(ctx context.Context, wid models.WorkflowId, w *models.Workflow, idx int, secrets []secrets.UnlockedSecret, wfLogger *models.WorkflowLogger) error {
e.mu.Lock()
handle, ok := e.Handles[wid.String()]
e.mu.Unlock()
if !ok {
return fmt.Errorf("no execution environment for workflow %s", wid.String())
}
step := w.Steps[idx]
env := buildEnvs(w.Environment, step, secrets)
var stdout, stderr io.Writer
if wfLogger != nil {
stdout = wfLogger.DataWriter(idx, "stdout")
stderr = wfLogger.DataWriter(idx, "stderr")
}
result, err := e.Runtime.Exec(ctx, handle, models.ExecOpts{
Command: []string{"bash", "-c", step.Command()},
Env: env,
Stdout: stdout,
Stderr: stderr,
})
if err != nil {
return err
}
if result.OOMKilled {
return ErrOOMKilled
}
if result.ExitCode != 0 {
return engine.ErrWorkflowFailed
}
return nil
}
func (e *Engine) DestroyWorkflow(ctx context.Context, wid models.WorkflowId) error {
e.mu.Lock()
handle, exists := e.Handles[wid.String()]
delete(e.Handles, wid.String())
e.mu.Unlock()
if !exists {
return nil
}
return e.Runtime.Destroy(ctx, handle)
}
### Engine Implementations

Nixery Engine (image from dependencies):
// engines/nixery/engine.go
package nixery
type Engine struct {
*base.Engine
cfg *config.Config
}
func (e *Engine) InitWorkflow(twf tangled.Pipeline_Workflow, tpl tangled.Pipeline) (*models.Workflow, error) {
var spec struct {
Steps []StepSpec `yaml:"steps"`
Dependencies map[string][]string `yaml:"dependencies"`
Environment map[string]string `yaml:"environment"`
}
if err := yaml.Unmarshal([]byte(twf.Raw), &spec); err != nil {
return nil, err
}
// NIXERY-SPECIFIC: Build image URL from dependencies
image := workflowImage(spec.Dependencies, e.cfg.NixeryPipelines.Nixery)
steps := []models.Step{}
// Add nixery-specific setup steps
steps = append(steps, nixConfStep())
steps = append(steps, models.BuildCloneStep(twf, *tpl.TriggerMetadata, e.cfg.Server.Dev))
if depStep := dependencyStep(spec.Dependencies); depStep != nil {
steps = append(steps, *depStep)
}
// Add user steps
for _, s := range spec.Steps {
steps = append(steps, Step{name: s.Name, command: s.Command, ...})
}
return &models.Workflow{
Name: twf.Name,
Steps: steps,
Environment: spec.Environment,
Data: base.WorkflowData{Image: image},
}, nil
}
func (e *Engine) WorkflowTimeout() time.Duration {
// ... existing config-based timeout
}
Docker Engine (user-specified image):
// engines/docker/engine.go
package docker
type Engine struct {
*base.Engine
}
func (e *Engine) InitWorkflow(twf tangled.Pipeline_Workflow, tpl tangled.Pipeline) (*models.Workflow, error) {
var spec struct {
Image string `yaml:"image"`
Steps []StepSpec `yaml:"steps"`
Environment map[string]string `yaml:"environment"`
}
if err := yaml.Unmarshal([]byte(twf.Raw), &spec); err != nil {
return nil, err
}
// DOCKER-SPECIFIC: Require explicit image
if spec.Image == "" {
return nil, fmt.Errorf("docker engine requires 'image' field in workflow")
}
steps := []models.Step{}
// Add clone step (shared with nixery)
steps = append(steps, models.BuildCloneStep(twf, *tpl.TriggerMetadata, false))
// Add user steps
for _, s := range spec.Steps {
steps = append(steps, SimpleStep{Name: s.Name, Command: s.Command, ...})
}
return &models.Workflow{
Name: twf.Name,
Steps: steps,
Environment: spec.Environment,
Data: base.WorkflowData{Image: spec.Image},
}, nil
}
func (e *Engine) WorkflowTimeout() time.Duration {
return 1 * time.Hour // default
}
### Runtime Implementations

Docker Runtime:
// runtime/docker/runtime.go
package docker
type Runtime struct {
client client.APIClient
logger *slog.Logger
}
type handle struct {
containerID string
networkID string
}
func (h *handle) ID() string { return h.containerID }
func (r *Runtime) Setup(ctx context.Context, opts models.SetupOpts) (models.Handle, error) {
// Create network
netResp, _ := r.client.NetworkCreate(ctx, networkName(opts.WorkflowID), ...)
// Pull image
reader, _ := r.client.ImagePull(ctx, opts.Image, image.PullOptions{})
io.Copy(io.Discard, reader)
reader.Close()
// Create container
resp, _ := r.client.ContainerCreate(ctx, &container.Config{
Image: opts.Image,
Cmd: []string{"cat"},
OpenStdin: true,
WorkingDir: opts.WorkDir,
Labels: opts.Labels,
}, &container.HostConfig{
CapDrop: capDrop(opts.DropAllCaps),
CapAdd: opts.AddCaps,
// ... mounts, security opts
}, nil, nil, "")
r.client.ContainerStart(ctx, resp.ID, container.StartOptions{})
return &handle{containerID: resp.ID, networkID: netResp.ID}, nil
}
func (r *Runtime) Exec(ctx context.Context, h models.Handle, opts models.ExecOpts) (*models.ExecResult, error) {
dh := h.(*handle)
execResp, _ := r.client.ContainerExecCreate(ctx, dh.containerID, container.ExecOptions{
Cmd: opts.Command,
Env: opts.Env,
AttachStdout: true,
AttachStderr: true,
})
attach, _ := r.client.ContainerExecAttach(ctx, execResp.ID, container.ExecAttachOptions{})
defer attach.Close()
stdcopy.StdCopy(opts.Stdout, opts.Stderr, attach.Reader)
inspect, _ := r.client.ContainerExecInspect(ctx, execResp.ID)
// Check OOMKilled
containerInspect, _ := r.client.ContainerInspect(ctx, dh.containerID)
return &models.ExecResult{
ExitCode: inspect.ExitCode,
OOMKilled: containerInspect.State.OOMKilled,
}, nil
}
func (r *Runtime) Destroy(ctx context.Context, h models.Handle) error {
dh := h.(*handle)
r.client.ContainerStop(ctx, dh.containerID, container.StopOptions{})
r.client.ContainerRemove(ctx, dh.containerID, container.RemoveOptions{RemoveVolumes: true})
r.client.NetworkRemove(ctx, dh.networkID)
return nil
}
Podman Runtime:
// runtime/podman/runtime.go
package podman
// Podman is API-compatible with Docker for most operations.
// This runtime uses the Podman socket instead of Docker socket.
type Runtime struct {
client *podman.APIClient // or use Docker client with Podman socket
logger *slog.Logger
}
// Implementation nearly identical to Docker runtime.
// Key differences:
// - Socket path: /run/user/1000/podman/podman.sock (rootless) or /run/podman/podman.sock
// - Some API differences in network handling
// - Native support for rootless containers
### Kubernetes Runtime (downstream, in Loom)
// In loom repo: runtime/kubernetes/runtime.go
package kubernetes
type Runtime struct {
client client.Client
config *rest.Config
namespace string
template SpindleTemplate
}
func (r *Runtime) Mode() models.RuntimeMode {
return models.RuntimeModeBatch // All steps run in single Job
}
func (r *Runtime) Setup(ctx context.Context, opts models.SetupOpts) (models.Handle, error) {
// Build Job spec with:
// - Init containers for setup (user namespace, clone, etc.)
// - Main container running loom-runner with all steps
// - Node affinity based on opts.Architecture
job := jobbuilder.BuildJob(jobbuilder.WorkflowConfig{
Image: opts.Image,
Architecture: opts.Architecture,
// ... steps passed via ConfigMap
})
r.client.Create(ctx, job)
// Wait for pod to be running
pod := waitForPod(ctx, job)
return &k8sHandle{
jobName: job.Name,
podName: pod.Name,
}, nil
}
func (r *Runtime) Exec(ctx context.Context, h models.Handle, opts models.ExecOpts) (*models.ExecResult, error) {
// In batch mode, Exec streams logs rather than executing commands.
// The loom-runner binary inside the Job executes all steps.
// This method reads log output and returns when the step completes.
kh := h.(*k8sHandle)
// Stream logs from pod, parse JSON log lines from loom-runner
// Return when step end marker is seen
return &models.ExecResult{ExitCode: 0}, nil
}
func (r *Runtime) Destroy(ctx context.Context, h models.Handle) error {
kh := h.(*k8sHandle)
// Delete Job (GC handles pod cleanup)
return r.client.Delete(ctx, &batchv1.Job{ObjectMeta: metav1.ObjectMeta{Name: kh.jobName}})
}
### Workflow YAML Examples

`engine: nixery` (current behavior):
engine: nixery
dependencies:
nixpkgs:
- nodejs
- python3
steps:
- name: build
command: npm run build
`engine: docker` (new):
engine: docker
image: node:20-alpine
steps:
- name: install
command: npm ci
- name: build
command: npm run build
- name: test
command: npm test
`engine: kubernetes` (downstream, in Loom):
engine: kubernetes
image: golang:1.22
architecture: arm64
steps:
- name: build
command: go build ./...
- name: test
command: go test ./...
### Server Wiring
// server.go
func Run(ctx context.Context) error {
cfg, err := config.Load(ctx)
// Create runtime based on config
var rt models.Runtime
switch cfg.Runtime.Type {
case "docker":
dockerClient, _ := client.NewClientWithOpts(client.FromEnv)
rt = docker.NewRuntime(dockerClient, logger)
case "podman":
podmanClient, _ := podman.NewClient(cfg.Runtime.Podman.Socket)
rt = podman.NewRuntime(podmanClient, logger)
default:
rt = docker.NewRuntime(...) // default
}
// Create engines with shared runtime
nixeryEng := nixery.New(rt, cfg, logger)
dockerEng := dockerengine.New(rt, logger)
s, _ := New(ctx, cfg, map[string]models.Engine{
"nixery": nixeryEng,
"docker": dockerEng,
})
return s.Start(ctx)
}
### Configuration
# spindle.toml
[runtime]
type = "docker" # or "podman"
[runtime.docker]
# Uses DOCKER_HOST env var by default
[runtime.podman]
socket = "/run/user/1000/podman/podman.sock"
## Migration Path

### Phase 1: Extract Runtime Interface

- Add `models/runtime.go` with the interface definition
- Add the `runtime/docker/` implementation
- Refactor nixery to use the docker runtime internally
- No breaking changes to existing workflows
### Phase 2: Add Docker Engine

- Add `engines/docker/` that uses the same runtime
- Register as `"docker"` in the engine map
- Users can now use `engine: docker` with explicit images
### Phase 3: Add Podman Runtime

- Add the `runtime/podman/` implementation
- Add a config option to select the runtime
- Podman users can run the nixery/docker engines without a Docker daemon
### Phase 4: Downstream Kubernetes (Loom)

- Loom implements `runtime/kubernetes/`
- Can register a `"kubernetes"` engine, or reuse the `"docker"`/`"nixery"` engines with the K8s runtime
- Maintains the current Job + loom-runner architecture
## Alternatives Considered

### 1. Keep engines monolithic

- Pro: Simpler, less abstraction
- Con: Code duplication, can't swap runtimes, harder for downstream consumers

### 2. Docker-in-Docker for Kubernetes

- Pro: Identical behavior to local execution
- Con: Security concerns, complexity, resource overhead

### 3. Runtime as an engine parameter

- Pro: More flexible per workflow
- Con: Overcomplicates the workflow YAML; the runtime is a deployment choice, not a user choice
## Open Questions

- **Should runtime selection be per-engine or global?**
  - Proposal: Global (deployment config), not per-workflow
- **How do we handle runtime-specific features?**
  - E.g., K8s node affinity, Docker network modes
  - Proposal: `SetupOpts` has optional fields; runtimes ignore unsupported options
- **Should we upstream the Kubernetes runtime?**
  - Proposal: No, keep it in Loom. Upstream provides the interface; downstream implements it.
- **Podman rootless considerations?**
  - User namespaces, different capability handling
  - Need a testing matrix