# Proposal: Runtime Interface for Spindle Engines

**Status:** Draft
**Author:** @evanjarrett
**Date:** 2025-12-09

## Summary

Extract a `Runtime` interface from the nixery engine to enable:

1. A new `engine:docker` that accepts user-specified images
2. Support for multiple container runtimes (Docker, Podman)
3. Downstream implementations (e.g., Kubernetes in Loom)

## Motivation

The spindle engine architecture currently couples workflow execution logic to Docker-specific container management in the nixery engine. This creates several limitations:

1. **No user-specified images**: Users must declare Nix dependencies; they can't use pre-built Docker images like `node:20` or `golang:1.22`
2. **Single runtime**: Only the Docker daemon is supported; there is no path to Podman or other OCI runtimes
3. **Downstream friction**: Loom (Kubernetes-based spindle) must reimplement the entire Engine interface rather than reusing workflow parsing and step execution logic

## Proposal

### New Runtime Interface

Create `spindle/models/runtime.go`:

```go
package models

import (
	"context"
	"io"
)

// Runtime abstracts container/job execution environments.
// Implementations: Docker, Podman, (downstream: Kubernetes)
type Runtime interface {
	// Setup creates the execution environment and returns a handle.
	// The environment should be ready for Exec calls after Setup returns.
	Setup(ctx context.Context, opts SetupOpts) (Handle, error)

	// Exec runs a command in the environment.
	// For container runtimes, this is typically `exec` into a running container.
	// For job-based runtimes (K8s), this may stream logs from a pre-defined job.
	Exec(ctx context.Context, h Handle, opts ExecOpts) (*ExecResult, error)

	// Destroy tears down the environment and releases resources.
	Destroy(ctx context.Context, h Handle) error
}

// SetupOpts configures the execution environment.
type SetupOpts struct {
	// Image is the container image to use (e.g., "node:20", "nixery.dev/shell/git")
	Image string

	// WorkflowID uniquely identifies this workflow run (used for labeling/naming)
	WorkflowID WorkflowId

	// WorkDir is the working directory inside the container (e.g., "/tangled/workspace")
	WorkDir string

	// Labels for the container/job (e.g., {"sh.tangled.pipeline/workflow_id": "..."})
	Labels map[string]string

	// Security options
	DropAllCaps bool
	AddCaps     []string // e.g., ["DAC_OVERRIDE", "CHOWN", "FOWNER", "SETUID", "SETGID"]

	// Architecture hint for multi-arch scheduling (used by K8s runtime)
	Architecture string
}

// ExecOpts configures a single command execution.
type ExecOpts struct {
	// Command to run (e.g., ["bash", "-c", "npm install"])
	Command []string

	// Environment variables
	Env []string

	// Output streams (nil = discard)
	Stdout io.Writer
	Stderr io.Writer
}

// ExecResult contains the outcome of an Exec call.
type ExecResult struct {
	ExitCode  int
	OOMKilled bool
}

// Handle is an opaque reference to an execution environment.
type Handle interface {
	// ID returns a unique identifier for this environment (container ID, job name, etc.)
	ID() string
}

// RuntimeMode indicates how the runtime executes steps.
type RuntimeMode int

const (
	// RuntimeModeExec means steps are executed one at a time via Exec calls.
	// Used by Docker/Podman where we exec into a running container.
	RuntimeModeExec RuntimeMode = iota

	// RuntimeModeBatch means all steps run in a single invocation.
	// Used by Kubernetes where a Job runs all steps and the engine streams logs.
	// In this mode, Exec() streams logs rather than executing commands.
	RuntimeModeBatch
)

// RuntimeInfo provides metadata about a runtime implementation.
type RuntimeInfo interface {
	Mode() RuntimeMode
}
```

### Refactored Engine Structure

Engines become thin wrappers focused on image resolution and step generation:

```go
// engines/base/engine.go - shared logic
package base

type Engine struct {
	Runtime models.Runtime
	Logger  *slog.Logger
	Handles map[string]models.Handle
	mu      sync.Mutex
}

func (e *Engine) SetupWorkflow(ctx context.Context, wid models.WorkflowId, wf *models.Workflow) error {
	data := wf.Data.(WorkflowData)

	handle, err := e.Runtime.Setup(ctx, models.SetupOpts{
		Image:       data.Image,
		WorkflowID:  wid,
		WorkDir:     "/tangled/workspace",
		Labels:      map[string]string{"sh.tangled.pipeline/workflow_id": wid.String()},
		DropAllCaps: true,
		AddCaps:     []string{"DAC_OVERRIDE", "CHOWN", "FOWNER", "SETUID", "SETGID"},
	})
	if err != nil {
		return err
	}

	e.mu.Lock()
	e.Handles[wid.String()] = handle
	e.mu.Unlock()

	// Create workspace directories
	_, err = e.Runtime.Exec(ctx, handle, models.ExecOpts{
		Command: []string{"mkdir", "-p", "/tangled/workspace", "/tangled/home"},
	})
	return err
}

func (e *Engine) RunStep(ctx context.Context, wid models.WorkflowId, w *models.Workflow, idx int, secrets []secrets.UnlockedSecret, wfLogger *models.WorkflowLogger) error {
	e.mu.Lock()
	handle := e.Handles[wid.String()]
	e.mu.Unlock()

	step := w.Steps[idx]
	env := buildEnvs(w.Environment, step, secrets)

	var stdout, stderr io.Writer
	if wfLogger != nil {
		stdout = wfLogger.DataWriter(idx, "stdout")
		stderr = wfLogger.DataWriter(idx, "stderr")
	}

	result, err := e.Runtime.Exec(ctx, handle, models.ExecOpts{
		Command: []string{"bash", "-c", step.Command()},
		Env:     env,
		Stdout:  stdout,
		Stderr:  stderr,
	})
	if err != nil {
		return err
	}
	if result.OOMKilled {
		return ErrOOMKilled
	}
	if result.ExitCode != 0 {
		return engine.ErrWorkflowFailed
	}
	return nil
}

func (e *Engine) DestroyWorkflow(ctx context.Context, wid models.WorkflowId) error {
	e.mu.Lock()
	handle, exists := e.Handles[wid.String()]
	delete(e.Handles, wid.String())
	e.mu.Unlock()

	if !exists {
		return nil
	}
	return e.Runtime.Destroy(ctx, handle)
}
```

### Engine Implementations

**Nixery Engine** (image from dependencies):

```go
// engines/nixery/engine.go
package nixery

type Engine struct {
	*base.Engine
	cfg *config.Config
}

func (e *Engine) InitWorkflow(twf tangled.Pipeline_Workflow, tpl tangled.Pipeline) (*models.Workflow, error) {
	var spec struct {
		Steps        []StepSpec          `yaml:"steps"`
		Dependencies map[string][]string `yaml:"dependencies"`
		Environment  map[string]string   `yaml:"environment"`
	}
	if err := yaml.Unmarshal([]byte(twf.Raw), &spec); err != nil {
		return nil, err
	}

	// NIXERY-SPECIFIC: Build image URL from dependencies
	image := workflowImage(spec.Dependencies, e.cfg.NixeryPipelines.Nixery)

	steps := []models.Step{}

	// Add nixery-specific setup steps
	steps = append(steps, nixConfStep())
	steps = append(steps, models.BuildCloneStep(twf, *tpl.TriggerMetadata, e.cfg.Server.Dev))
	if depStep := dependencyStep(spec.Dependencies); depStep != nil {
		steps = append(steps, *depStep)
	}

	// Add user steps
	for _, s := range spec.Steps {
		steps = append(steps, Step{name: s.Name, command: s.Command, ...})
	}

	return &models.Workflow{
		Name:        twf.Name,
		Steps:       steps,
		Environment: spec.Environment,
		Data:        base.WorkflowData{Image: image},
	}, nil
}
```
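Nixery composes an image from the URL itself: each slash-separated path component names a package to include, with the `shell` meta-package providing a usable base environment (this is how references like `nixery.dev/shell/git` arise). A minimal, self-contained sketch of that URL construction; `nixeryImage` and its signature are illustrative stand-ins, and the real `workflowImage` helper may order or group packages differently:

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// nixeryImage builds a Nixery image reference whose path components are the
// requested packages, prefixed with "shell" for a base environment.
// Illustrative helper, not the actual workflowImage implementation.
func nixeryImage(host string, deps map[string][]string) string {
	pkgs := []string{"shell"}
	sorted := append([]string(nil), deps["nixpkgs"]...)
	sort.Strings(sorted) // deterministic URL regardless of declaration order
	pkgs = append(pkgs, sorted...)
	return path.Join(append([]string{host}, pkgs...)...)
}

func main() {
	deps := map[string][]string{"nixpkgs": {"nodejs", "git"}}
	fmt.Println(nixeryImage("nixery.dev", deps))
	// nixery.dev/shell/git/nodejs
}
```

Sorting the dependency list keeps the image reference stable across runs, which matters if the runtime caches pulled images by name.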
The workflow timeout keeps its existing config-based behavior:

```go
func (e *Engine) WorkflowTimeout() time.Duration {
	// ... existing config-based timeout
}
```

**Docker Engine** (user-specified image):

```go
// engines/docker/engine.go
package docker

type Engine struct {
	*base.Engine
}

func (e *Engine) InitWorkflow(twf tangled.Pipeline_Workflow, tpl tangled.Pipeline) (*models.Workflow, error) {
	var spec struct {
		Image       string            `yaml:"image"`
		Steps       []StepSpec        `yaml:"steps"`
		Environment map[string]string `yaml:"environment"`
	}
	if err := yaml.Unmarshal([]byte(twf.Raw), &spec); err != nil {
		return nil, err
	}

	// DOCKER-SPECIFIC: Require explicit image
	if spec.Image == "" {
		return nil, fmt.Errorf("docker engine requires 'image' field in workflow")
	}

	steps := []models.Step{}

	// Add clone step (shared with nixery)
	steps = append(steps, models.BuildCloneStep(twf, *tpl.TriggerMetadata, false))

	// Add user steps
	for _, s := range spec.Steps {
		steps = append(steps, SimpleStep{Name: s.Name, Command: s.Command, ...})
	}

	return &models.Workflow{
		Name:        twf.Name,
		Steps:       steps,
		Environment: spec.Environment,
		Data:        base.WorkflowData{Image: spec.Image},
	}, nil
}

func (e *Engine) WorkflowTimeout() time.Duration {
	return 1 * time.Hour // default
}
```

### Runtime Implementations

**Docker Runtime**:

```go
// runtime/docker/runtime.go
package docker

type Runtime struct {
	client client.APIClient
	logger *slog.Logger
}

type handle struct {
	containerID string
	networkID   string
}

func (h *handle) ID() string { return h.containerID }

func (r *Runtime) Setup(ctx context.Context, opts models.SetupOpts) (models.Handle, error) {
	// NOTE: error handling elided for brevity throughout this sketch.

	// Create network
	netResp, _ := r.client.NetworkCreate(ctx, networkName(opts.WorkflowID), ...)

	// Pull image
	reader, _ := r.client.ImagePull(ctx, opts.Image, image.PullOptions{})
	io.Copy(io.Discard, reader)
	reader.Close()

	// Create container
	resp, _ := r.client.ContainerCreate(ctx, &container.Config{
		Image:      opts.Image,
		Cmd:        []string{"cat"},
		OpenStdin:  true,
		WorkingDir: opts.WorkDir,
		Labels:     opts.Labels,
	}, &container.HostConfig{
		CapDrop: capDrop(opts.DropAllCaps),
		CapAdd:  opts.AddCaps,
		// ... mounts, security opts
	}, nil, nil, "")

	r.client.ContainerStart(ctx, resp.ID, container.StartOptions{})

	return &handle{containerID: resp.ID, networkID: netResp.ID}, nil
}

func (r *Runtime) Exec(ctx context.Context, h models.Handle, opts models.ExecOpts) (*models.ExecResult, error) {
	dh := h.(*handle)

	execResp, _ := r.client.ContainerExecCreate(ctx, dh.containerID, container.ExecOptions{
		Cmd:          opts.Command,
		Env:          opts.Env,
		AttachStdout: true,
		AttachStderr: true,
	})

	attach, _ := r.client.ContainerExecAttach(ctx, execResp.ID, container.ExecAttachOptions{})
	defer attach.Close()

	stdcopy.StdCopy(opts.Stdout, opts.Stderr, attach.Reader)

	inspect, _ := r.client.ContainerExecInspect(ctx, execResp.ID)

	// Check OOMKilled
	containerInspect, _ := r.client.ContainerInspect(ctx, dh.containerID)

	return &models.ExecResult{
		ExitCode:  inspect.ExitCode,
		OOMKilled: containerInspect.State.OOMKilled,
	}, nil
}

func (r *Runtime) Destroy(ctx context.Context, h models.Handle) error {
	dh := h.(*handle)
	r.client.ContainerStop(ctx, dh.containerID, container.StopOptions{})
	r.client.ContainerRemove(ctx, dh.containerID, container.RemoveOptions{RemoveVolumes: true})
	r.client.NetworkRemove(ctx, dh.networkID)
	return nil
}
```

**Podman Runtime**:

```go
// runtime/podman/runtime.go
package podman

// Podman is API-compatible with Docker for most operations.
// This runtime uses the Podman socket instead of the Docker socket.
type Runtime struct {
	client *podman.APIClient // or use the Docker client with the Podman socket
	logger *slog.Logger
}

// Implementation nearly identical to the Docker runtime.
```
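The socket location depends on whether Podman runs rootful or rootless. A small helper (illustrative, not part of the proposed interface) that picks the conventional path:

```go
package main

import "fmt"

// podmanSocket returns the conventional Podman API socket path.
// Rootful Podman listens at /run/podman/podman.sock; rootless Podman
// exposes a per-user socket under /run/user/<uid>/.
// Illustrative helper; deployments may override this via config.
func podmanSocket(uid int) string {
	if uid == 0 {
		return "/run/podman/podman.sock"
	}
	return fmt.Sprintf("/run/user/%d/podman/podman.sock", uid)
}

func main() {
	fmt.Println(podmanSocket(0))    // /run/podman/podman.sock
	fmt.Println(podmanSocket(1000)) // /run/user/1000/podman/podman.sock
}
```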
Key differences:

- Socket path: `/run/user/1000/podman/podman.sock` (rootless) or `/run/podman/podman.sock`
- Some API differences in network handling
- Native support for rootless containers

### Kubernetes Runtime (Downstream - Loom)

```go
// In loom repo: runtime/kubernetes/runtime.go
package kubernetes

type Runtime struct {
	client    client.Client
	config    *rest.Config
	namespace string
	template  SpindleTemplate
}

func (r *Runtime) Mode() models.RuntimeMode {
	return models.RuntimeModeBatch // All steps run in a single Job
}

func (r *Runtime) Setup(ctx context.Context, opts models.SetupOpts) (models.Handle, error) {
	// Build Job spec with:
	// - Init containers for setup (user namespace, clone, etc.)
	// - Main container running loom-runner with all steps
	// - Node affinity based on opts.Architecture
	job := jobbuilder.BuildJob(jobbuilder.WorkflowConfig{
		Image:        opts.Image,
		Architecture: opts.Architecture,
		// ... steps passed via ConfigMap
	})
	r.client.Create(ctx, job)

	// Wait for pod to be running
	pod := waitForPod(ctx, job)

	return &k8sHandle{
		jobName: job.Name,
		podName: pod.Name,
	}, nil
}
```
```go
func (r *Runtime) Exec(ctx context.Context, h models.Handle, opts models.ExecOpts) (*models.ExecResult, error) {
	// In batch mode, Exec streams logs rather than executing commands.
	// The loom-runner binary inside the Job executes all steps.
	// This method reads log output and returns when the step completes.
	kh := h.(*k8sHandle)

	// Stream logs from pod, parse JSON log lines from loom-runner
	// Return when step end marker is seen

	return &models.ExecResult{ExitCode: 0}, nil
}

func (r *Runtime) Destroy(ctx context.Context, h models.Handle) error {
	kh := h.(*k8sHandle)
	// Delete Job (GC handles pod cleanup)
	return r.client.Delete(ctx, &batchv1.Job{ObjectMeta: metav1.ObjectMeta{Name: kh.jobName}})
}
```

### Workflow YAML Examples

**engine:nixery** (current behavior):

```yaml
engine: nixery
dependencies:
  nixpkgs:
    - nodejs
    - python3
steps:
  - name: build
    command: npm run build
```

**engine:docker** (new):

```yaml
engine: docker
image: node:20-alpine
steps:
  - name: install
    command: npm ci
  - name: build
    command: npm run build
  - name: test
    command: npm test
```

**engine:kubernetes** (downstream, in Loom):

```yaml
engine: kubernetes
image: golang:1.22
architecture: arm64
steps:
  - name: build
    command: go build ./...
  - name: test
    command: go test ./...
```
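The top-level `engine:` key selects which registered engine parses and runs a workflow. A minimal sketch of that dispatch; the `engine` type and `selectEngine` helper are illustrative stand-ins for the server's engine map, and the real server may handle missing or unknown keys differently:

```go
package main

import "fmt"

// engine is a stand-in for the models.Engine interface; only the name
// matters for this dispatch sketch.
type engine struct{ name string }

// selectEngine resolves a workflow's `engine:` key against the map of
// registered engines, returning an error for unknown keys.
func selectEngine(engines map[string]*engine, key string) (*engine, error) {
	e, ok := engines[key]
	if !ok {
		return nil, fmt.Errorf("unknown engine %q", key)
	}
	return e, nil
}

func main() {
	engines := map[string]*engine{
		"nixery": {name: "nixery"},
		"docker": {name: "docker"},
	}
	e, err := selectEngine(engines, "docker")
	fmt.Println(e.name, err)

	_, err = selectEngine(engines, "kubernetes")
	fmt.Println(err)
}
```

Because Loom registers its own engines, an unregistered `engine: kubernetes` simply fails fast on an upstream spindle rather than silently falling back.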
### Server Wiring

```go
// server.go
func Run(ctx context.Context) error {
	cfg, err := config.Load(ctx)
	if err != nil {
		return err
	}

	// Create runtime based on config
	var rt models.Runtime
	switch cfg.Runtime.Type {
	case "docker":
		dockerClient, _ := client.NewClientWithOpts(client.FromEnv)
		rt = docker.NewRuntime(dockerClient, logger)
	case "podman":
		podmanClient, _ := podman.NewClient(cfg.Runtime.Podman.Socket)
		rt = podman.NewRuntime(podmanClient, logger)
	default:
		rt = docker.NewRuntime(...) // default
	}

	// Create engines with shared runtime
	nixeryEng := nixery.New(rt, cfg, logger)
	dockerEng := dockerengine.New(rt, logger)

	s, _ := New(ctx, cfg, map[string]models.Engine{
		"nixery": nixeryEng,
		"docker": dockerEng,
	})
	return s.Start(ctx)
}
```

### Configuration

```toml
# spindle.toml
[runtime]
type = "docker"  # or "podman"

[runtime.docker]
# Uses DOCKER_HOST env var by default

[runtime.podman]
socket = "/run/user/1000/podman/podman.sock"
```

## Migration Path

### Phase 1: Extract Runtime Interface
- Add `models/runtime.go` with the interface definition
- Add the `runtime/docker/` implementation
- Refactor nixery to use the docker runtime internally
- No breaking changes to existing workflows

### Phase 2: Add Docker Engine
- Add `engines/docker/`, which uses the same runtime
- Register it as `"docker"` in the engine map
- Users can now use `engine: docker` with explicit images

### Phase 3: Add Podman Runtime
- Add the `runtime/podman/` implementation
- Add a config option to select the runtime
- Podman users can run the nixery/docker engines without a Docker daemon

### Phase 4: Downstream Kubernetes (Loom)
- Loom implements `runtime/kubernetes/`
- It can register a `"kubernetes"` engine or reuse the `"docker"`/`"nixery"` engines with the K8s runtime
- Maintains the current Job + loom-runner architecture

## Alternatives Considered

### 1. Keep engines monolithic
- Pro: Simpler, less abstraction
- Con: Code duplication, can't swap runtimes, harder for downstream

### 2. Docker-in-Docker for Kubernetes
- Pro: Identical behavior to local execution
- Con: Security concerns, complexity, resource overhead

### 3. Runtime as engine parameter
- Pro: More flexible per-workflow
- Con: Overcomplicates workflow YAML; the runtime is a deployment choice, not a user choice

## Open Questions

1. **Should runtime selection be per-engine or global?**
   - Proposal: Global (deployment config), not per-workflow
2. **How to handle runtime-specific features?**
   - E.g., K8s node affinity, Docker network modes
   - Proposal: `SetupOpts` has optional fields; runtimes ignore unsupported options
3. **Should we upstream the Kubernetes runtime?**
   - Proposal: No, keep it in Loom. Upstream provides the interface; downstream implements it.
4. **Podman rootless considerations?**
   - User namespaces, different capability handling
   - Need a testing matrix

## References

- [Current nixery engine](/home/data/core/spindle/engines/nixery/engine.go)
- [Loom KubernetesEngine](/home/data/loom/internal/engine/kubernetes_engine.go)
- [Docker Engine API](https://docs.docker.com/engine/api/)
- [Podman API](https://docs.podman.io/en/latest/_static/api.html)