# Deployment Strategy for dynamicalsystem Services

## Overview

This document defines the comprehensive deployment strategy for dynamicalsystem services, covering:

- **Service isolation**: Users, UIDs, and process separation
- **Storage strategy**: NFS-backed persistence with XDG integration
- **Network strategy**: Port allocation and Docker networking
- **Deployment patterns**: Machine setup and service orchestration

Each service runs under a dedicated user with specific UID and port allocations, ensuring complete isolation between services and environments.

## SMEP Numbering Scheme

**SMEP** is a 5-digit numbering scheme used for **both UIDs and ports** in tinsnip:

**S-M-E-P**, where each field represents:

- **S** (Sheet): `1-5` = Sheet number (dynamically calculated from the sheet name)
- **M** (Machine): `00-99` = Machine number within the sheet (2 digits)
- **E** (Environment): `0-9` = Environment number (expanded from 2 to 10 environments)
- **P** (Port Index): `0-9` = Port index within the machine's allocation

### SMEP Applied to UIDs vs Ports

**IMPORTANT**: UIDs and ports are **different things** that share the same numbering scheme:

- **Machine UID** (system user ID): Always P=0
  - Example: `50300` = system user for homelab-prod
  - Created once per machine-environment
- **Machine Ports** (TCP/UDP port numbers): P=0 through P=9
  - Example: `50300-50309` = 10 ports allocated to homelab-prod
  - Used by catalog services running on that machine

The SMEP **base number** (e.g., `5030` for homelab-prod) determines:

- The machine UID: `50300` (base + P=0)
- The port range: `50300-50309` (base + P=0-9)

### SMEP Implementation

The UID is calculated during `machine/setup.sh` based on the inputs:

```bash
# Example: ./machine/setup.sh gazette prod DS412plus
TIN_SERVICE_UID=$(calculate_machine_uid "gazette" "prod")
# Result: 10100 (S=1, M=01, E=0, P=0)
```

**Machine Number Mapping (M digits):**

- `00` = station (sheet infrastructure: registry, shared config)
- `01` = gazette (first machine)
- `02` = lldap (identity management)
- `03` = gateway (gateway machine)
- `04-99` = Additional machines (auto-assigned or manually configured)

**Port Allocation:**

The machine's SMEP base number determines both its UID and port range:

```bash
# gazette-prod machine
TIN_SERVICE_UID=10100   # System user ID (P=0)

# Port range for this machine
TIN_PORT_0=10100        # P=0, main service port
TIN_PORT_1=10101        # P=1, admin/management port
TIN_PORT_2=10102        # P=2, API endpoint
# etc... up to TIN_PORT_9=10109 for 10 total ports per machine
```

### Sheet Number Calculation

The sheet number (S) is automatically calculated from the sheet name using this deterministic hash function:

```bash
# Implementation from machine/scripts/lib.sh:get_sheet_number()
get_sheet_number() {
    local sheet="${1:-dynamicalsystem}"
    # Hash the sheet name to the 1-5 range using MD5
    echo "$sheet" | md5sum | cut -c1-1 | {
        read hex
        printf "%d\n" "0x$hex" | awk '{n = ($1 % 5) + 1; print n}'
    }
}
```

This ensures:

- The same sheet always gets the same number across all deployments
- No central registry is needed for sheet coordination
- Up to 5 different sheets are supported (S=1 through S=5), keeping all port numbers below the TCP maximum of 65535

Examples:

- `topsheet` → S=1 (UID starts with 1xxxx)
- `mycompany` → S=5 (UID starts with 5xxxx)
- `acmecorp` → S=3 (UID starts with 3xxxx)

### S-M-E-P Examples

**Default sheet (dynamicalsystem, S=1):**

- `10000` = S:1, M:00, E:0, P:0 → dynamicalsystem.station.prod
- `10010` = S:1, M:00, E:1, P:0 → dynamicalsystem.station.test
- `10100` = S:1, M:01, E:0, P:0 → dynamicalsystem.gazette.prod
- `10110` = S:1, M:01, E:1, P:0 → dynamicalsystem.gazette.test
- `10120` = S:1, M:01, E:2, P:0 → dynamicalsystem.gazette.dev
- `10200` = S:1, M:02, E:0, P:0 → dynamicalsystem.lldap.prod
- `10210` = S:1, M:02, E:1, P:0 → dynamicalsystem.lldap.test

**Custom sheet (mycompany, S=5):**

- `50000` = S:5, M:00, E:0, P:0 → mycompany.station.prod
- `50010` = S:5, M:00, E:1, P:0 → mycompany.station.test
- `50100` = S:5, M:01, E:0, P:0 → mycompany.gazette.prod
- `50110` = S:5, M:01, E:1, P:0 → mycompany.gazette.test

**Port allocation example (lldap-prod, UID=10200):**

- `10200` = LDAP protocol port
- `10201` = Web admin interface
- `10202` = API endpoint
- `10203` = Metrics/monitoring

**Environment mapping example (gazette machine in sheet 1):**

- `10100` = gazette-prod (E:0)
- `10110` = gazette-test (E:1)
- `10120` = gazette-dev (E:2)
- `10130` = gazette-staging (E:3)
- `10140` = gazette-demo (E:4)
- `10150` = gazette-qa (E:5)
- `10160` = gazette-uat (E:6)
- `10170` = gazette-preview (E:7)
- `10180` = gazette-canary (E:8)
- `10190` = gazette-local (E:9)

## Sheet Station (M=00)

The sheet station (M=00) provides infrastructure services for the sheet:

### Machine Registry

Located at `/volume1/{sheet}/station/prod/machine-registry`, this file maps machine names to machine numbers, for example:

```
gazette=01
lldap=02
redis=03
prometheus=04
```

### Directory Structure

```
/volume1/{sheet}/station/
├── prod/                    # UID: S0000
│   ├── machine-registry     # Machine name to number mapping
│   ├── port-allocations     # Track allocated ports (optional)
│   └── config/              # Shared sheet configuration
└── test/                    # UID: S0010
    └── machine-registry     # Test environment registry
```

### Access Permissions

- The station exports are readable by all machine users in the sheet
- Only administrators can write to the registry
- Machines consult the registry during deployment to determine their machine number

## Port Allocation Strategy

Ports are automatically allocated based on the machine's SMEP number to ensure no conflicts when running multiple machines and environments on the same host.

### Port Calculation

The P field in S-M-E-P provides port indexing within a machine.
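
To make the digit positions concrete, a SMEP number can be decomposed back into its fields with plain bash substring slicing. The `smep_decode` helper below is illustrative only (it is not part of `machine/scripts/lib.sh`):

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of machine/scripts/lib.sh):
# split a 5-digit SMEP number into its S, M, E, and P fields.
smep_decode() {
    local smep="$1"          # e.g. 10210
    local s="${smep:0:1}"    # sheet digit
    local m="${smep:1:2}"    # two-digit machine number
    local e="${smep:3:1}"    # environment digit
    local p="${smep:4:1}"    # port index
    echo "S=$s M=$m E=$e P=$p"
}

smep_decode 10210   # lldap-test machine UID → S=1 M=02 E=1 P=0
```
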
Each machine allocates 10 ports (P=0-9):

```bash
# Example: lldap-test machine
TIN_SERVICE_UID=10210            # System user ID (S:1, M:02, E:1, P:0)
BASE_PORT=$TIN_SERVICE_UID

# Increment the P digit for additional ports, for example:
TIN_PORT_0=$BASE_PORT            # 10210 (P=0) - LDAP protocol
TIN_PORT_1=$((BASE_PORT + 1))    # 10211 (P=1) - Web interface
TIN_PORT_2=$((BASE_PORT + 2))    # 10212 (P=2) - REST API
TIN_PORT_3=$((BASE_PORT + 3))    # 10213 (P=3) - Prometheus metrics
# ... up to 10219 (P=9) for 10 total ports per service
```

**Implementation in machine/scripts/lib.sh:**

```bash
calculate_service_ports() {
    local service_uid="$1"
    local port_count="${2:-3}"
    local base_port=$service_uid
    for ((i = 0; i < port_count; i++)); do
        echo $((base_port + i))
    done
}
```

### Standard Port Forwarding (Optional)

Some clients expect a service on its well-known port. When that happens, forward the standard port on the host to the machine's SMEP port (e.g., `80 -> 10100`).

**Important**:

- This configuration is **manual and optional** - most clients can be configured to use the UID-based ports directly
- Only one service per standard port per host - choose which environment gets the standard port
- Configure port forwarding only when you encounter clients that cannot be configured to use custom ports
- Document any port forwarding rules for future reference

---

# CRITICAL: Environment Variable Loading for Docker Compose

**READ THIS BEFORE MODIFYING DEPLOYMENT SCRIPTS**

This section documents a critical, debugged, and stabilized pattern that **must not be modified** without full understanding of the constraints. Violations will cause circular debugging and service deployment failures.

## The Core Problem

Docker Compose requires environment variables for YAML interpolation (e.g., `${TIN_PORT_0}` in docker-compose.yml), but the rootless Docker daemon **breaks** when it inherits NFS-backed XDG environment variables from the parent shell.

## The Solution

Service deployment uses a specific bash command pattern that:

1. Exports variables for Docker Compose YAML interpolation
2. Unsets problematic NFS-backed paths before starting Docker
3. Ensures containers receive variables via the `env_file` directive

**Implementation in `cmd/service/deploy.sh` (lines 159-162):**

```bash
# Source env files with auto-export for Docker Compose YAML interpolation,
# but unset XDG vars that break rootless Docker.
# Containers get all vars via the env_file directive in docker-compose.yml.
sudo -u "$service_user" bash -c "set -a && source /mnt/$service_env/.machine/machine.env && source /mnt/$service_env/service/$catalog_service/.env && set +a && unset XDG_DATA_HOME XDG_CONFIG_HOME XDG_STATE_HOME && cd /mnt/$service_env/service/$catalog_service && docker compose up -d"
```

## Why Each Component Matters

### 1. `set -a` (allexport mode)

**Required** before sourcing the environment files.

- `source` alone loads variables into the shell but **does NOT export them**
- Docker Compose runs as a subprocess and needs **exported** variables
- Without `set -a`: variables are loaded but invisible to subprocesses
- Result: `${TIN_PORT_0}` becomes an empty string in docker-compose.yml

### 2. Source both environment files

```bash
source /mnt/$service_env/.machine/machine.env && source /mnt/$service_env/service/$catalog_service/.env
```

- **machine.env**: Infrastructure variables (TIN_MACHINE_NAME, TIN_SERVICE_UID, DOCKER_HOST, XDG paths)
- **service/.env**: Service-specific variables (TIN_CATALOG_SERVICE, TIN_PORT_0, TIN_PORT_1, etc.)
- Both files are required for complete YAML interpolation

### 3. `set +a` (disable allexport)

Turns off auto-export after sourcing the files. Good practice to prevent unintended exports.

### 4. `unset XDG_DATA_HOME XDG_CONFIG_HOME XDG_STATE_HOME` (CRITICAL)

**This is the most important and fragile part:**

- These variables point to NFS-backed paths (e.g., `/mnt/service-env/data`)
- The rootless Docker daemon **inherits its environment from the parent shell**
- If the daemon inherits NFS XDG paths, it tries to use NFS for internal storage
- NFS + Docker internal storage = permission failures and daemon crashes
- **Must unset AFTER sourcing**, so the variables were loaded before Docker Compose starts
- Containers still receive these variables via the `env_file` directive

### 5. `env_file` directive (docker-compose.yml)

Containers receive their environment directly from files, not from the parent process:

```yaml
services:
  myservice:
    env_file:
      - ../../.machine/machine.env  # Infrastructure variables
      - .env                        # Service-specific variables
    user: "${TIN_SERVICE_UID}:${TIN_SERVICE_UID}"
```

- This is why we can safely unset XDG vars for the Docker daemon
- Containers load vars from the files independently
- Containers **can** safely use NFS XDG paths (they're inside containers, not the daemon)

## Three Environment Contexts

| Component | Needs Vars? | Source | XDG Path Constraints |
|-----------|-------------|--------|----------------------|
| **Host shell** (docker compose CLI) | Yes | `source` with `set -a` | Must have vars for YAML interpolation; must unset XDG before Docker |
| **Docker daemon** (dockerd-rootless.sh) | No | Inherits from parent | **MUST NOT** inherit NFS XDG paths (breaks the daemon) |
| **Containers** (service processes) | Yes | `env_file` directive | **CAN** use NFS XDG paths safely |

## Common Mistakes and Their Symptoms

| Mistake | Symptom | How to Fix |
|---------|---------|------------|
| Remove `set -a` | `${TIN_PORT_0}` → empty string in YAML | Add `set -a` before source |
| Remove `source .machine/machine.env` | Missing TIN_MACHINE_NAME, DOCKER_HOST | Source both files |
| Remove `unset XDG_*` | Docker daemon fails with permission errors | Keep unset after source |
| Remove `env_file` from docker-compose.yml | Containers missing environment | Add env_file directive |
| Source without export (`source`, not `set -a && source`) | Variables loaded but not visible to docker compose | Use `set -a && source` |
| Unset XDG before sourcing | Variables never loaded; both YAML and containers broken | Unset AFTER sourcing |

## Testing Checklist

Before modifying deployment code, verify all of these work:

- [ ] Docker Compose YAML interpolation: `${TIN_PORT_0}` expands to the correct port number
- [ ] Docker daemon starts without errors (check with `docker ps`)
- [ ] Containers receive the complete environment (check with `docker exec env`)
- [ ] Services can write to NFS-backed XDG paths inside containers
- [ ] Multiple services can deploy to the same machine without interference
- [ ] Service logs show correct port bindings
- [ ] Containers don't crash-loop with permission errors

## Why This Pattern Is Hard to Maintain

1. **Three separate contexts**: The shell, the daemon, and the containers each have different needs
2. **Conflicting requirements**: The daemon needs a clean environment; containers need the full environment
3. **Timing matters**: The order of source/unset operations is critical
4. **Non-obvious failure**: A missing `set -a` looks like it works (no error) but silently breaks interpolation
5. **NFS interaction**: XDG vars work fine in most contexts and only break the Docker daemon

## Historical Context

- **Bug introduced**: Oct 16, 2025 (commit 3bb4514)
  - Systemd detection checked the user session instead of system capability
  - Resulted in a wrong DOCKER_HOST path
- **Fixed**: Oct 22, 2025
  - Added `set -a` for proper variable export
  - Added XDG unset logic to protect the Docker daemon
  - Fixed systemd detection in lib/docker.sh
  - Updated the env loader script in lib/core.sh for ACT-2 metadata paths
- **Root cause**: Multiple bugs compounded over time; circular debugging
- **Prevention**: This documentation section

## Related Code

- `cmd/service/deploy.sh`: Service deployment orchestration (lines 159-162)
- `lib/docker.sh`: Docker installation and systemd detection (lines 178-188, 247-256)
- `lib/core.sh`: Shell environment loader script generation (lines 104-135)
- `service/*/docker-compose.yml`: Service definitions with the env_file directive

## References

See also:

- OODA ACT-3 plan: `ooda/2025-10-multi-service-architecture/act/03-port-allocation/plan.md`
- Docker rootless mode docs: https://docs.docker.com/engine/security/rootless/
- NFS + Docker issues: https://github.com/moby/moby/issues/47962

---

## NFS Storage Strategy

### Directory Structure

```
/volume1/topsheet/
├── station/
│   ├── prod/   (UID: 10000) - machine registry, shared config
│   └── test/   (UID: 10010) - test registry
└── gazette/
    ├── prod/   (UID: 10100)
    └── test/   (UID: 10110)
```

### NFS Export Requirements

Each service/environment requires a dedicated NFS export with UID mapping:

- **all_squash**: Maps all users to a specific UID/GID
- **anonuid/anongid**: Maps to the service-specific UID (10000, 10010, etc.)
- **Host restrictions**: Limit access to specific machines

For detailed NFS setup instructions, see [CREATE_MACHINE.md](CREATE_MACHINE.md).
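
As a sketch of what those export requirements imply, an `/etc/exports` entry for the gazette-prod environment (UID 10100) might look like the following; the export path, client host, and exact option set are illustrative assumptions, and CREATE_MACHINE.md remains the authoritative reference:

```
/volume1/dynamicalsystem/gazette/prod  192.168.1.50(rw,sync,all_squash,anonuid=10100,anongid=10100)
```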
### Storage Organization

Each service environment uses a standardized directory structure:

```
/mnt/{service}-{environment}/   # NFS mount point
├── state/                      # Service state (logs, history, etc.)
├── data/                       # Service data files
├── config/                     # Service configuration
└── service/                    # Docker Compose configurations
    └── {catalog-service}/
        ├── docker-compose.yml
        └── .env (optional)
```

## XDG Base Directory Integration

To align with the [XDG Base Directory Specification](https://specifications.freedesktop.org/basedir-spec/latest/) and make service data accessible to user applications, symlink NFS mount subdirectories to their XDG locations.

### XDG Directory Assumptions

- **XDG_CACHE_HOME**: Local, host-specific cache files (not backed by NFS)
- **XDG_RUNTIME_DIR**: Local, ephemeral runtime files (not backed by NFS)
- **XDG_STATE_HOME**: Persistent state data (backed by NFS)
- **XDG_DATA_HOME**: User-specific data files (backed by NFS)
- **XDG_CONFIG_HOME**: User-specific configuration (backed by NFS)
- **XDG_DATA_DIRS**: System-managed data directories (read-only)
- **XDG_CONFIG_DIRS**: System-managed config directories (read-only)

### Directory Mapping

```bash
# After mounting NFS to /mnt/{service}-{environment}, create XDG symlinks
TIN_SHEET=dynamicalsystem
TIN_SERVICE_NAME=tinsnip

# Ensure XDG directories exist
mkdir -p "${XDG_STATE_HOME:-$HOME/.local/state}/${TIN_SHEET}"
mkdir -p "${XDG_DATA_HOME:-$HOME/.local/share}/${TIN_SHEET}"
mkdir -p "${XDG_CONFIG_HOME:-$HOME/.config}/${TIN_SHEET}"

# Create symlinks from the NFS mount to XDG locations
ln -sf /mnt/${TIN_SERVICE_NAME}-${TIN_SERVICE_ENVIRONMENT}/state "${XDG_STATE_HOME:-$HOME/.local/state}/${TIN_SHEET}/@${TIN_SERVICE_NAME}"
ln -sf /mnt/${TIN_SERVICE_NAME}-${TIN_SERVICE_ENVIRONMENT}/data "${XDG_DATA_HOME:-$HOME/.local/share}/${TIN_SHEET}/@${TIN_SERVICE_NAME}"
ln -sf /mnt/${TIN_SERVICE_NAME}-${TIN_SERVICE_ENVIRONMENT}/config "${XDG_CONFIG_HOME:-$HOME/.config}/${TIN_SHEET}/@${TIN_SERVICE_NAME}"
```

### Example Structure

```
/mnt/tinsnip-test/   # NFS mount point
├── state/           # Service state (logs, history, etc.)
├── data/            # Service data files
└── config/          # Service configuration

~/.local/state/dynamicalsystem/@tinsnip  -> /mnt/tinsnip-test/state
~/.local/share/dynamicalsystem/@tinsnip  -> /mnt/tinsnip-test/data
~/.config/dynamicalsystem/@tinsnip       -> /mnt/tinsnip-test/config
```

### Benefits of XDG Integration

1. **Standard Compliance**: Follows the XDG Base Directory specification
2. **User Access**: Applications can access service data through standard paths
3. **Backup Integration**: XDG paths are commonly included in user backups
4. **Clear Organization**: The `@` prefix clearly indicates NFS-backed service data
5. **Performance**: Cache and runtime data remain local for speed

### Implementation in Makefile

```makefile
setup-xdg-links: mount-nfs
	@mkdir -p "$${XDG_STATE_HOME:-$$HOME/.local/state}/$(TIN_SHEET)"
	@mkdir -p "$${XDG_DATA_HOME:-$$HOME/.local/share}/$(TIN_SHEET)"
	@mkdir -p "$${XDG_CONFIG_HOME:-$$HOME/.config}/$(TIN_SHEET)"
	@ln -sfn $(MOUNT_POINT)/state "$${XDG_STATE_HOME:-$$HOME/.local/state}/$(TIN_SHEET)/@$(TIN_SERVICE_NAME)"
	@ln -sfn $(MOUNT_POINT)/data "$${XDG_DATA_HOME:-$$HOME/.local/share}/$(TIN_SHEET)/@$(TIN_SERVICE_NAME)"
	@ln -sfn $(MOUNT_POINT)/config "$${XDG_CONFIG_HOME:-$$HOME/.config}/$(TIN_SHEET)/@$(TIN_SERVICE_NAME)"
	@echo "Created XDG symlinks for $(TIN_SERVICE_NAME)"
```

## Benefits

1. **Complete Isolation**: Each service/environment has its own UID and NFS directory
2. **No Shared Credentials**: NFS `all_squash` eliminates the need for LDAP/shared users
3. **Persistent Data**: All data survives host rebuilds
4. **Easy Backup**: Centralized data on the Synology NAS
5. **Scalable**: The UID convention supports multiple sheets, services, and environments
6. **XDG Compliance**: Integrates with Linux desktop standards

## Language-Specific Patterns

### Python Services with UV

Python services using the UV package manager require specific handling to work correctly with tinsnip's UID isolation.
#### The Editable Install Problem

UV workspace packages (using `[tool.uv.workspace]`) are installed in editable/development mode by default. This causes permission errors in containers because:

1. Editable packages create symlinks/metadata that point to the source code
2. Python attempts to rebuild/update package metadata at import time
3. The container runs as `TIN_SERVICE_UID`, but the venv was built as root
4. Imports fail with "Permission denied" errors

#### Solution: Non-Editable Production Install

**Dockerfile pattern:**

```dockerfile
FROM python:3.13-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

WORKDIR /app

# Copy dependency files first (for layer caching)
COPY pyproject.toml .
COPY uv.lock .

# Install dependencies WITHOUT the workspace package
RUN uv sync --frozen --no-install-workspace

# Copy application code
COPY myservice/ ./myservice/

# Install the package as non-editable
RUN uv pip install --no-deps ./myservice/

# Create directories
RUN mkdir -p data config state logs

EXPOSE 10700

# Use the venv directly - no need for 'uv run' at runtime
CMD [".venv/bin/gunicorn", \
     "--bind", "0.0.0.0:10700", \
     "myservice.app:create_app()"]
```

**docker-compose.yml pattern:**

```yaml
services:
  myservice:
    build: .
    container_name: ${TIN_SERVICE_NAME}-${TIN_SERVICE_ENVIRONMENT}
    ports:
      - "${TIN_PORT_0}:10700"
    volumes:
      - ${XDG_DATA_HOME}/${TIN_SHEET}/${TIN_SERVICE_NAME}:/app/data
      - ${XDG_CONFIG_HOME}/${TIN_SHEET}/${TIN_SERVICE_NAME}:/app/config
      - ${XDG_STATE_HOME}/${TIN_SHEET}/${TIN_SERVICE_NAME}:/app/state
    user: "${TIN_SERVICE_UID}:${TIN_SERVICE_UID}"
    environment:
      - TIN_SERVICE_UID=${TIN_SERVICE_UID}
      - UV_NO_CACHE=1       # Disable cache directory creation
      - PYTHONUNBUFFERED=1
    restart: unless-stopped
```

#### Key Points

1. **No user creation in the Dockerfile** - let docker-compose handle the UID via the `user:` directive
2. **Two-stage install** - dependencies first (cached), then the package non-editable
3. **UV_NO_CACHE=1** - prevents UV from trying to create cache directories
4. **Direct venv execution** - use `.venv/bin/python` or `.venv/bin/gunicorn`, not `uv run`
5. **Read-only venv** - the venv is built as root, readable by all, and never modified at runtime

#### Why This Works

- **Build time**: The venv is created as root with all dependencies and the package installed
- **Runtime**: The container runs as `TIN_SERVICE_UID`; the venv is read-only
- **No writes needed**: A non-editable install means Python never modifies the venv
- **Permission model**: Follows the tinsnip pattern - specific UID, no privilege escalation

#### Common Mistakes

**INCORRECT: Using `uv run` in CMD** - triggers package rebuilds

```dockerfile
CMD ["uv", "run", "gunicorn", ...]  # BAD
```

**INCORRECT: Editable install** - requires venv write access

```dockerfile
RUN uv sync --frozen  # Installs the workspace package as editable
```

**INCORRECT: Entrypoint with user switching** - violates the tinsnip pattern

```dockerfile
ENTRYPOINT ["/entrypoint.sh"]  # Container must start as root
```

**CORRECT approach:**

```dockerfile
RUN uv sync --frozen --no-install-workspace  # Dependencies only
RUN uv pip install --no-deps ./myservice/    # Non-editable
CMD [".venv/bin/gunicorn", ...]              # Direct execution
```

#### Testing Locally

Development and testing should still use editable installs:

```bash
# Local development
cd myservice
uv sync            # Editable install for development

# Local testing
uv run pytest
uv run flask run

# Production build test
docker compose build
docker compose up
```

## Adding a New Service

1. Choose the next available machine number (e.g., `02` for a new machine)
2. Calculate UIDs: `10200` (prod), `10210` (test)
3. Create NFS directories on the Synology with appropriate ownership
4. Add NFS exports to `/etc/exports` via SSH (the GUI doesn't support custom UIDs)
5. Create a Makefile using the template above
6. Deploy using `make setup && make deploy`

## Security Notes

- Each NFS export is restricted to specific hosts
- UIDs are in the 10000+ range to avoid conflicts with system and regular users
- Services cannot access each other's data due to UID isolation
- No root access is required within containers (rootless Docker)
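
The UID-range convention above can be checked mechanically. `is_valid_smep_uid` below is an illustrative helper (not part of the tinsnip scripts) that accepts only five-digit numbers whose leading sheet digit falls in the 1-5 range documented for SMEP:

```shell
#!/usr/bin/env bash
# Illustrative check (not part of tinsnip): a valid SMEP UID is exactly
# five digits, with a leading sheet digit in the 1-5 range.
is_valid_smep_uid() {
    [[ "$1" =~ ^[1-5][0-9]{4}$ ]]
}

is_valid_smep_uid 10210 && echo "10210 ok"          # prints: 10210 ok
is_valid_smep_uid 99999 || echo "99999 rejected"    # prints: 99999 rejected
```

A check like this could run in `machine/setup.sh` before creating the system user, failing fast on a miscalculated UID.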