# Service Repository Integration Design

## Overview

tinsnip and the service catalog are separate repositories that work together: tinsnip provides the infrastructure, while the service repo provides the deployable applications.

## Installation Structure

### Clean Install

```
~/.local/opt/
├── dynamicalsystem.tinsnip/      # Platform installation
│   ├── machine/                  # Infrastructure setup
│   ├── scripts/                  # Platform utilities
│   └── setup.sh                  # Main orchestrator
└── dynamicalsystem.service/      # Service catalog (optional)
    ├── lldap/                    # Example service
    ├── gazette/                  # Future services
    └── README.md                 # Service documentation
```

### Sheet-aware Installation

For custom sheets, the structure follows the pattern:

```
~/.local/opt/
├── {sheet}.tinsnip/              # Platform for sheet
└── {sheet}.service/              # Services for sheet
```

## Service Repository URL Strategy

### Default Behavior: Infer from tinsnip repo

```bash
# If tinsnip is from: https://github.com/dynamicalsystem/tinsnip
# Then service defaults to: https://github.com/dynamicalsystem/service

# If tinsnip is from: git@gitlab.acmecorp.com:infra/tinsnip
# Then service defaults to: git@gitlab.acmecorp.com:infra/service

# Logic: Replace "tinsnip" with "service" in the repo URL
```

### Override: SERVICE_REPO_URL

```bash
# Organizations can override with their own service catalog
SERVICE_REPO_URL="git@github.com:mycompany/our-services" ./install.sh

# This allows using official tinsnip with custom services
```

### Implementation in install.sh

```bash
# Get the tinsnip repo URL (from git remote, or hardcoded fallback)
get_tinsnip_repo_url() {
    if [[ -d .git ]]; then
        git remote get-url origin 2>/dev/null || echo "$DEFAULT_REPO_URL"
    else
        echo "$DEFAULT_REPO_URL"
    fi
}

# Infer the service repo from the tinsnip repo
infer_service_repo() {
    local tinsnip_url="$1"
    # Replace "tinsnip" with "service" in the URL
    echo "${tinsnip_url/tinsnip/service}"
}

# Use SERVICE_REPO_URL if set, otherwise infer
SERVICE_REPO="${SERVICE_REPO_URL:-$(infer_service_repo "$(get_tinsnip_repo_url)")}"
```
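The substitution rule can be sanity-checked in isolation; the snippet below just lifts `infer_service_repo` out of install.sh and runs it against the two URL styles from the examples above:

```shell
# Standalone check of the tinsnip -> service substitution rule.
# Note: ${var/pattern/replacement} is a bash-ism, so install.sh requires bash.

infer_service_repo() {
    echo "${1/tinsnip/service}"
}

infer_service_repo "https://github.com/dynamicalsystem/tinsnip"
# → https://github.com/dynamicalsystem/service
infer_service_repo "git@gitlab.acmecorp.com:infra/tinsnip"
# → git@gitlab.acmecorp.com:infra/service
```

One caveat worth noting: the substitution replaces only the first occurrence of `tinsnip`, so an organization or group name containing that word would be rewritten instead of the repository name.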
## Service Discovery

### 1. Installation-time Discovery

During setup, tinsnip checks for the service catalog in this order:

```bash
# Priority order for service discovery
1. $SERVICE_PATH                   (environment variable - for local development)
2. ~/.local/opt/{sheet}.service    (standard installation)
3. ./service                       (backward compatibility, if it exists)
```

### 2. Installation Options

```bash
# Platform only
curl -fsSL "https://url/install.sh" | bash

# Platform + services (using the inferred service repo)
curl -fsSL "https://url/install.sh" | INSTALL_SERVICES=true bash

# Platform + custom service repo
curl -fsSL "https://url/install.sh" | SERVICE_REPO_URL="git@example.com:custom/services" bash
```

## Integration Points

### 1. Machine Setup Integration

Machine setup can optionally validate that the service exists:

```bash
tin machine create gazette prod nas-server
# Checks if ~/.local/opt/{sheet}.service/gazette exists
# Warns if the service definition is not found, but continues
```

### 2. Service Deployment Integration

Services are deployed from their catalog location:

```bash
# As the service user
sudo -u gazette-prod -i

# Service files are available via symlink
cd ~/service/gazette  # -> /mnt/gazette-prod/service/gazette

# Or copy from the catalog during deployment
cp ~/.local/opt/dynamicalsystem.service/gazette/* /mnt/gazette-prod/service/gazette/
```

### 3. Machine Registry Integration

The station tracks deployed services, not available services:

- **Machine Registry** (`/mnt/station-prod/data/machines`): Simple text file tracking machine name → number mappings
- **Service Catalog** (`~/.local/opt/{sheet}.service/`): Git repository containing the actual service implementations
- No tight coupling between the tinsnip platform and specific services

**Note**: The YAML-based service catalog metadata concept was removed as it was unused. Service metadata lives in the service repository itself (docker-compose.yml, README.md, etc.).
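The discovery priority from the Service Discovery section can be sketched as a single helper. The function name, the `SHEET` variable, and its default are illustrative assumptions, not actual tinsnip code:

```shell
# Hypothetical helper sketching the three-step discovery order:
# SERVICE_PATH, then ~/.local/opt/{sheet}.service, then ./service.
find_service_catalog() {
    local sheet="${SHEET:-dynamicalsystem}" dir
    for dir in \
        "${SERVICE_PATH:-}" \
        "$HOME/.local/opt/${sheet}.service" \
        "./service"
    do
        # The first entry is empty when SERVICE_PATH is unset; skip it.
        if [[ -n "$dir" && -d "$dir" ]]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
```

Returning the first existing directory keeps local development (`SERVICE_PATH`) ahead of the standard installation without any extra configuration.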
## Platform Updates to Support External Services

### 1. Update install.sh

```bash
# Add service catalog installation option
INSTALL_SERVICES="${INSTALL_SERVICES:-false}"
SERVICE_REPO_URL="${SERVICE_REPO_URL:-}"  # Optional override

# Infer the service repo from the tinsnip repo
infer_service_repo() {
    local tinsnip_url="$1"
    echo "${tinsnip_url/tinsnip/service}"
}

# Optional service installation
if [[ "$INSTALL_SERVICES" == "true" ]]; then
    install_service_catalog
fi
```

### 2. Update documentation

- Remove service-specific content from the tinsnip docs
- Add a "Service Catalog" section explaining the separation
- Document the SERVICE_REPO_URL override option

### 3. Create minimal service example

Keep one example in tinsnip to show the interface:

```
examples/
└── service-template/
    ├── docker-compose.yml  # Template showing tinsnip conventions
    └── README.md           # How to create tinsnip services
```

## Benefits of This Design

1. **Clean Separation**: The platform doesn't need to know about specific services
2. **Smart Defaults**: The service repo is intelligently inferred from the tinsnip source
3. **Easy Override**: Organizations can point to their own catalogs
4. **No Forking Required**: Use official tinsnip with custom services via SERVICE_REPO_URL
5. **Predictable Locations**: Services always live in `~/.local/opt/{sheet}.service`

## Example Workflows

### Deploy from Inferred Catalog

```bash
# Install tinsnip + services (will infer the service repo);
# the variable must be set on the bash side of the pipe, not on curl
curl -fsSL "https://..." | INSTALL_SERVICES=true bash

# Setup machine
cd ~/.local/opt/dynamicalsystem.tinsnip
tin machine create lldap prod nas-server

# Deploy service
sudo -u lldap-prod -i
cp -r ~/.local/opt/dynamicalsystem.service/lldap /mnt/lldap-prod/service/
cd /mnt/lldap-prod/service/lldap
docker compose up -d
```
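The install.sh snippet above calls `install_service_catalog` without showing it. One possible shape, sketched here under the assumption that `SERVICE_REPO` and `SHEET` have already been resolved by install.sh (neither the function body nor these names are confirmed by the platform):

```shell
# Sketch of what install_service_catalog might do: clone the resolved
# service repo into the standard location, or update it if present.
install_service_catalog() {
    local sheet="${SHEET:-dynamicalsystem}"
    local dest="$HOME/.local/opt/${sheet}.service"
    if [[ -d "$dest/.git" ]]; then
        git -C "$dest" pull --ff-only        # already installed: update
    else
        mkdir -p "$(dirname "$dest")"
        git clone "$SERVICE_REPO" "$dest"    # first install: clone
    fi
}
```

Making the function idempotent (clone or fast-forward) lets the installer be re-run safely to pick up new service definitions.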
### Deploy with Custom Service Catalog

```bash
# Install tinsnip + custom services;
# the variables must be set on the bash side of the pipe, not on curl
curl -fsSL "https://..." | SERVICE_REPO_URL="git@github.com:acmecorp/services" INSTALL_SERVICES=true bash

# Setup machine for the custom service
cd ~/.local/opt/dynamicalsystem.tinsnip
tin machine create myapp prod nas-server

# Deploy the custom service
sudo -u myapp-prod -i
cp -r ~/.local/opt/dynamicalsystem.service/myapp /mnt/myapp-prod/service/
cd /mnt/myapp-prod/service/myapp
docker compose up -d
```

## Implementation Experiences

During the first real deployment of the separated service repository architecture, several pain points and successes were discovered.

### Pain Points Encountered

1. **NFS Export Detection Issues**
   - The `check_nfs_exists` function repeatedly asks to set up exports that already exist
   - Even after verifying exports with `exportfs -v`, the setup script doesn't detect them properly
   - **Impact**: Confusing user experience, requiring manual confirmation multiple times
   - **Root Cause**: The NFS detection logic times out too quickly or has permission issues

2. **Script Path Resolution Problems**
   - Legacy path issues were resolved in the CLI refactor
   - `setup_service.sh` looks for scripts in `scripts/scripts/` instead of `scripts/`
   - **Impact**: Setup fails with "file not found" errors
   - **Fix Applied**: Updated paths in setup.sh and setup_service.sh

3. **Rootless Docker Systemd Issues**
   - Service users get "Failed to connect to bus: No medium found" when using systemctl
   - Docker containers fail with cgroup errors: "Interactive authentication required"
   - **Impact**: Cannot run containers as service users
   - **Root Cause**: Service users don't have proper systemd user sessions or cgroup permissions

4. **Service Catalog Access**
   - Service users (e.g., lldap-test) cannot access `~simonhorrobin/.local/opt/dynamicalsystem.service/`
   - No sudo access for service users to copy files
   - **Impact**: Manual intervention required to copy service definitions
   - **Workaround**: The admin user must copy files and chown them
5. **Installation Script Completeness**
   - The `install.sh` script doesn't properly include the machine directory
   - The git repository must be cloned manually instead of using the curl installer
   - **Impact**: Confusing installation experience

### What Worked Well

1. **SMEP UID Scheme**
   - UIDs calculated correctly: station-prod (11000), lldap-test (11210)
   - Clear separation between services and environments
   - Port allocation follows the UID scheme as designed

2. **NFS Mounting**
   - Once exports are created, mounting works reliably
   - Correct ownership preserved through all_squash
   - XDG symlinks created successfully

3. **Directory Structure**
   - Clean separation of `/mnt/{service}-{env}/{data,config,state,service}`
   - Service definitions isolated in their own directories
   - Proper ownership maintained throughout

4. **Service Repository Separation**
   - The external service repository clones successfully
   - Service definitions are clean and self-contained
   - Easy-to-understand docker-compose.yml files

### Recommended Improvements

1. **Fix NFS Detection**
   - Increase the timeout in the `check_nfs_exists` function
   - Add better error handling and logging
   - Consider using showmount or rpcinfo for detection

2. **Resolve Docker Issues**
   - Investigate systemd-logind configuration for service users
   - Consider using lingering sessions: `loginctl enable-linger`
   - Document a manual Docker startup procedure as a fallback

3. **Improve Service Catalog Access**
   - Consider mounting the service catalog via NFS
   - Or copy it to a shared location like `/opt/tinsnip/services`
   - Or have machine setup copy it into service user homes

4. **Enhance Installation Process**
   - Update install.sh to properly fetch all components
   - Add a verification step to ensure a complete installation
   - Consider providing pre-built archives as an alternative
5. **Add Validation Steps**
   - Verify all paths exist before executing scripts
   - Add a dry-run mode for testing
   - Provide better error messages with suggested fixes

### Lessons Learned

- The separation of tinsnip and services is conceptually sound
- Real-world deployment reveals integration challenges
- Service users need special consideration for Docker and file access
- NFS setup verification needs to be more robust
- Documentation should include a troubleshooting guide for common issues

## Open Discussion Items

### 1. Docker Volume Persistence Architecture

**Problem**: Services currently use Docker named volumes (e.g., `lldap_data:/data`), which store data in Docker's internal storage (`~/.local/share/docker/volumes/`). This defeats tinsnip's core value proposition of NFS-backed persistence for continuous delivery.

**Current Impact**:
- Service data is lost when machines are rebuilt
- Cannot achieve the true continuous delivery model
- Inconsistent with tinsnip's architecture philosophy

**Potential Solutions**:
- Automatically generate a `docker-compose.override.yml` during service setup to map volumes to `/mnt/{service}-{env}/*` paths
- Modify the service repository docker-compose.yml files to use bind mounts by default
- Create a service setup script convention that handles volume mapping
- Use environment variable substitution in docker-compose.yml for volume paths

**Questions**:
- Should this be handled in the tinsnip repository or the service repository?
- How do we maintain service portability while ensuring tinsnip integration?
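The override-file idea among the potential solutions can be sketched as follows. The service name, environment, and volume target are illustrative, and a temp directory stands in for `/mnt` so the sketch runs without root:

```shell
# Sketch: generate a docker-compose.override.yml that rebinds a
# service's named volume onto NFS-backed storage. A real setup script
# would use /mnt/${service}-${env} rather than a temp directory.
service=lldap
env=prod
root="$(mktemp -d)"                      # stand-in for /mnt
data_dir="${root}/${service}-${env}/data"
mkdir -p "$data_dir"

cat > "${root}/docker-compose.override.yml" <<EOF
services:
  ${service}:
    volumes:
      - ${data_dir}:/data
EOF
```

Because Docker Compose merges override files with the base docker-compose.yml by container mount path, the bind mount supersedes the named volume entry for `/data` without touching the service repository.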
### 2. Service Debugging and Change Control Workflow

**Problem**: During service deployment debugging, errors span multiple repositories:
- Platform issues (Docker setup, NFS mounting) live in the tinsnip repository
- Service configuration issues (docker-compose.yml, setup scripts) live in the service repository
- Debugging often happens directly on the deployment box with ad-hoc fixes

**Current Impact**:
- Manual fixes applied to boxes are lost when machines are rebuilt
- Changes made during debugging don't get captured in version control
- It is difficult to maintain discipline around continuous delivery principles
- Knowledge transfer suffers when fixes aren't documented

**Workflow Challenges**:
- How to ensure all debugging fixes get committed to the appropriate repositories?
- How to maintain CD discipline when rapid iteration is needed for debugging?
- How to handle cross-repository dependencies during troubleshooting?
- How to preserve institutional knowledge from debugging sessions?

**Questions**:
- Should we require all changes to go through the proper git workflow even during debugging?
- How do we balance debugging speed with change control discipline?
- What tooling could help capture and replay manual fixes in automation?
- How do we ensure debugging insights get captured in documentation?
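As one concrete starting point for the tooling question, a debugging session could snapshot the deployed service directory into a scratch git repo so every ad-hoc fix becomes a reviewable diff. The helper name and commit format below are illustrative only, not an agreed workflow:

```shell
# Illustrative sketch: run debug_snapshot after each manual fix on the
# box; later, `git log -p` in the service directory replays every change
# so it can be upstreamed to the right repository.
debug_snapshot() {
    git init -q . 2>/dev/null            # no-op if already a repo
    git add -A
    git -c user.name=debug -c user.email=debug@localhost \
        commit -qm "debug: $(date -u +%FT%TZ) ${1:-checkpoint}" \
        || true                          # nothing staged is fine
}
```

Typical use would be `cd /mnt/lldap-prod/service/lldap && debug_snapshot "fix volume path"`; the scratch repo is throwaway and never pushed, it only preserves the diffs until they are committed properly.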