
Service Repository Integration Design#

Overview#

tinsnip and the service catalog are separate repositories that work together: tinsnip provides the platform infrastructure, while the service repository provides deployable applications.

Installation Structure#

Clean Install#

~/.local/opt/
├── dynamicalsystem.tinsnip/      # Platform installation
│   ├── machine/                  # Infrastructure setup
│   ├── scripts/                  # Platform utilities
│   └── setup.sh                  # Main orchestrator
└── dynamicalsystem.service/      # Service catalog (optional)
    ├── lldap/                    # Example service
    ├── gazette/                  # Future services
    └── README.md                 # Service documentation

Sheet-aware Installation#

For custom sheets, the structure follows the pattern:

~/.local/opt/
├── {sheet}.tinsnip/          # Platform for sheet
└── {sheet}.service/          # Services for sheet

Service Repository URL Strategy#

Default Behavior: Infer from tinsnip repo#

# If tinsnip is from: https://github.com/dynamicalsystem/tinsnip
# Then service defaults to: https://github.com/dynamicalsystem/service

# If tinsnip is from: git@gitlab.acmecorp.com:infra/tinsnip  
# Then service defaults to: git@gitlab.acmecorp.com:infra/service

# Logic: Replace "tinsnip" with "service" in the repo URL

Override: SERVICE_REPO_URL#

# Organizations can override with their own service catalog
SERVICE_REPO_URL="git@github.com:mycompany/our-services" ./install.sh

# This allows using official tinsnip with custom services

Implementation in install.sh#

# Get tinsnip repo URL (from git remote or hardcoded)
get_tinsnip_repo_url() {
    if [[ -d .git ]]; then
        git remote get-url origin 2>/dev/null || echo "$DEFAULT_REPO_URL"
    else
        echo "$DEFAULT_REPO_URL"
    fi
}

# Infer service repo from tinsnip repo
infer_service_repo() {
    local tinsnip_url="$1"
    # Replace "tinsnip" with "service" in the URL
    echo "${tinsnip_url/tinsnip/service}"
}

# Use SERVICE_REPO_URL if set, otherwise infer
SERVICE_REPO="${SERVICE_REPO_URL:-$(infer_service_repo "$(get_tinsnip_repo_url)")}"

Service Discovery#

1. Installation-time Discovery#

During setup, tinsnip checks for the service catalog in the following order:

# Priority order for service discovery
1. $SERVICE_PATH (environment variable - for local development)
2. ~/.local/opt/{sheet}.service (standard installation)
3. ./service (backward compatibility if exists)
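
The discovery order above could be sketched as a small shell helper. This is a hedged sketch — the function name `discover_service_catalog` and the `sheet` argument are assumptions, not part of the current setup script:

```shell
# Hypothetical helper implementing the discovery priority above.
# discover_service_catalog <sheet> prints the first catalog path found.
discover_service_catalog() {
    local sheet="$1"
    if [[ -n "${SERVICE_PATH:-}" && -d "$SERVICE_PATH" ]]; then
        echo "$SERVICE_PATH"                          # 1. local development override
    elif [[ -d "$HOME/.local/opt/${sheet}.service" ]]; then
        echo "$HOME/.local/opt/${sheet}.service"      # 2. standard installation
    elif [[ -d ./service ]]; then
        echo "./service"                              # 3. backward compatibility
    else
        return 1                                      # no catalog found
    fi
}
```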

2. Installation Options#

# Platform only
curl -fsSL "https://url/install.sh" | bash

# Platform + services (using inferred service repo)
curl -fsSL "https://url/install.sh" | INSTALL_SERVICES=true bash

# Platform + custom service repo
curl -fsSL "https://url/install.sh" | SERVICE_REPO_URL="git@example.com:custom/services" bash

Integration Points#

1. Machine Setup Integration#

Machine setup can optionally validate that a service definition exists:

tin machine create gazette prod nas-server
# Checks if ~/.local/opt/{sheet}.service/gazette exists
# Warns if service definition not found but continues
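
A minimal sketch of that warn-but-continue check (the function name is hypothetical; `tin machine create` is assumed to call something equivalent):

```shell
# Hypothetical validation used by machine setup: warn if the service
# definition is missing from the catalog, but do not abort.
check_service_definition() {
    local sheet="$1" service="$2"
    local catalog="$HOME/.local/opt/${sheet}.service"
    if [[ ! -d "$catalog/$service" ]]; then
        echo "warning: no service definition for '$service' in $catalog" >&2
    fi
    return 0   # always continue; the warning is informational
}
```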

2. Service Deployment Integration#

Services are deployed from their catalog location:

# As service user
sudo -u gazette-prod -i

# Service files are available via symlink
cd ~/service/gazette  # -> /mnt/gazette-prod/service/gazette

# Or copy from catalog during deployment
cp ~/.local/opt/dynamicalsystem.service/gazette/* /mnt/gazette-prod/service/gazette/

3. Machine Registry Integration#

The station tracks deployed services, not available services:

  • Machine Registry (/mnt/station-prod/data/machines): Simple text file tracking machine name → number mappings
  • Service Catalog (~/.local/opt/{sheet}.service/): Git repository containing actual service implementations
  • No tight coupling between tinsnip platform and specific services

Note: The YAML-based service catalog metadata concept was removed as it was unused. Service metadata lives in the service repository itself (docker-compose.yml, README.md, etc.).
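
As an illustration, reading the registry could be as simple as the following sketch. The one-mapping-per-line "name number" format is an assumption based on the description above:

```shell
# Hypothetical lookup against the machine registry text file.
# machine_number <name> [registry-file] prints the number mapped to a machine name.
machine_number() {
    local registry="${2:-/mnt/station-prod/data/machines}"
    # Print the second field of the first line whose first field matches.
    awk -v name="$1" '$1 == name { print $2; exit }' "$registry"
}
```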

Platform Updates to Support External Services#

1. Update install.sh#

# Add service catalog installation option
INSTALL_SERVICES="${INSTALL_SERVICES:-false}"
SERVICE_REPO_URL="${SERVICE_REPO_URL:-}"  # Optional override

# Infer service repo from tinsnip repo
infer_service_repo() {
    local tinsnip_url="$1"
    echo "${tinsnip_url/tinsnip/service}"
}

# Optional service installation
if [[ "$INSTALL_SERVICES" == "true" ]]; then
    install_service_catalog
fi
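
The referenced `install_service_catalog` is not defined above; one plausible sketch follows. The destination path, the `SHEET` default, and the `--ff-only` update policy are assumptions:

```shell
# Replace "tinsnip" with "service" in the repo URL (as described earlier).
infer_service_repo() {
    echo "${1/tinsnip/service}"
}

# Hypothetical install_service_catalog: clone the catalog on first install,
# fast-forward it on subsequent runs.
install_service_catalog() {
    local tinsnip_url="$1"
    local url="${SERVICE_REPO_URL:-$(infer_service_repo "$tinsnip_url")}"
    local dest="$HOME/.local/opt/${SHEET:-dynamicalsystem}.service"
    if [[ -d "$dest/.git" ]]; then
        git -C "$dest" pull --ff-only
    else
        git clone "$url" "$dest"
    fi
}
```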

2. Update documentation#

  • Remove service-specific content from tinsnip docs
  • Add "Service Catalog" section explaining the separation
  • Document SERVICE_REPO_URL override option

3. Create minimal service example#

Keep one example in tinsnip to show the interface:

examples/
└── service-template/
    ├── docker-compose.yml    # Template showing tinsnip conventions
    └── README.md            # How to create tinsnip services
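
A skeleton for that template might look like the following. This is hypothetical: the image name, port, and mount path are placeholders illustrating the conventions described in this document, not a confirmed interface:

```yaml
# examples/service-template/docker-compose.yml — hypothetical skeleton
services:
  app:
    image: example/app:latest     # placeholder image
    ports:
      - "8080:8080"               # placeholder; real services follow the UID/port scheme
    volumes:
      - /mnt/app-prod/data:/data  # NFS-backed persistence per tinsnip conventions
```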

Benefits of This Design#

  1. Clean Separation: Platform doesn't need to know about specific services
  2. Smart Defaults: Service repo intelligently inferred from tinsnip source
  3. Easy Override: Organizations can point to their own catalogs
  4. No Forking Required: Use official tinsnip with custom services via SERVICE_REPO_URL
  5. Predictable Locations: Services always in ~/.local/opt/{sheet}.service

Example Workflows#

Deploy from Inferred Catalog#

# Install tinsnip + services (will infer service repo)
curl -fsSL "https://..." | INSTALL_SERVICES=true bash

# Setup machine
cd ~/.local/opt/dynamicalsystem.tinsnip
tin machine create lldap prod nas-server

# Deploy service
sudo -u lldap-prod -i
cp -r ~/.local/opt/dynamicalsystem.service/lldap /mnt/lldap-prod/service/
cd /mnt/lldap-prod/service/lldap
docker compose up -d

Deploy with Custom Service Catalog#

# Install tinsnip + custom services
curl -fsSL "https://..." | SERVICE_REPO_URL="git@github.com:acmecorp/services" INSTALL_SERVICES=true bash

# Setup machine for custom service
cd ~/.local/opt/dynamicalsystem.tinsnip
tin machine create myapp prod nas-server

# Deploy custom service
sudo -u myapp-prod -i
cp -r ~/.local/opt/dynamicalsystem.service/myapp /mnt/myapp-prod/service/
cd /mnt/myapp-prod/service/myapp
docker compose up -d

Implementation Experiences#

During the first real deployment of the separated service repository architecture, several pain points and successes were discovered:

Pain Points Encountered#

  1. NFS Export Detection Issues

    • The check_nfs_exists function repeatedly asks to set up exports that already exist
    • Even after verifying exports with exportfs -v, the setup script doesn't detect them properly
    • Impact: Confusing user experience, requiring manual confirmation multiple times
    • Root Cause: The NFS detection logic times out too quickly or has permission issues
  2. Script Path Resolution Problems

    • setup_service.sh looked for scripts in scripts/scripts/ instead of scripts/
    • Impact: Setup failed with "file not found" errors
    • Fix Applied: Updated paths in setup.sh and setup_service.sh; remaining legacy path issues were resolved in the CLI refactor
  3. Rootless Docker Systemd Issues

    • Service users get "Failed to connect to bus: No medium found" when using systemctl
    • Docker containers fail with cgroup errors: "Interactive authentication required"
    • Impact: Cannot run containers as service users
    • Root Cause: Service users don't have proper systemd user sessions or cgroup permissions
  4. Service Catalog Access

    • Service users (e.g., lldap-test) cannot access ~simonhorrobin/.local/opt/dynamicalsystem.service/
    • No sudo access for service users to copy files
    • Impact: Manual intervention required to copy service definitions
    • Workaround: Admin user must copy files and chown them
  5. Installation Script Completeness

    • The install.sh script doesn't properly include the machine directory
    • Git repository must be cloned manually instead of using the curl installer
    • Impact: Confusing installation experience

What Worked Well#

  1. SMEP UID Scheme

    • UIDs calculated correctly: station-prod (11000), lldap-test (11210)
    • Clear separation between services and environments
    • Port allocation follows UID scheme as designed
  2. NFS Mounting

    • Once exports are created, mounting works reliably
    • Correct ownership preserved through all_squash
    • XDG symlinks created successfully
  3. Directory Structure

    • Clean separation of /mnt/<service>-<environment>/{data,config,state,service}
    • Service definitions isolated in their own directories
    • Proper ownership maintained throughout
  4. Service Repository Separation

    • External service repository clones successfully
    • Service definitions are clean and self-contained
    • Easy to understand docker-compose.yml files

Recommended Improvements#

  1. Fix NFS Detection

    • Increase timeout in check_nfs_exists function
    • Add better error handling and logging
    • Consider using showmount or rpcinfo for detection
  2. Resolve Docker Issues

    • Investigate systemd-logind configuration for service users
    • Consider using lingering sessions: loginctl enable-linger
    • Document manual Docker startup procedure as fallback
  3. Improve Service Catalog Access

    • Consider mounting service catalog via NFS
    • Or copy to a shared location like /opt/tinsnip/services
    • Or include in machine setup to copy to service user homes
  4. Enhance Installation Process

    • Update install.sh to properly fetch all components
    • Add verification step to ensure complete installation
    • Consider providing pre-built archives as alternative
  5. Add Validation Steps

    • Verify all paths exist before executing scripts
    • Add dry-run mode for testing
    • Better error messages with suggested fixes
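
The showmount-based detection suggested above could be sketched as follows. The function names are hypothetical; `showmount -e` prints an "Export list for <host>:" header followed by one export per line:

```shell
# Parse `showmount -e` output (fed on stdin) for an exact export-path match.
parse_exports_for() {
    # Skip the header line; exit 0 if the path appears as the first field.
    awk -v p="$1" 'NR > 1 && $1 == p { found = 1 } END { exit !found }'
}

# Hypothetical replacement for check_nfs_exists: no timeout-sensitive probing,
# just ask the server for its export list.
nfs_export_exists() {
    local server="$1" path="$2"
    showmount -e "$server" 2>/dev/null | parse_exports_for "$path"
}
```

A caller could then run `nfs_export_exists nas-server /mnt/lldap-prod` and only offer to create the export when it returns non-zero.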

Lessons Learned#

  • The separation of tinsnip and services is conceptually sound
  • Real-world deployment reveals integration challenges
  • Service users need special consideration for Docker and file access
  • NFS setup verification needs to be more robust
  • Documentation should include troubleshooting guide for common issues

Open Discussion Items#

1. Docker Volume Persistence Architecture#

Problem: Services currently use Docker named volumes (e.g., lldap_data:/data) which store data in Docker's internal storage (~/.local/share/docker/volumes/). This defeats tinsnip's core value proposition of NFS-backed persistence for continuous delivery.

Current Impact:

  • Service data is lost when machines are rebuilt
  • Cannot achieve true continuous delivery model
  • Inconsistent with tinsnip's architecture philosophy

Potential Solutions:

  • Automatically generate docker-compose.override.yml during service setup to map volumes to /mnt/<service>-<environment>/* paths
  • Modify service repository docker-compose.yml files to use bind mounts by default
  • Create service setup script convention that handles volume mapping
  • Use environment variable substitution in docker-compose.yml for volume paths
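
The override-file approach could look like this sketch. The lldap service name and mount path follow the examples earlier in this document; whether such a file is generated automatically remains an open question:

```yaml
# docker-compose.override.yml — hypothetical, generated during service setup.
# Maps the service's data onto tinsnip's NFS-backed mount instead of a
# Docker-internal named volume.
services:
  lldap:
    volumes:
      - /mnt/lldap-prod/data:/data
```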

Questions:

  • Should this be handled in the tinsnip repository or the service repository?
  • How do we maintain service portability while ensuring tinsnip integration?

2. Service Debugging and Change Control Workflow#

Problem: During service deployment debugging, errors span multiple repositories:

  • Platform issues (Docker setup, NFS mounting) are in the tinsnip repository
  • Service configuration issues (docker-compose.yml, setup scripts) are in the service repository
  • Debugging often happens directly on the deployment box with ad-hoc fixes

Current Impact:

  • Manual fixes applied to boxes are lost when machines are rebuilt
  • Changes made during debugging don't get captured in version control
  • Difficult to maintain discipline around continuous delivery principles
  • Knowledge transfer issues when fixes aren't documented

Workflow Challenges:

  • How to ensure all debugging fixes get committed to appropriate repositories?
  • How to maintain CD discipline when rapid iteration is needed for debugging?
  • How to handle cross-repository dependencies during troubleshooting?
  • How to preserve institutional knowledge from debugging sessions?

Questions:

  • Should we require all changes to go through proper git workflow even during debugging?
  • How do we balance speed of debugging with change control discipline?
  • What tooling could help capture and replay manual fixes in automation?
  • How do we ensure debugging insights get captured in documentation?