# Service Repository Integration Design

## Overview

tinsnip and the service catalog are separate repositories that work together: tinsnip provides the infrastructure, while the service repo provides the deployable applications.

## Installation Structure

### Clean Install

```
~/.local/opt/
├── dynamicalsystem.tinsnip/      # Platform installation
│   ├── machine/                  # Infrastructure setup
│   ├── scripts/                  # Platform utilities
│   └── setup.sh                  # Main orchestrator
└── dynamicalsystem.service/      # Service catalog (optional)
    ├── lldap/                    # Example service
    ├── gazette/                  # Future services
    └── README.md                 # Service documentation
```

### Sheet-aware Installation

For custom sheets, the structure follows the pattern:

```
~/.local/opt/
├── {sheet}.tinsnip/              # Platform for sheet
└── {sheet}.service/              # Services for sheet
```

## Service Repository URL Strategy

### Default Behavior: Infer from tinsnip repo

```bash
# If tinsnip is from: https://github.com/dynamicalsystem/tinsnip
# Then service defaults to: https://github.com/dynamicalsystem/service

# If tinsnip is from: git@gitlab.acmecorp.com:infra/tinsnip
# Then service defaults to: git@gitlab.acmecorp.com:infra/service

# Logic: Replace "tinsnip" with "service" in the repo URL
```

### Override: SERVICE_REPO_URL

```bash
# Organizations can override with their own service catalog
SERVICE_REPO_URL="git@github.com:mycompany/our-services" ./install.sh

# This allows using official tinsnip with custom services
```

### Implementation in install.sh

```bash
# Get the tinsnip repo URL (from git remote, or hardcoded fallback)
get_tinsnip_repo_url() {
    if [[ -d .git ]]; then
        git remote get-url origin 2>/dev/null || echo "$DEFAULT_REPO_URL"
    else
        echo "$DEFAULT_REPO_URL"
    fi
}

# Infer the service repo from the tinsnip repo
infer_service_repo() {
    local tinsnip_url="$1"
    # Replace "tinsnip" with "service" in the URL
    echo "${tinsnip_url/tinsnip/service}"
}

# Use SERVICE_REPO_URL if set, otherwise infer
SERVICE_REPO="${SERVICE_REPO_URL:-$(infer_service_repo "$(get_tinsnip_repo_url)")}"
```
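The substitution rule can be sanity-checked in isolation; the snippet below just lifts `infer_service_repo` out of install.sh and runs it against the two URL styles from the examples above:

```shell
# Standalone check of the tinsnip -> service substitution rule.
# Note: ${var/pattern/replacement} is a bash-ism, so install.sh requires bash.

infer_service_repo() {
    echo "${1/tinsnip/service}"
}

infer_service_repo "https://github.com/dynamicalsystem/tinsnip"
# → https://github.com/dynamicalsystem/service
infer_service_repo "git@gitlab.acmecorp.com:infra/tinsnip"
# → git@gitlab.acmecorp.com:infra/service
```

One caveat worth noting: the substitution replaces only the first occurrence of `tinsnip`, so an organization or group name containing that word would be rewritten instead of the repository name.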
## Service Discovery

### 1. Installation-time Discovery

During setup, tinsnip checks for the service catalog in this order:

```bash
# Priority order for service discovery
1. $SERVICE_PATH                   (environment variable - for local development)
2. ~/.local/opt/{sheet}.service    (standard installation)
3. ./service                       (backward compatibility, if it exists)
```

### 2. Installation Options

```bash
# Platform only
curl -fsSL "https://url/install.sh" | bash

# Platform + services (using the inferred service repo)
curl -fsSL "https://url/install.sh" | INSTALL_SERVICES=true bash

# Platform + custom service repo
curl -fsSL "https://url/install.sh" | SERVICE_REPO_URL="git@example.com:custom/services" bash
```

## Integration Points

### 1. Machine Setup Integration

Machine setup can optionally validate that the service exists:

```bash
tin machine create gazette prod nas-server
# Checks if ~/.local/opt/{sheet}.service/gazette exists
# Warns if the service definition is not found, but continues
```

### 2. Service Deployment Integration

Services are deployed from their catalog location:

```bash
# As the service user
sudo -u gazette-prod -i

# Service files are available via symlink
cd ~/service/gazette  # -> /mnt/gazette-prod/service/gazette

# Or copy from the catalog during deployment
cp ~/.local/opt/dynamicalsystem.service/gazette/* /mnt/gazette-prod/service/gazette/
```

### 3. Machine Registry Integration

The station tracks deployed services, not available services:

- **Machine Registry** (`/mnt/station-prod/data/machines`): Simple text file tracking machine name → number mappings
- **Service Catalog** (`~/.local/opt/{sheet}.service/`): Git repository containing the actual service implementations
- No tight coupling between the tinsnip platform and specific services

**Note**: The YAML-based service catalog metadata concept was removed as it was unused. Service metadata lives in the service repository itself (docker-compose.yml, README.md, etc.).
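The discovery priority from the Service Discovery section can be sketched as a single helper. The function name, the `SHEET` variable, and its default are illustrative assumptions, not actual tinsnip code:

```shell
# Hypothetical helper sketching the three-step discovery order:
# SERVICE_PATH, then ~/.local/opt/{sheet}.service, then ./service.
find_service_catalog() {
    local sheet="${SHEET:-dynamicalsystem}" dir
    for dir in \
        "${SERVICE_PATH:-}" \
        "$HOME/.local/opt/${sheet}.service" \
        "./service"
    do
        # The first entry is empty when SERVICE_PATH is unset; skip it.
        if [[ -n "$dir" && -d "$dir" ]]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
```

Returning the first existing directory keeps local development (`SERVICE_PATH`) ahead of the standard installation without any extra configuration.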
## Platform Updates to Support External Services

### 1. Update install.sh

```bash
# Add service catalog installation option
INSTALL_SERVICES="${INSTALL_SERVICES:-false}"
SERVICE_REPO_URL="${SERVICE_REPO_URL:-}"  # Optional override

# Infer the service repo from the tinsnip repo
infer_service_repo() {
    local tinsnip_url="$1"
    echo "${tinsnip_url/tinsnip/service}"
}

# Optional service installation
if [[ "$INSTALL_SERVICES" == "true" ]]; then
    install_service_catalog
fi
```

### 2. Update documentation

- Remove service-specific content from the tinsnip docs
- Add a "Service Catalog" section explaining the separation
- Document the SERVICE_REPO_URL override option

### 3. Create minimal service example

Keep one example in tinsnip to show the interface:

```
examples/
└── service-template/
    ├── docker-compose.yml  # Template showing tinsnip conventions
    └── README.md           # How to create tinsnip services
```

## Benefits of This Design

1. **Clean Separation**: The platform doesn't need to know about specific services
2. **Smart Defaults**: The service repo is intelligently inferred from the tinsnip source
3. **Easy Override**: Organizations can point to their own catalogs
4. **No Forking Required**: Use official tinsnip with custom services via SERVICE_REPO_URL
5. **Predictable Locations**: Services always live in `~/.local/opt/{sheet}.service`

## Example Workflows

### Deploy from Inferred Catalog

```bash
# Install tinsnip + services (will infer the service repo);
# the variable must be set on the bash side of the pipe, not on curl
curl -fsSL "https://..." | INSTALL_SERVICES=true bash

# Setup machine
cd ~/.local/opt/dynamicalsystem.tinsnip
tin machine create lldap prod nas-server

# Deploy service
sudo -u lldap-prod -i
cp -r ~/.local/opt/dynamicalsystem.service/lldap /mnt/lldap-prod/service/
cd /mnt/lldap-prod/service/lldap
docker compose up -d
```
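The install.sh snippet above calls `install_service_catalog` without showing it. One possible shape, sketched here under the assumption that `SERVICE_REPO` and `SHEET` have already been resolved by install.sh (neither the function body nor these names are confirmed by the platform):

```shell
# Sketch of what install_service_catalog might do: clone the resolved
# service repo into the standard location, or update it if present.
install_service_catalog() {
    local sheet="${SHEET:-dynamicalsystem}"
    local dest="$HOME/.local/opt/${sheet}.service"
    if [[ -d "$dest/.git" ]]; then
        git -C "$dest" pull --ff-only        # already installed: update
    else
        mkdir -p "$(dirname "$dest")"
        git clone "$SERVICE_REPO" "$dest"    # first install: clone
    fi
}
```

Making the function idempotent (clone or fast-forward) lets the installer be re-run safely to pick up new service definitions.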
### Deploy with Custom Service Catalog

```bash
# Install tinsnip + custom services;
# the variables must be set on the bash side of the pipe, not on curl
curl -fsSL "https://..." | SERVICE_REPO_URL="git@github.com:acmecorp/services" INSTALL_SERVICES=true bash

# Setup machine for the custom service
cd ~/.local/opt/dynamicalsystem.tinsnip
tin machine create myapp prod nas-server

# Deploy the custom service
sudo -u myapp-prod -i
cp -r ~/.local/opt/dynamicalsystem.service/myapp /mnt/myapp-prod/service/
cd /mnt/myapp-prod/service/myapp
docker compose up -d
```

## Implementation Experiences

During the first real deployment of the separated service repository architecture, several pain points and successes were discovered.

### Pain Points Encountered

1. **NFS Export Detection Issues**
   - The `check_nfs_exists` function repeatedly asks to set up exports that already exist
   - Even after verifying exports with `exportfs -v`, the setup script doesn't detect them properly
   - **Impact**: Confusing user experience, requiring manual confirmation multiple times
   - **Root Cause**: The NFS detection logic times out too quickly or has permission issues

2. **Script Path Resolution Problems**
   - Legacy path issues were resolved in the CLI refactor
   - `setup_service.sh` looks for scripts in `scripts/scripts/` instead of `scripts/`
   - **Impact**: Setup fails with "file not found" errors
   - **Fix Applied**: Updated paths in setup.sh and setup_service.sh

3. **Rootless Docker Systemd Issues**
   - Service users get "Failed to connect to bus: No medium found" when using systemctl
   - Docker containers fail with cgroup errors: "Interactive authentication required"
   - **Impact**: Cannot run containers as service users
   - **Root Cause**: Service users don't have proper systemd user sessions or cgroup permissions

4. **Service Catalog Access**
   - Service users (e.g., lldap-test) cannot access `~simonhorrobin/.local/opt/dynamicalsystem.service/`
   - No sudo access for service users to copy files
   - **Impact**: Manual intervention required to copy service definitions
   - **Workaround**: The admin user must copy files and chown them
5. **Installation Script Completeness**
   - The `install.sh` script doesn't properly include the machine directory
   - The git repository must be cloned manually instead of using the curl installer
   - **Impact**: Confusing installation experience

### What Worked Well

1. **SMEP UID Scheme**
   - UIDs calculated correctly: station-prod (11000), lldap-test (11210)
   - Clear separation between services and environments
   - Port allocation follows the UID scheme as designed

2. **NFS Mounting**
   - Once exports are created, mounting works reliably
   - Correct ownership preserved through all_squash
   - XDG symlinks created successfully

3. **Directory Structure**
   - Clean separation of `/mnt/{service}-{env}/{data,config,state,service}`
   - Service definitions isolated in their own directories
   - Proper ownership maintained throughout

4. **Service Repository Separation**
   - The external service repository clones successfully
   - Service definitions are clean and self-contained
   - Easy-to-understand docker-compose.yml files

### Recommended Improvements

1. **Fix NFS Detection**
   - Increase the timeout in the `check_nfs_exists` function
   - Add better error handling and logging
   - Consider using showmount or rpcinfo for detection

2. **Resolve Docker Issues**
   - Investigate systemd-logind configuration for service users
   - Consider using lingering sessions: `loginctl enable-linger`
   - Document a manual Docker startup procedure as a fallback

3. **Improve Service Catalog Access**
   - Consider mounting the service catalog via NFS
   - Or copy it to a shared location like `/opt/tinsnip/services`
   - Or have machine setup copy it into service user homes

4. **Enhance Installation Process**
   - Update install.sh to properly fetch all components
   - Add a verification step to ensure a complete installation
   - Consider providing pre-built archives as an alternative
5. **Add Validation Steps**
   - Verify all paths exist before executing scripts
   - Add a dry-run mode for testing
   - Provide better error messages with suggested fixes

### Lessons Learned

- The separation of tinsnip and services is conceptually sound
- Real-world deployment reveals integration challenges
- Service users need special consideration for Docker and file access
- NFS setup verification needs to be more robust
- Documentation should include a troubleshooting guide for common issues

## Open Discussion Items

### 1. Docker Volume Persistence Architecture

**Problem**: Services currently use Docker named volumes (e.g., `lldap_data:/data`), which store data in Docker's internal storage (`~/.local/share/docker/volumes/`). This defeats tinsnip's core value proposition of NFS-backed persistence for continuous delivery.

**Current Impact**:
- Service data is lost when machines are rebuilt
- Cannot achieve the true continuous delivery model
- Inconsistent with tinsnip's architecture philosophy

**Potential Solutions**:
- Automatically generate a `docker-compose.override.yml` during service setup to map volumes to `/mnt/{service}-{env}/*` paths
- Modify the service repository docker-compose.yml files to use bind mounts by default
- Create a service setup script convention that handles volume mapping
- Use environment variable substitution in docker-compose.yml for volume paths

**Questions**:
- Should this be handled in the tinsnip repository or the service repository?
- How do we maintain service portability while ensuring tinsnip integration?
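The override-file idea among the potential solutions can be sketched as follows. The service name, environment, and volume target are illustrative, and a temp directory stands in for `/mnt` so the sketch runs without root:

```shell
# Sketch: generate a docker-compose.override.yml that rebinds a
# service's named volume onto NFS-backed storage. A real setup script
# would use /mnt/${service}-${env} rather than a temp directory.
service=lldap
env=prod
root="$(mktemp -d)"                      # stand-in for /mnt
data_dir="${root}/${service}-${env}/data"
mkdir -p "$data_dir"

cat > "${root}/docker-compose.override.yml" <<EOF
services:
  ${service}:
    volumes:
      - ${data_dir}:/data
EOF
```

Because Docker Compose merges override files with the base docker-compose.yml by container mount path, the bind mount supersedes the named volume entry for `/data` without touching the service repository.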
### 2. Service Debugging and Change Control Workflow

**Problem**: During service deployment debugging, errors span multiple repositories:
- Platform issues (Docker setup, NFS mounting) live in the tinsnip repository
- Service configuration issues (docker-compose.yml, setup scripts) live in the service repository
- Debugging often happens directly on the deployment box with ad-hoc fixes

**Current Impact**:
- Manual fixes applied to boxes are lost when machines are rebuilt
- Changes made during debugging don't get captured in version control
- It is difficult to maintain discipline around continuous delivery principles
- Knowledge transfer suffers when fixes aren't documented

**Workflow Challenges**:
- How to ensure all debugging fixes get committed to the appropriate repositories?
- How to maintain CD discipline when rapid iteration is needed for debugging?
- How to handle cross-repository dependencies during troubleshooting?
- How to preserve institutional knowledge from debugging sessions?

**Questions**:
- Should we require all changes to go through the proper git workflow even during debugging?
- How do we balance debugging speed with change control discipline?
- What tooling could help capture and replay manual fixes in automation?
- How do we ensure debugging insights get captured in documentation?
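As one concrete starting point for the tooling question, a debugging session could snapshot the deployed service directory into a scratch git repo so every ad-hoc fix becomes a reviewable diff. The helper name and commit format below are illustrative only, not an agreed workflow:

```shell
# Illustrative sketch: run debug_snapshot after each manual fix on the
# box; later, `git log -p` in the service directory replays every change
# so it can be upstreamed to the right repository.
debug_snapshot() {
    git init -q . 2>/dev/null            # no-op if already a repo
    git add -A
    git -c user.name=debug -c user.email=debug@localhost \
        commit -qm "debug: $(date -u +%FT%TZ) ${1:-checkpoint}" \
        || true                          # nothing staged is fine
}
```

Typical use would be `cd /mnt/lldap-prod/service/lldap && debug_snapshot "fix volume path"`; the scratch repo is throwaway and never pushed, it only preserves the diffs until they are committed properly.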