commits
Implements coordinated port allocation for multi-service deployments.
Key Features:
- First-fit port allocation algorithm with gap support
- Automatic service .env generation with TIN_PORT_* variables
- Port deallocation on service removal
- Idempotent service deployment
Includes Bug Fixes:
- Sheet creation registry file path (15e877d)
- Service logs/status commands for ACT-2 structure (232dd58)
- Removed legacy service deployment from machine setup (93b1010)
Testing:
- Multi-service deployments verified
- Port conflict prevention confirmed
- Deallocation and reallocation tested
- HTTP endpoints validated
Branch: feature/port-allocation-algorithm
Commits: 11
Status: All tests passing
Added comprehensive documentation for the environment variable loading
pattern used in service deployment:
- cmd/service/deploy.sh: Updated comments to clarify env loading
- docs/deployment_strategy.md: New section on Docker Compose env loading
- docs/service_creation.md: Added constraints and troubleshooting
- ooda/.../plan.md: Detailed explanation of env loading requirements
- lib/core.sh, lib/docker.sh: Minor formatting improvements
This pattern was debugged extensively during ACT-3 testing and must be
maintained to support both Docker Compose YAML interpolation and
rootless Docker daemon stability.
Comprehensive testing summary including:
- All core functionality verified (multi-service, port tracking, .env generation)
- Two test environments (test-ports-dev, act-3-dev)
- Detailed deployment test results with verification commands
- Three bug fixes documented with root causes and solutions
- Implementation summary with function list and integration points
- Port allocation algorithm description
- Ready for merge checklist
All ASCII characters (no emojis).
The copy_service_definition() function was creating /mnt/{machine}/service/{machine}/
during machine creation, which conflicts with the new multi-service architecture.
Changes:
- Removed copy_service_definition() function entirely
- Removed call to copy_service_definition() from main()
- Machine setup now only creates infrastructure (user, NFS, Docker, metadata)
- Service deployment is now exclusively handled by 'tin service deploy'
- Updated terminology: 'Service:' → 'Machine:' in output
Fixes empty service directories with same name as machine (e.g., act-3-dev/service/act-3/).
Commands were still using the old /mnt/$service_env/.env location.
Updated to:
- Source .machine/machine.env (machine infrastructure)
- Source service/$service/.env (service-specific vars)
- Use same env loading pattern as deploy.sh (set -a, unset XDG)
Fixes 'No such file or directory' errors in tin service logs/status.
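The loading pattern described above can be sketched as a small helper (the function name, arguments, and paths are illustrative, not the actual tin implementation):

```shell
# load_service_env MOUNT SERVICE -- hypothetical helper sketching the
# pattern above: machine env first, then service env, both auto-exported.
load_service_env() {
  local mount="$1" service="$2"
  set -a                                      # auto-export every assignment
  source "$mount/.machine/machine.env"        # machine infrastructure vars
  source "$mount/service/$service/.env"       # service-specific TIN_PORT_* vars
  set +a
}
```

With set -a active while sourcing, docker compose later sees the variables without explicit export statements; the unset-XDG step mentioned above would happen alongside this, before any Docker commands run.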
The sheet creation was trying to write the machine registry to
/mnt/station-prod/data/machines/<sheet>, which would fail if the
directory already existed from previous machine creations.
Changed to:
- Create directory if needed
- Write registry file to <sheet>/registry
This handles both fresh sheet creation and re-creation after removal.
Fixes error: 'tee: /mnt/station-prod/data/machines/<sheet>: Is a directory'
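The fix can be sketched as a small shell helper (the function name and arguments here are hypothetical):

```shell
# write_registry SHEET_DIR ENTRY -- sketch of the fixed write: create the
# sheet directory if needed, then write to <sheet>/registry (a file inside
# the directory) rather than to the directory path itself.
write_registry() {
  local sheet_dir="$1" entry="$2"
  mkdir -p "$sheet_dir"                           # no-op if it already exists
  printf '%s\n' "$entry" >> "$sheet_dir/registry" # file, not the dir
}
```

Because mkdir -p is idempotent, the same code path handles both fresh sheet creation and re-creation after removal.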
Login shells (the -i flag) run initialization scripts (.bash_profile, etc.),
which can reset environment variables. Since we explicitly source the
env files we need, login shell behavior is unnecessary.
Before: sudo -u user -i bash -c "source env && ..."
After: sudo -u user bash -c "source env && ..."
This fixes docker compose warnings about unset variables.
Also fixes temp file permissions in port deallocation by creating the
temp file as the station-prod user instead of the current user.
Refs: ooda/2025-10-multi-service-architecture/act/03-port-allocation/BRANCH.md
Allow redeployment of services without removing them first.
Behavior:
- If service exists with matching port count: Reuse allocation
- If service exists with different port count: Error (requires rm/deploy)
- If service doesn't exist: Allocate new ports as before
Benefits:
- Update service files without removing
- Retry failed deployments easily
- More ergonomic workflow
- Matches Docker/Kubernetes expectations
Errors only on an actual conflict (port count mismatch).
Refs: ooda/2025-10-multi-service-architecture/act/03-port-allocation/plan.md
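The decision rule above can be sketched as a tiny function (a hypothetical reduction of the real check, which would read the port registry):

```shell
# decide_allocation EXISTING_COUNT REQUESTED_COUNT -- sketch of the
# idempotency rule above; EXISTING_COUNT is empty for a new service.
decide_allocation() {
  local existing="$1" requested="$2"
  if [ -z "$existing" ]; then
    echo "allocate"             # no prior deployment: allocate fresh ports
  elif [ "$existing" -eq "$requested" ]; then
    echo "reuse"                # same port count: reuse the allocation
  else
    echo "conflict: port count mismatch" >&2
    return 1                    # requires rm + deploy
  fi
}
```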
deallocate_service_ports() calls parse_machine_name() but metadata.sh
wasn't sourcing uid.sh where that function is defined.
This caused port deallocation to fail with:
ERROR: Invalid machine environment: test-ports-dev
Refs: ooda/2025-10-multi-service-architecture/act/03-port-allocation/BRANCH.md
After ACT-2, machine infrastructure moved to .machine/machine.env.
Service deployment now needs both:
- Machine infrastructure: .machine/machine.env
- Service ports: service/<catalog>/.env
Updated:
- cmd/service/deploy.sh: Source both env files for setup.sh and docker compose
- cmd/service/rm.sh: Use both env files for docker compose down
This bridges ACT-2 and ACT-3 until ACT-4 adds env_file directive to
docker-compose.yml files.
Refs: ooda/2025-10-multi-service-architecture/act/03-port-allocation/plan.md
Refs: ooda/2025-10-multi-service-architecture/act/03-port-allocation/
- Add port allocation functions to lib/metadata.sh:
- get_service_port_count(): Parse docker-compose.yml for port needs
- find_available_port_range(): First-fit allocation algorithm
- allocate_service_ports(): Central port allocation with registry
- deallocate_service_ports(): Free ports on service removal
- generate_service_env(): Create service .env with TIN_PORT_* vars
- Update cmd/service/deploy.sh:
- Allocate ports before service deployment
- Generate per-service .env files
- Show allocated port range in success message
- Clean up on allocation failure
- Update cmd/service/rm.sh:
- Deallocate ports on service removal
- Update success message to note freed ports
- Remove old allocate_service_ports() from lib/core.sh
- Old implementation did not support multi-service coordination
- Fix .gitignore: Change service/ to /service/ to allow cmd/service/ tracking
- Add cmd/service/*.sh files (previously untracked)
- These CLI commands are part of the platform
- Update OODA documentation:
- Add ACT-3 plan.md with implementation details
- Update outcomes.md with port-allocation outcome and tests
- Update BRANCH.md to track progress
Refs: ooda/2025-10-multi-service-architecture/orient/port-allocation-algorithm.md
Refs: ooda/2025-10-multi-service-architecture/act/03-port-allocation/plan.md
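A first-fit scan of the kind named above might look roughly like this (a self-contained sketch: the real functions consult the central registry, and the signature here is invented):

```shell
# find_available_port_range BASE COUNT PORT... -- return the lowest start
# port >= BASE whose COUNT-long range avoids every listed allocated port.
# Gaps left by deallocated services are reused automatically (first fit).
find_available_port_range() {
  local base="$1" count="$2"; shift 2
  local -A used=()
  local p start i ok
  for p in "$@"; do used[$p]=1; done
  start=$base
  while :; do
    ok=1
    for ((i = 0; i < count; i++)); do
      if [ -n "${used[$((start + i))]:-}" ]; then ok=0; break; fi
    done
    if [ "$ok" -eq 1 ]; then echo "$start"; return 0; fi
    start=$((start + 1))
  done
}
```

For example, with 50000 and 50002 already taken, a request for two ports lands at 50003, the first gap wide enough.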
Updated tracking documents to reflect ACT-2 completion:
- Status: Merged (765d255)
- All testing completed successfully
- ACT-3 no longer blocked
Refs: ooda/2025-10-multi-service-architecture/act/02-machine-metadata/BRANCH.md
Centralize machine metadata in station-prod to enable:
- Single source of truth for machine configuration
- Central observability of all machines
- Foundation for ACT-3 port allocation
Implementation:
- New lib/metadata.sh for metadata management
- machine.env + ports files in /mnt/station-prod/data/machines/<sheet>/<machine-env>/
- .machine symlink from mount point for easy access
- Updated registry to use <sheet>/registry directory structure
Tested on geneticalgorithm, all success criteria passed.
Closes: ooda/2025-10-multi-service-architecture/act/02-machine-metadata
Machine metadata implementation tested and verified on geneticalgorithm.
All success criteria passed.
Results:
- Metadata centralized in /mnt/station-prod/data/machines/<sheet>/<machine-env>/
- .machine symlink provides clean access path
- machine.env + ports files created correctly
- No regressions
Ready to merge to main.
Refs: ooda/2025-10-multi-service-architecture/act/02-machine-metadata/BRANCH.md
Update setup_station.sh to write registry to
/data/machines/<sheet>/registry instead of /data/machines/<sheet>.
Ensures sheet directory is created before writing registry file.
Refs: ooda/2025-10-multi-service-architecture/act/02-machine-metadata/plan.md
Update machine registry to use <sheet>/registry file structure instead
of flat <sheet> files. This makes room for machine metadata at
<sheet>/<machine-env>/.
Structure:
- Old: /data/machines/<sheet> (file)
- New: /data/machines/<sheet>/registry (file in directory)
/data/machines/<sheet>/<machine-env>/ (machine metadata)
Updated registry functions in lib/registry.sh and lib/uid.sh to append
'/registry' to all registry paths.
Refs: ooda/2025-10-multi-service-architecture/act/02-machine-metadata/plan.md
Remove SCRIPT_DIR definition from lib/metadata.sh to avoid overwriting
the caller's SCRIPT_DIR variable. Assumes core.sh is already sourced by
the caller (which it always is in our use cases).
Fixes path resolution issue where setup_service.sh couldn't find
setup_station.sh because SCRIPT_DIR was overwritten to point to lib/
instead of machine/.
Refs: ooda/2025-10-multi-service-architecture/act/02-machine-metadata/plan.md
Ensures metadata.sh is downloaded during tinsnip installation.
Refs: ooda/2025-10-multi-service-architecture/act/02-machine-metadata/plan.md
Create machine metadata in station-prod instead of .env at machine root:
- New lib/metadata.sh with metadata management functions
- machine.env + ports registry in /mnt/station-prod/data/machines/<sheet>/<machine-env>/
- .machine symlink from mount to station-prod for easy access
- Docker env appends to machine.env instead of .env
Benefits:
- Single source of truth in station-prod
- Central observability via ls /mnt/station-prod/data/machines/<sheet>/
- Foundation for ACT-3 port allocation
- Self-documenting via symlink (no env var complexity)
Refs: ooda/2025-10-multi-service-architecture/act/02-machine-metadata/plan.md
Move OODA tracking from docs/ooda/ to ooda/ for cleaner structure.
All outcome-based tracking now in top-level ooda/ directory.
Refs: ooda/2025-10-multi-service-architecture/workflow.md
Created implementation plan and defined outcome for centralizing machine
metadata in station-prod.
Outcome: Enable central observability of all machines from
/mnt/station-prod/data/machines/<sheet>/
Refs: ooda/2025-10-multi-service-architecture/orient/machine-metadata.md
Add section explaining BRANCH.md tracking with minimal template based on
ACT-1 experience. Documents which sections to keep (Timeline, Testing,
Issues, Notes) and which to omit (Commits table, PR Information, Status
Updates) as they were found to be unused or redundant.
Removed Commits table, PR Information, and Status Updates sections that
were never used. Kept Timeline, Testing, and Issues Encountered sections
which provided value during execution.
- Updated BRANCH.md: Status complete, timeline filled, issues documented
- Updated README.md: ACT progress 1/5 branches merged
- Testing completed on geneticalgorithm
- Outcome verified with naive Claude (4/4 test questions passed)
Line 52 still referenced $ssep_uid after the variable was renamed to $smep_uid.
This caused 'unbound variable' error when running tin machine list.
Caught during testing on geneticalgorithm.
Added test: 'Explain the difference between a UID and a port.'
This verifies readers understand UIDs and ports are different things
that share the SMEP numbering scheme.
Key changes:
- Added 'SMEP Applied to UIDs vs Ports' section in deployment_strategy.md
- Explicitly states UIDs and ports are different things sharing same numbering
- Machine UID always P=0 (system user ID)
- Machine ports P=0-9 (TCP/UDP port numbers)
- Updated README.md table to show both UID and port range
- Removed conflation of 'UID' when referring to ports
Makes terminology discipline explicit for readers.
Renamed and updated terminology throughout:
- CLAUDE.md: 'machine registry' in comments
- deployment_strategy.md: directory description
- service_integration_design.md: section title and paths
- atproto_pds_deployment_plan.md: phase title
- README.md: updated link and description
- Renamed docs/service-registry-update.md → machine-registry-update.md
- Updated all content within machine-registry-update.md
All user-facing terminology now consistently uses 'Machine Registry'.
- cmd/machine/rm.sh: Updated log message to 'machine registry'
- cmd/registry/show.sh: Updated label and function call to use list_machines()
- machine/teardown.sh: Updated comment
Updated SSEP→SMEP and S-SS-E-P→S-M-E-P in:
- manual_test_plan.md
- example_workflows.md
- service_integration_design.md
- implementation_plan.md
- cli_forward_refactor.md
- deployment_strategy.md (S-M-E-P examples section)
- sheet_configuration.md (M instead of SS)
- CLAUDE.md (S-M-E-P scheme validation)
All user-facing code and docs now use SMEP terminology consistently.
OODA planning docs preserved as-is for historical record.
- cmd/machine/create.sh: SSEP→SMEP in help text
- cmd/machine/list.sh: SSEP→SMEP in comments and variable names
- lib/core.sh: SSEP→SMEP in error message
Creates symlink: /data/services → /data/machines
Ensures old code paths continue to work transparently.
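A sketch of such a compatibility link (the data root is parameterized here; the real script presumably operates on the station data directory):

```shell
# link_legacy_services DATA_ROOT -- hypothetical sketch of the compat shim:
# point the old services path at the new machines directory.
link_legacy_services() {
  local data_root="$1"
  # -s symbolic, -f replace an existing link, -n don't follow an existing
  # symlink at the destination (avoids creating machines/machines)
  ln -sfn "$data_root/machines" "$data_root/services"
}
```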
- README.md: Updated UID scheme references and registry links
- CLAUDE.md: Updated architecture philosophy and registry paths
- sheet_configuration.md: Updated SMEP terminology and error messages
- Changed UID scheme name from SSEP to SMEP
- Updated SS (Service) to M (Machine) throughout
- Renamed Service Registry section to Machine Registry
- Updated registry paths from service-registry to machine-registry
- Updated port allocation table headers
- Renamed service_registry.md to machine_registry.md
- Updated all terminology: service→machine, SSEP→SMEP
- Updated file paths from /data/services to /data/machines
- create.sh: machine registry creation and examples
- rm.sh: machine registry removal and cascading
- show.sh: list machines instead of services
Core library changes:
- lib/registry.sh: Renamed all registry functions and variables
- lib/uid.sh: Updated to SMEP scheme with backward compatibility
- machine/setup_station.sh: Updated registry creation logic
Backward compatibility:
- Added deprecated wrappers for old function names
- Old code will work but show deprecation warnings
Document how to pass TIN_SERVICE_UID as a build argument for services that
need correct file ownership at build time. Covers:
- When and why to use build arguments
- Docker Compose build args configuration
- Dockerfile ARG usage and user creation
- Benefits of build-time UID awareness
- When to use vs skip this pattern
Based on lessons learned deploying Python services with UV where venv
ownership needs to match runtime UID.
Document the non-editable install approach for Python services using UV
package manager in tinsnip deployments. Covers:
- The editable install problem with workspace packages
- Solution using --no-install-workspace and non-editable install
- Complete Dockerfile and docker-compose.yml patterns
- Key points for UID isolation compatibility
- Common mistakes to avoid
- Local development workflow
Based on lessons learned deploying the account service.
The env-loader pattern ^[^-]+-[^-]+$ only matched service-env pairs
where the service name had no dashes (e.g. gazette-prod worked but
bsky-pds-dev failed). This caused DOCKER_HOST to be undefined, breaking
docker ps and other commands for services like bsky-pds.
Changed pattern to ^.+-[^-]+$ to allow dashes in service names while
keeping environment as single word.
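The two patterns can be compared directly with bash's =~ operator (a quick check, not the env-loader code itself):

```shell
old='^[^-]+-[^-]+$'   # service name must be dash-free: gazette-prod only
new='^.+-[^-]+$'      # greedy .+ keeps dashes in the service name, while
                      # the trailing [^-]+ still pins the env to one word
matches() { [[ "$1" =~ $2 ]]; }
```

Note that the pattern is kept in a variable and used unquoted on the right of =~; quoting it there would make bash treat it as a literal string.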
Critical fixes for rootless Docker in non-systemd environments:
1. Docker runtime directory detection
- Detect systemd availability at runtime
- Use ~/.docker/run when systemd is absent
- Use /run/user/$uid when systemd is available
- Fixes: lib/docker.sh (setup_docker_environment, ensure_docker_running)
2. Docker cgroup driver configuration
- Set cgroupfs driver when systemd not available
- Prevents "Interactive authentication required" errors
- Fixes: lib/docker.sh (configure_docker_data_root)
3. Machine name parsing with hyphens
- Created parse_machine_name() function
- Parses from end using known environment names
- Handles service names like "bsky-pds" correctly
- Fixes: lib/uid.sh, cmd/machine/list.sh, cmd/service/deploy.sh,
cmd/machine/rm.sh, cmd/machine/status.sh, cmd/status/show.sh,
machine/teardown.sh
4. Logging output to stderr
- Changed log_with_prefix() to output to stderr
- Prevents pollution of command substitution
- Fixes UID calculation capturing log output
- Fixes: lib/core.sh
5. Service command environment sourcing
- Added source /mnt/$service_env/.env before docker commands
- Ensures DOCKER_HOST and XDG_RUNTIME_DIR are set
- Fixes: cmd/service/deploy.sh, cmd/service/logs.sh
Refactoring:
- Consolidated scattered scripts into lib/ modules
- Moved CLI commands to cmd/ structure
- Removed legacy bin/tin-* scripts
Tested end-to-end with bsky-pds deployment on non-systemd Ubuntu.
All rootless Docker operations now work correctly.
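The parse-from-the-end idea in fix 3 can be sketched like this (the environment list and output format are assumptions, not the real parse_machine_name()):

```shell
# parse_machine_name NAME -- split "<service>-<env>" by matching a known
# environment suffix first, so dashes inside the service name survive.
parse_machine_name() {
  local name="$1" env
  for env in prod dev test; do            # assumed environment names
    if [[ "$name" == *-"$env" ]]; then
      echo "${name%-$env} $env"           # strip the suffix from the end
      return 0
    fi
  done
  echo "ERROR: Invalid machine environment: $name" >&2
  return 1
}
```

Splitting on the first dash would break "bsky-pds-dev" into "bsky" and "pds-dev"; matching a known environment from the end avoids the ambiguity entirely.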
- Create docs/ folder with lowercase filenames for all documentation
- Move test scripts to tests/ directory
- Keep CLAUDE.md at root for Claude Code compatibility
- Update all documentation references in README.md
- Add comprehensive Documentation section to README.md
- Remove completed SERVICE_REPO_MIGRATION.md
Project now has clean separation:
- Root: essential files only (README.md, CLAUDE.md, install.sh, setup.sh)
- docs/: organized documentation with lowercase names
- tests/: development and integration test scripts
- bin/, machine/, scripts/, service/, station/: functional directories
- Fix topsheet sheet number validation (expected 5, not 9)
- Remove custom sheet tests that require registry infrastructure
- Update port allocation tests to match sheet 5 UIDs (50000 range)
- Clean up validation tests to only test registry-independent functionality
- Add new documentation files for ATPROTO, nRF development, and service setup
- Update SSH key management to use standard ~/.ssh/ location instead of NFS-stored keys
- Add automatic directory recovery for NFS mount failures (handles partial teardowns)
- Complete migration from namespace to sheet terminology across all documentation
- Fix NFS path generation to use /volume1/tinsnip// structure
- Update installer to skip SSH key generation when working SSH already exists
- Remove hardcoded /volume1/tinsnip/station-prod paths in favor of mount points
- Consolidate mount recovery logic into single robust function
- Add interactive sudo fallback for directory creation on NAS
- Remove deprecated backward compatibility functions from lib.sh
- Fix remaining function calls in test files and machine scripts
- Update all bin wrappers: tin, tin-status, tin-machine use sheet terminology
- Update machine/scripts: setup_station.sh, validate_functions.sh, generate_nfs_exports.sh
- Update legacy scripts: deploy_from_catalog.sh, register_service.sh
- Fix system profile script: /etc/profile.d/tinsnip-namespace.sh → tinsnip-sheet.sh
- Remove obsolete namespace_proposals.txt
All components now consistently use SSEP (Sheet-Service-Environment-Port)
terminology with topsheet as the reserved platform sheet.
The namespace → sheet migration is now 100% complete.
- Updated NSEP scheme: N (Namespace) → N (Sheet)
- Function name: get_namespace_number() → get_sheet_number()
- All TIN_NAMESPACE → TIN_SHEET variable references
- Updated examples: 'dynamicalsystem namespace' → 'dynamicalsystem sheet'
- Preserved technical namespace references (Linux namespaces, etc.)
Sheet terminology migration is now 100% complete across all files.
- Updated documentation files (CLAUDE.md, CREATE_MACHINE.md, SHEET_CONFIGURATION.md)
- Updated setup.sh and legacy scripts to use TIN_SHEET and /etc/tinsnip-sheet
- All file path references now consistently use sheet terminology
Sheet terminology migration is now completely consistent.
- Updated all .md files: TIN_NAMESPACE → TIN_SHEET throughout
- Renamed NAMESPACE_CONFIGURATION.md → SHEET_CONFIGURATION.md
- Added comprehensive Sheet Concept section to CLAUDE.md explaining:
- Isolated UID ranges and storage per sheet
- Common use cases (personal/work, multi-tenant, env separation)
- The special topsheet sheet for platform services
- Sheet management commands
The namespace → sheet terminology migration is now complete.
All code, tests, docs, and concepts consistently use sheet terminology.
Major components completed:
- Registry functions: Updated to use sheet terminology and provide tin-sheet integration
- Machine scripts: All TIN_NAMESPACE → TIN_SHEET conversions
- Test files: Updated namespace references to sheet terminology
- CLI: tin-namespace → tin-sheet (registry integration working)
The core infrastructure now uses "sheet" terminology throughout.
Remaining: documentation updates and concept explanations.
- Renamed bin/tin-namespace to bin/tin-sheet
- Updated command help text to use sheet terminology
- Updated function names to use sheet terminology
- Partial migration - function implementations still need updating
This is part of the larger namespace→sheet terminology migration.
- Updated all test descriptions and output messages
- Changed function calls to use sheet terminology
- Updated variable names from 'namespace' to 'sheet'
- Changed registry file naming in tests
- All tests pass with new sheet terminology
Validates that the core sheet registry functionality works correctly.
- Rename core functions: get_namespace_number() → get_sheet_number()
- Update UID convention: NSEP → SSEP (Sheet-Service-Environment-Port)
- Change variable names: TIN_NAMESPACE → TIN_SHEET
- Update registry paths: /data/namespaces → /data/sheets
- Update config file: /etc/tinsnip-namespace → /etc/tinsnip-sheet
- Add backward compatibility functions with deprecation warnings
Core infrastructure now uses sheet terminology throughout.
- Change reserved namespace name from 'tinsnip' to 'topsheet'
- Resolves naming collision between product name and namespace name
- topsheet namespace (N=9) contains the global registry
- Updated all code references to use 'topsheet' as platform namespace
- Updated test files to validate topsheet reservation
This prepares for the full namespace→sheet terminology migration.
- Registry functions were looking for /volume1/tinsnip/station-prod/data (NAS filesystem)
- Should look for /mnt/station-prod/data (local mount point)
- This caused tin status to always show 'Not found' even when station was working
- Fixes CLI detection of successfully bootstrapped tinsnip.station-prod
- ssh_to_nas now properly handles -t option for terminal allocation
- Extracts SSH flags from arguments before processing authentication
- Enables automated directory creation with password sudo
- Fixes silent failures during NFS setup bootstrap
Implements coordinated port allocation for multi-service deployments.
Key Features:
- First-fit port allocation algorithm with gap support
- Automatic service .env generation with TIN_PORT_* variables
- Port deallocation on service removal
- Idempotent service deployment
Includes Bug Fixes:
- Sheet creation registry file path (15e877d)
- Service logs/status commands for ACT-2 structure (232dd58)
- Removed legacy service deployment from machine setup (93b1010)
Testing:
- Multi-service deployments verified
- Port conflict prevention confirmed
- Deallocation and reallocation tested
- HTTP endpoints validated
Branch: feature/port-allocation-algorithm
Commits: 11 commits
Status: All tests passing
Added comprehensive documentation for the environment variable loading
pattern used in service deployment:
- cmd/service/deploy.sh: Updated comments to clarify env loading
- docs/deployment_strategy.md: New section on Docker Compose env loading
- docs/service_creation.md: Added constraints and troubleshooting
- ooda/.../plan.md: Detailed explanation of env loading requirements
- lib/core.sh, lib/docker.sh: Minor formatting improvements
This pattern was debugged extensively during ACT-3 testing and must be
maintained to support both Docker Compose YAML interpolation and
rootless Docker daemon stability.
Comprehensive testing summary including:
- All core functionality verified (multi-service, port tracking, .env generation)
- Two test environments (test-ports-dev, act-3-dev)
- Detailed deployment test results with verification commands
- Three bug fixes documented with root causes and solutions
- Implementation summary with function list and integration points
- Port allocation algorithm description
- Ready for merge checklist
All ASCII characters (no emojis).
The copy_service_definition() function was creating /mnt/{machine}/service/{machine}/
during machine creation, which conflicts with the new multi-service architecture.
Changes:
- Removed copy_service_definition() function entirely
- Removed call to copy_service_definition() from main()
- Machine setup now only creates infrastructure (user, NFS, Docker, metadata)
- Service deployment is now exclusively handled by 'tin service deploy'
- Updated terminology: 'Service:' → 'Machine:' in output
Fixes empty service directories with same name as machine (e.g., act-3-dev/service/act-3/).
Commands were using old /mnt/$service_env/.env location.
Updated to:
- Source .machine/machine.env (machine infrastructure)
- Source service/$service/.env (service-specific vars)
- Use same env loading pattern as deploy.sh (set -a, unset XDG)
Fixes 'No such file or directory' errors in tin service logs/status.
The sheet creation was trying to write the machine registry to
/mnt/station-prod/data/machines/<sheet> which would fail if the
directory already existed from previous machine creations.
Changed to:
- Create directory if needed
- Write registry file to <sheet>/registry
This handles both fresh sheet creation and re-creation after removal.
Fixes error: 'tee: /mnt/station-prod/data/machines/<sheet>: Is a directory'
Login shells (-i flag) run initialization scripts (.bash_profile, etc)
which can reset environment variables. Since we explicitly source the
env files we need, we don't need login shell behavior.
Before: sudo -u user -i bash -c "source env && ..."
After: sudo -u user bash -c "source env && ..."
This fixes docker compose warnings about unset variables.
Also fix temp file permissions in port deallocation by creating temp
file as station-prod user instead of current user.
Refs: ooda/2025-10-multi-service-architecture/act/03-port-allocation/BRANCH.md
Allow redeployment of services without removing first.
Behavior:
- If service exists with matching port count: Reuse allocation
- If service exists with different port count: Error (requires rm/deploy)
- If service doesn't exist: Allocate new ports as before
Benefits:
- Update service files without removing
- Retry failed deployments easily
- More ergonomic workflow
- Matches Docker/Kubernetes expectations
Error only when actual conflict (port count mismatch).
Refs: ooda/2025-10-multi-service-architecture/act/03-port-allocation/plan.md
After ACT-2, machine infrastructure moved to .machine/machine.env.
Service deployment now needs both:
- Machine infrastructure: .machine/machine.env
- Service ports: service/<catalog>/.env
Updated:
- cmd/service/deploy.sh: Source both env files for setup.sh and docker compose
- cmd/service/rm.sh: Use both env files for docker compose down
This bridges ACT-2 and ACT-3 until ACT-4 adds env_file directive to
docker-compose.yml files.
Refs: ooda/2025-10-multi-service-architecture/act/03-port-allocation/plan.md
- Add port allocation functions to lib/metadata.sh:
- get_service_port_count(): Parse docker-compose.yml for port needs
- find_available_port_range(): First-fit allocation algorithm
- allocate_service_ports(): Central port allocation with registry
- deallocate_service_ports(): Free ports on service removal
- generate_service_env(): Create service .env with TIN_PORT_* vars
- Update cmd/service/deploy.sh:
- Allocate ports before service deployment
- Generate per-service .env files
- Show allocated port range in success message
- Clean up on allocation failure
- Update cmd/service/rm.sh:
- Deallocate ports on service removal
- Update success message to note freed ports
- Remove old allocate_service_ports() from lib/core.sh
- Old implementation did not support multi-service coordination
- Fix .gitignore: Change service/ to /service/ to allow cmd/service/ tracking
- Add cmd/service/*.sh files (previously untracked)
- These CLI commands are part of the platform
- Update OODA documentation:
- Add ACT-3 plan.md with implementation details
- Update outcomes.md with port-allocation outcome and tests
- Update BRANCH.md to track progress
Refs: ooda/2025-10-multi-service-architecture/orient/port-allocation-algorithm.md
Refs: ooda/2025-10-multi-service-architecture/act/03-port-allocation/plan.md
Centralize machine metadata in station-prod to enable:
- Single source of truth for machine configuration
- Central observability of all machines
- Foundation for ACT-3 port allocation
Implementation:
- New lib/metadata.sh for metadata management
- machine.env + ports files in /mnt/station-prod/data/machines/<sheet>/<machine-env>/
- .machine symlink from mount point for easy access
- Updated registry to use <sheet>/registry directory structure
Tested on geneticalgorithm, all success criteria passed.
Closes: ooda/2025-10-multi-service-architecture/act/02-machine-metadata
Machine metadata implementation tested and verified on geneticalgorithm.
All success criteria passed.
Results:
- Metadata centralized in /mnt/station-prod/data/machines/<sheet>/<machine-env>/
- .machine symlink provides clean access path
- machine.env + ports files created correctly
- No regressions
Ready to merge to main.
Refs: ooda/2025-10-multi-service-architecture/act/02-machine-metadata/BRANCH.md
Update machine registry to use <sheet>/registry file structure instead
of flat <sheet> files. This makes room for machine metadata at
<sheet>/<machine-env>/.
Structure:
- Old: /data/machines/<sheet> (file)
- New: /data/machines/<sheet>/registry (file in directory)
/data/machines/<sheet>/<machine-env>/ (machine metadata)
Updated registry functions in lib/registry.sh and lib/uid.sh to append
'/registry' to all registry paths.
Refs: ooda/2025-10-multi-service-architecture/act/02-machine-metadata/plan.md
Remove SCRIPT_DIR definition from lib/metadata.sh to avoid overwriting
the caller's SCRIPT_DIR variable. Assumes core.sh is already sourced by
the caller (which it always is in our use cases).
Fixes path resolution issue where setup_service.sh couldn't find
setup_station.sh because SCRIPT_DIR was overwritten to point to lib/
instead of machine/.
Refs: ooda/2025-10-multi-service-architecture/act/02-machine-metadata/plan.md
Create machine metadata in station-prod instead of .env at machine root:
- New lib/metadata.sh with metadata management functions
- machine.env + ports registry in /mnt/station-prod/data/machines/<sheet>/<machine-env>/
- .machine symlink from mount to station-prod for easy access
- Docker env appends to machine.env instead of .env
Benefits:
- Single source of truth in station-prod
- Central observability via ls /mnt/station-prod/data/machines/<sheet>/
- Foundation for ACT-3 port allocation
- Self-documenting via symlink (no env var complexity)
Refs: ooda/2025-10-multi-service-architecture/act/02-machine-metadata/plan.md
Key changes:
- Added 'SMEP Applied to UIDs vs Ports' section in deployment_strategy.md
- Explicitly states UIDs and ports are different things sharing same numbering
- Machine UID always P=0 (system user ID)
- Machine ports P=0-9 (TCP/UDP port numbers)
- Updated README.md table to show both UID and port range
- Removed conflation of 'UID' when referring to ports
Makes terminology discipline explicit for readers.
Renamed and updated terminology throughout:
- CLAUDE.md: 'machine registry' in comments
- deployment_strategy.md: directory description
- service_integration_design.md: section title and paths
- atproto_pds_deployment_plan.md: phase title
- README.md: updated link and description
- Renamed docs/service-registry-update.md → machine-registry-update.md
- Updated all content within machine-registry-update.md
All user-facing terminology now consistently uses 'Machine Registry'
Updated SSEP→SMEP and S-SS-E-P→S-M-E-P in:
- manual_test_plan.md
- example_workflows.md
- service_integration_design.md
- implementation_plan.md
- cli_forward_refactor.md
- deployment_strategy.md (S-M-E-P examples section)
- sheet_configuration.md (M instead of SS)
- CLAUDE.md (S-M-E-P scheme validation)
All user-facing code and docs now use SMEP terminology consistently.
OODA planning docs preserved as-is for historical record.
Core library changes:
- lib/registry.sh: Renamed all registry functions and variables
- lib/uid.sh: Updated to SMEP scheme with backward compatibility
- machine/setup_station.sh: Updated registry creation logic
Backward compatibility:
- Added deprecated wrappers for old function names
- Old code will work but show deprecation warnings
Document how to pass TIN_SERVICE_UID as a build argument for services that
need correct file ownership at build time. Covers:
- When and why to use build arguments
- Docker Compose build args configuration
- Dockerfile ARG usage and user creation
- Benefits of build-time UID awareness
- When to use vs skip this pattern
Based on lessons learned deploying Python services with UV where venv
ownership needs to match runtime UID.
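A minimal sketch of the pattern follows; the service name and user name are illustrative, and it assumes a Debian-style base image with useradd available.

```dockerfile
# docker-compose.yml passes the UID through as a build arg:
#
#   services:
#     myservice:
#       build:
#         context: .
#         args:
#           TIN_SERVICE_UID: ${TIN_SERVICE_UID}
#
# The Dockerfile then creates the runtime user at that UID, so files
# created during the build are owned by the same UID the container runs as:
ARG TIN_SERVICE_UID=1000
RUN useradd --uid "${TIN_SERVICE_UID}" --create-home svc
USER svc
```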
Document the non-editable install approach for Python services using UV
package manager in tinsnip deployments. Covers:
- The editable install problem with workspace packages
- Solution using --no-install-workspace and non-editable install
- Complete Dockerfile and docker-compose.yml patterns
- Key points for UID isolation compatibility
- Common mistakes to avoid
- Local development workflow
Based on lessons learned deploying the account service.
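A sketch of the pattern; the base image and paths are illustrative, and the flags follow uv's published Docker guidance rather than a verbatim copy of the service Dockerfile.

```dockerfile
FROM python:3.12-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app

# 1) Resolve dependencies only; skipping workspace packages keeps this
#    layer cacheable across source-code changes
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-install-workspace

# 2) Copy sources, then install the workspace non-editably so the venv
#    holds real copies rather than editable path links back into /app
COPY . .
RUN uv sync --frozen --no-editable
```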
The env-loader pattern ^[^-]+-[^-]+$ only matched service-env pairs
where the service name had no dashes (e.g. gazette-prod worked but
bsky-pds-dev failed). This caused DOCKER_HOST to be undefined, breaking
docker ps and other commands for services like bsky-pds.
Changed the pattern to ^.+-[^-]+$ to allow dashes in service names while
keeping the environment as a single dash-free word.
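The corrected behavior can be demonstrated directly: the environment is the final dash-free segment, and the service name may itself contain dashes.

```shell
#!/usr/bin/env bash
# Demonstrates the corrected env-loader pattern from the fix above.
for name in gazette-prod bsky-pds-dev; do
  if [[ "$name" =~ ^.+-[^-]+$ ]]; then
    service="${name%-*}"   # strip the trailing -env
    env="${name##*-}"      # keep only the last segment
    echo "$name -> service=$service env=$env"
  fi
done
```

With the old ^[^-]+-[^-]+$ pattern, bsky-pds-dev would fail the match entirely, so the loader never set DOCKER_HOST for that service.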
Critical fixes for rootless Docker in non-systemd environments:
1. Docker runtime directory detection
- Detect systemd availability at runtime
- Use ~/.docker/run when systemd is absent
- Use /run/user/$uid when systemd is available
- Fixes: lib/docker.sh (setup_docker_environment, ensure_docker_running)
2. Docker cgroup driver configuration
- Set cgroupfs driver when systemd not available
- Prevents "Interactive authentication required" errors
- Fixes: lib/docker.sh (configure_docker_data_root)
3. Machine name parsing with hyphens
- Created parse_machine_name() function
- Parses from end using known environment names
- Handles service names like "bsky-pds" correctly
- Fixes: lib/uid.sh, cmd/machine/list.sh, cmd/service/deploy.sh,
cmd/machine/rm.sh, cmd/machine/status.sh, cmd/status/show.sh,
machine/teardown.sh
4. Logging output to stderr
- Changed log_with_prefix() to output to stderr
- Prevents pollution of command substitution
- Fixes UID calculation capturing log output
- Fixes: lib/core.sh
5. Service command environment sourcing
- Added source /mnt/$service_env/.env before docker commands
- Ensures DOCKER_HOST and XDG_RUNTIME_DIR are set
- Fixes: cmd/service/deploy.sh, cmd/service/logs.sh
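Fix 3 can be sketched roughly as follows; the actual parse_machine_name() implementation and the set of known environment names may differ.

```shell
#!/usr/bin/env bash
# Sketch: parse "<service>-<env>" from the end using a known-environment
# list, so dashed service names like "bsky-pds" split correctly.
parse_machine_name() {
  local machine="$1" env
  for env in prod dev test; do   # assumed environment set
    if [[ "$machine" == *"-$env" ]]; then
      printf '%s %s\n' "${machine%-$env}" "$env"
      return 0
    fi
  done
  return 1
}

parse_machine_name "bsky-pds-dev"
```

Splitting on the first dash would yield service "bsky" and environment "pds-dev"; anchoring on the known environment suffix avoids that ambiguity.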
Refactoring:
- Consolidated scattered scripts into lib/ modules
- Moved CLI commands to cmd/ structure
- Removed legacy bin/tin-* scripts
Tested end-to-end with bsky-pds deployment on non-systemd Ubuntu.
All rootless Docker operations now work correctly.
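The runtime-directory choice from fix 1 can be sketched as below; the systemd detection heuristic and socket path are assumptions, not the exact lib/docker.sh code.

```shell
#!/usr/bin/env bash
# Sketch: pick the rootless Docker runtime directory based on whether
# systemd is managing the session.
if [[ -d /run/systemd/system ]]; then
  docker_run_dir="/run/user/$(id -u)"   # systemd manages the user runtime dir
else
  docker_run_dir="$HOME/.docker/run"    # fall back when systemd is absent
fi
export DOCKER_HOST="unix://${docker_run_dir}/docker.sock"
echo "$DOCKER_HOST"
```

On non-systemd hosts, /run/user/$uid simply does not exist, which is why every docker command needs DOCKER_HOST pointed at a directory the user actually owns.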
- Create docs/ folder with lowercase filenames for all documentation
- Move test scripts to tests/ directory
- Keep CLAUDE.md at root for Claude Code compatibility
- Update all documentation references in README.md
- Add comprehensive Documentation section to README.md
- Remove completed SERVICE_REPO_MIGRATION.md
Project now has clean separation:
- Root: essential files only (README.md, CLAUDE.md, install.sh, setup.sh)
- docs/: organized documentation with lowercase names
- tests/: development and integration test scripts
- bin/, machine/, scripts/, service/, station/: functional directories
- Fix topsheet sheet number validation (expected 5, not 9)
- Remove custom sheet tests that require registry infrastructure
- Update port allocation tests to match sheet 5 UIDs (50000 range)
- Clean up validation tests to only test registry-independent functionality
- Add new documentation files for ATPROTO, nRF development, and service setup
- Update SSH key management to use standard ~/.ssh/ location instead of NFS-stored keys
- Add automatic directory recovery for NFS mount failures (handles partial teardowns)
- Complete migration from namespace to sheet terminology across all documentation
- Fix NFS path generation to use /volume1/tinsnip// structure
- Update installer to skip SSH key generation when working SSH already exists
- Remove hardcoded /volume1/tinsnip/station-prod paths in favor of mount points
- Consolidate mount recovery logic into single robust function
- Add interactive sudo fallback for directory creation on NAS
- Remove deprecated backward compatibility functions from lib.sh
- Fix remaining function calls in test files and machine scripts
- Update all bin wrappers: tin, tin-status, tin-machine use sheet terminology
- Update machine/scripts: setup_station.sh, validate_functions.sh, generate_nfs_exports.sh
- Update legacy scripts: deploy_from_catalog.sh, register_service.sh
- Fix system profile script: /etc/profile.d/tinsnip-namespace.sh → tinsnip-sheet.sh
- Remove obsolete namespace_proposals.txt
All components now consistently use SSEP (Sheet-Service-Environment-Port)
terminology with topsheet as the reserved platform sheet.
The namespace → sheet migration is now 100% complete.
- Updated NSEP scheme to SSEP: N (Namespace) → S (Sheet)
- Function name: get_namespace_number() → get_sheet_number()
- All TIN_NAMESPACE → TIN_SHEET variable references
- Updated examples: 'dynamicalsystem namespace' → 'dynamicalsystem sheet'
- Preserved technical namespace references (Linux namespaces, etc.)
Sheet terminology migration is now 100% complete across all files.
- Updated all .md files: TIN_NAMESPACE → TIN_SHEET throughout
- Renamed NAMESPACE_CONFIGURATION.md → SHEET_CONFIGURATION.md
- Added comprehensive Sheet Concept section to CLAUDE.md explaining:
- Isolated UID ranges and storage per sheet
- Common use cases (personal/work, multi-tenant, env separation)
- The special topsheet sheet for platform services
- Sheet management commands
The namespace → sheet terminology migration is now complete.
All code, tests, docs, and concepts consistently use sheet terminology.
Major components completed:
- Registry functions: Updated to use sheet terminology and provide tin-sheet integration
- Machine scripts: All TIN_NAMESPACE → TIN_SHEET conversions
- Test files: Updated namespace references to sheet terminology
- CLI: tin-namespace → tin-sheet (registry integration working)
The core infrastructure now uses "sheet" terminology throughout.
Remaining: documentation updates and concept explanations.
- Updated all test descriptions and output messages
- Changed function calls to use sheet terminology
- Updated variable names from 'namespace' to 'sheet'
- Changed registry file naming in tests
- All tests pass with new sheet terminology
Validates that the core sheet registry functionality works correctly.
- Rename core functions: get_namespace_number() → get_sheet_number()
- Update UID convention: NSEP → SSEP (Sheet-Service-Environment-Port)
- Change variable names: TIN_NAMESPACE → TIN_SHEET
- Update registry paths: /data/namespaces → /data/sheets
- Update config file: /etc/tinsnip-namespace → /etc/tinsnip-sheet
- Add backward compatibility functions with deprecation warnings
Core infrastructure now uses sheet terminology throughout.
- Change reserved namespace name from 'tinsnip' to 'topsheet'
- Resolves naming collision between product name and namespace name
- topsheet namespace (N=9) contains the global registry
- Updated all code references to use 'topsheet' as platform namespace
- Updated test files to validate topsheet reservation
This prepares for the full namespace→sheet terminology migration.
- Registry functions were looking for /volume1/tinsnip/station-prod/data (NAS filesystem)
- Should look for /mnt/station-prod/data (local mount point)
- This caused tin status to always show 'Not found' even when station was working
- Fixes CLI detection of successfully bootstrapped tinsnip.station-prod
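The corrected lookup can be sketched as follows; the function name is illustrative, and the paths are mocked under a temp root (the real fix probes /mnt/station-prod/data).

```shell
#!/usr/bin/env bash
# Sketch: tin status should probe the local mount point, not the NAS path.
station_status() {
  local data_root="$1/station-prod/data"
  [[ -d "$data_root" ]] && echo "Running" || echo "Not found"
}

root="$(mktemp -d)"
station_status "$root"                 # -> Not found
mkdir -p "$root/station-prod/data"
station_status "$root"                 # -> Running
```

Probing the NAS-side path (/volume1/...) only works on the NAS itself, which is why the CLI reported 'Not found' from machines where the station was in fact healthy.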