# testing

testing philosophy and infrastructure for plyr.fm.

## philosophy

### test behavior, not implementation

tests should verify *what* the code does, not *how* it does it. this makes tests resilient to refactoring and keeps them focused on user-facing behavior.

**good**: "when a user likes a track, the like count increases"
**bad**: "when `_increment_like_counter` is called, it executes `UPDATE tracks SET...`"

signs you're testing implementation:

- mocking internal functions that aren't boundaries
- asserting on SQL queries or ORM calls
- testing private methods directly
- tests break when you refactor without changing behavior

### test at the right level

- **unit tests**: pure functions, utilities, data transformations
- **integration tests**: API endpoints with real database
- **skip mocks when possible**: prefer real dependencies (postgres, redis) over mocks

### keep tests fast

slow tests don't get run. we use parallel execution (xdist) and template databases to keep the full suite under 30 seconds.

## parallel execution with xdist

we run tests in parallel using pytest-xdist. each worker gets its own isolated database.

### how it works

1. **template database**: the first worker creates a template with all migrations applied
2. **clone per worker**: each xdist worker clones from the template (`CREATE DATABASE ... WITH TEMPLATE`)
3. **instant setup**: cloning is a file copy - no migrations needed per worker
4. **advisory locks**: coordinate template creation between workers

this is a common pattern for fast parallel test execution in large codebases.
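the clone-per-worker step can be sketched as a few pure helpers. these are illustrative reconstructions, not the real fixture code: the `plyr_test` database prefix, the helper names, and the lock-key derivation are all assumptions.

```python
import hashlib

# hypothetical template database name (assumption, not the real config)
TEMPLATE_DB = "plyr_test_template"


def worker_db_name(worker_id: str) -> str:
    """per-worker database name; xdist passes ids like "gw0", "gw1", ..."""
    return f"plyr_test_{worker_id}"


def clone_sql(worker_id: str) -> str:
    """CREATE DATABASE ... WITH TEMPLATE is a file-level copy, so no
    migrations need to run for the cloned database."""
    return (
        f'CREATE DATABASE "{worker_db_name(worker_id)}" '
        f'WITH TEMPLATE "{TEMPLATE_DB}"'
    )


def advisory_lock_key(name: str = TEMPLATE_DB) -> int:
    """stable signed 64-bit key for pg_advisory_lock, so only one worker
    builds the template while the others wait on the same lock."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)
```

the advisory-lock key just needs to be a bigint that every worker derives identically; hashing the template name is one common way to get that.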
### the fixture chain

```
test_database_url (session)
└── creates template db (once, with advisory lock)
└── clones worker db from template
└── patches settings.database.url for this worker

_database_setup (session)
└── marker that db is ready

_engine (function)
└── creates engine for test_database_url
└── clears ENGINES cache

_clear_db (function)
└── calls clear_database() procedure after each test

db_session (function)
└── provides AsyncSession for test
```

### common pitfall: missing db_session dependency

if a test uses the FastAPI app but doesn't depend on `db_session`, the database URL won't be patched for the worker, and the test will connect to the wrong database.

**wrong**:

```python
@pytest.fixture
def test_app() -> FastAPI:
    return app


async def test_something(test_app: FastAPI):
    # may connect to wrong database in xdist!
    ...
```

**right**:

```python
@pytest.fixture
def test_app(db_session: AsyncSession) -> FastAPI:
    _ = db_session  # ensures db fixtures run first
    return app


async def test_something(test_app: FastAPI):
    # database URL is correctly patched
    ...
```

## running tests

```bash
# from repo root
just backend test

# run specific test
just backend test tests/api/test_tracks.py

# run with coverage
just backend test --cov

# run single-threaded (debugging)
just backend test -n 0
```

## writing good tests

### do

- use descriptive test names that describe behavior
- one assertion per concept (multiple asserts ok if testing one behavior)
- use fixtures for setup, not the test body
- test edge cases and error conditions
- add regression tests when fixing bugs

### don't

- use `@pytest.mark.asyncio` - we use `asyncio_mode = "auto"`
- mock database calls - use real postgres
- test ORM internals or SQL structure
- leave tests that depend on execution order
- skip tests instead of fixing them (unless truly environment-specific)

## when private function tests are acceptable

generally avoid testing private functions (`_foo`), but there are pragmatic exceptions:

**acceptable**:

- pure utility functions with complex logic (string parsing, data transformation)
- functions that are difficult to exercise through the public API alone
- private functions that are a clear unit with a stable interface

**not acceptable**:

- implementation details that might change (crypto internals, caching strategy)
- internal orchestration functions
- anything that's already exercised by integration tests

the key question: "if i refactor, will this test break even though behavior didn't change?"

## database fixtures

### clear_database procedure

instead of truncating tables between tests (slow), we use a stored procedure that deletes only rows created during the test:

```sql
CALL clear_database(:test_start_time)
```

this deletes rows where `created_at > test_start_time`, preserving any seed data.

### why not transactions?

rolling back transactions is faster, but:

- can't test commit behavior
- can't test constraints properly
- some ORMs behave differently in uncommitted transactions

delete-by-timestamp gives us real commits while staying fast.
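the delete-by-timestamp idea can be sketched in a few lines of python. this is a sketch only: the real logic lives in the `clear_database` stored procedure, the table list here is hypothetical, and the real procedure would use a bound parameter rather than string interpolation.

```python
from datetime import datetime, timezone

# hypothetical table list (assumption); ordered children-before-parents so
# foreign keys don't block the deletes
TABLES = ["likes", "tracks", "artists", "users"]


def clear_statements(test_start_time: datetime) -> list[str]:
    """build one DELETE per table that removes only rows created during
    the test, leaving seed data (created before the test) untouched."""
    ts = test_start_time.isoformat()
    return [f"DELETE FROM {table} WHERE created_at > '{ts}'" for table in TABLES]
```

because the deletes run against committed data, tests exercise real commit behavior and constraints, unlike the rollback approach.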
## redis isolation for parallel tests

tests that use redis (caching, background tasks) need isolation between xdist workers. without isolation, one worker's cache entries pollute another's tests.

### how it works

each xdist worker uses a different redis database number:

| worker     | redis db |
|------------|----------|
| master/gw0 | 1        |
| gw1        | 2        |
| gw2        | 3        |
| ...        | ...      |

db 0 is reserved for local development.

### the redis_database fixture

```python
@pytest.fixture(scope="session", autouse=True)
def redis_database(worker_id: str) -> Generator[None, None, None]:
    """use isolated redis databases for parallel test execution."""
    db = _redis_db_for_worker(worker_id)
    new_url = _redis_url_with_db(settings.docket.url, db)

    # patch settings for this worker process
    settings.docket.url = new_url
    os.environ["DOCKET_URL"] = new_url
    clear_client_cache()

    # flush db before tests
    sync_redis = redis.Redis.from_url(new_url)
    sync_redis.flushdb()
    sync_redis.close()

    yield

    # flush after tests
    ...
```

this fixture is `autouse=True` so it applies to all tests automatically.

### common pitfall: unique URIs in cache tests

even with per-worker database isolation, tests within the same worker share redis state. if multiple tests use the same cache keys, they can interfere with each other.

**wrong**:

```python
async def test_caching_first():
    uris = ["at://did:plc:test/fm.plyr.track/1"]  # generic URI
    result = await get_active_copyright_labels(uris)
    # caches the result


async def test_caching_second():
    uris = ["at://did:plc:test/fm.plyr.track/1"]  # same URI!
    result = await get_active_copyright_labels(uris)
    # gets cached value from first test - may fail unexpectedly
```

**right**:

```python
async def test_caching_first():
    uris = ["at://did:plc:first/fm.plyr.track/1"]  # unique to this test
    ...


async def test_caching_second():
    uris = ["at://did:plc:second/fm.plyr.track/1"]  # different URI
    ...
```

use unique identifiers (test name, uuid, etc.) in cache keys to avoid cross-test pollution.