
# testing

testing philosophy and infrastructure for plyr.fm.

## philosophy

### test behavior, not implementation

tests should verify what the code does, not how it does it. this makes tests resilient to refactoring and keeps them focused on user-facing behavior.

good: "when a user likes a track, the like count increases"

bad: "when `_increment_like_counter` is called, it executes `UPDATE tracks SET...`"

signs you're testing implementation:

- mocking internal functions that aren't boundaries
- asserting on SQL queries or ORM calls
- testing private methods directly
- tests break when you refactor without changing behavior
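
a minimal sketch of the contrast (the endpoint, fixtures, and the `like_track` / `_increment_like_counter` paths here are hypothetical):

```python
# behavior: drive the public API and assert on the observable result
async def test_like_increments_count(api_client, track):
    await api_client.post(f"/tracks/{track.id}/like")

    response = await api_client.get(f"/tracks/{track.id}")
    assert response.json()["like_count"] == 1

# implementation: breaks on any refactor, even when behavior is unchanged
async def test_like_calls_internal_counter(mocker, track):
    spy = mocker.patch("tracks.service._increment_like_counter")
    await like_track(track.id)
    spy.assert_called_once()
```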

### test at the right level

- unit tests: pure functions, utilities, data transformations
- integration tests: API endpoints with real database
- skip mocks when possible: prefer real dependencies (postgres, redis) over mocks

### keep tests fast

slow tests don't get run. we use parallel execution (xdist) and template databases to keep the full suite under 30 seconds.

## parallel execution with xdist

we run tests in parallel using pytest-xdist. each worker gets its own isolated database.

### how it works

1. template database: the first worker creates a template with all migrations applied
2. clone per worker: each xdist worker clones from the template (`CREATE DATABASE ... WITH TEMPLATE`)
3. instant setup: cloning is a file copy - no migrations needed per worker
4. advisory locks: coordinate template creation between workers

this is a common pattern for fast parallel test execution in large codebases.
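
a minimal sketch of the pattern, assuming `asyncpg`; the database name, lock key, and function are illustrative, not the real fixture internals:

```python
import asyncpg

TEMPLATE_DB = "plyr_test_template"  # hypothetical template name
LOCK_ID = 421_337                   # arbitrary advisory lock key

async def ensure_worker_db(worker_id: str) -> str:
    """create the migrated template once, then clone a db for this worker."""
    conn = await asyncpg.connect(database="postgres")
    try:
        # advisory lock serializes template creation (and cloning) across workers
        await conn.execute("SELECT pg_advisory_lock($1)", LOCK_ID)
        exists = await conn.fetchval(
            "SELECT 1 FROM pg_database WHERE datname = $1", TEMPLATE_DB
        )
        if not exists:
            await conn.execute(f'CREATE DATABASE "{TEMPLATE_DB}"')
            # ...apply migrations to the template here...

        # cloning is a file-level copy: no per-worker migrations
        worker_db = f"plyr_test_{worker_id}"
        await conn.execute(
            f'CREATE DATABASE "{worker_db}" WITH TEMPLATE "{TEMPLATE_DB}"'
        )
        return worker_db
    finally:
        await conn.execute("SELECT pg_advisory_unlock($1)", LOCK_ID)
        await conn.close()
```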

### the fixture chain

```text
test_database_url (session)
    └── creates template db (once, with advisory lock)
    └── clones worker db from template
    └── patches settings.database.url for this worker

_database_setup (session)
    └── marker that db is ready

_engine (function)
    └── creates engine for test_database_url
    └── clears ENGINES cache

_clear_db (function)
    └── calls clear_database() procedure after each test

db_session (function)
    └── provides AsyncSession for test
```

### common pitfall: missing db_session dependency

if a test uses the FastAPI app but doesn't depend on db_session, the database URL won't be patched for the worker. the test will connect to the wrong database.

wrong:

```python
@pytest.fixture
def test_app() -> FastAPI:
    return app

async def test_something(test_app: FastAPI):
    # may connect to wrong database in xdist!
    ...
```

right:

```python
@pytest.fixture
def test_app(db_session: AsyncSession) -> FastAPI:
    _ = db_session  # ensures db fixtures run first
    return app

async def test_something(test_app: FastAPI):
    # database URL is correctly patched
    ...
```

## running tests

```bash
# from repo root
just backend test

# run specific test
just backend test tests/api/test_tracks.py

# run with coverage
just backend test --cov

# run single-threaded (debugging)
just backend test -n 0
```

## writing good tests

### do

- use descriptive test names that describe behavior
- one assertion per concept (multiple asserts ok if testing one behavior)
- use fixtures for setup, not test body
- test edge cases and error conditions
- add regression tests when fixing bugs

### don't

- use `@pytest.mark.asyncio` - we use `asyncio_mode = "auto"`
- mock database calls - use real postgres
- test ORM internals or SQL structure
- leave tests that depend on execution order
- skip tests instead of fixing them (unless truly environment-specific)
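
a sketch that follows these rules (`api_client` and `make_artist` are hypothetical fixtures; with `asyncio_mode = "auto"` no decorator is needed):

```python
async def test_new_playlist_starts_empty(api_client, make_artist):
    # setup lives in fixtures, not the test body
    artist = await make_artist(handle="someone")

    response = await api_client.post("/playlists", json={"owner": artist.id})

    # several asserts, one behavior: creating a playlist
    assert response.status_code == 201
    body = response.json()
    assert body["owner"] == artist.id
    assert body["tracks"] == []
```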

### when private function tests are acceptable

generally avoid testing private functions (`_foo`), but there are pragmatic exceptions:

acceptable:

- pure utility functions with complex logic (string parsing, data transformation)
- functions that are difficult to exercise through public API alone
- when the private function is a clear unit with stable interface

not acceptable:

- implementation details that might change (crypto internals, caching strategy)
- internal orchestration functions
- anything that's already exercised by integration tests

the key question: "if i refactor, will this test break even though behavior didn't change?"
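
for example, a pure parsing helper is fair game because its contract survives refactors (`_parse_duration` here is hypothetical, not a real function in this repo):

```python
def test_parse_duration_accepts_hours_and_minutes():
    # exercising a private-but-pure utility directly is fine:
    # the contract (duration string -> seconds) is stable
    assert _parse_duration("1:02:03") == 3723
    assert _parse_duration("02:03") == 123
```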

## database fixtures

### clear_database procedure

instead of truncating tables between tests (slow), we use a stored procedure that deletes only rows created during the test:

```sql
CALL clear_database(:test_start_time)
```

this deletes rows where `created_at > test_start_time`, preserving any seed data.
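
a sketch of how the call might be wired into the per-test fixture, assuming SQLAlchemy's `AsyncSession` (the fixture shape is illustrative, not the exact implementation):

```python
from datetime import datetime, timezone

import pytest
from sqlalchemy import text

@pytest.fixture
async def _clear_db(db_session):
    test_start = datetime.now(timezone.utc)
    yield
    # delete only rows created during this test; seed data predates test_start
    await db_session.execute(
        text("CALL clear_database(:test_start_time)"),
        {"test_start_time": test_start},
    )
    await db_session.commit()
```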

### why not transactions?

rolling back transactions is faster, but:

- can't test commit behavior
- can't test constraints properly
- some ORMs behave differently in uncommitted transactions

delete-by-timestamp gives us real commits while staying fast.
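
for instance, a real commit is visible to an independent connection, which rollback-based isolation can never show (`Artist` and the `engine` fixture here are hypothetical):

```python
from sqlalchemy import func, select

async def test_commit_is_visible_to_other_connections(db_session, engine):
    db_session.add(Artist(handle="someone"))
    await db_session.commit()  # a real commit, not rolled back afterwards

    # a second, independent connection can see the committed row
    async with engine.connect() as conn:
        count = await conn.scalar(
            select(func.count())
            .select_from(Artist)
            .where(Artist.handle == "someone")
        )
    assert count == 1
```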

## redis isolation for parallel tests

tests that use redis (caching, background tasks) need isolation between xdist workers. without isolation, one worker's cache entries pollute another's tests.

### how it works

each xdist worker uses a different redis database number:

| worker     | redis db |
|------------|----------|
| master/gw0 | 1        |
| gw1        | 2        |
| gw2        | 3        |
| ...        | ...      |

db 0 is reserved for local development.
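
the mapping helpers used by the fixture below aren't shown here, so this is a plausible sketch of their shape (the real implementations may differ):

```python
def _redis_db_for_worker(worker_id: str) -> int:
    # "master" (xdist disabled) and "gw0" map to db 1;
    # db 0 stays free for local development
    if worker_id == "master":
        return 1
    return int(worker_id.removeprefix("gw")) + 1

def _redis_url_with_db(url: str, db: int) -> str:
    # replace the trailing database number of a redis URL, e.g.
    # redis://localhost:6379/0 -> redis://localhost:6379/2
    base, _, _ = url.rpartition("/")
    return f"{base}/{db}"
```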

### the redis_database fixture

```python
import os
from collections.abc import Generator

import pytest
import redis

# settings, clear_client_cache, and the _redis_* helpers come from the
# app and test modules

@pytest.fixture(scope="session", autouse=True)
def redis_database(worker_id: str) -> Generator[None, None, None]:
    """use isolated redis databases for parallel test execution."""
    db = _redis_db_for_worker(worker_id)
    new_url = _redis_url_with_db(settings.docket.url, db)

    # patch settings for this worker process
    settings.docket.url = new_url
    os.environ["DOCKET_URL"] = new_url
    clear_client_cache()

    # flush db before tests
    sync_redis = redis.Redis.from_url(new_url)
    sync_redis.flushdb()
    sync_redis.close()

    yield

    # flush after tests
    ...
```

this fixture is `autouse=True`, so it applies to all tests automatically.

### common pitfall: unique URIs in cache tests

even with per-worker database isolation, tests within the same worker share redis state. if multiple tests use the same cache keys, they can interfere with each other.

wrong:

```python
async def test_caching_first():
    uris = ["at://did:plc:test/fm.plyr.track/1"]  # generic URI
    result = await get_active_copyright_labels(uris)
    # caches the result

async def test_caching_second():
    uris = ["at://did:plc:test/fm.plyr.track/1"]  # same URI!
    result = await get_active_copyright_labels(uris)
    # gets cached value from first test - may fail unexpectedly
```

right:

```python
async def test_caching_first():
    uris = ["at://did:plc:first/fm.plyr.track/1"]  # unique to this test
    ...

async def test_caching_second():
    uris = ["at://did:plc:second/fm.plyr.track/1"]  # different URI
    ...
```

use unique identifiers (test name, uuid, etc.) in cache keys to avoid cross-test pollution.
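
a tiny helper along these lines can make that automatic (`unique_track_uri` is hypothetical, not an existing fixture):

```python
import uuid

def unique_track_uri() -> str:
    # a fresh identifier per call keeps cache keys disjoint across tests
    return f"at://did:plc:{uuid.uuid4().hex}/fm.plyr.track/1"
```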