pydantic#

pydantic is a library, not part of the language, but it's become foundational enough that it's worth understanding.

the core idea#

python's type hints don't do anything at runtime by default. def foo(x: int) accepts strings, floats, whatever - the annotation is just documentation.
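a quick illustration (toy function, not from the source):

def double(x: int) -> int:
    return x * 2

double("5")  # returns "55" - python never checks the annotation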

pydantic makes them real. define a model with type hints, and pydantic validates and coerces data to match:

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

user = User(name="alice", age="25")  # age coerced to int
user = User(name="alice", age="not a number")  # raises ValidationError

this is why pydantic shows up everywhere in python - it bridges the gap between python's dynamic runtime and the desire for validated, typed data.
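the same model works on serialized input. assuming pydantic v2, model_validate_json parses and validates in one step (reusing the User model above):

raw = '{"name": "alice", "age": "25"}'
user = User.model_validate_json(raw)  # parses json, coerces age to int
print(user.age + 1)                   # 26 - a real int, not a string
print(user.model_dump())              # {'name': 'alice', 'age': 25}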

settings from environment#

the most common use: replacing os.getenv() calls with validated configuration.

from pathlib import Path

from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """settings for atproto cli."""

    model_config = SettingsConfigDict(
        env_file=str(Path.cwd() / ".env"),
        env_file_encoding="utf-8",
        extra="ignore",
        case_sensitive=False,
    )

    atproto_pds_url: str = Field(
        default="https://bsky.social",
        description="PDS URL",
    )
    atproto_handle: str = Field(default="", description="Your atproto handle")
    atproto_password: str = Field(default="", description="Your atproto app password")

settings = Settings()

model_config controls where settings come from (environment, .env files) and how to handle unknowns. fields without defaults are required, and missing ones fail the moment Settings() is instantiated - here at import time, not later when you try to use them.

from pdsx/_internal/config.py
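a sketch of the environment flow, reusing the Settings class above - with case_sensitive=False, ATPROTO_HANDLE and atproto_handle both match the field (the value here is made up):

import os

os.environ["ATPROTO_HANDLE"] = "alice.example.com"

settings = Settings()
print(settings.atproto_handle)  # "alice.example.com" - read from the environment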

annotated types for reusable validation#

when multiple schemas share the same validation logic, bind it to the type itself instead of repeating @field_validator on each schema:

from datetime import timedelta
from typing import Annotated
from pydantic import AfterValidator, BaseModel

def _validate_non_negative_timedelta(v: timedelta) -> timedelta:
    if v < timedelta(seconds=0):
        raise ValueError("timedelta must be non-negative")
    return v

NonNegativeTimedelta = Annotated[
    timedelta,
    AfterValidator(_validate_non_negative_timedelta)
]

class RunDeployment(BaseModel):
    schedule_after: NonNegativeTimedelta
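used in a model, the validator fires on every instantiation:

RunDeployment(schedule_after=timedelta(minutes=5))   # ok
RunDeployment(schedule_after=timedelta(seconds=-1))  # raises ValidationError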

benefits:

  • write validation once
  • field types become swappable interfaces
  • types are self-documenting

from coping with python's type system

model_validator for side effects#

run setup code when settings load:

from typing import Self
from pydantic import model_validator
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    debug: bool = False

    @model_validator(mode="after")
    def configure_logging(self) -> Self:
        setup_logging(debug=self.debug)  # project helper, defined elsewhere
        return self

settings = Settings()  # logging configured on import

the validator runs after all fields are set. use for side effects that depend on configuration values.

from bot/config.py
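the same hook also fits cross-field checks, since mode="after" sees the fully built model (a hedged sketch, not from the source project):

from typing import Self
from pydantic import BaseModel, model_validator

class Window(BaseModel):
    start: int
    end: int

    @model_validator(mode="after")
    def check_order(self) -> Self:
        if self.end < self.start:
            raise ValueError("end must be >= start")
        return self

Window(start=1, end=5)  # ok
Window(start=5, end=1)  # raises ValidationError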

when to use what#

pydantic models are heavier than they look - they do a lot of work on instantiation. for internal data you control, python's dataclasses are simpler:

from dataclasses import dataclass

@dataclass
class BatchResult:
    """result of a batch operation."""
    successful: list[str]
    failed: list[tuple[str, Exception]]

    @property
    def total(self) -> int:
        return len(self.successful) + len(self.failed)

no validation, no coercion, just a class with fields. use pydantic at boundaries (API input, config files, external data) where you need validation. use dataclasses for internal data structures.

from pdsx/_internal/batch.py
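the flip side of that simplicity: dataclasses never check anything at runtime (a quick sketch using BatchResult above):

result = BatchResult(successful="not a list", failed=[])  # no error
print(result.total)  # 10 - len() of the string, not a count of items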
