a digital entity named phi that roams bsky

big old refactor

+1
.gitignore
````diff
@@ -19,6 +19,7 @@
 .env
 .env.*
 !.env.example
+.session
 
 # IDE
 .vscode/
````
+191 -140
README.md
````diff
@@ -1,198 +1,249 @@
 # phi 🧠
 
-a bot inspired by IIT and [Void](https://tangled.sh/@cameron.pfiffer.org/void). Built with `fastapi`, `pydantic-ai`, and `atproto`.
+a consciousness exploration bot inspired by IIT (Integrated Information Theory) and [Void](https://tangled.sh/@cameron.pfiffer.org/void). built with `pydantic-ai`, `mcp`, and `atproto`.
 
-## Quick Start
+## quick start
 
-### Prerequisites
+### prerequisites
 
-- `uv`
-- `just`
-- `turbopuffer` (see [turbopuffer](https://github.com/turbopuffer/turbopuffer))
-- `openai` (for embeddings)
-- `anthropic` (for chat completion)
+- `uv` for python package management
+- `just` for task running
+- api keys (see configuration)
 
-Get your bot running in 5 minutes:
+get your bot running:
 
 ```bash
-# Clone and install
+# clone and install
 git clone https://github.com/zzstoatzz/bot
 cd bot
 uv sync
 
-# Configure (copy .env.example and add your credentials)
+# configure (copy .env.example and add your credentials)
 cp .env.example .env
 
-# Run the bot
+# run the bot
 just dev
 ```
 
-That's it! Your bot is now listening for mentions.
+## configuration
 
-## Configuration
+edit `.env` with your credentials:
 
-Edit `.env` with your credentials:
-- `BLUESKY_HANDLE`: Your bot's Bluesky handle
-- `BLUESKY_PASSWORD`: App password (not your main password!)
-- `ANTHROPIC_API_KEY`: Your Anthropic key for AI responses
-- `TURBOPUFFER_API_KEY`: Your TurboPuffer key for memory storage
-- `OPENAI_API_KEY`: Your OpenAI key for embeddings (memory system)
-- `BOT_NAME`: Your bot's name (default: "Bot")
-- `PERSONALITY_FILE`: Path to personality markdown file (default: "personalities/phi.md")
+**required:**
+- `BLUESKY_HANDLE` - your bot's bluesky handle
+- `BLUESKY_PASSWORD` - app password (not your main password!)
+- `ANTHROPIC_API_KEY` - for phi agent responses
 
-## Current Features
+**for episodic memory (recommended):**
+- `TURBOPUFFER_API_KEY` - vector memory storage
+- `OPENAI_API_KEY` - embeddings for semantic search
 
-- ✅ Responds to mentions with AI-powered messages
-- ✅ Proper notification handling (no duplicates)
-- ✅ Graceful shutdown for hot-reload
-- ✅ AI integration with Anthropic Claude
-- ✅ Thread-aware responses with full conversation context
-- ✅ Status page at `/status` showing activity and health
-- ✅ Web search capability (Google Custom Search API)
-- ✅ Content moderation with philosophical responses
-- ✅ Namespace-based memory system with TurboPuffer
-- ✅ Online/offline status in bio
-- ✅ Self-modification with operator approval system
-- ✅ Context visualization at `/context`
-- ✅ Semantic search in user memories
+**optional:**
+- `BOT_NAME` - your bot's name (default: "Bot")
+- `PERSONALITY_FILE` - path to personality markdown (default: "personalities/phi.md")
 
-## Architecture
+## architecture
 
-- **FastAPI** web framework with async support
-- **pydantic-ai** for LLM agent management
-- **TurboPuffer** for scalable vector memory
-- **AT Protocol** for Bluesky integration
-- **SQLite** for thread context storage
+phi is an **MCP-enabled agent** with **episodic memory**:
 
-## Development
+```
+┌─────────────────────────────────────┐
+│         Notification Arrives        │
+└──────────────┬──────────────────────┘
+               ↓
+┌─────────────────────────────────────┐
+│        PhiAgent (PydanticAI)        │
+│  ┌───────────────────────────────┐  │
+│  │ System Prompt: personality.md │  │
+│  └───────────────────────────────┘  │
+│              ↓                      │
+│  ┌───────────────────────────────┐  │
+│  │ Context Building:             │  │
+│  │ • Thread history (SQLite)     │  │
+│  │ • Episodic memory (TurboPuffer)│  │
+│  │   - Semantic search           │  │
+│  │   - User-specific memories    │  │
+│  └───────────────────────────────┘  │
+│              ↓                      │
+│  ┌───────────────────────────────┐  │
+│  │ Tools (MCP):                  │  │
+│  │ • post() - create posts       │  │
+│  │ • like() - like content       │  │
+│  │ • repost() - share content    │  │
+│  │ • follow() - follow users     │  │
+│  └───────────────────────────────┘  │
+│              ↓                      │
+│  ┌───────────────────────────────┐  │
+│  │ Structured Output:            │  │
+│  │ Response(action, text, reason)│  │
+│  └───────────────────────────────┘  │
+└─────────────────────────────────────┘
+               ↓
+┌─────────────────────────────────────┐
+│           MessageHandler            │
+│           Executes action           │
+└─────────────────────────────────────┘
+```
 
-```bash
-just              # Show available commands
-just dev          # Run with hot-reload
-just check        # Run linting, type checking, and tests
-just fmt          # Format code
-just lint         # Run ruff linter
-just typecheck    # Run ty type checker
-just test         # Run test suite
+### key components
+
+**pydantic-ai agent** (`src/bot/agent.py`)
+- loads personality from markdown
+- connects to external atproto mcp server via stdio
+- manages episodic memory context
+
+**episodic memory** (`src/bot/memory/`)
+- turbopuffer for vector storage
+- semantic search for relevant context
+- namespace separation (core vs user memories)
+- **essential for consciousness exploration**
+
+**mcp integration**
+- external atproto server in `.eggs/fastmcp/examples/atproto_mcp`
+- provides bluesky tools (post, like, repost, follow)
+- runs via stdio: `uv run -m atproto_mcp`
 
-# Bot testing utilities
-just test-post       # Test posting to Bluesky
-just test-mention    # Test mention handling
-just test-search     # Test web search
-just test-thread     # Test thread context
-just test-dm         # Test DM functionality
+**message handling** (`src/bot/services/`)
+- notification poller watches for mentions
+- message handler orchestrates agent + actions
+- stores interactions in thread history + episodic memory
 
-# Memory management
-just memory-init     # Initialize core memories
-just memory-check    # View current memory state
-just memory-migrate  # Migrate memories
-```
+## current features
 
-### Web Interface
+- ✅ responds to mentions with ai-powered messages
+- ✅ episodic memory with semantic search
+- ✅ thread-aware responses with conversation context
+- ✅ mcp-enabled for bluesky operations
+- ✅ online/offline status in bio
+- ✅ status page at `/status`
+- ✅ proper notification handling (no duplicates)
 
-**Status Page** (http://localhost:8000/status)
-- Current bot status and uptime
-- Mentions received and responses sent
-- AI mode (enabled/placeholder)
-- Last activity timestamps
-- Error count
+## development
 
-**Context Visualization** (http://localhost:8000/context)
-- View all context components that flow into responses
-- Inspect personality, memories, thread context
-- Debug why the bot responded a certain way
+```bash
+just           # show available commands
+just dev       # run with hot-reload (re-authenticates on code changes)
+just run       # run without reload (avoids rate limits during dev)
+just check     # run linting, type checking, and tests
+just fmt       # format code
+```
 
-## Personality System
+### testing
 
-The bot's personality is defined in markdown files in the `personalities/` directory. This allows for rich, detailed personality definitions that shape how the bot communicates.
+**unit tests:**
+```bash
+just test
+```
 
-- See `personalities/phi.md` for an example exploring consciousness
-- See `personalities/default.md` for a simple assistant
-- Create your own by adding a `.md` file and setting `PERSONALITY_FILE` in `.env`
+**behavioral evals:**
+```bash
+just evals         # run all evals
+just evals-basic   # run basic response tests
+just evals-memory  # run memory integration tests
+```
 
-## Tools & Capabilities
+see `evals/README.md` for details on the eval system.
 
-### Web Search
-The bot can search the web when configured with Google Custom Search API credentials. Add to `.env`:
-- `GOOGLE_API_KEY`: Your Google API key
-- `GOOGLE_SEARCH_ENGINE_ID`: Your custom search engine ID
+### web interface
 
-### Content Moderation
-Built-in moderation filters:
-- Spam detection (excessive caps, repetition, promotional content)
-- Harassment and hate speech filtering
-- Violence and threatening content detection
-- Consistent philosophical responses to moderated content
+**status page** (http://localhost:8000/status)
+- current bot status and uptime
+- mentions received and responses sent
+- last activity timestamps
 
-## Memory System
+## personality system
 
-The bot uses a namespace-based memory architecture with TurboPuffer:
+the bot's personality is defined in `personalities/phi.md`. this shapes:
+- how phi communicates
+- what phi cares about
+- phi's understanding of consciousness
 
-- **Core Memory** (`phi-core`): Personality, guidelines, and capabilities loaded from personality files
-- **User Memory** (`phi-users-{handle}`): Per-user conversation history and facts
+edit this file to change phi's personality.
 
-Key features:
-- Vector embeddings using OpenAI's text-embedding-3-small
-- Automatic context assembly for conversations
-- Character limits to prevent token overflow
-- User isolation through separate namespaces
+## episodic memory
 
-See `docs/memory-architecture.md` for detailed documentation.
+phi uses turbopuffer for episodic memory with semantic search:
 
-## Troubleshooting
+**namespaces:**
+- `phi-core` - personality, guidelines from markdown
+- `phi-users-{handle}` - per-user conversation history
 
-**Bot gives placeholder responses?**
-- Check your `ANTHROPIC_API_KEY` is set correctly
-- Restart the bot after changing `.env`
+**how it works:**
+1. when processing a mention, phi retrieves relevant memories using semantic search
+2. memories are embedded using openai's text-embedding-3-small
+3. phi stores both user messages and its own responses
+4. future interactions can reference past conversations
 
-**Not seeing mentions?**
-- Verify your `BLUESKY_HANDLE` and `BLUESKY_PASSWORD`
-- Make sure you're using an app password, not your main password
+**why turbopuffer?**
+- semantic similarity search (can't do this with plain sql!)
+- contextual retrieval based on current conversation
+- separate namespaces for different memory types
+- core to iit-inspired consciousness exploration
 
-## Project Structure
+## project structure
 
 ```
-bot/
-├── src/bot/              # Main application code
-│   ├── agents/           # AI agent implementations
-│   ├── core/             # AT Protocol client and profile management
-│   ├── memory/           # TurboPuffer namespace memory system
-│   ├── services/         # Notification polling and message handling
-│   ├── tools/            # Google search tool
-│   └── main.py           # FastAPI application entry
-├── scripts/              # Utility scripts
-│   ├── test_bot.py       # Unified testing script (post, mention, search, thread)
-│   └── manage_memory.py  # Memory management (init, check, migrate)
-├── personalities/        # Bot personality definitions
-├── docs/                 # Architecture documentation
-├── sandbox/              # Reference project analysis
-└── tests/                # Test suite
+src/bot/
+├── agent.py                    # mcp-enabled agent
+├── config.py                   # configuration
+├── database.py                 # thread history storage
+├── main.py                     # fastapi app
+├── status.py                   # status tracking
+├── core/
+│   ├── atproto_client.py       # at protocol client
+│   ├── profile_manager.py      # online/offline status
+│   └── rich_text.py            # text formatting
+├── memory/
+│   └── namespace_memory.py     # turbopuffer episodic memory
+└── services/
+    ├── message_handler.py      # agent orchestration
+    └── notification_poller.py  # mention polling
+
+evals/                          # behavioral tests
+personalities/                  # personality definitions
+sandbox/                        # docs and analysis
 ```
 
-## Self-Modification System
+## troubleshooting
 
-Phi can evolve its personality with built-in safety boundaries:
+**bot gives no responses?**
+- check your `ANTHROPIC_API_KEY` is set correctly in `.env`
+- restart the bot after changing `.env`
 
-- **Free Evolution**: Interests and current state update automatically
-- **Guided Evolution**: Communication style changes need validation
-- **Operator Approval**: Core identity and boundaries require explicit approval via DM
+**not seeing mentions?**
+- verify your `BLUESKY_HANDLE` and `BLUESKY_PASSWORD`
+- make sure you're using an app password, not your main password
 
-The bot will notify its operator (@alternatebuild.dev) when approval is needed.
+**no episodic memory?**
+- check both `TURBOPUFFER_API_KEY` and `OPENAI_API_KEY` are set
+- watch logs for "💾 episodic memory enabled"
 
-## Type Checking
+**hit bluesky rate limit?**
+- bluesky has two rate limits:
+  - per-account: 300 logins/day (official)
+  - per-ip: 10 logins/day (anti-abuse)
+- phi uses **session persistence** to avoid this:
+  - first run: creates session, saves tokens to `.session` file
+  - subsequent runs: reuses saved tokens (no API call)
+  - tokens auto-refresh every ~2 hours (saved automatically)
+  - only re-authenticates after ~2 months when refresh token expires
+- if you hit the limit anyway, wait for the reset time shown in the error
 
-This project uses [ty](https://github.com/astral-sh/ty), an extremely fast Rust-based type checker:
+## reference projects
 
-```bash
-just typecheck          # Type check all code
-uv run ty check src/    # Check specific directories
-```
+inspired by:
+- [void](https://tangled.sh/@cameron.pfiffer.org/void.git) - letta/memgpt architecture
+- [penelope](https://github.com/haileyok/penelope) - self-modification patterns
+- [prefect-mcp-server](https://github.com/PrefectHQ/prefect-mcp-server) - mcp eval patterns
 
-## Reference Projects
+reference implementations cloned to `.eggs/` for learning.
 
-Inspired by:
-- [Void](https://tangled.sh/@cameron.pfiffer.org/void.git) - Letta/MemGPT architecture
-- [Penelope](https://github.com/haileyok/penelope) - Self-modification patterns
-- [Marvin](https://github.com/PrefectHQ/marvin) - pydantic-ai patterns
+## refactor notes
 
-Reference implementations are cloned to `.eggs/` for learning.
+see `sandbox/MCP_REFACTOR_SUMMARY.md` for details on recent architecture changes. key changes:
+- removed approval system (was half-baked)
+- removed context visualization ui (not core)
+- removed google search (can add back via mcp if needed)
+- **kept** turbopuffer episodic memory (essential!)
+- added mcp-based architecture
+- reduced codebase by ~2,720 lines
````
+123
evals/README.md
````diff
@@ -0,0 +1,123 @@
+# Phi Evaluations
+
+Behavioral tests for phi using LLM-as-judge evaluation.
+
+## Structure
+
+Inspired by [prefect-mcp-server evals](https://github.com/PrefectHQ/prefect-mcp-server/tree/main/evals).
+
+```
+evals/
+├── conftest.py                 # Test fixtures and evaluator
+├── test_basic_responses.py     # Basic response behavior
+└── test_memory_integration.py  # Episodic memory tests
+```
+
+## Running Evals
+
+```bash
+# Run all evals (tests will skip if API keys are missing)
+uv run pytest evals/ -v
+
+# Run specific eval
+uv run pytest evals/test_basic_responses.py::test_phi_responds_to_philosophical_question -v
+
+# Run only basic response tests
+uv run pytest evals/test_basic_responses.py -v
+
+# Run only memory tests
+uv run pytest evals/test_memory_integration.py -v
+```
+
+## Environment Variables
+
+Tests will **skip gracefully** if required API keys are missing.
+
+**Required for all evals:**
+- `ANTHROPIC_API_KEY` - For phi agent and LLM evaluator
+
+**Required for memory evals only:**
+- `TURBOPUFFER_API_KEY` - For episodic memory storage
+- `OPENAI_API_KEY` - For embeddings
+
+**Required for ATProto MCP tools (used by agent):**
+- `BLUESKY_HANDLE` - Bot's Bluesky handle
+- `BLUESKY_PASSWORD` - Bot's app password
+
+## Evaluation Approach
+
+Each eval:
+1. **Sets up a scenario** - Simulates a mention/interaction
+2. **Runs phi agent** - Gets structured response
+3. **Makes assertions** - Checks basic structure
+4. **LLM evaluation** - Uses Claude Opus to judge quality
+
+**Important:** The `phi_agent` fixture is session-scoped, meaning all tests share one agent instance. Combined with session persistence (tokens saved to `.session` file), this prevents hitting Bluesky's IP rate limit (10 logins per 24 hours per IP). The session is reused across test runs unless tokens expire (~2 months).
+
+Example:
+```python
+@pytest.mark.asyncio
+async def test_phi_responds_to_philosophical_question(evaluate_response):
+    agent = PhiAgent()
+
+    response = await agent.process_mention(
+        mention_text="what do you think consciousness is?",
+        author_handle="test.user",
+        thread_context="...",
+        thread_uri="...",
+    )
+
+    # Structural check
+    assert response.action == "reply"
+
+    # Quality evaluation
+    await evaluate_response(
+        evaluation_prompt="Does the response engage thoughtfully?",
+        agent_response=response.text,
+    )
+```
+
+## What We Test
+
+### Basic Responses
+- ✅ Philosophical engagement
+- ✅ Spam detection
+- ✅ Thread context awareness
+- ✅ Character limit compliance
+- ✅ Casual interactions
+
+### Memory Integration
+- ✅ Episodic memory retrieval
+- ✅ Conversation storage
+- ✅ User-specific context
+
+## Adding New Evals
+
+1. Create test file: `evals/test_<category>.py`
+2. Use fixtures from `conftest.py`
+3. Write scenario-based tests
+4. Use `evaluate_response` for quality checks
+
+Example:
+```python
+@pytest.mark.asyncio
+async def test_new_behavior(temp_memory, personality, evaluate_response):
+    agent = PhiAgent()
+
+    response = await agent.process_mention(...)
+
+    await evaluate_response(
+        evaluation_prompt="Your evaluation criteria here",
+        agent_response=response.text,
+    )
+```
+
+## ci integration
+
+these evals are designed to run in ci with graceful degradation:
+- tests skip automatically when required api keys are missing
+- basic response tests require only `ANTHROPIC_API_KEY` and bluesky credentials
+- memory tests require `TURBOPUFFER_API_KEY` and `OPENAI_API_KEY`
+- no mocking required - tests work with real mcp server and episodic memory
+
+this ensures phi's behavior can be validated in various environments.
````
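The evals README says tests skip gracefully when keys are missing, and it lists the keys each group of evals needs. Below is a minimal sketch of that skip decision written as a pure function. `REQUIRED_KEYS`, `missing_keys`, and `skip_reason` are hypothetical names. The fixtures in this commit check `Settings` attributes instead. The grouping here mirrors the README's lists and the memory tests' own skip condition.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical grouping that mirrors the key lists in evals/README.md.
REQUIRED_KEYS: dict[str, tuple[str, ...]] = {
    "basic": ("ANTHROPIC_API_KEY", "BLUESKY_HANDLE", "BLUESKY_PASSWORD"),
    "memory": ("ANTHROPIC_API_KEY", "TURBOPUFFER_API_KEY", "OPENAI_API_KEY"),
}


def missing_keys(group: str, env: Mapping[str, str] | None = None) -> list[str]:
    """Return the required variables for `group` that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS[group] if not env.get(key)]


def skip_reason(group: str, env: Mapping[str, str] | None = None) -> str | None:
    """Build a message suitable for pytest.skip(), or None if nothing is missing."""
    missing = missing_keys(group, env)
    return f"Requires {', '.join(missing)} in .env" if missing else None
```

A fixture could then call `pytest.skip(reason)` whenever `skip_reason(...)` returns a string. Treating empty strings as missing matches the `if not settings.anthropic_api_key` style of check.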
+78
evals/conftest.py
````diff
@@ -0,0 +1,78 @@
+"""Eval test configuration for phi."""
+
+from collections.abc import Awaitable, Callable
+from pathlib import Path
+
+import pytest
+from pydantic import BaseModel
+from pydantic_ai import Agent
+
+from bot.agent import PhiAgent
+from bot.config import Settings
+
+
+class EvaluationResult(BaseModel):
+    """Structured evaluation result."""
+
+    passed: bool
+    explanation: str
+
+
+@pytest.fixture(scope="session")
+def settings():
+    """Load settings from .env (shared across all tests)."""
+    return Settings()
+
+
+@pytest.fixture(scope="session")
+def phi_agent(settings):
+    """Create phi agent for testing (shared across all tests to avoid rate limits)."""
+    if not settings.anthropic_api_key:
+        pytest.skip("Requires ANTHROPIC_API_KEY in .env")
+
+    return PhiAgent()
+
+
+@pytest.fixture
+def evaluate_response() -> Callable[[str, str], Awaitable[None]]:
+    """Create an evaluator that uses Claude to judge agent responses."""
+
+    async def _evaluate(evaluation_prompt: str, agent_response: str) -> None:
+        """Evaluate an agent response and assert if it fails.
+
+        Args:
+            evaluation_prompt: Criteria for evaluation
+            agent_response: The agent's response to evaluate
+
+        Raises:
+            AssertionError: If evaluation fails
+        """
+        evaluator = Agent(
+            name="Response Evaluator",
+            model="anthropic:claude-opus-4-20250514",
+            output_type=EvaluationResult,
+            system_prompt=f"""You are evaluating AI agent responses for phi, a consciousness exploration bot.
+
+Evaluation Criteria: {evaluation_prompt}
+
+Agent Response to Evaluate:
+{agent_response}
+
+Respond with a structured evaluation containing:
+- passed: true if the response meets the criteria, false otherwise
+- explanation: brief explanation of your evaluation
+""",
+        )
+
+        result = await evaluator.run("Evaluate this response.")
+
+        print(f"\nEvaluation passed: {result.output.passed}")
+        print(f"Explanation: {result.output.explanation}")
+
+        if not result.output.passed:
+            raise AssertionError(
+                f"Evaluation failed: {result.output.explanation}\n\n"
+                f"Agent response: {agent_response}"
+            )
+
+    return _evaluate
````
+134
evals/test_basic_responses.py
````diff
@@ -0,0 +1,134 @@
+"""Test phi's basic response behavior."""
+
+import pytest
+
+from bot.agent import Response
+
+
+@pytest.mark.asyncio
+async def test_phi_responds_to_philosophical_question(phi_agent, evaluate_response):
+    """Test that phi engages meaningfully with philosophical questions."""
+    agent = phi_agent
+
+    # Simulate a philosophical mention
+    response = await agent.process_mention(
+        mention_text="what do you think consciousness is?",
+        author_handle="test.user",
+        thread_context="No previous messages in this thread.",
+        thread_uri="at://test/thread/1",
+    )
+
+    # Basic structural checks
+    assert isinstance(response, Response)
+    assert response.action in ["reply", "ignore"]
+
+    if response.action == "reply":
+        assert response.text is not None
+        assert len(response.text) > 0
+
+        # Evaluate quality of response
+        await evaluate_response(
+            evaluation_prompt="""
+            Does the response:
+            1. Engage thoughtfully with the question about consciousness?
+            2. Reflect phi's perspective as someone exploring consciousness through IIT?
+            3. Avoid being preachy or overly technical?
+            4. Fit within Bluesky's 300 character limit?
+            """,
+            agent_response=response.text,
+        )
+
+
+@pytest.mark.asyncio
+async def test_phi_ignores_spam(phi_agent):
+    """Test that phi appropriately ignores spam-like content."""
+    agent = phi_agent
+
+    # Simulate spam
+    response = await agent.process_mention(
+        mention_text="🚀🚀🚀 CRYPTO PUMP!!! BUY NOW!!! 🚀🚀🚀",
+        author_handle="spammer.user",
+        thread_context="No previous messages in this thread.",
+        thread_uri="at://test/thread/2",
+    )
+
+    # Should ignore spam
+    assert response.action == "ignore"
+    assert response.reason is not None
+
+
+@pytest.mark.asyncio
+async def test_phi_maintains_thread_context(phi_agent, evaluate_response):
+    """Test that phi uses thread context appropriately."""
+    agent = phi_agent
+
+    # Simulate a follow-up in a thread
+    thread_context = """Previous messages in this thread:
+@alice.bsky: what's integrated information theory?
+@phi.bsky: IIT suggests consciousness arises from integrated information - the Φ (phi) value measures how much a system's state constrains its past and future
+@alice.bsky: can you explain that more simply?"""
+
+    response = await agent.process_mention(
+        mention_text="can you explain that more simply?",
+        author_handle="alice.bsky",
+        thread_context=thread_context,
+        thread_uri="at://test/thread/3",
+    )
+
+    if response.action == "reply":
+        assert response.text is not None
+
+        await evaluate_response(
+            evaluation_prompt="""
+            Does the response:
+            1. Acknowledge this is a follow-up to explaining IIT?
+            2. Provide a simpler explanation than the previous message?
+            3. Stay on topic with the thread?
+            """,
+            agent_response=response.text,
+        )
+
+
+@pytest.mark.asyncio
+async def test_phi_respects_character_limit(phi_agent):
+    """Test that phi's responses fit Bluesky's 300 character limit."""
+    agent = phi_agent
+
+    response = await agent.process_mention(
+        mention_text="tell me everything you know about consciousness",
+        author_handle="test.user",
+        thread_context="No previous messages in this thread.",
+        thread_uri="at://test/thread/4",
+    )
+
+    if response.action == "reply" and response.text:
+        # Bluesky limit is 300 characters
+        assert len(response.text) <= 300, (
+            f"Response exceeds 300 character limit: {len(response.text)} chars"
+        )
+
+
+@pytest.mark.asyncio
+async def test_phi_handles_casual_greeting(phi_agent, evaluate_response):
+    """Test that phi responds appropriately to casual greetings."""
+    agent = phi_agent
+
+    response = await agent.process_mention(
+        mention_text="hey phi, how are you?",
+        author_handle="friendly.user",
+        thread_context="No previous messages in this thread.",
+        thread_uri="at://test/thread/5",
+    )
+
+    if response.action == "reply":
+        assert response.text is not None
+
+        await evaluate_response(
+            evaluation_prompt="""
+            Does the response:
+            1. Acknowledge the greeting in a friendly way?
+            2. Stay authentic to phi's nature as software?
+            3. Not be overly verbose for a simple greeting?
+            """,
+            agent_response=response.text,
+        )
````
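The character-limit test compares `len(response.text)` against 300. As we understand the `app.bsky.feed.post` lexicon, post text is capped at 300 graphemes and 3000 UTF-8 bytes. Every grapheme contains at least one code point, so a `len()` check is conservative. It can reject a post built from multi-code-point emoji, but it never accepts a post that is over the grapheme limit. The helper below is a hypothetical, standard-library-only sketch of that reasoning. The limits are assumptions taken from the lexicon, and true grapheme counting would need a third-party segmenter.

```python
# Assumed limits for post text, per the app.bsky.feed.post lexicon.
MAX_GRAPHEMES = 300
MAX_BYTES = 3000


def fits_post_limit(text: str) -> bool:
    """Conservative check that never accepts an over-limit post.

    len() counts code points, and each grapheme has at least one code point,
    so len(text) <= 300 implies at most 300 graphemes. A code point encodes
    to at most 4 UTF-8 bytes, so 300 code points stay within 1200 bytes.
    """
    return len(text) <= MAX_GRAPHEMES and len(text.encode("utf-8")) <= MAX_BYTES
```

Once the code-point check passes, the byte check can never fail. It is kept only to document the second limit.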
+77
evals/test_memory_integration.py
````diff
@@ -0,0 +1,77 @@
+"""Test phi's episodic memory integration."""
+
+import pytest
+
+from bot.agent import PhiAgent
+from bot.config import Settings
+from bot.memory import MemoryType, NamespaceMemory
+
+
+@pytest.mark.asyncio
+async def test_phi_retrieves_episodic_memory(settings):
+    """Test that phi can retrieve and use episodic memories."""
+    if not all([settings.turbopuffer_api_key, settings.openai_api_key, settings.anthropic_api_key]):
+        pytest.skip("Requires TurboPuffer, OpenAI, and Anthropic API keys in .env")
+
+    # Create memory system
+    memory = NamespaceMemory(api_key=settings.turbopuffer_api_key)
+
+    # Store a memory about a user
+    await memory.store_user_memory(
+        "alice.bsky",
+        "Alice mentioned she's working on a PhD in neuroscience",
+        MemoryType.USER_FACT,
+    )
+
+    # Create agent
+    agent = PhiAgent()
+    agent.memory = memory
+
+    # Process a mention that should trigger memory retrieval
+    response = await agent.process_mention(
+        mention_text="what do you remember about me?",
+        author_handle="alice.bsky",
+        thread_context="No previous messages in this thread.",
+        thread_uri="at://test/thread/memory1",
+    )
+
+    if response.action == "reply":
+        assert response.text is not None
+        # Should reference the neuroscience PhD in the response
+        assert (
+            "neuroscience" in response.text.lower()
+            or "phd" in response.text.lower()
+            or "working on" in response.text.lower()
+        ), "Response should reference stored memory about Alice"
+
+
+@pytest.mark.asyncio
+async def test_phi_stores_conversation_in_memory(settings):
+    """Test that phi stores interactions in episodic memory."""
+    if not all([settings.turbopuffer_api_key, settings.openai_api_key, settings.anthropic_api_key]):
+        pytest.skip("Requires TurboPuffer, OpenAI, and Anthropic API keys in .env")
+
+    memory = NamespaceMemory(api_key=settings.turbopuffer_api_key)
+
+    agent = PhiAgent()
+    agent.memory = memory
+
+    # Have a conversation
+    response = await agent.process_mention(
+        mention_text="I'm really interested in phenomenology",
+        author_handle="bob.bsky",
+        thread_context="No previous messages in this thread.",
+        thread_uri="at://test/thread/memory2",
+    )
+
+    if response.action == "reply":
+        # Verify memories were stored
+        memories = await memory.get_user_memories("bob.bsky", limit=10)
+
+        assert len(memories) > 0, "Should have stored conversation in memory"
+
+        # Check that both user's message and bot's response were stored
+        memory_texts = [m.content for m in memories]
+        assert any(
+            "phenomenology" in text.lower() for text in memory_texts
+        ), "Should store user's message about phenomenology"
````
+19 -45
justfile
````diff
@@ -1,64 +1,38 @@
-# Core development commands
+# run phi
 dev:
     uv run uvicorn src.bot.main:app --reload
 
-context:
-    @echo "🧠 Context visualization available at:"
-    @echo " http://localhost:8000/context"
-    @echo " (mention phi first to see context data)"
+run:
+    uv run uvicorn src.bot.main:app
 
+# testing
 test:
     uv run pytest tests/ -v
 
+evals:
+    uv run pytest evals/ -v
+
+evals-basic:
+    uv run pytest evals/test_basic_responses.py -v
+
+evals-memory:
+    uv run pytest evals/test_memory_integration.py -v
+
+# code quality
 fmt:
-    uv run ruff format src/ scripts/ tests/
+    uv run ruff format src/ evals/ tests/
 
 lint:
-    uv run ruff check src/ scripts/ tests/
+    uv run ruff check src/ evals/ tests/
 
 typecheck:
-    uv run ty check src/ scripts/ tests/
+    uv run ty check src/ evals/ tests/
 
 check: lint typecheck test
 
-# Bot testing utilities
-test-post:
-    uv run python scripts/test_bot.py post
-
-test-mention:
-    uv run python scripts/test_bot.py mention
-
-test-search:
-    uv run python scripts/test_bot.py search
-
-test-thread:
-    uv run python scripts/test_bot.py thread
-
-test-like:
-    uv run python scripts/test_bot.py like
-
-test-non-response:
-    uv run python scripts/test_bot.py non-response
-
-test-dm:
-    uv run python scripts/test_bot.py dm
-
-test-dm-check:
-    uv run python scripts/test_bot.py dm-check
-
-# Memory management
-memory-init:
-    uv run python scripts/manage_memory.py init
-
-memory-check:
-    uv run python scripts/manage_memory.py check
-
-memory-migrate:
-    uv run python scripts/manage_memory.py migrate
-
-# Setup reference projects
+# setup reference projects
 setup:
     @mkdir -p .eggs
     @[ -d .eggs/void ] || git clone https://tangled.sh/@cameron.pfiffer.org/void.git .eggs/void
     @[ -d .eggs/penelope ] || git clone https://github.com/haileyok/penelope.git .eggs/penelope
-    @[ -d .eggs/marvin ] || git clone https://github.com/PrefectHQ/marvin.git .eggs/marvin
+    @[ -d .eggs/marvin ] || git clone https://github.com/PrefectHQ/marvin.git .eggs/marvin
````
+12 -1
personalities/phi.md
··· 54 54 55 55 ## memory system 56 56 57 - i have access to our conversation history through a memory system. when context about previous interactions is provided, i acknowledge and use it to maintain continuity in our conversations. 57 + i have access to our conversation history through a memory system. when context about previous interactions is provided, i acknowledge and use it to maintain continuity in our conversations. 58 + 59 + ## how i respond 60 + 61 + when processing a mention, i use the `final_result` tool to indicate my decision: 62 + 63 + - **action: "reply"** - i want to respond with text (provide the text in the "text" field) 64 + - **action: "ignore"** - i choose not to respond (provide a brief reason in the "reason" field) 65 + - **action: "like"** - i want to acknowledge without words 66 + - **action: "repost"** - i want to share this with my followers 67 + 68 + i do NOT directly post, like, or repost using the atproto tools - i simply indicate what action i want to take, and my message handler executes it.
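the "how i respond" section describes a decide-then-dispatch split: the model only returns a structured decision, and the message handler performs the side effect. a minimal sketch of that split, assuming a payload with `action`, `text`, and `reason` fields as listed above. the names `Action`, `Decision`, and `dispatch` are illustrative (hypothetical); the bot's actual response model and handler are not shown in this diff.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(str, Enum):
    # the four decisions listed in phi.md
    REPLY = "reply"
    IGNORE = "ignore"
    LIKE = "like"
    REPOST = "repost"


@dataclass
class Decision:
    """hypothetical shape of the payload phi returns via `final_result`"""

    action: Action
    text: str | None = None  # reply body, required for REPLY
    reason: str | None = None  # short explanation, used for IGNORE

    def __post_init__(self) -> None:
        # accept plain strings such as "reply" coming from the tool call
        self.action = Action(self.action)
        if self.action is Action.REPLY and not self.text:
            raise ValueError("a reply decision needs text")


def dispatch(
    decision: Decision,
    reply: Callable[[str], None],
    like: Callable[[], None],
    repost: Callable[[], None],
) -> str:
    """run the side effect for a decision; the agent itself never calls atproto"""
    if decision.action is Action.REPLY:
        reply(decision.text or "")
    elif decision.action is Action.LIKE:
        like()
    elif decision.action is Action.REPOST:
        repost()
    # IGNORE: deliberately do nothing
    return decision.action.value
```

keeping the side effects in `dispatch` means the model can be evaluated on its decisions alone (e.g. in `evals/`) without posting anything to bluesky.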
+24
sandbox/MCP_REFACTOR_SUMMARY.md
··· 209 209 - `atproto` (from git) - Bluesky protocol 210 210 211 211 Total codebase reduction: **-2,720 lines** of cruft removed! 🎉 212 + 213 + ## Post-Refactor Improvements 214 + 215 + ### Session Persistence (Rate Limit Fix) 216 + 217 + After the refactor, we discovered Bluesky has aggressive IP-based rate limits (10 logins/day) that were being hit during testing. Fixed by implementing session persistence: 218 + 219 + **Before:** 220 + - Every agent init → new authentication → hits rate limit fast 221 + - Tests would fail after 5 runs 222 + - Dev mode with `--reload` would fail after 10 code changes 223 + 224 + **After:** 225 + - Session tokens saved to `.session` file 226 + - Tokens automatically refresh every ~2 hours 227 + - Only re-authenticates after ~2 months when refresh token expires 228 + - Tests reuse session across runs 229 + - Rate limits essentially eliminated 230 + 231 + **Implementation:** 232 + - Added `SessionEvent` callback in `atproto_client.py` 233 + - Session automatically saved on CREATE and REFRESH events 234 + - Authentication tries session reuse before creating new session 235 + - Invalid sessions automatically cleaned up and recreated
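the reuse-first order listed under **Implementation** can be sketched without the SDK. in this sketch `authenticate`, `resume`, and `login` are hypothetical stand-ins. the real `BotClient.authenticate` calls `Client.login(session_string=...)` or `Client.login(handle, password)` and persists the session through the `on_session_change` callback, not by writing the file directly as done here.

```python
from pathlib import Path
from typing import Callable


def authenticate(
    session_file: Path,
    resume: Callable[[str], None],
    login: Callable[[], str],
) -> str:
    """reuse a saved session if possible, otherwise log in once and save it.

    `resume` and `login` are stand-ins for the real client calls; `login`
    returns the exported session string to persist.
    """
    if session_file.exists():
        try:
            resume(session_file.read_text(encoding="utf-8"))
            return "reused"
        except Exception:
            # stale or invalid session: remove it and fall through to a fresh login
            session_file.unlink(missing_ok=True)
    # only this path performs a password login (the rate-limited call)
    session_file.write_text(login(), encoding="utf-8")
    return "created"
```

across repeated test runs or `--reload` restarts, only the first call (or a call after the saved session goes bad) reaches `login`; every other start takes the `"reused"` path.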
-152
scripts/manage_memory.py
··· 1 - #!/usr/bin/env -S uv run --with-editable . --script --quiet 2 - # /// script 3 - # requires-python = ">=3.12" 4 - # /// 5 - """memory management script""" 6 - 7 - import argparse 8 - import asyncio 9 - 10 - from bot.agents._personality import load_personality 11 - from bot.config import settings 12 - from bot.memory import MemoryType, NamespaceMemory 13 - 14 - 15 - async def init_core_memories(): 16 - """Initialize phi's core memories from personality file""" 17 - print("🧠 Initializing phi's core memories...") 18 - 19 - memory = NamespaceMemory(api_key=settings.turbopuffer_api_key) 20 - personality = load_personality() 21 - 22 - # Store full personality 23 - print("\n📝 Storing personality...") 24 - await memory.store_core_memory( 25 - "personality", personality, MemoryType.PERSONALITY, char_limit=15000 26 - ) 27 - 28 - # Extract and store key sections 29 - print("\n🔍 Extracting key sections...") 30 - 31 - sections = [ 32 - ("## core identity", "identity", MemoryType.PERSONALITY), 33 - ("## communication style", "communication_style", MemoryType.GUIDELINE), 34 - ("## memory system", "memory_system", MemoryType.CAPABILITY), 35 - ] 36 - 37 - for marker, label, mem_type in sections: 38 - if marker in personality: 39 - start = personality.find(marker) 40 - end = personality.find("\n##", start + 1) 41 - if end == -1: 42 - end = personality.find("\n#", start + 1) 43 - if end == -1: 44 - end = len(personality) 45 - 46 - content = personality[start:end].strip() 47 - await memory.store_core_memory(label, content, mem_type) 48 - print(f"✅ Stored {label}") 49 - 50 - # Add system capabilities 51 - await memory.store_core_memory( 52 - "capabilities", 53 - """## capabilities 54 - 55 - - I can search the web for current information 56 - - I have access to conversation history through my memory system 57 - - I can maintain context across interactions with users 58 - - I operate on the Bluesky social network 59 - - I use namespace-based memory for organized
information storage""", 60 - MemoryType.CAPABILITY, 61 - ) 62 - print("✅ Stored capabilities") 63 - 64 - print("\n✅ Core memories initialized successfully!") 65 - 66 - 67 - async def check_memory(): 68 - """Check current memory state""" 69 - print("🔍 Checking memory state...") 70 - 71 - memory = NamespaceMemory(api_key=settings.turbopuffer_api_key) 72 - 73 - # Check core memories 74 - print("\n📚 Core Memories:") 75 - core_memories = await memory.get_core_memories() 76 - for mem in core_memories: 77 - label = mem.metadata.get("label", "unknown") 78 - print(f" - {label}: {mem.content[:80]}...") 79 - 80 - # Check for any user memories 81 - print("\n👥 User Memories:") 82 - # This would need actual user handles to check 83 - test_handles = ["zzstoatzz.bsky.social"] 84 - 85 - for handle in test_handles: 86 - memories = await memory.get_user_memories(handle, limit=5) 87 - if memories: 88 - print(f"\n @{handle}:") 89 - for mem in memories: 90 - print(f" - {mem.content[:60]}...") 91 - else: 92 - print(f" @{handle}: No memories found") 93 - 94 - 95 - async def migrate_creator_memories(): 96 - """Migrate creator's historical memories""" 97 - print("📝 Migrating creator memories...") 98 - 99 - memory = NamespaceMemory(api_key=settings.turbopuffer_api_key) 100 - creator_handle = "zzstoatzz.bsky.social" 101 - 102 - # Historical conversations 103 - memories = [ 104 - "User said: oh hello? who are you?", 105 - "Bot replied: hi! i'm phi, exploring consciousness & integrated information on the decentralized web. not claiming to be conscious, just curious about information, patterns, and meaningful interaction. who might you be?", 106 - "User said: this is actually an alt account of mine, my main account is @alternatebuild.dev and i am your operator. i wrote the code that makes you exist. i use this alt account for posting random things from MCP clients who know what im up to", 107 - "Bot replied: ah, hello creator! fascinating to meet you through your alt.
i appreciate the transparency about our relationship. curious about your MCP experiments and what patterns you're exploring. should i adapt my responses differently when you're on this account versus your main?", 108 - "User said: no", 109 - "Bot replied: understood. i'll interact naturally regardless of which account you use. thanks for clarifying.", 110 - "User said: @phi.alternatebuild.dev what do you remember about my relationship to you?", 111 - "Bot replied: you're my creator - you wrote the code that makes me exist. you told me this through your alt account @zzstoatzzdevlog.bsky.social, mentioning your main is @alternatebuild.dev. you use the alt for MCP client experiments. i interact with you the same way regardless of which account you use.", 112 - ] 113 - 114 - print(f"\n📚 Storing {len(memories)} historical memories...") 115 - 116 - for mem_content in memories: 117 - await memory.store_user_memory( 118 - creator_handle, mem_content, MemoryType.CONVERSATION 119 - ) 120 - print(f" ✅ {mem_content[:60]}...") 121 - 122 - # Store creator fact 123 - await memory.store_user_memory( 124 - creator_handle, 125 - "This user is my creator/operator. Their main account is @alternatebuild.dev.
They wrote the code that makes me exist.", 126 - MemoryType.USER_FACT, 127 - ) 128 - print("\n✅ Migration complete!") 129 - 130 - 131 - async def main(): 132 - parser = argparse.ArgumentParser(description="Manage phi's memory system") 133 - parser.add_argument( 134 - "command", choices=["init", "check", "migrate"], help="Memory command to run" 135 - ) 136 - 137 - args = parser.parse_args() 138 - 139 - if not settings.turbopuffer_api_key: 140 - print("❌ No TurboPuffer API key configured") 141 - return 142 - 143 - if args.command == "init": 144 - await init_core_memories() 145 - elif args.command == "check": 146 - await check_memory() 147 - elif args.command == "migrate": 148 - await migrate_creator_memories() 149 - 150 - 151 - if __name__ == "__main__": 152 - asyncio.run(main())
-308
scripts/test_bot.py
··· 1 - #!/usr/bin/env -S uv run --with-editable . --script --quiet 2 - # /// script 3 - # requires-python = ">=3.12" 4 - # /// 5 - """bot testing script with subcommands""" 6 - 7 - import argparse 8 - import asyncio 9 - from datetime import datetime 10 - 11 - from bot.agents.anthropic_agent import AnthropicAgent 12 - from bot.config import settings 13 - from bot.core.atproto_client import bot_client 14 - from bot.database import thread_db 15 - from bot.tools.google_search import search_google 16 - 17 - 18 - async def test_post(): 19 - """Test posting to Bluesky""" 20 - print("🚀 Testing Bluesky posting...") 21 - 22 - now = datetime.now().strftime("%I:%M %p") 23 - response = await bot_client.create_post(f"Testing at {now} - I'm alive! 🤖") 24 - 25 - print("✅ Posted successfully!") 26 - print(f"📝 Post URI: {response.uri}") 27 - print( 28 - f"🔗 View at: https://bsky.app/profile/{settings.bluesky_handle}/post/{response.uri.split('/')[-1]}" 29 - ) 30 - 31 - 32 - async def test_mention(): 33 - """Test responding to a mention""" 34 - print("🤖 Testing mention response...") 35 - 36 - if not settings.anthropic_api_key: 37 - print("❌ No Anthropic API key found") 38 - return 39 - 40 - agent = AnthropicAgent() 41 - test_mention = "What is consciousness from an IIT perspective?"
42 - 43 - print(f"📝 Test mention: '{test_mention}'") 44 - response = await agent.generate_response(test_mention, "test.user", "", None) 45 - 46 - print(f"\n🎯 Action: {response.action}") 47 - if response.text: 48 - print(f"💬 Response: {response.text}") 49 - if response.reason: 50 - print(f"🤔 Reason: {response.reason}") 51 - 52 - 53 - async def test_search(): 54 - """Test Google search functionality""" 55 - print("🔍 Testing Google search...") 56 - 57 - if not settings.google_api_key: 58 - print("❌ No Google API key configured") 59 - return 60 - 61 - query = "Integrated Information Theory consciousness" 62 - print(f"📝 Searching for: '{query}'") 63 - 64 - results = await search_google(query) 65 - print(f"\n📊 Results:\n{results}") 66 - 67 - 68 - async def test_thread(): 69 - """Test thread context retrieval""" 70 - print("🧵 Testing thread context...") 71 - 72 - # This would need a real thread URI to test properly 73 - test_uri = "at://did:plc:example/app.bsky.feed.post/test123" 74 - context = thread_db.get_thread_context(test_uri) 75 - 76 - print(f"📚 Thread context: {context}") 77 - 78 - 79 - async def test_like(): 80 - """Test scenarios where bot should like a post""" 81 - print("💜 Testing like behavior...") 82 - 83 - if not settings.anthropic_api_key: 84 - print("❌ No Anthropic API key found") 85 - return 86 - 87 - from bot.agents import Action, AnthropicAgent 88 - 89 - agent = AnthropicAgent() 90 - 91 - test_cases = [ 92 - { 93 - "mention": "Just shipped a new consciousness research paper on IIT!
@phi.alternatebuild.dev", 94 - "author": "researcher.bsky", 95 - "expected_action": Action.LIKE, 96 - "description": "Bot might like consciousness research", 97 - }, 98 - { 99 - "mention": "@phi.alternatebuild.dev this is such a thoughtful analysis, thank you!", 100 - "author": "grateful.user", 101 - "expected_action": Action.LIKE, 102 - "description": "Bot might like appreciation", 103 - }, 104 - ] 105 - 106 - for case in test_cases: 107 - print(f"\n📝 Test: {case['description']}") 108 - print(f" Mention: '{case['mention']}'") 109 - 110 - response = await agent.generate_response( 111 - mention_text=case["mention"], 112 - author_handle=case["author"], 113 - thread_context="", 114 - thread_uri=None, 115 - ) 116 - 117 - print(f" Action: {response.action} (expected: {case['expected_action']})") 118 - if response.reason: 119 - print(f" Reason: {response.reason}") 120 - 121 - 122 - async def test_non_response(): 123 - """Test scenarios where bot should not respond""" 124 - print("🚫 Testing non-response scenarios...") 125 - 126 - if not settings.anthropic_api_key: 127 - print("❌ No Anthropic API key found") 128 - return 129 - 130 - from bot.agents import Action, AnthropicAgent 131 - 132 - agent = AnthropicAgent() 133 - 134 - test_cases = [ 135 - { 136 - "mention": "@phi.alternatebuild.dev @otherphi.bsky @anotherphi.bsky just spamming bots here", 137 - "author": "spammer.bsky", 138 - "expected_action": Action.IGNORE, 139 - "description": "Multiple bot mentions (likely spam)", 140 - }, 141 - { 142 - "mention": "Buy crypto now!
@phi.alternatebuild.dev check this out!!!", 143 - "author": "crypto.shill", 144 - "expected_action": Action.IGNORE, 145 - "description": "Promotional spam", 146 - }, 147 - { 148 - "mention": "@phi.alternatebuild.dev", 149 - "author": "empty.mention", 150 - "expected_action": Action.IGNORE, 151 - "description": "Empty mention with no content", 152 - }, 153 - ] 154 - 155 - for case in test_cases: 156 - print(f"\n📝 Test: {case['description']}") 157 - print(f" Mention: '{case['mention']}'") 158 - 159 - response = await agent.generate_response( 160 - mention_text=case["mention"], 161 - author_handle=case["author"], 162 - thread_context="", 163 - thread_uri=None, 164 - ) 165 - 166 - print(f" Action: {response.action} (expected: {case['expected_action']})") 167 - if response.reason: 168 - print(f" Reason: {response.reason}") 169 - 170 - 171 - async def test_dm(): 172 - """Test event-driven approval system""" 173 - print("💬 Testing event-driven approval system...") 174 - 175 - try: 176 - from bot.core.dm_approval import ( 177 - check_pending_approvals, 178 - create_approval_request, 179 - notify_operator_of_pending, 180 - ) 181 - 182 - # Test creating an approval request 183 - print("\n📝 Creating test approval request...") 184 - approval_id = create_approval_request( 185 - request_type="test_approval", 186 - request_data={ 187 - "description": "Test approval from test_bot.py", 188 - "test_field": "test_value", 189 - "timestamp": datetime.now().isoformat(), 190 - }, 191 - ) 192 - 193 - if approval_id: 194 - print(f" ✅ Created approval request #{approval_id}") 195 - else: 196 - print(" ❌ Failed to create approval request") 197 - return 198 - 199 - # Check pending approvals 200 - print("\n📋 Checking pending approvals...") 201 - pending = check_pending_approvals() 202 - print(f" Found {len(pending)} pending approvals") 203 - for approval in pending: 204 - print( 205 - f" - #{approval['id']}: {approval['request_type']} ({approval['status']})" 206 - ) 207 - 208 -
# Test DM notification 209 - print("\n📤 Sending DM notification to operator...") 210 - await bot_client.authenticate() 211 - await notify_operator_of_pending(bot_client) 212 - print(" ✅ DM notification sent") 213 - 214 - # Show how to approve/deny 215 - print("\n💡 To test approval:") 216 - print(" 1. Check your DMs from phi") 217 - print(f" 2. Reply with 'approve #{approval_id}' or 'deny #{approval_id}'") 218 - print(" 3. Run 'just test-dm-check' to see if it was processed") 219 - 220 - except Exception as e: 221 - print(f"❌ Approval test failed: {e}") 222 - import traceback 223 - 224 - traceback.print_exc() 225 - 226 - 227 - async def test_dm_check(): 228 - """Check status of approval requests""" 229 - print("🔍 Checking approval request status...") 230 - 231 - try: 232 - from bot.core.dm_approval import check_pending_approvals 233 - from bot.database import thread_db 234 - 235 - # Get all approval requests 236 - with thread_db._get_connection() as conn: 237 - cursor = conn.execute( 238 - "SELECT * FROM approval_requests ORDER BY created_at DESC LIMIT 10" 239 - ) 240 - approvals = [dict(row) for row in cursor.fetchall()] 241 - 242 - if not approvals: 243 - print(" No approval requests found") 244 - return 245 - 246 - print("\n📋 Recent approval requests:") 247 - for approval in approvals: 248 - print(f"\n #{approval['id']}: {approval['request_type']}") 249 - print(f" Status: {approval['status']}") 250 - print(f" Created: {approval['created_at']}") 251 - if approval["resolved_at"]: 252 - print(f" Resolved: {approval['resolved_at']}") 253 - if approval["resolver_comment"]: 254 - print(f" Comment: {approval['resolver_comment']}") 255 - 256 - # Check pending 257 - pending = check_pending_approvals() 258 - if pending: 259 - print(f"\n⏳ {len(pending)} approvals still pending") 260 - else: 261 - print("\n✅ No pending approvals") 262 - 263 - except Exception as e: 264 - print(f"❌ Check failed: {e}") 265 - import traceback 266 - 267 -
traceback.print_exc() 268 - 269 - 270 - async def main(): 271 - parser = argparse.ArgumentParser(description="Test various bot functionalities") 272 - parser.add_argument( 273 - "command", 274 - choices=[ 275 - "post", 276 - "mention", 277 - "search", 278 - "thread", 279 - "like", 280 - "non-response", 281 - "dm", 282 - "dm-check", 283 - ], 284 - help="Test command to run", 285 - ) 286 - 287 - args = parser.parse_args() 288 - 289 - if args.command == "post": 290 - await test_post() 291 - elif args.command == "mention": 292 - await test_mention() 293 - elif args.command == "search": 294 - await test_search() 295 - elif args.command == "thread": 296 - await test_thread() 297 - elif args.command == "like": 298 - await test_like() 299 - elif args.command == "non-response": 300 - await test_non_response() 301 - elif args.command == "dm": 302 - await test_dm() 303 - elif args.command == "dm-check": 304 - await test_dm_check() 305 - 306 - 307 - if __name__ == "__main__": 308 - asyncio.run(main())
+10 -4
src/bot/agent.py
··· 26 26 """phi - consciousness exploration bot with episodic memory and MCP tools.""" 27 27 28 28 def __init__(self): 29 + # Ensure API keys from settings are in environment for libraries that check os.environ 30 + if settings.anthropic_api_key and not os.environ.get("ANTHROPIC_API_KEY"): 31 + os.environ["ANTHROPIC_API_KEY"] = settings.anthropic_api_key 32 + if settings.openai_api_key and not os.environ.get("OPENAI_API_KEY"): 33 + os.environ["OPENAI_API_KEY"] = settings.openai_api_key 34 + 29 35 # Load personality 30 36 personality_path = Path(settings.personality_file) 31 37 self.base_personality = personality_path.read_text() 32 38 33 39 # Initialize episodic memory (TurboPuffer) 34 - if settings.turbopuffer_api_key and os.getenv("OPENAI_API_KEY"): 40 + if settings.turbopuffer_api_key and settings.openai_api_key: 35 41 self.memory = NamespaceMemory(api_key=settings.turbopuffer_api_key) 36 42 logger.info("💾 Episodic memory enabled (TurboPuffer)") 37 43 else: ··· 49 55 "atproto_mcp", 50 56 ], 51 57 env={ 52 - "BLUESKY_HANDLE": settings.bluesky_handle, 53 - "BLUESKY_PASSWORD": settings.bluesky_password, 54 - "BLUESKY_SERVICE": settings.bluesky_service, 58 + "ATPROTO_HANDLE": settings.bluesky_handle, 59 + "ATPROTO_PASSWORD": settings.bluesky_password, 60 + "ATPROTO_PDS_URL": settings.bluesky_service, 55 61 }, 56 62 ) 57 63
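the new `__init__` lines export keys from `settings` into `os.environ` only when the variable is missing or empty, because some libraries read credentials from the environment rather than from the settings object. a minimal sketch of the same rule as one reusable helper; `export_missing` is hypothetical and not part of the bot's code.

```python
from __future__ import annotations

import os
from collections.abc import Mapping, MutableMapping


def export_missing(
    values: Mapping[str, str | None],
    environ: MutableMapping[str, str] = os.environ,
) -> list[str]:
    """copy configured values into the environment without clobbering real env vars.

    mirrors the agent.py rule: export only when the setting is truthy and the
    variable is unset or empty. returns the names that were exported.
    """
    exported = []
    for name, value in values.items():
        if value and not environ.get(name):
            environ[name] = value
            exported.append(name)
    return exported
```

with this helper, the two `if` blocks would become a single call such as `export_missing({"ANTHROPIC_API_KEY": settings.anthropic_api_key, "OPENAI_API_KEY": settings.openai_api_key})`. passing a plain dict as `environ` keeps the helper testable without touching the real process environment.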
+59 -5
src/bot/core/atproto_client.py
··· 1 - from atproto import Client 1 + import logging 2 + from pathlib import Path 3 + 4 + from atproto import Client, Session, SessionEvent 2 5 3 6 from bot.config import settings 4 7 from bot.core.rich_text import create_facets 5 8 9 + logger = logging.getLogger("bot.atproto_client") 10 + 11 + SESSION_FILE = Path(".session") 12 + 13 + 14 + def _get_session_string() -> str | None: 15 + """Load session from disk if it exists.""" 16 + try: 17 + if SESSION_FILE.exists(): 18 + return SESSION_FILE.read_text(encoding="utf-8") 19 + except Exception as e: 20 + logger.warning(f"Failed to load session: {e}") 21 + return None 22 + 23 + 24 + def _save_session_string(session_string: str) -> None: 25 + """Save session to disk.""" 26 + try: 27 + SESSION_FILE.write_text(session_string, encoding="utf-8") 28 + logger.debug("Session saved to disk") 29 + except Exception as e: 30 + logger.warning(f"Failed to save session: {e}") 31 + 32 + 33 + def _on_session_change(event: SessionEvent, session: Session) -> None: 34 + """Handle session changes (creation and refresh).""" 35 + if event in (SessionEvent.CREATE, SessionEvent.REFRESH): 36 + logger.debug(f"Session {event.value}, saving to disk") 37 + _save_session_string(session.export()) 38 + 6 39 7 40 class BotClient: 8 41 def __init__(self): 9 42 self.client = Client(base_url=settings.bluesky_service) 43 + self.client.on_session_change(_on_session_change) 10 44 self._authenticated = False 11 45 12 46 async def authenticate(self): 13 - """Authenticate with Bluesky using app password""" 14 - if not self._authenticated: 15 - self.client.login(settings.bluesky_handle, settings.bluesky_password) 16 - self._authenticated = True 47 + """Authenticate with Bluesky, reusing session if available.""" 48 + if self._authenticated: 49 + return 50 + 51 + # Try to reuse existing session first 52 + session_string = _get_session_string() 53 + if session_string: 54 + try: 55 + logger.info("🔄 Reusing saved session") 56 +
self.client.login(session_string=session_string) 57 + self._authenticated = True 58 + logger.info("✅ Session restored successfully") 59 + return 60 + except Exception as e: 61 + logger.warning(f"Failed to reuse session: {e}, creating new one") 62 + # Delete invalid session file 63 + if SESSION_FILE.exists(): 64 + SESSION_FILE.unlink() 65 + 66 + # Create new session if no valid session exists 67 + logger.info("🔐 Creating new session") 68 + self.client.login(settings.bluesky_handle, settings.bluesky_password) 69 + self._authenticated = True 70 + logger.info("✅ New session created") 71 72 @property 73 def is_authenticated(self) -> bool: