···99- **Duplicate Management**: Manual curation of duplicate entries across feeds
1010- **Modern CLI**: Built with Typer and Rich for beautiful terminal output
1111- **Comprehensive Parsing**: Supports RSS 0.9x, RSS 1.0, RSS 2.0, and Atom feeds
1212-- **Zulip Bot Integration**: Automatically post new feed articles to Zulip chat
1312- **Cron-Friendly**: Designed for scheduled execution
14131514## Installation
···110109# Remove duplicate mapping
111110thicket duplicates remove "https://example.com/dup"
112111```
113113-114114-### Zulip Bot Integration
115115-```bash
116116-# Test bot functionality
117117-thicket bot test
118118-119119-# Show bot status
120120-thicket bot status
121121-122122-# Run bot (requires configuration)
123123-thicket bot run --config bot-config/zuliprc
124124-```
125125-126126-**Bot Setup:**
127127-1. Create a Zulip bot in your organization
128128-2. Copy `bot-config/zuliprc.template` to `bot-config/zuliprc`
129129-3. Configure with your bot's credentials
130130-4. Run the bot and configure via Zulip chat:
131131- ```
132132- @thicket config path /path/to/thicket.yaml
133133- @thicket config stream general
134134- @thicket config topic "Feed Updates"
135135- ```
136136-137137-See [docs/ZULIP_BOT.md](docs/ZULIP_BOT.md) for detailed setup instructions.
138112139113## Configuration
140114
-400
SPEC.md
···11-# Thicket Git Store Specification
22-33-This document comprehensively defines the JSON format and structure of the Thicket Git repository, enabling third-party clients to read and write to the store while leveraging Thicket's existing Python classes for data validation and business logic.
44-55-## Overview
66-77-The Thicket Git store is a structured repository that persists Atom/RSS feed entries in JSON format. The store is designed to be both human-readable and machine-parseable, with a clear directory structure and standardized JSON schemas.
88-99-## Repository Structure
1010-1111-```
1212-<git_store>/
1313-├── index.json # Main index of all users and metadata
1414-├── duplicates.json # Maps duplicate entry IDs to canonical IDs
1515-├── index.opml # OPML export of all feeds (generated)
1616-├── <username1>/ # User directory (sanitized username)
1717-│   ├── <entry_id1>.json # Individual feed entry
1818-│   ├── <entry_id2>.json # Individual feed entry
1919-│   └── ...
2020-├── <username2>/
2121-│   ├── <entry_id3>.json
2222-│   └── ...
2323-└── ...
2424-```
2525-2626-## JSON Schemas
2727-2828-### 1. Index File (`index.json`)
2929-3030-The main index tracks all users, their metadata, and repository statistics.
3131-3232-**Schema:**
3333-```json
3434-{
3535- "users": {
3636- "<username>": {
3737- "username": "string",
3838- "display_name": "string | null",
3939- "email": "string | null",
4040- "homepage": "string (URL) | null",
4141- "icon": "string (URL) | null",
4242- "feeds": ["string (URL)", ...],
4343- "zulip_associations": [
4444- {
4545- "server": "string",
4646- "user_id": "string"
4747- },
4848- ...
4949- ],
5050- "directory": "string",
5151- "created": "string (ISO 8601 datetime)",
5252- "last_updated": "string (ISO 8601 datetime)",
5353- "entry_count": "integer"
5454- }
5555- },
5656- "created": "string (ISO 8601 datetime)",
5757- "last_updated": "string (ISO 8601 datetime)",
5858- "total_entries": "integer"
5959-}
6060-```
6161-6262-**Example:**
6363-```json
6464-{
6565- "users": {
6666- "johndoe": {
6767- "username": "johndoe",
6868- "display_name": "John Doe",
6969- "email": "john@example.com",
7070- "homepage": "https://johndoe.blog",
7171- "icon": "https://johndoe.blog/avatar.png",
7272- "feeds": [
7373- "https://johndoe.blog/feed.xml",
7474- "https://johndoe.blog/categories/tech/feed.xml"
7575- ],
7676- "zulip_associations": [
7777- {
7878- "server": "myorg.zulipchat.com",
7979- "user_id": "john.doe"
8080- },
8181- {
8282- "server": "community.zulipchat.com",
8383- "user_id": "johndoe@example.com"
8484- }
8585- ],
8686- "directory": "johndoe",
8787- "created": "2024-01-15T10:30:00",
8888- "last_updated": "2024-01-20T14:22:00",
8989- "entry_count": 42
9090- }
9191- },
9292- "created": "2024-01-15T10:30:00",
9393- "last_updated": "2024-01-20T14:22:00",
9494- "total_entries": 42
9595-}
9696-```
9797-9898-### 2. Duplicates File (`duplicates.json`)
9999-100100-Maps duplicate entry IDs to their canonical representations to handle feed entries that appear with different IDs but identical content.
101101-102102-**Schema:**
103103-```json
104104-{
105105- "duplicates": {
106106- "<duplicate_id>": "<canonical_id>"
107107- },
108108- "comment": "Entry IDs that map to the same canonical content"
109109-}
110110-```
111111-112112-**Example:**
113113-```json
114114-{
115115- "duplicates": {
116116- "https://example.com/posts/123?utm_source=rss": "https://example.com/posts/123",
117117- "https://example.com/feed/item-duplicate": "https://example.com/feed/item-original"
118118- },
119119- "comment": "Entry IDs that map to the same canonical content"
120120-}
121121-```
122122-123123-### 3. Feed Entry Files (`<username>/<entry_id>.json`)
124124-125125-Individual feed entries are stored as normalized Atom entries, regardless of their original format (RSS/Atom).
126126-127127-**Schema:**
128128-```json
129129-{
130130- "id": "string",
131131- "title": "string",
132132- "link": "string (URL)",
133133- "updated": "string (ISO 8601 datetime)",
134134- "published": "string (ISO 8601 datetime) | null",
135135- "summary": "string | null",
136136- "content": "string | null",
137137- "content_type": "html | text | xhtml",
138138- "author": {
139139- "name": "string | null",
140140- "email": "string | null",
141141- "uri": "string (URL) | null"
142142- } | null,
143143- "categories": ["string", ...],
144144- "rights": "string | null",
145145- "source": "string (URL) | null"
146146-}
147147-```
148148-149149-**Example:**
150150-```json
151151-{
152152- "id": "https://johndoe.blog/posts/my-first-post",
153153- "title": "My First Blog Post",
154154- "link": "https://johndoe.blog/posts/my-first-post",
155155- "updated": "2024-01-20T14:22:00",
156156- "published": "2024-01-20T09:00:00",
157157- "summary": "This is a summary of my first blog post.",
158158- "content": "<p>This is the full content of my <strong>first</strong> blog post with HTML formatting.</p>",
159159- "content_type": "html",
160160- "author": {
161161- "name": "John Doe",
162162- "email": "john@example.com",
163163- "uri": "https://johndoe.blog"
164164- },
165165- "categories": ["blogging", "personal"],
166166- "rights": "Copyright 2024 John Doe",
167167- "source": "https://johndoe.blog/feed.xml"
168168-}
169169-```
170170-171171-## Python Class Integration
172172-173173-To leverage Thicket's existing validation and business logic, third-party clients should use the following Python classes from the `thicket.models` package:
174174-175175-### Core Data Models
176176-177177-```python
178178-from thicket.models import (
179179- AtomEntry, # Feed entry representation
180180- GitStoreIndex, # Repository index
181181- UserMetadata, # User information
182182- DuplicateMap, # Duplicate ID mappings
183183- FeedMetadata, # Feed-level metadata
184184- ThicketConfig, # Configuration
185185- UserConfig, # User configuration
186186- ZulipAssociation # Zulip server/user_id pairs
187187-)
188188-```
189189-190190-### Repository Operations
191191-192192-```python
193193-from thicket.core.git_store import GitStore
194194-from thicket.core.feed_parser import FeedParser
195195-196196-# Initialize git store
197197-store = GitStore(Path("/path/to/git/store"))
198198-199199-# Read data
200200-index = store._load_index() # Load index.json
201201-user = store.get_user("username") # Get user metadata
202202-entries = store.list_entries("username", limit=10)
203203-entry = store.get_entry("username", "entry_id")
204204-duplicates = store.get_duplicates() # Load duplicates.json
205205-206206-# Write data
207207-store.add_user("username", display_name="Display Name")
208208-store.store_entry("username", atom_entry)
209209-store.add_duplicate("duplicate_id", "canonical_id")
210210-store.commit_changes("Commit message")
211211-212212-# Zulip associations
213213-store.add_zulip_association("username", "myorg.zulipchat.com", "user@example.com")
214214-store.remove_zulip_association("username", "myorg.zulipchat.com", "user@example.com")
215215-associations = store.get_zulip_associations("username")
216216-217217-# Search and statistics
218218-results = store.search_entries("query", username="optional")
219219-stats = store.get_stats()
220220-```
221221-222222-### Feed Processing
223223-224224-```python
225225-from thicket.core.feed_parser import FeedParser
226226-from pydantic import HttpUrl
227227-228228-parser = FeedParser()
229229-230230-# Fetch and parse feeds
231231-content = await parser.fetch_feed(HttpUrl("https://example.com/feed.xml"))
232232-feed_metadata, entries = parser.parse_feed(content, HttpUrl("https://example.com/feed.xml"))
233233-234234-# Entry ID sanitization for filenames
235235-safe_filename = parser.sanitize_entry_id(entry.id)
236236-```
237237-238238-## File Naming and ID Sanitization
239239-240240-Entry IDs from feeds are sanitized to create safe filenames using `FeedParser.sanitize_entry_id()`:
241241-242242-- URLs are parsed and the path component is used as the base
243243-- Characters are limited to alphanumeric, hyphens, underscores, and periods
244244-- Other characters are replaced with underscores
245245-- Maximum length is 200 characters
246246-- Empty results default to "entry"
247247-248248-**Examples:**
249249-- `https://example.com/posts/my-post` → `posts_my-post.json`
250250-- `https://blog.com/2024/01/title?utm=source` → `2024_01_title.json`
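
The rules above can be sketched as follows. This is an illustrative re-implementation, not the body of `FeedParser.sanitize_entry_id()`; treating "alphanumeric" as ASCII only, stripping leading and trailing slashes, and falling back to the raw ID for non-URL IDs are assumptions.

```python
import re
from urllib.parse import urlparse

MAX_LENGTH = 200  # maximum length stated in the rules above


def sanitize_entry_id(entry_id: str) -> str:
    """Turn a feed entry ID into a filesystem-safe name (sketch only)."""
    parsed = urlparse(entry_id)
    # Use the URL path when the ID looks like a URL; otherwise use the ID itself.
    base = parsed.path if parsed.scheme and parsed.netloc else entry_id
    base = base.strip("/")
    # Anything outside ASCII letters, digits, '.', '_' and '-' becomes '_'.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    safe = safe[:MAX_LENGTH]
    # An empty result falls back to the default name.
    return safe or "entry"


filename = sanitize_entry_id("https://example.com/posts/my-post") + ".json"
```

The query string is dropped because only the path component is used, which is why the `utm=source` example maps to `2024_01_title.json`.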
251251-252252-## Data Validation
253253-254254-All JSON data should be validated using Pydantic models before writing to the store:
255255-256256-```python
257257-from thicket.models import AtomEntry
258258-from pydantic import ValidationError
259259-260260-try:
261261- entry = AtomEntry(**json_data)
262262- # Data is valid, safe to store
263263- store.store_entry(username, entry)
264264-except ValidationError as e:
265265- # Handle validation errors
266266- print(f"Invalid entry data: {e}")
267267-```
268268-269269-## Timestamps
270270-271271-All timestamps use ISO 8601 format in UTC:
272272-- `created`: When the record was first created
273273-- `last_updated`: When the record was last modified
274274-- `updated`: When the feed entry was last updated (from feed)
275275-- `published`: When the feed entry was originally published (from feed)
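
The example timestamps in this document carry no UTC offset (for instance `2024-01-15T10:30:00`), so a client has to decide how to interpret them. A minimal sketch, assuming that offset-less values are meant as UTC; the helper name `parse_store_timestamp` is hypothetical:

```python
from datetime import datetime, timezone


def parse_store_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string and return a timezone-aware UTC datetime."""
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Convert values that carry another offset to UTC.
    return dt.astimezone(timezone.utc)


created = parse_store_timestamp("2024-01-15T10:30:00")
```

Normalizing on read keeps later comparisons and sorting free of mixed naive and aware values.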
276276-277277-## Content Sanitization
278278-279279-HTML content in entries is sanitized using the `FeedParser._sanitize_html()` method to prevent XSS attacks. Allowed tags and attributes are strictly controlled.
280280-281281-**Allowed HTML tags:**
282282-`a`, `abbr`, `acronym`, `b`, `blockquote`, `br`, `code`, `em`, `i`, `li`, `ol`, `p`, `pre`, `strong`, `ul`, `h1`-`h6`, `img`, `div`, `span`
283283-284284-**Allowed attributes:**
285285-- `a`: `href`, `title`
286286-- `img`: `src`, `alt`, `title`, `width`, `height`
287287-- `blockquote`: `cite`
288288-- `abbr`/`acronym`: `title`
289289-290290-## Error Handling and Robustness
291291-292292-The store is designed to be fault-tolerant:
293293-294294-- Invalid entries are skipped during processing with error logging
295295-- Malformed JSON files are ignored in listings
296296-- Missing files return `None` rather than raising exceptions
297297-- Git operations are atomic where possible
298298-299299-## Example Usage
300300-301301-### Reading the Store
302302-303303-```python
304304-from pathlib import Path
305305-from thicket.core.git_store import GitStore
306306-307307-# Initialize
308308-store = GitStore(Path("/path/to/thicket/store"))
309309-310310-# Get all users
311311-index = store._load_index()
312312-for username, user_metadata in index.users.items():
313313- print(f"User: {user_metadata.display_name} ({username})")
314314- print(f" Feeds: {user_metadata.feeds}")
315315- print(f" Entries: {user_metadata.entry_count}")
316316-317317-# Get recent entries for a user
318318-entries = store.list_entries("johndoe", limit=5)
319319-for entry in entries:
320320- print(f" - {entry.title} ({entry.updated})")
321321-```
322322-323323-### Adding Data
324324-325325-```python
326326-from thicket.models import AtomEntry
327327-from datetime import datetime
328328-from pydantic import HttpUrl
329329-330330-# Create entry
331331-entry = AtomEntry(
332332- id="https://example.com/new-post",
333333- title="New Post",
334334- link=HttpUrl("https://example.com/new-post"),
335335- updated=datetime.now(),
336336- content="<p>Post content</p>",
337337- content_type="html"
338338-)
339339-340340-# Store entry
341341-store.store_entry("johndoe", entry)
342342-store.commit_changes("Add new blog post")
343343-```
344344-345345-## Zulip Integration
346346-347347-The Thicket Git store supports Zulip bot integration for automatic feed posting with user mentions.
348348-349349-### Zulip Associations
350350-351351-Users can be associated with their Zulip identities to enable @mentions:
352352-353353-```python
354354-# UserMetadata includes zulip_associations field
355355-user.zulip_associations = [
356356- ZulipAssociation(server="myorg.zulipchat.com", user_id="alice"),
357357- ZulipAssociation(server="other.zulipchat.com", user_id="alice@example.com")
358358-]
359359-360360-# Methods for managing associations
361361-user.add_zulip_association("myorg.zulipchat.com", "alice")
362362-user.get_zulip_mention("myorg.zulipchat.com") # Returns "alice"
363363-user.remove_zulip_association("myorg.zulipchat.com", "alice")
364364-```
365365-366366-### CLI Management
367367-368368-```bash
369369-# Add association
370370-thicket zulip-add alice myorg.zulipchat.com alice@example.com
371371-372372-# Remove association
373373-thicket zulip-remove alice myorg.zulipchat.com alice@example.com
374374-375375-# List associations
376376-thicket zulip-list # All users
377377-thicket zulip-list alice # Specific user
378378-379379-# Bulk import from CSV
380380-thicket zulip-import associations.csv
381381-```
382382-383383-### Bot Behavior
384384-385385-When the Thicket Zulip bot posts articles:
386386-387387-1. It checks for Zulip associations matching the current server
388388-2. If found, adds @mention to the post: `@**alice** posted:`
389389-3. The mentioned user receives a notification in Zulip
390390-391391-This enables automatic notifications when someone's blog post is shared.
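
The lookup described in the steps above could look roughly like this. It is a sketch under assumptions: `Association` is a stand-in dataclass for `ZulipAssociation`, and `mention_prefix` is a hypothetical helper, not a function of the bot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Association:
    """Stand-in for ZulipAssociation: one server/user_id pair."""
    server: str
    user_id: str


def mention_prefix(associations: list[Association], server: str) -> Optional[str]:
    """Return the mention line for the first association on this server, if any."""
    for assoc in associations:
        if assoc.server == server:
            return f"@**{assoc.user_id}** posted:"
    # No association for this server: the post goes out without a mention.
    return None


prefix = mention_prefix(
    [Association("myorg.zulipchat.com", "alice")], "myorg.zulipchat.com"
)
```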
392392-393393-## Versioning and Compatibility
394394-395395-This specification describes version 1.1 of the Thicket Git store format. Changes from 1.0:
396396-- Added `zulip_associations` field to UserMetadata (backwards compatible - defaults to empty list)
397397-398398-Future versions will maintain backward compatibility where possible, with migration tools provided for breaking changes.
399399-400400-To check the store format version, examine the repository structure and JSON schemas. Stores created by Thicket 0.1.0+ follow this specification.
-97
bot-config/README.md
···11-# Thicket Bot Configuration
22-33-This directory contains configuration files for the Thicket Zulip bot.
44-55-## Setup Instructions
66-77-### 1. Zulip Bot Configuration
88-99-1. Copy `zuliprc.template` to `zuliprc`:
1010- ```bash
1111- cp bot-config/zuliprc.template bot-config/zuliprc
1212- ```
1313-1414-2. Create a bot in your Zulip organization:
1515- - Go to Settings > Your bots > Add a new bot
1616- - Choose "Generic bot" type
1717- - Give it a name like "Thicket" and username like "thicket"
1818- - Copy the bot's email and API key
1919-2020-3. Edit `bot-config/zuliprc` with your bot's credentials:
2121- ```ini
2222- [api]
2323- email=thicket-bot@your-org.zulipchat.com
2424- key=your-actual-api-key-here
2525- site=https://your-org.zulipchat.com
2626- ```
2727-2828-### 2. Bot Behavior Configuration (Optional)
2929-3030-1. Copy `botrc.template` to `botrc` to customize bot behavior:
3131- ```bash
3232- cp bot-config/botrc.template bot-config/botrc
3333- ```
3434-3535-2. Edit `bot-config/botrc` to customize:
3636- - Sync intervals and batch sizes
3737- - Default stream/topic settings
3838- - Rate limiting parameters
3939- - Notification preferences
4040-4141-**Note**: The bot will work with default settings if no `botrc` file exists.
4242-4343-## File Descriptions
4444-4545-### `zuliprc` (Required)
4646-Contains Zulip API credentials for the bot. This file should **never** be committed to version control.
4747-4848-### `botrc` (Optional)
4949-Contains bot behavior configuration and defaults. This file can be committed to version control as it contains no secrets.
5050-5151-### Template Files
5252-- `zuliprc.template` - Template for Zulip credentials
5353-- `botrc.template` - Template for bot behavior settings
5454-5555-## Running the Bot
5656-5757-Once configured, run the bot with:
5858-5959-```bash
6060-# Run in foreground
6161-thicket bot run
6262-6363-# Run in background (daemon mode)
6464-thicket bot run --daemon
6565-6666-# Debug mode (sends DMs instead of stream posts)
6767-thicket bot run --debug-user your-thicket-username
6868-6969-# Custom config paths
7070-thicket bot run --config bot-config/zuliprc --botrc bot-config/botrc
7171-```
7272-7373-## Bot Commands
7474-7575-Once running, interact with the bot in Zulip:
7676-7777-- `@thicket help` - Show available commands
7878-- `@thicket status` - Show bot status and configuration
7979-- `@thicket sync now` - Force immediate sync
8080-- `@thicket schedule` - Show sync schedule
8181-- `@thicket claim <username>` - Claim a thicket username
8282-- `@thicket config <setting> <value>` - Change bot settings
8383-8484-## Security Notes
8585-8686-- **Never commit `zuliprc` with real credentials**
8787-- Add `bot-config/zuliprc` to `.gitignore`
8888-- The `botrc` file contains no secrets and can be safely committed
8989-- Bot settings changed via chat are stored in Zulip's persistent storage
9090-9191-## Troubleshooting
9292-9393-- Check bot status: `thicket bot status`
9494-- View bot logs when running in foreground mode
9595-- Verify Zulip credentials are correct
9696-- Ensure thicket.yaml configuration exists
9797-- Test bot functionality: `thicket bot test`
-28
bot-config/botrc
···11-[bot]
22-# Default RSS feed polling interval in seconds (minimum 60)
33-sync_interval = 300
44-55-# Maximum number of entries to post per sync cycle
66-max_entries_per_sync = 10
77-88-# Default stream and topic for posting (can be overridden via chat commands)
99-# Leave empty to require configuration via chat
1010-default_stream =
1111-default_topic =
1212-1313-# Rate limiting: seconds to wait between batches of posts
1414-rate_limit_delay = 5
1515-1616-# Number of posts per batch before applying rate limit
1717-posts_per_batch = 5
1818-1919-[catchup]
2020-# Number of entries to post on first run (catchup mode)
2121-catchup_entries = 5
2222-2323-[notifications]
2424-# Whether to send notifications when bot configuration changes
2525-config_change_notifications = true
2626-2727-# Whether to send notifications when users claim usernames
2828-username_claim_notifications = true
-34
bot-config/botrc.template
···11-[bot]
22-# Default RSS feed polling interval in seconds (minimum 60)
33-sync_interval = 300
44-55-# Maximum number of entries to post per sync cycle (1-50)
66-max_entries_per_sync = 10
77-88-# Default stream and topic for posting (can be overridden via chat commands)
99-# Leave empty to require configuration via chat
1010-default_stream =
1111-default_topic =
1212-1313-# Rate limiting: seconds to wait between batches of posts
1414-rate_limit_delay = 5
1515-1616-# Number of posts per batch before applying rate limit
1717-posts_per_batch = 5
1818-1919-[catchup]
2020-# Number of entries to post on first run (catchup mode)
2121-catchup_entries = 5
2222-2323-[notifications]
2424-# Whether to send notifications when bot configuration changes
2525-config_change_notifications = true
2626-2727-# Whether to send notifications when users claim usernames
2828-username_claim_notifications = true
2929-3030-# Instructions:
3131-# 1. Copy this file to botrc (without .template extension) to customize bot behavior
3232-# 2. The bot will use these defaults if no botrc file is found
3333-# 3. All settings can be overridden via chat commands (e.g., @mention config interval 600)
3434-# 4. Settings changed via chat are persisted in Zulip storage and take precedence
-16
bot-config/zuliprc.template
···11-[api]
22-# Your bot's email address (create this in Zulip Settings > Bots)
33-email=your-bot@your-organization.zulipchat.com
44-55-# Your bot's API key (found in Zulip Settings > Bots)
66-key=YOUR_BOT_API_KEY_HERE
77-88-# Your Zulip server URL
99-site=https://your-organization.zulipchat.com
1010-1111-# Instructions:
1212-# 1. Copy this file to zuliprc (without .template extension)
1313-# 2. Replace the placeholder values with your actual bot credentials
1414-# 3. Create a bot in your Zulip organization at Settings > Bots
1515-# 4. Use the bot's email and API key from the Zulip interface
1616-# 5. Never commit the actual zuliprc file with real credentials to version control
+260
code_duplication_analysis.md
···11+# Code Duplication Analysis for Thicket
22+33+## 1. Duplicate JSON Handling Code
44+55+### Pattern: JSON file reading/writing
66+**Locations:**
77+- `src/thicket/cli/commands/generate.py:230` - Reading JSON with `json.load(f)`
88+- `src/thicket/cli/commands/generate.py:249` - Reading links.json
99+- `src/thicket/cli/commands/index.py:2305` - Reading JSON
1010+- `src/thicket/cli/commands/index.py:2320` - Writing JSON with `json.dump()`
1111+- `src/thicket/cli/commands/threads.py:2456` - Reading JSON
1212+- `src/thicket/cli/commands/info.py:2683` - Reading JSON
1313+- `src/thicket/core/git_store.py:5546` - Writing JSON with custom serializer
1414+- `src/thicket/core/git_store.py:5556` - Reading JSON
1515+- `src/thicket/core/git_store.py:5566` - Writing JSON
1616+- `src/thicket/core/git_store.py:5656` - Writing JSON with model dump
1717+1818+**Recommendation:** Create a shared `json_utils.py` module:
1919+```python
2020+def read_json_file(path: Path) -> dict:
2121+ """Read JSON file with error handling."""
2222+ with open(path) as f:
2323+ return json.load(f)
2424+2525+def write_json_file(path: Path, data: dict, indent: int = 2) -> None:
2626+ """Write JSON file with consistent formatting."""
2727+ with open(path, "w") as f:
2828+ json.dump(data, f, indent=indent, default=str)
2929+3030+def write_model_json(path: Path, model: BaseModel, indent: int = 2) -> None:
3131+ """Write Pydantic model as JSON."""
3232+ with open(path, "w") as f:
3333+ json.dump(model.model_dump(mode="json", exclude_none=True), f, indent=indent, default=str)
3434+```
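
The `read_json_file` docstring mentions error handling, but the recommended body lets every exception propagate. One possible shape for that handling is sketched below; the name `read_json_or_default` and the choice to return a default for missing or malformed files are assumptions, not existing Thicket behavior.

```python
import json
from pathlib import Path
from typing import Any, Optional


def read_json_or_default(path: Path, default: Optional[Any] = None) -> Any:
    """Read a JSON file, returning `default` if it is missing or malformed."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        # Missing file: treat as "no data" rather than an error.
        return default
    except json.JSONDecodeError:
        # Malformed JSON: skip the file instead of aborting the caller.
        return default
```

Callers that need to distinguish a missing file from a corrupt one would want separate return values or a logged warning instead of a shared default.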
3535+3636+## 2. Repeated Datetime Handling
3737+3838+### Pattern: datetime formatting and fallback handling
3939+**Locations:**
4040+- `src/thicket/cli/commands/generate.py:241` - `key=lambda x: x[1].updated or x[1].published or datetime.min`
4141+- `src/thicket/cli/commands/generate.py:353` - Same pattern in thread sorting
4242+- `src/thicket/cli/commands/generate.py:359` - Same pattern for max date
4343+- `src/thicket/cli/commands/generate.py:625` - Same pattern
4444+- `src/thicket/cli/commands/generate.py:655` - `entry.updated or entry.published or datetime.min`
4545+- `src/thicket/cli/commands/generate.py:689` - Same pattern
4646+- `src/thicket/cli/commands/generate.py:702` - Same pattern
4747+- Multiple `.strftime('%Y-%m-%d')` calls throughout
4848+4949+**Recommendation:** Create a shared `datetime_utils.py` module:
5050+```python
5151+def get_entry_date(entry: AtomEntry) -> datetime:
5252+ """Get the most relevant date for an entry with fallback."""
5353+ return entry.updated or entry.published or datetime.min
5454+5555+def format_date_short(dt: datetime) -> str:
5656+ """Format datetime as YYYY-MM-DD."""
5757+ return dt.strftime('%Y-%m-%d')
5858+5959+def format_date_full(dt: datetime) -> str:
6060+ """Format datetime as YYYY-MM-DD HH:MM."""
6161+ return dt.strftime('%Y-%m-%d %H:%M')
6262+6363+def format_date_iso(dt: datetime) -> str:
6464+ """Format datetime as ISO string."""
6565+ return dt.isoformat()
6666+```
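
One caveat with the `datetime.min` fallback: `datetime.min` is timezone-naive, and Python raises `TypeError` when a naive datetime is compared with an aware one. If any entry carries an aware timestamp, sorting on the fallback expression can fail. A sketch of a key function that avoids this, assuming naive values should be read as UTC (`MIN_AWARE` and `entry_sort_key` are hypothetical names):

```python
from datetime import datetime, timezone
from typing import Optional

# Aware counterpart of datetime.min, used when an entry has no date at all.
MIN_AWARE = datetime.min.replace(tzinfo=timezone.utc)


def entry_sort_key(updated: Optional[datetime], published: Optional[datetime]) -> datetime:
    """Return a timezone-aware datetime that is always safe to compare."""
    dt = updated or published or MIN_AWARE
    if dt.tzinfo is None:
        # Assumption: naive timestamps are UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


naive = datetime(2024, 1, 20, 14, 22)
aware = datetime(2024, 1, 20, 9, 0, tzinfo=timezone.utc)
ordered = sorted(
    [entry_sort_key(naive, None), entry_sort_key(None, aware), entry_sort_key(None, None)]
)
```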
6767+6868+## 3. Path Handling Patterns
6969+7070+### Pattern: Directory creation and existence checks
7171+**Locations:**
7272+- `src/thicket/cli/commands/generate.py:225` - `if user_dir.exists()`
7373+- `src/thicket/cli/commands/generate.py:247` - `if links_file.exists()`
7474+- `src/thicket/cli/commands/generate.py:582` - `self.output_dir.mkdir(parents=True, exist_ok=True)`
7575+- `src/thicket/cli/commands/generate.py:585-586` - Multiple mkdir calls
7676+- `src/thicket/cli/commands/threads.py:2449` - `if not index_path.exists()`
7777+- `src/thicket/cli/commands/info.py:2681` - `if links_path.exists()`
7878+- `src/thicket/core/git_store.py:5515` - `if not self.repo_path.exists()`
7979+- `src/thicket/core/git_store.py:5586` - `user_dir.mkdir(exist_ok=True)`
8080+- Many more similar patterns
8181+8282+**Recommendation:** Create a shared `path_utils.py` module:
8383+```python
8484+def ensure_directory(path: Path) -> Path:
8585+ """Ensure directory exists, creating if necessary."""
8686+ path.mkdir(parents=True, exist_ok=True)
8787+ return path
8888+8989+def read_json_if_exists(path: Path, default: Any = None) -> Any:
9090+ """Read JSON file if it exists, otherwise return default."""
9191+ if path.exists():
9292+ with open(path) as f:
9393+ return json.load(f)
9494+ return default
9595+9696+def safe_path_join(*parts: Union[str, Path]) -> Path:
9797+ """Safely join path components."""
9898+ return Path(*parts)
9999+```
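
Note that `Path(*parts)` on its own does not make a join safe: on POSIX, `Path("store", "/etc/passwd")` evaluates to `/etc/passwd`, and `..` components are kept as-is. If the goal is to keep results inside a base directory, a check along these lines may be closer to the intent. The helper `join_within` is hypothetical, and requiring an explicit base directory is an assumption about how it would be used.

```python
from pathlib import Path
from typing import Union


def join_within(base: Path, *parts: Union[str, Path]) -> Path:
    """Join parts under base and reject results that escape it."""
    root = base.resolve()
    candidate = root.joinpath(*parts).resolve()
    # Accept the root itself or any path that has the root among its ancestors.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{candidate} is outside {root}")
    return candidate
```

Resolving both sides first means symlinks and `..` segments are collapsed before the containment check runs.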
100100+101101+## 4. Progress Bar and Console Output
102102+103103+### Pattern: Progress bar creation and updates
104104+**Locations:**
105105+- `src/thicket/cli/commands/generate.py:209` - Progress with SpinnerColumn
106106+- `src/thicket/cli/commands/index.py:2230` - Same Progress pattern
107107+- Multiple `console.print()` calls with similar formatting patterns
108108+- Progress update patterns repeated
109109+110110+**Recommendation:** Create a shared `ui_utils.py` module:
111111+```python
112112+def create_progress_spinner(description: str) -> tuple[Progress, TaskID]:
113113+ """Create a standard progress spinner."""
114114+ progress = Progress(
115115+ SpinnerColumn(),
116116+ TextColumn("[progress.description]{task.description}"),
117117+ transient=True,
118118+ )
119119+ task = progress.add_task(description)
120120+ return progress, task
121121+122122+def print_success(message: str) -> None:
123123+ """Print success message with consistent formatting."""
124124+ console.print(f"[green]✓[/green] {message}")
125125+126126+def print_error(message: str) -> None:
127127+ """Print error message with consistent formatting."""
128128+ console.print(f"[red]Error: {message}[/red]")
129129+130130+def print_warning(message: str) -> None:
131131+ """Print warning message with consistent formatting."""
132132+ console.print(f"[yellow]Warning: {message}[/yellow]")
133133+```
134134+135135+## 5. Git Store Operations
136136+137137+### Pattern: Entry file operations
138138+**Locations:**
139139+- Multiple patterns of loading entries from user directories
140140+- Repeated safe_id generation
141141+- Repeated user directory path construction
142142+143143+**Recommendation:** Enhance GitStore with helper methods:
144144+```python
145145+def get_user_dir(self, username: str) -> Path:
146146+ """Get user directory path."""
147147+ return self.repo_path / username
148148+149149+def iter_user_entries(self, username: str) -> Iterator[tuple[Path, AtomEntry]]:
150150+ """Iterate over all entries for a user."""
151151+ user_dir = self.get_user_dir(username)
152152+ if user_dir.exists():
153153+ for entry_file in user_dir.glob("*.json"):
154154+ if entry_file.name not in ["index.json", "duplicates.json"]:
155155+ try:
156156+ entry = self.read_entry_file(entry_file)
157157+ yield entry_file, entry
158158+ except Exception:
159159+ continue
160160+```
161161+162162+## 6. Error Handling Patterns
163163+164164+### Pattern: Try-except with console error printing
165165+**Locations:**
166166+- Similar error handling patterns throughout CLI commands
167167+- Repeated `raise typer.Exit(1)` patterns
168168+- Similar exception message formatting
169169+170170+**Recommendation:** Create error handling decorators:
171171+```python
172172+def handle_cli_errors(func):
173173+ """Decorator to handle CLI command errors consistently."""
174174+ @functools.wraps(func)
175175+ def wrapper(*args, **kwargs):
176176+ try:
177177+ return func(*args, **kwargs)
178178+ except ValidationError as e:
179179+ console.print(f"[red]Validation error: {e}[/red]")
180180+ raise typer.Exit(1)
181181+ except Exception as e:
182182+ console.print(f"[red]Error: {e}[/red]")
183183+ if kwargs.get('verbose'):
184184+ console.print_exception()
185185+ raise typer.Exit(1)
186186+ return wrapper
187187+```
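
One thing to watch with this decorator: `typer.Exit` is Click's `Exit` exception, which derives from `RuntimeError`, so the broad `except Exception` branch would also catch a command's own deliberate `raise typer.Exit(0)` and turn it into an error exit. The sketch below shows the pass-through pattern using only the standard library; `CliExit` is a hypothetical stand-in for `typer.Exit`, and `print` stands in for `console.print`.

```python
import functools


class CliExit(RuntimeError):
    """Hypothetical stand-in for typer.Exit so the sketch runs without Typer."""

    def __init__(self, code: int = 0) -> None:
        super().__init__(code)
        self.exit_code = code


def handle_cli_errors(func):
    """Report unexpected errors, but let deliberate exits pass through."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except CliExit:
            # A command asked to exit on purpose: keep its exit code.
            raise
        except Exception as e:
            print(f"Error: {e}")
            # Chain the original exception so tracebacks stay informative.
            raise CliExit(1) from e

    return wrapper


@handle_cli_errors
def clean_exit() -> None:
    raise CliExit(0)


@handle_cli_errors
def failing() -> None:
    raise ValueError("bad input")
```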
188188+189189+## 7. Configuration and Validation
190190+191191+### Pattern: Config file loading and validation
192192+**Locations:**
193193+- Repeated config loading pattern in every CLI command
194194+- Similar validation patterns for URLs and paths
195195+196196+**Recommendation:** Create a `config_utils.py` module:
197197+```python
198198+def load_config_with_defaults(config_path: Optional[Path] = None) -> ThicketConfig:
199199+ """Load config with standard defaults and error handling."""
200200+ if config_path is None:
201201+ config_path = Path("thicket.yaml")
202202+203203+ if not config_path.exists():
204204+ raise ConfigError(f"Configuration file not found: {config_path}")
205205+206206+ return load_config(config_path)
207207+208208+def validate_url(url: str) -> HttpUrl:
209209+ """Validate and return URL with consistent error handling."""
210210+ try:
211211+ return HttpUrl(url)
212212+ except ValidationError:
213213+ raise ConfigError(f"Invalid URL: {url}")
214214+```
215215+216216+## 8. Model Serialization
217217+218218+### Pattern: Pydantic model JSON encoding
219219+**Locations:**
220220+- Repeated `json_encoders={datetime: lambda v: v.isoformat()}` in model configs
221221+- Similar model_dump patterns
222222+223223+**Recommendation:** Create base model class:
224224+```python
225225+class ThicketBaseModel(BaseModel):
226226+ """Base model with common configuration."""
227227+ model_config = ConfigDict(
228228+ json_encoders={datetime: lambda v: v.isoformat()},
229229+ str_strip_whitespace=True,
230230+ )
231231+232232+ def to_json_dict(self) -> dict:
233233+ """Convert to JSON-serializable dict."""
234234+ return self.model_dump(mode="json", exclude_none=True)
235235+```
236236+237237+## Summary of Refactoring Benefits
238238+239239+1. **Reduced Code Duplication**: Eliminate 30-40% of duplicate code
240240+2. **Consistent Error Handling**: Standardize error messages and handling
241241+3. **Easier Maintenance**: Central location for common patterns
242242+4. **Better Testing**: Easier to unit test shared utilities
243243+5. **Type Safety**: Shared type hints and validation
244244+6. **Performance**: Potential to optimize common operations in one place
245245+246246+## Implementation Priority
247247+248248+1. **High Priority**:
249249+ - JSON utilities (used everywhere)
250250+ - Datetime utilities (critical for sorting and display)
251251+ - Error handling decorators (improves UX consistency)
252252+253253+2. **Medium Priority**:
254254+ - Path utilities
255255+ - UI/Console utilities
256256+ - Config utilities
257257+258258+3. **Low Priority**:
259259+ - Base model classes (requires more refactoring)
260260+ - Git store enhancements (already well-structured)
···11-# Requirements for Thicket Zulip bot
22-# These are already included in the main thicket package
33-pydantic>=2.11.0
44-GitPython>=3.1.40
55-feedparser>=6.0.11
66-httpx>=0.28.0
77-pyyaml>=6.0.0
-201
src/thicket/bots/test_bot.py
···11-"""Test utilities for the Thicket Zulip bot."""
22-33-import json
44-from pathlib import Path
55-from typing import Any, Optional
66-77-from ..models import AtomEntry
88-from .thicket_bot import ThicketBotHandler
99-1010-1111-class MockBotHandler:
1212- """Mock BotHandler for testing the Thicket bot."""
1313-1414- def __init__(self) -> None:
1515- """Initialize mock bot handler."""
1616- self.storage_data: dict[str, str] = {}
1717- self.sent_messages: list[dict[str, Any]] = []
1818- self.config_info = {
1919- "full_name": "Thicket Bot",
2020- "email": "thicket-bot@example.com",
2121- }
2222-2323- def get_config_info(self) -> dict[str, str]:
2424- """Return bot configuration info."""
2525- return self.config_info
2626-2727- def send_reply(self, message: dict[str, Any], content: str) -> None:
2828- """Mock sending a reply."""
2929- reply = {
3030- "type": "reply",
3131- "to": message.get("sender_id"),
3232- "content": content,
3333- "original_message": message,
3434- }
3535- self.sent_messages.append(reply)
3636-3737- def send_message(self, message: dict[str, Any]) -> None:
3838- """Mock sending a message."""
3939- self.sent_messages.append(message)
4040-4141- @property
4242- def storage(self) -> "MockStorage":
4343- """Return mock storage."""
4444- return MockStorage(self.storage_data)
4545-4646-4747-class MockStorage:
4848- """Mock storage for bot state."""
4949-5050- def __init__(self, storage_data: dict[str, str]) -> None:
5151- """Initialize with storage data."""
5252- self.storage_data = storage_data
5353-5454- def __enter__(self) -> "MockStorage":
5555- """Context manager entry."""
5656- return self
5757-5858- def __exit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
5959- """Context manager exit."""
6060- pass
6161-6262- def get(self, key: str) -> Optional[str]:
6363- """Get value from storage."""
6464- return self.storage_data.get(key)
6565-6666- def put(self, key: str, value: str) -> None:
6767- """Put value in storage."""
6868- self.storage_data[key] = value
6969-7070- def contains(self, key: str) -> bool:
7171- """Check if key exists in storage."""
7272- return key in self.storage_data
7373-7474-7575-def create_test_message(
7676- content: str,
7777- sender: str = "Test User",
7878- sender_id: int = 12345,
7979- message_type: str = "stream",
8080-) -> dict[str, Any]:
8181- """Create a test message for bot testing."""
8282- return {
8383- "content": content,
8484- "sender_full_name": sender,
8585- "sender_id": sender_id,
8686- "type": message_type,
8787- "timestamp": 1642694400, # 2022-01-20 12:00:00 UTC
8888- "stream_id": 1,
8989- "subject": "test topic",
9090- }
9191-9292-9393-def create_test_entry(
9494- entry_id: str = "test-entry-1",
9595- title: str = "Test Article",
9696- link: str = "https://example.com/test-article",
9797-) -> AtomEntry:
9898- """Create a test AtomEntry for testing."""
9999- from datetime import datetime
100100-101101- from pydantic import HttpUrl
102102-103103- return AtomEntry(
104104- id=entry_id,
105105- title=title,
106106- link=HttpUrl(link),
107107- updated=datetime(2024, 1, 20, 12, 0, 0),
108108- published=datetime(2024, 1, 20, 10, 0, 0),
109109- summary="This is a test article summary",
110110- content="<p>This is test article content</p>",
111111- author={"name": "Test Author", "email": "author@example.com"},
112112- )
113113-114114-115115-class BotTester:
116116- """Helper class for testing bot functionality."""
117117-118118- def __init__(self, config_path: Optional[Path] = None) -> None:
119119- """Initialize bot tester."""
120120- self.bot = ThicketBotHandler()
121121- self.handler = MockBotHandler()
122122-123123- if config_path:
124124- # Configure bot with test config
125125- self.configure_bot(config_path, "test-stream", "test-topic")
126126-127127- def configure_bot(
128128- self, config_path: Path, stream: str = "test-stream", topic: str = "test-topic"
129129- ) -> None:
130130- """Configure the bot for testing."""
131131- # Set bot configuration
132132- config_data = {
133133- "stream_name": stream,
134134- "topic_name": topic,
135135- "sync_interval": 300,
136136- "max_entries_per_sync": 10,
137137- "config_path": str(config_path),
138138- }
139139-140140- self.handler.storage_data["bot_config"] = json.dumps(config_data)
141141-142142- # Initialize bot
143143- self.bot._load_bot_config(self.handler)
144144-145145- def send_command(
146146- self, command: str, sender: str = "Test User"
147147- ) -> list[dict[str, Any]]:
148148- """Send a command to the bot and return responses."""
149149- message = create_test_message(f"@thicket {command}", sender)
150150-151151- # Clear previous messages
152152- self.handler.sent_messages.clear()
153153-154154- # Send command
155155- self.bot.handle_message(message, self.handler)
156156-157157- return self.handler.sent_messages.copy()
158158-159159- def get_last_response_content(self) -> Optional[str]:
160160- """Get the content of the last bot response."""
161161- if self.handler.sent_messages:
162162- return self.handler.sent_messages[-1].get("content")
163163- return None
164164-165165- def get_last_message(self) -> Optional[dict[str, Any]]:
166166- """Get the last sent message."""
167167- if self.handler.sent_messages:
168168- return self.handler.sent_messages[-1]
169169- return None
170170-171171- def assert_response_contains(self, text: str) -> None:
172172- """Assert that the last response contains specific text."""
173173- content = self.get_last_response_content()
174174- assert content is not None, "No response received"
175175- assert text in content, f"Response does not contain '{text}': {content}"
176176-177177-178178-# Example usage for testing
179179-if __name__ == "__main__":
180180- # Create a test config file
181181- test_config = Path("/tmp/test_thicket.yaml")
182182-183183- # Create bot tester
184184- tester = BotTester()
185185-186186- # Test help command
187187- responses = tester.send_command("help")
188188- print(f"Help response: {tester.get_last_response_content()}")
189189-190190- # Test status command
191191- responses = tester.send_command("status")
192192- print(f"Status response: {tester.get_last_response_content()}")
193193-194194- # Test configuration
195195- responses = tester.send_command("config stream general")
196196- tester.assert_response_contains("Stream set to")
197197-198198- responses = tester.send_command("config topic 'Feed Updates'")
199199- tester.assert_response_contains("Topic set to")
200200-201201- print("All tests passed!")
-1257
src/thicket/bots/thicket_bot.py
···11-"""Zulip bot for automatically posting thicket feed updates."""
22-33-import asyncio
44-import json
55-import logging
66-import os
77-import time
88-from pathlib import Path
99-from typing import Any, Optional
1010-1111-from zulip_bots.lib import BotHandler
1212-1313-# Handle imports for both direct execution and package import
1414-try:
1515- from ..cli.commands.sync import sync_feed
1616- from ..core.git_store import GitStore
1717- from ..models import AtomEntry, ThicketConfig
1818-except ImportError:
1919- # When run directly by zulip-bots, add the package to path
2020- import sys
2121-2222- src_dir = Path(__file__).parent.parent.parent
2323- if str(src_dir) not in sys.path:
2424- sys.path.insert(0, str(src_dir))
2525-2626- from thicket.cli.commands.sync import sync_feed
2727- from thicket.core.git_store import GitStore
2828- from thicket.models import AtomEntry, ThicketConfig
2929-3030-3131-class ThicketBotHandler:
3232- """Zulip bot that monitors thicket feeds and posts new articles."""
3333-3434- def __init__(self) -> None:
3535- """Initialize the thicket bot."""
3636- self.logger = logging.getLogger(__name__)
3737- self.git_store: Optional[GitStore] = None
3838- self.config: Optional[ThicketConfig] = None
3939- self.posted_entries: set[str] = set()
4040-4141- # Bot configuration from storage
4242- self.stream_name: Optional[str] = None
4343- self.topic_name: Optional[str] = None
4444- self.sync_interval: int = 300 # 5 minutes default
4545- self.max_entries_per_sync: int = 10
4646- self.config_path: Optional[Path] = None
4747-4848- # Bot behavior settings (loaded from botrc)
4949- self.rate_limit_delay: int = 5
5050- self.posts_per_batch: int = 5
5151- self.catchup_entries: int = 5
5252- self.config_change_notifications: bool = True
5353- self.username_claim_notifications: bool = True
5454-5555- # Track last sync time for schedule queries
5656- self.last_sync_time: Optional[float] = None
5757-5858- # Debug mode configuration
5959- self.debug_user: Optional[str] = None
6060- self.debug_zulip_user_id: Optional[str] = None
6161-6262- def usage(self) -> str:
6363- """Return bot usage instructions."""
6464- return """
6565- **Thicket Feed Bot**
6666-6767- This bot automatically monitors thicket feeds and posts new articles.
6868-6969- Commands:
7070- - `@mention status` - Show current bot status and configuration
7171- - `@mention sync now` - Force an immediate sync
7272- - `@mention reset` - Clear posting history (will repost recent entries)
7373- - `@mention config stream <stream_name>` - Set target stream
7474- - `@mention config topic <topic_name>` - Set target topic
7575- - `@mention config interval <seconds>` - Set sync interval
7676- - `@mention schedule` - Show sync schedule and next run time
7777- - `@mention claim <username>` - Claim a thicket username for your Zulip account
7878- - `@mention help` - Show this help message
7979- """
8080-8181- def initialize(self, bot_handler: BotHandler) -> None:
8282- """Initialize the bot with persistent storage."""
8383- self.logger.info("Initializing ThicketBot")
8484-8585- # Get configuration from environment (set by CLI)
8686- self.debug_user = os.getenv("THICKET_DEBUG_USER")
8787- config_path_env = os.getenv("THICKET_CONFIG_PATH")
8888- if config_path_env:
8989- self.config_path = Path(config_path_env)
9090- self.logger.info(f"Using thicket config: {self.config_path}")
9191-9292- # Load default configuration from botrc file
9393- self._load_botrc_defaults()
9494-9595- # Load bot configuration from persistent storage
9696- self._load_bot_config(bot_handler)
9797-9898- # Initialize thicket components
9999- if self.config_path:
100100- try:
101101- self._initialize_thicket()
102102- self._load_posted_entries(bot_handler)
103103-104104- # Validate debug mode if enabled
105105- if self.debug_user:
106106- self._validate_debug_mode(bot_handler)
107107-108108- except Exception as e:
109109- self.logger.error(f"Failed to initialize thicket: {e}")
110110-111111- # Start background sync loop
112112- self._schedule_sync(bot_handler)
113113-114114- def handle_message(self, message: dict[str, Any], bot_handler: BotHandler) -> None:
115115- """Handle incoming Zulip messages."""
116116- content = message["content"].strip()
117117- sender = message["sender_full_name"]
118118-119119- # Only respond to mentions
120120- if not self._is_mentioned(content, bot_handler):
121121- return
122122-123123- # Parse command
124124- cleaned_content = self._clean_mention(content, bot_handler)
125125- command_parts = cleaned_content.split()
126126-127127- if not command_parts:
128128- self._send_help(message, bot_handler)
129129- return
130130-131131- command = command_parts[0].lower()
132132-133133- try:
134134- if command == "help":
135135- self._send_help(message, bot_handler)
136136- elif command == "status":
137137- self._send_status(message, bot_handler, sender)
138138- elif (
139139- command == "sync"
140140- and len(command_parts) > 1
141141- and command_parts[1] == "now"
142142- ):
143143- self._handle_force_sync(message, bot_handler, sender)
144144- elif command == "reset":
145145- self._handle_reset_command(message, bot_handler, sender)
146146- elif command == "config":
147147- self._handle_config_command(
148148- message, bot_handler, command_parts[1:], sender
149149- )
150150- elif command == "schedule":
151151- self._handle_schedule_command(message, bot_handler, sender)
152152- elif command == "claim":
153153- self._handle_claim_command(
154154- message, bot_handler, command_parts[1:], sender
155155- )
156156- else:
157157- bot_handler.send_reply(
158158- message,
159159- f"Unknown command: {command}. Type `@mention help` for usage.",
160160- )
161161- except Exception as e:
162162- self.logger.error(f"Error handling command '{command}': {e}")
163163- bot_handler.send_reply(message, f"Error processing command: {str(e)}")
164164-165165- def _is_mentioned(self, content: str, bot_handler: BotHandler) -> bool:
166166- """Check if the bot is mentioned in the message."""
167167- try:
168168- # Get bot's actual name from Zulip
169169- bot_info = bot_handler._client.get_profile()
170170- if bot_info.get("result") == "success":
171171- bot_name = bot_info.get("full_name", "").lower()
172172- if bot_name:
173173- return (
174174- f"@{bot_name}" in content.lower()
175175- or f"@**{bot_name}**" in content.lower()
176176- )
177177- except Exception as e:
178178- self.logger.debug(f"Could not get bot profile: {e}")
179179-180180- # Fallback to generic check
181181- return "@thicket" in content.lower()
182182-183183- def _clean_mention(self, content: str, bot_handler: BotHandler) -> str:
184184- """Remove bot mention from message content."""
185185- import re
186186-187187- try:
188188- # Get bot's actual name from Zulip
189189- bot_info = bot_handler._client.get_profile()
190190- if bot_info.get("result") == "success":
191191- bot_name = bot_info.get("full_name", "")
192192- if bot_name:
193193- # Remove @bot_name or @**bot_name**
194194- escaped_name = re.escape(bot_name)
195195- content = re.sub(
196196- rf"@(?:\*\*)?{escaped_name}(?:\*\*)?",
197197- "",
198198- content,
199199- flags=re.IGNORECASE,
200200- ).strip()
201201- return content
202202- except Exception as e:
203203- self.logger.debug(f"Could not get bot profile for mention cleaning: {e}")
204204-205205- # Fallback to removing @thicket
206206- content = re.sub(
207207- r"@(?:\*\*)?thicket(?:\*\*)?", "", content, flags=re.IGNORECASE
208208- ).strip()
209209- return content
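The mention-stripping logic in `_clean_mention` can be tried in isolation. This is a small sketch that reuses the same regular expression; `strip_mention` is a hypothetical helper name, and the bot name is passed in directly instead of being fetched from the Zulip profile.

```python
import re


def strip_mention(content: str, bot_name: str) -> str:
    """Remove `@name` or `@**name**` (case-insensitive) from a message."""
    # re.escape guards against bot names containing regex metacharacters.
    pattern = rf"@(?:\*\*)?{re.escape(bot_name)}(?:\*\*)?"
    return re.sub(pattern, "", content, flags=re.IGNORECASE).strip()


print(strip_mention("@**Thicket Bot** config stream general", "Thicket Bot"))
print(strip_mention("@thicket status", "thicket"))
```

A mention in the middle of a message leaves two adjacent spaces behind, which is harmless here because the caller splits the result on whitespace.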
210210-211211- def _send_help(self, message: dict[str, Any], bot_handler: BotHandler) -> None:
212212- """Send help message."""
213213- bot_handler.send_reply(message, self.usage())
214214-215215- def _send_status(
216216- self, message: dict[str, Any], bot_handler: BotHandler, sender: str
217217- ) -> None:
218218- """Send bot status information."""
219219- status_lines = [
220220- f"**Thicket Bot Status** (requested by {sender})",
221221- "",
222222- ]
223223-224224- # Debug mode status
225225- if self.debug_user:
226226- status_lines.extend(
227227- [
228228- "**Debug Mode:** ENABLED",
229229- f"**Debug User:** {self.debug_user}",
230230- "",
231231- ]
232232- )
233233- else:
234234- status_lines.extend(
235235- [
236236- f"**Stream:** {self.stream_name or 'Not configured'}",
237237- f"**Topic:** {self.topic_name or 'Not configured'}",
238238- "",
239239- ]
240240- )
241241-242242- status_lines.extend(
243243- [
244244- f"**Sync Interval:** {self.sync_interval}s ({self.sync_interval // 60}m {self.sync_interval % 60}s)",
245245- f"**Max Entries/Sync:** {self.max_entries_per_sync}",
246246- f"**Config Path:** {self.config_path or 'Not configured'}",
247247- "",
248248- f"**Tracked Entries:** {len(self.posted_entries)}",
249249- f"**Catchup Mode:** {'Active (first run)' if len(self.posted_entries) == 0 else 'Inactive'}",
250250- f"**Thicket Initialized:** {'Yes' if self.git_store else 'No'}",
251251- "",
252252- self._get_schedule_info(),
253253- ]
254254- )
255255-256256- bot_handler.send_reply(message, "\n".join(status_lines))
257257-258258- def _handle_force_sync(
259259- self, message: dict[str, Any], bot_handler: BotHandler, sender: str
260260- ) -> None:
261261- """Handle immediate sync request."""
262262- if not self._check_initialization(message, bot_handler):
263263- return
264264-265265- bot_handler.send_reply(
266266- message, f"Starting immediate sync... (requested by {sender})"
267267- )
268268-269269- try:
270270- new_entries = self._perform_sync(bot_handler)
271271- bot_handler.send_reply(
272272- message, f"Sync completed! Found {len(new_entries)} new entries."
273273- )
274274- except Exception as e:
275275- self.logger.error(f"Force sync failed: {e}")
276276- bot_handler.send_reply(message, f"Sync failed: {str(e)}")
277277-278278- def _handle_reset_command(
279279- self, message: dict[str, Any], bot_handler: BotHandler, sender: str
280280- ) -> None:
281281- """Handle reset command to clear posted entries tracking."""
282282- try:
283283- self.posted_entries.clear()
284284- self._save_posted_entries(bot_handler)
285285- bot_handler.send_reply(
286286- message,
287287- f"Posting history reset! Recent entries will be posted on next sync. (requested by {sender})",
288288- )
289289- self.logger.info(f"Posted entries tracking reset by {sender}")
290290- except Exception as e:
291291- self.logger.error(f"Reset failed: {e}")
292292- bot_handler.send_reply(message, f"Reset failed: {str(e)}")
293293-294294- def _handle_schedule_command(
295295- self, message: dict[str, Any], bot_handler: BotHandler, sender: str
296296- ) -> None:
297297- """Handle schedule query command."""
298298- schedule_info = self._get_schedule_info()
299299- bot_handler.send_reply(
300300- message,
301301- f"**Thicket Bot Schedule** (requested by {sender})\n\n{schedule_info}",
302302- )
303303-304304- def _handle_claim_command(
305305- self,
306306- message: dict[str, Any],
307307- bot_handler: BotHandler,
308308- args: list[str],
309309- sender: str,
310310- ) -> None:
311311- """Handle username claiming command."""
312312- if not args:
313313- bot_handler.send_reply(message, "Usage: `@mention claim <username>`")
314314- return
315315-316316- if not self._check_initialization(message, bot_handler):
317317- return
318318-319319- username = args[0].strip()
320320-321321- # Get sender's Zulip user info
322322- sender_user_id = message.get("sender_id")
323323- sender_email = message.get("sender_email")
324324-325325- if not sender_user_id or not sender_email:
326326- bot_handler.send_reply(
327327- message, "Could not determine your Zulip user information."
328328- )
329329- return
330330-331331- try:
332332- # Get current Zulip server from environment
333333- zulip_site_url = os.getenv("THICKET_ZULIP_SITE_URL", "")
334334- server_url = zulip_site_url.replace("https://", "").replace("http://", "")
335335-336336- if not server_url:
337337- bot_handler.send_reply(
338338- message, "Could not determine Zulip server URL."
339339- )
340340- return
341341-342342- # Check if username exists in thicket
343343- user = self.git_store.get_user(username)
344344- if not user:
345345- bot_handler.send_reply(
346346- message,
347347- f"Username `{username}` not found in thicket. Available users: {', '.join(self.git_store.list_users())}",
348348- )
349349- return
350350-351351- # Check if username is already claimed for this server
352352- existing_zulip_id = user.get_zulip_mention(server_url)
353353- if existing_zulip_id:
354354- # Check if it's claimed by the same user
355355- if existing_zulip_id == sender_email or str(existing_zulip_id) == str(
356356- sender_user_id
357357- ):
358358- bot_handler.send_reply(
359359- message,
360360- f"Username `{username}` is already claimed by you on {server_url}!",
361361- )
362362- else:
363363- bot_handler.send_reply(
364364- message,
365365- f"Username `{username}` is already claimed by another user on {server_url}.",
366366- )
367367- return
368368-369369- # Claim the username - prefer email for consistency
370370- success = self.git_store.add_zulip_association(
371371- username, server_url, sender_email
372372- )
373373-374374- if success:
375375- reply_msg = (
376376- f"Successfully claimed username `{username}` for **{sender}** on {server_url}!\n"
377377- + "You will now be mentioned when new articles are posted from this user's feeds."
378378- )
379379- bot_handler.send_reply(message, reply_msg)
380380-381381- # Send notification to configured stream if enabled and not in debug mode
382382- if (
383383- self.username_claim_notifications
384384- and not self.debug_user
385385- and self.stream_name
386386- and self.topic_name
387387- ):
388388- try:
389389- notification_msg = f"**{sender}** claimed thicket username `{username}` on {server_url}"
390390- bot_handler.send_message(
391391- {
392392- "type": "stream",
393393- "to": self.stream_name,
394394- "subject": self.topic_name,
395395- "content": notification_msg,
396396- }
397397- )
398398- except Exception as e:
399399- self.logger.error(
400400- f"Failed to send username claim notification: {e}"
401401- )
402402-403403- self.logger.info(
404404- f"User {sender} ({sender_email}) claimed username {username} on {server_url}"
405405- )
406406- else:
407407- bot_handler.send_reply(
408408- message,
409409- f"Failed to claim username `{username}`. This shouldn't happen - please contact an administrator.",
410410- )
411411-412412- except Exception as e:
413413- self.logger.error(f"Error processing claim for {username} by {sender}: {e}")
414414- bot_handler.send_reply(message, f"Error processing claim: {str(e)}")
415415-416416- def _handle_config_command(
417417- self,
418418- message: dict[str, Any],
419419- bot_handler: BotHandler,
420420- args: list[str],
421421- sender: str,
422422- ) -> None:
423423- """Handle configuration commands."""
424424- if len(args) < 2:
425425- bot_handler.send_reply(
426426- message, "Usage: `@mention config <setting> <value>`"
427427- )
428428- return
429429-430430- setting = args[0].lower()
431431- value = " ".join(args[1:])
432432-433433- if setting == "stream":
434434- old_value = self.stream_name
435435- self.stream_name = value
436436- self._save_bot_config(bot_handler)
437437- bot_handler.send_reply(
438438- message, f"Stream set to: **{value}** (by {sender})"
439439- )
440440- self._send_config_change_notification(
441441- bot_handler, sender, "stream", old_value, value
442442- )
443443-444444- elif setting == "topic":
445445- old_value = self.topic_name
446446- self.topic_name = value
447447- self._save_bot_config(bot_handler)
448448- bot_handler.send_reply(
449449- message, f"Topic set to: **{value}** (by {sender})"
450450- )
451451- self._send_config_change_notification(
452452- bot_handler, sender, "topic", old_value, value
453453- )
454454-455455- elif setting == "interval":
456456- try:
457457- interval = int(value)
458458- if interval < 60:
459459- bot_handler.send_reply(
460460- message, "Interval must be at least 60 seconds"
461461- )
462462- return
463463- old_value = self.sync_interval
464464- self.sync_interval = interval
465465- self._save_bot_config(bot_handler)
466466- bot_handler.send_reply(
467467- message, f"Sync interval set to: **{interval}s** (by {sender})"
468468- )
469469- self._send_config_change_notification(
470470- bot_handler,
471471- sender,
472472- "sync interval",
473473- f"{old_value}s",
474474- f"{interval}s",
475475- )
476476- except ValueError:
477477- bot_handler.send_reply(
478478- message, "Invalid interval value. Must be a number of seconds."
479479- )
480480-481481- elif setting == "max_entries":
482482- try:
483483- max_entries = int(value)
484484- if max_entries < 1 or max_entries > 50:
485485- bot_handler.send_reply(
486486- message, "Max entries must be between 1 and 50"
487487- )
488488- return
489489- old_value = self.max_entries_per_sync
490490- self.max_entries_per_sync = max_entries
491491- self._save_bot_config(bot_handler)
492492- bot_handler.send_reply(
493493- message,
494494- f"Max entries per sync set to: **{max_entries}** (by {sender})",
495495- )
496496- self._send_config_change_notification(
497497- bot_handler,
498498- sender,
499499- "max entries per sync",
500500- str(old_value),
501501- str(max_entries),
502502- )
503503- except ValueError:
504504- bot_handler.send_reply(
505505- message, "Invalid max entries value. Must be a number."
506506- )
507507-508508- else:
509509- bot_handler.send_reply(
510510- message,
511511- f"Unknown setting: {setting}. Available: stream, topic, interval, max_entries",
512512- )
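The interval handling in `_handle_config_command` combines integer parsing, a lower bound of 60 seconds, and the minutes/seconds rendering used in the status output. A compact sketch of those rules, pulled out as pure functions, might look like this; `parse_interval` and `format_interval` are hypothetical names that do not exist in the removed module.

```python
def parse_interval(value: str, minimum: int = 60) -> int:
    """Parse a sync interval in seconds, enforcing a lower bound."""
    try:
        interval = int(value)
    except ValueError:
        raise ValueError(
            "Invalid interval value. Must be a number of seconds."
        ) from None
    if interval < minimum:
        raise ValueError(f"Interval must be at least {minimum} seconds")
    return interval


def format_interval(seconds: int) -> str:
    """Render seconds the way the status message does: '300s (5m 0s)'."""
    return f"{seconds}s ({seconds // 60}m {seconds % 60}s)"


print(format_interval(parse_interval("300")))
```

Keeping validation separate from `send_reply` calls would make the bounds testable without a mock bot handler.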
513513-514514- def _load_bot_config(self, bot_handler: BotHandler) -> None:
515515- """Load bot configuration from persistent storage."""
516516- try:
517517- config_data = bot_handler.storage.get("bot_config")
518518- if config_data:
519519- config = json.loads(config_data)
520520- self.stream_name = config.get("stream_name")
521521- self.topic_name = config.get("topic_name")
522522- self.sync_interval = config.get("sync_interval", 300)
523523- self.max_entries_per_sync = config.get("max_entries_per_sync", 10)
524524- self.last_sync_time = config.get("last_sync_time")
525525- except Exception:
526526- # Bot config not found on first run is expected
527527- pass
528528-529529- def _save_bot_config(self, bot_handler: BotHandler) -> None:
530530- """Save bot configuration to persistent storage."""
531531- try:
532532- config_data = {
533533- "stream_name": self.stream_name,
534534- "topic_name": self.topic_name,
535535- "sync_interval": self.sync_interval,
536536- "max_entries_per_sync": self.max_entries_per_sync,
537537- "last_sync_time": self.last_sync_time,
538538- }
539539- bot_handler.storage.put("bot_config", json.dumps(config_data))
540540- except Exception as e:
541541- self.logger.error(f"Error saving bot config: {e}")
542542-543543- def _load_botrc_defaults(self) -> None:
544544- """Load default configuration from botrc file."""
545545- try:
546546- import configparser
547547- from pathlib import Path
548548-549549- botrc_path = Path("bot-config/botrc")
550550- if not botrc_path.exists():
551551- self.logger.info("No botrc file found, using hardcoded defaults")
552552- return
553553-554554- config = configparser.ConfigParser()
555555- config.read(botrc_path)
556556-557557- if "bot" in config:
558558- bot_section = config["bot"]
559559- self.sync_interval = bot_section.getint("sync_interval", 300)
560560- self.max_entries_per_sync = bot_section.getint(
561561- "max_entries_per_sync", 10
562562- )
563563- self.rate_limit_delay = bot_section.getint("rate_limit_delay", 5)
564564- self.posts_per_batch = bot_section.getint("posts_per_batch", 5)
565565-566566- # Set defaults only if not already configured
567567- default_stream = bot_section.get("default_stream", "").strip()
568568- default_topic = bot_section.get("default_topic", "").strip()
569569- if default_stream:
570570- self.stream_name = default_stream
571571- if default_topic:
572572- self.topic_name = default_topic
573573-574574- if "catchup" in config:
575575- catchup_section = config["catchup"]
576576- self.catchup_entries = catchup_section.getint("catchup_entries", 5)
577577-578578- if "notifications" in config:
579579- notifications_section = config["notifications"]
580580- self.config_change_notifications = notifications_section.getboolean(
581581- "config_change_notifications", True
582582- )
583583- self.username_claim_notifications = notifications_section.getboolean(
584584- "username_claim_notifications", True
585585- )
586586-587587- self.logger.info(f"Loaded configuration from {botrc_path}")
588588-589589- except Exception as e:
590590- self.logger.error(f"Error loading botrc defaults: {e}")
591591- self.logger.info("Using hardcoded defaults")
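The fallback behavior of `_load_botrc_defaults` relies on `configparser` typed getters. The sketch below parses an in-memory file instead of `bot-config/botrc`; the section and option names come from the code above, while the sample values in `BOTRC_TEXT` are made up for illustration.

```python
import configparser

# Hypothetical botrc contents; only some options are set on purpose.
BOTRC_TEXT = """
[bot]
sync_interval = 600
default_stream = general

[notifications]
config_change_notifications = no
"""

config = configparser.ConfigParser()
config.read_string(BOTRC_TEXT)

bot = config["bot"]
# Present option: parsed as int. Missing option: fallback is returned.
sync_interval = bot.getint("sync_interval", fallback=300)
max_entries = bot.getint("max_entries_per_sync", fallback=10)
default_stream = bot.get("default_stream", fallback="").strip()

notify = True
if "notifications" in config:
    # getboolean accepts yes/no, on/off, true/false and 1/0.
    notify = config["notifications"].getboolean(
        "config_change_notifications", fallback=True
    )

print(sync_interval, max_entries, default_stream, notify)
```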
592592-593593- def _initialize_thicket(self) -> None:
594594- """Initialize thicket components."""
595595- if not self.config_path or not self.config_path.exists():
596596- raise ValueError("Thicket config file not found")
597597-598598- # Load thicket configuration
599599- import yaml
600600-601601- with open(self.config_path) as f:
602602- config_data = yaml.safe_load(f)
603603- self.config = ThicketConfig(**config_data)
604604-605605- # Initialize git store
606606- self.git_store = GitStore(self.config.git_store)
607607-608608- self.logger.info("Thicket components initialized successfully")
609609-610610- def _validate_debug_mode(self, bot_handler: BotHandler) -> None:
611611- """Validate debug mode configuration."""
612612- if not self.debug_user or not self.git_store:
613613- return
614614-615615- # Get current Zulip server from environment
616616- zulip_site_url = os.getenv("THICKET_ZULIP_SITE_URL", "")
617617- server_url = zulip_site_url.replace("https://", "").replace("http://", "")
618618-619619- # Check if debug user exists in thicket
620620- user = self.git_store.get_user(self.debug_user)
621621- if not user:
622622- raise ValueError(f"Debug user '{self.debug_user}' not found in thicket")
623623-624624- # Check if user has Zulip association for this server
625625- if not server_url:
626626- raise ValueError("Could not determine Zulip server URL")
627627-628628- zulip_user_id = user.get_zulip_mention(server_url)
629629- if not zulip_user_id:
630630- raise ValueError(
631631- f"User '{self.debug_user}' has no Zulip association for server '{server_url}'"
632632- )
633633-634634- # Try to look up the actual Zulip user ID from the email address
635635- # But don't fail if we can't - we'll try again when sending messages
636636- actual_user_id = self._lookup_zulip_user_id(bot_handler, zulip_user_id)
637637- if actual_user_id and actual_user_id != zulip_user_id:
638638- # Successfully resolved to numeric ID
639639- self.debug_zulip_user_id = actual_user_id
640640- self.logger.info(
641641- f"Debug mode enabled: Will send DMs to {self.debug_user} (email: {zulip_user_id}, user_id: {actual_user_id}) on {server_url}"
642642- )
643643- else:
644644- # Keep the email address, will resolve later when sending
645645- self.debug_zulip_user_id = zulip_user_id
646646- self.logger.info(
647647- f"Debug mode enabled: Will send DMs to {self.debug_user} ({zulip_user_id}) on {server_url} (will resolve user ID when sending)"
648648- )
649649-650650- def _lookup_zulip_user_id(
651651- self, bot_handler: BotHandler, email_or_id: str
652652- ) -> Optional[str]:
653653- """Look up Zulip user ID from email address or return the ID if it's already numeric."""
654654- # If it's already a numeric user ID, return it
655655- if email_or_id.isdigit():
656656- return email_or_id
657657-658658- try:
659659- client = bot_handler._client
660660- if not client:
661661- self.logger.error("No Zulip client available for user lookup")
662662- return None
663663-664664- # First try the get_user_by_email API if available
665665- try:
666666- user_result = client.get_user_by_email(email_or_id)
667667- if user_result.get("result") == "success":
668668- user_data = user_result.get("user", {})
669669- user_id = user_data.get("user_id")
670670- if user_id:
671671- self.logger.info(
672672- f"Found user ID {user_id} for '{email_or_id}' via get_user_by_email API"
673673- )
674674- return str(user_id)
675675- except (AttributeError, Exception):
676676- pass
677677-678678- # Fallback: Get all users and search through them
679679- users_result = client.get_users()
680680- if users_result.get("result") == "success":
681681- for user in users_result["members"]:
682682- user_email = user.get("email", "")
683683- delivery_email = user.get("delivery_email", "")
684684-685685- if (
686686- user_email == email_or_id
687687- or delivery_email == email_or_id
688688- or str(user.get("user_id")) == email_or_id
689689- ):
690690- user_id = user.get("user_id")
691691- return str(user_id)
692692-693693- self.logger.error(
694694- f"No user found with identifier '{email_or_id}'. Searched {len(users_result['members'])} users."
695695- )
696696- return None
697697- else:
698698- self.logger.error(
699699- f"Failed to get users: {users_result.get('msg', 'Unknown error')}"
700700- )
701701- return None
702702-703703- except Exception as e:
704704- self.logger.error(f"Error looking up user ID for '{email_or_id}': {e}")
705705- return None
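The fallback branch of `_lookup_zulip_user_id` is a linear search over the member list returned by the Zulip client. Here is a sketch of just that search, with the network call replaced by a plain list; `find_user_id` and the sample `MEMBERS` data are hypothetical.

```python
from typing import Any, Optional


def find_user_id(members: list[dict[str, Any]], email_or_id: str) -> Optional[str]:
    """Resolve an email (or already-numeric ID) to a user ID string."""
    # Numeric identifiers are treated as already resolved.
    if email_or_id.isdigit():
        return email_or_id
    for user in members:
        if email_or_id in (user.get("email"), user.get("delivery_email")):
            return str(user.get("user_id"))
    return None


# Made-up member records in the shape the code above reads.
MEMBERS = [
    {"user_id": 7, "email": "user7@zulip.example", "delivery_email": "ada@example.com"},
    {"user_id": 9, "email": "grace@example.com"},
]

print(find_user_id(MEMBERS, "ada@example.com"))
print(find_user_id(MEMBERS, "nobody@example.com"))
```

Matching on both `email` and `delivery_email` mirrors the original code, which checks both fields for each member.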
706706-707707- def _lookup_zulip_user_info(
708708- self, bot_handler: BotHandler, email_or_id: str
709709- ) -> tuple[Optional[str], Optional[str]]:
710710- """Look up both Zulip user ID and full name from email address."""
711711- if email_or_id.isdigit():
712712- return email_or_id, None
713713-714714- try:
715715- client = bot_handler._client
716716- if not client:
717717- return None, None
718718-719719- # Try get_user_by_email API first
720720- try:
721721- user_result = client.get_user_by_email(email_or_id)
722722- if user_result.get("result") == "success":
723723- user_data = user_result.get("user", {})
724724- user_id = user_data.get("user_id")
725725- full_name = user_data.get("full_name", "")
726726- if user_id:
727727- return str(user_id), full_name
728728- except AttributeError:
729729- pass
730730-731731- # Fallback: search all users
732732- users_result = client.get_users()
733733- if users_result.get("result") == "success":
734734- for user in users_result["members"]:
735735- if (
736736- user.get("email") == email_or_id
737737- or user.get("delivery_email") == email_or_id
738738- ):
739739- return str(user.get("user_id")), user.get("full_name", "")
740740-741741- return None, None
742742-743743- except Exception as e:
744744- self.logger.error(f"Error looking up user info for '{email_or_id}': {e}")
745745- return None, None
746746-747747- def _load_posted_entries(self, bot_handler: BotHandler) -> None:
748748- """Load the set of already posted entries."""
749749- try:
750750- posted_data = bot_handler.storage.get("posted_entries")
751751- if posted_data:
752752- self.posted_entries = set(json.loads(posted_data))
753753- except Exception:
754754- # Empty set on first run is expected
755755- self.posted_entries = set()
756756-757757- def _save_posted_entries(self, bot_handler: BotHandler) -> None:
758758- """Save the set of posted entries."""
759759- try:
760760- bot_handler.storage.put(
761761- "posted_entries", json.dumps(list(self.posted_entries))
762762- )
763763- except Exception as e:
764764- self.logger.error(f"Error saving posted entries: {e}")
765765-766766- def _check_initialization(
767767- self, message: dict[str, Any], bot_handler: BotHandler
768768- ) -> bool:
769769- """Check if thicket is properly initialized."""
770770- if not self.git_store or not self.config:
771771- bot_handler.send_reply(
772772- message, "❌ Thicket not initialized. Please check configuration."
773773- )
774774- return False
775775-776776- # In debug mode, we don't need stream/topic configuration
777777- if self.debug_user:
778778- return True
779779-780780- if not self.stream_name or not self.topic_name:
781781- bot_handler.send_reply(
782782- message,
783783- "❌ Stream and topic must be configured first. Use `@mention config stream <name>` and `@mention config topic <name>`",
784784- )
785785- return False
786786-787787- return True
788788-789789- def _schedule_sync(self, bot_handler: BotHandler) -> None:
790790- """Schedule periodic sync operations."""
791791-792792- def sync_loop():
793793- while True:
794794- try:
795795- # Check if we can sync
796796- can_sync = self.git_store and (
797797- (self.stream_name and self.topic_name) or self.debug_user
798798- )
799799-800800- if can_sync:
801801- self._perform_sync(bot_handler)
802802-803803- time.sleep(self.sync_interval)
804804- except Exception as e:
805805- self.logger.error(f"Error in sync loop: {e}")
806806- time.sleep(60) # Wait before retrying
807807-808808- # Start background thread
809809- import threading
810810-811811- sync_thread = threading.Thread(target=sync_loop, daemon=True)
812812- sync_thread.start()
813813-814814- def _perform_sync(self, bot_handler: BotHandler) -> list[AtomEntry]:
815815- """Perform thicket sync and return new entries."""
816816- if not self.config or not self.git_store:
817817- return []
818818-819819- new_entries: list[tuple[AtomEntry, str]] = [] # (entry, username) pairs
820820- is_first_run = len(self.posted_entries) == 0
821821-822822- # Get all users and their feeds from git store
823823- users_with_feeds = self.git_store.list_all_users_with_feeds()
824824-825825- # Sync each user's feeds
826826- for username, feed_urls in users_with_feeds:
827827- for feed_url in feed_urls:
828828- try:
829829- # Run async sync function
830830- loop = asyncio.new_event_loop()
831831- asyncio.set_event_loop(loop)
832832- try:
833833- new_count, _ = loop.run_until_complete(
834834- sync_feed(
835835- self.git_store, username, str(feed_url), dry_run=False
836836- )
837837- )
838838-839839- entries_to_check = []
840840-841841- if new_count > 0:
842842- # Get the newly added entries
843843- entries_to_check = self.git_store.list_entries(
844844- username, limit=new_count
845845- )
846846-847847- # Always check for catchup mode on first run
848848- if is_first_run:
849849- # Catchup mode: get configured number of entries on first run
850850- catchup_entries = self.git_store.list_entries(
851851- username, limit=self.catchup_entries
852852- )
853853- entries_to_check = (
854854- catchup_entries
855855- if not entries_to_check
856856- else entries_to_check
857857- )
858858-859859- for entry in entries_to_check:
860860- entry_key = f"{username}:{entry.id}"
861861- if entry_key not in self.posted_entries:
862862- new_entries.append((entry, username))
863863- if len(new_entries) >= self.max_entries_per_sync:
864864- break
865865-866866- finally:
867867- loop.close()
868868-869869- except Exception as e:
870870- self.logger.error(
871871- f"Error syncing feed {feed_url} for user {username}: {e}"
872872- )
873873-874874- if len(new_entries) >= self.max_entries_per_sync:
875875- break
876876-877877- # Post new entries to Zulip with rate limiting
878878- if new_entries:
879879- posted_count = 0
880880-881881- for i, (entry, username) in enumerate(new_entries):
882882- self._post_entry_to_zulip(entry, bot_handler, username)
883883- self.posted_entries.add(f"{username}:{entry.id}")
884884- posted_count += 1
885885-886886- # Rate limiting: pause after configured number of messages
887887- if (
888888- posted_count % self.posts_per_batch == 0
889889- and i < len(new_entries) - 1
890890- ):
891891- time.sleep(self.rate_limit_delay)
892892-893893- self._save_posted_entries(bot_handler)
894894-895895- # Update last sync time
896896- self.last_sync_time = time.time()
897897-898898- return [entry for entry, _ in new_entries]
899899-900900- def _post_entry_to_zulip(
901901- self, entry: AtomEntry, bot_handler: BotHandler, username: str
902902- ) -> None:
903903- """Post a single entry to the configured Zulip stream/topic or debug user DM."""
904904- try:
905905- # Get current Zulip server from environment
906906- zulip_site_url = os.getenv("THICKET_ZULIP_SITE_URL", "")
907907- server_url = zulip_site_url.replace("https://", "").replace("http://", "")
908908-909909- # Build author/date info consistently
910910- mention_info = ""
911911- if server_url and self.git_store:
912912- user = self.git_store.get_user(username)
913913- if user:
914914- zulip_user_id = user.get_zulip_mention(server_url)
915915- if zulip_user_id:
916916- # Look up the actual Zulip full name for proper @mention
917917- _, zulip_full_name = self._lookup_zulip_user_info(
918918- bot_handler, zulip_user_id
919919- )
920920- display_name = zulip_full_name or user.display_name or username
921921-922922- # Check if author is different from the user - avoid redundancy
923923- author_name = entry.author and entry.author.get("name")
924924- if author_name and author_name.lower() != display_name.lower():
925925- author_info = f" (by {author_name})"
926926- else:
927927- author_info = ""
928928-929929- published_info = ""
930930- if entry.published:
931931- published_info = (
932932- f" • {entry.published.strftime('%Y-%m-%d')}"
933933- )
934934-935935- mention_info = f"@**{display_name}** posted{author_info}{published_info}:\n\n"
936936-937937- # If no Zulip user found, use consistent format without @mention
938938- if not mention_info:
939939- user = self.git_store.get_user(username) if self.git_store else None
940940- display_name = user.display_name if user else username
941941-942942- author_name = entry.author and entry.author.get("name")
943943- if author_name and author_name.lower() != display_name.lower():
944944- author_info = f" (by {author_name})"
945945- else:
946946- author_info = ""
947947-948948- published_info = ""
949949- if entry.published:
950950- published_info = f" • {entry.published.strftime('%Y-%m-%d')}"
951951-952952- mention_info = (
953953- f"**{display_name}** posted{author_info}{published_info}:\n\n"
954954- )
955955-956956- # Format the message with HTML processing
957957- message_lines = [
958958- f"**{entry.title}**",
959959- f"๐ {entry.link}",
960960- ]
961961-962962- if entry.summary:
963963- # Process HTML in summary and truncate if needed
964964- processed_summary = self._process_html_content(entry.summary)
965965- if len(processed_summary) > 400:
966966- processed_summary = processed_summary[:397] + "..."
967967- message_lines.append(f"\n{processed_summary}")
968968-969969- message_content = mention_info + "\n".join(message_lines)
970970-971971- # Choose destination based on mode
972972- if self.debug_user and self.debug_zulip_user_id:
973973- # Debug mode: send DM
974974- debug_message = f"๐ **DEBUG:** New article from thicket user `{username}`:\n\n{message_content}"
975975-976976- # Ensure we have the numeric user ID
977977- user_id_to_use = self.debug_zulip_user_id
978978- if not user_id_to_use.isdigit():
979979- # Need to look up the numeric ID
980980- resolved_id = self._lookup_zulip_user_id(
981981- bot_handler, user_id_to_use
982982- )
983983- if resolved_id:
984984- user_id_to_use = resolved_id
985985- self.logger.debug(
986986- f"Resolved {self.debug_zulip_user_id} to user ID {user_id_to_use}"
987987- )
988988- else:
989989- self.logger.error(
990990- f"Could not resolve user ID for {self.debug_zulip_user_id}"
991991- )
992992- return
993993-994994- try:
995995- # For private messages, user_id needs to be an integer, not string
996996- user_id_int = int(user_id_to_use)
997997- bot_handler.send_message(
998998- {
999999- "type": "private",
10001000- "to": [user_id_int], # Use integer user ID
10011001- "content": debug_message,
10021002- }
10031003- )
10041004- except ValueError:
10051005- # If conversion to int fails, user_id_to_use might be an email
10061006- try:
10071007- bot_handler.send_message(
10081008- {
10091009- "type": "private",
10101010- "to": [user_id_to_use], # Try as string (email)
10111011- "content": debug_message,
10121012- }
10131013- )
10141014- except Exception as e2:
10151015- self.logger.error(
10161016- f"Failed to send DM to {self.debug_user} (tried both int and string): {e2}"
10171017- )
10181018- return
10191019- except Exception as e:
10201020- self.logger.error(
10211021- f"Failed to send DM to {self.debug_user} ({user_id_to_use}): {e}"
10221022- )
10231023- return
10241024- self.logger.info(
10251025- f"Posted entry to debug user {self.debug_user}: {entry.title}"
10261026- )
10271027- else:
10281028- # Normal mode: send to stream/topic
10291029- bot_handler.send_message(
10301030- {
10311031- "type": "stream",
10321032- "to": self.stream_name,
10331033- "subject": self.topic_name,
10341034- "content": message_content,
10351035- }
10361036- )
10371037- self.logger.info(
10381038- f"Posted entry to stream: {entry.title} (user: {username})"
10391039- )
10401040-10411041- except Exception as e:
10421042- self.logger.error(f"Error posting entry to Zulip: {e}")
10431043-10441044- def _process_html_content(self, html_content: str) -> str:
10451045- """Process HTML content from feeds to clean Zulip-compatible markdown."""
10461046- if not html_content:
10471047- return ""
10481048-10491049- try:
10501050- # Try to use markdownify for proper HTML to Markdown conversion
10511051- from markdownify import markdownify as md
10521052-10531053- # Convert HTML to Markdown with compact settings for summaries
10541054- markdown = md(
10551055- html_content,
10561056- heading_style="ATX", # Use # for headings (but we'll post-process these)
10571057- bullets="-", # Use - for bullets
10581058- convert=[
10591059- "a",
10601060- "b",
10611061- "strong",
10621062- "i",
10631063- "em",
10641064- "code",
10651065- "pre",
10661066- "p",
10671067- "br",
10681068- "ul",
10691069- "ol",
10701070- "li",
10711071- "h1",
10721072- "h2",
10731073- "h3",
10741074- "h4",
10751075- "h5",
10761076- "h6",
10771077- ],
10781078- ).strip()
10791079-10801080- # Post-process to convert headings to bold for compact summaries
10811081- import re
10821082-10831083- # Convert markdown headers to bold with period
10841084- markdown = re.sub(
10851085- r"^#{1,6}\s*(.+)$", r"**\1.**", markdown, flags=re.MULTILINE
10861086- )
10871087-10881088- # Clean up excessive newlines and make more compact
10891089- markdown = re.sub(
10901090- r"\n\s*\n\s*\n+", " ", markdown
10911091- ) # Multiple newlines become space
10921092- markdown = re.sub(
10931093- r"\n\s*\n", ". ", markdown
10941094- ) # Double newlines become sentence breaks
10951095- markdown = re.sub(r"\n", " ", markdown) # Single newlines become spaces
10961096-10971097- # Clean up double periods and excessive whitespace
10981098- markdown = re.sub(r"\.\.+", ".", markdown)
10991099- markdown = re.sub(r"\s+", " ", markdown)
11001100- return markdown.strip()
11011101-11021102- except ImportError:
11031103- # Fallback: manual HTML processing
11041104- import re
11051105-11061106- content = html_content
11071107-11081108- # Convert headings to bold with periods for compact summaries
11091109- content = re.sub(
11101110- r"<h[1-6](?:\s[^>]*)?>([^<]*)</h[1-6]>",
11111111- r"**\1.** ",
11121112- content,
11131113- flags=re.IGNORECASE,
11141114- )
11151115-11161116- # Convert common HTML elements to Markdown
11171117- content = re.sub(
11181118- r"<(?:strong|b)(?:\s[^>]*)?>([^<]*)</(?:strong|b)>",
11191119- r"**\1**",
11201120- content,
11211121- flags=re.IGNORECASE,
11221122- )
11231123- content = re.sub(
11241124- r"<(?:em|i)(?:\s[^>]*)?>([^<]*)</(?:em|i)>",
11251125- r"*\1*",
11261126- content,
11271127- flags=re.IGNORECASE,
11281128- )
11291129- content = re.sub(
11301130- r"<code(?:\s[^>]*)?>([^<]*)</code>",
11311131- r"`\1`",
11321132- content,
11331133- flags=re.IGNORECASE,
11341134- )
11351135- content = re.sub(
11361136- r'<a(?:\s[^>]*?)?\s*href=["\']([^"\']*)["\'](?:\s[^>]*)?>([^<]*)</a>',
11371137- r"[\2](\1)",
11381138- content,
11391139- flags=re.IGNORECASE,
11401140- )
11411141-11421142- # Convert block elements to spaces instead of newlines for compactness
11431143- content = re.sub(r"<br\s*/?>", " ", content, flags=re.IGNORECASE)
11441144- content = re.sub(r"</p>\s*<p>", ". ", content, flags=re.IGNORECASE)
11451145- content = re.sub(
11461146- r"</?(?:p|div)(?:\s[^>]*)?>", " ", content, flags=re.IGNORECASE
11471147- )
11481148-11491149- # Remove remaining HTML tags
11501150- content = re.sub(r"<[^>]+>", "", content)
11511151-11521152- # Clean up whitespace and make compact
11531153- content = re.sub(
11541154- r"\s+", " ", content
11551155- ) # Multiple whitespace becomes single space
11561156- content = re.sub(
11571157- r"\.\.+", ".", content
11581158- ) # Multiple periods become single period
11591159- return content.strip()
11601160-11611161- except Exception as e:
11621162- self.logger.error(f"Error processing HTML content: {e}")
11631163- # Last resort: just strip HTML tags
11641164- import re
11651165-11661166- return re.sub(r"<[^>]+>", "", html_content).strip()
11671167-11681168- def _get_schedule_info(self) -> str:
11691169- """Get schedule information string."""
11701170- lines = []
11711171-11721172- if self.last_sync_time:
11731173- import datetime
11741174-11751175- last_sync = datetime.datetime.fromtimestamp(self.last_sync_time)
11761176- next_sync = last_sync + datetime.timedelta(seconds=self.sync_interval)
11771177- now = datetime.datetime.now()
11781178-11791179- # Calculate time until next sync
11801180- time_until_next = next_sync - now
11811181-11821182- if time_until_next.total_seconds() > 0:
11831183- minutes, seconds = divmod(int(time_until_next.total_seconds()), 60)
11841184- hours, minutes = divmod(minutes, 60)
11851185-11861186- if hours > 0:
11871187- time_str = f"{hours}h {minutes}m {seconds}s"
11881188- elif minutes > 0:
11891189- time_str = f"{minutes}m {seconds}s"
11901190- else:
11911191- time_str = f"{seconds}s"
11921192-11931193- lines.extend(
11941194- [
11951195- f"๐ **Last Sync:** {last_sync.strftime('%H:%M:%S')}",
11961196- f"⏰ **Next Sync:** {next_sync.strftime('%H:%M:%S')} (in {time_str})",
11971197- ]
11981198- )
11991199- else:
12001200- lines.extend(
12011201- [
12021202- f"๐ **Last Sync:** {last_sync.strftime('%H:%M:%S')}",
12031203- f"⏰ **Next Sync:** Due now (running every {self.sync_interval}s)",
12041204- ]
12051205- )
12061206- else:
12071207- lines.append("๐ **Last Sync:** Never (bot starting up)")
12081208-12091209- # Add sync frequency info
12101210- if self.sync_interval >= 3600:
12111211- frequency_str = (
12121212- f"{self.sync_interval // 3600}h {(self.sync_interval % 3600) // 60}m"
12131213- )
12141214- elif self.sync_interval >= 60:
12151215- frequency_str = f"{self.sync_interval // 60}m {self.sync_interval % 60}s"
12161216- else:
12171217- frequency_str = f"{self.sync_interval}s"
12181218-12191219- lines.append(f"๐ **Sync Frequency:** Every {frequency_str}")
12201220-12211221- return "\n".join(lines)
12221222-12231223- def _send_config_change_notification(
12241224- self,
12251225- bot_handler: BotHandler,
12261226- changer: str,
12271227- setting: str,
12281228- old_value: Optional[str],
12291229- new_value: str,
12301230- ) -> None:
12311231- """Send configuration change notification if enabled."""
12321232- if not self.config_change_notifications or self.debug_user:
12331233- return
12341234-12351235- # Don't send notification if stream/topic aren't configured yet
12361236- if not self.stream_name or not self.topic_name:
12371237- return
12381238-12391239- try:
12401240- old_display = old_value if old_value else "(not set)"
12411241- notification_msg = (
12421242- f"⚙️ **{changer}** changed {setting}: `{old_display}` → `{new_value}`"
12431243- )
12441244-12451245- bot_handler.send_message(
12461246- {
12471247- "type": "stream",
12481248- "to": self.stream_name,
12491249- "subject": self.topic_name,
12501250- "content": notification_msg,
12511251- }
12521252- )
12531253- except Exception as e:
12541254- self.logger.error(f"Failed to send config change notification: {e}")
12551255-12561256-12571257-handler_class = ThicketBotHandler
···11+"""Generate static HTML website from thicket data."""
22+33+from pathlib import Path
44+from typing import Optional
55+66+import typer
77+88+from ..main import app, console, load_thicket
99+1010+1111+1212+1313+@app.command()
1414+def generate(
1515+ output: Path = typer.Option(
1616+ Path("./thicket-site"),
1717+ "--output",
1818+ "-o",
1919+ help="Output directory for the generated website",
2020+ ),
2121+ template_dir: Optional[Path] = typer.Option(
2222+ None, "--templates", help="Custom template directory"
2323+ ),
2424+ config_file: Optional[Path] = typer.Option(
2525+ None, "--config", help="Configuration file path"
2626+ ),
2727+) -> None:
2828+ """Generate a static HTML website from thicket data."""
2929+3030+ try:
3131+ # Load Thicket instance
3232+ thicket = load_thicket(config_file)
3333+3434+ console.print(f"[blue]Generating static site to:[/blue] {output}")
3535+3636+ # Generate the complete site
3737+ if thicket.generate_site(output, template_dir):
3838+ console.print(f"[green]✓[/green] Successfully generated site at {output}")
3939+4040+ # Show what was generated
4141+ stats = thicket.get_stats()
4242+ console.print(f" • {stats.get('total_entries', 0)} entries")
4343+ console.print(f" • {stats.get('total_users', 0)} users")
4444+ console.print(f" • {stats.get('unique_urls', 0)} unique links")
4545+4646+ # List generated files
4747+ if output.exists():
4848+ html_files = list(output.glob("*.html"))
4949+ if html_files:
5050+ console.print(" • Generated pages:")
5151+ for html_file in sorted(html_files):
5252+ console.print(f" - {html_file.name}")
5353+ else:
5454+ console.print("[red]✗[/red] Failed to generate site")
5555+ raise typer.Exit(1)
5656+5757+ except Exception as e:
5858+ console.print(f"[red]Error:[/red] {str(e)}")
5959+ raise typer.Exit(1)
+427
src/thicket/cli/commands/index_cmd.py
···11+"""CLI command for building reference index from blog entries."""
22+33+import json
44+from pathlib import Path
55+from typing import Optional
66+77+import typer
88+from rich.console import Console
99+from rich.progress import (
1010+ BarColumn,
1111+ Progress,
1212+ SpinnerColumn,
1313+ TaskProgressColumn,
1414+ TextColumn,
1515+)
1616+from rich.table import Table
1717+1818+from ...core.git_store import GitStore
1919+from ...core.reference_parser import ReferenceIndex, ReferenceParser
2020+from ..main import app
2121+from ..utils import get_tsv_mode, load_config
2222+2323+console = Console()
2424+2525+2626+@app.command()
2727+def index(
2828+ config_file: Optional[Path] = typer.Option(
2929+ None,
3030+ "--config",
3131+ "-c",
3232+ help="Path to configuration file",
3333+ ),
3434+ output_file: Optional[Path] = typer.Option(
3535+ None,
3636+ "--output",
3737+ "-o",
3838+ help="Path to output index file (default: updates links.json in git store)",
3939+ ),
4040+ verbose: bool = typer.Option(
4141+ False,
4242+ "--verbose",
4343+ "-v",
4444+ help="Show detailed progress information",
4545+ ),
4646+) -> None:
4747+ """Build a reference index showing which blog entries reference others.
4848+4949+ This command analyzes all blog entries to detect cross-references between
5050+ different blogs, creating an index that can be used to build threaded
5151+ views of related content.
5252+5353+ Updates the unified links.json file with reference data.
5454+ """
5555+ try:
5656+ # Load configuration
5757+ config = load_config(config_file)
5858+5959+ # Initialize Git store
6060+ git_store = GitStore(config.git_store)
6161+6262+ # Initialize reference parser
6363+ parser = ReferenceParser()
6464+6565+ # Build user domain mapping
6666+ if verbose:
6767+ console.print("Building user domain mapping...")
6868+ user_domains = parser.build_user_domain_mapping(git_store)
6969+7070+ if verbose:
7171+ console.print(f"Found {len(user_domains)} users with {sum(len(d) for d in user_domains.values())} total domains")
7272+7373+ # Initialize reference index
7474+ ref_index = ReferenceIndex()
7575+ ref_index.user_domains = user_domains
7676+7777+ # Get all users
7878+ index = git_store._load_index()
7979+ users = list(index.users.keys())
8080+8181+ if not users:
8282+ console.print("[yellow]No users found in Git store[/yellow]")
8383+ raise typer.Exit(0)
8484+8585+ # Process all entries
8686+ total_entries = 0
8787+ total_references = 0
8888+ all_references = []
8989+9090+ with Progress(
9191+ SpinnerColumn(),
9292+ TextColumn("[progress.description]{task.description}"),
9393+ BarColumn(),
9494+ TaskProgressColumn(),
9595+ console=console,
9696+ ) as progress:
9797+9898+ # Count total entries first
9999+ counting_task = progress.add_task("Counting entries...", total=len(users))
100100+ entry_counts = {}
101101+ for username in users:
102102+ entries = git_store.list_entries(username)
103103+ entry_counts[username] = len(entries)
104104+ total_entries += len(entries)
105105+ progress.advance(counting_task)
106106+107107+ progress.remove_task(counting_task)
108108+109109+ # Process entries - extract references
110110+ processing_task = progress.add_task(
111111+ f"Extracting references from {total_entries} entries...",
112112+ total=total_entries
113113+ )
114114+115115+ for username in users:
116116+ entries = git_store.list_entries(username)
117117+118118+ for entry in entries:
119119+ # Extract references from this entry
120120+ references = parser.extract_references(entry, username, user_domains)
121121+ all_references.extend(references)
122122+123123+ progress.advance(processing_task)
124124+125125+ if verbose and references:
126126+ console.print(f" Found {len(references)} references in {username}:{entry.title[:50]}...")
127127+128128+ progress.remove_task(processing_task)
129129+130130+ # Resolve target_entry_ids for references
131131+ if all_references:
132132+ resolve_task = progress.add_task(
133133+ f"Resolving {len(all_references)} references...",
134134+ total=len(all_references)
135135+ )
136136+137137+ if verbose:
138138+ console.print(f"Resolving target entry IDs for {len(all_references)} references...")
139139+140140+ resolved_references = parser.resolve_target_entry_ids(all_references, git_store)
141141+142142+ # Count resolved references
143143+ resolved_count = sum(1 for ref in resolved_references if ref.target_entry_id is not None)
144144+ if verbose:
145145+ console.print(f"Resolved {resolved_count} out of {len(all_references)} references")
146146+147147+ # Add resolved references to index
148148+ for ref in resolved_references:
149149+ ref_index.add_reference(ref)
150150+ total_references += 1
151151+ progress.advance(resolve_task)
152152+153153+ progress.remove_task(resolve_task)
154154+155155+ # Determine output path
156156+ if output_file:
157157+ output_path = output_file
158158+ else:
159159+ output_path = config.git_store / "links.json"
160160+161161+ # Load existing links data or create new structure
162162+ if output_path.exists() and not output_file:
163163+ # Load existing unified structure
164164+ with open(output_path) as f:
165165+ existing_data = json.load(f)
166166+ else:
167167+ # Create new structure
168168+ existing_data = {
169169+ "links": {},
170170+ "reverse_mapping": {},
171171+ "user_domains": {}
172172+ }
173173+174174+ # Update with reference data
175175+ existing_data["references"] = ref_index.to_dict()["references"]
176176+ existing_data["user_domains"] = {k: list(v) for k, v in user_domains.items()}
177177+178178+ # Save updated structure
179179+ with open(output_path, "w") as f:
180180+ json.dump(existing_data, f, indent=2, default=str)
181181+182182+ # Show summary
183183+ if not get_tsv_mode():
184184+ console.print("\n[green]✓ Reference index built successfully[/green]")
185185+186186+ # Create summary table or TSV output
187187+ if get_tsv_mode():
188188+ print("Metric\tCount")
189189+ print(f"Total Users\t{len(users)}")
190190+ print(f"Total Entries\t{total_entries}")
191191+ print(f"Total References\t{total_references}")
192192+ print(f"Outbound Refs\t{len(ref_index.outbound_refs)}")
193193+ print(f"Inbound Refs\t{len(ref_index.inbound_refs)}")
194194+ print(f"Output File\t{output_path}")
195195+ else:
196196+ table = Table(title="Reference Index Summary")
197197+ table.add_column("Metric", style="cyan")
198198+ table.add_column("Count", style="green")
199199+200200+ table.add_row("Total Users", str(len(users)))
201201+ table.add_row("Total Entries", str(total_entries))
202202+ table.add_row("Total References", str(total_references))
203203+ table.add_row("Outbound Refs", str(len(ref_index.outbound_refs)))
204204+ table.add_row("Inbound Refs", str(len(ref_index.inbound_refs)))
205205+ table.add_row("Output File", str(output_path))
206206+207207+ console.print(table)
208208+209209+ # Show some interesting statistics
210210+ if total_references > 0:
211211+ if not get_tsv_mode():
212212+ console.print("\n[bold]Reference Statistics:[/bold]")
213213+214214+ # Most referenced users
215215+ target_counts = {}
216216+ unresolved_domains = set()
217217+218218+ for ref in ref_index.references:
219219+ if ref.target_username:
220220+ target_counts[ref.target_username] = target_counts.get(ref.target_username, 0) + 1
221221+ else:
222222+ # Track unresolved domains
223223+ from urllib.parse import urlparse
224224+ domain = urlparse(ref.target_url).netloc.lower()
225225+ unresolved_domains.add(domain)
226226+227227+ if target_counts:
228228+ if get_tsv_mode():
229229+ print("Referenced User\tReference Count")
230230+ for username, count in sorted(target_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
231231+ print(f"{username}\t{count}")
232232+ else:
233233+ console.print("\nMost referenced users:")
234234+ for username, count in sorted(target_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
235235+ console.print(f" {username}: {count} references")
236236+237237+ if unresolved_domains and verbose:
238238+ if get_tsv_mode():
239239+ print("Unresolved Domain\tCount")
240240+ for domain in sorted(unresolved_domains)[:10]:
241241+ print(f"{domain}\t1")
242242+ if len(unresolved_domains) > 10:
243243+ print(f"... and {len(unresolved_domains) - 10} more\t...")
244244+ else:
245245+ console.print(f"\nUnresolved domains: {len(unresolved_domains)}")
246246+ for domain in sorted(unresolved_domains)[:10]:
247247+ console.print(f" {domain}")
248248+ if len(unresolved_domains) > 10:
249249+ console.print(f" ... and {len(unresolved_domains) - 10} more")
250250+251251+ except Exception as e:
252252+ console.print(f"[red]Error building reference index: {e}[/red]")
253253+ if verbose:
254254+ console.print_exception()
255255+ raise typer.Exit(1)
256256+257257+258258+@app.command()
259259+def threads(
260260+ config_file: Optional[Path] = typer.Option(
261261+ None,
262262+ "--config",
263263+ "-c",
264264+ help="Path to configuration file",
265265+ ),
266266+ index_file: Optional[Path] = typer.Option(
267267+ None,
268268+ "--index",
269269+ "-i",
270270+ help="Path to reference index file (default: links.json in git store)",
271271+ ),
272272+ username: Optional[str] = typer.Option(
273273+ None,
274274+ "--username",
275275+ "-u",
276276+ help="Show threads for specific username only",
277277+ ),
278278+ entry_id: Optional[str] = typer.Option(
279279+ None,
280280+ "--entry",
281281+ "-e",
282282+ help="Show thread for specific entry ID",
283283+ ),
284284+ min_size: int = typer.Option(
285285+ 2,
286286+ "--min-size",
287287+ "-m",
288288+ help="Minimum thread size to display",
289289+ ),
290290+) -> None:
291291+ """Show threaded view of related blog entries.
292292+293293+ This command uses the reference index to show which blog entries
294294+ are connected through cross-references, creating an email-style
295295+ threaded view of the conversation.
296296+297297+ Reads reference data from the unified links.json file.
298298+ """
299299+ try:
300300+ # Load configuration
301301+ config = load_config(config_file)
302302+303303+ # Determine index file path
304304+ if index_file:
305305+ index_path = index_file
306306+ else:
307307+ index_path = config.git_store / "links.json"
308308+309309+ if not index_path.exists():
310310+ console.print(f"[red]Links file not found: {index_path}[/red]")
311311+ console.print("Run 'thicket links' and 'thicket index' first to build the reference index")
312312+ raise typer.Exit(1)
313313+314314+ # Load unified data
315315+ with open(index_path) as f:
316316+ unified_data = json.load(f)
317317+318318+ # Check if references exist in the unified structure
319319+ if "references" not in unified_data:
320320+ console.print(f"[red]No references found in {index_path}[/red]")
321321+ console.print("Run 'thicket index' first to build the reference index")
322322+ raise typer.Exit(1)
323323+324324+ # Extract reference data and reconstruct ReferenceIndex
325325+ ref_index = ReferenceIndex.from_dict({
326326+ "references": unified_data["references"],
327327+ "user_domains": unified_data.get("user_domains", {})
328328+ })
329329+330330+ # Initialize Git store to get entry details
331331+ git_store = GitStore(config.git_store)
332332+333333+ if entry_id and username:
334334+ # Show specific thread
335335+ thread_members = ref_index.get_thread_members(username, entry_id)
336336+ _display_thread(thread_members, ref_index, git_store, f"Thread for {username}:{entry_id}")
337337+338338+ elif username:
339339+ # Show all threads involving this user
340340+ user_index = git_store._load_index()
341341+ user = user_index.get_user(username)
342342+ if not user:
343343+ console.print(f"[red]User not found: {username}[/red]")
344344+ raise typer.Exit(1)
345345+346346+ entries = git_store.list_entries(username)
347347+ threads_found = set()
348348+349349+ console.print(f"[bold]Threads involving {username}:[/bold]\n")
350350+351351+ for entry in entries:
352352+ thread_members = ref_index.get_thread_members(username, entry.id)
353353+ if len(thread_members) >= min_size:
354354+ thread_key = tuple(sorted(thread_members))
355355+ if thread_key not in threads_found:
356356+ threads_found.add(thread_key)
357357+ _display_thread(thread_members, ref_index, git_store, f"Thread #{len(threads_found)}")
358358+359359+ else:
360360+ # Show all threads
361361+ console.print("[bold]All conversation threads:[/bold]\n")
362362+363363+ all_threads = set()
364364+ processed_entries = set()
365365+366366+ # Get all entries
367367+ user_index = git_store._load_index()
368368+ for username in user_index.users.keys():
369369+ entries = git_store.list_entries(username)
370370+ for entry in entries:
371371+ entry_key = (username, entry.id)
372372+ if entry_key in processed_entries:
373373+ continue
374374+375375+ thread_members = ref_index.get_thread_members(username, entry.id)
376376+ if len(thread_members) >= min_size:
377377+ thread_key = tuple(sorted(thread_members))
378378+ if thread_key not in all_threads:
379379+ all_threads.add(thread_key)
380380+ _display_thread(thread_members, ref_index, git_store, f"Thread #{len(all_threads)}")
381381+382382+ # Mark all members as processed
383383+ for member in thread_members:
384384+ processed_entries.add(member)
385385+386386+ if not all_threads:
387387+ console.print("[yellow]No conversation threads found[/yellow]")
388388+ console.print(f"(minimum thread size: {min_size})")
389389+390390+ except Exception as e:
391391+ console.print(f"[red]Error showing threads: {e}[/red]")
392392+ raise typer.Exit(1)
393393+394394+395395+def _display_thread(thread_members, ref_index, git_store, title):
396396+ """Display a single conversation thread."""
397397+ console.print(f"[bold cyan]{title}[/bold cyan]")
398398+ console.print(f"Thread size: {len(thread_members)} entries")
399399+400400+ # Get entry details for each member
401401+ thread_entries = []
402402+ for username, entry_id in thread_members:
403403+ entry = git_store.get_entry(username, entry_id)
404404+ if entry:
405405+ thread_entries.append((username, entry))
406406+407407+ # Sort by publication date
408408+ thread_entries.sort(key=lambda x: x[1].published or x[1].updated)
409409+410410+ # Display entries
411411+ for i, (username, entry) in enumerate(thread_entries):
412412+ prefix = "├─" if i < len(thread_entries) - 1 else "└─"
413413+414414+ # Get references for this entry
415415+ outbound = ref_index.get_outbound_refs(username, entry.id)
416416+ inbound = ref_index.get_inbound_refs(username, entry.id)
417417+418418+ ref_info = ""
419419+ if outbound or inbound:
420420+ ref_info = f" ({len(outbound)} out, {len(inbound)} in)"
421421+422422+ console.print(f" {prefix} [{username}] {entry.title[:60]}...{ref_info}")
423423+424424+ if entry.published:
425425+ console.print(f" Published: {entry.published.strftime('%Y-%m-%d')}")
426426+427427+ console.print() # Empty line after each thread
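The `threads` command above depends on `ref_index.get_thread_members(username, entry_id)`, whose implementation is not part of this diff. One plausible reading is that a thread is the connected component of an entry in the cross-reference graph, with link direction ignored. The sketch below illustrates that idea together with the `tuple(sorted(...))` de-duplication key the command uses. `find_thread` and the sample `edges` list are hypothetical names for illustration, not thicket's API.

```python
from collections import defaultdict, deque

Entry = tuple[str, str]  # (username, entry_id), as in the command above


def find_thread(start: Entry, edges: list[tuple[Entry, Entry]]) -> set[Entry]:
    """Return every entry reachable from `start`, ignoring link direction."""
    # Build an undirected adjacency map from the (source, target) pairs.
    neighbours: dict[Entry, set[Entry]] = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    # Breadth-first search over the component containing `start`.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


# Hypothetical sample data: two separate conversations.
edges = [
    (("alice", "a1"), ("bob", "b1")),
    (("bob", "b1"), ("carol", "c1")),
    (("dave", "d1"), ("dave", "d2")),
]

# Entries in the same component produce the same sorted key,
# so each thread is recorded once no matter which member we start from.
threads = {
    tuple(sorted(find_thread(entry, edges)))
    for edge in edges
    for entry in edge
}
```

Under this assumption, the `threads_found` and `all_threads` sets in the command serve the same purpose as `threads` here: the sorted tuple is a canonical identity for a thread, so starting from any member does not display it twice.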
+119-106
src/thicket/cli/commands/info_cmd.py
···11"""CLI command for displaying detailed information about a specific atom entry."""
2233+import json
34from pathlib import Path
45from typing import Optional
56···78from rich.console import Console
89from rich.panel import Panel
910from rich.table import Table
1111+from rich.text import Text
10121113from ...core.git_store import GitStore
1414+from ...core.reference_parser import ReferenceIndex
1215from ..main import app
1313-from ..utils import get_tsv_mode, load_config
1616+from ..utils import load_config, get_tsv_mode
14171518console = Console()
1619···1821@app.command()
1922def info(
2023 identifier: str = typer.Argument(
2121- ..., help="The atom ID or URL of the entry to display information about"
2424+ ...,
2525+ help="The atom ID or URL of the entry to display information about"
2226 ),
2327 username: Optional[str] = typer.Option(
2428 None,
2529 "--username",
2630 "-u",
2727- help="Username to search for the entry (if not provided, searches all users)",
3131+ help="Username to search for the entry (if not provided, searches all users)"
2832 ),
2933 config_file: Optional[Path] = typer.Option(
3034 Path("thicket.yaml"),
···3337 help="Path to configuration file",
3438 ),
3539 show_content: bool = typer.Option(
3636- False, "--content", help="Include the full content of the entry in the output"
4040+ False,
4141+ "--content",
4242+ help="Include the full content of the entry in the output"
3743 ),
3844) -> None:
3945 """Display detailed information about a specific atom entry.
4040-4646+4147 You can specify the entry using either its atom ID or URL.
4248 Shows all metadata for the given entry, including title, dates, categories,
4349 and summarizes all inbound and outbound links to/from other posts.
···4551 try:
4652 # Load configuration
4753 config = load_config(config_file)
4848-5454+4955 # Initialize Git store
5056 git_store = GitStore(config.git_store)
5151-5757+5258 # Find the entry
5359 entry = None
5460 found_username = None
5555-6161+5662 # Check if identifier looks like a URL
5757- is_url = identifier.startswith(("http://", "https://"))
5858-6363+ is_url = identifier.startswith(('http://', 'https://'))
6464+5965 if username:
6066 # Search specific username
6167 if is_url:
···9197 if entry:
9298 found_username = user
9399 break
9494-100100+95101 if not entry or not found_username:
96102 if username:
9797- console.print(
9898- f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found for user '{username}'[/red]"
9999- )
103103+ console.print(f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found for user '{username}'[/red]")
100104 else:
101101- console.print(
102102- f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found in any user's entries[/red]"
103103- )
105105+ console.print(f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found in any user's entries[/red]")
104106 raise typer.Exit(1)
105105-107107+108108+ # Load reference index if available
109109+ links_path = config.git_store / "links.json"
110110+ ref_index = None
111111+ if links_path.exists():
112112+ with open(links_path) as f:
113113+ unified_data = json.load(f)
114114+115115+ # Check if references exist in the unified structure
116116+ if "references" in unified_data:
117117+ ref_index = ReferenceIndex.from_dict({
118118+ "references": unified_data["references"],
119119+ "user_domains": unified_data.get("user_domains", {})
120120+ })
121121+106122 # Display information
107123 if get_tsv_mode():
108108- _display_entry_info_tsv(entry, found_username, show_content)
124124+ _display_entry_info_tsv(entry, found_username, ref_index, show_content)
109125 else:
110126 _display_entry_info(entry, found_username)
111111-112112- # Display links and backlinks from entry fields
113113- _display_link_info(entry, found_username, git_store)
114114-127127+128128+ if ref_index:
129129+ _display_link_info(entry, found_username, ref_index)
130130+ else:
131131+ console.print("\n[yellow]No reference index found. Run 'thicket links' and 'thicket index' to build cross-reference data.[/yellow]")
132132+115133 # Optionally display content
116134 if show_content and entry.content:
117135 _display_content(entry.content)
118118-136136+119137 except Exception as e:
120138 console.print(f"[red]Error displaying entry info: {e}[/red]")
121121- raise typer.Exit(1) from e
139139+ raise typer.Exit(1)
122140123141124142def _display_entry_info(entry, username: str) -> None:
125143 """Display basic entry information in a structured format."""
126126-144144+127145 # Create main info panel
128146 info_table = Table.grid(padding=(0, 2))
129147 info_table.add_column("Field", style="cyan bold", width=15)
130148 info_table.add_column("Value", style="white")
131131-149149+132150 info_table.add_row("User", f"[green]{username}[/green]")
133151 info_table.add_row("Atom ID", f"[blue]{entry.id}[/blue]")
134152 info_table.add_row("Title", entry.title)
135153 info_table.add_row("Link", str(entry.link))
136136-154154+137155 if entry.published:
138138- info_table.add_row(
139139- "Published", entry.published.strftime("%Y-%m-%d %H:%M:%S UTC")
140140- )
141141-156156+ info_table.add_row("Published", entry.published.strftime("%Y-%m-%d %H:%M:%S UTC"))
157157+142158 info_table.add_row("Updated", entry.updated.strftime("%Y-%m-%d %H:%M:%S UTC"))
143143-159159+144160 if entry.summary:
145161 # Truncate long summaries
146146- summary = (
147147- entry.summary[:200] + "..." if len(entry.summary) > 200 else entry.summary
148148- )
162162+ summary = entry.summary[:200] + "..." if len(entry.summary) > 200 else entry.summary
149163 info_table.add_row("Summary", summary)
150150-164164+151165 if entry.categories:
152166 categories_text = ", ".join(entry.categories)
153167 info_table.add_row("Categories", categories_text)
154154-168168+155169 if entry.author:
156170 author_info = []
157171 if "name" in entry.author:
···160174 author_info.append(f"<{entry.author['email']}>")
161175 if author_info:
162176 info_table.add_row("Author", " ".join(author_info))
163163-177177+164178 if entry.content_type:
165179 info_table.add_row("Content Type", entry.content_type)
166166-180180+167181 if entry.rights:
168182 info_table.add_row("Rights", entry.rights)
169169-183183+170184 if entry.source:
171185 info_table.add_row("Source Feed", entry.source)
172172-186186+173187 panel = Panel(
174174- info_table, title="[bold]Entry Information[/bold]", border_style="blue"
188188+ info_table,
189189+ title=f"[bold]Entry Information[/bold]",
190190+ border_style="blue"
175191 )
176176-192192+177193 console.print(panel)
178194179195180180-def _display_link_info(entry, username: str, git_store: GitStore) -> None:
196196+def _display_link_info(entry, username: str, ref_index: ReferenceIndex) -> None:
181197 """Display inbound and outbound link information."""
182182-183183- # Get links from entry fields
184184- outbound_links = getattr(entry, "links", [])
185185- backlinks = getattr(entry, "backlinks", [])
186186-187187- if not outbound_links and not backlinks:
198198+199199+ # Get links
200200+ outbound_refs = ref_index.get_outbound_refs(username, entry.id)
201201+ inbound_refs = ref_index.get_inbound_refs(username, entry.id)
202202+203203+ if not outbound_refs and not inbound_refs:
188204 console.print("\n[dim]No cross-references found for this entry.[/dim]")
189205 return
190190-206206+191207 # Create links table
192208 links_table = Table(title="Cross-References")
193209 links_table.add_column("Direction", style="cyan", width=10)
194194- links_table.add_column("Target/Source", style="green", width=30)
195195- links_table.add_column("URL/ID", style="blue", width=60)
196196-197197- # Add outbound links
198198- for link in outbound_links:
199199- links_table.add_row("→ Out", "External/Other", link)
200200-201201- # Add backlinks (inbound references)
202202- for backlink_id in backlinks:
203203- # Try to find which user this entry belongs to
204204- source_info = backlink_id
205205- # Could enhance this by looking up the actual entry to get username
206206- links_table.add_row("← In", "Entry", source_info)
207207-210210+ links_table.add_column("Target/Source", style="green", width=20)
211211+ links_table.add_column("URL", style="blue", width=50)
212212+213213+ # Add outbound references
214214+ for ref in outbound_refs:
215215+ target_info = f"{ref.target_username}:{ref.target_entry_id}" if ref.target_username and ref.target_entry_id else "External"
216216+ links_table.add_row("→ Out", target_info, ref.target_url)
217217+218218+ # Add inbound references
219219+ for ref in inbound_refs:
220220+ source_info = f"{ref.source_username}:{ref.source_entry_id}"
221221+ links_table.add_row("← In", source_info, ref.target_url)
222222+208223 console.print()
209224 console.print(links_table)
210210-225225+211226 # Summary
212212- console.print(
213213- f"\n[bold]Summary:[/bold] {len(outbound_links)} outbound links, {len(backlinks)} inbound backlinks"
214214- )
227227+ console.print(f"\n[bold]Summary:[/bold] {len(outbound_refs)} outbound, {len(inbound_refs)} inbound references")
215228216229217230def _display_content(content: str) -> None:
218231 """Display the full content of the entry."""
219219-232232+220233 # Truncate very long content
221234 display_content = content
222235 if len(content) > 5000:
223236 display_content = content[:5000] + "\n\n[... content truncated ...]"
224224-237237+225238 panel = Panel(
226239 display_content,
227240 title="[bold]Entry Content[/bold]",
228241 border_style="green",
229229- expand=False,
242242+ expand=False
230243 )
231231-244244+232245 console.print()
233246 console.print(panel)
234247235248236236-def _display_entry_info_tsv(entry, username: str, show_content: bool) -> None:
249249+def _display_entry_info_tsv(entry, username: str, ref_index: Optional[ReferenceIndex], show_content: bool) -> None:
237250 """Display entry information in TSV format."""
238238-251251+239252 # Basic info
240253 print("Field\tValue")
241254 print(f"User\t{username}")
242255 print(f"Atom ID\t{entry.id}")
243243- print(
244244- f"Title\t{entry.title.replace(chr(9), ' ').replace(chr(10), ' ').replace(chr(13), ' ')}"
245245- )
256256+ print(f"Title\t{entry.title.replace(chr(9), ' ').replace(chr(10), ' ').replace(chr(13), ' ')}")
246257 print(f"Link\t{entry.link}")
247247-258258+248259 if entry.published:
249260 print(f"Published\t{entry.published.strftime('%Y-%m-%d %H:%M:%S UTC')}")
250250-261261+251262 print(f"Updated\t{entry.updated.strftime('%Y-%m-%d %H:%M:%S UTC')}")
252252-263263+253264 if entry.summary:
254265 # Escape tabs and newlines in summary
255255- summary = entry.summary.replace("\t", " ").replace("\n", " ").replace("\r", " ")
266266+ summary = entry.summary.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
256267 print(f"Summary\t{summary}")
257257-268268+258269 if entry.categories:
259270 print(f"Categories\t{', '.join(entry.categories)}")
260260-271271+261272 if entry.author:
262273 author_info = []
263274 if "name" in entry.author:
···266277 author_info.append(f"<{entry.author['email']}>")
267278 if author_info:
268279 print(f"Author\t{' '.join(author_info)}")
269269-280280+270281 if entry.content_type:
271282 print(f"Content Type\t{entry.content_type}")
272272-283283+273284 if entry.rights:
274285 print(f"Rights\t{entry.rights}")
275275-286286+276287 if entry.source:
277288 print(f"Source Feed\t{entry.source}")
278278-279279- # Add links info from entry fields
280280- outbound_links = getattr(entry, "links", [])
281281- backlinks = getattr(entry, "backlinks", [])
282282-283283- if outbound_links or backlinks:
284284- print(f"Outbound Links\t{len(outbound_links)}")
285285- print(f"Backlinks\t{len(backlinks)}")
286286-287287- # Show each link
288288- for link in outbound_links:
289289- print(f"→ Link\t{link}")
290290-291291- for backlink_id in backlinks:
292292- print(f"← Backlink\t{backlink_id}")
293293-289289+290290+ # Add reference info if available
291291+ if ref_index:
292292+ outbound_refs = ref_index.get_outbound_refs(username, entry.id)
293293+ inbound_refs = ref_index.get_inbound_refs(username, entry.id)
294294+295295+ print(f"Outbound References\t{len(outbound_refs)}")
296296+ print(f"Inbound References\t{len(inbound_refs)}")
297297+298298+ # Show each reference
299299+ for ref in outbound_refs:
300300+ target_info = f"{ref.target_username}:{ref.target_entry_id}" if ref.target_username and ref.target_entry_id else "External"
301301+ print(f"Outbound Reference\t{target_info}\t{ref.target_url}")
302302+303303+ for ref in inbound_refs:
304304+ source_info = f"{ref.source_username}:{ref.source_entry_id}"
305305+ print(f"Inbound Reference\t{source_info}\t{ref.target_url}")
306306+294307 # Show content if requested
295308 if show_content and entry.content:
296309 # Escape tabs and newlines in content
297297- content = entry.content.replace("\t", " ").replace("\n", " ").replace("\r", " ")
298298- print(f"Content\t{content}")
310310+ content = entry.content.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
311311+ print(f"Content\t{content}")
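Both `threads` and `info` read the unified `links.json` file and pass its `references` and `user_domains` keys to `ReferenceIndex.from_dict`. The loading step could be factored into one helper, sketched below under the assumption that only those two keys matter. `load_reference_data` is a hypothetical name, and the sketch stops short of calling `ReferenceIndex.from_dict` so that it stays self-contained.

```python
import json
from pathlib import Path
from typing import Optional


def load_reference_data(links_path: Path) -> Optional[dict]:
    """Return the payload for ReferenceIndex.from_dict, or None if unavailable."""
    # No links.json yet: 'thicket links' has not been run.
    if not links_path.exists():
        return None

    with open(links_path, encoding="utf-8") as f:
        unified_data = json.load(f)

    # links.json exists but holds no reference index yet.
    if "references" not in unified_data:
        return None

    return {
        "references": unified_data["references"],
        "user_domains": unified_data.get("user_domains", {}),
    }
```

A caller can then treat `None` as "no cross-reference data": `info` would print its hint about running `thicket links` and `thicket index`, while `threads` would exit with an error.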
+51-39
src/thicket/cli/commands/init.py
···11"""Initialize command for thicket."""
2233+import yaml
34from pathlib import Path
45from typing import Optional
5667import typer
77-from pydantic import ValidationError
8899-from ...core.git_store import GitStore
99+from ..main import app, console, get_config_path
1010from ...models import ThicketConfig
1111-from ..main import app
1212-from ..utils import print_error, print_success, save_config
1111+from ... import Thicket
131214131514@app.command()
1615def init(
1717- git_store: Path = typer.Argument(
1818- ..., help="Path to Git repository for storing feeds"
1919- ),
1616+ git_store: Path = typer.Argument(..., help="Path to Git repository for storing feeds"),
2017 cache_dir: Optional[Path] = typer.Option(
2118 None, "--cache-dir", "-c", help="Cache directory (default: ~/.cache/thicket)"
2219 ),
2320 config_file: Optional[Path] = typer.Option(
2424- None, "--config", help="Configuration file path (default: thicket.yaml)"
2121+ None, "--config", help="Configuration file path (default: ~/.config/thicket/config.yaml)"
2522 ),
2623 force: bool = typer.Option(
2724 False, "--force", "-f", help="Overwrite existing configuration"
···31283229 # Set default paths
3330 if cache_dir is None:
3434- from platformdirs import user_cache_dir
3535-3636- cache_dir = Path(user_cache_dir("thicket"))
3131+ cache_dir = Path.home() / ".cache" / "thicket"
37323833 if config_file is None:
3939- config_file = Path("thicket.yaml")
3434+ config_file = get_config_path()
40354136 # Check if config already exists
4237 if config_file.exists() and not force:
4343- print_error(f"Configuration file already exists: {config_file}")
4444- print_error("Use --force to overwrite")
3838+ console.print(f"[red]Configuration file already exists:[/red] {config_file}")
3939+ console.print("Use --force to overwrite")
4540 raise typer.Exit(1)
46414747- # Create cache directory
4848- cache_dir.mkdir(parents=True, exist_ok=True)
4949-5050- # Create Git store
5142 try:
5252- GitStore(git_store)
5353- print_success(f"Initialized Git store at: {git_store}")
5454- except Exception as e:
5555- print_error(f"Failed to initialize Git store: {e}")
5656- raise typer.Exit(1) from e
4343+ # Create directories
4444+ git_store.mkdir(parents=True, exist_ok=True)
4545+ cache_dir.mkdir(parents=True, exist_ok=True)
4646+ config_file.parent.mkdir(parents=True, exist_ok=True)
57475858- # Create configuration
5959- try:
6060- config = ThicketConfig(git_store=git_store, cache_dir=cache_dir, users=[])
4848+ # Create Thicket instance with minimal config
4949+ thicket = Thicket.create(git_store, cache_dir)
5050+5151+ # Initialize the repository
5252+ if thicket.init_repository():
5353+ console.print(f"[green]✓[/green] Initialized Git store at: {git_store}")
5454+ else:
5555+ console.print(f"[red]✗[/red] Failed to initialize Git store")
5656+ raise typer.Exit(1)
61576262- save_config(config, config_file)
6363- print_success(f"Created configuration file: {config_file}")
5858+ # Save configuration
5959+ config_data = {
6060+ 'git_store': str(git_store),
6161+ 'cache_dir': str(cache_dir),
6262+ 'users': []
6363+ }
6464+6565+ with open(config_file, 'w') as f:
6666+ yaml.dump(config_data, f, default_flow_style=False)
6868+ console.print(f"[green]✓[/green] Created configuration file: {config_file}")
6969+7070+ # Create initial commit
7171+ if thicket.commit_changes("Initialize thicket repository"):
7272+ console.print("[green]✓[/green] Created initial commit")
7373+7474+ console.print("\n[green]Thicket initialized successfully![/green]")
7575+ console.print(f" • Git store: {git_store}")
7676+ console.print(f" • Cache directory: {cache_dir}")
7777+ console.print(f" • Configuration: {config_file}")
7878+ console.print("\n[blue]Next steps:[/blue]")
7979+ console.print(" 1. Add your first user and feed:")
8080+ console.print(f" [cyan]thicket add username https://example.com/feed.xml[/cyan]")
8181+ console.print(" 2. Sync feeds:")
8282+ console.print(f" [cyan]thicket sync[/cyan]")
8383+ console.print(" 3. Generate a website:")
8484+ console.print(f" [cyan]thicket generate[/cyan]")
64856565- except ValidationError as e:
6666- print_error(f"Invalid configuration: {e}")
6767- raise typer.Exit(1) from e
6886 except Exception as e:
6969- print_error(f"Failed to create configuration: {e}")
7070- raise typer.Exit(1) from e
7171-7272- print_success("Thicket initialized successfully!")
7373- print_success(f"Git store: {git_store}")
7474- print_success(f"Cache directory: {cache_dir}")
7575- print_success(f"Configuration: {config_file}")
7676- print_success("Run 'thicket add user' to add your first user and feed.")
8787+ console.print(f"[red]Error:[/red] {str(e)}")
8888+ raise typer.Exit(1)
+416
src/thicket/cli/commands/links_cmd.py
···11+"""CLI command for extracting and categorizing all outbound links from blog entries."""
22+33+import json
44+import re
55+from pathlib import Path
66+from typing import Dict, List, Optional, Set
77+from urllib.parse import urljoin, urlparse
88+99+import typer
1010+from rich.console import Console
1111+from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn
1212+from rich.table import Table
1313+1414+from ...core.git_store import GitStore
1515+from ..main import app
1616+from ..utils import load_config, get_tsv_mode
1717+1818+console = Console()
1919+2020+2121+class LinkData:
2222+ """Represents a link found in a blog entry."""
2323+2424+ def __init__(self, url: str, entry_id: str, username: str):
2525+ self.url = url
2626+ self.entry_id = entry_id
2727+ self.username = username
2828+2929+ def to_dict(self) -> dict:
3030+ """Convert to dictionary for JSON serialization."""
3131+ return {
3232+ "url": self.url,
3333+ "entry_id": self.entry_id,
3434+ "username": self.username
3535+ }
3636+3737+ @classmethod
3838+ def from_dict(cls, data: dict) -> "LinkData":
3939+ """Create from dictionary."""
4040+ return cls(
4141+ url=data["url"],
4242+ entry_id=data["entry_id"],
4343+ username=data["username"]
4444+ )
4545+4646+4747+class LinkCategorizer:
4848+ """Categorizes links as internal, user, or unknown."""
4949+5050+ def __init__(self, user_domains: Dict[str, Set[str]]):
5151+ self.user_domains = user_domains
5252+ # Create reverse mapping of domain -> username
5353+ self.domain_to_user = {}
5454+ for username, domains in user_domains.items():
5555+ for domain in domains:
5656+ self.domain_to_user[domain] = username
5757+5858+ def categorize_url(self, url: str, source_username: str) -> tuple[str, Optional[str]]:
5959+ """
6060+ Categorize a URL as 'internal', 'user', or 'unknown'.
6161+ Returns (category, target_username).
6262+ """
6363+ try:
6464+ parsed = urlparse(url)
6565+ domain = parsed.netloc.lower()
6666+6767+ # Check if it's a link to the same user's domain (internal)
6868+ if domain in self.user_domains.get(source_username, set()):
6969+ return "internal", source_username
7070+7171+ # Check if it's a link to another user's domain
7272+ if domain in self.domain_to_user:
7373+ return "user", self.domain_to_user[domain]
7474+7575+ # Everything else is unknown
7676+ return "unknown", None
7777+7878+ except Exception:
7979+ return "unknown", None
8080+8181+8282+class LinkExtractor:
8383+ """Extracts and resolves links from blog entries."""
8484+8585+ def __init__(self):
8686+ # Pattern for extracting links from HTML
8787+ self.link_pattern = re.compile(r'<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)
8888+ self.url_pattern = re.compile(r'https?://[^\s<>"]+')
8989+9090+ def extract_links_from_html(self, html_content: str, base_url: str) -> List[tuple[str, str]]:
9191+ """Extract all links from HTML content and resolve them against base URL."""
9292+ links = []
9393+9494+ # Extract links from <a> tags
9595+ for match in self.link_pattern.finditer(html_content):
9696+ url = match.group(1)
9797+ text = re.sub(r'<[^>]+>', '', match.group(2)).strip() # Remove HTML tags from link text
9898+9999+ # Resolve relative URLs against base URL
100100+ resolved_url = urljoin(base_url, url)
101101+ links.append((resolved_url, text))
102102+103103+ return links
104104+105105+106106+ def extract_links_from_entry(self, entry, username: str, base_url: str) -> List[LinkData]:
107107+ """Extract all links from a blog entry."""
108108+ links = []
109109+110110+ # Combine all text content for analysis
111111+ content_to_search = []
112112+ if entry.content:
113113+ content_to_search.append(entry.content)
114114+ if entry.summary:
115115+ content_to_search.append(entry.summary)
116116+117117+ for content in content_to_search:
118118+ extracted_links = self.extract_links_from_html(content, base_url)
119119+120120+ for url, link_text in extracted_links:
121121+ # Skip empty URLs
122122+ if not url or url.startswith('#'):
123123+ continue
124124+125125+ link_data = LinkData(
126126+ url=url,
127127+ entry_id=entry.id,
128128+ username=username
129129+ )
130130+131131+ links.append(link_data)
132132+133133+ return links
134134+135135+136136+@app.command()
137137+def links(
138138+ config_file: Optional[Path] = typer.Option(
139139+ Path("thicket.yaml"),
140140+ "--config",
141141+ "-c",
142142+ help="Path to configuration file",
143143+ ),
144144+ output_file: Optional[Path] = typer.Option(
145145+ None,
146146+ "--output",
147147+ "-o",
148148+ help="Path to output unified links file (default: links.json in git store)",
149149+ ),
150150+ verbose: bool = typer.Option(
151151+ False,
152152+ "--verbose",
153153+ "-v",
154154+ help="Show detailed progress information",
155155+ ),
156156+) -> None:
157157+ """Extract and categorize all outbound links from blog entries.
158158+159159+ This command analyzes all blog entries to extract outbound links,
160160+ resolve them properly with respect to the feed's base URL, and
161161+ categorize them as internal, user, or unknown links.
162162+163163+ Creates a unified links.json file containing all link data.
164164+ """
165165+ try:
166166+ # Load configuration
167167+ config = load_config(config_file)
168168+169169+ # Initialize Git store
170170+ git_store = GitStore(config.git_store)
171171+172172+ # Build user domain mapping
173173+ if verbose:
174174+ console.print("Building user domain mapping...")
175175+176176+ index = git_store._load_index()
177177+ user_domains = {}
178178+179179+ for username, user_metadata in index.users.items():
180180+ domains = set()
181181+182182+ # Add domains from feeds
183183+ for feed_url in user_metadata.feeds:
184184+ domain = urlparse(feed_url).netloc.lower()
185185+ if domain:
186186+ domains.add(domain)
187187+188188+ # Add domain from homepage
189189+ if user_metadata.homepage:
190190+ domain = urlparse(str(user_metadata.homepage)).netloc.lower()
191191+ if domain:
192192+ domains.add(domain)
193193+194194+ user_domains[username] = domains
195195+196196+ if verbose:
197197+ console.print(f"Found {len(user_domains)} users with {sum(len(d) for d in user_domains.values())} total domains")
198198+199199+ # Initialize components
200200+ link_extractor = LinkExtractor()
201201+ categorizer = LinkCategorizer(user_domains)
202202+203203+ # Get all users
204204+ users = list(index.users.keys())
205205+206206+ if not users:
207207+ console.print("[yellow]No users found in Git store[/yellow]")
208208+ raise typer.Exit(0)
209209+210210+ # Process all entries
211211+ all_links = []
212212+ link_categories = {"internal": [], "user": [], "unknown": []}
213213+ link_dict = {} # Dictionary with link URL as key, maps to list of atom IDs
214214+ reverse_dict = {} # Dictionary with atom ID as key, maps to list of URLs
215215+216216+ with Progress(
217217+ SpinnerColumn(),
218218+ TextColumn("[progress.description]{task.description}"),
219219+ BarColumn(),
220220+ TaskProgressColumn(),
221221+ console=console,
222222+ ) as progress:
223223+224224+ # Count total entries first
225225+ counting_task = progress.add_task("Counting entries...", total=len(users))
226226+ total_entries = 0
227227+228228+ for username in users:
229229+ entries = git_store.list_entries(username)
230230+ total_entries += len(entries)
231231+ progress.advance(counting_task)
232232+233233+ progress.remove_task(counting_task)
234234+235235+ # Process entries
236236+ processing_task = progress.add_task(
237237+ f"Processing {total_entries} entries...",
238238+ total=total_entries
239239+ )
240240+241241+ for username in users:
242242+ entries = git_store.list_entries(username)
243243+ user_metadata = index.users[username]
244244+245245+ # Get base URL for this user (use first feed URL)
246246+ base_url = str(user_metadata.feeds[0]) if user_metadata.feeds else "https://example.com"
247247+248248+ for entry in entries:
249249+ # Extract links from this entry
250250+ entry_links = link_extractor.extract_links_from_entry(entry, username, base_url)
251251+252252+ # Track unique links per entry
253253+ entry_urls_seen = set()
254254+255255+ # Categorize each link
256256+ for link_data in entry_links:
257257+ # Skip if we've already seen this URL in this entry
258258+ if link_data.url in entry_urls_seen:
259259+ continue
260260+ entry_urls_seen.add(link_data.url)
261261+262262+ category, target_username = categorizer.categorize_url(link_data.url, username)
263263+264264+ # Add to link dictionary (URL as key, maps to list of atom IDs)
265265+ if link_data.url not in link_dict:
266266+ link_dict[link_data.url] = []
267267+ if link_data.entry_id not in link_dict[link_data.url]:
268268+ link_dict[link_data.url].append(link_data.entry_id)
269269+270270+ # Also add to reverse mapping (atom ID -> list of URLs)
271271+ if link_data.entry_id not in reverse_dict:
272272+ reverse_dict[link_data.entry_id] = []
273273+ if link_data.url not in reverse_dict[link_data.entry_id]:
274274+ reverse_dict[link_data.entry_id].append(link_data.url)
275275+276276+ # Add category info to link data for categories tracking
277277+ link_info = link_data.to_dict()
278278+ link_info["category"] = category
279279+ link_info["target_username"] = target_username
280280+281281+ all_links.append(link_info)
282282+ link_categories[category].append(link_info)
283283+284284+ progress.advance(processing_task)
285285+286286+ if verbose and entry_links:
287287+ console.print(f" Found {len(entry_links)} links in {username}:{entry.title[:50]}...")
288288+289289+ # Determine output path
290290+ if output_file:
291291+ output_path = output_file
292292+ else:
293293+ output_path = config.git_store / "links.json"
294294+295295+ # Save all extracted links (not just filtered ones)
296296+ if verbose:
297297+ console.print("Preparing output data...")
298298+299299+ # Build a set of all URLs that correspond to posts in the git database
300300+ registered_urls = set()
301301+302302+ # Get all entries from all users and build URL mappings
303303+ for username in users:
304304+ entries = git_store.list_entries(username)
305305+ user_metadata = index.users[username]
306306+307307+ for entry in entries:
308308+ # Try to match entry URLs with extracted links
309309+ if hasattr(entry, 'link') and entry.link:
310310+ registered_urls.add(str(entry.link))
311311+312312+ # Also check entry alternate links if they exist
313313+ if hasattr(entry, 'links') and entry.links:
314314+ for link in entry.links:
315315+ if hasattr(link, 'href') and link.href:
316316+ registered_urls.add(str(link.href))
317317+318318+ # Build unified structure with metadata
319319+ unified_links = {}
320320+ reverse_mapping = {}
321321+322322+ for url, entry_ids in link_dict.items():
323323+ unified_links[url] = {
324324+ "referencing_entries": entry_ids
325325+ }
326326+327327+ # Find target username if this is a tracked post
328328+ if url in registered_urls:
329329+ for username in users:
330330+ target_domain = urlparse(url).netloc.lower()
331331+ if target_domain in user_domains.get(username, set()):
332332+ unified_links[url]["target_username"] = username
333333+ break
334334+335335+ # Build reverse mapping
336336+ for entry_id in entry_ids:
337337+ if entry_id not in reverse_mapping:
338338+ reverse_mapping[entry_id] = []
339339+ if url not in reverse_mapping[entry_id]:
340340+ reverse_mapping[entry_id].append(url)
341341+342342+ # Create unified output data
343343+ output_data = {
344344+ "links": unified_links,
345345+ "reverse_mapping": reverse_mapping,
346346+ "user_domains": {k: list(v) for k, v in user_domains.items()}
347347+ }
348348+349349+ if verbose:
350350+ console.print(f"Found {len(registered_urls)} registered post URLs")
351351+ console.print(f"Found {len(link_dict)} total links, {sum(1 for link in unified_links.values() if 'target_username' in link)} tracked posts")
352352+353353+ # Save unified data
354354+ with open(output_path, "w") as f:
355355+ json.dump(output_data, f, indent=2, default=str)
356356+357357+ # Show summary
358358+ if not get_tsv_mode():
359359+ console.print("\n[green]✓ Links extraction completed successfully[/green]")
360360+361361+ # Create summary table or TSV output
362362+ if get_tsv_mode():
363363+ print("Category\tCount\tDescription")
364364+ print(f"Internal\t{len(link_categories['internal'])}\tLinks to same user's domain")
365365+ print(f"User\t{len(link_categories['user'])}\tLinks to other tracked users")
366366+ print(f"Unknown\t{len(link_categories['unknown'])}\tLinks to external sites")
367367+ print(f"Total Extracted\t{len(all_links)}\tAll extracted links")
368368+ print(f"Saved to Output\t{len(output_data['links'])}\tLinks saved to output file")
369369+ print(f"Cross-references\t{sum(1 for link in unified_links.values() if 'target_username' in link)}\tLinks to registered posts only")
370370+ else:
371371+ table = Table(title="Links Summary")
372372+ table.add_column("Category", style="cyan")
373373+ table.add_column("Count", style="green")
374374+ table.add_column("Description", style="white")
375375+376376+ table.add_row("Internal", str(len(link_categories["internal"])), "Links to same user's domain")
377377+ table.add_row("User", str(len(link_categories["user"])), "Links to other tracked users")
378378+ table.add_row("Unknown", str(len(link_categories["unknown"])), "Links to external sites")
379379+ table.add_row("Total Extracted", str(len(all_links)), "All extracted links")
380380+ table.add_row("Saved to Output", str(len(output_data['links'])), "Links saved to output file")
381381+ table.add_row("Cross-references", str(sum(1 for link in unified_links.values() if 'target_username' in link)), "Links to registered posts only")
382382+383383+ console.print(table)
384384+385385+ # Show user links if verbose
386386+ if verbose and link_categories["user"]:
387387+ if get_tsv_mode():
388388+ print("User Link Source\tUser Link Target\tLink Count")
389389+ user_link_counts = {}
390390+391391+ for link in link_categories["user"]:
392392+ key = f"{link['username']} -> {link['target_username']}"
393393+ user_link_counts[key] = user_link_counts.get(key, 0) + 1
394394+395395+ for link_pair, count in sorted(user_link_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
396396+ source, target = link_pair.split(" -> ")
397397+ print(f"{source}\t{target}\t{count}")
398398+ else:
399399+ console.print("\n[bold]User-to-user links:[/bold]")
400400+ user_link_counts = {}
401401+402402+ for link in link_categories["user"]:
403403+ key = f"{link['username']} -> {link['target_username']}"
404404+ user_link_counts[key] = user_link_counts.get(key, 0) + 1
405405+406406+ for link_pair, count in sorted(user_link_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
407407+ console.print(f" {link_pair}: {count} links")
408408+409409+ if not get_tsv_mode():
410410+ console.print(f"\nUnified links data saved to: {output_path}")
411411+412412+ except Exception as e:
413413+ console.print(f"[red]Error extracting links: {e}[/red]")
414414+ if verbose:
415415+ console.print_exception()
416416+ raise typer.Exit(1) from e