🪻 distributed transcription service thistle.dunkirk.sh

feat: cleanup after murmur jobs

dunkirk.sh b406ff86 0c268ddd

verified
+66 -5
+37 -1
CRUSH.md
···
 VALUES ('test-sub', <user_id>, 'test-customer', 'active');
 ```

+## Transcription Service Integration (Murmur)
+
+The application uses [Murmur](https://github.com/taciturnaxolotl/murmur) as the transcription backend.
+
+**Murmur API endpoints:**
+- `POST /transcribe` - Upload audio file and create transcription job
+- `GET /transcribe/:job_id` - Get job status and transcript (supports `?format=json|vtt`)
+- `GET /transcribe/:job_id/stream` - Stream real-time progress via Server-Sent Events
+- `GET /jobs` - List all jobs (newest first)
+- `DELETE /transcribe/:job_id` - Delete a job from Murmur's database
+
+**Job synchronization:**
+The `TranscriptionService` runs periodic syncs to reconcile state between our database and Murmur:
+- Reconnects to active jobs on server restart
+- Syncs status updates for processing/transcribing jobs
+- Handles completed jobs (fetches VTT, cleans transcript, saves to storage)
+- **Cleans up finished jobs** - After successful completion or failure, jobs are deleted from Murmur
+- **Cleans up orphaned jobs** - Jobs found in Murmur but not in our database are automatically deleted
+
+**Job cleanup:**
+- **Completed jobs**: After fetching transcript and saving to storage, the job is deleted from Murmur
+- **Failed jobs**: After recording the error in our database, the job is deleted from Murmur
+- **Orphaned jobs**: Jobs in Murmur but not in our database are deleted on discovery
+- All deletions use `DELETE /transcribe/:job_id`
+- This prevents Murmur's database from accumulating stale jobs (Murmur doesn't have automatic cleanup)
+- Logs success/failure of deletion attempts for monitoring
+
+**Job lifecycle:**
+1. User uploads audio → creates transcription in our DB with `status='uploading'`
+2. Audio uploaded to Murmur → get `whisper_job_id`, update to `status='processing'`
+3. Murmur transcribes → stream progress updates, update to `status='transcribing'`
+4. Job completes → fetch VTT, clean with LLM, save transcript, update to `status='completed'`, **delete from Murmur**
+5. If job fails in Murmur → update to `status='failed'` with error message, **delete from Murmur**
+
+**Configuration:**
+Set `WHISPER_SERVICE_URL` in `.env` (default: `http://localhost:8000`)
+
 ## Future Additions

 As the codebase grows, document:
 - Database schema and migrations
 - API endpoint patterns
 - Authentication/authorization approach
-- Transcription service integration details
 - Deployment process
 - Environment variables needed

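The job lifecycle documented in the CRUSH.md change can be read as a small status-transition table. The following is a minimal sketch for illustration, not code from this commit: the status strings come from the documentation above, while the event names and the helpers `nextStatus` and `shouldDeleteFromMurmur` are hypothetical. It also assumes a job may go straight from `processing` to `completed` if a sync sees the finished job before any progress update.

```typescript
// Status values taken from the documented job lifecycle.
type JobStatus =
  | "uploading"
  | "processing"
  | "transcribing"
  | "completed"
  | "failed";

// Hypothetical event names, introduced only for this sketch.
type JobEvent =
  | "uploadedToMurmur"
  | "progress"
  | "murmurCompleted"
  | "murmurFailed";

// Allowed transitions per status; terminal statuses have none.
const transitions: Record<JobStatus, Partial<Record<JobEvent, JobStatus>>> = {
  uploading: { uploadedToMurmur: "processing" },
  processing: {
    progress: "transcribing",
    murmurCompleted: "completed", // assumption: completion may be seen first
    murmurFailed: "failed",
  },
  transcribing: {
    progress: "transcribing",
    murmurCompleted: "completed",
    murmurFailed: "failed",
  },
  completed: {},
  failed: {},
};

// Returns the next status, or the current one if the event does not apply.
function nextStatus(current: JobStatus, event: JobEvent): JobStatus {
  return transitions[current][event] ?? current;
}

// Per the cleanup rules, only finished jobs are deleted from Murmur.
function shouldDeleteFromMurmur(status: JobStatus): boolean {
  return status === "completed" || status === "failed";
}
```

Keeping the "delete from Murmur" decision tied to the terminal statuses mirrors steps 4 and 5 of the lifecycle: cleanup happens only once the local record has reached `completed` or `failed`.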
+29 -4
src/lib/transcription.ts
···
     }
   }

+  private async deleteWhisperJob(jobId: string) {
+    try {
+      const response = await fetch(
+        `${this.serviceUrl}/transcribe/${jobId}`,
+        {
+          method: "DELETE",
+        },
+      );
+      if (response.ok) {
+        console.log(`[Cleanup] Deleted job ${jobId} from Murmur`);
+      } else {
+        console.warn(
+          `[Cleanup] Failed to delete job ${jobId}: ${response.status}`,
+        );
+      }
+    } catch (error) {
+      console.error(`[Cleanup] Error deleting job ${jobId}:`, error);
+    }
+  }
+
   private async handleOrphanedWhisperJob(jobId: string) {
     // Check if this Murmur job_id exists in our DB (either as id or whisper_job_id)
     const jobExists = this.db
···
       .get(jobId, jobId);

     if (!jobExists) {
-      // Not our job - Murmur will keep it until explicitly deleted
+      // Not our job - delete it from Murmur
       console.warn(
-        `[Sync] Found orphaned job ${jobId} in Murmur (not in our DB)`,
+        `[Sync] Found orphaned job ${jobId} in Murmur (not in our DB) - deleting...`,
       );
+      await this.deleteWhisperJob(jobId);
     }
   }

···
             status: "completed",
             progress: 100,
           });
+
+          // Clean up job from Murmur after successful completion
+          await this.deleteWhisperJob(whisperJob.id);
         } else if (details.status === "failed") {
           const errorMessage = (
             details.error_message ?? "Transcription failed"
···
             progress: 0,
             error_message: errorMessage,
           });
+
+          // Clean up failed job from Murmur
+          await this.deleteWhisperJob(whisperJob.id);
         }
-
-        // Job persists in Murmur until explicitly deleted - we just sync state
       } catch {
         console.warn(
           `[Sync] Failed to retrieve details for job ${whisperJob.id}`,
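The orphan check in `handleOrphanedWhisperJob` treats a Murmur job as ours if its id matches either `id` or `whisper_job_id` in the local database. The same rule can be sketched over in-memory data, which may be handy for unit tests. This is an illustrative sketch only: the `LocalJob` interface and the `findOrphanedJobIds` helper are hypothetical and do not appear in the commit, which queries the database once per job instead.

```typescript
// Minimal shape of a local transcription row. The field names mirror the
// "id or whisper_job_id" check in the diff; the interface itself is assumed.
interface LocalJob {
  id: string;
  whisper_job_id: string | null;
}

// Hypothetical helper: returns the Murmur job ids that match no local row,
// i.e. the jobs that the sync would delete from Murmur.
function findOrphanedJobIds(
  murmurJobIds: string[],
  localJobs: LocalJob[],
): string[] {
  const known = new Set<string>();
  for (const job of localJobs) {
    known.add(job.id);
    if (job.whisper_job_id !== null) {
      known.add(job.whisper_job_id);
    }
  }
  return murmurJobIds.filter((id) => !known.has(id));
}

// Example: "a" matches a local id, "b" matches a whisper_job_id,
// and "c" matches nothing, so only "c" is orphaned.
const orphans = findOrphanedJobIds(
  ["a", "b", "c"],
  [
    { id: "a", whisper_job_id: null },
    { id: "local-2", whisper_job_id: "b" },
  ],
);
console.log(orphans);
```

Each id returned here would then be passed to a delete call such as `deleteWhisperJob`, which logs failures without throwing, so one failed deletion does not abort the rest of the sync.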