index homepage, improved opengraphs, misc other stuff

Orual d3c5919b 058d1a2c

+637 -272
+13 -13
Cargo.lock
··· 1411 1411 1412 1412 [[package]] 1413 1413 name = "const-str" 1414 - version = "0.7.0" 1414 + version = "0.7.1" 1415 1415 source = "registry+https://github.com/rust-lang/crates.io-index" 1416 - checksum = "f4d34b8f066904ed7cfa4a6f9ee96c3214aa998cb44b69ca20bd2054f47402ed" 1416 + checksum = "b0664d2867b4a32697dfe655557f5c3b187e9b605b38612a748e5ec99811d160" 1417 1417 1418 1418 [[package]] 1419 1419 name = "const_format" ··· 2477 2477 "base64 0.22.1", 2478 2478 "bytes", 2479 2479 "ciborium", 2480 - "const-str 0.7.0", 2480 + "const-str 0.7.1", 2481 2481 "const_format", 2482 2482 "content_disposition", 2483 2483 "derive_more 2.1.0", ··· 5567 5567 [[package]] 5568 5568 name = "jacquard" 5569 5569 version = "0.9.4" 5570 - source = "git+https://tangled.org/@nonbinary.computer/jacquard#d5d29a337d8f08ae52fe7bb4e31f91f8b029ff48" 5570 + source = "git+https://tangled.org/@nonbinary.computer/jacquard#bf8f91add8747b64b1d1d74af4b358960f69d6e7" 5571 5571 dependencies = [ 5572 5572 "bytes", 5573 5573 "getrandom 0.2.16", ··· 5599 5599 [[package]] 5600 5600 name = "jacquard-api" 5601 5601 version = "0.9.2" 5602 - source = "git+https://tangled.org/@nonbinary.computer/jacquard#d5d29a337d8f08ae52fe7bb4e31f91f8b029ff48" 5602 + source = "git+https://tangled.org/@nonbinary.computer/jacquard#bf8f91add8747b64b1d1d74af4b358960f69d6e7" 5603 5603 dependencies = [ 5604 5604 "bon", 5605 5605 "bytes", ··· 5618 5618 [[package]] 5619 5619 name = "jacquard-axum" 5620 5620 version = "0.9.2" 5621 - source = "git+https://tangled.org/@nonbinary.computer/jacquard#d5d29a337d8f08ae52fe7bb4e31f91f8b029ff48" 5621 + source = "git+https://tangled.org/@nonbinary.computer/jacquard#bf8f91add8747b64b1d1d74af4b358960f69d6e7" 5622 5622 dependencies = [ 5623 5623 "axum", 5624 5624 "bytes", ··· 5640 5640 [[package]] 5641 5641 name = "jacquard-common" 5642 5642 version = "0.9.2" 5643 - source = "git+https://tangled.org/@nonbinary.computer/jacquard#d5d29a337d8f08ae52fe7bb4e31f91f8b029ff48" 5643 + source = 
"git+https://tangled.org/@nonbinary.computer/jacquard#bf8f91add8747b64b1d1d74af4b358960f69d6e7" 5644 5644 dependencies = [ 5645 5645 "base64 0.22.1", 5646 5646 "bon", ··· 5688 5688 [[package]] 5689 5689 name = "jacquard-derive" 5690 5690 version = "0.9.4" 5691 - source = "git+https://tangled.org/@nonbinary.computer/jacquard#d5d29a337d8f08ae52fe7bb4e31f91f8b029ff48" 5691 + source = "git+https://tangled.org/@nonbinary.computer/jacquard#bf8f91add8747b64b1d1d74af4b358960f69d6e7" 5692 5692 dependencies = [ 5693 5693 "heck 0.5.0", 5694 5694 "jacquard-lexicon", ··· 5700 5700 [[package]] 5701 5701 name = "jacquard-identity" 5702 5702 version = "0.9.2" 5703 - source = "git+https://tangled.org/@nonbinary.computer/jacquard#d5d29a337d8f08ae52fe7bb4e31f91f8b029ff48" 5703 + source = "git+https://tangled.org/@nonbinary.computer/jacquard#bf8f91add8747b64b1d1d74af4b358960f69d6e7" 5704 5704 dependencies = [ 5705 5705 "bon", 5706 5706 "bytes", ··· 5729 5729 [[package]] 5730 5730 name = "jacquard-lexicon" 5731 5731 version = "0.9.2" 5732 - source = "git+https://tangled.org/@nonbinary.computer/jacquard#d5d29a337d8f08ae52fe7bb4e31f91f8b029ff48" 5732 + source = "git+https://tangled.org/@nonbinary.computer/jacquard#bf8f91add8747b64b1d1d74af4b358960f69d6e7" 5733 5733 dependencies = [ 5734 5734 "cid", 5735 5735 "dashmap 6.1.0", ··· 5755 5755 [[package]] 5756 5756 name = "jacquard-oauth" 5757 5757 version = "0.9.2" 5758 - source = "git+https://tangled.org/@nonbinary.computer/jacquard#d5d29a337d8f08ae52fe7bb4e31f91f8b029ff48" 5758 + source = "git+https://tangled.org/@nonbinary.computer/jacquard#bf8f91add8747b64b1d1d74af4b358960f69d6e7" 5759 5759 dependencies = [ 5760 5760 "base64 0.22.1", 5761 5761 "bytes", ··· 5788 5788 [[package]] 5789 5789 name = "jacquard-repo" 5790 5790 version = "0.9.4" 5791 - source = "git+https://tangled.org/@nonbinary.computer/jacquard#d5d29a337d8f08ae52fe7bb4e31f91f8b029ff48" 5791 + source = 
"git+https://tangled.org/@nonbinary.computer/jacquard#bf8f91add8747b64b1d1d74af4b358960f69d6e7" 5792 5792 dependencies = [ 5793 5793 "bytes", 5794 5794 "cid", ··· 6746 6746 [[package]] 6747 6747 name = "mini-moka" 6748 6748 version = "0.10.99" 6749 - source = "git+https://tangled.org/@nonbinary.computer/jacquard#d5d29a337d8f08ae52fe7bb4e31f91f8b029ff48" 6749 + source = "git+https://tangled.org/@nonbinary.computer/jacquard#bf8f91add8747b64b1d1d74af4b358960f69d6e7" 6750 6750 dependencies = [ 6751 6751 "crossbeam-channel", 6752 6752 "crossbeam-utils",
+10 -16
crates/weaver-app/Dockerfile
··· 1 - # Build stage with cargo-chef for dependency caching 2 - FROM rust:1-trixie AS chef 3 - # Pin nightly version for reproducibility 4 - RUN rustup default nightly-2025-12-04 && rustup component add rust-src --toolchain nightly-2025-12-04 5 - RUN cargo install cargo-chef 1 + # Build stage 2 + FROM rust:1-trixie AS builder 3 + 6 4 WORKDIR /app 7 5 8 - FROM chef AS planner 9 - COPY . . 10 - RUN cargo chef prepare --recipe-path recipe.json 11 - 12 - FROM chef AS builder 6 + # Pin nightly version for reproducibility 7 + RUN rustup default nightly-2025-12-04 && rustup component add rust-src llvm-tools-preview --toolchain nightly-2025-12-04 13 8 14 9 # Install build dependencies 15 10 RUN apt-get update && apt-get install -y \ 16 11 pkg-config \ 17 12 libssl-dev \ 13 + clang \ 14 + llvm \ 15 + binutils \ 18 16 && rm -rf /var/lib/apt/lists/* 19 17 20 18 # Install wasm target and tools ··· 24 22 # Install dioxus-cli 25 23 RUN curl -L --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/cargo-bins/cargo-binstall/main/install-from-binstall-release.sh | bash 26 24 RUN cargo binstall dioxus-cli --root /usr/local -y --force 27 - 28 - # Cook dependencies first (cached layer) 29 - COPY --from=planner /app/recipe.json recipe.json 30 - RUN cargo chef cook --release --recipe-path recipe.json 31 25 32 26 # Copy source code 33 27 COPY . . ··· 51 45 --no-typescript 52 46 53 47 # Bundle the app 54 - RUN dx bundle --release --debug-symbols false -p weaver-app 48 + RUN dx bundle --verbose --release --debug-symbols false -p weaver-app 55 49 56 50 # Runtime stage 57 - FROM debian:bookworm-slim 51 + FROM debian:trixie-slim 58 52 59 53 RUN apt-get update && apt-get install -y \ 60 54 ca-certificates \
crates/weaver-app/assets/fonts/cmu-sans-serif/CMUSansSerif-Bold.ttf

This is a binary file and will not be displayed.

crates/weaver-app/assets/fonts/cmu-sans-serif/CMUSansSerif-Medium.ttf

This is a binary file and will not be displayed.

+1 -1
crates/weaver-app/assets/styling/theme-defaults.css
··· 56 56 font-family: "Adobe Caslon Pro"; 57 57 font-style: normal; 58 58 font-weight: bold; 59 - src: url("/assets/AdobeCaslonPro-Semibold.ttf") format("truetype"); 59 + src: url("/assets/AdobeCaslonPro-Bold.ttf") format("truetype"); 60 60 } 61 61 62 62 @font-face {
+1 -1
crates/weaver-app/src/env.rs
··· 19 19 #[allow(unused)] 20 20 pub const WEAVER_PRIVACY_POLICY_URI: &'static str = ""; 21 21 #[allow(unused)] 22 - pub const WEAVER_INDEXER_URL: &'static str = "http://localhost:3000"; 22 + pub const WEAVER_INDEXER_URL: &'static str = "https://index.weaver.sh"; 23 23 #[allow(unused)] 24 24 pub const WEAVER_INDEXER_DID: &'static str = "did:web:index.weaver.sh"; 25 25 #[allow(unused)]
+18 -8
crates/weaver-app/src/og/mod.rs
··· 134 134 fn get_fontdb() -> &'static fontdb::Database { 135 135 FONTDB.get_or_init(|| { 136 136 let mut db = fontdb::Database::new(); 137 - // Load IBM Plex Sans from embedded bytes 138 - let font_data = include_bytes!("../../assets/fonts/IBMPlexSans-VariableFont_wdth,wght.ttf"); 139 - db.load_font_data(font_data.to_vec()); 140 - // Load IBM Plex Sans Bold (static weight for proper bold rendering) 141 - let font_data = include_bytes!("../../assets/fonts/IBMPlexSans-Bold.ttf"); 142 - db.load_font_data(font_data.to_vec()); 143 - let font_data = include_bytes!("../../assets/fonts/ioskeley-mono/IoskeleyMono-Regular.ttf"); 144 - db.load_font_data(font_data.to_vec()); 137 + // Load CMU Sans Serif for headings/UI 138 + db.load_font_data( 139 + include_bytes!("../../assets/fonts/cmu-sans-serif/CMUSansSerif-Medium.ttf").to_vec(), 140 + ); 141 + db.load_font_data( 142 + include_bytes!("../../assets/fonts/cmu-sans-serif/CMUSansSerif-Bold.ttf").to_vec(), 143 + ); 144 + // Load Adobe Caslon Pro for body text 145 + db.load_font_data( 146 + include_bytes!("../../assets/fonts/adobe-caslon/AdobeCaslonPro-Regular.ttf").to_vec(), 147 + ); 148 + db.load_font_data( 149 + include_bytes!("../../assets/fonts/adobe-caslon/AdobeCaslonPro-Bold.ttf").to_vec(), 150 + ); 151 + // Load Ioskeley Mono for branding/handles 152 + db.load_font_data( 153 + include_bytes!("../../assets/fonts/ioskeley-mono/IoskeleyMono-Regular.ttf").to_vec(), 154 + ); 145 155 db 146 156 }) 147 157 }
+5 -5
crates/weaver-app/templates/og_hero_image.svg
··· 5 5 <!-- Bottom panel with theme colors --> 6 6 <rect x="0" y="420" width="1200" height="210" fill="#191724"/> 7 7 8 - <!-- Title --> 8 + <!-- Title - CMU Sans Serif --> 9 9 {% for line in title_lines %} 10 - <text x="60" y="{{ 480 + loop.index0 * 56 }}" fill="#c4a7e7" font-family="IBM Plex Sans, sans-serif" font-size="52" font-weight="900">{{ line }}</text> 10 + <text x="60" y="{{ 472 + loop.index0 * 56 }}" fill="#c4a7e7" font-family="CMU Sans Serif, sans-serif" font-size="52" font-weight="bold">{{ line }}</text> 11 11 {% endfor %} 12 12 13 - <!-- Notebook + Author row --> 14 - <text x="60" y="600" fill="#ebbcba" font-family="Ioskeley Mono, monospace" font-size="32">{{ notebook_title }} · @{{ author_handle }}</text> 13 + <!-- Notebook + Author row - flows after title --> 14 + <text x="60" y="{{ 472 + (title_lines.len() - 1) * 56 + 52 }}" fill="#ebbcba" font-family="Ioskeley Mono, monospace" font-size="32">{{ notebook_title }} · @{{ author_handle }}</text> 15 15 16 16 <!-- Weaver branding --> 17 - <text x="1080" y="600" fill="#908caa" font-family="Ioskeley Mono, monospace" font-size="20">weaver.sh</text> 17 + <text x="1060" y="600" fill="#908caa" font-family="Ioskeley Mono, monospace" font-size="24">weaver.sh</text> 18 18 </svg>
+7 -7
crates/weaver-app/templates/og_notebook.svg
··· 2 2 <!-- Background --> 3 3 <rect width="1200" height="630" fill="#191724"/> 4 4 5 - <!-- Notebook title (large, wrapped) --> 5 + <!-- Notebook title (large, wrapped) - CMU Sans Serif --> 6 6 {% for line in title_lines %} 7 - <text x="60" y="{{ 120 + loop.index0 * 64 }}" fill="#c4a7e7" font-family="IBM Plex Sans, sans-serif" font-size="64" font-weight="800">{{ line }}</text> 7 + <text x="60" y="{{ 120 + loop.index0 * 68 }}" fill="#c4a7e7" font-family="CMU Sans Serif, sans-serif" font-size="60" font-weight="bold">{{ line }}</text> 8 8 {% endfor %} 9 9 10 - <!-- Author + entry count --> 11 - <text x="60" y="280" fill="#ebbcba" font-family="Ioskeley Mono, monospace" font-size="32">@{{ author_handle }} · {{ entry_count }} {% if entry_count == 1 %}entry{% else %}entries{% endif %}</text> 10 + <!-- Author + entry count - flows after title --> 11 + <text x="60" y="{{ 120 + (title_lines.len() - 1) * 68 + 60 }}" fill="#ebbcba" font-family="Ioskeley Mono, monospace" font-size="32">@{{ author_handle }} · {{ entry_count }} {% if entry_count == 1 %}entry{% else %}entries{% endif %}</text> 12 12 13 - <!-- Entry titles list --> 13 + <!-- Entry titles list - Adobe Caslon Pro --> 14 14 {% for entry_title in entry_titles %} 15 - <text x="60" y="{{ 360 + loop.index0 * 44 }}" fill="#e0def4" font-family="IBM Plex Sans, sans-serif" font-size="28">{{ entry_title }}</text> 15 + <text x="60" y="{{ 120 + (title_lines.len() - 1) * 68 + 60 + 60 + loop.index0 * 46 }}" fill="#e0def4" font-family="Adobe Caslon Pro, Georgia, serif" font-size="30">{{ entry_title }}</text> 16 16 {% endfor %} 17 17 18 18 <!-- Weaver branding --> 19 - <text x="60" y="590" fill="#908caa" font-family="Ioskeley Mono, monospace" font-size="24">weaver.sh</text> 19 + <text x="60" y="590" fill="#908caa" font-family="Ioskeley Mono, monospace" font-size="28">weaver.sh</text> 20 20 </svg>
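Note on the layout change above: the author row and the entry list no longer sit at fixed `y` values. They are now offset from the last title line. The arithmetic in the template can be sketched in Rust as below. The constants are copied from the new `og_notebook.svg`; the function and constant names are hypothetical and used only for illustration. The `saturating_sub` guard is an addition of this sketch; the template itself computes `title_lines.len() - 1` and so appears to assume at least one title line.

```rust
// Constants copied from the updated og_notebook.svg template.
const TITLE_TOP: u32 = 120; // baseline of the first title line
const TITLE_LINE_HEIGHT: u32 = 68; // distance between title baselines
const AUTHOR_GAP: u32 = 60; // last title baseline -> author row
const ENTRIES_GAP: u32 = 60; // author row -> first entry title
const ENTRY_LINE_HEIGHT: u32 = 46; // distance between entry baselines

/// Baseline of the title line at `line_index` (zero-based, like `loop.index0`).
pub fn title_y(line_index: u32) -> u32 {
    TITLE_TOP + line_index * TITLE_LINE_HEIGHT
}

/// Baseline of the author row, placed after the last title line.
/// `saturating_sub` is a guard added in this sketch for an empty title.
pub fn author_y(title_line_count: u32) -> u32 {
    TITLE_TOP + title_line_count.saturating_sub(1) * TITLE_LINE_HEIGHT + AUTHOR_GAP
}

/// Baseline of the entry title at `entry_index` (zero-based).
pub fn entry_y(title_line_count: u32, entry_index: u32) -> u32 {
    author_y(title_line_count) + ENTRIES_GAP + entry_index * ENTRY_LINE_HEIGHT
}
```

With a one-line title the author row lands at 180 and the first entry at 240; a second title line pushes both down by 68. The previous template pinned these rows at 280 and 360 regardless of title length.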
+8 -8
crates/weaver-app/templates/og_profile.svg
··· 12 12 <image xlink:href="{{ avatar_data.as_ref().unwrap() }}" x="940" y="100" width="200" height="200" clip-path="url(#avatar-clip)" preserveAspectRatio="xMidYMid slice"/> 13 13 {% endif %} 14 14 15 - <!-- Display name (large) --> 15 + <!-- Display name (large) - CMU Sans Serif --> 16 16 {% for line in display_name_lines %} 17 - <text x="60" y="{{ 180 + loop.index0 * 64 }}" fill="#c4a7e7" font-family="IBM Plex Sans, sans-serif" font-size="64" font-weight="900">{{ line }}</text> 17 + <text x="60" y="{{ 160 + loop.index0 * 68 }}" fill="#c4a7e7" font-family="CMU Sans Serif, sans-serif" font-size="60" font-weight="bold">{{ line }}</text> 18 18 {% endfor %} 19 19 20 - <!-- Handle --> 21 - <text x="60" y="260" fill="#ebbcba" font-family="Ioskeley Mono, monospace" font-size="36">@{{ handle }}</text> 20 + <!-- Handle - flows after display name --> 21 + <text x="60" y="{{ 160 + (display_name_lines.len() - 1) * 68 + 56 }}" fill="#ebbcba" font-family="Ioskeley Mono, monospace" font-size="36">@{{ handle }}</text> 22 22 23 - <!-- Bio snippet --> 23 + <!-- Bio snippet - Adobe Caslon Pro --> 24 24 {% for line in bio_lines %} 25 - <text x="60" y="{{ 340 + loop.index0 * 44 }}" fill="#e0def4" font-family="IBM Plex Sans, sans-serif" font-size="32" font-weight="400">{{ line }}</text> 25 + <text x="60" y="{{ 160 + (display_name_lines.len() - 1) * 68 + 56 + 60 + loop.index0 * 44 }}" fill="#e0def4" font-family="Adobe Caslon Pro, Georgia, serif" font-size="32">{{ line }}</text> 26 26 {% endfor %} 27 27 28 28 <!-- Notebook count --> 29 - <text x="60" y="540" fill="#908caa" font-family="IBM Plex Sans, sans-serif" font-size="32" font-weight="400">{{ notebook_count }} {% if notebook_count == 1 %}notebook{% else %}notebooks{% endif %}</text> 29 + <text x="60" y="540" fill="#908caa" font-family="CMU Sans Serif, sans-serif" font-size="32">{{ notebook_count }} {% if notebook_count == 1 %}notebook{% else %}notebooks{% endif %}</text> 30 30 31 31 <!-- Weaver branding --> 32 - <text x="60" 
y="600" fill="#908caa" font-family="Ioskeley Mono, monospace" font-size="24">weaver.sh</text> 32 + <text x="60" y="600" fill="#908caa" font-family="Ioskeley Mono, monospace" font-size="28">weaver.sh</text> 33 33 </svg>
+7 -7
crates/weaver-app/templates/og_profile_banner.svg
··· 15 15 <image xlink:href="{{ avatar_data.as_ref().unwrap() }}" x="940" y="215" width="200" height="200" clip-path="url(#avatar-clip)" preserveAspectRatio="xMidYMid slice"/> 16 16 {% endif %} 17 17 18 - <!-- Display name --> 18 + <!-- Display name - CMU Sans Serif --> 19 19 {% for line in display_name_lines %} 20 - <text x="60" y="{{ 400 + loop.index0 * 64 }}" fill="#c4a7e7" font-family="IBM Plex Sans, sans-serif" font-size="64" font-weight="900">{{ line }}</text> 20 + <text x="60" y="{{ 390 + loop.index0 * 60 }}" fill="#c4a7e7" font-family="CMU Sans Serif, sans-serif" font-size="56" font-weight="bold">{{ line }}</text> 21 21 {% endfor %} 22 22 23 - <!-- Handle + notebook count --> 24 - <text x="60" y="450" fill="#ebbcba" font-family="Ioskeley Mono, monospace" font-size="36">@{{ handle }} · {{ notebook_count }} {% if notebook_count == 1 %}notebook{% else %}notebooks{% endif %}</text> 23 + <!-- Handle + notebook count - flows after display name --> 24 + <text x="60" y="{{ 390 + (display_name_lines.len() - 1) * 60 + 52 }}" fill="#ebbcba" font-family="Ioskeley Mono, monospace" font-size="32">@{{ handle }} · {{ notebook_count }} {% if notebook_count == 1 %}notebook{% else %}notebooks{% endif %}</text> 25 25 26 - <!-- Bio snippet --> 26 + <!-- Bio snippet - Adobe Caslon Pro --> 27 27 {% if bio_lines.first().is_some() %} 28 - <text x="60" y="500" fill="#e0def4" font-family="IBM Plex Sans, sans-serif" font-size="28" font-weight="400">{{ bio_lines.first().unwrap() }}</text> 28 + <text x="60" y="{{ 390 + (display_name_lines.len() - 1) * 60 + 52 + 48 }}" fill="#e0def4" font-family="Adobe Caslon Pro, Georgia, serif" font-size="28">{{ bio_lines.first().unwrap() }}</text> 29 29 {% endif %} 30 30 31 31 <!-- Weaver branding --> 32 - <text x="60" y="600" fill="#908caa" font-family="Ioskeley Mono, monospace" font-size="24">weaver.sh</text> 32 + <text x="60" y="600" fill="#908caa" font-family="Ioskeley Mono, monospace" font-size="28">weaver.sh</text> 33 33 </svg>
+3 -3
crates/weaver-app/templates/og_site.svg
··· 3 3 <rect width="1200" height="630" fill="#191724"/> 4 4 5 5 <!-- Weaver title --> 6 - <text x="60" y="280" fill="#c4a7e7" font-family="Ioskeley Mono, monospace" font-size="120" font-weight="700">Weaver</text> 6 + <text x="60" y="280" fill="#c4a7e7" font-family="Ioskeley Mono, monospace" font-size="126" font-weight="bold">Weaver</text> 7 7 8 8 <!-- Tagline --> 9 - <text x="60" y="380" fill="#ebbcba" font-family="Ioskeley Mono, monospace" font-size="40">Share your words, your way.</text> 9 + <text x="60" y="380" fill="#ebbcba" font-family="Ioskeley Mono, monospace" font-size="44">Share your words, your way.</text> 10 10 11 11 <!-- Branding --> 12 - <text x="60" y="590" fill="#908caa" font-family="Ioskeley Mono, monospace" font-size="24">weaver.sh</text> 12 + <text x="60" y="590" fill="#908caa" font-family="Ioskeley Mono, monospace" font-size="28">weaver.sh</text> 13 13 </svg>
+7 -9
crates/weaver-app/templates/og_text_only.svg
··· 2 2 <!-- Background --> 3 3 <rect width="1200" height="630" fill="#191724"/> 4 4 5 - <!-- Entry title (large, wrapped) --> 5 + <!-- Entry title (large, wrapped) - CMU Sans Serif --> 6 6 {% for line in title_lines %} 7 - <text x="60" y="{{ 140 + loop.index0 * 56 }}" fill="#c4a7e7" font-family="IBM Plex Sans, sans-serif" font-size="64" font-weight="800">{{ line }}</text> 7 + <text x="60" y="{{ 120 + loop.index0 * 68 }}" fill="#c4a7e7" font-family="CMU Sans Serif, sans-serif" font-size="60" font-weight="bold">{{ line }}</text> 8 8 {% endfor %} 9 9 10 - <!-- Notebook title + Author --> 11 - <text x="60" y="320" fill="#ebbcba" font-family="Ioskeley Mono, monospace" font-size="32">{{ notebook_title }} · @{{ author_handle }}</text> 10 + <!-- Notebook title + Author - flows after title --> 11 + <text x="60" y="{{ 120 + (title_lines.len() - 1) * 68 + 60 }}" fill="#ebbcba" font-family="Ioskeley Mono, monospace" font-size="32">{{ notebook_title }} · @{{ author_handle }}</text> 12 12 13 - <!-- Content snippet --> 13 + <!-- Content snippet - Adobe Caslon Pro --> 14 14 {% for line in content_lines %} 15 - <text x="60" y="{{ 380 + loop.index0 * 36 }}" fill="#e0def4" font-family="IBM Plex Sans, sans-serif" font-size="28">{{ line }}</text> 15 + <text x="60" y="{{ 120 + (title_lines.len() - 1) * 68 + 60 + 56 + loop.index0 * 40 }}" fill="#e0def4" font-family="Adobe Caslon Pro, Georgia, serif" font-size="30">{{ line }}</text> 16 16 {% endfor %} 17 17 18 - 19 - 20 18 <!-- Weaver branding --> 21 - <text x="60" y="590" fill="#908caa" font-family="Ioskeley Mono, monospace" font-size="24">weaver.sh</text> 19 + <text x="60" y="590" fill="#908caa" font-family="Ioskeley Mono, monospace" font-size="28">weaver.sh</text> 22 20 </svg>
+35 -2
crates/weaver-common/src/agent.rs
··· 7 7 // Re-export jacquard for convenience 8 8 use crate::constellation::{GetBacklinksQuery, RecordId}; 9 9 use crate::error::WeaverError; 10 + #[allow(unused_imports)] 11 + use crate::{PublishResult, W_TICKER, normalize_title_path}; 10 12 pub use jacquard; 11 13 use jacquard::bytes::Bytes; 14 + #[allow(unused_imports)] 12 15 use jacquard::client::{AgentError, AgentErrorKind, AgentSession, AgentSessionExt}; 13 16 use jacquard::error::ClientError; 14 17 use jacquard::prelude::*; 15 18 use jacquard::smol_str::SmolStr; 16 19 use jacquard::types::blob::{BlobRef, MimeType}; 17 20 use jacquard::types::string::{AtUri, Did, RecordKey, Rkey}; 21 + #[allow(unused_imports)] 18 22 use jacquard::types::tid::Tid; 19 23 use jacquard::types::uri::Uri; 20 24 use jacquard::url::Url; 21 25 use jacquard::{CowStr, IntoStatic}; 22 26 use mime_sniffer::MimeTypeSniffer; 27 + #[allow(unused_imports)] 23 28 use std::path::Path; 24 29 use weaver_api::com_atproto::repo::strong_ref::StrongRef; 25 30 use weaver_api::sh_weaver::notebook::entry; 26 31 use weaver_api::sh_weaver::publish::blob::Blob as PublishedBlob; 27 - 28 - use crate::{PublishResult, W_TICKER, normalize_title_path}; 29 32 30 33 const CONSTELLATION_URL: &str = "https://constellation.microcosm.blue"; 31 34 ··· 473 476 } 474 477 475 478 /// Fetch an entry and construct EntryView 479 + #[cfg(feature = "use-index")] 480 + fn fetch_entry_view<'a>( 481 + &self, 482 + _notebook: &NotebookView<'a>, 483 + entry_ref: &StrongRef<'_>, 484 + ) -> impl Future<Output = Result<EntryView<'a>, WeaverError>> 485 + where 486 + Self: Sized, 487 + { 488 + async move { 489 + use weaver_api::sh_weaver::notebook::get_entry::GetEntry; 490 + 491 + let resp = self 492 + .send(GetEntry::new().uri(entry_ref.uri.clone()).build()) 493 + .await 494 + .map_err(|e| AgentError::from(ClientError::from(e)))?; 495 + 496 + let output = resp.into_output().map_err(|e| { 497 + AgentError::from(ClientError::invalid_request(format!( 498 + "Failed to get entry: {}", 499 + 
e 500 + ))) 501 + })?; 502 + 503 + Ok(output.value.into_static()) 504 + } 505 + } 506 + 507 + /// Fetch an entry and construct EntryView 508 + #[cfg(not(feature = "use-index"))] 476 509 fn fetch_entry_view<'a>( 477 510 &self, 478 511 notebook: &NotebookView<'a>,
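Note on the hunk above: `fetch_entry_view` now has two definitions, selected at compile time by the `use-index` Cargo feature, so exactly one of them ends up in the build. A minimal sketch of that pattern follows. The function name and the returned labels are hypothetical stand-ins; the real methods are async and return an `EntryView`, and the body of the non-index variant is not visible in this diff.

```rust
/// Compiled only when the crate is built with `--features use-index`.
/// Stands in for the variant that asks the indexer for the entry.
#[cfg(feature = "use-index")]
pub fn entry_view_source() -> &'static str {
    "index"
}

/// Compiled when the `use-index` feature is off.
/// Stands in for the pre-existing variant (label is illustrative only).
#[cfg(not(feature = "use-index"))]
pub fn entry_view_source() -> &'static str {
    "direct"
}
```

Because the two `cfg` predicates are exact complements, callers can use `entry_view_source()` without any feature checks of their own. This is presumably also why several imports in this file gained `#[allow(unused_imports)]`: an import used by only one variant would otherwise warn in the other configuration.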
+1 -1
crates/weaver-index/Cargo.toml
··· 25 25 26 26 # AT Protocol / Jacquard 27 27 jacquard = { workspace = true, features = ["websocket", "zstd", "dns", "cache"] } 28 - jacquard-common = { workspace = true, features = ["crypto-k256"] } 28 + jacquard-common = { workspace = true, features = ["service-auth","crypto-ed25519"] } 29 29 jacquard-repo = { workspace = true } 30 30 jacquard-axum = { workspace = true } 31 31
+3 -24
crates/weaver-index/migrations/clickhouse/001_raw_records.sql
··· 1 1 -- Raw records from firehose/jetstream 2 2 -- Core table for all AT Protocol records before denormalization 3 - -- 4 - -- Append-only log using plain MergeTree - all versions preserved for audit/rollback. 5 - -- Query-time deduplication via ORDER BY + LIMIT or window functions. 6 - -- JSON column stores full record, extract fields only when needed for ORDER BY/WHERE/JOINs 7 3 8 4 CREATE TABLE IF NOT EXISTS raw_records ( 9 5 -- Decomposed AT URI components (at://did/collection/rkey) 10 6 did String, 11 7 collection LowCardinality(String), 12 8 rkey String, 13 - 14 - -- Content identifier from the record (content-addressed hash) 15 9 cid String, 16 - 17 10 -- Repository revision (TID) - monotonically increasing per DID, used for ordering 18 11 rev String, 19 - 20 - -- Full record as native JSON (schema-flexible, queryable with record.field.subfield) 21 12 record JSON, 22 - 23 - -- Operation: 'create', 'update', 'delete', 'cache' (fetched on-demand) 13 + -- Operation: 'create', 'update', 'delete', ('cache' - fetched on-demand) 24 14 operation LowCardinality(String), 25 - 26 - -- Firehose sequence number (metadata only, not for ordering - can jump on relay restart) 15 + -- Firehose sequence number 27 16 seq UInt64, 28 - 29 17 -- Event timestamp from firehose 30 18 event_time DateTime64(3), 31 - 32 - -- When we indexed this record 19 + -- When the database indexed this record 33 20 indexed_at DateTime64(3) DEFAULT now64(3), 34 - 35 21 -- Validation state: 'unchecked', 'valid', 'invalid_rev', 'invalid_gap', 'invalid_account' 36 - -- Populated by async batch validation, not in hot path 37 22 validation_state LowCardinality(String) DEFAULT 'unchecked', 38 - 39 23 -- Whether this came from live firehose (true) or backfill (false) 40 - -- Backfill events may not reflect current state until repo is fully synced 41 24 is_live Bool DEFAULT true, 42 - 43 25 -- Materialized AT URI for convenience 44 26 uri String MATERIALIZED concat('at://', did, '/', collection, '/', 
rkey), 45 - 46 27 -- Projection for fast delete lookups by (did, cid) 47 - -- Delete events include CID, so we can O(1) lookup the original record 48 - -- to know what to decrement (e.g., which notebook's like count) 49 28 PROJECTION by_did_cid ( 50 29 SELECT * ORDER BY (did, cid) 51 30 )
+1 -1
crates/weaver-index/migrations/clickhouse/002_identity_events.sql
··· 5 5 -- The DID this identity event is about 6 6 did String, 7 7 8 - -- Handle (may be empty if cleared) 8 + -- Handle (may be empty) 9 9 handle String, 10 10 11 11 -- Sequence number from firehose
-1
crates/weaver-index/migrations/clickhouse/003_account_events.sql
··· 1 1 -- Account events from firehose (#account messages) 2 - -- Tracks account status changes: active, deactivated, deleted, suspended, takendown 3 2 4 3 CREATE TABLE IF NOT EXISTS raw_account_events ( 5 4 -- The DID this account event is about
-1
crates/weaver-index/migrations/clickhouse/005_firehose_cursor.sql
··· 2 2 -- Tracks our position in the firehose stream for resumption after restart 3 3 4 4 CREATE TABLE IF NOT EXISTS firehose_cursor ( 5 - -- Consumer identifier (allows multiple consumers with different cursors) 6 5 consumer_id String, 7 6 8 7 -- Last successfully processed sequence number
-3
crates/weaver-index/migrations/clickhouse/006_account_rev_state.sql
··· 1 1 -- Per-account revision state tracking 2 2 -- Maintains latest rev/cid per DID for dedup and gap detection 3 - -- 4 - -- AggregatingMergeTree with incremental MV from raw_records 5 - -- Query with argMaxMerge/maxMerge to finalize aggregates 6 3 7 4 CREATE TABLE IF NOT EXISTS account_rev_state ( 8 5 -- Account DID
+1 -2
crates/weaver-index/migrations/clickhouse/007_account_rev_state_mv.sql
··· 1 - -- Incremental MV: fires on each insert to raw_records, maintains aggregate state 2 - -- Must be created after both account_rev_state (target) and raw_records (source) exist 1 + 3 2 4 3 CREATE MATERIALIZED VIEW IF NOT EXISTS account_rev_state_mv TO account_rev_state AS 5 4 SELECT
-2
crates/weaver-index/migrations/clickhouse/010_handle_mappings_account_mv.sql
··· 1 1 -- Auto-populate freed status from account events 2 - -- JOINs against handle_mappings to find current handle for the DID 3 - -- If no mapping exists yet, the JOIN fails silently (can't free unknown handles) 4 2 5 3 CREATE MATERIALIZED VIEW IF NOT EXISTS handle_mappings_from_account_mv TO handle_mappings AS 6 4 SELECT
-3
crates/weaver-index/migrations/clickhouse/011_profiles_weaver.sql
··· 1 1 -- Weaver profile source table 2 - -- Populated by MV from raw_records, merged into profiles by refreshable MV 3 2 4 3 CREATE TABLE IF NOT EXISTS profiles_weaver ( 5 4 did String, 6 - 7 - -- Raw profile JSON 8 5 profile String, 9 6 10 7 -- Extracted fields for coalescing
+1 -1
crates/weaver-index/migrations/clickhouse/012_profiles_weaver_mv.sql
··· 3 3 CREATE MATERIALIZED VIEW IF NOT EXISTS profiles_weaver_mv TO profiles_weaver AS 4 4 SELECT 5 5 did, 6 - toString(record) as profile, 6 + record as profile, 7 7 coalesce(record.displayName, '') as display_name, 8 8 coalesce(record.description, '') as description, 9 9 coalesce(record.avatar.ref.`$link`, '') as avatar_cid,
-2
crates/weaver-index/migrations/clickhouse/013_profiles_bsky.sql
··· 1 1 -- Bluesky profile source table 2 - -- Populated by MV from raw_records, merged into profiles by refreshable MV 3 2 4 3 CREATE TABLE IF NOT EXISTS profiles_bsky ( 5 4 did String, 6 5 7 - -- Raw profile JSON 8 6 profile String, 9 7 10 8 -- Extracted fields for coalescing
+1 -1
crates/weaver-index/migrations/clickhouse/014_profiles_bsky_mv.sql
··· 3 3 CREATE MATERIALIZED VIEW IF NOT EXISTS profiles_bsky_mv TO profiles_bsky AS 4 4 SELECT 5 5 did, 6 - toString(record) as profile, 6 + record as profile, 7 7 coalesce(record.displayName, '') as display_name, 8 8 coalesce(record.description, '') as description, 9 9 coalesce(record.avatar.ref.`$link`, '') as avatar_cid,
-4
crates/weaver-index/migrations/clickhouse/015_profiles.sql
··· 1 1 -- Unified profiles view 2 - -- Refreshable MV that merges weaver + bsky profiles with handle resolution 3 - -- Queries are pure reads, no merge computation needed 4 - 5 2 CREATE MATERIALIZED VIEW IF NOT EXISTS profiles 6 3 REFRESH EVERY 1 MINUTE 7 4 ENGINE = ReplacingMergeTree(indexed_at) ··· 9 6 AS SELECT 10 7 if(w.did != '', w.did, b.did) as did, 11 8 12 - -- Handle from handle_mappings (empty if not resolved yet) 13 9 coalesce(h.handle, '') as handle, 14 10 15 11 -- Raw profiles per source
-2
crates/weaver-index/migrations/clickhouse/017_notebooks.sql
··· 25 25 26 26 -- Soft delete (epoch = not deleted) 27 27 deleted_at DateTime64(3) DEFAULT toDateTime64(0, 3), 28 - 29 - -- Full record JSON for hydration 30 28 record JSON DEFAULT '{}' 31 29 ) 32 30 ENGINE = ReplacingMergeTree(indexed_at)
+1 -3
crates/weaver-index/migrations/clickhouse/019_notebook_counts.sql
··· 1 - -- Notebook engagement counts 2 - -- Updated by MVs from likes, bookmarks, subscriptions (added later with graph tables) 3 - -- Joined with notebooks at query time 1 + -- Notebook engagement counts table stub 4 2 5 3 CREATE TABLE IF NOT EXISTS notebook_counts ( 6 4 did String,
-2
crates/weaver-index/migrations/clickhouse/020_entries.sql
··· 24 24 25 25 -- Soft delete (epoch = not deleted) 26 26 deleted_at DateTime64(3) DEFAULT toDateTime64(0, 3), 27 - 28 - -- Full record JSON for hydration 29 27 record JSON DEFAULT '{}' 30 28 ) 31 29 ENGINE = ReplacingMergeTree(indexed_at)
+1 -3
crates/weaver-index/migrations/clickhouse/022_entry_counts.sql
··· 1 - -- Entry engagement counts 2 - -- Updated by MVs from likes, bookmarks (added later with graph tables) 3 - -- Joined with entries at query time 1 + -- Entry engagement counts table stub 4 2 5 3 CREATE TABLE IF NOT EXISTS entry_counts ( 6 4 did String,
-1
crates/weaver-index/migrations/clickhouse/023_drafts.sql
··· 1 1 -- Draft stub records 2 - -- Anchors for unpublished content, enables draft discovery via queries 3 2 4 3 CREATE TABLE IF NOT EXISTS drafts ( 5 4 -- Identity
+2 -4
crates/weaver-index/migrations/clickhouse/025_edit_nodes.sql
··· 16 16 node_type LowCardinality(String), -- 'root' or 'diff' 17 17 18 18 -- Resource being edited (extracted from doc.value) 19 - -- One of these will be populated depending on doc type 20 19 resource_type LowCardinality(String) DEFAULT '', -- 'entry', 'notebook', 'draft' 21 20 resource_did String DEFAULT '', 22 21 resource_rkey String DEFAULT '', 23 22 resource_collection LowCardinality(String) DEFAULT '', 24 23 25 - -- For diffs: reference to root (StrongRef) 24 + -- For diffs: reference to root 26 25 root_did String DEFAULT '', 27 26 root_rkey String DEFAULT '', 28 27 root_cid String DEFAULT '', 29 28 30 - -- For diffs: reference to previous node (StrongRef) 29 + -- For diffs: reference to previous node 31 30 prev_did String DEFAULT '', 32 31 prev_rkey String DEFAULT '', 33 32 prev_cid String DEFAULT '', 34 33 35 - -- Whether this has inline diff data vs blob snapshot 36 34 has_inline_diff UInt8 DEFAULT 0, 37 35 has_snapshot UInt8 DEFAULT 0, 38 36
+5 -3
crates/weaver-index/migrations/clickhouse/026_edit_roots_mv.sql
··· 16 16 '' 17 17 ) as resource_type, 18 18 19 - -- Extract resource DID (parse from URI or empty) 19 + -- Extract resource DID 20 20 multiIf( 21 21 toString(record.doc.value.entry.uri) != '', 22 22 splitByChar('/', replaceOne(toString(record.doc.value.entry.uri), 'at://', ''))[1], 23 23 toString(record.doc.value.notebook.uri) != '', 24 24 splitByChar('/', replaceOne(toString(record.doc.value.notebook.uri), 'at://', ''))[1], 25 + toString(record.doc.value.draftKey) != '', 26 + splitByChar('/', replaceOne(toString(record.doc.value.draftKey), 'at://', ''))[1], 25 27 '' 26 28 ) as resource_did, 27 29 28 - -- Extract resource rkey (parse from URI or use draftKey) 30 + -- Extract resource rkey 29 31 multiIf( 30 32 toString(record.doc.value.entry.uri) != '', 31 33 splitByChar('/', replaceOne(toString(record.doc.value.entry.uri), 'at://', ''))[3], 32 34 toString(record.doc.value.notebook.uri) != '', 33 35 splitByChar('/', replaceOne(toString(record.doc.value.notebook.uri), 'at://', ''))[3], 34 36 toString(record.doc.value.draftKey) != '', 35 - toString(record.doc.value.draftKey), 37 + splitByChar('/', replaceOne(toString(record.doc.value.draftKey), 'at://', ''))[3], 36 38 '' 37 39 ) as resource_rkey, 38 40
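Note on the hunk above: `draftKey` is now treated like the entry and notebook URIs, that is, stripped of `at://`, split on `/`, with segment 1 used as the DID and segment 3 as the rkey (ClickHouse arrays are 1-indexed). The same decomposition can be sketched in Rust as below. The type and function names are hypothetical. This sketch is also stricter than the SQL: it requires the `at://` prefix and rejects empty segments, whereas `replaceOne` plus array indexing yields empty strings instead of failing.

```rust
/// Components of an AT URI of the form `at://<did>/<collection>/<rkey>`.
#[derive(Debug, PartialEq, Eq)]
pub struct AtUriParts<'a> {
    pub did: &'a str,
    pub collection: &'a str,
    pub rkey: &'a str,
}

/// Strip the `at://` scheme, split on '/', and take the first three segments.
/// Returns `None` if the scheme is missing or a segment is absent or empty.
/// Any segments after the third are ignored, as in the SQL.
pub fn split_at_uri(uri: &str) -> Option<AtUriParts<'_>> {
    let rest = uri.strip_prefix("at://")?;
    let mut segments = rest.split('/');
    let did = segments.next().filter(|s| !s.is_empty())?;
    let collection = segments.next().filter(|s| !s.is_empty())?;
    let rkey = segments.next().filter(|s| !s.is_empty())?;
    Some(AtUriParts { did, collection, rkey })
}
```

One consequence worth checking in review: if any existing `draftKey` values are bare keys rather than AT URIs, the new SQL expression would produce an empty `resource_rkey` for them, since a bare key has no third segment.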
+3 -1
crates/weaver-index/migrations/clickhouse/027_edit_diffs_mv.sql
··· 22 22 splitByChar('/', replaceOne(toString(record.doc.value.entry.uri), 'at://', ''))[1], 23 23 toString(record.doc.value.notebook.uri) != '', 24 24 splitByChar('/', replaceOne(toString(record.doc.value.notebook.uri), 'at://', ''))[1], 25 + toString(record.doc.value.draftKey) != '', 26 + splitByChar('/', replaceOne(toString(record.doc.value.draftKey), 'at://', ''))[1], 25 27 '' 26 28 ) as resource_did, 27 29 ··· 32 34 toString(record.doc.value.notebook.uri) != '', 33 35 splitByChar('/', replaceOne(toString(record.doc.value.notebook.uri), 'at://', ''))[3], 34 36 toString(record.doc.value.draftKey) != '', 35 - toString(record.doc.value.draftKey), 37 + splitByChar('/', replaceOne(toString(record.doc.value.draftKey), 'at://', ''))[3], 36 38 '' 37 39 ) as resource_rkey, 38 40
-1
crates/weaver-index/migrations/clickhouse/038_notebook_entries.sql
··· 1 1 -- Notebook entries mapping (denormalized for reverse lookup) 2 2 -- Maps entries to the notebooks that contain them 3 - -- Enables reverse lookup: find notebooks containing an entry 4 3 5 4 CREATE TABLE IF NOT EXISTS notebook_entries ( 6 5 -- Entry being referenced
-5
crates/weaver-index/migrations/clickhouse/039_notebook_entries_mv.sql
··· 1 1 -- Populate notebook_entries from notebooks 2 - -- Extracts entry references from the entryList in notebook records 3 - -- Incremental MV: triggers on INSERT to notebooks, writes to notebook_entries 4 2 5 3 CREATE MATERIALIZED VIEW IF NOT EXISTS notebook_entries_mv 6 4 TO notebook_entries 7 5 AS 8 6 SELECT 9 - -- Parse entry URI to extract did and rkey 10 - -- URI format: at://did:plc:xxx/sh.weaver.notebook.entry/rkey 11 - -- assumeNotNull is safe here because WHERE filters guarantee non-null 12 7 assumeNotNull(extract(entry_uri, 'at://([^/]+)/')) as entry_did, 13 8 assumeNotNull(extract(entry_uri, '/sh\\.weaver\\.notebook\\.entry/([^/]+)$')) as entry_rkey, 14 9
+43 -50
crates/weaver-index/src/clickhouse/queries/notebooks.rs
··· 133 133 Ok(row) 134 134 } 135 135 136 - /// List entries for a notebook's author (did). 136 + /// List entries for a specific notebook, ordered by position in the notebook. 137 137 /// 138 - /// Note: This is a simplified version. The full implementation would 139 - /// need to join with notebook's entryList to get proper ordering. 140 - /// For now, we just list entries by the same author, ordered by rkey (notebook order). 138 + /// Uses notebook_entries table to get entries that belong to this notebook. 141 139 pub async fn list_notebook_entries( 142 140 &self, 143 - did: &str, 141 + notebook_did: &str, 142 + notebook_rkey: &str, 144 143 limit: u32, 145 - cursor: Option<&str>, 144 + cursor: Option<u32>, 146 145 ) -> Result<Vec<EntryRow>, IndexError> { 147 - // Note: rkey ordering is intentional here - it's the notebook's entry order 148 - let query = if cursor.is_some() { 149 - r#" 150 - SELECT did, rkey, cid, uri, title, path, tags, author_dids, created_at, indexed_at, record 151 - FROM ( 152 - SELECT did, rkey, cid, uri, title, path, tags, author_dids, created_at, updated_at, indexed_at, record, 153 - ROW_NUMBER() OVER (PARTITION BY rkey ORDER BY updated_at DESC) as rn 154 - FROM entries FINAL 155 - WHERE did = ? 156 - AND deleted_at = toDateTime64(0, 3) 157 - AND rkey > ? 158 - ) 159 - WHERE rn = 1 160 - ORDER BY rkey ASC 161 - LIMIT ? 162 - "# 163 - } else { 164 - r#" 165 - SELECT did, rkey, cid, uri, title, path, tags, author_dids, created_at, indexed_at, record 166 - FROM ( 167 - SELECT did, rkey, cid, uri, title, path, tags, author_dids, created_at, updated_at, indexed_at, record, 168 - ROW_NUMBER() OVER (PARTITION BY rkey ORDER BY updated_at DESC) as rn 169 - FROM entries FINAL 170 - WHERE did = ? 171 - AND deleted_at = toDateTime64(0, 3) 172 - ) 173 - WHERE rn = 1 174 - ORDER BY rkey ASC 175 - LIMIT ? 
176 - "# 177 - }; 146 + let query = r#" 147 + SELECT 148 + e.did AS did, 149 + e.rkey AS rkey, 150 + e.cid AS cid, 151 + e.uri AS uri, 152 + e.title AS title, 153 + e.path AS path, 154 + e.tags AS tags, 155 + e.author_dids AS author_dids, 156 + e.created_at AS created_at, 157 + e.indexed_at AS indexed_at, 158 + e.record AS record 159 + FROM notebook_entries ne FINAL 160 + INNER JOIN entries e ON 161 + e.did = ne.entry_did 162 + AND e.rkey = ne.entry_rkey 163 + AND e.deleted_at = toDateTime64(0, 3) 164 + WHERE ne.notebook_did = ? 165 + AND ne.notebook_rkey = ? 166 + AND ne.position > ? 167 + ORDER BY ne.position ASC 168 + LIMIT ? 169 + "#; 178 170 179 - let mut q = self.inner().query(query).bind(did); 180 - 181 - if let Some(c) = cursor { 182 - q = q.bind(c); 183 - } 171 + let cursor_val = cursor.unwrap_or(0); 184 172 185 - let rows = 186 - q.bind(limit) 187 - .fetch_all::<EntryRow>() 188 - .await 189 - .map_err(|e| ClickHouseError::Query { 190 - message: "failed to list notebook entries".into(), 191 - source: e, 192 - })?; 173 + let rows = self 174 + .inner() 175 + .query(query) 176 + .bind(notebook_did) 177 + .bind(notebook_rkey) 178 + .bind(cursor_val) 179 + .bind(limit) 180 + .fetch_all::<EntryRow>() 181 + .await 182 + .map_err(|e| ClickHouseError::Query { 183 + message: "failed to list notebook entries".into(), 184 + source: e, 185 + })?; 193 186 194 187 Ok(rows) 195 188 }
+48
crates/weaver-index/src/endpoints/bsky.rs
··· 1 + //! app.bsky.* passthrough endpoints 2 + //! 3 + //! These forward requests to the Bluesky appview. 4 + 5 + use axum::{Json, extract::State}; 6 + use jacquard::prelude::*; 7 + use jacquard_axum::ExtractXrpc; 8 + use weaver_api::app_bsky::actor::get_profile::{GetProfileOutput, GetProfileRequest}; 9 + use weaver_api::app_bsky::feed::get_posts::{GetPostsOutput, GetPostsRequest}; 10 + 11 + use crate::endpoints::repo::XrpcErrorResponse; 12 + use crate::server::AppState; 13 + 14 + /// Handle app.bsky.actor.getProfile (passthrough) 15 + pub async fn get_profile( 16 + State(state): State<AppState>, 17 + ExtractXrpc(args): ExtractXrpc<GetProfileRequest>, 18 + ) -> Result<Json<GetProfileOutput<'static>>, XrpcErrorResponse> { 19 + let response = state.resolver.send(args).await.map_err(|e| { 20 + tracing::warn!("Appview getProfile failed: {}", e); 21 + XrpcErrorResponse::internal_error("Failed to fetch profile from appview") 22 + })?; 23 + 24 + let output = response.into_output().map_err(|e| { 25 + tracing::warn!("Failed to parse getProfile response: {}", e); 26 + XrpcErrorResponse::internal_error("Failed to parse appview response") 27 + })?; 28 + 29 + Ok(Json(output)) 30 + } 31 + 32 + /// Handle app.bsky.feed.getPosts (passthrough) 33 + pub async fn get_posts( 34 + State(state): State<AppState>, 35 + ExtractXrpc(args): ExtractXrpc<GetPostsRequest>, 36 + ) -> Result<Json<GetPostsOutput<'static>>, XrpcErrorResponse> { 37 + let response = state.resolver.send(args).await.map_err(|e| { 38 + tracing::warn!("Appview getPosts failed: {}", e); 39 + XrpcErrorResponse::internal_error("Failed to fetch posts from appview") 40 + })?; 41 + 42 + let output = response.into_output().map_err(|e| { 43 + tracing::warn!("Failed to parse getPosts response: {}", e); 44 + XrpcErrorResponse::internal_error("Failed to parse appview response") 45 + })?; 46 + 47 + Ok(Json(output)) 48 + }
+1 -1
crates/weaver-index/src/endpoints/edit.rs
··· 21 21 use crate::clickhouse::{EditNodeRow, ProfileRow}; 22 22 use crate::endpoints::actor::{Viewer, resolve_actor}; 23 23 use crate::endpoints::collab::profile_to_view_basic; 24 - use crate::endpoints::resolve_uri; 25 24 use crate::endpoints::repo::XrpcErrorResponse; 25 + use crate::endpoints::resolve_uri; 26 26 use crate::server::AppState; 27 27 28 28 /// Handle sh.weaver.edit.getEditHistory
+29
crates/weaver-index/src/endpoints/identity.rs
··· 1 + //! com.atproto.identity.* endpoint handlers 2 + 3 + use axum::{Json, extract::State}; 4 + use jacquard::IntoStatic; 5 + use jacquard::types::ident::AtIdentifier; 6 + use jacquard_axum::ExtractXrpc; 7 + use weaver_api::com_atproto::identity::resolve_handle::{ 8 + ResolveHandleOutput, ResolveHandleRequest, 9 + }; 10 + 11 + use crate::endpoints::actor::resolve_actor; 12 + use crate::endpoints::repo::XrpcErrorResponse; 13 + use crate::server::AppState; 14 + 15 + /// Handle com.atproto.identity.resolveHandle 16 + pub async fn resolve_handle( 17 + State(state): State<AppState>, 18 + ExtractXrpc(args): ExtractXrpc<ResolveHandleRequest>, 19 + ) -> Result<Json<ResolveHandleOutput<'static>>, XrpcErrorResponse> { 20 + let did = resolve_actor(&state, &AtIdentifier::Handle(args.handle)).await?; 21 + 22 + Ok(Json( 23 + ResolveHandleOutput { 24 + did: did.into_static(), 25 + extra_data: None, 26 + } 27 + .into_static(), 28 + )) 29 + }
+2
crates/weaver-index/src/endpoints/mod.rs
··· 12 12 use self::repo::XrpcErrorResponse; 13 13 14 14 pub mod actor; 15 + pub mod bsky; 15 16 pub mod collab; 16 17 pub mod edit; 18 + pub mod identity; 17 19 pub mod notebook; 18 20 pub mod repo; 19 21
+27 -29
crates/weaver-index/src/endpoints/notebook.rs
··· 45 45 let did_str = did.as_str(); 46 46 let name = args.name.as_ref(); 47 47 48 - // Fetch notebook and entries in parallel - both just need the DID 49 48 let limit = args.entry_limit.unwrap_or(50).clamp(1, 100) as u32; 50 - let cursor = args.entry_cursor.as_deref(); 49 + let cursor: Option<u32> = args 50 + .entry_cursor 51 + .as_deref() 52 + .and_then(|c| c.parse().ok()); 51 53 52 - let (notebook_result, entries_result) = tokio::try_join!( 53 - async { 54 - state 55 - .clickhouse 56 - .resolve_notebook(did_str, name) 57 - .await 58 - .map_err(|e| { 59 - tracing::error!("Failed to resolve notebook: {}", e); 60 - XrpcErrorResponse::internal_error("Database query failed") 61 - }) 62 - }, 63 - async { 64 - state 65 - .clickhouse 66 - .list_notebook_entries(did_str, limit + 1, cursor) 67 - .await 68 - .map_err(|e| { 69 - tracing::error!("Failed to list entries: {}", e); 70 - XrpcErrorResponse::internal_error("Database query failed") 71 - }) 72 - } 73 - )?; 54 + // Fetch notebook first to get its rkey 55 + let notebook_row = state 56 + .clickhouse 57 + .resolve_notebook(did_str, name) 58 + .await 59 + .map_err(|e| { 60 + tracing::error!("Failed to resolve notebook: {}", e); 61 + XrpcErrorResponse::internal_error("Database query failed") 62 + })? 
63 + .ok_or_else(|| XrpcErrorResponse::not_found("Notebook not found"))?; 74 64 75 - let notebook_row = 76 - notebook_result.ok_or_else(|| XrpcErrorResponse::not_found("Notebook not found"))?; 77 - let entry_rows = entries_result; 65 + // Now fetch entries using notebook's rkey 66 + let entry_rows = state 67 + .clickhouse 68 + .list_notebook_entries(did_str, &notebook_row.rkey, limit + 1, cursor) 69 + .await 70 + .map_err(|e| { 71 + tracing::error!("Failed to list entries: {}", e); 72 + XrpcErrorResponse::internal_error("Database query failed") 73 + })?; 78 74 79 75 // Fetch notebook contributors (evidence-based) 80 76 let notebook_contributors = state ··· 183 179 entries.push(book_entry); 184 180 } 185 181 186 - // Build cursor for pagination 182 + // Build cursor for pagination (position-based) 187 183 let next_cursor = if has_more { 188 - entry_rows.last().map(|e| e.rkey.to_string().into()) 184 + // Position = cursor offset + number of entries returned 185 + let last_position = cursor.unwrap_or(0) + entry_rows.len() as u32; 186 + Some(last_position.to_string().into()) 189 187 } else { 190 188 None 191 189 };
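Review note on the cursor change above: the new cursor is computed as the previous cursor plus the number of rows returned. A hedged alternative is to take the cursor from the `position` of the last row actually returned, which stays correct if positions have gaps or if the `limit + 1` probe row is included in the count. The sketch below runs against an in-memory slice; `PositionedEntry` and `paginate` are hypothetical names, and it assumes the row type can expose `position`, which the `EntryRow` in this diff does not show.

```rust
/// Hypothetical row shape for illustration only; the real `EntryRow`
/// in this diff is not shown to carry a `position` field.
#[derive(Debug, Clone, PartialEq)]
struct PositionedEntry {
    position: u32,
    rkey: String,
}

/// Return one page of entries plus the cursor for the next page, if any.
///
/// `all` is assumed to be sorted by `position` ascending, and positions
/// are assumed to start at 1 (a cursor of 0 means "from the beginning").
fn paginate(
    all: &[PositionedEntry],
    cursor: Option<u32>,
    limit: usize,
) -> (Vec<PositionedEntry>, Option<u32>) {
    let after = cursor.unwrap_or(0);

    // Mirrors `WHERE position > ? ORDER BY position ASC LIMIT limit + 1`:
    // the extra row is only a probe to learn whether another page exists.
    let mut page: Vec<PositionedEntry> = all
        .iter()
        .filter(|e| e.position > after)
        .take(limit + 1)
        .cloned()
        .collect();

    let has_more = page.len() > limit;
    // Drop the probe row so callers never see more than `limit` entries.
    page.truncate(limit);

    // The next cursor is the position of the last row we actually return.
    let next = if has_more {
        page.last().map(|e| e.position)
    } else {
        None
    };
    (page, next)
}
```

With this shape the cursor is always a value that exists in the data, so resuming with `position > cursor` cannot skip or repeat an entry.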
+127
crates/weaver-index/src/landing.html
··· 1 + <!doctype html> 2 + <html lang="en"> 3 + <head> 4 + <meta charset="utf-8" /> 5 + <meta name="viewport" content="width=device-width, initial-scale=1" /> 6 + <title>Weaver Index</title> 7 + <style> 8 + @font-face { 9 + font-family: "Ioskeley Mono"; 10 + font-style: normal; 11 + font-weight: normal; 12 + src: url("/assets/IoskeleyMono-Regular.woff2") format("woff2"); 13 + } 14 + :root { 15 + --color-base: #faf4ed; 16 + --color-surface: #fffaf3; 17 + --color-text: #1f1d2e; 18 + --color-muted: #635e74; 19 + --color-subtle: #4a4560; 20 + --color-primary: #907aa9; 21 + --color-secondary: #56949f; 22 + --color-link: #d7827e; 23 + --color-border: #908caa; 24 + } 25 + @media (prefers-color-scheme: dark) { 26 + :root { 27 + --color-base: #191724; 28 + --color-surface: #1f1d2e; 29 + --color-text: #e0def4; 30 + --color-muted: #6e6a86; 31 + --color-subtle: #908caa; 32 + --color-primary: #c4a7e7; 33 + --color-secondary: #3e8fb0; 34 + --color-link: #ebbcba; 35 + --color-border: #403d52; 36 + } 37 + } 38 + * { 39 + box-sizing: border-box; 40 + margin: 0; 41 + padding: 0; 42 + } 43 + body { 44 + font-family: "Ioskeley Mono", "IBM Plex Mono", "Berkeley Mono", Consolas, monospace; 45 + font-size: 14px; 46 + line-height: 1.6; 47 + color: var(--color-text); 48 + background: var(--color-base); 49 + max-width: 700px; 50 + margin: 0 auto; 51 + padding: 3rem 1.5rem; 52 + } 53 + pre { 54 + font-size: 0.7rem; 55 + line-height: 1.1; 56 + color: var(--color-primary); 57 + margin-bottom: 1.5rem; 58 + } 59 + h1 { 60 + font-size: 1.25rem; 61 + font-weight: 600; 62 + color: var(--color-secondary); 63 + margin-bottom: 0.25rem; 64 + } 65 + .subtitle { 66 + color: var(--color-muted); 67 + margin-bottom: 1.5rem; 68 + } 69 + p { 70 + margin-bottom: 1rem; 71 + } 72 + a { 73 + color: var(--color-link); 74 + text-decoration: none; 75 + } 76 + a:hover { 77 + text-decoration: underline; 78 + } 79 + code { 80 + background: var(--color-surface); 81 + padding: 0.125rem 0.375rem; 82 + border-radius: 3px; 
83 + border: 1px solid var(--color-border); 84 + } 85 + ul { 86 + list-style: none; 87 + margin-top: 1.5rem; 88 + padding-top: 1rem; 89 + border-top: 1px solid var(--color-border); 90 + } 91 + li { 92 + margin-bottom: 0.5rem; 93 + } 94 + li::before { 95 + content: "→ "; 96 + color: var(--color-muted); 97 + } 98 + </style> 99 + </head> 100 + <body> 101 + <pre> 102 + 103 + .dP' db `Yb .dP' 104 + dP' db db 88 dP' 105 + 88 106 + `Yb d888b 'Yb 'Yb 88 d88b d88b 'Yb `Yb dP' 107 + 88P 88 88 88 88P 8Y 8b 88 Yb dP 108 + 88 8P 88 88 88 8P 88 88 YbdP 109 + 88 .dP .8P .8P 88 .dP' .dP' .8P .8P 110 + .88888888b. 8888888888888b. dP' b 111 + Y. ,P 112 + `""'</pre 113 + > 114 + <h1>Weaver Index</h1> 115 + <p class="subtitle">AT Protocol Record Index</p> 116 + <p>This is an AT Protocol record index serving the Weaver writing platform.</p> 117 + <p>Most API endpoints are available under <code>/xrpc/</code>.</p> 118 + <ul> 119 + <li>Web App (alpha): <a href="https://alpha.weaver.sh">alpha.weaver.sh</a></li> 120 + <li> 121 + Source Code: 122 + <a href="https://tangled.org/@nonbinary.computer/weaver">tangled.org/@nonbinary.computer/weaver</a> 123 + </li> 124 + <li><a href="https://atproto.com">AT Protocol</a></li> 125 + </ul> 126 + </body> 127 + </html>
+59 -4
crates/weaver-index/src/server.rs
··· 1 1 use std::net::SocketAddr; 2 2 use std::sync::Arc; 3 3 4 - use axum::{Json, Router, extract::State, http::StatusCode, response::IntoResponse, routing::get}; 4 + use axum::{ 5 + Json, Router, 6 + extract::State, 7 + http::{StatusCode, header}, 8 + response::{Html, IntoResponse}, 9 + routing::get, 10 + }; 5 11 use jacquard::api::com_atproto::repo::{ 6 12 get_record::GetRecordRequest, list_records::ListRecordsRequest, 7 13 }; ··· 16 22 use tower_http::cors::CorsLayer; 17 23 use tower_http::trace::TraceLayer; 18 24 use tracing::info; 25 + use weaver_api::app_bsky::actor::get_profile::GetProfileRequest as BskyGetProfileRequest; 26 + use weaver_api::app_bsky::feed::get_posts::GetPostsRequest as BskyGetPostsRequest; 27 + use weaver_api::com_atproto::identity::resolve_handle::ResolveHandleRequest; 19 28 use weaver_api::sh_weaver::actor::{ 20 29 get_actor_entries::GetActorEntriesRequest, get_actor_notebooks::GetActorNotebooksRequest, 21 30 get_profile::GetProfileRequest, ··· 35 44 36 45 use crate::clickhouse::Client; 37 46 use crate::config::ShardConfig; 38 - use crate::endpoints::{actor, collab, edit, notebook, repo}; 47 + use crate::endpoints::{actor, bsky, collab, edit, identity, notebook, repo}; 39 48 use crate::error::{IndexError, ServerError}; 40 49 use crate::sqlite::ShardRouter; 41 50 ··· 59 68 Self { 60 69 clickhouse: Arc::new(clickhouse), 61 70 shards: Arc::new(ShardRouter::new(shard_config.base_path)), 62 - resolver: UnauthenticatedSession::new_slingshot(), 71 + resolver: UnauthenticatedSession::new_public(), 63 72 service_did, 64 73 } 65 74 } ··· 84 93 /// Build the axum router with all XRPC endpoints 85 94 pub fn router(state: AppState, did_doc: DidDocument<'static>) -> Router { 86 95 Router::new() 87 - // did:web document 96 + .route("/", get(landing)) 97 + .route( 98 + "/assets/IoskeleyMono-Regular.woff2", 99 + get(font_ioskeley_regular), 100 + ) 101 + .route("/assets/IoskeleyMono-Bold.woff2", get(font_ioskeley_bold)) 102 + .route( 103 + 
"/assets/IoskeleyMono-Italic.woff2", 104 + get(font_ioskeley_italic), 105 + ) 88 106 .route("/xrpc/_health", get(health)) 89 107 .route("/metrics", get(metrics)) 108 + // com.atproto.identity.* endpoints 109 + .merge(ResolveHandleRequest::into_router(identity::resolve_handle)) 90 110 // com.atproto.repo.* endpoints (record cache) 91 111 .merge(GetRecordRequest::into_router(repo::get_record)) 92 112 .merge(ListRecordsRequest::into_router(repo::list_records)) 113 + // app.bsky.* passthrough endpoints 114 + .merge(BskyGetProfileRequest::into_router(bsky::get_profile)) 115 + .merge(BskyGetPostsRequest::into_router(bsky::get_posts)) 93 116 // sh.weaver.actor.* endpoints 94 117 .merge(GetProfileRequest::into_router(actor::get_profile)) 95 118 .merge(GetActorNotebooksRequest::into_router( ··· 136 159 /// Prometheus metrics endpoint 137 160 async fn metrics() -> String { 138 161 telemetry::render() 162 + } 163 + 164 + // Embedded font files 165 + const IOSKELEY_MONO_REGULAR: &[u8] = 166 + include_bytes!("../../weaver-app/assets/fonts/ioskeley-mono/IoskeleyMono-Regular.woff2"); 167 + const IOSKELEY_MONO_BOLD: &[u8] = 168 + include_bytes!("../../weaver-app/assets/fonts/ioskeley-mono/IoskeleyMono-Bold.woff2"); 169 + const IOSKELEY_MONO_ITALIC: &[u8] = 170 + include_bytes!("../../weaver-app/assets/fonts/ioskeley-mono/IoskeleyMono-Italic.woff2"); 171 + 172 + /// Serve the Ioskeley Mono Regular font 173 + async fn font_ioskeley_regular() -> impl IntoResponse { 174 + ( 175 + [(header::CONTENT_TYPE, "font/woff2")], 176 + IOSKELEY_MONO_REGULAR, 177 + ) 178 + } 179 + /// Serve the Ioskeley Mono Regular font 180 + async fn font_ioskeley_bold() -> impl IntoResponse { 181 + ([(header::CONTENT_TYPE, "font/woff2")], IOSKELEY_MONO_BOLD) 182 + } 183 + 184 + /// Serve the Ioskeley Mono Regular font 185 + async fn font_ioskeley_italic() -> impl IntoResponse { 186 + ([(header::CONTENT_TYPE, "font/woff2")], IOSKELEY_MONO_ITALIC) 187 + } 188 + 189 + const LANDING_HTML: &str = 
include_str!("./landing.html"); 190 + 191 + /// Landing page 192 + async fn landing() -> Html<&'static str> { 193 + Html(LANDING_HTML) 139 194 } 140 195 141 196 /// Health check response
+2 -5
crates/weaver-index/src/service_identity.rs
··· 102 102 /// Encode the public key as a multikey string 103 103 fn encode_public_key(signing_key: &SigningKey) -> String { 104 104 let verifying_key = signing_key.verifying_key(); 105 - let point = verifying_key.to_encoded_point(true); // compressed 106 - let bytes = point.as_bytes(); 105 + let bytes = verifying_key.to_sec1_bytes(); 107 106 // 0xE7 is the multicodec for secp256k1-pub 108 - multikey(0xE7, bytes) 107 + multikey(0xE7, bytes.as_ref()) 109 108 } 110 109 111 110 /// Get the public key multibase string ··· 126 125 "@context": [ 127 126 "https://www.w3.org/ns/did/v1", 128 127 "https://w3id.org/security/multikey/v1", 129 - "https://w3id.org/security/suites/secp256k1-2019/v1" 130 128 ], 131 129 "id": did_str, 132 130 "verificationMethod": [{ ··· 152 150 "@context": [ 153 151 "https://www.w3.org/ns/did/v1", 154 152 "https://w3id.org/security/multikey/v1", 155 - "https://w3id.org/security/suites/secp256k1-2019/v1" 156 153 ], 157 154 "id": did_str, 158 155 "verificationMethod": [{
+1 -4
crates/weaver-index/src/sqlite.rs
··· 113 113 } 114 114 115 115 pub fn last_accessed(&self) -> Instant { 116 - self.last_accessed 117 - .lock() 118 - .map(|t| *t) 119 - .unwrap_or_else(|_| Instant::now()) 116 + self.last_accessed.lock().map(|t| *t).expect("poisoned") 120 117 } 121 118 122 119 /// Execute a read operation on the shard
+12 -9
docker-compose.yml
··· 22 22 TAP_BIND: ":2480" 23 23 TAP_DISABLE_ACKS: "false" 24 24 TAP_LOG_LEVEL: info 25 - TAP_SIGNAL_COLLECTION: sh.tangled.actor.profile 26 - TAP_COLLECTION_FILTERS: "sh.weaver.*,app.bsky.actor.profile,sh.tangled.*,pub.leaflet.*" 25 + TAP_OUTBOX_PARALLELISM: 5 26 + #TAP_FULL_NETWORK: true 27 + TAP_SIGNAL_COLLECTION: place.stream.chat.profile 28 + TAP_COLLECTION_FILTERS: "sh.weaver.*,app.bsky.actor.profile,sh.tangled.*,pub.leaflet.*,net.anisota.*,place.stream.*" 27 29 healthcheck: 28 30 test: ["CMD", "wget", "-q", "--spider", "http://localhost:2480/health"] 29 31 interval: 20s ··· 32 34 restart: unless-stopped 33 35 34 36 # Weaver indexer - consumes from tap 35 - indexer: 36 - container_name: weaver-indexer 37 + index: 38 + container_name: weaver-index 37 39 image: ${REGISTRY_HOST:-localhost}:5000/weaver-index:latest 38 40 ports: 39 41 - "3000:3000" 42 + command: ["run"] 43 + volumes: 44 + - index_data:/app/data 40 45 environment: 41 46 RUST_LOG: info,weaver_index=debug,hyper_util::client::legacy::pool=info 42 47 CLICKHOUSE_URL: ${CLICKHOUSE_URL} ··· 47 52 TAP_URL: ws://tap:2480/channel 48 53 TAP_SEND_ACKS: "true" 49 54 FIREHOSE_RELAY_URL: wss://bsky.network 50 - INDEXER_COLLECTIONS: "sh.weaver.*,app.bsky.actor.profile,sh.tangled.*,pub.leaflet.*" 55 + INDEXER_COLLECTIONS: "sh.weaver.*,app.bsky.actor.profile,sh.tangled.*,pub.leaflet.*,net.anisota.*,place.stream.*" 51 56 depends_on: 52 57 tap: 53 58 condition: service_healthy 54 59 healthcheck: 55 - test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/xrpc/_health"] 60 + test: ["CMD", "wget", "-q", "-O", "/dev/null", "http://localhost:3000/xrpc/_health"] 56 61 interval: 20s 57 62 timeout: 5s 58 63 retries: 3 ··· 68 73 PORT: 8080 69 74 IP: 0.0.0.0 70 75 RUST_LOG: info 71 - depends_on: 72 - indexer: 73 - condition: service_healthy 74 76 healthcheck: 75 77 test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/"] 76 78 interval: 20s ··· 81 83 volumes: 82 84 registry_data: 83 85 tap_data: 86 + 
index_data:
+8 -19
weaver_notes/.obsidian/workspace.json
··· 41 41 "state": { 42 42 "type": "markdown", 43 43 "state": { 44 - "file": "Why I rewrote pdsls in Rust (tm).md", 44 + "file": "Writing the AppView Last.md", 45 45 "mode": "source", 46 46 "source": false 47 47 }, 48 48 "icon": "lucide-file", 49 - "title": "Why I rewrote pdsls in Rust (tm)" 50 - } 51 - }, 52 - { 53 - "id": "4b584137400d2323", 54 - "type": "leaf", 55 - "state": { 56 - "type": "release-notes", 57 - "state": { 58 - "currentVersion": "1.10.6" 59 - }, 60 - "icon": "lucide-book-up", 61 - "title": "Release Notes 1.10.6" 49 + "title": "Writing the AppView Last" 62 50 } 63 51 } 64 52 ], 65 - "currentTab": 3 53 + "currentTab": 2 66 54 } 67 55 ], 68 56 "direction": "vertical" ··· 211 199 "bases:Create new base": false 212 200 } 213 201 }, 214 - "active": "4b584137400d2323", 202 + "active": "6029beecc3d03bce", 215 203 "lastOpenFiles": [ 204 + "diff_record.png", 205 + "Why I rewrote pdsls in Rust (tm).md", 206 + "Writing the AppView Last.md", 216 207 "light_mode_excerpt.png", 217 208 "notebook_entry_preview.png", 218 209 "xkcd_345_excerpt.png", 219 210 "bug notes.md", 220 - "Why I rewrote pdsls in Rust (tm).md", 221 211 "meta.png", 222 212 "Pasted image 20251114125028.png", 223 213 "invalid_record.png", ··· 226 216 "json_editor_with_errors.png", 227 217 "pretty_editor.png", 228 218 "Arch.md", 229 - "Weaver - Long-form writing.md", 230 - "weaver_photo_med.jpg" 219 + "Weaver - Long-form writing.md" 231 220 ] 232 221 }
+143
weaver_notes/Writing the AppView Last.md
··· 1 + If you've been to this site before, you may have noticed it loaded a fair bit more quickly this time. That's not really because the web server creating this HTML got a whole lot better. It did require some refactoring, but it was mostly in the vein of taking some code and adding new code that did the same thing gated behind a cargo feature. This did, however, have the side effect of, in the final binary, replacing functions that are literally hundreds of lines, that in turn call functions that may also be hundreds of lines, making several cascading network requests, with functions that look like this, which make by and large a single network request and return exactly what is required. 2 + 3 + ```rust 4 + #[cfg(feature = "use-index")] 5 + fn fetch_entry_view( 6 + &self, 7 + entry_ref: &StrongRef<'_>, 8 + ) -> impl Future<Output = Result<EntryView<'static>, WeaverError>> 9 + where 10 + Self: Sized, 11 + { 12 + async move { 13 + use weaver_api::sh_weaver::notebook::get_entry::GetEntry; 14 + 15 + let resp = self 16 + .send(GetEntry::new().uri(entry_ref.uri.clone()).build()) 17 + .await 18 + .map_err(|e| AgentError::from(ClientError::from(e)))?; 19 + 20 + let output = resp.into_output().map_err(|e| { 21 + AgentError::xrpc(e.into()) 22 + })?; 23 + 24 + Ok(output.value.into_static()) 25 + } 26 + } 27 + ``` 28 + 29 + Of course the reason is that I finally got round to building the Weaver AppView. I'm going to be calling mine the Index, because Weaver is about writing and I think "AppView" as a term kind of sucks and "index" is much more elegant, on top of being a good descriptor of what the big backend service now powering Weaver does. 
![[at://did:plc:ragtjsm2j2vknwkz3zp4oxrd/app.bsky.feed.post/3lyucxfxq622w]] 30 + For the uninitiated, because I expect at least some people reading this aren't big into AT Protocol development, an AppView is an instance of the kind of big backend service that Bluesky PBLLC runs which powers essentially every Bluesky client, with a few notable exceptions, such as [Red Dwarf](https://reddwarf.app/), and (partially, eventually more completely) [Blacksky](https://blacksky.community/). It listens to the [Firehose](https://bsky.network/) [event stream](https://atproto.com/specs/event-stream) from the main Bluesky Relay and analyzes the data which comes through that pertains to Bluesky, producing your timeline feeds, figuring out who follows you, who you block and who blocks you (and filtering them out of your view of the app), how many people liked your last post, and so on. It does this because the records in your PDS (and those of all the other people on Bluesky) need context and relationships to give them meaning, and that context can then be passed along to you without your app having to go collect it all. ![[at://did:plc:uu5axsmbm2or2dngy4gwchec/app.bsky.feed.post/3lsc2tzfsys2f]] 31 + It's a very normal backend with some weird constraints because of the protocol, and in practice it's the thing that most separates the day-to-day Bluesky experience from the Mastodon experience. It's also by far the most centralising force in the network, because it also does moderation, and because it's quite expensive to run. A full index of all Bluesky activity takes a lot of storage (futur's Zeppelin experiment detailed above took about 16 terabytes of storage using PostgreSQL for the database and cost $200/month to run), and then it takes that much more computing power to calculate all the relationships between the data on the fly as new events come in and then serve personalized versions to everyone that uses it. 
32 + 33 + It's not the only AppView out there; most atproto apps have something like this. Tangled, Streamplace, Leaflet, and so on all have substantial backends. Some (like Tangled) actually combine the front end you interact with and the AppView into a single service. But in general these are big, complicated persistent services you have to backfill from existing data to bootstrap, and they really strongly shape your app, whether they're literally part of the same executable or hosted on the same server or not. And when I started building Weaver in earnest, not only did I still have a few big unanswered questions about how I wanted Weaver to work, how it needed to work, I also didn't want to fundamentally tie it to some big server and create this centralising force. I wanted it to be possible for someone else to run it without being dependent on me personally, ideally possible even if all they had access to was a static site host like GitHub Pages or a browser runtime platform like Cloudflare Workers, so long as someone somewhere was running a couple of generic services. I wanted to be able to distribute the fullstack server version as basically just an executable in a directory of files with no other dependencies, which could easily be run in any container hosting environment with zero persistent storage required. Hell, you could technically serve it as a blob or series of blobs from your PDS with the right entry point if I did my job right. 34 + 35 + I succeeded. 36 + 37 + Well, I don't know if you can serve `weaver-app` purely via `com.atproto.sync.getBlob` requests, but it doesn't need much. 38 + ## Constellation 39 + ![[at://did:plc:ttdrpj45ibqunmfhdsb4zdwq/app.bsky.feed.post/3m6pckslkt222]] Ana's leaflet does a good job of explaining more or less how Weaver worked up until now. 
It used direct requests to personal data servers (mostly mine) as well as many calls to [Constellation](https://constellation.microcosm.blue/) and [Slingshot](https://slingshot.microcosm.blue/), and some even to [UFOs](https://ufos.microcosm.blue/), plus a couple of judicious calls to the Bluesky AppView for profiles and post embeds. ![[at://did:plc:hdhoaan3xa3jiuq4fg4mefid/app.bsky.feed.post/3m5jzclsvpc2c]] 40 + The three things linked above are generic services that provide back-links, a record cache, and a running feed of the most recent instances of all lexicons on the network, respectively. That's more than enough to build an app with, though it's not always easy. For some things it can be pretty straightforward. Constellation can tell you what notebooks an entry is in. It can tell you which edit history records are related to this notebook entry. For single-layer relationships it's straightforward. However you then have to also fetch the records individually, because it doesn't provide you the records, just the URIs you need to find them. Slingshot doesn't currently have an endpoint that will batch fetch a list of URIs for you. And the PDS only has endpoints like [`com.atproto.repo.listRecords`](https://docs.bsky.app/docs/api/com-atproto-repo-list-records), which gives you a paginated list of all records of a specific type, but doesn't let you narrow that down easily, so you have to page through until you find what you wanted. 41 + 42 + This wouldn't be too bad if I was fine with almost everything after the hostname in my web URLs being gobbledegook record keys, but I wanted people to be able to link within a notebook like they normally would if they were linking within an Obsidian Vault, by name or by path, something human-readable. So some queries became the good old N+1 requests, because I had to list a lot of records and fetch them until I could find the one that matched. 
It got worse still once I introduced collaboration and draft syncing to the editor. Loading a draft of an entry with a lot of edit history could take 100 or more requests, to check permissions, find all the edit records, figure out which ones mattered, publish the collaboration session record, check for collaborators, and so on. It was pretty slow going, particularly when one could not pre-fetch and cache and generate everything server-side on a real CPU rather than in a browser after downloading a nice chunk of WebAssembly code. My profile page [alpha.weaver.sh/nonbinary.computer](https://alpha.weaver.sh/nonbinary.computer) often took quite some time to load due to a frustrating quirk of Dioxus, the Rust web framework I've used for the front-end, which prevented server-side rendering from waiting until everything important had been fetched to render the complete page on that specific route, forcing me to load it client-side. 43 + 44 + Some stuff is just complicated to graph out, to find and pull all the relevant data together in order, and some connections aren't the kinds of things you can graph generically. For example, in order to work without any sort of service that has access to indefinite authenticated sessions of more than one person at once, Weaver handles collaborative writing and publishing by having each collaborator write to their own repository and publish there, and then, when the published version is requested, figuring out which version of an entry or notebook is most up-to-date, and displaying that one. It matches by record key across more than one repository, determined at request time by the state of multiple other records in those users' repositories. 45 + 46 + # Shape of Data 47 + All of that being said, this was still the correct route, particularly for me. Not only does this provide a powerful fallback mode, built-in protection against me going AWOL, it was also critical in the design process of the index. 
My friend Ollie, when talking about database and API design, always says that, regardless of the specific technology you use, you need to structure your data based on how you need to query into it. Whatever interface you put in front of it, be it GraphQL, SQL, gRPC, XRPC, server functions, AJAX, literally any way that you can have the part of your app that people interact with pull the specific data they want from where it's stored, how well that performs, how many cycles your server or client spends collecting it, sorting it, or waiting on it, how much memory it takes, how much bandwidth it takes, depends on how that data is shaped, and you, when you are designing your app and all the services that go into it, get to choose that shape. 48 + 49 + Bluesky developers have said that hydrating blocks, mutes, and labels and applying the appropriate ones to the feed content based on the preferences of the user takes quite a bit of compute at scale, and that even the seemingly simple [Following feed](https://jazco.dev/2025/02/19/imperfection/), which is mostly a reverse-chronological feed of posts by people you follow explicitly (plus a few simple rules), is remarkably resource-intensive to produce for them. The extremely clever [string interning](https://jazco.dev/2025/09/26/interning/) and [bitmap tricks](https://jazco.dev/2024/04/20/roaring-bitmaps/) implemented by a brilliant engineer during their time at Bluesky are all oriented toward figuring out the most efficient way to structure the data to make the desired query emerge naturally from it. ![Roaring Bitmaps Diagram from the Original Publication at https://arxiv.org/pdf/1709.07821](https://jazco.dev/public/images/2025-09-26/roaring_bitmaps_diagram.png) 50 + 51 + It's intuitive that this matters a lot when you use something like RocksDB, or FoundationDB, or Redis, which are fundamentally key-value stores. 
What your key contains there determines almost everything about how easy it is to find and manipulate the values you want. Fig and I have had some struggles getting a backup of their Constellation service running in real-time and keeping up with Jetstream on my home server, because the only storage on said home server with enough free space for Constellation's full index is a ZFS pool that's primarily hard-drive based, and the way the Constellation RocksDB backend storage is structured makes processing delete events extremely expensive on a hard drive where seek times are nontrivial. On a Pi 4 with an SSD, it runs just fine. ![[at://did:plc:44ybard66vv44zksje25o7dz/app.bsky.feed.post/3m7e3hnyh5c2u]] 52 + But it's a problem for every database. Custom feed builder service [graze.social](https://graze.social/) ran into difficulties with Postgres early on in their development, as they rapidly gained popularity. They ended up using the same database I did, Clickhouse, for many of the same reasons. ![[at://did:plc:i6y3jdklpvkjvynvsrnqfdoq/app.bsky.feed.post/3m7ecmqcwys23]] 53 + And while thankfully I don't think that a platform oriented around long-form written content will ever have the kinds of following timeline graph write amplification problems Bluesky has dealt with, even if it becomes successful beyond my wildest dreams, there are definitely going to be areas where latency matters a ton and the workload is very write-heavy, like real-time collaboration, particularly if a large number of people work on a document simultaneously, even while the vast majority of requests will primarily be reading data out. 
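To make the earlier point about key-value stores concrete, here is a minimal sketch of how key layout decides which operations are cheap. It is a toy, not Constellation's actual schema: `LinkIndex`, its methods, and the `(target, source)` key layout are all hypothetical, and a standard-library `BTreeSet` stands in for an ordered store like RocksDB.

```rust
use std::collections::BTreeSet;

/// Toy ordered store standing in for something like RocksDB (an assumption,
/// not a real backend). Keys are (target, source) pairs, so every link that
/// points at one target sorts next to the others.
struct LinkIndex {
    keys: BTreeSet<(String, String)>,
}

impl LinkIndex {
    fn new() -> Self {
        Self { keys: BTreeSet::new() }
    }

    fn insert(&mut self, target: &str, source: &str) {
        self.keys.insert((target.to_string(), source.to_string()));
    }

    /// Cheap: one contiguous range scan over keys whose first part is `target`.
    fn backlinks(&self, target: &str) -> Vec<String> {
        self.keys
            .range((target.to_string(), String::new())..)
            .take_while(|(t, _)| t.as_str() == target)
            .map(|(_, s)| s.clone())
            .collect()
    }

    /// Expensive: the source is the key suffix, so removing everything written
    /// by one source has to look at every key. Returns how many keys were examined.
    fn delete_source(&mut self, source: &str) -> usize {
        let examined = self.keys.len();
        self.keys.retain(|(_, s)| s.as_str() != source);
        examined
    }
}
```

With this layout, `backlinks` touches only the keys it returns, while `delete_source` walks the whole set. A second index keyed by source would make deletes cheap at the cost of an extra write per link, which is the kind of trade-off that choosing the shape of your data comes down to.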
54 + 55 + One reason the edit records for Weaver have three link fields (and may get more!), even though it may seem a bit redundant, is precisely that those links make it easy to graph the relationships between them, to trace a tree of edits backward to the root, while also allowing direct access and a direct relationship to the root snapshot and the thing it's associated with. 56 + 57 + In contrast, notebook entry records lack links to other parts of the notebook in and of themselves because calculating them would be challenging, and updating one entry would require not just updating the entry itself and the notebook it's in, but also neighbouring entries in said notebook. With the shape of collaborative publishing in Weaver, that would result in up to 4 writes to the PDS when you publish an entry, in addition to any blob uploads. And trying to link the other way in edit history (root to edit head) is similarly challenging. 58 + 59 + I anticipated some of these, but others emerged only because I ran into them while building the web app. I've had to manually fix up records more than once because I made breaking changes to my lexicons after discovering I really wanted X piece of metadata or cross-linkage. If I'd built the index first or alongside (particularly if the index remained a separate service from the web app as I intended it to, to keep the web app simple), it would likely have constrained my choices and potentially cut off certain solutions, due to the time it takes to dump the database and re-run backfill even at a very small scale. Building a big chunk of the front end first told me exactly what the index needed to provide easy access to. 60 + # ClickHAUS 61 + So what does Weaver's index look like? Well, it starts with either the firehose or the new Tap sync tool. 
The index ingests from either over a WebSocket connection, does a bit of processing (less is required when ingesting from Tap, and that's currently what I've deployed) and then dumps them in the Clickhouse database. I chose it as the primary index database on recommendation from a friend, and after doing a lot of reading. It fits atproto data well, as Graze found. Because it isolates concurrent inserts and selects so that you can just dump data in, while it cleans things up asynchronously after, it does wonderfully when you have a single major input point or a set of them to dump into that fans out, which you can then transform and then read from. 62 + 63 + I will not claim that the tables you can find in the weaver repository are especially **good** database design overall, but they work, and we'll see how they scale. This is one of three main input tables. One for record writes, one for identity events, and one for account events. 64 + ```SQL 65 + CREATE TABLE IF NOT EXISTS raw_records ( 66 + did String, 67 + collection LowCardinality(String), 68 + rkey String, 69 + cid String, 70 + -- Repository revision (TID) 71 + rev String, 72 + record JSON, 73 + -- Operation: 'create', 'update', 'delete', 'cache' (fetched on-demand) 74 + operation LowCardinality(String), 75 + -- Firehose sequence number 76 + seq UInt64, 77 + -- Event timestamp from firehose 78 + event_time DateTime64(3), 79 + -- When the database indexed this record 80 + indexed_at DateTime64(3) DEFAULT now64(3), 81 + -- Validation state: 'unchecked', 'valid', 'invalid_rev', 'invalid_gap', 'invalid_account' 82 + validation_state LowCardinality(String) DEFAULT 'unchecked', 83 + -- Whether this came from live firehose (true) or backfill (false) 84 + is_live Bool DEFAULT true, 85 + -- Materialized AT URI for convenience 86 + uri String MATERIALIZED concat('at://', did, '/', collection, '/', rkey), 87 + -- Projection for fast delete lookups by (did, cid) 88 + PROJECTION by_did_cid ( 89 + SELECT * ORDER BY (did, 
cid) 90 + ) 91 + ) 92 + ENGINE = MergeTree() 93 + ORDER BY (collection, did, rkey, event_time, indexed_at); 94 + ``` 95 + From here we fan out into a cascading series of materialized views and other specialised tables. These break out the different record types, calculate metadata, and pull critical fields out of the record JSON for easier querying. Clickhouse's wild-ass compression means we're not too badly off replicating data on disk this way. Seriously, their JSON type ends up being the same size as a CBOR BLOB on disk in my testing, though it *does* have some quirks, as I discovered when I read back Datetime fields and got...not the format I put in. Thankfully there's a config setting for that. ![Clickhouse animation showing parallel inserts into a source table and a transformation query into a materialized view](https://clickhouse.com/docs/assets/images/incremental_materialized_view-1158726e31b08dc9808d96671239467f.gif) We also build out the list of who contributed to a published entry and determine the canonical record for it, so that fetching a fully hydrated entry with all contributor profiles only takes a couple of `SELECT` queries that themselves avoid performing extensive table scans due to reasonable choices of `ORDER BY` fields in the denormalized tables they query. And then I can do quirky things like power a profile fetch endpoint that will provide either a Weaver or a Bluesky profile, while also unifying fields so that we can easily get at the critical stuff in common. This is a relatively expensive calculation, but people thankfully don't edit their profiles that often, and this is why we don't keep the stats in the same table. 96 + 97 + However, this is ***also*** why Clickhouse will not be the only database used in the index. 98 + 99 + # Why is it always SQLite? 
100 + Things like real-time collaboration sessions with almost keystroke-level cursor tracking and rapid per-user writeback/readback, where latency matters and we can't wait around for the merge cycle to produce the right state, *don't* work well in Clickhouse. But they sure do in SQLite! 101 + 102 + If there's one thing the AT Protocol developer community loves more than base32-encoded timestamps it's SQLite. In fairness, we're in good company; the whole world loves SQLite. It's a good fucking embedded database and very hard to beat for write or read performance so long as you're not trying to hit it massively concurrently. Of course, that concurrency limitation does end up mattering as you scale. And here we take a cue from the TypeScript PDS implementation and discover the magic of buying, well, a lot more than two of them, and of using the filesystem like a hierarchical key-value store. 103 + 104 + <iframe width="560" height="315" src="https://www.youtube.com/embed/CZs-YcmxyUw?si=bd3GmSxMVQGdqHAR" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> 105 + 106 + This part of the data backend is still *very* much a work-in-progress and isn't used yet in the deployed version, but I did want to discuss the architecture. Unlike the PDS, we don't divide primarily by DID. Instead we shard by resource, designated by collection and record key. 107 + 108 + ```rust 109 + pub struct ShardKey { 110 + pub collection: SmolStr, 111 + pub rkey: SmolStr, 112 + } 113 + 114 + impl ShardKey { 115 + ... 116 + /// Directory path: {base}/{hash(collection,rkey)[0..2]}/{rkey}/ 117 + fn dir_path(&self, base: &Path) -> PathBuf { 118 + base.join(self.hash_prefix()).join(self.rkey.as_str()) 119 + } 120 + ... 
121 + } 122 + /// A single SQLite shard for a resource 123 + pub struct SqliteShard { 124 + conn: Mutex<Connection>, 125 + path: PathBuf, 126 + last_accessed: Mutex<Instant>, 127 + } 128 + /// Routes resources to their SQLite shards 129 + pub struct ShardRouter { 130 + base_path: PathBuf, 131 + shards: DashMap<ShardKey, std::sync::Arc<SqliteShard>>, 132 + } 133 + ``` 134 + 135 + A prefix of the hash of the shard key, plus the record key, gives us the directory where we put the database file for this resource. Ultimately this may be moved out of the main index off onto something more comparable to the Tangled knot server or Streamplace nodes, depending on what constraints we run into if things go exceptionally well, but for now it lives as part of the index. In there we can tee off raw events from the incoming firehose and then transform them into the correct forms in memory, optionally persisted to disk, alongside Clickhouse and probably, for the specific things we want it for with a local scope, faster. 136 + 137 + And direct communication, either by using something like oatproxy to swap the auth relationships around a bit (currently the index is accessed via service proxying through the PDS when authenticated) or via an iroh channel from the client, gets stuff there without having to wait for the relay to pick it up and fan it out to us, which then means that users can read their own writes very effectively. The handler hits the relevant SQLite shard if present and Clickhouse in parallel, merging the data to provide the most up-to-date form. For real-time collaboration this is critical. The current `iroh-gossip` implementation works well and requires only a generic iroh relay, but it runs into the scaling problem every gossip protocol hits as the number of concurrent users grows. 138 + 139 + The exact method of authentication of that side-channel is by far the largest remaining unanswered question about Weaver right now, aside from "Will anyone (else) use it?" 
140 + 141 + If people have ideas, I'm all ears. 142 + 143 + I hope you found this interesting. I enjoyed writing it out.
weaver_notes/diff_record.png

This is a binary file and will not be displayed.