swim protocol in ocaml interoperable with membership lib and serf cli

docs: add usage guide covering cluster creation, messaging, and config

Changed files: docs/usage.md (+151)
# SWIM Protocol Library - Usage Guide

This library provides a production-ready implementation of the SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) protocol in OCaml 5. It handles cluster membership, failure detection, and messaging.

## Key Features

- **Membership**: Automatic discovery and failure detection.
- **Gossip**: Efficient state propagation (Alive/Suspect/Dead).
- **Messaging**:
  - **Broadcast**: Eventual consistency (gossip-based) for cluster-wide updates.
  - **Direct Send**: High-throughput point-to-point UDP messaging.
- **Security**: AES-256-GCM encryption.
- **Zero-Copy**: Optimized buffer management for high performance.

## Getting Started

### 1. Define Configuration

Start with `default_config` and customize as needed.

```ocaml
open Swim.Types

let config = {
  default_config with
  bind_port = 7946;
  node_name = Some "node-1";
  secret_key = "replace-with-your-32-byte-secret"; (* exactly 32 bytes for AES-256 *)
  encryption_enabled = true;
}
```

### 2. Create and Start a Cluster Node

Use `Cluster.create` within an Eio switch.

```ocaml
module Cluster = Swim.Cluster

let () =
  Eio_main.run @@ fun env ->
  Eio.Switch.run @@ fun sw ->

  (* Create environment wrapper *)
  let env_wrap = { stdenv = env; sw } in

  match Cluster.create ~sw ~env:env_wrap ~config with
  | Error `Invalid_key -> failwith "Invalid secret key"
  | Ok cluster ->
    (* Start background daemons (protocol loop, UDP receiver, TCP listener) *)
    Cluster.start cluster;

    Printf.printf "Node started!\n%!";

    (* Keep running *)
    Eio.Fiber.await_cancel ()
```

### 3. Joining a Cluster

To join an existing cluster, you need the address of at least one seed node.

```ocaml
let seed_nodes = ["192.168.1.10:7946"] in
match Cluster.join cluster ~seed_nodes with
| Ok () -> Printf.printf "Joined cluster successfully\n"
| Error `No_seeds_reachable -> Printf.printf "Failed to join cluster\n"
```

## Messaging

### Broadcast (Gossip)
Use `broadcast` to send data to **all** nodes. This uses the gossip protocol (piggybacking on membership messages). It is bandwidth-efficient but has higher latency and is eventually consistent.

**Best for:** Configuration updates, low-frequency state sync.

```ocaml
Cluster.broadcast cluster
  ~topic:"config-update"
  ~payload:"{\"version\": 2}"
```

### Direct Send (Point-to-Point)
Use `send` to send a message directly to a specific node via UDP. This is high-throughput and low-latency.

**Best for:** RPC, high-volume data transfer, direct coordination.

```ocaml
(* Send by Node ID *)
let target_node_id = node_id_of_string "node-2" in
Cluster.send cluster
  ~target:target_node_id
  ~topic:"ping"
  ~payload:"pong"

(* Send by Address (if Node ID unknown) *)
let addr = `Udp (Eio.Net.Ipaddr.of_raw "\192\168\001\010", 7946) in
Cluster.send_to_addr cluster
  ~addr
  ~topic:"alert"
  ~payload:"alert-data"
```

### Handling Messages
Register a callback to handle incoming messages (both broadcast and direct).

```ocaml
Cluster.on_message cluster (fun sender topic payload ->
  Printf.printf "Received '%s' from %s: %s\n"
    topic
    (node_id_to_string sender.id)
    payload
)
```

## Membership Events

Listen for node lifecycle events.

```ocaml
Eio.Fiber.fork ~sw (fun () ->
  let stream = Cluster.events cluster in
  while true do
    match Eio.Stream.take stream with
    | Join node -> Printf.printf "Node joined: %s\n" (node_id_to_string node.id)
    | Leave node -> Printf.printf "Node left: %s\n" (node_id_to_string node.id)
    | Suspect_event node -> Printf.printf "Node suspected: %s\n" (node_id_to_string node.id)
    | Alive_event node -> Printf.printf "Node alive again: %s\n" (node_id_to_string node.id)
    | Update _ -> ()
  done
)
```

## Configuration Options

| Field | Default | Description |
|-------|---------|-------------|
| `bind_addr` | "0.0.0.0" | Interface to bind UDP/TCP listeners. |
| `bind_port` | 7946 | Port for SWIM protocol. |
| `protocol_interval` | 1.0 | Seconds between probe rounds. Lower = faster failure detection, higher bandwidth. |
| `probe_timeout` | 0.5 | Seconds to wait for Ack. |
| `indirect_checks` | 3 | Number of peers to ask for indirect probes. |
| `udp_buffer_size` | 1400 | Max UDP packet size (MTU). |
| `secret_key` | (zeros) | 32-byte key for AES-256-GCM. |
| `max_gossip_queue_depth` | 5000 | Max items in broadcast queue before dropping oldest (prevents leaks). |

## Performance Tips

1. **Buffer Pool**: The library uses zero-copy buffer pools. Ensure `send_buffer_count` and `recv_buffer_count` are sufficient for your load (default 16).
2. **Gossip Limit**: If broadcasting aggressively, `max_gossip_queue_depth` protects memory but may drop messages. Use `Direct Send` for high volume.
3. **Eio**: Run within an Eio domain/switch. The library is designed for OCaml 5 multicore.
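
### Illustration: drop-oldest queue behavior

To make the effect of `max_gossip_queue_depth` concrete, here is a minimal sketch of a bounded queue that evicts its oldest entry when full. It is not the library's implementation: the `bounded_queue` type and the `create` and `push` functions are hypothetical names used only for illustration, and the sketch assumes the broadcast queue behaves like a plain FIFO.

```ocaml
(* Hypothetical sketch, not the library's code: a FIFO with a fixed
   maximum depth that drops the oldest item when a push would overflow. *)
type 'a bounded_queue = {
  items : 'a Queue.t;
  max_depth : int;
  mutable dropped : int; (* number of items evicted so far *)
}

let create max_depth =
  if max_depth <= 0 then invalid_arg "max_depth must be positive";
  { items = Queue.create (); max_depth; dropped = 0 }

let push q x =
  (* Evict the oldest entry first so the depth never exceeds the limit. *)
  if Queue.length q.items >= q.max_depth then begin
    ignore (Queue.pop q.items);
    q.dropped <- q.dropped + 1
  end;
  Queue.push x q.items

let () =
  let q = create 3 in
  List.iter (push q) [1; 2; 3; 4; 5];
  let kept = List.of_seq (Queue.to_seq q.items) in
  Printf.printf "kept: %s, dropped: %d\n"
    (String.concat "," (List.map string_of_int kept))
    q.dropped
  (* prints "kept: 3,4,5, dropped: 2" *)
```

With a depth of 3 and five pushes, the two oldest items are lost. This is why a burst of broadcasts larger than the configured depth can silently drop earlier messages.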