+151
docs/usage.md
# SWIM Protocol Library - Usage Guide

This library provides a production-ready implementation of the SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) protocol in OCaml 5. It handles cluster membership, failure detection, and messaging.

## Key Features

- **Membership**: Automatic discovery and failure detection.
- **Gossip**: Efficient state propagation (Alive/Suspect/Dead).
- **Messaging**:
  - **Broadcast**: Eventually consistent, gossip-based delivery of cluster-wide updates.
  - **Direct Send**: High-throughput point-to-point UDP messaging.
- **Security**: AES-256-GCM encryption.
- **Zero-Copy**: Optimized buffer management for high performance.

## Getting Started

### 1. Define Configuration

Start with `default_config` and override only the fields you need.

```ocaml
open Swim.Types

let config = {
  default_config with
  bind_port = 7946;
  node_name = Some "node-1";
  (* Placeholder only: the key must be exactly 32 bytes for AES-256. *)
  secret_key = "0123456789abcdef0123456789abcdef";
  encryption_enabled = true;
}
```
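
Because the key must be exactly 32 bytes, it can help to check its length before building the configuration. The sketch below is a standalone illustration using only the standard library; `check_secret_key` is a hypothetical helper, not part of this library's API, and it assumes the key is supplied as a raw 32-byte string.

```ocaml
(* Hypothetical helper, not part of the library: checks the key length
   up front so a misconfigured key fails with a readable message. *)
let aes256_key_bytes = 32

let check_secret_key (key : string) : (string, string) result =
  let n = String.length key in
  if n = aes256_key_bytes then Ok key
  else
    Error
      (Printf.sprintf "secret_key must be %d bytes, got %d" aes256_key_bytes n)

let () =
  match check_secret_key "too-short" with
  | Ok _ -> print_endline "key length ok"
  | Error msg -> print_endline msg
```

A check like this only catches length mistakes; it says nothing about how random the key is.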

### 2. Create and Start a Cluster Node

Call `Cluster.create` inside an Eio switch, then `Cluster.start` to launch the background daemons.

```ocaml
module Cluster = Swim.Cluster

(* Continues from step 1: [config] and [open Swim.Types] are in scope. *)
let () =
  Eio_main.run @@ fun env ->
  Eio.Switch.run @@ fun sw ->

  (* Create environment wrapper *)
  let env_wrap = { stdenv = env; sw } in

  match Cluster.create ~sw ~env:env_wrap ~config with
  | Error `Invalid_key -> failwith "Invalid secret key"
  | Ok cluster ->
    (* Start background daemons (protocol loop, UDP receiver, TCP listener) *)
    Cluster.start cluster;

    Printf.printf "Node started!\n%!";

    (* Keep running *)
    Eio.Fiber.await_cancel ()
```

### 3. Joining a Cluster

To join an existing cluster, pass the address of at least one seed node in `host:port` form.

```ocaml
let seed_nodes = ["192.168.1.10:7946"] in
match Cluster.join cluster ~seed_nodes with
| Ok () -> Printf.printf "Joined cluster successfully\n%!"
| Error `No_seeds_reachable ->
  Printf.printf "Failed to join cluster: no seed node reachable\n%!"
```
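
Seed addresses often come from configuration files or environment variables, so it can be worth validating them before attempting to join. The following is a minimal standard-library sketch; `parse_seed` and `partition_seeds` are hypothetical helpers, and the guide does not say whether `Cluster.join` performs its own validation.

```ocaml
(* Hypothetical helper: split "host:port" at the last ':' and validate the port. *)
let parse_seed (s : string) : (string * int, string) result =
  match String.rindex_opt s ':' with
  | None -> Error (Printf.sprintf "%S: expected host:port" s)
  | Some i ->
    let host = String.sub s 0 i in
    let port_str = String.sub s (i + 1) (String.length s - i - 1) in
    (match int_of_string_opt port_str with
     | Some port when host <> "" && port > 0 && port <= 65535 ->
       Ok (host, port)
     | _ -> Error (Printf.sprintf "%S: invalid host or port" s))

(* Separate well-formed seeds from error messages for the malformed ones. *)
let partition_seeds (seeds : string list) : (string * int) list * string list =
  List.partition_map
    (fun s ->
       match parse_seed s with
       | Ok hp -> Either.Left hp
       | Error e -> Either.Right e)
    seeds
```

Only the well-formed entries would then be handed on as seed nodes, with the error list logged for the operator.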

## Messaging

### Broadcast (Gossip)

Use `broadcast` to send data to **all** nodes. Broadcasts are piggybacked on the gossip protocol's membership messages, which makes them bandwidth-efficient, but delivery has higher latency than a direct send and is only eventually consistent.

**Best for:** Configuration updates, low-frequency state sync.

```ocaml
Cluster.broadcast cluster
  ~topic:"config-update"
  ~payload:"{\"version\": 2}"
```
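
To illustrate what piggybacking under a packet-size limit means, here is a small greedy sketch that fills a byte budget with queued payloads. It is not the library's packing code: `pack` is a hypothetical function, and it ignores framing, headers, and encryption overhead, none of which are described in this guide.

```ocaml
(* Greedy sketch: take queued payloads in order until the next one
   would exceed the remaining byte budget.
   Returns (packed, still_queued). *)
let pack ~(budget : int) (queue : string list) : string list * string list =
  let rec go remaining acc = function
    | msg :: rest when String.length msg <= remaining ->
      go (remaining - String.length msg) (msg :: acc) rest
    | rest -> (List.rev acc, rest)
  in
  go budget [] queue

let () =
  (* Three 600-byte payloads against a 1400-byte budget: two fit, one waits. *)
  let payloads = List.init 3 (fun _ -> String.make 600 'x') in
  let packed, waiting = pack ~budget:1400 payloads in
  Printf.printf "packed %d, still queued %d\n"
    (List.length packed) (List.length waiting)
```

Payloads that do not fit simply wait for a later packet, which is one reason broadcast latency grows when the queue is long.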

### Direct Send (Point-to-Point)

Use `send` to deliver a message directly to a specific node over UDP. This path offers higher throughput and lower latency than broadcast.

**Best for:** RPC, high-volume data transfer, direct coordination.

```ocaml
(* Send by Node ID *)
let target_node_id = node_id_of_string "node-2" in
Cluster.send cluster
  ~target:target_node_id
  ~topic:"ping"
  ~payload:"pong"

(* Send by Address (if Node ID unknown) *)
(* The raw string holds the four IPv4 octets; OCaml's "\ddd" escapes are
   decimal, so this is 192.168.1.10. *)
let addr = `Udp (Eio.Net.Ipaddr.of_raw "\192\168\001\010", 7946) in
Cluster.send_to_addr cluster
  ~addr
  ~topic:"alert"
  ~payload:"alert-data"
```
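
Writing raw address bytes by hand is error-prone, so a small conversion from dotted-quad notation can help. The sketch below uses only the standard library; `ipv4_raw_of_string` is a hypothetical helper, and the 4-byte string it returns is assumed to be suitable for passing to `Eio.Net.Ipaddr.of_raw` as in the example above.

```ocaml
(* Hypothetical helper: convert "192.168.1.10" into the 4-byte raw string
   form. Returns None for anything that is not four octets in 0..255. *)
let ipv4_raw_of_string (s : string) : string option =
  match String.split_on_char '.' s |> List.map int_of_string_opt with
  | [Some a; Some b; Some c; Some d]
    when List.for_all (fun x -> x >= 0 && x <= 255) [a; b; c; d] ->
    let octets = [| a; b; c; d |] in
    Some (String.init 4 (fun i -> Char.chr octets.(i)))
  | _ -> None
```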

### Handling Messages

Register a callback to handle incoming messages (both broadcast and direct).

```ocaml
Cluster.on_message cluster (fun sender topic payload ->
  Printf.printf "Received '%s' from %s: %s\n"
    topic
    (node_id_to_string sender.id)
    payload
)
```
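
Since a single callback receives every topic, applications usually route messages by topic themselves. The sketch below shows one way to do that with a hash table. Everything in it is hypothetical and standalone: the `sender` record is a local stand-in (the library's real sender type is not shown in this guide), and `register` and `dispatch` are illustrative names.

```ocaml
(* Local stand-in for the sender value passed to the callback;
   the library's real type is not shown in this guide. *)
type sender = { id : string }

type handler = sender -> string -> unit

let handlers : (string, handler) Hashtbl.t = Hashtbl.create 16

(* Register (or replace) the handler for a topic. *)
let register (topic : string) (h : handler) : unit =
  Hashtbl.replace handlers topic h

(* Same argument order as the callback above: sender, topic, payload.
   Returns false when no handler is registered for the topic. *)
let dispatch (sender : sender) (topic : string) (payload : string) : bool =
  match Hashtbl.find_opt handlers topic with
  | Some h ->
    h sender payload;
    true
  | None -> false
```

In a real program, the function passed to `Cluster.on_message` would convert the library's sender value and call something like `dispatch`, logging or counting messages for unknown topics.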

## Membership Events

Listen for node lifecycle events by reading from the stream returned by `Cluster.events`.

```ocaml
Eio.Fiber.fork ~sw (fun () ->
  let stream = Cluster.events cluster in
  while true do
    match Eio.Stream.take stream with
    | Join node -> Printf.printf "Node joined: %s\n" (node_id_to_string node.id)
    | Leave node -> Printf.printf "Node left: %s\n" (node_id_to_string node.id)
    | Suspect_event node -> Printf.printf "Node suspected: %s\n" (node_id_to_string node.id)
    | Alive_event node -> Printf.printf "Node alive again: %s\n" (node_id_to_string node.id)
    | Update _ -> ()
  done
)
```
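
Beyond logging, a common use of these events is to maintain a local view of which nodes are currently usable. The sketch below folds events into a map. It is standalone and hypothetical: the `event` type here mirrors only the constructor names used above and carries a plain string id, whereas the library's events carry a richer node value.

```ocaml
module SMap = Map.Make (String)

(* Local mirror of the event names used above, carrying only a node id. *)
type event =
  | Join of string
  | Leave of string
  | Suspect_event of string
  | Alive_event of string
  | Update of string

type status = Alive | Suspect

(* Fold one event into the current view of the cluster. *)
let apply (view : status SMap.t) : event -> status SMap.t = function
  | Join id | Alive_event id -> SMap.add id Alive view
  | Suspect_event id -> SMap.add id Suspect view
  | Leave id -> SMap.remove id view
  | Update _ -> view

(* Ids of nodes currently considered alive, in ascending order. *)
let alive_nodes (view : status SMap.t) : string list =
  SMap.fold (fun id st acc -> if st = Alive then id :: acc else acc) view []
  |> List.rev
```

Treating `Suspect` as "do not route new work here yet" is a design choice of this sketch; an application could equally keep sending to suspected nodes until they are reported as gone.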

## Configuration Options

| Field | Default | Description |
|-------|---------|-------------|
| `bind_addr` | `"0.0.0.0"` | Interface to bind UDP/TCP listeners. |
| `bind_port` | `7946` | Port for SWIM protocol. |
| `protocol_interval` | `1.0` | Seconds between probe rounds. Lower values detect failures faster but use more bandwidth. |
| `probe_timeout` | `0.5` | Seconds to wait for an Ack. |
| `indirect_checks` | `3` | Number of peers asked to probe a node indirectly when the direct probe gets no Ack. |
| `udp_buffer_size` | `1400` | Max UDP packet size in bytes (below the typical 1500-byte Ethernet MTU). |
| `secret_key` | (zeros) | 32-byte key for AES-256-GCM. |
| `max_gossip_queue_depth` | `5000` | Max items in the broadcast queue; beyond this, the oldest items are dropped to bound memory use. |
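
When tuning these values, a quick consistency check can catch combinations that are unlikely to work well. The sketch below is a standalone illustration: the `timing` record is a local type rather than the library's configuration type, and the rules are assumptions (for example, that `probe_timeout` should be shorter than `protocol_interval` so a probe can finish within its round), not checks the library is known to perform.

```ocaml
(* Local record with a few fields from the table; not the library's type. *)
type timing = {
  protocol_interval : float;  (* seconds between probe rounds *)
  probe_timeout : float;      (* seconds to wait for an Ack *)
  indirect_checks : int;
  udp_buffer_size : int;      (* bytes *)
}

(* Defaults copied from the table above. *)
let defaults =
  { protocol_interval = 1.0; probe_timeout = 0.5;
    indirect_checks = 3; udp_buffer_size = 1400 }

(* Returns human-readable problems; an empty list means the values
   pass these (assumed) rules. *)
let check (t : timing) : string list =
  List.filter_map
    (fun (bad, msg) -> if bad then Some msg else None)
    [ (t.protocol_interval <= 0.0, "protocol_interval must be positive");
      (t.probe_timeout <= 0.0, "probe_timeout must be positive");
      (t.probe_timeout >= t.protocol_interval,
       "probe_timeout should be shorter than protocol_interval");
      (t.indirect_checks < 0, "indirect_checks must not be negative");
      (* 1500-byte MTU minus 20-byte IPv4 header minus 8-byte UDP header *)
      (t.udp_buffer_size > 1472,
       "udp_buffer_size exceeds the 1472-byte UDP payload of a 1500-byte IPv4 MTU") ]
```

The 1472-byte limit applies to IPv4 over standard Ethernet; networks with tunnels or IPv6 leave less room, which is one reason to stay near the 1400-byte default.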

## Performance Tips

1. **Buffer Pool**: The library uses zero-copy buffer pools. Ensure `send_buffer_count` and `recv_buffer_count` are sufficient for your load (default 16).
2. **Gossip Limit**: Under aggressive broadcasting, `max_gossip_queue_depth` bounds memory use but may drop messages. Use `send` (Direct Send) for high-volume traffic.
3. **Eio**: Run within an Eio domain/switch, as in the `Eio_main.run` / `Eio.Switch.run` example in Getting Started. The library is designed for OCaml 5 multicore.
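
To make the drop-oldest behaviour of a depth-limited queue concrete, here is a minimal sketch built on the standard library's `Queue`. It is an illustration of the policy only, not the library's internal queue; the `bounded` type and its functions are hypothetical.

```ocaml
(* Hypothetical bounded queue: pushing onto a full queue evicts the oldest item. *)
type 'a bounded = {
  max_depth : int;
  items : 'a Queue.t;
  mutable dropped : int;  (* how many items have been evicted so far *)
}

let create (max_depth : int) : 'a bounded =
  if max_depth <= 0 then invalid_arg "max_depth must be positive";
  { max_depth; items = Queue.create (); dropped = 0 }

let push (q : 'a bounded) (x : 'a) : unit =
  if Queue.length q.items >= q.max_depth then begin
    ignore (Queue.pop q.items);  (* evict the oldest entry *)
    q.dropped <- q.dropped + 1
  end;
  Queue.push x q.items

(* Current contents, oldest first. *)
let contents (q : 'a bounded) : 'a list = List.of_seq (Queue.to_seq q.items)
```

Tracking a `dropped` counter, as this sketch does, makes it easy to tell from metrics when broadcast traffic is outrunning the queue.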