upstream: https://github.com/janestreet/memtrace
Add 'memtrace/' from commit 'a88470c3de884182503ba5fcd4729e281e544731'

git-subtree-dir: memtrace
git-subtree-mainline: fb18b2a146dd4d5de44773fc7c9ecb8da544ea32
git-subtree-split: a88470c3de884182503ba5fcd4729e281e544731

Files added (+5317 lines total):
.gitignore (+5)

```
_build
*.install
*.merlin
_opam
```
.ocamlformat (+1)

```
profile=janestreet
```
CHANGES.md (+40)

```markdown
## v0.2.2 (2021-03-03)

- Don't write to trace file from forked child process (avoids
  corruption from concurrent writers)

- Fix a TSDL syntax error

- Bug fixes

## v0.2.1.2 (2020-03-08)

More fixes for 32-bit platforms.

## v0.2.1.1 (2020-02-03)

Fix compilation on 32-bit platforms.

## v0.2.1 (2020-01-27)

Packaging fixes.

## v0.2 (2020-12-08)

- New support for tracing external allocations

- Bugfix for 32-bit builds

- Better error-checking for sampling_rate parameter

## v0.1.2 (2020-09-16)

More packaging fixes.

## v0.1.1 (2020-09-09)

Packaging fixes.

## v0.1 (2020-09-04)

Initial release.
```
CONTRIBUTING.md (+67)

````markdown
This repository contains open source software that is developed and
maintained by [Jane Street][js].

Contributions to this project are welcome and should be submitted via
GitHub pull requests.

Signing contributions
---------------------

We require that you sign your contributions. Your signature certifies
that you wrote the patch or otherwise have the right to pass it on as
an open-source patch. The rules are pretty simple: if you can certify
the below (from [developercertificate.org][dco]):

```
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.
```

Then you just add a line to every git commit message:

```
Signed-off-by: Joe Smith <joe.smith@email.com>
```

Use your real name (sorry, no pseudonyms or anonymous contributions).

If you set your `user.name` and `user.email` git configs, you can sign
your commit automatically with `git commit -s`.

[dco]: http://developercertificate.org/
[js]: https://opensource.janestreet.com/
````
LICENSE.md (+21)

```
The MIT License

Copyright (c) 2020--2026 Jane Street Group, LLC <opensource-contacts@janestreet.com>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```
Makefile (+17)

```makefile
INSTALL_ARGS := $(if $(PREFIX),--prefix $(PREFIX),)

default:
	dune build

install:
	dune install $(INSTALL_ARGS)

uninstall:
	dune uninstall $(INSTALL_ARGS)

reinstall: uninstall install

clean:
	dune clean

.PHONY: default install uninstall reinstall clean
```
README.md (+25)

```markdown
# memtrace

A streaming client for OCaml's Memprof, which generates compact traces
of a program's memory use.

To profile the memory use of a program, start by putting this line
somewhere at the program startup:

    Memtrace.trace_if_requested ~context:Version_util.version ();;

If the `MEMTRACE` environment variable is present, tracing begins to
the filename it specifies. (If it's absent, nothing happens.)

The `~context` parameter is optional, and can be set to any string that
helps to identify the trace file.

If the program daemonises, the call to `trace_if_requested` should
occur *after* the program forks, to ensure the right process is
traced.

The resulting trace files can be analysed with some simple
command-line tools in bin/, but the recommended interface is the
memtrace viewer, which lives in:

    jane/app/memtrace_viewer
```
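The environment-variable workflow the README describes can be sketched from a shell. This is a usage sketch, not part of the repository: `my_program` and the trace path are placeholder names standing in for a binary that links memtrace and calls `trace_if_requested`.

```shell
# Stand-in for an instrumented binary: a real program that calls
# Memtrace.trace_if_requested at startup would begin writing a trace
# to the file named by MEMTRACE. `my_program` is a hypothetical name;
# `sh -c` here just demonstrates that the child process sees the variable.
MEMTRACE=/tmp/my_program.ctf sh -c 'echo "tracing to: $MEMTRACE"'
# → tracing to: /tmp/my_program.ctf
```

With `MEMTRACE` unset, the same invocation does nothing, which is what makes the hook safe to leave in production binaries.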
bin/dump_trace.ml (+49)

```ocaml
open Memtrace.Trace

let dump filename =
  let trace = Reader.open_ ~filename in
  Reader.iter trace (fun time ev ->
    Printf.printf "%010Ld " (Timedelta.to_int64 time);
    match ev with
    | Alloc
        { obj_id
        ; length
        ; domain
        ; nsamples
        ; source
        ; backtrace_buffer
        ; backtrace_length
        ; common_prefix
        } ->
      let src =
        match source with
        | Minor -> "alloc"
        | Major -> "alloc_major"
        | External -> "alloc_ext"
      in
      Printf.printf
        "%010d %s %d len=%d dom=%d % 4d:"
        (obj_id :> int)
        src
        nsamples
        length
        (domain :> int)
        common_prefix;
      let print_location ppf loc = Printf.fprintf ppf "%s" (Location.to_string loc) in
      for i = 0 to backtrace_length - 1 do
        let s = backtrace_buffer.(i) in
        match Reader.lookup_location_code trace s with
        | [] -> Printf.printf " $%d" (s :> int)
        | ls -> ls |> List.iter (Printf.printf " %a" print_location)
      done;
      Printf.printf "\n%!"
    | Promote (id, _) -> Printf.printf "%010d promote\n" (id :> int)
    | Collect (id, _) -> Printf.printf "%010d collect\n" (id :> int));
  Reader.close trace
;;

let () =
  if Array.length Sys.argv <> 2
  then Printf.fprintf stderr "Usage: %s <trace file>\n" Sys.executable_name
  else dump Sys.argv.(1)
;;
```
bin/dune (+6)

```
(executables
 (modes byte exe)
 (names dump_trace identify subsample flamegraph hotspots strip_names)
 (libraries memtrace unix)
 (preprocess
  (pps ppx_jane)))
```
bin/flamegraph.ml (+119)

```ocaml
open Memtrace.Trace

module StrTbl = Hashtbl.Make (struct
    type t = string

    let equal = String.equal
    let hash = Hashtbl.hash
  end)

type summary =
  { mutable samples : int
  ; subsums : summary StrTbl.t
  }

let summary filename =
  let summary = { samples = 0; subsums = StrTbl.create 20 } in
  let count (filenames, nsamples) =
    let lastsum =
      List.fold_left
        (fun sum f ->
          if StrTbl.mem sum.subsums f
          then StrTbl.find sum.subsums f
          else (
            let s = { samples = 0; subsums = StrTbl.create 10 } in
            StrTbl.add sum.subsums f s;
            s))
        summary
        filenames
    in
    lastsum.samples <- lastsum.samples + nsamples
  in
  let allocs = Hashtbl.create 20 in
  let sz = ref 0 in
  let nallocs = ref 0 in
  let trace = Reader.open_ ~filename in
  Reader.iter trace (fun _time ev ->
    match ev with
    | Alloc
        { obj_id
        ; length = _
        ; domain = _
        ; nsamples
        ; source = _
        ; backtrace_buffer
        ; backtrace_length
        ; common_prefix = _
        } ->
      let str_of_location (l : Location.t) =
        l.defname
        (*Printf.sprintf "%s:%d" filename line*)
      in
      let _print_location ppf Location.{ filename; line; start_char; end_char; _ } =
        Printf.fprintf ppf "%s:%d:%d-%d" filename line start_char end_char
      in
      let filenames =
        List.concat
          (Array.sub backtrace_buffer 0 backtrace_length
           |> Array.map (fun l ->
             let locs = Reader.lookup_location_code trace l in
             List.map (fun (Location.{ filename = _; _ } as l) -> str_of_location l) locs)
           |> Array.to_list)
      in
      let seen = StrTbl.create 10 in
      let rec dedup = function
        | [] -> []
        | [ x ] -> [ x ]
        | x :: x' :: xs when x = x' -> dedup (x :: xs)
        | x :: xs -> x :: dedup xs
      in
      let filenames = dedup filenames in
      let first_filenames = (*List.rev*) filenames in
      let first_filenames =
        first_filenames
        |> List.filter (fun f ->
          if StrTbl.mem seen f
          then false
          else (
            StrTbl.add seen f ();
            true))
      in
      Hashtbl.add allocs obj_id (first_filenames, nsamples);
      sz := !sz + backtrace_length;
      incr nallocs;
      if true then count (first_filenames, nsamples)
      (* count (first_filenames, nsamples) *)
      (* first_filenames |> List.iter (Printf.printf " %s");
         Printf.printf "\n%!"*)
    | Promote _ -> ()
    (*count (Hashtbl.find allocs i)*)
    | Collect (i, _) ->
      assert (Hashtbl.mem allocs i);
      Hashtbl.remove allocs i);
  Reader.close trace;
  let rec dump_summary files_rev summary =
    if summary.samples > 0
    then (
      match List.rev files_rev with
      | [] -> ()
      | [ _ ] -> ()
      | x :: xs ->
        Printf.printf "%s" x;
        List.iter (Printf.printf ";%s") xs;
        Printf.printf " %d\n" summary.samples);
    let keys =
      StrTbl.fold (fun k _ ks -> k :: ks) summary.subsums [] |> List.sort String.compare
    in
    keys
    |> List.iter (fun f ->
      let s = StrTbl.find summary.subsums f in
      dump_summary (f :: files_rev) s)
  in
  dump_summary [] summary
;;

let () =
  if Array.length Sys.argv <> 2
  then Printf.fprintf stderr "Usage: %s <trace file>\n" Sys.executable_name
  else summary Sys.argv.(1)
;;
```
bin/hotspots.ml (+330)

```ocaml
open Memtrace.Trace

module Heavy_hitters (A : Hashtbl.HashedType) : sig
  type t

  val make : int -> t
  val add : t -> A.t -> unit
  val length : t -> int
  val iter : t -> (A.t -> int -> int -> unit) -> unit
end = struct
  module Tbl = Hashtbl.Make (A)

  type counter =
    { (* (hits - misses) is the usual Misra-Gries summary *)
      mutable hits : int
    ; mutable misses : int
    ; (* number of elements added before this one (including duplicates) *)
      added_before : int
    ; (* number of elements skipped before this one was added *)
      skipped_before : int
    }

  (*let upper_bound t c =
    c.skipped_before
  *)
  type t =
    { k : int
    ; tbl : counter Tbl.t
    ; mutable len : int
    ; mutable added : int
    ; mutable skipped : int
    }

  let make k = { k; tbl = Tbl.create k; len = 0; added = 0; skipped = 0 }

  let _check t =
    let hits = ref 0 in
    t.tbl |> Tbl.iter (fun _k c -> hits := !hits + c.hits);
    assert (t.len = Tbl.length t.tbl);
    assert (!hits + t.skipped = t.added);
    ()
  ;;

  let add t x =
    (* check t;*)
    (match Tbl.find_opt t.tbl x with
     | Some c -> c.hits <- c.hits + 1
     | None ->
       if t.len < t.k
       then (
         t.len <- t.len + 1;
         Tbl.add
           t.tbl
           x
           { hits = 1; misses = 0; added_before = t.added; skipped_before = t.skipped })
       else (
         t.skipped <- t.skipped + 1;
         t.tbl
         |> Tbl.filter_map_inplace (fun _k c ->
           c.misses <- c.misses + 1;
           if c.hits > c.misses
           then Some c
           else (
             t.len <- t.len - 1;
             t.skipped <- t.skipped + c.hits;
             None))));
    t.added <- t.added + 1
  ;;

  let length t = t.added
  let iter t f = t.tbl |> Tbl.iter (fun x c -> f x c.hits c.skipped_before)
end

module Loc_tbl = Hashtbl.Make (struct
    type t = Location_code.t

    let hash (x : Location_code.t) = ((x :> int) * 218854569) lsr 17
    let equal (x : Location_code.t) (y : Location_code.t) = x = y
  end)

module Str_tbl = Hashtbl.Make (struct
    type t = string

    let hash = Hashtbl.hash
    let equal = String.equal
  end)

type loc_entry =
  { line : int
  ; start_ch : int
  ; end_ch : int
  ; func : func
  ; mutable alloc_count : int
  }

and func =
  { id : int
  ; name : string
  ; filename : string
  ; mutable locs : loc_entry list
  ; mutable total_count : int
  ; mutable n_allocs : int
  ; mutable total_dist_to_alloc : int
  }

module Func_tbl = Hashtbl.Make (struct
    type t = func

    let hash (f : func) = f.id * 21089245
    let equal (f : func) (g : func) = f.id = g.id
  end)

let total_allocs (f : func) =
  List.fold_left (fun k e -> k + e.alloc_count) f.total_count f.locs
;;

let direct_allocs (f : func) = List.fold_left (fun k e -> k + e.alloc_count) 0 f.locs

let avg_dist_to_alloc (f : func) =
  float_of_int f.total_dist_to_alloc /. float_of_int f.n_allocs
;;

type loc_table =
  { entries : loc_entry Loc_tbl.t
  ; funcs : (string * string, func) Hashtbl.t
  ; trace : Reader.t
  ; mutable next_id : int
  }

let new_loc_table trace =
  { entries = Loc_tbl.create 10000; funcs = Hashtbl.create 10000; trace; next_id = 0 }
;;

let rec describe_location ?(max_discard = 2) trace buf i =
  match Reader.lookup_location_code trace buf.(i) with
  | [] when i > 0 && max_discard > 0 ->
    describe_location ~max_discard:(max_discard - 1) trace buf (i - 1)
  | [] -> "??", Printf.sprintf "#%x" (buf.(i) :> int), 0, 0, 0
  | locs ->
    let l = List.nth locs (List.length locs - 1) in
    l.filename, l.defname, l.line, l.start_char, l.end_char
;;

let add_loc t buf i =
  let loc = buf.(i) in
  match Loc_tbl.find t.entries loc with
  | e -> e
  | exception Not_found ->
    let filename, funcname, line, start_ch, end_ch = describe_location t.trace buf i in
    let func =
      match Hashtbl.find t.funcs (filename, funcname) with
      | func -> func
      | exception Not_found ->
        let id = t.next_id in
        t.next_id <- t.next_id + 1;
        let func : func =
          { id
          ; filename
          ; name = funcname
          ; locs = []
          ; total_count = 0
          ; n_allocs = 0
          ; total_dist_to_alloc = 0
          }
        in
        Hashtbl.add t.funcs (filename, funcname) func;
        func
    in
    let entry = { line; start_ch; end_ch; func; alloc_count = 0 } in
    Loc_tbl.add t.entries loc entry;
    func.locs <- entry :: func.locs;
    entry
;;

module HH = Heavy_hitters (struct
    type t = func * func

    let hash ((a, b) : t) = (a.id * 1231441) + (b.id * 3821)
    let equal ((a, b) : t) ((a', b') : t) = a.id = a'.id && b.id = b'.id
    (*
       let hash ((a : location_code), (b : location_code)) =
         Int64.(shift_right (add (mul (a :> int64) 0x94837298472a9321L) (mul (b :> int64) 0x4783213feac37L)) 11 |> to_int)
       let equal (a, b) (a', b') =
         Int64.equal (a : location_code :> int64) (a' : location_code :> int64) &&
         Int64.equal (b : location_code :> int64) (b' : location_code :> int64)*)
  end)

let count filename =
  let trace = Reader.open_ ~filename in
  let hh = HH.make 10000 in
  let seen = Func_tbl.create 100 in
  let locs = new_loc_table trace in
  let total_samples = ref 0 in
  Reader.iter trace (fun _time ev ->
    match ev with
    | Alloc
        { obj_id = _
        ; length = _
        ; domain = _
        ; nsamples
        ; source = _
        ; backtrace_buffer
        ; backtrace_length
        ; common_prefix = _
        } ->
      let allocpt = add_loc locs backtrace_buffer (backtrace_length - 1) in
      allocpt.alloc_count <- allocpt.alloc_count + nsamples;
      Func_tbl.clear seen;
      for i' = 0 to backtrace_length - 2 do
        let i = backtrace_length - 2 - i' in
        let b = (add_loc locs backtrace_buffer i).func in
        if not (Func_tbl.mem seen b)
        then (
          Func_tbl.add seen b ();
          b.total_count <- b.total_count + nsamples;
          b.n_allocs <- b.n_allocs + 1;
          b.total_dist_to_alloc <- b.total_dist_to_alloc + (backtrace_length - 1 - i);
          HH.add hh (b, allocpt.func))
      done;
      total_samples := !total_samples + nsamples
    | Promote _ -> ()
    | Collect _ -> ());
  let tinfo = Reader.info trace in
  Reader.close trace;
  let total_samples = !total_samples in
  let wordsize = 8. in
  (* FIXME: store this in the trace *)
  let print_bytes ppf = function
    | n when n < 100. -> Printf.fprintf ppf "%4.0f B" n
    | n when n < 100. *. 1024. -> Printf.fprintf ppf "%4.1f kB" (n /. 1024.)
    | n when n < 100. *. 1024. *. 1024. ->
      Printf.fprintf ppf "%4.1f MB" (n /. 1024. /. 1024.)
    | n when n < 100. *. 1024. *. 1024. *. 1024. ->
      Printf.fprintf ppf "%4.1f GB" (n /. 1024. /. 1024. /. 1024.)
    | n -> Printf.fprintf ppf "%4.1f TB" (n /. 1024. /. 1024. /. 1024. /. 1024.)
  in
  Printf.printf
    "Trace for %s [%Ld]:\n %d samples of %a allocations\n\n"
    tinfo.executable_name
    tinfo.pid
    total_samples
    print_bytes
    (float_of_int total_samples /. tinfo.sample_rate *. wordsize);
  let hot_allocs = Func_tbl.create 100 in
  HH.iter hh (fun (fn, al) d1 _d2 ->
    let alloc_prop = float_of_int (direct_allocs al) /. float_of_int total_samples in
    if alloc_prop > 0.005
    then (
      let callers =
        match Func_tbl.find hot_allocs al with
        | c -> c
        | exception Not_found ->
          let c = ref [] in
          Func_tbl.add hot_allocs al c;
          c
      in
      let pair_freq = float_of_int d1 in
      let fn_freq = float_of_int (total_allocs fn) in
      if pair_freq /. fn_freq > max 0.10 (alloc_prop *. 1.2)
         && pair_freq /. float_of_int total_samples > 0.005
      then callers := (fn, d1) :: !callers));
  let hot_allocs = Func_tbl.fold (fun al c acc -> (al, !c) :: acc) hot_allocs [] in
  let hot_allocs =
    List.sort
      (fun (al, _) (al', _) -> compare (direct_allocs al') (direct_allocs al))
      hot_allocs
  in
  hot_allocs
  |> List.iter (fun (al, callers) ->
    let freq = float_of_int (direct_allocs al) /. float_of_int total_samples in
    let bytes = float_of_int (direct_allocs al) /. tinfo.sample_rate *. wordsize in
    Printf.printf "%a (%4.1f%%) at %s" print_bytes bytes (100. *. freq) al.name;
    Printf.printf " (%s:" al.filename;
    let printed = ref 0 in
    let locs =
      al.locs |> List.sort (fun l1 l2 -> compare l2.alloc_count l1.alloc_count)
    in
    locs
    |> List.iter (fun { line; start_ch; end_ch; alloc_count; func = _ } ->
      if alloc_count > 0
      then (
        (match !printed with
         | n when n < 3 ->
           if n > 0 then Printf.printf ", ";
           Printf.printf "%d:%d-%d" line start_ch end_ch
         | 3 -> Printf.printf "..."
         | _ -> ());
        incr printed));
    Printf.printf ")\n";
    let first_caller = ref true in
    let callers =
      List.sort
        (fun (_c, f) (_c', f') ->
          (*compare (avg_dist_to_alloc c) (avg_dist_to_alloc c')*)
          (* compare (float_of_int f' /. float_of_int (total_allocs c')) (float_of_int f /. float_of_int (total_allocs c)) *)
          compare f' f)
        callers
    in
    callers
    |> List.iter (fun (caller, freq) ->
      let rfreq = float_of_int freq /. float_of_int (total_allocs caller) in
      (* let pfreq = float_of_int freq /. float_of_int total_samples in *)
      let bytes = float_of_int freq /. tinfo.sample_rate *. wordsize in
      if float_of_int (total_allocs caller) /. float_of_int total_samples > 0.005
      then (
        if !first_caller then Printf.printf " including:\n";
        first_caller := false;
        Printf.printf
          " %a via %s (%.0f%% of this)\n"
          print_bytes
          bytes
          caller.name
          (100. *. rfreq)));
    Printf.printf "\n")
;;

(*
   let pair_freq = d1 in
   let tot_freq = summarize_fn fn in
   if 100 * pair_freq / tot_freq > 20 then begin
     let frac_of_program = float_of_int pair_freq /. float_of_int !total_samples in
     Printf.printf "%.1f%% % 6d % 6d % 6d %d %s %s\n" (100. *. rf) !total_samples d1 d2 tot_freq fn.name al.name
   end)
*)

let () =
  if Array.length Sys.argv <> 2
  then Printf.fprintf stderr "Usage: %s <trace file>\n" Sys.executable_name
  else count Sys.argv.(1)
;;
```
bin/identify.ml (+160)

```ocaml
open Memtrace.Trace

let print_duration ppf = function
  | n -> Printf.fprintf ppf "%.3f s" n
;;

let print_bytes ppf = function
  | n when n < 1000. -> Printf.fprintf ppf "%3.0f B" n
  | n when n < 1000. *. 1024. ->
    let n = n /. 1024. in
    Printf.fprintf ppf "%3.*f kB" (if n < 10. then 1 else 0) n
  | n when n < 1000. *. 1024. *. 1024. ->
    let n = n /. 1024. /. 1024. in
    Printf.fprintf ppf "%3.*f MB" (if n < 10. then 1 else 0) n
  | n when n < 1000. *. 1024. *. 1024. *. 1024. ->
    let n = n /. 1024. /. 1024. /. 1024. in
    Printf.fprintf ppf "%3.*f GB" (if n < 10. then 1 else 0) n
  | n ->
    let n = n /. 1024. /. 1024. /. 1024. /. 1024. in
    Printf.fprintf ppf "%3.*f TB" (if n < 10. then 1 else 0) n
;;

let identify filename =
  let trace = Reader.open_ ~filename in
  let info = Reader.info trace in
  Printf.printf
    "Trace file %s (%a)\n"
    filename
    print_bytes
    (Int64.to_float (Reader.size_bytes trace));
  let start_time = Timestamp.to_int64 info.start_time in
  let tm : Unix.tm = Unix.gmtime (Int64.to_float start_time *. 1e-6) in
  let days = [| "Sun"; "Mon"; "Tue"; "Wed"; "Thu"; "Fri"; "Sat" |] in
  let months =
    [| "Jan"
     ; "Feb"
     ; "Mar"
     ; "Apr"
     ; "May"
     ; "Jun"
     ; "Jul"
     ; "Aug"
     ; "Sep"
     ; "Oct"
     ; "Nov"
     ; "Dec"
    |]
  in
  Printf.printf " of %s\n" info.executable_name;
  Printf.printf
    " at %s %s %d %02d:%02d:%02d %d UTC\n"
    days.(tm.tm_wday)
    months.(tm.tm_mon)
    tm.tm_mday
    tm.tm_hour
    tm.tm_min
    tm.tm_sec
    (tm.tm_year + 1900);
  Printf.printf
    " on host %s (%d-bit), pid %Ld, sample rate %.1e\n"
    info.host_name
    info.word_size
    info.pid
    info.sample_rate;
  Printf.printf " with OCaml GC params %s\n" info.ocaml_runtime_params;
  (match info.context with
   | None -> ()
   | Some c -> Printf.printf " with context: %s\n" c);
  Printf.printf "Trace statistics:\n%!";
  let tmax = ref 0L in
  let minor_live = Obj_id.Tbl.create 100 in
  (* As an optimisation, only store allocations with nsamples > 1 *)
  let major_live_multisampled = Obj_id.Tbl.create 100 in
  let minor_alloc = ref 0
  and minor_collect = ref 0
  and promote = ref 0 in
  let major_alloc = ref 0
  and major_collect = ref 0 in
  let dist4 = ref 0
  and dist16 = ref 0
  and dist256 = ref 0
  and szmax = ref 0 in
  Reader.iter ~parse_backtraces:false trace (fun t ev ->
    tmax := Timedelta.to_int64 t;
    match ev with
    | Alloc { obj_id; length; nsamples; source; _ } ->
      if source <> Minor
      then (
        major_alloc := !major_alloc + nsamples;
        if nsamples > 1 then Obj_id.Tbl.add major_live_multisampled obj_id nsamples)
      else (
        minor_alloc := !minor_alloc + nsamples;
        Obj_id.Tbl.add minor_live obj_id nsamples);
      if length <= 4 then dist4 := !dist4 + nsamples;
      if length <= 16 then dist16 := !dist16 + nsamples;
      if length <= 256 then dist256 := !dist256 + nsamples;
      if length > !szmax then szmax := length
    | Promote (id, _) ->
      assert (Obj_id.Tbl.mem minor_live id);
      let nsamples = Obj_id.Tbl.find minor_live id in
      Obj_id.Tbl.remove minor_live id;
      promote := !promote + nsamples;
      if nsamples > 1 then Obj_id.Tbl.add major_live_multisampled id nsamples
    | Collect (id, _) ->
      if Obj_id.Tbl.mem minor_live id
      then (
        let nsamples = Obj_id.Tbl.find minor_live id in
        Obj_id.Tbl.remove minor_live id;
        minor_collect := !minor_collect + nsamples)
      else (
        let nsamples =
          match Obj_id.Tbl.find major_live_multisampled id with
          | n ->
            Obj_id.Tbl.remove major_live_multisampled id;
            n
          | exception Not_found -> 1
        in
        major_collect := !major_collect + nsamples));
  let duration = Int64.to_float !tmax *. 1e-6 in
  let bytes nsamples =
    float_of_int nsamples /. info.sample_rate *. float_of_int (info.word_size / 8)
  in
  let minor_alloc = bytes !minor_alloc
  and _minor_collect = bytes !minor_collect
  and promote = bytes !promote in
  let major_alloc = bytes !major_alloc
  and major_collect = bytes !major_collect in
  let total_alloc = minor_alloc +. major_alloc in
  let dist4 = bytes !dist4
  and dist16 = bytes !dist16
  and dist256 = bytes !dist256 in
  Printf.printf
    " Minor: %a/s allocations, %.1f%% promoted\n"
    print_bytes
    (minor_alloc /. duration)
    (100. *. promote /. minor_alloc);
  Printf.printf
    " Major: %a/s allocations (%a/s direct), %a/s collections\n"
    print_bytes
    ((major_alloc +. promote) /. duration)
    print_bytes
    (major_alloc /. duration)
    print_bytes
    (major_collect /. duration);
  Printf.printf
    " Sizes (by word): %.0f%% <= 4w, %.0f%% <= 16w, %.0f%% <= 256w. Max: %a\n"
    (100. *. dist4 /. total_alloc)
    (100. *. dist16 /. total_alloc)
    (100. *. dist256 /. total_alloc)
    print_bytes
    (float_of_int (!szmax * info.word_size / 8));
  Printf.printf " Runtime: %a\n" print_duration duration;
  Reader.close trace
;;

let () =
  if Array.length Sys.argv <> 2
  then Printf.fprintf stderr "Usage: %s <trace file>\n" Sys.executable_name
  else identify Sys.argv.(1)
;;
```
bin/strip_names.ml (+141)

```ocaml
module Anon_trace = struct
  type obj_id = Memtrace.Trace.Obj_id.t
  type dom_id = Memtrace.Trace.Domain_id.t
  type timestamp = Memtrace.Trace.Timestamp.t
  type timedelta = Memtrace.Trace.Timedelta.t

  type allocation_source = Memtrace.Trace.Allocation_source.t =
    | Minor
    | Major
    | External

  type event =
    | Alloc of
        { obj_id : obj_id
        ; domain : dom_id
        ; length : int
        ; nsamples : int
        ; source : allocation_source
        }
    | Promote of obj_id * dom_id
    | Collect of obj_id * dom_id

  type trace_info =
    { sample_rate : float
    ; word_size : int
    ; ocaml_runtime_params : string
    }
end

module Anon_writer : sig
  type t

  val create : Unix.file_descr -> Anon_trace.trace_info -> t
  val put_event : t -> Anon_trace.timedelta -> Anon_trace.event -> unit
  val flush : t -> unit
end = struct
  open Anon_trace
  module Trace = Memtrace.Trace
  module Writer = Trace.Writer.Multiplexed_domains

  type t = Writer.t

  let start_time = Trace.Timestamp.of_int64 0L

  let create fd info =
    let { sample_rate; word_size; ocaml_runtime_params } = info in
    let info : Trace.Info.t =
      { sample_rate
      ; word_size
      ; ocaml_runtime_params
      ; executable_name = "<anonymous>"
      ; host_name = "<anonymous>"
      ; pid = 0L
      ; initial_domain = Trace.Domain_id.main_domain
      ; start_time
      ; context = None
      }
    in
    Writer.create fd ~getpid:(fun () -> 0L) info
  ;;

  let put_event w time ev =
    let ev : Trace.Event.t =
      match ev with
      | Alloc { obj_id; length; domain; nsamples; source } ->
        Alloc
          { obj_id
          ; length
          ; domain
          ; nsamples
          ; source
          ; backtrace_buffer = [||]
          ; backtrace_length = 0
          ; common_prefix = 0
          }
      | Promote (id, dom) -> Promote (id, dom)
      | Collect (id, dom) -> Collect (id, dom)
    in
    Writer.put_event
      w
      (Trace.Timedelta.offset start_time time)
      ev
      ~decode_callstack_entry:(fun _ -> assert false)
  ;;

  let flush = Writer.flush
end

module Anon_reader : sig
  type t

  val open_ : filename:string -> t
  val info : t -> Anon_trace.trace_info
  val iter : t -> (Anon_trace.timedelta -> Anon_trace.event -> unit) -> unit
  val close : t -> unit
end = struct
  open Anon_trace
  module Trace = Memtrace.Trace
  module Reader = Trace.Reader

  type t = Reader.t

  let open_ = Reader.open_

  let info r =
    let { sample_rate; word_size; ocaml_runtime_params; _ } : Trace.Info.t =
      Reader.info r
    in
    { sample_rate; word_size; ocaml_runtime_params }
  ;;

  let iter r f =
    Reader.iter r (fun time (ev : Trace.Event.t) ->
      let ev =
        match ev with
        | Alloc { obj_id; length; domain; nsamples; source; _ } ->
          Alloc { obj_id; length; domain; nsamples; source }
        | Promote (id, dom) -> Promote (id, dom)
        | Collect (id, dom) -> Collect (id, dom)
      in
      f time ev)
  ;;

  let close = Reader.close
end

let strip_names infile outfile =
  let r = Anon_reader.open_ ~filename:infile in
  let wfd = Unix.openfile outfile [ O_CREAT; O_WRONLY; O_TRUNC ] 0o600 in
  let w = Anon_writer.create wfd (Anon_reader.info r) in
  Anon_reader.iter r (Anon_writer.put_event w);
  Anon_writer.flush w;
  Unix.close wfd;
  Anon_reader.close r
;;

let () =
  if Array.length Sys.argv <> 3
  then Printf.fprintf stderr "Usage: %s <input> <output>\n" Sys.executable_name
  else strip_names Sys.argv.(1) Sys.argv.(2)
;;
```
bin/subsample.ml (+78)

```ocaml
open Memtrace.Trace

let copy inf outf fact tstart tend =
  let id_remap = Obj_id.Tbl.create 20 in
  let r = Reader.open_ ~filename:inf in
  let wfd = Unix.openfile outf [ O_CREAT; O_WRONLY; O_TRUNC ] 0o600 in
  let info = Reader.info r in
  let info = { info with sample_rate = info.sample_rate /. float_of_int fact } in
  let pid = info.pid in
  let w = Writer.Multiplexed_domains.create wfd ~getpid:(fun () -> pid) info in
  Reader.iter r (fun now ev ->
    let ev : Event.t option =
      match ev with
      | Alloc ({ obj_id; _ } as e) ->
        let now_i = Int64.to_int (Int64.div (Timedelta.to_int64 now) 1_000_000L) in
        if tstart <= now_i && now_i < tend
        then (
          let samples = ref 0 in
          for _i = 1 to e.nsamples do
            if Random.int fact = 0 then incr samples
          done;
          if !samples > 0
          then (
            let id = Writer.Multiplexed_domains.next_alloc_id w ~domain:e.domain in
            let ev = Event.Alloc { e with obj_id = id; nsamples = !samples } in
            Obj_id.Tbl.add id_remap obj_id id;
            Some ev)
          else None)
        else None
      | Promote (id, dom) ->
        if Obj_id.Tbl.mem id_remap id
        then Some (Promote (Obj_id.Tbl.find id_remap id, dom))
        else None
      | Collect (id, dom) ->
        if Obj_id.Tbl.mem id_remap id
        then (
          let id' = Obj_id.Tbl.find id_remap id in
          Obj_id.Tbl.remove id_remap id;
          Some (Event.Collect (id', dom)))
        else None
    in
    match ev with
    | None -> ()
    | Some ev ->
      Writer.Multiplexed_domains.put_event
        w
        ~decode_callstack_entry:(fun loc -> Reader.lookup_location_code r loc)
        (Timedelta.offset info.start_time now)
        ev);
  Reader.close r;
  Writer.Multiplexed_domains.flush w;
  Unix.close wfd
;;

let parseint f =
  match int_of_string f with
  | n -> n
  | exception _ ->
    Printf.fprintf stderr "argument must be an integer (got %s)\n" f;
    exit 1
;;

let () =
  match Sys.argv with
  | [| _; inf; outf; fact |] ->
    let fact = parseint fact in
    copy inf outf fact min_int max_int
  | [| _; inf; outf; fact; tstart; tend |] ->
    let fact = parseint fact in
    let tstart = parseint tstart in
    let tend = parseint tend in
    copy inf outf fact tstart tend
  | _ ->
    Printf.fprintf
      stderr
      "usage: subsample <in> <out> <subsampling factor> [<start time> <end time>]\n%!";
    exit 1
;;
```
+367
docs/internal.md
··· 1 + # The enthusiast's guide to Memtrace 2 + 3 + Memtrace is a tracing system for profiling the memory usage and 4 + allocation patterns of OCaml programs. This guide explains how it 5 + works internally, and describes some of the efficiency tricks it 6 + uses. (If your goal is learning to use Memtrace rather than learning 7 + how it works, this is not the document for you). 8 + 9 + Memtrace files consist of a sequence of timestamped GC events: 10 + allocations, promotions and collections. Each allocation event carries 11 + a backtrace, captured at the time of the allocation. Each promotion 12 + or collection event is associated with a specific allocation, allowing 13 + analysis of object lifetimes. 14 + 15 + A common source of trouble with tracing systems is that the resulting 16 + trace files can grow large, making them annoying to transport and slow 17 + to process. Traces can also be fragile, requiring the exact original 18 + binary (and any loaded shared libraries or plugins) to interpret 19 + correctly. 20 + 21 + Memtrace was specifically designed to avoid these issues, producing 22 + compact self-contained traces. It's instructive to compare the trace 23 + sizes with those produced by `perf record`. Perf is a very different 24 + system from memtrace, sampling based on time rather than GC 25 + behaviour. However, the resulting trace files are similar, as they 26 + also consist of a sequence of timestamped events with associated 27 + backtraces. 28 + 29 + The graph below shows the trace sizes in average bytes / sample, 30 + across several runs of a benchmark with 31 + different sample rates. The comparison is a little unfair to memtrace: 32 + first, memtrace stores more information per sample than perf 33 + (e.g. object length, whether an allocation was in the major heap). 
34 + Second, each sample in memtrace may produce up to three concrete 35 + timestamped events (allocation + promotion + collection), and the 36 + sizes shown below are the sum of the three, compared to perf's single 37 + event per sample. 38 + 39 + ![Sizes of memtrace vs. perf traces](memtrace-vs-perf-sizes.png) 40 + 41 + As we can see, at the same sample rate memtrace traces are more than 42 + an order of magnitude smaller than perf's, no matter which of perf's 43 + three backtrace mechanisms (`lbr`, `fp` or `dwarf`) is in use. When 44 + perf's backtraces are disabled entirely (`perf-none` above), perf 45 + still produces somewhat larger files than memtrace, due to a less 46 + efficient binary format. 47 + 48 + To analyse these traces, perf requires access to the original 167 MB 49 + binary. Memtrace, on the other hand, stores enough debug info inline 50 + in the trace to recover full location information without needing the 51 + binary. In the benchmarks above, the total size of this debug info 52 + ranges from 10 KB to 70 KB (and is not shown in the graph above). 53 + 54 + There are two aspects to tracing with Memtrace: sampling allocations, 55 + and efficiently encoding the sampled data. 56 + 57 + 58 + ## Sampling allocations with Gc.Memprof 59 + 60 + The sampling engine behind memtrace is OCaml's new `Gc.Memprof` 61 + module. 62 + 63 + `Gc.Memprof` accepts a *sampling rate* and a set of callbacks. The 64 + sampling rate is in units of probability: at rate `1e-5`, each 65 + allocated word has a one in 10^5 chance of being sampled. Since the 66 + sampling rate is per word rather than per allocation, larger 67 + allocations are more likely to be sampled. 68 + 69 + Since each word allocated has an independent chance of being 70 + sampled, the number of words until the next sample is distributed 71 + according to a geometric distribution. `Gc.Memprof` draws from this 72 + geometric distribution, and takes a sample after that much memory has 73 + been allocated. 
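This draw can be sketched in a few lines. The following is an illustration of inverse-CDF sampling from the geometric distribution, not `Gc.Memprof`'s actual code (the function name is made up):

```ocaml
(* Illustration only, not Gc.Memprof's implementation: draw the number
   of words until the next sample, given a per-word sampling
   probability [rate], by inverse-CDF sampling from the geometric
   distribution. *)
let words_to_next_sample ~rate =
  let u = Random.float 1.0 in
  (* Guard against taking the log of zero. *)
  let u = if u = 0.0 then epsilon_float else u in
  max 1 (int_of_float (ceil (log u /. log (1.0 -. rate))))
```

At rate `1e-5` the mean gap is around 10^5 words, so the sampling machinery runs only rarely.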
74 + 75 + OCaml uses a *bump-pointer* allocator on its minor heap: allocations 76 + work by subtracting the desired amount of memory from the minor heap 77 + pointer, and comparing to the minor heap limit. Rather than 78 + introducing an extra branch, `Gc.Memprof` works by changing the minor 79 + heap limit to the position of the next sample, as though the minor 80 + heap ended at that point. 81 + 82 + Eventually, an allocation will fail when it hits this limit, using 83 + OCaml's existing heap-limit check. `Gc.Memprof` then invokes a callback, 84 + and resets the minor heap limit by re-drawing from the geometric 85 + distribution. This minimises overhead: the only branch needed is the 86 + heap-limit check that's already present. 87 + 88 + When an allocation is sampled, it's added to an internal GC data 89 + structure so that `Gc.Memprof` can invoke callbacks when it is 90 + promoted to the major heap and/or collected. This way, the entire 91 + lifetime of a sampled object is visible. 92 + 93 + 94 + ### Comballoc 95 + 96 + OCaml's `comballoc` optimisation introduces a major complication to 97 + this scheme. Multiple consecutive allocations can be combined into a 98 + single allocation, with a single heap-limit test. Although these are 99 + allocated simultaneously, these allocations may not have the same 100 + lifetime. They may not even have the same backtrace, as they can be 101 + the result of different levels of inlining. 102 + 103 + The debug information used to convert backtraces to readable form was 104 + extended to deal with this case, maintaining multiple debug entries 105 + for a single allocation instruction when allocations are combined. 106 + This means that if an allocation of 2 words is combined with an 107 + adjacent allocation of 3 words, about 40% of the samples will be 108 + associated with the smaller object and about 60% with the larger. 
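The proportional split can be made concrete with a small sketch. The helper below is hypothetical (not memtrace's API): it attributes each sampled word offset to whichever of the combined objects that word falls inside.

```ocaml
(* Hypothetical illustration, not memtrace's code: when allocations are
   combined, each sampled word is attributed to the object it falls
   inside, so samples split in proportion to the objects' sizes. *)
let attribute_samples ~sizes ~sampled_offsets =
  let counts = Array.make (List.length sizes) 0 in
  let attribute off =
    let rec find i base = function
      | [] -> invalid_arg "offset beyond combined allocation"
      | sz :: rest ->
        if off < base + sz
        then counts.(i) <- counts.(i) + 1
        else find (i + 1) (base + sz) rest
    in
    find 0 0 sizes
  in
  List.iter attribute sampled_offsets;
  counts
```

Sampling every word of a combined 2-word + 3-word block attributes two samples to the first object and three to the second, matching the 40%/60% split described above.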
109 + 110 + 111 + ### Collecting backtraces efficiently 112 + 113 + `Gc.Memprof` collects a backtrace on each sampled allocation, which is 114 + something that needs to be efficient. 115 + 116 + Collecting a backtrace efficiently at an arbitrary point in an 117 + arbitrary program is hard. There are three standard approaches, 118 + available as options in `perf record`: 119 + 120 + - `--call-graph=dwarf` uses the DWARF debugging information 121 + 122 + - `--call-graph=fp` follows a chain of frame pointers 123 + 124 + - `--call-graph=lbr` uses the Last Branch Record hardware support 125 + 126 + However, all of these have disadvantages: 127 + 128 + - DWARF exists to support debuggers, and so is designed for 129 + flexibility rather than speed. 130 + 131 + This flexibility is necessary to handle the hard cases of C stack 132 + frames: for instance, a C program can define a variable-length 133 + array of ints on the stack, and store its length (in ints, not 134 + bytes) in a callee-save register that later gets spilled. So, in 135 + order to work out the length of a C stack frame, you may need to 136 + walk several *other* frames to find the register and do some 137 + arbitrary arithmetic to compute the length. 138 + 139 + So, the DWARF format allows stack frames to attach an arbitrary 140 + program in the DWARF bytecode language to express how to find the 141 + next frame. Decoding these is slow and complicated, and perf 142 + doesn't try: when sampling, it simply copies a chunk of stack (8KB 143 + by default) to the trace, to decode offline. This results in very 144 + large traces, which can still be incomplete as OCaml programs often 145 + use more than 8KB of stack. 146 + 147 + - Frame pointers change the function calling convention to 148 + continually maintain a linked list of stack frames so that it's 149 + available should a backtrace be taken. This adds a small, but 150 + nonzero overhead to every function call and return, and reserves a 151 + register. 
It also requires a special build, as OCaml does not use 152 + frame pointers by default. 153 + 154 + - LBR is generally the best option, but has a couple of limitations: 155 + First, it requires hardware support which is not currently available on 156 + UID boxes, and second, where available it is implemented with a fixed-size 157 + hardware ring buffer (32 entries on Skylake). Should the ring 158 + buffer overflow, entries are dropped, so deep stacks cannot be 159 + accurately recorded. (However, there is some very recent 160 + experimental support in perf for [heuristically stitching together 161 + LBR stacks from different 162 + samples](https://lwn.net/Articles/802821/), which may help). 163 + 164 + Happily, `Gc.Memprof` doesn't have to collect a backtrace at an 165 + arbitrary point in an arbitrary program, but only a backtrace at an 166 + allocation site in an OCaml program. Because OCaml is a 167 + garbage-collected language with precise marking, it already needs to 168 + be able to traverse the stack at allocation sites. This traversal needs 169 + to be accurate and efficient, because it's done at every GC. It's 170 + implemented with a large hashtable, which maps every return address 171 + and allocation site to a `frame_descr`, which contains the length of 172 + the stack frame (amongst other information). Traversing the stack 173 + using this hashtable is almost as efficient as following frame 174 + pointers, but without the runtime overhead. 175 + 176 + 177 + ## Compressing traces with memtrace 178 + 179 + The job of the memtrace library is to take the information collected 180 + by `Gc.Memprof` and stream it efficiently to a file. The largest 181 + portion of this data by far is the backtraces associated with each 182 + allocation. 183 + 184 + Backtraces arrive as a sequence of entries (opaque 64-bit integers), 185 + which you can think of as return addresses. 
(They're actually pointers 186 + to a static structure generated by the OCaml compiler, which includes 187 + the return address and a couple of other bits of information). Each of 188 + these entries can be *decoded* into locations, which contain a source 189 + filename, line/column position, and function name. Note that a single 190 + entry may yield several locations, due to inlining. 191 + 192 + Naively encoding these backtraces would be expensive. OCaml programs 193 + often have deep stacks, with dozens or hundreds of entries in a 194 + backtrace. Even though the entries themselves are only 8 bytes long, 195 + the location information can be much larger. (Remember, it is a design 196 + goal of memtrace to include all relevant location information in the 197 + trace, rather than relying on the user to hold onto the original binary.) 198 + 199 + Instead, memtrace uses a series of optimisations to encode backtraces 200 + efficiently. 201 + 202 + ### Common prefixes 203 + 204 + The first optimisation is to observe that the backtraces from 205 + consecutive samples often have a common prefix. There are two reasons 206 + for this: 207 + 208 + - Consecutive samples are close together in time. Whatever the 209 + program was doing during the last sample, there's a good chance 210 + it's still doing it. 211 + 212 + - Many programs have some driver structure wrapping the main 213 + function, so the same few entries appear at 214 + the start of nearly every backtrace. 215 + 216 + So, the first thing memtrace does is compare the start of each 217 + backtrace to the start of the last. The number of common entries is 218 + written to the trace file, and those entries are skipped entirely, 219 + saving space. 220 + 221 + 222 + ### Location information and caching 223 + 224 + Since the location information is much larger than the 8-byte 225 + backtrace entry, it's important to avoid encoding it redundantly.
226 + Memtrace keeps a cache of previously-seen backtrace entries, and 227 + location information is only decoded and written to the trace file on 228 + cache misses. In other words, location information need only be 229 + written to a trace file the first time a given location is seen. Since 230 + most backtraces hit a relatively small set of locations, this cache 231 + has an extremely high hit rate. 232 + 233 + There are also some minor optimisations in writing the location 234 + information: first, the line and column numbers are written in the 235 + same bit-packed format that OCaml uses internally, and second, file 236 + and function names use a simple 31-element move-to-front coder, so 237 + that a name need not be written if it is among the 238 + 31 most recently used names. 239 + 240 + 241 + ### Cache design 242 + 243 + To ensure good performance and to bound memory use, memtrace 244 + uses a fixed-size cache rather than one that grows without bound. 245 + This does mean that it's possible for the location information about 246 + an entry to be written redundantly: if an entry is evicted from the 247 + cache then its locations will have to be written again the next time 248 + it's seen. However, this does not happen much in practice. 249 + 250 + For speed, the cache is not a full LRU cache, but rather a two-way 251 + skewed-associative cache. In this cache design, each entry is hashed 252 + twice, giving two cache buckets. If either contains the entry, it's a 253 + hit; otherwise the older of the two is evicted. 254 + 255 + 256 + ### Encoding cache indices 257 + 258 + The fixed-size cache has 16384 (i.e. `2^14`) entries. In the 259 + overwhelmingly common case of a cache hit, memtrace writes the bucket 260 + index rather than the entry to the trace file. This means that the 261 + 64-bit backtrace entry is instead encoded in 14 bits.
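The lookup and eviction logic can be sketched as follows, distilled from `put_backtrace` in `src/backtrace_codec.ml` (the hash constants match the writer's; the surrounding record is simplified):

```ocaml
(* Two-way skewed-associative cache, simplified from the writer cache
   in src/backtrace_codec.ml. Each entry is hashed two different ways;
   a hit in either bucket wins, otherwise the older bucket is evicted. *)
let cache_size = 1 lsl 14

type cache =
  { slots : int array (* cached backtrace entries *)
  ; dates : int array (* insertion time, used for eviction *)
  }

let create () =
  { slots = Array.make cache_size 0; dates = Array.make cache_size 0 }

(* Returns the bucket used, and whether the lookup was a hit. *)
let lookup c ~now entry =
  let mask = cache_size - 1 in
  let h1 = ((entry * 0x4983723) lsr 11) land mask in
  let h2 = ((entry * 0xfdea731) lsr 21) land mask in
  if c.slots.(h1) = entry
  then (c.dates.(h1) <- now; (h1, true))
  else if c.slots.(h2) = entry
  then (c.dates.(h2) <- now; (h2, true))
  else begin
    (* Miss: evict the least recently used of the two candidates. *)
    let b = if c.dates.(h1) < c.dates.(h2) then h1 else h2 in
    c.slots.(b) <- entry;
    c.dates.(b) <- now;
    (b, false)
  end
```

On a hit only the 14-bit bucket index need be written to the trace; the full 64-bit entry is written only on a miss.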
262 + 263 + The full 64-bit backtrace entries need only be written during cache 264 + misses, along with the cache bucket index into which they're being 265 + inserted. This ensures there's enough information in the trace for the 266 + trace reader to reconstruct the cache state and know which buckets 267 + map to which backtrace entries. 268 + 269 + The 14-bit cache bucket index word is written in two bytes, leaving a 270 + two-bit tag with four possible states. One of these states is used 271 + for cache misses, but the other three are used for a further 272 + optimisation: prediction. 273 + 274 + 275 + ### Prediction 276 + 277 + It is often possible to predict the next entry in a backtrace: most of 278 + the time, the target of a function call is the same as it was last 279 + time. So, memtrace maintains an extra field in its cache: in each 280 + cache bucket, we also store the cache bucket of the entry that 281 + previously followed this one. This is used as a prediction: if 282 + correct, we can encode backtraces more compactly. 283 + 284 + Specifically, after every cache hit, we follow the chain of 285 + predictions until it mispredicts something. The hit is then encoded 286 + as one of three cases: 0 correct predictions, 1 correct prediction, or 287 + up to 255 correct predictions as specified by a supplemental byte. 288 + 289 + This encoding is extremely effective for long chains of 290 + non-tail-recursive functions. For instance, suppose we are encoding a 291 + backtrace that's using the OCaml stdlib's non-tail-recursive 292 + `List.map`, where the backtrace consists of 200 frames of `List.map` 293 + followed by one frame of some other function doing an allocation. 294 + 295 + This backtrace gets encoded in 7 bytes: 296 + 297 + - 2 bytes: the first `List.map` frame and 0 correct predictions. 298 + (assuming we get unlucky with the prediction here) 299 + 300 + - 2 bytes: the next `List.map` frame.
The prediction for `List.map` 301 + was updated by the previous frame, so now predicts correctly. 302 + 303 + - 1 byte: the number 198, indicating 198 correct predictions 304 + following the previous frame. 305 + 306 + - 2 bytes: the actual function doing the allocation. 307 + 308 + (Of course, it's possible that we get lucky on the first frame, and 309 + encode this backtrace in 5 bytes) 310 + 311 + 312 + ### More tricks? 313 + 314 + There is a lot more that could be done here. The prediction model is 315 + very simplistic, always predicting whichever way it went last time. 316 + There is much room for improvement: 317 + 318 + - we could allow multiple possible predictions, allowing more than 319 + one possibility to be encoded efficiently (eventually leading to 320 + PPM-style compressors) 321 + 322 + - we could use more than one previous frame as context for the next, 323 + or do LZ77-style substring matches. 324 + 325 + - we could allow weak/strong predictions as used in CPU branch 326 + predictors, where a prediction that's worked in the past must be 327 + wrong more than once to be replaced. 328 + 329 + However, even the relatively simple caching / prediction model 330 + currently in use is highly effective: the average encoded size is 331 + on the order of 10 bytes for an entire backtrace. 332 + 333 + 334 + ### Other optimisations 335 + 336 + In fact, with the optimisations above backtraces became sufficiently 337 + short that the fixed-size fields of the trace (timestamps, etc.) 338 + started to become a significant fraction of the total file size. So, 339 + memtrace currently does a number of packing tricks to keep size down 340 + there too: 341 + 342 + 343 + - Integers that are usually small use a variable-length encoding, 344 + meaning most of them fit in one byte. 345 + 346 + - Allocation IDs are used in promotion and collection events to 347 + specify which block was collected. 
They have a relative encoding: 348 + instead of "collect allocation N", they encode "collect the Nth 349 + most recent allocation". This means that N is usually small and 350 + profits from the variable-length integer encoding. 351 + 352 + - The most common cases (small allocations on the minor heap, with 353 + only one sample, and <= 256 backtrace entries) are given special 354 + event codes to save space. 355 + 356 + - Timestamps are truncated, allowing the timestamp and the event 357 + type field to fit in the same 32-bit word. Short timestamps 358 + overflow after just over 30 seconds, but the format ensures there 359 + is at least one full 64-bit timestamp every 30 seconds (in a CTF 360 + packet header), so all timestamps can be decoded unambiguously. 361 + 362 + 363 + ### For more 364 + 365 + If you're curious about the exact format, go look at 366 + `src/memtrace.tsdl` in the memtrace sources, which specifies the whole 367 + format in CTF's TSDL language.
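As a concrete footnote on the variable-length integers mentioned above, their encoded sizes can be sketched as follows (mirroring `put_vint` in `src/buf.ml`):

```ocaml
(* Size in bytes of a trace-format variable-length integer, mirroring
   put_vint in src/buf.ml: values 0..252 fit in one byte; larger
   values take a one-byte tag (253, 254 or 255) followed by a 16-,
   32- or 64-bit payload. *)
let vint_size v =
  if 0 <= v && v <= 252 then 1
  else if v = v land 0xffff then 3
  else if v = Int32.to_int (Int32.of_int v) then 5
  else 9
```

So a relative allocation ID of, say, 17 costs a single byte, while rare large values pay for their full width.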
docs/memtrace-vs-perf-sizes.png


+1
dune-project
··· 1 + (lang dune 3.17)
+21
memtrace.opam
··· 1 + opam-version: "2.0" 2 + maintainer: "Jane Street developers" 3 + authors: ["Jane Street Group, LLC"] 4 + homepage: "https://github.com/janestreet/memtrace" 5 + bug-reports: "https://github.com/janestreet/memtrace/issues" 6 + dev-repo: "git+https://github.com/janestreet/memtrace.git" 7 + doc: "https://ocaml.janestreet.com/ocaml-core/latest/doc/memtrace/index.html" 8 + license: "MIT" 9 + build: [ 10 + ["dune" "build" "-p" name "-j" jobs] 11 + ] 12 + depends: [ 13 + "ocaml" {>= "4.11.0"} 14 + "base-threads" 15 + "dune" {>= "3.17.0"} 16 + ] 17 + available: arch != "arm32" & arch != "x86_32" 18 + synopsis: "Streaming client for Memprof" 19 + description: " 20 + Generates compact traces of a program's memory use. 21 + "
+242
src/backtrace_codec.ml
··· 1 + let cache_size = 1 lsl 14 2 + 3 + type cache_bucket = int (* 0 to cache_size - 1 *) 4 + 5 + module Writer = struct 6 + open Buf.Write 7 + 8 + (* The writer cache carries slightly more state than the reader cache, 9 + since the writer must make decisions about which slot to use. 10 + (The reader just follows the choices made by the writer) *) 11 + type t = 12 + { cache : int array 13 + ; cache_date : int array 14 + ; (* when an entry was added to the cache (used for eviction) *) 15 + cache_next : cache_bucket array 16 + ; (* last time we saw this entry, which entry followed it? *) 17 + mutable next_verify_ix : int 18 + } 19 + 20 + let create () = 21 + { cache = Array.make cache_size 0 22 + ; cache_date = Array.make cache_size 0 23 + ; cache_next = Array.make cache_size 0 24 + ; next_verify_ix = 4242 25 + } 26 + ;; 27 + 28 + let max_length = 4096 29 + 30 + let put_backtrace 31 + cache 32 + b 33 + ~alloc_id 34 + ~callstack 35 + ~callstack_pos 36 + ~callstack_len 37 + ~log_new_location 38 + = 39 + let max_entry = 2 + 8 in 40 + let limit = b.pos + max_length - max_entry in 41 + let put_hit b bucket ncorrect = 42 + match ncorrect with 43 + | 0 -> put_16 b (bucket lsl 2) 44 + | 1 -> put_16 b ((bucket lsl 2) lor 1) 45 + | n -> 46 + put_16 b ((bucket lsl 2) lor 2); 47 + put_8 b n 48 + in 49 + let rec code_no_prediction predictor pos ncodes = 50 + if pos < callstack_pos || b.pos > limit 51 + then ncodes 52 + else ( 53 + let mask = cache_size - 1 in 54 + let slot = callstack.(pos) in 55 + (* Pick the least recently used of two slots, selected by two 56 + different hashes. 
*) 57 + let hash1 = ((slot * 0x4983723) lsr 11) land mask in 58 + let hash2 = ((slot * 0xfdea731) lsr 21) land mask in 59 + if cache.cache.(hash1) = slot 60 + then code_cache_hit predictor hash1 pos ncodes 61 + else if cache.cache.(hash2) = slot 62 + then code_cache_hit predictor hash2 pos ncodes 63 + else ( 64 + (* cache miss *) 65 + log_new_location ~index:pos; 66 + let bucket = 67 + if cache.cache_date.(hash1) < cache.cache_date.(hash2) then hash1 else hash2 68 + in 69 + (* Printf.printf "miss %05d %016x\n%!" 70 + bucket slot; (*" %016x\n%!" bucket slot;*) *) 71 + cache.cache.(bucket) <- slot; 72 + cache.cache_date.(bucket) <- alloc_id; 73 + cache.cache_next.(predictor) <- bucket; 74 + put_16 b ((bucket lsl 2) lor 3); 75 + put_64 b (Int64.of_int slot); 76 + code_no_prediction bucket (pos - 1) (ncodes + 1))) 77 + and code_cache_hit predictor hit pos ncodes = 78 + (* Printf.printf "hit %d\n" hit; *) 79 + cache.cache_date.(hit) <- alloc_id; 80 + cache.cache_next.(predictor) <- hit; 81 + code_with_prediction hit hit 0 (pos - 1) (ncodes + 1) 82 + and code_with_prediction orig_hit predictor ncorrect pos ncodes = 83 + assert (ncorrect < 256); 84 + if pos < callstack_pos || b.pos + 2 > limit 85 + then ( 86 + put_hit b orig_hit ncorrect; 87 + ncodes) 88 + else ( 89 + let slot = callstack.(pos) in 90 + let pred_bucket = cache.cache_next.(predictor) in 91 + if cache.cache.(pred_bucket) = slot 92 + then 93 + (* correct prediction *) 94 + (* Printf.printf "pred %d %d\n" pred_bucket ncorrect; *) 95 + if ncorrect = 255 96 + then ( 97 + (* overflow: code a new prediction block *) 98 + put_hit b orig_hit ncorrect; 99 + code_cache_hit predictor pred_bucket pos ncodes) 100 + else code_with_prediction orig_hit pred_bucket (ncorrect + 1) (pos - 1) ncodes 101 + else ( 102 + (* incorrect prediction *) 103 + put_hit b orig_hit ncorrect; 104 + code_no_prediction predictor pos ncodes)) 105 + in 106 + code_no_prediction 0 (callstack_len - 1) 0 107 + ;; 108 + 109 + let put_cache_verifier 
cache b = 110 + let ix = cache.next_verify_ix in 111 + cache.next_verify_ix <- (cache.next_verify_ix + 5413) land (cache_size - 1); 112 + put_16 b ix; 113 + put_16 b cache.cache_next.(ix); 114 + put_64 b (Int64.of_int cache.cache.(ix)) 115 + ;; 116 + 117 + let put_dummy_verifier b = 118 + put_16 b 0xffff; 119 + put_16 b 0; 120 + put_64 b 0L 121 + ;; 122 + end 123 + 124 + module Reader = struct 125 + open Buf.Read 126 + 127 + type t = 128 + { cache_loc : int array 129 + ; cache_pred : int array 130 + ; mutable last_backtrace : int array 131 + ; mutable last_backtrace_len : int 132 + } 133 + 134 + let create () = 135 + { cache_loc = Array.make cache_size 0 136 + ; cache_pred = Array.make cache_size 0 137 + ; last_backtrace = [||] 138 + ; last_backtrace_len = 0 139 + } 140 + ;; 141 + 142 + let[@inline never] realloc_bbuf bbuf pos (x : int) = 143 + assert (pos = Array.length bbuf); 144 + let new_size = Array.length bbuf * 2 in 145 + let new_size = if new_size < 32 then 32 else new_size in 146 + let new_bbuf = Array.make new_size x in 147 + Array.blit bbuf 0 new_bbuf 0 pos; 148 + new_bbuf 149 + ;; 150 + 151 + let[@inline] put_bbuf bbuf pos (x : int) = 152 + if pos < Array.length bbuf 153 + then ( 154 + Array.unsafe_set bbuf pos x; 155 + bbuf) 156 + else realloc_bbuf bbuf pos x 157 + ;; 158 + 159 + let get_backtrace ({ cache_loc; cache_pred; _ } as cache) b ~nencoded ~common_pfx_len = 160 + let rec decode pred bbuf pos = function 161 + | 0 -> bbuf, pos 162 + | i -> 163 + let codeword = get_16 b in 164 + let bucket = codeword lsr 2 165 + and tag = codeword land 3 in 166 + cache_pred.(pred) <- bucket; 167 + (match tag with 168 + | 0 -> 169 + (* cache hit, 0 prediction *) 170 + let bbuf = put_bbuf bbuf pos cache_loc.(bucket) in 171 + predict bucket bbuf (pos + 1) (i - 1) 0 172 + | 1 -> 173 + (* cache hit, 1 prediction *) 174 + let bbuf = put_bbuf bbuf pos cache_loc.(bucket) in 175 + predict bucket bbuf (pos + 1) (i - 1) 1 176 + | 2 -> 177 + (* cache hit, N prediction *) 178 
+ let ncorrect = get_8 b in 179 + let bbuf = put_bbuf bbuf pos cache_loc.(bucket) in 180 + predict bucket bbuf (pos + 1) (i - 1) ncorrect 181 + | _ -> 182 + (* cache miss *) 183 + let lit = Int64.to_int (get_64 b) in 184 + cache_loc.(bucket) <- lit; 185 + let bbuf = put_bbuf bbuf pos lit in 186 + decode bucket bbuf (pos + 1) (i - 1)) 187 + and predict pred bbuf pos i = function 188 + | 0 -> decode pred bbuf pos i 189 + | n -> 190 + let pred' = cache_pred.(pred) in 191 + let bbuf = put_bbuf bbuf pos cache_loc.(pred') in 192 + predict pred' bbuf (pos + 1) i (n - 1) 193 + in 194 + if common_pfx_len <= cache.last_backtrace_len 195 + then ( 196 + let bbuf, pos = decode 0 cache.last_backtrace common_pfx_len nencoded in 197 + cache.last_backtrace <- bbuf; 198 + cache.last_backtrace_len <- pos; 199 + bbuf, pos) 200 + else ( 201 + (* This can occur if the last backtrace was truncated, and the current 202 + backtrace shares a long prefix with it. Return the amount of backtrace 203 + that we have. 
(We still go through the motions of decoding to ensure 204 + that the location cache is updated correctly) *) 205 + let _bbuf, _pos = decode 0 [||] 0 nencoded in 206 + cache.last_backtrace, cache.last_backtrace_len) 207 + ;; 208 + 209 + let skip_backtrace _cache b ~nencoded ~common_pfx_len:_ = 210 + for _ = 1 to nencoded do 211 + let codeword = get_16 b in 212 + if codeword land 3 = 2 213 + then ignore (get_8 b) (* hitN *) 214 + else if codeword land 3 = 3 215 + then ignore (get_64 b) 216 + (* miss *) 217 + done 218 + ;; 219 + 220 + type cache_verifier = 221 + { ix : int 222 + ; pred : int 223 + ; value : Int64.t 224 + } 225 + 226 + let get_cache_verifier b = 227 + let ix = get_16 b in 228 + let pred = get_16 b in 229 + let value = get_64 b in 230 + { ix; pred; value } 231 + ;; 232 + 233 + let check_cache_verifier cache { ix; pred; value } = 234 + if ix <> 0xffff 235 + then 236 + 0 <= ix 237 + && ix < Array.length cache.cache_loc 238 + && cache.cache_pred.(ix) = pred 239 + && cache.cache_loc.(ix) = Int64.to_int value 240 + else true 241 + ;; 242 + end
+41
src/backtrace_codec.mli
··· 1 + module Writer : sig 2 + type t 3 + 4 + val create : unit -> t 5 + 6 + (* Maximum number of bytes written by put_backtrace *) 7 + val max_length : int 8 + 9 + val put_backtrace 10 + : t 11 + -> Buf.Write.t 12 + -> alloc_id:int 13 + -> callstack:int array 14 + -> callstack_pos:int 15 + -> callstack_len:int 16 + -> log_new_location:(index:int -> unit) 17 + -> int (* number of encoded slots *) 18 + 19 + val put_cache_verifier : t -> Buf.Write.t -> unit 20 + val put_dummy_verifier : Buf.Write.t -> unit 21 + end 22 + 23 + module Reader : sig 24 + type t 25 + 26 + val create : unit -> t 27 + 28 + val get_backtrace 29 + : t 30 + -> Buf.Read.t 31 + -> nencoded:int 32 + -> common_pfx_len:int 33 + -> int array * int 34 + 35 + val skip_backtrace : t -> Buf.Read.t -> nencoded:int -> common_pfx_len:int -> unit 36 + 37 + type cache_verifier 38 + 39 + val get_cache_verifier : Buf.Read.t -> cache_verifier 40 + val check_cache_verifier : t -> cache_verifier -> bool 41 + end
+303
src/buf.ml
··· 1 + module Shared_writer_fd = struct 2 + type t = 3 + { lock : Mutex.t 4 + ; closed : bool Atomic.t 5 + ; fd : Unix.file_descr 6 + } 7 + 8 + let make fd = { lock = Mutex.create (); closed = Atomic.make false; fd } 9 + 10 + exception Closed 11 + 12 + let rec write_fully fd buf ~pos ~len = 13 + if len = 0 14 + then () 15 + else ( 16 + let written = Unix.write fd buf pos len in 17 + write_fully fd buf ~pos:(pos + written) ~len:(len - written)) 18 + ;; 19 + 20 + let write_fully t buf ~pos ~len = 21 + Mutex.lock t.lock; 22 + Fun.protect 23 + (fun () -> 24 + if Atomic.get t.closed then raise Closed; 25 + write_fully t.fd buf ~pos ~len) 26 + ~finally:(fun () -> Mutex.unlock t.lock) 27 + ;; 28 + 29 + let close t = 30 + Mutex.lock t.lock; 31 + Atomic.set t.closed true; 32 + Mutex.unlock t.lock 33 + ;; 34 + end 35 + 36 + module Shared = struct 37 + type t = 38 + { buf : Bytes.t 39 + ; mutable pos : int 40 + ; pos_end : int 41 + } 42 + 43 + let of_bytes buf = { buf; pos = 0; pos_end = Bytes.length buf } 44 + let of_bytes_sub buf ~pos ~pos_end = { buf; pos; pos_end } 45 + let remaining b = b.pos_end - b.pos 46 + 47 + external bswap_16 : int -> int = "%bswap16" 48 + external bswap_32 : int32 -> int32 = "%bswap_int32" 49 + external bswap_64 : int64 -> int64 = "%bswap_int64" 50 + end 51 + 52 + module Write = struct 53 + include Shared 54 + 55 + let write_fd fd b = Shared_writer_fd.write_fully fd b.buf ~pos:0 ~len:b.pos 56 + let put_raw_8 b i v = Bytes.unsafe_set b i (Char.unsafe_chr v) 57 + 58 + external put_raw_16 : Bytes.t -> int -> int -> unit = "%caml_bytes_set16u" 59 + external put_raw_32 : Bytes.t -> int -> int32 -> unit = "%caml_bytes_set32u" 60 + external put_raw_64 : Bytes.t -> int -> int64 -> unit = "%caml_bytes_set64u" 61 + 62 + exception Overflow of int 63 + 64 + let[@inline never] overflow b = Overflow b.pos 65 + 66 + let[@inline always] put_8 b v = 67 + let pos = b.pos in 68 + let pos' = b.pos + 1 in 69 + if pos' > b.pos_end 70 + then raise (overflow b) 71 + 
else ( 72 + put_raw_8 b.buf pos v; 73 + b.pos <- pos') 74 + ;; 75 + 76 + let[@inline always] put_16 b v = 77 + let pos = b.pos in 78 + let pos' = b.pos + 2 in 79 + if pos' > b.pos_end 80 + then raise (overflow b) 81 + else ( 82 + put_raw_16 b.buf pos (if Sys.big_endian then bswap_16 v else v); 83 + b.pos <- pos') 84 + ;; 85 + 86 + let[@inline always] put_32 b v = 87 + let pos = b.pos in 88 + let pos' = b.pos + 4 in 89 + if pos' > b.pos_end 90 + then raise (overflow b) 91 + else ( 92 + put_raw_32 b.buf pos (if Sys.big_endian then bswap_32 v else v); 93 + b.pos <- pos') 94 + ;; 95 + 96 + let[@inline always] put_64 b v = 97 + let pos = b.pos in 98 + let pos' = b.pos + 8 in 99 + if pos' > b.pos_end 100 + then raise (overflow b) 101 + else ( 102 + put_raw_64 b.buf pos (if Sys.big_endian then bswap_64 v else v); 103 + b.pos <- pos') 104 + ;; 105 + 106 + let[@inline always] put_float b f = put_64 b (Int64.bits_of_float f) 107 + 108 + let put_string b s = 109 + let slen = 110 + match String.index_opt s '\000' with 111 + | Some i -> i 112 + | None -> String.length s 113 + in 114 + if b.pos + slen + 1 > b.pos_end then raise (overflow b); 115 + Bytes.blit_string s 0 b.buf b.pos slen; 116 + Bytes.unsafe_set b.buf (b.pos + slen) '\000'; 117 + b.pos <- b.pos + slen + 1 118 + ;; 119 + 120 + let[@inline never] put_vint_big b v = 121 + if v = v land 0xffff 122 + then ( 123 + put_8 b 253; 124 + put_16 b v) 125 + else if v = Int32.to_int (Int32.of_int v) 126 + then ( 127 + put_8 b 254; 128 + put_32 b (Int32.of_int v)) 129 + else ( 130 + put_8 b 255; 131 + put_64 b (Int64.of_int v)) 132 + ;; 133 + 134 + let[@inline always] put_vint b v = 135 + if 0 <= v && v <= 252 then put_8 b v else put_vint_big b v 136 + ;; 137 + 138 + type position_8 = int 139 + type position_16 = int 140 + type position_32 = int 141 + type position_64 = int 142 + type position_float = int 143 + 144 + let[@inline always] skip_8 b = 145 + let pos = b.pos in 146 + let pos' = b.pos + 1 in 147 + if pos' > b.pos_end 
then raise (overflow b); 148 + b.pos <- pos'; 149 + pos 150 + ;; 151 + 152 + let[@inline always] skip_16 b = 153 + let pos = b.pos in 154 + let pos' = b.pos + 2 in 155 + if pos' > b.pos_end then raise (overflow b); 156 + b.pos <- pos'; 157 + pos 158 + ;; 159 + 160 + let[@inline always] skip_32 b = 161 + let pos = b.pos in 162 + let pos' = b.pos + 4 in 163 + if pos' > b.pos_end then raise (overflow b); 164 + b.pos <- pos'; 165 + pos 166 + ;; 167 + 168 + let[@inline always] skip_64 b = 169 + let pos = b.pos in 170 + let pos' = b.pos + 8 in 171 + if pos' > b.pos_end then raise (overflow b); 172 + b.pos <- pos'; 173 + pos 174 + ;; 175 + 176 + let skip_float = skip_64 177 + 178 + let update_8 b pos v = 179 + assert (pos + 1 <= b.pos_end); 180 + put_raw_8 b.buf pos v 181 + ;; 182 + 183 + let update_16 b pos v = 184 + assert (pos + 2 <= b.pos_end); 185 + put_raw_16 b.buf pos v 186 + ;; 187 + 188 + let update_32 b pos v = 189 + assert (pos + 4 <= b.pos_end); 190 + put_raw_32 b.buf pos v 191 + ;; 192 + 193 + let update_64 b pos v = 194 + assert (pos + 8 <= b.pos_end); 195 + put_raw_64 b.buf pos v 196 + ;; 197 + 198 + let update_float b pos f = update_64 b pos (Int64.bits_of_float f) 199 + end 200 + 201 + module Read = struct 202 + include Shared 203 + 204 + let rec read_into fd buf off = 205 + if off = Bytes.length buf 206 + then { buf; pos = 0; pos_end = off } 207 + else ( 208 + assert (0 <= off && off <= Bytes.length buf); 209 + let n = Unix.read fd buf off (Bytes.length buf - off) in 210 + if n = 0 211 + then (* EOF *) 212 + { buf; pos = 0; pos_end = off } 213 + else (* Short read *) 214 + read_into fd buf (off + n)) 215 + ;; 216 + 217 + let read_fd fd buf = read_into fd buf 0 218 + 219 + let refill_fd fd b = 220 + let len = remaining b in 221 + Bytes.blit b.buf b.pos b.buf 0 len; 222 + read_into fd b.buf len 223 + ;; 224 + 225 + let split b len = 226 + let len = min (remaining b) len in 227 + { b with pos_end = b.pos + len }, { b with pos = b.pos + len } 228 + ;; 229 + 
230 + let empty = { buf = Bytes.make 0 '?'; pos = 0; pos_end = 0 } 231 + 232 + external get_raw_16 : Bytes.t -> int -> int = "%caml_bytes_get16u" 233 + external get_raw_32 : Bytes.t -> int -> int32 = "%caml_bytes_get32u" 234 + external get_raw_64 : Bytes.t -> int -> int64 = "%caml_bytes_get64u" 235 + 236 + exception Underflow of int 237 + 238 + let[@inline never] underflow b = Underflow b.pos 239 + 240 + let[@inline always] get_8 b = 241 + let pos = b.pos in 242 + let pos' = b.pos + 1 in 243 + if pos' > b.pos_end then raise (underflow b); 244 + b.pos <- pos'; 245 + Char.code (Bytes.unsafe_get b.buf pos) 246 + ;; 247 + 248 + let[@inline always] get_16 b = 249 + let pos = b.pos in 250 + let pos' = b.pos + 2 in 251 + if pos' > b.pos_end then raise (underflow b); 252 + b.pos <- pos'; 253 + if Sys.big_endian then bswap_16 (get_raw_16 b.buf pos) else get_raw_16 b.buf pos 254 + ;; 255 + 256 + let[@inline always] get_32 b = 257 + let pos = b.pos in 258 + let pos' = b.pos + 4 in 259 + if pos' > b.pos_end then raise (underflow b); 260 + b.pos <- pos'; 261 + if Sys.big_endian then bswap_32 (get_raw_32 b.buf pos) else get_raw_32 b.buf pos 262 + ;; 263 + 264 + let[@inline always] get_64 b = 265 + let pos = b.pos in 266 + let pos' = b.pos + 8 in 267 + if pos' > b.pos_end then raise (underflow b); 268 + b.pos <- pos'; 269 + if Sys.big_endian then bswap_64 (get_raw_64 b.buf pos) else get_raw_64 b.buf pos 270 + ;; 271 + 272 + let[@inline always] get_float b = Int64.float_of_bits (get_64 b) 273 + 274 + let get_string b = 275 + let start = b.pos in 276 + while get_8 b <> 0 do 277 + () 278 + done; 279 + let len = b.pos - 1 - start in 280 + Bytes.sub_string b.buf start len 281 + ;; 282 + 283 + let[@inline never] get_vint_big b c = 284 + match c with 285 + | 253 -> get_16 b 286 + | 254 -> Int32.to_int (get_32 b) 287 + | 255 -> Int64.to_int (get_64 b) 288 + | _ -> assert false 289 + ;; 290 + 291 + let[@inline always] get_vint b = 292 + match get_8 b with 293 + | c when c < 253 -> c 294 + 
| c -> get_vint_big b c 295 + ;; 296 + end 297 + 298 + let () = 299 + (Printexc.register_printer [@ocaml.alert "-unsafe_multidomain"]) (function 300 + | Write.Overflow n -> Some ("Buffer overflow at position " ^ string_of_int n) 301 + | Read.Underflow n -> Some ("Buffer underflow at position " ^ string_of_int n) 302 + | _ -> None) 303 + ;;
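The `put_vint`/`get_vint` pair above implements a tagged variable-length integer framing. A standalone sketch of that scheme (nonnegative values only; `put_vint`/`get_vint` here are re-implementations for illustration over a growable `Buffer` and a `string`, not the module's fixed-capacity bufs): values 0..252 fit in one byte; larger values get a tag byte (253 = 16-bit, 254 = 32-bit, 255 = 64-bit) followed by a little-endian payload.

```ocaml
(* Sketch of the variable-length integer framing used by Buf.
   Nonnegative values only: unlike the real decoder, this one does not
   sign-extend the 16/32-bit payloads. *)
let put_vint buf v =
  let put_le n v =
    for i = 0 to n - 1 do
      Buffer.add_char buf (Char.chr ((v lsr (8 * i)) land 0xff))
    done
  in
  if 0 <= v && v <= 252 then Buffer.add_char buf (Char.chr v)
  else if v = v land 0xffff then (Buffer.add_char buf '\253'; put_le 2 v)
  else if v = Int32.(to_int (of_int v)) then (Buffer.add_char buf '\254'; put_le 4 v)
  else (Buffer.add_char buf '\255'; put_le 8 v)

let get_vint s pos =
  let byte i = Char.code s.[i] in
  let get_le pos n =
    let v = ref 0 in
    for i = n - 1 downto 0 do
      v := (!v lsl 8) lor byte (pos + i)
    done;
    !v
  in
  match byte pos with
  | c when c < 253 -> c, pos + 1
  | 253 -> get_le (pos + 1) 2, pos + 3
  | 254 -> get_le (pos + 1) 4, pos + 5
  | _ -> get_le (pos + 1) 8, pos + 9
```

Small values dominate in practice (backtrace lengths, deltas), so most fields cost a single byte.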
+125
src/buf.mli
··· 1 + module Shared_writer_fd : sig 2 + (** A Unix file descriptor, shared between multiple writers *) 3 + type t 4 + 5 + exception Closed 6 + 7 + val make : Unix.file_descr -> t 8 + 9 + (** Writes the specified range of a buffer to the file descriptor, raising Closed if 10 + [close] has been called *) 11 + val write_fully : t -> bytes -> pos:int -> len:int -> unit 12 + 13 + val close : t -> unit 14 + end 15 + 16 + module Write : sig 17 + (** A [t] is a subsequence of a Bytes.t, to be written sequentially. 18 + 19 + None of the operations below allocate or resize the underlying byte buffer - the 20 + underlying Bytes.t is managed by the caller, and may be shared between multiple [t]s *) 21 + type t = private 22 + { buf : Bytes.t 23 + ; mutable pos : int 24 + ; pos_end : int 25 + } 26 + 27 + val of_bytes : Bytes.t -> t 28 + val of_bytes_sub : Bytes.t -> pos:int -> pos_end:int -> t 29 + val remaining : t -> int 30 + 31 + (** [write_fd fd b] writes the bytes written to b to the fd. 32 + 33 + No bufs are invalidated *) 34 + val write_fd : Shared_writer_fd.t -> t -> unit 35 + 36 + (** Writing to a buf. All types are written little-endian. All functions raise Overflow 37 + if there is insufficient space remaining *) 38 + 39 + exception Overflow of int 40 + 41 + val put_8 : t -> int -> unit 42 + val put_16 : t -> int -> unit 43 + val put_32 : t -> int32 -> unit 44 + val put_64 : t -> int64 -> unit 45 + val put_float : t -> float -> unit 46 + val put_string : t -> string -> unit 47 + val put_vint : t -> int -> unit 48 + 49 + (** The skip_t functions reserve space to be filled by a later update_t. (for e.g. 
50 + length fields that are only known when writing is finished) *) 51 + 52 + type position_8 = private int 53 + 54 + val skip_8 : t -> position_8 55 + val update_8 : t -> position_8 -> int -> unit 56 + 57 + type position_16 = private int 58 + 59 + val skip_16 : t -> position_16 60 + val update_16 : t -> position_16 -> int -> unit 61 + 62 + type position_32 = private int 63 + 64 + val skip_32 : t -> position_32 65 + val update_32 : t -> position_32 -> int32 -> unit 66 + 67 + type position_64 = private int 68 + 69 + val skip_64 : t -> position_64 70 + val update_64 : t -> position_64 -> int64 -> unit 71 + 72 + type position_float = private int 73 + 74 + val skip_float : t -> position_float 75 + val update_float : t -> position_float -> float -> unit 76 + end 77 + 78 + module Read : sig 79 + (** A [t] is a subsequence of a Bytes.t, to be read sequentially. 80 + 81 + None of the operations below allocate or resize the underlying byte buffer - the 82 + underlying Bytes.t is managed by the caller, and may be shared between multiple [t]s *) 83 + type t = private 84 + { buf : Bytes.t 85 + ; mutable pos : int 86 + ; pos_end : int 87 + } 88 + 89 + val of_bytes : Bytes.t -> t 90 + val of_bytes_sub : Bytes.t -> pos:int -> pos_end:int -> t 91 + val remaining : t -> int 92 + 93 + (** [split b len] splits b into (a, b), where a contains at most len bytes and b 94 + contains the rest, if any. 95 + 96 + The two returned parts share the same underlying Bytes.t *) 97 + val split : t -> int -> t * t 98 + 99 + (** [read_fd fd byt] returns a buf containing bytes read from fd, whose underlying 100 + buffer is byt. 101 + 102 + All [t]s sharing this underlying buffer are invalidated *) 103 + val read_fd : Unix.file_descr -> Bytes.t -> t 104 + 105 + (** [refill_fd fd b] returns a buf containing the contents of b followed by bytes read 106 + from fd, whose underlying buffer is that of b. 
107 + 108 + All bufs sharing this underlying buffer (including b) are invalidated *) 109 + val refill_fd : Unix.file_descr -> t -> t 110 + 111 + val empty : t 112 + 113 + (** Reading from a buf. All types are read little-endian. All functions raise Underflow 114 + if there are insufficient bytes remaining. *) 115 + 116 + exception Underflow of int 117 + 118 + val get_8 : t -> int 119 + val get_16 : t -> int 120 + val get_32 : t -> int32 121 + val get_64 : t -> int64 122 + val get_float : t -> float 123 + val get_string : t -> string 124 + val get_vint : t -> int (* NB: may overflow if read on 32-bit machines *) 125 + end
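The skip/update functions in the `Write` interface support a backpatch pattern: reserve a fixed-width slot, write the body, then fill the slot in once its value (typically a length) is known. A minimal sketch using plain `Bytes` as a stand-in for `Write.t` (`write_length_prefixed` is an illustrative name, not part of this API):

```ocaml
(* The skip_16 / update_16 ("backpatch") pattern: reserve a 16-bit
   length slot, write the body after it, then fill the slot in. *)
let write_length_prefixed body =
  let len = String.length body in
  let buf = Bytes.create (2 + len) in
  (* skip_16: reserve two bytes and remember their position *)
  let len_pos = 0 in
  Bytes.blit_string body 0 buf 2 len;
  (* update_16: backpatch the little-endian length field *)
  Bytes.set_uint16_le buf len_pos len;
  buf
```

The `position_16`-style private-int types in the real interface exist so a position reserved by `skip_16` can only be filled by `update_16`, never by an update of a different width.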
+5
src/dune
··· 1 + (library 2 + (name memtrace) 3 + (public_name memtrace) 4 + (libraries unix threads) 5 + (preprocess no_preprocessing))
+44
src/geometric_sampler.ml
··· 1 + type t = 2 + { rand : Random.State.t 3 + ; one_log1m_lambda : float 4 + } 5 + 6 + let default_rand () = 7 + Random.State.make [| 0x52b87efb; 0x332235ea; 0x5f813723; 0x057b9dff |] 8 + ;; 9 + 10 + let make ?(rand = default_rand ()) ~sampling_rate () = 11 + let one_log1m_lambda = 12 + if sampling_rate >= 1. then 0. else 1. /. log1p (-.sampling_rate) 13 + in 14 + { rand; one_log1m_lambda } 15 + ;; 16 + 17 + let copy ?(rand = default_rand ()) t = { rand; one_log1m_lambda = t.one_log1m_lambda } 18 + 19 + (* port of log_approx in runtime/memprof.c to OCaml 20 + see https://github.com/ocaml/ocaml/pull/9466 21 + 22 + slightly different results because of double precision, 23 + however, this difference is ~100x smaller than the 24 + error introduced by the original approximation *) 25 + 26 + let log_approx n = 27 + let f = Int64.bits_of_float Int32.(to_float (add one (shift_left n 1))) in 28 + let exp = Int64.(to_int (shift_right f 52)) in 29 + let exp = float_of_int (exp + (127 - 1023 + 1)) in 30 + let x = Int64.(float_of_bits (logor (logand f 0xfffffffffffffL) 0x3ff0000000000000L)) in 31 + (x *. (2.104659476859 +. (x *. (-0.720478916626 +. (x *. 0.107132064797))))) 32 + +. (-111.701724334061 +. (0.6931471805 *. exp)) 33 + ;; 34 + 35 + (* Draw from the geometric distribution by: 36 + (1) Draw a uniform random number (via Random.State.bits) 37 + (2) Construct an exponentially-distributed random variable as the log of (1) 38 + (3) Scale the distribution of (2) to have the correct mean by multiplying 39 + (4) Convert to a geometric distribution by taking the floor + 1 *) 40 + let draw s = 41 + let uniform_rand = Int32.of_int (Random.State.bits s.rand) in 42 + let exp_rand = log_approx uniform_rand in 43 + 1 + int_of_float (exp_rand *. s.one_log1m_lambda) 44 + ;;
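The four steps in the comment above can be transcribed directly, using the exact `log` in place of `log_approx` (so `draw_exact` below is a reference version for illustration, not the sampler's actual fast path): draw a uniform U in (0, 1], turn it into an exponential via its log, scale by 1/log(1-λ), and take floor + 1. The mean of the result should come out near 1/λ.

```ocaml
(* Reference version of [draw], with exact [log] instead of log_approx.
   [Random.State.bits] yields 30 random bits, so (bits + 1) / 2^30 is a
   uniform draw in (0, 1]. *)
let draw_exact ~sampling_rate state =
  (* 1/log(1-λ); negative, and -0. when λ >= 1 (draw is then always 1) *)
  let one_log1m_lambda = 1. /. log1p (-.sampling_rate) in
  let u = (float_of_int (Random.State.bits state) +. 1.) /. 1073741824. in
  (* log u <= 0 and the scale factor <= 0, so the product is >= 0 *)
  1 + int_of_float (log u *. one_log1m_lambda)
```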
+28
src/geometric_sampler.mli
··· 1 + (** An efficient sampler for geometrically distributed random variables. (Port to OCaml of 2 + the statmemprof sampler used in the OCaml runtime) 3 + 4 + A sampler is parameterised by a sampling_rate λ, and simulates a long sequence of 5 + flips of a biased coin that has probability λ of coming up heads. 6 + 7 + The result of [draw] is an integer distributed as the length of the gaps between two 8 + successive heads. This is a geometrically distributed random variable: the expected 9 + value is 1/λ, and the probability that draw returns k is (1-λ)^(k-1) * λ (that is, the 10 + probability of (k-1) tails followed by a heads). 11 + 12 + NB: Note that here, we're adopting the convention that two consecutive heads counts as 13 + a gap of length 1. In other words, this distribution is the number of times you need 14 + to flip the coin to see heads, rather than the number of tails you'll see while doing 15 + so (which is one less). This means that the result is never zero. This convention is 16 + not universal, and some authors use "geometrically distributed" for the zero-based 17 + distribution. *) 18 + type t 19 + 20 + (** Create a sampler with a given sampling rate and randomness source. The default for 21 + [rand] uses a constant seed, giving deterministic results *) 22 + val make : ?rand:Random.State.t -> sampling_rate:float -> unit -> t 23 + 24 + val copy : ?rand:Random.State.t -> t -> t 25 + 26 + (** Returns a geometrically-distributed random integer in the range [1..inf) 27 + with mean 1/sampling_rate *) 28 + val draw : t -> int
+225
src/location_codec.ml
··· 1 + open Buf 2 + 3 + (* Move-to-front codec *) 4 + module Mtf_table : sig 5 + type 'a t 6 + 7 + val create : unit -> 'a t 8 + 9 + type index = int 10 + 11 + val not_found : index 12 + val encode : 'a t -> if_absent:(unit -> 'a) -> string -> index * 'a 13 + val decode : 'a t -> if_absent:(unit -> string * 'a) -> index -> string * 'a 14 + val last : 'a t -> 'a option 15 + end = struct 16 + type 'a entry = 17 + | Empty 18 + | Full of string * 'a 19 + 20 + type 'a t = 'a entry array 21 + 22 + let length = 31 23 + let create () = Array.make length Empty 24 + 25 + type index = int 26 + 27 + let not_found = length 28 + 29 + let swap mtf i = 30 + assert (i <> not_found); 31 + let e = 32 + match mtf.(i) with 33 + | Empty -> assert false 34 + | Full _ as e -> e 35 + in 36 + Array.blit mtf 0 mtf 1 i; 37 + mtf.(0) <- e 38 + ;; 39 + 40 + let push mtf k v = 41 + Array.blit mtf 0 mtf 1 (length - 1); 42 + mtf.(0) <- Full (k, v) 43 + ;; 44 + 45 + let encode mtf ~if_absent x = 46 + let rec go mtf x i = 47 + if i = length 48 + then ( 49 + let v = if_absent () in 50 + push mtf x v; 51 + not_found, v) 52 + else ( 53 + match mtf.(i) with 54 + | Empty -> 55 + let v = if_absent () in 56 + push mtf x v; 57 + not_found, v 58 + | Full (k, v) when String.equal k x -> 59 + swap mtf i; 60 + i, v 61 + | Full _ -> go mtf x (i + 1)) 62 + in 63 + go mtf x 0 64 + ;; 65 + 66 + let decode mtf ~if_absent i = 67 + if i = not_found 68 + then ( 69 + let ((k, v) as kv) = if_absent () in 70 + push mtf k v; 71 + kv) 72 + else ( 73 + swap mtf i; 74 + match mtf.(0) with 75 + | Empty -> assert false 76 + | Full (k, v) -> k, v) 77 + ;; 78 + 79 + let last mtf = 80 + match mtf.(length - 1) with 81 + | Empty -> None 82 + | Full (_, v) -> Some v 83 + ;; 84 + end 85 + 86 + (** Source locations *) 87 + module Location = struct 88 + type t = 89 + { filename : string 90 + ; line : int 91 + ; start_char : int 92 + ; end_char : int 93 + ; defname : string 94 + } 95 + 96 + let to_string { filename; line; start_char; 
end_char; defname } = 97 + Printf.sprintf "%s@%s:%d:%d-%d" defname filename line start_char end_char 98 + ;; 99 + 100 + let unknown = 101 + { filename = "<unknown>"; line = 1; start_char = 1; end_char = 1; defname = "??" } 102 + ;; 103 + end 104 + 105 + type state = unit Mtf_table.t Mtf_table.t 106 + 107 + module Writer = struct 108 + open Buf.Write 109 + 110 + type t = state 111 + 112 + let create () = Mtf_table.create () 113 + let max_length = 4 * 1024 114 + 115 + let put_location (file_mtf : t) b (id, locs) = 116 + let total_size_max = 117 + (* Worst-case size, assuming no MTF hits *) 118 + List.fold_left 119 + (fun sz (loc : Location.t) -> 120 + sz + 6 + (String.length loc.filename + 1) + (String.length loc.defname + 1)) 121 + (8 + 1) 122 + locs 123 + in 124 + let no_truncation = List.length locs <= 255 && total_size_max <= max_length in 125 + let locs = if no_truncation then locs else [ Location.unknown ] in 126 + let start_pos = b.Write.pos in 127 + put_64 b (Int64.of_int id); 128 + put_8 b (List.length locs); 129 + locs 130 + |> List.iter (fun (loc : Location.t) -> 131 + let clamp n lim = if n < 0 || n > lim then lim else n in 132 + let line_number = 133 + (* 20 bits *) 134 + clamp loc.line 0xfffff 135 + in 136 + let start_char = 137 + (* 8 bits *) 138 + clamp loc.start_char 0xff 139 + in 140 + let end_char = 141 + (* 10 bits *) 142 + clamp loc.end_char 0x3ff 143 + in 144 + let filename_code, defn_mtf = 145 + Mtf_table.encode file_mtf ~if_absent:Mtf_table.create loc.filename 146 + in 147 + let defname_code, () = 148 + Mtf_table.encode defn_mtf ~if_absent:(fun () -> ()) loc.defname 149 + in 150 + let encoded = 151 + Int64.( 152 + logor 153 + (of_int line_number) 154 + (logor 155 + (shift_left (of_int start_char) 20) 156 + (logor 157 + (shift_left (of_int end_char) (20 + 8)) 158 + (logor 159 + (shift_left (of_int (filename_code :> int)) (20 + 8 + 10)) 160 + (shift_left (of_int (defname_code :> int)) (20 + 8 + 10 + 5)))))) 161 + in 162 + put_32 b 
(Int64.to_int32 encoded); 163 + put_16 b Int64.(to_int (shift_right encoded 32)); 164 + if filename_code = Mtf_table.not_found then put_string b loc.filename; 165 + if defname_code = Mtf_table.not_found then put_string b loc.defname); 166 + if no_truncation then assert (b.pos - start_pos <= total_size_max) 167 + ;; 168 + end 169 + 170 + module Reader = struct 171 + open Buf.Read 172 + 173 + type t = state 174 + 175 + let create () = Mtf_table.create () 176 + 177 + let get_location (file_mtf : t) b = 178 + let id = Int64.to_int (get_64 b) in 179 + let nlocs = get_8 b in 180 + let locs = 181 + List.init nlocs (fun _ -> 182 + let low = get_32 b in 183 + let high = get_16 b in 184 + let encoded = 185 + Int64.(logor (shift_left (of_int high) 32) (logand (of_int32 low) 0xffffffffL)) 186 + in 187 + let line, start_char, end_char, filename_code, defname_code = 188 + Int64.( 189 + ( to_int (logand 0xfffffL encoded) 190 + , to_int (logand 0xffL (shift_right encoded 20)) 191 + , to_int (logand 0x3ffL (shift_right encoded (20 + 8))) 192 + , to_int (logand 0x1fL (shift_right encoded (20 + 8 + 10))) 193 + , to_int (logand 0x1fL (shift_right encoded (20 + 8 + 10 + 5))) )) 194 + in 195 + let filename, defn_mtf = 196 + Mtf_table.decode 197 + file_mtf 198 + ~if_absent:(fun () -> 199 + let s = get_string b in 200 + (* Reuse the defname MTF table that's about to be pushed off. 201 + This is only present to match a bug in the v001 encoder, 202 + which sometimes generated traces relying on this behaviour. 
203 + The current encoder never relies on this, so once v001 204 + trace files stop mattering, this match can be deleted *) 205 + let d = 206 + match Mtf_table.last file_mtf with 207 + | Some v -> v 208 + | None -> Mtf_table.create () 209 + in 210 + s, d) 211 + filename_code 212 + in 213 + let defname, () = 214 + Mtf_table.decode 215 + defn_mtf 216 + ~if_absent:(fun () -> 217 + let s = get_string b in 218 + s, ()) 219 + defname_code 220 + in 221 + { Location.line; start_char; end_char; filename; defname }) 222 + in 223 + id, locs 224 + ;; 225 + end
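The `Mtf_table` at the heart of this codec is a move-to-front transform: recently seen symbols get small indices, so a stream of repeated filenames compresses to a stream of mostly-zero codes. A minimal sketch of the idea (`mtf_encode` is an illustrative name; it uses an unbounded list rather than the real codec's fixed 31-entry array, and -1 plays the role of `not_found`):

```ocaml
(* Move-to-front encoding over a symbol stream: a hit emits the symbol's
   current index and moves it to the front; a miss emits -1 ("not found",
   meaning the symbol itself must be written out in full) and pushes it. *)
let mtf_encode symbols =
  let table = ref [] in
  let encode s =
    let rec find i = function
      | [] -> None
      | x :: _ when String.equal x s -> Some i
      | _ :: rest -> find (i + 1) rest
    in
    match find 0 !table with
    | Some i ->
      table := s :: List.filter (fun x -> not (String.equal x s)) !table;
      i
    | None ->
      table := s :: !table;
      -1
  in
  (* fold_left guarantees left-to-right evaluation of the side effects *)
  List.rev (List.fold_left (fun acc s -> encode s :: acc) [] symbols)
```

The real codec nests two such tables: an outer one keyed by filename, each of whose entries carries an inner table for the definition names seen in that file.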
+29
src/location_codec.mli
··· 1 + module Location : sig 2 + type t = 3 + { filename : string 4 + ; line : int 5 + ; start_char : int 6 + ; end_char : int 7 + ; defname : string 8 + } 9 + 10 + val to_string : t -> string 11 + val unknown : t 12 + end 13 + 14 + module Writer : sig 15 + type t 16 + 17 + val create : unit -> t 18 + val max_length : int 19 + 20 + (* put_location never writes more than max_length bytes *) 21 + val put_location : t -> Buf.Write.t -> int * Location.t list -> unit 22 + end 23 + 24 + module Reader : sig 25 + type t 26 + 27 + val create : unit -> t 28 + val get_location : t -> Buf.Read.t -> int * Location.t list 29 + end
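Besides the MTF tables, the codec packs each location's fields into a small fixed-width word. A standalone sketch of that bit layout, with field widths taken from the encoder in location_codec.ml (`pack`/`unpack` are illustrative names): line gets 20 bits, start_char 8, end_char 10, and the two MTF codes 5 bits each, for 48 bits written as a 32-bit word plus a 16-bit word.

```ocaml
(* Bit-packing of a clamped source location, mirroring the layout in
   location_codec.ml: line | start<<20 | end<<28 | file_code<<38
   | defname_code<<43. *)
let pack ~line ~start_char ~end_char ~filename_code ~defname_code =
  Int64.(
    logor
      (of_int line)
      (logor
         (shift_left (of_int start_char) 20)
         (logor
            (shift_left (of_int end_char) 28)
            (logor
               (shift_left (of_int filename_code) 38)
               (shift_left (of_int defname_code) 43)))))

let unpack encoded =
  Int64.(
    ( to_int (logand 0xfffffL encoded)
    , to_int (logand 0xffL (shift_right encoded 20))
    , to_int (logand 0x3ffL (shift_right encoded 28))
    , to_int (logand 0x1fL (shift_right encoded 38))
    , to_int (logand 0x1fL (shift_right encoded 43)) ))
```

This is why the writer clamps `line`, `start_char` and `end_char` before encoding: values wider than the field would corrupt their neighbours.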
+362
src/memprof_tracer.ml
··· 1 + open Stdlib_shim 2 + 3 + (* The tracer requires some tricky locking. 4 + 5 + It can be used both synchronously from user code (e.g. in calls to ext_alloc) 6 + and asynchronously from Gc.Memprof callbacks. Locks in async callbacks cannot 7 + block, as they might interrupt code which synchronously holds the lock, and 8 + so would deadlock if they blocked. 9 + 10 + Instead, users of lock_async must handle the case where the lock is already held 11 + synchronously by the current thread. In this case, the lock can't be taken, and 12 + instead the user must defer some work to be processed when the synchronous holder 13 + later releases the lock. 14 + 15 + In other words, the lock is nonreentrant: you can't take a lock you already hold. 16 + 17 + This module is implemented with nonatomic references, and should only be used on 18 + the main domain (that is, when [Domain.is_main_domain ()]). *) 19 + module Lock = struct 20 + type 'a t = 21 + { mutable locked : bool 22 + ; mutable locked_sync : bool 23 + ; mutable locked_sync_thread_id : int 24 + ; mutable stopped : bool 25 + ; mutable deferred : 'a list 26 + } 27 + 28 + let create () = 29 + { locked = false 30 + ; locked_sync = false 31 + ; locked_sync_thread_id = -1 32 + ; stopped = false 33 + ; deferred = [] 34 + } 35 + ;; 36 + 37 + let self_id () = Thread.id (Thread.self ()) 38 + 39 + (* lock_sync and unlock_sync form a spinlock with two extra features: 40 + - Work may be 'deferred' (see defer and lock_async below), to run at unlock time 41 + - There is a 'stopped' state (see unlock_and_stop), in which locking always fails *) 42 + 43 + type lock_sync_result = 44 + | Success 45 + | Is_stopped 46 + 47 + let[@inline never] rec lock_sync s : lock_sync_result = 48 + if s.stopped 49 + then Is_stopped 50 + else if s.locked 51 + then ( 52 + if s.locked_sync && s.locked_sync_thread_id = self_id () 53 + then failwith "Memprof_tracer.Lock.lock_sync: Attempted to lock recursively"; 54 + Thread.yield (); 55 + lock_sync s) 56 
+ else ( 57 + s.locked <- true; 58 + s.locked_sync <- true; 59 + s.locked_sync_thread_id <- self_id (); 60 + Success) 61 + ;; 62 + 63 + let[@inline never] defer s t = 64 + assert ( 65 + (not s.stopped) && s.locked && s.locked_sync && s.locked_sync_thread_id = self_id ()); 66 + s.deferred <- t :: s.deferred 67 + ;; 68 + 69 + let[@inline never] unlock_and_stop s = 70 + s.stopped <- true; 71 + s.deferred <- []; 72 + s.locked <- false; 73 + s.locked_sync <- false 74 + ;; 75 + 76 + let[@inline never] rec unlock_sync ~report_exn ~handle_deferred s = 77 + assert ( 78 + (not s.stopped) && s.locked && s.locked_sync && s.locked_sync_thread_id = self_id ()); 79 + match s.deferred with 80 + | [] -> 81 + s.locked_sync <- false; 82 + s.locked_sync_thread_id <- -1; 83 + s.locked <- false 84 + | deferred -> 85 + s.deferred <- []; 86 + (match List.iter handle_deferred (List.rev deferred) with 87 + | () -> unlock_sync ~report_exn ~handle_deferred s 88 + | exception e -> 89 + unlock_and_stop s; 90 + report_exn e) 91 + ;; 92 + 93 + (* Normally, taking a lock during an asynchronous callback can cause deadlocks, as the 94 + lock may already be held by the thread that is currently running the asynchronous 95 + handler. So, it is incorrect to use [lock_sync] / [unlock_sync] from a handler. 96 + 97 + Instead, [lock_async] and [unlock_async] may be used from asynchronous 98 + handlers. Taking a lock asynchronously can return a [Is_sync_locked_by_this_thread], 99 + indicating that the lock is already held by this thread and so cannot be waited for. 100 + 101 + In this state, it is not in general safe to manipulate the state protected by the lock 102 + (since you don't know what the interrupted thread is doing with it), but you can use 103 + [defer] to schedule work for when the interrupted thread releases the lock. 
*) 104 + 105 + type lock_async_result = 106 + | Success 107 + | Is_sync_locked_by_this_thread 108 + | Is_stopped 109 + 110 + let[@inline never] rec lock_async s : lock_async_result = 111 + if s.stopped 112 + then Is_stopped 113 + else if s.locked_sync && s.locked_sync_thread_id = self_id () 114 + then Is_sync_locked_by_this_thread 115 + else if s.locked 116 + then ( 117 + Thread.yield (); 118 + lock_async s) 119 + else ( 120 + s.locked <- true; 121 + Success) 122 + ;; 123 + 124 + let[@inline never] unlock_async s = 125 + assert ((not s.stopped) && s.locked && not s.locked_sync); 126 + s.locked <- false 127 + ;; 128 + end 129 + 130 + type deferred_event = 131 + | Deferred_promote of Trace.Writer.t * Trace.Obj_id.t 132 + | Deferred_collect of Trace.Writer.t * Trace.Obj_id.t 133 + 134 + let handle_deferred_event = function 135 + | Deferred_promote (t, id) -> Trace.Writer.put_promote t (Trace.Timestamp.now ()) id 136 + | Deferred_collect (t, id) -> Trace.Writer.put_collect t (Trace.Timestamp.now ()) id 137 + ;; 138 + 139 + type t = 140 + { lock : deferred_event Lock.t 141 + ; report_exn : exn -> unit 142 + ; trace : Trace.Writer.t 143 + ; ext_sampler : Geometric_sampler.t 144 + } 145 + 146 + let curr_active_tracer : t option ref = ref None 147 + let active_tracer () = !curr_active_tracer 148 + 149 + let current_domain () = 150 + let id = (Stdlib.Domain.self () :> int) in 151 + Trace.Domain_id.Expert.of_int id 152 + ;; 153 + 154 + let bytes_before_ext_sample = ref max_int 155 + let draw_sampler_bytes t = Geometric_sampler.draw t.ext_sampler * (Sys.word_size / 8) 156 + 157 + let[@inline never] mark_failed s e = 158 + Lock.unlock_and_stop s.lock; 159 + s.report_exn e 160 + ;; 161 + 162 + let default_report_exn e = 163 + match e with 164 + | Trace.Writer.Pid_changed -> 165 + (* This error is silently ignored, so that if Memtrace is active across 166 + Unix.fork () then the child process silently stops tracing *) 167 + () 168 + | e -> 169 + let msg = Printf.sprintf 
"Memtrace failure: %s\n" (Printexc.to_string e) in 170 + output_string stderr msg; 171 + Printexc.print_backtrace stderr; 172 + flush stderr 173 + ;; 174 + 175 + let assert_main_domain () = 176 + if not (Domain.is_main_domain ()) 177 + then failwith "Memtrace can currently only be used from the main domain" 178 + ;; 179 + 180 + let start ?(report_exn = default_report_exn) ~sampling_rate trace = 181 + assert_main_domain (); 182 + let ext_sampler = Geometric_sampler.make ~sampling_rate () in 183 + let s = { trace; lock = Lock.create (); report_exn; ext_sampler } in 184 + let allocate ~(info : Gc.Memprof.allocation) ~source : Trace.Obj_id.t option = 185 + if not (Domain.is_main_domain ()) 186 + then None 187 + else ( 188 + match Lock.lock_async s.lock with 189 + | Is_stopped -> None 190 + | Is_sync_locked_by_this_thread -> 191 + None (* Ignore allocations during e.g. ext_alloc *) 192 + | Success -> 193 + (match 194 + Trace.Writer.put_alloc_with_raw_backtrace 195 + trace 196 + (Trace.Timestamp.now ()) 197 + ~length:info.size 198 + ~nsamples:info.n_samples 199 + ~source 200 + ~callstack:info.callstack 201 + with 202 + | r -> 203 + Lock.unlock_async s.lock; 204 + Some r 205 + | exception e -> 206 + mark_failed s e; 207 + None)) 208 + in 209 + let promote id : Trace.Obj_id.t option = 210 + if not (Domain.is_main_domain ()) 211 + then None 212 + else ( 213 + match Lock.lock_async s.lock with 214 + | Is_stopped -> None 215 + | Is_sync_locked_by_this_thread -> 216 + Lock.defer s.lock (Deferred_promote (trace, id)); 217 + Some id 218 + | Success -> 219 + (match Trace.Writer.put_promote trace (Trace.Timestamp.now ()) id with 220 + | () -> 221 + Lock.unlock_async s.lock; 222 + Some id 223 + | exception e -> 224 + mark_failed s e; 225 + None)) 226 + in 227 + let dealloc id = 228 + if not (Domain.is_main_domain ()) 229 + then () 230 + else ( 231 + match Lock.lock_async s.lock with 232 + | Is_stopped -> () 233 + | Is_sync_locked_by_this_thread -> Lock.defer s.lock 
(Deferred_collect (trace, id)) 234 + | Success -> 235 + (match Trace.Writer.put_collect trace (Trace.Timestamp.now ()) id with 236 + | () -> Lock.unlock_async s.lock 237 + | exception e -> mark_failed s e)) 238 + in 239 + let tracker : (_, _) Gc.Memprof.tracker = 240 + { alloc_minor = (fun info -> allocate ~info ~source:Minor) 241 + ; alloc_major = (fun info -> allocate ~info ~source:Major) 242 + ; promote 243 + ; dealloc_minor = dealloc 244 + ; dealloc_major = dealloc 245 + } 246 + in 247 + (* Pre-allocate these so that they don't get included in the trace (and use 248 + [Sys.opaque_identity] so the allocations don't get moved *) 249 + let active_tracer, sampler_bytes = Sys.opaque_identity (Some s, draw_sampler_bytes s) in 250 + ignore (Gc.Memprof.start ~sampling_rate ~callstack_size:max_int tracker : Gc.Memprof.t); 251 + curr_active_tracer := active_tracer; 252 + bytes_before_ext_sample := sampler_bytes; 253 + s 254 + ;; 255 + 256 + let stop s = 257 + assert_main_domain (); 258 + match Lock.lock_sync s.lock with 259 + | Is_stopped -> () 260 + | Success -> 261 + Lock.unlock_and_stop s.lock; 262 + (match !curr_active_tracer with 263 + | None -> () 264 + | Some _ -> 265 + Gc.Memprof.stop (); 266 + (try Trace.Writer.close s.trace with 267 + | e -> s.report_exn e); 268 + curr_active_tracer := None) 269 + ;; 270 + 271 + let[@inline never] ext_alloc_slowpath ~bytes : Trace.Obj_id.t or_null = 272 + match !curr_active_tracer with 273 + | None -> 274 + bytes_before_ext_sample := max_int; 275 + Null 276 + | Some s -> 277 + (match Lock.lock_sync s.lock with 278 + | Is_stopped -> Null 279 + | Success -> 280 + (match 281 + let bytes_per_word = Sys.word_size / 8 in 282 + (* round up to an integer number of words *) 283 + let size_words = (bytes + bytes_per_word - 1) / bytes_per_word in 284 + let samples = ref 0 in 285 + while !bytes_before_ext_sample <= 0 do 286 + bytes_before_ext_sample := !bytes_before_ext_sample + draw_sampler_bytes s; 287 + incr samples 288 + done; 289 + 
assert (!samples > 0); 290 + let callstack = Printexc.get_callstack max_int in 291 + let drop_slots = 292 + (* The last callstack slot will be exactly this function, since it's 293 + never inlined. We don't want to see it in the backtrace, so drop it 294 + here. *) 295 + 1 296 + in 297 + (* Sys.opaque_identity ensures that flambda2 doesn't move the 298 + allocation past the [unlock_tracer_ext] call *) 299 + This 300 + (Trace.Writer.put_alloc_with_suffix_of_raw_backtrace 301 + s.trace 302 + (Trace.Timestamp.now ()) 303 + ~length:size_words 304 + ~nsamples:!samples 305 + ~source:External 306 + ~callstack 307 + ~drop_slots) 308 + |> Sys.opaque_identity 309 + with 310 + | r -> 311 + Lock.unlock_sync 312 + ~report_exn:s.report_exn 313 + ~handle_deferred:handle_deferred_event 314 + s.lock; 315 + r 316 + | exception e -> 317 + mark_failed s e; 318 + Null)) 319 + ;; 320 + 321 + type ext_token = Trace.Obj_id.t 322 + 323 + let ext_alloc ~bytes = 324 + let next_sample = !bytes_before_ext_sample in 325 + if next_sample = max_int 326 + then Null 327 + else if not (Domain.is_main_domain ()) 328 + then Null 329 + else ( 330 + let n = next_sample - bytes in 331 + bytes_before_ext_sample := n; 332 + if n <= 0 333 + then 334 + (* This has [@tail] to make sure this function won't appear in any backtraces (unless 335 + it's inlined into another function, in which case we have to filter it out after 336 + the fact). 
*) 337 + ext_alloc_slowpath ~bytes [@tail] 338 + else Null) 339 + ;; 340 + 341 + let ext_free id = 342 + match !curr_active_tracer with 343 + | None -> () 344 + | Some s -> 345 + if Domain.is_main_domain () 346 + then ( 347 + match Lock.lock_sync s.lock with 348 + | Is_stopped -> () 349 + | Success -> 350 + (match Trace.Writer.put_collect s.trace (Trace.Timestamp.now ()) id with 351 + | () -> 352 + Lock.unlock_sync 353 + ~report_exn:s.report_exn 354 + ~handle_deferred:handle_deferred_event 355 + s.lock; 356 + () 357 + | exception e -> 358 + mark_failed s e; 359 + ())) 360 + ;; 361 + 362 + let () = Trace.Private.set_name_of_memprof_tracer_module __MODULE__
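The nonreentrant-lock-with-deferral discipline described in the comments above can be reduced to a small single-threaded sketch (illustrative names; thread identity, spinning, the stopped state, and repeated drain passes are all omitted): an async callback that finds the lock held by its own thread must not block, so it queues work that the synchronous holder runs on release.

```ocaml
(* Simplified model of Memprof_tracer.Lock: async events arriving while
   a synchronous critical section holds the lock are deferred, and
   drained in arrival order when the lock is released. *)
type lock =
  { mutable held_sync : bool
  ; mutable deferred : string list
  }

let events = ref []
let run_event e = events := e :: !events

(* An async callback: defer if the lock is held synchronously, else run now *)
let async_event l e = if l.held_sync then l.deferred <- e :: l.deferred else run_event e

(* A synchronous critical section: drain deferred events on release *)
let with_sync_lock l f =
  l.held_sync <- true;
  f ();
  l.held_sync <- false;
  List.iter run_event (List.rev l.deferred);
  l.deferred <- []
```

In the real tracer the deferred work is `Deferred_promote` / `Deferred_collect` events, which only need a timestamp and an object id and so are safe to replay after the fact.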
+13
src/memprof_tracer.mli
··· 1 + open Stdlib_shim 2 + 3 + type t 4 + 5 + val start : ?report_exn:(exn -> unit) -> sampling_rate:float -> Trace.Writer.t -> t 6 + val stop : t -> unit 7 + val active_tracer : unit -> t option 8 + val current_domain : unit -> Trace.Domain_id.t 9 + 10 + type ext_token [@@immediate] 11 + 12 + val ext_alloc : bytes:int -> ext_token or_null 13 + val ext_free : ext_token -> unit
+85
src/memprof_tracer_410.ml_
··· 1 + type t = 2 + { mutable locked : bool; 3 + mutable failed : bool; 4 + mutable stopped : bool; 5 + report_exn : exn -> unit; 6 + trace : Trace.Writer.t } 7 + 8 + let[@inline never] lock_tracer s = 9 + (* This is a maximally unfair spinlock. *) 10 + (* if s.locked then Printf.fprintf stderr "contention\n%!"; *) 11 + while s.locked do Thread.yield () done; 12 + if s.failed then 13 + false 14 + else 15 + (s.locked <- true; true) 16 + 17 + let[@inline never] unlock_tracer s = 18 + assert (s.locked && not s.failed); 19 + s.locked <- false 20 + 21 + let[@inline never] mark_failed s e = 22 + assert (s.locked && not s.failed); 23 + s.failed <- true; 24 + s.locked <- false; 25 + s.report_exn e 26 + 27 + let default_report_exn e = 28 + let msg = Printf.sprintf "Memtrace failure: %s\n" (Printexc.to_string e) in 29 + output_string stderr msg; 30 + Printexc.print_backtrace stderr; 31 + flush stderr 32 + 33 + let start ?(report_exn=default_report_exn) ~sampling_rate trace = 34 + let s = { trace; locked = false; stopped = false; failed = false; report_exn } in 35 + Gc.Memprof.start 36 + ~callstack_size:max_int 37 + ~minor_alloc_callback:(fun info -> 38 + if lock_tracer s then begin 39 + match Trace.Writer.put_alloc_with_raw_backtrace trace (Trace.Timestamp.now ()) 40 + ~length:info.size 41 + ~nsamples:info.n_samples 42 + ~is_major:false 43 + ~callstack:info.callstack 44 + with 45 + | r -> unlock_tracer s; Some r 46 + | exception e -> mark_failed s e; None 47 + end else None) 48 + ~major_alloc_callback:(fun info -> 49 + if lock_tracer s then begin 50 + match Trace.Writer.put_alloc_with_raw_backtrace trace (Trace.Timestamp.now ()) 51 + ~length:info.size 52 + ~nsamples:info.n_samples 53 + ~is_major:true 54 + ~callstack:info.callstack 55 + with 56 + | r -> unlock_tracer s; Some r 57 + | exception e -> mark_failed s e; None 58 + end else None) 59 + ~promote_callback:(fun id -> 60 + if lock_tracer s then 61 + match Trace.Writer.put_promote trace (Trace.Timestamp.now ()) id with 
62 + | () -> unlock_tracer s; Some id 63 + | exception e -> mark_failed s e; None 64 + else None) 65 + ~minor_dealloc_callback:(fun id -> 66 + if lock_tracer s then 67 + match Trace.Writer.put_collect trace (Trace.Timestamp.now ()) id with 68 + | () -> unlock_tracer s 69 + | exception e -> mark_failed s e) 70 + ~major_dealloc_callback:(fun id -> 71 + if lock_tracer s then 72 + match Trace.Writer.put_collect trace (Trace.Timestamp.now ()) id with 73 + | () -> unlock_tracer s 74 + | exception e -> mark_failed s e) 75 + ~sampling_rate 76 + (); 77 + s 78 + 79 + let stop s = 80 + if not s.stopped then begin 81 + s.stopped <- true; 82 + Gc.Memprof.stop (); 83 + if lock_tracer s then 84 + Trace.Writer.close s.trace 85 + end
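The lock/fail discipline in `memprof_tracer_410.ml` is worth spelling out: every memprof callback spins until the tracer is unlocked, and the first exception from the trace writer poisons the tracer permanently rather than being retried. A minimal standalone sketch of that pattern (the `guard` type is hypothetical and not part of memtrace; like the original, the test-and-set is not atomic, which memprof's callback model tolerates but a general-purpose lock would not):

```ocaml
(* Sketch of the tracer's lock/fail discipline: spin while locked,
   run the action under the lock, and permanently disable the guard
   on the first failure. *)
type guard =
  { mutable locked : bool
  ; mutable failed : bool
  }

let with_guard g f =
  while g.locked do Thread.yield () done;
  if g.failed
  then None
  else begin
    g.locked <- true;
    match f () with
    | v -> g.locked <- false; Some v
    | exception _ ->
      g.failed <- true;
      g.locked <- false;
      None
  end
```

After one failed write, every later callback returns `None` immediately, so a broken trace file cannot keep raising exceptions from inside the GC.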
+103
src/memtrace.ml
··· 1 + type tracer = Memprof_tracer.t 2 + 3 + let getpid64 () = Int64.of_int (Unix.getpid ()) 4 + 5 + let start_tracing ~if_started ~context ~sampling_rate ~filename = 6 + match Memprof_tracer.active_tracer () with 7 + | Some tracer -> 8 + (match if_started with 9 + | `Fail -> failwith "Only one Memtrace instance may be active at a time" 10 + | `Ignore -> tracer) 11 + | None -> 12 + let fd = 13 + try Unix.openfile filename Unix.[ O_CREAT; O_WRONLY ] 0o600 with 14 + | Unix.Unix_error (err, _, _) -> 15 + raise 16 + (Invalid_argument 17 + ("Cannot open memtrace file " ^ filename ^ ": " ^ Unix.error_message err)) 18 + in 19 + (try Unix.lockf fd F_TLOCK 0 with 20 + | Unix.Unix_error ((EAGAIN | EACCES), _, _) -> 21 + Unix.close fd; 22 + raise 23 + (Invalid_argument 24 + ("Cannot lock memtrace file " ^ filename ^ ": is another process using it?"))); 25 + (try Unix.ftruncate fd 0 with 26 + | Unix.Unix_error _ -> 27 + (* On special files (e.g. /dev/null), ftruncate fails. Ignoring errors 28 + here gives us the truncate-if-a-regular-file behaviour of O_TRUNC. 
*) 29 + ()); 30 + let info : Trace.Info.t = 31 + { sample_rate = sampling_rate 32 + ; word_size = Sys.word_size 33 + ; executable_name = Sys.executable_name 34 + ; host_name = Unix.gethostname () 35 + ; ocaml_runtime_params = Sys.runtime_parameters () 36 + ; pid = getpid64 () 37 + ; initial_domain = Trace.Domain_id.main_domain 38 + ; start_time = Trace.Timestamp.now () 39 + ; context 40 + } 41 + in 42 + let trace = Trace.Writer.create fd ~getpid:getpid64 info in 43 + Memprof_tracer.start ~sampling_rate trace 44 + ;; 45 + 46 + let stop_tracing t = Memprof_tracer.stop t 47 + 48 + let () = 49 + at_exit (fun () -> 50 + match Memprof_tracer.active_tracer () with 51 + | Some s -> stop_tracing s 52 + | None -> ()) 53 + ;; 54 + 55 + let currently_tracing () = 56 + match Memprof_tracer.active_tracer () with 57 + | Some _ -> true 58 + | None -> false 59 + ;; 60 + 61 + let default_sampling_rate = 1e-5 62 + 63 + let trace_if_requested ?(if_started = `Ignore) ?context ?sampling_rate () = 64 + match Sys.getenv_opt "MEMTRACE" with 65 + | None | Some "" -> () 66 + | Some filename -> 67 + (* Prevent spawned OCaml programs from being traced *) 68 + (Unix.putenv [@ocaml.alert "-unsafe_multidomain"]) "MEMTRACE" ""; 69 + let check_rate = function 70 + | Some rate when 0. < rate && rate <= 1. 
-> rate 71 + | _ -> 72 + raise 73 + (Invalid_argument 74 + ("Memtrace.trace_if_requested: " ^ "sampling_rate must be between 0 and 1")) 75 + in 76 + let sampling_rate = 77 + match sampling_rate with 78 + | Some _ -> check_rate sampling_rate 79 + | None -> 80 + (match Sys.getenv_opt "MEMTRACE_RATE" with 81 + | None | Some "" -> default_sampling_rate 82 + | Some rate -> check_rate (float_of_string_opt rate)) 83 + in 84 + let _s = start_tracing ~if_started ~context ~sampling_rate ~filename in 85 + () 86 + ;; 87 + 88 + let start_tracing ~context ~sampling_rate ~filename = 89 + (* Default of `Fail for backward compatibility *) 90 + start_tracing ~if_started:`Fail ~context ~sampling_rate ~filename 91 + ;; 92 + 93 + module Trace = Trace 94 + module Memprof_tracer = Memprof_tracer 95 + 96 + module External = struct 97 + type token = Memprof_tracer.ext_token 98 + 99 + let alloc = Memprof_tracer.ext_alloc 100 + let free = Memprof_tracer.ext_free 101 + end 102 + 103 + module Geometric_sampler = Geometric_sampler
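For reference, the environment-driven path above is the intended entry point for most programs. A minimal sketch of how an application opts in (the `"my-service"` context string is illustrative):

```ocaml
(* Call once at startup; tracing activates only when the MEMTRACE
   environment variable names an output file, and stops automatically
   at exit via the at_exit handler registered above. *)
let () =
  Memtrace.trace_if_requested ~context:"my-service" ();
  (* ... rest of the program; sampled allocations are written to the
     trace file named by MEMTRACE ... *)
  ()
```

Run as e.g. `MEMTRACE=/tmp/app.ctf ./app`; an unset or empty `MEMTRACE` leaves tracing off, and `MEMTRACE_RATE` supplies the sampling rate when none is passed in code.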
+66
src/memtrace.mli
··· 1 + open Stdlib_shim 2 + 3 + (** If the MEMTRACE environment variable is set, begin tracing to the file it specifies, 4 + and continue tracing until the process exits. 5 + 6 + The context is an arbitrary string, which is logged in the trace. It may be useful to 7 + identify trace files. 8 + 9 + The sampling_rate is the proportion of allocated words that should be sampled. Values 10 + larger than about 1e-4 will have some performance impact. The sampling rate can also 11 + be specified with the MEMTRACE_RATE environment variable. If both means are used, the 12 + env var takes precedence. 13 + 14 + May raise Unix.Unix_error if the specified file cannot be opened, or Invalid_argument 15 + if the MEMTRACE_RATE parameter is ill-formed. 16 + 17 + May call failwith if memtrace was already started and [if_started] is `Fail. *) 18 + val trace_if_requested 19 + : ?if_started:[ `Fail | `Ignore ] 20 + -> ?context:string 21 + -> ?sampling_rate:float 22 + -> unit 23 + -> unit 24 + 25 + (** Tracing can also be manually started and stopped. *) 26 + type tracer 27 + 28 + (** Manually start tracing. *) 29 + val start_tracing 30 + : context:string option 31 + -> sampling_rate:float 32 + -> filename:string 33 + -> tracer 34 + 35 + (** Manually stop tracing *) 36 + val stop_tracing : tracer -> unit 37 + 38 + (** Is there an active trace running at the moment? *) 39 + val currently_tracing : unit -> bool 40 + 41 + val default_sampling_rate : float 42 + 43 + (** Use the Trace module to read and write trace files *) 44 + module Trace = Trace 45 + 46 + (** Use Memprof_tracer in conjunction with Trace.Writer for more manual control over trace 47 + collection *) 48 + module Memprof_tracer = Memprof_tracer 49 + 50 + (** Use External to track non-GC-heap allocations in a Memtrace trace *) 51 + module External : sig 52 + type token [@@immediate] 53 + 54 + (** [alloc ~bytes] reports an allocation of a given number of bytes. 
55 + 56 + If tracing is enabled, a small fraction of the calls to this function will return 57 + [This tok], where [tok] should be passed to [free] when the object is freed. 58 + 59 + This function is very fast in the common case where it returns [Null] *) 60 + val alloc : bytes:int -> token or_null 61 + 62 + val free : token -> unit 63 + end 64 + 65 + (** (For testing) *) 66 + module Geometric_sampler = Geometric_sampler
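The `External` interface above can be sketched in use. This is a hedged example, not from the library: the callback `f` stands in for work on some out-of-heap resource, and the `Null`/`This` constructors come from `Stdlib_shim.or_null` (or OxCaml's built-in `or_null`):

```ocaml
(* Pair External.alloc with External.free around the lifetime of an
   out-of-heap allocation, keeping the token (if any) with the
   resource. *)
let with_external ~bytes f =
  let token = Memtrace.External.alloc ~bytes in
  Fun.protect
    ~finally:(fun () ->
      match token with
      | Null -> () (* this call was not sampled *)
      | This tok -> Memtrace.External.free tok)
    (fun () -> f ())
```

Only a sampled fraction of calls return `This _`, so the common path is a single cheap check.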
+300
src/memtrace.tsdl
··· 1 + /* CTF 1.8 */ 2 + /* Format specification for Memtrace v002. 3 + 4 + This specification is in CTF format (see https://diamon.org/ctf/). 5 + 6 + The trace format consists of a sequence of packets, each containing 7 + a sequence of events. 8 + */ 9 + 10 + /* basic type definitions */ 11 + typealias integer {size = 8;} := uint8; 12 + typealias integer {size = 16;} := uint16; 13 + typealias integer {size = 32;} := uint32; 14 + typealias integer {size = 64;} := uint64; 15 + typealias integer {size = 8; signed = true;} := int8; 16 + typealias integer {size = 16; signed = true;} := int16; 17 + typealias integer {size = 32; signed = true;} := int32; 18 + typealias floating_point {exp_dig = 11;mant_dig = 53;align = 8;} := float64; 19 + 20 + /* vint: variable-length encoding for unsigned 64-bit integers that 21 + are usually small. */ 22 + typealias struct { 23 + enum : uint8 24 + { u8 = 0...252, u16 = 253, u32 = 254, u64 = 255 } tag; 25 + variant <tag> { 26 + struct {} u8; 27 + uint16 u16; 28 + uint32 u32; 29 + uint64 u64; 30 + } value; 31 + } := vint; 32 + 33 + /* timestamps: microseconds since Unix epoch */ 34 + clock { 35 + name = clk; 36 + freq = 1000000; /* microsecs */ 37 + offset_s = 0; 38 + }; 39 + 40 + typealias integer { 41 + size = 64; 42 + map = clock.clk.value; 43 + } := clk64; 44 + 45 + /* CTF header: CTF 1.8, with the CTF magic number */ 46 + trace { 47 + major = 1; 48 + minor = 8; 49 + byte_order = le; 50 + packet.header := struct { uint32 magic; }; 51 + }; 52 + 53 + stream { 54 + /* Packet header */ 55 + packet.context := struct { 56 + /* Size of the packet, in bits, including header */ 57 + uint32 packet_size; 58 + 59 + /* Timestamps of all events in this packet are between 60 + timestamp_begin and timestamp_end (inclusive) */ 61 + clk64 timestamp_begin; 62 + clk64 timestamp_end; 63 + 64 + uint32 flush_duration; 65 + 66 + /* Version number of memtrace format. 67 + Same for all packets in the trace. 
*/ 68 + uint16 version; 69 + 70 + /* PID of the process that wrote the packet. 71 + Since PID can change via fork(), occasionally packets with 72 + the wrong PID can be found in a trace. These should be 73 + silently ignored. Packets are written atomically, so these 74 + bad packets can be skipped entirely. */ 75 + uint64 pid; 76 + 77 + /* Domain ID of the domain that wrote this packet. 78 + If version < 3, then this field is absent and assumed to be zero. */ 79 + uint16 domain; 80 + 81 + /* See below. */ 82 + uint16 cache_verify_ix; 83 + uint16 cache_verify_pred; 84 + uint64 cache_verify_val; 85 + 86 + /* Allocation IDs of allocation events in this packet are 87 + between alloc_id_begin and alloc_id_end (inclusive) */ 88 + uint64 alloc_id_begin; 89 + uint64 alloc_id_end; 90 + }; 91 + 92 + /* Event header (32 bits) 93 + 94 + The timestamp field here is only 25 bits long, which means it 95 + wraps after 2^25us (a bit over 30 seconds). So that timestamps 96 + can be decoded unambiguously, no packet may contain more than 30 97 + seconds worth of data. */ 98 + event.header := struct { 99 + integer {size=25; align=1; signed=false; map=clock.clk.value;} timestamp; 100 + integer {size=7; align=1; signed=false;} id; 101 + }; 102 + }; 103 + 104 + 105 + /* Event 0: trace info. 106 + 107 + Only one of these events is sent, in a packet by itself at the 108 + start of the trace. */ 109 + event { 110 + id = 0; 111 + name = "trace_info"; 112 + fields := struct { 113 + float64 sample_rate; 114 + uint8 word_size; 115 + string executable_name; 116 + string host_name; 117 + string ocaml_runtime_params; 118 + uint64 pid; 119 + string context; 120 + }; 121 + }; 122 + 123 + /* Event 1: source code location information. 124 + 125 + Location information is large, so to save space locations are 126 + represented in backtraces by 64-bit opaque identifiers. 
127 + 128 + Before the first occurrence of such an identifier in a backtrace, 129 + there must always be a location event explaining what that 130 + identifier maps to. 131 + 132 + The numeric fields below (line, start_char, end_char) have slightly 133 + odd widths, chosen to match those in OCaml's internal debug info 134 + format. 135 + 136 + The text fields below (filename, defname) are move-to-front coded. 137 + There is a 5-bit field for filename. Codes 0 to 30 refer to the 138 + most recently used 31 filenames, and code 31 means that the 139 + filename is not one of the most recent (Code 31 is followed by a 140 + string). 141 + 142 + Defnames (function and module names) are encoded in the same way, 143 + using 31 independent MTF tables for each of the 31 most recent 144 + files. */ 145 + typealias struct { 146 + integer {size=20; align=1; signed=false;} line; 147 + integer {size=8; align=1; signed=false;} start_char; 148 + integer {size=10; align=1; signed=false;} end_char; 149 + enum : integer {size=5; align=1; signed=false;} 150 + { mtf = 0...30, new = 31 } filename; 151 + enum : integer {size=5; align=1; signed=false;} 152 + { mtf = 0...30, new = 31 } defname; 153 + variant <filename> { 154 + struct {} mtf; 155 + string new; 156 + } opt_filename; 157 + variant <defname> { 158 + struct {} mtf; 159 + string new; 160 + } opt_defname; 161 + } := location; 162 + 163 + event { 164 + id = 1; 165 + name = "location"; 166 + fields := struct { 167 + uint64 code; 168 + /* A single code refers to a sequence of locations, 169 + because of inlining */ 170 + uint8 nlocs; 171 + location locs[nlocs]; 172 + }; 173 + }; 174 + 175 + 176 + /* Event 2: Allocations. 177 + 178 + Most of the data in the trace is the backtrace for each allocation, 179 + so the format for backtraces does quite a lot to save space. 180 + 181 + The backtrace is a sequence of location IDs (64-bit opaque codes, 182 + see above).
This sequence goes from caller to callee - main is at the 183 + start of the sequence, and the allocation point at the end. 184 + 185 + First, note that two successive backtraces likely share a long 186 + common prefix. These frames are not encoded at all: instead, a 187 + single vint common_prefix stores their length. 188 + 189 + The remaining suffix is a sequence of 64-bit opaque codes. A cache 190 + of size 2^14 is maintained. If one of the codes is present in the 191 + cache, only its 14-bit index need be encoded. If not, a cache slot 192 + is allocated to it, and the 14-bit index as well as the 64-bit code 193 + are encoded. 194 + 195 + Finally, each cache slot remembers which slot followed it the last 196 + time it was encoded, called its "prediction". If it's followed by 197 + the same slot next time, then the correctly predicted slot need not 198 + be encoded at all. Only the number of correct predictions is 199 + encoded, with a special case for the common cases of 0 or 1 correct 200 + predictions. 201 + 202 + In particular, this mechanism provides very short encodings of 203 + repetitive subsequences, as found in backtraces for e.g. List.map. 204 + 205 + Correct decoding of this requires that the decoder have precisely 206 + the same cache and prediction state as the encoder. As a check to 207 + ensure that these are not getting out of sync, a different slot's 208 + contents and current prediction are written in each packet header 209 + (see cache_verify_{ix,val,pred}), so that the decoder can verify 210 + that the states match what it expects. (This is not a particularly 211 + good defense against data corruption, although it will detect many 212 + errors eventually. It is intended mainly to detect bugs in the 213 + encoder and decoder implementation). 
*/ 214 + 215 + typealias struct { 216 + enum : integer {size=2; align=1; signed=false;} { hit0 = 0, hit1 = 1, hitN = 2, miss = 3 } tag; 217 + integer {size=14; align=1; signed=false;} cache_bucket; 218 + variant <tag> { 219 + struct {} hit0; 220 + struct {} hit1; 221 + uint8 hitN; 222 + uint64 miss; 223 + } v; 224 + } := backtrace_code; 225 + 226 + typealias enum : uint8 { MINOR=0, MAJOR=1, EXTERNAL=2 } := allocation_source; 227 + 228 + event { 229 + id = 2; 230 + name = "alloc"; 231 + fields := struct { 232 + /* Wosize of the allocation (length, in words, excluding header) */ 233 + vint length; 234 + /* Number of samples at this allocation. 1 <= samples <= length+1 */ 235 + vint samples; 236 + /* major/minor/external */ 237 + allocation_source source; 238 + /* Backtrace. See above. */ 239 + vint common_prefix; 240 + uint16 new_suffix; 241 + backtrace_code backtrace[new_suffix]; 242 + }; 243 + }; 244 + 245 + /* Event 3: Promotion. 246 + 247 + The only information needed for a promotion (beyond the timestamp, 248 + which is part of the header) is the ID of the object promoted. 249 + 250 + This ID must be less than or equal to the ID of the last allocation, 251 + so (last_alloc_id - promoted_id) is a nonnegative number, often small. 252 + This number is encoded as a vint */ 253 + 254 + event { 255 + id = 3; 256 + name = "promote"; 257 + fields := struct { 258 + vint obj_id_delta; 259 + }; 260 + }; 261 + 262 + /* Event 4: Collection. 263 + 264 + Same coding as promotion */ 265 + event { 266 + id = 4; 267 + name = "collect"; 268 + fields := struct { 269 + vint obj_id_delta; 270 + }; 271 + }; 272 + 273 + 274 + /* Events 101-116. Allocations (short format) 275 + 276 + Most allocations are small, on the minor heap, with a single sample, 277 + and a small number of new backtrace slots. 
These are given a shorter 278 + encoding, using the same format for backtraces */ 279 + 280 + typealias struct { 281 + vint common_prefix; 282 + uint8 new_suffix; 283 + backtrace_code backtrace[new_suffix]; 284 + } := short_alloc; 285 + event { id = 101; name = "alloc01"; fields := short_alloc; }; 286 + event { id = 102; name = "alloc02"; fields := short_alloc; }; 287 + event { id = 103; name = "alloc03"; fields := short_alloc; }; 288 + event { id = 104; name = "alloc04"; fields := short_alloc; }; 289 + event { id = 105; name = "alloc05"; fields := short_alloc; }; 290 + event { id = 106; name = "alloc06"; fields := short_alloc; }; 291 + event { id = 107; name = "alloc07"; fields := short_alloc; }; 292 + event { id = 108; name = "alloc08"; fields := short_alloc; }; 293 + event { id = 109; name = "alloc09"; fields := short_alloc; }; 294 + event { id = 110; name = "alloc10"; fields := short_alloc; }; 295 + event { id = 111; name = "alloc11"; fields := short_alloc; }; 296 + event { id = 112; name = "alloc12"; fields := short_alloc; }; 297 + event { id = 113; name = "alloc13"; fields := short_alloc; }; 298 + event { id = 114; name = "alloc14"; fields := short_alloc; }; 299 + event { id = 115; name = "alloc15"; fields := short_alloc; }; 300 + event { id = 116; name = "alloc16"; fields := short_alloc; };
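As a worked illustration of the `vint` encoding defined at the top of this file, here is a hedged OCaml sketch using the stdlib `Buffer` in place of memtrace's own `Buf` module (assumes the value is nonnegative):

```ocaml
(* vint: one byte for 0..252, otherwise a tag byte (253/254/255)
   followed by a little-endian 16-, 32- or 64-bit payload. *)
let put_vint buf (v : int64) =
  if 0L <= v && v <= 252L then Buffer.add_uint8 buf (Int64.to_int v)
  else if v <= 0xffffL then begin
    Buffer.add_uint8 buf 253;
    Buffer.add_uint16_le buf (Int64.to_int v)
  end
  else if v <= 0xffff_ffffL then begin
    Buffer.add_uint8 buf 254;
    Buffer.add_int32_le buf (Int64.to_int32 v)
  end
  else begin
    Buffer.add_uint8 buf 255;
    Buffer.add_int64_le buf v
  end
(* e.g. 300 encodes as the three bytes fd 2c 01 *)
```

The decoder mirrors this: read one byte, and if it is 253, 254 or 255, read the corresponding little-endian payload.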
+16
src/stdlib_shim.ml
··· 1 + (* In non-OxCaml builds, this file has stub implementations of some OxCaml stdlib features *) 2 + 3 + type 'a or_null = 4 + | Null 5 + | This of 'a 6 + 7 + module Obj = struct 8 + include Obj 9 + 10 + let magic_uncontended = Fun.id 11 + end 12 + 13 + module Hashtbl = struct 14 + include Hashtbl 15 + module MakeSeededPortable = MakeSeeded 16 + end
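Since `or_null` here is only a shim, a short sketch of how callers pattern-match on it (hedged; on OxCaml builds the real unboxed `or_null` has the same constructor names, so the matching shape is unchanged):

```ocaml
(* Unwrap an ['a or_null] with a default, mirroring Option.value. *)
let value_or_null ~default = function
  | Null -> default
  | This v -> v
```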
+1122
src/trace.ml
··· 1 + open Stdlib_shim 2 + (* This is the implementation of the encoder/decoder for the memtrace 3 + format. This format is quite involved, and to understand it, it's 4 + best to read the CTF specification and comments in memtrace.tsdl 5 + first. *) 6 + 7 + (* Increment this when the format changes in an incompatible way *) 8 + (* Version 2: added context field to trace_info event 9 + Version 3: added domain field to packet header *) 10 + let memtrace_version = 3 11 + 12 + (* If this is true, then all backtraces are immediately decoded and 13 + verified after encoding. This is slow, but helpful for debugging. *) 14 + let cache_enable_debug = false 15 + 16 + open Buf 17 + 18 + exception Parse_error of string 19 + 20 + let () = 21 + (Printexc.register_printer [@ocaml.alert "-unsafe_multidomain"]) (function 22 + | Parse_error s -> Some ("malformed trace: " ^ s) 23 + | _ -> None) 24 + ;; 25 + 26 + let[@inline never] bad_format s = raise (Parse_error s) 27 + let[@inline never] bad_formatf f = Printf.ksprintf (fun s -> bad_format s) f 28 + let check_fmt s b = if not b then bad_format s 29 + 30 + (* Utility types *) 31 + 32 + (* Time since the epoch *) 33 + module Timestamp = struct 34 + type t = int64 35 + 36 + let of_int64 t = t 37 + let to_int64 t = t 38 + let to_float t = Int64.to_float t /. 1_000_000. 39 + let of_float f = f *. 1_000_000. |> Int64.of_float 40 + let now () = of_float (Unix.gettimeofday ()) 41 + end 42 + 43 + (* Time since the start of the trace *) 44 + module Timedelta = struct 45 + type t = int64 46 + 47 + let to_int64 t = t 48 + let offset = Int64.add 49 + end 50 + 51 + module IntTbl = Hashtbl.MakeSeededPortable (struct 52 + type t = int 53 + 54 + let hash _seed (id : t) = 55 + let h = id * 189696287 in 56 + h lxor (h lsr 23) 57 + ;; 58 + 59 + (* Required for OCaml >= 5.0.0, but causes errors for older compilers 60 + because it is an unused value declaration.
*) 61 + let[@warning "-32"] seeded_hash = hash 62 + let equal (a : t) (b : t) = a = b 63 + end) 64 + 65 + module Domain_id = struct 66 + type t = int 67 + 68 + module Tbl = IntTbl 69 + 70 + module Expert = struct 71 + let of_int x = x 72 + end 73 + 74 + let main_domain = Expert.of_int 0 75 + end 76 + 77 + (** CTF packet headers *) 78 + 79 + (* Small enough that Unix.write still does single writes. 80 + (i.e. below 64k) *) 81 + let max_packet_size = 1 lsl 15 82 + 83 + type packet_header_info = 84 + { content_size : int (* bytes, excluding header *) 85 + ; time_begin : Timestamp.t 86 + ; time_end : Timestamp.t 87 + ; alloc_id_begin : Int64.t 88 + ; alloc_id_end : Int64.t 89 + ; pid : Int64.t 90 + ; version : int 91 + ; domain : int 92 + ; cache_verifier : Backtrace_codec.Reader.cache_verifier 93 + } 94 + 95 + (* When writing a packet, some fields can be filled in only once the 96 + packet is complete. *) 97 + type ctf_header_offsets = 98 + { off_packet_size : Write.position_32 99 + ; off_timestamp_begin : Write.position_64 100 + ; off_timestamp_end : Write.position_64 101 + ; off_flush_duration : Write.position_32 102 + ; off_alloc_begin : Write.position_64 103 + ; off_alloc_end : Write.position_64 104 + } 105 + 106 + let put_ctf_header b ~pid ~domain ~cache = 107 + let open Write in 108 + put_32 b 0xc1fc1fc1l; 109 + let off_packet_size = skip_32 b in 110 + let off_timestamp_begin = skip_64 b in 111 + let off_timestamp_end = skip_64 b in 112 + let off_flush_duration = skip_32 b in 113 + put_16 b memtrace_version; 114 + put_64 b pid; 115 + put_16 b domain; 116 + (match cache with 117 + | Some c -> Backtrace_codec.Writer.put_cache_verifier c b 118 + | None -> Backtrace_codec.Writer.put_dummy_verifier b); 119 + let off_alloc_begin = skip_64 b in 120 + let off_alloc_end = skip_64 b in 121 + { off_packet_size 122 + ; off_timestamp_begin 123 + ; off_timestamp_end 124 + ; off_flush_duration 125 + ; off_alloc_begin 126 + ; off_alloc_end 127 + } 128 + ;; 129 + 130 + let 
finish_ctf_header hdr b ~timestamp_begin ~timestamp_end ~alloc_id_begin ~alloc_id_end = 131 + let open Write in 132 + let size = b.pos in 133 + update_32 b hdr.off_packet_size (Int32.mul (Int32.of_int size) 8l); 134 + update_64 b hdr.off_timestamp_begin timestamp_begin; 135 + update_64 b hdr.off_timestamp_end timestamp_end; 136 + update_32 b hdr.off_flush_duration 0l; 137 + update_64 b hdr.off_alloc_begin (Int64.of_int alloc_id_begin); 138 + update_64 b hdr.off_alloc_end (Int64.of_int alloc_id_end) 139 + ;; 140 + 141 + let get_ctf_header b = 142 + let open Read in 143 + let start = b.pos in 144 + let magic = get_32 b in 145 + let packet_size = get_32 b in 146 + let time_begin = get_64 b in 147 + let time_end = get_64 b in 148 + let _flush_duration = get_32 b in 149 + let version = get_16 b in 150 + let pid = get_64 b in 151 + let domain = if version >= 3 then get_16 b else 0 in 152 + let cache_verifier = Backtrace_codec.Reader.get_cache_verifier b in 153 + let alloc_id_begin = get_64 b in 154 + let alloc_id_end = get_64 b in 155 + check_fmt "Not a CTF packet" (magic = 0xc1fc1fc1l); 156 + if version > memtrace_version 157 + then bad_formatf "trace format v%03d, but expected v%03d" version memtrace_version; 158 + check_fmt "Bad packet size" (packet_size >= 0l); 159 + check_fmt "Monotone packet timestamps" (time_begin <= time_end); 160 + check_fmt "Monotone alloc IDs" (alloc_id_begin <= alloc_id_end); 161 + let header_size = b.pos - start in 162 + { content_size = Int32.(to_int (div packet_size 8l) - header_size) 163 + ; time_begin 164 + ; time_end 165 + ; alloc_id_begin 166 + ; alloc_id_end 167 + ; pid 168 + ; domain 169 + ; version 170 + ; cache_verifier 171 + } 172 + ;; 173 + 174 + (** Event headers *) 175 + 176 + type evcode = 177 + | Ev_trace_info 178 + | Ev_location 179 + | Ev_alloc 180 + | Ev_promote 181 + | Ev_collect 182 + | Ev_short_alloc of int 183 + 184 + let event_code = function 185 + | Ev_trace_info -> 0 186 + | Ev_location -> 1 187 + | Ev_alloc -> 2 
188 + | Ev_promote -> 3 189 + | Ev_collect -> 4 190 + | Ev_short_alloc n -> 191 + assert (1 <= n && n <= 16); 192 + 100 + n 193 + ;; 194 + 195 + let event_of_code = function 196 + | 0 -> Ev_trace_info 197 + | 1 -> Ev_location 198 + | 2 -> Ev_alloc 199 + | 3 -> Ev_promote 200 + | 4 -> Ev_collect 201 + | n when 101 <= n && n <= 116 -> Ev_short_alloc (n - 100) 202 + | c -> bad_format ("Unknown event code " ^ string_of_int c) 203 + ;; 204 + 205 + let event_header_time_len = 25 206 + let event_header_time_mask = 0x1ffffffl 207 + 208 + (* NB: packet_max_time is less than (1 lsl event_header_time_len) microsecs *) 209 + let packet_max_time = 30 * 1_000_000 210 + 211 + let put_event_header b ev time = 212 + let open Write in 213 + let code = 214 + Int32.( 215 + logor 216 + (shift_left (of_int (event_code ev)) event_header_time_len) 217 + (logand (Int64.to_int32 time) event_header_time_mask)) 218 + in 219 + put_32 b code 220 + ;; 221 + 222 + let[@inline] get_event_header info b = 223 + let open Read in 224 + let code = get_32 b in 225 + let start_low = Int32.logand event_header_time_mask (Int64.to_int32 info.time_begin) in 226 + let time_low = Int32.logand event_header_time_mask code in 227 + let time_low = 228 + if time_low < start_low 229 + then (* Overflow *) 230 + Int32.(add time_low (of_int (1 lsl event_header_time_len))) 231 + else time_low 232 + in 233 + let time = 234 + Int64.( 235 + add 236 + (logand info.time_begin (lognot (of_int32 event_header_time_mask))) 237 + (of_int32 time_low)) 238 + in 239 + check_fmt "time in packet bounds" (info.time_begin <= time && time <= info.time_end); 240 + let ev = 241 + event_of_code Int32.(to_int (shift_right_logical code event_header_time_len)) 242 + in 243 + ev, time 244 + ;; 245 + 246 + module Location = Location_codec.Location 247 + 248 + module Obj_id = struct 249 + type t = int 250 + 251 + module Tbl = IntTbl 252 + 253 + module Allocator = struct 254 + type nonrec t = 255 + { global_ids : t Atomic.t 256 + ; mutable 
start_id : t (* first object ID this packet *) 257 + ; mutable next_id : t (* next object ID in this packet *) 258 + ; mutable last_id : t (* object ID at which we need to reallocate *) 259 + } 260 + 261 + let has_next t = t.next_id < t.last_id 262 + 263 + let read_next_exn t = 264 + if t.next_id = t.last_id then failwith "Obj_id.Allocator.next_exn: exhausted"; 265 + t.next_id 266 + ;; 267 + 268 + let take_next_exn t = 269 + let id = read_next_exn t in 270 + t.next_id <- id + 1; 271 + id 272 + ;; 273 + 274 + let ids_per_chunk = Atomic.make 10_000 275 + 276 + let new_packet t = 277 + if not (has_next t) 278 + then ( 279 + let ids_per_chunk = Atomic.get ids_per_chunk in 280 + t.next_id <- Atomic.fetch_and_add t.global_ids ids_per_chunk; 281 + t.last_id <- t.next_id + ids_per_chunk); 282 + t.start_id <- t.next_id 283 + ;; 284 + 285 + let of_global_ids global_ids = 286 + let t = { global_ids; start_id = 0; next_id = 0; last_id = 0 } in 287 + new_packet t; 288 + t 289 + ;; 290 + 291 + let create () = of_global_ids (Atomic.make 0) 292 + let for_new_domain { global_ids; _ } : unit -> t = fun () -> of_global_ids global_ids 293 + end 294 + end 295 + 296 + (** Trace info *) 297 + 298 + module Info = struct 299 + type t = 300 + { sample_rate : float 301 + ; word_size : int 302 + ; executable_name : string 303 + ; host_name : string 304 + ; ocaml_runtime_params : string 305 + ; pid : Int64.t 306 + ; initial_domain : Domain_id.t 307 + ; start_time : Timestamp.t 308 + ; context : string option 309 + } 310 + end 311 + 312 + let put_trace_info b (info : Info.t) = 313 + let open Write in 314 + put_event_header b Ev_trace_info info.start_time; 315 + put_float b info.sample_rate; 316 + put_8 b info.word_size; 317 + put_string b info.executable_name; 318 + put_string b info.host_name; 319 + put_string b info.ocaml_runtime_params; 320 + put_64 b info.pid; 321 + let context = 322 + match info.context with 323 + | None -> "" 324 + | Some s -> s 325 + in 326 + put_string b context 327 + 
;; 328 + 329 + let get_trace_info b ~packet_info = 330 + let open Read in 331 + let start_time = packet_info.time_begin in 332 + let sample_rate = get_float b in 333 + let word_size = get_8 b in 334 + let executable_name = get_string b in 335 + let host_name = get_string b in 336 + let ocaml_runtime_params = get_string b in 337 + let pid = get_64 b in 338 + let context = 339 + if packet_info.version >= 2 340 + then ( 341 + match get_string b with 342 + | "" -> None 343 + | s -> Some s) 344 + else None 345 + in 346 + { Info.start_time 347 + ; sample_rate 348 + ; word_size 349 + ; executable_name 350 + ; host_name 351 + ; ocaml_runtime_params 352 + ; pid 353 + ; initial_domain = packet_info.domain 354 + ; context 355 + } 356 + ;; 357 + 358 + (** Trace writer *) 359 + 360 + type writer = 361 + { dest : Buf.Shared_writer_fd.t 362 + ; pid : int64 363 + ; getpid : unit -> int64 364 + ; domain : Domain_id.t 365 + ; loc_writer : Location_codec.Writer.t 366 + ; cache : Backtrace_codec.Writer.t 367 + ; debug_reader_cache : Backtrace_codec.Reader.t option 368 + ; (* Locations that missed cache in this packet *) 369 + mutable new_locs : (int * Location.t list) array 370 + ; mutable new_locs_len : int 371 + ; new_locs_buf : Bytes.t 372 + ; (* Last allocation callstack *) 373 + mutable last_callstack : int array 374 + ; (* Number of slots that were dropped from the last callstack *) 375 + mutable last_dropped_slots : int 376 + ; obj_ids : Obj_id.Allocator.t 377 + ; mutable packet_time_start : Timestamp.t 378 + ; mutable packet_time_end : Timestamp.t 379 + ; mutable packet_header : ctf_header_offsets 380 + ; mutable packet : Write.t 381 + } 382 + 383 + let writer_for_domain ~dest ~pid ~getpid ~domain ~obj_ids ~start_time : writer = 384 + let packet = Write.of_bytes (Bytes.make max_packet_size '\042') in 385 + let packet_header = put_ctf_header packet ~pid ~domain ~cache:None in 386 + let cache = Backtrace_codec.Writer.create () in 387 + let debug_reader_cache = 388 + if 
cache_enable_debug then Some (Backtrace_codec.Reader.create ()) else None 389 + in 390 + let s = 391 + { dest 392 + ; pid 393 + ; getpid 394 + ; domain 395 + ; loc_writer = Location_codec.Writer.create () 396 + ; new_locs = [||] 397 + ; new_locs_len = 0 398 + ; new_locs_buf = Bytes.make max_packet_size '\042' 399 + ; cache 400 + ; debug_reader_cache 401 + ; last_callstack = [||] 402 + ; last_dropped_slots = 0 403 + ; obj_ids 404 + ; packet_time_start = start_time 405 + ; packet_time_end = start_time 406 + ; packet_header 407 + ; packet 408 + } 409 + in 410 + s 411 + ;; 412 + 413 + let make_writer dest ?getpid (info : Info.t) = 414 + let dest = Buf.Shared_writer_fd.make dest in 415 + let open Write in 416 + let getpid = 417 + match getpid with 418 + | Some getpid -> getpid 419 + | None -> fun () -> info.pid 420 + in 421 + let pid = getpid () in 422 + let domain = info.initial_domain in 423 + let packet = Write.of_bytes (Bytes.make max_packet_size '\042') in 424 + let obj_ids = Obj_id.Allocator.create () in 425 + (* Write the trace info packet *) 426 + (let hdr = put_ctf_header packet ~pid ~domain ~cache:None in 427 + put_trace_info packet info; 428 + finish_ctf_header 429 + hdr 430 + packet 431 + ~timestamp_begin:info.start_time 432 + ~timestamp_end:info.start_time 433 + ~alloc_id_begin:0 434 + ~alloc_id_end:0; 435 + write_fd dest packet); 436 + writer_for_domain ~dest ~pid ~getpid ~domain ~obj_ids ~start_time:info.start_time 437 + ;; 438 + 439 + module Location_code = struct 440 + type t = int 441 + 442 + module Tbl = IntTbl 443 + 444 + module Expert = struct 445 + let of_int t = t 446 + end 447 + end 448 + 449 + module Allocation_source = struct 450 + type t = 451 + | Minor 452 + | Major 453 + | External 454 + end 455 + 456 + module Event = struct 457 + type t = 458 + | Alloc of 459 + { obj_id : Obj_id.t 460 + ; length : int 461 + ; domain : Domain_id.t 462 + ; nsamples : int 463 + ; source : Allocation_source.t 464 + ; backtrace_buffer : Location_code.t array 465 
+ ; backtrace_length : int 466 + ; common_prefix : int 467 + } 468 + | Promote of Obj_id.t * Domain_id.t 469 + | Collect of Obj_id.t * Domain_id.t 470 + 471 + let to_string decode_loc = function 472 + | Alloc 473 + { obj_id 474 + ; length 475 + ; domain 476 + ; nsamples 477 + ; source 478 + ; backtrace_buffer 479 + ; backtrace_length 480 + ; common_prefix 481 + } -> 482 + let backtrace = 483 + List.init backtrace_length (fun i -> 484 + let s = backtrace_buffer.(i) in 485 + match decode_loc s with 486 + | [] -> Printf.sprintf "$%d" (s :> int) 487 + | ls -> String.concat " " (List.map Location.to_string ls)) 488 + |> String.concat " " 489 + in 490 + let alloc_src = 491 + match source with 492 + | Minor -> "alloc" 493 + | Major -> "alloc_major" 494 + | External -> "alloc_ext" 495 + in 496 + Printf.sprintf 497 + "%010d %s %d len=%d dom=%d % 4d: %s" 498 + (obj_id :> int) 499 + alloc_src 500 + nsamples 501 + length 502 + (domain :> int) 503 + common_prefix 504 + backtrace 505 + | Promote (id, _dom) -> Printf.sprintf "%010d promote" (id :> int) 506 + | Collect (id, _dom) -> Printf.sprintf "%010d collect" (id :> int) 507 + ;; 508 + 509 + let domain = function 510 + | Alloc { domain; _ } | Promote (_, domain) | Collect (_, domain) -> domain 511 + ;; 512 + end 513 + 514 + let log_new_loc s loc = 515 + let alen = Array.length s.new_locs in 516 + assert (s.new_locs_len <= alen); 517 + if s.new_locs_len = alen 518 + then ( 519 + let new_len = if alen = 0 then 32 else alen * 2 in 520 + let locs = Array.make new_len loc in 521 + Array.blit s.new_locs 0 locs 0 alen; 522 + s.new_locs <- locs; 523 + s.new_locs_len <- alen + 1) 524 + else ( 525 + s.new_locs.(s.new_locs_len) <- loc; 526 + s.new_locs_len <- s.new_locs_len + 1) 527 + ;; 528 + 529 + (** Flushing *) 530 + exception Pid_changed 531 + 532 + let flush_at s ~now = 533 + (* If the PID has changed, then the process forked and we're in the subprocess. 
534 + Don't write anything to the file, and raise an exception to quit tracing *) 535 + if s.pid <> s.getpid () then raise Pid_changed; 536 + let open Write in 537 + (* First, flush newly-seen locations. 538 + These must be emitted before any events that might refer to them *) 539 + let i = ref 0 in 540 + while !i < s.new_locs_len do 541 + let b = Write.of_bytes s.new_locs_buf in 542 + let hdr = put_ctf_header b ~pid:s.pid ~domain:s.domain ~cache:None in 543 + while !i < s.new_locs_len && remaining b > Location_codec.Writer.max_length do 544 + put_event_header b Ev_location s.packet_time_start; 545 + Location_codec.Writer.put_location s.loc_writer b s.new_locs.(!i); 546 + incr i 547 + done; 548 + finish_ctf_header 549 + hdr 550 + b 551 + ~timestamp_begin:s.packet_time_start 552 + ~timestamp_end:s.packet_time_start 553 + ~alloc_id_begin:s.obj_ids.start_id 554 + ~alloc_id_end:s.obj_ids.start_id; 555 + write_fd s.dest b 556 + done; 557 + (* Next, flush the actual events *) 558 + finish_ctf_header 559 + s.packet_header 560 + s.packet 561 + ~timestamp_begin:s.packet_time_start 562 + ~timestamp_end:s.packet_time_end 563 + ~alloc_id_begin:s.obj_ids.start_id 564 + ~alloc_id_end:s.obj_ids.next_id; 565 + write_fd s.dest s.packet; 566 + (* Finally, reset the buffer *) 567 + s.packet_time_start <- now; 568 + s.packet_time_end <- now; 569 + s.new_locs_len <- 0; 570 + s.packet <- Write.of_bytes s.packet.buf; 571 + Obj_id.Allocator.new_packet s.obj_ids; 572 + s.packet_header 573 + <- put_ctf_header s.packet ~pid:s.pid ~domain:s.domain ~cache:(Some s.cache) 574 + ;; 575 + 576 + let max_ev_size = 577 + 100 578 + (* upper bound on fixed-size portion of events 579 + (i.e. 
not backtraces or locations) *) 580 + + max Location_codec.Writer.max_length Backtrace_codec.Writer.max_length 581 + ;; 582 + 583 + let begin_event s ev ~(now : Timestamp.t) = 584 + let open Write in 585 + if remaining s.packet < max_ev_size 586 + || s.new_locs_len > 128 587 + || Int64.(sub now s.packet_time_start > of_int packet_max_time) 588 + || not (Obj_id.Allocator.has_next s.obj_ids) 589 + then flush_at s ~now; 590 + s.packet_time_end <- now; 591 + put_event_header s.packet ev now 592 + ;; 593 + 594 + let flush s = flush_at s ~now:s.packet_time_end 595 + 596 + (* Returns length of the longest suffix of curr which is also a suffix of prev *) 597 + let find_common_suffix (prev : int array) prev_start (curr : int array) curr_start = 598 + assert (prev_start >= 0); 599 + assert (curr_start >= 0); 600 + let i = ref (Array.length curr - 1) 601 + and j = ref (Array.length prev - 1) in 602 + while !i >= curr_start && !j >= prev_start do 603 + if Array.unsafe_get curr !i = Array.unsafe_get prev !j 604 + then ( 605 + decr i; 606 + decr j) 607 + else j := -1 608 + done; 609 + (* !i is now the highest index of curr that doesn't match prev *) 610 + Array.length curr - (!i + 1) 611 + ;; 612 + 613 + type alloc_length_format = 614 + | Len_short of Write.position_8 615 + | Len_long of Write.position_16 616 + 617 + let put_alloc 618 + s 619 + now 620 + ~length 621 + ~nsamples 622 + ~source 623 + ~callstack 624 + ~callstack_as_ints 625 + ~decode_callstack_entry 626 + ~drop_slots 627 + = 628 + let open Write in 629 + let common_len = 630 + find_common_suffix s.last_callstack s.last_dropped_slots callstack_as_ints drop_slots 631 + in 632 + let new_len = Array.length callstack_as_ints - common_len in 633 + s.last_callstack <- callstack_as_ints; 634 + s.last_dropped_slots <- drop_slots; 635 + let is_short = 636 + 1 <= length 637 + && length <= 16 638 + && source = Allocation_source.Minor 639 + && nsamples = 1 640 + && new_len < 256 641 + in 642 + begin_event s (if is_short then 
Ev_short_alloc length else Ev_alloc) ~now; 643 + let id = Obj_id.Allocator.take_next_exn s.obj_ids in 644 + let cache = s.cache in 645 + let b = s.packet in 646 + let src_code = 647 + match source with 648 + | Minor -> 0 649 + | Major -> 1 650 + | External -> 2 651 + in 652 + let bt_len_off = 653 + if is_short 654 + then ( 655 + put_vint b common_len; 656 + Len_short (skip_8 b)) 657 + else ( 658 + put_vint b length; 659 + put_vint b nsamples; 660 + put_8 b src_code; 661 + put_vint b common_len; 662 + Len_long (skip_16 b)) 663 + in 664 + let bt_elem_off = b.pos in 665 + let log_new_location ~index = 666 + log_new_loc s (callstack_as_ints.(index), decode_callstack_entry callstack index) 667 + in 668 + let nencoded = 669 + Backtrace_codec.Writer.put_backtrace 670 + cache 671 + b 672 + ~alloc_id:id 673 + ~callstack:callstack_as_ints 674 + ~callstack_pos:drop_slots 675 + ~callstack_len:new_len 676 + ~log_new_location 677 + in 678 + (match bt_len_off with 679 + | Len_short p -> 680 + assert (nencoded <= 0xff); 681 + update_8 b p nencoded 682 + | Len_long p -> 683 + (* This can't overflow because there isn't room in a packet for more than 684 + 0xffff entries. 
(See max_packet_size) *) 685 + assert (nencoded <= 0xffff); 686 + update_16 b p nencoded); 687 + (match s.debug_reader_cache with 688 + | None -> () 689 + | Some c -> 690 + let open Read in 691 + (* Decode the backtrace and check that it matches *) 692 + let b' = Read.of_bytes_sub b.buf ~pos:bt_elem_off ~pos_end:b.pos in 693 + let decoded, decoded_len = 694 + Backtrace_codec.Reader.get_backtrace c b' ~nencoded ~common_pfx_len:common_len 695 + in 696 + assert (remaining b' = 0); 697 + let rev_callstack = 698 + callstack_as_ints |> Array.to_list |> List.rev |> Array.of_list 699 + in 700 + if Array.sub decoded 0 decoded_len <> rev_callstack 701 + then ( 702 + rev_callstack |> Array.map Int64.of_int |> Array.iter (Printf.printf " %08Lx"); 703 + Printf.printf " !\n%!"; 704 + Array.sub decoded 0 decoded_len |> Array.iter (Printf.printf " %08x"); 705 + Printf.printf " !\n%!"; 706 + failwith "bad coded backtrace")); 707 + id 708 + ;; 709 + 710 + let get_alloc ~parse_backtraces ~domain evcode cache alloc_id b = 711 + let open Read in 712 + let is_short, length, nsamples, source = 713 + match evcode with 714 + | Ev_short_alloc n -> true, n, 1, Allocation_source.Minor 715 + | Ev_alloc -> 716 + let length = get_vint b in 717 + let nsamples = get_vint b in 718 + let source : Allocation_source.t = 719 + match get_8 b with 720 + | 0 -> Minor 721 + | 1 -> Major 722 + | 2 -> External 723 + | _ -> bad_format "source" 724 + in 725 + false, length, nsamples, source 726 + | _ -> assert false 727 + in 728 + let common_pfx_len = get_vint b in 729 + let nencoded = if is_short then get_8 b else get_16 b in 730 + let backtrace_buffer, backtrace_length = 731 + if parse_backtraces 732 + then Backtrace_codec.Reader.get_backtrace cache b ~nencoded ~common_pfx_len 733 + else ( 734 + Backtrace_codec.Reader.skip_backtrace cache b ~nencoded ~common_pfx_len; 735 + [||], 0) 736 + in 737 + Event.Alloc 738 + { obj_id = alloc_id 739 + ; length 740 + ; domain 741 + ; nsamples 742 + ; source 743 + ; 
backtrace_buffer 744 + ; backtrace_length 745 + ; common_prefix = common_pfx_len 746 + } 747 + ;; 748 + 749 + (* The other events are much simpler *) 750 + 751 + let put_promote s now id = 752 + let open Write in 753 + begin_event s Ev_promote ~now; 754 + let b = s.packet in 755 + put_vint b (s.obj_ids.next_id - 1 - id) 756 + ;; 757 + 758 + let get_promote ~domain alloc_id b = 759 + let open Read in 760 + let id_delta = get_vint b in 761 + check_fmt "promote id sync" (id_delta >= 0); 762 + let id = alloc_id - 1 - id_delta in 763 + Event.Promote (id, domain) 764 + ;; 765 + 766 + let put_collect s now id = 767 + let open Write in 768 + begin_event s Ev_collect ~now; 769 + let b = s.packet in 770 + put_vint b (s.obj_ids.next_id - 1 - id) 771 + ;; 772 + 773 + let get_collect ~domain alloc_id b = 774 + let open Read in 775 + let id_delta = get_vint b in 776 + (* Typically, id_delta >= 0, because you are collecting an object with an earlier object 777 + ID. However, a tricky case in domain termination (collecting an object previously 778 + allocated by a now-terminated domain) means that this is not necessarily the case, so 779 + there's no assertion here *) 780 + let id = alloc_id - 1 - id_delta in 781 + Event.Collect (id, domain) 782 + ;; 783 + 784 + (** Trace reader *) 785 + 786 + type reader = 787 + { fd : Unix.file_descr 788 + ; info : Info.t 789 + ; data_off : int 790 + ; loc_table : Location.t list Location_code.Tbl.t 791 + } 792 + 793 + let make_reader fd = 794 + let open Read in 795 + let buf = Bytes.make max_packet_size '\042' in 796 + let start_pos = Unix.lseek fd 0 SEEK_CUR in 797 + let b = read_fd fd buf in 798 + let packet_info = get_ctf_header b in 799 + let header_size = b.pos in 800 + let b, _ = split b packet_info.content_size in 801 + check_fmt "trace info packet size" (remaining b >= packet_info.content_size); 802 + let ev, evtime = get_event_header packet_info b in 803 + check_fmt "trace info packet code" (ev = Ev_trace_info); 804 + check_fmt "trace 
info packet time" (evtime = packet_info.time_begin); 805 + let trace_info = get_trace_info b ~packet_info in 806 + check_fmt "trace info packet done" (remaining b = 0); 807 + let loc_table = Location_code.Tbl.create 20 in 808 + let data_off = start_pos + header_size + packet_info.content_size in 809 + { fd; info = trace_info; data_off; loc_table } 810 + ;; 811 + 812 + let report_hack fmt = Printf.kfprintf (fun ppf -> Printf.fprintf ppf "\n%!") stderr fmt 813 + 814 + let refill_to size fd stream = 815 + let open Read in 816 + if remaining stream < size then refill_fd fd stream else stream 817 + ;; 818 + 819 + let iter s ?(parse_backtraces = true) f = 820 + let open Read in 821 + let per_domain = Domain_id.Tbl.create 1 in 822 + let iter_events_of_packet (packet_header : packet_header_info) b = 823 + let domain = packet_header.domain in 824 + let alloc_id = ref (Int64.to_int packet_header.alloc_id_begin) in 825 + let loc_reader, cache, last_timestamp = 826 + try Domain_id.Tbl.find per_domain domain with 827 + | Not_found -> 828 + let reader = Location_codec.Reader.create () in 829 + let cache = Backtrace_codec.Reader.create () in 830 + let last_timestamp = ref s.info.start_time in 831 + Domain_id.Tbl.add per_domain domain (reader, cache, last_timestamp); 832 + reader, cache, last_timestamp 833 + in 834 + if parse_backtraces 835 + then 836 + if not 837 + (Backtrace_codec.Reader.check_cache_verifier 838 + cache 839 + packet_header.cache_verifier) 840 + then bad_format "cache verification"; 841 + while remaining b > 0 do 842 + let ev, time = get_event_header packet_header b in 843 + check_fmt "monotone timestamps" (!last_timestamp <= time); 844 + last_timestamp := time; 845 + let dt = Int64.(sub time s.info.start_time) in 846 + match ev with 847 + | Ev_trace_info -> bad_format "Multiple trace-info events present" 848 + | Ev_location -> 849 + let id, loc = Location_codec.Reader.get_location loc_reader b in 850 + (*Printf.printf "%3d _ _ location\n" (b.pos - last_pos);*) 
851 + if Location_code.Tbl.mem s.loc_table id 852 + then 853 + check_fmt 854 + "consistent location info" 855 + (Location_code.Tbl.find s.loc_table id = loc) 856 + else Location_code.Tbl.add s.loc_table id loc 857 + | (Ev_alloc | Ev_short_alloc _) as evcode -> 858 + let info = get_alloc ~parse_backtraces ~domain evcode cache !alloc_id b in 859 + incr alloc_id; 860 + (*Printf.printf "%3d " (b.pos - last_pos);*) 861 + f dt info 862 + | Ev_collect -> 863 + let info = get_collect ~domain !alloc_id b in 864 + (*Printf.printf "%3d " (b.pos - last_pos);*) 865 + f dt info 866 + | Ev_promote -> 867 + let info = get_promote ~domain !alloc_id b in 868 + (*Printf.printf "%3d " (b.pos - last_pos);*) 869 + f dt info 870 + done; 871 + check_fmt "alloc id sync" (packet_header.alloc_id_end = Int64.of_int !alloc_id) 872 + in 873 + Unix.lseek s.fd s.data_off SEEK_SET |> ignore; 874 + let rec iter_packets stream = 875 + let header_upper_bound = 200 (* more than big enough for a header *) in 876 + let stream = refill_to header_upper_bound s.fd stream in 877 + if remaining stream = 0 878 + then () 879 + else ( 880 + let packet_header = get_ctf_header stream in 881 + let stream = refill_to packet_header.content_size s.fd stream in 882 + let packet, rest = split stream packet_header.content_size in 883 + if packet_header.pid <> s.info.pid 884 + then 885 + report_hack 886 + "skipping bad packet (wrong pid: %Ld, but tracing %Ld)" 887 + packet_header.pid 888 + s.info.pid 889 + else if remaining packet <> packet_header.content_size 890 + then report_hack "skipping truncated packet" 891 + else iter_events_of_packet packet_header packet; 892 + iter_packets rest) 893 + in 894 + iter_packets (read_fd s.fd (Bytes.make max_packet_size '\000')) 895 + ;; 896 + 897 + module Private = struct 898 + let name_of_memprof_tracer = Atomic.make "" 899 + 900 + let set_name_of_memprof_tracer_module s = 901 + Atomic.set name_of_memprof_tracer (s ^ ".ext_alloc") 902 + ;; 903 + 904 + let obj_ids_per_chunk = 
Obj_id.Allocator.ids_per_chunk 905 + end 906 + 907 + module Writer = struct 908 + type t = writer 909 + 910 + exception Pid_changed = Pid_changed 911 + 912 + let create = make_writer 913 + let domain t = t.domain 914 + 915 + let for_domain_at_time ~start_time t : domain:int -> t = 916 + let { dest; pid; getpid; _ } = t in 917 + let obj_ids = Obj_id.Allocator.for_new_domain t.obj_ids in 918 + fun ~domain -> 919 + let obj_ids = obj_ids () in 920 + let t = writer_for_domain ~dest ~pid ~getpid ~domain ~obj_ids ~start_time in 921 + t 922 + ;; 923 + 924 + let for_domain t = for_domain_at_time ~start_time:t.packet_time_end t 925 + 926 + (* Unfortunately, efficient access to the backtrace is not possible 927 + with the current Printexc API, even though internally it's an int 928 + array. For now, wave the Obj.magic wand. There's a PR to fix this: 929 + https://github.com/ocaml/ocaml/pull/9663 *) 930 + let location_code_array_of_raw_backtrace (b : Printexc.raw_backtrace) 931 + : Location_code.t array 932 + = 933 + Obj.magic b 934 + ;; 935 + 936 + (* Is this a location that we'd prefer to leave out of traces? This mechanism only 937 + really makes sense for inlinable functions, so that we can drop a single frame out of 938 + a backtrace slot. For non-inlinable functions like 939 + [Memprof_tracer.ext_alloc_slowpath], we instead avoid capturing the slot to begin 940 + with (using [put_alloc_with_suffix_of_raw_backtrace]). 
*) 941 + let is_internal_location (loc : Location.t) = 942 + String.equal loc.defname (Atomic.get Private.name_of_memprof_tracer) 943 + ;; 944 + 945 + let decode_raw_backtrace_entry callstack i : Location.t list = 946 + let open Printexc in 947 + let rec get_locations slot : Location.t list = 948 + let tail = 949 + match get_raw_backtrace_next_slot slot with 950 + | None -> [] 951 + | Some slot -> get_locations slot 952 + in 953 + let slot = convert_raw_backtrace_slot slot in 954 + match Slot.location slot with 955 + | None -> tail 956 + | Some { filename; line_number; start_char; end_char; _ } -> 957 + let defname = 958 + match Slot.name slot with 959 + | Some n -> n 960 + | _ -> "??" 961 + in 962 + { filename; line = line_number; start_char; end_char; defname } :: tail 963 + in 964 + let locs = get_locations (get_raw_backtrace_slot callstack i) |> List.rev in 965 + match List.filter (fun loc -> not (is_internal_location loc)) locs with 966 + | [] -> 967 + (* It would break things to return an empty list here, and the worst that happens if 968 + we return the whole list is occasionally slightly confusing output (it looks like 969 + Memtrace is using memory rather than user code). 
*) 970 + locs 971 + | at_least_one -> at_least_one 972 + ;; 973 + 974 + let put_alloc_with_suffix_of_raw_backtrace 975 + t 976 + now 977 + ~length 978 + ~nsamples 979 + ~source 980 + ~callstack 981 + ~drop_slots 982 + = 983 + let callstack_as_ints = location_code_array_of_raw_backtrace callstack in 984 + put_alloc 985 + t 986 + now 987 + ~length 988 + ~nsamples 989 + ~source 990 + ~callstack 991 + ~callstack_as_ints 992 + ~decode_callstack_entry:decode_raw_backtrace_entry 993 + ~drop_slots 994 + ;; 995 + 996 + let put_alloc_with_raw_backtrace t now ~length ~nsamples ~source ~callstack = 997 + let callstack_as_ints = location_code_array_of_raw_backtrace callstack in 998 + put_alloc 999 + t 1000 + now 1001 + ~length 1002 + ~nsamples 1003 + ~source 1004 + ~callstack 1005 + ~callstack_as_ints 1006 + ~decode_callstack_entry:decode_raw_backtrace_entry 1007 + ~drop_slots:0 1008 + ;; 1009 + 1010 + let put_alloc t now ~length ~nsamples ~source ~callstack ~decode_callstack_entry = 1011 + let decode_callstack_entry cs i = decode_callstack_entry cs.(i) in 1012 + put_alloc 1013 + t 1014 + now 1015 + ~length 1016 + ~nsamples 1017 + ~source 1018 + ~callstack 1019 + ~callstack_as_ints:callstack 1020 + ~decode_callstack_entry 1021 + ~drop_slots:0 1022 + ;; 1023 + 1024 + let put_collect = put_collect 1025 + let put_promote = put_promote 1026 + let flush = flush 1027 + 1028 + let close t = 1029 + flush t; 1030 + Buf.Shared_writer_fd.close t.dest 1031 + ;; 1032 + 1033 + let put_event w ~decode_callstack_entry now (ev : Event.t) = 1034 + if Event.domain ev <> w.domain 1035 + then raise (Invalid_argument "Trace.put_event: mismatched domain fields"); 1036 + if now < w.packet_time_end 1037 + then raise (Invalid_argument "Trace.put_event: out-of-order timestamps"); 1038 + match ev with 1039 + | Alloc 1040 + { obj_id 1041 + ; length 1042 + ; domain = _ 1043 + ; nsamples 1044 + ; source 1045 + ; backtrace_buffer 1046 + ; backtrace_length 1047 + ; common_prefix = _ 1048 + } -> 1049 + let 
btrev = 1050 + Array.init backtrace_length (fun i -> backtrace_buffer.(backtrace_length - 1 - i)) 1051 + in 1052 + let id = 1053 + put_alloc w now ~length ~nsamples ~source ~callstack:btrev ~decode_callstack_entry 1054 + in 1055 + if id <> obj_id then raise (Invalid_argument "Incorrect allocation ID") 1056 + | Promote (id, _domain) -> put_promote w now id 1057 + | Collect (id, _domain) -> put_collect w now id 1058 + ;; 1059 + 1060 + module Multiplexed_domains = struct 1061 + type nonrec t = 1062 + { mutable last_domain : Domain_id.t 1063 + ; (* Invariant: all writers except possibly that of [last_domain] are flushed *) 1064 + writers : t Domain_id.Tbl.t 1065 + ; start_time : Timestamp.t 1066 + } 1067 + 1068 + let create dest ?getpid info = 1069 + let w = create dest ?getpid info in 1070 + let writers = Domain_id.Tbl.create 1 in 1071 + let dom = domain w in 1072 + Domain_id.Tbl.add writers dom w; 1073 + { last_domain = dom; writers; start_time = info.start_time } 1074 + ;; 1075 + 1076 + let writer_for_domain t ~domain = 1077 + let last_w = Domain_id.Tbl.find t.writers t.last_domain in 1078 + if domain = t.last_domain 1079 + then last_w 1080 + else ( 1081 + flush last_w; 1082 + t.last_domain <- domain; 1083 + try Domain_id.Tbl.find t.writers domain with 1084 + | Not_found -> 1085 + let w = (for_domain_at_time ~start_time:t.start_time last_w) ~domain in 1086 + Domain_id.Tbl.add t.writers domain w; 1087 + w) 1088 + ;; 1089 + 1090 + let next_alloc_id t ~domain = 1091 + let w = writer_for_domain t ~domain in 1092 + if not (Obj_id.Allocator.has_next w.obj_ids) then flush_at w ~now:w.packet_time_end; 1093 + Obj_id.Allocator.read_next_exn w.obj_ids 1094 + ;; 1095 + 1096 + let put_event t ~decode_callstack_entry time ev = 1097 + let w = writer_for_domain t ~domain:(Event.domain ev) in 1098 + put_event w ~decode_callstack_entry time ev 1099 + ;; 1100 + 1101 + let flush t = flush (Domain_id.Tbl.find t.writers t.last_domain) 1102 + end 1103 + end 1104 + 1105 + module Reader = 
struct 1106 + type t = reader 1107 + 1108 + let create = make_reader 1109 + let info s = s.info 1110 + 1111 + let lookup_location_code { loc_table; _ } code = 1112 + match Location_code.Tbl.find loc_table code with 1113 + | v -> v 1114 + | exception Not_found -> 1115 + raise (Invalid_argument (Printf.sprintf "invalid location code %08x" code)) 1116 + ;; 1117 + 1118 + let iter = iter 1119 + let open_ ~filename = make_reader (Unix.openfile filename [ Unix.O_RDONLY ] 0) 1120 + let size_bytes s = (Unix.LargeFile.fstat s.fd).st_size 1121 + let close s = Unix.close s.fd 1122 + end
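The writer in `trace.ml` above delta-encodes consecutive backtraces: `find_common_suffix` measures how much of the raw (callee-first) callstack is shared with the previous event, so only the changed portion is written. A simplified standalone sketch of that computation (omitting the `prev_start`/`curr_start` drop-slot offsets the real function takes):

```ocaml
(* Length of the longest common suffix of two arrays, mirroring the
   delta-encoding in [put_alloc]: only [Array.length curr - common]
   new elements need to be emitted per event. *)
let common_suffix_length (prev : int array) (curr : int array) =
  let i = ref (Array.length curr - 1) in
  let j = ref (Array.length prev - 1) in
  while !i >= 0 && !j >= 0 && curr.(!i) = prev.(!j) do
    decr i;
    decr j
  done;
  Array.length curr - (!i + 1)

let () =
  assert (common_suffix_length [| 1; 2; 3 |] [| 9; 2; 3 |] = 2);
  assert (common_suffix_length [| 1; 2 |] [| 1; 2 |] = 2);
  assert (common_suffix_length [| 1 |] [| 2 |] = 0)
```

Because raw backtraces are callee-first, a common suffix here corresponds to the `common_prefix` field of the caller-first `Event.Alloc` record.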
+238
src/trace.mli
··· 1 + (** Encoder and decoder for Memtrace traces *) 2 + 3 + (** Timestamps *) 4 + module Timestamp : sig 5 + type t 6 + 7 + val now : unit -> t 8 + 9 + (** Convert to and from the number of microseconds since the Unix epoch *) 10 + val of_int64 : int64 -> t 11 + 12 + val to_int64 : t -> int64 13 + 14 + (** Convert back and forth between the Unix module's float format and timestamps *) 15 + val to_float : t -> float 16 + 17 + val of_float : float -> t 18 + end 19 + 20 + (** Times measured from the start of the trace *) 21 + module Timedelta : sig 22 + type t 23 + 24 + (** Convert to the number of microseconds since the start of the trace *) 25 + val to_int64 : t -> int64 26 + 27 + val offset : Timestamp.t -> t -> Timestamp.t 28 + end 29 + 30 + (** Source locations in the traced program *) 31 + module Location : sig 32 + type t = 33 + { filename : string 34 + ; line : int 35 + ; start_char : int 36 + ; end_char : int 37 + ; defname : string 38 + } 39 + 40 + val to_string : t -> string 41 + val unknown : t 42 + end 43 + 44 + (** Identifiers to represent allocations *) 45 + module Obj_id : sig 46 + type t = private int 47 + 48 + (** For convenience, a hashtable keyed by object ID *) 49 + module Tbl : Hashtbl.SeededS with type key = t 50 + end 51 + 52 + (** Identifiers to represent domains *) 53 + module Domain_id : sig 54 + type t = private int 55 + 56 + val main_domain : t 57 + 58 + module Expert : sig 59 + val of_int : int -> t 60 + end 61 + end 62 + 63 + (** Codes for subsequences of locations in a backtrace *) 64 + module Location_code : sig 65 + type t = private int 66 + 67 + (** For convenience, a hashtable keyed by location code *) 68 + module Tbl : Hashtbl.SeededS with type key = t 69 + 70 + module Expert : sig 71 + val of_int : int -> t 72 + end 73 + end 74 + 75 + (** Types of allocation *) 76 + module Allocation_source : sig 77 + type t = 78 + | Minor 79 + | Major 80 + | External 81 + end 82 + 83 + (** Trace events *) 84 + module Event : sig 85 + type t = 
86 + | Alloc of 87 + { obj_id : Obj_id.t 88 + (** An identifier for this allocation, used to refer to it in other events. These 89 + identifiers are generated in allocation order. 90 + 91 + Identifiers are not reused across domains. *) 92 + ; length : int 93 + (** Length of the sampled allocation, in words, not including header word *) 94 + ; domain : Domain_id.t 95 + (** The ID of the domain on which this event occurred. 96 + 97 + All events with the same obj_id will occur on the same domain, unless that 98 + domain terminates before collecting this object (in which case promotion or 99 + collection events may occur on an arbitrary domain, but will always appear in 100 + order Alloc-Promote-Collect for a given obj_id). 101 + 102 + Events from different domains and different obj_ids may appear out-of-order in 103 + a trace: timestamps and object identifiers are only guaranteed to increase 104 + monotonically among events from a single domain. Readers that require a global 105 + order of events must merge the streams of events from different domains. *) 106 + ; nsamples : int (** Number of samples made in this allocation. At least 1. *) 107 + ; source : Allocation_source.t (** How this object was initially allocated *) 108 + ; backtrace_buffer : Location_code.t array 109 + (** Backtrace of the allocation. The backtrace elements are stored in order from 110 + caller to callee. The first element is the main entrypoint and the last is the 111 + allocation. 112 + 113 + NB: this is a mutable buffer, reused between events. Entries at indices beyond 114 + [backtrace_length - 1] are not meaningful. If you want to store backtraces, 115 + you must copy them using: [Array.sub backtrace_buffer 0 backtrace_length]. *) 116 + ; backtrace_length : int (** Length of the backtrace *) 117 + ; common_prefix : int 118 + (** A prefix of this length has not changed since the last allocation event that 119 + occurred on the same domain. 
*) 120 + } 121 + | Promote of Obj_id.t * Domain_id.t 122 + | Collect of Obj_id.t * Domain_id.t 123 + 124 + val domain : t -> Domain_id.t 125 + val to_string : (Location_code.t -> Location.t list) -> t -> string 126 + end 127 + 128 + (** Global trace info *) 129 + module Info : sig 130 + type t = 131 + { sample_rate : float 132 + ; word_size : int 133 + ; executable_name : string 134 + ; host_name : string 135 + ; ocaml_runtime_params : string 136 + ; pid : Int64.t 137 + ; initial_domain : Domain_id.t 138 + ; start_time : Timestamp.t 139 + ; context : string option 140 + } 141 + end 142 + 143 + (** Writing traces *) 144 + module Writer : sig 145 + type t 146 + 147 + exception Pid_changed 148 + 149 + val create : Unix.file_descr -> ?getpid:(unit -> int64) -> Info.t -> t 150 + val domain : t -> Domain_id.t 151 + val for_domain : t -> domain:Domain_id.t -> t 152 + 153 + (** All of the functions below may raise Unix_error if writing to the file descriptor 154 + fails, or Pid_changed if getpid returns a different value. *) 155 + 156 + val put_alloc 157 + : t 158 + -> Timestamp.t 159 + -> length:int 160 + -> nsamples:int 161 + -> source:Allocation_source.t 162 + -> callstack:Location_code.t array 163 + -> decode_callstack_entry:(Location_code.t -> Location.t list) 164 + -> Obj_id.t 165 + 166 + val put_alloc_with_raw_backtrace 167 + : t 168 + -> Timestamp.t 169 + -> length:int 170 + -> nsamples:int 171 + -> source:Allocation_source.t 172 + -> callstack:Printexc.raw_backtrace 173 + -> Obj_id.t 174 + 175 + val put_alloc_with_suffix_of_raw_backtrace 176 + : t 177 + -> Timestamp.t 178 + -> length:int 179 + -> nsamples:int 180 + -> source:Allocation_source.t 181 + -> callstack:Printexc.raw_backtrace 182 + -> drop_slots:int 183 + -> Obj_id.t 184 + 185 + val put_collect : t -> Timestamp.t -> Obj_id.t -> unit 186 + val put_promote : t -> Timestamp.t -> Obj_id.t -> unit 187 + 188 + (** Flushes currently buffered events. 
The [file_descr] is not closed, and more events 189 + may be written after flush. *) 190 + val flush : t -> unit 191 + 192 + (** Flushes and closes the underlying [file_descr] *) 193 + val close : t -> unit 194 + 195 + module Multiplexed_domains : sig 196 + (** A set of per-domain [Writer.t]s, allowing events from different domains to be 197 + multiplexed into a single stream *) 198 + type t 199 + 200 + val create : Unix.file_descr -> ?getpid:(unit -> int64) -> Info.t -> t 201 + 202 + val put_event 203 + : t 204 + -> decode_callstack_entry:(Location_code.t -> Location.t list) 205 + -> Timestamp.t 206 + -> Event.t 207 + -> unit 208 + 209 + val next_alloc_id : t -> domain:Domain_id.t -> Obj_id.t 210 + val flush : t -> unit 211 + end 212 + end 213 + 214 + (** Reading traces *) 215 + module Reader : sig 216 + type t 217 + 218 + val create : Unix.file_descr -> t 219 + val info : t -> Info.t 220 + val lookup_location_code : t -> Location_code.t -> Location.t list 221 + 222 + (** Iterate over a trace *) 223 + val iter : t -> ?parse_backtraces:bool -> (Timedelta.t -> Event.t -> unit) -> unit 224 + 225 + (** Convenience functions for accessing traces stored in files *) 226 + val open_ : filename:string -> t 227 + 228 + val size_bytes : t -> int64 229 + val close : t -> unit 230 + end 231 + 232 + module Private : sig 233 + (*_ Internal hack to avoid depending on build-system-specific behavior *) 234 + val set_name_of_memprof_tracer_module : string -> unit 235 + 236 + (* For testing *) 237 + val obj_ids_per_chunk : int Atomic.t 238 + end
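As a usage sketch of the `Reader` half of this interface: the following hypothetical dumper prints every event with decoded locations, using only functions declared in the `.mli` above (it assumes the `memtrace` library is available and `filename` points at an existing trace; error handling is omitted):

```ocaml
(* Minimal trace dump built on Memtrace.Trace.Reader. *)
let dump filename =
  let open Memtrace.Trace in
  let r = Reader.open_ ~filename in
  let info = Reader.info r in
  Printf.printf "trace of %s (pid %Ld)\n" info.executable_name info.pid;
  Reader.iter r (fun time ev ->
    (* [lookup_location_code] resolves codes already seen in the stream. *)
    Printf.printf
      "%8Ld us: %s\n"
      (Timedelta.to_int64 time)
      (Event.to_string (Reader.lookup_location_code r) ev));
  Reader.close r
```

This mirrors what `bin/dump_trace.exe` in the test rules below does: `Reader.iter` records location events into the reader's table as it encounters them, so location codes referenced by later allocation events can be decoded.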
+26
test/copy.ml
··· 1 + open Memtrace.Trace 2 + 3 + let copy inf outf = 4 + let r = Reader.open_ ~filename:inf in 5 + let wfd = Unix.openfile outf [ O_CREAT; O_WRONLY; O_TRUNC ] 0o600 in 6 + let info = Reader.info r in 7 + let pid = info.pid in 8 + let w = Writer.Multiplexed_domains.create wfd ~getpid:(fun () -> pid) info in 9 + Reader.iter r (fun now ev -> 10 + Writer.Multiplexed_domains.put_event 11 + w 12 + ~decode_callstack_entry:(fun loc -> Reader.lookup_location_code r loc) 13 + (Timedelta.offset info.start_time now) 14 + ev); 15 + Reader.close r; 16 + Writer.Multiplexed_domains.flush w; 17 + Unix.close wfd 18 + ;; 19 + 20 + let () = 21 + match Sys.argv with 22 + | [| _; inf; outf |] -> copy inf outf 23 + | _ -> 24 + Printf.fprintf stderr "usage: copy <in> <out>\n%!"; 25 + exit 1 26 + ;;
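The copy above works because `Reader.iter` reports trace-relative times while the writer expects absolute timestamps; `Timedelta.offset` reverses the subtraction. In plain int64 microseconds (standing in here for the abstract `Timestamp.t`/`Timedelta.t` types), the round trip is just:

```ocaml
(* now = start + (now - start): offsetting a delta by the trace start
   reconstructs the original absolute timestamp. *)
let to_delta ~start now = Int64.sub now start
let offset start delta = Int64.add start delta

let () =
  let start = 23_897_423L in
  let now = 23_898_923L in
  assert (offset start (to_delta ~start now) = now)
```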
+60
test/dune
··· 1 + (executables 2 + (modes byte exe) 3 + (names test copy trace geom_sampler fork) 4 + (libraries memtrace unix) 5 + (preprocess 6 + (pps ppx_jane))) 7 + 8 + (rule 9 + (deps test.exe) 10 + (action 11 + (run ./test.exe)) 12 + (alias runtest)) 13 + 14 + (rule 15 + (deps trace.exe) 16 + (action 17 + (run ./trace.exe)) 18 + (alias runtest)) 19 + 20 + (rule 21 + (deps fork.exe) 22 + (action 23 + (run ./fork.exe)) 24 + (alias runtest)) 25 + 26 + (rule 27 + (deps geom_sampler.exe geom_sampler.expected) 28 + (action 29 + (bash "diff geom_sampler.expected <(./geom_sampler.exe)")) 30 + (alias runtest)) 31 + 32 + (rule 33 + (targets ocamlopt.ctf.copy) 34 + (deps copy.exe ocamlopt.ctf) 35 + (action 36 + (run ./copy.exe ocamlopt.ctf ocamlopt.ctf.copy))) 37 + 38 + (rule 39 + (targets ocamlopt.ctf.txt) 40 + (deps ../bin/dump_trace.exe ocamlopt.ctf) 41 + (action 42 + (bash "../bin/dump_trace.exe ocamlopt.ctf > ocamlopt.ctf.txt"))) 43 + 44 + (rule 45 + (targets ocamlopt.ctf.copy.txt) 46 + (deps ../bin/dump_trace.exe ocamlopt.ctf.copy) 47 + (action 48 + (bash "../bin/dump_trace.exe ocamlopt.ctf.copy > ocamlopt.ctf.copy.txt"))) 49 + 50 + (rule 51 + (deps ocamlopt.ctf.txt ocamlopt.ctf.copy.txt) 52 + (action 53 + (bash "diff -u ocamlopt.ctf.txt ocamlopt.ctf.copy.txt")) 54 + (alias runtest)) 55 + 56 + (rule 57 + (deps ocamlopt.ctf ocamlopt.ctf.copy) 58 + (action 59 + (bash "cmp ocamlopt.ctf ocamlopt.ctf.copy")) 60 + (alias runtest))
+36
test/fork.ml
··· 1 + let test_fork ~quick_exit () = 2 + let filename = Filename.temp_file "memtrace" "ctf" in 3 + (Unix.putenv [@ocaml.alert "-unsafe_multidomain"]) "MEMTRACE" filename; 4 + let tr = Memtrace.start_tracing ~context:None ~sampling_rate:1. ~filename in 5 + let alloc_before = 1234 6 + and alloc_after = 7364 7 + and alloc_child = 42 in 8 + let _ : string array = Sys.opaque_identity Array.make alloc_before "a" in 9 + (match Unix.fork () with 10 + | 0 -> 11 + let count = if quick_exit then 1 else 1000000 in 12 + for _i = 1 to count do 13 + ignore (Sys.opaque_identity Array.make alloc_child "a" : string array) 14 + done; 15 + exit 0 16 + | pid -> 17 + (match Unix.waitpid [] pid with 18 + | _, WEXITED 0 -> () 19 + | _ -> assert false)); 20 + let _ : string array = Sys.opaque_identity Array.make alloc_after "a" in 21 + Memtrace.stop_tracing tr; 22 + let module R = Memtrace.Trace.Reader in 23 + let tr = R.open_ ~filename in 24 + let sizes = Hashtbl.create 20 in 25 + R.iter tr (fun _time ev -> 26 + match ev with 27 + | Alloc a -> Hashtbl.add sizes a.length () 28 + | _ -> ()); 29 + assert (Hashtbl.mem sizes alloc_before); 30 + assert (Hashtbl.mem sizes alloc_after); 31 + assert (not (Hashtbl.mem sizes alloc_child)); 32 + () 33 + ;; 34 + 35 + let () = test_fork ~quick_exit:false () 36 + let () = test_fork ~quick_exit:true ()
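This test exercises the fork guard in `flush_at`: the writer caches the pid at creation and raises `Pid_changed` rather than flush from a forked child, so child allocations never reach the parent's trace file. A standalone sketch of that mechanism (an illustration of the guard, not the actual writer type; needs the `unix` library):

```ocaml
exception Pid_changed

type writer =
  { pid : int         (* pid cached when the writer was created *)
  ; getpid : unit -> int
  }

(* Refuse to write if the process has forked since creation. *)
let flush w = if w.pid <> w.getpid () then raise Pid_changed

let () =
  let w = { pid = Unix.getpid (); getpid = Unix.getpid } in
  match Unix.fork () with
  | 0 ->
    (* Child: flushing must fail, leaving the file untouched. *)
    (match flush w with
     | () -> exit 1
     | exception Pid_changed -> exit 0)
  | child ->
    flush w;  (* parent still flushes normally *)
    (match Unix.waitpid [] child with
     | _, WEXITED 0 -> ()
     | _ -> assert false)
```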
+5
test/geom_sampler.expected
··· 1 + sample rate: 0.01 (0.01), P(X = 3): 0.01 (0.01) 2 + sample rate: 0.13 (0.13), P(X = 3): 0.10 (0.10) 3 + sample rate: 0.42 (0.42), P(X = 3): 0.14 (0.14) 4 + sample rate: 0.73 (0.73), P(X = 3): 0.05 (0.05) 5 + sample rate: 1.00 (1.00), P(X = 3): 0.00 (0.00)
+33
test/geom_sampler.ml
··· 1 + open Memtrace.Geometric_sampler 2 + 3 + let test ~sampling_rate = 4 + let t = make ~sampling_rate () in 5 + let total = 100000 in 6 + let s = ref total in 7 + let n = ref 0 in 8 + let n3 = ref 0 in 9 + let rec go () = 10 + let k = draw t in 11 + s := !s - k; 12 + if !s >= 0 13 + then ( 14 + incr n; 15 + if k = 3 then incr n3; 16 + go ()) 17 + in 18 + go (); 19 + let observed_rate = float_of_int !n /. float_of_int total in 20 + let expected_p3 = (1. -. sampling_rate) *. (1. -. sampling_rate) *. sampling_rate in 21 + let observed_p3 = float_of_int !n3 /. float_of_int !n in 22 + (* These numbers are printed at low precision to have a high probability of producing 23 + the same answer. (We don't expect the observed and expected numbers to be exactly 24 + equal) *) 25 + Printf.printf 26 + "sample rate: %.2f (%.2f), P(X = 3): %.2f (%.2f)\n" 27 + sampling_rate 28 + observed_rate 29 + expected_p3 30 + observed_p3 31 + ;; 32 + 33 + let () = [ 0.01; 0.13; 0.42; 0.73; 1. ] |> List.iter (fun s -> test ~sampling_rate:s)
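For reference, a geometric sampler with success probability `p` (the sampling rate) can be drawn by inverse transform. This is a sketch of the standard technique, not necessarily how `Memtrace.Geometric_sampler` is implemented internally, but it produces the distribution the test checks, P(X = k) = (1 - p)^(k-1) * p on {1, 2, ...}:

```ocaml
(* Inverse-transform draw from Geometric(p):
   X = ceil (log u /. log (1 - p)) for u uniform on (0, 1]. *)
let draw_geometric ~sampling_rate =
  if sampling_rate >= 1.0
  then 1
  else (
    let u = 1.0 -. Random.float 1.0 in  (* u in (0, 1] *)
    max 1 (int_of_float (ceil (log u /. log1p (-.sampling_rate)))))

let () =
  Random.self_init ();
  let p = 0.13 in
  let n = 100_000 in
  let s = ref 0 in
  for _ = 1 to n do
    s := !s + draw_geometric ~sampling_rate:p
  done;
  let mean = float_of_int !s /. float_of_int n in
  (* Mean of Geometric(p) is 1/p; allow generous slack for noise. *)
  assert (abs_float (mean -. (1. /. p)) < 0.5)
```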
test/ocamlopt.ctf

This is a binary file and will not be displayed.

+201
test/test.ml
··· 1 + open Memtrace.Trace 2 + 3 + let with_temp f = 4 + let s = Filename.temp_file "memtrace" "ctf" in 5 + let fd = Unix.openfile s [ O_RDWR ] 0o600 in 6 + Fun.protect 7 + ~finally:(fun () -> if Sys.file_exists s then Unix.unlink s) 8 + (fun () -> f fd) 9 + ;; 10 + 11 + let mkloc filename line start_char end_char defname = 12 + Location.{ filename; line; start_char; end_char; defname } 13 + ;; 14 + 15 + (* These locations are expected to be encoded and decoded accurately *) 16 + let reasonable_locations = 17 + [ [ mkloc "foo.ml" 42 100 120 "func" ] 18 + ; [ mkloc "apiosdjfoaijsdf.ml" 100 58 1023 "aiosjdf" ] 19 + ; [ mkloc "apiosdjfoaijsdf.ml" 19 97 1023 "aiosjdf"; mkloc "inline" 1 1 1 "fjkisda" ] 20 + ] 21 + ;; 22 + 23 + (* These locations are too big, and are expected to be encoded as "unknown" *) 24 + let ridiculous_locations = 25 + [ [ mkloc (String.make 60000 'j') 1000 1000 1000 (String.make 60000 'a') ] 26 + ; List.init 10000 (fun _ -> mkloc "asdf.ml" 42 93 84 "Asdf.fun") 27 + ] 28 + ;; 29 + 30 + let locations = Array.of_list (reasonable_locations @ ridiculous_locations) 31 + let id : int -> Obj_id.t = Obj.magic 32 + let dom : int -> Domain_id.t = Obj.magic 33 + let loc (n : int) : Location_code.t = Obj.magic (n + 1) 34 + 35 + let info : Info.t = 36 + { sample_rate = 0.01 37 + ; word_size = 64 38 + ; executable_name = "exec" 39 + ; host_name = "host" 40 + ; ocaml_runtime_params = "runtime" 41 + ; pid = 42L 42 + ; initial_domain = dom 0 43 + ; start_time = Timestamp.of_int64 23897423L 44 + ; context = Some "context" 45 + } 46 + ;; 47 + 48 + let events : (int * Event.t) list = 49 + let big_length = 50 + if Int.max_int > 0x3fff_ffff then int_of_string "0x81234567" else 0x31234567 51 + in 52 + [ ( 0 53 + , Alloc 54 + { obj_id = id 0 55 + ; length = 42 56 + ; domain = dom 0 57 + ; nsamples = 1 58 + ; source = Minor 59 + ; backtrace_buffer = [| loc 0; loc 1 |] 60 + ; backtrace_length = 2 61 + ; common_prefix = 0 62 + } ) 63 + ; ( 1 64 + , Alloc 65 + { obj_id = id 
1 66 + ; length = 2 67 + ; domain = dom 0 68 + ; nsamples = 1 69 + ; source = Major 70 + ; backtrace_buffer = [| loc 0; loc 1; loc 2 |] 71 + ; backtrace_length = 3 72 + ; common_prefix = 2 73 + } ) 74 + ; ( 100 75 + , Alloc 76 + { obj_id = id 2 77 + ; length = big_length 78 + ; domain = dom 0 79 + ; nsamples = 1 80 + ; source = Minor 81 + ; backtrace_buffer = [| loc 0; loc 1; loc 2 |] 82 + ; backtrace_length = 3 83 + ; common_prefix = 3 84 + } ) 85 + ; 101, Collect (id 1, dom 0) 86 + ; 102, Promote (id 2, dom 0) 87 + ; ( 103 88 + , Alloc 89 + { obj_id = id 3 90 + ; length = 2 91 + ; domain = dom 0 92 + ; nsamples = 1 93 + ; source = External 94 + ; backtrace_buffer = [| loc 3 |] 95 + ; backtrace_length = 1 96 + ; common_prefix = 0 97 + } ) 98 + ; ( 103 99 + , Alloc 100 + { obj_id = id 4 101 + ; length = 2 102 + ; domain = dom 0 103 + ; nsamples = 1 104 + ; source = Minor 105 + ; backtrace_buffer = [| loc 4 |] 106 + ; backtrace_length = 1 107 + ; common_prefix = 0 108 + } ) 109 + ] 110 + ;; 111 + 112 + let copy_event : Event.t -> Event.t = function 113 + | Alloc ev -> 114 + Alloc 115 + { ev with backtrace_buffer = Array.sub ev.backtrace_buffer 0 ev.backtrace_length } 116 + | ev -> ev 117 + ;; 118 + 119 + let test () = 120 + with_temp 121 + @@ fun fd -> 122 + let w = Writer.Multiplexed_domains.create fd info in 123 + let decode_loc l = locations.((l : Location_code.t :> int) - 1) in 124 + events 125 + |> List.iter (fun (i, ev) -> 126 + let now = Int64.(add (Timestamp.to_int64 info.start_time) (of_int (i * 1_000_000))) in 127 + Writer.Multiplexed_domains.put_event 128 + w 129 + ~decode_callstack_entry:decode_loc 130 + (Timestamp.of_int64 now) 131 + ev); 132 + Writer.Multiplexed_domains.flush w; 133 + let _ : int = Unix.lseek fd 0 SEEK_SET in 134 + let r = Reader.create fd in 135 + assert (Reader.info r = info); 136 + let evs = ref [] in 137 + Reader.iter r ~parse_backtraces:true (fun td ev -> 138 + (* assert (Int64.rem td 1_000_000L = 0L); *) 139 + evs 140 + := 
(Int64.(to_int (div (Timedelta.to_int64 td) 1_000_000L)), copy_event ev) :: !evs); 141 + let rec compare exp act = 142 + match exp, act with 143 + | [], [] -> () 144 + | exp :: exps, act :: acts when exp = act -> compare exps acts 145 + | [], (tact, act) :: _ -> 146 + Printf.printf 147 + "Extra event decoded:\n %d. %s\n%!" 148 + tact 149 + (Event.to_string decode_loc act); 150 + failwith "Extra events" 151 + | (texp, exp) :: _, [] -> 152 + Printf.printf "Missing event:\n %d. %s\n%!" texp (Event.to_string decode_loc exp); 153 + failwith "Missing events" 154 + | (texp, exp) :: _, (tact, act) :: _ -> 155 + Printf.printf 156 + "Event doesn't match. Expected:\n %d. %s\nbut decoded:\n %d. %s\n%!" 157 + texp 158 + (Event.to_string decode_loc exp) 159 + tact 160 + (Event.to_string decode_loc act); 161 + failwith "Incorrect event" 162 + in 163 + compare events (List.rev !evs); 164 + for i = 0 to List.length reasonable_locations - 1 do 165 + assert (Reader.lookup_location_code r (loc i) = locations.(i)) 166 + done; 167 + for i = List.length reasonable_locations to Array.length locations - 1 do 168 + assert (Reader.lookup_location_code r (loc i) = [ Location.unknown ]) 169 + done; 170 + Reader.close r 171 + ;; 172 + 173 + let test_failure () = 174 + with_temp 175 + @@ fun fd -> 176 + let w = Writer.create fd info in 177 + let got_epipe = Atomic.make false in 178 + let report_exn = function 179 + | Unix.Unix_error (Unix.EPIPE, "write", _) -> Atomic.set got_epipe true 180 + | e -> raise e 181 + in 182 + let t = Memtrace.Memprof_tracer.start ~report_exn ~sampling_rate:0.2 w in 183 + for _i = 1 to 50000 do 184 + let _ : int ref = Sys.opaque_identity (ref 42) in 185 + () 186 + done; 187 + let rd, wr = Unix.pipe () in 188 + if not Sys.win32 then Sys.Safe.set_signal Sys.sigpipe Signal_ignore; 189 + Unix.close rd; 190 + Unix.dup2 wr fd; 191 + Unix.close wr; 192 + for _i = 1 to 50000 do 193 + let _ : int ref = Sys.opaque_identity (ref 42) in 194 + () 195 + done; 196 + 
Memtrace.Memprof_tracer.stop t; 197 + if not (Atomic.get got_epipe) then failwith "should have failed" 198 + ;; 199 + 200 + let () = test () 201 + let () = test_failure ()
+149
test/trace.ml
··· 1 + let check_errors () = 2 + (Unix.putenv [@ocaml.alert "-unsafe_multidomain"]) "MEMTRACE" "/bad/file/name"; 3 + (match Memtrace.trace_if_requested () with 4 + | _ -> assert false 5 + | exception Invalid_argument _ -> ()); 6 + (Unix.putenv [@ocaml.alert "-unsafe_multidomain"]) "MEMTRACE" "/tmp/goodfilename"; 7 + (match Memtrace.trace_if_requested ~sampling_rate:(-3.) () with 8 + | _ -> assert false 9 + | exception Invalid_argument _ -> ()); 10 + (Unix.putenv [@ocaml.alert "-unsafe_multidomain"]) "MEMTRACE" "/tmp/goodfilename"; 11 + (Unix.putenv [@ocaml.alert "-unsafe_multidomain"]) "MEMTRACE_RATE" "42"; 12 + (match Memtrace.trace_if_requested () with 13 + | _ -> assert false 14 + | exception Invalid_argument _ -> ()); 15 + (Unix.putenv [@ocaml.alert "-unsafe_multidomain"]) "MEMTRACE" "/tmp/goodfilename"; 16 + (Unix.putenv [@ocaml.alert "-unsafe_multidomain"]) "MEMTRACE_RATE" "potato"; 17 + match Memtrace.trace_if_requested () with 18 + | _ -> assert false 19 + | exception Invalid_argument _ -> () 20 + ;; 21 + 22 + let () = check_errors () 23 + 24 + let is_bad_location (loc : Memtrace.Trace.Location.t) = 25 + let defname = loc.defname in 26 + String.ends_with defname ~suffix:"ext_alloc" 27 + || String.ends_with defname ~suffix:"ext_alloc_slowpath" 28 + ;; 29 + 30 + let validate_ext_alloc_backtrace r bt len = 31 + (* Make sure the backtrace has been scrubbed of calls to 32 + [ext_alloc] and [ext_alloc_slowpath] in [memprof_tracer.ml] *) 33 + for i = 0 to len - 1 do 34 + let locs = Memtrace.Trace.Reader.lookup_location_code r bt.(i) in 35 + List.iter 36 + (fun (loc : Memtrace.Trace.Location.t) -> assert (not (is_bad_location loc))) 37 + locs 38 + done 39 + ;; 40 + 41 + let globs = Array.make 1000 [||] 42 + let nglobs = ref 0 43 + 44 + let leak x = 45 + globs.(!nglobs) <- x; 46 + incr nglobs 47 + ;; 48 + 49 + let each_iter () = leak (Array.make 1000 0) 50 + 51 + let rec long_bt = function 52 + | 0 -> 53 + leak (Array.make 1000 0); 54 + (Sys.opaque_identity 
List.iter) each_iter [ () ]; 55 + 42 56 + | n -> if Random.bool () then 1 + long_bt (n - 1) else 2 + long_bt (n - 1) 57 + ;; 58 + 59 + let go () = 60 + let filename = Filename.temp_file "memtrace" "ctf" in 61 + (* Check that we can recover from a bad sampling rate *) 62 + (match Memtrace.start_tracing ~context:(Some "ctx") ~sampling_rate:42.0 ~filename with 63 + | _ -> failwith "should have failed" 64 + | exception _ -> ()); 65 + (* Verify that non-initial domains aren't traced *) 66 + let dom_func () = 67 + for _i = 1 to 10 do 68 + let _ : int array = Array.make 500 42 |> Sys.opaque_identity in 69 + match Memtrace.External.alloc ~bytes:50 with 70 + | Null -> () 71 + | This _ -> assert false 72 + done 73 + in 74 + let _ : _ = Sys.opaque_identity dom_func in 75 + let t = Memtrace.start_tracing ~context:(Some "ctx") ~sampling_rate:0.1 ~filename in 76 + let spawn_traced_domain () = 77 + (Domain.Safe.spawn [@inlined never] [@alert "-do_not_spawn_domains"]) 78 + (Obj.magic_portable dom_func) [@nontail] 79 + in 80 + let domain = 81 + if Domain.recommended_domain_count () <= 1 82 + then None 83 + else Some (spawn_traced_domain ()) 84 + in 85 + leak (Array.make 4242 42); 86 + for _i = 1 to 10 do 87 + let n = long_bt 10_000 in 88 + assert (n > 0) 89 + done; 90 + for _i = 1 to 1000 do 91 + match Memtrace.External.alloc ~bytes:(Sys.word_size / 8 * 7) with 92 + | Null -> () 93 + | This token -> Memtrace.External.free token 94 + done; 95 + Memtrace.stop_tracing t; 96 + Option.iter Domain.join domain; 97 + let r = Memtrace.Trace.Reader.open_ ~filename in 98 + let first = ref true in 99 + let n_long = ref 0 in 100 + let last_ext = ref None in 101 + let ext_samples = ref 0 in 102 + (* Ignore any samples arising from spawn_traced_domains *) 103 + let should_ignore_backtrace (bt : Memtrace.Trace.Location_code.t array) = 104 + bt 105 + |> Array.exists (fun loc -> 106 + Memtrace.Trace.Reader.lookup_location_code r loc 107 + |> List.exists (fun (loc : Memtrace.Trace.Location.t) -> 108 
+ String.ends_with loc.defname ~suffix:"spawn_traced_domain")) 109 + in 110 + let ignored_ids = Memtrace.Trace.Obj_id.Tbl.create 10 in 111 + Memtrace.Trace.Reader.iter r (fun _ ev -> 112 + match ev with 113 + | Alloc info 114 + when should_ignore_backtrace 115 + (Array.sub info.backtrace_buffer 0 info.backtrace_length) -> 116 + Memtrace.Trace.Obj_id.Tbl.add ignored_ids info.obj_id () 117 + | (Promote (id, _) | Collect (id, _)) 118 + when Memtrace.Trace.Obj_id.Tbl.mem ignored_ids id -> () 119 + | Alloc info when !first -> 120 + first := false; 121 + assert (info.length = 4242); 122 + () 123 + | Alloc info when info.length = 1000 -> 124 + (* backtraces should be truncated *) 125 + assert (info.backtrace_length > 3500 && info.backtrace_length < 4000); 126 + incr n_long 127 + | Alloc info when info.length = 7 -> 128 + last_ext := Some info.obj_id; 129 + ext_samples := !ext_samples + info.nsamples; 130 + validate_ext_alloc_backtrace r info.backtrace_buffer info.backtrace_length 131 + | Collect (id, _) -> 132 + assert (!last_ext = Some id); 133 + last_ext := None 134 + | e -> 135 + failwith 136 + ("unexpected " 137 + ^ Memtrace.Trace.Event.to_string (Memtrace.Trace.Reader.lookup_location_code r) e 138 + )); 139 + Memtrace.Trace.Reader.close r; 140 + Unix.unlink filename; 141 + assert (650 <= !ext_samples && !ext_samples < 750); 142 + assert (not !first); 143 + assert (!n_long = 20) 144 + ;; 145 + 146 + let () = 147 + go (); 148 + go () 149 + ;;
+8
trace_ocamlopt/README.md
··· 1 + # memtrace support for `ocamlopt` 2 + 3 + The `ocamlopt.exe` program in this directory is a version of 4 + `ocamlopt` that supports tracing with memtrace. 5 + 6 + Use the environment variable `MEMTRACE` to specify a trace file 7 + location. If desired, the sampling rate can be adjusted with 8 + `MEMTRACE_RATE`.
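The README above can be exercised with a shell invocation along these lines. This is a sketch: the trace file path, the rate value, and the `hello.ml` compilation target are illustrative placeholders, not part of the repository; only the `MEMTRACE` and `MEMTRACE_RATE` variable names come from the README (and from `test/trace.ml`, which checks that a non-numeric or out-of-range `MEMTRACE_RATE` raises `Invalid_argument`).

```shell
# Hypothetical paths and values for illustration only.
export MEMTRACE=/tmp/ocamlopt.ctf   # where the CTF trace is written
export MEMTRACE_RATE=0.001          # optional; must parse as a float in (0, 1]
./ocamlopt.exe -o hello hello.ml    # traced compilation of some source file
```

Note that per `test/trace.ml`, an unwritable `MEMTRACE` path or a malformed `MEMTRACE_RATE` (e.g. `42` or `potato`) causes `Memtrace.trace_if_requested` to raise rather than silently disable tracing.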
+4
trace_ocamlopt/ocamlopt.ml
··· 1 + let () = 2 + Memtrace.trace_if_requested (); 3 + exit (Optmaindriver.main Sys.argv Format.err_formatter) 4 + ;;