<!--
© Vlad-Stefan Harbuz <vlad@vlad.website>
SPDX-License-Identifier: CC-BY-SA-4.0
-->

# bindep: Strategies for finding binary dependencies

A codebase might depend on another project's source code; or it might depend on another project's compiled binaries.
Source code dependency relationships are mostly easy to identify; binary dependency relationships are not. We need to
identify binary dependency relationships to ensure the Open Source ecosystem is secure and sustainably funded.

This project aims to provide tools that enable us to identify binary dependency relationships.

## General Details

**FOSDEM 2026 talk**

* [Binary Dependencies: Identifying the Hidden Packages We All Depend On][fosdem-talk]

**My initial proposal describing the broad approach (the technical details are out of date)**

* [Bindep, a Binary Dependency Discovery System][proposal]

**ecosyste.ms issue with more general details**

* [Connecting the dots between system package managers and language package managers][packages1261]

## Results: Finding Needed Dynamic Libraries in Python Wheels

I analysed the most downloaded Python wheels to see which dynamic libraries those wheels most depend on.

I attempted to download wheels for the 15,000 most downloaded Python packages according to [hugovk's
top-pypi-packages](https://hugovk.github.io/top-pypi-packages/).

I successfully downloaded 13,874 wheels. I failed to download 1,126 wheels, mostly because they had no Linux builds
available.

I only analysed dependencies originating in extension modules included in these wheels. Unfortunately, other kinds of
binary dependency relationships, like those implemented using `libffi`, will be more difficult to find. For more details
on this, see my post [_How Binary Dependencies Work Across Different
Languages_](https://vlad.website/how-binary-dependencies-work/).

Of the downloaded wheels, 1,531 contained `.so` files. This is around 11% of the wheels I downloaded, which validates
the research direction: because we currently cannot reliably identify binary dependencies, it looks like we have
significant holes in the dependency graph of around 11% of the most downloaded Python packages.

I found a total of 12,137 `.so` files (of which 39 could not be read). Those `.so` files include both bundled
dependencies and the `.so` files of each respective wheel's extension modules.

In those `.so` files, I looked up the entries tagged `DT_NEEDED` in the ELF file's `.dynamic` section, which gives us
the names of the libraries that each `.so` file depends on.
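To make the `DT_NEEDED` lookup concrete, here is a minimal sketch using only the Rust standard library. It is not the project's `find_needed_libs.rs`. The function names are hypothetical, and it assumes a 64-bit little-endian ELF file whose section headers are intact (it finds `.dynamic` via its `SHT_DYNAMIC` section header and follows `sh_link` to the string table). Stripped or unusual files would need the program-header route instead.

```rust
const SHT_DYNAMIC: u64 = 6; // section type of `.dynamic`
const DT_NULL: u64 = 0; // marks the end of the dynamic table
const DT_NEEDED: u64 = 1; // entry naming a required library

/// Read `len` (at most 8) little-endian bytes at `off` as an unsigned integer.
fn read_le(data: &[u8], off: usize, len: usize) -> Option<u64> {
    let bytes = data.get(off..off.checked_add(len)?)?;
    Some(bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}

/// Return `(sh_type, sh_link, sh_offset, sh_size)` for section header `index`.
fn section_header(data: &[u8], index: usize) -> Option<(u64, usize, usize, usize)> {
    let shoff = read_le(data, 0x28, 8)? as usize; // e_shoff
    let shentsize = read_le(data, 0x3a, 2)? as usize; // e_shentsize
    let start = shoff.checked_add(index.checked_mul(shentsize)?)?;
    let header = data.get(start..start.checked_add(64)?)?;
    Some((
        read_le(header, 4, 4)?,
        read_le(header, 40, 4)? as usize,
        read_le(header, 24, 8)? as usize,
        read_le(header, 32, 8)? as usize,
    ))
}

/// List the `DT_NEEDED` names of an ELF64 little-endian file.
/// Returns `None` if the bytes are not such a file or are malformed.
pub fn needed_libs(data: &[u8]) -> Option<Vec<String>> {
    // Magic bytes, then EI_CLASS == 2 (64-bit) and EI_DATA == 1 (little-endian).
    if !data.starts_with(b"\x7fELF") || data.get(4) != Some(&2) || data.get(5) != Some(&1) {
        return None;
    }
    let shnum = read_le(data, 0x3c, 2)? as usize; // e_shnum
    let mut libs = Vec::new();
    for index in 0..shnum {
        let (sh_type, link, offset, size) = section_header(data, index)?;
        if sh_type != SHT_DYNAMIC {
            continue;
        }
        let dynamic = data.get(offset..offset.checked_add(size)?)?;
        // `sh_link` of the dynamic section is the index of its string table.
        let (_, _, str_off, str_size) = section_header(data, link)?;
        let strtab = data.get(str_off..str_off.checked_add(str_size)?)?;
        // Each entry is a 16-byte (d_tag, d_val) pair.
        for entry in dynamic.chunks_exact(16) {
            let tag = read_le(entry, 0, 8)?;
            let value = read_le(entry, 8, 8)? as usize;
            if tag == DT_NULL {
                break;
            }
            if tag == DT_NEEDED {
                // `d_val` is an offset to a NUL-terminated name in the string table.
                let name = strtab.get(value..)?;
                let end = name.iter().position(|&b| b == 0)?;
                libs.push(String::from_utf8_lossy(&name[..end]).into_owned());
            }
        }
    }
    Some(libs)
}
```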

This means we _can_ see:

* libs that extension modules depend on
* libs that bundled dependencies depend on

but we _cannot_ see:

* the libs that all of those libs depend on in turn.

This is a significant limitation.

Among all `.so` files, I found 96,570 instances of a lib being needed. 2,862 unique libs were needed.

The 10 most required libs are relatively unsurprising:

```
libc,11927
libpthread,7827
libm,7113
libgcc_s,6619
libstdc++,6267
libdl,3186
ld-linux-x86-64,1835
librt,1434
libGL,899
libQt6Core,699
```
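Counts like the ones above could be produced by tallying every `DT_NEEDED` name across all files. The sketch below is one way to do that, not the project's code. The names in the results have no `.so.N` suffix, which suggests some normalisation, but the exact rule is not stated here, so the hypothetical `base_name` helper is a guess. `tally` is likewise a made-up name.

```rust
use std::collections::HashMap;

/// Drop the `.so` suffix and any version after it: `libc.so.6` becomes `libc`.
/// This normalisation rule is an assumption, not taken from the project.
pub fn base_name(needed: &str) -> &str {
    if let Some(idx) = needed.find(".so.") {
        &needed[..idx]
    } else {
        needed.strip_suffix(".so").unwrap_or(needed)
    }
}

/// Return the total number of needed-lib instances and the per-lib counts,
/// sorted by count (descending) and then by name so the order is stable.
pub fn tally(needed: &[&str]) -> (usize, Vec<(String, usize)>) {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &name in needed {
        *counts.entry(base_name(name)).or_insert(0) += 1;
    }
    let mut sorted: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(name, count)| (name.to_string(), count))
        .collect();
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    (needed.len(), sorted)
}
```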

Some are a little interesting:

```
libxkbcommon,380
libtensorflow_framework,379
```

Some I did not expect to be so common:

```
libvtkfmt,315
libvtksys,314
libvtkscn,314
libvtktoken,314
libvtkCommonCore,313
```

The full results can be found in
[`results/260121-libs-found-in-python-wheels.txt`](/results/260121-libs-found-in-python-wheels.txt).

The results are a little noisy. For example, a bunch of libs have names ending in hashes like `-01abcdef`. Maybe those
suffixes should be removed, but then again, many packages seem to depend on the same hashes. Anyway, I think this is
enough to get a general idea of the approach for now.
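If the hash suffixes were to be separated out, one option is to split each name into a base and an optional hash, so that results can be grouped either way. This is a sketch under a stated assumption: a hash suffix is a final `-` followed by exactly eight lowercase hex digits, as in the `-01abcdef` example. The function name is hypothetical, and a library whose real name happens to end that way would be split incorrectly.

```rust
/// Split `libexample-01abcdef` into `("libexample", Some("01abcdef"))`.
/// Names without an eight-digit lowercase hex suffix are returned unchanged.
pub fn split_hash_suffix(name: &str) -> (&str, Option<&str>) {
    if let Some(idx) = name.rfind('-') {
        let suffix = &name[idx + 1..];
        let is_hash = suffix.len() == 8
            && suffix.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
        if is_hash {
            return (&name[..idx], Some(suffix));
        }
    }
    (name, None)
}
```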

The source code for this analysis is available here:
[`find_needed_libs.rs`](https://tangled.org/vlad.website/bindep/blob/main/src/bin/find_needed_libs.rs).

My [initial proposal][proposal] mentioned constructing a big map of which dynamic symbols are required by which language
package manager packages, and which dynamic symbols are provided by which system package manager packages. I didn't take
this approach in this case.

For one thing, the ELF files contain the names of the libraries they depend on, so we can figure that out without the
symbols. And for another thing, knowing the filenames means we can examine system package managers to see which packages
provide which dynamic library files. This should be relatively reliable.
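The filename lookup could work roughly as follows. This is only a sketch: it assumes a plain-text index with one whitespace-separated `<path> <package>` pair per line, which is a simplification, since each system package manager publishes its file lists in its own format. The function name is hypothetical, and so are the package names used to try it out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Build a map from file name (the last path component) to the set of
/// packages that ship a file with that name.
pub fn index_by_file_name(index: &str) -> BTreeMap<&str, BTreeSet<&str>> {
    let mut map: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for line in index.lines() {
        let mut fields = line.split_whitespace();
        // Skip lines that do not have both a path and a package.
        if let (Some(path), Some(package)) = (fields.next(), fields.next()) {
            let file_name = path.rsplit('/').next().unwrap_or(path);
            map.entry(file_name).or_default().insert(package);
        }
    }
    map
}
```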

But we might still want to mine symbols for some other reason.

## Authorship

Vlad-Stefan Harbuz ([vlad.website][vlad]) unless otherwise noted.

[fosdem-talk]: https://fosdem.org/2026/schedule/event/7NQJNU-binary_dependencies_identifying_the_hidden_packages_we_all_depend_on/
[packages1261]: https://github.com/ecosyste-ms/packages/issues/1261
[proposal]: https://hackmd.io/@vladh/binary-dependencies
[vlad]: https://vlad.website