<!--
© Vlad-Stefan Harbuz <vlad@vlad.website>
SPDX-License-Identifier: CC-BY-SA-4.0
-->

# bindep: Strategies for finding binary dependencies

A codebase might depend on another project's source code, or it might depend on another project's compiled binaries.
Source code dependency relationships are mostly easy to identify; binary dependency relationships are not. We need to
identify binary dependency relationships to ensure that the Open Source ecosystem is secure and sustainably funded.

This project aims to provide tools that enable us to identify binary dependency relationships.

## General Details

**FOSDEM 2026 talk**

* [Binary Dependencies: Identifying the Hidden Packages We All Depend On][fosdem-talk]

**My initial proposal describing the broad approach (the technical details are now out of date)**

* [Bindep, a Binary Dependency Discovery System][proposal]

**ecosyste.ms issue with more general details**

* [Connecting the dots between system package managers and language package managers][packages1261]

## Results: Finding Needed Dynamic Libraries in Python Wheels

I analysed the most downloaded Python wheels to see which dynamic libraries those wheels depend on most.

I attempted to download wheels for the 15,000 most downloaded Python packages according to
[hugovk's top-pypi-packages](https://hugovk.github.io/top-pypi-packages/).

I successfully downloaded 13,874 wheels and failed to download 1,126, mostly for packages that had no Linux build
available.

I only analysed dependencies originating in the extension modules included in these wheels. Unfortunately, other kinds
of binary dependency relationships, such as those implemented using `libffi`, will be harder to find. For more details,
see my post [_How Binary Dependencies Work Across Different Languages_](https://vlad.website/how-binary-dependencies-work/).
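
Because a wheel is a zip archive, finding the `.so` files in it needs nothing beyond the standard library. The sketch
below is a minimal illustration, not the bindep tool's own code: the helper name `shared_objects_in_wheel` is
hypothetical, and the rule it applies (a name ending in `.so`, optionally followed by numeric version parts, plus the
ELF magic bytes at the start of the file) is an assumed heuristic.

```python
import re
import zipfile

# Assumed heuristic: a member counts as a shared object if its file name ends in ".so",
# optionally followed by numeric version parts (e.g. "libfoo-01abcdef.so.1.2").
SO_NAME = re.compile(r"\.so(\.\d+)*$")
ELF_MAGIC = b"\x7fELF"


def shared_objects_in_wheel(wheel_path):
    """Return the members of a wheel that look like ELF shared objects (hypothetical helper)."""
    found = []
    # A wheel is a zip archive, so zipfile can list its members directly.
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            # Directory entries end in "/", so their file name part is empty and never matches.
            filename = name.rsplit("/", 1)[-1]
            if not SO_NAME.search(filename):
                continue
            # Confirm the guess from the name by checking the ELF magic bytes.
            with wheel.open(name) as member:
                if member.read(4) == ELF_MAGIC:
                    found.append(name)
    return found
```

Checking the magic bytes as well as the name avoids counting files that are merely named like shared objects.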

Of those wheels, 1,531 contained `.so` files. That is about 11% of the wheels I downloaded (1,531 of 13,874), which
validates the research direction: because we currently cannot reliably identify binary dependencies, it looks like
the dependency graph has significant holes for around 11% of these popular Python packages.

I found a total of 12,137 `.so` files, of which 39 could not be read. These include both the `.so` files of each
wheel's own extension modules and the `.so` files of bundled dependencies.

For each `.so` file, I looked up the `DT_NEEDED` entries in the ELF file's `.dynamic` section. These give the names of
the libraries that the `.so` file depends on.

This means we _can_ see:

* libs that extension modules depend on
* libs that bundled dependencies depend on

but we _cannot_ see:

* the libs that those needed libs depend on in turn, i.e. transitive dependencies beyond what is shipped in the wheel.

This is a significant limitation.

Across all `.so` files, I found 96,570 instances of a lib being needed, covering 2,862 unique libs.

The 10 most required libs (each line is `lib,number of times needed`) are relatively unsurprising:

```
libc,11927
libpthread,7827
libm,7113
libgcc_s,6619
libstdc++,6267
libdl,3186
ld-linux-x86-64,1835
librt,1434
libGL,899
libQt6Core,699
```

Some are a little interesting:

```
libxkbcommon,380
libtensorflow_framework,379
```

Some I did not expect to be so common:

```
libvtkfmt,315
libvtksys,314
libvtkscn,314
libvtktoken,314
libvtkCommonCore,313
```

The full results can be found in
[`results/260121-libs-found-in-python-wheels.txt`](/results/260121-libs-found-in-python-wheels.txt).

The results are a little noisy: for example, a bunch of libs have names ending in hashes like `-01abcdef`. Maybe those
suffixes should be removed; then again, many packages seem to depend on libs with the same hashes. Either way, I think
this is enough to get a general idea of the approach for now.
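
The extraction and tallying described above can be sketched with the Python standard library alone. This is a minimal
sketch, not the Rust code in `find_needed_libs.rs`: it assumes 64-bit little-endian ELF files with intact section
headers, and the helpers `needed_libs`, `normalize` and `tally`, along with the name normalisation they apply (dropping
the `.so` version suffix and, optionally, a trailing 8-hex-digit hash like `-01abcdef`), are hypothetical.

```python
import re
import struct
from collections import Counter

SHT_DYNAMIC = 6  # section type of the .dynamic section
DT_NULL = 0      # marks the end of the dynamic array
DT_NEEDED = 1    # value is a string-table offset naming a needed lib


def needed_libs(data):
    """Return the DT_NEEDED names from the bytes of a 64-bit little-endian ELF file."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    # e_shoff is at offset 40; e_shentsize and e_shnum are at offsets 58 and 60.
    (shoff,) = struct.unpack_from("<Q", data, 40)
    shentsize, shnum = struct.unpack_from("<HH", data, 58)
    # Each Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize.
    sections = [
        struct.unpack_from("<IIQQQQIIQQ", data, shoff + i * shentsize)
        for i in range(shnum)
    ]
    needed = []
    for _name, sh_type, _flags, _addr, offset, size, link, *_rest in sections:
        if sh_type != SHT_DYNAMIC:
            continue
        # sh_link of .dynamic is the index of its string table (normally .dynstr).
        strtab_offset = sections[link][4]
        # Each Elf64_Dyn entry is a signed 8-byte tag followed by an 8-byte value.
        for pos in range(offset, offset + size, 16):
            tag, value = struct.unpack_from("<qQ", data, pos)
            if tag == DT_NULL:
                break
            if tag == DT_NEEDED:
                start = strtab_offset + value
                end = data.index(b"\0", start)
                needed.append(data[start:end].decode("utf-8", errors="replace"))
    return needed


def normalize(soname, strip_hash=False):
    """Hypothetical normalisation: 'libc.so.6' -> 'libc'."""
    name = re.sub(r"\.so(\.\d+)*$", "", soname)
    if strip_hash:
        # Optionally drop a trailing 8-hex-digit hash such as '-01abcdef'.
        name = re.sub(r"-[0-9a-f]{8}$", "", name)
    return name


def tally(elf_files):
    """Count how many times each normalised lib is needed across the given ELF file contents."""
    counts = Counter()
    for data in elf_files:
        counts.update(normalize(name) for name in needed_libs(data))
    return counts
```

To try it on a real file, pass the bytes from `open(path, "rb").read()`. Files whose section headers have been stripped
would need to be read through the program headers (`PT_DYNAMIC`) instead, which this sketch does not handle.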

The source code is available here:
[`find_needed_libs.rs`](https://tangled.org/vlad.website/bindep/blob/main/src/bin/find_needed_libs.rs).

My [initial proposal][proposal] mentioned constructing a big map of which dynamic symbols are required by which language
package manager packages, and which dynamic symbols are provided by which system package manager packages. I didn't take
this approach in this case.

For one thing, the ELF files contain the names of the libraries they depend on, so we can work those out without the
symbols. For another, knowing the filenames means we can examine system package managers to see which packages provide
which dynamic library files. This should be relatively reliable.

That said, we might still want to mine symbols for other reasons.

## Authorship

Vlad-Stefan Harbuz ([vlad.website][vlad]) unless otherwise noted.

[fosdem-talk]: https://fosdem.org/2026/schedule/event/7NQJNU-binary_dependencies_identifying_the_hidden_packages_we_all_depend_on/
[packages1261]: https://github.com/ecosyste-ms/packages/issues/1261
[proposal]: https://hackmd.io/@vladh/binary-dependencies
[vlad]: https://vlad.website