@title Drydock User Guide
@group userguide

Drydock, a software and hardware resource manager.

Overview
========

WARNING: Drydock is very new and has many sharp edges. Prepare yourself for
a challenging adventure in unmapped territory, not a streamlined experience
where things work properly or make sense.

Drydock is an infrastructure application that primarily helps other
applications coordinate during complex build and deployment tasks. Typically,
you will configure Drydock to enable capabilities in other applications:

  - Harbormaster can use Drydock to host builds.
  - Differential can use Drydock to perform server-side merges.

Users will not normally interact with Drydock directly.

If you want to get started with Drydock right away, see
@{article:Drydock User Guide: Quick Start} for specific instructions on
configuring integrations.


What Drydock Does
=================

Drydock manages working copies, hosts, and other software and hardware
resources that build and deployment processes may require in order to perform
useful work.

Many useful processes need a working copy of a repository (or some similar sort
of resource) so they can read files, perform version control operations, or
execute code.

For example, you might want to be able to automatically run unit tests, build a
binary, or generate documentation every time a new commit is pushed. Or you
might want to automatically merge a revision or cherry-pick a commit from a
development branch to a release branch. Any of these tasks need a working copy
of the repository before they can get underway.

These processes could just clone a new working copy when they started and
delete it when they finished. This works reasonably well at a small scale, but
will eventually hit limitations if you want to do things like: expand the build
tier to multiple machines; or automatically scale the tier up and down based on
usage; or reuse working copies to improve performance; or make sure things get
cleaned up after a process fails; or have jobs wait if the tier is too busy.
Solving these problems effectively requires coordination between the processes
doing the actual work.

Drydock solves these scaling problems by providing a central allocation
framework for //resources//, which are physical or virtual resources like a
host or a working copy. Processes which need to share hardware or software can
use Drydock to coordinate creation, access, and destruction of those resources.

Applications ask Drydock for resources matching a description, and it allocates
a corresponding resource by either finding a suitable unused resource or
creating a new resource. When work completes, the resource is returned to the
resource pool or destroyed.


Getting Started with Drydock
============================

In general, you will interact with Drydock by configuring blueprints, which
tell Drydock how to build resources. You can jump into this topic directly
in @{article:Drydock Blueprints}.

For help on configuring specific application features:

  - to configure server-side merges from Differential, see
    @{article:Differential User Guide: Automated Landing}.

You should also understand the Drydock security model before deploying it
in a production environment. See @{article:Drydock User Guide: Security}.

The remainder of this document has some additional high-level discussion about
how Drydock works and why it works that way, which may be helpful in
understanding the application as a whole.


Drydock Concepts
================

The major concepts in Drydock are **Blueprints**, **Resources**, **Leases**,
and the **Allocator**.

**Blueprints** are configuration that tells Drydock how to create resources:
where it can put them, how to access them, how many it can make at once, who is
allowed to ask for access to them, how to actually build them, how to clean
them up when they are no longer in use, and so on.

Drydock starts without any blueprints. You'll add blueprints to configure
Drydock and enable it to satisfy requests for resources. You can learn more
about blueprints in @{article:Drydock Blueprints}.

**Resources** represent things (like hosts or working copies) that Drydock has
created, is managing the lifecycle for, and can give other applications access
to.

**Leases** are requests for resources with certain qualities by other
applications. For example, Harbormaster may request a working copy of a
particular repository so it can run unit tests.

The **Allocator** is where Drydock actually does work. It works roughly like
this:

  - An application creates a lease describing a resource it needs, and
    uses this lease to ask Drydock for an appropriate resource.
  - Drydock looks at free resources to try to find one it can use to satisfy
    the request. If it finds one, it marks the resource as in use and gives
    the application details about how to access it.
  - If it can't find an appropriate resource that already exists, it looks at
    the blueprints it has configured to try to build one. If it can, it creates
    a new resource, then gives the application access to it.
  - Once the application finishes using the resource, it frees it. Depending
    on configuration, Drydock may reuse it, destroy it, or hold onto it and
    make a decision later.

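The allocation steps above can be sketched in a few lines of Python. This is
an illustrative model only, not Drydock's implementation: the class names
(`Resource`, `Lease`, `Blueprint`, `Allocator`), the `acquire` and `release`
methods, and the simple per-blueprint `limit` are all assumptions made for the
example. The sketch also always keeps a freed resource for reuse, which is only
one of the possible outcomes described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    kind: str                 # e.g. "host" or "working-copy"
    in_use: bool = False


@dataclass
class Lease:
    kind: str                              # the quality being requested
    resource: Optional[Resource] = None    # set once the lease is satisfied


@dataclass
class Blueprint:
    kind: str      # what this blueprint knows how to build
    limit: int     # how many resources it may build in total
    built: int = 0

    def can_build(self, lease: Lease) -> bool:
        return self.kind == lease.kind and self.built < self.limit

    def build(self) -> Resource:
        self.built += 1
        return Resource(kind=self.kind)


@dataclass
class Allocator:
    blueprints: list = field(default_factory=list)
    resources: list = field(default_factory=list)

    def acquire(self, lease: Lease) -> bool:
        # Prefer a free resource that already exists.
        for resource in self.resources:
            if resource.kind == lease.kind and not resource.in_use:
                resource.in_use = True
                lease.resource = resource
                return True
        # Otherwise, look for a blueprint that can build a new one.
        for blueprint in self.blueprints:
            if blueprint.can_build(lease):
                resource = blueprint.build()
                resource.in_use = True
                self.resources.append(resource)
                lease.resource = resource
                return True
        return False  # nothing available; the caller must wait or give up

    def release(self, lease: Lease) -> None:
        # This sketch always returns the resource to the pool for reuse.
        if lease.resource is not None:
            lease.resource.in_use = False
            lease.resource = None
```

With a single blueprint limited to one working copy, a second lease fails
while the first is active and then reuses the same resource once the first
lease has been released.
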
Some minor concepts in Drydock are **Slot Locks** and **Repository Operations**.

**Slot Locks** are simple optimistic locks that most Drydock blueprints use to
avoid race conditions. Their design is not particularly interesting or novel;
they're just a fairly good fit for most of the locking problems that Drydock
blueprints tend to encounter, and Drydock provides APIs to make them easy to
work with.

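One common way to build an optimistic lock of this kind is to let a uniqueness
constraint arbitrate: whoever inserts the row for a slot first owns it, and
everyone else fails immediately instead of waiting. The sketch below assumes
that approach and uses Python's `sqlite3` module purely for illustration. The
table, the column names, and the `try_lock` and `unlock` helpers are
hypothetical and are not Drydock's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE slot_lock (slot_key TEXT PRIMARY KEY, owner TEXT NOT NULL)"
)


def try_lock(conn, slot_key, owner):
    """Try to claim a slot; return True on success, False if it is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO slot_lock (slot_key, owner) VALUES (?, ?)",
                (slot_key, owner),
            )
        return True
    except sqlite3.IntegrityError:
        # Another owner already holds this slot.
        return False


def unlock(conn, owner):
    """Release every slot held by this owner."""
    with conn:
        conn.execute("DELETE FROM slot_lock WHERE owner = ?", (owner,))
```

A caller that fails to take the slot does not block; it can try a different
slot or retry later.
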
**Repository Operations** help other applications coordinate writes to
repositories. Multiple applications perform similar kinds of writes, and these
writes require more sequencing/coordination and user feedback than other
operations.


Architecture Overview
=====================

This section describes some of Drydock's design goals and architectural
choices, so you can understand its strengths and weaknesses and which problem
domains it is well or poorly suited for.

A typical use case for Drydock is giving another application access to a
working copy in order to run a build or unit test operation. Drydock can
satisfy the request and resume execution of application code in 1-2 seconds
under reasonable conditions and with moderate tradeoffs, and can satisfy a
large number of these requests in parallel.

**Scalable**: Drydock is designed to scale easily to something in the realm of
thousands of hosts in hundreds of pools, and far beyond that with a little
work.

Drydock is intended to solve resource management problems at very large scales
and minimizes blocking operations, locks, and artificial sequencing. Drydock is
designed to fully utilize an almost arbitrarily large pool of resources and
improve performance roughly linearly with available hardware.

Because the application assumes that deployment at this scale and complexity
level is typical, you may need to configure more things and do more work than
you would under the simplifying assumptions of small scale.

**Heavy Resources**: Drydock assumes that resources are relatively
heavyweight and require a meaningful amount (a second or more) of work to
build, maintain, and tear down. It also assumes that leases will often have
substantial lifespans (seconds or minutes) while performing operations.

Resources like working copies (which typically take several seconds to create
with a command like `git clone`) and VMs (which typically take several seconds
to spin up) are good fits for Drydock and for the problems it is intended to
solve.

Lease operations like running unit tests, performing builds, executing merges,
generating documentation and running temporary services (which typically last
at least a few seconds) are also good fits for Drydock.

In both cases, the general concern with lightweight resources and operations is
that Drydock operation overhead is roughly on the order of a second for many
tasks, so overhead from Drydock will be substantial if resources are built and
torn down in a few milliseconds or lease operations require only a fraction of
a second to execute.

As a rule of thumb, Drydock may be a poor fit for a problem if operations
typically take less than a second to build, execute, and destroy.

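To make the rule of thumb concrete, consider what share of total wall-clock
time goes to Drydock itself. The sketch below assumes a fixed overhead of one
second per operation, in line with the rough figure above. The function name
and the example durations are illustrative, not measurements.

```python
def overhead_fraction(work_seconds, overhead_seconds=1.0):
    """Share of total time spent on allocation overhead rather than work."""
    return overhead_seconds / (overhead_seconds + work_seconds)


# A 5 ms operation versus a 10 minute build, each paying about 1 s of overhead.
for label, work in [("5 ms operation", 0.005), ("10 minute build", 600.0)]:
    print(f"{label}: {overhead_fraction(work):.1%} overhead")
```

Under this assumption, more than 99% of the time for the 5 ms operation is
overhead, while for the 10 minute build the overhead is well under 1%.
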
**Focus on Resource Construction**: Drydock is primarily solving a resource
construction problem: something needs a resource matching some description, so
Drydock finds or builds that resource as quickly as possible.

Drydock generally prioritizes responding to requests quickly over other
concerns, like minimizing waste or performing complex scheduling. Although you
can make adjustments to some of these behaviors, it generally assumes that
resources are cheap compared to the cost of waiting for resource construction.

This isn't to say that Drydock is grossly wasteful or has a terrible scheduler,
just that efficient utilization and efficient scheduling aren't the primary
problems the design focuses on.

This prioritization corresponds to scenarios where resources are something like
hosts or working copies, and operations are something like builds, and the cost
of hosts and storage is small compared to the cost of engineer time spent
waiting on jobs to get scheduled.

Drydock may be a weak fit for a problem if it is bounded by resource
availability and using resources as efficiently as possible is very important.
Drydock generally assumes you will respond to a resource deficit by making more
resources available (usually very cheap), rather than by paying engineers to
wait for operations to complete (usually very expensive).

**Isolation Tradeoffs**: Drydock assumes that multiple operations running at
similar levels of trust may be interested in reducing isolation to improve
performance, reduce complexity, or satisfy some other similar goal. It does not
guarantee isolation and assumes most operations will not run in total isolation.

If this isn't true for your use case, you'll need to be careful in configuring
Drydock to make sure that operations are fully isolated and cannot interact.
Complete isolation will reduce the performance of the allocator, as it will
generally prevent it from reusing resources, which is one of the major ways it
can improve performance.

You can find more discussion of these tradeoffs in
@{article:Drydock User Guide: Security}.

**Agentless**: Drydock does not require an agent or daemon to be installed on
hosts. It interacts with hosts over SSH.

**Very Abstract**: Drydock's design is //extremely// abstract. Resources have
very little hardcoded behavior. The allocator has essentially zero specialized
knowledge about what it is actually doing.

One aspect of this abstractness is that Drydock is composable, and solves
complex allocation problems by //asking itself// to build the pieces it needs.
To build a working copy, Drydock first asks itself for a suitable host. It
solves this allocation sub-problem, then resolves the original request.

This allows new types of resources to build on Drydock's existing knowledge of
resource construction by just saying "build one of these other things you
already know how to build, then apply a few adjustments". This also means that
you can tell Drydock about a new way to build hosts (say, bring up VMs from a
different service provider) and the rest of the pipeline can use these new
hosts interchangeably with the old hosts.

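A minimal sketch of this recursive style of allocation follows. It assumes a
hypothetical `RECIPES` table that maps each resource type to the type it is
built on top of. Neither the table nor the `allocate` function comes from
Drydock, and the sketch does not guard against circular recipes.

```python
# Hypothetical recipes: each resource type names the type it is built on,
# or None if it can be built directly.
RECIPES = {
    "host": None,
    "working-copy": "host",
}


def allocate(kind, recipes=RECIPES):
    """Return the chain of resources built to satisfy a request for `kind`."""
    if kind not in recipes:
        raise KeyError(f"no blueprint can build {kind!r}")
    dependency = recipes[kind]
    # Solve the sub-problem first by asking the allocator itself.
    chain = allocate(dependency, recipes) if dependency else []
    return chain + [kind]
```

Here `allocate("working-copy")` returns `["host", "working-copy"]`: the host
is resolved first, then the working copy is built on top of it. A new resource
type only needs to name an existing type as its dependency to reuse everything
beneath it.
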
While this design theoretically makes Drydock more powerful and more flexible
than a less abstract approach, abstraction is frequently a double-edged sword.

Drydock is almost certainly at the extreme upper end of abstraction for tools
in this space, and the level of abstraction may ultimately match poorly with a
particular problem domain. Alternative approaches may give you more specialized
and useful tools for approaching a given problem.


Next Steps
==========

Continue by:

  - understanding Drydock security concerns with
    @{article:Drydock User Guide: Security}; or
  - learning about blueprints in @{article:Drydock Blueprints}; or
  - allowing Phorge to write to repositories with
    @{article:Drydock User Guide: Repository Automation}.