@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

Implement optimistic "slot locks" in Drydock

Summary:
See discussion in D10304. There's a lot of context there, but the general idea is:

- Blueprints should manage locks in a granular way during the actual allocation/acquisition phase.
- Optimistic "slot locks" might be a pretty good primitive to make that easy to implement and reason about in most cases.

The way these locks work is that you just pick some name for the lock (like the PHID of a resource) and say that it needs to be acquired for the allocation/acquisition to work:

```
...
->needSlotLock("mylock(PHID-XYZQ-...)")
...
```

When you fire off the acquisition or allocation, it fails unless it can acquire the slot with that name. This is really simple (there's no explicit lock management) and a pretty good fit for most of the locking that blueprints and leases need to do.
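The behavior described above (every requested lock is acquired, or the whole operation fails) can be modeled without a database. This is a minimal sketch, not Drydock's implementation: `InMemorySlotLocks` is a hypothetical class, and a PHP array keyed by lock name stands in for the unique key on the lock table.

```php
<?php

// Hypothetical in-memory model of slot locks; not part of Drydock.
// $held maps lock name => owner PHID, playing the role of the unique key:
// each lock name can belong to at most one owner.
final class InMemorySlotLocks {
  private $held = array();

  // Acquire every lock in $locks for $owner, or none of them.
  public function acquireLocks($owner, array $locks) {
    // Check everything before writing anything, so a conflict on any one
    // lock (including a duplicate inside $locks) leaves the table unchanged.
    $pending = array();
    foreach ($locks as $lock) {
      if (isset($this->held[$lock]) || isset($pending[$lock])) {
        throw new RuntimeException("Slot lock '{$lock}' is already held.");
      }
      $pending[$lock] = true;
    }
    foreach ($pending as $lock => $ignored) {
      $this->held[$lock] = $owner;
    }
  }

  // Release every lock held by $owner.
  public function releaseLocks($owner) {
    foreach ($this->held as $lock => $holder) {
      if ($holder === $owner) {
        unset($this->held[$lock]);
      }
    }
  }

  public function isHeld($lock) {
    return isset($this->held[$lock]);
  }
}
```

A failed call leaves the table untouched. In the patch, the same all-or-nothing behavior comes from issuing every lock as one multi-row `INSERT`: InnoDB rolls back the whole statement on a duplicate-key error.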

If you need limit-based locks (e.g., a maximum of 3 locks), you could acquire a lock like this:

```
mylock(whatever).slot(2)
```
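The summary shows only the naming format for limit-based locks. One plausible way to use it, sketched here as an assumption, is to probe slots 0 through N-1 and keep the first one that can be acquired. `acquireAnySlot()` is a hypothetical helper, and a plain array stands in for the lock table.

```php
<?php

// Hypothetical helper, not Drydock code: acquire one of $limit numbered
// slots for $base using the "name.slot(N)" convention. $held stands in for
// the slot lock table (lock name => owner). Returns the lock name acquired,
// or null if every slot is already taken.
function acquireAnySlot(array &$held, $owner, $base, $limit) {
  for ($ii = 0; $ii < $limit; $ii++) {
    $name = "{$base}.slot({$ii})";
    if (!isset($held[$name])) {
      $held[$name] = $owner;
      return $name;
    }
  }
  return null;
}
```

Against the real table, each probe would be an acquisition attempt whose failure means "try the next slot". The separate check-then-set here is only safe because the in-memory model is single-threaded.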

Blueprints generally only contend with themselves, so it's normally OK for them to pick whatever strategy works best for them in naming locks.

This may not work as well if you have a huge number of slots (e.g., 100TB you want to give out in 1MB chunks) or other complex locking needs (like having to synchronize access to some external resource), but slot locks don't need to be the only mechanism that blueprints use. If they run into a problem that slot locks aren't a good fit for, they can use something else instead. For now, slot locks seem like a good fit for the problems we currently face and most of the problems I anticipate facing.

(The release workflows have other race issues which I'm not addressing here. They work fine if nothing races, but aren't race-safe.)

Test Plan:
To create a race where the same binding is allocated as a resource twice:

- Add `sleep(10)` near the beginning of `allocateResource()`, after the free bindings are loaded but before resources are allocated.
- (Comment out slot lock acquisition if you have this patch.)
- Run `bin/drydock lease ...` in two windows, within 10 seconds of one another.

This will reliably double-allocate the binding because both processes see a view of the world where the binding is free.

To verify the lock works, un-comment it (or apply this patch) and run the same test again. Now lock acquisition fails in one process, and only one resource is allocated.
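The interleaving that the test plan forces with `sleep(10)` can be replayed deterministically. In this hypothetical model (`replayRace()` and the placeholder PHID are illustrative, not Drydock code), both processes read the binding as free before either one allocates. Without the slot lock both allocate a resource; with it, the second acquisition fails.

```php
<?php

// Hypothetical deterministic replay of the test plan's race; not Drydock
// code. Returns how many resources end up allocated for a single binding.
function replayRace($use_slot_lock) {
  $locks = array();      // lock name => owner; stands in for the unique key
  $resources = array();  // one entry per resource allocated for the binding
  $binding = 'PHID-XXXX-binding';
  $lock_name = "almanac.host.binding({$binding})";

  // Both processes load the same snapshot (the sleep(10) window): at this
  // point nothing has been allocated, so both see the binding as free.
  $saw_free = array(
    'process-1' => empty($resources),
    'process-2' => empty($resources),
  );

  foreach ($saw_free as $process => $free) {
    if (!$free) {
      continue;
    }
    if ($use_slot_lock) {
      if (isset($locks[$lock_name])) {
        continue; // Lock acquisition fails; this process allocates nothing.
      }
      $locks[$lock_name] = $process;
    }
    $resources[] = array('binding' => $binding, 'owner' => $process);
  }

  return count($resources);
}
```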

Reviewers: hach-que, chad

Reviewed By: hach-que, chad

Differential Revision: https://secure.phabricator.com/D14118

9 files changed, +185 -19
`resources/sql/autopatches/20150916.drydock.slotlocks.1.sql` (new file, +8)

```sql
CREATE TABLE {$NAMESPACE}_drydock.drydock_slotlock (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ownerPHID VARBINARY(64) NOT NULL,
  lockIndex BINARY(12) NOT NULL,
  lockKey LONGTEXT NOT NULL COLLATE {$COLLATE_TEXT},
  UNIQUE KEY `key_lock` (lockIndex),
  KEY `key_owner` (ownerPHID)
) ENGINE=InnoDB, COLLATE {$COLLATE_TEXT};
```
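In this schema the unique key sits on `lockIndex`, a fixed-width `BINARY(12)` digest, rather than on the `LONGTEXT` `lockKey` itself. This is presumably because MySQL can only index `TEXT` columns through a prefix, which would not enforce uniqueness of the whole name. The patch computes the digest with `PhabricatorHash::digestForIndex()`; the sketch below does not reproduce that function. It uses truncated hex SHA-1 in a hypothetical `sketchLockIndex()` only to show the shape of the idea.

```php
<?php

// Hypothetical stand-in for the 12-byte lockIndex value. This is NOT
// PhabricatorHash::digestForIndex(); it only illustrates a fixed-width
// digest of an arbitrarily long lock name.
function sketchLockIndex($lock_key) {
  // sha1() returns 40 hex characters; keep the first 12 so the value fits
  // a BINARY(12) column.
  return substr(sha1($lock_key), 0, 12);
}
```

If two different lock names ever produced the same digest, the second acquisition would fail spuriously. It could never give one name two owners, so the digest's failure mode is conservative.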
`src/__phutil_library_map__.php` (+2)

```diff
@@ -863,6 +863,7 @@
     'DrydockResourceViewController' => 'applications/drydock/controller/DrydockResourceViewController.php',
     'DrydockSFTPFilesystemInterface' => 'applications/drydock/interface/filesystem/DrydockSFTPFilesystemInterface.php',
     'DrydockSSHCommandInterface' => 'applications/drydock/interface/command/DrydockSSHCommandInterface.php',
+    'DrydockSlotLock' => 'applications/drydock/storage/DrydockSlotLock.php',
     'DrydockWebrootInterface' => 'applications/drydock/interface/webroot/DrydockWebrootInterface.php',
     'DrydockWorkingCopyBlueprintImplementation' => 'applications/drydock/blueprint/DrydockWorkingCopyBlueprintImplementation.php',
     'FeedConduitAPIMethod' => 'applications/feed/conduit/FeedConduitAPIMethod.php',
@@ -4585,6 +4586,7 @@
     'DrydockResourceViewController' => 'DrydockResourceController',
     'DrydockSFTPFilesystemInterface' => 'DrydockFilesystemInterface',
     'DrydockSSHCommandInterface' => 'DrydockCommandInterface',
+    'DrydockSlotLock' => 'DrydockDAO',
     'DrydockWebrootInterface' => 'DrydockInterface',
     'DrydockWorkingCopyBlueprintImplementation' => 'DrydockBlueprintImplementation',
     'FeedConduitAPIMethod' => 'ConduitAPIMethod',
```
`src/applications/drydock/blueprint/DrydockAlmanacServiceHostBlueprintImplementation.php` (+11 -9)

```diff
@@ -67,14 +67,13 @@
     $device = $binding->getDevice();
     $device_name = $device->getName();
 
+    $binding_phid = $binding->getPHID();
+
     $resource = $this->newResourceTemplate($blueprint, $device_name)
       ->setActivateWhenAllocated(true)
       ->setAttribute('almanacServicePHID', $binding->getServicePHID())
-      ->setAttribute('almanacBindingPHID', $binding->getPHID());
-
-    // TODO: This algorithm can race, and the "free" binding may not be
-    // free by the time we acquire it. Do slot-locking here if that works
-    // out, or some other kind of locking if it does not.
+      ->setAttribute('almanacBindingPHID', $binding_phid)
+      ->needSlotLock("almanac.host.binding({$binding_phid})");
 
     try {
       return $resource->allocateResource(DrydockResourceStatus::STATUS_OPEN);
@@ -93,8 +92,11 @@
     DrydockResource $resource,
     DrydockLease $lease) {
 
-    // TODO: We'll currently lease each resource an unlimited number of times,
-    // but should stop doing that.
+    // TODO: The current rule is one lease per resource, and there's no way to
+    // make that cheaper here than by just trying to acquire the lease below,
+    // so don't do any special checks for now. When we eventually permit
+    // multiple leases per host, we'll need to load leases anyway, so we can
+    // reject fully leased hosts cheaply here.
 
     return true;
   }
@@ -104,11 +106,11 @@
     DrydockResource $resource,
     DrydockLease $lease) {
 
-    // TODO: Once we have limit rules, we should perform slot locking (or other
-    // kinds of locking) here.
+    $resource_phid = $resource->getPHID();
 
     $lease
       ->setActivateWhenAcquired(true)
+      ->needSlotLock("almanac.host.lease({$resource_phid})")
       ->acquireOnResource($resource);
   }
 
```
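The Almanac host blueprint changes request two locks: one named after the binding when allocating a resource, and one named after the resource when acquiring a lease. Because lock names are plain strings, two callers conflict exactly when they build the same string. This is a minimal sketch; `bindingLockName()`, `leaseLockName()`, `tryLock()` and the PHIDs used with them are hypothetical.

```php
<?php

// Hypothetical helpers mirroring the lock names requested by the Almanac
// host blueprint; not Drydock code.
function bindingLockName($binding_phid) {
  return "almanac.host.binding({$binding_phid})";
}

function leaseLockName($resource_phid) {
  return "almanac.host.lease({$resource_phid})";
}

// Try to acquire $name for $owner. $held stands in for the lock table
// (lock name => owner); returns false if someone already holds $name.
function tryLock(array &$held, $owner, $name) {
  if (isset($held[$name])) {
    return false;
  }
  $held[$name] = $owner;
  return true;
}
```

The distinct `almanac.host.binding(...)` and `almanac.host.lease(...)` prefixes keep the two kinds of lock in separate namespaces, so a binding lock can never block a lease lock or vice versa.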
`src/applications/drydock/blueprint/DrydockBlueprintImplementation.php` (+6)

```diff
@@ -106,8 +106,12 @@
     DrydockLease $lease);
 
   final public function releaseLease(
+    DrydockBlueprint $blueprint,
     DrydockResource $resource,
     DrydockLease $lease) {
+
+    // TODO: This is all broken nonsense.
+
     $scope = $this->pushActiveScope(null, $lease);
 
     $released = false;
@@ -117,6 +121,7 @@
         $lease->reload();
 
         if ($lease->getStatus() == DrydockLeaseStatus::STATUS_ACTIVE) {
+          $lease->release();
           $lease->setStatus(DrydockLeaseStatus::STATUS_RELEASED);
           $lease->save();
           $released = true;
@@ -293,6 +298,7 @@
 
     $resource = id(new DrydockResource())
       ->setBlueprintPHID($blueprint->getPHID())
+      ->attachBlueprint($blueprint)
       ->setType($this->getType())
       ->setStatus(DrydockResourceStatus::STATUS_PENDING)
       ->setName($name);
```
`src/applications/drydock/storage/DrydockBlueprint.php` (+11)

```diff
@@ -163,6 +163,17 @@
   }
 
 
+  /**
+   * @task lease
+   */
+  public function releaseLease(
+    DrydockResource $resource,
+    DrydockLease $lease) {
+    $this->getImplementation()->releaseLease($this, $resource, $lease);
+    return $this;
+  }
+
+
 /* -( PhabricatorApplicationTransactionInterface )------------------------- */
 
 
```
`src/applications/drydock/storage/DrydockLease.php` (+19 -4)

```diff
@@ -15,6 +15,7 @@
   private $releaseOnDestruction;
   private $isAcquired = false;
   private $activateWhenAcquired = false;
+  private $slotLocks = array();
 
   /**
    * Flag this lease to be released when its destructor is called. This is
@@ -128,6 +129,8 @@
     $this->setStatus(DrydockLeaseStatus::STATUS_RELEASED);
     $this->save();
 
+    DrydockSlotLock::releaseLocks($this->getPHID());
+
     $this->resource = null;
 
     return $this;
@@ -206,6 +209,11 @@
     return $this;
   }
 
+  public function needSlotLock($key) {
+    $this->slotLocks[] = $key;
+    return $this;
+  }
+
   public function acquireOnResource(DrydockResource $resource) {
     $expect_status = DrydockLeaseStatus::STATUS_PENDING;
     $actual_status = $this->getStatus();
@@ -234,10 +242,17 @@
       }
     }
 
-    $this
-      ->setResourceID($resource->getID())
-      ->setStatus($new_status)
-      ->save();
+    $this->openTransaction();
+
+      $this
+        ->setResourceID($resource->getID())
+        ->setStatus($new_status)
+        ->save();
+
+      DrydockSlotLock::acquireLocks($this->getPHID(), $this->slotLocks);
+      $this->slotLocks = array();
+
+    $this->saveTransaction();
 
     $this->isAcquired = true;
 
```
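In `acquireOnResource()` (and likewise `allocateResource()` in `DrydockResource`), the status write and `DrydockSlotLock::acquireLocks()` now run between `openTransaction()` and `saveTransaction()`. The intent appears to be that the status change and the lock rows take effect together, so a failed lock acquisition should not leave a lease marked as acquired. This hypothetical sketch (`SketchStore` is illustrative, not Phabricator code) makes that rollback explicit with a snapshot of in-memory state.

```php
<?php

// Hypothetical model, not Phabricator code: the lease status write and the
// lock inserts either both take effect or neither does.
final class SketchStore {
  public $leaseStatus = array(); // lease PHID => status
  public $locks = array();       // lock name => owner PHID

  public function acquireLease($lease_phid, array $lock_names) {
    // "Open a transaction" by remembering the committed state. PHP arrays
    // are copied by value, so these are independent snapshots.
    $saved_status = $this->leaseStatus;
    $saved_locks = $this->locks;

    try {
      $this->leaseStatus[$lease_phid] = 'acquired';
      foreach ($lock_names as $name) {
        if (isset($this->locks[$name])) {
          throw new RuntimeException("Slot lock '{$name}' is already held.");
        }
        $this->locks[$name] = $lease_phid;
      }
    } catch (RuntimeException $ex) {
      // Roll back: restore the state from before the "transaction".
      $this->leaseStatus = $saved_status;
      $this->locks = $saved_locks;
      throw $ex;
    }
  }
}
```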
`src/applications/drydock/storage/DrydockResource.php` (+20 -4)

```diff
@@ -17,6 +17,7 @@
   private $blueprint = self::ATTACHABLE;
   private $isAllocated = false;
   private $activateWhenAllocated = false;
+  private $slotLocks = array();
 
   protected function getConfiguration() {
     return array(
@@ -80,6 +81,11 @@
     return $this;
   }
 
+  public function needSlotLock($key) {
+    $this->slotLocks[] = $key;
+    return $this;
+  }
+
   public function allocateResource($status) {
     if ($this->getID()) {
       throw new Exception(
@@ -105,11 +111,18 @@
       $new_status = DrydockResourceStatus::STATUS_PENDING;
     }
 
-    $this
-      ->setStatus($new_status)
-      ->save();
+    $this->openTransaction();
 
-    $this->didAllocate = true;
+      $this
+        ->setStatus($new_status)
+        ->save();
+
+      DrydockSlotLock::acquireLocks($this->getPHID(), $this->slotLocks);
+      $this->slotLocks = array();
+
+    $this->saveTransaction();
+
+    $this->isAllocated = true;
 
     return $this;
   }
@@ -151,6 +164,9 @@
 
       $this->setStatus(DrydockResourceStatus::STATUS_CLOSED);
       $this->save();
+
+      DrydockSlotLock::releaseLocks($this->getPHID());
+
     $this->saveTransaction();
   }
 
```
`src/applications/drydock/storage/DrydockSlotLock.php` (new file, +107)

```php
<?php

/**
 * Simple optimistic locks for Drydock resources and leases.
 *
 * Most blueprints only need very simple locks: for example, a host blueprint
 * might not want to create multiple resources representing the same physical
 * machine. These optimistic "slot locks" provide a flexible way to do this
 * sort of simple locking.
 *
 * @task lock Acquiring and Releasing Locks
 */
final class DrydockSlotLock extends DrydockDAO {

  protected $ownerPHID;
  protected $lockIndex;
  protected $lockKey;

  protected function getConfiguration() {
    return array(
      self::CONFIG_TIMESTAMPS => false,
      self::CONFIG_COLUMN_SCHEMA => array(
        'lockIndex' => 'bytes12',
        'lockKey' => 'text',
      ),
      self::CONFIG_KEY_SCHEMA => array(
        'key_lock' => array(
          'columns' => array('lockIndex'),
          'unique' => true,
        ),
        'key_owner' => array(
          'columns' => array('ownerPHID'),
        ),
      ),
    ) + parent::getConfiguration();
  }

  public static function loadLocks($owner_phid) {
    return id(new DrydockSlotLock())->loadAllWhere(
      'ownerPHID = %s',
      $owner_phid);
  }


/* -( Acquiring and Releasing Locks )-------------------------------------- */


  /**
   * Acquire a set of slot locks.
   *
   * This method either acquires all the locks or throws an exception (usually
   * because one or more locks are held).
   *
   * @param phid Lock owner PHID.
   * @param list<string> List of locks to acquire.
   * @return void
   * @task locks
   */
  public static function acquireLocks($owner_phid, array $locks) {
    if (!$locks) {
      return;
    }

    $table = new DrydockSlotLock();
    $conn_w = $table->establishConnection('w');

    $sql = array();
    foreach ($locks as $lock) {
      $sql[] = qsprintf(
        $conn_w,
        '(%s, %s, %s)',
        $owner_phid,
        PhabricatorHash::digestForIndex($lock),
        $lock);
    }

    // TODO: These exceptions are pretty tricky to read. It would be good to
    // figure out which locks could not be acquired and try to improve the
    // exception to make debugging easier.

    queryfx(
      $conn_w,
      'INSERT INTO %T (ownerPHID, lockIndex, lockKey) VALUES %Q',
      $table->getTableName(),
      implode(', ', $sql));
  }


  /**
   * Release all locks held by an owner.
   *
   * @param phid Lock owner PHID.
   * @return void
   * @task locks
   */
  public static function releaseLocks($owner_phid) {
    $table = new DrydockSlotLock();
    $conn_w = $table->establishConnection('w');

    queryfx(
      $conn_w,
      'DELETE FROM %T WHERE ownerPHID = %s',
      $table->getTableName(),
      $owner_phid);
  }

}
```
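The TODO in `acquireLocks()` notes that a failed multi-row insert doesn't say which lock was contended. A follow-up lookup of the current holders could fill that gap. This sketch shows only the reporting step over an in-memory map: `describeLockConflicts()` is hypothetical, and in Drydock the holders would have to be read back from the lock table (for example, by `lockIndex`), which this patch does not do.

```php
<?php

// Hypothetical diagnostic for the TODO in acquireLocks(); not Drydock code.
// $wanted: lock names the caller tried to acquire.
// $held: lock names currently held (lock name => owner PHID), which in a
// real implementation would come from an assumed follow-up query.
// Returns lock name => holder for every requested lock that is taken.
function describeLockConflicts(array $wanted, array $held) {
  $conflicts = array();
  foreach ($wanted as $lock) {
    if (isset($held[$lock])) {
      $conflicts[$lock] = $held[$lock];
    }
  }
  return $conflicts;
}
```

Such a report is best-effort: a holder can release its lock between the failed insert and the lookup.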
`src/applications/drydock/worker/DrydockAllocatorWorker.php` (+1 -2)

```diff
@@ -350,7 +350,7 @@
     DrydockBlueprint $blueprint,
     DrydockLease $lease) {
     $resource = $blueprint->allocateResource($lease);
-    $this->validateAllocatedResource($resource);
+    $this->validateAllocatedResource($blueprint, $resource, $lease);
     return $resource;
   }
 
@@ -369,7 +369,6 @@
     DrydockBlueprint $blueprint,
     $resource,
     DrydockLease $lease) {
-    $blueprint = $this->getBlueprintClass();
 
     if (!($resource instanceof DrydockResource)) {
       throw new Exception(
```