@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

When storage is partitioned, refuse to serve requests unless web and databases agree on partitioning

Summary:
Ref T11044. One popular tool in a modern operations environment is Puppet. The primary purpose of this tool is to randomly revert hosts to older or different configurations.

Introducing an element of chaotic unpredictability into operations trains staff to be on high alert at all times, rather than lulled into complacency by predictability or consistency.

When Puppet reverts a Phabricator host's configuration to an older version, we might start writing data to a lot of crazy places where it shouldn't go. This will create a big sticky mess that is virtually impossible to undo, mostly because we'll get two files with ID 123 or two tasks with ID 456 or whatever else and good luck with that.

Instead, after changing the partition layout, require `bin/storage partition` to be run. This writes a copy of the config to a `hoststate` table on every master database.

Then, when we start serving web requests, make sure every database has the exact same config. This will foil Puppet by refusing to run requests on hosts it has reverted.
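
As a rough illustration of the check described above, here is a minimal standalone sketch. It is not the code from this change: the function names are hypothetical, plain `json_encode()` stands in for `phutil_json_encode()`, and the sample hosts and passwords are made up.

```php
<?php

// Hypothetical stand-in for PhabricatorDatabaseRef::getPartitionStateForCommit().
// Takes the `cluster.databases` config as an array and returns the string
// that would be written to each master.
function partition_state_for_commit(array $cluster_databases) {
  foreach ($cluster_databases as $key => $value) {
    // Passwords are dropped, so hosts that differ only in credentials
    // still produce the same state string.
    unset($cluster_databases[$key]['pass']);
  }

  // Assumption: plain json_encode() in place of phutil_json_encode().
  return json_encode($cluster_databases);
}

// Hypothetical helper: true if this host should refuse to serve requests.
// $state_actual is the stored string, or null if nothing was committed yet.
function is_partition_state_desynced(array $local_config, $state_actual) {
  return partition_state_for_commit($local_config) !== $state_actual;
}

// Made-up sample configuration with two masters.
$committed = array(
  array('host' => 'db001', 'role' => 'master', 'pass' => 'old-secret'),
  array('host' => 'db002', 'role' => 'master', 'pass' => 'old-secret'),
);
$stored = partition_state_for_commit($committed);

// Same layout, different password: still in sync.
$rotated = $committed;
$rotated[0]['pass'] = 'new-secret';
var_dump(is_partition_state_desynced($rotated, $stored));

// Reverted layout (second master missing): out of sync.
$reverted = array($committed[0]);
var_dump(is_partition_state_desynced($reverted, $stored));

// Nothing committed yet: also treated as out of sync.
var_dump(is_partition_state_desynced($committed, null));
```

The comparison is a strict string comparison, so any difference in the encoded layout (including key order) counts as a desync, and a database with no committed state never matches.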

Test Plan:
- Changed partition configuration.
- Ran Phabricator.
- FOILED!
- Ran `bin/storage partition` to sync config.
- Things worked again.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T11044

Differential Revision: https://secure.phabricator.com/D16910

+126 -3
+5
resources/sql/autopatches/20161121.cluster.01.hoststate.sql
···
+ CREATE TABLE {$NAMESPACE}_meta_data.hoststate (
+   stateKey VARCHAR(128) NOT NULL COLLATE {$COLLATE_TEXT},
+   stateValue LONGTEXT NOT NULL COLLATE {$COLLATE_TEXT},
+   PRIMARY KEY (stateKey)
+ ) ENGINE=InnoDB, COLLATE {$COLLATE_TEXT};
+2
src/__phutil_library_map__.php
···
      'PhabricatorStorageManagementDatabasesWorkflow' => 'infrastructure/storage/management/workflow/PhabricatorStorageManagementDatabasesWorkflow.php',
      'PhabricatorStorageManagementDestroyWorkflow' => 'infrastructure/storage/management/workflow/PhabricatorStorageManagementDestroyWorkflow.php',
      'PhabricatorStorageManagementDumpWorkflow' => 'infrastructure/storage/management/workflow/PhabricatorStorageManagementDumpWorkflow.php',
+     'PhabricatorStorageManagementPartitionWorkflow' => 'infrastructure/storage/management/workflow/PhabricatorStorageManagementPartitionWorkflow.php',
      'PhabricatorStorageManagementProbeWorkflow' => 'infrastructure/storage/management/workflow/PhabricatorStorageManagementProbeWorkflow.php',
      'PhabricatorStorageManagementQuickstartWorkflow' => 'infrastructure/storage/management/workflow/PhabricatorStorageManagementQuickstartWorkflow.php',
      'PhabricatorStorageManagementRenamespaceWorkflow' => 'infrastructure/storage/management/workflow/PhabricatorStorageManagementRenamespaceWorkflow.php',
···
      'PhabricatorStorageManagementDatabasesWorkflow' => 'PhabricatorStorageManagementWorkflow',
      'PhabricatorStorageManagementDestroyWorkflow' => 'PhabricatorStorageManagementWorkflow',
      'PhabricatorStorageManagementDumpWorkflow' => 'PhabricatorStorageManagementWorkflow',
+     'PhabricatorStorageManagementPartitionWorkflow' => 'PhabricatorStorageManagementWorkflow',
      'PhabricatorStorageManagementProbeWorkflow' => 'PhabricatorStorageManagementWorkflow',
      'PhabricatorStorageManagementQuickstartWorkflow' => 'PhabricatorStorageManagementWorkflow',
      'PhabricatorStorageManagementRenamespaceWorkflow' => 'PhabricatorStorageManagementWorkflow',
+32
src/applications/config/check/PhabricatorDatabaseSetupCheck.php
···
          break;
      }

+     // If we have more than one master, we require that the cluster database
+     // configuration written to each database node is exactly the same as the
+     // one we are running with.
+     $masters = PhabricatorDatabaseRef::getAllMasterDatabaseRefs();
+     if (count($masters) > 1) {
+       $state_actual = queryfx_one(
+         $conn_meta,
+         'SELECT stateValue FROM %T WHERE stateKey = %s',
+         PhabricatorStorageManagementAPI::TABLE_HOSTSTATE,
+         'cluster.databases');
+       if ($state_actual) {
+         $state_actual = $state_actual['stateValue'];
+       }

+       $state_expect = $ref->getPartitionStateForCommit();
+
+       if ($state_expect !== $state_actual) {
+         $message = pht(
+           'Database host "%s" has a configured cluster state which disagrees '.
+           'with the state on this host ("%s"). Run `bin/storage partition` '.
+           'to commit local state to the cluster. This host may have started '.
+           'with an out-of-date configuration.',
+           $ref->getRefKey(),
+           php_uname('n'));
+
+         $this->newIssue('db.state.desync')
+           ->setName(pht('Cluster Configuration Out of Sync'))
+           ->setMessage($message)
+           ->setIsFatal(true);
+         return true;
+       }
+     }
    }
+
  }
+17
src/docs/user/cluster/cluster_partitioning.diviner
···
  names. You can get a list of databases with `bin/storage databases` to identify
  the correct database names.

+ After you have configured partitioning, it needs to be committed to the
+ databases. This writes a copy of the configuration to tables on the databases,
+ preventing errors if a webserver accidentally starts with an old or invalid
+ configuration.
+
+ To commit the configuration, run this command:
+
+ ```
+ phabricator/ $ ./bin/storage partition
+ ```
+
+ Run this command after making any partition or clustering changes. Webservers
+ will not serve traffic if their configuration and the database configuration
+ differ.
+

  Launching a new Partition
  =========================
···
      are partitioning, you will need to configure your existing master as the
      new "default". This will let Phabricator interact with it, but won't send
      any traffic to it yet.
+   - Run `bin/storage partition`.
    - Run `bin/storage upgrade` to initialize the schemata on the new hosts.
    - Stop writes to the applications you want to move by putting Phabricator
      in read-only mode, or shutting down the webserver and daemons, or telling
···
    - Load the data into the application databases on the new master.
    - Reconfigure the "partition" setup so that Phabricator knows the databases
      have moved.
+   - Run `bin/storage partition`.
    - While still in read-only mode, check that all the data appears to be
      intact.
    - Resume writes.
+11 -3
src/infrastructure/cluster/PhabricatorDatabaseRef.php
···
      return $this->applicationMap;
    }

+   public function getPartitionStateForCommit() {
+     $state = PhabricatorEnv::getEnvConfig('cluster.databases');
+     foreach ($state as $key => $value) {
+       // Don't store passwords, since we don't care if they differ and
+       // users may find it surprising.
+       unset($state[$key]['pass']);
+     }
+
+     return phutil_json_encode($state);
+   }
+
    public function setMasterRef(PhabricatorDatabaseRef $master_ref) {
      $this->masterRef = $master_ref;
      return $this;
···

      $masters = array();
      foreach ($refs as $ref) {
-       if ($ref->getDisabled()) {
-         continue;
-       }
        if ($ref->getIsMaster()) {
          $masters[] = $ref;
        }
+1
src/infrastructure/storage/management/PhabricatorStorageManagementAPI.php
···
    const COLLATE_FULLTEXT = 'COLLATE_FULLTEXT';

    const TABLE_STATUS = 'patch_status';
+   const TABLE_HOSTSTATE = 'hoststate';

    public function setDisableUTF8MB4($disable_utf8_mb4) {
      $this->disableUTF8MB4 = $disable_utf8_mb4;
+44
src/infrastructure/storage/management/workflow/PhabricatorStorageManagementPartitionWorkflow.php
···
+ <?php
+
+ final class PhabricatorStorageManagementPartitionWorkflow
+   extends PhabricatorStorageManagementWorkflow {
+
+   protected function didConstruct() {
+     $this
+       ->setName('partition')
+       ->setExamples('**partition** [__options__]')
+       ->setSynopsis(pht('Commit partition configuration to databases.'))
+       ->setArguments(array());
+   }
+
+   public function didExecute(PhutilArgumentParser $args) {
+     echo tsprintf(
+       "%s\n",
+       pht('Committing configured partition map to databases...'));
+
+     foreach ($this->getMasterAPIs() as $api) {
+       $ref = $api->getRef();
+       $conn = $ref->newManagementConnection();
+
+       $state = $ref->getPartitionStateForCommit();
+
+       queryfx(
+         $conn,
+         'INSERT INTO %T.%T (stateKey, stateValue) VALUES (%s, %s)
+           ON DUPLICATE KEY UPDATE stateValue = VALUES(stateValue)',
+         $api->getDatabaseName('meta_data'),
+         PhabricatorStorageManagementAPI::TABLE_HOSTSTATE,
+         'cluster.databases',
+         $state);
+
+       echo tsprintf(
+         "%s\n",
+         pht(
+           'Wrote configuration on database host "%s".',
+           $ref->getRefKey()));
+     }
+
+     return 0;
+   }
+
+ }
+14
src/infrastructure/storage/schema/PhabricatorStorageSchemaSpec.php
···
            'unique' => true,
          ),
        ));
+
+     $this->buildRawSchema(
+       'meta_data',
+       PhabricatorStorageManagementAPI::TABLE_HOSTSTATE,
+       array(
+         'stateKey' => 'text128',
+         'stateValue' => 'text',
+       ),
+       array(
+         'PRIMARY' => array(
+           'columns' => array('stateKey'),
+           'unique' => true,
+         ),
+       ));
    }

  }