script to retroactively add commitids to past openbsd commits

new stuff to work with cvs provenance and branches

- put changesets in order and create new commitids for them by storing
a temporary commitid in CVSROOT/commitids-{tree}, running 'cvs
show' on it, reading the output hash, then overwriting the temporary
commitid in the commitids file with the one built from that hash, and
finally writing all of the new commitids out to every revision

- add some new hacks for specific files in src/ that have revisions
with missing previous revisions, and some that are obviously part
of changesets with existing commitids but don't have them for some
reason

- sprinkle some BEGIN/COMMIT around heavy sqlite ops for speed, fix
index names so they're actually unique :/

- be more careful about string encoding in sqlite, get everything to
utf-8. data from rlog was coming in as binary, now iso-8859-1,
and needs to be properly converted to utf-8 before going into
sqlite, otherwise later queries trying to match on string data
(which are probably using utf-8 strings) will fail

- work better when re-running, e.g. by not complaining about cvs-tmp
already existing, since this is no longer intended to run just once.
store a cheap cksum of each file to detect changes and speed up re-scans.
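
The re-scan shortcut in the last item can be sketched roughly as below. This is an illustration only, not the script's code: Ruby's `Zlib.crc32` stands in for the external `cksum -q` call, and the `seen` hash and `needs_rescan?` helper are hypothetical names standing in for the `files.cksum` column.

```ruby
require "zlib"
require "tempfile"

# Hypothetical stand-in for the files table: path -> last recorded checksum.
def needs_rescan?(seen, path)
  sum = Zlib.crc32(File.binread(path)).to_s
  return false if seen[path] == sum   # unchanged since the last scan

  seen[path] = sum                    # remember it for the next run
  true
end

seen = {}
rcs = Tempfile.new("example,v")
rcs.write("head 1.1;\n")
rcs.flush

first  = needs_rescan?(seen, rcs.path)  # never seen before
second = needs_rescan?(seen, rcs.path)  # same contents as last time

rcs.write("more revision data\n")
rcs.flush
third  = needs_rescan?(seen, rcs.path)  # contents changed

puts [first, second, third].inspect
```

Unlike this sketch, the script records the new checksum only after the file's revisions have been written, inside the same transaction.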

+538 -165
+7 -44
README.md
···
 ###OpenBSD commitid generator
 
-A work in progress to assign `commitid` identifiers to all files in OpenBSD's
-CVS trees for commits before `commitid` functionality was enabled.
+A work in progress to assign CVS provenance-style `commitid` identifiers to
+all revisions of all files in OpenBSD's CVS trees.
 
 ####Usage
 
···
 
 `$ ruby openbsd-commitid.rb`
 
-**NOTE**: This script relies on recently added changes to OpenBSD's `rlog`
-and `cvs` tools:
+**NOTE**: This script relies on recently added changes to OpenBSD's `rlog` and
+`cvs` tools:
 
 - `cvs admin -C` to set a revision's `commitid`
-- `rlog -E` and `rlog -S` to control the revision separators in `rlog`
-  output, since the default line of dashes appears in old commit messages
-
-####Details
-
-This script does the following steps for each of the `src`, `ports`, `www`,
-and `xenocara` trees.
+- `rlog -E` and `rlog -S` to control the revision separators in `rlog` output,
+  since the default line of dashes appears in old commit messages
 
-1. Recurse a directory of RCS ,v files (`/var/cvs-commitid/`), create a
-"files" record for each one.
-
-2. Run `rlog` on each RCS file, parse the output and create a "revisions"
-record with each revision's author, date, version, commitid (if present), and
-log message. Update the "files" record to note the first non-`dead` version
-in the file.
-
-3. Fetch all revisions not already matched to a changeset, ordered by author
-then date, and bundle them into changesets. Create a new "changesets" record
-for each, then update each of those "revisions" records with the new changeset
-id. By sorting all commits by author and date, it's possible to accurately
-find all files touched by an author in the same commit window.
-
-4. For each newly created "changesets" record, update them with a definitive
-timestamp, log message, author, and commitid (creating a new one if needed)
-based on all of the "revisions" with that changeset id.
-
-5. Do a `cvs checkout` from `/var/cvs-commitid/` to a temporary directory
-created in `/var/cvs-tmp/`, checking out revision 1.1 of each file. For each
-file that has a first-non-`dead` version number that is not 1.1, do another
-`cvs checkout` of that version of the file so that every file in the tree is
-now present in the working checked-out tree. This is required to operate on
-deleted files (moved to the Attic) and files where version 1.1 does not exist,
-such as those created on branches.
-
-6. For each "revisions" record of each "files" record that doesn't have a
-recorded `commitid` (meaning this script generated one while bundling it into a
-"changesets" record), run `cvs admin -C` to assign the `commitid` to that
-revision in the RCS `,v` file in `/var/cvs-commitid/`.
-
-7. `rm -rf` the temporary checked-out directory in `/var/cvs-tmp/` since the
-changes are now all present in the `/var/cvs-commitid/` tree.
+For details of how this script works, read `openbsd-commitid.rb`.
+25 -11
lib/db.rb
···
 class Db
   def initialize(dbf)
     @db = SQLite3::Database.new dbf
+    @db.results_as_hash = true
 
     @db.execute "CREATE TABLE IF NOT EXISTS changesets
       (id INTEGER PRIMARY KEY, date INTEGER, author TEXT, commitid TEXT,
-      log TEXT)"
-    @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_commitid ON changesets
+      log TEXT, branch TEXT, csorder INTEGER)"
+    @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_cs_commitid ON changesets
       (commitid)"
+    @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_cs_csorder ON changesets
+      (csorder)"
+    @db.execute "CREATE INDEX IF NOT EXISTS cs_branch ON changesets (branch)"
 
     @db.execute "CREATE TABLE IF NOT EXISTS files
       (id INTEGER PRIMARY KEY, file TEXT, first_undead_version TEXT,
-      size INTEGER)"
-    @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_file ON files
+      cksum TEXT)"
+    @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_f_file ON files
       (file)"
 
     @db.execute "CREATE TABLE IF NOT EXISTS revisions
       (id INTEGER PRIMARY KEY, file_id INTEGER, changeset_id INTEGER,
       date INTEGER, version TEXT, author TEXT, commitid TEXT, log TEXT,
-      state TEXT)"
-    @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_revision ON revisions
+      state TEXT, branch TEXT)"
+    @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_r_revision ON revisions
       (file_id, version)"
-    @db.execute "CREATE INDEX IF NOT EXISTS empty_changesets ON revisions
+    @db.execute "CREATE INDEX IF NOT EXISTS r_empty_changesets ON revisions
       (changeset_id)"
-    @db.execute "CREATE INDEX IF NOT EXISTS cs_by_commitid ON revisions
+    @db.execute "CREATE INDEX IF NOT EXISTS r_cs_by_commitid ON revisions
       (commitid, changeset_id)"
-    @db.execute "CREATE INDEX IF NOT EXISTS all_revs_by_author ON revisions
+    @db.execute "CREATE INDEX IF NOT EXISTS r_all_revs_by_author ON revisions
       (author, date)"
-    @db.execute "CREATE INDEX IF NOT EXISTS all_revs_by_version_and_state ON
+    @db.execute "CREATE INDEX IF NOT EXISTS r_all_revs_by_version_and_state ON
       revisions (version, state)"
+    @db.execute "CREATE INDEX IF NOT EXISTS r_branch ON revisions (branch)"
 
-    @db.results_as_hash = true
+    @db.execute("CREATE TABLE IF NOT EXISTS vendor_branches
+      (id INTEGER PRIMARY KEY, revision_id INTEGER, branch TEXT)")
+    @db.execute("CREATE INDEX IF NOT EXISTS vb_revision ON vendor_branches
+      (revision_id)")
+    @db.execute("CREATE INDEX IF NOT EXISTS vb_branch_branch ON vendor_branches
+      (branch)")
   end
 
   def execute(*args)
···
     else
       @db.execute(*args)
     end
+  end
+
+  def last_insert_row_id
+    @db.last_insert_row_id
   end
 end
+10 -5
lib/outputter.rb
···
     printlog = Proc.new {
       fh.puts "Changes by: #{last["author"]}@#{domain} " <<
         Time.at(last["date"].to_i).strftime("%Y/%m/%d %H:%M:%S")
-      fh.puts "Commitid: #{last["commitid"]}"
+      if last["commitid"].to_s != ""
+        fh.puts "Commitid: #{last["commitid"]}"
+      end
+      if last["branch"].to_s != ""
+        fh.puts "Branch: #{last["branch"]}"
+      end
       fh.puts ""
       fh.puts "Modified files:"
 
···
     }
 
     @scanner.db.execute("SELECT
-      changesets.date, changesets.author, changesets.commitid, changesets.log,
-      files.file
+      changesets.csorder, changesets.date, changesets.author,
+      changesets.commitid, changesets.log, files.file, revisions.branch
       FROM changesets
       LEFT OUTER JOIN revisions ON revisions.changeset_id = changesets.id
       LEFT OUTER JOIN files ON revisions.file_id = files.id
-      ORDER BY changesets.date, files.file") do |csfile|
-      if csfile["commitid"] == last["commitid"]
+      ORDER BY changesets.csorder, files.file") do |csfile|
+      if csfile["csorder"] == last["csorder"]
         files.push csfile["file"]
       else
         if files.any?
+20 -4
lib/rcsfile.rb
···
 #
 
 class RCSFile
-  attr_accessor :revisions, :first_undead_version
+  attr_accessor :file, :revisions, :symbols, :first_undead_version
 
   RCSEND = "==================OPENBSD_COMMITID_RCS_END=================="
   REVSEP = "------------------OPENBSD_COMMITID_REV_SEP------------------"
 
   def initialize(file)
+    @file = file
     @revisions = {}
+    @symbols = {}
 
     blocks = []
     IO.popen([ "rlog", "-E#{RCSEND}", "-S#{REVSEP}", file ]) do |rlog|
-      blocks = rlog.read.force_encoding("binary").
+      blocks = rlog.read.force_encoding("iso-8859-1").
         split(/^(#{REVSEP}|#{RCSEND})\n?$/).
         reject{|b| b == RCSEND || b == REVSEP }
     end
···
       raise "file #{file} didn't come out of rlog properly"
     end
 
-    blocks.shift
+    insymbols = false
+    blocks.shift.split("\n").each do |l|
+      if l.match(/^symbolic names:/)
+        insymbols = true
+      elsif insymbols && (m = l.match(/^\t(.+): ([\d\.]+)$/))
+        @symbols[m[1].encode("UTF-8")] = m[2].encode("UTF-8")
+      else
+        insymbols = false
+      end
+    end
+
     blocks.each do |block|
-      rev = RCSRevision.new(block)
+      rev = RCSRevision.new(self, block)
       if @revisions[rev.version]
         raise "duplicate revision #{rev.version} in #{file}"
       end
···
       # this has nothing to do with Gem, but it has a version comparator
       sort{|a,b| Gem::Version.new(a.version) <=> Gem::Version.new(b.version) }.
       select{|r| r.state != "dead" }.first.version
+  end
+
+  def to_s
+    "RCSFile: #{@file}"
   end
 end
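
The encoding change in this file can be illustrated with a small standalone sketch. The sample `rlog` header and log text below are made up for the example, and the sketch transcodes the whole string before matching, whereas the diff matches first and encodes each captured piece.

```ruby
# Bytes as they might arrive from rlog; "\xF6" is "ö" in ISO-8859-1.
raw_log = "fix from Bj\xF6rn".b

# Label the bytes with their real encoding, then transcode to UTF-8.
log = raw_log.dup.force_encoding("iso-8859-1").encode("UTF-8")

# A UTF-8 string such as a later query would use only matches after conversion.
matches_before = (raw_log == "fix from Bj\u00F6rn")
matches_after  = (log == "fix from Bj\u00F6rn")

# Parsing the "symbolic names:" part of a (made-up) rlog header.
header = "head: 1.4\nsymbolic names:\n\tOPENBSD_5_8: 1.4.0.2\n" \
         "\tOPENBSD_5_8_BASE: 1.4\nkeyword substitution: kv\n".b
text = header.force_encoding("iso-8859-1").encode("UTF-8")

symbols = {}
insymbols = false
text.split("\n").each do |l|
  if l.match(/^symbolic names:/)
    insymbols = true
  elsif insymbols && (m = l.match(/^\t(.+): ([\d\.]+)$/))
    symbols[m[1]] = m[2]
  else
    insymbols = false   # any other line ends the symbol list
  end
end

puts matches_before, matches_after, symbols.inspect
```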
+97 -5
lib/rcsrevision.rb
···
 require "date"
 
 class RCSRevision
-  attr_accessor :version, :date, :author, :state, :lines, :commitid, :log
+  attr_accessor :rcsfile, :version, :date, :author, :state, :lines, :commitid,
+    :log, :branch, :vendor_branches
+
+  def self.previous_of(ver)
+    nums = ver.split(".").map{|z| z.to_i }
+
+    if nums.last == 1
+      # 1.3.2.1 -> 1.3
+      2.times { nums.pop }
+    else
+      # 1.3.2.2 -> 1.3.2.1
+      nums[nums.count - 1] -= 1
+    end
+
+    outnum = nums.join(".")
+    if outnum == ""
+      return "0"
+    else
+      return outnum
+    end
+  end
+
+  # 1.1.0.2 -> 1.1.2.1
+  def self.first_branch_version_of(ver)
+    nums = ver.split(".").map{|z| z.to_i }
+
+    if nums[nums.length - 2] != 0
+      return ver
+    end
+
+    last = nums.pop
+    nums.pop
+    nums.push last
+    nums.push 1
+
+    return nums.join(".")
+  end
+
+  def self.is_vendor_branch?(ver)
+    !!ver.match(/^1\.1\.1\..*/)
+  end
+
+  def self.is_trunk?(ver)
+    ver.split(".").count == 2
+  end
 
   # str: "revision 1.7\ndate: 1996/12/14 12:17:33; author: mickey; state: Exp; lines: +3 -3;\n-Wall'ing."
-  def initialize(str)
+  def initialize(rcsfile, str)
+    @rcsfile = rcsfile
     @version = nil
     @date = 0
     @author = nil
···
     @lines = nil
     @commitid = nil
     @log = nil
+    @branch = nil
+    @vendor_branches = []
 
     lines = str.gsub(/^\s*/, "").split("\n")
     # -> [
···
       lines.delete_at(2)
     end
 
-    @version = lines.first.scan(/^revision ([\d\.]+)($|\tlocked by)/).first.first
+    @version = lines.first.scan(/^revision ([\d\.]+)($|\tlocked by)/).first.
+      first.encode("UTF-8")
     # -> "1.7"
 
     # date/author/state/lines/commitid line
     lines[1].split(/;[ \t]*/).each do |piece|
       kv = piece.split(": ")
-      self.send(kv[0] + "=", kv[1])
+      self.send(kv[0] + "=", kv[1].encode("UTF-8"))
     end
     # -> @date = "1996/12/14 12:17:33", @author = "mickey", ...
 
···
     end
     # -> @date = 850565853
 
-    @log = lines[2, lines.count].join("\n")
+    @log = lines[2, lines.count].join("\n").encode("UTF-8",
+      :invalid => :replace, :undef => :replace, :replace => "?")
+
+    if @version.match(/^\d+\.\d+$/)
+      # no branch
+    elsif @version.match(/^1\.1\.1\./) ||
+    (@version == "1.1.2.1" && @branch == nil)
+      # vendor
+      @rcsfile.symbols.each do |k,v|
+        if v == "1.1.1"
+          @vendor_branches.push k
+        end
+      end
+    elsif m = @version.match(/^(\d+)\.(\d+)\.(\d+)\.\d+$/)
+      # 1.2.2.3 -> 1.2.0.2
+      sym = [ m[1], m[2], "0", m[3] ].join(".")
+      @rcsfile.symbols.each do |s,v|
+        if v == sym
+          if @branch
+            raise "version #{@version} matched two symbols (#{@branch}, #{s})"
+          end
+
+          @branch = s
+        end
+      end
+
+      if !@branch && @rcsfile.symbols.values.include?(@version)
+        # if there's an exact match, this was probably just an import done with
+        # a vendor branch id (import -b)
+      elsif !@branch
+        # branch was deleted, but we don't want this appearing on HEAD, so call
+        # it something
+        @branch = "_branchless_#{@version.gsub(".", "_")}"
+      end
+
+      if @branch && @rcsfile.symbols[@branch] &&
+      @rcsfile.symbols[@branch].match(/^1\.1\.0\.\d+$/)
+        # this is also a vendor branch
+        if !@vendor_branches.include?(@branch)
+          @vendor_branches.push @branch
+        end
+      end
+    else
+      raise "TODO: handle version #{@version}"
+    end
   end
 end
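
As a worked example of the version arithmetic added in this file, the sketch below re-implements it as standalone functions. `previous_of` and `first_branch_version_of` follow the class methods in the diff; `branch_symbol_of` is a hypothetical helper name for the `1.2.2.3 -> 1.2.0.2` mapping that the diff performs inline in `initialize`.

```ruby
# Revision that a given revision was derived from ("0" when there is none).
def previous_of(ver)
  nums = ver.split(".").map(&:to_i)
  if nums.last == 1
    nums.pop(2)      # first revision on a branch: drop back to the branch point
  else
    nums[-1] -= 1    # otherwise just the preceding revision on the same line
  end
  out = nums.join(".")
  out == "" ? "0" : out
end

# Magic branch number (1.1.0.2) -> first revision on that branch (1.1.2.1).
def first_branch_version_of(ver)
  nums = ver.split(".").map(&:to_i)
  return ver if nums[-2] != 0

  last = nums.pop
  nums.pop
  (nums + [last, 1]).join(".")
end

# Hypothetical helper: branch revision (1.2.2.3) -> magic branch number (1.2.0.2).
def branch_symbol_of(ver)
  m = ver.match(/^(\d+)\.(\d+)\.(\d+)\.\d+$/)
  m ? [m[1], m[2], "0", m[3]].join(".") : nil
end

puts previous_of("1.7")                 # 1.6
puts previous_of("1.3.2.1")             # 1.3
puts previous_of("1.1")                 # 0
puts first_branch_version_of("1.1.0.2") # 1.1.2.1
puts branch_symbol_of("1.2.2.3")        # 1.2.0.2
```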
+325 -74
lib/scanner.rb
···
 #
 
 class Scanner
-  attr_accessor :outputter, :db
+  attr_accessor :outputter, :db, :commitid_hacks, :prev_revision_hacks
 
   # how long commits by the same author with the same commit message can be
   # from each other and still be grouped in the same changeset
···
     @db = Db.new dbf
     @root = (root + "/").gsub(/\/\//, "/")
     @outputter = Outputter.new(self)
+    @prev_revision_hacks = {}
+    @commitid_hacks = {}
   end
 
   def recursively_scan(dir = nil)
···
   end
 
   def scan(f)
-    stat = File.stat(f)
+    cksum = ""
+    IO.popen([ "cksum", "-q", f ]) do |c|
+      parts = c.read.force_encoding("iso-8859-1").split(" ")
+      if parts.length != 2
+        raise "invalid output from cksum: #{parts.inspect}"
+      end
+
+      cksum = parts[0].encode("utf-8")
+    end
+
     canfile = f[@root.length, f.length - @root.length].gsub(/(^|\/)Attic\//,
       "/").gsub(/^\/*/, "")
 
-    fid = @db.execute("SELECT id, first_undead_version, size FROM files " +
+    fid = @db.execute("SELECT id, first_undead_version, cksum FROM files " +
       "WHERE file = ?", [ canfile ]).first
-    if fid && fid["size"].to_i > 0 && fid["size"].to_i == stat.size
+    if fid && fid["cksum"].to_s == cksum
       return
     end
 
···
 
     rcs = RCSFile.new(f)
 
+    @db.execute("BEGIN")
+
     if fid
       if fid["first_undead_version"] != rcs.first_undead_version
         @db.execute("UPDATE files SET first_undead_version = ? WHERE id = ?",
···
     end
     raise if !fid
 
+    if @commitid_hacks && @commitid_hacks[canfile]
+      @commitid_hacks[canfile].each do |v,cid|
+        if rcs.revisions[v].commitid &&
+        rcs.revisions[v].commitid != cid
+          raise "hack for #{canfile}:#{v} commitid of #{cid.inspect} would " +
+            "overwrite #{rcs.revisions[v].commitid}"
+        end
+
+        puts " faking commitid for revision #{v} -> #{cid}"
+        rcs.revisions[v].commitid = cid
+      end
+    end
+
     rcs.revisions.each do |r,rev|
       rid = @db.execute("SELECT id, commitid FROM revisions WHERE " +
         "file_id = ? AND version = ?", [ fid["id"], r ]).first
···
             "AND version = ?", [ rev.commitid, fid["id"], rev.version ])
         end
       else
-        puts " inserted #{r}, authored #{rev.date} by #{rev.author}" +
+        # files added on branches/imports have unhelpful commit messages with
+        # the helpful ones on the branch versions, so copy them over while
+        # we're here
+        if rev.log.to_s == "Initial revision"
+          if r == "1.1" && rcs.revisions["1.1.1.1"]
+            rev.log = rcs.revisions["1.1.1.1"].log
+            puts " revision #{r} using log from 1.1.1.1"
+          else
+            puts " revision #{r} keeping log #{rev.log.inspect}, no 1.1.1.1"
+          end
+        elsif m = rev.log.to_s.
+        match(/\Afile .+? was initially added on branch ([^\.]+)\.\z/)
+          brver = nil
+          if br = rcs.symbols[m[1]]
+            brver = RCSRevision.first_branch_version_of(br)
+            if !rcs.revisions[brver]
+              if rcs.revisions[brver + ".1"]
+                brver += ".1"
+              else
+                puts " revision #{r} keeping log #{rev.log.inspect}, no #{brver}"
+                brver = nil
+              end
+            end
+          end
+
+          if brver
+            rev.log = rcs.revisions[brver].log
+            puts " revision #{r} using log from #{brver}"
+
+            # but consider this trunk revision on the branch the file was added
+            # on, just so we keep it in the same changeset
+            rev.branch = rcs.revisions[brver].branch
+          else
+            puts " revision #{r} keeping log #{rev.log.inspect}, no #{m[1]}"
+          end
+        end
+
+        puts " inserted #{r}" +
+          (rev.branch ? " (branch #{rev.branch})" : "") +
+          ", authored #{rev.date} by #{rev.author}" +
           (rev.commitid ? ", commitid #{rev.commitid}" : "")
 
         @db.execute("INSERT INTO revisions (file_id, date, version, author, " +
-          "commitid, state, log) VALUES (?, ?, ?, ?, ?, ?, ?)",
+          "commitid, state, log, branch) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
           [ fid["id"], rev.date, rev.version, rev.author, rev.commitid,
-          rev.state, rev.log ])
+          rev.state, rev.log, rev.branch ])
+        rid = { "id" => @db.last_insert_row_id }
       end
-    end
 
-    @db.execute("UPDATE files SET size = ? WHERE id = ?",
-      [ stat.size, fid["id"] ])
-  end
+      vbs = @db.execute("SELECT branch FROM vendor_branches WHERE " +
+        "revision_id = ?", [ rid["id"] ]).map{|r| r["branch"] }.flatten
 
-  def stray_commitids_to_changesets
-    stray_commitids = @db.execute("SELECT DISTINCT author, commitid FROM " +
-      "revisions WHERE commitid IS NOT NULL AND changeset_id IS NULL")
-    stray_commitids.each do |row|
-      csid = @db.execute("SELECT id FROM changesets WHERE commitid = ?",
-        [ row["commitid"] ]).first
-      if !csid
-        @db.execute("INSERT INTO changesets (author, commitid) VALUES (?, ?)",
-          [ row["author"], row["commitid"] ])
-        csid = @db.execute("SELECT id FROM changesets WHERE commitid = ?",
-          [ row["commitid"] ]).first
+      rev.vendor_branches.each do |vb|
+        if !vbs.include?(vb)
+          puts " inserting vendor branch #{vb}"
+          @db.execute("INSERT INTO vendor_branches (revision_id, branch) " +
+            "VALUES (?, ?)", [ rid["id"], vb ])
+        end
       end
-      raise if !csid
+
+      vbs.each do |vb|
+        if !rev.vendor_branches.include?(vb)
+          @db.execute("DELETE FROM vendor_branches WHERE revision_id = ? " +
+            "AND branch = ?", [ rid["id"], vb ])
+        end
+      end
+    end
 
-      puts "commitid #{row["commitid"]} -> changeset #{csid["id"]}"
+    @db.execute("UPDATE files SET cksum = ? WHERE id = ?",
+      [ cksum, fid["id"] ])
 
-      @db.execute("UPDATE revisions SET changeset_id = ? WHERE commitid = ?",
-        [ csid["id"], row["commitid"] ])
-    end
+    @db.execute("COMMIT")
   end
 
   def group_into_changesets
+    puts "grouping into changesets"
+
     new_sets = []
     last_row = {}
     cur_set = []
 
-    # TODO: don't conditionalize with null changeset_ids, to allow this to run
-    # incrementally and match new commits to old changesets
+    @db.execute("BEGIN")
+
+    # commits by the same author with the same log message within a small
+    # timeframe are grouped together
     @db.execute("SELECT * FROM revisions WHERE changeset_id IS NULL ORDER " +
-    "BY author ASC, date ASC") do |row|
-      # commits by the same author with the same log message (unless they're
-      # initial imports - 1.1.1.1) within a small timeframe are grouped
-      # together
-      if last_row.any? && row["author"] == last_row["author"] &&
-      (row["log"] == last_row["log"] || row["log"] == "Initial revision" ||
-      last_row["log"] == "Initial revision") &&
+    "BY author ASC, branch ASC, commitid ASC, date ASC") do |row|
+      if last_row.any? &&
+      row["author"] == last_row["author"] &&
+      row["branch"] == last_row["branch"] &&
+      row["log"] == last_row["log"] &&
+      row["commitid"] == last_row["commitid"] &&
       row["date"].to_i - last_row["date"].to_i <= MAX_GROUP_WINDOW
         cur_set.push row["id"].to_i
       elsif !last_row.any?
···
     end
 
     new_sets.each do |s|
-      puts "new set with revision ids #{s.inspect}"
+      puts " new set with revision ids #{s.inspect}"
       @db.execute("INSERT INTO changesets (id) VALUES (NULL)")
       id = @db.execute("SELECT last_insert_rowid() AS id").first["id"]
       raise if !id
···
     if @db.execute("SELECT * FROM revisions WHERE changeset_id IS NULL").any?
       raise "still have revisions with empty changesets"
     end
+
+    @db.execute("COMMIT")
+  end
+
+  def stray_commitids_to_changesets
+    @db.execute("BEGIN")
+
+    puts "finding stray commitids"
+
+    stray_commitids = @db.execute("SELECT DISTINCT author, commitid FROM " +
+      "revisions WHERE commitid IS NOT NULL AND changeset_id IS NULL")
+    stray_commitids.each do |row|
+      csid = @db.execute("SELECT id FROM changesets WHERE commitid = ?",
+        [ row["commitid"] ]).first
+      if !csid
+        @db.execute("INSERT INTO changesets (author, commitid) VALUES (?, ?)",
+          [ row["author"], row["commitid"] ])
+        csid = @db.execute("SELECT id FROM changesets WHERE commitid = ?",
+          [ row["commitid"] ]).first
+      end
+      raise if !csid
+
+      puts " commitid #{row["commitid"]} -> changeset #{csid["id"]}"
+
+      @db.execute("UPDATE revisions SET changeset_id = ? WHERE commitid = ?",
+        [ csid["id"], row["commitid"] ])
+    end
+
+    @db.execute("COMMIT")
   end
 
   def fill_in_changeset_data
+    puts "assigning dates to changesets"
+
+    @db.execute("BEGIN")
+
     cses = {}
     @db.execute("SELECT id, commitid FROM changesets WHERE date IS NULL") do |c|
       cses[c["id"]] = c["commitid"]
     end
 
+    # create canonical dates for each changeset, so we can pull them back out
+    # in order
     cses.each do |csid,comid|
       date = nil
       commitid = comid
       log = nil
       author = nil
+      branch = nil
 
       @db.execute("SELECT * FROM revisions WHERE changeset_id = ? ORDER BY " +
       "date ASC", [ csid ]) do |rev|
···
           date = rev["date"]
         end
 
-        if rev["log"] != "Initial revision"
+        if log && rev["log"] != log
+          raise "logs different between revs of #{csid}"
+        else
           log = rev["log"]
         end
 
···
         else
           author = rev["author"]
         end
-      end
 
-      if commitid.to_s == ""
-        commitid = ""
-        while commitid.length < 16
-          c = rand(75) + 48
-          if ((c >= 48 && c <= 57) || (c >= 65 && c <= 90) ||
-          (c >= 97 && c <= 122))
-            commitid << c.chr
-          end
+        if branch && rev["branch"] != branch
+          raise "branches different between revs of #{csid}"
+        else
+          branch = rev["branch"]
         end
       end
 
···
         raise "no date for changeset #{csid}"
       end
 
-      puts "changeset #{csid} -> commitid #{commitid}"
+      @db.execute("UPDATE changesets SET date = ?, log = ?, author = ?, " +
+        "branch = ? WHERE id = ?", [ date, log, author, branch, csid ])
+    end
+
+    @db.execute("COMMIT")
 
-      @db.execute("UPDATE changesets SET date = ?, commitid = ?, log = ?, " +
-        "author = ? WHERE id = ?", [ date, commitid, log, author, csid ])
+    puts "assigning changeset order"
+
+    cses = []
+    @db.execute("SELECT id FROM changesets WHERE csorder IS NULL ORDER BY " +
+    "date, author") do |c|
+      cses.push c["id"]
     end
-  end
 
-  def repo_surgery(tmp_dir, cvs_root, tree)
-    puts "checking out #{tree} from #{cvs_root} to #{tmp_dir}"
+    highestcs = @db.execute("SELECT MAX(csorder) AS lastcs FROM changesets " +
+      "WHERE csorder IS NOT NULL").first["lastcs"].to_i
 
-    Dir.chdir(tmp_dir)
+    @db.execute("BEGIN")
+    cses.each do |cs|
+      highestcs += 1
+      @db.execute("UPDATE changesets SET csorder = ?, commitid = NULL WHERE " +
+        "id = ?", [ highestcs, cs ])
+    end
+    @db.execute("COMMIT")
+  end
 
+  def stage_tmp_cvs(tmp_dir, cvs_root, tree)
     # for a deleted file to be operated by with cvs admin, it must be
     # present in the CVS/Entries files, so check out all files at rev 1.1 so we
     # know they will not be deleted. otherwise cvs admin will fail silently
-    system("cvs", "-Q", "-d", cvs_root, "co", "-r1.1", tree) ||
-      raise("cvs checkout returned non-zero")
+    if File.exists?("#{tmp_dir}/#{tree}/CVS/Entries")
+      puts "updating #{tmp_dir}#{tree} from #{cvs_root}"
+      Dir.chdir("#{tmp_dir}/#{tree}")
+      system("cvs", "-Q", "-d", cvs_root, "update", "-PAd", "-r1.1") ||
+        raise("cvs update returned non-zero")
+    else
+      puts "checking out #{cvs_root}#{tree} to #{tmp_dir}"
+      Dir.chdir(tmp_dir)
+      system("cvs", "-Q", "-d", cvs_root, "co", "-r1.1", tree) ||
+        raise("cvs checkout returned non-zero")
+    end
+
+    Dir.chdir(tmp_dir)
 
     # but if any files were added on a branch or somehow have a weird history,
     # their 1.1 revision will be dead so check out any non-dead revision of
···
     @db.execute("SELECT
       file, first_undead_version
       FROM files
-      WHERE first_undead_version NOT LIKE '1.1'") do |rev|
+      WHERE first_undead_version NOT LIKE '1.1' AND
+      id IN (SELECT file_id FROM revisions WHERE commitid IS NULL)") do |rev|
       dead11s[rev["file"]] = rev["first_undead_version"]
     end
 
···
         "#{tree}/#{confile}") ||
         raise("cvs co -r#{rev} #{confile} failed")
     end
+
+    Dir.chdir("#{tmp_dir}/#{tree}")
+  end
+
+  def recalculate_commitids(tmp_dir, cvs_root, tree, genesis)
     Dir.chdir(tmp_dir + "/#{tree}")
 
-    csid = nil
-    @db.execute("SELECT
-      files.file, changesets.commitid, changesets.author, changesets.date,
-      revisions.version
+    puts "recalculating new commitids from genesis #{genesis}"
+
+    gfn = "#{cvs_root}/CVSROOT/commitid_genesis"
+    if File.exists?(gfn) && File.read(gfn).strip != genesis
+      raise "genesis in #{gfn} is not #{genesis.inspect}"
+    else
+      File.write("#{cvs_root}/CVSROOT/commitid_genesis", genesis + "\n")
+    end
+
+    changesets = []
+    @db.execute("SELECT id, csorder, commitid FROM changesets
+    ORDER BY csorder ASC") do |cs|
+      changesets.push cs
+    end
+
+    puts " writing commitids-#{tree} (#{changesets.length} " +
+      "changeset#{changesets.length == 1 ? "" : "s"})"
+
+    commitids = File.open("#{cvs_root}/CVSROOT/commitids-#{tree}", "w+")
+
+    # every changeset needs to know the revisions of its files from the
+    # previous change, taking into account branches. we can easily calculate
+    # this, but we should make sure that calculated revision actually exists
+    files = {}
+    @db.execute("SELECT id, file FROM files") do |row|
+      files[row["id"]] = row["file"]
+    end
+    files.each do |id,file|
+      vers = []
+
+      @db.execute("SELECT version FROM revisions WHERE file_id = ?",
+      [ id ]) do |rev|
+        vers.push rev["version"]
+      end
+
+      vers.each do |rev|
+        if prev_revision_hacks[file] && (hpre = prev_revision_hacks[file][rev])
+          puts " faking previous revision of #{file} #{rev} -> #{hpre}"
+          pre = hpre
+        else
+          pre = RCSRevision.previous_of(rev)
+        end
+
+        if pre != "0" && !vers.include?(pre)
+          raise "#{file}: revision #{rev} previous #{pre} not found"
+        end
+      end
+    end
+    files = {}
+
+    # for each changeset with no commitid, store it in the commitids-* file
+    # with a temporary commitid of just its changeset number, do a 'cvs show'
+    # on it to calculate the actual commitid, then overwrite that hash in the
+    # commitids file, and store our new one
+    changesets.each do |cs|
+      cline = []
+      commitid = ""
+      if cs["commitid"].to_s != ""
+        commitid = cs["commitid"]
+      else
+        commitid = sprintf("01-%064d-%07d", cs["csorder"], cs["csorder"])
+      end
+
+      # order by length(revisions.version) to put 1.1 first, then 1.1.1.1, to
+      # match 'cvs import'
+      @db.execute("SELECT
+      files.file, revisions.version, revisions.branch
+      FROM revisions
+      LEFT OUTER JOIN files ON files.id = revisions.file_id
+      WHERE revisions.changeset_id = ?
+      ORDER BY files.file ASC, LENGTH(revisions.version) ASC,
+      revisions.version ASC", [ cs["id"] ]) do |rev|
+        if cline.length == 0
+          cline.push commitid
+        end
+
+        cline.push [ RCSRevision.previous_of(rev["version"]), rev["version"],
+          rev["branch"].to_s, rev["file"].gsub(/,v$/, "") ].join(":")
+      end
+
+      pos = commitids.pos
+      commitids.puts cline.join("\t")
+
+      if cs["commitid"].to_s == ""
+        commitids.fsync
+
+        newcsum = `cvs show #{commitid} | tail -n +2 | cksum -a sha512/256`.strip
+        if $?.exitstatus != 0
+          raise "failed running cvs show #{commitid}"
+        end
+
+        # null
+        if newcsum == "c672b8d1ef56ed28ab87c3622c5114069bdd3ad7b8f9737498d0c01ecef0967a"
+          raise "failed getting new commitid from #{commitid}"
+        end
+
+        newid = sprintf("01-%64s-%07d", newcsum, cs["csorder"])
+
+        @db.execute("UPDATE changesets SET commitid = ? WHERE id = ?",
+          [ newid, cs["id"] ])
+
+        puts " changeset #{cs["csorder"]} -> #{newid}"
+
+        # go back, rewrite just our commitid, then get ready for the next line
+        commitids.seek(pos)
+        commitids.write(newid)
+        commitids.seek(0, IO::SEEK_END)
+        commitids.fsync
+      else
+        puts " changeset #{cs["csorder"]} == #{cs["commitid"]}"
+      end
+    end
+
+    commitids.close
+  end
+
+  def repo_surgery(tmp_dir, cvs_root, tree)
+    puts "updating commitids in rcs files at #{cvs_root} via #{tmp_dir}"
+
+    Dir.chdir("#{tmp_dir}/#{tree}")
+
+    # for each revision we have in the db (picked up from a scan) that has a
+    # different commitid from what we assigned to its changeset, update the
+    # commitid in the rcs file in the repo, and then our revisions records
+    @db.execute("
+      SELECT
+      files.file, changesets.commitid, revisions.version, revisions.id AS revid,
+      revisions.commitid AS revcommitid
       FROM revisions
-      LEFT OUTER JOIN files ON files.id = file_id
+      LEFT OUTER JOIN files ON files.id = revisions.file_id
       LEFT OUTER JOIN changesets ON revisions.changeset_id = changesets.id
-      WHERE revisions.commitid IS NULL
+      WHERE changesets.commitid != IFNULL(revisions.commitid, '')
       ORDER BY changesets.date ASC, files.file ASC") do |rev|
-      if csid == nil || rev["commitid"] != csid
-        puts " commit #{rev["commitid"]} at #{Time.at(rev["date"])} by " +
-          rev["author"]
-        csid = rev["commitid"]
-      end
-
-      puts " #{rev["file"]} #{rev["version"]}"
+      puts [ "", rev["file"], rev["version"], rev["revcommitid"], "->",
+        rev["commitid"] ].join(" ")
 
       output = nil
       IO.popen(ca = [ "cvs", "admin", "-C",
···
       end
     end
 
-    puts "cleaning up #{tmp_dir}/#{tree}"
-
-    system("rm", "-rf", tmp_dir + "/#{tree}") ||
-      raise("rm of #{tmp_dir}/#{tree} failed")
+    # re-read commitids and update file checksums since we probably just
+    # changed many of them, which will then update commitids in revisions table
+    sc.recursively_scan
   end
 end
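
The placeholder-then-overwrite step in `recalculate_commitids` relies on the temporary and final commitids having exactly the same width. A minimal sketch of that idea, under stated assumptions: `Digest::SHA256` stands in for the `cvs show ... | cksum -a sha512/256` pipeline, the file is a temporary file rather than `CVSROOT/commitids-{tree}`, and the two `entries` are invented sample lines.

```ruby
require "digest"
require "tempfile"

# Invented sample entries in previous:version:branch:file form.
entries = [
  "0:1.1::bin/cat/cat.c",
  "1.1:1.2::bin/ls/ls.c",
]

f = Tempfile.new("commitids-example")
entries.each_with_index do |entry, i|
  order = i + 1
  placeholder = format("01-%064d-%07d", order, order)

  pos = f.pos                       # remember where this line starts
  f.puts [placeholder, entry].join("\t")
  f.flush

  # Stand-in for hashing the output of `cvs show <placeholder>`.
  digest = Digest::SHA256.hexdigest(entry)
  final = format("01-%64s-%07d", digest, order)

  # Both ids are 75 characters, so the overwrite cannot touch the rest
  # of the line.
  f.seek(pos)
  f.write(final)
  f.seek(0, IO::SEEK_END)           # ready for the next line
end
f.flush

lines = File.read(f.path).split("\n")
puts lines
```

Because each line is rewritten before the next one is appended, any later changeset that is hashed sees only final commitids for everything before it.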
+54 -22
openbsd-commitid.rb
···
 # THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #
 
-DIR = File.dirname(__FILE__) + "/lib/"
+PWD = File.dirname(__FILE__)
 
-require DIR + "db"
-require DIR + "scanner"
-require DIR + "rcsfile"
-require DIR + "rcsrevision"
-require DIR + "outputter"
+require PWD + "/lib/db"
+require PWD + "/lib/scanner"
+require PWD + "/lib/rcsfile"
+require PWD + "/lib/rcsrevision"
+require PWD + "/lib/outputter"
 
 CVSROOT = "/var/cvs-commitid/"
 CVSTMP = "/var/cvs-tmp/"
 CVSTREES = [ "src", "ports", "www", "xenocara" ]
 
+GENESIS = "01-f96d46480b33dcec5924884fef54166e169fc08d19f1d1812f5cd2d1f704219a-0000000"
+
 CVSTREES.each do |tree|
-  if Dir.exists?("#{CVSTMP}/#{tree}/CVS")
-    raise "clean out #{CVSTMP} first"
+  if !Dir.exists?("#{CVSROOT}/#{tree}")
+    next
   end
-end
 
-PWD = Dir.pwd
+  sc = Scanner.new(PWD + "/db/openbsd-#{tree}.db", "#{CVSROOT}/#{tree}/")
 
-CVSTREES.each do |tree|
-  sc = Scanner.new(PWD + "/db/openbsd-#{tree}.db", "#{CVSROOT}/#{tree}/")
+  if tree == "src"
+    # these revisions didn't get proper commitids with the others in the
+    # changeset, so fudge them
+    sc.commitid_hacks = {
+      "sys/dev/pv/xenvar.h,v" => {
+        "1.1" => "Ij2SOB19ATTH0yEx",
+        "1.2" => "pq3FAYuwXteAsF4d",
+        "1.3" => "C8vFI0RNH9XPJUKs",
+      },
+      "usr.bin/mg/theo.c,v" => {
+        "1.144" => "gSveQVkxMLs6vRqK",
+        "1.145" => "GbEBL4CfPvDkB8hj",
+        "1.146" => "8rkHsVfUx5xgPXRB",
+      },
+    }
+
+    # some rcs files have manually edited history that we need to work around
+    sc.prev_revision_hacks = {
+      # initial history gone?
+      "sbin/isakmpd/pkcs.c,v" => { "1.4" => "0" },
+      # 1.6 gone
+      "sys/arch/sun3/sun3/machdep.c,v" => { "1.7" => "1.5" },
+    }
+  end
+
+  # walk the directory of RCS files, create a "files" record for each one,
+  # then run `rlog` on it and create a "revisions" record for each
   sc.recursively_scan
+
+  # group revisions into changesets by date/author/message, or for newer
+  # commits, their stored commitid
   sc.group_into_changesets
+
+  # make sure every revision is accounted for
   sc.stray_commitids_to_changesets
-  sc.fill_in_changeset_data
 
-  sc.repo_surgery(CVSTMP, CVSROOT, tree)
+  # assign a canonical date/message/order to each changeset
+  sc.fill_in_changeset_data
 
-  sc.outputter.changelog("cvs.openbsd.org",
-    f = File.open("out/Changelog-#{tree}", "w+"))
-  f.close
+  # check out the cvs tree in CVSTMP/tree and place each dead-1.1 file at its
+  # initial non-dead revision found during `rlog`
+  sc.stage_tmp_cvs(CVSTMP, CVSROOT, tree)
 
-  sc.outputter.history(f = File.open("out/history-#{tree}", "w+"))
-  f.close
+  # calculate a hash for each commit by running 'cvs show' on it, and store it
+  # in the commitids-{tree} file
+  sc.recalculate_commitids(CVSTMP, CVSROOT, tree, GENESIS)
 
-  sc.outputter.dup_script(f = File.open("out/add_commitids_to_#{tree}.sh",
-    "w+"), tree)
-  f.close
+  # and finally, update every revision of every file and write its calculated
+  # commitid, possibly replacing the random one already there
+  sc.repo_surgery(CVSTMP, CVSROOT, tree)
 end
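
The grouping step described in the comments above (same author, branch, log message and commitid, close together in time) can be sketched without a database. This is a simplified illustration: revisions are plain hashes rather than `revisions` rows, the `window` of 300 seconds is an assumed value since `MAX_GROUP_WINDOW` is not shown in this diff, and the authors and ids are invented.

```ruby
# Group revisions into changesets; returns arrays of revision ids.
def group_revisions(revs, window: 300)
  sorted = revs.sort_by do |r|
    [r[:author], r[:branch].to_s, r[:commitid].to_s, r[:date]]
  end

  sets = []
  last = nil
  sorted.each do |r|
    same = last &&
      r[:author] == last[:author] &&
      r[:branch] == last[:branch] &&
      r[:log] == last[:log] &&
      r[:commitid] == last[:commitid] &&
      r[:date] - last[:date] <= window

    if same
      sets.last.push r[:id]     # continue the current changeset
    else
      sets.push [r[:id]]        # start a new one
    end
    last = r
  end
  sets
end

revs = [
  { id: 1, author: "alice", branch: nil, log: "sync", commitid: nil, date: 1000 },
  { id: 2, author: "alice", branch: nil, log: "sync", commitid: nil, date: 1010 },
  { id: 3, author: "alice", branch: nil, log: "sync", commitid: nil, date: 5000 },
  { id: 4, author: "bob",   branch: nil, log: "sync", commitid: nil, date: 1005 },
  { id: 5, author: "alice", branch: "OPENBSD_5_8", log: "sync", commitid: nil, date: 1012 },
]

groups = group_revisions(revs)
puts groups.inspect
```

Each revision is compared with the one immediately before it in sort order, so a long run of commits stays in one changeset as long as no gap between neighbours exceeds the window.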