new stuff to work with cvs provenance and branches · jcs.org/openbsd-commitid@964ddcd

+7 -44

README.md

··· 1 1 ###OpenBSD commitid generator 2 2 3 - A work in progress to assign `commitid` identifiers to all files in OpenBSD's 4 - CVS trees for commits before `commitid` functionality was enabled. 3 + A work in progress to assign CVS provenance-style `commitid` identifiers to 4 + all revisions of all files in OpenBSD's CVS trees. 5 5 6 6 ####Usage 7 7 ··· 20 20 21 21 `$ ruby openbsd-commitid.rb` 22 22 23 - **NOTE**: This script relies on recently added changes to OpenBSD's `rlog` 24 - and `cvs` tools: 23 + **NOTE**: This script relies on recently added changes to OpenBSD's `rlog` and 24 + `cvs` tools: 25 25 26 26 - `cvs admin -C` to set a revision's `commitid` 27 - - `rlog -E` and `rlog -S` to control the revision separators in `rlog` 28 - output, since the default line of dashes appears in old commit messages 29 - 30 - ####Details 31 - 32 - This script does the following steps for each of the `src`, `ports`, `www`, 33 - and `xenocara` trees. 27 + - `rlog -E` and `rlog -S` to control the revision separators in `rlog` output, 28 + since the default line of dashes appears in old commit messages 34 29 35 - 1. Recurse a directory of RCS ,v files (`/var/cvs-commitid/`), create a 36 - "files" record for each one. 37 - 38 - 2. Run `rlog` on each RCS file, parse the output and create a "revisions" 39 - record with each revision's author, date, version, commitid (if present), and 40 - log message. Update the "files" record to note the first non-`dead` version 41 - in the file. 42 - 43 - 3. Fetch all revisions not already matched to a changeset, ordered by author 44 - then date, and bundle them into changesets. Create a new "changesets" record 45 - for each, then update each of those "revisions" records with the new changeset 46 - id. By sorting all commits by author and date, it's possible to accurately 47 - find all files touched by an author in the same commit window. 48 - 49 - 4. 
For each newly created "changesets" record, update them with a definitive 50 - timestamp, log message, author, and commitid (creating a new one if needed) 51 - based on all of the "revisions" with that changeset id. 52 - 53 - 5. Do a `cvs checkout` from `/var/cvs-commitid/` to a temporary directory 54 - created in `/var/cvs-tmp/`, checking out revision 1.1 of each file. For each 55 - file that has a first-non-`dead` version number that is not 1.1, do another 56 - `cvs checkout` of that version of the file so that every file in the tree is 57 - now present in the working checked-out tree. This is required to operate on 58 - deleted files (moved to the Attic) and files where version 1.1 does not exist, 59 - such as those created on branches. 60 - 61 - 6. For each "revisions" record of each "files" record that doesn't have a 62 - recorded `commitid` (meaning this script generated one while bundling it into a 63 - "changesets" record), run `cvs admin -C` to assign the `commitid` to that 64 - revision in the RCS `,v` file in `/var/cvs-commitid/`. 65 - 66 - 7. `rm -rf` the temporary checked-out directory in `/var/cvs-tmp/` since the 67 - changes are now all present in the `/var/cvs-commitid/` tree. 30 + For details of how this script works, read `openbsd-commitid.rb`.

+25 -11

lib/db.rb

··· 30 30 class Db 31 31 def initialize(dbf) 32 32 @db = SQLite3::Database.new dbf 33 + @db.results_as_hash = true 33 34 34 35 @db.execute "CREATE TABLE IF NOT EXISTS changesets 35 36 (id INTEGER PRIMARY KEY, date INTEGER, author TEXT, commitid TEXT, 36 - log TEXT)" 37 - @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_commitid ON changesets 37 + log TEXT, branch TEXT, csorder INTEGER)" 38 + @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_cs_commitid ON changesets 38 39 (commitid)" 40 + @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_cs_csorder ON changesets 41 + (csorder)" 42 + @db.execute "CREATE INDEX IF NOT EXISTS cs_branch ON changesets (branch)" 39 43 40 44 @db.execute "CREATE TABLE IF NOT EXISTS files 41 45 (id INTEGER PRIMARY KEY, file TEXT, first_undead_version TEXT, 42 - size INTEGER)" 43 - @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_file ON files 46 + cksum TEXT)" 47 + @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_f_file ON files 44 48 (file)" 45 49 46 50 @db.execute "CREATE TABLE IF NOT EXISTS revisions 47 51 (id INTEGER PRIMARY KEY, file_id INTEGER, changeset_id INTEGER, 48 52 date INTEGER, version TEXT, author TEXT, commitid TEXT, log TEXT, 49 - state TEXT)" 50 - @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_revision ON revisions 53 + state TEXT, branch TEXT)" 54 + @db.execute "CREATE UNIQUE INDEX IF NOT EXISTS u_r_revision ON revisions 51 55 (file_id, version)" 52 - @db.execute "CREATE INDEX IF NOT EXISTS empty_changesets ON revisions 56 + @db.execute "CREATE INDEX IF NOT EXISTS r_empty_changesets ON revisions 53 57 (changeset_id)" 54 - @db.execute "CREATE INDEX IF NOT EXISTS cs_by_commitid ON revisions 58 + @db.execute "CREATE INDEX IF NOT EXISTS r_cs_by_commitid ON revisions 55 59 (commitid, changeset_id)" 56 - @db.execute "CREATE INDEX IF NOT EXISTS all_revs_by_author ON revisions 60 + @db.execute "CREATE INDEX IF NOT EXISTS r_all_revs_by_author ON revisions 57 61 (author, date)" 58 - @db.execute "CREATE INDEX IF NOT EXISTS 
all_revs_by_version_and_state ON 62 + @db.execute "CREATE INDEX IF NOT EXISTS r_all_revs_by_version_and_state ON 59 63 revisions (version, state)" 64 + @db.execute "CREATE INDEX IF NOT EXISTS r_branch ON revisions (branch)" 60 65 61 - @db.results_as_hash = true 66 + @db.execute("CREATE TABLE IF NOT EXISTS vendor_branches 67 + (id INTEGER PRIMARY KEY, revision_id INTEGER, branch TEXT)") 68 + @db.execute("CREATE INDEX IF NOT EXISTS vb_revision ON vendor_branches 69 + (revision_id)") 70 + @db.execute("CREATE INDEX IF NOT EXISTS vb_branch_branch ON vendor_branches 71 + (branch)") 62 72 end 63 73 64 74 def execute(*args) ··· 69 79 else 70 80 @db.execute(*args) 71 81 end 82 + end 83 + 84 + def last_insert_row_id 85 + @db.last_insert_row_id 72 86 end 73 87 end

+10 -5

lib/outputter.rb

··· 39 39 printlog = Proc.new { 40 40 fh.puts "Changes by: #{last["author"]}@#{domain} " << 41 41 Time.at(last["date"].to_i).strftime("%Y/%m/%d %H:%M:%S") 42 - fh.puts "Commitid: #{last["commitid"]}" 42 + if last["commitid"].to_s != "" 43 + fh.puts "Commitid: #{last["commitid"]}" 44 + end 45 + if last["branch"].to_s != "" 46 + fh.puts "Branch: #{last["branch"]}" 47 + end 43 48 fh.puts "" 44 49 fh.puts "Modified files:" 45 50 ··· 95 100 } 96 101 97 102 @scanner.db.execute("SELECT 98 - changesets.date, changesets.author, changesets.commitid, changesets.log, 99 - files.file 103 + changesets.csorder, changesets.date, changesets.author, 104 + changesets.commitid, changesets.log, files.file, revisions.branch 100 105 FROM changesets 101 106 LEFT OUTER JOIN revisions ON revisions.changeset_id = changesets.id 102 107 LEFT OUTER JOIN files ON revisions.file_id = files.id 103 - ORDER BY changesets.date, files.file") do |csfile| 104 - if csfile["commitid"] == last["commitid"] 108 + ORDER BY changesets.csorder, files.file") do |csfile| 109 + if csfile["csorder"] == last["csorder"] 105 110 files.push csfile["file"] 106 111 else 107 112 if files.any?

+20 -4

lib/rcsfile.rb

··· 26 26 # 27 27 28 28 class RCSFile 29 - attr_accessor :revisions, :first_undead_version 29 + attr_accessor :file, :revisions, :symbols, :first_undead_version 30 30 31 31 RCSEND = "==================OPENBSD_COMMITID_RCS_END==================" 32 32 REVSEP = "------------------OPENBSD_COMMITID_REV_SEP------------------" 33 33 34 34 def initialize(file) 35 + @file = file 35 36 @revisions = {} 37 + @symbols = {} 36 38 37 39 blocks = [] 38 40 IO.popen([ "rlog", "-E#{RCSEND}", "-S#{REVSEP}", file ]) do |rlog| 39 - blocks = rlog.read.force_encoding("binary"). 41 + blocks = rlog.read.force_encoding("iso-8859-1"). 40 42 split(/^(#{REVSEP}|#{RCSEND})\n?$/). 41 43 reject{|b| b == RCSEND || b == REVSEP } 42 44 end ··· 45 47 raise "file #{file} didn't come out of rlog properly" 46 48 end 47 49 48 - blocks.shift 50 + insymbols = false 51 + blocks.shift.split("\n").each do |l| 52 + if l.match(/^symbolic names:/) 53 + insymbols = true 54 + elsif insymbols && (m = l.match(/^\t(.+): ([\d\.]+)$/)) 55 + @symbols[m[1].encode("UTF-8")] = m[2].encode("UTF-8") 56 + else 57 + insymbols = false 58 + end 59 + end 60 + 49 61 blocks.each do |block| 50 - rev = RCSRevision.new(block) 62 + rev = RCSRevision.new(self, block) 51 63 if @revisions[rev.version] 52 64 raise "duplicate revision #{rev.version} in #{file}" 53 65 end ··· 58 70 # this has nothing to do with Gem, but it has a version comparator 59 71 sort{|a,b| Gem::Version.new(a.version) <=> Gem::Version.new(b.version) }. 60 72 select{|r| r.state != "dead" }.first.version 73 + end 74 + 75 + def to_s 76 + "RCSFile: #{@file}" 61 77 end 62 78 end
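The header parsing added to `RCSFile#initialize` above can be read in isolation. The following is a minimal sketch of it, assuming the `rlog` header lists each symbol on a tab-indented `NAME: revision` line after `symbolic names:`, as the regex in the diff implies. `parse_symbols` and `SAMPLE_HEADER` are hypothetical names used only for illustration; the commit does this work inline.

```ruby
# Hypothetical helper (not part of the commit): parse the "symbolic names:"
# section of an rlog header block into a { name => revision } hash.
def parse_symbols(header)
  symbols = {}
  insymbols = false

  header.split("\n").each do |line|
    if line.match(/^symbolic names:/)
      insymbols = true
    elsif insymbols && (m = line.match(/^\t(.+): ([\d\.]+)$/))
      symbols[m[1]] = m[2]
    else
      # any other line ends the symbol list
      insymbols = false
    end
  end

  symbols
end

# Made-up header text for illustration only.
SAMPLE_HEADER = [
  "RCS file: /var/cvs-commitid/src/example.c,v",
  "head: 1.3",
  "symbolic names:",
  "\tOPENBSD_5_8: 1.3.0.2",
  "\tOPENBSD_5_8_BASE: 1.3",
  "keyword substitution: kv",
].join("\n")

p parse_symbols(SAMPLE_HEADER)
```

Branch tags show up with a `0` in the second-to-last position (`1.3.0.2`), which is what `RCSRevision` later relies on to map a branch revision back to its symbol.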

+97 -5

lib/rcsrevision.rb

··· 28 28 require "date" 29 29 30 30 class RCSRevision 31 - attr_accessor :version, :date, :author, :state, :lines, :commitid, :log 31 + attr_accessor :rcsfile, :version, :date, :author, :state, :lines, :commitid, 32 + :log, :branch, :vendor_branches 33 + 34 + def self.previous_of(ver) 35 + nums = ver.split(".").map{|z| z.to_i } 36 + 37 + if nums.last == 1 38 + # 1.3.2.1 -> 1.3 39 + 2.times { nums.pop } 40 + else 41 + # 1.3.2.2 -> 1.3.2.1 42 + nums[nums.count - 1] -= 1 43 + end 44 + 45 + outnum = nums.join(".") 46 + if outnum == "" 47 + return "0" 48 + else 49 + return outnum 50 + end 51 + end 52 + 53 + # 1.1.0.2 -> 1.1.2.1 54 + def self.first_branch_version_of(ver) 55 + nums = ver.split(".").map{|z| z.to_i } 56 + 57 + if nums[nums.length - 2] != 0 58 + return ver 59 + end 60 + 61 + last = nums.pop 62 + nums.pop 63 + nums.push last 64 + nums.push 1 65 + 66 + return nums.join(".") 67 + end 68 + 69 + def self.is_vendor_branch?(ver) 70 + !!ver.match(/^1\.1\.1\..*/) 71 + end 72 + 73 + def self.is_trunk?(ver) 74 + ver.split(".").count == 2 75 + end 32 76 33 77 # str: "revision 1.7\ndate: 1996/12/14 12:17:33; author: mickey; state: Exp; lines: +3 -3;\n-Wall'ing." 34 - def initialize(str) 78 + def initialize(rcsfile, str) 79 + @rcsfile = rcsfile 35 80 @version = nil 36 81 @date = 0 37 82 @author = nil ··· 39 84 @lines = nil 40 85 @commitid = nil 41 86 @log = nil 87 + @branch = nil 88 + @vendor_branches = [] 42 89 43 90 lines = str.gsub(/^\s*/, "").split("\n") 44 91 # -> [ ··· 52 99 lines.delete_at(2) 53 100 end 54 101 55 - @version = lines.first.scan(/^revision ([\d\.]+)($|\tlocked by)/).first.first 102 + @version = lines.first.scan(/^revision ([\d\.]+)($|\tlocked by)/).first. 
103 + first.encode("UTF-8") 56 104 # -> "1.7" 57 105 58 106 # date/author/state/lines/commitid line 59 107 lines[1].split(/;[ \t]*/).each do |piece| 60 108 kv = piece.split(": ") 61 - self.send(kv[0] + "=", kv[1]) 109 + self.send(kv[0] + "=", kv[1].encode("UTF-8")) 62 110 end 63 111 # -> @date = "1996/12/14 12:17:33", @author = "mickey", ... 64 112 ··· 69 117 end 70 118 # -> @date = 850565853 71 119 72 - @log = lines[2, lines.count].join("\n") 120 + @log = lines[2, lines.count].join("\n").encode("UTF-8", 121 + :invalid => :replace, :undef => :replace, :replace => "?") 122 + 123 + if @version.match(/^\d+\.\d+$/) 124 + # no branch 125 + elsif @version.match(/^1\.1\.1\./) || 126 + (@version == "1.1.2.1" && @branch == nil) 127 + # vendor 128 + @rcsfile.symbols.each do |k,v| 129 + if v == "1.1.1" 130 + @vendor_branches.push k 131 + end 132 + end 133 + elsif m = @version.match(/^(\d+)\.(\d+)\.(\d+)\.\d+$/) 134 + # 1.2.2.3 -> 1.2.0.2 135 + sym = [ m[1], m[2], "0", m[3] ].join(".") 136 + @rcsfile.symbols.each do |s,v| 137 + if v == sym 138 + if @branch 139 + raise "version #{@version} matched two symbols (#{@branch}, #{s})" 140 + end 141 + 142 + @branch = s 143 + end 144 + end 145 + 146 + if !@branch && @rcsfile.symbols.values.include?(@version) 147 + # if there's an exact match, this was probably just an import done with 148 + # a vendor branch id (import -b) 149 + elsif !@branch 150 + # branch was deleted, but we don't want this appearing on HEAD, so call 151 + # it something 152 + @branch = "_branchless_#{@version.gsub(".", "_")}" 153 + end 154 + 155 + if @branch && @rcsfile.symbols[@branch] && 156 + @rcsfile.symbols[@branch].match(/^1\.1\.0\.\d+$/) 157 + # this is also a vendor branch 158 + if !@vendor_branches.include?(@branch) 159 + @vendor_branches.push @branch 160 + end 161 + end 162 + else 163 + raise "TODO: handle version #{@version}" 164 + end 73 165 end 74 166 end
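The revision arithmetic introduced in this file is easier to follow with a few worked inputs. The sketch below restates `previous_of` and `first_branch_version_of` as standalone functions and adds `branch_symbol_of`, a hypothetical name for the `1.2.2.3 -> 1.2.0.2` mapping that the commit performs inline in `initialize`. The `RevMath` module is an assumption made for this example, not something in the repository.

```ruby
# Standalone restatement of the revision-number helpers, for illustration.
module RevMath
  module_function

  # The revision a given revision was derived from.
  def previous_of(ver)
    nums = ver.split(".").map(&:to_i)

    if nums.last == 1
      # first revision on a branch: 1.3.2.1 -> 1.3
      nums.pop(2)
    else
      # 1.3.2.2 -> 1.3.2.1
      nums[-1] -= 1
    end

    out = nums.join(".")
    out == "" ? "0" : out
  end

  # Branch tag to the first revision on that branch: 1.1.0.2 -> 1.1.2.1
  def first_branch_version_of(ver)
    nums = ver.split(".").map(&:to_i)
    return ver if nums[-2] != 0

    last = nums.pop
    nums.pop
    (nums + [ last, 1 ]).join(".")
  end

  # Four-part branch revision to the branch tag that names it:
  # 1.2.2.3 -> 1.2.0.2 (nil for anything that is not four-part)
  def branch_symbol_of(ver)
    m = ver.match(/\A(\d+)\.(\d+)\.(\d+)\.\d+\z/)
    m ? [ m[1], m[2], "0", m[3] ].join(".") : nil
  end
end

puts RevMath.previous_of("1.3.2.1")             # 1.3
puts RevMath.previous_of("1.3.2.2")             # 1.3.2.1
puts RevMath.previous_of("1.1")                 # 0
puts RevMath.first_branch_version_of("1.1.0.2") # 1.1.2.1
puts RevMath.branch_symbol_of("1.2.2.3")        # 1.2.0.2
```

Note that `previous_of("1.1")` yields `"0"`, which the scanner treats as "no previous revision" when it validates each file's history.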

+325 -74

lib/scanner.rb

··· 26 26 # 27 27 28 28 class Scanner 29 - attr_accessor :outputter, :db 29 + attr_accessor :outputter, :db, :commitid_hacks, :prev_revision_hacks 30 30 31 31 # how long commits by the same author with the same commit message can be 32 32 # from each other and still be grouped in the same changeset ··· 36 36 @db = Db.new dbf 37 37 @root = (root + "/").gsub(/\/\//, "/") 38 38 @outputter = Outputter.new(self) 39 + @prev_revision_hacks = {} 40 + @commitid_hacks = {} 39 41 end 40 42 41 43 def recursively_scan(dir = nil) ··· 55 57 end 56 58 57 59 def scan(f) 58 - stat = File.stat(f) 60 + cksum = "" 61 + IO.popen([ "cksum", "-q", f ]) do |c| 62 + parts = c.read.force_encoding("iso-8859-1").split(" ") 63 + if parts.length != 2 64 + raise "invalid output from cksum: #{parts.inspect}" 65 + end 66 + 67 + cksum = parts[0].encode("utf-8") 68 + end 69 + 59 70 canfile = f[@root.length, f.length - @root.length].gsub(/(^|\/)Attic\//, 60 71 "/").gsub(/^\/*/, "") 61 72 62 - fid = @db.execute("SELECT id, first_undead_version, size FROM files " + 73 + fid = @db.execute("SELECT id, first_undead_version, cksum FROM files " + 63 74 "WHERE file = ?", [ canfile ]).first 64 - if fid && fid["size"].to_i > 0 && fid["size"].to_i == stat.size 75 + if fid && fid["cksum"].to_s == cksum 65 76 return 66 77 end 67 78 ··· 69 80 70 81 rcs = RCSFile.new(f) 71 82 83 + @db.execute("BEGIN") 84 + 72 85 if fid 73 86 if fid["first_undead_version"] != rcs.first_undead_version 74 87 @db.execute("UPDATE files SET first_undead_version = ? 
WHERE id = ?", ··· 82 95 end 83 96 raise if !fid 84 97 98 + if @commitid_hacks && @commitid_hacks[canfile] 99 + @commitid_hacks[canfile].each do |v,cid| 100 + if rcs.revisions[v].commitid && 101 + rcs.revisions[v].commitid != cid 102 + raise "hack for #{canfile}:#{v} commitid of #{cid.inspect} would " + 103 + "overwrite #{rcs.revisions[v].commitid}" 104 + end 105 + 106 + puts " faking commitid for revision #{v} -> #{cid}" 107 + rcs.revisions[v].commitid = cid 108 + end 109 + end 110 + 85 111 rcs.revisions.each do |r,rev| 86 112 rid = @db.execute("SELECT id, commitid FROM revisions WHERE " + 87 113 "file_id = ? AND version = ?", [ fid["id"], r ]).first ··· 95 121 "AND version = ?", [ rev.commitid, fid["id"], rev.version ]) 96 122 end 97 123 else 98 - puts " inserted #{r}, authored #{rev.date} by #{rev.author}" + 124 + # files added on branches/imports have unhelpful commit messages with 125 + # the helpful ones on the branch versions, so copy them over while 126 + # we're here 127 + if rev.log.to_s == "Initial revision" 128 + if r == "1.1" && rcs.revisions["1.1.1.1"] 129 + rev.log = rcs.revisions["1.1.1.1"].log 130 + puts " revision #{r} using log from 1.1.1.1" 131 + else 132 + puts " revision #{r} keeping log #{rev.log.inspect}, no 1.1.1.1" 133 + end 134 + elsif m = rev.log.to_s. 135 + match(/\Afile .+? 
was initially added on branch ([^\.]+)\.\z/) 136 + brver = nil 137 + if br = rcs.symbols[m[1]] 138 + brver = RCSRevision.first_branch_version_of(br) 139 + if !rcs.revisions[brver] 140 + if rcs.revisions[brver + ".1"] 141 + brver += ".1" 142 + else 143 + puts " revision #{r} keeping log #{rev.log.inspect}, no #{brver}" 144 + brver = nil 145 + end 146 + end 147 + end 148 + 149 + if brver 150 + rev.log = rcs.revisions[brver].log 151 + puts " revision #{r} using log from #{brver}" 152 + 153 + # but consider this trunk revision on the branch the file was added 154 + # on, just so we keep it in the same changeset 155 + rev.branch = rcs.revisions[brver].branch 156 + else 157 + puts " revision #{r} keeping log #{rev.log.inspect}, no #{m[1]}" 158 + end 159 + end 160 + 161 + puts " inserted #{r}" + 162 + (rev.branch ? " (branch #{rev.branch})" : "") + 163 + ", authored #{rev.date} by #{rev.author}" + 99 164 (rev.commitid ? ", commitid #{rev.commitid}" : "") 100 165 101 166 @db.execute("INSERT INTO revisions (file_id, date, version, author, " + 102 - "commitid, state, log) VALUES (?, ?, ?, ?, ?, ?, ?)", 167 + "commitid, state, log, branch) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", 103 168 [ fid["id"], rev.date, rev.version, rev.author, rev.commitid, 104 - rev.state, rev.log ]) 169 + rev.state, rev.log, rev.branch ]) 170 + rid = { "id" => @db.last_insert_row_id } 105 171 end 106 - end 107 172 108 - @db.execute("UPDATE files SET size = ? 
WHERE id = ?", 109 - [ stat.size, fid["id"] ]) 110 - end 173 + vbs = @db.execute("SELECT branch FROM vendor_branches WHERE " + 174 + "revision_id = ?", [ rid["id"] ]).map{|r| r["branch"] }.flatten 111 175 112 - def stray_commitids_to_changesets 113 - stray_commitids = @db.execute("SELECT DISTINCT author, commitid FROM " + 114 - "revisions WHERE commitid IS NOT NULL AND changeset_id IS NULL") 115 - stray_commitids.each do |row| 116 - csid = @db.execute("SELECT id FROM changesets WHERE commitid = ?", 117 - [ row["commitid"] ]).first 118 - if !csid 119 - @db.execute("INSERT INTO changesets (author, commitid) VALUES (?, ?)", 120 - [ row["author"], row["commitid"] ]) 121 - csid = @db.execute("SELECT id FROM changesets WHERE commitid = ?", 122 - [ row["commitid"] ]).first 176 + rev.vendor_branches.each do |vb| 177 + if !vbs.include?(vb) 178 + puts " inserting vendor branch #{vb}" 179 + @db.execute("INSERT INTO vendor_branches (revision_id, branch) " + 180 + "VALUES (?, ?)", [ rid["id"], vb ]) 181 + end 123 182 end 124 - raise if !csid 183 + 184 + vbs.each do |vb| 185 + if !rev.vendor_branches.include?(vb) 186 + @db.execute("DELETE FROM vendor_branches WHERE revision_id = ? " + 187 + "AND branch = ?", [ rid["id"], vb ]) 188 + end 189 + end 190 + end 125 191 126 - puts "commitid #{row["commitid"]} -> changeset #{csid["id"]}" 192 + @db.execute("UPDATE files SET cksum = ? WHERE id = ?", 193 + [ cksum, fid["id"] ]) 127 194 128 - @db.execute("UPDATE revisions SET changeset_id = ? 
WHERE commitid = ?", 129 - [ csid["id"], row["commitid"] ]) 130 - end 195 + @db.execute("COMMIT") 131 196 end 132 197 133 198 def group_into_changesets 199 + puts "grouping into changesets" 200 + 134 201 new_sets = [] 135 202 last_row = {} 136 203 cur_set = [] 137 204 138 - # TODO: don't conditionalize with null changeset_ids, to allow this to run 139 - # incrementally and match new commits to old changesets 205 + @db.execute("BEGIN") 206 + 207 + # commits by the same author with the same log message within a small 208 + # timeframe are grouped together 140 209 @db.execute("SELECT * FROM revisions WHERE changeset_id IS NULL ORDER " + 141 - "BY author ASC, date ASC") do |row| 142 - # commits by the same author with the same log message (unless they're 143 - # initial imports - 1.1.1.1) within a small timeframe are grouped 144 - # together 145 - if last_row.any? && row["author"] == last_row["author"] && 146 - (row["log"] == last_row["log"] || row["log"] == "Initial revision" || 147 - last_row["log"] == "Initial revision") && 210 + "BY author ASC, branch ASC, commitid ASC, date ASC") do |row| 211 + if last_row.any? && 212 + row["author"] == last_row["author"] && 213 + row["branch"] == last_row["branch"] && 214 + row["log"] == last_row["log"] && 215 + row["commitid"] == last_row["commitid"] && 148 216 row["date"].to_i - last_row["date"].to_i <= MAX_GROUP_WINDOW 149 217 cur_set.push row["id"].to_i 150 218 elsif !last_row.any? ··· 165 233 end 166 234 167 235 new_sets.each do |s| 168 - puts "new set with revision ids #{s.inspect}" 236 + puts " new set with revision ids #{s.inspect}" 169 237 @db.execute("INSERT INTO changesets (id) VALUES (NULL)") 170 238 id = @db.execute("SELECT last_insert_rowid() AS id").first["id"] 171 239 raise if !id ··· 180 248 if @db.execute("SELECT * FROM revisions WHERE changeset_id IS NULL").any? 
181 249 raise "still have revisions with empty changesets" 182 250 end 251 + 252 + @db.execute("COMMIT") 253 + end 254 + 255 + def stray_commitids_to_changesets 256 + @db.execute("BEGIN") 257 + 258 + puts "finding stray commitids" 259 + 260 + stray_commitids = @db.execute("SELECT DISTINCT author, commitid FROM " + 261 + "revisions WHERE commitid IS NOT NULL AND changeset_id IS NULL") 262 + stray_commitids.each do |row| 263 + csid = @db.execute("SELECT id FROM changesets WHERE commitid = ?", 264 + [ row["commitid"] ]).first 265 + if !csid 266 + @db.execute("INSERT INTO changesets (author, commitid) VALUES (?, ?)", 267 + [ row["author"], row["commitid"] ]) 268 + csid = @db.execute("SELECT id FROM changesets WHERE commitid = ?", 269 + [ row["commitid"] ]).first 270 + end 271 + raise if !csid 272 + 273 + puts " commitid #{row["commitid"]} -> changeset #{csid["id"]}" 274 + 275 + @db.execute("UPDATE revisions SET changeset_id = ? WHERE commitid = ?", 276 + [ csid["id"], row["commitid"] ]) 277 + end 278 + 279 + @db.execute("COMMIT") 183 280 end 184 281 185 282 def fill_in_changeset_data 283 + puts "assigning dates to changesets" 284 + 285 + @db.execute("BEGIN") 286 + 186 287 cses = {} 187 288 @db.execute("SELECT id, commitid FROM changesets WHERE date IS NULL") do |c| 188 289 cses[c["id"]] = c["commitid"] 189 290 end 190 291 292 + # create canonical dates for each changeset, so we can pull them back out 293 + # in order 191 294 cses.each do |csid,comid| 192 295 date = nil 193 296 commitid = comid 194 297 log = nil 195 298 author = nil 299 + branch = nil 196 300 197 301 @db.execute("SELECT * FROM revisions WHERE changeset_id = ? 
ORDER BY " + 198 302 "date ASC", [ csid ]) do |rev| ··· 200 304 date = rev["date"] 201 305 end 202 306 203 - if rev["log"] != "Initial revision" 307 + if log && rev["log"] != log 308 + raise "logs different between revs of #{csid}" 309 + else 204 310 log = rev["log"] 205 311 end 206 312 ··· 209 315 else 210 316 author = rev["author"] 211 317 end 212 - end 213 318 214 - if commitid.to_s == "" 215 - commitid = "" 216 - while commitid.length < 16 217 - c = rand(75) + 48 218 - if ((c >= 48 && c <= 57) || (c >= 65 && c <= 90) || 219 - (c >= 97 && c <= 122)) 220 - commitid << c.chr 221 - end 319 + if branch && rev["branch"] != branch 320 + raise "branches different between revs of #{csid}" 321 + else 322 + branch = rev["branch"] 222 323 end 223 324 end 224 325 ··· 226 327 raise "no date for changeset #{csid}" 227 328 end 228 329 229 - puts "changeset #{csid} -> commitid #{commitid}" 330 + @db.execute("UPDATE changesets SET date = ?, log = ?, author = ?, " + 331 + "branch = ? WHERE id = ?", [ date, log, author, branch, csid ]) 332 + end 333 + 334 + @db.execute("COMMIT") 230 335 231 - @db.execute("UPDATE changesets SET date = ?, commitid = ?, log = ?, " + 232 - "author = ? 
WHERE id = ?", [ date, commitid, log, author, csid ]) 336 + puts "assigning changeset order" 337 + 338 + cses = [] 339 + @db.execute("SELECT id FROM changesets WHERE csorder IS NULL ORDER BY " + 340 + "date, author") do |c| 341 + cses.push c["id"] 233 342 end 234 - end 235 343 236 - def repo_surgery(tmp_dir, cvs_root, tree) 237 - puts "checking out #{tree} from #{cvs_root} to #{tmp_dir}" 344 + highestcs = @db.execute("SELECT MAX(csorder) AS lastcs FROM changesets " + 345 + "WHERE csorder IS NOT NULL").first["lastcs"].to_i 238 346 239 - Dir.chdir(tmp_dir) 347 + @db.execute("BEGIN") 348 + cses.each do |cs| 349 + highestcs += 1 350 + @db.execute("UPDATE changesets SET csorder = ?, commitid = NULL WHERE " + 351 + "id = ?", [ highestcs, cs ]) 352 + end 353 + @db.execute("COMMIT") 354 + end 240 355 356 + def stage_tmp_cvs(tmp_dir, cvs_root, tree) 241 357 # for a deleted file to be operated by with cvs admin, it must be 242 358 # present in the CVS/Entries files, so check out all files at rev 1.1 so we 243 359 # know they will not be deleted. 
otherwise cvs admin will fail silently 244 - system("cvs", "-Q", "-d", cvs_root, "co", "-r1.1", tree) || 245 - raise("cvs checkout returned non-zero") 360 + if File.exists?("#{tmp_dir}/#{tree}/CVS/Entries") 361 + puts "updating #{tmp_dir}#{tree} from #{cvs_root}" 362 + Dir.chdir("#{tmp_dir}/#{tree}") 363 + system("cvs", "-Q", "-d", cvs_root, "update", "-PAd", "-r1.1") || 364 + raise("cvs update returned non-zero") 365 + else 366 + puts "checking out #{cvs_root}#{tree} to #{tmp_dir}" 367 + Dir.chdir(tmp_dir) 368 + system("cvs", "-Q", "-d", cvs_root, "co", "-r1.1", tree) || 369 + raise("cvs checkout returned non-zero") 370 + end 371 + 372 + Dir.chdir(tmp_dir) 246 373 247 374 # but if any files were added on a branch or somehow have a weird history, 248 375 # their 1.1 revision will be dead so check out any non-dead revision of ··· 251 378 @db.execute("SELECT 252 379 file, first_undead_version 253 380 FROM files 254 - WHERE first_undead_version NOT LIKE '1.1'") do |rev| 381 + WHERE first_undead_version NOT LIKE '1.1' AND 382 + id IN (SELECT file_id FROM revisions WHERE commitid IS NULL)") do |rev| 255 383 dead11s[rev["file"]] = rev["first_undead_version"] 256 384 end 257 385 ··· 264 392 "#{tree}/#{confile}") || 265 393 raise("cvs co -r#{rev} #{confile} failed") 266 394 end 395 + 396 + Dir.chdir("#{tmp_dir}/#{tree}") 397 + end 398 + 399 + def recalculate_commitids(tmp_dir, cvs_root, tree, genesis) 267 400 Dir.chdir(tmp_dir + "/#{tree}") 268 401 269 - csid = nil 270 - @db.execute("SELECT 271 - files.file, changesets.commitid, changesets.author, changesets.date, 272 - revisions.version 402 + puts "recalculating new commitids from genesis #{genesis}" 403 + 404 + gfn = "#{cvs_root}/CVSROOT/commitid_genesis" 405 + if File.exists?(gfn) && File.read(gfn).strip != genesis 406 + raise "genesis in #{gfn} is not #{genesis.inspect}" 407 + else 408 + File.write("#{cvs_root}/CVSROOT/commitid_genesis", genesis + "\n") 409 + end 410 + 411 + changesets = [] 412 + @db.execute("SELECT 
id, csorder, commitid FROM changesets 413 + ORDER BY csorder ASC") do |cs| 414 + changesets.push cs 415 + end 416 + 417 + puts " writing commitids-#{tree} (#{changesets.length} " + 418 + "changeset#{changesets.length == 1 ? "" : "s"})" 419 + 420 + commitids = File.open("#{cvs_root}/CVSROOT/commitids-#{tree}", "w+") 421 + 422 + # every changeset needs to know the revisions of its files from the 423 + # previous change, taking into account branches. we can easily calculate 424 + # this, but we should make sure that calculated revision actually exists 425 + files = {} 426 + @db.execute("SELECT id, file FROM files") do |row| 427 + files[row["id"]] = row["file"] 428 + end 429 + files.each do |id,file| 430 + vers = [] 431 + 432 + @db.execute("SELECT version FROM revisions WHERE file_id = ?", 433 + [ id ]) do |rev| 434 + vers.push rev["version"] 435 + end 436 + 437 + vers.each do |rev| 438 + if prev_revision_hacks[file] && (hpre = prev_revision_hacks[file][rev]) 439 + puts " faking previous revision of #{file} #{rev} -> #{hpre}" 440 + pre = hpre 441 + else 442 + pre = RCSRevision.previous_of(rev) 443 + end 444 + 445 + if pre != "0" && !vers.include?(pre) 446 + raise "#{file}: revision #{rev} previous #{pre} not found" 447 + end 448 + end 449 + end 450 + files = {} 451 + 452 + # for each changeset with no commitid, store it in the commitids-* file 453 + # with a temporary commitid of just its changeset number, do a 'cvs show' 454 + # on it to calculate the actual commitid, then overwrite that hash in the 455 + # commitids file, and store our new one 456 + changesets.each do |cs| 457 + cline = [] 458 + commitid = "" 459 + if cs["commitid"].to_s != "" 460 + commitid = cs["commitid"] 461 + else 462 + commitid = sprintf("01-%064d-%07d", cs["csorder"], cs["csorder"]) 463 + end 464 + 465 + # order by length(revisions.version) to put 1.1 first, then 1.1.1.1, to 466 + # match 'cvs import' 467 + @db.execute("SELECT 468 + files.file, revisions.version, revisions.branch 469 + FROM 
revisions 470 + LEFT OUTER JOIN files ON files.id = revisions.file_id 471 + WHERE revisions.changeset_id = ? 472 + ORDER BY files.file ASC, LENGTH(revisions.version) ASC, 473 + revisions.version ASC", [ cs["id"] ]) do |rev| 474 + if cline.length == 0 475 + cline.push commitid 476 + end 477 + 478 + cline.push [ RCSRevision.previous_of(rev["version"]), rev["version"], 479 + rev["branch"].to_s, rev["file"].gsub(/,v$/, "") ].join(":") 480 + end 481 + 482 + pos = commitids.pos 483 + commitids.puts cline.join("\t") 484 + 485 + if cs["commitid"].to_s == "" 486 + commitids.fsync 487 + 488 + newcsum = `cvs show #{commitid} | tail -n +2 | cksum -a sha512/256`.strip 489 + if $?.exitstatus != 0 490 + raise "failed running cvs show #{commitid}" 491 + end 492 + 493 + # null 494 + if newcsum == "c672b8d1ef56ed28ab87c3622c5114069bdd3ad7b8f9737498d0c01ecef0967a" 495 + raise "failed getting new commitid from #{commitid}" 496 + end 497 + 498 + newid = sprintf("01-%64s-%07d", newcsum, cs["csorder"]) 499 + 500 + @db.execute("UPDATE changesets SET commitid = ? 
WHERE id = ?", 501 + [ newid, cs["id"] ]) 502 + 503 + puts " changeset #{cs["csorder"]} -> #{newid}" 504 + 505 + # go back, rewrite just our commitid, then get ready for the next line 506 + commitids.seek(pos) 507 + commitids.write(newid) 508 + commitids.seek(0, IO::SEEK_END) 509 + commitids.fsync 510 + else 511 + puts " changeset #{cs["csorder"]} == #{cs["commitid"]}" 512 + end 513 + end 514 + 515 + commitids.close 516 + end 517 + 518 + def repo_surgery(tmp_dir, cvs_root, tree) 519 + puts "updating commitids in rcs files at #{cvs_root} via #{tmp_dir}" 520 + 521 + Dir.chdir("#{tmp_dir}/#{tree}") 522 + 523 + # for each revision we have in the db (picked up from a scan) that has a 524 + # different commitid from what we assigned to its changeset, update the 525 + # commitid in the rcs file in the repo, and then our revisions records 526 + @db.execute(" 527 + SELECT 528 + files.file, changesets.commitid, revisions.version, revisions.id AS revid, 529 + revisions.commitid AS revcommitid 273 530 FROM revisions 274 - LEFT OUTER JOIN files ON files.id = file_id 531 + LEFT OUTER JOIN files ON files.id = revisions.file_id 275 532 LEFT OUTER JOIN changesets ON revisions.changeset_id = changesets.id 276 - WHERE revisions.commitid IS NULL 533 + WHERE changesets.commitid != IFNULL(revisions.commitid, '') 277 534 ORDER BY changesets.date ASC, files.file ASC") do |rev| 278 - if csid == nil || rev["commitid"] != csid 279 - puts " commit #{rev["commitid"]} at #{Time.at(rev["date"])} by " + 280 - rev["author"] 281 - csid = rev["commitid"] 282 - end 283 - 284 - puts " #{rev["file"]} #{rev["version"]}" 535 + puts [ "", rev["file"], rev["version"], rev["revcommitid"], "->", 536 + rev["commitid"] ].join(" ") 285 537 286 538 output = nil 287 539 IO.popen(ca = [ "cvs", "admin", "-C", ··· 295 547 end 296 548 end 297 549 298 - puts "cleaning up #{tmp_dir}/#{tree}" 299 - 300 - system("rm", "-rf", tmp_dir + "/#{tree}") || 301 - raise("rm of #{tmp_dir}/#{tree} failed") 550 + # re-read commitids 
and update file checksums since we probably just 551 + # changed many of them, which will then update commitids in revisions table 552 + recursively_scan 303 553 end 304 554 end
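The new grouping rule in `group_into_changesets` (same author, branch, log message and commitid, within a time window) can be sketched without a database. This is a simplified illustration rather than the commit's code: `group_revisions` is a hypothetical function, the rows are made-up hashes shaped like `revisions` records, and the 300-second window is an assumed value, since `MAX_GROUP_WINDOW` is defined outside the lines shown in this diff.

```ruby
# Assumed window in seconds; the real MAX_GROUP_WINDOW is not shown here.
WINDOW = 300

# Group revision rows into changesets, returning arrays of revision ids.
def group_revisions(rows, window = WINDOW)
  # mirrors "ORDER BY author ASC, branch ASC, commitid ASC, date ASC";
  # nil sorts first here, as NULL does in SQLite
  sorted = rows.sort_by do |r|
    [ r["author"].to_s, r["branch"].to_s, r["commitid"].to_s, r["date"].to_i ]
  end

  sets = []
  last = nil

  sorted.each do |row|
    if last &&
        row["author"] == last["author"] &&
        row["branch"] == last["branch"] &&
        row["log"] == last["log"] &&
        row["commitid"] == last["commitid"] &&
        row["date"].to_i - last["date"].to_i <= window
      sets.last.push row["id"]
    else
      sets.push [ row["id"] ]
    end

    last = row
  end

  sets
end

# Made-up rows for illustration only.
rows = [
  { "id" => 1, "author" => "alice", "branch" => nil, "commitid" => nil,
    "date" => 1000, "log" => "fix typo" },
  { "id" => 2, "author" => "alice", "branch" => nil, "commitid" => nil,
    "date" => 1010, "log" => "fix typo" },
  { "id" => 3, "author" => "alice", "branch" => nil, "commitid" => nil,
    "date" => 5000, "log" => "fix typo" },
  { "id" => 4, "author" => "bob", "branch" => "OPENBSD_5_8",
    "commitid" => nil, "date" => 1005, "log" => "fix typo" },
]

p group_revisions(rows) # [[1, 2], [3], [4]]
```

The window is measured against the previous row rather than the first row of the set, so a long run of closely spaced revisions stays in one changeset, which matches the comparison against `last_row` in the diff.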

+54 -22

openbsd-commitid.rb

··· 26 26 # THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 27 27 # 28 28 29 - DIR = File.dirname(__FILE__) + "/lib/" 29 + PWD = File.dirname(__FILE__) 30 30 31 - require DIR + "db" 32 - require DIR + "scanner" 33 - require DIR + "rcsfile" 34 - require DIR + "rcsrevision" 35 - require DIR + "outputter" 31 + require PWD + "/lib/db" 32 + require PWD + "/lib/scanner" 33 + require PWD + "/lib/rcsfile" 34 + require PWD + "/lib/rcsrevision" 35 + require PWD + "/lib/outputter" 36 36 37 37 CVSROOT = "/var/cvs-commitid/" 38 38 CVSTMP = "/var/cvs-tmp/" 39 39 CVSTREES = [ "src", "ports", "www", "xenocara" ] 40 40 41 + GENESIS = "01-f96d46480b33dcec5924884fef54166e169fc08d19f1d1812f5cd2d1f704219a-0000000" 42 + 41 43 CVSTREES.each do |tree| 42 - if Dir.exists?("#{CVSTMP}/#{tree}/CVS") 43 - raise "clean out #{CVSTMP} first" 44 + if !Dir.exists?("#{CVSROOT}/#{tree}") 45 + next 44 46 end 45 - end 46 47 47 - PWD = Dir.pwd 48 + sc = Scanner.new(PWD + "/db/openbsd-#{tree}.db", "#{CVSROOT}/#{tree}/") 48 49 49 - CVSTREES.each do |tree| 50 - sc = Scanner.new(PWD + "/db/openbsd-#{tree}.db", "#{CVSROOT}/#{tree}/") 50 + if tree == "src" 51 + # these revisions didn't get proper commitids with the others in the 52 + # changeset, so fudge them 53 + sc.commitid_hacks = { 54 + "sys/dev/pv/xenvar.h,v" => { 55 + "1.1" => "Ij2SOB19ATTH0yEx", 56 + "1.2" => "pq3FAYuwXteAsF4d", 57 + "1.3" => "C8vFI0RNH9XPJUKs", 58 + }, 59 + "usr.bin/mg/theo.c,v" => { 60 + "1.144" => "gSveQVkxMLs6vRqK", 61 + "1.145" => "GbEBL4CfPvDkB8hj", 62 + "1.146" => "8rkHsVfUx5xgPXRB", 63 + }, 64 + } 65 + 66 + # some rcs files have manually edited history that we need to work around 67 + sc.prev_revision_hacks = { 68 + # initial history gone? 
69 + "sbin/isakmpd/pkcs.c,v" => { "1.4" => "0" }, 70 + # 1.6 gone 71 + "sys/arch/sun3/sun3/machdep.c,v" => { "1.7" => "1.5" }, 72 + } 73 + end 74 + 75 + # walk the directory of RCS files, create a "files" record for each one, 76 + # then run `rlog` on it and create a "revisions" record for each 51 77 sc.recursively_scan 78 + 79 + # group revisions into changesets by date/author/message, or for newer 80 + # commits, their stored commitid 52 81 sc.group_into_changesets 82 + 83 + # make sure every revision is accounted for 53 84 sc.stray_commitids_to_changesets 54 - sc.fill_in_changeset_data 55 85 56 - sc.repo_surgery(CVSTMP, CVSROOT, tree) 86 + # assign a canonical date/message/order to each changeset 87 + sc.fill_in_changeset_data 57 88 58 - sc.outputter.changelog("cvs.openbsd.org", 59 - f = File.open("out/Changelog-#{tree}", "w+")) 60 - f.close 89 + # check out the cvs tree in CVSTMP/tree and place each dead-1.1 file at its 90 + # initial non-dead revision found during `rlog` 91 + sc.stage_tmp_cvs(CVSTMP, CVSROOT, tree) 61 92 62 - sc.outputter.history(f = File.open("out/history-#{tree}", "w+")) 63 - f.close 93 + # calculate a hash for each commit by running 'cvs show' on it, and store it 94 + # in the commitids-{tree} file 95 + sc.recalculate_commitids(CVSTMP, CVSROOT, tree, GENESIS) 64 96 65 - sc.outputter.dup_script(f = File.open("out/add_commitids_to_#{tree}.sh", 66 - "w+"), tree) 67 - f.close 97 + # and finally, update every revision of every file and write its calculated 98 + # commitid, possibly replacing the random one already there 99 + sc.repo_surgery(CVSTMP, CVSROOT, tree) 68 100 end
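The commitid layout used by `recalculate_commitids` and by `GENESIS` can be summarised as `01-<64 characters>-<7-digit changeset order>`. The sketch below shows only the formatting, under that reading of the two `sprintf` calls in `lib/scanner.rb`. The function names and `COMMITID_RE` are hypothetical, and the checksum is taken as an argument because the real script obtains it by piping `cvs show` through `cksum -a sha512/256`.

```ruby
GENESIS = "01-f96d46480b33dcec5924884fef54166e169fc08d19f1d1812f5cd2d1f704219a-0000000"

# Hypothetical pattern describing the layout: version prefix, 64 hex
# characters, then a zero-padded 7-digit changeset order.
COMMITID_RE = /\A01-[0-9a-f]{64}-\d{7}\z/

# Temporary id written to the commitids file before the hash is known:
# the changeset order, zero-padded to 64 digits, stands in for the hash.
def placeholder_commitid(csorder)
  sprintf("01-%064d-%07d", csorder, csorder)
end

# Final id once the 64-character checksum has been calculated elsewhere.
def final_commitid(hexdigest, csorder)
  sprintf("01-%64s-%07d", hexdigest, csorder)
end

puts placeholder_commitid(42)
puts final_commitid("ab" * 32, 42)
puts GENESIS.match?(COMMITID_RE)
```

Because the placeholder and the final id have the same length, the script can seek back and overwrite the placeholder in place without shifting the rest of the line.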