@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge
<?php

final class PhabricatorRobotsPlatformController
  extends PhabricatorRobotsController {

  protected function newRobotsRules() {
    $out = array();

    // Prevent indexing of '/diffusion/', since the content is not generally
    // useful to index, web spiders get stuck scraping the history of every
    // file, and much of the content is Ajaxed in anyway so spiders won't even
    // see it. These pages are also relatively expensive to generate.

    // Note that this still allows commits (at '/rPxxxxx') to be indexed.
    // They're probably not hugely useful, but suffer fewer of the problems
    // Diffusion suffers and are hard to omit with 'robots.txt'.

    $out[] = 'User-Agent: *';
    $out[] = 'Disallow: /diffusion/';
    $out[] = 'Disallow: /source/';
    // See T15670. Also prevent directly accessing commits in Diffusion.
    $out[] = 'Disallow: /r*';

    // See T15662. Prevent indexing line anchor links in Pastes. Per RFC 9309
    // section 2.2.3, percentage-encode "$" to avoid interpretation as end of
    // match pattern. However, crawlers may not abide by it but follow the
    // original standard at https://www.robotstxt.org/orig.html with no mention
    // how to interpret characters like "$" and thus entirely ignore this rule.
    $out[] = 'Disallow: /P*%24*';

    // Add a small crawl delay (number of seconds between requests) for spiders
    // which respect it. The intent here is to prevent spiders from affecting
    // performance for users. The possible cost is slower indexing, but that
    // seems like a reasonable tradeoff, since most Phabricator installs are
    // probably not hugely concerned about cutting-edge SEO.
    $out[] = 'Crawl-delay: 1';

    return $out;
  }

}
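For reference, the rules returned by `newRobotsRules()` would presumably be served as a `robots.txt` along the following lines. This is only a sketch: the parent `PhabricatorRobotsController` is not shown here, so the assumption is that it emits each array entry on its own line, in order, without adding anything else.

```
User-Agent: *
Disallow: /diffusion/
Disallow: /source/
Disallow: /r*
Disallow: /P*%24*
Crawl-delay: 1
```

All directives sit in a single group under `User-Agent: *`, so they apply to every crawler that does not match a more specific group.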