@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.)
hq.recaptime.dev/wiki/Phorge
<?php

final class PhabricatorRobotsPlatformController
  extends PhabricatorRobotsController {

  protected function newRobotsRules() {
    $out = array();

    // Prevent indexing of '/diffusion/', since the content is not generally
    // useful to index, web spiders get stuck scraping the history of every
    // file, and much of the content is Ajaxed in anyway so spiders won't even
    // see it. These pages are also relatively expensive to generate.

    // Note that this still allows commits (at '/rPxxxxx') to be indexed.
    // They're probably not hugely useful, but suffer fewer of the problems
    // Diffusion suffers and are hard to omit with 'robots.txt'.

    $out[] = 'User-Agent: *';
    $out[] = 'Disallow: /diffusion/';
    $out[] = 'Disallow: /source/';
    // See T15670. Also prevent directly accessing commits in Diffusion.
    $out[] = 'Disallow: /r*';

    // See T15662. Prevent indexing line anchor links in Pastes. Per RFC 9309
    // section 2.2.3, percent-encode "$" to avoid interpretation as the end of
    // the match pattern. However, some crawlers may not follow RFC 9309 and
    // instead follow the original standard at
    // https://www.robotstxt.org/orig.html, which does not mention how to
    // interpret characters like "$"; such crawlers may ignore this rule.
    $out[] = 'Disallow: /P*%24*';

    // Add a small crawl delay (number of seconds between requests) for spiders
    // which respect it. The intent here is to prevent spiders from affecting
    // performance for users. The possible cost is slower indexing, but that
    // seems like a reasonable tradeoff, since most Phabricator installs are
    // probably not hugely concerned about cutting-edge SEO.
    $out[] = 'Crawl-delay: 1';

    return $out;
  }

}
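
// Illustrative sketch, not part of the upstream file: assuming the parent
// PhabricatorRobotsController joins the returned rules with newlines (an
// assumption; that code is not shown here), the served robots.txt would
// read roughly as follows:
//
//   User-Agent: *
//   Disallow: /diffusion/
//   Disallow: /source/
//   Disallow: /r*
//   Disallow: /P*%24*
//   Crawl-delay: 1
//
// For crawlers that implement RFC 9309 wildcards, "*" matches zero or more
// characters, so "/r*" covers any path beginning with "/r".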