16 February 2002. Add Sun (which points to a 1998 article on robots.txt revelations: http://www.eiffel.com/private/meyer/robots.html )

15 February 2002. Add sites:

The Internet Movie Database, Defense Intelligence Agency, Argonne National Laboratory, Princeton University, American Airlines (informative, backdoor?), Disney, Electronic Data Systems (EDS) (careers/closer_look?), Centers for Disease Control, US Agency for International Development, Department of Commerce (China?), Food and Drug Administration (area51?), Department of Health and Human Services, National Science Foundation (infonerdish).

14 February 2002. Thanks to S.

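As noted by S, a site's file tree may be partially revealed by calling up its robots.txt file, if the site uses that file to ask web robots (search-engine crawlers and the like) to stay out of certain directories. The file is only a request that well-behaved robots honor; it does not itself block access to anything listed in it. To access a site's robots.txt: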

http://www.site.name/robots.txt
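
To do the same from a script, here is a minimal Python sketch. The helper names robots_url and fetch_robots are our own, not part of any existing tool. The sketch relies on the fact that robots look for the file only at the root of a host, so everything after the host name in a page URL is thrown away:

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url):
    """Return the robots.txt URL for the host that serves page_url.

    The file lives at the root of the host, so only the scheme and host
    (plus any port) are kept; path, query and fragment are dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(page_url, timeout=10.0):
    """Download robots.txt as text (requires network access)."""
    with urlopen(robots_url(page_url), timeout=timeout) as resp:
        # Undecodable bytes are replaced rather than raising an error.
        return resp.read().decode("utf-8", errors="replace")


print(robots_url("http://www.whitehouse.gov/news/releases/2002/01/"))
# http://www.whitehouse.gov/robots.txt
```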

White House (informative), European Union, United Nations (odd), NSA, DSD, FBI, Army, Air Force, House of Representatives, Department of Justice, US Courts, Treasury Department, IRS, Los Alamos National Laboratory (informative), Lawrence Livermore National Laboratory (jed's killers?), Verisign (most informative), New York Times, Morgan Stanley (informative), Citibank, Yale (Napster?), Stanford, MIT, Federation of American Scientists, Safeweb, Anonymizer (informative), EFF, Cryptome.

If you locate sites with really scandalous directories -- sex, porn, crime, payoffs, codewords, classified material, security holes, backdoors -- archive the evidence and let us know; send to: jya@pipeline.com


White House

http://www.whitehouse.gov/robots.txt

# robots.txt for http://www.whitehouse.gov/

User-agent:     *
Disallow:       /cgi-bin
Disallow:       /search
Disallow:       /query.html
Disallow:       /help
Disallow:       /afac/index.htm/text
Disallow:       /afac/text
Disallow:       /appointments/text
Disallow:       /cea/text
Disallow:       /ceq/text
Disallow:       /contact/text
Disallow:       /dpc/text
Disallow:       /email/text
Disallow:       /energy/text
Disallow:       /espanol/text
Disallow:       /firstlady/images/text
Disallow:       /firstlady/news-speeches/releases/print/text
Disallow:       /firstlady/news-speeches/releases/text
Disallow:       /firstlady/news-speeches/speeches/print/text
Disallow:       /firstlady/news-speeches/speeches/text
Disallow:       /firstlady/news-speeches/text
Disallow:       /firstlady/photoessay/text
Disallow:       /firstlady/text
Disallow:       /fsbr/text
Disallow:       /government/handbook/text
Disallow:       /government/images/text
Disallow:       /government/text
Disallow:       /greeting/text
Disallow:       /history/art/images/text
Disallow:       /history/art/text
Disallow:       /history/eeobtour/images/text
Disallow:       /history/eeobtour/text
Disallow:       /history/firstladies/text
Disallow:       /history/presidents/text
Disallow:       /history/text
Disallow:       /history/tours/print/text
Disallow:       /history/tours/text
Disallow:       /history/whtour/images/text
Disallow:       /history/whtour/text
Disallow:       /holiday/text
Disallow:       /homeland/text
Disallow:       /infocus/defense/text
Disallow:       /infocus/economy/text
Disallow:       /infocus/education/states/text
Disallow:       /infocus/education/text
Disallow:       /infocus/energy/text
Disallow:       /infocus/environment/text
Disallow:       /infocus/faith-based/text
Disallow:       /infocus/medicare/text
Disallow:       /infocus/social-security/text
Disallow:       /infocus/tax-relief/text
Disallow:       /infocus/text
Disallow:       /kids/abc/text
Disallow:       /kids/album/text
Disallow:       /kids/barney/text
Disallow:       /kids/connection/text
Disallow:       /kids/contact/text
Disallow:       /kids/dreamteam/text
Disallow:       /kids/firstlady/text
Disallow:       /kids/guide/print/text
Disallow:       /kids/guide/text
Disallow:       /kids/holiday/text
Disallow:       /kids/india/text
Disallow:       /kids/mrscheney/text
Disallow:       /kids/ofelia/text
Disallow:       /kids/president/text
Disallow:       /kids/quiz/text
Disallow:       /kids/spotty/text
Disallow:       /kids/teeball/text
Disallow:       /kids/teeball2/text
Disallow:       /kids/teeball3/text
Disallow:       /kids/text
Disallow:       /kids/timeline/text
Disallow:       /kids/tour/text
Disallow:       /kids/vicepresident/text
Disallow:       /library/omb/text
Disallow:       /mrscheney/news/text
Disallow:       /mrscheney/text
Disallow:       /national-anthem/text
Disallow:       /nec/text
Disallow:       /news/briefings/print/text
Disallow:       /news/briefings/text
Disallow:       /news/freedominitiative/text
Disallow:       /news/images/text
Disallow:       /news/nominations/text
Disallow:       /news/orders/text
Disallow:       /news/press/radio/text
Disallow:       /news/press/text
Disallow:       /news/print/releases/text
Disallow:       /news/print/text
Disallow:       /news/proclamations/text
Disallow:       /news/radio/print/text
Disallow:       /news/radio/text
Disallow:       /news/releases/2001/01/images/print/text
Disallow:       /news/releases/2001/01/images/text
Disallow:       /news/releases/2001/01/print/text
Disallow:       /news/releases/2001/01/text
Disallow:       /news/releases/2001/02/images/print/text
Disallow:       /news/releases/2001/02/images/text
Disallow:       /news/releases/2001/02/print/text
Disallow:       /news/releases/2001/02/text
Disallow:       /news/releases/2001/03/images/print/text
Disallow:       /news/releases/2001/03/images/text
Disallow:       /news/releases/2001/03/print/text
Disallow:       /news/releases/2001/03/text
Disallow:       /news/releases/2001/04/images/print/text
Disallow:       /news/releases/2001/04/images/text
Disallow:       /news/releases/2001/04/print/text
Disallow:       /news/releases/2001/04/text
Disallow:       /news/releases/2001/05/images/print/text
Disallow:       /news/releases/2001/05/images/text
Disallow:       /news/releases/2001/05/print/text
Disallow:       /news/releases/2001/05/text
Disallow:       /news/releases/2001/06/images/print/text
Disallow:       /news/releases/2001/06/images/text
Disallow:       /news/releases/2001/06/print/text
Disallow:       /news/releases/2001/06/text
Disallow:       /news/releases/2001/07/images/print/text
Disallow:       /news/releases/2001/07/images/text
Disallow:       /news/releases/2001/07/print/text
Disallow:       /news/releases/2001/07/text
Disallow:       /news/releases/2001/08/images/print/text
Disallow:       /news/releases/2001/08/images/text
Disallow:       /news/releases/2001/08/print/text
Disallow:       /news/releases/2001/08/text
Disallow:       /news/releases/2001/09/images/print/text
Disallow:       /news/releases/2001/09/images/text
Disallow:       /news/releases/2001/09/print/text
Disallow:       /news/releases/2001/09/text
Disallow:       /news/releases/2001/10/images/print/text
Disallow:       /news/releases/2001/10/images/text
Disallow:       /news/releases/2001/10/print/text
Disallow:       /news/releases/2001/10/text
Disallow:       /news/releases/2001/11/images/print/text
Disallow:       /news/releases/2001/11/images/text
Disallow:       /news/releases/2001/11/print/text
Disallow:       /news/releases/2001/11/text
Disallow:       /news/releases/2001/12/images/print/text
Disallow:       /news/releases/2001/12/images/text
Disallow:       /news/releases/2001/12/print/text
Disallow:       /news/releases/2001/12/text
Disallow:       /news/releases/2002/01/images/print/text
Disallow:       /news/releases/2002/01/images/text
Disallow:       /news/releases/2002/01/print/text
Disallow:       /news/releases/2002/01/text
Disallow:       /news/releases/print/text
Disallow:       /news/releases/text
Disallow:       /news/reports/text
Disallow:       /news/text
Disallow:       /news/usbudget/blueprint/text
Disallow:       /news/usbudget/states/print/text
Disallow:       /news/usbudget/states/text
Disallow:       /nsc/text
Disallow:       /oa/foia/text
Disallow:       /oa/jobs/text
Disallow:       /oa/oapo/text
Disallow:       /oa/text
Disallow:       /omb/budget/fy2002/text
Disallow:       /omb/budget/text
Disallow:       /omb/bulletins/text
Disallow:       /omb/circulars/a001/text
Disallow:       /omb/circulars/a016/text
Disallow:       /omb/circulars/a019/text
Disallow:       /omb/circulars/a021/text
Disallow:       /omb/circulars/a025/text
Disallow:       /omb/circulars/a034/text
Disallow:       /omb/circulars/a045/text
Disallow:       /omb/circulars/a050/text
Disallow:       /omb/circulars/a076/text
Disallow:       /omb/circulars/a087/text
Disallow:       /omb/circulars/a089/text
Disallow:       /omb/circulars/a094/text
Disallow:       /omb/circulars/a097/text
Disallow:       /omb/circulars/a102/text
Disallow:       /omb/circulars/a11/text
Disallow:       /omb/circulars/a110/text
Disallow:       /omb/circulars/a119/text
Disallow:       /omb/circulars/a122/text
Disallow:       /omb/circulars/a123/text
Disallow:       /omb/circulars/a126/text
Disallow:       /omb/circulars/a127/text
Disallow:       /omb/circulars/a129/text
Disallow:       /omb/circulars/a130/text
Disallow:       /omb/circulars/a131/text
Disallow:       /omb/circulars/a133/text
Disallow:       /omb/circulars/a133_compliance/00/text
Disallow:       /omb/circulars/a133_compliance/text
Disallow:       /omb/circulars/a134/text
Disallow:       /omb/circulars/a135/text
Disallow:       /omb/circulars/text
Disallow:       /omb/credit.bak/text
Disallow:       /omb/credit/text
Disallow:       /omb/fedreg/text
Disallow:       /omb/financial/text
Disallow:       /omb/foia/text
Disallow:       /omb/gils/text
Disallow:       /omb/grants/text
Disallow:       /omb/inforeg/text
Disallow:       /omb/legislative/7day/text
Disallow:       /omb/legislative/paygo/text
Disallow:       /omb/legislative/sap/105-1/text
Disallow:       /omb/legislative/sap/105-2/text
Disallow:       /omb/legislative/sap/106-1/text
Disallow:       /omb/legislative/sap/106-2/text
Disallow:       /omb/legislative/sap/107-1/appropriations/text
Disallow:       /omb/legislative/sap/107-1/number/text
Disallow:       /omb/legislative/sap/107-1/subcommittee/text
Disallow:       /omb/legislative/sap/107-1/text
Disallow:       /omb/legislative/sap/107-2/text
Disallow:       /omb/legislative/sap/1997/text
Disallow:       /omb/legislative/sap/1998/text
Disallow:       /omb/legislative/sap/1999/text
Disallow:       /omb/legislative/sap/2000/text
Disallow:       /omb/legislative/sap/text
Disallow:       /omb/legislative/testimony/text
Disallow:       /omb/legislative/text
Disallow:       /omb/memoranda/text
Disallow:       /omb/mgmt-gpra/text
Disallow:       /omb/organization/text
Disallow:       /omb/procurement/text
Disallow:       /omb/pubpress/text
Disallow:       /omb/recruitment/text
Disallow:       /omb/reports/text
Disallow:       /omb/text
Disallow:       /omb/whatsnew/text
Disallow:       /onap/text
Disallow:       /pfiab/text
Disallow:       /president/100days/text
Disallow:       /president/american-flag/text
Disallow:       /president/attack-response/text
Disallow:       /president/domestic-gallery/text
Disallow:       /president/gallery/photoessay/text
Disallow:       /president/gallery/text
Disallow:       /president/heartland-tour-gallery/text
Disallow:       /president/holiday/cards/text
Disallow:       /president/holiday/cheer/text
Disallow:       /president/holiday/deck-halls/text
Disallow:       /president/holiday/decorations/text
Disallow:       /president/holiday/hanukkah/text
Disallow:       /president/holiday/tree/text
Disallow:       /president/holiday/whtree/text
Disallow:       /president/images/text
Disallow:       /president/independence-day/text
Disallow:       /president/international-gallery/text
Disallow:       /president/intl-gallery2/text
Disallow:       /president/intl-gallery3/text
Disallow:       /president/intl-gallery4/text
Disallow:       /president/intl-gallery5/text
Disallow:       /president/intl-gallery6/text
Disallow:       /president/presidential-homes/text
Disallow:       /president/putin-visit/text
Disallow:       /president/statedinner-mexico-200109/text
Disallow:       /president/statedinnerprep-mexico-200109/text
Disallow:       /president/statevisitday2-mexico-200109/TEMP/text
Disallow:       /president/statevisitday2-mexico-200109/text
Disallow:       /president/tee-ball-01/text
Disallow:       /president/tee-ball-02/text
Disallow:       /president/tee-ball-03/text
Disallow:       /president/text
Disallow:       /president/world-leaders/text
Disallow:       /response/diplomatic/text
Disallow:       /response/military/text
Disallow:       /response/text
Disallow:       /text
Disallow:       /vicepresident/images/text
Disallow:       /vicepresident/news-speeches/speeches/print/text
Disallow:       /vicepresident/news-speeches/speeches/text
Disallow:       /vicepresident/news-speeches/text
Disallow:       /vicepresident/photoessay/text
Disallow:       /vicepresident/text
Disallow:       /whmo/text


User-agent:     whsearch
Disallow:       /cgi-bin
Disallow:       /search
Disallow:       /query.html
Disallow:       /help
Disallow:       /sitemap.html
Disallow:       /privacy.html
Disallow:       /accessibility.html
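
The White House file has two records, and a robot obeys only one of them. It uses the record whose User-agent line names it, or the "*" record if none does; the two are not combined. So on these rules whsearch is barred from /sitemap.html, which other robots may fetch. At the same time, it is not barred from the long list of /text paths that every other robot is asked to skip. The sketch below checks this with Python's standard urllib.robotparser against an abridged copy of the file. "SomeOtherBot" and the allowed helper are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Abridged from the White House file above: a record for all robots
# and a separate record for the agent named "whsearch".
WHITEHOUSE = """\
User-agent:     *
Disallow:       /cgi-bin
Disallow:       /search
Disallow:       /president/text

User-agent:     whsearch
Disallow:       /cgi-bin
Disallow:       /search
Disallow:       /sitemap.html
"""

rp = RobotFileParser()
rp.parse(WHITEHOUSE.splitlines())


def allowed(agent, path):
    """True if the parsed rules let `agent` fetch `path` on this host."""
    return rp.can_fetch(agent, "http://www.whitehouse.gov" + path)


for agent in ("whsearch", "SomeOtherBot"):
    for path in ("/sitemap.html", "/president/text", "/search"):
        print(agent, path, allowed(agent, path))
```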


European Union

http://www.europa.eu.int/robots.txt

# robots.txt for EUROPA httpd-80 production server
#
# created by Rudi Mosselmans on 8/10/96
#
User-agent: *           # match any robot name
Disallow: /cgi-bin/     # don't allow robots into cgi-bin
Disallow: /comm/agriculture/rica/dwh/   # prevent robots from overrunning SAS
Disallow: /comm/commissioners/liikanen/_ # Albert Rouben 20010918
Disallow: /comm/commissioners/liikanen/bin # Albert Rouben 20010918


United Nations

http://www.un.org/robots.txt

# robots.txt for http://www.cyberschoolbus.org/
User-agent: *
Disallow: 
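
Note that an empty Disallow line excludes nothing, so this file tells every robot it may go anywhere. A quick check with Python's urllib.robotparser, using a hypothetical robot name and page path:

```python
from urllib.robotparser import RobotFileParser

# The United Nations file above: one record whose Disallow value is empty.
UN = """\
# robots.txt for http://www.cyberschoolbus.org/
User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(UN.splitlines())

# An empty Disallow excludes nothing, so every path is open to every robot.
print(rp.can_fetch("AnyBot", "http://www.un.org/"))
print(rp.can_fetch("AnyBot", "http://www.un.org/some/deep/page.html"))
```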


National Security Agency

http://www.nsa.gov/robots.txt

User-agent: *
Disallow: /images/
Disallow: /templates/
Disallow: *.gif
Disallow: notice.html
Disallow: statistics.html
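
Under the original 1994 exclusion standard, a Disallow value is matched as a literal prefix of the URL path, and every URL path begins with "/". Wildcards such as "*" were a later extension honored by some crawlers, and were only standardized in RFC 9309 (2022). So a strictly prefix-matching robot would likely ignore the last three NSA lines: "*.gif" is not a wildcard, and "notice.html" lacks the leading slash. Python's urllib.robotparser follows the prefix rule, which the sketch uses to show this. Other crawlers may read the file differently. The page paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# The NSA file above, verbatim.
NSA = """\
User-agent: *
Disallow: /images/
Disallow: /templates/
Disallow: *.gif
Disallow: notice.html
Disallow: statistics.html
"""

rp = RobotFileParser()
rp.parse(NSA.splitlines())


def allowed(path):
    return rp.can_fetch("AnyBot", "http://www.nsa.gov" + path)


print(allowed("/images/seal.gif"))  # False: matched by the /images/ prefix
print(allowed("/seal.gif"))         # True: "*.gif" is not treated as a wildcard
print(allowed("/notice.html"))      # True: "notice.html" has no leading slash
```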


Australian Defence Signals Directorate

http://www.dsd.gov.au/robots.txt

User-agent: *
Disallow: /cgi-bin/sources


Federal Bureau of Investigation

http://www.fbi.gov/robots.txt

User-Agent:
Disallow:


US Army

http://www.army.mil/robots.txt

User-agent: * 
Disallow: /cgi-bin/ 
Disallow: /reports/ 
Disallow: /summary/ 
Disallow: /old_design_ahp/ 
Disallow: /beta/ 
Disallow: /Documentation/ 
Disallow: /old_design_ahp/ 
Disallow: /images/ 
Disallow: /logs/ 
Disallow: /monthly/ 
Disallow: /photos/ 
Disallow: /vetinfo/ 
Disallow: /100days/ 
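
The Army lists /old_design_ahp/ twice. Verisign repeats /microsoft/ below, and IMDb repeats several entries. Such repeats are harmless, since the second copy never changes the outcome, but they suggest the files are maintained by hand. Here is a hypothetical lint helper (our own, not an existing tool) that reports repeats within a single record, using an abridged copy of the Army file:

```python
def duplicate_disallows(robots_txt):
    """Return Disallow values listed more than once within the same record."""
    dupes, seen, in_rules = [], set(), False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:                          # a new record starts here
                seen, in_rules = set(), False
        elif field == "disallow":
            in_rules = True
            if value in seen and value not in dupes:
                dupes.append(value)
            seen.add(value)
    return dupes


ARMY = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /old_design_ahp/
Disallow: /beta/
Disallow: /old_design_ahp/
Disallow: /images/
"""

print(duplicate_disallows(ARMY))  # ['/old_design_ahp/']
```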


US Air Force

http://www.af.mil/robots.txt

#robots.txt to skip over CGI directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /passcgi/
Disallow: /tmp/


House of Representatives

http://www.house.gov/robots.txt

#
#   No robots allowed in the following directories !
#
User-agent: *
Disallow: /htbin
Disallow: /docs/ARCHIVE
Disallow: /docs/apps
Disallow: /docs/moved_sites
Disallow: /docs/temp
Disallow: /docs/test
Disallow: /docs/sites/bin
Disallow: /docs/sites/etc
Disallow: /docs/sites/dev
Disallow: /docs/sites/usr
Disallow: /docs/sites/other/webassistance


Department of Justice

http://www.usdoj.gov/robots.txt

User-agent: *
Disallow: /Admin/
Disallow: /cgi-bin/
Disallow: /help/
Disallow: /img/
Disallow: /gif/
Disallow: /ins/
Disallow: /gopherdata/
Disallow: /ojp/
Disallow: /wusage/
Disallow: /archive/
Disallow: /opa/pr/support/

User-agent: Netscape-Compass-Robot/Archive
Disallow: 


US Courts

http://www.uscourts.gov/robots.txt

User-Agent:
Disallow:


Treasury Department

http://www.treasury.gov/robots.txt

User-agent: *
Disallow: /cgi-bin/
Disallow: /getstats/
Disallow: /home-temp/
Disallow: /logs/
Disallow: /new.junk/
Disallow: /public/
Disallow: /statbot/
Disallow: /templates/
Disallow: /webcache/
Disallow: /test/


Internal Revenue Service

http://www.irs.gov/robots.txt

User-agent: *
Disallow: /foo/foobar.html
Disallow: /barfoo


Los Alamos National Laboratory

http://www.lanl.gov/robots.txt

User-agent: *
Disallow: /tools/hypermail/
Disallow: /projects/etcap/bib/
Disallow: /orgs/cic/cic1/testsite
Disallow: /orgs/im/im1/testsite
Disallow: /projects/asci/statusreports/
Disallow: /projects/asci/ascijobs/
Disallow: /projects/asci/OLD_ARCHIVE/
Disallow: /projects/asci/bluemtn/OLD_BLUE_ARCHIVE/
Disallow: /projects/sme/OLD_ARCHIVE/
Disallow: /www-team/
Disallow: /projects/wwwug/OLD_STUFF
Disallow: /projects/asci/DCE/OLD
Disallow: /orgs/citpo


Lawrence Livermore National Laboratory

http://www.llnl.gov/robots.txt

# robots.txt file for www.llnl.gov
User-agent: *           # all web crawlers and searchers
Disallow: /tmp/         # temp files
#Disallow: /www/llnl-bin/       # stay out of binaries
#Disallow: /www/llnl_only       # stay out of internal
#Disallow: /www/llnl_only-bin   # stay out of internal binaries
#Disallow: /www/review          # stay out of unreviewed pages
# This is how Lee thinks this should look
Disallow: cgi-bin/              # stay out of binaries
Disallow: llnl-bin/     # stay out of binaries
Disallow: /llnl-bin/    # stay out of binaries
#Disallow: /llnl_only/  # stay out of internal
#          disallowed by httpd server
Disallow: /llnl_only-bin/       # stay out of internal binaries
Disallow: /development/ # Stay out of development
Disallow: /development-bin/  # stay out of development-bin dirs
Disallow: /review/              # stay out of unreviewed pages
Disallow: /stats/               # stay out of statistics pages
Disallow: /llnl_only/stats/             # stay out of statistics pages
Disallow: /llnl/lists/historyarc  # Stay out of list-of-lists history
Disallow: /historyarc/  # Stay out of list-of-lists history
Disallow: atp/comprehensive2-95.html  # jed's killers
Disallow: atp/www-servers.html
Disallow: atp/telecom-media.html
Disallow: /atp/crackdown/               # wrong stuff
Disallow: llnl_only/tid/lof/test        # Library of Future test files
Disallow: llnl/lists/   # memory fault problems
Disallow: /www/IPandC/opportunities93   # obsolete pages
Disallow: /www/tid/lof/documents        # lof pages, index them manually
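
The Livermore file mixes live rules with commented-out ones. It also includes several values without a leading slash, such as "cgi-bin/" and "atp/comprehensive2-95.html", which a prefix-matching robot will never match. As the file itself notes, /llnl_only/ is kept private by the web server rather than by robots.txt, which is the only real protection, since robots.txt is just a request. Here is a sketch with Python's urllib.robotparser on an abridged copy. The page paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Abridged from the Livermore file above.
LLNL = """\
User-agent: *           # all web crawlers and searchers
Disallow: /tmp/         # temp files
#Disallow: /www/llnl_only       # stay out of internal
Disallow: cgi-bin/              # stay out of binaries
Disallow: /llnl-bin/    # stay out of binaries
Disallow: /review/              # stay out of unreviewed pages
"""

rp = RobotFileParser()
rp.parse(LLNL.splitlines())


def allowed(path):
    return rp.can_fetch("AnyBot", "http://www.llnl.gov" + path)


print(allowed("/tmp/x.html"))       # False: live rule; trailing comment ignored
print(allowed("/www/llnl_only/x"))  # True: that whole line is a comment
print(allowed("/cgi-bin/search"))   # True: "cgi-bin/" has no leading slash
print(allowed("/llnl-bin/search"))  # False
```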

_________________

[Thanks to SP.]

"jed's killers" may have to do with security -

 "I am a person who likes challenges, physical and mental. My university degrees are bachelors in Mathematics and Physics, and a Masters in Mathematics from the Davis campus of the University of California (where I REALLY enjoyed school and packed in every class I possibly could). I was a hacker coming out of school. I dropped into a wonderful environment at the Lawrence Livermore Laboratory where I was able to hack the early ARPA network (circa 1973) implementing network protocols and breaking into computer systems on the net as part of a "tiger" team. I was the technical liaison for LLNL to the ARPA network during the middle 1970s. Although my given name is James, there were too many kids that answered to "Jim" in high school, so I changed my preferred nickname to "Jed", my initials." - http://www.webstart.com/jed/jed-personal.html


Verisign

http://verisign.com/robots.txt

User-Agent: *
Disallow: /about/
Disallow: /aol/
Disallow: /att/
Disallow: /authentic/
Disallow: /aventail/
Disallow: /b2b/
Disallow: /cd/
Disallow: /cdrom/
Disallow: /checkpoint/
Disallow: /client/
Disallow: /clientauth/
Disallow: /contact/
Disallow: /cps/
Disallow: /criticalpath/
Disallow: /cus/
Disallow: /demos/
Disallow: /developers/
Disallow: /dm/
Disallow: /domain/
Disallow: /ebiz/
Disallow: /employment/
Disallow: /error/
Disallow: /events/
Disallow: /exchange/
Disallow: /feature/
Disallow: /gov/
Disallow: /government/
Disallow: /graphics/
Disallow: /idcenter/
Disallow: /images/
Disallow: /installshield/
Disallow: /investor/
Disallow: /its/
Disallow: /japan/
Disallow: /learn/
Disallow: /library/
Disallow: /link/
Disallow: /lobby/
Disallow: /mcsp/
Disallow: /microsoft/
Disallow: /netscape/
Disallow: /microsoft/
Disallow: /msmail/
Disallow: /netsure/
Disallow: /newballgame/
Disallow: /nike/
Disallow: /nowsafe/
Disallow: /nsi/
Disallow: /nspremsvcs/
Disallow: /offer/
Disallow: /onsite/
Disallow: /partner/
Disallow: /payment/
Disallow: /press/
Disallow: /product/
Disallow: /rpa/
Disallow: /rpa-kr/
Disallow: /rsa2000/
Disallow: /rsc/
Disallow: /securemail/
Disallow: /server/
Disallow: /servicecenter/
Disallow: /services/
Disallow: /set/
Disallow: /sia/
Disallow: /signio/
Disallow: /site/
Disallow: /smime/
Disallow: /solutions/
Disallow: /spectrum/
Disallow: /spt/
Disallow: /supporyt/
Disallow: /trial/
Disallow: /transarc/
Disallow: /update/
Disallow: /valid/
Disallow: /vpnseminar/
Disallow: /vselp/
Disallow: /webtrust/
Disallow: /westgroup/
Disallow: /whitepaper/
Disallow: /win2000/
Disallow: /wireless/
Disallow: /y2k/


New York Times

http://www.nytimes.com/robots.txt

# robots.txt, nytimes.com 1/18/2001
#

User-agent: *
Disallow:   /96
Disallow:   /97
Disallow:   /98
Disallow:   /99
Disallow:   /00
Disallow:   /01
Disallow:   /1996
Disallow:   /1997
Disallow:   /1998
Disallow:   /1999
Disallow:   /2000
Disallow:   /2001
Disallow:   /library
Disallow:   /aponline
Disallow:   /reuters
Disallow:   /cnet
Disallow:   /partners
Disallow:   /archives
Disallow:   /indexes
Disallow:   /events
Disallow:   /features
Disallow:   /reference
Disallow:   /specials
Disallow:   /services
Disallow:   /thestreet
Disallow:   /weather
Disallow:   /RealMedia
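
Because Disallow values are path prefixes, "/library" with no trailing slash also covers any path that merely begins with those letters. The 1994 standard gives the same example: "/help" blocks both /help.html and /help/index.html, while "/help/" blocks only the latter. Likewise "/2000" catches "/20001231.html" but not "/2002/". Here is a sketch with Python's urllib.robotparser on a few of the Times lines. The page paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A few lines from the New York Times file above.
NYT = """\
User-agent: *
Disallow:   /00
Disallow:   /2000
Disallow:   /library
"""

rp = RobotFileParser()
rp.parse(NYT.splitlines())


def allowed(path):
    return rp.can_fetch("AnyBot", "http://www.nytimes.com" + path)


print(allowed("/library/national/story.html"))  # False: the intended match
print(allowed("/librarycard.html"))             # False: prefix matches this too
print(allowed("/2002/01/18/index.html"))        # True: "/2002" is not listed
print(allowed("/20001231.html"))                # False: starts with "/2000"
```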


Morgan Stanley

http://www.morganstanley.com/robots.txt

User-agent: *
Disallow: /institutional/investmentmanagement/10
Disallow: /institutional/investmentmanagement/20
Disallow: /institutional/investmentmanagement/30
Disallow: /institutional/investmentmanagement/40
Disallow: /institutional/investmentmanagement/50
Disallow: /institutional/investmentmanagement/hnavs
Disallow: /institutional/investmentmanagement/products
Disallow: /institutional/investmentmanagement/clbuttons
Disallow: /institutional/investmentmanagement/img
Disallow: /institutional/investmentmanagement/cgi-bin/msdwim/parser.pl
Disallow: /institutional/investmentmanagement/cgi-bin/msdwim/siteSearch.pl
Disallow: /institutional/investmentmanagement/cgi-bin/msdwim/productSearch.pl
Disallow: /institutional/investmentmanagement/70/71
Disallow: /institutional/investmentmanagement/70/72
Disallow: /institutional/investmentmanagement/70/73
Disallow: /institutional/investmentmanagement/70/74
Disallow: /institutional/investmentmanagement/70/75


Citibank

http://www.citibank.com/robots.txt

# robots.txt for http://www.citicorp.com/

User-agent: *
Disallow: /cgi-bin/             # scripts
Disallow: /usage/               # WWW usage statistics
Disallow: /statistics/          # UNIX statistics
Disallow: /wwwstat/             # More WWW stats
Disallow: /accesswatch/         # Even More WWW stats
Disallow: /branches/AP          # Asian/Pacific Branches (too many...)
Disallow: /branches/EU          # European Branches (too many...)
Disallow: /branches/LA          # Latin American Branches (too many...)
Disallow: /branches/NA          # North American Branches (too many...)


Yale University

http://www.yale.edu/robots.txt

User-agent: *
Disallow: /engineering/
Disallow: /webmaster/stats/
Disallow: /webmaster/logs/
Disallow: /napster/


Stanford University

http://www.stanford.edu/leland/robots.txt

# Robot Policy file as per Robot Exclusion standard 17-jun-94
User-agent: Lycos
Disallow: /
User-agent: Lycos_Spider_(T-Rex)/1.0
Disallow: /
User-agent: Lycos_Spider_(T-Rex)/3.0
Disallow: /
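
The Stanford file names only Lycos robots and has no "*" record, so any robot it does not name is unrestricted. Python's urllib.robotparser matches names loosely, as we read its source: a case-insensitive substring test on the robot's name. So here the plain "Lycos" record already catches the T-Rex spider. One more wrinkle: robots look for the file only at /robots.txt on a host, so this copy under /leland/ would presumably matter only if the same rules were also served at the root. "SomeOtherBot" is a hypothetical name:

```python
from urllib.robotparser import RobotFileParser

# The Stanford file above, verbatim apart from the comment line.
STANFORD = """\
User-agent: Lycos
Disallow: /
User-agent: Lycos_Spider_(T-Rex)/1.0
Disallow: /
User-agent: Lycos_Spider_(T-Rex)/3.0
Disallow: /
"""

rp = RobotFileParser()
rp.parse(STANFORD.splitlines())

url = "http://www.stanford.edu/leland/index.html"
print(rp.can_fetch("Lycos_Spider_(T-Rex)/3.0", url))  # False: a named record
print(rp.can_fetch("SomeOtherBot", url))              # True: no record applies
```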


Massachusetts Institute of Technology

http://www.mit.edu/robots.txt

# robots.txt for http://www.mit.edu/

User-agent: *
Disallow: /cgi/
Disallow: /comment
Disallow: /finger
Disallow: /machine
Disallow: /zlocate
Disallow: /zwrite


Federation of American Scientists

http://fas.org/robots.txt

User-agent: *   
Disallow: /eye/kosovo
User-agent: ia_archiver   
Disallow: /irp/overhead/
User-agent: ia_archiver
Disallow: /irp/facilities/
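
FAS names ia_archiver in two separate records. RFC 9309 (2022) says all groups matching a robot are combined, so ia_archiver should skip both /irp/overhead/ and /irp/facilities/. Older parsers may stop at the first matching record; as far as we can tell from its source, Python's urllib.robotparser does. In neither reading does ia_archiver inherit the "*" rule for /eye/kosovo. The sketch below is our own hypothetical helper implementing the combining reading. It assumes robot names are compared case-insensitively as whole tokens, and real crawlers differ on that point:

```python
def rules_for(robots_txt, agent):
    """Collect the Disallow values that apply to `agent`.

    Every group naming the agent is merged; "*" groups are used only
    when no group names it.
    """
    groups, agents, rules = [], [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()       # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:                              # User-agent after rules
                groups.append((agents, rules))     # closes the previous group
                agents, rules = [], []
            agents.append(value.lower())
        elif field == "disallow" and agents:
            rules.append(value)
    if agents:
        groups.append((agents, rules))

    wanted = agent.lower()
    named = [r for names, r in groups if wanted in names]
    chosen = named if named else [r for names, r in groups if "*" in names]
    return [v for r in chosen for v in r if v]    # empty Disallow: no rule


FAS = """\
User-agent: *
Disallow: /eye/kosovo
User-agent: ia_archiver
Disallow: /irp/overhead/
User-agent: ia_archiver
Disallow: /irp/facilities/
"""

print(rules_for(FAS, "ia_archiver"))   # ['/irp/overhead/', '/irp/facilities/']
print(rules_for(FAS, "SomeOtherBot"))  # ['/eye/kosovo']
```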


Safeweb

http://safeweb.com/robots.txt

# exclude help system from robots
User-agent: *
Disallow: /manual/ 
Disallow: /doc/ 
Disallow: /gif/
# but allow htdig to index our doc-tree
User-agent: susedig
Disallow:


Anonymizer

http://anonymizer.com/robots.txt

User-agent: *
Disallow: /china/
Disallow: /documents/
Disallow: /errors/
Disallow: /images/
Disallow: /includes/
Disallow: /india/
Disallow: /japan/
Disallow: /styles/
Disallow: /cgi-bin/


Electronic Frontier Foundation

http://www.eff.org/robots.txt

User-agent: *    # applies to all robots
Disallow: /temp      # disallow indexing of these pages
Disallow: /test
Disallow: /Temp
Disallow: /Test
Disallow: /tmp
Disallow: /Tmp
Disallow: /templates
Disallow: /Templates
Disallow: /internal
Disallow: /Internal
Disallow: /staff
Disallow: /Staff
Disallow: /old
Disallow: /Old
Disallow: /duh
Disallow: /homes/mech/Temp
Disallow: /homes/mech/A-G
Disallow: /~mech/Temp
Disallow: /~mech/A-G
Disallow: /˜mech/Temp
Disallow: /˜mech/A-G
Disallow: /˜mech/Temp
Disallow: /˜mech/A-G
Disallow: /˜mech/Temp
Disallow: /˜mech/A-G
Disallow: /%7Emech/Temp
Disallow: /%7emech/A-G
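
EFF lists /temp and /Temp (and so on) separately because paths are compared case-sensitively, at least by Python's urllib.robotparser. It also lists /~mech alongside /%7Emech because a tilde can be written percent-encoded. Python's parser decodes %7E before comparing, so the two spellings end up equivalent there. Here is a sketch on an abridged copy of the file. The page paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Abridged from the EFF file above.
EFF = """\
User-agent: *    # applies to all robots
Disallow: /temp      # disallow indexing of these pages
Disallow: /Temp
Disallow: /~mech/Temp
Disallow: /%7Emech/Temp
"""

rp = RobotFileParser()
rp.parse(EFF.splitlines())


def allowed(path):
    return rp.can_fetch("AnyBot", "http://www.eff.org" + path)


print(allowed("/temp/a.html"))          # False
print(allowed("/TEMP/a.html"))          # True: no rule spells it in capitals
print(allowed("/%7Emech/Temp/a.html"))  # False: %7E is decoded to "~" first
```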


Cryptome

http://cryptome.org/robots.txt

# go away
User-agent: *
Disallow: /


Added 15 February 2002

[Thanks to BH.]

LOL I like the message at the end of this one. :)

The Internet Movie Database

http://us.imdb.com/robots.txt

# robots.txt for http://us.imdb.com/  & mirror sites
User-agent: *
Disallow: /MyMovies
Disallow: /register
Disallow: /tiger_redirect
Disallow: /Title/ASIN*
Disallow: /M/
Disallow: /Ballot/
Disallow: /Icons/
Disallow: /Movies/
Disallow: /harvest_me
Disallow: /Tsearch
Disallow: /Nsearch
Disallow: /Credits
Disallow: /Details
Disallow: /More
Disallow: /Bio
Disallow: /List
Disallow: /GName
Disallow: /SName
Disallow: /FName
Disallow: /AName
Disallow: /RName
Disallow: /PName
Disallow: /VName
Disallow: /Movies
Disallow: /Companies
Disallow: /Mlinks
Disallow: /Guests
Disallow: /Quotes
Disallow: /OnThisDay
Disallow: /BusinessThisDay
Disallow: /Goofs
Disallow: /Trivia
Disallow: /Goofs
Disallow: /Soundtracks
Disallow: /CrazyCredits
Disallow: /AlternateVersions
Disallow: /Recommendations
Disallow: /AddRecommendation
Disallow: /Reviews
Disallow: /Tawards
Disallow: /Ratings
Disallow: /Awards
Disallow: /Sales
Disallow: /SearchBios
Disallow: /BTrivia
Disallow: /BQuotes
Disallow: /BWorks
Disallow: /BPublicity
Disallow: /SearchQuotes
Disallow: /BAgent
Disallow: /Business
Disallow: /Taglines
Disallow: /ReleaseDates
Disallow: /Locations
Disallow: /Technical
Disallow: /Laserdisc
Disallow: /DVD
Disallow: /Laserdisc
Disallow: /Literature
Disallow: /Trailers
Disallow: /NUrls
Disallow: /TUrls
Disallow: /Ratings
Disallow: /OnTV
Disallow: /Ontv
Disallow: /Psales
Disallow: /Pawards
Disallow: /Posters
Disallow: /Showing
Disallow: /Quiz
Disallow: /BornInYear
Disallow: /DiedInYear
Disallow: /MarriedInYear
Disallow: /ExciteTitle
Disallow: /TitleBrowse
Disallow: /Vote
Disallow: /WorkedWith
Disallow: /Character
Disallow: /SearchTrivia
Disallow: /SearchLiterature
Disallow: /SearchGoofs
Disallow: /SearchTechnical
Disallow: /SearchRatios
Disallow: /SearchBusiness
Disallow: /SearchLaserdisc
Disallow: /SearchDVD
Disallow: /SearchAwards
Disallow: /SearchSongs
Disallow: /SearchVersions
Disallow: /SearchCrazy
Disallow: /SearchPlots
Disallow: /SearchPlotWriters
Disallow: /SearchTaglines
Disallow: /ShowAll
Disallow: /LocationTree
Disallow: /JointVentures
Disallow: /pick_n_mix
Disallow: /prepare_data
Disallow: /name_pick_n_mix
Disallow: /MetaSearch
Disallow: /HelpPage
Disallow: /ActorSearch
Disallow: /ActressSearch
Disallow: /BornWhere
Disallow: /Overlap
Disallow: /ReleasedInYear
Disallow: /DiedWhere
Disallow: /Maltin
Disallow: /CommentsShow
Disallow: /CommentsEnter
Disallow: /CommentsAuthor
Disallow: /CommentsIndex
Disallow: /Showtimes
Disallow: /Find
Disallow: /Lookup
Disallow: /boards

# stop reading here! :-)

# Stay out of these directories they contain temporary files, or will
#  cause unwanted DATABASE QUERY system load or BOTH.

# Robots that cause a denial of service will be charged $0.01 per unwanted
# request to this site. Follow the guidelines + robots.txt convention and
# we can all live happily together.
#
# If you have any questions or need permission to work inside of the
# protected URLs, please contact www @ imdb.com


#  Mozilla/4.0 (compatible; MSIE 5.12; Mac_PowerPC) ... we are watching you!


Defense Intelligence Agency

http://www.dia.mil/robots.txt

# robots.txt
#
# purpose:  Exclude robots from directories. 
#
# description:  A 'robots.txt' file consists of 'records'
# which are composed of a 'User-agent' line followed by 
# some number of 'Disallow' directives.  There should NOT 
# be any blank lines between the 'User-agent' and 'Disallow'
# directives.  There may be multiple 'records' for different
# named 'User-agents'.
#
# modification history
# ----------------------------------------------------------
# 20000619 Created
#
# ----------------------------------------------------------

User-agent: *
Disallow: /Business
Disallow: /Careers
Disallow: /Errors
Disallow: /Graphics
Disallow: /History
Disallow: /Jmic
Disallow: /Public
Disallow: /This
Disallow: /bin


Argonne National Laboratory

http://www.anl.gov/robots.txt

# robots.txt for www.anl.gov

User-agent: *
Disallow: /x500
Disallow: /ECT/et
Disallow: /ECT/templates
Disallow: /pjbmosaic
Disallow: /ECT/djs
Disallow: /OPA/galley/captions/
Disallow: /CPP/exploit.html
Disallow: /MERTC/


Princeton University

http://www.princeton.edu/robots.txt

# General purpose
User-agent: *
Disallow: /cgi-bin/
Disallow: /cgi/      
Disallow: /~mablount/unix/      #   an infinite virtual URL space
Disallow: /~mablount/mablount/  #   an infinite virtual URL space
Disallow: /~mkporwit/pub_links/ #   an infinite virtual URL space
Disallow: /~sdesiano/pub_links/ #   an infinite virtual URL space
Disallow: /~gmsierra/pub/       #   an infinite virtual URL space
Disallow: /~lhjensen/pub_links/ #   an infinite virtual URL space
Disallow: /~euphorb/Issues/     #   an infinite virtual URL space
Disallow: /~mablount/Submissions/       #   an infinite virtual URL space
Disallow: /~mablount/Staff/     #   an infinite virtual URL space
Disallow: /webinator/
#Disallow: /Princeton/GG/
Disallow: /~financial/          #  Under development
Disallow: /dev/                 #  CIT Web Services Group development area


American Airlines

http://www.aa.com/robots.txt

User-agent: *
Disallow: /404
Disallow: /aa_home
Disallow: /aad/*
Disallow: /adjust_profile.tmpl
Disallow: /away
Disallow: /back.html
Disallow: /backdoor.html
Disallow: /backto.html
Disallow: /bookmark.tmpl
Disallow: /bookmarks
Disallow: /bottom.html
Disallow: /citibank
Disallow: /corporate
Disallow: /countries.txt
Disallow: /default.html
Disallow: /directURL
Disallow: /editorials
Disallow: /entry.tmpl
Disallow: /error
Disallow: /example.html
Disallow: /footerlong.html
Disallow: /footershort.html
Disallow: /frameset.html
Disallow: /global-metanav.html
Disallow: /global-metanav.html.secure
Disallow: /graphics.html
Disallow: /images
Disallow: /index-error.tmpl
Disallow: /index-guest.tmpl
Disallow: /index-member.tmpl
Disallow: /index.tmpl
Disallow: /index_sitedown.html
Disallow: /intro.html
Disallow: /list.txt
Disallow: /main-bookmark.tmpl
Disallow: /main-guest.tmpl
Disallow: /main-member.tmpl
Disallow: /media
Disallow: /message.html
Disallow: /message.html.down
Disallow: /meta
Disallow: /nav-member.tmpl
Disallow: /navguest.tmpl
Disallow: /npd
Disallow: /onlinelogo.gif
Disallow: /pands
Disallow: /profile8.tmpl
Disallow: /promo
Disallow: /prototypes
Disallow: /redirect.links.tmpl
Disallow: /redirect.tmpl
Disallow: /redirect2.tmpl
Disallow: /redirect3.tmpl
Disallow: /redirect4.tmpl
Disallow: /secure
Disallow: /secure2
Disallow: /shop.html
Disallow: /sitetour
Disallow: /specials
Disallow: /specialsfarecontent.tmpl
Disallow: /specialsfaremore.tmpl
Disallow: /targeting
Disallow: /TAusage.html.tmpl
Disallow: /ticonderoga
Disallow: /travelp/*
Disallow: /upgrade.html
Disallow: /wait.html
Disallow: /wait.new.html
#
# USWeb additions
Disallow: /usw-metrics


Disney

http://disney.go.com/robots.txt

#
# robot.txt file for http://disney.go.com/
#

User-agent: *
Disallow: /cgi-bin
Disallow: /Mail
Disallow: /Help
Disallow: /Search
Disallow: /Sign-in
Disallow: /ActiveDesktop
Disallow: /Legal
Disallow: /Ads/
Disallow: /DesignOnline
Disallow: /DisneyCareers
Disallow: /DisneyStore
Disallow: /DisneyWorld/Local
Disallow: /TravelAgents/Booking
Disallow: /DisneyVacationClub/VideoOffer
Disallow: /globalmedia/chrome
Disallow: /dmail 

User-agent: Ultraseek
Disallow: /cgi-bin
Disallow: /Mail
Disallow: /Search
Disallow: /Sign-in
Disallow: /ActiveDesktop
Disallow: /Legal
Disallow: /Ads/
Disallow: /DesignOnline
Disallow: /DisneyWorld/Local
Disallow: /TravelAgents/Booking
Disallow: /DisneyVacationClub/VideoOffer
Disallow: /globalmedia/chrome
Disallow: /dmail
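Disney gives Ultraseek its own record. Under the standard, a robot obeys the record that names it and ignores the `*` record. Ultraseek is therefore not asked to skip `/Help`, `/DisneyCareers` or `/DisneyStore`. Python's standard `urllib.robotparser` behaves this way. The sketch below uses an excerpt of the file; `SomeBot` is a made-up agent name.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of http://disney.go.com/robots.txt as archived above.
DISNEY = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /Help
Disallow: /DisneyCareers
Disallow: /DisneyStore

User-agent: Ultraseek
Disallow: /cgi-bin
Disallow: /Mail
"""

rp = RobotFileParser()
rp.parse(DISNEY.splitlines())  # parse() takes an iterable of lines

for agent in ("Ultraseek", "SomeBot"):
    for path in ("/Help/faq.html", "/DisneyStore/", "/cgi-bin/x"):
        url = "http://disney.go.com" + path
        # Ultraseek is checked only against its own record; SomeBot falls back to '*'.
        print(agent, path, rp.can_fetch(agent, url))
```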


Electronic Data Systems (EDS)

http://www.eds.com/robots.txt

User-agent: *
Disallow: /ads/
Disallow: /database/
Disallow: /excite/
Disallow: /images/
Disallow: /administrative/common_includes/
Disallow: /administrative/common_images/
Disallow: /administrative/maintenance/
Disallow: /administrative/portal/
Disallow: /administrative/stock_info/
Disallow: /administrative/swish/
Disallow: /administrative/excite/
Disallow: /administrative/ir_info/
Disallow: /administrative/images/
Disallow: /administrative/news/
Disallow: /administrative/stock_info/
Disallow: /administrative/search/
Disallow: /95annual_text/
Disallow: /careers/closer_look/
Disallow: /gmec/  
Disallow: /general/workforce_mgmt/client/
Disallow: /case_studies/security/
Disallow: /general/career_resource/
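The Disallow lines sketch part of a site's directory tree, which is the point of reading these files. The Python sketch below groups the EDS paths by top-level directory and reports repeated lines; EDS lists `/administrative/stock_info/` twice. The helper names are illustrative, and the input is an excerpt.

```python
from collections import Counter, defaultdict

def disallowed_paths(robots_txt: str) -> list[str]:
    """Return Disallow values in file order, ignoring comments and empty values."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments, including commented-out rules
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

def tree(paths: list[str]) -> dict[str, list[str]]:
    """Group distinct paths under their first path segment, e.g. '/administrative/'."""
    groups = defaultdict(list)
    for p in sorted(set(paths)):
        first = p.strip("/").split("/", 1)[0]
        top = f"/{first}/" if first else "/"
        groups[top].append(p)
    return dict(groups)

EDS_EXCERPT = """\
User-agent: *
Disallow: /ads/
Disallow: /administrative/stock_info/
Disallow: /administrative/ir_info/
Disallow: /administrative/stock_info/
Disallow: /careers/closer_look/
Disallow: /case_studies/security/
"""

paths = disallowed_paths(EDS_EXCERPT)
duplicates = [p for p, n in Counter(paths).items() if n > 1]
for top, entries in tree(paths).items():
    print(top, entries)
print("listed more than once:", duplicates)
```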


Centers for Disease Control and Prevention

http://www.cdc.gov/robots.txt

# Ignore FrontPage files
User-agent: *
Disallow: /_borders
Disallow: /_derived
Disallow: /_fpclass
Disallow: /_overlay
Disallow: /_private
Disallow: /_themes
Disallow: /_vti_bin
Disallow: /_vti_cnf
Disallow: /_vti_log
Disallow: /_vti_map
Disallow: /_vti_pvt
Disallow: /_vti_txt

# Rover is a bad dog
User-agent: Roverbot
Disallow: /

# EmailSiphon is a hunter/gatherer which extracts email addresses for spam-mailers to use
User-agent: EmailSiphon
Disallow: /
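The CDC file shuts two named robots out entirely with `Disallow: /`. Every other robot only skips the FrontPage folders (`_vti_*`, `_borders` and similar). Honoring any of this is voluntary: a harvester like EmailSiphon can simply not read the file. The sketch below uses Python's `urllib.robotparser` to show how a compliant robot would read an excerpt of the file; `MyCrawler/1.0` is a made-up agent name.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of http://www.cdc.gov/robots.txt as archived above.
CDC = """\
# Ignore FrontPage files
User-agent: *
Disallow: /_private
Disallow: /_vti_pvt

# Rover is a bad dog
User-agent: Roverbot
Disallow: /

User-agent: EmailSiphon
Disallow: /
"""

rp = RobotFileParser()
rp.parse(CDC.splitlines())

base = "http://www.cdc.gov"
print(rp.can_fetch("EmailSiphon", base + "/index.htm"))     # named record: whole site excluded
print(rp.can_fetch("MyCrawler/1.0", base + "/index.htm"))   # no named record, so '*' applies
print(rp.can_fetch("MyCrawler/1.0", base + "/_vti_pvt/x"))  # FrontPage folder excluded by '*'
```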


US Agency for International Development

http://www.usaid.gov/robots.txt

User-agent: *
Disallow: /~*/
Disallow: /info_technology/xweb/
Disallow: /economic_growth/egad/
Disallow: /pop_health/dev/
Disallow: /regions/eni/partners/
Disallow: /test/
Disallow: /info_technology/itt/ptt
Disallow: /iss/
Disallow: /regions/afr/mission/test/rcsa/
Disallow: /test/senegal/
Disallow: /info_technology/xweb/stats/
Disallow: /stats/
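Because each Disallow value is a prefix, some USAID lines add nothing. `/test/senegal/` is already covered by `/test/`, and `/info_technology/xweb/stats/` by `/info_technology/xweb/`. The small Python check below finds such entries. It is a hypothetical helper using literal prefix matching, as in the 1994 standard, so `/~*/` is treated as the four characters `/~*/`.

```python
def covered_entries(rules: list[str]) -> dict[str, str]:
    """Map each rule to a different, shorter rule that is already a prefix of it."""
    covered = {}
    for rule in rules:
        for other in rules:
            if other != rule and rule.startswith(other):
                covered[rule] = other
                break
    return covered

# Disallow values from the USAID file archived above, in file order.
usaid = [
    "/~*/", "/info_technology/xweb/", "/economic_growth/egad/", "/pop_health/dev/",
    "/regions/eni/partners/", "/test/", "/info_technology/itt/ptt", "/iss/",
    "/regions/afr/mission/test/rcsa/", "/test/senegal/",
    "/info_technology/xweb/stats/", "/stats/",
]

for rule, by in covered_entries(usaid).items():
    print(f"{rule} is already covered by {by}")
```

`/regions/afr/mission/test/rcsa/` is not flagged, because `/test/` only covers paths that start with it.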


Department of Commerce

http://www.doc.gov/robots.txt

User agent: *
disallow: /TopicalIndex

User agent: *
disallow: /ops

User agent: *
disallow: /china

User agent: *
disallow: /_BORDERS


User agent: *
disallow: /_CUSUDI

User agent: *
disallow: /_DISC1

User agent: *
disallow: /_DISCU~1

User agent: *
disallow: /_PRIVATE

User agent: *
disallow: /_SDISC1

User agent: *
disallow: /_VTI_BIN

User agent: *
disallow: /_VTI_CNF

User agent: *
disallow: /_VTI_LOG

User agent: *
disallow: /_VTI_PVT

User agent: *
disallow: /_VTI_TXT

User agent: *
disallow: /CRC

User agent: *
disallow: /digeconomy

User agent: *
disallow: /directory

user agent: *
disallow: /ecommerce

user agent: *
disallow: /images

user agent: *
disallow: /pics

user agent: *
disallow: /listserv

user agent: *
disallow: /logos

user agent: *
disallow: /ops

user agent: *
disallow: /samples

User agent: *
disallow: /script3

user agent: *
disallow: /search

user agent: *
disallow: /srchadmin

user agent: *
disallow: /test

user agent: *
disallow: /tperl
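The Commerce file writes `User agent` without the hyphen. The standard's field is `User-agent`. Field names are case-insensitive, so the lowercase `disallow` is fine, but `User agent` with a space is just an unknown field. A strict parser never sees a record start and drops every Disallow line, so nothing is excluded. Python's `urllib.robotparser` is one such parser; more forgiving crawlers may behave differently. A sketch (`AnyBot` is a made-up agent name):

```python
from urllib.robotparser import RobotFileParser

def fetchable(robots_txt: str, path: str, agent: str = "AnyBot") -> bool:
    """Parse a robots.txt string and ask whether agent may fetch path on www.doc.gov."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, "http://www.doc.gov" + path)

as_published = "User agent: *\ndisallow: /china\n"  # no hyphen, as in the file above
corrected = "User-agent: *\ndisallow: /china\n"

print(fetchable(as_published, "/china/"))  # misspelled field: the Disallow line is ignored
print(fetchable(corrected, "/china/"))     # hyphenated field: /china is excluded
```

Even with the hyphen restored, the file repeats `User-agent: *` twenty-nine times. RFC 9309 says matching groups should be combined, but Python's parser keeps only the first `*` record, so different robots could still read this file differently.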


Food and Drug Administration

http://www.fda.gov/robots.txt

#robots.txt file for http://www.fda.gov 
User-agent: * 
Disallow: /scripts/ 
Disallow: /data/ 
Disallow: /binn/ 
Disallow: /cder/test/ 
Disallow: /opacom/area51/ 
Disallow: /oha/ 
Disallow: /oc/ofacs/staffmanuals/3130_3.htm 
Disallow: /oc/clips/ 
#Disallow: /cdrh/ftparea/cdrh/MDR/coll/mdr/mdrcoll/ 
Hit-rate: 30 
# wait 30 seconds before starting a new URL request default=30 
Visiting-hours: 23:00EDT-05:00EDT 
#index this site between 11PM - 5AM EDT
Concurrent-hits: 2
# limit concurrent active URLS to 2 for each index server 
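`Hit-rate`, `Visiting-hours` and `Concurrent-hits` are not part of the 1994 standard, which tells robots to ignore unrecognised fields; the file's own comments give the intended meaning. A robot that did honor `Visiting-hours` would have to handle a window that wraps past midnight (23:00 to 05:00). The Python sketch below assumes such a robot. The value format is inferred from the two examples on this page, and the zone suffix is ignored.

```python
import re

def parse_window(spec: str) -> tuple[int, int]:
    """Parse 'HH:MMZZZ-HH:MMZZZ' into (start, end) minutes after midnight; zone ignored."""
    m = re.fullmatch(r"(\d{2}):(\d{2})[A-Z]*-(\d{2}):(\d{2})[A-Z]*", spec.strip())
    if not m:
        raise ValueError(f"unrecognized Visiting-hours value: {spec!r}")
    h1, m1, h2, m2 = map(int, m.groups())
    return h1 * 60 + m1, h2 * 60 + m2

def may_visit(spec: str, hour: int, minute: int) -> bool:
    """True if hour:minute falls inside the window; the end time is exclusive."""
    start, end = parse_window(spec)
    now = hour * 60 + minute
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # window wraps past midnight

fda = "23:00EDT-05:00EDT"  # value from the FDA file above
hhs = "01:00EST-05:00EST"  # value from the HHS file below
print(may_visit(fda, 23, 30), may_visit(fda, 4, 59), may_visit(fda, 12, 0))
print(may_visit(hhs, 0, 30), may_visit(hhs, 1, 0))
```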


Department of Health and Human Services

http://www.hhs.gov/robots.txt

# robots.txt for http://www.hhs.gov
# robots.txt for http://www.os.dhhs.gov
# robots.txt for http://www.dhhs.gov
# robots.txt for http://os.dhhs.gov:80

user-agent: *  # directed to all spiders, not just Scooter
Disallow: /CGI/
Disallow: /cgi-bin/
Disallow: /analog/
Disallow: /analyze/
Disallow: /images/
Disallow: /template/
Disallow: /templates/
Disallow: /sources/
Disallow: /response/
Disallow: /search/
Disallow: /wwwstats/
Disallow: /HTMLgen/
Disallow: /htmlgen/
Disallow: /HTMLGen/
Disallow: /gay/
Disallow: /hrsa/
Disallow: /osophs/
Disallow: /aoa/
Disallow: /epsa/
Disallow: /su_docs/
Disallow: /access_stats
Disallow: /progorg/ohr/forum/EAPopen/
Disallow: /progorg/ohr/forum/EAPpro/
Disallow: /cio/
Disallow: /news/archive/
Disallow: /masters/
Hit-rate: 30
# wait 30 seconds before starting a new URL request default=30
Visiting-hours: 01:00EST-05:00EST
# index this site between 1AM - 5AM EST
Concurrent-hits: 2
# limit concurrent active URLs to 2 for each index server
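HHS lists `/HTMLgen/`, `/htmlgen/` and `/HTMLGen/` separately because paths are compared as written, so case matters. Each spelling needs its own line, and a fourth spelling such as `/HtmlGen/` is not covered. The lowercase `user-agent` field is accepted, since field names are case-insensitive, and the extra `Hit-rate` field is simply ignored. The sketch below uses Python's `urllib.robotparser`; `AnyBot` is a hypothetical agent name.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of http://www.hhs.gov/robots.txt as archived above.
HHS = """\
user-agent: *  # directed to all spiders, not just Scooter
Disallow: /HTMLgen/
Disallow: /htmlgen/
Disallow: /HTMLGen/
Hit-rate: 30
"""

rp = RobotFileParser()
rp.parse(HHS.splitlines())

for spelling in ("HTMLgen", "htmlgen", "HTMLGen", "HtmlGen"):
    url = f"http://www.hhs.gov/{spelling}/index.html"
    # Only the three spellings listed in the file are excluded.
    print(spelling, rp.can_fetch("AnyBot", url))
```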


National Science Foundation

http://www.nsf.gov/robots.txt

# robots.txt for http://www.nsf.gov/
# see <http://web.nexor.co.uk/mak/doc/robots/norobots.html> for an explanation.
# Change history:

User-agent: MOMspider           # The Multi-Owner Maintenance Spider
Disallow: /cgi-bin/             #     Script files
Disallow: /stats/               #     Big statistics files
Disallow: /pubsys/data/         #     ODS database files
Disallow: /search97cgi/vtopic/  #     Disable index search engine
Disallow: /home/ebulletin/archive/  #   skip Ebulletin archive
Disallow: /sbe/srs/start.htm

User-agent: vspider             # Verity spider
Disallow: /cgi-bin/             #     Script files
Disallow: /stats/               #     Big statistics files
Disallow: /home/nsforg/         #     Breaks Verity spider - temporary
Disallow: /awards/              #     Award abstracts
Disallow: /pubsys/data/         #     ODS database files
Disallow: /search97cgi/vtopic/  #     Disable index search engine
Disallow: /seind98/topdemo.htm  # temp
Disallow: /nsf99338/topdemo.htm  # temp
Disallow: /home/ebulletin/archive/  #   skip Ebulletin archive
Disallow: /sbe/srs/start.htm
Disallow: /web/           # authoring guide

User-agent: *                   # All other spiders should avoid 
Disallow: /cgi-bin/             #     Script files
Disallow: /stats/               #     Big statistics files
Disallow: /pubsys/data/         #     ODS database files
Disallow: /search97cgi/vtopic/  #     Disable index search engine
Disallow: /seind98/topdemo.htm  # temp
Disallow: /nsf99338/topdemo.htm  # temp
Disallow: /home/ebulletin/archive/  #   skip Ebulletin archive
Disallow: /sbe/srs/start.htm
Disallow: /web/                 #authoring guide


[Thanks to JM.]

16 February 2002

Sun

http://www.sun.com/robots.txt

# /robots.txt for www.sun.com

#--------------------------------------------------------------------------
# Mon Feb  2 11:59:27 PST 1998, Fred Elliott
# A NOTE TO THOSE WHO'D BOTHER TO LOOK AT THIS FILE:
#
# Bertrand Meyer's excellent "comp.risks" posting about the potential
# for misusing "robots.txt" files
# (http://www.eiffel.com/private/meyer/robots.html) includes a snapshot
# of the contents of this file here on www.sun.com.
#
# In the article, Bertrand speculates that the directories listed below
# contain proprietary information.  Well, they don't.  They do, though,
# contain information that we'd prefer people register for before they
# download it.
#
# The purpose of the "robots.txt" file is to keep these directories
# from being indexed so that the average user doesn't stumble across them
# while performing searches, and those that should be accessing these
# directories will do so through the URL that requires them to register.
# Of course, having the contents of this file advertised in "comp.risks"
# diminishes its purpose.  Thanks Bertrand. ;-)
#
# If you do actually go to the trouble of figuring out how to download
# the files without registering, what you'll end up with is 1 or 2MB of
# stuff that is meaningless to you unless you have purchased an
# Ultra AX board from Sun.  So, please do purchase an Ultra AX board,
# but then you might as well use the URL you'll be given along with it.
#--------------------------------------------------------------------------
#
# Thu Jan 30 16:58:19 PST 1997, Fred Elliott
# o Created this file to prevent indexing of one
#   SME directory.

User-agent: *

Disallow: /sparc/SPARCengineUltraAX/download/
Disallow: /microelectronics/SPARCengineUltraAX/download/
Disallow: /javachip/SPARCengineUltraAX/download/
Disallow: /javachips/SPARCengineUltraAX/download/
Disallow: /joeroebuck/

# Java Systems files

Disallow: /javastation/remotewindowing/citrix/JICAEng.zip
Disallow: /javastation/remotewindowing/citrix/JavaEnt.tar
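Fred Elliott's note makes the basic point. robots.txt only asks robots not to index; it does not stop anyone fetching the files, and publishing it names them. Entries ending in a file name, like the two Citrix archives, give the file away outright, while directory entries reveal only a folder. The Python sketch below turns the Sun entries into absolute URLs and separates the two kinds. Classifying by trailing slash is a rough heuristic, and the helper name is illustrative.

```python
from urllib.parse import urljoin

SUN_BASE = "http://www.sun.com/"
# Disallow values from the Sun file archived above.
sun_disallow = [
    "/sparc/SPARCengineUltraAX/download/",
    "/microelectronics/SPARCengineUltraAX/download/",
    "/javachip/SPARCengineUltraAX/download/",
    "/javachips/SPARCengineUltraAX/download/",
    "/joeroebuck/",
    "/javastation/remotewindowing/citrix/JICAEng.zip",
    "/javastation/remotewindowing/citrix/JavaEnt.tar",
]

def classify(base: str, paths: list[str]) -> tuple[list[str], list[str]]:
    """Split Disallow paths into (directory URLs, file URLs) by trailing slash."""
    dirs, files = [], []
    for p in paths:
        url = urljoin(base, p)
        (dirs if p.endswith("/") else files).append(url)
    return dirs, files

dirs, files = classify(SUN_BASE, sun_disallow)
print("directories:", *dirs, sep="\n  ")
print("named files:", *files, sep="\n  ")
```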