19 July 2012

Kill All Bots


http://digitalcorpora.org/archives/162

Bots downloading disk images

December 27th, 2010 admin

I’m preparing some statistics on who (and what) is downloading the disk images we have here at digitalcorpora.org. The first thing I’ve done is suppress the bots that are, for whatever reason, downloading the images.
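As a rough illustration of this kind of filtering (a sketch only, not necessarily how it was done here), the Python below splits web-server log lines into bot and non-bot downloads by looking at the user-agent field. It assumes that "suppress" means leaving bot requests out of the statistics, that the log is in the Apache "combined" format, and that a case-insensitive match on "bot", "crawl", or "spider" is a good enough hint. The names LOG_LINE, BOT_HINT, is_bot, and split_downloads are made up for this example. Every user-agent string in the list further down happens to contain "bot" in some capitalization, but a real log will also contain crawlers that do not announce themselves.

```python
import re

# Assumption: Apache "combined" log format, where the final quoted
# field on each line is the client's user-agent string.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Heuristic only: substrings that self-identifying crawlers tend to use.
BOT_HINT = re.compile(r"bot|crawl|spider", re.IGNORECASE)


def is_bot(user_agent):
    """Return True if the user-agent string looks like a crawler."""
    return bool(BOT_HINT.search(user_agent))


def split_downloads(lines):
    """Parse log lines and return (non_bot_entries, bot_entries).

    Each entry is a dict of the named fields in LOG_LINE.
    Lines that do not match the assumed format are skipped.
    """
    non_bots, bots = [], []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        entry = match.groupdict()
        (bots if is_bot(entry["agent"]) else non_bots).append(entry)
    return non_bots, bots
```

Calling split_downloads on the lines of an access log gives two lists: the first is what the download statistics would be built from, and the second is what a bot tally would be built from.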

Here are the bots that we’ve found, and the number of image downloads made by each one.

    Rank     Count     User-agent string
  ====================================================================================

      1      2334      Mozilla/5.0 (compatible; Googlebot/2.1; 
                       +http://www.google.com/bot.html)

      2       851      MLBot 
                       (www.metadatalabs.com/mlbot)

      3       811      SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 
                       Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 
                       (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; 
                       +http://www.google.com/bot.html)

      4       749      Mozilla/5.0 (compatible; DotBot/1.1; 
                       http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)

      5       492      Mozilla/5.0 (compatible; YandexBot/3.0; 
                       +http://yandex.com/bots)

      6       130      Mozilla/5.0 (compatible; bingbot/2.0; 
                       +http://www.bing.com/bingbot.htm)

      7       115      Mozilla/5.0 (compatible; DBLBot/1.0; 
                       +http://www.dontbuylists.com/)

      8       109      msnbot/2.0b 
                       (+http://search.msn.com/msnbot.htm)

      9       108      Mozilla/5.0 (compatible; SiteBot/0.1; 
                       +http://www.sitebot.org/robot/)

     10        89      CCBot/1.0 
                       (+http://www.commoncrawl.org/bot.html)

     11        87      Mozilla/5.0 (Twiceler-0.9 
                       http://www.cuil.com/twiceler/robot.html)

     12        78      TwengaBot-Discover 
                       (http://www.twenga.fr/bot-discover.html)

     13        58      Mozilla/5.0 (compatible; Purebot/1.1; 
                       +http://www.puritysearch.net/)

     14        51      msnbot/1.1 
                       (+http://search.msn.com/msnbot.htm)

     15        26      Mozilla/5.0 (compatible; MJ12bot/v1.3.2; 
                       http://www.majestic12.co.uk/bot.php?+)

     16        21      Cityreview Robot 
                       (+http://www.cityreview.org/crawler/)

     17        18      'citeseerxbot'

     18        15      SindiceBot (heritrix/2.0.2 
                       +http://sindice.com/developers/bot)

     19        12      Mozilla/5.0 (compatible; MJ12bot/v1.3.1; 
                       http://www.majestic12.co.uk/bot.php?+)

     20        11      Mozilla/5.0 (compatible; discobot/1.1; 
                       +http://discoveryengine.com/discobot.html

     21         9      Mozilla/5.0 (compatible; Exabot/3.0; 
                       +http://www.exabot.com/go/robot)

     22         7      CatchBot/3.0; 
                       +http://www.catchbot.com

                7      CyberPatrol SiteCat Webbot 
                       (http://www.cyberpatrol.com/cyberpatrolcrawler.asp)

                7      yacybot (amd64 Linux 2.6.26-2-xen-amd64; java 1.6.0_20; Europe/en) 
                       http://yacy.net/bot.html

     25         6      Mozilla/5.0 (compatible; Search17Bot/1.1; 
                       http://www.search17.com/bot.php)

                6      yacybot (amd64 Linux 2.6.26-2-xen-amd64; java 1.6.0_20; Europe/de) 
                       http://yacy.net/bot.html

     27         5      MSRBOT 
                       (http://research.microsoft.com/research/sv/msrbot/)

                5      yacybot (amd64 Linux 2.6.31-20-generic; java 1.6.0_15; Europe/en) 
                       http://yacy.net/bot.html

                5      yacybot (i386 Linux 2.6.32-trunk-686; java 1.6.0_18; America/en) 
                       http://yacy.net/bot.html

     30         3      msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)

     31         2      Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9

                2      yacybot (amd64 Linux 2.6.26-2-amd64; java 1.6.0_20; Europe/en) 
                       http://yacy.net/bot.html

                2      yacybot (amd64 Linux 2.6.28-18-generic; java 1.6.0_19; GMT/en) 
                       http://yacy.net/bot.html

                2      yacybot (i386 Linux 2.6.31-21-generic; java 1.6.0_0; Europe/en) 
                       http://yacy.net/bot.html

     35         1      Mozilla/5.0 (compatible; Googlebot/2.1;  
                       http://www.google.com/bot.html)

                1      Mozilla/5.0 (compatible; discobot/1.1; 
                       +http://discoveryengine.com/discobot.html)

                1      findfiles.net/0.96 (Robot;test_robot@gmx-topmail.de)

                1      librabot/1.0 
                       (+http://search.msn.com/msnbot.htm)

                1      yacybot (amd64 Linux 2.6.18-164.11.1.el5xen; java 1.6.0; Europe/en) 
                       http://yacy.net/bot.html

                1      yacybot (amd64 Linux 2.6.18-164.15.1.el5; java 1.6.0_14; Europe/de) 
                       http://yacy.net/bot.html

                1      yacybot (x86 Windows XP 5.1; java 1.6.0_18; Europe/de) 
                       http://yacy.net/bot.html

                1      yacybot (x86 Windows XP 5.1; java 1.6.0_20; Europe/de) 
                       http://yacy.net/bot.html

                1      yacybot (x86_64 Mac OS X 10.6.4; java 1.6.0_20; America/en) 
                       http://yacy.net/bot.html 

Total items printed: 6242
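
The listing above has the shape of a simple tally: one count per distinct user-agent string, sorted by count, with tied entries sharing a rank (three entries at rank 22, then the next at rank 25), and a total of 6242 that matches the sum of the Count column. A minimal sketch of that kind of tally in Python follows. The function names rank_counts and format_rows are made up for this example, breaking ties alphabetically is an assumption, and the column widths only approximate the layout above; the tool that actually produced the listing is not named in the post.

```python
from collections import Counter


def rank_counts(values):
    """Count each distinct value and return (rank, count, value) rows.

    Rows are sorted by descending count. Values with equal counts share
    the rank of the first of them, and the next distinct count resumes at
    its position in the list (1, 2, 2, 4, ...). Ties are ordered
    alphabetically, which is an assumption made for this sketch.
    """
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows = []
    rank = 0
    previous_count = None
    for position, (value, count) in enumerate(ordered, start=1):
        if count != previous_count:
            rank = position
            previous_count = count
        rows.append((rank, count, value))
    return rows


def format_rows(rows):
    """Format rows as text, printing a rank only on the first row of a tie."""
    lines = []
    shown_rank = None
    for rank, count, value in rows:
        label = str(rank) if rank != shown_rank else ""
        shown_rank = rank
        lines.append(f"{label:>7} {count:>9}      {value}")
    return lines
```

Passing in one user-agent string per bot download (for example, the "agent" field of every bot request in the log) yields the rows, and sum(count for _, count, _ in rows) gives the total.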