AccompanyBot
- Notes
- AI-driven relationship intelligence platform. Basically a sales tool.
- Website
- https://www.accompany.com
Data Mining Unknown if it respects robots.txt
Please find below a manually curated and researched list of user
agents I came across. It's impressive to see how many of the bots
active today flat out do not respect robots.txt settings, or claim to
respect them but ignore them anyway. This list is updated regularly,
whenever I spot new user agents and look into their behavior. There is
no JavaScript here, no fancy search. Cmd-F and Ctrl-F work beautifully.
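For readers wondering what "respecting robots.txt" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the `ExampleBot` name are hypothetical and do not come from any site or bot in this list. A well-behaved crawler would run a check like this before every request.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, made up for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("ExampleBot/1.0", "https://example.com/"))
    print(is_allowed("OtherBot/2.0", "https://example.com/index.html"))
    print(is_allowed("OtherBot/2.0", "https://example.com/private/notes"))
```

A bot that "does not respect robots.txt" simply skips this check, or performs it and fetches the page anyway.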
The information on this site is free as in beer. I take no responsibility for anything related to it, and commercial use is explicitly forbidden. Meaning: you are not allowed to sell it, either separately or as part of a product. Otherwise it's public information. Please don't do anything stupid.
I'm always happy to hear from people this helped in some small way. If you feel like it, drop me a line: marcel@herrbischoff.com
User agent | Category | robots.txt behavior
Mozilla/5.0 (compatible; Adsbot/3.1) | Suspicious | Does not respect robots.txt
Mozilla/5.0 (compatible; adscanner/)/1.0 (http://seocompany.store; spider@seocompany.store) | Advertising | Unknown if it respects robots.txt
adstxt.com/1.2 | Advertising | Does not respect robots.txt
Mozilla/5.0 (compatible; aiHitBot/2.9; +https://www.aihitdata.com/about) | Search | Respects robots.txt
Python/3.7 aiohttp/3.0.9 | Automation | Does not respect robots.txt
Ankit | Malware | Does not respect robots.txt
Apache-HttpClient/4.5.12 (Java/14.0.1) | Suspicious | Does not respect robots.txt
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot) | Search | Respects robots.txt
Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot) | Archival | Does not respect robots.txt
Mozilla/5.0 (compatible;AspiegelBot) | Advertising | Does not respect robots.txt
axios/0.19.0 | Automation | Does not respect robots.txt
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html) | Search | Does not respect robots.txt
BananaBot/0.6.1 | Suspicious | Does not respect robots.txt
Barkrowler/0.9 (+http://www.exensa.com/crawl) | Data Mining | Respects robots.txt
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) | Search | Respects robots.txt
Blackboard Safeassign | Legal | Does not respect robots.txt
borneoBot/0.6.7 (crawlcheck123@gmail.com) | Suspicious | Does not respect robots.txt
botnet/2.0 | Malware | Does not respect robots.txt
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko; compatible; BuiltWith/1.0; +http://builtwith.com/biup) Chrome/74.0.3729.131 Safari/537.36 | Data Mining | Does not respect robots.txt
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko; compatible; BW/1.1; bit.ly/2W6Px8S) Chrome/74.0.3729.131 Safari/537.36 | Data Mining | Does not respect robots.txt
CATExplorador/1.0beta (sistemes at domini dot cat; http://domini.cat/catexplorador.html) | Data Mining | Does not respect robots.txt
CCBot/2.0 (https://commoncrawl.org/faq/) | Data Mining | Respects robots.txt
CheckHost (https://check-host.net/) | Automation | Does not respect robots.txt
CheckMarkNetwork/1.0 (+http://www.checkmarknetwork.com/spider.html) | Legal | Respects robots.txt
chimebot | Suspicious | Does not respect robots.txt
Mozilla/5.0 (compatible; Cincraw/1.0; +http://cincrawdata.net/bot/) | Data Mining | Does not respect robots.txt
CISPA Webcrawler (https://vuln-notify-checker.cispa.saarland) | Security | Does not respect robots.txt
Mozilla/5.0 (compatible; Clarabot/1.4; +http://www.clarabot.info/bots) | Suspicious | Respects robots.txt
Mozilla/5.0 (compatible; Cliqzbot/3.0; +http://cliqz.com/company/cliqzbot) | Search | Does not respect robots.txt
Cloud mapping experiment. Contact research@pdrlabs.net | Suspicious | Does not respect robots.txt
Mozilla/4.0 (CMS Crawler: http://www.cmscrawler.com) | Data Mining | Does not respect robots.txt
Mozilla/5.0 (compatible; coccocbot-image/1.0; +http://help.coccoc.com/searchengine) | Search | Respects robots.txt
Mozilla/5.0 (compatible; coccocbot-web/1.0; +http://help.coccoc.com/searchengine) | Search | Respects robots.txt
CowBot/1.0 | Suspicious | Respects robots.txt
crawler4j (https://github.com/yasserg/crawler4j/) | Automation | Does not respect robots.txt
curb | Suspicious | Does not respect robots.txt
curl/7.70.0 | Automation | Does not respect robots.txt
Mozilla/5.0 (X11; Datanyze; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36 | Data Mining | Does not respect robots.txt
Mozilla/5.0 (compatible; Dataprovider.com) | Data Mining | Respects robots.txt
DF Bot 1.0 | Suspicious | Does not respect robots.txt
Dispatch/0.14.0-SNAPSHOT | Automation | Does not respect robots.txt
Mozilla/5.0 (compatible; Domains Project/1.1.0; +https://domainsproject.org) | Data Mining | Does not respect robots.txt
Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com) | SEO | Respects robots.txt
drupalfinder1 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36 | Suspicious | Does not respect robots.txt
Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot) | Search | Respects robots.txt
Mozilla/5.0 (compatible; DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com) | Search | Does not respect robots.txt
eContext/1.0 (eContext Classification Engine) | Data Mining | Does not respect robots.txt
Elisabot | Suspicious | Does not respect robots.txt
Emacs Elfeed 3.3.0 | Automation | Does not respect robots.txt
eZ Publish Link Validator | Automation | Does not respect robots.txt
Faraday v0.15.4 | Automation | Does not respect robots.txt
FeedFetcher-Google; (+http://www.google.com/feedfetcher.html) | Search | Does not respect robots.txt
finbot | Suspicious | Unknown if it respects robots.txt
GarlikCrawler/1.2 (http://garlik.com/, crawler@garlik.com) | Suspicious | Respects robots.txt
Googlebot (gocrawl v0.4) | Automation | Respects robots.txt
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | Search | Respects robots.txt
Googlebot-Image/1.0 | Search | Does not respect robots.txt
Googlebot-Video/1.0 | Search | Respects robots.txt
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/68.0.3440.106 Safari/537.36 | Automation | Does not respect robots.txt
HealthCheckBot/0.2 | Suspicious | Does not respect robots.txt
Hello, world | Malware | Does not respect robots.txt
Mozilla/5.0 (compatible; heritrix/3.4.0-20200304 +http://hbi640.ir/) | Archival | Respects robots.txt
http.rb/4.4.1 | Automation | Does not respect robots.txt
ia_archiver | Data Mining | Respects robots.txt
Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0 (IndeedBot 1.1) | Suspicious | Does not respect robots.txt
Mozilla/5.0 (compatible; Integrity/8; +https://peacockmedia.software/mac/integrity/ | Automation | Does not respect robots.txt
Internet-structure-research-project-bot | Suspicious | Does not respect robots.txt
Mozilla/5.0 (compatible; ips-agent) | Suspicious | Unknown if it respects robots.txt
Java/1.8.0_211 | Automation | Does not respect robots.txt
Mozilla/5.0 (X11; U; Linux Core i7-4980HQ; de; rv:32.0; compatible; JobboerseBot; http://www.jobboerse.com/bot.htm) Gecko/20100101 Firefox/38.0 | Data Mining | Unknown if it respects robots.txt
KOCMOHABT (https://kozmonavt.tk/) Mozilla/5.0 (Web Explorer) | Suspicious | Does not respect robots.txt
LCC (+http://corpora.informatik.uni-leipzig.de/crawler_faq.html) | Data Mining | Does not respect robots.txt
Leap | Suspicious | Does not respect robots.txt
Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org) | Security | Does not respect robots.txt
libwww-perl/6.43 | Automation | Does not respect robots.txt
LightspeedSystemsCrawler Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US) | Security | Respects robots.txt
Linguee Bot (http://www.linguee.com/bot; bot@linguee.com) | Search | Respects robots.txt
ltx71 - (http://ltx71.com/) | Security | Respects robots.txt
lua-resty-http/0.10 (Lua) ngx_lua/10000 | Automation | Does not respect robots.txt
Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots) | Search | Does not respect robots.txt
masscan/1.0 (https://github.com/robertdavidgraham/masscan) | Security | Does not respect robots.txt
MauiBot (crawler.feedback+wc@gmail.com) | Suspicious | Respects robots.txt
Mozilla/5.0 (compatible; MixrankBot; crawler@mixrank.com) | Data Mining | Does not respect robots.txt
Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/) | SEO | Respects robots.txt
myseosnapshot/1.0 | Suspicious | Does not respect robots.txt
netEstate NE Crawler (+http://www.website-datenbank.de/) | Search | Respects robots.txt
NetNewsWire (RSS Reader; https://ranchero.com/netnewswire/) | Automation | Does not respect robots.txt
Mozilla/5.0 (compatible; Nimbostratus-Bot/v1.3.2; http://cloudsystemnetworks.com) | Suspicious | Does not respect robots.txt
Mozilla/5.0 (compatible; oBot/2.3.1; http://www.xforce-security.com/crawler/) | Suspicious | Does not respect robots.txt
OnalyticaBot | Suspicious | Does not respect robots.txt
OrgProbe/2.0.0 (+http://www.blocked.org.uk) | Data Mining | Does not respect robots.txt
Pandalytics/1.0 (https://domainsbot.com/pandalytics/) | SEO | Respects robots.txt
Mozilla/5.0 (compatible; Panscient/1.0; +http://panscient.com/faq.htm) | Data Mining | Respects robots.txt
Pinterest/0.2 (+https://www.pinterest.com/bot.html) | Data Mining | Respects robots.txt
Mozilla/5.0 (compatible; Pinterestbot/1.0; +http://www.pinterest.com/bot.html) | Data Mining | Respects robots.txt
Mozilla/5.0 (compatible; Plukkie/1.6; http://www.botje.com/plukkie.htm) | Search | Respects robots.txt
polaris botnet | Malware | Does not respect robots.txt
python-requests/2.21.0 | Automation | Does not respect robots.txt
Python-urllib/3.8 | Automation | Does not respect robots.txt
Qwantify/1.0 | Search | Respects robots.txt
Mozilla/5.0 (compatible; Qwantify/Bleriot/1.1; +https://help.qwant.com/bot) | Search | Respects robots.txt
Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t12sns; +http://researchscan.comsys.rwth-aachen.de) | Security | Does not respect robots.txt
RestSharp/105.2.3.0 | Automation | Does not respect robots.txt
Riddler (http://riddler.io/about) | Data Mining | Respects robots.txt
RobotsChecker/0.6 (+http://www.blocked.org.uk) | Data Mining | Does not respect robots.txt
Ruby | Automation | Does not respect robots.txt
RyteBot/1.0.0 (+https://bot.ryte.com/) | SEO | Unknown if it respects robots.txt
Scrapy/1.7.2 (+https://scrapy.org) | Automation | Unknown if it respects robots.txt
Screaming Frog SEO Spider/13.0 | SEO | Respects robots.txt
SearchAtlas.com SEO Crawler | SEO | Respects robots.txt
Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/ | Search | Unknown if it respects robots.txt
Mozilla/5.0 (compatible; SemrushBot/1.0~bm; +http://www.semrush.com/bot.html) | SEO | Respects robots.txt
Semtix.cz <https://semtix.cz/bot> | SEO | Does not respect robots.txt
Mozilla/5.0 (compatible; SeoChecker/1.1) | SEO | Does not respect robots.txt
Mozilla/5.0 (compatible; SeznamBot/3.2; +http://napoveda.seznam.cz/en/seznambot-intro/) | Search | Does not respect robots.txt
shopify-partner-homepage-scraper | Suspicious | Does not respect robots.txt
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36 (compatible; SMTBot/1.0; +http://www.similartech.com/smtbot) | Data Mining | Does not respect robots.txt
Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07) | Search | Does not respect robots.txt
Mozilla/5.0 (compatible; Sophora; http://www.subshell.com) | Automation | Does not respect robots.txt
Mozilla/5.0 (compatible; special_archiver/3.1.1 +http://www.archive.org/details/archive.org_bot) | Archival | Does not respect robots.txt
spider | Suspicious | Does not respect robots.txt
Spider2.0 | Suspicious | Does not respect robots.txt
Mozilla/5.0 (compatible; SpiderLing (a SPIDER for LINGustic research); +http://nlp.fi.muni.cz/projects/biwec/) | Data Mining | Respects robots.txt
Mozilla/5.0 (compatible; SurdotlyBot/1.0; +http://sur.ly/bot.html) | Security | Does not respect robots.txt
SWRLinkchecker | Automation | Does not respect robots.txt
TelegramBot (like TwitterBot) | Automation | Does not respect robots.txt
Testcrawler | Suspicious | Does not respect robots.txt
The Knowledge AI | Suspicious | Respects robots.txt
TprAdsTxtCrawler/1.0 | Suspicious | Does not respect robots.txt
Mozilla/5.0 (compatible; tracemyfile/1.0; +bot@tracemyfile.com) | Data Mining | Does not respect robots.txt
Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0 | Automation | Does not respect robots.txt
Mozilla/5.0 (compatible; Twingly Recon; twingly.com) | Data Mining | Respects robots.txt
UniversalFeedParser/5.2.1 +https://code.google.com/p/feedparser/ | Automation | Does not respect robots.txt
Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/) | Automation | Does not respect robots.txt
Mozilla/5.0 (compatible; VelenPublicWebCrawler/1.0; +https://velen.io) | Data Mining | Does not respect robots.txt
VsuSearchSpider/1.0 | Suspicious | Respects robots.txt
W3C_Validator/1.3 http://validator.w3.org/services | Automation | Does not respect robots.txt
Mozilla/5.0 (compatible; Wappalyzer) | Data Mining | Does not respect robots.txt
Mozilla/5.0 (compatible; webtechbot; +https://www.webtechsurvey.com/bot) | Data Mining | Respects robots.txt
Wget/1.20 (mingw32) | Automation | Does not respect robots.txt
Who.is Bot | Suspicious | Does not respect robots.txt
Mozilla/4.0 (compatible; Win32; WinHttp.WinHttpRequest.5) | Automation | Does not respect robots.txt
Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1 (compatible; woorankreview/2.0; +https://www.woorank.com/) | SEO | Does not respect robots.txt
www.deadlinkchecker.com Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36 | Automation | Does not respect robots.txt
Xenu Link Sleuth/1.3.8 | Automation | Does not respect robots.txt
XTC | Malware | Does not respect robots.txt
yacybot (/global; amd64 Linux 5.7.4; java 1.8.0_201; America/en) http://yacy.net/bot.html | Search | Respects robots.txt
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots) | Advertising | Does not respect robots.txt
Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B411 Safari/600.1.4 (compatible; YandexMobileBot/3.0; +http://yandex.com/bots) | Search | Does not respect robots.txt
Mozilla/5.0 (compatible; Yeti/1.1; +http://naver.me/spd) | Search | Respects robots.txt
Mozilla/5.0 zgrab/0.x | Automation | Does not respect robots.txt
ZoominfoBot (zoominfobot at zoominfo dot com) | Data Mining | Respects robots.txt
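
If you want to use the list above programmatically rather than with Cmd-F, a small lookup might look like the following sketch. It is not part of this site. The `BotEntry` type, the `classify` helper and the choice of substrings to match on are all assumptions made for illustration, and only four entries from the list are included.

```python
from typing import NamedTuple, Optional


class BotEntry(NamedTuple):
    token: str     # substring searched for in the User-Agent header (hand-picked, an assumption)
    category: str  # category as given in the list
    robots: str    # robots.txt behavior as given in the list


# A small excerpt of the list; extend it with the entries you care about.
KNOWN_BOTS = [
    BotEntry("bingbot", "Search", "Respects robots.txt"),
    BotEntry("masscan", "Security", "Does not respect robots.txt"),
    BotEntry("python-requests", "Automation", "Does not respect robots.txt"),
    BotEntry("SemrushBot", "SEO", "Respects robots.txt"),
]


def classify(user_agent: str) -> Optional[BotEntry]:
    """Return the first entry whose token occurs in user_agent, ignoring case."""
    ua = user_agent.lower()
    for entry in KNOWN_BOTS:
        if entry.token.lower() in ua:
            return entry
    return None


if __name__ == "__main__":
    hit = classify("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)")
    print(hit)
```

Substring matching is deliberately loose, since version numbers change often. Keep in mind that user agent strings are trivially spoofed, so a match is a hint and not proof of who is knocking.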