AccompanyBot
- Notes: AI-driven relationship intelligence platform. Basically a sales tool.
- Website: https://www.accompany.com
Data Mining Unknown if it respects robots.txt
Please find below a manually curated and researched list of user
agents I came across. It's impressive to see how many of the bots
active today flat out do not respect robots.txt settings, or claim to
respect them but ignore them. This list is updated regularly, whenever I spot
new user agents and look into their behavior. There is no JavaScript
here, no fancy search. Cmd-F and Ctrl-F work beautifully.
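For context, this is roughly what respecting robots.txt looks like from the crawler's side. It is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt rules, the example.com URL and the helper name `may_fetch` are made up for illustration. Compliance is voluntary: nothing in this check stops a bot that never performs it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: BananaBot
Disallow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler asks before every request and skips disallowed URLs.
    print(may_fetch("BananaBot/0.6.1", "https://example.com/page"))
    print(may_fetch("CowBot/1.0", "https://example.com/page"))
```

A bot marked "Respects robots.txt" in this list behaves as if it ran such a check before each request. One marked "Does not respect robots.txt" was seen fetching pages that the rules disallow.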
The information on this site is free as in beer. I take no responsibility for anything related to it, and commercial use is explicitly forbidden. Meaning: you are not allowed to sell it, either separately or as part of a product. Otherwise it's public information. Please don't do anything stupid.
I'm always happy to hear from people this helped in some small way. If you feel like it, drop me a line: marcel@herrbischoff.com
Mozilla/5.0 (compatible; Adsbot/3.1)
Suspicious Does not respect robots.txt
Mozilla/5.0 (compatible; adscanner/)/1.0 (http://seocompany.store; spider@seocompany.store)
Advertising Unknown if it respects robots.txt
adstxt.com/1.2
Advertising Does not respect robots.txt
Mozilla/5.0 (compatible; aiHitBot/2.9; +https://www.aihitdata.com/about)
Search Respects robots.txt
Python/3.7 aiohttp/3.0.9
Automation Does not respect robots.txt
Ankit
Malware Does not respect robots.txt
Apache-HttpClient/4.5.12 (Java/14.0.1)
Suspicious Does not respect robots.txt
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)
Search Respects robots.txt
Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)
Archival Does not respect robots.txt
Mozilla/5.0 (compatible;AspiegelBot)
Advertising Does not respect robots.txt
axios/0.19.0
Automation Does not respect robots.txt
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Search Does not respect robots.txt
BananaBot/0.6.1
Suspicious Does not respect robots.txt
Barkrowler/0.9 (+http://www.exensa.com/crawl)
Data Mining Respects robots.txt
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Search Respects robots.txt
Blackboard Safeassign
Legal Does not respect robots.txt
borneoBot/0.6.7 (crawlcheck123@gmail.com)
Suspicious Does not respect robots.txt
botnet/2.0
Malware Does not respect robots.txt
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko; compatible; BuiltWith/1.0; +http://builtwith.com/biup) Chrome/74.0.3729.131 Safari/537.36
Data Mining Does not respect robots.txt
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko; compatible; BW/1.1; bit.ly/2W6Px8S) Chrome/74.0.3729.131 Safari/537.36
Data Mining Does not respect robots.txt
CATExplorador/1.0beta (sistemes at domini dot cat; http://domini.cat/catexplorador.html)
Data Mining Does not respect robots.txt
CCBot/2.0 (https://commoncrawl.org/faq/)
Data Mining Respects robots.txt
CheckHost (https://check-host.net/)
Automation Does not respect robots.txt
CheckMarkNetwork/1.0 (+http://www.checkmarknetwork.com/spider.html)
Legal Respects robots.txt
chimebot
Suspicious Does not respect robots.txt
Mozilla/5.0 (compatible; Cincraw/1.0; +http://cincrawdata.net/bot/)
Data Mining Does not respect robots.txt
CISPA Webcrawler (https://vuln-notify-checker.cispa.saarland)
Security Does not respect robots.txt
Mozilla/5.0 (compatible; Clarabot/1.4; +http://www.clarabot.info/bots)
Suspicious Respects robots.txt
Mozilla/5.0 (compatible; Cliqzbot/3.0; +http://cliqz.com/company/cliqzbot)
Search Does not respect robots.txt
Cloud mapping experiment. Contact research@pdrlabs.net
Suspicious Does not respect robots.txt
Mozilla/4.0 (CMS Crawler: http://www.cmscrawler.com)
Data Mining Does not respect robots.txt
Mozilla/5.0 (compatible; coccocbot-image/1.0; +http://help.coccoc.com/searchengine)
Search Respects robots.txt
Mozilla/5.0 (compatible; coccocbot-web/1.0; +http://help.coccoc.com/searchengine)
Search Respects robots.txt
CowBot/1.0
Suspicious Respects robots.txt
crawler4j (https://github.com/yasserg/crawler4j/)
Automation Does not respect robots.txt
curb
Suspicious Does not respect robots.txt
curl/7.70.0
Automation Does not respect robots.txt
Mozilla/5.0 (X11; Datanyze; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
Data Mining Does not respect robots.txt
Mozilla/5.0 (compatible; Dataprovider.com)
Data Mining Respects robots.txt
DF Bot 1.0
Suspicious Does not respect robots.txt
Dispatch/0.14.0-SNAPSHOT
Automation Does not respect robots.txt
Mozilla/5.0 (compatible; Domains Project/1.1.0; +https://domainsproject.org)
Data Mining Does not respect robots.txt
Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)
SEO Respects robots.txt
drupalfinder1 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
Suspicious Does not respect robots.txt
Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)
Search Respects robots.txt
Mozilla/5.0 (compatible; DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com)
Search Does not respect robots.txt
eContext/1.0 (eContext Classification Engine)
Data Mining Does not respect robots.txt
Elisabot
Suspicious Does not respect robots.txt
Emacs Elfeed 3.3.0
Automation Does not respect robots.txt
eZ Publish Link Validator
Automation Does not respect robots.txt
Faraday v0.15.4
Automation Does not respect robots.txt
FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)
Search Does not respect robots.txt
finbot
Suspicious Unknown if it respects robots.txt
GarlikCrawler/1.2 (http://garlik.com/, crawler@garlik.com)
Suspicious Respects robots.txt
Googlebot (gocrawl v0.4)
Automation Respects robots.txt
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Search Respects robots.txt
Googlebot-Image/1.0
Search Does not respect robots.txt
Googlebot-Video/1.0
Search Respects robots.txt
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/68.0.3440.106 Safari/537.36
Automation Does not respect robots.txt
HealthCheckBot/0.2
Suspicious Does not respect robots.txt
Hello, world
Malware Does not respect robots.txt
Mozilla/5.0 (compatible; heritrix/3.4.0-20200304 +http://hbi640.ir/)
Archival Respects robots.txt
http.rb/4.4.1
Automation Does not respect robots.txt
ia_archiver
Data Mining Respects robots.txt
Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0 (IndeedBot 1.1)
Suspicious Does not respect robots.txt
Mozilla/5.0 (compatible; Integrity/8; +https://peacockmedia.software/mac/integrity/
Automation Does not respect robots.txt
Internet-structure-research-project-bot
Suspicious Does not respect robots.txt
Mozilla/5.0 (compatible; ips-agent)
Suspicious Unknown if it respects robots.txt
Java/1.8.0_211
Automation Does not respect robots.txt
Mozilla/5.0 (X11; U; Linux Core i7-4980HQ; de; rv:32.0; compatible; JobboerseBot; http://www.jobboerse.com/bot.htm) Gecko/20100101 Firefox/38.0
Data Mining Unknown if it respects robots.txt
KOCMOHABT (https://kozmonavt.tk/) Mozilla/5.0 (Web Explorer)
Suspicious Does not respect robots.txt
LCC (+http://corpora.informatik.uni-leipzig.de/crawler_faq.html)
Data Mining Does not respect robots.txt
Leap
Suspicious Does not respect robots.txt
Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)
Security Does not respect robots.txt
libwww-perl/6.43
Automation Does not respect robots.txt
LightspeedSystemsCrawler Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)
Security Respects robots.txt
Linguee Bot (http://www.linguee.com/bot; bot@linguee.com)
Search Respects robots.txt
ltx71 - (http://ltx71.com/)
Security Respects robots.txt
lua-resty-http/0.10 (Lua) ngx_lua/10000
Automation Does not respect robots.txt
Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)
Search Does not respect robots.txt
masscan/1.0 (https://github.com/robertdavidgraham/masscan)
Security Does not respect robots.txt
MauiBot (crawler.feedback+wc@gmail.com)
Suspicious Respects robots.txt
Mozilla/5.0 (compatible; MixrankBot; crawler@mixrank.com)
Data Mining Does not respect robots.txt
Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
SEO Respects robots.txt
myseosnapshot/1.0
Suspicious Does not respect robots.txt
netEstate NE Crawler (+http://www.website-datenbank.de/)
Search Respects robots.txt
NetNewsWire (RSS Reader; https://ranchero.com/netnewswire/)
Automation Does not respect robots.txt
Mozilla/5.0 (compatible; Nimbostratus-Bot/v1.3.2; http://cloudsystemnetworks.com)
Suspicious Does not respect robots.txt
Mozilla/5.0 (compatible; oBot/2.3.1; http://www.xforce-security.com/crawler/)
Suspicious Does not respect robots.txt
OnalyticaBot
Suspicious Does not respect robots.txt
OrgProbe/2.0.0 (+http://www.blocked.org.uk)
Data Mining Does not respect robots.txt
Pandalytics/1.0 (https://domainsbot.com/pandalytics/)
SEO Respects robots.txt
Mozilla/5.0 (compatible; Panscient/1.0; +http://panscient.com/faq.htm)
Data Mining Respects robots.txt
Pinterest/0.2 (+https://www.pinterest.com/bot.html)
Data Mining Respects robots.txt
Mozilla/5.0 (compatible; Pinterestbot/1.0; +http://www.pinterest.com/bot.html)
Data Mining Respects robots.txt
Mozilla/5.0 (compatible; Plukkie/1.6; http://www.botje.com/plukkie.htm)
Search Respects robots.txt
polaris botnet
Malware Does not respect robots.txt
python-requests/2.21.0
Automation Does not respect robots.txt
Python-urllib/3.8
Automation Does not respect robots.txt
Qwantify/1.0
Search Respects robots.txt
Mozilla/5.0 (compatible; Qwantify/Bleriot/1.1; +https://help.qwant.com/bot)
Search Respects robots.txt
Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t12sns; +http://researchscan.comsys.rwth-aachen.de)
Security Does not respect robots.txt
RestSharp/105.2.3.0
Automation Does not respect robots.txt
Riddler (http://riddler.io/about)
Data Mining Respects robots.txt
RobotsChecker/0.6 (+http://www.blocked.org.uk)
Data Mining Does not respect robots.txt
Ruby
Automation Does not respect robots.txt
RyteBot/1.0.0 (+https://bot.ryte.com/)
SEO Unknown if it respects robots.txt
Scrapy/1.7.2 (+https://scrapy.org)
Automation Unknown if it respects robots.txt
Screaming Frog SEO Spider/13.0
SEO Respects robots.txt
SearchAtlas.com SEO Crawler
SEO Respects robots.txt
Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/
Search Unknown if it respects robots.txt
Mozilla/5.0 (compatible; SemrushBot/1.0~bm; +http://www.semrush.com/bot.html)
SEO Respects robots.txt
Semtix.cz <https://semtix.cz/bot>
SEO Does not respect robots.txt
Mozilla/5.0 (compatible; SeoChecker/1.1)
SEO Does not respect robots.txt
Mozilla/5.0 (compatible; SeznamBot/3.2; +http://napoveda.seznam.cz/en/seznambot-intro/)
Search Does not respect robots.txt
shopify-partner-homepage-scraper
Suspicious Does not respect robots.txt
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36 (compatible; SMTBot/1.0; +http://www.similartech.com/smtbot)
Data Mining Does not respect robots.txt
Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
Search Does not respect robots.txt
Mozilla/5.0 (compatible; Sophora; http://www.subshell.com)
Automation Does not respect robots.txt
Mozilla/5.0 (compatible; special_archiver/3.1.1 +http://www.archive.org/details/archive.org_bot)
Archival Does not respect robots.txt
spider
Suspicious Does not respect robots.txt
Spider2.0
Suspicious Does not respect robots.txt
Mozilla/5.0 (compatible; SpiderLing (a SPIDER for LINGustic research); +http://nlp.fi.muni.cz/projects/biwec/)
Data Mining Respects robots.txt
Mozilla/5.0 (compatible; SurdotlyBot/1.0; +http://sur.ly/bot.html)
Security Does not respect robots.txt
SWRLinkchecker
Automation Does not respect robots.txt
TelegramBot (like TwitterBot)
Automation Does not respect robots.txt
Testcrawler
Suspicious Does not respect robots.txt
The Knowledge AI
Suspicious Respects robots.txt
TprAdsTxtCrawler/1.0
Suspicious Does not respect robots.txt
Mozilla/5.0 (compatible; tracemyfile/1.0; +bot@tracemyfile.com)
Data Mining Does not respect robots.txt
Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0
Automation Does not respect robots.txt
Mozilla/5.0 (compatible; Twingly Recon; twingly.com)
Data Mining Respects robots.txt
UniversalFeedParser/5.2.1 +https://code.google.com/p/feedparser/
Automation Does not respect robots.txt
Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)
Automation Does not respect robots.txt
Mozilla/5.0 (compatible; VelenPublicWebCrawler/1.0; +https://velen.io)
Data Mining Does not respect robots.txt
VsuSearchSpider/1.0
Suspicious Respects robots.txt
W3C_Validator/1.3 http://validator.w3.org/services
Automation Does not respect robots.txt
Mozilla/5.0 (compatible; Wappalyzer)
Data Mining Does not respect robots.txt
Mozilla/5.0 (compatible; webtechbot; +https://www.webtechsurvey.com/bot)
Data Mining Respects robots.txt
Wget/1.20 (mingw32)
Automation Does not respect robots.txt
Who.is Bot
Suspicious Does not respect robots.txt
Mozilla/4.0 (compatible; Win32; WinHttp.WinHttpRequest.5)
Automation Does not respect robots.txt
Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1 (compatible; woorankreview/2.0; +https://www.woorank.com/)
SEO Does not respect robots.txt
www.deadlinkchecker.com Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
Automation Does not respect robots.txt
Xenu Link Sleuth/1.3.8
Automation Does not respect robots.txt
XTC
Malware Does not respect robots.txt
yacybot (/global; amd64 Linux 5.7.4; java 1.8.0_201; America/en) http://yacy.net/bot.html
Search Respects robots.txt
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Advertising Does not respect robots.txt
Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B411 Safari/600.1.4 (compatible; YandexMobileBot/3.0; +http://yandex.com/bots)
Search Does not respect robots.txt
Mozilla/5.0 (compatible; Yeti/1.1; +http://naver.me/spd)
Search Respects robots.txt
Mozilla/5.0 zgrab/0.x
Automation Does not respect robots.txt
ZoominfoBot (zoominfobot at zoominfo dot com)
Data Mining Respects robots.txt
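Since most entries above follow the same two-line pattern (a user agent string, then a category and a robots.txt status), the list can be turned into records with a few lines of Python. This is only a sketch under that assumption: the names `Entry`, `STATUS_LINE` and `parse_entries` are invented here, and the sample contains three entries copied from the list.

```python
import re
from typing import List, NamedTuple

# The three status phrases used throughout the list.
STATUS_LINE = re.compile(
    r"^(?P<category>.+?) (?P<status>Respects robots\.txt|"
    r"Does not respect robots\.txt|Unknown if it respects robots\.txt)$"
)


class Entry(NamedTuple):
    user_agent: str
    category: str
    status: str


def parse_entries(text: str) -> List[Entry]:
    """Pair each user agent line with the category/status line that follows it."""
    entries = []
    previous = None  # most recent line that was not a status line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = STATUS_LINE.match(line)
        if match and previous is not None:
            entries.append(Entry(previous, match["category"], match["status"]))
            previous = None
        else:
            previous = line
    return entries


SAMPLE = """\
curl/7.70.0
Automation Does not respect robots.txt
CCBot/2.0 (https://commoncrawl.org/faq/)
Data Mining Respects robots.txt
finbot
Suspicious Unknown if it respects robots.txt
"""

if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        if entry.status == "Does not respect robots.txt":
            print(entry.user_agent)
```

Lines that do not fit the pattern, such as the notes at the top of the page, are not handled specially, so real input may need cleaning first.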