Despite a robots.txt directive disallowing BDFetch, Brand Dimensions has downloaded hundreds of my original pages with photos on them. None of these pages mention any company's brand names, so I'm curious what BD is really doing. I'm guessing they could also provide some serious competitive intelligence to their clients. I just wonder what happens when they represent competing companies, like Coke and Pepsi. Here are some representative entries from my log files:
/var/log/httpd/access_log.1:72.14.164.139 - - [11/Aug/2009:07:25:27 -0400] "GET /carleton/reunionweb/WebPage-Full.00001.html HTTP/1.1" 200 1394 "-" "LinkWalker/2.0"
/var/log/httpd/access_log.1:72.14.164.140 - - [11/Aug/2009:07:25:42 -0400] "GET /carleton/reunionweb/WebPage-Full.00015.html HTTP/1.1" 200 1468 "-" "LinkWalker/2.0"
/var/log/httpd/access_log.1:72.14.164.197 - - [11/Aug/2009:07:25:57 -0400] "GET /carleton/reunionweb/WebPage-Full.00018.html HTTP/1.1" 200 1468 "-" "LinkWalker/2.0"
/var/log/httpd/access_log.1:72.14.164.157 - - [11/Aug/2009:07:26:12 -0400] "GET /carleton/reunionweb/WebPage-Thumb.00023.html HTTP/1.1" 200 3648 "-" "LinkWalker/2.0"
/var/log/httpd/access_log.1:72.14.164.179 - - [11/Aug/2009:07:26:27 -0400] "GET /carleton/reunionweb/WebPage-Full.00013.html HTTP/1.1" 200 1468 "-" "LinkWalker/2.0"
/var/log/httpd/access_log.1:72.14.164.193 - - [11/Aug/2009:07:26:42 -0400] "GET /skiing/webdest/WebPage-Full.00011.html HTTP/
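If you want to see how many hits a crawler like this is making, a short script can tally the log by user agent and IP. This is only a rough sketch: it assumes Apache's combined log format with an optional "filename:" prefix (as grep adds when searching several files), and the names LOG_RE, summarize and SAMPLE_LINES are made up for illustration. Lines that don't match, such as the truncated last entry above, are simply skipped.

```python
import re
from collections import Counter

# Combined log format, optionally prefixed by "filename:" as grep prints it.
# Assumption: IPv4 client addresses, as in the entries above.
LOG_RE = re.compile(
    r'^(?:(?P<file>[^:\s]+):)?'
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarize(lines):
    """Count hits per user agent and per client IP; skip unparseable lines."""
    agents = Counter()
    ips = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m is None:
            continue  # truncated or malformed entry
        agents[m["agent"]] += 1
        ips[m["ip"]] += 1
    return agents, ips


# Two complete entries and the truncated one, copied from the log above.
SAMPLE_LINES = '''\
/var/log/httpd/access_log.1:72.14.164.139 - - [11/Aug/2009:07:25:27 -0400] "GET /carleton/reunionweb/WebPage-Full.00001.html HTTP/1.1" 200 1394 "-" "LinkWalker/2.0"
/var/log/httpd/access_log.1:72.14.164.140 - - [11/Aug/2009:07:25:42 -0400] "GET /carleton/reunionweb/WebPage-Full.00015.html HTTP/1.1" 200 1468 "-" "LinkWalker/2.0"
/var/log/httpd/access_log.1:72.14.164.193 - - [11/Aug/2009:07:26:42 -0400] "GET /skiing/webdest/WebPage-Full.00011.html HTTP/
'''.splitlines()

agents, ips = summarize(SAMPLE_LINES)
for agent, hits in agents.most_common():
    print(hits, agent)
```

To run it against a real log, pass an open file instead of SAMPLE_LINES, for example summarize(open("/var/log/httpd/access_log.1")).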
Brand Dimensions switched the name of their bot to sidestep robots.txt directives. Based on my own Google Analytics info, I can safely say a lot of people are interested in what Brand Dimensions is doing and how to stop it. More LinkWalker info here. Other webmasters report that the LinkWalker agent is also used by spambots harvesting email addresses for phishing attacks and the like.
Here are my latest robots.txt lines:
User-agent: BDFetch
Disallow: /

User-agent: BPImageWalker
Disallow: /

User-agent: VoilaBot
Disallow: /

User-agent: LinkWalker/2.0
Disallow: /

User-agent: LinkWalker
Disallow: /
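To double-check that rules like these say what you mean, you can feed them to the robots.txt parser in Python's standard library. This is a minimal sketch, and is_allowed and ROBOTS_TXT are names I made up for it. Keep in mind it only shows what a crawler that honors robots.txt would do; it says nothing about a bot that ignores the file.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post, one record per user agent.
ROBOTS_TXT = """\
User-agent: BDFetch
Disallow: /

User-agent: BPImageWalker
Disallow: /

User-agent: VoilaBot
Disallow: /

User-agent: LinkWalker/2.0
Disallow: /

User-agent: LinkWalker
Disallow: /
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if a compliant crawler with this user agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


page = "/carleton/reunionweb/WebPage-Full.00001.html"
for agent in ("BDFetch", "LinkWalker/2.0", "ExampleBrowser/1.0"):
    # ExampleBrowser/1.0 is a hypothetical agent that no record names.
    print(agent, is_allowed(ROBOTS_TXT, agent, page))
```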
I only know Unix, so here is what I use in .htaccess:
# domains: refuse requests referred from these sites
RewriteEngine On
RewriteCond %{HTTP_REFERER} cbwatch\.com [NC,OR]
RewriteCond %{HTTP_REFERER} copyscape\.com [NC]
RewriteRule ^(.*)$ - [F]
# user agents: tag unwanted bots, then deny anything tagged
BrowserMatchNoCase LinkWalker\/2\.0 bad_bot
BrowserMatchNoCase Mon_httpDownload bad_bot
BrowserMatchNoCase ZmEu bad_bot
Order Deny,Allow
Deny from env=bad_bot
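Before putting patterns like these on a live server, it can help to try them against a few sample user-agent and referer strings. The sketch below only imitates the matching logic in Python (case-insensitive, unanchored regex search). It is not Apache, would_block and the two pattern lists are hypothetical names, and I have written the literal dots escaped. Apache uses PCRE, which should behave the same as Python's re for patterns this simple.

```python
import re

# Patterns mirroring the BrowserMatchNoCase and RewriteCond lines above.
BAD_AGENT_PATTERNS = [r"LinkWalker/2\.0", r"Mon_httpDownload", r"ZmEu"]
BAD_REFERER_PATTERNS = [r"cbwatch\.com", r"copyscape\.com"]


def would_block(user_agent="", referer=""):
    """Return True if either header matches one of the block patterns."""
    ua_hit = any(
        re.search(p, user_agent, re.IGNORECASE) for p in BAD_AGENT_PATTERNS
    )
    ref_hit = any(
        re.search(p, referer, re.IGNORECASE) for p in BAD_REFERER_PATTERNS
    )
    return ua_hit or ref_hit


print(would_block(user_agent="LinkWalker/2.0"))
print(would_block(user_agent="LinkWalker/3.0"))  # hypothetical later version
print(would_block(referer="http://www.copyscape.com/"))
```

One thing this makes easy to see: a pattern pinned to LinkWalker/2.0 will not catch a bot that reports a different version number, so matching on LinkWalker alone may be the safer choice.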
robots.txt is for nice bots