^*^%$$$!!! Referrer Spam Spiders!

I'm really starting to get annoyed with all the spiders I am getting through my site, simply so that they can put referrers in my web logs.

It wouldn't be such a big deal on its own, since it's pretty easy for me to pick out which referrers are real and which aren't. The real issue is that the spiders are badly written: they follow my links incorrectly and request broken URLs, which lands me about 10-15 error message emails a day!

They all use an Internet Explorer User Agent, so I can't stop them that way. I've tried banning them by IP, but they change IPs so often that it's almost not worth it.

I'm half considering putting some sort of CAPTCHA on my site just to kill these bots… but that would stop every good bot too (e.g. Google).

Any thoughts anyone?

Btw – sorry about the lack of posts, I'm currently in the process of rewriting CT using a new OO framework idea I've been toying around with, but more on that later…


Comments

  • Mark | December 17, 2004

    Thanks for that!

    I’m not on Apache, so .htaccess isn’t an option. But I’ve written some code that looks for a bad referrer and pushes it out to
    http://www.compoundtheory.com/banned.html

    It should do the trick quite nicely I think.

  • Matthew Bourke | January 24, 2005

    What about ya get a list of good bots’ IPs and do a simple:
    if ip in good bots list: ignore
    if not: captcha

  • Kevin | March 30, 2006

    Use the CAPTCHA, but allow the good spiders.
    Basically:
    if (captcha is good) or (spider in [good spiders])
        allow the crawl
    else
        bye-bye
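
The suggestions in the comments above (send bad referrers to a banned page, let known good spiders through, and CAPTCHA everyone else) can be combined into one small decision function. What follows is only a rough Python sketch, not the code actually running on this site. The function names, the example referrer host, and the example IP address are all made up for illustration, and the lists would need to be filled in from your own logs.

```python
from urllib.parse import urlparse

# Hypothetical example data: replace with hosts and IPs from your own logs.
BAD_REFERRER_HOSTS = {"spam.example"}
GOOD_SPIDER_IPS = {"192.0.2.10"}


def referrer_host(referrer):
    """Return the lowercase host of a Referer header, or '' if missing or unparsable."""
    if not referrer:
        return ""
    try:
        return (urlparse(referrer).hostname or "").lower()
    except ValueError:
        # urlparse can raise on malformed input; treat it as "no referrer".
        return ""


def is_bad_referrer(referrer):
    """True if the referrer's host is a listed spam host or one of its subdomains."""
    host = referrer_host(referrer)
    return any(host == bad or host.endswith("." + bad) for bad in BAD_REFERRER_HOSTS)


def decide(ip, referrer, captcha_passed):
    """Return 'banned', 'allow', or 'captcha' for a single request."""
    if is_bad_referrer(referrer):
        return "banned"   # e.g. redirect to a banned page
    if ip in GOOD_SPIDER_IPS or captcha_passed:
        return "allow"    # known good spider, or a visitor who solved the CAPTCHA
    return "captcha"      # everyone else gets challenged
```

A request with a referrer of http://www.spam.example/page would come back as 'banned' whatever its IP, while a request from a listed spider IP with no referrer would come back as 'allow'. Keep in mind that a fixed IP list goes stale quickly, so this is a starting point rather than a complete defence.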