Doable strategies against blog spam?

posted: May 20th, 2006 · by: Sven

in: Misc stuff · 1 comment

My old Wordpress blog had the option to turn on “moderated comments”. That meant I received an email notification for every comment, asking me to “please moderate this comment”. The email contained one link to instantly approve the comment and one to reject it.

Sigh. My blog has never been linked to so extensively that I’d have expected comment spam to become a problem in the first place. I was wrong. Wordpress seems to attract blog spam like a light bulb attracts flies on a midsummer evening. I received some hundred mail notifications per week, asking me to moderate cheap perfumes, pills and online bets (for the World Cup, lately). Finally my email spam filter caught on and started to sort out the notifications from my own blog. Hum.

Having switched over to Typo, I was curious about Typo’s anti-spam measures and came up with some thoughts about possible “joint-forces” strategies against blog spam.

In the Typo admin interface there’s a “blacklist” option, which claims to compare the user’s IP address to “local and remote blacklists”. The interface explains this as “Good defense against spam bots.”
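I haven't read Typo's code yet, so the following is not how Typo does it. It is only a minimal sketch of how a remote blacklist lookup commonly works with DNS-based blacklists: the IPv4 octets are reversed, the blacklist zone is appended, and the address counts as listed if that name resolves. The zone `dnsbl.example.org` and both function names are placeholders of mine.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name that a DNSBL lookup would query for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "dnsbl.example.org") -> bool:
    """Return True if the (hypothetical) zone has an A record for the address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name (or the lookup failed): treat the address as not listed.
        return False
```

A "local" blacklist could simply be a set of addresses that is checked before the DNS query is made.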

You’re also advised not to “allow non-ajax comments”, which means that comments will only be accepted when sent via Ajax (which most spammers don’t seem to be capable of).

I don’t know much about the blacklist option yet. I think I’ll read the code. But I wonder why there’s any need for the non-ajax comments option at all if the IP comparison really does its job well.

While I was thinking about this, someone posted an announcement for a Rails act_as_classifiable plugin (http://cuttingtheredtape.blogspot.com/2006/05/actsasclassifiable.html) to the RoR mailing list.

Hmmm. I wonder if there’s been any attempt to accomplish something like this:

  • Manually confirmed Typo blogs build up a “trusted blogs net” featuring a distributed notification service about obvious spam requests. By obvious I mean things like requests for Wordpress files or other files that don’t exist on a Typo blog, or comments that were manually moderated as spam. When there’s any such request on any Typo blog, it will be made visible to all other Typo installations by distributing it somehow (either through a central server or some kind of peer-to-peer approach).
  • Each such HTTP request will be analysed (i.e. “learned”) as a whole (like Bad Behaviour, http://www.homelandstupidity.us/software/bad-behavior/, does it) by a Bayesian filter as “bad/spam”, so that each blog contributes to the Bayesian knowledge of the Typo blog net as a whole.
  • This knowledge will be published back to the blogs so that each blog can decide on additional measures, like actually blocking the whole request, even a GET, with a 412 error (“you’re not allowed to read my blog.”) when a request fails a local Bayesian test.
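To make the “learn whole HTTP requests” idea a bit more concrete, here is a rough sketch of a naive Bayes filter over request text. It assumes a request is available as one string (request line plus headers and body) and tokenizes it very crudely. The class and method names are made up for illustration; this is neither what Bad Behaviour nor what any Typo plugin actually does.

```python
import math
import re
from collections import Counter


class RequestFilter:
    """Tiny naive Bayes classifier with Laplace smoothing (illustrative only)."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(request_text):
        # Crude tokenizer: lowercase runs of letters and digits.
        return re.findall(r"[a-z0-9]+", request_text.lower())

    def learn(self, request_text, label):
        """Record one request as 'spam' or 'ham'."""
        self.tokens[label].update(self.tokenize(request_text))
        self.docs[label] += 1

    def spam_probability(self, request_text):
        """Estimate P(spam | request) from the token counts learned so far."""
        vocab_size = len(set(self.tokens["spam"]) | set(self.tokens["ham"]))
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed class prior.
            log_p = math.log((self.docs[label] + 1) / (total_docs + 2))
            total_tokens = sum(self.tokens[label].values())
            for tok in self.tokenize(request_text):
                count = self.tokens[label][tok]
                log_p += math.log((count + 1) / (total_tokens + vocab_size + 1))
            log_score[label] = log_p
        # Turn the log-odds into a probability; clamp to avoid overflow in exp().
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

In the shared setup described above, the token counts (not the raw requests) would be the “knowledge” that blogs exchange. How to merge counts from many blogs safely is an open question that this sketch does not address.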

This way, either of the following conditions would cause a whole HTTP request to be marked/learned as “bad”:

  • it comes from an IP address listed as a spam source on a remote blacklist
  • it has been moderated manually as spam

HTTP requests that come from IP addresses used by clients who occasionally log in will be learned as “good”.
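These labelling rules are simple enough to write down directly. The sketch below assumes the blacklist hits and the logged-in addresses are available as plain sets. The function name and parameters are hypothetical, and letting the “bad” conditions win over the “good” one is my own assumption, since the rules above don't say what happens when both apply.

```python
from typing import Optional, Set


def training_label(
    ip: str,
    moderated_as_spam: bool,
    blacklisted_ips: Set[str],
    logged_in_ips: Set[str],
) -> Optional[str]:
    """Decide how a request should be learned: 'spam', 'ham', or not at all."""
    # Either "bad" condition is enough (assumption: "bad" takes precedence).
    if ip in blacklisted_ips or moderated_as_spam:
        return "spam"
    # Addresses that are also used by logged-in clients count as "good".
    if ip in logged_in_ips:
        return "ham"
    # Everything else stays unlabelled and is not learned.
    return None
```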

This approach could also be used to greatly enhance a comment moderation queue, I think. Depending on the result of the Bayesian analysis, the notification mails could include a marker in the subject line indicating that the system requires human feedback for this comment. Clicking on a given URL in the mail would learn/unlearn the comment as good/bad. Comments that have been recognized as “good” could be published immediately.
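As a sketch of how the moderation queue could use the filter's result: given a spam probability between 0 and 1, decide what to do with the comment and which marker to put into the subject line of the notification mail. The thresholds, the action names and the marker strings are all placeholders I picked for illustration.

```python
def triage(spam_probability, publish_below=0.1, hold_above=0.9):
    """Map a spam probability to an (action, subject_marker) pair."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    if spam_probability < publish_below:
        # Confidently "good": publish immediately, the mail is just a notice.
        return ("publish", "[published]")
    if spam_probability > hold_above:
        # Confidently "bad": keep it out of sight, no feedback needed.
        return ("hold", "[probably spam]")
    # Uncertain: ask the blog owner, whose click then trains the filter.
    return ("ask", "[feedback needed]")
```

Only the mails marked as needing feedback would have to be read at all, which is the point of the exercise.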

Newly “learned” spam sources could be published to services like Akismet, SURBL or DNSBL.

Also there could be an additional interface to allow for complaints when your comment has been treated as “bad” by the system.

Another thought:

I don’t know enough about how the majority of spam bots work, but I assume that they use some “fire and forget” strategy, just issuing the POST request to several known “candidate” URLs.

If so, this would allow for a very basic additional measure: simply add some kind of “please confirm your comment” page or (if Ajax is used) an alert box. Someone who just “fires and forgets” would fail this handshake.
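One way such a confirmation handshake might be built, purely as a sketch: the first POST is answered with a signed token, and the comment is only accepted when a second request sends the same comment back together with a valid, recent token. The secret, the token format and the ten-minute limit are assumptions of mine, not anything Typo provides.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical per-blog secret, must not be published


def _sign(comment, issued_at):
    """HMAC over the issue time and the comment text."""
    message = f"{issued_at}:{comment}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def issue_token(comment, now=None):
    """Step 1: answer the first POST with a token for the confirm page."""
    issued_at = int(time.time() if now is None else now)
    return f"{issued_at}:{_sign(comment, issued_at)}"


def confirm(token, comment, now=None, max_age=600):
    """Step 2: accept the comment only with a valid, unexpired token."""
    current = time.time() if now is None else now
    try:
        issued_str, signature = token.split(":", 1)
        issued_at = int(issued_str)
    except ValueError:
        return False  # malformed token
    if not 0 <= current - issued_at <= max_age:
        return False  # expired, or issued in the future
    expected = _sign(comment, issued_at)
    # Constant-time comparison of the two signatures.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
```

A bot that only fires the first POST never reaches `confirm`, so its comment is never stored. A bot that does follow the second step would of course pass, which is where the honeypot idea comes in.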

If not, this would on the other hand allow sending the spammer to a honeypot, which could make him wait half an hour or so for the request to complete while logging as much information as possible. (I’ve read about 1&1 having been asked to provide their logs for a lawsuit against a spammer, but can’t remember the details.)


1 Comment

  1. Chris said June 16th, 2006 at 10:01 PM  

    I just found this page ... which I think is a nice overview.

    Too Biased mentions that there's (been?) some kind of "Sophisticated Spam protection" in Typo 2.0 ... would probably be interesting to check this out?


Sven Fuchs, artweb design (http://www.artweb-design.de)