How to add a floodgate to Mephisto's nearly perfect spam protection

posted: April 4th, 2007 · by: Sven

in: Programming

Mephisto comes with Akismet support baked right into its heart, and it works like a charm. I’ve been totally pleased with how it spots nearly every spam comment and collects them for later review and bulk deletion.

Now, 30 days later, I found it was time to clean up the piles of spam comments that Mephisto had hunted down for me. Nearly 900 comments had piled up. Yeah, that’s 30 per day. I don’t think that’s much compared to what others receive.

But to me, 900 spam comments in 30 days is too much. I don’t want to spend the time manually checking 900 comments, even if I only have to do it once a month. No way.

So I revisited an idea that Damien Katz called a “negative captcha”. I realized that I had actually had experiences with this type of “protection” before, and they had been pretty good.

[Update]

This little anti-spam trick has been so effective that I have had no blog comment spam to sort out for months (still counting). I therefore decided to “upgrade” to a slightly more sophisticated version (which lets commenters add an email address again) and revamped the whole thing as an easily distributable Mephisto plugin instead of two shaky patches.

I’ve now added an article about the new plugin: “Inverse Captcha Anti-Comment-Spam Technique: Now A Regular Mephisto Plugin”.

You may also want to refer to this page for additional information: Mephisto Inverse Captcha Anti-Comment-Spam Plugin.

The idea

When I was running Typo last year I received almost no spam at all! The reason was that Typo had a preference that allowed you to accept only comments that were posted through Ajax. None of the default dumb-ass spam bots seemed to recognize this and act accordingly. Of course a moderately skilled spammer could easily have cracked this “protection” … but actually nobody seemed to care, and that’s the point.
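To make the Ajax-only idea concrete, here is a minimal plain-Ruby sketch. It assumes the check looks at the `X-Requested-With: XMLHttpRequest` header that Prototype and similar libraries send with Ajax requests (Rails exposes the same test as `request.xhr?`). I don’t know exactly how Typo implemented its preference; `ajax_request?` is a hypothetical helper working on a plain headers hash, not Typo’s code.

```ruby
# Hypothetical helper, not Typo's actual implementation.
# Ajax libraries such as Prototype add "X-Requested-With: XMLHttpRequest";
# a dumb bot that simply POSTs the HTML form does not send this header.
def ajax_request?(headers)
  value = headers.fetch("X-Requested-With", "").to_s
  # Case-insensitive match, similar to what Rails' request.xhr? does.
  !!(value =~ /XMLHttpRequest/i)
end

# Accept a comment only when it arrived via Ajax.
def accept_comment?(headers)
  ajax_request?(headers)
end

accept_comment?({ "X-Requested-With" => "XMLHttpRequest" }) # => true
accept_comment?({ "Content-Type" => "application/x-www-form-urlencoded" }) # => false
```

Note that the header is trivially forgeable, which matches the point above: it only stops bots that nobody bothered to adapt.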

With this in mind and for a quick test run I monkeypatched an “inverse captcha” mechanism right into this very Mephisto installation that you’re looking at. (I prefer the term “inverse” because “negative” sounds negative.)

What happens is simply that the email form field is hidden with plain CSS. A normal user won’t see it, but bots will fill it in. If a human user happens to read my blog with some kind of super-exotic browser and wants to comment, he will also see a short notice advising him not to fill in the email field. So this fallback mechanism even counts as a simple form of Turing test.

Of course this is not perfect. I hope, though, that it will act as a front flood gate that keeps out the vast majority of dumbness, and in light of my Typo experience that’s something like 99% of all spam.

The rest will be spotted by Akismet and packed away by Mephisto anyway, just as it happens now! So I hope to find only a very small number of spam comments in my admin section after another 30 days, and I’ll happily check and delete them then. Whatever the results, I’ll keep you posted :-)

How to implement something like this?

That’s super easy. On the controller side, all you need to change is one line (line 47) in the dispatch_comments action of MephistoController:

@comment.save!

now reads:

@comment.save! if @comment.author_email.blank?

(One might argue that this behaviour counts as a business rule, and because we want to marry “fat models” with “thin controllers” we’d better move it to the Comment model. That’s right. But we can leave this for later refactoring and go with the “simplest thing that could possibly work” for now.)
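For illustration, this is roughly what that later refactoring could look like. It is a minimal plain-Ruby sketch, not Mephisto’s ActiveRecord Comment model: the class is a stand-in, and `spam_trap_triggered?` and `save_unless_trapped` are hypothetical names. The emptiness check mirrors Rails’ `blank?` (nil or whitespace-only counts as empty).

```ruby
# Plain-Ruby stand-in for the Comment model (NOT ActiveRecord).
# The inverse-captcha rule lives in the model; the controller just calls it.
class Comment
  attr_reader :author, :body, :author_email

  def initialize(author:, body:, author_email: nil)
    @author = author
    @body = body
    @author_email = author_email
    @saved = false
  end

  # The hidden email field is the trap: humans leave it empty, dumb bots fill it in.
  def spam_trap_triggered?
    !(author_email.nil? || author_email.strip.empty?)
  end

  # Stand-in for save!: "persists" only if the trap was not triggered.
  # Returns true when saved, false when the comment is silently dropped.
  def save_unless_trapped
    return false if spam_trap_triggered?
    @saved = true # a real model would write to the database here
  end

  def saved?
    @saved
  end
end
```

The controller line would then shrink to something like `@comment.save_unless_trapped`, and the rule can be tested without going through the controller.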

Here’s a patch that does just this: inverse_captcha.diff

But obviously this alone would make most of your users run into a concrete wall. So we also need to update the view and hide the author_email field in the comment form with CSS like this:

#comment-email {
  display: none;
}   

Also, as mentioned above, I’ve added a short notice asking people not to fill in the email field. I don’t expect it will ever be seen at all. But if it is, it keeps people from being locked out for no obvious reason.


<p id="comment-email">
  If you can read this, you don't use a typical web browser that plays 
  nice with CSS. <br />
  <strong>Please do not fill in an e-mail address then!</strong><br />
  {{ form.email }} <label for="author_email">E-Mail</label>
</p> 

Here’s another patch that I’ve filed away for my personal backup: inverse_captcha_theme.diff. Obviously it will only work with my own theme.

What do you think?

PS: If you’re interested in the results of this experiment you might want to have a look at my follow-up article: Report: 30 days with no blog spam on Mephisto!


5 Comments

  1. Tim said April 4th, 2007 at 01:33 PM  

    That’s cool. Thanks!

  2. ember said April 4th, 2007 at 08:50 PM  

    hm

    i wonder if this will work with a bot who acts like a real user-agent?

    btw, in my wp-blog spam dropped to near zero after i installed a hashcash plugin.

    http://elliottback.com/wp/archives/2005/10/23/wordpress-hashcash-30-beta/

    kudos,

    ember

  3. ember said April 4th, 2007 at 08:53 PM  

    … and i hate to say this (and it's a little bit OT), but the timestamps for comments are broken: they are always the same as the creation time of the original article.

    bye,

    ember

  4. Sven said April 4th, 2007 at 10:17 PM  

    Hi ember!

    No, of course this won’t work for all bots. Like I said: I want it to lock out the masses of dumb bots. Everything that gets beyond this “outer flood gate” will be picked up by Mephisto’s Akismet integration anyway. Akismet successfully flagged 887 out of 889 spam comments within the last 30 days. I think that counts as “practically airtight”, at least from my point of view.

    I’ve had a short look at the plugin you mention, and yes! From my experience with that Typo Ajax thing, I agree that it will work against almost everything as well. But, since this was your criticism … I’d suspect the hashcash thing is just as vulnerable to a “real user-agent” bot, wouldn’t you? And it also excludes “all those” people who have JavaScript disabled.

    Regarding the timestamps: yeah. Thanks for spotting that!

  5. Sven said April 5th, 2007 at 08:03 PM  

    That stupid timestamp/timezone bug should be fixed now. This probably makes a good candidate for another “I18n mistakes” installment:

    TZInfo::Timezone.get('Europe/Berlin').utc_to_local(Time.now.utc) does not return ‘Thu Apr 05 18:00:00 CEST 2007’ as one would expect, but ‘Thu Apr 05 18:00:00 UTC 2007’ (which IMO is just plain wrong).


artweb design
Sven Fuchs
Grünberger Str. 65
10245 Berlin, Germany


http://www.artweb-design.de

Fon +49 (30) 47 98 69 96
Fax +49 (30) 47 98 69 97