Most webmasters will have at least one form on their site. And most of those forms will be targeted by spam bots looking for any way whatsoever to get links to their spurious offerings onto the World Wide Web. And it's not just one-offs either; once a form has been found, the abuse is relentless and, over time, downright irritating. Ask any web developer and they will talk to you with great vigour about spam and their vendetta against it.
There are many steps a webmaster can take to prevent unsolicited form submissions, and everyone has their own opinion as to which one works best. This post outlines several of the main methods, along with their respective pros and cons.
A CAPTCHA is an image that contains a verification code; the user must enter the code to prove they are not a spam bot. Since some spam bots have basic optical character recognition capabilities, some CAPTCHAs attempt to obscure the code by warping the characters or setting them against a ‘noisy’ background.
How effective is it? Since spam bots are acquiring increasingly sophisticated levels of character recognition, CAPTCHAs are being presented in increasingly obscured formats. The result is that while many of them are very effective in deterring spam, they are also very difficult for humans to read.
This simple method involves adding an extra text input element to your form. In an external style sheet you set the element to display: none; which makes it invisible to all users with CSS enabled. Since spam bots will usually fill in every field in a form, you know that any submission where this invisible field is not empty is spam.
How effective is it? In most scenarios, this method will filter out the majority of spam. However, many bots are wise to it, as it is relatively easy to parse the CSS and determine whether a field is hidden. The extra field will also display for users who have CSS disabled. To its credit, this is by far the most unobtrusive method, as most users will not even know a spam check is taking place.
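As a rough illustration of the server-side half of this check, here is a minimal Python sketch. It assumes the submitted form arrives as a dictionary of field names and values, and that the hidden field is called `website`; both the function name and the field name are made up for this example.

```python
from typing import Mapping


def is_honeypot_spam(form: Mapping[str, str], trap_field: str = "website") -> bool:
    """Return True when the hidden trap field contains anything.

    `trap_field` is a hypothetical name for the input hidden via CSS.
    Real visitors never see it, so it should arrive empty or not at all.
    """
    return form.get(trap_field, "").strip() != ""


# A bot that fills in every field gives itself away.
bot_submission = {"name": "Ann", "message": "Hi", "website": "http://spam.example"}
human_submission = {"name": "Ann", "message": "Hi", "website": ""}

print(is_honeypot_spam(bot_submission))
print(is_honeypot_spam(human_submission))
```

Giving the trap field an ordinary-looking name is a design choice rather than a requirement; the idea is simply that a bot has no obvious reason to skip it.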
Lots of forums use this method (in combination with a CAPTCHA). The form must ask for a valid email address; when the form is filled out, an email containing an activation link is sent to that address, and the user must click the link for the submission process to be completed. Spam bots may fill out the form, but since they don't give a valid email address, and they aren't human, they never complete the process.
How effective is it? This works very effectively but taints the user experience as further action is required from them even after they have submitted the form.
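The activation step described above can be sketched in a few lines of Python. This is only an outline under some assumptions: submissions are held in an in-memory dictionary standing in for a database, sending the email itself is left out, and the function names are invented for the example.

```python
import secrets

# In-memory stand-in for a database table of unconfirmed submissions.
pending_submissions = {}


def create_pending_submission(form):
    """Store the form and return the token to embed in the activation link."""
    token = secrets.token_urlsafe(32)  # long random value, impractical to guess
    pending_submissions[token] = dict(form)
    return token


def confirm_submission(token):
    """Return the stored form the first time a valid token is used, else None."""
    return pending_submissions.pop(token, None)


token = create_pending_submission({"email": "ann@example.com", "message": "Hello"})
# The emailed link would carry the token, e.g. /activate?token=<token> (hypothetical URL).
confirmed = confirm_submission(token)
```

Removing the entry on confirmation means each link works only once, and submissions that are never confirmed can simply be purged after a while.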
Add a simple question to your form; it can be anything that you’d assume everyone knows the answer to. For example, what colour is the sky? Then add an extra text input element to your form.
<input type="text" name="spamanswer" value="" />
Then add a hidden input element that states the answer.
<input type="hidden" value="blue" name="spamsolution" />
Then all you need to do is check whether the POST values of spamanswer and spamsolution are the same. If they are, it's a legitimate submission. Bear in mind that a bot can read the hidden input just as easily as it can read the question, so you can add further spam protection by removing the hidden input element and keeping its value server-side (in a text file or database). If you use this method, you should make sure that the answer is obvious to anyone, regardless of their academic background. You should also allow for common typing errors and ignore differences in letter case.
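Here is one way the server-side variant might look in Python, including the allowance for case and common typos. The question identifier, the list of accepted spellings and the function names are all assumptions made for this sketch; in practice the answers would live in the text file or database mentioned above.

```python
# Hypothetical server-side store: question id -> accepted answers,
# including a few common misspellings.
ACCEPTED_ANSWERS = {
    "sky_colour": {"blue", "bleu", "blu"},
}


def normalise(answer):
    """Lower-case the answer and collapse any stray whitespace."""
    return " ".join(answer.lower().split())


def is_correct_answer(question_id, submitted):
    """Compare the visitor's answer with the values kept on the server."""
    accepted = ACCEPTED_ANSWERS.get(question_id, set())
    return normalise(submitted) in accepted


print(is_correct_answer("sky_colour", "  Blue "))
print(is_correct_answer("sky_colour", "green"))
```

Because the solution never appears in the page source, a bot has to understand the question rather than copy a value from one field to another.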
How effective is it? Using a simple mathematical question such as ‘What is the sum of one plus seven?’ is arguably the most effective way of filtering out spam submissions. If you use words for the numbers (e.g. seven instead of 7) you will make it even harder for the spam bots to decipher. And since it only involves adding two numbers together, it is accessible even to people who don't speak your language as their first language.
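A question of this kind is easy to generate on the fly. The Python sketch below is one possible approach, not a definitive one: it assumes the expected total is kept server-side (for instance in the visitor's session) between showing the form and receiving the submission, and the function names are invented for the example.

```python
import random

NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen",
]


def make_question(rng=random):
    """Return (question text, expected total) using words for the numbers."""
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    question = f"What is the sum of {NUMBER_WORDS[a]} plus {NUMBER_WORDS[b]}?"
    return question, a + b


def check_sum(expected, submitted):
    """Accept the total typed either as digits ('8') or as a word ('eight')."""
    text = submitted.strip().lower()
    try:
        return int(text) == expected
    except ValueError:
        return NUMBER_WORDS[expected] == text


question, expected = make_question()
# Store `expected` server-side, show `question` in the form,
# then call check_sum(expected, posted_value) on submission.
```

Accepting both digits and words keeps the check forgiving for real visitors, in line with the advice above about allowing for how people actually type.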
There are other methods of course, and these can be combined for extra effectiveness. We should exercise caution, however. Any spam filtering we do should do just that: filter spam. The moment a legitimate visitor's submission gets sent to its destruction along with all the other spam, or the moment a legitimate visitor gives up trying to fill the form out, we've taken our vendetta against spam too far.