134

Hi folks. When our programmer commented that "our webform needs a captcha", I felt it was time, in 2011, to rethink this old dilemma... There must be (or I must invent) a better solution! Why?

Factual Problem

Most of my clients hate CAPTCHAs. My clients are typically male and female managers and decision makers aged 40 to 60, not nerds like myself. Now, admittedly, I often feel like a robot myself: I have to squint and read those meaningless obfuscated letters in the CAPTCHA... Sometimes I even fail when filling them in! Go back, etc. If that is a turnoff for me, just imagine what it must be like for human customers. So: shouldn't forms have better A.I. built in by now to smell the difference between spammy robots and real human visitors/clients?

The Big Picture

Tell the difference between human and robot by awarding credibility points: 100 points = human, 0 points = robot.

  • AWARD human credibility points for:
    • human mouse movements that don't follow any mathematical patterns
    • non-instantaneous reading delays between page load and the first input(s) in the form
    • when typing in the form, delays measured for letters, spaces and word completions
    • typical human hesitations and behaviors: deleting, rephrasing, etc.
    • the global flooding threshold not being reached (number of total submissions within 1 hour)
  • RETRACT credibility points for:
    • suspiciously instant pasting / completion of one or more form fields
    • website hyperlinks in the form (very spammy and uncommon in most forms)
    • a single IP quickly reading many, many pages with <1 sec. viewing time on all no-form pages
    • a URL/email field (present in the HTML, but invisible to humans) being populated or non-empty
    • the visitor's IP appearing in a worldwide network of DBs of spammy sites

When credibility is more than 50% human, allow the form to be sent without a captcha. If less than 50%, show an easy captcha puzzle. Less than 25%? Show a difficult captcha, like today's default eye-squeezing nonsense-word captchas.
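As a rough sketch in plain JavaScript (all signal names, weights and thresholds below are made up for illustration, not a tested tuning):

    // Hypothetical credibility scoring: each detected signal adds or removes points.
    function credibilityScore(signals) {
      var score = 0;
      if (signals.humanMouseMovement)        score += 25;
      if (signals.readingDelayBeforeInput)   score += 20;
      if (signals.humanTypingRhythm)         score += 25;
      if (signals.editsAndCorrections)       score += 15;
      if (!signals.globalFloodThresholdHit)  score += 15;

      if (signals.instantFieldCompletion)    score -= 30;
      if (signals.hyperlinksInForm)          score -= 25;
      if (signals.rapidPageCrawling)         score -= 25;
      if (signals.honeypotFieldFilled)       score -= 50;
      if (signals.ipOnSpamBlacklist)         score -= 40;

      return Math.max(0, Math.min(100, score));
    }

    function challengeFor(score) {
      if (score > 50)  return 'none';   // treat as human, accept the submission
      if (score >= 25) return 'easy';   // simple picture/question captcha
      return 'hard';                    // classic distorted-text captcha
    }

Of course anything measured client-side can be forged, so the signals themselves would still have to be collected and sanity-checked on the server.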

Currently websites assume we are 0% human by default, unless we prove otherwise.
I feel it's time to reverse that false prejudice!

Imagine the user-friendliness

Your site distinguishing itself from others, showing your audience that your site KNOWS the difference between a robot and a human. Imagine the advantage. I am trying to capture the essence of that distinguishing edge.

Programming Question

Obviously this question is centered around inventive ideas and new A.I. code. Let us, for a minute, not think in terms of existing .js, .css, .php, .cfm etc., but first try to distinguish human behavior from spam-server behavior, and then think of simple, smart ways that provide a better, more user-friendly alternative than forcing your clients/visitors to solve CAPTCHAs.

The bounty for my question goes to the most elegant/clever/simple answer. Important side note: sometimes creative new ideas or questions are quickly or jealously closed as off-topic due to a lack of existing old answers. So was the case with my question! As an art director I can say that this is the greatest bottleneck to innovation and progress. Fortunately some angels voted for reopening and now it's a wiki! Thanks for not accepting yesteryear's remedies, and for pushing innovation forward!

91

I tend to include text fields (later visually hidden or obscured with Javascript) with "name" parameters like "email", "url" and "name". Spam bots always fill these in. Your users won't, because the fields are hidden. If the fields are filled in, your submission came from a spambot. Easy!
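A minimal sketch of that honeypot, assuming a decoy field named "url" and a generic server-side form handler (all names are illustrative):

    // Markup (visible to naive bots, hidden from humans):
    //   <input type="text" name="url" id="url_hp" autocomplete="off">
    // Client side: hide the decoy field so real users never see or fill it.
    document.addEventListener('DOMContentLoaded', function () {
      var decoy = document.getElementById('url_hp');
      if (decoy) decoy.style.display = 'none';   // or shift it off-screen with CSS
    });

    // Server side (generic handler sketch): reject any submission where the
    // decoy field arrived non-empty; humans never saw it, so only bots fill it.
    function isSpambotSubmission(formFields) {
      return typeof formFields.url === 'string' && formFields.url.trim() !== '';
    }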

44

I think a good solution will incorporate several VERY easy and NOT time consuming things for a human to do.

@Justin Niessner, I REALLY like the idea of an image-based captcha. Have a huge list of nouns and images for each noun:

  • Orange
  • Apple
  • Car
  • Motorcycle
  • Dog
  • Cat

Then ask the person what they see. If you are clever, you could include verbs. Show a picture of a female jogger, and you can ask WHAT is in the picture (woman, jogger, etc...) or ask what she is doing (jogging, running, etc...), with multiple-choice responses.

I also like the idea of answering simple questions:

  1. What is the sum of 11 and 13?
  2. What time is it (with Javascript clock)?
  3. How many primary colors are there? (with "RED, GREEN and BLUE" in parentheses... a bot won't get it)
  4. Who is the current U.S. President? (might anger some people, but most know the answer)
  5. Uncheck this box if you are a human [CheckBox with randomized ID]

If you want to make it less annoying, any or all of these could be multiple choice. Display your math answers as words, not numbers, to make solving them more trouble.

To take the simple question idea a step further, you could obfuscate the text of your questions and then decrypt them with JavaScript. This makes solving the "sum of eleven and thirteen" that much harder. Check out the following page and look at the "debfus" function for an idea on how this works: http://www.cpuboards.com/contact. If you search the source on that page, you will not see "sales@cpuboards.com." But if you click the link for that email address on the right, your mail client will start an email to that address, because the JS decrypts all the mess in the source so your browser can read it. Any bot that cannot do that process will fail.
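I don't know exactly how that site's "debfus" function works, but as a rough sketch of the same idea you could ship the question in scrambled form and only unscramble it with JavaScript (the element id is hypothetical):

    // The plain question never appears in the HTML source; only a reversed copy does.
    var obfuscated = '?neetriht dna nevele fo mus eht si tahW';

    document.addEventListener('DOMContentLoaded', function () {
      // Reverse the string client-side; a bot that does not execute JavaScript
      // only ever sees the scrambled version.
      var question = obfuscated.split('').reverse().join('');
      document.getElementById('challenge-label').textContent = question;
    });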

I think tracking mouse movements is a bad idea. I never use my mouse on a form, other than to put the cursor in the first box. After that it is strictly tabbing from field to field, so I would likely fail that one.

Time spent on page is another good one... say at least 1 or 2 seconds per required field, although if you use an auto-complete tool this might be a problem.

Still another idea is to require registration, but allow you to "post" your comment before confirming the registration so it is less annoying.

  1. Fill out form, with comment or whatever. Require email address with VERY SHORT AND CLEAR message that "we will not send you spam."
  2. Comment is "staged" but not displayed.
  3. User clicks confirmation link.
  4. Comment appears on site.

This encourages you to go ahead and make your remark, without all the interruption of Registering with a long form, opening your mail client, clicking a link, going back to the original page you wanted to comment on, logging in, etc... Just fill out your comment, and click a link the next time you check your email.

There will always be somebody who gets into your form that you don't want posting there. I think the bottom line is just to not leave it wide open. People don't really like captchas, and I think there are PLENTY of simple questions out there you could ask to beat a bot. 2-3 multiple choice would likely do the trick in most cases.

If your site is really popular (Youtube, Facebook, Etc...) I think you are going to have to change the method you use on a regular basis. People will figure damned near everything out.

42

How could this not have been posted:

And what about all the people who won't be able to join the community because they're terrible at making helpful and constructive co-- ... oh.

(Sorry, I couldn't resist)

33

It's not CAPTCHAs that are annoying, but rather having to deal with them repeatedly for every little thing. They become exponentially annoying when they are hard enough for humans that you have to press their refresh button to get one that you can actually read.

Most tests that you described (which, by the way, are also a form of CAPTCHA) have a major problem: they can be defeated by recording the actions of a human user, or even by simulating some of them. Simulating random mouse movement and typing behaviour is quite simple for a motivated spammer. And any moderately popular site will attract motivated spammers.

The key word here is motivated. For spammers everything is a balance between effort and profit. You don't really need to distinguish computers from humans; what you actually need to do is make spamming through your site costly enough (in effort or money) that spammers won't use it.

If, hypothetically, your site allows a spammer to send messages all over the world with no limit or possibility of termination, then they could very well have an actual human being fill in the account forms.

In my opinion the best solution is to use an account system where a new account has only minimal capabilities and gains more as it goes along. You could do that by moderating new accounts, having peers provide feedback, using a decreasing amount of CAPTCHAs or some other heuristic. You should also use heuristics to track suspect behaviour, such as creating posts en masse.

While this will inconvenience new users for a while, it will allow frequent users to go about their work without unwelcome interruptions of the "are you really a human?" variety.

By the time the account has proven itself to belong to a human, the effort involved will have discouraged most spammers. The few persistent ones will be far easier to weed out. StackOverflow uses such a system (and quite successfully).

If having accounts is not possible, then a CAPTCHA in some form is the way to go, so that your site will not be hammered by the spammers' botnets...

...or you could simply ask people for a moderate fee :-)

26

One alternative that no one has mentioned is Hashcash. It was designed to make email spam a costly action, but it can also be adapted to the web paradigm. Here is how it works:

  1. user opens the webpage, as soon as the document is ready a JavaScript code kicks in
  2. the JavaScript generates a hash (MD5 or SHA-1) from two salts (one static, for instance the domain name of the site, and one dynamic, a token that exists in a cookie or hidden input field) plus an incremented number
  3. if the first x chars of the hash are not 0's, the JavaScript increments the number and computes the hash again; once a hash with x initial 0's is found, it places the string in a hidden field
  4. the server hashes the string in the hidden field and if the hash is composed of x initial 0's it's valid

It's trivial for the server to verify this, but it usually (depending on the number of 0's) requires the client to compute tens or hundreds of thousands of hashes before finding one that is valid. Since this is a process that takes time (and a JavaScript interpreter or a custom-written bot), it won't appeal to bots very much, while normal users won't even notice, since the hashes are being generated while they read and fill in the form / page. Also, even if a bot really wants to cheat, the speed at which it can do so is reduced.
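A rough sketch of the client-side loop, assuming a sha1() helper that returns a hex string (e.g. from a small JavaScript SHA-1 library) and hypothetical field ids:

    // Mint a hashcash-style stamp: domain is the static salt, token is the
    // per-form salt (from a cookie or hidden field), difficulty is the number
    // of leading zero characters required in the hex digest.
    function mintStamp(domain, token, difficulty) {
      var prefix = new Array(difficulty + 1).join('0');   // e.g. "0000"
      var counter = 0;
      var candidate;
      do {
        counter++;
        candidate = domain + ':' + token + ':' + counter;
      } while (sha1(candidate).slice(0, difficulty) !== prefix);
      return candidate;   // the server re-hashes this once and checks the zeros
    }

    // Runs while the visitor reads and fills in the form.
    var token = document.getElementById('hashcash_token').value;   // hypothetical field
    document.getElementById('hashcash_stamp').value =
        mintStamp('example.com', token, 4);

The server only has to hash the submitted string once and check the leading zeros, which is exactly the asymmetry described above.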


As for the A.I. idea: don't waste time on it. If you could build it, someone would work around it.

24

I dare say that if it were that easy to make, it would already have been done.

real mouse movements

What stops you from recording one mouse movement and then using it to jump to fields? Probably slightly altering it? Adding random movement here and there?

real reading and typing delays (no copy-pasting or instant filling, but real keyboard presses...)

What about those poor folks who use auto completion?

The hard truth is, everything you can send to the server can be generated as well by an automated script. Unless it is something which requires some human traits (abstract thinking, common knowledge, semantic understanding of the text, complex recognition [how the hell can this be called?]) it won't be too difficult for someone to write a bot breaking this system.

19

Easy: if someone reads too many old pages with low view counts in a short time, it's a bot.

16

Inspired by wufoo.com's smart CAPTCHA, I helped work on a module for the Drupal content management system, called CAPTCHA After. The idea is similar to what you describe: the module only shows the CAPTCHA when triggered by suspicious behavior. The module currently supports the following triggers:

  • submit threshold (number of non-validating submissions per user)
  • flooding threshold (number of correct submissions per machine IP)
  • global flooding threshold (number of submissions by all users within an hour)

It's certainly not perfect, but it does increase usability over using a CAPTCHA every time. Depending on your needs, it can certainly be worth using.
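The trigger logic boils down to something like this (a sketch, not the module's actual code; the threshold values are just examples):

    // Decide whether to show a CAPTCHA, based on counters the server keeps.
    // These thresholds are illustrative, not the module's real defaults.
    function needsCaptcha(stats) {
      var SUBMIT_THRESHOLD = 3;    // non-validating submissions per user
      var FLOOD_THRESHOLD  = 10;   // correct submissions per IP address
      var GLOBAL_THRESHOLD = 500;  // submissions by all users in the past hour

      return stats.failedSubmissionsByUser     >= SUBMIT_THRESHOLD ||
             stats.acceptedSubmissionsByIp     >= FLOOD_THRESHOLD  ||
             stats.submissionsSiteWideLastHour >= GLOBAL_THRESHOLD;
    }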

12

Disclaimer

This answer is creative, experimental and not at all best practice, or even verified to work! Don't use this in a real-life situation unless you have a giant research and test team. This answer was given only in response to the bounty requirement to come up with something new and creative.

There are a lot of good and approved ideas for captcha alternatives (here, here and here), but I would like to come up with something that I haven't seen out there (yet).

The main concept would be to first generate a ticket on the server-side. Then there are three buttons. Two buttons are hidden (we'll come to that later) and one is visible. This means that there's just one human-clickable button.

We assign all buttons identical-looking tickets, but only the visible button gets the right ticket, which we are going to verify after the post.

Of course, something like display: none; would be way too trivial. But something like position: absolute; left: -200px; would also work to hide a button. What if we pack the 'to hide' buttons in nested divs, spans, etc. and relocate their position like:

<div style="position: absolute; left: -200px;"><div><input type="button" id="test" /></div></div>

Of course this should follow some randomness (of nesting and styles), to make it nearly impossible for the bot to figure out which parent element causes the hiding.

Even something tricky like this would be possible:

<div style="position: absolute; left: -200px;"><div style="position: absolute; left: 200px;"><input type="button" id="test" /></div></div>

The more buttons, the lower the chances are that the bot hits the right one.

A big plus would be that it doesn't require JavaScript.
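As a sketch of how the server could emit such a set of buttons (shown as a Node-style helper; nesting depth, offsets and ticket naming are all illustrative):

    // Emit one real button plus several decoys, each decoy wrapped in nested
    // divs that push it off-screen. The server remembers realTicket in the
    // session and only accepts posts carrying that ticket. Purely illustrative.
    function renderButtons(realTicket, decoyTickets) {
      function wrapHidden(innerHtml) {
        var offset = 100 + Math.floor(Math.random() * 400);   // random off-screen shift
        return '<div style="position:absolute;left:-' + offset + 'px;">' +
               '<div>' + innerHtml + '</div></div>';
      }
      var buttons = decoyTickets.map(function (ticket) {
        return wrapHidden('<input type="submit" name="ticket_' + ticket + '" value="Send">');
      });
      // The real button looks the same but is not wrapped in an off-screen container.
      buttons.push('<input type="submit" name="ticket_' + realTicket + '" value="Send">');
      // Crude shuffle so the real button is not always last in the markup.
      return buttons.sort(function () { return Math.random() - 0.5; }).join('\n');
    }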

Just my 2 cents ;)

Update

This is a classic one I'd like to share. XKCD - A New CAPTCHA Approach:


10

Spam bots usually skip the form and just send a forged POST to the page. Key presses, mouse movements and other such things are strictly client-side.

I saw very nice CAPTCHA alternatives like this one:

http://research.microsoft.com/en-us/um/redmond/projects/asirra/

9

Most robots have different patterns than real users do. The trick is to detect these patterns within a single page request, or at most two.


Behaviour Matching
Many of the 'I pretend to be a human'-robots mask themselves with common user agents. Some of them are quite ridiculous (Masking themselves as Internet Explorer 4.01 for example), but the better ones use a whole list of 'credible' user-agents.

The good thing is, real browsers have common, detectable patterns. So, you can match behaviour vs expected behaviour.

Let's say the reported user agent is:
Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729)

This is Firefox:

  • Firefox displays images.
  • Firefox requests CSS files (even through the more obscure calls such as @import)
  • Firefox understands JavaScript (not always, but mostly so)
  • Firefox normally sends the correct referer (this can be disabled, but usually isn't)
  • etc, etc.

There is a lot of 'known behaviour' for known user agents.


Non Matching Behaviour?
So, if we get behaviour that is not 'normal' for a known user-agent, we can ask for additional proof (a captcha for example). The same goes for unknown user agents.


Javascript usage
The 'problem' with JavaScript is that it can be turned off. People who do turn off JavaScript nowadays seem to accept that they miss out on the 'extra' functionality. In this case, you could ask them for additional proof of being human.
However, if JavaScript is enabled, it can help a lot in making the distinction between humans and robots. For example, making an Ajax request to the server is something most robots do not do. The same goes for DOM manipulation: add a new (1px) image to the end of the DOM. If it is not requested by a user agent that is known to support images (and has made other image requests) -> ask for extra proof.


Browse history
The more requests a visitor makes, the easier it becomes to tell a human and an automated process apart. You can use the referrer to your advantage then. Did they send one? Does the referring page actually have a link to the new page? Did they ever request the page listed in the referer? Etc, etc.

Some 'smartass' robots tend to send a referer that equals the page requested; however, if this is the first request from a particular visitor, that is impossible behaviour.


Hope this helps in fighting the creepy crawlers :)

8

For bots that crawl, how about a honey pot?

Set up some sort of form page that is linked from your site, but with links hidden in such a way that your human users would never follow them while bots would. Put it right next to your real form and make it something really easy for a bot to fill out. This way the bot will almost always fill out this alternate form and your humans won't.

You let the bot fill out both forms and use that to identify which data (or data sources) submitted through the real form should be deleted. You could check by IP, timestamp, or embedded links. You could even let the bot think it succeeded, and then maybe it won't come back.

This has two benefits over several other methods mentioned. One, users would still be able to post links without drawing "suspicion" of being a bot. Two, because the honeypot would be a separate page, you wouldn't have to worry about browser auto-complete filling in hidden form fields for human users.

6

If you don't like captchas where people have to identify words in images, simply ask an easy-to-solve question, for instance:

  • what's 3 and 3
  • what time of day is 8am
  • what color is a lemon

or skip the question and just say it straight:

  • "Enter PIN 1234 to validate you are human"

where 1234 is a variable number. This is usually enough to hold off most spambots and is painless for most audiences. It is completely accessible, even from non-visual user agents like screen readers, and does not depend on JavaScript or CSS.

3

The problem is that a lot of real-world captchas are not broken by bots, but by real people; for instance, decaptcha (http://www.decaptcher.com/client/). So the best way to protect yourself is to use something unique: that way you can guarantee that there are no bots/services already available to break your measures, and it also doesn't matter which technique you are using.

3

I'm wondering here if you're trying to solve the correct problem. If a user is authenticated as a user (sorry for the tautology), then you don't need a captcha, do you?
I'd suggest using an SSO service like OpenID. Your users only have to create an account once, which might need a CAPTCHA, but only that once. Look at how SO does it!

3

The answers being posed here are all targeting the symptoms, and not the disease:

People don't have any actual issue with the bots per se. They have issues with the spam that the bots produce, and with the server load of massive numbers of requests.

The best replacement solution to CAPTCHA I've ever heard of was by Randall Munroe in his XKCD comic: Constructive. Implementation would of course be difficult, but I don't believe it would be impossible.

3

As an alternative or add-on to the awesome answer above by @stefan, we could add some basic statistics gathered on and from our site.

If a site is big, or even huge, say like Stack Overflow or Wikipedia, we will most likely tag or somehow categorize its content.

By analyzing our users' behaviour we can build norm groups for each category/tag (from now on: category). We can then see how likely it is for a person in a given norm group to visit another specific category. By using reasonable extremes we can easily find anti groups: people in one norm group are rarely spotted in the other, etc.

One parameter for building and evaluating a norm group could be to only use data from people who have actually submitted content to each category. This will rule out most lurkers who might have been bored at work or on a usual "Wikipedia rampage".

Some imagined list of anti groups:

  • CSS vs ASM on SO
  • How to save trees vs How to reduce oil prices on a Wiki
  • Live sport results vs For everyone who hates sports on Facebook

So, now we have some anti groups. What if someone who has belonged to Blue until now actually tries to write in Orange? Well, use a captcha or moderation/flagging; if he is a bot, remove him from all groups; if not, let him be verified in both groups for the future (statistically, you should just leave him there).
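Purely as a sketch of the cross-category check (the data structure and threshold are invented):

    // coOccurrence[a][b]: fraction of contributors to category a who have also
    // contributed to category b, rebuilt periodically from real submission data.
    // Flag a post as an "anti group" post when the target category is almost
    // never visited by people from the poster's established categories.
    function isAntiGroupPost(coOccurrence, userCategories, targetCategory) {
      var RARE = 0.01;   // illustrative threshold: under 1% overlap
      if (userCategories.length === 0) return false;   // new user, no history yet
      return userCategories.every(function (cat) {
        var overlap = (coOccurrence[cat] && coOccurrence[cat][targetCategory]) || 0;
        return overlap < RARE;
      });
    }

    // isAntiGroupPost(stats, ['css'], 'asm') === true  ->  show a captcha or flag for moderation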

And don't forget to rebuild your stats and norm groups, and thereby also your anti groups, frequently!

A bonus of all this is that nosy people will have to fill in captchas all the damn time; that'll keep them sticking to their own business ;)

3

Rather than use a captcha, you can use something with real-world status. For instance, you could use a Twitter or Facebook sign-up. Admittedly, this is just moving the problem: it effectively means the fact that you are not a robot gets saved somewhere else.

Option 2, along the same lines, is to create an open standard for this problem: have a service which is effectively a single sign-on solution just for proving you are human.

The issue is always that you are creating extra steps to prove your sign up is of value.

2

Given that IBM's pretty much ready to have a computer play Jeopardy against the current reigning champs, a la Deep Blue vs. Kasparov, I think it's safe to say that within 10 years any of the techniques discussed here are going to be trivially performed by your average desktop computer.

2

You should look for an alternate solution. Chrome's or Safari's auto-fill options will make anyone using them fail a big part of your proposed test; not using a mouse will take care of the rest.

Besides, some parts just won't work. A mouse tracking tool to look for non-mathematical patterns has to become exponentially more complex for any improvement made to the random mouse pattern generator your potential cracker will create (assuming P != NP).

Akismet is, IMHO, the best alternative to captchas on the market.

EDIT: I forgot to mention the UX problems. What if someone is recognized as a robot? You just kick him out? You spawn a captcha? You tell him: "you kinda look like a robot, can you please take some human-standard time to fill in your form"?

2

"Most of my clients hate CAPTCHAs"

I made a survey, and most respondents prefer passing a captcha to seeing "Your comment is awaiting moderation" or having their comment automatically dumped.
So people hate difficult and distracting captchas, but not easy and cool ones.

Update: see Update1 below

"AWARD human credibility points for:"

All these ideas have already been implemented in existing spam-stopping technologies.

How would you give credibility points?

Big Brother eye-balling everywhere? This does not sound like a great or new idea.

Update: It is funny to read this from the same person, who goes on to contemplate an even crazier police system (instead of giving users/visitors the right to decide, for example, whether to log in with or without a captcha, and when to delete, wiki or edit their own posts):

  • "My quesion was re-opened after being closed for some time as off topic, so I thought nice lets rejuvenate it and edited it. Suddenly my question became automatically a community wiki!? Since my native language is Dutch, not English, I need more edits, especially feared by the moderation police who had closed off my question in the first place! Now, can some of those powerfull people please help me roll my question back to its non-CommunityWiki state? That will make things even again! Thank You!" enter image description here

"1.Are such things possible at all to program?"

I do not believe so. You should not count on it, anyway.

One should count on flexible approaches and common sense, like balancing usability against security, but not on final solutions and silver bullets.

"Could this page be the start of a new AI captcha, OR would it, like most creative new idea's"

Have you googled before?

Update1:

KeyCAPTCHA users love this captcha:

[Image from the "The aliens having sexy party" post; try out the live demo at KeyCAPTCHA.com]

2

Most of the answers I like best here seem to be relationship based. If you have a way of characterizing the site's relationship with the user, then you can captcha them if the relationship remains absolutely identical or if it changes drastically. Once you've captcha'd them a few times, you can decide they're human and quit doing it. Combine this with starting the relationship small, and you've got a plan. You could even start by characterizing the relationship with an IP, and then propagate it when they register.

2
suspiciously instant pasting / completions of one or more form fields

Bear in mind for this one there are many browser plugins and apps for automatically filling out fields with user details and addresses etc.

2

From a pragmatic point of view, while many of these tactics looking for specific behavior may seem easy to defeat, it only seems that way because you know what the validation is looking for. I could imagine these could be quite difficult to guess, if you are not telling the user why it was rejected. Especially if the tactics are uncommon or unique in some way.

2

When it comes to forums or comments, a lot of spammers want to supply a hyperlink, either to spread malware, sell something, or to exploit the Google PageRank algorithm (if you don't use the nofollow attribute). It may help to require a CAPTCHA before allowing a hyperlink to be submitted, unburdening people that are leaving a normal link-free comment.

Small sites should use different anti-spam techniques than large ones. A small site might get away with a trivial text captcha like "type the word 'dog' here:". A very large site can't settle for a simple and easy solution, but because it is large, it can look for patterns: the same message posted in more than one place (or similar messages differing in punctuation/capitalization/small typos, or that contain garbage like "zkslwasppty"), or large numbers of requests from one IP (doesn't help if the spammer uses a botnet).

StackOverflow has a reputation system that all users can see, but any large site that has registered users could maintain a secret reputation number on the server for each user, and give captchas only when the reputation is low, e.g. below 100. This "spam reputation" number could be increased or decreased temporarily in some cases: how many requests has the user made in the past one minute? Decrease rep by 25 for each POST and 2 for each GET request. Was there no delay between GETting the page and POSTing a response? Decrease rep by 50. Did the user just fill out a CAPTCHA the last time he posted (whether that was 5 minutes ago or last week)? Increase rep by 50 so the user is less likely to be required to fill out another one.
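The temporary adjustments described above, as a sketch (the numbers are taken from the paragraph; the surrounding structure is illustrative):

    // Recompute the user's effective "spam reputation" for this request.
    function effectiveReputation(user, request) {
      var rep = user.baseReputation;
      rep -= 25 * request.postsInLastMinute;      // each recent POST costs 25
      rep -=  2 * request.getsInLastMinute;       // each recent GET costs 2
      if (request.secondsBetweenGetAndPost < 1) { // posted instantly after loading the page
        rep -= 50;
      }
      if (user.solvedCaptchaOnLastPost) {         // recently proved to be human
        rep += 50;
      }
      return rep;
    }

    function requiresCaptcha(user, request) {
      return effectiveReputation(user, request) < 100;
    }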

You could have a button to "flag a message as spam". Besides notifying an administrator, this could decrease the rep of the user by an amount that depends on the reputation of the person flagging the message (I am slightly concerned about a hypothetical malicious bot that flags everyone else to make itself look better).

This idea would also require some ways to increase reputation. You would have to identify some things that normal users do, and reward those actions--like the suggested credibility points, except that the points accumulate slowly over time so a bot cannot easily earn them without spending time behaving like a legitimate user. Keeping the whole system secret would help thwart any bot trying to exploit you.

2

The More Captchas Are Used, the Better AI Attack Scripts Get at Reading Them. Most of what is public information about AI attacks upon captchas is academic; as one group of researchers develops a more difficult captcha, another group tries to find ways to defeat it, and often succeeds. There is no reason to imagine that the situation is any different in the nonacademic world, although spammers (unlike professors) are not typically talking about their successes. When the rewards are high enough, someone will make the effort to break the challenge.

What this really means for you as a programmer is that no high-stakes challenge you develop is likely to be successful for very long. For that reason, you should monitor usage of your website carefully, examining log files to see to what extent users successfully pass through your captcha challenges, and whether they go where you expect them to. You should also be sure to update your challenges as better versions become available.

Link

2

IMHO, the onus is on the implementer to filter for spam, not the client. There shouldn't be a visible captcha for the client.

My solution is four part:

  1. Generate a seed server-side and invoke a server-side hashing algorithm on the seed and the user agent. Store the seed, the user agent and the hash in the session. Pass the seed to the client.
  2. Execute javascript on the client-side that invokes the same hashing algorithm on the seed and the user agent. Perform some additional "environment detection" checking that tests that the javascript host object (window) is consistent with the user agent. Write the resulting hash to a hidden html element so that it's posted when the form is submitted.
  3. On the server, confirm that the server-side generated hash matches the client-side generate hash. If it doesn't, place it into a "spam queue".
  4. Lastly, all requests marked as spam should be monitored by a person periodically to ensure that the code is working as expected and the spam is being separated as expected.

The code won't be perfect. There will be false positives and smart spammers who know how to run javascript and emulate the host environment may be able to generate valid requests. However, it should catch a fair amount of them.
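A sketch of steps 1 to 3, assuming a shared hashFn(string) helper available both on the server and in the page (e.g. a small JavaScript SHA-256 implementation); all names are illustrative:

    // --- Step 1, server (Node-style): issue a seed and remember the expected hash.
    function issueChallenge(session, userAgent) {
      var seed = String(Math.random()).slice(2);              // illustrative seed source
      session.expectedHash = hashFn(seed + '|' + userAgent);  // assumed shared helper
      return seed;                                            // embedded in the page
    }

    // --- Step 2, client: recompute the hash plus a little environment detection.
    function answerChallenge(seed) {
      var ua = navigator.userAgent;
      // Crude check that a real browser host environment is present.
      var looksConsistent = typeof window !== 'undefined' && typeof document !== 'undefined';
      var hash = looksConsistent ? hashFn(seed + '|' + ua) : '';
      document.getElementById('challenge_hash').value = hash;   // hypothetical hidden field
    }

    // --- Step 3, server: a mismatch sends the request to the spam queue.
    function isSuspect(session, submittedHash) {
      return submittedHash !== session.expectedHash;
    }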

Note: Javascript code should be obfuscated and minified to further block spammers.

HTH

1

1) Not 100% reliably, no. If they were, then the world would be a spam-free place. Even CAPTCHA isn't 100% impossible to defeat.

2) See 1).

3) Akismet is a fantastic plugin for Wordpress. I can't believe how well that works. But even that isn't completely reliable.

I hate CAPTCHA as well. But some of the pages I designed years ago for a client have this in, because it was the style at the time, and now there are bigger priorities than replacing it with something else, as it's such a big job.

1
  1. Yes, such things are possible (of course, a near perfect solution is not probable since none of us have a supercomputer capable of fast machine learning sitting in our offices).

  2. Write code to track user behavior through the site. Classify different behavior as a bot vs. a human.

  3. Very good solutions? Questionable. The best I have seen is here on StackOverflow when rapidly editing answers.

  4. ...

  5. I would forget about AI for now and figure out a better CAPTCHA system for your site. For older audiences, image-based or audio-based CAPTCHAs would probably work better than text-based ones.

1

Nearly every single point of this list is either extremely difficult to realize (if at all) or easily defeated.

* award points for real human mouse movements

How do you recognize "real human mouse movements"? Basically, the whole problem is about telling computers and humans apart. And there is still the possibility of recording and playback.

* award points for mouse movements that don't follow any mathematical patterns

There's no way of finding out that something follows no pattern. Depending on the definition of pattern, there may be no movement that qualifies for your test at all.

* award points for non-instantaneous reading delays between load and first input in form

This would probably be done client-side and thus the check results can easily be changed. Simply replay "test ok".

* when typing in form, delays are measured between letters and words entered
* award points for typical human behaviour measured (deleting, rephrasing etc)
* retract points for instant pasting in various fields

Same as above.

* retract points for credibility when hyperlinks found in form

Don't understand this one.

* test whether a fake email field (invisible to humans) is populated (suggested by Tomalak)

Simply solved when programming a site specific bot.

* retract points if the IP of the visitor is inside a worldwide DB of spammy sites

This might work. But might also fail.

1

My website asks one simple question every human can answer but software wouldn't. "The song says to row, row, row your what?" Stopped the bots cold.

1

In general... every computer-run but hardcoded analysis is prone to being broken by a computer-run but hardcoded solver. Mouse and typing behaviour checks will prevail only as long as nobody has bothered to look into how they are used and which parameters they rely on. They don't exploit well-known deterministic features of the brain the way a CAPTCHA does.

So a CAPTCHA gets the computer where it is weak relative to the human brain: a large area of the brain is actually assigned to specialised visual computing circuits. Thus the brain has outstanding visual computing performance, where it easily outperforms machines as long as they have less computational power.

It would be hard to find something with a higher human vs. machine solving-speed ratio than a CAPTCHA.

1

How about another approach? Instead of giving the user a captcha, a question, or hidden buttons, that is, rather than pushing any complication onto the user, we complicate the actual website.

Make the page so complex that the browser can figure out the display, but a bot would be drawn into a hundred different pitfalls; make it so complex that developing a bot to traverse it would be more costly than building your own site.

Here are the steps.

  1. Come up with the design of the page.
  2. Remove all labels and replace them with portions of a sprite image.
    • The sprite image should be generated by the server with all the labels of the form that needs to be filled out.
    • Images and CSS background offsets are used to replace the labels.
    • Multiple CSS rules should be used that override the previous ones, to confuse the bot (but not always).
  3. Move all the labels (now images) and textboxes around with absolute positioning.
    • This CSS file should be generated by the server (will be explained soon).
  4. Name each box with a 100% unique name {a GUID comes to mind here}.
    • These should be tied to the CSS generated in step 3, along with the label associated with each.
  5. Complex layout should be nested and offset, floated left and given absolute position. Everything you can think of. Make the browser work for its usefulness, and the bot more confused than a bunny laying eggs.

Now the user sees something completely normal, and the bot has to work too hard for it to be useful.

Finally, if you want to make it even more bot-unfriendly, JavaScript is a great way to go.

Disable the form by default (pointing it at a honeypot path for accepting the submission); then, once the form is being filled out, time has elapsed, and text boxes have been selected and clicked on, the script can change the form's post target and allow the form to be submitted through the real path.
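A sketch of that JavaScript step (the form id, timings and the decoy/real endpoints are all hypothetical):

    // The form starts out pointing at a decoy endpoint with its submit disabled.
    // Only after some time has passed and real interaction has been observed do
    // we switch it over to the real endpoint and enable submission.
    var form = document.getElementById('contact-form');        // hypothetical id
    var submitBtn = form.querySelector('[type=submit]');
    submitBtn.disabled = true;

    var loadedAt = Date.now();
    var interactions = 0;

    form.addEventListener('focusin', function () { interactions++; });
    form.addEventListener('keydown', function () { interactions++; });

    form.addEventListener('input', function () {
      var longEnough = Date.now() - loadedAt > 5000;   // at least 5 seconds on the page
      if (longEnough && interactions > 3) {
        form.action = '/submit-real';                  // was '/submit-honeypot'
        submitBtn.disabled = false;
      }
    });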

No need for a captcha, but a heck of an obfuscation job on the side of the site programmer.

However, if you developed a handler for the page before it went to the user, this could mostly be done by the server. Kind of like code obfuscation: you hide the actual meaning of your page.

Downside: Google is not going to know what to think of your site.
Upside: users will get a rich and meaningful experience, and bots will go to Hades. (Do not pass Go, do not collect $200.)

This posting was produced by a bot. Thank you, come again.

1

Here's a very theoretical option that could become an interesting open source solution. The approach creates a third party to help web site developers figure out who is human:

Would people and web site developers be willing to give up a little privacy in exchange for not having to complete a captcha? I know I would. Here is how it could work.

Write a browser plugin called PLUGIN that generates a unique identifier for the computer on which it is running. While this is non-trivial, it is doable.

Write a server called SERVER that communicates with PLUGIN. Have PLUGIN report browsing behavior to SERVER; things like date/time/URL visited would work.

Make SERVER responsible for deciding if PLUGIN is reporting human-like browsing behavior. If PLUGIN is reporting human-like browsing behavior, then flag the associated unique id as "Human Browser". If PLUGIN reports robot-like behavior, then flag the id as "Robot".

Now... a web site developer called SITEDEV makes a web site. SITEDEV can use SERVER to decide if requests for their web pages are coming from humans or robots. As a cost of getting to use SERVER to identify humans and robots, SITEDEV agrees to post page request activity and form submission activity logs to SERVER as well. SITEDEV does not have to share actual content; just the URL requested, a datetime stamp, and an IP address would be enough.

So... SERVER now has two records of browsing behavior. One reported by PLUGIN and one reported by SITEDEV. As part of authentication, SERVER compares these records. If they match, then PLUGIN can be further trusted. As you might expect, comparing these two information streams is also non-trivial, but it seems reasonable to expect that PLUGIN and SITEDEV should report some similar activity that can be used to externally authenticate/validate PLUGIN.
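Purely as a sketch of that comparison step (the record shapes and tolerance window are invented):

    // PLUGIN reports: [{url, timestamp}, ...]   SITEDEV reports: [{url, timestamp, ip}, ...]
    // Count how many PLUGIN-reported visits have a matching SITEDEV log entry
    // within a small time window; a high match rate raises trust in that PLUGIN id.
    function matchRate(pluginEvents, siteEvents, toleranceMs) {
      var matched = pluginEvents.filter(function (p) {
        return siteEvents.some(function (s) {
          return s.url === p.url && Math.abs(s.timestamp - p.timestamp) <= toleranceMs;
        });
      });
      return pluginEvents.length ? matched.length / pluginEvents.length : 0;
    }

    // e.g. treat the plugin id as "Human Browser" once matchRate(...) stays above 0.8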

  1. What if PLUGIN is being used by a well-intentioned user? The activity reported by PLUGIN and from sites using SERVER will match. After a few site visits, SERVER can determine that PLUGIN is being used by a human and this happy user does not have to fill out captchas.

  2. What if PLUGIN is being used by a mischievous spammer? The activity reported by PLUGIN and the submitted logs from SITEDEV may not match. This user must fill out captchas.

  3. What if PLUGIN is being used by a busy spammer? Sites using SERVER report too many page requests and form submissions for a given plugin, and SERVER flags PLUGIN as "Robot". This user must now fill out captchas at sites that use SERVER.

Back to reality. I am not sure what the bandwidth, processing, and storage costs could be for SERVER, but I am guessing they would grow quickly. The original poster asked for theoretical solutions, not easy ones.

One thing I like about this scheme is that users and site developers each have an incentive to use PLUGIN and SERVER and this could help drive adoption. One drawback is that the scheme would need some 'critical mass' of participating sites to get started or SERVER would not have much data to draw on to authenticate individual PLUGINs.

If it all worked, I can see web browser writers also having an incentive to tightly integrate PLUGIN into their code as a benefit for their users, further promoting adoption.