Sunday, October 30, 2011

Captcha and Duolingo!

Most of you probably know what captchas are, even if you have never heard the term, and have spent some time deciphering and typing what they display. They are usually distorted words shown as images that you must type when paying a bill, making a comment online, and so on, so that a website can verify that a human, and not an automated computer program, is filling out the form, completing the transaction, or posting the comment. Sometimes I find one of the words hard to decipher and have to ask for a different captcha before I succeed. Sometimes they are annoying, and sometimes you feel you are wasting your time. This bothered the captcha's inventor as well and motivated him to do something more with them.

Yesterday, I attended TEDxMidAtlantic 2011. It was another wonderful event, with enthusiastic and passionate speakers and audience: a whole building full of enough positive energy to keep you going until next year's conference. Among the speakers was Luis von Ahn, the inventor of Captcha, founder of reCaptcha, and a computer science professor at Carnegie Mellon University.

It turns out that we spend about 10 seconds every time we type a captcha, and that people around the world together type about 200 million captchas a day. That is over 500,000 hours a day. So Luis von Ahn wondered: is there a hard problem that could be mapped onto typing captchas, so that while filling out forms and doing transactions online people spend those 10 precious seconds doing something worthwhile and helping solve a hard computational problem?
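For anyone who wants to check the arithmetic, here is a quick sketch in Python. The two inputs are the figures quoted from the talk (about 10 seconds per captcha, about 200 million captchas a day); the variable and function names are just mine.

```python
# Figures quoted from the talk (both are rough estimates).
SECONDS_PER_CAPTCHA = 10
CAPTCHAS_PER_DAY = 200_000_000


def hours_per_day(seconds_each, count):
    """Total human time spent per day, converted from seconds to hours."""
    return seconds_each * count / 3600


total_hours = hours_per_day(SECONDS_PER_CAPTCHA, CAPTCHAS_PER_DAY)
print(f"{total_hours:,.0f} hours per day")
```

That is 2 billion seconds a day, or a bit more than half a million hours.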

In fact, such a problem exists: digitizing books. Many old books have odd fonts, faded words, or ink that has smeared across the paper over the years, so that it is very hard for computers to read them automatically. For humans, on the other hand, it is an easy task. So, what do they do? They show you one of the words from the books they are digitizing in a captcha. However, since the point of the captcha is to verify that you are in fact a human, they cannot test you only with something the computer does not know the answer to. So they ask you to type two words: one is a word they already know, which lets them verify your answer, and the other is the word they are asking you to recognize to help them digitize books. Typing the known word correctly also gives them some confidence in your answer for the unknown one. Then, when enough people have read the ambiguous word the same way you did, they declare it digitized. They continue the same process word by word, line by line, page by page, book after book. So, the next time you are about to get frustrated typing a captcha, do not! You are helping digitize books, doing something great for free!
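To make the two-word idea concrete, here is a minimal sketch in Python of how I picture it. This is only my illustration of the description above, not reCaptcha's actual code: the class name, the method name, and the agreement threshold of 3 are all assumptions of mine.

```python
from collections import Counter, defaultdict

# Assumed value: how many matching answers count as "enough people".
AGREEMENT_THRESHOLD = 3


class TwoWordCaptcha:
    """Toy model: one known (control) word plus one unknown word per captcha."""

    def __init__(self, threshold=AGREEMENT_THRESHOLD):
        self.threshold = threshold
        self.votes = defaultdict(Counter)  # unknown word id -> guesses seen so far
        self.digitized = {}                # unknown word id -> accepted reading

    def submit(self, control_answer, control_typed, unknown_id, unknown_typed):
        """Return True if the user passes as human; record their guess if so."""
        # The known word is the actual human test.
        if control_typed.strip().lower() != control_answer.lower():
            return False  # failed the test, so the other guess is ignored
        if unknown_id in self.digitized:
            return True  # this word is already settled
        guess = unknown_typed.strip().lower()
        self.votes[unknown_id][guess] += 1
        # Once enough people agree on the same reading, accept it.
        if self.votes[unknown_id][guess] >= self.threshold:
            self.digitized[unknown_id] = guess
        return True
```

Each call to `submit` stands for one person typing one captcha. A guess for the unknown word only counts if the known word was typed correctly, and the word is declared digitized once the same reading has been seen `threshold` times.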

P.S. You can also learn a new language while helping translate the whole web! That is Duolingo, another brilliant idea by the same person!

References:
[1] Captcha
[2] reCaptcha
[3] Duolingo
[4] Luis von Ahn
