September 16, 2009

Google Buys reCAPTCHA

reCAPTCHA seems like a perfect match for Google: it's a project that generates CAPTCHAs and uses the results to digitize books. "reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. (...) Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one."
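The paired-word check described in that quote can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the function names, the in-memory `votes` store, and the `min_votes` agreement threshold are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def normalize(text):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return text.strip().lower()


def check_challenge(control_answer, control_guess, unknown_id, unknown_guess, votes):
    """Grade one challenge made of a known (control) word and an unknown word.

    The user passes only if the control word is typed correctly. In that
    case the guess for the unknown word is trusted and recorded as a vote.
    """
    if normalize(control_guess) != normalize(control_answer):
        return False  # failed the known word: discard the other guess
    votes[unknown_id][normalize(unknown_guess)] += 1
    return True


def accepted_reading(unknown_id, votes, min_votes=3):
    """Return the most common transcription once enough users agree.

    min_votes is a hypothetical threshold added for illustration; the
    quoted description only says the answer is assumed to be correct.
    """
    tally = votes.get(unknown_id)
    if not tally:
        return None
    word, count = tally.most_common(1)[0]
    return word if count >= min_votes else None


# Hypothetical store: unknown word id -> tally of human transcriptions
votes = defaultdict(Counter)
passed = check_challenge("morning", "Morning", "scan-17", "upon", votes)
```

Tallying several answers before accepting a word is one plausible way to guard against a single careless or malicious user; the real service may handle this differently.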


It's no wonder that Google decided to acquire reCAPTCHA and use the service to improve Google Book Search's digitizing accuracy.

"reCAPTCHA's unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we'll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process."

The service offers a simple JavaScript API that lets you embed CAPTCHAs in any web page, and many popular sites use it, including Facebook, Twitter and Ticketmaster.

16 comments:

  1. I hope Google saves us from automated submission techniques.

  2. I sometimes find CAPTCHAs very annoying when you really can't make out what they say, but if it helps digitize books, I think it's worth it :)

    Good job, Google

  3. I don't understand why apps all over the world continue to use a fu**ed system like CAPTCHA when many, many bloggers in the blogosphere have discovered Hiddy.
    Hiddy is a technical solution that requires no visual input. Its first version was built in ColdFusion.

    I don't know if Hiddy is the definitive solution, but I think it's a good start!

  4. I agree with all the above. Modern CAPTCHA styles have grown too distorted for the sake of security, omitting usability.

    I'm currently working on a new CAPTCHA style (in PHP) that is elementary, but still complicated for bots to discern.

  5. Aside from whether CAPTCHA is still the best type of defense against spam bots, the OCR fine-tuning-through-CAPTCHA idea itself is brilliant.

  6. Yeah, the primary purpose of reCAPTCHA is to substitute for the OCR process and THEN to block spam, etc. So you can't really ask why we are still using reCAPTCHA.

  7. Hiddy can only really be considered a temporary solution; if everyone used Hiddy, the spambots would eventually just start parsing the CSS to determine what's visible.

  8. This will help keep spam and other automated software away from more than 100,000 websites, since those websites now have Google-powered CAPTCHA technology.

  9. Keep away from spam? I don't think so, my friends :)

  10. @joelpt: maybe you're right. But I think it may be a good start toward creating something new, without the hated CAPTCHA. Integrating CSS, scripts, ... I don't know, something... but I think we can find a solution.

    Example: a text input in which we write, via JS, "please keep it void"... :)

  11. Do you know how much they bought it for? That info would be interesting... I'm a student at Carnegie Mellon's School of Computer Science and I still haven't been able to find out how much Prof. Luis von Ahn actually got...

  12. "displayed to visually impaired users."

    How do you display text to blind users?

  13. I heard that the price is $150,000,000, but that sounds like too much to me

  14. When can we have this for Blogger???
