An unofficial blog that watches Google's attempts to move your operating system online since 2005. Not affiliated with Google.

Send your tips to gostips@gmail.com.

April 12, 2007

A Simplified Version of Google's Spell Checker

Peter Norvig (from Google) explains in a detailed article how to write in 20 lines of Python code a spell checker almost as good as the one used by Google to show the famous "did you mean" corrections. Well, at least for one-word corrections.
We will read a text file, holmes.txt (that I happened to have on my laptop) which is a collection of Sherlock Holmes stories (from Project Gutenberg) consisting of about 100,000 words. We then extract the individual words from the file (using the function words, which converts everything to lowercase, so that "the" and "The" will be the same). Next we train a probability model, which is a fancy way of saying we count how many times each word occurs. (...)

Now let's look at the problem of enumerating the possible corrections c of a given word w. It is common to talk of the edit distance between two words: the number of edits it would take to turn one into the other. An edit can be a deletion (remove one letter), a transposition (swap adjacent letters), an alteration (change one letter to another) or an insertion (add a letter). (...)

The literature on spelling correction claims that 80 to 95% of spelling errors are an edit distance of 1 from the target.

A simple way to define the error model was to say that "all known words of edit distance 1 are infinitely more probable than known words of edit distance 2, and infinitely less probable than a known word of edit distance 0". From all the candidates for the correction, you can choose the most frequent word.

In Peter Norvig's tests, this simple algorithm returned correct answers in more than 80% of the cases. Of course that Google has more data than the holmes.txt file (it crawls the web, right?) and has access to a huge list of queries and refinements that could improve the algorithm, but this is an example of a simple yet powerful program.

8 comments:

  1. I would like to see Google donate their spell check algorithm to OpenOffice.org, their spell check is terrible.

    ReplyDelete
  2. Pfft, I can write this in two lines of code by screenscraping Google results, and it'll be not 80-90% as good as Google's corrections, but 100% as good ;)

    ReplyDelete
    Replies
    1. plz, can you enplane for me more about implementing "did you mean" ? for my search engine project.

      Delete
  3. Nice. The original article has changed, now it is 21 lines.

    Could someone compile this into an .exe w/ the big.txt and upload it, so we can try it? If not I'll do it.

    ReplyDelete
  4. I thought google stored all times someone queries and changes the query to something very similar without clicking on one of the results. This is also the way how people googlebomb the 'did you mean'.

    ReplyDelete
  5. I need to spell check a list of 50.000 names and this code does not take Literal expression "Angela Merkel" for exemple.

    And I repeat the demand of Zachary Liu : can anybody compile this with a GUI ?
    I did not try this dev :
    http://www.peterbe.com/plog/spellcorrector
    Gspell (for mac) which should work does not Literal.

    thanks

    ReplyDelete
  6. I felt like ending it i lost my husband to another woman 2 weeks ago after 27 years of marriage. We had a lovely marriage but he started a relationship with a co worker who chased after him. He is living away near his work, and he refuses to talk to me or to come home . I am devastated and am finding it hard to cope i was emotionally down . I wish I did not love him and that I could move on but I can't . I don't know how to stop feeling like this I wish I could and its eating me away and I'm starting to feel ill. I have degraded myself begging him to come home all to no avail. I became very worried and needed help. As I was browsing through the internet one day, I came across a website that suggested that Dr Ogudugu can help solve marital problems, restore broken relationships and I also came across several testimonies about this particular man so on. So, I felt I should give him a try. I contacted him and he did a spell for me. 24 hours later, my husband came to me and apologized for the wrongs he did and promise never to do it again. Ever since then, everything has returned back to normal. I and my family are living together happily again.. All thanks to Dr Ogudugu . If you need a spell caster that can cast a spell that truly works, I suggest you contact him. He will not disappoint you. if you have any problem contact him, I give you 100% guarantee that he will help you, This is his details, E-mail: GREATOGUDUGU@GMAIL.COM. Thank you all for reading.

    ReplyDelete