An unofficial blog that watches Google's attempts to move your operating system online since 2005. Not affiliated with Google.

March 26, 2008

Google Sets, the Search Engine for Lists

SEO by the Sea points to an interesting patent that describes how Google Sets works. Google Sets was one of the first services added to Google Labs and it's a cool way to find lists of related terms. The tool generates lists from a small number of examples by using the web as a big pool of data. You enter some items and Google Sets finds other items that tend to co-occur frequently with your examples. For example, you could enter Barack Obama, Hillary Clinton, Rudy Giuliani and get a list of US presidential candidates. Here's an excerpt from the patent:

One particular type of information often present on the web includes lists, such as lists of restaurants, lists of automobiles, lists of names, etc. Lists may be identified in a number of different ways. For example, a list may include an ordered list or unordered list. Special tags in a HyperText Markup Language (HTML) document identify the presence of ordered and unordered lists. An ordered list commences with an <OL> tag; whereas an unordered list commences with an <UL> tag. Each item in an ordered or unordered list is preceded by an <LI> tag.

Another type of list may include a definition list. A special tag in a HTML document identifies the presence of a definition list. A definition list commences with a <DL> tag. Each item in a definition list is preceded by a <DT> tag. Yet another type of list may include document headers. Special tags in a HTML document identifies headers using <H1> through <H6> tags. Other types of lists may be presented in yet other ways. For example, a list may be presented as items in a table or as items separated by commas or tabs.
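To make the excerpt more concrete, here's a minimal sketch of how lists could be pulled out of a page by looking only at the <OL>, <UL>, <DL>, <LI> and <DT> tags mentioned above. It uses Python's standard html.parser module; the ListExtractor class and the extract_lists function are hypothetical names chosen for illustration, not something described in the patent, and the sketch ignores headers, tables and comma-separated lists.

```python
from html.parser import HTMLParser


class ListExtractor(HTMLParser):
    """Collects the items of every <ol>, <ul> and <dl> list in a page.

    Illustrative only: this is not Google's code, just one way to read
    the list tags named in the patent excerpt.
    """

    LIST_TAGS = {"ol", "ul", "dl"}
    ITEM_TAGS = {"li", "dt"}

    def __init__(self):
        super().__init__()
        self.lists = []      # finished lists, each a list of item strings
        self._stack = []     # lists that are still open (innermost last)
        self._buffer = None  # text pieces of the item being read, or None

    def _flush_item(self):
        # Store the item collected so far (if any) in the innermost open list.
        if self._buffer is not None and self._stack:
            text = " ".join("".join(self._buffer).split())
            if text:
                self._stack[-1].append(text)
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase, so <LI> arrives as "li".
        if tag in self.LIST_TAGS:
            self._flush_item()
            self._stack.append([])
        elif tag in self.ITEM_TAGS and self._stack:
            # A new item also ends the previous one, since </li> is optional.
            self._flush_item()
            self._buffer = []
        elif tag == "dd":
            # Definitions are skipped; only the <dt> terms count as items.
            self._flush_item()

    def handle_endtag(self, tag):
        if tag in self.LIST_TAGS and self._stack:
            self._flush_item()
            self.lists.append(self._stack.pop())
        elif tag in self.ITEM_TAGS:
            self._flush_item()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_lists(html):
    """Return all tag-based lists found in an HTML string."""
    parser = ListExtractor()
    parser.feed(html)
    parser.close()
    return parser.lists
```

Calling extract_lists("<UL><LI>Barack Obama<LI>Hillary Clinton</UL>") returns a single list with the two names. Items are closed when the next item or the end of the list is reached, because real pages often omit the closing </LI> tag.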

After identifying lists on the web, Google generates a probabilistic model from the examples provided by users and classifies the lists according to the model. Items in the classified lists are assigned weights, the weights for each item are added up, and the final list is formed based on the total weights.
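The weighting step can be sketched in a few lines of Python. This is only a rough illustration under a big assumption: instead of the patent's probabilistic model, each list is scored by the fraction of the user's examples it contains, and that score is used as the weight of every other item in the list. The expand_set function and its top_n parameter are hypothetical names, not taken from the patent.

```python
from collections import defaultdict


def expand_set(examples, lists, top_n=10):
    """Suggest items that co-occur with the examples in the given lists.

    Assumption: a list's score is the share of examples it contains.
    This is a simple stand-in for the probabilistic model from the patent.
    """
    seeds = {example.lower() for example in examples}
    if not seeds:
        return []

    totals = defaultdict(float)
    for items in lists:
        normalized = {item.lower() for item in items}
        # "Classify" the list: how many of the examples does it contain?
        score = len(seeds & normalized) / len(seeds)
        if score == 0:
            continue  # the list has nothing in common with the examples
        # Every non-example item in the list gets the list's score as weight,
        # and the weights are added up across lists.
        for item in normalized - seeds:
            totals[item] += score

    # Rank by total weight, highest first; ties are broken alphabetically.
    ranked = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    web_lists = [
        ["Barack Obama", "Hillary Clinton", "John McCain"],
        ["Hillary Clinton", "Rudy Giuliani", "John McCain", "Mitt Romney"],
        ["Apple", "Pear"],
    ]
    print(expand_set(["Barack Obama", "Hillary Clinton", "Rudy Giuliani"], web_lists))
```

With the toy data above, John McCain appears in two matching lists and therefore ranks ahead of Mitt Romney, while the fruit list is ignored. Items are compared in lowercase, so the results come back lowercased.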


  1. Each item in an ordered or unordered list is preceded by an < IL > tag.

    Erm, I don't think so Google. My list items use the < LI > tag... ;-)

  2. I fixed the mistake, even if it wasn't mine.

  3. That's very nice, but I'd like to have first a NEAR operator like Dejanews had before they became part of the Google hegemony... ;-)

  4. I don't think you actually need a NEAR operator since Google looks for matches where your keywords are next to each other. Anyway, you can use the star to replace one or more keywords from your query.

    "google launched * last week" matches: Google launched Knol late last week, Google launched Froogle last week, Google launched Picasa Web Albums last week etc.

  5. I've found this a very useful thing when I know a few brand names of things like watches or sunglasses and I want to know more.

  6. IAC, thanks for the * tip, I didn't know that. I was thinking when you're trying to search for why your "xxx" has problem "yyy", google will often return lists, digests, etc. in which both terms are there, but it's about "xxx" has problem "zzz" or about "vvv" has problem "yyy". To get around this you'd have to create complex queries, while in Dejanews you'd only have to search xxx NEAR yyy.