March 22, 2007

Summarizing Search Results

Search engines show snippets next to each search result so you can decide whether a page is relevant without visiting it first. But snippets are pretty short and focus on the text that contains your keywords. To learn more about a page, a text summarizer that shows its key phrases would be helpful.
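To make the snippet side of the contrast concrete, here is a minimal sketch of a query-biased snippet in Python. It assumes a deliberately naive approach: pick the sentence that contains the most query terms and truncate it. Real search engines use more elaborate methods that aren't public, and the function names `make_snippet`, `split_sentences` and `tokenize` are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


def make_snippet(text: str, query: str, max_chars: int = 160) -> str:
    """Return the sentence with the most distinct query terms, truncated to max_chars."""
    terms = set(tokenize(query))
    sentences = split_sentences(text)
    if not sentences:
        return ""
    # Score each sentence by how many distinct query terms it contains;
    # max() keeps the earliest sentence when scores tie.
    best = max(sentences, key=lambda s: len(terms & set(tokenize(s))))
    if len(best) <= max_chars:
        return best
    return best[:max_chars].rstrip() + " ..."
```

For a sample page reading "The service lets friends stay in touch. Users post short updates about what they are doing. Updates can be read on the web or on a phone.", the query [short updates] picks the second sentence because it matches both terms. The rest of the page stays invisible, which is the gap a summary is meant to fill.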

Syntactica is a company that combined search results with summarization, and the result is Syntactica Search. If you look past the fact that Syntactica uses Live Search, you'll notice that the summary gives you a better idea of a page's content than the snippet. This works especially well for pages that have a lot of content on a single topic.

"Unlike conventional search-and-retrieval programs, Syntactica does not simply match strings of letters to other strings of letters in an index. It analyzes concepts in the context of the sentence structures in which the words reside. (...) The program first determines the semantic weight of the concept from the dictionary. (...) After the semantic weight is determined, the program determines the concept's place in the text's syntactic structure to determine its overall relevance. Once a concept's relevance has been determined, the program follows more rules that compare the weight of all the concepts within a text, and generates summaries based on the relevance of all its concepts to produce the desired output."
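The steps in the quote can be approximated by a toy extractive summarizer. The Python sketch below is not Syntactica's algorithm, which the quote only outlines: the weight dictionary `SEMANTIC_WEIGHT`, the stopword list, and the rule that words in the first half of a sentence count 1.5x (a crude stand-in for real syntactic analysis; no parser is involved) are all hypothetical assumptions. It mirrors the quoted sequence: look up a dictionary weight, adjust it by the word's place in the sentence, compare concept weights across the whole text, and keep the sentences whose concepts are most relevant.

```python
import re
from collections import defaultdict

# Hypothetical "dictionary" of semantic weights (assumption, not Syntactica's data).
SEMANTIC_WEIGHT = {"search": 2.0, "summary": 2.0, "concept": 1.5, "page": 1.0}
STOPWORDS = {"the", "a", "an", "of", "to", "is", "and", "in", "it", "that", "its"}
DEFAULT_WEIGHT = 0.5  # weight for words missing from the dictionary


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def semantic_weight(word: str) -> float:
    # Step 1: weight of the concept taken from the dictionary.
    if word in STOPWORDS:
        return 0.0
    return SEMANTIC_WEIGHT.get(word, DEFAULT_WEIGHT)


def syntactic_factor(position: int, length: int) -> float:
    # Step 2 (crude stand-in): words early in a sentence, often the subject,
    # count more than words near the end.
    return 1.5 if position < length / 2 else 1.0


def summarize(text: str, max_sentences: int = 2) -> str:
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    # A concept's relevance is its adjusted weight summed over every occurrence.
    relevance: dict[str, float] = defaultdict(float)
    for words in tokenized:
        for i, w in enumerate(words):
            relevance[w] += semantic_weight(w) * syntactic_factor(i, len(words))

    # Step 3: compare concepts by scoring each sentence with the average
    # relevance of the words it contains.
    def score(idx: int) -> float:
        words = tokenized[idx]
        return sum(relevance[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=score, reverse=True)[:max_sentences]
    # Step 4: emit the chosen sentences in their original order.
    return " ".join(sentences[i] for i in sorted(best))
```

On the text "Search engines show a short summary of each page. The weather was nice. A summary helps you judge a search result.", `summarize(text, 2)` keeps the two sentences about search and summaries and drops the one about the weather. Averaging per sentence, rather than summing, keeps long sentences from winning simply because they are long.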

This could be a machine-generated replacement for the meta description tag, which was mostly used to mislead search engines. It's the core of a page, expressed using portions of its own text.

The screenshot below shows a search result for [Twitter]: a snippet generated by the search engine and an abstract produced by Syntactica.

4 comments:

  1. It's an interesting idea, but in practice, after doing a few searches, I really can't say anything very positive about it at all.

    Among other problems, the abstracts end up mostly unreadable and take up far more space than a regular "snippet"-based result from a normal engine (i.e., Google).

    Not only is it not a step forward, I even think it's a step back in usability. The abstract is so unhelpful that the result summary ends up telling you LESS about the page than a normal search engine does.

    Not only that, but it gives you less indication of the quality of a page. When doing research, you can usually tell from the search results which pages are more valuable than others, simply by how sophisticated the author's language is. When I see an excerpt like "Napoleon Bonaparte was born on August 15, 1769 in Ajaccio on the Mediterranean island of Corsica." in a Google search result, I instantly get an idea not only of the richness of information the page will contain, but of its format too. That page is quite obviously going to be almost encyclopedic and very factually oriented.

    Contrast that with another result, "Napoleon Bonaparte One of the most brilliant individuals in history, Napoleon Bonaparte was a masterful soldier, an unequalled grand tactician and a superb ...", and you can instantly tell the difference in quality. The first site is almost certainly going to give me a much more concise, informational article, whereas the second reads almost like a fluff piece.

    This kind of indicator simply doesn't work with Syntactica, because it gives you zero information about the context in which the requested phrase appears on the page, and you end up with a much, much less useful page summary.

    Great idea in theory, terrible in practice.

  2. Nice idea :)
    I agree it could be shorter, maybe a two-line page summary before the snippet.
    I didn't like their main page, though: popup menus appear when I'm moving my mouse past them. Yuck! :(
    - imma

  3. Such functionality was added to DataparkSearch Engine (see http://www.dataparksearch.org/) about a year ago. You can test it at http://www.43n39e.ru/

  4. I would be interested in seeing where you found the passage you quote, because the extracts you give are quite general and would apply to the methods any respectable search engine would use.

    Interestingly, Peter Norvig of Google Research has done some useful work on this kind of thing in the past (here is a 1983 publication), so I am sure that Google are working on a onebox to extract simple answers and descriptions from the top results. For example, searching for "elephants" could bring up a onebox with a summary about elephants.

    Google already have the hardware to do this, and to provide search as well as they do, they must already have extensive knowledge of four of the six problems Peter Norvig mentioned in this paper. When Google do start doing this, they will be miles ahead of the competition, simply because they will be able to do it better.
