December 20, 2007

Let's Test Powerset

Powerset, the search engine that shows results for natural language queries, has started to let its testers enter any query.

From the about page: "Our unique innovations in search are rooted in breakthrough technologies that take advantage of the structure and nuances of natural language. Using these advanced techniques, Powerset is building a large-scale search engine that breaks the confines of keyword search." For now, Powerset only indexes pages from the English Wikipedia and it's not publicly available (but you can request an invite).

For example, Google shows almost the same search results for [Pyra Labs acquired by Google] and for [When was Pyra Labs acquired by Google?]. Even if the answer can be found in one of the first snippets, Google doesn't highlight it or display it prominently, as Powerset does.

Do you know a query that shows irrelevant results at Google? Post it in the comments and I'll upload a screenshot of Powerset's results. Obviously, to compare Powerset's results with Google's results, you need to restrict Google to the English Wikipedia, the only site currently indexed by Powerset.
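
If you want to build the restricted Google queries for such a comparison, here is a minimal sketch. It uses Google's site: operator and assumes the English Wikipedia lives at en.wikipedia.org; the function name site_restricted_url is made up for this example.

```python
from urllib.parse import urlencode


def site_restricted_url(query: str, site: str = "en.wikipedia.org") -> str:
    """Build a Google search URL limited to one site with the site: operator.

    The default site is an assumption: the English Wikipedia domain.
    """
    # urlencode escapes spaces, the question mark and the colon in the query.
    return "https://www.google.com/search?" + urlencode({"q": f"{query} site:{site}"})


print(site_restricted_url("Who are the founders of Yahoo?"))
```

Opening the printed URL should show Google's results for the same question, limited to pages from that one site.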


Query #1 (from Matt Cutts): How many states are in the United States?
Conclusion: the first 10 Powerset results are terrible. On the other hand, Google shows the answer in a OneBox, but also a strange book search OneBox at the top: "How Many Doctors Do We Need?" by Duncan Yaggy, Patricia Hodgson.

Query #2: What Nobel Prize winners were born in Russia?
Conclusion: Google's results are better even if you restrict them to Wikipedia. The top result from Wikipedia (the third Google result) is a page titled Nobel laureates by country. Only a few of the people mentioned in Powerset's results are Russians who won the Nobel Prize, and there's no complete list.

Query #3: Who was the last president of United States?
Conclusion: results #3, #4 and #6 mention George W. Bush, but they also name other former presidents. Google's fourth result has this title: "George W Bush: Last President Of The United States Of America?".

Query #4: Who are the founders of Yahoo?
Conclusion: the second result includes the answer, but it's only partially highlighted in the snippet. Google's top result has a good snippet: "The two founders of Yahoo!, David Filo and Jerry Yang, Ph.D. candidates in Electrical Engineering at Stanford University, started their guide in a campus."


    Indexing only Wikipedia will never show its real power or its real flaws.

  4. gives me better results

    Yellow-colored snippets from Powerset:
    1. founders David Filo
    2. founder Larry Lessig to call
    3. Founders Jerry Yang and David Filo selected the name because they considered themselves yahoos.

  5. At Matt Cutts query>>

    The source of Google's OneBox result for "How many states are in the United States?" is not Wikipedia, and the fact that Powerset indexes only the Wikipedia domain could be a reason it doesn't show relevant results.

  6. @Felipe:
    Expanding the index from Wikipedia's pages to the entire web is an important problem, but Powerset is trying to refine its algorithms using Wikipedia, one of the most important sources of facts.

    Two answers from Powersetters:

    "Resources and time, basically. It takes a lot of hardware and a lot of time to index the whole web (and keep it up to date). We're focusing on Wikipedia as our corpus in our growing phase since it allows us to build up an index more rapidly while still including information on a wide swath of topics." (Ian Collins)

    "I'd also like to add that Wikipedia has a few other advantages for our roll-out testing: It is a very popular reference website, so people can find Wikipedia search actually useful, and it is a large but relatively static source, which makes indexing a bit more manageable than, say," (Eitan Frachtenberg)

  7. I'd like to run an OpenID service

  8. Hey Ionut, I understand why Powerset chose Wikipedia. I just think Powerset promised too much, and indexing only Wikipedia is way too far from its promise.
    I'm not saying they're not on the right path. But... I was hoping for more.

    When I get my invitation to Powerlabs, maybe I'll run some tests with Google and Powerset, comparing several queries using some scientific measures.

  9. @Paret:
    If that's a query, Powerset doesn't find any result.

    I think there are very few situations in which Powerset delivers better results than Google, Yahoo etc. And it's difficult to scale syntactical analysis for billions of web pages and tens of languages.

    Google delivers pretty good results even if it doesn't understand the meaning of your query, and I'm not sure it's really necessary to understand grammatical dependencies.

  10. Ionut, you've just said everything I was thinking. I don't really believe in Powerset. But... Time will tell us the real thing.

  11. Thanks for starting the discussion about Powerset’s latest demo. Please keep in mind that this pre-alpha version of Wikipedia search is still a work in progress. We’re continually improving ranking, summarization and highlighting features, and will soon be offering features for enhanced browsing of semantically indexed content. We also plan to integrate features that are unique to Powerset, such as automatically aggregating and browsing facts and relationships from our index (see our Powermouse demo in Powerlabs). Expect many changes in the coming weeks!

    Here are some queries that offer some more insight into how we index Wikipedia and display results:

    when did earthquakes hit san francisco
    who supports Barack Obama
    does Tom Cruise have kids
    who did Hulk Hogan defeat

    We certainly welcome the feedback, and encourage your readers to join Powerlabs for more information about our demos and vision for the future of search.

    Scott Prevost