November 16, 2007

Picasa Web Integrates with Google Image Search

I've always wondered why Google prevents search engines from indexing a lot of user-generated content from its properties (photos uploaded to Blogger and Picasa Web Albums, public documents from Google Docs). It's a strange decision from a company whose goal is to make information widely available. For example, no photo uploaded to this blog can be found in Google Image Search or in other image search engines because a robots.txt file disallows that.
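
To make the robots.txt point concrete, here is a minimal sketch of how a well-behaved crawler decides whether it may fetch a page. The rules below are hypothetical (they are not copied from the real Picasa Web file), and the bot name and album URL are made up for illustration; only Python's standard `urllib.robotparser` module is used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT the actual
# robots.txt served by Picasa Web Albums.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up album URL and crawler name.
    album_url = "http://picasaweb.google.com/someuser/somealbum"
    print(is_crawlable(HYPOTHETICAL_ROBOTS_TXT, "ExampleImageBot", album_url))
```

With a blanket `Disallow: /` rule, every compliant crawler has to skip the album pages, which would explain why the photos don't show up in any image search engine.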

Some of the reasons could be technical: Picasa Web uses a lot of AJAX and loads images using JavaScript, so search engines can't crawl its pages. But that doesn't mean Google can't come up with an interface that uses less JavaScript.

To solve this problem, a message from Picasa Web Albums announces the integration with Google Image Search:
Get more exposure for the public albums you're currently sharing on Picasa Community Search. Now, public albums from users with 'Public Search' enabled may also be included in Google image search results.


What I don't understand is why Google calls it an integration and why the public albums are available only in Google Image Search. The last time I heard about an integration between a photo sharing site and an image search engine, Yahoo's search results were crowded with a lot of irrelevant images from Flickr, even though Flickr allows all search engines to index its pages.

Since there's no change in Picasa Web's robots.txt file, I suspect Google will do the same thing as Yahoo: mix the results from Picasa Web with the standard results, hopefully in a balanced manner. That means the public photos from Google's image hosting service will continue to be searchable only from Google's properties (previously, you could search them inside Picasa Web).


It's interesting to see that Google requires you to log in to Picasa Web Albums, even if you are already logged in to your Google Account and the photo that appears in the search results for [caleb 2 months old] is from a public album.


Another change is that photos embedded in other pages are searchable, as you can see by restricting your search to these subdomains: lh3.google.com, lh4.google.com, lh5.google.com and lh6.google.com. Google sends you to the full-size image even if the author of the page only linked to a thumbnail.

Update. A better query: site:lh3.google.com sunset.
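
If you want to try the same search on all four image hosts, a small sketch like this one builds the queries. The host list comes from the paragraph above; the helper names are my own, and the URL-encoding step only assumes you will paste the result into a search URL's query parameter.

```python
from urllib.parse import quote_plus

# Image-hosting subdomains mentioned in the post.
IMAGE_HOSTS = ["lh3.google.com", "lh4.google.com", "lh5.google.com", "lh6.google.com"]


def site_queries(terms: str, hosts=IMAGE_HOSTS) -> list:
    """Build one 'site:' restricted query per image host."""
    return [f"site:{host} {terms}" for host in hosts]


def encode_query(query: str) -> str:
    """URL-encode a query so it can be used as a query-string value."""
    return quote_plus(query)


if __name__ == "__main__":
    for query in site_queries("sunset"):
        print(query, "->", encode_query(query))
```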

11 comments:

  1. > That means the public photos from
    > Google's image hosting service will
    > continue to be searchable only
    > from Google's properties

    Why? Any search engine can theoretically index those photos at Picasa Web which are embedded in or linked from a public page, like a Blogspot blog. As you mention, only the Picasa Web albums HTML is excluded from bots at this time, but not the actual image storage servers like lh3.google.com.

    But I agree it still seems as if Google is using their own data here. 'Cause when you image-search for e.g. [site:lh3.google.com caleb 2 months old] you can see picasaweb.google.com in the result URLs, but how should Google know that picasaweb.google.com includes that picture if they don't internally spider it (as picasaweb.google.com disallows all via robots.txt)?

  2. Didn't Yahoo have this feature ages ago when they blended Flickr into search results?

  3. Yahoo!'s integration with Flickr made it the best image search engine on the web. They don't clutter the results with irrelevant images from Flickr; it's doing a smart job.

    Google is taking Yahoo's idea here.

  4. I think whether Google indexes Picasa Web albums has nothing to do with AJAX. Since Google owns Picasa and the database behind the Picasa servers, Google's search engine doesn't need to crawl web pages full of JavaScript; it can search the database directly, with all the meta info as well.

    Google just doesn't want to promote all the goodness at once; it gives out some candy from time to time in order to refresh you.

  5. The industry should come up with a solution that makes it easier for search bots to access multimedia content, especially user-generated content. A web interface providing metadata about multimedia content would make the job easier. Such an interface could become part of the open social protocols initiated by Google and other companies.

  6. So to get a picture indexed all I have to do is put it in a Picasa album?

  7. How do I get a video indexed?

  8. I personally have found that using pictures is a great way to get your website indexed. Picasa is really good for this.

  9. Picasa really makes dealing with pictures much easier.

  10. I agree with Google's policy. Picasa Web is full of irrelevant images and spammers. In search results I'd tolerate seeing maybe 2 out of 10 images from it, but no more.

  11. Agreed, free image hosting services provide buckets of junk images that simply obfuscate the search landscape. However, eBay, for instance, only allows such services for hosting images and often doesn't allow owned domain-hosted images to be used.

