An unofficial blog that watches Google's attempts to move your operating system online since 2005. Not affiliated with Google.

Send your tips to gostips@gmail.com.

June 26, 2007

Reconstruct a Feed's History Using Google Reader

Google Reader is more than a feed reader: it's also a platform for feed caching and archiving. That means Google Reader stores all the posts from the subscribed feeds and they're available if you keep scrolling down in the interface.

A simple application for this feature is to retrieve the history of a feed for archiving purposes or to import it into a database. If you visit a blog or a news site, the feed will only contain the latest 10-20 posts, but Google Reader can show you more than that.

Just enter this URL in the address bar:
http://www.google.com/reader/atom/feed/FEED_URL?r=n&n=NUMBER_OF_ITEMS
and replace FEED_URL with the address of the feed and NUMBER_OF_ITEMS with the number of historical posts you want to retrieve.
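If you want to build this address in a script, the template above can be sketched in Python roughly as follows. The function name `reader_history_url` is made up for illustration, and percent-encoding the `?` and `=` inside the feed address is an assumption about what the service expects, not something confirmed here; the idea is only to keep a feed's own query string from being confused with the `r` and `n` parameters.

```python
from urllib.parse import quote, urlencode

# Base address taken from the URL template in this post.
READER_ATOM_BASE = "http://www.google.com/reader/atom/feed/"


def reader_history_url(feed_url, item_count):
    """Build the history URL for a feed (hypothetical helper).

    Assumption: characters such as "?" and "=" in the feed address
    should be percent-encoded; ":" and "/" are left as they are so
    that simple feed addresses look the same as in the template.
    """
    encoded_feed = quote(feed_url, safe=":/")
    # r=n and n=NUMBER_OF_ITEMS, exactly as in the template.
    query = urlencode({"r": "n", "n": item_count})
    return READER_ATOM_BASE + encoded_feed + "?" + query


print(reader_history_url("http://www.domainname.com/?feed=rss2", 100))
```

A feed address without a query string passes through unchanged, so the result has the same shape as the template.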

For example,
http://www.google.com/reader/atom/feed/http://feeds.feedburner.com/GoogleOperatingSystem?r=n&n=100
should return the latest 100 posts from this blog as an Atom XML file.
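Once the file is saved, getting the posts into a database mostly means pulling a few fields out of each entry. Here is a minimal sketch using Python's standard library. It assumes the file uses the standard Atom 1.0 namespace, and both the sample document and the helper name `extract_entries` are invented for illustration; a real export will carry more fields than these.

```python
import xml.etree.ElementTree as ET

# Assumption: the downloaded file uses the standard Atom 1.0 namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Tiny invented sample standing in for a downloaded file.
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First post</title>
    <link rel="alternate" href="http://example.com/first"/>
    <published>2007-06-25T10:00:00Z</published>
  </entry>
  <entry>
    <title>Second post</title>
    <link rel="alternate" href="http://example.com/second"/>
    <published>2007-06-26T10:00:00Z</published>
  </entry>
</feed>"""


def extract_entries(atom_xml):
    """Return one dict per <entry>, ready to insert into a table."""
    root = ET.fromstring(atom_xml)
    rows = []
    for entry in root.findall("atom:entry", ATOM_NS):
        # The post's own address is the link marked rel="alternate".
        link = entry.find("atom:link[@rel='alternate']", ATOM_NS)
        rows.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "url": link.get("href") if link is not None else None,
            "published": entry.findtext("atom:published", default="", namespaces=ATOM_NS),
        })
    return rows


for row in extract_entries(SAMPLE_ATOM):
    print(row["published"], row["title"], row["url"])
```

For a real file, read its contents and pass them to `extract_entries` in place of `SAMPLE_ATOM`.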

23 comments:

  1. Wooow, who would have thought that?

    Hopefully, someone in the future could use this to read what his/her mother/father wrote about as a teenager. Better than pics and family tales, huh?

  2. Honestly, I never used Google Reader before. The way you wrote this article makes me want to try this great application...:)

  3. I had this question which I asked at DigitalPoint a couple of days back.

    But how do they cache new feeds?

  4. I've been using Google Reader for a couple of months, but there is one thing that is missing... a SEARCH! It's amazing how a Google application doesn't have search included... and about this feature, it's great. I use it too.

  5. It requires that you have an account on Google Reader. Has it always been like this?

  6. To use Google Reader, you need an account.

  7. How about using this programmatically? Has anyone done it?

  8. S.,

    Using Perl and LWP you can easily access this data. Notice that you need to authenticate yourself before accessing Google Reader.

    You can download up to a maximum of 5,000 entries per feed.

  9. I was excited to find this article, because within a short period of time my blog's database was lost on both my hosting company's servers and my local computer. Anyway, the majority of the content is cached by Google Reader and I am hoping to find a way to retrieve the lost data and import it back into my WordPress blog.

    I tried this method and see that it works fine in the example provided, but I cannot replicate the results with my blog. I suspect it has to do with the syntax of my RSS feed's URL. My site is cached within Google Reader using the standard WordPress URL "www.domainname.com/?feed=rss2". I didn't list my specific URL because it's adult-oriented content, but you can find it by following the link from my name (NSFW). Again, I'm guessing, but I think the question mark is confusing the function. I am wondering if it should be escaped somehow or if anyone knows another method.

    Thanks in advance!

  10. Google Reader caches a feed only if there's at least one subscriber, and only from the moment someone first subscribes to that feed. For example, if the first Google Reader subscriber added the feed on October 21st 2007, the cache will include posts published starting with that date.

  11. There is a 1000-item limit for me.
    How can we bypass this?
    I want to import all my starred items.
    Regards,
    Antoine

  12. It doesn't work for me either. I'm trying to recover old posts from a fotolog account and it doesn't really work. It only shows 100 items no matter what number I put after n=. Maybe it has something to do with the RSS feed's URL, which is http://www.fotolog.com/username/feed/main/rss20. Any idea?

    Thanks a lot
    Andrés

  13. Probably nobody subscribed to the feed and Google didn't cache the posts.

  14. Is there any way to delete this history so that past posts that have been deleted do not appear in the cache to new subscribers, and possibly current subscribers?

  15. I was excited to find this article, because within a short period of time my blog's database was lost on both my hosting company's servers and my local computer.

  16. Is there any way to delete this history so that past posts that have been deleted do not appear in the cache to new subscribers, and possibly current subscribers?

    Same problem! The content owner has every right to control his content, including the content in the feed. Google Reader should protect this right and let webmasters or content owners control their cached feed content, as they can with the normal search cache!

  17. No, it's not possible to clear the cache.

  18. I have the file but can't import it into WordPress. Any suggestions on getting this into a format I can import?

  19. Has anyone tried to use feedparser to do this?

    I don't know how I can log in from a Python script...

  20. I was excited to hear this, for a moment. But after reading the comments, I was disappointed. I wanted to fetch the historical feeds of many websites, and after reading "Ion Alex Chitu"'s comment, I felt it's the same problem I had with NewsBlur (www.newsblur.com). Unless someone has already subscribed to a blog, you won't get the old feed entries of a website, and you can't guarantee that for any website.

  21. Can we use Google Reader (as a platform) for commercial purposes?

  22. This is brilliant, but has anyone found a solution to the 1000-post limit?

  23. >>> http://www.google.com/reader/atom/feed/http://feeds.feedburner.com/GoogleOperatingSystem?r=n&n=100
    Yes, it works, but I must log in to Google in order to use it.
    Is there any way I can use it anonymously?
