September 5, 2007

Google Spreadsheets Lets You Import Online Data

Google Spreadsheets wins the prize for the coolest new features launched this summer in Google's web applications.

The magical autofill lets you type the first elements from a series, select them, drag the fill handle (a small blue square) across the range that you want to fill and obtain the rest of the items automatically. It works for time series (days of the week, months), numerical series (arithmetical progressions), a combination between text and numbers (for example: type Q1, Q2 and Google Spreadsheets will add Q3, Q4 etc.) or dates.


You can also try entering related words, products, companies, countries and you may get surprising results; just press Ctrl while dragging the fill handle. It seems that autofill uses data from Google Sets. Look what I got when I typed "pop", "rock", "rap":


The other great new feature uses the "online" part from from "online spreadsheet". Now you can add data from many different sources available online: feeds, HTML files or simply text files (CSV/TSV). The data is automatically refreshed, although it's not very clear how often.

For HTML or XML files, you have to type an XPath expression to describe the data:
=importXML("URL","XPath expression")

For example, to get the Google search results for [live], you need to check the source code and notice that the class attribute has the value "l":
=importXML("http://www.google.com/search?q=live", "//a[@class='l']/@href")


There's a special function to import tables and lists from HTML files:
=importHtml(URL, element, index)

Element can be either "list" or "table", while the index tells an element's order in the page. Here's how to import all the definitions for [live] found by Google:
=importHTML("http://www.google.com/search?q=define:live", "list", 1)


To import structured text files, use this function:
=importData("URL")

This is especially useful if you want to import data from multiple CSV files available online.


Atom and RSS feeds can be imported using this function:
=GoogleReader(URL)

The function has an advanced form that lets you add only some of the items from the feed. For example, here's how to obtain the latest headline from New York Times:
=GoogleReader ("http://graphics8.nytimes.com/services/xml/rss/nyt/HomePage.xml", "items title", "false", 1)

If we couple these new additions with the two other functions that retrieve information from the web and consider that each published spreadsheet is available as a feed, it's clear that you can connect multiple spreadsheets and easily reference data available online.

18 comments:

  1. Try filling in a couple cells in a column with some color names, then hold the ctrl key (alt on a mac) as you drag down. It's a bit wonky, but Sets integration is already there :).

    ReplyDelete
  2. Finally Google Set becomes useful :)

    Very cool feature!

    ReplyDelete
  3. I'm not so sure about that. Google Sets is used indirectly in Google's search engines (5 points if you find where).

    ReplyDelete
  4. The autofill was a long awaited feature so I can only say “yeah, finally”. However the way it's implemented is really impressive. I mean this integration with Google Sets. Thanks to this it became even better than in other spreadsheet applications.

    And those external sources are interesting. This opens whole new class of things that can be done with Google Spreadsheets. It perhaps may be more useful than I can imagine now.

    Where else are Google Sets used? I don't know but maybe it's the synonym search (with “~” operator)? Do I get any points?

    ReplyDelete
  5. I can't get it to work on Firefox 2 under WinXP. There doesn't seem to be any documentation...

    ReplyDelete
  6. Neat! But I tried to use importData to fetch some tab-delimited data, and it all ended up in the same cell :-(

    ReplyDelete
  7. Alex, Am I right in saying that Google Sets is used for showing suggestions in the results page?

    ReplyDelete
  8. It's used in the list of suggested search terms.

    ReplyDelete
  9. Most of these queries do not work if I try to use true XPath.
    The error message is hardly sufficient.
    I wonder if people really use this feature.

    ReplyDelete
  10. The example works, but I try with a authenticated web service and didn't work. Anybody try with authenticated web services ?
    Do you know if exists official documentation about this feature?
    Thanks

    ReplyDelete
  11. =GoogleReader function doesn't work. Does it exist at all?

    ReplyDelete
  12. Heya…wondering about scape formating or regex with index() combined with ImportHTML().

    Trying to create spreadsheet table pulling live sliver and gold values…I would use yahoo but they don’t/can’t find the silver index so been looking around the net and finding little, but what I did find gave me back — ‘LAST : *16.88*’ — asterix included.

    I have looked around for regex axample and colab, but haven’t found yet.

    Help??

    Repeated at

    http://electronicmuseum.org.uk/2009/07/07/scraping-scripting-hacking/#comment-8153

    ReplyDelete
  13. Googlereader() == importfeed()

    ReplyDelete
  14. Bruno @ September 21, 2009 7:59 PM,

    please use:

    =MID(X,2,6)

    where X is *16.88*

    this returns

    16.88

    which you wanted...

    ReplyDelete
  15. how can I import a div element, not class or table?!

    ReplyDelete
  16. Since 2007, the year this post had been created, Google Sheets advanced significantly. Probably one of the most widely used functions is IMPORTRANGE to import data from multiple Google Sheets.

    However, I found IMPORTRANGE hard to use and not very beginner-friendly.

    I kept searching and found Import Sheet [1], a Google Sheets add-on that provides a visual interface to create connections between multiple Google Sheets and MS Excel documents.

    These connections allow you to schedule imports and exports of data between these multiple sheets.

    Maybe it's useful to someone reading this!

    [1] http://importsheet.com/install

    ReplyDelete
    Replies
    1. Hi all,

      Import Sheet is now called Sheetgo.

      https://www.sheetgo.com/

      Just to let you know. :-)

      Delete

Note: Only a member of this blog may post a comment.