August 16, 2008

Export Files from Google Page Creator

Update: the exporting tool is no longer available, now that Google Page Creator no longer exists.

You've probably heard that Google intends to close Google Page Creator and migrate its users to Google Sites, a service that seems to be targeted at a different audience and that lacks many features available in Page Creator. Google Sites will add some of the missing features by the time Google closes Page Creator, but those who want to move to a different service or buy a domain can already export their files.

Requirement #1. Three kinds of files are trapped inside Page Creator: uploaded files, public web pages created using the editor, and unpublished web pages created using the editor. The exporting tool only works for uploaded files and public web pages. If you have unpublished pages that you want to export, click on "Publish all changes" in the sidebar. You can undo this action later.

Requirement #2. The other prerequisite for the exporting tool is software that downloads all the files linked from a page. For Internet Explorer, try the excellent download manager FlashGet (I use the classic version). For Firefox, there's an extension called DownThemAll that has some of FlashGet's features. In both cases, you'll have to restart the browser before continuing. As usual, Opera users don't need third-party software for advanced features: there's a sidebar panel that shows all the links from a page.

How to export the files.

Every Google Page Creator site has a sitemap that lists all of its public files: it's available at SITENAME.googlepages.com/sitemap.xml. Just go to your site's homepage and add /sitemap.xml in the address bar. To copy the content of the XML file into the box below, you could right-click, select "view source" and copy the code (for Firefox, Opera) or open the file http://SITENAME.googlepages.com/sitemap.xml in Notepad.

After clicking on "Obtain URLs", you should see a pop-up window that lists all the files from your Page Creator site. Right-click inside the page and select "Download all by Flashget" or "DownThemAll!", depending on your browser. Make sure to check "All files" in DownThemAll and to choose a folder where the files will be copied. In Opera, press F4, click on the "Links" panel, select all the links using Ctrl-A and click on "Save to Download Folder".
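
The URL-listing step can also be approximated from a shell. This is only a minimal sketch, not the exporting tool itself: it assumes the sitemap has already been saved locally, that no <loc> element spans more than one line, and that grep and sed are available. The function name extract_sitemap_urls is made up for illustration.

```shell
#!/bin/bash
# Print every URL found inside <loc>...</loc> elements of a sitemap file,
# one URL per line. The sitemap path is passed as the first argument.
# Assumption: no <loc> element is split across several lines.
extract_sitemap_urls() {
    grep -E -o '<loc>[^<]+</loc>' "$1" | sed -e 's/<loc>//' -e 's/<\/loc>//'
}

# Example (hypothetical file names):
#   extract_sitemap_urls sitemap.xml > links.txt
```

The resulting list can then be handed to whichever download manager you prefer.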


Unfortunately, there's still some manual editing you need to do for the pages created using the editor: replace <img src="name.gif/name-full.jpg" style="border: 0pt none ;"> and similar code with <img src="name.gif" style="border: 0pt none ;">.
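
That replacement could probably be scripted instead of done by hand. Here's a rough sketch, assuming the broken references always look like the example above (an image file name followed by a slash and a second file name ending in -full plus an extension). The helper name fix_img_src is hypothetical, and it's safer to run it on a copy of your pages first.

```shell
#!/bin/bash
# Rewrite src="name.gif/name-full.jpg" to src="name.gif".
# Reads the files given as arguments (or standard input when there are none)
# and writes the corrected HTML to standard output.
# Assumption: the unwanted suffix is always "/<something>-full.<extension>".
fix_img_src() {
    sed -E 's#(src="[^"]+\.[A-Za-z0-9]+)/[^"/]*-full\.[A-Za-z0-9]+"#\1"#g' "$@"
}

# Example (hypothetical file names):
#   fix_img_src page.html > page.fixed.html
```

References that don't match the pattern are left untouched, so it's worth checking the output for any "similar code" that uses a different naming scheme.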

Two free alternatives to Page Creator are Weebly and Synthasite. The Wall Street Journal has an article that explains how to buy a domain and host a site without paying too much.

{ Inspired by Peter Dawson. }

8 comments:

  1. I would suggest visiting this page:
    http://gilles.rasigade.googlepages.com/View.htm

    It is possible to visualize most of the Google Page Creator files that have been added to the sitemap.xml.

    Regards,

  2. Another method is to use a web crawler that follows all the links and (optionally) creates an offline version of the page.

    Teleport Pro
    http://www.tenmax.com/teleport/pro/home.htm

    HTTrack
    http://www.httrack.com/

  3. or if you're on unix,
    just create a blank file
    paste in all of the links
    if you're using vim, just type:
    :1,$s/^/wget /
    and then make it executable: chmod +x filenamehere
    and then run it: ./filenamehere

  4. @corey:

    This code should work in Unix and it only needs the address of your site.

    wget "http://site.googlepages.com/sitemap.xml"
    grep -E -o "<loc>[^<]+</loc>" sitemap.xml | sed -e "s/<\/loc>//" -e "s/<loc>//" >links.txt
    wget -i links.txt -P site.googlepages.com

  5. Here's some even better code for Unix. Paste this in a text editor, save the file as exportgpc, execute
    chmod +x exportgpc and then run:
    ./exportgpc sitename
    OR
    ./exportgpc sitename.googlepages.com
    OR
    ./exportgpc http://sitename.googlepages.com

    All the files are downloaded to a new directory named sitename.googlepages.com.

    The code:



    #!/bin/bash
    # Exports the files hosted on a Google Page Creator site.

    url=$1
    if [ -z "$url" ]; then
        echo -e "exportgpc: missing URL\nUsage: exportgpc SITENAME\nFor example, use exportgpc sundayclub if the URL is http://sundayclub.googlepages.com."
        exit 1
    fi
    # Reduce the argument to the bare site name.
    url=${url#http://}
    url=${url%/}
    url=${url%.googlepages.com}
    site="${url}.googlepages.com"
    # Fetch the sitemap, extract every <loc> URL, then download each file.
    wget "http://${site}/sitemap.xml" -P "$site" -N
    grep -E -o "<loc>[^<]+</loc>" "${site}/sitemap.xml" | sed -e "s/<\/loc>//" -e "s/<loc>//" >links.txt
    wget -i links.txt -P "$site" -N
    rm -f links.txt

  6. I used HTTrack to get pages. http://www.httrack.com/page/2/en/index.html It might get more than you want but it does get all published pages and links.

    Gotchas to watch for:
    Character codes that GPC may have put in your script.
    You probably lost any comments that a script had when you pasted it into GPC.
    Don't forget that the templates include special files, which show up in the "style" section of your code as url(-/include/...).
    Remember you hacked GPC with a script to change the style with your custom style sheet. You need to either link in the stylesheet or add the code to the HTML document. I recommend linking it in after the "style" section.
    If the place you are moving to supports directories, think about fixing your HTML code and moving files into a directory structure.
    Some places don't support filenames without extensions, so you're going to need to add htm or html to your files.

  7. Thanks for the tips, Alex!

    I'm just trying to export a single page that's using the default settings (nothing custom, just some text and a few pictures using the out-of-the-box settings). The formatting of the text comes out great, but the background changes and I lose my borders when I move the files to a different server. Do you know if Page Creator calls outside CSS files or if it uses some type of JavaScript for formatting?

  8. Much easier way posted here http://www.ialwayscapital.com/2009/05/exportbackup-all-google-pages-files-in.html
