01010101


Friday, October 23 2009

Geocoding with Processing and Google maps API

I'm currently working on a datavis tool for the Robert Dubois-Corneau museum where my wife works as a curator. Among some other infovis, I plan to build a map of admissions by geographical area.

For this purpose, and before building a form to enter visitors information, I needed a list of all cities in France with their geocodes.

In this post, I will explain how to automate the geocoding of a list of addresses using the Google Maps API.

Prerequisites

If you don't have Processing, you can download it here.

Download the Processing sketch source code here.
Note: it also includes a Java JSON library.

You also need a Google Maps API key. If you don't have one, you can sign up here.

Finally, you'll need a TSV (tab-separated values) file containing the addresses you want to geocode. If you don't have one, you can use the sample provided with the source code.

Installation

Copy the "GoogleGeocoder" and "libraries" folders into your Processing Sketchbook folder.

Copy your TSV file into the "data" folder directly under the GoogleGeocoder sketch root folder.

The code

Run Processing and open GoogleGeocoder.pde.
You'll need to make a few changes for it to work with your address file.

First, change this line to use your filename (or rename your address file to "input.tsv"):

  String inputFile = "input.tsv";

Copy/paste your Google Maps API key here:

  String GOOGLEMAP_APIKEY = "copy_paste_your_api_key_here";
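The key is presumably sent along with each geocoding request. As a rough sketch of how an address and a key could be combined into a request URL: the class name, the endpoint argument and the query parameter names ("q", "output", "key") are assumptions made for illustration, not taken from the sketch, so check them against GoogleGeocoder.pde and the API documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a geocoding request URL from an address and an API key.
class GeocodeRequest {

    // endpoint is passed in because the real service URL is not given in this post.
    static String buildUrl(String endpoint, String address, String apiKey) {
        try {
            // URL-encode the values so commas, spaces and accents survive the request.
            return endpoint
                + "?q=" + URLEncoder.encode(address, "UTF-8")      // assumed parameter name
                + "&output=json"                                    // assumed parameter name
                + "&key=" + URLEncoder.encode(apiKey, "UTF-8");     // assumed parameter name
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

Encoding matters here because the address string is built with commas and may contain spaces, both of which have to be escaped in a query string.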

If your file has a header row, or if you want to start at a certain line number, change the currentRow value. Otherwise, set it to 0 to start from the first line.

  currentRow = 1;

If you want to split the results in several files, replace "endingRow" with the number of lines you want per file.

  nbLinesPerFile = endingRow;
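To illustrate how a per-file line count can split the results, here is a minimal sketch that maps a row number to an output file. The class, the method names and the "output_N.tsv" naming are hypothetical; the sketch itself may name and number its files differently.

```java
// Hypothetical illustration of splitting results across several output files.
class OutputSplitter {

    // Index of the file that a given row ends up in, counting from the starting row.
    static int fileIndex(int row, int startRow, int nbLinesPerFile) {
        return (row - startRow) / nbLinesPerFile;
    }

    // Assumed naming scheme: output_0.tsv, output_1.tsv, ...
    static String fileName(int row, int startRow, int nbLinesPerFile) {
        return "output_" + fileIndex(row, startRow, nbLinesPerFile) + ".tsv";
    }
}
```

With a starting row of 1 and 100 lines per file, rows 1 to 100 go to the first file and row 101 opens the second one.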

The trickiest part is building the address string used for the geocoding request.
Every row and column of your address file is parsed during setup() and can be accessed later with the method below. For example, to read the data in the first column (index 0) of the current row:

  locationTable.getString(currentRow,0)
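For readers curious about what such a table does, here is a minimal plain-Java sketch of a tab-separated table with a getString(row, column) accessor. TsvTable is a hypothetical stand-in written for this example, not the class used in the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the sketch's table: splits TSV text into rows and columns.
class TsvTable {
    private final List<String[]> rows = new ArrayList<String[]>();

    TsvTable(String content) {
        for (String line : content.split("\r?\n")) {
            if (line.length() > 0) {
                // The -1 limit keeps trailing empty columns instead of dropping them.
                rows.add(line.split("\t", -1));
            }
        }
    }

    int getRowCount() {
        return rows.size();
    }

    // Row and column indices both start at 0.
    String getString(int row, int col) {
        return rows.get(row)[col];
    }
}
```

With a header on row 0, getString(1, 0) returns the first column of the first data row, which is why currentRow starts at 1 for files with a header.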

If your input file contains several columns (streets, cities, zip codes, ...), you can build the address string by reading each column individually, as shown below. The best way to check whether your address can be geocoded is to try it on Google Maps.

Here is the sample provided:

  address = locationTable.getString(currentRow,1) + ","
      + locationTable.getString(currentRow,2) + ","
      + helper(
          locationTable.getString(currentRow,3) + ","
          + locationTable.getString(currentRow,4)
        );

Take a look at the "input.tsv" file provided to understand how the address string is built.
You'll notice that I inserted a call to a helper(String s) method.
I could have written the address like this...

  address = locationTable.getString(currentRow,1) + ","
      + locationTable.getString(currentRow,2) + ","
      + locationTable.getString(currentRow,3) + ","
      + locationTable.getString(currentRow,4);

... but if you try this, you'll notice that the first address cannot be geocoded.

The first line corresponds to a city in French Guiana. Although 973 is a valid French county code, the address cannot be found using the format "%address%, %city%, %county code%, France".
So I had to add this little helper to reformat such addresses.
Here, if "%county code%,France" equals "973,France", we use "French Guiana" instead.
You can adapt this method to your own needs.
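As an illustration of the rule just described, a helper along these lines would do the job. This is a sketch written for this post's explanation, wrapped in a hypothetical AddressHelper class; the actual helper in the downloadable sketch may be written differently.

```java
// Sketch of the reformatting rule described above (not the original helper code).
class AddressHelper {

    // Receives "%county code%,France" and returns a string the geocoder can resolve.
    static String helper(String s) {
        if (s.equals("973,France")) {
            // County code 973 is French Guiana, which is not found as "973,France".
            return "French Guiana";
        }
        // Every other value is passed through unchanged.
        return s;
    }
}
```

Further special cases can be handled by adding more comparisons before the final return.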

The last important point is to create an index that will be saved along with the geocoded data. This will let you match rows between the input and output files. I'll cover this in another post.

  uniqueID = locationTable.getString(currentRow,0);

Now we're done, so just run this stuff and wait...

For each address, the output file(s) will contain the uniqueID you chose (or nothing if you set it to "") followed by:
Either an error (flagged with [ERROR] + reason + address)
Or the following geocoded information, separated by tabs:

  • Accuracy
  • Latitude
  • Longitude
  • Altitude
  • North limit (i.e. for cities)
  • South limit
  • East limit
  • West limit
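To read the results back in another program, a line of the output file could be parsed roughly as follows. This is a sketch under assumptions: it supposes that the uniqueID is the first tab-separated column, that the eight values follow in the order listed above, and that error lines can be recognised by the [ERROR] flag. GeocodedRow is a hypothetical class name.

```java
// Hypothetical reader for one line of the output file described above.
class GeocodedRow {
    final String uniqueID;
    final String accuracy; // kept as text because its format is not described in this post
    final double latitude, longitude, altitude;
    final double north, south, east, west;

    private GeocodedRow(String[] f) {
        uniqueID = f[0];
        accuracy = f[1];
        latitude = Double.parseDouble(f[2]);
        longitude = Double.parseDouble(f[3]);
        altitude = Double.parseDouble(f[4]);
        north = Double.parseDouble(f[5]);
        south = Double.parseDouble(f[6]);
        east = Double.parseDouble(f[7]);
        west = Double.parseDouble(f[8]);
    }

    // Error lines carry the [ERROR] flag instead of coordinates.
    static boolean isError(String line) {
        return line.contains("[ERROR]");
    }

    static GeocodedRow parse(String line) {
        if (isError(line)) {
            throw new IllegalArgumentException("Error row: " + line);
        }
        String[] f = line.split("\t", -1);
        if (f.length != 9) {
            throw new IllegalArgumentException("Expected 9 columns, got " + f.length);
        }
        return new GeocodedRow(f);
    }
}
```

Calling isError() first makes it easy to collect the failed addresses and retry them with a reformatted address string.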

Thursday, October 22 2009

Foreword

I've been thinking about a way to share source code and experience for a while, and I decided that setting up a blog would probably be the best way to do that. So, welcome aboard. I hope this can benefit anyone interested in digital art, datavis/infovis, technology, etc.