Geog 210 – Geocoding with QGIS



Part 2: Geocoding (with QGIS)

In exercise 6, you used a point layer of crimes committed in the city of Worcester in 2014. Now you will learn how to create that layer from street addresses, a process called geocoding. The crime data were obtained from crimereports.com (the link is now dead). In addition to the table of addresses, we’ll use an “Address Locator” (more on this later) provided by OpenStreetMap.

The address locator requires a vector street layer containing house numbers for each street segment. Since houses are numbered odd on one side of the street and even on the other, the vector layer must have house numbers for both the left and right sides of the street. Fortunately, this information has been compiled by the US Census Bureau in the TIGER street files. See the image for a look at one section of streets between South Hadley and Chicopee.

The figure below shows columns of the TIGER file for Massachusetts (from MassGIS). Notice the street name column: there are multiple line segments for the same street. The rest of the columns indicate the range of house numbers found on the left and on the right for each segment:

[Figure: address locator table]
  • LFROMADD – the house number that starts the segment on the left side (left from address). RFROMADD is the same for the right side.
  • LTOADD – the last house number in the street segment on the left side (left to address). RTOADD is the same for the right side.
  • ZIPL and ZIPR – the ZIP codes found on the left and right sides of the street segment.
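
To make the role of these columns concrete, here is a minimal sketch of how a house number could be placed along one segment using the left and right ranges. This is the classic address-range interpolation idea, not necessarily what any particular address locator does internally. The function name and the sample ranges are hypothetical, and the sketch assumes the range values have already been converted to integers.

```python
def locate_on_segment(house_number, lfrom, lto, rfrom, rto):
    """Return (side, fraction) for a house number on one street segment.

    side is "L" or "R"; fraction is the relative position (0.0 to 1.0)
    along the segment, measured from the FROM end of the range.
    Returns None when the number fits neither range.
    """
    for side, start, end in (("L", lfrom, lto), ("R", rfrom, rto)):
        low, high = min(start, end), max(start, end)
        # Odd and even numbers sit on opposite sides, so parity must match.
        if house_number % 2 != start % 2:
            continue
        if not (low <= house_number <= high):
            continue
        if start == end:
            return side, 0.0
        return side, (house_number - start) / (end - start)
    return None


# Hypothetical segment: left side 101-199 (odd), right side 100-198 (even)
print(locate_on_segment(150, 101, 199, 100, 198))
print(locate_on_segment(250, 101, 199, 100, 198))
```

In this made-up segment, number 150 is even, so it is matched to the right side and placed about halfway along it; number 250 falls outside both ranges and would have to be matched on a different segment of the same street.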

Be aware that geocoding street addresses varies enormously from country to country, depending on the quality of the data and on how addresses are assigned along each street.

  1. Create a new project in QGIS
  2. Add the crime data table, Crimes_Worcester_2014.csv, from your working folder (X:ex9) and examine its contents. (In the Data Source Manager, you can use the Browser tab to find it.)

Question 3. How many records are there?

Notice the column called “Web_Address”. This is how crimereports.com reported the address. It is a degraded address: it identifies the block rather than the exact house number. I edited those addresses (removed the word “Block”) and created a column called “Address”. As a result, all the crimes that happened in one particular block will be geocoded to the first address of the block.
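
The edit described above can be sketched in a few lines of Python. The exact format of the “Web_Address” values is not shown here, so the sample string below is a hypothetical example, and `clean_address` is an illustrative name rather than the tool actually used to build the “Address” column.

```python
import re


def clean_address(web_address):
    """Drop the standalone word 'Block' and collapse the leftover spaces."""
    without_block = re.sub(r"\bblock\b", "", web_address, flags=re.IGNORECASE)
    return " ".join(without_block.split())


# Hypothetical degraded address
print(clean_address("100 Block MAIN ST"))  # 100 MAIN ST
```

The word boundaries (`\b`) keep the pattern from touching street names that merely contain the letters “block”.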

The procedure we are going to use takes a long time, so we will geocode only a subset of the crimes in Worcester. For our purposes, we’ll study only the drug-related crimes.

  1. Select only “Drugs” from Crimes_Worcester_2014.csv:
    • Open the properties of the table from the Layers pane
    • In the Source tab, click Query Builder
    • Enter the expression "Crime_Type" = 'Drugs' (after choosing “Crime_Type” in the left window, push the “All” button under the Values window to pick “Drugs”)
    • Click OK
  2. Export the filtered records (302 records) as a Comma Separated Value (CSV) file (NOT a shapefile) and save it in your working folder.
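
For reference, the same filter-and-export step can be sketched outside QGIS with Python's standard `csv` module. The function name is hypothetical, and the column name `Crime_Type` and the value `Drugs` are assumed to match the file exactly (including capitalization).

```python
import csv


def filter_crimes(in_path, out_path, crime_type="Drugs", column="Crime_Type"):
    """Copy the rows whose `column` equals `crime_type`; return the row count."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        rows = [row for row in reader if row[column] == crime_type]
        fieldnames = reader.fieldnames
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()       # keep the original column names
        writer.writerows(rows)
    return len(rows)
```

Called on Crimes_Worcester_2014.csv with an output name of your choice, the function returns the number of rows written, which you can compare with the record count reported by QGIS.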

Next, we’ll use a QGIS plugin. This is an external program that provides extra functionality to QGIS.

  1. In the top menu, choose Plugins > Manage and Install Plugins…
  2. Select All on the left, then type “MMQGIS” in the search box. When the plugin is listed in the center pane, select it and click Install in the lower-right corner of the window. Close the Plugins dialog. Notice the new MMQGIS menu added to the menu bar at the top.
  3. From the MMQGIS menu, select Geocode > Geocode CSV with Web Service:
    • Provide the CSV file with the drug crimes as input
    • The address fields should be populated correctly (address, city, and state). Make sure the selections are correct
    • Web service: OpenStreetMap/Nominatim (this uses an OSM server, so make sure you have internet access)
    • Enter the output shapefile (SHP) and the Not Found output list (CSV) in your working folder. Make sure the Not Found list is in the right format (CSV).
    • Click Apply and wait until it finishes (about 5 minutes)
  4. When the program finishes geocoding, you can see the shapefile with the geocoded addresses. There will be some obvious errors (points outside the city of Worcester). You can add the outline of Worcester from ex6Worcester.gdb. Those outliers need to be extracted, verified, and run again (we won’t do that here).
  5. Check the Not Found table. These addresses didn’t get “matched” to a street address, perhaps because the address is misspelled or wrong. Depending on the importance of the job, you should try to fix those addresses and geocode them again until there are no more unmatched addresses.
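
To give a sense of what a web-service geocoder does behind the scenes, here is a minimal sketch that queries the public Nominatim search endpoint directly. It is not the MMQGIS implementation. The function names and the `user_agent` string are hypothetical, and the one-second pause between requests reflects the public server's usage policy of at most one request per second, which is also why a few hundred addresses take several minutes.

```python
import json
import time
import urllib.parse
import urllib.request

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"


def build_query_url(street, city, state):
    """Build a structured Nominatim search URL for one address."""
    params = {"street": street, "city": city, "state": state,
              "format": "json", "limit": 1}
    return NOMINATIM_URL + "?" + urllib.parse.urlencode(params)


def geocode(street, city, state, user_agent="geog210-exercise"):
    """Return (lat, lon) as floats, or None when nothing matches."""
    request = urllib.request.Request(
        build_query_url(street, city, state),
        headers={"User-Agent": user_agent},  # the public server expects one
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.load(response)
    if not results:
        return None
    # Nominatim returns the coordinates as strings
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode_all(streets, city, state, delay=1.0):
    """Geocode each street address; return (matched dict, not-found list)."""
    matched, not_found = {}, []
    for street in streets:
        location = geocode(street, city, state)
        if location is None:
            not_found.append(street)
        else:
            matched[street] = location
        time.sleep(delay)  # stay within one request per second
    return matched, not_found


print(build_query_url("100 MAIN ST", "Worcester", "MA"))
```

The two values returned by `geocode_all` correspond to the two outputs of the plugin: the matched points and the Not Found list.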

Question 4.  How many unmatched addresses in the Not Found output csv file did you get?  _____________ (It should be a very low number, hopefully)

  1. Open the table of the new geocoded layer and notice the new columns that have been created. These include a new address field (DISPLAY_NA) and LatLong among others.
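
Once latitude and longitude are available, the obvious outliers mentioned earlier can be flagged with a simple bounding-box test before checking them against the real city outline. The sketch below is only illustrative: the box limits are rough, assumed values around Worcester rather than the official boundary, and the function name is hypothetical.

```python
def outside_bbox(points, min_lat, max_lat, min_lon, max_lon):
    """Return the (lat, lon) points that fall outside the bounding box."""
    return [
        (lat, lon) for lat, lon in points
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon)
    ]


# Rough, assumed box around Worcester, MA (illustrative values only)
WORCESTER_BBOX = dict(min_lat=42.20, max_lat=42.35,
                      min_lon=-71.90, max_lon=-71.70)

# First point is in central Worcester, second is in Boston
sample = [(42.2626, -71.8023), (42.3601, -71.0589)]
print(outside_bbox(sample, **WORCESTER_BBOX))  # [(42.3601, -71.0589)]
```

A bounding box is a coarse filter; points inside the box but outside the city limits still need to be checked against the Worcester outline layer.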
