Exercise 5 – Census Data and Join Operations



Return to Geog 205 page


Part 1: Tutorial – Learning Data Manipulation and Attribute Join Operations

In this exercise, you will be working with US census data.  To understand this data, read the short document “US Census and GIS” AND the section “The main geographical hierarchy” (pages 21-24) of the document “Unlocking the Census with GIS – Chapter 1” in Canvas (the rest of the document is recommended but optional).

The data provided in this exercise IS NOT in geodatabase format but in a common open format called “shapefile”, which consists of a group of files (from 3 to more than 10) with the same name (e.g. US_cities) but different extensions (shp, dbf, sbn, prj, shx, etc.).  In ArcGIS a shapefile appears as a single file (not multiple files).  With Windows Explorer, you can see all the files associated with each shapefile.
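The “group of files” idea can be sketched in Python. This is only an illustration and not part of the exercise: the helper name group_shapefile_parts is hypothetical, and it does nothing more than inspect file names in a folder.

```python
from collections import defaultdict
from pathlib import Path


def group_shapefile_parts(folder):
    """Map each shapefile base name in `folder` to its sorted list of extensions."""
    groups = defaultdict(set)
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        # Split at the first dot so "US_cities.shp.xml" still groups under "US_cities".
        base, _, extension = path.name.partition(".")
        groups[base].add(extension.lower())
    # Keep only the groups that include the main .shp geometry file.
    return {base: sorted(exts) for base, exts in groups.items() if "shp" in exts}
```

Run on a folder such as \ex5\Part1, a function like this would list, for each theme, the extensions that belong together. It also shows why copying or renaming only the .shp file in Windows Explorer leaves the rest of the shapefile behind.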

Inside the folder \ex5\Part1 you’ll find the following themes:

  • CENSUS2010_TOWNS_POLY.shp, shapefile of 2010 census town data for Massachusetts.  Obtained from the Office of Geographic Information of Mass., MassGIS  (link in Canvas – Exercise 5 resources)
  • CENSUS2010_BG_POLY.shp, shapefile of the 2010 census bureau Block Group data for Massachusetts.
  • CENSUS2010_BG_SF1_POP_RACE.dbf, database table (dBase format) of the 2010 census containing race information of the Block Groups in Massachusetts.
  • Mass_outline.shp, this is the outline of the state of Massachusetts, used for display purposes.
  • US_cities.shp, a point layer of cities and towns, in NAD83 geographic coordinates.

Unless noted, data are in Mass. State Plane Coordinate System – NAD83 – meters, coordinates (this doesn’t mean much to you now, but it’ll come later in the semester). 

Task 1: Census data representation

  1. Copy the data folder, create a new map and add the census data CENSUS2010_BG_POLY.shp inside the folder \Part1 to your new empty map frame.

Each polygon is a block group, a unit of aggregation in collecting census data.

  1. Open the attribute table and inspect the values for the column labeled POP100_RE. This is the total population (100%) for each block group (RE stands for redistricting) in Massachusetts.
  2. Next, right click on the column name (POP100_RE), and select Visualize Statistics. 

Question 1. What is the total population in Massachusetts, reported in this dataset, in 2010? ________

Question 2 – How many census block groups are there in Massachusetts? _______________

Below the map, where the tables open, a Frequency Distribution chart appears. Looking at this chart, note that the data has many small values (tall bar = many entities) and a few very large values (the histogram has a tail to the right).  This “long-tailed” distribution is common in some types of data, and often displays better with a non-uniform set of symbol ranges. We will demonstrate here by example.

  1. Close the statistics chart panel, and open the Symbology tool (click on the theme in the CP and find Symbology under the “Feature Layer” tab in the ribbon interface; or right click on the theme in the CP and choose Symbology)

Since POP100_RE is a numeric field, what kind of palette (primary symbology or color scheme) would you use to represent it?  (Unique value?  Graduated colors/symbols?)

  1. Display the POP100_RE field with a graduated color palette. Keep the default number of classes (5), natural breaks, and select a single color gradient color scheme. Study the result.

Notice how little information this map shows (it’s rather messy). Total population values are commonly not too informative, as usually large units (block groups, towns, states, and countries) tend to have larger total population than smaller ones.  Population density is usually a more informative parameter. To create a map of population density you need to divide the total pop of each unit by its corresponding area. Our shapefile contains a column with the number of acres for each block group, so we’ll use this.

NOTE: Generally, it’s not a good idea to use an “area” column that you have not calculated yourself, especially with shapefiles.  You’ll learn to calculate areas in the next part.

There are two ways to do this. First, creating a new column in the attribute table (e.g. POP_Density) and calculating new values for this column (e.g. Population/acres). You know how to do this from the previous exercise.

Another method is using “Normalization” in the Symbology dialog for quantitative data: Normalization divides the value field by the normalization field, in this case population divided by area is equivalent to pop density (or people per acre).

  1. To use the latter method, go to the Symbology panel of the census block groups theme:
  1. Choose POP100_RE as Field
  2. For Normalization choose AREA_ACRES
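What Normalization does can be sketched in a few lines of Python. This is an illustration only: the function name population_density is hypothetical, the sample numbers are made up, and the sketch assumes that records with a missing or zero area should simply get no value.

```python
def population_density(records, value_field="POP100_RE", norm_field="AREA_ACRES"):
    """Divide the value field by the normalization field for each record."""
    densities = []
    for record in records:
        area = record.get(norm_field)
        if not area:
            # Missing or zero area: no meaningful density can be computed.
            densities.append(None)
        else:
            densities.append(record[value_field] / area)
    return densities


# Made-up block groups, not values from the census shapefile.
sample = [
    {"POP100_RE": 1200, "AREA_ACRES": 60.0},
    {"POP100_RE": 900, "AREA_ACRES": 450.0},
    {"POP100_RE": 1500, "AREA_ACRES": 0.0},
]
print(population_density(sample))  # [20.0, 2.0, None]
```

The first two block groups have similar populations but very different densities (20 versus 2 people per acre), which is the contrast the normalized map is meant to reveal.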

Zoom in around Boston and notice the rather useless selection of classes: it doesn’t show much contrast; also, the grey outlines of the polygons obscure the colors for the small units. 

  1. In the Symbology panel, change the Methods to Geometrical Interval and change the number of Classes to 10.

Notice how this improves color contrast somewhat, but polygon outlines still obscure small polygons. 
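To see why class widths that grow geometrically suit long-tailed data, compare them with equal-width classes in the Python sketch below. It is a simplified illustration and NOT the exact algorithm ArcGIS Pro uses for Geometrical Interval; the function names and the density range are hypothetical.

```python
def equal_interval_breaks(low, high, classes):
    """Upper class limits of equal width between low and high."""
    width = (high - low) / classes
    return [low + width * i for i in range(1, classes + 1)]


def geometric_breaks(low, high, classes):
    """Upper class limits that grow by a constant ratio (requires 0 < low < high)."""
    if low <= 0 or high <= low:
        raise ValueError("need 0 < low < high")
    ratio = (high / low) ** (1 / classes)
    return [low * ratio ** i for i in range(1, classes + 1)]


# Hypothetical density range, in people per acre.
print([round(b, 1) for b in equal_interval_breaks(0.1, 100.0, 3)])
print([round(b, 1) for b in geometric_breaks(0.1, 100.0, 3)])
```

With equal widths, the lowest class alone spans about a third of the range, so most block groups (which have small values) end up in one color. The geometric breaks fall near 1, 10 and 100, spending more classes on the crowded low end of the distribution.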

  1. To remove the outlines, return to the Symbology panel, click on Color scheme options tool (looks like a gear) located next to the Color scheme bar and check “Apply to fill and outline”. This will apply the colors used in the fills to the outlines, i.e. each polygon will have a color equal to its fill.

Note: When representing layers, don’t necessarily accept the default representation offered by the software. Make sure you represent the map the way you want it (number of classes, method, colors, etc.)

While we’re working with the symbology, let’s fix the number of decimal places that ArcGIS Pro uses by default.

  1. In the Symbology panel, go to the “Advanced symbology options” tab  and open Format Labels. Choose Decimal places and change the value to 2. From now on, always use a sensible number of decimal places.

Now the map should be displayed with no lines separating the polygons and with more reasonable legend labels. This symbology provides a clearer view of the variation in the data, due to removal of the borderlines, and using a geometric interval in the classification method of the data representation.

  1. Add the shapefile Mass_outline at the top of the Contents Panel (CP) and adjust its color to a medium gray to show a light outline of the state (for better looks only). Don’t include this layer in the legend.

Task 2: Data extraction and map preparation

  1. Add the US_cities.shp point theme to the current data frame.

This shapefile has the cities for the entire U.S. and territories.  Use the Zoom to Full Extent button  to view the full dataset, and then Zoom to Previous Extent button to return to Massachusetts.

It is burdensome to work with a large data set when we only want a small portion or subset.  Now we will select just the cities data that correspond to our census data set. 

  1. In the Map tab click on Select by Location and in its dialog box, make
  1. Input Feature: US_cities
  2. Selecting Feature: the block groups (CENSUS2010_BG_POLY)
  3. Set the Relationship to Intersect and
  4. click OK.
  1. Create a new theme from this selection (right-clicking Data\Export Features). Save it as Mass_cities.

Question 3 – What is the average population density per census block group in Massachusetts? ________ (Hint: we don’t have a population density column, so we can’t get the statistics from the attribute table.  Since the data is “normalized”, look at the lower part of the Symbology panel, click the down arrow where it says More, and choose “Show statistics”. You’ll get the stats of the normalized values.)

Question 4 – How many cities are within the state? ______________________

  1. Make a map suitable for a report showing the population density of the block groups of Massachusetts and the cities of the state. 
  1. Make sure to show the whole state and to fix the labels, decimal places, and units in the legend.
  2. Insert all the required components of the map that you know (Title, author, scale bar with reasonable units, legend, north arrow)

Before you finish, let’s learn how to add another component, a Grid.

  1. While looking at your layout, click on the map frame you added; then, in the Insert tab of the ribbon, look in the Map Frames group for the “Grid” button (see the note below if the Grid button is grayed out)
  1. Select a Measured Grid
  2. Notice that the Grid is added in the CP and on the layout (NOTE: Don’t use a Graticule; that’s for Lat-Long, and we want to use the local Mass State Plane Coordinate System)
  3. Right click on the Grid in the CP and get the Properties and you’ll see a “Format Map Grid” panel on the right
  4. Observe that the coordinate system is “NAD 1983 StatePlane Massachusetts Mainland” (you’ll learn about this later)
  5. Uncheck the tick box under “Interval – Automatically adjust”
  6. Go to the Components page of the panel by clicking the second icon on top of the panel
  7. Click Gridlines in the Components field and enter 50,000 Meters (no commas) for intervals in X and Y (they might be there already)
  8. Now click Labels and, again, enter 50000 in both X and Y (they might be there already)
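As a rough illustration of what a 50000-meter interval means, the Python sketch below lists where gridlines would fall across a map extent. It assumes the lines sit on whole multiples of the interval, and the extent values are hypothetical, not the real extent of Massachusetts in State Plane coordinates.

```python
import math


def grid_lines(min_coord, max_coord, interval):
    """Return the multiples of `interval` that fall within [min_coord, max_coord]."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    value = math.ceil(min_coord / interval) * interval
    lines = []
    while value <= max_coord:
        lines.append(value)
        value += interval
    return lines


# Hypothetical X extent in meters.
print(grid_lines(33000, 331000, 50000))
# [50000, 100000, 150000, 200000, 250000, 300000]
```

A smaller interval would crowd the layout with lines and labels, while a much larger one might leave the map with a single gridline or none at all.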

Now you have a correct map with all the required components. 

NOTE: If the Grid button is grayed out, remember to select (click) the Map data frame in the layout first.

FROM NOW ON, always include a grid (or Graticule in Lat-Long) with your maps, along with the other necessary components.

Question 5 – Upload yourname_ex5_map1.pdf (in PDF format) to Canvas.

Task 3: Joining Tables – Attribute Join

Sometimes the information we want to map is not available in the attribute table of the spatial data we have.  This information can be found in external tables (like census tables) that don’t have the spatial component. To visualize that particular data we have to “join” the information from the table into the attribute table of the spatial data.  Tabular joins use a common unique identifier to attach an attribute table to a spatial layer. 

  1. Return to the Map window by closing the Layout tab.
  2. If not loaded in your view, load the shapefile containing the Census Block Groups (BG) plus the corresponding table containing Population and Race (CEN2010_BG_SF1_POP_RACE.dbf).
  3. Open this latter table and study its fields. It should contain total population and races.

Notice that this dbf file is just a table; it doesn’t have spatial information.

Identifying a Key

To join tables, you must identify a field that is common to both tables (attribute and data table). This field is known as a key because it uniquely identifies each record in a table. The values must be formatted in an identical way, i.e. both fields must have the same data type (e.g. numerical, string, date).

The attribute table of the block group layer (shapefile) contains multiple fields that uniquely identify each record. This table has attributes of census block groups but no actual population data (other than POP100_RE, which is a total value). Therefore, we cannot map the different races from the census dataset until we bring in the information from the other table.  If you read the information from the MassGIS website (Optional reading: MassGIS Data: 2010 U.S. Census ) about this dataset, you will find that there is one field that uniquely identifies records and, thus, can be used to match records in the census Block Group shapefile: LOGSF1.  The corresponding field in the population table is called LOGRECNO.  Fields do not need to have the same name in the table and the layer in order to join them, but they must be formatted in the same way.

Keep in mind that ArcGIS Pro does not check to make sure that the key fields or their formats match, so you should double check them (by opening both tables and getting the properties of the corresponding columns by hovering your cursor over the headings; they should be the same) before performing the join.

Note: the next two questions are not in Canvas, but you’ll need to answer them in order to proceed
What type of field is LOGRECNO in the census table?  _______________
What type of field is LOGSF1 in the block group layer table?  _______________

  1. Right click on the Block Groups shapefile in CP and select Joins and Relates\Add Join.
  1. In the “Add Join” window: the Input Table should be CENSUS2010_BG_POLY
  2. Under Input Join Field, use the drop down box to select the key field in your shapefile, LOGSF1 (careful, there is another field with similar name).
  3. Under Join Table, use the drop down box to select CEN2010_BG_SF1_POP_RACE (the table doesn’t have to be added to the CP)
  4. Under Input Join Table Field, use the drop down box to select the key field in your table that you wish to join, LOGRECNO
  5. Leave the “Keep all input records” box checked
  6. Click on Validate Join. Read the validation results window. Towards the bottom it should indicate how many records would be joined, this is important! It can’t be “0”! Then close it, and click OK to proceed with the Join.
  7. Open the attribute table of the shapefile and notice that the columns of the population table have been added at the end. Make sure there are numbers in these extra columns and not all cells with a value of zero or <Null>.  If you find empty fields, make sure no objects are selected before the join, and then repeat the join.
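The logic of an attribute join, including why the validation step matters, can be sketched in Python. This is an illustration and not how ArcGIS Pro is implemented: the function attribute_join is hypothetical, the records are made up, and only the field names LOGSF1, LOGRECNO, TOWN and HISP come from the exercise.

```python
def attribute_join(input_rows, join_rows, input_key, join_key, keep_all=True):
    """Attach join_rows fields to input_rows where the key values are equal."""
    lookup = {}
    for row in join_rows:
        # If a key repeats in the join table, only the first match is kept.
        lookup.setdefault(row[join_key], row)
    join_fields = sorted({field for row in join_rows for field in row})

    joined, matched = [], 0
    for row in input_rows:
        match = lookup.get(row[input_key])
        if match is None and not keep_all:
            continue  # drop unmatched records when keep_all is off
        if match is not None:
            matched += 1
        out = dict(row)
        for field in join_fields:
            # Unmatched records get None, similar to <Null> in the table.
            out[field] = match.get(field) if match is not None else None
        joined.append(out)
    return joined, matched


# Made-up records for illustration.
block_groups = [
    {"LOGSF1": 101, "TOWN": "TOWN_A"},
    {"LOGSF1": 102, "TOWN": "TOWN_A"},
    {"LOGSF1": 103, "TOWN": "TOWN_B"},
]
race_table = [{"LOGRECNO": 101, "HISP": 40}, {"LOGRECNO": 102, "HISP": 5}]
race_table_text_keys = [{"LOGRECNO": "101", "HISP": 40}, {"LOGRECNO": "102", "HISP": 5}]

joined, matched = attribute_join(block_groups, race_table, "LOGSF1", "LOGRECNO")
print(matched)  # 2
_, matched_text = attribute_join(block_groups, race_table_text_keys, "LOGSF1", "LOGRECNO")
print(matched_text)  # 0
```

The second call finds no matches because the number 101 is not equal to the text “101”, even though they look the same on screen. That is the situation a validation result of 0 joined records would point to.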

When you perform an attribute join the data is dynamically joined together. Nothing is written to disk and the join is not permanent. To make the join permanent (JUST FOR INFO, NO NEED TO DO IT), export the theme (right-click\Data\Export) in the Contents Panel. It can be exported as a new shapefile or into our default GeoDB, and it will include attributes from both (or all) joined tables. 

Note: Several tables or layers can be joined to a single table or layer.

DON’T DO THIS, but explore: Joins only exist within the confines of the project file (Map).  They can be “removed” by right-clicking the layer name in the CP, selecting “Joins and Relates”, and clicking “Remove Join”. You have the option of removing a specific join or all joins.

  1. Prepare a map layout showing the distribution of Hispanics (HISP) in the state. Choose a single color ramp and remove the outlines from the polygons, and add the outline of the state.  DO NOT include Mass_cities. Use a Geometric Interval classification and 10 classes (like map 1); these are numbers of people so make sure labels are integer numbers (no decimal places). The map should be a good composition with all the necessary components.  Save as yourname_ex5_map2.pdf

Question 6 – Upload yourname_ex5_map2.pdf (in PDF format) to Canvas. 

Task 4: Table joining with aggregation

The map you previously made suffers from too much “granularity”.  Sometimes we want to show the data (i.e. Number of Hispanics) in a more general way.  Let’s map the population data by towns (instead of block groups).

  1. Add the CENSUS2010_TOWNS_POLY theme to your map and familiarize yourself with the content of its attribute table.
  2. Open the table of the census block groups shapefile and check the existing columns.

The Towns shapefile contains a column called “TOWN” which is unique (also TOWN_ID, and TOWN2). The block group shapefile also contains one column called “TOWN” which is unique to each town, but since there are multiple block groups in each town, they have to be summarized before they get joined to the town theme. We need the total number of people or races, etc. in each town; therefore, we need to add up all the block groups in the town, first. If we don’t summarize the block groups before joining this would be a case of a “many to one” join, where there are multiple options to join to the one town/city and ArcGIS Pro would just pick the first one it finds.  Since we want the population race data for each town, we need a join, with data summarized by town.

  1. In the table of the block group shapefile with the joined data (from previous task), right click on TOWN and choose Summarize.  Summarize the SUM for the variables:
  1. POP_WHITE, POP_BLACK, HISP (make sure Not to pick _NOT_HISP), and any other race that might interest you
  2. The table will be saved in your default GeoDB. Call it something like “Race_Town_Summary”. It will be added to the map (if not, you can add it yourself).
  3. Open the table (it should be at the bottom of the CP) and familiarize yourself with its contents. Notice that now there is only one line per town with the corresponding totals for POP_2010 and the totals of the different races.
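The Summarize step amounts to a “group by town and add up” operation, sketched below in Python. The function summarize_sum and the records are hypothetical; the FREQUENCY and SUM_ output names are illustrative and may not match the column names in your summary table exactly.

```python
from collections import defaultdict


def summarize_sum(rows, case_field, sum_fields):
    """Return one record per case_field value with the SUM of each sum field."""
    totals = defaultdict(lambda: dict.fromkeys(sum_fields, 0))
    counts = defaultdict(int)
    for row in rows:
        key = row[case_field]
        counts[key] += 1
        for field in sum_fields:
            totals[key][field] += row[field] or 0  # treat missing values as 0
    return [
        {case_field: key,
         "FREQUENCY": counts[key],
         **{f"SUM_{field}": totals[key][field] for field in sum_fields}}
        for key in sorted(totals)
    ]


# Made-up block groups: two in TOWN_A, one in TOWN_B.
block_groups = [
    {"TOWN": "TOWN_A", "HISP": 10},
    {"TOWN": "TOWN_A", "HISP": 15},
    {"TOWN": "TOWN_B", "HISP": 0},
]
print(summarize_sum(block_groups, "TOWN", ["HISP"]))
```

The result has a single record per town (TOWN_A with a sum of 25 from 2 block groups, TOWN_B with 0 from 1), which is what makes a clean one-to-one join to the town polygons possible.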

If you’re confused about the column names, notice that the name of a “joined” column is preceded by the name of the table it comes from; for example, CEN2010_BG_SF1_POP_RACE.POP_2010.

Now answer the following questions:

Hint: Calculate the total population of Massachusetts using this table; it should be equal to the answer of question 1. (Hint of a hint: Statistics)

Question 7 – How many towns contain no Hispanics? _______

Question 8 – How many towns have a white population larger than the sum of the Black and Hispanic populations? (Hint: you might need to add/calculate fields and select by attributes.) __________

Question 9 – How many towns have a total population over 10,000 that is at least one quarter Hispanic? (Hint: Choose “Double” under Data Type (“double precision”) to create a column with decimals.) ____________

Question 10 – Of the above towns (Q. 9), how many are also at least one quarter Black?    _______

Now we’re going to join the summary data to the town polygons. Notice that the TOWNS shapefile has two fields with town names, written in different ways (one UPPERCASE, the other with Capitals).  Fields to be joined must be written/spelled the same way.  Which column in the Town shapefile would you use?  TOWN or TOWN2?   What would you do if one of the tables doesn’t have the names spelled the same way?
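One general way to handle names that differ only in capitalization or spacing is to normalize both sides before matching, as in the Python sketch below. The function normalize_name and the sample names are hypothetical; this does not fix genuinely different spellings, which need to be corrected by hand.

```python
def normalize_name(name):
    """Collapse repeated spaces, trim the ends, and convert to uppercase."""
    return " ".join(name.split()).upper()


# Hypothetical names written in different ways.
print(normalize_name("  North  Adams "))                            # NORTH ADAMS
print(normalize_name("North Adams") == normalize_name("NORTH ADAMS"))  # True
```

In ArcGIS Pro a similar result could be obtained by adding a new text field and filling it with Calculate Field, using a Python expression that applies .upper() to the existing name field, and then joining on the new field.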

  1. Clear all selections before proceeding.
  2. Join the summary table you just created (Race_Town_Summary) to the towns theme, NOT the block group (BG) theme.
  3. Prepare a map layout showing the distribution of Hispanics (or Blacks, etc.) in the state (no cities!). Choose the same color ramp, classification, and number of classes used in Question 6 (Map2). DO NOT remove the outlines of the towns.  The map should be a good composition with all the necessary components (don’t forget the grid). Save as yourname_ex5_map3.pdf.

Note: some towns might show up empty. Don’t worry about this, but can you guess why?

Question 11 – Upload yourname_ex5_map3.pdf (in PDF format) with all required components to Canvas.




Part 2: Tutorial – Spatial Joins

The data for this part is in the Part 2 folder and it’s also in shapefile format. All these were obtained from the Office of Geographic Information of Mass., MassGIS (link in Canvas – Exercise 5 resources).  Unless noted, data are in Mass. State Plane Coordinate System – NAD83 – meters, coordinates (this doesn’t mean much to you now, but it’ll come later in the semester). 

  • Major_Basins.shp, Major Basins (watersheds) in Massachusetts
  • Certified_Vernal_Pools.shp, Certified Vernal pools

Spatial joins use common geography (location) to append fields from one layer to another layer. This allows you to assign the characteristics of an area—such as a watershed, town, school district, etc.—to individual houses, individuals, or events as well as to aggregate points by areas. In our case we’ll be adding to the vernal pool theme the names of the watershed they fall into.

Aggregating Points to Polygons

Using a spatial join, you can determine how many points fall in each polygon feature. For example, you might need to determine how many crimes occurred within each police district. You must have a point layer and a polygon layer in ArcGIS in order to do this.

  1. Start a new project and load the data inside \Part2 folder: Certified_Vernal_Pools.shp AND Major_Basins.shp, into your view.  These themes contain certified vernal pools and major basins respectively in Massachusetts.
  2. Open the attribute table of the vernal pools theme and notice that it doesn’t contain any location information.

We want to create a map of the distribution of certified vernal pools in the different watersheds (basins) of Massachusetts (this could also be done per town; you might want to do this with the town layer of the previous part, just for fun).  For this we have to “join” the data based on their location rather than on some common field, i.e. we want to produce a map that shows the number of vernal pools per watershed.

Note: Always make sure there are no objects selected before doing any operation, unless you’re working with the selected subset.

  1. First open the attribute table of Major_Basins and study it (NOTE:  get used to this step, it’s a good practice).

Notice that there is an “AREA_ACRES” column.  NEVER trust an area column that you haven’t calculated yourself.  The only exception to this rule is the “SHAPE_AREA” column in a Geodatabase, but this theme is a shapefile so we can’t trust this latter column either!

Since we’re on the topic of “area”, let’s add an area column to this theme that we’ll use later.

  1. Open the Major_Basins attribute table and push the first button in the window to Add a field
  1. In the new window that opens enter the name “Acres” for the new field
  2. Choose “Double” under Data Type (“double precision”, it can hold lots of decimal places)
  3. Push Save in the ribbon and close this Fields window
  1. In the Major_Basins attribute table notice a new empty column “Acres”; right click on it and choose “Calculate Geometry”
  1. The fields “Input Features” and “Field” should be prepopulated
  2. Under Property choose “Area”
  3. Under Area Unit choose “US Survey Acres”
  4. Click OK
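For intuition about what an area calculation involves, the Python sketch below computes the planar area of a simple polygon from its vertex coordinates (the shoelace formula) and converts square meters to US survey acres. It is an illustration only: the function names are hypothetical, it assumes a single ring with no holes and coordinates in meters, and it is not a description of how ArcGIS Pro computes areas internally.

```python
# One US survey foot is defined as 1200/3937 meters; one acre is 43,560 square feet.
US_SURVEY_FOOT_M = 1200 / 3937
SQ_M_PER_US_SURVEY_ACRE = 43560 * US_SURVEY_FOOT_M ** 2


def polygon_area_sq_m(vertices):
    """Planar area of a simple polygon given as a list of (x, y) pairs in meters."""
    total = 0.0
    count = len(vertices)
    for i in range(count):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % count]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def sq_m_to_us_survey_acres(area_sq_m):
    """Convert square meters to US survey acres."""
    return area_sq_m / SQ_M_PER_US_SURVEY_ACRE


# Hypothetical 1 km by 1 km square.
square = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]
area = polygon_area_sq_m(square)
print(area)                                     # 1000000.0
print(round(sq_m_to_us_survey_acres(area), 1))  # 247.1
```

Because the area depends entirely on the coordinates, it changes with the coordinate system of the data, which is one reason to calculate it yourself rather than trust a column of unknown origin.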

Now let’s do the spatial join

  1. In the Contents Panel right-click the Major_Basins theme
    • Go to “Joins and Relates” and click “Add Spatial Join”
    • Under Target Features, the Major_Basins theme should be listed
    • Under Join Features choose Certified_Vernal_Pools
    • Under Match Option study the different options and select “Contains”
    • Click OK
  2. Open the attribute table of the basin theme and notice that there is a new column called “Join_Count”; this is the number of vernal pools contained in each basin.
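The idea behind Join_Count can be sketched in Python with a standard point-in-polygon test (ray casting). This is an illustration under simplifying assumptions: polygons are single rings without holes, points exactly on a boundary are not handled specially, and all names and coordinates are made up.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray to the right of (x, y) crosses."""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        if (y1 > y) != (y2 > y):  # the edge spans the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_count(polygons, points):
    """Number of points contained in each named polygon."""
    return {
        name: sum(point_in_polygon(x, y, ring) for x, y in points)
        for name, ring in polygons.items()
    }


# Made-up basins and pools.
basins = {
    "Basin A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Basin B": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "Island": [(30, 30), (31, 30), (31, 31), (30, 31)],
}
pools = [(2, 3), (5, 5), (15, 5), (25, 25)]
print(join_count(basins, pools))  # {'Basin A': 2, 'Basin B': 1, 'Island': 0}
```

The polygon with a count of 0 plays the role of the small coastal islands, and the pool at (25, 25) falls in no basin at all, so it contributes to no count.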

Notice that there are “basins” that have no vernal pools; these are most likely the many little islands around the coast. Study the data. You might want to symbolize the basin theme as graduated colors using Join_Count.

NOTE: This join, as in the case of an attribute join, is also temporary and can be removed.  There is another way of doing a Spatial Join, under the Geoprocessing toolbox, that creates a new output theme, thus making the join permanent.  We will not use that method here.

NOTE 2: if you ever want to make the join permanent, all you have to do is to export the joined data into a new theme.

Question 12 – How many basins contain 100 or more vernal pools in Massachusetts? _____

Question 13 – Which basin has the most vernal pools? ________

Question 14 – Which basin has the largest density of vernal pools in number per acre?  _______
(Hint 1: Add a “Double” field (e.g. “Pool_den”) to the table)
(Hint 2: Calculate this new field by dividing “Join_Count” by “Acres”)

Assigning Area Characteristics to Points

Using a spatial join, you can identify in which area a point falls. For example, you might need to determine in what basin each vernal pool is located. You must have a point layer and a polygon layer in ArcGIS in order to do this.  Let’s add the basin information to the vernal pool data:

  1. Open the attribute table of Certified_Vernal_Pools and study it
  2. In the CP:
  1. Right-click Certified_Vernal_Pools, select “Joins and Relates” and click “Add Spatial Join”
  2. Under Target Features select Certified_Vernal_Pools
  3. Under Join Features select Major_Basins
  4. Under Match Option select “Within”

Question 15 – How many vernal pools are in the Blackstone watershed? _________

  1. Save your project.



Part 3: Case Study – Addressing maintenance problems in the city of Chicago

Now that you have learned how to do attribute/spatial joins, it is time to practice the skills you have learned so far.  Say that you are a city employee in Chicago and it is your job to decide where to direct maintenance funds. Your supervisor has asked you to choose a zip code in particular need of attention as part of an effort to equalize public resources across all neighborhoods.

How will you pick a place, without taking to the sidewalks and looking yourself?  Chicago has a 311 system, where residents can call and make complaints or requests about maintenance problems.   Rather than examining all 311 complaints, you have chosen a few to study:

  • Street Lights (all lights on pole out)
  • Pot Holes
  • Sanitation Code Complaints

Note: The city of Chicago website, the source of data for this part, has many more GIS themes about Chicago. Link is also in Canvas if interested.

To begin, add the shapefile and tables (in dbf format) in the folder \Part 3 to a new ArcGIS map and explore the information they contain.  Notice that the shapefile shows the boundaries of each zip code in Illinois, while each of the tables contains the “311” complaints, in the three categories above, for the city of Chicago.

Your goal is to find out which zip code has the greatest number of open (unresolved) complaints. 

A few hints before you get started:

  1. As you learned in Part 1, you will need to summarize the complaints by zip code before joining them to the shapefile.
  2. Once all three tables are joined to the shapefile, you can create a new field in the attribute table to add up the complaints from all three categories.  (Note: you will get some error messages because not all zip codes will be included in all tables.  That’s fine—these zip codes’ sums will turn into zeros and you can ignore them) This process should give you the zip code with the most open complaints!
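The overall workflow in these hints can be sketched in Python. Everything specific here is an assumption made for illustration: the field names ZIP and STATUS, the status value “Open”, and the records themselves are hypothetical and will differ from the actual 311 tables.

```python
from collections import Counter


def open_complaints_by_zip(rows, zip_field="ZIP", status_field="STATUS"):
    """Count the open complaints per zip code in one complaint table."""
    return Counter(
        row[zip_field]
        for row in rows
        if row[status_field].strip().lower() == "open"
    )


def total_by_zip(*summaries):
    """Add up per-zip counts; a zip code missing from a table contributes 0."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


# Made-up complaint tables.
lights = [
    {"ZIP": "ZIP_A", "STATUS": "Open"},
    {"ZIP": "ZIP_A", "STATUS": "Completed"},
    {"ZIP": "ZIP_B", "STATUS": "Open"},
]
potholes = [{"ZIP": "ZIP_B", "STATUS": "Open"}, {"ZIP": "ZIP_B", "STATUS": "Open"}]
sanitation = [{"ZIP": "ZIP_A", "STATUS": "Open"}]

totals = total_by_zip(
    open_complaints_by_zip(lights),
    open_complaints_by_zip(potholes),
    open_complaints_by_zip(sanitation),
)
print(totals.most_common(1))  # [('ZIP_B', 3)]
```

In this sketch ZIP_A does not appear in the pothole table and is simply counted as 0 there, which mirrors the zip codes that are absent from one or more of the joined tables.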

Question 16. What zip code did you select for special maintenance attention? _____________

Question 17. How many complaints were located in the selected zip code? ____________

