Objectives

There is no single package for mapping data in R. I’ve found that different packages and functions perform best for different types of maps. In this tutorial, I will walk you through several packages, data sources, and plotting tools. I realize this may be a bit more confusing than focusing on a single package, but after quite a bit of thought on the matter, I decided the most useful thing would be to introduce you to the best tools I’ve found to date and let you decide which package/toolkit is the most appropriate for your visualization needs. With this in mind, our objectives for this tutorial are as follows:


My assumptions


Set-up

Make sure you’ve installed and loaded the following:

library(ggmap)
library(tidyverse)
library(rgdal)
library(tmap)

In what follows, I’m going to walk you through the basics of map-making in R. We’ll start by learning how to access some great basemaps (not to be used for analysis, just for visualization), the slowly add shapefiles and rasters. We’ll learn how to make some common maps (i.e. choropleth) in R and walk through a couple of packages that simplify the whole map-making process. We won’t have time to get to interactive maps, which is one of the cooler things you can do with R, but if you’re interested in this, then you’ll need to learn more about Leaflet and Shiny. You can see a few examples of interactive maps made in R here.


Mapping your own data

This week we are going to work with datasets from the Utah GIS repository. You can download each dataset at the links below:

Once you’ve downloaded the data, load them into R. I’ve loaded them with the readOGR() function from rgdal, but by now you know several ways to load spatial data, so you do you:

library(rgdal)
trails <- readOGR("./data/HistoricTrails.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/data/HistoricTrails.shp", layer: "HistoricTrails"
## with 3415 features
## It has 9 fields
cities <- readOGR("./data/CitiesTownsLocations.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/data/CitiesTownsLocations.shp", layer: "CitiesTownsLocations"
## with 460 features
## It has 7 fields
## Integer64 fields read as strings:  POPULATION
tracts <- readOGR("./data/CensusTracts2010.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/data/CensusTracts2010.shp", layer: "CensusTracts2010"
## with 588 features
## It has 31 fields
## Integer64 fields read as strings:  P0010011 P0010012 P0010013

Luckily, all of these datasets are already in the same projection. This allows us to quickly visualize all three datasets in a single plot. If they weren’t in the same projection, you’d need to reproject them to a single projection before doing any visualization.

plot(tracts)
plot(cities, add = T, col = "red")
plot(trails, add = T, col = "blue")

You can actually get fairly far with the base R plot() function. The code gets dense quite quickly and the visualizations are nothing like what tmap and ggplot/ggmap can produce (see below). I recommend using plot() only to quickly visualize your data. Data visualizations you intend to publish or share should be made with one of the packages I describe below. If you’re interested in learning more about using plot(), I recommend Chapter 3 of Applied Spatial Data Analysis in R by Roger Bivand and colleagues. Below, I’ve inserted a table from p. 68 of this text that reviews useful arguments to be passed to the plot() function when working with spatial objects. If you want to learn even more about plot(), go here and here and/or here and/or here.

Before we begin, I am going to transform these datasets into a lat/long coordinate system. This will help us add them to web-based maps later in the tutorial:

crs <- CRS("+init=epsg:4269") 
tracts <- spTransform(tracts, crs)
cities <- spTransform(cities, crs)
trails <- spTransform(trails, crs)

ggplot2 and ggmap

You can do quite a bit of mapping with our friend ggplot2. The disadvantage of ggplot is that you have to transform any sp object you’re working with into a data.frame.1 The tool you use to transform a sp object to a data.frame depends on the type of sp object you’re working with. For point data, you can use the data.frame() function like so:

cities_df <- data.frame(cities)
head(cities_df)
##              NAME    COUNTY COUNTYNBR COUNTYSEAT POPULATION TYPE
## 1         Milford    Beaver        01          0       1422 City
## 2     Minersville    Beaver        01          0        921 Town
## 3       Snowville Box Elder        02          0        178 Town
## 4 Bear River City Box Elder        02          0        869 City
## 5         Corinne Box Elder        02          0        694 City
## 6      Deweyville Box Elder        02          0        347 Town
##        CARTO coords.x1 coords.x2 optional
## 1 Town/Place -113.0108  38.39707     TRUE
## 2 Town/Place -112.9240  38.21553     TRUE
## 3 Town/Place -112.7105  41.96263     TRUE
## 4 Town/Place -112.1270  41.61653     TRUE
## 5 Town/Place -112.1093  41.55077     TRUE
## 6 Town/Place -112.0928  41.71537     TRUE

This function essentially appends the cities@coords coordinate columns (here coords.x1 and coords.x2) to the cities@data data.frame.

How would we transform polygons or lines into a data.frame? Think of a line shape as a collection of points connected by straight lines. We can store coordinates for each point and the order in which points should be drawn in rows describing a single line. Attributes for a single line can be repeated for each point on a line. Let me show you this with our trail data. First, I’ll select a single route using spdplyr to make this a bit easier to understand:

library(spdplyr)
brl <- trails %>%
  filter(RouteName == "Bear River Loop")
plot(brl)

I have isolated one route from the larger trails dataset. This SpatialLinesDataFrame route is stored in several line segments:

glimpse(brl@data)
## Observations: 7
## Variables: 9
## $ TrailName  <fct> California Trail, California Trail, California Trail,…
## $ RouteName  <fct> Bear River Loop, Bear River Loop, Bear River Loop, Be…
## $ AltRouteNa <fct> NA, NA, NA, NA, NA, NA, NA
## $ County     <fct> Lincoln, Lincoln, Bear Lake, Bear Lake, Bear Lake, Be…
## $ State      <fct> WY, WY, ID, ID, ID, ID, ID
## $ Scale      <fct> "1:100,000", "1:100,000", "1:100,000", "1:100,000", "…
## $ AdminAgenc <fct> NPS, NPS, NPS, NPS, NPS, NPS, NPS
## $ Source     <fct> NPS, NPS, NPS, NPS, NPS, NPS, NPS
## $ Shape_Leng <dbl> 4003.20738, 69.89398, 8575.35463, 10733.19146, 617.61…

To plot this using ggplot, I need to transform this set of lines into a data.frame. The fortify() function essentially transforms this SpatialLinesDataFrame into a data.frame where each row describes the coordinates and plotting order of a single point along a line. Watch:

brl_df <- fortify(brl)
head(brl_df)
##        long      lat order piece  id group
## 1 -111.0003 42.17425     1     1 219 219.1
## 2 -111.0071 42.17616     2     1 219 219.1
## 3 -111.0119 42.17795     3     1 219 219.1
## 4 -111.0173 42.17985     4     1 219 219.1
## 5 -111.0203 42.18088     5     1 219 219.1
## 6 -111.0229 42.18156     6     1 219 219.1

Notice that the new data frame contains coordinates (long and lat) for points along a line. order indicates the order in which a point should be drawn and group indicates the line segment to which the point belongs. We can append the other data in the original brl SpatialLinesDataFrame by using the id variable:

brl@data$id <- rownames(brl@data)
brl_df <- merge(brl_df, brl@data, by = "id")

Now we can plot this line using ggplot():

ggplot(brl_df) +
  geom_path(aes(x = long, y = lat, group = group)) +
  coord_fixed()

I’ll explain how to build maps in a second. First, let me walk you through fortify() with a polygon. Like lines, think of a polygon as an area described by a collection of points joined by lines to enclose a polygon. When we fortify() a SpatialPolygonsDataFrame into a data.frame, we generate a row for each point, an order in which points should be plotted, and the group, or polygon, to which a point belongs:

tract_df <- fortify(tracts)
## Regions defined for each Polygons
tracts@data$id <- rownames(tracts@data)
tract_df <- merge(tract_df, tracts@data, by = "id")
head(tract_df)
##   id      long      lat order  hole piece group STATEFP10 COUNTYFP10
## 1  0 -112.0021 41.52189     1 FALSE     1   0.1        49        003
## 2  0 -112.0020 41.52181     2 FALSE     1   0.1        49        003
## 3  0 -112.0020 41.52179     3 FALSE     1   0.1        49        003
## 4  0 -112.0017 41.52155     4 FALSE     1   0.1        49        003
## 5  0 -112.0016 41.52147     5 FALSE     1   0.1        49        003
## 6  0 -112.0015 41.52136     6 FALSE     1   0.1        49        003
##   TRACTCE10     GEOID10  NAME10 FUNCSTAT10  INTPTLAT10   INTPTLON10
## 1    960601 49003960601 9606.01          S +41.5159999 -112.0089180
## 2    960601 49003960601 9606.01          S +41.5159999 -112.0089180
## 3    960601 49003960601 9606.01          S +41.5159999 -112.0089180
## 4    960601 49003960601 9606.01          S +41.5159999 -112.0089180
## 5    960601 49003960601 9606.01          S +41.5159999 -112.0089180
## 6    960601 49003960601 9606.01          S +41.5159999 -112.0089180
##   LOGRECNO AREALAND AREAWATR POP100 HU100 P0020001 P0020002 P0020003
## 1  0000009  2185421        0   3720  1395     3720      396     3324
## 2  0000009  2185421        0   3720  1395     3720      396     3324
## 3  0000009  2185421        0   3720  1395     3720      396     3324
## 4  0000009  2185421        0   3720  1395     3720      396     3324
## 5  0000009  2185421        0   3720  1395     3720      396     3324
## 6  0000009  2185421        0   3720  1395     3720      396     3324
##   P0020004 P0020005 P0020006 P0020007 P0020008 P0020009 P0020010 MTFCC
## 1     3262     3188       17       27       15        6        9 G5020
## 2     3262     3188       17       27       15        6        9 G5020
## 3     3262     3188       17       27       15        6        9 G5020
## 4     3262     3188       17       27       15        6        9 G5020
## 5     3262     3188       17       27       15        6        9 G5020
## 6     3262     3188       17       27       15        6        9 G5020
##   P0010011 P0010012 P0010013 P0010014 SqMiles SHAPE_Leng SHAPE_Area
## 1       11       30       17        0       0   6709.768    2184057
## 2       11       30       17        0       0   6709.768    2184057
## 3       11       30       17        0       0   6709.768    2184057
## 4       11       30       17        0       0   6709.768    2184057
## 5       11       30       17        0       0   6709.768    2184057
## 6       11       30       17        0       0   6709.768    2184057

Now let’s get to mapping. One of the nice things about ggplot is that you can easily grab base maps off the web using ggmap. Let’s say we want to grab a map of Utah from Google Maps or OpenStreetMap.

library(ggmap)
base_map <- get_map(location = 'utah', zoom = 6, maptype = 'satellite', source = 'google', crop = T)
plot(base_map)

You can play around with the zoom and maptype to get the map you need. This website provides a nice overview of the maptype and source options. You drop in coordinates for your location or use the bounding box of a shapefile you’re working with to define the basemap extent. Note that these basemaps are projected in a Web Mercator projection that makes rendering visualizations more efficient. These basemaps aren’t to be used in analyses, just for visualization. If you plan on using a ggmap basemap or other basemaps with projected data, I strongly recommend you read this article. Great overviews of all the things you can do with this package can be found here and here.

Once you find a basemap you like, you can add layers of data to ggmap() just like you would with ggplot() (e.g. + geom_point()). Note, however, that ggmap() sets the default map dataset and aesthetics. This means that as you add layers other than the basemap, you have to specify both the mapping and data arguments to the geom.

trails_df <- fortify(trails)
trails@data$id <- rownames(trails@data)
trails_df <- merge(trails_df, trails@data, by = "id")

ggmap(base_map) +
  geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2)) +
  geom_path(data = trails_df, aes(x = long, y = lat, group = group)) +
  coord_fixed()
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Removed 5472 rows containing missing values (geom_path).