Objectives

There is no single package for mapping data in R. I’ve found that different packages and functions perform best for different types of maps. In this tutorial, I will walk you through several packages, data sources, and plotting tools. I realize this may be a bit more confusing than focusing on a single package, but after quite a bit of thought on the matter, I decided the most useful thing would be to introduce you to the best tools I’ve found to date and let you decide which package/toolkit is the most appropriate for your visualization needs. With this in mind, our objectives for this tutorial are as follows:


My assumptions


Set-up

Make sure you’ve installed and loaded the following:

library(ggmap)
library(tidyverse)
library(rgdal)
library(tmap)

In what follows, I’m going to walk you through the basics of map-making in R. We’ll start by learning how to access some great basemaps (not to be used for analysis, just for visualization), the slowly add shapefiles and rasters. We’ll learn how to make some common maps (i.e. choropleth) in R and walk through a couple of packages that simplify the whole map-making process. We won’t have time to get to interactive maps, which is one of the cooler things you can do with R, but if you’re interested in this, then you’ll need to learn more about Leaflet and Shiny. You can see a few examples of interactive maps made in R here.


Mapping your own data

This week we are going to work with datasets from the Utah GIS repository. You can download each dataset at the links below:

Once you’ve downloaded the data, load them into R. I’ve loaded them with the readOGR() function from rgdal, but by now you know several ways to load spatial data, so you do you:

library(rgdal)
trails <- readOGR("./data/HistoricTrails.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/data/HistoricTrails.shp", layer: "HistoricTrails"
## with 3415 features
## It has 9 fields
cities <- readOGR("./data/CitiesTownsLocations.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/data/CitiesTownsLocations.shp", layer: "CitiesTownsLocations"
## with 460 features
## It has 7 fields
## Integer64 fields read as strings:  POPULATION
tracts <- readOGR("./data/CensusTracts2010.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/data/CensusTracts2010.shp", layer: "CensusTracts2010"
## with 588 features
## It has 31 fields
## Integer64 fields read as strings:  P0010011 P0010012 P0010013

Luckily, all of these datasets are already in the same projection. This allows us to quickly visualize all three datasets in a single plot. If they weren’t in the same projection, you’d need to reproject them to a single projection before doing any visualization.

plot(tracts)
plot(cities, add = T, col = "red")
plot(trails, add = T, col = "blue")

You can actually get fairly far with the base R plot() function. The code gets dense quite quickly and the visualizations are nothing like what tmap and ggplot/ggmap can produce (see below). I recommend using plot() only to quickly visualize your data. Data visualizations you intend to publish or share should be made with one of the packages I describe below. If you’re interested in learning more about using plot(), I recommend Chapter 3 of Applied Spatial Data Analysis in R by Roger Bivand and colleagues. Below, I’ve inserted a table from p. 68 of this text that reviews useful arguments to be passed to the plot() function when working with spatial objects. If you want to learn even more about plot(), go here and here and/or here and/or here.

Before we begin, I am going to transform these datasets into a lat/long coordinate system. This will help us add them to web-based maps later in the tutorial:

crs <- CRS("+init=epsg:4269") 
tracts <- spTransform(tracts, crs)
cities <- spTransform(cities, crs)
trails <- spTransform(trails, crs)

ggplot2 and ggmap

You can do quite a bit of mapping with our friend ggplot2. The disadvantage of ggplot is that you have to transform any sp object you’re working with into a data.frame.1 The tool you use to transform a sp object to a data.frame depends on the type of sp object you’re working with. For point data, you can use the data.frame() function like so:

cities_df <- data.frame(cities)
head(cities_df)
##              NAME    COUNTY COUNTYNBR COUNTYSEAT POPULATION TYPE
## 1         Milford    Beaver        01          0       1422 City
## 2     Minersville    Beaver        01          0        921 Town
## 3       Snowville Box Elder        02          0        178 Town
## 4 Bear River City Box Elder        02          0        869 City
## 5         Corinne Box Elder        02          0        694 City
## 6      Deweyville Box Elder        02          0        347 Town
##        CARTO coords.x1 coords.x2 optional
## 1 Town/Place -113.0108  38.39707     TRUE
## 2 Town/Place -112.9240  38.21553     TRUE
## 3 Town/Place -112.7105  41.96263     TRUE
## 4 Town/Place -112.1270  41.61653     TRUE
## 5 Town/Place -112.1093  41.55077     TRUE
## 6 Town/Place -112.0928  41.71537     TRUE

This function essentially appends the cities@coords coordinate columns (here coords.x1 and coords.x2) to the cities@data data.frame.

How would we transform polygons or lines into a data.frame? Think of a line shape as a collection of points connected by straight lines. We can store coordinates for each point and the order in which points should be drawn in rows describing a single line. Attributes for a single line can be repeated for each point on a line. Let me show you this with our trail data. First, I’ll select a single route using spdplyr to make this a bit easier to understand:

library(spdplyr)
brl <- trails %>%
  filter(RouteName == "Bear River Loop")
plot(brl)

I have isolated one route from the larger trails dataset. This SpatialLinesDataFrame route is stored in several line segments:

glimpse(brl@data)
## Observations: 7
## Variables: 9
## $ TrailName  <fct> California Trail, California Trail, California Trail,…
## $ RouteName  <fct> Bear River Loop, Bear River Loop, Bear River Loop, Be…
## $ AltRouteNa <fct> NA, NA, NA, NA, NA, NA, NA
## $ County     <fct> Lincoln, Lincoln, Bear Lake, Bear Lake, Bear Lake, Be…
## $ State      <fct> WY, WY, ID, ID, ID, ID, ID
## $ Scale      <fct> "1:100,000", "1:100,000", "1:100,000", "1:100,000", "…
## $ AdminAgenc <fct> NPS, NPS, NPS, NPS, NPS, NPS, NPS
## $ Source     <fct> NPS, NPS, NPS, NPS, NPS, NPS, NPS
## $ Shape_Leng <dbl> 4003.20738, 69.89398, 8575.35463, 10733.19146, 617.61…

To plot this using ggplot, I need to transform this set of lines into a data.frame. The fortify() function essentially transforms this SpatialLinesDataFrame into a data.frame where each row describes the coordinates and plotting order of a single point along a line. Watch:

brl_df <- fortify(brl)
head(brl_df)
##        long      lat order piece  id group
## 1 -111.0003 42.17425     1     1 219 219.1
## 2 -111.0071 42.17616     2     1 219 219.1
## 3 -111.0119 42.17795     3     1 219 219.1
## 4 -111.0173 42.17985     4     1 219 219.1
## 5 -111.0203 42.18088     5     1 219 219.1
## 6 -111.0229 42.18156     6     1 219 219.1

Notice that the new data frame contains coordinates (long and lat) for points along a line. order indicates the order in which a point should be drawn and group indicates the line segment to which the point belongs. We can append the other data in the original brl SpatialLinesDataFrame by using the id variable:

brl@data$id <- rownames(brl@data)
brl_df <- merge(brl_df, brl@data, by = "id")

Now we can plot this line using ggplot():

ggplot(brl_df) +
  geom_path(aes(x = long, y = lat, group = group)) +
  coord_fixed()

I’ll explain how to build maps in a second. First, let me walk you through fortify() with a polygon. Like lines, think of a polygon as an area described by a collection of points joined by lines to enclose a polygon. When we fortify() a SpatialPolygonsDataFrame into a data.frame, we generate a row for each point, an order in which points should be plotted, and the group, or polygon, to which a point belongs:

tract_df <- fortify(tracts)
## Regions defined for each Polygons
tracts@data$id <- rownames(tracts@data)
tract_df <- merge(tract_df, tracts@data, by = "id")
head(tract_df)
##   id      long      lat order  hole piece group STATEFP10 COUNTYFP10
## 1  0 -112.0021 41.52189     1 FALSE     1   0.1        49        003
## 2  0 -112.0020 41.52181     2 FALSE     1   0.1        49        003
## 3  0 -112.0020 41.52179     3 FALSE     1   0.1        49        003
## 4  0 -112.0017 41.52155     4 FALSE     1   0.1        49        003
## 5  0 -112.0016 41.52147     5 FALSE     1   0.1        49        003
## 6  0 -112.0015 41.52136     6 FALSE     1   0.1        49        003
##   TRACTCE10     GEOID10  NAME10 FUNCSTAT10  INTPTLAT10   INTPTLON10
## 1    960601 49003960601 9606.01          S +41.5159999 -112.0089180
## 2    960601 49003960601 9606.01          S +41.5159999 -112.0089180
## 3    960601 49003960601 9606.01          S +41.5159999 -112.0089180
## 4    960601 49003960601 9606.01          S +41.5159999 -112.0089180
## 5    960601 49003960601 9606.01          S +41.5159999 -112.0089180
## 6    960601 49003960601 9606.01          S +41.5159999 -112.0089180
##   LOGRECNO AREALAND AREAWATR POP100 HU100 P0020001 P0020002 P0020003
## 1  0000009  2185421        0   3720  1395     3720      396     3324
## 2  0000009  2185421        0   3720  1395     3720      396     3324
## 3  0000009  2185421        0   3720  1395     3720      396     3324
## 4  0000009  2185421        0   3720  1395     3720      396     3324
## 5  0000009  2185421        0   3720  1395     3720      396     3324
## 6  0000009  2185421        0   3720  1395     3720      396     3324
##   P0020004 P0020005 P0020006 P0020007 P0020008 P0020009 P0020010 MTFCC
## 1     3262     3188       17       27       15        6        9 G5020
## 2     3262     3188       17       27       15        6        9 G5020
## 3     3262     3188       17       27       15        6        9 G5020
## 4     3262     3188       17       27       15        6        9 G5020
## 5     3262     3188       17       27       15        6        9 G5020
## 6     3262     3188       17       27       15        6        9 G5020
##   P0010011 P0010012 P0010013 P0010014 SqMiles SHAPE_Leng SHAPE_Area
## 1       11       30       17        0       0   6709.768    2184057
## 2       11       30       17        0       0   6709.768    2184057
## 3       11       30       17        0       0   6709.768    2184057
## 4       11       30       17        0       0   6709.768    2184057
## 5       11       30       17        0       0   6709.768    2184057
## 6       11       30       17        0       0   6709.768    2184057

Now let’s get to mapping. One of the nice things about ggplot is that you can easily grab base maps off the web using ggmap. Let’s say we want to grab a map of Utah from Google Maps or OpenStreetMap.

library(ggmap)
base_map <- get_map(location = 'utah', zoom = 6, maptype = 'satellite', source = 'google', crop = T)
plot(base_map)

You can play around with the zoom and maptype to get the map you need. This website provides a nice overview of the maptype and source options. You drop in coordinates for your location or use the bounding box of a shapefile you’re working with to define the basemap extent. Note that these basemaps are projected in a Web Mercator projection that makes rendering visualizations more efficient. These basemaps aren’t to be used in analyses, just for visualization. If you plan on using a ggmap basemap or other basemaps with projected data, I strongly recommend you read this article. Great overviews of all the things you can do with this package can be found here and here.

Once you find a basemap you like, you can add layers of data to ggmap() just like you would with ggplot() (e.g. + geom_point()). Note, however, that ggmap() sets the default map dataset and aesthetics. This means that as you add layers other than the basemap, you have to specify both the mapping and data arguments to the geom.

trails_df <- fortify(trails)
trails@data$id <- rownames(trails@data)
trails_df <- merge(trails_df, trails@data, by = "id")

ggmap(base_map) +
  geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2)) +
  geom_path(data = trails_df, aes(x = long, y = lat, group = group)) +
  coord_fixed()
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Removed 5472 rows containing missing values (geom_path).

The coord_fixed() command is important. It tells ggplot how to fix the aspect ratio between latitude and longitude (y/x). If we set a ratio above one, we exaggerate, latitude.

ggmap(base_map) +
  geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2)) +
  geom_path(data = trails_df, aes(x = long, y = lat, group = group)) +
  coord_fixed(3.0)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Removed 5472 rows containing missing values (geom_path).

And a low value exaggerates longitude:

ggmap(base_map) +
  geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2)) +
  geom_path(data = trails_df, aes(x = long, y = lat, group = group)) +
  coord_fixed(0.3)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Removed 5472 rows containing missing values (geom_path).

We can play around with aesthetics to make this look a bit better, say by coloring our cities based the county in which they are located and changing our trail color to gray so they are easier to detect:

ggmap(base_map) +
  geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2, color = COUNTY)) +
  geom_path(data = trails_df, aes(x = long, y = lat, group = group), color = "gray") +
  coord_fixed(1.0)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Removed 5472 rows containing missing values (geom_path).

We can also plot polygons in using geom_polygon():

ggmap(base_map) +
  geom_polygon(data = tract_df, aes(x = long, y = lat, group = group), fill = "gray", color = "black", alpha = 0.4)

Just like our polygon, we need to specify the aesthetic x, y, and group. I added information telling ggplot how to plot the fill color, border color, and shading of the polygons.

Finally, let’s change the color of each census tract based on population. To do this, add fill to your aesthetic and specify the variable of interest. I also dropped the alpha option which specified transparency.

ggmap(base_map) +
  geom_polygon(data = tract_df, aes(x = long, y = lat, group = group, fill = POP100), color = "black") +
  guides(fill=guide_legend(title="Population"))

Don’t forget to change the legend title! More about manipulating legends with ggplot2 found here. Of course, you can also make maps without a basemap. Here are two quick examples:

ggplot() +
  geom_polygon(data = tract_df, aes(x = long, y = lat, group = group), alpha = 0.2, color = "black") +
  geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2), color = "orange") +
  geom_path(data = trails_df, aes(x = long, y = lat, group = group), color = "dark green")

And if you want to drop the gray background, just add + theme_nothing():

ggplot() +
  geom_polygon(data = tract_df, aes(x = long, y = lat, group = group), alpha = 0.2, color = "black") +
  geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2), color = "orange") +
  geom_path(data = trails_df, aes(x = long, y = lat, group = group), color = "dark green") +
  theme_nothing()
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead

If you look at this code all at once, it may seem overwhelming, but the big advantage of ggplot is that it’s modular. You can - and should - add pieces to your visualization one step at a time. If you think you’ll be mapping a lot with ggplot, this tutorial walks you through the whole process with different data.


sp

With the sp package, you can skip the fortify() step and jump right to plotting your Spatial object. Let’s say I have a line shapefile of historic trails in Utah, a point shapefile indicating the location of major cities and towns in the state, and a polygon shapefile indicating counties.

spplot(tracts, "POP100")

We can play around with the colors using RColorBrewer. In this case, the high population of the area around Salt Lake City makes it hard to tease out population differences in other areas. We can do a quick scale transformation to help with visualization.

library(RColorBrewer)
#display.brewer.all() # prints out palette options to console

palette <- brewer.pal(n = 7, name = "YlGnBu")
spplot(tracts, "POP100", main = "Population", col.regions = palette, cuts = 6, col = "transparent")

Here, I’ve specified my own color palette using RColorBrewer, added a title, and set the cuts (breaks) in the symbology (note that in general cuts should be one less than the n breaks defined in your palette). I also set county boundaries to transparent, but am regretting this decision since there are so many contiguous boundaries with no difference in population. I’m also noticing that the high population of the Salt Lake City region makes it hard for us to see population variations in rural Utah. I’ll add a few more breaks to my color palette to help us see this. Another option is to log transform population.

palette <- brewer.pal(n = 9, name = "YlGnBu")
spplot(tracts, "POP100", main = "Population", col.regions = palette, cuts = 8, col = "gray")

If you need to specify specific color breaks, this tutorial will help.

What if we want to add additional layers to this choropleth map? We can do this using sp.layout. To use sp.layout, first create a new list where the first item is the type of layer to add, the second is the actual object to plot, and any following items are plotting options:

trail_layer <- list("sp.lines", trails, col = "dark green")
spplot(tracts, "POP100", sp.layout = trail_layer, main = "Population", col.regions = palette, cuts = 8, col = "gray")

If you want to add multiple shapefiles, just join them in a list:

city_layer <- list("sp.points", cities, col = "blue")
spplot(tracts, "POP100", sp.layout = list(trail_layer, city_layer), main = "Population", col.regions = palette, cuts = 8, col = "gray")

I find this sp.layout step clunky. This is how you add additional info to spplot(), which I find far more complicated than the + approach of ggplot. If you’re interested in going further with spplot(), check out this example and this tutorial.


tmap

I’ve saved the best for last. The tmap package makes it easy to visualize sp and raster objects in R. It’s far more elegant than the other options I’ve shown you and makes creating interactive maps very simple. It also makes all of the DOGTAILS work straightforward. I am going to walk through some of the tools in the tmap package as applied to our Utah dataset. The full package vignette is fantastic and found here.

Let’s start with the qtm() or quick thematic map tool. This quickly creates maps of any shape object (similar to the base R plot() function):

library(tmap)
tmap_mode("plot") # set to "view" for interactive visualizations, see below
## tmap mode set to plotting
qtm(trails)
## Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3

We can use qtm() to build fast choropleth maps:

qtm(tracts, fill = "POP100", fill.title = "Population in Utah")

Notice how without specification, qtm() creates a legend! As you bang your head against your desk expressing frustration over the last four packages we’ve gone through when, ultimately, you’ll probably just use this package, remember that knowledge is POWER! I wanted to introduce you to the MANY tools out there, you never know when you might get stuck and need one of them. qtm() has a number of cool arguments:

  • text allows you to specify text to go over each polygon/attribute in your shapefile (i.e. state names, abbreviations, whatever)
  • text.size controls text size.
  • format - the package comes with a few pre-defined map formats that are worth checking out. There are also a few predefined style options included.
  • fill.title titles the legend
  • text.root determines how text size is increased
  • fill.textNA allows you to specify the name of attributes with value NA

In addition to this awesome tool, tmap has a number of other tools for specific spatial data types. I’ve reproduced the full list from the tmap vignette here and recommend you go read this vignette if you plan on plotting with tmap.

Think of these as similar to the geo elements you add to a ggplot. Like ggplot, you can use tmap to add complexity and layers to a map. Unlike ggplot, we do not have to transform our sp objects into data.frames. Hooray! Here’s an example with our Utah data sets. First first element is tm_shape(), which specifies the shape object. Then we can add additional information. Here’s a very simple example:

tm_shape(tracts) +
  tm_fill()

Very informative map. Basically here I tell tmap that I am working with the tracts shapefile (tm_shape(tracts)) and I just want to fill in the whole thing in the default color, a bleak shade of gray. We can do better than this:

tm_shape(tracts) +
  tm_fill(col = "POP100", title = "Population (x100)") +
  tm_compass() +
  tm_scale_bar()

Oooohhh! We can add more information to tm_fill() to specify exactly how we want to fill polygons. Here’s one example:

tm_shape(tracts) +
  tm_fill(col = "POP100", convert2density = T, style = "jenks", title = "Population (x100)") +
  tm_compass() +
  tm_scale_bar()

You know what to do to get more info about fill options (?tm_fill). Notice how the style we chose significantly changes our visualization (i.e. maps can lie). Here, I’ve used jenks, which minimizes each class’s average deviation from the class mean. I’ve also asked tmap to plot population density, i.e. POP100 divided by the area of each polygon (which tmap calculates in the background… sniff sniff, so beautiful).

And to add the tract polygons we can add tm_borders() and specify the darkness of borders using alpha:

tm_shape(tracts) +
  tm_fill(col = "POP100", convert2density = T, style = "jenks", title = "Population (x100)") +
  tm_borders(alpha = 0.3) +
  tm_compass() +
  tm_scale_bar()

Did you notice that tmap makes adding our DOGTAILS painless? It’s a lot more painful in ggplot and spplot, which is why I didn’t show it in this tutorial (trust me, painful). Each tm_shape() and it’s specifications are referred to as a group. We can stack multiple groups (i.e. specific visualization of tm_shape(tracts), tm_shape(trails), and tm_shape(cities)). A crazy thing about tmap is that each shapefile you intend to plot can have different projections and extents. tmap will automatically reproject stacked groups of data to the projection of the first group (tracts in our case).

tm <- tm_shape(tracts) +
  tm_fill(col = "POP100", convert2density = T, style = "jenks", title = "Population (x100)") +
  tm_borders(alpha = 0.3) +
  tm_compass() +
  tm_scale_bar() +
tm_shape(trails) +
  tm_lines(col = "brown", alpha = 0.4)

tm

You can alter the format, shape, and legend, among other things, but digging a bit more into the tmap elements and code. I strongly recommend you review the [tmap in a Nutshell Vignette](https://cran.r-project.org/web/packages/tmap/vignettes/tmap-nutshell.html) and the full package documentation that explains all of the cool things you can do. I’ll quickly show you how to fix our map to make the legend a bit more legible:

tm <- tm +
  tm_legend(text.size=1,
    title.size=1.2,
    position = c("left","bottom"), 
    bg.color = "white", 
    bg.alpha=.5, 
    frame="gray50", 
    height=.6)
tm
## Some legend labels were too wide. These labels have been resized to 0.87. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.

Want to save this beautiful map we made?

save_tmap(tm, "./figs/my_awesome_map.png", width = 800, height = 1000)
## Warning in save_tmap(tm, "./figs/my_awesome_map.png", width = 800, height =
## 1000): save_tmap is deprecated as of tmap version 2.0. Please use tmap_save
## instead
## Legend labels were too wide. The labels have been resized to 0.85, 0.57, 0.50, 0.50, 0.46. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.
## Map saved to /Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/figs/my_awesome_map.png
## Resolution: 800 by 1000 pixels
## Size: 2.666667 by 3.333333 inches (300 dpi)

Let me load this thing back into my RMarkdown doc:

The last amazing, mindbogglingly cool thing we can do with tmap is quickly make interactive visualizations. I won’t have time to cover Leaflet and Shiny, the main tools for interactive mapping in R. If, however, you want to QUICKLY build a cool interactive map, tmap can help!

tmap_mode("view")
## tmap mode set to interactive viewing
last_map()
## Warning in last_map(): last_map is deprecated as of tmap version 2.0.
## Please use tmap_last
## Compass not supported in view mode.
## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.
## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.
# interactive map!

There are millions of (fairly easy) ways you can create beautiful maps with tmap. A few more examples can be found here:

  1. Interactive multivariate world map
  2. Fancy world map
  3. World facets
  4. US choropleth
  5. London crimes
  6. Adding insets for Hawaii and Alaska

Advanced mapping tools

…that we won’t cover, but that are very cool and that I encourage you to explore: * Shiny overview here. * Leaflet example found here and tutorials here and here.


Additional resources


  1. Note that this has changed with the introduction of sf.