There is no single package for mapping data in R
. I’ve found that different packages and functions perform best for different types of maps. In this tutorial, I will walk you through several packages, data sources, and plotting tools. I realize this may be a bit more confusing than focusing on a single package, but after quite a bit of thought on the matter, I decided the most useful thing would be to introduce you to the best tools I’ve found to date and let you decide which package/toolkit is the most appropriate for your visualization needs. With this in mind, our objectives for this tutorial are as follows:
R
to…R
is fantastic for spatial analysis. R
is OK at spatial data visualization. R
is great for interactive data visualization (Leaflet, Shiny). If you have a highly map-centric project, there is nothing wrong with working in ArcGIS or QGIS if you think the mapping tools I show you in this tutorial are insufficient.R
.Make sure you’ve installed and loaded the following:
library(ggmap)
library(tidyverse)
library(rgdal)
library(tmap)
In what follows, I’m going to walk you through the basics of map-making in R
. We’ll start by learning how to access some great basemaps (not to be used for analysis, just for visualization), the slowly add shapefiles and rasters. We’ll learn how to make some common maps (i.e. choropleth) in R
and walk through a couple of packages that simplify the whole map-making process. We won’t have time to get to interactive maps, which is one of the cooler things you can do with R
, but if you’re interested in this, then you’ll need to learn more about Leaflet and Shiny. You can see a few examples of interactive maps made in R
here.
This week we are going to work with datasets from the Utah GIS repository. You can download each dataset at the links below:
Once you’ve downloaded the data, load them into R
. I’ve loaded them with the readOGR()
function from rgdal
, but by now you know several ways to load spatial data, so you do you:
library(rgdal)
trails <- readOGR("./data/HistoricTrails.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/data/HistoricTrails.shp", layer: "HistoricTrails"
## with 3415 features
## It has 9 fields
cities <- readOGR("./data/CitiesTownsLocations.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/data/CitiesTownsLocations.shp", layer: "CitiesTownsLocations"
## with 460 features
## It has 7 fields
## Integer64 fields read as strings: POPULATION
tracts <- readOGR("./data/CensusTracts2010.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/emily.burchfield/Box/Teaching/GSA/2019_GSA/09_Mapping_data/data/CensusTracts2010.shp", layer: "CensusTracts2010"
## with 588 features
## It has 31 fields
## Integer64 fields read as strings: P0010011 P0010012 P0010013
Luckily, all of these datasets are already in the same projection. This allows us to quickly visualize all three datasets in a single plot. If they weren’t in the same projection, you’d need to reproject them to a single projection before doing any visualization.
plot(tracts)
plot(cities, add = T, col = "red")
plot(trails, add = T, col = "blue")
You can actually get fairly far with the base R
plot()
function. The code gets dense quite quickly and the visualizations are nothing like what tmap
and ggplot/ggmap
can produce (see below). I recommend using plot()
only to quickly visualize your data. Data visualizations you intend to publish or share should be made with one of the packages I describe below. If you’re interested in learning more about using plot()
, I recommend Chapter 3 of Applied Spatial Data Analysis in R by Roger Bivand and colleagues. Below, I’ve inserted a table from p. 68 of this text that reviews useful arguments to be passed to the plot()
function when working with spatial objects. If you want to learn even more about plot()
, go here and here and/or here and/or here.
Before we begin, I am going to transform these datasets into a lat/long coordinate system. This will help us add them to web-based maps later in the tutorial:
crs <- CRS("+init=epsg:4269")
tracts <- spTransform(tracts, crs)
cities <- spTransform(cities, crs)
trails <- spTransform(trails, crs)
ggplot2
and ggmap
You can do quite a bit of mapping with our friend ggplot2
. The disadvantage of ggplot
is that you have to transform any sp
object you’re working with into a data.frame
.1 The tool you use to transform a sp
object to a data.frame
depends on the type of sp
object you’re working with. For point data, you can use the data.frame()
function like so:
cities_df <- data.frame(cities)
head(cities_df)
## NAME COUNTY COUNTYNBR COUNTYSEAT POPULATION TYPE
## 1 Milford Beaver 01 0 1422 City
## 2 Minersville Beaver 01 0 921 Town
## 3 Snowville Box Elder 02 0 178 Town
## 4 Bear River City Box Elder 02 0 869 City
## 5 Corinne Box Elder 02 0 694 City
## 6 Deweyville Box Elder 02 0 347 Town
## CARTO coords.x1 coords.x2 optional
## 1 Town/Place -113.0108 38.39707 TRUE
## 2 Town/Place -112.9240 38.21553 TRUE
## 3 Town/Place -112.7105 41.96263 TRUE
## 4 Town/Place -112.1270 41.61653 TRUE
## 5 Town/Place -112.1093 41.55077 TRUE
## 6 Town/Place -112.0928 41.71537 TRUE
This function essentially appends the cities@coords
coordinate columns (here coords.x1
and coords.x2
) to the cities@data
data.frame
.
How would we transform polygons or lines into a data.frame
? Think of a line shape as a collection of points connected by straight lines. We can store coordinates for each point and the order in which points should be drawn in rows describing a single line. Attributes for a single line can be repeated for each point on a line. Let me show you this with our trail data. First, I’ll select a single route using spdplyr
to make this a bit easier to understand:
library(spdplyr)
brl <- trails %>%
filter(RouteName == "Bear River Loop")
plot(brl)
I have isolated one route from the larger trails
dataset. This SpatialLinesDataFrame
route is stored in several line segments:
glimpse(brl@data)
## Observations: 7
## Variables: 9
## $ TrailName <fct> California Trail, California Trail, California Trail,…
## $ RouteName <fct> Bear River Loop, Bear River Loop, Bear River Loop, Be…
## $ AltRouteNa <fct> NA, NA, NA, NA, NA, NA, NA
## $ County <fct> Lincoln, Lincoln, Bear Lake, Bear Lake, Bear Lake, Be…
## $ State <fct> WY, WY, ID, ID, ID, ID, ID
## $ Scale <fct> "1:100,000", "1:100,000", "1:100,000", "1:100,000", "…
## $ AdminAgenc <fct> NPS, NPS, NPS, NPS, NPS, NPS, NPS
## $ Source <fct> NPS, NPS, NPS, NPS, NPS, NPS, NPS
## $ Shape_Leng <dbl> 4003.20738, 69.89398, 8575.35463, 10733.19146, 617.61…
To plot this using ggplot
, I need to transform this set of lines into a data.frame
. The fortify()
function essentially transforms this SpatialLinesDataFrame
into a data.frame
where each row describes the coordinates and plotting order of a single point along a line. Watch:
brl_df <- fortify(brl)
head(brl_df)
## long lat order piece id group
## 1 -111.0003 42.17425 1 1 219 219.1
## 2 -111.0071 42.17616 2 1 219 219.1
## 3 -111.0119 42.17795 3 1 219 219.1
## 4 -111.0173 42.17985 4 1 219 219.1
## 5 -111.0203 42.18088 5 1 219 219.1
## 6 -111.0229 42.18156 6 1 219 219.1
Notice that the new data frame contains coordinates (long
and lat
) for points along a line. order
indicates the order in which a point should be drawn and group
indicates the line segment to which the point belongs. We can append the other data in the original brl
SpatialLinesDataFrame
by using the id
variable:
brl@data$id <- rownames(brl@data)
brl_df <- merge(brl_df, brl@data, by = "id")
Now we can plot this line using ggplot()
:
ggplot(brl_df) +
geom_path(aes(x = long, y = lat, group = group)) +
coord_fixed()
I’ll explain how to build maps in a second. First, let me walk you through fortify()
with a polygon. Like lines, think of a polygon as an area described by a collection of points joined by lines to enclose a polygon. When we fortify()
a SpatialPolygonsDataFrame
into a data.frame
, we generate a row for each point, an order in which points should be plotted, and the group, or polygon, to which a point belongs:
tract_df <- fortify(tracts)
## Regions defined for each Polygons
tracts@data$id <- rownames(tracts@data)
tract_df <- merge(tract_df, tracts@data, by = "id")
head(tract_df)
## id long lat order hole piece group STATEFP10 COUNTYFP10
## 1 0 -112.0021 41.52189 1 FALSE 1 0.1 49 003
## 2 0 -112.0020 41.52181 2 FALSE 1 0.1 49 003
## 3 0 -112.0020 41.52179 3 FALSE 1 0.1 49 003
## 4 0 -112.0017 41.52155 4 FALSE 1 0.1 49 003
## 5 0 -112.0016 41.52147 5 FALSE 1 0.1 49 003
## 6 0 -112.0015 41.52136 6 FALSE 1 0.1 49 003
## TRACTCE10 GEOID10 NAME10 FUNCSTAT10 INTPTLAT10 INTPTLON10
## 1 960601 49003960601 9606.01 S +41.5159999 -112.0089180
## 2 960601 49003960601 9606.01 S +41.5159999 -112.0089180
## 3 960601 49003960601 9606.01 S +41.5159999 -112.0089180
## 4 960601 49003960601 9606.01 S +41.5159999 -112.0089180
## 5 960601 49003960601 9606.01 S +41.5159999 -112.0089180
## 6 960601 49003960601 9606.01 S +41.5159999 -112.0089180
## LOGRECNO AREALAND AREAWATR POP100 HU100 P0020001 P0020002 P0020003
## 1 0000009 2185421 0 3720 1395 3720 396 3324
## 2 0000009 2185421 0 3720 1395 3720 396 3324
## 3 0000009 2185421 0 3720 1395 3720 396 3324
## 4 0000009 2185421 0 3720 1395 3720 396 3324
## 5 0000009 2185421 0 3720 1395 3720 396 3324
## 6 0000009 2185421 0 3720 1395 3720 396 3324
## P0020004 P0020005 P0020006 P0020007 P0020008 P0020009 P0020010 MTFCC
## 1 3262 3188 17 27 15 6 9 G5020
## 2 3262 3188 17 27 15 6 9 G5020
## 3 3262 3188 17 27 15 6 9 G5020
## 4 3262 3188 17 27 15 6 9 G5020
## 5 3262 3188 17 27 15 6 9 G5020
## 6 3262 3188 17 27 15 6 9 G5020
## P0010011 P0010012 P0010013 P0010014 SqMiles SHAPE_Leng SHAPE_Area
## 1 11 30 17 0 0 6709.768 2184057
## 2 11 30 17 0 0 6709.768 2184057
## 3 11 30 17 0 0 6709.768 2184057
## 4 11 30 17 0 0 6709.768 2184057
## 5 11 30 17 0 0 6709.768 2184057
## 6 11 30 17 0 0 6709.768 2184057
Now let’s get to mapping. One of the nice things about ggplot
is that you can easily grab base maps off the web using ggmap
. Let’s say we want to grab a map of Utah from Google Maps or OpenStreetMap.
library(ggmap)
base_map <- get_map(location = 'utah', zoom = 6, maptype = 'satellite', source = 'google', crop = T)
plot(base_map)
You can play around with the zoom
and maptype
to get the map you need. This website provides a nice overview of the maptype
and source
options. You drop in coordinates for your location or use the bounding box of a shapefile you’re working with to define the basemap extent. Note that these basemaps are projected in a Web Mercator projection that makes rendering visualizations more efficient. These basemaps aren’t to be used in analyses, just for visualization. If you plan on using a ggmap
basemap or other basemaps with projected data, I strongly recommend you read this article. Great overviews of all the things you can do with this package can be found here and here.
Once you find a basemap you like, you can add layers of data to ggmap()
just like you would with ggplot()
(e.g. + geom_point()
). Note, however, that ggmap()
sets the default map dataset and aesthetics. This means that as you add layers other than the basemap, you have to specify both the mapping
and data
arguments to the geom.
trails_df <- fortify(trails)
trails@data$id <- rownames(trails@data)
trails_df <- merge(trails_df, trails@data, by = "id")
ggmap(base_map) +
geom_point(data = cities_df, aes(x = coords.x1, y = coords.x2)) +
geom_path(data = trails_df, aes(x = long, y = lat, group = group)) +
coord_fixed()
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Warning: Removed 5472 rows containing missing values (geom_path).