Objectives


Set up

As usual, I’ll be using the EPI dataset. In this tutorial, I filter the full dataset to the years when the TCL (tree cover loss) variable is available, 2001 to 2016, and to a small subset of seven countries:

library(tidyverse)
epi <- readRDS("./data/epir.RDS")

tcl <- epi %>% filter(year %in% 2001:2016) %>% select(year, continent, country, TCL) %>%
  filter(country %in% c("United States", "Brazil", "Canada", "Belgium", "Madagascar", "Peru", "Mexico"))
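
If you are not sure which years have TCL data in your own copy of the dataset, a quick summary can tell you. The sketch below uses a small made-up data frame (toy_epi, not the real EPI data) and assumes that unavailable values are stored as NA:

```r
library(dplyr)

# Hypothetical stand-in for the EPI data; values are invented for illustration
toy_epi <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year    = rep(2000:2003, times = 2),
  TCL     = c(NA, 0.2, 0.3, NA,   # country A
              NA, 0.1, NA,  NA),  # country B
  stringsAsFactors = FALSE
)

# Keep the years in which at least one country has a non-missing TCL value
tcl_years <- toy_epi %>%
  group_by(year) %>%
  summarize(n_available = sum(!is.na(TCL))) %>%
  filter(n_available > 0) %>%
  pull(year)

tcl_years
```

In the toy data only 2001 and 2002 have any TCL values, so those are the years that come back. On the real data you would swap toy_epi for epi.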

Heat maps

The geom_tile() geometry lets us visualize the values of a variable through time and by group. It divides the data into discrete boxes (tiles), one for each combination of group and time step, and colors each tile according to the value of the variable of interest. The easiest way to understand what heat maps do is to look at one, so let’s plot changes in TCL by country through time:

ggplot(data = tcl) +
  geom_tile(aes(x = year, y = country, fill = TCL))

Each row in the plot represents a country, and each column represents a year. The shade of blue of each tile represents the value of tree cover loss for that country-year combination. We can see that Madagascar appears to have the highest rates of TCL, and that these rates grew substantially through time. Let’s make a few quick tweaks to make the plot look better:

ggplot(data = tcl) +
  geom_tile(aes(x = year, y = country, fill = TCL)) +
  scale_fill_gradient(name = "TCL", 
                      low = "#f0f4ed",
                      high = "#496160" ) +
  theme_minimal() +
  xlab("") +
  ylab("") +
  scale_x_continuous(breaks = seq(from = 2001, to = 2015, by = 2))

The color palette isn’t great, but this gives you an idea of how to customize a color palette based on the colors of your final project. The variable being symbolized is set by the fill aesthetic in geom_tile() (in our case, TCL). In the scale_fill_gradient() function, the name argument sets the legend title, and low and high set the colors at the two ends of your color ramp. Here I go from a very light gray to a dark gray-green.
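
To see which argument does what, here is a minimal sketch on made-up data (the toy data frame and the legend title "Tree cover loss" are my own placeholders, not part of the EPI dataset):

```r
library(ggplot2)

# Made-up values for illustration: 3 countries x 4 years = 12 tiles
toy <- expand.grid(country = c("A", "B", "C"), year = 2001:2004)
toy$TCL <- seq(0.1, 1.2, length.out = nrow(toy))

p <- ggplot(data = toy) +
  geom_tile(aes(x = year, y = country, fill = TCL)) +  # fill = TCL picks the variable
  scale_fill_gradient(name = "Tree cover loss",        # legend title only
                      low = "#f0f4ed",                 # color for the smallest value
                      high = "#496160")                # color for the largest value
p

# layer_data() shows what ggplot2 will draw: one row (and one fill color) per tile
tiles <- layer_data(p)
nrow(tiles)
```

Changing name only relabels the legend. To symbolize a different variable, you would change the fill aesthetic instead.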

To reorder the countries by their TCL values in the last year of the heat map, try this:

reo <- tcl %>% filter(year == 2014) # replace 2014 with the last year in your heat map
tcl$REO_COUNTRY <- factor(tcl$country, levels = reo$country[order(reo$TCL)]) # replace TCL with your variable of interest (VOI)

ggplot(data = tcl) +
  geom_tile(aes(x = year, y = REO_COUNTRY, fill = TCL)) +
  scale_fill_gradient(name = "TCL", 
                      low = "#f0f4ed",
                      high = "#496160" ) +
  theme_minimal() +
  xlab("") +
  ylab("") +
  scale_x_continuous(breaks = seq(from = 2001, to = 2015, by = 2))
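
If the reordering step feels opaque, it can help to run it on a tiny example and look at the resulting factor levels. This sketch uses made-up values and base R only; the toy data frame and the names last_year and ordered_countries are placeholders of mine:

```r
# Made-up values for illustration
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 2),
  year    = rep(c(2013, 2014), times = 3),
  TCL     = c(0.4, 0.5,   # A
              0.3, 0.1,   # B
              0.2, 0.9),  # C
  stringsAsFactors = FALSE
)

# Rows for the final year only
last_year <- toy[toy$year == max(toy$year), ]

# Country names sorted from lowest to highest TCL in that year
ordered_countries <- last_year$country[order(last_year$TCL)]

# The factor levels now carry that ordering
toy$REO_COUNTRY <- factor(toy$country, levels = ordered_countries)

levels(toy$REO_COUNTRY)
# "B" "A" "C"
```

ggplot2 places the first factor level at the bottom of a discrete y axis, so in a heat map the country with the highest final-year value (C here) ends up in the top row.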


Line plots

Another useful way to visualize change through time is with line plots. We’ve already made line plots in this class, but below, I’ll show you how to highlight specific lines to tell a more interesting story about change through time. First, let’s make a basic line plot and also add the actual data with geom_point():

ggplot(data = tcl) +
  geom_point(aes(x = year, y = TCL, color = country)) +
  geom_line(aes(x = year, y = TCL, color = country))

Remember that you can also use geom_smooth() to visualize a smoothed version of these lines. Smoothing is not always the appropriate choice, however, since it can hide important variation in the data. See, for example, Madagascar below: if we only showed the smoothed lines, we’d miss the bumps in the data in 2011 and 2013.

ggplot(data = tcl) +
  geom_point(aes(x = year, y = TCL, color = country)) +
  geom_smooth(aes(x = year, y = TCL, color = country))
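
To make the point about hidden variation concrete, here is a small sketch on a made-up series (not the Madagascar data). By default geom_smooth() uses a loess smoother for small datasets (fewer than 1,000 observations), so the sketch fits loess() directly and compares the peak of the raw data with the peak of the smoothed curve:

```r
# Made-up series: flat at 1 with a single spike in 2011
year <- 2001:2016
TCL  <- rep(1, length(year))
TCL[year == 2011] <- 3

# Fit the same kind of smoother that geom_smooth() defaults to for small data
fit <- loess(TCL ~ year)

raw_peak    <- max(TCL)          # height of the spike in the raw data
smooth_peak <- max(fitted(fit))  # highest point on the smoothed curve

c(raw = raw_peak, smoothed = smooth_peak)
```

The smoothed curve peaks below the raw spike, which is exactly the kind of bump a reader would miss if you plotted only the smooth.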

We can facet this line plot to see differences across groups. Say, for example, we want to visualize change through time in a separate line plot for each continent:

ggplot(data = tcl) +
  geom_point(aes(x = year, y = TCL, color = country)) +
  geom_smooth(aes(x = year, y = TCL, color = country)) +
  facet_grid(cols = vars(continent))

Or better yet, say we want to visualize trends through time for ALL countries in each continent:

epi %>% filter(year %in% 2001:2016) %>%
  ggplot() +
  geom_point(aes(x = year, y = TCL), alpha = 0.2) +
  geom_smooth(aes(x = year, y = TCL, color = continent), alpha = 0.5, se = FALSE) +
  facet_grid(cols = vars(continent)) +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylim(0, 2.5)

Faceting is a great way to simplify complex visualizations of change through time for many groups. Another useful trick with line plots is to highlight one line so you can see how a single entity changes relative to all the other entities you are analyzing. Say, for example, you want to show how TCL changes through time in China compared with other countries in Asia. You can do this by creating two subsets of the data (china and not_china) and using different aesthetics to symbolize each group (e.g. color = "red" and color = "grey"):

china <- epi %>% filter(continent == "Asia", year %in% 2001:2016, country == "China")
not_china <- epi %>% filter(continent == "Asia", year %in% 2001:2016, country != "China")

ggplot() +
  geom_smooth(data = not_china, aes(x = year, y = TCL, group = country), color = "grey", alpha = 0.1, se = FALSE) +
  geom_smooth(data = china, aes(x = year, y = TCL), color = "red", se = FALSE) +
  ylim(c(0, 2))
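
An alternative to splitting the data into two subsets is to keep one data frame and add a TRUE/FALSE column that flags the entity to highlight. This is only a sketch on made-up data; the toy data frame and the highlight column are hypothetical names, and I use geom_line() rather than geom_smooth() to keep the example small:

```r
library(ggplot2)

# Made-up values for illustration: three hypothetical countries
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  year    = rep(2001:2004, times = 3),
  TCL     = c(0.1, 0.2, 0.3, 0.5,   # A
              0.2, 0.2, 0.3, 0.3,   # B
              0.4, 0.3, 0.3, 0.2),  # C
  stringsAsFactors = FALSE
)

# TRUE only for the country we want to emphasize
toy$highlight <- toy$country == "A"

p <- ggplot(data = toy) +
  geom_line(aes(x = year, y = TCL, group = country, color = highlight)) +
  scale_color_manual(values = c("TRUE" = "red", "FALSE" = "grey"),
                     guide = "none")  # hide the TRUE/FALSE legend
p
```

With a single layer you have less direct control over which line is drawn on top, which is one reason the two-subset approach above can be preferable when the highlighted line must sit above the others.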


Contour plots

If you want to go above and beyond a scatter plot for illustrating the relationship between two variables, consider a contour plot! It takes the same basic information as a scatter plot but visualizes it in a different way. Let’s start with a basic scatter plot showing the relationship between TCL and SHI (the Species Habitat Index):

ggplot(data = epi %>% filter(year == 2014)) +
  geom_point(aes(x = TCL, y = SHI))

This suggests that there may be some relationship between TCL and SHI, but it’s a pretty boring visualization. Here are some other options. First, instead of plotting points, we can visualize the density of points in a grid:

ggplot(data = epi %>% filter(year == 2014)) +
  geom_bin2d(aes(x = TCL, y = SHI)) +
  theme_bw()

To change the size of the pixels in the grid, change the number of bins with the bins argument (more bins means smaller pixels):

ggplot(data = epi %>% filter(year == 2014)) +
  geom_bin2d(aes(x = TCL, y = SHI), bins = 70) +
  theme_bw()
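
To check how the bins argument behaves, you can count the cells ggplot2 computes for a coarse and a fine grid. This is a sketch on randomly generated data, not the EPI data:

```r
library(ggplot2)

set.seed(1)  # made-up data, for illustration only
toy <- data.frame(x = rnorm(500), y = rnorm(500))

coarse <- ggplot(toy) + geom_bin2d(aes(x = x, y = y), bins = 5)
fine   <- ggplot(toy) + geom_bin2d(aes(x = x, y = y), bins = 50)

# layer_data() returns one row per grid cell that ggplot2 computed
n_coarse <- nrow(layer_data(coarse))
n_fine   <- nrow(layer_data(fine))

c(coarse = n_coarse, fine = n_fine)
```

The fine grid produces many more (and therefore smaller) cells than the coarse one. If you would rather set the cell size directly, geom_bin2d() also accepts a binwidth argument.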

If you want to visualize bins as hexagons rather than squares, check this out:

ggplot(data = epi %>% filter(year == 2014)) +
  geom_hex(aes(x = TCL, y = SHI)) +
  theme_bw()

Or maybe you want to visualize the relationship as a contour plot:

ggplot(data = epi %>% filter(year == 2014)) +
  geom_density_2d(aes(x = TCL, y = SHI)) +
  theme_bw()

Or a contour plot with a bit more color (aka a 2D density plot):

ggplot(data = epi %>% filter(year == 2014)) +
  stat_density_2d(aes(x = TCL, y = SHI, fill = after_stat(level)), geom = "polygon") +
  theme_bw()

Or both in one plot. Setting color = "white" outlines each density polygon, so this plot shows both the filled polygons and the contour lines:

ggplot(data = epi %>% filter(year == 2014)) +
  stat_density_2d(aes(x = TCL, y = SHI, fill = after_stat(level)), geom = "polygon", color = "white") +
  theme_bw()

Finally, if you want a continuous surface, you can use stat_density_2d() with geom = "raster" and contour = FALSE:

# scale_fill_distiller() accepts a ColorBrewer palette by number as well as by name
ggplot(data = epi %>% filter(year == 2014)) +
  stat_density_2d(aes(x = TCL, y = SHI, fill = after_stat(density)), geom = "raster", contour = FALSE) +
  scale_fill_distiller(palette = 4, direction = -1) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  theme(legend.position = "none")


Additional Resources