Objectives

Set-up

library(tidyverse)
epi <- readRDS("./data/epir.RDS")

Bar plots

Bar plots are well suited to comparing a numeric value across the levels of a categorical variable. Say you want to use a bar plot to compare the population of several countries.

epi_sub <- epi %>%
  filter(country %in% c("China", "India", "United States of America", "Brazil"), year == 2016)

ggplot(epi_sub) +
  geom_bar(aes(x = country, y = POP), stat = "identity") + 
  xlab("") +
  ylab("Population in 2016")

Not a particularly beautiful plot, but we can clearly see that India and China have massive populations compared to the other countries we’ve selected. There are a few things we can do to make this look much better. First, we can reorder the bars by population:

ggplot(epi_sub) +
  geom_bar(aes(x = reorder(country, desc(POP)), y = POP), stat = "identity") +
  xlab("") +
  ylab("Population in 2016") 
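To see why reorder() changes the bar order, it can help to look at the factor levels it produces. The sketch below uses a made-up three-row data frame (the toy object and its values are assumptions, not part of the epi data) and negates the values instead of calling desc(), which gives the same ordering for numeric input.

```r
# Made-up values for illustration only (not from the epi dataset)
toy <- data.frame(
  country = c("A", "B", "C"),
  POP = c(30, 10, 20)
)

# reorder() returns a factor whose levels are sorted by the second argument
ascending <- levels(reorder(toy$country, toy$POP))

# Negating the values reverses the sort, which is the effect desc() has on numbers
descending <- levels(reorder(toy$country, -toy$POP))

print(ascending)   # "B" "C" "A"
print(descending)  # "A" "C" "B"
```

ggplot2 draws discrete axes in level order, so whichever level comes first ends up as the first bar.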

For ascending order, just drop the desc() function. Another nice trick is to flip the bar plot to make the axes a bit easier to read:

ggplot(epi_sub) +
  geom_bar(aes(x = reorder(country, POP), y = POP), stat = "identity") +
  xlab("") +
  ylab("Population in 2016") +
  coord_flip()
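As an alternative to coord_flip(), recent versions of ggplot2 (3.3.0 and later) can draw horizontal bars directly when the category is mapped to y and the numeric value to x. This is a minimal sketch on made-up data (the toy data frame is an assumption), not a replacement for the approach used in this tutorial.

```r
library(ggplot2)

# Made-up values for illustration only (not from the epi dataset)
toy <- data.frame(
  country = c("A", "B", "C"),
  POP = c(30, 10, 20)
)

# Numeric variable on x, category on y: the bars come out horizontal
# without calling coord_flip()
p_horizontal <- ggplot(toy) +
  geom_bar(aes(x = POP, y = reorder(country, POP)), stat = "identity") +
  xlab("Population") +
  ylab("")

# Each bar extends along x from 0 to its POP value
bar_ends <- sort(as.numeric(layer_data(p_horizontal, 1)$xmax))
print(bar_ends)
```

One practical difference: with this approach the population scale is controlled through scale_x_continuous(), whereas with coord_flip() it stays scale_y_continuous().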

We can change the fill color, border color, and transparency of all bars as follows:

ggplot(epi_sub) +
  geom_bar(aes(x = reorder(country, POP), y = POP), stat = "identity",
           fill = "dodgerblue", color = "grey40", alpha = 0.5) +
  xlab("") +
  ylab("Population in 2016") +
  coord_flip()

Here, the fill argument refers to the interior color of the bars, color refers to the outline drawn around each bar, and alpha refers to the transparency of the fill (0 is fully transparent, 1 is fully opaque).
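Because these arguments sit outside aes(), they are applied as fixed values to every bar. A common slip is to put a color name inside aes(), where it is treated as a data value and mapped through the default color scale instead. The sketch below illustrates the difference on made-up data (the toy data frame is an assumption), using layer_data() to inspect the fill ggplot2 actually computes.

```r
library(ggplot2)

# Made-up values for illustration only (not from the epi dataset)
toy <- data.frame(
  country = c("A", "B", "C"),
  POP = c(30, 10, 20)
)

# Outside aes(): fill is set to a constant for all bars
p_set <- ggplot(toy) +
  geom_bar(aes(x = country, y = POP), stat = "identity",
           fill = "dodgerblue", color = "grey40", alpha = 0.5)

# Inside aes(): the string is treated as a one-level category,
# so the bars get the default scale color, not dodgerblue
p_mapped <- ggplot(toy) +
  geom_bar(aes(x = country, y = POP, fill = "dodgerblue"), stat = "identity")

set_fill <- unique(layer_data(p_set, 1)$fill)
mapped_fill <- unique(layer_data(p_mapped, 1)$fill)

print(set_fill)
print(mapped_fill)
```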

We can use the scales package to remove the scientific notation on the population axis (still the y scale in the code, even though coord_flip() draws it horizontally) and also improve some of our labeling and our theme:

library(scales)
ggplot(epi_sub) +
  geom_bar(aes(x = reorder(country, POP), y = POP), stat = "identity",
           fill = "dodgerblue", color = "grey40", alpha = 0.5) +
  xlab("") +
  ylab("") +
  labs(title = "2016 Population") +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  theme_minimal()
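The labels argument accepts a function that turns the numeric breaks into text, and comma is one such function. A quick sketch of what it does when called directly (the two numbers are illustrative round values, not taken from the epi data):

```r
library(scales)

# Illustrative round numbers, not taken from the epi dataset
pop <- c(1380000000, 1000000)

# comma() inserts thousands separators and avoids scientific notation
formatted <- comma(pop)
print(formatted)  # "1,380,000,000" "1,000,000"

# label_comma() is the function-factory form and gives the same result
formatted_factory <- label_comma()(pop)
```

Either comma or label_comma() can be passed to scale_y_continuous(labels = ...).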

…and we could even fill each bar according to the continent in which the country is located by adding a fill argument inside the aes() function:

ggplot(epi_sub) +
  geom_bar(aes(x = reorder(country, POP), y = POP, fill = continent), stat = "identity") +
  xlab("") +
  ylab("") +
  labs(title = "2016 Population") +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  theme_minimal()

Finally, we can manually change the fill colors as follows:

ggplot(epi_sub) +
  geom_bar(aes(x = reorder(country, POP), y = POP, fill = continent), stat = "identity") +
  scale_fill_manual(values = c("#e5f5e0", "#a1d99b", "#31a354")) +
  xlab("") +
  ylab("") +
  labs(title = "2016 Population") +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  theme_minimal() +
  theme(legend.title = element_blank())  # also dropped legend title
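With an unnamed vector, scale_fill_manual() assigns colors in the order of the factor levels (alphabetical for character columns), so the pairing can shift if the data change. A named vector ties each color to a specific level. The sketch below assumes made-up data and hypothetical continent labels ("Asia", "Americas"); the labels in the epi data may differ.

```r
library(ggplot2)

# Made-up values and hypothetical continent labels, for illustration only
toy <- data.frame(
  country = c("A", "B", "C"),
  POP = c(30, 10, 20),
  continent = c("Asia", "Americas", "Asia")
)

# Names must match the values in the fill variable exactly
continent_colors <- c("Asia" = "#31a354", "Americas" = "#a1d99b")

p_named <- ggplot(toy) +
  geom_bar(aes(x = reorder(country, POP), y = POP, fill = continent),
           stat = "identity") +
  scale_fill_manual(values = continent_colors) +
  coord_flip()

# Inspect the fill color computed for each bar
bar_fills <- layer_data(p_named, 1)$fill
print(table(bar_fills))
```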

Another trick that’s good to know with bar plots is that you can adjust the width of the bars by adding a width argument to the geom_bar() function:

ggplot(epi_sub) +
  geom_bar(aes(x = reorder(country, POP), y = POP, fill = continent), stat = "identity", width = 0.6) +
  scale_fill_manual(values = c("#e5f5e0", "#a1d99b", "#31a354")) +
  xlab("") +
  ylab("") +
  labs(title = "2016 Population") +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  theme_minimal() +
  theme(legend.title = element_blank())  # also dropped legend title
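The width is expressed as a fraction of the spacing between categories: 1 makes neighboring bars touch, and the ggplot2 default is 0.9. A small sketch on made-up data (the toy data frame is an assumption) that checks the drawn width by comparing each bar's left and right edges:

```r
library(ggplot2)

# Made-up values for illustration only (not from the epi dataset)
toy <- data.frame(
  country = c("A", "B", "C"),
  POP = c(30, 10, 20)
)

p_default <- ggplot(toy) +
  geom_bar(aes(x = country, y = POP), stat = "identity")

p_narrow <- ggplot(toy) +
  geom_bar(aes(x = country, y = POP), stat = "identity", width = 0.6)

# Width of each bar = right edge minus left edge, in category units
d_default <- layer_data(p_default, 1)
d_narrow <- layer_data(p_narrow, 1)
default_widths <- as.numeric(d_default$xmax - d_default$xmin)
narrow_widths <- as.numeric(d_narrow$xmax - d_narrow$xmin)

print(default_widths)
print(narrow_widths)
```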

Another cool idea is to add labels to each bar. These labels could be the numerical value you are trying to visualize (population in our case):

ggplot(epi_sub) +
  geom_bar(aes(x = reorder(country, POP), y = POP, fill = continent), stat = "identity") +
  scale_fill_manual(values = c("#e5f5e0", "#a1d99b", "#31a354")) +
  xlab("") +
  ylab("") +
  labs(title = "2016 Population", subtitle = "Billions of individuals") +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  theme_minimal() +
  theme(legend.title = element_blank())  +
  geom_text(aes(x = reorder(country, POP), y = POP, label = round(POP/1000000000, 2)),
            nudge_y = 5e7, color = "black")  # nudge_y is in population units, so it must be large to be visible
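When placing labels, keep in mind that nudge_y is measured in the units of the y variable, not in pixels or fractions of the plot. The sketch below uses made-up data with values in the tens (the toy data frame is an assumption), where a nudge of 2 is clearly visible; with populations in the hundreds of millions the same nudge would be imperceptible.

```r
library(ggplot2)

# Made-up values for illustration only (not from the epi dataset)
toy <- data.frame(
  country = c("A", "B", "C"),
  POP = c(30, 10, 20)
)

p_labels <- ggplot(toy, aes(x = country, y = POP)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = POP), nudge_y = 2)

# Layer 2 is the text layer; its y positions are POP shifted by nudge_y
label_y <- sort(as.numeric(layer_data(p_labels, 2)$y))
print(label_y)  # 12 22 32
```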

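You can play with the nudge_x and nudge_y parameters to get the labels where you’d like, or write the plot out as an SVG and make these changes in Inkscape:

my_plot <- last_plot()  # or assign the ggplot() call above to my_plot directly
ggsave("myfig.svg", my_plot)  # writing SVG files requires the svglite package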

Say you want to compare the populations of the countries we selected and highlight China. Coloring a single bar differently is a great way to draw attention to a difference. To make this easier, I’m going to create a new variable in our epi_sub dataset that is TRUE if the country is China and FALSE in all other cases:

epi_ch <- epi_sub %>% mutate(china = country == "China")

ggplot(epi_ch) +
  geom_bar(aes(x = reorder(country, POP), y = POP, fill = china), stat = "identity", width = 0.6) +
  scale_fill_manual(values = c("grey", "red")) +
  xlab("") +
  ylab("") +
  labs(title = "2016 Population") +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  theme_minimal() +
  theme(legend.position = "none") +
  annotate("text", x = "China", y = 1000000000, label = "Population of 1.38 billion", color = "white") 
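The unnamed colors above rely on FALSE sorting before TRUE. Naming the values makes the pairing explicit, so it still reads correctly if the order of the colors is changed. A minimal sketch on made-up data (the toy data frame and the highlight column are assumptions standing in for epi_ch and china):

```r
library(ggplot2)

# Made-up values for illustration only (not from the epi dataset)
toy <- data.frame(
  country = c("A", "B", "C"),
  POP = c(30, 10, 20)
)

# Logical flag marking the one category to highlight
toy$highlight <- toy$country == "A"

p_highlight <- ggplot(toy) +
  geom_bar(aes(x = reorder(country, POP), y = POP, fill = highlight),
           stat = "identity", width = 0.6) +
  # Logical values are matched by their names "TRUE" and "FALSE"
  scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "grey")) +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "none")

highlight_fills <- layer_data(p_highlight, 1)$fill
print(table(highlight_fills))
```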

Additional resources