Objectives


Pre-lab assignment


My assumptions


Set up

library(tidyverse)
library(rgdal)
library(spdep)
library(spdplyr)
library(stringr)
library(RColorBrewer)
library(gstat)

Our data

Today we’ll be working with a dataset near and dear to my heart… corn yield! Sound super boring? Well, let me take a second to discuss the importance of corn in this fine nation of ours…

The US is the world’s leading producer of corn. Most of the corn we grow is used to feed livestock and produce ethanol. Lots of corn shows up in our food, even we don’t think of our food as containing corn (high fructose corn syrup, anyone?). Corn is also the main grain used to feed livestock in the U.S. That burger you’re eating? Corn. Those chicken nuggets? Corn. Corn and all of the resources (water, energy, land, labor) required to create that corn. More than 90 MILLION acres are cultivated with corn (out of a total of 1.9 billion acres in the U.S.). Take a second to ponder the environmental impacts of all that corn. Ready to drop everything to study corn? Yeah, I know.

We’ll be working with data from the USDA’s Natural Agricultural Statistics Service (NASS). Click the footnote to learn how to download this data on your own.2

corn <- read.csv("./data/iowa.csv")
glimpse(corn)
## Observations: 99
## Variables: 21
## $ Program          <fctr> SURVEY, SURVEY, SURVEY, SURVEY, SURVEY, SURV...
## $ Year             <int> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 201...
## $ Period           <fctr> YEAR, YEAR, YEAR, YEAR, YEAR, YEAR, YEAR, YE...
## $ Week.Ending      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ Geo.Level        <fctr> COUNTY, COUNTY, COUNTY, COUNTY, COUNTY, COUN...
## $ State            <fctr> IOWA, IOWA, IOWA, IOWA, IOWA, IOWA, IOWA, IO...
## $ State.ANSI       <int> 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 1...
## $ Ag.District      <fctr> CENTRAL, CENTRAL, CENTRAL, CENTRAL, CENTRAL,...
## $ Ag.District.Code <int> 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 5...
## $ County           <fctr> BOONE, DALLAS, GRUNDY, HAMILTON, HARDIN, JAS...
## $ County.ANSI      <int> 15, 49, 75, 79, 83, 99, 127, 153, 157, 169, 1...
## $ Zip.Code         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ Region           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ watershed_code   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ Watershed        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ Commodity        <fctr> CORN, CORN, CORN, CORN, CORN, CORN, CORN, CO...
## $ Data.Item        <fctr> CORN, GRAIN - YIELD, MEASURED IN BU / ACRE, ...
## $ Domain           <fctr> TOTAL, TOTAL, TOTAL, TOTAL, TOTAL, TOTAL, TO...
## $ Domain.Category  <fctr> NOT SPECIFIED, NOT SPECIFIED, NOT SPECIFIED,...
## $ Value            <dbl> 208.4, 204.0, 198.5, 209.0, 208.0, 214.7, 211...
## $ CV....           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...

First, I’ll use dplyr to clean things up a bit. I downloaded only data for CORN, GRAIN - YIELD, MEASURED IN BU/ACRE so I am going to drop the Data.Item column that tells me this is what I downloaded and rename the Value column (which is actually the yield data) Yield. I’m also going to drop the Program variable (all observations are from the survey), Year (all data is from 2016), and the Week.Ending and CV variables (all values as NA). It’s a bummer, but for the whatever reason, the Watershed and Watershed_code data is not available (though we now know how to use a spatial join to link watersheds to counties), so I’ll drop those too… basically I’m keeping all of the info we could use to geolocate the yield data and the actual yield data.

corn <- corn %>% select(State, State.ANSI, Ag.District, Ag.District.Code, County, County.ANSI, Yield = Value) 
glimpse(corn)
## Observations: 99
## Variables: 7
## $ State            <fctr> IOWA, IOWA, IOWA, IOWA, IOWA, IOWA, IOWA, IO...
## $ State.ANSI       <int> 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 1...
## $ Ag.District      <fctr> CENTRAL, CENTRAL, CENTRAL, CENTRAL, CENTRAL,...
## $ Ag.District.Code <int> 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 5...
## $ County           <fctr> BOONE, DALLAS, GRUNDY, HAMILTON, HARDIN, JAS...
## $ County.ANSI      <int> 15, 49, 75, 79, 83, 99, 127, 153, 157, 169, 1...
## $ Yield            <dbl> 208.4, 204.0, 198.5, 209.0, 208.0, 214.7, 211...

A first interesting question is whether there’s significant variation in yields across counties. One way to quickly visualize this is by building a histogram:

hist(corn$Yield, 100, xlab = "Corn yields (bu/ac)", main="")

We know that yields vary across space (and across time, but that’s for another day). Therefore, our first challenge is to visualize this spatial variation by making this dataset spatial. I know this data is at the county level and I have some attributes describing counties (County, County.ANSI, State, State.ANSI). ANSI stands for the American National Standards Institute (ANSI) codes. These are used across government institutions to help standardize data. What does this mean for us? That we should be able to join our data to a U.S. Census County shapefile! I just happen to have one here (minus counties outside of the contiguous US, sorry AK and HI you guys are just too far away… and I’m not sure Alaskan corn is a thing). I’ll start by filtering out Iowa:

library(spdplyr)
library(rgdal)
cty <- readOGR("./data/cb_2015_us_county_500k.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "./data/cb_2015_us_county_500k.shp", layer: "cb_2015_us_county_500k"
## with 3108 features
## It has 11 fields
## Integer64 fields read as strings:  ALAND AWATER
cty <- cty %>% filter(STATEFP == 19)
plot(cty)

I know, from looking up the FIPS code for Iowa that the values of State.ANSI and County.ANSI in the USDA dataset are the same as the STATEFP and COUNTYFP in the cty dataset. Hooray! We can join the data! The last issue is that the COUNTYFP and STATEFP variables in our cty shapefile have leading zeros (i.e. 015 instead of 15). Let’s add leading zeros to the County.ANSI dataset using the str_pad() function in the stringr package:

library(stringr)
corn$State.ANSI <- str_pad(corn$State.ANSI, 2, pad="0")
corn$County.ANSI <- str_pad(corn$County.ANSI, 3, pad = "0")

Now let’s join our corny data to the county shapefile. Remember, that some polygons in this shapefile will not have data. If we do this join right, these values will be listed as NA. First, I’m going to concatenate our state and county identifiers into a single ID for each county. This way we can create a unique ID for each county (otherwise County.ANSI has multiple instances of the same value across different states).

corn$ID <- paste0(corn$State.ANSI, corn$County.ANSI)
cty@data$ID <- paste0(cty@data$STATEFP, cty@data$COUNTYFP)
head(corn$ID)
## [1] "19015" "19049" "19075" "19079" "19083" "19099"
head(cty@data$ID)
## [1] "19041" "19045" "19111" "19135" "19027" "19075"

I’ll also take a quick peak at how many shared values there are across the two datasets:

length(intersect(cty@data$ID, corn$ID))
## [1] 99

99… out of 99 counties. YES! Let’s merge these suckers:

cty_corn <- merge(cty, corn, by.x = "ID", by.y = "ID", all.x=T)

Now this new shapefile contains all of our data:

glimpse(cty_corn@data)
## Observations: 99
## Variables: 18
## $ ID               <chr> "19041", "19045", "19111", "19135", "19027", ...
## $ STATEFP          <fctr> 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, ...
## $ COUNTYFP         <fctr> 041, 045, 111, 135, 027, 075, 137, 157, 163,...
## $ COUNTYNS         <fctr> 00465625, 00465211, 00465244, 00465256, 0046...
## $ AFFGEOID         <fctr> 0500000US19041, 0500000US19045, 0500000US191...
## $ GEOID            <fctr> 19041, 19045, 19111, 19135, 19027, 19075, 19...
## $ NAME             <fctr> Clay, Clinton, Lee, Monroe, Carroll, Grundy,...
## $ LSAD             <fctr> 06, 06, 06, 06, 06, 06, 06, 06, 06, 06, 06, ...
## $ ALAND            <fctr> 1469139469, 1799830897, 1340377291, 11233172...
## $ AWATER           <fctr> 13866649, 39514681, 55290807, 1522790, 21453...
## $ AREA_SQKM        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ State            <fctr> IOWA, IOWA, IOWA, IOWA, IOWA, IOWA, IOWA, IO...
## $ State.ANSI       <chr> "19", "19", "19", "19", "19", "19", "19", "19...
## $ Ag.District      <fctr> NORTHWEST, EAST CENTRAL, SOUTHEAST, SOUTH CE...
## $ Ag.District.Code <int> 10, 60, 90, 80, 40, 50, 70, 50, 60, 10, 20, 7...
## $ County           <fctr> CLAY, CLINTON, LEE, MONROE, CARROLL, GRUNDY,...
## $ County.ANSI      <chr> "041", "045", "111", "135", "027", "075", "13...
## $ Yield            <dbl> 202.0, 213.8, 194.6, 180.0, 203.1, 198.5, 185...

FINALLY! LET’S PLOT THIS:

library(RColorBrewer)
my.palette <- brewer.pal(n = 9, name = "YlGn") 
spplot(cty_corn, "Yield", col.regions = my.palette, cuts = 8, col = "gray", main = "Corn yields", xlab = "", ylab ="")