Objectives

  • Learn how to import, visualize, and transform rasters in R using the terra package.

Set up

The main tool we’ll be working with in this tutorial is the terra package, so go ahead and install/load that. You can work with rasters using the older raster, but the terra package is quickly becoming the best package for gridded data in R, so we’ll focus on that!

library(terra)
library(sf)
library(tidyverse)

Raster 101

Raindrops on roses and rasters of land use… these are a few of my favorite things. Rasters are fantastic tools to store massive amounts of massively cool data. Think of a raster as a matrix… or a grid of numbers. Each pixel in the raster has number associated with it that tells us something about the data of interest, i.e.

  • How red is this pixel?
  • How near-infrared is this pixel (yes we can answer this)?
  • How wet is this pixel?
  • What grows in this pixel?
  • How deep is the groundwater under this pixel (see GRACE satellite)?
  • How many people live in this pixel (yes, people have done this using night light data)?
  • How many cars are in that parking lot by your house? How many lawn chairs in your backyard (freaking out yet?)? You get the idea.

Let me pause here and state that you are very lucky because you have insanely easy access to incredible, beautiful visualizations of our precious planet.. in near real time, all the time, anywhere. Take a second to ponder that. Then go download Google Earth and start playing.


Get to know your raster

The data stored in rasters can be continuous (precipitation, elevation, reflectance) or categorical (land use category). In this tutorial we’ll work with both. For a raster to be a raster, we need a few key pieces of information:

  • Resolution (in space and time!)
  • Extent
  • Values (elevation, land use category, etc.)
  • Projection information

When you’re working with raster data, especially satellite imagery, think carefully about trade-offs between resolution in space and time. Some datasets are available daily at a low spatial resolution; others are available monthly at a high spatial resolution; and others are high spatial and temporal resolution, but massively large and hard for you to work with on your laptop.


Raster data 101

The main data type you’ll be working with in raster-world is the SpatRaster. Wuzdat? It’s basically a data structure that allows R to read and understand gridded data. This data can be two dimensional (so think a single picture) or three-dimensional (so think a STACK of pictures, for example a stack of rasters showing land use in one place through time). Note that terra also supports SpatVectors, so points, lines and polygons like we’ve been working with in sf, but we’ll stick to sf in this class.

OK, so what are the basic ingredients you need to “bake” a raster? Well, think of the things that describe a picture (which is just a big matrix of pixels). A picture has:

  • Dimensions, measured in the number of rows and columns in the matrix
  • Resolution, or how big each pixel in the image is
  • Since we’re working with spatial data, we also have extent, or the coordinates that define the bounds of the raster image
  • Ah! And since it’s spatial, we also need to include projection information, stored as the coordinate reference system used to project the raster into space.
  • DATA. This is the actual value of the pixel cells in the raster. Sometimes this data measures continuous phenomena (like temperature and precipitation), and other times it measures categorical phenomena (like land use categories).

Let’s start by making a simple raster from scratch. Then we’ll play with some categorical and continuous raster data using the terra package.

my_rast <- rast()

Congrats! You just made a raster called my_rast using the rast() function. If you ask R what the class() of this new thing you built is, you’ll see it’s a SpatRaster:

class(my_rast)
## [1] "SpatRaster"
## attr(,"package")
## [1] "terra"

You can get more information about the raster by simply typing the name of the raster in the console:

my_rast
## class       : SpatRaster 
## dimensions  : 180, 360, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84

Cool. We have made a raster with 180 rows, 360 columns and 1 layer (so it’s not a stacked raster with multiple layers, e.g. showing change through time or different wavelengths of light, it’s just ONE picture). It’s got a resolution of 1x1. 1x1 what…? Let’s ask R for more information about the projection of this data (which we can see is lon/lat WGS 84 above):

crs(my_rast)
## [1] "GEOGCRS[\"WGS 84\",\n    DATUM[\"World Geodetic System 1984\",\n        ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n            LENGTHUNIT[\"metre\",1]],\n        ID[\"EPSG\",6326]],\n    PRIMEM[\"Greenwich\",0,\n        ANGLEUNIT[\"degree\",0.0174532925199433],\n        ID[\"EPSG\",8901]],\n    CS[ellipsoidal,2],\n        AXIS[\"longitude\",east,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433,\n                ID[\"EPSG\",9122]]],\n        AXIS[\"latitude\",north,\n            ORDER[2],\n            ANGLEUNIT[\"degree\",0.0174532925199433,\n                ID[\"EPSG\",9122]]]]"

Well that’s a ton of info - but check out the piece that says LENGTHUNIT… you’ll see that the unit length here is in meters. Note that this isn’t always the case. Sometimes the resolution is reported in degrees, kilometers, and (for rogue Americans) miles. So always check this to understand your data and to make sure it is being read by R correctly.

OK, so these are the default geometry parameters for a raster in R, but what if we want to make a special, fancy raster with parameters we define?

my_fancy_rast <- rast(ncol = 10, nrow = 10)
ncell(my_fancy_rast)
## [1] 100

Now I’ve made a 10x10 raster with 100 cells. Cool.

hasValues(my_fancy_rast)
## [1] FALSE

But sadly there’s no data in this raster (aka, the cells don’t have any actual values, so we can’t really visualize anything here). It’s kinda’ like we have an empty matrix - it has specific dimensions, but is empty. Let’s add some random values:

values(my_fancy_rast) <- runif(ncell(my_fancy_rast)) # runif takes random numbers from a uniform distribution
plot(my_fancy_rast, main = "LOOK AT MY FANCY RASTER:")


Loading raster data

Well congrats on your raster. But fortunately, we rarely have to make our own rasters from scratch. Most of the time, we load rasters already built by other folks (thanks NASA!). These data can take many many many different forms. In this tutorial, we will focus on learning how to load and manipulate the most commonly used types of raster data. There are tons of raster extensions out there: .grd, .asc, .sdat, .rst, .nc, .tif, .envi, .bil, .img, .hdf. I’ve encountered all of these at some point… and the terra package can handle just about all of them.

We’ll get to raster stacks later in the tutorial. For now, let’s work with a single layer of my current favorite raster data set, the CropScape dataset.. This is a categorical raster that covers most of the country from 2000 to present at a 30 meter - YES T-H-I-R-T-Y meter - resolution. Since the data set was built by the folks at the U.S. Department of Agriculture, there are more detailed categorizations for agricultural lands than for ecosystems.1

I went to the CropScape website and downloaded data for DeKalb County, GA in 2022. To do this, click “Select Area” and type in the state (Georgia) and county (DeKalb). This should take you to DeKalb County. You’ll see lots of gray which is (you guessed it) - urban cover. Play with the slider bar at the bottom of the map to see if you can spot any interesting changes in land use through time with your eyes (you can use the Animate tool for this). Then, make sure you select 2022, and click Export in the top right corner of the screen. Again, confirm you’re downloading data for 2022. Note the projection (Web Mercator (WGS 84)) and especially the file type. I recommend sticking with GeoTiffs. This file structure can hold both the data (and thus the image) but also the other relevant information we need for spatial analysis (projection, extent, etc…). DPI of 96 is fine, then click export and Download Exported Area. Unzip the folder you’ve downloaded. In it, you should see several files… we need ALL of these files to load the raster in R, so move them ALL to your data folder.

Then you can load your raster with the easiest function ever (pointing to the .tif file):

cdl <- rast("./data/clipped.tif")
cdl
## class       : SpatRaster 
## dimensions  : 1888, 1609, 1  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : -9395692, -9347422, 3972700, 4029340  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / Pseudo-Mercator (EPSG:3857) 
## source      : clipped.TIF 
## color table : 1 
## categories  : Class_Name 
## name        :  Class_Name 
## min value   :        Corn 
## max value   : Blueberries

OK, cool, what can we see here? We have a raster with 1888 rows and 1609 columns. Only one layer (so only 2022). The extent tells us about the coordinates (in the WGS84 CRS) that define the extent of the raster. Wait a sec… blueberries!? Corn!?

plot(cdl)

Before we dive into this categorical raster a bit more, let’s pull out some basic info about the data:

ext(cdl) # extent
## SpatExtent : -9395691.70219402, -9347421.70219402, 3972699.54617784, 4029339.54617784 (xmin, xmax, ymin, ymax)
print(paste("This groovy data has", ncol(cdl), "columns.", sep = " "))
## [1] "This groovy data has 1609 columns."
print(paste("This groovy data has", nrow(cdl), "rows.", sep = " "))
## [1] "This groovy data has 1888 rows."

Or we can simply call:

dim(cdl)
## [1] 1888 1609    1

We can also find the resolution of our data:

res(cdl)
## [1] 30 30

30 meters!!!!!!!!!!!! Can you believe it!? Go, science, go!!!!

There’s also the projection information (crs(cdl)) and the data itself (values(cdl)). The nice thing about terra objects is that you don’t need to specify the values(cdl) piece for a number of useful functions to work. Need the max and min values of the raster?

range(cdl)
## class       : SpatRaster 
## dimensions  : 1888, 1609, 2  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : -9395692, -9347422, 3972700, 4029340  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / Pseudo-Mercator (EPSG:3857) 
## source(s)   : memory
## names       : range_min, range_max 
## min values  :         1,         1 
## max values  :       242,       242

Let’s see a histogram of our pixel values:

hist(cdl)

Any NAs? Here are a few ways to answer that question:

length(cdl[is.na(cdl)])
## [1] 1906311
global(cdl, fun = "isNA")
##               isNA
## Class_Name 1906311
global(cdl, fun = "notNA")
##              notNA
## Class_Name 1131481

The global() function lets you apply functions (even those you build!) to your raster. Here, we’re just counting the amount if isNA (pixels that are NA) and notNA (pixels that are not NA in our raster).

Lots of missing data! What percent of pixels are missing?

length(cdl[is.na(cdl)])/(ncell(cdl))
## [1] 0.6275318

Why so much missing data? Look at the plot of our raster. Raster datasets are stored with a well defined bounding box, so they are always rectangular. The region of interest, however, is rarely rectangular (unless you’re in one of those states with a bunch of square counties). This leads to a lot of missing data around the edges. Check for this and be wary of missing data when you compute summary stats on your raster… You can quickly check to see the value of this “black space” in the raster plot by indexing pixels at the corners of the raster, i.e.:

cdl[1,1]
##   Class_Name
## 1       <NA>

This returns the value of the pixel at location row = 1, column = 1, which is stored as NA. Note that if all this missing data around your raster really bothers you, you can use the trim() function to remove outer rows and columns that only contain NA values.

Another nice trick to help you check the contents of your raster: you can use the freq() function to build a frequency table for your data. Probably only do this for categorical data, or you’ll have a gigantic frequency table:

freq(cdl)
##    layer                    value  count
## 1      1                     Corn     75
## 2      1                   Cotton     37
## 3      1                 Soybeans    149
## 4      1             Winter Wheat     37
## 5      1 Dbl Crop WinWht/Soybeans     29
## 6      1                      Rye     10
## 7      1                     Oats      6
## 8      1                   Millet     10
## 9      1    Other Hay/Non Alfalfa    984
## 10     1                 Tomatoes      1
## 11     1           Sod/Grass Seed      5
## 12     1                  Peaches      7
## 13     1         Other Tree Crops      1
## 14     1                   Pecans      6
## 15     1               Open Water  10985
## 16     1     Developed/Open Space 296101
## 17     1  Developed/Low Intensity 261846
## 18     1  Developed/Med Intensity 147132
## 19     1 Developed/High Intensity  79371
## 20     1                   Barren   9384
## 21     1         Deciduous Forest 137228
## 22     1         Evergreen Forest 126904
## 23     1             Mixed Forest  17696
## 24     1                Shrubland   2525
## 25     1        Grassland/Pasture  28467
## 26     1           Woody Wetlands  12360
## 27     1      Herbaceous Wetlands    102
## 28     1     Dbl Crop WinWht/Corn      3
## 29     1       Dbl Crop Oats/Corn      1
## 30     1  Dbl Crop WinWht/Sorghum     16
## 31     1   Dbl Crop Soybeans/Oats      1
## 32     1              Blueberries      2

Or if you want to know the frequency of one specific land use category, check this out:

freq(cdl, value = 1) # Corn
##   layer value count
## 1     1     1    75

You can use your plotting skillz to turn this into a ggplot:

library(tidyverse)
freq <- as.data.frame(freq(cdl)) %>% filter(value != "NA")
ggplot(data = freq) +
  geom_bar(aes(x = reorder(value, desc(count)), y = count), stat = "identity") +
  xlab("Land use category") + ylab("Count of pixels") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("What's going on in our county?")


Raster reprojection

Oftentimes two raster datasets will not line up because of differences in resolution and/or extent. The projectRaster() function from the raster package lets us reproject datasets so they can talk to one another. Let’s look at the projection info for our cdl raster:

crs(cdl)
## [1] "PROJCRS[\"WGS 84 / Pseudo-Mercator\",\n    BASEGEOGCRS[\"WGS 84\",\n        DATUM[\"World Geodetic System 1984\",\n            ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n                LENGTHUNIT[\"metre\",1]]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        ID[\"EPSG\",4326]],\n    CONVERSION[\"unnamed\",\n        METHOD[\"Popular Visualisation Pseudo Mercator\",\n            ID[\"EPSG\",1024]],\n        PARAMETER[\"Latitude of natural origin\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8801]],\n        PARAMETER[\"Longitude of natural origin\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8802]],\n        PARAMETER[\"False easting\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8806]],\n        PARAMETER[\"False northing\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8807]]],\n    CS[Cartesian,2],\n        AXIS[\"easting\",east,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1]],\n        AXIS[\"northing\",north,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1]],\n    ID[\"EPSG\",3857]]"

This is stored in CRS format, which now you guys know all about. Let’s say we want to transform our raster from its current Albers Equal Area projection to a US National Atlas Equal Area projection:

crs_code <- '+proj=utm +zone=11 +ellps=GRS80 +datum=NAD83 +units=m +no_defs'
cdl_albers <- project(cdl, crs_code)

Note that this can take awhile depending on the size of your raster. It’s like adjusting a very elaborate patchwork quilt over a complex surface. Also, BE SURE to think about whether your data is categorical or continuous before you begin reprojecting. Reprojecting rasters means that the cell centroids and even size may change. To assign values to the new cells in the reprojected raster, projectRaster() may have to compute new values for the reprojected cells using one of two interpolation procedures: nearest neighbor (near) or bilinear (bilinear) which is the default. If you apply bilinear interpolation to a categorical raster with integer values like 1, 22, 33, 4 etc, you’ll end up with pixels with values of averages of these categories (11.5, 27.5). This totally destroys your categorization. So when you’re working with categorical rasters like the CropScape data, be sure you set method = "near" in your reprojection. You can read more about the methods for projecting different types of rasters by typing ?project in the console.


Raster math

If you want to find the difference between values in two rasters, it’s fairly complicated:

diff <- raster_1 - raster_1

Or, if for some reason you wanted to add 10 to all cells in our CDL raster:

newcdl <- cdl + 10

Don’t forget that a raster is just a matrix, so math is a breeze! Well, kinda… raster math that’s this simple only works if your rasters have the same extent and resolution and if the calculations are fairly simple. You can see the full set of operations you can use here.

You can also use simple R indexing to replace values in the raster. If for some reason, you wanted to turn all instances of corn (pixel values of 1) to soy (pixel values of five), you simple do the following:

nocorn <- cdl
nocorn[nocorn == 1] <- 5

Be careful when you replace values in the ORIGINAL raster you loaded. This changes the cdl SpatRaster you’re working with in this session of R. I recommend making a copy of the data when you’re doing big replacements, so you can keep the unaltered original data for other analyses.

If you want a single value summarizing the entire raster (so, for example, the sum of the values of all pixels in raster), use the global() function we saw above:

global(cdl, "isNA")
##               isNA
## Class_Name 1906311

Other cool things you can do with rasters

If you want to do a deep read of other more complex things you can do with rasters in R, read this chapter of the RSpatial textbook.


Cropping and masking with a shapefile

Cropping rasters in R is pretty straightforward. If you know the coordinates of your new raster extent, then:

new.extent <- extent(c(-125, -100, 20, 50))
out <- crop(raster, new.extent)

You can get the new.extent values from an existing raster by calling ext(new_extent_raster). You can also pull this info out of a shapefile.

If you want to crop to a specific shape rather than the extent of that shape, then you need to add an additional step. Say we want to extract the part of this land use raster located in the city of Decatur2 (note that these two spatial datasets have to be in the same projection for the crop() to work):

shp <- readRDS("./data/decatur.RDS")
shp <- st_transform(shp, crs(cdl))
plot(cdl)
plot(shp, add=T, col = "red")

city_raster <- crop(cdl, shp)
plot(city_raster)

Notice that this only cropped the land use raster to the extent of the Decatur shapefile, not to the actual boundaries of the shapefile. To do that, we have to add one more step:

city_raster <- crop(cdl, shp, mask = T) 
plot(city_raster)

mask=T basically tells R that we want to use the shapefile as a mask for the raster and mark as NA anything that isn’t within the shapefile boundaries we’re using.

Masking by data values

Masking raster data is something we need to do quite a bit in the remote sensing world. We can do this using the mask() function. Let’s say we want to mask all pixels not classified as “Developed - High Intensity” (124) in our new city raster:

city_raster[city_raster == 124] <- NA
plot(city_raster)

Integration with ggplot

So how do we pull this all together into a nice visual? With the tidyterra package, terra objects can play nice with tidyverse tools. Be sure to install tidyterra, then try this:

library(tidyterra)
ggplot() +
  geom_spatraster(data = city_raster) +
  geom_sf(data = shp, fill = "transparent") +
  theme_bw() +
  labs(fill = "",
       title = "LULC in Decatur, GA")

There’s a lot more you can do with tidyterra, so read more here!


Multiband rasters

Great! Now that we know how to do some basic things with categorical rasters in R, let’s look at some multi-band and continuous data. Let’s play with the ENACTS-BMD data created by Dr. Nachiketa Acharya and his colleagues (you can read more about the data here). This gridded data describes daily temperature and rainfall observations in Bangladesh. Today, we’ll work with a small subset of this data describing rainfall in January 2022.

jan22 <- rast("./data/JAN22.tif")
jan22
## class       : SpatRaster 
## dimensions  : 134, 108, 31  (nrow, ncol, nlyr)
## resolution  : 0.04999999, 0.05  (x, y)
## extent      : 87.65, 93.05, 20.35, 27.05  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : JAN22.tif 
## names       : precip_1,   precip_2, precip_3,    precip_4, precip_5, precip_6, ... 
## min values  : 0.000000, 0.00000000, 0.000000, 0.000000000, 0.000000, 0.000000, ... 
## max values  : 3.211378, 0.01945309, 4.411213, 0.006855291, 2.826128, 3.603938, ... 
## time (days) : 2022-01-01 to 2022-01-31

Note that this raster as 31 layers. Each layer represents precipitation for one day in January 2022. For example, if I wanted to visualize precipitation on January 11th, I could do this:

plot(jan22[[11]])

Similarly, I could ask for mean precipitation for the entire country on January 11th:

global(jan22[[11]], fun = mean, na.rm=T)
##               mean
## precip_11 1.258884

We could also sum total precipitation for each pixel in Bangladesh for the month of January as follows:

totalJan <- app(jan22, fun = sum)
plot(totalJan)

Pretty cool, huh?!

Saving rasters

You can write out a raster using the writeRaster() function in the raster package.

writeRaster(my_raster, "my/directory/raster.tif") # you can use whatever extension you like - I prefer .tif
# note that if you want to write netCDF files, use writeCDL()

Finding rasters

USGS data repositories allow you to search for data by time, year, satellite, resolution, etc.:

You can also download data directly from NASA (easiest when downloading large time-series). University of Maryland maintains a database of pre-processed images for the remote sensing of land surface processes. You can find lots of DEMs here (ASTER and STRM are the most widely used). You can find cool, free high-resolution topographic and bathymetric data here. This website groups data sources by application (e.g. oceanography, land surface processes, forecasting) and can be helpful to explore.


Additional resources

  • This tutorial was heavily inspired by this fantastic chapter in the Spatial Data Science online textbook.
  • If you’ll be working a lot with MODIS data, check out this package
  • If you’ll be working with Landsat, check out this package
  • For more information about LiDAR data in R, check this out.
  • If you think you’ll be working with hyperspectral data, you’ll want to read this
  • This chapter in the RSpatial text provides more information about remote sensing in R.
  • GO PLAY WITH GOOGLE EARTH!
  • If you want to learn more about spatial predictions with rasters, check this fantastic resource out

  1. The way we categorize data in rasters can become EXTREMELY important as we move towards spatial analysis of raster data. More to come later in the class!↩︎

  2. You can download the shapefile here. I’ve gone ahead and crunched things up into an RDS to make file sharing easier.↩︎