Objectives


Pre-lab assignment


My assumptions


Set up

library(sp)
library(tidyverse)
library(spatstat)
library(raster)

NOTE: Point pattern analysis depends on DISTANCES so be absolutely sure that your data is in a PROJECTED COORDINATE SYSTEM.


Our data

Today, I’ll work with a few data sets. Many some standard with the spatstat package (lansing, longleaf, etc.). One set of data you will have to download from Chapter 8 of the R Spatial online textbook. At the top of Chapter 8, you will see hyperlinked crime, city, and census .rds files which you should download and store locally on your computer before beginning the tutorial. These .rds files are shapefiles containing census data, a city boundary, and incidences of crime.

city <- readRDS("./data/city.rds")
census <- readRDS("./data/census2000.rds")
crime <- readRDS("./data/crime.rds")

plot(city)
plot(crime, add=T, col = "red" , cex = 0.5, pch = "*")

The city SpatialPolygonsDataFrame delineates the boundary of a city. The census data set contains lots of data on the demographics of census blocks within the city. Crime contains information about crimes that occurred in the city as well as their location. The types of crimes and number of observations within each crime category are listed below:

crime@data %>% group_by(CATEGORY) %>% summarize(n = n()) %>% arrange(desc(n))
## # A tibble: 15 x 2
##                CATEGORY     n
##                   <chr> <int>
##  1          Petty Theft   665
##  2            Vandalism   355
##  3      Drunk in Public   232
##  4     Vehicle Burglary   221
##  5 Residential Burglary   219
##  6                  DUI   212
##  7             Assaults   172
##  8  Commercial Burglary   143
##  9          Grand Theft   143
## 10   Drugs or Narcotics   134
## 11           Auto Theft    86
## 12              Robbery    49
## 13              Weapons    15
## 14                Arson     9
## 15          Hate Crimes     6

What is a point process?

Point data gives the location of objects (trees) or events (crime) within an area. Often, in addition to coordinates, points have other data attached to them called marks. Marks are essentially attributes of the point (i.e. species of tree, type of crime). Marks could be categorical (type of crime) or continuous (diameter of tree). Marks are different from covariates. Marks are attributes associated with the point. Covariates are any kind of data that can be linked to the points as an explanatory variable (climate data, elevation, socioeconomic data, land use, etc.). We can answer a number of important questions with point-pattern data, including:

  1. What is the intensity of a process (e.g. crime hot spots)?
  2. How are the points distributed in space (e.g. uniform, random, clustered, see below)?
  3. How does the intensity of points covary with other processes (e.g. species density and elevation, cancer and pollution exposure)?
  4. How do points with different marks segregate across space (e.g. how do the distributions of species age and size vary differently across space?)

We can think of a point process as a black box that generates a random observed spatial point pattern according to some rules. Point processes are assumed to be (1) random (locations and numbers of points are random) and (2) do not have a serial order (unless there are marks attached). Point processes are located within a sampling window which is important to identify and define.


Introduction to spatstat

In this tutorial, we’ll be working with the spatstat package, so make sure you’ve installed this on your machine. spatstat stores data a bit differently than the sp package, so I’ll begin by explaining the peculiarities of this package. spatstat classifies objects in the following way:

There are a series of methods associated with each class. To get this list, type:

methods(class = "ppp") # replace ppp with class name of interest
##   [1] [                [<-              affine           anyDuplicated   
##   [5] as.data.frame    as.im            as.layered       as.owin         
##   [9] as.ppp           auc              berman.test      boundingbox     
##  [13] boundingcentre   boundingcircle   boundingradius   by              
##  [17] cdf.test         circumradius     closepairs       closing         
##  [21] coerce           connected        coords           coords<-        
##  [25] crossdist        crosspairs       cut              density         
##  [29] dilation         distfun          distmap          domain          
##  [33] duplicated       edit             envelope         erosion         
##  [37] fardist          flipxy           Frame<-          has.close       
##  [41] head             identify         initialize       intensity       
##  [45] iplot            is.connected     is.empty         is.marked       
##  [49] is.multitype     kppm             markformat       marks           
##  [53] marks<-          multiplicity     nnclean          nncross         
##  [57] nndensity        nndist           nnfun            nnwhich         
##  [61] nobjects         npoints          opening          pairdist        
##  [65] pcf              periodify        pixellate        plot            
##  [69] ppm              print            quadrat.test     quadratcount    
##  [73] quantess         rebound          relevel          relrisk         
##  [77] rescale          rhohat           roc              rotate          
##  [81] round            rounding         rshift           scalardilate    
##  [85] scanmeasure      segregation.test sharpen          shift           
##  [89] show             slotsFromS3      Smooth           Smoothfun       
##  [93] split            split<-          subset           summary         
##  [97] superimpose      tail             unique           unitname        
## [101] unitname<-       unmark           unstack          Window          
## [105] Window<-        
## see '?methods' for accessing help and source code

It’s fairly easy to transform data between SpatialPoints sp objects and the ppp objects used in spatstat. Let’s start by loading data from the spatstat package:

data("swedishpines")
x <- swedishpines
plot(x)

This sample data sets plots the occurrence of Swedish Pines within a small window.

class(x)
## [1] "ppp"
summary(x)
## Planar point pattern:  71 points
## Average intensity 0.007395833 points per square unit (one unit = 0.1 
## metres)
## 
## Coordinates are integers
## i.e. rounded to the nearest unit (one unit = 0.1 metres)
## 
## Window: rectangle = [0, 96] x [0, 100] units
## Window area = 9600 square units
## Unit of length: 0.1 metres

Notice that the x variable we created is of class ppp. We see that it includes 71 points and that the average intensity (points/unit area) of these points is .0074 points per square unit or in this case 0.74 trees per square meter.

Unless you acquire magical data, your data will not come to you in ppp format. Working things into ppp (then later sp for advanced plotting) will help you easily apply the complex analyses and functions listed below to your data, so we’re going to work through an example of loading a ’.csv of coordinates and attributes into R, transforming this to a ppp object, then to an sp SpatialPoints object. Let’s look at the location of dumps in the western US:

lf <- read.table("./data/landfill.txt", sep=",", stringsAsFactors = F, skip = 1)
colnames(lf) <- c("id", "lat", "lon")
# 1. store x and y coords in two vectors
lon <- lf$lon
lat <- lf$lat

# 2. create two vectors xrange and yrange with dimensions of triangle that contain all points
xrange <- range(lon, na.rm=T)
yrange <- range(lat, na.rm=T)

# 3. create ppp
lf <- ppp(lon, lat, xrange, yrange)
## Warning: data contain duplicated points
plot(lf)