Distance to a lake

Living in Wisconsin, I was curious how far I was from a lake at any given point in the state. There are a lot of lakes in Wisconsin, but it was hard to visualize their distribution.

Step 1) Download the high-resolution National Hydrography Dataset for Wisconsin (I host this locally as a shapefile). I only kept lakes with a surface area larger than 4 ha, and only polygons with the lake/pond designation: 'Lake/pond; a standing body of water with a predominantly natural shoreline surrounded by land.'

library(rgdal)
nhd = readOGR(dsn = insertLocalDirectory, layer = 'WI')
areaX = nhd@data$AreaSqKm > 0.04 # Lakes > 4 ha
lakeX = nhd@data$FType == 390 # Only lakes (FType 390 = Lake/Pond)
nhd2 = nhd[areaX & lakeX,]

The NHD data is presented in lat/long. Since I want to measure distances, I needed to project it. Wisconsin spans UTM zones 15 and 16; I projected everything to zone 15.

nhdUTM = spTransform(nhd2, CRSobj = CRS("+proj=utm +zone=15 +ellps=WGS84"))

Step 2) Rasterize lake data onto a grid. The rasterize function (raster package) transfers values associated with 'object' type spatial data (points, lines, polygons) to raster cells.

library(raster)
emptyRaster = raster(ncol=5000, nrow=4280) # approximately 100 m grid for WI
extent(emptyRaster) = extent(nhdUTM) # add spatial extent
lakeRaster = rasterize(nhdUTM, emptyRaster, field = 1, background = 0)
lakeRaster_class = reclassify(lakeRaster, cbind(0, NA)) # non-lake cells become NA

Step 3) Compute distance from each grid cell to closest cell with value “1”. The raster::distance function does the heavy lifting here.

lakeDistance = distance(lakeRaster_class)

Step 4) Crop our data to just the state of Wisconsin. I can get a state map from the maps package in R.

library(maps); library(mapStats)
usMap <- readOGR(system.file("shapes/usMap.shp", package="mapStats")[1], layer='usMap')
stMap = usMap[usMap@data$STATE_ABBR == 'WI',]
stMap = spTransform(stMap, crs(nhdUTM))

Crop lake distance grid with state map:

distanceWI = rasterize(stMap,lakeDistance,mask=T)

Step 5) Plot the results! (in miles for convenience)

plot(distanceWI * 0.62/1000, breaks = c(0,2,5,10,20), axes=FALSE, box=FALSE,
 col = colorRampPalette(c('azure2', "navy"))(4), # one color per break interval
 main = 'Distance from a lake in Wisconsin',
 legend.args=list(text='Distance (miles)', side=4, font=2, line=2, cex=0.8),
 axis.args=list(at=c(0,2,5,10,20), labels=c(0,2,5,10,'>20'), cex.axis=1))

Extracting polygon areas in R

Suppose I have a shapefile (lakes.shp) which consists of 5 polygons (Figure 1a).

library(raster)
lakes = shapefile('lakes.shp')
plot(lakes, col='dodgerblue', main='a) lakes', lty=0)
Figure 1: a) Lake polygons, b) Lake polygons with 30 m buffer

The shapefile is stored as a SpatialPolygonsDataFrame. The data underlying these polygons can be viewed with:

lakes@data
There is a column called SHAPE_Area, which is the area of each polygon. Now let’s say, we want to create a buffer zone around the lake (Figure 1b).

lakebuffer = buffer(lakes, width=30, dissolve=FALSE)
plot(lakebuffer, col='darkred', main = 'b) 30 m buffer', lty=0)

Great! We can still see the underlying data with:

lakebuffer@data

However, the SHAPE_Area values shown are the original lake areas, not the buffered ones. To find the true polygon areas, we must dig into the polygon layer: lakebuffer@polygons. But each polygon is an element of a list, and the slot with "area" is deeply nested. Instead of looping through each polygon, here's an easy way to extract polygon areas:

bufAreas = sapply(slot(lakebuffer, "polygons"), slot, "area")

For the area of the buffer ring alone (buffered area minus original lake area):

lakeAreas = sapply(slot(lakes, "polygons"), slot, "area")
output = data.frame(bufferArea = bufAreas - lakeAreas) # assumes polygon order is unchanged

readr 0.2

If your life as an ecologist is spent loading/writing .csv files in R, you should be using readr. Why? Well, it's a Hadley Wickham package, for one. But really, its defaults make sense!

  1. read_csv: strings are read as characters, not factors. FINALLY!
  2. write_csv: doesn’t include row numbers by default.
  3. readr recognizes YYYY-mm-dd format. Goodbye strptime!
  4. In v0.2, readr looks at the first 1000 rows when guessing column types (although 5000 would be handy too).
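A quick illustration of those defaults (a sketch; assumes readr >= 0.2 and write access to the working directory):

```r
library(readr)

df <- data.frame(id = c("a", "b"), date = c("2015-01-01", "2015-02-01"))
write_csv(df, "demo.csv")   # no row-number column written

d <- read_csv("demo.csv")
class(d$id)    # character, not factor
class(d$date)  # Date - the YYYY-mm-dd column is parsed automatically
```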

Read all about v0.2.0 on the RStudio blog.

While readr is my friend 99% of the time, not everything is sunshine and lollipops. For instance, while looping through thousands of files, I would randomly get an “unknown TZ” error, even when setting a timezone. After banging my head a few times, I went back to read.csv. You can’t win ’em all.
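In hindsight, a middle ground would have been to fall back to read.csv only when read_csv errors (a sketch; safeRead is a hypothetical helper name, not part of readr):

```r
library(readr)

# Hypothetical helper: try read_csv first; on any error (such as the sporadic
# "unknown TZ" failure above), fall back to base read.csv
safeRead <- function(f) {
  tryCatch(as.data.frame(read_csv(f)),
           error = function(e) read.csv(f, stringsAsFactors = FALSE))
}
```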

Timezone error using read_csv

Find my weather station

Ever wonder how much it snowed last year, or the year before, or every year since 1950? Well, to find this data, you’ll first need to choose a weather station.

As much as I like wasting time on the NOAA website, I figured this could be easier in R. Introducing nearWX (found here).

All you need is lat/long values (in the US) and a NOAA key. Request one at http://www.ncdc.noaa.gov/cdo-web/token.

Once you source the function from github:

Madison <- data.frame(-89.4,43.0667)
noaakey <- 'shFYexr.........' # your personal key
station <- nearWX(Madison,noaakey)

station will contain a list of the closest weather stations. You will want the $id field for use with the rnoaa package.

The rnoaa package offers a few dataset options (ANNUAL: Annual Summaries; GHCND: Daily Summaries; GHCNDMS: Monthly Summaries; ...) and parameters (CLDD: Cooling degree days; MNTM: Monthly mean temperature; TPCP: Total precipitation; TSNW: Total snowfall; ...).

An example for retrieving monthly snowfall:

library(rnoaa)
out <- ncdc(datasetid = "GHCNDMS", stationid = station$id[1], datatypeid = 'TSNW',
  startdate = "2010-01-01", enddate = "2010-12-31", token = noaakey, limit = 100)

We're limited to one year of data per request, so create a loop:

# hydroYr: TRUE = Sep-May, FALSE = Jan-Dec
annualTot <- function(stationid, datatypeid, startyear, endyear, hydroYr = FALSE, unitConv = 1) {
  output = data.frame(year = seq(startyear, endyear), parameter = NA)
  for (i in 1:nrow(output)) {
    if (hydroYr == TRUE) {
      out <- ncdc(datasetid = "GHCNDMS", stationid = stationid, datatypeid = datatypeid,
                  startdate = paste(output$year[i], "-09-01", sep = ''),
                  enddate = paste(output$year[i] + 1, "-05-31", sep = ''),
                  token = noaakey, limit = 100)
    } else {
      out <- ncdc(datasetid = "GHCNDMS", stationid = stationid, datatypeid = datatypeid,
                  startdate = paste(output$year[i], "-01-01", sep = ''),
                  enddate = paste(output$year[i], "-12-31", sep = ''),
                  token = noaakey, limit = 100)
    }
    output$parameter[i] = sum(out$data$value) * unitConv # e.g. 0.03937 converts mm to inches
    Sys.sleep(3) # pause to avoid NOAA rate limiting
  }
  output
}

And plot:

#annual total snowfall sep-may reported in inches
madSnow <- annualTot(station$id[3],'TSNW',1960,2014,hydroYr=T,unitConv = 0.03937)
barplot(madSnow$parameter,names.arg = madSnow$year,las=2,cex.names = 0.8,
cex.axis = 0.8, col='darkblue',ylab = 'total snowfall (inches)', xaxs = "i")
Annual Snowfall Madison, Wisconsin 1960-2014 (Dane County Airport)

Oh god, umlauts!

Sweden has more lake data than most countries have lakes. However, they also have a 29 letter alphabet including ‘å’, ‘ä’ and ‘ö’. This results in lake names like Fräcksjön and Östra Helgtjärnen.

My task: combine two .csv files (data and metadata), with lake name as the common ID.

Every lake name with a ‘swedish character’ would not match! Frustration!

Stack Overflow to the rescue.
Turns out in unicode, these special characters can be represented in two different forms.
1 character: ä (Normal-Form-Composed NFC) or
2 characters: a + double dots (Normal-Form-Decomposed NFD)
Therefore, Fräcksjön is 9 characters long in NFC, but 11 in NFD. Since string matching in R compares the underlying code points, visually identical names fail to match.
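You can see the mismatch directly (a sketch; the unicode escapes below spell out the two encodings of the same name):

```r
# NFC: one code point per letter; NFD: base letter + combining diaeresis (U+0308)
nfc <- "Fr\u00e4cksj\u00f6n"     # "Fräcksjön", composed
nfd <- "Fra\u0308cksjo\u0308n"   # "Fräcksjön", decomposed
nchar(nfc)   # 9
nchar(nfd)   # 11
nfc == nfd   # FALSE, even though both print identically
```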

My problem stemmed from the fact that one .csv was made on a PC (NFC) and one on macOS (NFD).

You must switch the encoding on one of the files.

iconv(data, "UTF-8-mac", "UTF-8")

All that troubleshooting for one line of code.

Watercolor your lakes in R

All lakes are beautiful. Or at least when they’re coded in watercolor!


For this you will need to install and load ggmap. I am a huge Hadley Wickham fan (obvs), but am not usually a proponent of plotting packages. I love base. However, you can’t beat this one.

install.packages('ggmap')
library(ggmap)

You will need to customize two things:
1) Lat/long <- the center of your map in decimal degrees
2) Zoom <- must be an integer, which can be frustrating for small-scale maps like the Great Lakes

qmap(c(-94.542155,49.3938), zoom=9, source = "stamen", maptype = "watercolor")

To plot the figure without axes, set extent to 'device'. Now output your figure and put it on your fridge! Or make it your phone wallpaper.

png('mylake.png', width=4, height=4, units='in', res=300)
mylake = get_map(c(-94.542,49.393), zoom=9, source = "stamen", maptype = "watercolor")
ggmap(mylake, extent = "device")
dev.off()