Thematic map with categorical data

We need two libraries and county-level data and map files. The data are from IPUMS:

  1. Minnesota Population Center. North Atlantic Population Project: Complete Count Microdata. Version 2.0 [Machine-readable database]. Minneapolis: Minnesota Population Center, 2008. https://www.nappdata.org/napp/
  2. Steven Ruggles et al. Integrated Public Use Microdata Series: Version 5.0 [Machine-readable database]. Minneapolis: University of Minnesota, 2010.https://www.nappdata.org/napp/
  3. Minnesota Population Center. National Historical Geographic Information System: Version 2.0. Minneapolis, MN: University of Minnesota 2011. http://www.nhgis.org
library(sp)
library(RColorBrewer)
load(url("http://dl.dropbox.com/u/9256203/US1880.Rdata"), .GlobalEnv)
ls()
## [1] "cnty" "orig" "st"

The data file contains the number of instances a person in the row county had a parent born in the column location. The data and the maps are from 1880. Only persons 40 years old and older are considered; which gives some sense of how population migrated between 1820-1880. We convert the raw count into percentages, and then find for each county the largest single parent birthplace. It's hard to use more than 12 distinguishable colors in a single map, so we find the 11 most common birthplaces and group the others in a single category called “other”.

q <- as.matrix(orig)
q <- q/rowSums(q)
z <- which(q == apply(q, 1, max), arr.ind = TRUE)
z <- z[match(unique(z[, 1]), z[, 1]), ]
q <- q[z[, 1], ]
r <- colnames(q)[z[, 2]]
a <- table(r)
axx <- names(a[order(-a)])[1:11]
k <- match(r, axx)
r[which(is.na(k))] <- "other"
r <- as.matrix(r)
rownames(r) <- rownames(q)
table(r)
## r
##        Georgia        Ireland       Kentucky       New.York North.Carolina 
##            176            144            163            186            181 
##         Norway          other  Other.Germany   Pennsylvania South.Carolina 
##             47            406            177            260            128 
##      Tennessee       Virginia 
##            176            367
head(r)
##       [,1]   
## 10010 "other"
## 10030 "other"
## 10070 "other"
## 10230 "other"
## 10250 "other"
## 10530 "other"

We use the RColorBrewer package to set 12 distinguishable colors and assign each category its own color.

avv <- unique(r[, 1])
ki <- match(cnty@data$fips, rownames(r))
farv <- brewer.pal(12, "Paired")
nuk <- match(r[ki, 1], avv)

We set the xlim parameter to plot just the Eastern US. To make it easier to read, we remove the county outlines, and overlay with state outlines.

coord <- coordinates(cnty)
plot(cnty, col = farv[nuk], lty = 0, xlim = c(-97, max(coord[, 1])))
plot(st, add = TRUE, lty = 1, lwd = 0.2, xlim = c(-97, max(coord[, 1])))
mtext("Major parental source")
legend("bottomright", legend = avv, fill = farv[1:max(nuk, na.rm = TRUE)], cex = 0.8)

plot of chunk unnamed-chunk-4