The latest version of the Dow-Eff functions (Manual: pdf; html) can perform analyses on five different ethnological datasets:

abbreviation |
dataset |
codebook |
---|---|---|

WNAI | Western North American Indians | codebook |

SCCS | Standard Cross-Cultural Sample | codebook |

EA | Ethnographic Atlas | codebook |

LRB | Lewis R. Binford forager data | codebook |

XC | Merged 371 society data | codebook |

The code below outlines the workflow for the **SCCS** estimations found in:

Brown, Christian, and Eff, E. Anthon. (2010) The State and the Supernatural: Support for Prosocial Behavior.

Structure and Dynamics: eJournal of Anthropological and Related Sciences: Vol. 4: No. 1, Article 3. Link

You will need a number of R packages to run the Dow-Eff functions. These are loaded using the *library* command. If a package is not found, it should be first installed. The following command will initiate the installation of a package named *mice*, for example:

`install.packages("mice")`

```
#--set working directory and load needed libraries--
options('width'=150)
# setwd("/home/yagmur/Dropbox/functions")
setwd("e:/Dropbox/functions/")
library(Hmisc)
library(mice)
library(foreign)
library(stringr)
library(AER)
library(spdep)
library(psych)
library(geosphere)
library(relaimpo)
library(linprog)
library(dismo)
library(forward)
library(pastecs)
library(classInt)
library(maps)
library(dismo)
library(plyr)
library(aod)
library(reshape)
library(RColorBrewer)
library(XML)
library(tm)
library(mlogit)
```

The Dow-Eff functions, as well as the five ethnological datasets, are contained in an R-workspace, located in the cloud.

```
load(url("http://capone.mtsu.edu/eaeff/downloads/DEf01f.Rdata"))
#-show the objects contained in DEf01f.Rdata
data.frame(type=sapply(ls(),function(x) class(get(x))))
```

```
## type
## addesc function
## capwrd function
## chK function
## CSVwrite function
## doLogit function
## doMI function
## doMNLogit function
## doOLS function
## EA data.frame
## EAcov list
## EAfact data.frame
## EAkey data.frame
## fv4scale function
## GISaux character
## gSimpStat function
## kln function
## llm matrix
## LRB data.frame
## LRBcov list
## LRBfact data.frame
## LRBkey data.frame
## MEplots function
## metadata list
## mkcatmappng function
## mkdummy function
## mkmappng function
## mknwlag function
## mkscale function
## mkSq function
## mmgg function
## plotSq function
## quickdesc function
## resc function
## rmcs function
## rnkd function
## SCCS data.frame
## SCCScov list
## SCCSfact data.frame
## SCCSkey data.frame
## setDS function
## showlevs function
## spmang function
## widen function
## WNAI data.frame
## WNAIcov list
## WNAIfact data.frame
## WNAIkey data.frame
## XC data.frame
## XCcov list
## XCfact data.frame
## XCkey data.frame
```

The **setDS(** *xx* **)** command sets one of the four ethnological datasets as the source for the subsequent analysis. The five valid options for *xx* are: *XC*, *LRB*, *EA*, *SCCS*, and *WNAI*. The **setDS()** command creates objects:

object name |
description |
---|---|

cov |
Names of covariates to use during imputation step |

dx |
The selected ethnological dataset is now called dx |

dxf |
The factor version of dx |

key |
A metadata file for dx |

wdd |
A geographic proximity weight matrix for the societies in dx |

wee |
An ecological similarity weight matrix for the societies in dx |

wll |
A linguistic proximity weight matrix for the societies in dx |

`setDS("SCCS") `

The next step in the workflow is to create any new variables and add them to the dataset *dx*. New variables can be created directly, as in the following example. When created in this way, one should also record a description of the new variable, using the command **addesc()**. The syntax takes first the name of the new variable, and then the description.

```
dx$inhreal=(SCCS$v278>1)*1
addesc("inhreal","Dummy: real property is inherited")
dx$inhmove=(SCCS$v279>1)*1
addesc("inhmove","Dummy: movable property is inherited")
dx$marrgood=(SCCS$v208<4)*1
addesc("marrgood","Dummy: marriage includes transfer of goods")
```

Dummy variables (variables taking on the values zero or one) should be added using the command **mkdummy()**. This command will automatically record a variable description. Dummy variables are appropriate for categorical variables. The syntax of **mkdummy()** takes first the categorical variable name, and then the category number (these can be found in the codebook for each ethnological dataset). Note that the resulting dummy variable will be called *variable name*+*.d*+*category number*.

`mkdummy("v245",2)`

```
## [1] "This dummy variable is named v245.d2"
## [1] "The variable description is: 'Milking of Domestic Animals == Milked more often than sporadically'"
```

After making any new variables, list the variables you intend to use in your analysis in the following form.

```
evm<-c("v238","v921","v928","v1685","v232","v206","v245.d2",
"v270","v272","v237","v155","v72","v1726","inhreal",
"inhmove","marrgood","v63","v64","v1665","v1666",
"v1667","v666","v767","v768","v770","v773","v891",
"v1649","v1650")
```

Missing values of these variables are then imputed, using the command **doMI()**. Below, the number of imputed datasets is 5, and 7 iterations are used to estimate each imputed value (5 imputations is borderline OK, 10 or 15 would be better). The stacked imputed datasets are collected into a single dataframe which here is called *smi*.

This new dataframe *smi* will contain not only the variables in evm, but also a set of normalized (mean=0, sd=1) variables related to climate, location, and ecology (these are used in the OLS analysis to address problems of endogeneity). In addition, squared values are calculated automatically for variables with at least three discrete values and maximum absolute values no more than 300. These squared variables are given names in the format *variable name*+*Sq*.

Finally, *smi* contains a variable called *.imp*, which identifies the imputed dataset, and a variable called *.id* which gives the society name.

`smi<-doMI(evm,nimp=5,maxit=7)`

```
## [1] "--create variables to use as covariates--"
## [1] "--finding covariates for inhreal, inhmove --"
## [1] "v238"
## [1] "v1685"
## [1] "v272"
## [1] "v237"
## [1] "v72"
## [1] "v1726"
## [1] "inhreal"
## [1] "inhmove"
## [1] "v63"
## [1] "v64"
## [1] "v1665"
## [1] "v1666"
## [1] "v1667"
## [1] "v666"
## [1] "v767"
## [1] "v768"
## [1] "v770"
## [1] "v773"
## [1] "v891"
## [1] "v1649"
## [1] "v1650"
## [1] "WARNING: variable may not be ordinal--v272" "WARNING: variable may not be ordinal--v270" "WARNING: variable may not be ordinal--society"
## [4] "WARNING: variable may not be ordinal--dxid"
## Time difference of 17.99 secs
```

`dim(smi) # dimensions of new dataframe smi`

`## [1] 930 121`

`smi[1,] # first row of new dataframe smi`

```
## .imp .id v238 v1685 v272 v237 v72 v1726 inhreal inhmove v63 v64 v1665 v1666 v1667 v666 v767 v768 v770 v773 v891 v1649 v1650 x y
## 1 1 Nama 1 3 1 2 5 3 0 1 3 1 1 8 1 2 2 3 1 3 2 17 17 0.02853 -1.513
## x2 y2 xy Austronesian Nigercongo mht.name.dTropicalandsubtropicalgrasslandssavannasandshrublands
## 1 0.0008138 2.29 -0.04317 0 0 0
## mht.name.dTropicalandsubtropicalmoistbroadleafforests koeppengei.dAf koeppengei.dAw continent.dAfrica continent.dAsia continent.dNorthAmerica
## 1 0 0 0 1 0 0
## continent.dSouthAmerica region.dNorthernAmerica region.dSouthAmerica v921 v928 v232 v206 v245.d2 v270 v155 marrgood bio.1 bio.2 bio.3 bio.4
## 1 0 0 0 12 3 1 5 1 2 1 0 0.1303 1.954 0.07583 0.051
## bio.5 bio.6 bio.8 bio.9 bio.10 bio.11 bio.12 bio.13 bio.14 bio.15 bio.16 bio.17 bio.18 bio.19 meanalt mnnpp sdalt society lati long
## 1 0.4836 -0.226 0.4157 -0.2491 0.1784 0.04833 -1.124 -1.112 -0.7141 1.391 -1.078 -0.7515 -0.717 -0.9039 1.024 -1.02 -0.4929 Nama -23.32 17.08
## dxid v238Sq v1685Sq v272Sq v237Sq v72Sq v1726Sq v63Sq v64Sq v1665Sq v1666Sq v1667Sq v767Sq v768Sq v770Sq v773Sq v891Sq v1649Sq v1650Sq v921Sq
## 1 1 1 9 1 4 25 9 9 1 1 64 1 4 9 1 9 4 289 289 144
## v928Sq v232Sq v206Sq v270Sq v155Sq bio.1Sq bio.2Sq bio.3Sq bio.4Sq bio.5Sq bio.6Sq bio.8Sq bio.9Sq bio.10Sq bio.11Sq bio.12Sq bio.13Sq bio.14Sq
## 1 9 1 25 4 1 0.01699 3.818 0.00575 0.002601 0.2339 0.05106 0.1728 0.06206 0.03184 0.002336 1.263 1.237 0.5099
## bio.15Sq bio.16Sq bio.17Sq bio.18Sq bio.19Sq meanaltSq mnnppSq sdaltSq latiSq longSq dxidSq
## 1 1.935 1.161 0.5648 0.5141 0.8171 1.049 1.04 0.2429 543.7 291.8 1
```

The variables for a scale can be combined using the function *mkscale*. The function can calculate three different kinds of scales: 1) based on linear programming as described in Eff (2010); 2) the mean of the standardized values; 3) the first principal component of the standardized values. Below, three scales are created using the first principal component.

One should look at the output of *mkscale* to ensure that the components correlate in the expected way with the final scale. The column *inv* in *fec$corrs* shows that whether the variable was inverted. Compare that with the description of the variable, and its factor levels, to understand whether it correlates as expected.

```
# ==agricultural potential==
agp<-c("v921","v928")
fec<-mkscale(compvarbs="agp", udnavn="PCAP", impdata=smi, set.direction="v921",
type="pc1", add.descrip="1st PC: Agricultural potential high")
```

```
## [1] "PCAP"
## [1] "Pct Variance Explained by component"
## Comp.1 Comp.2
## 0.8257 0.1743
## c("v921", "v928")
```

```
#--check reasonableness of scale--
fec$stats
```

```
## std.alpha
## 1 0.7889
```

`fec$corrs`

```
## min.load max.load cor.w.scale inv varb description
## v921 0.7071 0.7071 0.909 1 v921 Agricultural Potential 1: Sum of Land Slope, Soils, Climate Scales
## v928 0.7071 0.7071 0.909 1 v928 Agricultural Potential 2: Lowest of Land Slope, Soils, Climate Scales
```

```
smi[,names(fec$scales)]<-fec$scales
# ==commmunity size==
csz<-c("v63","v237")
fec<-mkscale(compvarbs="csz", udnavn="PCsize", impdata=smi, set.direction="v63",
type="pc1", add.descrip="1st PC: Community size large")
```

```
## [1] "PCsize"
## [1] "Pct Variance Explained by component"
## Comp.1 Comp.2
## 0.7325 0.2675
## c("v237", "v63")
```

```
#--check reasonableness of scale--
fec$stats
```

```
## std.alpha
## 1 0.6347
```

`fec$corrs`

```
## min.load max.load cor.w.scale inv varb description
## v63 0.7071 0.7071 0.856 1 v237 Jurisdictional Hierarchy beyond Local Community
## v237 0.7071 0.7071 0.856 1 v63 Community Size
```

```
smi[,names(fec$scales)]<-fec$scales
# ==violence==
vio<-c("v1665","v1666","v1667")
fec<-mkscale(compvarbs="vio", udnavn="PCviol", impdata=smi, set.direction="v1665",
type="pc1", add.descrip="1st PC: High levels of violence")
```

```
## [1] "PCviol"
## [1] "Pct Variance Explained by component"
## Comp.1 Comp.2 Comp.3
## 0.5996 0.2447 0.1558
## c("v1667", "v1665", "v1666")
```

```
#--check reasonableness of scale--
fec$stats
```

```
## std.alpha
## 1 0.6613
```

`fec$corrs`

```
## min.load max.load cor.w.scale inv varb description
## v1667 0.4802 0.5178 0.675 1 v1667 Individual Aggression - Theft
## v1665 0.5906 0.6147 0.806 1 v1665 Individual Aggression - Homicide
## v1666 0.6056 0.6485 0.833 1 v1666 Individual Aggression - Assault
```

`smi[,names(fec$scales)]<-fec$scales`

All of the variables selected to play a role in the model must be found in the new dataframe *smi*. Below, the variables are organized according to the role they will play.

```
# --dependent variable--
dpV<-"v238"
#--independent variables in UNrestricted model--
UiV<-c("PCAP","PCsize","PCsizeSq","PCviol",
"v1685","v232","v206","v245.d2",
"v270","v272","v155","v72","v1726","inhreal",
"inhmove","marrgood","v64",
"v666","v767","v768","v770","v773","v891",
"v1649","v1650")
#--independent variables in restricted model (all must be in UiV above)--
RiV<-c("PCAP","PCsize","PCsizeSq",
"v1685","v206","v272","v1650")
```

The command **doOLS()** estimates the model on each of the imputed datasets, collecting output from each estimation and processing them to obtain final results. To control for Galtons Problem, a network lag model is used, with the user able to choose a combination of geographic proximity (dw), linguistic proximity (lw), and ecological similarity (ew) weight matrices. In most cases, the user should choose the default of dw=TRUE, lw=TRUE, ew=FALSE.

There are several options that increase the time **doOLS()** takes to run: stepW runs a background stepwise regression to find which variables perform best over the set of estimations; relimp calculates the relative importance of each variable in the restricted model, using a technique to partition R^{2}; slmtests calculates LaGrange multiplier tests for spatial dependence using the three weight matrices. All of these should be set to FALSE if one wishes to speed up estimation times.

```
h<-doOLS(smi,depvar=dpV,indpv=UiV,rindpv=RiV,othexog=NULL,
dw=TRUE,lw=TRUE,ew=FALSE,stepW=FALSE,boxcox=FALSE,getismat=FALSE,
relimp=TRUE,slmtests=FALSE,haustest=NULL,mean.data=TRUE,doboot=300,full.set=FALSE)
```

```
## [1] "--finding optimal weight matrix------"
## [1] "Exogenous variables used to instrument Wy: xWPCAP, xWPCsize, xWv245.d2, xWv270, xWv272, xWv72, xWv1726, xWmarrgood, xWv64, xWv666, xWv767, xWv768, xWv770, xWv891, xWv1649, xWv1650, xWPCAPSq, xWPCviolSq, xWv1685Sq, xWv270Sq, xWv272Sq, xWv72Sq, xWv1726Sq, xWv64Sq, xWv768Sq, xWv891Sq, xWv1649Sq"
## [1] "--looping through the imputed datasets--"
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## Time difference of 25.96 secs
```

`names(h)`

```
## [1] "DependVarb" "URmodel" "model.varbs" "Rmodel" "EndogeneityTests"
## [6] "Diagnostics" "OtherStats" "DescripStats.ImputedData" "DescripStats.OriginalData" "totry"
## [11] "didwell" "usedthese" "dfbetas" "data"
```

The output from **doOLS()**, here called *h*, is a list containing 14 items, explained in more detail in the manual.

name |
description |
---|---|

DependVarb | Description of dependent variable. |

URmodel | Coefficient estimates from the unrestricted model (includes standardized coefficients and VIFs). Two pvalues are given for H0: coefficient =0. One is the usual pvalue, the other (hcpval) is heteroskedasticity consistent. If stepkept=TRUE, the table will also include the proportion of times a variable is retained in the model using stepwise regression. |

model.varbs | Short descriptions of model variables: shows the meaning of the lowest and highest values of the variable. This can save a trip to the codebook. |

Rmodel | Coefficient estimates from the restricted model. If relimp=TRUE, the R^{2} assigned to each independent variable is shown here. |

EndogeneityTests | Hausman tests (H0: variable is exogneous), with F-statistic for weak instruments (a rule of thumb is that the instrument is weak if the F-stat is below 10), and Sargan test (H0: instrument is uncorrelated with second-stage 2SLS residuals). |

Diagnostics | Regression diagnostics for the restricted model: RESET test (H0: model has correct functional form); Wald test (H0: appropriate variables dropped); Breusch-Pagan test (H0: residuals homoskedastic; Shapiro-Wilkes test (H0: residuals normal); Hausman test (H0: Wy is exogenous); Sargan test (H0: residuals uncorrelated with instruments for Wy). If slmtests=TRUE, the LaGrange multiplier tests (H0: spatial error model not appropriate) are reported here. |

OtherStats | Other statistics: Composite weight matrix weights; R^{2} for restricted model and unrestricted model; number of imputations; number of observations; Fstat for weak instruments for Wy. |

DescripStats.ImputedData | Descriptive statistics for model variables found only in imputed data. |

DescripStats.OriginalData | Descriptive statistics for model variables found in pre-imputation dataset. |

totry | Character string of variables that were most significant in the unrestricted model as well as additional variables that proved significant using the add1 function on the restricted model. |

didwell | Character string of variables that were most significant in the unrestricted model. |

usedthese | Table showing how observations used differ from observations not used, regarding ecology, continent, and subsistence. |

dfbetas | Influential observations for dfbetas. |

data | Data as used in the estimations. Observations with missing values of the dependent variable have been dropped. If mean.data=TRUE, will output format that can be used to make maps. |

The last two items in the list can be quite large. Here are three of the first 12 items:

`h$Rmodel`

```
## coef stdcoef VIF relimp pval hcpval bootpval star desc
## (Intercept) -0.21019 NaN NaN NaN 0.52580 0.45023 0.51835 <NA>
## PCAP -0.12464 -0.13448 1.172 0.01247 0.02695 0.01571 0.02610 ** 1st PC: Agricultural potential high
## PCsize 0.22209 0.23416 1.774 0.04000 0.00172 0.00103 0.00179 *** 1st PC: Community size large
## PCsizeSq -0.11570 -0.20130 1.442 0.01421 0.00299 0.01046 0.02151 ** 1st PC: Community size large Squared
## v1650 -0.03384 -0.18813 1.081 0.02190 0.00219 0.00098 0.00110 *** Frequency of External Warfare (Resolved Rating)
## v1685 0.05800 0.06101 1.034 0.00585 0.31624 0.31379 0.32538 Chronic Resource Problems (Resolved Ratings)
## v206 0.09371 0.16248 1.498 0.12130 0.01773 0.01732 0.01799 ** Dependence on Animal Husbandry (Atlas 4)
## v272 0.12351 0.07558 1.169 0.03591 0.21122 0.09062 0.13173 Caste Stratification (Endogamy)
## Wy 1.11370 0.51095 1.646 0.25200 0.00000 0.00000 0.00000 *** Network lag term
```

`h$Diagnostics`

```
## Fstat df pvalue star
## RESET test. H0: model has correct functional form 2.6897 15221 0.1010
## Wald test. H0: appropriate variables dropped 0.2029 67 0.6538
## Breusch-Pagan test. H0: residuals homoskedastic 1.0637 5551 0.3024
## Shapiro-Wilkes test. H0: residuals normal 7.6729 82395 0.0056 ***
## Hausman test. H0: Wy is exogenous 1.1716 828 0.2794
## Sargan test. H0: residuals uncorrelated with instruments 2.3389 1428 0.1264
```

`h$OtherStats`

```
## d l e Weak.Identification.Fstat R2.final.model R2.UR.model nimp nobs BClambda
## 1 0.7 0.3 0 21.19 0.5039 0.5506 5 168 none
```

The 14^{th} item in list *h* is a dataframe containing mean values of variables across imputations. This can be used to make maps, employing the functions **mkmappng()** (for ordinal data) or **mkcatmappng()** (for categorical data).

`mkmappng(h$data,"v238","v238MoralGods",show="ydata",numnb.lg=3,numnb.lm=20,numch=5,pvlm=.05,dfbeta.show=TRUE)`

`## Loading required package: mapproj`

```
## pdf
## 2
```

Click here to see the map png

There are squared terms in the estimated model, which makes the coefficients a bit difficult to comprehend. The function **plotSq()** will make a plot for each of the variables with squared terms: the abscissa gives the values of the variable found in the averaged data, while the ordinate gives the marginal effect on the dependent variable. The number of observations at each value is shown both by the rugplots in green at the top of the plot, and by the size of the red circles at each variable value.

`plotSq(h)`

One can also write the list *h* to a csv format file that can be opened as a spreadsheet. The following command writes *h* to a file in the working directory called *olsresults.csv*.

`CSVwrite(h,"olsresults",FALSE)`

Click here to see the spreadsheet csv