Objects contained in R workspace DEf01f.Rdata

Datasets
EA	Ethnographic Atlas dataset
EAkey	Ethnographic Atlas metadata file
EAfact	Ethnographic Atlas dataset with factor labels
EAcov	Ethnographic Atlas variable covariates for imputation
LRB	Binford forager dataset
LRBkey	Binford forager metadata file
LRBfact	Binford forager dataset with factor labels
LRBcov	Binford forager variable covariates for imputation
SCCS	Standard Cross-Cultural Sample dataset
SCCSkey	Standard Cross-Cultural Sample metadata file
SCCSfact	Standard Cross-Cultural Sample dataset with factor labels
SCCScov	Standard Cross-Cultural Sample variable covariates for imputation
WNAI	Western North American Indians dataset
WNAIkey	Western North American Indians metadata file
WNAIfact	Western North American Indians dataset with factor labels
WNAIcov	Western North American Indians variable covariates for imputation
XC	Merged 371 society dataset
XCkey	Merged 371 society metadata file
XCfact	Merged 371 society dataset with factor labels
XCcov	Merged 371 society variable covariates for imputation
llm	Matrix of linguistic proximities between all pairs of societies
Undocumented functions
chK	auxiliary function that finds some characteristics of variables in dataframe
chkpmc	auxiliary function that checks variables for high collinearity
gSimpStat	auxiliary function that obtains descriptive statistics for numeric variables in dataframe
kln	auxiliary function that converts all variables in a dataframe to either numeric or character
mmgg	auxiliary function that cleans up output from aggregate() function
quickdesc	auxiliary function that outputs summary of codebook description for variable
resc	auxiliary function that rescales a variable
rmcs	auxiliary function that removes characters common to a set of strings
rnkd	auxiliary function that assigns ranks to values (1=lowest)
showlevs	auxiliary function that describes largest and smallest values of a variable
spmang	auxiliary function that removes leading and trailing spaces from string
widen	auxiliary function that widens the range of a variable
Documented functions
setDS	sets up environment to work with one of the five datasets (EA, LRB, SCCS, WNAI, XC)
mkdummy	makes dummy variable and creates entry for it in metadata
mknwlag	makes network lag variable
addesc	adds or changes description of variable in metadata
fv4scale	helper function to find variables for use in a scale
doMI	creates multiple imputed datasets
mkscale	makes a scale (composite index) from several similar variables
doOLS	estimates regression model using OLS with imputed datasets, including network lag term
doLogit	estimates regression model using logit with imputed datasets, including network lag term
doMNLogit	estimates model using multinomial logit with imputed datasets, including network lag term
CSVwrite	writes objects to csv format file
mkmappng	plots an ordinal variable on world map and writes a png format file
mkcatmappng	plots a categorical variable on world map and writes a png format file
plotSq	plots effects of all independent variables with squared terms and writes a png format file
MEplots	plots marginal effects of independent variables used in doMNLogit

setDS Select ethnological dataset to use in subsequent analysis

Description

Prior to running any other function, one must select the ethnological dataset to be used. The function creates the appropriate weight matrices and other auxiliary objects.

Usage

setDS(dsname)

Arguments

dsname

name of ethnological dataset (one of: "SCCS", "LRB", "WNAI", "EA", "XC")

Value

The function writes the following objects to the global environment, where they are accessible to the other functions.

cov	Names of covariates to use during imputation step
dx	The selected ethnological dataset is now called dx
dxf	The factor version of dx
key	A metadata file for dx
wdd	A geographic proximity weight matrix for the societies in dx
wee	An ecological similarity weight matrix for the societies in dx
wll	A linguistic proximity weight matrix for the societies in dx
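The text does not say how the weight matrices are constructed. Purely as an illustration of the general shape of a geographic proximity matrix like wdd, the sketch below converts a hypothetical distance matrix into inverse-distance weights with a zero diagonal, scaled so that each row sums to one. Both the helper name proximity_weights and the inverse-distance rule are assumptions, not setDS's documented method.

```r
# Hypothetical helper: inverse-distance proximity weights, row-standardized.
# Illustrative assumption only; setDS may build wdd differently.
proximity_weights <- function(dmat) {
  stopifnot(is.matrix(dmat), nrow(dmat) == ncol(dmat))
  w <- 1 / dmat          # closer societies receive larger weights
  diag(w) <- 0           # a society is not its own neighbor
  w / rowSums(w)         # divide each row by its sum, so rows sum to one
}

# Toy distances (arbitrary units) between three societies
d <- matrix(c(0, 1, 4,
              1, 0, 2,
              4, 2, 0), nrow = 3, byrow = TRUE)
W <- proximity_weights(d)
rowSums(W)   # each row sums to 1
```

Row standardization makes W %*% x a weighted average of the other societies' values, which is the usual form of a network lag.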

Details

Note

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

setDS("SCCS")

mkdummy Make dummy variable and store a description in key file

Description

The function makes a dummy variable from a variable, and creates a description which is used in doOLS output.

Usage

mkdummy(varb, val, rlt="==", showname=TRUE)

Arguments

varb	name of a variable
val	the value of variable varb for which the dummy equals one.
rlt	one of: "==", ">", "<", ">=", "<="
showname	should variable name and description print to the console?

Value

With rlt="==" (the default), the function returns a variable named varb.dval, which equals one when varb==val and zero otherwise. Dummies built with the other relational operators are named as follows: rlt=">=" returns varb.dGeval; rlt=">" returns varb.dGtval; rlt="<=" returns varb.dLeval; and rlt="<" returns varb.dLtval.
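The coding and naming rule above can be sketched as follows. make_dummy_sketch is a hypothetical helper that only reproduces the 0/1 coding and the name suffix; it does not write a description to the key file, and keeping missing values missing is an assumption about how mkdummy treats them.

```r
# Hypothetical sketch of mkdummy's coding and naming rule (key-file step omitted).
make_dummy_sketch <- function(x, val, rlt = "==") {
  suffix <- c("==" = "d", ">=" = "dGe", ">" = "dGt", "<=" = "dLe", "<" = "dLt")
  stopifnot(rlt %in% names(suffix))
  cmp <- match.fun(rlt)                   # e.g. the `>=` operator as a function
  list(suffix = paste0(".", suffix[[rlt]], val),
       value  = as.integer(cmp(x, val)))  # 1/0; NA stays NA (assumption)
}

v70 <- c(1, 3, 5, NA, 3)
make_dummy_sketch(v70, 3)         # suffix ".d3",   value 0 1 0 NA 1
make_dummy_sketch(v70, 3, ">=")   # suffix ".dGe3", value 0 1 1 NA 1
```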

Details

There are two reasons why one should use this function to create dummy variables. First, it makes it possible to use the predetermined set of best covariates, found in the auxiliary file "cov", for multiple imputation in doMI. Second, the function will automatically append a description for the dummy variable to the key file, which is then available for use in doOLS output. The description is created using the variable name from the key file and the description of the value from the factor version of the ethnological dataset.

Note

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

mkdummy("v70",3) # the default creates variable v70.d3

mkdummy("v70",3,"==") # can also create variable v70.d3 like this

mkdummy("v70",3,">=") # creates variable v70.dGe3

mkdummy("v70",3,"<=") # creates variable v70.dLe3

mkdummy("v70",3,"<") # creates variable v70.dLt3

mkdummy("v70",3,">") # creates variable v70.dGt3

mknwlag Make network lag variable

Description

The function makes a network lag variable.

Usage

mknwlag(MIdata,wtMat,varb)

Arguments

MIdata	multiply imputed dataset, produced using doMI()
wtMat	weight matrix, typically wdd, wll, or wee
varb	name of a variable found in data.frame MIdata

Value

The function returns a variable which is the network lag of varb.

Details

The primary reason to use this function would be to create a network lagged independent variable. Note that this function is not suitable for creating an independent variable which is the network lag of the dependent variable, since such a variable would be endogenous.
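A network lag of a variable x is usually computed as Wx: for each society, the weighted average of x over the other societies, with weights taken from that society's row of W. The sketch below does this separately within each imputation of a stacked dataset, assuming W is row-standardized and that, within every imputation, societies appear in the same order as the rows of W. The helper name is hypothetical; mknwlag's actual implementation is not documented here.

```r
# Hypothetical sketch: network lag of `varb`, computed within each imputation.
# Assumes W is n x n and rows of each imputation follow the row order of W.
network_lag_sketch <- function(MIdata, W, varb) {
  stats::ave(MIdata[[varb]], MIdata$.imp,
             FUN = function(x) as.vector(W %*% x))  # weighted mean of neighbors
}

W <- matrix(c(0, .5, .5,
              1,  0,  0,
              0,  1,  0), nrow = 3, byrow = TRUE)
toy <- data.frame(.imp = rep(1:2, each = 3), v = c(1, 2, 3, 4, 5, 6))
network_lag_sketch(toy, W, "v")   # 2.5 1.0 2.0 5.5 4.0 5.0
```

stats::ave returns the results in the original row positions, so the lagged variable can be assigned straight back as a new column, as in the example below.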

Note

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

# frequency with which neighbors engage in external war

smi$nbwar<-mknwlag(smi,wdd,"v1650")

addesc Add a variable description to the key file

Description

The function adds a variable description to the key file. This is useful in cases where a new variable is created, whose description is not yet in the key file. The description is then available for use in doOLS output.

Usage

addesc(nvbs, nvbsdes)

Arguments

nvbs	name of variable
nvbsdes	description of nvbs

Value

The function appends the description to the key file.

Details

Note

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

dx$valchild <-(dx$v473+dx$v474+dx$v475+dx$v476)

addesc("valchild", "Degree to which society values children")

fv4scale Find potential components for scale

Description

The function scans the metadata for keywords and returns a set of variable names that might be suitable either as independent variables or as components of a scale. It can help in quickly identifying potential scale components, but care should be taken to eliminate unsuitable ones.

Usage

fv4scale(lookword, dropword=NULL, keepword=NULL, coreword=NULL, nmin=93, minalpha=.7, chklevels=FALSE, verbose=TRUE, doscale=TRUE)

Arguments

lookword	keywords to look for in variable descriptions (from metadata)
dropword	drop identified variables that also contain these keywords
keepword	keep only identified variables that also contain these keywords
coreword	the most important keywords; keep only those variables that correlate highly with the set of variables containing these keywords
nmin	look only for variables with at least this many non-missing values
minalpha	minimum value of Cronbach’s alpha for set of variables (those least conforming will be eliminated until this target is hit)
chklevels	should factor levels also be scanned for keywords (in addition to variable descriptions)?
verbose	should the function write information about the variables to the console? (This can help in deciding which variables to keep.)
doscale	will variables be used in a scale? If TRUE (the default), the function selects variables that result in a suitably high Cronbach’s alpha. If FALSE, the function simply follows the logical rules implicit in lookword, keepword, and dropword.

Value

The function returns a character vector of variable names.

Details

The function should be used with caution. It provides only candidate variables, not necessarily the best variables, to include in a scale. The widest set of candidate variables can be found by setting chklevels=TRUE, which creates dummy variables for those variables containing a keyword within a factor level label. After identifying variables with keywords in lookword, retaining those meeting the keepword condition, and dropping those meeting the dropword condition, the procedure narrows down the set of retained variables further by looking at the covariances among them. It does this in two ways. First, if the coreword option is used, the variables containing the coreword keywords are compared to those not containing them, and of the latter set, only those correlating most strongly with the coreword set are retained. Second, Cronbach’s alpha is calculated for the set of candidate variables; if alpha<minalpha, the variable whose removal most increases alpha is dropped. This step is repeated until alpha≥minalpha.
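The backward elimination on Cronbach’s alpha can be sketched as follows, assuming numeric component variables and listwise deletion. Alpha uses the usual formula alpha = k/(k-1) * (1 - sum of item variances / variance of the item total). The keyword and coreword filtering is omitted, and both function names are hypothetical.

```r
# Cronbach's alpha with listwise deletion.
cronbach_alpha <- function(df) {
  df <- df[stats::complete.cases(df), , drop = FALSE]
  k <- ncol(df)
  item_var <- sum(vapply(df, stats::var, numeric(1)))
  total_var <- stats::var(rowSums(df))
  (k / (k - 1)) * (1 - item_var / total_var)
}

# Hypothetical sketch: drop the variable whose removal most raises alpha,
# until alpha >= minalpha (or only two variables remain).
prune_to_alpha <- function(df, minalpha = 0.7) {
  while (ncol(df) > 2 && cronbach_alpha(df) < minalpha) {
    alpha_if_dropped <- vapply(seq_len(ncol(df)),
      function(j) cronbach_alpha(df[, -j, drop = FALSE]), numeric(1))
    df <- df[, -which.max(alpha_if_dropped), drop = FALSE]
  }
  names(df)
}

d <- data.frame(x1 = 1:10, x2 = 1:10, x3 = 10:1)  # x3 runs against the others
prune_to_alpha(d, 0.7)   # "x1" "x2"
```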

The function fv4scale is run on the original data dx, as created by the function setDS. The alpha produced here is calculated using listwise deletion, and might be lower when a scale is created with multiply imputed data, using the function mkscale.

Note

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

# --finds SCCS variables related to female economic contribution--

femecon<-fv4scale(lookword=c("market", "exchange", "wage", "trade", "subsistence", "goods", "product", "labor"), keepword=c("female", "women", "woman"), coreword=c("subsistence"), nmin=60, chklevels=TRUE, verbose=FALSE)

doMI Produce multiply imputed datasets

Description

The function produces multiply imputed datasets from an ethnological dataset, using methods from the mice package.

Usage

smi<-doMI(varbnames, nimp=10, maxit=7)

Arguments

varbnames	names of variables to include in the imputed data.
nimp	the number of imputed datasets to create (default=10)
maxit	the number of iterations used to estimate imputed data (default=7).

Value

The function doMI returns a dataframe containing the number of imputed datasets specified by the nimp option. The datasets are stacked one atop the other, and indexed by the variable ".imp".

Details

This function imputes several new datasets, using covariates for each variable to create a conditional distribution of estimates for each missing value, and then replacing the missing value with a draw from the distribution; as a result, each of the imputed datasets will typically have slightly different values for the estimated cells. The key to successful imputation is to have good covariates for each variable. The auxiliary file "cov" lists the best covariates found in a lengthy specification search. For those variables with no covariates found in "cov" (such as user-created variables), the best covariates are selected from a set of variables with no missing values, including network lag variables (based on geographic distance, language, and ecology).

The first argument is a vector of variable names, all of which must be found in the ethnological dataset (transformed variables must be added to the ethnological dataset prior to running doMI). These will be the data used in model building: one should include all data one thinks might be useful, including all transformed data, but no additional data. The second argument is the number of imputed datasets to create: between 5 and 10 imputed datasets are considered adequate, but there is no harm in choosing more; the default is 10. The third argument is the number of iterations to perform in creating each imputed dataset; the default is 7.

It is usually a good idea to look at the returned dataframe to see what variables it contains. It will contain not only the variables listed in varbnames, but also a set of normalized (mean=0, sd=1) climate and ecology variables that will be used as exogenous variables in the function doOLS. In addition, every variable with at least three distinct values and a maximum absolute value less than 300 will also have its square added (the squared variables all have the suffix "Sq"). Finally, the data.frame contains a variable called ".imp", which identifies the imputed dataset, and a variable called ".id", which gives the society name.
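Because the imputed datasets are stacked and indexed by ".imp", a common way to analyze them is to fit the same model separately in each imputation and combine the results with Rubin’s rules: the pooled estimate is the mean across imputations, and the pooled variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The minimal sketch below does this for an OLS model. The name pool_rubin is hypothetical, at least two imputations are assumed, and the text does not state that doOLS pools its estimates exactly this way.

```r
# Hypothetical sketch: fit one OLS model per imputation, pool with Rubin's rules.
pool_rubin <- function(stacked, formula) {
  fits <- lapply(split(stacked, stacked$.imp),
                 function(d) stats::lm(formula, data = d))
  m <- length(fits)
  est <- do.call(cbind, lapply(fits, stats::coef))                  # p x m estimates
  within <- do.call(cbind, lapply(fits, function(f) diag(stats::vcov(f))))
  qbar <- rowMeans(est)                       # pooled coefficient estimates
  W <- rowMeans(within)                       # mean within-imputation variance
  B <- apply(est, 1, stats::var)              # between-imputation variance
  total <- W + (1 + 1 / m) * B
  data.frame(estimate = qbar, se = sqrt(total))
}

d <- data.frame(x = 1:10,
                y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2))
d1 <- d; d1$.imp <- 1
d2 <- d; d2$.imp <- 2
res <- pool_rubin(rbind(d1, d2), y ~ x)   # identical imputations: B = 0
```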

Note

Based on the methods proposed by Malcolm M. Dow and E. Anthon Eff.

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

scnn<-c("v1649", "v1127", "v2137", "v1265")

smi<-doMI(scnn, nimp=10, maxit=7)

dim(smi) # dimensions of new dataframe smi

smi[1:2, ] # first two rows of new dataframe smi

mkscale Calculate scale (composite index) from component variables

Description

The function calculates a scale from a multiply imputed dataset.

Usage

mkscale(compvarbs, udnavn=NULL, impdata, type="LP", add.descrip=NULL, set.direction=NULL, set.range=NULL)

Arguments

compvarbs	names of component variables to include in the scale.
udnavn	the name of the scale.
impdata	the multiply imputed dataset containing the component variables (e.g., as created by doMI).
type	the method to use in calculating the scale (one of "LP", "mean", "pc1").
add.descrip	the description of the scale, to add to the metadata file.
set.direction	a component variable name, with which the scale should positively correlate.
set.range	two numbers, such as c(0,10), which will become the lower and upper bound of the rescaled scale.

Value

scales	a dataframe, with two values for each observation in the input data: the calculated scale, and its square.
stats	Cronbach’s alpha for the scale components.
corrs	correlation between scale and scale component variables.
varb.desc	component variable descriptions, as rendered by the function quickdesc().

Details

The function can calculate three different kinds of scales: 1) type="LP", based on linear programming as described in Eff (2010); 2) type="mean", the mean of the standardized values; 3) type="pc1", the first principal component of the standardized values. Components that vary negatively with the total scale are multiplied by -1; all components are then standardized (mean=10, sd=1).
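The "mean" and "pc1" variants, together with the sign inversion of components that run against the scale, can be sketched as below. This is a hypothetical helper, not mkscale itself: the LP variant is not shown, and the sketch centers components at 0 rather than 10 (a constant shift does not change any correlation with the scale).

```r
# Hypothetical sketch of the "mean" and "pc1" scale types (LP type omitted).
scale_sketch <- function(comp, type = c("mean", "pc1")) {
  type <- match.arg(type)
  z <- scale(as.matrix(comp))                        # standardize each component
  first_pass <- rowMeans(z)
  inv <- ifelse(as.vector(stats::cor(z, first_pass)) < 0, -1, 1)
  z <- sweep(z, 2, inv, `*`)                         # invert components running against the scale
  s <- if (type == "mean") rowMeans(z) else stats::prcomp(z)$x[, 1]
  if (stats::cor(s, rowMeans(z)) < 0) s <- -s        # a principal component's sign is arbitrary
  list(scale = s, inv = inv)
}

comp <- data.frame(a = 1:10, b = c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9), c = 10:1)
scale_sketch(comp, "pc1")$inv   # 1 1 -1: component c was inverted
```

As with the "inv" column of the corrs output, a -1 marks an inverted component; after inversion every component should correlate positively with the scale.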

Output is a list that includes the scale itself, as well as some statistics to help assess whether the scale is performing as desired. The corrs object should be examined: all correlations between components and total scale are positive since those that originally correlated negatively were multiplied by -1. The column labeled "inv" indicates with a "-1" those components that were inverted. The column "levels" reports the factor level labels, and provides a way to understand what higher values of a variable mean. If one variable correlates with the total scale in a way inconsistent with the other variables, then one should try again to find good component variables.

Note

Based on the methods proposed by Malcolm M. Dow and E. Anthon Eff.

Eff, E. A. (2010). A scale for markets and property using the Standard Cross-Cultural Sample: a linear programming approach. World Cultures eJournal. 17(2). Retrieved from: http://escholarship.org/uc/item/12k7z4st

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

scnn<-c(femecon, "v1649", "v1127", "v2137", "v1265")

smi<-doMI(scnn, nimp=10, maxit=7)

fec<-mkscale(compvarbs=femecon, udnavn="femecon.lp", impdata=smi,
  type="LP", add.descrip="female economic contribution (LP scale)")

#--check reasonableness of scale--

fec$stats

fec$corrs

smi[,names(fec$scales)]<-fec$scales

doOLS Estimate OLS model on multiply imputed data

Description

The function estimates unrestricted and restricted OLS models, each with a network lag term, and provides common diagnostics.

Usage

doOLS(MIdata, depvar, indpv, rindpv=NULL, othexog=NULL, dw=TRUE,
  lw=TRUE, ew=FALSE, stepW=FALSE, relimp=FALSE, slmtests=FALSE, haustest=NULL,
  boxcox=FALSE, getismat=FALSE, mean.data=TRUE, doboot=0, full.set=FALSE)

Arguments

MIdata	a multiply imputed dataset, created by the function doMI
depvar	the name of the dependent variable (must be in MIdata)
indpv	the names of the independent variables for the unrestricted model (must be in MIdata)
rindpv	names of restricted model independent variables (must be in indpv; when default of NULL is executed, the restricted model independent variables will be the same as the unrestricted model, minus the last variable)
othexog	names of additional exogenous variables (must be in MIdata; will be added to a list of 21 variables; default is NULL)
dw	Should geographic proximity be used in constructing composite weight matrix (default=TRUE)
lw	Should linguistic proximity be used in constructing composite weight matrix (default=TRUE)
ew	Should ecological proximity be used in constructing composite weight matrix (default=FALSE)
stepW	Should stepwise regression be done to show most-selected variables from unrestricted model (default=FALSE)
relimp	Should relative importance be calculated for independent variables of restricted model (default=FALSE)
slmtests	Should spatial error tests be run for the three weight matrices (default=FALSE)
haustest	Hausman tests (H0: variable exogenous) are run for each independent variable listed here (variable must be in the restricted model). Default of NULL runs no tests.
boxcox	When boxcox=TRUE, a Box-Cox transformation is applied to the dependent variable, to make residuals as normal as possible. Default is FALSE.
getismat	When getismat=TRUE, the distance weight matrix is modified in the way suggested by Getis and Aldstadt (2004). Default is FALSE.
mean.data	When mean.data=TRUE (the default), the returned list includes a dataframe with mean values (across imputations) of the unrestricted model variables for each society, as well as significant dfbeta scores for restricted model independent variables, and latitude and longitude. mean.data=FALSE returns the entire, unaggregated set of data.
doboot	Enter the number of bootstrap repetitions to calculate bootstrap standard errors. Legal values lie between 10 and 10,000. The default (doboot=0) does not calculate bootstrap standard errors.
full.set	The default uses von Hippel’s recommended method of deleting observations for which the dependent variable is missing. To use all observations, use full.set=TRUE.

Value

Returns a list with 14 elements:

DependVarb	Description of dependent variable
URmodel	Coefficient estimates from the unrestricted model (includes standardized coefficients and VIFs). Two p-values are given for H0: β = 0. One is the usual p-value; the other (hcpval) is heteroskedasticity consistent. If stepW=TRUE, the table will also include the proportion of times a variable is retained in the model using stepwise regression.
model.varbs	Short descriptions of model variables: shows the meaning of the lowest and highest values of the variable. This can save a trip to the codebook.
Rmodel	Coefficient estimates from the restricted model. If relimp=TRUE, the R² assigned to each independent variable is shown here.
EndogeneityTests	Hausman tests (H0: variable is exogenous), with F-statistic for weak instruments (a rule of thumb is that the instrument is weak if the F-stat is below 10), and Sargan test (H0: instrument is uncorrelated with second-stage 2SLS residuals).
Diagnostics	Regression diagnostics for the restricted model: RESET test (H0: model has correct functional form); Wald test (H0: appropriate variables dropped); Breusch-Pagan test (H0: residuals homoskedastic); Shapiro-Wilk test (H0: residuals normal); Hausman test (H0: Wy is exogenous); Sargan test (H0: residuals uncorrelated with instruments for Wy). If slmtests=TRUE, the Lagrange multiplier tests (H0: spatial error model not appropriate) are reported here.
OtherStats	Other statistics: composite weight matrix weights (see details); R² for restricted model and unrestricted model; number of imputations; number of observations; F-statistic for weak instruments for Wy.
DescripStats.ImputedData	Descriptive statistics for variables in unrestricted model, computed from the imputed data.
DescripStats.OriginalData	Descriptive statistics for variables in unrestricted model, computed from the original (unimputed) data.
totry	Character string of variables that were most significant in the unrestricted model as well as additional variables that proved significant using the add1 function on the restricted model.
didwell	Character string of variables that were most significant in the unrestricted model.
usedthese	Table showing how observations used differ from observations not used, regarding ecology, continent, and subsistence.
dfbetas	Influential observations for dfbetas (see details)
data	Data as used in the estimations. Observations with missing values of the dependent variable have been dropped.

Details

Users can choose any of three kinds of proximity/similarity weight matrices for constructing a network lag term: geographic, linguistic, and ecological. In most cases, users should choose only geographic and linguistic (the defaults). The optimal composite weight matrix, constructed as the weighted sum of the chosen weight matrices, is that which returns the most significant Lagrange multiplier statistic on the unrestricted model without network lag term (i.e., the composite matrix that finds the most autocorrelated structure in the unrestricted model residuals). The network lag term is entered in each model as the variable "Wy".
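The selection rule above can be sketched for two matrices with a grid search over the composite weight a in a*Wd + (1-a)*Wl, scoring each composite with the standard LM-error statistic for OLS residuals e: LM = (e'We / (e'e/n))² / tr(W'W + WW). The grid, the restriction to two matrices, and the function names are assumptions; the text does not say how doOLS searches over weights.

```r
# LM-error statistic (Anselin's form) for OLS residuals e and weight matrix W.
lm_error_stat <- function(e, W) {
  n <- length(e)
  s2 <- sum(e^2) / n
  trace_term <- sum(diag(crossprod(W) + W %*% W))   # tr(W'W + WW)
  (sum(e * (W %*% e)) / s2)^2 / trace_term
}

# Hypothetical sketch: grid search over a*Wd + (1 - a)*Wl; keep the largest LM.
best_composite <- function(e, Wd, Wl, grid = seq(0, 1, by = 0.1)) {
  stat <- vapply(grid, function(a) lm_error_stat(e, a * Wd + (1 - a) * Wl),
                 numeric(1))
  list(weight_d = grid[which.max(stat)], lm = max(stat))
}

e  <- c(1, 1, -1, -1)                                     # toy residuals
Wd <- matrix(0, 4, 4); Wd[cbind(c(1, 2, 3, 4), c(2, 1, 4, 3))] <- 1
Wl <- matrix(0, 4, 4); Wl[cbind(c(1, 3, 2, 4), c(3, 1, 4, 2))] <- 1
best_composite(e, Wd, Wl)
```

Since each statistic is compared against the same chi-square(1) distribution, "most significant" and "largest statistic" pick the same composite.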

The dfbetas are scaled changes in restricted model coefficient estimates caused by adding an observation to the restricted model. Negative values indicate that including that observation lowers the coefficient estimate; positive values indicate that inclusion raises the estimate. Only the most influential dfbetas are output.
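Base R's stats::dfbetas computes exactly this kind of scaled coefficient change for an lm fit (positive when including the observation raises the estimate). The sketch below keeps only observations with at least one |dfbeta| above 2/sqrt(n), a common rule of thumb; that cutoff is an assumption, since the text only says the most influential values are output. The helper name is hypothetical.

```r
# Hypothetical sketch: keep rows of dfbetas with any |value| above a cutoff.
influential_dfbetas <- function(fit, cutoff = 2 / sqrt(stats::nobs(fit))) {
  db <- stats::dfbetas(fit)
  keep <- apply(abs(db) > cutoff, 1, any)
  db[keep, , drop = FALSE]
}

d <- data.frame(x = 1:10,
                y = c(1.2, 1.8, 3.1, 4.2, 4.9, 6.1, 6.8, 8.2, 9.0, 25))
influential_dfbetas(lm(y ~ x, data = d))   # observation 10 raises the slope
```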

The stepwise procedure can provide additional insight into which independent variables provide the best model fit. Since the imputed datasets differ slightly from each other, the variables selected by a stepwise procedure typically differ slightly for each imputed dataset. If the stepW=TRUE option is chosen, a column labeled "stepkept" will be added to the table reporting unrestricted model results. The column reports the proportion of times the independent variable was retained in the model by a stepwise procedure using both forward and backward selection.
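The "stepkept" proportion can be sketched with base R's stats::step, run with direction = "both" on each imputed dataset. Note that step selects by AIC; the criterion doOLS uses is not stated, so that choice and the helper name are assumptions. The formula is built inside the per-imputation function so that step() can find the local data when it refits.

```r
# Hypothetical sketch: share of imputations in which stepwise selection keeps each variable.
step_kept_share <- function(stacked, depvar, indpv) {
  kept <- lapply(split(stacked, stacked$.imp), function(d) {
    f <- stats::reformulate(indpv, response = depvar)  # environment contains `d`
    chosen <- stats::step(stats::lm(f, data = d), direction = "both", trace = 0)
    attr(stats::terms(chosen), "term.labels")
  })
  vapply(indpv, function(v) mean(vapply(kept, function(k) v %in% k, logical(1))),
         numeric(1))
}

set.seed(7)
stacked <- do.call(rbind, lapply(1:3, function(i) {
  x1 <- rnorm(60); x2 <- rnorm(60)
  data.frame(.imp = i, y = 3 * x1 + rnorm(60), x1 = x1, x2 = x2)
}))
share <- step_kept_share(stacked, "y", c("x1", "x2"))
```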

The add1 function tests whether the members of a list of variables prove significant when added singly to a model. The list of variables includes all numeric variables in the imputed dataset, as well as squared terms of variables currently in the unrestricted regression. Variables proving significant in over 80 percent of the m estimated models (one per imputed dataset) are returned in the character string "totry".

Relative importance is a method of assigning R² to each independent variable. The method repeatedly estimates a model, first with one independent variable, then with two, etc. and calculates the change in R² as each variable is introduced. The order of entry is changed, and the process repeated, to consider all possible orders of entry. The relative importance measure is the average change in R² when introducing an independent variable across all these different orders of entry. With large numbers of independent variables, the calculations are prohibitively slow. Setting relimp=TRUE will calculate the relative importance of independent variables in the restricted model, and report these in the column labeled "relimp".
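The averaging over orders of entry described above (often called the LMG method) can be sketched directly for a small number of variables. This brute-force version enumerates all k! orders, which illustrates why it slows down quickly as k grows; the function names are hypothetical, and doOLS may compute the measure differently. By construction, the contributions sum to the full-model R².

```r
# All orderings of a vector (brute force; fine for a handful of variables).
all_orders <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in all_orders(v[-i])) out[[length(out) + 1]] <- c(v[i], rest)
  }
  out
}

r2_of <- function(d, depvar, vars) {
  if (length(vars) == 0) return(0)
  summary(stats::lm(stats::reformulate(vars, response = depvar), data = d))$r.squared
}

# Hypothetical sketch: average R^2 gain of each variable over all orders of entry.
relimp_lmg <- function(d, depvar, indpv) {
  orders <- all_orders(indpv)
  contrib <- stats::setNames(numeric(length(indpv)), indpv)
  for (ord in orders) {
    for (j in seq_along(ord)) {
      gain <- r2_of(d, depvar, ord[seq_len(j)]) - r2_of(d, depvar, ord[seq_len(j - 1)])
      contrib[ord[j]] <- contrib[ord[j]] + gain
    }
  }
  contrib / length(orders)
}

d <- data.frame(x1 = c(1, 2, 3, 4, 5, 6, 7, 8), x2 = c(2, 1, 4, 3, 6, 5, 8, 9),
                y  = c(1.5, 2.1, 3.9, 4.2, 5.8, 6.1, 8.2, 8.8))
ri <- relimp_lmg(d, "y", c("x1", "x2"))
```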

Endogeneity is a recognized problem with network lag terms. The Hausman test for endogenous regressors is performed on Wy, which is replaced by an instrumental variable which is the fitted value from regressing Wy on the network lagged other independent variables. The instrumental variable should be highly correlated with the endogenous variable, but not correlated with the 2SLS second-stage residual. A test for the latter is the Sargan test, with H0: residuals are uncorrelated with instruments. A test for the former is to calculate the F-statistic with H0: the excluded instruments are irrelevant in the first-stage regression; the rule of thumb is that this "weak identification F-stat" should be larger than 10. Since the weak identification F-stat will be low if irrelevant instruments are chosen, a stepwise procedure is used to select among a set of possible instruments including both the network lagged independent variables and the climate and ecology variables.
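A minimal sketch of the two checks, using the regression-based (control-function) form of the Durbin-Wu-Hausman test: regress the suspect regressor on the exogenous variables and instruments, add the first-stage residual to the main regression, and test its coefficient. The weak-identification F-statistic tests the excluded instruments in the first stage. In doOLS the suspect regressor would be Wy; this generic version, its names, and the simulated data are assumptions, and the Sargan test and stepwise instrument selection are omitted.

```r
# Hypothetical sketch: control-function Hausman test plus first-stage F for instruments.
hausman_cf <- function(d, y, endog, exog = character(0), instr) {
  first <- stats::lm(stats::reformulate(c(exog, instr), response = endog), data = d)
  no_instr <- stats::lm(stats::reformulate(if (length(exog)) exog else "1",
                                           response = endog), data = d)
  weakF <- stats::anova(no_instr, first)$F[2]   # H0: excluded instruments irrelevant
  d$vhat <- stats::residuals(first)             # assumes no missing values in d
  second <- stats::lm(stats::reformulate(c(endog, exog, "vhat"), response = y), data = d)
  p <- summary(second)$coefficients["vhat", "Pr(>|t|)"]  # H0: endog is exogenous
  list(weakF = weakF, hausman_p = p)
}

set.seed(42)
n <- 200
z <- rnorm(n); u <- rnorm(n)
sim <- data.frame(z = z, x = z + u)            # x shares the error component u with y
sim$y <- 1 + 2 * sim$x + 2 * u + rnorm(n)
res <- hausman_cf(sim, y = "y", endog = "x", instr = "z")
```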

All independent variables can be tested for endogeneity (squared variables are tested in their original form). For these, the potential instruments consist of the climate, location, and ecology variables, and stepwise regression is used to pick a significant subset. While these variables are certainly exogenous, they are unlikely to be good instruments, since finding good instruments is a process requiring a great deal of creativity and patience on the part of the econometrician, and is not something that can be automated. Thus, one should think carefully about variables that might serve as instruments for any variable one wishes to test for endogeneity, and include these in the othexog= option.

Heteroskedasticity biases the standard errors of estimated coefficients. If the Breusch-Pagan test rejects the null that errors are homoskedastic, one should use either the heteroskedasticity consistent p-values (hcpval) in the URmodel and Rmodel results, or the p-values from bootstrap standard errors. Bootstraps take a fairly long time to calculate, so one shouldn't set the number of repetitions too high; in most cases, good results can be obtained with doboot=500.
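Heteroskedasticity-consistent p-values come from a sandwich covariance matrix, (X'X)⁻¹ X' diag(e²) X (X'X)⁻¹, in place of the usual s²(X'X)⁻¹. The sketch below uses the HC1 small-sample scaling n/(n-k) and t-distribution p-values; which HC variant produces hcpval is not stated in the text, so that choice and the helper name are assumptions.

```r
# Hypothetical sketch: HC1 sandwich covariance and matching p-values for an lm fit.
hc1_sketch <- function(fit) {
  X <- stats::model.matrix(fit)
  e <- stats::residuals(fit)
  n <- nrow(X); k <- ncol(X)
  bread <- solve(crossprod(X))                 # (X'X)^-1
  meat <- crossprod(X * e)                     # sum of e_i^2 x_i x_i'
  V <- bread %*% meat %*% bread * n / (n - k)  # HC1 small-sample scaling
  tstat <- stats::coef(fit) / sqrt(diag(V))
  list(vcov = V,
       hcpval = 2 * stats::pt(abs(tstat), df = n - k, lower.tail = FALSE))
}

y <- c(1, 2, 3, 4, 6)
h <- hc1_sketch(lm(y ~ 1))   # intercept only: variance = sum(e^2)/n^2 * n/(n-1) = 0.74
```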

If the residuals are not normal, and introducing new independent variables or changing the functional form does not make them normal, one can use the Box-Cox transformation, in which the dependent variable y is replaced by (y^λ-1)/λ (or by log y when λ=0; y must be positive), with λ chosen to make the residuals as normal as possible.
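One way to read "as normal as possible" is sketched below: try λ on a grid and keep the value whose OLS residuals have the largest Shapiro-Wilk W. The Shapiro-Wilk criterion, the grid, and the function names are assumptions; a common alternative (e.g., MASS::boxcox) maximizes the profile likelihood instead.

```r
# Box-Cox transform; lambda = 0 is the log limit.
boxcox_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Hypothetical sketch: pick lambda maximizing the Shapiro-Wilk W of OLS residuals.
boxcox_by_shapiro <- function(d, depvar, indpv, grid = seq(-2, 2, by = 0.1)) {
  stopifnot(all(d[[depvar]] > 0))          # Box-Cox needs a positive dependent variable
  w <- vapply(grid, function(lam) {
    d$ybc_ <- boxcox_transform(d[[depvar]], lam)
    fit <- stats::lm(stats::reformulate(indpv, response = "ybc_"), data = d)
    unname(stats::shapiro.test(stats::residuals(fit))$statistic)
  }, numeric(1))
  grid[which.max(w)]
}

set.seed(3)
dd <- data.frame(x = runif(100, 0, 2))
dd$y <- exp(1 + dd$x + rnorm(100, sd = 0.2))   # log-normal errors
lam <- boxcox_by_shapiro(dd, "y", "x")
```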

Note

Based on the methods proposed by Malcolm M. Dow and E. Anthon Eff.

Getis, A. and Aldstadt, J. (2004). Constructing the spatial weights matrix using a local statistic. Geographical Analysis 36:90-104.

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

scnn<-c("valchild", "v1649", "v1127", "v2137", "v1265", "v245.d2")

smi<-doMI(scnn, nimp=10, maxit=7)

iv<-c("v1649", "v1127", "v2137", "v1265", "v245.d2")

riv<- c("v1649", "v1127", "v2137")

h<-doOLS(MIdata=smi, depvar="valchild", indpv=iv, rindpv=riv, othexog=NULL, dw=TRUE, lw=TRUE, ew=FALSE, stepW=FALSE, relimp=FALSE, slmtests=FALSE, haustest=NULL, boxcox=FALSE, getismat=FALSE, mean.data=TRUE, doboot=0, full.set=FALSE)

# look at first 11 elements in h

h[1:11]

doLogit Estimate logit model on multiply imputed data

Description

The function estimates unrestricted and restricted logit models in a multiple imputation environment, each with a network lag term, and provides common diagnostics.

Usage

doLogit(MIdata, depvar, indpv, rindpv=NULL, dw=TRUE, lw=TRUE, ew=FALSE, doboot=500, mean.data=TRUE, getismat=FALSE, othexog=NULL, full.set=FALSE)

Arguments

MIdata	a multiply imputed dataset, created by the function doMI
depvar	the name of the dependent variable (must be in MIdata)
indpv	the names of the independent variables for the unrestricted model (must be in MIdata)
rindpv	names of restricted model independent variables (must be in indpv; when default of NULL is executed, the restricted model independent variables will be the same as the unrestricted model, minus the last variable)
dw	Should geographic proximity be used in constructing composite weight matrix (default=TRUE)
lw	Should linguistic proximity be used in constructing composite weight matrix (default=TRUE)
ew	Should ecological proximity be used in constructing composite weight matrix (default=FALSE)
doboot	Enter the number of bootstrap repetitions to calculate bootstrap standard errors. Legal values lie between 10 and 10,000. The default (doboot=500) is usually sufficient.
mean.data	When mean.data=TRUE (the default), the returned list includes a dataframe with mean values (across imputations) of the unrestricted model variables for each society, as well as predicted values and residuals for the restricted model, and latitude and longitude. mean.data=FALSE returns the entire, unaggregated set of data.
getismat	When getismat=TRUE, the distance weight matrix is modified in the way suggested by Getis and Aldstadt (2004). Default is FALSE.
othexog	names of additional exogenous variables (must be in MIdata; will be added to a list of 21 variables; default is NULL)
full.set	The default uses von Hippel’s recommended method of deleting observations for which the dependent variable is missing. To use all observations, use full.set=TRUE.

Value

Returns a list with 8 elements:

DependVarb	Description of dependent variable
URmodel	Coefficient estimates from the unrestricted model; p-values are from bootstrap standard errors.
model.varbs	Short description of model variables. Can save a trip to the codebook.
Rmodel	Coefficient estimates from the restricted model.
Diagnostics1	Three likelihood ratio tests: LRtestNull-R (H0: all variables in restricted model have coefficients equal zero); LRtestNull-UR (H0: all variables in unrestricted model have coefficients equal zero); LRtestR-R (H0: variables in unrestricted model, not carried over to restricted model, have coefficients equal zero). One Wald test: waldtest-R (H0: variables in unrestricted model, not carried over to restricted model, have coefficients equal zero).
Diagnostics2	Statistics without formal hypothesis tests. pLargest: the largest of proportion 1s or proportion 0s; the model should be able to outperform simply picking the most common outcome. pRight: proportion of fitted values that equal actual value of dependent variable. NetpRight=pRight-pLargest; this is positive in a good model. McIntosh.Dorfman: (num. correct 0s/num. 0s) + (num. correct 1s/num. 1s); this exceeds one in a good model; McFadden.R2 and Nagelkerke.R2 are two versions of pseudo R².
OtherStats	Other statistics: Composite weight matrix weights; number of imputations; number of observations.
data	Data as used in the estimations. Observations with missing values of the dependent variable have been dropped.
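The Diagnostics2 quantities can be sketched from a single fitted binomial glm. McFadden’s pseudo R² is 1 - logL₁/logL₀, and Nagelkerke’s is the Cox-Snell value 1 - exp(2(logL₀ - logL₁)/n) divided by its maximum 1 - exp(2 logL₀/n). Turning fitted probabilities into predicted 0/1 outcomes with a 0.5 cutoff is an assumption, as is the helper name; doLogit computes these across imputations.

```r
# Hypothetical sketch of the Diagnostics2 statistics for one binomial glm fit.
logit_diag2_sketch <- function(fit, cutoff = 0.5) {
  y <- fit$y                                         # observed 0/1 outcome
  yhat <- as.integer(stats::fitted(fit) > cutoff)    # predicted outcome (assumed cutoff)
  pLargest <- max(mean(y), 1 - mean(y))
  pRight <- mean(yhat == y)
  null_fit <- stats::glm(y ~ 1, family = stats::binomial())
  ll1 <- as.numeric(stats::logLik(fit))
  ll0 <- as.numeric(stats::logLik(null_fit))
  n <- length(y)
  c(pLargest = pLargest, pRight = pRight, NetpRight = pRight - pLargest,
    McIntosh.Dorfman = mean(yhat[y == 1] == 1) + mean(yhat[y == 0] == 0),
    McFadden.R2 = 1 - ll1 / ll0,
    Nagelkerke.R2 = (1 - exp(2 * (ll0 - ll1) / n)) / (1 - exp(2 * ll0 / n)))
}

x <- 1:20
y <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1)
s <- logit_diag2_sketch(glm(y ~ x, family = binomial()))
```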

Details

Users can choose any of three kinds of proximity/similarity weight matrices for constructing a network lag term: geographic, linguistic, and ecological. In most cases, users should choose only geographic and linguistic (the defaults). The optimal composite weight matrix, constructed as the weighted sum of the chosen weight matrices, is that which returns the most significant Lagrange multiplier statistic on the unrestricted model without network lag term, estimated with OLS. The network lag term is entered in each model as the variable "Wy".

Endogeneity is a recognized problem with network lag terms. In the logit context, the network lag term will generate incorrect standard errors, so that the only legitimate p-values will be those coming from bootstrap standard errors. Bootstraps take a fairly long time to calculate, so one shouldn't set the number of repetitions too high; in most cases, good results can be obtained with doboot=500 (the default).
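Bootstrap standard errors of the kind described here can be sketched with a pairs (case-resampling) bootstrap: resample societies with replacement, refit the logit, and take the standard deviation of each coefficient across draws. This single-dataset version is an assumption for illustration; it does not rebuild Wy within each draw or combine results across imputations, and the helper name is hypothetical.

```r
# Hypothetical sketch: pairs bootstrap standard errors for a logit model.
boot_se_logit <- function(formula, data, reps = 500) {
  draws <- replicate(reps, {
    idx <- sample.int(nrow(data), replace = TRUE)     # resample rows with replacement
    stats::coef(stats::glm(formula, family = stats::binomial(),
                           data = data[idx, , drop = FALSE]))
  })
  apply(draws, 1, stats::sd)                           # SD of each coefficient across draws
}

set.seed(10)
dd <- data.frame(x = rnorm(200))
dd$y <- rbinom(200, 1, plogis(0.5 * dd$x))
se <- boot_se_logit(y ~ x, dd, reps = 50)
```

Each extra repetition refits the model, which is why run time grows with doboot.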

Note

Based on the methods proposed by Malcolm M. Dow and E. Anthon Eff.

Getis, A. and Aldstadt, J. (2004). Constructing the spatial weights matrix using a local statistic. Geographical Analysis 36:90-104.

McFadden, D. (1973). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in Econometrics. New York: Academic Press.

McIntosh, C. S., & Dorfman, J. H. (1992). Qualitative forecast evaluation: A test for information value. American Journal of Agricultural Economics, 74, 209-214.

Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78, 691-692.

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

dpV<-"v67.d3"
UiV<-c("v2002.d2", "v1845", "v1649", "v1127.d2", "v2137", "v279.d5", "v213.d3",
       "v1265", "v1", "v234", "femecon.lp", "rectang")
RiV<-c("v1649", "v1127.d2", "v2137", "v1265")
q<-doLogit(smi, depvar=dpV, indpv=UiV, rindpv=RiV, dw=TRUE, lw=TRUE, ew=FALSE,
           doboot=1000, mean.data=TRUE, getismat=FALSE, othexog=NULL)
#--look at first seven objects in q--
q[1:7]

doMNLogit Estimate multinomial logit model on multiply imputed data

Description

The function estimates an unrestricted and a restricted multinomial logit model in a multiple imputation environment, with a network lag term, providing marginal effects and a few common diagnostics. It is intended for cases where the dependent variable is categorical, with three or more categories.

Usage

doMNLogit(MIdata,depvar,indpv,rindpv=NULL,dw=TRUE,lw=TRUE,doboot=200,subgrps=NULL, full.set=FALSE)

Arguments

MIdata	a multiply imputed dataset, created by the function doMI
depvar	the name of the dependent variable (must be a categorical variable in MIdata)
indpv	the names of the independent variables for the unrestricted model (must be in MIdata)
rindpv	names of the restricted model independent variables (must be in indpv; when the default of NULL is used, the restricted model independent variables will be the same as those of the unrestricted model, minus the last variable)
dw	Should geographic proximity be used in constructing composite weight matrix (default=TRUE)
lw	Should linguistic proximity be used in constructing composite weight matrix (default=TRUE)
doboot	Enter the number of bootstrap repetitions to calculate bootstrap standard errors. Legal values lie between 10 and 10,000. The default is 200.
subgrps	The name of a 0,1 dummy variable, present in MIdata, used to compare mean marginal effects in the two subgroups it defines. The default does not divide the data to compare marginal effects.
full.set	The default uses von Hippel’s recommended method of deleting observations for which the dependent variable is missing. To use all observations, use full.set=TRUE.

Value

Returns a list with 23 elements:

DependVarb	Description of dependent variable
URmeanME.MargEff	Mean marginal effects for unrestricted model, with Fst, df, and pvalue
URmeanME.MEpval	Mean marginal effects for unrestricted model. Pvalues only.
URmeanME.MEmean	Mean marginal effects for unrestricted model. Mean only.
RmeanME.MargEff	Mean marginal effects for restricted model, with Fst, df, and pvalue
RmeanME.MEpval	Mean marginal effects for restricted model. Pvalues only.
RmeanME.MEmean	Mean marginal effects for restricted model. Mean only.
URdifME	Differences in mean marginal effects across alternatives: unrestricted model.
RdifME	Differences in mean marginal effects across alternatives: restricted model.
URcoef	Coefficient estimates from the unrestricted model.
Rcoef	Coefficient estimates from the restricted model.
TestRestr	Two tests for model restrictions (H0: dropped variables don’t belong in the model).
TestIIA	Tests of Independence of Irrelevant Alternatives, one for each alternative (H0: dropping the alternative does not affect the choices among the other alternatives).
URpredTable.predTable	Table comparing predicted choices with actual choices: unrestricted model.
URpredTable.crlg	Ratio of number of correct choices over number in largest alternative: unrestricted model.
RpredTable.predTable	Table comparing predicted choices with actual choices: restricted model.
RpredTable.crlg	Ratio of number of correct choices over number in largest alternative: restricted model.
OtherStats	Other statistics: Composite weight matrix weights; ratio of number of correct predictions over number in largest category; number of imputations; number of observations; number of bootstrap iterations.
UsubgrpDiff	Comparing mean marginal effects across two subgroups indicated by 0,1 binary variable: mean of group 1 minus mean of group 0, with pvalue. Unrestricted model.
RsubgrpDiff	Comparing mean marginal effects across two subgroups indicated by 0,1 binary variable: mean of group 1 minus mean of group 0, with pvalue. Restricted model.
URmarEff	Society-level marginal effects calculated using final coefficient values and mean (across imputations) data values: unrestricted model.
RmarEff	Society-level marginal effects calculated using final coefficient values and mean (across imputations) data values: restricted model.
data	Mean (across imputations) data values for each society.

Details

A network lag term is constructed from a composite of the geographic and linguistic proximity matrices. The optimal composite weight matrix, constructed as the weighted sum of the chosen weight matrices, is that which returns the highest log likelihood ratio on the unrestricted model. The network lag term is entered in each model as the variable "Wy".

Endogeneity is a recognized problem with network lag terms. In the multinomial logit context, the network lag term will generate incorrect standard errors, so that the only legitimate p-values will be those coming from bootstrap standard errors. These bootstraps take a very long time to calculate, so one shouldn't set the number of repetitions too high. The default is doboot=200, but 300 to 1000 should be used for published work.

The signs of coefficient estimates are not meaningful in multinomial logit models, since the marginal effects are a function of all coefficient values and data values. The marginal effects will be unique for each society, for each variable, for each alternative. It is traditional to take the mean marginal effect, for each variable, for each alternative (i.e., take the mean across societies) and use bootstrapping to test whether the marginal effect is significantly different from zero.
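For alternative j and variable k, the society-level marginal effect is P_ij * (b_kj - sum_m P_im * b_km), which is why a coefficient's sign need not match the sign of its marginal effect. The sketch below computes these effects with hypothetical coefficients (the first alternative is the base, with coefficients fixed at zero) and then takes the mean across societies. It does not perform the bootstrap significance test, and the function name mnl_me is hypothetical.

```r
# Society-level multinomial logit marginal effects:
#   dP_ij/dx_ik = P_ij * (b_kj - sum_m P_im * b_km)
# X: n x K data (with constant); B: K x J coefficients, base alternative = column of 0s
mnl_me <- function(X, B) {
  eta <- X %*% B                                  # n x J linear predictors
  P <- exp(eta) / rowSums(exp(eta))               # n x J choice probabilities
  bbar <- P %*% t(B)                              # n x K probability-weighted coefficients
  ME <- array(0, c(nrow(X), ncol(X), ncol(B)),
              dimnames = list(NULL, colnames(X), colnames(B)))
  for (j in seq_len(ncol(B)))
    ME[, , j] <- P[, j] * sweep(-bbar, 2, B[, j], "+")   # P_ij * (b_kj - bbar_ik)
  ME
}

set.seed(4)
n <- 100
X <- cbind(const = 1, x1 = rnorm(n), x2 = rnorm(n))
B <- cbind(alt1 = c(0, 0, 0),                     # base alternative
           alt2 = c(0.2, 0.5, -0.3),              # hypothetical coefficients
           alt3 = c(-0.1, 0.1, 0.4))
rownames(B) <- colnames(X)
ME <- mnl_me(X, B)
# mean marginal effect across societies, for each variable and alternative
meanME <- apply(ME[, -1, , drop = FALSE], c(2, 3), mean)
round(meanME, 4)
```

With these coefficients, x1 has a positive coefficient for alt3 but a negative mean marginal effect on alt3, because raising x1 shifts probability toward alt2 even faster. For every society, the marginal effects also sum to zero across alternatives, since the probabilities always sum to one.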

Occasionally, one might be interested in how marginal effects vary between two subsets of the data, for example, foragers versus non-foragers. Such comparisons use the subgrps option, which names a 0,1 dummy variable in MIdata; the results appear in UsubgrpDiff and RsubgrpDiff.
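The subgroup comparison reports the mean marginal effect of group 1 minus that of group 0. This sketch computes that difference for one variable and one alternative. The p-value comes from a permutation test, a stand-in chosen for illustration, since this page does not say how doMNLogit computes its p-value. The data and the name subgroup_diff are hypothetical.

```r
# Difference in mean marginal effects (group 1 minus group 0) with a
# two-sided permutation p-value (an assumed, illustrative test).
subgroup_diff <- function(me, g, nperm = 999) {
  obs <- mean(me[g == 1]) - mean(me[g == 0])
  perm <- replicate(nperm, {
    gs <- sample(g)                               # shuffle subgroup labels
    mean(me[gs == 1]) - mean(me[gs == 0])
  })
  p <- (sum(abs(perm) >= abs(obs)) + 1) / (nperm + 1)
  c(diff = obs, pvalue = p)
}

set.seed(5)
nomadic <- rbinom(120, 1, 0.4)                    # hypothetical 0,1 dummy
# simulated society-level marginal effects for one variable/alternative
me <- rnorm(120, mean = ifelse(nomadic == 1, 0.05, 0.01), sd = 0.03)
subgroup_diff(me, nomadic)
```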

Note

Based on the methods proposed by Malcolm M. Dow and E. Anthon Eff.

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

dpV<-"residence"
UiV<-c("enviro.mean","anim.mean","path.mean","localviol.mean","femecon.mean","tech")
RiV<-c("anim.mean","localviol.mean","femecon.mean","tech")
h<-doMNLogit(smi,dpV,UiV,RiV,doboot=300,subgrps="nomadic")
CSVwrite(h,"mnl0",FALSE)
MEplots(h,mod="R",filetitle="nom",setylim=RiV,subgrps="nomadic",dpires=300)

CSVwrite Write object to *.csv file

Description

The function writes an object, with elements capable of being coerced to a dataframe, to a csv file. It is used to write the output from doOLS, doLogit, or doMNLogit to a file that can be read by a spreadsheet.

Usage

CSVwrite(object, filestem, appnd2=FALSE)

Arguments

object	Object to be written, typically output from function doOLS, doLogit, or doMNLogit
filestem	The base name of the *.csv file (do not include the ".csv" extension)
appnd2	Should the object be appended to the existing file? (default=FALSE)

Value

No values are returned in the R environment; only changes occur to the specified *.csv file.

Details

Set the option appnd2=TRUE to append the output of object to an existing file with base name "filestem". The default will simply overwrite any existing csv file with base name "filestem".
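The overwrite-versus-append behavior of appnd2 can be mimicked with base R's write.table(), which accepts append=TRUE (write.csv() ignores attempts to change append). This is a hypothetical re-implementation: the layout it produces (each element's name on its own line, then the element, then a blank line) is an assumption, not a description of CSVwrite's actual output, and the name csv_write_sketch is made up.

```r
# Hypothetical sketch of CSVwrite-style output: overwrite by default, append on request.
csv_write_sketch <- function(object, filestem, appnd2 = FALSE) {
  path <- paste0(filestem, ".csv")
  if (!appnd2 && file.exists(path)) file.remove(path)   # start a fresh file
  for (nm in names(object)) {
    x <- object[[nm]]
    df <- if (is.null(dim(x))) data.frame(value = x) else as.data.frame(x)
    cat(nm, "\n", file = path, append = TRUE, sep = "")  # element name as a label line
    # write.table() warns when appending column names; that is expected here
    suppressWarnings(write.table(df, file = path, sep = ",", append = TRUE,
                                 col.names = NA, row.names = TRUE))
    cat("\n", file = path, append = TRUE)                # blank separator line
  }
  invisible(path)
}

res <- list(coef = c(a = 1, b = 2), stats = data.frame(n = 10, r2 = 0.5))
out <- csv_write_sketch(res, file.path(tempdir(), "olsresults"))
readLines(out)
```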

Note

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

CSVwrite(h, "olsresults", FALSE)

mkmappng Create png format map for values of ordinal variable

Description

This function writes a png format Pacific-centered world map file to the working directory. Dots represent societies, and the size and color of the dots reflect the value of a variable specified by the user. Options allow presentation of information about local autocorrelation and dfbetas.

Usage

mkmappng (usedata, varb, filetitle=NULL, show="ydata", numnb.lg=3, numnb.lm=20, numch=0, pvlm=.05, dfbeta.show=FALSE, zoom=FALSE, map.width=8, map.height=5, map.units="in", map.pointsize=10, map.res=500)

Arguments

usedata	Name of a dataframe. It must contain a column named "lati" and a column named "long" (latitude and longitude in decimal degrees)
varb	Name of a variable in the dataframe.
filetitle	Stem title of png file (".png" suffix added automatically). Default is same as varb.
show	Type of value to display. Legal values are lgt (local G), ydata (original data values), lmtp (classifies points into significant and non-significant local autocorrelation, based on local Moran), and lmtz (local Moran z-value). Default is ydata.
numnb.lg	Number of nearest neighbors to use when creating local G. Default is 3.
numnb.lm	Number of nearest neighbors to use when creating local Moran. Default is 20.
numch	Number of convex hulls to draw around regions of local autocorrelation. Default is 0.
pvlm	Cut-off p-value for considering a local Moran statistic significant. Default is 0.05.
dfbeta.show	Should map indicate points with significant dfbeta values for this variable. Default is FALSE.
zoom	Should map zoom in to plotted points. Default is FALSE. Set to TRUE when using WNAI data.
map.width	Parameter for png map file. This gives width of map. Default is 8.
map.height	Parameter for png map file. This gives height of map. Default is 5.
map.units	Parameter for png map file. This gives units in which width and height are measured. Default is "in".
map.pointsize	Parameter for png map file. This gives pointsize. Default is 10.
map.res	Parameter for png map file. This gives resolution of map file. Default is 500 dpi.

Value

The function writes a png format map to a file in the working directory. Larger values of the mapped variable are shown as larger and darker (reddish) circles; smaller values are shown as smaller and lighter (yellowish) circles.

Details

Option show=lgt gives the local G statistic, which is essentially a spatial moving average, converted to a z-score. It is a reasonable way to smooth the mapped points spatially. By default, only the three nearest neighbors, plus the society itself, are used to calculate this spatial moving average (see numnb.lg).
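A minimal sketch of the smoothed value behind show=lgt: the standardized local G* statistic of Ord and Getis, with binary weights on the k nearest neighbors plus the society itself. Assumptions: Euclidean distances on a toy grid (mkmappng works from latitude and longitude), and that mkmappng uses this standardized G* form. The function name local_gstar_z is hypothetical.

```r
# Standardized local G* (a z-scored spatial moving average) with binary
# weights on the k nearest neighbours plus self.
local_gstar_z <- function(x, coords, k = 3) {
  n <- length(x)
  d <- as.matrix(dist(coords))
  diag(d) <- -1                                   # forces self to rank first
  W <- matrix(0, n, n)
  for (i in seq_len(n)) W[i, order(d[i, ])[1:(k + 1)]] <- 1
  xbar <- mean(x)
  s <- sqrt(mean(x^2) - xbar^2)
  Wi <- rowSums(W)                                # sum of weights (k + 1)
  S1 <- rowSums(W^2)                              # sum of squared weights (k + 1)
  (as.vector(W %*% x) - xbar * Wi) / (s * sqrt((n * S1 - Wi^2) / (n - 1)))
}

# 10 x 10 grid with a 3 x 3 block of high values in one corner
coords <- as.matrix(expand.grid(1:10, 1:10))
hot <- coords[, 1] >= 8 & coords[, 2] >= 8
x <- ifelse(hot, 10, 0)
z <- local_gstar_z(x, coords, k = 3)
round(matrix(z, 10, 10), 2)
```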

The local Moran is a test for autocorrelation, i.e. the degree to which a society has values similar to those of its neighbors, where the default number of neighbors is 20. Option show=lmtz will display the local Moran z-score, and option show=lmtp displays the binary significant/not significant for the z-score, using the p-value given in option pvlm. Convex hulls are drawn around areas of significant positive local autocorrelation; one must input the number of convex hulls to draw, but otherwise assignment of a point to a specific convex hull is automatic, based on distances between points. Usually some experimentation is needed to find the correct number of convex hulls, and it is easiest to do this experimentation on maps where show=lmtp.
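The local Moran statistic and the significant/not significant classification can be sketched as follows. Assumptions: k nearest neighbors with row-standardized weights, a toy grid with Euclidean distances, and a conditional permutation test (the society's own value held fixed, neighbor values drawn at random from the rest) as a stand-in for whatever inference mkmappng actually performs. The list elements lmtz and lmtp only mirror the show options; the function name is hypothetical.

```r
# Local Moran's I with k nearest neighbours, a conditional-permutation z-score,
# and a flag for significant positive local autocorrelation.
local_moran_sketch <- function(x, coords, k = 20, nperm = 199, pvlm = 0.05) {
  n <- length(x)
  z <- (x - mean(x)) / sd(x)
  d <- as.matrix(dist(coords))
  diag(d) <- Inf                                  # exclude self from neighbours
  nb <- lapply(seq_len(n), function(i) order(d[i, ])[1:k])
  Ii <- sapply(seq_len(n), function(i) z[i] * mean(z[nb[[i]]]))
  zval <- sapply(seq_len(n), function(i) {
    sims <- replicate(nperm, z[i] * mean(sample(z[-i], k)))  # hold z_i fixed
    (Ii[i] - mean(sims)) / sd(sims)
  })
  pval <- pnorm(zval, lower.tail = FALSE)         # one-sided: positive autocorrelation
  list(I = Ii, lmtz = zval, lmtp = pval < pvlm & Ii > 0)
}

set.seed(6)
coords <- as.matrix(expand.grid(1:10, 1:10))
hot <- coords[, 1] >= 8 & coords[, 2] >= 8
x <- ifelse(hot, 10, 0)
r <- local_moran_sketch(x, coords)
table(r$lmtp)
```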

This function is intended for use with data relevant to models estimated by the function doOLS. The function doOLS has the option mean.data; when this is set to TRUE (the default), the output from doOLS contains a dataframe with values for the dependent and independent variables (including Wy) calculated as the mean across all imputed datasets. This dataframe also contains latitude and longitude coordinates, and the mean values of the dfbetas for the variables used in the restricted model. Societies whose inclusion causes a significant change in an estimated parameter of the restricted model can be shown on the map by setting dfbeta.show=TRUE. Triangles pointing upward indicate societies whose inclusion significantly increases the value of the coefficient; triangles pointing downward indicate societies whose inclusion significantly lowers it.

Note

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

dpV<-"v67.d3"
UiV<-c("v2002.d2", "v1845", "v1649", "v1127.d2", "v2137", "v279.d5", "v213.d3",
       "v1265", "v1", "v234", "femecon.lp", "rectang")
RiV<-c("v1649", "v1127.d2", "v2137", "v1265")
h<-doOLS(MIdata=smi, depvar=dpV, indpv=UiV, rindpv=RiV, othexog=NULL,
         dw=TRUE, lw=TRUE, ew=FALSE, stepW=TRUE, boxcox=FALSE, getismat=FALSE,
         relimp=TRUE, slmtests=FALSE, haustest=NULL, mean.data=TRUE, doboot=500)
p<-h[[12]]

# experimenting to find the right number of convex hulls
sapply(2:11, function(x) mkmappng(p, "femecon.lp", paste("Womenswork", x, sep=""),
                                  show="lmtp", numch=x, dfbeta.show=TRUE))

# creates file called "Womenswork_ydata.png"
mkmappng(usedata=p, varb="femecon.lp", filetitle="Womenswork", show="ydata", numch=8, dfbeta.show=TRUE)

mkcatmappng Create png format map for values of categorical variable

Description

This function writes a png format Pacific-centered world map file to the working directory. Symbols represent societies, and the shape and color of the symbols represent the categories of a variable specified by the user.

Usage

mkcatmappng (usedata, varb, filetitle, zoom=FALSE, map.width=8, map.height=5, map.units="in", map.pointsize=10, map.res=500)

Arguments

usedata	Name of a dataframe. It must contain a column named "lati" and a column named "long" (latitude and longitude in decimal degrees)
varb	Name of a variable in the dataframe.
filetitle	Stem title of png file (".png" suffix added automatically). Default is same as varb.
zoom	Should map zoom in to plotted points. Default is FALSE. Set to TRUE when using WNAI data.
map.width	Parameter for png map file. This gives width of map. Default is 8.
map.height	Parameter for png map file. This gives height of map. Default is 5.
map.units	Parameter for png map file. This gives units in which width and height are measured. Default is "in".
map.pointsize	Parameter for png map file. This gives pointsize. Default is 10.
map.res	Parameter for png map file. This gives resolution of map file. Default is 500 dpi.

Value

The function writes a png format map to a file in the working directory. A legend identifies the category represented by each symbol.

Details

This function is intended for cases where the plotted variable is categorical. Symbols for each society have a color and shape representing the category, and a legend associates the symbols with the category label. In general, this map will be most effective when the number of categories is small (six or fewer).

When using the WNAI data, one should set zoom=TRUE so that the map centers on western North America.

Note

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

mkcatmappng(dx,"ekd","Zekd",zoom=TRUE)

plotSq Make plots of marginal effects of all independent variables with squared terms

Description

The function takes output from doOLS or doLogit, scans the independent variables in the restricted model for variables with squared terms, and creates plots of their marginal effects on the dependent variable.

Usage

plotSq(x,filetitle=NULL)

Arguments

x	name of output from doOLS or doLogit
filetitle	name of png file (default=NULL will write plots to GUI)

Value

The function creates plots of the marginal effects of all restricted model independent variables with squared terms.

Details

In a linear regression, the sign of the marginal effect is simply the sign of the coefficient. But with polynomial expressions, the marginal effect sign will vary over the values of the independent variable. These plots show the pattern of variation in cases where an independent variable is entered as a quadratic or simply as a squared term. The abscissa gives the values of the variable found in the averaged data, while the ordinate gives the marginal effect on the dependent variable. The number of observations at each value is shown both by the rugplots in green at the top of the plot, and by the size of the red circles at each variable value.

One must specify the filetitle in order to save the plot to a png format file with name filetitle.png.
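For a variable entered as x + x², the marginal effect is b1 + 2*b2*x, so its sign changes at x = -b1/(2*b2). The sketch below fits such a model with lm() on noiseless simulated data and tabulates the marginal effect at each observed value together with the number of observations there, the two quantities plotSq displays. It does not reproduce the plot itself, and all data are made up.

```r
# Marginal effect of a variable entered as x + x^2: dy/dx = b1 + 2 * b2 * x,
# evaluated at each observed value, with the number of observations there.
x <- rep(0:4, times = c(3, 5, 8, 5, 2))          # simulated values and counts
y <- 1 + 2 * x - 0.5 * x^2                       # noiseless, so lm() recovers b exactly
fit <- lm(y ~ x + I(x^2))
b <- coef(fit)
xv <- sort(unique(x))
me <- b[["x"]] + 2 * b[["I(x^2)"]] * xv
data.frame(x = xv, n.obs = as.vector(table(x)), marg.eff = round(me, 6))
```

Here the marginal effect is positive below x = 2, zero at x = 2, and negative above it, even though the linear coefficient is positive.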

Note

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

plotSq(h)

MEplots Make plots of marginal effects of all independent variables used in doMNLogit estimation

Description

The function takes output from doMNLogit, and produces boxplots showing the range of marginal effects, by alternative, for each independent variable.

Usage

MEplots(x,mod="R",varbs=NULL,filetitle=NULL,setylim=NULL,subgrps=NULL,dpires=500)

Arguments

x	name of output from doMNLogit
mod	"R" plots marginal effects from restricted model; "UR" from unrestricted
varbs	names of variables to plot. Default will plot all variables.
filetitle	name of png file (default=NULL will write plots to GUI)
setylim	list of independent variable names for which plots should have the same y-axis range
subgrps	If the subgrps option was used in doMNLogit, can invoke it here as well to display separate boxplots for each subgroup.
dpires	set the dots per inch resolution of the png file (300 is the usual "publication quality", higher is even better).

Value

The function creates boxplots of the marginal effects, by alternative, for each selected independent variable in the restricted (mod="R") or unrestricted (mod="UR") model.

Details

One must specify the filetitle in order to save the plot to a png format file named filetitle.png.

Note

Author(s)

Anthon Eff Anthon.Eff@mtsu.edu

Examples

MEplots(h)