Objects contained in R workspace DEf01f.Rdata
Datasets |
|
EA |
Ethnographic Atlas dataset |
EAkey |
Ethnographic Atlas metadata file |
EAfact |
Ethnographic Atlas dataset with factor labels |
EAcov |
Ethnographic Atlas variable covariates for imputation |
LRB |
Binford forager dataset |
LRBkey |
Binford forager metadata file |
LRBfact |
Binford forager dataset with factor labels |
LRBcov |
Binford forager variable covariates for imputation |
SCCS |
Standard Cross-Cultural Sample dataset |
SCCSkey |
Standard Cross-Cultural Sample metadata file |
SCCSfact |
Standard Cross-Cultural Sample dataset with factor labels |
SCCScov |
Standard Cross-Cultural Sample variable covariates for imputation |
WNAI |
Western North American Indians dataset |
WNAIkey |
Western North American Indians metadata file |
WNAIfact |
Western North American Indians dataset with factor labels |
WNAIcov |
Western North American Indians variable covariates for imputation |
XC |
Merged 371-society dataset |
XCkey |
Merged 371-society metadata file |
XCfact |
Merged 371-society dataset with factor labels |
XCcov |
Merged 371-society variable covariates for imputation |
llm |
Matrix of linguistic proximities between all pairs of societies |
Undocumented functions |
|
chK |
auxiliary function that finds some characteristics of variables in a dataframe |
chkpmc |
auxiliary function that checks variables for high collinearity |
gSimpStat |
auxiliary function that obtains descriptive statistics for numeric variables in a dataframe |
kln |
auxiliary function that converts all variables in a dataframe to either numeric or character |
mmgg |
auxiliary function that cleans up output from aggregate() function |
quickdesc |
auxiliary function that outputs a summary of the codebook description for a variable |
resc |
auxiliary function that rescales a variable |
rmcs |
auxiliary function that removes characters common to a set of strings |
rnkd |
auxiliary function that assigns ranks to values (1=lowest) |
showlevs |
auxiliary function that describes largest and smallest values of a variable |
spmang |
auxiliary function that removes leading and trailing spaces from a string |
widen |
auxiliary function that widens the range of a variable |
Documented functions |
|
setDS |
sets up the environment to work with one of the five datasets (EA, LRB, SCCS, WNAI, XC) |
mkdummy |
makes dummy variable and creates entry for it in metadata |
mknwlag |
makes network lag variable |
addesc |
adds or changes description of variable in metadata |
fv4scale |
helper function to find variables for use in a scale |
doMI |
creates multiple imputed datasets |
mkscale |
makes a scale (composite index) from several similar variables |
doOLS |
estimates regression model using OLS with imputed datasets, including network lag term |
doLogit |
estimates regression model using logit with imputed datasets, including network lag term |
doMNLogit |
estimates model using multinomial logit with imputed datasets, including network lag term |
CSVwrite |
writes objects to csv format file |
mkmappng |
plots an ordinal variable on a world map and writes a png format file |
mkcatmappng |
plots a categorical variable on a world map and writes a png format file |
plotSq |
plots effects of all independent variables with squared terms and writes a png format file |
MEplots |
plots marginal effects of independent variables used in doMNLogit |
setDS Select ethnological dataset to use in subsequent analysis
Description
Prior to running any other function, one must select the particular ethnological dataset one is using. The function creates the appropriate weight matrices and other auxiliary files.
Usage
setDS(dsname)
Arguments
dsname |
name of ethnological dataset (one of: "SCCS", "LRB", "WNAI", "EA", "XC") |
Value
The function writes the following objects to the global environment, where they are accessible to the other functions.
cov |
Names of covariates to use during imputation step |
dx |
The selected ethnological dataset is now called dx |
dxf |
The factor version of dx |
key |
A metadata file for dx |
wdd |
A geographic proximity weight matrix for the societies in dx |
wee |
An ecological similarity weight matrix for the societies in dx |
wll |
A linguistic proximity weight matrix for the societies in dx |
Details
Note
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
setDS("SCCS")
mkdummy Make dummy variable and store a description in key file
Description
The function makes a dummy variable from a variable, and creates a description which is used in doOLS output.
Usage
mkdummy(varb, val,
rlt="==", showname=TRUE)
Arguments
varb |
name of a variable |
val |
the value of varb for which the dummy equals one (or the threshold, when rlt is not "=="). |
rlt |
one of: "==", ">", "<", ">=", "<=" |
showname |
should variable name and description print to the console? |
Value
With rlt="==" (the default), the function returns a variable named varb.dval, which equals one when varb==val and zero otherwise. Dummies with other relational operators are named as follows: rlt=">=" returns varb.dGeval; rlt=">" returns varb.dGtval; rlt="<=" returns varb.dLeval; and rlt="<" returns varb.dLtval.
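The naming rule above can be illustrated with a small base-R sketch. The helper make_dummy_sketch() is hypothetical and is not the package's mkdummy(): it only builds the 0/1 vector and the name suffix, and does not modify dx or the key file. Missing values are assumed to stay missing.

```r
# Hypothetical helper illustrating mkdummy's naming rule; not the package code.
make_dummy_sketch <- function(x, val, rlt = "==") {
  # suffix convention described above: .d, .dGe, .dGt, .dLe, .dLt
  suffix <- switch(rlt,
                   "==" = "d", ">=" = "dGe", ">" = "dGt",
                   "<=" = "dLe", "<" = "dLt",
                   stop("rlt must be one of: ==, >, <, >=, <="))
  op <- match.fun(rlt)                  # the relational operator as a function
  list(suffix = paste0(".", suffix, val),
       dummy  = as.integer(op(x, val)))  # TRUE -> 1, FALSE -> 0, NA stays NA
}

v70 <- c(1, 3, 5, NA, 3)
make_dummy_sketch(v70, 3)$dummy          # 0 1 0 NA 1
make_dummy_sketch(v70, 3, ">=")$suffix   # ".dGe3"
```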
Details
There are two reasons why one should use this function to create dummy
variables. First, it makes it possible to use the predetermined set of best
covariates, found in the auxiliary file "cov", for multiple imputation in doMI. Second, the function will automatically append a description for the
dummy variable to the key file, which is then available for use in doOLS output. The
description is created using the variable name from the key file and the
description of the value from the factor version of the ethnological dataset.
Note
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
mkdummy("v70",3) # the default creates variable v70.d3
mkdummy("v70",3,"==") # can also create variable v70.d3 like this
mkdummy("v70",3,">=") # creates variable v70.dGe3
mkdummy("v70",3,"<=") # creates variable v70.dLe3
mkdummy("v70",3,"<") # creates variable v70.dLt3
mkdummy("v70",3,">") # creates variable v70.dGt3
mknwlag Make network lag variable
Description
The function makes a network lag variable.
Usage
mknwlag(MIdata,wtMat,varb)
Arguments
MIdata |
multiply imputed dataset, produced using doMI() |
wtMat |
weight matrix, typically wdd, wll, or wee |
varb |
name of a variable found in data.frame MIdata |
Value
The function returns a variable which is the network lag of varb.
Details
The primary reason to use this function would be to create a network lagged independent variable. Note that this function is not suitable for creating an independent variable which is the network lag of the dependent variable, since such a variable would be endogenous.
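A network lag combines the values of varb in the other societies, weighted by proximity. The sketch below assumes, as is usual for network lag terms, that the weight matrix is row-standardized (zero diagonal, rows summing to one), so each society's lag is a weighted average of the other societies' values; whether mknwlag normalizes wtMat exactly this way is an assumption. With stacked imputations, the lag is computed separately within each .imp block, assuming each block lists the societies in the same order as the rows of the matrix. The matrix W and the data d are made-up examples, and row_standardize() and network_lag() are hypothetical helpers.

```r
# Zero the diagonal and scale each row to sum to one (assumed normalization)
row_standardize <- function(W) {
  diag(W) <- 0
  W / rowSums(W)   # divides row i by its own row sum
}

# Network lag: W %*% x, a proximity-weighted average of the other societies
network_lag <- function(W, x) as.vector(row_standardize(W) %*% x)

# Three hypothetical societies: 1 and 2 are close, 3 is remote
W <- matrix(c(0, 3, 1,
              3, 0, 1,
              1, 1, 0), nrow = 3, byrow = TRUE)
x <- c(10, 20, 40)
network_lag(W, x)   # 25.0 17.5 15.0

# Stacked imputed data: compute the lag within each imputation (.imp block)
d <- data.frame(.imp = rep(1:2, each = 3), x = c(10, 20, 40, 12, 20, 40))
d$nb_x <- unsplit(lapply(split(d$x, d$.imp), function(v) network_lag(W, v)),
                  d$.imp)
```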
Note
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
# frequency with which neighbors engage in external war
smi$nbwar<-mknwlag(smi,wdd,"v1650")
addesc Add a variable description to the key file
Description
The function adds a variable description to the key file. This is useful in cases where a new variable is created, whose description is not yet in the key file. The description is then available for use in doOLS output.
Usage
addesc(nvbs, nvbsdes)
Arguments
nvbs |
name of variable |
nvbsdes |
description of nvbs |
Value
The function appends the description to the key file.
Details
Note
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
dx$valchild <-(dx$v473+dx$v474+dx$v475+dx$v476)
addesc("valchild", "Degree to which society values children")
fv4scale Find potential components for scale
Description
The function scans the metadata for keywords and returns a list of variable names that might be suitable either for using as independent variables or for combining into a scale. Can be helpful in quickly identifying potential scale components, but care should be taken to eliminate those that are unsuitable.
Usage
fv4scale(lookword, dropword=NULL,
keepword=NULL, coreword=NULL,
nmin=93, minalpha=.7, chklevels=FALSE, verbose=TRUE, doscale=TRUE)
Arguments
lookword |
keywords to look for in variable descriptions (from metadata) |
dropword |
if identified variables contain these keywords, then they should be dropped |
keepword |
keep only identified variables also containing these keywords |
coreword |
the most important keywords; of the other identified variables, keep only those correlating highly with the variables containing these keywords |
nmin |
look only for variables with at least this many non-missing values |
minalpha |
minimum value of Cronbach’s alpha for set of variables (those least conforming will be eliminated until this target is hit) |
chklevels |
should factor levels also be scanned for keywords (in addition to variable descriptions)? |
verbose |
should the function write information about the variables to the console? (This can help in deciding which variables to keep.) |
doscale |
will variables be used in a scale? If TRUE (the default), the function selects variables that result in a suitably high Cronbach’s alpha. If FALSE, the function simply follows the logical rules implicit in lookword, keepword, and dropword. |
Value
The function returns a string of variable names.
Details
The function should be used with caution.
It provides only candidate variables, not necessarily the best variables, to
include in a scale. The widest set of candidate variables can be found by
setting chklevels=TRUE, which creates dummy
variables for those variables containing a keyword within a factor level label.
After identifying variables with keywords in lookword, retaining those meeting the keepword condition and dropping those meeting the dropword condition, the procedure will narrow down the set of
retained variables further by looking at the covariances among the variables.
It does this in two ways. First, if the coreword option is used, the variables containing the coreword keywords are compared with those not containing them, and of the latter set only those correlating most strongly with the coreword set are retained. Second, Cronbach’s alpha is calculated for the set of candidate variables; if alpha<minalpha, the variable whose removal most increases alpha is dropped. This procedure is repeated until alpha≥minalpha.
The function fv4scale is run on the original data dx, as created by the function setDS. The alpha produced here is calculated using listwise deletion, and might be lower when a scale is created with multiply imputed data, using the function mkscale.
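The alpha-based pruning step described above can be sketched in base R. This is an illustrative reconstruction, not fv4scale's code: cronbach_alpha() uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total), rows with missing values are removed first (listwise deletion), and prune_for_alpha() is a hypothetical helper that repeatedly drops the item whose removal raises alpha most. The simulated items a, b and c share a common trait, while d is pure noise.

```r
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)
cronbach_alpha <- function(X) {
  X <- as.matrix(X)
  k <- ncol(X)
  (k / (k - 1)) * (1 - sum(apply(X, 2, var)) / var(rowSums(X)))
}

# Hypothetical helper: drop items one at a time until alpha >= minalpha
prune_for_alpha <- function(X, minalpha = 0.7) {
  X <- na.omit(as.data.frame(X))                       # listwise deletion
  while (ncol(X) > 2 && cronbach_alpha(X) < minalpha) {
    alpha_without <- vapply(seq_len(ncol(X)), function(j)
      cronbach_alpha(X[, -j, drop = FALSE]), numeric(1))
    X <- X[, -which.max(alpha_without), drop = FALSE]  # largest gain in alpha
  }
  names(X)
}

set.seed(1)
z <- rnorm(200)                                        # common latent trait
X <- data.frame(a = z + rnorm(200, sd = 0.3), b = z + rnorm(200, sd = 0.3),
                c = z + rnorm(200, sd = 0.3), d = rnorm(200))
prune_for_alpha(X, minalpha = 0.9)
```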
Note
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
#--finds SCCS variables related to female economic contribution--
femecon<-fv4scale(lookword=c("market",
"exchange", "wage", "trade", "subsistence",
"goods", "product", "labor"), keepword=c("female",
"women", "woman"), coreword=c("subsistence"),
nmin=60, chklevels=TRUE, verbose=FALSE)
doMI Produce multiply imputed datasets
Description
The function produces multiply imputed datasets from an ethnological dataset, using methods from the mice package.
Usage
smi<-doMI(varbnames, nimp=10, maxit=7)
Arguments
varbnames |
names of variables to include in the imputed data. |
nimp |
the number of imputed datasets to
create (default=10) |
maxit |
the number of iterations used to estimate imputed data (default=7). |
Value
The function doMI returns a dataframe containing the number of imputed datasets specified by the nimp option. The datasets are stacked one atop the other, and indexed by the variable ".imp".
Details
This function imputes several new datasets, using covariates for each variable to create a conditional distribution of estimates for each missing value, and then replacing the missing value with a draw from the distribution; as a result, each of the imputed datasets will typically have slightly different values for the estimated cells. The key to successful imputation is to have good covariates for each variable. The auxiliary file "cov" lists the best covariates found in a lengthy specification search. For those variables with no covariates found in "cov" (such as user-created variables), the best covariates are selected from a set of variables with no missing values, including network lag variables (based on geographic distance, language, and ecology).
The first argument is a vector of variable names, all of which must be found in the ethnological dataset (transformed variables must be added to the ethnological dataset prior to running doMI). These will be the data used in model building. One should include all variables one thinks might be useful, including all transformed variables, but no others. The second argument is the number of imputed datasets to create: between 5 and 10 imputed datasets are considered adequate, but there is no harm in choosing more; the default is 10. The third argument is the number of iterations to perform in creating each imputed dataset; the default is 7.
It is usually a good idea to take a look at the returned dataframe, to see what variables it contains. It will contain not only the variables listed in varbnames, but also a set of normalized (mean=0, sd=1) climate and ecology variables that will be used as exogenous variables in the function doOLS. In addition, all variables with at least three discrete values, and with a maximum absolute value less than 300, will have a squared variable also entered (the squared variables all have the suffix "Sq"). Finally, the data.frame contains a variable called ".imp", which identifies the imputed dataset, and a variable called ".id" which gives the society name.
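The rule for adding squared terms, and the stacked layout indexed by .imp, can be mimicked on a toy data frame. add_squares_sketch() is a hypothetical helper written from the description above, not doMI's code; the bookkeeping columns .imp and .id are assumed to be excluded from squaring, and the variables v1 to v3 are made up.

```r
# Hypothetical helper: add <name>Sq for numeric variables with at least three
# distinct non-missing values and max |x| < 300 (the rule described above)
add_squares_sketch <- function(d, skip = c(".imp", ".id")) {
  for (v in setdiff(names(d), skip)) {
    x <- d[[v]]
    if (is.numeric(x) && length(unique(na.omit(x))) >= 3 &&
        max(abs(x), na.rm = TRUE) < 300) {
      d[[paste0(v, "Sq")]] <- x^2
    }
  }
  d
}

# Two stacked imputations of three societies
d <- data.frame(.imp = rep(1:2, each = 3), .id = rep(c("A", "B", "C"), 2),
                v1 = c(1, 2, 3, 1, 2, 3),             # 3 values, small: squared
                v2 = c(0, 1, 0, 1, 1, 0),             # binary: not squared
                v3 = c(100, 250, 400, 90, 260, 410))  # max >= 300: not squared
out <- add_squares_sketch(d)
names(out)                        # ".imp" ".id" "v1" "v2" "v3" "v1Sq"
tapply(out$v3, out$.imp, mean)    # one mean per imputed dataset
```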
Note
Based on the methods proposed by Malcolm M. Dow and E. Anthon Eff.
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
scnn<-c("v1649",
"v1127", "v2137", "v1265")
smi<-doMI(scnn, nimp=10, maxit=7)
dim(smi) # dimensions of new dataframe smi
smi[1:2, ] # first two rows of new dataframe smi
mkscale Calculate scale (composite index) from component variables
Description
The function calculates a scale from a multiply imputed dataset.
Usage
mkscale(compvarbs, udnavn=NULL,
impdata, type="LP", add.descrip=NULL,
set.direction=NULL, set.range=NULL)
Arguments
compvarbs |
names of component variables to include in the scale. |
udnavn |
the name of the scale. |
impdata |
the name of the multiply imputed dataset containing component variables. |
type |
the method to use in calculating the scale (one of "LP", "mean", "pc1"). |
add.descrip |
the description of the scale, to add to the metadata file. |
set.direction |
a component variable name, with which the scale should positively
correlate. |
set.range |
two numbers, such as c(0,10), which will become the lower and upper bounds of the rescaled scale. |
Value
scales |
a dataframe, with two values for each
observation in the input data: the calculated scale, and its square. |
stats |
Cronbach’s alpha for the scale components. |
corrs |
correlation between scale and scale component variables. |
varb.desc |
component variable descriptions, as rendered by the function quickdesc(). |
Details
The function can calculate three different kinds of scales: 1) based on linear programming as described in Eff (2010); 2) the mean of the standardized values; 3) the first principal component of the standardized values. Those components that vary negatively with the total scale are multiplied by -1; all components are then standardized (mean=10, sd=1).
Output is a list that includes the scale itself, as
well as some statistics to help assess whether the scale is performing as
desired. The corrs object should be examined: all
correlations between components and total scale are positive since those that
originally correlated negatively were multiplied by -1. The column labeled "inv" indicates with a "-1" those components that were
inverted. The column "levels" reports the factor level labels, and provides a
way to understand what higher values of a variable mean. If one variable
correlates with the total scale in a way inconsistent with the other variables,
then one should try again to find good component variables.
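For type="mean", the steps above (standardize, invert components that run against the total, average, then optionally orient and rescale) might look like the following sketch. It is not mkscale's code: using the first principal component as the reference "total" for deciding which components to invert is an assumption, direction plays the role of set.direction, and set.range the role of rescaling to given bounds. The returned corrs are the component-to-scale correlations, which should all be positive after inversion, mirroring the check on the corrs object described above.

```r
mean_scale_sketch <- function(X, direction = NULL, set.range = NULL) {
  Z <- scale(as.matrix(X))                          # each component: mean 0, sd 1
  pc1 <- prcomp(Z)$x[, 1]                           # assumed reference "total"
  inv <- ifelse(as.vector(cor(Z, pc1)) < 0, -1, 1)  # components to invert
  aligned <- sweep(Z, 2, inv, `*`)
  s <- rowMeans(aligned)                            # mean of aligned components
  if (!is.null(direction) && cor(s, X[[direction]]) < 0) {
    s <- -s                                         # orient the scale so it rises
    inv <- -inv                                     # with the `direction` component
    aligned <- -aligned
  }
  if (!is.null(set.range)) {                        # linear rescale to bounds
    s <- set.range[1] + (s - min(s)) * diff(set.range) / diff(range(s))
  }
  list(scale = s, inv = setNames(inv, colnames(X)),
       corrs = setNames(as.vector(cor(aligned, s)), colnames(X)))
}

# Toy components: a and b move together, c moves against them
X <- data.frame(a = 1:10,
                b = 2 * (1:10) + rep(c(0.5, -0.5), 5),
                c = -(1:10) + rep(c(0.3, -0.3), 5))
res <- mean_scale_sketch(X, direction = "a", set.range = c(0, 10))
res$inv     # which components were multiplied by -1
res$corrs   # component-to-scale correlations
```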
Note
Based on the
methods proposed by Malcolm M. Dow and E. Anthon Eff.
Eff, E. A. (2010). A scale for markets and property using the Standard Cross-Cultural Sample: a linear programming approach. World Cultures eJournal. 17(2). Retrieved from: http://escholarship.org/uc/item/12k7z4st
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
scnn<-c(femecon, "v1649", "v1127", "v2137",
"v1265")
smi<-doMI(scnn, nimp=10, maxit=7)
fec<-mkscale(compvarbs=femecon, udnavn="femecon.lp", impdata=smi,
type="LP", add.descrip="female economic contribution (LP scale)")
#--check reasonableness of scale--
fec$stats
fec$corrs
smi[,names(fec$scales)]<-fec$scales
doOLS Estimate OLS model on multiply imputed data
Description
The function estimates unrestricted and restricted OLS models, with a network lag term, and provides common diagnostics.
Usage
doOLS(MIdata, depvar,
indpv, rindpv=NULL, othexog=NULL, dw=TRUE,
lw=TRUE, ew=FALSE, stepW=FALSE, relimp=FALSE, slmtests=FALSE, haustest=NULL,
boxcox=FALSE, getismat=FALSE, mean.data=TRUE, doboot=0, full.set=FALSE)
Arguments
MIdata |
a multiply imputed dataset, created by the function doMI |
depvar |
the name of the dependent variable (must be in MIdata) |
indpv |
the names of the independent variables for the unrestricted model (must be in MIdata) |
rindpv |
names of restricted model independent variables (must be in indpv; when default of NULL is executed, the restricted model independent variables will be the same as the unrestricted model, minus the last variable) |
othexog |
names of additional exogenous variables (must be in MIdata; will be added to a list of 21 variables; default is NULL) |
dw |
Should geographic proximity be used in constructing composite weight matrix (default=TRUE) |
lw |
Should linguistic proximity be used in constructing composite weight matrix (default=TRUE) |
ew |
Should ecological proximity be used in constructing composite weight matrix (default=FALSE) |
stepW |
Should stepwise regression be done to show most-selected variables from unrestricted model (default=FALSE) |
relimp |
Should relative importance be calculated for independent variables of restricted model (default=FALSE) |
slmtests |
Should spatial error tests be run for the three weight matrices (default=FALSE) |
haustest |
Hausman tests (H0: variable exogenous) are run for each independent variable listed here (variable must be in the restricted model). Default of NULL runs no tests. |
boxcox |
When boxcox=TRUE, a Box-Cox transformation is applied to the dependent variable, to make residuals as normal as possible. Default is FALSE. |
getismat |
When getismat=TRUE, the distance weight matrix is modified in the way suggested by Getis and Aldstadt (2004). Default is FALSE. |
mean.data |
When mean.data=TRUE (the default), output file includes a dataframe with mean values (across imputations) of the unrestricted model variables for each society, as well as significant dfbeta scores for restricted model independent variables, and latitude and longitude. mean.data=FALSE returns the entire, unaggregated set of data. |
doboot |
Enter the number of bootstrap repetitions to calculate bootstrap standard errors. Legal values lie between 10 and 10,000. The default (doboot=0) does not calculate bootstrap standard errors. |
full.set |
The default uses von Hippel’s recommended method of deleting observations for which the dependent variable is missing. To use all observations, use full.set=TRUE. |
Value
Returns a list with 14 elements:
DependVarb |
Description of dependent variable |
URmodel |
Coefficient estimates from the unrestricted model (includes standardized coefficients and VIFs). Two p-values are given for H0: β = 0. One is the usual p-value; the other (hcpval) is heteroskedasticity consistent. If stepW=TRUE, the table will also include the proportion of times a variable is retained in the model using stepwise regression (column "stepkept"). |
model.varbs |
Short descriptions of model variables: shows the meaning of the lowest and highest values of the variable. This can save a trip to the codebook. |
Rmodel |
Coefficient estimates from the restricted model. If relimp=TRUE, the R2 assigned to each independent variable is shown here. |
EndogeneityTests |
Hausman tests (H0: variable is exogenous), with an F-statistic for weak instruments (a rule of thumb is that the instrument is weak if the F-stat is below 10), and a Sargan test (H0: instrument is uncorrelated with second-stage 2SLS residuals). |
Diagnostics |
Regression diagnostics for the restricted model: RESET test (H0: model has correct functional form); Wald test (H0: appropriate variables dropped); Breusch-Pagan test (H0: residuals homoskedastic); Shapiro-Wilk test (H0: residuals normal); Hausman test (H0: Wy is exogenous); Sargan test (H0: residuals uncorrelated with instruments for Wy). If slmtests=TRUE, the Lagrange multiplier tests (H0: spatial error model not appropriate) are reported here. |
OtherStats |
Other statistics: Composite weight matrix weights (see details); R2 for restricted model and unrestricted model; number of imputations; number of observations; Fstat for weak instruments for Wy. |
DescripStats.ImputedData |
Descriptive statistics for variables in the unrestricted model, computed from the imputed data. |
DescripStats.OriginalData |
Descriptive statistics for variables in the unrestricted model, computed from the original (non-imputed) data. |
totry |
Character string of variables that were most significant in the unrestricted model as well as additional variables that proved significant using the add1 function on the restricted model. |
didwell |
Character string of variables that were most significant in the unrestricted model. |
usedthese |
Table showing how observations used differ from observations not used, regarding ecology, continent, and subsistence. |
dfbetas |
Influential observations for dfbetas (see details) |
data |
Data as used in the estimations. Observations with missing values of the dependent variable have been dropped. |
Details
Users can choose any of three kinds of proximity/similarity weight matrices for constructing a network lag term: geographic, linguistic, and ecological. In most cases, users should choose only geographic and linguistic (the defaults). The optimal composite weight matrix, constructed as the weighted sum of the chosen weight matrices, is that which returns the most significant Lagrange multiplier statistic on the unrestricted model without network lag term (i.e., the composite matrix that finds the most autocorrelated structure in the unrestricted model residuals). The network lag term is entered in each model as the variable "Wy".
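The weight-selection step can be sketched as a grid search. The sketch assumes a Moran-type Lagrange multiplier statistic for spatial error dependence, LM = (e'We / (e'e/n))^2 / tr(W'W + WW), computed on residuals e from the model without the network lag; a composite matrix a*Wd + (1 - a)*Wl that is then row-standardized; and a grid of candidate weights a. doOLS's own search may differ in each of these choices. The proximity matrices below are simulated stand-ins for wdd and wll, and all function names are hypothetical.

```r
# Lagrange multiplier statistic for spatial error dependence in OLS residuals
lm_error_stat <- function(e, W) {
  n <- length(e)
  moran_term <- as.numeric(crossprod(e, W %*% e)) / (sum(e^2) / n)
  moran_term^2 / (sum(W^2) + sum(W * t(W)))    # tr(W'W) + tr(WW)
}

row_std <- function(W) { diag(W) <- 0; W / rowSums(W) }

# Grid search over the weight a on distance (1 - a goes to language)
choose_composite_W <- function(fit, Wd, Wl, grid = seq(0, 1, by = 0.1)) {
  e <- residuals(fit)
  stats <- vapply(grid, function(a)
    lm_error_stat(e, row_std(a * Wd + (1 - a) * Wl)), numeric(1))
  a <- grid[which.max(stats)]
  list(weight_d = a, LM = max(stats), W = row_std(a * Wd + (1 - a) * Wl))
}

set.seed(2)
n <- 30
xy <- cbind(runif(n), runif(n))
Wd <- 1 / as.matrix(dist(xy)); diag(Wd) <- 0   # inverse-distance proximity
lang <- rep(1:5, length.out = n)               # hypothetical language groups
Wl <- outer(lang, lang, "==") * 1
x <- rnorm(n); y <- 1 + 2 * x + rnorm(n)
cw <- choose_composite_W(lm(y ~ x), Wd, Wl)
Wy <- as.vector(cw$W %*% y)                    # the network lag term "Wy"
```

Because Wy depends on y, it is endogenous; that is why the Hausman and weak-instrument checks described in this section matter.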
The dfbetas are scaled changes in restricted model coefficient estimates caused by adding an observation to the restricted model. Negative values indicate that including that observation lowers the coefficient estimate; positive values indicate that inclusion raises the estimate. Only the most influential dfbetas are output.
The stepwise procedure can provide additional insight into which independent variables provide the best model fit. Since the imputed datasets differ slightly from each other, the variables selected by a stepwise procedure typically differ slightly for each imputed dataset. If the stepW=TRUE option is chosen, a column labeled "stepkept" will be added to the table reporting unrestricted model results. The column reports the proportion of times the independent variable was retained in the model by a stepwise procedure using both forward and backward selection.
The add1 function tests whether the members of a list of variables prove significant when added singly to a model. The list of variables includes all numeric variables in the imputed dataset, as well as squared terms of variables currently in the unrestricted regression. Variables proving significant in over 80 percent of the m estimated models (one per imputed dataset) are returned in the character string "totry".
Relative importance is a method of assigning R2 to each independent variable. The method repeatedly estimates a model, first with one independent variable, then with two, etc. and calculates the change in R2 as each variable is introduced. The order of entry is changed, and the process repeated, to consider all possible orders of entry. The relative importance measure is the average change in R2 when introducing an independent variable across all these different orders of entry. With large numbers of independent variables, the calculations are prohibitively slow. Setting relimp=TRUE will calculate the relative importance of independent variables in the restricted model, and report these in the column labeled "relimp".
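The averaging over orders of entry can be written out directly for a handful of regressors. This is a brute-force sketch of the relative importance decomposition described above (often called LMG), not the code used by doOLS; r2(), permutations() and lmg_relimp() are hypothetical helpers. The cost grows with the factorial of the number of regressors, which is why the calculation becomes slow. A useful check is that the shares sum to the full-model R2.

```r
# R^2 of y regressed on the named columns of X (0 for the empty model)
r2 <- function(y, X, vars) {
  if (length(vars) == 0) return(0)
  summary(lm(y ~ ., data = X[, vars, drop = FALSE]))$r.squared
}

# All orderings of a character vector (n! of them)
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in permutations(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}

# Average R^2 gain of each variable over all orders of entry
lmg_relimp <- function(y, X) {
  vars <- names(X)
  contrib <- setNames(numeric(length(vars)), vars)
  perms <- permutations(vars)
  for (ord in perms) {
    for (k in seq_along(ord)) {
      gain <- r2(y, X, ord[seq_len(k)]) - r2(y, X, ord[seq_len(k - 1)])
      contrib[ord[k]] <- contrib[ord[k]] + gain
    }
  }
  contrib / length(perms)
}

set.seed(3)
X <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
y <- 2 * X$x1 + X$x2 + rnorm(100)
lmg_relimp(y, X)
```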
Endogeneity is a recognized problem with network lag terms. The Hausman test for endogenous regressors is performed on Wy, which is replaced by an instrumental variable which is the fitted value from regressing Wy on the network lagged other independent variables. The instrumental variable should be highly correlated with the endogenous variable, but not correlated with the 2SLS second-stage residual. A test for the latter is the Sargan test, with H0: residuals are uncorrelated with instruments. A test for the former is to calculate the F-statistic with H0: the excluded instruments are irrelevant in the first-stage regression; the rule of thumb is that this "weak identification F-stat" should be larger than 10. Since the weak identification F-stat will be low if irrelevant instruments are chosen, a stepwise procedure is used to select among a set of possible instruments including both the network lagged independent variables and the climate and ecology variables.
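The three checks named above can be sketched with lm() alone. This is a generic regression-based version (a control-function Durbin-Wu-Hausman test, a first-stage F test on the excluded instruments, and the n*R2 Sargan statistic), not doOLS's implementation: here the instruments are supplied by hand rather than chosen stepwise, iv_diagnostics() is a hypothetical helper, and the simulated data, in which a shared shock u makes x endogenous, are made up for illustration.

```r
# y: outcome; x: suspected endogenous regressor (e.g. Wy)
# X: data.frame of exogenous regressors; Z: data.frame of excluded instruments
iv_diagnostics <- function(y, x, X, Z) {
  XZ <- cbind(X, Z)
  first <- lm(x ~ ., data = XZ)
  # Weak identification F: H0 excluded instruments irrelevant in stage one
  weakF <- anova(lm(x ~ ., data = X), first)$F[2]
  # Hausman (control function): H0 x exogenous, i.e. vhat not significant
  cf <- lm(y ~ ., data = cbind(X, x = x, vhat = residuals(first)))
  p_hausman <- coef(summary(cf))["vhat", "Pr(>|t|)"]
  # 2SLS by hand; residuals must use the actual x, not its fitted values
  b <- coef(lm(y ~ ., data = cbind(X, x = fitted(first))))
  u <- y - drop(cbind(1, as.matrix(X), x) %*% b)
  # Sargan: n * R^2 of 2SLS residuals on all instruments, df = #excluded - 1
  sargan <- length(y) * summary(lm(u ~ ., data = XZ))$r.squared
  df <- ncol(Z) - 1
  c(weakF = weakF, p_hausman = p_hausman, sargan = sargan,
    p_sargan = if (df > 0) pchisq(sargan, df, lower.tail = FALSE) else NA)
}

set.seed(6)
n <- 300
Z <- data.frame(z1 = rnorm(n), z2 = rnorm(n))
X <- data.frame(w = rnorm(n))
u <- rnorm(n)                                  # shock shared by x and y
x <- Z$z1 + Z$z2 + u + rnorm(n)
y <- 1 + 0.5 * x + X$w + 2 * u + rnorm(n)
iv_diagnostics(y, x, X, Z)
```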
All independent variables can be tested for endogeneity (squared variables are tested in their original form). For these, the potential instruments consist of the climate, location, and ecology variables, and stepwise regression is used to pick a significant subset. While these variables are certainly exogenous, they are unlikely to be good instruments, since finding good instruments is a process requiring a great deal of creativity and patience on the part of the econometrician, and is not something that can be automated. Thus, one should think carefully about variables that might serve as instruments for any variable one wishes to test for endogeneity, and include these in the othexog= option.
Heteroskedasticity biases the standard errors of estimated coefficients. If the Breusch-Pagan test rejects the null that errors are homoskedastic, one should use either the heteroskedasticity consistent p-values (hcpval) in the URmodel and Rmodel results, or the p-values from bootstrap standard errors. Bootstraps take a fairly long time to calculate, so one shouldn't set the number of repetitions too high; in most cases, good results can be obtained with doboot=500.
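A pairs (case-resampling) bootstrap is one common way to obtain bootstrap standard errors; the sketch below shows it for a single dataset with lm(). It is not doOLS's procedure: how doOLS resamples and combines results across imputed datasets is not described here, boot_se() is a hypothetical helper, and the p-values use a normal approximation, which is an assumption.

```r
# Pairs bootstrap: resample whole observations, refit, take sd of coefficients
boot_se <- function(formula, data, B = 500) {
  fit <- lm(formula, data = data)
  draws <- replicate(B, {
    idx <- sample.int(nrow(data), replace = TRUE)
    coef(lm(formula, data = data[idx, , drop = FALSE]))
  })                                   # p x B matrix of coefficient draws
  se <- apply(draws, 1, sd)
  cbind(estimate = coef(fit), boot.se = se,
        pval = 2 * pnorm(-abs(coef(fit) / se)))
}

set.seed(7)
d <- data.frame(x = runif(100, 0, 2))
d$y <- 1 + 0.5 * d$x + rnorm(100, sd = 0.5 + d$x)   # heteroskedastic errors
boot_se(y ~ x, d, B = 500)
```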
If the residuals are not normal, and neither new independent variables nor changes in functional form make them normal, one can use the Box-Cox transformation, in which the dependent variable y is replaced by (y^λ - 1)/λ, with λ chosen so as to make the residuals as normal as possible.
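The transformation and one way of choosing λ can be sketched as follows. Picking λ by maximizing the Shapiro-Wilk W of the OLS residuals over a grid is an assumption made for illustration (it takes the "as normal as possible" criterion literally); the more common choice maximizes the profile likelihood, as MASS::boxcox() does. Box-Cox requires a strictly positive dependent variable, and λ = 0 corresponds to the log limit. boxcox_transform() and choose_lambda_sw() are hypothetical helpers.

```r
# Box-Cox transform: (y^lambda - 1)/lambda, with log(y) as the lambda -> 0 limit
boxcox_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Choose lambda on a grid by maximizing Shapiro-Wilk W of the OLS residuals
choose_lambda_sw <- function(y, X, grid = seq(-2, 2, by = 0.1)) {
  stopifnot(all(y > 0))
  W <- vapply(grid, function(l) {
    e <- residuals(lm(boxcox_transform(y, l) ~ ., data = X))
    unname(shapiro.test(e)$statistic)
  }, numeric(1))
  grid[which.max(W)]
}

set.seed(4)
X <- data.frame(x = runif(200, 1, 3))
y <- exp(0.5 * X$x + rnorm(200, sd = 0.6))   # log-normal errors favor lambda near 0
lam <- choose_lambda_sw(y, X)
```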
Note
Based on the methods proposed by Malcolm M. Dow and E. Anthon Eff.
Getis, A., & Aldstadt, J. (2004). Constructing the spatial weights matrix using a local statistic. Geographical Analysis, 36, 90-104.
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
scnn<-c("valchild", "v1649", "v1127", "v2137",
"v1265", "v245.d2")
smi<-doMI(scnn, nimp=10, maxit=7)
iv<-c("v1649", "v1127", "v2137",
"v1265", "v245.d2")
riv<- c("v1649",
"v1127", "v2137")
h<-doOLS(MIdata=smi, depvar="valchild", indpv=iv, rindpv=riv, othexog=NULL, dw=TRUE, lw=TRUE, ew=FALSE, stepW=FALSE, relimp=FALSE, slmtests=FALSE, haustest=NULL, boxcox=FALSE, getismat=FALSE, mean.data=TRUE, doboot=0, full.set=FALSE)
# look at first 11 elements in h
h[1:11]
doLogit Estimate logit model on multiply imputed data
Description
The function estimates unrestricted and restricted logit models in a multiple imputation environment, with a network lag term, and provides common diagnostics.
Usage
doLogit(MIdata, depvar,
indpv, rindpv=NULL, dw=TRUE, lw=TRUE, ew=FALSE, doboot=500, mean.data=TRUE, getismat=FALSE, othexog=NULL, full.set=FALSE)
Arguments
MIdata |
a multiply imputed dataset, created by the function doMI |
depvar |
the name of the dependent variable (must be in MIdata) |
indpv |
the names of the independent variables for the unrestricted model (must be in MIdata) |
rindpv |
names of restricted model independent variables (must be in indpv; when default of NULL is executed, the restricted model independent variables will be the same as the unrestricted model, minus the last variable) |
dw |
Should geographic proximity be used in constructing composite weight matrix (default=TRUE) |
lw |
Should linguistic proximity be used in constructing composite weight matrix (default=TRUE) |
ew |
Should ecological proximity be used in constructing composite weight matrix (default=FALSE) |
doboot |
Enter the number of bootstrap repetitions to calculate bootstrap standard errors. Legal values lie between 10 and 10,000. The default (doboot=500) is usually sufficient. |
mean.data |
When mean.data=TRUE (the default), output file includes a dataframe with mean values (across imputations) of the unrestricted model variables for each society, as well as predicted value and residuals for the restricted model, and latitude and longitude. mean.data=FALSE returns the entire, unaggregated set of data. |
getismat |
When getismat=TRUE, the distance weight matrix is modified in the way suggested by Getis and Aldstadt (2004). Default is FALSE. |
othexog |
names of additional exogenous variables (must be in MIdata; will be added to a list of 21 variables; default is NULL) |
full.set |
The default uses von Hippel’s recommended method of deleting observations for which the dependent variable is missing. To use all observations, use full.set=TRUE. |
Value
Returns a list with 8 elements:
DependVarb |
Description of dependent variable |
URmodel |
Coefficient estimates from the unrestricted model; p-values are from bootstrap standard errors. |
model.varbs |
Short description of model variables. Can save a trip to the codebook. |
Rmodel |
Coefficient estimates from the restricted model. |
Diagnostics1 |
Three likelihood ratio tests: LRtestNull-R (H0: all variables in restricted model have coefficients equal zero); LRtestNull-UR (H0: all variables in unrestricted model have coefficients equal zero); LRtestR-R (H0: variables in unrestricted model, not carried over to restricted model, have coefficients equal zero). One Wald test: waldtest-R (H0: variables in unrestricted model, not carried over to restricted model, have coefficients equal zero). |
Diagnostics2 |
Statistics without formal hypothesis tests. pLargest: the larger of the proportion of 1s and the proportion of 0s; the model should be able to outperform simply picking the most common outcome. pRight: proportion of fitted values that equal the actual value of the dependent variable. NetpRight = pRight - pLargest; this is positive in a good model. McIntosh.Dorfman: (num. correct 0s/num. 0s) + (num. correct 1s/num. 1s); this exceeds one in a good model. McFadden.R2 and Nagelkerke.R2 are two versions of pseudo R2. |
OtherStats |
Other statistics: Composite weight matrix weights; number of imputations; number of observations. |
data |
Data as used in the estimations. Observations with missing values of the dependent variable have been dropped. |
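The Diagnostics2 statistics can be computed from any fitted glm logit, as in the sketch below. It is written from the definitions above, not taken from doLogit: classifying a fitted probability of 0.5 or more as a predicted 1 is an assumption, the null log-likelihood comes from the intercept-only model, Nagelkerke's R2 is the Cox-Snell R2 divided by its maximum attainable value, and logit_fit_stats() is a hypothetical helper.

```r
logit_fit_stats <- function(fit) {
  y <- fit$y                           # 0/1 response used in the fit
  p <- fitted(fit)
  yhat <- as.integer(p >= 0.5)         # assumed classification threshold
  pLargest <- max(mean(y), 1 - mean(y))
  pRight <- mean(yhat == y)
  md <- sum(yhat == 0 & y == 0) / sum(y == 0) +
        sum(yhat == 1 & y == 1) / sum(y == 1)
  n <- length(y)
  ll1 <- as.numeric(logLik(fit))
  ybar <- mean(y)
  ll0 <- sum(y * log(ybar) + (1 - y) * log(1 - ybar))   # intercept-only model
  cox_snell <- 1 - exp(2 * (ll0 - ll1) / n)
  c(pLargest = pLargest, pRight = pRight, NetpRight = pRight - pLargest,
    McIntosh.Dorfman = md, McFadden.R2 = 1 - ll1 / ll0,
    Nagelkerke.R2 = cox_snell / (1 - exp(2 * ll0 / n)))
}

set.seed(5)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 1.5 * x))
logit_fit_stats(glm(y ~ x, family = binomial))
```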
Details
Users can choose any of three kinds of proximity/similarity weight matrices for constructing a network lag term: geographic, linguistic, and ecological. In most cases, users should choose only geographic and linguistic (the defaults). The optimal composite weight matrix, constructed as the weighted sum of the chosen weight matrices, is that which returns the most significant Lagrange multiplier statistic on the unrestricted model without network lag term, estimated with OLS. The network lag term is entered in each model as the variable "Wy".
Endogeneity is a recognized problem with network lag terms. In the logit context, the network lag term will generate incorrect standard errors, so that the only legitimate p-values will be those coming from bootstrap standard errors. Bootstraps take a fairly long time to calculate, so one shouldn't set the number of repetitions too high; in most cases, good results can be obtained with doboot=500 (the default).
Note
Based on the methods proposed by Malcolm M. Dow and E. Anthon Eff.
Getis, A., & Aldstadt, J. (2004). Constructing the spatial weights matrix using a local statistic. Geographical Analysis, 36, 90-104.
McFadden, D. (1973). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in Econometrics. New York: Academic Press.
McIntosh, C. S., & Dorfman, J. H. (1992). Qualitative forecast evaluation: A test for information value. American Journal of Agricultural Economics, 74, 209-214.
Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78, 691-692.
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
dpV<-"v67.d3"
UiV<-c("v2002.d2", "v1845", "v1649", "v1127.d2", "v2137",
       "v279.d5", "v213.d3", "v1265", "v1",
       "v234", "femecon.lp", "rectang")
RiV<-c("v1649", "v1127.d2", "v2137", "v1265")
q<-doLogit(smi, depvar=dpV, indpv=UiV, rindpv=RiV, dw=TRUE, lw=TRUE, ew=FALSE,
           doboot=1000, mean.data=TRUE, getismat=FALSE, othexog=NULL)
#--look at first seven objects in q--
q[1:7]
doMNLogit Estimate multinomial logit model on multiply imputed data
Description
The function estimates unrestricted and restricted multinomial logit models in a multiple imputation environment, with a network lag term, providing marginal effects and a few common diagnostics. It is to be used when the dependent variable is categorical with three or more categories.
Usage
doMNLogit(MIdata,depvar,indpv,rindpv=NULL,dw=TRUE,lw=TRUE,doboot=200,subgrps=NULL,full.set=FALSE)
Arguments
MIdata |
a multiply imputed dataset, created by the function doMI |
depvar |
the name of the dependent variable (must be a categorical variable in MIdata) |
indpv |
the names of the independent variables for the unrestricted model (must be in MIdata) |
rindpv |
names of the restricted model independent variables (must be in indpv; with the default NULL, the restricted model uses the same independent variables as the unrestricted model, minus the last variable) |
dw |
Should geographic proximity be used in constructing the composite weight matrix? (default=TRUE) |
lw |
Should linguistic proximity be used in constructing the composite weight matrix? (default=TRUE) |
doboot |
Enter the number of bootstrap repetitions to calculate bootstrap standard errors. Legal values lie between 10 and 10,000. The default is 200. |
subgrps |
The name of a dummy variable, present in MIdata, used to compare mean marginal effects between the two subsets of the data that it defines. The default (NULL) does not divide the data. |
full.set |
The default uses von Hippel’s recommended method of deleting observations for which the dependent variable is missing. To use all observations, use full.set=TRUE. |
Value
Returns a list with 23 elements:
DependVarb |
Description of dependent variable |
URmeanME.MargEff |
Mean marginal effects for unrestricted model, with Fst, df, and p-value |
URmeanME.MEpval |
Mean marginal effects for unrestricted model: p-values only. |
URmeanME.MEmean |
Mean marginal effects for unrestricted model: means only. |
RmeanME.MargEff |
Mean marginal effects for restricted model, with Fst, df, and p-value |
RmeanME.MEpval |
Mean marginal effects for restricted model: p-values only. |
RmeanME.MEmean |
Mean marginal effects for restricted model: means only. |
URdifME |
Differences in mean marginal effects across alternatives: unrestricted model. |
RdifME |
Differences in mean marginal effects across alternatives: restricted model. |
URcoef |
Coefficient estimates from the unrestricted model. |
Rcoef |
Coefficient estimates from the restricted model. |
TestRestr |
Two tests of the model restrictions (H0: the dropped variables do not belong in the model). |
TestIIA |
Tests of Independence of Irrelevant Alternatives, one for each alternative (H0: dropping the alternative does not affect the choice among the remaining alternatives). |
URpredTable.predTable |
Table comparing predicted choices with actual choices: unrestricted model. |
URpredTable.crlg |
Ratio of the number of correct choices to the number in the largest alternative: unrestricted model. |
RpredTable.predTable |
Table comparing predicted choices with actual choices: restricted model. |
RpredTable.crlg |
Ratio of the number of correct choices to the number in the largest alternative: restricted model. |
OtherStats |
Other statistics: composite weight matrix weights; ratio of the number of correct predictions to the number in the largest category; number of imputations; number of observations; number of bootstrap iterations. |
UsubgrpDiff |
Comparison of mean marginal effects across the two subgroups indicated by a 0/1 binary variable: mean of group 1 minus mean of group 0, with p-value. Unrestricted model. |
RsubgrpDiff |
Comparison of mean marginal effects across the two subgroups indicated by a 0/1 binary variable: mean of group 1 minus mean of group 0, with p-value. Restricted model. |
URmarEff |
Society-level marginal effects calculated using final coefficient values and mean (across imputations) data values: unrestricted model. |
RmarEff |
Society-level marginal effects calculated using final coefficient values and mean (across imputations) data values: restricted model. |
data |
Mean (across imputations) data values for each society. |
Details
A network lag term is constructed by combining geographic and linguistic proximity matrices. The optimal composite weight matrix, constructed as the weighted sum of the chosen weight matrices, is the one that returns the highest log likelihood ratio on the unrestricted model. The network lag term is entered in each model as the variable "Wy".
Endogeneity is a recognized problem with network lag terms. In the multinomial logit context, the network lag term will generate incorrect standard errors, so that the only legitimate p-values will be those coming from bootstrap standard errors. These bootstraps take a very long time to calculate, so one shouldn't set the number of repetitions too high. The default is doboot=200, but 300 to 1000 should be used for published work.
The signs of coefficient estimates are not meaningful in multinomial logit models, since the marginal effects are a function of all coefficient values and data values. The marginal effects will be unique for each society, for each variable, for each alternative. It is traditional to take the mean marginal effect, for each variable, for each alternative (i.e., take the mean across societies) and use bootstrapping to test whether the marginal effect is significantly different from zero.
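Why coefficient signs can mislead: with alternative 1 as the base (coefficients fixed at zero), the marginal effect of variable k on the probability of alternative j for society i is P_ij (b_jk - sum_m P_im b_mk). The base-R sketch below, with hypothetical helper names and made-up coefficients, computes these society-level effects and their means. Alternative 2 has a positive coefficient on x, yet a negative marginal effect for every society, because alternative 3's coefficient is larger.

```r
# X: n x K design matrix (first column = intercept)
# B: K x J coefficient matrix; first column all zeros (base alternative)
mnl_probs <- function(X, B) {
  eta <- X %*% B
  eta <- eta - apply(eta, 1, max)        # subtract row max for numerical stability
  P <- exp(eta)
  P / rowSums(P)
}

# Society-level marginal effects of variable k on each alternative:
# dP_ij/dx_ik = P_ij * (B[k, j] - sum_m P_im * B[k, m])
mnl_marg_eff <- function(X, B, k) {
  P <- mnl_probs(X, B)
  bbar <- as.vector(P %*% B[k, ])        # probability-weighted mean coefficient
  P * (matrix(B[k, ], nrow(P), ncol(B), byrow = TRUE) - bbar)
}

B <- cbind(c(0, 0), c(0, 1), c(0, 3))    # K = 2 (intercept, x), J = 3 alternatives
X <- cbind(1, seq(0.5, 2, length.out = 20))
me <- mnl_marg_eff(X, B, k = 2)
colMeans(me)                             # mean marginal effect of x, by alternative
```

The effects for each society sum to zero across alternatives, since the probabilities always sum to one.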
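Occasionally, one might be interested in how marginal effects vary between two subsets of the data; for example, one might want to compare the marginal effects for foragers versus non-foragers. The subgrps option supports this: it takes the name of a 0/1 dummy variable and reports the mean marginal effect for group 1 minus that for group 0, with a p-value (elements UsubgrpDiff and RsubgrpDiff).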
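A sketch of the group 1 minus group 0 comparison for one variable and one alternative, given society-level marginal effects and a 0/1 dummy. It uses a permutation p-value as a stand-in; the package's own test for the subgroup difference may be bootstrap-based. The helper subgrp_diff and the simulated data are hypothetical.

```r
# Hypothetical helper: difference in mean marginal effects, group 1 minus group 0,
# with a two-sided permutation p-value.
subgrp_diff <- function(me, grp, nperm = 2000) {
  obs <- mean(me[grp == 1]) - mean(me[grp == 0])
  perm <- replicate(nperm, {
    g <- sample(grp)                     # shuffle group labels
    mean(me[g == 1]) - mean(me[g == 0])
  })
  c(diff = obs, pvalue = (sum(abs(perm) >= abs(obs)) + 1) / (nperm + 1))
}

set.seed(4)
grp <- rep(0:1, each = 30)               # e.g. 1 = forager, 0 = non-forager
me <- rnorm(60, mean = 0.05 + 0.10 * grp, sd = 0.03)
out <- subgrp_diff(me, grp)
```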
Note
Based on the methods proposed by Malcolm M. Dow and E. Anthon Eff.
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
dpV<-"residence"
UiV<-c("enviro.mean","anim.mean","path.mean","localviol.mean","femecon.mean","tech")
RiV<-c("anim.mean","localviol.mean","femecon.mean","tech")
h<-doMNLogit(smi,dpV,UiV,RiV,doboot=300,subgrps="nomadic")
CSVwrite(h,"mnl0",FALSE)
MEplots(h,mod="R",filetitle="nom",setylim=RiV,subgrps="nomadic",dpires=300)
CSVwrite Write object to *.csv file
Description
The function writes to a csv file an object whose elements can be coerced to a dataframe. It is used to write the output from doOLS, doLogit, or doMNLogit to a file that can be read by a spreadsheet.
Usage
CSVwrite(object, filestem, appnd2=FALSE)
Arguments
object |
Object to be written, typically output from function doOLS or doLogit |
filestem |
The base name of the *.csv file (do not include the ".csv" extension) |
appnd2 |
Should the object be appended to the existing file? (default=FALSE) |
Value
No values are returned in the R environment; only changes occur to the specified *.csv file.
Details
Set the option appnd2=TRUE to append the output of object to an existing file with base name "filestem". The default will simply overwrite any existing csv file with base name "filestem".
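The overwrite/append behavior can be approximated in base R with utils::write.table, writing each list element's name followed by the element as a comma-separated table. This is a hypothetical stand-in (csv_write_sketch), not CSVwrite's source; the layout of the real output file may differ.

```r
# Hypothetical stand-in for CSVwrite: one labelled table per list element.
csv_write_sketch <- function(object, filestem, appnd2 = FALSE) {
  file <- paste0(filestem, ".csv")
  if (!appnd2 && file.exists(file)) file.remove(file)    # default: overwrite
  for (nm in names(object)) {
    el <- object[[nm]]
    el <- if (is.null(dim(el))) data.frame(value = el, row.names = names(el)) else as.data.frame(el)
    cat(nm, "\n", file = file, append = TRUE, sep = "")  # element label row
    # col.names = NA writes a blank header over the row-name column;
    # appending column names triggers a warning, which is suppressed here
    suppressWarnings(write.table(el, file = file, sep = ",", append = TRUE,
                                 col.names = NA, row.names = TRUE))
  }
  invisible(NULL)
}

res <- list(coef = c(a = 1.5, b = -0.2),
            fit = matrix(1:4, 2, dimnames = list(c("r1", "r2"), c("c1", "c2"))))
stem <- file.path(tempdir(), "olsresults")
csv_write_sketch(res, stem)                  # creates or overwrites olsresults.csv
csv_write_sketch(res, stem, appnd2 = TRUE)   # appends a second copy
lines <- readLines(paste0(stem, ".csv"))
```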
Note
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
CSVwrite(h, "olsresults", FALSE)
mkmappng Create png format map for values of ordinal variable
Description
This function writes a png format Pacific-centered world map file to the working directory. Dots represent societies, and the size and color of the dots reflect the value of a variable specified by the user. Options allow presentation of information about local autocorrelation and dfbetas.
Usage
mkmappng (usedata, varb, filetitle=NULL, show="ydata", numnb.lg=3, numnb.lm=20, numch=0, pvlm=.05, dfbeta.show=FALSE, zoom=FALSE, map.width=8, map.height=5, map.units="in", map.pointsize=10, map.res=500)
Arguments
usedata |
Name of a dataframe. It must contain a column named "lati" and a column named "long" (latitude and longitude in decimal degrees) |
varb |
Name of a variable in the dataframe. |
filetitle |
Stem title of png file (".png" suffix added automatically). Default is same as varb. |
show |
Type of value to display. Legal values are "lgt" (local G), "ydata" (original data values), "lmtp" (classifies points into significant and non-significant local autocorrelation, based on local Moran), and "lmtz" (local Moran z-value). Default is "ydata". |
numnb.lg |
Number of nearest neighbors to use when calculating local G. Default is 3. |
numnb.lm |
Number of nearest neighbors to use when calculating local Moran. Default is 20. |
numch |
Number of convex hulls to draw around regions of local autocorrelation. Default is 0. |
pvlm |
Cut-off p-value for considering a local Moran statistic significant. Default is 0.05. |
dfbeta.show |
Should the map indicate points with significant dfbeta values for this variable? Default is FALSE. |
zoom |
Should the map zoom in to the plotted points? Default is FALSE. Set to TRUE when using WNAI data. |
map.width |
Parameter for png map file: width of map. Default is 8. |
map.height |
Parameter for png map file: height of map. Default is 5. |
map.units |
Parameter for png map file: units in which width and height are measured. Default is "in". |
map.pointsize |
Parameter for png map file: pointsize. Default is 10. |
map.res |
Parameter for png map file: resolution of map file. Default is 500 dpi. |
Value
The function writes a png format map to a file in the working directory. Larger values of the mapped variable are shown as larger and darker (reddish) circles; smaller values are shown as smaller and lighter (yellowish) circles.
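A sketch of turning values into the symbol sizes and yellow-to-red colors described above, plus one common way to Pacific-center longitudes (adding 360 to negative longitudes). The rank-based classes, the number of classes (5), the size range, and the helper name point_style are assumptions, not mkmappng's documented internals.

```r
# Hypothetical helper: plotting attributes for a Pacific-centered point map.
point_style <- function(v, long, nclass = 5) {
  cls <- cut(rank(v, ties.method = "average"), breaks = nclass, labels = FALSE)
  pal <- colorRampPalette(c("yellow", "red"))(nclass)   # light to dark
  data.frame(x = ifelse(long < 0, long + 360, long),    # Pacific-centered longitude
             col = pal[cls],                            # larger value = redder
             cex = seq(0.6, 2, length.out = nclass)[cls])  # larger value = bigger
}

v <- c(3, 1, 4, 1, 5, 9, 2, 6)
long <- c(-120, 150, 30, -60, 175, -170, 0, 90)
st <- point_style(v, long)
```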
Details
Option show="lgt" gives the local G statistic, which is essentially a spatial moving average converted to a z-score; it is a reasonable way to smooth map points spatially. The default uses only the three nearest neighbors, plus self, to calculate this spatial moving average.
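The local G statistic with self included (Getis-Ord Gi*) can be sketched in base R with binary weights on self plus the k nearest neighbors (k = 3, matching the numnb.lg default). Assumptions: Euclidean distance on the supplied coordinates (great-circle distance would be more accurate for latitude/longitude) and the standard Ord-Getis z-score form; local_g_star is a hypothetical helper.

```r
# Hypothetical helper: standardized Gi* with binary weights (self + k nearest).
local_g_star <- function(x, coords, k = 3) {
  n <- length(x)
  d <- as.matrix(dist(coords))                      # Euclidean distances
  w <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nb <- c(i, setdiff(order(d[i, ]), i)[seq_len(k)])  # self plus k nearest
    w[i, nb] <- 1
  }
  xbar <- mean(x)
  s <- sqrt(mean(x^2) - xbar^2)
  wsum <- rowSums(w)
  num <- as.vector(w %*% x) - xbar * wsum           # local sum minus its expectation
  den <- s * sqrt((n * rowSums(w^2) - wsum^2) / (n - 1))
  num / den                                         # z-score
}

coords <- cbind(1:10, 0)                            # ten societies on a line
x <- c(10, 10, 10, 1, 1, 1, 1, 1, 1, 1)             # high values clustered at one end
g <- local_g_star(x, coords, k = 3)
```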
The local Moran is a test for autocorrelation, i.e. the degree to which a society has values similar to those of its neighbors, where the default number of neighbors is 20. Option show=lmtz will display the local Moran z-score, and option show=lmtp displays the binary significant/not significant for the z-score, using the p-value given in option pvlm. Convex hulls are drawn around areas of significant positive local autocorrelation; one must input the number of convex hulls to draw, but otherwise assignment of a point to a specific convex hull is automatic, based on distances between points. Usually some experimentation is needed to find the correct number of convex hulls, and it is easiest to do this experimentation on maps where show=lmtp.
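A sketch of the local Moran with k nearest neighbors (row-standardized weights) and conditional-permutation inference. The permutation pseudo z-score and one-sided p-value are swapped-in techniques; mkmappng may compute the z-value analytically. The helper local_moran and its nsim argument are hypothetical; pvlm plays the role of the option of the same name, and sigpos mimics the significant/non-significant classification of show="lmtp".

```r
# Hypothetical helper: local Moran I_i with conditional-permutation inference.
local_moran <- function(x, coords, k = 20, nsim = 999, pvlm = 0.05) {
  n <- length(x)
  z <- (x - mean(x)) / sd(x)
  d <- as.matrix(dist(coords))
  nbs <- lapply(seq_len(n), function(i) setdiff(order(d[i, ]), i)[seq_len(k)])
  Ii <- sapply(seq_len(n), function(i) z[i] * mean(z[nbs[[i]]]))  # row-standardized
  stat <- t(sapply(seq_len(n), function(i) {
    # hold z[i] fixed and draw k random "neighbors" from the other societies
    sims <- replicate(nsim, z[i] * mean(sample(z[-i], k)))
    c(zval = (Ii[i] - mean(sims)) / sd(sims),
      pval = (sum(sims >= Ii[i]) + 1) / (nsim + 1))  # one-sided: positive autocorrelation
  }))
  data.frame(Ii = Ii, zval = stat[, "zval"], pval = stat[, "pval"],
             sigpos = Ii > 0 & stat[, "pval"] < pvlm)
}

set.seed(5)
coords <- cbind(c(runif(20, 0, 10), runif(20, 100, 110)), runif(40, 0, 10))
x <- c(rnorm(20, 10, 0.1), rnorm(20, 0, 0.1))      # two internally similar clusters
lmo <- local_moran(x, coords, k = 5, nsim = 499)
```

Significant points (sigpos) are the candidates around which convex hulls would be drawn, for example with grDevices::chull().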
This function is intended for use with data relevant to models estimated by the function doOLS. The function doOLS has the option mean.data; when this is set to TRUE (the default), the output from doOLS contains a dataframe with values for the dependent and independent variables (including Wy) calculated as the mean across all imputed datasets. This dataframe also contains latitude and longitude coordinates and the mean values of the dfbetas for variables used in the restricted model. Societies whose inclusion causes a significant change in an estimated parameter of the restricted model can be shown on the map when dfbeta.show=TRUE. Triangles pointing upward indicate societies whose inclusion significantly increases the value of the coefficient; triangles pointing downward indicate societies whose inclusion significantly lowers it.
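Scaled dfbetas for one coefficient can be computed by refitting without each society in turn: DFBETAS_i = (b - b_(i)) / (s_(i) sqrt([(X'X)^-1]_jj)), so a positive value means that including society i raises the coefficient. The sketch flags |DFBETAS| > 2/sqrt(n), a common rule of thumb; the package's significance criterion is not stated here, and dfbetas_manual is a hypothetical helper. Its magnitudes match stats::dfbetas.

```r
# Hypothetical helper: scaled dfbetas for coefficient j via leave-one-out refits.
dfbetas_manual <- function(y, X, j) {
  n <- length(y)
  b <- lm.fit(X, y)$coefficients
  xtx_jj <- diag(solve(crossprod(X)))[j]
  sapply(seq_len(n), function(i) {
    fit_i <- lm.fit(X[-i, , drop = FALSE], y[-i])
    s_i <- sqrt(sum(fit_i$residuals^2) / (n - 1 - ncol(X)))  # leave-one-out sigma
    as.numeric((b[j] - fit_i$coefficients[j]) / (s_i * sqrt(xtx_jj)))
  })
}

set.seed(6)
x <- 1:10
y <- x + rnorm(10, sd = 0.3)
y[10] <- 30                                   # one influential society
X <- cbind(1, x)
dfb <- dfbetas_manual(y, X, j = 2)            # effect on the slope
thr <- 2 / sqrt(length(y))
flag <- ifelse(dfb > thr, "up", ifelse(dfb < -thr, "down", "none"))  # triangle direction
```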
Note
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
dpV<-"v67.d3"
UiV<-c("v2002.d2", "v1845", "v1649", "v1127.d2", "v2137", "v279.d5", "v213.d3",
"v1265", "v1", "v234", "femecon.lp", "rectang")
RiV<-c("v1649", "v1127.d2", "v2137", "v1265")
h<-doOLS(MIdata=smi, depvar=dpV, indpv=UiV, rindpv=RiV, othexog=NULL,
dw=TRUE, lw=TRUE, ew=FALSE, stepW=TRUE, boxcox=FALSE, getismat=FALSE,
relimp=TRUE, slmtests=FALSE, haustest=NULL, mean.data=TRUE, doboot=500)
p<-h[[12]]
# experimenting to find the right number of convex hulls
sapply(2:11, function(x) mkmappng(p, "femecon.lp", paste("Womenswork", x, sep=""),
show="lmtp", numch=x, dfbeta.show=TRUE))
# creates file called "Womenswork_ydata.png"
mkmappng(usedata=p, varb="femecon.lp", filetitle="Womenswork", show="ydata", numch=8, dfbeta.show=TRUE)
mkcatmappng Create png format map for values of categorical variable
Description
This function writes a png format Pacific-centered world map file to the working directory. Symbols represent societies, and the shape and color of the symbols represent the categories of a variable specified by the user.
Usage
mkcatmappng (usedata, varb, filetitle, zoom=FALSE, map.width=8, map.height=5, map.units="in", map.pointsize=10, map.res=500)
Arguments
usedata |
Name of a dataframe. It must contain a column named "lati" and a column named "long" (latitude and longitude in decimal degrees) |
varb |
Name of a variable in the dataframe. |
filetitle |
Stem title of png file (".png" suffix added automatically). |
zoom |
Should the map zoom in to the plotted points? Default is FALSE. Set to TRUE when using WNAI data. |
map.width |
Parameter for png map file: width of map. Default is 8. |
map.height |
Parameter for png map file: height of map. Default is 5. |
map.units |
Parameter for png map file: units in which width and height are measured. Default is "in". |
map.pointsize |
Parameter for png map file: pointsize. Default is 10. |
map.res |
Parameter for png map file: resolution of map file. Default is 500 dpi. |
Value
The function writes a png format map to a file in the working directory. A legend identifies the category represented by each symbol.
Details
This function is intended for cases where the plotted variable is categorical. Symbols for each society have a color and shape representing the category, and a legend associates the symbols with the category labels. In general, this map will be most effective when the number of categories is small (six or fewer).
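A sketch of assigning one shape and one color per category, with a warning above six categories. The helper cat_symbols, the pch codes, and the rainbow() palette are assumptions; the resulting key could be passed to legend() (for example, legend("bottomleft", legend = key$level, pch = key$pch, col = key$col)).

```r
# Hypothetical helper: one plotting symbol and one color per category.
cat_symbols <- function(v, max_levels = 6) {
  f <- factor(v)
  k <- nlevels(f)
  if (k > max_levels) warning("more than ", max_levels, " categories; map may be hard to read")
  key <- data.frame(level = levels(f),
                    pch = rep_len(c(15:18, 21:25), k),  # distinct shapes
                    col = rainbow(k))                   # distinct colors
  idx <- as.integer(f)
  list(key = key, pch = key$pch[idx], col = key$col[idx])
}

s <- cat_symbols(c("patrilocal", "matrilocal", "patrilocal", "neolocal"))
```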
When using the WNAI data, one should set zoom=TRUE so that the map centers on western North America.
Note
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
mkcatmappng(dx,"ekd","Zekd",zoom=TRUE)
plotSq Make plots of marginal effects of all independent variables with squared terms
Description
The function takes output from doOLS or doLogit, scans the independent variables in the restricted model for variables with squared terms, and plots their marginal effects on the dependent variable.
Usage
plotSq(x,filetitle=NULL)
Arguments
x |
name of output from doOLS or doLogit |
filetitle |
name of png file (default=NULL will write plots to GUI) |
Value
The function creates plots of the marginal effects of all restricted model independent variables with squared terms.
Details
In a linear regression, the sign of the marginal effect is simply the sign of the coefficient. But with polynomial expressions, the sign of the marginal effect varies over the values of the independent variable. These plots show the pattern of variation in cases where an independent variable is entered as a quadratic or simply as a squared term. The abscissa gives the values of the variable found in the averaged data, while the ordinate gives the marginal effect on the dependent variable. The number of observations at each value is shown both by the green rug plots at the top of the plot and by the size of the red circles at each variable value.
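For y = b0 + b1 x + b2 x^2, the marginal effect is dy/dx = b1 + 2 b2 x, which changes sign at x = -b1/(2 b2). A base-R sketch on simulated data (the true turning point is 5), independent of plotSq itself:

```r
set.seed(7)
x <- runif(150, 0, 10)
y <- 2 + 1.5 * x - 0.15 * x^2 + rnorm(150, sd = 0.5)
fit <- lm(y ~ x + I(x^2))
b <- coef(fit)
me <- b[["x"]] + 2 * b[["I(x^2)"]] * x        # marginal effect at each observed x
turn <- -b[["x"]] / (2 * b[["I(x^2)"]])       # value of x where the effect changes sign
```

Plotting me against x, with rug(x) for the distribution of observations, gives a display similar in spirit to the one described above.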
One must specify the filetitle in order to save the plot to a png format file with name filetitle.png.
Note
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
plotSq(h)
MEplots Make plots of marginal effects of all independent variables used in doMNLogit estimation
Description
The function takes output from doMNLogit, and produces boxplots showing the range of marginal effects, by alternative, for each independent variable.
Usage
MEplots(x,mod="R",varbs=NULL,filetitle=NULL,setylim=NULL,subgrps=NULL,dpires=500)
Arguments
x |
name of output from doMNLogit |
mod |
"R" plots marginal effects from restricted model; "UR" from unrestricted |
varbs |
names of variables to plot. Default will plot all variables. |
filetitle |
name of png file (default=NULL will write plots to GUI) |
setylim |
list of independent variable names for which plots should have the same y-axis range |
subgrps |
If the subgrps option was used in doMNLogit, it can be given here as well to display separate boxplots for each subgroup. |
dpires |
set the dots per inch resolution of the png file (300 is the usual "publication quality", higher is even better). |
Value
The function creates boxplots of the society-level marginal effects, by alternative, for each selected independent variable in the chosen model (restricted or unrestricted, as set by mod).
Details
One must specify the filetitle in order to save the plots to a png format file named filetitle.png.
Note
Author(s)
Anthon Eff Anthon.Eff@mtsu.edu
Examples
MEplots(h)