Temporal autocorrelation occurs in time series when residuals are correlated with their immediately preceding values. Here is a graphical illustration of positive autocorrelation (this is much more common than negative autocorrelation). Note that each residual is likely to be of the same sign (positive or negative) as the immediately preceding residual.
Autocorrelation between a residual and its lagged-one value is first order autocorrelation; autocorrelation over two lags is termed second order, and so on.
#Autocorrelated time series
#U.S. farm population 1945-1986
#variables are FARMPOP, YEAR, and TIME
download.file("http://capone.mtsu.edu/eaeff/6060/farmpop.csv","z.csv") # downloads to working directory, gives name z.csv
uu<-read.csv("z.csv",as.is=TRUE)
head(uu)
## YEAR TIME FARMPOP
## 1 1945 1 0.205
## 2 1946 2 0.192
## 3 1947 3 0.179
## 4 1948 4 0.166
## 5 1949 5 0.162
## 6 1950 6 0.152
summary(fm<-lm(FARMPOP~TIME,data=uu))
##
## Call:
## lm(formula = FARMPOP ~ TIME, data = uu)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.019641 -0.014876 -0.004423 0.010468 0.043733
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1652044 0.0053783 30.72 <2e-16 ***
## TIME -0.0039375 0.0002179 -18.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01712 on 40 degrees of freedom
## Multiple R-squared: 0.8909, Adjusted R-squared: 0.8881
## F-statistic: 326.5 on 1 and 40 DF, p-value: < 2.2e-16
#--look at charts--
layout(matrix(1:2,1,2))
## visualize residuals
plot(uu$YEAR,residuals(fm),type="b",pch=19,ylim=range(residuals(fm)),ylab = "Residuals")
abline(h =0,lty=2,col="blue")
## visualize fitted series
plot(uu$YEAR,uu$FARMPOP,lty=1,lwd=1,type="b",pch=19,ylab="Pct. Population on Farm")
lines(uu$YEAR,fitted(fm), col ="red")
layout(1)
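The visual impression of positive autocorrelation can be checked numerically. A minimal check (not part of the original script) correlates the residuals of fm with their lag-one and lag-two values:
#--numerical check: first- and second-order autocorrelation of the residuals--
r<-residuals(fm)
cor(r[-1],r[-length(r)]) # first order: each residual vs. the one immediately before it
cor(r[-(1:2)],r[1:(length(r)-2)]) # second order: each residual vs. the one two periods before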
This Monte Carlo simulation runs through 5,000 iterations. Each time, the intercept is set to zero and the slope coefficient on the single independent variable is set to six. The independent variable is identical across iterations; only the error terms change, so the value of the dependent variable differs in each iteration.
We make two versions of the error term: one is autocorrelated, so that the error for an observation is significantly correlated with errors of previous observations. The other is not autocorrelated. We run two regressions, one with autocorrelated errors, the other with errors that are not autocorrelated. Each time we collect the estimated coefficients and standard errors.
#--Monte Carlo: Temporal Autocorrelation--
#---We make up our data, 5000 times---
#---Since we make the data, we KNOW our coefficients--
#---We estimate the coefficients and compare the estimates with true values--
est<-corvals<-NULL
nobs<-30
truecoef<-6
x<-sample(1:9,nobs,replace=TRUE)
for (i in 1:5000){
zerr<-ts(rnorm(nobs+4))
aerr<-ts(as.matrix(embed(zerr,5))%*%as.matrix((5:1)/sum(5:1)))
kk<-ts.intersect(aerr,zerr)
aerr<-as.numeric(scale(kk[,"aerr"])*12)#mean(aerr);sd(aerr) # give same mean and sd to both error terms
err<-as.numeric(scale(kk[,"zerr"])*12) #mean(err);sd(err)
y1<-x*truecoef+err
y2<-x*truecoef+aerr
ec<-ts.intersect(ts(err),lag(ts(err),-1))
aec<-ts.intersect(ts(aerr),lag(ts(aerr),-1))
corvals<-rbind(corvals,data.frame(NOac.corr=cor(ec)[1,2],ac.corr=cor(aec)[1,2]))
z1<-summary(lm(y1~x))
NoacBeta<-z1$coefficients[2,1]
NoacSE<-z1$coefficients[2,2]
z2<-summary(lm(y2~x))
acBeta<-z2$coefficients[2,1]
acSE<-z2$coefficients[2,2]
est<-rbind(est,data.frame(NoacBeta,NoacSE,acBeta,acSE))
}
First we want to verify that the error terms were indeed correlated as we intended (uncorrelated in one set of regressions, positively autocorrelated in the other).
apply(corvals,2,mean)
## NOac.corr ac.corr
## -0.03477958 0.65040639
Now we want to compare the estimated coefficients and standard errors with their true values. For the coefficients, we simply compare the true value (six) with the mean of the coefficients from the non-autocorrelated-residuals and autocorrelated-residuals regressions.
#-True value of coefficient is 6; compare that to the mean of the estimated coefficients--
apply(est[,c("NoacBeta","acBeta")],2,mean)
## NoacBeta acBeta
## 6.005553 5.996245
For the standard errors we want to compare the true value with the mean of the estimated standard errors. What is the true value? It is simply the standard deviation of the estimated coefficients.
#-True standard error is the standard deviation of the coefficient estimates--
#-Compare true standard error with the mean of the estimated standard errors--
apply(est[,c("NoacSE","acSE")],2,mean)
## NoacSE acSE
## 0.9001753 0.9082001
apply(est[,c("NoacBeta","acBeta")],2,sd)
## NoacBeta acBeta
## 0.9103229 0.6574592
We have 5,000 estimates of the slope coefficient; a histogram lets us see whether the estimates center on the true value.
#--check if the coefficient is biased---
xrange<-range(c(est$NoacBeta,est$acBeta,truecoef))*c(.9,1.1)
layout(matrix(1:2,1,2))
hist(est$NoacBeta,breaks=50,xlim=xrange)
lines(density(est$NoacBeta),col="blue")
abline(v=truecoef,col="red",lty=2,lwd=2)
hist(est$acBeta,breaks=50,xlim=xrange)
lines(density(est$acBeta),col="blue")
abline(v=truecoef,col="red",lty=2,lwd=2)
layout(1)
Next we can compare the estimated standard errors with the true value of the standard error (the standard deviation of the estimated coefficients).
#--check if the standard error is biased--
xrange<-range(c(est$NoacSE,est$acSE,sd(est$NoacBeta),sd(est$acBeta)))*c(.9,1.1)
layout(matrix(1:2,1,2))
hist(est$NoacSE,breaks=30,xlim=xrange)
mtext("true value: red line; mean estimate: blue line")
abline(v=sd(est$NoacBeta),col="red",lty=2,lwd=2)
abline(v=mean(est$NoacSE),col="blue",lty=3,lwd=2)
hist(est$acSE,breaks=30,xlim=xrange)
mtext("true value: red line; mean estimate: blue line")
abline(v=sd(est$acBeta),col="red",lty=2,lwd=2)
abline(v=mean(est$acSE),col="blue",lty=3,lwd=2)
layout(1)
The simulation results reflect what statistical theory proves: the coefficients are unbiased but the standard errors are biased.
The problem this causes is that the t-statistics reported in our regression will be wrong. They test the null hypothesis that the true value of the estimated coefficient is zero, and are constructed as the ratio of the estimated coefficient to its estimated standard error. With biased standard errors we cannot trust our t-statistics, and therefore cannot trust our p-values.
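As a quick illustration (using the farm-population regression fm estimated above), the reported t values are just the coefficient estimates divided by their standard errors:
#--t statistic = estimated coefficient / estimated standard error--
z<-summary(fm)$coefficients
z[,"Estimate"]/z[,"Std. Error"] # reproduces the t values in the regression output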
The most common test for Temporal Autocorrelation is the Breusch-Godfrey test. The null hypothesis is that the residuals are not autocorrelated. The code below will test for first- through fourth-order autocorrelation:
library(lmtest) # bgtest() is in the lmtest package
fm<-lm(FARMPOP~TIME,data=uu)
bgtest(fm,order=4)
Autocorrelation is probably the product of model misspecification. On encountering autocorrelation, your first reaction should be to try to respecify your model. That is, you should hunt for some other way of modeling the relationship between your dependent and independent variables.
Autocorrelation often vanishes when dynamics are introduced into a time series model, that is, when lagged values of the dependent variable or of one or more independent variables are included as regressors.
For the first figure on this page, the shape of the graph suggests that a semi-log model or a polynomial model would result in non-autocorrelated residuals.
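As a sketch of what respecification might look like (the model names fm.semilog and fm.poly are introduced here for illustration), one could fit a semi-log or quadratic trend to the farm-population data and re-test the residuals:
#--respecification sketches: semi-log and polynomial trends--
fm.semilog<-lm(log(FARMPOP)~TIME,data=uu) # semi-log model
fm.poly<-lm(FARMPOP~TIME+I(TIME^2),data=uu) # quadratic (polynomial) model
bgtest(fm.semilog,order=4) # H0: no autocorrelation
bgtest(fm.poly,order=4)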
If respecification is unsuccessful, a corrected estimate of the standard errors will restore trustworthy t-statistics. You can use bootstrapping to correct your standard errors (illustrated near the end of this page), or you can use the Newey-West autocorrelation-consistent covariance matrix. The NeweyWest() function is loaded along with the R package AER (it lives in the sandwich package, on which AER depends) and is used as below:
library(AER) # attaches lmtest (coeftest) and sandwich (NeweyWest)
fm<-lm(FARMPOP~TIME,data=uu)
coeftest(fm,vcov=NeweyWest(fm,lag=4))
We will bring in income and consumption time series data from FRED, and build a consumption function model. We will test the FRED series for stationarity, and if they are non-stationary, will take the first differences and test those.
#--bring in GDP and Personal Consumption Expenditures from FRED at St.Louis Fed--
library(pdfetch)    # pdfetch_FRED()
library(timeSeries) # removeNA()
library(tseries)    # adf.test()
ww<-pdfetch_FRED(c("GDP","PCEC"))
z<-removeNA(merge(ww,diff(ww))) # merge ww with first differences of ww; remove missing values
head(z) # always look at your data
## GDP PCEC GDP.1 PCEC.1
## 1947-06-30 245.968 160.031 2.804 3.870
## 1947-09-30 249.585 163.543 3.617 3.512
## 1947-12-31 259.745 167.672 10.160 4.129
## 1948-03-31 265.742 170.372 5.997 2.700
## 1948-06-30 272.567 174.142 6.825 3.770
## 1948-09-30 279.196 177.072 6.629 2.930
plot(as.ts(z)) # plot if you can
#--augmented Dickey-Fuller test--H0: series non-stationary--
for (i in colnames(z)){print(i);print(suppressWarnings(adf.test(z[,i])))}
## [1] "GDP"
##
## Augmented Dickey-Fuller Test
##
## data: z[, i]
## Dickey-Fuller = -0.95931, Lag order = 6, p-value = 0.9439
## alternative hypothesis: stationary
##
## [1] "PCEC"
##
## Augmented Dickey-Fuller Test
##
## data: z[, i]
## Dickey-Fuller = -0.92061, Lag order = 6, p-value = 0.9501
## alternative hypothesis: stationary
##
## [1] "GDP.1"
##
## Augmented Dickey-Fuller Test
##
## data: z[, i]
## Dickey-Fuller = -4.9267, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
##
## [1] "PCEC.1"
##
## Augmented Dickey-Fuller Test
##
## data: z[, i]
## Dickey-Fuller = -4.5353, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
ww<-removeNA(diff(ww)) # work with first difference, since first difference is stationary
colnames(ww)<-c("Y","C") # rename columns
head(ww)
## Y C
## 1947-06-30 2.804 3.870
## 1947-09-30 3.617 3.512
## 1947-12-31 10.160 4.129
## 1948-03-31 5.997 2.700
## 1948-06-30 6.825 3.770
## 1948-09-30 6.629 2.930
The simplest specification of the consumption function would be lm(C~Y,data=ww), but we want to include lagged values of the dependent and independent variables.
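For reference, the static version mentioned above can be fit and its residuals checked before any dynamics are added (cf0 is a name introduced here):
#--the static consumption function, for comparison--
cf0<-lm(C~Y,data=ww)
summary(cf0)
bgtest(cf0,order=4) # check the static model's residuals for autocorrelation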
zww<-removeNA(merge(ww,lag(ww,1),lag(ww,2),lag(ww,3),lag(ww,4))) # merge ww with lagged values of ww; remove missing values
zww<-zww[,order(colnames(zww))] # reorder columns to make them easier to read
head(zww)
## C C.1 C.2 C.3 C.4 Y Y.1 Y.2 Y.3 Y.4
## 1948-06-30 3.770 2.700 4.129 3.512 3.870 6.825 5.997 10.160 3.617 2.804
## 1948-09-30 2.930 3.770 2.700 4.129 3.512 6.629 6.825 5.997 10.160 3.617
## 1948-12-31 0.856 2.930 3.770 2.700 4.129 1.170 6.629 6.825 5.997 10.160
## 1949-03-31 -1.097 0.856 2.930 3.770 2.700 -5.332 1.170 6.629 6.825 5.997
## 1949-06-30 1.615 -1.097 0.856 2.930 3.770 -3.683 -5.332 1.170 6.629 6.825
## 1949-09-30 -0.640 1.615 -1.097 0.856 2.930 1.538 -3.683 -5.332 1.170 6.629
plot(as.ts(zww))
summary(cf<-lm(C~Y+Y.1+Y.2+C.1+C.2,data=zww))
##
## Call:
## lm(formula = C ~ Y + Y.1 + Y.2 + C.1 + C.2, data = zww)
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.238 -8.932 -2.645 6.029 115.196
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.076924 1.967653 1.056 0.29208
## Y 0.708914 0.008797 80.582 < 2e-16 ***
## Y.1 0.123193 0.043404 2.838 0.00486 **
## Y.2 0.016886 0.042027 0.402 0.68814
## C.1 -0.249502 0.059818 -4.171 4.03e-05 ***
## C.2 -0.048487 0.060468 -0.802 0.42330
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.64 on 285 degrees of freedom
## Multiple R-squared: 0.9601, Adjusted R-squared: 0.9594
## F-statistic: 1372 on 5 and 285 DF, p-value: < 2.2e-16
vif(cf) # variance inflation factors; vif() comes from the car package (attached with AER)
## Y Y.1 Y.2 C.1 C.2
## 1.044222 25.258703 16.974987 25.540423 17.314686
We do the Breusch-Godfrey test. The null hypothesis is that there is no autocorrelation.
#--Breusch-Godfrey test-H0: no autocorrelation--
bgtest(cf,order=4)
##
## Breusch-Godfrey test for serial correlation of order up to 4
##
## data: cf
## LM test = 7.0651, df = 4, p-value = 0.1325
Here the p-value (0.1325) is above 0.05, so we fail to reject the null hypothesis: with this specification the residuals are not significantly autocorrelated. Had the p-value been below 0.05, we would reject the null and conclude that we have the problem of Temporal Autocorrelation.
In that case our first reaction should be to respecify the model, here by adding or removing lagged values of \(C\) and \(Y\). Try this as an exercise: change the lag structure, re-run the Breusch-Godfrey test, and keep respecifying until the residuals are no longer significantly autocorrelated.
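For instance (a sketch only; cf2 is a name introduced here, not part of the original script), one might drop the insignificant second lags and re-test:
#--exercise sketch: an alternative lag structure, re-tested--
cf2<-lm(C~Y+Y.1+C.1,data=zww)
bgtest(cf2,order=4) # H0: no autocorrelation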
The circular flow diagram below depicts the economy's pecuniary flows. The numbers in red show the proportion of income that follows any particular path.
Factor income flows from firms to households, but 20% of income is diverted in the form of taxes. Once income arrives at households, about 10% leaks out of the system as payments for imports; another 5% goes into savings. That leaves 65% of income that comes back again as consumption spending. That 65% is important enough that we have a special name for it: the Marginal Propensity to Consume (MPC). The MPC is the proportion of an additional dollar of income that is spent on consumption.
Once you have a good model, calculate the Marginal Propensity to Consume (MPC). This is straightforward with a simple consumption function:
\[ C_{t}=\hat\alpha_0+\hat\beta_0 Y_{t} \tag {1}\] \[MPC=\frac{\delta C}{\delta Y}=\hat\beta_0 \] But the MPC is not that straightforward in a more realistic model with lag terms. \[ C_{t}=\hat\alpha_0+\hat\alpha_1 C_{t-1}+\hat\alpha_2 C_{t-2}+\hat\beta_0 Y_{t}+\hat\beta_1 Y_{t-1}+\hat\beta_2 Y_{t-2} \tag {2}\] We have to imagine the relationship between \(C\) and \(Y\) to be in long-run equilibrium, so that neither \(C\) nor \(Y\) is changing: \(C_{t}=C_{t-1}=C_{t-2}=C^{\prime}\) and \(Y_{t}=Y_{t-1}=Y_{t-2}=Y^{\prime}\).
With this assumption, we rewrite Equation (2): \[ C^{\prime}=\hat\alpha_0+\hat\alpha_1 C^{\prime}+\hat\alpha_2 C^{\prime}+\hat\beta_0 Y^{\prime}+\hat\beta_1 Y^{\prime}+\hat\beta_2 Y^{\prime}\]
\[ C^{\prime}=\hat\alpha_0+(\hat\alpha_1+\hat\alpha_2) C^{\prime}+(\hat\beta_0 +\hat\beta_1+\hat\beta_2) Y^{\prime}\]
\[ (1-(\hat\alpha_1+\hat\alpha_2))C^{\prime}=\hat\alpha_0+(\hat\beta_0 +\hat\beta_1+\hat\beta_2) Y^{\prime}\] \[ C^{\prime}=\frac{\hat\alpha_0}{(1-(\hat\alpha_1+\hat\alpha_2))} +\frac{(\hat\beta_0 +\hat\beta_1+\hat\beta_2)}{(1-(\hat\alpha_1+\hat\alpha_2))} Y^{\prime}\]
\[MPC=\frac{\delta C^{\prime}}{\delta Y^{\prime}}=\frac{(\hat\beta_0 +\hat\beta_1+\hat\beta_2)}{(1-(\hat\alpha_1+\hat\alpha_2))}\]
Calculating a derivative at long-run equilibrium, using a model with lagged terms, turns out to be extremely useful. The derivative is called the long-run multiplier (LRM).
The long run multiplier can be readily calculated in R, as below.
#--calculate long run multiplier--
u<-coef(cf)
MPC1<-sum(u[grep("Y",names(u))])/(1-sum(u[grep("C",names(u))]))
MPC1
## [1] 0.6540839
The long run multiplier formula is complicated, and a confidence interval can most easily be calculated by bootstrapping. Note that the function we use to bootstrap time series (tsboot()) is different from the function we used for cross-sectional data (boot()). The difference (hidden within the function) is that rather than sampling individual observations from the data, we extract blocks of data, to preserve some of the sequential property of the time series.
#--bootstrap the long run multiplier--
library(boot) # tsboot()
fzz<-formula(cf$call) # extracting the formula from our previous regression
Slrm<-function(data,i){coef(lm(fzz,data=data[i,]))} # the statistic: the estimated coefficients from each bootstrap sample
bs<-tsboot(tseries=zww,statistic=Slrm,R=500,l=20,sim="geom") # 500 block-bootstrap replications; geometrically distributed block lengths with mean 20
colnames(bs$t)<-names(coef(cf))
head(bs$t)
## (Intercept) Y Y.1 Y.2 C.1 C.2
## [1,] -0.1497223 0.7078944 0.07079571 0.077080970 -0.184809494 -0.14043238
## [2,] 1.2243168 0.4488750 0.11311402 0.021495024 0.003045454 0.09220373
## [3,] 1.9381887 0.7112258 0.01925804 -0.002106380 -0.110956319 -0.01980252
## [4,] 3.3453447 0.4708896 0.13096561 0.006927385 -0.058378457 0.09066521
## [5,] 1.7924921 0.7175858 0.11516964 0.028821776 -0.240229056 -0.06769921
## [6,] 1.0230420 0.4084635 0.12597074 0.002157195 -0.015365511 0.12065441
MPC<-apply(bs$t[,grep("Y",colnames(bs$t))],1,sum)/(1-apply(bs$t[,grep("C",colnames(bs$t))],1,sum))
hist(MPC,breaks=50)
abline(v=quantile(MPC,c(.05,.95)),col="red")
abline(v=MPC1,col="blue",lwd=2,lty=2)
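The red lines in the histogram mark the 5th and 95th percentiles of the bootstrapped MPC values; printing them gives a 90 percent confidence interval for the long run multiplier:
#--90% bootstrap confidence interval for the long run multiplier--
quantile(MPC,c(.05,.95))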
pt(abs(MPC1/sd(MPC)),(NROW(zww)-length(coef(cf))),lower.tail=FALSE) #H0: long run multiplier=0
## [1] 1.943543e-95
pt(abs((MPC1-0.9)/sd(MPC)),(NROW(zww)-length(coef(cf))),lower.tail=FALSE) #H0: long run multiplier=0.9
## [1] 5.137448e-27
pt(abs((MPC1-0.55)/sd(MPC)),(NROW(zww)-length(coef(cf))),lower.tail=FALSE) #H0: long run multiplier=0.55
## [1] 4.372679e-07
coeftest(cf,vcov=cov(bs$t)) # corrected standard errors for your estimated coefficients
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.076924 2.524813 0.8226 0.41142
## Y 0.708914 0.120486 5.8838 1.123e-08 ***
## Y.1 0.123193 0.057522 2.1417 0.03307 *
## Y.2 0.016886 0.048243 0.3500 0.72657
## C.1 -0.249502 0.102454 -2.4353 0.01549 *
## C.2 -0.048487 0.086785 -0.5587 0.57680
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In sum, you should be able to answer these four questions on a test: