Temporal autocorrelation occurs in time series when residuals are correlated with their immediately preceding values. Here is a graphical illustration of positive autocorrelation (this is much more common than negative autocorrelation). Note that each residual is likely to be of the same sign (positive or negative) as the immediately preceding residual.
Autocorrelation between a residual and its lagged-one value is first order autocorrelation; autocorrelation over two lags is termed second order, and so on.
#Autocorrelated time series
#U.S. farm population 1945-1986
#variables are FARMPOP, YEAR, and TIME
download.file("http://capone.mtsu.edu/eaeff/6060/farmpop.csv","z.csv") # downloads to working directory, gives name z.csv
uu<-read.csv("z.csv",as.is=TRUE)
head(uu)
## YEAR TIME FARMPOP
## 1 1945 1 0.205
## 2 1946 2 0.192
## 3 1947 3 0.179
## 4 1948 4 0.166
## 5 1949 5 0.162
## 6 1950 6 0.152
summary(fm<-lm(FARMPOP~TIME,data=uu))
##
## Call:
## lm(formula = FARMPOP ~ TIME, data = uu)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.019641 -0.014876 -0.004423 0.010468 0.043733
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1652044 0.0053783 30.72 <2e-16 ***
## TIME -0.0039375 0.0002179 -18.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01712 on 40 degrees of freedom
## Multiple R-squared: 0.8909, Adjusted R-squared: 0.8881
## F-statistic: 326.5 on 1 and 40 DF, p-value: < 2.2e-16
#--look at charts--
layout(matrix(1:2,1,2))
## visualize residuals
plot(uu$YEAR,residuals(fm),type="b",pch=19,ylim=range(residuals(fm)),ylab = "Residuals")
abline(h =0,lty=2,col="blue")
## visualize fitted series
plot(uu$YEAR,uu$FARMPOP,lty=1,lwd=1,type="b",pch=19,ylab="Pct. Population on Farm")
lines(uu$YEAR,fitted(fm), col ="red")
layout(1)
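The visual impression of positive autocorrelation can be checked numerically. A minimal check (not part of the original script) correlates the residuals of fm with their lag-one and lag-two values:
#--numerical check: first- and second-order autocorrelation of the residuals--
r<-residuals(fm)
cor(r[-1],r[-length(r)]) # first order: each residual vs. the one immediately before it
cor(r[-(1:2)],r[1:(length(r)-2)]) # second order: each residual vs. the one two periods before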
This Monte Carlo simulation runs through 5,000 iterations. Each time, the intercept is set to zero and the slope coefficient on the single independent variable is set to six. The independent variable is identical across iterations; only the error terms change, so the value of the dependent variable differs in each iteration.
We make two versions of the error term: one is autocorrelated, so that the error for an observation is significantly correlated with errors of previous observations. The other is not autocorrelated. We run two regressions, one with autocorrelated errors, the other with errors that are not autocorrelated. Each time we collect the estimated coefficients and standard errors.
#--Monte Carlo: Temporal Autocorrelation--
#---We make up our data, 5000 times---
#---Since we make the data, we KNOW our coefficients--
#---We estimate the coefficients and compare the estimates with true values--
est<-corvals<-NULL
nobs<-30
truecoef<-6
x<-sample(1:9,nobs,replace=TRUE)
for (i in 1:5000){
zerr<-ts(rnorm(nobs+4))
aerr<-ts(as.matrix(embed(zerr,5))%*%as.matrix((5:1)/sum(5:1)))
kk<-ts.intersect(aerr,zerr)
aerr<-as.numeric(scale(kk[,"aerr"])*12)#mean(aerr);sd(aerr) # give same mean and sd to both error terms
err<-as.numeric(scale(kk[,"zerr"])*12) #mean(err);sd(err)
y1<-x*truecoef+err
y2<-x*truecoef+aerr
ec<-ts.intersect(ts(err),lag(ts(err),-1))
aec<-ts.intersect(ts(aerr),lag(ts(aerr),-1))
corvals<-rbind(corvals,data.frame(NOac.corr=cor(ec)[1,2],ac.corr=cor(aec)[1,2]))
z1<-summary(lm(y1~x))
NoacBeta<-z1$coefficients[2,1]
NoacSE<-z1$coefficients[2,2]
z2<-summary(lm(y2~x))
acBeta<-z2$coefficients[2,1]
acSE<-z2$coefficients[2,2]
est<-rbind(est,data.frame(NoacBeta,NoacSE,acBeta,acSE))
}
First we want to verify that the error terms were indeed correlated as we intended (uncorrelated in one set of regressions, positively autocorrelated in the other).
apply(corvals,2,mean)
## NOac.corr ac.corr
## -0.03477958 0.65040639
Now we want to compare the estimated coefficients and standard errors with their true values. For the coefficients, we simply compare the true value (six) with the mean of the coefficients from the non-autocorrelated-residuals and autocorrelated-residuals regressions.
#-True value of coefficient is 6; compare that to the mean of the estimated coefficients--
apply(est[,c("NoacBeta","acBeta")],2,mean)
## NoacBeta acBeta
## 6.005553 5.996245
For the standard errors we want to compare the true value with the mean of the estimated standard errors. What is the true value? It is simply the standard deviation of the estimated coefficients.
#-True standard error is the standard deviation of the coefficient estimates--
#-Compare true standard error with the mean of the estimated standard errors--
apply(est[,c("NoacSE","acSE")],2,mean)
## NoacSE acSE
## 0.9001753 0.9082001
apply(est[,c("NoacBeta","acBeta")],2,sd)
## NoacBeta acBeta
## 0.9103229 0.6574592
We have 5,000 estimates of the slope coefficient; a histogram lets us see whether the estimates center on the true value.
#--check if the coefficient is biased---
xrange<-range(c(est$NoacBeta,est$acBeta,truecoef))*c(.9,1.1)
layout(matrix(1:2,1,2))
hist(est$NoacBeta,breaks=50,xlim=xrange)
lines(density(est$NoacBeta),col="blue")
abline(v=truecoef,col="red",lty=2,lwd=2)
hist(est$acBeta,breaks=50,xlim=xrange)
lines(density(est$acBeta),col="blue")
abline(v=truecoef,col="red",lty=2,lwd=2)
layout(1)
Next we can compare the estimated standard errors with the true value of the standard error (the standard deviation of the estimated coefficients).
#--check if the standard error is biased--
xrange<-range(c(est$NoacSE,est$acSE,sd(est$NoacBeta),sd(est$acBeta)))*c(.9,1.1)
layout(matrix(1:2,1,2))
hist(est$NoacSE,breaks=30,xlim=xrange)
mtext("true value: red line; mean estimate: blue line")
abline(v=sd(est$NoacBeta),col="red",lty=2,lwd=2)
abline(v=mean(est$NoacSE),col="blue",lty=3,lwd=2)
hist(est$acSE,breaks=30,xlim=xrange)
mtext("true value: red line; mean estimate: blue line")
abline(v=sd(est$acBeta),col="red",lty=2,lwd=2)
abline(v=mean(est$acSE),col="blue",lty=3,lwd=2)
layout(1)
The simulation results reflect what statistical theory proves: the coefficients are unbiased but the standard errors are biased.
The problem this causes is that the t-statistics reported in our regression will be wrong. They test the null hypothesis that the true value of the estimated coefficient is zero, and are constructed as the ratio of the estimated coefficient to its estimated standard error. With biased standard errors we cannot trust our t-statistics, and therefore cannot trust our p-values.
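As a quick illustration (using the farm-population regression fm estimated above), the reported t values are just the coefficient estimates divided by their standard errors:
#--t statistic = estimated coefficient / estimated standard error--
z<-summary(fm)$coefficients
z[,"Estimate"]/z[,"Std. Error"] # reproduces the t values in the regression output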
The most common test for Temporal Autocorrelation is the Breusch-Godfrey test. The null hypothesis is that the residuals are not autocorrelated. The code below will test for first- through fourth-order autocorrelation:
library(lmtest) # bgtest() is in the lmtest package
fm<-lm(FARMPOP~TIME,data=uu)
bgtest(fm,order=4)
Autocorrelation is probably the product of model misspecification. On encountering autocorrelation, your first reaction should be to try to respecify your model. That is, you should hunt for some other way of modeling the relationship between your dependent and independent variables.
Autocorrelation often vanishes when dynamics are introduced into a time series model, that is, when lagged values of the dependent variable or of one or more independent variables are included as regressors.
For the first figure on this page, the shape of the graph suggests that a semi-log model or a polynomial model would result in non-autocorrelated residuals.
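As a sketch of what respecification might look like (the model names fm.semilog and fm.poly are introduced here for illustration), one could fit a semi-log or quadratic trend to the farm-population data and re-test the residuals:
#--respecification sketches: semi-log and polynomial trends--
fm.semilog<-lm(log(FARMPOP)~TIME,data=uu) # semi-log model
fm.poly<-lm(FARMPOP~TIME+I(TIME^2),data=uu) # quadratic (polynomial) model
bgtest(fm.semilog,order=4) # H0: no autocorrelation
bgtest(fm.poly,order=4)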
If respecification is unsuccessful, a corrected estimate of the standard errors will restore trustworthy t-statistics. You can use bootstrapping to correct your standard errors (illustrated near the end of this page), or you can use the Newey-West autocorrelation-consistent covariance matrix. The NeweyWest() function is loaded along with the R package AER (it lives in the sandwich package, on which AER depends) and is used as below:
library(AER) # attaches lmtest (coeftest) and sandwich (NeweyWest)
fm<-lm(FARMPOP~TIME,data=uu)
coeftest(fm,vcov=NeweyWest(fm,lag=4))
We will bring in income and consumption time series data from FRED, and build a consumption function model. We will test the FRED series for stationarity, and if they are non-stationary, will take the first differences and test those.
#--bring in GDP and Personal Consumption Expenditures from FRED at St.Louis Fed--
library(pdfetch)    # pdfetch_FRED()
library(timeSeries) # removeNA()
library(tseries)    # adf.test()
ww<-pdfetch_FRED(c("GDP","PCEC"))
z<-removeNA(merge(ww,diff(ww))) # merge ww with first differences of ww; remove missing values
head(z) # always look at your data
## GDP PCEC GDP.1 PCEC.1
## 1947-06-30 245.968 160.031 2.804 3.870
## 1947-09-30 249.585 163.543 3.617 3.512
## 1947-12-31 259.745 167.672 10.160 4.129
## 1948-03-31 265.742 170.372 5.997 2.700
## 1948-06-30 272.567 174.142 6.825 3.770
## 1948-09-30 279.196 177.072 6.629 2.930
plot(as.ts(z)) # plot if you can
#--augmented Dickey-Fuller test--H0: series non-stationary--
for (i in colnames(z)){print(i);print(suppressWarnings(adf.test(z[,i])))}
## [1] "GDP"
##
## Augmented Dickey-Fuller Test
##
## data: z[, i]
## Dickey-Fuller = -0.95931, Lag order = 6, p-value = 0.9439
## alternative hypothesis: stationary
##
## [1] "PCEC"
##
## Augmented Dickey-Fuller Test
##
## data: z[, i]
## Dickey-Fuller = -0.92061, Lag order = 6, p-value = 0.9501
## alternative hypothesis: stationary
##
## [1] "GDP.1"
##
## Augmented Dickey-Fuller Test
##
## data: z[, i]
## Dickey-Fuller = -4.9267, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
##
## [1] "PCEC.1"
##
## Augmented Dickey-Fuller Test
##
## data: z[, i]
## Dickey-Fuller = -4.5353, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
ww<-removeNA(diff(ww)) # work with first difference, since first difference is stationary
colnames(ww)<-c("Y","C") # rename columns
head(ww)
## Y C
## 1947-06-30 2.804 3.870
## 1947-09-30 3.617 3.512
## 1947-12-31 10.160 4.129
## 1948-03-31 5.997 2.700
## 1948-06-30 6.825 3.770
## 1948-09-30 6.629 2.930
The simplest specification of the consumption function would be lm(C~Y,data=ww), but we want to include lagged values of the dependent and independent variables.
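For reference, the static version mentioned above can be fit and its residuals checked before any dynamics are added (cf0 is a name introduced here):
#--the static consumption function, for comparison--
cf0<-lm(C~Y,data=ww)
summary(cf0)
bgtest(cf0,order=4) # check the static model's residuals for autocorrelation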
zww<-removeNA(merge(ww,lag(ww,1),lag(ww,2),lag(ww,3),lag(ww,4))) # merge ww with lagged values of ww; remove missing values
zww<-zww[,order(colnames(zww))] # reorder columns to make them easier to read
head(zww)
## C C.1 C.2 C.3 C.4 Y Y.1 Y.2 Y.3 Y.4
## 1948-06-30 3.770 2.700 4.129 3.512 3.870 6.825 5.997 10.160 3.617 2.804
## 1948-09-30 2.930 3.770 2.700 4.129 3.512 6.629 6.825 5.997 10.160 3.617
## 1948-12-31 0.856 2.930 3.770 2.700 4.129 1.170 6.629 6.825 5.997 10.160
## 1949-03-31 -1.097 0.856 2.930 3.770 2.700 -5.332 1.170 6.629 6.825 5.997
## 1949-06-30 1.615 -1.097 0.856 2.930 3.770 -3.683 -5.332 1.170 6.629 6.825
## 1949-09-30 -0.640 1.615 -1.097 0.856 2.930 1.538 -3.683 -5.332 1.170 6.629
plot(as.ts(zww))
summary(cf<-lm(C~Y+Y.1+Y.2+C.1+C.2,data=zww))
##
## Call:
## lm(formula = C ~ Y + Y.1 + Y.2 + C.1 + C.2, data = zww)
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.238 -8.932 -2.645 6.029 115.196
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.076924 1.967653 1.056 0.29208
## Y 0.708914 0.008797 80.582 < 2e-16 ***
## Y.1 0.123193 0.043404 2.838 0.00486 **
## Y.2 0.016886 0.042027 0.402 0.68814
## C.1 -0.249502 0.059818 -4.171 4.03e-05 ***
## C.2 -0.048487 0.060468 -0.802 0.42330
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.64 on 285 degrees of freedom
## Multiple R-squared: 0.9601, Adjusted R-squared: 0.9594
## F-statistic: 1372 on 5 and 285 DF, p-value: < 2.2e-16
vif(cf) # variance inflation factors; vif() comes from the car package (attached with AER)
## Y Y.1 Y.2 C.1 C.2
## 1.044222 25.258703 16.974987 25.540423 17.314686
We do the Breusch-Godfrey test. The null hypothesis is that there is no autocorrelation.
#--Breusch-Godfrey test-H0: no autocorrelation--
bgtest(cf,order=4)
##
## Breusch-Godfrey test for serial correlation of order up to 4
##
## data: cf
## LM test = 7.0651, df = 4, p-value = 0.1325
Here the p-value (0.1325) is above 0.05, so we fail to reject the null hypothesis: with this specification the residuals are not significantly autocorrelated. Had the p-value been below 0.05, we would reject the null and conclude that we have the problem of Temporal Autocorrelation.
In that case our first reaction should be to respecify the model, here by adding or removing lagged values of \(C\) and \(Y\). Try this as an exercise: change the lag structure, re-run the Breusch-Godfrey test, and keep respecifying until the residuals are no longer significantly autocorrelated.
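For instance (a sketch only; cf2 is a name introduced here, not part of the original script), one might drop the insignificant second lags and re-test:
#--exercise sketch: an alternative lag structure, re-tested--
cf2<-lm(C~Y+Y.1+C.1,data=zww)
bgtest(cf2,order=4) # H0: no autocorrelation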
The circular flow diagram below depicts the economy's pecuniary flows. The numbers in red show the proportion of income that follows any particular path.
Factor income flows from firms to households, but 20% of income is diverted in the form of taxes. Once income arrives at households, about 10% leaks out of the system as payments for imports; another 5% goes into savings. That leaves 65% of income that comes back again as consumption spending. That 65% is important enough that we have a special name for it: the Marginal Propensity to Consume (MPC). The MPC is the proportion of an additional dollar of income that is spent on consumption.
Once you have a good model, calculate the Marginal Propensity to Consume (MPC). This is straightforward with a simple consumption function:
\[ C_{t}=\hat\alpha_0+\hat\beta_0 Y_{t} \tag {1}\] \[MPC=\frac{\delta C}{\delta Y}=\hat\beta_0 \] But the MPC is not that straightforward in a more realistic model with lag terms. \[ C_{t}=\hat\alpha_0+\hat\alpha_1 C_{t-1}+\hat\alpha_2 C_{t-2}+\hat\beta_0 Y_{t}+\hat\beta_1 Y_{t-1}+\hat\beta_2 Y_{t-2} \tag {2}\] We have to imagine the relationship between \(C\) and \(Y\) to be in long-run equilibrium, so that neither \(C\) nor \(Y\) is changing: \(C_{t}=C_{t-1}=C_{t-2}=C^{\prime}\) and \(Y_{t}=Y_{t-1}=Y_{t-2}=Y^{\prime}\).
With this assumption, we rewrite Equation (2): \[ C^{\prime}=\hat\alpha_0+\hat\alpha_1 C^{\prime}+\hat\alpha_2 C^{\prime}+\hat\beta_0 Y^{\prime}+\hat\beta_1 Y^{\prime}+\hat\beta_2 Y^{\prime}\]
\[ C^{\prime}=\hat\alpha_0+(\hat\alpha_1+\hat\alpha_2) C^{\prime}+(\hat\beta_0 +\hat\beta_1+\hat\beta_2) Y^{\prime}\]
\[ (1-(\hat\alpha_1+\hat\alpha_2))C^{\prime}=\hat\alpha_0+(\hat\beta_0 +\hat\beta_1+\hat\beta_2) Y^{\prime}\] \[ C^{\prime}=\frac{\hat\alpha_0}{(1-(\hat\alpha_1+\hat\alpha_2))} +\frac{(\hat\beta_0 +\hat\beta_1+\hat\beta_2)}{(1-(\hat\alpha_1+\hat\alpha_2))} Y^{\prime}\]
\[MPC=\frac{\delta C^{\prime}}{\delta Y^{\prime}}=\frac{(\hat\beta_0 +\hat\beta_1+\hat\beta_2)}{(1-(\hat\alpha_1+\hat\alpha_2))}\]
Calculating a derivative at long-run equilibrium, using a model with lagged terms, turns out to be extremely useful. The derivative is called the long-run multiplier (LRM).
The long run multiplier can be readily calculated in R, as below.
#--calculate long run multiplier--
u<-coef(cf)
MPC1<-sum(u[grep("Y",names(u))])/(1-sum(u[grep("C",names(u))]))
MPC1
## [1] 0.6540839
The long run multiplier formula is complicated, and a confidence interval can most easily be calculated by bootstrapping. Note that the function we use to bootstrap time series (tsboot()) is different from the function we used for cross-sectional data (boot()). The difference (hidden within the function) is that rather than sampling individual observations from the data, we extract blocks of data, to preserve some of the sequential property of the time series.
#--bootstrap the long run multiplier--
library(boot) # tsboot()
fzz<-formula(cf$call) # extracting the formula from our previous regression
Slrm<-function(data,i){coef(lm(fzz,data=data[i,]))} # the statistic: the estimated coefficients from each bootstrap sample
bs<-tsboot(tseries=zww,statistic=Slrm,R=500,l=20,sim="geom") # 500 block-bootstrap replications; geometrically distributed block lengths with mean 20
colnames(bs$t)<-names(coef(cf))
head(bs$t)
## (Intercept) Y Y.1 Y.2 C.1 C.2
## [1,] -0.1497223 0.7078944 0.07079571 0.077080970 -0.184809494 -0.14043238
## [2,] 1.2243168 0.4488750 0.11311402 0.021495024 0.003045454 0.09220373
## [3,] 1.9381887 0.7112258 0.01925804 -0.002106380 -0.110956319 -0.01980252
## [4,] 3.3453447 0.4708896 0.13096561 0.006927385 -0.058378457 0.09066521
## [5,] 1.7924921 0.7175858 0.11516964 0.028821776 -0.240229056 -0.06769921
## [6,] 1.0230420 0.4084635 0.12597074 0.002157195 -0.015365511 0.12065441
MPC<-apply(bs$t[,grep("Y",colnames(bs$t))],1,sum)/(1-apply(bs$t[,grep("C",colnames(bs$t))],1,sum))
hist(MPC,breaks=50)
abline(v=quantile(MPC,c(.05,.95)),col="red")
abline(v=MPC1,col="blue",lwd=2,lty=2)
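The red lines in the histogram mark the 5th and 95th percentiles of the bootstrapped MPC values; printing them gives a 90 percent confidence interval for the long run multiplier:
#--90% bootstrap confidence interval for the long run multiplier--
quantile(MPC,c(.05,.95))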
pt(abs(MPC1/sd(MPC)),(NROW(zww)-length(coef(cf))),lower.tail=FALSE) #H0: long run multiplier=0
## [1] 1.943543e-95
pt(abs((MPC1-0.9)/sd(MPC)),(NROW(zww)-length(coef(cf))),lower.tail=FALSE) #H0: long run multiplier=0.9
## [1] 5.137448e-27
pt(abs((MPC1-0.55)/sd(MPC)),(NROW(zww)-length(coef(cf))),lower.tail=FALSE) #H0: long run multiplier=0.55
## [1] 4.372679e-07
coeftest(cf,vcov=cov(bs$t)) # corrected standard errors for your estimated coefficients
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.076924 2.524813 0.8226 0.41142
## Y 0.708914 0.120486 5.8838 1.123e-08 ***
## Y.1 0.123193 0.057522 2.1417 0.03307 *
## Y.2 0.016886 0.048243 0.3500 0.72657
## C.1 -0.249502 0.102454 -2.4353 0.01549 *
## C.2 -0.048487 0.086785 -0.5587 0.57680
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In sum, you should be able to answer these four questions on a test: