The functionality of this webpage is constrained in D2L, and you might find it easier to read and navigate if you download this html file to your own computer.
So far we have worked with cross-sectional data, where every observation is a different person or place, at around the same moment in time. In time series data, every observation is the same person or place, but at different moments in time. Time series are subscripted \(t\), like \(x_t\), where \(t\) indicates the current time period, \(t-1\) indicates the immediately preceding time period, \(t-2\) indicates two time periods in the past, etc. Because of this notation, time series data are always sorted in chronological order, with the earliest period first and the latest last.
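As a quick illustration of the notation (the numbers here are made up for the example), a one-period lag can be built in R by shifting the series back one position:
x<-c(5,7,6,9)                 # x_1, x_2, x_3, x_4, in chronological order
x_lag1<-c(NA,x[-length(x)])   # x_{t-1}: last period's value of x
cbind(x_t=x, x_lag1=x_lag1)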
Time series data have some advantages. They can be used for forecasting, and they allow testing of causal relationships between variables.
Causality testing is a technique developed by Clive W.J. Granger. The technique rests on a simple and reasonable assumption: If variable \(A\) causes changes in \(B\), then one will observe that changes in \(A\) will precede changes in \(B\).
The Granger testing procedure requires that one set up and test two equations. In each equation, the current value of one variable (\(A_t\) or \(B_t\) ) is a function of the other variable and its own value in previous time periods (lagged values). (The number of previous time periods is set at two here simply as an example). The intuition behind the Granger test is simple: if previous values of variable \(A\) significantly influence current values of variable \(B\), then one can say that \(A\) causes \(B\).
\[A_t = \alpha_0 +\alpha_1 A_{t-1} +\alpha_2 A_{t-2} +\beta_1 B_{t-1} +\beta_2 B_{t-2} +\varepsilon_t \tag {1}\] \[B_t = \gamma_0 +\gamma_1 A_{t-1} +\gamma_2 A_{t-2} +\pi_1 B_{t-1} +\pi_2 B_{t-2} +\epsilon_t \tag {2}\]
Equation (1) is used to test the following null hypothesis. \(H_0\): \(B\) does not cause \(A\) (\(B \not\Rightarrow A\)).
\[A_t = \alpha_0 +\alpha_1 A_{t-1} +\alpha_2 A_{t-2} +\beta_1 B_{t-1} +\beta_2 B_{t-2} +\varepsilon_t\tag {unrestricted model}\] \[A_t = \alpha_0 +\alpha_1 A_{t-1} +\alpha_2 A_{t-2} +\varepsilon_t\tag {restricted model}\]
From these two regressions, compute an F-statistic comparing the restricted and unrestricted models. If the p-value on the F-statistic is low enough (\(\leq 0.05\)), you can reject \(H_0\) and conclude that \(B\) causes \(A\) (\(B \Rightarrow A\)).
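A minimal R sketch of this comparison, assuming \(A\) and \(B\) are numeric vectors sorted in chronological order (the names are placeholders, not variables used later in this handout):
n<-length(A)
A1<-c(NA,A[-n])               # A_{t-1}
A2<-c(NA,NA,A[-c(n-1,n)])     # A_{t-2}
B1<-c(NA,B[-n])               # B_{t-1}
B2<-c(NA,NA,B[-c(n-1,n)])     # B_{t-2}
unrestricted<-lm(A~A1+A2+B1+B2)   # equation (1)
restricted<-lm(A~A1+A2)           # lags of B dropped
anova(restricted,unrestricted)    # F-test of H0: B does not cause A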
Equation (2) is used to test the following null hypothesis. \(H_0\): \(A\) does not cause \(B\) (\(A \not\Rightarrow B\)).
\[B_t = \gamma_0 +\gamma_1 A_{t-1} +\gamma_2 A_{t-2} +\pi_1 B_{t-1} +\pi_2 B_{t-2} +\epsilon_t \tag {unrestricted model}\] \[B_t = \gamma_0 +\pi_1 B_{t-1} +\pi_2 B_{t-2} +\epsilon_t\tag {restricted model}\]
From these regressions, calculate a second F-statistic. If the p-value on the F-statistic is low enough (\(\leq 0.05\)), you can reject \(H_0\) and conclude that \(A\) causes \(B\) (\(A \Rightarrow B\)).
Compare the results of these two F-tests against the following table.
| | \(B \Rightarrow A\) | \(B \not\Rightarrow A\) |
|---|---|---|
| \(A \Rightarrow B\) | Feedback relationship | \(A\) Granger-causes \(B\) |
| \(A \not\Rightarrow B\) | \(B\) Granger-causes \(A\) | No relationship between \(A\) and \(B\) |
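If you like, the table can be read off mechanically. The helper below is written just for this handout (it is not from any package): given the two p-values, it returns the cell of the table you are in.
interpret_granger<-function(p_B_causes_A,p_A_causes_B,alpha=0.05){
  BtoA<-p_B_causes_A<=alpha   # reject H0: B does not cause A
  AtoB<-p_A_causes_B<=alpha   # reject H0: A does not cause B
  if (AtoB & BtoA)  return("Feedback relationship")
  if (AtoB & !BtoA) return("A Granger-causes B")
  if (!AtoB & BtoA) return("B Granger-causes A")
  "No relationship between A and B"
}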
You have two time-series variables. You would like to know whether one causes the other, or whether they are involved in a feedback relationship. In this example we will use Personal Consumption Expenditures (PCEC) and Personal Income (PINCOME).
There are four steps:
1. Download the data from FRED.
2. Test each series for stationarity and difference it if necessary.
3. Choose the lag length using the AIC.
4. Run the two Granger-causality F-tests.
FRED II is the economic data repository maintained by the Saint Louis Federal Reserve Bank. As the website says: "Download, graph, and track 766,000 US and international time series from 101 sources." R can directly access FRED data.
#--------------------------------------
#--bring in PCEC and PINCOME from FRED II at St.Louis Fed--
#--------------------------------------
library(pdfetch) # pdfetch_FRED() downloads series directly from FRED
ww<-pdfetch_FRED(c("PCEC","PINCOME"))
class(ww)
## [1] "xts" "zoo"
tail(ww) # most recent data is strange
## PCEC PINCOME
## 2020-09-30 14293.83 19777.45
## 2020-12-31 14467.61 19542.00
## 2021-03-31 15005.44 21867.34
## 2021-06-30 15681.70 20669.90
## 2021-09-30 15964.94 20823.77
## 2021-12-31 16314.20 20947.67
# look at plot
plot(ww) # the red is PINCOME; the black is PCEC
ww<-window(ww,end="2020-01-01") # restricting data to period before COVID-19
Time series variables are either stationary or non-stationary. A stationary variable is one whose mean and variance do not systematically change over the time period. Most of the familiar macro-variables are non-stationary: GDP, the CPI, and retail sales all increase substantially over the post-war period, so their mean in the 1950s is very different from their mean in the 1990s.
Regressions in which the dependent and independent variables are non-stationary can lead to spurious results: the variables may share the same time trend, even though they are not really related, so that the regression will exaggerate their relationship.
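A quick simulation shows the danger. The two series below are generated purely for illustration and have nothing to do with the lesson data; because each is a random walk, a regression in levels will often report a highly "significant" slope even though the series are independent.
set.seed(1)
x<-cumsum(rnorm(200))            # a random walk: non-stationary
y<-cumsum(rnorm(200))            # a second, completely unrelated random walk
summary(lm(y~x))                 # the slope often looks significant -- spurious
summary(lm(diff(y)~diff(x)))     # in first differences the apparent relationship disappears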
The augmented Dickey-Fuller test (the R command adf.test) tests for a unit root (when a series has a unit root it is non-stationary). The null hypothesis is that the series is non-stationary; if the p-value is low enough, reject the null hypothesis. If the p-value is higher than 0.05, so that you cannot reject the null hypothesis, try transforming the series. Typically the first difference (\(\Delta x_t=x_{t}-x_{t-1}\)) will be stationary.
#--------------------------------------
#--Make stationary---------------------
#--------------------------------------
library(tseries)  # provides adf.test()
C<-ww[,"PCEC"]    # consumption
Y<-ww[,"PINCOME"] # personal income
#--augmented Dickey-Fuller test--
#--H0:series has unit root (series NON-stationary)--
adf.test(C) # cannot reject H0: C is non-stationary in levels
## Warning in adf.test(C): p-value greater than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: C
## Dickey-Fuller = 1.0541, Lag order = 6, p-value = 0.99
## alternative hypothesis: stationary
adf.test(Y) # cannot reject H0: Y is non-stationary in levels
## Warning in adf.test(Y): p-value greater than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: Y
## Dickey-Fuller = 1.566, Lag order = 6, p-value = 0.99
## alternative hypothesis: stationary
#--take first difference if variable is NON-stationary --
C<-diff(ww[,"PCEC"],1)
Y<-diff(ww[,"PINCOME"],1)
#--H0:series has unit root (series NON-stationary)--
adf.test(C[which(!is.na(C))]) #reject
## Warning in adf.test(C[which(!is.na(C))]): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: C[which(!is.na(C))]
## Dickey-Fuller = -5.1675, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
adf.test(Y[which(!is.na(Y))]) #reject
## Warning in adf.test(Y[which(!is.na(Y))]): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: Y[which(!is.na(Y))]
## Dickey-Fuller = -6.8117, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
# -- look at plot--
a<-merge(C,Y) # it is understood that the merge is by date
plot(a) # the red is PINCOME; the black is PCEC
In setting up the model, how many past time periods should you consider? Your results can be quite different depending on how far back you look. Matters are made a bit confusing by the fact that there are several approaches to determining lag length. In this class, I want you to use the Akaike Information Criterion (AIC), a measure similar to the adjusted \(R^2\): for each variable, fit models with 1 through 20 lags of that variable and keep the lag length with the lowest AIC.
#--------------------------------------
#--find optimal lag length, using AIC--
#--------------------------------------
ss<-20          # maximum lag length to consider
vx<-c("C","Y")  # labels for the output table
taic<-NULL
for (k in 1:NCOL(ww)){
  v<-as.matrix(ww[,k])
  nobs<-NROW(ww)
  cb<-matrix(NA,nobs,ss)    # column i will hold the series lagged i periods
  for (i in 1:ss){
    cb[(i+1):nobs,i]<-v[1:(nobs-i)]
    is.na(cb[1:i,i])<-TRUE  # first i observations of lag i are unavailable
  }
  aic<-matrix(0,ss,2)
  z<-which(!is.na(cb[,ss])) # rows with all ss lags available, so every model uses the same sample
  for (i in 1:ss){
    aic[i,2]<-AIC(lm(v[z]~cb[z,(1:i)]),k=2)  # AIC of the model with i lags
  }
  aic[,1]<-(1:ss)
  aic<-data.frame(aic[order(aic[,2]),])      # sort so the lowest AIC comes first
  names(aic)<-c("lags","aic")
  aic$varb<-as.character(vx[k])
  taic<-rbind(taic,aic[1,])                  # keep the best lag length for this variable
}
taic
## lags aic varb
## 1 15 2693.465 C
## 2 20 3057.715 Y
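As an optional cross-check (an alternative to the loop above, not part of the assignment), the vars package, if you have it installed, reports the lag length preferred by the AIC and other criteria when both differenced series enter one system. Note that it chooses a single common lag length, so it need not match the per-variable table above.
library(vars)
VARselect(na.omit(merge(C,Y)),lag.max=20,type="const")$selection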
This is really no different from any other F-test you have conducted: run the unrestricted regression, then drop the lags of the other variable and run the restricted regression.
#--------------------------------------
#--Granger causality-------------------
#--------------------------------------
library(car)         # linearHypothesis() performs the joint F-test
nbs<-NROW(ww)
sc<-taic$lags[1]     # AIC-chosen lag length for C
v<-as.matrix(ww[,1])
cb<-matrix(0,nbs,sc) # lag matrix: column i holds the series lagged i periods
for (i in 1:sc){
  cb[(i+1):nbs,i]<-v[1:(nbs-i)]
  is.na(cb[1:i,i])<-TRUE
}
sy<-taic$lags[2]     # AIC-chosen lag length for Y
v<-as.matrix(ww[,2])
yb<-matrix(0,nbs,sy)
for (i in 1:sy){
  yb[(i+1):nbs,i]<-v[1:(nbs-i)]
  is.na(yb[1:i,i])<-TRUE
}
z<-which(!is.na(rowSums(yb)) & !is.na(rowSums(cb))) # rows where every lag is available
o<-lm(C[z]~yb[z,]+cb[z,])        # unrestricted model for C
kii<-names(coef(o))
dropt<-kii[grep("yb",kii)]       # coefficients on the lags of Y
Ftest<-linearHypothesis(o,dropt) # joint F-test that all lags of Y are zero
pval=Ftest$`Pr(>F)`[2]
#H0: Y does not Granger cause C
pval
## [1] 1.896883e-10
o<-lm(Y[z]~yb[z,]+cb[z,])        # unrestricted model for Y
kii<-names(coef(o))
dropt<-kii[grep("cb",kii)]       # coefficients on the lags of C
Ftest<-linearHypothesis(o,dropt) # joint F-test that all lags of C are zero
pval=Ftest$`Pr(>F)`[2]
#H0: C does not Granger cause Y
pval
## [1] 1.450412e-11
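For comparison, the lmtest package (if installed) wraps the same restricted-versus-unrestricted comparison in a single call. grangertest() uses one common lag order for both series; order=4 below is only an illustration, and in practice you would use the AIC-chosen lag lengths.
library(lmtest)
dd<-data.frame(na.omit(merge(C,Y)))        # differenced series as a data frame
grangertest(PCEC~PINCOME,order=4,data=dd)  # H0: PINCOME does not Granger-cause PCEC
grangertest(PINCOME~PCEC,order=4,data=dd)  # H0: PCEC does not Granger-cause PINCOME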
Due in one week, two hours before class
Pick any four variables from FRED for which theory suggests causal inter-relationships. For example:
Test causality between each pair of your four variables. Report your results. Turn in your R script.