Compiled on 2021-03-10 by E. Anthon Eff
Economists estimate production functions to make predictions and to study productivity. The general form of the production function is \(output=f(inputs)\), where the output is something humans want (final goods and services) and the inputs are things used to make the output. Inputs are traditionally divided into three categories: land, labor, and capital.
A production function can be specified many different ways; the Cobb-Douglas specification in Equation (1) is probably the most common. \(Q_i\) is the quantity of the output produced; \(A_i\) is the level of technology; \(K_i\) is the quantity of services from capital; and \(L_i\) is the quantity of Labor used. Each observation \(i\) is a production unit, or a group of production units with something in common, such as physical location.
\[Q_i=A_i{K_i}^{\alpha_K}{L_i}^{\alpha_L}\tag 1\]
The file pwt91_extract2017.xlsx contains 2017 data extracted from the Penn World Tables v9.1. The data contain variables for \(Q_i\) (cgdpo), \(K_i\) (cn), and \(L_i\) (emp). Each observation is a country.
We lack data for \(A_i\), and for the exponents \(\alpha_K\) and \(\alpha_L\). We will estimate these.
By taking the log of both sides, we can convert Equation (1) into a linear equation. \[ln(Q_i)=ln(A_i)+\alpha_K ln(K_i)+\alpha_L ln(L_i) \tag 2\] When we estimate this equation, we get:
\[ln(Q_i)=\hat\alpha_0+\hat\alpha_K ln(K_i)+\hat\alpha_L ln(L_i)+\hat\epsilon_i \tag 3\]
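As a sanity check on the log-linear form, OLS recovers the exponents from simulated Cobb-Douglas data. A minimal sketch (the variable names and parameter values here are invented for illustration):

```r
# Simulate Q = A * K^0.3 * L^0.7 and recover the exponents by OLS on logs
set.seed(1)
n <- 500
K <- exp(rnorm(n, 5, 1))    # capital services
L <- exp(rnorm(n, 4, 1))    # labor
A <- exp(rnorm(n, 0, 0.1))  # technology (log-normal noise)
Q <- A * K^0.3 * L^0.7
fit <- lm(log(Q) ~ log(K) + log(L))
round(coef(fit), 2)         # close to (Intercept)=0, log(K)=0.30, log(L)=0.70
```

Because \(ln(A_i)\) plays the role of the error term plus intercept, the estimated slopes converge on the true exponents.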
#--read in Penn World Tables v9.1 data for 2017 from working directory--
library(readxl) # provides read_excel()
dim(aa<-data.frame(read_excel("pwt91_extract2017.xlsx",sheet="data")))
## [1] 182 51
rownames(aa)<-aa$iso3
dii<-c("cgdpo","emp","cn")
aa[,dii]<-log(aa[,dii]+1) # convert Q, K, and L to logs (the +1 avoids taking log of zero)
dim(aa<-aa[,c("iso3","country",dii)])
## [1] 182 5
dim(aa<-aa[complete.cases(aa),])# listwise deletion (remove observations with missing values)
## [1] 171 5
summary(zz<-lm(cgdpo~emp+cn, data=aa)) # Cobb-Douglas production function
##
## Call:
## lm(formula = cgdpo ~ emp + cn, data = aa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0136 -0.1952 0.0292 0.2169 1.0888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.12234 0.23845 0.513 0.609
## emp 0.24409 0.03471 7.032 4.89e-11 ***
## cn 0.84565 0.02182 38.750 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3704 on 168 degrees of freedom
## Multiple R-squared: 0.9668, Adjusted R-squared: 0.9665
## F-statistic: 2450 on 2 and 168 DF, p-value: < 2.2e-16
library(car) # provides vif(), linearHypothesis(), and hccm()
vif(zz)
## emp cn
## 2.484043 2.484043
library(lmtest) # provides reset(), the RESET specification test
reset(zz)
##
## RESET test
##
## data: zz
## RESET = 7.2857, df1 = 2, df2 = 166, p-value = 0.0009271
One reason economists like the Cobb-Douglas specification is that the estimated coefficients \(\hat\alpha_K\) and \(\hat\alpha_L\) have an immediate and useful interpretation: each is the elasticity of output with respect to the corresponding input. Thus, for example, \(\hat\alpha_K\) gives the percentage change in output for a one percent increase in the dollar value of capital used.
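The elasticity interpretation can be checked numerically against Equation (1). A minimal sketch with invented parameter values:

```r
# With invented values A=2, K=100, L=50 and exponents 0.3 and 0.7,
# raising K by 1% raises Q by approximately aK percent
A <- 2; K <- 100; L <- 50; aK <- 0.3; aL <- 0.7
Q0 <- A * K^aK * L^aL          # output at original K
Q1 <- A * (1.01 * K)^aK * L^aL # output after a 1% increase in K
100 * (Q1/Q0 - 1)              # approximately 0.3, i.e. aK percent
```

The approximation is close because the elasticity is exact only for infinitesimal changes; for a discrete 1% change the response is \(1.01^{\alpha_K}-1\).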
Productivity is the ratio of output to an input. Total factor productivity is the ratio of output to some function of all inputs. The technology variable \(A_i\) in Equation (1) provides a measure of the total factor productivity of a country:
\[A_i=\frac{Q_i}{{K_i}^{\alpha_K}{L_i}^{\alpha_L}}\tag 4\]
which implies that
\[\widehat{ln(A_i)}=ln(Q_i)-(\hat\alpha_K ln(K_i)+\hat\alpha_L ln(L_i))=\hat\alpha_0+\hat\epsilon_i \tag 5\]
so that
\[\hat{A_i}=e^{(\hat\alpha_0+\hat\epsilon_i)} \tag 6\]
length(aa$tfp<-exp(coef(zz)[1]+zz$residuals)) #using Equation (6) above to calculate TFP
## [1] 171
head(aa[order(-aa$tfp),c("iso3","country","tfp")]) #highest TFP
## iso3 country tfp
## KGZ KGZ Kyrgyzstan 3.357196
## AZE AZE Azerbaijan 2.910586
## IRQ IRQ Iraq 2.802029
## EGY EGY Egypt 2.740138
## UZB UZB Uzbekistan 2.599712
## MLI MLI Mali 2.572592
Suppose that we increase every input by the same amount, such as 10%. How much would output change?
\[A_i{(1.1K_i)}^{\alpha_K}{(1.1L_i)}^{\alpha_L}=1.1^{(\alpha_K+\alpha_L)}A_i{K_i}^{\alpha_K}{L_i}^{\alpha_L}=1.1^{(\alpha_K+\alpha_L)}Q_i\tag 7\]
So the sum of the estimated exponents, \(\hat\alpha_K+\hat\alpha_L\), can be used to test for returns to scale: a sum greater than one implies increasing returns, a sum equal to one constant returns, and a sum less than one decreasing returns.
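Equation (7) is easy to verify numerically: scaling both inputs by 1.1 scales output by exactly \(1.1^{(\alpha_K+\alpha_L)}\). A minimal sketch with invented parameter values:

```r
# Verify Equation (7): scale both inputs by 1.1 and compare
A <- 2; K <- 100; L <- 50; aK <- 0.3; aL <- 0.8  # aK+aL > 1: increasing returns
Q  <- A * K^aK * L^aL
Qs <- A * (1.1*K)^aK * (1.1*L)^aL
c(direct = Qs/Q, formula = 1.1^(aK + aL))        # both equal 1.1^1.1
```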
sum(coef(zz)[-1]) # display the actual sum
## [1] 1.089743
linearHypothesis(zz,"emp+cn=1",vcov=hccm(zz)) # H0: constant returns to scale
## Linear hypothesis test
##
## Hypothesis:
## emp + cn = 1
##
## Model 1: restricted model
## Model 2: cgdpo ~ emp + cn
##
## Note: Coefficient covariance matrix supplied.
##
## Res.Df Df F Pr(>F)
## 1 169
## 2 168 1 12.831 0.0004462 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A Chow test splits your data into two parts and estimates a separate model on each part. The two separate models are combined to be your unrestricted model, and they are compared to a model estimated once on the entire dataset, which is your restricted model (the parameters are restricted to be the same across all observations). The null hypothesis of the Chow test is that the parameters are the same in both subsets. If the p-value is \(\leq 0.05\) you would reject the null hypothesis.
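The textbook Chow test can also be run by comparing a pooled model to a fully interacted model with anova(); note that this version counts one restriction per coefficient, so its F statistic can differ from variants that count the restrictions differently. A minimal sketch on simulated data (all variable names invented):

```r
# Chow test via nested-model comparison: pooled vs. fully interacted
set.seed(2)
n <- 200
d <- rbinom(n, 1, 0.5)                   # group dummy
x <- rnorm(n)
y <- 1 + 2*x + d*(0.5 + 1*x) + rnorm(n)  # intercept and slope differ by group
restricted   <- lm(y ~ x)                # same parameters for both groups
unrestricted <- lm(y ~ d * x)            # intercept and slope free per group
anova(restricted, unrestricted)          # F test of H0: parameters equal across groups
```

Here the F test rejects the null, since the simulated slopes genuinely differ across the two groups.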
For example, one might wonder whether production in Less Developed Countries (LDCs) has the same output elasticities as production elsewhere.
#--read in PWT data and a dummy for LDCs--
dim(oo<-data.frame(read_excel("internationalOrganization_CIA.xlsx",sheet="data")))
## [1] 243 215
dim(aa<-data.frame(read_excel("pwt91_extract2017.xlsx",sheet="data")))
## [1] 182 51
aa<-merge(aa[,c("country","iso3","cgdpo","emp","cn")],oo[,c("iso3","LDCs")],by="iso3")
rownames(aa)<-aa$iso3
dii<-c("cgdpo","emp","cn")
aa[,dii]<-log(aa[,dii]+1) # convert Q, K, and L to logs
dim(aa<-aa[complete.cases(aa),]) # listwise deletion (remove observations with missing values)
## [1] 171 6
chowTest<-function(frm,dta,dum){
  # Chow test for dta partitioned by dum, using formula frm
  zz<-lm(frm, data=dta)              #--estimate for all observations
  ESSR<-sum(zz$residuals^2)          #--restricted sum of squared residuals
  z<-which(dta[,dum]==0)             #--identify observations with dum==0
  z1<-lm(frm, data=dta[z,])          #--estimate for dum==0
  ESS1<-sum(z1$residuals^2)          #--sum of squared residuals for dum==0
  z<-which(dta[,dum]==1)             #--identify observations with dum==1
  z2<-lm(frm, data=dta[z,])          #--estimate for dum==1
  ESS2<-sum(z2$residuals^2)          #--sum of squared residuals for dum==1
  ESSUR<-ESS1+ESS2                   #--unrestricted ESS (parameters allowed to change across subsets)
  dfUR<-NROW(dta)-2*length(coef(zz)) #--degrees of freedom: nobs minus number of estimated parameters
  numres<-2*length(coef(zz))         #--restricting both the dum==0 and the dum==1 coefficients to equal the all-observation coefficients
  Fstat=((ESSR-ESSUR)/numres)/(ESSUR/dfUR) #--(H0: both subsets have the same parameter values)
  pval=1-pf(Fstat,numres,dfUR)       #--calculate p-value
  #--display F-stat and p-value
  ff<-list(dum,table(dta[,dum]),data.frame(H0="coef same for the subsets",Fstat,numres,dfUR,pval),coef(z1),coef(z2))
  names(ff)<-c("dummy_variable","frequency","Chow_test","coeff_when_dum==0","coeff_when_dum==1")
  ff
}
chowTest("cgdpo~emp+cn",aa,"LDCs")
## $dummy_variable
## [1] "LDCs"
##
## $frequency
##
## 0 1
## 49 122
##
## $Chow_test
## H0 Fstat numres dfUR pval
## 1 coef same for the subsets 4.479993 6 165 0.0003136118
##
## $`coeff_when_dum==0`
## (Intercept) emp cn
## 3.4351228 0.6131559 0.5673762
##
## $`coeff_when_dum==1`
## (Intercept) emp cn
## -0.01636236 0.24329884 0.85335359
We rejected the null hypothesis that the output elasticities are the same for LDCs and non-LDCs. The next step would typically be to see how they differ. The stargazer package makes it easy to create a professional-looking table showing coefficients and standard errors from the three regressions:
library(stargazer)
z<-which(aa$LDCs==1);z1<-lm(cgdpo~emp+cn,aa[z,]) #LDCs
z<-which(aa$LDCs==0);z0<-lm(cgdpo~emp+cn,aa[z,]) #non-LDCs
zz<-lm(cgdpo~emp+cn,aa) #all observations
stargazer(zz,z1,z0,type="text",keep.stat=c("n","rsq"),column.labels=c("total","LDCs","non-LDCs"))
##
## ==========================================
##                Dependent variable:
##          -----------------------------
##                     cgdpo
## total LDCs non-LDCs
## (1) (2) (3)
## ------------------------------------------
## emp 0.244*** 0.243*** 0.613***
## (0.035) (0.040) (0.085)
##
## cn 0.846*** 0.853*** 0.567***
## (0.022) (0.026) (0.054)
##
## Constant 0.122 -0.016 3.435***
## (0.238) (0.279) (0.618)
##
## ------------------------------------------
## Observations 171 122 49
## R2 0.967 0.970 0.966
## ==========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
We can see that the estimated coefficients are different (the output elasticity of capital is higher for LDCs; the output elasticity of labor is higher for non-LDCs), but we do not know if they are significantly different. To determine that, we must re-estimate the Cobb-Douglas production function, using the LDC dummy in interaction terms. For each interaction term, the null hypothesis is that the output elasticity is identical in LDCs and non-LDCs.
summary(zz<-lm(cgdpo~LDCs*(emp+cn),data=aa))
##
## Call:
## lm(formula = cgdpo ~ LDCs * (emp + cn), data = aa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0176 -0.2087 0.0488 0.2187 0.9418
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.43512 0.71801 4.784 3.79e-06 ***
## LDCs -3.45149 0.76556 -4.508 1.23e-05 ***
## emp 0.61316 0.09924 6.179 4.87e-09 ***
## cn 0.56738 0.06248 9.081 3.18e-16 ***
## LDCs:emp -0.36986 0.10644 -3.475 0.000653 ***
## LDCs:cn 0.28598 0.06739 4.244 3.65e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3466 on 165 degrees of freedom
## Multiple R-squared: 0.9715, Adjusted R-squared: 0.9706
## F-statistic: 1125 on 5 and 165 DF, p-value: < 2.2e-16
sum(coef(zz)[c("LDCs:emp","LDCs:cn")]) # the amount by which LDC returns to scale differs from non-LDC
## [1] -0.0838797
linearHypothesis(zz,"LDCs:emp+LDCs:cn=0") #H0: the returns to scale are the same for LDCs and non-LDCs
## Linear hypothesis test
##
## Hypothesis:
## LDCs:emp + LDCs:cn = 0
##
## Model 1: restricted model
## Model 2: cgdpo ~ LDCs * (emp + cn)
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 166 20.067
## 2 165 19.817 1 0.2496 2.0782 0.1513
sum(coef(zz)[c("emp","LDCs:emp")]) #output elasticity of labor for LDCs
## [1] 0.2432988
sum(coef(zz)[c("cn","LDCs:cn")]) #output elasticity of capital for LDCs
## [1] 0.8533536
coef(zz)["emp"] #output elasticity of labor for non-LDCs
## emp
## 0.6131559
coef(zz)["cn"] #output elasticity of capital for non-LDCs
## cn
## 0.5673762
The coefficients for both LDCs:emp and LDCs:cn are significantly different from zero. We therefore reject the null hypotheses that LDCs and non-LDCs have the same output elasticities. LDCs are considered to be relatively labor-abundant and capital-scarce. In a world with diminishing marginal returns, it makes sense that they would have a higher output elasticity of capital and a lower output elasticity of labor.
Though the returns to scale are lower for LDCs, the difference is not statistically significant.
Note that the output elasticities are exactly the same, regardless of whether they are estimated by using interaction terms or by separately regressing on the subsets of observations.
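This equivalence can be checked directly: a regression with a full set of interactions (including the intercept dummy) reproduces the subset regressions' coefficients exactly. A minimal sketch on simulated data (variable names invented):

```r
# Interaction model vs. separate subset regressions: numerically identical slopes
set.seed(3)
n <- 100
d <- rep(0:1, each = n/2)               # group dummy
x <- rnorm(n)
y <- 1 + 2*x + d*(1 + 0.5*x) + rnorm(n)
full <- lm(y ~ d * x)                   # intercept dummy plus slope interaction
sub0 <- lm(y ~ x, subset = d == 0)      # fit on group 0 only
sub1 <- lm(y ~ x, subset = d == 1)      # fit on group 1 only
coef(full)["x"]                         # equals coef(sub0)["x"]
coef(full)["x"] + coef(full)["d:x"]     # equals coef(sub1)["x"]
```

The only difference between the two approaches is the residual variance: the interaction model pools the residuals from both groups, so its standard errors differ from the subset regressions'.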