Compiled on 2021-03-10 by E. Anthon Eff
Jones College of Business, Middle Tennessee State University

1 Resources to learn and use R

You might find it easier to read and navigate this html file if you download it to your own computer.

2 The Cobb-Douglas production function

Economists estimate production functions to make predictions, and to study productivity. The general form of the production function is \(output=f(inputs)\), where the output is something humans want (final goods and services) and the inputs are things used to make the output. Inputs are traditionally divided into three categories:

  • the factors of production (labor, capital, land, entrepreneurship)
  • intermediate goods (materials purchased to make the final good or service)
  • the level of technology

A production function can be specified many different ways; the Cobb-Douglas specification in Equation (1) is probably the most common. \(Q_i\) is the quantity of the output produced; \(A_i\) is the level of technology; \(K_i\) is the quantity of services from capital; and \(L_i\) is the quantity of labor used. Each observation \(i\) is a production unit, or a group of production units with something in common, such as physical location.

\[Q_i=A_i{K_i}^{\alpha_K}{L_i}^{\alpha_L}\tag 1\]

2.1 Estimating the Cobb-Douglas production function

The file pwt91_extract2017.xlsx contains 2017 data extracted from the Penn World Tables v9.1. The data contain variables for \(Q_i\) (cgdpo), \(K_i\) (cn) and \(L_i\) (emp). Each observation is a country.

We lack data for \(A_i\), and for the exponents \(\alpha_K\) and \(\alpha_L\). We will estimate these.

By taking the log of both sides, we can convert Equation (1) into a linear equation. \[ln(Q_i)=ln(A_i)+\alpha_K ln(K_i)+\alpha_L ln(L_i) \tag 2\] When we estimate this equation, we get:

\[ln(Q_i)=\hat\alpha_0+\hat\alpha_K ln(K_i)+\hat\alpha_L ln(L_i)+\hat\epsilon_i \tag 3\]
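Before turning to the real data, the mechanics of Equations (2) and (3) can be checked on simulated data. The sketch below uses made-up parameter values (none of these numbers come from the Penn World Tables): when the data are truly generated by a Cobb-Douglas process, `lm()` on the logged variables recovers the exponents.

```r
# Sketch with made-up parameters: generate log data from Equation (2)
# with alpha_K = 0.3 and alpha_L = 0.7, then recover them with lm()
set.seed(42)
n   <- 500
lnK <- rnorm(n, mean = 10)          # log capital services
lnL <- rnorm(n, mean = 8)           # log labor
lnA <- 0.5 + rnorm(n, sd = 0.1)     # log technology, mean 0.5
lnQ <- lnA + 0.3 * lnK + 0.7 * lnL  # Equation (2)
fit <- lm(lnQ ~ lnK + lnL)
round(coef(fit), 2)  # intercept near 0.5, slopes near 0.3 and 0.7
```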

#--required packages: readxl (read_excel), car (vif, hccm, linearHypothesis), lmtest (reset)--
library(readxl); library(car); library(lmtest)
#--read in Penn World Tables v9.1 data for 2017 from working directory--
dim(aa<-data.frame(read_excel("pwt91_extract2017.xlsx",sheet="data")))
## [1] 182  51
rownames(aa)<-aa$iso3
dii<-c("cgdpo","emp","cn")  
aa[,dii]<-log(aa[,dii]+1) # convert Q, K, and L to logs
dim(aa<-aa[,c("iso3","country",dii)]) 
## [1] 182   5
dim(aa<-aa[complete.cases(aa),])# listwise deletion (remove observations with missing values)
## [1] 171   5
summary(zz<-lm(cgdpo~emp+cn, data=aa)) # Cobb-Douglas production function
## 
## Call:
## lm(formula = cgdpo ~ emp + cn, data = aa)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0136 -0.1952  0.0292  0.2169  1.0888 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.12234    0.23845   0.513    0.609    
## emp          0.24409    0.03471   7.032 4.89e-11 ***
## cn           0.84565    0.02182  38.750  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3704 on 168 degrees of freedom
## Multiple R-squared:  0.9668, Adjusted R-squared:  0.9665 
## F-statistic:  2450 on 2 and 168 DF,  p-value: < 2.2e-16
vif(zz)
##      emp       cn 
## 2.484043 2.484043
reset(zz)
## 
##  RESET test
## 
## data:  zz
## RESET = 7.2857, df1 = 2, df2 = 166, p-value = 0.0009271

2.2 Output elasticities

One reason economists like the Cobb-Douglas specification is that the estimated coefficients \(\hat\alpha_K\) and \(\hat\alpha_L\) have an immediate and useful interpretation: they are the elasticity of the output with respect to the input. Thus, for example, \(\hat\alpha_K\) gives the percentage change in output for a one percent increase in dollars of capital used.
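This elasticity interpretation can be verified numerically. The sketch below uses hypothetical parameter values (not the estimates above): with \(\alpha_K=0.3\), a 1% increase in capital should raise output by roughly 0.3%.

```r
# Hypothetical parameter values, chosen only for illustration
A <- 2; K <- 100; L <- 50
alpha_K <- 0.3; alpha_L <- 0.7
Q0 <- A * K^alpha_K * L^alpha_L          # output at baseline inputs
Q1 <- A * (1.01 * K)^alpha_K * L^alpha_L # output after a 1% increase in K
100 * (Q1 / Q0 - 1)                      # percent change in output, ~0.30
```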

2.3 Total factor productivity

Productivity is the ratio of output over an input. Total factor productivity is the ratio of output over some function of all inputs. The technological variable \(A_i\) in Equation (1) provides a measure of the total factor productivity of a country:

\[A_i=\frac{Q_i}{{K_i}^{\alpha_K}{L_i}^{\alpha_L}}\tag 4\]

which implies that

\[\widehat{ln(A_i)}=ln(Q_i)-(\hat\alpha_K ln(K_i)+\hat\alpha_L ln(L_i))=\hat\alpha_0+\hat\epsilon_i \tag 5\]

so that

\[\hat{A_i}=e^{(\hat\alpha_0+\hat\epsilon_i)} \tag 6\]

length(aa$tfp<-exp(coef(zz)[1]+zz$residuals)) #using Equation (6) above to calculate TFP
## [1] 171
head(aa[order(-aa$tfp),c("iso3","country","tfp")]) #highest TFP
##     iso3    country      tfp
## KGZ  KGZ Kyrgyzstan 3.357196
## AZE  AZE Azerbaijan 2.910586
## IRQ  IRQ       Iraq 2.802029
## EGY  EGY      Egypt 2.740138
## UZB  UZB Uzbekistan 2.599712
## MLI  MLI       Mali 2.572592


2.4 Returns to scale

Suppose that we increase every input by the same amount, such as 10%. How much would output change?

\[A_i{(1.1K_i)}^{\alpha_K}{(1.1L_i)}^{\alpha_L}=1.1^{(\alpha_K+\alpha_L)}A_i{K_i}^{\alpha_K}{L_i}^{\alpha_L}=1.1^{(\alpha_K+\alpha_L)}Q_i\tag 7\]

  • If \((\alpha_K+\alpha_L)>1\) then \(Q_i\) will increase by more than 10% (increasing returns to scale).
  • If \((\alpha_K+\alpha_L)<1\) then \(Q_i\) will increase by less than 10% (decreasing returns to scale).
  • If \((\alpha_K+\alpha_L)=1\) then \(Q_i\) will increase by exactly 10% (constant returns to scale).

So the estimated parameters from a Cobb-Douglas production function can be used to test for returns to scale.

sum(coef(zz)[-1])  # display the actual sum
## [1] 1.089743
linearHypothesis(zz,"emp+cn=1",vcov=hccm(zz)) # H0: constant returns to scale
## Linear hypothesis test
## 
## Hypothesis:
## emp  + cn = 1
## 
## Model 1: restricted model
## Model 2: cgdpo ~ emp + cn
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Res.Df Df      F    Pr(>F)    
## 1    169                        
## 2    168  1 12.831 0.0004462 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
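Plugging the estimated sum into Equation (7) translates it into an implied output change. The sketch below uses the point estimate of 1.089743 reported above:

```r
# Equation (7): a 10% increase in all inputs scales output by
# 1.1^(alpha_K + alpha_L); here using the estimated sum from above
elas_sum <- 1.089743            # sum of the two estimated elasticities
100 * (1.1^elas_sum - 1)        # implied % increase in output, about 10.9
```

So the point estimate implies mildly increasing returns to scale, though the robust test above rejects constant returns at conventional significance levels.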

3 Chow Test

A Chow test splits your data into two parts and estimates a separate model on each part. The two separate models are combined to be your unrestricted model, and they are compared to a model estimated once on the entire dataset, which is your restricted model (the parameters are restricted to be the same across all observations). The null hypothesis of the Chow test is that the parameters are the same in both subsets. If the p-value is \(\leq 0.05\) you would reject the null hypothesis.
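In terms of sums of squared residuals (ESS), the test statistic has the usual F form:

\[F=\frac{(ESS_R-ESS_{UR})/m}{ESS_{UR}/(N-2k)}\]

where \(ESS_R\) comes from the pooled (restricted) regression, \(ESS_{UR}=ESS_1+ESS_2\) comes from the two subset regressions, \(m\) is the number of restrictions, \(N\) is the number of observations, and \(k\) is the number of parameters in each model. The chowTest function below computes exactly these quantities.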

For example, one might wonder whether production in Less Developed Countries (LDCs) has the same output elasticities as production elsewhere.

#--read in PWT data and a dummy for LDCs--
dim(oo<-data.frame(read_excel("internationalOrganization_CIA.xlsx",sheet="data")))
## [1] 243 215
dim(aa<-data.frame(read_excel("pwt91_extract2017.xlsx",sheet="data")))
## [1] 182  51
aa<-merge(aa[,c("country","iso3","cgdpo","emp","cn")],oo[,c("iso3","LDCs")],by="iso3")
rownames(aa)<-aa$iso3
dii<-c("cgdpo","emp","cn")  
aa[,dii]<-log(aa[,dii]+1) # convert Q, K, and L to logs
dim(aa<-aa[complete.cases(aa),]) # listwise deletion (remove observations with missing values)
## [1] 171   6
chowTest<-function(frm,dta,dum){
  # Chow test for dta partitioned by dum, using frm
  zz<-lm(frm, data=dta) #--estimate for all observations
  ESSR<-sum(zz$residuals^2) #--collect the sum of squared residuals
  
  z<-which(dta[,dum]==0)    #--identify observations that are dum==0
  z1<-lm(frm, data=dta[z,]) #--estimate for dum==0
  ESS1<-sum(z1$residuals^2) #--collect the sum of squared residuals
  
  z<-which(dta[,dum]==1)    #--identify observations that are dum==1 
  z2<-lm(frm, data=dta[z,]) #--estimate for dum==1 
  ESS2<-sum(z2$residuals^2) #--collect the sum of squared residuals
  
  ESSUR<-ESS1+ESS2  #--unrestricted ESS (parameters allowed to change across subsets)
  dfUR<-NROW(dta)-2*length(coef(zz))    #--degrees of freedom equals nobs minus numb estimated parameters
  numres<-2*length(coef(zz))    #-- you are restricting both the dum==0 coefficients and dum==1 coefficients to equal the coefficients estimated with all observations
  Fstat=((ESSR-ESSUR)/numres)/(ESSUR/dfUR)  #--(H0: both subsets have same parameter values)
  pval=1-pf(Fstat,numres,dfUR)  #--calculate pvalue 
  #--display F-stat and pvalue
  ff<-list(dum,table(dta[,dum]),data.frame(H0="coef same for the subsets",Fstat,numres,dfUR,pval),coef(z1),coef(z2))
  names(ff)<-c("dummy_variable","frequency","Chow_test","coeff_when_dum==0","coeff_when_dum==1")
  ff
}

chowTest("cgdpo~emp+cn",aa,"LDCs")
## $dummy_variable
## [1] "LDCs"
## 
## $frequency
## 
##   0   1 
##  49 122 
## 
## $Chow_test
##                          H0    Fstat numres dfUR         pval
## 1 coef same for the subsets 4.479993      6  165 0.0003136118
## 
## $`coeff_when_dum==0`
## (Intercept)         emp          cn 
##   3.4351228   0.6131559   0.5673762 
## 
## $`coeff_when_dum==1`
## (Intercept)         emp          cn 
## -0.01636236  0.24329884  0.85335359

We rejected the null hypothesis that the output elasticities are the same for LDCs and non-LDCs. The next step would typically be to see how they are different. The stargazer package makes it easy to create a professional-looking table showing coefficients and standard errors from the three regressions:

library(stargazer)
z<-which(aa$LDCs==1);z1<-lm("cgdpo~emp+cn",aa[z,]) #LDCs
z<-which(aa$LDCs==0);z0<-lm("cgdpo~emp+cn",aa[z,]) #non-LDCs
zz<-lm("cgdpo~emp+cn",aa) #all observations
stargazer(zz,z1,z0,type="text",keep.stat=c("n","rsq"),column.labels=c("total","LDCs","non-LDCs"))
## 
## ==========================================
##                   Dependent variable:     
##              -----------------------------
##                           NA              
##                total     LDCs    non-LDCs 
##                 (1)       (2)       (3)   
## ------------------------------------------
## emp          0.244***  0.243***  0.613*** 
##               (0.035)   (0.040)   (0.085) 
##                                           
## cn           0.846***  0.853***  0.567*** 
##               (0.022)   (0.026)   (0.054) 
##                                           
## Constant       0.122    -0.016   3.435*** 
##               (0.238)   (0.279)   (0.618) 
##                                           
## ------------------------------------------
## Observations    171       122       49    
## R2             0.967     0.970     0.966  
## ==========================================
## Note:          *p<0.1; **p<0.05; ***p<0.01

We can see that the estimated coefficients are different (the output elasticity of capital is higher for LDCs; the output elasticity of labor is higher for non-LDCs) but we do not know if they are significantly different. To determine that, we must re-estimate the Cobb-Douglas production function, using the LDC dummy as an interaction term. For each of the interaction terms, the null hypothesis is that the output elasticity is identical in LDCs and non-LDCs.

summary(zz<-lm("cgdpo~LDCs*(emp+cn)",aa))
## 
## Call:
## lm(formula = "cgdpo~LDCs*(emp+cn)", data = aa)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0176 -0.2087  0.0488  0.2187  0.9418 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.43512    0.71801   4.784 3.79e-06 ***
## LDCs        -3.45149    0.76556  -4.508 1.23e-05 ***
## emp          0.61316    0.09924   6.179 4.87e-09 ***
## cn           0.56738    0.06248   9.081 3.18e-16 ***
## LDCs:emp    -0.36986    0.10644  -3.475 0.000653 ***
## LDCs:cn      0.28598    0.06739   4.244 3.65e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3466 on 165 degrees of freedom
## Multiple R-squared:  0.9715, Adjusted R-squared:  0.9706 
## F-statistic:  1125 on 5 and 165 DF,  p-value: < 2.2e-16
sum(coef(zz)[c("LDCs:emp","LDCs:cn")]) # the amount by which LDC returns to scale differs from non-LDC
## [1] -0.0838797
linearHypothesis(zz,"LDCs:emp+LDCs:cn=0") #H0: the returns to scale are the same for LDCs and non-LDCs
## Linear hypothesis test
## 
## Hypothesis:
## LDCs:emp  + LDCs:cn = 0
## 
## Model 1: restricted model
## Model 2: cgdpo ~ LDCs * (emp + cn)
## 
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    166 20.067                           
## 2    165 19.817  1    0.2496 2.0782 0.1513
sum(coef(zz)[c("emp","LDCs:emp")])  #output elasticity of labor for LDCs
## [1] 0.2432988
sum(coef(zz)[c("cn","LDCs:cn")])  #output elasticity of capital for LDCs
## [1] 0.8533536
coef(zz)["emp"]  #output elasticity of labor for non-LDCs
##       emp 
## 0.6131559
coef(zz)["cn"] #output elasticity of capital for non-LDCs
##        cn 
## 0.5673762

The coefficients for both LDCs:emp and LDCs:cn are significantly different from zero. We therefore reject the null hypotheses that LDCs and non-LDCs have the same output elasticities. LDCs are considered to be relatively labor-abundant and capital-scarce. In a world with diminishing marginal returns, it makes sense that they would have a higher output elasticity of capital and a lower output elasticity of labor.

Though the returns to scale are lower for LDCs, the difference is not statistically significant.

Note that the output elasticities are exactly the same, regardless of whether they are estimated by using interaction terms or by separately regressing on the subsets of observations.
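This equivalence can be checked on simulated data (a sketch with made-up values, not the PWT data): a fully interacted model reproduces the subset-regression slope exactly.

```r
# Simulated example: slope for group g==1 from the interacted model
# equals the slope from a separate regression on the g==1 subset
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n), g = rep(0:1, each = n / 2))
d$y <- ifelse(d$g == 1, 1 + 2 * d$x, 3 - 1 * d$x) + rnorm(n)

full <- lm(y ~ g * x, data = d)          # interacted model
sub1 <- lm(y ~ x, data = d[d$g == 1, ])  # subset g == 1 only
coef(full)["x"] + coef(full)["g:x"]      # slope for g == 1, interacted model
coef(sub1)["x"]                          # same value from the subset model
```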