Introduction to Econometrics Seyhan Erden

Department of Economics S3412
Columbia University Summer 2022
SOLUTIONS to Problem Set 2
1. [graded] For many years, housing economists believed that households spend a constant
fraction of income on housing, as in
housing expenditure = β (income) + u
The file housing.dta contains housing expenditures (housing) and total expenditures
(total) for a sample of 19th century Belgian workers collected by Edouard Ducpetiaux [1].
The differences in housing expenditures from one observation to the next are in the
variables dhousing; the differences in total expenditures are in variable dtotal.
(a) Compute the means of total expenditure and housing expenditure in this sample.
(b) Estimate β using total expenditure for total income.
(c) If income rises by 100 (it averages around 900 in this sample) what change in
estimated expected housing expenditure results according to your estimate in (b)?
(d) Interpret the R².
(e) What economic argument would you make against housing absorbing a constant
share of income?
(f) What are some determinants of housing captured by u?
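[1] Edouard Ducpetiaux, Budgets Economiques des Classes Ouvrieres en Belgique (Brussels, Hayez 1855)
Solution:
a) The mean of total expenditure is 902.82 and the mean of housing expenditure is 72.54:

. sum housing total

    Variable |   Obs        Mean    Std. Dev.       Min        Max
     housing |   162    72.54259    57.26064       7.25     450.52
       total |   162    902.8239    411.6408     377.06    2822.54

b) The estimate of β is 0.0750 (standard error 0.0043), from the regression of housing on total with no constant.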
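c) Housing expenditure is expected to increase by about 7.50 (100 × 0.0749545 = 7.495).
d) Since this regression does not contain a constant, we cannot necessarily interpret
the R² in the usual way (i.e., that 64.9% of the variation in housing expenditures can be
explained by the variation in income). Without a constant, Stata computes the R² from the
uncentered total sum of squares, so it does not measure variation around the mean. To see
this, run the regression including a constant; the R² is now 12.3%!
e) The relationship is more likely to be nonlinear. For example, housing is a necessity, so
its share of income would be expected to fall as income rises.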
f) Price, mortgage interest rates, location, etc. (answers will vary here)
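The arithmetic behind (c) and (d) can be checked directly from the figures in the Stata output. The following is a minimal Python sketch rather than part of the original Stata workflow: the variable names are illustrative, and every input is copied from the sum and reg (noconstant) output for this problem. It assumes the reported Total SS is the uncentered sum of squares, which the 162 degrees of freedom suggest.

```python
# Figures copied from the Stata output for Problem 1 (variable names are illustrative).
n = 162
beta_hat = 0.0749545      # coefficient on total, regression without a constant
model_ss = 895121.769     # model (explained) sum of squares
resid_ss = 485275.167     # residual sum of squares
total_ss = 1380396.94     # total SS as reported (assumed uncentered: sum of housing^2)
mean_housing = 72.54259
sd_housing = 57.26064

# (c) Expected change in housing expenditure when income rises by 100.
change = 100 * beta_hat

# (d) R-squared as reported for the no-constant regression (uncentered total SS).
r2_uncentered = model_ss / total_ss

# Centered total SS (variation around the mean), rebuilt from the summary statistics.
tss_centered = (n - 1) * sd_housing ** 2
# Consistency check: uncentered TSS = centered TSS + n * mean^2.
tss_uncentered_check = tss_centered + n * mean_housing ** 2

# Share of the variation around the mean accounted for by the no-constant fit.
r2_centered = 1 - resid_ss / tss_centered

print(f"(c) change in housing expenditure: {change:.2f}")
print(f"(d) R-squared, uncentered TSS: {r2_uncentered:.4f}")
print(f"(d) R-squared, centered TSS:   {r2_centered:.4f}")
```

The centered figure (roughly 0.08) is far below the uncentered one (roughly 0.65), which is why the reported R² overstates the fit around the mean. It need not equal the R² from the regression that includes a constant, because that regression fits a different line.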
2. [graded] Use Table 2 to answer the following questions. Table 2 presents the results of four
regressions, one in each column. Estimate the indicated regressions and fill in the values
(you may either handwrite or type the entries in; if you choose to type up the table, an
electronic copy of Table 2 in .doc format is available on the course Web site). For example,
to fill in column (1), estimate the regression with colGPA as the dependent variable and
hsGPA and skipped as the independent variables, using the “robust” option, and fill in the
estimated coefficients.
(a) Fill out the table with the necessary numbers; some appear in the Stata output, and
some you will need to calculate yourself.
(b) Common sense predicts that your high school GPA (hsGPA) and the number of
classes you skipped (skipped) are determinants of your college GPA (colGPA). Use
regression (2) to test the hypothesis (at the 5% significance level) that the coefficients
on these two economic variables are all zero, against the alternative that at least one
coefficient is nonzero.
$H_0: \beta_{hsGPA} = \beta_{skipped} = 0$ vs. $H_1$: at least one coef. is nonzero
The F-statistic is 19.34 (Table 2, column 2) and its p-value = .00 < .05, thus we reject $H_0$ at the
5% significance level. We conclude that at least one coefficient is nonzero.
(c) Find the F-statistic for regression (3) and explain what it is testing.
The F-statistic for regression (3) is 12.07; it jointly tests whether all of the coefficients are
equal to 0, that is, whether the regressors jointly have no explanatory power.

Stata output for Problem 1(b):
> 12\Problem Set 2 Fall 2012\housing.dta", clear
. reg housing total, noconstant

      Source |         SS     df           MS        Number of obs =     162
       Model | 895121.769      1   895121.769        F(  1,   161) =  296.98
    Residual | 485275.167    161   3014.13147        Prob > F      =  0.0000
       Total | 1380396.94    162   8520.96874        R-squared     =  0.6485
                                                     Adj R-squared =  0.6463
                                                     Root MSE      =  54.901

     housing |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
       total |  .0749545    .0043495   17.23    0.000    .0663651    .0835439
(d) Find the F-statistic for regression (4) and explain what it is testing.
The F-statistic for regression (4) is 11.14 (this is not from the table); it jointly tests whether
all of the coefficients are equal to 0, that is, whether the regressors jointly have no explanatory
power.
(e) Are bgfriend (whether you have a boy/girlfriend) and campus (whether you live on
campus) jointly significant determinants of college GPA? Use regression (2) and (4)
to test your hypothesis. (i.e. use homoskedasticity-only F stat formula, eq.7.14 in the
book, instead of directly testing with STATA)
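$H_0: \beta_{bgfriend} = \beta_{campus} = 0$ vs. $H_1$: at least one coef. is nonzero

\[ F = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)} = \frac{(.278 - .250)/2}{(1 - .278)/(141 - 5 - 1)} = 2.62 \]

where $R^2_{unrestricted}$ and $R^2_{restricted}$ are obtained from regressions (4) and (2) respectively; the number of
restrictions is $q = 2$; $k_{unrestricted} = 5$ is the number of regressors in the unrestricted regression; $q$ and
$n - k_{unrestricted} - 1 = 135$ are the degrees of freedom for the F distribution (here we perform non-robust regressions).
The 5% critical value is F(2, 135) = 3.06 > 2.62, thus we cannot reject $H_0$ at the 5% significance level. We tend to
conclude that bgfriend and campus jointly have no explanatory power. Alternatively, we can use
the Stata command: di fprob(2, 135, 2.62) to find the associated p-value = .077 > .05; again we
cannot reject at the 5% significance level.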
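The homoskedasticity-only F-statistic in (e) can be reproduced from the R² values alone. The sketch below is a hedged illustration in Python, not the Stata route the solution uses: it takes .278 and .250 as the R² of regressions (4) and (2) from Table 2, and the variable names are made up for this example. The p-value line relies on a closed form that holds only when the numerator has exactly 2 degrees of freedom, so it is not a general replacement for fprob.

```python
# Homoskedasticity-only F-statistic built from the R-squared values in Table 2.
n = 141                    # observations
q = 2                      # restrictions: coefficients on bgfriend and campus are zero
k_unrestricted = 5         # regressors in regression (4)
r2_unrestricted = 0.278    # regression (4)
r2_restricted = 0.250      # regression (2)

df_denominator = n - k_unrestricted - 1

f_stat = ((r2_unrestricted - r2_restricted) / q) / (
    (1 - r2_unrestricted) / df_denominator
)

# With exactly 2 numerator degrees of freedom the upper-tail probability of the
# F distribution has a closed form: P(F > f) = (1 + 2f/d)^(-d/2).
assert q == 2, "closed-form tail probability below is valid only for q = 2"
p_value = (1 + 2 * f_stat / df_denominator) ** (-df_denominator / 2)

print(f"F({q}, {df_denominator}) = {f_stat:.2f}")
print(f"p-value = {p_value:.3f}")
```

Because the p-value exceeds .05, the sketch reaches the same conclusion as the critical-value comparison: the null hypothesis is not rejected at the 5% level.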
Table 1
Definitions of Variables in GPA4.dta (data are from the Wooldridge textbook)

Variable    Definition
colGPA      Cumulative college grade point average of a sample of 141 students
            at Michigan State University in 1994.
hsGPA       High school GPA of students.
skipped     Average number of classes skipped per week.
PC          = 1 if the student owns a personal computer
            = 0 otherwise.
bgfriend    = 1 if the student answered "yes" to the question about having a
              boy/girlfriend
            = 0 otherwise.
campus      = 1 if the student lives on campus
            = 0 otherwise.
Table 2
College GPA Results
Dependent variable: colGPA

Regressor                          (1)       (2)       (3)       (4)
hsGPA                             .458      .455      .460      .461
                                 (.094)    (.092)    (.093)    (.090)
skipped                          -.077     -.065     -.065     -.071
                                 (.025)    (.025)    (.025)    (.026)
PC                                 __       .128      .130      .136
                                           (.059)    (.059)    (.058)
bgfriend                           __        __       .084      .085
                                                     (.055)    (.054)
campus                             __        __        __      -.124
                                                               (.078)
Intercept                        1.579     1.526     1.469     1.490
                                 (.325)    (.321)    (.325)    (.317)

F-statistics testing the hypothesis that the population coefficients on the indicated
regressors are all zero:
hsGPA, skipped                   20.90     19.34     19.42     21.19
                                 (.00)     (.00)     (.00)     (.00)
hsGPA, skipped, PC                 __      15.47     15.56     17.46
                                           (.00)     (.00)     (.00)
hsGPA, skipped, PC, bgfriend       __        __      12.07     13.62
                                                     (.00)     (.00)
bgfriend, campus                   __        __        __       2.55
                                                               (.082)

Regression summary statistics
Adjusted R²                       .211      .234      .241      .252
R²                                .223      .250      .263      .278
Regression RMSE                   .331      .326      .324      .322
n                                  141       141       141       141

Notes: Heteroskedasticity-robust standard errors are given in parentheses under estimated
coefficients, and p-values are given in parentheses under F-statistics. The F-statistics are
heteroskedasticity-robust.
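The two rows of fit statistics in Table 2 are linked by the usual degrees-of-freedom adjustment, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). As a hedged check, the Python sketch below takes the row .223, .250, .263, .278 as R² and the row .211, .234, .241, .252 as adjusted R², and assumes k = 2, 3, 4, 5 regressors in columns (1) through (4). The dictionary and function names are illustrative.

```python
n = 141  # observations in every column of Table 2

# Values read from Table 2, keyed by column number.
r2 = {1: 0.223, 2: 0.250, 3: 0.263, 4: 0.278}
k = {1: 2, 2: 3, 3: 4, 4: 5}                      # regressors, excluding the intercept
reported_adj = {1: 0.211, 2: 0.234, 3: 0.241, 4: 0.252}


def adjusted_r2(r2_value, n_obs, n_regressors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2_value) * (n_obs - 1) / (n_obs - n_regressors - 1)


computed_adj = {col: adjusted_r2(r2[col], n, k[col]) for col in r2}

for col in sorted(r2):
    print(f"column ({col}): computed {computed_adj[col]:.3f}, reported {reported_adj[col]:.3f}")
```

Small gaps in the third decimal are expected, since the inputs are themselves rounded to three places.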
The following questions will not be graded; they are for you to practice and will be discussed at
the recitation by your teaching assistant:
1. SW Empirical Exercise 6.1
6.1. Regressions used in (a) and (b)

Regressor     Model (a)    Model (b)
Beauty        0.133        0.166
Intro                      0.011
OneCredit                  0.634
Female                     −0.173
Minority                   −0.167
NNEnglish                  −0.244
Intercept     4.00         4.07
SER           0.545        0.513
R²            0.036        0.155
(a) The estimated slope is 0.133
(b) The estimated slope is 0.166. The coefficient does not change by a large amount. Thus, there
does not appear to be large omitted variable bias.
(c) The first step and second step are summarized in the table:

              Dependent variable
Regressor     Beauty     Course_eval
Intro         0.12       0.03
OneCredit     −0.37      0.57
Female        0.19       −0.14
Minority      0.08       −0.15
NNEnglish     0.02       −0.24
Intercept     −0.11      4.05

Regressing the residual from step 2 onto the residual from step 1 yields a coefficient on
Beauty that is equal to 0.166 (as in (b)).
(d) Professor Smith's predicted course evaluation = (0.166 × 0) + (0.011 × 0) + (0.634 × 0) −
(0.173 × 0) − (0.167 × 1) − (0.244 × 0) + 4.068 = 3.901
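Parts (c) and (d) can both be illustrated with a few lines of Python. The two-step procedure in (c) is an instance of the Frisch-Waugh result: partialling the controls out of both Beauty and Course_eval and regressing residual on residual reproduces the multiple-regression coefficient. The sketch below demonstrates this with a single control and a small made-up data set (it is NOT the data used in the exercise), then evaluates the prediction in (d) from the model (b) coefficients quoted in the solution. All helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def demean(values):
    m = mean(values)
    return [v - m for v in values]


def slope(y, x):
    """Slope from regressing y on x with a constant."""
    yd, xd = demean(y), demean(x)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


def residuals(y, x):
    """Residuals from regressing y on x with a constant."""
    b = slope(y, x)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for yi, xi in zip(y, x)]


def coef_on_first(y, x1, x2):
    """Coefficient on x1 when y is regressed on a constant, x1, and x2."""
    yd, d1, d2 = demean(y), demean(x1), demean(x2)
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, yd))
    s2y = sum(a * b for a, b in zip(d2, yd))
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


# Made-up illustrative data with one control variable; not the exercise data.
beauty = [0.2, -0.5, 1.1, 0.4, -1.0, 0.7, -0.3, 0.9]
female = [1, 0, 1, 0, 1, 0, 0, 1]
course_eval = [4.1, 3.6, 4.5, 4.0, 3.2, 4.4, 3.9, 4.0]

# (c) Step 1: Beauty on the control; step 2: Course_eval on the control;
# then regress the step-2 residuals on the step-1 residuals.
step1 = residuals(beauty, female)
step2 = residuals(course_eval, female)
fwl_slope = slope(step2, step1)
direct_slope = coef_on_first(course_eval, beauty, female)

# (d) Prediction from the model (b) coefficients quoted in the solution.
coefs = {"Beauty": 0.166, "Intro": 0.011, "OneCredit": 0.634,
         "Female": -0.173, "Minority": -0.167, "NNEnglish": -0.244}
intercept = 4.068
smith = {"Beauty": 0, "Intro": 0, "OneCredit": 0,
         "Female": 0, "Minority": 1, "NNEnglish": 0}
prediction = intercept + sum(coefs[name] * smith[name] for name in coefs)

print(f"two-step slope: {fwl_slope:.6f}, direct slope: {direct_slope:.6f}")
print(f"predicted course evaluation: {prediction:.3f}")
```

The two slopes agree up to floating-point error, which is the property the solution relies on when it reports 0.166 from both routes.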
2. SW Empirical Exercise 7.1

Regressor      Model (a)    Model (b)
Age            0.60         0.59
               (0.04)       (0.04)
Female                      −3.66
                            (0.21)
Bachelor                    8.08
                            (0.21)
Intercept      1.08         −0.63
               (1.17)       (1.08)
SER            9.99         9.07
R²             0.029        0.200
Adjusted R²    0.029        0.199

(a) The estimated slope is 0.60. The estimated intercept is 1.08.
(b) The estimated marginal effect of Age on AHE is 0.59 dollars per year. The 95%
confidence interval is 0.59 ± 1.96 × 0.04, or 0.51 to 0.66.
(c) The results are quite similar. Evidently the regression in (a) does not suffer from
important omitted variable bias.
(d) Bob's predicted average hourly earnings = (0.59 × 26) + (−3.66 × 0) + (8.08 × 0)
− 0.63 = $14.71. Alexis's predicted average hourly earnings = (0.59 × 30) + (−3.66 × 1)
+ (8.08 × 1) − 0.63 = $21.49.
(e) The regression in (b) fits the data much better. Gender and education are important
predictors of earnings. The R² and adjusted R² are similar because the sample size is
large (n = 7711).
(f) Gender and education are important. The F-statistic is 781, which is (much) larger than
the 1% critical value of 4.61.
(g) The omitted variables must have non-zero coefficients and must be correlated with the
included regressor. From (f), Female and Bachelor have non-zero coefficients; yet there
does not seem to be important omitted variable bias, suggesting that the correlations
between Age and Female and between Age and Bachelor are small. (The sample correlations
are Cor(Age, Female) = −0.03 and Cor(Age, Bachelor) = 0.00.)
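The predictions in (d) are a direct evaluation of the fitted model (b). The short Python sketch below is only an arithmetic check using the rounded coefficients from the table; it assumes, as the terms in (d) indicate, that Bob is a 26-year-old male without a bachelor's degree and Alexis is a 30-year-old female with one. The function name predict_ahe is made up for this example.

```python
# Rounded model (b) coefficients from the table for this exercise.
coef = {"Age": 0.59, "Female": -3.66, "Bachelor": 8.08}
intercept = -0.63


def predict_ahe(age, female, bachelor):
    """Predicted average hourly earnings from the rounded model (b) estimates."""
    return (intercept
            + coef["Age"] * age
            + coef["Female"] * female
            + coef["Bachelor"] * bachelor)


bob = predict_ahe(age=26, female=0, bachelor=0)
alexis = predict_ahe(age=30, female=1, bachelor=1)

print(f"Bob:    ${bob:.2f}")
print(f"Alexis: ${alexis:.2f}")
```

With these rounded inputs the sketch gives 14.71 for Bob and 21.49 for Alexis; results based on unrounded estimates would differ slightly.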
3. SW Exercises 7.1
The estimated regressions and the answers to parts (a) through (g) given for this exercise are
the same as those given for SW Empirical Exercise 7.1 above.
