Department of Economics S3412

Columbia University Summer 2022

SOLUTIONS to Problem Set 2

Introduction to Econometrics

Seyhan Erden

1. [graded] For many years, housing economists believed that households spend a constant fraction of income on housing, as in

housing expenditure = β (income) + u

The file housing.dta contains housing expenditures (housing) and total expenditures (total) for a sample of 19th century Belgian workers collected by Edouard Ducpetiaux (see footnote 1). The differences in housing expenditures from one observation to the next are in the variable dhousing; the differences in total expenditures are in the variable dtotal.

(a) Compute the means of total expenditure and housing expenditure in this sample.

(b) Estimate β using total expenditure for total income.

(c) If income rises by 100 (it averages around 900 in this sample), what change in estimated expected housing expenditure results according to your estimate in (b)?

(d) Interpret the R^2.

(e) What economic argument would you make against housing absorbing a constant share of income?

(f) What are some determinants of housing captured by u?

Solution:

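a) The mean of total expenditure is 902.82 and the mean of housing expenditure is 72.54:

. sum housing total

    Variable |   Obs        Mean   Std. Dev.       Min        Max
     housing |   162    72.54259   57.26064       7.25     450.52
       total |   162    902.8239   411.6408     377.06    2822.54

b) The estimated β is 0.0750 (standard error 0.0043), from the regression of housing on total without a constant.

1 Edouard Ducpetiaux, Budgets économiques des classes ouvrières en Belgique (Brussels: Hayez, 1855)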

c) Housing expenditure is expected to increase by about 7.5 (100 × 0.0749545 ≈ 7.50).

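d) Since this regression does not contain a constant, we cannot necessarily interpret the R^2 in the usual way (i.e. that 64.8% of the variation in housing expenditures can be explained by the variation in income). Without a constant, Stata computes R^2 from the uncentered total sum of squares (the sum of squared housing values rather than squared deviations from the mean), which inflates it. To see this, run the regression including a constant; the R^2 is now 12.3%!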
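The role of the total sum of squares can be checked by hand. The sketch below uses only the summary statistics for housing and the residual sum of squares reported in the Stata output for this problem; the variable names are illustrative, and it only rebuilds the R^2 of the no-constant regression, not the 12.3% figure from the regression with a constant.

```python
# Summary statistics for housing and the no-constant regression output
n = 162
mean_housing = 72.54259
sd_housing = 57.26064
rss = 485275.167  # residual sum of squares from "reg housing total, noconstant"

# Centered total sum of squares: squared deviations from the sample mean
tss_centered = (n - 1) * sd_housing ** 2

# Uncentered total sum of squares: raw sum of squared housing values
tss_uncentered = tss_centered + n * mean_housing ** 2

# R^2 as it is computed when the regression has no constant
r2_uncentered = 1 - rss / tss_uncentered

print(round(tss_centered, 1), round(tss_uncentered, 1), round(r2_uncentered, 4))
```

The uncentered total is more than twice the centered one because it also counts the squared mean of housing, which is why the no-constant R^2 looks so much larger.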

e) The relationship is more likely to be non-linear (for example, the share of income spent on housing may vary with the level of income).

f) Price, mortgage interest rates, location, etc. (answers will vary here)

2. [graded] Use Table 2 to answer the following questions. Table 2 presents the results of four regressions, one in each column. Estimate the indicated regressions and fill in the values (you may either handwrite or type the entries in; if you choose to type up the table, an electronic copy of Table 2 in .doc format is available on the course Web site). For example, to fill in column (1), estimate the regression with colGPA as the dependent variable and hsGPA and skipped as the independent variables, using the "robust" option, and fill in the estimated coefficients.

(a) Fill out the table with the necessary numbers; some will be in the Stata output, and some you will need to calculate yourself.

(b) Common sense predicts that your high school GPA (hsGPA) and the number of classes you skipped (skipped) are determinants of your college GPA (colGPA). Use regression (2) to test the hypothesis (at the 5% significance level) that the coefficients on these two economic variables are both zero, against the alternative that at least one coefficient is nonzero.

H_0: \beta_{hsGPA} = \beta_{skipped} = 0 vs. H_1: at least one coef. is nonzero

The F-statistic is 19.34 (Table 2, column (2)) and its p-value = .00 < .05, thus we reject the null hypothesis at the 5% significance level. We conclude that at least one coefficient is nonzero.

(c) Find the F-statistic for regression (3) and explain what it is testing.

The F-statistic for regression (3) is 12.07; it jointly tests whether all of the slope coefficients are equal to 0, that is, whether the regressors jointly have no explanatory power.

Stata output for Question 1(b):

> 12\Problem Set 2 Fall 2012\housing.dta”, clear

. reg housing total, noconstant

      Source |         SS     df          MS        Number of obs =     162
                                                    F(  1,   161) =  296.98
       Model |  895121.769      1  895121.769       Prob > F      =  0.0000
    Residual |  485275.167    161  3014.13147       R-squared     =  0.6485
                                                    Adj R-squared =  0.6463
       Total |  1380396.94    162  8520.96874       Root MSE      =  54.901

     housing |     Coef.   Std. Err.       t    P>|t|   [95% Conf. Interval]
       total |  .0749545   .0043495    17.23   0.000    .0663651   .0835439

(d) Find the F-statistic for regression (4) and explain what it is testing.

The F-statistic for regression (4) is 11.14 (this is not from the table); it jointly tests whether all of the slope coefficients are equal to 0, that is, whether the regressors jointly have no explanatory power.

(e) Are bgfriend (whether you have a boy/girlfriend) and campus (whether you live on campus) jointly significant determinants of college GPA? Use regressions (2) and (4) to test your hypothesis (i.e. use the homoskedasticity-only F-statistic formula, eq. 7.14 in the book, instead of testing directly with Stata).

H_0: \beta_{bgfriend} = \beta_{campus} = 0 vs. H_1: at least one coef. is nonzero

F = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)} = \frac{(.278 - .250)/2}{(1 - .278)/(141 - 5 - 1)} = 2.62

where R^2_{unrestricted} and R^2_{restricted} are obtained from regression (4) and (2) respectively; the number of restrictions is q = 2; k_{unrestricted} = 5 is the number of regressors in the unrestricted regression; q and n - k_{unrestricted} - 1 = 135 are the degrees of freedom for the F distribution (here we perform non-robust regressions).

The 5% critical value of the F(2, 135) distribution is 3.06 > 2.62, thus we cannot reject at the 5% significance level. We cannot conclude that bgfriend and campus jointly have explanatory power. Alternatively, we can use the Stata command di fprob(2, 135, 2.62) to find the associated p-value = .077 > .05; again we cannot reject at the 5% significance level.
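The homoskedasticity-only F-statistic for this test can be reproduced from the R^2 values of regressions (2) and (4). This is a minimal sketch: the function name and argument names are illustrative, and the 5% critical value of 3.06 is taken from the solution text rather than computed.

```python
def homoskedastic_f(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic built from two R^2 values."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator

# R^2 from regression (4) (unrestricted) and regression (2) (restricted)
f_stat = homoskedastic_f(0.278, 0.250, q=2, n=141, k_unrestricted=5)

critical_value_5pct = 3.06  # F(2, 135) critical value quoted in the solution
reject_null = f_stat > critical_value_5pct

print(round(f_stat, 2), reject_null)
```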

Table 1

Definitions of Variables in GPA4.dta (data is from Wooldridge textbook)

Variable    Definition

colGPA      Cumulative college grade point average of a sample of 141 students at Michigan State University in 1994.
hsGPA       High school GPA of students.
skipped     Average number of classes skipped per week.
PC          = 1 if the student owns a personal computer; = 0 otherwise.
bgfriend    = 1 if the student answered "yes" to the question about having a boy/girlfriend; = 0 otherwise.
campus      = 1 if the student lives on campus; = 0 otherwise.

Table 2

College GPA Results. Dependent variable: colGPA

Regressor                          (1)       (2)       (3)       (4)

hsGPA                             .458      .455      .460      .461
                                 (.094)    (.092)    (.093)    (.090)
skipped                          -.077     -.065     -.065     -.071
                                 (.025)    (.025)    (.025)    (.026)
PC                                 __       .128      .130      .136
                                           (.059)    (.059)    (.058)
bgfriend                           __        __       .084      .085
                                                     (.055)    (.054)
campus                             __        __        __      -.124
                                                               (.078)
Intercept                        1.579     1.526     1.469     1.490
                                 (.325)    (.321)    (.325)    (.317)

F-statistics testing the hypothesis that the population coefficients on the indicated regressors are all zero:

hsGPA, skipped                   20.90     19.34     19.42     21.19
                                 (.00)     (.00)     (.00)     (.00)
hsGPA, skipped, PC                 __      15.47     15.56     17.46
                                           (.00)     (.00)     (.00)
hsGPA, skipped, PC, bgfriend       __        __      12.07     13.62
                                                     (.00)     (.00)
bgfriend, campus                   __        __        __       2.55
                                                               (.082)

Regression summary statistics

\bar{R}^2 (adjusted R^2)          .211      .234      .241      .252
R^2                               .223      .250      .263      .278
Regression RMSE                   .331      .326      .324      .322
n                                  141       141       141       141

Notes: Heteroskedasticity-robust standard errors are given in parentheses under estimated coefficients, and p-values are given in parentheses under F-statistics. The F-statistics are heteroskedasticity-robust.
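The summary statistics of Table 2 list two sets of fit measures, .211 .234 .241 .252 and .223 .250 .263 .278. As a quick consistency check, the sketch below treats the second set as R^2 and applies the usual adjusted R^2 formula with n = 141 and the number of regressors in each column. The function name is illustrative, and agreement is only expected up to rounding because the R^2 inputs are themselves rounded to three decimals.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a regression with k regressors plus a constant."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 141
# column number -> (R^2 from Table 2, number of regressors in that column)
columns = {1: (0.223, 2), 2: (0.250, 3), 3: (0.263, 4), 4: (0.278, 5)}
# first set of values listed in the summary statistics
table_values = {1: 0.211, 2: 0.234, 3: 0.241, 4: 0.252}

computed = {col: adjusted_r2(r2, n, k) for col, (r2, k) in columns.items()}
for col in sorted(computed):
    print(col, round(computed[col], 4), table_values[col])
```

Each computed value lies within about .001 of the first set, which supports reading that set as the adjusted R^2.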

The following questions will not be graded; they are for you to practice and will be discussed at the recitation by your teaching assistant:

1. SW Empirical Exercise 6.1

Regressions used in (a) and (b)

Regressor     Model (a)    Model (b)

Beauty          0.133        0.166
Intro                        0.011
OneCredit                    0.634
Female                      −0.173
Minority                    −0.167
NNEnglish                   −0.244
Intercept       4.00         4.07

SER             0.545        0.513
R^2             0.036        0.155

(a) The estimated slope is 0.133.

(b) The estimated slope is 0.166. The coefficient does not change by a large amount. Thus, there does not appear to be large omitted variable bias.

(c) The first step and second step are summarized in the table:

              Dependent variable
Regressor     Beauty     Course_eval

Intro          0.12         0.03
OneCredit     −0.37         0.57
Female         0.19        −0.14
Minority       0.08        −0.15
NNEnglish      0.02        −0.24
Intercept     −0.11         4.05

Regressing the residual from step 2 onto the residual from step 1 yields a coefficient on Beauty that is equal to 0.166 (as in (b)).
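The two-step procedure in (c) is the partialling-out (Frisch-Waugh) result: the slope from the residual-on-residual regression equals the coefficient from the multiple regression. The sketch below illustrates this with a single control variable and hypothetical toy numbers, not the course-evaluation data, so its output will not reproduce 0.166; all function and variable names are assumptions made for the illustration.

```python
def mean(values):
    return sum(values) / len(values)

def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on x with a constant."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(x, y):
    """Residuals from regressing y on x with a constant."""
    intercept, slope = simple_ols(x, y)
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]

def multiple_ols_first_slope(x1, x2, y):
    """Coefficient on x1 when y is regressed on x1, x2 and a constant."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

# Hypothetical toy data, NOT the course-evaluation data set
beauty = [0.2, -0.5, 1.1, 0.3, -1.2, 0.8, -0.1, 0.6]
female = [1, 0, 1, 0, 0, 1, 1, 0]
course_eval = [4.3, 3.6, 4.6, 4.0, 3.2, 4.4, 4.1, 4.2]

# Step 1: residual of Beauty on the control; step 2: residual of Course_eval
beauty_resid = residuals(female, beauty)
eval_resid = residuals(female, course_eval)

# Residual-on-residual slope versus the multiple-regression coefficient
_, slope_partialled = simple_ols(beauty_resid, eval_resid)
slope_multiple = multiple_ols_first_slope(beauty, female, course_eval)

print(slope_partialled, slope_multiple)
```

The two printed slopes agree up to floating-point error, which is the same equality the exercise verifies with the full set of controls.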

(d) Professor Smith's predicted course evaluation = (0.166 × 0) + (0.011 × 0) + (0.634 × 0) − (0.173 × 0) − (0.167 × 1) − (0.244 × 0) + 4.068 = 3.901
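The prediction in (d) is a plain linear combination of the model (b) coefficients, which can be sketched as follows. The dictionary layout and names are illustrative; the regressor values are the ones used in the solution's own arithmetic.

```python
# Coefficients from model (b), as used in the solution's arithmetic
coefficients = {
    "Beauty": 0.166,
    "Intro": 0.011,
    "OneCredit": 0.634,
    "Female": -0.173,
    "Minority": -0.167,
    "NNEnglish": -0.244,
}
intercept = 4.068

# Regressor values for Professor Smith, taken from the solution's arithmetic
smith = {
    "Beauty": 0,
    "Intro": 0,
    "OneCredit": 0,
    "Female": 0,
    "Minority": 1,
    "NNEnglish": 0,
}

predicted_eval = intercept + sum(
    coefficients[name] * smith[name] for name in coefficients
)
print(round(predicted_eval, 3))
```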

2. SW Empirical Exercises 7.1

Regressor     Model (a)    Model (b)

Age             0.60         0.59
               (0.04)       (0.04)
Female                      −3.66
                            (0.21)
Bachelor                     8.08
                            (0.21)
Intercept       1.08        −0.63
               (1.17)       (1.08)

SER             9.99         9.07
R^2             0.029        0.200
\bar{R}^2       0.029        0.199

(a) The estimated slope is 0.60. The estimated intercept is 1.08.

(b) The estimated marginal effect of Age on AHE is 0.59 dollars per year. The 95% confidence interval is 0.59 ± 1.96 × 0.04, or 0.51 to 0.66 (with the rounded figures shown, the upper bound computes to 0.67).
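The interval arithmetic can be checked directly. This small sketch uses the rounded estimate and standard error from the table, so its bounds may differ in the last digit from values computed with unrounded Stata output; the variable names are illustrative.

```python
estimate = 0.59       # coefficient on Age in model (b), rounded
std_error = 0.04      # its standard error, rounded
critical_value = 1.96  # two-sided 5% critical value of the standard normal

lower = estimate - critical_value * std_error
upper = estimate + critical_value * std_error

print(round(lower, 4), round(upper, 4))
```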

(c) The results are quite similar. Evidently the regression in (a) does not suffer from

important omitted variable bias.

(d) Bob's predicted average hourly earnings = (0.59 × 26) + (−3.66 × 0) + (8.08 × 0) − 0.63 = $14.71. Alexis's predicted average hourly earnings = (0.59 × 30) + (−3.66 × 1) + (8.08 × 1) − 0.63 = $21.49.
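Both predictions come from the same fitted equation, so they can be sketched as one small function using the rounded model (b) coefficients. The function name is illustrative, and the inputs (age, Female, Bachelor) are read off the arithmetic in the solution.

```python
def predicted_ahe(age, female, bachelor):
    """Predicted average hourly earnings from model (b), rounded coefficients."""
    return 0.59 * age - 3.66 * female + 8.08 * bachelor - 0.63

bob = predicted_ahe(age=26, female=0, bachelor=0)
alexis = predicted_ahe(age=30, female=1, bachelor=1)

print(round(bob, 2), round(alexis, 2))
```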

(e) The regression in (b) fits the data much better. Gender and education are important predictors of earnings. The R^2 and \bar{R}^2 are similar because the sample size is large (n = 7711).

(f) Gender and education are important. The F-statistic is 781, which is (much) larger than

the 1% critical value of 4.61.

(g) The omitted variables must have non-zero coefficients and must be correlated with the included regressor. From (f), Female and Bachelor have non-zero coefficients; yet there does not seem to be important omitted variable bias, suggesting that the correlations of Age with Female and of Age with Bachelor are small. (The sample correlations are Cor(Age, Female) = −0.03 and Cor(Age, Bachelor) = 0.00.)

3. SW Exercises 7.1

Estimated Regressions

Regressor     Model (a)    Model (b)

Age             0.60         0.59
               (0.04)       (0.04)
Female                      −3.66
                            (0.21)
Bachelor                     8.08
                            (0.21)
Intercept       1.08        −0.63
               (1.17)       (1.08)

SER             9.99         9.07
R^2             0.029        0.200
\bar{R}^2       0.029        0.199

(a) The estimated slope is 0.60. The estimated intercept is 1.08.

(b) The estimated marginal effect of Age on AHE is 0.59 dollars per year. The 95% confidence interval is 0.59 ± 1.96 × 0.04, or 0.51 to 0.66 (with the rounded figures shown, the upper bound computes to 0.67).

(c) The results are quite similar. Evidently the regression in (a) does not suffer from important

omitted variable bias.

(d) Bob's predicted average hourly earnings = (0.59 × 26) + (−3.66 × 0) + (8.08 × 0) − 0.63 = $14.71. Alexis's predicted average hourly earnings = (0.59 × 30) + (−3.66 × 1) + (8.08 × 1) − 0.63 = $21.49.

(e) The regression in (b) fits the data much better. Gender and education are important predictors of earnings. The R^2 and \bar{R}^2 are similar because the sample size is large (n = 7711).

(f) Gender and education are important. The F-statistic is 781, which is (much) larger than the

1% critical value of 4.61.

(g) The omitted variables must have non-zero coefficients and must be correlated with the included regressor. From (f), Female and Bachelor have non-zero coefficients; yet there does not seem to be important omitted variable bias, suggesting that the correlations of Age with Female and of Age with Bachelor are small. (The sample correlations are Cor(Age, Female) = −0.03 and Cor(Age, Bachelor) = 0.00.)
