Econometrics Midterm Exam Solutions: Regression Analysis and OLS Estimators, Exams of Economics

Answers to an econometrics midterm exam, covering topics such as random samples, random variables, the impact of control variables, and the properties of the OLS estimator. It includes detailed explanations and derivations related to linear regression models, hypothesis testing, and potential issues like endogeneity. The document also addresses the interpretation of regression coefficients and the validity of statistical tests in the context of wage analysis. This material is useful for students studying econometrics, providing insights into the application of econometric techniques and the interpretation of results.

Typology: Exams

2024/2025

Available from 05/29/2025

elam-dennis


ECO5185 Midterm Answers; Fall 2025.

Question 1

a. Briefly explain (in words) the idea/concept of each of the following, and briefly explain their implications for econometric analyses.

(a) random sample

Answer: A random sample is a sample whose draws are independent and identically distributed. It means that, for the simple linear regression model, assumption A3, i.e.

$$E(\epsilon_i \mid X) = 0 \qquad (1)$$

simplifies to

$$E(\epsilon_i \mid x_{i2}) = 0. \qquad (2)$$

As such, for the OLS estimator to be unbiased we only need to worry about the explanatory variable not being correlated with the error term within each draw. A random sample also has implications for the structure of the variance of our estimator, since the error terms of different draws are uncorrelated.

(b) random variable

Answer: A random variable is a variable whose outcome is uncertain/unknown. Given that y_i and x_ik (i = 1, ..., n and k = 1, ..., K) are random variables, the OLS estimator b_k (k = 1, ..., K) is also a random variable. It has a mean and a variance, and as such has statistical properties, so we can carry out hypothesis testing.
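The point that the OLS estimator is itself a random variable can be illustrated with a small simulation. This is only a sketch under assumed values: the data-generating process, the sample size, and the names `ols_slope` and `simulate_slopes` are hypothetical and not part of the exam.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


def simulate_slopes(n_samples=2000, n=50, beta1=1.0, beta2=2.0, seed=0):
    """Draw many random samples and return the slope estimate from each one."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Assumed model: y = beta1 + beta2 * x + error, with A3 holding.
        y = [beta1 + beta2 * xi + rng.gauss(0.0, 1.0) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


slopes = simulate_slopes()
mean_slope = statistics.fmean(slopes)   # centre of the sampling distribution
sd_slope = statistics.stdev(slopes)     # spread of the sampling distribution
print(f"mean of estimates: {mean_slope:.3f}, std. dev.: {sd_slope:.3f}")
```

Each sample gives a different estimate. The estimates have a mean (close to the assumed true slope of 2) and a non-zero spread, which is what makes hypothesis testing meaningful.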

b. Discuss how the addition of a control variable has implications for

(a) the variation used to estimate a parameter of interest

Answer: By the FWL theorem, the addition of a control variable means that we are not using some of the variation in x2 when estimating its parameter (i.e. β2). More precisely, we are not using the variation in x2 that is correlated with the new control.
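The FWL argument above can be sketched numerically: partial the control out of x2, then see how much variation is left and what slope it delivers. The data-generating process, the coefficient values, and the helper name `residualize` are assumptions made for illustration.

```python
import random
import statistics


def residualize(v, c):
    """Residuals from regressing v on an intercept and the control c."""
    v_bar, c_bar = statistics.fmean(v), statistics.fmean(c)
    slope = (sum((ci - c_bar) * (vi - v_bar) for ci, vi in zip(c, v))
             / sum((ci - c_bar) ** 2 for ci in c))
    return [(vi - v_bar) - slope * (ci - c_bar) for vi, ci in zip(v, c)]


rng = random.Random(1)
n = 5000
x3 = [rng.gauss(0.0, 1.0) for _ in range(n)]            # the new control
x2 = [0.8 * c + rng.gauss(0.0, 1.0) for c in x3]        # correlated with x3
# Assumed true model: y = 1 + 2*x2 + 3*x3 + error.
y = [1 + 2 * p + 3 * q + rng.gauss(0.0, 1.0) for p, q in zip(x2, x3)]

x2_bar = statistics.fmean(x2)
total_variation = sum((p - x2_bar) ** 2 for p in x2)
x2_resid = residualize(x2, x3)
used_variation = sum(r ** 2 for r in x2_resid)

# FWL: the coefficient on x2 in the regression with the control equals the
# slope from regressing y on the part of x2 that is unrelated to the control.
b2_fwl = sum(r * yi for r, yi in zip(x2_resid, y)) / used_variation
share_used = used_variation / total_variation
print(f"share of x2 variation used: {share_used:.2f}, b2: {b2_fwl:.3f}")
```

In this setup only part of the variation in x2 survives the partialling out, and that remaining part is what identifies β2.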

(b) the validity of a test

Answer: Adding a control may result in assumption A3 holding, if the additional variable belongs in the model and is correlated with the explanatory variable. Recall that if A3 does not hold, the t-test and F-test are not valid.

c. Briefly evaluate the following statement:

“The R^2 measures the proportion of the variation in the dependent variable that is explained by the regression line, and as such, a high R^2 implies the estimate of the parameter of interest is probably close to the true parameter.”

Answer: The first part of the statement is true: the R^2 measures how much of the variation in the dependent variable is explained by the OLS regression line. The second part of the statement, however, is not true. A high R^2 has no bearing on whether assumption A3 holds, and so no bearing on whether the estimator is unbiased or consistent. It therefore has no bearing on whether the estimate is close (or probably close) to the true parameter.
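A short simulation can make this concrete: a regression with an omitted, correlated variable can have a very high R^2 and still deliver a slope far from the true value. All numbers, the omitted variable `x3`, and the helper `simple_ols` are hypothetical choices for this sketch.

```python
import random
import statistics


def simple_ols(x, y):
    """Return (intercept, slope, R^2) for a regression of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((b1 + b2 * xi - y_bar) ** 2 for xi in x)
    return b1, b2, ssr / sst


rng = random.Random(2)
n = 2000
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x3 = [p + rng.gauss(0.0, 0.3) for p in x2]   # omitted, strongly tied to x2
# Assumed true model: y = 1 + 2*x2 + 3*x3 + error, so the true beta2 is 2.
y = [1 + 2 * p + 3 * q + rng.gauss(0.0, 1.0) for p, q in zip(x2, x3)]

b1, b2, r2 = simple_ols(x2, y)   # x3 is left out, so A3 fails
print(f"R^2 = {r2:.3f}, estimated slope = {b2:.3f}, true slope = 2")
```

Here the fit is excellent while the slope estimate sits near 5 rather than 2, because the omitted variable loads onto x2.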

d. You are interested in exploring whether men, on average, have a higher hourly wage than women in the Canadian labour market. Your co-author suggests the following estimator

$$\frac{\sum_{i \in male} wage_i}{n_m} - \frac{\sum_{i \in female} wage_i}{n_f}$$

where wage_i is the hourly wage of individual i, and n_m and n_f are the number of males and females in the sample, respectively.

Would this estimator, when applied to data, generate a guess that is close to the true parameter of interest? Justify your answer.

Answer: This is a method of moments estimator (i.e. the sample analogs of the population moments). It is made up of sample means, and we know that sample means converge in probability to population means if we have a random sample. Now, if the estimator is consistent (i.e. we have a random sample) and we have a large sample, one can conclude that the estimate is probably close to the true population parameter. We cannot, however, say so with certainty.
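As a minimal sketch, the estimator described in this question (read here as the difference between the two sample means) can be computed as follows. The function name `mean_wage_gap` and the five wage observations are made up for illustration.

```python
def mean_wage_gap(wages, is_male):
    """Sample mean wage of men minus sample mean wage of women."""
    male = [w for w, m in zip(wages, is_male) if m]
    female = [w for w, m in zip(wages, is_male) if not m]
    if not male or not female:
        raise ValueError("need at least one observation in each group")
    return sum(male) / len(male) - sum(female) / len(female)


# Hypothetical hourly wages, not taken from the exam data.
wages = [30.0, 25.0, 20.0, 28.0, 22.0]
is_male = [True, False, False, True, False]
gap = mean_wage_gap(wages, is_male)
print(f"estimated gap: {gap:.2f} dollars per hour")
```

With a random sample, each group mean converges in probability to its population mean, so the gap converges to the population difference as both group sizes grow.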

Question 2

Assume the true population model takes the form

$$y_i = \beta_1 + \beta_2 x_{i2} + \epsilon_i$$

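Answer:

$$
\begin{aligned}
b_2 = \frac{\sum_{i=1}^{n} y_i x_{i2}}{\sum_{i=1}^{n} x_{i2}^2}
&= \frac{\sum_{i=1}^{n} (\beta_2 x_{i2} + \epsilon_i)\, x_{i2}}{\sum_{i=1}^{n} x_{i2}^2} \\
&= \frac{\sum_{i=1}^{n} [\beta_2 x_{i2}^2 + \epsilon_i x_{i2}]}{\sum_{i=1}^{n} x_{i2}^2} \\
&= \beta_2 \frac{\sum_{i=1}^{n} x_{i2}^2}{\sum_{i=1}^{n} x_{i2}^2} + \frac{\sum_{i=1}^{n} \epsilon_i x_{i2}}{\sum_{i=1}^{n} x_{i2}^2} \\
&= \beta_2 + \frac{\sum_{i=1}^{n} \epsilon_i x_{i2}}{\sum_{i=1}^{n} x_{i2}^2}
\end{aligned}
$$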
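The decomposition of the estimator into the true slope plus a sampling-error term can be checked numerically. This sketch assumes, as the substitution in the answer does, that the data are generated without an intercept; the sample size and parameter value are hypothetical.

```python
import random

rng = random.Random(3)
n, beta2 = 200, 2.0
x = [rng.gauss(1.0, 1.0) for _ in range(n)]
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Assumption: y = beta2 * x + eps, i.e. no intercept in the generated data.
y = [beta2 * xi + ei for xi, ei in zip(x, eps)]

sum_xx = sum(xi * xi for xi in x)
b2 = sum(yi * xi for yi, xi in zip(y, x)) / sum_xx          # the estimator
sampling_error = sum(ei * xi for ei, xi in zip(eps, x)) / sum_xx

# The estimate equals the true slope plus the error-weighted term.
print(f"b2 = {b2:.4f}, beta2 + sampling error = {beta2 + sampling_error:.4f}")
```

The two printed numbers agree up to floating-point rounding, which is the algebraic identity derived in the answer. Unbiasedness then depends only on the expected value of the second term.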

c. Will this slope parameter estimator be unbiased? Show your work.

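Answer:

$$
\begin{aligned}
E[b_2 \mid X] &= E\left[\beta_2 + \frac{\sum_{i=1}^{n} \epsilon_i x_{i2}}{\sum_{i=1}^{n} x_{i2}^2} \,\Big|\, X\right] \\
&= E[\beta_2 \mid X] + E\left[\frac{\sum_{i=1}^{n} \epsilon_i x_{i2}}{\sum_{i=1}^{n} x_{i2}^2} \,\Big|\, X\right] \\
&= \beta_2 + \frac{\sum_{i=1}^{n} x_{i2}\, E[\epsilon_i \mid X]}{\sum_{i=1}^{n} x_{i2}^2} \\
&= \beta_2 + \frac{\sum_{i=1}^{n} x_{i2} \cdot 0}{\sum_{i=1}^{n} x_{i2}^2} \\
&= \beta_2
\end{aligned}
$$

It will be unbiased if A3 holds.

Question 3

a. Show the following

(a)

PM = 0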

Answer:

$$
\begin{aligned}
PM &= X(X'X)^{-1}X'[I - X(X'X)^{-1}X'] \\
&= [X(X'X)^{-1}X'I] - [X(X'X)^{-1}X'X(X'X)^{-1}X'] \\
&= [X(X'X)^{-1}X'I] - [X(X'X)^{-1}IX'] \\
&= [X(X'X)^{-1}X'] - [X(X'X)^{-1}X'] \\
&= 0
\end{aligned}
$$

(b)

M is idempotent

Answer:

$$
\begin{aligned}
MM &= [I - X(X'X)^{-1}X'][I - X(X'X)^{-1}X'] \\
&= [II] - [IX(X'X)^{-1}X'] - [X(X'X)^{-1}X'I] + [X(X'X)^{-1}X'X(X'X)^{-1}X'] \\
&= I - X(X'X)^{-1}X' - X(X'X)^{-1}X' + X(X'X)^{-1}IX' \\
&= I - X(X'X)^{-1}X' - X(X'X)^{-1}X' + X(X'X)^{-1}X' \\
&= I - X(X'X)^{-1}X' \\
&= M
\end{aligned}
$$
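Both results, PM = 0 and MM = M, can be sanity-checked on a small example. The sketch below uses only the standard library. The 4 × 2 matrix `X` and the helper functions `transpose`, `matmul`, and `inverse_2x2` are hypothetical and chosen so that X'X is 2 × 2 and easy to invert by hand.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical design matrix: a column of ones and one regressor.
X = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 7.0]]
Xt = transpose(X)
n = len(X)

P = matmul(matmul(X, inverse_2x2(matmul(Xt, X))), Xt)   # P = X (X'X)^-1 X'
M = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)]
     for i in range(n)]                                  # M = I - P

PM = matmul(P, M)
MM = matmul(M, M)
max_abs_PM = max(abs(v) for row in PM for v in row)
max_diff_MM = max(abs(MM[i][j] - M[i][j]) for i in range(n) for j in range(n))
print(f"largest entry of PM: {max_abs_PM:.2e}")
print(f"largest gap between MM and M: {max_diff_MM:.2e}")
```

Both printed values are zero up to floating-point rounding, in line with the algebra above.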

(c)

$$[(M_1X_2)'(M_1X_2)]^{-1}(M_1X_2)'(M_1y) = [X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2]^{-1}[X_2'y - X_2'X_1(X_1'X_1)^{-1}X_1'y]$$

where the underlying population model is

$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon$$

Answer:

$$
\begin{aligned}
[(M_1X_2)'(M_1X_2)]^{-1}(M_1X_2)'(M_1y) &= [X_2'M_1X_2]^{-1}[X_2'M_1y] \\
&= [X_2'(I - X_1(X_1'X_1)^{-1}X_1')X_2]^{-1}[X_2'(I - X_1(X_1'X_1)^{-1}X_1')y] \\
&= [(X_2' - X_2'X_1(X_1'X_1)^{-1}X_1')X_2]^{-1}[(X_2' - X_2'X_1(X_1'X_1)^{-1}X_1')y] \\
&= [X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2]^{-1}[X_2'y - X_2'X_1(X_1'X_1)^{-1}X_1'y]
\end{aligned}
$$

(c) Another co-author suggests the following estimator

$$b = (X'P_ZX)^{-1}X'P_Zy$$

where $P_Z = Z(Z'Z)^{-1}Z'$, with Z being the X matrix where the information for the K-th variable, x_K, is replaced with information on z_1 and z_2. Therefore Z is n × (K + 1). $P_Z$ is both symmetric and idempotent. Show that, under certain identifying assumptions, this estimator will be consistent. (5 marks)

Answer:

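$$
\begin{aligned}
b &= (X'P_ZX)^{-1}X'P_Zy \\
&= (X'P_ZX)^{-1}X'P_Z(X\beta + \varepsilon) \\
&= \beta + (X'P_ZX)^{-1}X'P_Z\varepsilon \quad \text{(prop. of matrices)} \\
&= \beta + (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'\varepsilon \\
&= \beta + \left[\frac{X'Z}{n}\left(\frac{Z'Z}{n}\right)^{-1}\frac{Z'X}{n}\right]^{-1}\frac{X'Z}{n}\left(\frac{Z'Z}{n}\right)^{-1}\frac{Z'\varepsilon}{n}
\end{aligned}
$$

$$
\begin{aligned}
\operatorname{plim}(b) &= \operatorname{plim}\beta + \operatorname{plim}\left\{\left[\frac{X'Z}{n}\left(\frac{Z'Z}{n}\right)^{-1}\frac{Z'X}{n}\right]^{-1}\frac{X'Z}{n}\left(\frac{Z'Z}{n}\right)^{-1}\frac{Z'\varepsilon}{n}\right\} \\
&= \beta + \left[\operatorname{plim}\frac{X'Z}{n}\left(\operatorname{plim}\frac{Z'Z}{n}\right)^{-1}\operatorname{plim}\frac{Z'X}{n}\right]^{-1}\operatorname{plim}\frac{X'Z}{n}\left(\operatorname{plim}\frac{Z'Z}{n}\right)^{-1}\operatorname{plim}\frac{Z'\varepsilon}{n} \\
&= \beta + (Q_{XZ}Q_{ZZ}^{-1}Q_{ZX})^{-1}Q_{XZ}Q_{ZZ}^{-1} \cdot 0 \\
&= \beta
\end{aligned}
$$

Here $Q_{XZ}$, $Q_{ZZ}$ and $Q_{ZX}$ denote the probability limits of $X'Z/n$, $Z'Z/n$ and $Z'X/n$. The identifying assumptions are that $\operatorname{plim}(Z'\varepsilon/n) = 0$ (the instruments are uncorrelated with the error term), that $Q_{ZZ}$ is invertible, and that $Q_{ZX}$ has full column rank.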
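The consistency argument can be illustrated with a simulation. To stay within the standard library, this sketch makes a strong simplification that is not in the exam: a single endogenous regressor `x`, no intercept, and two instruments `z1` and `z2`, so that X'P_Z X and X'P_Z y reduce to sums involving the first-stage fitted values. All parameter values are assumptions.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


rng = random.Random(4)
n, beta = 20000, 1.5
z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
z2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
v = [rng.gauss(0.0, 1.0) for _ in range(n)]
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [p + q + vi for p, q, vi in zip(z1, z2, v)]   # instruments are relevant
eps = [0.8 * vi + ui for vi, ui in zip(v, u)]     # x is endogenous through v
y = [beta * xi + ei for xi, ei in zip(x, eps)]

# First stage: P_Z x, i.e. fitted values from regressing x on z1 and z2.
a11, a12, a22 = dot(z1, z1), dot(z1, z2), dot(z2, z2)
r1, r2 = dot(z1, x), dot(z2, x)
det = a11 * a22 - a12 * a12
pi1 = (a22 * r1 - a12 * r2) / det
pi2 = (a11 * r2 - a12 * r1) / det
x_hat = [pi1 * p + pi2 * q for p, q in zip(z1, z2)]

# With one regressor, (X'P_Z X)^-1 X'P_Z y = (x_hat'y) / (x_hat'x).
b_2sls = dot(x_hat, y) / dot(x_hat, x)
b_ols = dot(x, y) / dot(x, x)                     # inconsistent benchmark
print(f"true beta = {beta}, 2SLS = {b_2sls:.3f}, OLS = {b_ols:.3f}")
```

In this design the instrument-based estimate lands close to the assumed true value, while the OLS estimate stays noticeably above it because x is correlated with the error term.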

Question 5

The STATA output needed to answer this question can be found in the following pages.

Assume the following population model

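$$
\begin{aligned}
wage_i = {} & \beta_1 + \beta_2\, union_i + \beta_3\, indigenous_i + \beta_4\, west_i + \beta_5\, east_i \\
& + \beta_6\, indigenous_i \cdot west_i + \beta_7\, indigenous_i \cdot east_i + \varepsilon_i
\end{aligned}
$$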

where wage_i is the hourly wage (in dollars) of individual i. union is a binary variable equal to one if the individual is unionized or covered by a union (and zero otherwise), and indigenous is a binary variable that equals one if the person identifies themselves as being indigenous (and zero otherwise). Finally, west is a binary variable that equals one if the person lives in Western Canada (and zero otherwise) and east is a binary variable that equals one if the person lives in Eastern Canada (and zero otherwise).¹

a. Interpret the coefficient estimate of β2.

Answer: Holding all other factors constant, a worker who is unionized (or covered by a union) makes, on average, $5.04 more per hour than a worker who is not unionized (nor covered by a union).

b. Test whether being unionized or covered by a union has an impact on the wage. Show your

work. (5 marks)

Answer:

Step 1:

$$H_0: \beta_2 = 0 \qquad H_a: \beta_2 \neq 0$$

Step 2: Under A1, A2a, A3, A4, A5, and A6

$$t_{stat} = \frac{b_2^{ols} - \beta_2}{se(b_2^{ols})} \sim t_{n-K} \quad (t_{53779})$$

Step 3:

$$t_{stat} = 35.652$$

Furthermore, the critical values are 1.962 and −1.962.

Step 4 (conclusion): Since

$$t_{stat} = 35.652 > 1.962$$

one rejects H0 in favour of Ha at the 5% level of significance. Said less formally, being unionized (or covered by a union) impacts the wage.
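The decision rule in Step 4 can be written out as a small check. The t statistic and critical value are the ones quoted in the answer. The function name `two_sided_t_test` is hypothetical, and the normal-distribution cross-check rests on the assumption that a t distribution with 53779 degrees of freedom is very close to the standard normal.

```python
from statistics import NormalDist


def two_sided_t_test(t_stat, critical_value):
    """Return True when H0 is rejected in a two-sided test."""
    return abs(t_stat) > critical_value


t_stat = 35.652          # value reported in Step 4
critical_value = 1.962   # 5% two-sided critical value used in the answer
reject_h0 = two_sided_t_test(t_stat, critical_value)

# Large-sample cross-check: the 97.5th percentile of the standard normal.
normal_critical = NormalDist().inv_cdf(0.975)
print(f"reject H0: {reject_h0}, normal critical value: {normal_critical:.3f}")
```

The conclusion does not hinge on the exact critical value, since the test statistic is far out in the tail either way.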

c. Do you believe that the test carried out above is valid? Justify your answer. (5 marks)

Answer: Gender probably belongs in the wage equation and women tend to be more unionized than men. As such, assumption A3 probably does not hold, which means that the test is not valid.

¹ Central Canada is the reference group.

Equations

Properties of sums

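$$\sum_{i=1}^{n} (x_{i2} + y_i) = \sum_{i=1}^{n} x_{i2} + \sum_{i=1}^{n} y_i$$

$$\sum_{i=1}^{n} a y_i = a \sum_{i=1}^{n} y_i$$

$$\sum_{i=1}^{n} a = n \cdot a$$

Other equations

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$$

$$\sum_{i=1}^{n} (x_{i2} - \bar{x}_2)(y_i - \bar{y}) = \sum_{i=1}^{n} (x_{i2} - \bar{x}_2)\, y_i$$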

Expectation properties (with a being a constant)

$$E(w_i + z_i) = E(w_i) + E(z_i)$$

$$E(a w_i) = a E(w_i)$$

$$E(a) = a$$

$$E(w_i) = E_z\{E[w_i \mid z_i]\}$$

Variance and covariance

$$Var(w_i) = E[(w_i - E(w_i))^2] = E(w_i^2) - E(w_i)E(w_i)$$

$$Cov(w_i, z_i) = E[(w_i - E(w_i))(z_i - E(z_i))] = E(w_i z_i) - E(w_i)E(z_i)$$

Sample variance and sample covariance

$$s_w^2 = \frac{\sum_{i=1}^{n} (w_i - \bar{w})^2}{n - 1} \qquad s_{w,z} = \frac{\sum_{i=1}^{n} (w_i - \bar{w})(z_i - \bar{z})}{n - 1}$$

Simple linear regression model

$$y_i = \beta_1 + \beta_2 x_{i2} + \varepsilon_i$$

$$b_2^{ols} = \frac{\sum_{i=1}^{n} (x_{i2} - \bar{x}_2)(y_i - \bar{y})}{\sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2} \qquad b_1^{ols} = \bar{y} - b_2^{ols}\bar{x}_2$$

$$var(b_2^{ols} \mid X) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2}$$

$$est.\ var(b_2^{ols} \mid X) = \frac{s^2}{\sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2}$$

where

$$s^2 = \frac{1}{n - 2} \sum_{i=1}^{n} e_i^2$$

$$se(b_2^{ols}) = \sqrt{\frac{s^2}{\sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2}}$$

$$y_i = \hat{y}_i + e_i \quad \text{where} \quad \hat{y}_i = b_1 + b_2 x_{i2}$$

Simple linear regression model assumptions

A1: Linearity (in the parameters) of the regression model

A2a: Variation in x2 (in the sample)

A3 (regarding the error term ε): Exogeneity of X, $E(\varepsilon_i \mid X) = 0$

A4: Homoskedasticity and absence of autocorrelation, $var(\varepsilon_i) = \sigma^2$ and $cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$

A5: Data generation (stochastic process for the explanatory variable(s))

R-squared

$$R^2 = \frac{SSR}{SST}$$

where SST (total sum of squares) is

$$\sum_{i=1}^{n} (y_i - \bar{y})^2$$

SSR (regression sum of squares) is

$$\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

and SSE (sum of squared errors) is

$$\sum_{i=1}^{n} e_i^2$$
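The three sums of squares defined above can be computed directly from a fitted simple regression. The five data points and the function name `sums_of_squares` are hypothetical. The sketch also checks the decomposition SST = SSR + SSE, which holds for OLS with an intercept.

```python
import statistics


def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for a simple OLS regression of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b1 = y_bar - b2 * x_bar
    fitted = [b1 + b2 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return sst, ssr, sse


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sst, ssr, sse = sums_of_squares(x, y)
r2 = ssr / sst
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}, R^2 = {r2:.4f}")
```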

Properties of Matrices

a. Given two matrices ( A and B ) of equal dimensions

A + B = B + A

b. Given three matrices ( A , B and C ) of equal dimensions

( A + B ) + C = A + ( B + C )

c. Given two matrices (A and B) of equal dimensions

$(A + B)' = A' + B'$

d.

$(A')' = A$

e. In general,

$AB \neq BA$

f.

$(AB)C = A(BC)$

g.

$A(B + C) = AB + AC$ and $(A + B)C = AC + BC$

h.

$(AB)' = B'A'$

i.

$AI = IA = A$

where I is the identity matrix

j. If A is square and of full rank, $A^{-1}$ exists, and

$A^{-1}A = AA^{-1} = I$

k.

$(A^{-1})^{-1} = A$

l.

$(A^{-1})' = (A')^{-1}$

m.

$(AB)^{-1} = B^{-1}A^{-1}$

Multiple linear regression model assumptions

A1: Linearity (in the parameters) of the regression model

A2: X is full rank (i.e. rank(X) = K).

A3 (regarding the error term ε): Exogeneity of X, $E(\varepsilon \mid X) = 0$

A4: Homoskedasticity and absence of autocorrelation, $E(\varepsilon\varepsilon' \mid X) = \sigma^2 I_n$

A5: Data generation (stochastic process for the explanatory variable(s))

A6: Normality of the error terms: the disturbance terms are normally distributed, conditional on X

$$F_{stat} = \frac{(R^2 - R_*^2)/J}{(1 - R^2)/(n - K)}$$
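As a worked illustration of the F statistic formula, the sketch below evaluates it for made-up inputs. It assumes the usual reading of the symbols, which the formula sheet does not spell out: R^2 from the unrestricted model, R_*^2 from the restricted model, J restrictions, n observations, and K parameters in the unrestricted model. The function name `f_stat` is hypothetical.

```python
def f_stat(r2_unrestricted, r2_restricted, j, n, k):
    """F statistic built from restricted and unrestricted R^2 values."""
    numerator = (r2_unrestricted - r2_restricted) / j
    denominator = (1.0 - r2_unrestricted) / (n - k)
    return numerator / denominator


# Hypothetical inputs: (0.5 - 0.4) / 2 divided by (1 - 0.5) / (104 - 4).
value = f_stat(0.5, 0.4, j=2, n=104, k=4)
print(f"F statistic: {value:.3f}")
```

The statistic grows when the restrictions cost a lot of fit relative to the unexplained variation per degree of freedom.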