AREc525: Econometric Methods: Reviewing Assumptions
(notes taken and arranged by Ryan Siegel)

Assumption 1: “Correctly specified, linear model”

- Dummy variables

  o Case 1: Binary, model specified as: y_i = μ + δ·D_i + ε_i (D_i = 1 if female)
    * The estimates recover the group means: μ̂ = ȳ_male and δ̂ = ȳ_female − ȳ_male
    * Alternately: y_i = μ_f·D_i + μ_m·H_i + ε_i, where H_i = 1 − D_i, so μ̂_f and μ̂_m are the two group means
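    As a quick check of the group-means interpretation, here is a minimal numpy sketch (simulated data; the names and true values are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
D = rng.integers(0, 2, n)                 # D_i = 1 if female (illustrative)
y = 10 + 3 * D + rng.normal(0, 1, n)      # true mu = 10, delta = 3

X = np.column_stack([np.ones(n), D])
mu_hat, delta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# The dummy-variable coefficients reproduce the group means:
print(mu_hat, y[D == 0].mean())                          # mu_hat    = ybar_male
print(delta_hat, y[D == 1].mean() - y[D == 0].mean())    # delta_hat = ybar_female - ybar_male
```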

  o Case 2: Several categories (e.g., seasonal dummies, with winter as the base):

      y_i = μ + β·x_i + δ1·D_spring,i + δ2·D_summer,i + δ3·D_fall,i + ε_i

    where (writing ĉ_s for the fitted intercept of season s):
    μ̂ = ĉ_winter, δ̂1 = ĉ_spring − ĉ_winter, δ̂2 = ĉ_summer − ĉ_winter, δ̂3 = ĉ_fall − ĉ_winter
    * Alternately, drop the overall constant and include all four dummies:

      y_i = β·x_i + δ1·D_spring,i + δ2·D_summer,i + δ3·D_fall,i + δ4·D_winter,i + ε_i

      where δ̂1 = ĉ1, δ̂2 = ĉ2, δ̂3 = ĉ3, δ̂4 = ĉ4

  o Case 3: (many categories, many values; just examined)
  o Case 4: Threshold effects
    * If one wanted to measure the effect of increasing levels of, say, education, one could not set up a variable where E_i = 1 if high school, 2 if bachelor's, 3 if master's, 4 if PhD, etc., since this assumes that each 'jump' (1 to 2, 2 to 3) has equal 'value.'

    * Instead, use dummy variables:

      income = α + β·(age) + Σ_{j=1}^{3} δ_j·E_j + ε

      (notice that the dummy-variable trap is avoided by summing through 3, not 4, which is the total number of categories)
  o Case 5: Interaction terms

    * Suppose we have the model: income = α + β·(age) + γ·(gender) + ε
      • E[income | D_i = 1] = α + β·(age) + γ = (α + γ) + β·(age)
      • E[income | D_i = 0] = α + β·(age)
    * Allow for interaction: income = α + β·(age) + γ·(D_i) + δ·(D_i·age) + ε
      • E[income | D_i = 1] = α + β·(age) + γ + δ·(age) = (α + γ) + (β + δ)·(age)
      • E[income | D_i = 0] = α + β·(age)
    * Therefore, interaction terms help change slopes (the dummy alone only shifts the intercept)
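    A small simulated sketch of the interaction model (all numbers illustrative): the fitted slope for the D = 1 group should be β̂ + δ̂, matching the expectations above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
age = rng.uniform(20, 60, n)
D = rng.integers(0, 2, n)                                  # gender dummy
income = 5 + 0.8 * age + 4 * D + 0.5 * D * age + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), age, D, D * age])         # [1, age, D, D*age]
a, b, g, d = np.linalg.lstsq(X, income, rcond=None)[0]

print("slope when D = 0:", b)          # ~0.8
print("slope when D = 1:", b + d)      # ~1.3: the interaction changes the slope
print("intercepts:", a, a + g)         # the dummy alone only shifts the intercept
```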

- Structural change
  o We generally assume that β is the same for all y. However, the data may contain a break: a stable model up to some point in time and an unstable one after it (see the figure note before Assumption 2).
  o H0: Rβ = q (will tell us whether the same β applies to all y)

    * Unrestricted model: each sub-sample gets its own coefficient vector,

      [ y1 ]   [ X1   0  ] [ β1 ]   [ ε1 ]
      [ y2 ] = [ 0    X2 ] [ β2 ] + [ ε2 ]

      • Yields b1 = (X1′X1)⁻¹X1′y1 with residuals e1, and b2 = (X2′X2)⁻¹X2′y2 with residuals e2, and the total residual sum of squares: e′e = e1′e1 + e2′e2

    * Restricted model
      • If the restriction is β1 = β2, we can impose k restrictions, since there are 2k parameters (this does not violate the assumption that J < K):
        o R = [ I_k   −I_k ], a k × 2k matrix
        o q = 0, a k × 1 vector, so Rβ = β1 − β2 = 0

      • In the restricted scenario we only have one set of β:

        [ y1 ]   [ X1 ]       [ ε1 ]
        [ y2 ] = [ X2 ] β  +  [ ε2 ]

  • If  F-Test:

( )

[ e e ( n n k )]

ee ee k F (^) k n n k 1 2 2

** ( , 1 22 ) ′ + −
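      A compact numpy sketch of this Chow-type test, assuming simulated data with a deliberate break in the slope (sample sizes, coefficients, and names are illustrative):

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from OLS of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

rng = np.random.default_rng(2)
n1, n2, k = 60, 60, 2
X1 = np.column_stack([np.ones(n1), rng.normal(size=n1)])
X2 = np.column_stack([np.ones(n2), rng.normal(size=n2)])
y1 = X1 @ np.array([1.0, 2.0]) + rng.normal(size=n1)       # regime 1: beta1
y2 = X2 @ np.array([1.0, 3.0]) + rng.normal(size=n2)       # regime 2: slope differs

ee_u = rss(y1, X1) + rss(y2, X2)                           # unrestricted: e1'e1 + e2'e2
ee_r = rss(np.concatenate([y1, y2]), np.vstack([X1, X2]))  # restricted: one beta

F = ((ee_r - ee_u) / k) / (ee_u / (n1 + n2 - 2 * k))       # ~ F(k, n1+n2-2k) under H0
print("F =", F)                                            # large F => reject beta1 = beta2
```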

    * A related check of stability uses the recursive residuals w_r:
      o t = w̄ / √(σ̂²_w/(T − k)) ~ N(0, 1), where
      o w̄ = Σ_{r=k+1}^{T} w_r/(T − k) and σ̂²_w = Σ_{r=k+1}^{T} (w_r − w̄)²/(T − k − 1)

- Specification analysis
  o Omitting relevant variables:
    * Biased parameters
    * Variance is smaller than in the true model, therefore you get higher t-ratios
    * s² is a biased estimator of σ²
  o Including an irrelevant variable:
    * Unbiased parameters
    * Variance is greater than the true variance
    * s² is an unbiased estimator of σ²
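  A short simulation of the omitted-variable result above: when the omitted x2 is correlated with x1, the short regression's slope on x1 is biased, roughly by β2 times the slope of x2 on x1 (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)              # relevant x2, correlated with x1
y = 1 + 2 * x1 + 1.5 * x2 + rng.normal(size=n)

def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]

full = ols(y, np.column_stack([np.ones(n), x1, x2]))
short = ols(y, np.column_stack([np.ones(n), x1]))   # omits the relevant x2

print("full slope on x1: ", full[1])    # ~2.0 (unbiased)
print("short slope on x1:", short[1])   # ~2.9: biased by about 1.5 * 0.6 = 0.9
```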

- Model building
  o R² or adjusted R² (slowly add variables to increase it)
  o Akaike Information Criterion: AIC(k) = s_y²·(1 − R²)·e^(2k/n) (choose the lower AIC)
  o Bayesian Information Criterion: BIC(k) = s_y²·(1 − R²)·n^(k/n) (again, lower is better)
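  A sketch computing both criteria exactly in the form the notes use, on simulated data with one irrelevant regressor (data and names are illustrative):

```python
import numpy as np

def ic(y, X):
    """AIC and BIC in the form used in these notes:
       AIC(k) = s_y^2 (1 - R^2) e^(2k/n),  BIC(k) = s_y^2 (1 - R^2) n^(k/n)."""
    n, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2y = y.var()                           # s_y^2 (1/n normalization)
    R2 = 1 - (e @ e) / (n * s2y)
    return s2y * (1 - R2) * np.exp(2 * k / n), s2y * (1 - R2) * n ** (k / n)

rng = np.random.default_rng(4)
n = 200
x1, x2 = rng.normal(size=(2, n))
y = 1 + 2 * x1 + rng.normal(size=n)         # x2 is irrelevant

aic1, bic1 = ic(y, np.column_stack([np.ones(n), x1]))
aic2, bic2 = ic(y, np.column_stack([np.ones(n), x1, x2]))
print(aic1, aic2)   # adding the irrelevant x2 should not lower the AIC much
print(bic1, bic2)   # choose the model with the lower criterion
```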

- Non-nested or competing models
  o Macroeconomics makes use of this
  o H0: y = Xβ + ε_x vs. H1: y = Zγ + ε_z
  o Encompassing test
    * Look for the variables in common: W = X ∩ Z; X̃ and Z̃ are those remaining in each model
    * y = X̃β + Z̃γ + Wδ + ε; test whether β = 0 or γ = 0
  o J-test: y = (1 − α)·Xβ + α·Zγ + ε
    * Regress y on Z, get γ̂
    * Construct Zγ̂
    * Run y = (1 − α)·Xβ + α·(Zγ̂) + ε and test α̂/SE(α̂) ~ N(0, 1)
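  A minimal sketch of the J-test recipe above on simulated data where the X model is true, so the statistic should be small (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)          # H0 (the X model) is true

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

gamma_hat = np.linalg.lstsq(Z, y, rcond=None)[0]   # step 1: regress y on Z
zg = Z @ gamma_hat                                  # step 2: construct Z*gamma_hat

W = np.column_stack([X, zg])                        # step 3: y on [X, Z*gamma_hat]
b = np.linalg.lstsq(W, y, rcond=None)[0]
e = y - W @ b
s2 = e @ e / (n - W.shape[1])
cov = s2 * np.linalg.inv(W.T @ W)
t_alpha = b[-1] / np.sqrt(cov[-1, -1])              # alpha_hat / SE(alpha_hat) ~ N(0,1)
print("J-test statistic:", t_alpha)                 # small when H0 holds
```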

  o Cox test: this was not discussed in class (there is a complex discussion in the book).

[Figure (belongs to the structural-change discussion): w_t plotted against time, with breaks separating the stable model from the unstable model.]

Assumption 2: The matrix X has rank K

- Multicollinearity
  o Two cases:
    * Perfect multicollinearity; solution: drop a variable (if possible)
    * Near or high multicollinearity
  o Detect with (a numerical sketch appears at the end of this assumption's outline):
    * Variance Inflation Factor: VIF_k = 1/(1 − R_k²), where R_k² comes from regressing x_k on the other regressors (15-20 is a large number)
    * Characteristic roots: LCR/SCR (if > 20, then problems), where LCR and SCR are the largest and smallest characteristic roots of X′X
  o Fix:
    * Remove a variable (however, not always possible due to theory, etc.)
- Missing observations
  o Ignorable: data are unavailable for unknown reasons
    * Case A: Y_A, X_A: n_A observations with both Y and X available
    * Case B: —, X_B: n_B observations missing Y
    * Case C: Y_C, —: n_C observations missing X

    * Zero-order regression: use x̄ to replace the missing X_C
      o b_LS will not change (recall that for simple regression

        b = Σ_{i=1}^{n_A} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n_A} (x_i − x̄)²

        so when we add an observation at x̄ it adds nothing, since (x̄ − x̄) = 0)
      o R² will be lower (recall that R² = 1 − e′e/(Y′M⁰Y); the additional y_i adds variation to the total sum of squares)
    * Modified zero-order regression: fill the missing spaces with zeros and add a dummy variable that takes the value 1 for missing observations (algebraically identical to filling the gaps with x̄)
      o b_LS will change
      o R² will change
  o Systematic: the data are missing for a reason (a sample-selection-bias issue)
- Outliers or influential observations
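Returning to the multicollinearity diagnostics above, a numerical sketch of the VIF and of the ratio of largest to smallest characteristic root of X′X, on deliberately near-collinear simulated data (thresholds in the comments follow the notes' rules of thumb):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)     # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, k):
    """VIF_k = 1 / (1 - R_k^2), from regressing column k on the others."""
    others = np.column_stack([np.ones(len(X)), np.delete(X, k, axis=1)])
    b = np.linalg.lstsq(others, X[:, k], rcond=None)[0]
    e = X[:, k] - others @ b
    R2 = 1 - (e @ e) / ((X[:, k] - X[:, k].mean()) ** 2).sum()
    return 1 / (1 - R2)

print([vif(X, k) for k in range(X.shape[1])])   # 15-20+ signals trouble

roots = np.linalg.eigvalsh(X.T @ X)             # characteristic roots of X'X
print("LCR/SCR:", roots.max() / roots.min())    # > 20 suggests problems (per the notes)
```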

- Let X_n and Y_n be matrices whose elements are random variables, with plim X_n = A and plim Y_n = B:
  o plim X_n⁻¹ = A⁻¹ (A nonsingular)
  o plim (X_n·Y_n) = A·B (if conformable)
- Convergence in distribution
  o x_n converges in distribution to x with CDF F(x) if lim_{n→∞} |F(x_n) − F(x)| = 0 at all continuity points of F(·); notation: x_n →d x
  o Rules (let x_n →d x and plim y_n = c):
    * x_n·y_n →d c·x
    * x_n + y_n →d x + c
    * x_n / y_n →d x/c if c ≠ 0
    * Let g(·) be a continuous function: g(x_n) →d g(x)
    * If plim(x_n − y_n) = 0, then x_n has the same limiting distribution as y_n

- Least squares:
  o b_LS is a consistent estimator of β: plim b = β
    * Use the assumption that plim (X′X/n)⁻¹ = Q⁻¹
    * b = β + (X′X/n)⁻¹·(X′ε/n), therefore plim b = β + Q⁻¹·plim(X′ε/n)
    * plim(X′ε/n) = 0, using convergence in mean square:
      • E[X′ε/n] = 0, and lim_{n→∞} 0 = 0
      • Var[X′ε/n] = E[X′εε′X]/n² = (σ²/n)·(X′X/n), and lim_{n→∞} (σ²/n)·Q = 0
  o Therefore plim b = β

  o b has a limiting distribution that is normal
  o Stabilizing transformation:
    * √n·(b − β) converges in distribution to a normal distribution
    * Lindeberg-Levy univariate central limit theorem:
      • Let x_1, …, x_n be a random sample from a probability distribution with finite mean μ and variance σ², and let x̄ = (1/n)·Σ_{i=1}^{n} x_i; then √n·(x̄ − μ) →d N(0, σ²) (this is the distribution of the statistic)

      • √n·(b − β) = (X′X/n)⁻¹·(X′ε/√n), and plim (X′X/n)⁻¹ = Q⁻¹; here we need to show that the latter term, X′ε/√n, is distributed normally, since Q is a constant:
        o E[X′ε/√n] = 0, by assumptions 3 and 5
        o Var[X′ε/√n] = E[X′εε′X]/n = σ²·(X′X/n), therefore plim σ²·(X′X/n) = σ²·Q
        o Therefore X′ε/√n →d N(0, σ²·Q)
        o And, √n·(b − β) →d Q⁻¹·N(0, σ²·Q), i.e. √n·(b − β) →d N(0, σ²·Q⁻¹·Q·Q⁻¹) = N(0, σ²·Q⁻¹), so that approximately b ~ N(β, (σ²/n)·(X′X/n)⁻¹) = N(β, σ²·(X′X)⁻¹)
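        A hedged Monte Carlo sketch of the two results just derived, plim b = β and √n·(b − β) →d N(0, σ²·Q⁻¹): with Var(x) = 1 and σ = 1 we have Q = 1, so the standardized draws should have mean near 0 and standard deviation near 1 as n grows.

```python
import numpy as np

rng = np.random.default_rng(7)
beta, sigma = 2.0, 1.0

for n in (50, 500, 5000):
    draws = []
    for _ in range(1000):
        x = rng.normal(size=n)                 # Var(x) = 1, so Q = 1
        y = beta * x + rng.normal(0, sigma, n)
        b = (x @ y) / (x @ x)                  # slope-only least squares
        draws.append(np.sqrt(n) * (b - beta))
    draws = np.array(draws)
    print(n, draws.mean(), draws.std())        # mean -> 0, std -> sigma = 1
```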

  o Implications for t and F-tests
    * t-test:
      • plim s² = σ²:

        s² = e′e/(n − k) = ε′Mε/(n − k) = [n/(n − k)]·[ε′ε/n − (ε′X/n)·(X′X/n)⁻¹·(X′ε/n)]

        and taking probability limits term by term gives plim s² = σ²

      • In the large-sample context the t-statistic becomes a z-statistic (the notes are sparser from here on)
    * F-test:
      • Recall, OLS: Var̂(β̂_OLS) = s²·(X′X)⁻¹ = [e′e/(n − k)]·(X′X)⁻¹
      • Recall, MLE: Var̂(β̂_MLE) = σ̂²·(X′X)⁻¹ = [e′e/n]·(X′X)⁻¹
      • So, in the small-sample context they are different, but in the large-sample context they are the same.

- Generalized regression model: Y = Xβ + ε, E(ε|X) = 0, Var(ε|X) = σ²·Ω
  o Heteroskedasticity: Var(ε|X) = σ²·Ω = diag(σ1², σ2², …, σn²), i.e. each observation carries its own variance ω_ii·σ² on the diagonal
    * Often found in cross-section data, also in high-frequency data
  o Autocorrelation (memory, persistence): Var(ε|X) = σ²·Ω with ones on the diagonal and correlations ρ1, ρ2, … filling the off-diagonal bands:

      σ²·Ω = σ² · [ 1        ρ1   ρ2   …   ρ_{n−1} ]
                  [ ρ1       1    ρ1   …           ]
                  [ ⋮              ⋱               ]
                  [ ρ_{n−1}  …         ρ1   1      ]

  o Least squares in the general context:
    * Var(b_LS | X) = σ²·(X′X)⁻¹·(X′ΩX)·(X′X)⁻¹
    * Not an efficient estimator
    * Assume plim (X′ΩX/n) = Q*; then plim b_LS = β, therefore consistent
    * Also asymptotically normal: b_LS →d N(β, (σ²/n)·Q⁻¹·Q*·Q⁻¹) (stated in class without derivation)
    * However, asymptotically inefficient: does not achieve the Cramer-Rao lower bound (CRLB)
  o Omega knowledge:
    * Ω is known

      • Use the spectral decomposition (Ω = CΛC′), transform the data, and run least squares on the transformed data (weighted least squares)
      • Use GLS for smaller samples
      • Use MLE for larger samples
    * Ω is unknown but its structure is known
      • E.g., if the variance is a function of firm size, then one can run the regression σ_i² = f(firm size) = α + β·s_i + ν_i, forecast, and obtain Ω̂, which can then be used. If this regression has spherical disturbances (no autocorrelation or heteroskedasticity), then σ̂_i² is a consistent and asymptotically efficient estimator.
    * Ω is completely unknown
      • Since Ω is n × n and symmetric, it has n(n + 1)/2 parameters, which is greater than n (i.e., it cannot be estimated)
      • However, X′ΩX has k(k + 1)/2 < n parameters, therefore X′ΩX can be approximated
        o Define Q* = (1/n)·Σ_{i=1}^{n} Σ_{j=1}^{n} σ_ij·x_i·x_j′, which approximates (1/n)·X′ΩX

  o White's (heteroskedasticity-)consistent covariance matrix
    * Defined: S0 = (1/n)·Σ_{i=1}^{n} e_i²·x_i·x_i′, a k × k matrix with plim S0 = Q*
    * Est. Var(b_LS) = (1/n)·(X′X/n)⁻¹·[(1/n)·Σ_{i=1}^{n} e_i²·x_i·x_i′]·(X′X/n)⁻¹, using White's heteroskedasticity-consistent covariance matrix
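    A numpy sketch contrasting the naive s²·(X′X)⁻¹ covariance with White's estimator on simulated heteroskedastic data (the form follows the formula above; the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
eps = rng.normal(0, x)                     # variance grows with x: heteroskedastic
y = X @ np.array([1.0, 2.0]) + eps

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
XtX_inv = np.linalg.inv(X.T @ X)

# Naive OLS covariance: s^2 (X'X)^{-1}
naive = (e @ e / (n - X.shape[1])) * XtX_inv

# White's estimator: (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}
S = (X * e[:, None] ** 2).T @ X            # sum_i e_i^2 x_i x_i'
white = XtX_inv @ S @ XtX_inv

print(np.sqrt(np.diag(naive)))             # understated standard errors
print(np.sqrt(np.diag(white)))             # heteroskedasticity-consistent ones
```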

 ( )( ) 

= + ∑ ∑ ′ + ′ = =+

− − −

L

l

T

tl

wl etetl xtxtl xtlxt n

Q s 1 1

0

where 1

L

l wl , where L is the lag length
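    A sketch of the Newey-West matrix as defined above, with Bartlett weights w_l = 1 − l/(L + 1); the AR(1) error process and lag length are illustrative choices:

```python
import numpy as np

def newey_west_cov(X, e, L):
    """Newey-West covariance for b_LS with Bartlett weights w_l = 1 - l/(L+1)."""
    n = len(e)
    S = (X * e[:, None] ** 2).T @ X / n                 # White's S0 term
    for l in range(1, L + 1):
        w = 1 - l / (L + 1)
        G = (X[l:] * (e[l:] * e[:-l])[:, None]).T @ X[:-l] / n
        S += w * (G + G.T)                              # x_t x_{t-l}' + x_{t-l} x_t'
    Q_inv = np.linalg.inv(X.T @ X / n)
    return Q_inv @ S @ Q_inv / n

# usage with AR(1) disturbances
rng = np.random.default_rng(9)
T = 400
x = rng.normal(size=T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.7 * e[t - 1] + rng.normal()                # persistent disturbances
y = 1 + 2 * x + e
X = np.column_stack([np.ones(T), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.sqrt(np.diag(newey_west_cov(X, y - X @ b, L=8))))
```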

  o Generalized least squares:
    * Transform the data in such a way that assumption 4, Var(ε|X) = σ²·I, is satisfied
    * Model: Y = Xβ + ε with Var(ε|X) = σ²·Ω; transform the data into Y* = X*β + ε* with
      • Var(ε*|X) = σ²·I
    * Spectral decomposition: Ω = CΛC′, so P = Λ^(−1/2)·C′ gives Y* = PY and X* = PX, and β̂_GLS = (X′Ω⁻¹X)⁻¹·X′Ω⁻¹Y
      • Consistent
      • Asymptotically normal: assuming plim (X*′X*/n) = Q*, β̂_GLS →d N(β, σ²·(X′Ω⁻¹X)⁻¹)
      • Asymptotically efficient
    * Since β̂_MLE achieves the Cramer-Rao lower bound, so does β̂_GLS
    * Good to use in the smaller-sample case; use MLE for larger samples
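    A sketch verifying that the spectral-decomposition transform and the direct GLS formula agree, assuming a known diagonal Ω (the heteroskedastic form is illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 200
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
Omega = np.diag(x ** 2)                        # known Omega: Var(eps_i) = sigma^2 x_i^2
y = X @ np.array([1.0, 2.0]) + rng.normal(0, x)

# P = Lambda^{-1/2} C' from the spectral decomposition Omega = C Lambda C'
lam, C = np.linalg.eigh(Omega)
P = np.diag(lam ** -0.5) @ C.T
ys, Xs = P @ y, P @ X                          # Var(P eps) = sigma^2 I

b_gls = np.linalg.lstsq(Xs, ys, rcond=None)[0]
# identical to the direct formula (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
Oi = np.linalg.inv(Omega)
print(b_gls, np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ y))
```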

  o Weighted least squares
    * Size-of-industry case: divide every observation by the size s_i, i.e. with P = diag(1/s_1, …, 1/s_n),

      X* = PX has typical row (x_{i1}/s_i, …, x_{ik}/s_i), and y_i* = y_i/s_i

  o What is the structure of Ω? (the case when Ω is unknown but its structure is known)
    * White's test
      • When running the LS model, e should be a consistent estimator of ε, therefore:
      • Run the regression e_i² = f(1, x1, x2, x3, x1², x2², x3², (x1·x2), (x1·x3), (x2·x3)) + υ_i
      • H0: σ_i² = σ², H1: not H0
      • n·R² ~ χ²(P), where P is the number of regressors in f(·) above
      • A non-constructive test, because it doesn't tell you how to proceed (i.e., how to correct the problem)
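      A sketch of White's test as outlined above: regress the squared LS residuals on levels, squares, and cross products, then compare n·R² with χ²(P). Two regressors are used here instead of the notes' three, purely for brevity:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 400
x1, x2 = rng.normal(size=(2, n))
y = 1 + x1 + x2 + rng.normal(0, np.exp(0.5 * x1))   # variance depends on x1

X = np.column_stack([np.ones(n), x1, x2])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Auxiliary regression: e^2 on levels, squares, and cross products
Z = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2])
g = np.linalg.lstsq(Z, e**2, rcond=None)[0]
u = e**2 - Z @ g
R2 = 1 - (u @ u) / ((e**2 - (e**2).mean()) ** 2).sum()

P = Z.shape[1] - 1                  # regressors in f(), excluding the constant
print("n*R^2 =", n * R2, "~ chi^2 with", P, "df under H0")
```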
    * Goldfeld-Quandt test
      • Sort all the data by size in ascending order
      • Slice the data into two samples and run a regression on each slice
      • H0: σ1² = σ2², H1: not H0
      • After sorting the x's, cut the sample in half: sample 1 gives e1′e1, sample 2 gives e2′e2
      • Define the F-statistic:

        F(n2 − k, n1 − k) = [e2′e2/(n2 − k)] / [e1′e1/(n1 − k)]

        where e2′e2 is in the numerator because we expect σ1² < σ2²
      • Could also split the data into 3 parts and compare the first and last thirds (ignoring the middle piece)
      • If you find that a variable is causing the problem, then you know what Ω is and can use weighted least squares
    * Breusch-Godfrey / Lagrange multiplier test
      • σ_i² = σ²·f(α0 + α′z_i), where H0: σ_i² = σ², H1: not H0
      • LM = ½ × the explained sum of squares in the regression of e_i²/(e′e/n) on z_i
      • LM ~ χ²(P)
  o Feasible GLS (a two-step estimator)

    * β̂_FGLS = (X′Ω̂⁻¹X)⁻¹·X′Ω̂⁻¹Y
    * Var̂(β̂_FGLS) = σ̂²·(X′Ω̂⁻¹X)⁻¹
    * Run the regression implied by the variance model, e.g. e_i² = σ²·z_i^λ ⇔ ln e_i² = ln σ² + λ·ln z_i (in the class example z_i was a population variable), which yields λ̂ and ln σ̂²
    * Conduct the forecast and, since it is in logs, exponentiate and put the fitted values on the diagonals of Ω̂
  o Maximum likelihood estimator
    * Likelihood function over all parameters (including Ω)
    * Likelihood ratio test:
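    A two-step FGLS sketch following the recipe above: model ln e_i² linearly in ln z_i, exponentiate the fitted values onto the diagonal of Ω̂, and rerun weighted least squares (z and the true variance law are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(12)
n = 300
z = rng.uniform(1, 20, n)                       # variable driving the variance
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, np.sqrt(z))

# Step 1: LS residuals, then fit ln e_i^2 = ln sigma^2 + lambda * ln z_i
b_ls = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b_ls
W = np.column_stack([np.ones(n), np.log(z)])
c, lam = np.linalg.lstsq(W, np.log(e ** 2), rcond=None)[0]

# Step 2: exponentiate fitted values onto the diagonal of Omega-hat, run GLS
omega_hat = np.exp(W @ np.array([c, lam]))      # estimated sigma_i^2 (up to scale)
Xw = X / np.sqrt(omega_hat)[:, None]            # weighted data = Omega^{-1/2} transform
yw = y / np.sqrt(omega_hat)
b_fgls = np.linalg.lstsq(Xw, yw, rcond=None)[0]
print(b_ls, b_fgls)
```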

      • −2·(ln L_R − ln L_0) ~ χ²(P), where P = the number of α's

- Time-series model
  o y_t = d0 + d1·x_t + ε_t; note that {y_t}_{t=−∞}^{∞} and that t = 1, …, T, which is a "time-window"
  o With time-series data we run into two problems:
    * Independence
    * Randomness
  o Properties:
    * Stationarity: expect mean, variance, and covariances to be finite; we know they are well-behaved
    * Ergodicity: (has to do with independence)
    * Martingale sequences: (fixes the randomness problem)
  o With these three ideas we define a central limit theorem based on martingale sequences; we continue to maintain E(ε_t | X) = 0 and Cov(ε_t, ε_{t−s}) ≠ 0 for t ≠ s

  o Ways to determine autocorrelation / capture the covariance:
    * Autoregressive process:

      Q = T·Σ_{j=1}^{P} r_j², where r_j = [Σ_{t=j+1}^{T} e_t·e_{t−j}] / [Σ_{t=1}^{T} e_t²]

      which is essentially a correlation coefficient between the residuals and the lagged residuals
    * With one lag: Q = T·r1² ~ χ²(1)

  o Durbin-Watson test:

      D = [Σ_{t=2}^{T} (e_t − e_{t−1})²] / [Σ_{t=1}^{T} e_t²]   (notice only one lag here)

    * T is the number of observations and K is the number of parameters
    * Lower limit: d_L(T, k)
    * Upper limit: d_U(T, k)
    * Hypothesis testing (a numerical sketch follows below):
      • If d < d_L then reject H0
      • If d > d_U then fail to reject H0
      • If d_L < d < d_U then the test is inconclusive
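    A sketch computing the Durbin-Watson d (and, while the residuals are at hand, the one-lag Q = T·r1² from the autoregressive-process bullet above) on simulated AR(1) disturbances:

```python
import numpy as np

def autocorr_stats(e):
    """Durbin-Watson d and one-lag Q = T*r_1^2 from LS residuals."""
    T = len(e)
    d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)      # sum (e_t - e_{t-1})^2 / sum e_t^2
    r1 = np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)      # lag-1 residual correlation
    return d, T * r1 ** 2                             # Q ~ chi^2(1) under no autocorrelation

rng = np.random.default_rng(13)
T = 300
x = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + rng.normal()          # AR(1) disturbances
y = 1 + 2 * x + eps
X = np.column_stack([np.ones(T), x])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

d, Q = autocorr_stats(e)
print("DW d =", d)    # well below 2 signals positive autocorrelation; compare to d_L, d_U
print("Q =", Q)
```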

  o Everything done so far relies on the assumption that you don't have a lagged dependent variable (i.e., the X's do not include y_{t−1})
    * If it were included: y_t = α + β1·x_{1t} + β2·y_{t−1} + ε_t (where ε_t = ρ·ε_{t−1} + u_t); multiplying the lagged equation by ρ and subtracting gives

      y_t − ρ·y_{t−1} = α·(1 − ρ) + β1·(x_{1t} − ρ·x_{1,t−1}) + β2·(y_{t−1} − ρ·y_{t−2}) + u_t

      so the regressor y_{t−1} is now correlated with the disturbance; therefore least squares is not consistent.
    * Can use the Durbin h test:

      • h = r·√(T / (1 − T·S_C²)) ~ N(0, 1)
      • where S_C² = Var̂(β̂2) and r = [Σ_{t=2}^{T} e_t·e_{t−1}] / [Σ_{t=1}^{T} e_t²]

    * Moving average:
      • ε_t = u_t + λ·u_{t−1}