
CONTRIBUTED RESEARCH ARTICLES 195

Estimability Tools for Package Developers

by Russell V. Lenth

Abstract

When a linear model is rank-deficient, then predictions based on that model become questionable because not all predictions are uniquely estimable. However, some of them are, and the estimability package provides tools that package developers can use to tell which is which. With the use of these tools, a model object’s predict method could return estimable predictions as-is while flagging non-estimable ones in some way, so that the user can know which predictions to believe. The estimability package also provides, as a demonstration, an estimability-enhanced epredict method to use in place of predict for models fitted using the stats package.

Introduction

Consider a linear regression or mixed model whose fixed component has the matrix form $X\beta$. If $X$ is not of full column rank, then there is no unique estimate $b$ of $\beta$. However, consider using $\lambda' b$ to estimate the value of some linear function $\lambda'\beta = \sum_j \lambda_j \beta_j$. (We use $x'$ to denote the transpose of a vector $x$.) For some $\lambda$s, the prediction depends on the solution $b$; but for others (the estimable ones) it does not.
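The distinction can be made precise with a standard linear-algebra argument, given here as background rather than as the article's own derivation. The matrix $N$ is notation assumed for this sketch only: any matrix whose columns span the null space of $X$, so that $XN = 0$.

```latex
% Assumption: N is any matrix whose columns span the null space of X (XN = 0).
% Two solutions b_1, b_2 of the normal equations give the same fit, X b_1 = X b_2.
\[
  X b_1 = X b_2
  \;\Longrightarrow\; b_1 - b_2 = N c \ \text{ for some vector } c
  \;\Longrightarrow\; \lambda' b_1 - \lambda' b_2 = \lambda' N c .
\]
% Hence the value lambda' b is the same for every solution exactly when
% lambda is orthogonal to the null space of X.
\[
  \lambda' b \ \text{does not depend on the solution } b
  \iff \lambda' N = 0
  \iff \lambda \ \text{lies in the row space of } X .
\]
```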

An illustration

An example will help illustrate the issues. In the following commands, we create four predictors, x1 through x4, and a response variable y:

> x1 <- -4:4
> x2 <- c(-2, 1, -1, 2, 0, 2, -1, 1, -2)
> x3 <- 3 * x1 - 2 * x2
> x4 <- x2 - x1 + 4
> y <- 1 + x1 + x2 + x3 + x4 + c(-.5, .5, .5, -.5, 0, .5, -.5, -.5, .5)

Clearly, x3 and x4 depend linearly on x1 and x2 and the intercept. Let us fit two versions of the same model to these data, entering the predictors in different orders, and compare the regression coefficients:

> mod1234 <- lm(y ~ x1 + x2 + x3 + x4)
> mod4321 <- lm(y ~ x4 + x3 + x2 + x1)
> zapsmall(rbind(b1234 = coef(mod1234), b4321 = coef(mod4321)[c(1, 5:2)]))
      (Intercept) x1 x2 x3 x4
b1234           5  3  0 NA NA
b4321         -19 NA NA  3  6

Note that in each model, two regression coefficients are NA. This indicates that the associated predictors were excluded due to their linear dependence on the others.
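One way to confirm that the two coefficient vectors describe the same fit, and differ only in which redundant predictors were dropped, is to compare ranks and fitted values. The following is a minimal sketch that rebuilds the data so it runs on its own; `same_fit`, `dropped1234`, and `dropped4321` are illustrative names, not part of any package.

```r
# Rebuild the example data so this block is self-contained
x1 <- -4:4
x2 <- c(-2, 1, -1, 2, 0, 2, -1, 1, -2)
x3 <- 3 * x1 - 2 * x2
x4 <- x2 - x1 + 4
y <- 1 + x1 + x2 + x3 + x4 + c(-.5, .5, .5, -.5, 0, .5, -.5, -.5, .5)

mod1234 <- lm(y ~ x1 + x2 + x3 + x4)
mod4321 <- lm(y ~ x4 + x3 + x2 + x1)

# Both fits have the same numerical rank (intercept plus two predictors)
c(rank1234 = mod1234$rank, rank4321 = mod4321$rank)

# The predictors that lm() dropped are the ones with NA coefficients
dropped1234 <- names(which(is.na(coef(mod1234))))
dropped4321 <- names(which(is.na(coef(mod4321))))

# Despite different coefficients, the fitted values agree
same_fit <- isTRUE(all.equal(fitted(mod1234), fitted(mod4321)))
same_fit
```

Agreement on the observed data is expected because every observed row satisfies the same linear dependencies that define x3 and x4; new predictor combinations need not.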

The problem that concerns us is making predictions on new values of the predictors. Here are some predictor combinations to try:

> testset <- data.frame(
+     x1 = c(3, 6, 6, 0, 0, 1),
+     x2 = c(1, 2, 2, 0, 0, 2),
+     x3 = c(7, 14, 14, 0, 0, 3),
+     x4 = c(2, 4, 0, 4, 0, 4))

And here is what happens when we make the predictions:

> cbind(testset,
+     pred1234 = predict(mod1234, newdata = testset),
+     pred4321 = suppressWarnings(predict(mod4321, newdata = testset)))
  x1 x2 x3 x4 pred1234 pred4321
1  3  1  7  2       14       14
2  6  2 14  4       23       47
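The two fits agree on some rows and disagree on others, and this can be checked numerically. The following is a base-R sketch, not the estimability package's implementation. It assumes the criterion that a prediction is estimable when its predictor row is orthogonal to the null space of the model matrix. The names `X`, `null_basis`, `tol`, `L`, `estimable`, and `agree` are illustrative.

```r
# Rebuild the example data so this block is self-contained
x1 <- -4:4
x2 <- c(-2, 1, -1, 2, 0, 2, -1, 1, -2)
x3 <- 3 * x1 - 2 * x2
x4 <- x2 - x1 + 4
y <- 1 + x1 + x2 + x3 + x4 + c(-.5, .5, .5, -.5, 0, .5, -.5, -.5, .5)

testset <- data.frame(
  x1 = c(3, 6, 6, 0, 0, 1),
  x2 = c(1, 2, 2, 0, 0, 2),
  x3 = c(7, 14, 14, 0, 0, 3),
  x4 = c(2, 4, 0, 4, 0, 4))

# Model matrix with an intercept column
X <- cbind(1, x1, x2, x3, x4)

# Right singular vectors whose singular values are numerically zero
# form an orthonormal basis for the null space of X
tol <- 1e-8
sv <- svd(X)
null_basis <- sv$v[, sv$d <= tol * sv$d[1], drop = FALSE]

# Treat a predictor row as estimable if it is orthogonal to that basis
L <- cbind(1, as.matrix(testset))
estimable <- apply(abs(L %*% null_basis), 1, max) < tol

# Compare with the rows on which the two fits actually agree
mod1234 <- lm(y ~ x1 + x2 + x3 + x4)
mod4321 <- lm(y ~ x4 + x3 + x2 + x1)
pred1234 <- suppressWarnings(predict(mod1234, newdata = testset))
pred4321 <- suppressWarnings(predict(mod4321, newdata = testset))
agree <- abs(pred1234 - pred4321) < 1e-6

print(data.frame(testset, estimable, agree))
```

In this sketch the rows flagged as estimable are exactly those that satisfy both x3 = 3*x1 - 2*x2 and x4 = x2 - x1 + 4, which for this test set are rows 1, 3, and 4. The tolerance is an arbitrary choice made for the illustration.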

The R Journal Vol. 7/1, June 2015 ISSN 2073-4859


Methods used by estimability
