Prepare for your exams
Get points
Guidelines and tips

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search Store documents

The best documents sold by students who completed their studies

Search through all study resources

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

University Rankings

Discover the best universities in your country according to Docsity users

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

From our blog

Exams and Study

Go to the blog

Item Analysis in R: A Practical Guide Using the 'psych' Package, Lecture notes of Statistics

Misamis University (MU)Statistics

It is simply the proportion of individuals that got the “correct” answer on a question, and so it is thought of as a method of determining how “easy” an item ...

Typology: Lecture notes

2021/2022

Uploaded on 08/05/2022

hal_s95 🇵🇭

4.4

(652)

10K documents

1 / 9

This page cannot be seen from the preview

Don't miss anything!

Item Analysis

Rachael Smyth and Andrew Johnson

October 1, 2015

Load Libraries

We’ll need the psych package for the analyses in this demonstration.

library(psych)

When you go to use this package in your publications, you should reference the package, as this gives your

readers an idea as to the version of the package used, and also gives credit to the authors of the package.

You can get the citation for any package using the

citation

function. So. . . to get the citation information

for the psych package, we would do the following:

citation("psych")

##

## To cite the psych package in publications use:

##

## Revelle, W. (2017) psych: Procedures for Personality and Psychological

## Research, Northwestern University, Evanston, Illinois, USA,

## https://CRAN.R-project.org/package=psych Version = 1.7.5.

##

## A BibTeX entry for LaTeX users is

##

## @Manual{,

## title = {psych: Procedures for Psychological, Psychometric, and Personality Research},

## author = {William Revelle},

## organization = { Northwestern University},

## address = { Evanston, Illinois},

## year = {2017},

## note = {R package version 1.7.5},

## url = {https://CRAN.R-project.org/package=psych},

## }

The Data

The dataset that we’ll use for this demonstration is called

bfi

and comes from the

psych

package. It is made

up of 25 self-report personality items from the International Personality Item Pool, gender, education level

and age for 2800 subjects and used in the Synthetic Aperture Personality Assessment.

The personality items are split into 5 categories: Agreeableness (A), Conscientiousness (C), Extraversion

(E), Neuroticism (N), Openness (O). Each item was answered on a six point scale: 1 Very Inaccurate, 2

Moderately Inaccurate, 3 Slightly Inaccurate, 4 Slightly Accurate, 5 Moderately Accurate, 6 Very Accurate.

data("bfi")

1

Partial preview of the text

Download Item Analysis in R: A Practical Guide Using the 'psych' Package and more Lecture notes Statistics in PDF only on Docsity!

Item Analysis

Rachael Smyth and Andrew Johnson

October 1, 2015

Load Libraries

We’ll need the psych package for the analyses in this demonstration.

library (psych)

When you go to use this package in your publications, you should reference the package, as this gives your readers an idea as to the version of the package used, and also gives credit to the authors of the package. You can get the citation for any package using the citation function. So... to get the citation information for the psych package, we would do the following: citation ("psych")

To cite the psych package in publications use:

Revelle, W. (2017) psych: Procedures for Personality and Psychological

Research, Northwestern University, Evanston, Illinois, USA,

https://CRAN.R-project.org/package=psych Version = 1.7.5.

A BibTeX entry for LaTeX users is

@Manual{,

title = {psych: Procedures for Psychological, Psychometric, and Personality Research},

author = {William Revelle},

organization = { Northwestern University},

address = { Evanston, Illinois},

year = {2017},

note = {R package version 1.7.5},

url = {https://CRAN.R-project.org/package=psych},

}

The Data

The dataset that we’ll use for this demonstration is called bfi and comes from the psych package. It is made up of 25 self-report personality items from the International Personality Item Pool, gender, education level and age for 2800 subjects and used in the Synthetic Aperture Personality Assessment. The personality items are split into 5 categories: Agreeableness (A), Conscientiousness (C), Extraversion (E), Neuroticism (N), Openness (O). Each item was answered on a six point scale: 1 Very Inaccurate, 2 Moderately Inaccurate, 3 Slightly Inaccurate, 4 Slightly Accurate, 5 Moderately Accurate, 6 Very Accurate. data ("bfi")

Because the data file is embedded within R, you can query this information directly within the console, with ?bfi. You can also look at the variable names and the first six lines of data, using head(bfi). head (bfi)

A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4 E5 N1 N2 N3 N4 N5 O1 O2 O3 O4 O5 gender

61617 2 4 3 4 4 2 3 3 4 4 3 3 3 4 4 3 4 2 2 3 3 6 3 4 3 1

61618 2 4 5 2 5 5 4 4 3 4 1 1 6 4 3 3 3 3 5 5 4 2 4 3 3 2

61620 5 4 5 4 4 4 5 4 2 5 2 4 4 4 5 4 5 4 2 3 4 2 5 5 2 2

61621 4 4 6 5 5 4 4 3 5 5 5 3 4 4 4 2 5 2 4 1 3 3 4 3 5 2

61622 2 3 3 4 5 4 4 5 3 2 2 2 5 4 5 2 3 4 4 3 3 3 4 3 3 1

61623 6 6 5 6 5 6 6 6 1 3 2 1 6 5 6 3 5 2 2 3 4 3 5 6 1 2

education age

61617 NA 16

61618 NA 18

61620 NA 17

61621 NA 17

61622 NA 17

61623 3 21

Finally, let’s create a data object that just has the personality items in it, to facilitate some of our later analyses. We’ll call it bfi.items. bfi.items <- bfi[,1:25]

Item Difficulty

The simplest item analyses are often the best, and that’s the case with item difficulty. It is simply the proportion of individuals that got the “correct” answer on a question, and so it is thought of as a method of determining how “easy” an item is (i.e., “is the question answered correctly by a high proportion of individuals?”) or how “difficult” an item is (i.e., “is the question answered correctly by a low proportion of individuals?”). The ability of an item to discriminate among individuals within the sample is related to this, because items that are too difficult, or too easy, are not particularly good at distinguishing amongst individuals within your sample. Thus, item difficulty is an excellent property to compute for your set of items. And the good news is that it’s quite easy to derive, in R. To start with, we will need to convert our continuous Likert style items into dichotomous items (i.e., “1” or “0”). In the BFI data, we would recode very inaccurate , moderately inaccurate , and slightly inaccurate to become inaccurate , while very accurate , moderately accurate , and slightly inaccurate become accurate ).

Scoring The Data

We’ll set up the scoring key by scale. There are five scales in our data:

Agreeableness
Conscientiousness
Extraversion
Neuroticism
Openness The first step is to set up a list object that indicates the items that go with each of our scales. We can indicate negatively-keyed items with a negative sign. bfi.keys.list <- list (agree= c (-1, 2, 3, 4, 5), consc= c (6, 7, 8, -9, -10), extra= c (-11, -12, 13, 14, 15), neuro= c (16, 17, 18, 19, 20), open= c (21, -22, 23, 24, -25))

Notice that we used the index within the bfi object, for each of the items. We could have used the variable names (“A1”, “A2”, etc.) within this list object, and it would have been perfectly acceptable to the key generating function - but it would have been problematic for our use of the alpha command later on.

We will now take this list object, and use the make.keys function to create the scoring key itself.

bfi.keys <- make.keys (bfi.items,bfi.keys.list,item.labels= colnames (bfi))

Finally, we can use this scoring key to calculate the scale scores for each of our 5 scales. This is also where decisions about missing data are made - I have chosen to just average the non-missing values. If we had wanted to impute missing values with mean substitution (a problematic, albeit common choice), we would use impute = "mean" in place of impute = "none". Finally, another common (and also problematic) practice is to use listwise deletion on the data, by only analyzing complete cases. To do this, you would remove impute = "none" and replace it with missing = FALSE. Because we have both positively and negatively keyed items within our scales, we need to tell the function the minimum and maximum on the scale. This allows it to reverse key the negatively keyed items before creating the scale scores (an item can be reversed by subtracting the observed value from the maximum scale score - in this case, “6”). bfi.scored <- scoreItems (bfi.keys, bfi.items, impute = "none", min=1, max=6, digits=3)

The actual scale scores for each of our five personality variables are now available within the scores value of the bfi.scored object. head (bfi.scored$scores)

agree consc extra neuro open

[1,] 4.0 2.8 3.8 2.8 3.

[2,] 4.2 4.0 5.0 3.8 4.

[3,] 3.8 4.0 4.2 3.6 4.

[4,] 4.6 3.0 3.6 2.8 3.

[5,] 4.0 4.4 4.8 3.2 3.

[6,] 4.6 5.6 5.6 3.0 5.

Reliability Analyses

We can now use the alpha function to calculate the item statistics. Let’s start with the agreeableness scale.

output.alpha.agree <- alpha (bfi.items[, abs (bfi.keys.list$agree)], check.keys=TRUE) output.alpha.agree

Reliability analysis

Call: alpha(x = bfi.items[, abs(bfi.keys.list$agree)], check.keys = TRUE)

raw_alpha std.alpha G6(smc) average_r S/N ase mean sd

0.7 0.71 0.68 0.33 2.5 0.009 4.7 0.

lower alpha upper 95% confidence boundaries

0.69 0.7 0.

Reliability if an item is dropped:

raw_alpha std.alpha G6(smc) average_r S/N alpha se

A1- 0.72 0.73 0.67 0.40 2.6 0.

A2 0.62 0.63 0.58 0.29 1.7 0.

A3 0.60 0.61 0.56 0.28 1.6 0.

A4 0.69 0.69 0.65 0.36 2.3 0.

A5 0.64 0.66 0.61 0.32 1.9 0.

Item statistics

n raw.r std.r r.cor r.drop mean sd

A1- 2784 0.58 0.57 0.38 0.31 4.6 1.

A2 2773 0.73 0.75 0.67 0.56 4.8 1.

A3 2774 0.76 0.77 0.71 0.59 4.6 1.

A4 2781 0.65 0.63 0.47 0.39 4.7 1.

A5 2784 0.69 0.70 0.60 0.49 4.6 1.

Non missing response frequency for each item

1 2 3 4 5 6 miss

A1 0.33 0.29 0.14 0.12 0.08 0.03 0.

A2 0.02 0.05 0.05 0.20 0.37 0.31 0.

A3 0.03 0.06 0.07 0.20 0.36 0.27 0.

A4 0.05 0.08 0.07 0.16 0.24 0.41 0.

A5 0.02 0.07 0.09 0.22 0.35 0.25 0.

We used the list object that we created to produce the scoring key, as this contained the indices associated with each of the items in the bfi.items object. You’ll notice, however, that we applied an absolute value function to the object prior to use - this is because the alpha function can’t cope with negative indices. Some of our items were, however, negatively keyed - and this needs to be taken into account within our calculations of alpha. Otherwise, these items will negatively contribute to overall alpha. Fortunately, the alpha function has the facility to automatically reverse key any items that have a negative item-total correlation. If you want to take advantage of this facility, you need to include the parameter check.keys = TRUE.

Another specific piece of information that we might be interested in is the alpha-if-deleted for each of the items. This gives us information as to how the alpha will change (up or down) when a particular item is deleted. We can access this information from within the alpha.drop value in our reliability objects.

output.alpha.agree$alpha.drop

raw_alpha std.alpha G6(smc) average_r S/N alpha se

A1- 0.7185174 0.7255091 0.6730278 0.3978723 2.643109 0.

A2 0.6171800 0.6255799 0.5794588 0.2946317 1.670797 0.

A3 0.6002596 0.6129447 0.5578155 0.2836176 1.583610 0.

A4 0.6858057 0.6935413 0.6498474 0.3613369 2.263083 0.

A5 0.6429530 0.6555302 0.6050623 0.3223798 1.903012 0.

output.alpha.consc$alpha.drop

raw_alpha std.alpha G6(smc) average_r S/N alpha se

C1 0.6940004 0.6964219 0.6400689 0.3644787 2.294045 0.

C2 0.6735715 0.6748686 0.6189270 0.3416374 2.075679 0.

C3 0.6887341 0.6939587 0.6443433 0.3617903 2.267533 0.

C4- 0.6538256 0.6629030 0.6028021 0.3295908 1.966505 0.

C5- 0.6897249 0.6902020 0.6283368 0.3577300 2.227910 0.

output.alpha.extra$alpha.drop

raw_alpha std.alpha G6(smc) average_r S/N alpha se

E1- 0.7256547 0.7254473 0.6731108 0.3977979 2.642289 0.

E2- 0.6901804 0.6930860 0.6341860 0.3608429 2.258242 0.

E3 0.7279142 0.7262478 0.6737381 0.3987619 2.652939 0.

E4 0.7018885 0.7032346 0.6464289 0.3720235 2.369665 0.

E5 0.7436327 0.7442029 0.6913944 0.4210742 2.909348 0.

output.alpha.neuro$alpha.drop

raw_alpha std.alpha G6(smc) average_r S/N alpha se

N1 0.7581379 0.7583430 0.7109569 0.4396265 3.138096 0.

N2 0.7632327 0.7633957 0.7158526 0.4464791 3.226466 0.

N3 0.7553428 0.7567103 0.7311738 0.4374379 3.110326 0.

N4 0.7953499 0.7968948 0.7688488 0.4951762 3.923557 0.

N5 0.8126022 0.8128355 0.7870014 0.5205500 4.342892 0.

output.alpha.open$alpha.drop

raw_alpha std.alpha G6(smc) average_r S/N alpha se

O1 0.5315935 0.5340608 0.4761929 0.2227279 1.146203 0.

O2- 0.5672275 0.5701068 0.5103454 0.2489897 1.326159 0.

O3 0.4973614 0.5005554 0.4417967 0.2003557 1.002224 0.

O4 0.6114811 0.6208252 0.5602676 0.2904412 1.637306 0.

O5- 0.5116576 0.5279603 0.4738055 0.2185158 1.118466 0.

Or we can ask for a reasonably comprehensive set of item statistics, by accessing the item.stats value in our reliability objects. This value will generate a data frame that includes the following:

value description n number of complete cases for the item raw.r correlation of each item with the total score, not corrected for item overlap std.r correlation of each item with the total score, not corrected for item overlap, based on standardized items r.cor correlation of each item with the total score, corrected for item overlap and scale reliability r.drop correlation of each item with the total score, NOT including this item mean mean of the item sd standard deviation of the item

Recall when we discussed item discrimination, in the context of item difficulty earlier? Those item-total correlations (raw.r, std.r, r.cor, r.drop) are a direct estimate of item discrimination. Items that are particularly good at discriminating between individuals at the extreme ends of the scale will have strong positive correlations with the total score, and so this correlation is often cited as a measure of the “discriminatory power” of an item.

Item Analysis in R: A Practical Guide Using the 'psych' Package, Lecture notes of Statistics

Related documents

Partial preview of the text

Download Item Analysis in R: A Practical Guide Using the 'psych' Package and more Lecture notes Statistics in PDF only on Docsity!

Item Analysis

Rachael Smyth and Andrew Johnson

October 1, 2015

Load Libraries

To cite the psych package in publications use:

Revelle, W. (2017) psych: Procedures for Personality and Psychological

Research, Northwestern University, Evanston, Illinois, USA,

https://CRAN.R-project.org/package=psych Version = 1.7.5.

A BibTeX entry for LaTeX users is

@Manual{,

title = {psych: Procedures for Psychological, Psychometric, and Personality Research},

author = {William Revelle},

organization = { Northwestern University},

address = { Evanston, Illinois},

year = {2017},

note = {R package version 1.7.5},

url = {https://CRAN.R-project.org/package=psych},

}

The Data

A1 A2 A3 A4 A5 C1 C2 C3 C4 C5 E1 E2 E3 E4 E5 N1 N2 N3 N4 N5 O1 O2 O3 O4 O5 gender

61617 2 4 3 4 4 2 3 3 4 4 3 3 3 4 4 3 4 2 2 3 3 6 3 4 3 1

61618 2 4 5 2 5 5 4 4 3 4 1 1 6 4 3 3 3 3 5 5 4 2 4 3 3 2

61620 5 4 5 4 4 4 5 4 2 5 2 4 4 4 5 4 5 4 2 3 4 2 5 5 2 2

61621 4 4 6 5 5 4 4 3 5 5 5 3 4 4 4 2 5 2 4 1 3 3 4 3 5 2

61622 2 3 3 4 5 4 4 5 3 2 2 2 5 4 5 2 3 4 4 3 3 3 4 3 3 1

61623 6 6 5 6 5 6 6 6 1 3 2 1 6 5 6 3 5 2 2 3 4 3 5 6 1 2

education age

61617 NA 16

61618 NA 18

61620 NA 17

61621 NA 17

61622 NA 17

61623 3 21

agree consc extra neuro open

[1,] 4.0 2.8 3.8 2.8 3.

[2,] 4.2 4.0 5.0 3.8 4.

[3,] 3.8 4.0 4.2 3.6 4.

[4,] 4.6 3.0 3.6 2.8 3.

[5,] 4.0 4.4 4.8 3.2 3.

[6,] 4.6 5.6 5.6 3.0 5.

Reliability analysis

Call: alpha(x = bfi.items[, abs(bfi.keys.list$agree)], check.keys = TRUE)

raw_alpha std.alpha G6(smc) average_r S/N ase mean sd

0.7 0.71 0.68 0.33 2.5 0.009 4.7 0.

lower alpha upper 95% confidence boundaries

0.69 0.7 0.

Reliability if an item is dropped:

raw_alpha std.alpha G6(smc) average_r S/N alpha se

A1- 0.72 0.73 0.67 0.40 2.6 0.

A2 0.62 0.63 0.58 0.29 1.7 0.

A3 0.60 0.61 0.56 0.28 1.6 0.

A4 0.69 0.69 0.65 0.36 2.3 0.

A5 0.64 0.66 0.61 0.32 1.9 0.

Item statistics

n raw.r std.r r.cor r.drop mean sd

A1- 2784 0.58 0.57 0.38 0.31 4.6 1.

A2 2773 0.73 0.75 0.67 0.56 4.8 1.

A3 2774 0.76 0.77 0.71 0.59 4.6 1.

A4 2781 0.65 0.63 0.47 0.39 4.7 1.

A5 2784 0.69 0.70 0.60 0.49 4.6 1.

Non missing response frequency for each item

1 2 3 4 5 6 miss

A1 0.33 0.29 0.14 0.12 0.08 0.03 0.

A2 0.02 0.05 0.05 0.20 0.37 0.31 0.

A3 0.03 0.06 0.07 0.20 0.36 0.27 0.

A4 0.05 0.08 0.07 0.16 0.24 0.41 0.

A5 0.02 0.07 0.09 0.22 0.35 0.25 0.

raw_alpha std.alpha G6(smc) average_r S/N alpha se

A1- 0.7185174 0.7255091 0.6730278 0.3978723 2.643109 0.

A2 0.6171800 0.6255799 0.5794588 0.2946317 1.670797 0.

A3 0.6002596 0.6129447 0.5578155 0.2836176 1.583610 0.

A4 0.6858057 0.6935413 0.6498474 0.3613369 2.263083 0.

A5 0.6429530 0.6555302 0.6050623 0.3223798 1.903012 0.

raw_alpha std.alpha G6(smc) average_r S/N alpha se

C1 0.6940004 0.6964219 0.6400689 0.3644787 2.294045 0.

C2 0.6735715 0.6748686 0.6189270 0.3416374 2.075679 0.