Rasch Modeling • rebl

NOTE: Example df is small and simulated, which causes non-convergence errors in some models.

Introduction

Something about REBL and PEBs and Rasch models.

Install and load the rebl package and friends:

library(rebl)
pacman::p_load(
  dplyr, 
  ggplot2,
  ltm,
  eRm
)
#> Installing package into '/home/runner/.cache/R/renv/library/rebl-6007be1a/linux-ubuntu-noble/R-4.5/x86_64-pc-linux-gnu'
#> (as 'lib' is unspecified)
#> also installing the dependencies 'farver', 'labeling', 'RColorBrewer', 'viridisLite', 'gtable', 'isoband', 'S7', 'scales'
#> 
#> ggplot2 installed

Recoding

Take a look at some randomly generated data as an example:

head(raw_example[, 1:6])
#>   respondent_id foodLocal foodLunchNoMeat foodMeat foodOatMilk foodTofu
#> 1            p1       Yes              No      Yes          No       No
#> 2            p2       Yes              No       No          No       No
#> 3            p3        No             Yes      Yes          No      Yes
#> 4            p4       Yes             Yes       No         Yes      Yes
#> 5            p5       Yes             Yes       No         Yes      Yes
#> 6            p6        No              No      Yes         Yes      Yes

We have a respondent_id column and 24 REBL items in the data frame.

It will be mighty convenient if we had a character vector of our REBL items. If you already have this, grand. Otherwise, let’s tell rebl what our items are. We can do this with a regex pattern. Here, we are just identifying any column that does not start with “res”.

rebl_items <- id_rebl_items(
  df = raw_example, 
  pattern = '^(?!res).*', 
  perl = TRUE
)
print(rebl_items[1:6])
#> [1] "foodLocal"       "foodLunchNoMeat" "foodMeat"        "foodOatMilk"    
#> [5] "foodTofu"        "foodVegan"

If your items start with “rebl”, the regex pattern would be ^rebl*.

Next, in the event the data for our REBL items are in a yes/no format, we will convert them to 1/0:

df <- recode_rebl(
  df = raw_example, 
  rebl_items = rebl_items
)
head(raw_example[, 1:6])
#>   respondent_id foodLocal foodLunchNoMeat foodMeat foodOatMilk foodTofu
#> 1            p1       Yes              No      Yes          No       No
#> 2            p2       Yes              No       No          No       No
#> 3            p3        No             Yes      Yes          No      Yes
#> 4            p4       Yes             Yes       No         Yes      Yes
#> 5            p5       Yes             Yes       No         Yes      Yes
#> 6            p6        No              No      Yes         Yes      Yes

Now we recode our reverse-coded REBL items. These are the items where no/0 is the pro-environmental behavior. We start by making a vector of these items with a regex pattern. We use the example here of items that start with “food”. (This is silly and confusing though - change this). Then we recode those items.

reversed_items <- id_reversed_rebl_items(rebl_items, '^food')
head(reversed_items)
#> [1] "foodLocal"       "foodLunchNoMeat" "foodMeat"        "foodOatMilk"    
#> [5] "foodTofu"        "foodVegan"

Reverse code our new vector of items:

df <- reverse_code_rebl_items(df, reversed_items)
head(df[,1:6])
#>   respondent_id foodLocal foodLunchNoMeat foodMeat foodOatMilk foodTofu
#> 1            p1         0               1        0           1        1
#> 2            p2         0               1        1           1        1
#> 3            p3         1               0        0           1        0
#> 4            p4         0               0        1           0        0
#> 5            p5         0               0        1           0        0
#> 6            p6         1               1        0           0        0

Rasch Models

CML

Now we’re ready to run the Rasch model and get REBL scores. The Rasch model for dichotomous items in the eRm package (Mair, Wien, and Hatzinger 2007) estimates the probability $P$ of a person $v$ to endorse an item $i$ to be:

$P(X_{vi}=1)=\frac{exp(\theta_v-\beta_i)}{1+exp(\theta_v-\beta_i)},$

where $\theta$ is the latent parameter for person ability (REBL score) and $\beta$ is the latent parameter for item difficulty.

To calculate $\theta$ we just give the function a data frame, the name of the column with the respondent id, and the vector of REBL items:

model_cml <- get_rasch_model(
  df = df, 
  id = 'respondent_id', 
  rebl_items = rebl_items,
  type = 'cml'
)

Here, the coefficients represent the item difficulty parameters. Note that the discrimination of every item is set to 1.

TODO: Interpretation

MML

We can also run a set of Marginal Maximum Likelihood (MML) models using the ltm package (Rizopoulos 2006). Parameters are estimated by maximizing the log-likelihoods of the observed data. The model is implemented in ltm by parameterizing the $m$ th sample unit as:

$\ell_{m} (\theta)=\log\int p(x_{m}|z_{m};\theta)p(z_{m})dz_{m},$

a function of observed response vectors $x_m$ , the rank of the respondent in latent trait values $z_m$ , and the distribution of person abilities $dz_m$ .

First we try a constrained Rasch model, where item discrimination is constrained to all equal 1:

model_con <- get_rasch_model(
  df = df, 
  id = 'respondent_id', 
  rebl_items = rebl_items,
  type = 'mml_con'
)
summary(model_con)
#> 
#> Call:
#> ltm::rasch(data = df, constraint = cbind(length(df) + 1, 1))
#> 
#> Model Summary:
#>    log.Lik      AIC      BIC
#>  -1677.112 3402.225 3464.749
#> 
#> Coefficients:
#>                                     value std.err  z.vals
#> Dffclt.foodLocal                  -0.0004  0.2286 -0.0015
#> Dffclt.foodLunchNoMeat             0.0845  0.2287  0.3693
#> Dffclt.foodMeat                   -1.0011  0.2490 -4.0200
#> Dffclt.foodOatMilk                -0.3430  0.2309 -1.4856
#> Dffclt.foodTofu                   -0.3433  0.2309 -1.4866
#> Dffclt.foodVegan                  -0.5203  0.2340 -2.2239
#> Dffclt.homeClothesCold            -0.5656  0.2350 -2.4070
#> Dffclt.homeClothesHang             0.5646  0.2349  2.4033
#> Dffclt.homeLightsOff              -0.2999  0.2304 -1.3019
#> Dffclt.packCarriedUtensils         0.2992  0.2303  1.2989
#> Dffclt.packCompost                 0.5195  0.2339  2.2206
#> Dffclt.packContainerToRestaurant   0.4741  0.2330  2.0345
#> Dffclt.packPickedUpLitter          0.0847  0.2287  0.3705
#> Dffclt.packPullRecycleFromTrash    0.6567  0.2372  2.7684
#> Dffclt.packRags                   -0.2999  0.2304 -1.3019
#> Dffclt.packReusableMug             0.2558  0.2299  1.1128
#> Dffclt.packReusedPaperPlasticBags -0.4756  0.2331 -2.0407
#> Dffclt.purchBuyNothing             0.6111  0.2360  2.5890
#> Dffclt.socialDocumentary           0.1703  0.2291  0.7434
#> Dffclt.socialGroup                -0.0001  0.2286 -0.0003
#> Dffclt.socialRead                  0.0848  0.2287  0.3707
#> Dffclt.socialSupportive            0.3861  0.2315  1.6676
#> Dffclt.waterShowerStop             0.1696  0.2291  0.7403
#> Dffclt.waterTeethStop             -0.0430  0.2286 -0.1880
#> Dscrmn                             1.0000      NA      NA
#> 
#> Integration:
#> method: Gauss-Hermite
#> quadrature points: 21 
#> 
#> Optimization:
#> Convergence: 0 
#> max(|grad|): 0.015 
#> quasi-Newton: BFGS

TODO: Interpretation

We can also run the MML unconstrained, where item discrimination is set to be equal but can vary from one:

model_uncon <- get_rasch_model(
  df = df, 
  id = 'respondent_id', 
  rebl_items = rebl_items,
  type = 'mml_uncon'
)
#> Warning in ltm::rasch(df): Hessian matrix at convergence is not positive definite; unstable solution.
summary(model_uncon)
#> Warning in sqrt(diag(new.covar)): NaNs produced
#> Warning in sqrt(diag(new.covar)): NaNs produced
#> Warning in sqrt(diag(new.covar)): NaNs produced
#> Warning in sqrt(diag(new.covar)): NaNs produced
#> Warning in sqrt(diag(new.covar)): NaNs produced
#> Warning in sqrt(diag(new.covar)): NaNs produced
#> Warning in sqrt(diag(new.covar)): NaNs produced
#> Warning in sqrt(diag(new.covar)): NaNs produced
#> Warning in sqrt(diag(new.covar)): NaNs produced
#> Warning in sqrt(diag(new.covar)): NaNs produced
#> Warning in sqrt(diag(new.covar)): NaNs produced
#> Warning in sqrt(diag(new.covar)): NaNs produced
#> Warning in sqrt(Var[n.ind + 1, n.ind + 1]): NaNs produced
#> 
#> Call:
#> ltm::rasch(data = df)
#> 
#> Model Summary:
#>    log.Lik      AIC      BIC
#>  -1619.453 3288.906 3354.035
#> 
#> Coefficients:
#>                                     value std.err  z.vals
#> Dffclt.foodLocal                  -0.0212  1.7450 -0.0121
#> Dffclt.foodLunchNoMeat             0.6815  1.6834  0.4048
#> Dffclt.foodMeat                   -8.1682     NaN     NaN
#> Dffclt.foodOatMilk                -2.8016     NaN     NaN
#> Dffclt.foodTofu                   -2.8145     NaN     NaN
#> Dffclt.foodVegan                  -4.2712     NaN     NaN
#> Dffclt.homeClothesCold            -4.6090     NaN     NaN
#> Dffclt.homeClothesHang             4.6651     NaN     NaN
#> Dffclt.homeLightsOff              -2.4593  0.5408 -4.5476
#> Dffclt.packCarriedUtensils         2.4104  0.6331  3.8070
#> Dffclt.packCompost                 4.2683     NaN     NaN
#> Dffclt.packContainerToRestaurant   3.9521     NaN     NaN
#> Dffclt.packPickedUpLitter          0.6709  1.6854  0.3981
#> Dffclt.packPullRecycleFromTrash    5.5774     NaN     NaN
#> Dffclt.packRags                   -2.4529  0.5539 -4.4281
#> Dffclt.packReusableMug             2.0833  1.0343  2.0141
#> Dffclt.packReusedPaperPlasticBags -3.8862     NaN     NaN
#> Dffclt.purchBuyNothing             5.0703     NaN     NaN
#> Dffclt.socialDocumentary           1.3689  1.4807  0.9245
#> Dffclt.socialGroup                -0.0121  1.7451 -0.0069
#> Dffclt.socialRead                  0.6616  1.6871  0.3921
#> Dffclt.socialSupportive            3.1358     NaN     NaN
#> Dffclt.waterShowerStop             1.3800  1.4760  0.9349
#> Dffclt.waterTeethStop             -0.3715  1.7270 -0.2151
#> Dscrmn                             0.1150     NaN     NaN
#> 
#> Integration:
#> method: Gauss-Hermite
#> quadrature points: 21 
#> 
#> Optimization:
#> Convergence: 0 
#> max(|grad|): 14 
#> quasi-Newton: BFGS

Or a two parameter logistic model that estimates both item difficulty and discrimination:

model_2pl <- get_rasch_model(
  df = df, 
  id = 'respondent_id', 
  rebl_items = rebl_items,
  type = 'mml_2pl'
)
summary(model_2pl)
#> 
#> Call:
#> ltm::ltm(formula = df ~ z1)
#> 
#> Model Summary:
#>    log.Lik      AIC      BIC
#>  -1597.748 3291.496 3416.544
#> 
#> Coefficients:
#>                                     value  std.err  z.vals
#> Dffclt.foodLocal                   0.0087   0.2947  0.0295
#> Dffclt.foodLunchNoMeat            -0.3427   0.9586 -0.3576
#> Dffclt.foodMeat                    6.4927  12.7729  0.5083
#> Dffclt.foodOatMilk                 0.5365   0.4334  1.2381
#> Dffclt.foodTofu                    5.2548  21.9639  0.2392
#> Dffclt.foodVegan                   3.4867   6.5661  0.5310
#> Dffclt.homeClothesCold             1.3828   1.1448  1.2079
#> Dffclt.homeClothesHang             9.5786  44.3396  0.2160
#> Dffclt.homeLightsOff               2.5122   6.0374  0.4161
#> Dffclt.packCarriedUtensils         0.1041   0.4619  0.2254
#> Dffclt.packCompost                 3.3601   6.2429  0.5382
#> Dffclt.packContainerToRestaurant  18.2373 192.9655  0.0945
#> Dffclt.packPickedUpLitter         -1.3615   6.7373 -0.2021
#> Dffclt.packPullRecycleFromTrash   -0.9822   0.6374 -1.5409
#> Dffclt.packRags                    5.5813  28.3185  0.1971
#> Dffclt.packReusableMug            -1.9445   4.2870 -0.4536
#> Dffclt.packReusedPaperPlasticBags  0.8175   0.5874  1.3916
#> Dffclt.purchBuyNothing             2.7832   3.7403  0.7441
#> Dffclt.socialDocumentary          -0.3525   0.5046 -0.6986
#> Dffclt.socialGroup                 0.0082   1.6695  0.0049
#> Dffclt.socialRead                 -0.1480   0.4009 -0.3692
#> Dffclt.socialSupportive            1.7074   2.3756  0.7187
#> Dffclt.waterShowerStop            -1.9530   6.5161 -0.2997
#> Dffclt.waterTeethStop              0.8623   6.3557  0.1357
#> Dscrmn.foodLocal                   0.7743   0.3128  2.4750
#> Dscrmn.foodLunchNoMeat            -0.2310   0.2548 -0.9065
#> Dscrmn.foodMeat                   -0.1464   0.2827 -0.5177
#> Dscrmn.foodOatMilk                -0.6750   0.3006 -2.2452
#> Dscrmn.foodTofu                   -0.0616   0.2535 -0.2428
#> Dscrmn.foodVegan                  -0.1414   0.2562 -0.5517
#> Dscrmn.homeClothesCold            -0.4019   0.2742 -1.4654
#> Dscrmn.homeClothesHang             0.0556   0.2561  0.2173
#> Dscrmn.homeLightsOff              -0.1129   0.2564 -0.4404
#> Dscrmn.packCarriedUtensils        12.8555  54.4996  0.2359
#> Dscrmn.packCompost                 0.1468   0.2624  0.5595
#> Dscrmn.packContainerToRestaurant   0.0245   0.2592  0.0947
#> Dscrmn.packPickedUpLitter         -0.0583   0.2488 -0.2341
#> Dscrmn.packPullRecycleFromTrash   -0.6919   0.3249 -2.1297
#> Dscrmn.packRags                   -0.0506   0.2535 -0.1996
#> Dscrmn.packReusableMug            -0.1240   0.2517 -0.4926
#> Dscrmn.packReusedPaperPlasticBags -0.5985   0.2918 -2.0506
#> Dscrmn.purchBuyNothing             0.2095   0.2648  0.7912
#> Dscrmn.socialDocumentary          -0.4678   0.2790 -1.6770
#> Dscrmn.socialGroup                 0.1203   0.2579  0.4664
#> Dscrmn.socialRead                 -0.5483   0.2832 -1.9362
#> Dscrmn.socialSupportive            0.2167   0.2694  0.8046
#> Dscrmn.waterShowerStop            -0.0819   0.2532 -0.3236
#> Dscrmn.waterTeethStop             -0.0466   0.2502 -0.1861
#> 
#> Integration:
#> method: Gauss-Hermite
#> quadrature points: 21 
#> 
#> Optimization:
#> Convergence: 0 
#> max(|grad|): 0.0084 
#> quasi-Newton: BFGS

Finally, we can use the three parameter logistic model that also estimates a latent variable for guessing:

model_tpm <- get_rasch_model(
  df = df, 
  id = 'respondent_id', 
  rebl_items = rebl_items,
  type = 'mml_tpm'
)
#> Warning in ltm::tpm(df, type = "rasch", max.guessing = 1): Hessian matrix at convergence is not positive definite; unstable solution.
summary(model_tpm)
#> 
#> Call:
#> ltm::tpm(data = df, type = "rasch", max.guessing = 1)
#> 
#> Model Summary:
#>   log.Lik    AIC      BIC
#>  -1618.75 3335.5 3463.154
#> 
#> Coefficients:
#>                                      value std.err  z.vals
#> Gussng.foodLocal                    0.0928     NaN     NaN
#> Gussng.foodLunchNoMeat              0.1055     NaN     NaN
#> Gussng.foodMeat                     0.0499     NaN     NaN
#> Gussng.foodOatMilk                  0.0760     NaN     NaN
#> Gussng.foodTofu                     0.0672  1.1754  0.0572
#> Gussng.foodVegan                    0.0542  0.9039  0.0600
#> Gussng.homeClothesCold              0.0586     NaN     NaN
#> Gussng.homeClothesHang              0.1348  0.4471  0.3014
#> Gussng.homeLightsOff                0.0715  1.7596  0.0407
#> Gussng.packCarriedUtensils          0.1156     NaN     NaN
#> Gussng.packCompost                  0.1273  0.5623  0.2264
#> Gussng.packContainerToRestaurant    0.1332  0.7371  0.1807
#> Gussng.packPickedUpLitter           0.0967     NaN     NaN
#> Gussng.packPullRecycleFromTrash     0.1436     NaN     NaN
#> Gussng.packRags                     0.0802     NaN     NaN
#> Gussng.packReusableMug              0.1211     NaN     NaN
#> Gussng.packReusedPaperPlasticBags   0.0629     NaN     NaN
#> Gussng.purchBuyNothing              0.1383  0.4829  0.2863
#> Gussng.socialDocumentary            0.1089     NaN     NaN
#> Gussng.socialGroup                  0.1016     NaN     NaN
#> Gussng.socialRead                   0.0880  0.9040  0.0973
#> Gussng.socialSupportive             0.1234     NaN     NaN
#> Gussng.waterShowerStop              0.1135     NaN     NaN
#> Gussng.waterTeethStop               0.0842  0.8981  0.0937
#> Dffclt.foodLocal                    5.6084     NaN     NaN
#> Dffclt.foodLunchNoMeat              9.0830     NaN     NaN
#> Dffclt.foodMeat                   -23.2593     NaN     NaN
#> Dffclt.foodOatMilk                 -4.7730     NaN     NaN
#> Dffclt.foodTofu                    -5.4310 60.2100 -0.0902
#> Dffclt.foodVegan                  -10.8194 30.5300 -0.3544
#> Dffclt.homeClothesCold            -11.5569     NaN     NaN
#> Dffclt.homeClothesHang             26.3952     NaN     NaN
#> Dffclt.homeLightsOff               -4.0096 94.8933 -0.0423
#> Dffclt.packCarriedUtensils         16.3230     NaN     NaN
#> Dffclt.packCompost                 24.1189     NaN     NaN
#> Dffclt.packContainerToRestaurant   23.3063 39.3952  0.5916
#> Dffclt.packPickedUpLitter           8.3503     NaN     NaN
#> Dffclt.packPullRecycleFromTrash    31.1251     NaN     NaN
#> Dffclt.packRags                    -3.4320     NaN     NaN
#> Dffclt.packReusableMug             15.4645     NaN     NaN
#> Dffclt.packReusedPaperPlasticBags  -9.0833     NaN     NaN
#> Dffclt.purchBuyNothing             28.4568     NaN     NaN
#> Dffclt.socialDocumentary           11.8613     NaN     NaN
#> Dffclt.socialGroup                  6.3318     NaN     NaN
#> Dffclt.socialRead                   7.5694 58.6144  0.1291
#> Dffclt.socialSupportive            19.6263     NaN     NaN
#> Dffclt.waterShowerStop             12.2448     NaN     NaN
#> Dffclt.waterTeethStop               3.7391 56.2121  0.0665
#> Dscrmn                              0.0370     NaN     NaN
#> 
#> Integration:
#> method: Gauss-Hermite
#> quadrature points: 21 
#> 
#> Optimization:
#> Optimizer: optim (BFGS)
#> Convergence: 0 
#> max(|grad|): 3.2

References

Mair, Patrick, Wirtschaftsuniversität Wien, and Reinhold Hatzinger. 2007. “Journal of Statistical Software Extended Rasch Modeling: The eRm Package for the Application of IRT Models in r.” http://www.jstatsoft.org/.

Rizopoulos, Dimitris. 2006. “Ltm: An r Package for Latent Variable Modeling and Item Response Theory Analyses.” Journal of Statistical Software 17 (5): 1–25.