Simulate Item Difficulties from IRW or Custom Pools

This function simulates item difficulties from a pool of existing difficulty estimates. Each estimate in the pool contributes a normal distribution centered at the estimate, with a standard deviation equal to its standard error. Together these components form a mixture distribution, from which new difficulties are drawn using inverse CDF sampling.

Usage

irw_simu_diff(
  num_items = 10,
  num_replications = 1,
  irw_names = NULL,
  difficulty_pool = NULL
)

Arguments

num_items: Number of item difficulties to simulate per replication.
num_replications: Number of replications to perform. If 1, returns a numeric vector. If >1, returns a data frame.
irw_names: Optional character vector of IRW dataset names. If provided, diff_long is filtered to these datasets before sampling.
difficulty_pool: Optional custom data frame with columns dataset, difficulty, and SE. If provided, overrides the default IRW difficulty pool (diff_long).

Value

A numeric vector of difficulties (if num_replications = 1), or a data frame with replication and difficulty columns (if num_replications > 1).

Details

By default, the function uses diff_long, a built-in dataset included in the irw package. This dataset contains item difficulty estimates and standard errors from a curated subset of IRW datasets. You can:

Use the full IRW difficulty pool (diff_long)
Filter to specific IRW datasets via irw_names
Provide your own difficulty pool via difficulty_pool

This method follows Zhang et al. (2025), who construct realistic empirical distributions of item difficulty by accounting for the uncertainty in the difficulty estimates.
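
The mixture-and-inversion idea can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: sample_mixture_icdf is a hypothetical helper, every pool entry is assumed to get equal weight and a normal component with mean equal to its difficulty and standard deviation equal to its SE, and the mixture CDF is inverted numerically with uniroot().

```r
# Hypothetical helper for illustration; not part of the irw package.
# Assumes an equal-weight mixture of normals, one per pool entry.
sample_mixture_icdf <- function(n, difficulty, se) {
  stopifnot(length(difficulty) == length(se), all(se > 0))

  # Mixture CDF: average of the component normal CDFs evaluated at x
  mixture_cdf <- function(x) mean(pnorm(x, mean = difficulty, sd = se))

  # Search interval that covers essentially all of the mixture's mass
  lower <- min(difficulty - 8 * se)
  upper <- max(difficulty + 8 * se)

  # Inverse CDF sampling: draw u ~ Uniform(0, 1), then solve F(x) = u
  u <- runif(n)
  vapply(u, function(p) {
    uniroot(function(x) mixture_cdf(x) - p,
            lower = lower, upper = upper, tol = 1e-8)$root
  }, numeric(1))
}

# Pool values taken from the custom-pool example on this page
set.seed(1)
sample_mixture_icdf(5, difficulty = c(-0.2, 0.1), se = c(0.1, 0.2))
```

Solving for each draw with uniroot() keeps the sketch short and avoids choosing a grid resolution. For large num_items, evaluating the CDF once on a fine grid and interpolating its inverse would likely be faster.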

References

Zhang, L., Liu, Y., Molenaar, D., & Domingue, B. (2025). Realistic Simulation of Item Difficulties. https://doi.org/10.31234/osf.io/jbhxy_v1

Examples

if (FALSE) { # \dontrun{
# Use all IRW data (default)
irw_simu_diff(num_items = 5)

# Filter to specific IRW datasets
irw_simu_diff(num_items = 5, irw_names = c("psychtools_epi", "psychtools_blot"))

# Use a custom difficulty pool
my_pool <- data.frame(dataset = "x",
                      difficulty = c(-0.2, 0.1),
                      SE = c(0.1, 0.2))
irw_simu_diff(num_items = 5, difficulty_pool = my_pool)

# Explore built-in IRW difficulty pool
head(diff_long)
unique(diff_long$dataset)
} # }