FAQs
What is computer adaptive testing (CAT)?
Computer adaptive testing (CAT) is a method of administering health-related
quality-of-life (HRQOL) measures by computer using the psychometric
framework of item response theory (IRT) (11). These IRT-based
adaptive tests are greatly facilitated by a computer because of the computational
requirements of the algorithm and the logistics of item and data management.
Items are selected on the basis of the patient's responses to previously administered
items. This process uses an algorithm to estimate a person's score and the score's
reliability and then chooses the best next item, enabling scale administration based
on specifications such as content coverage, test length, and standard error. The
capacity to rank all patients on the same continuum, even if they have not been
given any common items, allows for an assessment that is individually tailored to
each person. With item banking, each patient need only answer a subset of items
to obtain a measure that accurately estimates what would have been obtained by administering
the entire set of items.
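The adaptive cycle described above (estimate the score and its precision, pick the most informative remaining item, stop on a test-length or standard-error rule) can be sketched as follows. This is a minimal illustration, not the PROMIS algorithm: the eight-item bank, the two-parameter logistic model for yes/no items, the standard normal prior, and the stopping values are all assumptions made for the example.

```python
import math

# Hypothetical 2PL item bank: (slope, difficulty) per item. Not real PROMIS items.
BANK = [(1.2, -2.0), (1.5, -1.0), (1.0, -0.5), (1.8, 0.0),
        (1.3, 0.5), (1.6, 1.0), (1.1, 1.5), (1.4, 2.0)]
GRID = [-4 + 0.1 * k for k in range(81)]  # theta grid for numerical integration


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap(responses):
    """Posterior mean and SD of theta under an assumed standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for item, x in responses:
            p = p_endorse(t, *BANK[item])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer, max_items=5, target_se=0.4):
    """Administer items adaptively until the SE target or the length cap is met."""
    responses = []
    theta, se = eap(responses)
    while len(responses) < max_items and se > target_se:
        used = {item for item, _ in responses}
        candidates = [i for i in range(len(BANK)) if i not in used]
        if not candidates:
            break
        # Choose the unused item that is most informative at the current estimate.
        best = max(candidates, key=lambda i: information(theta, *BANK[i]))
        responses.append((best, answer(best)))
        theta, se = eap(responses)
    return theta, se, responses
```

For example, `run_cat(lambda i: 1 if BANK[i][1] < 1.0 else 0)` simulates a respondent who endorses every item located below 1.0 on the scale. The first item administered is the one most informative at the starting estimate of 0, and each later choice depends on the answers given so far.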
CAT has been used successfully in educational, licensing, and achievement testing;
personality assessment; and selection of military personnel. Recently, investigators
have begun to apply CAT to HRQOL (13). CAT has been shown to
reduce test length without loss of precision. To realize many of the measurement
advantages of adaptive testing, the item pool from which items are selected must
contain high-quality items for many different levels of the HRQOL continuum.
In addition, adaptive test item pools must satisfy the assumptions of the psychometric
model that underlies the item calibration, administration, and scoring (1).
When new items are administered along with items already in the bank, their consistency
with the bank can be evaluated. Item bank composition can therefore be continually
expanded as needed.
What is a Domain Framework?
A domain map that portrays the structure of each target domain and its conceptual
framework or, where applicable, hierarchical structure.
How will the independent research projects tie into the PROMIS network projects?
The independent projects allow primary research sites flexibility to pursue independent
scholarly research, to explore innovative ideas and to test novel strategies of
data collection. However, each of them is also designed to address the needs of
the overall PROMIS network. For example, the independent research project to develop
computerized adaptive testing for children will use analytic methods and item selection
software similar if not identical to that planned for the overall network. Another
independent research project investigates needs of PROMIS end users such as clinical
trialists, and will develop implementation tools and strategies to expand adoption
and utilization of the PROMIS network tools. Hence, the independent research projects
interact with and feed into the overall project objectives.
The PROMIS banks seem generic. Is there anything being done to cover targeted issues?
One of the main objectives of PROMIS is to compile a core set of questions to assess
the most common or salient dimensions of patient-relevant outcomes for the
widest possible range of chronic disorders and diseases. Items are currently being
developed to measure pain, fatigue, emotional distress, physical functioning, social
role participation, and general health. Targeted content areas (e.g., items specifically
designed for persons with cancer) will be developed through independent projects.
What is an item bank?
An item bank is composed of carefully calibrated questions that provide an operational
definition of a trait or construct (8; 2).
A good bank covers the entire continuum of the latent trait being measured, capturing
different severity levels along the continuum (Lai et al., 2003). A well-calibrated
item bank makes it possible to compare the amount of a given trait for patients
who complete different sets of questions in the bank. Because all items are calibrated
onto one common scale, this not only allows for "adaptive" testing but also makes
it possible to compare HRQOL results across diverse groups of patients and item
sets. A well-organized item bank with wide-ranging item difficulties can also
enable one to select items to construct a wide variety of instruments, depending
on the target populations and purpose of assessment. At a given difficulty level,
any chosen item should provide the maximum amount of information to estimate a patient’s
score on the HRQOL domain of interest (12).
Because the content of questions at comparable difficulty levels may vary in clinical
relevance, it is possible to select the item within a given difficulty level according
to its clinical relevance. Specific items can thus be selected from among those
in the bank to maximize precision of the estimate and clinical relevance of the
questions. With CAT, collaborative interaction between clinicians and programmers
of the algorithm allows one to select the best set of items to obtain an estimate.
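Choosing among items of comparable difficulty by clinical relevance can be sketched as a simple filter and ranking. The `BankItem` fields, the clinician `relevance` rating, the width of the difficulty band, and the placeholder item labels are all hypothetical; the source does not specify how relevance is recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BankItem:
    text: str
    difficulty: float   # calibrated location on the common scale
    relevance: int      # hypothetical clinician rating, higher is more relevant


def pick_item(bank, target, band=0.25):
    """Return the most clinically relevant item within +/- band of target."""
    nearby = [it for it in bank if abs(it.difficulty - target) <= band]
    if not nearby:
        return None
    # Break relevance ties by closeness to the target difficulty.
    return max(nearby, key=lambda it: (it.relevance, -abs(it.difficulty - target)))


# Placeholder items for illustration only.
bank = [
    BankItem("item A", -0.2, 2),
    BankItem("item B", 0.1, 3),
    BankItem("item C", 0.2, 3),
    BankItem("item D", 1.0, 5),
]
chosen = pick_item(bank, target=0.0)
```

Here items A, B, and C fall inside the band around 0.0, and item B is chosen because it ties with C on relevance but sits closer to the target. Item D is rated most relevant overall but is too far from the targeted difficulty level to be considered.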
What is an 'Item'?
A question (including its response choices) in a survey.
What is item response theory (IRT)?
Item response theory (IRT) is a collection of statistical models and methods used
for two broad purposes in the measurement of health outcomes: item analysis and
scale scoring. The family of IRT models describes, in probabilistic terms, the relationship
between a person's response to a survey question and his or her standing on the
construct (e.g., emotional distress) being measured by the scale. Specifically,
IRT models predict the probability of choosing each response category as a function
of an underlying, unobserved trait and item parameters (1, 2, 3). For item analysis, the IRT model
characterizes each scale item with a set of properties that describes its ability
to discriminate among individuals at different levels along a trait continuum. For
scale scoring, IRT uses the full information from a person’s responses to each item
to estimate their standing on the measured construct. Scale scoring using IRT estimates
a score along the continuum of the construct being measured for persons who provide
a particular sequence of item responses. Usually a person’s score estimates include
a measure of central tendency and a description of variability that is reported
as a standard error of measurement. The IRT scale score may be computed using only
the item parameters and the responses of a single individual to any arbitrarily
selected set of items, and this is the basis for computer adaptive testing.
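To make the scoring idea concrete, the sketch below computes response-category probabilities under the Graded Response Model and then an expected a posteriori (EAP) score with its standard error. The item parameters in `ITEMS`, the standard normal prior, and the grid settings are assumptions chosen for illustration; EAP is only one of several estimators that could be used here.

```python
import math


def grm_probs(theta, slope, thresholds):
    """Graded Response Model: probability of each ordered response category."""
    # Cumulative P(response >= k) for k = 1..K-1, bracketed by 1 and 0.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-slope * (theta - b)))
                   for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def score(responses, items, lo=-4.0, hi=4.0, steps=161):
    """EAP estimate of theta and its posterior SD from (item_id, category) pairs."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    post = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # standard normal prior (an assumption)
        for item_id, cat in responses:
            slope, thresholds = items[item_id]
            w *= grm_probs(t, slope, thresholds)[cat]
        post.append(w)
    total = sum(post)
    mean = sum(t * w for t, w in zip(grid, post)) / total
    sd = math.sqrt(sum((t - mean) ** 2 * w for t, w in zip(grid, post)) / total)
    return mean, sd


# Hypothetical calibrated items: slope and ordered thresholds (4 categories each).
ITEMS = {
    "q1": (1.5, [-1.0, 0.0, 1.0]),
    "q2": (2.0, [-0.5, 0.5, 1.5]),
    "q3": (1.2, [-1.5, -0.5, 0.5]),
    "q4": (1.8, [0.0, 1.0, 2.0]),
}

# Two people answer different item subsets but are scored on the same scale.
theta_a, se_a = score([("q1", 3), ("q2", 2)], ITEMS)
theta_b, se_b = score([("q3", 0), ("q4", 0)], ITEMS)
```

Person A, who chose high categories, receives a higher score than person B, who chose the lowest categories, even though the two share no items. Only the item parameters and each person's own responses enter the calculation.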
What kinds of IRT models are there?
IRT models come in many varieties, more than 100, and can handle
unidimensional as well as multidimensional data, binary and polytomous response
data, and ordered as well as unordered response data (10).
The most commonly applied IRT models in health outcomes measurement are the unidimensional
parametric family of polytomous-response models, which include the Rating
Scale Model, the Partial Credit Model, the Generalized Partial Credit Model,
the Graded Response Model, and the Nominal Model. Each differs in the number of
item parameters that are estimated for each scale item and the constraints placed
on the model or data. The item parameters define how well an item performs for measuring
different levels of the measured construct or trait such as fatigue. The threshold
(or difficulty) parameter describes where along the trait continuum an item is most
informative for differentiating between lower and higher function levels. The slope
parameter describes the strength of an item for discriminating among different levels
of the underlying construct. Discrimination is related to precision in that the
more an item can discriminate among individuals at different levels of the construct,
the more precision the item adds for measuring a person’s trait level.
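The link between slope and precision can be written out for the simplest case. The formulas below use a two-parameter logistic model for a yes/no item as a simplifying assumption; the polytomous models listed above follow the same logic with one threshold per category boundary. The symbols a_i, b_i, and theta are notation introduced here.

```latex
% a_i = slope (discrimination), b_i = threshold (difficulty), \theta = trait level
\[
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}
\]
% Item information peaks at \theta = b_i, where it equals a_i^2 / 4
\[
I_i(\theta) = a_i^2 \, P_i(\theta) \, \bigl(1 - P_i(\theta)\bigr)
\]
% Approximate standard error of the trait estimate from the administered items
\[
SE(\hat{\theta}) \approx \frac{1}{\sqrt{\sum_i I_i(\hat{\theta})}}
\]
```

Because information grows with the square of the slope, doubling an item's slope quadruples its peak information, which is why highly discriminating items shrink the standard error fastest near their threshold.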
Computer Adaptive Testing (CAT) Scores
CAT integrates the advances in measurement theory and the power of computer technology
to administer a PRO instrument that selects questions on the basis of a patient's
response to previously administered questions (or possibly other prior information).
Highly informative questions are selected so that we may estimate scores that represent
a person's standing on a domain (e.g., physical functioning, depression) with the
minimal number of questions without a loss in measurement precision.
What is a Theta Metric?
Theta is the metric (scale) on which the underlying (latent) construct is estimated
from the responses people give to the items in a scale. These items have been
previously calibrated by an IRT model.
What is a T-Score?
A standardized score with a mean of 50 and a standard deviation of 10 in a reference
(e.g., general) population.
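The rescaling implied by this definition is a linear transformation. The sketch below assumes the theta estimate is already expressed with mean 0 and standard deviation 1 in the reference population; the function names are illustrative.

```python
def t_score(theta):
    """Convert a theta estimate (reference mean 0, SD 1 assumed) to a T-score."""
    return 50.0 + 10.0 * theta


def t_score_se(theta_se):
    """Convert a standard error on the theta metric to the T-score metric."""
    return 10.0 * theta_se
```

Under that assumption, a person at the reference mean (theta of 0) has a T-score of 50, and a person one standard deviation below the mean has a T-score of 40.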
What is a 'Short Form'?
A parsimonious subset of items selected from a full item bank to yield an accurate
estimate at a targeted range of the measured domain.
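One way to assemble such a subset is to rank items by their average information over the targeted range and keep the top few. This is a sketch of that single strategy, not a description of how PROMIS short forms are built; the two-parameter logistic model, the five-item bank, and the targeted range are assumptions.

```python
import math


def information(theta, slope, location):
    """Fisher information of a 2PL item at theta."""
    p = 1.0 / (1.0 + math.exp(-slope * (theta - location)))
    return slope * slope * p * (1.0 - p)


def build_short_form(bank, lo, hi, n_items, points=5):
    """Pick the n_items with the highest mean information on [lo, hi]."""
    grid = [lo + (hi - lo) * k / (points - 1) for k in range(points)]

    def mean_info(name):
        slope, location = bank[name]
        return sum(information(t, slope, location) for t in grid) / len(grid)

    return sorted(bank, key=mean_info, reverse=True)[:n_items]


# Hypothetical bank: item name -> (slope, location on the theta scale).
BANK = {
    "i1": (1.5, -1.5),
    "i2": (1.5, 0.0),
    "i3": (1.5, 1.5),
    "i4": (1.5, 2.0),
    "i5": (0.8, 1.5),
}

short_form = build_short_form(BANK, lo=1.0, hi=2.0, n_items=2)
```

With this bank the two selected items are `i3` and `i4`, the well-discriminating items located inside the targeted range. Item `i5` sits in the range too but is passed over because its low slope contributes little information.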
How can I get involved?
PROMIS Contact Info:
info@nihpromis.org or call 1-888-261-0922
How much is it going to cost?
The NIH required PROMIS applicants to propose detailed plans for sharing the research
resources generated through the cooperative agreement. The intention is to provide
the wider scientific community with free access in the public domain to the data
repository, CAT, and supporting documents. A public-private partnership is
being pursued within PROMIS to explore options for making the products as widely
available as possible.
Is NIH going to make us use it?
No. Use of PROMIS products is voluntary. The project will be undertaking extensive
outreach efforts to obtain as much stakeholder input as possible to help ensure
that the tools are as useful as possible to potential users.
References
1. Bjorner, J.B., Kosinski, M., & Ware, J.E., Jr. (2003).
Calibration of an item pool for assessing the burden of headaches: An application
of item response theory to the Headache Impact Test (HIT). Quality of Life Research,
12(8):913–933.
2. Bjorner, J.B., Kosinski, M., & Ware, J.E. (2005). Computerized
adaptive testing and item banking. In Fayers, P.M., & Hays, R.D. (eds.), Assessing
Quality of Life. Oxford: Oxford University Press.
3. Fischer, G.H., & Molenaar, I.W. (1995). Rasch Models:
Foundations, Recent Developments, and Applications. New York, NY: Springer-Verlag.
4. Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991).
Fundamentals of Item Response Theory. London: Sage Publications.
5. Lai, J.S., Cella, D., Chang, C.H., Bode, R.K., & Heinemann,
A.W. (2003). Item banking to improve, shorten, and computerize self-reported
fatigue: An illustration of steps to create a core item bank from the FACIT-Fatigue
Scale. Quality of Life Research, 12(5):485–501.
6. Rasch, G. (1980). Probabilistic Models for Some Intelligence
and Attainment Tests. Chicago: University of Chicago Press.
7. Reckase, M.D. (1997). The past and future of multidimensional
item response theory. Applied Psychological Measurement, 21:25–36.
8. Revicki, D.A., & Cella, D.F. (1997). Health status
assessment for the twenty-first century: Item response theory, item banking,
and computer adaptive testing. Quality of Life Research, 6(6):595–600.
9. Thissen, D., & Steinberg, L. (1986). A taxonomy of
item response models. Psychometrika, 51:567–577.
10. van der Linden, W.J., & Hambleton, R.K. (eds.) (1997).
Handbook of Modern Item Response Theory. New York, NY: Springer.
11. Wainer, H., Dorans, N.J., Eignor, D., Flaugher, R.,
Green, B.F., Mislevy, R.J., et al. (2000). Computerized Adaptive Testing: A Primer.
Mahwah, NJ: Lawrence Erlbaum Associates.
12. Wainer, H., & Mislevy, R.J. (2000). Item response theory, item
calibration, and proficiency estimation. In Wainer, H., Dorans, N.J., Flaugher,
R., Green, B.F., Mislevy, R.J., Steinberg, L., et al. (eds.), Computerized
Adaptive Testing: A Primer. Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 61–101.
13. Ware, J.E., Kosinski, M., Bjorner, J.B., Bayliss,
M.S., Batenhorst, A., Dahlof, C.G., et al. (2003). Applications of computerized
adaptive testing (CAT) to the assessment of headache impact. Quality of Life
Research, 12:935–952.
14. Wright, B., & Masters, G.N. (1982). Rating Scale Analysis.
Chicago: MESA Press.