FAQs

What is computer adaptive testing (CAT)?

Computer adaptive testing (CAT) is a method of administering health–related quality–of–life (HRQOL) measures by computer using the psychometric framework of item response theory (IRT) (11). These IRT–based adaptive tests are greatly facilitated by a computer because of the computational requirements of the algorithm and the logistics of item and data management.

Items are selected on the basis of the patient's responses to previously administered items. This process uses an algorithm to estimate a person's score and the score's reliability and then chooses the best next item, enabling scale administration based on specifications such as content coverage, test length, and standard error. The capacity to rank all patients on the same continuum, even if they have not been given any common items, allows for an assessment that is individually tailored to each person. With item banking, each patient need only answer a subset of items to obtain a measure that accurately estimates what would have been obtained by administering the entire set of items.
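
The select-estimate-stop cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PROMIS item selection software: the item names and parameters in `BANK` are hypothetical, items are treated as yes/no questions under a two-parameter logistic model, the score is a grid-based posterior mean with a standard normal prior, and the stopping rule uses only test length and standard error (content coverage is not handled).

```python
import math

# Hypothetical calibrated bank: item name -> (slope, threshold), 2PL model.
BANK = {
    "item_low": (1.8, -1.5),
    "item_mid_low": (1.5, -0.5),
    "item_mid": (2.0, 0.0),
    "item_mid_high": (1.6, 0.7),
    "item_high": (1.7, 1.6),
}
GRID = [g / 10.0 for g in range(-40, 41)]  # trait grid from -4 to 4


def p_endorse(theta, slope, threshold):
    """2PL probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-slope * (theta - threshold)))


def information(theta, slope, threshold):
    """Fisher information a 2PL item contributes at theta."""
    p = p_endorse(theta, slope, threshold)
    return slope * slope * p * (1.0 - p)


def eap_estimate(responses):
    """Posterior mean and SD of theta under a standard normal prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for name, answer in responses.items():
            p = p_endorse(theta, *BANK[name])
            w *= p if answer == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer_fn, max_items=4, target_se=0.4):
    """Administer items one at a time until a stopping rule is met."""
    responses = {}
    theta, se = eap_estimate(responses)
    while len(responses) < max_items and se > target_se:
        remaining = [n for n in BANK if n not in responses]
        if not remaining:
            break
        # Choose the unused item that is most informative at the current score.
        best = max(remaining, key=lambda n: information(theta, *BANK[n]))
        responses[best] = answer_fn(best)
        theta, se = eap_estimate(responses)
    return theta, se, list(responses)
```

Calling `run_cat(lambda name: 1)` simulates a patient who endorses every item offered; the function returns the final score estimate, its standard error, and the names of the items actually administered.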

CAT has been used successfully in educational, licensing, and achievement testing; personality assessment; and selection of military personnel. Recently, investigators have begun to apply CAT to HRQOL (13). CAT has been shown to reduce test length without loss of precision. To realize many of the measurement advantages of adaptive testing, the item pool from which items are selected must contain high–quality items for many different levels of the HRQOL continuum. In addition, adaptive test item pools must satisfy the assumptions of the psychometric model that underlies the item calibration, administration, and scoring (1). When new items are administered along with items already in the bank, their consistency with the bank can be evaluated. Item bank composition can therefore be continually expanded as needed.

What is a Domain Framework?

A domain map that portrays the structure of each target domain and its conceptual framework or, where applicable, hierarchical structure.

How will the independent research projects tie into the PROMIS network projects?

The independent projects allow primary research sites flexibility to pursue independent scholarly research, to explore innovative ideas and to test novel strategies of data collection. However, each of them is also designed to address the needs of the overall PROMIS network. For example, the independent research project to develop computerized adaptive testing for children will use analytic methods and item selection software similar if not identical to that planned for the overall network. Another independent research project investigates needs of PROMIS end users such as clinical trialists, and will develop implementation tools and strategies to expand adoption and utilization of the PROMIS network tools. Hence, the independent research projects interact with and feed into the overall project objectives.

The PROMIS banks seem generic. Is there anything being done to cover targeted issues?

One of the main objectives of PROMIS is to compile a core set of questions to assess the most common or salient dimensions of patient–relevant outcomes for the widest possible range of chronic disorders and diseases. Items are currently being developed to measure pain, fatigue, emotional distress, physical functioning, social role participation, and general health. Targeted content areas (e.g., items specifically designed for persons with cancer) will be developed through independent projects.

What is an item bank?

An item bank is composed of carefully calibrated questions that provide an operational definition of a trait or construct (2, 8). A good bank covers the entire continuum of the latent trait being measured, capturing different severity levels along the continuum (5). A well–calibrated item bank makes it possible to compare the amount of a given trait for patients who complete different sets of questions in the bank. This not only allows for "adaptive" testing; because all items are calibrated onto one common scale, it also makes it possible to compare HRQOL results across diverse groups of patients and item sets. A well–organized item bank with wide–ranging item difficulties also lets one select items to construct a wide variety of instruments, depending on the target population and the purpose of assessment. At a given difficulty level, any chosen item should provide the maximum amount of information to estimate a patient’s score on the HRQOL domain of interest (12).

Because the content of questions at comparable difficulty levels may vary in clinical relevance, it is possible to select the item within a given difficulty level according to its clinical relevance. Specific items can thus be selected from among those in the bank to maximize precision of the estimate and clinical relevance of the questions. With CAT, collaborative interaction between clinicians and programmers of the algorithm allows one to select the best set of items to obtain an estimate.

What is an 'Item'?

A question (including its response choices) in a survey.

What is item response theory (IRT)?

Item response theory (IRT) is a collection of statistical models and methods used for two broad purposes in the measurement of health outcomes: item analysis and scale scoring. The family of IRT models describes, in probabilistic terms, the relationship between a person's response to a survey question and his or her standing on the construct (e.g., emotional distress) being measured by the scale. Specifically, IRT models predict the probability of choosing each response category as a function of an underlying, unobserved trait and item parameters (1, 2, 3). For item analysis, the IRT model characterizes each scale item with a set of properties that describes its ability to discriminate among individuals at different levels along a trait continuum. For scale scoring, IRT uses the full information from a person’s responses to each item to estimate their standing on the measured construct. Scale scoring using IRT estimates a score along the continuum of the construct being measured for persons who provide a particular sequence of item responses. Usually a person’s score estimates include a measure of central tendency and a description of variability that is reported as a standard error of measurement. The IRT scale score may be computed using only the item parameters and the responses of a single individual to any arbitrarily selected set of items, and this is the basis for computer adaptive testing.
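
As a rough illustration of scale scoring from an arbitrary set of items, the sketch below computes a maximum likelihood score and its standard error of measurement. It rests on assumptions that are not part of the text above: items are yes/no questions under a two-parameter logistic model, the (slope, threshold) pairs are hypothetical calibrations, and Newton-Raphson with a capped step is just one of several estimation methods that could be used.

```python
import math


def endorse_prob(theta, slope, threshold):
    """2PL probability of a positive (1) response at trait level theta."""
    return 1.0 / (1.0 + math.exp(-slope * (theta - threshold)))


def score_and_information(theta, items, answers):
    """First derivative of the log-likelihood and the test information."""
    score = 0.0
    info = 0.0
    for (slope, threshold), u in zip(items, answers):
        p = endorse_prob(theta, slope, threshold)
        score += slope * (u - p)
        info += slope * slope * p * (1.0 - p)
    return score, info


def ml_score(items, answers, iterations=50):
    """Return (score estimate, standard error) for a mixed 0/1 pattern.

    items: list of (slope, threshold) pairs; answers: list of 0/1 responses.
    """
    if len(set(answers)) < 2:
        # All-0 or all-1 patterns have no finite maximum likelihood estimate.
        raise ValueError("response pattern must contain both 0 and 1")
    theta = 0.0
    for _ in range(iterations):
        score, info = score_and_information(theta, items, answers)
        step = score / info
        theta += max(-1.0, min(1.0, step))  # cap the step for stability
    _, info = score_and_information(theta, items, answers)
    return theta, 1.0 / math.sqrt(info)
```

For example, `ml_score([(1.5, -1.0), (1.5, 0.0), (1.5, 1.0)], [1, 1, 0])` returns a positive score together with its standard error. Only the item parameters and one person's responses are needed, so the same function applies to whichever subset of calibrated items that person happened to answer.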

What kinds of IRT models are there?

IRT models come in many varieties, more than 100, and can handle unidimensional as well as multidimensional data, binary and polytomous response data, and ordered as well as unordered response data (10). The most commonly applied IRT models in health outcomes measurement are the unidimensional parametric family of polytomous–response models, which include the Rating Scale Model, the Partial Credit Model, the Generalized–Partial Credit Model, the Graded Response Model, and the Nominal Model. Each differs in the number of item parameters that are estimated for each scale item and the constraints placed on the model or data. The item parameters define how well an item performs for measuring different levels of the measured construct or trait such as fatigue. The threshold (or difficulty) parameter describes where along the trait continuum an item is most informative for differentiating between lower and higher function levels. The slope parameter describes the strength of an item for discriminating among different levels of the underlying construct. Discrimination is related to precision in that the more an item can discriminate among individuals at different levels of the construct, the more precision the item adds for measuring a person’s trait level.
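
To make the roles of the slope and threshold parameters concrete, here is a minimal sketch of category probabilities under the Graded Response Model, using its usual logistic form. The function name and the example parameter values are hypothetical, chosen only for illustration.

```python
import math


def grm_category_probs(theta, slope, thresholds):
    """Probability of each ordered response category under the GRM.

    thresholds must be in increasing order; an item with K response
    categories has K - 1 thresholds.
    """
    # Cumulative probability of responding in category k or higher.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-slope * (theta - b))))
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]
```

With a hypothetical five-category fatigue item, `grm_category_probs(-3.0, 2.0, [-1.5, -0.5, 0.5, 1.5])` places most of the probability on the lowest category, and the same call at `theta = 3.0` places most of it on the highest. Raising the slope makes the probabilities shift more sharply as theta crosses each threshold, which is the sense in which a higher slope means better discrimination.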

Computer Adaptive Testing (CAT) Scores

CAT integrates advances in measurement theory with the power of computer technology to administer a patient–reported outcome (PRO) instrument that selects questions on the basis of a patient's responses to previously administered questions (or possibly other prior information). Highly informative questions are selected so that scores representing a person's standing on a domain (e.g., physical functioning, depression) can be estimated with the minimum number of questions and without a loss in measurement precision.

What is a Theta Metric?

The scale on which the underlying (latent) construct is estimated from the responses people give to the items in a scale. These items have been previously calibrated by an IRT model.

What is a T–Score?

A standardized score that has a mean of 50 and a standard deviation of 10 in a reference (e.g., general) population.
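
The conversion implied by this definition is a simple linear rescaling. The sketch below assumes the theta estimate has already been scaled so that the reference population has mean 0 and standard deviation 1; the function names are hypothetical.

```python
def theta_to_t_score(theta):
    """Rescale a theta estimate (reference mean 0, SD 1) to a T-score."""
    return 50.0 + 10.0 * theta


def se_to_t_units(se_theta):
    """Express a standard error on the theta metric in T-score units."""
    return 10.0 * se_theta
```

Under that assumption, a person one standard deviation above the reference mean (`theta = 1.0`) has a T-score of 60, and a person half a standard deviation below it (`theta = -0.5`) has a T-score of 45.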

What is a 'Short Form'?

A parsimonious subset of items selected from a full item bank to yield an accurate estimate at a targeted range of the measured domain.
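
One possible way to pick such a subset is to rank items by how much information they provide, on average, across the targeted range. The sketch below is an assumption-laden illustration rather than the procedure used to build any PROMIS short form: it treats items as yes/no questions under a two-parameter logistic model, uses hypothetical item names and parameters, and ignores content coverage and clinical relevance.

```python
import math


def item_information(theta, slope, threshold):
    """Fisher information of a 2PL item at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-slope * (theta - threshold)))
    return slope * slope * p * (1.0 - p)


def build_short_form(bank, theta_low, theta_high, n_items, steps=20):
    """Pick the n_items with the highest mean information on a target range.

    bank: dict mapping item name -> (slope, threshold).
    """
    width = theta_high - theta_low
    points = [theta_low + i * width / steps for i in range(steps + 1)]

    def mean_information(name):
        slope, threshold = bank[name]
        total = sum(item_information(t, slope, threshold) for t in points)
        return total / len(points)

    ranked = sorted(bank, key=mean_information, reverse=True)
    return ranked[:n_items]
```

For a bank whose items share the same slope and have thresholds at -2, -1, 0, 1, and 2, targeting the range 0.5 to 2.5 with `n_items=2` selects the two items with thresholds at 1 and 2, since those sit closest to the middle of the targeted range.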

How can I get involved?

PROMIS Contact Info:

info@nihpromis.org or call 1–888–261–0922

How much is it going to cost?

The NIH required PROMIS applicants to propose detailed plans for sharing the research resources generated through the cooperative agreement. The intention is to provide the wider scientific community with free access in the public domain to the data repository, CAT, and supporting documents. A public–private partnership is being pursued within PROMIS to explore options for making the products as widely available as possible.

Is NIH going to make us use it?

No. Use of PROMIS products is voluntary. The project will be undertaking extensive outreach efforts to obtain as much stakeholder input as possible to help ensure that the tools are as useful as possible to potential users.

References

  1. Bjorner, J.B., Kosinski, M., & Ware, J.E., Jr. (2003). Calibration of an item pool for assessing the burden of headaches: An application of item response theory to the Headache Impact Test (HIT). Quality of Life Research, 12(8):913–933.
  2. Bjorner, J.B., Kosinski, M., & Ware, J.E. (2005). Computerized adaptive testing and item banking. In Fayers, P.M., Hays, R.D. (eds.). Assessing Quality of Life. Oxford: Oxford University Press.
  3. Fischer, G.H., Molenaar, I.W. (1995). Rasch Models: Foundations, Recent Developments, and Applications. New York, NY: Springer–Verlag.
  4. Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of Item Response Theory. London: Sage Publications.
  5. Lai, J.S., Cella, D., Chang, C.H., Bode, R.K., & Heinemann, A.W. (2003). Item banking to improve, shorten, and computerize self–reported fatigue: An illustration of steps to create a core item bank from the FACIT–Fatigue Scale. Quality of Life Research, 12(5):485–501.
  6. Rasch, G. (1980). Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: University of Chicago Press.
  7. Reckase, M.D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21:25–36.
  8. Revicki, D.A., & Cella, D.F. (1997). Health status assessment for the twenty–first century: Item response theory, item banking, and computer adaptive testing. Quality of Life Research, 6(6):595–600.
  9. Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51:567–577.
  10. van der Linden, W.J., Hambleton, R.K. (eds.) (1997). Handbook of Modern Item Response Theory. New York, NY: Springer.
  11. Wainer, H., Dorans, N.J., Eignor, D., Flaugher, R., Green, B.F., Mislevy, R.J., et al. (2000). Computerized Adaptive Testing: A Primer. Mahwah, NJ: Lawrence Erlbaum Associates.
  12. Wainer, H., & Mislevy, R.J. (2000). Item response theory, item calibration, and proficiency estimation. In Wainer, H., Dorans, N.J., Flaugher, R., Green, B.F., Mislevy, R.J., Steinberg, L., et al. (eds.). Computerized Adaptive Testing: A Primer. Hillsdale, NJ: Lawrence Erlbaum Associates. pp. 61–101.
  13. Ware, J.E., Kosinski, M., Bjorner, J.B., Bayliss, M.S., Batenhorst, A., Dahlof, C.G., et al. (2003). Applications of computerized adaptive testing (CAT) to the assessment of headache impact. Quality of Life Research, 12:935–952.
  14. Wright, B., Masters, G.N. (1982). Rating Scale Analysis. Chicago: MESA Press.
