PROMIS Series
The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008

https://doi.org/10.1016/j.jclinepi.2010.04.011

Abstract

Objectives

Patient-reported outcomes (PROs) are essential when evaluating many new treatments in health care; yet, current measures have been limited by a lack of precision, standardization, and comparability of scores across studies and diseases. The Patient-Reported Outcomes Measurement Information System (PROMIS) provides item banks that offer the potential for efficient (minimizes item number without compromising reliability), flexible (enables optional use of interchangeable items), and precise (has minimal error in estimate) measurement of commonly studied PROs. We report results from the first large-scale testing of PROMIS items.

Study Design and Setting

Fourteen item pools were tested in the U.S. general population and clinical groups using an online panel and clinic recruitment. A scale-setting subsample was created reflecting demographics proportional to the 2000 U.S. census.

Results

Using item-response theory (graded response model), 11 item banks were calibrated on a sample of 21,133, measuring components of self-reported physical, mental, and social health, along with a 10-item Global Health Scale. Short forms from each bank were developed and compared with the overall bank and with other well-validated and widely accepted (“legacy”) measures. All item banks demonstrated good reliability across most of the score distributions. Construct validity was supported by moderate to strong correlations with legacy measures.

Conclusion

PROMIS item banks and their short forms provide evidence that they are reliable and precise measures of generic symptoms and functional reports comparable to legacy instruments. Further testing will continue to validate and test PROMIS items and banks in diverse clinical populations.

Introduction

What is new?

Key finding

  1. PROMIS item banks have demonstrated reliability, precision, and construct validity.

What this adds to what was known?
  1. PROMIS measures provide a common metric for assessment of physical function, pain, fatigue, emotional distress, social function, and sleep/wake disturbance.

What is the implication, and what should change now?
  1. PROMIS measures are available for use in clinical research.

Clinical outcome measures, such as radiographic imaging and laboratory tests, have minimal immediate relevance to the day-to-day functioning of patients with chronic diseases, such as arthritis, multiple sclerosis, cancer, and asthma, or conditions characterized by chronic pain and fatigue. Often, the best way patients can judge the effectiveness of treatments is by perceived changes in symptoms, distress, or function. In late 2004, a group of scientists from several U.S.-based academic institutions and the National Institutes of Health (NIH) formed a cooperative group funded under the NIH Roadmap for Medical Research Initiative (http://www.nihroadmap.nih.gov) to revolutionize the assessment of patient-reported outcomes (PROs) for use in clinical research and health care delivery settings. This initiative—the Patient-Reported Outcomes Measurement Information System (PROMIS)—establishes a national resource for precise and efficient measurement of patient-reported symptoms, functioning, and health-related quality of life, appropriate for patients with a wide variety of chronic diseases and conditions. The main goal of the PROMIS initiative is to develop and evaluate, for the clinical research community, a set of publicly available, efficient, and flexible measurements of PROs, including health-related quality of life (HRQL).

This article summarizes PROMIS network research during the period 2005–2008, which includes six primary research sites and a statistical coordinating center. This summary builds on a previously published summary of the processes that defined the activity of PROMIS from 2004 to 2006 [1]. The previous report [1] reviewed the PROMIS conceptual framework and defined the prioritization of PRO domains to be initially developed by PROMIS. This article also builds on previous articles that have described the qualitative review process of PROMIS' item pools [2], [3], [4] and the proposed quantitative methods [5] to be used to evaluate the large-scale data collected by PROMIS for item evaluation and calibration for PROMIS item banks. This article describes and defines the domains first developed and tested by the PROMIS network, summarizes the sampling strategy used for our first wave of testing, and provides summary data based on initial item calibrations and U.S. general population PROMIS scores.

During the first 2 years of support, the PROMIS network developed a domain framework (Fig. 1) that focused efforts to organize item pools for wave 1 testing. This framework begins on the left side of the figure with three broad aspects of self-reported health: physical, mental, and social. Each of these aspects, in turn, comprises components, or “domains,” of HRQL. In the first year of PROMIS, investigators working within the consensus-based framework decided to initiate work in at least one domain from each broad aspect of health (physical, mental, and social). Specific domains selected for development were physical function, fatigue, pain, emotional distress, social function, and global health. The framework in Fig. 1 represents the March 2010 version, which has been modified over time based on empirical results (including some reported herein). Content elaborations on the right half of the figure represent functioning banks (green background), components of functioning banks (gray background), item banks in development (yellow background), and uncalibrated item pools and scales (blue background). Conceptual definitions that guided the development of the proposed wave 1 domains are as follows.

Physical function is defined as one's ability to carry out various activities that require physical capability, ranging from self-care (activities of daily living) to more vigorous activities that require increasing degrees of mobility, strength, or endurance [6], [7], [8], [9], [10]. Physical function is conceptually multidimensional, with four related subdomains: mobility (lower extremity function), dexterity (upper extremity function), axial (neck and back) function, and ability to carry out instrumental activities of daily living [11].

In the health-outcomes measurement perspective, fatigue is defined as an overwhelming, debilitating, and sustained sense of exhaustion that decreases one's ability to carry out daily activities, including the ability to work effectively and to function at one's usual level in family or social roles [12], [13], [14]. Similar subjective feelings, yet fewer behavioral impacts, are associated with lower levels of fatigue. Fatigue is divided conceptually into the experience of fatigue (such as its intensity, frequency, and duration), and the impact of fatigue on physical, mental, and social activities.

Pain is an unpleasant sensory and emotional experience associated with actual or potential tissue damage or is described in terms of such damage [15], [16], [17], [18]. Pain is what the respondent says it is, that is, the “gold standard” of pain assessment is self-report [19]. Pain is divided conceptually into components of quality (referring to the nature, characteristics, intensity, frequency, and duration of pain); impact on physical, mental, and social activities; and behaviors one engages in to avoid, minimize, or reduce pain.

Sleep and wakefulness are the two fundamental behavioral states of human beings. Sleep is a rapidly reversible, recurrent state of reduced (but not absent) awareness of and interaction with the environment. Wakefulness is a behavioral state of active engagement and interaction with the environment, including the perception and processing of stimuli and the production of cognitive, emotional, and behavioral responses.

The PROMIS sleep disturbance item bank focuses on perceptions of sleep quality, sleep depth, and restoration associated with sleep; perceived difficulties with getting to sleep or staying asleep; and perceptions of the adequacy of and satisfaction with sleep. The sleep disturbance item bank does not include symptoms of specific sleep disorders nor does it provide subjective estimates of sleep quantities (e.g., the total amount of sleep, time to fall asleep, or amount of wakefulness during sleep).

The PROMIS sleep-related impairment item bank focuses on perceptions of alertness, sleepiness, and tiredness during usual waking hours, and on functional impairments during wakefulness that are associated with sleep problems or impaired alertness. The sleep-related impairment item bank does not directly assess cognitive, affective, or performance impairments. The sleep-related impairment item bank measures the level of waking alertness, sleepiness, and function within the context of overall sleep–wake function.

Emotional distress is an important component of emotional health and typically comprises aspects of anxiety, depression, and anger. Given the overlap among these symptoms, a number of conceptual models have been proposed to account for the shared vs. unique variance captured in measures of negative affect. PROMIS adopted a hierarchical structure to explain the relationships between self-reported symptoms of anxiety, depression, and anger [20], [21]. This structure includes a second-order, nonspecific factor reflecting high levels of negative affect (or “general distress”) common to all these emotions. Anger tends to have smaller loadings on the general factor than anxiety and depression, but it still is a strong marker of emotional distress. The PROMIS item banks emphasize the cognitive and affective components of these concepts. Both psychometric considerations (e.g., skewed distributions for high-threshold behavioral items, the need to fit item-response theory [IRT] models to coherent unidimensional concepts) and considerations regarding validity (e.g., potential confounding between somatic symptoms of emotional distress and markers of physical disease) led us to this emphasis.

The PROMIS item bank for depression focuses on negative mood (e.g., sadness, guilt), decrease in positive affect (e.g., loss of interest), information-processing deficits (e.g., problems in decision-making), negative views of the self (e.g., self-criticism, worthlessness), and negative social cognition (e.g., loneliness, interpersonal alienation).

The PROMIS item bank for anxiety focuses on fear (e.g., fearfulness, feelings of panic), anxious misery (e.g., worry, dread), hyperarousal (e.g., tension, nervousness, restlessness), and somatic symptoms related to arousal (e.g., cardiovascular symptoms, dizziness).

The PROMIS item bank for anger focuses on angry mood (e.g., irritability, reactivity), negative social cognition (e.g., interpersonal sensitivity, envy, vengefulness), verbal aggression, and efforts necessary to control angry mood.

Social health is defined as perceived well-being regarding social activities and relationships, including the ability to relate to individuals, groups, communities, and society as a whole. Components of social functioning include understanding and communication, getting along with people, participation in society, and performance of social roles. Additional conceptualizations of social functioning focus on the quality, reciprocity, and size of an individual's social network [22], [23]. Although social function was the initial focus of PROMIS investigation, several other aspects of social health are noteworthy. These include social support and interpersonal attributes independent of particular roles, such as intimacy, assertiveness, sociability, submissiveness, and interpersonal control [24].

Social function is defined by PROMIS as involvement in, and satisfaction with, one's usual social roles in life's situations and activities. These roles may exist in dyadic or family relationships, parental responsibilities, work responsibilities, and social activities [25], [26]. Social function has also been referred to with terms, such as role participation and social adjustment [26]. Qualitative and quantitative analysis of PROMIS and archival data collected before the current study (Cella et al., 2007) [27], [28] led us to hypothesize a conceptual division of social function into “ability to participate” and “satisfaction with participation.” Each of these two components has subcomponents that divide social roles, such as work and family responsibilities, and more discretionary social activities, such as leisure activity and relationships with friends.

Global health refers to a person's general evaluations of health rather than any of its specific components. The global health items include global ratings of the five primary PROMIS domains (physical function, fatigue, pain, emotional distress, and social health) and general health perceptions that cut across domains. Global items allow respondents to weigh together different aspects of health to arrive at a “bottom-line” indicator of their health status. Global health items have been found to be consistently predictive of important future events, such as health care utilization and mortality [8], [14], [18]. Results from wave 1 testing of the global items are reported elsewhere [29], [30].

Each domain listed earlier was assigned a team of PROMIS investigators, consisting of experts in the measurement and assessment of the domain area. These teams identified, evaluated, and revised an exhaustive set of extant questionnaire items and wrote new items when necessary to form a core item pool for each domain. Six phases of item development were documented: identification of existing items, item classification and selection, item review and revision, focus group input on domain coverage, cognitive interviews with individual items, and final revision before field testing [2].

To inform PROMIS item selection and development, we analyzed 11 large data sets with self-report data on the five broad PROMIS core domains: pain, fatigue, emotional distress, physical function, and social function [1], [29], [31], [32]. Sleep disturbance and sleep-related impairment were not included, as their development was focused at a single PROMIS site rather than as a full network effort. Psychometric results from these analyses were reviewed collectively by the analysis team, and summaries were presented to the appropriate domain working group. The primary goal was to use these archival data to better understand the dimensional structure of items that tap one of the five selected PROMIS domains. Secondarily, we aimed to inform the revision of items in the item pools, identify the best performing sets of response options, and guide new item construction in preparation for the first wave of PROMIS testing [1].

Although some data suggest that recall periods beyond 1 day may introduce bias into the reporting of symptoms [33], a recent study of pain and fatigue [34] suggests reasonably high correspondence between real-time symptom reports and 7-day recall of the same symptoms. In addition, Revicki et al. [35] found that gastrointestinal symptom scores based on a daily diary had a correlation greater than 0.90 with a 2-week recall instrument, suggesting minimal recall bias. Thus, from a practical viewpoint, a 7-day recall period provides a sufficiently long interval to capture a clinically relevant window of time and experience with minimal bias. Based on these studies, we opted for the 7-day option as optimal in most cases. “In the past 7 days” is the reference period for all items in anxiety, anger, depression, fatigue, pain quality, pain interference, pain behavior, satisfaction with participation in discretionary social activities, satisfaction with participation in social roles, sleep disturbance, and sleep-related impairment. An exception is physical function, which emphasizes current capabilities and, therefore, does not use a recall period. Item stems begin with phrases, such as “Does your health now limit you” or “Are you able to.” Some global health items use a 7-day recall period, whereas others do not use a recall period and emphasize current status in general.

Most of the PROMIS items use response scales with five options (e.g., 1 = not at all, 2 = a little bit, 3 = somewhat, 4 = quite a bit, 5 = very much). This number of response options was selected after extensive discussion based on prior work [36] and analyses of available large data sets, in which five response options produced data sets with ample responses in each option for IRT analysis; provided good discrimination in item characteristic curves without producing failures of monotonicity, scalability, or item misfit; and performed well in cognitive testing. Pain behavior uses six response options to allow for respondents to endorse “had no pain.” In this way, we could differentiate those with no pain from those who report no such behavior in response to pain. Each of the 10 PROMIS global health items has five response choices, except the 11-point pain intensity item (“How would you rate your pain on average” with 0 = no pain and 10 = worst imaginable pain). All modifications to existing items regarding the number and wording of response options were made with permission of the source item developer. To ease respondent burden, the wording of response categories was kept consistent within banks, and a limited degree of variation in response options was used across banks. Some flexibility in response choices within banks was considered important, however, to capture the range of patient experience in a domain (e.g., intensity, frequency, duration). Therefore, for example, most banks used a common set of response options for intensity (i.e., “not at all” to “very much”) and frequency (i.e., “never” to “always”). The selected response categories were pretested with cognitive interviews to confirm patient comprehension before field testing for item calibration.

After the extensive literature review to identify items for each bank, review of the items by experts and patients, and standardization of the questions and response format, the next phase of PROMIS included the large wave 1 testing of the items to collect patient-reported data to allow quantitative evaluation and calibration of the PROMIS items.

From July 2006 to March 2007, data were collected from the U.S. general population and multiple disease populations. A sampling plan was developed for collecting responses to the candidate items from the targeted PROMIS domains. This plan was designed to accommodate multiple objectives: (1) obtain item calibrations for each domain; (2) estimate profile scores for various disease populations; (3) create linking metrics to legacy questionnaires (e.g., Short Form [SF]-36); (4) confirm the factor structure of the domains; and (5) conduct item and bank analyses. Because of the large total number of items (>1,000), it was unreasonable to ask participants to respond to the entire pool of items. We estimated that participants would respond to approximately four questions per minute and limited the maximum number of items administered to about 150, for an estimated average response time of 37 minutes.

Figure 2 outlines the two arms of the sampling design: “full-bank” and “block” administration. There were 14 candidate item banks (three physical functioning banks, anxiety, depression, anger, alcohol abuse, fatigue interference, fatigue experience, social role performance, social role satisfaction, pain interference, pain quality, and pain behavior). All 56 items for each of the two PROMIS candidate item banks (112 PROMIS items) were administered to a subset of individuals in the full-bank arm. They also completed appropriate “legacy” questionnaires (well-validated and widely used measures of the same concept). Another subset of the PROMIS wave 1 sample was administered blocks of seven items selected from each of the 14 candidate item banks (98 PROMIS items). All participants completed a clinical form consisting of approximately 25 auxiliary items measuring global health perceptions and sociodemographic variables, including age, income, number of hospitalizations, disability days, use of prescription medication, height, weight, gender, race/ethnicity, relationship status, educational attainment, and employment status. This clinical form also included a series of health questions about the presence and degree of limitations related to 25 chronic medical conditions: hypertension, angina, coronary artery disease, heart failure, heart attack, stroke or transient ischemic attack, liver disease, kidney disease, arthritis or rheumatism, osteoarthritis, migraines, asthma, chronic obstructive pulmonary disease, diabetes, cancer, depression, anxiety, alcohol or drug problems, sleep disorder, human immunodeficiency virus/acquired immunodeficiency syndrome, spinal cord injury, multiple sclerosis, Parkinson's disease, epilepsy, and amyotrophic lateral sclerosis.
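The item counts and response-time estimate quoted above follow from simple arithmetic, which can be checked with a short sketch. The figures come from the text; the variable and function names are illustrative assumptions, not part of any PROMIS software.

```python
# Figures quoted in the text; names are illustrative only.
ITEMS_PER_MINUTE = 4            # estimated response rate
MAX_ITEMS = 150                 # approximate cap on items per participant
ITEMS_PER_FULL_BANK = 56        # items in each candidate bank
BANKS_PER_FULL_BANK_ARM = 2     # banks completed by a full-bank respondent
CANDIDATE_BANKS = 14
ITEMS_PER_BLOCK = 7             # items drawn from each bank in the block arm


def estimated_minutes(n_items, items_per_minute=ITEMS_PER_MINUTE):
    """Estimated completion time at a constant response rate."""
    return n_items / items_per_minute


full_bank_items = ITEMS_PER_FULL_BANK * BANKS_PER_FULL_BANK_ARM
block_items = ITEMS_PER_BLOCK * CANDIDATE_BANKS

print(full_bank_items)               # PROMIS items in the full-bank arm
print(block_items)                   # PROMIS items in the block arm
print(estimated_minutes(MAX_ITEMS))  # minutes at the 150-item cap
```

At the 150-item cap the estimate is 37.5 minutes, which the text reports as about 37 minutes.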

We organized the sampling frame and item administration according to two types: full-bank and block administration. These are described in detail later. The full-bank administration provided data for evaluating dimensionality and calibrating within item banks (domains). The block administration provided data for evaluating associations among domains. Blocks of PROMIS items were administered both to general population and clinical samples. The sampling design ensured that each item was administered to at least 900 respondents from the general population (some of whom reported having chronic medical conditions) and 500 respondents with known chronic medical conditions.

Most of the response data were collected by YouGovPolimetrix (www.polimetrix.com; also see www.pollingpoint.com), a polling firm based in Palo Alto, CA. YouGovPolimetrix operates PollingPoint.com, a centralized portal that allows interested individuals to provide their views about public policy and other current issues. The respondents for a typical YouGovPolimetrix Internet survey are selected from the PollingPoint panel, a panel of more than 1 million respondents who have provided YouGovPolimetrix with their names, street addresses, e-mail addresses, and other information, and who regularly participate in online surveys. Panelists were recruited by a variety of methods, including e-random digit dialing, invitations by means of web newsletters, and Internet poll–based recruitment, where panelists have opted to participate in a survey advertised on the World Wide Web. Panel members receive modest compensation (less than $10 value) when they participate.

YouGovPolimetrix uses a sample-matching procedure to select representative samples. The sample-matching algorithm starts with a listing of all respondents in the desired or target population. Next, a random sample of the desired size is selected from the population listing (the “target sample”). Third, for each element of the target sample, the closest match is selected from the PollingPoint panel. This method has been shown to give accurate results in a wide variety of contexts, even for groups significantly underrepresented on the Internet [37]. The validity of the approach depends on the panel being sufficiently large and diverse, not on Internet usage or other types of behaviors. For PROMIS, we specified targets in terms of gender (50% female), age (20% in each of five age groups: 18–29, 30–44, 45–59, 60–74, and 75 years and older), race/ethnicity (12.3% African American and 12.5% Latino/Hispanic to match the U.S. census), and education (10% less than high school graduate). To supplement these specifications, we developed a subset representative of the U.S. general population [38].
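The three steps of sample matching can be illustrated with a minimal sketch. This is not YouGovPolimetrix's algorithm: the distance measure (a count of mismatched attributes), the matching without replacement, and all names and records below are assumptions made for illustration.

```python
import random


def draw_target_sample(population, size, seed=0):
    """Step 2: draw a random target sample from the population listing."""
    return random.Random(seed).sample(population, size)


def match_sample(target_sample, panel, keys):
    """Step 3: for each target record, pick the closest unused panel member.

    Closeness is the number of mismatched attributes (an assumed metric).
    The panel must contain at least as many members as the target sample.
    """
    def distance(a, b):
        return sum(a[k] != b[k] for k in keys)

    available = list(panel)
    matched = []
    for target in target_sample:
        best = min(available, key=lambda member: distance(target, member))
        matched.append(best)
        available.remove(best)  # match without replacement (an assumption)
    return matched


# Hypothetical records, not PROMIS data.
panel = [
    {"id": 1, "gender": "M", "age_group": "18-29"},
    {"id": 2, "gender": "F", "age_group": "60-74"},
    {"id": 3, "gender": "F", "age_group": "18-29"},
]
target = [
    {"gender": "F", "age_group": "18-29"},
    {"gender": "F", "age_group": "60-74"},
]
matched = match_sample(target, panel, keys=["gender", "age_group"])
print([member["id"] for member in matched])
```

The sketch makes the dependence on panel size concrete: the larger and more diverse the panel, the more likely each target record finds a close match.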

The PROMIS wave 1 sample included 21,133 respondents. Of these, 1,532 were recruited through PROMIS network primary research sites and the remainder (19,601) from YouGovPolimetrix's panel sample. Figure 2 describes the samples. These are broken down by source and type of respondent (clinical vs. general population). The PROMIS steering committee chose to anchor the calibration of the first wave of PROMIS items on the U.S. population (unselected for any specific health problem). Therefore, all full-bank respondents were drawn from nonclinical samples, which we refer to as “general population.” The clinical population supplied by YouGovPolimetrix for block testing was identified through a presurvey of 250,000 YouGovPolimetrix panel members. These respondents completed the PROMIS clinical form described earlier. Persons were included in the clinical sample associated with a particular condition if they reported having received the diagnosis from a physician. The general population sample included people with reported conditions. They were administered the clinical form, but their responses did not exclude them from participation in the general population sample.

YouGovPolimetrix sample data were collected using their Web site on a secure server. PROMIS network site data were collected using a Web-based platform created by PROMIS. On completion of data collection, the PROMIS Statistical Coordinating Center received de-identified data sets from YouGovPolimetrix. Full banks were administered to 7,005 individuals (6,676 from YouGovPolimetrix, 236 from University of North Carolina, and 93 from Stanford University). Block administration included 14,128 individuals (6,245 from general population and 7,883 from clinical samples). The clinical samples included persons with heart disease (n = 1,156), cancer (n = 1,754), rheumatoid arthritis (n = 557), osteoarthritis (n = 918), psychiatric illness (n = 1,193), chronic obstructive pulmonary disease (n = 1,214), spinal cord injury (n = 531), and other conditions (n = 560). Participants with comorbidities were included. Figure 2 details which of these clinical samples came from each of the PROMIS sites.
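The sample counts reported above can be reconciled against the wave 1 total with a short tally. All numbers are taken from the text; the dictionary labels are shorthand chosen for this sketch.

```python
# Counts as reported in the text.
recruitment = {"YouGovPolimetrix panel": 19601, "PROMIS research sites": 1532}

full_bank = {
    "YouGovPolimetrix": 6676,
    "University of North Carolina": 236,
    "Stanford University": 93,
}

block = {"general population": 6245, "clinical samples": 7883}

clinical = {
    "heart disease": 1156,
    "cancer": 1754,
    "rheumatoid arthritis": 557,
    "osteoarthritis": 918,
    "psychiatric illness": 1193,
    "chronic obstructive pulmonary disease": 1214,
    "spinal cord injury": 531,
    "other conditions": 560,
}

full_bank_total = sum(full_bank.values())
block_total = sum(block.values())
clinical_total = sum(clinical.values())
wave1_total = full_bank_total + block_total

print(full_bank_total, block_total, clinical_total, wave1_total)
print(sum(recruitment.values()) == wave1_total)
```

The full-bank and block arms together account for all 21,133 respondents, and the condition-specific counts sum to the 7,883 clinical respondents in the block arm.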

The overall sample (n = 21,133) was 52% female. The median age was approximately 50 years. The breakdown by age range was as follows: 18–29, 12%; 30–39, 12%; 40–49, 16%; 50–64, 32%; and 65 and older, 28%. Eighty-two percent were whites, 9% blacks, 8% multiracial, and 1% others (Asian/Pacific Islanders and Native Americans). The sample was 9% Latino/Hispanic. Highest educational attainment of the participants included 3% less than high school, 16% with terminal high school diploma, 39% with some college but no degree, 24% with a college degree, and 19% with a post-baccalaureate degree. The combined sample was used primarily for calibrating item parameters and setting the optimum location for establishing the midpoints of the score range for each calibrated item bank when it was time to derive scores. This would enable the comparison of item bank scores with general-population benchmark values.

Calibrations of scores based on IRT models yield scores in logits and typically range from around −4 to +4. Most researchers apply a linear transformation to scores (e.g., to create an approximate range of 0–100). PROMIS investigators decided that all PROMIS measures would use the T-score metric [39], in which scores have a mean of 50 and a standard deviation (SD) of 10 compared with the general population. For example, a person who has a PROMIS-pain interference score of 70 is reporting adverse pain interference 2 SDs worse than the general-population average.
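The T-score metric is a linear rescaling, sketched below. The sketch assumes the IRT score (theta) is expressed relative to a reference distribution with a known mean and SD; the function names and defaults are illustrative assumptions.

```python
def theta_to_t_score(theta, ref_mean=0.0, ref_sd=1.0):
    """Rescale an IRT score so the reference population has mean 50, SD 10."""
    return 50.0 + 10.0 * (theta - ref_mean) / ref_sd


def t_score_to_sd_units(t_score):
    """Distance of a T-score from the reference mean, in SD units."""
    return (t_score - 50.0) / 10.0


print(theta_to_t_score(0.0))      # reference-population average
print(theta_to_t_score(2.0))      # 2 SDs above the reference mean
print(t_score_to_sd_units(70.0))  # the pain interference example above
```

Because the transformation is linear, it changes only the units of the scale and leaves the ordering and relative spacing of respondents unchanged.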

The scale-setting PROMIS wave 1 general-population sample was obtained to represent the marginal distributions of race/ethnicity (white vs. black, Latino/Hispanic, others) and education (high school or less vs. more than high school) as reflected in the 2000 U.S. census [38]. The percentages by gender, age, race, and education in the 2000 census were as follows: 52% female; 22% aged 18–29 years, 32% aged 30–44 years, 24% aged 45–59 years, 14% aged 60–74 years, and 8% aged 75 years and older; 74% white, 11% black, 11% Latino/Hispanic, and 4% other; and 51% more than high school. The distribution of characteristics for the PROMIS scale-setting subsample (n = 5,239) was as follows: 57% female; 15% aged 18–29 years, 22% aged 30–44 years, 28% aged 45–59 years, 22% aged 60–74 years, and 13% aged 75 years and older; 74% white, 10% black, 11% Latino/Hispanic, and 4% other; 51% had more than a high school education.

The distribution of pain in the PROMIS wave 1 data proved highly skewed, because few people reported moderate to severe pain. We were concerned that item calibrations from the available data would be unreliable, and the full continuum of pain severity would not be precisely measured, particularly in the moderate-to-severe pain range. Therefore, we collected additional pain item responses from individuals with chronic pain. These respondents were recruited by Web site invitation in collaboration with the American Chronic Pain Association (ACPA). To be eligible, participants had to be 21 years of age or older and have at least one chronic pain condition for at least 3 months before participating in the survey. Those who met eligibility criteria provided Institutional Review Board–approved, online informed consent. The survey was posted on the Web site of the ACPA from September 2007 to March 2008.

The 967 participants responded to 47 pain interference, 42 pain behavior, and 41 pain quality items, and one global average pain intensity item through online administration (some of the 56 items in the original candidate bank were dropped based on preliminary psychometric analyses). The average age of the chronic pain sample was 48.2 years (SD = 11.1). Eighty-one percent were females, 91% were whites, 1.5% were blacks, and 5% were Latino/Hispanics. Eighty-one percent of the participants had a high school education or greater. The data were combined with wave 1 full-bank data to calculate pain item calibrations for the pain item banks.

Responses to the sleep disturbance and sleep-related impairment items were collected by the University of Pittsburgh research site as an independent research project. A total of 128 sleep disturbance/sleep-related impairment items were administered to 1,993 individuals from YouGovPolimetrix (1,259 from the general population and 734 with self-identified sleep problems). Clinical sites at Pittsburgh collected responses from 259 individuals with sleep disorders. The overall sample (n = 2,252) was 44% female. The median age was 52 years; 21% of these were 65 years and older. Eighty-two percent were whites, 13% blacks, 3% Native Americans or Alaskans, 0.4% Native Hawaiians or Pacific Islanders, and 6% others. Ten percent of the sample was Latino/Hispanic. Distribution of educational attainment was 14% with high school or less, 39% with some college, 28% with a college degree, and 20% with an advanced degree. Item-response data from the overall sample (2,252 individuals) were used for item calibration.

Data analyses were driven by a statistical analysis plan [5] for evaluating IRT modeling assumptions (unidimensionality and local dependence), IRT model fit, monotonicity, scalability, item fit, and differential item functioning (DIF). To aid decisions regarding item bank composition, statistical and psychometric results were provided to the domain teams responsible for the development of each bank. These results were discussed, and decisions were made regarding each item. Typically, a first wave of item “cuts” was made; that is, the most problematic items were eliminated, and the reduced-length item pools were subjected to follow-up analyses to help arrive at decisions regarding each item. Through this process of iterative analysis and discussion with content (domain) experts, item-by-item–level decisions were made as to whether an individual item should be (1) calibrated and included in the bank; (2) not calibrated but retained for possible future calibration (e.g., items consistent with the domain being measured but having local dependence; responses concentrated in few of the available response options); or (3) excluded from further consideration (e.g., outside of concept; problematic item wording).
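For readers unfamiliar with the graded response model named in the abstract, the sketch below computes response-category probabilities for a single item in its standard logistic form. The item parameters are hypothetical and are not PROMIS calibrations; the logistic form without a scaling constant and the function name are assumptions of this sketch.

```python
import math


def grm_category_probabilities(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be in increasing order; an item with m response
    options has m - 1 thresholds.
    """
    # Cumulative probabilities of responding in category k or higher.
    cumulative = [
        1.0 / (1.0 + math.exp(-discrimination * (theta - b)))
        for b in thresholds
    ]
    # Pad with P(lowest category or higher) = 1 and P(beyond highest) = 0.
    bounds = [1.0] + cumulative + [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


# Hypothetical five-option item (four thresholds).
probabilities = grm_category_probabilities(
    theta=0.0, discrimination=1.5, thresholds=[-2.0, -1.0, 1.0, 2.0]
)
print([round(p, 3) for p in probabilities])
print(round(sum(probabilities), 6))
```

With thresholds placed symmetrically around the respondent's score, the middle category is the most likely response and the probabilities sum to 1.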

Results

The result of the analyses described earlier was a set of 11 calibrated item banks that would support computerized adaptive testing (CAT) and development of multiple short forms of varying length [27]. A version 1.0 short form ranging from 6 to 10 items was created from each item bank. Items that represented the range of item bank content and difficulty, had high information, and no evidence of DIF were selected. PROMIS item banks and short forms available since December 2008 are listed in

Discussion

PROMIS provides item banks that offer the potential for efficient (minimizes item number without compromising reliability), flexible (enables optional use of interchangeable items), and precise (has minimal error in estimate) measurement of commonly studied PROs. We summarized the domain framework, definitions, and sampling plan that guided the development, testing, and calibration of the first (version 1.0) PROMIS item banks. Item calibrations and statistics are available on the PROMIS Web

Acknowledgments

The Patient-Reported Outcomes Measurement Information System (PROMIS) is an NIH Roadmap initiative to develop a computerized system measuring PROs in respondents with a wide range of chronic diseases and demographic characteristics. This work was funded by cooperative agreements to a Statistical Coordinating Center (Northwestern University, PI: David Cella, PhD, U02AR52177) and six Primary Research Sites (Duke University, PI: Kevin Weinfurt, PhD, U01AR52186; University of North Carolina, PI:

References (44)

  • D. Cella et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap Cooperative Group during its first two years. Med Care (2007)
  • D.A. DeWalt et al. Evaluation of item candidates: the PROMIS qualitative item review. Med Care (2007)
  • L.D. Castel et al. Content validity in the PROMIS social-health domain: a qualitative analysis of focus-group data. Qual Life Res (2008)
  • C. Christodoulou et al. Cognitive interviewing in the evaluation of fatigue items: results from the Patient-Reported Outcomes Measurement Information System (PROMIS). Qual Life Res (2008)
  • B.B. Reeve et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care (2007)
  • S.M. Haley et al. Measuring physical disablement: the contextual challenge. Phys Ther (1994)
  • A.L. Stewart et al. Physical functioning
  • I.B. Wilson et al. Linking clinical variables with health-related quality of life. A conceptual model of patient outcomes. JAMA (1995)
  • J.F. Fries et al. More relevant, precise, and efficient items for assessment of physical function and disability: moving beyond the classic instruments. Ann Rheum Dis (2006)
  • A. Glaus. Fatigue in patients with cancer: analysis and assessment (1998)
  • North American Nursing Diagnosis Association. Nursing diagnoses: definition and classification, 1997-1998 (1996)
  • A.L. Stewart et al. Health perceptions, energy/fatigue, and health distress measures. Measuring functioning and well-being: the medical outcomes study approach (1992)