Valid and Reliable Survey Instruments to Measure Burnout, Well-Being, and Other Work-Related Dimensions

A key organizational strategy for improving clinician well-being is to measure it, develop and implement interventions, and then re-measure it. A variety of dimensions of clinician well-being can be measured, including burnout, engagement, and professional satisfaction. Below is a summary of established tools to measure work-related dimensions of well-being. Each tool has advantages and disadvantages, and some are more appropriate for specific populations or settings. This information is being provided by the Research, Data, and Metrics Working Group of the National Academy of Medicine Action Collaborative on Clinician Well-Being and Resilience.

Scroll below for an overview of each validated instrument to assess work-related dimensions of well-being.

Valid and Reliable Survey Instruments to Measure Burnout

Purpose
To measure burnout in individuals who work with people (human services and medical professionals).

Format/Data Source
Maslach Burnout Inventory – Human Services Survey for Medical Personnel (MBI-HSS MP) is a 22-item survey that covers 3 areas: Emotional Exhaustion (EE), Depersonalization (DP), and low sense of Personal Accomplishment (PA). Each subscale includes multiple questions with frequency rating choices of Never, A few times a year or less, Once a month or less, A few times a month, Once a week, A few times a week, or Every day.

Date
Measure released in 1981.

Data Analysis
It is preferred to examine relationships with subscale scores as continuous variables and outcomes. Investigators often dichotomize results into burnout versus non-burnout, but there is no accepted standard definition.1 A common approach considers individuals as presenting at least one symptom of burnout if they have high scores on either the EE (total score of 27 or higher) or DP (total score of 10 or higher) subscales. Evidence indicates that high scores on these subscales can distinguish individuals with clinical burnout from those who are not burned out2 because this approach identifies individuals whose degree of burnout places them at increased risk of potentially serious personal and professional consequences.3-8 An alternative approach considers individuals to have burnout if they have a high EE score plus either a high DP score or a low PA score (PA score less than 33).1
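The two dichotomization rules described above can be sketched in code. This is a minimal illustration that assumes the three subscale totals have already been computed from licensed MBI responses; the class and function names are hypothetical and are not part of the MBI manual.

```python
from dataclasses import dataclass

# Cut-offs quoted in the text above.
EE_HIGH = 27  # Emotional Exhaustion total score of 27 or higher
DP_HIGH = 10  # Depersonalization total score of 10 or higher
PA_LOW = 33   # Personal Accomplishment total score less than 33


@dataclass
class MbiScores:
    """Subscale totals for one respondent (hypothetical container)."""
    ee: int
    dp: int
    pa: int


def has_burnout_symptom(scores: MbiScores) -> bool:
    """Common approach: a high EE score or a high DP score."""
    return scores.ee >= EE_HIGH or scores.dp >= DP_HIGH


def has_burnout_alternative(scores: MbiScores) -> bool:
    """Alternative approach: high EE plus either high DP or low PA."""
    return scores.ee >= EE_HIGH and (scores.dp >= DP_HIGH or scores.pa < PA_LOW)
```

Because there is no accepted standard definition, the rule that was used should be reported alongside any dichotomized result.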

Development and Testing
The instrument was developed following exploratory research with interview and questionnaire data, testing in a variety of health and service occupations, and factor and confirmatory data analysis. Reliability coefficients, test-retest reliability, convergent validity, and discriminant validity among human services professionals are summarized in the manual.10

Links to Outcomes or Health System Characteristics Related to Health Care Professionals
Substantial data11,12 support associations between burnout as measured using the MBI and health care related outcomes (e.g., medical error,5,7,8 malpractice,13 suboptimal patient care practices,14 physician turnover and early retirement,15,16 and lower medical knowledge17), suboptimal professionalism,4,18 and personal outcomes (e.g., alcohol abuse,19-21 suicidal ideation,3,6 and motor vehicle incidents22). From a health system characteristics perspective, associations have been found between burnout and practice setting, work hours, clerical burden, and specialty.11,23-26

Country of Origin
United States of America

Past or Validated Applications
Participant age: adults
Population: human services/helping professionals (e.g., teachers, social workers, police officers), including physicians, residents/fellows, medical students, and nurses
National benchmark data available:
US physicians: Yes 24
US residents/fellows: Yes 27
US medical students: Yes 27
General population: Yes 24
Setting: workers in human service/helping professions

Cost
Individual Report – $15; Group Report – $200. Instrument is proprietary. Permission can be obtained through www.mindgarden.com.

Notes
Multiple language translations are available
The MBI-General Survey (MBI-GS) is a 16-item assessment applicable to more general, non-social jobs as well.10
The MBI-Human Services Survey (MBI-HSS) is a 22-item assessment, applicable to human services jobs, e.g. clergy, police, therapists, social workers, medical, etc.10

Purpose
To measure burnout in any occupational group.

Format/Data Source
Oldenburg Burnout Inventory is a 16-item survey with positively and negatively framed items that covers 2 areas: exhaustion (physical, cognitive, and affective aspects) and disengagement from work (negative attitudes toward work objects, work content, or work in general).1 There are multiple questions for each of these subscales and responses are in the form of a 4 point Likert scale from strongly agree (1) to strongly disagree (4).

Date
Measure released in 2002.

Measure Item Mapping
Exhaustion: 2, 4, 5, 8, 10, 12, 14, 16
Disengagement: 1, 3, 6, 7, 9, 11, 13, 15

Data Analysis
Each burnout dimension is treated separately as a continuous variable.

Development and Testing
Developed in response to the MBI not having negatively worded items, and based on the job demands-resources model, in which job demands are primarily related to exhaustion and job resources are primarily related to disengagement.2,3 A two-factor structure has been confirmed in a sample of Dutch workers,3 Dutch physicians,1 and US workers,4 whereas a four-factor model (exhaustion, energy, disengagement, and engagement) was supported in a study of Chinese nurses.5 There is some evidence of convergent validity of the OLBI with a shortened (16-item) version of the MBI-GS in a sample of 2431 US workers4 and in a sample of Chinese nurses, though the convergent validity data suggest positively worded items should be dropped.5 In a study of 232 Greek employees, the bivariate correlation between OLBI-exhaustion and MBI-GS-emotional exhaustion was 0.6, and the bivariate correlation between OLBI-disengagement and MBI-GS depersonalization was 0.6.3 In a study of 528 South African employees working in construction, the bivariate correlation between OLBI-exhaustion and MBI-GS-emotional exhaustion was 0.6, and the bivariate correlation between OLBI-disengagement and MBI-GS depersonalization was 0.37.6

Links to Outcomes or Health System Characteristics Related to Health Care Professionals
Existing data is limited as a majority of studies have included small samples of physicians and other health care providers, and have mostly been conducted outside of the United States. Studies in Swedish nurses and other Swedish public health professionals suggest that OLBI scores predict intent of turnover and lower self-reported mastery of occupational skills.7-9 Correlations have also been reported between OLBI scores and self-rated health (n=342 Swedish medical students 10 and n=290 medical residents 11). In a longitudinal sample of 186 Swedish medical students, end of medical school OLBI-exhaustion and worries about their future endurance/competence predicted 6-10 month postgraduate OLBI-exhaustion.12

Country of Origin
Germany

Past or Validated Applications
Patient age: adults
Population: any occupational group
National benchmark data not available for US physicians, medical students, or general population.
Setting: any


Cost
$0. Instrument publicly available in appendix of article. 6

Notes
Multiple language translations are available

Purpose
To measure burnout in any occupational group.

Format/Data Source
Single-item. Stem and response items vary in publications. The following item was utilized in Dolan et al.5: “Overall, based on your definition of burnout, how would you rate your level of burnout?” Response options are (1) “I enjoy my work, I have no symptoms of burnout,” (2) “Occasionally I am under stress and I don’t always have as much energy as I once did, but I don’t feel burned out,” (3) “I am definitely burning out and have one or more symptoms of burnout, such as physical and emotional exhaustion,” (4) “The symptoms of burnout that I am experiencing won’t go away. I think about frustration at work a lot,” and (5) “I feel completely burned out and often wonder if I can go on. I am at a point where I may need some changes or may need to seek some sort of help.”

Date
Measure released in 1981.

Measure Item Mapping
N/A

Data Analysis
Often dichotomized as no symptoms of burnout (score of 2 or less) vs. 1 or more symptoms (score of 3 or more). These cut-off scores were not established based on validity evidence.
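As a small illustration, the dichotomization above can be written as a single comparison. The function name is hypothetical, and the response is assumed to be coded 1 to 5 in the order of the response options listed under Format/Data Source.

```python
def single_item_burnout(response: int) -> bool:
    """Return True for 1 or more symptoms of burnout (score of 3 or more)."""
    if response not in (1, 2, 3, 4, 5):
        raise ValueError("response must be an integer from 1 to 5")
    return response >= 3
```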

Development and Testing
In a sample of 5400 VA employees, the correlation between responses to the single item and the single MBI-EE item (item 8: “I feel burned out from my work”) was r = 0.79. Compared to the single MBI-EE item, the single item had a sensitivity of 83.2%, a specificity of 87.4%, and an AUC of 0.93.5 In a separate sample of 307 physicians, the single item correlated modestly with MBI-EE score (r = 0.64).6 In a third study, single-item responses in a sample of 308 rural physicians and advanced practice providers correlated with full MBI EE and DP domain scores (Spearman’s r = .72 and .41, p < .0001). In multivariable models, the single item predicted high EE (but neither low EE nor low/high DP) as measured by the MBI. In this sample, the 2 original MBI items (item 8: “I feel burned out from my work” and item 10: “I have become more callous toward people since I took this job”) correlated better with their respective parent subscales (Spearman’s r = .89 and .81, p < .0001). The summary from that study was that the single item predicts high levels of EE but not low EE or DP, and that it is not effective at capturing individuals who have evidence of burnout in the depersonalization or personal accomplishment domains.7

Links to Outcomes or Health System Characteristics Related to Health Care Professionals
In a study of 422 primary care physicians, single-item burnout characterization was associated with lower satisfaction, greater time pressure, poor work control, and intent to leave the medical practice on univariate analysis.8 No relationship was found between burnout and quality of care, as measured by chart review of 1419 patients. In a related study involving 426 primary care physicians, structural equation modeling found significant, small to modest path coefficients between stress, satisfaction, and single-item burnout and between single-item burnout and self-reported medical error and suboptimal patient care practices.9

Country of Origin
United States of America

Past or Validated Applications
Patient age: adults
Population: Physicians
National benchmark data not available for US physicians, medical students, or general population. Some data are available in VA primary care, including 1769 providers.5
Setting: any health care setting


Cost
$0. Publicly available.5

Purpose
To measure burnout in any occupational group.

Format/Data Source
Copenhagen Burnout Inventory is a 19-item survey with positively and negatively framed items that covers 3 areas: personal (degree of physical and psychological fatigue and exhaustion), work (degree of physical and psychological fatigue and exhaustion related to work), and client-related (or a similar term such as patient, student, etc.) burnout. There are multiple questions for each of these subscales and responses are in the form of either always, often, sometimes, seldom, and never/almost never or to a very high degree, to a high degree, somewhat, to a low degree, and to a very low degree.

Date
Measure released in 2005.

Measure Item Mapping
Overall physical and psychological fatigue: 6 items
Physical and psychological fatigue related to work: 7 items
Client-related burnout: 6 items
(Questions are to be mixed with questions on other topics to avoid stereotyped response patterns)

Data Analysis
Each dimension is separately treated as a continuous variable. The response options are recoded into scores of 100, 75, 50, 25, and 0. Next, items within the subscale are averaged, with one item reverse scored. Higher scores indicate a higher degree of burnout. The possible score range for all scales is 0-100. In one study, investigators chose a score of 50 or higher to indicate burnout as a dichotomous variable.1 In a separate study, investigators chose scores of 25 or lower, 25 to 50, and higher than 50 to categorize low, intermediate, and high burnout.2 These cut-off scores were not established based on validity evidence.
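The recoding and averaging steps above can be sketched as follows. This is an illustration only: the function name is hypothetical, and because the text does not say which item is reverse scored, the reverse-scored position is passed in by the caller.

```python
from statistics import mean

# Response options recoded to 100, 75, 50, 25, and 0, as described above.
RESPONSE_SCORES = {
    "always": 100,
    "often": 75,
    "sometimes": 50,
    "seldom": 25,
    "never/almost never": 0,
    "to a very high degree": 100,
    "to a high degree": 75,
    "somewhat": 50,
    "to a low degree": 25,
    "to a very low degree": 0,
}


def cbi_subscale_score(responses, reverse_scored=()):
    """Average the recoded items of one subscale (result ranges 0-100).

    `reverse_scored` holds the zero-based positions of reverse-scored
    items; which item that is must be taken from the instrument itself.
    """
    scores = []
    for index, response in enumerate(responses):
        score = RESPONSE_SCORES[response.strip().lower()]
        if index in reverse_scored:
            score = 100 - score  # flip the item so higher still means more burnout
        scores.append(score)
    return mean(scores)
```

For example, `cbi_subscale_score(["always", "often", "sometimes", "seldom"])` averages 100, 75, 50, and 25 and gives 62.5.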

Development and Testing
Developed with a framework that characterizes the core of burnout as fatigue and exhaustion, which are attributed to specific domains in a person’s life (personal, work-related, and client-related). In a sample of 1914 individuals from seven different workplaces CBI scales had high internal reliability, scores correlated with SF-36 scales, and scores predicted future sickness absence, intention to quit, and sleep problems.3

Links to Outcomes or Health System Characteristics Related to Health Care Professionals
Existing data is limited as a majority of studies have included small samples of physicians and other health care providers, and have mostly been conducted abroad. In terms of potential health care related outcomes, CBI scores have been associated with lower perceptions of quality of care (psychosocial care, diagnosis/therapy, quality assurance, diagnostic and therapeutic errors in a study of 1311 German surgeons),4 nurse turnover intention (in a study of 159 ICU nurses in Iran),5 self-reported sick absences (prospective study of 824 Danish workers in human service sectors),6 and sickness days, sleep problems, use of pain killers, and intention to quit work (prospective study of 1914 Danish employees in human sector).7 In terms of personal outcomes, CBI scores predicted the WHO-Five Well-Being Index score among 317 Canadian residents,8 and antidepressant treatment, especially among men (prospective study of 2936 Danish employees). From a health system characteristics perspective, associations have been found between CBI score and job strain, over-commitment, and low social support (Taiwanese health care professionals)9 and between practice setting and recent reorganization at work (598 Norwegian midwives).1

Country of Origin
Denmark

Past or Validated Applications
Patient age: adults
Population: any occupational group
National benchmark data not available for US physicians, medical students, or general population.
Setting: any


Cost
$0. Publicly available in Table S1 1 and https://nfa.dk/da/Vaerktoejer/Sporgeskemaer/Sporgeskema-til-maaling-af-udbraendthed/Copenhagen-Burnout-Inventory-CBI

Notes
Multiple language translations are available

Valid and Reliable Survey Instruments to Measure Composite Well-Being

Purpose
To measure burnout and professional fulfillment in physicians.

Format/Data Source
The Stanford Professional Fulfillment Index (PFI) is a 16-item survey that covers burnout (work exhaustion and interpersonal disengagement) and professional fulfillment. Response options are on a five-point Likert scale (“not at all true” to “completely true” for professional fulfillment items and “not at all” to “extremely” for work exhaustion and interpersonal disengagement items.)

Date
Measure published in 2018.

Measure Item Mapping
Professional fulfillment: items 1-6
Work exhaustion: items 7-10
Interpersonal disengagement: items 11-16

Data Analysis
Items are scored 0 to 4. Each dimension is treated as a continuous variable. Scale scores are calculated by averaging the item scores of all the items within the corresponding scale. Scale scores can then be multiplied by 25 to create a scale range from 0 to 100. A higher score on the professional fulfillment scale is more favorable. In contrast, higher scores on the work exhaustion or interpersonal disengagement scales are less favorable. Dichotomous burnout categories are determined from the average item score (range 0 to 4) of all 10 burnout items (work exhaustion and interpersonal disengagement), using a cut-point of 1.33. Dichotomous professional fulfillment is recommended at an average item score cut-point of >3.0.
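The scoring steps above can be sketched as a short function. The function and key names are hypothetical. Treating an average of exactly 1.33 as burnout (at or above the cut-point) is an assumption, since the text gives only the cut-point value.

```python
from statistics import mean


def score_pfi(items):
    """Score 16 PFI item responses, each 0 to 4, given in item order 1-16."""
    if len(items) != 16 or any(not 0 <= x <= 4 for x in items):
        raise ValueError("expected 16 item scores, each from 0 to 4")
    fulfillment = mean(items[0:6])     # items 1-6
    exhaustion = mean(items[6:10])     # items 7-10
    disengagement = mean(items[10:16])  # items 11-16
    overall_burnout = mean(items[6:16])  # all 10 burnout items
    return {
        "professional_fulfillment": fulfillment,
        "work_exhaustion": exhaustion,
        "interpersonal_disengagement": disengagement,
        "overall_burnout": overall_burnout,
        # Optional rescaling of a 0-4 scale score to 0-100.
        "professional_fulfillment_0_100": fulfillment * 25,
        # Assumption: the 1.33 cut-point is applied as "at or above".
        "burnout": overall_burnout >= 1.33,
        "professionally_fulfilled": fulfillment > 3.0,
    }
```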

Development and Testing
The PFI was developed for use in physicians.1 Development involved input from members of a physician wellness committee (n>30) and two national physician wellness experts. The efficacy of the PFI has been evaluated in a sample of 185 residents and 65 practicing physicians. Principal components analysis of data from this sample justified the three PFI subscales of professional fulfillment, work exhaustion, and interpersonal disengagement. In a subsample of 100 responders who had stable sleep-related impairment scores over a 2-3 week period, test-retest reliability estimates were 0.82 for professional fulfillment (α = 0.91), 0.80 for work exhaustion (α = 0.86), 0.71 for interpersonal disengagement (α = 0.92), and 0.80 for overall burnout (α = 0.92). The correlation between the PFI work exhaustion subscale score and Maslach Burnout Inventory emotional exhaustion subscale score was 0.72. The correlation between the PFI interpersonal disengagement score and Maslach Burnout Inventory depersonalization subscale score was 0.59. The correlation between the PFI professional fulfillment score and Maslach Burnout Inventory personal accomplishment subscale score was 0.46. Compared to the Maslach Burnout Inventory, the PFI burnout scale sensitivity and specificity in identifying those with burnout was 72% and 84%, respectively, and AUC was 0.85. PFI scales also correlated in the expected directions with Patient-Reported Outcomes Measurement Information System (PROMIS) sleep-related impairment, depression, and anxiety scores, and with World Health Organization Quality of Life-BREF scores. PFI scales demonstrated sufficient sensitivity to detect expected effects of a two-point (range 8-40) change in PROMIS sleep-related impairment.1

Links to Outcomes or Health System Characteristics Related to Health Care Professionals
In the study of 250 resident and practicing physicians PFI work exhaustion and interpersonal disengagement had small (r=.15 and .33, respectively) but statistically significant correlations with scores on a 4-item medical error scale (internal consistency reliability estimate α =.62). Mean medical error scale scores were higher among those physicians with burnout (as classified using the PFI) in comparison to those without burnout. The Cohen’s d effect size difference in self-reported medical errors for high versus low burnout classified using the PFI was 0.55.1

Country of Origin
USA

Past or Validated Applications
Patient age: adults
Population: physicians
Benchmark data are available for practicing U.S. physicians and residents from the authors.
Setting: any health care setting


Cost
Publicly available in article. No cost for non-profit organizations using the PFI for research or program evaluation. Cost for commercial use or use by for-profit organizations depends on application; contact the Stanford Risk Authority at [email protected].

Purpose
To identify distress in a variety of dimensions (burnout, fatigue, low mental/physical quality of life, depression, anxiety/stress).1-5

Format/Data Source
7 or 9-item instrument with yes/no response categories.

Date
Measure released in 2010.

Measure Item Mapping
N/A

Data Analysis
A total score is calculated by adding the number of ‘yes’ responses. In a sample of physicians, medical students, and US workers, every one point increase in score resulted in a step-wise increased probability of distress and risk for adverse personal and professional consequence. For the 7-item version, score range is 0 to 7, and threshold score to identify individuals in distress is 4 or higher for medical students, 5 or higher for residents, 4 or higher for practicing physicians, and 2 or higher for other US workers. In the expanded 9-item version, the original 7-items are scored in a traditional manner, with responses to meaning in work and satisfaction with work-life balance items resulting in 1 point being added or subtracted,1 resulting in a score range of -2 to 9.
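The 7-item scoring and group-specific thresholds above can be sketched as follows. The function names and group labels are hypothetical. The 9-item adjustment is omitted because the text does not specify which responses add or subtract a point.

```python
# Threshold scores for the 7-item version, as listed in the text above.
DISTRESS_THRESHOLDS = {
    "medical_student": 4,
    "resident": 5,
    "practicing_physician": 4,
    "other_us_worker": 2,
}


def wbi7_score(answers):
    """Count 'yes' responses (True) across the 7 items; range 0 to 7."""
    if len(answers) != 7:
        raise ValueError("expected 7 yes/no answers")
    return sum(1 for answer in answers if answer)


def wbi7_in_distress(answers, group):
    """Apply the threshold for the respondent's group."""
    return wbi7_score(answers) >= DISTRESS_THRESHOLDS[group]
```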

Development and Testing
The 7-item Well-Being Index (WBI) was originally designed to be used in medical students.4,5 Development involved input from experts, correlation analysis from previously administered assessments, and a multi-step validation process. After initial development in a sample of 2230 medical students, the efficacy of the WBI was confirmed in a separate sample of 2682 medical students. At a threshold score of 4 or higher, the WBI’s specificity for identifying medical students with severe distress ranged from 88-91% with sensitivity of 59-93%.4 The WBI was validated in a national sample of 7560 US residents in 2012.3 At a threshold score of 5 or higher, the index’s specificity for identifying residents with low mental QOL, high fatigue, or recent suicidal ideation was 84%. The score also stratified residents’ self-reported medical errors. The WBI was also validated in a national sample of 6994 US physicians. At a threshold score of 4 or higher, the index’s specificity for identifying physicians with low mental QOL, high fatigue, or recent suicidal ideation was 86%.2 The score also stratified career satisfaction, reported intent to leave the current practice, and self-reported medical errors. In 2014, the 7-item WBI was tested in a sample of 5392 US workers and 6880 US physicians, and the 9-item WBI was developed and tested.1 The 9-item version was created in an effort to identify individuals who were thriving, and included items exploring satisfaction with work-life integration and meaning in work, both of which may mitigate the relationship between job-related stress and psychological distress.1 The 9-item WBI predicted low and high QOL, high fatigue, recent suicidal ideation, and burnout in both samples. The area under the curve of the 7-item and the 9-item versions for identifying burnout was 0.84 and 0.85 in the physician sample, respectively.

Links to Outcomes or Health System Characteristics Related to Health Care Professionals
National studies have found associations between WBI scores and health care related outcomes (e.g., medical error, physician turnover) and personal outcomes (e.g., fatigue, recent suicidal ideation).1-5

Country of Origin
USA

Past or Validated Applications
Patient age: adults
Population: any occupational group
National benchmark data available for US physicians, residents, medical students, and general population, with national benchmarks soon available for US nurses and advanced practice providers.
Setting: any

Cost
The WBI is free for research use and for use in quality improvement efforts by nonprofit organizations. An interactive version of the index that provides personalized feedback to individuals and links to national resources is also free for individual use. The organizational version of the interactive WBI that provides individualized feedback, links to local and national resources, and organization level reports is also available but requires a fee for use. Access to the tool and information regarding cost and permission to use the tool is available at https://www.mededwebs.com/well-being-index.

Valid and Reliable Survey Instruments to Measure Depression and Suicide Risk

Purpose
To measure major depression and suicidal ideation.

Format/Data Source
The Patient Health Questionnaire-9 (PHQ-9) is the self-report component of the PRIME-MD (Primary Care Evaluation of Mental Disorders) inventory1. For each of the 9 DSM-5 (Diagnostic and Statistical Manual of Mental Disorders [Fifth Edition]) depressive symptoms, participants indicate whether, during the previous 2 weeks, the symptom has bothered them “not at all,” for “several days,” for “more than half the days,” or “nearly every day.” Suicidal ideation is screened for with item 9 of the Patient Health Questionnaire–9 (PHQ-9) (i.e., “Thoughts that you would be better off dead, or hurting yourself in some way” over the past 2 weeks). Positive response to this item increases the cumulative risk for a suicide attempt and suicide completion over the next year by 10- and 100-fold, respectively2.

Date
Measure released in 1999.

Measure Item Mapping
One item each for:

Interest
Mood
Sleep
Energy
Appetite
Self-worth
Concentration
Psychomotor slowing or activation
Suicidal ideation

Data Analysis
The PHQ-9 is most often used as a continuous measure, with scores for individual items summed to produce a composite depressive symptom score between 0-27. Cut points of 5, 10, 15 and 20 represent mild, moderate, moderately severe and severe levels of depressive symptoms. The PHQ-9 can also be used as a diagnostic algorithm to make a probable diagnosis of major depressive disorder (MDD)3.
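The summing and cut points above can be sketched as follows. The function names are hypothetical. Each item is assumed to be coded 0 to 3 in the order of the response options, which is consistent with the 0-27 composite range, and the label returned for totals below 5 is a placeholder rather than a term from the text.

```python
# Cut points from the text above, checked from highest to lowest.
SEVERITY_CUT_POINTS = [
    (20, "severe"),
    (15, "moderately severe"),
    (10, "moderate"),
    (5, "mild"),
]


def phq9_total(item_scores):
    """Sum the 9 item scores (each 0-3) into a composite score of 0-27."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("expected 9 item scores, each 0, 1, 2, or 3")
    return sum(item_scores)


def phq9_severity(total):
    """Map a composite score to the severity level named in the text."""
    for cut_point, label in SEVERITY_CUT_POINTS:
        if total >= cut_point:
            return label
    return "below mild cut point"  # placeholder label, not from the text
```

This sketch covers only the continuous score and its severity bands; it does not implement the diagnostic algorithm for probable MDD.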

Development and Testing
PHQ-9 scores ≥10 have a sensitivity and specificity of 88% for major depressive disorder3,4. The PHQ-9 performs similarly across sex3,5, age6 and racial/ethnic groups7-9. Importantly for longitudinal assessments, the PHQ-9 shows high sensitivity to change over time5,10. Compared to other available depression measures, the PHQ-9 is relatively short and demonstrates good validity, sensitivity and specificity in both clinical and non-clinical populations4,11. Further, the PHQ-9 is the primary depression instrument utilized by large health care providers such as the U.S. Department of Veterans Affairs and the National Health Service, and is the instrument that web users are taken to after a Google search for “clinical depression.”12,13 (https://www.blog.google/products/search/learning-more-about-clinical-depression-phq-9-questionnaire/). The widespread use of the PHQ-9 ensures a range of normative data for comparison.

Links to Outcomes or Health System Characteristics Related to Health Care Professionals
In physicians, PHQ-9 scores have been associated with medical errors, work hours and productivity10,14,15.

Country of Origin
United States of America

Past or Validated Applications
Patient age: Adolescents, adults, and older adults
Population: any occupational group
From meta-analyses, comparison data are available for the general population, medical students (N=10,386),16 and resident physicians (N=3,756)17
Setting: any

Cost
$0. Available at: http://www.phqscreeners.com/sites/g/files/g10016261/f/201412/PHQ-9_English.pdf.

Notes
Multiple language translations are available.

Alternate Depression Measure
The abbreviated 2-item PHQ-2 instrument has been developed for situations where administration of the full PHQ-9 is not feasible. The PHQ-2 is composed of the first two items of the PHQ-9 (assessing low mood and loss of interest) and subjects receive a score between 0 and 3 on each item18. With a composite score range between 0-6, scores of ≥2 or ≥3 have been considered a positive screen for depression depending on the study. A positive PHQ-2 screen for depression correlates well with positive screens on the PHQ-9 and other longer depression instruments19. Further, the PHQ-2 has generally shown moderate to good sensitivity to detect clinical depression. However, the specificity of PHQ-2 has been variable across studies and low in many studies20,21. Thus, the PHQ-2 is most accurately viewed as a screening tool for depression rather than a diagnostic instrument22.
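A PHQ-2 screen can be sketched as the sum of the two item scores compared with a study-specific cut-off. The function name and the default cut-off of 3 are assumptions made for illustration, and a composite score at or above the cut-off is treated as a positive screen.

```python
def phq2_positive_screen(item1, item2, cutoff=3):
    """Return True when the 0-6 composite score reaches the chosen cut-off."""
    if item1 not in (0, 1, 2, 3) or item2 not in (0, 1, 2, 3):
        raise ValueError("each item score must be 0, 1, 2, or 3")
    if cutoff not in (2, 3):
        raise ValueError("cut-offs of 2 and 3 are the ones described in the text")
    return item1 + item2 >= cutoff
```

Leaving the cut-off as a parameter reflects that studies differ on which value defines a positive screen.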