Mazur LM, Mosaly PR, Moore C, Marks L. Association of the Usability of Electronic Health Records With Cognitive Workload and Performance Levels Among Physicians. JAMA Netw Open. 2019;2(4):e191709. doi:10.1001/jamanetworkopen.2019.1709
Are improvements in the functionality of an electronic health record system associated with physicians' cognitive workload and performance?
In this quality improvement study, physicians assigned to perform tasks in an enhanced electronic health record system demonstrated statistically significantly lower cognitive workload; those who used a system with enhanced longitudinal tracking appropriately managed statistically significantly more abnormal test results than physicians assigned to use a baseline health record.
Improvements in the functionality of electronic health records appear to be associated with better physician cognitive workload and performance; these findings suggest that next-generation systems should eliminate non–value-added interactions.
Current electronic health record (EHR) user interfaces are suboptimally designed and may be associated with excess cognitive workload and poor performance.
To assess the association between the usability of an EHR system for the management of abnormal test results and physicians’ cognitive workload and performance levels.
Design, Setting, and Participants
This quality improvement study was conducted in a simulated EHR environment. From April 1, 2016, to December 23, 2016, residents and fellows from a large academic institution were enrolled and allocated to use either a baseline EHR (n = 20) or an enhanced EHR (n = 18). Data analyses were conducted from January 9, 2017, to March 30, 2018.
The EHR with enhanced usability placed previously identified critical test results for patients who did not appear for a scheduled follow-up evaluation into a dedicated folder and provided policy-based decision support instructions for next steps. The baseline EHR displayed all patients with abnormal or critical test results in a general folder and provided no decision support instructions for next steps.
Main Outcomes and Measures
Cognitive workload was quantified subjectively using NASA–Task Load Index and physiologically using blink rates. Performance was quantified according to the percentage of appropriately managed abnormal test results.
Of the 38 participants, 25 (66%) were female. The 20 participants allocated to the baseline EHR compared with the 18 allocated to the enhanced EHR demonstrated statistically significantly higher cognitive workload as quantified by blink rate (mean blinks per minute, 16 vs 24; difference, –8 [95% CI, –13 to –2]; P = .01). The baseline group showed statistically significantly poorer performance compared with the enhanced group, who appropriately managed 16% more abnormal test results (mean [SD] performance, 68% [19%] vs 98% [18%]; difference, –30% [95% CI, –40% to –20%]; P < .001).
Conclusions and Relevance
Relatively basic usability enhancements to the EHR system appear to be associated with better physician cognitive workload and performance; this finding suggests that next-generation systems should strip away non–value-added EHR interactions, which may help physicians eliminate the need to develop their own suboptimal workflows.
The usability of electronic health records (EHRs) continues to be a major concern.1-3 Usability challenges include suboptimal design of interfaces that have confusing layouts and contain either too much or too little relevant information as well as workflows and alerts that are burdensome. Suboptimal usability has been associated with clinician burnout and patient safety events, and improving the usability of EHRs is an ongoing need.4,5
A long-standing challenge for the US health care system has been to acknowledge and appropriately manage abnormal test results and associated missed or delayed diagnoses.6-11 The unintended consequences of these shortcomings include missed and delayed cancer diagnoses and associated negative clinical outcomes (eg, 28% of women did not receive timely follow-up for abnormal Papanicolaou test results8; 28% of women requiring immediate or short-term follow-up for abnormal mammograms did not receive timely follow-up care9). Even in the EHR environment, with alerts and reminders in place, physicians continue to often inappropriately manage abnormal test results.12-21 Some key remaining barriers to effective management of test results are suboptimal usability of existing EHR interfaces and the high volume of abnormal test result alerts, especially less-critical alerts that produce clutter and distract from the important ones.22,23 In addition, few organizations have explicit policies and decision support systems in their EHR systems for managing abnormal test results, and many physicians have developed processes on their own.24-26 These issues are among the ongoing reasons to improve the usability of the EHR-based interfaces for the evaluation and management of abnormal test results.
We present the results of a quality improvement study to assess a relatively basic intervention to enhance the usability of an EHR system for the management of abnormal test results. We hypothesized that improvements in EHR usability would be associated with improvements in cognitive workload and performance among physicians.
This research was reviewed and approved by the institutional review board committee of the University of North Carolina at Chapel Hill. Written informed consent was obtained from all participants. The study was performed and reported according to the Standards for Quality Improvement Reporting Excellence (SQUIRE) guideline.27
Invitations to participate in the study were sent to all residents and fellows in the school of medicine at a large academic institution, clearly stating the need for experience with using the Epic EHR software (Epic Systems Corporation) in reviewing test results to undergo the study’s simulated scenarios. A $100 gift card was offered as an incentive for participation. Potential participants were given an opportunity to review and sign a consent document, which included information on study purpose, goals, procedures, and risks and rewards as well as the voluntary nature of participation and the confidentiality of data. Recruited individuals had the right to discontinue participation at any time. Forty individuals were recruited to participate, 2 of whom were excluded (eg, numerous cancellations), leaving 38 evaluable participants (Table 1).
From April 1, 2016, to December 23, 2016, 38 participants were enrolled and prospectively and blindly allocated to a simulated EHR environment: 20 were assigned to use a baseline EHR (without changes to the interface), and 18 were assigned to use an enhanced EHR (with changes intended to enhance longitudinal tracking of abnormal test results in the system) (Figure). Abnormalities requiring an action included new abnormal test results and previously identified abnormal test results for patients who did not show up (without cancellation) for their scheduled appointment in which the findings would be addressed. The new abnormal test results included a critically abnormal mammogram (BI-RADS 4 and 5) and a Papanicolaou test result with high-grade squamous intraepithelial lesion as well as noncritical results for rapid influenza test, streptococcal culture, complete blood cell count, basic metabolic panel, and lipid profile, among others. The previously identified critical test results that required follow-up included abnormal mammogram (BI-RADS 4 and 5), Papanicolaou test result with high-grade squamous intraepithelial lesion, chest radiograph with 2 × 2-cm lesion in the left upper lobe, pulmonary function test result consistent with severe restrictive lung disease, and pathologic examination with biopsy finding of ascending colon consistent with adenocarcinoma.
The simulated scenarios were iteratively developed and tested by an experienced physician and human factors engineer (C.M. and L.M.) in collaboration with an Epic software developer from the participating institution. The process included functionality and usability testing and took approximately 12 weeks to complete. The experimental design was based on previous findings that attending physicians use the EHR to manage approximately 57 test results per day over multiple interactions.22,23 Given that residents often manage a lower volume of patients, the present study was designed such that participants were asked to review a total of 35 test results, including 8 or 16 abnormal test results evenly distributed between study groups, in 1 test session. Per organizational policies and procedures, participants were expected to review all results, acknowledge and follow up on abnormal test results, and follow up on patients with a no-show status (without cancellation) for their scheduled appointment aimed at addressing their previously identified abnormal test result. The patient data in the simulation included full medical records, such as other clinicians' notes, previous tests, and other visits or subspecialist coverage.
The baseline EHR (without enhanced interface usability), currently used at the study institution, displayed all new abnormal test results and previously identified critical test results for patients with a no-show status (did not show up for or cancelled their follow-up appointment) in a general folder called Results and had basic sorting capabilities. For example, it moved all abnormal test results with automatically flagged alerts to the top of the in-basket queue; flagged alerts were available only for test results with discrete values. Thus, critical test results for mammography, Papanicolaou test, chest radiograph, pulmonary function test, and pathologic examination were not flagged or sortable in the baseline EHR. The baseline EHR included patient status (eg, completed the follow-up appointment, no show); however, that information needed to be accessed by clicking on the visit or patient information tab located on available prebuilt views within each highlighted result.
The enhanced EHR (with enhanced interface usability) automatically sorted all previously identified critical test results for patients with a no-show status in a dedicated folder called All Reminders. It also clearly displayed information regarding patient status and policy-based decision support instructions for next steps (eg, “No show to follow-up appointment. Reschedule appointment in Breast Clinic”).
The intervention was developed according to the classic theory of attention.28 This theory indicates that cognitive workload varies continuously during the course of performing a task and that the changes of cognitive workload may be attributed to the adaptive interaction strategies of the operator exposed to task demands (eg, baseline or enhanced usability).
The NASA–Task Load Index (NASA-TLX) is a widely applied and valid tool used to measure workload,29-34 including the following 6 dimensions: (1) mental demand (How much mental and perceptual activity was required? Was the task easy or demanding, simple or complex?); (2) physical demand (How much physical activity was required? Was the task easy or demanding, slack or strenuous?); (3) temporal demand (How much time pressure did you feel with regard to the pace at which the tasks or task elements occurred? Was the pace slow or rapid?); (4) overall performance (How successful were you in performing the task? How satisfied were you with your performance?); (5) frustration level (How irritated, stressed, and annoyed [compared with content, relaxed, and complacent] did you feel during the task?); and (6) effort (How hard did you have to work, mentally and physically, to accomplish your level of performance?).
At the end of the test session, each participant performed 15 separate pairwise comparisons of the 6 dimensions (mental demand, physical demand, temporal demand, overall performance, frustration level, and effort) to determine the relevance (and hence the weight) of each dimension for a given session for a participant. Next, for each dimension, participants marked a workload score from low (corresponding to 0) to high (corresponding to 100) on a scale separated by 5-point marks on the tool. The composite NASA-TLX score for each session was obtained by multiplying each dimension weight by the corresponding dimension score, summing across all dimensions, and dividing by 15.
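As a worked illustration, the weighted composite described above can be sketched in a few lines of Python. The dimension names, weights, and ratings below are hypothetical, not study data:

```python
# Sketch of the NASA-TLX composite calculation described above.
# Weights come from 15 pairwise comparisons: each dimension's weight is
# the number of comparisons it "won" (0-5), so the weights sum to 15.

DIMENSIONS = ["mental", "physical", "temporal", "performance", "frustration", "effort"]

def nasa_tlx_composite(weights: dict, scores: dict) -> float:
    """Composite = sum(weight_i * score_i) / 15, with scores on a 0-100 scale."""
    assert sum(weights.values()) == 15, "weights must come from 15 pairwise comparisons"
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / 15

# Hypothetical participant: pairwise-comparison weights and raw 0-100 ratings.
weights = {"mental": 5, "physical": 0, "temporal": 3, "performance": 2, "frustration": 1, "effort": 4}
scores = {"mental": 80, "physical": 20, "temporal": 60, "performance": 40, "frustration": 50, "effort": 70}
print(nasa_tlx_composite(weights, scores))  # 990 / 15 = 66.0
```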
Using eye-tracking technology (Tobii X2-60 screen mount eye tracker; Tobii), we quantified physiological workload with validated methods based on changes in blink rate.35,36 Eye closures lasting from 100 to 400 milliseconds were coded as a blink. The validity (actual blink or loss of data) was later confirmed by visual inspection by the expert researcher on our team (P.R.M.) who specializes in physiological measures of cognitive workload. Decreased blink rate has been found to occur in EHR-based tasks requiring more cognitive workload.37 The fundamental idea is that blink rate slows down under visual task demands that require more focused attention and working memory load, but this association might vary with the type of visual task demands.38-40 For each participant, the time-weighted mean blink rate measured during the participant’s review of all abnormal test results was calculated and then considered for data analysis.
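The blink-coding rule and the time-weighted averaging described above can be sketched as follows; all durations and rates below are illustrative, not study measurements:

```python
# Sketch of the blink-coding rule described above: eye closures lasting
# 100-400 ms are counted as blinks; shorter events are treated as noise
# and longer ones as data loss or deliberate closure.

def count_blinks(closure_durations_ms):
    """Count eye closures that qualify as blinks (100-400 ms)."""
    return sum(1 for d in closure_durations_ms if 100 <= d <= 400)

def blink_rate_per_min(closure_durations_ms, task_seconds):
    """Blinks per minute over a single task segment."""
    return count_blinks(closure_durations_ms) / (task_seconds / 60.0)

def time_weighted_mean_rate(segments):
    """segments: list of (rate_per_min, duration_seconds), one per reviewed result."""
    total = sum(dur for _, dur in segments)
    return sum(rate * dur for rate, dur in segments) / total

closures = [80, 150, 220, 500, 310, 95, 400]  # hypothetical closure durations in ms
print(count_blinks(closures))  # 150, 220, 310, and 400 ms qualify -> 4
```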
For each participant, performance was quantified as the percentage of (new or previously identified) abnormal test results that were appropriately acted on (with possible scores ranging from 0% to 100%). Appropriate action on an abnormal test result was defined as the study participant ordering (compared with not ordering) a referral for further diagnostic testing (eg, breast biopsy for a mass identified on an abnormal mammogram) to a subspecialty clinic (eg, breast clinic). In addition, per the policy and procedures of the institution in which the study took place, if patients missed their appointment for follow-up on critical test results, the participants were expected to contact (compared with not contacting) schedulers to reschedule follow-up care. We also quantified the total amount of time that participants took to complete each simulated scenario.
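A minimal sketch of this performance metric follows; the field names and cases are hypothetical, chosen only to illustrate the percentage calculation:

```python
# Sketch of the performance score described above: the percentage of
# abnormal test results on which the expected action was taken.

def performance_pct(results):
    """results: list of dicts with 'expected_action' and 'action_taken' keys."""
    acted = sum(1 for r in results if r["action_taken"] == r["expected_action"])
    return 100.0 * acted / len(results)

# Hypothetical cases: referrals ordered and no-show appointments rescheduled.
cases = [
    {"expected_action": "order_referral", "action_taken": "order_referral"},
    {"expected_action": "reschedule", "action_taken": "none"},
    {"expected_action": "order_referral", "action_taken": "order_referral"},
    {"expected_action": "reschedule", "action_taken": "reschedule"},
]
print(performance_pct(cases))  # 3 of 4 appropriate -> 75.0
```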
Fatigue can affect perceived and physiological workload and performance and thus can confound study results.41-43 Because of the possible confounding association of fatigue, participants were asked to evaluate their own state of fatigue immediately before each simulated session using the fatigue portion of the Crew Status Survey.44 The fatigue assessment scale included these levels: 1 (fully alert, wide awake, or extremely peppy), 2 (very lively, or responsive but not at peak), 3 (okay, or somewhat fresh), 4 (a little tired, or less than fresh), 5 (moderately tired, or let down), 6 (extremely tired, or very difficult to concentrate), and 7 (completely exhausted, unable to function effectively, or ready to drop). The Crew Status Survey has been tested in real and simulated environments and has been found to be both reliable and able to discriminate between fatigue levels.44,45
On the basis of the anticipated rate of appropriately identified abnormal test results in the literature12-21 and the anticipated magnitude of the association of the enhanced EHR, we required a sample size of 30 participants, each reviewing 35 test results, to achieve 80% power to detect a statistically significant difference in cognitive workload and performance. Specifically, we performed sample size calculations at α = .05, assuming that we could detect a mean (SD) difference of 10 (10) in NASA-TLX scores, a mean (SD) difference of 5 (10) in blink rate, and a mean (SD) difference of 10% (15%) in performance.
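For readers who want to see the shape of this calculation, a normal-approximation sketch of the standard two-sample sample-size formula is below (α = .05 two-sided, 80% power). It illustrates the textbook formula with the three assumed differences from the text; it is not the authors' exact computation:

```python
# Normal-approximation sketch of a two-sample sample-size calculation.
import math
from statistics import NormalDist

def n_per_group(diff, sd, alpha=0.05, power=0.80):
    """Participants per group needed to detect a mean difference `diff` given SD `sd`."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for two-sided alpha = .05
    z_b = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    d = diff / sd                              # standardized effect size
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)

# The three assumed mean (SD) differences stated above:
print(n_per_group(10, 10))  # NASA-TLX score
print(n_per_group(5, 10))   # blink rate
print(n_per_group(10, 15))  # performance
```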
Before data analyses, we completed tests for normality using the Shapiro-Wilk test and equal variance using the Bartlett test for all study variables (cognitive workload, performance, and fatigue). Results indicated that all assumptions to perform parametric data analysis were satisfied (normality: all P > .05; equal variance: all P > .05).
We conducted a 2-sample t test to assess the association of enhanced usability of the EHR interface to manage abnormal test results with physician cognitive workload and performance. All data analyses were conducted from January 9, 2017, to March 30, 2018, using JMP 13 Pro software (SAS Institute Inc). Statistical significance level was set at 2-sided P = .05, with no missing data to report.
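The pooled (equal-variance) two-sample t statistic used in this analysis can be computed with the Python standard library alone. The data below are hypothetical blink rates, not study measurements:

```python
# Stdlib-only sketch of the pooled two-sample t test; a pooled variance
# estimate is appropriate here because equal variances were confirmed
# with the Bartlett test.
from statistics import mean, variance
import math

def pooled_t(a, b):
    """Return the pooled two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical blink rates (blinks per minute) for two groups:
baseline = [14, 18, 15, 17, 16]
enhanced = [22, 26, 23, 25, 24]
t, df = pooled_t(baseline, enhanced)
print(t, df)  # -8.0 with 8 degrees of freedom
```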
Of the 852 eligible residents and fellows, 38 (5%) participated. Twenty-five participants (66%) were female and 13 (34%) were male. Thirty-six (95%) were residents and 2 (5%) were fellows (Table 1). Descriptive statistics of cognitive workload and performance are provided in Table 2.
No statistically significant difference was noted in perceived workload between the baseline EHR and enhanced EHR groups (mean NASA-TLX score, 53 vs 49; difference, 4 [95% CI, –5 to 13]; P = .41). A statistically significantly higher cognitive workload, as shown by the lower mean blink rate, was found in the baseline EHR group compared with the enhanced EHR group (mean blinks per minute, 16 vs 24; difference, –8 [95% CI, –13 to –2]; P = .01).
A statistically significantly poorer performance was found in the baseline EHR group compared with the enhanced EHR group (mean [SD] performance, 68% [19%] vs 98% [18%]; difference, –30% [95% CI, –40% to –20%]; P < .001). The difference was mostly attributable to the review of patients with a no-show status for a follow-up appointment (Table 2). No difference between the baseline and enhanced EHR groups was noted in the time to complete simulated scenarios (mean time, 238 vs 236 seconds; difference, 2 seconds [95% CI, –49 to 52]; P > .05). No statistically significant difference was noted in fatigue levels between the baseline and enhanced EHR groups (mean [SD] fatigue level, 2.7 [1.4] vs 2.8 [0.9]; difference, –0.1 [95% CI, –0.8 to 0.7]; P = .84).
The rate of appropriately managing previously identified critical test results of patients with a no-show status in the baseline EHR was 37% (34 of 90 failure opportunities) compared with 77% (62 of 81 failure opportunities) in the enhanced EHR. The rate of appropriately acknowledging new abnormal test results in the baseline EHR group was 98% (118 of 120 failure opportunities; 2 participants did not acknowledge a critical Papanicolaou test result) compared with 100% (108 of 108 failure opportunities) in the enhanced EHR group.
Participants in the enhanced EHR group indicated physiologically lower cognitive workload and improved clinical performance. The magnitude of the association of EHR usability with performance we found in the present study was modest, although many such improvements tend to have substantial value in the aggregate. Thus, meaningful usability changes can and should be implemented within EHRs to improve physicians’ cognitive workload and performance. To our knowledge, this research is the first prospective quality improvement study of the association of EHR usability enhancements with both physiological measure of cognitive workload and performance during physicians’ interactions with the test results management system in the EHR.
The enhanced EHR was more likely to result in participants reaching out to patients and schedulers to ensure appropriate follow-up. Physicians who used the baseline EHR were more likely to treat the EHR (not the patient) by duplicating the referral, rather than to reach out to patients and schedulers to find out the issues behind the no-show. In poststudy conversations with participants, most indicated a lack of awareness about policies and procedures for managing patients with a no-show status and justified their duplication of orders as safer medical practice. This result seems to be in line with findings from real clinical settings, suggesting that few organizations have explicit policies and procedures for managing test results and most physicians developed processes on their own.25,26
The result from the baseline EHR group is in line with findings from real clinical settings that indicated physicians did not acknowledge abnormal test results in approximately 4% of cases.19,20 The optimal performance in the enhanced EHR group is encouraging.
No significant difference was noted between the baseline and enhanced EHR groups in the time to complete simulated scenarios or in perceived workload, as quantified by the global NASA-TLX score or by each dimension, although scores trended lower in the enhanced group (Table 2). The time to complete simulated scenarios and the NASA-TLX scores in the enhanced EHR group were possibly elevated because it was the participants' first time interacting with the enhanced usability features.
Overall, past and present research suggests that challenges remain in ensuring the appropriate management of abnormal test results. According to a study, 55% of clinicians believe that EHR systems do not have convenient usability for longitudinal tracking of and follow-up on abnormal test results, 54% do not receive adequate training on system functionality and usability, and 86% stay after hours or come in on the weekends to address notifications.46
We propose several interventions based on our findings to improve the proper management of abnormal test results. First, use the existing capabilities and usability features of the EHR interfaces to improve physicians’ cognitive workload and performance. Similar recommendations were proposed by other researchers.3,5,17-21,46-48 For example, the critical test results for patients with a no-show status should be flagged (ie, clearly visible to the clinician) indefinitely until properly acted on in accordance with explicit organizational policies and procedures. Second, develop explicit policies and procedures regarding the management of test results within EHRs, and implement them throughout the organization, rather than having clinicians develop their own approaches.25,26,49 For example, Anthony et al49 studied the implementation of a critical test results policy for radiology that defined critical results; categorized results by urgency and assigned appropriate timelines for communication; and defined escalation processes, modes of communication, and documentation processes. Measures were taken for 4 years from February 2006 to January 2010, and the percentage of reports adhering to the policies increased from 29% to 90%.49 Third, given that the work is being done in an electronic environment, seize the opportunities to use innovative simulation-based training sessions to address the challenges of managing test results within an EHR ecosystem.50-54 Fourth, establish a regular audit and feedback system to regularly give physicians information on their performance on managing abnormal test results.55-57
This study focused on a particular challenge (ie, the management of abnormal test results), but many other interfaces and workflows within EHRs can be similarly enhanced to improve cognitive workload and performance. For example, there is a need to improve reconciliation and management of medications, orders, and ancillary services. The next generation of EHRs should optimize usability by stripping away non–value-added EHR interactions, which may help eliminate the need for physicians to develop suboptimal workflows of their own.
This study has several limitations, and thus caution should be exercised in generalizing the findings. First, the results are based on 1 experiment with 38 residents and fellows from a teaching hospital performing a discrete set of scenarios in an artificial setting. Larger studies could consider possible confounding factors (eg, specialty, training levels, years of EHR use, attendings or residents) and more accurately quantify the association of usability with cognitive workload and performance. Second, performing the scenarios in the simulated environment, in which the participants knew that their work was going to be assessed, may have affected participants’ performance (eg, more or less attentiveness and vigilance as perceived by being assessed or by the possibility of real harm to the patient). To minimize this outcome, all participants were given the chance to discontinue participation at any time and were assured that participant-specific findings would remain confidential. None of the participants discontinued participation in the study, although 2 participants were excluded because they were not able to meet the scheduling criteria. Third, we acknowledge that the cognitive workload and performance scores were likely affected by the setting (eg, simulation laboratory and EHR) and thus might not reflect the actual cognitive workload and performance in real clinical settings. A laboratory setting cannot totally simulate the real clinical environment, and some activities cannot be easily reproduced (eg, looking up additional information about the patient using alternative software, calling a nurse with a question about a particular patient, or a radiologist or laboratory technician calling physicians and verbally telling them about abnormal images). We also recognize that the enhanced usability was not optimal, as it was designed and implemented within the existing capabilities of the EHR environment used for training purposes.
Fourth, the intervention might have manipulated both the ease of access to information through a reorganized display and learning because it provided a guide to action by clearly showing information on patient status and policy-based decision support instructions for next steps. Future research could more accurately quantify the association of usability and learning with cognitive workload and performance. Nevertheless, the intervention provided the necessary basis to conduct this study. All participants were informed about the limitations of the laboratory environment before the study began.
Relatively basic usability enhancements to EHR systems appear to be associated with improving physician management of abnormal test results while reducing cognitive workload. The findings from this study support the proactive evaluation of other similar usability enhancements that can be applied to other interfaces within EHRs.
Accepted for Publication: February 14, 2019.
Published: April 5, 2019. doi:10.1001/jamanetworkopen.2019.1709
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2019 Mazur LM et al. JAMA Network Open.
Corresponding Author: Lukasz M. Mazur, PhD, Division of Healthcare Engineering, Department of Radiation Oncology, University of North Carolina at Chapel Hill, PO Box 7512, Chapel Hill, NC 27514 (firstname.lastname@example.org).
Author Contributions: Drs Mazur and Mosaly had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Mazur, Mosaly, Moore.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Mazur, Mosaly.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Mazur, Mosaly.
Obtained funding: Mazur.
Administrative, technical, or material support: Mazur, Moore.
Supervision: Mazur, Marks.
Conflict of Interest Disclosures: Dr Marks reported grants from Elekta, Accuray, Community Health, and the US government during the conduct of the study, as well as possible royalties for him, his department, and its members from a software patent. No other disclosures were reported.
Funding/Support: This study was supported by grant R21HS024062 from the Agency for Healthcare Research and Quality.
Role of the Funder/Sponsor: The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the official views of the Agency for Healthcare Research and Quality.
Additional Contributions: We are grateful for the time and effort of the research participants.