Effective use of electronic medical record technology requires examination of the communication approaches of both care providers and patients.
Objective: To determine the agreement between patient-reported symptoms of chest pain, dyspnea, and cough and the documentation of these symptoms by physicians in the electronic medical record.
Methods: Symptoms reported on patient-provided information forms between January 1, 2006, and June 30, 2006, were compared with those identified by natural language processing of the text of clinical notes from care providers. Terms that represent the 3 symptoms were used to search clinical notes electronically with subsequent manual identification of the context (eg, affirmative, negated, family history) in which they occurred. Results were reported using positive and negative agreement, and kappa statistics.
Results: Symptoms reported by 1119 patients age 18 years or older were compared with the nonnegated terms identified in their clinical notes. Positive agreement was 74, 70, and 63 for chest pain, dyspnea, and cough, while negative agreement was 78, 76, and 75, respectively. Kappa statistics were 0.52 (95% confidence interval [CI] = 0.44, 0.60) for chest pain, 0.46 (95% CI = 0.37, 0.54) for dyspnea, and 0.38 (95% CI = 0.28, 0.48) for cough. Positive agreement was higher for older men (P >.05), and negative agreement was higher for younger women (P >.05).
Conclusions: We found discordance between patient self-report and documentation of symptoms in the medical record. This discordance has important implications for research studies that rely on symptom information for patient identification and may have clinical implications that must be evaluated for potential impact on quality of care, patient safety, and outcomes.
(Am J Manag Care. 2008;14(8):530-539)
Three symptoms (chest pain, dyspnea, and cough) reported by patients were compared with symptoms identified by natural language processing of the text of clinical notes from care providers.
The electronic medical record documentation of symptoms differed from the symptoms self-reported by patients.
Research studies that rely on using symptom information must take this discordance into account.
This discordance also may have a potential impact on quality of care, patient safety, and outcomes.
Physical symptoms account for half of all outpatient visits in the United States1 and frequently go undiagnosed.2 The verbal characterization of symptoms, conveyed by the patient and recorded by care providers, is central to the practice of clinical medicine, and increasing importance is attached to patient-centered clinical care.3 With the increasing adoption of the electronic medical record (EMR), the free text of the clinical history can now be subjected to automated analysis in ways that are impossible or uneconomical with paper-based records.4-6 A common and costly7 example of a symptomatic condition in which history taking is central to management is chronic stable angina pectoris,8 a manifestation of coronary heart disease, which is the leading cause of death in the Western world.9 In addition to its mortality burden, angina is associated with serious morbidity such as myocardial infarction and heart failure. Optimal methods of identifying patients with stable angina remain unclear. Many patients with typical symptoms are not diagnosed with angina,10 and age, sex, and ethnicity may influence both physicians’ recommendations for diagnostic testing such as coronary angiography and the resulting International Classification of Diseases codes.11-14 Our preliminary findings indicate that natural language processing (NLP) of the free text of the EMR identifies patients with chronic angina pectoris15 and heart failure15 who were missed by traditional diagnostic coding approaches. Natural language processing comprises a range of computational techniques aimed at extracting useful information from unstructured text. In the context of the EMR, NLP offers a promising way to automate the collection of a richer set of information for quality improvement and safety monitoring that would otherwise require manual chart abstraction.
Early identification of patients at risk for myocardial infarction is critical to its prevention and improved prognosis,16-18 and is possible using patient-reportable information.19 Because the diagnosis of angina pectoris relies on the patient conveying the symptoms to the physician,10 the ambiguous nature of verbal communication as well as the nature of coronary artery disease (ie, its presentation varies according to patient race and sex) can make the diagnosis challenging. These challenges can lead to inconsistency and incompleteness in both the diagnosis and the information recorded in the EMR.11 Currently, the nature of the processes that lead to the creation of the EMR is poorly understood; however, prior research indicates that these processes have critical implications for clinical care and may have a significant impact on patient outcomes.
The primary objective of the current study was to determine whether there was disagreement between patient-reported symptoms of chest pain, dyspnea, and cough and the documentation of these symptoms by care providers in the EMR. Although this study focuses on heart disease, which constitutes one of the top national healthcare priority areas,20 our findings have important implications for any condition whose diagnosis and treatment rely on the verbal presentation of symptoms.
Sources of Symptom Documentation
For this study we identified 2 primary sources of symptom information: patient-provided information forms and clinical notes. Both are part of the Mayo Clinic EMR.
Electronic Medical Record and Clinical Notes. The Mayo Clinic maintains an EMR for each inpatient and outpatient. A large part of the EMR consists of clinical notes, which represent transcriptions by trained medical transcriptionists of the dictations recorded by care providers after each contact with the patient. Clinical notes have been in use electronically since 1994; however, all services at the Mayo Clinic switched to electronic notes in 2005. The content of these notes complies with a nationally accepted standard, Health Level 7 Clinical Document Architecture, and consists of standard sections including chief complaint, history of present illness, review of systems, and impression/report/plan, among others. This standard is used by most major EMR vendors in the United States.
Current Visit Information Forms. Many healthcare providers, including the Mayo Clinic, ask their patients to fill out forms detailing their prior health and social history, current symptoms, medications, and allergies. The Mayo Clinic’s form is reproduced in the eAppendix (available at www.ajmc.com). Figure 1 shows item 14 of this patient-provided information form, which asks the patient to indicate whether he or she has ever experienced each symptom listed on the form. The patient’s responses are then captured in a structured format by scanning the form into a database.
We used a subset of this database, consisting of patient-provided information entered during the 6 months between January 1, 2006, and June 30, 2006, as a convenience sample of 121,891 patients who filled out the form at least once during that period. The patients in this sample were not restricted to any geographic location and represent the general Mayo Clinic ambulatory and hospitalized populations. We selected records that indicated either a positive or a negative response to questions about chest pain, chest pressure, shortness of breath/dyspnea, and cough. (These codes are internal identifiers provided here for reference and have no relation to any standardized nomenclature of medical concepts.) The analysis presented here is based on a combined chest pain symptom that includes both the chest pain and chest pressure responses.
Manual Verification of Symptoms
We manually examined clinical notes containing evidence of chest pain, dyspnea, and cough according to the procedure illustrated in Figure 2. For each of the symptoms, we randomly selected 200 patients who marked the symptom on at least 1 of their forms and 200 patients who did not mark the symptom on any of their forms. No restrictions other than age ≥18 years were applied. Only records of patients who had 1 or more clinical notes dictated during the study period (January 1, 2006, through June 30, 2006) were used in this study.
Each clinical note was searched automatically to identify and electronically mark the terms representing each of the symptoms and their orthographic variants and synonyms using search keywords arranged into natural language queries (Figure 3). The keywords and methods were similar to those previously reported.15 For example, a natural language query for “chest pain” identifies portions of clinical note text where 1 of the terms describing PAIN (eg, pressure) either precedes or follows 1 of the terms describing the LOCATION (eg, chest). Thus, this query will find either “chest pressure” or “pressure in the chest.” Subsequently, the text of the notes was manually examined by 2 nurse abstractors to determine the context in which each term appeared. The range of possible context labels that the abstractors could choose is displayed in column 1 of Table 1. The “conditional” context label was selected when the term was mentioned in a conditional context (eg, “I recommended nitroglycerin if he should develop chest pain”). Terms manually identified as “negated,” “family history,” “conditional,” or “unknown” were excluded from subsequent analysis.
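As a rough illustration of the order-independent proximity query described above, the Python sketch below reports a match whenever a PAIN term occurs within a few tokens before or after a LOCATION term, and then drops mentions carrying one of the excluded context labels. The term lists, the 3-token window, and the function names are assumptions made for illustration; they are not the study's keyword lists (Figure 3), and in the study the context labels were assigned manually by the abstractors, not by code.

```python
import re

# Illustrative term lists only (assumed); not the study's keyword lists from Figure 3.
PAIN_TERMS = {"pain", "pressure", "discomfort", "ache"}
LOCATION_TERMS = {"chest"}
# Context labels that the text says were excluded from analysis.
EXCLUDED_CONTEXTS = {"negated", "family history", "conditional", "unknown"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def chest_pain_matches(text: str, window: int = 3) -> list[tuple[int, int]]:
    """Return (pain_index, location_index) token pairs in which a PAIN term
    occurs within `window` tokens before OR after a LOCATION term."""
    tokens = tokenize(text)
    pain_idx = [i for i, t in enumerate(tokens) if t in PAIN_TERMS]
    loc_idx = [i for i, t in enumerate(tokens) if t in LOCATION_TERMS]
    return [(pain, loc) for pain in pain_idx for loc in loc_idx
            if 0 < abs(pain - loc) <= window]


def retained_mentions(mentions: list[dict]) -> list[dict]:
    """Keep mentions whose (manually assigned) context label is not excluded."""
    return [m for m in mentions if m["context"] not in EXCLUDED_CONTEXTS]


print(chest_pain_matches("Patient reports chest pressure."))  # [(3, 2)]
print(chest_pain_matches("He felt pressure in the chest."))   # [(2, 5)]
```

Matching on token distance rather than on a fixed phrase is what lets a single query cover both "chest pressure" and "pressure in the chest"; the window size trades recall against spurious matches across unrelated phrases.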
In addition to identification of the context in which the query terms appeared, the abstractors manually identified all symptoms of chest pain, dyspnea, and cough in a random sample of 100 clinical notes. These manually identified symptoms were compared with those identified with automated natural language queries to determine their sensitivity.
Angina refers to a complex of symptoms, 1 of which may be chest pain or discomfort. Similarly, orthopnea and paroxysmal nocturnal dyspnea (PND) refer to specific kinds of shortness of breath. The patient-provided information form used for this study contains a question that covers this type of dyspnea (see “awakened with shortness of breath” in Figure 1). For a sensitivity analysis, we added the terms “angina,” “orthopnea,” and “PND” to the natural language queries to determine whether these related terms had a measurable effect on the agreement between symptoms reported by patients and those documented by care providers.
For the primary analysis of symptom-reporting consistency, data were analyzed in terms of positive and negative agreement rather than sensitivity and specificity because neither the patient self-report nor the clinical notes could be considered a perfect criterion standard. Traditional kappa statistics are sensitive to imbalances in the marginal totals of 2 × 2 comparisons.21 Positive and negative agreement measures have been proposed as a way to ensure the correct interpretation of kappa values22 and have been used to assess the agreement between patient-reported information and the medical record.23 The computation of these measures is illustrated in Table 2, a 2 × 2 table of the counts of patients whose responses were concordant or discordant with the symptoms found in their clinical notes. Positive agreement is twice the number of concordant positive responses (a) divided by the total number of patients (N) plus the difference between the concordant positive and concordant negative counts (a - d): $P_{\text{pos}} = 100 \times \frac{2a}{N + (a - d)}$. Negative agreement is twice the number of concordant negative responses (d) divided by N minus that same difference: $P_{\text{neg}} = 100 \times \frac{2d}{N - (a - d)}$.22 Because N = a + b + c + d, these are equivalent to $100 \times \frac{2a}{2a + b + c}$ and $100 \times \frac{2d}{2d + b + c}$, respectively. In addition, we report the false-negative rate, which here is the proportion of patients who reported a symptom that did not appear in the physician’s note: $\text{FNR} = 100 \times \frac{c}{a + c}$. For the assessment of the reliability of negation identification by the NLP system, we report standard sensitivity and specificity instead of positive and negative agreement because we used a manually created reference standard. Stratified analyses by sex and age subgroup were also performed.
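The agreement formulas can be checked with a small Python sketch. It assumes a Table 2 layout consistent with the false-negative rate definition (a = symptom on the form and in the note, b = in the note only, c = on the form only, d = in neither) and adds Cohen's kappa for comparison. The class and method names are hypothetical, and the counts in the usage line are invented for illustration, not study data.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Patients cross-classified by the two symptom sources (assumed layout):
    a = form and note, b = note only, c = form only, d = neither."""
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def positive_agreement(self) -> float:
        # 100 x 2a / [N + (a - d)], equivalent to 100 x 2a / (2a + b + c)
        return 100 * 2 * self.a / (self.n + (self.a - self.d))

    def negative_agreement(self) -> float:
        # 100 x 2d / [N - (a - d)], equivalent to 100 x 2d / (2d + b + c)
        return 100 * 2 * self.d / (self.n - (self.a - self.d))

    def false_negative_rate(self) -> float:
        # Share of patient-reported symptoms absent from the note: 100 x c / (a + c)
        return 100 * self.c / (self.a + self.c)

    def kappa(self) -> float:
        # Cohen's kappa: observed agreement corrected for the chance agreement
        # implied by the marginal totals of the 2 x 2 table.
        n = self.n
        p_obs = (self.a + self.d) / n
        p_exp = ((self.a + self.b) * (self.a + self.c)
                 + (self.c + self.d) * (self.b + self.d)) / n ** 2
        return (p_obs - p_exp) / (1 - p_exp)


t = TwoByTwo(a=30, b=5, c=15, d=50)  # hypothetical counts
print(round(t.positive_agreement(), 1), round(t.negative_agreement(), 1),
      round(t.false_negative_rate(), 1), round(t.kappa(), 2))
# prints: 75.0 83.3 33.3 0.59
```

Reporting positive and negative agreement alongside kappa shows which cell drives a low kappa: here the discordance comes mostly from c (symptoms on the form but not in the note), which lowers positive agreement more than negative agreement.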
Of the 121,891 patients who filled out a patient-provided information form during 6 months between January 1, 2006, and June 30, 2006, 6569 patients (5.39%; 95% confidence interval [CI] = 5.26, 5.52) reported chest pain, 6166 patients (5.06%; 95% CI = 4.94, 5.18) reported chest pressure, 13,924 patients (11.42%; 95% CI = 11.24, 11.60) reported dyspnea, and 11,670 patients (9.57%; 95% CI = 9.41, 9.74) reported cough. Combining chest pain and chest pressure as synonymous terms yielded 10,518 patients (8.63%; 95% CI = 8.47, 8.79) who reported either chest pain or pressure.
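The prevalence intervals above are consistent with a normal-approximation (Wald) interval for a single proportion. The article does not name its interval method, so this is an assumption; the hypothetical `wald_ci` helper below also reproduces the query sensitivity intervals reported in the validation subsection when the upper bound is capped at 100%.

```python
import math


def wald_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (percent, lower, upper) for a Wald 95% CI, clipped to [0, 100]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * max(0.0, p - half_width), 100 * min(1.0, p + half_width)


# Chest pain on the patient form: 6569 of 121,891 patients
print([round(x, 2) for x in wald_ci(6569, 121_891)])  # [5.39, 5.26, 5.52]
# Chest pain mentions found by the automated queries: 31 of 34
print([round(x) for x in wald_ci(31, 34)])  # [91, 82, 100]
```

With very large denominators such as 121,891 the Wald interval is adequate; for small samples near 100% (31 of 34) it overshoots and must be clipped, which is why score-based intervals such as the Wilson interval are often preferred in that setting.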
Random sampling resulted in a study population of 1119 patients, with 373 patients in the chest pain group, 391 patients in the dyspnea group, 337 patients in the cough group, and 18 patients (1.6%) in more than 1 group. Of these 1119 patients, 499 (44.6%) were male, 127 (11.3%) were between age 18 and 34 years, 185 (16.5%) were between age 35 and 49 years, 337 (30.1%) were between age 50 and 64 years, and 470 (42.0%) were age 65 years or older. Twelve percent of the clinical notes for the 1119 patients were marked as a “Hospital Admission” or “Hospital Dismissal” note, thus representing an estimate of the notes from the hospitalized population. Only 6 patients filled out more than 1 patient-provided information form during the study period. Overall, the clinical notes in this study sample represent a variety of clinical specialties including primary care (20%), cardiovascular services (13%), physical medicine and rehabilitation (5%), hematology (4%), neurology (4%), endocrinology (4%), surgery (4%), psychology/psychiatry (4%), gastroenterology (3%), and emergency medicine (2%), among other less prevalent specialties.
Validation of Natural Language Queries
The distribution of contexts illustrated in Table 1 shows that the terms denoting chest pain, dyspnea, and cough occur in negated contexts between 18% and 29% of the time. These observations are consistent with those of Chapman et al24; in their study, 27% of the findings and diseases were manually identified as negated in the process of creating a reference standard. We examined the accuracy of the NLP methodology for identifying the context (negated vs affirmative) in which symptoms were mentioned in the free text of clinical reports. Complete results are presented in Table 3. The sensitivity of the NLP negation algorithm was 83% (95% CI = 78, 88), 84% (95% CI = 80, 87), and 88% (95% CI = 84, 90) for cough, dyspnea, and chest pain, respectively. The specificity was 91% (95% CI = 90, 93), 93% (95% CI = 91, 94), and 92% (95% CI = 91, 94) for cough, dyspnea, and chest pain, respectively. The kappa values were 0.69 (95% CI = 0.63, 0.74) for cough, 0.74 (95% CI = 0.71, 0.78) for dyspnea, and 0.78 (95% CI = 0.75, 0.81) for chest pain.
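The article does not describe the internals of its negation algorithm. The Python sketch below is a deliberately simplified rule in the spirit of the trigger-and-scope approach associated with Chapman et al, not the system evaluated here: it assumes a small hypothetical trigger list, a few termination words that close the negation scope, and a fixed 5-token window, and then scores predictions against manual labels with "negated" as the positive class, as in Table 3.

```python
import re

# Hypothetical, much-reduced trigger and termination lists (assumed for illustration).
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")
TERMINATORS = {"but", "however", "although"}
WINDOW = 5  # assumed number of tokens before the term that a trigger can reach


def is_negated(sentence: str, term: str) -> bool:
    """Flag the first occurrence of `term` as negated if a trigger appears
    within WINDOW tokens before it and no termination word intervenes."""
    text = sentence.lower()
    pos = text.find(term.lower())
    if pos < 0:
        return False
    tokens = re.findall(r"[a-z]+", text[:pos])
    # A termination word closes the negation scope: keep only tokens after the last one.
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in TERMINATORS:
            tokens = tokens[i + 1:]
            break
    window_text = " " + " ".join(tokens[-WINDOW:]) + " "
    return any(f" {trigger} " in window_text for trigger in NEGATION_TRIGGERS)


def sensitivity_specificity(predicted: list[bool], reference: list[bool]) -> tuple[float, float]:
    """Score predictions against manual labels, treating 'negated' as positive."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    return tp / (tp + fn), tn / (tn + fp)


print(is_negated("Patient denies chest pain.", "chest pain"))             # True
print(is_negated("No fever, but has had a cough for 3 days.", "cough"))  # False
```

The termination step matters in clinical text: without it, "No fever, but has had a cough" would mark the cough as negated, a typical source of false positives for window-based rules.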
In the random samples of 100 notes for each of the 3 symptoms, the abstractors identified 34 mentions of chest pain, 46 mentions of dyspnea, and 24 mentions of cough. Of the 34 chest pain mentions, 31 were also identified by the natural language queries, yielding 91% (95% CI = 82, 100) sensitivity. Of the 46 dyspnea mentions, 45 were identified by the queries, yielding 98% (95% CI = 94, 100) sensitivity. All 24 mentions of cough were also identified by the automated queries (100% sensitivity).
Agreement Between Clinical Notes and Patient-provided Information Forms
The results are summarized in Table 4. Overall, the positive agreement for chest pain, dyspnea, and cough was 74, 70, and 63, respectively, and the negative agreement was 78, 76, and 75, respectively. The kappa values were 0.52 (95% CI = 0.43, 0.60) for chest pain, 0.46 (95% CI = 0.37, 0.54) for dyspnea, and 0.38 (95% CI = 0.28, 0.48) for cough. Additional analysis was carried out within different sex and age strata to determine whether the concordance and discordance are influenced by these factors. Positive agreement was slightly lower for female patients across all 3 symptoms, and tended to be lower for younger subgroups. Negative agreement varied and tended to be lower for males and older subgroups. Kappa values ranged from 0.44 to 0.55 for chest pain, 0.27 to 0.52 for dyspnea, and 0.31 to 0.56 for cough.
Including nonnegated mentions of angina resulted in identification of 5 additional patients who self-reported chest pain and 3 additional patients who did not, yielding positive agreement of 76 and negative agreement of 75. Including nonnegated mentions of orthopnea and PND resulted in identifying 1 additional patient who did not self-report dyspnea, yielding positive agreement of 70 and negative agreement of 75.
Distribution of Clinical Specialties for Discordant Results
We determined the distribution of clinical services, reflecting the clinical specialty of the care providers, for the discordant results for each of the 3 symptoms (Table 5). The distributions were computed by counting the number of clinical notes originating in each of the top 10 most prevalent clinical services. Consistent with the overall distribution of clinical services in our sample, cardiovascular services and primary care/family medicine were the 2 most prevalent specialties for the patients in the discordant groups. The majority of the notes for patients reporting chest pain that was not documented in the EMR were distributed among primary care, cardiovascular services, and endocrinology. A similar distribution was found for dyspnea. For cough, the majority of the notes were distributed among primary care, cardiovascular services, and physical medicine and rehabilitation services. The majority of the notes for patients who had documentation of chest pain in the EMR but did not report it on patient-provided information forms were distributed among primary care, psychology/psychiatry, hematology, and cardiovascular services. A similar distribution was found for cough, with a large proportion of notes from the hematology service. For dyspnea, the majority of the notes originated from primary care and cardiovascular services.
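The tabulation described above can be sketched in Python as follows. The sketch assumes that each note is available as a (patient identifier, clinical service) pair and that percentages are taken over all notes of the patients in one discordant group; the article does not state the denominator or whether the top 10 services were ranked within each group, and the function name and example data are hypothetical.

```python
from collections import Counter


def service_distribution(notes, patient_ids, top_n=10):
    """Share of clinical notes per service for one group of patients.

    notes: iterable of (patient_id, service) pairs, one per clinical note.
    patient_ids: set of patients in one discordant group.
    Returns up to top_n (service, percent) pairs, most frequent first,
    with percentages taken over all notes of the group.
    """
    counts = Counter(service for pid, service in notes if pid in patient_ids)
    total = sum(counts.values())
    return [(service, 100 * k / total) for service, k in counts.most_common(top_n)]


# Hypothetical notes; patients 1 and 2 form a "reported on form, absent from note" group.
notes = [(1, "Primary care"), (1, "Cardiovascular"), (2, "Primary care"), (3, "Hematology")]
for service, pct in service_distribution(notes, {1, 2}):
    print(f"{service}: {pct:.1f}%")
# Primary care: 66.7%
# Cardiovascular: 33.3%
```

Counting notes rather than patients, as the text describes, means that patients with many notes weigh more heavily in the distribution.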
Electronic medical records offer a way to make improvements in the quality and safety of patient care by addressing the information challenge of organizing and making patient charts accessible and interoperable across healthcare providers.25 The primary and most important use of the EMR is to facilitate and streamline care delivery; however, its secondary uses for clinical research and quality and safety assurance also are important.26 From the standpoint of the primary use of the EMR, we currently do not have the data to determine whether the discordances between the symptoms reported by patients and those documented by care providers had any significant clinical consequences. However, we do present evidence that effective secondary uses of the EMR require examination of the communication approaches of care providers and patients.
Our findings indicate substantial discordance between patient reporting and care provider documentation of symptoms. The 2 sources may complement each other, which has implications for clinical studies and quality measurements that rely on the medical record for identification of symptoms. For example, to identify and recruit all eligible participants with specific symptoms for clinical studies, it may be necessary to use the information reported by patients on self-entry forms in addition to other sources such as the clinical notes. Both sources of information may also be needed for applications such as postmarketing medication safety surveillance, where alarming trends in symptoms must be detected early to prevent more serious adverse drug reactions. The current study is a first step toward understanding the implications of symptom documentation practices for both primary and secondary uses of the EMR.
Validation of Natural Language Queries
The results in Table 3 indicate that NLP is a valid tool for identification of symptoms that occur in negated contexts in the free text of clinical reports. These results are promising because they show that the NLP methods used in this pilot study may be applied retrospectively to an existing cohort of patients with known outcomes to determine whether there was an association between the outcomes and symptom documentation. We also found that the methods used to identify the mentions of chest pain, dyspnea, and cough are highly sensitive. Automated natural language queries missed chest “tightness,” “sense of bruising in the lateral and posterior chest wall,” “anginal symptoms,” and “dyspneic.” Capturing these types of cases would require only minor modifications to the algorithms used for term identification. Inclusion of the terms “angina,” “orthopnea,” and “PND” resulted in very minor changes in the positive and negative agreement, not affecting the conclusions of the primary analysis. It is likely that these more specialized terms are used in conjunction with more generic terms such as “chest pain” and “dyspnea” and thus do not yield additional patients.
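To illustrate the kind of minor modification meant here, the Python sketch below uses stem-based regular expressions so that morphological variants such as "dyspneic" and descriptors such as "tightness" are caught in either word order. The patterns and the function name are assumptions for illustration, not the study's queries, and the sketch does no negation handling.

```python
import re

# Illustrative patterns only (assumed); not the queries used in the study.
PATTERNS = {
    # Stem match covers "dyspnea", "dyspneic", and the British "dyspnoea".
    "dyspnea": re.compile(r"\bdyspn(?:o)?e\w*", re.IGNORECASE),
    # Chest descriptor within up to 3 intervening words, in either order.
    "chest pain": re.compile(
        r"\bchest\b(?:\W+\w+){0,3}?\W+(?:pain|pressure|tightness|discomfort)\b"
        r"|\b(?:pain|pressure|tightness|discomfort)\b(?:\W+\w+){0,3}?\W+chest\b",
        re.IGNORECASE,
    ),
    "angina": re.compile(r"\bangin(?:a|al)\b", re.IGNORECASE),
}


def symptoms_mentioned(text: str) -> list[str]:
    """Return the sorted names of all symptom patterns found in `text`."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


print(symptoms_mentioned("She was dyspneic on exertion."))                  # ['dyspnea']
print(symptoms_mentioned("Tightness in the chest and anginal symptoms."))  # ['angina', 'chest pain']
```

Broader stems raise sensitivity at some cost in precision, so any such change would need to be rechecked against the manually annotated notes.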
Agreement Between Clinical Notes and Patient-provided Information Forms
In a previous study of the Mayo Clinic EMR, St. Sauver et al showed that positive reports of cardiovascular disease risk factor information (blood pressure; triglycerides; cholesterol; history of heart rhythm, heart valve, and arterial problems) by patients were largely inaccurate, while negative reports were unlikely to be noted on the medical record by care providers, thus showing a low level of agreement between patient-reported risk factors and physician documentation.23 The study by St. Sauver et al did not, however, address symptom reporting and documentation, which was outside the scope of their research questions. For laboratory results and past medical diagnoses, patient self-report is a secondary source of information and the physician-documented medical record is the primary source (which may have contributed to the low agreement found by St. Sauver et al23). For symptoms, the situation is reversed: the patient is the primary source and the care provider is the secondary source. The results of the current study are consistent with those reported by St. Sauver et al and contribute to a more complete picture of the clinical process and its documentation.
Although we found discordance between patient-reported and physician-reported information, the causes of such discordance can only be hypothesized. Discordance may reflect the nature of the patient-physician interaction or other factors such as clinical specialty. For example, for discordance in chest pain reporting and documentation, one might expect the majority of clinical notes to originate from noncardiovascular services. Our data show that although this is true, a relatively large proportion of the notes (24%) originated from cardiovascular services. The results are similar for dyspnea and cough. The majority of the notes for patients who reported chest pain not documented in the EMR originated in 3 services: cardiovascular, endocrinology, and primary care. Direct examination of the patient-physician interaction would be necessary to determine the reasons for the discordance; however, the reasons may differ by specialty. Furthermore, the discordance also may reflect the specifics of the documentation system. The Mayo Clinic uses an integrated EMR where all notes and forms (including the patient-provided information forms) from all services for a given patient are available to any clinician working with the patient. Clinicians may decide to avoid repeating a symptom in their clinical note that is already documented elsewhere. Thus, the nature and the consequences of the interaction between the individual characteristics of the documentation systems, the care providers, and the patients warrant further investigation within the framework of patient-centered care.
Of interest are those instances where the patient reported a symptom not documented in the record (ie, false negatives); 31% of patients with chest pain, 38% with dyspnea, and 45% with cough did not have a positive mention of the symptom in the clinical notes. We could hypothesize that the outcomes of such patients may differ from those for whom symptoms were documented in the EMR. Conversely, when the patient does not report a symptom documented in the EMR, such discordance may reflect differential elicitation of symptoms by care providers by specialty. As these considerations remain speculative, further study is needed to examine the reasons for discordances and to determine whether these affect clinical outcomes. The present study shows that the discordances exist, which has implications for clinical research that relies on the EMR to ascertain symptoms.
Limitations and Strengths
The generalizability of the study depends on the availability of the EMR; however, the adoption of EMRs across the United States is growing.27 Many medical centers already have their patient notes in electronic form, and systems for accessing the information in the notes can be constructed even without a fully integrated EMR system. Although the data sources used in this study are specific to the Mayo Clinic, collecting patient-reported information before a visit and physician documentation of the visit are standard healthcare practices, and similar resources are in use at other institutions across the United States. Thus, our results are likely to generalize to other institutions that are equipped with an EMR and can guide the improvement of existing EMR systems by suggesting new avenues for improved elicitation and capture of patient-specific information central to patient care and research.
In this study, we did not examine whether agreement was associated with patients’ literacy level or proficiency in the English language. These data were not available for this study, but we recognize that they are important variables to consider because the forms require a certain level of knowledge of the English language. Care providers’ demographic characteristics (as well as the interaction of these characteristics with patient characteristics) also may be important predictors of concordance and discordance in documentation. These characteristics were not available for the current study but will be considered in future work.
The use of the state-of-the-art EMR environment at the Mayo Clinic is a distinct strength of this study, enabling an investigation of the discrepancies between patient-reported symptoms and provider documentation. This study presents a novel approach that relies on automated and thus easily scalable investigation of the free text contained in provider documentation; this approach can be used in future large-scale studies of the interaction between the patient and the care provider. Although some natural language term variants (particularly those due to misspellings and nonstandard descriptions) are likely to be missed by the automated natural language queries used in this project, our methodology is highly sensitive for identification of chest pain, dyspnea, and cough.
We thank Ellen Koepsell and Diane Batzel for their dedicated efforts in annotating clinical notes.
Author Affiliations: From the Department of Pharmaceutical Care and Health Systems, University of Minnesota (SVP), Minneapolis; the Department of Research and Evaluation, Kaiser Permanente (SJJ), Pasadena, CA; and the Department of Health Sciences Research (CGC) and the Division of Cardiovascular Diseases (VLR), Mayo Clinic, Rochester, MN.
Funding Source: This work was supported by Public Health Service grants RO1-72435, GM14321, and AR30582, and the NIH Roadmap Multidisciplinary Clinical Research Career Development Award Grant (K12/NICHDHD49078).
Author Disclosure: The authors (SVP, SJJ, CGC, VLR) report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.
Authorship Information: Concept and design (SVP, SJJ, CGC); acquisition of data (SVP, VLR); analysis and interpretation of data (SVP, SJJ, VLR); drafting of the manuscript (SVP, VLR); critical revision of the manuscript for important intellectual content (SVP, SJJ, CGC); statistical analysis (SVP, VLR); provision of study materials or patients (SVP); obtaining funding (SVP); administrative, technical, or logistic support (SVP, CGC); and supervision (SVP, CGC, VLR).
Address correspondence to: Serguei V. Pakhomov, PhD, Department of Pharmaceutical Care and Health Systems, University of Minnesota, 308 Harvard St, SE, 7-125F Weaver-Densford Hall, Minneapolis, MN 55401.
1. Hing E, Cherry DK, Woodwell DA. National Ambulatory Medical Care Survey: 2004 summary. Adv Data. 2006;(374):1-33.
2. Lamberg L. New mind/body tactics target medically unexplained physical symptoms and fears. JAMA. 2005;294(17):2152-2154.
3. Committee on Quality of Health Care in America, Institute of Medicine. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: Institute of Medicine; 2001.
4. Sager N, Lyman M, Bucknall C, Nhan N, Tick LJ. Natural language processing and the representation of clinical data. J Am Med Inform Assoc. 1994;1(2):142-160.
5. Friedman C. A broad-coverage natural language processing system. In: Overhage JM, ed. Proceedings of the 2000 AMIA Annual Symposium. Bethesda, MD: American Medical Informatics Association; 2000:270-274.
6. Melton GB, Hripcsak G. Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc. 2005;12(4):448-457.
7. Javitz HS, Ward MM, Watson JB, Jaana M. Cost of illness of chronic angina. Am J Manag Care. 2004;10(11 suppl):S358-S369.
8. Hemingway H, McCallum A, Shipley M, Manderbacka K, Martikainen P, Keskimaki I. Incidence and prognostic implications of stable angina pectoris among women and men. JAMA. 2006;295(12):1404-1411.
9. Thom T, Haase N, Rosamond W, et al; American Heart Association Statistics Committee and Stroke Statistics Subcommittee. Heart disease and stroke statistics