Objective: Little is known about how concordance between patient self-report and medical record data varies with medical organization type. Given discrepancies in quality of care received across patient cohorts and organizations, it is important to understand the degree to which concordance metrics are robust across organization types. We tested whether concordance between patient self-report and medical record data would vary with medical organization type, controlling for patient demographic characteristics, health status, and domain of medical care.
Study Design: This observational study included 1270 patients sampled from 39 West Coast medical organizations with at least 1 of the following conditions: diabetes, ischemic heart disease, asthma or chronic obstructive pulmonary disease, or low back pain.
Methods: Medical records and patient self-report were used to measure 50 items grouped into 4 conceptual domains: diagnosis, clinical services delivered, counseling and referral, and medication use. We evaluated the concordance between ambulatory medical record and patient survey data. We conducted multivariate logistic regressions to test the impact of medical organization type (medical groups vs independent practice associations), controlling for patient characteristics and domain of care, on 5 concordance measures.
Results: Independent practice associations were associated with worse agreement, survey specificity, and medical record sensitivity, and better medical record specificity compared with medical groups.
Conclusions: The medical record and patient survey do not measure quality comparably across organization types. We recommend continued efforts to improve survey data collection across different patient populations and to improve the quality of clinical data.
(Am J Manag Care. 2007;13(part 1):289-296)
Concordance between patient self-report and medical record data varies
domain of medical care. Differences across medical organizations in data quality may affect
receipt of indicated services, and (3) appropriate risk adjustment. Problems with medical records affect the quality of care as well as the accuracy with which it can be measured. We recommend that personnel considering where next to invest in quality improvement activities give serious attention to strategies to
In 1996, the Pacific Business Group on Health (PBGH) collected survey data from 30 308 adults from California, Washington, and Oregon who received care during the prior year from 1 of 60 medical organizations.
In 1998, we surveyed 3656 patients who had responded to the baseline survey in 1996 and indicated that they had 1 of 4 chronic study conditions (ischemic heart disease, diabetes, asthma or chronic obstructive pulmonary disease, or chronic low back pain; response rate 63%). The 1998 self-administered survey queried patients about diagnoses and healthcare services received over a 2-year period. Along with the 1998 mailing, subjects also received an invitation to participate in medical record abstraction and IRB-approved consent materials (response rate 54%).
We developed a medical record abstraction tool to collect items representing the aspects of care under study and guidelines with explicit criteria to code items. Nurse abstractors experienced in medical record abstraction and clinical practice successfully completed an intensive training and passed abstraction tests at the end of the training period and throughout the fieldwork.
Abstractors pursued records of all visits with all key healthcare providers, including records of primary care providers and key specialists for the study conditions noted in the claims/encounter data provided by participating medical organizations. Abstractions took place on site, and at each site abstractors verified with office staff that they had complete medical information, including any data that might be stored separately from the "hard chart," such as laboratory study results stored electronically. Records of encounters discovered during abstraction though not previously noted by claims/encounter data were located and abstracted as well.
In all, complete medical records were abstracted for 1270 patient survey respondents from 39 medical organizations. A total of 698 patients' records were not abstracted or were only partially abstracted due to medical practice closures, inability to locate records, or organization or patient study withdrawal. To assess interrater reliability, we compared the performance of 11 pairs of abstractors who abstracted components of process measures from the medical records of 54 unique patients. Concordance between abstractors was excellent with no significant difference noted in overall process scores, and with an aggregate .87 κ score across process measures.
Based on a survey of medical directors, organizations were classified as medical groups (MGs) or independent practice associations (IPAs). Medical directors were prompted to think of an MG as an organization with shared overhead, such as rent and administrative support staff, and to think of an IPA as an organization with physicians in independent office practice who affiliate for purposes of contracting with managed care plans.
We used the κ statistic, which corrects for chance agreement, to evaluate agreement between data sources at the item level and overall. We also examined the sensitivity (percent true positives detected) and specificity (percent true negatives detected) of each of the 2 data sources. Based on the hypothesis that the patient survey is a better data source for certain items while the medical record is a better source for others, we evaluated each data source by calculating sensitivity and specificity of the patient survey using the medical record as the gold standard, and of the medical record using the patient survey as the gold standard. Throughout, "sensitivity and specificity of the survey" refers to estimates that use the medical record as the gold standard, and "sensitivity and specificity of the medical record" refers to estimates that use the survey as the gold standard.
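As an illustration of the measures just described, the quantities can be written out for a single 2 × 2 cross-classification of the 2 data sources. The cell labels a, b, c, and d are notation introduced here for the sketch and do not appear in the study: a counts cases where both sources say yes, b where only the survey says yes, c where only the medical record says yes, and d where both say no, with n = a + b + c + d.

```latex
\begin{aligned}
p_o &= \frac{a + d}{n}, \qquad
p_e = \frac{(a+b)(a+c) + (c+d)(b+d)}{n^2}, \qquad
\kappa = \frac{p_o - p_e}{1 - p_e} \\[4pt]
\text{survey sensitivity} &= \frac{a}{a + c}, \qquad
\text{survey specificity} = \frac{d}{b + d}
\quad \text{(medical record as gold standard)} \\[4pt]
\text{record sensitivity} &= \frac{a}{a + b}, \qquad
\text{record specificity} = \frac{d}{c + d}
\quad \text{(survey as gold standard)}
\end{aligned}
```

Because the same 4 cells feed all 5 quantities, switching the gold standard only changes which margin serves as the denominator.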
Items were grouped into 4 domains conceptualized as part of the larger study to represent important components of the process of care: diagnosis, clinical services delivered, counseling and referrals, and medication use.
We calculated concordance, sensitivity, and specificity at the item level, the domain level, and overall for all items combined. Item-level analyses were based on unique item-patient dyads, classifying agreement and disagreement based on what was documented by the 2 data sources for each individual item with each unique patient as the unit of analysis. For example, for the calculation of survey sensitivity, if the medical record indicated that an eligible patient had a diabetic foot examination during the study time window, this would be classified as a "1" (true positive) if the patient self-report agreed, or "0" (false negative) if the patient self-report disagreed.
For domain-level and overall analyses, we combined patient-item dyads, using the dyad as the unit of analysis. In other words, we aggregated all dyads into a single 2x2 table to calculate the overall concordance, sensitivity, and specificity metrics. Because patients may be eligible for multiple items per domain, they could be represented multiple times in these analyses.
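The pooling step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the function names, the (survey, record) tuple layout, and the example counts are all hypothetical.

```python
def two_by_two(dyads):
    """Pool patient-item dyads into one 2x2 table.

    Each dyad is a (survey, record) pair coded 1 = yes, 0 = no.
    Returns (a, b, c, d): both yes, survey only, record only, both no.
    """
    a = b = c = d = 0
    for survey, record in dyads:
        if survey and record:
            a += 1
        elif survey:
            b += 1
        elif record:
            c += 1
        else:
            d += 1
    return a, b, c, d


def concordance_metrics(dyads):
    """Kappa plus sensitivity/specificity with each source as gold standard.

    Assumes every margin of the pooled table is nonzero.
    """
    a, b, c, d = two_by_two(dyads)
    n = a + b + c + d
    p_obs = (a + d) / n                                       # observed agreement
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2  # chance agreement
    return {
        "kappa": (p_obs - p_exp) / (1 - p_exp),
        "survey_sensitivity": a / (a + c),   # medical record as gold standard
        "survey_specificity": d / (b + d),
        "record_sensitivity": a / (a + b),   # survey as gold standard
        "record_specificity": d / (c + d),
    }


# Hypothetical pooled dyads (made-up counts, not study data).
example_dyads = [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 20 + [(0, 0)] * 30
metrics = concordance_metrics(example_dyads)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because one patient can contribute several dyads, the pooled counts are not independent observations, which is why the regression models described later adjust for clustering within patient.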
Item-level and domain-level analyses have been reported elsewhere.15
We determined patient demographic characteristics (age group, sex, race/ethnicity, education level, income level, and health status) from 1996 patient self-report. Self-reported health status was measured using the 12-item Short-Form Health Survey (SF-12).16 The SF-12 physical component scores were classified as high (≥75th percentile, or score ≥52) or low (<75th percentile, or score <52).
We conducted bivariate analyses between the variables representing patient characteristics and medical organization type (MOT), and between concordance and variables representing MOT, patient characteristics, and domain of medical care. We then conducted multivariate logistic regression to test the effect of MOT on 5 separate measures of concordance: agreement, true positive and true negative using the medical record as the gold standard, and true positive and true negative using the patient self-report as the gold standard, controlling for patient characteristics. Because previous work indicated that concordance varies with the domain of medical care in question, variables representing domain of care were also included as control variables in the regression models. Regression analyses were adjusted for clustering of observations within patient using Huber corrections.17
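One way to write down a model of this kind is sketched below. The notation is introduced here for illustration and is an assumption, not taken from the study: i indexes patients, j indexes items, y_ij is the binary concordance outcome, x_ij is the full covariate vector (an IPA indicator, patient characteristics, and domain indicators), and p̂_ij is the fitted probability. The second line is the generic cluster-robust (Huber sandwich) variance with patients as clusters; statistical software typically multiplies it by a finite-sample correction that is omitted here.

```latex
\begin{aligned}
\operatorname{logit} \Pr(y_{ij} = 1) &= \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} \\[4pt]
\widehat{V}(\hat{\boldsymbol{\beta}}) &=
\Bigl( \sum_{i} \sum_{j} \hat{p}_{ij} (1 - \hat{p}_{ij}) \, \mathbf{x}_{ij} \mathbf{x}_{ij}^{\top} \Bigr)^{-1}
\Bigl( \sum_{i} \mathbf{u}_{i} \mathbf{u}_{i}^{\top} \Bigr)
\Bigl( \sum_{i} \sum_{j} \hat{p}_{ij} (1 - \hat{p}_{ij}) \, \mathbf{x}_{ij} \mathbf{x}_{ij}^{\top} \Bigr)^{-1},
\qquad
\mathbf{u}_{i} = \sum_{j} (y_{ij} - \hat{p}_{ij}) \, \mathbf{x}_{ij}
\end{aligned}
```

Summing the score contributions within each patient before forming the middle term is what allows a patient's multiple dyads to be correlated without understating the standard errors.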
Each of the 5 models presented was analyzed with a different sample size due to the differing number of eligible events for each.
To determine whether any effects of MOT were moderated by patient characteristics or domain of care, we tested for interactions between MOT and the other covariates. In all, we tested for interactions between MOT and each of 7 covariates for each of the 5 dependent variables. We used the Wald χ2 test18 to determine whether the interaction terms made statistically significant contributions to the multivariate models. Of the 35 tests, 2 were significant. Given the number of tests and the concern that these interaction results could have occurred due to chance, we did not include interaction terms in the final models.
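The chance concern can be made concrete with a short calculation. The sketch below assumes a significance threshold of .05 and independent tests; neither is stated in the text, so both are assumptions, and the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n_tests, alpha):
    """P(at least k of n_tests independent true-null tests reach significance)."""
    p_fewer = sum(
        comb(n_tests, i) * alpha ** i * (1 - alpha) ** (n_tests - i)
        for i in range(k)
    )
    return 1 - p_fewer


n_tests = 5 * 7   # 5 dependent variables x 7 covariates, as in the text
alpha = 0.05      # assumed threshold, not stated in the text

expected_false_positives = n_tests * alpha
p_two_or_more = prob_at_least(2, n_tests, alpha)

print(f"Expected false positives: {expected_false_positives:.2f}")
print(f"P(2 or more significant by chance): {p_two_or_more:.2f}")
```

Under these assumptions, nearly 2 significant results would be expected by chance alone, and the probability of seeing 2 or more is roughly one half, which is consistent with the decision to leave interaction terms out of the final models.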
Table 1 presents the demographic characteristics of the 1270 patients, overall and stratified by MOT. We tested for significant differences by MOT using χ2. Distributions of sex, health-related quality of life, education, and income were similar across MG and IPA. The IPA patient sample was slightly older (P < .0002), with a lower proportion who self-identified as non-Hispanic White (P < .02).
We found that κ varied significantly by MOT and several patient characteristics. κ was significantly and positively associated with MG as compared with IPA; with patient age and better self-reported health status; and with Asian, White, and Other as compared with Black, Hispanic, and Missing race categories. Differences were also found in the measures of sensitivity or specificity according to these factors as well as patient education, and, in the case of medical record specificity, by income. Survey sensitivity was significantly and positively associated with patient education, while medical record sensitivity was negatively associated with education. Medical record specificity was positively associated with education and with patient income.
Multivariate Results
Table 3 presents the results of the multivariate models for each of the 5 outcome variables. The sample size, the c statistic representing the area under the receiver operating characteristic (ROC) curve, and the adjusted odds ratios (ORs) and P-values associated with each independent variable are shown for each model. C statistics ranged from 0.60 to 0.67. Models with some predictive ability have ROC areas between the 2 extremes of 0.5 and 1.0.21
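For readers less familiar with the c statistic, the sketch below shows one standard way to compute it: as the share of (event, non-event) pairs in which the event received the higher predicted probability, with ties counted as one half. The function name and the toy data are hypothetical and are not drawn from the study.

```python
def c_statistic(outcomes, predicted):
    """Area under the ROC curve via pairwise comparison.

    outcomes: iterable of 0/1 observed outcomes.
    predicted: iterable of model-predicted probabilities, same order.
    """
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    non_events = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                score += 1.0      # concordant pair
            elif p_event == p_non:
                score += 0.5      # tied pair
    return score / (len(events) * len(non_events))


# Toy data (made up): 3 events and 3 non-events.
toy_outcomes = [1, 1, 1, 0, 0, 0]
toy_predicted = [0.8, 0.6, 0.4, 0.7, 0.5, 0.3]
toy_c = c_statistic(toy_outcomes, toy_predicted)
print(f"c statistic: {toy_c:.3f}")
```

A model that assigns every observation the same probability scores 0.5 under this definition, which is why 0.5 marks the no-discrimination extreme.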
We found that MOT was significantly associated with 4 of the 5 measures of concordance. The first column presents the model predicting agreement (vs lack of agreement) between medical record and patient self-report. Controlling for patient factors and domain of medical care, IPA was found to be associated with a slightly lower likelihood of agreement as compared with MG (OR = 0.93, P = .04). Self-reported health status and domain of medical care were also significantly associated with the likelihood that the 2 data sources would agree on a particular item. Better health status was significantly and positively associated with higher odds of agreement (OR = 1.01, P = .001). The medication use domain was associated with higher odds of agreement as compared with the reference domain (clinical services delivered) (OR = 1.21, P < .001). The counseling and referrals domain was found to be negatively associated with the odds of agreement (OR = 0.43, P < .001).
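To give a sense of scale for these odds ratios, the sketch below converts an OR into a change in probability from a chosen baseline. The 70% baseline agreement rate is purely hypothetical, picked for illustration; the text reports adjusted ORs, not baseline probabilities, and the function name is also an assumption.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability implied by multiplying the baseline odds by odds_ratio."""
    if not 0 < baseline_prob < 1:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    baseline_odds = baseline_prob / (1 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


hypothetical_baseline = 0.70  # assumed agreement probability, not from the study

for label, odds_ratio in [
    ("IPA vs MG", 0.93),
    ("Medication use vs clinical services", 1.21),
    ("Counseling and referrals vs clinical services", 0.43),
]:
    implied = apply_odds_ratio(hypothetical_baseline, odds_ratio)
    print(f"{label}: OR = {odds_ratio:.2f}, implied probability = {implied:.3f}")
```

At this assumed baseline, an OR of 0.93 moves the probability of agreement by between 1 and 2 percentage points, whereas an OR of 0.43 moves it by roughly 20 points, so the domain effects are far larger in absolute terms than the organization-type effect.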
Next, we modeled survey sensitivity, or the likelihood of obtaining a true positive from the patient self-report data using the medical record data as the gold standard. The relationship between MOT and survey sensitivity was not significant (OR = 0.90, P = .09). Hispanic ethnicity and missing self-reported race/ethnicity were significantly negatively associated with the odds of the survey detecting a true positive as compared with Whites (Hispanic OR = 0.79, P < .03; Missing race OR = 0.49, P < .001). Items associated with the diagnosis domain were significantly positively associated with survey sensitivity (OR = 1.40, P < .001), whereas items from counseling and referrals domain were negatively associated with the outcome (OR = 0.57, P < .001).
In modeling the likelihood of survey specificity, or obtaining a true negative from the patient survey, we found IPA to be associated with lower odds of specificity compared with MG, controlling for other factors (OR = 0.88, P = .01). Older age, Asian race, better health status, and domain of medical care were also significantly associated with the outcome.
The model predicting the likelihood of the medical record detecting a true positive (using the patient self-report as the gold standard) revealed significant associations between MOT, patient age, education, and domain of medical care and the odds of finding a true positive. Independent practice association was negatively associated with the sensitivity of the medical record (OR = 0.71, P = .001). In contrast, IPA was associated with greater odds of the medical record detecting a true negative (OR = 1.14, P < .02) using the patient self-report as the gold standard and controlling for all other factors.
These results show that concordance varies with MOT after controlling for patient self-reported demographic variables, including age, health status, race/ethnicity, education, and domain of medical care.
Independent practice association was found to be associated with lower odds of total agreement as compared with MG. In addition, IPA was associated with lower odds of the self-report detecting a true negative (as defined by taking the medical record as the gold standard) after controlling for patient characteristics. This finding could be interpreted to mean either that IPA patients are less likely to correctly report a negative documented in the medical record, or that IPA medical records underreport. Independent practice associations were also associated with lower medical record sensitivity and greater medical record specificity (taking the self-report as gold standard). Again, one explanation is that IPA patients are more likely to overreport conditions or events. Given that these analyses controlled for important patient attributes hypothesized to influence the accuracy of self-report, it seems unlikely that the explanation lies with differences in patient populations.
The observed differences may stem at least in part from differences in organizational structure between IPAs and MGs. Whereas MGs have benefited from many of the efficiencies of consolidated group practice, the IPA structure, designed mainly for negotiating power rather than more extensive organizational change, has not allowed IPAs to reap the same economies of scale, which support investments in information technology, integrated medical records, or other structural mechanisms for coordinating care with other providers.10
Thus, apparent underreporting according to the medical record data in IPAs could be due to differences in information systems resulting in less complete recording, or greater medical record fragmentation. We did not directly measure electronic medical record availability. However, our survey of medical directors queried about corporate-level policies regarding electronic availability of components of the medical record. Independent practice associations were significantly less likely than MGs to report any corporate policies regarding electronic medical record availability. This finding may reflect fewer resources for or less emphasis on systematic oversight of the completeness of the medical record in the independent office settings affiliated with IPAs as compared with more integrated MGs.
Greater variation in storage and access for medical records in IPA as compared with MG settings could also result in greater difficulty in abstracting all pertinent medical information from all key specialists.
It is essential that quality of care measurement be fair and comparable across MOT and that observed differences not be due to artifacts of data quality. Yet it is well understood that Health Plan Employer Data and Information Set (HEDIS) rates, for example, are likely to be at least in part a function of the information systems of the health plans and their associated medical organizations. Differential medical record data quality across organizations may affect (1) identification of patients eligible for quality indicators, (2) verification of their appropriate receipt of services, and (3) appropriate risk adjustment. All of these effects may yield results that unfairly underestimate or overestimate quality by organization type.
To our knowledge, this is the first study of concordance to test for and find an association between MOT and concordance between patient and medical record data. The results of this study support the hypothesis that the setting of care is an important factor correlated with the concordance between patient self-report and medical record data, suggesting the quality of the 2 data sources varies by organization type. These findings have important implications for quality of care assessment, as they may point to problems with medical records that likely affect the quality of healthcare itself, as well as the accuracy with which healthcare can be measured.
This work is subject to some limitations. These data do not allow us to determine the exact sources of the discrepancies, which may be due to variations in recording, completeness, and fragmentation of the medical record. These results are possibly not representative of nonparticipating patients or medical organizations. Of patients originally surveyed, 54% consented to medical record abstraction. Patients with no medical record data were somewhat more likely to be younger than 65 (69% vs 62%), have low SF-12 scores (82% vs 75%), and to be African American (8% vs 3%) (P < .01 for all of the above).
Complete medical records abstraction was more often successful among participating subjects from MGs as compared with IPAs. Therefore, the final sample may not be fully representative of IPAs. However, we believe that any bias associated with this fact would likely be in the direction of the null hypothesis. Thus, inclusion of these patients would have likely strengthened our finding that IPAs are associated with poorer concordance.
Each of the 5 models presented was analyzed with a different sample size due to the differing number of eligible events for each. Differing sample sizes may explain some inconsistencies in significance levels of predictors across the 5 models, as the smaller number of patients in MGs as compared with IPAs may have contributed to lower power than if we had had more MG patients. We also recognize that we did not achieve the ideal of comparing each data source to a "true" gold standard (eg, direct observation). However, data for this study were drawn from the commonly used data sources of patient self-report and medical records pertinent to thousands of encounters with physicians across 3 states, making direct observation impracticable. Moreover, because it is unclear that one data source such as the medical record truly represents a gold standard for all patient medical data, we chose the approach of presenting sensitivity and specificity, alternating each data source as the gold standard.
We recommend that managers and policy-makers considering where next to invest in quality improvement activities give serious attention to strategies that will allow consistent concurrent data collection from pertinent sources. This might include information systems such as electronic medical records and/or systematic and serial patient-level data collection.
Meanwhile, we advise individuals collecting medical record data to take particular care to respond to challenges associated with capturing comparable data across disparate practice settings. The medical record accession process should carefully monitor the inclusion in the abstraction process of all key components of the record, even though they may be found in different areas of the record volume. Although we recognize it is a costly option, we recommend that quality assessment efforts utilize triangulation with the use of more than 1 data source to measure quality whenever feasible.
2. Fowles JB, Fowler EJ, Craft C. Validation of claims diagnoses and self-reported conditions compared with medical records for selected chronic diseases. J Ambul Care Manage. 1998;21:24-34.
4. Rozario PA, Morrow-Howell N, Proctor E. Comparing the congruency of self-report and provider records of depressed elders' service use by provider type. Med Care. 2004;42:952-959.
6. Wallihan DB, Stump TE, Callahan CM. Accuracy of self-reported health services use and patterns of care among urban older adults. Med Care. 1999;37:662-670.
8. Bazzoli GJ, Dynan L, Burns LR, Yap C. Two decades of organizational change in health care: what have we learned? Med Care Res Rev. 2004;61:247-331.
10. Casalino LP, Devers KJ, Lake TK, Reed M, Stoddard JJ. Benefits and
11. Luck J, Peabody JW, Dresselhaus TR, Lee M, Glassman P. How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record. Am J Med. 2000;108:642-649.
13. Sudman S, Bradburn NM. Response Effects in Surveys. Hawthorne, NY: Aldine; 1974.
15. Tisnado DM, Adams JL, Liu H, et al. What is the concordance between the medical record and patient self report as data sources for ambulatory care? Med Care. 2006;44:132-140.
17. StataCorp. STATA 8 User's Guide. College Station, Tex: Stata Press; 2003:263-267.
19. Streiner DL, Norman GK. Health Measurement Scales, 4th Ed. Oxford: Oxford University Press; 1994.
21. Obuchowski NA. Receiver operating characteristic curves and their use in radiology. Radiology. 2003;229:3-8.