Providing care in the emergency department (ED) is expensive. The average cost of an ED visit is estimated at $1038 compared with $176 for a primary care visit in the United States.1
Moreover, the number of ED visits grew at roughly twice the population growth rate between 2001 and 2008.2
The CDC reported that the number of ED visits increased from 119.2 million (40.5 visits per 100 persons) in 2006 to 136.3 million (44.5 visits per 100 persons) in 2011.3,4
Furthermore, studies have shown that “frequent-flyer” patients are a key contributor to ED crowding, which results in treatment delays and excess mortality.5,6
Thus, given limited resources, reducing repeated visits is warranted to improve ED effectiveness and timeliness for those truly in need.
To develop interventions to reduce ED revisits, reliable predictive modeling that can identify high-risk patients is needed. However, compared with inpatient 30-day readmissions, which have been measured by CMS since 2012 to adjust payments to hospitals,7 ED revisits have received less attention.8
Although substantial published literature has examined factors influencing ED revisits, research on predictive models is limited.9-22
As a result, the predictive models for ED revisits published in the literature have only moderate predictive power: the highest reported C statistics were 0.70 for 30-day revisits2,22 and 0.73 for 6-month revisits.23
In the present study, we developed a statistical model that predicts the risk of revisiting the ED within 30 days of discharge. The model can be used to identify high-risk frequent flyers for proactive intervention. With rapid adoption and use of health information technology and especially electronic health records (EHRs), administrative data are coming increasingly closer to real time and offer greater potential for improving ED care. Our model, based on administrative data and a publicly available patient classification system, can be readily implemented in health systems to reduce unnecessary ED revisits.
Data Source and Study Variables
ED visit data from fiscal years (FYs) 2013 and 2014 in Veterans Healthcare Network Upstate New York (VISN 2 Upstate) were analyzed in this study. VISN 2 is 1 of the 18 networks through which the US Department of Veterans Affairs (VA) delivers care to more than 6 million patients annually. VISN 2 Upstate, with 5 medical centers and 31 outpatient clinics across upstate New York, serves approximately 140,000 patients with an annual budget of $1 billion (starting in FY 2016, VISN 2 was restructured to include New York downstate VA hospitals). In FY 2014 for VISN 2 Upstate, a total of 21,141 patients had ED visits in the 4 medical centers that provided ED services.
We used the VA National Patient Care Database (NPCD) hosted at the VA Information and Computing Infrastructure as the primary data source for this study. The Outpatient Care File (OPC) and clinical stop code 130 were used to identify index ED visits and revisits. In addition to encounter information, such as visit dates and International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes, OPC also contains patient demographic and socioeconomic information, such as age, gender, race, and income. Data in NPCD, including OPC, are routinely used in VA operational analysis and research. Most of the data fields, such as visit dates and clinical information like ICD-9-CM codes, are regularly and rigorously validated with strict business rules. The patient income information is means-tested. One exception is race information, which is often incomplete because the VA health system does not require veterans to report race. However, for the last several years, the VA has systematically gathered racial and ethnic information from other data sources, such as Medicare and the Department of Defense; as a result, the updated racial and ethnic data are deemed accurate and reliable.24,25
We also used Managerial Cost Accounting (MCA; formerly Decision Support System) files that contain actual patient care costs, rather than amounts claimed or paid as in private health plans. MCA costs are the primary financial data for internal operations and Congressional inquiries. For case mix or comorbidities, we used a publicly available and widely used system, Clinical Classifications Software (CCS), developed by the Agency for Healthcare Research and Quality (AHRQ),26 which classifies patients into 285 homogeneous groups based on ICD-9-CM codes.
The dependent variable in this study was dichotomous (yes = 1, no = 0), indicating whether a patient had any ED revisit(s) within 30 days after being discharged from the ED in FY 2014. The explanatory or predictive variables used in this study were from FY 2013 and can be grouped into 4 categories: (1) demographics (age, sex, marital status, race, disability rating, and period of military service); (2) socioeconomic variables (patient income, homelessness [yes = 1, no = 0], and patient insurance status [ie, not covered by any insurance (yes = 1, no = 0), enrolled in Medicare (yes = 1, no = 0), enrolled in Medicaid (yes = 1, no = 0), and covered by private insurance (yes = 1, no = 0)]); (3) prior-year utilization and cost (ED revisit within 30 days [yes = 1, no = 0], number of ED revisits within 30 days, total number of ED visits, number of primary care visits, number of telehealth encounters, total outpatient visits, number of hospitalizations, and total cost); and (4) patient risk or comorbidities (285 clinically homogeneous groups produced by CCS, which is developed by AHRQ26).
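As an illustration of how a dichotomous 30-day revisit indicator of this kind can be derived from visit dates, the following is a minimal Python sketch. It is not the authors' code (the analyses used SAS). The function name `has_30_day_revisit` is hypothetical, and the sketch assumes that the ED discharge date equals the visit date and that repeated records on the same day count as a single visit.

```python
from datetime import date, timedelta


def has_30_day_revisit(visit_dates, window_days=30):
    """Return 1 if any ED visit occurs within `window_days` after an earlier one, else 0.

    Assumptions (not stated in the source): discharge date equals visit date,
    and multiple records on the same calendar day are treated as one visit.
    """
    ordered = sorted(set(visit_dates))
    # Checking consecutive visits is sufficient: if any two visits are within
    # the window, the visits lying between them are within the window as well.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier <= timedelta(days=window_days):
            return 1
    return 0


# Hypothetical patients, for illustration only
revisit_flag = has_30_day_revisit([date(2014, 1, 5), date(2014, 1, 20)])
no_revisit_flag = has_30_day_revisit([date(2014, 1, 5), date(2014, 3, 1)])
single_visit_flag = has_30_day_revisit([date(2014, 6, 1)])
```

In this sketch the first hypothetical patient is flagged (two visits 15 days apart), whereas the second (visits 55 days apart) and the third (a single visit) are not.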
The present study did not need or use any identifiable patient private information and therefore had expedited institutional review board review.
Modeling and Analysis
We employed logistic regression to predict the probability or risk of 30-day ED revisits. Logistic regression has been the most extensively used model in predicting binary outcomes where the dependent variable equals 1 if the event happened and otherwise equals 0. The model’s predictive power or discriminative ability is measured by the C statistic, which is defined as the proportion of times that the model correctly discriminates a random pair of individuals with or without an event. It is also equivalent to the area under the receiver operating characteristic curve. A C statistic of 0.5 indicates that the model is no better than flipping a coin, a C statistic of 0.7 to 0.8 suggests that the model has good discriminative ability, and a C statistic of 0.8 or higher indicates great discriminative ability or predictive power.27
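The pairwise definition of the C statistic given above can be made concrete with a short Python sketch. It compares every patient who had the event with every patient who did not and counts the share of pairs in which the patient with the event received the higher predicted risk. Counting tied predictions as one-half is a common convention and is an assumption here; the function name `c_statistic` and the example values are hypothetical.

```python
def c_statistic(y_true, risk):
    """Proportion of (event, non-event) pairs in which the event has the higher risk.

    Tied predictions count as 0.5 (a common convention, assumed here).
    This direct form is quadratic in sample size; rank-based formulas give
    the same value more efficiently for large samples.
    """
    events = [r for y, r in zip(y_true, risk) if y == 1]
    non_events = [r for y, r in zip(y_true, risk) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical outcomes and predicted risks, for illustration only
example_c = c_statistic([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])
```

With these 4 hypothetical patients there are 4 event/non-event pairs, of which 3 are ordered correctly, so `example_c` is 0.75.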
To prevent model overfit, we only included variables with P values smaller than .05 (based on the stepwise procedure) in the final regression analysis, and we also calculated the shrinkage coefficient, an indicator of overfit.28
Further, we fitted the model by the split-sample method.28,29
With this method, the full sample (after merging the dependent variable from FY 2014 and the independent variables from FY 2013) was randomly split into a development sample (50%) and a validation sample (50%).30
The model was fitted on the development sample and then the estimated coefficients were applied to the validation sample to produce the risk score (probability) and the goodness-of-fit statistics. The split-sample method has been extensively used to prevent predictive models from fitting random noise rather than a true trend. We also visually examined the prediction accuracy by comparing the predicted and observed numbers of patients with any 30-day revisit(s) in 5 estimated risk categories, which divided all of the patients into 5 equally sized groups based on their estimated risk. The analyses were conducted using PROC LOGISTIC of SAS 9.3 (SAS Institute Inc; Cary, North Carolina).
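The split-sample step and the comparison by risk category can be sketched in Python as follows. This is an illustrative outline rather than the SAS procedure used in the study: the function names `split_sample` and `calibration_by_quintile`, the random seed, and the example data are all hypothetical, and model fitting itself is left to any logistic regression routine. The predicted number of revisits in a group is taken as the sum of the estimated probabilities in that group, which is one standard way to form it and is assumed here.

```python
import random


def split_sample(ids, seed=0):
    """Randomly split patient identifiers into two halves (development, validation)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def calibration_by_quintile(y_true, risk, n_groups=5):
    """Compare predicted and observed revisit counts in equally sized risk groups.

    Predicted count = sum of estimated probabilities in the group (assumed);
    observed count = number of patients in the group with a revisit.
    """
    order = sorted(range(len(risk)), key=lambda i: risk[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        rows.append({
            "group": g + 1,
            "patients": len(members),
            "predicted": sum(risk[i] for i in members),
            "observed": sum(y_true[i] for i in members),
        })
    return rows


# Hypothetical validation-sample data, for illustration only
development_ids, validation_ids = split_sample(range(10))
example_risk = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.70, 0.90]
example_outcome = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
calibration_rows = calibration_by_quintile(example_outcome, example_risk)
```

Each row of `calibration_rows` holds one risk group; a well-calibrated model shows predicted and observed counts that track each other across the 5 groups.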
To demonstrate the predictive power of different explanatory variables, we configured and fitted 3 models, from simple to comprehensive. For model 1, only demographic and socioeconomic variables were included in the model as the independent variables (ie, age, sex, marital status, race, income, enrolled in Medicaid, enrolled in Medicare, or covered by other private insurance [no insurance status was omitted in the regression as reference]). We also used 3 dummy variables (1 was omitted as reference) as the fixed effect to take into account the potentially different practice patterns among the 4 medical centers. Model 2 included the variables in model 1 and prior-year utilization and cost (ie, ED revisit within 30 days [yes/no], number of ED revisits within 30 days, total number of ED visits, number of primary care visits, number of telehealth encounters, number of all outpatient visits, number of hospitalizations, and total cost). Model 3 included the variables in model 2 and patient comorbidities based on CCS. The inclusion or exclusion of the variables in the final regression depended on the P values produced by the stepwise procedure.
All 22,734 patients who had ED visits in FY 2014 were included in this study. Among them, 4937 returned to the ED within 30 days, for an overall 30-day revisit rate of 22%. The total number of ED visits in FY 2014 was 42,192. The independent variables (except the 285 CCS indicator variables) and their descriptive statistics are reported in Table 1. Table 2 shows the 20 most frequent diagnoses of patients with 30-day ED revisits. All the statistically significant variables (P <.05) in the final model are reported along with their coefficient estimates, P values, odds ratios, and 95% CIs in Table 3 [part A and part B]. Note that the age groups were kept in the model as a convention regardless of their statistical significance (the group aged 75 years or older was omitted as the baseline).
In predicting 30-day ED revisits, the first model included only demographics, socioeconomic characteristics, and the fixed effect of the medical centers, in which 10 variables were statistically significant (P <.05) and kept in the model. The C statistics were 0.568 (95% CI, 0.555-0.580) and 0.556 (95% CI, 0.543-0.568) for the development and validation samples, respectively. In the second model, 13 variables were statistically significant and kept in the model. The C statistic was 0.748 (95% CI, 0.737-0.759) for both samples. In the final model, 51 variables (39 CCS categories) that were statistically significant were kept in the model, and the C statistics were 0.773 (95% CI, 0.762-0.784) and 0.763 (95% CI, 0.753-0.774) for the development and validation samples, respectively. The receiver operating characteristic curves of all 3 models based on the validation sample are reported in Figure 1.
To further examine the prediction accuracy, we compared the predicted and observed numbers of patients with ED revisits by 5 estimated risk categories (% = patients with ED revisits / patients in the risk group). As shown in Figure 2, in both the development and validation samples, the predicted and the observed numbers of revisits were very close except that the model overestimated the number of revisits for the low-risk group. We also estimated the shrinkage coefficient, which yielded a value of 0.86, indicating no overfit.28
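The text does not state which estimator of the shrinkage coefficient was used. One commonly used form, shown here only as an assumed illustration, is the heuristic estimator based on the model likelihood ratio statistic, where $\chi^2_{\text{model}}$ is the likelihood ratio chi-square of the fitted model and $p$ is the number of regression degrees of freedom (candidate predictors):

```latex
\hat{\gamma} \;=\; \frac{\chi^2_{\text{model}} - p}{\chi^2_{\text{model}}}
```

Under this form, $\hat{\gamma}$ approaches 1 when the model chi-square is large relative to the number of parameters, and smaller values indicate that a larger share of the apparent fit is attributable to noise.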
EDs across the country are still overcrowded and the waiting lines remain long. The CDC reported that the average waiting time in EDs in the United States increased by 25%, from 46.5 to 58.1 minutes, from 2003 through 2009.31
In 2015, the American College of Emergency Physicians published a survey report showing that ED visits went up since implementation of the Affordable Care Act.32
Consequently, waiting for hours before seeing a doctor in the ED is not uncommon.3,4,33
A systematic review showed that repeated ED visits are a key factor contributing to ED crowding, treatment delays, and excessive mortality.5,6
To free ED capacity for true emergencies, reducing unnecessary repeated visits is desired. However, unlike the rate of hospital readmissions, which has been used by CMS to penalize hospitals with excessive readmissions, ED revisits have not received much attention.8
Given limited resources, interventions to reduce ED revisits require accurate predictive modeling that can identify the patients at high risk for revisits, but only limited research focusing on predictive modeling is found in the literature.
In the present study, we developed a statistical model that predicts, with good accuracy, patients who are at high risk of returning to the ED within 30 days. The strength of the model for broad application lies in its 3 features: (1) It uses administrative data that are readily available in nearly all hospitals or health systems, (2) the comorbidity classification system is publicly available from AHRQ, and (3) the model is based on logistic regression, which can be readily implemented with most common software packages, including Microsoft Excel.
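To illustrate why a fitted logistic regression is simple to deploy, the following Python sketch converts a set of coefficients into a risk score using the logistic function. The intercept, coefficient values, and variable names below are hypothetical placeholders and are not the estimates reported in Table 3; the function name `risk_score` is likewise an assumption.

```python
import math


def risk_score(intercept, coefficients, patient):
    """Predicted probability from a logistic model: 1 / (1 + exp(-(b0 + sum(b_k * x_k)))).

    `coefficients` maps variable names to estimated coefficients and `patient`
    maps variable names to that patient's values; missing values default to 0.
    """
    linear = intercept + sum(
        beta * patient.get(name, 0.0) for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients and patient, for illustration only
example_coefficients = {"prior_ed_visits": 0.5, "primary_care_visits": -0.1}
example_patient = {"prior_ed_visits": 4, "primary_care_visits": 0}
example_risk_score = risk_score(-2.0, example_coefficients, example_patient)
```

Here the linear predictor is -2.0 + 0.5 × 4 = 0, so `example_risk_score` is 0.5. The same calculation can be written in a spreadsheet cell as 1/(1+EXP(-x)), where x is the linear predictor.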
Unlike in predicting hospital readmissions,34 comorbidities added limited predictive power to demographics and prior-year utilization/cost: the C statistic increased by 0.19 from model 1 (demographic and socioeconomic variables only) to model 2 (prior-year utilization/cost added), but by only 0.015 from model 2 to model 3 (diagnoses added). Similarly, our sensitivity analyses revealed that prior-year utilization/cost also did not add much predictive power if the patient demographics and comorbidities were entered into the model first, which indicates that prior-year utilization/cost and comorbidities are highly correlated.
It is worth noting that the stepwise procedure for selecting variables is popular but not perfect. In fact, it can easily produce misleading results if the objective is to assess the effect of an independent variable on the dependent variable. In this study, with prediction as the aim, the forward, backward, stepwise, and Lasso procedures produced nearly identical results and we reported the results based on the stepwise procedure.
Although the objective of this study is to forecast or predict ED revisits, rather than to assess the effect of each individual explanatory variable, some variables are of interest and further research is merited. In particular, the number of primary care visits in the prior year was inversely associated with the risk of ED revisits, which is clinically meaningful and consistent with previous findings.35
On the other hand, it appears paradoxical that the number of prior-year hospitalizations was also inversely correlated with ED revisits. Similarly, a few severe comorbidities (eg, CCS131, respiratory failure) were associated with low risk of ED revisits. Without knowing the exact reason(s), we speculate that these variables are indicators of disease severity that warrants inpatient rather than ED care,36 and higher mortality may also play a role. Further research is needed.
In addition, female patients in this study were less likely to experience ED revisits, holding other things constant, but the finding may not be generalizable to the general public because the study population is predominantly male. It is also interesting that a higher disability rating was associated with a lower risk of ED revisits. Although we do not know the exact reason(s), it may be that medical needs related to disability can be better anticipated and planned for.
This study has several limitations. First, the study population included only veterans, and they were from only 1 region (upstate New York). VA patients could differ from the general population. Apart from seeing mostly male patients, EDs at VA hospitals see more patients with mental health issues (eg, posttraumatic stress disorder) than do EDs in the private sector.37
In addition, VA’s co-payment structure is also different from the private sector’s. No co-payment is required for low-income veterans or those with service-connected conditions, which may result in different ED visit patterns. Second, veterans in this study may have received care from non-VA providers, which may also have decreased the model’s predictive power. Third, the present study only analyzed whether or not a patient had any ED revisit(s) at all; future studies may consider multiple revisits within 30 days. Finally, for the approved study period, International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes were not available and therefore ICD-9-CM codes were used to identify patient comorbidities. However, the modeling process remains the same and the predictive power may be improved if ICD-10-CM codes are used.38
Curtailing unnecessary ED revisits will not only lower healthcare costs but also shorten wait time for those who critically need ED care. However, broad intervention targeting every ED visitor may not be feasible given limited resources. In the present study, we developed a predictive model based on administrative data, a publicly available patient classification system, and commonly used logistic regression, which yielded good predictive power with a C statistic of 0.76 compared with 0.70 for models previously reported in the literature. The model can be readily implemented by hospitals and healthcare systems with EHRs, and the resultant risk score can be used to identify high-risk patients for proactive interventions to reduce ED revisits.