Lindsey Jeanne Leininger, PhD; Donna Friedsam, MPH; Kristen Voskuil, MA; and Thomas DeLeire, PhD
Medicaid programs provide care to a population with widely varying healthcare needs. Because of these variations, appreciable benefits accrue from the ability to prospectively stratify patients into clinically distinct subgroups. Related applications, including targeted case management and the establishment of risk-adjusted performance benchmarks for providers, are key tools in efforts to transform Medicaid into an outcomes-focused payer.1,2
While states differ in the extent to which they employ such techniques for their Medicaid programs,3 they all share the key constraint of lacking information on prior medical history for new enrollees, including the large expansion populations enrolled under the Affordable Care Act. Moreover, Medicaid enrollment is characterized by high levels of churn in coverage status,4,5 further complicating the challenge Medicaid agencies face in garnering recent medical histories of their members. For both new and returning program applicants, self-reported health measures collected at the time of enrollment may be the only practical means of gathering such data. To date, there is minimal evidence regarding whether states’ enrollment systems are capable of meeting the data collection task and whether the resulting data are of sufficient quality to be used for predicting high-need cases.
A recent Medicaid expansion in Wisconsin provides a unique opportunity to assess whether self-reported health measures gathered from an existing Medicaid enrollment system can provide clinically meaningful information. Wisconsin’s Medicaid program, in expanding managed care coverage to childless adults in 2009, required that applicants complete a self-reported health needs assessment (HNA) in addition to providing the sociodemographic information typically required for program enrollment.6
Our study uses administrative data from this expansion population to assess the predictive value of collecting self-reported health measures at the time of application—a novel use of Medicaid enrollment systems. To our knowledge, this is the first paper to explore the promise of using Medicaid enrollment systems data in this capacity.
Our paper tests the following 2 hypotheses:
1. HNA data considerably improve the ability to predict utilization and costs incurred over the first year of Medicaid enrollment, relative to the predictive performance of sociodemographic data typically collected by Medicaid agencies at the time of application;
2. A prediction tool comprising a combination of HNA and sociodemographic measures meets accepted thresholds of predictive ability for utilization and cost outcomes.
Assessing the predictive ability of the HNA data provides an instructive case study for other states’ Medicaid agencies, as limited empirical evidence exists regarding the predictive capacity of self-reported health measures among Medicaid members. We hypothesize that self-reported health measures are meaningfully predictive of high resource utilization among Medicaid members, in keeping with the related literature demonstrating the appreciable predictive ability of self-reported HNA instruments among populations served by Medicare and the Department of Veterans Affairs (VA).7,8
Medicaid programs nationwide have considerable experience using claims and/or encounter data for a variety of actuarial and quality measurement purposes.9
In contrast, Medicaid agencies lack experience collecting self-reported health data as part of the Medicaid application process. The potential relative benefits of this mode of data collection are large, as the marginal cost of collecting health data at enrollment is appreciably lower than fielding a population-based survey or establishing and maintaining an encounter database suitable for analytic purposes. However, there is great concern about and little evidence regarding the quality of the resulting self-reported data. Poor health status and/or poor literacy may potentially preclude enrollees from accurate reporting.10
Moreover, despite Medicaid agencies making explicit promises to the contrary, enrollees may fear that their answers could affect their eligibility for certain services.11
The presence of these and other unknown (and potentially unknowable) data quality threats demands a careful empirical examination of whether an enrollment-based data collection technique can indeed generate health-related information of sufficient caliber for programmatic purposes.
Data and Sample
Data from 2 state administrative systems were merged to construct the sample: the Client Assistance for Re-employment and Economic Support System (CARES), which stores all social program applications, and InterChange, which warehouses all claims and encounter data for Wisconsin Medicaid members. The study sample was drawn from the 48,460 enrollees who applied for the waiver program between its launch in July 2009 and the subsequent imposition of an enrollment freeze in October 2009, and who were enrolled in coverage for at least 1 year.
While the Department of Health Services (DHS) had initially intended that all waiver enrollees complete an HNA, logistical constraints precluded its universal administration. As such, the analytic sample was limited to the 34,087 members who completed an HNA at the time of enrollment. These members comprised 70% of the relevant population entering the program during the study period. DHS agency officials have shared with us that in some months case workers processing phone applications had to sacrifice HNA completion in favor of expediency, given the unanticipated volume of applicants (conversation with Linda McCart, director, DHS Policy and Research Section, July 2012). Members with and without HNA information have similar racial and ethnic backgrounds, but differ with respect to age and sex, with HNA respondents being older and disproportionately female (eAppendix Table). While the HNA completion rate was not universal, it compares favorably to that achieved by a similar pilot study assessing the predictive ability of a self-reported health screener collected on a VA population,8 which had a coverage rate of roughly 40%.
Emergency department (ED) visits and inpatient utilization were chosen as the primary outcomes of interest, as both of these types of care have long been the focus of Medicaid case management efforts12 and subsets of both (eg, ambulatory sensitive ED visits and hospital readmissions) are widely recognized as potential healthcare performance indicators.13,14 Accordingly, they are also the most commonly considered utilization outcomes for predictive modeling applications in Medicaid.11 Medicaid case management programs often seek to target the highest-cost cases15; as such, we examined the incurrence of high costs as an additional outcome of interest. We operationalized the dependent variables by creating the following 3 binary indicators measured over a member’s first year of Medicaid enrollment: membership in the top decile of ED utilization, which reflects having 3 or more ED visits; having at least 1 inpatient hospitalization (similar to a top decile measure, as 9.2% of the sample experienced an inpatient event); and membership in the top cost decile, which represents costs of at least $6360.
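The three binary outcomes described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `MemberYear` record and its field names are hypothetical, and the cutoffs (3 ED visits, $6360) are the top-decile values reported in the text rather than values recomputed from data.

```python
from dataclasses import dataclass


@dataclass
class MemberYear:
    """Hypothetical first-year utilization record for one member."""
    ed_visits: int
    inpatient_stays: int
    total_cost: float


def build_outcomes(members, ed_cutoff=3, cost_cutoff=6360.0):
    """Return the 3 binary outcome indicators for each member.

    ed_cutoff and cost_cutoff default to the top-decile thresholds
    reported in the text; they are not derived from the input here.
    """
    return [
        {
            "top_ed": int(m.ed_visits >= ed_cutoff),
            "any_inpatient": int(m.inpatient_stays >= 1),
            "top_cost": int(m.total_cost >= cost_cutoff),
        }
        for m in members
    ]


if __name__ == "__main__":
    sample = [MemberYear(3, 0, 100.0), MemberYear(0, 1, 7000.0)]
    for row in build_outcomes(sample):
        print(row)
```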
We estimated the predictive ability of 7 different sets of predictors, the first of which consisted of a standard set of sociodemographic variables currently collected by Medicaid enrollment systems (see Table 1 for the complete list). Each of the remaining blocks of predictors included both the sociodemographic variables and additional variables drawn from the HNA (see eAppendix Figure for details on exact wording and progression of HNA measures). The second set of predictors included sociodemographics plus dummy variables reflecting the presence of the following conditions enumerated in the HNA: asthma; cancer; chronic obstructive pulmonary disease; depression; diabetes; emphysema; heart problems; high blood pressure; other mental health condition; and stroke. The third set included sociodemographics plus self-reported measures of behavior captured in the HNA: an indicator reflecting smoking status and an indicator reflecting problem alcohol or other drug use. The fourth set was sociodemographics plus a dummy variable reflecting high prescription drug use, measured as using 5 or more prescription drugs. Access to care indicators that reflected having a regular doctor and a regular clinic comprised, along with sociodemographics, the fifth set; sociodemographics plus a measure representing the previous year’s utilization, operationalized as having experienced an ED visit or hospitalization for one of the HNA-enumerated conditions, comprised the sixth. The seventh set of predictors was the entire vector of HNA measures (conditions + access to care + previous year’s utilization) plus sociodemographics.
A series of logistic regression models, corresponding to the 7 blocks of predictors described above, was fitted for each outcome. Thus, the baseline model included only the sociodemographic measures, and each subsequent model included the addition of a subset of (or, in the case of the final specification, the entire set of) HNA measures. For each of the HNA specifications, we tested the incremental predictive ability of the HNA measures over that of the baseline demographic model.
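The nesting of the 7 specifications can be made concrete with a small sketch. All column names below are hypothetical stand-ins (the actual sociodemographic list is in Table 1), and the final block assumes that the "entire vector of HNA measures" spans all 5 HNA domains. Each resulting list would then be passed to a logistic regression routine of the analyst's choice.

```python
# Hypothetical column names; Table 1 holds the actual sociodemographic list.
SOCIODEMOGRAPHICS = ["age", "sex", "race_ethnicity"]

# HNA domains as described in the text (names are illustrative assumptions).
HNA_DOMAINS = {
    "conditions": [
        "asthma", "cancer", "copd", "depression", "diabetes", "emphysema",
        "heart_problems", "high_blood_pressure", "other_mental_health",
        "stroke",
    ],
    "behaviors": ["smoker", "problem_alcohol_or_drug_use"],
    "prescriptions": ["five_or_more_prescriptions"],
    "access_to_care": ["regular_doctor", "regular_clinic"],
    "past_utilization": ["prior_year_ed_or_hospitalization"],
}


def predictor_blocks():
    """Build the 7 predictor sets: baseline, 5 single-domain, and full HNA."""
    blocks = {"baseline": list(SOCIODEMOGRAPHICS)}
    for domain, columns in HNA_DOMAINS.items():
        blocks[domain] = SOCIODEMOGRAPHICS + columns
    # Assumption: the richest specification uses every HNA domain.
    all_hna = [c for columns in HNA_DOMAINS.values() for c in columns]
    blocks["all_hna"] = SOCIODEMOGRAPHICS + all_hna
    return blocks


if __name__ == "__main__":
    for name, columns in predictor_blocks().items():
        print(f"{name}: {len(columns)} predictors")
```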
We used 3 measures of predictive ability to assess the efficacy of the self-reported HNA measures. First, predictive ability was assessed using the C-statistic, the most commonly reported measure of model discrimination in the related literature.16-18
For a dichotomous outcome, it is identical to the area under the receiver operating characteristic curve, a plot of the sensitivity (true positive rate) against 1 – specificity (false positive rate) across the entire range of possible predicted probability thresholds. The C-statistic ranges between 0.5 and 1, with a value of 0.5 reflecting predictive ability no better than a coin flip and a value of 1 reflecting perfect predictive ability. A rule of thumb suggested by Hosmer and Lemeshow19 and widely adopted in the clinical literature is as follows: C-statistics greater than or equal to 0.7 are considered acceptable and values greater than 0.8 are considered excellent.
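For a binary outcome, the C-statistic equals the share of (event, nonevent) pairs in which the member who experienced the event received the higher predicted probability, with ties counted as one half. The sketch below computes it by direct pair counting. It is an illustrative implementation, quadratic in sample size, and not necessarily the routine used in the study.

```python
def c_statistic(y_true, p_hat):
    """Concordance (C) statistic for a binary outcome.

    y_true: iterable of 0/1 outcomes.
    p_hat:  iterable of predicted probabilities, same order as y_true.
    """
    events = [p for y, p in zip(y_true, p_hat) if y == 1]
    nonevents = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one nonevent")

    concordant = 0.0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1.0   # correctly ordered pair
            elif p_event == p_nonevent:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(nonevents))


if __name__ == "__main__":
    # 3 of the 4 event/nonevent pairs are correctly ordered.
    print(c_statistic([1, 1, 0, 0], [0.8, 0.3, 0.6, 0.1]))
```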
Second, we calculated the discrimination slope, a complementary metric that provides greater intuition regarding the magnitude of incremental predictive ability contributed by an augmented model.16-18,20
The discrimination slope is computed as follows:
$$\text{discrimination slope} = \operatorname{average}(\hat{p}_{\text{event}}) - \operatorname{average}(\hat{p}_{\text{no event}}),$$
with each average representing the mean value of the predicted probabilities resulting from the logistic regression model of the dependent variable event on a given set of predictors. Stated differently, the discrimination slope is the difference between the average predicted probabilities of sample members experiencing the outcome and the average predicted probabilities of sample members not experiencing the outcome. Improvements in the discrimination slope, termed integrated discrimination improvement (IDI), are reported both as the level difference between a baseline and augmented model and as the percent improvement associated with the augmented model relative to the baseline model.
We employed a split-sample approach to compute all C-statistics and discrimination slopes. This approach involves randomly dividing the sample into 2 subsamples, the first of which is used to fit the model (n = 17,043). The resulting estimates are then applied to the withheld validation sample (n = 17,044), with which the metrics of interest and associated 95% confidence intervals are computed using a 500-replicate bootstrap procedure. We also bootstrapped the difference in the C-statistic and discrimination slope between each augmented model and the baseline model to determine the statistical significance of any marginal gain in predictive performance.
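The discrimination slope, the IDI, and a bootstrap interval on the validation sample can be sketched as follows. This is a hedged illustration: the text specifies 500 replicates and 95% intervals but not the interval type, so the percentile method used here is an assumption, as are the function names and the choice to redraw replicates that lack one of the two outcome classes.

```python
import random
from statistics import mean


def discrimination_slope(y_true, p_hat):
    """Mean predicted probability among events minus that among nonevents."""
    events = [p for y, p in zip(y_true, p_hat) if y == 1]
    nonevents = [p for y, p in zip(y_true, p_hat) if y == 0]
    return mean(events) - mean(nonevents)


def idi(y_true, p_baseline, p_augmented):
    """Return (level difference, percent improvement over baseline)."""
    base = discrimination_slope(y_true, p_baseline)
    augmented = discrimination_slope(y_true, p_augmented)
    level = augmented - base
    return level, 100.0 * level / base


def bootstrap_ci(y_true, p_hat, metric, n_reps=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed method) for metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_reps:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # redraw unless both classes are present
            stats.append(metric(ys, [p_hat[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_reps)]
    upper = stats[int((1 - alpha / 2) * n_reps) - 1]
    return lower, upper


if __name__ == "__main__":
    y = [1, 1, 0, 0]
    print(idi(y, [0.3, 0.3, 0.2, 0.2], [0.6, 0.4, 0.2, 0.2]))
```

Bootstrapping the difference between an augmented and a baseline model follows the same pattern, with the metric computed on both sets of predictions within each replicate.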
Finally, we computed measures of sensitivity, specificity, and positive and negative predictive values associated with the baseline demographic model compared with the specification employing all the HNA measures. In keeping with the related literature, we chose the 50th, 75th, and 90th percentiles in the predicted risk distribution as our threshold values.15,21
These results are particularly important for case-finding applications, as stakeholders must decide upon a risk threshold at which a program (or additional screening measure) will be administered.
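The threshold-based measures can be illustrated with a short sketch that flags members whose predicted risk lies above a chosen percentile of the risk distribution and then tabulates the confusion matrix. The nearest-rank percentile rule and the strict inequality used for flagging are assumptions made for this example. The study's exact convention is not stated.

```python
def classification_metrics(y_true, p_hat, percentile):
    """Sensitivity, specificity, PPV, and NPV at a risk-percentile threshold.

    percentile: integer in 1..99, eg 50, 75, or 90. Members whose predicted
    risk is strictly above the nearest-rank percentile value are flagged.
    """
    ordered = sorted(p_hat)
    n = len(ordered)
    # Nearest-rank percentile, using integer arithmetic for ceil(percentile*n/100).
    rank = (percentile * n + 99) // 100
    cutoff = ordered[max(0, rank - 1)]

    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        flagged = p > cutoff
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    y = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
    p = [i / 10 for i in range(1, 11)]
    for pct in (50, 90):
        print(pct, classification_metrics(y, p, pct))
```

In this toy input, raising the threshold from the 50th to the 90th percentile lowers sensitivity while raising specificity, which is the tradeoff a case-finding program must weigh.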
Descriptive statistics are displayed in Table 1. With the exception of the pairing of cancer and top ED use, each condition was positively associated with membership in the top utilization and top cost deciles. Similarly, the behavior, prescription drug, and previous year’s utilization measures were all positively correlated with membership in the top utilization and cost deciles. Both access to care measures exhibited modest negative associations with membership in the top ED decile and modest positive associations with the hospitalization and cost outcomes.
Predictive performance of the multivariate specifications is displayed in Table 2. For top ED utilization, C-statistics ranged between 0.67 for the baseline specification and 0.74 for the richest HNA specification. The past-year utilization and condition specifications provided the greatest incremental predictive improvement over baseline; both behaviors and prescriptions were also associated with appreciable increases in predictive performance. In contrast, the access-to-care domain provided no meaningful improvement in predictive ability. Comparing discrimination slopes yielded similar conclusions (Table 3).
Predictive performance, measured by either the C-statistic or the discrimination slope, was lowest for the hospitalization outcome (C-statistic for richest specification = 0.67). Here again the conditions and utilization domains offered the greatest incremental increases in predictive ability over baseline (C-statistic of 0.65 and 0.63 for conditions and past utilization specifications, respectively, vs 0.59 for the baseline model). Prescriptions and behaviors both contributed meaningful improvements in predictive accuracy over baseline (IDI of 90% and 60%, respectively), while the access-to-care specification constituted a negligible (albeit statistically significant) improvement.
Similar to the progression of models predicting high ED use, the inclusion of the HNA predictors improved the performance of models predicting membership in the top cost decile sufficiently, such that the richest specification met the Hosmer-Lemeshow rule-of-thumb threshold for acceptability. Specifically, the C-statistics ranged from 0.61 for the baseline model to 0.72 for the richest HNA specification. For the cost outcome the block of condition predictors was associated with the greatest marginal improvement (C-statistic of 0.69; IDI of 267%). In contrast to the other 2 outcomes, the past year’s utilization specification was ranked third with respect to incremental performance improvement; however, it is important to note that while the specification’s relative performance was lower, the level of incremental predictive ability remained considerable (IDI of 153%). Also of note is that the relative contribution of prescription drugs was much higher for the cost outcome compared with the other 2 outcomes (IDI of 248% for cost, compared with 48% and 90% for ED and inpatient utilization, respectively). Importantly, the incremental predictive contribution of the HNA measures was highest for the cost outcome (IDI of 413% for specification including all HNA measures).
An accompanying table displays the sensitivity, specificity, and positive and negative predictive values by risk threshold associated with the baseline model and the specification including all HNA measures. For each outcome, the HNA specification was associated with appreciable improvements in sensitivity (especially at the 75th and 90th percentiles) with no resulting decreases in specificity. Similarly, the HNA specifications improved positive predictive value across all thresholds, with especially large improvements seen at the 90th percentile, with no associated decline in negative predictive value. The tradeoff between sensitivity and specificity at different risk thresholds is striking, and underscores the tensions inherent in choosing a threshold at which to target case-finding applications of the underlying predictive model. As is expected given the low prevalence of the outcome measures and in keeping with similar studies,15 positive predictive values were fairly low, even at the 90th percentile of the risk distribution (HNA specification: 0.27 for the ED outcome; 0.22 for any hospitalization; and 0.30 for high cost).
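The link between low prevalence and low positive predictive value follows from the standard identity relating the two, shown below. The numeric values in the worked line are purely illustrative and are not taken from the study's tables.

```latex
\mathrm{PPV}
  = \frac{\text{sensitivity} \times \text{prevalence}}
         {\text{sensitivity} \times \text{prevalence}
          + (1 - \text{specificity}) \times (1 - \text{prevalence})}

% Illustrative values only: prevalence = 0.10, sensitivity = 0.40, specificity = 0.90
\mathrm{PPV} = \frac{0.40 \times 0.10}{0.40 \times 0.10 + 0.10 \times 0.90}
             = \frac{0.04}{0.13} \approx 0.31
```

With an outcome experienced by roughly 1 in 10 members, even a fairly specific screen flags more members who will not experience the outcome than members who will.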
Sensitivity Analysis and Limitations
We also performed a number of sensitivity analyses to assess the robustness of these findings across additional specifications. First, we estimated an additional specification including a dummy variable reflecting having a comorbidity (2 or more enumerated conditions) in addition to the full set of HNA measures.22,23
This additional covariate added no incremental predictive power. Second, we estimated models employing top-decile, ambulatory care–sensitive ED visits as the ED outcome measure, using the algorithm created by Billings and colleagues.24
Results were very similar to the specifications modeling top-decile total ED visits (available from the authors upon request).
A potential limitation of our analysis is that it was constrained to use the particular HNA as designed by the Wisconsin DHS. The HNA did not include several of the best established predictors of future health costs and utilization, including general health status, functional and activity limitations, and all-cause utilization over the past year.25-28 Such omissions, therefore, suggest that our results represent a conservative estimate of the potential predictive ability associated with HNA administration to new adult Medicaid enrollees.
We found that a simple, self-reported health needs assessment collected via a Medicaid enrollment system was meaningfully predictive of future healthcare utilization for a sample of new childless adult enrollees. For the outcomes of high ED utilization and high cost, the HNA measures combined with demographic measures demonstrated acceptable predictive performance and were associated with large incremental predictive improvements over demographic variables alone for each of the 3 outcomes, with the largest incremental improvements achieved for the high cost outcome. It is encouraging that the predictive performance of the HNA approaches that achieved in a claims-based study on a comparable Medicaid population in Vermont.15 Two corroborating studies using within-sample comparisons found that the predictive ability of a self-reported health screener approaches but does not quite meet that exhibited by recent claims history.29,30
The Wisconsin experience shows that the use of HNA-like instruments via Medicaid application systems holds great promise for prospective assessment of new enrollees. Medicaid agencies deciding whether and how to use an HNA-like instrument in predictive modeling applications face several important issues, however. As is the case with all risk adjustment applications, agencies will need to work assiduously to ensure that provider groups believe in the legitimacy and fairness of an HNA-based risk model. Medicare’s long-standing experience with using survey data as a frailty risk adjuster could serve as an instructive guide in navigating this and other issues inherent in implementing survey-based risk adjustment.27,28,31,32
Agencies interested in using an HNA to target case management and/or other specialized services must be mindful that the positive predictive value of the resulting model is likely to be low. In recognition of this limitation, traditional disease management programs often use predictive modeling as a first screen, complemented by a subsequent screen typically involving follow-up by a nurse case manager.29
Additionally, conducting a business case analysis similar to that pioneered by Billings and colleagues33,34 would give stakeholders a sense of the likely fiscal impacts associated with a case-finding intervention employing a predictive model. We conclude with a final note that, as was the case in Wisconsin, HNA instruments are often designed for several purposes, many of which are not predictive in nature.35 Designing an effective HNA will require balancing its predictive goals with the demands of its other stated objectives.