Developing a terminology strategy that reflects real-world practice and industry standards can help data scientists and other allied data professionals efficiently and accurately identify clinically relevant insights that help improve the health of populations and individual patients.
At the heart of many advanced analytics projects in health care is the ability to efficiently and accurately surface clinically relevant insights that help improve the health of populations and individual patients. Health care providers, health plans, and health technology vendors are tackling these projects in different ways, such as training machine-learning models for advanced analytics, but regardless of approach, all require a strong foundation of clean, reliable, semantically interoperable, and up-to-date data.
Unfortunately, creating that foundation is where data scientists and other allied data professionals are spending a lot of their time, limiting their bandwidth to focus on more impactful, actionable analysis. To fulfill the promise of data-driven health care and fully realize the potential of machine learning, we must streamline the data harmonization process in a way that empowers data professionals to finally work at the top of their license.
Today, maintaining complex datasets, particularly those upon which accurate machine learning models can be built, is a painstaking, often manual, error-prone process for organizations. A standardized terminology strategy can support immediate improvements, but few data professionals are aware of its central role in accelerating time to insight.
In this article, we’ll highlight 2 areas where data analytics projects are focused—population health and risk adjustment—to demonstrate how an effective terminology strategy can be used to overcome common challenges.
Better Informed Population Health Efforts
Providers and health plans need access to standardized lab data and other clinical codes to analytically evaluate patient populations and inform the right intervention at the right time. However, the quality of the lab data captured in electronic health records (EHRs) and other clinical systems used to generate these analytics needs improvement.
First, there is significant variation in the language clinicians use to document across systems. It’s not uncommon for the glycated hemoglobin (A1C) test for diabetes to be documented more than 100 ways within a single organization. Second, the code sets themselves, such as Logical Observation Identifiers Names and Codes (LOINC), to which the A1C test must map, change often and are typically not updated across an organization’s systems as quickly as is necessary.
In this example, it’s easy to see how a population health–focused analytics project would be hampered, and why building a machine learning model that accurately recognizes an A1C test is a far more formidable task than data professionals may realize at the outset.
Without a terminology framework in place to ensure lab data, such as an A1C result, can be accurately identified regardless of how it’s documented in the EHR and then normalized to the latest LOINC standard, health care organizations risk building models and patient stratifications that miss care gaps and can compromise patient safety.
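The normalization step described above can be sketched in a few lines. This is a minimal, illustrative example: the local lab descriptions are hypothetical, and while 4548-4 is the commonly cited LOINC code for hemoglobin A1C in blood, a production terminology map would contain thousands of entries and be refreshed with each LOINC release.

```python
from typing import Optional

# Hypothetical map from locally documented lab names to a LOINC code.
# In practice this map is curated by terminologists and kept in sync
# with LOINC releases.
LOCAL_TO_LOINC = {
    "hgb a1c": "4548-4",
    "hemoglobin a1c": "4548-4",
    "glycated hemoglobin": "4548-4",
    "a1c": "4548-4",
}

def normalize_lab_name(raw: str) -> Optional[str]:
    """Map a locally documented lab name to a LOINC code, if known."""
    # Lowercase, drop hyphens, and collapse whitespace before lookup.
    key = " ".join(raw.lower().replace("-", " ").split())
    return LOCAL_TO_LOINC.get(key)
```

A lookup like `normalize_lab_name("HGB-A1C")` then resolves the local variant to the standard code, and anything the map doesn’t recognize is surfaced for human review rather than silently dropped.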
Accurate Risk Adjustment for Individuals
Consider another example. Health plans use risk adjustment methods to mitigate the cost of high-risk members, such as those with chronic conditions. Supporting clinical documentation for qualifying diagnoses is required but can be a difficult task, as the rich patient information is often captured in free-text fields within the EHR. Rather than review pages of clinician notes manually, plans are increasingly turning to natural language processing (NLP) technology to efficiently surface the support they need, and automatically translate it to the proper International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes needed for accurate payment.
However, in practice, data professionals are finding that traditional NLP models are missing the mark in health care, often overlooking key clinical evidence that can have a negative impact on member care planning and a plan’s bottom line. Clinically tuned NLP that accounts for the way a physician searches and documents patient encounters in real-world practice and is built on a foundation of continually updated medical terminologies is essential for understanding the complex language of medicine.
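The last step of that pipeline, translating condition mentions an NLP model surfaces into billable codes, is essentially a terminology lookup. The sketch below assumes the NLP extraction has already happened; the mention strings are hypothetical, and while E11.9 and I10 are standard ICD-10-CM codes, a real map would be far larger and continually updated.

```python
# Hypothetical map from NLP-extracted condition mentions to ICD-10-CM
# codes. A production map would be maintained against each annual
# ICD-10-CM release.
CONDITION_TO_ICD10CM = {
    "type 2 diabetes mellitus": "E11.9",
    "essential hypertension": "I10",
}

def code_mentions(mentions):
    """Split mentions into (mention, code) pairs the map recognizes
    and a list of unmapped mentions routed to human review."""
    coded, unmapped = [], []
    for mention in mentions:
        code = CONDITION_TO_ICD10CM.get(mention.lower().strip())
        if code:
            coded.append((mention, code))
        else:
            unmapped.append(mention)
    return coded, unmapped
```

Keeping an explicit unmapped bucket matters here: clinically tuned NLP reduces what lands in it, but whatever remains should reach a coder rather than disappear from the risk adjustment picture.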
Enterprise Strategy That Scales
Many organizations aren’t aware of the challenges that arise from not having a standardized terminology strategy in place at the outset of an analytics initiative, but they quickly realize how impactful such a strategy is once implemented. Other organizations are already bought in on the need for a strategy and have developed internal systems for tracking and implementing code updates across their systems.
The challenge is that most of these homegrown systems are spreadsheet based and simply not feasible long term. There may be anywhere from 500 to 1000 code groups that have clinical meaning within an organization and from which analytics are built. New codes are released by the standards bodies at different frequencies throughout the year, and most organizations can’t keep up. As a result, the updates often go unmade, leading to unreliable analytics and data integrity problems, and data professionals don’t know where to begin in trying to remedy these challenges.
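One concrete first step is auditing those code groups against each standards-body release to flag codes that have been retired or replaced. The sketch below is illustrative only: the group names, member codes, and the "current release" set are hypothetical placeholders for a real published code set.

```python
# Hypothetical latest release of valid codes from a standards body.
CURRENT_CODES = {"4548-4", "2345-7"}

# Hypothetical code groups an organization builds its analytics on.
CODE_GROUPS = {
    "diabetes_labs": {"4548-4", "17855-8"},
    "glucose_labs": {"2345-7"},
}

def stale_codes(groups, current):
    """For each code group, return the member codes that no longer
    appear in the current release (empty groups are omitted)."""
    return {
        name: sorted(codes - current)
        for name, codes in groups.items()
        if codes - current
    }
```

Run against each release, a check like this turns a sprawling spreadsheet reconciliation into a short report of exactly which groups need attention.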
Wise health care organizations are turning to third-party experts that not only have the infrastructure in place to actively monitor for and update systems with code updates in near real time, but also the clinical expertise to ensure a terminology strategy is reflective of real-world practice and industry standards. Only with these pieces of a terminology strategy in place can data scientists and other allied data professionals in health care begin to pick their heads up from manual data clean-up and provide measurable value to their organizations.