Assessment of Structured Data Elements for Social Risk Factors

Joshua Vest; Julia Adler-Milstein; Laura Gottlieb; Jiang Bian, PhD; Thomas R. Campion Jr, PhD; Genna R. Cohen, PhD; Nathan Donnelly, MS; Jeremy Harper, MS; Timothy R. Huerta, PhD, MS; John P. Kansky, MSE, MBA; Hadi Kharrazi, MD, MPH; Anjum Khurshid, MD, PhD; Harold E. Kooreman, MA, MSW; Cara McDonnell; J. Marc Overhage, MD, PhD; Matthew S. Pantell, MD, MS; Wendy Parisi, MS; Elizabeth A. Shenkman, PhD; William M. Tierney, MD; Sarah Wiehe, MD; Christopher Harle

Publication

Article

January 19, 2022

The American Journal of Managed Care

January 2022

Volume 28

Issue 1

Author(s):

Joshua R. Vest, PhD, MPH, Julia Adler-Milstein, PhD

An expert panel identified and assessed electronic health record and health information exchange structured data elements to support future development of social risk factor computable phenotyping.

ABSTRACT

Objectives: Computable social risk factor phenotypes derived from routinely collected structured electronic health record (EHR) or health information exchange (HIE) data may represent a feasible and robust approach to measuring social factors. This study convened an expert panel to identify and assess the quality of individual EHR and HIE structured data elements that could be used as components in future computable social risk factor phenotypes.

Study Design: Technical expert panel.

Methods: A 2-round Delphi technique included 17 experts with an in-depth knowledge of available EHR and/or HIE data. The first-round identification sessions followed a nominal group approach to generate candidate data elements that may relate to socioeconomics, cultural context, social relationships, and community context. In the second-round survey, panelists rated each data element according to overall data quality and likelihood of systematic differences in quality across populations (ie, bias).

Results: Panelists identified a total of 89 structured data elements. About half of the data elements (n = 45) were related to socioeconomic characteristics. The panelists identified a diverse set of data elements. Elements used in reimbursement-related processes were generally rated as higher quality. Panelists noted that several data elements may be subject to implicit bias or reflect biased systems of care, which may limit their utility in measuring social factors.

Conclusions: Routinely collected structured data within EHR and HIE systems may reflect patient social risk factors. Identifying and assessing available data elements serves as a foundational step toward developing future computable social factor phenotypes.

Am J Manag Care. 2022;28(1):e14-e23. https://doi.org/10.37765/ajmc.2022.88816

_____

Takeaway Points

Computable phenotypes are measurements of patient conditions or characteristics that can be obtained from existing data by combining a defined set of variables and logical expressions. Routinely collected structured data within electronic health records and health information exchange systems are reflective of characteristics of social and economic well-being and thus may be amenable to use in social risk factor phenotype development. An expert panel identified and assessed structured data elements to support future development of social risk factor computable phenotyping.

Computable phenotypes represent an additional method of measuring patient social factors that leverages existing data sources and workflows.
Structured data elements used in reimbursement-related processes may be of the highest quality for use in phenotyping.
Currently collected structured data elements such as International Classification of Diseases, Tenth Revision Z codes and Logical Observation Identifiers Names and Codes are potentially susceptible to bias.
Computable phenotyping will require transforming or combining data elements into novel and potentially more informative measures.

_____

Social risk factors include patients’ nonclinical, economic, and contextual characteristics that may adversely affect health.^1-3 As important drivers of morbidity, mortality, utilization, and health care costs, social risk factors are important for health risk assessment and both individual and population health management.^4,5 Specifically, social risk factor information may improve risk prediction models^6,7 and identify patients in need of social services.⁸ Because of the potential value of social risk factor information, federal agencies, clinical organizations, and health system experts advocate for better collection and use of patient social risk factor information.⁹

Despite the potential value of social risk factor information to individual patient care and population health management activities, health care organizations’ current methods to measure this information are fraught with challenges. Patient-facing social risk questionnaires have not been consistently validated,¹⁰ diagnostic codes such as International Classification of Diseases, Tenth Revision (ICD-10) Z codes are underutilized,^11-13 area-level measures (eg, zip code–level demographics) can mask heterogeneity across individuals and are prone to the ecological fallacy,^14,15 and extracting free-text clinical documentation from electronic health records (EHRs) remains difficult for many organizations.^16,17 As a result, any one of these methods may not be sufficient for health care organizations to collect the information necessary to make inferences about patients’ and populations’ social risk factors.

Computable social risk factor phenotypes derived from routinely collected structured EHR or health information exchange (HIE) data may be an alternative approach to measuring social factors.¹⁸ Computable phenotypes are representations of patient conditions or characteristics that can be obtained from EHR data by combining a defined set of variables and logical expressions.^19-21 Data such as demographics, insurance information, billing histories, appointment status, emergency contacts, and language preferences exist in most EHRs. Although such data may not contribute to understanding patients’ clinical status, they are reflective of characteristics of social and economic well-being and thus may be amenable to use in social risk factor phenotypes. Additionally, using structured data elements already routinely collected as part of clinical and business operation workflows may mitigate the challenges of underutilization of screening surveys and diagnosis coding, additional data collection burden, and the technical implementation hurdles of natural language processing (NLP) for textual data. Moreover, when applied to HIE data, which combine patient data across organizations over time, robust computable social risk factor phenotypes may be constructed that reduce missing data challenges^22,23 and increase explanatory power.¹⁹ However, biomedical informatics and health services research have devoted little attention to the potential value of developing phenotypes from existing structured data for social risk factor measurement in favor of questionnaires, area-level data linkage, and NLP.^18,24
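To make the phrase "a defined set of variables and logical expressions" concrete, the following minimal Python sketch combines a few structured fields into a single flag. Everything specific in it is an assumption made for illustration: the field names, the 12-month window, the thresholds, and the rule itself are hypothetical and were not proposed or validated by the panel.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical structured fields; names are illustrative, not a real EHR schema."""
    insurance_type: str            # e.g. "commercial", "medicaid", "self-pay"
    bills_in_collection: int       # count of bills currently in collection
    address_changes_12mo: int      # address changes recorded in the past 12 months
    missed_appointments_12mo: int  # no-show appointments in the past 12 months


def financial_strain_flag(rec: PatientRecord) -> bool:
    """Toy phenotype: a defined set of variables combined with logical expressions.

    All thresholds are arbitrary placeholders, not validated cut points.
    """
    coverage_signal = rec.insurance_type in {"medicaid", "self-pay"}
    billing_signal = rec.bills_in_collection > 0
    instability_signal = (
        rec.address_changes_12mo >= 2 or rec.missed_appointments_12mo >= 3
    )
    # Require a coverage or billing signal plus at least one instability signal.
    return (coverage_signal or billing_signal) and instability_signal
```

A real phenotype would need a clear construct definition and validation against a reference standard before any such rule could be trusted.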

From existing work on computable phenotypes, we know that poor data quality²⁵ and data that are inconsistently collected across patient populations may result in inaccurate and otherwise biased phenotypes.²⁶ Therefore, as a foundational step, we convened an expert panel to identify and assess the quality of individual EHR and HIE data elements that might be useful, accurate, and unbiased components to include in future computable social risk factor phenotype development. Our work sets the stage for the future quantitative development of computable social risk phenotypes by providing expert insight to guide selection of candidate data elements.

MATERIALS AND METHODS

We used a 2-round Delphi technique to identify and preliminarily evaluate structured data elements as candidates for use in future computable phenotype development.^27,28

Expert Panel Formation

We recruited 17 individuals (of 18 invitations) with in-depth knowledge of EHR and/or HIE data based on publications or practice experience in 1 or more of the following 3 areas: EHR or HIE technology management in a health care organization; clinical or operational practice that involved data collection; or EHR or HIE research. The majority of respondents were affiliated with research or academic medical institutions (n = 14) and the remainder were individuals in leadership roles at health information technology organizations (n = 3). The panel represented organizations located on the East Coast and West Coast and in the South and Midwest. Five of the panel members were physicians. Expert panel members received a financial incentive of $250 for participating in the focus group and follow-up survey. Panelists were split across 2 identification sessions that each followed a common protocol.

In advance of the identification sessions, we provided each panelist with a summary of the research objectives of identifying and assessing potential structured data elements, a description of the expectations, and a shared definition of social risk factors (ie, any patient-level nonclinical economic, contextual, and psychosocial characteristics and factors). Because computable phenotypes are useful if generalizable,²¹ we instructed panelists to focus on structured data elements that they would expect to be commonly available in EHR or HIE data. We asked panelists to exclude unstructured data elements (eg, clinical note or other text data), data elements requiring linkage to sources outside of typical EHR or HIE systems (eg, tax records), and patient-facing social risk factor questionnaires because those are not widely adopted. These restrictions were intended to prioritize potential structured data elements that would be widely available. Panelists’ ideas were not restricted to specific age groups. We provided this information to each panel member during a short preparatory phone call in advance of the identification sessions.

Round 1: Identification

The study team (consisting of authors J.R.V., H.E.K., C.M., and C.A.H.) conducted two 90-minute group identification sessions (n = 8 and n = 9) via videoconference. In each of these sessions, we followed a nominal group approach in which each panelist, in turn, was asked to suggest a data element, until all ideas were exhausted. A research team member documented the ideas generated in real time and displayed them on screen during the session. To help organize the idea generation, when suggesting data elements, panelists were asked to categorize them into 1 or more broad categories of social risk factors^1,29,30: socioeconomic status (eg, employment, financial, food insecurity/hunger, housing instability); cultural context (eg, language, health literacy); social relationships (eg, social support, incarceration); and community context (eg, housing quality, transportation, safety/violence).

The 2 identification sessions were treated as independent (ie, findings from the first were not shared with the second). The research team deduplicated suggested data elements from across the 2 panels.

Round 2: Assessment

Panelists were asked to rate data elements identified in round 1 based on 2 characteristics (defined below): quality and likelihood of systematic differences across demographic groups. The purpose of this rating exercise was to begin evaluating the feasibility and appropriateness of real-world EHR or HIE data for future computable phenotypes. The rating survey was conducted using REDCap.^31,32 Before administering, 4 core research team members and 2 panelists pilot tested the survey.

Is the data element high quality? We defined high-quality data elements as those that were concurrently complete, accurate, and up to date in an EHR or HIE system. These 3 dimensions are common components of data quality frameworks.^33,34 Panel members rated each element on a 5-point scale from poor quality to excellent quality (eAppendix A [eAppendices available at ajmc.com]).

What is the likelihood of systematic differences in data quality? Differential data quality across patient demographic groups (eg, race/ethnicity, gender, age, sexual orientation) can lead to inaccurate and biased social risk factor measurement, risk prediction, and population health management activities. Systematic differences in quality could be due to a lack of patient diversity, differential work processes, structural barriers to care, or broader societal conditions.³⁵ Panel members rated each element on a 5-point scale from extremely unlikely different data quality to extremely likely different data quality.

Finally, to better understand each panel member’s frame of reference when completing the survey, we included a single item to gauge if responses were rooted in experiences with data from hospital settings, physician/group practice settings, and/or HIE systems.

Analyses

Analyses were divided by the identification and assessment phases of data collection. First, we determined counts of identified potential data elements by social risk factor category. We also determined data elements suggested in each social risk category during the identification sessions. These sessions also generated group discussion on potential risks and limitations of each data element, which we summarized. Next, we computed frequencies and percentages to describe panelists’ ratings of each data element during the assessment portion of the panel. We created 2-way scatterplots to illustrate the plurality of panelists who responded at the 2 extremes of the respective scales (eg, top-2-box approach).³⁶ The plots help identify those data elements that panelists generally perceived to be of higher data quality and also as less likely to have systematic differences across groups. We plotted the data elements for each social factor category separately. To facilitate visualization, we labeled reported data elements as billing and payment, diagnoses and clinical data, encounters and appointments, identifiers and contact information, language, referrals and orders, social determinants of health codes, and other. The full distribution of responses for every data element is presented in eAppendix B. To examine consistency in ratings across panel members, we also grouped average data element ratings by social factor category and stratified by panel member type (physician or nonphysician) and primary frame of reference when answering questions (EHR or HIE) (eAppendix C).
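The box-share summary described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's analysis code: ratings are assumed to be coded as integers from 1 to 5 on both scales (1 = poor quality or extremely unlikely different, 5 = excellent quality or extremely likely different), and the function and key names are hypothetical.

```python
def box_shares(ratings):
    """Return (bottom-2-box share, top-2-box share) for ratings coded 1-5."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    n = len(ratings)
    bottom = sum(1 for r in ratings if r <= 2) / n
    top = sum(1 for r in ratings if r >= 4) / n
    return bottom, top


def summarize(element_ratings):
    """Map each data element to its box shares on both rating scales.

    element_ratings: {element: {"quality": [...], "difference": [...]}}
    where both lists hold integer ratings from 1 to 5 (an assumed coding).
    """
    summary = {}
    for element, scales in element_ratings.items():
        q_bottom, q_top = box_shares(scales["quality"])
        d_bottom, d_top = box_shares(scales["difference"])
        summary[element] = {
            "high_quality": q_top,            # rated in the top 2 quality boxes
            "low_quality": q_bottom,          # rated in the bottom 2 quality boxes
            "differences_unlikely": d_bottom, # bottom 2 boxes of the difference scale
            "differences_likely": d_top,      # top 2 boxes of the difference scale
        }
    return summary
```

Plotting `high_quality` against `differences_unlikely` for each element would give one point per data element, with the preferable elements toward the upper right.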

RESULTS

Identification: Potential Data Elements (identification session)

Across the 2 identification sessions, panelists generated a total of 89 structured data elements (Table). However, due to the cooccurring nature of social risk factors, several suggested data elements pertained to multiple categories. About half of the identified elements (n = 45 of 89) were relevant to the socioeconomic status category. Within the socioeconomic status category, most data elements were suggested as relevant to financial status followed by employment, food insecurity/hunger, and housing instability. The socioeconomic category also included several data elements that we considered to be general indicators of socioeconomic-related needs. Candidate data elements reflected billing, identifiers, orders, and utilization data.

Identification: Observations on Concerns, Considerations, and Limitations Regarding Data Elements (identification session)

During identification, panelists recognized multiple limitations related to structured data elements. These concerns included the potential for inherently biased data elements, inconsistent data collection processes, potential variation across patient populations, and the limitations of area-level measures. For example, a panelist noted that discrimination occurs in care delivery processes and that underserved populations face barriers in accessing services, which could lead to biased data. Similarly, another panelist noted that credit scores may be very predictive of patients’ financial risks and needs, but this data element is known to be biased by race. Panelists also noted that inconsistent data collection limited the usefulness of some data elements. As one panelist stated about documenting homelessness, “There are some ICD codes that nobody uses.” Another panelist agreed that adoption was limited but noted that “…the ICD code is going to be very specific when used.” Panelists noted that computable phenotypes may need to be developed for different patient populations. For example, some data elements could be relevant for adults but would not be relevant for pediatric populations. Alternatively, a phenotype could prove useful for patients with high health care utilization only. Finally, one panelist noted the “poor overlap” between area-level measures and patients’ self-reported social risk factors.

Assessment: Perceptions of Data Quality and Likelihood of Systematic Differences in Quality

When responding to the assessment surveys, most (n = 12 of 17) of the panel members reported primarily thinking about data that come from HIE systems. Clinicians and nonclinicians did not vary substantially in their assessments of data quality and likelihood of systematic differences in data quality across populations. Panelists’ assessment of the likelihood of systematic differences in data quality across populations did vary based on whether they reported primarily thinking of HIE systems vs EHR systems. Those who reported thinking primarily about HIE systems most frequently reported that quality was likely to be different across populations for the socioeconomic, social relationship, and community context categories (eAppendix C).

Socioeconomic Status Data Element Assessment

Data elements that are both higher quality and unlikely to have systematic differences across populations are preferable. For those data elements suggested by the panelists as relevant to socioeconomic status, only identifier and contact information and billing and payment-related data elements were frequently rated as “very good” or “excellent” quality and, at the same time, as “unlikely” or “extremely unlikely” to have differential quality across patient groups (Figure 1). These elements included date of birth, last name, address, insurance type, bills in collection, payment method, days in accounts receivable, and outstanding bills. Many more data elements were generally viewed as low quality (ie, “fair” or “poor”). Notably, panelists rated the ICD-10 Z and Logical Observation Identifiers Names and Codes (LOINC) codes that represent various social risk factors, several data elements related to referrals to specific social services and providers, and inability to do telehealth visits as lower quality (ie, “fair” or “poor”) and simultaneously “likely” or “extremely likely” to have differential quality across patient groups.

Cultural Context Data Element Assessment

Panelists rated only address and EHR portal account presence and usage as high quality (ie, “very good” or “excellent”) (Figure 2). Panelists also considered these elements to generally be “unlikely” or “extremely unlikely” to have differential quality across patient groups. Conversely, more than half of panelists rated presence of advance directives, language of discharge instructions, primary language, and the need and use of interpreters as low quality (ie, “fair” or “poor”) and as “likely” or “extremely likely” to have differential quality across patient groups. Again, ICD-10 Z codes and similar LOINC codes related to education and literacy were rated of low quality and likely to have different data quality across populations.

Social Relationships Data Element Assessment

Panelists rated few social relationship data elements as high quality overall (ie, “very good” or “excellent”) (Figure 3). The majority of panelists rated social relationship–relevant ICD-10 Z and LOINC codes as low quality and “likely” or “extremely likely” to have differential quality across patient groups.

Community Context Data Element Assessment

In the community context domain (Figure 4), some data elements associated with identifiers and contact information, diagnoses and clinical data, and encounters and appointments, such as address, arrival by ambulance, and emergency department visits associated with trauma or injury, were viewed by more panelists as higher quality and “unlikely” or “extremely unlikely” to have differential data quality across populations. Other diagnoses and clinical data were also considered “unlikely” or “extremely unlikely” to have differential data quality across populations but were nevertheless viewed as having “fair” or “poor” data quality. Again, panelists rated community context–relevant ICD-10 Z and LOINC codes as “likely” or “extremely likely” to have differential data quality across populations and to be of poorer data quality.

DISCUSSION

Our panel of 17 EHR and HIE data experts identified and commented on routinely collected structured data elements for potential use in the future development of computable social risk factor phenotypes. Panelists highlighted several specific concerns about overall data quality and the potential for systematic quality differences across populations that may lead to bias and other data inaccuracies. This novel and foundational work can be used to help develop future computable phenotypes for social factors.

Data quality (defined in this study as complete, accurate, and up to date) is a long-standing concern in biomedical informatics, particularly when data are used for purposes other than those for which they were originally collected.³⁷ Panelists generally perceived data elements of the highest quality to be those used in reimbursement-related processes (eg, identifiers and contact information, billing and payment-related data, diagnoses), which is consistent with prior studies.^38,39 In addition, panelists reported that these data elements were among those less likely to be systematically different in quality across populations. Given these advantageous qualities, reimbursement-related data elements may be viable candidates for use in computable phenotype development.

In general, the most consistently identified quality concerns related to structured data elements that have been designed to document social risk factors: ICD-10 Z and LOINC codes. Evidence indicates that these codes are substantially underutilized in practice.^11-13 Not only did the expert panel results question the quality of these data, but they also indicated that these were among the most likely to have different data quality across populations. These perceptions align with a recent quantitative analysis indicating that ICD-10 Z codes are a specific indicator of social risk, but one that is collected in a biased fashion.⁴⁰ Increasing adoption, explicit reimbursement for documenting social needs, or the mapping of screening questionnaires to these standards⁴¹ may eventually increase the utility of these data elements. However, currently their application to computable phenotyping or other measurement activities appears limited.

Poor-quality data can undermine any application to care delivery. However, when data quality is systematically different across populations, the risks increase that any measurement strategy, including computed phenotypes, could perpetuate societal biases and inequitable practices in health care.^42,43 Related applications of health data have demonstrated the risk of drawing the wrong inferences about patients. For example, a widely utilized risk stratification tool systematically recommended healthier White patients over sicker Black patients for care management programs, because it failed to account for differential levels of access.⁴⁴ Similarly, disease risk models developed in homogenous majority populations do not perform well for minority groups.⁴⁵ The future development of computable social risk factor phenotypes will require attention to the risks of biased and differential quality data because the processes for collecting health care data are highly variable.^46,47 Multiple frameworks and methodologies for identifying and mitigating bias exist, which could be applied to these data.^48,49 Additionally, like any advanced analytics interventions, computable social factor phenotypes would need to be continually evaluated and monitored for effectiveness and lack of bias.⁵⁰
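One simple way to begin checking for systematic differences in data quality is to compare the completeness of a data element across demographic groups. The Python sketch below illustrates that idea only; the record layout, field names, and function names are hypothetical. Completeness is just 1 of the 3 quality dimensions used in this study, and a gap between groups flags an element for closer review but does not by itself establish bias.

```python
from collections import defaultdict


def completeness_by_group(records, field, group_field):
    """Share of records in each group with a non-missing value for `field`.

    records: iterable of dicts (a hypothetical flat extract of structured data).
    """
    totals = defaultdict(int)
    present = defaultdict(int)
    for rec in records:
        group = rec.get(group_field, "unknown")
        totals[group] += 1
        value = rec.get(field)
        # Treat both None and empty strings as missing.
        if value is not None and value != "":
            present[group] += 1
    return {group: present[group] / totals[group] for group in totals}


def max_completeness_gap(rates):
    """Largest difference in completeness between any two groups."""
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())
```

The same comparison could be repeated over time to support the continual monitoring discussed above.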

Effectively identifying patients’ social risks is necessary for health care organizations to initiate appropriate referrals to services.⁵¹ Vendors, collaboratives, and health care organizations have successfully integrated screening questions into EHR systems and workflows to support data collection.^41,52,53 Nevertheless, usage of screening questionnaires in practice is highly variable⁵⁴ and, when used, they have increased staff’s data collection burden.⁵⁵ As a measurement strategy reliant on existing data and one that can be potentially automated, computable social factor phenotypes could support the screening use case while avoiding the challenges of administering questionnaires. However, screening for social risk factors in health care can be a sensitive issue for patients^56,57 and automated computable phenotypes are admittedly not as transparent a screening strategy as patient-completed questionnaires. If computable social factor phenotypes could be successfully developed, future work should include assessments of patient acceptability.

Next Steps

Developing computable phenotypes was beyond the scope of this Delphi panel. Nevertheless, the findings in this paper provide a candidate list of data elements that could be further evaluated for constructing computable phenotypes. As part of the identification process, panelists explained or justified their suggested data elements. Often, these explanations took the form of methods for transforming or combining data elements into novel and potentially more informative measures. As an example, panelists emphasized the potential to gain information by looking at changes in data elements over time. The most salient examples were changes in addresses to identify housing instability, in insurance status for financial status, or in emergency contact information for social relationships, which are regularly updated for billing and reimbursement reasons. Others have suggested similar uses of change in address data over time.⁵⁸ Likewise, panel members noted the information to be gained by explainable missingness (ie, instances in which social circumstances would result in data not being recorded or intentionally recorded as missing, as in the case of missing zip codes, phone numbers, or addresses for homeless individuals). Other important recommendations related to combining data across patients. For example, novel data points could be created by identifying the number of individuals sharing an address to indicate financial strain or by noting reciprocal emergency contacts as indicators of social support. Panelists also surfaced the possibility that social need could be identified using evidence of discordant utilization. An example of this type included ordering nutritional supplements or specific meals without an accompanying diagnosis that suggested a clinical need. Lastly, ideas from panel members also included using the information from the components of the patient record. This included the presence, absence, length, or accessing of social worker notes or even the frequency with which the patient portal was used. Therefore, in addition to raw, untransformed data elements, future work to develop computable phenotypes should consider these data transformations or combinations, including changes over time, explainable missingness, combining data across patients, discordant utilization, and the components of the patient record.
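Several of the transformations that panelists described can be expressed as small functions over structured data. The Python sketch below illustrates 4 of them: address changes over time, the number of patients sharing an address, reciprocal emergency contacts, and missing contact fields. The data layouts, field names, and the crude address normalization are all assumptions made for illustration. In particular, the sketch assumes emergency contacts have already been resolved to patient identifiers, which would be a substantial task in practice.

```python
from collections import Counter
from datetime import date


def normalize(address: str) -> str:
    """Crude normalization (case and whitespace only); real matching is harder."""
    return " ".join(address.lower().split())


def count_address_changes(history: list[tuple[date, str]]) -> int:
    """Count changes between consecutive, date-ordered address entries."""
    ordered = [normalize(addr) for _, addr in sorted(history, key=lambda item: item[0])]
    return sum(1 for prev, cur in zip(ordered, ordered[1:]) if prev != cur)


def patients_per_address(current_address: dict[str, str]) -> dict[str, int]:
    """For each patient ID, the number of patients recorded at the same address."""
    counts = Counter(normalize(addr) for addr in current_address.values())
    return {pid: counts[normalize(addr)] for pid, addr in current_address.items()}


def reciprocal_contacts(emergency_contact: dict[str, str]) -> set[frozenset]:
    """Pairs of patients who list each other as emergency contact.

    emergency_contact maps patient ID to the contact's patient ID (assumed
    to be already resolved, which is itself nontrivial).
    """
    return {
        frozenset((pid, contact))
        for pid, contact in emergency_contact.items()
        if pid != contact and emergency_contact.get(contact) == pid
    }


def missing_contact_fields(record: dict, fields=("address", "zip_code", "phone")) -> list[str]:
    """List which contact fields are empty, as a candidate missingness signal."""
    return [field for field in fields if not record.get(field)]
```

Each function returns a derived value that could serve as one input to a phenotype; whether any of them actually tracks housing instability, financial strain, or social support would have to be tested empirically.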

Limitations

Given the variation in the usage of social risk factors, risks, and needs in practice and the literature, it is possible that panel members had different conceptualizations of the social risk factors discussed. For identification, such variation likely had little effect, but the actual movement to phenotype construction would require clear construct definitions. Similarly, our survey did not reflect the multidimensional nature of data quality (eg, conformance, completeness, plausibility) but relied on a single question to reduce respondent burden. Additional work would be necessary to understand the different data quality ratings of each data element. For example, we cannot tell from this study if elements were rated as low quality due to perceived inaccuracy or that the data elements could not be relied upon because they were used too infrequently. Additional study would be required to compare panelists’ perceptions with actual data quality metrics. Also, the panelists were instructed to exclude nonstructured and other sources of data. Of course, these are important sources of social risk factor information, but their potential usage in computable phenotypes represents a different set of challenges than the ones explored in this panel. Still, the identified structured data elements could be combined with structured data from survey questionnaires or even unstructured data extracted from NLP, where available. Such combinations could be more informative. Although expert panel members recognized that social risk factors change over time, the issue of appropriate intervals for measuring social risk factors was not included in this Delphi panel. Lastly, our expert panel reflected individuals with knowledge about EHR and HIE data sources and processes that generated data for clinical, research, and business purposes. A different set of panel members, with different backgrounds, may have identified other data elements.

CONCLUSIONS

EHRs and HIE systems contain structured data elements that reflect patient social circumstances, and these data may be useful in developing computable phenotypes. Efforts to develop phenotypes should consider data quality and risks for systematic differences across populations. Future computable phenotyping research should validate strategies for incorporating concepts such as changes over time, explainable missingness, combining data across patients, discordant utilization, and the components of the patient record.

Acknowledgments

The authors thank Lindsey Sanner, MPH, for her assistance with visualizations.

Author Affiliations: Indiana University Richard M. Fairbanks School of Public Health (JRV, HEK, WMT), Indianapolis, IN; Regenstrief Institute (JRV), Indianapolis, IN; Department of Medicine (JA-M) and Department of Family and Community Medicine (LMG), University of California, San Francisco, San Francisco, CA; Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida (JB, CM, EAS, CAH), Gainesville, FL; Population Health Sciences, Weill Cornell Medical College (TRC), New York, NY; Mathematica (GRC), Washington, DC; New York eHealth Collaborative (ND), New York, NY; Owl Health Works LLC (JH), Indianapolis, IN; Departments of Family and Community Medicine and Biomedical Informatics, College of Medicine, The Ohio State University (TRH), Columbus, OH; Indiana Health Information Exchange (JPK), Indianapolis, IN; Johns Hopkins School of Public Health (HK), Baltimore, MD; Department of Population Health, Dell Medical School, The University of Texas at Austin (AK), Austin, TX; Anthem, Inc (JMO), Indianapolis, IN; Department of Pediatrics, Center for Health and Community, University of California, San Francisco (MSP), San Francisco, CA; University of Rochester Medical Center (WP), Rochester, NY; Department of Pediatrics, School of Medicine, Indiana University (SW), Indianapolis, IN.

Source of Funding: This work was supported, in part, by the Indiana Clinical and Translational Sciences Institute Fund and in part by award No. UL1TR002529 from the National Institutes of Health, National Center for Advancing Translational Sciences, Clinical and Translational Sciences Award. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author Disclosures: Dr Vest provided consulting to New York eHealth Collaborative and Pima County; is a founder and equity holder in Uppstroms, LLC, a technology company; and has patents pending with Uppstroms. Dr Wiehe reports receiving an incentive for participating in the panel. Dr Harle’s institution has received research grants related to social risk factors and their measurement. The remaining authors report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.

Authorship Information: Concept and design (JRV, CAH); acquisition of data (JRV, JA-M, LMG, JB, TRC, GRC, ND, JH, TRH, JPK, HK, AK, HEK, CM, JMO, MSP, WP, EAS, WMT, SW, CAH); analysis and interpretation of data (JRV, JA-M, LMG, JB, TRC, GRC, ND, JH, TRH, JPK, HK, AK, HEK, CM, JMO, MSP, WP, EAS, WMT, SW, CAH); drafting of the manuscript (JRV, JA-M, LMG, CM, HEK, CAH); critical revision of the manuscript for important intellectual content (JRV, JA-M, LMG, JB, TRC, GRC, ND, JH, TRH, JPK, HK, AK, HEK, CM, JMO, MSP, WP, EAS, WMT, SW, CAH); statistical analysis (JRV, HEK, CAH); provision of patients or study materials (JRV, CM, HEK, CAH); obtaining funding (JRV); administrative, technical, or logistic support (JRV, HEK, CM, CAH); supervision (JRV, CAH).

Address Correspondence to: Joshua R. Vest, PhD, MPH, Indiana University Richard M. Fairbanks School of Public Health, 1050 Wishard Blvd, Indianapolis, IN 46202. Email: joshvest@iu.edu.
