Translational Science Benefits
Summary
HIV screening is essential, as individuals unaware of their HIV status account for 40% of new infections.1,2 Beyond limiting the spread of disease, screening serves as the first step in linking patients to HIV care.3-5 People who use drugs (PWUD), particularly those who inject drugs (PWID), are key populations for screening, given that sharing used needles remains a significant mode of HIV transmission. However, screening rates among PWID remain low compared to other at-risk groups, with only 55% receiving annual testing in 2018.6,7 Compounding this issue, PWUD are seven times more likely to be hospitalized than the general population and are less likely to seek primary care, making routine screening in healthcare settings even more critical.1,2
Natural language processing (NLP) is a branch of artificial intelligence (AI) and computer science focused on using computers to read, process, and interpret human language.8,9 It can extract and analyze information from unstructured text in patient records.10,11 To develop a system for identifying hospitalized PWUD, we collaborated with people across the spectrum of health services research and clinical care, including patients, clinicians, biostatisticians, and health informatics experts, to design a method of PWUD identification using NLP applied to the electronic health record (EHR).
Using NLP lets us overcome the limitations of relying on administrative billing codes, known as ICD-10 codes. These codes track health encounters and are typically useful for identifying patient populations. However, because they are collected for billing and reimbursement, a clinician might enter only the codes required in the administrative section of the patient chart and document additional information as unstructured text in the notes section. ICD-10 codes that indicate drug use may therefore not be enough to identify PWUD, depending on where the code is placed in the chart. Some behaviors, such as injection drug use (IDU), have no ICD-10 code at all.12 By improving methods of PWUD identification, providers can increase HIV screening rates for PWUD during hospital stays.2
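To make the contrast between billing codes and note text concrete, the following is a minimal sketch, not the method used in this project. It checks an encounter's ICD-10 codes first and then falls back to a simple keyword search of the clinical note, with a crude check for negated statements such as "denies injection drug use." The code prefixes, the phrase list, the negation cues, and the names Encounter and identify_pwud are all illustrative assumptions; a production system would rely on a validated lexicon or a trained model.

```python
import re
from dataclasses import dataclass, field

# Assumption: ICD-10-CM categories for drug-related disorders (F11-F16, F18, F19).
# This is an illustrative list, not the code set used in the study.
DRUG_USE_CODE_PREFIXES = ("F11", "F12", "F13", "F14", "F15", "F16", "F18", "F19")

# Hypothetical phrases that might signal injection drug use in free text.
IDU_PATTERN = re.compile(
    r"\b(?:injection drug use|injects? (?:heroin|drugs|fentanyl)|IVDU|IDU|PWID)\b",
    re.IGNORECASE,
)

# Very rough negation cues; real clinical NLP needs far more careful handling.
NEGATION_PATTERN = re.compile(r"\b(?:denies|no|negative for|without)\b", re.IGNORECASE)


@dataclass
class Encounter:
    """Hypothetical container for one hospital encounter."""
    icd10_codes: list = field(default_factory=list)
    note_text: str = ""


def has_drug_use_code(enc: Encounter) -> bool:
    """True if any billing code falls under an assumed drug-use category."""
    return any(code.upper().startswith(DRUG_USE_CODE_PREFIXES) for code in enc.icd10_codes)


def note_mentions_idu(enc: Encounter) -> bool:
    """True if the note contains a non-negated mention of injection drug use."""
    for match in IDU_PATTERN.finditer(enc.note_text):
        preceding = enc.note_text[: match.start()]
        # Only look back to the start of the current sentence.
        sentence_start = preceding.rfind(".") + 1
        if not NEGATION_PATTERN.search(preceding[sentence_start:]):
            return True
    return False


def identify_pwud(enc: Encounter) -> bool:
    """Flag an encounter using structured codes first, then the note text."""
    return has_drug_use_code(enc) or note_mentions_idu(enc)
```

In this sketch, an encounter coded only for sepsis whose note reads "History of IVDU" would be flagged by the text search even though no billing code indicates drug use, while a note reading "Patient denies injection drug use" would not.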
Significance
Focusing on under-detected and underserved populations is crucial, as healthcare systems are often designed around those well-represented in the data. PWUD are one population of marginalized patients overlooked by traditional data sources like structured EHR fields and billing codes. Other marginalized communities, including people who are homeless, refugees, or incarcerated, also face higher health risks but cannot be reliably identified in health data. By uncovering “hidden patients,” NLP can enhance research efforts to identify many people who face barriers to equitable healthcare.
Benefits
Demonstrated benefits are those that have been observed and are verifiable.
Potential benefits are those logically expected with moderate to high confidence.
- Clinical (potential): Better identify under-identified patient cohorts for potential research studies.
- Clinical (demonstrated): Improve patient identification in EHR data by using NLP when structured fields and ICD-10 codes are insufficient.
- Community (demonstrated): Increase HIV testing rates and enable more tailored healthcare services for PWUD.
- Community (potential): Reduce overall viral load and disease spread through routine testing and early identification of HIV.
- Community (potential): Increase the number of PWUD who receive HIV screening, preventative services, targeted interventions, and timely treatment.
- Community (potential): Reduce HIV-related complications through early identification and treatment of HIV.
- Economic (potential): Reduce the cost of treatment through an increase in early identification and intervention in the treatment of complex diseases.
- Economic (potential): Identify and treat individuals at high risk for HIV early, leading to a reduction in costs associated with HIV-related repeat hospitalizations at the patient, hospital, and societal levels.
- Economic (potential): Reduce the burden of disease and alleviate strain on an overburdened healthcare system through early identification and treatment of under-identified patient populations.
- Policy (potential): Improve data collection on PWUD to inform more effective policies on harm reduction and treatment options.
This research has clinical, community, economic, and policy implications. The framework for these implications was derived from the Translational Science Benefits Model created by the Institute of Clinical & Translational Sciences at Washington University in St. Louis.13
Clinical
Currently, hospitals have systems that let clinicians know how well they are following treatment guidelines for conditions such as heart failure and for antibiotic management. We want to extend these systems by using NLP to identify PWUD, making it easier for clinicians to quickly assess the care being provided during patient visits. Using NLP will make it possible to compare the quality of care for PWUD and give clinicians useful information, such as how often certain treatments are used. For example, clinicians caring for patients in the hospital could see information on the use of evidence-based treatments for withdrawal, the use of medications for opioid use disorder, and how often patients leave the hospital on their own terms. NLP can also identify stigmatizing terms used in health records and prompt feedback to clinicians about using more respectful, patient-first language.
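As a rough illustration of the last point, the sketch below scans a note for a few stigmatizing terms and pairs each with a person-first alternative. It is a simplified example under stated assumptions, not the system described here: the term list, the suggested replacements, and the function name flag_stigmatizing_terms are hypothetical, and a real tool would use a vetted vocabulary and account for context.

```python
import re

# Hypothetical mapping of stigmatizing terms to person-first alternatives.
SUGGESTIONS = {
    "addict": "person with a substance use disorder",
    "drug abuser": "person who uses drugs",
    "junkie": "person who uses drugs",
    "substance abuse": "substance use",
}

# Match whole terms only, longest first, ignoring case.
_TERM_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(term) for term in sorted(SUGGESTIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def flag_stigmatizing_terms(note_text):
    """Return (matched term, character offset, suggested alternative) tuples."""
    return [
        (match.group(1), match.start(), SUGGESTIONS[match.group(1).lower()])
        for match in _TERM_PATTERN.finditer(note_text)
    ]
```

Because the pattern matches whole words, a clinical term such as "addiction" is not flagged, while "addict" is. Feedback of this kind could be surfaced to the note's author alongside the suggested wording.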
Community
Building trust between the community and hospital systems is essential. The NLP system will increase transparency in care outcomes for PWUD, helping to identify higher-risk individuals earlier. By doing so, it provides opportunities to reduce viral loads through timely treatment, which in turn helps to limit HIV transmission within the community. This approach will not only improve HIV testing rates but also ensure that PWUD have access to tailored healthcare services, preventative care, and targeted interventions. Ultimately, by promoting routine testing and early HIV identification, we can reduce overall viral load, prevent the spread of HIV, and address HIV-related complications through early diagnosis and treatment.
Economic
People who leave the hospital before being medically advised to leave have high rates of readmission for more severe disease. NLP may serve as a tool to identify PWUD, a population who may also be at high risk for readmission. By following evidence-based addiction care, hospital systems may reduce rates of patient-directed discharges and positively impact the finances of their system.
Policy
This work has the potential to inform healthcare policies by providing more accurate data on the needs of PWUD and other underserved populations. With better identification systems in place, policymakers can develop harm reduction strategies and treatment options tailored to the specific needs of these groups. Ultimately, these advancements can lead to more inclusive, data-driven policies that promote health equity and improve access to care for underserved populations. With the rapid adoption and expansion of AI, full policy implications are not yet understood. Additional policies safeguarding patient privacy may be warranted.
Lessons Learned
An important lesson learned from this research project is the complexity of language and how it affects data in EHRs. While using standard phrases in clinical notes can help make key information easier to find, it also makes it harder for NLP systems to understand the full context of patient care. NLP work in healthcare requires collaboration from many different fields, including computer science, ethics, clinical medicine, and, most importantly, people who have the lived experience of using drugs.
References
- HIV.gov. Ending the HIV Epidemic. 2020. Accessed November 13, 2023.
- Centers for Disease Control and Prevention. HIV testing. Accessed November 13, 2023.
- Walensky, R. P., Freedberg, K. A., et al. (2007). Cost-effectiveness of HIV testing and treatment in the United States. Clinical Infectious Diseases: An Official Publication of the Infectious Diseases Society of America, 45 Suppl 4(Suppl 4), S248-254.
- Mwachofi, A., Fadul, N. A., et al. (2021). Cost-effectiveness of HIV screening in emergency departments: A systematic review. AIDS Care, 33(10), 1243–1254.
- Phillips, K. A., Fernyak, S. (2000). The cost-effectiveness of expanded HIV counseling and testing in primary care settings: A first look. AIDS, 14(14), 2159–2169.
- Farhadian, N., Karami Matin, B., et al. (2022). The prevalence of people who inject drugs among those with HIV late presentation: A meta-analysis. Substance Abuse Treatment, Prevention, and Policy, 17(1), 11.
- Smith, A. Mingjing, et al. (2020). HIV infection, risk, prevention, and testing behaviors among persons who inject drugs: National HIV Behavioral Surveillance: Injection drug use, 23 U.S. cities, 2018. (24).
- Harrison, C. J., Sidey-Gibbons, C. J. (2021). Machine learning in medicine: A practical introduction to natural language processing. BMC Medical Research Methodology, 21(1), 158.
- National Library of Medicine. Natural language processing. Updated June 13, 2022. Accessed January 21, 2025.
- Reading Turchioe, M., Volodarskiy, A., et al., & Slotwiner, D. (2022). Systematic review of current natural language processing methods and applications in cardiology. Heart, 108(12), 909–916.
- Feller, D. J., Zucker, J., et al. (2018). Using Clinical Notes and Natural Language Processing for Automated HIV Risk Assessment. JAIDS Journal of Acquired Immune Deficiency Syndromes, 77(2), 160–166.
- McGrew, K. M., Homco, J. B., et al., & Carabin, H. (2020). Validity of International Classification of Diseases codes in identifying illicit drug use target conditions using medical record data as a reference standard: A systematic review. Drug and Alcohol Dependence, 208, 107825.
- Luke DA, Sarli CC, Suiter AM, et al. The Translational Science Benefits Model: A New Framework for Assessing the Health and Societal Benefits of Clinical and Translational Sciences. Clin Transl Sci. 2018;11(1):77-84.