LLMs Show Promise, But Challenges Remain in Improving Inefficient Clinical Trial Screening

Key Takeaways

  • GPT-4 outperforms GPT-3.5 in clinical trial patient screening but is slower and more expensive, requiring human oversight due to potential errors.
  • Low patient accrual in clinical trials is a major issue, with manual screening being time-consuming and inefficient, necessitating automated solutions.

Large language models (LLMs) such as GPT-3.5 and GPT-4 may offer a solution to the costly and inefficient process of manual clinical trial screening, which is often hindered by the inability of structured electronic health record data to capture all necessary criteria.

Large language models (LLMs) such as GPT-4 can effectively analyze unstructured clinical notes to improve the efficiency of patient screening for clinical trials, according to a study published in Machine Learning: Health.1 However, although GPT-4 consistently outperformed GPT-3.5, it was slower and more costly, and both models still required human oversight because of potential errors and limited sensitivity in identifying eligible patients.

Factors impeding patient accrual include resource scarcity, inefficient manual screening processes, and limited availability of research staff. Manual eligibility screening is particularly time-consuming, often requiring more than 40 minutes per patient, yet it remains standard practice in clinical trial research.2 Previous research found that an automated screening system reduced patient screening time by 34% compared with manual methods, underscoring the inefficiency of the traditional process.

Prior work has also called for improved interdisciplinary collaboration among physicians, data scientists, and domain experts.3 That work underscores that tailoring machine learning and natural language processing approaches to specific medical contexts is difficult, given the limited availability of high-quality data for niche disorders and the ethical concerns surrounding patient privacy and data protection.

Researchers evaluated the performance of GPT-3.5 and GPT-4 in screening 74 patients for a head and neck cancer trial using electronic health record (EHR) data.1 They tested 3 prompting methods: structured output, chain of thought, and self-discover.
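
To make the prompting strategies concrete, the following is a minimal, hypothetical sketch of how a chain-of-thought eligibility check against a single criterion could be issued through the OpenAI Python client. The prompt wording, model name, and the screen_criterion helper are illustrative assumptions, not the study's actual pipeline.

```python
# Hypothetical sketch: checking one eligibility criterion against a clinical note
# with a chain-of-thought style prompt. Prompt wording, model choice, and the
# helper function are illustrative assumptions, not the study's actual pipeline.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def screen_criterion(note_text: str, criterion: str) -> str:
    """Ask the model to reason step by step, then give a final YES/NO/UNSURE."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You screen oncology patients for clinical trial eligibility."},
            {"role": "user",
             "content": (
                 f"Eligibility criterion: {criterion}\n\n"
                 f"Clinical note:\n{note_text}\n\n"
                 "Think through the relevant evidence step by step, then end with "
                 "a final line formatted exactly as 'ANSWER: YES', 'ANSWER: NO', "
                 "or 'ANSWER: UNSURE'."
             )},
        ],
        temperature=0,
    )
    return response.choices[0].message.content


# Example call with a fictional note and criterion:
# print(screen_criterion("62-year-old with p16+ oropharyngeal carcinoma...",
#                        "Histologically confirmed head and neck squamous cell carcinoma"))
```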

GPT-4 consistently outperformed GPT-3.5 across all metrics. Whereas GPT-3.5's best-performing methods achieved an accuracy of 91% and a Youden index (YI) of 0.59, GPT-4's median performance across prompting methods was notably higher, with a median accuracy of 84%, a median sensitivity of 84%, and a median specificity of 83%.

GPT-4's most effective method, the self-discover approach, yielded a superior YI of 0.73, showcasing a better balance of sensitivity and specificity. In other trials, GPT-4 maintained its lead with median accuracies of 94% and 85%, significantly surpassing GPT-3.5's median accuracies of 87% and 72% in the same contexts. Its highest scores for both accuracy and YI consistently exceeded those of GPT-3.5.
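
For context, the Youden index summarizes a classifier's balance of sensitivity and specificity as YI = sensitivity + specificity − 1, ranging from −1 to 1. Below is a minimal sketch of the calculation from a 2x2 confusion matrix; the counts are invented for illustration and are not the study's data.

```python
# Illustrative Youden index calculation from a 2x2 confusion matrix.
# The counts below are invented to match the reported median sensitivity (84%)
# and specificity (83%); they are not the study's actual counts.
def youden_index(tp: int, fn: int, tn: int, fp: int) -> float:
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1


print(round(youden_index(tp=84, fn=16, tn=83, fp=17), 2))  # 0.67
```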

When assessing patient eligibility for trial enrollment, GPT-3.5 had a median accuracy of 0.54 (95% CI, 0.50-0.61), with the structured output plus expert guidance approach achieving the best result at 0.611. While its specificity was high (median 100%), its sensitivity was very low (median 0%), indicating it was poor at identifying eligible patients.

GPT-4 performed slightly better, with a median accuracy of 0.61 (95% CI, 0.54-0.65) and a highest accuracy of 0.65 using the chain of thought plus expert approach. Similar to its predecessor, GPT-4 maintained high specificity (median 100%) but also had a low sensitivity (median 16%), showing that both models struggle to correctly identify eligible patients despite being effective at ruling out those who are ineligible.

Screening a single patient with GPT-3.5 took between 1.4 and 3 minutes at a cost of $0.02 to $0.03. In contrast, GPT-4 was significantly slower and more expensive, with screening times ranging from 7.9 to 12.4 minutes and costs from $0.15 to $0.27 per patient. The higher cost and longer processing time for GPT-4 are likely due to its increased computational demands.
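
As a rough, back-of-the-envelope extrapolation of these per-patient figures to the 74-patient cohort (the ranges come from the article; the totals below are illustrative, not study results):

```python
# Extrapolating per-patient screening time and cost (ranges from the article)
# to the 74-patient cohort. Illustrative arithmetic only, not a study result.
N_PATIENTS = 74

per_patient = {
    "GPT-3.5": {"minutes": (1.4, 3.0), "dollars": (0.02, 0.03)},
    "GPT-4": {"minutes": (7.9, 12.4), "dollars": (0.15, 0.27)},
}

for model, figures in per_patient.items():
    lo_min, hi_min = figures["minutes"]
    lo_usd, hi_usd = figures["dollars"]
    print(f"{model}: {N_PATIENTS * lo_min / 60:.1f}-{N_PATIENTS * hi_min / 60:.1f} hours, "
          f"${N_PATIENTS * lo_usd:.2f}-${N_PATIENTS * hi_usd:.2f}")
# GPT-3.5: 1.7-3.7 hours, $1.48-$2.22
# GPT-4: 9.7-15.3 hours, $11.10-$19.98
```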

A review of 42 misclassifications (21 for each model) identified 2 main types of errors. The most common issue for both models was improper processing of available information, accounting for 95% of GPT-4's errors and 71% of GPT-3.5's errors; in these cases, the model correctly identified the relevant text but misinterpreted details such as dates, locations, or clinical requirements. The second type of error, a failure to identify relevant information, occurred when the model simply failed to locate the text needed to answer the question correctly; it was more prevalent in GPT-3.5 (29% of its errors) than in GPT-4 (5%).

The study's LLM-based approach has several limitations. First, while cost-effective compared with manual screening, the use of closed-source GPT models raises concerns about ongoing costs and generalizability to open-source alternatives. Second, the system lacks metadata extraction from clinical notes, which would enable better chronological understanding. Third, generating expert guidance requires specialized domain expertise, posing a barrier to widespread adoption. Other limitations include the lack of structured data integration, limited analysis of how indexing affects outcomes, and the absence of performance data for diverse trial types. The need for human review to correct for potential LLM hallucinations is also a factor. A key limitation is that the patient sample is from a single institution with a specific documentation style, which may limit the generalizability of the findings to other health care settings and patient populations. Overall, the study's conclusions would be strengthened by external validation across a larger, more diverse set of clinical trials and institutions.

“LLM performance varies by prompt, with GPT-4 generally outperforming GPT-3.5, but at higher costs and longer processing times. LLMs should complement, not replace, manual chart reviews for matching patients to clinical trials,” study authors concluded.

References

  1. Beattie J, Owens D, Navar AM, et al. ChatGPT augmented clinical trial screening. Mach Learn Health. 2025;1(1):015005. doi:10.1088/3049-477x/adbd47
  2. Ni Y, Bermudez M, Kennebeck S, Liddy-Hicks S, Dexheimer J. A real-time automated patient screening system for clinical trials eligibility in an emergency department: design and evaluation. JMIR Med Inform. 2019;7(3):e14185. doi:10.2196/14185
  3. Khalate P, Gite S, Pradhan B, Lee CW. Advancements and gaps in natural language processing and machine learning applications in healthcare: a comprehensive review of electronic medical records and medical imaging. Front Phys. 2024;12. doi:10.3389/fphy.2024.1445204
