DeepSeek-R1 More Effective in Diagnosis, Management of Ophthalmic Subspecialties Compared With OpenAI

Key Takeaways

  • DeepSeek-R1 outperformed OpenAI o1 in diagnostic accuracy (70.4% vs. 63.0%) and management recommendations (82.7% vs. 75.8%) across ophthalmic subspecialties.
  • The study highlighted DeepSeek-R1's cost-effectiveness, with a total cost of $1.12 for 422 prompts, compared to OpenAI o1's $16.91.

DeepSeek-R1 improved diagnosis and management across ophthalmic subspecialties while also lowering operating costs.

DeepSeek-R1 outperformed OpenAI o1 in both diagnosis and management when clinical cases from different ophthalmic subspecialties were entered into the 2 models. The study, published in JAMA Ophthalmology,1 also found that DeepSeek-R1 achieved this performance at a lower operating cost.

DeepSeek outperformed OpenAI in clinical diagnosis and management of ophthalmic subspecialties | Image credit: 光画社 (Kōgasha) - stock.adobe.com

Clinicians can be aided in clinical decision-making by large language models (LLMs),2 a form of AI with natural language processing capabilities. DeepSeek-R1 is an LLM that was released in January 2025 at a lower cost than other models, such as GPT-4 from OpenAI.1 This study aimed to assess DeepSeek-R1’s diagnostic accuracy, its capacity to recommend management steps, and its cost compared with OpenAI’s o1 model, using previously published clinical cases.

The researchers conducted a cross-sectional study, with all clinical cases drawn from JAMA Ophthalmology’s Clinical Challenge articles. A total of 13 subspecialties were used to classify the cases, including retina and vitreous, pathology and tumors, pediatrics, lens and cataract, and glaucoma, among others. All data were analyzed in March 2025.

The official chat user interface was used to interact with DeepSeek-R1, whereas the application programming interface (API) was used to access OpenAI o1. Each clinical case was entered as a case description, a multiple-choice question, and the answer choices. No figures were included in the assessment.

A total of 422 clinical cases spanning 10 subspecialties were included in the analysis. The diagnostic accuracy of DeepSeek-R1 was 70.4%, which was higher than the 63.0% accuracy of OpenAI o1 (95% CI, 1.0%-13.7% difference). DeepSeek-R1 picked the correct next step in management in 82.7% of the cases, compared with a next-step accuracy of 75.8% for OpenAI o1 (95% CI, 1.4%-12.3% difference).
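As a quick check on the reported figures, the accuracy gaps can be expressed in percentage points and compared against the reported confidence intervals. The Python sketch below uses only the numbers quoted in the article; the variable and function names are illustrative assumptions, and nothing here reproduces the authors' statistical method or recomputes the intervals from raw data.

```python
# Figures as reported in the article (percentages and 95% CI bounds for the difference).
# Names such as `reported` and `summarize` are illustrative, not from the study.
reported = {
    "diagnosis": {"deepseek_r1": 70.4, "openai_o1": 63.0, "ci": (1.0, 13.7)},
    "next_step": {"deepseek_r1": 82.7, "openai_o1": 75.8, "ci": (1.4, 12.3)},
}


def summarize(entry):
    """Return the accuracy gap in percentage points and simple checks on the CI."""
    diff = round(entry["deepseek_r1"] - entry["openai_o1"], 1)
    low, high = entry["ci"]
    return {
        "difference_pp": diff,
        # An interval that does not contain 0 points to a real difference at the 95% level.
        "ci_excludes_zero": low > 0 or high < 0,
        # The point estimate should fall inside its own interval.
        "within_ci": low <= diff <= high,
    }


summary = {task: summarize(entry) for task, entry in reported.items()}
for task, stats in summary.items():
    print(task, stats)
```

Both gaps (7.4 and 6.9 percentage points) fall inside their reported intervals, and neither interval includes 0.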

DeepSeek-R1 was more accurate than OpenAI o1 in 8 of the 10 subspecialties, but this finding was not statistically significant. DeepSeek-R1 also performed better on next-step management in 9 of the 10 subspecialties. Subspecialties with fewer than 10 entries showed no significant differences between the 2 models.

The cost of using OpenAI o1 for this study amounted to $16.91 for all 422 prompts compared with $1.12 for DeepSeek-R1.
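To put the totals on a per-prompt basis, the arithmetic can be sketched in Python as follows. The only inputs are the 2 totals and the prompt count quoted above; the variable names are illustrative assumptions, and the per-prompt values are simple averages rather than figures reported by the study.

```python
# Totals as reported in the article, in US dollars, for all 422 prompts.
N_PROMPTS = 422
total_cost_usd = {"OpenAI o1": 16.91, "DeepSeek-R1": 1.12}

# Average cost per prompt for each model (a simple mean, not a reported figure).
cost_per_prompt = {model: cost / N_PROMPTS for model, cost in total_cost_usd.items()}

# How many times more expensive OpenAI o1 was than DeepSeek-R1 overall.
cost_ratio = total_cost_usd["OpenAI o1"] / total_cost_usd["DeepSeek-R1"]

for model, cost in cost_per_prompt.items():
    print(f"{model}: ${cost:.4f} per prompt")
print(f"OpenAI o1 cost about {cost_ratio:.1f} times as much as DeepSeek-R1")
```

On these figures, OpenAI o1 averaged about 4 cents per prompt and DeepSeek-R1 about a quarter of a cent, a ratio of roughly 15 to 1.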

The study had some limitations. The cost of using each model was based on the final answers produced and did not take intermediate output into account, which underestimated the costs of both models. Ophthalmic images were excluded from the cases because of DeepSeek-R1’s weakness in evaluating images. There were fewer than 10 cases for 2 of the subspecialties, which limited the comparison of diagnostic accuracy between the 2 models.

The authors concluded that DeepSeek-R1 was more accurate than OpenAI o1 in diagnosing complex ophthalmology cases across subspecialties.

“Nevertheless, addressing the risks of hallucinations, ensuring robust multimodal integration, and validating safety constraints will be essential for translating DeepSeek-R1’s capabilities into mainstream practice,” the authors wrote.

References

1. Mikhail D, Farah A, Milad, et al. DeepSeek-R1 vs OpenAI o1 for ophthalmic diagnoses and management plans. JAMA Ophthalmol. Published online September 4, 2025. doi:10.1001/jamaophthalmol.2025.2918

2. Gaige M. Open-source AI matches top proprietary LLM in solving tough medical cases. Harvard Medical School. March 14, 2025. Accessed September 8, 2025. https://hms.harvard.edu/news/open-source-ai-matches-top-proprietary-llm-solving-tough-medical-cases
