Comparative analysis of ChatGPT and Gemini responses on epistaxis: Accuracy and readability

Authors

  • Zortuk O, Department of Emergency Medicine, Bandırma State Hospital, Balıkesir, Turkey
  • Bedel C, Department of Emergency Medicine, Health Science University Antalya Training and Research Hospital, Antalya, Turkey
  • Selvi F, Department of Emergency Medicine, Bandırma State Hospital, Balıkesir, Turkey

DOI:

https://doi.org/10.38029/babcockuniv.med.j.v8i2.1085

Keywords:

ChatGPT, Gemini, Readability, Epistaxis, AI in healthcare, Patient education

Abstract

Objective: There is a dearth of literature addressing the utilisation of artificial intelligence (AI) models in patients with epistaxis, and ambiguities exist in the responses these models provide to patient-generated inquiries. This study aimed to evaluate and compare the accuracy and readability of responses to frequently asked questions (FAQs) about epistaxis provided by two advanced AI models: ChatGPT-4 Pro and Gemini 2.5 Pro.

Methods: A total of 30 commonly asked questions about epistaxis were retrieved from the publicly accessible Quora platform and submitted separately to ChatGPT and Gemini. Two independent medical experts evaluated the AI-generated responses on a 5-point scale, focusing on accuracy and comprehensibility. Readability was assessed using multiple indices, including the Flesch Reading Ease, Gunning Fog, and SMOG indices, among others. Statistical analyses, including interobserver agreement and t-tests, were conducted using SPSS v27.
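For readers who wish to reproduce this kind of scoring, the readability indices named in the Methods are all available in the open-source Python package textstat. The sketch below is illustrative only: it is not the authors' analysis pipeline, and the sample text is a hypothetical patient-education sentence, not study data.

```python
# Illustrative sketch (not the authors' pipeline): computing the
# readability indices named in the Methods with the open-source
# `textstat` package (pip install textstat). The sample text below
# is hypothetical, not taken from the study data.
import textstat

response = (
    "Pinch the soft part of your nose firmly for ten minutes and lean "
    "slightly forward. Avoid blowing your nose for several hours "
    "afterwards. Seek medical care if the bleeding does not stop."
)

print("Flesch Reading Ease :", textstat.flesch_reading_ease(response))
print("Gunning Fog         :", textstat.gunning_fog(response))
print("SMOG                :", textstat.smog_index(response))
print("Linsear Write       :", textstat.linsear_write_formula(response))
```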

Results: The mean evaluation scores from the two observers were 4.18 ± 0.85 and 4.01 ± 0.83, respectively, with excellent interobserver agreement (ICC = 0.877, p < 0.001). ChatGPT scored slightly higher (4.18 ± 0.66) than Gemini (4.01 ± 0.91), though the difference was not statistically significant (p = 0.179). When readability metrics were compared between the two models, no significant difference was found in any parameter except the Linsear Write Formula grade level (14.17 ± 4.56 vs. 10.35 ± 3.77, p < 0.001).
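As a hedged illustration of the statistics reported here, the interobserver ICC and the between-model comparison can be computed in Python with pingouin and SciPy. The scores below are randomly generated placeholders (the study itself analysed real expert ratings in SPSS v27), and since the abstract says only "t-tests", the use of a paired test is an assumption.

```python
# Illustrative sketch of the reported analyses using random placeholder
# scores (the study itself used SPSS v27 on real expert ratings).
import numpy as np
import pandas as pd
import pingouin as pg          # pip install pingouin
from scipy import stats

rng = np.random.default_rng(42)
n = 30                          # 30 questions per model, 60 responses total

# Interobserver agreement: two observers rate all 60 responses (long format).
ratings = pd.DataFrame({
    "response": np.tile(np.arange(2 * n), 2),
    "observer": np.repeat(["observer1", "observer2"], 2 * n),
    "score":    rng.normal(4.1, 0.8, 4 * n).clip(1, 5),
})
icc = pg.intraclass_corr(data=ratings, targets="response",
                         raters="observer", ratings="score")
print(icc[["Type", "ICC", "pval"]])

# Model comparison: paired t-test across the 30 shared questions
# (the abstract does not state whether the t-test was paired).
chatgpt = rng.normal(4.18, 0.66, n).clip(1, 5)
gemini  = rng.normal(4.01, 0.91, n).clip(1, 5)
t_stat, p_val = stats.ttest_rel(chatgpt, gemini)
print(f"t = {t_stat:.2f}, p = {p_val:.3f}")
```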

Conclusion: Both ChatGPT and Gemini provided highly accurate and readable responses to questions about epistaxis. These results indicate that AI-based tools can effectively support patient education and clinical communication. However, ongoing attention to content readability and regular re-evaluation remain necessary.

Published

2025-12-31

How to Cite

Zortuk, O., Bedel, C., & Selvi, F. (2025). Comparative analysis of ChatGPT and Gemini responses on epistaxis: Accuracy and readability. Babcock University Medical Journal, 8(2), 428–434. https://doi.org/10.38029/babcockuniv.med.j.v8i2.1085

Issue

Vol. 8 No. 2 (2025)

Section

Research Article