ASOHNS ASM 2026
Paediatric Tonsillitis: Evaluation of AI Generated Answers vs Guidelines to Commonly Asked Questions
Poster
Themes: ASOHNS
Description

Institution: Nepean Hospital - NSW, Australia

Background: Artificial intelligence (AI) is becoming increasingly prevalent as a medical tool for patients. This study investigates the performance of ChatGPT and Google's Gemini on paediatric tonsillitis vignettes, comparing their answers with recognised international guidelines and patient-centred online resources for accuracy and quality.

Methodology: Five commonly asked questions about paediatric tonsillitis were selected by four ENT surgeons and posed to ChatGPT and Gemini. Patient resources and medical guidelines were also searched for relevant answers. Answers were ranked using the Appropriateness of Preclinical Measures (APM) score. Readability was assessed for each answer using the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL).

Results: Mean APM scores for ChatGPT (no prompt) and Gemini were 3.90 (± 0.31) and 3.15 (± 1.60) respectively; neither was superior (p = 0.33). When prompted to act as an experienced ENT surgeon, ChatGPT's mean APM score increased to 4.00 (± 0.00). This was significantly higher than the paediatric tonsillectomy guidelines published by the American Academy of Otolaryngology–Head and Neck Surgery (AAO-HNS) and the Royal Australasian College of General Practitioners (RACGP) (both with mean APM scores of 3.10 (± 1.70); p < 0.02). There was no significant difference between the two ChatGPT conditions (p = 0.99). Patient-facing resources were the most readable (FRES ~61; FKGL 8.60–8.70; grade 8–9 level). The AAO-HNS guidelines were the least readable (FRES 46.3 ± 9.29; FKGL ≈ 10.8). All AI outputs exceeded recommended complexity levels.

Conclusion: These results suggest that AI systems may be equivalent to guidelines or online resources and show potential to aid the public. However, responses may lack the depth required for thorough understanding. The quality and accuracy of AI answers need improvement before they can be trusted as a primary information resource for patients.
Authors

Dr Surya Singh, Dr Femi Ayeni, Dr Anand Suruliraj, Dr Niranjan Sritharan, A/Prof Faruque Riffat, Dr Suchitra Paramaesvaran