ASOHNS ASM 2026
Multimodal Artificial Intelligence (AI) for Otoscopic Diagnosis: Combining Otoscopic Images with Tympanometry and Audiometry to Classify Middle Ear Disease
Verbal Presentation

2:12 pm

20 March 2026

Grand Ballroom 1

Concurrent Session 1B - General Otology

Talk Description

Institution: Westmead Hospital, NSW, Australia

Aims: Artificial intelligence (AI) image classification models can detect paediatric middle ear disease from otoscopic images; however, they are limited by their inability to interpret complementary tympanometry and audiometry. A fusion model combining large language model (LLM) interpretation of clinical data with image classification can fill this gap and provide a more comprehensive diagnostic tool.

Methodology: Otoscopic images, tympanometry and audiogram results were collected prospectively by nursing staff as part of routine primary care assessments. A cohort of 80 encounters was analysed. Tympanometry and audiograms were processed by four LLMs (GPT-4, Claude 3.5 Sonnet, Grok-4, DeepSeek V2), each paired with an otoscopic image-only classification algorithm, to establish a multimodal fusion model. Diagnostic outputs from each LLM–image system were generated and compared against ground truth, defined by a panel of 13 otolaryngologists, for the multi-class classification of normal, acute otitis media, otitis media with effusion, and chronic otitis media.

Results: The image-only classifier achieved an accuracy of 87.3%, AUC=0.959, and κ=0.83. When combined with LLMs, overall diagnostic performance improved across all hybrid configurations. The top fusion model (with GPT-4) reached 92.6% accuracy (AUC=0.962, κ=0.90), followed by Claude 3.5 Sonnet (accuracy=92.5%, AUC=0.963, κ=0.90), Grok-4 (accuracy=92.4%, AUC=0.963, κ=0.90), and DeepSeek V2 (accuracy=88.9%, AUC=0.955, κ=0.85). All four hybrid fusion models showed significantly higher diagnostic accuracy and inter-rater agreement than the image-only classifier (p<0.05 for all comparisons).

Conclusion: Multimodal integration of LLM-interpreted tympanometry and audiograms with otoscopic image classification significantly improves diagnostic performance in middle ear disease. This marks a major step towards AI systems that reflect real-world clinical reasoning, synthesising multiple tiers of imaging and clinical data.
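The abstract does not state how the image-classifier and LLM outputs were combined. As a minimal sketch only, in Python, assuming each component emits a probability distribution over the four diagnostic classes and using an illustrative late-fusion weight (the function name, class ordering, and weight are hypothetical, not from the study):

    import numpy as np

    CLASSES = ["normal", "acute otitis media",
               "otitis media with effusion", "chronic otitis media"]

    def fuse_predictions(image_probs, llm_probs, image_weight=0.6):
        # Weighted late fusion of the two modality-specific probability
        # vectors; image_weight is illustrative, not reported in the study.
        fused = (image_weight * np.asarray(image_probs)
                 + (1 - image_weight) * np.asarray(llm_probs))
        return CLASSES[int(np.argmax(fused))], fused

    # Toy example: the image model leans towards otitis media with effusion
    # and the LLM (reading, say, a flat type-B tympanogram) agrees.
    image_probs = [0.10, 0.15, 0.55, 0.20]
    llm_probs = [0.05, 0.10, 0.70, 0.15]
    label, fused = fuse_predictions(image_probs, llm_probs)
    print(label, fused)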
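The reported metrics (accuracy, AUC, Cohen's κ) can be computed from model outputs with scikit-learn. A small sketch on toy data, assuming integer panel labels and per-class probabilities; the abstract does not specify the AUC averaging convention, so a macro one-vs-rest average is assumed here:

    import numpy as np
    from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                                 roc_auc_score)

    # y_true: integer class labels from the expert panel (toy data).
    # y_prob: (n_samples, 4) per-class probabilities from a fusion model.
    y_true = np.array([0, 2, 2, 1, 3, 0])
    y_prob = np.array([[0.80, 0.10, 0.05, 0.05],
                       [0.10, 0.20, 0.60, 0.10],
                       [0.05, 0.15, 0.70, 0.10],
                       [0.20, 0.50, 0.20, 0.10],
                       [0.10, 0.10, 0.20, 0.60],
                       [0.70, 0.10, 0.10, 0.10]])
    y_pred = y_prob.argmax(axis=1)

    print("accuracy:", accuracy_score(y_true, y_pred))
    print("kappa:   ", cohen_kappa_score(y_true, y_pred))
    # Multi-class AUC, one-vs-rest with macro averaging (assumed).
    print("AUC:     ", roc_auc_score(y_true, y_prob, multi_class="ovr"))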
Authors

Dr Justin Eltenn, Dr Al-Rahim Habib, Dr Tony Lian, Professor Narinder Singh