A Hybrid CNN–Vision Transformer and Explainable AI Framework for Real-Time Retinal Disease Diagnosis in IoT-Enabled and Ubiquitous Healthcare Systems
Raghad Saleem Mohamed Najeeb, Department of Software, College of Computer Science and Mathematics, University of Mosul, Ninevah, Iraq. Email: raghad.saleem@uomosul.edu.iq. ORCID: 0009-0007-1434-2342
Shatha Abdullah Mohammed, Department of Software, College of Computer Science and Mathematics, University of Mosul, Ninevah, Iraq. Email: shathaabdullah@uomosul.edu.iq. ORCID: 0000-0002-3098-0519
Mohammed F Ibrahim Alsarraj, Technical Engineering College for Computer and AI–Mosul, Northern Technical University, Mosul, Iraq. Email: mohammed_alsarraj@ntu.edu.iq. ORCID: 0000-0001-7886-4464
Eye diseases such as diabetic retinopathy are a major cause of preventable vision loss, so automated screening solutions must be both accurate and efficient. However, existing automated diabetic retinopathy screening models suffer from limited cross-dataset generalization, a lack of clinical interpretability, and high computational complexity, which restrict their deployment in real-time, mobile, and IoT-enabled healthcare settings. This study proposes a Modified Artificial Intelligence (MAI) approach that combines a hybrid convolutional neural network and vision transformer (CNN-ViT) architecture with stacked autoencoder-based feature enhancement and Grad-CAM-based explainability. The approach was evaluated on publicly accessible retinal fundus datasets and compared against conventional CNN, ResNet, and EfficientNet baselines. The results show that the proposed MAI achieves 98.2% accuracy and an area under the receiver operating characteristic curve (AUC) of 0.992, at a low inference latency compatible with real-time deployment. Grad-CAM-based visual explanations further enhance clinical interpretability. These findings indicate that the MAI framework is a valid and efficient solution for automated diagnosis of retinal diseases in mobile and IoT-based healthcare environments.
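For readers less familiar with the two headline metrics reported above, the following is a minimal sketch of how accuracy and the area under the ROC curve can be computed for a binary screening task. It is an illustration only and not the evaluation code used in this study: the function names `accuracy` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions introduced here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the reference labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half). Quadratic in sample count; fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT results from the study):
# 1 = disease present, 0 = healthy; scores are model probabilities.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"accuracy = {accuracy(labels, preds):.3f}")
print(f"AUC      = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen decision threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why screening studies commonly report both.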