Early Detection of Multilingual Mental Health Depression Using Pretrained Transformers and Machine Learning
Ali Sami Azeez, Department of Information Technology Management, Technical College of Management, Middle Technical University, Baghdad, Iraq. ali.sami@mtu.edu.iq. ORCID: 0000-0003-3433-7
Osama Abduljaleel Ali, Computer Center, Al-Muthanna University, Al-Muthanna, Iraq. osama@mu.edu.iq. ORCID: 0000-0002-0711-5025
Nawar Abbood Fadhil, Department of Information Technology Management, Technical College of Management, Middle Technical University, Baghdad, Iraq. nawar@mtu.edu.iq. ORCID: 0000-0002-7741-2965
Dr. Ali Mohammed Sahan, Department of Information Technology Management, Technical College of Management, Middle Technical University, Baghdad, Iraq. dralimohammed2@gmail.com. ORCID: 0000-0001-5161-4756
Keywords: Multilingual Depression Detection, Mental Health Analytics, Social Media Mining, Transformer Models, XLM-RoBERTa, Machine Learning, Natural Language Processing, Digital Mental Health.
Abstract
Social media platforms produce vast amounts of user-generated text, which can serve as a valuable signal for early mental health screening. This paper develops a scalable, multilingual depression classifier based on classical machine learning (ML) methods and state-of-the-art pretrained transformer models, addressing the limitations of the language-specific and binary-only approaches used in previous studies. In contrast to most prior work, this study systematically explores bilingual and multilingual depression recognition across Arabic, English, Russian, and Spanish data within a single pipeline. TF-IDF is used to represent textual features for conventional ML classifiers, such as SVM, Random Forest, Naive Bayes, and AdaBoost, while transformers such as XLM-RoBERTa and XLNet learn contextual semantic representations. Extensive experiments demonstrate that transformer-based models consistently outperform traditional ML models. XLM-RoBERTa achieved 94.33% accuracy, a 0.94 F1-score, and a 0.99 AUC, surpassing SVM (93% accuracy) and substantially outperforming XLNet (72.36% accuracy). In single-language tests, XLM-RoBERTa achieved 99.5% accuracy on Russian, 98% on English, 96% on Arabic, and 85.9% on Spanish, demonstrating robustness across languages. These findings highlight the effectiveness of pretrained multilingual transformers in identifying subtle cases of depression, offering a reliable, language-independent approach to early depression screening in real-world digital mental-health monitoring systems.
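To make the classical baseline concrete, the sketch below shows a TF-IDF representation feeding a linear SVM, as in the pipeline the abstract describes. This is a minimal illustration using scikit-learn with toy placeholder texts and labels, not the paper's actual dataset, preprocessing, or hyperparameters.

```python
# Hedged sketch: TF-IDF features + linear SVM, one of the classical ML
# baselines named in the abstract. Texts/labels are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "I feel hopeless and tired every day",
    "Had a great time with friends today",
    "Nothing matters anymore and I cannot sleep",
    "Excited about my new project at work",
]
labels = [1, 0, 1, 0]  # 1 = depression-indicative, 0 = control (toy labels)

# Unigram+bigram TF-IDF vectors feed a linear-kernel SVM classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)

preds = clf.predict(["I am so exhausted and empty"])
print(preds)
```

The same pipeline object can be swapped to Random Forest, Naive Bayes, or AdaBoost by replacing the final estimator; the transformer models (XLM-RoBERTa, XLNet) instead learn contextual embeddings end to end rather than relying on sparse TF-IDF counts.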