A Unified Framework for Real-Time Intrusion Detection and Mitigation Using Apache Spark
Ammar Ahmed Abdullah, Assistant Lecturer, Department of Computer Science, College of Computer Science and Mathematics, University of Mosul, Mosul, Iraq. Email: ammar.20ba1175@student.uomosul.edu.iq. ORCID: 0009-0004-2110-2767
Dr. Dhuha Basheer Abdullah, Professor, Department of Computer Science, College of Computer Science and Mathematics, University of Mosul, Mosul, Iraq. Email: prof.dhuha_basheer@uomosul.edu.iq. ORCID: 0000-0001-9003-2943
Traditional intrusion detection systems (IDS) struggle with the growing volume and complexity of network data. Detecting cyber events in real time and monitoring big data systems are further challenges. Together, these challenges motivate a new IDS for software-defined networks (SDN) that detects, classifies, and mitigates threats in data streams with low latency. This paper presents the first fully integrated SDN architecture that combines real-time machine learning with automated, instantaneous threat response, built on a unified Apache Spark framework. The framework is evaluated on a synthetic dataset of 10 million records that simulates a structured traffic stream. On this stream, four classifiers (Decision Tree, Logistic Regression, Random Forest, and Multilayer Perceptron) are trained to recognize and respond to threats. Benchmarks against single-node Spark implementations validate the framework. The Spark Logistic Regression model achieved 99.96% accuracy while processing more than 640,000 records per second, a 2× speedup over the classical implementation. The Spark Random Forest model achieved a 40× speedup with the same AUC. All models ran within an 11 GB memory limit, showing that the framework suits today's commodity hardware. Steady-state tests confirmed the system's defensive capability: 99.9% of attacks were blocked, and SDN controller CPU utilization dropped by 42%. The framework offers a unified architecture, strong performance on commodity hardware, and real-time telemetry, fairness, and explainability features. Together, these demonstrate its effectiveness on telemetry data and fill gaps in the state of the art.
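As an illustration only, the detect-classify-mitigate loop summarized above can be sketched in plain Python. This is not the authors' Spark implementation. The `FlowRecord` fields are hypothetical. The hand-set `toy_classifier` rule is a hypothetical stand-in for a trained model such as the Logistic Regression classifier. The `BlockList` class is a hypothetical stand-in for the SDN controller's flow-rule table. The sketch shows one plausible design choice, assumed here: the loop skips sources that are already blocked, so repeated detections do not generate redundant controller work.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical per-flow features; the paper's actual schema is not given here.
    src_ip: str
    packets_per_sec: float
    syn_ratio: float


def toy_classifier(r: FlowRecord) -> bool:
    # Hypothetical hand-set rule standing in for a trained classifier:
    # flags high-rate flows dominated by SYN packets as attacks.
    return r.packets_per_sec > 1000 and r.syn_ratio > 0.8


class BlockList:
    """Hypothetical stand-in for the SDN controller's flow-rule table."""

    def __init__(self) -> None:
        self.blocked: Set[str] = set()

    def install_drop_rule(self, src_ip: str) -> bool:
        # Returns True only when a new drop rule is installed.
        if src_ip in self.blocked:
            return False
        self.blocked.add(src_ip)
        return True


def process_batch(
    batch: Iterable[FlowRecord],
    classify: Callable[[FlowRecord], bool],
    rules: BlockList,
) -> int:
    """Classify one micro-batch and install drop rules; returns rules installed."""
    installed = 0
    for r in batch:
        if r.src_ip in rules.blocked:
            continue  # source already mitigated; skip classification entirely
        if classify(r) and rules.install_drop_rule(r.src_ip):
            installed += 1
    return installed
```

In a Spark deployment, the per-record loop would be replaced by batched model inference over a streaming DataFrame. Rule installation would go through the controller's own API, which this sketch does not model.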