Volume 11 - Issue 4
Towards Detecting and Classifying Malicious URLs Using Deep Learning
- Clayton Johnson
Colorado Mesa University, Grand Junction, CO 81501, USA
cpjohnson@mavs.coloradomesa.edu
- Bishal Khadka
Colorado Mesa University, Grand Junction, CO 81501, USA
bkhadka@mavs.coloradomesa.edu
- Ram B. Basnet
Colorado Mesa University, Grand Junction, CO 81501, USA
rbasnet@coloradomesa.edu
- Tenzin Doleck
University of Southern California, Los Angeles, CA 90007, USA
doleck@usc.edu
Keywords: Malicious URLs, Phishing URLs, Deep Learning, Web Security, Machine Learning
Abstract
Emails containing Uniform Resource Locators (URLs) pose substantial risks to organizations, potentially
compromising both credentials and network security through general and spear-phishing
campaigns targeting their employees. The detection and classification of malicious URLs are important
research problems with practical applications. With an appropriate machine learning model, an organization
can protect itself by filtering incoming emails and the websites its employees visit
based on the maliciousness of the URLs they contain. In this work, we compare
the performance of traditional machine learning algorithms, such as Random Forest, CART,
and kNN, against models built with popular deep learning frameworks, such as Fast.ai and Keras-TensorFlow,
across CPU, GPU, and TPU architectures. Using the publicly available ISCX-URL-2016 dataset,
we present the models' performances across binary and multiclass classification experiments. By
collecting accuracy and timing metrics, we find that the Random Forest, Keras-TensorFlow, and Fast.ai
models performed comparably, achieving the highest accuracies (> 96%) in both the detection and
classification of malicious URLs, with Random Forest emerging as the preferable model given time, performance,
and complexity constraints. Additionally, using feature ranking and selection techniques,
we determine that the top 5-10 features yield the best performance, outperforming models trained on all the
features provided in the dataset.
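The pipeline summarized in the abstract — train a Random Forest classifier, rank features by importance, and retrain on only the top-ranked features — can be sketched as follows. This is a minimal illustration using scikit-learn with synthetic stand-in data, not the ISCX-URL-2016 dataset or the authors' actual implementation; the feature matrix, labels, and top-5 cutoff here are assumptions for demonstration only.

```python
# Hedged sketch: Random Forest detection plus importance-based feature
# selection, mirroring the approach described in the abstract.
# The data below is synthetic; real experiments would load lexical
# URL features from the ISCX-URL-2016 dataset instead.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_samples, n_features = 1000, 20

# Synthetic feature matrix standing in for lexical URL statistics
# (e.g., URL length, digit count, special-character ratio).
X = rng.random((n_samples, n_features))
# Synthetic binary label (malicious vs. benign) that depends only on
# the first two features, so feature ranking should surface them.
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Baseline: train on all features.
clf_all = RandomForestClassifier(n_estimators=100, random_state=0)
clf_all.fit(X_tr, y_tr)
acc_all = accuracy_score(y_te, clf_all.predict(X_te))

# Rank features by impurity-based importance and keep the top 5.
top5 = np.argsort(clf_all.feature_importances_)[::-1][:5]

# Retrain using only the top-ranked features.
clf_top5 = RandomForestClassifier(n_estimators=100, random_state=0)
clf_top5.fit(X_tr[:, top5], y_tr)
acc_top5 = accuracy_score(y_te, clf_top5.predict(X_te[:, top5]))

print(f"all features: {acc_all:.3f}, top-5 features: {acc_top5:.3f}")
```

On real URL data, the same comparison of accuracy and training time across the full and reduced feature sets underlies the abstract's finding that a small subset of top-ranked features suffices.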