Volume 12 - Issue 4
Behaviour-based Malware Detection in Mobile Android Platforms Using Machine Learning Algorithms
- Andre Prata Ferreira
Instituto de Telecomunicacoes, Universidade da Beira Interior, Covilha, Portugal
D1569@ubi.pt
- Chetna Gupta
Centro de Competencias em Cloud Computing, Universidade da Beira Interior, Covilha, Portugal
chetna.gupta@ubi.pt
- Pedro R. M. Inacio
Instituto de Telecomunicacoes, Universidade da Beira Interior, Covilha, Portugal; Centro de Competencias em Cloud Computing, Universidade da Beira Interior, Covilha, Portugal
inacio@di.ubi.pt
- Mario M. Freire
Instituto de Telecomunicacoes, Universidade da Beira Interior, Covilha, Portugal; Centro de Competencias em Cloud Computing, Universidade da Beira Interior, Covilha, Portugal
mario@di.ubi.pt
Keywords: Behaviour-Based Malware Detection, Static Malware Detection, Android Platforms, Machine Learning Algorithms
Abstract
During the last few years, several approaches have been proposed for detecting Android malware
apps, each usually using its own dataset. Generating a representative Android malware dataset to
evaluate malware detection approaches is a challenging task. Recently, the Canadian Institute for Cybersecurity
released the CICAndMal2017 dataset, which includes recent and sophisticated Android
samples spanning five distinct categories: Adware, Ransomware, SMS malware, Scareware,
and Benign. The best classification result reported for this dataset was a Precision of 95.3%,
achieved with the Random Forest algorithm, using Permissions and Intents as static features. In this
paper, we investigate the use of nine machine learning algorithms to classify malware in this
dataset. The results are compared with those obtained with
Random Forest, in terms of both classification performance (Precision, Recall, F-Measure, and
Accuracy) and resource usage (execution time and CPU and memory consumption). In addition,
we investigate the use of a non-sliding Bag of System Calls algorithm with these
machine learning algorithms. We show that the AdaBoost algorithm, using the Random
Forest as a base estimator, leads to the best classification results with an Accuracy of 98.24%, a Precision
of 99.31% (for malware), and an F1-Measure of 95.05% (for malware), at the cost of a longer
execution time than Random Forest.
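
The abstract does not spell out the non-sliding Bag of System Calls representation, so the following is only a minimal sketch of one plausible reading: a system call trace is cut into consecutive, non-overlapping windows, and each window becomes a vector of call counts over a fixed vocabulary. The function name, the vocabulary, the window size, and the sample trace are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def bag_of_system_calls(
    trace: Sequence[str],
    vocabulary: Sequence[str],
    window_size: int,
) -> List[List[int]]:
    """Count system calls per non-overlapping window of a trace (illustrative only)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    # Map each known system call name to its column in the count vector.
    index = {name: i for i, name in enumerate(vocabulary)}
    bags = []
    # Stepping by window_size means consecutive windows never overlap (non-sliding).
    for start in range(0, len(trace), window_size):
        counts = Counter(trace[start:start + window_size])
        row = [0] * len(vocabulary)
        for name, n in counts.items():
            if name in index:  # assumption: calls outside the vocabulary are ignored
                row[index[name]] = n
        bags.append(row)
    return bags


# Hypothetical vocabulary and trace, for illustration only.
VOCAB = ["open", "read", "write", "close"]
TRACE = ["open", "read", "read", "write", "close", "open", "read"]
BAGS = bag_of_system_calls(TRACE, VOCAB, window_size=3)
```

Each row of `BAGS` could then serve as one feature vector for a classifier. A sliding variant would instead advance the window by fewer calls than its length, producing overlapping windows and more, correlated, vectors per trace.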