JoWUA

Volume 12 - Issue 4

Behaviour-based Malware Detection in Mobile Android Platforms Using Machine Learning Algorithms

Andre Prata Ferreira Instituto de Telecomunicacoes, Universidade da Beira Interior, Covilha, Portugal
D1569@ubi.pt
Chetna Gupta Centro de Competencias em Cloud Computing, Universidade da Beira Interior, Covilha, Portugal
chetna.gupta@ubi.pt
Pedro R. M. Inacio Instituto de Telecomunicacoes, Universidade da Beira Interior, Covilha, Portugal, Centro de Competencias em Cloud Computing, Universidade da Beira Interior, Covilha, Portugal
inacio@di.ubi.pt
Mario M. Freire Instituto de Telecomunicacoes, Universidade da Beira Interior, Covilha, Portugal, Centro de Competencias em Cloud Computing, Universidade da Beira Interior, Covilha, Portugal
mario@di.ubi.pt

DOI: 10.22667/JOWUA.2021.12.31.062

Keywords: Behaviour-Based Malware Detection, Static Malware Detection, Android Platforms, Machine Learning Algorithms

Abstract

During the last few years, several approaches have been proposed for the detection of Android malware apps, each usually using its own dataset. Generating a representative Android malware dataset to evaluate malware detection approaches is a challenging task. Recently, the Canadian Institute for Cybersecurity released the CICAndMal2017 dataset, which includes recent and sophisticated Android samples spanning five distinct categories: Adware, Ransomware, SMS malware, Scareware, and Benign. The best classification result reported for this dataset was a Precision of 95.3%, achieved with the Random Forest algorithm using Permissions and Intents as static features. In this paper, we investigate the use of nine machine learning algorithms to classify malware in this dataset. We compare the results with those obtained with Random Forest in terms of classification performance (Precision, Recall, F-Measure, and Accuracy) and resource usage (execution time and CPU and memory consumption). We also investigate the use of a non-sliding Bag of System Calls algorithm with the same machine learning algorithms. We show that the AdaBoost algorithm, using Random Forest as a base estimator, leads to the best classification results, with an Accuracy of 98.24%, a Precision of 99.31% (for malware), and an F1-Measure of 95.05% (for malware), at the cost of a longer execution time than Random Forest.
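To make the Bag of System Calls idea mentioned in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that "non-sliding" means the system call trace is cut into consecutive, non-overlapping windows, and that each window is represented by the number of times each known system call occurs in it. The function names, the vocabulary, and the trace are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Sequence


def bag_of_system_calls(trace: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Return one count per vocabulary entry; calls outside the vocabulary are ignored."""
    counts = Counter(trace)
    return [counts[name] for name in vocabulary]


def non_sliding_bags(
    trace: Sequence[str], vocabulary: Sequence[str], window: int
) -> List[List[int]]:
    """Split the trace into consecutive, non-overlapping windows (an assumption
    about what "non-sliding" means) and build one bag per window."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        bag_of_system_calls(trace[start:start + window], vocabulary)
        for start in range(0, len(trace), window)
    ]


# Hypothetical vocabulary and trace, for illustration only.
VOCABULARY = ["open", "read", "write", "sendto"]
TRACE = ["open", "read", "read", "write", "sendto", "read", "ioctl"]

whole_trace_bag = bag_of_system_calls(TRACE, VOCABULARY)
windowed_bags = non_sliding_bags(TRACE, VOCABULARY, window=3)
```

Each resulting count vector could then serve as a feature vector for a classifier such as the ones compared in the paper. The window size and the vocabulary would have to be chosen for the dataset at hand.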

Date

December 2021

Page Number

62-88