Volume 4 - Issue 1
Comparative Analysis of Voting Schemes for Ensemble-based Malware Detection
- Raja Khurram Shahzad
School of Computing, Blekinge Institute of Technology, SE-37179, Karlskrona, Sweden
rks@bth.se
- Niklas Lavesson
School of Computing, Blekinge Institute of Technology, SE-37179, Karlskrona, Sweden
niklas.lavesson@bth.se
Keywords: Malware detection, scareware, veto voting, feature extraction, classification, majority voting, ensemble, trust, malicious software
Abstract
Malicious software (malware) represents a threat to the security and the privacy of computer users.
Traditional signature-based and heuristic-based methods are inadequate for detecting some forms of
malware. This paper presents a malware detection method based on supervised learning. The main
contributions of the paper are two ensemble learning algorithms, two pre-processing techniques, and
an empirical evaluation of the proposed algorithms. Sequences of operational codes are extracted
as features from malware and benign files. These sequences are used to create three different data
sets with different configurations. A set of learning algorithms is evaluated on the data sets. The
predictions from the learning algorithms are combined by an ensemble algorithm. The predicted
outcome of the ensemble algorithm is decided on the basis of voting. The experimental results show
that the veto approach can accurately detect both novel and known malware instances with higher
recall than majority voting; however, the precision of veto voting is lower than that of
majority voting. Veto voting is further extended to trust-based veto voting. A comparison of
majority voting, veto voting, and trust-based veto voting is performed. The experimental
results indicate the suitability of each voting scheme for detecting a particular class of software. The
experimental results for the composite F1-measure indicate that majority voting is slightly better
than trust-based veto voting, while trust-based veto voting is significantly better than veto voting.
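
The difference between the two basic voting schemes compared above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the string labels, the tie-breaking rule, and the reading of "veto" as "a single malware vote overrides all benign votes" are assumptions made for illustration. Trust-based veto voting is omitted because the abstract does not specify how trust is computed.

```python
from collections import Counter

# Hypothetical class labels used only for this illustration.
MALWARE, BENIGN = "malware", "benign"


def majority_vote(predictions):
    """Return the label predicted by most base classifiers.

    Assumption: ties are resolved in favour of MALWARE.
    """
    counts = Counter(predictions)
    return MALWARE if counts[MALWARE] >= counts[BENIGN] else BENIGN


def veto_vote(predictions):
    """Return MALWARE if any base classifier predicts MALWARE.

    Assumption: one malware vote vetoes every benign vote.
    """
    return MALWARE if MALWARE in predictions else BENIGN


# Three base classifiers disagree on one file.
votes = [BENIGN, BENIGN, MALWARE]
print(majority_vote(votes))  # benign
print(veto_vote(votes))      # malware
```

Under these assumptions, a veto ensemble flags a file whenever any single classifier does, which is consistent with the reported pattern of higher recall and lower precision relative to majority voting.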