A Framework for Identifying Obfuscation Techniques applied to Android Apps using Machine Learning
Malicious app writers tend to employ code obfuscation techniques to prevent their malicious code from being easily reverse engineered and analyzed. To analyze malicious Android apps effectively, it is necessary to identify which code obfuscation techniques have been applied to them. Existing studies have devised approaches that identify obfuscation at the app level. However, recent obfuscators can apply different obfuscation techniques on a class-by-class basis rather than on a per-app basis. In such cases, app-level obfuscation identification may be ineffective. In this paper, we propose a new framework that identifies the obfuscation technique applied to each class of an Android app. The proposed framework vectorizes the decompiled code of each class using a paragraph vector. The resulting vectors are then fed to a machine learning classifier, which identifies the obfuscation technique applied to each class. We use four machine learning classifiers (Random Forest, AdaBoost, Extra Trees, and Linear SVM) and compare their performance for each obfuscation technique.
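The class-level flow described above (decompiled class code, then a fixed-length vector, then one predicted technique per class) can be illustrated with a minimal, dependency-free sketch. It is not the framework itself. A normalized bag-of-tokens vector stands in for the paragraph vector, and a nearest-centroid rule stands in for the four classifiers. The training snippets, class names, and technique labels (`identifier_renaming`, `string_encryption`, `control_flow`) are hypothetical and chosen only for illustration.

```python
import math
import re


def tokenize(code):
    """Split decompiled code into identifier-like tokens."""
    return re.findall(r"[A-Za-z_]\w*", code)


def build_vocab(samples):
    """Assign an index to every token seen in the training snippets."""
    vocab = {}
    for code, _ in samples:
        for tok in tokenize(code):
            vocab.setdefault(tok, len(vocab))
    return vocab


def embed(code, vocab):
    """L2-normalized bag-of-tokens vector (stand-in for a paragraph vector)."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(code):
        if tok in vocab:  # tokens unseen during training are ignored
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def train(samples):
    """Return the vocabulary and one mean vector (centroid) per label."""
    vocab = build_vocab(samples)
    sums, counts = {}, {}
    for code, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, x in enumerate(embed(code, vocab)):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}
    return vocab, centroids


def classify_app(classes, vocab, centroids):
    """Predict one label per class, rather than one label for the whole app."""
    result = {}
    for name, code in classes.items():
        vec = embed(code, vocab)
        # Pick the label whose centroid has the largest dot product with vec.
        result[name] = max(
            centroids,
            key=lambda lab: sum(a * b for a, b in zip(vec, centroids[lab])),
        )
    return result


# Hypothetical labeled snippets, one per assumed obfuscation technique.
TRAIN = [
    ("void a() { b.c(d); }", "identifier_renaming"),
    ('String s = Dec.decrypt("x9f"); log(s);', "string_encryption"),
    ("switch (state) { case 1: state = 2; break; }", "control_flow"),
]

# Hypothetical app whose two classes are obfuscated differently.
APP = {
    "com.example.A": "void a() { b.c(d); a(); }",
    "com.example.B": 'String t = Dec.decrypt("q"); log(t);',
}

vocab, centroids = train(TRAIN)
labels = classify_app(APP, vocab, centroids)
print(labels)
```

In this toy app the two classes receive different labels, which is the situation in which a single app-level label would be uninformative. In a fuller pipeline, `embed` would be replaced by a trained paragraph-vector model and the centroid rule by one of the classifiers named above.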