JoWUA

Volume 10 - Issue 4

A Framework for Identifying Obfuscation Techniques applied to Android Apps using Machine Learning

Minjae Park Dankook University, Yongin, Korea
parkminjae@dankook.ac.kr
Geunha You Dankook University, Yongin, Korea
geunhayou@dankook.ac.kr
Seong-je Cho Dankook University, Yongin, Korea
sjcho@dankook.ac.kr
Minkyu Park Konkuk University, Chungju, Korea
minkyup@kku.ac.kr
Sangchul Han Konkuk University, Chungju, Korea
schan@kku.ac.kr

DOI: 10.22667/JOWUA.2019.12.31.022

Keywords: Android app, Obfuscation technique, Class-level obfuscation, Machine learning

Abstract

Malware authors often apply code obfuscation to make their malicious code harder to reverse engineer and analyze. To analyze malicious Android apps effectively, it is necessary to identify which obfuscation techniques have been applied to them. Existing studies have devised approaches that identify obfuscation at the app level. However, recent obfuscators can apply different obfuscation techniques on a class-by-class basis rather than uniformly across an app, in which case app-level identification may be ineffective. In this paper, we propose a new framework that identifies the obfuscation technique applied to each class of an Android app. The proposed framework vectorizes the decompiled code of each class using a paragraph vector. The resulting vectors are then fed to a machine learning classifier, which identifies the obfuscation technique applied to each class. We use four machine learning classifiers (Random Forest, AdaBoost, Extra Trees, and Linear SVM) and compare their performance for each obfuscation technique.
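
The class-level pipeline described in the abstract (vectorize each class's decompiled code, then classify the vector) might look roughly like the following sketch. It is not the authors' implementation. It assumes scikit-learn is available and, to stay dependency-light, substitutes a hashed bag-of-tokens vectorizer for the paragraph vector; a paragraph-vector model such as gensim's Doc2Vec would take its place in a faithful version. The code snippets, the label names, and all hyperparameters are hypothetical.

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
)
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy data: one decompiled class per entry, labeled with the
# obfuscation technique assumed to have been applied to that class.
samples = [
    ("class a { int a; void b() { a = a + 1; } }", "renaming"),
    ("class b { int c; int d() { return c * 2; } }", "renaming"),
    ("class Login { void run() { String s = dec(\"x9f3a\"); } }", "string_encryption"),
    ("class Net { void open() { String u = dec(\"7be21c\"); } }", "string_encryption"),
    ("class Util { void f() { int k = 0; switch (k) { case 0: k = 2; } } }", "control_flow"),
    ("class Calc { void g() { int k = 1; switch (k) { case 1: k = 3; } } }", "control_flow"),
]
sources = [code for code, _ in samples]
labels = [label for _, label in samples]

# Stand-in for the paragraph vector: hash each token into a fixed-size vector.
# token_pattern keeps one-character identifiers, which renaming tends to produce.
vectorizer = HashingVectorizer(n_features=256, token_pattern=r"\w+")
X = vectorizer.transform(sources)

# The four classifier families named in the abstract (hyperparameters assumed).
classifiers = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "Linear SVM": LinearSVC(random_state=0),
}

# Accuracy on the training data only; a real comparison needs held-out classes.
scores = {name: clf.fit(X, labels).score(X, labels) for name, clf in classifiers.items()}

for name, accuracy in scores.items():
    print(f"{name}: training accuracy = {accuracy:.2f}")
```

Each row of `X` corresponds to one class rather than one app, which is what lets the classifier assign a different technique to each class within the same app.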

Date

December 2019

Page Number

22-30