Volume 10 - Issue 4
A Framework for Identifying Obfuscation Techniques applied to Android Apps using Machine Learning
- Minjae Park
Dankook University, Yongin, Korea
parkminjae@dankook.ac.kr
- Geunha You
Dankook University, Yongin, Korea
geunhayou@dankook.ac.kr
- Seong-je Cho
Dankook University, Yongin, Korea
sjcho@dankook.ac.kr
- Minkyu Park
Konkuk University, Chungju, Korea
minkyup@kku.ac.kr
- Sangchul Han
Konkuk University, Chungju, Korea
schan@kku.ac.kr
Keywords: Android app, Obfuscation technique, Class-level obfuscation, Machine learning
Abstract
Malicious app writers tend to employ code obfuscation techniques to prevent their malicious code
from being easily reverse engineered and analyzed. To analyze malicious Android
apps effectively, it is necessary to identify which code obfuscation techniques are applied to them.
Existing studies have devised some approaches that identify app-level obfuscation. However, recent
obfuscators can apply different obfuscation techniques on a class-by-class basis rather than on a per-app basis.
In such a case, app-level obfuscation identification may be ineffective. In this paper, we propose a
new framework to identify a class-level obfuscation technique used in Android apps. The proposed
framework vectorizes the decompiled code of each class of an Android app using a paragraph vector.
The output vectors are then fed to a machine learning classifier to identify which obfuscation technique
is applied to each class. We use four machine learning classifiers: Random Forest, AdaBoost, Extra
Trees, and Linear SVM, and compare the performance of the classifiers for each obfuscation technique.
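
The class-level flow described in the abstract (vectorize each decompiled class, then classify each vector) can be illustrated with a minimal sketch. It is not the authors' implementation: to stay runnable with only the Python standard library, it substitutes L2-normalized token counts for the paragraph vector and a nearest-centroid rule for the four classifiers named above. The training snippets, label names, class names, and all function names are hypothetical.

```python
import math
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Split decompiled source text into identifier, number, and symbol tokens."""
    return TOKEN_RE.findall(code)


def build_vocab(codes):
    """Map every token seen in the training classes to a vector index."""
    vocab = {}
    for code in codes:
        for tok in tokenize(code):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(code, vocab):
    """Assumed stand-in for a paragraph vector: L2-normalized token counts."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(code):
        if tok in vocab:  # tokens unseen during training are ignored
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class NearestCentroid:
    """Assumed stand-in classifier: one mean vector per obfuscation label."""

    def fit(self, vectors, labels):
        groups = defaultdict(list)
        for vec, label in zip(vectors, labels):
            groups[label].append(vec)
        self.centroids = {
            label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()
        }
        return self

    def predict(self, vec):
        # Pick the label whose centroid has the highest dot product with vec.
        def score(label):
            return sum(a * b for a, b in zip(self.centroids[label], vec))
        return max(self.centroids, key=score)


# Hypothetical labelled training classes: (decompiled text, technique).
TRAIN = [
    ("public int getUserCount() { return userCount; }", "none"),
    ("public int a() { return b; }", "identifier_renaming"),
    ('String s = Dec.d("x9f");', "string_encryption"),
]


def train(samples):
    codes = [code for code, _ in samples]
    labels = [label for _, label in samples]
    vocab = build_vocab(codes)
    model = NearestCentroid().fit([vectorize(c, vocab) for c in codes], labels)
    return vocab, model


def identify_per_class(app_classes, vocab, model):
    """Return one predicted technique per class, not one label per app."""
    return {
        name: model.predict(vectorize(code, vocab))
        for name, code in app_classes.items()
    }


vocab, model = train(TRAIN)
app = {
    "com.example.A": "public int a() { return c; }",
    "com.example.B": 'String t = Dec.d("k2p");',
}
result = identify_per_class(app, vocab, model)
print(result)
```

The point of the sketch is the shape of the output: a single app yields a separate prediction for every class, so an app whose classes were obfuscated with different techniques is not forced into one app-level label.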