Effects of Code Obfuscation on Android App Similarity Analysis
Code obfuscation is a technique to transform a program into an equivalent one that is harder to be reverse engineered and understood. On Android, well-known obfuscation techniques are shrinking, optimization, renaming, string encryption, control flow transformation, etc. On the other hand, adversaries may also maliciously use obfuscation techniques to hide pirated or stolen software. If pirated software were obfuscated, it would be difficult to detect software theft. To detect illegal software transformed by code obfuscation, one possible approach is to measure software similarity between original and obfuscated programs and determine whether the obfuscated version is an illegal copy of the original version. In this paper, we analyze empirically the effects of code obfuscation on Android app similarity analysis. The empirical measurements were done on five different Android apps with DashO obfuscator. Experimental results show that similarity measures at bytecode level are more effective than those at source code level to analyze software similarity.