A main topic of my work is Android security, mostly from an attacker's point of view. Android devices are ubiquitous and the operating system has a large market share, which makes it one of the most interesting systems to research. Since I also do pentesting, I often look at Android applications from a security standpoint. Developers frequently make serious mistakes in them, which can lead to data breaches, misuse and more.
There are two general approaches to inspecting an app: static and dynamic analysis. Static analysis means investigating the binary itself, that is, the program code and the metadata that comes with it, while dynamic analysis means investigating the code as it executes. To get a holistic view of an app, you cannot limit yourself to one of these approaches, since each yields different insights.
As an example, static analysis can tell you a lot about the circumstances under which an app will run:
- Which devices are targeted?
- What are the software dependencies?
- Does it contain suspicious code?
Dynamic analysis, on the other hand, might answer questions such as:
- What does the app actually do?
- Is suspicious code executed?
- What data is sent and processed?
In this post, I will be looking at static approaches specific to Android applications, but similar methods are used for other systems.
The Android Package Kit (APK)
An Android binary, the actual file that is installed on a device, is a ZIP archive with the file extension .apk. Therefore, when you talk about an app, you actually mean a whole bunch of files packed into a single archive. This APK has a number of properties:
- It contains all the application code, which is loaded from .dex files (more on that later)
- It contains all the resources (images, icons, layouts) that are used inside the app
- It contains the app’s configuration, which includes permissions
- It is cryptographically signed such that modifications to the package are detected by the OS
- Often it contains information about the build process
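Because an APK is an ordinary ZIP archive, a first inventory of these parts needs nothing more than a ZIP library. Here is a minimal Python sketch using the standard zipfile module; the category names and path rules are my own heuristic assumptions, not an official classification.

```python
import zipfile
from collections import Counter


def categorize_entry(name: str) -> str:
    """Assign an APK entry to a rough category based on its path (heuristic)."""
    if name == "AndroidManifest.xml":
        return "manifest"
    if name.endswith(".dex"):
        return "code"
    if name.startswith("res/") or name == "resources.arsc":
        return "resources"
    if name.startswith("assets/"):
        return "assets"
    if name.startswith("lib/"):
        return "native"  # native libraries (.so), one folder per CPU ABI
    if name.startswith("META-INF/"):
        # v1 (JAR-style) signature files live here; newer signature schemes
        # use a signing block that is not a regular ZIP entry.
        return "meta-inf"
    return "other"


def summarize_apk(apk) -> Counter:
    """Count the entries of an APK (path or binary file object) per category."""
    with zipfile.ZipFile(apk) as zf:
        return Counter(categorize_entry(info.filename) for info in zf.infolist())
```

Calling summarize_apk("application.apk") gives a quick overview, for example how many DEX files the app ships.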
In most cases, everything an app does is somehow represented in the APK file. You can start analyzing the package by extracting all files, but you will notice that some of them are stored in binary formats that are not human-readable. For example, the AndroidManifest.xml file, which contains essential information about an app in XML notation, is stored as binary XML and is unreadable. A similar problem arises with classes.dex files, which contain the actual bytecode executed on the device (more precisely: by the Dalvik Virtual Machine, or by its successor ART on Android 5.0 and later).
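You can check this directly on a freshly downloaded APK. The sketch below assumes, based on Android's resource format, that compiled XML files begin with a chunk header whose type is RES_XML_TYPE (0x0003) and whose header size is 8, both stored as little-endian 16-bit values. The function names are hypothetical.

```python
import zipfile

# A compiled (binary) XML file starts with a chunk header: type RES_XML_TYPE
# (0x0003) followed by header size 8, both little-endian 16-bit values.
BINARY_XML_MAGIC = b"\x03\x00\x08\x00"


def manifest_is_binary(apk) -> bool:
    """Return True if the APK's AndroidManifest.xml is stored as binary XML."""
    with zipfile.ZipFile(apk) as zf:
        head = zf.read("AndroidManifest.xml")[:4]
    return head == BINARY_XML_MAGIC


def dex_files(apk) -> list[str]:
    """List the DEX files in an APK (classes.dex, classes2.dex, ...)."""
    with zipfile.ZipFile(apk) as zf:
        return sorted(n for n in zf.namelist() if n.endswith(".dex"))
```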
A short note on the DEX format: .dex files contain the bytecode run by Dalvik, the register-based virtual machine Android uses in place of the standard Java Virtual Machine. They therefore play a role similar to .class files in Java, although a single DEX file bundles many classes (and DEX code can also be converted into actual Java bytecode).
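To get a feeling for the format at the byte level, here is a Python sketch that reads a few fields of the fixed 112-byte DEX header and verifies its Adler-32 checksum. The field offsets follow my reading of the Dalvik executable format documentation; the function name and the keys of the returned dictionary are hypothetical.

```python
import struct
import zlib

DEX_HEADER_SIZE = 0x70  # the standard DEX header is 112 bytes long


def parse_dex_header(data: bytes) -> dict:
    """Read a few DEX header fields and verify the Adler-32 checksum."""
    if len(data) < DEX_HEADER_SIZE or data[:4] != b"dex\n":
        raise ValueError("not a DEX file")
    version = data[4:7].decode("ascii")  # e.g. "035"; byte 7 is NUL
    (checksum,) = struct.unpack_from("<I", data, 8)
    file_size, header_size, endian_tag = struct.unpack_from("<III", data, 32)
    # Six (size, offset) pairs for the ID tables and class definitions
    # start at offset 56; keep only the sizes.
    sizes = struct.unpack_from("<12I", data, 56)[0::2]
    string_ids, type_ids, proto_ids, field_ids, method_ids, class_defs = sizes
    return {
        "version": version,
        # The checksum covers everything after the magic and checksum fields.
        "checksum_ok": zlib.adler32(data[12:]) == checksum,
        "file_size": file_size,
        "header_size": header_size,
        "little_endian": endian_tag == 0x12345678,
        "strings": string_ids,
        "types": type_ids,
        "protos": proto_ids,
        "fields": field_ids,
        "methods": method_ids,
        "classes": class_defs,
    }
```

Combined with zipfile, parse_dex_header(zipfile.ZipFile("application.apk").read("classes.dex")) shows, for example, how many classes and methods the main DEX file defines.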
So the first step of static analysis is to make these files readable, and we are going to use Apktool for this.
Where to get the APK
You might ask how to get hold of an app's APK in the first place. If you use the Google Play Store, there is no option to download an APK file. If you want to download APKs directly from Google, search for "Play Store crawler" software; there are several such projects on GitHub.
If you want to pull the APK from an installed app, check this Stack Overflow answer.
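One common way to do this, not necessarily the one described in the linked answer, uses adb: "adb shell pm path <package>" prints where the package's APK files are stored on the device, and "adb pull" copies them to your machine. The Python sketch below assumes adb is installed and USB debugging is enabled; the function names and the package name in the usage note are hypothetical.

```python
import subprocess


def parse_pm_path(output: str) -> list[str]:
    """Extract APK paths from the output of `adb shell pm path <package>`.

    Each line looks like "package:/data/app/.../base.apk"; apps installed
    as split APKs print one line per split.
    """
    prefix = "package:"
    paths = []
    for line in output.splitlines():
        line = line.strip()  # also removes stray "\r" from some devices
        if line.startswith(prefix):
            paths.append(line[len(prefix):])
    return paths


def pull_apks(package: str, dest: str = ".") -> list[str]:
    """Locate all APK files of an installed package and pull them via adb."""
    result = subprocess.run(["adb", "shell", "pm", "path", package],
                            capture_output=True, text=True, check=True)
    paths = parse_pm_path(result.stdout)
    for path in paths:
        subprocess.run(["adb", "pull", path, dest], check=True)
    return paths
```

With a device attached, pull_apks("com.example.app") would copy base.apk (and any split APKs) into the current directory.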
If you are targeting a specific app, check alternative app stores for it. I do not have any particular suggestions here; simply use your favourite search engine and search for the app's name together with "APK" to find downloads. Note that few of these app stores are trustworthy, so never install an APK from them on your personal device, or you might infect your phone.
F-Droid is somewhat more trustworthy: it also lets you download APK files, and the apps there are Free and Open-Source, so you can check the source code if you are unsure.
Apktool
Apktool is very simple to use, but also very powerful. It has two main functionalities: decode and build.
Decoding makes the APK's data readable (and modifiable), while building transforms the decoded files back into an APK file. For this, Apktool relies on baksmali and smali. baksmali is a disassembler for DEX files: it turns the actual Dalvik bytecode into an intermediate representation (smali code) that can be modified. smali is the matching assembler, which turns smali code back into DEX files, effectively allowing modifications to app code.
Disassembling an APK is pretty simple:
apktool decode -o output_dir application.apk
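The build direction works the same way. As a sketch, assuming apktool is installed and on the PATH, the decode and build round trip can be scripted from Python. The helper names and file names are hypothetical, and the argument order follows the decode command above and Apktool's documented "apktool b <dir> -o <apk>" usage.

```python
import shutil
import subprocess


def decode_cmd(apk: str, out_dir: str) -> list[str]:
    """Command line that disassembles an APK into out_dir."""
    return ["apktool", "decode", "-o", out_dir, apk]


def build_cmd(src_dir: str, out_apk: str) -> list[str]:
    """Command line that rebuilds a (possibly modified) decoded directory."""
    return ["apktool", "build", src_dir, "-o", out_apk]


def round_trip(apk: str, work_dir: str, out_apk: str) -> None:
    """Decode an APK and rebuild it; any modifications go in between."""
    if shutil.which("apktool") is None:
        raise RuntimeError("apktool is not on PATH")
    subprocess.run(decode_cmd(apk, work_dir), check=True)
    # ... edit smali files or resources in work_dir here ...
    subprocess.run(build_cmd(work_dir, out_apk), check=True)
```

Since the rebuilt package no longer carries a valid signature, Android will refuse to install it until it is signed again, for example with apksigner from the Android SDK build tools.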
Here is an excerpt of the smali code that Apktool generated for a MainActivity class:
.class public Lcom/example/ui/MainActivity;
.super Lcom/example/ui/BaseActivity;
.source "Application"
# interfaces
.implements Lx/lz;
.implements Lx/ma;
.implements Lx/mo;
.implements Lx/mp;
# instance fields
.field private m:Ljava/util/HashMap;
.annotation system Ldalvik/annotation/Signature;
value = {
"Ljava/util/HashMap<",
"Ljava/lang/Integer;",
"Lorg/apache/cordova/CordovaPlugin;",
">;"
}
.end annotation
.end field
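Because smali is line-oriented, simple scripts can already extract useful structure from it. The following Python sketch collects the class name, superclass, implemented interfaces and fields from a smali file and converts type descriptors such as Lcom/example/ui/MainActivity; into Java-style names. It handles only the directives shown above; the function names are hypothetical.

```python
def descriptor_to_java(desc: str) -> str:
    """Turn a class descriptor like 'Lcom/example/Foo;' into 'com.example.Foo'."""
    if desc.startswith("L") and desc.endswith(";"):
        return desc[1:-1].replace("/", ".")
    return desc  # primitives (I, Z, ...) and arrays ([...) are left unchanged


def summarize_smali(source: str) -> dict:
    """Collect class name, superclass, interfaces and fields from smali text."""
    summary = {"class": None, "super": None, "implements": [], "fields": []}
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith(".class "):
            summary["class"] = descriptor_to_java(line.split()[-1])
        elif line.startswith(".super "):
            summary["super"] = descriptor_to_java(line.split()[-1])
        elif line.startswith(".implements "):
            summary["implements"].append(descriptor_to_java(line.split()[-1]))
        elif line.startswith(".field "):
            # Format: .field <modifiers> name:Type [= initial value]
            declaration = line.split(" = ", 1)[0]
            name, _, desc = declaration.split()[-1].partition(":")
            summary["fields"].append((name, descriptor_to_java(desc)))
    return summary
```

Applied to the excerpt above, this yields com.example.ui.MainActivity extending com.example.ui.BaseActivity. Short interface names like x.lz are typical of obfuscated builds.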
Since this notation is easier for humans to read and modify than raw bytecode, it is the preferred format for app analysis. You could go further and use a decompiler to generate actual Java code, but decompilation is often considered unethical, and some countries even outlaw it. More often than not, license agreements prohibit decompilation, so doing it anyway is a breach of contract and can get an analyst into legal trouble. If you are pentesting a mobile application, get explicit permission for decompilation or, depending on the situation, even ask for the actual source code.
So, once Apktool has done its job, you get a file structure like this:
output_dir/
├── AndroidManifest.xml
├── apktool.yml
├── assets
├── lib
├── original
├── res
├── smali
├── smali_classes2
└── unknown
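The smali folders mirror the package structure, one .smali file per class (inner classes get their own files), and larger apps are split across smali, smali_classes2 and so on. A minimal Python sketch, with a hypothetical function name and an arbitrary default depth of two package levels, can count classes per package to show where most of the code lives:

```python
from collections import Counter
from pathlib import Path


def classes_per_package(decoded_dir: str, depth: int = 2) -> Counter:
    """Count smali classes per package prefix across smali, smali_classes2, ...

    The path below each smali* directory mirrors the package name, e.g.
    smali/com/example/ui/MainActivity.smali -> com.example (with depth=2).
    """
    counts: Counter = Counter()
    for smali_root in Path(decoded_dir).glob("smali*"):
        if not smali_root.is_dir():
            continue
        for path in smali_root.rglob("*.smali"):
            parts = path.relative_to(smali_root).parts[:-1]  # drop file name
            package = ".".join(parts[:depth]) or "(default)"
            counts[package] += 1
    return counts
```

For example, classes_per_package("output_dir").most_common(10) lists the ten largest package prefixes, which usually separates the app's own code from bundled libraries.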
Where to look next
Next, you would typically check the manifest file to learn which activities and services the app declares and which activity is launched first. From there, you can work your way through the rest of the smali code. Alternatively, you could check the resources folder (res), which often contains configuration files and strings.
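Since Apktool writes the manifest back as plain XML, the standard library is enough to get a first overview. The Python sketch below lists requested permissions, services, explicitly exported components and the launcher activity (the one with the MAIN action and LAUNCHER category). The function names and returned keys are hypothetical, and the exported check is deliberately simplified, as noted in the code.

```python
import xml.etree.ElementTree as ET

# Namespace used by android:* attributes in AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
COMPONENT_TAGS = ("activity", "activity-alias", "service", "receiver", "provider")


def attr(elem: ET.Element, name: str):
    """Read an android:-prefixed attribute such as android:name."""
    return elem.get(ANDROID_NS + name)


def inspect_manifest(root: ET.Element) -> dict:
    """Summarize package, permissions and components of a decoded manifest."""
    info = {
        "package": root.get("package"),
        "permissions": [attr(p, "name") for p in root.findall("uses-permission")],
        "launchers": [], "services": [], "exported": [],
    }
    app = root.find("application")
    for comp in ([] if app is None else app):
        if comp.tag not in COMPONENT_TAGS:
            continue
        name = attr(comp, "name")  # may be relative, e.g. ".ui.MainActivity"
        if comp.tag == "service":
            info["services"].append(name)
        # Only explicit android:exported="true" is counted; apps targeting
        # Android 11 or lower may also export components implicitly through
        # intent filters.
        if attr(comp, "exported") == "true":
            info["exported"].append(name)
        for flt in comp.findall("intent-filter"):
            actions = {attr(a, "name") for a in flt.findall("action")}
            categories = {attr(c, "name") for c in flt.findall("category")}
            if ("android.intent.action.MAIN" in actions
                    and "android.intent.category.LAUNCHER" in categories):
                info["launchers"].append(name)
    return info
```

Usage: inspect_manifest(ET.parse("output_dir/AndroidManifest.xml").getroot()).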