Microbenchmarking on Android
As Kotlin becomes more and more popular, especially among Android developers (and is officially supported by Google), some people have decided to compare its runtime performance with Java’s. After reading a few articles, I wanted to test it myself, and now I’m ready to share some of my observations and experiences.
What is it about?
Microbenchmarking is just micro-scale benchmarking :-) It’s about measuring the performance of something really “small” that may take just micro- or nanoseconds, like calling a function or iterating over a collection.
I really recommend reading this wiki page on GitHub, as it summarizes a few very important aspects of microbenchmarking, including its fallibility and some possible reasons why you might ever consider doing it. It also explains why you should actually avoid writing microbenchmarks, as there are only a few reasonable excuses for doing so (mostly when you develop a performance-critical library or framework).
How do I start?
Writing a microbenchmark can be as simple as running some piece of code in a loop and measuring the time. You can also use one of the existing frameworks which have some useful features like the possibility to easily configure the way your benchmarks will be executed.
Unfortunately, if you want to run the tests on Android, you can’t use what is probably the most advanced library, JMH, because it relies on parts of the Java API that are not available in the Android API. This might discourage you from using the Android platform for your benchmarks, but bear in mind that ART and Dalvik have significantly different characteristics from the JVM, so optimizing code for the JVM may pessimize it for Android (take a look at this commit in the Gson library).
The simplest way to measure the execution time of some code block is to call it a large number of times in a loop, like below:
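A minimal sketch of such a loop, written in Java. The body of addConst, the constant 42 and the repetition count are my assumptions for illustration; only the function name and the use of System.nanoTime() come from the text.

```java
// Illustrative sketch: the body of addConst, CONST and REPS are assumed values.
class AddConstBenchmark {
    static final int REPS = 1_000_000;
    static final int CONST = 42;

    // The function under test.
    static int addConst(int x) {
        return x + CONST;
    }

    // Calls addConst `reps` times and returns the average time per call in nanoseconds.
    static double measure(int reps) {
        int acc = 0;
        long start = System.nanoTime();       // read the clock once, outside the loop
        for (int i = 0; i < reps; i++) {
            acc += addConst(i);               // accumulate so the result is actually used
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("acc = " + acc);   // observable effect of the measured calls
        return (double) elapsed / reps;
    }

    public static void main(String[] args) {
        System.out.printf("%.2f ns per call%n", measure(REPS));
    }
}
```

The clock is read only twice, and the accumulated value is printed so that the calls cannot simply be dropped as unused work.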
In this example we benchmark the addConst function. If you are new to microbenchmarking, I bet you have a few questions.
First of all, we need multiple addConst calls for a few reasons, for example because System.nanoTime() doesn’t have to be as accurate as you might think. The documentation says it clearly:
This method provides nanosecond precision, but not necessarily nanosecond resolution (that is, how frequently the value changes) - no guarantees are made except that the resolution is at least as good as that of currentTimeMillis().
Another reason is to get more reliable results: the test environment (the smartphone) is very complex, so each execution will surely take a different amount of time.
It is also important to keep in mind that System.nanoTime() might be relatively expensive in terms of execution time, so we definitely should NOT put it inside the same loop as the measured function, unless all we want to measure is System.nanoTime() itself (been there, done that :-) ). So this is another reason why we need to call the desired function multiple times: to compensate for that overhead.
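If you are curious how coarse and how costly the timer is on a given device, you can probe it directly. The sketch below is only a rough estimate under my own assumptions (the class and method names are made up, and the numbers will vary from run to run); it uses nothing beyond System.nanoTime() itself.

```java
// Rough probe of System.nanoTime(); all names here are illustrative.
class NanoTimeCost {
    // Written to so that the timer calls below are not treated as unused work.
    static volatile long sink;

    // Average cost of a single System.nanoTime() call, in nanoseconds.
    static double averageCallCostNanos(int calls) {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            acc += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / calls;
    }

    // Smallest positive difference seen between consecutive readings.
    // Returns Long.MAX_VALUE if the value never changed during the probe.
    static long smallestObservedStepNanos(int readings) {
        long smallest = Long.MAX_VALUE;
        long previous = System.nanoTime();
        for (int i = 0; i < readings; i++) {
            long now = System.nanoTime();
            if (now > previous) {
                smallest = Math.min(smallest, now - previous);
                previous = now;
            }
        }
        return smallest;
    }

    public static void main(String[] args) {
        System.out.printf("nanoTime() cost: ~%.1f ns per call%n", averageCallCostNanos(1_000_000));
        System.out.println("smallest observed step: " + smallestObservedStepNanos(1_000_000) + " ns");
    }
}
```

If the measured function is cheaper than either of these two numbers, timing a single call tells you mostly about the clock, not about the function.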
Another aspect is that we need a function that has some observable effects, e.g. returning a result that is accumulated and then printed out. Otherwise, the compiler could cut out the code that was meant to be measured (see: Could a compiler possibly optimize your benchmark away?).
Another way to benchmark the code is to use a dedicated framework like Spanner. It’s an Android-oriented, Caliper-like library which can make this task easier. Despite its alpha-ish state, it’s quite usable and worth trying out.
Putting configuration matters aside, the benchmark function may look like this:
As you can see, the framework gives us the required repetitions count as a parameter and it’s our job to actually run the code in a loop.
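The contract described above (the harness picks the repetition count, the benchmark body runs the loop) can be sketched without any framework. To be clear, this is NOT Spanner’s API: nanosPerRep is a hypothetical stand-in for the harness, and the body of addConst is assumed.

```java
import java.util.function.IntToLongFunction;

// Framework-free sketch of a Caliper-style "reps" benchmark. Not Spanner's API.
class RepsStyleBenchmark {
    // Function under test; its body is an assumption.
    static int addConst(int x) {
        return x + 42;
    }

    // Benchmark body: receives the repetition count and loops itself.
    // Returning the accumulated value keeps the work observable.
    static long benchmarkAddConst(int reps) {
        long acc = 0;
        for (int i = 0; i < reps; i++) {
            acc += addConst(i);
        }
        return acc;
    }

    // Hypothetical stand-in for the framework: times one invocation with `reps` repetitions.
    static double nanosPerRep(IntToLongFunction benchmark, int reps) {
        long start = System.nanoTime();
        long result = benchmark.applyAsLong(reps);
        long elapsed = System.nanoTime() - start;
        System.out.println("result = " + result);
        return (double) elapsed / reps;
    }

    public static void main(String[] args) {
        double ns = nanosPerRep(RepsStyleBenchmark::benchmarkAddConst, 1_000_000);
        System.out.printf("%.2f ns per rep%n", ns);
    }
}
```

A real framework would additionally choose the repetition count for you and repeat the measurement several times; here both are left to the caller.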
Side note: the current Spanner version (as of Feb 2, 2018) requires benchmark classes’ and methods’ modifiers to be exactly java.lang.reflect.Modifier.PUBLIC, so you can’t run final-by-default Kotlin code without an additional open modifier. That’s why I use my forked version with this behavior changed accordingly.
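To see why an exact comparison against Modifier.PUBLIC rejects final members, here is a small reflection sketch. It does not reproduce Spanner’s code; the Sample class and the helper names are made up, and the final method stands in for what a Kotlin method looks like without open.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Illustration only: an exact modifier comparison versus a "contains PUBLIC" check.
class ModifierCheck {
    public static class Sample {
        public void plainMethod() {}
        public final void finalMethod() {}   // comparable to a Kotlin method without `open`
    }

    // Strict check: the modifier bit set must be PUBLIC and nothing else.
    static boolean isExactlyPublic(Method m) {
        return m.getModifiers() == Modifier.PUBLIC;
    }

    // Lenient check: PUBLIC must be present, other flags are allowed.
    static boolean isPublic(Method m) {
        return Modifier.isPublic(m.getModifiers());
    }

    // Looks up a public, no-argument method of Sample by name.
    static Method find(String name) {
        try {
            return Sample.class.getMethod(name);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"plainMethod", "finalMethod"}) {
            Method m = find(name);
            System.out.println(name + ": exactlyPublic=" + isExactlyPublic(m)
                    + ", public=" + isPublic(m)
                    + ", modifiers=" + Modifier.toString(m.getModifiers()));
        }
    }
}
```

The final method carries both the PUBLIC and the FINAL flag, so only the lenient check accepts it.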
Spanner can also upload your benchmark results to https://microbenchmarks.appspot.com either anonymously or with a given API key.
What could possibly go wrong?
It’s very easy to get horribly misleading results, especially if you don’t follow some rules when writing microbenchmarks. For example, this article by Julien Page shows how benchmarks may be optimized by the JVM so that the results become meaningless. A few criteria of a good benchmark have also been defined on this Caliper wiki page. And in this example Cédric Champeau proves why it’s so important to use the right tool for measurements (like JMH).
Of course there are more issues that may happen, e.g.:
- garbage collection occurring during the measurements,
- temporarily increased CPU usage by other apps,
- JIT compiler making your code run faster each time,
- unavoidable CPU throttling.
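Some of these effects can be reduced, though never removed, by warming the code up first and by taking the median of several measured runs instead of trusting a single one. The sketch below is one way to do it under my own assumptions: the run counts are arbitrary, System.gc() is only a hint, and nothing in it can prevent CPU throttling.

```java
import java.util.Arrays;

// Sketch: warm-up runs followed by several measured runs, reported as a median.
class WarmupAndMedian {
    // Written to so the measured loop has an observable effect.
    static volatile long sink;

    // Function under test; its body is an assumption.
    static int addConst(int x) {
        return x + 42;
    }

    // Times one run of `reps` calls, in nanoseconds.
    static long timeOnce(int reps) {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            acc += addConst(i);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    // Median of the samples; less sensitive to outliers (e.g. a GC pause) than the mean.
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    static long medianRunNanos(int warmupRuns, int measuredRuns, int reps) {
        for (int i = 0; i < warmupRuns; i++) {
            timeOnce(reps);                  // results discarded: lets the JIT settle
        }
        System.gc();                         // a request only; a collection may still occur later
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            samples[i] = timeOnce(reps);
        }
        return median(samples);
    }

    public static void main(String[] args) {
        System.out.println("median run: " + medianRunNanos(10, 21, 1_000_000) + " ns");
    }
}
```

Printing all samples instead of only the median is also worth doing once, just to see how widely they scatter on a real device.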
Microbenchmarking is an interesting topic and I’m glad I could practice it myself and learn about it. As I’ve already mentioned, it’s not something you should do on a daily basis, but still, I think it’s worth reading about and trying out, just to expand your horizons.