Yesterday I released benchee 0.3.0! Benchee is a tool for (micro) benchmarking in elixir focussing on being simple, extensible and to provide you with good statistics. You can refer to the Changelog for detailed information about the changes. This post will look at the bigger changes and also give a bit of the why for the new features and changes.
Arguably the biggest feature in Benchee 0.3.0 is that it is now easy and built-in to configure multiple formatters for a benchmarking suite. This means that first the benchmark is run, and then multiple formatters are run on the benchmarking results. This way you can get both the console output and the corresponding csv file using BencheeCSV. This was a pain point for me before, as you could either get one or the other or you needed to use the more verbose API.
You can also see the new output/1 methods at work, as opposed to format/1 they also really do the output themselves. BencheeCSV uses a custom configuration options to know which file to write to. This is also new, as now formatters have access to the full benchmarking suite, including configuration, raw run times and function definitions. This way they can be configured using configuration options they define themselves, or a plugin could graph all run times if it wanted to.
Of course, formatters default to just the built-in console formatter.
Another big addition is parallel benchmarking. In Elixir, this just feels natural to have. You can specify a parallel key in the configuration and that tells Benchee how many tasks should execute any given benchmarking job in parallel.
Of course, if you want to see how a system behaves under load – overloading might be exactly what you want to stress test the system. And this was exactly the reason why Leon contributed this change back to Benchee:
I needed to benchmark integration tests for a telephony system we wrote – with this system the tests actually interfere with each other (they’re using an Ecto repo) and I wanted to see how far I could push the system as a whole. Making this small change to Benchee worked perfectly for what I needed 🙂
(Of course it makes me extremely happy that people found adjusting Benchee for their use case simple, that’s one of the main goals of Benchee. Even better that it was contributed back ❤ )
If you want to see more information and detail about “to benchmark in parallel or not” you can check the Benchee wiki. Spoiler alert: The more parallel benchmarks run, the slower they get to an acceptable degree until the system is overloaded (more tasks execute in parallel than there are CPU cores to take care of them). Also deviation skyrockets.
While the effect seems not to be very significant for parallel: 2 on my system, the default in Benchee remains parallel: 1 for the mentioned reasons.
Print configuration information
Partly also due to the parallel change, Benchee wil now print a brief summary of the benchmarking suite before executing it.
tobi@happy ~/github/benchee $ mix run samples/run_parallel.exs Benchmark suite executing with the following configuration: warmup: 2.0s time: 3.0s parallel: 2 Estimated total run time: 10.0s Benchmarking flat_map... Benchmarking map.flatten... Name ips average deviation median map.flatten 1268.15 788.55μs (±13.94%) 759.00μs flat_map 706.35 1415.72μs (±8.56%) 1419.00μs Comparison: map.flatten 1268.15 flat_map 706.35 - 1.80x slower
This was done so that when people share their benchmarks online one can easily see the configuration they ran it with. E.g. was there any warmup time? Was the amount of parallel tasks too high and therefore the results are that bad?
It also prints an estimated total run time (number of jobs * (warmup + time)), so you know if there’s enough time to go and get a coffee before a benchmark finishes.
Map instead of a list of tuples
What is also marked as a “breaking” change in the Changelog is actually not THAT breaking. The main data structure handed to Benchee.run was changed to a map instead of a list of tuples and all corresponding data structures changed as well (important for plugins to know).
It used to be a list of tuples because of the possibility that benchmarks with the same name would override each other. However, having benchmarks with the same name is nonsensical as you can’t discern their results in the output any way. So, this now feels like a much more fitting data structure.
The old main data structure of a list of tuples still works and while I might remove it, I don’t expect me to right now as all that is required to maintain it is 4 lines of code. This makes duplicated names no longer working the only real deprecation, although one might even call it a feature 😉