Benchee 1.1.0 released + why did it take so long

Benchee 1.1.0 has finally been released, after, well, almost 3 years. So, in this blog post we'll dive into:

  1. What are the changes
  2. Why did it take so long, with some (significant) musings on Open Source and bugs as well as my approach to it

What does Benchee 1.1.0 bring to the table

The stars of the show are certainly the two major new features: reduction measurements and profiling! There is also a nasty bug that was squashed. Check out the Changelog for the full list.

Reduction Counting

Reduction counting joins execution time and memory consumption as the third measurement Benchee can take. This one was kicked off way back when someone asked in our #benchee channel about adding this feature. What reductions are is hard to explain. In short, a reduction is not very well defined, but it's a "unit of work". The BEAM uses reductions to keep track of how long a process has run. As The BEAM Book puts it:

BEAM solves this by keeping track of how long a process has been running. This is done by counting reductions. The term originally comes from the mathematical term beta-reduction used in lambda calculus.

The definition of a reduction in BEAM is not very specific, but we can see it as a small piece of work, which shouldn’t take too long. Each function call is counted as a reduction. BEAM does a test upon entry to each function to check whether the process has used up all its reductions or not. If there are reductions left the function is executed otherwise the process is suspended.

Beam Book, Chapter 5.3

This can help you because, unlike run time, the reduction count isn't affected by system load, so you can use it to make assumptions about performance in your CI. It doesn't map 1:1 to run time, but it helps. Of course, check out Benchee's docs about it. The biggest shout-out goes to Devon for implementing it.
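To get a feel for what a reduction count is, here is a rough sketch of measuring one by hand. It assumes that reading the calling process's counter with `Process.info/2` before and after running a function is a fair approximation. `ReductionSketch` is a made-up module for illustration, and Benchee's own collector may handle details such as measurement overhead differently.

```elixir
defmodule ReductionSketch do
  # Hypothetical helper (not part of Benchee): returns how many reductions
  # the calling process spent while running `fun`.
  def count(fun) when is_function(fun, 0) do
    {:reductions, before} = Process.info(self(), :reductions)
    fun.()
    {:reductions, after_run} = Process.info(self(), :reductions)
    # The difference also includes a small, constant overhead from the calls above.
    after_run - before
  end
end

list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

flat_map = ReductionSketch.count(fn -> Enum.flat_map(list, map_fun) end)
map_flatten = ReductionSketch.count(fn -> list |> Enum.map(map_fun) |> List.flatten() end)

IO.puts("flat_map: #{flat_map} reductions, map.flatten: #{map_flatten} reductions")
```

The absolute numbers will not match Benchee's output exactly. The relative picture (map.flatten doing more work) should be similar.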

You can simply specify reduction_time and there you go:

list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

Benchee.run(
  %{
    "flat_map" => fn -> Enum.flat_map(list, map_fun) end,
    "map.flatten" => fn -> list |> Enum.map(map_fun) |> List.flatten() end
  },
  reduction_time: 2
)
Operating System: Linux
CPU Information: AMD Ryzen 9 5900X 12-Core Processor
Number of Available Cores: 24
Available memory: 31.27 GB
Elixir 1.13.3
Erlang 24.2.1
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 18 s
Benchmarking flat_map …
Benchmarking map.flatten …
Name                  ips        average  deviation         median         99th %
flat_map           3.52 K      283.95 μs    ±10.98%      279.09 μs      500.28 μs
map.flatten        2.26 K      441.58 μs    ±20.43%      410.51 μs      680.60 μs

Comparison:
flat_map           3.52 K
map.flatten        2.26 K - 1.56x slower +157.64 μs

Reduction count statistics:

Name         Reduction count
flat_map             65.01 K
map.flatten         124.52 K - 1.92x reduction count +59.51 K

**All measurements for reduction count were the same**

It's worth noting that reduction counts will differ between Elixir and Erlang versions, as we often noticed in our own CI setup.
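The comparison lines in the output are plain arithmetic on the statistics above. Each one gives the ratio to the best job and the absolute difference. The sketch below recomputes them from the rounded numbers printed in the output. `ComparisonSketch` is a hypothetical helper. Because it starts from already rounded averages, its time difference comes out as 157.63 μs rather than the 157.64 μs Benchee prints.

```elixir
defmodule ComparisonSketch do
  # Hypothetical helper: compares a job against the best (lowest) job and
  # returns {ratio, absolute difference}, both rounded to 2 decimals.
  def compare(best, other) do
    {Float.round(other / best, 2), Float.round(other - best, 2)}
  end
end

# Average run times in μs, taken from the output above.
{slower_by, extra_time} = ComparisonSketch.compare(283.95, 441.58)
IO.puts("map.flatten: #{slower_by}x slower +#{extra_time} μs")

# Reduction counts in thousands, taken from the output above.
{reduction_ratio, extra_reductions} = ComparisonSketch.compare(65.01, 124.52)
IO.puts("map.flatten: #{reduction_ratio}x reduction count +#{extra_reductions} K")
```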

Profile after benchmarking

This is another feature I'd never imagined having in Benchee, but thanks to community suggestions (and implementation!) it came to be. This one in particular was even suggested by José Valim himself: while we were chatting, he asked if there were plans to include something like this, as his workflow would often be:

1. benchmark to see results

2. profile to find improvement opportunities

3. improve code

4. start again at 1.

It makes perfect sense; I just never thought of it. So, you can now say profile_after: true or even specify a particular profiler plus options.

list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

Benchee.run(
  %{
    "flat_map" => fn -> Enum.flat_map(list, map_fun) end,
    "map.flatten" => fn -> list |> Enum.map(map_fun) |> List.flatten() end
  },
  profile_after: true
)
Operating System: Linux
CPU Information: AMD Ryzen 9 5900X 12-Core Processor
Number of Available Cores: 24
Available memory: 31.27 GB
Elixir 1.13.3
Erlang 24.2.1
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 14 s
Benchmarking flat_map …
Benchmarking map.flatten …
Name                  ips        average  deviation         median         99th %
flat_map           3.51 K      284.55 μs    ±13.79%      277.29 μs      557.14 μs
map.flatten        2.09 K      477.46 μs    ±30.97%      410.71 μs      871.02 μs

Comparison:
flat_map           3.51 K
map.flatten        2.09 K - 1.68x slower +192.91 μs
Profiling flat_map with eprof…
Profile results of #PID<0.237.0>
Total 30004 100.0 6864 0.23
Enum.flat_map/2 1 0.00 0 0.00
anonymous fn/2 in :elixir_compiler_1.__FILE__/1 1 0.00 0 0.00
:erlang.apply/2 1 0.03 2 2.00
:erlang.++/2 10000 17.35 1191 0.12
anonymous fn/1 in :elixir_compiler_1.__FILE__/1 10000 30.29 2079 0.21
Enum.flat_map_list/2 10001 52.33 3592 0.36
Profile done over 6 matching functions
Profiling map.flatten with eprof…
Profile results of #PID<0.239.0>
Total 60007 100.0 9204 0.15 1 0.00 0 0.00
:lists.flatten/1 1 0.00 0 0.00
anonymous fn/2 in :elixir_compiler_1.__FILE__/1 1 0.01 1 1.00
List.flatten/1 1 0.01 1 1.00
:erlang.apply/2 1 0.02 2 2.00
anonymous fn/1 in :elixir_compiler_1.__FILE__/1 10000 16.17 1488 0.15
Enum."-map/2-lists^map/1-0-"/2 10001 26.81 2468 0.25
:lists.do_flatten/2 40001 56.98 5244 0.13
Profile done over 8 matching functions

We didn't implement the profiling ourselves; instead, we rely on Elixir's built-in profiling tasks, such as mix profile.eprof. To make the feature fully compatible with hooks, I also had to send a small patch to Elixir, so after_each hooks won't work with profiling until that patch is released. But nobody uses hooks anyhow, so who cares? 😛

This feature made it in thanks to Pablo Costas and his great work. I'm happy to highlight that not only did this contribution give us all a great Benchee feature, but also a friendship to boot. Oh, the wonders of Open Source. 💚

Measurement accuracy on Mac

Now to the least fun part of this release. There is a bugfix, and quite an important one at that. Basically, on macOS previous Benchee versions might report inaccurate results for very fast benchmarks (< 10 microseconds). There are many more musings in this issue, but basically we relied on the operating system clock returning times in a unit that it can accurately measure. Alas, macOS reports in nanoseconds but only has microsecond accuracy (leading to measurements being multiples of 1000). However, even the clock information reported to us claimed nanosecond accuracy, so I reported a bug on erlang/otp, which was thankfully fixed in 22.2.
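To illustrate the symptom described above, here is a small sketch. It prints what the runtime reports about the OS monotonic clock via `:erlang.system_info(:os_monotonic_time_source)`. It then applies a simple heuristic: if every measured duration in nanoseconds is a multiple of 1000, the clock is likely only accurate to microseconds. `ClockSketch` is hypothetical. This shows the symptom only and is not how Benchee actually fixed the bug.

```elixir
defmodule ClockSketch do
  # Heuristic: durations (in nanoseconds) that are all multiples of 1000
  # suggest a clock that only has microsecond accuracy.
  def looks_microsecond_only?([]), do: false

  def looks_microsecond_only?(durations_ns) do
    Enum.all?(durations_ns, fn ns -> rem(ns, 1000) == 0 end)
  end

  # Measures a (very fast) function `runs` times with the monotonic clock.
  def sample(fun, runs) do
    for _ <- 1..runs do
      start = :erlang.monotonic_time(:nanosecond)
      fun.()
      :erlang.monotonic_time(:nanosecond) - start
    end
  end
end

# Resolution is given in parts per second, so 1_000_000_000 claims nanoseconds.
resolution = :erlang.system_info(:os_monotonic_time_source)[:resolution]
IO.inspect(resolution, label: "reported resolution")

samples = ClockSketch.sample(fn -> Enum.sum([1, 2, 3]) end, 50)
IO.inspect(ClockSketch.looks_microsecond_only?(samples), label: "microsecond-only clock?")
```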

Fixing this was hard and stressful, which leads nicely into the next major section…

Why it took so long: perfectionism and open source

So, why did it take so long? Earlier today I blogged about some of the things that held me back over the past 1.5 years in "The Silence Between". However, a lot of these features had already landed in early 2020, so what gives?

The short answer is the bug above was hard to fix and I needed to fix it. The long answer is… well, long.

I think I could describe myself as a pragmatic perfectionist. I'm happy to implement an MVP, and I constantly ask "Do we really need this?" or "Can we make this simpler and deliver it faster?", but whatever I end up shipping I want to be (well, almost need to be) great for what we decided to ship. I don't want to release with bugs, constant error notifications or barely anything tested. I can make lots of tradeoffs, as long as I consciously decide on them, like: "OK, we'll duplicate this code for now, as we have no idea what a good abstraction might be and we don't wanna lock ourselves in." But something misbehaving that I thought was sublime? Oh, the pain.

Why am I highlighting this? Well, Benchee reporting wrong results is frightening to me. Benchee has one core promise, and that promise is to measure your functions as accurately as possible. Also, in my opinion, fixing critical bugs such as this one should have the highest priority. I can't, for myself, justify working on Benchee while not working on that bug. I know it's not a great attitude and that I should have released the features on main and shipped the bug fix later. I do know. But I felt like all my energy had to be spent on fixing that bug.

And working on that bug was hard. It's a Mac-only bug, and I famously do not own or want to own a Mac. My partner owns one, but when I'm doing Open Source, chances are she's at her computer as well. And then, to investigate something like this, I need a couple of hours of uninterrupted time with no distractions on my mind either. I might as well not even start otherwise. It certainly didn't help that the bug randomly disappeared when I tried to look at it.

The problem of not having a Mac to fix this on was finally solved when I started a new job, but first the stress was too high and then my arms got injured (as mentioned in the other blog post). My arms finally got better and I could set aside a good 4+ hours to fix this bug. It can be kind of hard to get that dedicated time, but it's absolutely needed for an intricate bug such as this one.

So, that's the major reason it took so long. I mean, it involved finding a bug in Erlang itself. And then working around that bug, with code that, well… was almost harder to write than the actual fix.

I would be remiss not to mention something else: It's perfectly fine for Open Source projects not to get updates! Sometimes, they are just done. Or the maintainers have more important things to do. I've certainly considered Benchee "done" since 1.0, as it has all the features I really wanted it to have. You see, reduction counting and profiling after benchmarking are great features, but they are hardly essential.

Still, Benchee having a rather important bug for so long really made me feel guilty and bad. Even worse, because I didn't fix the bug, those great contributions from Devon and Pablo were never released. That's another thing that's very important to me: Whoever takes the time to contribute should have a great experience and their contribution should be valued. The ultimate show of appreciation for a feature someone worked on is getting it released into people's hands.

At times those negative feelings ("Oh no, there is a bug" & "Oh no, these great features lie around unreleased") paradoxically led me to stay away from Benchee even more, since I felt bad about this state. Yes, it was only on Mac and only affected benchmarks where individual function invocations took less than 10 microseconds. But still, that's the perfectionist in me. This should be fixed within weeks, not 2.5 years. Most certainly, ready-to-ship features shouldn't just chill on main for years. Release early, release often.

Anyhow, thanks for reading my musings on Open Source, responsibility, pragmatism and perfectionism. The bug is fixed now, the features are released and I’m happy. Who knows what’s next for Benchee.

Happy benchmarking!

The silence between

Sorry y’all, for the silence between.

What happened, why no new post in ~1.5 years? Well, the answer is quite simple: Life happened. First I was at a new job that was honestly quite stressful and took all my headspace, not leaving any space for blogging.

And then, I managed to injure my arms; how exactly, I don't know. It might have been a long time coming, amplified by some unusually straining things I did that week. That was more than a year ago. I took 6 weeks off between jobs, not doing anything, hoping it would be better again soon. It wasn't. Aside: Worst 6-week staycation of my life: at home, not programming, not writing & not playing games 😱 Of course, there was also Covid happening, welp.

Anyhow, I also did lots of physical therapy, saw specialists and did exercises every day, from half an hour on a busy day up to 4+ hours on a free day. The good thing is, it was neither carpal nor cubital tunnel syndrome. I experimented with voice typing. Lots of stuff; this shit is scary.

I got a couple of breakthroughs: by the end of summer 2021, my arms no longer hurt more by the end of the week (as they had after every work week before that). Before Christmas, my pain lessened to a degree where I could play games again.

And now? I still have varying degrees of pain, but now it's mild pain and I feel like I can control it thanks to a variety of measures (braces, exercises, setup, it having gotten better). You gotta imagine: in the first months of this, I would sometimes get inexplicably strong pain jolting through my arm just because someone gave me an orange and I tried to hold it in my hand. Yeah, it was that bad.

So, I'm happy that I can do some gaming and some open source again, as well as some writing. There are probably dozens of blog posts trapped in my head, half of which I have forgotten again. See, it turns out I really do love software development. And so, while I was limited in doing it (or, well, happy I could get my day-to-day work done but couldn't do any of it in my free time), I thought about it – a lot.

So, let’s see how many posts I’ll manage to write and how much you’ll enjoy them.

A note on open source & responsibilities

With the last 1.5 years being mostly “I’m unable to do almost any open source or blogging”, I’m really happy that I changed my relationship to it. I used to feel guilty. I maintain big libraries, people depend on them. I need to fix bugs and have them be in great shape. That kinda feeling.

And well, I do still feel guilty. But at the end of the day, I owe folks nothing. I provide free stuff. Take free stuff, fix free stuff and be happy. My health and my life come first. That doesn't mean I don't feel guilty sometimes or feel like I really need to fix something up, but it doesn't eat me up and I'm fine. This would have been different

The expectation that maintainers are there to fix stuff whenever and face backlash when they don’t is what drives many people out of open source. I thankfully haven’t faced this backlash a lot, but it’s still a problem. Be better, everyone.

War in Ukraine

Now is a weird and arguably bad time to revive a tech blog. Russia's unjust war of aggression on Ukraine, Putin's threat of nukes and the unimaginable suffering of the Ukrainian people, along with their bravery, are on all our minds.

Well, I'm not sure if it is on yours, but it for sure is on mine. This hits close to home for me. One of my closest friends was born in Ukraine. I visited Ukraine 3 years in a row pre-Covid (2017-2019). I was planning to go again.

I gave talks at the wonderful Pivorak meetup in 2017 & 2019. Each time I had one of my closest friends with me and we had a little mini vacation in the wonderful city of Lviv, which I want to visit again. Hell, I even have a favorite cafe in the city; I know the city and truly appreciate it. I'm not sure if calling many members of the Pivorak meetup (Anna, Oxana, Volodya, Anton…) "friends" is an exaggeration, but they're definitely people that I'd excitedly run toward whenever I saw them, to hug them and chat with them for as long as I could. And I honestly don't know if I'll ever get that opportunity again. And… that is scary.

It was probably the best-run Ruby meetup I've ever seen, with curious, nice, humble and super active people. Hell, they even ran their own Ruby learners' workshop. They were shocked that people had forgotten the war in Ukraine and were afraid of Russia… little did we know.

I didn’t only see them in Lviv, some came to Berlin and I also saw them again at RubyC in Kyiv. Kyiv, where I walked across the Maidan square and past the memorials of the brave people who died during the Maidan protests, trying to pay my respect to each one of them.

So, with this background, this is hard for me. I feel helpless, as I can't really help them. I don't know what's going to happen and what it will take for Putin to stop this war. Well, what I can do (and did!), and what you can do as well, is donate to support the Ukrainian people. If you're reading this, you're probably a software developer and have more disposable income than most. Consider donating it through whichever means to organizations helping Ukraine. You can find donation links here, for instance.

Sorry if this section is a bit more incoherent than usual, but I’m really lacking the words in any language to express how I feel.

So, with all that, why am I blogging/doing tech stuff? Honestly, I often don't know how to help more. I'll donate more, and I'll continue to speak up. I can't think about the war 24/7; it's hard, and yes, it's a privilege that I don't have to think about it 24/7. As my current favorite author Brandon Sanderson said, we deal with these situations in our own way. He writes 5 extra novels in 2 years; I'll focus some of my mind on open source and blogging. And also, I've been waiting for my arms to get better for so long, so I've been looking forward to this.

Before I close this "short" interlude blog post, I have one more thing on my mind. Please do not confuse Putin & the Russian government with the Russian people.

The great Rubykon Benchmark 2020: CRuby vs JRuby vs TruffleRuby

It has been far too long, more than 3.5 years since the last edition of this benchmark. Well what to say? I almost had a new edition ready a year ago and then the job hunt got too intense and now the heat wave in Berlin delayed me. You don’t want your computer running at max capacity for an extended period, trust me.

Well, you aren’t here to hear about why there hasn’t been a new edition in so long, you’re here to read about the new edition! Most likely you’re here to look at graphs and see what’s the fastest ruby implementation out there. And I swear we’ll get to it but there’s some context to establish first. Of course, feel free to skip ahead if you just want the numbers.

Well, let’s do this!

What are we benchmarking?

We're benchmarking Rubykon again, a Go AI written in Ruby using Monte Carlo Tree Search. It's a fun project I wrote a couple of years back. Basically, it does random playouts of Go games and sees which moves lead to a win, building a tree of game states and their win percentages to select the best move.
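The final step, picking the best move from playout statistics, can be sketched as follows. This is a hypothetical Elixir illustration, not Rubykon's (Ruby) code. It assumes each candidate move has a tally of wins and playouts, and it simply picks the highest win percentage. Real MCTS implementations also have to decide which moves to explore next, and that part is left out here.

```elixir
defmodule MoveSelectionSketch do
  # Hypothetical data shape: %{move => {wins, playouts}}.
  def win_percentage({_wins, 0}), do: 0.0
  def win_percentage({wins, playouts}), do: wins / playouts * 100

  # Picks the move whose random playouts won most often.
  def best_move(stats) when map_size(stats) > 0 do
    {move, _tally} = Enum.max_by(stats, fn {_move, tally} -> win_percentage(tally) end)
    move
  end
end

# Made-up tallies for three board coordinates.
stats = %{
  {3, 3} => {52, 100},
  {10, 10} => {61, 100},
  {1, 1} => {12, 40}
}

IO.inspect(MoveSelectionSketch.best_move(stats), label: "best move")
```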

Why is this a good problem to benchmark? Performance matters. The more playouts we can do, the better our AI plays, because we have more data for our decisions. The benchmark we're running starts its search from an empty 19×19 board (the biggest "normal" board) and does 1000 full random playouts from there. We'll measure how long that takes, i.e. how often we could do that in a minute. It also isn't a micro benchmark: while remaining reasonable in size, it exercises lots of different methods and access patterns.
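Converting between the two views is simple: given an average run time in seconds, iterations per minute is 60 divided by it. Here is a minimal sketch with a hypothetical `RateSketch` module and made-up run times. Values recomputed from the rounded averages in a results table can differ from the reported i/min in the last digit.

```elixir
defmodule RateSketch do
  # Average of a non-empty list of run times in seconds.
  def average(run_times) when run_times != [] do
    Enum.sum(run_times) / length(run_times)
  end

  # How many runs of that length fit into one minute.
  def iterations_per_minute(average_seconds) when average_seconds > 0 do
    60 / average_seconds
  end
end

# Made-up run times in seconds for 1000 playouts.
run_times = [6.0, 6.5, 5.5]
avg = RateSketch.average(run_times)
IO.puts("average: #{avg} s, #{RateSketch.iterations_per_minute(avg)} i/min")
```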

Why is this a bad problem to benchmark? Most Ruby devs are probably interested in some kind of web application performance. This does no IO (which keeps the focus on Ruby code execution, which is also good) and mainly deals with arrays. While we deal with collections all the time, Rubykon also accesses a lot of array indexes all over, which isn't really that common. It also barely deals with strings. Moreover, it does a whole lot of (pseudo-)random number generation, which definitely isn't a common occurrence. It also runs a relatively tight hot loop of "generate random valid move, play it, repeat until game over", which should be friendly to JIT approaches.

What I want to say is: this is an interesting problem to benchmark, but it's probably not representative of the web application performance of the different Ruby implementations. It is still a good indicator of where the different Ruby implementations rank performance-wise.

It's also important to note that this benchmark is single-threaded: while the problem is suited for parallelization, I haven't parallelized it yet. Plus, single-threaded applications are still typical for Ruby (due to the global interpreter lock in CRuby).

We’re also mainly interested in “warm” application performance i.e. giving them a bit of time to warm up and look at their peak performance. We’ll also look at the warmup times in a separate section though.

The competitors

Our competitors are the Ruby implementations I could easily install on my machine and was interested in, which brings us to:

  • CRuby 2.4.10
  • CRuby 2.5.8
  • CRuby 2.6.6
  • CRuby 2.7.1
  • CRuby 2.8.0-dev (b4b702dd4f from 2020-08-07) (this might end up being called Ruby 3 not 2.8)
  • truffleruby-1.0.0-rc16
  • truffleruby-20.1.0
  • jruby-
  • jruby-

All of those versions were current as of early August 2020. As usual, doing all the benchmarking, graphing and writing took me some time, so TruffleRuby released a new version in the meantime; results shouldn't differ much, though.

CRuby (yes, I still insist on calling it that vs. MRI) is mainly our baseline, as it's the standard Ruby interpreter. Versions that are capable of JITing (2.6+) will also be run separately with the --jit flag to show the improvement (also referred to as MJIT).

TruffleRuby was our winner the last 2 times around. We're running 20.1 and 1.0-rc16 (please don't ask me why this specific version, it was in the matrix from when I originally redid these benchmarks a year ago). We're also going to run both native and JVM mode for 20.1.

JRuby will be run "normally" and with invokedynamic + server flag (denoted by "+ID"). We're also gonna take a look at JDK 8 and JDK 14. For JDK 14, we're also going to run it with a non-default GC algorithm, falling back to the one used in JDK 8, as the new default is slower for this benchmark. Originally I also wanted to run with lots of different JVMs, but as it stands I already recorded almost 40 different runs in total, and the JVMs I tried didn't show great differences, so we'll stick with the top performer of those I tried, which is AdoptOpenJDK.

You can check all flags passed etc. in the benchmark script.

The Execution Environment

This is still running on the same desktop PC that I ran the first version of these benchmarks on, almost 5 years ago. In the meantime it was hit by a lot of those lovely Intel security vulnerabilities, though. It's by no means a top machine any more.

The machine has 16 GB of RAM, runs Linux Mint 19.3 (based on Ubuntu 18.04 LTS) and most importantly an i7-4790 (3.6 GHz, 4 GHz boost) (which is more than 6 years old now).

tobi@speedy:~$ uname -a
Linux speedy 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
tobi@speedy:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Stepping: 3
CPU MHz: 3568.176
CPU max MHz: 4000,0000
CPU min MHz: 800,0000
BogoMIPS: 7200.47
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d

All background applications were closed and while the benchmarks were running no GUI was active. They were run on hot Berlin evenings 😉

If you want to run these benchmarks yourself the rubykon repo has the instructions, with most of it being automated.

Timing wise I chose 5 minutes of warmup and 2 minutes of run time measurements. The (enormous) warmup time was mostly driven by behaviour observed in TruffleRuby where sometimes it would deoptimize even after a long warmup. So, I wanted to make sure everyone had all the time they needed to reach good “warm” performance.
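The warmup-then-measure approach can be sketched as below. This is a hypothetical, simplified harness (`HarnessSketch`), not the benchmark script Rubykon actually uses. It runs the function repeatedly for a warmup period and throws those timings away, then records run times for the measurement period. Durations are given in milliseconds so the example finishes quickly; the runs above used 5 minutes and 2 minutes.

```elixir
defmodule HarnessSketch do
  # Runs `fun` until `warmup_ms` have passed (timings discarded), then keeps
  # running it until `measure_ms` have passed, returning run times in μs.
  def run(fun, warmup_ms, measure_ms) do
    _ = loop(fun, deadline(warmup_ms), [])
    loop(fun, deadline(measure_ms), [])
  end

  defp deadline(ms), do: System.monotonic_time(:millisecond) + ms

  defp loop(fun, deadline, acc) do
    if System.monotonic_time(:millisecond) >= deadline do
      Enum.reverse(acc)
    else
      start = System.monotonic_time(:microsecond)
      fun.()
      elapsed = System.monotonic_time(:microsecond) - start
      loop(fun, deadline, [elapsed | acc])
    end
  end
end

times = HarnessSketch.run(fn -> Enum.sort(Enum.shuffle(1..1_000)) end, 50, 100)
IO.puts("measured #{length(times)} runs after warmup")
```

A run that starts just before the deadline is still allowed to finish, so the measurement phase can overshoot slightly.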

Run Time Results

One more thing before we get to it: JRuby here ran on AdoptOpenJDK 8. Differences to AdoptOpenJDK 14 (and other JVMs) aren’t too big and would just clutter the graphs. We’ll take a brief look at them later.

If you want to take a look at all the data I gathered you can access the spreadsheet.

Iterations per Minute per Ruby implementation for running 1000 full playouts on a 19×19 board (higher is better).

Overall this looks more or less like the graphs from the last years:

  • CRuby is the baseline performance without any major jumps
  • JRuby with invokedynamic (+ID) gets a bit more than 2x the baseline performance of CRuby, invokedynamic itself makes it a lot faster (2x+)
  • TruffleRuby runs away with the win

What's new, though, is the inclusion of the JIT option for CRuby, which performs quite impressively and is only getting better. An 18% improvement on 2.6 goes up to 34% on 2.7 and tops out at 47% for 2.8.0-dev when comparing the JIT and non-JIT run times of the same Ruby version. Looking at CRuby, it's also interesting that this time around "newer" CRuby performance is largely on par with JRuby performance without invokedynamic.
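The JIT improvements quoted above compare the JIT and non-JIT runs of the same CRuby version. The ratio is the same whether you use iterations per minute or (inverted) average run times. The sketch below uses iterations per minute. `JitGainSketch` is hypothetical, and the input numbers are made up because the non-JIT figures aren't listed inline.

```elixir
defmodule JitGainSketch do
  # Percentage improvement of the JIT run over the plain run of the same
  # Ruby version, based on iterations per minute (higher is better).
  def improvement_percent(jit_ipm, plain_ipm) when plain_ipm > 0 do
    Float.round((jit_ipm / plain_ipm - 1) * 100, 1)
  end
end

# Made-up numbers: 9.0 i/min with --jit vs. 6.0 i/min without it.
IO.puts("#{JitGainSketch.improvement_percent(9.0, 6.0)}% faster with --jit")
```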

The other thing that sticks out quite hugely is the big error bars on TruffleRuby 20. These are caused by some deoptimizations, even after the long warmup. Portrayed here is a run where they weren't as bad; even in runs where they were worse, performance was still top notch at 27 i/min overall. It's most likely a bug that these deoptimizations happen; you can check the corresponding issue. In the past, the TruffleRuby team has always found a way to fix issues like this, so the theoretical performance is a bit higher.

Another thing I like to look at is the relative speedup chart:

Speedup relative to CRuby 2.4.10 (baseline)

CRuby 2.4.10 was chosen as the “baseline” for this relative speedup chart mostly as a homage to Ruby 3×3 in which the goal was for Ruby 3 to be 3 times faster than Ruby 2.0. I can’t get Ruby < 2.4 to compile on my system easily any more and hence they are sadly missing here.

I'm pretty impressed with the JIT in Ruby 2.8: a speedup of over 60% is not to be scoffed at! So, as pointed out in the results above, I have ever-rising hopes for it! JRuby (with invokedynamic) sits nice and comfortably at a ~2.5x speedup, which is a bit down from its 3x speedup in the older benchmarks. This might also be due to the improved baseline of CRuby 2.4.10 versus the old CRuby 2.0 (check the old blog post for some numbers from back then, though they're not directly comparable). TruffleRuby sits at the top thanks to the --jvm version, with almost a 6x improvement. Perhaps more impressively, it's still 2.3 times faster than the fastest non-TruffleRuby implementation. The difference between "native" and --jvm for TruffleRuby is also astounding and important to keep in mind should you do your own benchmarks.
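The relative speedup is each implementation's throughput divided by the baseline's. Equivalently, it is the baseline's average run time divided by the implementation's average run time. Here is a small sketch with a hypothetical `SpeedupSketch` module. The baseline values are made up, since CRuby 2.4.10's numbers aren't listed inline.

```elixir
defmodule SpeedupSketch do
  # Speedup relative to a baseline, from average run times in seconds
  # (lower is better, so the baseline time goes on top).
  def from_times(baseline_avg, avg) when avg > 0, do: baseline_avg / avg

  # The same figure from iterations per minute (higher is better).
  def from_ipm(baseline_ipm, ipm) when baseline_ipm > 0, do: ipm / baseline_ipm
end

# Made-up baseline: 10.0 s per run, i.e. 6.0 i/min.
IO.puts(SpeedupSketch.from_times(10.0, 4.0))
IO.puts(SpeedupSketch.from_ipm(6.0, 15.0))
```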

What's a bit baffling is that the performance trend for CRuby isn't "always getting better" like I'm used to. The differences are rather small, but given the small standard deviation (below 1%) I'm rather sure of them. 2.5 is slower than 2.4, and 2.6 is faster than both 2.7 and 2.8.0-dev. However, the "proper" order is re-established when enabling the JIT.
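The "stddev %" figure is the standard deviation expressed as a percentage of the average run time. The minimal sketch below assumes the sample standard deviation (dividing by n - 1). Whether the spreadsheet uses the sample or the population variant isn't stated. `DeviationSketch` and the run times are hypothetical.

```elixir
defmodule DeviationSketch do
  # Sample standard deviation as a percentage of the mean.
  def relative_stddev_percent(values) when length(values) > 1 do
    n = length(values)
    mean = Enum.sum(values) / n
    squared_diffs = Enum.map(values, fn v -> (v - mean) * (v - mean) end)
    variance = Enum.sum(squared_diffs) / (n - 1)
    :math.sqrt(variance) / mean * 100
  end
end

# Made-up run times in seconds.
run_times = [6.4, 6.5, 6.6]
percent = DeviationSketch.relative_stddev_percent(run_times)
IO.puts("stddev: #{Float.round(percent, 2)}%")
```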

If you’re rather interested in the data table you can still check out the spreadsheet for the full data, but here’s some of it inline:

Ruby                        i/min   avg (s)   stddev %   relative speedup
2.6.6 --jit                  7.8     7.69       0.59       1.39
2.7.1 --jit                  8.64    6.95       0.29       1.54
2.8.0-dev --jit              9.25    6.48       0.29       1.65
truffleruby-20.1.0 --jvm    33.32    1.8       19.01       5.94
jruby- +ID                  14.
jruby- +ID                  13.85    4.33       0.44       2.47


It seems the JITing approaches are winning throughout; however, such performance isn't free. Conceptually, a JIT looks at which parts of your code are run often and then tries to further optimize (and often specialize) those parts. This makes them a whole lot faster, but the process takes time and work.

The benchmarking numbers presented above completely ignore the startup and warmup time. The common argument for this is that in long lived applications (like most web applications) we spend the majority of time in the warmed up/hot state. It’s different when talking about scripts we run as a one off. I visualized and described the different times to measure way more in another post.

Anyhow, let's get a better feeling for those warmup times, shall we? One of my favourite methods for doing so is graphing the first couple of run times as recorded (those are all during the warmup phase):

Run times as recorded by iteration number for a few select Ruby implementations. Lower is faster/better.
Same data as above but as a line chart. Thanks to Stefan Marr for nudging me.

CRuby itself (without --jit) performs at a steady pace. This is expected, as no further optimizations are done and there's also no cache or anything involved. Your first run is pretty much gonna be as fast as your last run. It's impressive to see, though, that the --jit option is already faster in the first iteration and still getting better. What you can't see in the graph, as it doesn't contain enough run times and the difference is very small, is that CRuby with --jit only reaches its peak performance around iteration 19 (going from ~6.7s to ~6.5s). That is quite surprising given how steady it seems before that.

TruffleRuby behaves in line with previous results. It has by far the longest warmup time, especially the JVM configuration, which is in line with their presented pros and cons. The --jvm runtime configuration only becomes the fastest implementation by iteration 13! From then on, though, it's faster by quite a bit. It's also noteworthy that for neither native nor JVM does the time decline steadily. Sometimes subsequent iterations are slower, which is likely due to the JIT trying hard to optimize something or having to deoptimize something. The random nature of Rubykon might play into this, as we might be hitting edge cases only at iteration 8 or so. While the first run time in particular can be quite surprising, it's noteworthy that during my years of doing these benchmarks I've seen TruffleRuby steadily improve its warmup time. As a datapoint, TruffleRuby 1.0.0-rc16 had its first 2 run times at 52 seconds and 25 seconds.

JRuby is very close to peak performance after one iteration already. Peak performance with invokedynamic is hit around iteration 7. It’s noteworthy that with invokedynamic even the first iteration is faster than CRuby “normal” and on par with the CRuby JIT implementation but in subsequent iterations gets much faster than them. The non invokedynamic version is very close to normal CRuby 2.8.0-dev performance almost the entire time, except for being slower in the first iteration.
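Reading "peak performance is hit around iteration N" off a graph can also be done numerically. In this rough sketch, the first iteration whose run time is within a given tolerance of the fastest run counts as "near peak". `TimelineSketch`, the 2% tolerance and the run times are all hypothetical. The late slowdowns described for TruffleRuby would not move this number, since it only looks for the first near-peak run.

```elixir
defmodule TimelineSketch do
  # Returns the 1-based index of the first run that is within `tolerance`
  # (e.g. 0.02 = 2%) of the fastest run in the list.
  def peak_iteration(run_times, tolerance) when run_times != [] do
    best = Enum.min(run_times)
    index = Enum.find_index(run_times, fn t -> t <= best * (1 + tolerance) end)
    index + 1
  end
end

# Made-up run times in seconds, getting faster as a JIT warms up.
run_times = [9.1, 7.4, 6.9, 6.7, 6.55, 6.6, 6.5, 6.52]
IO.puts("near-peak from iteration #{TimelineSketch.peak_iteration(run_times, 0.02)}")
```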

For context it’s important to point out though that Rubykon is a relatively small application. Including the benchmarking library it’s not even 1200 lines of code long. It uses no external gems, it doesn’t even access the standard library. So all of the code is in these 1200 lines + the core Ruby classes (Array etc.) which is a far cry from a full blown Rails application. More code means more things to optimize and hence should lead to much longer warmup times than presented here.

JRuby/JVM musings

It might appear unfair that the results up there were run only with JDK 8. I can assure you, in my testing it sadly isn't. I had hoped for some big performance jumps with the newer JDK versions, but I found no such thing. JDK 14 does produce the fastest result, but only by a rather slim margin. It also requires switching up the GC algorithm, as the new default performs worse, at least for this benchmark.

Comparison JRuby with different options against AdoptOpenJDK 8 and 14

Performance is largely the same. JDK 14 is a bit faster when using both invokedynamic and falling back to the old garbage collector (+ParallelGC). Otherwise, performance is worse. You can find out more in this issue. It's curious, though, that JRuby 9.1 seems mostly faster than 9.2.

At first I also got quite excited looking at all the different new JVMs and thought I’d benchmark against them all, but it quickly became apparent that this was a typical case of “matrix explosion”, and I really wanted you all to actually see these results this time, unlike last year 😅 I gathered data for GraalVM and the Java Standard Edition Reference Implementation in addition to AdoptOpenJDK, but performance was largely the same and best with AdoptOpenJDK on my system for this benchmark. Again, these results are in the spreadsheet.

I made one more attempt with OpenJ9 as it sounded promising. The results were so bad that I didn’t even put them into the spreadsheet (~4 i/min without invokedynamic, ~1.5 i/min with invokedynamic). I can only imagine that I’m missing a magic switch, that OpenJ9 wasn’t built with a use case such as JRuby in mind, or that JRuby isn’t optimized to run on OpenJ9. Perhaps all of the above.

Final Thoughts

Alright, I hope this was interesting for y’all!

What did we learn? TruffleRuby still has the best “warm” performance by a mile; its warmup is getting better but can still be tricky (note the unexpected slowdowns late into the process). The JIT for CRuby seems to improve continuously and has me a bit excited. Out of the box (without invokedynamic), CRuby performance has caught up to JRuby. JRuby with invokedynamic is still the second fastest Ruby implementation, though.

It’s also interesting to see that every Ruby implementation has at least one switch (--jit, --jvm, invokedynamic) that significantly alters its performance characteristics.

Please, also don’t forget the typical grain of salt: This is one benchmark, with one rather specific use case run on one machine. Results elsewhere might differ greatly.

What else is there? Promising to redo the benchmark next year would be something, but my experience tells me not to 😉

There’s an Enterprise version of GraalVM with supposedly good performance gains. Now, I won’t be spending money but you can evaluate it for free after registering. Well, if I ever manage to fix my Oracle login and get Oracle’s permission to publish the numbers I might (I’m fairly certain I can get that though 🙂 ). I also heard rumours of some CLI flags to try with TruffleRuby to get even better numbers 🤔

Finally, this benchmark has only looked at run times, which are usually the most interesting values. However, other numbers, such as memory consumption, could prove interesting as well. These aren’t as easy to break down as neatly (or I don’t know how to). Showing the maximum amount of memory consumed during the measurement could be helpful, though. As some people can tell you, with Ruby you often end up scaling up your servers due to memory constraints, not necessarily CPU constraints.

I’d also be interested in how a new PC (planned purchase within a year!) affects these numbers.

So, there’s definitely some future work to be done here. Anything specific you want to see? Please let me know in the comments, via Twitter or however you like. Same goes for new graph types, mistakes I made or what not – I’m here to learn!

Guest on Parallel Passion Podcast

Hey everyone,

yes, yes, I should blog more. The world is just in a weird place right now, affecting all of us, and I hope you’re all safe & sound. I do many interesting things while freelancing, but sadly haven’t found the time to blog about them yet.

What I did find the time for was going on the Parallel Passion podcast – a podcast with techies that focuses on their hobbies outside of tech. You can listen to it here:

We talked about many things but a lot of it went to user groups, public speaking, Go (game), social/people skills and we even dove into philosophy!

One hour is only so much time, though, so we didn’t manage to cover books, comics, movies, series and gaming.

Well, hope you enjoy the podcast 🙂

Stay safe and healthy everyone!

Slides: Stories in Open Source

Yesterday at RUG::B I tried something I’d never done before: a more personal, story-driven talk. To adequately illustrate it, and how different Open Source projects feel to me, I also opted to paint some very special drawings.

Open Source is something I’ve been doing for longer than people have been paying me to program. I’m immensely passionate about it, and it felt like a blind spot that I had never given a talk about it.

If you know of a conference this talk would be a good fit for, please let me know.

Anyhow, here are the slides to enjoy: Speaker Deck, SlideShare or PDF


What’s it like to work on Open Source projects? They’re all the same, aren’t they? No, they’re not – the longer I work on Open Source, the more I realize how different the experience is for each one of them. Walk with me through some stories that happened to me in Open Source and let’s see what we can take away.

Slides: Elixir & Phoenix – Fast, Concurrent and Explicit (Øredev)

I had the great pleasure of finally speaking at Øredev! I had wanted to speak there for so long, not only because it’s a great conference but also because it’s in Malmö, a city that I quite like and a city I’m happy to have friends in 🙂

Anyhow, all went well although I’d have loved to spread the word more.

And yes, at its core it’s a presentation I gave a while ago, but I reworked and repurposed it for both the audience and this day and age. Of course, it now also includes bunny pics 😉

Slides can be viewed here, on Speaker Deck, SlideShare or PDF


Key takeaways
  • What are Elixir and Phoenix? What makes them stand out among programming languages and frameworks?
  • Why would I want to use Functional Programming, what are the benefits and why does it work so well for the web?
  • How capable is Erlang (WhatsApp example) performance- and reliability-wise, and why would I consider it for a project?
  • How does explicitness help in system design?

Elixir and Phoenix are known for their speed, but that’s far from their only benefit. Elixir isn’t just a fast Ruby and Phoenix isn’t just Rails for Elixir. Through pattern matching, immutable data structures and new idioms, your programs can become not only faster but also more understandable and maintainable. This talk will take a look at what’s great and what you might miss, augmented with production experience and advice.

Video & Slides: Functioning Among Humans (Heart of Clojure)

Back in July I had a great time at Heart of Clojure – the first conference that finally allowed me to share my thoughts on the importance of people skills and on important people skills. And they were kind enough to even record it, so here it is!

Slides can be viewed here, on Speaker Deck, SlideShare and PDF.


sketch notes by my friend @malweene


In the development world most people are striving for technical excellence: better code, faster run times, more convenient interfaces, better databases… But is that really what helps us create better software?

In the end software development is done by groups of people creating products together. To do that communication and collaboration are essential. You can be the best programmer ever, but if you can’t efficiently work with others what good does it do you?

This talk will introduce you to relevant, easy to grasp concepts of collaboration and communication as well as give you food for thought.

On Going Freelance

At the end of a lengthy job search I decided to become a freelancer helping companies onboard onto Elixir, helping them with their development projects and processes, some performance work, pushing Open Source and maybe even a bit of interim CTOing or other consulting. Who knows what the future will hold? Right now I’m on a project until the end of October to help a company realize their first Elixir project, so mostly mentoring and coaching.

As I think that reflection is important (hence Retrospectives are the only constant!) I wanted to write a bit about why I decided to take the freelance route:


I like to take (big) breaks between jobs. I’d also love to get some Open Source funding to do Open Source full time. Both are hard while employed full time, as you usually want to stay at a job for a prolonged time. In most jobs it’s not exactly easy to say “Hey, can I take a 6 month leave because I got this great Open Source fund?”, especially not if you work in a leadership position. Should I discover that freelancing isn’t for me, it’s also easier to get back into full time employment than the other way around.

Freelancing gives me some of this flexibility. If I’ve already earned enough money I can decide to take a month or more off (although it seems really expensive to do so). I can apply to Open Source funds – in fact I just did last Saturday and am anxiously awaiting the result, as I’d love to push a vital part of the Ruby ecosystem to 1.0 🤞

It also gives me the flexibility to help people with smaller projects. I get approached semi-frequently by people asking if I know of a freelancer to do X, and X might be very interesting. Now I can say that I can do X myself, and in fact I’m already throwing ideas around with a friend. Which leads me to my next point:


I engage with communities, run the Ruby User Group Berlin, do open source and give presentations because it’s fun to me and I want these things to exist, but it has the positive side effect of making me well connected. My big hope is that I’ll have to spend less time on client acquisition and can get either more paid time or more free time. I also have a variety of freelancer friends whom I’ve always wanted to work with, so I’m keeping my fingers crossed that I might get to work with some of them 💚

Putting Special Knowledge to Use

I happen to have some relatively specialized knowledge and combination of skills that I’d like to use more. For instance, I love performance optimization and think there’s a market for bringing in freelancers to make your application faster and teach the team how to do this (especially with big Rails applications 😉 ). Other things I love are Elixir and teaching. During my interviews I also heard of so many failed Elixir introduction projects that I thought: Hey, people need some help adopting Elixir! I like Elixir, I like coaching/teaching, I like helping people = perfect match?

In fact, that’s exactly what I’m doing right now!

A good project to kick-start things

I was lucky enough that through my network I already had a standing offer for a 3 month project to help a company build their first Elixir project. That’s something I really wanted to do, and the people at the company were genuinely nice and excited. So you know – I wanted to do it, so let’s try it out! “Worst” case, I do this one project and then get back to full time employment.

Choices, Choices, Choices…

I had a bunch of offers and good interviews for a variety of positions. In the end, it was always hard for all of the points I mentioned in my post to come together. The project was great but I had concerns about diversity, or diversity was great but I had concerns about the project, or things were good but the position wasn’t what I wanted, or we couldn’t agree on salary & vacations… you get the picture. I know a 100% fit is hard to achieve, but in the end you can’t fault me for trying, right?

Sometimes the timing also just didn’t work out – the freelance offer had a fixed start date, so I couldn’t even finish interviewing for some promising positions, as I had decided to do (at least) this project.

Don’t get me wrong – there are really good positions out there, and I’m still thinking about writing a follow-up blog post highlighting some of the cool companies I interviewed with. For me, the prospect of freelancing and potentially doing open source work just seemed more tempting at the time. That said, I’ve already lost a CTO position I really liked because I decided to wait for an open source fund that I didn’t get. Let’s just hope that story doesn’t repeat itself 😉

So will you be freelancing forever now?

Maybe? I don’t know. I like it for now (well, a month in...) and if I manage to get the Open Source funding I’ll be ecstatic, as I’ll essentially be paid for my hobby and do something good.

However, there are several things I’ll miss about full time employment, most importantly:

  • Building and evolving a team long term & really being part of a team of people you know well
  • Building and evolving processes long term
  • Seeing the long term impact of work and decisions
  • Forming a deep understanding of product, processes, market, competitors etc.

Of course there are also aspects to freelancing that aren’t ideal, I don’t believe anyone really enjoys doing their taxes, invoicing etc. but that all comes with it. Plus, the risk is all yours – if you can’t find a project you won’t get paid, if you’re sick you’re not getting paid.

For now I enjoy being a freelancer and I’m looking forward to the different projects that’ll hopefully come my way. But for how long? We’ll see 😉

Of course you can help me by hiring or recommending me 🤗

Looking for a job!


I’m no longer looking for a job, or at least not really. I decided to go freelancing for now and I have a project until the end of October if all goes to plan. So if you’ve got new freelance projects, or a really great CTO/VP/Head of position, please still feel free to contact me 😉

(decision to be explained in more detail in a further post)

Original post for historical reasons:

It’s that time: I’m looking for a job! You can find my CV and all relevant links over at my website: CV Web, CV PDF

Quick links that might interest you: website, github, twitter and blog

Who am I and why would you want to hire me?

My name is Tobi, but online I’m better known as PragTob. I am a full-stack engineer & leader deeply interested in agile methodologies, web technologies, software crafting and teaching. I love creating products that help people and am fascinated with the human side of software development.

To have a look at some of the technical topics that I’m involved with, you can check out this blog, but the core technologies I’ve worked with are Ruby, Elixir, PostgreSQL and JavaScript. I enjoy pair programming, mentoring, TDD, naming discussions and clean code bases. I also have a passion for performance and eliminating bottlenecks. I maintain a variety of open source projects, mostly in Ruby (e.g. simplecov) and Elixir (e.g. benchee).

While I have this technical background, I believe that the so called “soft”/people/social skills are more important for developers, as we all work together in teams. That is even more vital for any kind of lead or management position, as mentoring, communicating and collaborating are at the heart of what people in these positions should do. I’m deeply interested in topics such as motivation and communication. In my last job I was responsible for a team of ~15 people (together with the CTO). We relied on constant feedback through retrospectives to adapt our processes and grew the team by introducing the concept of seasons, going remote first, enhancing our onboarding and many more improvements. I also did regular one-on-ones, interviewed candidates, facilitated retrospectives and other meetings, and held knowledge sharing sessions for the whole team.

Beyond that, I’ve been mentoring at Rails Girls Berlin/code curious for more than 7 years and have been coaching my Rails Girls project group, the rubycorns, for more than 6 years. I also run the Ruby User Group Berlin and give presentations at various conferences.

My CV goes into more detail about what I’ve done.

What am I looking for?

I’m looking for a company where we build something useful together and where I can have an impact on product, people and process. With more concrete points this amounts to:

Position: I’m primarily looking for lead positions with responsibility so CTO/VP of Engineering/Head of Engineering/Team Lead – all depending on company size and culture. In the right circumstances, I can also imagine working as a Senior Developer. I want to be in an environment where my impact goes beyond code as I love to help people and improve processes. I like to be where I can help the team & the company the most, sometimes that’s mentoring and sharing knowledge, sometimes that’s fixing critical performance bugs and sometimes that’s participating in an all day workshop at the headquarters of a potential client.

Location: Berlin or potentially remote. I’m not looking to relocate currently.

Employment/Freelance: I’m looking for full time employment (reduced hours would also be ok) but am also open to freelance work.

Field of Work: I’m looking for a company that helps people solve real problems. I want a purpose I can get behind. For me that means no cryptocurrencies, advertising or fintech for the rich. In a similar vein, I’m not particularly interested in consultancy/agency work, as I don’t like to travel every week and really like to work at a product company where I can dive deep into the domain.

Diversity: I believe that the best products and work environments are created by diverse teams. Hence, diversity should be a core value applied throughout all levels of an organization. Or at least, the problem should be recognized and active work should be put into countering it.

Company Culture: I’m looking for a company that trusts its employees and is open. It should also support a sustainable pace. Regular overtime is nothing to be proud of. Knowledge sharing should be key and therefore asking questions should be encouraged.

Time Allocation: I love companies that trust their employees and allow them flexible working hours and locations, meaning it’s ok to work remotely for a couple of days or from home. Historically, I did some of my best work checking emails and pull requests at home in the morning and then biking to work afterwards. Beloved refactorings have also emerged while dog sitting.

Benefits: Giving people time and room to grow is important to me. In particular, I usually speak at a couple of conferences a year and think attending them should count as work, not vacation time.

Languages: I have expert knowledge of Ruby and Elixir. Working with Elixir would be a plus, but not a must. I also like working with JavaScript and am not afraid to do CSS, although my CSS has sadly gotten rather rusty. I’m also naturally open to other languages: while I’m certainly most effective in the ones mentioned, the right company with the right culture and purpose is more important. Rust in particular would be very interesting to me, granted I’m not very good at it (yet).

Open Source: An employer that values contributing back to open source or sharing their own creations would be a big plus, even more so if it happened to be full OSS work.

I’m aware that it’s hard to tick all of these boxes, and if you do, hey, we might be a great match. I just wanted to communicate my wishes and values clearly as a starting point.

Getting in touch

You can send me an email at – otherwise my Twitter direct messages are also open.

Please don’t cold call me 🙂