Reexamining FizzBuzz Step by Step – and allowing for more varied rules

Last week I found myself at my old RailsGirls/code curious project group the rubycorns coaching a beginner through the FizzBuzz coding challenge. It was a lot of fun and I found myself itching to implement it again myself as I came up with some ideas about a nice solution given a requirement for arbitrary or changing rules to the game.

I’ve also been working on blog posts helping people interview processes, the next of which will be about Technical Challenges/Code challenges (due to be published tomorrow! edit: Published now!). This is a little extension for that blog post, as an example of going through and improving a coding challenge.

To be clear, I don’t endorse FizzBuzz as a coding challenge. In my opinion something closer to your domain is much more valuable. However, it is (probably) the most well known coding challenge so I wanted to examine it a bit. It is also deceptively simple, and so deserves some consideration.

So, in this blog post let’s start with what FizzBuzz is and then let’s iteratively go through writing and improving it. Towards the end we’ll also talk about possible extensions of the task and how to deal with them. The examples here will be in Ruby, but are easy to transfer to any other programming language. If you’re only here for the code, you can check out the repo.

The FizzBuzz Challenge

The challenge, inspired by a children’s game, originated from the blog post “Using FizzBuzz to Find Developers who Grok Coding” by Imran Ghory back in 2007 – the intent being to come up with a question as simple as possible to check if people can write code.

The problem goes as follows:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

Simple enough, right? Well, I think it actually checks for some interesting properties and is a good basis for a conversation. To get started, let’s check out a basic solution.

Basic Solution

1.upto(100) do |number|
if (number % 3 == 0) && (number % 5 == 0)
puts "FizzBuzz"
elsif number % 3 == 0
puts "Fizz"
elsif number % 5 == 0
puts "Buzz"
else
puts number
end
end

That one does the job perfectly fine. It prints out the numbers as requested. It also helps to illustrate some of the difficulties with the challenge:

First off, printing out the results is interesting as it is notoriously hard to test (while possible given the correct helpers). It should push a good programmer towards separating the output concern from the business logic concern. So, a good solution should usually feature a separate function fizz_buzz(number) that given any number number either returns the number itself or ”FizzBuzz” etc. according to the rules. This is wonderfully easy to test, but many struggle initially to make that separation of concerns. We’ll get to that in a second.

What I don’t like about the challenge is that it requires knowledge about the modulo operator, as it isn’t commonly used, but you need it to check whether a number is actually divisible. Anyhow, there will be a lot of code like: number % 3 == 0. To avoid repetition and instead speak in the language of the domain it’s much better to extract this functionality into a function divisible_by?(number, divisor). Makes the code read nicer and removes inherent duplication.

Speaking of the division, the order in which you check the conditions (namely, being divisible by 3 and 5, or 15, before the individual checks) is crucial for the program to work and I’ve seen more than one senior engineer stumble upon this.

With that in mind, let’s improve the challenge!

We need to go back

First off, while the previous solution is “perfectly” fine, I’d probably never write it as it’s hard to test – it’s just a script to run and everything is printed to the console. And since I’m a TDD kind of person, that won’t do! I usually start, as teased before, by just implementing a function fizz_buzz(number) – that’s the core of the business logic. The iteration and printing out are just secondary aspects to me, so let’s start there:

module FizzBuzz
module_function
def fizz_buzz(number)
if (number % 3 == 0) && (number % 5 == 0)
"FizzBuzz"
elsif number % 3 == 0
"Fizz"
elsif number % 5 == 0
"Buzz"
else
number
end
end
end
view raw fizz_buzz.rb hosted with ❤ by GitHub
RSpec.describe FizzBuzz do
describe ".fizz_buzz" do
expected = {
1 => 1,
2 => 2,
3 => "Fizz",
4 => 4,
5 => "Buzz",
6 => "Fizz",
11 => 11,
15 => "FizzBuzz",
20 => "Buzz",
60 => "FizzBuzz",
98 => 98,
99 => "Fizz",
100 => "Buzz"
}
expected.each do |input, output|
it "for #{input} expect #{output}" do
expect(FizzBuzz.fizz_buzz(input)).to eq output
end
end
end
end

Much better, and it’s tested! You may think that the test generation from the hash is overdone, but I love how easy it is to adjust and modify test cases. No ceremony, I just add an input and an expected output. Also, yes – tests. When solving a coding challenge tests should usually be a part of it unless you’re explicitly told not to. Testing is an integral skill after all.

Ok, let’s make it a full solution.

Full FizzBuzz

Honestly, all that is required to turn it into a full FizzBuzz solution is a simple loop and output. What is a bit fancier is the integration test I added to go along with it:

module FizzBuzz
# …
def run
1.upto(100) do |number|
puts fizz_buzz(number)
end
end
end
view raw runner.rb hosted with ❤ by GitHub
describe ".run" do
full_fizz_buzz = <<~FIZZY
1
2
Fizz
.. much more …
98
Fizz
Buzz
FIZZY
it "does a full run integration style" do
expect { FizzBuzz.run() }.to output(full_fizz_buzz).to_stdout
end
end
view raw runner_spec.rb hosted with ❤ by GitHub

Simple isn’t it? You may argue that the integration test is too much, but when I can write an integration test as easy as this I prefer to do it. When I write code that generates files, like PAIN XML, I also love to have a full test that makes sure when given the same inputs we get the same outputs. This has the helpful side effect that even minor changes become very apparent in the pull request diff.

Anyhow, the other thing that we see is that our separation of concerns with the fizz_buzz(number) function pushed us to here is that there is only a single puts statement. We separated the output concern from the business logic. Should we want to change the output – perhaps it should be uploaded to an SFTP sever – then there is a single point for us to adjust that in. Of course we could also use dependency injection to make the method of delivery easier to change, but without a hint that we might need it this is likely overdone. Especially since I’d still want to keep the full integration test we just wrote, programs have a tendency to break in the most unanticipated ways.

Keeping up with the Domain

The other thing I complained about initially was the usage of the modulo operator. Truth be told, I only wrote the initial version like this for demonstration purposes – that one has to go. I don’t want to think about what “modulo a number equals 0” means. Whether or not something is divisible is something I understand and can work with. It also removes a fair bit of duplication:

module FizzBuzz
module_function
def fizz_buzz(number)
fizz = divisible_by?(number, 3)
buzz = divisible_by?(number, 5)
if fizz && buzz
"FizzBuzz"
elsif fizz
"Fizz"
elsif buzz
"Buzz"
else
number
end
end
def run
1.upto(100) do |number|
puts fizz_buzz(number)
end
end
def divisible_by?(number, divisor)
number % divisor == 0
end
end
view raw fizz_buzz.rb hosted with ❤ by GitHub

Much better! There is a small optimization here, where we only check the divisibility twice instead of 4 times. It doesn’t fully matter, but when I see the exact same code being run twice in a method and I can remove it without impacting readability I love to do it.

I consider this a good solution. However, now is where the fun of many coding challenges starts – what extension points are there?

Extension Points

Many coding challenges have potential extension points baked in. Some extra features, or, what I almost prefer dealing with, uncertainty and how that might affect software design.

How certain are we that we need the first 100 numbers? How certain are we that we want to print it out on the console? Is there a possibility that we’ll get a 3rd number like 7 and how would that work? How certain are we that it’s the numbers 3, 5 and the words Fizz and Buzz?

Depending on the answers to these question you could go, adjust the challenge to make these easier to change. And with answers I don’t mean you making them up, but in case of a live coding challenge you talking to your interviewers to see what they think. A common case for instance would be that the numbers and strings may change while we’re certain it will be 2 numbers and 2 distinct strings – “product is still trying to figure out the exact numbers and text and they might change in the future as we’re experimenting in the space”.

Let’s run with that for now – we think it will always be 2 numbers but we’re not sure what 2 numbers and we’re also not sure about Fizz and Buzz. What do we do now?

Going Constants

One of the easiest solutions to this is to extract the relevant values to constants or even into a config.

module FizzBuzz
module_function
FIZZ_NUMBER = 3
FIZZ_TEXT = "Fizz"
BUZZ_NUMBER = 5
BUZZ_TEXT = "Buzz"
FIZZ_BUZZ_TEXT = FIZZ_TEXT + BUZZ_TEXT
def fizz_buzz(number)
fizz = divisible_by?(number, FIZZ_NUMBER)
buzz = divisible_by?(number, BUZZ_NUMBER)
if fizz && buzz
FIZZ_BUZZ_TEXT
elsif fizz
FIZZ_TEXT
elsif buzz
BUZZ_TEXT
else
number
end
end
def run
1.upto(100) do |number|
puts fizz_buzz(number)
end
end
def divisible_by?(number, divisor)
number % divisor == 0
end
end
view raw fizz_buzz.rb hosted with ❤ by GitHub

Right, so that got a lot longer and frankly also a bit more confusing. However, it is now immediately apparent where to change the values. That said, we also kept the “FizzBuzz” naming for the constants which may get extra confusing if we changed the text to something like “Zazz”. It’s always a tradeoff, the previous version was definitely more readable. We did get rid of “Magic numbers” and “Magic Strings”. Sadly, due to the nature of the challenge, they are also still very magical as there is no inherent reasoning to them 😅

Something bugs me with this solution though, and that’s the reason why I actually went and implemented it myself again (and wrote this post): There is an obvious relation between FIZZ_NUMBER and FIZZ_TEXT but from a code point of view they are completely separate data structures only held together by naming conventions and their usage together.

Working With Rules

Ideally we’d want a data structure to hold both the text to be outputted and the number that triggers it together. There’s a gazillion ways you could go about this. You could simply use maps, arrays or tuples to hold that data together. As we’re doing Ruby right now, I decided to create an object that holds the rule and can apply itself to a rule – either returning its configured text or nil.

module FizzBuzz
module_function
class Rule
def initialize(output, applicalbe_divisible_by)
@output = output
@applicalbe_divisible_by = applicalbe_divisible_by
end
def apply(number)
@output if divisible_by?(number, @applicalbe_divisible_by)
end
def divisible_by?(number, divisor)
number % divisor == 0
end
end
DEFAULT_RULES = [
Rule.new("Fizz", 3),
Rule.new("Buzz", 5)
].freeze
def fizz_buzz(number, rules = DEFAULT_RULES)
applied_rules = rules.filter_map { |rule| rule.apply(number) }
if applied_rules.any?
applied_rules.join
else
number
end
end
def run(rules = DEFAULT_RULES)
1.upto(100) do |number|
puts fizz_buzz(number, rules)
end
end
end
view raw fizz_buzz.rb hosted with ❤ by GitHub

Now, that’s quite different! Does it work? Well, yes – and the beauty of it is that so far we haven’t altered our API at all so all of these were internal refactorings that work with exactly the same set of tests. Yes, technically this adds additional optional parameters, we’ll get to these later 😉

The most interesting thing about this is that the rule of “if it is divisible by both do X” is gone. Turns out, that rule isn’t really needed:

  • if it is divisible by 3 add “Fizz” to the output
  • if it is divisible by 5 add “Buzz” to the output
  • if it is divisible by neither, just print out the number itself

That works just as well. It means that the order in which we store rules in our DEFAULT_RULES array matters (so we don’t end up with “BuzzFizz”). Of course we need to verify this with our stakeholders/interviewers, but let’s say they agree here. Now, can we allow for even more flexibility with the rules?

Flexible Rules

Now our “stakeholders” might come back and say well, you know what we’re not so sure about just having 2 numbers and their respective outputs. It may be more, it may be less! The good news? This already completely works with our implementation above. However, we should still test it. You can take a look at all the test I wrote over here, I’ll just post the tests here for adding a 3rd rule: 7 and Zazz!

context "with Zazz into the equation" do
zazz_rules = [
FizzBuzz::Rule.new("Fizz", 3),
FizzBuzz::Rule.new("Buzz", 5),
FizzBuzz::Rule.new("Zazz", 7)
]
expected = {
1 => 1,
3 => "Fizz",
5 => "Buzz",
7 => "Zazz",
15 => "FizzBuzz",
21 => "FizzZazz",
35 => "BuzzZazz",
105 => "FizzBuzzZazz"
}
expected.each do |input, output|
it "for #{input} expect #{output}" do
expect(described_class.fizz_buzz(input, zazz_rules)).to eq output
end
end
end
view raw zazz_spec.rb hosted with ❤ by GitHub

As per usual, testing all the different edge cases here is a fun exercise:

  • What is the behavior if no rules are passed?
  • What happens if the exact same rule is applied twice?
  • Can I change the order to make it “BuzzFizz”?

What I also like about this solution is that the rules aren’t actually hard coded but are injected from the outside. That allows us to easily test our system against many different rule configurations. It also allows us to run many different rule configurations in the same system – each user of our FizzBuzz platform could have their own settings for FizzBuzz!

Closing Out

That’s quite a bit of changes we went through here. I want to stress, that you shouldn’t start with the more complex solution as it is definitely harder to understand. You Ain’t Gonna Need It.However, if there are actual additional requirements or credible uncertainty it might be worth it. I ended up implementing it because I wanted to translate that relationship between “3” and “Fizz” explicitly into a data structure, as it feels like an inherent part of the domain. The other properties of it were almost just a by-product.

You can check out the whole code on github and let me know what you think.

Of course, we haven’t handled all extension points here:

  • The range of 1 to 100 is still hard coded, however that is easily remedied if necessary (by passing the range in as a parameter)
  • Our choice of the implementation of Rule hard coded the assumption that rules only ever check the divisibility by a number. We couldn’t easily produce a rule that says “for every odd number output ‘Odd'”. Instead of providing Rule with just a number, we could instead use an anonymous function to make it even more configurable. However, don’t ever forget you ain’t gonna need it – even in a coding challenge. Overengineering is also often bad, hence the conversation with your stakeholders/interviewers is important as for what are “sensible” extension points.
  • The output mechanism is still just puts, as mentioned earlier dependency injecting an object with a specific output method instead would make that more configurable and quite easily so.

Naturally, naming is important and could see some improvement here. However, with completely made up challenges having good naming becomes nigh impossible.

Anyhow, I hope you enjoyed this little journey through FizzBuzz and that it may have been helpful to you. I certainly enjoyed writing that solution 😁

Edit: Earlier versions of the code samples featured bitwise-and (&) instead of the proper and operator (&&) – both work in this context (hence the tests passed) but you should definitely be using &&. Thanks to my friend Jesse Herrick for pointing that out. Yes the featured image is still wrong. I won’t fix it.

The great Rubykon Benchmark 2020: CRuby vs JRuby vs TruffleRuby

It has been far too long, more than 3.5 years since the last edition of this benchmark. Well what to say? I almost had a new edition ready a year ago and then the job hunt got too intense and now the heat wave in Berlin delayed me. You don’t want your computer running at max capacity for an extended period, trust me.

Well, you aren’t here to hear about why there hasn’t been a new edition in so long, you’re here to read about the new edition! Most likely you’re here to look at graphs and see what’s the fastest ruby implementation out there. And I swear we’ll get to it but there’s some context to establish first. Of course, feel free to skip ahead if you just want the numbers.

Well, let’s do this!

What are we benchmarking?

We’re benchmarking Rubykon again, a Go AI written in Ruby using Monte Carlo Tree Search. It’s a fun project I wrote a couple of years back. Basically it does random playouts of Go games and sees what moves lead to a winning game building a tree with different game states and their win percentages to select the best move.

Why is this a good problem to benchmark? Performance matters. The more playouts we can do the better our AI plays because we have more data for our decisions. The benchmark we’re running starts its search from an empty 19×19 board (biggest “normal” board) and does 1000 full random playouts from there. We’ll measure how long that takes/how often we could do that in a minute. This also isn’t a micro benchmark, while remaining reasonable in size it looks at lots of different methods and access patterns.

Why is this a bad problem to benchmark? Most Ruby devs are probably interested in some kind of web application performance. This does no IO (which keeps the focus on ruby code execution, which is also good) and mainly deals with arrays. While we deal with collections all the time, rubykon also accesses a lot of array indexes all over, which isn’t really that common. It also barely deals with strings. Moreover, it does a whole lot of (pseudo-)random number generation which definitely isn’t a common occurrence. It also runs a relatively tight hot loop of “generate random valid move, play it, repeat until game over”, which should be friendly to JIT approaches.

What I want to say, this is an interesting problem to benchmark but it’s probably not representative of web application performance of the different ruby implementations. It is still a good indicator of where different ruby implementations rank performance wise.

It’s also important to note that this benchmark is single threaded – while it is a problem suited for parallelization I haven’t done so yet. Plus, single threaded applications are still typical for Ruby (due to the global interpreter lock in CRuby).

We’re also mainly interested in “warm” application performance i.e. giving them a bit of time to warm up and look at their peak performance. We’ll also look at the warmup times in a separate section though.

The competitors

Our competitors are ruby variants I could easily install on my machine and was interested in which brings us to:

  • CRuby 2.4.10
  • CRuby 2.5.8
  • CRuby 2.6.6
  • CRuby 2.7.1
  • CRuby 2.8.0-dev (b4b702dd4f from 2020-08-07) (this might end up being called Ruby 3 not 2.8)
  • truffleruby-1.0.0-rc16
  • truffleruby-20.1.0
  • jruby-9.1.17.0
  • jruby-9.2.11.1

All of those versions were current as of early August 2020. As usual doing all the benchmarking, graphing and writing has taken me some time so that truffleruby released a new version in the mean time, result shouldn’t differ much though.

CRuby (yes I still insist on calling it that vs. MRI) is mainly our base line as it’s the standard ruby interpreter. Versions that are capable of JITing (2.6+) will also be run with the –jit flag separately to show improvement (also referred to as MJIT).

TruffleRuby was our winner the last 2 times around. We’re running 20.1 and 1.0-rc16 (please don’t ask me why this specific version, it was in the matrix from when I originally redid this benchmarks a year ago). We’re also going to run both native and JVM mode for 20.1.

JRuby will be run “normally”, and with invokedynamic + server flag (denoted by “+ID”). We’re also gonna take a look at JDK 8 and JDK 14. For JDK 14 we’re also going to run it with a non default GC algorithm, falling back to the one used in JDK 8 as the new default is slower for this benchmark. Originally I also wanted to run with lots of different JVMs but as it stands I already recorded almost 40 different runs in total and the JVMs I tried didn’t show great differences so we’ll stick with the top performer of those I tried which is AdoptOpenJDK.

You can check all flags passed etc. in the benchmark script.

The Execution Environment

This is still running on the same Desktop PC that I did the first version of these benchmarks with – almost 5 years ago. In the meantime it was hit by a lot of those lovely intel security vulnerabilities though. It’s by no means a top machine any more.

The machine has 16 GB of RAM, runs Linux Mint 19.3 (based on Ubuntu 18.04 LTS) and most importantly an i7-4790 (3.6 GHz, 4 GHz boost) (which is more than 6 years old now).

tobi@speedy:~$ uname -a
Linux speedy 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
tobi@speedy:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Stepping: 3
CPU MHz: 3568.176
CPU max MHz: 4000,0000
CPU min MHz: 800,0000
BogoMIPS: 7200.47
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
view raw system_info hosted with ❤ by GitHub

All background applications were closed and while the benchmarks were running no GUI was active. They were run on hot Berlin evenings 😉

If you want to run these benchmarks yourself the rubykon repo has the instructions, with most of it being automated.

Timing wise I chose 5 minutes of warmup and 2 minutes of run time measurements. The (enormous) warmup time was mostly driven by behaviour observed in TruffleRuby where sometimes it would deoptimize even after a long warmup. So, I wanted to make sure everyone had all the time they needed to reach good “warm” performance.

Run Time Results

One more thing before we get to it: JRuby here ran on AdoptOpenJDK 8. Differences to AdoptOpenJDK 14 (and other JVMs) aren’t too big and would just clutter the graphs. We’ll take a brief look at them later.

If you want to take a look at all the data I gathered you can access the spreadsheet.

Iterations per Minute per Ruby implementation for running 1000 full playouts on a 19×19 board (higher is better).

Overall this looks more or less like the graphs from the last years:

  • CRuby is the baseline performance without any major jumps
  • JRuby with invokedynamic (+ID) gets a bit more than 2x the baseline performance of CRuby, invokedynamic itself makes it a lot faster (2x+)
  • TruffleRuby runs away with the win

What’s new though is the inclusion of the JIT option for CRuby which performs quite impressively and is only getting better. An 18% improvement on 2.6 goes up to 34% on 2.7 and tops out at 47% for 2.8 dev when looking at the JIT vs. non JIT run times of the same Ruby version. Looking at CRuby it’s also interesting that this time around “newer” CRuby performance is largely on par with not JITed JRuby performance.

The other thing that sticks out quite hugely are those big error bars on TruffleRuby 20. This is caused by some deoptimizations even after the long warmup. Portrayed here is a run where they weren’t as bad, even if they are worse performance was still top notch at 27 i/min overall though. It’s most likely a bug that these deoptimizations happen, you can check the corresponding issue. In the past the TruffleRuby always found a way to fix issues like this. So, the theoretical performance is a bit higher.

Another thing I like to look at is the relative speedup chart:

Speedup relative to CRuby 2.4.10 (baseline)

CRuby 2.4.10 was chosen as the “baseline” for this relative speedup chart mostly as a homage to Ruby 3×3 in which the goal was for Ruby 3 to be 3 times faster than Ruby 2.0. I can’t get Ruby < 2.4 to compile on my system easily any more and hence they are sadly missing here.

I’m pretty impressed with the JIT in Ruby 2.8: a speedup of over 60% is not to be scoffed at! So, as pointed out in the results above, I have ever rising hopes for it! JRuby (with invokedynamic) sits nice and comfortably at ~2.5x speedup which is a bit down from its 3x speedup in the older benchmarks. This might also be to the improved baseline of CRuby 2.4.10 versus the old CRuby 2.0 (check the old blog post for some numbers from then, not directly comparable though). TruffleRuby sits at the top thanks to the –jvm version with almost a 6x improvement. Perhaps more impressively it’s still 2.3 times faster than the fastest non TruffleRuby implementation. The difference between “native” and –jvm for TruffleRuby is also astounding and important to keep in mind should you do your own benchmarks.

What’s a bit baffling is that the performance trend for CRuby isn’t “always getting better” like I’m used to. The differences are rather small but looking at the small standard deviation (at most less than 1%) I’m rather sure of them. 2.5 is slower than 2.4, and 2.6 is faster than both 2.7 and 2.8.-dev. However, the “proper” order is established again when enabling the JIT.

If you’re rather interested in the data table you can still check out the spreadsheet for the full data, but here’s some of it inline:

Rubyi/minavg (s)stddev %relative speedup
2.4.105.6110.690.861
2.5.85.1611.630.270.919786096256684
2.6.66.619.080.421.17825311942959
2.6.6 –jit7.87.690.591.3903743315508
2.7.16.459.30.251.14973262032086
2.7.1 –jit8.646.950.291.54010695187166
2.8.0-dev6.289.560.321.11942959001783
2.8.0-dev –jit9.256.480.291.64884135472371
truffleruby-1.0.0-rc1616.553.632.192.95008912655971
truffleruby-20.1.020.222.9725.823.60427807486631
truffleruby-20.1.0 –jvm33.321.819.015.93939393939394
jruby-9.1.17.06.529.210.631.16221033868093
jruby-9.1.17.0 +ID14.274.20.292.54367201426025
jruby-9.2.11.16.339.490.541.1283422459893
jruby-9.2.11.1 +ID13.854.330.442.46880570409982

Warmup

Seems the JITing approaches are winning throughout, however such performance isn’t free. Conceptually, a JIT looks at what parts of your code are run often and then tries to further optimize (and often specialize) these parts of the code. This makes it a whole lot faster, this process takes time and work though.

The benchmarking numbers presented above completely ignore the startup and warmup time. The common argument for this is that in long lived applications (like most web applications) we spend the majority of time in the warmed up/hot state. It’s different when talking about scripts we run as a one off. I visualized and described the different times to measure way more in another post.

Anyhow, lets get a better feeling for those warmup times, shall we? One of my favourite methods for doing so is graphing the first couple of run times as recorded (those are all during the warmup phase):

Run times as recorded by iteration number for a few select Ruby implementations. Lower is faster/better.
Same data as above but as a line chart. Thanks to Stefan Marr for nudging me.

CRuby itself (without –jit) performs at a steady space, this is expected as no further optimizations are done and there’s also no cache or anything involved. Your first run is pretty much gonna be as fast as your last run. It’s impressive to see though that the –jit option is faster already in the first iteration and still getting better. What you can’t see in the graph, as it doesn’t contain enough run times and the difference is very small, is that the CRuby –jit option only reaches its peak performance around iteration 19 (going from ~6.7s to ~6.5s) which is quite surprising looking at how steady it seems before that.

TruffleRuby behaves in line with previous results. It has by far the longest warmup time, especially the JVM configuration which is in line with their presented pros and cons. The –jvm runtime configuration only becomes the fastest implementation by iteration 13! Then it’s faster by quite a bit though. It’s also noteworthy that for neither native nor JVM the time declines steadily. Sometimes subsequent iterations are slower which is likely due to the JIT trying hard to optimize something or having to deoptimize something. The random nature of Rubykon might play into this, as we might be hitting edge cases only at iteration 8 or so. While especially the first run time can be quite surprising, it’s noteworthy that during my years of doing these benchmarks I’ve seen TruffleRuby steadily improve its warmup time. As a datapoint, TruffleRuby 1.0.0-rc16 had its first 2 run times at 52 seconds and 25 seconds.

JRuby is very close to peak performance after one iteration already. Peak performance with invokedynamic is hit around iteration 7. It’s noteworthy that with invokedynamic even the first iteration is faster than CRuby “normal” and on par with the CRuby JIT implementation but in subsequent iterations gets much faster than them. The non invokedynamic version is very close to normal CRuby 2.8.0-dev performance almost the entire time, except for being slower in the first iteration.

For context it’s important to point out though that Rubykon is a relatively small application. Including the benchmarking library it’s not even 1200 lines of code long. It uses no external gems, it doesn’t even access the standard library. So all of the code is in these 1200 lines + the core Ruby classes (Array etc.) which is a far cry from a full blown Rails application. More code means more things to optimize and hence should lead to much longer warmup times than presented here.

JRuby/JVM musings

It might appear unfair that the results up there were run only with JDK 8. I can assure you, in my testing it sadly isn’t. I had hoped for some big performance jumps with the new JDK versions but I found no such thing. Indeed, it features the fastest version but only by a rather slim margin. It also requires switching up the GC algorithm as the new default performs worse at least for this benchmark.

Comparison JRuby with different options against AdoptOpenJDK 8 and 14

Performance is largely the same. JDK 14 is a bit faster when using both invokedynamic and falling back to the old garbage collector (+ParallelGC). Otherwise performance is worse. You can find out more in this issue. It’s curios though that JRuby 9.1 seems mostly faster than 9.2.

I got also quite excited at first looking at all the different new JVMs and thought I’d benchmark against them all, but it quickly became apparent that this was a typical case of “matrix explosion” and I really wanted for you all to also see these results unlike last year 😅 I gathered data for GraalVM and Java Standard Edition Reference Implementation in addition to AdoptOpenJDK but performance was largely the same and best at AdoptOpenJDK on my system for this benchmark. Again, these are in the spreadsheet.

I did one more try with OpenJ9 as it sounded promising. The results were so bad I didn’t even put them into the spreadsheet (~4 i/min without invokedynamic, ~1.5 i/min with invokedynamic). I can only imagine that either I’m missing a magic switch, OpenJ9 wasn’t built with a use case such as JRuby in mind or JRuby isn’t optimized to run on OpenJ9. Perhaps all of the above.

Final Thoughts

Alright, I hope this was interesting for y’all!

What did we learn? TruffleRuby still has the best “warm” performance by a mile, warmup is getting better but can still be tricky (–> unexpected slowdowns late into the process). The JIT for CRuby seems to get better continuously and has me a bit excited. CRuby performance has caught up to JRuby out of the box (without invokedynamic). JRuby with invokedynamic is still the second fastest Ruby implementation though.

It’s also interesting to see that every Ruby implementation has at least one switch (–jit, –jvm, invokedynamic) that significantly alters performance characteristics.

Please, also don’t forget the typical grain of salt: This is one benchmark, with one rather specific use case run on one machine. Results elsewhere might differ greatly.

What else is there? Promising to redo the benchmark next year would be something, but my experience tells me not to 😉

There’s an Enterprise version of GraalVM with supposedly good performance gains. Now, I won’t be spending money but you can evaluate it for free after registering. Well, if I ever manage to fix my Oracle login and get Oracle’s permission to publish the numbers I might (I’m fairly certain I can get that though 🙂 ). I also heard rumours of some CLI flags to try with TruffleRuby to get even better numbers 🤔

Finally, this benchmark has only looked at run times which is most often the most interesting value. However, there are other numbers that could prove interesting, such as memory consumption. These aren’t as easy to break down so neatly (or I don’t know how to). Showing the maximum amount of memory consumed during the measurement could be helpful though. As some people can tell you, with Ruby it can often be that you scale up your servers due to memory constraints not necessary CPU constraints.

I’d also be interested in how a new PC (planned purchase within a year!) affects these numbers.

So, there’s definitely some future work to be done here. Anything specific you want to see? Please let me know in the comments, via Twitter or however you like. Same goes for new graph types, mistakes I made or what not – I’m here to learn!

Slides: Stories in Open Source

Yesterday at RUG::B I tried something I’d never done before: a more personal, story driven talk. And in order to adequately illustrate it and how different Open Source feel to me I also opted paint some very special drawings.

Open Source is something I’ve been doing for longer than people pay me to do programming. I’m immensely passionate about it and it felt like it was some kind of blind spot that I never gave a talk about it so far.

If you know of a conference this talk would be a good fit for, please let me know.

Anyhow, here are the slides to enjoy: Speaker Deck, SlideShare or PDF

Abstract

What’s it like to work on Open Source projects? They’re all the same aren’t they? No, they’re not – the longer I worked on Open Source the more I realize how different the experience is for each one of them. Walk with me through some stories that happened to me in Open Source and let’s see what we can take away.

Video & Slides: Do You Need That Validation? Let Me Call You Back About It

I had a wonderful time at Ruby On Ice! I gave a talk, that I loved to prepare to formulate the ideas the right way. You’ll see it focuses a lot on the problems, that’s intentional because if we’re not clear on the problems what good is a solution?

You can find the video along with awesome sketch notes on the Ruby on Ice homepage.

Anyhow, here are the slides: speakerdeck slideshare PDF

(in case you wonder why the first slide is a beer, the talk was given on Sunday Morning as the first talk after the party – welcoming people back was essential as I was a bit afraid not many would show up but they did!)

Abstract

Rails apps start nice and cute. Fast forward a year and business logic and view logic are entangled in our validations and callbacks – getting in our way at every turn. Wasn’t this supposed to be easy?

Let’s explore different approaches to improve the situation and untangle the web.

Slides: Where do Rubyists go?

I gave my first ever keynote yesterday at Ruby on Ice, which was a lot of fun. A lot of the talk is based on my “Where do Rubyists go?”-survey but also researching and looking into languages. The talk looks into what programming languages Ruby developers learn for work or in their free time, what the major features of those languages are and how that compares to Ruby. What does it tell us about Ruby and our community?

Slides can be viewed here or on speakerdeck, slideshare or PDF

Abstract

Many Rubyists branch out and take a look at other languages. What are similarities between those languages and ruby? What are differences? How does Ruby influence these languages?

Careful what you measure: 2.1 times slower to 4.2 times faster – MJIT versus TruffleRuby

Have you seen the MJIT benchmark results? Amazing, aren’t they? MJIT basically blows the other implementations out of the water! What were they doing all these years? That’s it, we’re done here right?

Well, not so fast as you can infer from the title. But before we can get to what I take issue with in these particular benchmarks (you can of course jump ahead to the nice diagrams) we gotta get some introductions and some important benchmarking basics out of the way.

MJIT? Truffle Ruby? WTF is this?

MJIT currently is a branch of ruby on github by Vladimir Makarov, GCC developer, that implements a JIT (Just In Time Compilation) on the most commonly used Ruby interpreter/CRuby. It’s by no means final, in fact it’s in a very early stage. Very promising benchmarking results were published on the 15th of June 2017, which are in major parts the subject of this blog post.

TruffleRuby is an implementation of Ruby on the GraalVM by Oracle Labs. It poses impressive performance numbers as you can see in my latest great “Ruby plays Go Rumble”. It also implements a JIT, is known to take a bit of a warmup but comes out being ~8 times faster than Ruby 2.0 in the previously mentioned benchmark.

Before we go further…

I have enormous respect for Vladimir and think that MJIT is an incredibly valuable project. Realistically it might be one of our few shots to get a JIT into mainstream ruby. JRuby had a JIT and great performance for years, but never got picked up by the masses (topic for another day).

I’m gonna critique the way the benchmarks were done, but there might be reasons for that, that I’m missing (gonna point out the ones I know). After all, Vladimir has been programming for way longer than I’m even alive and also knows more about language implementations than I do obviously.

Plus, to repeat, this is not about the person or the project, just the way we do benchmarks. Vladimir, in case you are reading this 💚💚💚💚💚💚

What are we measuring?

When you see a benchmark in the wild, first you gotta ask “What was measured?” – the what here comes in to flavors: code and time.

What code are we benchmarking?

It is important to know what code is actually being benchmarked, to see if that code is actually relevant to us or a good representation of a real life Ruby program. This is especially important if we want to use benchmarks as an indication of the performance of a particular ruby implementation.

When you look at the list of benchmarks provided in the README (and scroll up to the list what they mean or look at them) you can see that basically the top half are extremely micro benchmarks:

Selection_041.png

What’s benchmarked here are writes to instance variables, reading constants, empty method calls, while loops and the like. This is extremely micro, maybe interesting from a language implementors point of view but not very indicative of real world ruby performance. The day looking up a constant will be the performance bottle neck in Ruby will be a happy day. Also, how much of your code uses while loops?

A lot of the code (omitting the super micro ones) there isn’t exactly what I’d call typical ruby code. A lot of it is more a mixture of a script and C-code. Lots of them don’t define classes, use a lot of while and for loops instead of the more typical Enumerable methods and sometimes there’s even bitmasks.

Some of those constructs might have originated in optimizations, as they are apparently used in the general language benchmarks. That’s dangerous as well though, mostly they are optimized for one specific platform, in this case CRuby. What’s the fastest Ruby code on one platform can be way slower on the other platforms as it’s an implementation detail (for instance TruffleRuby uses a different String implementation). This puts the other implementations at an inherent disadvantage.

The problem here goes a bit deeper, whatever is in a popular benchmark will inevitably be what implementations optimize for and that should be as close to reality as possible. Hence, I’m excited what benchmarks the Ruby 3×3 project comes up with so that we have some new more relevant benchmarks.

What time are we measuring?

This is truly my favorite part of this blog post and arguably most important. For all that I know the time measurements in the original benchmarks were done like this: /usr/bin/time -v ruby $script which is one of my favorite benchmarking mistakes for programming languages commonly used for web applications. You can watch me go on about it for a bit here.

What’s the problem? Well, let’s analyze the times that make up the total time you measure when you just time the execution of a script: Startup, Warmup and Runtime.

Selection_043.png

  • Startup – the time until we get to do anything “useful” aka the Ruby Interpreter has started up and has parsed all the code. For reference, executing an empty ruby file with standard ruby takes 0.02 seconds for me, MJIT 0.17 seconds and for TruffleRuby it takes 2.5 seconds (there are plans to significantly reduce it though with the help of Substrate VM). This time is inherently present in every measured benchmark if you just time script execution.
  • Warmup – the time it takes until the program can operate at full speed. This is especially important for implementations with a JIT. On a high level what happens is they see which code gets called a lot and they try to optimize this code further. This process takes a lot of time and only after it is completed can we truly speak of “peak performance”. Warmup can be significantly slower than runtime. We’ll analyze the warmup times more further down.
  • Runtime – what I’d call “peak performance” – run times have stabilized. Most/all code has already been optimized by the runtime. This is the performance level that the code will run at for now and the future. Ideally, we want to measure this as 99.99%+ of the time our code will run in a warmed up already started state.

Interestingly, the startup/warmup times are acknowledged in the original benchmark but the way that they are dealt with simply lessens their effect but is far from getting rid of them: “MJIT has a very fast startup which is not true for JRuby and Graal Ruby. To give a better chance to JRuby and Graal Ruby the benchmarks were modified in a way that Ruby MRI v2.0 runs about 20s-70s on each benchmark”.

I argue that in the greater scheme of things, startup and warmup don’t really matter when we are talking about benchmarks when our purpose is to see how they perform in a long lived process.

Why is that, though? Web applications for instance are usually long lived, we start our web server once and then it runs for hours, days, weeks. We only pay the cost of startup and warmup once in the beginning, but run it for a much longer time until we shut the server down again. Normally servers should spend 99.99%+ of their time in the warmed up runtime “state”. This is a fact, that our benchmarks should reflect as we should look for what gives us the best performance for our hours/days/weeks of run time, not for the first seconds or minutes of starting up.

A little analogy here is a car. You wanna go 300 kilometers as fast as possible (straight line). Measuring as shown above is the equivalent of measuring maybe the first ~500 meters. Getting in the car, accelerating to top speed and maybe a bit of time on top speed. Is the car that’s fastest on the first 500 meters truly the best for going 300 kilometers at top speed? Probably not. (Note: I know little about cars)

What does this mean for our benchmark? Ideally we should eliminate startup and warmup time. We can do this by using a benchmarking library written in ruby that also runs the benchmark for a couple of times before actually taking measurements (warmup time). We’ll use my own little library as it means no gem required and it’s well equipped for the rather long run times.

But does startup and warmup truly never matter? It does matter. Most prominently it matters during development time – starting the server, reloading code, running tests. For all of those you gotta “pay” startup and warmup time. Also, if you develop a UI application  or a CLI tool for end users startup and warmup might be a bigger problem, as startup happens way more often. You can’t just warm it up before you take it into the load balancer. Also, running tasks periodically as a cronjob on your server will have to pay theses costs.

So are there benefits to measuring with startup and warmup included? Yes, for one for the use cases mentioned above it is important. Secondly, measuring with time -v gives you a lot more data:


tobi@speedy $ /usr/bin/time -v ~/dev/graalvm-0.25/bin/ruby pent.rb
Command being timed: "/home/tobi/dev/graalvm-0.25/bin/ruby pent.rb"
User time (seconds): 83.07
System time (seconds): 0.99
Percent of CPU this job got: 555%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.12
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1311768
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 57
Minor (reclaiming a frame) page faults: 72682
Voluntary context switches: 16718
Involuntary context switches: 13697
Swaps: 0
File system inputs: 25520
File system outputs: 312
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

You get lots of data, among which there’s memory usage, CPU usage, wall clock time and others which are also important for evaluating language implementations which is why they are also included in the original benchmarks.

Setup

Before we (finally!) get to the benchmarks, the obligatory “This is the system I’m running this on”:

The ruby versions in use are MJIT as of this commit from 25th of August compiled with no special settings, graalvm 25 and 27 (more on that in a bit) as well as CRuby 2.0.0-p648 as a baseline.

All of this is run on my Desktop PC running Linux Mint 18.2 (based on Ubuntu 16.04 LTS) with 16 GB of memory and an i7-4790 (3.6 GHz, 4 GHz boost).


tobi@speedy ~ $ uname -a
Linux speedy 4.10.0-33-generic #37~16.04.1-Ubuntu SMP Fri Aug 11 14:07:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I feel it’s especially important to mention the setup in here, as when I first did these benchmarks for Polyconf on my dual core notebook TruffleRuby had significantly worse results. I think graalvm benefits from the 2 extra cores for warmup etc, as the CPU usage across cores is also quite high.

You can check out the benchmarking script used etc. as part of this repo.

But… you promised benchmarks, where are they?

Sorry, I think the theory is more important than the benchmarks themselves, although they undoubtedly help illustrate the point. We’ll first get into why I chose the pent.rb benchmark as a subject and why I run it with a slightly old versions of graalvm (no worries, current version coming in later on). Then, finally, graphs and numbers.

Why this benchmark?

First of all, the original benchmarks were done with graalvm-0.22. Attempting to reproduce the results with the (at the time current) graalvm-0.25 proved difficult as a lot of them had already been optimized (and 0.22 contained some genuine performance bugs).

One that I could still reproduce the performance problems with was pent.rb and it also seemed like a great candidate to show that something is flawed. In the original benchmarks it is noted down as 0.33 times the performance of Ruby 2.0 (or well, 3 times slower). All my experience with TruffleRuby told me that this is most likely wrong. So, I didn’t choose it because it was the fastest one on TruffleRuby, but rather the opposite – it was the slowest one.

Moreover, while a lot of it isn’t exactly idiomatic ruby code to my mind (no classes, lots of global variables) it uses quite a lot Enumerable methods such as each, collect, sort and uniq while refraining from bitmaskes and the like. So I also felt that it’d make a comparatively good candidate from here.

The way the benchmark is run is basically the original benchmark put into a loop so it is repeated a bunch of times so we can measure the times during warmup and later runtime to get an average of the runtime performance.

So, why am I running it on the old graalvm-0.25 version? Well, whatever is in a benchmark is gonna get optimized making the difference here less apparent.

We’ll run the new improved version later.

MJIT vs. graalvm-0.25

So on my machine the initial execution of the pent.rb benchmark (timing startup, warmup and runtime) on TruffleRuby 0.25 took 15.05 seconds while it just took 7.26 seconds with MJIT. Which has MJIT being 2.1 times faster. Impressive!

What’s when we account for startup and warmup though? If we benchmark just in ruby startup time already goes away, as we can only start measuring inside ruby once the interpreter has started. Now for warmup, we run the code to benchmark in a loop for 60 seconds of warmup time and 60 seconds for measuring the actual runtime. I plotted the execution times of the first 15 iterations below (that’s about when TruffleRuby stabilizes):

2_warmup.png
Execution time of TruffleRuby and MJIT progressing over time – iteration by iteration.

As you can clearly see, TruffleRuby starts out a lot slower but picks up speed quickly while MJIT stay more or less consistent. What’s interesting to see is that iteration 6 and 7 of TrufleRuby are slower again. Either it found a new optimization that took significant time to complete or a deoptimization had to happen as the constraints of a previous optimization were no longer valid. TruffleRuby stabilizes from there and reaches peak performance.

Running the benchmarks we get an average (warm) time for TruffleRuby of 1.75 seconds and for MJIT we get 7.33 seconds. Which means that with this way of measuring, TruffleRuby is suddenly 4.2 times faster than MJIT.

We went from 2.1 times slower to 4.2 times faster and we only changed the measuring method.

I like to present benchmarking numbers in iterations per second/minute (ips/ipm) as here “higher is better” so graphs are far more intuitive, our execution times converted are 34.25 iterations per minute for TruffleRuby and 8.18 iterations per minute for MJIT. So now have a look at our numbers converted to iterations per minute compared for the initial measuring method and our new measuring method:

2_comparison_before_after.png
Results of timing the whole script execution (initial time) versus the average execution time warmed up.

You can see the stark contrast for TruffleRuby caused by the hefty warmup/long execution time during the first couple of iterations. MJIT on the other hand, is very stable. The difference is well within the margin of error.

Ruby 2.0 vs MJIT vs. graalvm-0.25 vs. graalvm-0.27

Well, I promised you more data and here is more data! This data set also includes CRuby 2.0 as the base line as well as the new graalvm.

initial time (seconds) ipm of initial time average (seconds) ipm of average after warmup Standard Deviation as part of total
CRuby 2.0 12.3 4.87 12.34 4.86 0.43%
TruffleRuby 0.25 15.05 3.98 1.75 34.25 0.21%
TruffleRuby 0.27 8.81 6.81 1.22 49.36 0.44%
MJIT 7.26 8.26 7.33 8.18 2.39%

4_warmup.png
Execution times by iteration in second. CRuby stops appearing because that were already all the iterations I had.

We can see that TruffleRuby 0.27 is already faster than MJIT in the first iteration, which is quite impressive. It’s also lacking the weird “getting slower” around the 6th iteration and as such reaches peak performance much faster than TruffleRuby 0.25. It also gets faster overall as we can see if we compare the “warm” performance of all 4 competitors:

4_comparison.png
Iterations per Minute after warmup as an average of our 4 competitors.

So not only did the warmup get much faster in TruffleRuby 0.27 the overall performance also increased quite a bit. It is now more than 6 times faster than MJIT. Of course, some of it is probably the TruffleRuby team tuning it to the existing benchmark, which reiterates my point that we do need better benchmarks.

As a last fancy graph for you I have the comparison of measuring the runtime through time versus giving it warmup time, then benchmarking multiple iterations:

4_comparison_before_after.png
Difference between measuring whole script execution versus letting implementations warmup.

CRuby 2 is quite consistent as expected, TruffleRuby already manages a respectable out of the box performance but gets even faster. I hope this helps you see how the method of measuring can achieve drastically different results.

Conclusion

So, what can we take away? Startup time and warmup are a thing and you should think hard about whether those times are important for you and if you want to measure them. For web applications, most of the time startup and warmup aren’t that important as 99.99%+ you’ll run with a warm “runtime” performance.

Not only what time we measure is important, but also what code we measure. Benchmarks should be as realistic as possible so that they are as significant as possible. What a benchmark on the Internet check most likely isn’t directly related to what your application does.

ALWAYS RUN YOUR OWN BENCHMARKS AND QUESTION BOTH WHAT CODE IS BENCHMARKED, HOW IT IS BENCHMARKED AND WHAT TIMES ARE TAKEN

(I had this in my initial draft, but I ended up quite liking it so I kept it around)

edit1: Added CLI tool specifically to where startup & warmup counts as well as a reference to Substrate VM for how TruffleRuby tries to combat it 🙂

edit2: Just scroll down a little to read an interesting comment by Vladimir

Choosing Elixir for the Code, not the Performance

People like to argue about programming languages: “This one is better!” “No this one!”. In these discussion, often the performance card is pulled. This language is that much faster in these benchmarks or this company just needs that many servers now. Performance shouldn’t matter that much in my opinion, and Nate Berkopec makes a good point about that in his blog post “Is Ruby too slow for web scale?” (TLDR; we can add more servers and developer time often costs more than servers):

The better conversation, the more meaningful and impactful one, is which framework helps me write software faster, with more quality, and with more happiness.

I agree with lots of the points Nate makes and I like him and the post, but still it rubbed me the wrong way a bit. While it also states the above, it makes it seem like people just switch languages for the performance gains. And that brought up a topic that has been bugging me for a while: If you’re switching your main language purely for performance, there’s a high chance you’re doing it wrong. Honestly, if all we cared about was performance we’d all still be writing Assembly, or C/C++ at least.

It’s also true, that performance is often hyped a lot around new languages and specifically Elixir can also be guilty of that. And sure, performance is great and we read some amazing stories about that. Two of the most prominent adoption stories that I can recall are usually cited and referred to for their great performance numbers. There is Pinterest “our API responses are in microseconds now” and there is Bleacher Report “we went from 150 servers to 5”. If you re-read the articles though, other benefits of elixir are mentioned as well ore are even discussed more than  performance or even more.

The Pinterest article focuses first on Elixir as a good language, performance is arguably secondary in the post. Before we ever talk about microseconds there is lots of talk such as:

The language makes heavy use of pattern matching, a technique that prevents *value* errors which are much more common than *type* errors. It also has an innovative pipelining operator, which allows data to flow from one function to the next in a clear and easy to read fashion.

Then the famous microseconds drops in one paragraph, and then it immediately turns around and talks about code clarity again:

We’ve also seen an improvement in code clarity. We’re converting our notifications system from Java to Elixir. The Java version used an Actor system and weighed in at around 10,000 lines of code. The new Elixir system has shrunk this to around 1000 lines.

The Bleacher Report article has performance in its headline and at its heart, but it also mentions different benefits of Elixir:

The new language has led to cleaner code base and much less technical debt, according to Marx. It has also increased the speed of development(…)

So why do people rave about performance so much? Performance numbers are “objective”, they are “rationale”, they are impressive and people like those. It’s an easy argument to make that bleacher report uses just 5 servers instead of 150 in the old ruby stack. That’s a fact. It’s easy to remember and easy to put into a headline. Discussing the advantages of immutable data structures, pattern matching and “let it crash” philosophy is much more subjective, personal and nuanced.

Before we jump in, this blog post is general but some specific points might resonate the best with a ruby crowd as that is my main programming language/where I’m coming from. So, from other languages some of the points I’ll make will be like “meh I already got this” while I might miss out obvious cool things both Ruby and Elixir have.

Hence, after this lengthy introduction let’s focus on something different – what makes Elixir a language worth learning – how can it make day to day coding more productive in spite of performance? 

Let’s get some performance stuff out of the way first…

(The irony of starting the first section of a blog post decisively not about performance by discussing performance is not lost on me)

First, I wanna touch the topic of performance again really quickly – does it really not matter? Can we seamlessly scale horizontally? Does performance not impact productivity?

Well it certainly does, as remarked by Devon:

In a more general sense, if my runtime is already fast enough I don’t need to bother with more complex algorithms and extra concepts. I can just leave it as is. No extra engineering spent on “making it faster” – just on to the next. That’s a good thing. Especially caching can be just wonderful to debug.

What about performance of big tasks? Like Data processing, or in the case of the company I’m working for solving a vehicle routing problem1? You can’t just scale those up by throwing servers at it. You might be able to parallelize it, but that’s not too easy in Ruby and in general is often a bigger engineering effort. And some languages make that easier as well, like Elixir’s flow.

Vertical Scaling has its limits there. It works fine for serving more web requests, working on more background jobs but it gets more complicated when you have a big problem to solve that aren’t easily parallelizable especially if they need to be done wihin a given time frame.

Also, I might not be one of the cool docker + kubernetes kids, but if you tell me that there’s no overhead to managing 125 servers versus 5 servers, I tend to not believe it. If simply because the chance of anyone of your servers failing at any time is much bigger just cause you got more of them.

Well then, finally enough performance chatter in a post not about performance. Let’s look at the code and how it can make your life easier! I swear I try to keep these sections short and sweet, although admittedly that’s not exactly my strength (who would have guessed by now? 😉 )

Pattern Matching


defmodule Patterns do
def greet(%{name: name, age: age}) do
IO.puts "Hi there #{name}, what's up at #{age}?"
end
def greet(%{name: "José Valim"}) do
IO.puts "Hi José, thanks for elixir! <3"
end
def greet(%{name: name}) do
IO.puts "Hi there #{name}"
end
def greet(_) do
IO.puts "Hi"
end
end
Patterns.greet %{name: "Tobi", age: 27} # Hi there Tobi, what's up at 27?
Patterns.greet %{name: "José Valim"} # Hi José, thanks for elixir! ❤
Patterns.greet %{name: "dear Reader"} # Hi there dear Reader
Patterns.greet ["Mop"] # Hi

view raw

matching.exs

hosted with ❤ by GitHub

Pattern Matching is my single favorite feature. If I could pick a single feature to be adopted in other programming languages it would be pattern matching. I find myself writing pattern matching code in Ruby, then sighing… “Ugh right I can’t do this”. It changed the way I think.

Enough “this is soo great”. With pattern matching you basically make assertions on the structure and can get values directly out of a deeply nested map and put their value into a variable. It runs deeper than that though. You also have method overloading and elixir will try to match the functions from top to bottom which means you can have different function definitions based on the structure of your input data.

You can’t just use it on maps though. You can use it on lists as well, so you can have a separate function clause for an empty or one element list which is really great for recursion and catching edge cases.

One of the most fascinating uses I’ve seen was for parsing files as you can also use it for strings and so can separate the data and different headers of mp3 files all in just a couple of lines of elixir:


mp3_byte_size = (byte_size(binary) – 128)
<< _ :: binary-size(mp3_byte_size), id3_tag :: binary >> = binary
<< "TAG",
title :: binary-size(30),
artist :: binary-size(30),
album :: binary-size(30),
year :: binary-size(4),
comment :: binary-size(30),
_rest :: binary >> = id3_tag

view raw

mp3.ex

hosted with ❤ by GitHub

Immutable Data Structures and Pure Functions

If you’re unfamiliar with immutable data structures you might wonder how the hell one ever gets anything done? Well, you have to reassign values to the return values of functions if you wanna have any sort of change. You get pure functions, which means no side effects. The only thing that “happens” is the return value of the function. But, how does that help anyone?

Well, it means you have all your dependencies and their effect right there – there is no state to hold on which execution could depend. Everything that the function depends on is a parameter. That makes for superior understandability, debugging experience and testing.

Already months into my Elixir journey I noticed that I was seemingly much better at debugging library code than I was in Ruby. The reason, I believe, is the above. When I debug something in Ruby what a method does often depends on one or more instance variables. So, if yo wanna understand why that method “misbehaves” you gotta figure out which code sets those instance variables, which might depend on other instance variables being set and so on… Similarly a method might have the side effect of changing some instance variable. What is the effect in the end? You might never know.

With pure functions I can see all the dependencies of a function at a glance, I can see what they return and how that new return value is used in further function calls. It reads more like a straight up book and less like an interconnected net where I might not know where to start or stop looking.

The Pipeline Operator


config
|> Benchee.init
|> Benchee.system
|> Benchee.benchmark("job", fn -> magic end)
|> Benchee.measure
|> Benchee.statistics
|> Benchee.Formatters.Console.output
|> Benchee.Formatters.HTML.output

view raw

pipe.exs

hosted with ❤ by GitHub

How does a simple operator make it into this list?

Well, it’s about the code that it leads you to. The pipeline operator passes the value of the previous expression into the next function as the first argument. That gives you a couple of guidelines. First, when determining the order of arguments thinking about which one is the main data structure and putting that one first gives you a new guideline. Secondly, it leads you to a design with a main data structure per function, which can turn out really nice.

The above is an actual interface to my benchmarking library benchee. One of the design goals was to be “pipable” in elixir. This lead me to the design with a main Suite data structure in which all the important information is stored. As a result, implementing formatters is super easy as they are just a function that takes the suite and they can pick the information to take into account. Moreover, each and every one of those steps is interchangeable and well suited for plugins. As long you provide the needed data for later processing steps there is nothing stopping you from just replacing a function in that pipe with your own.

Lastly, the pipeline operator represents very well how I once learned to think about Functional Programming, it’s a transformation of inputs. The pipeline operator perfectly mirrors this, we start with some data structure and through a series of transformations we get some other data structure. We start with a configuration and end up with a complete benchmarking suite. We start with a URL and some parameters which we transform into some HTML to send to the user.

Railway Oriented Programming


with {:ok, record} <- validate_data(params),
{:ok, record} <- validate_in_other_system(record),
{:ok, record} <- Repo.insert(record) do
{:ok, record}
else
{:error, changeset} -> {:error, changeset}
end

view raw

railway.ex

hosted with ❤ by GitHub

I’d love to ramble on about Railway Oriented Programming, but there’s already good blog posts about that out there. Basically, instead of always checking if an error had already occurred earlier we can just branch out to the error track at any point.

It doesn’t seem all that magical until you use it for the first time. I remember suggesting using it to a colleague on a pull request (without ever using it before) and my colleague came back like “It’s amazing”!

It’s been a pattern in the application ever since. It goes a bit like this:

  1. Check the basic validity of data that we have (all fields present/sensible data)
  2. Validate that data with another system (business logic rules in some external service)
  3. Insert record into database

Anyone of those steps could fail, and if it fails executing the other steps makes no sense. So, as soon as a function doesn’t return {:ok, something} we error out to the error track and otherwise we stay on the happy track.

Explicit Code

The Python folks were right all along.

Implicit code feels like magic. It just works without writing any code. My controller instance variables are just present in the view? The name of my view is automatically inferred I don’t have to write anything? Such magic, many wow.

Phoenix, the most popular elixir web framework, takes another approach. You have to specify a template (which is like a Rails view) by name and explicitly pass parameters to it:


def new(conn, _params) do
changeset = User.new_changeset(%User{})
render conn, "new.html", changeset: changeset
end

No magic. What happens is right there, you can see it. No accidentally making something accessible to views (or partials) anymore! You know what else is there? The connection and the parameters so we can make use of them, and pattern match on them.

Another place where the elixir eco system is more explicit is when loading relations of a record:


iex(13)> user = Repo.get_by(User, name: "Homer")
iex(14)> user.videos
#Ecto.Association.NotLoaded<association :videos is not loaded>
iex(17)> user = Repo.preload(user, :videos)
iex(18)> user.videos
# all the videos

This is ecto, the “database access layer” in the elixir world. As you see, we have to explicitly preload associations we want to use. Seems awful bothersome, doesn’t it?

I love it!

No more N+1 queries, as Rails loads things magically for me. Also, I get more control. I know about the db queries my application fires against the database. Just the other day I fixed a severe performance degradation as our app was loading countless records from the database and instantiated them for what should have been a simple count query. Long story short, it was a presenter object so .association loaded all the objects, put them in presenters and then let my .size be executed on that. I would have never explicitly preloaded that data and hence found out much earlier that something is wrong with this.

Speaking of explicitness and ecto…

Ecto Changesets


def new_changeset(model, params \\ %{}) do
model
|> cast(params, ~w(name username))
|> validate_required(~w(name username))
|> unique_constraint(:username)
|> validate_length(:username, min: 1, max: 20)
end
def registration_changeset(model, params) do
model
|> new_changeset(params)
|> cast(params, ~w(password))
|> validate_required(~w(password))
|> validate_length(:password, min: 6, max: 100)
|> put_pass_hash()
end

view raw

changeset.ex

hosted with ❤ by GitHub

Callbacks and validations are my nemesis.

The problem with them is the topic for another topic entirely but in short, validations and callbacks are executed all the time (before save, validation, create, whatever) but lots of them are just added for one feature that is maybe used in 2 places. Think about the user password. Validating it and hashing/salting it is only ever relevant when a user registers or changes the password. But that code sits there and is executed in a not exactly trivial to determine order. Sometimes that gets in the way of new features or tests so you start to throw a bunch of ifs at it.

Ecto Changesets were one of the strangest things for me to get used to coming to Elixir. Basically they just encapsulate a change operation, saying which parameters can take part, what should be validated and other code to execute. You can also easily combine changesets, in the code above the registration_changeset uses the new_changeset and adds just password functionality on top.

I only deal with password stuff when I explicitly want to. I know which parameters were allowed to go in/change so I just need to validate those. I know exactly when what step happens so it’s easy to debug and understand.

Beautiful.

Optional Type-Checking


@type job_name :: String.t | atom
@spec benchmark(Suite.t, job_name, fun, module) :: Suite.t
def benchmark(suite = %Suite{scenarios: scenarios}, job_name, function, printer \\ Printer) do
#…
end

view raw

type_specs.ex

hosted with ❤ by GitHub

Want to try typing but not all the time? Elixir has got something for you! It’s cool, but not enough space here to explain it, dialyxir makes the dialyzer tool quite usable and it also goes beyond “just” type checking and includes other static analysis features as well. Still, in case you don’t like types it’s optional.

Parallelism

“Wait, you swore this was it about performance! This is outrageous!”

Relax. While Parallelism can be used for better performance it’s not the only thing. What’s great about parallelism in elixir/the Erlang VM in general is how low cost and seamless it is. Spawning a new process (not like an Operating System process, they are more like actors) is super easy and has a very low overhead, unlike starting a new thread. You can have millions of them on one machine, no problem.

Moreover thanks to our immutability guarantees and every process being isolated you don’t have to worry about processes messing with each other. So, first of all if I want to geocode the pick up and drop off address in parallel I can just do that with a bit of Task.async and Task.await. I’d never just trust whatever ruby gems I use for geocoding to be threadsafe due to global et. al.

How does this help? Well, I have something that is easily parallelizable and I can just do that. Benchee generates statistics for different scenarios in parallel just because I can easily do so. That’s nice, because for lots of samples it might actually take a couple of seconds per scenario.

Another point is that there’s less need for background workers. Let’s take web sockets as an example. To the best of my knowledge it is recommended to load off all bigger tasks in the communication to a background workers in an evented architecture as we’d block our thread/operating system process from handling other events. In Phoenix every connection already runs in its own elixir process which means they are already executed in parallel and doing some more work in one won’t block the others.

This ultimately makes applications easier as you don’t have to deal with background workers, off loading work etc.

OTP

Defining what OTP really is, is hard. It’s somewhat a set of tools for concurrent programming and it includes everything from an in memory database to the Dialyzer tool mentioned earlier. It is probably most notorious for its “behaviours” like supervisors that help you build concurrent and distributed systems in the famous “Let it crash! (and restart it in a known good state maybe)” philosophy.

There’s big books written about this so I’m not gonna try to explain it. Just so much, years of experience about highly available systems are in here. There is so much to learn here and change how you see programming. It might be daunting, but don’t worry. People built some nice abstractions that are better to use and often it’s the job of a library or a framework to set these up. Phoenix and ecto do this for you (web requests/database connections respectively). I’ll out myself right now: I’ve never written a Supervisor or a GenServer for production purposes. I used abstractions or relied on what my framework brought with it.

If this has gotten you interested I warmly recommend “The Little Elixir & OTP Guidebook”. It walks you through building a complete worker pool application from simple to a more complex fully featured version.

Doctests


@doc """
## Examples
iex> {:ok, pid} = Agent.start_link(fn -> 42 end)
iex> Agent.get_and_update(pid, fn(state) -> {state, state + 1} end)
42
iex> Agent.get(pid, fn(state) -> state end)
43
"""

view raw

doctest.ex

hosted with ❤ by GitHub

Imo the most underrated feature of elixir. Doc tests allow you to write iex example sessions in the documentation of a method. These will be executed during test runs and check if they still return the same values/still pass. They are also part of the awesome generated documentation. No more out of date/slightly wrong code samples in the documentation!

I have entire modules that only rely on their doctests, which is pretty awesome if you ask me. Also, contributing doc tests to libraries is a pretty great way to provide both documentation and tests. E.g. once upon a time I wanted to learn about the Agent module, but it didn’t click right away, so I made a PR to elixir with some nice doctests to help future generations.

A good language to learn

In the end, elixir is a good language to learn. It contains many great concepts that can make your coding lives easier and more enjoyable and all of that lives in a nice and accessible syntax. Even if you can’t use it at work straight away, learning these concepts will influence and improve your code. I experienced the same when I learned and read a couple of books about Clojure, I never wrote it professionally but it improved my Ruby code.

You might ask “Should we all go and write Elixir now?”. No. There are many great languages and there are no silver bullets. The eco system is still growing for instance. I’m also not a fan of rewriting all applications. Start small. See if you like it and if it works for you.

Lastly, if this has peaked your interest I have a whole talk up that focuses on the great explicit features of elixir and explains them in more detail: “Elixir & Phoenix – fast, concurrent and explicit”

edit1: Clarified that the bleacher report blog post is mostly about performance with little else

edit2: Fixed that you gotta specify the template by name, not the view

[1] It’s sort of like Traveling Salesman, put together with Knapsack and then you also need to decide what goes where. In short: a very hard problem to solve.

Slides: Stop Guessing and Start Measuring (Poly-Version)

Hello from the amazing Polyconf! I just gave my Stop Guessing and Start Measuring talk and if you are thinking “why do you post the slides of this SO MANY TIMES”, well the first one was an Elixir version, then a Ruby + Elixir version and now we are at a Poly version. The slides are mostly different and I’d say about ~50% of them are new. New topics covered include:

  • MJIT – what’s wrong with the benchmarks – versus TruffleRuby
  • JavaScript!
  • other nice adjustments

The all important video isn’t in the PDF export but you can see a big part of it on Instagram.

You can view the slides here or on speakerdeck, slideshare or PDF.

Abstract

“What’s the fastest way of doing this?” – you might ask yourself during development. Sure, you can guess, your intuition might be correct – but how do you know? Benchmarking is here to give you the answers, but there are many pitfalls in setting up a good benchmark and analyzing the results. This talk will guide you through, introduce best practices, and surprise you with some unexpected benchmarking results. You didn’t think that the order of arguments could influence its performance…or did you?