Careful what you measure: 2.1 times slower to 4.2 times faster – MJIT versus TruffleRuby

Have you seen the MJIT benchmark results? Amazing, aren’t they? MJIT basically blows the other implementations out of the water! What were they doing all these years? That’s it, we’re done here right?

Well, not so fast, as you can infer from the title. But before we can get to what I take issue with in these particular benchmarks (you can of course jump ahead to the nice diagrams), we gotta get some introductions and some important benchmarking basics out of the way.

MJIT? TruffleRuby? WTF is this?

MJIT is currently a branch of Ruby on GitHub by Vladimir Makarov, a GCC developer, that implements a JIT (Just-In-Time compiler) for the most commonly used Ruby interpreter, CRuby. It's by no means final; in fact, it's at a very early stage. Very promising benchmarking results were published on the 15th of June 2017, and they are in large part the subject of this blog post.

TruffleRuby is an implementation of Ruby on the GraalVM by Oracle Labs. It posts impressive performance numbers, as you can see in my latest great "Ruby plays Go Rumble". It also implements a JIT, is known to take a bit of a warmup, but comes out ~8 times faster than Ruby 2.0 in the previously mentioned benchmark.

Before we go further…

I have enormous respect for Vladimir and think that MJIT is an incredibly valuable project. Realistically it might be one of our few shots to get a JIT into mainstream ruby. JRuby had a JIT and great performance for years, but never got picked up by the masses (topic for another day).

I'm gonna critique the way the benchmarks were done, but there might be reasons for it that I'm missing (I'll point out the ones I know of). After all, Vladimir has been programming for longer than I've even been alive and obviously knows more about language implementations than I do.

Plus, to repeat, this is not about the person or the project, just the way we do benchmarks. Vladimir, in case you are reading this 💚💚💚💚💚💚

What are we measuring?

When you see a benchmark in the wild, first you gotta ask "What was measured?" – the what here comes in two flavors: code and time.

What code are we benchmarking?

It is important to know what code is actually being benchmarked, to see if that code is actually relevant to us or a good representation of a real life Ruby program. This is especially important if we want to use benchmarks as an indication of the performance of a particular ruby implementation.

When you look at the list of benchmarks provided in the README (and scroll up to the list of what they mean, or look at them) you can see that basically the top half are extremely micro benchmarks:

[Screenshot: the list of benchmarks from the README]

What's benchmarked here are writes to instance variables, reading constants, empty method calls, while loops and the like. This is extremely micro – maybe interesting from a language implementor's point of view, but not very indicative of real-world Ruby performance. The day constant lookup becomes the performance bottleneck in Ruby will be a happy day. Also, how much of your code uses while loops?

A lot of the code there (omitting the super micro benchmarks) isn't exactly what I'd call typical Ruby code. A lot of it is more a mixture of a script and C code. Lots of the benchmarks don't define classes, use a lot of while and for loops instead of the more typical Enumerable methods, and sometimes there are even bitmasks.

Some of those constructs might have originated as optimizations, as they are apparently used in the general language benchmarks. That's dangerous as well though: mostly they are optimized for one specific platform, in this case CRuby. The fastest Ruby code on one platform can be way slower on other platforms, as it relies on implementation details (for instance, TruffleRuby uses a different String implementation). This puts the other implementations at an inherent disadvantage.

The problem here goes a bit deeper: whatever is in a popular benchmark will inevitably be what implementations optimize for, so it should be as close to reality as possible. Hence, I'm excited to see what benchmarks the Ruby 3×3 project comes up with, so that we have some new, more relevant benchmarks.

What time are we measuring?

This is truly my favorite part of this blog post and arguably the most important. As far as I know, the time measurements in the original benchmarks were done like this: /usr/bin/time -v ruby $script – which is one of my favorite benchmarking mistakes for programming languages commonly used for web applications. You can watch me go on about it for a bit here.

What’s the problem? Well, let’s analyze the times that make up the total time you measure when you just time the execution of a script: Startup, Warmup and Runtime.

[Diagram: total measured time split into Startup, Warmup and Runtime]

  • Startup – the time until we get to do anything “useful” aka the Ruby Interpreter has started up and has parsed all the code. For reference, executing an empty ruby file with standard ruby takes 0.02 seconds for me, MJIT 0.17 seconds and for TruffleRuby it takes 2.5 seconds (there are plans to significantly reduce it though with the help of Substrate VM). This time is inherently present in every measured benchmark if you just time script execution.
  • Warmup – the time it takes until the program can operate at full speed. This is especially important for implementations with a JIT. On a high level what happens is they see which code gets called a lot and they try to optimize this code further. This process takes a lot of time and only after it is completed can we truly speak of “peak performance”. Warmup can be significantly slower than runtime. We’ll analyze the warmup times more further down.
  • Runtime – what I’d call “peak performance” – run times have stabilized. Most/all code has already been optimized by the runtime. This is the performance level that the code will run at for now and the future. Ideally, we want to measure this as 99.99%+ of the time our code will run in a warmed up already started state.

Interestingly, the startup/warmup times are acknowledged in the original benchmark but the way that they are dealt with simply lessens their effect but is far from getting rid of them: “MJIT has a very fast startup which is not true for JRuby and Graal Ruby. To give a better chance to JRuby and Graal Ruby the benchmarks were modified in a way that Ruby MRI v2.0 runs about 20s-70s on each benchmark”.

I argue that in the greater scheme of things, startup and warmup don't really matter for benchmarks whose purpose is to see how implementations perform in a long-lived process.

Why is that, though? Web applications, for instance, are usually long lived: we start our web server once and then it runs for hours, days, weeks. We only pay the cost of startup and warmup once in the beginning, but run for a much longer time until we shut the server down again. Normally servers should spend 99.99%+ of their time in the warmed-up runtime "state". This is a fact that our benchmarks should reflect, as we should look for what gives us the best performance for our hours/days/weeks of run time, not for the first seconds or minutes of starting up.

A little analogy here is a car. You wanna go 300 kilometers as fast as possible (straight line). Measuring as shown above is the equivalent of measuring maybe the first ~500 meters. Getting in the car, accelerating to top speed and maybe a bit of time on top speed. Is the car that’s fastest on the first 500 meters truly the best for going 300 kilometers at top speed? Probably not. (Note: I know little about cars)

What does this mean for our benchmark? Ideally we should eliminate startup and warmup time. We can do this by using a benchmarking library written in Ruby that also runs the benchmark a couple of times before actually taking measurements (warmup time). We'll use my own little library, as it means no gem is required and it's well equipped for the rather long run times.
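To make that concrete, here is a minimal sketch of the warmup-then-measure idea in plain Ruby – it is not the exact script used (that one is in the linked repo), just an illustration of the approach, and solve_pentomino stands in for whatever code is being benchmarked:

```ruby
# Run the given block repeatedly for `seconds`, collecting the block's return values.
def run_for(seconds)
  results = []
  finish = Time.now + seconds
  results << yield while Time.now < finish
  results
end

# Warm up for `warmup` seconds (timings discarded), then measure individual iteration
# times for `time` seconds and return the average warm iteration time in seconds.
# Time.now keeps this runnable on Ruby 2.0; a monotonic clock is preferable where available.
def benchmark(warmup: 60, time: 60, &block)
  run_for(warmup, &block)

  iteration_times = run_for(time) do
    before = Time.now
    block.call
    Time.now - before
  end

  iteration_times.reduce(:+) / iteration_times.size
end

average_warm_time = benchmark { solve_pentomino }
```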

But do startup and warmup truly never matter? They do matter. Most prominently they matter during development – starting the server, reloading code, running tests. For all of those you gotta "pay" startup and warmup time. Also, if you develop a UI application or a CLI tool for end users, startup and warmup might be a bigger problem, as startup happens way more often and you can't just warm it up before you take it into the load balancer. Tasks running periodically as a cronjob on your server also have to pay these costs.

So are there benefits to measuring with startup and warmup included? Yes. For one, it is important for the use cases mentioned above. Secondly, measuring with time -v gives you a lot more data:


tobi@speedy $ /usr/bin/time -v ~/dev/graalvm-0.25/bin/ruby pent.rb
Command being timed: "/home/tobi/dev/graalvm-0.25/bin/ruby pent.rb"
User time (seconds): 83.07
System time (seconds): 0.99
Percent of CPU this job got: 555%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:15.12
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1311768
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 57
Minor (reclaiming a frame) page faults: 72682
Voluntary context switches: 16718
Involuntary context switches: 13697
Swaps: 0
File system inputs: 25520
File system outputs: 312
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

You get lots of data, including memory usage, CPU usage, wall clock time and more. These are also important for evaluating language implementations, which is why they are included in the original benchmarks.

Setup

Before we (finally!) get to the benchmarks, the obligatory “This is the system I’m running this on”:

The Ruby versions in use are MJIT as of this commit from the 25th of August, compiled with no special settings, graalvm 0.25 and 0.27 (more on that in a bit) as well as CRuby 2.0.0-p648 as a baseline.

All of this is run on my Desktop PC running Linux Mint 18.2 (based on Ubuntu 16.04 LTS) with 16 GB of memory and an i7-4790 (3.6 GHz, 4 GHz boost).


tobi@speedy ~ $ uname -a
Linux speedy 4.10.0-33-generic #37~16.04.1-Ubuntu SMP Fri Aug 11 14:07:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I feel it's especially important to mention the setup here, as when I first did these benchmarks for Polyconf on my dual-core notebook, TruffleRuby had significantly worse results. I think graalvm benefits from the 2 extra cores for warmup etc., as the CPU usage across cores is also quite high.

You can check out the benchmarking script used etc. as part of this repo.

But… you promised benchmarks, where are they?

Sorry, I think the theory is more important than the benchmarks themselves, although they undoubtedly help illustrate the point. We'll first get into why I chose the pent.rb benchmark as a subject and why I run it with a slightly older version of graalvm (no worries, the current version comes in later on). Then, finally, graphs and numbers.

Why this benchmark?

First of all, the original benchmarks were done with graalvm-0.22. Attempting to reproduce the results with the (at the time current) graalvm-0.25 proved difficult as a lot of them had already been optimized (and 0.22 contained some genuine performance bugs).

One that I could still reproduce the performance problems with was pent.rb and it also seemed like a great candidate to show that something is flawed. In the original benchmarks it is noted down as 0.33 times the performance of Ruby 2.0 (or well, 3 times slower). All my experience with TruffleRuby told me that this is most likely wrong. So, I didn’t choose it because it was the fastest one on TruffleRuby, but rather the opposite – it was the slowest one.

Moreover, while a lot of it isn't exactly idiomatic Ruby code to my mind (no classes, lots of global variables), it uses quite a lot of Enumerable methods such as each, collect, sort and uniq while refraining from bitmasks and the like. So I also felt that it'd make a comparatively good candidate.

The way the benchmark is run is basically the original benchmark put into a loop so that it is repeated a bunch of times, letting us measure the iteration times during warmup and later during runtime to get an average of the runtime performance.

So, why am I running it on the old graalvm-0.25 version? Well, whatever is in a benchmark is gonna get optimized making the difference here less apparent.

We’ll run the new improved version later.

MJIT vs. graalvm-0.25

So, on my machine the initial execution of the pent.rb benchmark (timing startup, warmup and runtime) took 15.05 seconds on TruffleRuby 0.25 while it took just 7.26 seconds with MJIT. That makes MJIT 2.1 times faster. Impressive!

What happens when we account for startup and warmup, though? If we benchmark from within Ruby, startup time already goes away, as we can only start measuring inside Ruby once the interpreter has started. As for warmup, we run the code to benchmark in a loop for 60 seconds of warmup time and then 60 seconds for measuring the actual runtime. I plotted the execution times of the first 15 iterations below (that's about when TruffleRuby stabilizes):

[Chart] Execution time of TruffleRuby and MJIT progressing over time – iteration by iteration.

As you can clearly see, TruffleRuby starts out a lot slower but picks up speed quickly, while MJIT stays more or less consistent. What's interesting to see is that iterations 6 and 7 of TruffleRuby are slower again. Either it found a new optimization that took significant time to complete, or a deoptimization had to happen because the constraints of a previous optimization were no longer valid. TruffleRuby stabilizes from there and reaches peak performance.

Running the benchmarks we get an average (warm) time for TruffleRuby of 1.75 seconds and for MJIT we get 7.33 seconds. Which means that with this way of measuring, TruffleRuby is suddenly 4.2 times faster than MJIT.

We went from 2.1 times slower to 4.2 times faster and we only changed the measuring method.

I like to present benchmarking numbers in iterations per second/minute (ips/ipm), as "higher is better" makes graphs far more intuitive. Converted (ipm = 60 / average seconds per iteration), our execution times come out at 34.25 iterations per minute for TruffleRuby and 8.18 iterations per minute for MJIT. Now have a look at our numbers converted to iterations per minute, comparing the initial measuring method with our new measuring method:

[Chart] Results of timing the whole script execution (initial time) versus the average execution time warmed up.

You can see the stark contrast for TruffleRuby, caused by the hefty warmup/long execution time during the first couple of iterations. MJIT, on the other hand, is very stable; its difference between the two measuring methods is well within the margin of error.

Ruby 2.0 vs MJIT vs. graalvm-0.25 vs. graalvm-0.27

Well, I promised you more data and here is more data! This data set also includes CRuby 2.0 as the baseline as well as the new graalvm version.

| | initial time (seconds) | ipm of initial time | average (seconds) | ipm of average after warmup | Standard Deviation as part of total |
|---|---|---|---|---|---|
| CRuby 2.0 | 12.3 | 4.87 | 12.34 | 4.86 | 0.43% |
| TruffleRuby 0.25 | 15.05 | 3.98 | 1.75 | 34.25 | 0.21% |
| TruffleRuby 0.27 | 8.81 | 6.81 | 1.22 | 49.36 | 0.44% |
| MJIT | 7.26 | 8.26 | 7.33 | 8.18 | 2.39% |
[Chart] Execution times by iteration, in seconds. CRuby stops appearing because those were already all the iterations it completed.

We can see that TruffleRuby 0.27 is already faster than MJIT in the first iteration, which is quite impressive. It’s also lacking the weird “getting slower” around the 6th iteration and as such reaches peak performance much faster than TruffleRuby 0.25. It also gets faster overall as we can see if we compare the “warm” performance of all 4 competitors:

[Chart] Iterations per minute after warmup as an average of our 4 competitors.

So not only did the warmup get much faster in TruffleRuby 0.27, the overall performance also increased quite a bit. It is now more than 6 times faster than MJIT. Of course, some of that is probably the TruffleRuby team tuning it to the existing benchmark, which reiterates my point that we do need better benchmarks.

As a last fancy graph for you I have the comparison of measuring the runtime through time versus giving it warmup time, then benchmarking multiple iterations:

[Chart] Difference between measuring whole script execution versus letting implementations warm up.

CRuby 2.0 is quite consistent, as expected; TruffleRuby already manages respectable out-of-the-box performance but gets even faster once warmed up. I hope this helps you see how the method of measuring can produce drastically different results.

Conclusion

So, what can we take away? Startup time and warmup are a thing, and you should think hard about whether those times are important for you and whether you want to measure them. For web applications, startup and warmup mostly aren't that important, as 99.99%+ of the time you'll run with warm "runtime" performance.

Not only is what time we measure important, but also what code we measure. Benchmarks should be as realistic as possible so that they are as meaningful as possible. What a benchmark on the Internet checks most likely isn't directly related to what your application does.

ALWAYS RUN YOUR OWN BENCHMARKS AND QUESTION WHAT CODE IS BENCHMARKED, HOW IT IS BENCHMARKED AND WHAT TIMES ARE TAKEN

(I had this in my initial draft, but I ended up quite liking it so I kept it around)

edit1: Added CLI tool specifically to where startup & warmup counts as well as a reference to Substrate VM for how TruffleRuby tries to combat it 🙂

edit2: Just scroll down a little to read an interesting comment by Vladimir


Choosing Elixir for the Code, not the Performance

People like to argue about programming languages: "This one is better!" "No, this one!". In these discussions, the performance card is often pulled: this language is that much faster in these benchmarks, or this company just needs that many servers now. Performance shouldn't matter that much in my opinion, and Nate Berkopec makes a good point about that in his blog post "Is Ruby too slow for web scale?" (TLDR: we can add more servers, and developer time often costs more than servers):

The better conversation, the more meaningful and impactful one, is which framework helps me write software faster, with more quality, and with more happiness.

I agree with lots of the points Nate makes and I like him and the post, but still it rubbed me the wrong way a bit. While it also states the above, it makes it seem like people just switch languages for the performance gains. And that brought up a topic that has been bugging me for a while: If you’re switching your main language purely for performance, there’s a high chance you’re doing it wrong. Honestly, if all we cared about was performance we’d all still be writing Assembly, or C/C++ at least.

It's also true that performance is often hyped a lot around new languages, and Elixir specifically can be guilty of that. And sure, performance is great and we read some amazing stories about it. Two of the most prominent adoption stories I can recall are usually cited and referred to for their great performance numbers: there is Pinterest's "our API responses are in microseconds now" and there is Bleacher Report's "we went from 150 servers to 5". If you re-read the articles though, other benefits of Elixir are mentioned as well, or are even discussed more than the performance.

The Pinterest article focuses first on Elixir as a good language, performance is arguably secondary in the post. Before we ever talk about microseconds there is lots of talk such as:

The language makes heavy use of pattern matching, a technique that prevents *value* errors which are much more common than *type* errors. It also has an innovative pipelining operator, which allows data to flow from one function to the next in a clear and easy to read fashion.

Then the famous microseconds number drops in one paragraph, and then the post immediately turns around and talks about code clarity again:

We’ve also seen an improvement in code clarity. We’re converting our notifications system from Java to Elixir. The Java version used an Actor system and weighed in at around 10,000 lines of code. The new Elixir system has shrunk this to around 1000 lines.

The Bleacher Report article has performance in its headline and at its heart, but it also mentions different benefits of Elixir:

The new language has led to cleaner code base and much less technical debt, according to Marx. It has also increased the speed of development(…)

So why do people rave about performance so much? Performance numbers are "objective", they are "rational", they are impressive and people like that. It's an easy argument to make that Bleacher Report uses just 5 servers instead of the 150 of the old Ruby stack. That's a fact. It's easy to remember and easy to put into a headline. Discussing the advantages of immutable data structures, pattern matching and the "let it crash" philosophy is much more subjective, personal and nuanced.

Before we jump in: this blog post is general, but some specific points might resonate best with a Ruby crowd, as that is my main programming language and where I'm coming from. So, coming from other languages, some of the points I make might feel like "meh, I already got this", while I might miss out on obvious cool things that both Ruby and Elixir have.

Hence, after this lengthy introduction let's focus on something different – what makes Elixir a language worth learning, and how can it make day-to-day coding more productive regardless of performance?

Let’s get some performance stuff out of the way first…

(The irony of starting the first section of a blog post decisively not about performance by discussing performance is not lost on me)

First, I wanna touch the topic of performance again really quickly – does it really not matter? Can we seamlessly scale horizontally? Does performance not impact productivity?

Well it certainly does, as remarked by Devon:

In a more general sense, if my runtime is already fast enough I don’t need to bother with more complex algorithms and extra concepts. I can just leave it as is. No extra engineering spent on “making it faster” – just on to the next. That’s a good thing. Especially caching can be just wonderful to debug.

What about the performance of big tasks? Like data processing, or, in the case of the company I'm working for, solving a vehicle routing problem [1]? You can't just scale those up by throwing servers at them. You might be able to parallelize them, but that's not too easy in Ruby and in general is often a bigger engineering effort. Some languages make that easier as well, for instance with Elixir's Flow.

Horizontal scaling has its limits there. It works fine for serving more web requests or working on more background jobs, but it gets more complicated when you have a big problem to solve that isn't easily parallelizable, especially if it needs to be done within a given time frame.

Also, I might not be one of the cool Docker + Kubernetes kids, but if you tell me that there's no overhead to managing 150 servers versus 5 servers, I tend not to believe it. If only because the chance of any one of your servers failing at any time is much bigger, simply because you've got more of them.

Well then, finally enough performance chatter in a post not about performance. Let’s look at the code and how it can make your life easier! I swear I try to keep these sections short and sweet, although admittedly that’s not exactly my strength (who would have guessed by now? 😉 )

Pattern Matching

Pattern Matching is my single favorite feature. If I could pick a single feature to be adopted in other programming languages it would be pattern matching. I find myself writing pattern matching code in Ruby, then sighing… “Ugh right I can’t do this”. It changed the way I think.

Enough "this is soo great". With pattern matching you basically make assertions on the structure of data and can pull values directly out of a deeply nested map into variables. It runs deeper than that, though. You also get a form of function overloading: Elixir will try to match the function clauses from top to bottom, which means you can have different function definitions based on the structure of your input data.

You can’t just use it on maps though. You can use it on lists as well, so you can have a separate function clause for an empty or one element list which is really great for recursion and catching edge cases.
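To make that a bit more concrete, here is a small sketch – the module, function and key names are made up for illustration:

```elixir
defmodule Deliveries do
  # Assert the structure of a nested map and pull values straight out of it.
  def notification(%{status: "delivered", recipient: %{name: name}}) do
    "Delivered to #{name}"
  end

  # Clauses are tried top to bottom, so different input shapes get different definitions.
  def notification(%{status: "pending"}), do: "Still on its way"

  # It works on lists too: dedicated clauses for the empty and one-element cases.
  def describe([]), do: "no deliveries"
  def describe([single]), do: "one delivery: #{single}"
  def describe([_ | _] = deliveries), do: "#{length(deliveries)} deliveries"
end
```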

One of the most fascinating uses I've seen was for parsing files: you can also use it on strings and binaries, and so can separate the data and the different headers of mp3 files in just a couple of lines of Elixir:
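As a rough sketch of the idea (not the original snippet – this one only looks at the ID3v1 tag, which is simply the last 128 bytes of an mp3 file):

```elixir
defmodule ID3 do
  # "TAG" followed by fixed-width fields – binary pattern matching takes it apart directly.
  def parse(file_path) do
    binary = File.read!(file_path)
    audio_size = byte_size(binary) - 128

    <<_audio::binary-size(audio_size),
      "TAG",
      title::binary-size(30),
      artist::binary-size(30),
      album::binary-size(30),
      year::binary-size(4),
      _rest::binary>> = binary

    %{title: title, artist: artist, album: album, year: year}
  end
end
```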

Immutable Data Structures and Pure Functions

If you’re unfamiliar with immutable data structures you might wonder how the hell one ever gets anything done? Well, you have to reassign values to the return values of functions if you wanna have any sort of change. You get pure functions, which means no side effects. The only thing that “happens” is the return value of the function. But, how does that help anyone?

Well, it means you have all your dependencies and their effect right there – there is no state to hold on which execution could depend. Everything that the function depends on is a parameter. That makes for superior understandability, debugging experience and testing.

Only months into my Elixir journey I noticed that I was seemingly much better at debugging library code than I was in Ruby. The reason, I believe, is the above. When I debug something in Ruby, what a method does often depends on one or more instance variables. So, if you wanna understand why that method "misbehaves" you gotta figure out which code sets those instance variables, which might depend on other instance variables being set and so on… Similarly, a method might have the side effect of changing some instance variable. What is the effect in the end? You might never know.

With pure functions I can see all the dependencies of a function at a glance, I can see what they return and how that new return value is used in further function calls. It reads more like a straight up book and less like an interconnected net where I might not know where to start or stop looking.
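A trivial sketch of the "rebind instead of mutate" mechanics described above:

```elixir
prices = [3, 1, 2]
Enum.sort(prices)                  # => [1, 2, 3] – a new list is returned
prices                             # => [3, 1, 2] – the original is untouched

user = %{name: "Tobi", visits: 1}
user = Map.put(user, :visits, 2)   # to "change" something, rebind the name to the result
```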

The Pipeline Operator

How does a simple operator make it into this list?

Well, it's about the code that it leads you to. The pipeline operator passes the value of the previous expression into the next function as the first argument. That gives you a couple of guidelines. First, when determining the order of arguments, think about which one is the main data structure and put that one first. Secondly, it leads you to a design centered around a main data structure flowing through your functions, which can turn out really nice.
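To illustrate, here is roughly what such a benchee pipeline looks like – a sketch from memory, so the exact step names may differ a bit between versions:

```elixir
list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

Benchee.init(time: 10)
|> Benchee.benchmark("flat_map", fn -> Enum.flat_map(list, map_fun) end)
|> Benchee.benchmark("map.flatten", fn -> list |> Enum.map(map_fun) |> List.flatten() end)
|> Benchee.measure()
|> Benchee.statistics()
|> Benchee.Formatters.Console.output()
```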

The above is an actual interface of my benchmarking library benchee. One of the design goals was to be "pipeable" in Elixir. This led me to the design with a main Suite data structure in which all the important information is stored. As a result, implementing formatters is super easy, as they are just functions that take the suite and pick the information they want to take into account. Moreover, each and every one of those steps is interchangeable and well suited for plugins. As long as you provide the data needed by later processing steps, there is nothing stopping you from just replacing a function in that pipe with your own.

Lastly, the pipeline operator represents very well how I once learned to think about Functional Programming, it’s a transformation of inputs. The pipeline operator perfectly mirrors this, we start with some data structure and through a series of transformations we get some other data structure. We start with a configuration and end up with a complete benchmarking suite. We start with a URL and some parameters which we transform into some HTML to send to the user.

Railway Oriented Programming

I'd love to ramble on about Railway Oriented Programming, but there are already good blog posts about that out there. Basically, instead of always checking whether an error has already occurred earlier, we can just branch out to the error track at any point.

It doesn’t seem all that magical until you use it for the first time. I remember suggesting using it to a colleague on a pull request (without ever using it before) and my colleague came back like “It’s amazing”!

It’s been a pattern in the application ever since. It goes a bit like this:

  1. Check the basic validity of data that we have (all fields present/sensible data)
  2. Validate that data with another system (business logic rules in some external service)
  3. Insert record into database

Any one of those steps could fail, and if one fails, executing the remaining steps makes no sense. So, as soon as a function doesn't return {:ok, something} we error out to the error track; otherwise we stay on the happy track.
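In Elixir this flow is commonly expressed with the `with` special form – a sketch with made-up step functions following the three steps above:

```elixir
defmodule Orders do
  def create(params) do
    with {:ok, params} <- validate_fields(params),
         {:ok, params} <- validate_business_rules(params),
         {:ok, record} <- insert_record(params) do
      {:ok, record}
    else
      # the first step that doesn't return {:ok, _} drops us straight onto the error track
      {:error, reason} -> {:error, reason}
    end
  end

  # stand-ins so the sketch compiles – real implementations would do the actual work
  defp validate_fields(params), do: {:ok, params}
  defp validate_business_rules(params), do: {:ok, params}
  defp insert_record(params), do: {:ok, params}
end
```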

Explicit Code

The Python folks were right all along.

Implicit code feels like magic. It just works without writing any code. My controller instance variables are just present in the view? The name of my view is automatically inferred I don’t have to write anything? Such magic, many wow.

Phoenix, the most popular elixir web framework, takes another approach. You have to specify a template (which is like a Rails view) by name and explicitly pass parameters to it:
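A minimal sketch of such a controller action – MyAppWeb and Catalog.list_products are made up for illustration, render/3 is the actual Phoenix function:

```elixir
defmodule MyAppWeb.ProductController do
  use MyAppWeb, :controller

  def index(conn, params) do
    products = Catalog.list_products()
    # the template and everything it may use are named and passed explicitly
    render(conn, "index.html", products: products, params: params)
  end
end
```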

No magic. What happens is right there, you can see it. No accidentally making something accessible to views (or partials) anymore! You know what else is there? The connection and the parameters so we can make use of them, and pattern match on them.

Another place where the elixir eco system is more explicit is when loading relations of a record:
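Roughly like this – Product, :reviews and :category are made up, while Repo.get! and Repo.preload are the real Ecto functions:

```elixir
product =
  Product
  |> Repo.get!(product_id)
  |> Repo.preload([:reviews, :category])

# product.reviews is only available because we explicitly asked for it above
```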

This is ecto, the “database access layer” in the elixir world. As you see, we have to explicitly preload associations we want to use. Seems awful bothersome, doesn’t it?

I love it!

No more N+1 queries from Rails magically loading things for me. Also, I get more control. I know about the queries my application fires against the database. Just the other day I fixed a severe performance degradation where our app was loading countless records from the database and instantiating them for what should have been a simple count query. Long story short, it was a presenter object, so .association loaded all the objects, put them in presenters and then let my .size be executed on that. I would never have explicitly preloaded that data and hence would have found out much earlier that something was wrong here.

Speaking of explicitness and ecto…

Ecto Changesets

Callbacks and validations are my nemesis.

The problem with them is a topic for another post entirely, but in short: validations and callbacks are executed all the time (before save, validation, create, whatever) yet lots of them are only added for one feature that is maybe used in 2 places. Think about the user password. Validating it and hashing/salting it is only ever relevant when a user registers or changes the password. But that code sits there and is executed in an order that is not exactly trivial to determine. Sometimes that gets in the way of new features or tests, so you start to throw a bunch of ifs at it.
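An Ecto changeset pair goes something like this – the cast and validate_* calls are real Ecto.Changeset functions, while the schema fields and put_password_hash are stand-ins:

```elixir
defmodule MyApp.User do
  use Ecto.Schema
  import Ecto.Changeset

  schema "users" do
    field :name, :string
    field :username, :string
    field :password, :string, virtual: true
    field :password_hash, :string
  end

  def new_changeset(user, params \\ %{}) do
    user
    |> cast(params, [:name, :username])
    |> validate_required([:name, :username])
    |> validate_length(:username, min: 1, max: 20)
  end

  # builds on new_changeset and only adds the password handling on top
  def registration_changeset(user, params) do
    user
    |> new_changeset(params)
    |> cast(params, [:password])
    |> validate_length(:password, min: 6)
    |> put_password_hash()
  end

  defp put_password_hash(changeset), do: changeset # stand-in for the actual hashing step
end
```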

Ecto Changesets were one of the strangest things for me to get used to coming to Elixir. Basically they just encapsulate a change operation, saying which parameters can take part, what should be validated and other code to execute. You can also easily combine changesets, in the code above the registration_changeset uses the new_changeset and adds just password functionality on top.

I only deal with password stuff when I explicitly want to. I know which parameters were allowed to go in/change so I just need to validate those. I know exactly when what step happens so it’s easy to debug and understand.

Beautiful.

Optional Type-Checking

Want to try typing, but not all the time? Elixir has got something for you! It's cool, but there's not enough space here to explain it. dialyxir makes the dialyzer tool quite usable, and it goes beyond "just" type checking to include other static analysis features as well. Still, in case you don't like types, it's optional.
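As a taste, a typespec is just an annotation that dialyzer (via dialyxir) checks callers against – a minimal, made-up example:

```elixir
defmodule Statistics do
  @spec average([number]) :: float
  def average(numbers) do
    Enum.sum(numbers) / length(numbers)
  end
end

# Statistics.average("oops") still compiles – dialyzer flags it in a separate,
# optional analysis step rather than at compile time.
```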

Parallelism

“Wait, you swore this was it about performance! This is outrageous!”

Relax. While parallelism can be used for better performance, that's not all it's good for. What's great about parallelism in Elixir/the Erlang VM in general is how low-cost and seamless it is. Spawning a new process (not an operating system process – they are more like actors) is super easy and has a very low overhead, unlike starting a new thread. You can have millions of them on one machine, no problem.

Moreover, thanks to our immutability guarantees and every process being isolated, you don't have to worry about processes messing with each other. So, first of all, if I want to geocode the pick-up and drop-off addresses in parallel I can just do that with a bit of Task.async and Task.await. I'd never just trust whatever Ruby gems I use for geocoding to be thread-safe, due to globals et al.
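The pick-up/drop-off example is about as short as it sounds – Geocoder.geocode and the address variables stand in for whatever geocoding you actually do:

```elixir
pickup_task  = Task.async(fn -> Geocoder.geocode(pickup_address) end)
dropoff_task = Task.async(fn -> Geocoder.geocode(dropoff_address) end)

# both lookups now run in their own process; await collects the results
pickup  = Task.await(pickup_task)
dropoff = Task.await(dropoff_task)
```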

How does this help? Well, I have something that is easily parallelizable and I can just do that. Benchee generates statistics for different scenarios in parallel just because I can easily do so. That’s nice, because for lots of samples it might actually take a couple of seconds per scenario.

Another point is that there's less need for background workers. Let's take web sockets as an example. To the best of my knowledge, in an evented architecture it is recommended to offload all bigger tasks in the communication to background workers, as otherwise we'd block our thread/operating system process from handling other events. In Phoenix every connection already runs in its own Elixir process, which means they are already executed in parallel and doing some more work in one won't block the others.

This ultimately makes applications simpler, as you don't have to deal with background workers, offloading work etc.

OTP

Defining what OTP really is, is hard. It's somewhat of a set of tools for concurrent programming, and it includes everything from an in-memory database to the Dialyzer tool mentioned earlier. It is probably most notorious for its "behaviours" like supervisors, which help you build concurrent and distributed systems in the famous "Let it crash! (and restart it in a known good state, maybe)" philosophy.

There are big books written about this, so I'm not gonna try to explain it. Just this much: years of experience with highly available systems are baked in here. There is so much to learn here that can change how you see programming. It might be daunting, but don't worry. People have built some nice abstractions that are easier to use, and often it's the job of a library or a framework to set these up. Phoenix and ecto do this for you (for web requests/database connections respectively). I'll out myself right now: I've never written a Supervisor or a GenServer for production purposes. I used abstractions or relied on what my framework brought with it.

If this has gotten you interested I warmly recommend “The Little Elixir & OTP Guidebook”. It walks you through building a complete worker pool application from simple to a more complex fully featured version.

Doctests

Imo the most underrated feature of Elixir. Doctests allow you to write iex example sessions in the documentation of a function. These will be executed during test runs and checked to see whether they still return the same values/still pass. They are also part of the awesome generated documentation. No more out-of-date/slightly wrong code samples in the documentation!
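A minimal sketch – the iex session lives in the @doc string and a single doctest line in the test file runs it:

```elixir
defmodule MyMath do
  @doc """
  Adds two numbers.

      iex> MyMath.add(2, 3)
      5
  """
  def add(a, b), do: a + b
end

# test/my_math_test.exs
defmodule MyMathTest do
  use ExUnit.Case, async: true
  doctest MyMath
end
```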

I have entire modules that only rely on their doctests, which is pretty awesome if you ask me. Also, contributing doc tests to libraries is a pretty great way to provide both documentation and tests. E.g. once upon a time I wanted to learn about the Agent module, but it didn’t click right away, so I made a PR to elixir with some nice doctests to help future generations.

A good language to learn

In the end, elixir is a good language to learn. It contains many great concepts that can make your coding lives easier and more enjoyable and all of that lives in a nice and accessible syntax. Even if you can’t use it at work straight away, learning these concepts will influence and improve your code. I experienced the same when I learned and read a couple of books about Clojure, I never wrote it professionally but it improved my Ruby code.

You might ask “Should we all go and write Elixir now?”. No. There are many great languages and there are no silver bullets. The eco system is still growing for instance. I’m also not a fan of rewriting all applications. Start small. See if you like it and if it works for you.

Lastly, if this has piqued your interest I have a whole talk up that focuses on the great explicit features of Elixir and explains them in more detail: "Elixir & Phoenix – fast, concurrent and explicit"

edit1: Clarified that the bleacher report blog post is mostly about performance with little else

edit2: Fixed that you gotta specify the template by name, not the view

[1] It’s sort of like Traveling Salesman, put together with Knapsack and then you also need to decide what goes where. In short: a very hard problem to solve.

Videos & Slides: It's About the Humans, Stupid (Lightning)

 

When I was at the excellent Pivorak meetup I had the chance to give an "inspirational" lightning talk. So I took this chance and prepared a lightning talk version of a talk I've been submitting to a lot of conferences but that always got rejected. The talk is about "soft" skills, not the usual "hard" skills talks that you might be used to 🙂

The topic is very close to my heart as applications aren’t developed in a vacuum – they are developed with and for humans. I hope you enjoy it and that at some point I submit it to the “right” conference which will accept that type of talk 🙂

You can see the slides here or take a look at them at speakerdeck, slideshare or PDF.

Abstract

In the development world most people are striving for technical excellence: better code, faster run times, more convenient interfaces, better databases, faster deployments… But is that really what makes us better at developing software?

In the end, software development is done by groups of people creating products together. To do that, communication and collaboration between humans are essential – you can be the best programmer ever, but if you can't work efficiently with others, what good does it do you?

This talk will give you a primer and food for further thought.

 

 

Slides: Stop Guessing and Start Measuring (Poly-Version)

Hello from the amazing Polyconf! I just gave my Stop Guessing and Start Measuring talk and if you are thinking “why do you post the slides of this SO MANY TIMES”, well the first one was an Elixir version, then a Ruby + Elixir version and now we are at a Poly version. The slides are mostly different and I’d say about ~50% of them are new. New topics covered include:

  • MJIT – what’s wrong with the benchmarks – versus TruffleRuby
  • JavaScript!
  • other nice adjustments

The all important video isn’t in the PDF export but you can see a big part of it on Instagram.

You can view the slides here or on speakerdeck, slideshare or PDF.

Abstract

“What’s the fastest way of doing this?” – you might ask yourself during development. Sure, you can guess, your intuition might be correct – but how do you know? Benchmarking is here to give you the answers, but there are many pitfalls in setting up a good benchmark and analyzing the results. This talk will guide you through, introduce best practices, and surprise you with some unexpected benchmarking results. You didn’t think that the order of arguments could influence its performance…or did you?

 

 

Video & Slides: How Fast is it Really? Benchmarking in Practice (Ruby)

My slides & video from visiting the excellent WRUG (Warsaw Ruby Users Group). The talk is a variation of the similarly named elixir talk, but it is ever evolving and here more focused on Ruby. It covers mostly how to setup and run good benchmarks, traps you can fall into and tools you should use.

You can also have a look at the slides right here or at speakerdeck, slideshare or PDF.

Abstract

“What’s the fastest way of doing this?” – you might ask yourself during development. Sure, you can guess what’s fastest or how long something will take, but do you know? How long does it take to sort a list of 1 Million elements? Are tail-recursive functions always the fastest?

Benchmarking is here to answer these questions. However, there are many pitfalls around setting up a good benchmark and interpreting the results. This talk will guide you through, introduce best practices and show you some surprising benchmarking results along the way.

edit: If you’re interested there’s another iteration of this talk that I gave at the pivorakmeetup

Slides: Code, Comments, Concepts, Comprehension – Conclusion?

The following is the first part of my visit to Warsaw in April (sorry for the super late post!). As part of the visit, I also visited Visuality and spent an evening there giving a presentation and discussing the topics afterwards for a long time. We capped it off with some board games 😉 I had a great time and the discussions were super interesting.

The talk is a reworked old goldie ("Code is read many more times than written" / "Optimizing for Readability") and is about readable code and keeping code readable. It evolves as I evolve – I learn new things, assign differing importance to different topics and discover entirely new important topics.

You can view the slides here or on speakerdeck, slideshare or PDF.

 

Mastery comes from failure

In software development, and many other disciplines, people strive for mastery – you want to get better, to be great at something. For some reason failure is often seen as the opposite of mastery – after all, masters don't fail. Or do they?

We often see talks and read great blog posts about all these great achievements of amazing people but almost never about major failures. But, how are failures important and useful? Well, let me tell you a little story from back when I was teaching an introductory web development course:

We were in the practice stage and I was doing my usual rounds, looking at what the students did and helping them resolve their problems. After I glanced at a screen and said that the problem could be resolved by running the migrations, the student looked at me with disbelief and said:

Tobi, how can you just look at the screen and see what’s wrong in the matter of seconds? I’ve been trying to fix this for almost 15 minutes!

My reply was simple:

I’ve  probably made the same mistake a thousand times – I learned to recognize it.

And that’s what this post is about and where a lot of mastery comes from in my mind – from failure. We learn as we fail. What is one of the biggest differences between a novice and a master? Well, quite simply the master has failed many more times than the novice has ever tried. Through these failures the now master learned and achieved a great level of expertise – but you don’t see these past failures now. This is shown quite nicely in this excellent web-comic:

[Comic] "Be Friends with Failure", taken (with permission) from doodlealley. It focuses on art, but I believe it applies to programming just as much and I recommend reading the whole comic 🙂

For this to work properly we can't program/work by coincidence though – when something goes wrong we must take the right measures to determine why it failed and what we could do better the next time around. Post-mortems are relatively popular for bigger failures: in a post-mortem you identify the root cause of a problem, show how it led to the observed misbehavior of the system and ideally identify steps to prevent such faults in the future. For service providers they are often even shared publicly.

I think we should always do our own little "post-mortems" – there doesn't have to be a big service outage, data leak or whatever. Why did this ticket take longer than expected? Which of our assumptions were wrong? Could we change anything in our process to identify potential problems earlier in the future? This interaction with my co-worker didn't go as well as it could have – how could we communicate better and more effectively? This piece of code is hard to deal with – what makes it hard to deal with, and how could it be made easier?

Of course, we can’t only learn from our own failures (although those tend to work best!) but from mistakes of others as well – as we observe their situation and the impact it had and see what could have been done better, what they learned from it and what we can ultimately learn from it.

Therein lies the crux though – people are often afraid to share their failings. When nobody shares their failures, how are we supposed to learn? Often it seems that admitting to failure is a "sign of weakness", that you are not good enough, and you'd rather be silent about it. Maybe you tell your closest friends about it, but no one else should know that YOU made a mistake! Deplorable!

I think we should rather be more open about it, share failures, share learnings and get better as a group.

This hit home for me when, a couple of years back, a friend asked me whether it was ok to give a talk (I organize a local user group) about some mistakes made in a recent project. I thought it was a great idea to share these mistakes, along with some learnings, with everyone else. My friend seemed concerned about what the others might think – after all, it is not common to share stories like this. We had the talk at a meetup and it was a great success. It made me wonder though: how many people are out there who have great stories of failures and learnings to share but decide not to share them?

I’ve heard whispers of some few meetups (or even conferences?!) that focus on failure stories but they don’t seem to have reached the mainstream. I’d love to hear more failure stories! Tried Microservices/GraphQL/Elm/Elixir/Docker/React/HypeXY in your project and it all blew up? Tell me about it! Your Rails monolith basically exploded? Tell me more! You had an hour long outage due to a simple problem a linter could have detected? You have my attention!

What I'm saying is: please go ahead and share your failures! By sharing them you learn more about them, as you need to articulate and analyze them; everyone else benefits, learns something and might have some input. Last but not least, people see that mistakes happen. It demystifies this image we have of great people who never make a mistake and were just always great, and instead shows us where they are coming from and what still happens to them.

My Failures + Lessons learned

Of course a blog post like this would feel empty, hollow and wrong without sharing a couple of my own failures, or some that I observed and that shaped me. These are not detailed post-mortems but rather short bullet points of a failure/mistake and what I learned from it. They sound general but are ultimately situational and more nuanced than this; they are kept short in favor of brevity – so please keep that in mind.

  • Reading a whole Ruby book from start to finish without doing any exercise taught me that this won’t teach me a programming language and that I can’t even write a basic program afterwards so I really should listen to the author and do the exercises
  • Trying to send a secret encryption key as a parameter through GET while working under pressure taught me that this is a bad idea (the parameter is in the URL -> the URL is not encrypted -> security FAIL), that working under pressure indeed makes me worse and that I'd never miss a code review again, as this was thankfully caught during our code review
  • Finally diving into meta programming after regarding the topic as too magic for too long, I learned that I can learn almost anything and getting into it is mostly faster than I think – it’s the fear of it that keeps you away for too long
  • Overusing meta programming taught me that I should seek the simplest workable solution first and only reach for meta programming as a last resort, as it is easy to end up with a code base that is harder to maintain and understand than necessary – sometimes a bit of duplication is even better than that meta programming
  • Overusing meta programming also taught me about the negative performance implications especially if methods are called often
  • Being lied to in an interview taught me not to ask “Do you do TDD?” but rather “How do you work?”
  • Doing too much in my free time taught me that I should say “No” some times and that a “No” can be a “Yes” to yourself
  • Working on a huge Rails application taught me the dangers of fat models and all their validations, callbacks etc.
  • Letting a client push in more features late in the process of a feature taught me the value of splitting up tickets, finishing smaller work packages and again decisively saying “No!”
  • Feeling very uncomfortable in situations and not speaking up because I thought I was the only one affected taught me that when this is the case, chances are I’m mostly not the only one and others are affected way more so I should speak up
  • Having a Code of Conduct violation at one of my meetups showed me that I should proactively inform all speakers about the CoC in our communication weeks before the talks, and not just have it on the meetup page
  • Blindly sticking to practices and failing with it taught me to always keep an open mind and question what I’m doing and why I’m doing it
  • Doing two talks in the same week (while being wobbly unprepared for the second) taught me that when I do that again none of them can be original
  • Working in a project started with micro services and an inexperienced team showed me the overhead involved and how wrongly sliced services can be worse than any monolith
  • Building my first bigger project (in university, thankfully) in a team and completely messing it up at first showed me the value of design patterns
  • Skipping acceptance testing in a (university) project and then having the live demo error out on something we could have only caught in acceptance/end-to-end testing showed me how important those tests really are
  • Writing too many acceptance/end-to-end tests clarified  to me how tests should really be written on the right level of the testing pyramid in order to save test execution time, test writing time and test refactoring time
  • Seeing how I get less effective when panic strikes during production problems and how panic spreads to the rest of the team highlighted the importance of staying calm and collected especially during urgent problems
  • Also during urgent problems it is especially important to delegate and trust my co-workers, no single person can handle and fix all that – it’s a team effort
  • Accidentally breaking a crucial algorithm in edge cases (while fixing another bug) made me really appreciate our internal and external fallbacks/other algorithms and options so that the system was still operational
  • Working with overly (performance) optimized code showed me that premature optimization truly is the root of all evil and the enemy of readability – measure and monitor where the performance bottlenecks and hot spots are, and only then go ahead and look for performance improvements there!
  • Using only variable names like a, b, c, d (wayyy back when I started programming) and then not being able to understand how my program worked a week later and having to completely rewrite it (couple of days before the hand in of the competition…) forever engraved the importance of readable and understandable names into my brain
  • Giving a talk that had a lot of information that I found interesting but ultimately wasn’t crucial for the understanding of the main topic taught me to cut down on additional content and streamline the experience towards the learning goals of the presentation
  • Working in a team where people yelled at each other taught me that I don’t want to deal with behavior like this and that intervention is hard – often it’s best to leave the room and let the situation cool down
  • Being in many different situations failing to act in a good way taught me that every situation is unique and that you can’t always act based on your previous experience or advice
  • Trying to contribute to an open source project for the first time and never hearing back from the maintainers and ultimately having my patch rejected half a year after I asked if this was cool to work on showed me the value of timely clear communication especially to support open source newcomers and keep their spirits high
  • Just recently I failed at creating a proper API for my Elixir benchmarking library, used a map for configuration and passed it in as an optional first argument (ouch!) and the main data structure was a list of two-tuples instead of a map as the second argument – gladly fixed in the latest release
  • probably a thousand more but that I can’t think of right now 😉

Closing

[Comic] "Be Friends with Failure", taken (with permission) from doodlealley.

We can also look at this from another angle – when we're not failing, then we're probably doing things that we're already good at, not something new where we're learning and growing. There's nothing wrong with doing something you're good at – but when you venture out to learn something new, failure is part of the game, at least in the small.

I guess what I’m saying is – look at failures as an opportunity to improve. For you, your team, your friends and potential listeners. Analyze them. What could have prevented this? How could this have been handled better? Could the impact have been smaller? I mean this in the small (“I’ve been trying to fix this for the past hour, but the fault was over here in this other file”), in the big (“Damn, we just leaked customer secrets”) and everywhere in between.

We all make mistakes. Yes, even our idols – we sadly don’t talk about them as much. What’s important in my opinion is not that we made a mistake, but how we handle it and how we learn from it. I’d like us to be more open about it and share these stories so that others can avoid falling into the same trap.

Slides: How fast is it really? Benchmarking in Elixir

I'm at Elixirlive in Warsaw right now and just gave a talk. The talk is about benchmarking – the greater concepts, but the concrete examples are in Elixir and it works with my very own library benchee to also show some surprising Elixir benchmarks. The concepts are applicable in general, and it also gets into categorizing benchmarks into micro/macro/application etc.

If you’ve been here and have feedback – positive or negative. Please tell me 🙂

Slides are available as PDF, speakerdeck and slideshare.

Abstract

“What’s the fastest way of doing this?” – you might ask yourself during development. Sure, you can guess what’s fastest or how long something will take, but do you know? How long does it take to sort a list of 1 Million elements? Are tail-recursive functions always the fastest?

Benchmarking is here to answer these questions. However, there are many pitfalls around setting up a good benchmark and interpreting the results. This talk will guide you through, introduce best practices and show you some surprising benchmarking results along the way.

Released: benchee 0.6.0, benchee_csv 0.5.0, benchee_json and benchee_html – HTML reports and nice graphs!

Over the last days I've been hard at work polishing up and finishing releases of benchee (0.6.0 – Changelog) and benchee_csv (0.5.0 – Changelog) as well as the initial releases of benchee_html and benchee_json!

I’m the proudest and happiest of finally getting benchee_html out of the door along with great HTML reports including plenty of graphs and the ability to export them! You can check out the example online report or glance at this screenshot of it:

[Screenshot of the HTML report]

While benchee_csv had some mere updates for compatibility and benchee_json just transforms the general suite to JSON (which is then used in the HTML formatter), I'm particularly excited about the big new features in benchee and of course benchee_html!

Benchee

The 0.6.0 is probably the biggest release of the “core” benchee library with some needed API changes and great features.

New run API – options last as keyword list

The "old" way was that you'd optionally pass options into run as a map in the first argument and then define the jobs to benchmark in another map. I did this because in my mind the configuration comes first, and maps are much easier to work with through pattern matching as opposed to keyword lists. However, having an optional first argument already felt kind of weird…

Thing is, that's not the most Elixir way to do this. It is conventional to pass in options as the last argument and as a keyword list. After voicing my concerns on the elixirforum, the solution was to allow passing in options as a keyword list but convert it to a map internally, to keep the advantages of good pattern matching among other things.

The old style still works (thanks to pattern matching!) – but it might get deprecated in the future. In this process though the run interface of the very first version of run, which used a list of tuples, doesn’t work anymore 😦
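In practice the new style looks roughly like this (a sketch – check the README for the exact options of your version):

```elixir
list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

Benchee.run(%{
  "flat_map"    => fn -> Enum.flat_map(list, map_fun) end,
  "map.flatten" => fn -> list |> Enum.map(map_fun) |> List.flatten() end
}, time: 10, warmup: 2)
```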

Multiple inputs

The great new feature is that benchee now supports multiple inputs – so that in one suite you can run the same functions against multiple different inputs. That is important as functions can behave very differently on inputs of different sizes or a different structure. Therefore it’s good to check the functions against multiple inputs. The feature was inspired by a discussion on an elixir issue with José Valim.

So what does this look like? Here it goes:
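Roughly like this – a sketch along the lines of the benchee README; the benchmarked functions now receive the current input as their argument:

```elixir
map_fun = fn i -> [i, i * i] end

Benchee.run(%{
  "flat_map"    => fn input -> Enum.flat_map(input, map_fun) end,
  "map.flatten" => fn input -> input |> Enum.map(map_fun) |> List.flatten() end
}, inputs: %{
  "Small (1 Thousand)" => Enum.to_list(1..1_000),
  "Big (1 Million)"    => Enum.to_list(1..1_000_000)
})
```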

The hard thing about it was that it changed how benchmarking results had to be represented internally, as another level to represent the different inputs was needed. This led to quite some work both in benchee and in plugins – but in the end it was all worth it 🙂

benchee_html

This has been in the making for way too long – it should have been released a month or 2 ago. But now it's here! It provides a nice HTML table and four different graphs – 2 for comparing the different benchmarking jobs and 2 for each individual job, to take a closer look at the distribution of run times of that particular job. There is a wiki page for benchee_html explaining the different graphs and highlighting what they might be useful for. You can also export PNG images of the graphs at the click of a simple icon 🙂

Wonder how to use it? Well it was already shown earlier in this post when showing off the new API. You just specify the formatters and the file where it should be written to 🙂
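Roughly like this – the option names are from memory for these versions, so double-check the benchee_html README:

```elixir
Benchee.run(%{
  "sort" => fn -> Enum.sort(Enum.to_list(1..10_000)) end
}, formatters: [
     &Benchee.Formatters.HTML.output/1,
     &Benchee.Formatters.Console.output/1
   ],
   html: [file: "samples_output/report.html"])
```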

But without further ado you can check out the sample report or just take a look at these images 🙂

[Screenshots: iterations per second comparison, box plot and histogram graphs]

[Screenshot: raw run times graph]

Closing Thoughts

Hope you enjoy benchmarking with different inputs and then seeing great reports of them. Let me know what you like about benchee, what you don't like about it and what could be better.