Yesterday at RUG::B I tried something I’d never done before: a more personal, story-driven talk. And in order to adequately illustrate it and how different Open Source feels to me, I also opted to paint some very special drawings.
Open Source is something I’ve been doing for longer than people have paid me to program. I’m immensely passionate about it, and it felt like some kind of blind spot that I had never given a talk about it so far.
If you know of a conference this talk would be a good fit for, please let me know.
What’s it like to work on Open Source projects? They’re all the same, aren’t they? No, they’re not – the longer I work on Open Source, the more I realize how different the experience is for each one of them. Walk with me through some stories that happened to me in Open Source and let’s see what we can take away.
I just released statistex – a library to calculate statistics based on a given data sample. It’s an extraction from my benchmarking library benchee and part of what I’ll henceforth call the “benchee library family”. As it’s been running in benchee basically since its inception I dubbed it 1.0.
The extraction is good because it helps the benchee code base focus on benchmarking and not things around it. It removes about 800 lines from the repo and it makes that code reusable for everyone else. It also makes it easier to introduce new statistics as it’s clearer that we’ll first introduce the logic inside statistex and later on just display it in benchee and friends. It’s good design.
Do we really need another statistics library?
I struggled with this question. I don’t want to split the ecosystem unnecessarily. To be honest, I thought there was no other library out there. There are at least statistics and Numerix, though. Not sure how I missed them when I started incorporating more advanced statistics.
Both include more than just statistics functions: general math and more (drawing of random values, for instance). They also support more statistical values than statistex does at the time of this writing. So if you’re looking for something that statistex doesn’t provide (yet), these are some of the first places I’d look.
Why would you still want to use statistex? Well there are some things it offers that the others don’t (to the best of my knowledge):
statistics/2 – Give me everything!
As mentioned before, statistex was extracted and the way it was used in benchee is just “give me all the statistics so that formatters can display them!” – so there’s a function that does exactly this:
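A sketch of how that looks (field and function names as I remember them from the statistex docs, so treat details as approximate):

```elixir
samples = [1, 2, 3, 4, 5]

# one call computes all supported statistics at once
stats = Statistex.statistics(samples)

stats.average            # => 3.0
stats.sample_size        # => 5
stats.variance           # => 2.5
stats.standard_deviation # => ~1.58
```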
What’s cool about this? Well, little effort, big result! This is quite nice for exploring data sets in iex, for instance. Behind the scenes statistex reuses previously calculated values so that no value is calculated twice. For instance, first you get the sample_size and the total, both of which are then used to calculate the average. The average and sample_size are then passed on to calculate the variance and so forth. This way statistex is fast by not duplicating work if you want a bunch of statistical values (and benchee wants most of them).
But what if you don’t want all of these values and would still like to reuse previously calculated ones? Got you covered!
Manually reuse previously calculated values
Say you want to calculate the variance but have already calculated the average and sample_size. Easy:
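Roughly like this (again a sketch from memory of the statistex API):

```elixir
samples = [1, 2, 3, 4, 5]

average = Statistex.average(samples)
sample_size = Statistex.sample_size(samples)

# hand the already known values over so they aren't computed again
Statistex.variance(samples, average: average, sample_size: sample_size)
# => 2.5
```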
Like variance/2, a lot of functions take an optional keyword list as an argument where you can provide previously calculated values (the options are of course documented).
Raising on empty list input
Thanks to a discussion on the Elixir Forum I made the decision to raise an ArgumentError when statistex is handed an empty list in most functions. Why? The statistics don’t make any sense at that point and it’s likely some kind of error either way – you probably want to display something other than a bunch of nils.
I know many won’t consider this a feature; I quite like the direction it pushes client code in, though.
Is this enough for a new library?
I think so. 🙂 Especially getting all values at once and reusing previously calculated values are incredibly valuable properties. I also like that statistex is solely focused on statistics. And the statistics it can’t compute yet? We’ll catch up on that over time. Moreover, it’s not like I spent some huge amount of work writing it, as it was a simple extraction.
I’d be happy if you gave statistex a trial run and left some feedback.
I always loved the idea of Property Based Testing; it was the most mind-blowing idea I encountered while learning Clojure many years back. However, I always found it hard to apply in practice. I barely encountered the “easy” case where an operation is reversible (if I encode a term and then decode it again it should be equal to the original term, for instance). And the properties I came up with always seemed too loose to catch any bugs.
So I thought, well, let’s give it another shot. I’ve got all these statistics functions in benchee/statistex – data should be easy to generate for them. Let’s give it a shot, we’ll surely not find a bug… or will we?
The first property based tests
If nothing else, I wanted to make sure no matter what numbers you throw at my statistics module it won’t blow up. To implement it I used stream_data:
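Reconstructed, the property looked roughly like this (a sketch using stream_data’s ExUnitProperties, not the exact test file):

```elixir
defmodule StatistexPropertyTest do
  use ExUnit.Case
  use ExUnitProperties

  property "statistics/1 returns sane values for any sample" do
    check all samples <- list_of(float(), min_length: 1) do
      stats = Statistex.statistics(samples)

      # values have to be ordered sensibly...
      assert stats.minimum <= stats.average
      assert stats.average <= stats.maximum
      # ...and measures of spread can't be negative
      assert stats.standard_deviation >= 0
    end
  end
end
```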
This is what I came up with – our samples are any non-empty list of floats, and there are a bunch of checks that make sure the values are somewhere between minimum and maximum or bigger than 0. No way the tests are failing…
Honestly, I was shocked. On closer inspection, the standard deviation ratio was negative when I said it should always be positive. As the generated sample only contained negative numbers, the average was negative as well. As the ratio is calculated by dividing the standard deviation by the average, it turned out to be negative. Usually I only work with positive samples, hence it had never occurred before. The ratio should still always be positive, so an abs/1 call fixed it.
Another property
Thinking more I came up with another property:
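Along these lines, adding to the same test module as before (a sketch, assuming statistex’s percentiles/2, which returns a map from percentile to value):

```elixir
property "percentiles are ordered" do
  check all samples <- list_of(float(), min_length: 1) do
    percentiles = Statistex.percentiles(samples, [25, 50, 75, 99])

    assert percentiles[25] <= percentiles[50]
    assert percentiles[50] <= percentiles[75]
    assert percentiles[75] <= percentiles[99]
  end
end
```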
It’s much like the first property, just making sure the percentile values are in the order they should be. There is absolutely no possibility that this will fail, absolutely none, well tested code… no chance it will fail…
Wait, the 25th percentile is bigger than the 50th percentile? No way that’s ok.
A lot of digging, googling and reading our original source for implementing percentile interpolation later, I figured out the problem. Basically, interpolation for small sample sizes is hard and also uncommon. We had missed a clause stated in the source, which points out that for a percentile too small relative to the sample size, the value is to be set to the minimum:
Note that any p ≤ 1/(N+1) will simply be set to the minimum value.
Our p was 0.25 (25th percentile) and 0.25 <= 1/3. Implementing this clause (through guard clauses) fixed the test.
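Simplified, the fix looks something like this (a sketch, not the exact statistex code):

```elixir
# any percentile p with p <= 1 / (N + 1) can't be interpolated,
# so it is simply set to the minimum of the (sorted) samples
defp percentile_value(sorted_samples, percentile, sample_size)
     when percentile / 100 <= 1 / (sample_size + 1) do
  List.first(sorted_samples)
end
```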
You can check out the full implementation and bug fixes in the PR.
Learnings
The generation part was super easy in the case shown. However, what’s impressive to me is that although the properties were very loosely defined, they still uncovered 2 bugs. And that’s in code that both I and many of you have been running for quite a while in benchee. Sure, they are very specific edge cases, but that’s what property based testing is good at: finding edge cases!
If you have other ideas for properties to check, I’m happy to listen and learn. And give property based testing a shot yourselves even with very loose properties – you might be surprised what you find.
Ever wanted to implement something board game like in Elixir? Chess? Go? Islands? Well, then you’re gonna need a board!
But what data structure would be the most efficient one to use in Elixir? Conventional wisdom for a lot of programming languages is to use some sort of array. However, most programming languages with immutable data structures don’t have a “real” array data structure (we’ll talk about Erlang’s array later, it’s not really like the arrays in non-functional languages). Elixir is one of those languages.
For this benchmark I didn’t have a very specific board game in mind, so I settled for a board size of 9×9. It’s a bit bigger than a normal chess board (8×8), it’s exactly the size of the smallest “normal” Go board, and it’s one smaller than the board used in Islands as implemented in Functional Web Development with Elixir, OTP and Phoenix, so it seemed like a good compromise. Different sizes are likely to sport different performance characteristics.
Without a concrete usage scenario in mind I settled on a couple of different benchmarks:
getting a value at the coordinates (0,0), (4, 4) and (8,8). This is a fairly nano/micro benchmark for data access and provides a good balance of values at the beginning/middle/end when thinking in list terms.
setting a value at the coordinates (0,0), (4, 4) and (8,8).
a still nano/micro benchmark that combines the two previous benchmarks by getting and setting all three mentioned values. I call this “mixed bag”.
Why stop at the previous one? The last benchmark just sets and gets every possible coordinate once (first it sets (0,0) then gets it, then it sets (0, 1), then gets it and so forth). This also simulates the board filling which can be important for some data structures. Completely filling a board is unrealistic for most board games however, as most games finish before this stage. This one is called “getting and setting full board”.
Something that is notably not benchmarked is the creation of boards. For (almost) all of the board implementations it could resolve to a constant value, which should take a similar time to create across implementations. I wasn’t overly interested in that property and didn’t want to make the code less readable by inlining the constant after creation when I didn’t need to.
Also noteworthy is that these benchmarks mostly treat reading and writing equally, while in my experience most AIs/bots are much more read-heavy than write-heavy.
Take all these caveats of the benchmark design into consideration when looking at the results, and if in doubt, of course best write your own benchmark taking into account the concrete usage patterns of your domain.
Without further ado then let’s look at the different implementations I have benchmarked so far:
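They all share the same get/set interface; to give a flavor, here’s a condensed sketch of two of the variants (not the exact code from the repo):

```elixir
defmodule MapTupleBoard do
  # coordinates as a tuple key pointing at whatever is stored there
  def new, do: %{}
  def get(board, x, y), do: board[{x, y}]
  def set(board, x, y, value), do: Map.put(board, {x, y}, value)
end

defmodule Tuple1DBoard do
  @board_size 9

  # one flat tuple, the index resolved as board_size * x + y
  def new, do: Tuple.duplicate(nil, @board_size * @board_size)
  def get(board, x, y), do: elem(board, @board_size * x + y)
  def set(board, x, y, value), do: put_elem(board, @board_size * x + y, value)
end
```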
All boards are built so that accessing a previously unset field will return nil. No assumptions about the data stored in the board have been made, which rules out String as an implementation type. In the benchmarks atoms are used as values.
In the descriptions of the data types below (x, y) is used to mark where what value is stored.
List2D: A 2 dimensional list representing rows and columns: [[(0, 0), (0, 1), (0, 2), ...], [(1, 0), (1, 1), ..], ..., [..., (8, 8)]]
List1D: Using the knowledge of a constant board size you can encode it into a one-dimensional list resolving the index as dimension * x + y: [(0, 0), (0, 1), (0, 2), ..., (1, 0), (1, 1), ..., (8, 8)]
Tuple2D: Basically like List2D but with tuples instead of lists: {{(0, 0), (0, 1), (0, 2), ...}, {(1, 0), (1, 1), ..}, ..., {..., (8, 8)}}
Tuple1D: Basically like List1D but with a tuple instead of a list: {(0, 0), (0, 1), (0, 2), ..., (1, 0), (1, 1),... (8, 8)}
Array1D: see above for the data structure in general, otherwise conceptually like Tuple1D.
MapTuple: A map that takes the tuple of the coordinates {x, y} as the key, with the value being whatever is on the board: %{{0, 0} => (0, 0), {0, 1} => (0, 1), ..., {8, 8} => (8, 8)}. It’s a bit unfair compared to the others shown so far, as it can start with an empty map, which of course is a much smaller data structure that is not only smaller but usually faster to retrieve values from. As the benchmarks start with an empty board that’s a massive advantage, so I also included a full map in the benchmark, see next.
MapTupleFull: Basically the same as above, but initialized to already hold all key/value pairs, set to nil. Serves not only to see how this performs, but also how MapTuple performs once it has “filled up”.
MapTupleHalfFull: Only looking at completely full and completely empty performance didn’t seem good either, so I added another one initialized from 0 to 4 on all columns (a bit more than half a board, totalling 45 key/value pairs).
MapTupleQuarterFull: Another one of these, this time with 27 key/value pairs. Why? Because there is an interesting performance characteristic, read on to find out 🙂
Map2D: Akin to List2D etc. a map of maps: %{0 => %{0 => (0, 0), 1 => (0, 1), ...}, 1 => %{0 => (1, 0), ...}, ..., 8 => %{..., 8 => (8, 8)}}
ETSSet: Erlang ETS storage with table type set. Storage layout wise it’s basically the same as MapTuple, with a tuple of coordinates pointing at the stored value.
ETSOrderedSet: Same as above but with table type ordered_set.
ProcessDictionary: On special request from Michał 😉 This is probably not a great default variant, as you’re practically creating (process-)global state, which means you can’t have two boards within the same process without causing mayhem. It might also accidentally conflict with other code using the process dictionary. Still might be worth considering if you want to always run a board in its own process.
It’s important to point out that all mentioned data types except for ETS and the process dictionary are immutable. This means that especially for the mutable ones a new board is created in a before_each hook (which does not count towards measured time) to avoid “contamination”.
Another notable absence (besides String, for the aforementioned constraints) is Record. Records are internally represented as tuples but give you the key/value access of maps; however, in Elixir it is more common to use Structs (which are backed by maps). As both maps and tuples are already present in the benchmark, including these likely wouldn’t lead to new insights.
System Setup
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Number of Available Cores: 8
Available Memory: 15.61 GB
Elixir Version: 1.8.2
Erlang Version: 22.0
Benchmarking Results
Benchmarks of course were run with benchee and the benchmarking script is here (nothing too fancy).
You can check them out in the repo as markdown (thanks to benchee_markdown) or as HTML reports (benchee_html). Careful though if you’re on mobile: some of the HTML reports contain the raw measurements and hence go up to 9MB in size, and they can take a while to load, also due to the JS drawing graphs!
getting and setting full board iterations per second (higher is better)
It’s a tight race at the top when it comes to run time! Tuple1D, Tuple2D and MapTuple are all within striking range of each other, and then there’s a sharp fall-off.
Also, there is a fair bit of variance involved, as shown by the black “whiskers” (this is usual for benchmarks that finish in nanoseconds or microseconds, because of garbage collection, interference etc.). Which one of these is best? To get a better picture, let’s look at the whole table of results:
| Name | IPS | Average | Deviation | Median | Mode | Minimum | Maximum |
|---|---|---|---|---|---|---|---|
| Tuple1D | 133.95 K | 7.47 μs | ±23.29% | 6.93 μs | 6.88 μs | 6.72 μs | 492.37 μs |
| Tuple2D | 132.16 K | 7.57 μs | ±29.17% | 7.21 μs | 7.16 μs | 7.03 μs | 683.60 μs |
| MapTuple | 126.54 K | 7.90 μs | ±25.69% | 7.59 μs | 7.56 μs | 7.43 μs | 537.56 μs |
| ProcessDictionary | 64.68 K | 15.46 μs | ±14.61% | 15.12 μs | 15.05 μs | 14.89 μs | 382.73 μs |
| ETSSet | 60.35 K | 16.57 μs | ±9.17% | 16.04 μs | 15.95 μs | 15.79 μs | 161.51 μs |
| Array2D | 56.76 K | 17.62 μs | ±17.45% | 17.15 μs | 17.04 μs | 16.54 μs | 743.46 μs |
| MapTupleFull | 55.44 K | 18.04 μs | ±11.00% | 16.92 μs | 16.59 μs | 16.43 μs | 141.19 μs |
| MapTupleHalfFull | 53.70 K | 18.62 μs | ±8.36% | 17.96 μs | 17.87 μs | 17.67 μs | 160.86 μs |
| Array1D | 50.74 K | 19.71 μs | ±10.60% | 19.29 μs | 18.99 μs | 18.81 μs | 469.97 μs |
| ETSOrderedSet | 39.53 K | 25.30 μs | ±10.51% | 24.82 μs | 24.57 μs | 24.34 μs | 390.32 μs |
| Map2D | 36.24 K | 27.59 μs | ±8.32% | 27.71 μs | 25.90 μs | 25.12 μs | 179.98 μs |
| List2D | 29.65 K | 33.73 μs | ±4.12% | 33.31 μs | 33.04 μs | 31.66 μs | 218.55 μs |
| MapTupleQuarterFull | 28.23 K | 35.42 μs | ±3.86% | 34.96 μs | 34.61 μs | 34.39 μs | 189.68 μs |
| List1D | 15.41 K | 64.90 μs | ±2.84% | 64.91 μs | 64.14 μs | 62.41 μs | 175.26 μs |
Median and Mode are good values to look at when unsure what is usually fastest. These are the “middle value” and the most common value respectively; as such they are much less likely to be impacted by outliers (garbage collection and such). They seem to reinforce that Tuple1D really is the fastest, if by a negligible margin.
MapTuple is very fast, but its sibling MapTupleFull, which already starts “full”, is more than 2 times slower. Whether this is significant for you depends on whether you start with a truly empty board (Go starts with an empty board, chess doesn’t, for instance).
Somewhat expectedly, List1D does worst, as to get values towards the end of the list it has to traverse the entire list, which is incredibly slow.
As an aside, it’s easy to see in the box plot that the high deviation is mainly caused by some very big outliers:
Boxplot of getting and setting full board – dots are outliers
The dots denote outliers, and they are so big (but few) that the rest of the chart is practically unreadable – all that remains of the actual box is practically a thick line.
What about memory consumption?
getting and setting full board memory usage (lower is better)
Here we can see the immediate drawback of Tuple1D – its memory consumption is many times worse than that of the others. My (educated) guess is that it’s because it has to replace/copy/update the whole tuple with its 9 * 9 = 81 values for every update operation. Tuple2D is much more economical here, as it only needs to update the tuple holding the columns and the one holding the specific column we’re updating (2 * 9 = 18), to the best of my understanding.
Big tuples like this are relatively uncommon in “the real world” in my experience though, as their fixed-size nature makes them inapplicable for a lot of cases. Luckily, our case isn’t one of them.
MapTuple does amazingly well overall, as it’s probably the structure quite some people would have intuitively reached for, for good constant memory access speed. Its memory consumption is also impressively low.
ProcessDictionary is very memory efficient and also constantly in the top 4 when it comes to run time. However, at least run time wise there’s quite the margin (~15 μs to ~7 μs), which doesn’t seem to make the risks worth it overall.
Other Observations
Let’s take a look at some other things that seem noteworthy:
ETS isn’t the winner
This surprised me a bit (however, I haven’t used ETS much). ETS was always tagged as the go-to option for performance in my mind. Looking at the docs and the use cases I know, it makes sense though – we’re likely to see benefits for much larger data sets, as ours is relatively small:
These (ETS) provide the ability to store very large quantities of data in an Erlang runtime system, and to have constant access time to the data.
Let’s have a look at the time it takes to retrieve a value – usually a much more common operation than writing:
get(0,0)
| Name | IPS | Average | Deviation | Median | Mode | Minimum | Maximum |
|---|---|---|---|---|---|---|---|
| Tuple1D | 44.12 M | 22.66 ns | ±842.77% | 20 ns | 20 ns | 9 ns | 35101 ns |
| Tuple2D | 42.46 M | 23.55 ns | ±846.67% | 20 ns | 19 ns | 7 ns | 36475 ns |
| Array1D | 30.38 M | 32.92 ns | ±84.61% | 32 ns | 32 ns | 20 ns | 8945 ns |
| MapTuple | 29.09 M | 34.38 ns | ±111.15% | 32 ns | 31 ns | 19 ns | 10100 ns |
| MapTupleQuarterFull | 18.86 M | 53.03 ns | ±37.27% | 50 ns | 49 ns | 38 ns | 2579 ns |
| Array2D | 18.62 M | 53.70 ns | ±67.02% | 50 ns | 49 ns | 34 ns | 10278 ns |
| List1D | 18.26 M | 54.75 ns | ±56.06% | 53 ns | 52 ns | 42 ns | 8358 ns |
| ProcessDictionary | 17.19 M | 58.18 ns | ±1393.09% | 52 ns | 51 ns | 39 ns | 403837 ns |
| Map2D | 15.79 M | 63.34 ns | ±25.86% | 60 ns | 54 ns | 41 ns | 388 ns |
| MapTupleHalfFull | 10.54 M | 94.87 ns | ±27.72% | 91 ns | 89 ns | 76 ns | 2088 ns |
| MapTupleFull | 10.29 M | 97.16 ns | ±18.01% | 93 ns | 89 ns | 70 ns | 448 ns |
| ETSSet | 9.74 M | 102.63 ns | ±26.57% | 100 ns | 99 ns | 78 ns | 2629 ns |
| List2D | 9.04 M | 110.57 ns | ±69.64% | 105 ns | 109 ns | 82 ns | 4597 ns |
| ETSOrderedSet | 6.47 M | 154.65 ns | ±19.27% | 152 ns | 149 ns | 118 ns | 1159 ns |
get(8, 8)
| Name | IPS | Average | Deviation | Median | Mode | Minimum | Maximum |
|---|---|---|---|---|---|---|---|
| Tuple2D | 42.47 M | 23.55 ns | ±788.60% | 21 ns | 20 ns | 7 ns | 33885 ns |
| Tuple1D | 40.98 M | 24.40 ns | ±725.07% | 22 ns | 21 ns | 10 ns | 34998 ns |
| Array1D | 29.67 M | 33.70 ns | ±161.51% | 33 ns | 32 ns | 21 ns | 18301 ns |
| MapTuple | 28.54 M | 35.03 ns | ±986.95% | 32 ns | 32 ns | 20 ns | 230336 ns |
| ProcessDictionary | 19.71 M | 50.73 ns | ±1279.45% | 47 ns | 47 ns | 34 ns | 377279 ns |
| Array2D | 17.88 M | 55.92 ns | ±85.10% | 52 ns | 51 ns | 35 ns | 13720 ns |
| Map2D | 13.28 M | 75.31 ns | ±32.34% | 73 ns | 65 ns | 56 ns | 2259 ns |
| MapTupleHalfFull | 12.12 M | 82.53 ns | ±31.49% | 80 ns | 80 ns | 60 ns | 1959 ns |
| ETSSet | 9.90 M | 101.05 ns | ±16.04% | 99 ns | 95 ns | 78 ns | 701 ns |
| MapTupleFull | 9.85 M | 101.53 ns | ±19.29% | 99 ns | 90 ns | 70 ns | 487 ns |
| ETSOrderedSet | 5.59 M | 178.80 ns | ±41.70% | 169 ns | 170 ns | 135 ns | 4970 ns |
| MapTupleQuarterFull | 4.09 M | 244.65 ns | ±16.85% | 242 ns | 240 ns | 226 ns | 9192 ns |
| List2D | 3.76 M | 265.82 ns | ±35.71% | 251 ns | 250 ns | 231 ns | 9085 ns |
| List1D | 1.38 M | 724.35 ns | ±10.88% | 715 ns | 710 ns | 699 ns | 9676 ns |
The top 3 remain relatively unchanged. What is very illustrative to look at are List1D and List2D, though. For get(0, 0) List1D vastly outperforms its 2D sibling, even being closest to the top group. That is easy to explain, because it basically translates to looking at the first element of the list, which is very fast for a linked list. However, looking at the last element is very slow, and that is what get(8, 8) translates to: all elements have to be traversed until the end is reached. As a result, the whole thing is roughly 13 times slower for List1D (54.75 ns vs. 724.35 ns). List2D is still very slow, but through its 2-dimensional structure it only needs to look at 18 elements instead of 81.
MapTuple vs. MapTupleQuarterFull vs. MapTupleHalfFull vs. MapTupleFull
In most scenarios, including the biggest scenario, MapTupleQuarterFull performs worse than MapTuple (expected), MapTupleHalfFull (unexpected) and MapTupleFull (unexpected). I had expected its performance to be worse than MapTuple but better than MapTupleFull and MapTupleHalfFull. Why is that?
I had no idea, but Johanna had one: it might have to do with the “magic” limit at which a map “really” becomes a map and not just a list that is linearly searched. That limit is defined as 32 entries in the Erlang source code (link also provided by Johanna). Our quarter full implementation is below that limit (27 entries) and hence often has performance characteristics more akin to List1D (see the good get(0, 0) performance but bad get(8, 8) performance) than to its “real” map cousins.
To the best of my understanding, this “switch the implementation at size 32” is a performance optimization. With such a small data set, a linear search often performs better than the overhead introduced by hashing, lookups etc. You can also see that the trade-off pays off, as in the big benchmark where the whole board is filled incrementally, MapTuple (which is initially empty and grows) still provides top performance.
What I still don’t fully understand is that sometimes MapTupleFull seems to outperform MapTupleHalfFull – but only by a very negligible margin (most notably in the “big” getting and setting full board benchmark). The difference however is so small that I believe it doesn’t warrant further investigation – unless you have an idea, of course.
Performance difference of Array vs. Tuple
In the introduction I said arrays are backed by tuples – how come their performance is way worse, then? Well, let’s have a look at what an array actually looks like:
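Here’s roughly how to peek at one in iex (the :array options follow the Erlang docs; the printed representations are what the next paragraphs describe):

```elixir
# a fixed-size Erlang array with nil as the default value
board = :array.new(size: 81, default: nil)
# inspecting `board` here shows the length-encoded representation

# setting index 13 changes the internal representation
board = :array.set(13, :some_value, board)
```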
It cleverly doesn’t even initialize all the fields, but uses a kind of length encoding saying “the value is the default value of nil for the next 100 fields”, while also saving its set size limit of 81 (fun fact: these arrays can be configured to grow dynamically as well!).
Once we set a value (at index 13) the representation changes, still showing some length encoding (“there is nothing here for the first 10 entries”), but the indexes 10..19 are now expanded as a whole tuple that’s holding our value. So, to the best of my understanding, arrays work by adding “stretches” of tuples of size 10 as they are needed.
However, our custom tuple implementations are perfectly sized to begin with and not too huge. Moreover, their whole size being set at compile time probably enables some optimizations (or so I believe). Hence the tuple implementations outperform the arrays, while the arrays still don’t do too shabby (especially with read access) compared to the other implementations.
Conclusion
Tuples can be very good for the use case of collections whose size is known at compile time and that need fast access, and a simple flat map performs amazingly well. All of that at least for the relatively small board size (9×9 = 81 fields) benchmarked here. There is a big caveat for the map though – it is only this fast if we can start with an empty map and grow it in size as new pieces are set. The completely initialized map (MapTupleFull) performs way worse; tuples are the clear winners then.
Missing a data structure? Please do a PR! There’s a behaviour to implement and then just two lists to add your module name to – more details.
Update 1 (2019-06-17): Fixed MapTupleHalfFull. Before the update it was actually just quarter full 😅 which has wildly different performance characteristics, for reasons now described along with the MapTupleQuarterFull implementation. Thanks go to Johanna for pointing that out. The process dictionary has also been added as another possible implementation, on a suggestion from Michał 😉 . A run time box plot was also added to show outliers more clearly and visually.
I was supposed to give this talk at ElixirConf.EU, but sadly fell ill. These are the slides (still titled alpha-1) that I used to give it at Elixir Berlin, where it was met with a great reception. Which is also why I was so looking forward to giving it again and having it recorded… Anyhow, if you saw the talk and want to go through the slides again, or you were looking forward to the slides – here they are.
Elixir is great, so clearly we’ll all rewrite our applications in Elixir. Mostly, you can’t and shouldn’t do that. This presentation will show you another path. You’ll see how at Liefery, we started with small steps instead of rewriting everything. This allowed us to reap the benefits earlier and get comfortable before getting deeper into it. We’ll examine in detail the tactics we used to create two Elixir apps for new requirements, and how we integrated them with our existing Rails code base.
Join us on our tale of adopting Elixir and Phoenix and see what we learned, what we loved, and what bumps we hit along the road.
edit: slightly updated version from devday.io – PDF / slideshare
People like to argue about programming languages: “This one is better!” “No, this one!” In these discussions, the performance card is often pulled. This language is so much faster in these benchmarks, or this company only needs that many servers now. Performance shouldn’t matter that much in my opinion, and Nate Berkopec makes a good point about that in his blog post “Is Ruby too slow for web scale?” (TLDR: we can add more servers, and developer time often costs more than servers):
The better conversation, the more meaningful and impactful one, is which framework helps me write software faster, with more quality, and with more happiness.
I agree with lots of the points Nate makes and I like him and the post, but still it rubbed me the wrong way a bit. While it also states the above, it makes it seem like people just switch languages for the performance gains. And that brought up a topic that has been bugging me for a while: If you’re switching your main language purely for performance, there’s a high chance you’re doing it wrong. Honestly, if all we cared about was performance we’d all still be writing Assembly, or C/C++ at least.
It’s also true that performance is often hyped a lot around new languages, and specifically Elixir can be guilty of that too. And sure, performance is great and we read some amazing stories about it. The two most prominent adoption stories I can recall are usually cited and referred to for their great performance numbers. There is Pinterest’s “our API responses are in microseconds now” and there is Bleacher Report’s “we went from 150 servers to 5”. If you re-read the articles though, other benefits of Elixir are mentioned as well, or are even discussed more than performance.
The Pinterest article focuses first on Elixir as a good language, performance is arguably secondary in the post. Before we ever talk about microseconds there is lots of talk such as:
The language makes heavy use of pattern matching, a technique that prevents *value* errors which are much more common than *type* errors. It also has an innovative pipelining operator, which allows data to flow from one function to the next in a clear and easy to read fashion.
Then the famous microseconds figure drops in one paragraph, and then the article immediately turns around and talks about code clarity again:
We’ve also seen an improvement in code clarity. We’re converting our notifications system from Java to Elixir. The Java version used an Actor system and weighed in at around 10,000 lines of code. The new Elixir system has shrunk this to around 1000 lines.
The Bleacher Report article has performance in its headline and at its heart, but it also mentions different benefits of Elixir:
The new language has led to cleaner code base and much less technical debt, according to Marx. It has also increased the speed of development(…)
So why do people rave about performance so much? Performance numbers are “objective”, they are “rational”, they are impressive, and people like that. It’s an easy argument to make that Bleacher Report uses just 5 servers instead of the 150 in the old Ruby stack. That’s a fact. It’s easy to remember and easy to put into a headline. Discussing the advantages of immutable data structures, pattern matching and the “let it crash” philosophy is much more subjective, personal and nuanced.
Before we jump in: this blog post is general, but some specific points might resonate best with a Ruby crowd, as that is my main programming language/where I’m coming from. So, coming from other languages, some of the points I’ll make may feel like “meh, I already got this”, while I might miss out on obvious cool things both Ruby and Elixir have.
Hence, after this lengthy introduction, let’s focus on something different – what makes Elixir a language worth learning, and how it can make day to day coding more productive regardless of performance.
Let’s get some performance stuff out of the way first…
(The irony of starting the first section of a blog post decisively not about performance by discussing performance is not lost on me)
First, I wanna touch on the topic of performance again really quickly – does it really not matter? Can we seamlessly scale horizontally? Does performance not impact productivity?
Well it certainly does, as remarked by Devon:
“Performance” isn’t a good reason to choose @elixirphoenix.
“I don’t have to use caching for sub 200ms responses” is a _very_ good reason.
In a more general sense, if my runtime is already fast enough I don’t need to bother with more complex algorithms and extra concepts. I can just leave it as is. No extra engineering spent on “making it faster” – just on to the next. That’s a good thing. Especially caching can be just wonderful to debug.
What about the performance of big tasks? Like data processing, or – in the case of the company I’m working for – solving a vehicle routing problem[1]? You can’t just scale those up by throwing servers at them. You might be able to parallelize them, but that’s not too easy in Ruby and in general is often a bigger engineering effort. Some languages make that easier as well, like Elixir’s flow.
Horizontal scaling has its limits there. It works fine for serving more web requests or working on more background jobs, but it gets more complicated when you have one big problem to solve that isn’t easily parallelizable, especially if it needs to be done within a given time frame.
Also, I might not be one of the cool docker + kubernetes kids, but if you tell me that there’s no overhead to managing 125 servers versus 5 servers, I tend not to believe it. If only because the chance of any one of your servers failing at any time is much bigger simply because you’ve got more of them.
Well then, finally enough performance chatter in a post not about performance. Let’s look at the code and how it can make your life easier! I swear I try to keep these sections short and sweet, although admittedly that’s not exactly my strength (who would have guessed by now? 😉 )
Pattern Matching
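A small made-up example of the kind of thing pattern matching lets you do:

```elixir
defmodule Greeter do
  # assert on the structure and pull values out of a nested map,
  # right in the function head
  def greet(%{name: name, location: %{city: city}}) do
    "Hi #{name} from #{city}!"
  end

  # clauses are tried top to bottom – this one catches everything else
  def greet(_), do: "Hi stranger!"
end

Greeter.greet(%{name: "Tobi", location: %{city: "Berlin"}})
# => "Hi Tobi from Berlin!"
```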
Pattern matching is my single favorite feature. If I could pick a single feature to be adopted in other programming languages, it would be pattern matching. I find myself writing pattern matching code in Ruby, then sighing… “Ugh, right, I can’t do this”. It changed the way I think.
Enough “this is soo great”. With pattern matching you basically make assertions on the structure and can take values directly out of a deeply nested map and bind them to variables. It runs deeper than that, though. You also get a form of function overloading: Elixir will try to match function clauses from top to bottom, which means you can have different function definitions based on the structure of your input data.
It’s not just maps you can use it on, though. You can use it on lists as well, so you can have a separate function clause for an empty or one-element list, which is really great for recursion and catching edge cases.
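Immutable Data Structures

A tiny illustration of what immutability means in practice:

```elixir
list = [3, 1, 2]

Enum.sort(list) # returns a new, sorted list
list            # => [3, 1, 2] – the original is untouched

# to "change" something you rebind the name to the return value
list = Enum.sort(list)
list # => [1, 2, 3]
```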
If you’re unfamiliar with immutable data structures, you might wonder how the hell one ever gets anything done? Well, you have to reassign variables to the return values of functions if you want any sort of change. You get pure functions, which means no side effects. The only thing that “happens” is the return value of the function. But how does that help anyone?
Well, it means you have all your dependencies and their effects right there – there is no state to hold on to on which execution could depend. Everything that the function depends on is a parameter. That makes for superior understandability, debugging experience and testing.
Months into my Elixir journey I already noticed that I was seemingly much better at debugging library code than I was in Ruby. The reason, I believe, is the above. When I debug something in Ruby, what a method does often depends on one or more instance variables. So, if you wanna understand why that method “misbehaves”, you gotta figure out which code sets those instance variables, which might depend on other instance variables being set, and so on… Similarly, a method might have the side effect of changing some instance variable. What is the effect in the end? You might never know.
With pure functions I can see all the dependencies of a function at a glance, I can see what they return and how that new return value is used in further function calls. It reads more like a straight up book and less like an interconnected net where I might not know where to start or stop looking.
The Pipeline Operator
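From memory, the benchee pipeline in question looked roughly like this (the exact function names may have shifted between versions):

```elixir
list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

Benchee.init(time: 3)
|> Benchee.system()
|> Benchee.benchmark("flat_map", fn -> Enum.flat_map(list, map_fun) end)
|> Benchee.benchmark("map.flatten", fn -> list |> Enum.map(map_fun) |> List.flatten() end)
|> Benchee.measure()
|> Benchee.statistics()
|> Benchee.Formatters.Console.output()
```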
How does a simple operator make it into this list?
Well, it’s about the code that it leads you to. The pipeline operator passes the value of the previous expression into the next function as the first argument. That gives you a couple of guidelines. First, when determining the order of arguments, think about which one is the main data structure and put that one first. Second, it leads you to a design with a main data structure per function, which can turn out really nice.
The above is an actual interface of my benchmarking library benchee. One of the design goals was to be “pipable” in Elixir. This led me to a design with a main Suite data structure in which all the important information is stored. As a result, implementing formatters is super easy, as they are just functions that take the suite and pick the information to take into account. Moreover, each and every one of those steps is interchangeable and well suited for plugins. As long as you provide the needed data for later processing steps, there is nothing stopping you from replacing a function in that pipe with your own.
Lastly, the pipeline operator represents very well how I once learned to think about Functional Programming, it’s a transformation of inputs. The pipeline operator perfectly mirrors this, we start with some data structure and through a series of transformations we get some other data structure. We start with a configuration and end up with a complete benchmarking suite. We start with a URL and some parameters which we transform into some HTML to send to the user.
Railway Oriented Programming
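A minimal sketch of the pattern using Elixir’s with special form – all the function names here are made up for illustration:

```elixir
defmodule Records do
  def create(params) do
    # each step returns {:ok, value} or {:error, reason};
    # the first {:error, _} short-circuits the whole chain
    with {:ok, data}    <- validate_basics(params),
         {:ok, checked} <- validate_business_rules(data),
         {:ok, record}  <- insert(checked) do
      {:ok, record}
    end
  end

  defp validate_basics(%{name: _} = params), do: {:ok, params}
  defp validate_basics(_), do: {:error, :missing_fields}

  defp validate_business_rules(data), do: {:ok, data}

  defp insert(data), do: {:ok, Map.put(data, :id, 1)}
end
```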
I’d love to ramble on about Railway Oriented Programming, but there are already good blog posts about that out there. Basically, instead of always checking whether an error has already occurred earlier, we can just branch out to the error track at any point.
It doesn’t seem all that magical until you use it for the first time. I remember suggesting it to a colleague on a pull request (without ever having used it before), and my colleague came back like “It’s amazing!”
It’s been a pattern in the application ever since. It goes a bit like this:
Check the basic validity of data that we have (all fields present/sensible data)
Validate that data with another system (business logic rules in some external service)
Insert record into database
Any one of those steps could fail, and if one does, executing the remaining steps makes no sense. So, as soon as a function doesn’t return {:ok, something}, we branch out to the error track; otherwise we stay on the happy track.
Implicit code feels like magic. It just works without writing any code. My controller instance variables are just present in the view? The name of my view is automatically inferred I don’t have to write anything? Such magic, many wow.
Phoenix, the most popular Elixir web framework, takes another approach. You have to specify a template (which is like a Rails view) by name and explicitly pass parameters to it:
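A hypothetical controller action to show what that looks like (the module names are invented; the render call follows the Phoenix API of the time):

```elixir
defmodule MyAppWeb.ProductController do
  use MyAppWeb, :controller

  def show(conn, %{"id" => id}) do
    product = Catalog.get_product!(id)

    # the template name and everything the template can see are explicit
    render(conn, "show.html", product: product)
  end
end
```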
No magic. What happens is right there, you can see it. No accidentally making something accessible to views (or partials) anymore! You know what else is there? The connection and the parameters so we can make use of them, and pattern match on them.
Another place where the Elixir ecosystem is more explicit is when loading relations of a record:
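A sketch of what that looks like (User, Repo and the :posts association are stand-ins for your own app’s modules):

```elixir
user =
  User
  |> Repo.get!(user_id)
  |> Repo.preload(:posts)

# without the Repo.preload/2 call, user.posts would not be loaded –
# accessing it would give you an Ecto.Association.NotLoaded struct
Enum.count(user.posts)
```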
This is Ecto, the “database access layer” of the Elixir world. As you see, we have to explicitly preload associations we want to use. Seems awfully bothersome, doesn’t it?
I love it!
No more N+1 queries from Rails magically loading things for me. Also, I get more control. I know which queries my application fires against the database. Just the other day I fixed a severe performance degradation, as our app was loading countless records from the database and instantiating them for what should have been a simple count query. Long story short, it was a presenter object, so .association loaded all the objects, put them in presenters and then executed my .size on that. I would never have explicitly preloaded that data and hence would have found out much earlier that something was wrong with this.
Speaking of explicitness and ecto…
Ecto Changesets
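A sketch in the spirit of the user changesets described below (assumes import Ecto.Changeset; put_password_hash/1 is a hypothetical helper):

```elixir
def new_changeset(user, params) do
  user
  |> cast(params, [:name, :username])
  |> validate_required([:name, :username])
  |> validate_length(:username, max: 20)
end

def registration_changeset(user, params) do
  user
  |> new_changeset(params)
  |> cast(params, [:password])
  |> validate_length(:password, min: 6)
  |> put_password_hash()
end
```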
The problem with Rails validations and callbacks is a topic for another post entirely, but in short: validations and callbacks are executed all the time (before save, validation, create, whatever), and lots of them are added for just one feature that is maybe used in 2 places. Think about the user password. Validating it and hashing/salting it is only ever relevant when a user registers or changes the password. But that code sits there and is executed in a not exactly trivial to determine order. Sometimes that gets in the way of new features or tests, so you start throwing a bunch of ifs at it.
Ecto changesets were one of the strangest things for me to get used to when coming to Elixir. Basically they just encapsulate a change operation, saying which parameters can take part, what should be validated, and other code to execute. You can also easily combine changesets; in the code above, the registration_changeset uses the new_changeset and adds just the password functionality on top.
I only deal with password stuff when I explicitly want to. I know which parameters were allowed to go in/change so I just need to validate those. I know exactly when what step happens so it’s easy to debug and understand.
Beautiful.
Optional Type-Checking
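A small made-up example of a typespec:

```elixir
defmodule MyMath do
  # the @spec is optional – when present, dialyzer checks the code against it
  @spec average([number]) :: float
  def average(numbers) do
    Enum.sum(numbers) / length(numbers)
  end
end
```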
Want to try typing, but not all the time? Elixir has got something for you! It’s cool, but there’s not enough space here to explain it. dialyxir makes the dialyzer tool quite usable, and it also goes beyond “just” type checking, including other static analysis features as well. Still, in case you don’t like types, it’s optional.
Parallelism
“Wait, you swore this wasn’t about performance! This is outrageous!”
Relax. While parallelism can be used for better performance, that’s not all it’s good for. What’s great about parallelism in Elixir/the Erlang VM in general is how low cost and seamless it is. Spawning a new process (not an operating system process – they are more like actors) is super easy and has a very low overhead, unlike starting a new thread. You can have millions of them on one machine, no problem.
Moreover, thanks to our immutability guarantees and every process being isolated, you don’t have to worry about processes messing with each other. So, first of all, if I want to geocode the pick-up and drop-off address in parallel, I can just do that with a bit of Task.async and Task.await. I’d never just trust whatever Ruby gems I use for geocoding to be threadsafe, due to global state et al.
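That bit looks something like this (Geocoder.geocode/1 standing in for whatever does the actual geocoding):

```elixir
pickup_task  = Task.async(fn -> Geocoder.geocode(pickup_address) end)
dropoff_task = Task.async(fn -> Geocoder.geocode(dropoff_address) end)

# both lookups now run concurrently; await collects the results
{Task.await(pickup_task), Task.await(dropoff_task)}
```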
How does this help? Well, I have something that is easily parallelizable and I can just do that. Benchee generates statistics for different scenarios in parallel just because I can easily do so. That’s nice, because for lots of samples it might actually take a couple of seconds per scenario.
Another point is that there’s less need for background workers. Let’s take web sockets as an example. To the best of my knowledge, in an evented architecture it is recommended to offload all bigger tasks in the communication to background workers, as we’d otherwise block our thread/operating system process from handling other events. In Phoenix, every connection already runs in its own Elixir process, which means they are already executed in parallel and doing some more work in one won’t block the others.
This ultimately makes applications easier, as you don’t have to deal with background workers, offloading work etc.
OTP
Defining what OTP really is, is hard. It’s somewhat of a set of tools for concurrent programming, and it includes everything from an in-memory database to the dialyzer tool mentioned earlier. It is probably most notorious for its “behaviours”, like supervisors, that help you build concurrent and distributed systems in the famous “Let it crash! (and restart it in a known good state, maybe)” philosophy.
There are big books written about this, so I’m not gonna try to explain it. Just this much: years of experience with highly available systems are in here. There is so much to learn here that can change how you see programming. It might be daunting, but don’t worry: people have built some nice abstractions that are better to use, and often it’s the job of a library or a framework to set these up. Phoenix and Ecto do this for you (for web requests and database connections respectively). I’ll out myself right now: I’ve never written a Supervisor or a GenServer for production purposes. I used abstractions or relied on what my framework brought with it.
If this has gotten you interested, I warmly recommend “The Little Elixir & OTP Guidebook”. It walks you through building a complete worker pool application, from a simple to a more complex, fully featured version.
Doctests
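A minimal made-up example:

```elixir
defmodule Calculator do
  @doc """
  Adds two numbers.

      iex> Calculator.add(2, 3)
      5
  """
  def add(a, b), do: a + b
end

# in the test file – this runs the iex session above as a test:
defmodule CalculatorTest do
  use ExUnit.Case, async: true
  doctest Calculator
end
```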
Imo the most underrated feature of Elixir. Doctests allow you to write iex example sessions in the documentation of a function. These are executed during test runs and check whether they still return the same values/still pass. They are also part of the awesome generated documentation. No more out of date/slightly wrong code samples in the documentation!
I have entire modules that rely only on their doctests, which is pretty awesome if you ask me. Also, contributing doctests to libraries is a pretty great way to provide both documentation and tests. E.g. once upon a time I wanted to learn about the Agent module, but it didn’t click right away, so I made a PR to Elixir with some nice doctests to help future generations.
A good language to learn
In the end, Elixir is a good language to learn. It contains many great concepts that can make your coding life easier and more enjoyable, and all of that lives in a nice and accessible syntax. Even if you can’t use it at work straight away, learning these concepts will influence and improve your code. I experienced the same when I learned and read a couple of books about Clojure; I never wrote it professionally, but it improved my Ruby code.
You might ask, “Should we all go and write Elixir now?” No. There are many great languages and there are no silver bullets. The ecosystem is still growing, for instance. I’m also not a fan of rewriting all applications. Start small. See if you like it and if it works for you.
Lastly, if this has piqued your interest, I have a whole talk up that focuses on the great explicit features of Elixir and explains them in more detail: “Elixir & Phoenix – fast, concurrent and explicit”
edit1: Clarified that the Bleacher Report blog post is mostly about performance, with little else
edit2: Fixed that you gotta specify the template by name, not the view
[1] It’s sort of like Traveling Salesman, put together with Knapsack and then you also need to decide what goes where. In short: a very hard problem to solve. ↑
Hello from the amazing Polyconf! I just gave my Stop Guessing and Start Measuring talk and if you are thinking “why do you post the slides of this SO MANY TIMES”, well the first one was an Elixir version, then a Ruby + Elixir version and now we are at a Poly version. The slides are mostly different and I’d say about ~50% of them are new. New topics covered include:
MJIT – what’s wrong with the benchmarks – versus TruffleRuby
“What’s the fastest way of doing this?” – you might ask yourself during development. Sure, you can guess, your intuition might be correct – but how do you know? Benchmarking is here to give you the answers, but there are many pitfalls in setting up a good benchmark and analyzing the results. This talk will guide you through, introduce best practices, and surprise you with some unexpected benchmarking results. You didn’t think that the order of arguments could influence its performance…or did you?
I’m at Elixirlive in Warsaw right now and just gave a talk. This talk is about benchmarking – the greater concepts but concrete examples are in Elixir and it works with my very own library benchee to also show some surprising Elixir benchmarks. The concepts are applicable in general and it also gets into categorizing benchmarks into micro/macro/application etc.
If you’ve been here and have feedback – positive or negative – please tell me 🙂
“What’s the fastest way of doing this?” – you might ask yourself during development. Sure, you can guess what’s fastest or how long something will take, but do you know? How long does it take to sort a list of 1 Million elements? Are tail-recursive functions always the fastest?
Benchmarking is here to answer these questions. However, there are many pitfalls around setting up a good benchmark and interpreting the results. This talk will guide you through, introduce best practices and show you some surprising benchmarking results along the way.
I’m the proudest and happiest about finally getting benchee_html out the door, along with great HTML reports including plenty of graphs and the ability to export them! You can check out the example online report or glance at this screenshot of it:
While benchee_csv saw some mere compatibility updates and benchee_json just transforms the general suite to JSON (which is then used in the HTML formatter), I’m particularly excited about the big new features in benchee and of course benchee_html!
Benchee
The 0.6.0 release is probably the biggest of the “core” benchee library so far, with some needed API changes and great features.
New run API – options last as keyword list
The “old” way was that you’d optionally pass in options as the first argument to run as a map, and then define the jobs to benchmark in another map. I did this because in my mind the configuration comes first, and maps are much easier to work with through pattern matching as opposed to keyword lists. However, having an optional first argument already felt kind of weird…
Thing is, that’s not the most Elixir way to do this. It is rather conventional to pass in options as the last argument, and as a keyword list. After voicing my concerns on the Elixir Forum, the solution was to allow passing in options as a keyword list but convert it to a map internally, to still have the advantage of good pattern matching among other advantages.
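The new style looks roughly like this (a sketch; option names from memory, so double-check against the docs):

```elixir
list = Enum.to_list(1..10_000)
map_fun = fn i -> [i, i * i] end

Benchee.run(%{
  "flat_map"    => fn -> Enum.flat_map(list, map_fun) end,
  "map.flatten" => fn -> list |> Enum.map(map_fun) |> List.flatten() end
},
  time: 3,
  formatters: [&Benchee.Formatters.HTML.output/1, &Benchee.Formatters.Console.output/1],
  html: [file: "samples_output/flat_map.html"]
)
```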
The old style still works (thanks to pattern matching!) – but it might get deprecated in the future. In this process, though, the run interface of the very first benchee version, which used a list of tuples, doesn’t work anymore 😦
Multiple inputs
The great new feature is that benchee now supports multiple inputs – so in one suite you can run the same functions against multiple different inputs. That is important, as functions can behave very differently on inputs of different sizes or with a different structure. Therefore it’s good to check the functions against multiple inputs. The feature was inspired by a discussion with José Valim on an elixir issue.
So what does this look like? Here it goes:
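In spirit it looks like this (a sketch; with inputs configured, each benchmarking function receives the current input as its argument):

```elixir
map_fun = fn i -> i + 1 end

Benchee.run(%{
  "map" => fn input -> Enum.map(input, map_fun) end
},
  inputs: %{
    "Small (1 Thousand)"    => Enum.to_list(1..1_000),
    "Middle (100 Thousand)" => Enum.to_list(1..100_000),
    "Big (10 Million)"      => Enum.to_list(1..10_000_000)
  }
)
```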
The hard thing about it was that it changed how benchmarking results had to be represented internally, as another level to represent the different inputs was needed. This led to quite some work both in benchee and in plugins – but in the end it was all worth it 🙂
benchee_html
This has been in the making for way too long; I should have released it a month or two ago. But now it’s here! It provides a nice HTML table and four different graphs – 2 for comparing the different benchmarking jobs and 2 for each individual job, to take a closer look at the distribution of run times of that particular job. There is a wiki page at benchee_html to discern between the different graphs, highlighting what they might be useful for. You can also export PNG images of the graphs at the click of a simple icon 🙂
Wonder how to use it? Well it was already shown earlier in this post when showing off the new API. You just specify the formatters and the file where it should be written to 🙂
But without further ado you can check out the sample report or just take a look at these images 🙂
Closing Thoughts
Hope you enjoy benchmarking with different inputs and then seeing great reports of them. Let me know what you like about benchee or what you don’t like about it and what could be better.