Looking for a job!

It’s that time: I’m looking for a job! You can find my CV and all relevant links over at my website: CV Web, CV PDF

Quick links that might interest you: website, github, twitter and blog

Who am I and why would you want to hire me?

My name is Tobi, but online I’m better known as PragTob. I am a full-stack engineer & leader deeply interested in agile methodologies, web technologies, software crafting and teaching. I love creating products that help people and am fascinated with the human side of software development.

To have a look at some of the technical topics that I’m involved with you can check out this blog but the core technologies I’ve worked with are Ruby, Elixir, PostgreSQL and JavaScript. I enjoy pair programming, mentoring, TDD, naming discussions and clean code bases. I also have a passion for performance and eliminating bottlenecks. I maintain a variety of open source projects mostly in Ruby (f.ex. simplecov) and Elixir (f.ex. benchee).

While I have this technical background, I believe that the so called “soft”/people/social skills are more important for developers as we all work together in teams. That is even more vital when looking at any kind of lead or even management position, as mentoring, communicating and collaborating is at the heart of what people in these positions should do. I’m deeply interested in topics such as motivation and communication. In my last job I was responsible for a team of ~15 people (together with the CTO). We relied on constant feedback through retrospectives to adapt our processes and grow the team through introducing the concept of seasons, going remote first, enhancing our onboarding and many more improvements. I also did regular one-on-ones, interviewed candidates, facilitated retrospectives and other meetings as well as doing knowledge sharing sessions for the whole team.

Other than these I’ve been mentoring at Rails Girls Berlin/code curious for more than 7 years and am coaching my Rails Girls project group, the rubycorns, for more than 6 years. I also run the Ruby User Group Berlin, and give presentations at various conferences.

My CV goes into more detail about what I’ve done.

What am I looking for?

I’m looking for a company where we build something useful together and where I can have an impact on product, people and process. With more concrete points this amounts to:

Position: I’m primarily looking for lead positions with responsibility so CTO/VP of Engineering/Head of Engineering/Team Lead – all depending on company size and culture. In the right circumstances, I can also imagine working as a Senior Developer. I want to be in an environment where my impact goes beyond code as I love to help people and improve processes. I like to be where I can help the team & the company the most, sometimes that’s mentoring and sharing knowledge, sometimes that’s fixing critical performance bugs and sometimes that’s participating in an all day workshop at the headquarters of a potential client.

Location: Berlin or potentially remote. I’m not looking to relocate currently.

Employment/Freelance: I’m looking for full time employment (reduced hours would also be ok) but am also open to freelance work.

Field of Work: I’m looking for a company that helps people solve real problems. I want a purpose I can get behind. For me that means no crypto currencies, advertisement or fintech for the rich. In a similar vein I’m not particularly interested in consultancy/agency work as I don’t like to travel every week and really like to work at a product company where I can really dive into the domain.

Diversity: I believe that the best products and work environments are created by diverse teams. Hence, this should be a core value applied through all levels of an organization. Or at least, the problem should be recognized and active work to counter it be put forth.

Company Culture: I’m looking for a company that trusts its employees and is open. It should also support a sustainable pace. Regular overtime is nothing to be proud of. Knowledge sharing should be key and therefore asking questions should be encouraged.

Time Allocation: I love companies that trust their employees and allow them flexible working hours and locations. Meaning it’s ok to work remotely for a couple of days or from home. Historically, I did some of my best work checking out emails and pull requests at home in the morning and then biking to work afterwards. Beloved refactorings have also emerged while dog sitting.

Benefits: Giving people time and room to grow is important to me. Particularly I usually speak at a couple of conferences during a year and think going there should be work, not vacation time.

Languages: I have expert knowledge of Ruby and Elixir. Working with Elixir would be a plus, but not a must. I also like working with JavaScript and am not afraid to do CSS albeit my CSS has gotten rather rusty sadly. I’m also naturally open to other languages, while I’m certainly most effective in the mentioned language the right company with the right culture and purpose is more important. Particularly Rust would be very interesting to me, granted I’m not too good at it (yet).

Open Source: An employer that values contributing back to open source or sharing their own creations would be a big plus, even more so if it happened to be full OSS work.

I’m aware that it’s hard to tick all of these boxes, and if you did hey we might be a great match. I just wanted to communicate wishes and values clearly as a starting point.

Getting in touch

You can send me an email at pragtob@gmail.com – otherwise my Twitter direct messages are also open.

Please don’t cold call me 🙂

 

Released: Statistex 1.0 an Elixir Statistics calculation library

I just released statistex – a library to calculate statistics based on a given data sample. It’s an extraction from my benchmarking library benchee and part of what I’ll henceforth call the “benchee library family”. As it’s been running in benchee basically since its inception I dubbed it 1.0.

The extraction is good because it helps the benchee code base focus on benchmarking and not things around it. It removes about 800 lines from the repo and it makes that code reusable for everyone else. It also makes it easier to introduce new statistics as it’s clearer that we’ll first introduce the logic inside statistex and later on just display it in benchee and friends. It’s good design.

Do we really need another statistics library?

I struggled with this question. I don’t want to split the eco system unnecessarily. To be honest, I thought there was no other library out there. There are at least statistics and Numerix, though. Not sure how I missed them when I started incorporating more advanced statistics.

Both include more functions than just for statistics: general math and more (drawing of random values for instance). They also support more statistical values than statistex at the time of this writing. So if you’re looking for something, that statistex doesn’t provide (yet) these are some of the first places I’d look.

Why would you still want to use statistex? Well there are some things it offers that the others don’t (to the best of my knowledge):

statistics/2 – Give me everything!

As mentioned before, statistex was extracted and the way it was used in benchee is just “give me all the statistics so that formatters can display them!” – so there’s a function that does exactly this:

What’s cool about this? Well, little effort, big result! This is quite nice to explore data sets in iex for instance. Behind the scenes statistex reuses previously calculated values so that no value is calculated twice. For instance first you get the sampe_size and the total, both are then used to calculate the average. The average and sample_size are then passed on to calculate the variance and so forth. This way statistex is fast by not duplicating work if you want a bunch of statistical values (and benchee wants most of them).

But you don’t want all of these values but would still like to reuse previously calculated values? Got you covered!

Manually reuse previously calculated values

Say you want to calculate the variance but have already calculated the average and sample_size. Easy:

Like variance/2 a lot of function take an optional keyword list as arguments where you can provide previously calculated values (options are of course documented).

Raising on empty list input

Thanks to a discussion on elixir forum I made the decision to raise an ArgumentError when statistex is handed an empty list in most functions. Why? The statistics don’t make any sense at this point and it’s likely some kind of error either way/you probably want to display something other than a bunch of nils.

I know many won’t consider this a feature, I quite like the direction it pushes client code towards, though.

Is this enough for a new library?

I think so. 🙂 Especially getting all values at once and reusing previously calculated values are incredibly valuable properties. I also like that statistex is solely focussed on statistics. And the statistics it can’t compute yet? We’ll catch up on that over time. Moreover, it’s not like I spent some huge amount of work writing it as it was a simple extraction.

I’d be happy if you gave statistex a trial run and left some feedback.

 

Slides: Functioning Among Humans (RUG-B)

In a trial run for Heart of Clojure I gave my talk Functioning Among Humans yesterday at RUG::B. It covers topics very close to my heart and is a talk I wanted to give forever. It’s sort of a “best of” soft/social/people skills that I learned going back all the way to high school and that are useful to me in every day life.

If you happened to have seen the talk, please feel free to reach out to me with feedback as I still want to improve upon it for future versions.

Slides can be viewed here, on speaker deck, slideshare or PDF

Abstract

In the development world most people are striving for technical excellence: better code, faster run times, more convenient interfaces, better databases… But is that really what helps us create better software?

In the end software development is done by groups of people creating products together. To do that communication and collaboration are essential. You can be the best programmer ever, but if you can’t efficiently work with others what good does it do you?

This talk will introduce you to relevant, easy to grasp concepts of collaboration and communication as well as give you food for thought.

Finding Bugs with Property Based Testing in a Statistics Calculation

I always loved the idea of Property Based Testing, it was the most mind blowing idea I encountered while learning clojure many years back. However, I always found it hard to apply in practice. I barely encountered the “easy” case where an operation was reversible (if I encode a term and then decode it again it should be equal to the original term, for instance). And the properties I came up always seemed to loose to catch any bugs.

So, I thought well let’s give it another shot. I got all these statistics functions in benchee/statistex – data should be easy to generate for them. Let’s give it a shot, we’ll surely not find a bug… or will we?

The first property based tests

If nothing else, I wanted to make sure no matter what numbers you throw at my statistics module it won’t blow up. To implement it I used stream_data:

This is what I came up with – our samples are any non empty list of floats and there are a bunch of checks that make sure the values are somewhere between minimum and maximum or bigger than 0. No way the tests are failing…

Wait, the tests are failing?!

 Failed with generated values (after 2 successful runs):

* Clause: samples <- list_of(float(), min_length: 1) Generated: [-9.0, -1.0] Assertion with >= failed
code: assert stats.standard_deviation_ratio() >= 0
left: -1.131370849898476
right: 0

Honestly, I was shocked. On closer inspection, the standard deviation ratio was negative when I said it should always be positive. As the generated sample only contains negative numbers the average is negative as well. As the ratio is calculated by dividing the standard deviation by the average it turned out to be negative. Usually I only work with positive samples, hence it never occurred before. The ratio should still always be positive so an abs/1 call fixed it.

Another property

Thinking more I came up with another property:

It’s much like the first properties, just making sure the percentile values are in order as they should there is absolutely no possibility that this will fail, absolutely none, well tested code… no chance it will fail …

IT FAILED AGAIN?!?!?!

 Failed with generated values (after 4 successful runs):

* Clause: samples <- list_of(float(), min_length: 1)
Generated: [1.0, 33.0]

Assertion with <= failed
code: assert percies[25] <= percies[50]
left: 25.0
right: 17.0

Wait, the 25th percentile is bigger than the 50th percentile? No way that’s ok.

A lot of digging, googling and reading our original source for implementing percentile interpolation later I figured out the problem. Basically interpolation for small sample sizes is hard and also uncommon. We missed a clause/case stated in the source, that points out that for a too small percentile and sample size the value is to be set to the minimum.

Note that any p ≤ 1/(N+1) will simply be set to the minimum value.

Our p was 0.25 (25th percentile) and 0.25 <= 1/3. Implementing this clause (through guard clauses) fixed the test.

You can check out the full implementation and bug fixes in the PR.

Learnings

The generation part was super easy in the case shown. However, what’s impressive to me is that although the properties were very loosely defined they still uncovered 2 bugs. And that’s in code that both me and many of you have been running for quite a while in benchee. Sure, they are very specific edge cases but that’s what property based testing is good at: Finding edge cases!

If you have other ideas for properties to check, I’m happy to listen and learn. And give property based testing a shot yourselves even with very loose properties – you might be surprised what you find.

 

What’s the Fastest Data Structure to Implement a Game Board in Elixir?

Ever wanted to implement something board game like in Elixir? Chess? Go? Islands? Well, then you’re gonna need a board!

But what data structure would be the most efficient one to use in Elixir? Conventional wisdom for a lot of programming languages is to use some sort of array. However, most programming languages with immutable data structures don’t have a “real” array data structure (we’ll talk about erlangs array later, it’s not really like the arrays in non functional languages) . Elixir is one of those languages.

As I like board games this was one of the first questions I ever asked the community. It’s also an interesting and relatable example to see and understand the performance trade-offs of different data structures.

Complete sources can be found in my elixir_boards_benchmark repo.

Benchmark Design

For this benchmark I didn’t have a very specific board game in mind so I settled for a board size of 9×9 . It’s a bit bigger than a normal chess board (8×8), it’s exactly the size of the smallest “normal” Go-board and it’s one smaller than the board used in Islands implemented in Functional Web Development with Elixir, OTP and Phoenix, so it seemed like a good compromise. Different sizes are likely to sport different performance characteristics.

Without a concrete usage scenario in mind I settled on a couple of different benchmarks:

  • getting a value at the coordinates (0,0), (4, 4) and (8,8). This is a fairly nano/micro benchmark for data access and provides a good balance of values at the beginning/middle/end when thinking in list terms.
  • setting a value at the coordinates (0,0), (4, 4) and (8,8).
  • a still nano/micro benchmark that combines the two previous benchmarks by getting and setting all three mentioned values. I call this “mixed bag”.
  • Why stop at the previous one? The last benchmark just sets and gets every possible coordinate once (first it sets (0,0) then gets it, then it sets (0, 1), then gets it and so forth). This also simulates the board filling which can be important for some data structures. Completely filling a board is unrealistic for most board games however, as most games finish before this stage. This one is called “getting and setting full board”.

Something that is notably not benchmarked is the creation of boards. For (almost) all of the board implementations it could resolve to a constant value which should be similar in the time it takes to create. I wasn’t overly interested in that property and didn’t want to make the code less readable by inlining the constant after creation when I didn’t need to.

Also noteworthy is that these benchmark mostly treat reading and writing equally while in my experience most AIs/bots are much more read-heavy than write-heavy.

Take all these caveats of the benchmark design into consideration when looking at the results and if in doubt of course best write your own benchmark taking into account the concrete usage patterns of your domain.

Without further ado then let’s look at the different implementations I have benchmarked so far:

Contenders

All boards need to implement a simple Board behaviour:

All boards are built so that accessing a previously unset field will return nil. No assumptions about the data stored in the board have been made, which rules out String as an implementation type. In the benchmarks atoms are used as values.

In the descriptions of the data types below (x, y) is used to mark where what value is stored.

  • List2D: A 2 dimensional list representing rows and columns: [[(0, 0), (0, 1), (0, 2), ...], [(1, 0), (1, 1), ..], ..., [..., (8, 8)]]
  • List1D: Using the knowledge of a constant board size you can encode it into a one-dimensional list resolving the index as dimension * x + y: [(0, 0), (0, 1), (0, 2), ..., (1, 0), (1, 1), ..., (8, 8)]
  • Tuple2D: Basically like List2D but with tuples instead of lists: {{(0, 0), (0, 1), (0, 2), ...}, {(1, 0), (1, 1), ..}, ..., {..., (8, 8)}}
  • Tuple1D: Basically like List1D but with a tuple instead of a list: {(0, 0), (0, 1), (0, 2), ..., (1, 0), (1, 1),... (8, 8)}
  • Array2D: erlang arrays aren’t exactly a common sight, even learn you some Erlang basically skims over them and says to be cautious when using them. I even forgot about them for the first version of this post 😅. They internally map to tuple usage in an interesting way that will be discussed/illustrated further below. With that out of the way, conceptually this is much like Tuple2D.
  • Array1D: see above for the data structure in general, otherwise conceptually like Tuple1D.
  • MapTuple: A map that takes the tuple of the coordinates {x, y} as the key with the value  being whatever is on the board: %{{0, 0} => (0, 0), {0, 1} => (0, 1), ..., {8, 8} => (8, 8)}. It’s a bit unfair compared to others shown so far as it can start with an empty map which of course is a much smaller data structure that is not only smaller but usually faster to retrieve values from. As the benchmarks start with an empty board that’s a massive advantage, so I also included a full map in the benchmark, see next/
  • MapTupleFull: Basically the same as above but initialized to already hold all key value pairs initialized as nil. Serves not only the purpose to see how this performs but also to see how MapTuple performs once it has “filled up”.
  • MapTupleHalfFull: Only looking at complete full performance and empty performance didn’t seem good either, so I added another one initialized from 0 to 4 on all columns (a bit more than a board half, totalling 45 key/value pairs).
  • MapTupleQuarterFull: Another one of these, this time with 27 key/value pairs. Why? Because there is an interesting performance characteristic, read on to find out 🙂
  • Map2D: Akin to List2D etc. a map of maps: %{0 => %{0 => (0, 0), 1 => (0, 1), ...}, 1 => %{0 => (1, 0), ...}, ..., 8 => %{..., 8 => (8, 8)}}
  • ETSSet: erlang ETS storage with table type set. Storage layout wise it’s basically the same as MapTuple, with a tuple of coordinates pointing at the stored value.
  • ETSOrderedSet: Same as above but with table type ordered_set.
  • ProcessDictionary: On a special request for Michał 😉 This is probably not a great default variant as you’re practically creating (process-) global state which means you can’t have two boards within the same process without causing mayham. Also might accidentally conflict with other code using the process dictionary. Still might be worth considering if you want to always run a board in its own process.

It’s significant to point out that all mentioned data types except for ETS and the process dictionary are immutable. This means that especially for those in the benchmark a new board is created in a before_each hook (does not count towards measured time) to avoid “contamination”.

Another notable exception (save for String for the aforementioned constraints) is Record. Records are internally represented as tuples but give you the key/value access of maps, however in elixir it is more common to use Structs (which are backed by maps). As both maps and tuples are already present in the benchmark including these likely wouldn’t lead to new insights.

System Setup

Operating System Linux
CPU Information Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Number of Available Cores 8
Available Memory 15.61 GB
Elixir Version 1.8.2
Erlang Version 22.0

Benchmarking Results

Benchmarks of course were run with benchee and the benchmarking script is here (nothing too fancy).

You can check them out in the repo as markdown (thanks to benchee_markdown) or HTML reports (benchee_html). Careful though if you’re on mobile some of the HTML reports contain the raw measurements and hence go up to 9MB in size and can take a while to load also due to the JS drawing graphs!

The results of getting and setting full board:

runtime_final.png
getting and setting full board iterations per second (higher is better)

It’s a tight race at the top when it comes to run time! Tupl1D, Tuple2D and MapTuple are all within striking range of each other and then there’s a sharp fall off.

Also there is a fair bit of variance involved as shown by the black “whiskers” (this is usual for benchmarks that finish in nanoseconds or microseconds because of garbage collection, interference etc.). Which one of these is best? To get a better picture let’s look at the whole table of results:

Name IPS Average Deviation Median Mode Minimum Maximum
Tuple1D 133.95 K 7.47 μs ±23.29% 6.93 μs 6.88 μs 6.72 μs 492.37 μs
Tuple2D 132.16 K 7.57 μs ±29.17% 7.21 μs 7.16 μs 7.03 μs 683.60 μs
MapTuple 126.54 K 7.90 μs ±25.69% 7.59 μs 7.56 μs 7.43 μs 537.56 μs
ProcessDictionary 64.68 K 15.46 μs ±14.61% 15.12 μs 15.05 μs 14.89 μs 382.73 μs
ETSSet 60.35 K 16.57 μs ±9.17% 16.04 μs 15.95 μs 15.79 μs 161.51 μs
Array2D 56.76 K 17.62 μs ±17.45% 17.15 μs 17.04 μs 16.54 μs 743.46 μs
MapTupleFull 55.44 K 18.04 μs ±11.00% 16.92 μs 16.59 μs 16.43 μs 141.19 μs
MapTupleHalfFull 53.70 K 18.62 μs ±8.36% 17.96 μs 17.87 μs 17.67 μs 160.86 μs
Array1D 50.74 K 19.71 μs ±10.60% 19.29 μs 18.99 μs 18.81 μs 469.97 μs
ETSOrderedSet 39.53 K 25.30 μs ±10.51% 24.82 μs 24.57 μs 24.34 μs 390.32 μs
Map2D 36.24 K 27.59 μs ±8.32% 27.71 μs 25.90 μs 25.12 μs 179.98 μs
List2D 29.65 K 33.73 μs ±4.12% 33.31 μs 33.04 μs 31.66 μs 218.55 μs
MapTupleQuarterFull 28.23 K 35.42 μs ±3.86% 34.96 μs 34.61 μs 34.39 μs 189.68 μs
List1D 15.41 K 64.90 μs ±2.84% 64.91 μs 64.14 μs 62.41 μs 175.26 μs

Median, and Mode are good values to look at when unsure what is usually fastest. These values are the “middle value” and the most common respectively, as such they are much less likely to be impacted by outliers (garbage collection and such).  These seem to reinforce that Tuple1D is really the fastest, if by a negligible margin.

MapTuple is very fast, but its sibling MapTupleFull, that already starts “full”, is more than 2 times slower. Whether this is significant for you depends if you start with a truly empty board (Go starts with an empty board, chess doesn’t for instance).

Somewhat expectedly List1D does worst as getting values towards to the end of the list it has to traverse the entire list which is incredibly slow.

As an aside, it’s easy to see in the box plot that the high deviation is mainly caused by some very big outliers:

boxplot_final.png
Boxplot of getting and setting full board – dots are outliers

The dots denote outliers and they are so big (but few) that the rest of the chart is practically unreadable as all that remains from the actual box is practically a thick line.

What about memory consumption?

memory_final.png
getting and setting full board memory usage (lower is better)

Here we can see the immediate drawback of Tuple1D – it’s memory consumption is many times worse than that of the others. My (educated) guess is that it’s because it has to replace/copy/update the whole tuple with it’s 9*9 = 81 values for every update operation. Tuple2D is much more economical here, as it only needs to to update the tuple holding the columns and the one holding the specific column we’re updating (2 * 9 = 18) to the best of my understanding.

Big Tuples like this are relatively uncommon in “the real world” in my experience though as their fixed size nature makes them inapplicable for a lot of cases. Luckily, our case isn’t one of them.

MapTuple does amazingly well overall as it’s probably the structure quite some people would have intuitively reached for for good constant memory access speed. It’s memory consumption is also impressively low.

ProcessDictionary is very memory efficient and also constantly in the top 4 when it comes to run time. However, at least run time wise there’s quite the margin ~15 μs to ~7 μs which doesn’t seem to make the risks worth it overall.

Other Observations

Let’s take a look at some other things that seem note worthy:

ETS isn’t the winner

This surprised me a bit (however I haven’t used ETS much). ETS was always tagged as the go to option for performance in my mind. Looking at the docs and use cases I know it makes sense though – we’re likely to see benefits for much larger data sets as ours is relatively small:

These (ETS) provide the ability to store very large quantities of data in an Erlang runtime system, and to have constant access time to the data.

81 values hardly qualifies as “very large”.

Don’t blindly follow conventional “wisdom” – always benchmark! 💪

get(0,0) vs. get(8,8)

Let’s have a look at some of the time it takes to retrieve a value – usually a much more common operation than writing:

get(0,0)

Name IPS Average Deviation Median Mode Minimum Maximum
Tuple1D 44.12 M 22.66 ns ±842.77% 20 ns 20 ns 9 ns 35101 ns
Tuple2D 42.46 M 23.55 ns ±846.67% 20 ns 19 ns 7 ns 36475 ns
Array1D 30.38 M 32.92 ns ±84.61% 32 ns 32 ns 20 ns 8945 ns
MapTuple 29.09 M 34.38 ns ±111.15% 32 ns 31 ns 19 ns 10100 ns
MapTupleQuarterFull 18.86 M 53.03 ns ±37.27% 50 ns 49 ns 38 ns 2579 ns
Array2D 18.62 M 53.70 ns ±67.02% 50 ns 49 ns 34 ns 10278 ns
List1D 18.26 M 54.75 ns ±56.06% 53 ns 52 ns 42 ns 8358 ns
ProcessDictionary 17.19 M 58.18 ns ±1393.09% 52 ns 51 ns 39 ns 403837 ns
Map2D 15.79 M 63.34 ns ±25.86% 60 ns 54 ns 41 ns 388 ns
MapTupleHalfFull 10.54 M 94.87 ns ±27.72% 91 ns 89 ns 76 ns 2088 ns
MapTupleFull 10.29 M 97.16 ns ±18.01% 93 ns 89 ns 70 ns 448 ns
ETSSet 9.74 M 102.63 ns ±26.57% 100 ns 99 ns 78 ns 2629 ns
List2D 9.04 M 110.57 ns ±69.64% 105 ns 109 ns 82 ns 4597 ns
ETSOrderedSet 6.47 M 154.65 ns ±19.27% 152 ns 149 ns 118 ns 1159 ns

get(8, 8)

Name IPS Average Deviation Median Mode Minimum Maximum
Tuple2D 42.47 M 23.55 ns ±788.60% 21 ns 20 ns 7 ns 33885 ns
Tuple1D 40.98 M 24.40 ns ±725.07% 22 ns 21 ns 10 ns 34998 ns
Array1D 29.67 M 33.70 ns ±161.51% 33 ns 32 ns 21 ns 18301 ns
MapTuple 28.54 M 35.03 ns ±986.95% 32 ns 32 ns 20 ns 230336 ns
ProcessDictionary 19.71 M 50.73 ns ±1279.45% 47 ns 47 ns 34 ns 377279 ns
Array2D 17.88 M 55.92 ns ±85.10% 52 ns 51 ns 35 ns 13720 ns
Map2D 13.28 M 75.31 ns ±32.34% 73 ns 65 ns 56 ns 2259 ns
MapTupleHalfFull 12.12 M 82.53 ns ±31.49% 80 ns 80 ns 60 ns 1959 ns
ETSSet 9.90 M 101.05 ns ±16.04% 99 ns 95 ns 78 ns 701 ns
MapTupleFull 9.85 M 101.53 ns ±19.29% 99 ns 90 ns 70 ns 487 ns
ETSOrderedSet 5.59 M 178.80 ns ±41.70% 169 ns 170 ns 135 ns 4970 ns
MapTupleQuarterFull 4.09 M 244.65 ns ±16.85% 242 ns 240 ns 226 ns 9192 ns
List2D 3.76 M 265.82 ns ±35.71% 251 ns 250 ns 231 ns 9085 ns
List1D 1.38 M 724.35 ns ±10.88% 715 ns 710 ns 699 ns 9676 ns

The top 3 remain relatively unchanged. What is very illustrative to look at is List1D and List2D though. For get(0, 0) List1D vastly outperforms its 2D sibling even being closest to the top group. That is easy to explain because it basically translates to looking at the first element of the list which is very fast for a linked list. However, looking at the last element is very slow and this is what get(8, 8) translates to. All elements have to be traversed until the end is reached. As such the whole thing is almost 16 times slower for List1D. List2D is still very slow but through it’s 2-dimenstional structure it only needs to look at 18 elements instead of 81.

MapTuple vs. MapTupleQuarterFull vs. MapTupleHalfFull vs. MapTupleFull

In most scenarios, including the biggest scenario, MapTupleQuarterFull performs worse than MapTuple (expected), MapTupleHalfFull (unexpected) and MapTupleFull (unexpected). I had expected its performance to be worse than MapTuple but better than MapTupleFull and MapTupleHalfFull. Why is that?

I had no idea but Johanna had one: it might have to do with the “magic” limit at which a map “really” becomes a map and not just a list that is linearly searched. That limit is defined as 32 entries in the erlang source code (link also provided by Johanna). Our quarter full implementation is below that limit (27 entries) and hence often performance characteristics more akin to List1D (see good get(0, 0) performance but bad get(8, 8) performance) than its “real” map cousins.

To the best of my understanding this “switch the implementation at size 32” is a performance optimization. With such a small data set a linear search often performs better than the overhead introduced by hashing, looking up etc. You can also see that the trade-off pays off as in the big benchmark where the whole board is filled incrementally MapTuple (which is initially empty and grows) still provides top performance.

What I still don’t fully understand is that sometimes MapTupleFull seems to still outperform MapTupleHalfFull – but only by a very negligible margin (most notably in the “big” getting and setting full board benchmark). The difference however is so small that it doesn’t warrant further investigation I believe, unless you have an idea of course.

Performance difference of Array vs. Tuple

In the introduction I said arrays are backed by tuples – how come their performance is way worse then? Well, let’s have a look at what an array actually looks like:

iex(3)> mine = :array.new(81, default: nil)
{:array, 81, 0, nil, 100}
iex(4)> :array.set(13, :boom, mine)
{:array, 81, 0, nil,
{10, {nil, nil, nil, :boom, nil, nil, nil, nil, nil, nil}, 10, 10, 10, 10, 10,
10, 10, 10, 10}}

It cleverly doesn’t even initialize all the fields but uses some kind of length encoding saying “the value is the default value of nil for the next 100 fields” but also saving its set size limit of 81 (fun fact: these arrays can be configured to also dynamically grow!).

Once we set a value (at index 13) the representation changes showing still some length encoding “there is nothing here for the first 10 entries” but then the indexes 10..19 are expanded as a whole tuple that’s holding our value. So, to the best of my understanding arrays work by adding “stretches” of tuples the size of 10 as they need to.

In general this is a performance optimization especially making writes/updates faster as compared to huge tuples as mainly the 10-tuple holding the concrete value needs to get updated instead of the whole thing.

However, our custom tuple implementations are perfectly sized to begin with and not too huge. Moreover, their whole size being set at compile-time probably enables some optimizations (or so I believe).  Hence the tuple implementations outperform them while arrays don’t do too shabby (especially with read access) as compared to other implementations.

Conclusion

Tuples can be very good for the use case of known at compile time sized collections that need fast access and a simple flat map performs amazingly well. All that least for the relatively small board size (9×9 = 81 fields) benchmarked against here. There is a big caveat for the map though – it is so fast if we can start with an empty map and grow it in size as new pieces are set. The completely initialized map (MapTupleFull) performs way worse, tuples are the clear winners then.

Missing a data structure? Please do a PR! There’s a behaviour to implement and then just to lists to add your module name to – more details.

Update 1 (2019-06-17): Fixed MapTupleHalfFull. Before the update it was actually just quarter full 😅 which has wildly different performance characteristics for reasons now described along with the MapTupleQuarterFull implementation. Thanks goes to Johanna for pointing that out. Also the process registry has been added as another possible implementation on a suggestion from Michał 😉 . Also added a run time box plot to show outliers clearer and visually.

Update 2 (2019-06-18): Added and investigated Arrays thanks to /u/Hauleth over on reddit. Also added a remark about records thanks to /u/friendlysock over on lobste.rs.

You may not need GenServers and Supervision Trees

The thought that people seem to think of GenServers and supervision trees in the elixir/erlang world as too essential deterring people from using these languages has been brewing in my mind for quite some time. In fact I have written about it before in Elixir Forum. This is a summary and extended version of that (including some choice replies) to give it more visibility.

It feels like we talk so much about GenServers etc. that people who come to Elixir feel like they need to use them or they are not “really” using Elixir. I fear that it keeps people out of the elixir community because all this just seems so complicated. However, you can write great applications without ever writing them yourself. I emphasize yourself because a lot of the libraries you’ll use (phoenix, ecto etc.) already have their supervision trees and their processes – you’re standing on the shoulders of giants truly. Still,  I hear people say something to the tune of “We’re still using this like Rails – we should use GenServers” – without any need or concrete reasoning. Having something as simple as rails (in parts, let’s not derail about rails here) but at the same time fully parallel and more efficient is a huge boon to my mind.

I feel like hardly anyone ever highlights this. I’ve been programming elixir since ~2015. I’ve never written a production GenServer or supervision tree (and I have multiple applications in production and multiple published libraries). I simply didn’t encounter these problems or the eco system took care of it for me. Still I felt somewhat inadequate because of it, as everyone seem to already be talking about these. Which is also understandable, they are unique, they solve hard problems – there’s lots to learn and share knowledge.

I’ve had this thought that often you don’t really need GenServers and supervision trees in my head for a long time (~2016) but was too afraid to voice it. Maybe I just don’t understand it enough? Maybe I need to learn more about them? Everyone always talks about them, I’m sure enlightenment will come to me some time soon! I finally nervously wrote the aforementioned Elixir Forum thread in 2018. My nervousness went down a notch after José Valim, creator of elixir somewhat hyper productive and omnipresent, liked the post within ~15 minutes of its creation. Also most people were really supportive, hence I’m happy to share this more openly, sorry for the delay a lot had been going on in my life 😉

Be mindful that I don’t say you don’t need them – I’m merely saying that you can build significant and great elixir applications without writing your own GenServers and supervision trees. It’s of course still a topic worth learning.

GenServers? Supervision Trees? Processes?

If you’ve ever watched a talk about elixir or erlang you’re likely to have heard about these. They’re one of the “killer features” of erlang and elixr. They give you that famed parallelism backed by the actor model, reliability and that whole “Let it crash! (and restart in a known good state)”. Both GenServer and Supervisor are behaviours showing a process “how to behave” and they both ship with Erlang/OTP.

So what’s the problem with Genservers and Supervision Trees?

GenServers, supervisors etc. are great technologies that help you solve problems. They’re one of the things that is most special & unique about elixir & erlang. As a result lots of conference talks, blog posts etc. focus on them and it seems everyone wants to use them. Through the big focus on them in the community it sometimes feels like you can’t be a “real” elixir/erlang programmer until you’ve used and mastered them.

However, do you need them all the time? At least while using a framework (like Phoenix), chances are you don’t. The hidden detail of course is that you are using GenServers and friends without even knowing it – Phoenix runs every request and every channel[1] in their own processes. Ecto has its own pool for your database connections. It’s already parallelized and you don’t need to take care of it. That’s the beauty of it. What I’m saying is that in the standard situation the eco system takes care of you.

Building a relatively standard CRUD web application with Phoenix? No need.
Just using channels for a chat like applications in Phoenix? You’re good.

Of course, you don’t need them until you really do. I like how Jordan put it:

I agree that you, 99% of the time you do not need to use anything otp related but when you are in that 1% its really a game changer that makes elixir as great as it is.

The Bad and the Ugly

It’s not like using GenServers and friends makes everything instantly better. Quite the opposite, it can make your code both harder to understand and slower.

Sometimes the introduction of processes just complicates the code, what could have been a simple interaction is now obfuscated through a bunch of GenServer calls. Through trying to use these concepts you can also essentially grind your application to a halt performance wise. To take an example from the excellent Adopting Elixir, which also covers this topic:

A new developer team started building their Phoenix applications.
They had always heard GenServers could be treated like microservices but even
tinier. This “wisdom” led them to push all of their database access control to
GenServers .
(…)
performance was abysmal. Under high-enough load, some pages took 3 seconds to render because they built a bottleneck where none existed. They
defeated Ecto connection pools because all access happened through a single
process.
In essence, they made it easy to create global, mutable variables in Elixir. They
essentially crippled the single biggest advantage of functional languages, for
no gain whatsoever.

When to GenServer?

Adopting Elixir again provides some guidance as to what to best use processes for:

  • Model state accessed by multiple processes.
  • Run multiple tasks concurrently.
  • Gracefully handle clean startup and exit concerns.
  • Communicate between servers.

They especially highlight that GenServers aren’t to be used for code organization.

Robert Virding (one of the creators of Erlang) also chimed in and his response is so measured that I want to quote it in full:

I think the most important thing to understand and to use properly is the concurrency in the problem/solution/system. Using GenServers and other behaviours is just one way of doing the concurrency but it is not the only way. They are tools and like all tools they need to be used in the right way. The problem is to get the right level of concurrency which suites your problem and your solution to that problem. Too much concurrency means you will be doing excess work to no real gain, and too little concurrency means you will be making your system too sequential.

Now, as has been pointed out, many packages like Phoenix already provide a pretty decent level of concurrency which is suitable for many types of applications, at least the ones they were intended for. They will do this automatically so you don’t have to think about it in most cases, but it is still there. Understanding that is necessary so you can work out how much concurrency you need to explicitly add if any. Unfortunately because it is all managed for you “invisibly underneath” many don’t realise that is it there.

While people agree in general, there are also some that say that most systems can benefit from a GenServer and supervision tree. As Saša Jurić points out:

With a lot of hand waving, I’d say that GenServers are OTPs built-in building block for building responsive services, Tasks are the same for non-responsive ones, and supervision tree is the built-in service manager like systemd or upstart. In the past 10+ years of my backend side experience, I’ve worked on small to medium systems, and all of them needed all of these technical approaches.

He also has a fairly extensive blog post on the topic of when to reach for these tools and when not to: To spawn, or not to spawn.

In the end, I’m also not an expert on GenServers and Supervision Trees – as I said I never wrote a production one. Still learning and still growing. I think knowing them well gives you a good basis to make informed decisions on when to use them and when not to.

Abstractions

But you came to Elixir for the parallelism! So you need GenServers right? No.

Elixir core and the community have been very good at providing easy to use solutions to write fully parallel programs without having to write your own GenServers and supervision trees. There are gen_stage, flow and broadway, but somewhat more importantly a couple of these are built-ins like Task (do something in parallel easily) and Agent (share state through a process).

Want to geocode the pick up and drop off addresses of a shipment in parallel and then wait until both have finished? Say no more:

Learning

Although I’m saying you don’t need it and can write applications without it just fine, it’s a fascinating and interesting topic that can make you a better programmer without ever writing your own supervision trees even. And as I said, education is key to know when to reach for them and when not to. So here are a couple of books I can recommend to up your knowledge:

  • The Little Elixir & OTP Guide Book – my personal favorite, in it the author takes you through writing implementations of poolboy starting with a simple one, showing what shortcomings it has and then extending on it building a complicated supervision tree but you get the reasoning to go along with it to see for what feature what level of complexity is needed. A fascinating read for me.
  • Adopting Elixir – covers many aspects as mentioned before but is especially good at what to use these concepts for and what not to use them for (and is an interesting read overall if you consider to get your company into elixir)
  • Elixir in Action – Saša Jurić is one of the most knowledgeable people I can think of about this topic, hence I quoted him before, and a significant part of the book is dedicated to it too. A second edition came out earlier this year so it’s also all up to date.
  • Functional Web Development with Elixir, OTP, and Phoenix – build a simple game, first with simple functions, then develop a GenServer interface to it with some supervisors and then wire it all up in a Phoenix App – good introduction and especially good at showing some solid debugging work.
  • Programming Phoenix – introductory book I wish everyone writing a phoenix application read first, as it also covers why things are done in a certain way and it’s by the authors of the framework which gives a unique perspective. It also has the aforementioned information of what is already parallelized etc. It also includes a pretty cool use case for a supervised GenServer (getting suggestions from an external service and ranking them).
  • Designing Elixir Systems with OTP – this is shaping up to be a great resource on GenServers, Supervision Trees and OTP in general. I haven’t read it yet, but James, Bruce and PragProg have all my trust plus I read some early praise already.

Final Thoughts

Well, you don’t need GenServers and supervision trees to start with writing elixir applications! Go out there, write an application, play with it, have fun, call yourself an elixir programmer (because you are!). Still, learn about OTP to expand your mind and to know where to look when you encounter a problem where a supervision tree could help you.

When discussing this in Elixir Forum, Dimitar also came up with a good thought: Maybe the OTP is what pulls people in and helps them discover all those other nice things?

I came for OTP. I stayed for the functional programming.

As a community, I think we should make it clearer that you don’t have to use GenServers and that doing so might actually be harmful. Of course all those conference talks about how to use them, distributed systems etc. are very cool but every now and then give me a talk about how a business succeeded by writing a fairly standard Phoenix application. Don’t over complicate things.

I’m not saying you shouldn’t learn about GenServers. You should. But know when to use them and when not to.

Lastly, if you disagree I want you to scream at me and teach me the error of my ways :smiley:

 

[1]Technically the web server of phoenix cowboy doesn’t use GenServers and supervision trees for normal http request handling but their own thing, they have a similar functionality though so it still holds true that you don’t need to roll your own. Thanks to Hubert for pointing that out.

edit1: Correctly mention new edition of Elixir in Action, be more specific about Cowboy Processes and include a thought in closing paragraph. Thanks go to Saša, Hubert and Dimitar.

Revisiting “Tail Call Optimization in Elixir & Erlang” with benchee 1.0

All the way back in June 2016 I wrote a well received blog post about tail call optimization in Elixir and Erlang. It was probably the first time I really showed off my benchmarking library benchee, it was just a couple of days after the 0.2.0 release of benchee after all.

Tools should get better over time, allow you to do things easier, promote good practices or enable you to do completely new things. So how has benchee done? Here I want to take a look back and show how we’ve improved things.

What’s better now?

In the old benchmark I had to:

  • manually collect Opearting System, CPU as well as Elixir and Erlang version data
  • manually create graphs in Libreoffice from the CSV output
  • be reminded that performance might vary for multiple inputs
  • crudely measure memory consumption in one run through on the command line

The new benchee:

  • collects and shows system information
  • produces extensive HTML reports with all kinds of graphs I couldn’t even produce before
  • has an inputs feature encouraging me to benchmark with multiple different inputs
  • is capable of doing memory measurements showing me what consumers more or less memory

I think that these are all great steps forward of which I’m really proud.

Show me the new benchmark!

Here you go, careful it’s long (implementation of MyMap for reference):

We can easily see that the tail recursive functions seem to always consume more memory. Also that our tail recursive implementation with the switched argument order is mostly faster than its sibling (always when we look at the median which is worthwhile if we want to limit the impact of outliers).

Such an (informative) wall of text! How do we spice that up a bit? How about the HTML report generated from this? It contains about the same data but is enhanced with some nice graphs for comparisons sake:

newplot(4).png

newplot(5).png

It doesn’t stop there though, some of my favourite graphs are the once looking at individual scenarios:

newplot(6).png

This Histogram shows us the distribution of the values pretty handily. We can easily see that most samples are in a 100Million – 150 Million Nanoseconds range (100-150 Milliseconds in more digestible units, scaling values in the graphs is somewhere on the road map ;))

newplot(7).png

Here we can just see the raw run times in order as they were recorded. This is helpful to potentially spot patterns like gradually increasing/decreasing run times or sudden spikes.

Something seems odd?

Speaking about spotting, have you noticed anything in those graphs? Almost all of them show that some big outliers might be around screwing with our results. The basic comparison shows pretty big standard deviation, the box plot one straight up shows outliers (little dots), the histogram show that for a long time there’s nothing and then there’s a measurement that’s much higher and in the raw run times we also see one enormous spike.

All of this is even more prevalent when we look at the graphs for the small input (10 000 elements):

newplot(8).png

Why could this be? Well, my favourite suspect in this case is garbage collection. It can take quite a while and as such is a candidate for huge outliers – the more so the faster the benchmarks are.

So let’s try to take garbage collection out of the equation. This is somewhat controversial and we can’t take it out 100%, but we can significantly limit its impact through benchee’s hooks feature. Basically through adding after_each: fn _ -> :erlang.garbage_collect() end to our configuration we tell benchee to run garbage collection after every measurement to minimize the chance that it will trigger during a measurement and hence affect results.

You can have a look at it in this HTML report. We can immediately see in the results and graphs that standard deviation got a lot smaller and we have way fewer outliers now for our smaller input sizes:

newplot(9).png

newplot(10).png

Note however that our sample size also went down significantly (from over 20 000 to… 30) so increasing benchmarking time might be worth while to get more samples again.

How does it look like for our big 5 Million input though?

newplot(11).png

Not much of an improvement… Actually slightly worse. Strange. We can find the likely answer in the raw run time graphs of all of our contenders:

newplot(13).pngnewplot(12).png

The first sample is always the slowest (while running with GC it seemed to be the third run). My theory is that for the larger amount of data the BEAM needs to repeatedly grow the memory of the process we are benchmarking. This seems strange though, as that should have already happened during warmup (benchee uses one process for each scenario which includes warmup and run time). It might be something different, but it very likely is a one time cost.

To GC or not to GC

Is a good question. Especially for very micro benchmarks it can help stabilize/sanitize the measured times. Due to the high standard deviation/outliers whoever is fastest can change quite a lot on repeated runs.

However, Garbage Collection happens in a real world scenario and the amount of “garbage” you produce can often be directly linked to your run time – taking the cleaning time out of equation can yield results that are not necessarily applicable to the real world. You could also significantly increase the run time to level the playing field so that by the law of big numbers we come closer to the true average – spikes from garbage collection or not.

Wrapping up

Anyhow, this was just a little detour to show how some of these graphs can help us drill down and find out why our measurements are as they are and find likely causes.

The improvements in benchee mean the promotion of better practices and much less manual work. In essence I could just link the HTML report and then just discuss the topic at hand (well save the benchmarking code, that’s not in there… yet 😉 ) which is great for publishing benchmarks. Speaking about discussions, I omitted the discussions around tail recursive calls etc. with comments from José Valim and Robert Virding. Feel free to still read the old blog post for that – it’s not that old after all.

Happy benchmarking!

Released: benchee 0.99, 1.0 & friends

It’s finally here – benchee 1.0! 🎉🎉🎉

The first benchee release was almost 3 years ago – it started a mission to improve benchmarking tooling in the elixir eco system. And now we’re not at the goal – after all it’s never done and we’re not short of ideas of what to do.

What’s in a 1.0?

Also called “Why did you take so long to call it 1.0?” – 1.0 for me means a good level of stability. A level where not every second new benchee version all formatters would need updates because they would break otherwise. And in recent releases we have still shuffled major data structures around A LOT (just check all the Breaking Changes (Plugins)). Benchee was mostly stable from a user perspective – but this means it’s less of a risk factor to go ahead and write your own plugins, something that benchee always encouraged/was built to empower. I don’t have any plans for 2.0 right now – all features that I know of can easily be added to the existing structure.

It also means I’m happy with the features. What benchee offers is great, we have:

  •  nano second precise run time measurements
  •  memory measurements
  • rich statistics
  • show information such as CPU, elixir and erlang versions about the system running the benchmarks
  •  support for multiple inputs
  •  hooks to support even unconventional scenarios
  • you can access it all via your CLI, CSV, JSON or HTML (including nice graphs!)
  • and actually a lot more 😉

Benchee might have started out as “I want benchmark-ips in elixir” but it has surpassed it in many ways so that I’d actually want to have benchee in Ruby but that’s another topic. However, that makes me proud of what we accomplished.

With that amount of polish I can also easily sit back and not work on benchee for some time because I know it’s good – it is “done” in the sense that it can do everything I wanted it to do when I started the project (and even more!).

As for what is actually in it mostly removing deprecations. You can check out the Changelog.

What’s 0.99?

I found it nice how rspec did their 2.99 –> 3.0 switch – get it to run on 2.99 without deprecation warnings and then you can safely use 3.0. That was a great user experience. Ember.js handles their major versions similarly. Now, benchee is nowhere near as complex as those 2 but we thought providing that nicety would still be great.

Features

As mentioned before 0.99/1.0 don’t actually include many features – the previous 0.14.0 release from about a month ago was very feature packed. These releases are a lot about polish. Redoing the documenation, updating names, fixing typespecs, being more careful about what is and isn’t exposed in the public interface.

A small but important feature made it in though – displaying the absolute difference between measurements:

Comparison:
flat_map           2.34 K
map.flatten        1.22 K - 1.92x slower +393.09 μs

See that little+393.09 μs? It’s how much slower it was on average in absolute terms. With these comparisons people often focus too much on “OMG it’s almost 2 times as slow!!!” but this number helps put it into context: It’s not even half a millisecond. If you only do this once in a web request the difference likely doesn’t matter. It’s a calculation I always did in my head, I’m happy to make it easily accessible for everyone.

Along with this patch those values were added to our Statistics struct – including the “x-times slower” values, which means formatters no longer have to implement this themselves! Hooray!

We’re an org now!

An astute observer might have seen that all my benchee repos have been moved to the github organization bencheeorg. What’s that all about? It’s mostly a tribute to benchee not being a personal project but a community project. Many people have contributed massively to benchee, most notably Devon and Eric. Without Devon we probably still wouldn’t have memory measurements and without Eric our unit scaling wouldn’t be as great as it is. Others such as Michał and OvermindDL1 have also contributed a lot through ideas, testing and help (especially with memory measurements :)). Feels wrong to keep the repositories attached to a single person.

Also, should anything happen to me (which I hope won’t happen), the others could still add people to the organization and carry on.

It also helps with another problem I’ve had: I want to extract small useful libraries from benchee: Statistics (introduced by me), System Information gathering (introduced by Devon) and unit scaling (introduced by Eric) – where do I put these repos? All under their own name space? All under my name space? Nah, I put them in the benchee organization where we share ownership – that’s where they belong.

The future of benchee

As I said benchee isn’t done – there is an open PR to add reference jobs which didn’t make it into the release. We’d like to add more types of memory measurements, as well as measuring reductions, incorporating profiling right after benchmarking to drill down on those bottle necks sounds great, more compact console output and also include the benchmarking code itself in the suite so that formatters could display it. Finally, now might finally be the time to brush up on meta programming and write that DSL wrapper that people apparently want.

Help with all of those is very welcome. Personally, I’m really itching to extract these libraries I mentioned – let’s see about that. Also to showcase benchee with some nice benchmarks – after all what good is a great benchmarking tool if you rarely use it?

Video & Slides: Do You Need That Validation? Let Me Call You Back About It

I had a wonderful time at Ruby On Ice! I gave a talk, that I loved to prepare to formulate the ideas the right way. You’ll see it focuses a lot on the problems, that’s intentional because if we’re not clear on the problems what good is a solution?

You can find the video along with awesome sketch notes on the Ruby on Ice homepage.

Anyhow, here are the slides: speakerdeck slideshare PDF

(in case you wonder why the first slide is a beer, the talk was given on Sunday Morning as the first talk after the party – welcoming people back was essential as I was a bit afraid not many would show up but they did!)

Abstract

Rails apps start nice and cute. Fast forward a year and business logic and view logic are entangled in our validations and callbacks – getting in our way at every turn. Wasn’t this supposed to be easy?

Let’s explore different approaches to improve the situation and untangle the web.