10 Elixir gotchas

No, I’ve not gone to the click-baiters (“10 tips that will change your life today!!!”), but I chose to limit myself to just 10 so that I don’t pull what is considered a “Tobi” and spend days writing a blog post so huge no one wants to read it anyhow. I’ll write a follow-up post with more 🙂

Anyhow, what are “gotchas”? For this purpose I’d define them as “slightly confusing or irritating behavior, prone to lead to errors, especially when you’re new to Elixir”. There are good reasons for many of these, however some of them are also more arcane. Building on the great foundation that Erlang laid is probably Elixir’s biggest asset – it gives us a lot of functionality and properties that help Elixir thrive, especially in the modern multi-core and multi-node environment. However, Erlang also comes with its fair share of baggage as a programming language conceived almost 40 years ago. Erlang’s focus on backwards compatibility also means many of these decisions still live on today.

The list is brought to you by:

  1. My own experience learning Elixir
  2. Me teaching Elixir to folks both at Liefery and at Remote over the years
  3. Horrifying discoveries in production code

Apologies for the lack of syntax highlighting, but WordPress borked the last way of doing it that worked for Elixir and I didn’t want to yak shave this too far. I hope that you can still enjoy them and may learn from them!

1. A list of numbers becomes text in iex

Let’s start with an oldie but goldie that pretty much every beginner book tells you about: Why does this random list of integers print out as text?

iex> [69, 108, 105, 120, 105, 114]
~c"Elixir"

This is because charlists, denoted by ~c"text" or 'text', are actually just that – a list of integers. So iex literally can’t tell the difference and makes its best guess: it checks whether the integers all represent printable ASCII characters and, if so, prints the list as text.

You can also show that it literally is a list by using Enum functions on it:

iex> Enum.map(~c"elixir", fn integer -> integer + 100 end)
[201, 208, 205, 220, 205, 214]
iex> Enum.map(~c"elixir", fn integer -> integer - 32 end)
~c"ELIXIR"

The first line changes the integers to be outside of the printable range with +100 so iex prints it as a list of integers again. The second one just uses the knowledge that the difference between lower case and upper case letters in ASCII is 32 to transform it.
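By the way, if you’re ever unsure what you’re actually looking at, iex’s built-in i/1 helper will tell you the underlying data type (output abbreviated here, details vary by version):

iex> i ~c"elixir"
Term
  ~c"elixir"
Data type
  List
Raw representation
  [101, 108, 105, 120, 105, 114]
...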

2. Charlists vs. Strings

All this brings us to the following questions: Why do we even have charlists and strings in Elixir? What’s the difference between them? When do I use which one? Great questions! They’re a source of a lot of confusion, especially since in most languages single- and double-quoted strings only sport minor differences – in Elixir they are backed by entirely different data structures. While single-quoted strings are just a list of integers, double-quoted strings are UTF-8 encoded binaries and so resemble the strings you are used to from most modern programming languages.

As a rule of thumb, use strings aka double quotes ("string"). The major use case for charlists ('charlist' / ~c"charlist") is interfacing with Erlang or Erlang libraries. Or in the words of the documentation:

In practice, you will not come across them often, only in specific scenarios such as interfacing with older Erlang libraries that do not accept binaries as arguments.
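When you do need to cross that boundary, conversion is built right in – a quick sketch using the Kernel functions:

iex> to_charlist("hello")
~c"hello"
iex> to_string(~c"hello")
"hello"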

The other mystery here is the two different syntaxes in use for charlists – single quotes ('charlist') vs. the ~c"charlist" sigil. That’s a rather recent development: the sigil became the canonical representation in Elixir 1.15, after some discussion. The reason is what I mentioned initially – the single quotes caused a lot of confusion:

In many languages, 'foobar' is equivalent to "foobar", that’s not the case in Elixir and we believe it leads to confusion.

So, it’s now less confusing but still confusing – which is why it made this list.

3. %{} matches any Map

Pattern matching is one of Elixir’s chief features! You can see it utilized frequently, for instance in a recursive function where we want to end the recursion on an empty list:

def list([]) do
  IO.puts("list is empty")
end

That works flawlessly, however if you try the same with a map it’ll always match – no matter the map:

def map(%{}) do
  IO.puts("Empty map or is it?")
end
iex> map(%{not: "empty"})
Empty map or is it?

The reason is simple – pattern matches on lists and maps just work differently. For a list we’re looking for an exact match of the elements, whereas a map pattern matches as long as the pattern’s keys and values are contained in the map:

iex> [a, b, c] = [1, 2, 3]
[1, 2, 3]
iex> [a, b, c] = [1, 2, 3, 4]
** (MatchError) no match of right hand side value: [1, 2, 3, 4]
iex> [] = [1, 2, 3]
** (MatchError) no match of right hand side value: [1, 2, 3]
iex> %{action: action} = %{action: "learn"}
%{action: "learn"}
iex> %{action: action} = %{action: "learn", more: "can be", provided: true}
%{
  more: "can be",
  action: "learn",
  provided: true
}
iex> %{} = %{action: "learn", more: "can be", provided: true}
%{
  more: "can be",
  action: "learn",
  provided: true
}
iex> %{action: action} = %{no_action: "sad"}
** (MatchError) no match of right hand side value: %{no_action: "sad"}

If you do want to execute a function only when given an empty map you can use either of the following guards: map_size(map) == 0 or map == %{} – which also showcases the difference between the match (=) and equality (==) operators. One full example from the docs:

def empty_map?(map) when map_size(map) == 0, do: true
def empty_map?(map) when is_map(map), do: false

4. Structs are Maps

While we’re on the topic of maps, let’s talk about structs! We can easily create and use a struct:

iex> defmodule Human do
...> defstruct [:name, :age]
...> end
iex> tobi = %Human{name: "Tobi", age: 34}
%Human{name: "Tobi", age: 34}

It gets more interesting around pattern matching again, let’s try %{} from the previous section:

iex> %{} = %Human{name: "Tobi", age: 34}
%Human{name: "Tobi", age: 34}
iex> is_map(%Human{name: "Tobi", age: 34})
true

It matches and it is a map! Structs are nothing more than special maps with a __struct__ key that tells you which struct it is. It gets even weirder when you learn that some built-in data types are structs and hence maps:

iex> is_map(1..10)
true
iex> is_map(Date.utc_today())
true
iex> is_map(~r/elixir/)
true

We can see their map nature more easily in an example! With a bit of meddling we can even tell IO.inspect not to print the prettified version, showing us the real map underneath:

iex> tobi = %Human{name: "Tobi", age: 34}
%Human{name: "Tobi", age: 34}
iex> tobi.__struct__
Human
iex> Map.keys(tobi)
[:name, :__struct__, :age]
iex> Map.values(tobi)
["Tobi", Human, 34]
iex> IO.inspect(tobi, structs: false)
%{name: "Tobi", __struct__: Human, age: 34}
%Human{name: "Tobi", age: 34}
iex> IO.inspect(1..10, structs: false)
%{first: 1, last: 10, step: 1, __struct__: Range}
1..10

Now, you might think this all doesn’t matter too much. But it does! Be aware that every pattern match on a map might also match a struct with the same keys – as will every is_map check. And yes, dates and ranges may match as well, as shown above:

iex> %{first: number} = 1..10
1..10
iex> number
1

I have seen bugs in code that first matched on something being a map and only later matched on specific structs. So, instead of the struct-specific code the more general map code was run – and hence another hard-to-track-down bug was born.

In order to combat this, the Elixir team introduced a new is_non_struct_map/1 guard.
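A minimal sketch of how to defend against this, using the Human struct from above (note that is_non_struct_map/1 only exists in recent Elixir versions): put the struct-specific clauses first, or exclude structs explicitly with the guard:

# struct-specific clause first, so it wins over the general map clause
def handle(%Human{} = human), do: "struct code for #{human.name}"
# is_non_struct_map/1 will not let a %Human{} slip through here
def handle(map) when is_non_struct_map(map), do: "general map code"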

5. Structs don’t implement Access

So, I just told you that structs are just maps. But then you try to use random key access via [] on them and you are confused again:

iex(18)> tobi[:age]
** (UndefinedFunctionError) function Human.fetch/2 is undefined (Human does not implement the Access behaviour

You can use the "struct.field" syntax to access struct fields. You can also use Access.key!/1 to access struct fields dynamically inside get_in/put_in/update_in)
    Human.fetch(%Human{name: "Tobi", age: 34}, :age)
    (elixir 1.16.0-rc.1) lib/access.ex:309: Access.get/3
    iex:18: (file)
iex(18)> tobi.age
34
iex(19)> map_tobi = %{name: "Tobi", age: 34}
%{name: "Tobi", age: 34}
iex(20)> map_tobi[:age]
34
iex(21)> map_tobi.age
34

As usual, Elixir is amazing and already tells us that the problem is that the struct doesn’t implement the Access behaviour. As structs have predefined keys, you should use the struct.key dot-syntax to access them. However, since sometimes you do still want dynamic key access on structs, you can use the fact that structs are just maps to your advantage with functions like Map.get/3:

iex(22)> attribute = :age
:age
iex(23)> Map.get(tobi, attribute)
34

You can also take it further than that and use get_in/2. It doesn’t work in a plain attempt, but can work thanks to Access.key/2:

iex(25)> get_in(tobi, [attribute])
** (UndefinedFunctionError) function Human.fetch/2 is undefined (Human does not implement the Access behaviour
# etc....
iex(25)> get_in(tobi, [Access.key(attribute)])
34
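The same trick works for updates, in case you need them – a quick sketch:

iex> put_in(tobi, [Access.key(:age)], 35)
%Human{name: "Tobi", age: 35}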

Be mindful to only use these if you really do need dynamic key access on structs. Otherwise there are many other ways, such as good old dot-based access or even pattern matching.

6. Keyword lists are a bit awkward as options and in pattern matches

Another somewhat special data structure in Elixir is the keyword list. Again, it’s backed by “syntactic sugar” on top of lists: a keyword list is a list of two-element tuples where the first element of each tuple is an atom.

iex> [{:option, true}] == [option: true]
true

When you call a function with a keyword list as the last argument you can even omit the brackets, as seen before when we wrote IO.inspect(tobi, structs: false) – structs: false is a keyword list here. These properties make it the default data structure for passing options to functions in Elixir.

However, since it’s a list, order matters (and keys can be duplicated!), which often isn’t what you want for options: order usually doesn’t matter and duplicated options shouldn’t be a thing. That’s great for DSLs such as ecto, but for options it means they’re hard to pattern match on. Let’s check out the following function:

def option(warning: true) do
  IO.puts "warning!"
end

def option(_anything) do
  IO.puts "No warning!"
end

It only matches when our options are exactly warning: true – any additional data makes it a different list and hence fails the pattern match:

iex> option warning: true
warning!
iex> option warning: true, more: true
No warning!
iex> option more: true, warning: true
No warning!

It’s an issue I struggled with early in my Elixir days. There are plenty of solutions for this. What I do in benchee is accept the options as a keyword list but internally convert it to a map (well, actually a struct even!). So internally I can work with a structure that is easy to pattern match on, while preserving the nice & idiomatic interface.

You can also use Keyword.get/3 to fetch the value of whatever option you’re looking for, or Keyword.validate/2 to make sure only well-known options are supplied and that good defaults are provided – hat tip to Vinicius.
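As a rough sketch of how that conversion can look (function and option names made up for illustration):

def run(job, opts \\ []) do
  # raise on unknown options and fill in defaults for missing ones
  opts = Keyword.validate!(opts, warning: false, verbose: false)
  # convert to a map so the internals can pattern match easily
  config = Map.new(opts)
  do_run(job, config)
end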

7. Everything can be compared to Everything

Another surprise might be that you can compare literally every elixir term with one another without raising an exception:

iex> nil < 8
false
iex> 8 < "hello"
true
iex> {1, 2} < ["a"]
true

Most people would probably expect this to raise an error as it does in many other languages. It doesn’t, as Elixir does structural comparisons and follows Erlang’s term ordering which basically gives all terms a predetermined order:

number < atom < reference < function < port < pid < tuple < map < list < bitstring

Why is it done like this?

This means comparisons in Elixir are structural, as it has the goal of comparing data types as efficiently as possible to create flexible and performant data structures.
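One practical consequence of this total term order: sorting mixed lists just works instead of raising – for example:

iex> Enum.sort(["b", :a, {1, 2}, 1])
[1, :a, {1, 2}, "b"]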

All in all, being able to compare everything to everything may sound just mildly annoying, but it can also lead to some really bad bugs. In a conditional, this will just silently run the wrong code:

iex> maximum = "100" # forgot to parse
"100"
iex> if 9999 < maximum, do: "you pass"
"you pass"

I have seen similar bugs in production code bases, especially since nil also doesn’t raise and is even more likely to slip through. Thankfully, these days Elixir issues a warning if you try to compare structs:

iex> %Human{} > nil
warning: invalid comparison with struct literal %Human{}. Comparison operators (>, <, >=, <=, min, and max) perform structural and not semantic comparison. Comparing with a struct literal is unlikely to give a meaningful result. Struct modules typically define a compare/2 function that can be used for semantic comparison
└─ iex:6

true

It’s noteworthy that this gotcha and the next one are already being addressed in Elixir, targeted for the 1.17 release, thanks to the introduction of the type system. Beyond that, sabiwara also wrote the micro-library cmp to take care of the problem.

8. Proper Date comparisons

Speaking of which, how do you compare dates?

iex(18)> ~D[2024-05-01] > ~D[2024-05-02]
warning: invalid comparison with struct literal ~D[2024-05-01]. Comparison operators (>, <, >=, <=, min, and max) perform structural and not semantic comparison. Comparing with a struct literal is unlikely to give a meaningful result. Struct modules typically define a compare/2 function that can be used for semantic comparison
└─ iex:18
false

Whoops, there is that warning again! Obviously we shouldn’t compare dates like this – but it still works, and might even produce the correct result by accident, slipping through tests!

The correct way to compare dates is Date.compare/2 and friends:

iex> Date.compare(~D[2024-05-01], ~D[2024-05-02])
:lt

Again, you may be surprised how often this has snuck past someone.
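If the three-way :lt/:eq/:gt result feels clunky, newer Elixir versions (1.15+, if I recall correctly) also ship Date.before?/2 and Date.after?/2, and the sorting functions accept the module as a comparator:

iex> Date.before?(~D[2024-05-01], ~D[2024-05-02])
true
iex> Enum.sort([~D[2024-05-02], ~D[2024-05-01]], Date)
[~D[2024-05-01], ~D[2024-05-02]]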

9. nil["something"] is valid and returns nil

Another surprise may be this:

iex> nil["something"]
nil

Of course, you’d never write it like this, but if a nil value had gotten past you and ended up in your map variable there’d be no way to tell:

iex> map = nil
nil
iex> map["something"]
nil

Which can be very dangerous. Why is it like this? So that you can use [] to safely access nested values:

iex> map = %{a: %{b: :c}}
%{a: %{b: :c}}
iex> map[:a][:b]
:c
iex> map[:d][:b]
nil

In that last example map[:d] returns nil and then nil[:b] evaluates to nil again without crashing. If you want to assert that the keys are there, you have a lot of options – one of them is pattern matching:

iex> %{a: %{b: value}} = map
%{a: %{b: :c}}
iex> value
:c
iex> %{d: %{b: value}} = map
** (MatchError) no match of right hand side value: %{a: %{b: :c}}
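If you’d rather blow up loudly than get a quiet nil, Map.fetch!/2 is another option:

iex> map = %{a: %{b: :c}}
iex> map |> Map.fetch!(:a) |> Map.fetch!(:b)
:c
iex> Map.fetch!(map, :d)
** (KeyError) key :d not found in: %{a: %{b: :c}}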

10. How to use constants

Another question that’s common among Elixir newcomers is: “Cool, so how do I define constants?” and the answer is… there are no real constants in Elixir/Erlang. The best workaround we have are module attributes. However, they are not visible outside the module by default, so you have to provide a function to access them:

defmodule Constants do
  @my_constant "super constant"
  def my_constant do
    @my_constant
  end
end
iex> Constants.my_constant()
"super constant"

That works, however one unfortunate thing about module attributes is that they aren’t… you know, truly constant. You can redefine a module attribute later on in a module without any warning, and if you then use it again below the new definition its value will have changed:

defmodule Constants do
  @my_constant "super constant"
  def my_constant do
    @my_constant
  end

  @my_constant "ch-ch-changes!"
  def my_constant_again do
    @my_constant
  end
end
iex> Constants.my_constant_again()
"ch-ch-changes!"
iex> Constants.my_constant()
"super constant"

Interestingly, the value is not changed retroactively, so my_constant/0 still returns the original value (and is a true constant in that sense). But it can change throughout the module, which is necessary for other use cases of module attributes. So, if you access it in a function and someone happened to redefine it with a newer value above, you may be in for a bad time.


It’s also worth noting that you don’t need module attributes – you can also just define a function that returns a constant value:

def my_other_constant do
  "This is cool as well"
end

In many cases, the compiler is smart enough to realize it’s a constant value (even with some operations applied), so you won’t suffer a performance penalty for this. However, there are cases where that doesn’t work (f.ex. reading a file) and certain guards require module attributes (f.ex. membership checks with in, whose right-hand side must be known at compile time). Hat tip to discussing this with José.
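A minimal sketch of that guard case (module and names made up): in inside a guard needs a compile-time known list, which is exactly what a module attribute provides:

defmodule Permissions do
  @allowed_actions [:read, :write]

  # `in` inside a guard needs a compile-time list – a module attribute
  # works here, while a function call would not be allowed in a guard
  def allowed?(action) when action in @allowed_actions, do: true
  def allowed?(_action), do: false
end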

To help with this, hauleth has also created a tiny library called defconst.

Closing

Hope you enjoyed these gotchas and they helped you! What gotchas are missing? Let me know in the comments or elsewhere and I’ll try to cover them in future editions – I still got ~10 on my TODO list so far though 😅

It’s also worth mentioning that the Elixir team is well aware of a lot of these – if you follow the links I posted, they will frequently send you to Elixir’s own documentation explaining them. Since the early days there have also been quite some improvements and more warnings emitted to help you, as Elixir is amazing and cares a lot about the developer experience.

If you enjoyed this post and think “Working with Tobi may be cool!” – you’re in luck as I’m still looking for a job – so give me a shout, will ya? 💚


Careful what data you send or how to tank your performance with Task.async

In Elixir and on the BEAM (Erlang Virtual Machine) in general we love our processes – lightweight, easily run millions of them, easy lock-less parallelism – you’ve probably heard it all. Processes are great and one of the many reasons people gravitate towards the BEAM.

Functions like Task.async/1 make parallelism effortless and can feel almost magical. Cool, let’s use it in a simple benchmark! We’ll create some random lists and then run some non-trivial Enum functions on them – uniq, frequencies and shuffle – comparing doing them sequentially (one after the other) against running them all in parallel. This kind of work is super easy to parallelize, so we can just fire off the tasks and then await them:

random_list = fn size, spread ->
  for _i <- 1..size, do: :rand.uniform(spread)
end

inputs = [
  {"10k", random_list.(10_000, 100)},
  {"1M", random_list.(1_000_000, 1_000)},
  {"10M", random_list.(10_000_000, 10_000)}
]

Benchee.run(
  %{
    "sequential" => fn big_list ->
      uniques = Enum.uniq(big_list)
      frequencies = Enum.frequencies(big_list)
      shuffled = Enum.shuffle(big_list)
      [uniques, frequencies, shuffled]
    end,
    "parallel" => fn big_list ->
      tasks = [
        Task.async(fn -> Enum.uniq(big_list) end),
        Task.async(fn -> Enum.frequencies(big_list) end),
        Task.async(fn -> Enum.shuffle(big_list) end)
      ]

      Task.await_many(tasks, :infinity)
    end
  },
  inputs: inputs,
  warmup: 15,
  time: 60,
  formatters: [
    {Benchee.Formatters.Console, extended_statistics: true},
    {Benchee.Formatters.HTML, file: "bench/output/task_no_task/index.html", auto_open: false}
  ]
)

Cool, let’s check out the results! You can check the HTML report online here, uncollapse for the console formatter version or just check out the pictures.

Console formatter output
Operating System: Linux
CPU Information: AMD Ryzen 9 5900X 12-Core Processor
Number of Available Cores: 24
Available memory: 31.25 GB
Elixir 1.16.0-rc.1
Erlang 26.1.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 15 s
time: 1 min
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: 10k, 1M, 10M
Estimated total run time: 7.50 min

##### With input 10k #####
Name                 ips        average  deviation         median         99th %
sequential        315.29        3.17 ms    ±20.76%        2.96 ms        5.44 ms
parallel          156.77        6.38 ms    ±31.08%        6.11 ms       10.75 ms

Comparison: 
sequential        315.29
parallel          156.77 - 2.01x slower +3.21 ms

Extended statistics: 

Name               minimum        maximum    sample size                     mode
sequential         2.61 ms        7.84 ms        18.91 K         2.73 ms, 3.01 ms
parallel           3.14 ms       11.99 ms         9.40 K 4.80 ms, 4.87 ms, 8.93 ms

##### With input 1M #####
Name                 ips        average  deviation         median         99th %
sequential          1.14         0.87 s     ±7.16%         0.88 s         0.99 s
parallel            0.94         1.07 s     ±3.65%         1.07 s         1.16 s

Comparison: 
sequential          1.14
parallel            0.94 - 1.22x slower +0.194 s

Extended statistics: 

Name               minimum        maximum    sample size                     mode
sequential          0.74 s         0.99 s             69                     None
parallel            0.98 s         1.16 s             57                     None

##### With input 10M #####
Name                 ips        average  deviation         median         99th %
sequential        0.0896        11.17 s    ±10.79%        11.21 s        12.93 s
parallel          0.0877        11.40 s     ±1.70%        11.37 s        11.66 s

Comparison: 
sequential        0.0896
parallel          0.0877 - 1.02x slower +0.23 s

Extended statistics: 

Name               minimum        maximum    sample size                     mode
sequential          9.22 s        12.93 s              6                     None
parallel           11.16 s        11.66 s              6                     None
[Chart: 10k input, iterations per second (higher is better)]
[Boxplot for 10k, measured run time (lower is better). Sort of interesting how many “outliers” (blue dots) there are for sequential though.]
[Chart: 1M input, iterations per second (higher is better)]
[Boxplot for 1M, measured run time (lower is better).]
[Chart: 10M input, iterations per second (higher is better). Important to know: they take so long here that the sample size is only 6 for each.]

And just as we all expected the parallel… no wait a second the sequential version is faster for all of them? How could that be? This was easily parallelizable work, split into 3 work packages with many more cores available to do the work. Why is the parallel execution slower?

What happened here?

There’s no weird trick to this: it ran on a system with 12 physical cores that was idle save for the benchmark. Starting processes is extremely fast and lightweight, so that’s not it either. By most accounts, parallel processing should win out.

What is the problem then?

The problem is the huge lists the tasks need to operate on, plus the return values that need to get back to the main process. The BEAM works with a “share nothing” architecture; this means that in order to process these lists in parallel we have to copy them over entirely to the other process (Tasks are backed by processes). And once they’re done, we need to copy the result back as well. Copying, especially of big data structures, is both CPU and memory intensive. In this case the additional copying work outweighs the gains we get from processing the data in parallel. You can also see that this effect seems to diminish the bigger the lists get – so it seems like the parallelization is catching up there.

The full copy may sound strange – after all we’re dealing with immutable data structures, which should be safe to share. Well, once processes share data, garbage collection becomes a whole other world of complexity. Or in the words of the OTP team in “A few notes on message passing” (emphasis mine):

Sending a message is straightforward: we try to find the process associated with the process identifier, and if one exists we insert the message into its signal queue.

Messages are always copied before being inserted into the queue. As wasteful as this may sound it greatly reduces garbage collection (GC) latency as the GC never has to look beyond a single process. Non-copying implementations have been tried in the past, but they turned out to be a bad fit as low latency is more important than sheer throughput for the kind of soft-realtime systems that Erlang is designed to build.

John Högberg

Robert Virding (co-inventor of Erlang) also puts some more color to it in a thread on elixir forum.

In case you’re interested in other factors for this particular benchmark: I chose the 3 functions semi-randomly, looking for functions that traverse the full list at least once doing some non-trivial work. If you do heavier work on the lists, the parallel solution will fare better. We can also not completely discount that CPU boosting (where single-core performance may increase while the other cores are idle) shifts the benchmark a bit in favor of sequential, but overall it should be solid enough for demonstration purposes. Due to the low sample size for the 10M list, parallel execution may sometimes come out ahead, but usually doesn’t (and I didn’t want the benchmark to take even longer).

The Sneakiness

Now, the problem here is a bit more sneaky – we’re not explicitly sending messages. Our code looks like this: Task.async(fn -> Enum.uniq(big_list) end) – there is no send or GenServer.call here! However, that function still needs to make its way to the process that executes it. As the function’s closure automatically captures referenced variables, all that data ends up being copied over as well! (Technically speaking Task.async does a send under the hood, but spawn/1 also behaves like this.)

This is what caught me off-guard with this – I knew messages were copied, but somehow Task.async was so magical I didn’t think about it sending messages or needing to copy its data to a process. Let’s call it a blind spot and broken mental model I’ve had for way too long. Hence, this blog post is for you dear reader – may you avoid the mistake I made!

Let’s also be clear that normally this isn’t a problem and the benefits we get from this behavior are worth it. When a process terminates we can just free all its memory. It’s also not super common to ship so much data to a process that does comparatively lightweight work. The problem is how easily this can sneak up on you when using high-level abstractions like Task.async/1.

Real library, real problems

Yup. While I feel some shame about it, I’ve always been an advocate for sharing your mistakes to spread the learnings. This isn’t a purely theoretical thing I ran into – it stems from real problems I encountered. As you may know, I’m the author of benchee – the best benchmarking library ™ 😉 . Benchee’s design, in a nutshell, revolves around one big data structure – the suite – and data is enriched throughout the benchmarking process. You may get a better idea by looking at the breakdown of the steps. This has worked great for us.

However, some of the data in that suite may reference large chunks of data if the benchmark operates on large data. Each Scenario references its given input as well as its benchmarking function. Given what we just learned both of these may be huge. More than that, the Configuration also holds all the configured inputs and is part of the suite as well.

Now, when benchee tries to compute your statistics in parallel it happily creates a new process for each scenario (which may be 20+) copying over the benchmarking function and input although it really doesn’t need them.

Even worse, formatters are run in parallel handing over the entire suite – including all scenarios (function and input) as well as all the inputs again as part of the Configuration – none of which a formatter should need. 😱

To be clear, you will only encounter this problem if you deal with huge sets of data and if you do it’s “just” more memory and time used. However, for a library about measuring things and making them fast this is no good.

The remedy

Thankfully, there are multiple possible remedies for this problem:

  • Limiting the data you send to the absolute necessary minimum, instead of just sending the whole struct. For example, don’t send an entire Suite struct if all you need is a couple of fields.
  • If only the process needs the data, it may fetch the data itself instead. I.e. instead of putting the result of a giant query into the process, the process could be the one doing the query if it’s the only one that needs the data.
  • There are some data structures that are shared between processes and hence don’t need copying, such as ets and persistent_term.

As teased above, the most common and easiest solution is just to pass along the data you need, if you ended up accidentally sending along more than you wanted to. You can see one step of it in this pull request or this one.
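The pattern itself is simple – here’s a hedged sketch (format/1 is a made-up stand-in for whatever work the task does):

# before: the closure references `suite`, so the entire suite is copied over
task = Task.async(fn -> format(suite.configuration.time) end)

# after: bind the small value first, so only it is captured and copied
time = suite.configuration.time
task = Task.async(fn -> format(time) end)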

The results are quite astounding: for a benchmark I’m working on (blog post coming soon ™) this change took it from practically being unable to run due to memory constraints (on a 32GB RAM system) to running easily – the maximum resident set size was almost halved.

The magnitude of this can also be shown perhaps by the size of the files I saved for this benchmark. Saving is actually implemented as a formatter, and so automatically benefits from these changes – the file size for this benchmark went down from ~200MB per file to 1MB aka a reduction to 0.5% in size. You can read more about how it improved in the benchee 1.3.0 release notes.

Naturally this change will also make its way to you all as benchee 1.3.0 soon (edit: out now!).

Also, when you set out to fix this, be mindful that you need to completely remove the variable from the closure. You can’t just go Task.async(fn -> magic(suite.configuration) end) – the entire suite will still be sent along:

iex(1)> list = Enum.to_list(1..100_000)
iex(2)> # do not benchmark in iex, this is purely done to get a suite with some data
iex(3)> suite = Benchee.run(%{map: fn -> Enum.map(list, fn i -> i * i end) end })
iex(4)> :erts_debug.size(suite)
200642
iex(5)> :erts_debug.size(fn -> suite end)
200675
iex(6)> :erts_debug.size(fn -> suite.configuration end)
200841
iex(7)> :erts_debug.size(fn -> suite.configuration.time end)
201007
iex(8)> configuration = suite.configuration
iex(9)> :erts_debug.size(fn -> configuration.time end)
295
iex(10)> time = configuration.time
iex(11)> :erts_debug.size(fn -> time end)
54

Helping others avoid making the same mistake

All of that discovery, and partially shame, left me with the question: how can I help others avoid making the same mistake? Well, one part of it is right here – publishing a blog post. However, that’s only one piece.

We already added documentation to the Task module mentioning this and, as proposed by José, are working on adding a section to the process anti-patterns documentation.

Also don’t forget: processes are still awesome and lightweight – you should use them! This is just a cautionary tale of how things might go wrong if you’re dealing with big chunks of data and the work you’re doing on that data isn’t extensive enough to warrant a full copy. Or if you’re accidentally sending along too much data, unaware of the consequences. There are many more use cases for processes and tasks that are absolutely great, appropriate and will save you a ton of time.

What does this leave us with? As usual: don’t assume, always benchmark!

Also, be careful about the data you’re sending around and if you really need it! 💚

You may not need GenServers and Supervision Trees

The thought that the elixir/erlang world treats GenServers and supervision trees as so essential that it ends up deterring people from using these languages has been brewing in my mind for quite some time. In fact, I have written about it before in the Elixir Forum. This is a summary and extended version of that thread (including some choice replies) to give it more visibility.

It feels like we talk so much about GenServers etc. that people who come to Elixir feel like they need to use them or they are not “really” using Elixir. I fear that it keeps people out of the elixir community because all this just seems so complicated. However, you can write great applications without ever writing them yourself. I emphasize yourself because a lot of the libraries you’ll use (phoenix, ecto etc.) already have their supervision trees and their processes – you’re truly standing on the shoulders of giants. Still, I hear people say something to the tune of “We’re still using this like Rails – we should use GenServers” – without any need or concrete reasoning. Having something as simple as Rails (in parts, let’s not derail about Rails here) but at the same time fully parallel and more efficient is a huge boon to my mind.

I feel like hardly anyone ever highlights this. I’ve been programming elixir since ~2015. I’ve never written a production GenServer or supervision tree (and I have multiple applications in production and multiple published libraries). I simply didn’t encounter these problems, or the ecosystem took care of them for me. Still I felt somewhat inadequate because of it, as everyone seemed to already be talking about these. Which is also understandable: they are unique, they solve hard problems – there’s lots to learn and knowledge to share.

I’ve had this thought that often you don’t really need GenServers and supervision trees in my head for a long time (~2016) but was too afraid to voice it. Maybe I just don’t understand it enough? Maybe I need to learn more about them? Everyone always talks about them, I’m sure enlightenment will come to me some time soon! I finally, nervously, wrote the aforementioned Elixir Forum thread in 2018. My nervousness went down a notch after José Valim, creator of elixir and somewhat hyper-productive and omnipresent, liked the post within ~15 minutes of its creation. Also most people were really supportive, hence I’m happy to share this more openly. Sorry for the delay, a lot has been going on in my life 😉

Be mindful that I don’t say you don’t need them – I’m merely saying that you can build significant and great elixir applications without writing your own GenServers and supervision trees. It’s of course still a topic worth learning.

GenServers? Supervision Trees? Processes?

If you’ve ever watched a talk about elixir or erlang you’re likely to have heard about these. They’re one of the “killer features” of erlang and elixir. They give you that famed parallelism backed by the actor model, reliability and that whole “Let it crash! (and restart in a known good state)”. Both GenServer and Supervisor are behaviours showing a process “how to behave” and they both ship with Erlang/OTP.
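For the uninitiated, here is a minimal (and admittedly pointless) GenServer, just to show the shape of the thing – a sketch, not production code:

defmodule Counter do
  use GenServer

  # client API
  def start_link(initial_count) do
    GenServer.start_link(__MODULE__, initial_count)
  end

  def increment(pid), do: GenServer.call(pid, :increment)

  # server callbacks
  @impl true
  def init(initial_count), do: {:ok, initial_count}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, count + 1, count + 1}
end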

So what’s the problem with GenServers and Supervision Trees?

GenServers, supervisors etc. are great technologies that help you solve problems. They’re one of the things that is most special & unique about elixir & erlang. As a result lots of conference talks, blog posts etc. focus on them and it seems everyone wants to use them. Through the big focus on them in the community it sometimes feels like you can’t be a “real” elixir/erlang programmer until you’ve used and mastered them.

However, do you need them all the time? At least while using a framework (like Phoenix), chances are you don’t. The hidden detail of course is that you are using GenServers and friends without even knowing it – Phoenix runs every request and every channel[1] in their own processes. Ecto has its own pool for your database connections. It’s already parallelized and you don’t need to take care of it. That’s the beauty of it. What I’m saying is that in the standard situation the eco system takes care of you.

Building a relatively standard CRUD web application with Phoenix? No need.
Just using channels for a chat-like application in Phoenix? You’re good.

Of course, you don’t need them until you really do. I like how Jordan put it:

I agree that you, 99% of the time you do not need to use anything otp related but when you are in that 1% its really a game changer that makes elixir as great as it is.

The Bad and the Ugly

It’s not like using GenServers and friends makes everything instantly better. Quite the opposite, it can make your code both harder to understand and slower.

Sometimes the introduction of processes just complicates the code: what could have been a simple interaction is now obfuscated through a bunch of GenServer calls. By trying to use these concepts you can also essentially grind your application to a halt performance-wise. To take an example from the excellent Adopting Elixir, which also covers this topic:

A new developer team started building their Phoenix applications. They had always heard GenServers could be treated like microservices but even tinier. This “wisdom” led them to push all of their database access control to GenServers.
(…)
Performance was abysmal. Under high-enough load, some pages took 3 seconds to render because they built a bottleneck where none existed. They defeated Ecto connection pools because all access happened through a single process.
In essence, they made it easy to create global, mutable variables in Elixir. They essentially crippled the single biggest advantage of functional languages, for no gain whatsoever.

When to GenServer?

Adopting Elixir again provides some guidance as to what to best use processes for:

  • Model state accessed by multiple processes.
  • Run multiple tasks concurrently.
  • Gracefully handle clean startup and exit concerns.
  • Communicate between servers.

They especially highlight that GenServers aren’t to be used for code organization.

Robert Virding (one of the creators of Erlang) also chimed in and his response is so measured that I want to quote it in full:

I think the most important thing to understand and to use properly is the concurrency in the problem/solution/system. Using GenServers and other behaviours is just one way of doing the concurrency but it is not the only way. They are tools and like all tools they need to be used in the right way. The problem is to get the right level of concurrency which suites your problem and your solution to that problem. Too much concurrency means you will be doing excess work to no real gain, and too little concurrency means you will be making your system too sequential.

Now, as has been pointed out, many packages like Phoenix already provide a pretty decent level of concurrency which is suitable for many types of applications, at least the ones they were intended for. They will do this automatically so you don’t have to think about it in most cases, but it is still there. Understanding that is necessary so you can work out how much concurrency you need to explicitly add if any. Unfortunately because it is all managed for you “invisibly underneath” many don’t realise that is it there.

While people agree in general, there are also some that say that most systems can benefit from a GenServer and supervision tree. As Saša Jurić points out:

With a lot of hand waving, I’d say that GenServers are OTPs built-in building block for building responsive services, Tasks are the same for non-responsive ones, and supervision tree is the built-in service manager like systemd or upstart. In the past 10+ years of my backend side experience, I’ve worked on small to medium systems, and all of them needed all of these technical approaches.

He also has a fairly extensive blog post on the topic of when to reach for these tools and when not to: To spawn, or not to spawn.

In the end, I’m also not an expert on GenServers and Supervision Trees – as I said I never wrote a production one. Still learning and still growing. I think knowing them well gives you a good basis to make informed decisions on when to use them and when not to.

Abstractions

But you came to Elixir for the parallelism! So you need GenServers right? No.

Elixir core and the community have been very good at providing easy-to-use solutions for writing fully parallel programs without having to write your own GenServers and supervision trees. There are gen_stage, flow and broadway, but somewhat more importantly a couple of these are built-ins, like Task (do something in parallel easily) and Agent (share state through a process).
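To illustrate just how approachable these built-ins are, a quick Agent sketch (the PID is of course illustrative):

iex> {:ok, agent} = Agent.start_link(fn -> 0 end)
{:ok, #PID<0.108.0>}
iex> Agent.update(agent, fn count -> count + 1 end)
:ok
iex> Agent.get(agent, fn count -> count end)
1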

Want to geocode the pick up and drop off addresses of a shipment in parallel and then wait until both have finished? Say no more:


pick_up_task = Task.async(fn -> geocode(pick_up_address) end)
drop_off_task = Task.async(fn -> geocode(drop_off_address) end)
geocoded_addresses = Enum.map([pick_up_task, drop_off_task], fn task -> Task.await(task) end)


Learning

Although I’m saying you don’t need it and can write applications without it just fine, it’s a fascinating and interesting topic that can make you a better programmer even without ever writing your own supervision trees. And as I said, education is key to knowing when to reach for them and when not to. So here are a couple of books I can recommend to up your knowledge:

  • The Little Elixir & OTP Guide Book – my personal favorite, in it the author takes you through writing implementations of poolboy starting with a simple one, showing what shortcomings it has and then extending on it building a complicated supervision tree but you get the reasoning to go along with it to see for what feature what level of complexity is needed. A fascinating read for me.
  • Adopting Elixir – covers many aspects as mentioned before but is especially good at what to use these concepts for and what not to use them for (and is an interesting read overall if you consider to get your company into elixir)
  • Elixir in Action – Saša Jurić is one of the most knowledgeable people I can think of about this topic, hence I quoted him before, and a significant part of the book is dedicated to it too. A second edition came out earlier this year so it’s also all up to date.
  • Functional Web Development with Elixir, OTP, and Phoenix – build a simple game, first with simple functions, then develop a GenServer interface to it with some supervisors and then wire it all up in a Phoenix App – good introduction and especially good at showing some solid debugging work.
  • Programming Phoenix – introductory book I wish everyone writing a phoenix application read first, as it also covers why things are done in a certain way and it’s by the authors of the framework which gives a unique perspective. It also has the aforementioned information of what is already parallelized etc. It also includes a pretty cool use case for a supervised GenServer (getting suggestions from an external service and ranking them).
  • Designing Elixir Systems with OTP – this is shaping up to be a great resource on GenServers, Supervision Trees and OTP in general. I haven’t read it yet, but James, Bruce and PragProg have all my trust plus I read some early praise already.

Final Thoughts

Well, you don’t need GenServers and supervision trees to start with writing elixir applications! Go out there, write an application, play with it, have fun, call yourself an elixir programmer (because you are!). Still, learn about OTP to expand your mind and to know where to look when you encounter a problem where a supervision tree could help you.

When discussing this in Elixir Forum, Dimitar also came up with a good thought: Maybe the OTP is what pulls people in and helps them discover all those other nice things?

I came for OTP. I stayed for the functional programming.

As a community, I think we should make it clearer that you don’t have to use GenServers and that doing so might actually be harmful. Of course all those conference talks about how to use them, distributed systems etc. are very cool, but every now and then give me a talk about how a business succeeded by writing a fairly standard Phoenix application. Don’t overcomplicate things.

I’m not saying you shouldn’t learn about GenServers. You should. But know when to use them and when not to.

Lastly, if you disagree I want you to scream at me and teach me the error of my ways :smiley:

 

[1] Technically Phoenix’s web server, Cowboy, doesn’t use GenServers and supervision trees for normal HTTP request handling but its own thing. It has similar functionality though, so it still holds true that you don’t need to roll your own. Thanks to Hubert for pointing that out.

edit1: Correctly mention new edition of Elixir in Action, be more specific about Cowboy Processes and include a thought in closing paragraph. Thanks go to Saša, Hubert and Dimitar.

Tail Call Optimization in Elixir & Erlang – not as efficient and important as you probably think

(Automatic) Tail Call Optimization (TCO) is that great feature of Elixir and Erlang that everyone tells you about. It’s super fast, super cool and you should definitely always aim to make every recursive function tail-recursive. What if I told you that body-recursive functions can be faster and more memory efficient than their specially optimized tail-recursive counterparts?

Seems unlikely, doesn’t it? After all, every beginner’s book will mention TCO, tell you how efficient it is and that you should definitely use it. Plus, maybe you’ve tried body-recursion before in language X and your call stack blew up or it was horrendously slow. I did, and I thought tail-recursive functions were always better than body-recursive ones. Until one day, by accident, I wrote a non-tail-recursive function (so TCO didn’t apply to it). Someone told me, and eagerly I replaced it with its tail-recursive counterpart. Then I stopped for a second and benchmarked it – the results were surprising, to say the least.

Before we do a deep dive into the topic, let’s take a refresher on tail call optimization from Dave Thomas in Programming Elixir 1.2 (great book!):

(…) it ends up calling itself. In many languages, that adds a new frame to the stack. After a large number of messages, you might run out of memory.

This doesn’t happen in Elixir, as it implements tail-call optimization. If the last thing a function does is call itself, there’s no need to make the call. Instead, the runtime simply jumps back to the start of the function. If the recursive call has arguments, then these replace the original parameters.

Well, let’s get into this 🙂

Writing a map implementation

So let’s write an implementation of the map function. One will be body-recursive, one will be tail-recursive. I’ll add another tail-recursive implementation using ++ but no reverse and one that just does not reverse the list in the end. The one that doesn’t reverse the list of course isn’t functionally equivalent to the others as the elements are not in order, but if you wrote your own function and don’t care about ordering this might be for you. In an update here I also added a version where the argument order is different, for more on this see the results and edit6.


defmodule MyMap do
  def map_tco(list, function) do
    Enum.reverse(_map_tco([], list, function))
  end

  defp _map_tco(acc, [head | tail], function) do
    _map_tco([function.(head) | acc], tail, function)
  end

  defp _map_tco(acc, [], _function) do
    acc
  end

  def map_tco_arg_order(list, function) do
    Enum.reverse(do_map_tco_arg_order(list, function, []))
  end

  defp do_map_tco_arg_order([], _function, acc) do
    acc
  end

  defp do_map_tco_arg_order([head | tail], function, acc) do
    do_map_tco_arg_order(tail, function, [function.(head) | acc])
  end

  def map_tco_concat(acc \\ [], list, function)

  def map_tco_concat(acc, [head | tail], function) do
    map_tco_concat(acc ++ [function.(head)], tail, function)
  end

  def map_tco_concat(acc, [], _function) do
    acc
  end

  def map_body([], _func), do: []

  def map_body([head | tail], func) do
    [func.(head) | map_body(tail, func)]
  end

  def map_tco_no_reverse(list, function) do
    _map_tco([], list, function)
  end
end


map_body here is the function I originally wrote. It is not tail-recursive, as the last operation in this function is prepending head to the result list (the cons), not the call to map_body. Comparing it to all the other implementations, I’d also argue it’s the easiest and most readable, as we don’t have to care about accumulators or reversing the list.

Now that we have the code, let us benchmark the functions with benchee! Benchmark run on Elixir 1.3 with Erlang 19 on an i7-4790 on Linux Mint 17.3. Let’s just map over a large list and add one to each element of the list. We’ll also throw in the standard library implementation of map as a comparison baseline:


list = Enum.to_list(1..10_000)
map_fun = fn(i) -> i + 1 end

Benchee.run %{
  "map tail-recursive with ++" => fn -> MyMap.map_tco_concat(list, map_fun) end,
  "map with TCO reverse" => fn -> MyMap.map_tco(list, map_fun) end,
  "stdlib map" => fn -> Enum.map(list, map_fun) end,
  "map simple without TCO" => fn -> MyMap.map_body(list, map_fun) end,
  "map with TCO new arg order" => fn -> MyMap.map_tco_arg_order(list, map_fun) end,
  "map TCO no reverse" => fn -> MyMap.map_tco_no_reverse(list, map_fun) end
}, time: 10, warmup: 10


tobi@happy ~/github/elixir_playground $ mix run bench/tco_blog_post_detailed.exs
Benchmarking map tail-recursive with ++...
Benchmarking map TCO no reverse...
Benchmarking stdlib map...
Benchmarking map with TCO new arg order...
Benchmarking map simple without TCO...
Benchmarking map with TCO reverse...

Name                                 ips        average    deviation       median
map simple without TCO           6015.31       166.24μs     (±6.88%)     163.00μs
stdlib map                       5815.97       171.94μs    (±11.29%)     163.00μs
map with TCO new arg order       5761.46       173.57μs    (±10.24%)     167.00μs
map TCO no reverse               5566.08       179.66μs     (±6.39%)     177.00μs
map with TCO reverse             5262.89       190.01μs     (±9.98%)     182.00μs
map tail-recursive with ++          8.66    115494.33μs     (±2.86%)  113537.00μs

Comparison:
map simple without TCO           6015.31
stdlib map                       5815.97 - 1.03x slower
map with TCO new arg order       5761.46 - 1.04x slower
map TCO no reverse               5566.08 - 1.08x slower
map with TCO reverse             5262.89 - 1.14x slower
map tail-recursive with ++          8.66 - 694.73x slower


(The benchmarking was actually done by this slightly more verbose but equivalent script, so that it generates the CSV output and console output in the same run using benchee’s slightly more verbose interface. A feature to make that possible with the nicer interface is planned 😉 )

For the more visual here is also a graph showcasing the results (visualized using benchee_csv):

[Chart: iterations per second (higher is better)]

So what do we see? The body-recursive function seems to be as fast as the version from the standard library. The reported values are faster, but well within the margin of error. Plus, the median of the two is the same while the standard deviation is higher for the standard library version. This hints at the possibility that the worse average may be due to some outliers (resulting f.ex. from garbage collection). The tail-recursive version with ++ is VERY SLOW, but that’s because appending with ++ so frequently is a bad idea, as it needs to traverse to the end of the linked list every time around (O(n)). But that’s not the main point.

The main point is that the tail-recursive version is about 14% slower! Even the tail-recursive version that doesn’t reverse the list is slower than the body-recursive implementation!

Here seems like a good point to interject and mention that it was brought up in the comments (see edit11) that for significantly larger lists tail-recursive implementations get faster again. You can check out the results from a 10 Million item list.

What is highly irritating and surprising to me is that the tail-recursive function with a slightly different argument order is significantly faster than my original implementation – almost 10%. And this is not a one-off – it is consistently faster across a number of runs. You can see more about that implementation in edit6 below. Thankfully, José Valim chimed in about the argument order, adding the following:

The order of arguments will likely matter when we generate the branching code. The order of arguments will specially matter if performing binary matching. The order of function clauses matter too although I am not sure if it is measurable (if the empty clause comes first or last).

Now, maybe there is a better tail-recursive version (please tell me!) but this result is rather staggering but repeatable and consistent. So, what happened here?

An apparently common misconception

That tail-recursive functions are always faster seems to be a common misconception – common enough that it made the list of Erlang Performance Myths as “Myth: Tail-Recursive Functions are Much Faster Than Recursive Functions”! (Note: this section is currently being reworked so the name might change/link might not lead to it directly any more in the near-ish future)

To quote that:

A body-recursive list function and a tail-recursive function that calls lists:reverse/1 at the end will use the same amount of memory. lists:map/2, lists:filter/2, list comprehensions, and many other recursive functions now use the same amount of space as their tail-recursive equivalents.

So, which is faster? It depends. On Solaris/Sparc, the body-recursive function seems to be slightly faster, even for lists with a lot of elements. On the x86 architecture, tail-recursion was up to about 30% faster

The topic also recently came up on the erlang-questions mailing list again while talking about the rework of the aforementioned Erlang performance myths site (which is really worth the read!). In it Fred Hebert remarks (emphasis added by me):

In cases where all your function does is build a new list (or any other accumulator whose size is equivalent to the number of iterations and hence the stack) such as map/2 over nearly any data structure or say zip/2 over lists, body recursion may not only be simpler, but also faster and save memory over time.

He also has his own blog post on the topic.

But won’t it explode?

I had the same question. From my experience with the clojure koans I expected the body-recursive function to blow up the call stack given a large enough input. But, I didn’t manage to – no matter what I tried.

Seems it is impossible, as the BEAM VM that Erlang and Elixir run on differs in its implementation from other VMs: body recursion is limited only by available RAM:

Erlang has no recursion limit. It is tail call optimised. If the recursive call is not a tail call it is limited by available RAM

Memory consumption

So what about memory consumption? Let’s create a list with one hundred million elements (100_000_000) and map over it measuring the memory consumption. When this is done the tail-recursive version takes almost 13 Gigabytes of memory while the body-recursive version takes a bit more than 11.5 Gigabytes. Details can be found in this gist.

Why is that? Most likely because, with the large list, the tail-recursive version needs to create a new reversed version of the accumulator to return a correct result.

Body-recursive functions all the time now?

So let’s recap, the body-recursive version of map is:

  • faster
  • consumes less memory
  • easier to read and maintain

So why shouldn’t we do this every time? Well, there are other examples of course. Let’s take a look at a very dumb function deciding whether a number is even (implemented as an homage to this clojure koans exercise that showed how the call stack blows up in Clojure without recur):


defmodule Number do
  def is_even?(0), do: true

  def is_even?(n) do
    !is_even?(n - 1)
  end

  def is_even_tco?(n, acc \\ true)
  def is_even_tco?(0, acc), do: acc

  def is_even_tco?(n, acc) do
    is_even_tco?(n - 1, !acc)
  end
end



number = 10_000_000

Benchee.run %{
  "is_even?" => fn -> Number.is_even?(number) end,
  "is_even_tco?" => fn -> Number.is_even_tco?(number) end
}


tobi@happy ~/github/elixir_playground $ mix run bench/is_even.exs
Benchmarking is_even?...
Benchmarking is_even_tco?...

Name                  ips        average    deviation        median
is_even?            10.26     97449.21μs     (±0.50%)    97263.00μs
is_even_tco?         9.39    106484.48μs     (±0.09%)   106459.50μs

Comparison:
is_even?            10.26
is_even_tco?         9.39 - 1.09x slower


The tail-recursive version here is still 10% slower. But what about memory? Running the function with one hundred million as input takes 41 Megabytes for the tail-recursive version (mind you, that’s the whole elixir process) but almost 6.7 Gigabytes for the body-recursive version. Also, for that huge input the tail-recursive version took 1.3 seconds, while the body-recursive function took 3.86 seconds. So for larger inputs, it is faster.

Stark contrast, isn’t it? That’s most likely because this time around there is no huge list to be carried around or accumulated – just a boolean and a number. Here the fact that the body-recursive function needs to keep its call stack in RAM is much more damaging, as it needs to call itself one hundred million times.

Rerunning the map implementation with a significantly larger list, the tail-recursive versions also become faster than the standard library version and the body-recursive function. See edit11.

So, what now?

Tail-recursive functions should still be faster and more efficient for many or most use cases. Or at least that’s what I believe after years of being taught that tail call optimization leads to the fastest recursive functions ;). This post isn’t here to say that TCO is bad or slow. It is here to say and highlight that there are cases where body-recursive functions are faster and more efficient than tail-recursive ones. I’m also still unsure why the tail-recursive function that does not reverse the list is slower than the body-recursive version – it might be because it has to carry the accumulator around.

Maybe we should also take a step back in education and teaching and be more careful not to overemphasize tail call optimization and with it tail-recursive functions. Body-recursive functions can be a viable, or even superior, alternative and they should be presented as such.

There are, of course, cases where writing tail-recursive functions is absolutely vital, as Robert Virding, creator of Erlang, rightfully highlights:

No, the main case where is TCO is critical is in process top-loops. These functions never return (unless the process dies) so they will build up stack never to release it. Here you have to get it right. There are no alternatives. The same applies if you top-loop is actually composed of a set of mutually calling functions. There there are no alternatives. Sorry for pushing this again, and again, but it is critical. :slight_smile:

But what does this teach us in the end? Don’t take your assumptions stemming from other programming environments for granted. Also, don’t assume – always prove. So let’s finish with the closing words of the Erlang performance myths section on this:

So, the choice is now mostly a matter of taste. If you really do need the utmost speed, you must measure. You can no longer be sure that the tail-recursive list function always is the fastest.

edit1: there previously was a bug in is_even_tco? with a missing not, not caught by my tests as they were calling the wrong function 😦 Thanks to Puella for notifying me.

edit2/addendum: (see next edit/update) It was pointed out at lobste.rs that when running it in an erlang session the body-recursive function was significantly slower than the TCO version. Running what I believe to be equivalent code in Elixir and Erlang, it seems that the Erlang map_body version is significantly slower than in Elixir (2 to 3 times by the looks of it). I’d need to run it with an Erlang benchmarking tool to confirm this, though.

edit3/addendum2: The small tries mentioned in the first addendum were run in the shell, which is not a great idea. Using my little erlang knowledge I made something that compiled that “benchmark”, and map_body is as fast/faster again – see the thread. Benchmarking can be fickle and wrong if not done right, so I would still look forward to running this in a proper Erlang benchmarking tool or using Benchee from Erlang. But no time right now 😦

edit4: Added comment from Robert Virding regarding process top loops and how critical TCO is there. Thanks for reading, I’m honoured and surprised that one of the creators of Erlang read this 🙂 His full post is of course worth a read.

edit5: Following the rightful nitpick I don’t write “tail call optimized” functions any more but rather “tail-recursive” as tail call optimization is more of a feature of the compiler and not directly an attribute of the function

edit6: Included another version in the benchmark that swaps the argument order so that the list stays the first argument and the accumulator is the last argument. Surprisingly (yet again) this version is constantly faster than the other tail-recursive implementation but still slower than body recursive. I want to thank Paweł for pointing his version out in the comments. The reversed argument order was the only distinguishing factor I could make out in his version, not the assignment of the new accumulator. I benchmarked all the different variants multiple times. It is consistently faster, although I could never reproduce it being the fastest. For the memory consumption example it seemed to consume about 300MB less than the original tail-recursive function and was a bit faster. Also since I reran it either way I ran it with the freshly released Elixir 1.3 and Erlang 19. I also increased the runtime of the benchmark as well as the warmup (to 10 and 10) to get more consistent results overall. And I wrote a new benchmarking script so that the results shown in the graph are from the same as the console output.

edit7: Added a little TCO intro as it might be helpful for some 🙂

edit8: In case you are interested in what the bytecode looks like, I haven’t done it myself (not my area of expertise – yet) there is a gist from Sasa Juric showing how you can get the bytecode from an elixir function

edit9: Added José’s comment about the argument order, thanks for reading and commenting! 🙂

edit10: Added CPU and OS information 🙂

edit11: It was pointed out in the comments that the performance characteristics might differ for larger lists. And it does, taking 1000 times as many elements (10_000_000) all tail-recursive versions are significantly faster than the body-recursive version (and even the stdlib version).

edit12: Updated to reflect newer Benchee API as the old API with lists of tuples doesn’t work anymore 🙂