Object#extend leaks memory on Ruby 1.9.1


The Garbage Collector is really strange business in Ruby land. For me, it is currently the major performance drain. If you are not aware of its limitations, here's a list:

  1. The GC is mark and sweep, it needs to scan the whole heap for each run. It is directly affected by heap size O(n).
  2. The GC cannot be interrupted and hence all threads must wait for it to finish (shameful pause on big heaps).
  3. The GC marks objects in the objects themselves destroying any value of copy on write.
  4. The GC does not (edit: usually) give memory back to the system. What goes in does not (edit: usually) go out.
  5. It is a bit on the conservative side, meaning garbage can stick around because the collector cannot be sure it is actually garbage.
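Points 1 and 4 are easy to observe from inside a running process. Here's a small sketch (MRI 1.9+, where GC.count is available; the allocation sizes are arbitrary):

```ruby
# Watch full mark-and-sweep runs happen as the heap grows.
before  = GC.count                          # number of GC runs so far
garbage = Array.new(100_000) { "x" * 100 }  # allocate a pile of short strings
garbage = nil                               # drop the only reference
GC.start                                    # force a full mark-and-sweep pass
puts "GC ran #{GC.count - before} time(s)"  # the freed slots stay in the heap
```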


Needless to say, some of these are being addressed, especially 3 and 5. But the patches have not yet been accepted into the current Ruby release. I believe, though, that they will find their way into 1.8.x, which is being maintained by Engine Yard. The EY guys are really working hard to solve the issues of Ruby as a server platform, which is its most popular use today thanks to Rails.


Alas, my issue today involves Ruby 1.9.1 (it does not affect 1.8.x). See, I have built this toy server to experiment with multi-process applications and some Unix IPC facilities. I made the design a bit modular to make it easier to test and debug different aspects of the stack. So I have these TCP and HTTP handler modules that extend the connection object (a socket) whenever a connection is accepted. Here's a sample:


conn = server_socket.accept
conn.extend HttpHandler
# ... handle the request using the module's methods ...


This worked really well, and I was even able to chain handlers to get more stack functionality (a handler simply includes those it requires). Everything was great until I looked at memory usage.
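The chaining works through plain module inclusion. A minimal sketch of the idea; the module names and bodies here are made up, the real handlers live in the server code:

```ruby
module TcpHandler
  def handle(data)
    data
  end
end

module HttpHandler
  include TcpHandler  # a handler simply includes the handlers it requires
  def handle(data)
    "HTTP/1.1 200 OK\r\n\r\n" + super  # super walks down the handler chain
  end
end

conn = Object.new     # stand-in for an accepted socket
conn.extend HttpHandler
puts conn.handle("body")  # prints the status line followed by "body"
```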

I discovered that after showering the server with requests it starts to grow in size. This is acceptable, as it is making way for new objects. But given the way the GC works, it should have allocated enough heap slots after a few of those ab runs. On the contrary, even when I am hitting the same file with ab, the server keeps growing. After 10 or more ab runs (each doing 10,000 requests) it is still consuming more memory. So I suspected there was a leak somewhere. I tested a hello world and found that the increase was very consistent: every 10K requests the process gains 0.1 to 0.2 MB (10 to 20 bytes per request). So I started removing components one after another until I was left with a bare server that only requires socket and reactor.

 

When I tested that server, the process started gaining memory, then after 3 or 4 ab runs it stabilized. It would no longer increase its allocated memory no matter how many times I ran ab against it. So the next logical move was to re-insert the first level of the stack (the TCP handler module). Once I did that, the issue appeared again. So the next test was to disable the use of the TCP handler but still decorate my connections with it. The issue still appeared. Since the module does not override Module#extended to do any work when it extends an object, it became clear that Object#extend itself was the guilty party.

Instead of Object#extend I tried reopening the BasicSocket class and including the required module there. After doing that, the memory usage pattern resembled the bare-bones server: it would increase for a few runs and then remain flat as long as you keep hitting the same request.
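In code, the workaround looks like this. HttpHandler here is a stand-in for the real module, and the loopback handshake is only there to show that accepted connections pick up the methods:

```ruby
require 'socket'

module HttpHandler
  def handle_http   # hypothetical handler method
    "handled"
  end
end

# Reopen the socket base class once, at load time, instead of calling
# Object#extend on every accepted connection:
class BasicSocket
  include HttpHandler
end

# Every socket now carries the handler methods with no per-object extend:
server = TCPServer.new("127.0.0.1", 0)              # port 0 picks a free port
client = TCPSocket.new("127.0.0.1", server.addr[1])
conn   = server.accept
puts conn.respond_to?(:handle_http)                 # => true
[conn, client, server].each(&:close)
```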

To isolate the problem further I created this script:

# This code is Ruby 1.9.x and above only

@extend = ARGV[0]

module BetterHash
  def blabla
  end
end

unless @extend
  class Hash
    include BetterHash
  end
end

t = Time.now
1_000_000.times do
  s = {}
  s.extend BetterHash if @extend 
end
after = Time.now - t
puts "done with #{GC.count} gc runs after #{after} seconds"
sleep # so that it doesn't exit before we check the memory

using extend:
351 GC runs, 9.108 seconds, 18.7 MB

using include:
117 GC runs, 0.198 seconds, 2.8 MB

Besides being much faster, the resulting process was much smaller: around 16 MB smaller. I suspect the leak is around 16 bytes or a little less per extend invocation. This means that a server using a single extend per request will grow by around 160 KB for every 10K requests. Not that huge, but it will pile up fast if the server is left running for a while under heavy load.
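The per-call figure falls straight out of the two measurements above; a quick back-of-the-envelope check:

```ruby
# Back-of-the-envelope check using the figures measured above.
leaked_mb   = 18.7 - 2.8                       # extra resident memory with extend
per_extend  = leaked_mb * 1024 * 1024 / 1_000_000
per_10k_req = per_extend * 10_000 / 1024.0     # one extend per request
puts "%.1f bytes per extend" % per_extend      # ~16.7 bytes
puts "%.0f KB per 10K requests" % per_10k_req  # ~163 KB
```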



A quick grep through the Rails sources showed that this pattern is used heavily throughout the code. But it is used to extend base classes rather than objects, so it will not be invoked on every request, and the effect will mostly be limited to the initial process size (a few bytes, actually). You should avoid using it dynamically at request-serving time, though, until it gets fixed.

 

A fast, simple, pure Ruby reactor library


Please welcome Reactor, a reactor library with the very original name of "Reactor".

What is a reactor, anyway?

A reactor library is one that provides an asynchronous event handling mechanism. Ruby already has a couple of those; the most prominent are EventMachine and Rev.

Many high-performing Ruby applications like Thin and Evented Mongrel use EventMachine for event handling. Both Rev and EventMachine are built atop native reactor implementations written in C or C++. While this ensures high performance, it makes some aspects of integration with Ruby a bit quirky, sometimes even at a noticeable performance cost.

This is why I thought of building Reactor: a much simpler reactor library in pure Ruby that attempts to use as much of Ruby's built-in classes and standard library as possible. It only provides a minimal API that does not attempt to be too smart. It differs from EventMachine and Rev in the following aspects:


  1. Pure Ruby, no C or C++ code involved
  2. Very small (~100 lines of code)
  3. Uses the vanilla Ruby socket and server implementations
  4. Decent (high) performance on Ruby 1.9.1
  5. Ruby threading friendly (naturally)
  6. You can have multiple reactors running (like Rev and unlike EventMachine)
Usage is simple. Here's an Echo server that uses Reactor:

require 'reactor'
require 'socket'

reactor = Reactor::Base.new
server = TCPServer.new("0.0.0.0", 8080)
reactor.attach(:read, server) do |server|
  conn = server.accept
  conn.write(conn.gets)
  conn.close
end
reactor.run # blocking call, will run forever

The server is a normal Ruby TCPServer. It attaches itself to the reactor and asks to be notified if there is data to be read on the wire. A block is provided that will handle those notifications. Alternatively, the server can implement a notify_readable method that will be fired instead.

Any IO object can be attached to the reactor, but it doesn't make much sense to attach actual files since they will block upon reading or writing anyway. Sockets and pipes will work in a non-blocking manner, though.

Reactor uses Ruby's IO.select behind the scenes. This limits its scalability compared to EventMachine or Rev, which can utilize epoll and kqueue, both of which scale much better. This is not a major concern, though: most servers listen on a few file descriptors most of the time, a case where select is actually a bit faster. Besides, one can hope that Ruby itself will be able to use epoll and kqueue some day, which would translate into a direct benefit for Reactor.
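For the curious, the core of such a select-based loop fits in a few lines. This is an illustrative sketch of the idea, not Reactor's actual internals:

```ruby
require 'socket'

handlers = {}                            # io => callback, the dispatch table
server   = TCPServer.new("127.0.0.1", 0) # port 0 picks a free port
port     = server.addr[1]

handlers[server] = lambda do |srv|
  conn = srv.accept
  conn.write(conn.gets)                  # echo one line back
  conn.close
end

# Drive a single request through the loop from a client thread:
client = Thread.new do
  s = TCPSocket.new("127.0.0.1", port)
  s.puts "hello"
  s.gets
end

# One iteration of the event loop: block in select, fire ready callbacks.
ready, = IO.select(handlers.keys, nil, nil, 5)
ready.each { |io| handlers[io].call(io) }
echoed = client.value                    # the echoed line, "hello\n"
puts echoed
server.close
```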

Ruby Strikes Back


If you are not following Mauricio Fernandez's blog, then please do yourself a favor and subscribe to it. Mauricio's writings are very interesting and informative. In one of his posts he gives an account of re-implementing his blog in OCaml using the OCsigen (web server + framework) library. Mauricio did some benchmarking of the OCsigen environment against Rails and even a C FastCGI implementation. Naturally, one would expect OCaml to be orders of magnitude faster than Ruby, but the benchmark showed really abysmal performance for Rails vs. OCsigen. We are talking 260 requests per second vs. 4500 requests per second in a single-process test! That's more than a 17x difference! I decided that Ruby can do better.

Looking at what OCsigen offers revealed that Rails is overkill in comparison. I thought a nice Ruby alternative would be the mystery web server + framework called Unicycle (never heard of it? you don't know what you're missing). Since OCsigen has Lwt (a lightweight cooperative threading library) at its core for concurrency, I added a fiber wrapper to Unicycle's request-processing path so that we get a similar overhead (the testing was done using Ruby 1.9.1).

Here are my results:

Hello World - Unicycle, Single Process: 7378 requests/second

Please note that this is running on my Intel mobile Core 2 Duo 2.0 GHz processor vs. the 3 GHz desktop AMD Athlon64 that was used for the original tests (the desktop should be roughly 50% faster than my mobile Core 2).

I decided to take it further still. Mauricio mentioned that he was able to get performance above 2000 req/s from OCsigen when benchmarking the blog page we are discussing here. So I created an SQLite database (he mentioned somewhere that he is using SQLite) and inserted the same blog entry (with very few modifications) in a structured manner. I didn't bother with comments, though (out of laziness). Sequel was used to connect and fetch the record from the database, and an rhtml template was rendered using Erubis. The result was a page very similar to the original blog post. ApacheBench was used to benchmark the page.

Unicycle + Fibers + Sequel (SQLite) + Erubis, Single Process: 1296 requests/second

During the testing, the Unicycle process was between 13 MB and 21 MB (that's 3x to 4x the size of the OCsigen process).

Considering that the components found in a laptop are usually inferior to their desktop counterparts, I believe this at least equals the figure reported for OCsigen's performance.

How can Ruby achieve such performance? By careful selection of components:

First off, Ruby 1.9.1: everybody should start using it for their next project. It is much faster and much easier on memory.

Unicycle is built atop EventMachine and the EventMachine HTTP server, both C-based speed demons. Unicycle itself is a minimal framework that doesn't try to be too smart.

Erubis is a nice surprise. Pure Ruby and decently fast are not commonly found together, but kudos to the authors of Erubis; they somehow did it.

Conclusion

Ruby is faster than OCaml. Wrong! OCaml is a lot faster than Ruby. But thanks to hard work by some prominent Rubyists, you can have a Ruby setup that performs decently enough to make you proud.