New Reactor Release


Reactor has reached version 0.4. This release brings a few notable features:


  1. More efficient timer implementation


    In this release, timers are removed immediately once they are cancelled. This is done in a (semi) efficient manner; more importantly, it removes the overhead of lingering cancelled timers. This change makes it possible to implement connection timeouts efficiently.
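
    One way to get immediate removal is sketched below, assuming timers are kept in an array sorted by fire time so that a cancelled timer can be located with a binary search and deleted on the spot. TimerQueue and its methods are hypothetical names for illustration, not Reactor's actual implementation, and Array#bsearch_index needs Ruby 2.3 or later. Insertion and deletion still shift array elements, so this is only semi-efficient, but no dead timers are left behind.

    ```ruby
    # Hypothetical timer queue (not Reactor's code): timers stay sorted by
    # [fire_at, seq] so a cancelled timer can be found and removed at once.
    class TimerQueue
      Timer = Struct.new(:fire_at, :seq, :callback) do
        def key
          [fire_at, seq] # seq breaks ties between timers due at the same time
        end
      end

      def initialize
        @timers = []
        @seq = 0
      end

      # Schedule the block to run +delay+ seconds after +now+.
      def add(delay, now = Time.now.to_f, &callback)
        timer = Timer.new(now + delay, @seq += 1, callback)
        @timers.insert(index_for(timer.key) || @timers.size, timer)
        timer
      end

      # Remove a pending timer immediately; returns false if it is already gone.
      def cancel(timer)
        idx = index_for(timer.key)
        return false unless idx && @timers[idx].equal?(timer)
        @timers.delete_at(idx)
        true
      end

      # Fire, in order, every timer whose time has come.
      def fire_due(now = Time.now.to_f)
        while (t = @timers.first) && t.fire_at <= now
          @timers.shift
          t.callback.call
        end
      end

      def size
        @timers.size
      end

      private

      # Binary search for the first timer whose key is >= +key+.
      def index_for(key)
        @timers.bsearch_index { |t| (t.key <=> key) >= 0 }
      end
    end

    q = TimerQueue.new
    fired = []
    q.add(1, 0) { fired << :a }
    b = q.add(2, 0) { fired << :b }
    q.add(3, 0) { fired << :c }
    q.cancel(b)   # b is removed right away, nothing lingers
    p q.size      # => 2
    q.fire_due(5)
    p fired       # => [:a, :c]
    ```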



  2. The reactor loop is now fiber (not thread) re-entrant


    The reactor can now run inside a fiber. When that fiber yields, another fiber can grab the reactor and call run on it, and the loop simply continues where it left off. This makes it possible to build fibered servers with near-zero overhead for fast, non-blocking connections.



    Here's how a fibered server would normally be written:



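    require 'socket'
    require 'reactor'

    @socket = TCPServer.new(host, port) # host and port as appropriate
    @reactor = Reactor::Base.new
    @reactor.attach(:read, @socket) do
      conn = @socket.accept
      # one new fiber per accepted connection
      Fiber.new do
        # handle connection here
      end.resume
    end
    @reactor.run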


    As you can see, we spawn a fiber for each connection. This adds up when there are many connections, since each fiber requires a 4KB stack.



    Now it can be done like this:



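    require 'socket'
    require 'reactor'

    @socket = TCPServer.new(host, port) # host and port as appropriate
    @reactor = Reactor::Base.new
    @reactor.attach(:read, @socket) do
      conn = @socket.accept
      # handle connection here, directly inside the reactor's fiber
    end
    loop do
      # Run the reactor inside a fiber. resume only returns once that fiber
      # yields (a connection blocked); the next iteration then hands the
      # reactor to a brand new fiber, which continues where the last one left off.
      Fiber.new do
        @reactor.run
      end.resume
    end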


    Now the whole reactor loop runs in a single fiber. If a connection does not block, it is handled as if there were no fiber overhead at all. Only when a connection blocks does the reactor loop break out; a new fiber is then created and runs the reactor again. This virtually removes the fiber overhead for non-blocking connections.
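
    The hand-off can be tried without any sockets. Below is a minimal sketch, assuming a toy event loop (ToyLoop, a hypothetical name, not Reactor's API) whose pending events live on the loop object rather than on any fiber's stack. A handler "blocks" by calling Fiber.yield(:blocked); the driver then starts a fresh fiber that picks the loop up where the previous one stopped.

    ```ruby
    # A hypothetical, minimal re-entrant event loop (not Reactor's code).
    # Pending events live on the loop object rather than on a fiber's stack,
    # so whichever fiber calls #run next continues where the last one stopped.
    class ToyLoop
      def initialize(events)
        @pending = events.dup
      end

      def done?
        @pending.empty?
      end

      def run
        until @pending.empty?
          event = @pending.shift # dequeue first, so a yield cannot replay it
          event.call             # a handler may Fiber.yield(:blocked)
        end
      end
    end

    log = []
    events = [
      -> { log << :fast1 },
      -> { Fiber.yield(:blocked); log << :slow_resumed }, # this one "blocks"
      -> { log << :fast2 }
    ]
    reactor = ToyLoop.new(events)

    parked = []        # fibers suspended inside a blocking handler
    fibers_created = 0
    until reactor.done?
      fibers_created += 1
      fiber = Fiber.new { reactor.run; :finished }
      # resume returns :blocked if a handler yielded, :finished otherwise
      parked << fiber if fiber.resume == :blocked
    end
    parked.each(&:resume) # later, the blocked work is woken up and completes

    p log            # => [:fast1, :fast2, :slow_resumed]
    p fibers_created # => 2
    ```

    Three events needed only two fibers: one that got stuck in the blocking handler, and one that carried on with the rest of the loop.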




  3. Reactor#next_tick is now thread safe


    When accessing the reactor from other threads, you can now schedule events using next_tick, which makes sure the event is put in place gracefully even when multiple threads are fighting over the reactor. That's it, though: you cannot safely call any other reactor methods from multiple threads, and it is up to you to ensure that the block passed to next_tick won't access variables shared among threads in an unsafe manner (it will be run in the context of the reactor's thread).
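
    A common way to get this behaviour is to guard only the hand-off with a mutex: other threads append blocks to a queue under the lock, and the reactor's thread swaps the queue out under the same lock and runs the blocks outside it. The sketch below assumes that design; TickQueue and run_ticks are hypothetical names, not Reactor's implementation.

    ```ruby
    # Hypothetical sketch of a thread-safe next_tick (not Reactor's code).
    # Only the hand-off of blocks is synchronized; the blocks themselves
    # are run later by whichever thread calls run_ticks (the reactor's).
    class TickQueue
      def initialize
        @lock = Mutex.new
        @ticks = []
      end

      # Safe to call from any thread.
      def next_tick(&block)
        @lock.synchronize { @ticks << block }
      end

      # Called by the reactor thread once per loop iteration.
      def run_ticks
        ticks = @lock.synchronize do
          pending = @ticks
          @ticks = []
          pending
        end
        ticks.each(&:call) # run outside the lock, in the reactor's thread
      end
    end

    queue = TickQueue.new
    results = [] # only touched by the thread that calls run_ticks
    workers = 4.times.map do |i|
      Thread.new { 100.times { |j| queue.next_tick { results << [i, j] } } }
    end
    workers.each(&:join)
    queue.run_ticks
    p results.size # => 400
    ```

    In a real reactor, run_ticks would be called from the reactor's own thread on every loop iteration while other threads keep calling next_tick; the workers are joined first here only to make the output deterministic.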




Besides that, there are a few bug fixes here and there.



Grab it from here

The Ruby 19x Web Servers Booklet


I have just finished a draft of my Ruby web server review and uploaded it to my Scribd account. This is still a work in progress; I am looking at improving it further and may include Unicorn tests as well.

Here is the document:

While on the topic of Ruby web servers, I highly recommend this article by Ryan Tomayko on Unicorn's architecture.

Edit: I didn't know that Scribd requires you to log in before you can download files. Here is a direct link.

NeverBlock Saves The Day


It started with too many processes

A great demonstration of how valuable NeverBlock is came just a little while ago. Alsaha.com is one of the oldest forums in the Middle East. Recently, while I was at eSpace, my previous employer, I helped rebuild the whole thing and move it to Ruby on Rails.

Many web users rely on sites like Alsaha.com to follow commentary on breaking news, so during such events the daily page views can jump from a couple hundred thousand to millions. Add to that the fact that some (unavoidable) slow database operations must be done online for the administrators.

Initially we had separate web server instances for normal users and administrators to avoid stalling Rails processes on long-running queries. And since we had a capable back end, we coupled it with a formidable number of Rails processes as front-end servers; back then, this was the only way to exploit our back end's concurrency. Needless to say, this army of front-end processes consumed lots and lots of memory.

Enter NeverBlock

After the initial production testing on meOwns.com, we considered the gains we could get from using NeverBlock with Alsaha, so we planned the move and were able to drastically reduce the Rails instance count. We use only 4 now, and that is to utilize the CPU cores rather than for concurrency. Those 4 processes serve all the user and administrative content. Thanks to NeverBlock, no one has to wait for anyone else.

The Real Test

There was a very important news item recently that resulted in a traffic spike on the order of millions of page views in a few hours. Thankfully, the NeverBlock setup easily accommodated it without noticeable degradation (actually, there was some degradation, attributed to a bug in the logging logic that was quickly discovered and fixed). The 4 small instances kept up with the load even though some slow operations were running on them while they served loads of quick requests to lots of users.

Conclusion

I am grateful to have written code that turned out to be useful to others. I hope this becomes a pattern.

My RubyKaigi 2009 Presentations


Better late than never :)

I totally forgot to link to my RubyKaigi 2009 presentations, so without further ado, here they are:

NeverBlock, the video:




NeverBlock, the slides:
NeverBlock-RubyKaigi2009


Arabesque, the slides:

Enjoy

Improving Ruby's VM performance using gcc profiling tools


Digging through some of the old Python archives, I found a really interesting post by Mike Pall of LuaJIT fame (the post can be found here). Pall was talking about possible ways to improve Python's performance. One of his findings was that the branching code was one of the major culprits hurting Python's runtime performance, mainly because it was not optimized for the commonly used paths. This is understandable: a compiler like GCC has no prior knowledge of how the binary will behave at run time, so its branch prediction remains just that, a prediction that might be skewed from the true state of affairs. In his email, Mike offered a technique showing that with a little compiler trickery this effect could be somewhat reduced. I followed his steps, but this time for the Ruby 1.9.1-p129 binary.

First I ran configure on the source code, then added -fprofile-arcs to optflags in the Makefile. Running make then produced a Ruby binary, lots of .o files, and lots of .gcda and .gcno files.

What we have now is a binary that will profile the branching behaviour of the application. Next, we need to run the binary in a real-life situation so that it can record how branches are taken in the wild. I tried a simple benchmark application that included lots of math and string operations. After that, the Makefile was changed once more, this time replacing -fprofile-arcs with -fbranch-probabilities, which asks the compiler to use the branch probabilities recorded during the previous run.

All .o files were deleted and make was run again, resulting in a shiny new Ruby binary ready for testing.
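
Switching the Makefile between the two phases is easy to script. The sketch below is a hypothetical Ruby helper, assuming the generated Makefile has a line starting with "optflags ="; it puts -fprofile-arcs or -fbranch-probabilities on that line, removing the other one, and leaves the rest of the file alone. The actual build commands are only listed as comments, and training_script.rb is a placeholder name.

```ruby
# Hypothetical helper for the two-phase build described above: put the
# flag for the requested phase on the optflags line of a Makefile.
PHASE_FLAGS = {
  instrument: "-fprofile-arcs",        # phase 1: record branch behaviour
  optimize:   "-fbranch-probabilities" # phase 2: use the recorded profile
}.freeze

def set_profile_phase(makefile_text, phase)
  flag = PHASE_FLAGS.fetch(phase)
  makefile_text.gsub(/^optflags\s*=.*$/) do |line|
    # drop whichever phase flag is already there, then append ours
    PHASE_FLAGS.each_value { |f| line = line.gsub(/\s*#{Regexp.escape(f)}/, "") }
    "#{line} #{flag}"
  end
end

# Outline of the steps (run by hand, not executed here):
#   ./configure
#   File.write("Makefile", set_profile_phase(File.read("Makefile"), :instrument))
#   make                      # instrumented binary
#   ./ruby training_script.rb # record branch statistics (placeholder script)
#   rm *.o                    # keep the .gcda/.gcno files
#   File.write("Makefile", set_profile_phase(File.read("Makefile"), :optimize))
#   make                      # binary built with the recorded probabilities

mk = "CC = gcc\noptflags = -O2\ndebugflags = -g\n"
puts set_profile_phase(mk, :instrument)
```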

I ran the new binary head to head against the regular one using several Ruby scripts. The new binary was almost consistently faster, usually by a small margin, but for one benchmark the difference reached 40%. Even though these scripts differed widely from the one used for profiling, it was apparent that the VM itself gained a little from this optimization, which should benefit all Ruby scripts.
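
A head-to-head run like this can be automated. The following is a minimal sketch of a comparison harness, not the setup used for the numbers above: it assumes you pass the paths of the regular and the profile-optimized ruby binaries plus the benchmark scripts, and it keeps the best wall-clock time of a few runs for each. It needs Ruby 2.1 or later for Process.clock_gettime.

```ruby
require 'rbconfig'

# Hypothetical head-to-head harness: run one script under a Ruby binary
# a few times and keep the best wall-clock time.
def best_time(ruby_bin, script, runs: 3)
  Array.new(runs) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    ok = system(ruby_bin, script, out: File::NULL)
    raise "#{ruby_bin} #{script} failed" unless ok
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end.min
end

# How much faster (in percent) the new binary was than the old one.
def improvement(old_time, new_time)
  (old_time - new_time) / old_time * 100.0
end

if __FILE__ == $0
  # usage: ruby compare.rb OLD_RUBY NEW_RUBY script1.rb [script2.rb ...]
  old_ruby, new_ruby, *scripts = ARGV
  scripts.each do |script|
    t_old = best_time(old_ruby, script)
    t_new = best_time(new_ruby, script)
    printf("%-25s old %.3fs  new %.3fs  %+.1f%%\n",
           script, t_old, t_new, improvement(t_old, t_new))
  end
end
```

Taking the minimum of several runs filters out some scheduler and cache noise, which matters when the expected differences are only a few percent.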

At the end of the day, an average 5% improvement is nothing to brag about, but it does show that the Ruby VM still has a way to go. Considering that a silly, mostly static attempt achieved that, I think a JIT engine with tracing optimizations could do wonders for this little language: it would adapt to the application's usage patterns and should be able to eliminate lots of slow code paths in ways that static optimization cannot predict beforehand.

Object#extend leaks memory on Ruby 1.9.1


The garbage collector is really strange business in Ruby land. For me, it is currently the major performance drain. If you are not aware of its limitations, here's a list:

  1. The GC is mark-and-sweep; it needs to scan the whole heap on each run, so its cost grows linearly with heap size (O(n)).
  2. The GC cannot be interrupted, and hence all threads must wait for it to finish (a shameful pause on big heaps).
  3. The GC stores its marks in the objects themselves, destroying any benefit of copy-on-write.
  4. The GC does not (edit: usually) give memory back to the system. What goes in does not (edit: usually) go out.
  5. It is a bit on the conservative side, meaning garbage can stay around because the GC cannot be sure it is garbage.


Needless to say, some of these are being addressed, especially 3 and 5, but the patches have not yet been accepted into the current Ruby release. I believe, though, that they will find their way into 1.8.x, which is being maintained by Engine Yard. The EY guys are working really hard to solve Ruby's issues as a server platform, which is its most popular use today thanks to Rails.


Alas, my issue today involves Ruby 1.9.1 (it does not affect 1.8.x). See, I have built a toy server to experiment with multi-process applications and some Unix IPC facilities. I made the design a bit modular to make it easier to test and debug different aspects of the stack, so I have TCP and HTTP handler modules that extend the connection object (a socket) whenever a connection is accepted. Here's a sample:


conn = server_socket.accept
conn.extend HttpHandler
# ...


This worked really well, and I was even able to chain handlers to get more stack functionality (a handler simply includes the modules it requires). All was great until I looked at memory usage.
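
The chaining works because a module that includes another brings it along when it is used to extend an object. Here is a minimal sketch of that pattern; TcpHandler, read_line and request_line are hypothetical stand-ins for the real handlers, and a StringIO takes the place of an accepted socket so the example runs on its own.

```ruby
require 'stringio'

# Hypothetical lower layer: line-oriented reading on the connection.
module TcpHandler
  def read_line
    gets.to_s.chomp
  end
end

# Hypothetical upper layer: it includes the layer it builds on, so
# extending a connection with HttpHandler also brings in TcpHandler.
module HttpHandler
  include TcpHandler

  def request_line
    method, path, version = read_line.split(" ", 3)
    { method: method, path: path, version: version }
  end
end

# A StringIO stands in for the accepted socket.
conn = StringIO.new("GET /index.html HTTP/1.0\r\nHost: example\r\n\r\n")
conn.extend HttpHandler # one extend call per connection, as in the post
req = conn.request_line
p req # => {:method=>"GET", :path=>"/index.html", :version=>"HTTP/1.0"}
```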

I discovered that after showering the server with requests, it started to grow in size. This is acceptable, as it is making room for new objects. But given the way the GC works, it should have allocated enough heap slots after a few of those ab runs. On the contrary, even when I was hitting the same file with ab, the server kept growing. After 10 or more ab runs (each doing 10,000 requests), it was still consuming more memory, so I suspected there was a leak somewhere. I tested a hello world and found that the increase was very consistent: every 10K requests the process gained 0.1 to 0.2 MB (10 to 20 bytes per request). So I started removing components one after another until I was left with a bare server that only requires socket and reactor.

 

When I tested that server, the process started to gain memory, then after 3 or 4 ab runs it stabilized and would no longer increase its allocated memory no matter how many times I ran ab on it. So the next logical move was to re-insert the first level of the stack (the TCP handler module). Once I did that, the issue appeared again. The next test was to stop using the TCP handler's functionality while still decorating my connections with it. The issue still appeared. Since the module does not override the Module#extended hook to do any work when it extends an object, it became clear that the extend call itself was the guilty party.

Instead of Object#extend, I tried reopening the BasicSocket class and including the required module there. After doing that, the memory usage pattern resembled that of the bare-bones server: it would increase for a few runs and then remain flat as long as I kept hitting the same request.
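
In code, the workaround amounts to mixing the module into the socket class once, at load time. A minimal sketch, with TcpHandler and handled_by as hypothetical stand-ins for the real handler; UNIXSocket.pair (Unix-only) is used here just to get two connected sockets without a server.

```ruby
require 'socket'

# Hypothetical stand-in for the tcp handler module from the text.
module TcpHandler
  def handled_by
    :tcp_handler
  end
end

# The workaround: include the module into the socket class a single time,
# instead of calling conn.extend(TcpHandler) on every accepted connection.
class BasicSocket
  include TcpHandler
end

# Every BasicSocket subclass instance now has the handler methods.
a, b = UNIXSocket.pair
result = a.handled_by
p result                     # => :tcp_handler
p a.singleton_methods.empty? # => true, nothing was mixed in per object
a.close
b.close
```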

To isolate the problem further I created this script:

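# This code is Ruby 1.9.x and above only
# Run it with no arguments to include BetterHash into Hash once,
# or with any argument (e.g. "extend") to extend every new hash instead.

@extend = ARGV[0]

module BetterHash
  def blabla
  end
end

# include path: mix the module into Hash a single time, up front
unless @extend
  class Hash
    include BetterHash
  end
end

t = Time.now
1_000_000.times do
  s = {}
  s.extend BetterHash if @extend # extend path: one extend call per object
end
after = Time.now - t
puts "done with #{GC.count} gc runs after #{after} seconds"
sleep # so that it doesn't exit before we check the memory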

using extend:
351 GC runs, 9.108 seconds, 18.7 MB

using include:
117 GC runs, 0.198 seconds, 2.8 MB

Besides being much faster, the resulting process was much smaller: around 16MB smaller. I suspect the leak is around 16 bytes or a little less per extend invocation. This means that a server that uses a single extend per request will grow by around 160KB every 10K requests. Not huge, but it will pile up fast if left running for a while under heavy load.
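
The per-request estimate follows directly from the two measurements above. As a quick sanity check, assuming the whole size difference comes from the million extend calls and taking 1 MB as 1,000,000 bytes:

```ruby
# Back-of-the-envelope check of the figures above (1 MB taken as 10**6 bytes).
extend_mb  = 18.7      # process size with one extend per hash
include_mb = 2.8       # process size with a single include
calls      = 1_000_000 # extend calls made by the script

bytes_per_extend = (extend_mb - include_mb) * 1_000_000 / calls
kb_per_10k       = bytes_per_extend * 10_000 / 1_000

printf("%.1f bytes per extend\n", bytes_per_extend) # 15.9
printf("%.0f KB per 10K requests\n", kb_per_10k)    # 159
```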



A quick grep through the Rails sources showed that this pattern is used heavily throughout the code, but to extend base classes rather than individual objects. Hence it is not invoked on every request, and the effect is mostly limited to the initial startup size (a few bytes, actually). Until it gets fixed, though, you should avoid using extend dynamically while serving requests.

 

A fast, simple, pure Ruby reactor library


Please welcome Reactor, a reactor library with the very original name of "Reactor".

What is a reactor anyway?

A reactor library is one that provides an asynchronous event handling mechanism. Ruby already has a couple of those; the most prominent are EventMachine and Rev.

Many high-performance Ruby applications like Thin and Evented Mongrel use EventMachine for event handling. Both Rev and EventMachine build atop native reactor implementations written in C or C++. While this ensures high performance, it makes some aspects of integration with Ruby a bit quirky, sometimes even at a noticeable performance cost.

This is why I thought of building Reactor: a much simpler reactor library in pure Ruby that attempts to use as much of Ruby's built-in classes and standard libraries as possible. It only provides a minimal API that does not attempt to be too smart. It differs from EventMachine and Rev in the following aspects:


  1. Pure Ruby, no C or C++ code involved
  2. Very small (~100 lines of code)
  3. Uses the vanilla Ruby socket and server implementations
  4. Decent (high) performance on Ruby 1.9.1
  5. Ruby threading friendly (naturally)
  6. You can have multiple reactors running (like Rev and unlike EventMachine)

Usage is simple. Here's an echo server that uses Reactor:

require 'reactor'
require 'socket'

reactor = Reactor::Base.new
server = TCPServer.new("0.0.0.0", 8080)
reactor.attach(:read, server) do |server|
  conn = server.accept
  conn.write(conn.gets) # echo back one line
  conn.close
end
reactor.run # blocking call, will run forever

The server is a normal Ruby TCPServer. It attaches itself to the reactor and asks to be notified if there is data to be read on the wire. A block is provided that will handle those notifications. Alternatively, the server can implement a notify_readable method that will be fired instead.
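
To make the attach/notify mechanics concrete, here is a hypothetical toy reactor built on IO.select, the same primitive Reactor uses. ToyReactor, LineCollector and their methods are illustrative names, not Reactor's API or source. A handler is either a block, or an object that responds to to_io and notify_readable; IO.select accepts such objects and hands back the originals. Pipes stand in for sockets so the example runs without a network.

```ruby
# A hypothetical, minimal select-based reactor (not Reactor's source).
class ToyReactor
  def initialize
    @readers = {} # io (or to_io object) => block, or nil to use notify_readable
  end

  def attach(mode, io, &block)
    raise ArgumentError, "only :read is sketched here" unless mode == :read
    @readers[io] = block
  end

  def detach(io)
    @readers.delete(io)
  end

  # Runs until nothing is attached (a real server loop would run forever).
  def run(timeout = 1)
    until @readers.empty?
      readable, = IO.select(@readers.keys, nil, nil, timeout)
      next unless readable # select timed out
      readable.each do |io|
        next unless @readers.key?(io) # detached earlier in this round
        callback = @readers[io]
        callback ? callback.call(io) : io.notify_readable
      end
    end
  end
end

# An object-style handler: IO.select accepts anything responding to to_io.
class LineCollector
  attr_reader :lines

  def initialize(io, reactor)
    @io, @reactor, @lines = io, reactor, []
  end

  def to_io
    @io
  end

  def notify_readable
    line = @io.gets
    if line
      @lines << line.chomp
    else
      @reactor.detach(self) # EOF: stop watching
    end
  end
end

reactor = ToyReactor.new
r1, w1 = IO.pipe
r2, w2 = IO.pipe
received = []

reactor.attach(:read, r1) do |io| # block-style handler
  data = io.gets
  if data
    received << data.chomp
  else
    reactor.detach(io)
  end
end
collector = LineCollector.new(r2, reactor)
reactor.attach(:read, collector)  # notify_readable-style handler

w1.puts "hello"; w1.close
w2.puts "world"; w2.close
reactor.run
p received        # => ["hello"]
p collector.lines # => ["world"]
```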

Any IO object can be attached to the reactor but it doesn't make much sense to attach actual files since they will block upon reading or writing anyway. Sockets and pipes will work in a non-blocking manner though.
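
What "non-blocking" buys for sockets and pipes can be seen with a plain pipe: read_nonblock raises IO::WaitReadable instead of waiting when no data is there, which is what lets an event loop go back to IO.select. This is a small illustration using only core Ruby (1.9.2 or later), not code from Reactor.

```ruby
r, w = IO.pipe

begin
  r.read_nonblock(1024)
  first = :got_data
rescue IO::WaitReadable
  first = :would_block # nothing written yet, so we are told to wait
end

w.write("ping")
second = r.read_nonblock(1024) # data is ready now, returns immediately

p first  # => :would_block
p second # => "ping"
r.close
w.close
```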

Reactor uses Ruby's IO.select behind the scenes. This limits its ability to scale compared to something like EventMachine or Rev, which can use epoll and kqueue, both of which scale much better. This is not a major concern, though: most servers listen on only a few fds most of the time, which is a case where select is a bit faster anyway. Besides, one can hope that Ruby will be able to use epoll and kqueue some day, which would translate directly into a benefit for Reactor.