The case for a nonblocking Ruby stack

In a previous post I talked about the problems that plague web-based Ruby applications with regard to processor and memory use, and I proposed non-blocking IO as a solution. In a follow-up post I benchmarked nonblocking versus blocking performance using the async facilities in the Ruby Postgres driver in combination with Ruby Fibers. The results were so promising (up to 40% improvement) that I decided to take the benchmarking effort one step further. I monkey patched the Ruby Postgres driver to be fiber aware and was able to integrate it into Sequel with little to no effort. Next I used the unicycle monorail server (the EventMachine HTTP server) in an EventMachine loop. I created a dumb controller which would query the db and render the results using the Object#to_json method.
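To make the "fiber aware" idea concrete, here is a minimal sketch in plain Ruby. It is not the patched Postgres driver: `async_query`, `run_requests`, and the canned `fake_db` hash are hypothetical names invented for the illustration, and a real setup would let EventMachine resume the fiber when the socket becomes readable. It only shows the control flow: a handler parks itself on a query, and the loop resumes it later with the rows.

```ruby
# Hypothetical stand-in for a fiber-aware driver call. Instead of blocking,
# it hands the SQL to the loop and suspends the current fiber; whatever the
# loop passes to `resume` becomes the query result.
def async_query(sql)
  Fiber.yield(sql)
end

# Toy event loop. `handlers` maps a request name to a callable and `fake_db`
# maps SQL strings to canned rows (both are assumptions for this demo).
# Returns the finished responses plus a log of what happened, in order.
def run_requests(handlers, fake_db)
  events = []
  responses = {}
  waiting = [] # [name, fiber, sql] triples for fibers parked on a query

  advance = lambda do |name, fiber, *args|
    value = fiber.resume(*args)
    if fiber.alive?
      # The fiber yielded some SQL and is now suspended.
      events << [:query, name]
      waiting << [name, fiber, value]
    else
      # The handler returned; `value` is its response.
      events << [:done, name]
      responses[name] = value
    end
  end

  # Start every request; each runs until its first query.
  handlers.each { |name, handler| advance.call(name, Fiber.new { handler.call }) }

  # "Complete" the parked queries one by one and resume their fibers.
  until waiting.empty?
    name, fiber, sql = waiting.shift
    advance.call(name, fiber, fake_db.fetch(sql))
  end

  [responses, events]
end

handlers = {
  "report" => -> { "report: #{async_query('long query').size} rows" },
  "lookup" => -> { "lookup: #{async_query('short query').size} rows" }
}
fake_db = { "long query" => [1, 2, 3], "short query" => [1] }

responses, events = run_requests(handlers, fake_db)
```

In this toy run both queries are in flight before either response is produced, which is the property the benchmark relies on. The handlers themselves read as ordinary sequential code.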

As was done with the evented db access benchmark, a long query ran every n short queries (n belongs to {5, 10, 20, 50, 100}). The running application accepted two URLs. One ran db operations in normal mode and the other ran them in nonblocking mode (every action invocation was wrapped in a fiber in the latter case).
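As a rough illustration of that workload, the mix for a given n can be sketched as below. The exact interleaving used in the benchmark is not spelled out here, so the "n short queries, then one long query" ordering and the `query_mix` helper are assumptions made for the example.

```ruby
# Builds a list of `total` query kinds in which every run of `n` short
# queries is followed by one long query (a 1-to-n long/short ratio).
def query_mix(n, total)
  Array.new(total) { |i| ((i + 1) % (n + 1)).zero? ? :long : :short }
end

mix = query_mix(5, 12)
long_count = mix.count(:long)   # long queries land at positions 6 and 12
short_count = mix.count(:short)
```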

Here are the benchmark results.

Full results

Comparing the number of requests/second fulfilled by each combination of blocking mode and concurrency level. The first had the possible values of [blocking, nonblocking]; the second had the possible values of [5, 10, 20, 50, 100].

Advantage Graph

Comparing the advantage gained by nonblocking over blocking mode for different long-to-short query ratios, with results shown for different levels of concurrency.

And the full results in tabular form (requests/second):

                           Concurrent Requests
Ratio       Mode           10         100        1000

1 to 100    Nonblocking    456.94     608.67     631.82
1 to 100    Blocking       384.82     524.39     532.26
            Advantage      18.74%     16.07%     18.71%

1 to 50     Nonblocking    377.38     460.74     471.89
1 to 50     Blocking       266.63     337.49     339.01
            Advantage      41.54%     36.52%     39.20%

1 to 20     Nonblocking    220.44     238.63     266.07
1 to 20     Blocking       142.6      159.7      141.92
            Advantage      54.59%     49.42%     87.48%

1 to 10     Nonblocking    130.87     139.76     195.02
1 to 10     Blocking       78.68      84.84      81.07
            Advantage      66.33%     64.73%     140.56%

1 to 5      Nonblocking    70.05      75.5       109.34
1 to 5      Blocking       41.48      42.13      41.77
            Advantage      68.88%     79.21%     161.77%
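
For reference, the Advantage rows are consistent with the relative gain in requests/second of nonblocking over blocking mode. A small sketch of that calculation, with `advantage_percent` being a name made up for this example:

```ruby
# Relative throughput gain of nonblocking over blocking mode, in percent,
# rounded to two decimals like the table above.
def advantage_percent(nonblocking_rps, blocking_rps)
  ((nonblocking_rps / blocking_rps - 1) * 100).round(2)
end

low_ratio  = advantage_percent(456.94, 384.82) # 1 to 100, 10 concurrent requests
high_ratio = advantage_percent(109.34, 41.77)  # 1 to 5, 1000 concurrent requests
```

With the figures from the table, the first call gives 18.74 and the second 161.77.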

Conclusion

The results are in accordance with my expectations: the nonblocking mode outperforms the blocking mode as long as enough long queries come into play. If all the db queries are very small then the blocking mode will triumph, mainly due to the overhead of fibers. Nevertheless, once there is even a single long query for every 100 short queries, performance sways in favor of the nonblocking mode. There are still a few optimizations to be done, mainly completing the integration with EventMachine, which should theoretically enhance performance. The next step is to integrate this into some framework and build a real application using this approach. Since Sequel is working now, having Ramaze or Merb running in non-blocking mode should be a fairly easy task. Sadly Rails is out of the picture for now as it does not support Ruby 1.9 yet.

I reckon an application that does all its IO in an evented way will need far fewer processes per CPU core to make full use of it. Actually I am guessing that a single core can be maxed out by a single process. If this is the case then I would be much happier replacing the 16 Thin processes running on my server with only 4. Couple that with the 30% memory savings we get from using RubyEE and we are talking about an amazing 82.5% memory footprint reduction without sacrificing performance.
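
The 82.5% figure follows from combining the two savings: a quarter of the processes, each using 70% of the memory. A quick sketch of the arithmetic (`footprint_reduction` is a name invented for the example, and it assumes every process has the same footprint):

```ruby
# Fraction of the original memory footprint saved when the process count
# drops and each remaining process also uses less memory.
def footprint_reduction(old_processes, new_processes, per_process_saving)
  remaining = (new_processes.to_f / old_processes) * (1 - per_process_saving)
  1 - remaining
end

reduction = (footprint_reduction(16, 4, 0.30) * 100).round(1)
puts reduction # prints 82.5
```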

Comments (12)

The background makes this page unreadable for me. It's almost like an optical illusion.

Point taken. I should be changing the whole theme soon.

Hi -

Database access can also be done in an event loop. It's simply another socket connection - the web server can process other requests while waiting for a response. Of course this requires a complete redesign of how web apps are currently structured. Neither Rails, Merb, nor Rack is structured in a way that breaks up the request processing into such chunks. Make all external queries (memcached, database, etc) evented and part of the request loop and you'll see very good performance, I think.

ry

@four, The benchmark actually behaves like this. I have modded the PostgreSQL adapter to behave in an evented way (in case it finds itself operating in the context of a special fiber). I managed to use Sequel for the above benchmarks. Making it Merb or Ramaze compatible should be easy, the framework need not know that IO operations are evented, Fibers take care of all the slicing and dicing behind the scenes.

What I was suggesting was a single threaded single event loop for every connection. The database file descriptors should be part of the same select/poll group as the http clients.

This is possible, but you must break request processing into before-db-query and after-db-query functions, since nothing may block.

if you're interested in discussing such a system further email me: ry at tinyclouds dot org

In terms of integrating this with existing servers, I know that the ramaze guys have something in the works to make ramaze "fiber friendly" -- I'm not sure if the work has made its way all the way to being committed, but some branch or other works that way, you could ask them. Merb also hopes to be 1.9 compatible as of version 1. Also note that if you need an evented driver you could use a fibered mongrel [either rev's version, or that described in http://209.85.173.104/search?q=cache:EcGf70nMwgwJ:betterlogic.com/roger/%3Fcat%3D35+asymy&hl=en&ct=clnk&cd=3&gl=us&client=firefox-a ] and hook that in with your now fully functional postgres driver and it should work as a full app. Way to go. I can imagine this becoming quite popular among 1.9 apps. Could work nicely. good call on the massive memory savings, too. Those running on VPS's will probably benefit, and perhaps it will be stronger overall.
-R

I think the good news might be that, though fibered DB access is slower than synchronous for short queries, it is only slower for them when there is low load [under high load you'd expect them to be interspersed with some long queries, thus the advantage of asynch winning out]. Since it is only slower under low load, that means that there are more resources available to process them. And since there isn't a high load, it'll probably finish quickly, anyway. I.e. the slowness is during a time of non contention, so the slowness doesn't hurt, because overall it'll run fast. Good luck with all that.

Four's thought is interesting, too. With EventMachine you're stuck with one event loop. However with rev each thread can have its own event loop, if desired. I'm not sure if running multi-threaded + fibered would help or hurt things.

[1] found that "multi threaded and multi process" was a good combo. Don't know if that means the same as "multi fibered and multi threaded" though.

-R
[1] http://compoundthinking.com/blog/index.php/2008/05/14/threads-processes-rails-trubogears-and-scalability/

@mormon, I just ran a very simple query ('select id from users limit 1') and compared both approaches. The blocking mode is faster before I reach a count of 100 and after I reach a count of 2000 short queries.

Adding a single long query ('select sleep(1)') pushed the limit to 6000 queries before blocking mode started to gain back.

I am not so enthusiastic about threads though. If all IO is nonblocking, why would we need another context switching overhead?

Interesting that blocking mode would be faster after 2000. I wonder why that would be. I guess you could almost make the assertion that "asynch IO is, in general, slightly slower for fast queries"

Yeah I think you're right about multi-threaded. Perhaps multi-process [as mentioned] would be better, just because of the GIL :)

With regard to RAM usage, perhaps since you'd be cutting down on Thin processes, you could do without the COW aspect of REE, but keep the [newer, faster memory allocator], which could speed up your app about 10% [1]. So you win twice.
Just a thought.

--
[1] http://groups.google.com/group/emm-ruby/browse_thread/thread/1df52f59b266b9b6/0dadab7d1c94e17b?lnk=gst&q=20+slower#0dadab7d1c94e17b

I wonder if blocking mode is faster with > 2000 queries because it still runs fewer simultaneous queries [blocking for each one], or something like that. Seeing as it seems faster for very fast queries to block, one option would be to 'guess' if a query is going to be extremely fast or not--if it is then just revert back to the old blocking way. The benefit of this being that while you are retrieving your fast query the other long queries which are 'fiber blocked' are still running in the background.
I wonder if there's some play in the size of the fiber pool, should those things be the case [to allow one to not slam the mysql server with too many requests].
Thanks!
-=R

Guess what? I was just working on this feature (allowing queries that you know are very fast to run in blocking mode). I came up with a nice interface to wrap those blocking queries which I should be releasing soon, among other things.

As for overwhelming the db, the db connection pool should be decoupled from the fiber pool, it should also handle the "too many connections" error gracefully.

I guess with the new library you'll be able to know better the impact of fibers [versus whatever else is in there].