Labels: eventmachine , fibers , neverblock , rails , ruby , ruby 1.9 , threads
Ruby provides several socket classes for various connection protocols. Those classes are arranged in a strange and convoluted hierarchy.
This ASCII diagram explains this hierarchy
IO
 |
 BasicSocket
     |
     |-- IPSocket
     |      |
     |      |-- TCPSocket
     |      |       |
     |      |       |-- TCPServer
     |      |       |
     |      |       |-- SOCKSSocket
     |      |
     |      |-- UDPSocket
     |
     |-- Socket
     |
     |-- UNIXSocket
            |
            |-- UNIXServer
The BasicSocket class provides some common methods, but you cannot instantiate it; you have to use one of its subclasses. Three branches come out of BasicSocket: one implements the IP (and descendant) protocols, another implements the UNIX domain socket protocol, and a third provides a generic wrapper over BSD sockets. The first problem with this branching strategy is that, while the Socket class could serve as a parent for both UNIXSocket and IPSocket, the implementer chose to create a separate path for each of them. The result is a lot of code duplication in the implementation, which makes maintaining those classes much harder than it should be.
A prime example is the recent addition of non-blocking features to the I/O and socket classes. Only the Socket class was lucky enough to get an accept_nonblock method; the other classes sadly didn't. Being able to initiate network connections in a non-blocking manner is essential if you are using an evented framework (like NeverBlock, for example).
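For the curious, here is a minimal sketch (host and port are arbitrary examples, not part of the original post) of what initiating a non-blocking connection looks like with the plain Socket class:

require 'socket'

# Create a raw socket and start a connect without blocking the process.
sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
addr = Socket.pack_sockaddr_in(8080, '127.0.0.1')
begin
  sock.connect_nonblock(addr)
rescue Errno::EINPROGRESS
  # The connect is now in flight; an event loop would register the socket
  # and resume this code path once it becomes writable.
  IO.select(nil, [sock])
end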
What makes the problem worse is that major Ruby network libraries overlook the Socket class and use TCPSocket or UNIXSocket instead; Net/HTTP, for example, uses TCPSocket. Since NeverBlock tries to work in harmony with most Ruby libraries, it makes up for this inconsistency by altering the default hierarchy of socket classes. Ruby allows you to undefine constants on an object, so we remove the TCPSocket and UNIXSocket classes and redefine them as subclasses of Socket, adding some methods to make up for any lost functionality.
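A rough illustration of that trick (this is not NeverBlock's actual source, just the underlying Ruby mechanism):

require 'socket'

# Drop the stock constant, then rebind TCPSocket to a Socket-based class.
Object.send(:remove_const, :TCPSocket)

class TCPSocket < Socket
  def initialize(host, port)
    super(Socket::AF_INET, Socket::SOCK_STREAM, 0)
    connect(Socket.pack_sockaddr_in(port, host))
  end
end

s = TCPSocket.new('example.com', 80) # looks like the old class to callers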
After modifying the socket classes, NeverBlock support was integrated. This was done by rewriting the connect, read and write methods so that they detect the presence of a NeverBlock fiber and operate asynchronously accordingly. If you use the new socket classes outside a NeverBlock context, or in NeverBlock's blocking mode, they fall back to the old blocking implementation.
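Conceptually, the rewritten methods look something like the sketch below; NB.neverblocking? and NB::Fiber.yield are stand-ins for whatever NeverBlock actually uses to detect its fibers and hand control back to the reactor:

class TCPSocket < Socket
  def connect(addr)
    return super unless NB.neverblocking?  # plain blocking behaviour
    begin
      connect_nonblock(addr)
    rescue Errno::EINPROGRESS
      NB::Fiber.yield  # park this fiber; the reactor resumes it when the fd is writable
      retry
    rescue Errno::EISCONN
      # the earlier non-blocking connect has completed
    end
  end
end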
Here is an example. First, we create a server using EventMachine that takes one second to process each request.
server.rb
require 'eventmachine'

class Server < EM::Connection
  # handle requests here
  def receive_data(data)
    # send the response after 1 second
    EM.add_timer(1) do
      send_data "HTTP/1.1 200 OK\r\n\r\ndone"
      close_connection_after_writing
    end
  end
end

EM.run do
  EM.start_server('0.0.0.0', 8080, Server)
end
Second, we create a client that issues requests to the server.
client.rb
require 'neverblock'
require 'net/http'
require 'uri'

EM.run do
  @pool = NB::FiberPool.new(20)
  20.times do
    @pool.spawn do
      url = URI.parse("http://localhost:8080")
      res = Net::HTTP.start(url.host, url.port) { |http| http.get('/') }
    end
  end
end
Issuing 20 GET requests in NeverBlock fibers causes them to run concurrently. Even though our server takes a full second to process each request, they all return after approximately one second.
Here is a blocking version
blocking_client.rb
require 'net/http'
require 'uri'

20.times do
  url = URI.parse("http://localhost:8080")
  res = Net::HTTP.start(url.host, url.port) { |http| http.get('/') }
end
The blocking client finishes after around 20 seconds.
Here's a teaser graph
The really good thing is that we used the Net/HTTP library transparently. Any Ruby library that relies on Ruby sockets will benefit from NeverBlock and gain the ability to run in a concurrent manner.
What does that mean?
Originally, NeverBlock only supported concurrent database access for PostgreSQL and MySQL. While that was useful, databases are usually the bottleneck of most applications, unless you have something like a database cluster that can truly absorb any load. This was a shame, since NeverBlock is meant for high levels of concurrency that are only attainable with massively scalable back ends. With this new development, however, we are one step closer to tapping into this realm of high-performance, scalable web applications. Read on.
Enter AWS and the cloud
Amazon Web Services provide an example of a massively scalable back end that is accessible via HTTP. Services like S3, SimpleDB and SQS are all a URL away. Such services have higher latency than your nearby database server, but they more than make up for that by being able to absorb all the requests you throw at them. Most of the Ruby libraries for accessing AWS rely on Net/HTTP in some way or another, which means we get NeverBlock support for those libraries. This is big news for Ruby applications (including Rails ones) that rely on AWS or a similar back end. For those types of apps, forget about a pool of 10 or 20 fibers; we are talking about a pool of 1,000 fibers here. Even higher numbers could be possible (once a nasty file descriptor bug in Ruby 1.9 is fixed).
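To make that concrete, here is an illustrative sketch only (the URL is a local stand-in; a real app would point an AWS library at S3, SimpleDB or SQS) of a large fiber pool driving many Net/HTTP requests from a single process:

require 'neverblock'
require 'net/http'
require 'uri'

# A big fiber pool issuing many HTTP requests concurrently; the endpoint
# stands in for an AWS-style HTTP service.
EM.run do
  pool = NB::FiberPool.new(1000)
  url  = URI.parse("http://localhost:8080")
  1000.times do
    pool.spawn do
      Net::HTTP.start(url.host, url.port) { |http| http.get('/') }
    end
  end
end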
Why Not Threads?
I have been claiming that Ruby fibers are faster than Ruby threads[1]. I have seen that in my tests, but those were usually limited to a single performance metric. So I decided to simulate a very scalable back end and see which approach offers more scalability. For testing purposes I created two client applications: one is threaded and the other is based on NeverBlock. In the NeverBlock version I did not use the fiber pool, though; I created a new fiber per operation to mimic the threaded app's behavior. The simulated scalable back end consisted of an EventMachine-based server that waits for a certain time before responding with 200 OK. The delay simulates back-end processing and network latency. I tested using 0, 10, 50, 100 and 500 ms as delay values. Another client application was written that works in the normal blocking mode, for comparison.
The clients were tested using Ruby 1.8.6 and 1.9.1. The only exception was the NeverBlock client, which was only tested with 1.9.1, because the current fiber implementation for Ruby 1.8.x is based on threads and so would only reflect threaded performance. Ruby 1.8 was added to the mix because I noticed scalability and performance problems with the Ruby 1.9 threading implementation; Ruby 1.8 proved to have a (sometimes) faster and more scalable threading implementation.
Each client application attempts to issue 1,000 requests to the back-end server, and tries to do so in a concurrent fashion (except for the blocking version, of course).
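The benchmark clients themselves are not listed in the post; as a rough illustration (URL and timing code are my own), a threaded client along these lines would issue the 1,000 requests with one thread per request:

require 'net/http'
require 'uri'

# Rough sketch of the threaded client: 1,000 requests, one thread each.
# The URL stands in for the delayed EventMachine test server.
url = URI.parse("http://localhost:8080")

start = Time.now
threads = []
1000.times do
  threads << Thread.new do
    Net::HTTP.start(url.host, url.port) { |http| http.get('/') }
  end
end
threads.each { |t| t.join }
puts "#{(1000 / (Time.now - start)).round} requests/sec"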
Here are the results
And the results in ASCII format (numbers in cells are requests/sec)
Server Delay         0ms     10ms    50ms    100ms   500ms
Ruby 1.8 Blocking    2000    19      16      10      2
Ruby 1.9 Blocking    2400    19      17      10      2
Ruby 1.8 Threaded    1050    800     670     536     415
Ruby 1.9 Threaded    618     470     451     441     395
Ruby 1.9 NeverBlock  2360    1997    1837    1656    1031
Let's try to explain the results. For a server with no delay whatsoever (a utopian assumption), the blocking clients offer the greatest performance. Ruby 1.9 in blocking mode comes first, mainly because Ruby 1.9 is faster than Ruby 1.8 and also ships a faster Net/HTTP library[1]. Why is blocking faster? Simply because the evented server is processing the requests serially and the latency is minimal: each request is handled, a response is sent, and the handler returns immediately, so there is no opportunity to process requests concurrently. This is the fastest you can drive your processor.
The NeverBlock implementation comes in a very close second to the fastest client, which shows that the overhead of using fibers is not that high. Actually, we are cheating a bit here: we make up for the overhead by sending the requests concurrently, and while the server is still processing them serially we can handle the fiber pausing and resuming while the server is working.
Needless to say, NeverBlock is well ahead of the threaded clients (whether 1.8 or 1.9) when working against the zero-latency server. We also see that 1.8 threads are considerably faster than 1.9's.
When we start adding a simulated delay to the server, the blocking clients fall dramatically from first position to last. They become so slow that they are really not suitable for use in that setting any more. Please note that the results for the 500ms delay are extrapolations; I was too annoyed by the idea of waiting 500 seconds for a test to run, twice!
On the other hand, the threaded and NeverBlock implementations are much less affected, even though they lose ground as the delay increases. NeverBlock maintains its lead over the threaded clients, though; it is generally around 2.5x faster.
Here is a graph of the NeverBlock advantage over the fastest threaded client
And in ASCII format
Server Delay          0ms       10ms      50ms      100ms     500ms
NeverBlock Advantage  124.76%   149.63%   174.18%   208.96%   148.43%
Aside from the NeverBlock advantage, the numbers themselves are very impressive. A single process can achieve ~1000 operations per second even with half a second of processing and network latency. In a multi-process setup we should be able to achieve a lot more than that. For example, forking another NeverBlock client on my dual-core notebook, which hosts both the client and the server apps, adds a 50% performance gain.
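A minimal sketch of that multi-process idea (assuming the NeverBlock client from above lives in client.rb):

# Fork two copies of the NeverBlock client so each process runs its own
# reactor and fiber pool on its own core.
2.times do
  fork do
    exec('ruby', 'client.rb')
  end
end
Process.waitall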
sweet holy awesome. 1.9 haters: start learning to enjoy attempting to drive from the back seat :D
Very interesting post, I would be very interested in reading your clients' code.
Thx.
Muhammed, thanks for sharing this blog post with us. Please write more often :-)
One question... Are you working with the Modrails (Phusion Passenger) team?
I'm curious if it is (or will be) possible to use NeverBlock with Passenger.
Thanks again.
I think fibers really shine when you have "lots" of threads [or, conversely, lots of fibers]. Then it rocks and can hopefully use a single core effectively.
Thanks for your work on this!
For sure fibers are better if you need a lot of them--I wonder if a thread pool versus fiber pool would be about the same speed? I'd imagine they're probably about the same.
Nice job.
-=R
This is great stuff, but when will it be released to the public?