Building the Never Blocking Rails, Making Rails 12X Faster


They told you it can't be done, they told you it has no scale. They told you lies!

What if you suddenly had the ability to serve multiple concurrent requests from a single Rails instance? What if you could multiplex IO operations from a single Rails instance?

No more what ifs. It has been done.

I was testing NeverBlock support for Rails. For testing I built a normal Rails application. Nothing unusual here: you get the whole usual Rails deal, routes, controllers, ActiveRecord models and eRuby templates. I used the Thin server to serve the application and PostgreSQL as the database server. The only difference is that I was not using the PostgreSQL adapter; I was using the NeverBlock::PostgreSQL adapter instead.

All I needed to do was set the adapter in database.yml to neverblock_postgresql instead of postgresql and require 'never_block/server/thin' in my production.rb.
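
In concrete terms, the two changes looked roughly like this (abridged sketch; the database name and host below are placeholders, and your generated config will have more keys):

# config/database.yml -- only the adapter name changes
production:
  adapter: neverblock_postgresql
  database: myapp_production
  host: localhost

# config/environments/production.rb -- load the NeverBlock-aware Thin server
require 'never_block/server/thin'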

All this was running on Ruby 1.9, so I had to comment out the body of the load_rubygems method in config/boot.rb, which is not needed in Ruby 1.9 anyway.
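
In my copy of config/boot.rb that meant something like this (the exact contents differ between Rails versions, so treat this as a sketch):

# config/boot.rb (excerpt) -- Ruby 1.9 ships with RubyGems already available,
# so the body of load_rubygems can simply be commented out
def load_rubygems
#  require 'rubygems'
#  ... the rest of the original body stays commented out ...
end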

Now what difference does this thing make?

It allows you to process multiple requests concurrently from a single Rails instance. It does this by utilizing the async features of the PG client interface, coupled with Fibers and EventMachine, to provide transparent async operations.

So, when a Rails action issues any ActiveRecord operation, it will be suspended and another Rails action can kick in. The first one will be resumed once PostgreSQL has provided us with the data.
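
The mechanics look roughly like this. This is an illustrative sketch, not NeverBlock's actual code; fiber_aware_query and the callback-style send_query_async are made-up names standing in for the driver's async interface:

require 'fiber'

# Illustrative only: run a query without blocking the whole process
def fiber_aware_query(conn, sql)
  fiber = Fiber.current
  conn.send_query_async(sql) do |result|  # hypothetical callback-style async API
    fiber.resume(result)                  # wake the suspended action once the data is in
  end
  Fiber.yield                             # suspend this action; the reactor serves other requests meanwhile
end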

To make a quick test, I created a controller which would use an AR model to issue the following SQL command: "select sleep(1)" (sleep does not come with PostgreSQL by default; you have to implement it yourself). I ran the application with the normal postgresql adapter and used Apache Bench to measure the performance of 10 concurrent requests.
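
The controller was along these lines (the names and response body are illustrative; Forum is just an ordinary ActiveRecord model, and the benchmark itself was something like ab -n 10 -c 10 against /forums/sleep/):

# app/controllers/forums_controller.rb (illustrative)
class ForumsController < ApplicationController
  def sleep
    # make PostgreSQL hold the connection busy for one second
    Forum.find_by_sql("select sleep(1)")
    render :text => "sleep done"
  end
end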

Here are the results:
Server Software:        thin
Server Hostname: localhost
Server Port: 3000

Document Path: /forums/sleep/
Document Length: 11 bytes

Concurrency Level: 10
Time taken for tests: 10.248252 seconds
Complete requests: 10
Failed requests: 0
Write errors: 0
Total transferred: 4680 bytes
HTML transferred: 110 bytes
Requests per second: 0.98 [#/sec] (mean)
Time per request: 10248.252 [ms] (mean)
Time per request: 1024.825 [ms] (mean, across all concurrent requests)
Transfer rate: 0.39 [Kbytes/sec] received


Almost 1 request per second, which is what I expected. Then I switched to the new adapter, restarted Thin, and redid the test.

Here are the new results:

Server Software: thin
Server Hostname: localhost
Server Port: 3000

Document Path: /forums/sleep/
Document Length: 11 bytes

Concurrency Level: 10
Time taken for tests: 1.075797 seconds
Complete requests: 10
Failed requests: 0
Write errors: 0
Total transferred: 4680 bytes
HTML transferred: 110 bytes
Requests per second: 9.30 [#/sec] (mean)
Time per request: 1075.797 [ms] (mean)
Time per request: 107.580 [ms] (mean, across all concurrent requests)
Transfer rate: 3.72 [Kbytes/sec] received


Wow! A 9x speed improvement! The database requests were able to run concurrently, and they all came back together.

I decided to simulate various workloads and test the new implementation against the old one. I devised the workloads taking into account that the test machine had rather poor IO performance, so I used queries that would not tax the IO but would still require PostgreSQL to take its time. The workloads were categorized as follows:

First, each request would issue a "select 1" query, the fastest I can think of; then, for the different workloads (a rough sketch of such a workload action follows the list):

1 - Very light workload: every 200 requests, one "select sleep(1)" would be issued
2 - Light workload: every 100 requests, one "select sleep(1)" would be issued
3 - Moderate workload: every 50 requests, one "select sleep(1)" would be issued
4 - Heavy workload: every 20 requests, one "select sleep(1)" would be issued
5 - Very heavy workload: every 10 requests, one "select sleep(1)" would be issued
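
Here is a rough sketch of how such a workload action could be generated; the real test may have mixed the requests differently (for example on the client side), and the per-process counter here is purely illustrative:

# Illustrative workload action: every Nth request issues "select sleep(1)",
# the rest issue the cheap "select 1"
class ForumsController < ApplicationController
  SLEEP_EVERY = 20   # 200, 100, 50, 20 or 10 depending on the workload
  @@counter = 0      # per-process request counter, illustrative only

  def workload
    @@counter += 1
    sql = (@@counter % SLEEP_EVERY).zero? ? "select sleep(1)" : "select 1"
    Forum.find_by_sql(sql)
    render :text => "done"
  end
end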


I tested those workloads against the following setups:

1 - 1 Thin server, normal PostgreSQL adapter
2 - 2 Thin servers (behind nginx), normal PostgreSQL adapter
3 - 4 Thin servers (behind nginx), normal PostgreSQL adapter
4 - 1 Thin server, NeverBlock PostgreSQL adapter


I tested with 1000 requests and a concurrency of 200 (the multiple Thin servers were having problems above that figure; the new adapter scaled up to a concurrency of 1000 with no problems, usually with similar or slightly better results).

Here are the graphed results:

[Graph: requests per second for each workload, comparing the four setups]

For the NeverBlock Thin server I was using a pool of 12 connections. As you can see from the results, under the very heavy workload it performed on par with a 12 Thin cluster. Generally the NeverBlock Thin server easily outperforms the 4 Thin cluster, and the margin increases as the workload gets heavier.

And here are the results for scaling the number of concurrent connections for a NeverBlock::Thin server:

[Graph: requests per second as the number of concurrent connections scales for a single NeverBlock::Thin server]

Traditionally we used to spawn as many Thin servers as we could until we ran out of memory. Now we don't need to do so, as a single process will maintain multiple connections and will be able to saturate a single CPU core; hence the ideal setup seems to be a single server instance per processor core.

But to really saturate a CPU, one has to do all IO requests in a non-blocking manner, not just the database calls. This is exactly the next step once the DB implementation is stable: enrich NeverBlock with a set of IO libraries that operate in a seemingly blocking way while doing all their IO in a totally transparent, non-blocking manner, thanks to Fibers.
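
As an example of what such a library could feel like to use, here is a sketch of a seemingly blocking HTTP GET built on the em-http-request gem; this is not an actual NeverBlock API, just the general pattern:

require 'eventmachine'
require 'em-http'  # em-http-request gem
require 'fiber'

# Looks blocking to the caller, but only the calling fiber waits;
# EventMachine keeps serving other requests in the meantime.
def fetch(url)
  fiber = Fiber.current
  http = EventMachine::HttpRequest.new(url).get
  http.callback { fiber.resume(http.response) }
  http.errback  { fiber.resume(nil) }
  Fiber.yield
end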

I am now wondering about the possibilities, the reduced memory footprint, and what benefits such a solution could bring to the likes of DreamHost and all the other Rails hosting companies.