Showing posts with label postgres. Show all posts

Building the Never Blocking Rails, Making Rails 12X Faster


They told you it can't be done, they told you it doesn't scale. They told you lies!

What if you suddenly had the ability to serve multiple concurrent requests from a single Rails instance? What if you could multiplex IO operations from a single Rails instance?

No more what ifs. It has been done.

I was testing NeverBlock support for Rails, so I built a normal Rails application. Nothing out of the ordinary here: you get the whole usual Rails deal, with routes, controllers, ActiveRecord models and eRuby templates. I am using the Thin server to serve the application and PostgreSQL as the database server. The only difference is that I was not using the PostgreSQL adapter; instead I was using the NeverBlock::PostgreSQL adapter.

All I needed to do was name the adapter neverblock_postgresql instead of postgresql in database.yml and add require 'never_block/server/thin' to my production.rb.

All this was running on Ruby 1.9, so I had to comment out the body of the load_rubygems method in config/boot.rb, which is not needed in Ruby 1.9 anyway.

Now what difference does this thing make?

It allows you to process multiple requests concurrently from a single Rails instance. It does this by using the async features of the PG client interface, coupled with Fibers and EventMachine, to provide transparent async operations.

So, when a Rails action issues any ActiveRecord operation, it is suspended and another Rails action can kick in. The first one is resumed once PostgreSQL has provided the data.
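The following sketch makes the suspend/resume idea concrete using only plain Ruby. It does not use NeverBlock's actual code. Each "request" runs in its own Fiber. A hypothetical `fake_query` helper starts a background thread that plays the database server and writes a result to a pipe after a delay. The request fiber then yields until a small `IO.select` loop (`run_reactor`, also a made-up name) sees the pipe become readable. Three 0.2 s "queries" finish in roughly 0.2 s total rather than 0.6 s.

```ruby
# Toy reactor, not NeverBlock's implementation: fibers park on an IO
# and the loop resumes each one when IO.select reports it readable.
WAITING = {} # IO => fiber waiting on it

# Hypothetical stand-in for an async query: a background thread plays the
# database server and writes the "result" to a pipe after `seconds`.
def fake_query(label, seconds)
  reader, writer = IO.pipe
  Thread.new do
    sleep seconds
    writer.write("result of #{label}")
    writer.close
  end
  Fiber.yield(reader) # suspend until the reactor sees `reader` is readable
  data = reader.read  # returns once the writer has closed the pipe
  reader.close
  data
end

def run_reactor(fibers)
  fibers.each do |fiber|
    io = fiber.resume # run each request up to its first query
    WAITING[io] = fiber if fiber.alive?
  end
  until WAITING.empty?
    ready, = IO.select(WAITING.keys)
    ready.each do |io|
      fiber = WAITING.delete(io)
      next_io = fiber.resume # continue the request where it left off
      WAITING[next_io] = fiber if fiber.alive?
    end
  end
end

results = []
requests = Array.new(3) do |i|
  Fiber.new { results << fake_query("request #{i}", 0.2) }
end

started = Time.now
run_reactor(requests)
elapsed = Time.now - started
puts "#{results.size} results in #{elapsed.round(2)} s"
```

Here a thread only simulates the database. In the real adapter, the waiting IO is the PostgreSQL connection socket.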

To make a quick test, I created a controller that uses an AR model to issue the SQL command "select sleep(1)" (sleep does not come with PostgreSQL by default; you have to implement it yourself). I ran the application with the normal postgresql adapter and used Apache Bench (ab) to measure the performance of 10 concurrent requests.

Here are the results:
Server Software:        thin
Server Hostname: localhost
Server Port: 3000

Document Path: /forums/sleep/
Document Length: 11 bytes

Concurrency Level: 10
Time taken for tests: 10.248252 seconds
Complete requests: 10
Failed requests: 0
Write errors: 0
Total transferred: 4680 bytes
HTML transferred: 110 bytes
Requests per second: 0.98 [#/sec] (mean)
Time per request: 10248.252 [ms] (mean)
Time per request: 1024.825 [ms] (mean, across all concurrent requests)
Transfer rate: 0.39 [Kbytes/sec] received


Almost 1 request per second, which is what I expected: with the blocking adapter, the ten one-second queries have to run one after the other. Next, I switched to the new adapter, restarted Thin and redid the test.

Here are the new results:

Server Software: thin
Server Hostname: localhost
Server Port: 3000

Document Path: /forums/sleep/
Document Length: 11 bytes

Concurrency Level: 10
Time taken for tests: 1.75797 seconds
Complete requests: 10
Failed requests: 0
Write errors: 0
Total transferred: 4680 bytes
HTML transferred: 110 bytes
Requests per second: 9.30 [#/sec] (mean)
Time per request: 1075.797 [ms] (mean)
Time per request: 107.580 [ms] (mean, across all concurrent requests)
Transfer rate: 3.72 [Kbytes/sec] received


Wow! A 9x speed improvement! The database requests were able to run concurrently, and they all came back together.

I decided to simulate various workloads and test the new implementation against the old one. I designed the workloads knowing that the test machine had rather bad IO performance, so I used queries that would not tax the IO but would still require PostgreSQL to take its time. The workloads were categorized as follows:

First, each request would issue a "select 1" query, the fastest query I can think of. Then, for the different workloads:

1 - Very light workload: every 200 requests, one "select sleep(1)" would be issued
2 - Light workload: every 100 requests, one "select sleep(1)" would be issued
3 - Moderate workload: every 50 requests, one "select sleep(1)" would be issued
4 - Heavy workload: every 20 requests, one "select sleep(1)" would be issued
5 - Very heavy workload: every 10 requests, one "select sleep(1)" would be issued
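The following minimal sketch shows how such a workload schedule could be generated. It assumes that every request issues the fast "select 1" and that every N-th request additionally issues the slow "select sleep(1)". The post does not say whether the slow query replaces or follows the fast one. `WORKLOADS` and `queries_for` are hypothetical names. The sketch also counts how many one-second sleeps 1000 requests would trigger.

```ruby
# Requests per slow query for each workload listed above.
WORKLOADS = {
  very_light: 200,
  light:      100,
  moderate:    50,
  heavy:       20,
  very_heavy:  10
}.freeze

# SQL issued by request number `i` (1-based) under a given workload.
# Assumption: every request runs "select 1"; every N-th one also sleeps.
def queries_for(i, workload)
  every = WORKLOADS.fetch(workload)
  queries = ["select 1"]
  queries << "select sleep(1)" if (i % every).zero?
  queries
end

# How many one-second sleeps 1000 requests trigger under each workload.
slow_counts = WORKLOADS.keys.map do |w|
  [w, (1..1000).count { |i| queries_for(i, w).include?("select sleep(1)") }]
end.to_h

p slow_counts
```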


I tested those workloads against the following setups:

1 - 1 Thin server, normal PostgreSQL adapter
2 - 2 Thin servers (behind nginx), normal PostgreSQL adapter
3 - 4 Thin servers (behind nginx), normal PostgreSQL adapter
4 - 1 Thin server, NeverBlock PostgreSQL adapter


I tested with 1000 queries and a concurrency of 200. The multiple Thin servers had problems above that figure, while the new adapter scaled up to 1000 with no problems, usually with similar or slightly better results.

Here are the graphed results:



For the NeverBlock Thin server I was using a pool of 12 connections. As you can see from the results, under the very heavy workload it performs on par with a 12-server Thin cluster. In general, the NeverBlock Thin server easily outperforms the 4-server Thin cluster, and the margin increases as the workload gets heavier.
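Here is a back-of-the-envelope check on why a 12-connection pool lands near a 12-server cluster under the very heavy workload. Assume 1000 requests with one "select sleep(1)" per 10 requests. That gives 100 seconds of sleeping to do, and a setup that can keep k such queries in flight at once cannot finish them in less than 100/k seconds. This is an idealized lower bound, not a measurement. It ignores the fast queries, Rails overhead and contention, and it assumes each blocking Thin process runs one query at a time. The `min_sleep_seconds` helper is a hypothetical name.

```ruby
# Idealized lower bound on wall-clock time spent in sleep(1) queries:
# total sleep seconds divided by how many can run at the same time.
def min_sleep_seconds(requests:, every:, concurrent_queries:)
  total_sleep = requests / every # one 1-second sleep per `every` requests
  total_sleep.fdiv(concurrent_queries)
end

setups = {
  "1 Thin, blocking adapter"     => 1,
  "2 Thins, blocking adapter"    => 2,
  "4 Thins, blocking adapter"    => 4,
  "1 Thin, NeverBlock, 12 conns" => 12
}

bounds = setups.transform_values do |k|
  min_sleep_seconds(requests: 1000, every: 10, concurrent_queries: k)
end

bounds.each { |name, secs| puts format("%-30s >= %6.2f s", name, secs) }
```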

And here are the results for scaling the number of concurrent connections for a NeverBlock::Thin server



Traditionally we spawned as many Thin servers as we could until we ran out of memory. Now we don't need to, because a single process maintains multiple connections and should be able to saturate a single CPU core. Hence the ideal setup seems to be a single server instance per processor core.

But to really saturate a CPU, one has to do all IO requests in a non-blocking manner, not just the database ones. This is exactly the next step once the DB implementation is stable: enriching NeverBlock with a set of IO libraries that behave in a seemingly blocking way while doing all their IO in a completely transparent, non-blocking manner, thanks to Fibers.

I am now wondering about the possibilities: the reduced memory footprint, and the benefits such a solution could bring to the likes of DreamHost and all the other Rails hosting companies.

ActiveRecord meets NeverBlock


I happily announce the release of the first NeverBlock-enabled ActiveRecord adapter: the neverblock-postgresql-adapter. This is a beta release, but I have been testing it for a while now with great results.

And while this is a big improvement, it only requires you to change the adapter name in your connection settings from postgresql to neverblock_postgresql, as described on the official NeverBlock blog.

To make a long story short, this enables ActiveRecord to issue queries in parallel, much like in a multi-threaded application. But it has several advantages over multi-threaded operation:

  1. Fibers are cheaper than threads, so this solution is theoretically faster.

  2. NeverBlock does not require full thread safety; you just have to avoid using globals and static variables for transient state.

  3. It integrates nicely into evented programs, eliminating the performance drop that occurs when threads are introduced into such environments.
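Point 2 follows from fibers being cooperative. A fiber only gives up control where it explicitly yields (with NeverBlock, inside an IO call), so code between two IO calls runs without interruption. The minimal sketch below is plain Ruby, not NeverBlock code. It has several fibers increment a shared counter with a non-atomic read-modify-write. When nothing yields between the read and the write, no update is lost. When a yield sits in the middle, standing in for an IO call, updates are lost. That is the danger of keeping per-request state in a global or static variable across an IO call. `run_counter` is a hypothetical helper.

```ruby
# Runs `n` fibers that each increment a shared counter `times` times.
# With yield_between: true, each fiber yields between the read and the
# write, mimicking an IO call in the middle of a read-modify-write.
def run_counter(n: 5, times: 3, yield_between: false)
  counter = 0
  fibers = Array.new(n) do
    Fiber.new do
      times.times do
        current = counter
        Fiber.yield if yield_between
        counter = current + 1
        Fiber.yield unless yield_between # the only switch point otherwise
      end
    end
  end
  # Round-robin scheduler: resume every live fiber until all have finished.
  fibers.each { |f| f.resume if f.alive? } until fibers.none?(&:alive?)
  counter
end

puts run_counter                      # no lost updates
puts run_counter(yield_between: true) # lost updates
```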


I have benchmarked this against the plain postgresql adapter using different workloads, categorized as follows:

Very light: a single count statement
Light: a single count and a create
Moderate: 2 counts and a create wrapped in a transaction that rolls back
Heavy: 3 counts, a create and an update wrapped in a transaction that commits
Very heavy: 3 counts, one conditional count (on a non-indexed field), a create and two updates, all wrapped in a transaction that commits

(If you are wondering why these queries in particular: they were extracted from some other code.)

All were issued 1000 times.

The results came as follows:



As you can see, NeverBlock::AR is consistently faster than vanilla AR. It appears that these workloads increase the run time linearly for both AR and NeverBlock::AR, since the NeverBlock advantage stayed almost the same across them.



Another benchmark was performed to test the effect of increasing the connection count for NeverBlock::AR. We tested with 2, 4, 8, 16 and 32 connections.

The benchmark consisted of first running "select 1" 5000 times and then running "select sleep(1)" 20 times for each configuration.



As you can probably guess, increasing the connection count has very little effect if the queries are all very fast (you cannot beat "select 1"). If the queries are all slow, however, you can double the performance by simply doubling the connection count.
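The "double the connections, double the performance" rule for slow queries falls out of a simple idealized model. Assume the 20 "select sleep(1)" queries are spread evenly over the pool and ignore all other overhead. With c connections, the queries then run in ceil(20 / c) one-second waves. `sleep_batch_seconds` is a hypothetical helper, and the numbers it produces are model values, not the benchmark results.

```ruby
# Idealized wall-clock time for `queries` one-second queries on `connections`
# connections: the queries run in waves of at most `connections` at a time.
def sleep_batch_seconds(queries, connections)
  (queries.to_f / connections).ceil
end

[2, 4, 8, 16, 32].each do |c|
  puts "#{c.to_s.rjust(2)} connections: #{sleep_batch_seconds(20, c)} s"
end
```

In this model the halving is only approximate once the pool size no longer divides 20 evenly (8 connections need 3 waves, not 2.5). It stops altogether once connections outnumber the queued queries.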

I hope this gives you a glimpse of what's coming next. Watch this space!

101 Reasons Why PostgreSQL is a better fit for Rails than MySQL


1 - Indexing Support

MySQL cannot utilize more than one index per query. I believe this is worth repeating: MySQL CANNOT UTILIZE MORE THAN ONE INDEX PER QUERY. Wait until your tables get large enough and this will surely hit you. OTOH, PostgreSQL can use multiple indices per query, which comes in really handy.

2 - Full Text Indexing Support

MySQL can do full text indexing on MyISAM tables only; those working with InnoDB tables are out of luck. PostgreSQL has very advanced full text indexing capabilities, which let you control the tiniest details, down to the stemming strategy.

3 - Asynchronous Interface

MySQL drivers are very unfriendly to the Ruby interpreter. Once a command is issued they take over until they come back with results. PostgreSQL sports a completely asynchronous interface where you can send queries to the database and then tend to other matters while the query is being processed by the server. The good news is that an Async ActiveRecord adapter for MySQL is being developed right now, as part of the rapidly growing NeverBlock library.

4 - Ruby Threading Aware

PostgreSQL drivers enable the Ruby thread scheduler while IO requests are being processed (a nice side effect of the async interface), which makes them much better suited for multithreaded Rails apps.

5 - Multistatements Per Query

Both MySQL and PostgreSQL support sending multiple statements, separated by semicolons, at once. But the returned result will be that of the last statement in the group. Now, did you know that by using the async interface you can send multiple queries at once and then get back the results one by one? One of the coolest features of the coming ActiveRecord (and Sequel, btw) adapter is its support for queuing queries to be consumed by a pool of connections. A trick we are contemplating is to group consecutive selects together, send them to PostgreSQL in a single request and later extract the results associated with each one of them. This is still very theoretical but should be verified soon.
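The following minimal sketch illustrates the "queue of queries consumed by a pool of connections" idea. `QueryQueue` and the `StubConnection` test double are hypothetical names. The stub only records what it was asked to run. A real pool would call PGconn#send_query and learn about completion from the event loop. The speculative batching of consecutive selects is not attempted here.

```ruby
# Hands queued SQL to whichever connection is free; the rest wait their turn.
class QueryQueue
  attr_reader :pending

  def initialize(connections)
    @free = connections.dup
    @pending = []
  end

  # Queue a query and dispatch it at once if a connection is free.
  def submit(sql)
    @pending << sql
    dispatch
  end

  # Called when a connection has delivered all of its results.
  def release(conn)
    @free << conn
    dispatch
  end

  private

  def dispatch
    until @free.empty? || @pending.empty?
      @free.shift.send_query(@pending.shift)
    end
  end
end

# Test double standing in for a real connection: it only records queries.
class StubConnection
  attr_reader :sent

  def initialize
    @sent = []
  end

  def send_query(sql)
    @sent << sql
  end
end

conns = [StubConnection.new, StubConnection.new]
queue = QueryQueue.new(conns)
%w[q1 q2 q3 q4 q5].each { |sql| queue.submit(sql) }
queue.release(conns.first) # the first connection finished q1

p conns.map(&:sent)
p queue.pending
```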

Now that all 0b101 reasons have been told, I rest my case.

NeverBlock, much faster IO for Ruby


At eSpace we have just released an alpha version of NeverBlock, a library that aims to bring evented IO to the masses. It does so by wrapping all IO in Fibers, which handle all the async aspects and hide them completely from the developer.

Just as a teaser, here are some benchmarks of running PostgreSQL queries with and without NeverBlock



A 10x performance boost? How about that?

I am now working on extending the NeverBlock library; watch this space for great news soon.

The case for a nonblocking Ruby stack


In a previous post I talked about the processor and memory problems that plague web-based Ruby applications, and I proposed non-blocking IO as a solution. In a follow-up post I benchmarked nonblocking vs blocking performance using the async facilities in the Ruby Postgres driver in combination with Ruby Fibers. The results were so promising (up to 40% improvement) that I decided to take the benchmarking effort one step further. I monkey patched the Ruby Postgres driver to be fiber aware and was able to integrate it into Sequel with little to no effort. Next, I ran the Unicycle Monorail server (the EventMachine HTTP server) in an EventMachine loop. I created a dumb controller that would query the DB and render the results using the Object#to_json method.

As in the evented DB access benchmark, a long query ran every n short queries (n belongs to {5, 10, 20, 50, 100}). The running application accepted 2 URLs: one ran DB operations in normal mode and the other in nonblocking mode (in the latter case, every action invocation was wrapped in a fiber).

Here are the benchmark results

Full results

Comparing the number of requests/second fulfilled by each combination of blocking mode and concurrency level. The first had the possible values [blocking, nonblocking], the second the possible values [5, 10, 20, 50, 100].



Advantage Graph

Comparing the advantage gained by nonblocking over blocking mode for different long-to-short query ratios, displaying the results for different levels of concurrency.



And the full results in tabular form

Requests per second, by number of concurrent requests

Ratio (long:short)   Mode           10        100       1000

1:100                Nonblocking    456.94    608.67    631.82
1:100                Blocking       384.82    524.39    532.26
1:100                Advantage      18.74%    16.07%    18.71%

1:50                 Nonblocking    377.38    460.74    471.89
1:50                 Blocking       266.63    337.49    339.01
1:50                 Advantage      41.54%    36.52%    39.20%

1:20                 Nonblocking    220.44    238.63    266.07
1:20                 Blocking       142.6     159.7     141.92
1:20                 Advantage      54.59%    49.42%    87.48%

1:10                 Nonblocking    130.87    139.76    195.02
1:10                 Blocking       78.68     84.84     81.07
1:10                 Advantage      66.33%    64.73%    140.56%

1:5                  Nonblocking    70.05     75.5      109.34
1:5                  Blocking       41.48     42.13     41.77
1:5                  Advantage      68.88%    79.21%    161.77%
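The Advantage rows are the relative gain in throughput of nonblocking over blocking mode, (nonblocking / blocking - 1) × 100. For the 1:100 ratio at 10 concurrent requests, for example, 456.94 / 384.82 gives 18.74%. The following snippet is a quick check of that arithmetic. The `advantage` helper name is made up for illustration.

```ruby
# Relative throughput gain, in percent, of nonblocking over blocking mode.
def advantage(nonblocking_rps, blocking_rps)
  (nonblocking_rps / blocking_rps - 1) * 100
end

puts advantage(456.94, 384.82).round(2) # 1:100 ratio, 10 concurrent requests
puts advantage(109.34, 41.77).round(2)  # 1:5 ratio, 1000 concurrent requests
```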

Conclusion

As I expected, the nonblocking mode outperforms the blocking mode as long as enough long queries come into play. If all the DB queries are very small, the blocking mode will triumph, mainly due to the overhead of fibers. Nevertheless, once there is even a single long query for every 100 short queries, performance swings in favor of the nonblocking mode. There are still a few optimizations to be done, mainly completing the integration with EventMachine, which should theoretically enhance performance. The next step is to integrate this into some framework and build a real application using this approach. Since Sequel is working now, having Ramaze or Merb run in non-blocking mode should be a fairly easy task. Sadly, Rails is out of the picture for now, as it does not support Ruby 1.9 yet.

I reckon an application that does all its IO in an evented way will need far fewer processes per CPU core to make full use of it. In fact, I am guessing that a single core can be maxed out by a single process. If that is the case, I would be much happier replacing the 16 Thin processes running on my server with only 4. Couple that with the 30% memory savings we get from using RubyEE, and we are talking about an amazing 82.5% memory footprint reduction without sacrificing performance.
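The 82.5% figure comes from compounding the two savings. Going from 16 processes to 4 keeps 4/16 = 25% of the memory, and a further 30% per-process saving from RubyEE keeps 70% of that. That leaves 0.25 × 0.70 = 17.5% of the original footprint. The calculation assumes memory scales linearly with the number of processes and that the 30% saving applies to every remaining process. `footprint_reduction` is a hypothetical helper.

```ruby
# Fraction of the original memory footprint saved by running fewer
# processes that are each `per_process_saving` smaller.
def footprint_reduction(before_procs, after_procs, per_process_saving)
  remaining = after_procs.fdiv(before_procs) * (1 - per_process_saving)
  1 - remaining
end

puts((footprint_reduction(16, 4, 0.30) * 100).round(1)) # prints 82.5
```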

Faster IO for Ruby with Postgres


Or 40% faster DB Access for your Ruby applications!

In a previous post I talked a bit about event-based programming for Ruby. I mentioned the EventMachine/Asymy combo as a means of doing asynchronous database operations, freeing up the Ruby runtime to do other things while it waits on database I/O. Even better, developers need not worry about a different programming model: with the help of Ruby Fibers, we continue to program the same old way while Fibers do all the twisted work underneath. Very promising indeed, but one big elephant in the room was the immaturity of the current solution. Asymy is still in its infancy and is based on the super slow pure Ruby MySQL driver, not to mention that it is fairly incomplete as well.


So, what can we do about the elephant in the room? There is an Arabic proverb that basically says "Nothing can beat iron but iron", and this is exactly what we are going to do. Enter Postgres, the database with a realistic, unfriendly elephant mascot. Go away, dolphins; a real elephant is in the room now.

Surprisingly, Postgres happens to have an excellent asynchronous client API that lets you do almost all operations in a non-blocking way. More surprisingly, the Postgres driver for Ruby covers almost all of those asynchronous API calls. The driver was originally written by Matz (yes, the man himself) in 1997. It was later updated by ematsu in 1999, and now we have an update fresh from the oven in March 2008 by Jdavis. If you go through the C source code, you will find many hidden gems. The methods are fairly well documented. You will also discover that the driver has a blocking method that wraps the asynchronous calls inside it, but does so in a Ruby-threading-friendly way, so a threaded application will not block on Postgres SQL commands. That's a good thing, but I am more interested in the asynchronous side of the fence.

Let's walk through the API and see how we can use it for non-blocking database access. First, install the gem ("sudo gem install pg"). Then require 'pg' in your code.

One problem before we start, though: the current driver has a nasty little bug that prevents you from setting the connection to nonblocking. It is actually a bug in the parameter count defined in the Ruby interface, and a simple switch from 0 to 1 fixes it. To save you time and sweat, I have provided a replacement gem with the modified sources (until the bug is fixed upstream). Now let's get back to the code.

Here's how to get things started:
require 'pg'
# I have configured postgres to run in *trusted* mode
# so I don't need to supply a password
conn = PGconn.new({:host=>'localhost',:user=>'postgres',:dbname=>'evented'})
conn.setnonblocking(true)
This way our connection is ready for async operations. Now we need to start sending SQL commands to our connection. Normally we would use the PGconn#exec method, but this method blocks while waiting on Postgres. So instead we will use the PGconn#send_query method, which returns immediately without waiting for Postgres to actually process the SQL command. Here's how we are going to use it:
conn.send_query("select * from users where name like '%am%'")
# the method will return immediately (or raise an exception in case of an error)
But wait, where are the results? Normally we expect the call to return with the data. Now where is my data? The results are being processed right now on the server side, and we can continue to do other things until they arrive. But how do we know when they arrive? It turns out that this is easy as well. The PGconn instance provides a method that returns the connection's socket descriptor: PGconn#socket. We retrieve that socket descriptor and wrap it in a Ruby IO object by calling:
io = IO.new(conn.socket)
Now we have a nice IO object, and a select call can notify us of its activity. For the uninitiated, event-based programming is done with a tight loop that runs forever. Within this loop we check whether IO events have happened and, if so, respond to them. One efficient way of doing this is the Ruby Kernel#select method, a wrapper around UNIX select. The select method works this way: you provide it with three lists (sockets you want to read from, sockets you want to write to, and sockets whose error conditions you are interested in), plus an optional timeout. The call returns an array of three arrays (the sockets ready for reading, ready for writing, and with errors), or nil if none is ready before the timeout expires.
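The following small, self-contained illustration shows the IO.select return value described above. It uses an IO.pipe instead of a Postgres socket, so it runs without a database. Before anything is written, the call times out and returns nil. Once data is available, it returns three arrays, the first holding the readable IO.

```ruby
reader, writer = IO.pipe

# Nothing written yet: select times out and returns nil.
idle = IO.select([reader], nil, nil, 0.01)
p idle # prints nil

writer.write("row data")

# Data is waiting: select returns [readable, writable, errored].
readable, writable, _errored = IO.select([reader], nil, nil, 0.01)
p readable == [reader] # prints true
p writable             # prints []

data = reader.read_nonblock(1024)
p data # prints "row data"
```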

We will use select as follows:
# the method that will be called if input is ready
def process_command(conn)
  # we will detail the implementation soon
end

loop do
  # we supply a list of sockets we need to read from:
  # only our io object in this case. We pass nil for
  # the other lists and set a timeout (in seconds)
  res = select([io], nil, nil, 0.001)
  # of course this needs to be done in a cleaner way
  process_command(conn) unless res.nil?
end
This way, whenever there is data to read from the socket, we will not get nil (we will get an array), so we can call process_command. When process_command gets called, it knows there is data in the connection to be read, so it calls the PGconn#consume_input method. It then checks whether the connection is still busy. If it is, it does nothing (a later event will take care of it). If the connection is not busy, we start calling the PGconn#get_result method and append what we get to the results so far. We keep doing that until we get a nil result, which indicates the end of the command and the readiness of the connection to accept further commands. Here is what the method will look like:
def process_command(conn)
  # move whatever has arrived on the socket into the driver's buffer
  conn.consume_input
  # still waiting for the rest of the result; a later event will bring it
  return if conn.is_busy
  data = []
  # get_result returns nil once the command is complete
  while (res = conn.get_result)
    res.each { |row| data.push(row) }
    res.clear
  end
  # we are done, we need to put this data somewhere
  data
end
Several things should be noted. First, one cannot process several commands on the same connection at once; you need several connections to achieve parallel command processing. Second, the model described above works in the twisted way. To get things working the normal way, you can use Ruby Fibers (or continuations, but they apparently leak memory).

I have put together a couple of Ruby classes that implement a nonblocking connection pool and a fiber pool. You can find them here. Using those, you can write code that looks like this:
require 'fiber_pool'
require 'fibered_connection_pool'

options = {:host=>'localhost',:user=>'postgres',:dbname=>'evented'}

cpool = FiberedConnectionPool.new(options, 12)
# second param is the number of connections to spawn, defaults to 8
# note that one more connection than those will be spawned. This one
# will be used for processing blocking requests.

fpool = FiberPool.new(100)
# the number of fibers to spawn, defaults to 50

100.times do
  fpool.spawn do
    cpool.exec(some_sql_command, true) # true means async
    cpool.exec(some_other_sql_command, true)
    cpool.exec(yet_another_sql_command, true)
  end
end

# our event loop
loop do
  res = select(cpool.sockets, nil, nil, 0) # check for something to read
  # IO is monkey patched to be able to hold a reference to the connection
  res.first.each { |s| s.connection.process_command } if res
end
This works as follows. Once a fiber calls cpool.exec, the query is sent to the pool for processing and the fiber is halted, giving way for another one to start processing. That one will halt as well once it hits a cpool.exec. Later, during the event loop, you get notifications of completed queries (in any order) and resume the fiber associated with each finished query. Note that commands issued in the same fiber run sequentially, while those issued from different fibers interleave. This is effectively what threading achieves, but without its costs.
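The following toy simulation shows the "sequential within a fiber, interleaved across fibers" behavior without a database. It is not the code of the FiberedConnectionPool above. A hypothetical `FakeServer` answers queued queries in FIFO order, one per turn of the loop. A hypothetical `query` helper hands its SQL to the loop through Fiber.yield and receives the "rows" back through the next resume.

```ruby
# Toy model: the fake server completes queued queries in FIFO order.
class FakeServer
  def initialize
    @queue = []
  end

  def submit(fiber, sql)
    @queue << [fiber, sql]
  end

  def pending?
    !@queue.empty?
  end

  def next_completed
    @queue.shift
  end
end

# Inside a fiber, "running a query" means handing the SQL to the loop
# and sleeping until the loop resumes us with the rows.
def query(sql)
  Fiber.yield(sql)
end

log = []
requests = %w[A B].map do |name|
  Fiber.new do
    log << "#{name} got #{query("#{name}1")}"
    log << "#{name} got #{query("#{name}2")}"
  end
end

server = FakeServer.new
requests.each { |f| server.submit(f, f.resume) } # run each to its first query

while server.pending?
  fiber, sql = server.next_completed
  next_sql = fiber.resume("rows for #{sql}")
  server.submit(fiber, next_sql) if fiber.alive?
end

puts log
```

Fiber A always sees A1 before A2, but B's first result arrives between A's two queries.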

Performance:

I am sure my code could use some tweaking, but I am getting very good results already. During benchmarking, I found that the cost of instantiating fibers could be high. The cost of pausing and resuming is high as well, but that is unavoidable. So I created a pool of fibers that can be reused (a very naive implementation that could use a lot of improvement).
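The following naive fiber pool is in the spirit described above. It is a reconstruction of the idea, not the author's fiber_pool.rb. A fixed set of fibers is created once, and each fiber runs one job after another instead of a new Fiber being created per job. In a real server, the jobs would suspend inside IO calls and be resumed by the event loop. The toy jobs here finish immediately. A job that raises would kill its fiber, which this sketch does not handle.

```ruby
# A fixed set of fibers created up front; each runs jobs one after another.
class FiberPool
  def initialize(size)
    @idle = []
    @jobs = []
    size.times do
      fiber = Fiber.new do |job|
        loop do
          job.call
          if @jobs.empty?
            @idle << fiber    # park this fiber until spawn hands it work
            job = Fiber.yield # the next spawn resumes us with a new job
          else
            job = @jobs.shift # keep going with queued work
          end
        end
      end
      @idle << fiber
    end
  end

  # Run the block on an idle fiber, or queue it if all fibers are busy.
  def spawn(&job)
    if (fiber = @idle.shift)
      fiber.resume(job)
    else
      @jobs << job
    end
  end

  def idle_count
    @idle.size
  end
end

pool = FiberPool.new(2)
results = []
5.times { |i| pool.spawn { results << i } }
p results         # prints [0, 1, 2, 3, 4]
p pool.idle_count # prints 2
```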

I tested by issuing a group of long and short queries together. You provide the test program with the number of long queries and the multiplier it should use for short queries. For example, ruby test.rb 10 20 will iterate 10 times, and in each iteration it will issue one long query followed by 20 short queries. It does this first in a blocking and then in a nonblocking way, reporting the time taken for each to complete and the percentage of performance increase/decrease.
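The query mix the test program generates can be sketched as follows. `query_schedule` is a hypothetical helper, and the actual long and short SQL statements are not given in the post, so they appear here only as the symbols :long and :short. With 50 long queries and a multiplier of 10, the schedule contains the 500 short queries mentioned in the results.

```ruby
# Query mix for `ruby test.rb LONG MULTIPLIER`: each of LONG iterations
# issues one long query followed by MULTIPLIER short ones.
def query_schedule(long_count, multiplier)
  Array.new(long_count) { [:long] + [:short] * multiplier }.flatten
end

schedule = query_schedule(50, 10)
puts "#{schedule.count(:long)} long, #{schedule.count(:short)} short" # prints "50 long, 500 short"
```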



I tested with 10, 50 and 100 long queries and the following multipliers: 1, 2, 5, 10, 50 and 100. The graph shows the performance gain for each number of queries vs the multiplier. For example, 50 long queries with a multiplier of 10 (i.e. 500 short queries) achieve a 39.6% reduction in query execution time. I repeated many of the tests several times (not all of them; I was too lazy to do that). The repeated tests showed consistent results, so I am fairly confident in the presented results.


Here is the full list:

Ratio (long:short)   Long queries   Short queries   Blocking   Non Blocking   Advantage

1:2                  10             20              0.56       0.5            10.27%
1:2                  50             100             2.55       2.26           11.19%
1:2                  100            200             5.15       4.46           13.53%

1:5                  10             50              0.55       0.4            27.04%
1:5                  50             250             2.72       1.83           32.82%
1:5                  100            500             5.45       3.63           33.39%

1:10                 10             100             0.6        0.4            33.76%
1:10                 50             500             3.01       1.82           39.67%
1:10                 100            1000            5.9        3.65           38.13%

1:20                 10             200             0.72       0.45           38.12%
1:20                 50             1000            3.43       2.1            38.73%
1:20                 100            2000            6.83       4.33           36.53%

1:50                 10             500             0.98       0.62           36.57%
1:50                 50             2500            4.78       3.23           32.36%
1:50                 100            5000            9.74       8.68           10.93%

1:100                10             1000            1.46       0.94           35.40%
1:100                50             5000            7.42       5.17           30.31%
1:100                100            10000           14.27      12.68          11.15%
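Unlike the requests-per-second table in the post above, the Advantage column here reads as a reduction in execution time, (blocking - nonblocking) / blocking × 100. That is how the text describes the 39.67% entry ("a 39.6% reduction in query execution time"). Recomputing from the rounded timings shown gives slightly different values, for example 39.53% instead of 39.67% for that row. Presumably the column was computed from the unrounded timings. `time_reduction` is a hypothetical helper name.

```ruby
# Percentage reduction in execution time of nonblocking vs blocking mode.
def time_reduction(blocking_time, nonblocking_time)
  (blocking_time - nonblocking_time) / blocking_time * 100
end

puts time_reduction(3.01, 1.82).round(2) # prints 39.53 (table column: 39.67%)
```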
The area I would like to focus on for performance tuning is the size of the fiber pool. The test is somewhat sensitive to it, so I believe I can gain a bit more performance with insane query counts by optimizing the fiber pool. Setting the initial size very high certainly helps, but it eats too much memory to be usable.

A final note: I am playing with using this alongside an EventMachine-based HTTP server. It works OK but is a CPU hog, probably due to using select in next_tick calls within EM's event loop. I would love to be able to provide EM with a list of IO objects and a callback, instead of having to use EM to open the connection. Nevertheless, even though in many cases the nonblocking DB implementation is slower than a blocking one in the HTTP-serving arena, I managed to get ~800 req/s vs ~500 req/s for a very typical use case: a request that runs a long query followed by many short ones. Impressive, to say the least. I might even try to hack EM to support the feature I need and then see what performance that could yield.

UPDATE

Apparently one can get more performance for the blocking requests if the fiber pool is initialized AFTER the blocking calls, possibly because the memory increase impacts the VM. Rerunning some of the tests showed fractional improvements for the blocking case. I also ran some of the tests while another process was doing heavy I/O (RDoc generation). The performance gain jumped to an amazing 76% in one of the tests (it was generally between 51% and 76%).