
NeverBlock Saves The Day


It started with too many processes

A great proof of how valuable NeverBlock is happened just a little while ago. Alsaha.com is one of the oldest forums in the Middle East. Lately, I helped rebuild the whole thing and move it to Ruby on Rails while I was at eSpace, my previous employer.

Many web users rely on sites like Alsaha.com to follow commentary on breaking news, so during such events the daily page views can jump from a couple hundred thousand to millions. Add to that the fact that there are some (unavoidable) slow database operations that must be done online by the administrators.

Initially we had separate web server instances for normal users and administrators, to avoid stalling Rails processes on long running queries. All in all, since we had a capable back end, we coupled it with a formidable number of Rails processes as front end servers. This was the only way back then to exploit our back end's concurrency. Needless to say, this army of front end processes consumed lots and lots of memory.

Enter NeverBlock

After the initial production testing on meOwns.com, we thought of the gains we could get from using it with Alsaha, so we planned the move and were able to drastically reduce the Rails instance count. We only use 4 now, and that is to utilize the CPU cores rather than for concurrency. Those 4 processes serve all the user and administrative content. Thanks to NeverBlock, no request has to wait for another.

The Real Test

There was a very important news item lately that resulted in a traffic spike on the order of millions of page views in a few hours. Thankfully, the NeverBlock setup easily accommodated that without noticeable degradation (actually there was some degradation, attributed to a bug in the logging logic, that was quickly discovered and fixed). The mere 4 instances kept up with the load even though some slow operations were running on them while they were serving loads of quick ones to lots of users.

Conclusion

I am grateful to have written code that turned out to be useful to others. I hope this becomes a pattern.

Ruby Networking on Steroids


Ruby provides several socket classes for various connection protocols. Those classes are arranged in a strange and convoluted hierarchy.
This ASCII diagram explains it:

IO
|
BasicSocket
|
|-- IPSocket
|    |
|    |-- TCPSocket
|    |    |
|    |    |-- TCPServer
|    |    |
|    |    |-- SOCKSSocket
|    |
|    |-- UDPSocket
|
|-- Socket
|
|-- UNIXSocket
     |
     |-- UNIXServer

The BasicSocket class provides some common methods but you cannot instantiate it; you have to use one of the subclasses. Three branches come out of BasicSocket: one implements the IP (and descendant) protocols, another implements the UNIX domain sockets protocol, and a third provides a generic wrapper over BSD sockets. The first problem with this branching strategy is that while the Socket class could serve as a parent class to both UNIXSocket and IPSocket, the implementer chose to create a separate path for each of them. The result is a lot of code duplication in the implementation, which makes maintaining those classes much harder than it should be.

A prime example of this is the recent addition of non-blocking features to the I/O and socket classes. Only the Socket class was lucky enough to get an accept_nonblock method; the other classes sadly didn't. It is very important to be able to initiate network connections in a non-blocking manner if you are using an evented framework (like NeverBlock, for example).

What makes the problem worse is that major Ruby network libraries overlook the Socket class and use TCPSocket or UNIXSocket instead. Net/HTTP, for example, uses TCPSocket. Since NeverBlock tries to work in harmony with most Ruby libraries, it makes up for this inconsistency by altering the default hierarchy of socket classes. Ruby allows you to undefine constants in an object, so we remove the TCPSocket and UNIXSocket classes and redefine them by inheriting from Socket, adding some methods to make up for any lost functionality.
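For illustration, the trick looks roughly like this (a minimal sketch; NeverBlock's actual replacement classes restore much more of the lost API):

require 'socket'

# undefine the stock TCPSocket, then rebuild it on top of Socket
Object.send(:remove_const, :TCPSocket)

class TCPSocket < Socket
  def initialize(host, port)
    super(Socket::AF_INET, Socket::SOCK_STREAM, 0)
    connect(Socket.pack_sockaddr_in(port, host))
  end
end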

After modifying the socket classes, NeverBlock support was integrated. This was done by rewriting the connect, read and write methods so that they detect the presence of a NeverBlock fiber and operate in an asynchronous way accordingly. If you use the new socket classes in a non-NeverBlock context, or in NeverBlock's blocking mode, they fall back to the old blocking implementation.
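As a sketch of the idea (illustrative only, not NeverBlock's actual source; fiber_aware_read is a hypothetical name), a read can be attempted in a non-blocking fashion, yielding the current fiber whenever the call would block:

require 'socket'
require 'fiber'

class TCPSocket
  # a scheduler/event loop is assumed to resume the fiber
  # once the socket becomes readable
  def fiber_aware_read(maxlen)
    read_nonblock(maxlen)
  rescue IO::WaitReadable
    Fiber.yield(self) # hand control back to the event loop
    retry             # re-attempt the read once we are resumed
  end
end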

So here is an example. First we will create a server using EventMachine that takes 1 second to process each request.

server.rb

require 'eventmachine'

class Server < EM::Connection
  # handle requests here
  def receive_data(data)
    # send the response after 1 second to simulate a slow back end
    EM.add_timer(1) do
      send_data "HTTP/1.1 200 OK\r\n\r\ndone"
      close_connection_after_writing
    end
  end
end

EM.run do
  EM.start_server('0.0.0.0', 8080, Server)
end


Second, we will create a client that issues requests to the server.

client.rb

require 'neverblock'
require 'net/http'
require 'uri'

EM.run do
  @pool = NB::FiberPool.new(20)
  20.times do
    @pool.spawn do
      url = URI.parse("http://localhost:8080")
      res = Net::HTTP.start(url.host, url.port) { |http| http.get('/') }
    end
  end
end

Issuing 20 GET requests in NeverBlock fibers causes them to run concurrently. Even though our server takes one full second to process each request, they all return after approximately 1 second.

Here is a blocking version

blocking_client.rb

require 'net/http'
require 'uri'

20.times do
  url = URI.parse("http://localhost:8080")
  res = Net::HTTP.start(url.host, url.port) { |http| http.get('/') }
end


The blocking client finishes after around 20 seconds.

Here's a teaser graph



The really good thing is that we used the Net/HTTP library transparently. Any Ruby library that relies on Ruby sockets will benefit from NeverBlock and gain the ability to run in a concurrent manner.

What does that mean?

Originally, NeverBlock only supported concurrent database access for PostgreSQL and MySQL. While this was all good, databases are usually the bottleneck of most applications, unless you have something like a database cluster that can truly absorb any load. This was a shame, since NeverBlock is meant for high levels of concurrency that are only possible with massively scalable back ends. With this new development, however, we are now one step closer to tapping into this realm of high-performance, scalable web applications. Read on.

Enter AWS and the cloud

Amazon Web Services provide an example of a massively scalable back end that is accessible via HTTP. Services like S3, SimpleDB and SQS are all a URL away. Such services have a higher latency than your nearby database server, but they more than make up for that by being able to absorb all the requests you throw at them. Most of the Ruby libraries for accessing AWS rely on Net/HTTP in some way or another, which means we get NeverBlock support for those libraries for free. This is big news for Ruby applications (including Rails ones) that rely on AWS or a similar back end. For those types of apps, forget about a 10 or 20 fiber pool. We are talking a 1000 fiber pool here. Even higher numbers could be possible (once a nasty file descriptor bug in Ruby 1.9 is fixed).
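As a sketch of what that enables (the aws/s3 gem is one such Net/HTTP-based library; the bucket, keys and pool size here are illustrative assumptions):

require 'neverblock'
require 'aws/s3'

# assumes AWS::S3::Base.establish_connection! was already called with credentials
EM.run do
  pool = NB::FiberPool.new(1000)
  ('aa'..'zz').each do |key| # hypothetical object keys
    pool.spawn do
      # a blocking-looking S3 read that now runs concurrently inside a fiber
      data = AWS::S3::S3Object.value(key, 'my-bucket')
    end
  end
end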

Why Not Threads?

I have been claiming that Ruby fibers are faster than Ruby threads[1]. I have seen that in my tests, but those were usually limited to a single performance metric. So I decided to simulate a very scalable back end and see which approach offers more scalability. For testing purposes I created two client applications: one is threaded and the other is based on NeverBlock. In the NeverBlock version I did not use the fiber pool, though; I created a new fiber per operation to mimic the threaded app's behavior. The simulated scalable back end consisted of an EventMachine-based server that waits for a certain time before responding with 200 OK. The delay simulates back end processing and network latencies. I tested using 0, 10, 50, 100 and 500 ms as delay values. Another client application was written that works in the normal blocking mode, for comparison.

The clients were tested using Ruby 1.8.6 and 1.9.1. The only exception was the NeverBlock client, which was only tested with 1.9.1, since the current fiber implementation for Ruby 1.8.x is based on threads and would only reflect a threaded implementation's performance. Ruby 1.8 was added to the mix because I noticed scalability and performance problems in the Ruby 1.9 threading implementation, and Ruby 1.8 proved to have a (sometimes) faster and more scalable threading implementation.

Each client application attempts to issue 1000 requests to the back end server, and tries to do so in a concurrent fashion (except for the blocking version, of course).

Here are the results



And the results in ASCII format (numbers in cells are requests/sec)

Server Delay         0ms    10ms   50ms   100ms  500ms
Ruby1.8 Blocking     2000   19     16     10     2
Ruby1.9 Blocking     2400   19     17     10     2
Ruby1.8 Threaded     1050   800    670    536    415
Ruby1.9 Threaded     618    470    451    441    395
Ruby1.9 NeverBlock   2360   1997   1837   1656   1031

Let's try to explain the results. For a server that has no delay whatsoever (a utopian assumption), we see that the blocking clients offer the greatest performance. Ruby 1.9 in blocking mode comes first, mainly because Ruby 1.9 is faster than Ruby 1.8 and also comes with a faster Net/HTTP library[1]. Why is blocking faster? Simply because the evented server is processing the requests serially and the latency is minimal. Each request handler sends a response and returns immediately, so the server never gets a chance to process requests concurrently. This is the fastest you can drive your processor.

The NeverBlock implementation comes in a very close second, which shows that the overhead of using fibers is not that large. Actually, we are cheating a bit here: we make up for the overhead by sending the requests concurrently, so while the server is still processing them serially, we get the fiber pausing and resuming done while the server is working.

Needless to say, NeverBlock is far ahead of the threaded clients (whether 1.8 or 1.9) even against the zero latency server. We also see that 1.8 threads are considerably faster than 1.9's.

When we start adding a simulated delay to the server, we see the blocking clients fall dramatically from first position to last. They become so slow that they are really not suitable for use in that setting any more. Please note that the results for the 500ms delay are extrapolations; I was too annoyed by the idea of waiting 500 seconds for a test to run, twice!

On the other hand, the threaded and NeverBlock implementations are much less affected, even though they lose ground as the delay increases. NeverBlock maintains its lead over the threaded clients, though; it is generally around 2.5x faster.

Here is a graph of the NeverBlock advantage over the fastest threaded client



And in ASCII format

Server Delay          0ms      10ms     50ms     100ms    500ms
NeverBlock Advantage  124.76%  149.63%  174.18%  208.96%  148.43%

Aside from the NeverBlock advantage, the numbers themselves are very impressive. A single process can achieve ~1000 operations per second even with half a second of processing and network latency. In a multi-process setup we should be able to achieve a lot more than that. For example, forking another NeverBlock client on my dual core notebook, which hosts both the client and the server apps, adds a 50% performance gain.

Conclusion

NeverBlock really shines when the back end is highly scalable. The only problem I met was a Ruby 1.9 bug that crashed the client when the file descriptors exceeded 1024. I hope this gets fixed, as it will enable us to extract more performance from each process. Expect socket support to be officially added to NeverBlock soon.

Building the Never Blocking Rails, Making Rails 12X Faster


They told you it can't be done, they told you it has no scale. They told you lies!

What if you suddenly had the ability to serve multiple concurrent requests from a single Rails instance? What if you had the ability to multiplex IO operations from a single Rails instance?

No more what ifs. It has been done.

I was testing NeverBlock support for Rails. For testing I built a normal Rails application. Nothing unusual here: you get the whole usual Rails deal, routes, controllers, ActiveRecord models and eRuby templates. I am using the Thin server for serving the application and PostgreSQL as a database server. The only difference is that I was not using the PostgreSQL adapter; I was using the NeverBlock::PostgreSQL adapter instead.

All I needed to do was set the adapter in database.yml to neverblock_postgresql instead of postgresql, and require 'never_block/server/thin' in my production.rb.
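For illustration, the two changes might look like this (a sketch; the database name and the rest of the settings are placeholders):

# config/database.yml -- switch the adapter
production:
  adapter: neverblock_postgresql
  database: myapp_production

# config/environments/production.rb
require 'never_block/server/thin'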

All this was running on Ruby 1.9, so I had to comment out the body of the load_rubygems method in config/boot.rb, which is not needed in Ruby 1.9 anyway.

Now what difference does this thing make?

It allows you to process multiple requests concurrently from a single Rails instance. It does this by utilizing the async features of the PG client interface, coupled with Fibers and EventMachine, to provide transparent async operations.

So, when a Rails action issues any ActiveRecord operation, it will be suspended and another Rails action can kick in. The first one will be resumed once PostgreSQL has provided us with the data.

To make a quick test, I created a controller which would use an AR model to issue the following SQL command: "select sleep(1)". (sleep does not come by default with PostgreSQL; you have to implement it yourself.) I ran the application with the normal postgresql adapter and used Apache Bench to measure the performance of 10 concurrent requests.
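The test controller presumably looked something like this (a reconstruction based on the /forums/sleep/ path below, not the original code; Forum is an assumed model name):

class ForumsController < ApplicationController
  def sleep
    # issue the slow query through the model's connection;
    # sleep(1) is the custom function mentioned above
    Forum.connection.execute("select sleep(1)")
    render :text => "done"
  end
end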

Here are the results:
Server Software:        thin
Server Hostname:        localhost
Server Port:            3000

Document Path:          /forums/sleep/
Document Length:        11 bytes

Concurrency Level:      10
Time taken for tests:   10.248252 seconds
Complete requests:      10
Failed requests:        0
Write errors:           0
Total transferred:      4680 bytes
HTML transferred:       110 bytes
Requests per second:    0.98 [#/sec] (mean)
Time per request:       10248.252 [ms] (mean)
Time per request:       1024.825 [ms] (mean, across all concurrent requests)
Transfer rate:          0.39 [Kbytes/sec] received


Almost 1 request per second, which is what I expected. Then I switched to the new adapter, restarted Thin and redid the test.

Here are the new results:

Server Software:        thin
Server Hostname:        localhost
Server Port:            3000

Document Path:          /forums/sleep/
Document Length:        11 bytes

Concurrency Level:      10
Time taken for tests:   1.075797 seconds
Complete requests:      10
Failed requests:        0
Write errors:           0
Total transferred:      4680 bytes
HTML transferred:       110 bytes
Requests per second:    9.30 [#/sec] (mean)
Time per request:       1075.797 [ms] (mean)
Time per request:       107.580 [ms] (mean, across all concurrent requests)
Transfer rate:          3.72 [Kbytes/sec] received


Wow, a 9x speed improvement! The database requests were able to run concurrently, and they all came back together.

I decided to simulate various workloads and test the new implementation against the old one. I devised the workloads taking into account that the test machine had rather bad IO performance, so I used queries that would not tax the IO but would still require PostgreSQL to take its time. The workloads were categorized as follows:

Every request issues a "select 1" query (the fastest I can think of); then, for the different workloads:

1 - Very light workload: every 200 requests, one "select sleep(1)" would be issued
2 - Light workload: every 100 requests, one "select sleep(1)" would be issued
3 - Moderate workload: every 50 requests, one "select sleep(1)" would be issued
4 - Heavy workload: every 20 requests, one "select sleep(1)" would be issued
5 - Very heavy workload: every 10 requests, one "select sleep(1)" would be issued


I tested those workloads against the following setups:

1 - 1 Thin server, normal PostgreSQL adapter
2 - 2 Thin servers (behind nginx), normal PostgreSQL adapter
3 - 4 Thin servers (behind nginx), normal PostgreSQL adapter
4 - 1 Thin server, NeverBlock PostgreSQL adapter


I tested with 1000 queries and a concurrency of 200 (the multiple Thin servers were having problems above that figure; the new adapter scaled up to 1000 with no problems, usually with similar or slightly better results).

Here are the graphed results:



For the NeverBlock Thin server I was using a pool of 12 connections. As you can see from the results, in the very heavy workload it performs on par with a 12-Thin cluster. Generally, the NeverBlock Thin server easily outperforms the 4-Thin cluster, and the margin increases as the workload gets heavier.

And here are the results for scaling the number of concurrent connections for a NeverBlock::Thin server



Traditionally we used to spawn as many Thin servers as we could until we ran out of memory. Now we don't need to do so, as a single process will maintain multiple connections and will be able to saturate a single CPU core. Hence the perfect setup seems to be a single server instance per processor core.

But to really saturate a CPU one has to do all the IO requests in a non-blocking manner, not just the database ones. This is exactly the next step after the DB implementation is stable: to enrich NeverBlock with a set of IO libraries that operate in a seemingly blocking way while doing all their IO in a totally transparent, non-blocking manner, thanks to Fibers.

I am now wondering about the possibilities: the reduced memory footprint and the benefits such a solution could bring to the likes of Dreamhost and all the Rails hosting companies.

101 Reasons Why PostgreSQL is a better fit for Rails than MySQL


1 - Indexing Support

MySQL cannot utilize more than one index per query. I believe this is worth repeating: MySQL CANNOT UTILIZE MORE THAN ONE INDEX PER QUERY. Wait till your tables get large enough and this will surely hit you. OTOH, PostgreSQL can use multiple indices per query, which comes in real handy.

2 - Full Text Indexing Support

MySQL can do full text indexing on MyISAM tables only; those working with InnoDB tables are out of luck. PostgreSQL has very advanced full text indexing capabilities which enable you to control the tiniest details, down to the stemming strategy.

3 - Asynchronous Interface

MySQL drivers are very unfriendly to the Ruby interpreter. Once a command is issued, they take over until they come back with results. PostgreSQL sports a completely asynchronous interface where you can send queries to the database and then tend to other matters while the query is being processed by the server. The good news is that an async ActiveRecord adapter for MySQL is being developed right now, as part of the rapidly growing NeverBlock library.
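Here is a minimal sketch of that asynchronous interface using the pg gem (connection parameters are placeholders):

require 'pg'

conn = PG.connect(:dbname => 'test') # placeholder connection params
conn.send_query('select pg_sleep(1), 42 as answer')

# ...tend to other matters while the server crunches the query...

conn.consume_input while conn.is_busy # pump the socket until the result is ready
res = conn.get_result
puts res[0]['answer']                 # => "42"
conn.get_result                       # returns nil, marking the end of results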

4 - Ruby Threading Aware

PostgreSQL drivers let the Ruby thread scheduler run while IO requests are being processed (a nice side effect of the async interface), which makes them much better suited for multithreaded Rails apps.

5 - Multistatements Per Query

Both MySQL and PostgreSQL support sending multiple statements separated by semicolons at once, but the returned result will be that of the last statement in the group. Now, did you know that by using the async interface you can send multiple queries at once and then get back the results, one by one? One of the coolest features of the coming ActiveRecord (and Sequel, btw) adapter is its support for queuing queries to be consumed by a pool of connections. A trick we are contemplating is to group consecutive selects together, send them in a single request to PostgreSQL, and later extract the results associated with each of them. This is still very theoretical but should be verified soon.
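Continuing the pg sketch from above (illustrative, not the adapter's actual code), multiple statements sent in one async call come back as separate results that you can pull one by one:

conn.send_query('select 1 as a; select 2 as b')
while res = conn.get_result
  p res.to_a # [{"a"=>"1"}] on the first pass, [{"b"=>"2"}] on the second
end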

Now that the 0b101 reasons are told, I rest my case.

The case for a nonblocking Ruby stack


In a previous post I talked about the problems that plague web-based Ruby applications regarding processor and memory use. I proposed using non-blocking IO as a solution. In a follow-up post I benchmarked non-blocking vs blocking performance using the async facilities in the Ruby PostgreSQL driver in combination with Ruby Fibers. The results were promising enough (up to 40% improvement) that I decided to take the benchmarking effort one step further. I monkey-patched the Ruby PostgreSQL driver to be fiber aware and was able to integrate it into Sequel with little to no effort. Next I used the unicycle monorail server (the EventMachine HTTP server) in an EventMachine loop. I created a dumb controller which would query the db and render the results using the Object#to_json method.

As was done in the evented db access benchmark, a long query ran every n short queries (n in {5, 10, 20, 50, 100}). The running application accepted 2 URLs: one ran db operations in normal mode and the other ran in nonblocking mode (every action invocation was wrapped in a fiber in the latter case).

Here are the benchmark results

Full results

Comparing the number of requests/second fulfilled by each combination of blocking mode and long-to-short query ratio. The first had the possible values [blocking, nonblocking]; the second had the possible values [5, 10, 20, 50, 100].



Advantage Graph

Comparing the advantage gained by nonblocking over blocking mode for different long-to-short query ratios, displayed for different levels of concurrency.



And the full results in tabular form

                       Concurrent Requests
Ratio                  10        100       1000

1 To 100 Nonblocking   456.94    608.67    631.82
1 To 100 Blocking      384.82    524.39    532.26
Advantage              18.74%    16.07%    18.71%

1 To 50 Nonblocking    377.38    460.74    471.89
1 To 50 Blocking       266.63    337.49    339.01
Advantage              41.54%    36.52%    39.20%

1 To 20 Nonblocking    220.44    238.63    266.07
1 To 20 Blocking       142.60    159.70    141.92
Advantage              54.59%    49.42%    87.48%

1 To 10 Nonblocking    130.87    139.76    195.02
1 To 10 Blocking       78.68     84.84     81.07
Advantage              66.33%    64.73%    140.56%

1 To 5 Nonblocking     70.05     75.50     109.34
1 To 5 Blocking        41.48     42.13     41.77
Advantage              68.88%    79.21%    161.77%

Conclusion

In accordance with my expectations, the nonblocking mode outperforms the blocking mode as long as enough long queries come into play. If all the db queries are very small, the blocking mode will triumph, mainly due to the overhead of fibers. Nevertheless, once there is even a single long query for every 100 short queries, the balance sways in the nonblocking mode's favor. There are still a few optimizations to be done, mainly completing the integration with EventMachine, which should theoretically enhance performance. The next step is to integrate this into some framework and build a real application using this approach. Since Sequel is working now, having Ramaze or Merb run in non-blocking mode should be a fairly easy task. Sadly, Rails is out of the picture for now, as it does not support Ruby 1.9 yet.

I reckon an application that does all its IO in an evented way will need far fewer processes per CPU core to make full use of it. Actually, I am guessing that a single core can be maxed out by a single process. If that is the case, then I can live much happier replacing the 16 Thin processes running on my server with only 4. Couple that with the 30% memory savings we get from using RubyEE (4/16 = 25% of the processes, each 30% smaller: 0.25 x 0.7 = 17.5% of the original footprint) and we are talking about an amazing 82.5% memory footprint reduction without sacrificing performance.

Untwisting the Event Loop


Have you ever wondered why your Rails application is so memory hungry while not really trying to fully utilize your CPUs? To saturate your CPUs you have to run a large number of Thin (or Mongrel or whatever) instances. Why is that? We all know that the Ruby interpreter cannot utilize more than one CPU (or no more than one CPU at a time, in the case of 1.9). But why can't Ruby (or maybe it's Rails?) utilize the processors efficiently? Let's look for an answer to this question.

First off, what happens in a typical Rails action? The Rails framework does some request mapping and routing, which is mostly CPU work (if we consider memory latency negligible). Then a few requests are sent to the database to retrieve some data, after which comes a rendering pass, which is mostly CPU work as well.
def show
  @user = User.find(params[:id]) # db access
  @events = Events.find(:all)    # another db access
  render :action => :show        # rendering
end

The problem here comes with the database part of the action. Calls to the database block processing till the results get back from the DBMS. During that time, Rails is frozen, not trying to do anything else till the call ends. The good news is that threads can help here (even Ruby's green threads): a blocked thread gives way to other threads till it is back in the ready state, thus filling those slots with some useful processing. Sounds good enough? NO!

Sadly, Rails is NOT thread safe. You cannot use threads to do parallel processing in Rails. So why not something like Merb, I hear you say? Well, Merb and threads will be able to interleave CPU operations and help with the time spent on IO in something like fetching data from some other service. But it won't save you when you do database IO, for the simple reason that calling C extensions blocks the whole Ruby interpreter. Yes, you read that correctly the first time: nothing can be scheduled while a native call is being executed. Since database drivers are mostly C extensions, they suffer from this. Your nice SELECT statement keeps the whole Ruby interpreter on hold till it is finished.

But there must be a solution to this. We cannot be all left high and dry with interpreters eating our memory and not really using our CPUs.

Enter EventMachine and AsyMy

For those who are not in the loop of events (pun intended), there happens to be another approach to this problem: event based (read: asynchronous) IO. In this mode of operation you request an IO operation and tell the event loop what to do when the request is fulfilled (either fully or partially). An excellent library for event handling exists for Ruby: Francis' EventMachine (used internally by the Thin server and the evented flavour of Mongrel). But still, using EventMachine does not magically solve all our problems. The question that keeps popping up: what to do about database access? AsyMy to the rescue! AsyMy, written by Thomas Ptacek, is an evented driver for MySQL that operates in an asynchronous fashion. A quick example looks like this:
connection.execute('SELECT * from events') do |headers, data|
  # do something with headers and data
  pp headers
  pp data
end
AsyMy is still at a very early stage. The performance is horrible (it is based on the darn slow pure Ruby MySQL driver) and it comes with many rough corners (I was not able to run INSERTs and UPDATEs without hacking it, and I am still not able to run the callbacks for those). Nevertheless, this is a formidable achievement on the road to a very fast single threaded implementation.

Here's how our action would look if there were an AsyMy adapter for ActiveRecord:
# this is probably wrong, but it illustrates
# the twisted nature of evented programming
def show
  User.find(params[:id]) do |result_set|
    @user = result_set
    Events.find(:all) do |result_set|
      @events = result_set
      @events.each do |event|
        event.owner = @user
        if event != @events.last
          event.save
        else
          event.save do |ev|
            render :action => :show
          end
        end
      end
    end
  end
end
We had to twist the function flow to make use of the evented nature of the new driver. Instead of flowing normally, the logic gets scattered across the different callbacks. This is one of the areas where event based programming makes you change the way you think about program flow; a hurdle for many developers and a show stopper for some. No wonder the event library for Python is called Twisted.

Why not untangle this with Fibers?

Fibers are lightweight concurrency primitives introduced in Ruby 1.9. How lightweight? Well, they don't come at zero cost, but in long running requests the weight they add can be negligible. Fibers provide a form of cooperative (rather than preemptive) concurrency inside a single thread (you cannot pass fibers between threads; you have been warned). Fibers enjoy the ability to pause and resume like continuations, but they don't suffer from the memory leaks continuations have. When we use this feature wisely, we can unwind the action code above to look like this:
def show
  @user = User.find(params[:id])
  @events = Events.find(:all).each do |event|
    event.owner = @user
    event.save
  end
end
Huh? This is the normal action code we are used to. Well, using fibers we can write this and still do things under the hood in an evented way.

To make things clear we need to illustrate Fibers with an example:
require 'fiber'

fiber = Fiber.new do
  # do something
  Fiber.yield another_thing
  # do yet another thing
end

yielded = fiber.resume # => runs the fiber till the yield,
                       #    returns the yielded value
                       #    and pauses the fiber where it is
fiber.resume # => re-runs the fiber from the point it was paused
fiber.resume # => no more statements to run, raises an exception
Let's see how this can be useful for dispatching controller actions (this code would preferably live in the server itself):
Fiber.new do
  Dispatcher.dispatch(controller, action, req, res)
  send_response res
end.resume
Inside the action we call the find method repeatedly. This method could be implemented like this:
class DataStore
  def find(*args)
    query = construct_query(*args)
    fiber = Fiber.current # grab the current fiber
    conn.execute(query) do |headers, data|
      fiber.resume convert_to_objects(data)
    end
    Fiber.yield # pause here till the callback resumes us with the results
  end
end
This way, whenever the code hits a find method, it passes the query to the db driver and pauses immediately, giving room for other requests to be processed. Once the data comes back from the db server, the callback runs and resumes the fiber (passing it the result of the query). The result gets passed back to the caller of the function and the original action method continues till completion (or till it is paused again by another find).

Roger Pack has a nice writeup (with actual working code) on the Evented Fibered combo here.

Charles Jolley implemented a similar thing here. It is called Pipelined and while it is more obtrusive than the approach described above, it still has the advantage of being optional. Pipelined uses continuations and hence is available to Ruby 1.8 (and Rails).

I am still ironing out the details and tying things together (and doing lots of benchmarks), and I should mention that I have ditched AsyMy for now for another alternative, which I will attempt to discuss in detail in another blog post.

Ponder This


I happened to stumble upon this (kinda old) page where the author compares Rails to Groovy performance. Reading his comments, he generally concludes that Groovy is faster than Rails. I didn't look at the graphs until I saw his remark that 10 Mongrels behind Pound were much slower than a single Mongrel! I double checked the graph, and it was the case: the 10 Mongrel setup is much slower in all cases than a single Mongrel instance.

I wondered where the catch was. Rereading the article, I spotted it with little effort. So here's a ponder-this for those who read this blog (both of you): what did the guy do to screw up the performance figures that badly? If you get it, please add a comment with your answer.

Given his numbers, a proper Rails setup would suddenly make Rails faster (no magic tricks, just fixing his fatal mistake). But the difference would not be that big anyway.

Update: seemingly no one discovered (or cared to discover?) the mistake, so here it goes. The lad used 10 Mongrels on a 1GB RAM machine that also ran MySQL, OS X and whatever else he had running; he simply ran out of memory and started swapping. The numbers for the 10 Mongrel setup included the disk swapping penalty. Couldn't he just listen to his drive or see a blinking LED?

Thin is getting thinner by the day


It seems that macournoyer is on a release frenzy for Thin! He just released the 0.6.1 "sweet cheesecake" release (what's up with the release names? I'm trying to control my diet here!). The new release uses less memory than Mongrel, which is a very good thing IMHO. It also uses the same config files used by mongrel_cluster. The big bang is the ability to use UNIX sockets to connect to the load balancer instead of TCP connections, which decreases the overhead of nginx sending requests and getting responses back from the upstream servers.

This is good news, though I will have to test whether it translates to a noticeable performance increase in a full-fledged Ruby on Rails application.

I will redo my tests with this new setup and come back with more numbers.

Stay tuned

The numbers are in, Thin is (sort of) faster than Mongrel for production apps.


I have been playing with the Thin + nginx combo for a while now. I did a lot of stress testing using an actual application that is about to go to production. I played with both Thin and Mongrel to see if there is a big difference.

I need not mention that I didn't bother testing Apache as a proxy balancer instead of nginx; I am getting an earth shaking 7900 req/s for static requests using nginx on an application serving unit.

Our application serving units are Xen virtual servers with a quad core processor each. Each runs the nginx web server and the web application cluster (Mongrel currently, but maybe Thin too). The cluster runs 10 Mongrels and 4 nginx workers. Any application can scale its front end by adding more of those application serving units.

While testing, I was surprised that after a certain test Thin was giving me results slower than Mongrel's. Repeating the tests or letting the system load cool down didn't help. What I found was that I had hit the memory limit and the system had started swapping. I shut down one Thin server and suddenly they started to outperform (albeit by a small margin) the Mongrel cluster again.

I tested against three of the very heavy pages. The results are the average of three runs for each page at the specified concurrency/connections pair.

So here are the numbers (using Apache Bench):
          (Concurrency/Connections)
          100/1000      200/1000      200/10000
Thin      104.3 req/s   115.7 req/s   123.1 req/s
Mongrel   100.6 req/s   113.2 req/s   121.6 req/s

The figures speak for themselves: while Thin is consistently faster than Mongrel, the difference is negligible. I am assuming this is because most of the time is spent processing the Rails stack and doing IO with memcached, then sending the actual response back. The raw differences between Thin and Mongrel are dwarfed by the time spent in Rails. You will see a good advantage for Thin when you are doing very small requests that do little processing and send small responses. While this is not the case for this particular test, it is typical of many Ajax intensive applications. And since this application is full of Ajax requests, I believe we might opt for Thin in the end.

I have to say that I was very happy with the results so far. Mongrel and Thin are both robust, and nginx is a true gem ;). The application is expected to generate lots of traffic and I am confident that scaling will only be a matter of adding more boxes. My next challenge is to get more performance out of those boxes, which is a possibility since Evan is working on a much better memcached client for Ruby. Knowing that Evan is working on Mongrel too reassures me about its future.

Thin, the thin server that can!


I did some testing of the new kid on the block, the Thin Ruby web server. I used 4 instances and nginx as my proxy server.

Thin is based on tried and true components (best in their class, if you ask me). It got its parser from Mongrel, its IO management from EventMachine, and it connects to your favorite Ruby framework via Rack. It's amazing how much one can achieve just by blending the right components together.

For static page serving, I got a whopping ~2500 req/s on my 2GHz Core 2 Duo machine (vs ~900 req/s for the same machine running Mongrels). That's with 1000 concurrent users using Apache Bench.

I also managed to achieve ~1200 req/s for a dynamic request in Rails that prints out 'Hello world!', for the same 1000 concurrent users.

I will put it to a real test in the coming days in more real world scenarios. I hope to be able to post the results here soon.

Restful Pagination in Rails


Have you ever tried to cache your paginated lists? Sadly, vanilla Rails won't help much, as it ignores the URL query parameters when caching; the page=x value is not honored, and Rails caching (action or page caching) will simply stick to the first page rendered for all requests.

One might come up with a solution that overrides the cache key generation to incorporate the query string. This will work, but will result in very long and ugly hash keys.

Luckily there is a better approach. If you simply define routes for pages (for the paginated resources) and name the page parameter with the same name you give it in the paginator, Rails will pick up the route when creating pagination links.

In your routes.rb
map.resources :users
map.paged_users '/users/pages/:page',
                :controller => 'users', :action => 'index'
map.formatted_paged_users '/users/pages/:page.:format',
                          :controller => 'users', :action => 'index'

Once the above routes are in place, all you need is to make sure your paginators use 'page' as the page parameter name, and you will see pagination links created like this:
/users/pages/1
/users/pages/2
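For instance, with will_paginate (a hedged example; the per_page value is arbitrary), the paginator's parameter is already called page:

# hypothetical controller action using will_paginate
@users = User.paginate(:page => params[:page], :per_page => 20)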

Don't forget the formatted route, which supports pagination with various formats so you can use URLs like:
/users/pages/1.xml

These URLs are very cache friendly and adhere to REST much more than the default parameter-based ones.

Happy caching (with pagination)

Using action+client caching to speed up your Rails application


Too many visitors are hitting your website and loads of dynamic data are being delivered to your clients? Of those visitors, do you have more people reading your site's content than modifying it? Meaning, do you get lots more GET requests than POST, PUT or DELETE?

If the above questions are all answered with a YES, then, my friend, you are desperately in need of caching. Caching will help you lessen the load on your servers by doing two main things:
  1. It eliminates lengthy trips to the (slow by nature) database to fetch the dynamic data
  2. It frees precious CPU cycles needed in processing this data and preparing it for presentation.
I have faced this very situation with a project we are planning. We are bound to have many more GETs than any other HTTP verb, and since we are building a RESTful application we will have a one-to-one mapping between our web resources (URLs) and our application models. The needs of our caching mechanism are the following:
  1. It needs to be fast
  2. It needs to be shared across multiple servers
  3. Authentication is required for some actions
  4. Page presentation changes (slightly) based on logged in user
  5. Most pages are shared and only a few are private for each user
We now have to answer two questions: which caching technique and which cache store will we use?

The cache store part is easy: memcached seems like the most sensible choice, as it achieves points 1 and 2 and is orthogonal to the other 3 requirements. So memcached it is for now.

Now, which caching technique? Rails has several caching methods, the most famous of which are Page, Action and Fragment Caching. Gregg Pollack has a great writeup on these here and here. Model caching is also an option, but it can get a bit too complicated, so I'm leaving it out for now; it can be implemented later (layering your caches is usually a good idea).

Page caching is the fastest, but we would lose the ability to authenticate (unless we do so via HTTP authentication, which I would love to, but sadly that is not the case). This leaves us with action and fragment caching. Since the page contains slightly different presentation based on the logged in user (like a hello message and maybe a localized datetime string), fragment caching would sound like the better choice, no? Well, I would love to be able to use action caching after all; this way I can serve whole pages without invoking the renderer at all and really avoid doing lots of string processing in Ruby.


There is a solution, if you'd just wake up and smell the coffee. We are in the Web 2.0 age and we should think of Web 2.0 solutions for Web 2.0 problems. What if we add a little JavaScript to the page that dynamically displays the desired content based on the user's role? And if that content is really little, why not store it in a session cookie? Max Dunn implements a similar solution for his wiki here: the page is served the same for everyone, with DOM manipulation kicking in to do the simple mods for the specific user. Rendering of those is done on the client, so there is no load on the server; and since the mods are really small, the client is not hurt either, and it gets the page much faster. It's a win-win situation. Life can't be better!

No, it can! In a content driven website, many people check a hot topic frequently, and many reread the same data they read before. In those cases the server is sending them a cached page, yes, but it is resending the same bits the browser already has in its cache. This is a waste of bandwidth, and your Mongrel will be waiting for the page transfer to finish before it can consume another request.

A better solution is to utilize client caching: tell the browser to use the version in its cache if it has not been invalidated. Just send the user-specific data in a cookie and let the page dynamically modify itself to adapt to the logged in user. Relying on session cookies for the dynamic parts prevents the browser from displaying stale data between two different sessions, but the page itself will not be fetched over the wire more than once, even for different users on the same computer.

I am using the Action Cache Plugin by Tom Fakes to add client caching capabilities to my action caches. Basically, things go in the following manner (a configuration sketch follows the list):
  1. A GET request is encountered and is intercepted
  2. Caching headers are checked, if none exists then proceed
    else send (304 NOT MODIFIED)
  3. Action Cache is checked if it is not there then proceed
    else send the cached page (200 OK)
  4. Action processed and page content is rendered
  5. Page added to cache, with last-modified header information
  6. Response sent back to browser (200 OK + all headers)
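On the Rails side, the action caching underneath might be enabled like this (a hedged sketch in Rails 2.x style; the plugin layers the client caching headers on top, and the controller name is hypothetical):

# config/environments/production.rb -- memcached as the cache store
# (the address is a placeholder)
config.action_controller.cache_store = :mem_cache_store, 'localhost:11211'

# app/controllers/topics_controller.rb
class TopicsController < ApplicationController
  caches_action :index, :show
end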
So how do we determine the impact of applying this to the application?
  1. We need to know the percentage of GET requests (which can be cached) as opposed to POST, PUT and DELETE ones
  2. Of those GET requests, how many are repeated?
  3. Of those repeated GET requests, how many originate from the same client?
Those numbers will tell us whether our caching model works well or not. This should be the topic of the next installment of this article.

Happy caching