Ruby Networking on Steroids



Ruby provides several socket classes for various connection protocols. Those classes are arranged in a strange and convoluted hierarchy.
This ASCII diagram shows the hierarchy:

BasicSocket
|-- IPSocket
|   |-- TCPSocket
|   |   |-- TCPServer
|   |   |-- SOCKSSocket
|   |-- UDPSocket
|-- Socket
|-- UNIXSocket

The BasicSocket class provides some common methods, but you cannot instantiate it. You have to use one of its subclasses. Three branches come out of BasicSocket: one implements the IP protocols (and their descendants), another implements the UNIX domain sockets protocol, and a third provides a generic wrapper over BSD sockets. The first problem with this branching strategy is that, while the Socket class could have served as a parent class for both UNIXSocket and IPSocket, the implementer chose to create a separate path for each of them. The result is a lot of duplicated code, which makes maintaining those classes much harder than it should be.

A prime example of this is the recent addition of non-blocking features to the I/O and socket classes. Only the Socket class was lucky enough to get a connect_nonblock method. The other classes sadly didn't get it. Being able to initiate network connections in a non-blocking manner is very important if you are using an evented framework (like NeverBlock, for example).
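To show what initiating a connection in a non-blocking manner looks like with the plain Socket class, here is a minimal sketch. The helper name nonblocking_connect and the timeout parameter are my own inventions for illustration, and IO.select stands in for whatever an evented framework would do while waiting.

```ruby
require 'socket'

# Hypothetical helper (not part of Ruby or NeverBlock): open a TCP connection
# without blocking the caller for longer than `timeout` seconds.
def nonblocking_connect(host, port, timeout = 5)
  sockaddr = Socket.sockaddr_in(port, host)
  socket = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
  begin
    socket.connect_nonblock(sockaddr)      # returns at once or raises
  rescue IO::WaitWritable                  # the connection attempt is in progress
    # An evented framework would go do other work here instead of selecting.
    unless IO.select(nil, [socket], nil, timeout)
      socket.close
      raise Errno::ETIMEDOUT
    end
    begin
      socket.connect_nonblock(sockaddr)    # ask for the outcome of the attempt
    rescue Errno::EISCONN
      # already connected, so the attempt succeeded
    end
  end
  socket
end
```

The returned object is a plain Socket, so it can be read from and written to like any other IO.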

What makes the problem worse is that major Ruby network libraries overlook the Socket class and use TCPSocket or UNIXSocket. Net/HTTP, for example, uses TCPSocket. Since NeverBlock tries to work in harmony with most Ruby libraries, it makes up for this inconsistency by altering the default hierarchy of socket classes. Ruby allows you to un-define constants in an object, so we remove the TCPSocket and UNIXSocket classes and redefine them by inheriting from Socket, defining some methods to make up for any lost functionality.
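The constant swapping trick can be sketched in a few lines. This is only a bare-bones illustration of the idea, not NeverBlock's actual code: the replacement class below handles nothing but a simple IPv4 connect, and a real replacement would also have to restore methods such as peeraddr.

```ruby
require 'socket'

# Un-define the stock constant. Classes that already inherit from the old
# TCPSocket (TCPServer, for example) keep working with the old class object.
Object.send(:remove_const, :TCPSocket)

# Illustrative replacement that hangs off Socket instead of IPSocket.
class TCPSocket < Socket
  def initialize(host, port)
    super(Socket::AF_INET, Socket::SOCK_STREAM, 0)
    # A fiber-aware version would use connect_nonblock here.
    connect(Socket.sockaddr_in(port, host))
  end
end
```

Any library that looks up TCPSocket after this point gets the new class, which is what lets the change stay invisible to code such as Net/HTTP.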

After modifying the socket classes, NeverBlock support was integrated. This was done by rewriting the connect, read and write methods so that they detect the presence of a NeverBlock fiber and operate asynchronously when they find one. If you use the new socket classes outside a NeverBlock context, or in NeverBlock's blocking mode, they fall back to the old blocking implementation.

So here is an example. First we will create a server using EventMachine that takes 1 second to process each request.


    require 'eventmachine'

    class Server < EM::Connection
      # handle requests here
      def receive_data(data)
        # schedule the response to be sent after 1 second
        EM.add_timer(1) do
          send_data "HTTP/1.1 200 OK\r\n\r\ndone"
          # close the connection so the client knows the response is complete
          close_connection_after_writing
        end
      end
    end

    EM.run do
      EM.start_server('', 8080, Server)
    end

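Second, we will create a client that will issue requests to the server.


    require 'neverblock'
    require 'net/http'
    require 'uri'

    # The call that opens the enclosing block and the right-hand side of the
    # @pool assignment (a NeverBlock fiber pool) are missing from this listing.
    do
      @pool =
      20.times do
        @pool.spawn do
          url = URI.parse("http://localhost:8080")
          res = Net::HTTP.start(url.host, url.port) { |http| http.get('/') }
        end
      end
    end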

Issuing 20 GET requests in NeverBlock fibers causes them to run concurrently. Even though our server takes a full second to process each request, they all return after approximately 1 second.

Here is a blocking version


    require 'net/http'
    require 'uri'

    20.times do
      url = URI.parse("http://localhost:8080")
      res = Net::HTTP.start(url.host, url.port) { |http| http.get('/') }
    end

The blocking client finishes after around 20 seconds.

Here's a teaser graph

The really good thing is that we used the Net/HTTP library transparently. Any Ruby library that relies on Ruby sockets will benefit from NeverBlock and gain the ability to run in a concurrent manner.

What does that mean?

Originally, NeverBlock only supported concurrent database access for PostgreSQL and MySQL. While this was good and all, databases are usually the bottleneck of most applications, unless you have something like a database cluster that can truly absorb any load. This was a shame, since NeverBlock is meant for high levels of concurrency that are only available with massively scalable back ends. With this new development, however, we are one step closer to tapping into this realm of high performance and scalable web applications. Read on.

Enter AWS and the cloud

Amazon Web Services provides an example of a massively scalable back end that is accessible via HTTP. Services like S3, SimpleDB and SQS are all a URL away. Such services have a higher latency than your nearby database server, but they more than make up for that by being able to absorb all the requests you throw at them. Most of the Ruby libraries for accessing AWS rely on Net/HTTP in one way or another, which means we get NeverBlock support for those libraries. This is big news for Ruby applications (including Rails ones) that rely on AWS or a similar back end. For those types of apps, forget about a pool of 10 or 20 fibers. We are talking about a pool of 1000 fibers here. Even higher numbers could be possible (once a nasty file descriptor bug in Ruby 1.9 is fixed).

Why Not Threads?

I have been claiming that Ruby fibers are faster than Ruby threads[1]. I have seen that in my tests, but those were usually limited to a single performance metric. So I decided to simulate a very scalable back end and see which approach offers more scalability. For testing purposes I created two client applications: one is threaded and the other is based on NeverBlock. In the NeverBlock version I did not use the fiber pool, though. I created a new fiber per operation to mimic the behavior of the threaded app. The simulated scalable back end consisted of an EventMachine based server that waits for a certain time before responding with 200 OK. The delay simulates back end processing and network latencies. I tested using 0, 10, 50, 100 and 500 ms as delay values. Another client application, working in the normal blocking mode, was written for comparison.

The clients were tested using Ruby 1.8.6 and 1.9.1. The only exception was the NeverBlock client, which was only tested with 1.9.1. This is because the current fiber implementation for Ruby 1.8.x is based on threads, so it would only reflect the performance of a threaded implementation. Ruby 1.8 was added to the mix because I noticed scalability and performance problems with the Ruby 1.9 threading implementation, and 1.8 proved to have a (sometimes) faster and more scalable one.

The application attempts to issue 1000 requests to the back end server and tries to do so in a concurrent fashion (except for the blocking version, of course).

Here are the results

And the results in ASCII format (numbers in cells are requests/sec)

Server Delay          0ms    10ms   50ms   100ms  500ms
Ruby1.8 Blocking      2000   19     16     10     2
Ruby1.9 Blocking      2400   19     17     10     2
Ruby1.8 Threaded      1050   800    670    536    415
Ruby1.9 Threaded      618    470    451    441    395
Ruby1.9 NeverBlock    2360   1997   1837   1656   1031

Let's try to explain the results. For a server that has no delay whatsoever (a utopian assumption) we see that the blocking clients offer the greatest performance. Ruby 1.9 in blocking mode comes first, mainly because Ruby 1.9 is faster than Ruby 1.8 and also comes with a faster Net/HTTP library[1]. Why is blocking faster? Simply because the evented server is processing the requests serially and the latency is minimal. The request handler sends a response and returns immediately, so the server does not get a chance to process requests concurrently. This is the fastest that you can drive your processor.

The NeverBlock implementation comes a very close second to the fastest client, which shows that the overhead of using fibers is not that large. Actually, we are cheating a bit here, because we make up for the overhead by sending the requests concurrently: while the server is still processing them serially, we get to do the fiber pausing and resuming while the server is working.

Needless to say, NeverBlock is much ahead of the threaded clients (either 1.8 or 1.9) when working with the zero latency server. We also see that 1.8 threads are considerably faster than 1.9's.

When we start adding a simulated delay to the server, we see that the blocking clients fall dramatically from the first position to the last. They become so slow that they are really not suitable for use in that setting any more. Please note that the blocking results for the 500ms delay are extrapolations. I was too annoyed by the idea of waiting 500 seconds for a test to run, twice!

On the other hand, threaded and NeverBlock implementations are much less affected even though they lose ground as we increase the delay. NeverBlock maintains its lead though over threaded clients. It is generally 2.5X faster.

Here is a graph of the NeverBlock advantage over the fastest threaded client

And in ASCII format

Server Delay            0ms       10ms      50ms      100ms     500ms
NeverBlock Advantage    124.76%   149.63%   174.18%   208.96%   148.43%
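For anyone who wants to check the advantage row, here is a small sketch that derives it from the requests/sec table above. The variable names are mine; the advantage is taken as the NeverBlock rate over the best threaded rate at the same delay, minus one, as a percentage.

```ruby
# Requests per second by server delay (ms), copied from the results table.
neverblock  = { 0 => 2360, 10 => 1997, 50 => 1837, 100 => 1656, 500 => 1031 }
threaded_18 = { 0 => 1050, 10 => 800,  50 => 670,  100 => 536,  500 => 415 }
threaded_19 = { 0 => 618,  10 => 470,  50 => 451,  100 => 441,  500 => 395 }

advantage = {}
neverblock.each do |delay, rate|
  # compare against whichever threaded client was faster at this delay
  best_threaded = [threaded_18[delay], threaded_19[delay]].max
  advantage[delay] = (rate.to_f / best_threaded - 1) * 100
end

advantage.each { |delay, pct| puts format('%3dms: %.2f%%', delay, pct) }
```

The fastest threaded client turns out to be the Ruby 1.8 one at every delay value, so the advantage row is effectively NeverBlock on 1.9 versus threads on 1.8.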

Aside from the NeverBlock advantage, the numbers themselves are very impressive. A single process can achieve ~1000 operations per second even with half a second of processing and network latency. In a multi-process setup we should be able to achieve a lot more than that. For example, forking another NeverBlock client on my dual core notebook, which hosts both the client and the server apps, adds a 50% performance gain.


NeverBlock really shines when the back end is highly scalable. The only problem I met was a Ruby1.9 bug that crashed the client when the file descriptors exceeded 1024. I hope this could be fixed as it will enable us to extract more performance from each process. Expect the socket support to be officially added to NeverBlock soon.

My US Visa Status



So, RubyConf 2008 has started. I was supposed to be presenting NeverBlock to the audience there. I didn't make it, but thank God my coworker and friend Yasser was able to go to Florida. I couldn't make it because I am still waiting for my visa clearance (the Americans changed presidents while I was waiting!). The status kept saying "Under Processing" till a few days before the conference date. It changed to show the following:

Building the Never Blocking Rails, Making Rails 12X Faster



They told you it can't be done, they told you it has no scale. They told you lies!

What if you suddenly had the ability to serve multiple concurrent requests in a single Rails instance? What if you had the ability to multiplex IO operations from a single Rails instance?

No more what ifs. It has been done.

I was testing NeverBlock support for Rails. For testing I built a normal Rails application. Nothing out of the ordinary here, you get the whole usual Rails deal: routes, controllers, ActiveRecord models and eRuby templates. I am using the Thin server for serving the application and PostgreSQL as a database server. The only difference is that I was not using the PostgreSQL adapter; I was using the NeverBlock::PostgreSQL adapter instead.

All I needed to do was name the adapter neverblock_postgresql instead of postgresql in database.yml and require 'never_block/server/thin' in my production.rb.

All this was working with Ruby 1.9, so I had to comment out the body of the load_rubygems method in config/boot.rb which is not needed in Ruby1.9 anyway.

Now what difference does this thing make?

It allows you to process multiple requests concurrently from a single Rails instance. It does this by utilizing the async features of the PG client interface coupled with Fibers and the EventMachine to provide transparent async operations.

So, when a Rails action issues any ActiveRecord operation, it is suspended and another Rails action can kick in. The first one is resumed once PostgreSQL has provided us with the data.
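To picture the suspend and resume cycle, here is a minimal sketch using nothing but plain Ruby 1.9 fibers. It is not NeverBlock's actual code: the action names and the :rows result are made up, and the two loops at the bottom stand in for the event loop.

```ruby
log = []

# Each fiber plays the role of one Rails action waiting on a query.
actions = %w[first second].map do |name|
  Fiber.new do
    log << "#{name}: query sent"
    result = Fiber.yield              # suspend until the "database" answers
    log << "#{name}: got #{result}"
  end
end

# Both actions start and immediately suspend on their query...
actions.each { |action| action.resume }
# ...then the event loop resumes each one when its result arrives.
actions.each { |action| action.resume(:rows) }

puts log
```

The second action gets to send its query before the first one has received any data, which is exactly where the concurrency comes from.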

To make a quick test, I created a controller which would use an AR model to issue the following SQL command: "select sleep(1)". (sleep does not come by default with PostgreSQL, you have to implement it yourself.) I ran the application with the normal postgresql adapter and used Apache Bench to measure the performance of 10 concurrent requests.

Here are the results:
Server Software:        thin
Server Hostname: localhost
Server Port: 3000

Document Path: /forums/sleep/
Document Length: 11 bytes

Concurrency Level: 10
Time taken for tests: 10.248252 seconds
Complete requests: 10
Failed requests: 0
Write errors: 0
Total transferred: 4680 bytes
HTML transferred: 110 bytes
Requests per second: 0.98 [#/sec] (mean)
Time per request: 10248.252 [ms] (mean)
Time per request: 1024.825 [ms] (mean, across all concurrent requests)
Transfer rate: 0.39 [Kbytes/sec] received

Almost 1 request per second. Which is what I expected. Now I switched to the new adapter, restarted thin and redid the test.

Here are the new results:

Server Software: thin
Server Hostname: localhost
Server Port: 3000

Document Path: /forums/sleep/
Document Length: 11 bytes

Concurrency Level: 10
Time taken for tests: 1.075797 seconds
Complete requests: 10
Failed requests: 0
Write errors: 0
Total transferred: 4680 bytes
HTML transferred: 110 bytes
Requests per second: 9.30 [#/sec] (mean)
Time per request: 1075.797 [ms] (mean)
Time per request: 107.580 [ms] (mean, across all concurrent requests)
Transfer rate: 3.72 [Kbytes/sec] received

Wow! A 9x speed improvement! The database requests were able to run concurrently and they all came back together.

I decided to simulate various workloads and test the new implementation against the old one. I devised the workloads taking into account that the test machine had rather bad IO performance, so I used queries that would not tax the IO but would still require PostgreSQL to take its time. The workloads were categorized as follows:

First, every request would issue a "select 1" query, the fastest I can think of. Then, for the different workloads:

1 - Very light work load, every 200 requests, one "select sleep(1)" would be issued
2 - Light work load, every 100 requests, one "select sleep(1)" would be issued
3 - Moderate work load, every 50 requests, one "select sleep(1)" would be issued
4 - Heavy work load, every 20 requests, one "select sleep(1)" would be issued
5 - Very heavy work load, every 10 requests, one "select sleep(1)" would be issued

I tested those workloads against the following

1 - 1 Thin server, normal PostgreSQL adapter
2 - 2 Thin servers (behind nginx), normal PostgreSQL adapter
3 - 4 Thin servers (behind nginx), normal PostgreSQL adapter
4 - 1 Thin server, NeverBlock PostgreSQL adapter

I tested with 1000 queries and a concurrency of 200 (the multiple Thin servers were having problems above that figure; the new adapter scaled up to 1000 with no problems, usually with similar or slightly better results).

Here are the graphed results:

For the NeverBlock Thin server I was using a pool of 12 connections. As you can see from the results, under the very heavy workload it would perform on par with a 12 Thin cluster. Generally, the NeverBlock Thin server easily outperforms the 4 Thin cluster, and the margin increases as the workload gets heavier.

And here are the results for scaling the number of concurrent connections for a NeverBlock::Thin server

Traditionally we used to spawn as many thin servers as we can till we run out of memory. Now we don't need to do so, as a single process will maintain multiple connections and would be able to saturate a single cpu core, hence the perfect setup seems to be a single server instance for each processor core.

But to really saturate a CPU one has to do all the IO requests in a non-blocking manner, not just the database. This is exactly the next step after the DB implementation is stable, to enrich NeverBlock with a set of IO libraries that operate in a seemingly blocking way while they are doing all their IO in a totally transparent non-blocking manner, thanks to Fibers.

I am now wondering about the possibilities, the reduced memory footprint gains and what benefits such a solution can bring to the likes of dreamhost and all the Rails hosting companies.

NeverBlock, MySQL and MySQLPlus



I have great news for MySQL users. A very nice side effect has emerged from the development of the NeverBlock support for MySQL. I am glad to announce the release of a new MySQL driver for Ruby applications. It builds on top of the original Ruby MySQL driver but it comes with two notable additions:
  1. Asynchronous query processing support

  2. Threaded access support

Thanks to help from Roger Pack and Aman Gupta, we were able to put together something that you can use and test right now (on Ruby 1.8 and 1.9).

To install it please do:
sudo gem install espace-mysqlplus

Then you can use it in your code as follows:
require 'mysqlplus'
mysql = Mysql.real_connect(..)
mysql.query("select sleep(1)")

The test folder of the gem contains examples for threaded and evented implementations.

The announcement page in NeverBlock shows benchmark results for running the sleeping queries in normal(blocking), evented and threaded modes. The normal mode is 10X slower, which is normal due to its inability to run queries in parallel.

Now that Rails is becoming so-so-thread-safe this should show tremendous gains with Rails deployments that use MySQL (PostgreSQL already has such facilities).

ActiveRecord meets NeverBlock



I happily announce the release of the first NeverBlock enabled ActiveRecord adapter, the neverblock-postgresql-adapter. This is a beta release, but I have been testing it for a while now with great results.

And while this is a big improvement, it only requires you to change the adapter name in the connection settings from postgresql to neverblock_postgresql, as described in the official NeverBlock blog.

To make a long story short, this enables active record to issue queries in parallel, much like in a multi-threaded application. But this has several advantages over multi-threaded operations:

  1. Fibers are cheaper than threads so this solution is theoretically faster.

  2. NeverBlock does not require full thread safety, just avoid using globals and static variables for transient state.

  3. It integrates nicely in evented programs thus eliminating the performance drop which occurs with the introduction of threads in such environments

I have benchmarked this against the plain postgresql adapter using different workloads categorized as follows

Very Light : A single count statement
Light : A single count and a create
Moderate : 2 counts and a create wrapped in a transaction that rolls back
Heavy : 3 counts, a create and an update wrapped in a transaction that commits
Very Heavy : 3 counts, one conditional count (on a non-indexed field), a create and two updates all wrapped in a transaction that commits

(if you are wondering why these queries in particular, they were extracted from some other code)

All were issued 1000 times

The results came as follows:

As you can see, NeverBlock::AR is consistently faster than vanilla AR. It appears that such workloads produce a linear increase for both AR and NeverBlock::AR, as the NeverBlock advantage stayed almost the same.

Another benchmark was performed to test the effect of increasing the connection count for NeverBlock::AR. We tested with 2, 4, 8, 16 and 32 connections.

The benchmark consisted of first running "select 1" 5000 times and then running "select sleep(10)" "select sleep(1)" 20 times for each configuration.

As you can probably guess, increasing connection count has very little effect if the queries are all very fast (you cannot beat "select 1") but if the queries are all slow, you will be able to double the performance by simply doubling the connection count.

I hope this gives you a glimpse of what's coming next. Watch this space

101 Reasons Why PostgreSQL is a better fit for Rails than MySQL



1 - Indexing Support

MySQL cannot utilize more than one index per query. I believe this is worth repeating: MySQL CANNOT UTILIZE MORE THAN ONE INDEX PER QUERY. Wait till your tables get large enough and this will surely hit you. OTOH PostgreSQL can use multiple indices per query, which comes in really handy.

2 - Full Text Indexing Support

MySQL can do full text indexing on MyISAM tables only; those working with InnoDB tables are out of luck. PostgreSQL has very advanced full text indexing capabilities which enable you to control the tiniest details, down to the stemming strategy.

3 - Asynchronous Interface

MySQL drivers are very unfriendly to the Ruby interpreter. Once a command is issued they take over until they come back with results. PostgreSQL sports a completely asynchronous interface where you can send queries to the database and then tend to other matters while the query is being processed by the server. The good news is that an Async ActiveRecord adapter for MySQL is being developed right now, as part of the rapidly growing NeverBlock library.

4 - Ruby Threading Aware

PostgreSQL drivers let the Ruby thread scheduler run while IO requests are being processed (a nice side effect of the async interface), which makes them much better suited for multithreaded Rails apps.

5 - Multistatements Per Query

Both MySQL and PostgreSQL support sending multiple statements separated by semicolons at once, but the returned result will be that of the last statement in the group. Now, did you know that by using the async interface you can send multiple queries at once and then get back the results one by one? One of the coolest features of the coming ActiveRecord (and Sequel btw) adapter is its support for queuing queries to be consumed by a pool of connections. A trick we are contemplating is to group consecutive selects together, send them in a single request to PostgreSQL and then later extract the results associated with each one of them. This is still very theoretical but should be verified soon.

Now that the 0b101 reasons are told I rest my case.

NeverBlock, much faster IO for Ruby



At eSpace we have just released an alpha version of NeverBlock, a library that aims to bring evented IO to the masses. It does so by wrapping all IO in Fibers, which handle all the async aspects and hide them totally from the developers.

Just as a teaser, here are some benchmarks of running PostgreSQL queries with and without NeverBlock

A 10x performance boost? How about that?

I am working on extending the NeverBlock library now, watch this space for great news soon

Document Matching In Ruby



With the debut of the new version of meOwns we have introduced several features that are concerned with how objects are related to each other. In a user's profile page, you get a list of other users that are similar to him/her, and you are faced with the chemistry meter, which tells you how closely you are related to this user (if you are logged in, of course). If you happen to be viewing your own profile page, you get a list of recommended items that you might be interested in. Last but not least, when you view an item, you get a list of similar items.

In this article I will be talking about the features from an implementation point of view. Naturally the first hurdle was to define the problem. What we needed was to find a way to match items and users. So first we needed to represent them in a way that can be matched. We started from the items first and looked at how to match two items together.

What is an item? In meOwns an item is simply a name, a description, a type and some tags. We decided to ignore photos and comments from the item. Each of those fields gets a weight which affects the value of terms found in it. Here is a sample product (encoded in Yaml):
Name : Fiat Sienna
Type : Car
Description : 1.6 HP, not bad for a sedan, relatively good performance for the price, best sedan i've bought
Tags : Cars Fiat Silver
The above fields are then processed to extract relevant terms from them, this is done in the following manner

  1. Remove punctuation and non alphanumeric characters (replace them with spaces)

  2. Collapse spaces and split the text on them

  3. Match the generated list of terms to a stop word list to remove them (words like "on", "the" should not be considered in the index)

  4. The remaining terms are converted to lower case and then converted to their stem representation (we use a snowball stemmer for now)

The above can be represented as follows:
Attribute : Value
Name : fiat sienna
Type : car
Description : hp sedan performance price sedan buy
Tags : car fiat silver
After doing so, we create a list of terms with their frequencies. Each field has a frequency multiplier according to its significance. Assuming a multiplier value of 1 for all the fields:
Term : Frequency
fiat : 2
car : 2
sedan : 2
sienna : 1
performance : 1
price : 1
silver : 1
buy : 1
hp : 1
This term-frequency vector is the basis of item matching in meOwns. Several different approaches can be used to reach an item representation, and even the details of a given approach can vary significantly. I have chosen to stick to the easiest approach in the initial implementation. Those willing to dig further are free to look further into document representation and indexing strategies.

User representation is just an aggregation of their item (and wished item) representations. This way user and items share the same term vector structure and hence we can match users to items as much as we can match items to items and users to users.

The matching process involves further encoding of the term frequency vector. Those in academia refer to this as the TF-IDF (Term Frequency, Inverse Document Frequency) representation. In layman's terms, this is a representation of how significant a term is to a given document.

It is composed of the product of two parts. The first, TF (Term Frequency), is simply the frequency of the term in the document divided by the total sum of the frequencies of all terms in the same document.

The second (IDF) is the total number of documents in the corpus (the document store) divided by the number of documents which contain this term. If the term is found in all documents then the IDF will be equal to 1 and hence won't have an effect on the final product of the two parts. On the other hand, if we have a corpus of 1,000,000 documents and only one contains the given term, then it will multiply the TF value by 1,000,000, which is significant.

A slight variation (widely used) is to multiply the TF by the logarithm of the IDF. After we are done calculating TF-IDF values for all the terms in the term vectors, matching can be done as follows:
Term Vector A vs. Term Vector B = cos a = (A . B) / (|A|.|B|)
A.B = dot product for the two vectors of TF-IDF values
|A|.|B| = scalar product of the magnitudes of the two vectors
The returned value is called the cosine similarity between the two items. It ranges between 0 and 1, where zero means no correlation and one means an exact match. We are still experimenting with threshold values, but these are essentially the figures you get when you see another user's chemistry meter, for example. In another installment we will discuss how we are implementing behind the scenes matching of users and items in an efficient way.

Just to justify calling the post document matching in Ruby, here is some Ruby code to implement the above:

    # Monkey patch String to be able to extract terms from any string
    # (lower casing and stemming are left out of this listing)
    class String
      def to_terms(boost = 1, terms = {})
        # remove all non-letters and reject stop words
        terms_list = gsub(/(\s|\d|\W)+/u, ' ').strip.split(' ').reject { |term| $stop_words.include?(term) }
        # transform to a hash of term => frequency * boost
        terms_list.each do |term|
          if terms[term]
            terms[term] = terms[term] + boost
          else
            terms[term] = boost
          end
        end
        terms
      end
    end

    # our item class, which we match upon
    class Item
      def to_terms(terms = {})
        # a hash of the attributes to serialize and their boost values
        { :name => 10,
          :description => 3,
          :type_name => 1,
          :tag_names => 1 }.each_pair do |field, boost|
          terms = send(field).to_terms(boost, terms)
        end
        terms
      end
    end

The above methods enable us to extract the terms from different items. The first method refers to a global (bad me) stop word list which should be present.

Now that we have the items represented as term frequencies we can generate their tf-idf vectors (we will keep them as hashes though) and we can use them to do the matching

    class Item
      def to_tf_idf
        # assume we have a method that returns the IDF value for any term
        terms = to_terms
        total_frequency = terms.values.inject(0) { |a, b| a + b }
        sum_of_squares = 0.0
        terms.each do |term, freq|
          # to_f avoids integer division truncating every TF to zero
          terms[term] = (freq.to_f / total_frequency) * idf(term)
          sum_of_squares += terms[term]**2
        end
        # return the vector along with its magnitude
        [terms, Math.sqrt(sum_of_squares)]
      end

      def match(item)
        my_tf_idf, my_magnitude = to_tf_idf
        his_tf_idf, his_magnitude = item.to_tf_idf
        dot_product = 0
        my_tf_idf.each do |term, tf_idf|
          dot_product += tf_idf * his_tf_idf[term] if his_tf_idf[term]
        end
        # cosine similarity
        dot_product / (my_magnitude * his_magnitude)
      end
    end

Pretty easy, now you get a value between 0 and 1 that represents how similar those two items are.
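For those who want something to run right away, here is a self-contained sketch of the logarithmic variation mentioned earlier, working on plain hashes instead of Item objects. The three document corpus is made up for illustration, and the method names (idf, tf_idf, cosine_similarity) are mine, not meOwns code.

```ruby
# Hypothetical corpus: each document is a term => frequency hash.
CORPUS = {
  'fiat_sienna' => { 'fiat' => 2, 'car' => 2, 'sedan' => 2, 'silver' => 1 },
  'fiat_punto'  => { 'fiat' => 2, 'car' => 2, 'hatchback' => 1 },
  'guitar'      => { 'guitar' => 3, 'music' => 1 }
}

# Logarithm of (number of documents / number of documents containing the term).
def idf(term, corpus)
  containing = corpus.values.count { |doc| doc.key?(term) }
  Math.log(corpus.size.to_f / containing)
end

# Turn a term frequency hash into a term => TF-IDF weight hash.
def tf_idf(doc, corpus)
  total = doc.values.sum.to_f
  doc.each_with_object({}) do |(term, freq), vector|
    vector[term] = (freq / total) * idf(term, corpus)
  end
end

# Dot product over the shared terms divided by the product of the magnitudes.
def cosine_similarity(a, b)
  dot = a.sum { |term, weight| weight * b.fetch(term, 0.0) }
  magnitude = ->(vector) { Math.sqrt(vector.values.sum { |w| w**2 }) }
  denominator = magnitude.call(a) * magnitude.call(b)
  denominator.zero? ? 0.0 : dot / denominator
end

vectors = CORPUS.transform_values { |doc| tf_idf(doc, CORPUS) }
puts cosine_similarity(vectors['fiat_sienna'], vectors['fiat_punto'])
puts cosine_similarity(vectors['fiat_sienna'], vectors['guitar'])
```

One thing to keep in mind with the logarithmic form is that a term found in every document gets a weight of zero, so it drops out of the match entirely.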

The case for a nonblocking Ruby stack



In a previous post I talked about the problems that plague web based Ruby applications regarding processor and memory use. I proposed using non-blocking IO as a solution to this problem. In a follow up post I benchmarked nonblocking vs blocking performance using the async facilities in the Ruby Postgres driver in combination with Ruby Fibers. The results were so promising (up to 40% improvement) that I decided to take the benchmarking effort one step further. I monkey patched the Ruby Postgres driver to be fiber aware and was able to integrate it into Sequel with little to no effort. Next I used the unicycle monorail server (the EventMachine HTTP server) in an EventMachine loop. I created a dumb controller which would query the db and render the results using the Object#to_json method.

As was done with the evented db access benchmark, a long query ran every n short queries (n belongs to {5, 10, 20, 50, 100}). The running application accepted 2 urls. One ran db operations in normal mode and the other ran in nonblocking mode (every action invocation was wrapped in a fiber in the latter case)

Here are the benchmark results

Full results

Comparing the number of requests/second fulfilled by each combination of blocking mode and concurrency level. The first had the possible values of [blocking, nonblocking], the second had the possible values of [5, 10, 20, 50, 100].

Advantage Graph

Comparing the advantage gained for nonblocking over blocking mode for different long to short query ratios. Displaying the results for different levels of concurrency

And the full results in tabular form

                         Concurrent Requests
Ratio                    10        100       1000

1 To 100 Nonblocking     456.94    608.67    631.82
1 To 100 Blocking        384.82    524.39    532.26
Advantage                18.74%    16.07%    18.71%

1 To 50 Nonblocking      377.38    460.74    471.89
1 To 50 Blocking         266.63    337.49    339.01
Advantage                41.54%    36.52%    39.20%

1 To 20 Nonblocking      220.44    238.63    266.07
1 To 20 Blocking         142.6     159.7     141.92
Advantage                54.59%    49.42%    87.48%

1 To 10 Nonblocking      130.87    139.76    195.02
1 To 10 Blocking         78.68     84.84     81.07
Advantage                66.33%    64.73%    140.56%

1 To 5 Nonblocking       70.05     75.5      109.34
1 To 5 Blocking          41.48     42.13     41.77
Advantage                68.88%    79.21%    161.77%


In accordance with my expectations, the nonblocking mode outperforms the blocking mode as long as enough long queries come into play. If all the db queries are very small, then the blocking mode will triumph, mainly due to the overhead of fibers. Nevertheless, once there is even a single long query for every 100 short queries, the performance is swayed in favor of the nonblocking mode. There are still a few optimizations to be done, mainly completing the integration with EventMachine, which should theoretically enhance performance. The next step is to integrate this into some framework and build a real application using this approach. Since Sequel is working now, having Ramaze or Merb running in non-blocking mode should be a fairly easy task. Sadly, Rails is out of the picture for now as it does not support Ruby 1.9 yet.

I reckon an application that does all its IO in an evented way will need far fewer processes per CPU core to make full use of it. Actually I am guessing that a single core can be maxed out by a single process. If this is the case then I can live much happier if I can replace the 16 Thin processes running on my server with only 4. Couple that with the 30% memory savings we get from using RubyEE and we are talking about an amazing 82.5% memory footprint reduction without sacrificing performance.
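
For the record, here is the arithmetic behind that 82.5% figure as a small Ruby sketch. It assumes every process uses roughly the same amount of memory, and the variable names are only illustrative:

```ruby
# Back-of-the-envelope check of the memory estimate.
# Assumption: every server process uses roughly the same amount of memory.
processes_before = 16
processes_after  = 4
rubyee_saving    = 0.30   # fraction of per-process memory saved by RubyEE

remaining_fraction = (processes_after.to_f / processes_before) * (1 - rubyee_saving)
reduction_percent  = ((1 - remaining_fraction) * 100).round(1)

puts "memory footprint reduction: #{reduction_percent}%"
```

A quarter of the processes, each using 70% of the memory, leaves 17.5% of the original footprint.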

Ruby Fibers Vs Ruby Threads


Labels: , , , ,

Ruby 1.9 Fibers are touted as lightweight concurrency elements that are much lighter than threads. I noticed a sizable impact when I was benchmarking an application that made heavy use of fibers. So I wondered: what if I switched to threads instead? After some time fighting with threads I decided I needed to write something specific for this comparison. I have written a small application that spawns a number of fibers (or threads) and then reports the time that went into this operation. I also recorded the VM size after the operation (all created fibers and threads are still reachable, hence no garbage collection). I did not measure the cost of context switching for either approach; maybe another time.

Here are the results for creation time:

And the results for memory usage:


Fibers are much faster to create than threads, and they eat much less memory too. There is also a limit on the number of threads in 1.9: I maxed out at 3070 threads, while fibers were not complaining when I created 100,000 of them (but they took 203 seconds and occupied a whopping 500MB of RAM).
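
For anyone who wants to repeat the creation-time comparison, a rough sketch could look like the following. This is not the original test program: the count is kept deliberately small, memory is not measured, and each fiber is resumed once so that it really gets started.

```ruby
require 'benchmark'

count = 200  # kept small so the sketch stays well below any thread limit

fibers = []
fiber_time = Benchmark.realtime do
  count.times do
    fiber = Fiber.new { Fiber.yield }
    fiber.resume      # run up to the yield so the fiber is really started
    fibers << fiber   # keep it reachable, so it cannot be collected
  end
end

threads = []
thread_time = Benchmark.realtime do
  count.times { threads << Thread.new { sleep } }  # each thread parks forever
end

puts format("fibers:  %.4fs", fiber_time)
puts format("threads: %.4fs", thread_time)

threads.each(&:kill).each(&:join)  # clean up the parked threads
```

Raising count shows how the two curves diverge, but numbers will differ a lot between Ruby versions and platforms.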

Final Verdict, No One To Blame!


More than a thousand humans drowned in the sea but apparently no one is to blame for this accident.

Apparently no one cares.

Are lives that cheap? Looks like some people believe so.

Faster IO for Ruby with Postgres


Labels: , , , , ,

Or 40% faster DB Access for your Ruby applications!

In a previous post I talked a bit about event based programming for Ruby. I mentioned the EventMachine/Asymy combo as a means of doing asynchronous database operations, hence freeing up the Ruby runtime to do other things while it is waiting on database I/O operations. Even more, the devs need not worry about using a different programming model: with the help of Ruby Fibers we will continue to program in the same old ways while Fibers do all the twisted work underneath. Very promising indeed, but one big elephant in the room was the immaturity of the current solution. Asymy is still in its infancy and it is based on the super slow pure Ruby MySQL driver, not to mention that it is fairly incomplete as well.

So, what can we do about the elephant in the room? There is an Arabic proverb that basically says "Nothing can beat iron but iron" and this is exactly what we are going to do. Enter Postgres, the database with a realistic, unfriendly elephant mascot. Go away dolphins, a real elephant is in the room now.

Surprisingly Postgres happens to have an excellent asynchronous client API. It allows you to do almost all operations in a non blocking way. More surprisingly the Postgres driver for Ruby covers almost all those asynchronous API calls. The driver was originally written by Matz (yes the man himself) in 1997. It was later updated by ematsu in 1999 and now we have an update fresh from the oven in March 2008 by Jdavis. If you go through the C source code you will find many hidden gems. The methods are fairly well documented and you will discover that the driver has a blocking method that wraps the asynchronous calls inside but it does so in a Ruby threading friendly way. This way a threaded application will not block on the Postgres SQL commands. Good thing but I am more interested in the asynchronous side of the fence.

Let's walk through the API and see how we can use it to do non blocking database access. First you will need to install the gem ("sudo gem install pg"). Then you need to require 'pg' in your code.

One problem though before we start. The current driver has this nasty little bug that prevents you from setting the connection to nonblocking. It is actually a bug in the parameter count defined in the Ruby interface. A simple switch from 0 to 1 fixes this. To save you time and sweat I have provided a replacement gem with the modified sources (till the bug is fixed upstream). Now let's get back to the code.

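Here's how to get things started:
require 'pg'
# I have configured postgres to run in *trusted* mode
# so I don't need to supply a password
conn = PGconn.new({:host=>'localhost',:user=>'postgres',:dbname=>'evented'})
# switch the connection to nonblocking mode (the call affected by the bug above)
conn.setnonblocking(true)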
This way our connection is ready for async operations. Now we need to start sending some sql commands to our connection. To do that we normally use the PGconn#exec method. But this method will block, waiting on postgres. So instead we will use the PGconn#send_query method. This method will return immediately, not waiting for Postgres to actually process the sql command. Here's how we are going to use it.
conn.send_query("select * from users where name like '%am%'")
# the method will return immediately (or raise an exception in case of an error)
But wait, where are the results? Normally we expect the call to return with the data. Now where is my data? The results are being processed right now at the server side. We can continue to do other things till they come. But how do we know when they arrive? It turns out that this is easy as well. The PGconn instance provides a method that returns the connection's socket descriptor. PGconn#socket that is. We retrieve that socket descriptor and wrap it in a Ruby IO object by calling
io = IO.new(conn.socket)
Now we have a nice IO object whose activity we can be notified of in a select call. For the uninitiated, event based programming is done by having a tight loop that runs forever. Within this loop we check if IO events happen and if so we respond to them. One efficient way of doing so is using the Ruby Kernel#select method (which is a wrapper around the UNIX select). The select method works this way: you provide it with three lists, one for sockets that you need to read from, one for sockets that you need to write to, and a third for sockets whose errors you are interested in. The call returns an array holding the sockets that are ready (grouped into the same three lists), or nil if none is ready before the timeout.
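
To see both return values of select without a database at hand, here is a small self-contained sketch. It uses a pipe in place of the Postgres socket, which is purely a stand-in chosen for this illustration:

```ruby
reader, writer = IO.pipe

# Nothing has been written yet, so select times out and returns nil.
idle = IO.select([reader], nil, nil, 0.01)

writer.write("ping")
writer.flush

# Now the read end is ready: select returns three arrays
# (readable, writable, errored) and reader is in the first one.
ready    = IO.select([reader], nil, nil, 0.01)
readable = ready.first
message  = readable.first.read_nonblock(16)

puts "idle: #{idle.inspect}, message: #{message}"
```

The same pattern applies to the io object wrapping the connection's socket: nil means nothing arrived within the timeout, an array means there is something to read.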

We will use select as follows:
# the method that will be called if input is ready
def process_command(conn)
  # we will detail the implementation soon
end

loop do
  # we supply a list of sockets we need to read from.
  # Only our io object in this case. we nullify
  # the other lists and we set a timeout
  res = select([io],nil,nil, 0.001)
  # of course this needs to be done in a cleaner way
  process_command(conn) unless res.nil?
end
This way whenever there is info to read from the socket we will not get a nil (we will get an array actually) so we can call process_command. When process_command gets called it knows that there is data in the connection to be read so it calls the PGconn#consume_input method. After which it checks to see if the conn is busy or not. If it is still busy, it does nothing (it will do so in a later event). On the other hand, if the connection is not busy then we start calling the PGconn#get_result method and append what we get to the result we got so far. We keep doing that till we get a nil result which indicates the end of the command and the readiness of the connection to accept further commands. Here is what the method will look like:
def process_command(conn)
  conn.consume_input # read whatever has arrived on the socket
  unless conn.is_busy
    res, data = 0, []
    while res != nil
      res = conn.get_result
      res.each {|d| data.push d} unless res.nil?
    end
    # we are done, we need to put this data somewhere
  end
end
Several things are to be noted. First, one cannot process several commands using the same connection at once. You need several connections to achieve parallel command processing. Second, the model described above works in the twisted way; to get things working the normal way you can use Ruby Fibers (or continuations, but they apparently leak memory).

I have put together a couple of Ruby classes that implement a nonblocking connection pool and a fiber pool. You can find them here. Using those you can write code that looks like this:
require 'fiber_pool'
require 'fibered_connection_pool'

options = {:host=>'localhost',:user=>'postgres',:dbname=>'evented'}

cpool = FiberedConnectionPool.new(options, 12)
# second param is the number of connections to spawn, defaults at 8
# note that one more connection than those will be spawned. This one
# will be used for processing blocking requests.

fpool = FiberPool.new
# the number of fibers to spawn, defaults at 50

100.times do
  fpool.spawn do
    cpool.exec(some_sql_command, true) #true means async
    cpool.exec(some_other_sql_command, true)
    cpool.exec(yet_another_sql_command, true)
  end
end

# our event loop
loop do
  res = select(cpool.sockets,nil,nil,0) #check for something to read
  # IO is monkey patched to be able to hold a reference to the connection
  res.first.each{ |s|s.connection.process_command } if res
end
This works as follows, once a fiber calls cpool.exec the query is sent to the pool for processing and the fiber is halted, giving way for another one to start processing. The other one will halt as well once it hits a cpool.exec. Later during the event loop you will get notifications of completion of queries (in any order) and resume the fiber associated with the finished query. Note that commands issued in the same fiber will run sequentially while those issued from different fibers will interleave. This is effectively what is achieved by threading but without its costs.
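
The flow described above can be tried without Postgres. The following is only a simulation under loud assumptions: pipes fed by background threads stand in for database connections, and the names pending and async_query are invented for this sketch rather than taken from the pool classes.

```ruby
require 'fiber'

pending = {}  # read end of a pipe => fiber waiting for data on it

# Hypothetical stand-in for an async query: a background thread plays the
# database server and writes the answer to a pipe after `delay` seconds.
async_query = lambda do |answer, delay|
  reader, writer = IO.pipe
  Thread.new { sleep delay; writer.write(answer); writer.close }
  pending[reader] = Fiber.current
  Fiber.yield  # pause this fiber; the event loop resumes it with the data
end

results = []
workers = [
  Fiber.new do
    results << async_query.call("long", 0.05)
    results << async_query.call("long-followup", 0)
  end,
  Fiber.new { results << async_query.call("short", 0) }
]
workers.each(&:resume)  # each fiber runs until its first query, then pauses

# our event loop
until pending.empty?
  ready = IO.select(pending.keys, nil, nil, 0.01)
  next unless ready
  ready.first.each do |io|
    fiber = pending.delete(io)
    data = io.read  # the writer closes its end, so this reads up to EOF
    io.close
    fiber.resume(data)
  end
end

puts results.inspect
```

Queries issued from the same fiber complete in order ("long" always comes before "long-followup"), while the "short" query from the other fiber is free to finish in between.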


I am sure that my code could use some tweaking but I am getting very good results already. During benchmarking I found out that the cost of instantiating fibers could be high (the cost of pausing and resuming is high as well, but unavoidable). So I created a pool of fibers that can be reused (a very naive implementation that could use lots of improvement).

I tested by issuing a group of long and short queries together. You actually provide the test program with the number of long queries and the multiplier it should use for short queries. i.e. ruby test.rb 10 20 will iterate 10 times and issue a long query then within the same iteration it will issue 20 short queries. It will do this in a blocking and then nonblocking way, reporting the time taken for each to complete and the percentage of performance increase/decrease.

I tested for 10, 50 and 100 long queries with the following multipliers (1, 2, 5, 10, 50, 100). The graph shows the performance gain for each number of queries vs the multiplier. For example 50 long queries with a multiplier of 10 (i.e. 500 short queries) achieves a 39.6% reduction in query execution time. I have repeated many of the tests several times (not all of them, too lazy to do that). The repeated tests showed consistent results so I am pretty confident of the presented results.

Here is the full list:

          Queries           Mode
Ratio     Long    Short     Blocking   Non Blocking   Advantage

1/2       10      20        0.56       0.5            10.27%
          50      100       2.55       2.26           11.19%
          100     200       5.15       4.46           13.53%

1/5       10      50        0.55       0.4            27.04%
          50      250       2.72       1.83           32.82%
          100     500       5.45       3.63           33.39%

1/10      10      100       0.6        0.4            33.76%
          50      500       3.01       1.82           39.67%
          100     1000      5.9        3.65           38.13%

1/20      10      200       0.72       0.45           38.12%
          50      1000      3.43       2.1            38.73%
          100     2000      6.83       4.33           36.53%

1/50      10      500       0.98       0.62           36.57%
          50      2500      4.78       3.23           32.36%
          100     5000      9.74       8.68           10.93%

1/100     10      1000      1.46       0.94           35.40%
          50      5000      7.42       5.17           30.31%
          100     10000     14.27      12.68          11.15%
The area I would like to focus on for performance tuning is the size of the fiber pool. The test is a bit sensitive to it so I believe I can gain a bit more performance with insane query counts if I optimize my fiber pool a bit. Setting the initial size too high certainly helps, but eats too much memory to make it usable.

A final note. I am playing with using this alongside an EventMachine based http server. It works OK but is a cpu hog, probably due to using select in next_tick calls within EM's event loop. I would love to be able to provide EM with a list of IO objects and a callback instead of requiring me to use it to open the connection. Nevertheless, even though in many cases the nonblocking db implementation is slower than a blocking one in the http serving arena, I managed to get ~800 req/s vs ~500 req/s for a very typical use case: a request that runs a long query followed by many short ones. Impressive to say the least. I might even try to hack EM to support the feature I need and then see what performance this could yield.


Apparently one can get more performance for the blocking requests if the fiber pool is initiated AFTER the blocking calls. Possibly due to the VM being impacted by the memory increase. Rerunning some of the tests showed fractional improvements for the blocking case. On the other hand, I tried some of the tests while another process was doing heavy I/O (RDoc generation). The performance gain jumped to an amazing 76% in one of the tests (it was generally between 51% and 76%).

Untwisting the Event Loop


Labels: , , , , , ,

Have you ever wondered why your Rails application is so memory hungry while it is not really trying to fully utilize your CPUs? To saturate your CPUs you have to have a large number of Thin (or Mongrel or whatever) instances. Why is that? We all know that the Ruby interpreter is not able to utilize more than one CPU (or no more than one CPU at a time in the case of 1.9). But why can't Ruby (or may be it's Rails?) utilize the processors efficiently? Let's look for an answer to this question.

First off, what happens in a typical Rails action? The Rails framework will be doing some request mapping and routing, which is mostly CPU (if we consider memory latency negligible). Then a few requests will be sent to the database to retrieve some data, after which comes a rendering process, which is mostly CPU as well.
def show
  @user = User.find(params[:id]) #db access
  @events = Events.find(:all) #another db access
  render :action => :show #rendering
end

The problem here comes with the database part of the action. Calls to the database will block processing till results get back from the DBMS. During that time, Rails will be frozen and not trying to do anything else till the call ends. Good news is that threads can help here (even Ruby's green threads). A blocked thread will give way to other threads till it is back in the ready state, thus filling those slots with some useful processing. Sounds good enough? NO!

Sadly Rails is NOT thread safe. You cannot use threads to do parallel processing in Rails. So why not something like Merb? I hear you say. Well, Merb and threads will be able to interleave CPU operations and help with the time spent on IO in something like fetching data from some other service. But it won't save you when you do database IO, because of the simple fact that calling C extensions blocks the whole Ruby interpreter. Yes, you read it correctly the first time. Nothing can be scheduled while a native call is being issued. Since database drivers are mostly C extensions they suffer from this. Your nice SELECT statement keeps the whole Ruby interpreter on hold till it is finished.

But there must be a solution to this. We cannot be all left high and dry with interpreters eating our memory and not really using our CPUs.

Enter EventMachine and AsyMy

For those who are not in the loop of events (pun intended) there happens to be another approach to this problem: event based (read asynchronous) IO. In this mode of operation you request an IO operation and tell the event loop what to do when the request is fulfilled (either fully or partially). An excellent library for event handling exists for Ruby, which is Francis' EventMachine (used internally by the Thin server and the evented flavour of Mongrel). But still, using EventMachine does not magically solve all our problems. The question that keeps popping up is what to do with database access. AsyMy to the rescue! AsyMy, written by Thomas Ptacek, is an evented driver for MySQL that operates in an asynchronous fashion. A quick example will look like:
connection.execute('SELECT * from events') do |headers,data|
  # do something with headers and data
  pp headers
  pp data
end
Asymy is still in a very early stage, the performance is horrible (as it is based on the darn slow pure Ruby MySQL driver) and it comes with many rough corners (I was not able to run INSERTs and UPDATEs without hacking it, and I am still not able to run the callbacks for those). Nevertheless, this is a formidable achievement on the road to a very fast single threaded implementation.

Here's how our action would look like if there was an Asymy adapter for ActiveRecord
#this is probably wrong but it can illustrate
#the twisted nature of evented programming
def show
User.find(params[:id]) do |result_set|
@user = result_set
Events.find(:all) do |result_set|
@events = result_set
@events.each do |event|
event.owner = @user
if event != @events.last
else do |ev|
render :action => :show
We had to twist the function flow to be able to make use of the evented nature of the new driver. Instead of passing normally, the flow is scattered across the different callbacks. This is one of the areas where event based programming makes you change the way you think about program flow. A hurdle for many developers and a show stopper for some. No wonder the event library for Python is called Twisted.

Why not untangle this with Fibers?

Fibers are lightweight concurrency primitives introduced in Ruby 1.9. How lightweight? Well, they don't come at zero cost, but in long running requests the weight they add can be negligible. Fibers provide some form of cooperative (rather than preemptive) concurrency inside a single thread (you cannot pass fibers between threads, you have been warned). Fibers enjoy the ability to pause and resume like continuations, but they don't suffer from the memory leaks that continuations have. When we use this feature wisely we can unwind the action code above to look like this:
def show
  @user = User.find(params[:id])
  @events = Events.find(:all).each do |event|
    event.owner = @user
  end
end
Huh? this is the normal action code we are used to. Well, using fibers we can do this and still do things under the hood in an evented way.

To make things clear we need to illustrate Fibers with an example:
require 'fiber'

fiber = Fiber.new do
  #do something
  Fiber.yield another_thing
  #do yet another thing
end

yielded = fiber.resume # => runs the fiber till the yield,
                       # returns the yielded value
                       # and pauses the fiber where it is
fiber.resume #=> re-runs the fiber from the point it was paused.
fiber.resume #=> no more statements to run, raises an exception
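
For readers who want to run this, here is the same sequence with concrete values. It is only a sketch: the symbols :first_stop and :finished are placeholders picked for the illustration.

```ruby
fiber = Fiber.new do
  Fiber.yield :first_stop  # pause and hand :first_stop to the caller
  :finished                # value of the block, returned by the last resume
end

first  = fiber.resume      # runs until the yield
second = fiber.resume      # continues after the yield until the block ends

error = begin
  fiber.resume             # the fiber is finished by now
rescue FiberError => e
  e
end

puts [first, second, error.class].inspect
```

The third resume is rescued here only to show which exception class a finished fiber raises.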
Let's see how this can be useful for dispatching controller actions (this code will preferably be in the server itself):
Fiber.new do
  send_response res
end.resume
Inside the action we call the find method repeatedly. This method could be implemented like this:
class DataStore
  def find(*args)
    query = construct_query(*args)
    fiber = Fiber.current # grab the current fiber
    conn.execute(query) do |headers, data|
      fiber.resume convert_to_objects(data)
    end
    Fiber.yield # pause here; returns whatever the callback passes to resume
  end
end
This way whenever the code passes a find method it will pass the query to the db driver, return immediately and pause, giving room for other requests to be processed. Once the data comes from the db server the callback is run and it resumes the fiber (passing to it the result of the query). The result gets passed back to the caller of the function and the original action method continues till completion (or till it is paused again by another find method).
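
Here is a runnable toy version of that round trip. Everything database related is faked: the callbacks array stands in for the driver's event queue, and the names execute, find and action are assumptions made for this sketch, not AsyMy or ActiveRecord API.

```ruby
require 'fiber'

callbacks = []  # queued [query, callback] pairs; stands in for the event queue

# Hypothetical evented driver call: it only registers the callback and returns.
execute = lambda do |query, &callback|
  callbacks << [query, callback]
end

find = lambda do |query|
  fiber = Fiber.current  # grab the current fiber
  execute.call(query) { |data| fiber.resume(data) }
  Fiber.yield            # pause until the callback resumes us with the data
end

log = []
action = Fiber.new do
  log << find.call("users")
  log << find.call("events")
  :rendered
end

action.resume                  # runs the action until the first find pauses it
log << "other request served"  # the main flow is free while the query is pending

# The "event loop": deliver results by running the queued callbacks.
until callbacks.empty?
  query, callback = callbacks.shift
  callback.call("rows for #{query}")
end

puts log.inspect
```

The action body reads top to bottom like blocking code, yet other work gets done before the first result arrives.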

Roger Pack has a nice writeup (with actual working code) on the Evented Fibered combo here.

Charles Jolley implemented a similar thing here. It is called Pipelined and while it is more obtrusive than the approach described above, it still has the advantage of being optional. Pipelined uses continuations and hence is available to Ruby 1.8 (and Rails).

I am still ironing out and tying things together (and doing lots of benchmarks) and I would like to tell you that I have ditched AsyMy for now for another alternative which I will attempt to discuss in detail in another blog post.

Meet Hamza


Labels: , ,

وَلَقَدْ خَلَقْنَا الْإِنسَانَ مِن سُلَالَةٍ مِّن طِينٍ
(12) ثُمَّ جَعَلْنَاهُ نُطْفَةً فِي قَرَارٍ مَّكِينٍ (13) ثُمَّ خَلَقْنَا النُّطْفَةَ عَلَقَةً فَخَلَقْنَا الْعَلَقَةَ مُضْغَةً فَخَلَقْنَا الْمُضْغَةَ عِظَاماً فَكَسَوْنَا الْعِظَامَ لَحْماً ثُمَّ أَنشَأْنَاهُ خَلْقاً آخَرَ فَتَبَارَكَ اللَّهُ أَحْسَنُ الْخَالِقِينَ (14) المؤمنون

Verily We created man from a product of wet earth (12); Then placed him as a drop (of seed) in a safe lodging (13); Then fashioned We the drop a clot, then fashioned We the clot a little lump, then fashioned We the little lump bones, then clothed the bones with flesh, and then produced it as another creation. So blessed be Allah, the Best of creators!(14) The Believers, Holy Quran


Meet Hamza, my firstborn son. God's gift to me. I am so full of emotions that I am not able to express them adequately. It has been one very strange night when we left for the hospital. We wanted to have a natural childbirth but he refused to turn upside down. We decided to go for an operation and we had the luxury of presetting the date of birth. A bit less hassle than the shock that accompanies a surprising labor pain but still manages to make you very anxious.

We drove to the hospital, the doctor was there and my wife suddenly realized that there is no turning back and that she is going to actually be operated upon. She didn't panic or anything but she got her self one of the palest faces I ever saw on her. The doctor came to the room and recommended epidural anesthesia, which meant she will be awake during the operation. I gulped, as I was not sure if she will feel pain that way or not. Then twice gulped when the doctor asked me to change into an operation suit. "I will need your help there" he said.

I am not sure how did I change or how did the time pass when I was waiting in the doctor's area. Watching them washing their hands over and over (much like in the Panadol commercial). I sat there, silent, contemplating the moment. Thinking of what I might say to support her and if I am really capable of seeing her being cut. It was bizarre, ideas seemed to flash very fast in my mind and strangely it seemed totally empty at the same time. Something similar to that feeling when you try to memorize everything in your head minutes before an exam.

That state was preempted when they called for me. I went inside. My first time in an operation room. They were about to start. I sat on a chair so that my head was a bit higher than hers. I leaned forward to be level with her and started whispering in her ear. She was praying, she was repeating prayers and she was trying to remember the names of every person she knew. To pray for him/her. At that time I glanced over her. Just a glimpse as I ducked immediately after I saw that terrifying blood stream.

I tried as hard as I could to hide the effect of what I saw. I kept soothing her and helping her with her prayers. At that time we were both repeating: "My God, Nothing goes smooth but what you make smooth, and You are who makes hardship smooth" and "All power and might belong to Allah, the most High, the Great". Again I took another glimpse, and my God! I saw a little foot in the doctor's hand!

It was pale, blueish white. The skin was wrinkled as wrinkled can be. The doctor was holding on to it and pushing the rest of the baby out. My eyes were fixed on the scene in front of me. I was in complete awe. The miracle of birth. Exalted be Allah. I kept saying that between my words when I told my wife that I was seeing the baby. She was out of breath, firing questions as rapidly as she could manage. "Is he OK?", "Is he out already?". I tried to keep pace with her then resorted to "he's coming out, he's coming out".

It was a few minutes till the doctor managed to get him completely out. I saw him, very very small. Very vulnerable. Very weak. I couldn't help the tears forming in my eyes at the sight of him. He was crying as the nurses wrapped him and took him away for his first shower. My wife was tired beyond belief. But she asked me to run after him to take his first photos.

I ran for it and saw that they put him under some sort of a heater to keep him warm. I leaned over him, kissed him and recited the call to prayers in both his ears. They took him and showered him. And I watched the expression of disbelief on his face. It was funny to see such a thing. I wouldn't expect a different expression from an adult if he was locked up for nine months and then taken out like what happened to him.

I showered him myself with photos. Very funny ones. I hope the collection will grow. As I watch him grow.

I look at him now and see a much calmer expression on his face. Except when he is hungry of course. Getting him to that level means an inescapable scandal as he raises his voice loud enough to make sure that everyone in the neighborhood knows that we starve him.

Parenting is a new experience for me. I have always loved experiencing new things, especially if I had a passion for them. In this case, I have more than a passion. I have love that I can't describe. I keep remembering how vulnerable I saw him that day. It affected me in a way that I have yet to understand.

How much horse power can your app engine deliver?


Labels: , , , , ,

I was wondering if the google app engine thing has the horse power to actually deliver the scalability experience that they are promising. I would expect that the app engines would be able to handle high loads and fulfill requests at decent rates.

I decided to benchmark the app engines and see if they can live up to the promises that Google is making.

For the benchmark I used a simple web page that visits the google data store, retrieves some data from it (small data set, so that the test wouldn't be bandwidth limited) then converts it to JSON (not using any library code for that, ugly hand written string manipulation code).

The web page (actually the currently super useless supergtd app) was written in accordance to the sample provided in the google app engine documentation. My Python skills are so immature but I can tell that everything is pretty straightforward. Route is determined, correct python file loaded and my get method is run. The method internally calls the data store and retrieves data then does some string processing and sends the result back.

I tested from a server at softlayer (this one has the most bandwidth and the least latency of all the servers that I have access to, and it's a dual quad core monster for the record).

Here are the results from the benchmark (click for a bigger version):

Frankly, I am very satisfied by the results. The app engine was capable of handling any load I threw at it. Even though the request I am testing is kinda simplistic it manages to pass through enough different parts of the stack to make it relevant. Many APIs will have resources that are only slightly more complex than the sample resource I created for this benchmark.

Bottom line, top notch performance. And you don't even need to bother with your Elastic Computing Cloud configuration. You don't manage instances or anything (or even pay by them) you only pay per usage which is a much sweeter deal if you ask me (if your app does not require that you cross the sandbox boundaries that is). Oh and before I forget, serving static content was pretty fast too. Not breathtakingly fast but very decent to say the least (you should never skip proper client caching though, at least to save the bandwidth)

If any one at Google is reading this I say "Great job, big hand for the big G". Now show me some Ruby love (and JavaScript too for what it's worth).

Did I mention that I need some Ruby love? Please.

Computing In The Cloud, Google App Engines


Labels: , ,

One day we will plug our applications into the computing cloud as much as we plug our devices into the electricity network. Surprisingly, this day is today!

The trend towards offering computational infrastructure (storage, processing, bandwidth, etc) as a service is gaining momentum every day. Services like Amazon's suite (S3, SimpleDB, EC2, SQS) and now Google AppEngines are manifestations of this trend. Infrastructure is becoming a commodity and large scale operations already invented their wheels, relieving the small to mid size player from having to invent wheels of their own. Stand on the shoulders of the giants and don't bother with performance, redundancy or data center setup overhead.

I have had a look at Google App Engines. In a nutshell, App Engines is a web application environment that is capable of running a subset of Python(self). Even though there is no write support to the disk, the available features cover more than 90% of web development needs. You are able to process HTTP requests, generate responses, configure your routes, persist data and query it, and connect to other web services. One of the most cunning features is the ability to authenticate google accounts, plus one to whoever lobbied for this feature. Suddenly your application will be available to millions of users with no registration overhead (defining your user count gets tricky that way though).

One of the good things about Google App Engines is the choice of the programming language. Python(self) has dynamic and functional features that may be able to one day open the eyes of those who think that annotations are the next-big-thing!. I have been coding exclusively in Ruby and JavaScript (and of course GammaScript) for a while now. I have only experimented briefly with Python(self) earlier, so this was a good chance for a refresher. And Python(self) was fun except for some syntax quirks that I am not used to yet. I hope google extends their language support to cover Ruby and JavaScript (I stopped caring about Java as a web development tool long ago but others may be interested in including it as well). But kudos to Google for starting with Python(self).

I was over joyed with the ease of deployment. I faced a minor bug in the supplied local webserver but had an easy work around. I am still working off Google's web framework and didn't move to something like Django or Pylons yet.

  • Platform comes with most of your development needs
  • Don't concern yourself with scaling, Google handles that for you, and is expected to do a very good job at it
  • Good choice for a starting language, Python(self) is a powerful dynamic language
  • Very good integration with Google accounts (brilliant!)
  • Ability to use full blown frameworks like Django and Pylons

  • Google lock in for your applications. Imagine that Google gets complaints against your application from governments (Chinese may be), will they shut it?
  • Sandboxes tend to constrain you. (that might be a pro rather than a con, it is your ticket for scaling)
  • For some reason Guido van Rossum thinks I should use "self" as the first argument in every method definition. I have read several explanations but I have yet to find something that convinces me.
  • I have been bitten with the white spaces thing a couple times. I attribute it to my acquired style though. Should be handled with time.

All in all, a very good move from Google, to them and everyone helping making cloud computing a reality. Thank you for bringing us the future!.

I am tempted to test Heroku now. I have been claiming that I was too busy before. This is no longer an excuse.

Preview of coming attractions, weNear


Labels: , , , ,

coming soon, to a handset near you.

website, preview

enjoy Site Tour


Labels: , , ,

Here is a link to the site tour. We are getting ready for an update in the coming weeks. Stay tuned.

An introduction to Ruby


Labels: ,

I just gave a presentation on Ruby. Uploaded to scribd as usual (here)

Read this doc on Scribd: Ruby-Programmers-Best-Friend

Introducing GammaScript



I have been experimenting with an implementation of the GAMMA formalism in JavaScript. It is a VERY naive implementation that uses setTimeout calls to mimic parallelism. Still, it allows experimentation with GAMMA and the chemical programming paradigm.

The current implementation can be found here. It provides a graphical representation of the multiset status (via the canvas element). A sample application is provided: one that calculates the value of PI in a parallel way.

For the uninitiated, GAMMA provides a chemical like reaction model. For example, imagine that you have the following set (multiset) of numbers:
S -> 1,2,1,5,4,3,7,7,3,5,9,8,3,2,2,1,5

A GAMMA program to compute the max of this set would look something like this:
C: x >= y
R: x,y -> x

or the more concise:
x,y -> x for every x >= y

in GammaScript this can be written as:
var max = {
  condition : function(x,y){
    return x >= y;
  },
  reaction : function(x,y){
    return x.consume(y);
  }
};

As you can see, none of these dictates how the set should be traversed. It is totally left to the implementation. The current Javascript implementation provides pseudo (fake) parallelism.
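To make the "condition plus reaction" idea concrete, here is a minimal sequential sketch of how such a program could be driven to a stable state. It is not GammaScript's actual engine or API: `runGamma` and `maxProgram` are hypothetical names, the multiset is a plain array, and the reaction returns its products as an array instead of calling `consume`.

```javascript
// Hypothetical helper, not part of GammaScript: applies one reaction rule
// to a multiset (a plain array) until no pair satisfies the condition.
function runGamma(multiset, program) {
  var items = multiset.slice(); // work on a copy
  var reacted = true;
  while (reacted) {
    reacted = false;
    // Scan ordered pairs (i, j); the scan order is an implementation choice.
    for (var i = 0; i < items.length && !reacted; i++) {
      for (var j = 0; j < items.length && !reacted; j++) {
        if (i !== j && program.condition(items[i], items[j])) {
          var products = program.reaction(items[i], items[j]);
          // Remove the two reactants, then add the products.
          items = items.filter(function (_, k) { return k !== i && k !== j; });
          items = items.concat(products);
          reacted = true;
        }
      }
    }
  }
  return items;
}

var maxProgram = {
  condition: function (x, y) { return x >= y; },
  reaction: function (x, y) { return [x]; } // x survives, y is consumed
};

var result = runGamma([1,2,1,5,4,3,7,7,3,5,9,8,3,2,2,1,5], maxProgram);
console.log(result); // the single surviving element is 9
```

Any other scan order (random pairs, or several pairs at once) would reach the same final multiset for this program, which is exactly the freedom the formalism leaves to the implementation.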

I am considering a new implementation that uses Google Gears for true parallelism. I also need to separate the view code from the core of the implementation.


To get the thing running you need to add data (comma separated) in the left box and click the (add) link. Then you should click the start button.

For the PI calculation programs that are supplied, you need to add a single element (the tuple [0,1]) and click add then start.

meOwns, a new way to express yourself



Have you noticed the widget to the left of this blog? This is a list of the items I have in meOwns, the new way to express yourself via your belongings. This is a Ruby on Rails application that is still in early beta. Registration is invitation only currently. If you would like an invitation then please send me an email with "meowns invitation required" in the subject to oldmoe at (g)mail dot com

Another JavaScript session



I have uploaded my second session on JavaScript to scribd as well. Please find it here

My Latest Javascript Session



I have uploaded the slides from my latest eSpace open session (this one about JavaScript internals) to Scribd here. Please forgive the bad formatting. I hope this is useful (many thanks to Crockford's writings; I would have been lost without them).

Sun is bringing Java to the iPhone



Here you can watch a video by a Sun engineer talking about the plans to port the JVM to the iPhone. I believe this could increase the number of applications available for the iPhone tenfold overnight. The challenge would be to make full use of iPhone features like the multi-touch interface and the accelerometer.

The iPhone SDK, Objective C and Ruby



I bet many of you have already seen the iPhone SDK presentation. I believe that many will be tempted by the platform. There are barriers to entry, though: you have to do all development on a Mac, you have to have an iPhone (emulators are not for production testing), you have to be a partner in the Apple developer program and you have to write code in Objective-C.

Objective what? Objective-C is a language that sits on top of C. It has some dynamic features and the syntax will instantly remind you of Smalltalk. Actually, I shudder at the idea of writing code that looks like Smalltalk with C++-like constructs. Sounds like sweet and sour Chinese food to me (it reminds me of the ugly "new" operator that is out of place in JavaScript). But Apple has done a great job with Cocoa (the Mac OS X interface toolkit) and Cocoa Touch (the one designed for touch interfaces: iPhone, iPod Touch and soon iTablet). The API is very elegant and clean (I still yearn for the BeOS API though, and always will).

OK, what does this have to do with Ruby? Well, a very interesting project popped up on the Ruby core list recently. Apparently, Apple is integrating the whole of Ruby 1.9 into the Objective-C runtime and is calling the package MacRuby. Ruby code will have access to all Cocoa interfaces and vice versa. The project is open source but is being spearheaded by Apple. All those actively contributing right now are Apple engineers. They are trying to expand the interfaces to their APIs and thus cater to more developers.

Will we ever see an iPhone shipping with MacRuby? Will we be able to write Cocoa Touch interfaces in Ruby? Imagine that, I will no longer be ashamed that I don't know Shoes!

Generators & Generator Expressions in Javascript 1.8



JavaScript has been going through a step-by-step evolution for a while now. Many people are unaware that new JavaScript features are being added in almost every major Firefox release. And while most references are still quoting the 1.5 release, 1.6, 1.7 and even 1.8 are out now and can be used today.

One of those new features (an exciting one) is the introduction of generators. In layman's terms generators are: pause and resume for your methods. How is that? A generator is simply a normal method, but one that has the ability to yield back control to the caller while maintaining its state for future runs. This is not a very accurate description as it will not yield back to the method caller but actually to those who call next() on it. Confused already? let's use an example to make things clear.
//Fibonacci example, stolen right from the Mozilla docs
function fib() {
  var i = 0, j = 1;
  while (true) {
    yield i;
    var t = i;
    i = j;
    j += t;
  }
}

var g = fib();
for (var i = 0; i < 10; i++) {
  document.write(g.next() + " ");
}

which results in:
0 1 1 2 3 5 8 13 21 34

Before you get lost in the above code, here is a quick description of what happens:

  1. JavaScript knows the above fib function is a generator (because it contains the keyword yield)

  2. When you call a generator function, any parameters you send in the call are bound

  3. Rather than executing the method body it returns a generator iterator, one which you can call some of the iterator methods on (like next, send and close)

  4. The loop outside the function runs and g.next() gets called

  5. Whenever next() is called, the fib function body gets executed until it reaches the yield keyword; at this point it returns control back to the caller of next() while its state remains intact.

  6. The result of the expression following the yield is returned to the caller of next (this is what is being generated by the generator)

  7. Subsequent calls to next() will cause the function to continue right after the yield keyword and yields control back again when it re-encounters it.

You can think of generators as interruptible transformations. They are usually used to generate a transformation of some iterable data while giving the callers control of when (or if) they move forward with this generation.
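The syntax in this post is the JavaScript 1.7/1.8 form that shipped in Firefox. For comparison, here is a sketch of the same Fibonacci generator in the form later standardized in ECMAScript 2015. The differences: the function is declared with `function*`, and `next()` returns an object with `value` and `done` fields instead of the bare value. The variable names `firstTen` and `n` are only for this illustration.

```javascript
// Same Fibonacci generator in ES2015 syntax: note the * after "function".
function* fib() {
  var i = 0, j = 1;
  while (true) {
    yield i;
    var t = i;
    i = j;
    j += t;
  }
}

var g = fib();
var firstTen = [];
for (var n = 0; n < 10; n++) {
  // next() returns { value, done } rather than the bare value.
  firstTen.push(g.next().value);
}
console.log(firstTen.join(" ")); // 0 1 1 2 3 5 8 13 21 34
```

The pause-and-resume behaviour is identical: each `next()` call runs the body up to the following `yield` and then hands control back with the state intact.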

Building on this, a new feature was introduced to make your life even easier: generator expressions. Instead of having to write a generator function, it is possible to describe your transformation as a shorthand in-place expression.

Consider the following generator function (also stolen from Mozilla but modified this time):
function square(obj) {
  for each ( var i in obj )
    yield i*i;
}

var someNumbers = {a:1,b:2,c:3,d:4,e:5,f:6};

var iterator = square(someNumbers);
try {
  while (true) {
    document.write(iterator.next() + " ");
  }
} catch (error if error instanceof StopIteration) {
  //we are done
}

this results in:
1 4 9 16 25 36

This square function iterates over the hash values (using the for each statement), generates the square of the current value and yields it back to the caller.

In this case the generator function is merely doing a very simple transformation. Thus we can easily replace it by a generator expression.

Like this example (for the third time, stolen and modified from Mozilla):
var someNumbers = {a:1,b:2,c:3,d:4,e:5,f:6};

var iterator = (i * i for each (i in someNumbers));
try {
  while (true) {
    document.write(iterator.next() + " ");
  }
} catch (error if error instanceof StopIteration) {
  //we are done
}

This line:
var iterator = (i * i for each (i in someNumbers));
is what we call a generator expression. It behaves exactly like the above generator function. It returns an iterator (the assignment) whose next method, when called, performs a transformation (the expression i * i) in some sort of a loop (the for each statement) and returns control back to the caller after each iteration (implicitly yielding the expression result).

And there is more to generator expressions. They have a neat way of yielding only under some condition, and the JavaScript 1.8 developers (thanks Brendan et al.) came up with a cool Ruby-like conditional.

Say you only wanted to get the squares of the even numbers in the list, the above generator expression will be rewritten as:
var iterator =
(i * i for each (i in someNumbers) if (i%2==0));
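Generator expressions were not carried over into standard ECMAScript, so outside of the JavaScript 1.8 engines the same lazy "transform with a condition" has to be spelled out. Below is one possible sketch using an ES2015 generator function; `mapFilter`, `evenSquares` and `out` are hypothetical names introduced only for this example.

```javascript
var someNumbers = {a:1,b:2,c:3,d:4,e:5,f:6};

// Hypothetical helper: yields transform(v) for each value v that passes keep(v).
function* mapFilter(obj, transform, keep) {
  for (var value of Object.values(obj)) {
    if (keep(value)) {
      yield transform(value);
    }
  }
}

var evenSquares = mapFilter(
  someNumbers,
  function (i) { return i * i; },
  function (i) { return i % 2 === 0; }
);

// for...of stops on its own when the generator is exhausted,
// so no StopIteration handling is needed.
var out = [];
for (var sq of evenSquares) {
  out.push(sq);
}
console.log(out.join(" ")); // 4 16 36
```

As with the generator expression, nothing is computed until the consumer asks for the next value.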

Aspect Oriented Javascript, revisited



Tonight I was experimenting a bit with a minimalist implementation of advising in Aspect Oriented JavaScript. I implemented the three basic kinds of advice (before, around and after). For this naive implementation I slammed the functions into Object's prototype. This way I had access to them in all objects in the system.

Here's how to use them:
Dog.before('bark',function(){alert('going to bark')});
Dog.after('bark',function(){alert('done barking')});
User.prototype.before('login',function(){alert('logging in!')});

The actual code written is very small (20 lines, less if you discount lines taken by braces)
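Object.prototype.before = function(func, advice){
  var oldfunc = this[func];
  this[func] = function(){
    advice.apply(this, arguments); // run the advice first
    return oldfunc.apply(this, arguments);
  };
};

Object.prototype.around = function(func, advice){
  var oldfunc = this[func];
  this[func] = function(){
    var myargs = [oldfunc]; // the advice receives the original function first
    for(var i=0; i < arguments.length;i++){
      myargs.push(arguments[i]);
    }
    return advice.apply(this, myargs);
  };
};

Object.prototype.after = function(func, advice){
  var oldfunc = this[func];
  this[func] = function(){
    var result = oldfunc.apply(this, arguments);
    advice.apply(this, arguments); // run the advice last
    return result;
  };
};

This way you can add any sort of filter to your JavaScript methods. Possible enhancements include the ability to remove those filters once added and to add a filter to all functions of an object (recursively) at once.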

We also need some fail-safety against users trying to advise non-functions or even undefined properties.
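Two of those enhancements, removable advice and a guard against non-functions, could look roughly like the sketch below. It is an illustration and not the code from this post: `adviseBefore` is a hypothetical standalone helper (it deliberately stays off Object.prototype), and it returns a function that undoes the wrapping.

```javascript
// Hypothetical helper (not the post's code): wraps obj[name] with "before"
// advice, refuses non-functions, and returns a function that undoes the wrap.
function adviseBefore(obj, name, advice) {
  var original = obj[name];
  if (typeof original !== "function" || typeof advice !== "function") {
    throw new TypeError("cannot advise '" + name + "': not a function");
  }
  var hadOwn = Object.prototype.hasOwnProperty.call(obj, name);
  var wrapper = function () {
    advice.apply(this, arguments);
    return original.apply(this, arguments);
  };
  obj[name] = wrapper;
  return function remove() {
    // Only restore if nobody has re-wrapped the method since.
    if (obj[name] !== wrapper) { return false; }
    if (hadOwn) { obj[name] = original; } else { delete obj[name]; }
    return true;
  };
}

var log = [];
var dog = { bark: function () { log.push("woof"); return "woof"; } };

var removeAdvice = adviseBefore(dog, "bark", function () { log.push("going to bark"); });
dog.bark();      // log: ["going to bark", "woof"]
removeAdvice();  // restores the original method
dog.bark();      // log gains only "woof"
```

Returning the remover from the call keeps the bookkeeping local; a registry of advices per method would be the alternative if several filters have to be removed in arbitrary order.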

Tom, the cat we all love



You have your Java web app hot from the oven. Looking around you see this fat cat (we call it Tom, Tomcat) sitting around the corner. You hand it the app hoping that it will do a good job of serving it.

But how does Tomcat manage to serve pages from our Java web app? Simple: Tomcat listens on a certain port (the default is 8080) and accepts requests, spawning a thread to handle each one. When it reaches the maximum number of allowed threads (the maxThreads parameter, 150 by default) it queues incoming requests until a thread is free to take one. When threads become idle, Tomcat starts killing them until it is down to the maximum spare thread count (maxSpareThreads, 75 by default).

Sounds good. That means that a default Tomcat instance can handle up to 150 requests in parallel. Meaning it will spawn up to 150 threads. Which is a good thing. The more threads, the more parallel processing we can do.

WRONG! Because of limits imposed by your combination of hardware and software, the naive statements above are not true. Mainly due to:

Hardware: You can only have n threads running at once, where n is the number of your CPU cores. The other threads wait until the scheduler preempts the running ones and lets them run.

VM: The JVM uses native threads (it used green threads in the past), which means that creating a thread is not a cheap operation. In reality the cost associated with it is fairly high.

OS: Context switching is usually a heavy operation as well. When you have many more threads than cores, you will be paying for a lot of those switches.

Here is a scenario: You use Tomcat with its default settings on a quad-core machine to serve your web applications. Your website is attacked and gets a sustained 150+ concurrent requests. Tomcat thus spawns threads up to its limit (150) and attempts to serve all the incoming requests.

Since you only have 4 cores, only 4 threads can be active at a time. Neglecting the Tomcat process itself and any other system processes, we have 4 cores being fought over by 150 threads. Many threads will be waiting for I/O (hopefully hardware accelerated) most of the time, so a single core will be able to handle more than one thread, depending on the speed of that core and the amount of time the threads spend waiting.

I would say that a single core can cope with 5 to 10 threads (processing web requests) with a negligible context switching penalty. Having more than that will result in too many context switches for the threads congested on the core. With the default Tomcat settings, a CPU core will be handling about 37 threads on average. This will lead to poor performance under heavy load and will slow the application down rather than help it run faster.
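The arithmetic behind that scenario is small enough to write down. The sketch below only restates the numbers from the text (150 threads, 4 cores, 5 to 10 threads per core); the function names are hypothetical and the range is the same rough guess as above, not a measured result.

```javascript
// Threads competing for each core when every worker thread is busy.
function threadsPerCore(maxThreads, cores) {
  return maxThreads / cores;
}

// Rule-of-thumb range from the text: 5 to 10 request threads per core.
function suggestedMaxThreads(cores) {
  return { low: cores * 5, high: cores * 10 };
}

console.log(threadsPerCore(150, 4));   // 37.5
console.log(suggestedMaxThreads(4));   // { low: 20, high: 40 }
```

So on the quad-core machine the default setting sits well above even the generous end of the guessed range.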

So, what should we do with the maxThreads setting?

  1. Start from an informed position: knowing how many cores your system has, you can just pick a suitable number by following my (rather simplistic) approximation above of 5 to 10 threads per core (it is a guesstimate and it may turn out very bad for your specific case, so don't say I didn't warn you) or ..

  2. Use a benchmarking tool, like Apache Bench, and start testing your typical workload on your production machine with a setting of 1 thread per core. Record your requests/second and then redo the tests with more threads added. Stop adding threads when you can't get better performance. At this point, if you are not satisfied with your performance you can either:

    1. Get faster hardware
    2. Optimize your application and redo the benchmarking again
    3. Both of the above
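The benchmarking loop in option 2 can be sketched as a simple search. Everything here is hypothetical: `findThreadCount` is an illustrative name, `measure` stands for whatever runs your real benchmark and returns requests/second, `minGain` is an arbitrary threshold for "better performance", and `fakeMeasure` uses made-up numbers purely so the loop has something to run against.

```javascript
// Hypothetical sketch: raise the thread count step by step and stop once
// throughput no longer improves by at least minGain (a fraction, e.g. 0.05).
// measure(threads) must return requests/second from a real benchmark run.
function findThreadCount(measure, cores, maxThreads, minGain) {
  var best = { threads: cores, rps: measure(cores) }; // start at 1 thread per core
  for (var t = cores * 2; t <= maxThreads; t += cores) {
    var rps = measure(t);
    if (rps < best.rps * (1 + minGain)) {
      break; // no meaningful gain: keep the previous setting
    }
    best = { threads: t, rps: rps };
  }
  return best;
}

// Stand-in for a benchmark, with made-up numbers purely to exercise the loop:
// throughput rises up to 24 threads and then degrades.
function fakeMeasure(threads) {
  return threads <= 24 ? threads * 10 : 240 - (threads - 24) * 2;
}

console.log(findThreadCount(fakeMeasure, 4, 150, 0.05)); // { threads: 24, rps: 240 }
```

In practice each `measure` call would be a full benchmark run against the production machine with maxThreads set to the value under test.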

Mixing Asset and Page Caching In A Multiserver Setup


Generally you would set an expires header for your assets so that your clients would retrieve them from their local caches and not bother your servers for a good amount of time.

Setting this for JavaScript and image files makes your site feel much faster on pages with many images and JavaScript files. But what do we do when any of those static resources changes? For images I usually change the filename for the new version and update the reference. For JavaScript files a common practice is to append some version information to the file name, usually a time stamp of the last modification date. This way, when a file changes, the reference to it changes as well, and clients stop using the old cached resource and request the new one.

The simplistic time stamping approach works fine on a single server setup. When you add more servers, you will need something that is safe across machines, since the same file can end up with different modification times on different servers. One such way is to use the file's revision number from your repository. As long as you consistently deploy to all the machines you will have the same revision number on all the servers. In that case your files can look like this: application_235.js and common_42.js.
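Building such names is a one-liner once the revision is known. A minimal sketch follows, assuming the revision number is supplied by your deployment process; `versionedAsset` is a hypothetical helper name.

```javascript
// Hypothetical helper: "application.js" + revision 235 -> "application_235.js".
function versionedAsset(fileName, revision) {
  var dot = fileName.lastIndexOf(".");
  if (dot <= 0) {
    return fileName + "_" + revision; // no extension to preserve
  }
  return fileName.slice(0, dot) + "_" + revision + fileName.slice(dot);
}

console.log(versionedAsset("application.js", 235)); // application_235.js
console.log(versionedAsset("common.js", 42));       // common_42.js
```

Because the name depends only on the file and its revision, every server that deployed the same revision produces the same reference.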

Another issue arises with caching. If you are caching entire responses (in memcached, for example) and a cached response references a JavaScript file whose version then changes, the cached response will keep asking for the older version rather than the new one. This can easily be solved by appending the application revision number to the cache key, e.g. "/users/1235/profile.html_1269". This way, whenever the revision is bumped, your application will look up fresh entries in the cache and the older ones will expire on their own (if you are using a cache store with auto-expiry capability like memcached).
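The cache key scheme can be sketched in a few lines. This is an illustration only: `pageCacheKey` is a hypothetical helper, and a plain Map stands in for memcached so the example runs on its own.

```javascript
// Hypothetical helper: builds keys like "/users/1235/profile.html_1269".
function pageCacheKey(path, appRevision) {
  return path + "_" + appRevision;
}

// A plain Map stands in for memcached here.
var cache = new Map();
cache.set(pageCacheKey("/users/1235/profile.html", 1269), "<html>old assets</html>");

// After deploying revision 1270 the old entry is simply never looked up again.
console.log(cache.has(pageCacheKey("/users/1235/profile.html", 1269))); // true
console.log(cache.has(pageCacheKey("/users/1235/profile.html", 1270))); // false
```

No explicit invalidation is needed; the stale entries just stop being requested and age out of the store.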

Now, just relax and watch your web server serving static files blazingly fast while you are assured that everything is in sync.