NeverBlock, MySQL and MySQLPlus

27

Labels: , , ,

I have great news for MySQL users. A very nice side effect has emerged from the development of the NeverBlock support for MySQL. I am glad to announce the release of a new MySQL driver for Ruby applications. It builds on top of the original Ruby MySQL driver but it comes with two notable additions:
  1. Asynchronous query processing support

  2. Threaded access support

Thanks to help from Roger Pack and Aman Gupta we were able to put the thing together that you can use and test right now (on Ruby1.8 and 1.9)


To install it please do:
sudo gem install espace-mysqlplus


Then you can use it in your code as follows:
require 'mysqlplus'
mysql = Mysql.real_connect(..)
mysql.query("select sleep(1)")


The test folder of the gem contains examples for threaded and evented implementations.

The announcement page in NeverBlock shows benchmark results for running the sleeping queries in normal(blocking), evented and threaded modes. The normal mode is 10X slower, which is normal due to its inability to run queries in parallel.

Now that Rails is becoming so-so-thread-safe this should show tremendous gains with Rails deployments that use MySQL (PostgreSQL already has such facilities).

ActiveRecord meets NeverBlock

9

Labels: , , , ,

I happily announce the release of the first NeverBlock enabled activerecord adapter. The neverblock-postgresql-adapter. This is a beta release but I have been testing it for a while now with great results.

And while this is a big improvement it only requires you to replace the driver name in the connection to neverblock_postgresql instead of postgresql as described in the official neverblock blog

To make a long story short, this enables active record to issue queries in parallel, much like in a multi-threaded application. But this has several advantages over multi-threaded operations:

  1. Fibers are cheaper than threads so this solution is theoretically faster.

  2. NeverBlock does not require full thread safety, just avoid using globals and static variables for transient state.

  3. It integrates nicely in evented programs thus eliminating the performance drop which occurs with the introduction of threads in such environments


I have benchmarked this against the plain postgresql adapter using different workloads categorized as follows

Very Light : A single count statement
Light : A single count and a create
Moderate : 2 counts and a create wrapped in a transaction that rolls back
Heavy : 3 counts, a create and an update wrapped in a transaction that commits
Very Heavy : 3 counts, one conditional count (on a non-indexed field), a create and two updates all wrapped in a transaction that commits

(if you are wondering why these queries in particular, they were extracted from some other code)

All were issued 1000 times

The results came as follows:



As you can see, NeverBlock::AR is persistently faster than vanilla AR. It appears that such work loads generate linear increase for both AR and NeverBlock::AR as the NeverBlock advantage was almost the same



Another benchmark was performed to test the effect of increasing the connection count for NeverBlock::AR. We tested with 2, 4, 8, 16 and 32 connections.

The benchmark consisted of first running "select 1" 5000 times and then running "select sleep(10)" "select sleep(1)" 20 times for each configuration.



As you can probably guess, increasing connection count has very little effect if the queries are all very fast (you cannot beat "select 1") but if the queries are all slow, you will be able to double the performance by simply doubling the connection count.

I hope this gives you a glimpse of what's coming next. Watch this space

101 Reasons Why PostgreSQL is a better fit for Rails than MySQL

26

Labels: , , ,

1 - Indexing Support

MySQL cannot utilize more than one index per query. I believe this is worth repeating: MySQL CANNOT UTILIZE MORE THAN ONE INDEX PER QUERY. Wait till your tables get large enough and this will surely hit you. OTOH PostgreSQL can use multiple indices per query which come real handy.

2 - Full Text Indexing Support

MySQL can do full text indexing on MyISAM tables only, those working with InnoDB tables are out if luck. PostgreSQL has very advanced full text indexing capabilities wich enable you to control the tiniest details down to the stemming strategy.

3 - Asynchronous Interface

MySQL drivers are very unfriendly to the Ruby interpreter. Once a command is issued they take over until they come back with results. PostgreSQL sports a completely asynchronous interface where you can send queries to the database and then tend to other matters while the query is being processed by the server. The good news is that an Async ActiveRecord adapter for MySQL is being developed right now, as part of the rapidly growing NeverBlock library.

4 - Ruby Threading Aware

PostgreSQL dirvers enable the Ruby thread scheduler while IO requests are being processed (a nice side effect of the async interface). Which makes it much better suited for multithreaded Rails apps.

5 - Multistatements Per Query

Both MySQL and PostgreSQL support sending multiple statements separated by semi colons at once. But the returning result will be that of the last statement in the group. Now did you know that by using the async interface you can send multiple queries at once and then get back the results, one by one? One of the coolest features of the coming ActiveRecord (and Sequel btw) adapter is it's support for queuing queries to be consumed by a pool of connections. A trick we are contemplating working on is to group consequent selects together and send them in a single request to PostgreSQL and then later extract the results associated with each one of them. This is still very theoretical but should be verified soon.

Now that the 0b101 reasons are told I rest my case.

NeverBlock, much faster IO for Ruby

10

Labels: , , , ,

At eSpace we have just released an alpha version of NeverBlock. A library that aims to bring evented IO to the masses. It does so by wrapping all IO in Fibers which handle all the async aspects and hides them totally from the developers.

Just as a teaser, here are some benchmarks of running PostgreSQL queries with and without NeverBlock



10x performance boost? how about that?

I am working on extending the NeverBlock library now, watch this space for great news soon

Document Matching In Ruby

18

Labels: , , ,

With the debut of the new version of meOwns we have introduced several features that are concerned with how objects are related to each other. In the user profile page, you get a list of other user that are similar to him/her. And you are faced with the chemistry meter which tells you how much you are related to this user (if you are logged in of course). If you happen to be viewing your own profile page you will get a list of recommended items that you might be interested in. Last but not least, when you view an item, you get a list of similar items.

In this article I will be talking about the features from an implementation point of view. Naturally the first hurdle was to define the problem. What we needed was to find a way to match items and users. So first we needed to represent them in a way that can be matched. We started from the items first and looked at how to match two items together.

What is an item? In meOwns an item is simply a name, a description, a type and some tags. We decided to ignore photos and comments from the item. Each of those fields gets a weight which affects the value of terms found in it. Here is a sample product (encoded in Yaml):
Name : Fiat Sienna
Type : Car
Description : 1.6 HP, not bad for a sedan, relatively good performance for the price, best sedan i've bought
Tags : Cars Fiat Silver
The above fields are then processed to extract relevant terms from them, this is done in the following manner

  1. Remove punctuation and non alphanumeric characters (replace them with spaces)

  2. Collapse spaces and split the text on them

  3. Match the generated list of terms to a stop word list to remove them (words like "on", "the" should not be considered in the index)

  4. The remaining terms are converted to lower case and then converted to their stem representation (we use a snowball stemmer for now)


The above can be represented as follows:
Attribute : Value
===================
Name : fiat sienna
Type : car
Description : hp sedan performance price sedan buy
Tags : car fiat silver
After doing so, we create a list of terms with their frequencies, each field has a frequency multiplier according to its significance. Assumming a multiplier value of 1 for all the fields
Term : Frequency
=======================
fiat : 2
car : 2
sedan : 2
sienna : 1
performance : 1
silver : 1
buy : 1
hp : 1
This Term-Frequency vector is the basis of doing item matching in meowns. Several different approaches can be implemented to reach an item representation. Even the details of a given approach can vary significantly. I have chosen to stick to the easiest approach in the initial implementation. Those willing to dig further are free to lookup more into document representation and indexing strategies.

User representation is just an aggregation of their item (and wished item) representations. This way user and items share the same term vector structure and hence we can match users to items as much as we can match items to items and users to users.

The matching process involves further encoding of the term frequency vector. Those in the academia refer to this as TF-IDF (Term Frequency, Inverse Document Frequency) representation. In lay man's terms this is a representation of how significant a term is to the certain document.

It is composed of the product of two parts. This first which is TF (Term Frequency) is simply the frequency of the term in the document divided by the total sum of frequency of terms in the same document.

The second (IDF) is the total number of documents in the corpse (the document store) divided by the number of documents which contain this term. If the term is found in all documents then the IDF will be equal to 1 and hence won't have an effect on the final product of the two parts. On the other hand, if we have a corpus of 1,000,000 documents and only one with the given term then it will multiply the TF value by 1,000,000 which is significant.

A slight variation (widely used) is to multiply the TF with the Logarithmic value of the IDF. After we are done calculating TF-IDF values for all the terms in term vectors matching can be done as follows
Term Vector A vs. Term Vector B = cos a = (A . B) / (|A|.|B|)
A.B = dot product for the two vectors of TF-IDF values
|A|.|B| = scalar product of the magnitudes of the two vectors
The returned value is called the cosine similarity between the two items. It ranges between 0 and 1 where zero means no correlation and one means exact match. We are still experimenting with threshold values but these are essentially the figures you get when you see another user's chemistry meter for example. In another installment we will discuss how are we implementing behind the scenes matching of users and items in an efficient way.

Just to justify calling the post document matching in Ruby, here's a Ruby code to implement the above
# Monkey patch string to be able to extract terms from any string
Class String
def to_terms(boost = 1, terms = {})
# remove all non letters and reject stop words
terms_list = self.gsub(/(\s|\d|\W)+/u,' ').rstrip.strip.split(' ').reject{|term|$stop_words.include?(term)}
# transform to a hash with a frequency * boost value
terms_list.each do|term|
if terms[term]
terms[term] = terms[term] + boost
else
terms[term] = boost
end
end
terms
end
end

#our item class, which we match upon
Class Item
def to_terms(terms = {})
#a hash of attributes to serialize
{:name => 10,
:description => 3,
:type_name => 1,
:tag_names => 1}.each_pair do |field, boost|
terms = send(field).to_terms(boost, terms)
end
terms
end
end
The above methods enable us to extract the terms from different items. The first method refers to a global (bad me) stop word list which should be present.

Now that we have the items represented as term frequencies we can generate their tf-idf vectors (we will keep them as hashes though) and we can use them to do the matching
Class Item
def to_tf_idf
#assume we have a method that returns the df value for any term
terms = self.to_terms
total_frequency = terms.values.inject(0){|a,b|a+b}
terms.each do |term,freq|
terms[term]= (freq / total_frequency) * self.df(term)
magnitude = magnitude + terms[term]**2
end
terms, magnitude
end
def match(item)
my_tf_idf, my_magnitude = self.to_tf_idf
his_tf_idf, his_magnitude = item.to_tf_idf
dot_product = 0
my_tf_idf.each do |term,tf_idf|
dot_product = dot_product + tf_idf * his_tf_idf[term] if his_tf_idf[term]
end
cosine_similarity = dot_product / (my_magnitude * his_magnitude)
end
end
Pretty easy, now you get a value between 0 and 1 that represents how similar those two items are.

The case for a nonblocking Ruby stack

12

Labels: , , , , , , , , , ,

In a previous post I talked about the problems that plauge the web based Ruby applications regarding processor and memory use. I proposed using non-blocking IO as a solution to this problem. In a follow up post I benchmarked nonblocking vs blocking performance using the async facilities in the Ruby Postgres driver in combination with Ruby Fibers. The results were very promising (up to 40% improvement) that I decided to take the benchmarking effort one step further. I monkey patched the ruby postgres driver to be fiber aware and was able to integrate it into sequel with little to no effort. Next I used the unicycle monorail server (the EventMachine HTTP server) in an eventmachine loop. I created a dumb controller which would query the db and render the results using the Object#to_json method.

As was done with the evented db access benchmark, a long query ran every n short queries (n belongs to {5, 10, 20, 50, 100}). The running application accepted 2 urls. One ran db operations in normal mode and the other ran in nonblocking mode (every action invocation was wrapped in a fiber in the latter case)

Here are the benchmark results

Full results

Comparing the number of requests/second fulfilled by each combination of blocking mode and conncurrency level. The first had the possible values of [blocking, nonblocking] the second had the possible values of [5, 10, 20, 50, 100]



Advantage Graph

Comparing the advantage gained for nonblocking over blocking mode for different long to short query ratios. Displaying the results for different levels of concurrency



And the full results in tabular form

Concurrent Requests
Ratio 10 100 1000

1 To 100 Nonblocking 456.94 608.67 631.82
1 To 100 Blocking 384.82 524.39 532.26
Advantage 18.74% 16.07% 18.71%

1 To 50 Nonblocking 377.38 460.74 471.89
1 To 50 Blocking 266.63 337.49 339.01
Advantage 41.54% 36.52% 39.20%

1 To 20 Nonblocking 220.44 238.63 266.07
1 To 20 Blocking 142.6 159.7 141.92
Advantage 54.59% 49.42% 87.48%

1 To 10 Nonblocking 130.87 139.76 195.02
1 To 10 Blocking 78.68 84.84 81.07
Advantage 66.33% 64.73% 140.56%

1 To 5 Nonblocking 70.05 75.5 109.34
1 To 5 Blocking 41.48 42.13 41.77
Advantage 68.88% 79.21% 161.77%

Conclusion

In accordance with my expectations. The nonblocking mode outperforms the blocking mode as long as enough long queries come into the play. If all the db queries are very small then the blocking mode will triumph mainly due to the overhead of fibers. But nevertheless, once there is a even single long query for every 100 short queries the performance is swayed into the nonblocking mode favor. There are still a few optimizations to be done, mainly complete the integrations with the EventMachine which should theoritically enhance performance. The next step is to integrate this into some framework and build a real application using this approach. Since Sequel is working now having Ramaze or Merb running in non-blocking mode should be a fairly easy task. Sadly Rails is out of the picture for now as it does not support Ruby 1.9 yet.

I reckon an application that does all its IO in an evented way will need much less processes per CPU core to make full use of it. Actually I am guessing that a single core can be maxed by a single process. If this is the case then I can live much happier if I can replace the 16 Thin processes running on my server with only 4. Couple that with the 30% memory savings we get from using RubyEE and we are talking about an amazing 82.5% memory foot print reduction without sacrificing performance.

Ruby Fibers Vs Ruby Threads

15

Labels: , , , ,

Ruby 1.9 Fibers are touted as lightweight concurrency elements that are much lighter than threads. I have noticed a sizbale impact when I was benchmarking an application that made heavy use of fibers. So I wondered what If I switched to threads instead? After some time fighting with threads I decided I needed to write something specific for this comparison. I have written a small application that would spawn a number of fibers (or threads) and then would return the time went into this operation. I also recorded the VM size after the operation (all created fibers and threads are still reachable, hence, no garbage collection). I did not measure the cost of context switching for both approaches, may be in another time.

Here are the results for creation time:



And the results for memory usage:



Conclusion

Fibers are much faster to create than threads, they eat much less memory too. There is also a limit on the number of threads for 1.9 as I maxed on 3070 threads while fibers were not complaining when I created 100,000 of them (but they took 203 seconds and occuppied a whoping 500MB of RAM).