Generators & Generator Expressions in JavaScript 1.8

JavaScript has been going through a step-by-step evolution for a while now. Many people are unaware that new JavaScript features are added with almost every major Firefox release. And while most references still quote the 1.5 release, versions 1.6, 1.7 and even 1.8 are out now and can be used today.

One of those new features (an exciting one) is the introduction of generators. In layman's terms, generators give you pause and resume for your functions. How is that? A generator is simply a normal function, but one that has the ability to yield control back to the caller while maintaining its state for future runs. That description is not entirely accurate, though: the generator does not yield back to whoever called the function, but to whoever calls next() on the iterator it returns. Confused already? Let's use an example to make things clear.
// Fibonacci example, stolen right from the Mozilla docs
function fib() {
  var i = 0, j = 1;
  while (true) {
    yield i;
    var t = i;
    i = j;
    j += t;
  }
}

var g = fib();
for (var i = 0; i < 10; i++) {
  document.write(g.next() + " ");
}

which results in:
0 1 1 2 3 5 8 13 21 34

Before you get lost in the above code, here is a quick description of what happens:

  1. JavaScript knows the above fib function is a generator (because its body contains the yield keyword)

  2. When you call a generator function, the arguments you pass are bound to its parameters as usual

  3. Rather than executing the function body, the call returns a generator-iterator, on which you can call the iterator methods (like next, send and close; see the sketch after this list)

  4. The loop outside the function is run and g.next() gets called

  5. Whenever g.next() is called, the fib function body executes until it reaches the yield keyword; at that point control returns to the caller of next() while the function's state remains intact.

  6. The result of the expression following the yield is returned to the caller of next (this is what is being generated by the generator)

  7. Subsequent calls to next() resume the function right after the yield keyword, and it yields control back again the next time it encounters a yield.
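As item 3 hints, next() is not the only method on the generator-iterator. Here is a minimal sketch of send() and close() (legacy, Firefox-only JavaScript 1.7+ syntax; the counter generator is a made-up example): send() resumes the generator and makes the paused yield expression evaluate to the value you pass in, while close() finishes the generator.
// A counter we can fast-forward: the value passed to send()
// becomes the result of the paused yield expression
function counter(start) {
  var n = start;
  while (true) {
    var reset = yield n;
    n = (reset !== undefined) ? reset : n + 1;
  }
}

var c = counter(10);
document.write(c.next() + " ");    // 10 (runs up to the first yield)
document.write(c.next() + " ");    // 11
document.write(c.send(100) + " "); // 100 (the paused yield evaluated to 100)
document.write(c.next() + " ");    // 101
c.close();                         // done; further next() calls throw StopIteration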

You can think of generators as interruptible transformations. They are usually used to generate a transformation of some iterable data while giving the callers control over when (or whether) the generation is allowed to move forward.

Building on this, a new feature was introduced to make your life even easier: generator expressions. Instead of having to write a generator function, you can describe your transformation as a shorthand, in-place expression.

Consider the following generator function (also stolen from Mozilla, but modified this time):
function square(obj) {
  for each (var i in obj)
    yield i * i;
}

var someNumbers = {a:1,b:2,c:3,d:4,e:5,f:6};

var iterator = square(someNumbers);
try {
  while (true) {
    document.write(iterator.next() + " ");
  }
} catch (error if error instanceof StopIteration) {
  // we are done
}

this results in:
1 4 9 16 25 36

The square function iterates over the hash values (using the for each..in statement), generates the square of the current value and yields control back to the caller.

In this case the generator function is merely doing a very simple transformation, so we can easily replace it with a generator expression.

Like this example (for the third time, stolen and modified from mozilla.org):
var someNumbers = {a:1,b:2,c:3,d:4,e:5,f:6};

var iterator = (i * i for each (i in someNumbers));
try {
  while (true) {
    document.write(iterator.next() + " ");
  }
} catch (error if error instanceof StopIteration) {
  // we are done
}

This line:
var iterator = (i * i for each (i in someNumbers));
is what we call a generator expression. It behaves exactly like the generator function above: it returns an iterator (the assignment) which, when its next() method is called, performs a transformation (the expression i * i) inside a loop (the for each..in statement) and returns control to the caller after each iteration (implicitly yielding the expression's result).

And there is more to generator expressions: they have a neat way of yielding only under some condition, and the JavaScript 1.8 developers (thanks Brendan et al.) came up with a cool Ruby-like syntax for it.

Say you only want the squares of the even numbers in the list; the above generator expression can be rewritten as:
var iterator =
    (i * i for each (i in someNumbers) if (i % 2 == 0));
sweet!
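Feeding this iterator through the same try/while loop as before prints only the even squares:
try {
  while (true) {
    document.write(iterator.next() + " ");
  }
} catch (error if error instanceof StopIteration) {
  // we are done
}

which results in:
4 16 36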

Aspect Oriented JavaScript, revisited

Tonight I was experimenting a bit with a minimalist implementation of advising in aspect-oriented JavaScript. I implemented the three basic advices (before, around and after). For this naive implementation I slammed the functions into Object's prototype; this way I had access to them in all objects in the system.

Here's how to use them:
Dog.before('bark', function(){ alert('going to bark'); });
Dog.after('bark', function(){ alert('done barking'); });
User.prototype.before('login', function(){ alert('logging in!'); });

The actual code is very small (20 lines, fewer if you discount the lines taken by braces):
// before: run the advice, then the original method
Object.prototype.before = function(func, advice){
  var oldfunc = this[func];
  this[func] = function(){
    advice.apply(this, arguments);
    return oldfunc.apply(this, arguments);
  };
}

// around: hand the original method to the advice as its first
// argument; the advice decides if and when to call it
Object.prototype.around = function(func, advice){
  var oldfunc = this[func];
  this[func] = function(){
    var myargs = [oldfunc];
    for (var i = 0; i < arguments.length; i++) {
      myargs.push(arguments[i]);
    }
    return advice.apply(this, myargs);
  };
}

// after: run the original method, then the advice
Object.prototype.after = function(func, advice){
  var oldfunc = this[func];
  this[func] = function(){
    var result = oldfunc.apply(this, arguments);
    advice.apply(this, arguments);
    return result;
  };
}
This way you can add all sorts of filters to your JavaScript methods. An enhancement on this would be the ability to remove those filters once added, and to add a filter to all functions of an object (recursively) at once.

We also need some fail-safety against users trying to advise non-functions or even undefined properties.
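Here is a minimal sketch of both ideas for the before advice (unbefore is a hypothetical name; around and after would be guarded the same way): refuse to advise missing or non-function properties, and remember the original function so the advice can be removed again.
// Guarded variant: only advise real functions, and keep a
// reference to the original so it can be restored later
Object.prototype.before = function(func, advice){
  var oldfunc = this[func];
  if (typeof oldfunc != 'function')
    throw new Error("Cannot advise '" + func + "': not a function");
  this[func] = function(){
    advice.apply(this, arguments);
    return oldfunc.apply(this, arguments);
  };
  this[func].unadvised = oldfunc;
}

// Remove the advice by restoring the original function
Object.prototype.unbefore = function(func){
  if (this[func] && this[func].unadvised)
    this[func] = this[func].unadvised;
}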

Tom, the cat we all love

You have your Java web app hot from the oven. Looking around you see this fat cat (we call it Tom, Tomcat) sitting around the corner. You hand it the app hoping that it will do a good job of serving it.

But how does Tomcat manage to serve pages from our Java web app? Simple: Tomcat listens on a certain port (the default is 8080) and accepts requests, spawning a thread for each request to be handled. When it reaches the maximum number of allowed threads (the maxThreads parameter, which defaults to 150) it queues incoming requests until a thread is free to take one. When threads go idle, Tomcat starts killing them off until it is down to the maximum spare thread count (maxSpareThreads, which defaults to 75).
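For reference, these settings live on the HTTP connector in Tomcat's conf/server.xml. A sketch with the attribute names used by the Tomcat 5.5/6 HTTP connector (acceptCount is the length of the request queue mentioned above; check the docs for your version):
<!-- conf/server.xml: HTTP connector thread pool (sketch) -->
<Connector port="8080"
           maxThreads="150"
           maxSpareThreads="75"
           acceptCount="100" />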

Sounds good. That means a default Tomcat instance can handle up to 150 requests in parallel, meaning it will spawn up to 150 threads. Which sounds like a good thing: the more threads, the more parallel processing we can do.

WRONG! Because of limits imposed by your combination of hardware and software, the above naive statements are not true. Mainly due to:

Hardware: you can only have n running threads, where n is the number of your CPU cores. All other threads wait until the scheduler preempts the running ones and permits them to run.

VM: the JVM uses native threads (it used green threads in the past), which means creating a thread is not a cheap operation; in reality the cost associated with it is fairly high.

OS: context switching is a heavy operation as well. When you have many more threads than cores, you will be paying for a lot of those switches.

Here is a scenario: you use Tomcat with its default settings on a quad-core machine to serve your web applications. Your website is attacked and gets a sustained 150+ concurrent requests. Tomcat spawns its maximum of 150 threads and attempts to serve all the incoming requests.

Since you only have 4 cores, only 4 threads can be active at a time. Neglecting the Tomcat process itself and any other system processes, we have our 4 cores being fought over by 150 threads. Many threads will be waiting for I/O (hopefully hardware accelerated) most of the time, so a single core can handle more than one thread, depending on the speed of that core and how much time the threads spend waiting.

I would say a single core can cope with 5 to 10 threads (processing web requests) with a negligible context-switching penalty. Having more than that will result in too many context switches among the threads congested on the core. With the default Tomcat settings, each core will be handling around 37 threads on average (150 threads over 4 cores). This will lead to poor performance under heavy load and will slow the application down rather than help it run faster.

So, what should we do with the maxThreads setting?

  1. Start from an informed position: knowing how many cores your system has, pick a suitable number using my (rather simplistic) approximation above of 5 to 10 threads per core (it is a guesstimate and may turn out very badly for your specific case, so don't say I didn't warn you), or ..

  2. Use a benchmarking tool like Apache Bench (from www.apache.org) and start testing your typical workload on your production machine with a 1-thread-per-core setting. Record your requests/second, then redo the tests with more threads added (see the example command after this list). Stop adding threads when you can't get better performance. At that point, if you are still not satisfied with the performance, you can either:

    1. Get faster hardware
    2. Optimize your application and redo the benchmarking again
    3. Both of the above
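
For instance, a benchmark run could look like this (hypothetical URL; -c is the number of concurrent requests, -n the total number of requests):
ab -n 1000 -c 4 http://yourserver:8080/yourapp/

Raise -c (together with Tomcat's maxThreads) between runs and stop once requests/second no longer improves.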

Mixing Asset and Page Caching In A Multiserver Setup

Generally you would set an Expires header for your assets so that your clients retrieve them from their local caches and don't bother your servers for a good amount of time.

Setting this for JavaScript and image files makes your site feel much faster on pages with many images and JavaScript files. But what do we do when one of those static resources changes? For images I usually give the new version a new filename and change the reference. For JavaScript files a common practice is to append some version information to the file name, usually a time stamp of the last modification date. This way, when a file changes, the reference to it changes as well, so clients no longer use the old cached resource and will request the new one.

The simplistic time-stamping approach works fine on a single-server setup. When you add more servers you will find that you need something more distribution-safe than time stamps. One such way is to use your repository's revision number: as long as you consistently deploy to all the machines, you will have the same revision number on all the servers. In that case your files can look like application_235.js and common_42.js.

Another issue arises with caching. If you are caching entire responses (in memcached, for example) and a cached response references a JavaScript file whose version has since changed, that response will keep asking for the older version rather than the new one. This can easily be solved by appending the application revision number to the cache key, e.g. "/users/1235/profile.html_1269". This way, whenever the revision is bumped, your application will look for the latest responses in the cache and the older ones will auto-expire (if you are using a cache store with auto-expiry, like memcached).
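As a sketch of the idea (the names and the REVISION constant are made up for illustration), the same deploy-wide revision number can drive both the asset file names and the cache keys:
// Deploy-wide revision, e.g. the repository revision you shipped
var REVISION = 1269;

// assetPath("application") => "/javascripts/application_1269.js"
function assetPath(name) {
  return "/javascripts/" + name + "_" + REVISION + ".js";
}

// cacheKey("/users/1235/profile.html") => "/users/1235/profile.html_1269"
function cacheKey(path) {
  return path + "_" + REVISION;
}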

Now, just relax and watch your web server serving static files blazingly fast while you are assured that everything is in sync.

Ponder This

I happened to stumble upon this (kinda old) page where the author compares Rails to Groovy performance. Reading his comments, he generally concludes that Groovy is faster than Rails. I didn't look at the graphs until I saw his comment that 10 Mongrels behind Pound were much slower than a single Mongrel! I double-checked the graph and that was indeed the case: the 10-Mongrel setup is much slower in all cases than a single Mongrel instance.

I was wondering where the catch was. Rereading the article, I spotted it with little effort. Here's a ponder-this for those who read this blog (both of you): what did the guy do to screw up the performance figures that badly? If you get it, please add a comment with your answer.

Given his numbers, a proper Rails setup would suddenly make Rails faster (I don't mean any magic tricks, just fixing his fatal mistake). But the difference would not be that big anyway.

Update: seemingly no one discovered (or cared to discover?) the mistake, so here it goes. The lad ran 10 Mongrels on a 1GB RAM machine that was also running MySQL, OS X and whatever else he had going, so he simply ran out of memory and started swapping. The numbers for the 10-Mongrel setup included the disk-swapping penalty. Couldn't he just listen to his drive or see a blinking LED?