Tom, the cat we all love


You have your Java web app hot from the oven. Looking around, you see this fat cat (we call him Tom, as in Tomcat) sitting in the corner. You hand him the app, hoping he will do a good job of serving it.

But how does Tomcat manage to serve pages from our Java web app? Simple: Tomcat listens on a port (8080 by default) and accepts requests, spawning a thread to handle each one. When it reaches the maximum number of allowed threads (the maxThreads attribute, 150 here), it queues incoming requests until a thread becomes free and can take one over. When threads go idle, Tomcat starts killing them off until the spare-thread count drops back to the maxSpareThreads limit (75 here).
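
To make that model concrete, here is a minimal sketch of the same spawn-then-queue behavior using java.util.concurrent. This is an illustration of the mechanism, not Tomcat's actual code; the class name is made up, and the port and limits simply mirror the numbers above:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class TinyTomcat {
        public static void main(String[] args) throws IOException {
            // Grow to at most 150 threads; further requests wait in the queue,
            // and threads idle for 60 seconds are reclaimed.
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    150, 150, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
            pool.allowCoreThreadTimeOut(true);

            try (ServerSocket server = new ServerSocket(8080)) {
                while (true) {
                    Socket client = server.accept(); // one task per connection
                    pool.execute(() -> handle(client));
                }
            }
        }

        private static void handle(Socket client) {
            try (Socket c = client; OutputStream out = c.getOutputStream()) {
                out.write("HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nOK"
                        .getBytes(StandardCharsets.US_ASCII));
            } catch (IOException ignored) {
                // A dropped connection is fine for this sketch.
            }
        }
    }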

Sounds good. That means a default Tomcat instance can handle up to 150 requests in parallel, spawning up to 150 threads. Which is a good thing, right? The more threads, the more parallel processing we can do.

WRONG! Because of limits imposed by your combination of hardware and software, the naive statements above are not true. Mainly due to:

Hardware: You can only have n truly running threads, where n is the number of your CPU cores. All other threads wait until the scheduler preempts the current ones and lets them run.

VM: The JVM uses native threads (it used green threads in the past), which means creating a thread is not a cheap operation; each one is a real OS thread, and the cost of creating it is fairly high (see the measurement sketch after these points).

OS: Context switching is usually a heavy operation as well. When you have many more threads than cores, you will be paying for a lot of those switches.
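
Curious about the thread-creation cost on your own JVM? Here is a quick, deliberately unscientific measurement (no warm-up, no isolation, so treat the output as a rough indication only):

    public class ThreadCreationCost {
        public static void main(String[] args) throws InterruptedException {
            final int n = 1_000;
            Thread[] threads = new Thread[n];
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                // Empty body on purpose: we want to pay only the creation/startup cost.
                threads[i] = new Thread(() -> { });
                threads[i].start();
            }
            for (Thread t : threads) {
                t.join();
            }
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("~" + (micros / n) + " microseconds per native thread");
        }
    }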

Here is a scenario: you run Tomcat with its default settings on a quad-core machine to serve your web applications. Your website comes under attack and receives a sustained load of 150+ concurrent requests, so Tomcat spawns its full limit of 150 threads and attempts to serve everything at once.

Since you only have 4 cores, only 4 threads can actually be running at any instant. Neglecting the Tomcat process itself and any other system processes, that leaves 150 threads fighting over 4 cores. Many threads will be waiting on I/O (hopefully hardware accelerated) most of the time, so a single core can handle more than one thread, depending on the speed of that core and how long the threads spend waiting.

I would say a single core can cope with 5 to 10 threads (processing web requests) with a negligible context-switching penalty. More than that and you get too many context switches among the threads congested on the core. With the default Tomcat settings, each CPU core would be handling roughly 37 threads on average (150 threads / 4 cores). This leads to poor performance under heavy load and slows the application down rather than speeding it up.
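
Translating that guesstimate into code: the JVM can tell you how many cores it sees, and the 5-to-10 multiplier gives you a starting range (a rule of thumb only; benchmark before trusting it for your workload):

    public class SuggestMaxThreads {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();
            // 5-10 request-processing threads per core, per the approximation above.
            System.out.printf("cores=%d, suggested maxThreads range: %d-%d%n",
                    cores, cores * 5, cores * 10);
        }
    }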

So, what should we do with the maxThreads setting (the attribute on the <Connector> element in conf/server.xml)?

  1. Start from an informed position: knowing how many cores your system has, pick a suitable number using my (rather simplistic) approximation above of 5 to 10 threads per core (it is a guesstimate and may turn out very badly for your specific case, so don't say I didn't warn you), or ..

  2. Use a benchmarking tool, such as ApacheBench (ab, which ships with the Apache HTTP Server), and start testing your typical workload on your production machine with a 1-thread-per-core setting. Record your requests/second, then redo the tests with more threads added. Stop adding threads when you can't get better performance (a rough sketch of this methodology appears after the list below). At that point, if you are still not satisfied with your performance, you can either:

    1. Get faster hardware
    2. Optimize your application and redo the benchmarking again
    3. Both of the above
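
For the benchmarking route, ab is the right tool, but the methodology is easy to sketch in plain Java too. The harness below is a hypothetical stand-in with a placeholder URL (it needs Java 11+ for java.net.http): it fires a fixed number of requests at a given concurrency and prints requests/second; run it with increasing concurrency until the number stops improving.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoorMansBench {
        public static void main(String[] args) throws Exception {
            String url = "http://localhost:8080/yourapp/";  // placeholder target
            int concurrency = args.length > 0 ? Integer.parseInt(args[0]) : 4;
            int requestsPerThread = 200;

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
            ExecutorService workers = Executors.newFixedThreadPool(concurrency);
            CountDownLatch done = new CountDownLatch(concurrency);

            long start = System.nanoTime();
            for (int i = 0; i < concurrency; i++) {
                workers.execute(() -> {
                    try {
                        for (int r = 0; r < requestsPerThread; r++) {
                            // We only care about throughput, so discard the body.
                            client.send(request, HttpResponse.BodyHandlers.discarding());
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await();
            workers.shutdown();

            double seconds = (System.nanoTime() - start) / 1e9;
            double total = (double) concurrency * requestsPerThread;
            System.out.printf("%d clients: %.1f requests/second%n",
                    concurrency, total / seconds);
        }
    }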

Comments (9)

I have some comments about the points you listed on the cost of using many threads:

1. Hardware: I think the n-to-n mapping, or even a 10n-15n-to-n mapping, is not right, for several reasons: (a) as you said, I/O alone will raise this ratio; (b) thread pipelining; (c) most hardware vendors optimize their hardware for multithreaded processing with few cores, so an additional core could support exponentially more threads; these days vendors even design single cores to be multithreaded at the hardware level; (d) even if the threads cannot physically run simultaneously, they give a good logical partitioning of the work, which helps the processor figure out how to run tasks in parallel.

2. VM: Yes, I agree with you that thread creation cost is high, but here we can use (and I think Tomcat is using) thread pooling, where the threads are created at startup and wait, ready to do their jobs right away. Also, I think memory-resident threads that are idle, just listening for an incoming task, do not cost much (e.g. I have about 500 open threads on my laptop right now and the performance is very good).

3. OS: I think threads were originally made to hand this costly operation (context switching) over to the process, so the switching is done at the process level, not the OS level, and that kind of context switch does not cost much, also thanks to hardware support.

I am with you that it is a good idea to benchmark our applications, so we can see at what point multithreading starts to degrade performance. I want to contribute; do you have an application that is ready for the benchmark?

I think the optimization path for medium-sized apps (ones that would get hit with 150 simultaneous requests and sustain that level for an extended time) has changed. With hardware as cheap as it is now, the first path to scalability wouldn't be optimizing the current machine; it would be to cache as much data/pages as possible on a cache farm (a JMX-based cache or even memcached; it can be just one or two servers at first) and push all I/O to a SAN. If the app is serving media, I would say a CDN subscription is in order. In today's world of Linux-based boxes, it is probably much easier to add a new machine to the cluster than to spend time tweaking the current one.

I am with Sebaey, especially on the point of caching. There are two levels of caching that can be utilized:

1. Low-level caching: this can be achieved just by utilizing multithreading, as the threads themselves use a caching strategy.

2. High-level caching: the bottleneck of web content delivery is dynamic content, which requires data processing. A good approach is a web application proxy with a cache repository and a cache manager that caches dynamic content once it has been processed; for later requests, the cached data can be served and only the remainder fetched from the database with a follow-up query. This is a modern approach to offloading the demand on database-driven web applications (a minimal sketch of the idea follows).
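
As an illustration of that high-level idea, here is a minimal cache-aside sketch in Java. PageCache and loadFromDatabase are hypothetical names, and a real deployment would add eviction/expiry (or use memcached or a similar store, as mentioned above):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class PageCache {
        private final Map<String, String> cache = new ConcurrentHashMap<>();

        public String render(String pageKey) {
            // computeIfAbsent runs the expensive load only on a cache miss;
            // subsequent requests for the same key are served from memory.
            return cache.computeIfAbsent(pageKey, key -> loadFromDatabase(key));
        }

        private String loadFromDatabase(String key) {
            // Placeholder for the real query/processing step.
            return "<html>rendered content for " + key + "</html>";
        }
    }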

Hi Michael,

1. N-to-n is of course far from practical. What really works depends a lot on your workload and whether it is compute bound or I/O bound. Lots of compute-intensive threads competing over the CPU is usually not a good thing.

2. The JVM uses native system threads, which means it depends on the host system for thread creation and context switching, and these are not trivial.

3. Idle threads are one thing (you mention 500 on your machine); 500 threads actually handling requests at the same time is a totally different situation. In the first case, memory is the only resource being used. In the second, you will see a significant drop in performance due to threads competing over resources and the scheduler trying to be fair to them all (when it would have been better to make some of the requests wait outside until the current set gets done).

There won't be a difference if the application is facing moderate load. But as the load gets high, too many threads will hurt performance rather than improve it.

Sebaey,

My hypothesis assumes a fairly optimized application. One cannot imagine a mid-sized app actually going online without proper caching (be that of the model, the action, or even the entire request/response cycle).

I suppose that just by setting the right number of threads we might gain 10 to 20% more performance under load from each box in the cluster.

But I must stress that environment tweaking comes as a distant second after application optimization.

Thanks Muhammad for replying, but I have further comments:

1. I think most web applications are I/O bound.

2. Yes, thread creation is not trivial, but Tomcat uses thread pooling, so all the thread-creation overhead is paid at startup. For an application that receives 150 simultaneous requests and keeps that level, the pool of threads is kept alive rather than destroyed and recreated.

Can you confirm my assumptions above?


I am with you that overloading the server with threads will degrade performance, but we need to know where the point of peak performance is.
I think we really need to run this benchmark.

Michael,

1 - Agreed; that is why you can only gain in the 10 to 20% range: significant, but not earth-shaking.

2 - Agreed again: thread creation only happens once, not for every request (see the sketch below). I was referring to it alongside the context-switching cost when describing native threads, but yes, thread pooling surely removes the former.
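
A minimal sketch of that point: a pooled executor can pre-start its worker threads so the creation cost is paid once, at startup, rather than on the first wave of requests (again an illustration with java.util.concurrent, not Tomcat's actual code):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadPoolExecutor;

    public class PrestartedPool {
        public static void main(String[] args) {
            ThreadPoolExecutor pool =
                    (ThreadPoolExecutor) Executors.newFixedThreadPool(150);
            // Pay the thread-creation cost now, at startup,
            // instead of on the first 150 requests.
            int started = pool.prestartAllCoreThreads();
            System.out.println("threads created up front: " + started);
            pool.shutdown();
        }
    }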

Yeah, I got the point, but why does the JVM use native threads, and why did it ditch its green ones?

Green threads cannot migrate to another processor. They run within the parent process, and all resource allocation is done via the parent, so they cannot take advantage of multiple processors.

The high cost of native threading encouraged the creation of techniques like fibers and coroutines. They offer a means of process-bound (or thread-bound) concurrency that avoids the high cost of threads.