Digging through some of the old Python archives I found a really interesting post by Mike Pall of LuaJIT fame (the post can be found here). Pall was discussing possible ways to improve Python's performance. One of his findings was that branching code was a major culprit in Python's runtime performance, mainly because it was not optimized for the commonly taken paths. This is understandable, since a compiler like GCC has no prior knowledge of how the binary will behave at runtime. So branch prediction remains just that, a prediction that may be skewed from the true state of affairs. In his email Mike offered a technique showing that with a little compiler trickery this effect could be somewhat mitigated. I followed his steps, but this time for the Ruby 1.9.1-p129 binary.
First I ran configure on the source code, then added -fprofile-arcs to optflags in the generated Makefile. Running make then produced a Ruby binary, lots of .o files, and lots of .gcno and .gcda files (the .gcda counters show up already at this stage, presumably because the build runs the freshly instrumented binary to finish building itself).
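For reference, the instrumented build can be driven from the shell roughly like this. This is only a sketch: the exact optflags default varies between Ruby versions, and passing the variable on the make command line is just an alternative to editing the Makefile by hand as I did:

    ./configure
    # -fprofile-arcs instruments every branch so the binary can count
    # how often each side is taken; keep an -O level alongside it.
    make optflags="-O2 -fprofile-arcs"
    # If the link step complains about missing gcov symbols, the profiling
    # flag also needs to reach the link command (or add -lgcov to LIBS).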
What we have now is a binary that will profile the branching behaviour of the application. Next we need to run the binary in a real-life situation so that it can record how branches are taken in the wild. I used a simple benchmark application that included lots of math and string operations. After doing that, the Makefile was changed once more: this time -fprofile-arcs was replaced with -fbranch-probabilities, which asks the compiler to use the branch probabilities gathered during the previous run.
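The training run itself is nothing special: any branch-heavy script will do, and each run of the instrumented binary appends its counts to the .gcda files. The one-liner below is only a hypothetical stand-in for the actual benchmark I used:

    # Exercise the VM's math and string paths with the instrumented binary.
    ./ruby -e '1_000_000.times { |i| s = "x" * (i % 50); s.upcase if i.even?; Math.sqrt(i + 1) }'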
All .o files were then deleted and make was run again, resulting in a shiny new Ruby binary ready for testing.
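Putting those last two steps together, the feedback-directed rebuild looks roughly like this (again a sketch; make clean would also work in place of deleting the object files by hand):

    # Drop the instrumented objects, then recompile using the branch
    # probabilities recorded in the .gcda files from the training run.
    find . -name '*.o' -delete
    make optflags="-O2 -fbranch-probabilities"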
I ran the new binary head to head against the regular one, using several Ruby scripts for the comparison. The result was that the new binary was almost consistently faster than the old one, usually by a small margin, but for one benchmark the difference reached 40%. Even though these scripts differed widely from the one used for profiling, it was apparent that the VM itself gained a little from this optimization, which means all Ruby scripts stand to gain as well.
At the end of the day, an average 5% improvement is not something to brag about, but it does show that the Ruby VM still has a way to go. Considering that a silly, mostly static attempt achieved that, I think a JIT engine with tracing optimizations could do wonders for this little language, since it would adapt to the application's usage patterns and be able to eliminate many of the slow code paths in ways that static optimization cannot predict beforehand.
What about using -fprofile-use? Did you not use that because it used a little too much experimental optimization "magic"?
Yeah, I'm interested in how much benefit profile-guided optimizations would bring.
There was a study on it here:
http://web.archive.org/web/20070827013059/http://www.jhaampe.org/software/ruby-gcc