Monday, October 23, 2006

Benchmarks

A lot of people have been asking me for a performance comparison of the vector graphics frameworks we have. Rendering polygons, especially when we're dealing with strokes, tends to be the most expensive rendering operation performed in vector graphics. So I constructed a little test that measures the raw polygon rendering power of Qt and Cairo.

For the test I used the latest Qt main branch, and the master branch from Cairo's Git repository.

The test consists of rendering three complex polygons. The first one is a text path, the second is a small polygon with a large number of vertices that fall on the same scanline, and the third is a huge polygon with about 100,000 vertices.

The results measure the frames per second at which each framework was capable of rendering the given testcase, so larger is better.
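
For anyone who wants to reproduce this kind of number, a frames-per-second loop can be sketched roughly as below. This is a minimal illustration and not the harness used for the charts: `measure_fps`, `render_frame`, `min_seconds` and `min_frames` are hypothetical names, and the dummy workload just stands in for a real "render this polygon once" call.

```python
import time


def measure_fps(render_frame, min_seconds=1.0, min_frames=3):
    """Call render_frame repeatedly and return the average frames per second.

    render_frame is a hypothetical callback that draws one complete frame.
    The loop runs for at least min_seconds and at least min_frames frames,
    so that very slow testcases still produce a usable average.
    """
    frames = 0
    start = time.perf_counter()
    while True:
        render_frame()
        frames += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds and frames >= min_frames:
            return frames / elapsed


# Dummy workload standing in for a real rendering call.
fps = measure_fps(lambda: sum(range(10_000)), min_seconds=0.05)
print(f"{fps:.1f} frames per second")
```

With a real framework the callback would also have to force the drawing to complete (for example by flushing or syncing with the display server), otherwise the loop only measures how fast commands can be queued.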

I tried to be as objective as possible. All tests go through the whole pipeline, meaning I tried to make sure that the framework doesn't cache too much and actually renders what's being asked. I used the latest code from version control for all frameworks. The data used in the tests is available here: http://ktown.kde.org/~zrusin/complex.data (newline-separated polygons whose coordinates are comma-separated). All tests were written to use antialiasing.
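
Reading the data file could look something like the sketch below. It assumes that every line is a flat list of alternating x and y values (`x0,y0,x1,y1,...`); the post only says the coordinates are comma separated, so that layout is a guess. `parse_polygons` and the sample string are made up for illustration.

```python
def parse_polygons(text):
    """Parse one polygon per line into a list of (x, y) tuples.

    Assumption: each line holds alternating x and y values separated by
    commas. Blank lines are skipped.
    """
    polygons = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        values = [float(tok) for tok in line.split(",") if tok.strip()]
        if len(values) % 2 != 0:
            raise ValueError("odd number of coordinates on a line")
        polygons.append(list(zip(values[0::2], values[1::2])))
    return polygons


# Tiny made-up sample in the assumed format: a triangle and a square.
sample = "0,0,10,0,10,10\n1,1,2,1,2,2,1,2\n"
polys = parse_polygons(sample)
print(len(polys), "polygons,", [len(p) for p in polys], "vertices each")
```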

Oh and all tests have been done on a machine with Pentium(R) 4, 3.20GHz processor, 1 GB RAM and NVIDIA 6600 with 1.0-9625 drivers.

Having said that, the results were (charts follow):
First just pure Cairo vs Qt native performance:

Qt was respectively 7, 5 and 6 times faster than Cairo in those plain tests. This is a direct result of Qt's new wicked tessellator in 4.3.

But all the frameworks have many backends, the most interesting one being the OpenGL backend. For 4.3 I devised a new method of rendering polygons with OpenGL, based on stencil clipping. So let's see how Qt's OpenGL engine compares to Qt's native XRender engine:

The difference is huge. Qt OpenGL rendered the first polygon at 487 frames per second vs 76 in the Qt XRender engine. Note, though, that in XRender we tessellate, which is of N log N complexity, while in OpenGL I'm able to render polygons in a completely linear fashion. The method does deteriorate for polygons with 80000+ vertices due to the large number of triangles that have to be processed. It's a GPU bottleneck, though, which means that with a more powerful graphics card those results would skyrocket.
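
To give a feel for why a stencil approach needs no sorting, here is a small CPU simulation of one well-known stencil technique: draw a triangle fan from the first vertex, flip a stencil bit for every pixel each triangle covers, then fill the pixels whose bit ends up set. The post does not spell out the actual Qt method, so treat this as an assumed stand-in and not as Qt's code; `covers` and `stencil_fill` are hypothetical names, and a real implementation would let the GPU do the rasterization.

```python
def covers(tri, px, py):
    """Return True if the sample point (px, py) lies inside triangle tri.

    Uses a half-open crossing test with edges put in a canonical direction,
    so an edge shared by two fan triangles is treated identically in both.
    """
    inside = False
    for i in range(3):
        (x1, y1), (x2, y2) = tri[i], tri[(i + 1) % 3]
        if y1 > y2:
            x1, y1, x2, y2 = x2, y2, x1, y1
        if y1 <= py < y2:  # horizontal edges never satisfy this
            x_at = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_at:
                inside = not inside
    return inside


def stencil_fill(polygon, width, height):
    """Even-odd fill via a simulated 1-bit stencil buffer.

    One triangle per vertex (a fan around polygon[0]), no sorting and no
    tessellation: each triangle simply inverts the stencil bit it covers.
    """
    stencil = [[0] * width for _ in range(height)]
    v0 = polygon[0]
    for i in range(1, len(polygon) - 1):
        tri = (v0, polygon[i], polygon[i + 1])
        for y in range(height):
            for x in range(width):
                if covers(tri, x + 0.5, y + 0.5):  # sample at pixel centre
                    stencil[y][x] ^= 1
    return stencil


# A concave "U" shape: a 6x6 square with a 2x4 notch cut out.
u_shape = [(0, 0), (6, 0), (6, 6), (4, 6), (4, 2), (2, 2), (2, 6), (0, 6)]
mask = stencil_fill(u_shape, 6, 6)
for row in mask:
    print("".join("#" if bit else "." for bit in row))
```

Pixels covered an even number of times cancel out, which is why the concave notch stays empty even though the fan triangles overlap it. The number of triangles grows linearly with the number of vertices, but every one of them has to be rasterized, which fits the observation that very large polygons become fill-bound on the GPU.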

Finally, let's combine the results of all the frameworks and see how they match up:

The reason Cairo with the Glitz backend yields the same results as Cairo with the XRender backend is that polygon rendering in both goes through the same client-side steps all the way until the final blit, and it's not the blit but the tessellation and rasterization that are the bottlenecks. I added Amanith to the results because some people mentioned it on my blog before. Both Amanith and Cairo (Cairo only with the native XRender backend) crash on the last polygon. In Amanith the tessellator seems to fall apart. In the Cairo case the application crashes in the XRender code, so most likely the rasterization code is not keen on one of the trapezoids that Cairo sends it. Cairo with the Glitz backend renders the last polygon at about 0.2 frames per second (but doesn't crash, which again suggests that it's likely XRender's trapezoid rasterization code, especially since Carl couldn't reproduce the crash on his laptop). An interesting fact right now is that Qt with XRender is way faster at rendering polygons than Amanith and Cairo with Glitz, both of which are OpenGL accelerated.

Notes: I know Carl is working on a new tessellator for Cairo which should exhibit the same logarithmic behaviour as the current Qt one. Carl was even kind enough to send me a tarball of the branch in which he's working on it. Unfortunately, although the results for the first polygon were at about 13 FPS (2 frames per second better than the current Cairo tessellator), they degenerated for the other two polygons. This is most likely due to the high precision of the new tessellator (in both the #2 and #3 testcases you get vertices close enough to be considered coinciding without any visual artifacts). Once Carl gets the precision down and integrates the new tessellator I'm going to run the tests again.

The conclusion from all those tests is that right now Qt is leaps ahead of any other vector graphics framework in terms of raw performance. Nothing comes even close. Qt's OpenGL engine is so fast it's basically unfair to compare anything else to it.

Objectivity aside, Qt rocks. It really does. And if you're using Qt and not using Qt's rendering architecture, everyone should point at you and make fun of you for not having complete and utter trust in me, as the only true graphics ninja and the team of Trolltech's Samurai Graphics Assassins.

47 comments:

superstoned said...

hey zack,

this is getting boring... can't you benchmark something in Qt which is way slower than some GTK/Gnome counterpart?
i can tell ppl 'Qt is mostly faster than GTK though not in everything' but then i need something Qt is NOT good at. if i say 'Qt is between 2 and 10 times faster than GTK on *everything*' they just won't believe me... nothing can be that good, they'll say, and listen to some gnomie telling gnome uses way less memory than the bloated KDE...

;-)

ricard said...

I'm quite new to Qt, but I can't wait to get my hands on it. Specially to this new vector graphics framework you are showing.

Is Qt 4.3 available through svn or do we have to wait for a Beta release?

Thanks for your great work!!

vladc6 said...

I wonder how Xara Xtreme compares? They claim their SVG rendering engine outperforms both Microsoft GDI+ and Cairo...

Zack said...

@vladc6: Yes, I've seen their claims. They look very sketchy. They don't mention what exactly they were testing, what the data set was, on what system or how they did the comparisons. Meaning the results are pretty much meaningless.
I provided the dataset and explained how to do the tests, if Xara folks would like to see how fast their polygon rendering code _really_ is, they can surely do it :)

Anonymous said...

Cairo 1.2.5 is a stable branch of Cairo without any optimising work done on it. Testing a stable branch of Cairo against an unstable branch of Qt (4.3) is not a fair comparison.

Zack said...

As the Legend and the descriptions say, I used the latest sources from the main branches of each project. That's why it says: "Cairo Git" in the legend :)
To make it explicit the code is from Sunday 10am, from the master branch of Cairo's Git repository.

Anonymous said...

It would be interesting to see a comparison with Java2D (with and without OpenGL-Pipeline) added to the charts.

Anonymous said...

It would be very interesting to know on which platform the comparison was made, though I guess that it was on GNU/Linux. I have experienced huge performance differences between Win32 and Linux, the former being much slower (XOrg 7.1.1 + XRender hwacceld). In 4.0 time it used to be exactly the opposite.

Could you post your benchmarking code Zack?

It would also be nice to know the cornercases for painting on QGLWidgets. (When is the XRender painter faster than OpenGL and vice versa).

Anonymous said...

I often use FreeNX to connect to my home PC. This tool works very well with a low bandwidth network and big latencies.
Currently, I wrote this post in a KDE3 session through a FreeNX session.

After testing some Qt 4.2 applications using the new Graphics View, I was surprised that these applications (like Colliding Mice or Elastic Nodes) were very slow and unusable via FreeNX.

So I'm afraid that future Qt4 applications (and KDE4...) are unusable in a network environment.

So, is it a problem in the current Qt 4.2 (solved in 4.3) or in the current FreeNX implementation ?

Anonymous said...

There is another "mature" vector framework: Anti-Grain Geometry (http://www.antigrain.com/ - one of places where it is used is Haiku OS app server, which is kinda what X is for linux). I wonder how does it compare with Cairo and QT.

vladc6 said...

I've mentioned your benchmark to the Xara Xtreme folks.

Anonymous said...

I would be really interested in comparing with engine used by Xara which claims to be very fast, at least ...

Anonymous said...

Very interesting. I knew cairo was slow but that it was *this* slow... :( It really needs some speed improvements. Especially now that gecko uses it for all its rendering. Just imagine a smoothly animated vector based web that would be awesome.

Anonymous said...

This is excellent material for improvements. Could someone please inform John Carmack about the results? With some luck, he may be interested in helping out!

Ska said...

Ok, now we know that the tasselization task is slower than the stencil-buffer based tecniques, to draw a polygon... For my part it's a pretty obvious thing.
In the real world i see a lot of vector graphic applications that makes a huge usage of svg icons, gis maps, fonts and a lot of gfx (also animated throught the classic 2d transformation matrixes) that don't need to be re-tesselated every frame; in this common practical case, the stencil-buffer brute force approach is worst than the triangles caching approach.
Another last thing, Stencil buffer is a luxus on some Opengl(ES) mobiles chips... i know, Qt targets only desktops, but Qtopia? ;)

Sorry for my bad english,

[AD]Ska - www.amanithvg.com

Madcrow said...

Meh, so QT4.3 is fast as heck? Nothing USES it yet in the real world, what with KDE 4 being a year away and all. Why not compare QT3 (aka the one used in specialized embedded crap) against GTK. I have a stinking suspicion that it would still outperform GTK, but still...

taj said...

The mention of Qtopia is one that I am interested in too. I don't pretend to know a lot about this stuff, but I've been wondering - how well does this stuff work on FP-starved arches like the ARM/Xscale chips used by the majority of PDAs and smartphones?

Anonymous said...

having a blazing tesselator now doesnt mean a whole lot when you think that no one except kde will ever use it. cairo will catch up in time, but the difference will be cairo will be by then it will be used by dozens of non-gnome apps (OOo, Moz, etc) and people within kde will realize they should port over to it since it became a defacto standard while they werent looking.

why is kde moving to dbus when dcop clearly came first? because dcop was Invented Here, and dbus was invented to be a standard.

kde has been the clearly first in everything, so why isnt it clearly the only technology of choice?

thats the history of the kde project: playing in your own sandbox while the rest of the worlds passes you by.

sigma said...

I know I'm feeding a troll, but what great joy of interoperability would be enabled by going with Cairo instead of Qt's own rendering? From a user's perspective it makes no difference, except that Qt apps just render faster. The DBUS story is different, as it was a least common denominator solution that would placate the "OMG OMG NOT C++!!1!" brigade in a space that actually effects users.

Come back and tell us about "the world" passing KDE by when it actually has a credible, technically superior alternative to KParts, QGraphicsView and KIO.

Anonymous said...

Zack, could you post the source code for this benchmark? I would like to make a similar test for Java2D (with and without the OpenGL pipeline enabled).

Anonymous said...

Sigma,

I know Im feeding a Troll, but whats the point of writing a great tesselator then sitting back and laughing at cairo because no one did the same for them (yet)? You tell me how that is going to get people to use free software?

The world passed you by already because no one else uses KParts, QGraphicsView and KIO, as we speak. No one else will use Arthur either.

Enjoy your sandbox.

Trolls often don't want to get the point, even if its as obvious as the setting sun.

Anonymous said...

as it was a least common denominator solution that would placate the "OMG OMG NOT C++!!1!" brigade in a space that actually effects users.

isnt that exactly part of the necessary evil of creating and pushing cross platform free software standards?

theres kde culture right there, more interested in their own leetness and squabbling over half nothing than growing the pie to the point where it affects real people, not just internet nerds.

Anonymous said...

i don't belief. give me tests sources. i'll check it self.

Anonymous said...

Very interesting results. I look forward to the new tesselator work going into cairo, and getting improved to reach parity with Qt.

Would you consider publishing numbers for how it looks on something other than the nvidia proprietary driver? I would tend to expect the proprietary drivers to focus on OpenGL at the expense of 2D rendering capabilities. How about the Intel or Radeon drivers?

Thanks.

Anonymous said...

"I used the latest sources from the main branches of each project" sounds like a semi-clever way to say "right, I didn't use the cairo branch where they're working on optimization". Punishing projects for using a different branch for optimization? Feh.

Also, you claim to try to be "as objective as possible", yet you only used 3 tests, at least one of which (a 100,000-vertex polygon) is (a) obviously not at all a common case, and (b) seems designed to bring out the benefits of "Qt's new wicked tessellator". Objective ... right.

Yes, I'm a fan of cairo, for various reasons. I do, however, believe that Qt/Arthur is faster right now, though with wonky tests like this it's impossible to see by how much. This is an e-tabloid, no more.

Zack said...

I tested all the branches of Cairo where speed would matter. I posted the best results. If you want to I can post results from the new tessellator and claim this is the future of Cairo (the results were a lot worse).

The three testcases are very varied, that's what makes them interesting. Qt was better at all of them. If you think you know better than me how to construct varied testcases, please send them to me.

Anonymous said...

Which compilers did you use?

Anonymous said...

What a lame question!
Visual Basic on a 386@33MHz for Cairo and GCC++(+++^(inf)) on an Intel HyperCore512@LightSpeed for Qt (with -Ogod of course).
Is such.. obvious :-))))

Anonymous said...

Sorry if you think it is obvious...

BTW, C++ is disadvantaged with GCC. So I just wanted to know...

Anonymous said...

Digg it! http://digg.com/linux_unix/Exciting_V3ctor_Gr4phics_with_Qt_4_3

sigma said...

The world passed you by already because no one else uses KParts, QGraphicsView and KIO, as we speak. No one else will use Arthur either.

Ha ha! A very clever definition "of passed you by", taken to mean "ok those guys tools are great, how do we rewrite from scratch to get something as good without losing face?"

No one else will use it, except the oldest, largest and technically most impressive project to create a free software desktop.

isnt that exactly part of the necessary evil of creating and pushing cross platform free software standards?

No, it's the result of various people not being able to put their own historical mistakes aside to accept the best possible technology. This is the kind of "necessary evil" that gave us the politically correct but technologically neutered POSIX standard.

enjoy your sandbox

while you and I are wasting our lives arguing in blog comments, dozens of KDE programmers are doing exactly that, creating the next generation of free desktop for others to copy.

logicnazi said...

Jesus Christ people. Don't rag on the man for producing some benchmarks. I don't care if you are convinced this reflect real world performance don't get defensive and mean about it.

As far as 'doing something useful' posting results like this is actually a big service. Not only does it make it clear where cairo might need improvement (or at least the consequences of deliberate design decisions) it also pushes the cairo people to improve their performance.

It's a bit rich to complain that the poster isn't going and writing a tessellator for Cairo when you aren't doing so either. At least posting benchmarks is contributing more than all us random commenters.

Anonymous said...

I agree with a previous poster - you should have tried agg (antigrain geometry).

It was certainly significantly faster than Qt 4.2 when I tried it. I wanted to use QGraphicsView though so I decided to put up with the slowness.

Pēteris Krišjānis said...

Please, stop attacking messenger. Yes, Cairo is slower. Yes, improvements are on their way. No, it is not the end of Cairo and victory of Qt or vice versa. Everything goes on.

Please people don't flame about it, such competition is very good for all of us.

MacSlow said...

Greetings Zack!

Is it possible to get hands on the 4.3-branch of Qt somehow? So far I was only able to find 4.2.2 on Trolltech's ftp-server. I would like to try something of my own now that your presented benchmarks spurred my curiosity even more.

Best regards...

MacSlow

Anonymous said...

Source code, plz :>

Anonymous said...

can't trust without source

Anonymous said...

yes ,Source code, plz :>
can't trust without source

Anonymous said...

Qt 4.3 is impressive, but things said by some people here are true. Can you benchmark, with the amazing results of Qt 4.3, results for Qt 3.3.4?

Stephan Sokolow said...

Can we get some de-spamming here? Looks like some CAPTCHA-proxying spambots passed through.

Anonymous said...

hi, is there any such comparison for cairo 1.4 vs. qt?

Qwertie said...

Wow, look at that spam. Anyway, Zack, I'm curious, what kind of algorithm does Qt use for polygon rendering? Is it faster than AGG?

AlexDexter said...

this is getting boring... can't you benchmark something in Qt which is way slower than some GTK/Gnome counterpart?

Anonymous said...

I have rewritten fancy gears in Java2D: http://trac-hg.assembla.com/jgears/wiki
Results of comparison: Java2D 6.10 with direct3d pipeline is slower than Qt pure software rendering!

Anonymous said...

How does QT+GL stack up against Cairo+GL today?

kaomet said...

Thanks for the post.

All the benchmarks were about a big or complicated polygon, are you sure the result would be similar with a lot of small polys instead? Usually, an algorithm that performs better asymptotically tends to be worse with small input sizes.