Qt, and therefore KDE, deals with 3 predominant ways of rendering graphics. I don't feel like bothering with transitions today, so find your own way from beards and dwarfs to Qt/KDE graphics. Those three ways are:
- On the CPU with no help from the GPU using the raster engine
- Using X11/Xrender with the X11 engine
- Using OpenGL with the OpenGL engine
First there's the default global engine. This is what you get when you open a QPainter on a QWidget and its derivatives. So whenever you have code like
void MyWidget::paintEvent(QPaintEvent *)
you know the default engine is being used. The rules for that are as follows:
- GNU/Linux : X11 engine is being used
- Windows : Raster engine is being used
- Application has been started with -graphicssystem= option :
- -graphicssystem=native the rules above apply
- -graphicssystem=raster the raster engine is being used by default
- -graphicssystem=opengl the OpenGL engine is being used by default
- QWidget the default engine is being used (picked as described above)
- QPixmap the default engine is being used (picked as described above)
- QImage the raster engine is being used (always, it doesn't matter what engine has been selected as the default)
- QGLWidget, QGLFramebufferObject, QGLPixelBuffer the OpenGL engine is being used (always, it doesn't matter what engine has been selected as the default)
While OpenGL engine should in general never fallback, that is not the case for X11 and there are fallbacks. One of the biggest immediate optimizations you can do to make your application run faster is to assure that you don't have fallbacks. A good way to check for that is to export QT_PAINT_FALLBACK_OVERLAY and run your application against a debug build of Qt, this way the region which caused a fallback will be highlighted (the other method is to gdb break in QPainter::draw_helper). Unfortunately this will only detect fallbacks in Qt.
All of those engines also use drastically different methods of rendering primitives.
The raster engine rasterizes primitives directly.
The X11 engine tessellates primitives into trapezoids, that's because Xrender composites trapezoids.
The GL engine either uses the stencil method (described in this blog a long time ago) or shaders to decompose the primitives and the rest is handled by the normal GL rasterization rules.
Tessellation is a fairly complicated process (also described a long ago in this blog). To handle degenerate cases the first step of this algorithm is to find intersections of the primitive. In the simplest form think about rendering figure 8. There's no way of testing whether the given primitive is self-intersecting without actually running the algorithm.
To render with anti-aliasing on the X11 engine we have to tessellate. We have to tessellate because Xrender requires trapezoids to render anti-aliased primitives. So if the X11 engine is being used and the rendering is anti-aliased whether you're rendering a line, heart or a moose we have to tessellate.
Someone was worried that it's a O(n^2) process which is of course completely incorrect. We're not using a brute force algorithm here. The process is obviously O(nlogn). O(nlogn) complexity on the cpu side is something that both the raster and X11 engines need to deal with. The question is what happens next and what happens in the subsequent calls.
While the raster engine can deal with all of it while rasterizing, the X11 engine can't. It has to tessellate, send the results to the server and hope for the best. If the X11 driver doesn't implement composition of trapezoids (which realistically speaking most of them doesn't) this operation is done by Pixman. In the raster engine the sheer spatial locality almost forces better cache utilization than what could be realistically achieved by the "application tessellate->server rasterization" process that the X11 engine has to deal with. So without all out acceleration in this case X11 engine can't compete with the raster engine. While simplifying a lot it's worth remembering that in terms of cycles register access is most likely smaller or equal to 1 cycle, access to L1 data cache is likely about 3 cycles, L2 is probably about 14 cycles, while the main memory is about 240 cycles. So for CPU based graphics efficient memory utilization is one of the most crucial undertakings.
With that in mind, this is also the reason why a heavily optimized purely software based OpenGL implementation would be a lot faster than raster engine is at 2D graphics. In terms of memory usage OpenGL pipeline is simply a lot better at handling memory than the API QPainter provides.
So what you should take away from this is that if you're living in the perfect world, the GL engine is so much better than absolutely anything else Qt/KDE have it's not even funny, X11 follows it and the raster engine trails far behind.
The reality with which you're dealing with is that when using the X11 engine, due to the fallback you will be also using the raster engine (either on the application side with Qt raster engine or the server side with Pixman) and unfortunately in this case "the more the better" doesn't apply and you will suffer tremendously. Our X11 drivers don't accelerate chunks of Xrender, the applications don't have good means of testing what is accelerated, so what Qt does is simply doesn't use many of its features. So even if the driver would accelerate for example gradient fills and source picture transformations it wouldn't help you because Qt simply doesn't use them and always falls back to the raster engine. It's a bit of a chicken and an egg problem - Qt doesn't use it because it's slow, it's slow because no one uses it.
The best solution to that conundrum is to try running your applications with -graphicssystem=opengl and report any problems you see to both Qt software and the Mesa3D/DRI bugzillas because the only way out is to make sure that both our OpenGL implementations and OpenGL usage in the rendering code on the applications side are working efficiently and correctly. The quicker we get the rendering stack to work on top of OpenGL the better off we'll be.