Friday, August 14, 2009

2D in KDE

So it seems a lot of people are wondering about this. By this I mean why dwarfs always have beards. Underground, big ears would probably be a better evolutionary trait, but elves got dibs on those.

Qt, and therefore KDE, deals with 3 predominant ways of rendering graphics. I don't feel like bothering with transitions today, so find your own way from beards and dwarfs to Qt/KDE graphics. Those three ways are:
  • On the CPU with no help from the GPU using the raster engine
  • Using X11/Xrender with the X11 engine
  • Using OpenGL with the OpenGL engine
There are a couple of ways in which the decision about which of those engines to use is made.

First there's the default global engine. This is what you get when you open a QPainter on a QWidget and its derivatives. So whenever you have code like

void MyWidget::paintEvent(QPaintEvent *)
{
    QPainter p(this);
    ...
}

you know the default engine is being used. The rules for that are as follows:
  • GNU/Linux : X11 engine is being used
  • Windows : Raster engine is being used
  • Application has been started with -graphicssystem= option :
    • -graphicssystem=native the rules above apply
    • -graphicssystem=raster the raster engine is being used by default
    • -graphicssystem=opengl the OpenGL engine is being used by default
Furthermore depending on which QPaintDevice is being used, different engines will be selected. The rules for that are as follows:
  • QWidget the default engine is being used (picked as described above)
  • QPixmap the default engine is being used (picked as described above)
  • QImage the raster engine is being used (always, it doesn't matter what engine has been selected as the default)
  • QGLWidget, QGLFramebufferObject, QGLPixelBuffer the OpenGL engine is being used (always, it doesn't matter what engine has been selected as the default)
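The two sets of rules above can be written down as a small decision function. This is only a sketch of the rules as described in this post, not Qt's actual code: the Engine, Platform and Device enums and the defaultEngine and engineFor functions are hypothetical names made up for illustration.

```cpp
#include <string>

// Hypothetical model of the selection rules described above; none of these
// names are part of the Qt API.
enum class Engine { Raster, X11, OpenGL };
enum class Platform { Linux, Windows };
enum class Device { Widget, Pixmap, Image, GLWidget, GLFramebufferObject, GLPixelBuffer };

// Default (global) engine: the -graphicssystem option wins, otherwise the
// per-platform rule applies ("native" means "use the platform rule").
Engine defaultEngine(Platform platform, const std::string &graphicsSystem = "native")
{
    if (graphicsSystem == "raster")
        return Engine::Raster;
    if (graphicsSystem == "opengl")
        return Engine::OpenGL;
    return platform == Platform::Linux ? Engine::X11 : Engine::Raster;
}

// Engine actually used for a given paint device.
Engine engineFor(Device device, Engine globalDefault)
{
    switch (device) {
    case Device::Image:
        return Engine::Raster;   // always raster, whatever the default is
    case Device::GLWidget:
    case Device::GLFramebufferObject:
    case Device::GLPixelBuffer:
        return Engine::OpenGL;   // always OpenGL, whatever the default is
    case Device::Widget:
    case Device::Pixmap:
        break;                   // follow the global default
    }
    return globalDefault;
}
```

Under this model, engineFor(Device::Image, defaultEngine(Platform::Linux, "opengl")) still yields Engine::Raster, which is the point of the "always" in the list above.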
Now here's where things get tricky: if the engine doesn't support certain features it will have to fall back to the one engine that is sure to work on all platforms and has all the features required by the QPainter API - that is, the raster engine. This was done to ensure that all engines have the same feature set.

While the OpenGL engine should in general never fall back, that is not the case for X11, and there are fallbacks. One of the biggest immediate optimizations you can do to make your application run faster is to ensure that you don't have fallbacks. A good way to check for that is to export QT_PAINT_FALLBACK_OVERLAY and run your application against a debug build of Qt; this way the region which caused a fallback will be highlighted (the other method is to set a gdb breakpoint in QPainter::draw_helper). Unfortunately this will only detect fallbacks in Qt, not the ones that happen on the server side.

All of those engines also use drastically different methods of rendering primitives.
The raster engine rasterizes primitives directly.
The X11 engine tessellates primitives into trapezoids, that's because Xrender composites trapezoids.
The GL engine either uses the stencil method (described in this blog a long time ago) or shaders to decompose the primitives and the rest is handled by the normal GL rasterization rules.

Tessellation is a fairly complicated process (also described a long time ago in this blog). To handle degenerate cases the first step of this algorithm is to find the self-intersections of the primitive. In the simplest form, think about rendering a figure 8. There's no way of testing whether the given primitive is self-intersecting without actually running the algorithm.
To render with anti-aliasing on the X11 engine we have to tessellate. We have to tessellate because Xrender requires trapezoids to render anti-aliased primitives. So if the X11 engine is being used and the rendering is anti-aliased whether you're rendering a line, heart or a moose we have to tessellate.
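The figure 8 case is easy to reproduce. The sketch below checks a closed polygon for self-intersection by testing every pair of edges. Note that this is the brute force approach, used here only because it is short; it is not the algorithm Qt's tessellator uses. Point, cross, properlyCross and selfIntersects are names made up for this example.

```cpp
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Twice the signed area of triangle (a, b, c): positive if c lies to the
// left of the directed line a->b, negative if to the right, zero if on it.
double cross(Point a, Point b, Point c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// True if segments ab and cd cross at a single interior point. Segments
// that merely touch or share an endpoint are deliberately not counted.
bool properlyCross(Point a, Point b, Point c, Point d)
{
    return cross(a, b, c) * cross(a, b, d) < 0
        && cross(c, d, a) * cross(c, d, b) < 0;
}

// Brute force: compare every edge of the closed polygon with every other.
bool selfIntersects(const std::vector<Point> &poly)
{
    const std::size_t n = poly.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            if (properlyCross(poly[i], poly[(i + 1) % n],
                              poly[j], poly[(j + 1) % n]))
                return true;
        }
    }
    return false;
}
```

A bow-tie such as (0,0), (2,2), (2,0), (0,2) is reported as self-intersecting, while the square (0,0), (2,0), (2,2), (0,2) is not. Collinear overlaps are ignored in this sketch, which a real tessellator cannot afford to do.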

Someone was worried that it's an O(n^2) process, which is of course completely incorrect. We're not using a brute force algorithm here. The process is obviously O(n log n). O(n log n) complexity on the CPU side is something that both the raster and X11 engines need to deal with. The question is what happens next and what happens in the subsequent calls.
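To get a feel for the difference between the two growth rates, the sketch below compares the number of pair tests a brute force approach performs with an n log n estimate. Both functions are hypothetical helpers; the second one ignores constant factors and the number of intersections actually found, so treat it as an order-of-magnitude figure only.

```cpp
#include <cmath>
#include <cstdint>

// Pair tests performed by two nested loops over n segments.
std::uint64_t bruteForcePairs(std::uint64_t n)
{
    return n * n;
}

// Rough n * log2(n) estimate for a sweep-line style algorithm
// (constant factors ignored).
double sweepLineSteps(std::uint64_t n)
{
    if (n == 0)
        return 0.0;
    return static_cast<double>(n) * std::log2(static_cast<double>(n));
}
```

For 167 segments the nested loops perform 27889 tests, while the n log n estimate is a bit over 1200, and the gap widens quickly as n grows.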

While the raster engine can deal with all of it while rasterizing, the X11 engine can't. It has to tessellate, send the results to the server and hope for the best. If the X11 driver doesn't implement composition of trapezoids (which realistically speaking most of them don't) this operation is done by Pixman. In the raster engine the sheer spatial locality almost forces better cache utilization than what could be realistically achieved by the "application tessellation -> server rasterization" process that the X11 engine has to deal with. So without all-out acceleration the X11 engine can't compete with the raster engine in this case. While simplifying a lot, it's worth remembering that in terms of cycles a register access most likely costs 1 cycle or less, access to the L1 data cache is likely about 3 cycles, L2 is probably about 14 cycles, while main memory is about 240 cycles. So for CPU based graphics efficient memory utilization is one of the most crucial undertakings.

With that in mind, this is also the reason why a heavily optimized, purely software-based OpenGL implementation would be a lot faster than the raster engine is at 2D graphics. The OpenGL pipeline is simply a lot better at handling memory than the API QPainter provides.

So what you should take away from this is that if you're living in the perfect world, the GL engine is so much better than absolutely anything else Qt/KDE have that it's not even funny; X11 follows it and the raster engine trails far behind.

The reality you're dealing with is that when using the X11 engine, due to the fallbacks, you will also be using the raster engine (either on the application side with the Qt raster engine or on the server side with Pixman), and unfortunately in this case "the more the better" doesn't apply and you will suffer tremendously. Our X11 drivers don't accelerate chunks of Xrender and applications don't have good means of testing what is accelerated, so what Qt does is simply not use many of its features. So even if the driver would accelerate for example gradient fills and source picture transformations it wouldn't help you because Qt simply doesn't use them and always falls back to the raster engine. It's a bit of a chicken and an egg problem - Qt doesn't use it because it's slow, it's slow because no one uses it.

The best solution to that conundrum is to try running your applications with -graphicssystem=opengl and report any problems you see to both the Qt Software and the Mesa3D/DRI bugzillas. The only way out is to make sure that both our OpenGL implementations and the OpenGL usage in the rendering code on the application side are working efficiently and correctly. The quicker we get the rendering stack to work on top of OpenGL the better off we'll be.

28 comments:

Anonymous said...

So is there a way to make this the global default? ie put it in a . file or something ?

Socceroos said...

Hey Zack,

I *LOVE* this article. Very well told and very well explained.

Thanks for putting in the effort for this! =)

Anonymous said...

mmm
opengl is much slower to me for example opening dolphin in opengl takes like double time than raster
is this because of my nvidia card? or just because of unaccelerated opengl drawing?

Rudd-O said...

No KDE application that I tested works correctly with OpenGL graphicssystem. They all segfault on startup the minute they map a window. Kopete actually starts up, but when you click the icon to show the window, boom, dies.

Anonymous said...

Zack, do I understand this correctly?: If someone could provide an input library (keyboard, mouse) I could run QT/KDE without the whole X11 environment directly on MESA?

matthias

Albert Astals Cid said...

As the said "someone" please explain me if this is not O(n^2)

QList<QPolygonF> QPainterPath::toFillPolygons(const QTransform &matrix) const
...
for (int j=0; j<count; ++j) {
  if (subpaths.at(j).size() <= 2)
    continue;
  QRectF cbounds = bounds.at(j);
  for (int i=0; i<count; ++i) {
    if (rect_intersects(cbounds, bounds.at(i))) {
      isects[j] << i;
    }
  }
}

Because valgrind tells me that rect_intersects is executed 27889 times for my line with 167 dashes

Anonymous said...

Thanks for removing the crappy background.

And yeah, even for my simplest photo editing program, which rotates and scales Photos smoothly, I'm totally lost in speed on my Intel card when using OpenGL. Why is that?

You didn't list a QGraphicsScene, what is being used by default when painting on this?

Enrico Ros said...

What about a QPainter state tracker over Gallium? Will it be the best thing out there or is it possible to improve painting speed even more?

Blaž Tomažič said...

Albert Astals Cid: Your algorithm is O(n^2) but it's a brute force algorithm.

There are better algorithms for intersections based on sweep line algorithm:
http://en.wikipedia.org/wiki/Sweep_line_algorithm
Example for line intersection:
http://en.wikipedia.org/wiki/Bentley%E2%80%93Ottmann_algorithm

Panagiotis Papadakos said...

http://qt.nokia.com/developer/task-tracker/index_html?method=entry&id=208626

Linuxhippy said...

Hi Zack,

> So even if the driver would accelerate
> for example gradient fills and source
> picture transformations it wouldn't
> help you because Qt simply doesn't use
> them and always falls back to the
> raster engine. It's a bit of a chicken
> and an egg problem - Qt doesn't use it
> because it's slow, it's slow because no
> one uses it.

Basically this is a really stupid decision:
- Source picture transformation is now accelerated well by *all* major drivers.
- Gradients are still generated by pixman, but vram upload are optimized, so at least the composition step can be accelerated.

Anyway, client-fallbacks are probably the worst thing you can do. I am really sad QT doesn't do any better here - and to be honest it also has a bad influence on KDE4.
I'd just started Fedora8 with KDE-3.5, and was impressed how snappy the whole system felt compared to Fedora-11+KDE-4.3.

Why can't QT decide to fall back if render-version is less than xy, and use it otherwise.

Zack said...

Some quick comments:
@Anonymous: Sounds more like unaccelerated OpenGL to me.
@Rudd-O: This isn't the best place to get support. In general it sounds like your setup is busted. Either your distro via updates (e.g. installing Mesa3D libs over your proprietary NVIDIA/ATI drivers) or by you. You need to fix that first.
@Anonymous: Yes, that's true. Technically you can already do that with Qt Embedded and QWS.
@Albert Astals Cid: That "count" in that algorithm should be at most 3. However you got there is wrong. Also if it's the X11 engine that does it, it's obviously a bug. (But this forum also isn't the best place to get people to fix it, e.g. the intersection algorithm used by the tessellation algorithm should be used there.)
@Anonymous: A little hard to tell without any kind of profiling data. I'd bug Qt Software for a decent benchmarking utility ;)
QGraphicsScene will inherit the default from its view.
@Enrico: The way I'd phrase is that "it'd be the easiest to make it fast" on account of the fact that you're so close to the real hardware. But in general the small wins you'd get from state management wouldn't be worth the lost flexibility of being able to run on more than just hardware that has Gallium drivers, so OpenGL is probably a better choice for applications and Qt right now.
@blazt: That's correct. As mentioned Qt already uses them. The path Albert hits shouldn't be executed with that big of a count but either way as Panagiotis Papadakos pointed out the code in there should have been updated with the algorithm from qtessellator.cpp.

Anonymous said...

Hi Zack,

I am developing a C# application in Windows. The app was programmed to draw with GDI+, but since it had to draw more than 200,000 2D drawing primitives (lines, ellipses, bitmaps) it was very slow (it took several minutes to draw everything).

Because of that I decided to rewrite every 2D drawing primitive to OpenGL in 2D with a multilayer environment. It took me a week and the result is that now it draws everything in less than one second, so I agree with you and I recommend writing a QTOpenGLPainter (it could take three or four weeks) in order to accelerate drawing 2D primitives in QT.

I also think that QT should automatically select the drawing method (CPU, 2D accelerated or OpenGL).

Greetings

Albert Astals Cid said...

@blazt: It is not my algorithm, it's Qt one

Zack: I have code where that "count" is 3600, and it's your blog and of course you decide what you want on it, but don't do an ad hominem 'attack' to me and then complain because i answer.

Zack said...

@linuxhippy: When that decision was made none of the drivers supported it. It simply hasn't been reevaluated. It's just a question of whether it will be beneficial to majority of users who are running Qt/KDE, or whether raster engine is still faster at it. And that's of course left for benchmarking.

@Albert: I wasn't attacking you, in fact if I recall correctly I haven't even mentioned you in the post. To your problem though: if you state that the problem is intersection detection you can be 100% sure that everyone will assume you mean tessellation because that's the crucial path where that algorithm always matters. I'm not very good at reading minds and if you mean flattening of polygons when you talk about intersections in primitives then all I can do is throw my hands in the air and laugh. Oh, and I certainly haven't complained, again if I recall correctly (and my memory should be pretty clear given that the answer is about a page up) I even told you how to fix your specific problem.

Unknown said...

I would like to echo the first poster's question: can we change the default (particularly to use raster on Linux)?

Linuxhippy said...

> When that decision was made none of the
> drivers supported it. It simply hasn't
> been reevaluated.
I've filed an enhancement request, hopefully Nokia will reevaluate that soon.

> It's just a question of whether it will
> be beneficial to majority of users who
> are running Qt/KDE, or whether raster
> engine is still faster at it. And
> that's of course left for benchmarking.

It is for sure! Keep in mind that it's not just XRender vs. Raster, but XRender vs. (VRAM download (very slow) + Raster + VRAM upload + X11 transport overhead).
Most simple benchmarks do the same operation over and over, so a pixmap is migrated only once - but in the "real" world raster and xrender will mix quite often leading to horrible results.

Linuxhippy said...

Another point, OpenGL has a quite deep pipeline - highly optimized to get feed with large data submitted in batches.

2D in general is exactly the opposite - small one-by-one primitive calls, always changing rendering attributes (clip, color, ...) and so on.

Yes, OpenGL shines in some benchmarks like QGears2 or some imaging stuff, but I haven't seen *real* benchmarks running the OpenGL backend, doing what toolkits usually do - many many small operations.
And because QT doesn't change the way the 2D api looks, for OpenGL nothing changes.

I just know of Java2D that their (also very well performing) OpenGL backend doesn't perform that well for Swing interfaces - simply because all that small primitives are not worth to go through the long driver pipelines heavily tuned for games.
Have you tried to run ~20-30 OpenGL apps side-by-side? Horrible, and a huge resource waste.

So yes, there are workloads where the OpenGL has clearly a large advantage.
On the other side, recommending it for every and all apps is stupid.

Better address all those ugly performance problems in KDE4/QT4 caused by bad design - like all that SVG re-rendering when e.g. a folderview is resized (that has a horrible profile), or the backbuffer allocation storm when a window is resized.
Yes, some parts of XRender are ugly, but its not the only reason QT4 feels that slow.

Zack said...

@linuxhippy: I'm not quite sure what you mean by "deep pipeline" when referring to GL. When you're talking about the hardware then obviously Xrender acceleration wouldn't go through special magical paths in the hardware, it will use exactly the same things GL uses. So in that sense they're exactly the same.
If you're talking about state management that Mesa3D does before sending commands, then that model is obviously easier to implement for create/bind/destroy semantics, which would be preferable by the GPU's than what Xrender offers.

Also a small amount of data isn't a problem. Note that we know exactly when a scene finishes rendering (QPainter::end) and we can easily accumulate all data in buffers ready to upload if not already done so, then bind and render on QPainter::end. Nothing says that the QPainter API is synchronous (in fact it's one of the main reasons why most of the benchmarks are wrong).

Anonymous said...

Qt4 applications when started with -graphicssystem opengl become horribly and unusably slow at window drawing - tested with Designer and Assistant.

ATI x1650 (R500) card with open source radeon driver - KDE compositing enabled.

I know it's not a good place to get support and neither am I seeking it, but I just thought I'd make a note in case people get all excited about "fast" opengl graphics on Linux :p

Linuxhippy said...

Yes I was talking about state management, which is usually way more complex for OpenGL compared to XRender.

Sure you can buffer drawing commands, however I doubt it does help for typical 2d apps.

I recently ran wireshark to see what resizing qtconfig produces, and beside mixing Render and X11-core-drawing all the time, the command-sequence looks like this: CreatePixmap, ChangeGC, PolySegments (outch!), ChangeGC, RenderRectangles, ChangeGC, ChangeGC, XPutImage, CompositeText and so on.
There isn't a lot you can buffer, however complex state management may lower performance.

Please don't get me wrong, a GL backend is a great thing and it clearly has its advantages, especially for demanding apps, but for a typical UI with a few buttons I doubt it does any good.
However that case is never benchmarked :-/

And all the SVG rendering that slows plasma down, won't become magically faster by using GL ;)

Zack said...

@Linuxhippy: As mentioned the state management is actually a lot simpler to do with OpenGL for accelerating 2D than with Xrender (because Xrender doesn't have state persistence and you essentially need to cache and hash on every call).

GL will in fact magically improve the SVG rendering that slows Plasma down. The svgviewer example in Qt always allowed you to see the difference, it's huge :)

qtconfig is a qt3 application, so that's a little meaningless. Back in those days everyone did X rendering like that (even the deprecated qt4 port used the qt3support libs). There was no Xrender composition and the Xrender text rendering was shimmied on top of it much later.

Anonymous said...

"qtconfig is a qt3 application, so that's a little meaningless. Back in those days everyone did X rendering like that (even the deprecated qt4 port used the qt3support libs)."
Do you mean that qtconfig-qt4 is deprecated? If yes, what is it replaced with?

Anonymous said...

I have been doing some further tests of the polyline drawing using the raster engine because my customers are complaining about line drawing performance.

I created a harness in Qt and tested in all versions from 4.4.1 right up to the 4.6.0 tech preview and observed the same results.

The conclusion is that there needs to be some serious profiling put on the antialiasing routines.

My harness used GDI+ as an example. I plotted long polylines using the raster engine and GDI+

Here is a typical benchmark.

Take a 1000 point Polyline with GDI+ and Qt's DrawPolyLine routine with Antialiasing switched OFF.

GDI+ - 94 Milliseconds
Qt - 78 Milliseconds

All my tests clearly show that Qt is outperforming GDI+ in this area.

Same points but with Antialiasing switched ON

GDI - 172 milliseconds
Wait for it...
Qt - 10062 milliseconds

That's a whopping 10 seconds. All tests were conducted without having to clip lines (I wanted that out of the equation because apparently there is a bug regarding long polylines and clipping)

aviral said...

This might be slightly off topic but if you can help it would be great... For Particle Effects I can use QOpenGLWidget and I already have Qt Animation Framework for some animations on my QGraphicsView. Now, I want to have Particle effects in the background (coming from QGL) and continue using the anim framework on my graphicsView... I mean, can I paint QGL stuff on QGraphicsScene's background and retain the foreground with QGraphicsItems?

Anonymous said...

Ubuntu announced switching to Wayland, with Qt 4.8 the performance should go through the roof.