Friday, August 14, 2009

More 2D in KDE

An interesting question is: if the raster engine is faster at the vast majority of graphics operations, and the Qt X11 engine falls back to software for quite a few features anyway, why shouldn't we make raster the default until the OpenGL implementations and the OpenGL paint engine are stable enough?

There are two technical reasons not to. Actually, that's not quite true: there are more, but these two are the ones that will bother a lot of people.

The first is that we'd kill KDE performance over the network. Everyone who uses X/KDE over the network would suddenly find that their setup had become unusable: since the raster engine renders everything into an image on the client side, their sessions would be sending images for absolutely everything, all the time... As you can imagine, the institutions, schools and companies that use KDE exactly this way wouldn't be particularly impressed if a simple update suddenly rendered their installations unusable.
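To put rough numbers on that difference, here is a back-of-the-envelope sketch comparing the core X11 protocol cost of filling an area with a solid color against pushing the same area as an image. The sizes are approximations taken from the core protocol encoding (a 12-byte PolyFillRectangle header plus 8 bytes per rectangle; a 24-byte PutImage header plus 4 bytes per pixel for a 32-bpp image) and deliberately ignore BIG-REQUESTS, request splitting and any compression a transport such as NX or ssh might add.

```cpp
#include <cstdint>

// Approximate size of a core X11 PolyFillRectangle request:
// 12-byte header (opcode, length, drawable, GC) + 8 bytes per rectangle.
constexpr std::uint64_t polyFillRectangleBytes(std::uint64_t rectangles) {
    return 12 + 8 * rectangles;
}

// Approximate size of a core X11 PutImage request carrying a 32-bpp
// ZPixmap image: 24-byte header + 4 bytes per pixel.
// Ignores BIG-REQUESTS, request splitting and transport compression.
constexpr std::uint64_t putImageBytes(std::uint64_t width, std::uint64_t height) {
    return 24 + 4 * width * height;
}

// Compile-time check of the arithmetic for a 400x300 area.
static_assert(polyFillRectangleBytes(1) == 20, "one solid fill");
static_assert(putImageBytes(400, 300) == 480024, "same area as an image");
```

Under these assumptions, repainting a plain 400x300 background is about 20 bytes as a drawing request but about 480 kB as an image, and over the network that gap is paid on every repaint.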

The second reason is that I kinda like being able to read and sometimes even write text. Text tends to be pretty helpful during the process of reading. Emails and web browsing in particular get a lot easier with text, and I think a lot of people share that opinion with me. To render text we need fonts, which in turn are composed of glyphs, and to make text rendering efficient we need to cache those glyphs. With the X11 engine we render text using Xrender, which means there is a central process that can manage all the glyphs used by the applications running on a desktop: the X server. With the raster engine we take the X server out of the equation, and suddenly every single application on the desktop needs to cache the glyphs for all the fonts it's using. That means every application suddenly uses megs and megs of extra memory, because each one has to cache all the glyphs individually even if all of them use the same font. It tends to work OK for languages with few glyphs, e.g. English, which needs under a hundred (upper- and lowercase letters, digits and some punctuation). It doesn't work at all for languages with many more. So unless we decide that KDE can only be used by people whose languages have alphabets of about 30 letters or fewer, I'd hold off on making the raster engine the default.
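A rough model makes the scaling argument concrete. The sketch below is purely illustrative and rests on assumptions not in the post: each glyph is cached as an 8-bit alpha mask of a fixed 16x16 pixels, per-glyph metadata and packing overhead are ignored, and a CJK workload is assumed to touch 3,000 distinct glyphs. The point is the multiplication by the number of processes, not the exact byte counts.

```cpp
#include <cstdint>

// Hypothetical model: every glyph is cached as an 8-bit alpha mask of
// glyphWidth x glyphHeight pixels; metadata and packing overhead are ignored.
struct GlyphCacheModel {
    std::uint64_t glyphWidth;
    std::uint64_t glyphHeight;
    std::uint64_t glyphsPerFont;  // distinct glyphs actually rendered
    std::uint64_t fonts;          // font/size combinations in use
};

// Bytes for one copy of the cache. With Xrender this copy can live once in
// the X server and be shared by every client using the same fonts.
constexpr std::uint64_t cacheBytes(const GlyphCacheModel& m) {
    return m.glyphWidth * m.glyphHeight * m.glyphsPerFont * m.fonts;
}

// Bytes when each of `processes` applications keeps its own private copy,
// as with a client-side (raster) engine and no shared memory.
constexpr std::uint64_t privateCacheBytes(const GlyphCacheModel& m,
                                          std::uint64_t processes) {
    return cacheBytes(m) * processes;
}
```

With these assumptions, about 100 Latin glyphs in one font cost 25,600 bytes per process, while 3,000 CJK glyphs cost 768,000 bytes per font and size; twenty applications each holding that privately need about 15 MB instead of one shared copy, before counting bold, italic or extra sizes.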

While the latter problem could be solved with some clever use of shared memory, or by forcing Xrender for text on top of the raster engine (actually, I shouldn't state that as a fact: I haven't looked at font rendering in the raster engine in a while, and maybe that has been implemented lately), it's worth noting that the X11 engine is simply fast enough that a few frames one way or the other aren't worth worrying about. The few frames you'd gain would mean an unusable KDE for others.

And if you think neither of the above points bothers you and you'd still like to use the raster engine by default, you'll have to understand that I just won't post instructions on how to do that here. If you're a developer, you already know how to do it, and if not, there are trivial ways of finding out from the Qt sources. If you're not a developer, you really should stick to the defaults globally and simply test individual applications with the -graphicssystem switch.


Anonymous said...

In my case, and I'm sure I'm not a special one, raster gave me more than a few frames: applications like KPresenter, Amarok or even Dolphin are pretty much unusable without raster. Another example is Konversation: if you switch quickly between channels without raster, the application feels laggy :/

IMHO, if you can fix the memory issue with raster, we should move to it. The network issue could easily be handled in the startup process.

Anonymous said...

Gosh - the more I read about it, the more hopeless Qt4 performance seems. Between gobbling up and hanging onto MBs of memory per window for double-buffering that Qt3 didn't use (even for windows that have not had their contents changed for ages or which are invisible) and using more and more network-unfriendly strategies (raster, alien widgets) in an attempt to get some kind of usable speed ... well, it just seems like an out-of-control mess :(

I remember Trolltech claiming that they had benchmarks showing that Qt4 was "faster and lighter" than Qt3; I can't imagine what kind of unrealistic scenarios those benchmarks tested.

luca said...

I don't agree. I think users should have the possibility to choose (that's what I like about free software).
So I won't ask you to make raster the default, only to give users who know what they are doing (or who don't, but just want to do it anyway, like me) the possibility to use it.
It's not only a performance issue: I have to use raster at least in applications such as Konsole or Yakuake, because with X11 I hit an annoying font rendering bug.
I already change the .desktop files (bash helps me do this), but I have to change them again after every update, and above all it seems that applications restored from previous sessions don't automatically start with raster (and I don't know why).

Beat Wolf said...

OK, so a simple question: how can we improve the current situation? What would be the best path to follow to get better performance? Because it really could be better. To be honest, it's not bad, really not, but it's not very good either, just good enough that I don't get annoyed when using it (which is a bad thing: if it were worse, more people would look into fixing it).

Peppe said...

About the "networking" problem: aren't there modern solutions for that? Don't FreeNX and co. alleviate the problem?

Rasi said...

X11 is fast enough? Ever used the Plasma dashboard? It's horribly slow... It's much better with raster, though.

deabru said...

@Peppe: see this:

NX uses X11

jospoortvliet said...

@anonymous (2nd comment): Well, if Qt were such a mess as you say, wouldn't KDE work just as horribly on platforms other than Linux/X11? It doesn't; it works just fine on Windows, so it's clearly X11 that is at fault.

Gustavo Noronha said...

What about the GL engine? Will it not cause problems for people using KDE over the network? How will that work?

Anonymous said...

The X11 font rendering can be REALLY slow in various situations.
For me (with nvidia drivers) Konsole is "almost" unusably slow with the X11 graphics engine,
whereas the raster engine is lightning fast. And Konsole is all about font rendering.

So X11 font rendering is also a big bottleneck. At least for me.

I hope OpenGL gets better soon. Ditching X11 for local sessions would really be a *HUGE* win!

Chaz6 said...

I have a KDE4 desktop that I access using RDP, and it is pretty much unusable using either X11 or raster due to the amount of screen updates. I guess you are talking about people who use it with X directly.

Anonymous said...

How about copying the code for those fast software rendering operations into Xorg to make the X11 performance better?

Peter said...

For me, too, many programs are not usable without raster. It would be nice to be able to make all programs use raster through configuration, without changing any source code. If that is not possible, perhaps distributions could ship two different builds of Qt4 to choose from.

Kevin Colyer said...

Thanks for outlining the network issues. When I read your last post it made me understand the basic issues here.

The network use case is very important for me. I run an LTSP network, and will continue to do so. KDE 3.5 flew and rendered smoothly and well. Happy clients.

I finally upgraded and am now supporting both KDE4 and KDE3 on LTSP. KDE4's performance clearly suffers from sending pixmaps over the wire all the time. Of course, changing the widget and window decoration styles to Plastik helps hugely (but "pretty is a feature" really is important). One example of poor performance: clicking to open the Kickoff menu incurs a 4-5 second delay, while the classic menu takes a fraction of a second.

Of course Qt4 brings wonderful cross platform support and that is not to be sniffed at, but the 2D acceleration on Linux really is important to me as well, especially over the network.

Perhaps this should become a performance metric for KDE4? If we can get very efficient 2D acceleration over X, then we will have a snappy desktop. Perhaps OpenGL will help.

I tried running some apps like Kate and Kontact with OpenGL rendering (locally), and they worked, but with some major artifact problems to do with bad clipping (in fact, the icons and menu vanished altogether). Perhaps developing OpenGL 2D that way will bring a quicker long-term solution.

Thanks for bringing this up. I had been wondering why my perception of poor graphics performance in KDE4 (first local and now seen over the network) was so at odds with the statements of "QT4 is faster, and uses less memory" I kept reading from the devs.


Zack said...

Rapid fire responses:
@Anonymous 1st: yes, that might very well be an option. It's essentially a question of whether we can combine engines like that, what the hit in terms of network traffic is, and what the average speedup is. So it comes down to a few decent benchmarks. I'll try to write a blog post about benchmarking all this later.

@Anonymous 2nd: Qt4 is faster than Qt3. You're simply comparing apples and oranges: an anti-aliased, gradient-using, path-rendering API against a non-anti-aliased draw-line-or-pixmap API. If you use Qt4 with the same feature set as Qt3, Qt4 will be a lot faster.

@luca: You're just completely and utterly wrong. Bringing up freedom of choice in Free Software is especially pathetic. First of all, choice is only meaningful if you have enough knowledge to pick the option that is right for you. The fact that you don't even realize it already is an option means that you're not even close to possessing the basic knowledge required to make that decision. And my whole point is that it would be silly for me to openly explain how to do something that is wrong, which your response clearly proved.

@BeatWolf: The answer to your question is a lot like the one to Anonymous 1st, I'll write a bit about that soon.

@Rasi: Using it right now. Works great. (a.k.a. that was a useless, setup-specific statement)

@Gustavo Noronha: OpenGL has a network protocol. GLX.

@Chaz6: Yes, when I said X/KDE, I meant X/KDE not RDP/KDE. Those word thingies tend to be pretty crucial when it comes to understanding someone ;)

@Anonymous 4th: Look at my last blog post to read about the biggest bottleneck when it comes to graphics on the CPU.

@Kevin Colyer: I think this is largely another reason we need a good performance benchmarking suite for Qt graphics. It would drastically improve the situation all over the place. As I just wrote in my responses to Anonymous 1st and BeatWolf, I'll try to write up what we need there in a lot more detail later.

Linuxhippy said...

I was talking about qtconfig-qt4, and at least for this application (with the boring but fast windows/redmond theme), the X11 commands Qt generates are horrible.

It's mixing Render and core X11 drawing all the time, does a lot of state management (almost 50% of the requests are ChangeGC), does small XPutImage calls which stall the whole graphics pipeline, and calls e.g. PolySegment/Line, which cause large software fallbacks on all modern X drivers.

From my point of view, simply not much optimization or profiling was done when that Qt backend was written.

> Qt4 is faster than Qt3. You're simply
> comparing apples and oranges. You're
> comparing anti-aliased, gradient
> using, path rendering api to a
> non-anti-aliased draw line or pixmap
Even if I compare Qt3 against Qt4 with the boring windows/redmond theme instead of a fancy one (that theme doesn't use gradients or antialiasing at all and consists basically only of solid fills), the Qt3 window resizes way faster than the Qt4 one.

OpenGL or XRender, I don't care at all. At least on my hardware (i945GM with the latest UXA drivers, GeForce 6600 with the proprietary driver, Radeon HD 3850 with the radeon driver), Qt4 apps *feel* slower than Qt3 apps, and KDE4 *feels* slower than KDE3.
That's why I'm using the redmond theme: Plastik (which I prefer) and Oxygen are a pain.

- Clemens

lucky said...

Nice to have a summary from insiders.
I have a question: since people report graphical issues with Qt's OpenGL backend, is that also the case with the proprietary stacks instead of Mesa?
How does Qt perform when paired with the fglrx or nvidia drivers?
Because even though one would really prefer a fully open stack, saying that we'd better go for OpenGL in general largely implies good support from the driver vendors. And unfortunately, not all vendors behave the same with regard to Mesa.
How do those driver versions affect Qt development?

Zack said...

@linuxhippy: "Its mixing Render and core-X11-drawing all the time" I'm not sure why you think that's a problem. The obvious reason it does what it does is that drawing a line is faster than drawing a path (i.e. tessellating a very thin polygon and rendering that). So the things you're complaining about, "mixing of core and Xrender" and "draw line", were done precisely because your third statement is wrong: they were done because benchmarks showed they make an insane performance difference.

There were a number of blog posts that explained the window resizing behavior.
And honestly, whenever someone says "it feels slower/faster" I don't know whether to cry or laugh. Without a benchmark with numbers to back it up, and a profile I can look at, all I can do is pretend I care and interject with "I'm not seeing that", resulting in a stalemate.
I think this entire discussion is basically a result of our failure to prepare a decent real-world benchmarking suite. I'll sit down tomorrow and write about that a bit.

@lucky: The proprietary OpenGL drivers from NVIDIA and AMD are, in general, of somewhat higher quality (especially NVIDIA's) than our open drivers, so Qt on OpenGL currently works better on them. It's simply about the quality of the OpenGL implementation, not its origins.
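Zack's recurring point is that "feels slower" can't be acted on without numbers. As a minimal sketch of what a reproducible measurement looks like (not the benchmarking suite he promises to write about), the helper below times a callable with std::chrono::steady_clock after a few untimed warm-up runs and reports the median, which is less sensitive than the mean to scheduling hiccups. In a real Qt benchmark the callable would do something like repaint a widget under a given -graphicssystem; that wiring is an assumption left out so the sketch compiles without Qt. QTestLib's QBENCHMARK macro (Qt 4.5 and later) covers similar ground.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Time `work` over `iterations` runs (after `warmup` untimed runs) and
// return the median duration per run in microseconds. For an even number
// of samples this returns the upper of the two middle values.
double medianMicros(const std::function<void()>& work,
                    std::size_t iterations, std::size_t warmup = 3) {
    for (std::size_t i = 0; i < warmup; ++i)
        work();

    std::vector<double> samples;
    samples.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    if (samples.empty())
        return 0.0;

    // Partial sort: only the middle element needs to end up in place.
    const auto middle =
        samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), middle, samples.end());
    return *middle;
}
```

Reporting that median together with the hardware, driver and graphics system used is what turns "it feels slower" into something that can be compared and profiled.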

lucky said...

Sure, having a proper benchmark suite would certainly iron out those "communication failures".
Anyway, I was really interested in these threads, since we had an argument on ubuntu-fr about good/bad X performance.
I'm sure lots of people would be keen on testing and reporting breakages on their stack if they were provided with a nice set of tests to run.
As you said, the sooner everybody is using OpenGL, the sooner those discussions will just vanish.

Linuxhippy said...

Sorry for spamming your blog this much. It's just that I feel really inspired by the topic :)

1. Yes, a good benchmark would really help. It would be great if it had at least some compatibility with Qt3, to be able to compare e.g. the window resizing issue.

2. Yes, rendering many lines with PolySegments is fast. Rendering only a few lines on a surface with PolySegments causes a horribly slow fallback if the destination surface is currently in VRAM.
A lot has happened since XAA ;)

Furthermore, mixing XRender and core X11 almost doubles the number of state-management-related X requests.

At least for my XRender Java2D backend, I got way better performance when rasterizing lines with Bresenham and sending down the small rectangles, except for huge numbers of lines rendered in a batch. I plan to improve performance in the future with batching and a threshold to decide when to fall back to PolySegments.
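The approach described above (rasterize the line on the client, then send the pixels down as small solid rectangles) can be sketched as follows. This is a generic integer Bresenham for all octants that merges consecutive pixels on the same scanline into one rectangle; it is not taken from the Java2D backend, and the batching threshold for switching back to PolySegment is left out. The Rect type is a hypothetical stand-in for an XRectangle.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for an X11 XRectangle.
struct Rect { int x, y, w, h; };

// Append pixel (x, y), extending the previous rectangle when the pixel
// continues the same horizontal run in either direction.
static void addPixel(std::vector<Rect>& rects, int x, int y) {
    if (!rects.empty()) {
        Rect& r = rects.back();
        if (r.y == y && x == r.x + r.w) { ++r.w; return; }       // run grows right
        if (r.y == y && x == r.x - 1) { --r.x; ++r.w; return; }  // run grows left
    }
    rects.push_back({x, y, 1, 1});
}

// Integer Bresenham for all octants; returns one 1-pixel-high rectangle per
// horizontal run of the line from (x0, y0) to (x1, y1), endpoints included.
std::vector<Rect> lineToRects(int x0, int y0, int x1, int y1) {
    std::vector<Rect> rects;
    const int dx = std::abs(x1 - x0);
    const int dy = -std::abs(y1 - y0);
    const int sx = x0 < x1 ? 1 : -1;
    const int sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    for (int x = x0, y = y0;;) {
        addPixel(rects, x, y);
        if (x == x1 && y == y1)
            break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x += sx; }
        if (e2 <= dx) { err += dx; y += sy; }
    }
    return rects;
}
```

A shallow line such as (0,0)-(9,1) collapses into just two rectangles, which could then be batched into a single fill request; steep lines degrade to one 1x1 rectangle per row, which is one reason a threshold for falling back to PolySegment makes sense.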

Benoit said...


I'm the developer of the Gambas language, and I just want to share my experience with Qt3 & Qt4.

Gambas GUI uses the Qt library through an intermediate layer that makes it independent of the toolkit. But that's not the point here.

When I ported the Qt GUI component from Qt3 to Qt4, the Gambas IDE, which is written in Gambas, became slow and unusable (NVIDIA GeForce 8300GS / Core 2 Duo @ 2.33 GHz / Mandriva 2009.0 or 2009.1).

This was with older versions of Qt4.

The GUI became usable with Qt 4.5 only, but it is obviously slower than Qt 3 in drawing widgets.

Note that the performance improvements in Qt 4.5 are huge, as the remaining speed difference is not that significant.

I think that comparison is interesting, as I am using exactly the same code and logic. Just that Qt4 is used instead of Qt3.

I played with all the flags I could find (raster engine, disabling double-buffering...), and came to the following conclusions:

- Double-buffering slows widget redrawing a lot. Disabling it makes Gambas IDE/Qt4 almost as fast as Gambas IDE/Qt3.

- Disabling double-buffering makes Qt4 flicker more than Qt3. So Qt4 seems to draw more things than Qt3, but I may be wrong, as Qt3 does some double-buffering in some widgets.

- Using raster engine makes things faster, but some routines are buggy. For example blending an Image on another Image with a QPainter. But I have to investigate to understand what happened exactly.

- It seems that there is a bad interaction between XRender and NVidia too, because Plasma is slow as hell when resizing or moving things, unless I run it with the raster engine. But then I have sometimes drawing bugs or artifacts with some Plasma themes.

- Sometimes Qt4 forgets to redraw a rectangle, either in KDE apps or in Gambas. So I think there is a bug somewhere in the double-buffering. Alas I can't say more as it happens randomly and rarely.

- Qt4 is still faster than GTK+ in every GUI aspect I could try (Gambas has a GTK+ component too). But I think this is because GTK+ calls dozens of nested functions to do the simplest things all the time.

I think that Qt4 should disable double-buffering automatically when running over the network (if it is possible to detect that). And I don't see why it should flicker so much then: it should not flicker more than Qt3, should it?
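On whether an application can tell that it is running over the network: one crude heuristic, sketched below as a hypothetical helper (it is not a Qt API), is to inspect the DISPLAY string. ":0" and "unix:0" point at a local Unix socket, while "host:0" or "localhost:10.0" (the form ssh X forwarding typically sets) go through TCP. It can be fooled both ways: local TCP connections are counted as remote, and NX or VNC sessions usually look local to the application because their X server runs on the same machine as the client.

```cpp
#include <string>

// Hypothetical heuristic, not a Qt API: guess whether an X11 DISPLAY string
// ("[host]:display[.screen]") refers to a server reached over TCP.
// Errs towards "remote": any host part other than empty, "unix" or a
// filesystem path (as used by XQuartz launchd sockets) counts as remote.
bool displayLooksRemote(const std::string& display) {
    const std::string::size_type colon = display.rfind(':');
    if (colon == std::string::npos)
        return false;  // unparsable; treat as local
    const std::string host = display.substr(0, colon);
    if (host.empty() || host == "unix")
        return false;  // local Unix-domain socket
    return host[0] != '/';  // a path means a local socket too
}
```

A caller would typically feed it the result of std::getenv("DISPLAY") after checking that pointer for null, and only use the answer as a default that the user can still override.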

MrsCode said...

When it comes down to how responsive Qt feels/is, Qt3 beats Qt4.
And after all, the only thing that matters for a user interface is how it feels.
With Qt3, windows resize without any lag; everything is snappy and reacts immediately.

Even on my really fast desktop hardware with the latest Intel drivers (and also on an older machine with a GeForce 6800 and the proprietary driver) everything feels slower.

I had already gotten used to it, but I booted into an older distribution based on KDE 3.5 and was surprised how much better everything felt.

What I miss are benchmarks that measure what a human would notice.
And no, it is not the backend's fault: I tried raster, X11 and OpenGL, and all are slow.

Anonymous said...

Does OpenGL rendering work over the network, or is it a local solution only?

I'm using KDE4.3, and xrdp so that I can access my workstation from home (the rdp protocol is the only one allowed through the company SSL-VPN), and the rendering looks absolutely horrid. Try it out (apt-get both software packages) and avert your eyes in horror.

I'm probably being forced to switch to something else anyway, because the screensaver doesn't work and a workstation that won't lock itself after 15 minutes is seen as a security risk, but it's nonetheless annoying that I can't use KDE at work because rendering breaks when used over the network :-/.

Alejandro Nova said...

Zack, I'm revisiting this thread because I want you to see this pic. It's a common desktop: Amarok, a Plasmoid CPU meter, screwed-up systray icons (original systray spec) alongside correctly rendered ones (new systray spec). Nothing special about this pic... until you realize that I ran:

$ plasma-desktop --graphicssystem opengl &
$ amarok --graphicssystem opengl

This is the current status of Qt OpenGL. It can be slower than X11, but it idles at 0% CPU (as you can see in the plasmoid). It totally screws up rendering of systray icons made with the OLD spec. But IT WORKS. And I'm writing this in $ konqueror --graphicssystem opengl.

This definitely needs more testing, but this is the future. Will we see in Qt 4.6 some improvement here?

Zack said...

@MrsCode: and my favorite color is blue. a.k.a. as I've said many, many times, "perception" of what is faster is utterly useless; if you have a benchmark we can address it, otherwise I don't believe you.
@Anonymous: yes, it's GLX.
@Alejandro Nova: well yes, obviously the GL paint engine is being worked on daily. Also note that bugs should really be reported to Qt Software, not me.

Anonymous said...

Interesting: when we do show a benchmark, we are told "we will address it".

This is me in regard to polyline drawing performance...

Qt - 78 Milliseconds not antialiased
Qt - 10000 Milliseconds antialiased

Surely that delta is way off the scale?

This is Nokia :(

» Posted by sroedal on Monday, September 28, 2009 @ 08:06

@info, the line joins mean that the stroked polyline needs to be rendered as a whole, instead of rendering each line individually. Which in the case of a 1000-element polyline results in an outline path of 4000 elements, which then needs to be rasterized. Now, we use the Freetype rasterizer to do antialiased rasterizing, which is optimized for text rasterizing and performs best on paths without too many self-intersections. In this case the Freetype rasterizer is the bottleneck, spending 99 % of the run-time on generating spans to be blended. When disabling antialiasing we get ~60 ms per frame instead of ~10000 ms. It seems that the algorithmic complexity of antialiased rasterizing of paths is O(n^2) for complex paths with the Freetype rasterizer.

To solve this we could either replace the rasterizer, which is a pretty big job, or special-case the rendering of polylines with opaque pens when the pen width is small, rendering each line individually. The latter might lead to a slight degradation in quality at the line joins or at line intersections, however. We’ve been reluctant to do this so far since the work-around is very simple: use drawLines instead of drawPolyline, explicitly telling Qt that you want the lines to be rendered individually.

In other words, if Zack were still at Trolltech, he would have addressed this.
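The work-around sroedal describes amounts to splitting the polyline into independent segments before drawing. Below is a minimal, Qt-free sketch of that conversion; the Point and Line structs are hypothetical stand-ins so it compiles on its own. In actual Qt code the segments would go into a QVector<QLine> (or QVector<QLineF>) passed to QPainter::drawLines(). As he notes, segments drawn this way get no proper joins, so quality at the corners may degrade slightly.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for QPoint and QLine.
struct Point { int x, y; };
struct Line { Point p1, p2; };

// Split an open polyline p[0], p[1], ..., p[n-1] into n-1 independent
// segments, so each can be stroked on its own instead of as one joined
// outline path.
std::vector<Line> polylineToSegments(const std::vector<Point>& pts) {
    std::vector<Line> lines;
    if (pts.size() < 2)
        return lines;
    lines.reserve(pts.size() - 1);
    for (std::size_t i = 0; i + 1 < pts.size(); ++i)
        lines.push_back({pts[i], pts[i + 1]});
    return lines;
}
```

For the 1000-point polyline in the benchmark above this yields 999 segments, each of which the rasterizer can handle independently rather than as a single self-intersecting path.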

Dimitris Menounos said...

Hi, you mentioned that with the raster engine a lot more memory would be required as a result of every application caching the glyphs for all the fonts they're using.

I wonder if this also holds true for Qt on Windows, since it uses the raster engine? Moreover, why doesn't Qt/Windows use GDI instead of raster?