Friday, June 27, 2008

Accelerating desktops

In general I'm extremely good at ignoring emails and blog posts. Next to head-butting, it's one of the primary skills I've developed while working on Free Software. Today I will respond to a few recent posts (all at once, I'm a mass-market responder) about accelerating graphics.

Some kernel developers released a statement saying that binary blobs are simply not a good idea. I don't think anyone can argue with that. But this statement prompted a discussion about graphics acceleration, or more specifically about a certain vendor who is, allegedly, doing a terrible job at it.

First of all, the whole discussion is based on a fallacy, rendering even the most elaborate conclusions void. It's assumed that in our graphics stack there's a straightforward path between accelerating an API and getting fast graphics. That's simply not the case.

I don't think it's a secret that I'm not a fan of XRender. Actually, "not a fan" is an understatement: I flat out don't like it. You'd think that the fact that, 8 years after its introduction, we still don't have any driver that is actually good at accelerating that "simple API" would be a sign of something... anything. When we were making Qt use more of the XRender API, the only way we could do it was to have Lars and me go and rewrite the parts of XRender that we were using. So what happened was that instead of depending on XRender being reasonably fast, we rewrote the parts that we really needed (which is realistically just the SourceOver blending) and did everything else client side (meaning not using XRender).
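To make that concrete, here is a minimal sketch, in plain Xlib/XRender C, of the one operation we really lean on. The function name is made up, and `dpy`, `src` and `dst` are assumed to be created elsewhere:

    #include <X11/Xlib.h>
    #include <X11/extensions/Xrender.h>

    /* Sketch: an unscaled SourceOver blend, the XRender path Qt leans on.
     * Assumes dpy, src and dst were created elsewhere. */
    void blend_source_over(Display *dpy, Picture src, Picture dst,
                           int x, int y, unsigned int w, unsigned int h)
    {
        /* PictOpOver composites src onto dst using src's alpha channel;
         * None means no mask Picture is involved. */
        XRenderComposite(dpy, PictOpOver, src, None, dst,
                         0, 0,    /* source origin */
                         0, 0,    /* mask origin (unused) */
                         x, y,    /* destination origin */
                         w, h);
    }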

Now, going back to benchmarking XRender. Some people pointed out an application I wrote a while back to benchmark XRender: please do not use it to test the performance of anything. It does not correspond to any real workload. (Also, if you're taking something I wrote to prove some arbitrary point, it would likely be a good idea to ping me and ask about it. You know, on account of my having written it, I just might have some insight into it.) The thing about XRender is that there's a large number of permutations for every operation. Each graphics framework that uses XRender uses specific, defined paths. For example, Qt doesn't use server-side transformations (they were just pathetically slow, and we didn't feel it would be in the best interest of our users to make Qt a lot slower); Cairo does. Accelerating server-side transformations would make Cairo a lot faster and would have absolutely no effect on Qt. So whether those tests pass in 20ms or 20 hours has 0 (zero) effect on Qt performance.
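For readers wondering what a "server-side transformation" even is: the client hands the X server a matrix and asks it to resample the source, roughly like the sketch below (illustrative names; this is not Qt or Cairo code):

    #include <X11/Xlib.h>
    #include <X11/extensions/Xrender.h>

    /* Sketch: asking the server to scale a source Picture up by 2x.
     * XRender transforms map destination coordinates back into source
     * coordinates, hence the 0.5 entries. dpy and src assumed to exist. */
    void scale_server_side(Display *dpy, Picture src)
    {
        XTransform t = {{
            { XDoubleToFixed(0.5), XDoubleToFixed(0),   XDoubleToFixed(0) },
            { XDoubleToFixed(0),   XDoubleToFixed(0.5), XDoubleToFixed(0) },
            { XDoubleToFixed(0),   XDoubleToFixed(0),   XDoubleToFixed(1) }
        }};
        XRenderSetPictureTransform(dpy, src, &t);
        /* Pick a spec-defined filter; the server does the resampling. */
        XRenderSetPictureFilter(dpy, src, "bilinear", NULL, 0);
    }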

What I wanted to do with the XRender performance benchmarking application is basically have a list of operations that need to be implemented in a driver to make Qt, Cairo, or anything else using XRender fast. A "to make KDE fast, look at the following results" type of thing. So the bottom line is that if one driver has, for example, a result of 20ms for Source and SourceOver and 26 hours for everything else, and a second driver has 100ms for all operations, it doesn't mean that on average driver two is a lot better for running KDE; in fact, it likely means that running KDE will be five times faster on driver one.
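And since X is asynchronous, any per-operation number is meaningless unless the request pipeline is drained around the measurement. A sketch of the bare minimum, with made-up names:

    #include <time.h>
    #include <X11/Xlib.h>
    #include <X11/extensions/Xrender.h>

    /* Sketch: timing one XRender operation. XSync() forces the server
     * to finish all queued work before the clock is read; without it
     * you measure how fast requests are buffered, not executed. */
    double time_over_blends(Display *dpy, Picture src, Picture dst, int iters)
    {
        struct timespec t0, t1;
        XSync(dpy, False);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters; ++i)
            XRenderComposite(dpy, PictOpOver, src, None, dst,
                             0, 0, 0, 0, 0, 0, 256, 256);
        XSync(dpy, False);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }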

Closed source drivers are a terrible thing, and there are a lot of reasons why vendors would profit immensely from having open drivers (which is possibly a topic for another post). Unfortunately, I don't think that blaming driver writers for not accelerating a graphics stack which we went out of our way to make as difficult to accelerate as possible is a good way of bringing that point forward.

29 comments:

vdp said...

Ok, so XRender is crap (and has been for 8 years without a fix?), so we shouldn't blame vendors for not optimising it? And the fact that one vendor implemented just the wrong set of XRender primitives is just bad luck?

A fair point of view, but who do we blame^W^W^W^W where do we look to fix the performance issues? All discussions until now pointed towards the driver. Should we fix XRender instead? Qt? KDE?

Anonymous said...

Thanks for the insight, Zack.

Sooooooo ... how can I make my KWin on a beefy graphics card take less than a second to switch virtual desktops or resize a window? :)

Zack said...

eh,
@Moltonel: I don't think I've ever used the word "crap". And yes, we certainly shouldn't blame vendors for not optimizing it. And yes, which ops will yield an improvement when accelerated is almost random.
Performance affects the whole graphics stack. Every one of the pieces has to be in sync for desktops to perform well.
And by "all discussions until now" I'm assuming you mean "what people who don't understand how this stuff works have been saying", which I can understand.
There is no "fix that line in this piece of code and it all will work". It's a complicated topic with no simple answers, one that requires us to take a step back and redesign a few things.

@sebas: if you have an nvidia card then use GL, and if it's slow with GL then kwin is doing something wonky.

Anonymous said...

Wasn't Glucose or Xegl supposed to fix these problems? How is the state of these projects? And isn't it time to switch to a bug-fix-only phase in X.org and start working actively on Xegl?

Anonymous said...

I have found that KDE 4 is really snappy when I run it under Xgl. Unfortunately, to get kwin compositing to work you need to get Xgl to use the Mesa libGL and not the one provided by nvidia. I failed to do this 'cos I suck, but maybe others might have better luck.

But it worked fine without compositing last time I tried. Of course, then you have to suffer all the crappy things that come with Xgl.

Anonymous said...

@Zack: I'm using GL. And the performance problem is also there with compositing disabled. KWin is a lot snappier on my other machine with the ati driver.

Anonymous said...

I have similar problems to sebas.

nvidia geforce 8600m-gt (so I cannot switch...)

Terribly slow (not the 3D things, but all the normal desktop things). It really does feel slow. You can see it.

It is not that nvidia is not accelerating things; it is that their attempt to do things with acceleration makes everything slower.

Using vesa it is a lot faster. OK, I do not know anything about this XRender thing, and your benchmark is not really valid for that, but in most tests it is about 5-15 times slower than with the vesa driver.

Does that mean nothing? Well, I see how slow it is. Using the vesa driver is no real solution, as I cannot have any hardware OpenGL.

DanielW

Zack said...

@danielw: unfortunately, that means less than nothing. Not only is it useless, it's misleading. Benchmarking in general only makes sense if you know what you're testing. One can answer the question "why is my graphics slow" just as well as the question "why doesn't my computer work". There are a billion intermediate questions that have to be posed in order to diagnose anything.

In your and Sebas' cases the bottom line is this: the whole Trolltech engineering team is running machines with NVIDIA cards, so Qt is obviously running /very/ well with NVIDIA.
If that's not the case on your setup, then the question of whether you think you'd be better positioned to solve that problem if the driver were open is a whole different discussion.

Anonymous said...

So I have limited skills, but I'm *willing* to help solve this problem. A personal itch, and seeing that this ruins the KDE4 experience for a large part of our user base, are the main motivations.

Now, this "your benchmark is useless because you have no clue" is probably correct, but that's as far as it gets with my limited understanding of this situation.
And it's absolutely cool that Qt runs fine on TT's developer machines. I don't even know if it's a Qt problem, or if kwin is wonky, or if the stars are just not right (for a couple of months). And that is exactly the thing I want to find out.

It would be helpful to me if you were less defensive and tried to help me find out what the actual cause of this performance problem is. Maybe it's complicated, maybe it's easy -- but it is an important technical shortcoming right now. Tell me where to look, and I'll do that (within my possibilities).

Love.

Anonymous said...

@Zack:

Well, first: I do not blame you or TT for the problems. I am a little angry at nvidia, at least because they do not answer bug report mails.

The thing is, I do not know what to do; I feel helpless here.

I have been trying to figure this out for a while now (just finding out which cards are the ones with problems and so on, not technical things; I do not have the skills for that).

One thing I am quite sure of: every single guy with a Geforce 8600(m) has these problems. This fucking card is useless (at the moment); every 10-year-old card will do equally well.

If this were a desktop I would just get a new cheap card for about 30-40 EUR and be happy. But that is not possible. I paid too much for this useless thing. So I am angry. Not at you, do not get me wrong.

But maybe you have some hint what to do. If this benchmark is useless, OK... What else can I/we do to have something to go to nvidia with and say we have a problem?

Just saying "things are slow" doesn't help. Everything is fine under Windows, but that is not an option. I would rather sell this laptop and get a new one.

:-(

BTW: It is not only Qt. GTK2 is slower too, but not as bad.

daniels said...

Sentence one: I am awesome at ignoring emails.

Paragraph five: OH MY GOD I CAN'T BELIEVE PEOPLE DIDN'T EMAIL ME.

You are the reason free software graphics are the suck. Hang your head in shame.

Anonymous said...

Thanks for blogging this Zack.
All I can say wrt this: it seems like a lot of people reporting performance problems are using 8600s and other more recent cards. Those same problems usually can't be reproduced on far less capable hardware (e.g. my 7100, i9xx, etc.).


-Maks.

Anonymous said...

Nice, daniels. Are you an everyday troll or just a weekend one?

Anonymous said...

Um, I thought the idea was to find out just which call(s) are being slow? That seems like a perfectly good use of benchmarking software. Because otherwise, where do you even begin?

Anonymous said...

@Zack:
I think it's fine to blame the hardware vendors (not their poor developers). Nvidia has had bad performance regressions since their 8xxx cards, and it doesn't look as if it's the fault of a bad XRender API alone. If they had spent more resources on the driver, they wouldn't have had to release one with such results: http://www.nvnews.net/vbulletin/attachment.php?attachmentid=29257&d=1196719662 (and it's still unacceptable today). We would have transparency in Qt4: http://trolltech.com/developer/task-tracker/index_html?id=206998&method=entry and probably smooth video playback, like on their older cards.
Nvidia is just too slow about it and thus has to take most of the blame for a bad desktop experience.

@DanielW
"Every single guy with a Geforce 8600(m) has this problems. This fucking card is useless (at the moment) every 10 years old card will to equal good. "

Try nvidia-settings -a InitialPixmapPlacement=2 (usually a good speed improvement; this will someday be the default)
and in xorg.conf: Option "PixmapCacheSize" "200000" (this may help with pixmap-intensive apps)
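(For anyone unsure where that option goes, a sketch of the relevant xorg.conf section; the Identifier value is just a placeholder:)

    # Sketch: the option belongs in the nvidia Device section of xorg.conf.
    Section "Device"
        Identifier "nvidia0"
        Driver     "nvidia"
        Option     "PixmapCacheSize" "200000"
    EndSection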

"If this benchmark is useless ok.. What else can I/we do to have something to go do nvidia and say we have a problem?"

The Nvidia developers are aware of the problems; unfortunately, they can't split themselves... But you can always report bugs or send testcases to
linux-bugs@nvidia.com. Though an answer may come some weeks later, or not at all.

Jon Smirl said...

OpenGL|ES, OpenGL|ES, OpenGL|ES, .... If I say it enough times will it come true? Why do we need to make up our own API?

Zack said...

I didn't think this post would be so controversial, especially since my position on what we should do with our graphics stack has been known for at least two years.
@daniels: Taking something that is obviously a joke and trying to use that? Especially since you've used my email address before and you know that I always respond to questions.
As to it all being my fault: I think I've used the word "we", which certainly includes me, and yes, very likely a lot of the issues that we're having are a direct result of me not spending more time working to solve those problems. And the work that I'm doing right now is not visible at the moment, which is unfortunate, but it is what it is, and it certainly doesn't take away anything from this post.

Anonymous said...

This is one of the strangest blog posts I have seen in a long time. The author is pretty good at confusing people.

This is what I have understood from this post:

Does the problem with nvidia cards exist?
Yes.

Is it related to drivers?
Yes, but no one should blame the developers for that.

For God's sake, why?
Because they are good guys and XRender is crap.

So… why does Qt use XRender at all?
No, we are good guys too.

But xrenderbenchmark clearly shows that…
xrenderbenchmark is crap.

Is xrenderbenchmark crap?
No, I'm a good guy.

So, what is going on? I think many people just do not understand it.

Zack said...

@dmiceman: Why are you using the word "crap"? It's certainly not what I said. To answer your questions:
Is it related to drivers?
A: Yes, but it's a lot more complicated than that.

Why shouldn't we blame the developers?
A: You certainly can, but as I've mentioned, I think there's more than just the people at one company to blame in this case.

So… why does Qt use XRender at all?
A: For a number of reasons; the main one is decent support for anti-aliased fonts. The other is alpha-blending without the need to push raw images over the wire all the time. It's a trade-off between the features we needed and the performance we wanted to have. (There's a minimal sketch of that alpha-blending setup after this comment.)

But xrenderbenchmark clearly shows that…
A: I'm not sure what that means.

Is xrenderbenchmark crap?
A: It's certainly lacking tons of features, the main one being described in the post you've responded to. Adding that feature could potentially help vendors implement XRender, but, once more, long term I do not think that this is a good solution.

Hopefully that clears it up a bit.
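To make the alpha-blending answer above concrete, here is a minimal sketch of wrapping a depth-32 pixmap in an ARGB32 Picture so the server can blend it with per-pixel alpha. The names are made up, and this is not actual Qt code:

    #include <X11/Xlib.h>
    #include <X11/extensions/Xrender.h>

    /* Sketch: create an ARGB32 Picture over a depth-32 pixmap so the
     * server can alpha-blend it, instead of the client compositing and
     * pushing raw images over the wire. dpy and pixmap assumed to exist. */
    Picture make_argb_picture(Display *dpy, Pixmap pixmap)
    {
        XRenderPictFormat *fmt =
            XRenderFindStandardFormat(dpy, PictStandardARGB32);
        XRenderPictureAttributes attrs; /* defaults are fine here */
        return XRenderCreatePicture(dpy, pixmap, fmt, 0, &attrs);
    }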

Anonymous said...

@Zack:

Why are you using the word "crap"? It's certainly not what I said.

I'm sorry, it is sometimes a little bit difficult to feel a word's strength in a foreign language. If you tell me that this word is inapplicable here, I'll trust you. I did not intend to offend anyone.

But xrenderbenchmark clearly shows that…
A: I'm not sure what that means.

That is about the various test results circulating around these days, like this one: http://vizzzion.org/?blogentry=820

As far as I have understood you, these are worthless bench results.

Hopefully that clears it up a bit.

Yes, thank you very much.

P.S. Talking about a better overall solution, do you mean Glucose? That project seems to be well abandoned now…

Anonymous said...

So just to be clear, Zack, is this what you're saying?

XRender is so broken that NVIDIA shouldn't be expected to fix their drivers to work with it (even though everyone else has managed to), and we should just live with the terrible KDE4 experience for years until a replacement comes out?

I'm certainly willing to believe that XRender is bad, but I'm not exactly clear on what the point of the post was - unless there wasn't any point and you were just venting. Which is perfectly fine.

daniels said...

@Zack: I was trying to put it beyond doubt that I wasn't serious, but I guess I failed.

scroogie said...

I think a lot of people commenting miss the point here. As I read it, Zack didn't want to solve the whole problem of accelerating X graphics, but to advise against drawing wrong conclusions from results of the benchmark he wrote, which is perfectly understandable. So people, please step back and read the post again. Perhaps if we're nice enough, Zack will think about it and post a possible path to enhancement. I guess it would qualify as a PhD thesis to solve all the problems of graphics acceleration in the X architecture in one post, though.
So, what I learn from this post:
1. The architecture is complicated, and thus you cannot just fix point XYZ to get fast graphics. Permutations in operations may lead to contradictory or rivaling requirements.
2. Point 1 implies that a benchmark measuring single operations does not scale linearly (is that the correct word in English?).
3. The benchmark does not represent real-world application workloads. Cairo and Qt each use different subsets of the operations at different frequencies. So do not use the benchmark to measure desktop performance.

I don't want to say that this represents Zack's intention or opinion, but at least this is what I understood.

Anonymous said...

A bit off topic: Does Qt use XShm?
http://www.osnews.com/story/19935/Cairo_Xlib_and_the_Shared_Memory_Extension
Gtk does, according to that posting.
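For reference, the shared-memory path that article is about looks roughly like this; a sketch with error handling omitted and made-up names, not Qt's or Gtk's actual code:

    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <X11/Xlib.h>
    #include <X11/extensions/XShm.h>

    /* Sketch of the MIT-SHM path: the client and the X server share one
     * memory segment, so XShmPutImage avoids copying pixels through the
     * X socket. dpy, win, gc, visual and depth assumed to exist. */
    void put_shared_image(Display *dpy, Window win, GC gc,
                          Visual *visual, int depth)
    {
        XShmSegmentInfo shminfo;
        XImage *img = XShmCreateImage(dpy, visual, depth, ZPixmap,
                                      NULL, &shminfo, 512, 512);
        shminfo.shmid = shmget(IPC_PRIVATE,
                               img->bytes_per_line * img->height,
                               IPC_CREAT | 0600);
        shminfo.shmaddr = img->data = shmat(shminfo.shmid, NULL, 0);
        shminfo.readOnly = False;
        XShmAttach(dpy, &shminfo);   /* tell the server about the segment */

        /* ... draw into img->data here ... */

        XShmPutImage(dpy, win, gc, img, 0, 0, 0, 0, 512, 512, False);
        XSync(dpy, False);           /* make sure the server is done with it */

        XShmDetach(dpy, &shminfo);
        XDestroyImage(img);
        shmdt(shminfo.shmaddr);
        shmctl(shminfo.shmid, IPC_RMID, NULL);
    }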

Unknown said...

> For example, Qt doesn't use server-side
> transformations (they were just
> pathetically slow, and we didn't feel it
> would be in the best interest of our users
> to make Qt a lot slower); Cairo does

Cairo does not. That's why we've never accelerated server-side gradients in our driver -- nobody uses them. I'm hoping in a few months, once we get all this memory manager crap finally finished off, we'll finally get to fixing that on both sides at once.

A more apt criticism of XRender would have involved the tricky parts of the spec (source clipping versus transformations, filters versus gradients), which, due to not being implemented, have forced people to go do client-side implementations.

MaXX Desktop said...

Oh boy!

In my personal experience, NV Quadro cards have always performed better. Server-Overlay plays very nicely with the Qt/Quadro combo. Wish I could say the same about Gtk ;)

Again, solid performance and stability are to be expected from a card that costs a minimum of $500. You pay because you want something that works and to be able to call up the vendor when it doesn't. With the Quadro, Cuda and Tesla products, you do have support!

My 2 cents! There are always problems when you mix free-loaders into a commercial and open-source equation.

Zack, you got my vote!

Anonymous said...

Zack, can you elaborate on why OpenGL 3 is cool or not?

Unknown said...

GPU data compaction, expansion, and other algorithms can be transformed into gather-only algorithms - please take a look at
www.mpii.de/~gziegler
Maybe it helps; let me know if you have further questions :)

/Gernot (gz@geofront.eu)