Friday, May 15, 2009

OpenGL ES

Two facts: one you didn't need to know, the other you should know. I'm the World Champion in being Vegetarian. Did you know that? Of course not, you probably didn't even know it's a competition. But it is. And I won it. My favorite vegetable? Nachos.
Technically they're a fourth- or fifth-generation vegetable. But vegetables are like software: you never want to work with them on the first iteration. You wouldn't want to eat raw corn, but mash it, broil/fry it and you've got something magical going on in the form of tortilla chips.

The other fact: OpenGL ES state trackers for Gallium just went public. That includes both OpenGL ES 1.x and 2.x. Brian just pushed them into the opengl-es branch. Together with OpenVG they should be part of the Mesa3D 7.6 release.

At this point Mesa3D almost becomes the Khronos SDK and hopefully soon we'll add an OpenCL state tracker and start working on bumping the OpenGL code to version 3.1.

Sunday, May 10, 2009

OpenVG Release

I've been procrastinating lately. Where I define procrastination as: doing actual work instead of writing blog entries. "Can you do that? Redefine words because you feel like it?" Oh, sure you can. As long as you don't care about anyone understanding what you're saying, it's perfectly fine. In fact it's good for you: the fewer people understand you, the more likely it is that they'll classify you as a genius. Life is awesome like that. That's right, you come here for technical stuff and get blessed with some serious soul searching, enjoy.

We have released the OpenVG state tracker for Gallium.

It's a complete implementation of OpenVG 1.0. While my opinion on 2D graphics and the long-term usefulness of OpenVG changed drastically during the last year, I'm very happy we were able to release the code. After the Mesa3D 7.5 release we're going to merge it into master, and 7.6 will be the first release that includes both OpenGL and OpenVG.
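If you've never seen the API, here's roughly what client code looks like - a minimal sketch that draws one red rectangle, assuming a context and drawing surface already exist (via EGL or whatever glue your platform provides):

#include <VG/openvg.h>
#include <VG/vgu.h>

/* Minimal sketch: assumes an OpenVG context and surface are already current. */
void draw_rect(void)
{
    VGPath path = vgCreatePath(VG_PATH_FORMAT_STANDARD, VG_PATH_DATATYPE_F,
                               1.0f, 0.0f, 0, 0, VG_PATH_CAPABILITY_ALL);
    vguRect(path, 10.0f, 10.0f, 200.0f, 100.0f);   /* append a rectangle to the path */

    VGPaint fill = vgCreatePaint();
    VGfloat red[4] = { 1.0f, 0.0f, 0.0f, 1.0f };
    vgSetParameteri(fill, VG_PAINT_TYPE, VG_PAINT_TYPE_COLOR);
    vgSetParameterfv(fill, VG_PAINT_COLOR, 4, red);
    vgSetPaint(fill, VG_FILL_PATH);

    vgDrawPath(path, VG_FILL_PATH);                /* rasterized by the Gallium driver */

    vgDestroyPaint(fill);
    vgDestroyPath(path);
}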


[You're exuberant about that]. I figured I'd start including stage directions for the blog so that you know how you feel about reading this.

I don't have much to write about the implementation itself. If you have a Gallium driver, then you're good to go and have accelerated 2D and 3D. It's great to see more state trackers being released for Gallium and the whole concept of "accelerating multiple APIs" on top of one driver architecture becoming a reality. [You love it and, with a smile, wave bye]

Monday, March 23, 2009

Intermediate representation

I've spent the last two weeks working from our office in London. The office is right behind the Tate Modern, with a great view of London, which is really nice. While there I was mainly trying to get the compilation of shaders rock solid for one of our drivers.

In Gallium3D we're currently using the TGSI intermediate representation. It's a fairly neat representation with instruction semantics matching those from the relevant GL extensions/specs. If you're familiar with D3D you know that there are subtle differences between seemingly similar instructions in the two APIs. Modern hardware tends to follow the D3D semantics more closely than the GL ones.

Some of the differences include:
  • Indirect addressing offsets: in GL offsets are between -64 and +63, while D3D wants them >= 0.
  • Output to temporaries. In D3D certain instructions can only output to temporary registers while TGSI doesn't have that requirement.
  • Branching. D3D has both if_comp and setp instructions which can perform essentially any of the comparisons, while we imitate them with an SLT (set on less than) or similar followed by an IF instruction with the semantics of the one in GL_NV_fragment_program2.
  • Looping.
None of this is particularly difficult, but we had to implement almost exactly the same code in another driver of ours and that makes me a little uneasy. There are a number of ways to fix this. We could change the semantics of TGSI to match D3D, we could implement transformation passes that operate on TGSI and do this conversion directly on the IR before it gets to the driver, or we could radically change the IR. Strangely enough it's option #3, which while appearing the most bizarre, is the one I like the most, because it goes hand in hand with the usage of LLVM inside Gallium.
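To make option #2 a bit more concrete, here's roughly what such a pass looks like. The types, opcodes and helpers below are invented and simplified for illustration; they are not the real TGSI structures:

/* Invented, simplified IR types for illustration only. */
enum reg_file { FILE_TEMP, FILE_OUTPUT };
enum { OP_MOV = 1 };

struct reg  { enum reg_file file; int index; };
struct inst { int opcode; struct reg dst; struct reg src[3]; };

static struct inst make_mov(struct reg dst, struct reg src)
{
    struct inst mov = { OP_MOV, dst, { src } };
    return mov;
}

/* Hypothetical pass for the "output to temporaries" difference: anything
 * that writes straight to an output (other than a MOV) gets redirected to a
 * fresh temporary, followed by an explicit MOV temp -> output. */
static int
lower_output_writes(const struct inst *in, int count, struct inst *out, int next_temp)
{
    int n = 0;
    for (int i = 0; i < count; ++i) {
        struct inst cur = in[i];
        if (cur.dst.file == FILE_OUTPUT && cur.opcode != OP_MOV) {
            struct reg orig_dst = cur.dst;
            struct reg tmp = { FILE_TEMP, next_temp++ };
            cur.dst = tmp;
            out[n++] = cur;
            out[n++] = make_mov(orig_dst, tmp);
        } else {
            out[n++] = cur;
        }
    }
    return n;   /* number of instructions in the rewritten program */
}

Now imagine every driver carrying its own copy of half a dozen passes like that, and the appeal of doing it once, on a shared IR, becomes obvious.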

That's especially true because there are a lot of generic transformations that really should be done on the IR itself. A good example is code injection due to missing pieces of API state on a given GPU. Take shadow texturing: the compare mode, compare function and shadow ambient value. If the samplers on a given piece of hardware don't support that state, it has to be emulated by injecting code after every sampling operation which uses a sampler that has that state set on it.

Unfortunately right now the LLVM code in Gallium is far from finished. Due to looming deadlines on various projects I never got to spend the amount of time that is required to get it in shape.
I'll be able to piggyback some of the LLVM work on top of the OpenCL state tracker code, which is exciting.

Also, has anyone noticed that the last two blogs had no asides or any humor in them? I'm experimenting with a "just what I want to say and nothing more" style of writing, wondering if it will indeed be easier to read.

Thursday, March 12, 2009

KDE graphics benchmarks

This is really a public service announcement. KDE folks, please stop writing "graphics benchmarks". It's especially pointless if you qualify your blog/article with "I'm not a graphics person, but...".

What you're doing is:

timer start
issue some draw calls
timer stop

This is completely and utterly wrong.
I'll give you an analogy that should make it a lot easier to understand. Let's say you have a 1Mbit and a 100Mbit LAN and you want to write a benchmark to compare how fast you can download a 1GB file on both of them, so you do:

timer start
start download of a huge file
timer stop

Do you see the problem? Obviously the file hasn't been downloaded by the time you stopped the timer. What you effectively measured is the speed at which you can execute function calls.
And yes, while your original suspicion that the 100Mbit line is a lot faster is still likely true, your test in no way proves that. In fact it does nothing short of making some poor individuals very sad about the state of computer science. Also, "so what that the test is wrong, it still feels faster" is not a valid excuse, because the whole point is that the test is wrong.

To give your tests some substance, always make your application run for at least a few seconds and report the frames per second.
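In practice that means something like the sketch below: render in a loop for several seconds, force each frame to actually complete (a buffer swap will do that), and only then divide frames by elapsed time. draw_frame() and swap_buffers() are placeholders for your real rendering and glXSwapBuffers()/equivalent:

#include <stdio.h>
#include <sys/time.h>

extern void draw_frame(void);     /* your rendering code            */
extern void swap_buffers(void);   /* glXSwapBuffers() or equivalent */

static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1000000.0;
}

void benchmark(void)
{
    const double duration = 5.0;   /* run for at least a few seconds */
    int frames = 0;
    double start = now_seconds();

    while (now_seconds() - start < duration) {
        draw_frame();
        swap_buffers();            /* forces the frame to be finished, not just queued */
        ++frames;
    }

    printf("%.1f frames per second\n", frames / (now_seconds() - start));
}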

Or even better, don't write them. Somewhere out there, some people who actually know what's going on have those tests written. And those people, who just happen to be "graphics people", have reasons for making certain things the default. So while you may think you've made this incredible discovery, you really haven't. There's a kde-graphics mailing list where you can pose graphics questions.
So, to the next person who wants to write a KDE graphics related blog/article: please, please go through the kde-graphics mailing list first.

Monday, February 09, 2009

Video and other APIs

Today I read a short article about video acceleration in X. First of all I was already unhappy because I had to read. Personally I think all the news sites should come in comic form. With as few captions as possible. Just like Perl, I'm a write-only entity. Then I read that Gallium isn't an option when it comes to video acceleration because it exposes a programmable pipeline, and I got dizzy.

So I got off the carousel and I thought to myself: "Well, that's wrong", and apparently no one else got that. Clearly the whole "we're connected" thing is a lie because it doesn't matter how vividly I think about stuff, others still don't see what's in my head (although to the person who sent me cheese when I was in Norway - that was extremely well done). So cumbersome as it might be I'm doing the whole "writing my thoughts down" thing. Look at me! I'm writing! No hands! sdsdfewwwfr vnjbnhm nhn. Hands back on.

I think the confusion stems from the fact that the main interface in Gallium is the context interface, which does in fact model the programmable pipeline. Because of the incredibly flexible nature of the programmable pipeline a huge set of APIs is covered just by reusing the context interface. But in modern GPUs there are still some fixed function parts that are not easily addressable by the programmable pipeline interface. Video is a great example of that. To a lesser degree so is basic 2D acceleration (lesser because some of the modern GPUs don't have 2D engines at all anymore).

But, and it's a big but ("and I cannot lie" <- song reference, pointed out to let everyone know that I'm all about music), nothing stops us from adding interfaces which deal exclusively with the fixed-function parts of modern GPUs. In fact it has already been done, as work on a simple 2D interface has already started.

The basic idea is that state trackers which need some specific functionality use the given interface. For example the EXA state tracker would use the Gallium 2D interface instead of the main context interface. In this case the Gallium hardware driver has a choice: it can either implement the given interface directly in hardware, or it can use the default implementation.

The default implementation is something Gallium will provide as part of the auxiliary libraries; it will use the main context interface to emulate the entire functionality of the other interface.
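To sketch the idea (the type and function names here are invented for illustration, not the actual Gallium headers): a small fixed-function interface is basically a table of function pointers, and the auxiliary library can fill that table with routines which drive the ordinary 3D context.

/* Invented names, for illustration only -- not the real Gallium headers. */
struct pipe_context;                 /* the regular Gallium 3D context */
struct g2d_surface;

struct g2d_context {
    void (*fill_rect)(struct g2d_context *g2d, struct g2d_surface *dst,
                      int x, int y, int w, int h, unsigned argb);
    void (*copy_rect)(struct g2d_context *g2d, struct g2d_surface *dst, int dx, int dy,
                      struct g2d_surface *src, int sx, int sy, int w, int h);
};

/* A driver with a real 2D engine returns its own implementation; everyone
 * else gets this default, which turns the calls into textured quads/blits
 * issued through the 3D context. */
struct g2d_context *g2d_create_default(struct pipe_context *pipe);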

A video decoding framework would use the same semantics: an additional interface (or interfaces) with a default implementation on top of the 3D pipeline. Obviously some parts of video support are quite difficult to implement on top of the 3D pipeline, but the whole point is this: for hardware that supports it you get the whole shabangabang, for hardware that doesn't you get a reasonable fallback. Plus in the latter case the driver authors don't have to write a single line of hardware-specific code.

So a very nice project for someone would be to take VDPAU, VA-API or any video framework of your choice, implement a state tracker for that API on top of Gallium, and design the interface(s) that could be added to Gallium so the API can make full use of the fixed-function video units found in GPUs. I think this is the way our XvMC state tracker is heading.
This is the moment where we break into a song.

Wednesday, February 04, 2009

Latest changes

I actually went through all my blog entries and removed spam. That means that you won't be able to find any more links to stuff that can enlarge your penis. I hope this action will not shatter your lives and you'll find consolation in all the spam that you're getting via email anyway. And if not, I saved some of the links. You never know, I say.
I also changed the template. I'd slap something ninja-related on it, but I don't have anything that fits. Besides, nowadays everyone is a graphics ninja. I'm counting the hours until the term gets added to the dictionary. So my new nickname will be "The Lost Son of Norway, Duke of Poland and King of England". Aim high is my motto.

As a proud owner of exactly zero babies I got lots of time to think about stuff. Mostly about squirrels, goats and the letter q. So I wanted to talk about some of the things I've been thinking about lately.

Our friend ("our" as in the KDE communities, if you're not a part of it, then obviously not your friend, in fact he told me he doesn't like you at all) Ignacio CastaƱo has a very nice blog entry about 10 things one might do with tessellation.
The graphics pipeline continues evolving and while reading Ignacio's entry I realized that we haven't been that good about communicating the evolution of Gallium3D.
So here we go.

I've been slowly working towards support for geometry shaders in Gallium3D. Interface-wise the changes are quite trivial, but a bigger issue is that some (quite modern) hardware, while perfectly capable of emitting geometry in the pipeline, is not quite capable of supporting all of the features of the geometry shader extension. The question of how to handle that is an interesting one, because simple emission of geometry via a shader is a very desirable feature (path rendering in OpenVG, for example, would profit from it).

I've been doing some small API cleanups lately. José made some great changes to the concept of surfaces, which became pure views on textures. As a follow-up, over the last few days we have dissociated buffers from them to make that really explicit. It gives drivers the opportunity to optimize a few things and, with some changes Michel is working on, to avoid some redundant copies.

A lot of work went into the winsys. The winsys, whose name is a bit of a misnomer, was a part of Gallium that did too much: it was supposed to be a resource manager, handle command submission and handle integration with windowing systems and OSes. We've been slowly chopping parts of it away, making it a lot smaller, and over the weekend we managed to hide it completely from the state tracker side.

Keith extracted the Xlib and DRI code from the winsys and put it into separate state trackers, meaning that, just like the WGL state tracker, the code is actually sharable between all the drivers. That is great news.

Brian has been fixing so many bugs and implementing so many features that people should start writing folk songs about him. Just the fact that we now support GL_ARB_framebuffer_object deserves at least a poem (not a big one, but certainly none of that white, non-rhyming stuff; we're talking full-fledged rhymes and everything... You can tell that I know a lot about poetry, can't you).

One thing that never got a lot of attention is that Thomas (who did get one of them baby thingies lately) released his window system buffer manager code.

Another thing that didn't get a lot of attention is Alan's xf86-video-modesetting driver. It's a dummy X11 driver that uses DRM for modesetting and Gallium3D for graphics acceleration. Because of that it's hardware independent, meaning that all hardware that has a DRM driver and Gallium3D driver automatically works and is accelerated under X11. Very neat stuff.

Alright, I feel like I'm cutting into your youtube/facebook time and like all "Lost Sons of Norway, Dukes of Poland and Kings of England" I know my place, so that's it.

Sunday, February 01, 2009

OpenCL

I missed you so much. Yes, you. No, not you. You. I couldn't blog for a while and I ask you (no, not you): what's the point of living if one can't blog? Sure, there's the world outside of computers, but it's a scary place filled with people that, god forbid, might try interacting with you. Who needs that? It turns out that I do. I've spent the last week in Portland at the OpenCL working group meeting, which was part of the Khronos F2F.

For those who don't know (oh, you poor souls) OpenCL is what could be described as "the shit". That's the official spelling, but substituting the word "the" for "da" is considered perfectly legal. The longer description includes the expansion of the term OpenCL to "Open Computing Language", with an accompanying Wikipedia entry. OpenCL has all the ingredients, including the word "Open" right in the name, to make it one of the most important technologies of the coming years.

OpenCL allows us to tap into the tremendous power of modern GPUs. Not only that, but one can also use OpenCL with accelerators (like physics chips or the Cell SPUs) and CPUs. On top of that hardware OpenCL provides both task-based and data-based parallelism, making it a fascinating option for those who want to accelerate their code. For example if you have a canvas (Qt GraphicsView) and you spend a lot of time doing collision detection, or if you have an image manipulation application (Krita) and you spend a lot of time in effects and general image manipulation, or if you have a scientific chemistry application with an equation solver (Kalzium) and want to make it all faster, or if you have wonky hair and like to dance polka... OK, the last one is a little fuzzy but you get the point.

Make no mistake, OpenCL is a little bit more complicated than just "write your algorithm in C". Albeit well hidden, the graphics pipeline is still at the forefront of the design, so there are some limitations (remember that for a number of really good and a few inconvenient reasons GPUs do their own memory management, so you cannot just move data structures between main and graphics memory). It's one of the reasons you won't see a GCC-based OpenCL implementation any time soon. OpenCL requires run-time compilation and execution, it allows sharing of buffers with OpenGL (e.g. the OpenCL image data type can be constructed from GL textures or renderbuffers) and it forces code generation to a number of different targets (GPUs, CPUs, accelerators). All those things need to be integrated. For sharing of buffers between OpenGL and OpenCL the two APIs need to go through some kind of common framework - be it a real library or some utility code that exposes addresses and their meaning to both the OpenGL and OpenCL implementations.
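To give a feel for the flavor of the thing, here's a minimal host-side sketch (error checking stripped, simplest possible device selection; treat it as an illustration rather than production code):

#include <CL/cl.h>

static const char *src =
    "__kernel void scale(__global float *data, float factor) {"
    "    size_t i = get_global_id(0);"
    "    data[i] *= factor;"
    "}";

void run(float *host_data, size_t n)
{
    cl_platform_id platform;   clGetPlatformIDs(1, &platform, NULL);
    cl_device_id   device;     clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context       ctx   = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);          /* run-time compilation */
    cl_kernel kernel = clCreateKernel(prog, "scale", NULL);

    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                n * sizeof(float), host_data, NULL);
    float factor = 2.0f;
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kernel, 1, sizeof(float), &factor);

    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, n * sizeof(float), host_data, 0, NULL, NULL);
}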

Fortunately we already have such a common layer. Gallium3D maps perfectly to the buffer and command management in OpenCL, which shouldn't be surprising given that they both care about the graphics pipeline. So all we need is a new state tracker with a compiler framework integrated to parse and generate code from the OpenCL C language. LLVM is the obvious choice here because, unlike GCC, LLVM has libraries that we can use for both (to be more specific, Clang and LLVM proper). So yes, we've started on an OpenCL state tracker, but of course we are far, far away from actual conformance. Being part of a large company means that we have to go through extensive legal reviews before releasing something in the open, so right now we're patiently waiting.
My hope is that once the state tracker is public the work will continue at a much faster pace. I'd love to see our implementation pass conformance with at least one GPU driver by summer (which is /hard/ but definitely not impossible).

Thursday, August 28, 2008

SVG in KDE

"Commitment" is one of the words that have never been used in this blog. Which is pretty impressive given that I've managed to use such words as sheep, llamas, raspberries, ninjas, donkeys, crack or woodchuck quite extensively (especially impressive in a technology centric blog).

That's because commitment implies that whatever it is one is committed to plays an important role in their life. It's a word that goes beyond the paper or the medium on which it was written. It enters the cold reality that surrounds us.

But today is all about commitment. It's about the commitment that KDE made to a technology broadly referred to as Scalable Vector Graphics. I took some time off this week and came to Germany, where I talked about the usage of SVG in KDE.

The paper about, what I like to call, the Freedom of Beauty, is available here:

https://www.svgopen.org/2008/papers/104-SVG_in_KDE/

It talks about the history of SVG in KDE and the rendering model used by KDE, lists the ways in which we use SVG, and finally shows some problems which have been exposed by such diverse usage of SVG in a desktop environment. Please read it if you're interested in KDE or SVG.

Hopefully this paper marks the start of a more proactive role KDE is going to play in the shaping of the SVG standard.

Tuesday, August 26, 2008

Fixes in Sonnet

As we all know inner beauty is the most important kind of beauty. Especially if you're ugly. Not ugly, don't sue me, I meant to say "easy on the eyes challenged". That's one of the reasons I like working on frameworks and libraries. It's the appeal of improving the inner beauty of certain things. I gave up on trying to improve the inner beauty of myself (when I was about 1) so this is the most I can do.

You can do it too. It's real easy. I took this week off because I'm going to Germany for SVG Open, where I'll talk about SVG in KDE, and today I fixed a few irritating bugs in Sonnet.

One of the things that bugged me for a while was the fact that we kept marking misspelled text in red instead of using the God-given red squiggly underline. Well, I say no more!
Our spelling dialog lists available dictionaries now and one can change them on the fly. That's good. Raspberries good. And raspberries are pretty darn good. Even sheep like raspberries. Or so I think; the only sheep I've ever seen was from the window of a car and it looked like an animal who enjoys raspberries. Who doesn't? The only problem was that the dialog liked listing things like "en_GB-ise" or "en_GB-ize-w_accents" as language names, which is really like a nasty bug in the raspberry. And what do you do with bugs? I'm not quite certain myself, but given the way this blog is heading it's surely something disturbing... Anywho, that's also fixed. Now we list proper and readable names.

Working on Sonnet is a lot of fun. A small change in a pretty small library affects the entire KDE, which is rather rewarding. So if you want to get into KDE development in an easy and fun way, go to https://bugs.kde.org, search for "kspell" or "sonnet", pick an entry and simply fix it!

Wednesday, August 20, 2008

Fast graphics

Instead of highly popular pictures of llamas, today I'll post a few numbers. Not related to llamas at all. Zero llamas. These will be Qt/KDE related numbers. And there are no llamas in KDE. There's a dragon, but he doesn't hang around with llamas at all. I know what you're thinking: KDE is a multicultural project, surely someone must be chilling with llamas. I said it before and I'll say it again, what an average KDE developer, two llamas, one hamster and five chickens do in the privacy of their own home is none of your business.

Let's take a simple application called qgears2, based on David Reveman's cairogears, and see how it performs with different rendering backends. Pay attention to the zero relation to llamas or any other animals. The application takes a few options: -image to render using a CPU-based raster engine, -render to render using X11's Xrender and -gl to render using OpenGL (a -llama option is not accepted). It has three basic tests: GEARSFANCY, which renders a few basic paths with a linear gradient alpha-blended on top; TEXT, which tests some very simple text rendering; and COMPO, which is just composition and scaling of images.



The numbers come from two different machines. One is my laptop, which is running Xorg server version 1.4.2, EXA 2.2.0 and Intel driver 2.3.2; the GPU is a 965GM and the CPU a T8300 at 2.4GHz, running on Debian Unstable's kernel 2.6.26-1.
The second machine is running a GeForce 6600 (NV43 rev a2), NVIDIA proprietary driver version G01-173.14.09, Xorg version 7.3 and kernel 2.6.25.11; the CPU is a Q6600 @ 2.40GHz (thanks to Kevin Ottens for those numbers, as I don't have an NVIDIA machine at the moment).

The results for each test are as follows:

GEARSFANCY            i965                  NVIDIA
Xrender               35.37                 44.743
Raster                63.41                 41.999
OpenGL               131.41                156.250

TEXT                  i965                  NVIDIA
Xrender               13.389                40.683
Raster                (incorrect results)   (incorrect results)
OpenGL                36.496                202.840

COMPO                 i965                  NVIDIA
Xrender               67.751                66.313
Raster                81.833                70.472
OpenGL               411.523               436.681

The COMPO test isn't really fair because, as I mentioned, Qt doesn't use server-side picture transformations with Xrender, but it shows that OpenGL is certainly not slow at it.

So what these results show is that the GL backend, which hasn't been optimized at all, is between 2 and 6 times faster than anything else out there, and that the pure CPU-based raster engine is faster than the Xrender engine.

So if you're on an Intel or NVIDIA GPU, rendering using GL will immediately make your application a number of times faster. If you're running on a system with no capable GPU, then using the raster engine will make your application faster as well.
Switching Qt to use the GL backend by default would result in all applications running many times faster. The quality would suffer though (unless the HighQualityAntialiasing mode were used in Qt, in which case it would be the same). This certainly would fix our graphics performance woes and, as a side effect, allow using GL shaders right on the widgets for some nifty effects.
On systems with no GPU the raster engine is a great choice; on everything else GL is clearly the best option.

Friday, June 27, 2008

Accelerating desktops

In general I'm extremely good at ignoring emails and blog posts. Next to head-butting it is one of the primary skills I've developed while working on Free Software. Today I will respond to a few recent posts (all at once, I'm a mass-market responder) about accelerating graphics.

Some kernel developers released a statement saying that binary blobs are simply not a good idea. I don't think anyone can argue with that. But this statement prompted a discussion about graphics acceleration, or more specifically a certain vendor who is, allegedly, doing a terrible job at it.

First of all, the whole discussion is based on a fallacy, rendering even the most elaborate conclusions void. It's assumed that in our graphics stack there's a straightforward path between accelerating an API and fast graphics. That's simply not the case.

I don't think it's a secret that I'm not a fan of XRender. Actually "not a fan" is an understatement: I flat out don't like it. You'd think that the fact that 8 years after its introduction we still don't have any driver that is actually really good at accelerating that "simple API" would be a sign of something... anything. When we were making Qt use more of the XRender API, the only way we could do that was by having Lars and me go and rewrite the parts of XRender that we were using. So what happened was that instead of depending on XRender being reasonably fast, we rewrote the parts that we really needed (which is realistically just the SourceOver blending) and did everything else client-side (meaning not using XRender).

Now, going back to benchmarking XRender. Some people pointed out an application I wrote a while back to benchmark XRender: please do not use it to test the performance of anything. It doesn't correspond to any real workload. (Also, if you're taking something I wrote to prove some arbitrary point, it'd likely be a good idea to ping me and ask about it. You know, on account of writing it, I just might have some insight into it.) The thing about XRender is that there's a large number of permutations for every operation. Each graphics framework which uses XRender uses specific, defined paths. For example Qt doesn't use server-side transformations (they were just pathetically slow and we didn't feel it would be in the best interest of our users to make Qt a lot slower); Cairo does. Accelerating server-side transformations would make Cairo a lot faster, and would have absolutely no effect on Qt. So whether those tests pass in 20ms or 20 hours has 0 (zero) effect on Qt performance.

What I wanted to do with the XRender performance benchmarking application was basically have a list of operations that need to be implemented in a driver to make Qt, Cairo or anything else using XRender fast. A "to make KDE fast, look at the following results" type of thing. So the bottom line is that if one driver has, for example, a result of 20ms for Source and SourceOver and 26 hours for everything else, and there's a second driver that has 100ms for all operations, it doesn't mean that on average driver two is a lot better for running KDE; in fact it likely means that running KDE will be five times faster on driver one.

Closed source drivers are a terrible thing and there are a lot of reasons why vendors would profit immensely from having open drivers (which is possibly a topic for another post). Unfortunately, blaming driver writers for not accelerating a graphics stack which we went out of our way to make as difficult to accelerate as possible is just not a good way of bringing that point forward.

Monday, June 02, 2008

Animated interfaces

Lately I've been writing a lot about frameworks; today I want to take a step back and talk about a "technique". "Drunken Master"/"Praying Mantis" kind of foo, pertaining to animations.

Over the years of writing animated user interfaces I've developed a set of rules that I follow when writing animations. It's been a checklist that I've been following almost religiously. Much like my morning list of "1) Open eyes, 2) Check for dead bodies in the bed, 3) Check around the bed, 4) if 2 and 3 are negative brush teeth and take a shower, otherwise prepare for a very bad day", which is the main reason why I've never had a bad day in my life. Which is another good lesson to learn - very low expectations make for a very fulfilling life.

I've realized that those rules might be useful to others, so I'll write a bit about them today. I guarantee you that if you follow them, the animations you add to any user interface will not make any of your users want to stab you, which again, following the low-expectations lesson from above, is the making of a great day. In fact following these rules will make your UI rock, which, even if you have high expectations, is a desirable quality.

So without further ado, here are my rules:

  1. Anger rule:
    Creating animations is a lot of fun. Which in turn makes the act of adding animations to a user interface a happy activity. When we're happy we're willing to endure a lot more abuse; in particular we ignore, or don't even notice, things that are very irritating. Unfortunately computer UIs are usually used by people who are not happy at all (e.g. they're at work) and their perception of what seemed like a neat animation to you when you were in a great mood will be vastly different. So always, always make sure you've experienced all of your animations while angry. If they haven't irritated the hell out of you, congratulations, you are on to something.


  2. Blind interpolator rule
    Find someone who has never seen the animation you're designing, tell them to close their eyes as soon as the animation starts. Ask them how they think it ended. If their brain isn't able to fill in the blanks and figure out how the animation ends then the animation does something unexpected that will force your users to learn it. For a user interface to be intuitive you have to avoid forcing users to learn its behavior. It has to come naturally.


  3. The timing rule
    This one is tricky. Timing your animation correctly is one of the hardest things to do. I use a two-step approach to figure this one out:

    • follow physics - so follow timings from the real world, e.g. if something is falling let it last as long as it would if you had dropped something in the real world,

    • make it fast - if an animation lasts too long, people try to stop it by hitting any random key on the keyboard. From a user-interface perspective, what you definitely want to avoid is having your users hit random keys while the application is running.
    Animations in user interfaces need to be very, very short. The point of them is to give subtle hints as to where things are coming from. They're an aid in understanding computers, not a graphical effect that is meant to impress. I tend to violate this rule because if I spend 2 hours writing a really neat animation I'll be damned if everyone won't be forced to look at it. But that is wrong! Very wrong. Subtlety is the key here. If the animation is running on a desktop a good rule of thumb is the "escape key rule" - if your animation is longer than the time required to move a hand from the mouse and hit the escape key, the animation is too long.

  4. No sci-fi rule.
    Also known as the 'avoid goofy and crazy things rule'. Effects grounded in reality will make sure your interface is easier to learn and more intuitive. For user interfaces wacky and cool don't imply "good", in fact it's usually just the opposite. These are not games where "wacky" and "cool" are desirable qualities.

  5. The refresh rule

    Make your animation run at a number of frames per second equal to the refresh rate of the output device, and synchronize the updates with the vertical retrace.

    Lately I've become obsessed with trying to automatically figure out the optimal number of frames per second for animations in user interfaces. I can obsess with the craziest of them, so last week I added this rule.

    What do you think - how many frames per second should an animation be running at? 15? 24? 30? 40? 60? Coincidentally those are also this week's winning lottery numbers. The answer is "it depends". It is highly dependent on the refresh rate of the output device. The idea that "you need 24fps (or 30fps or even 60fps) to achieve smoothness" is a myth. No one knows how many frames per second humans can actually perceive, but pilots have been able to identify kinds of planes shown for 1/220th of a second, so it seems that we can recognize objects at 220fps. How many frames we'd need to not notice any individual frames at all is a question without an answer right now, but it's likely that you'd need more than 400fps to do it. None of the commercially available display devices can refresh at that speed. So ideally what you want to do is synchronize the number of frames per second to the refresh rate of your output device. Here's an example you can play with: http://byte.kde.org/~zrusin/animsync.tar.bz2. You'll see a square moving back and forth in a window.

    You can specify the number of frames per second on the command line, and passing the "-s" option will sync the animation to the vertical retrace of your output device (assuming your GL driver supports it, which, unless you're running DRM head or the closed NVIDIA driver, is unlikely). Experiment with it a bit.
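    For the curious: the usual way to ask a GLX driver for retrace-synced buffer swaps is the swap-interval extension. A minimal sketch, assuming GLX_SGI_swap_control is available, looks roughly like this:

    #include <GL/glx.h>

    typedef int (*SwapIntervalProc)(int);

    /* Ask the driver to sync buffer swaps to the vertical retrace.
     * Returns 1 on success, 0 if the extension isn't there. */
    int enable_vsync(void)
    {
        SwapIntervalProc swap_interval = (SwapIntervalProc)
            glXGetProcAddressARB((const GLubyte *)"glXSwapIntervalSGI");
        if (!swap_interval)
            return 0;
        return swap_interval(1) == 0;   /* interval of 1: one swap per retrace */
    }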

So, these are my rules. They're not laws, so in certain situations you might need to break one of them, but if you do, you'd better have a very good explanation, backed up with some facts, for why you're doing so.

Monday, March 03, 2008

No black here

Sup, y'all. I realized that Free Software is a lot like the wild west used to be. So, partner, I'll be spreading some "west" and a lot of "wild" over this post.
"What?", you say (oh I'll have a conversation with you whether you want it or not). Well, the connection is obvious once you think about it: during the wild west days people used to ride horses, kill each other for no apparent reason and raise cattle, while in Free Software we write software. I rest my case.

I've spent the last week with Aaron. I absolutely love hanging out with him. It's platonic. Or so I think, with all the heavy drinking that I do, it all gets a little blurry. Also, Peyton (Aaron's son) is a wickedly cool kid.

Anyway, I have a lot of Gallium3D things to do which are a priority, but at night Aaron and I hacked on Plasma and KDE. I think I speak on behalf of Aaron when I say that we became computer programmers for the women. Which might seem a little confusing to, well, all of you (especially if you're a woman) until you realize that it came down to being either a computer programmer or a crackhead. The computer programmer job pays, like, way better, and if I had to pick a second reason why I do what I do, it's money.
We got the Dashboard widgets working. It's been something that I wanted to do for the longest time. Obviously not all of them work, because some of them use OS X-specific APIs (like Core Image magic).
I also added interfaces to use Plasma's DataEngines from JavaScript in web applets. So you would do something like:

var engine = window.plasma.loadDataEngine("time");
var data = engine.query("Local");
document.getElementById('time').innerHTML = "Time is " + data.value("Time");

to use Plasma's time data engine to display the current time. One could use Plasma's Solid data engine to get the list of all the devices attached to the computer and display it in the web applet, which would be a little more useful than yet another time widget, but you ain't enterprise-ready unless you have 23 clock applets, and we're almost there. There's also a small bug somewhere, which apparently doesn't exist in Qt and is, in fact, a figment of my own imagination, due to which the background looks black. On the Chuck Norris widget (which, like a lot of other Dashboard widgets, just works: you download it, click on it, run it, show it to all your friends, and remove it once they're gone because it's pretty damn useless) it looks like this:



Do you see black? No, you don't! It's simply that Chuck Norris is a black hole that consumes everything around it, including all the color. Deal with it. Chuck Norris has.

Thursday, February 07, 2008

OpenVG and accelerating 2D

I tend to break a lot of keyboards. Not because I release all the aggression that I hold deep within me on them, but because I drool a lot. It's not my fault, I really do love graphical bling. Since I'm one of the people who flourishes not when he's doing well, but when others are doing just as badly I've thought about getting other people on the "excessive drooling" bandwagon.

I've tried it in the past. First with my "you ain't cool, unless you drool" campaign, which was not as successful as I'd seen it be in my head. It made me realize that marketing is likely not one of my superpowers. That was a real blow, especially since it came a day after I'd established that there's like a 95% chance that I can't fly, and if I can, then my neighbor will be seriously pissed if I keep landing on his car. You'd think they'd build them stronger, but I digress. After that I went with my strengths and made two technical efforts. The first one led to EXA, the second to Glucose. Both are acceleration architectures that try to accelerate XRender - the API which we use for 2D on X. What XRender is very good at is text. What XRender is not so good at is everything else.

Furthermore, what one really wants to do nowadays is use the results of 2D rendering in a 3D environment as a texture, or simply implement effects on top of the 2D rendering with shaders. Wouldn't it be nice to have a standard and simple API for all of that? You bet your ass it would. In this particular case "you bet your head" would be a more suitable expression, since by the simple act of reading this blog it's clear you've already given up on your ass and staked your future on your head. I endorse that (both the "head more important than ass" theory and the better-API idea). Currently, through the magic of DRM TTM and GLX_texture_from_pixmap, one could partially achieve that (we'd need GLX_pixmap_from_texture to finish it), but if you've seen Japanese horror movies you know they've got nothing on the code one ends up with when doing that.

I already mentioned in my previous post that we can lay any number of APIs on top of Gallium3D. In fact in the last diagram I already put the two graphics APIs that interest me on top of it: OpenVG and OpenGL. In my spare time I've started implementing OpenVG on top of Gallium3D. I'm implementing 1.1, which hasn't been officially released yet. While OpenVG 1.0 is essentially useless for our drool-causing desktops because it doesn't even touch the subject of text handling, 1.1 does, and that in itself makes it a great low-level 2D vector graphics API.

We already have OpenVG engines for Qt and Cairo, which should make the switch fairly painless. "Hey", you say, and I swiftly ignore you, because I have a name, you know. "Sunshine", you correct yourself, and I smile and say "Huh?". "I want my KDE 4 fast and smooth! Gallium3D has this 3D thing in the name and I have hardware that only does 2D, what about me?". Nothing. You need to use other great solutions. "But my hardware can accelerate blits and lines!". Awesome, then this will likely rock your world. As long as you don't try to run any new applications, of course. Even embedded GPUs are now programmable, and putting the future of our eye-candy on a technology that predates two-year-old embedded GPUs is an exercise in silly which my chiseled pecs refuse to engage in.

OpenVG standard says "It is possible to provide OpenVG on a platform without supporting EGL. In this case, the host operating system must provide some alternative means of creating a context and binding it to a drawing surface and a rendering thread." which is exactly what we want. That's because we already have that layer, it's GLX. GLX will do the context creation for us. This also means that we'll be able to seemingly combine 2D vector graphics and 3D and manipulate the results of vector rendering the same way we would normal 3D.

Finally 2D graphics will be accelerated the same way 3D is, and those hours you've spent playing 3D games thinking "how the hell is it possible that putting a clock on my desktop makes it choppy when this runs at 400fps?" will be just a story you'll get to tell your grandkids (while they stare at you with the "please god, let me be adopted" look). As a bonus we get two extremely well documented APIs (OpenGL and OpenVG) as our foundation, and instead of having two drivers to accelerate 2D and 3D we'll have a single driver.

So what happens with Glucose? Alan and José are still working on it a bit, and in the short term it does provide a pretty enticing solution, but long term the OpenVG/OpenGL combo is the only thing that really makes sense.

With much love,
Drool Coordinator

Wednesday, February 06, 2008

GPGPU

Would you like to buy a vowel? Pick "j", it's a good one. So what if it's not a vowel. My blog, my rules. Lately I've had a major crush on all things "J". Which is why I moved to Japan.

It's part of my "Most expensive places in the world" tour, unlikely coming to a city near you. I lived in New York City, Oslo, London and now Tokyo. I'm going to write a book about all of that entitled "How to see the world while having no money whatsoever". It's really more of a pamphlet. I have one sentence so far "Find good friends" and the rest are just pictures of black (and they capture the very essence of it).

José Fonseca helped me immensely with the move to Japan, which was great. Japan is amazing, even though finding vegetarian food is almost like a puzzle game and trying to read Japanese makes me feel very violated. So if you live in Tokyo, your prayers have been answered: I'm here for your pleasure. Depending on your definition of pleasure, of course.

Aside from that, I've been working on this "graphics" thing. You might have heard of it. Apparently it's real popular in some circles. I've been asked about GPGPU a few times, and since I'm here to answer all questions (usually in the most sarcastic way possible... don't judge me, the Bible says not to, I was born this way) I'm going to talk about GPGPU.

To do GPGPU there's ATI's CTM, NVIDIA's CUDA, Brook and a number of others. One of the issues is that there is no standard API for doing GPGPU across GPUs from different vendors, so people end up using e.g. OpenGL. So the question is whether Gallium3D could make such things as scatter reads accessible without falling back to using vertex shaders, or a vertex shader/fragment shader combination, to achieve them.

The core purpose of Gallium3D is to model the way graphics hardware actually works. So if the ability to do scatter reads is available in modern hardware, then Gallium3D will have support for it in the API. Now having said that, it looks like scatter reads are usually done in a few steps, meaning that while some of the GPGPU-specific APIs expose them as one call, internally a number of cycles pass as a few instructions are actually executed to satisfy the request. As such this functionality is obviously not the best thing to expose in a piece of code which models the way hardware works; that functionality one would implement on top of that API.
I don't have docs for the latest GPUs from ATI and AMD, so I can't say what it is that they definitely support. If you have that info, let me know. As I said, the idea is that if we see hardware supporting something natively then it will be exposed in Gallium3D.
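(For readers who haven't met the term: the operations in question boil down to reads and writes at arbitrary computed addresses. In plain C they're trivial; the point is that a shader-based implementation only gets the read side for free. A toy illustration:)

/* Toy illustration only. Texture fetches give shaders the "gather" side
 * (reads from computed addresses); the "scatter" side (writes to computed
 * addresses) is what GPGPU APIs have to provide on top. */
void gather(float *out, const float *data, const int *idx, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = data[idx[i]];
}

void scatter(float *out, const float *data, const int *idx, int n)
{
    for (int i = 0; i < n; ++i)
        out[idx[i]] = data[i];
}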

Also, you wouldn't want to use Gallium3D as the GPGPU API. It is too low-level for that and exposes vast parts of the graphics pipeline. What you (or "I", with a vast amount of convincing and promises of eternal love) would do is write a "state tracker". State trackers are pieces of code layered on top of Gallium3D which do the state handling for the public API of your choice. Any API layered like this will execute directly on the GPU. I'm not 100% certain whether this will cure all sickness and stop world hunger, but it should do, what even Viagra never could, for all GPGPU fanatics. The way this looks is a little like this:

This also shows an important aspect of Gallium3D: to accelerate any number of graphics APIs, or to create a GPU-based non-graphics API, one doesn't need N drivers (with N being the number of APIs), as we currently do. A Gallium3D driver (that's singular!) is enough to accelerate 2D, 3D, GPGPU and my blog-writing skills. What's even better is that, of the aforementioned, only the last one is wishful thinking.

So one would create some nice dedicated GPGPU API and put it on top of Gallium3D. Also, since Gallium3D started using LLVM for shaders, with minimal effort it's perfectly possible to put any language on top of the GPU.

And they lived happily ever after... "Who" did is a detail, since it's obvious they lived happily ever after thanks to Gallium3D.

Thursday, December 27, 2007

Constant state objects

I know you wither away, like a leaf in the darkness of a chilly autumn night, without my posts. I've been depriving you for too long of the sunshine that is me. Writing an interesting post explaining the new way one does register allocation, or how the code generation has changed to benefit the drivers, is a little difficult and I didn't feel like it would interest anyone. Especially since I know that if I write a blog about a little demo app I wrote in half an hour to showcase some technology I get 50 comments, but when I blog about the process of creating that new technology I get two comments, one of them being "What?". On top of that I was lacking those new nude photos (it's art, you pig!) of myself that I've promised you and which you yearn for so much.

Lately I've been working on the Gallium3D i965 driver. I had to read up on some documentation because, as it turned out, being a "graphics ninja" did not give me intrinsic knowledge of all graphics hardware ever built. I know! I was as shocked as you are right now.

It started with the fact that I wanted to experiment with the layout of vectors in shaders for the code generation using LLVM facilities.

To experiment with different layouts I decided to write an i965 driver for Gallium3D and experiment with LLVM code generation for i965 in that driver. Keith Whitwell and I have been hacking on it and it's going pretty well. It's amazing how much code we've removed from the old i965 driver while porting it to Gallium3D. It was a rather nice feeling to see so much of the complexity of the driver disappear.

The great thing about writing a Gallium3D driver is that a lot of the complexity of the high-level API goes away, as it's moved to the "state tracker". The state tracker is responsible for all API-specific state handling and tricky conversions. The driver never sees the state tracker; it implements a very thin interface which corresponds rather closely to the way modern hardware works.

One of my favorite changes is the new way we handle state changes. It used to be that the driver had to check whether any of its state had changed and, if it had, upload it before drawing anything. That turned out to be a rather serious bottleneck and it made reading our driver a little painful.

In Gallium3D we moved away from that. Now we use semantics similar to what Direct3D 10 and OpenGL 3 (will) use, which is that states are largely immutable objects. Their usage follows the
  • create from a template
  • bind the state to make it active for subsequent rendering calls
  • delete when not needed anymore
pattern. This way the driver can do all of its conversion on creation (which ideally happens only once), and then on bind calls (of which there are many) it can just reference the id of this state to have it used, instead of having to do a full conversion from the Mesa state and an upload of the converted state.
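A minimal sketch of the pattern as seen from the driver user's side (the struct fields and function names below approximate the Gallium interfaces of the time and may not match the real headers exactly):

/* Sketch only: names approximate the Gallium context interface. */
void setup_alpha_blending(struct pipe_context *pipe)
{
    struct pipe_blend_state blend;
    memset(&blend, 0, sizeof(blend));
    blend.blend_enable   = 1;
    blend.rgb_src_factor = PIPE_BLENDFACTOR_SRC_ALPHA;
    blend.rgb_dst_factor = PIPE_BLENDFACTOR_INV_SRC_ALPHA;

    /* create: the expensive conversion to the hardware format happens once */
    void *cso = pipe->create_blend_state(pipe, &blend);

    /* bind: cheap, just makes the pre-converted object current */
    pipe->bind_blend_state(pipe, cso);

    /* ... draw ... */

    /* delete: when the state object is no longer needed */
    pipe->delete_blend_state(pipe, cso);
}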
When I wrote the constant state objects code I ported our simple i915 driver and, even though on i915 we don't have hardware state caching, just doing the state conversions in create calls improved the performance in simple examples by about 15fps. For more complicated examples, where the state changes are more frequent, it will be a lot more. Not to mention drivers which will do full state caching in the hardware, where this is going to fly like Superman with diarrhea.

Less complexity in the driver and faster code is what I think love is all about. Granted, my idea of love might be a smidge skewed on account of me being crazy and all, but no one can argue with "simpler/faster" being "awesome".

And to really top this graphics talk off, a picture of me naked:

In retrospect not my best day. The lighting was all wrong...

Friday, November 02, 2007

Gallium3D LLVM

I've seen the future. The blurry outlines sketched by such brilliant audio-visual feasts as Terminator came to fruition: in the future the world is ruled by self-aware software.
That's the bad news. The good news is that we haven't noticed.
We're too busy because we're all playing the visually stunning "The Good Guy Kills the Bad Guys 2" on PlayStation 22. It really captures the essence of the first. The theaters are ruled by the "Animals - Did They Exist?" documentary, with some stunning CG of a horse, although I could have sworn that horses had 4 legs, not 3, but then again I'm no nature expert.
What everyone was wrong about, though, is which program became self-aware and exerted its iron-fist-like dominance upon the unsuspecting humans and the last few, very suspicious, cockroaches. It wasn't a military mistake. It was a 3D framework that evolved.

But let's start from the beginning. First there was Mesa, the Open Source implementation of the OpenGL specification. Then came Gallium3D, a new architecture for building 3D graphics drivers. Gallium3D modeled what modern graphics hardware was doing, which meant that the framework was fully programmable and was actually code-generating its own pipeline. Every operation in Gallium3D was a combination of a vertex and a fragment shader. Internally Gallium3D was using a language called TGSI - a graphics-dedicated intermediate representation.

Gallium3D was generating vertex and fragment shaders at run-time to describe what it was about to do. After that system was working, some engineers decided that it would make sense to teach Gallium3D to self-optimize the vertex/fragment programs that it, itself, was creating. LLVM was used for that purpose. It was used because it was an incredible compiler framework with a wonderful community. The decision proved to be the right one, as Gallium3D and LLVM turned out to be a match made in heaven. It was pure love. I'm not talking about the "roll over onto your stomach, take a deep breath, relax and let's experiment" kind of love, just pure and beautiful love.

So let's take a simple example to see what was happening. Let's deal with triangles, because they're magical.
Now, to produce this example Gallium3D was creating two small programs. One was run for every vertex in the triangle and calculated its position - it was really just multiplying the vertex by the current modelview matrix - that was the vertex shader. The other program was run on every fragment of the figure to produce the resulting pixels - that was the fragment shader. To execute these two programs they were compiled into LLVM IR, LLVM optimization passes were run on them and LLVM code generators were used to produce executable code. People working on Gallium3D quickly noticed that, even though their code wasn't optimized at all and was doing terribly expensive conversions all the time, it was up to 10x faster with LLVM on some demos. They knew it was good.

So Gallium3D was, in essence, creating and optimizing itself at run-time. Which led many Free Software enthusiasts to create, wear and rarely even wash shirts with the slogan "We might not have a billion dollar budget but our graphics framework is smarter than all the people in your company together".

Then in the year 2113 Gallium3D got bored with just creating graphics and took control of the entire world. Which realistically speaking wasn't hard to do because we were willingly immersing ourselves in the worlds it was creating for us anyway.

But that's still many, many years away from our boring present. So for now, while you wait for sex robots, dinners in a tube and a world without insects (or, for that matter, absolutely any animals at all), you can just go, get and play with Gallium3D where LLVM is used. At the moment it's only used in the software paths, but the fifth of November is going to mark the first day on which work on code-generating directly for GPUs using LLVM starts.

Remember, remember the fifth of November... (oh, come on that's one heck of an ending)

Tuesday, October 23, 2007

KHTML future

I've read Harri's blog about WebKit and I figured it makes sense for someone to respond. First of all, I liked the blog. It was full of drama, action, despair, marketing, and bad and good characters. Which is really what I'm looking for when reading fiction.

Especially the part that mentioned QtWebKit as an irrelevant fork of the KHTML sources. That was awesome. It's the kind of imagination we need more of in the blogosphere. For the purposes of the point Harri was trying to make, which I think was "no matter what the reality is, our ego is bigger than yours", it was a well-suited argument.

Describing the WebKit project as a fork of the KHTML sources is like calling GCC a fork of EGCS, or, to use a more popular analogy, like calling a chicken a fork of an egg. If you want to talk about forks then technically, nowadays, KHTML is a fork of WebKit. Not a terribly good one at that. It's real easy to back that statement up by comparing the number of commits to KHTML with the number of commits to WebKit. In fact that comparison is just embarrassing for KHTML.

I also found it funny that people like Lars Knoll, Simon Hausmann, George Staikos or myself are not part of the KHTML team. "We are the 'KHTML team' (except KHTML's author and ex-main developer Lars, who's one of the biggest supporters of WebKit now, and other people who used to work on KHTML but now work on WebKit as well... but they were all ugly... honestly!)" - you can go make shirts with that.
We're working on WebKit now, hence we're not KHTML team members. Any KDE developer who works on WebKit (hey, Niko, Rob, Adam, Enrico...) is automatically dissociated from the KHTML team.

The fact is that there are more KDE developers contributing to WebKit than there are KDE developers contributing to KHTML.

So since there's more of us, I think technically that means that we are the official KDE web engine team. KHTML team, we would love to work with you, the fork, but you're kind of a pain in the butt to deal with.

Which is OK, because, like I've mentioned a number of times, the KDE community lives by the "who does the work decides" dogma. And ultimately the Apple guys, the Trolltech guys, people from George's company who work on this stuff full-time, and tons of Free Software contributors working on WebKit do much, much more work than people do on KHTML.

On a more serious note, let me explain a very important point: bug-for-bug compatibility with the latest Safari would be worth much, much more to KDE than any patches that are in KHTML and haven't yet been merged to WebKit could ever be.
The web works on the principle of percentages: web designers test their sites with engines that have X% of market reach. Konqueror with stock KHTML isn't even on their radar. WebKit is. Having web designers cater to their engine is worth more than gold to KDE users.

And if you care more about some personal grudges than the good of KDE, that's also OK, because we, the official KDE web rendering team, will do what's right for KDE and use WebKit.

Saturday, September 29, 2007

Gallium3D, Shaders and LLVM

Today we're going to talk about shaders. Well, I'll talk, or to be more specific, write, or to be blunt, I'll pretend like I'm actually capable of putting my thoughts into readable excerpts that other human beings (hopefully you) and some of my imaginary friends (they're not all winners) can understand.

The question I've been asked a few times during the last week was "who are you and what are you doing in the bushes outside my house", which isn't related to computer graphics at all, and what I do in my spare time is none of your business, so I won't be talking about that. Now, the other question I've heard a few times during the last week was "will Gallium3D use LLVM?", and the short answer is "yes, it will".

First of all, a little about graphics hardware. A common thing to do in modern graphics hardware is to have very wide registers and allow stuffing arbitrary vectors inside those registers. For example one register might very well store eight 2-component vectors, or 16 components of 16 different vectors with the other components being stored in subsequent registers. To support writing to those wide registers there's usually another register, often a stack of them, which is used as a write mask for all operations. Cool, eh? So now, when your language supports, god forbid, branches or loops and you want to code-generate something for graphics hardware, you're left with two options. Option one is to give up and go ride donkeys in a circus, and option two is to do something crazy to make it work. To be honest I've never even seen a real-life donkey. I've seen a cow but we just didn't hit it off. So I knew that option one is just not right for me.
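(If the write-mask idea sounds abstract, here's the concept reduced to a few lines of C - purely an illustration of the semantics, not how any particular GPU spells it:)

/* Conceptual illustration of a write mask: only the lanes whose bit is set
 * in the mask actually get written. GPUs do this in hardware across very
 * wide registers, and branches get turned into mask manipulation. */
void masked_write(float dst[4], const float src[4], unsigned mask)
{
    for (int i = 0; i < 4; ++i)
        if (mask & (1u << i))
            dst[i] = src[i];
}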

So one of the big worries we had was whether we'd be able to code-generate from LLVM for graphics hardware. After some discussions about pattern matching in code generators and opcode lowering, it finally looks like the answer is "yes, we will be able to generate something usable". The way it will work in Gallium3D is largely similar to the way I wanted to do it in the LLVM GLSL code that Roberto and I were working on for Mesa a few months back. The difference is that the IR in Gallium3D is completely language-agnostic.

You can run OpenGL examples already, granted that some of them will not produce correct results, but if it all just worked then I'd have nothing to blog about. I'll start integrating the LLVM parts within the next two weeks, which is when the performance should get a major boost and flowers should bloom everywhere. You might think that the latter is not, technically, related to our work on Gallium3D, and the fact that autumn is here makes that last statement even more dubious, but you're wrong. Who would you rather trust, you or me? I bet you thought "me", and so I rest my case.


And all of that is brought to you without any sheep sacrifice and hardly any virgin sacrifice ("hardly any" because I, as a representative virgin, am making a small sacrifice, but from what I understand it doesn't count as a full-fledged "virgin sacrifice").
How do you like them apples? (Or oranges... or strawberries... I like raspberries... They're all good, is I guess my point.)

Friday, September 21, 2007

Gallium3D

Critics are raving: "Gallium 3D is the best thing that ever happened to Free Software graphics", "It's breathtaking!", "Never before has nudity been so tasteful!"... Alright, maybe not the last one. Actually none of them, since it's a brand new project. In fact that's the point of this entry. To introduce you two.

You, a brilliant (as derived from the fact that you're reading this blog) Free Software enthusiast or simply my very own stalker (both options very satisfying to me personally). And Gallium3D, the foundation of Free Software graphics for years to come.

Gallium3D is a redesign of Mesa's device driver model. It's a new approach to the problem of accelerating graphics. Given the tremendous investment that free desktops make in OpenGL nowadays, I'm very excited to be working on it.

At Tungsten Graphics we've decided that we need a device driver model that would:
  • make drivers smaller and simpler
  • model modern graphics hardware
  • support multiple graphics API's
The basic model, as presented by Keith Whitwell at XDS2007, looks as follows:


You can follow the development of Gallium as it happens in Mesa's gallium-0.1 branch.

You can also read a detailed explanation of what it is on our wiki.

Now why should you be excited (besides the fact that, like I already pointed out, there's no developer nudity in it, and that being excited about the stuff I'm excited about is in general a good idea):
  • Faster graphics
  • Better and more stable drivers
  • OpenGL 3
  • Ability to properly accelerate other graphics APIs through the same framework. Did someone say OpenVG?
This is a huge step on our road to taming the "accelerated graphics" demon in Free Software. We've been talking about it for a long time and now we're finally doing it. There's something zen-like about working on free software graphics for years and finally seeing all the pieces fall into place.