Zack Rusin - battles of the true graphics ninja

<b>Bodega</b> (2013-10-25)<br />
<br />
<a href="http://aseigo.blogspot.ch/">Aaron</a> wrote a number of <a href="http://aseigo.blogspot.ch/2013/10/bodega-its-all-about-participants.html">blogs</a> about <a href="https://projects.kde.org/projects/extragear/network/bodega-server">Bodega</a>, our open market for digital content. Today I wanted to write a bit about the technology behind it. Plus my prose is unbeatable - it works out, a lot, and is ripped like Stallone during his "definitely not on steroids, just natural, injectable supplements" years, so it should be appreciated more widely.<br />
<br />
If you've ever used a content distribution platform, like Google Play or Apple's App Store, I'm sure you've noticed the terrible way in which they handle content discovery, rankings, communication with publishers and the vetting process. To some extent Bodega solves all of those, and every day it's closer to the vision that we had when we started it. And we weren't even high that day. Well, I wasn't; Aaron was high on life. And crack. Mostly crack. I apologize, crack isn't funny. Unless you're a plumber, in which case it's hilarious.<br />
<br />
What wasn't very clear at the beginning was the technology that we would use to write it. We started with PHP because Aaron had some code that we could reuse from another project. Testing became a nightmare. I'm not a huge fan of dynamically typed languages for server programming; they make it a lot harder to validate the code and avoid silly mistakes, and without a highly efficient testing setup they can make your life hell. So one weekend, when I got sick of setting up Apache once again to test something, I just sat down and rewrote the code in JavaScript using <a href="http://nodejs.org/">Node.js</a>. I know, I know, "you don't like dynamically typed languages and you rewrote it in js?", keep reading...<br />
<br />
Node.js was a great choice for Bodega. Testing became trivial (we're using Mocha). Between our unit tests and <a href="http://www.jshint.com/">jshint</a> support, I can live with the dynamic nature of the language.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhy3FZI1tTzlwY2PekgKw58rSC2emF8kkSxBnrvQfbPyEsFyRp1yjH1fAVIy4GQhOXozCzXMjccIAYaeg_YpDvP3jF0AU_nrNMA3ib-P8r6TA0eJLGMa74Bh8MuBYhZHZQ_n77p4Q/s1600/bodega_tests.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="114" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhy3FZI1tTzlwY2PekgKw58rSC2emF8kkSxBnrvQfbPyEsFyRp1yjH1fAVIy4GQhOXozCzXMjccIAYaeg_YpDvP3jF0AU_nrNMA3ib-P8r6TA0eJLGMa74Bh8MuBYhZHZQ_n77p4Q/s320/bodega_tests.png" width="320" /></a></div>
<br />
<br />
What makes node.js great is <a href="https://npmjs.org/">npm</a> and its extensive collection of packages. <i>That tooling means that you can focus just on the code that you need to write to finish your project, not the code that you need to start it</i>. That's why I picked it.<br />
<br />
If I didn't have a day job I'd pick either Go, Haskell or just do everything by hand in C++. I have to admit that I like Go a lot (even though the system-level graphics hacker in me shakes his head at a language without native vector support), but it's hard to argue with the fact that node.js allows us to move very quickly. So <a href="https://projects.kde.org/projects/extragear/network/bodega-server">Bodega</a> is written in node.js and uses PostgreSQL to store all of its data, and that combination works amazingly well.<br />
<br />
It's a Free Software content distribution platform that actually works, and it's just the start. In my obviously biased opinion it does a lot of really interesting things that make it a very important project, but that's a different topic.

<b>ApiTrace 2.0</b> (2011-09-27)<br />
<br />
<a href="https://github.com/apitrace/apitrace">ApiTrace</a> 2.0 is out. <a href="https://github.com/apitrace/apitrace">ApiTrace</a> is by far the best graphics debugging tool out there.<br />
<br />
We have implemented so many new features since 1.0 that we have decided to bump the version number to 2.0.<br />
<br />
Some of the new features include:<br />
<ul>
<li>About 10x faster tracing and 2x faster retracing</li>
<li>Mac OS X support</li>
<li>OpenGL 4.2 support</li>
<li>Support for multi-gigabyte traces in the GUI</li>
<li>Ability to display all of the uniforms
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhs8C6FAw0RBjfS7m3cbqrApJCk8QN9S2GqDXI9ZomnAInMJv8-q8Gf1tqWQNpnbRm4BRxXMNCxmdzLMAdc0Yur5-GNmqyf9Whv3BlGNlaukE-BAyMqaOpn-8DGHexbUeVEZo0PrA/s1600/apitrace1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="187" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhs8C6FAw0RBjfS7m3cbqrApJCk8QN9S2GqDXI9ZomnAInMJv8-q8Gf1tqWQNpnbRm4BRxXMNCxmdzLMAdc0Yur5-GNmqyf9Whv3BlGNlaukE-BAyMqaOpn-8DGHexbUeVEZo0PrA/s320/apitrace1.png" width="320" /></a></div>
</li>
<li>Showing the number of calls per frame, marking large frames, and the ability to indent shaders
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPQRYjqp6Puvg1zhTegDDq_wjIA6w28B0ddESY0kH_E0T7QEPebfRaVBxB0jnq1wd9sQ2lV2C6RK3Oytu5hj9LUtaVEnqPOhPv_R7crgzC7pWMYFTC795b-TY839rpfrKPCbtzAw/s1600/apitrace2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="187" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPQRYjqp6Puvg1zhTegDDq_wjIA6w28B0ddESY0kH_E0T7QEPebfRaVBxB0jnq1wd9sQ2lV2C6RK3Oytu5hj9LUtaVEnqPOhPv_R7crgzC7pWMYFTC795b-TY839rpfrKPCbtzAw/s320/apitrace2.png" width="320" /></a></div>
</li>
<li>Support for the `GL_GREMEDY_string_marker` and `GL_GREMEDY_frame_terminator` extensions, so you can mark frames and important pieces of code in your applications (see the sketch after this list)</li>
<li>An incredible number of bug fixes, making sure that everything we throw at it just works</li>
</ul>
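For a taste of those marker extensions, here's a minimal hedged sketch (not apitrace code; it assumes the entry points were resolved through your usual GL loader and are left null when the extension is absent):
<pre>
// Annotate the GL stream so the markers show up in the trace.
if (glStringMarkerGREMEDY)
    glStringMarkerGREMEDY(0, "shadow pass"); // len == 0 means zero-terminated

drawShadowPass(); // illustrative rendering call

if (glFrameTerminatorGREMEDY)
    glFrameTerminatorGREMEDY(); // marks the end of a logical frame
</pre>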
<br />
The huge improvements in tracing performance came from two changes: flushing/syncing the trace files only when there's an uncaught signal/exception, and switching from zlib to <a href="http://code.google.com/p/snappy/">Snappy</a> compression.<br />
We also did some magic to seek and load on demand from a compressed file on disk, which means that we don't load all of the data at once. That allows us to load huge files in the GUI while using very little memory. For example, loading a 1GB file takes only ~20MB of RAM.<br />
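To illustrate the flushing change, here's a rough sketch of the flush-on-fatal-signal idea (purely illustrative, not apitrace's actual code, and it hand-waves away async-signal-safety):
<pre>
#include &lt;csignal&gt;
#include &lt;cstdio&gt;

static FILE *traceFile; // the trace stream, opened elsewhere

static void fatalSignalHandler(int sig) {
    fflush(traceFile);    // only now push the buffered trace data to disk
    signal(sig, SIG_DFL); // restore the default action...
    raise(sig);           // ...and let the process die as it would have
}

static void installTraceFlushHandlers() {
    const int fatalSignals[] = { SIGSEGV, SIGBUS, SIGABRT, SIGFPE };
    for (int sig : fatalSignals)
        signal(sig, fatalSignalHandler);
}
</pre>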
<br />
All of those improvements mean that it's possible to trace and debug huge applications at little to no cost. It's quite amazing. In fact, from now on, working with graphics without using ApiTrace is going to be classified as abuse.<br />
<a href="https://github.com/apitrace/apitrace/blob/master/README.markdown">Read the instructions </a>and start living the life of a computer graphics superstar, which differs from the life of an average computer graphics hacker by about 10 keyboard headbutts and half a bucket of tears a week! And that's a change we can all believe in!Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com17tag:blogger.com,1999:blog-27901662.post-38606417229876950552011-09-18T01:27:00.000+01:002011-09-18T01:27:50.708+01:00NV path renderingA while ago NVIDIA released drivers with their <a href="http://developer.download.nvidia.com/assets/gamedev/files/GL_NV_path_rendering.txt">NV_path_rendering</a> extension. GL_NV_path_rendering is a relatively new OpenGL extension which allows rendering of stroked and filled paths on the GPU.<br />
<br />
I've heard about it before but lately I've been too busy with other stuff to look at, well, anything.<br />
<br />
I've decided to spend a few seconds looking at the <a href="http://developer.nvidia.com/nv-path-rendering-videos">videos NVIDIA posted on their site</a>. They were comparing NV_path_rendering, Skia, Cairo and Qt. Pretty neat. Some of the demos were using huge paths, clipped by another weird path and using perspective transforms. Qt was very slow. It was time for me to abandon my dream of teaching my imaginary hamster how to drive a stick shift and once again look at path rendering.<br />
<br />
You see, I wrote the path rendering code so many times that one of my favorite pastimes was creating ridiculous paths that no one would ever think about rendering and seeing how fast I could render them. Qt's OpenGL code was always unbelievably good at rendering paths no one would ever render. Clearly these people were trying to outcrazy me.<br />
<br />
Fortunately there's an <a href="http://developer.nvidia.com/nv-path-rendering">SDK posted on the NVIDIA site</a> and it's really well done. It even compiles and works on GNU/Linux. Probably the best demo code for a new extension that I've ever seen. The extension itself is very well done as well. It's very robust; ultimately, though, it's the implementation that I care about. I have just one workstation with an NVIDIA card in it, a measly Quadro 600, running on a dual-processor Xeon E5405, but it was enough to play with it.<br />
<br />
The parts using Qt were using the raster engine though. I've looked at the code and decided to write something that would render the same thing using just Qt. The results were a little surprising: Qt OpenGL could render tiger.svg, scaling and rotating it, at about 270fps, while NV_path_rendering was running at about 72fps. Here's both of them running side by side:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpf9u7Lj07dGZCPiPOnunDlvX6UQ51jX0gvSr2BYTBjc9Gubvfr8Al6vorRB4PyNtvHGeMqusvW1TIZ47LtlCkI8PMBGYEW9b8FstbA5ZH_5JlGdYNDveX0Vk7hhQwF3865YLcNQ/s1600/nv_path_renderer_samples.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="212" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpf9u7Lj07dGZCPiPOnunDlvX6UQ51jX0gvSr2BYTBjc9Gubvfr8Al6vorRB4PyNtvHGeMqusvW1TIZ47LtlCkI8PMBGYEW9b8FstbA5ZH_5JlGdYNDveX0Vk7hhQwF3865YLcNQ/s400/nv_path_renderer_samples.png" width="400" /></a></div>
<br />
(numbers lower for both on account of them running at the same time, of course). As you can see Qt is almost 4x faster. I figured it might be related to the different SVG implementations and rendering techniques used, so I quickly hacked the demo NVIDIA posted to open a brand new window (you need to click on it to start rendering) and render to a QGLPixelBuffer, but using the same SVG and rendering code as their NV_path_rendering demo code. The results were basically the same.<br />
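For reference, the Qt side of the comparison boils down to something like this simplified sketch (Qt 4 API; tigerPath stands in for the tiger SVG flattened into a QPainterPath, and the rotation is assumed to be animated elsewhere):
<pre>
// Illustrative only: paint a path through Qt's OpenGL engine by
// rendering with QPainter onto a QGLWidget.
class TigerWidget : public QGLWidget {
public:
    TigerWidget() : angle(0) {}
protected:
    void paintEvent(QPaintEvent *) {
        QPainter p(this); // picks up the OpenGL paint engine here
        p.setRenderHint(QPainter::Antialiasing);
        p.translate(width() / 2, height() / 2);
        p.rotate(angle);
        p.fillPath(tigerPath, Qt::darkYellow);
    }
    QPainterPath tigerPath; // assumed: tiger.svg converted beforehand
    qreal angle;
};
</pre>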
<br />
I posted the code for the Qt demo and the patch to nvpr_svg on github: <a href="https://github.com/zackr/qt_svg">https://github.com/zackr/qt_svg</a><br />
<br />
The patch is larger than it should be because it also changed the file encoding of the saved files from DOS to Unix, but you shouldn't have any issues applying it.<br />
<br />
So from a quick glance it doesn't seem like there are any performance benefits to using NV_path_rendering; in fact, Qt would likely be quite a bit slower with it. Having said that, NVIDIA's implementation looks very robust and a lot more numerically stable. I've spent a little bit of time looking at the individual pixels and came away very impressed.<br />
<br />
In general the extension is in a bit of a weird situation. On one hand, unlike OpenVG, which creates a whole new API, it's the proper way of introducing GPU path rendering; on the other hand, pretty much every vector graphics toolkit out there already implements GPU-based path rendering. Obviously the implementations differ and some might profit from the extension, but for Qt the question is whether that quality matters more than the performance. Specifically, whether the quality improves enough to justify the performance hit.<br />
<br />
I think the extension's success will largely depend on whether it's promoted to at least an EXT or, ideally, an ARB, meaning all the drivers support it. Using it would make the implementations of path rendering in toolkits/vector graphics libs a lot simpler and give driver developers a central place to optimize a pretty crucial part of the modern graphics stack. Unfortunately, if you still need to maintain the non-NV_path_rendering paths then it doesn't make a whole lot of sense.
A Mesa3D implementation would be trivial, simply because I've already implemented path rendering for OpenVG using the Gallium3D interface, so it'd be a matter of moving that code, but I'm just not sure if anyone will actually be using this extension. All in all, it's a very well done extension but it might be a little too late.
<b>ApiTrace</b> (2011-04-25)<br />
<br />
<div style="text-align: left;">During the last three weeks I've spent most of my spare time writing a GUI for <a href="http://jrfonseca.blogspot.com/">Jose's</a> amazing <a href="https://github.com/apitrace/apitrace">ApiTrace</a> project. <a href="https://github.com/apitrace/apitrace">ApiTrace</a> is a project to trace, analyze and debug graphics APIs, both OpenGL and Direct3D, to some extent inspired by gDEBugger and Windows PIX. We wanted a tool that would let us slice through huge games and CAD apps to the exact call which causes problems, and be able to inspect the entire graphics state, including the shaders, textures and all the buffers. We ended up doing that, plus a lot more, and we're just getting started. In other words it's the best thing since the "Human/Robot Emancipation Act of 3015".</div><div><br /></div><div>You begin by tracing your target application. You can do that either from the console or from the GUI. A trace file is created and we can do some amazing things with it. You can open it in the GUI and:</div><div><ul><li>Inspect the state frame by frame, draw call by draw call:<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMqCXABiAhlefW_siwSHSJN0sL158dq7MXv-O7ncpwpQv0MFQjGJr-Vt_sX5FqRjd5KrR7WR2AeChiownIxU1pPTcfRMitSf-ggRMLp_b_6K6dXf2uxGdxQvtB6q3HnsIxK7sgIA/s320/qapitrace10.png" /></li><li>Replay the trace file:<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWHj_tv2W5Mat-Fu_xOgDbhyCrJM2zMpjviJmLkYiOivHTIfFHDzEL9VmLPPW512MsvVg-wFfVGx7YdoNQav1eMubI3jjsYT5aFip5mIATviB9TYt6AuaUhWGzJpGgeifyq7cHkA/s320/qapitrace2.png" /></li><li>Check every texture:<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhc5xWqAE86LkfHxqPiEoI3mOE_yJ_lWXt9w5_LPMdPD-Qyyh2ydORT1jmcQsSkHrrikMX42JgfOIxfJZHIWXnRkWlk0LYblMZVEno4AohZ4KT9OHe2oQFh-iEiOUCRNzsFWTC7fA/s320/qapitrace3.png" /></li>
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgODK5Ug8vBWOrg5PPuUcsIWTyErvn2OsjNbnRlR1sKlsvRabgc0CZCnNFdG43KsEyVTYP1AnqbHgokNRjkK95BNb0jwCm3Fubs-zPImTK5JWS43OFhb2mdFKfE9vJCIOv4bD-tw/s1600/qapitrace4.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 219px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgODK5Ug8vBWOrg5PPuUcsIWTyErvn2OsjNbnRlR1sKlsvRabgc0CZCnNFdG43KsEyVTYP1AnqbHgokNRjkK95BNb0jwCm3Fubs-zPImTK5JWS43OFhb2mdFKfE9vJCIOv4bD-tw/s320/qapitrace4.png" alt="" id="BLOGGER_PHOTO_ID_5599620366605497970" border="0" /></a></li><li>Every shader:<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZyjJt7_w0jLVUJwCgzGHm9TnsckBOE-VUJQJ_HXNGlW4sq4zGhiljrCBGo-rm19bQfWUmDM9P49ZnABL9pORhNki81m0Ez0DZpV13re3G8mwHJI-R2F-qJaxZEZdPADFsNZqV6w/s1600/qapitrace5.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 219px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZyjJt7_w0jLVUJwCgzGHm9TnsckBOE-VUJQJ_HXNGlW4sq4zGhiljrCBGo-rm19bQfWUmDM9P49ZnABL9pORhNki81m0Ez0DZpV13re3G8mwHJI-R2F-qJaxZEZdPADFsNZqV6w/s320/qapitrace5.png" alt="" id="BLOGGER_PHOTO_ID_5599620370634122050" border="0" /></a></li><li>Every vertex buffer:<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBKKyXjLpDNqE_op4PvBqHvyRe__rghavcPFPtdJo6n-Q764yQW42moBv9pBGOQxA2IH2XM7ZLU1jgRQREGilDuRyVoPAmkHwuW9hGrSrPg21jXhpXnZGJIRKBGJlc7kOjRjlK4Q/s1600/qapitrace6.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 219px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBKKyXjLpDNqE_op4PvBqHvyRe__rghavcPFPtdJo6n-Q764yQW42moBv9pBGOQxA2IH2XM7ZLU1jgRQREGilDuRyVoPAmkHwuW9hGrSrPg21jXhpXnZGJIRKBGJlc7kOjRjlK4Q/s320/qapitrace6.png" alt="" id="BLOGGER_PHOTO_ID_5599620372969210626" border="0" /></a></li><li>You can see if OpenGL threw an error at any point during the replay and if so what was it:<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhbEQzeM3Ew-7wHsVBauMIk0iqlIT2bOWo7Re3mshp7IKBWH-FSdvz7ibo7xHAtDFRCimzp7WjwFNh0hlFcy1Jp1Bzf6RvI3idDJwkge9WDOL_vdi1W73ohyDbvb675Pq_jdLGmA/s1600/qapitrace9.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 216px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhbEQzeM3Ew-7wHsVBauMIk0iqlIT2bOWo7Re3mshp7IKBWH-FSdvz7ibo7xHAtDFRCimzp7WjwFNh0hlFcy1Jp1Bzf6RvI3idDJwkge9WDOL_vdi1W73ohyDbvb675Pq_jdLGmA/s320/qapitrace9.png" alt="" id="BLOGGER_PHOTO_ID_5599620553845343490" border="0" /></a></li><li>And to go completely nuts, as graphics developers like to do, you get to edit any shader, any uniform and large chunks of the state to immediately see the effects it would have on the rendering:<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhN-F8y34T64H8BpuDo9q45-uygiso-dAIxjuzekolB6mFBzEK29-nw7xz3-qRbCTwFFwC71qCzwRNAT82pWEbe-wJm-R5c-JWTEaxeJ7iJC_1r4obFXQf45c27VUzdx6TzYfRDQ/s1600/qapitrace8.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; 
height: 219px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhN-F8y34T64H8BpuDo9q45-uygiso-dAIxjuzekolB6mFBzEK29-nw7xz3-qRbCTwFFwC71qCzwRNAT82pWEbe-wJm-R5c-JWTEaxeJ7iJC_1r4obFXQf45c27VUzdx6TzYfRDQ/s320/qapitrace8.png" alt="" id="BLOGGER_PHOTO_ID_5599620549990618882" border="0" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnXbIQhZJJOLlgHwC78LBum3zGKQ-IakC3uIgJRLwXx0q_O18iHG3sX8HOuyZBGsKlqogdnLJITD3KzCZbmIRCDott1_mEn63ZsYZ-Tq3IEnLKE8Ynvi1fbjR1AakWXU7LNZCc9Q/s1600/qapitrace7.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 219px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnXbIQhZJJOLlgHwC78LBum3zGKQ-IakC3uIgJRLwXx0q_O18iHG3sX8HOuyZBGsKlqogdnLJITD3KzCZbmIRCDott1_mEn63ZsYZ-Tq3IEnLKE8Ynvi1fbjR1AakWXU7LNZCc9Q/s320/qapitrace7.png" alt="" id="BLOGGER_PHOTO_ID_5599620376525539442" border="0" /></a></li></ul></div><div><br /></div><div>As a driver developer you no longer have to install all the games just to debug a problem, the report can simply include a short trace which you can use to immediately figure out what's wrong. As an application developer you can inspect every graphics call your app makes, you can analyze your api usage and you could automatically produce standalone testcases which you can send to driver developers.</div><div><br /></div><div><a href="https://github.com/apitrace/apitrace">ApiTrace</a> is hosted on <a href="https://github.com/apitrace/apitrace">github</a> and it's BSD licensed. It works on Linux and Windows (we're planning to add OSX support as well). Gui is written using Qt and requires the QJson library.</div><div><br /></div><div>Jose <a href="http://lists.freedesktop.org/archives/mesa-dev/2011-April/007090.html">just announced the first release</a> so let us know if there's anything that would make your life a lot easier. Next is support for multiple GL contexts, ability to export just a single frame from a trace (either as a trace file or a standalone C application), ability to start and stop tracing on a hot key and lots of other features soon. So whether you're a driver developer, working on games, CAD apps or 2D scene-graphs this is a tool that should make your life significantly easier and better.</div>Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com40tag:blogger.com,1999:blog-27901662.post-74396184570068233262010-11-02T14:56:00.002+00:002010-11-02T15:13:05.557+00:002D musings<div>If you've been following graphics developments in the 2D world over the last few years you've probably seen a number of blogs and articles complaining about performance. In particular about how slow 2D is on GPUs. Have you ever wondered why it's possible to make <a href="http://pc.ign.com/dor/objects/926419/id-tech-5-project/images/rage-20100503114108734.html">this</a> completely smooth but your desktop still sometimes feels sluggish?</div><div><br /></div><div><span class="Apple-style-span" ><b>Bad model</b></span></div><div><br /></div><div>For some weird reason ("neglect" being one of them) 2D rendering model hasn't evolved at all in the last few years. That is if it has evolved at all since the very first "draw line" became a function call. Draw line, draw rectangle, draw image, blit this, were simply joined by fill path, stroke path, few extra composition modes and such. 
At its very core the model remained the same though, meaning lots of calls to draw an equally large number of small primitives.</div><div><br /></div><div>This worked well because technically zero, or almost zero, setup code was necessary to start rendering. Then GPUs became prevalent, and they could do amazing things, but to get them to do anything you had to upload the data and the commands that would tell them what to do. With time, more and more data had to be sent to the GPU to describe the increasingly complex and larger scenes. It made sense to optimize the process of uploads (I keep calling them "uploads" but "GPU downloads" is closer to the true meaning) by allowing an entire resource to be uploaded once and then referred to via a handle. Buffers, shaders and the addition of new shading stages (tessellation, geometry) were all meant to reduce the amount of data that had to be uploaded to the GPU before every rendering.</div><div><br /></div><div>At least for games and well-designed 3D software. 2D stuck to its old model of "make the GPU download everything on every draw request". It worked ok because most of the user interface was static and rather boring, so performance was never much of an issue. Plus, in many cases the huge setup costs are offset by the fact that Graphics Processing Units are really good at processing graphics.</div><div><br /></div><div>Each application is composed of multiple widgets, each widget draws itself using multiple primitives (pixmaps, rectangles, lines, paths) and each primitive needs to first upload the data needed by the GPU to render it. It's like that because from the 2D API's perspective there's no object persistence. The API has no idea that you keep re-rendering the same button over and over again. All the API sees is another "draw rectangle" or "draw path" call, which it will complete.</div><div><br /></div><div>On each frame the same data is being copied to the GPU over and over again. It's not very efficient, is it? There's a limited number of optimizations you can do in this model. Some of the more obvious ones include (a sketch of the first one follows the list):<ul><li>adding unique identifiers to the pixmaps/surfaces and using those as keys in a texture cache, which allows you to create a texture for every pixmap/surface only once,</li><li>collecting data from each draw call in a temporary buffer and copying it all at once (e.g. in SkOSWindow::afterChildren, QWindowSurface::endPaint or such),</li><li>creating a shader cache for different types of fills and composition modes.</li></ul></div>
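A minimal sketch of the first idea, with hypothetical names (uploadToGpu stands in for whatever creates a GL texture from the image data):
<pre>
#include &lt;unordered_map&gt;

// Map a pixmap's cache key to the texture already created for it, so each
// pixmap is downloaded by the GPU once instead of on every draw call.
static std::unordered_map&lt;qint64, GLuint&gt; textureCache;

GLuint textureForPixmap(const QPixmap &amp;pm) {
    const qint64 key = pm.cacheKey(); // stable while the pixmap is alive
    auto it = textureCache.find(key);
    if (it != textureCache.end())
        return it-&gt;second;            // cache hit: no transfer this frame
    GLuint tex = uploadToGpu(pm.toImage()); // assumed helper
    textureCache.emplace(key, tex);
    return tex;
}
</pre>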
<div>But the real problem is that you keep making the GPU download the same data every frame, and unfortunately that is really hard to fix in this model.</div><div><br /></div><div><span class="Apple-style-span"><b>Fixing the model</b></span></div><div><br /></div><div>It all boils down to creating some kind of a store where the lifetime of an object/model is known. This way the scene knows exactly what objects are being rendered, and before rendering begins it can initialize and upload all the data the items need to be rendered. Then rendering is just that - rendering. Data transfers are limited to object addition/removal or significant changes to their properties, and then further limited by the fact that a lot of the state can always be reused. Note that trivial things like changing the texture (e.g. on hover/push) don't require any additional transfers, and things like translations can be limited to just two floats (translation in x and y), which are usually shared by multiple primitives (e.g. in a pushbutton they would be used by the background texture and the label texture/glyphs).</div><div><br /></div><div>It would seem like the addition of QGraphicsView was a good time to change the 2D model, but that wasn't really possible because people like their QPainter. No one likes it when a tool they have been using for a while and are fairly familiar with is suddenly taken away. Completely changing the model required a more drastic move.</div><div><br /></div><div><span class="Apple-style-span"><b>QML and scene-graph</b></span></div><div><br /></div><div>QML fundamentally changes the way we create interfaces and it's very neat. From the API perspective it's not much different from JavaFX, and one could argue which one is neater/better, but QML allows us to almost completely get rid of the old 2D rendering model and that's why I love it! A side-effect of moving to QML is likely the most significant change we've made to accelerated 2D in a long time. The new <a href="http://labs.qt.nokia.com/2010/10/08/qt-scene-graph-round-2/">Qt scene graph</a> is a very important project that can make a huge difference to the performance, look and feel of 2D interfaces.</div><div>Give it a try. If you don't have OpenGL working, no worries, it will work fine with Mesa3D on top of llvmpipe.</div><div><br /></div><div>A nice project would be doing the same in web engines. We have all the info there, but we decompose it into the draw line, draw path, draw rectangle, draw image calls. Short of the canvas object, which needs the old-style painters, everything is there to make accelerated web engines a lot better at rendering the content.</div><div><br /></div>

<b>Intermediate representation again</b> (2010-10-28)<br />
<br />
<div>I wrote about intermediate representation a few times already, but I've been blogging so rarely that it feels like eons ago. It's an important topic and one that is made a bit more convoluted by Gallium.</div><div><br /></div><div>The intermediate representation (IR) we use in Gallium is called Tokenized Gallium Shader Instructions, or TGSI. In general when you think about IR you think about some middle layer. In other words you have a language (e.g. C, GLSL, Python, whatever) which is being compiled into some IR, which then is transformed/optimized and finally compiled into some target language (e.g. x86 assembly).</div><div><br /></div><div>This is not how Gallium and TGSI work or were meant to work. We realized that people were making that mistake, so we tried to back-pedal on the usage of the term IR when referring to TGSI and started calling it a "transport" or "shader interface", which better describes its purpose but is still pretty confusing. TGSI was simply not designed as a transformable representation. It can be done, but it's a lot like a dance-off at a geek conference - painful and embarrassing for everyone involved.</div><div><br /></div><div>The way it was meant to work was:</div><div><br /></div><div>Language -> TGSI -> [ GPU-specific IR -> transformations -> GPU ]</div><div><br /></div><div>with the parts in the brackets living in the driver. Why like that?
Because GPUs are so different we thought each of them would require its own transformations and would need its own IR to operate on. Because we're not compiler experts and didn't think we could design something that would work well for everyone. Finally, and most importantly, because it's how Direct3D does it. The Direct3D functional specification is unfortunately not public, which makes it a bit hard to explain.</div><div><br /></div><div>The idea behind it was great in both its simplicity and its overabundance of misplaced optimism. All graphics companies had Direct3D drivers; they all had working code that compiled from Direct3D assembly to their respective GPUs. "If TGSI is a lot like Direct3D assembly then TGSI will work with every GPU that works on Windows, plus wouldn't it be wonderful if all those companies could basically just take that Windows code and end up with a working shader compiler for GNU/Linux?!", we thought. Logically that would be a nice thing. Sadly companies do not like to release parts of their Windows driver code as Free Software; sometimes it's not even possible. Sometimes Windows and Linux teams never talk to each other. Sometimes they just don't care. Either way, our lofty goal of making the IR so much easier and quicker to adopt took a pretty severe beating. It's especially disappointing since if you look at some of the documentation, e.g. for the <a href="http://developer.amd.com/gpu/ATIStreamSDK/assets/ATI_Intermediate_Language_(IL)_Specification_v2d.pdf">AMD Intermediate Language</a>, you'll notice that this stuff is essentially Direct3D assembly, which is essentially TGSI (and most of the parts that are in AMD IL and not in TGSI are parts that will be added to TGSI). So they have this code. In the case of AMD it's even sadder because the crucial code that we need for OpenCL right now is an OpenCL C -> TGSI LLVM backend, which AMD already has for their IL. Some poor schmuck will have to sit down and write more or less the same code. Of course if it's going to be me it's "poor, insanely handsome and definitely not a schmuck".</div><div><br /></div><div>So we're left with Free Software developers who don't have access to the Direct3D functional spec and who are being confused by an IR which is unlike anything they've seen (pre-declared registers, typeless...) and which, on top of it, is not easily transformable. TGSI is very readable, simple and pretty easy to debug though, so it's not all negative. It's also great if you never have to optimize or transform its structure, which unfortunately is rather rare.</div><div>If we abandon the hope of having the code from Windows drivers injected into the GNU/Linux drivers, it becomes pretty clear that we could do better than TGSI. Personally I just abhor the idea of rolling out our own IR - an IR in the true sense of that word. Crazy as it may sound, I'd really like my compiler stuff to be written by compiler experts. It's the main reason why I really like the idea of using LLVM IR as our IR.</div><div><br /></div><div>Ultimately it's all kind of taking the "science" out of "computer science" because it's very speculative. We know AMD and NVIDIA use it to some extent (and there's an open PTX backend for LLVM), we like it, we use it in some places (llvmpipe), and the people behind LLVM are great and know their stuff, but how hard is it to use LLVM IR as the main IR in a graphics framework, and how hard is it to code-generate directly from it for GPUs - we don't really know.
It seems like a really good idea, good enough for folks from <a href="http://www.lunarg.com/">LunarG</a> to give it a try, which I think is really what we need: a proof that it is possible and doesn't require sacrificing any farm animals. Which, as a vegetarian, I'd be firmly against.</div><div><br /></div>

<b>Graphics drivers</b> (2010-07-01)<br />
<br />
<div>There are only two tasks harder than writing Free Software graphics drivers. One is running a successful crocodile petting zoo, the other is wireless bungee jumping.</div><div><br /></div><div>In general writing graphics drivers is hard. The number of people who can actually do it is very small, and the ones who can do it well are usually doing it full-time already. Unless the company those folks are working for supports open drivers, the earliest someone can start working on open drivers is the day the hardware is officially available. That's already about 2 years too late, maybe a year if the hardware is just an incremental update. Obviously not a lot of programmers have the motivation to do that - a small subset of the already very small subset of programmers who can write drivers. Each of them is worth their weight in gold (double, since they're usually pretty skinny).</div><div><br /></div><div>Vendors who ship hardware with GNU/Linux don't see a lot of value in having open graphics drivers on those platforms. All the Android phones are a good example of that. Then again, Android decided to use Skia, which right now is simply the worst of the vector graphics frameworks out there (Qt and Cairo being the two other notables), plus having to wrap every new game port (all that code is always C/C++) with the NDK is a pain. So the lack of decent drivers is probably not their biggest problem.</div><div><br /></div><div>MeeGo has a lot better graphics framework, but we're ways off before we'll see anyone shipping devices running it; Nokia in the past did "awesome" things like release GNU/Linux devices with a GPU but without any drivers for it (n800, n810), and Intel's Poulsbo/Moorestown graphics driver woes are well-known. On the desktop side it seems that Intel folks are afraid that porting their drivers to Gallium will destabilize them. Which is certainly true, but the benefits of doing so (multiple state trackers, cleaner driver code, being able to use Gallium debugging tools like trace/replay/rbug and nicely abstracting the API code from drivers) would be well worth it and hugely beneficial to everyone.</div><div><br /></div><div>As it stands we have an excellent framework in Gallium3D but not a lot of open drivers for it. Ironically it's our new software driver, llvmpipe, or more precisely a mutation of it, which has the potential to fix some of our GPU issues in the future. With the continuing generalization of GPUs, my hope is that all we'll need is DRM code (memory management, kernel modesetting, command submission) and an LLVM->GPU code generator. It's not exactly a trivial amount of code by any stretch of the imagination, but it's smaller than what we'd need right now, and once it was done for one or two GPUs it would certainly become a lot simpler. Plus GPGPU will eventually make the latter part mandatory anyway.
Having that would get us a working driver right away, and after that we could play with texture sampling and vertex paths (which will likely stay as dedicated units for a while) as optimizations.</div><div><br /></div><div>Of course it would be even better if a company shipping millions of devices with GNU/Linux wanted a working graphics stack from the bottom up (memory management, kernel modesetting, Gallium drivers with multiple state trackers, optimized for whatever 2D framework they use), but that would make sense. "Is that bad?" you ask. Oh, it's terrible, because the GNU/Linux graphics stack on all those shipped devices is apparently meant to defy logic.</div><div><br /></div>

<b>No hands!</b> (2010-06-21)<br />
<br />
<div>I've spent the last few weeks in London with Keith Whitwell and José Fonseca, the greatest computer graphics experts since the guy who invented the crystal ball in the Lord of the Rings, the one that allowed you to see your future in full immersive 3D. Which begs the question: do you think there was an intermediate step to those? I mean a crystal ball that showed you your future but in 2D, all pixelated and in monochrome. Like Pong. You got 8x8 pixels and then the wizard/witch really needed to get their shit together to figure out what's going on, e.g.</div><div>- "This seems to be the quintessential good news/bad news kind of a scenario: good news - you'll be having sex! Bad news - with a hippo.",</div><div>- "There's no hippos in Canada...",</div><div>- "Right, right, I see how that could be a problem. How about elephants with no trunks?",</div><div>- "None of those",</div><div>- "Walls maybe?",</div><div>- "Yea, got those",</div><div>- "Excellent! Well then you'll probably walk into one.".</div><div>Just once I'd like to see a movie where they're figuring this stuff out. If you'll shoot it, I'll write it!</div><div><br /></div><div>Getting back on track: we've got a lot of stuff done. I've worked mainly on the Gallium interface, trying to update it a bit for modern hardware and APIs.</div><div><br /></div><div>First of all, I've finally fixed geometry shaders in our software driver. The interface for geometry shaders is quite simple, but the implementation is a bit involved and I'd never done it properly. Until now, that is. In particular, emitting multiple primitives from a single geometry shader never worked properly, adjacency primitives never worked, drawing an indexed vertex buffer with a geometry shader never worked properly, and texture sampling in GS was just never implemented. All of that is fixed and looks very solid.</div><div><br /></div><div>Second of all, stream out, aka transform feedback, is in - both in terms of interface additions and the software implementation. Being able to process a set of vertices and stream them out into a buffer is a very useful feature. I still enjoy vector graphics, and there processing a set of data points is a very common operation.</div><div><br /></div><div>I've also added the concept of resources to our shader representation, TGSI. We needed a way of arbitrarily binding resources to samplers and defining load/sample formats for resources in shaders. The feature comes with a number of new instructions.
The most significant in terms of their functionality are the load and gather4 instructions, because they represent the first GPGPU-centric features we've added to Gallium (obviously both are very useful in graphics algos as well).</div><div><br /></div><div>Staying in TGSI land, we have two new register files: I've added immediate constant arrays and temporary arrays. Up till now we really had no way of declaring a set of temporaries or immediates as an array. We would imitate that functionality by declaring ranges, e.g. "DCL TEMP[0..20]", which becomes very nasty when a shader is using multiple indexable arrays; it also makes bounds checking almost impossible.</div><div><br /></div><div>A while back Keith wrote Graw, which is a convenience state tracker that exposes the raw Gallium interface. It's a wonderful framework for testing Gallium. We have added a few tests for all the above-mentioned features.</div><div><br /></div><div>All of those things are more developer features than anything else though. They all need to be implemented in state trackers (e.g. GL) and drivers before they really go public, and that's really the next step.</div><div><br /></div>

<b>Geometry Processing - A love story</b> (2010-04-27)<br />
<br />
<div>It's still early in the year but I feel like this is the favorite for the "best computer science related blog title of 2010". I'd include 2009 in that as well, but realistically speaking I probably wrote something equally stupid last year and I don't have the kind of time required to read what I write.</div><div><br /></div><div>Last week I merged the new geometry processing pipeline and hooked it into our new software driver, llvmpipe. It actually went smoother than I thought it would, i.e. it just worked. I celebrated by visiting the ER, a harsh reminder that my lasting fascination with martial arts will be the end of me (it will pay dividends once llvmpipe becomes self-aware though).</div><div><br /></div><div>The codepaths are part of the Gallium3D Draw module. It's fairly simple once you get to it. At a draw call we generate the optimal vertex pipeline for the currently set state. LLVM makes this stuff a lot easier. First of all we get the human-readable LLVM IR, which is a lot easier than assembly to go over if something goes wrong. Running LLVM optimization passes over a fairly significant amount of code is a lot easier than having to hand-optimize assembly code-generation. Part of it is that geometry processing is composed of a few fairly distinct phases (e.g. fetch, shade, clip, assemble, emit), and given that they all can and will change depending on the currently set state, it's difficult to code-generate optimal code by hand. That is, unless you have a compiler framework like LLVM. Then you don't care and hope LLVM will bail you out in cases where you end up doing something stupid for the sake of code simplicity.</div><div><br /></div><div>A good example of that is our usage of allocas for all variables. Initially all variables were in registers, but I switched the code to use allocas for a very simple reason: doing flow control in SOA mode when everything was in registers was tough; it meant keeping track of the PHI nodes. Not to mention that we had no good way of doing indirect addressing in that scheme. Using allocas makes our code a lot simpler.
In the end, thanks to LLVM optimization passes (mem2reg in this case), virtually every usage of allocas is eliminated and replaced with direct register access.</div><div><br /></div><div>Anyway, with the new geometry paths the improvements are quite substantial. For vertex-processing-dominated tests (like geartrain, which went from 35fps to 110fps) the improvements are between 2x and about 6x; for cases which are dominated by fragment processing it's obviously a lot less (e.g. openarena went from about 25fps to about 32fps). All in all llvmpipe is looking real good.</div>

<b>The software renderer</b> (2010-03-11)<br />
<br />
<div>There's a number of great software renderers out there, SwiftShader and DirectX WARP being the two widely known ones. Unfortunately GNU/Linux, and Free Software in general, didn't have a modern software renderer. That's about to change thanks to a project started by José Fonseca. The project is known as llvmpipe (obviously a development name). José decided that the way forward is writing a driver for Gallium which code-generates, at run time, the rendering paths from the currently set state. LLVM is used to code-generate and optimize the code.</div><div><br /></div><div>Besides the idea itself, my favorite part of the driver is the software rasterizer. Keith Whitwell, Brian Paul and José implemented a threaded, tiled rasterizer which performs very nicely and scales pretty well with the number of cores. I'm sure one of them will write more about it when they have a bit of spare time.</div><div><br /></div><div>Currently the entire fragment pipeline is code-generated. Over the last two weeks I've been implementing the vertex pipeline, which I'm hoping to merge soon (hence <a href="http://ktown.kde.org/~zrusin/zack/IMG_0583.JPG">the light smile</a>). Code-generating the entire vertex pipeline isn't exactly trivial, but one can divide it into individual pieces and that makes it a bit easier. Start with the vertex shader, then go back and do the fetch and translate, then again move forward and do the emit, then go back and do the viewport transformations and clipping and so on, and finally combine all the pieces together.</div><div><br /></div><div>In between working on the vertex pipeline I've been filling in some missing pieces in the shader compilation, in particular the control flow. We use the SOA layout, which always makes control flow a bit tricky. I've just committed support for loops, and the only thing left is support for subroutines in shaders, so I think we're in pretty good shape. We can't rock the speedos quite yet, but we're getting there. It's my new measurement for software quality - could it pull off the speedos look?
There are few things in this world that can.</div><div><br /></div><div>Keeping in mind that we haven't even started optimizing it yet, as we're still implementing features, the driver on my Xeon E5405 runs the anholt.dm_68 OpenArena demo at 25fps (albeit with some artifacts), which is quite frankly pretty impressive, especially if you compare it to the old Mesa3D software renderer, which runs the same demo on the same machine at 3.5fps.</div><div><br /></div>

<b>3D APIs</b> (2010-02-10)<br />
<br />
<div>Do you know where babies come from? It's kinda disturbing, isn't it? I blame it on global warming. Stay with me here and I'll lead you through this. 1) Storks deliver babies (fact), 2) I, not being very careful about how I word things, cause global warming (also a fact), 3) global warming kills storks (my theory), 4) evolution takes over and replaces storks with the undignified system that we have right now (conclusion).</div><div><br /></div><div>Speaking about not being very careful about how I word things, how about me announcing those DirectX state trackers coming to GNU/Linux in my last blog, eh? By the way, now that's a good transition. The announcement was very exciting, albeit a little confusing, in particular to myself. Likely because it's really not what I meant. Mea culpa!</div><div><br /></div><div>What I was talking about is adding features to the Gallium interfaces that will allow us to support those APIs, not about actually releasing any of those APIs as state trackers for Gallium. Gallium abstracts communication with the hardware, and support for OpenGL 3.2 + a few extensions and for Direct3D 10.x requires essentially the same new interfaces in Gallium. So support for Direct3D 10.x means that we can support OpenGL 3.2 + those extra extensions. I don't think it's a big secret that while we do have and are shipping a Direct3D 9 state tracker, and are working on a Direct3D 10 state tracker, to my knowledge we simply don't have any plans to release that code. It's Windows specific anyway, so it wouldn't help the Free Software community in any shape or form.</div><div><br /></div><div>I know there's that misguided belief that a Direct3D state tracker would all of a sudden mean that all Windows games work on GNU/Linux. That's just not true; there's a lot more to an API than "draw those vertices and shade it like that": windowing system integration, file I/O, networking, input devices, etc. If software is DirectX specific then it's pretty clear that the authors didn't use X11 and POSIX APIs to implement all that other functionality. Those windows.h/windowsx.h includes are pretty prominent in all of those titles. So one would have to wrap the entire Windows API to be able to play DirectX games. That's what Wine is about, and they're better off using OpenGL like they are now.</div><div><br /></div><div>The issue with gaming on GNU/Linux isn't the lack of any particular API, it's the small gaming market. If an IHV released a device that 500 million people would buy, and the only programming language supported was COBOL and the only 3D API was PHIGS, then we would see lots of games in COBOL using PHIGS.
Sure, the developers wouldn't like it, but the brand new Ferraris that the company could buy, after the release of their first title, for everyone, including all the janitors, would make up for that.</div><div><br /></div><div>The bottom line is that the majority of games could easily be ported to GNU/Linux, since the engines usually are cross-platform anyway, at the very least to support different consoles. DirectX, libgcm, OpenGL ES... either they're all already supported or support could easily be added. It's simply that it's not cost-effective to do so. A good example is the Palm Pre, which is running GNU/Linux, and even before the official release of their 3D SDK (PDK), which uses SDL, they already got EA Mobile to port Need for Speed, The Sims and others to their device.</div><div>On the other hand, if a title supports only DirectX or only libgcm it's usually because it's exclusive to the given platform, and the presence of the same API on other platforms will not make the vendor suddenly release the title for the new system.</div><div><br /></div><div>So yea, Direct3D on GNU/Linux simply means nothing. We won't get more games, it won't make it easier to port the already cross-platform engines, it won't allow porting of the exclusive titles and it will not fill any holes in our gaming SDKs. Besides, ethically speaking, we should support OpenGL, not a closed API from a closed platform.</div><div><br /></div>

<b>New features</b> (2010-02-08)<br />
<br />
<div>Lately we've been working a lot on Gallium but we haven't been publicly talking about it enough. I, personally, spent a considerable amount of time last week coming up with my strategy for the inevitable monster invasion. In general I always thought that if, sorry, "when" the monsters attack I'd be one of the very first ones to go, on account of my "there's a weird noise coming from that pitch black spot over there, I better check it out" tendency. Obviously that's not ideal, especially if the need to repopulate the world arises; I just really feel like I should be present for that. In conclusion I'll quickly go over some things we've been doing with Gallium. Admittedly that wasn't one of my best transitions.</div><div><br /></div><div>There are three new major APIs that we want to support in Gallium: OpenCL 1.0, DirectX 10.x and DirectX 11. DirectX 10.x was somewhat prioritized of late because it's a good stepping stone for a lot of the features that we wanted.</div><div><br /></div><div>Two of my favorites are the geometry shaders and the new blending functionality. I want to start with the latter, which Roland worked on, because it has an immediate impact on Free Software graphics.</div><div><br /></div><div>One of the things that drives people bonkers is text rendering, in particular subpixel rendering or, if you're dealing with Xrender, component alpha rendering. Historically both GL and D3D provided fixed-function blending offering a number of ways of combining source colors with the contents of a render buffer. Unfortunately the inputs to the blending units were constrained to a source color, a destination color or constants that could be used in their place. That was unfortunate because the component alpha math requires two distinct values: the source color and the blending factor for the destination colors (assuming the typical Porter&Duff over rendering). D3D 10 dealt with it by adding functionality called dual-source blending (OpenGL will get an extension for it sometime soon). The idea is that the fragment shader may output two distinct colors, which will be treated as two independent inputs to the blending units.</div><div>Thanks to this we can support subpixel rendering in a single pass with a plethora of compositing operators. Whether you're a green ooze trying to conquer Earth or a boring human, you have to appreciate 2x faster text rendering.</div>
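For the curious, with the GL extension that later exposed this (ARB_blend_func_extended), the single-pass component-alpha setup looks roughly like the sketch below: the fragment shader is assumed to write the glyph color to output index 0 and the per-channel coverage to output index 1, and the blender then computes dst = src0 + dst * (1 - src1):
<pre>
// Route the two fragment shader outputs to the two blender inputs
// ("color" and "coverage" are assumed output names in the shader).
glBindFragDataLocationIndexed(program, 0, 0, "color");
glBindFragDataLocationIndexed(program, 0, 1, "coverage");
glLinkProgram(program);

// Porter&Duff "over" with a per-channel blend factor, in one pass.
glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC1_COLOR);
</pre>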
<div><br /></div><div>Geometry shaders introduce a new shader type, run after the vertices have been transformed (after the vertex shader) but before color clamping, flat shading and clipping.</div><div><br /></div><div>Along with the support for geometry shaders, we have added two major features to TGSI ("Tokenized Gallium Shader Instructions", our low-level graphics language).</div><div>The first one is support for properties. Geometry shaders in Gallium introduced the notion of state-aware compilation. This is because the compilation of a geometry shader is specific to, at the very least, the input and output primitives it respectively operates on and outputs. We deal with it by injecting PROPERTY instructions into the program, like so:</div><code><div>GEOM</div><div>PROPERTY GS_INPUT_PRIMITIVE TRIANGLES</div><div>PROPERTY GS_OUTPUT_PRIMITIVE TRIANGLE_STRIP</div></code><div>(the rest of the geometry shader follows)</div><div>The properties are easily extendable and are the perfect framework to support things like the work-group size in OpenCL.</div><div>The second feature is support for multidimensional inputs in shaders. The syntax looks as follows:</div><div><code>DCL IN[][0], POSITION</code></div><div><code>DCL IN[][1], COLOR</code></div><div>which declares two input arrays. Note that the size of the arrays is implicit, defined by the GS_INPUT_PRIMITIVE property.</div><div><br /></div><div>It's nice to see this framework progressing so quickly.</div><div><br /></div><div>Summing up, this is what yin yang is all about. Do you know what yin yang is? Of course you don't, you know nothing of taoism. Technically neither do I, but a) it sounds cool, b) I've been busy coming up with a "monster invasion" contingency plan, so I couldn't be bothered with some hippy concepts, c) the previous two points are excellent.</div><div><br /></div>

<b>Languages and vGPU</b> (2009-11-30)<br />
<br />
<div>Now that, despite all the protein supplements, I've lost weight again and got myself a beard (grown, not bought) I feel like I've really nailed the homeless pirate look.</div><div><br /></div><div>In the spirit of camaraderie I spent a few minutes yesterday fixing one of the longest-standing bugs we had in KDE, which was the inability to spell-check Indic, Middle Eastern, Asian and a few other languages. The issue was that the text segmentation algorithm, due to my uncanny optimism, was defining words as "things that are composed of letters". That isn't even true for English (e.g. "doesn't" is a word that's composed of more than just letters, and breaking it up at the non-letter produces two non-words) and it was utterly and completely wrong for pretty much every language that isn't using a Latin-like alphabet. I switched the entire codebase to use the Unicode text boundary specification, tr29-11, as implemented in QTextBoundaryFinder, and it's fixed now.</div>
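As a rough illustration of what the fix amounts to, word extraction with QTextBoundaryFinder looks something like this sketch (Qt 4 API, simplified compared to the real spell-checker code):
<pre>
#include &lt;QTextBoundaryFinder&gt;
#include &lt;QStringList&gt;

// Split text into words at tr29 boundaries instead of at non-letters,
// so "doesn't" survives as a single word.
QStringList wordsOf(const QString &amp;text) {
    QStringList words;
    QTextBoundaryFinder finder(QTextBoundaryFinder::Word, text);
    int start = 0;
    while (finder.toNextBoundary() != -1) {
        const int pos = finder.position();
        if (finder.boundaryReasons() &amp; QTextBoundaryFinder::EndWord)
            words &lt;&lt; text.mid(start, pos - start); // a word just ended
        if (finder.boundaryReasons() &amp; QTextBoundaryFinder::StartWord)
            start = pos;                           // a word just started
    }
    return words;
}
</pre>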
That's essentially what the fix does: I switched the entire codebase to the Unicode text boundary specification, tr29-11, as implemented by QTextBoundaryFinder, and it's fixed now.</div><div><br /></div><div>In other news, we've released the Gallium3D driver for <a href="http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/svga">VMware's vGPU</a> (right now called svga, but we'll rename it as soon as we have some time). It means that we can accelerate all the APIs supported by Gallium in the virtual machine, which is fairly impressive.</div><div><br /></div>Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com11tag:blogger.com,1999:blog-27901662.post-71367169644293958892009-09-01T20:56:00.004+01:002009-09-01T21:05:13.810+01:00Accelerating X with Gallium<div>Spring is coming. I'm certain of it because I know for a fact that it's not Spring right now and seasons have a tendency to wrap around. Obviously that means I need to work a bit on the acceleration code for X11. It's just like going to a doctor: sure, it sucks, but at the back of your mind you know you need to do it.</div><div><br /></div><div>So what's different about this code? Well, it's not an acceleration architecture. We're reusing Exa. It's the implementation.</div><div><br /></div><div>Currently if you want to accelerate X11, or more precisely Xrender, you implement some acceleration architecture in your X11 driver. It may be Exa, Uxa, Xaa or something else (surely also with an X and an A in the name, because that seems to be a requirement here). Xrender is a bit too complicated for any 2D engine out there, which means that any code that accelerates Xrender will use the same paths OpenGL acceleration does. In turn this means that if we had an interface that exposed how the hardware works, we could put OpenGL and Xrender acceleration on top of the same driver. One driver, accelerating multiple APIs.</div><div><br /></div><div>That's exactly what Gallium is all about, so it shouldn't be a big surprise that this is what we're doing. We're simply writing an Xorg state tracker for Gallium in which Exa is implemented through the Gallium interface.</div><div><br /></div><div>So with a Gallium driver, besides OpenGL, OpenGL ES and OpenVG, you'll get X11 acceleration. I'm specifically saying X11 here because after implementing Exa we'll move to accelerating Xv, which will give us a nice set of ops that are accelerated in a typical X application.</div><div><br /></div>Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com21tag:blogger.com,1999:blog-27901662.post-65498330576628140942009-08-14T19:04:00.003+01:002009-08-14T19:23:53.937+01:00More 2D in KDEAn interesting question is: if the raster engine is faster at the gross majority of graphics and the Qt X11 engine falls back on quite a few features anyway, why shouldn't we make raster the default until the OpenGL implementations/engine are stable enough?<div><br /></div><div>There are two technical reasons for that. Actually that's not quite true; there are more, but these two are the ones that will bother a lot of people.</div><div><br /></div><div>The first is that we'd kill KDE performance over the network. Everyone who uses X/KDE over the network would suddenly realize that their setup became unusable, as their sessions would be sending images for absolutely everything, all the time... 
As you can imagine, institutions/schools/companies who use KDE in exactly this way wouldn't be particularly impressed if updating their installations suddenly rendered them unusable.</div><div><br /></div><div>The second reason is that I kinda like being able to read and sometimes even write text. Text tends to be pretty helpful during the process of reading. Especially things like emails and web browsing get a lot easier with text. I think a lot of people share that opinion with me. To render text we need fonts, which in turn are composed of glyphs. To make text rendering efficient, we need to cache the glyphs. When running with the X11 engine we render the text using Xrender, which means that there's a central process that can technically manage all the glyphs used by applications running on a desktop. That process is the Xserver. With the raster engine we take the Xserver out of the equation and suddenly every single application on the desktop needs to cache the glyphs for all the fonts it's using. This implies that every application suddenly uses megs and megs of extra memory. They all need to individually cache all the glyphs, even if all of them use the same font. It tends to work OK for languages with few glyphs, e.g. 40+ for English (26 letters + 10 digits + a few punctuation marks). It doesn't work at all for languages with more. So unless it's decided that KDE can only be used by people whose languages' alphabets contain about 30 letters or fewer, I'd hold off on making the raster engine the default.</div><div><br /></div><div>While the latter problem could be solved with some clever shared memory usage, or by forcing Xrender on top of the raster engine (actually I shouldn't state that as a fact; I haven't looked at font rendering in the raster engine in a while and maybe that was implemented lately), it's worth noting that the X11 engine is simply fast enough not to quibble over a few frames one way or the other. Those few frames that you'd gain would mean an unusable KDE for others.</div><div><br /></div><div>And if you think that neither of the above two points bothers you and you still want to use the raster engine by default, you'll have to understand that I just won't post instructions on how to do that here. If you're a developer, you already know how to do it, and if not there are trivial ways of finding it out from the Qt sources. If you're not a developer then you really should stick globally to the defaults and simply test applications with the -graphicssystem switches. </div>Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com27tag:blogger.com,1999:blog-27901662.post-46598083676445469292009-08-14T01:21:00.007+01:002009-08-14T05:31:18.564+01:002D in KDESo it seems a lot of people are wondering about this. By this I mean why dwarfs always have beards. Underground, big ears would probably be a better evolutionary trait, but elves got dibs on those.<br /><br />Qt, and therefore KDE, deals with three predominant ways of rendering graphics. I don't feel like bothering with transitions today, so find your own way from beards and dwarfs to Qt/KDE graphics. Those three ways are:<br /><ul><li>On the CPU with<span style="font-weight: bold;"> no </span>help from the GPU, using the raster engine</li> <li>Using X11/Xrender with the X11 engine</li> <li>Using OpenGL with the OpenGL engine</li></ul>The decision about which one of those engines gets used is made in a couple of ways.<br /><br />First there's the default global engine. 
This is what you get when you open a QPainter on a QWidget and its derivatives. So whenever you have code like<br /><code><br />void MyWidget::paintEvent(QPaintEvent *)<br />{<br /> QPainter p(this);<br /> ...<br />}<br /></code><br />you know the default engine is being used. The rules for that are as follows:<br /><ul><li>GNU/Linux : the X11 engine is used</li><li>Windows : the raster engine is used</li><li>Application has been started with the -graphicssystem= option :<br /><ul><li>-graphicssystem=native : the rules above apply</li> <li>-graphicssystem=raster : the raster engine is used by default</li> <li>-graphicssystem=opengl : the OpenGL engine is used by default</li> </ul></li></ul>Furthermore, depending on which QPaintDevice is being used, different engines will be selected. The rules for that are as follows:<br /><ul><li>QWidget : the default engine is used (picked as described above)</li><li>QPixmap : the default engine is used (picked as described above)</li><li>QImage : the raster engine is used (always, no matter which engine has been selected as the default)</li><li>QGLWidget, QGLFramebufferObject, QGLPixelBuffer : the OpenGL engine is used (always, no matter which engine has been selected as the default)</li></ul>Now here's where things get tricky: if an engine doesn't support certain features it will have to fall back to the one engine that is sure to work on all platforms and has all the features required by the QPainter API - that is the raster engine. This was done to ensure that all engines have the same feature set.<br /><br />While the OpenGL engine should in general never fall back, that is not the case for X11, where fallbacks do happen. One of the biggest immediate optimizations you can do to make your application run faster is to make sure that you don't hit any fallbacks. A good way to check for that is to export QT_PAINT_FALLBACK_OVERLAY and run your application against a debug build of Qt; this way the region that caused a fallback will be highlighted (the other method is to set a gdb breakpoint on QPainter::draw_helper). Unfortunately this will only detect fallbacks in Qt.<br /><br />All of those engines also use drastically different methods of rendering primitives.<br />The raster engine rasterizes primitives directly.<br />The X11 engine tessellates primitives into trapezoids, because Xrender composites trapezoids.<br />The GL engine either uses the stencil method (described in this blog a long time ago) or shaders to decompose the primitives, and the rest is handled by the normal GL rasterization rules.<br /><br />Tessellation is a fairly complicated process (also described long ago in this blog). To handle degenerate cases, the first step of this algorithm is to find the intersections of the primitive. In the simplest form, think about rendering the figure 8. There's no way of testing whether a given primitive is self-intersecting without actually running the algorithm.<br />To render with anti-aliasing on the X11 engine we have to tessellate. We have to tessellate because Xrender requires trapezoids to render anti-aliased primitives. So if the X11 engine is being used and the rendering is anti-aliased, whether you're rendering a line, a heart or a moose, we have to tessellate.<br /><br />Someone was worried that it's an O(n^2) process, which is of course completely incorrect. We're not using a brute-force algorithm here; the process is O(n log n) (think sweep line, not testing every pair of segments). 
O(n log n) complexity on the CPU side is something that both the raster and X11 engines need to deal with. The question is what happens next and what happens in the subsequent calls.<br /><br />While the raster engine can deal with all of it while rasterizing, the X11 engine can't. It has to tessellate, send the results to the server and hope for the best. If the X11 driver doesn't implement composition of trapezoids (and realistically speaking most of them don't) this operation is done by Pixman. In the raster engine the sheer spatial locality almost forces better cache utilization than what could realistically be achieved by the "application tessellates -> server rasterizes" process that the X11 engine has to go through. So without all-out acceleration the X11 engine can't compete with the raster engine in this case. While simplifying a lot, it's worth remembering that in terms of cycles a register access likely costs at most 1 cycle, a hit in the L1 data cache about 3 cycles, L2 about 14 cycles, and main memory about 240 cycles. So for CPU-based graphics, efficient memory utilization is one of the most crucial undertakings.<br /><br />With that in mind, this is also the reason why a heavily optimized, purely software-based OpenGL implementation would be a lot faster at 2D graphics than the raster engine. In terms of memory usage the OpenGL pipeline is simply a lot better at handling memory than the API QPainter provides.<br /><br />So what you should take away from this is that if you're living in the perfect world, the GL engine is so much better than absolutely anything else Qt/KDE have that it's not even funny; X11 follows it and the raster engine trails far behind.<br /><br />The reality you're dealing with is that when using the X11 engine, due to the fallbacks, you will also be using the raster engine (either on the application side with the Qt raster engine or on the server side with Pixman), and unfortunately in this case "the more the better" doesn't apply and you will suffer tremendously. Our X11 drivers don't accelerate big chunks of Xrender, the applications don't have good means of testing what is accelerated, so what Qt does is simply not use many of its features. So even if a driver accelerated, for example, gradient fills and source picture transformations, it wouldn't help you, because Qt simply doesn't use them and always falls back to the raster engine. It's a bit of a chicken-and-egg problem - Qt doesn't use a feature because it's slow, and it's slow because no one uses it.<br /><br />The best solution to that conundrum is to try running your applications with -graphicssystem=opengl and report any problems you see to both the Qt software and Mesa3D/DRI bugzillas, because the only way out is to make sure that both our OpenGL implementations and the OpenGL usage in the applications' rendering code work efficiently and correctly. The quicker we get the rendering stack to work on top of OpenGL, the better off we'll be.Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com28tag:blogger.com,1999:blog-27901662.post-72296845623562203032009-05-15T23:10:00.003+01:002009-05-15T23:14:45.187+01:00OpenGL ESTwo facts: one you didn't need to know, the other you should know. I'm the World Champion in being Vegetarian. Did you know that? Of course not, you probably didn't even know it's a competition. But it is. And I won it. My favorite vegetable? Nachos.<br />Technically they're a fourth- or fifth-generation vegetable. 
But vegetables are like software: you never want to work with them on the first iteration. You wouldn't want to eat corn as-is, but mash it, broil/fry it, and you've got something magical going on in the form of tortilla chips.<br /><br />The other fact: the OpenGL ES state trackers for Gallium just went public. That includes both OpenGL ES 1.x and 2.x. Brian just pushed them into the opengl-es branch. Together with OpenVG they should be part of the Mesa3D 7.6 release.<br /><br />At this point Mesa3D almost becomes the Khronos SDK, and hopefully soon we'll add an OpenCL state tracker and start working on bumping the OpenGL code to version 3.1.Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com9tag:blogger.com,1999:blog-27901662.post-52126786979951742082009-05-10T02:10:00.004+01:002009-05-10T02:17:11.997+01:00OpenVG ReleaseI've been procrastinating lately. Where I define procrastination as: doing actual work instead of writing blog entries. "Can you do that? Redefine words because you feel like it?" Oh, sure you can. As long as you don't care about anyone understanding what you're saying, it's perfectly fine. In fact it's good for you: the fewer people understand you, the more likely it is that they'll classify you as a genius. Life is awesome like that. That's right, you come here for technical stuff and get blessed with some serious soul-searching stuff. Enjoy.<br /><br />We have released the OpenVG state tracker for Gallium.<br /><br />It's a complete implementation of OpenVG 1.0. While my opinion on 2D graphics and the long-term usefulness of OpenVG has changed drastically over the last year, I'm very happy we were able to release the code. After the Mesa3D 7.5 release we're going to merge it into master, and 7.6 will be the first release that includes both OpenGL and OpenVG.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://ktown.kde.org/%7Ezrusin/openvg.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 335px; height: 379px;" src="http://ktown.kde.org/%7Ezrusin/openvg.png" alt="" border="0" /></a><br />[You're exuberant about that]. I figured I'd start including stage directions for the blog so that you know how you feel about reading this.<br /><br />I don't have much to write about the implementation itself. If you have a Gallium driver, then you're good to go and have accelerated 2D and 3D. It's great to see more state trackers being released for Gallium and this whole concept of "accelerating multiple APIs" on top of one driver architecture becoming a reality. [You love it and, with a smile, wave bye]Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com9tag:blogger.com,1999:blog-27901662.post-39295616276145882232009-03-23T02:01:00.002+00:002009-03-23T02:08:37.178+00:00Intermediate representationI've spent the last two weeks working from our office in London. The office is right behind the Tate Modern, with a great view of London, which is really nice. While there I was mainly trying to get the compilation of shaders rock solid for one of our drivers.<br /><br />In Gallium3D we're currently using the TGSI intermediate representation. It's a fairly neat representation, with instruction semantics matching those from the relevant GL extensions/specs. If you're familiar with D3D you know that there are subtle differences between seemingly similar instructions in the two APIs. 
Modern hardware tends to follow the D3D semantics more closely than the GL ones.<br /><br />Some of the differences include:<br /><ul><li>Indirect addressing offsets: in GL offsets are between -64 and +63, while D3D wants them >= 0.</li><li>Output to temporaries: in D3D certain instructions can only output to temporary registers, while TGSI doesn't have that requirement.</li><li>Branching: D3D has both if_comp and setp instructions, which can perform essentially any of the comparisons, while we imitate them with an SLT (set on less than) or similar plus an IF instruction with the semantics of the one in GL_NV_fragment_program2.</li><li>Looping.</li></ul>None of this is particularly difficult, but we had to implement almost exactly the same code in another driver of ours, and that makes me a little uneasy. There are a number of ways to fix this. We could change the semantics of TGSI to match D3D, we could implement transformation passes that operate on TGSI and do this transformation directly on the IR before it gets to the driver, or we could radically change the IR. Strangely enough it's solution #3 which, while appearing the most bizarre, is the one I like the most. That's because it goes hand in hand with the usage of LLVM inside Gallium.<br /><br />Especially because there are a lot of generic transformations that really should be done on the IR itself. A good example is code injection due to pieces of API state missing on a given GPU, e.g. shadow texturing (the compare mode, compare function and the shadow ambient value): if the samplers on the given piece of hardware don't support that state, it needs to be emulated by injecting code after each sampling operation that uses a sampler with the above-mentioned state set on it.<br /><br />Unfortunately right now the LLVM code in Gallium is far from finished. Due to looming deadlines on various projects I never got to spend the amount of time that is required to get it into shape.<br />I'll be able to piggyback some of the LLVM work on top of the OpenCL state tracker code, which is exciting.<br /><br />Also, has anyone noticed that the last two blogs had no asides or any humor in them? I'm experimenting with a "just what I want to say and nothing more" style of writing, wondering if it will indeed be easier to read.Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com26tag:blogger.com,1999:blog-27901662.post-16730551943528485032009-03-12T14:14:00.002+00:002009-03-12T14:21:54.510+00:00KDE graphics benchmarksThis is really a public service announcement. KDE folks, please stop writing "graphics benchmarks". It's especially pointless if you qualify your blog/article with "I'm not a graphics person but...".<br /><br />What you're doing is:<br /><pre><br /> timer start<br /> issue some draw calls<br /> timer stop<br /></pre><br />This is completely and utterly wrong.<br />I'll give you an analogy that should make it a lot easier to understand. Let's say you have a 1Mbit and a 100Mbit LAN and you want to write a benchmark to compare how fast you can download a 1GB file on both of those, so you do:<br /><pre><br /> timer start<br /> start download of a huge file<br /> timer stop<br /></pre><br />Do you see the problem? Obviously the file hasn't been downloaded by the time you stopped the timer. What you effectively measured is the speed at which you can execute function calls.<br />And yes, while your original suspicion that the 100Mbit line is a lot faster is still likely true, your test in no way proves that. 
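<br /><br />The timer has to wrap <i>finished</i> work. A sketch of a more honest version (assuming Qt 4 on X11; the draw calls and numbers are placeholders and error handling is omitted):<pre>
#include <QApplication>
#include <QPainter>
#include <QPixmap>
#include <QTime>
#include <QX11Info>
#include <X11/Xlib.h>

// Draw for a few seconds, force the X server to actually finish each
// frame with XSync() before touching the clock, then report the rate.
int main(int argc, char **argv)
{
    QApplication app(argc, argv);
    QPixmap target(512, 512); // with the X11 engine this lives server-side
    QTime clock;
    clock.start();
    int frames = 0;
    while (clock.elapsed() < 5000) { // run for ~5 seconds
        QPainter p(&target);
        p.setRenderHint(QPainter::Antialiasing);
        p.fillRect(target.rect(), Qt::white);
        p.drawEllipse(10, 10, 490, 490); // whatever you're measuring
        p.end();
        XSync(QX11Info::display(), False); // wait until the work is done
        ++frames;
    }
    qDebug("%.1f frames/sec", frames * 1000.0 / clock.elapsed());
    return 0;
}
</pre>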
A test without that synchronization does nothing short of making some poor individuals very sad due to the state of computer science. Also, "So what if the test is wrong, it still feels faster" is not a valid excuse, because the whole point is that the test is wrong.<br /><br />To give your tests some substance, always make your applications run for at least a few seconds and report the number of frames per second.<br /><br />Or even better, don't write them. Somewhere out there, some people who actually know what's going on have those tests written. And those people, who just happen to be "graphics people", have reasons for making certain things the default. So while you may think you've made an incredible discovery, you really haven't. There's a kde-graphics mailing list where you can pose graphics questions. <br />So, to the next person who wants to write a KDE graphics related blog/article: please, please go through the kde-graphics mailing list first.Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com12tag:blogger.com,1999:blog-27901662.post-7836277188285668352009-02-09T01:31:00.002+00:002009-02-09T14:12:34.156+00:00Video and other APIsI read today a short article about video acceleration in X. First of all I was already unhappy because I had to read. Personally I think all the news sites should come in comic form, with as few captions as possible. Just like Perl, I'm a write-only entity. Then I read that Gallium isn't an option when it comes to video acceleration because it exposes a programmable pipeline, and I got dizzy.<br /><br />So I got off the carousel and I thought to myself: "Well, that's wrong", and apparently no one else got that. Clearly the whole "we're connected" thing is a lie, because it doesn't matter how vividly I think about stuff, others still don't see what's in my head (although to the person who sent me cheese when I was in Norway - that was extremely well done). So, cumbersome as it might be, I'm doing the whole "writing my thoughts down" thing. Look at me! I'm writing! No hands! sdsdfewwwfr vnjbnhm nhn. Hands back on.<br /><br />I think the confusion stems from the fact that the main interface in Gallium is the context interface, which does in fact model the programmable pipeline. Because of the incredibly flexible nature of the programmable pipeline, a huge set of APIs is covered just by reusing the context interface. But in modern GPUs there are still some fixed-function parts that are not easily addressable by the programmable pipeline interface. Video is a great example of that. To a lesser degree so is basic 2D acceleration (lesser because some of the modern GPUs don't have 2D engines at all anymore).<br /><br />But, and it's a big but ("and I can not lie" <- song reference, pointed out to let everyone know that I'm all about music), nothing stops us from adding interfaces which deal exclusively with the fixed-function parts of modern GPUs. In fact it has already been done, as the work on a simple 2D interface has already started.<br /><br />The basic idea is that state trackers which need some specific functionality use the given interface. For example the Exa state tracker would use the Gallium 2D interface instead of the main context interface. In this case the Gallium hardware driver has a choice: it can either implement the given interface directly in hardware, or it can use the default implementation.<br /><br />The default implementation is something Gallium will provide as part of the auxiliary libraries. 
The default implementation will use the main context interface to emulate the entire functionality of the other interface.<br /><br />A video decoding framework would use the same semantics: it would be an additional interface (or interfaces) with a default implementation on top of the 3D pipeline. Obviously some parts of video support are quite difficult to implement on top of the 3D pipeline, but the whole point is that for hardware which supports it you get the whole shabangabang, and for hardware that doesn't you get a reasonable fallback. Plus, in the latter case the driver authors don't have to write a single line of hardware-specific code.<br /><br />So a very nice project for someone would be to take VDPAU, VA-API or any video framework of your choice, implement a state tracker for that API on top of Gallium, and design the interface(s) that could be added to Gallium to implement the API in a way that makes full use of the fixed-function video units found in GPUs. I think this is the way our XvMC state tracker is heading.<br />This is the moment where we break into a song.Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com5tag:blogger.com,1999:blog-27901662.post-15530398312427837612009-02-04T06:06:00.001+00:002009-02-04T06:11:51.111+00:00Latest changesI actually went through all my blog entries and removed spam. That means that you won't be able to find any more links to stuff that can enlarge your penis. I hope this action will not shatter your lives and that you'll find consolation in all the spam that you're getting via email anyway. And if not, I saved some of the links. You never know, I say.<br />I also changed the template. I'd slap something ninja-related on it, but I don't have anything that fits. Besides, nowadays everyone is a graphics ninja. I'm counting the hours until the term is added to the dictionary. So my new nickname will be "The Lost Son of Norway, Duke of Poland and King of England". Aim high is my motto.<br /><br />As a proud owner of exactly zero babies I've got lots of time to think about stuff. Mostly about squirrels, goats and the letter q. So I wanted to talk about some of the things I've been thinking about lately.<br /><br />Our friend ("our" as in the KDE community's; if you're not a part of it, then obviously not your friend - in fact he told me he doesn't like you at all) Ignacio Castaño has a very nice blog entry about <a href="http://castano.ludicon.com/blog/2009/01/10/10-fun-things-to-do-with-tessellation/">10 things one might do with tessellation</a>.<br />The graphics pipeline continues to evolve, and while reading Ignacio's entry I realized that we haven't been that good about communicating the evolution of Gallium3D.<br />So here we go.<br /><br />I've been slowly working towards support for geometry shaders in Gallium3D. Interface-wise the changes are quite trivial, but a slightly bigger issue is that some (quite modern) hardware, while perfectly capable of emitting geometry in the pipeline, is not quite capable of actually supporting all of the features of the geometry shader extension. The question of how to handle that is an interesting one, because simple emission of geometry via a shader is a very desirable feature (for example path rendering in OpenVG would profit from it).<br /><br />I've been doing some small API cleanups lately. Jose made some great changes to the concept of surfaces, which became pure views on textures. 
As a follow-up, over the last few days we have disassociated buffers from them to make that really explicit. It gives drivers an opportunity to optimize a few things and, with some changes Michel is working on, to avoid some redundant copies.<br /><br />A lot of work went into winsys. The winsys, whose name is a bit of a misnomer, was a part of Gallium that did too much: it was supposed to be a resource manager, handle command submission and handle integration with windowing systems and OSes. We've been slowly chopping parts of it away, making it a lot smaller, and over the weekend we managed to hide it completely from the state tracker side.<br /><br />Keith extracted the Xlib and DRI code from the winsys and put it into separate state trackers, meaning that, just like the WGL state tracker, the code is actually sharable between all the drivers. That is great news.<br /><br />Brian has been fixing and implementing so many bugs/features that people should start writing folk songs about him. Just the fact that we now support GL_ARB_framebuffer_object deserves at least a poem (not a big one, but certainly none of that white, non-rhyming stuff; we're talking full-fledged rhymes and everything... You can tell that I know a lot about poetry, can't you)<br /><br />One thing that never got a lot of attention is that Thomas (who did get one of them baby thingies lately) released his <a href="http://wiki.x.org/wiki/libwsbm">window systems buffer manager code</a>.<br /><br />Another thing that didn't get a lot of attention is Alan's <a href="http://cgit.freedesktop.org/xorg/driver/xf86-video-modesetting/">xf86-video-modesetting</a> driver. It's a dummy X11 driver that uses DRM for modesetting and Gallium3D for graphics acceleration. Because of that it's hardware-independent, meaning that any hardware that has both a DRM driver and a Gallium3D driver automatically works and is accelerated under X11. Very neat stuff.<br /><br />Alright, I feel like I'm cutting into your youtube/facebook time, and like all "Lost Sons of Norway, Dukes of Poland and Kings of England" I know my place, so that's it.Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com9tag:blogger.com,1999:blog-27901662.post-43263117755768320992009-02-01T22:45:00.001+00:002009-02-01T22:49:50.994+00:00OpenCLI missed you so much. Yes, you. No, not you. You. I couldn't blog for a while and I ask you (no, not you): what's the point of living if one can't blog? Sure, there's the world outside of computers, but it's a scary place filled with people who, god forbid, might try interacting with you. Who needs that? It turns out that I do. I've spent the last week in Portland at the OpenCL working group meeting, which was a part of the Khronos F2F.<br /><br />For those who don't know (oh, you poor souls), OpenCL is what could be described as "the shit". That's the official spelling, but substituting the word "the" with "da" is considered perfectly legal. A longer description includes the expansion of the term OpenCL to "Open Computing Language" with an accompanying <a href="http://en.wikipedia.org/wiki/OpenCL">wikipedia entry</a>. OpenCL has all the ingredients, including the word "Open" right in the name, to make it one of the most important technologies of the coming years.<br /><br />OpenCL allows us to tap into the tremendous power of modern GPUs. Not only that, but one can also use OpenCL with accelerators (like physics chips or the Cell's SPUs) and CPUs. 
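<br /><br />To give you a taste of it, here's roughly what OpenCL C looks like: a hypothetical data-parallel kernel that brightens an image, one work-item per pixel (a sketch, not code from any real application):<pre>
// The runtime launches one instance of this kernel per pixel and the
// hardware runs as many of them in parallel as it can.
__kernel void brighten(__global const float4 *src,
                       __global float4 *dst,
                       const float factor)
{
    int i = get_global_id(0);   // which pixel this work-item owns
    dst[i] = src[i] * factor;   // float4 math is done component-wise
}
</pre>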
On top of all that hardware, OpenCL provides both task-based and data-based parallelism, making it a fascinating option for those who want to accelerate their code. For example if you have a canvas (<a href="http://doc.trolltech.com/4.5/graphicsview.html">Qt graphicsview</a>) and you spend a lot of time doing collision detection, or if you have an image manipulation application (<a href="http://en.wikipedia.org/wiki/Krita">Krita</a>) and you spend a lot of time in effects and general image manipulation, or if you have a scientific chemistry application with an equation solver (<a href="http://en.wikipedia.org/wiki/Kalzium">Kalzium</a>) and want to make it all faster, or if you have wonky hair and like to dance the polka... OK, the last one is a little fuzzy, but you get the point.<br /><br />Make no mistake, OpenCL is a little bit more complicated than just "write your algorithm in C". Albeit well hidden, the graphics pipeline is still at the forefront of the design, so there are some limitations (remember that, for a number of really good and a few inconvenient reasons, GPUs do their own memory management, so you can not just move data structures between main and graphics memory). That's one of the reasons you won't see a GCC-based OpenCL implementation any time soon. OpenCL requires run-time execution, it allows sharing of buffers with OpenGL (e.g. the OpenCL image data type can be constructed from GL textures or renderbuffers) and it forces code generation to a number of different targets (GPUs, CPUs, accelerators). All those things need to be integrated. For sharing of buffers between OpenGL and OpenCL the two APIs need to go through some kind of common framework - be it a real library or some utility code that exposes addresses and their meaning to both the OpenGL and OpenCL implementations.<br /><br />Fortunately we already have that layer. Gallium3D maps perfectly to the buffer and command management in OpenCL, which shouldn't be surprising given that both care about the graphics pipeline. So all we need is a new state tracker with a compiler framework integrated to parse and generate code from the OpenCL C language. LLVM is the obvious choice here because, unlike GCC, it has libraries that we can use for both (to be more specific, Clang and LLVM). So yes, we have started on an OpenCL state tracker, but of course we are far, far away from actual conformance. Being part of a large company means that we have to go through extensive legal reviews before releasing something in the open, so right now we're patiently waiting.<br />My hope is that once the state tracker is public the work will continue at a much faster pace. I'd love to see our implementation pass conformance with at least one GPU driver by summer (which is /hard/ but definitely not impossible).Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com27tag:blogger.com,1999:blog-27901662.post-2711474417303713512008-08-28T18:28:00.002+01:002008-08-28T18:32:10.843+01:00SVG in KDE"Commitment" is one of the words that have never been used in this blog. Which is pretty impressive given that I've managed to use such words as sheep, llamas, raspberries, ninjas, donkeys, crack or woodchuck quite extensively (especially impressive in a technology-centric blog).<br /><br />That's because commitment implies that whatever it is one is committed to plays an important role in one's life. It's a word that goes beyond the paper or the medium on which it was written. 
It enters the cold reality that surrounds us.<br /><br />But today is all about commitment. It's about the commitment that KDE made to a technology broadly referred to as Scalable Vector Graphics. I took some time off this week and came to Germany, where I talked about the usage of SVG in KDE.<br /><br />The paper about what I like to call the Freedom of Beauty is available here:<br /><br /><a href="https://www.svgopen.org/2008/papers/104-SVG_in_KDE/">https://www.svgopen.org/2008/papers/104-SVG_in_KDE/</a><br /><br />It talks about the history of SVG in KDE and the rendering model used by KDE, lists the ways in which we use SVG, and finally shows some problems which have been exposed by such diverse usage of SVG in a desktop environment. Please read it if you're interested in KDE or SVG.<br /><br />Hopefully this paper marks the start of a more proactive role that KDE is going to play in the shaping of the SVG standard.Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com7tag:blogger.com,1999:blog-27901662.post-81661308532186329892008-08-26T05:33:00.002+01:002008-08-26T05:38:07.065+01:00Fixes in SonnetAs we all know, inner beauty is the most important kind of beauty. Especially if you're ugly. Not ugly, don't sue me, I meant to say "easy on the eyes challenged". That's one of the reasons I like working on frameworks and libraries: it's the appeal of improving the inner beauty of certain things. I gave up on trying to improve my own inner beauty (when I was about 1), so this is the most I can do.<br /><br />You can do it too. It's real easy. I took this week off because I'm going to Germany for SVG Open, where I'll talk about SVG in KDE, and today I fixed a few irritating bugs in Sonnet.<br /><br />One of the things that bugged me for a while was the fact that we kept marking misspelled text as red instead of using the God-given red squiggly underline. Well, I say no more!<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://ktown.kde.org/%7Ezrusin/highlighter.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 200px;" src="http://ktown.kde.org/%7Ezrusin/highlighter.png" alt="" border="0" /></a>Our spelling dialog now lists the available dictionaries and one can change them on the fly. That's good. Raspberries good. And raspberries are pretty darn good. Even sheep like raspberries. Or so I think; the only sheep I've ever seen was from the window of a car, and it looked like an animal that enjoys raspberries. Who doesn't? The only problem was that the dialog liked listing things like "en_GB-ise" or "en_GB-ize-w_accents" as language names, which is really like a nasty bug in the raspberry. And what do you do with bugs? I'm not quite certain myself, but given the way this blog is heading it's surely something disturbing... Anywho, that's also fixed. Now we list proper and readable names. As in:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://ktown.kde.org/%7Ezrusin/dialog_new.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 200px;" src="http://ktown.kde.org/%7Ezrusin/dialog_new.png" alt="" border="0" /></a>Working on Sonnet is a lot of fun. A small change in a pretty small library affects the entire KDE, which is rather rewarding. 
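<br /><br />For the curious, the squiggly underline part boils down to roughly this on the Qt side (a sketch of the idea, not the literal Sonnet patch):<pre>
#include <QTextCharFormat>

// The format a highlighter applies to misspelled ranges: instead of
// recoloring the text, ask Qt for a red spell-check-style underline.
QTextCharFormat misspelledFormat()
{
    QTextCharFormat format;
    format.setUnderlineColor(Qt::red);
    format.setUnderlineStyle(QTextCharFormat::SpellCheckUnderline);
    return format;
}
</pre>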
So if you want to get into KDE development in an easy and fun way, go to https://bugs.kde.org, search for "kspell" or "sonnet", pick an entry and simply fix it!Zackhttp://www.blogger.com/profile/16222054590923441165noreply@blogger.com16