Friday, October 25, 2013


Aaron wrote a number of blogs about Bodega. Our open market for digital content. Today I wanted to write a bit about the technology behind it. Plus my prose is unbeatable - it works out, a lot, and is ripped like Stallone during his "definitely not on steroids, just natural, inject-able supplements" years, so it should be appreciated more widely.

If you ever used a content distribution platform, like the Google Play or Apple's App store, I'm sure you noticed the terrible way in which they handle content discovery, rankings, communication with the publishers and the vetting process. To some extend Bodega solves all of those and every day it's closer to the vision that we had when we started it. And we weren't even high that day. Well, I wasn't, Aaron was high on life. And crack. Mostly crack. I apologize, crack isn't funny. Unless you're a plumber, in which case it's hilarious.

What wasn't very clear at the beginning was the technology that we would use it to write it. We started with PHP because Aaron had some code that we could reuse from another project. Testing became a nightmare. I'm not a huge fan of dynamically typed languages for server programming, they make it a lot harder to validate the code and avoid silly mistakes and without highly efficient testing setup they can make your life hell. So one weekend when I got sick of setting up Apache once again to test something, I just sat down and rewrote the code in Javascript using Node.js. I know, I know, "you don't like dynamically typed languages and you rewrote it in js?", keep reading...

Node.js was a great choice for Bodega. Testing became trivial (we're using Mocha). Between our unit tests and jshint support, I can live with the dynamic nature of the language.

What makes node.js great is npm and its extensive collection of packages. That tooling means that you can focus just on the code that you need to write to finish your project not the code that you need to start it. That's why I picked it.

If I didn't have a day job I'd pick either Go, Haskell or just do everything by hand in C++. I have to admit that I like Go a lot (even though the system level graphics hacker in me shakes his head at a language without native vector support) but it's hard to argue with the fact that node.js allows us to move very quickly. So Bodega is written in node.js and uses Posgresql to store all of its data and that combination works amazingly well.

It's a Free Software content distribution platform that actually works and it's just the start. In my obviously biased opinion it does a lot of really interesting things that make it a very important project, but that's a different topic.

Tuesday, September 27, 2011

ApiTrace 2.0

ApiTrace 2.0 is out. ApiTrace is by far the best graphics debugging tool out there.

We have implemented so many new features since 1.0 that we have decided to bump the version number to 2.0.

Some of the new features include:
  • About 10x faster tracing and 2x faster retracing
  • Mac OS X support
  • OpenGL 4.2 support
  • Support for multi-gigabyte traces in the GUI
  • Ability to display all of the uniforms
  • Showing number of calls per frame, marking large frames and ability to indent shaders
  • Support for `GL_GREMEDY_string_marker` and `GL_GREMEDY_frame_terminator` extensions, so you can mark frames and important pieces of code in your applications.
  • Incredible amount of bug fixes, making sure that everything we throw at it just works

The huge improvements in the performance of tracing came from both flushing/syncing the trace files only if there's an uncaught signal/exception and switching to Snappy compression from zlib.
We also did some magic to seek and load on demand from a compressed file on the disk which means that we don't load all of the data at once. That allows us to load huge files in the GUI while using very little memory. For example loading a 1GB file takes only ~20MB of RAM.

All of those improvements mean that it's possible to trace and debug huge applications with little to no costs. It's quite amazing. In fact working with graphics and not using ApiTrace starting now is going to be classified as abuse.
Read the instructions and start living the life of a computer graphics superstar, which differs from the life of an average computer graphics hacker by about 10 keyboard headbutts and half a bucket of tears a week! And that's a change we can all believe in!

Sunday, September 18, 2011

NV path rendering

A while ago NVIDIA released drivers with their NV_path_rendering extension. GL_NV_path_rendering is a relatively new OpenGL extension which allows rendering of stroked and filled paths on the GPU.

 I've heard about it before but lately I've been too busy with other stuff to look at, well, anything.

 I've decided to spend a few seconds looking at the videos NVIDIA posted on their site. They were comparing the NV_path_rendering, Skia, Cairo and Qt. Pretty neat. Some of the demos were using huge paths, clipped by another weird path and using perspective transforms. Qt was very slow. It was time for me to abandon my dream of teaching my imaginary hamster how to drive a stick shift and once again look at path rendering.

You see, I wrote the path rendering code so many times that one of my favorite pastimes was creating ridicules paths that no one would ever think about rendering and seeing how fast I could render them. Qt's OpenGL code was always unbelievably good at rendering paths no one would ever render. Clearly these people were trying to outcrazy me.

Fortunately there's an SDK posted on the NVIDIA site and it's really well done. It even compiles and works on GNU/Linux. Probably the best demo code for a new extension that I've ever seen. The extension itself is very well done as well. It's very robust, ultimately though it's the implementation that I care about. I have just one workstation with an NVIDIA card in it, a measly Quadro 600, running on a dual processor xeon e5405, but it was enough to play with it.

 The parts using Qt were using the raster engine though. I've looked at the code and decided to write something that would render the same thing but using just Qt. The results were a little surprising. Qt OpenGL could render tiger.svg scaling and rotating it at about 270fps, while the NV_path_rendering was running at about 72fps. Here's both of them running side by side:

 (numbers lower for both on account of them running at the same time of course). As you can see Qt is almost 4x faster. I've figured it might be related to the different SVG implementations and rendering techniques used, so I quickly hacked the demo NVIDIA posted to open a brand new window (you need to click on it to start rendering) and render to QGLPixelBuffer but using the same SVG and rendering code as their NV_path_rendering demo code. The results were basically the same.

I posted the code for the Qt demo and the patch to nvpr_svg on github:

The patch is larger than it should be because it also changed the file encoding on the saved files from DOS to Unix but you shouldn't have any issues applying it.

So from a quick glance it doesn't seem like there are any performance benefits to using NV_path_rendering, in fact Qt would likely be quite a bit slower with it. Having said that NVIDIA's implementation looks very robust and a lot more numerically stable. I've spent a little bit of time looking at the individual pixels and came away very impressed.

In general the extension is in a little bit of a weird situation. On one hand, unlike OpenVG which creates a whole new API, it's the proper way of introducing GPU path rendering, on the other hand pretty much every vector graphics toolkit out there already implements GPU based path rendering. Obviously the implementations differ and some might profit from the extension but for Qt the question is whether that quality matters more than the performance. Specifically whether the quality improves enough to justify the performance hit.

I think the extension's success will largely depend on whether it's promoted to, at least an EXT or, ideally an ARB, meaning all the drivers support it. Using it would make the implementations of path rendering in toolkits/vector graphics libs a lot simpler and give driver developer a central place to optimize a pretty crucial  part of the modern graphics stack. Unfortunately if you still need to maintain the non NV_path_rendering paths then it doesn't make a whole lot of sense. Mesa3D implementation would be trivial simply because I've already implemented path rendering for OpenVG using the Gallium3D interface, so it'd be a matter of moving that code but I'm just not sure if anyone will be actually using this extension. All in all, it's a very well done extension but it might be a little too late.

Monday, April 25, 2011


During the last three weeks I've spent most of my spare time writing a GUI for Jose's amazing ApiTrace project. ApiTrace is a project to trace, analyze and debug graphics api's. Both OpenGL and Direct3D. To some extend inspired by gDEBugger and Windows PIX. We wanted a tool that would let us slice through huge games and CAD apps to the exact call which causes problems and be able to inspect the entire graphics state, including the shaders, textures and all the buffers. We ended up doing that, plus a lot more and we're just getting started. In other words it's the best thing since "Human/Robot Emancipation Act of 3015".

You begin by tracing your target application. You can do that either from the console or from the GUI. A trace file is created and we can do some amazing things with it. You can open it in a GUI and
  • Inspect the state frame by frame, draw call by draw call:
  • Replay the trace file:
  • Check every texture:
  • Every bound framebuffer:
  • Every shader:
  • Every vertex buffer:
  • You can see if OpenGL threw an error at any point during the replay and if so what was it:
  • And to go completely nuts, as graphics developers like to do, you get to edit any shader, any uniform and large chunks of the state to immediately see the effects it would have on the rendering:

As a driver developer you no longer have to install all the games just to debug a problem, the report can simply include a short trace which you can use to immediately figure out what's wrong. As an application developer you can inspect every graphics call your app makes, you can analyze your api usage and you could automatically produce standalone testcases which you can send to driver developers.

ApiTrace is hosted on github and it's BSD licensed. It works on Linux and Windows (we're planning to add OSX support as well). Gui is written using Qt and requires the QJson library.

Jose just announced the first release so let us know if there's anything that would make your life a lot easier. Next is support for multiple GL contexts, ability to export just a single frame from a trace (either as a trace file or a standalone C application), ability to start and stop tracing on a hot key and lots of other features soon. So whether you're a driver developer, working on games, CAD apps or 2D scene-graphs this is a tool that should make your life significantly easier and better.

Tuesday, November 02, 2010

2D musings

If you've been following graphics developments in the 2D world over the last few years you've probably seen a number of blogs and articles complaining about performance. In particular about how slow 2D is on GPUs. Have you ever wondered why it's possible to make this completely smooth but your desktop still sometimes feels sluggish?

Bad model

For some weird reason ("neglect" being one of them) 2D rendering model hasn't evolved at all in the last few years. That is if it has evolved at all since the very first "draw line" became a function call. Draw line, draw rectangle, draw image, blit this, were simply joined by fill path, stroke path, few extra composition modes and such. At its very core the model remained the same though, meaning lots of calls to draw an equally large number of small primitives.

This worked well because technically zero, or almost zero, setup code was necessary to start rendering. Then GPUs became prevalent and they could do amazing things but to get them to do anything you had to upload the data and the commands that would tell them what to do. With time more and more data had to be sent to the GPU to describe the increasingly complex and larger scenes. It made sense to optimize the process of uploads (I keep calling them "uploads" but "GPU downloads" is closer to the true meaning) by allowing to upload an entire resource once and then refer to it via a handle. Buffers, shaders, addition of new shading stages (tessellation, geometry) all meant to reduce the size of data that had to be uploaded to the GPU before every rendering.

At least for games and well designed 3D software. 2D stuck to its old model of "make GPU download everything on every draw request". It worked ok because most of the user interface was static and rather boring so the performance was never much of an issue. Plus in many cases the huge setup costs are offset by the fact that the Graphics Processing Units are really good at processing graphics.

Each application is composed of multiple widgets each widget draws itself using multiple primitives (pixmaps, rectangles, lines, paths) and each primitive needs to first upload the data needed by the GPU to render it. It's like that because from the 2D api perspective there's no object persistence. The api has no idea that you keep re-rendering the same button over and over again. All the api sees is another "draw rectangle" or "draw path" call which it will complete.

On each frame the same data is being copied to the GPU over and over again. It's not very efficient, is it? There's a limited number of optimizations you can do in this model. Some of the more obvious ones include:
  • adding unique identifiers to the pixmaps/surfaces and using those as identifiers as keys in a texture cache which allows you to create a texture for every pixmap/surface only once,
  • collecting data from each draw call in a temporary buffer and copying it all at once (e.g. in SkOSWindow::afterChildren, QWindowSurface::endPaint or such),
  • creating a shader cache for different types of fills and composition modes

  • But the real problem is that you keep making the GPU download the same data every frame and unfortunately that is really hard to fix in this model.

    Fixing the model

    It all boils down to creating some kind of a store where lifetime of an object/model is known. This way the scene knows exactly what objects are being rendered and before rendering begins it can initialize and upload all the data the items need to be renderer. Then rendering is just that - rendering. Data transfers are limited to object addition/removal or significant changes to their properties and then further limited by the fact that a lot of the state can always be reused. Note that trivial things like changing the texture (e.g. on hover/push) don't require any additional transfers and things like translations can be limited to just two floats (translation in x and y) and they're usually shared for multiple primitives (e.g. in a pushbutton it would be used by the background texture and the label texture/glyphs)

    It would seem like the addition of QGraphicsView was a good time to change the 2D model, but that wasn't really possible because people like their QPainter. No one likes when a tool they have been using for a while and are fairly familiar with is suddenly taken away. Completely changing a model required a more drastic move.

    QML and scene-graph

    QML fundamentally changes the way we create interfaces and it's very neat. From the api perspective it's not much different from JavaFX and one could argue which one is neater/better but QML allows us to almost completely get rid of the old 2D rendering model and that's why I love it! A side-effect of moving to QML is likely the most significant change we've done to accelerated 2D in a long time. The new Qt scene graph is a very important project that can make a huge difference to the performance, look and feel of 2D interfaces.
    Give it a try. If you don't have OpenGL working, no worries it will work fine with Mesa3D on top of llvmpipe.

    A nice project would be doing the same in web engines. We have all the info there but we decompose it into the draw line, draw path, draw rectangle, draw image calls. Short of the canvas object which needs the old style painters, everything is there to make accelerated web engines a lot better at rendering the content.

    Thursday, October 28, 2010

    Intermediate representation again

    I wrote about intermediate representation a few times already but I've been blogging so rarely that it feels like ions ago. It's an important topic and one that is made a bit more convoluted by Gallium.

    The intermediate representation (IR) we use in Gallium is called Tokenized Gallium Shader Instructions or TGSI. In general when you think about IR you think about some middle layer. In other words you have a language (e.g. C, GLSL, Python, whatever) which is being compiled into some IR which then is transformed/optimized and finally compiled into some target language (e.g. X86 assembly).

    This is not how Gallium and TGSI work or were meant to work. We realized that people were making that mistake so we tried to back-paddle on the usage of the term IR when referring to TGSI and started calling it a "transport" or "shader interface" which better describes its purpose but is still pretty confusing. TGSI was simply not designed as a transformable representation. It can be done, but it's a lot like a dance-off on a geek conference - painful and embarrassing for everyone involved.

    The way it was meant to work was:

    Language -> TGSI ->[ GPU specific IR -> transformations -> GPU ]

    with the parts in the brackets living in the driver. Why like that? Because GPUs are so different that we thought each of them would require its own transformations and would need its own IR to operate on. Because we're not compiler experts and didn't think we could design something that would work well for everyone. Finally and most importantly because it's how Direct3D does it. Direct3D functional specification is unfortunately not public which makes it a bit hard to explain.

    The idea behind it was great in both its simplicity and overabundance of misplaced optimism. All graphics companies had Direct3D drivers, they all had working code that compiled from Direct3D assembly to their respective GPUs. "If TGSI will be a lot like Direct3D assembly then TGSI will work with every GPU that works on Windows, plus wouldn't it be wonderful if all those companies could basically just take that Windows code and end up with a working shader compiler for GNU/Linux?!", we thought. Logically that would be a nice thing. Sadly companies do not like to release part of their Windows driver code as Free Software, sometimes it's not even possible. Sometimes Windows and Linux teams never talk to each other. Sometimes they just don't care. Either way our lofty goal of making the IR so much easier and quicker to adopt took a pretty severe beating. It's especially disappointing since if you look at some of the documentation e.g. for AMD Intermediate Language you'll notice that this stuff is essentially Direct3D assembly which is essentially TGSI (and most of the parts that are in AMD IL and not in TGSI are parts that will be added to TGSI) . So they have this code. In the case of AMD it's even sadder because the crucial code that we need for OpenCL right now is OpenCL C -> TGSI LLVM backend which AMD already does for their IL. Some poor schmuck will have to sit down and write more/less the same code. Of course if it's going to be me it's "poor, insanely handsome and definitely not a schmuck".

    So we're left with Free Software developers who don't have access to the Direct3D functional spec and who are being confused by the IR which is unlike anything they've seen (pre-declared registers, typeless...) which on top of it is not easily transformable. TGSI is very readable, simple and pretty easy to debug though so it's not all negative. It's also great if you never have to optimize or transform its structure which unfortunately is rather rare.
    If we abandon the hope of having the code from Windows drivers injected in the GNU/Linux drivers it becomes pretty clear that we could do better than TGSI. Personally I just abhor the idea of rolling out our own IR. IR in the true sense of that word. Crazy as it may sound I'd really like my compiler stuff to be written by compiler experts. It's the main reason why I really like the idea of using LLVM IR as our IR.

    Ultimately it's all kind of taking the "science" out of "computer science" because it's very speculative. We know AMD and NVIDIA use it to some extend (and there's an open PTX backend for LLVM) , we like it, we use it in some places (llvmpipe), the people behind LLVM are great and know their stuff but how hard is it to use LLVM IR as the main IR in a graphics framework and how hard is it to code-generate directly from it for GPUs - we don't really know. It seems like a really good idea, good enough for folks from LunarG to give it a try which I think is really what we need; a proof that it is possible and doesn't require sacrificing any farm animals. Which, as a vegetarian, I'd be firmly against.

    Thursday, July 01, 2010

    Graphics drivers

    There are only two tasks harder than writing Free Software graphics drivers. One is running a successful crocodile petting zoo, the other is wireless bungee jumping.

    In general writing graphics drivers is hard. The number of people who can actually do it is very small and the ones who can do it well are usually doing it full-time already. Unless the company, which those folks are working for, supports open drivers, the earliest someone can start working on open drivers is the day the hardware is officially available. That's already about 2 years too late, maybe a year if the hardware is just an incremental update. Obviously not a lot of programmers have the motivation to do that. Small subset of the already very small subset of programmers who can write drivers. Each of them worth their weight in gold (double, since they're usually pretty skinny).

    Vendors who ship hardware with GNU/Linux don't see a lot of value in having open graphics drivers on those platforms. All the Android phones are a good example of that. Then again Android decided to use Skia, which right now is simply the worst of the vector graphics frameworks out there (Qt and Cairo being two other notables). Plus having to wrap every new game port (all that code is always C/C++) with NDK is a pain. So the lack of decent drivers is probably not their biggest problem.

    MeeGo has a lot better graphics framework, but we're ways of before we'll see anyone shipping devices running it and Nokia in the past did "awesome" things like release GNU/Linux devices with a GPU but without any drivers for it (n800, n810) and Intel's Poulsbo/Moorestown graphics drivers woes are well-known. On the desktop side it seems that Intel folks are afraid that porting their drivers to Gallium will destabilize them. Which is certainly true, but the benefits of doing so (multiple state trackers, cleaner driver code, being able to use Gallium debugging tools like trace/replay/rbug and nicely abstracting the api code from drivers) would be well worth it and hugely beneficial to everyone.

    As it stands we have an excellent framework in Gallium3D but not a lot of open drivers for it. Ironically it's our new software driver, llvmpipe, or more precisely a mutation of it, which has the potential to fix some of our GPU issues in the future. With the continues generalization of GPUs my hope is that all we'll need is DRM code (memory management, kernel modesetting, command submission) and LLVM->GPU code generator. It's not exactly a trivial amount of code by any stretch of the imagination but smaller than what we'd need right now and once it would be done for one or two GPUs it would certainly become a lot simpler. Plus GPGPU will eventually make the latter part mandatory anyway. Having that would get us a working driver right away and after that we could play with texture sampling and vertex paths (which will likely stay as dedicated units for a while) as optimizations.

    Of course it would be even better if a company shipping millions of devices with GNU/Linux wanted a working graphics stack from bottom up (memory management, kernel modesetting, Gallium drivers with multiple state trackers, optimized for whatever 2D framework they use) but that would make sense. "Is that bad?" you ask. Oh, it's terrible because GNU/Linux graphics stack on all those shipped devices is apparently meant to defy logic.