Monday, March 23, 2009

Intermediate representation

I've spent the last two weeks working from our office in London. The office is right behind Tate Modern with a great view of London which is really nice. While there I was mainly trying to get the compilation of shaders rock solid for one of our drivers.

In Gallium3D currently we're using TGSI intermediate representation. It's a fairly neat representation with instruction semantics matching those from relevant GL extensions/specs. If you're familiar with D3D you know that there are subtle differences between seemingly similar instructions in both APIs. Modern hardware tends to follow the D3D semantics more closely than the GL ones.

Some of the differences include:
  • Indirect addressing offsets, in GL offsets are between -64 to +63, while D3D wants them >= 0.
  • Output to temporaries. In D3D certain instructions can only output to temporary registers while TGSI doesn't have that requirement.
  • Branching. D3D has both if_comp and setp instructions which can perform essentially any of the comparisons, while we immitate them with a SLT (set on less than) or similar with a IF instruction with semantics of the one in GL_NV_fragment_program2.
  • Looping.
None of this is particularly difficult but we had to implement almost exactly the same code in another driver of ours and that makes me a little uneasy. There's a number of ways to fix this. We could change the semantics of TGSI to match D3D, or we could implement transformation passes that operate on TGSI and do this transformation directly on the IR before it gets to the driver, or we could radically change the IR. Strangely enough it's the solution number #3, which while appearing the most bizzare, is the one I like the most. It's because it goes hand in hand with the usage of LLVM inside Gallium.

Especially because there's a lot generic transformations that really should be done on the IR itself. A good example of that includes code injections due to missing pieces of the API state on a given GPU, e.g. shadow texturing, including the compare mode, compare function and the shadow ambient value, which if the samplers on the given piece of hardware don't support need to be emulated with code injections after each sampling operation which utilizes the sampler that had the above mentioned state set on it.

Unfortunately right now the LLVM code in Gallium is far from finished. Due to looming deadlines on various projects I never got to spend the amount of time that is required to get in shape.
I'll be able to piggy back some of the LLVM work on top of the OpenCL state tracker code which is exciting.

Also, has anyone noticed that the last two blogs had no asides or any humor in them? I'm experimenting with "just what I want to say and nothing more" style of writing, wondering if it indeed, will be easier to read.

Thursday, March 12, 2009

KDE graphics benchmarks

This is really a public service announcement. KDE folks please stop writing "graphics benchmarks". It's especially pointless if you quantify your blog/article with "I'm not a graphics person but...".

What you're doing is:

timer start
issue some draw calls
timer stop

This is completely and utterly wrong.
I'll give you an analogy that should make it a lot easier to understand. Lets say you have a 1MB and a 100MB lan and you want to write a benchmark to compare how fast you can download a 1GB file on both of those, so you do:

timer start
start download of a huge file
timer stop

Do you see the problem? Obviously the file hasn't been downloaded by the time you stopped the timer. What you effectively measured is the speed at which you can execute function calls.
And yes, while your original suspicion that the 100MB line is a lot faster is still likely true, your test in no way proves that. In fact it does nothing short of making some poor individuals very sad due to the state of computer science. Also "So what that the test is wrong, it still feels faster" is not a valid excuse, because the whole point is that the test is wrong.

To give your tests some substance always make your applications run for at least a few seconds reporting the frames per second.

Or even better don't write them. Somewhere out there, some people, who actually know what's going on, have those tests written. And those people, who just happen to be "graphics people" have reasons for making certain things default. So while you may think you've made this incredible discovery, you really haven't. There's a kde-graphics mailing list where you can pose graphics question.
So to the next person who wants to write a KDE graphics related blog/article please, please go through kde-graphics mailing list first.