Tuesday, April 27, 2010

Geometry Processing - A love story

It's still early in the year but I feel like this is the favorite for the "best computer-science-related blog title of 2010". I'd include 2009 in that as well, but realistically speaking I probably wrote something equally stupid last year and I don't have the kind of time required to read what I write.

Last week I merged the new geometry processing pipeline and hooked it into our new software driver, llvmpipe. It actually went smoother than I thought it would, i.e. it just worked. I celebrated by visiting the ER, a harsh reminder that my lasting fascination with martial arts will be the end of me (it will pay dividends once llvmpipe becomes self-aware, though).

The codepaths are part of the Gallium3D Draw module. It's fairly simple once you get down to it: at a draw call we generate the optimal vertex pipeline for the currently set state. LLVM makes this stuff a lot easier. First of all, we get human-readable LLVM IR, which is a lot easier than assembly to go over if something goes wrong. Running LLVM optimization passes over a fairly significant amount of code is also a lot easier than having to hand-optimize the assembly code-generation. Part of the reason is that geometry processing is composed of a few fairly distinct phases (e.g. fetch, shade, clip, assemble, emit), and since every one of them can and will change depending on the currently set state, it's difficult to code-generate optimal code by hand. That is, unless you have a compiler framework like LLVM. Then you don't care and hope LLVM will bail you out in cases where you end up doing something stupid for the sake of code simplicity.
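To make the shape of this a bit more concrete, here's a rough, standalone C sketch of the idea using the LLVM C API. It is not the actual Draw module code; the draw_state struct and the emit_* helpers are made-up stand-ins for the real fetch/shade/clip/assemble/emit stages:

/* Rough sketch: one LLVM function is built per draw call, tailored to
 * the currently bound state.  draw_state and the emit_* stage helpers
 * below are hypothetical placeholders, not llvmpipe code. */
#include <llvm-c/Core.h>

struct draw_state {
   int need_clipping;   /* e.g. skip clipping for pre-transformed verts */
};

/* Each stage would append its piece of IR through the builder. */
static void emit_fetch(LLVMBuilderRef b, const struct draw_state *s)    { (void)b; (void)s; }
static void emit_shade(LLVMBuilderRef b, const struct draw_state *s)    { (void)b; (void)s; }
static void emit_clip(LLVMBuilderRef b, const struct draw_state *s)     { (void)b; (void)s; }
static void emit_assemble(LLVMBuilderRef b, const struct draw_state *s) { (void)b; (void)s; }
static void emit_emit(LLVMBuilderRef b, const struct draw_state *s)     { (void)b; (void)s; }

static LLVMValueRef
build_vertex_pipeline(LLVMModuleRef module, const struct draw_state *state)
{
   LLVMTypeRef fn_type = LLVMFunctionType(LLVMVoidType(), NULL, 0, 0);
   LLVMValueRef fn = LLVMAddFunction(module, "vertex_pipeline", fn_type);
   LLVMBuilderRef b = LLVMCreateBuilder();

   LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(fn, "entry"));

   /* Only the stages the current state actually needs get generated,
    * which is exactly the combinatorial mess you don't want to be
    * hand-writing assembly for. */
   emit_fetch(b, state);
   emit_shade(b, state);
   if (state->need_clipping)
      emit_clip(b, state);
   emit_assemble(b, state);
   emit_emit(b, state);

   LLVMBuildRetVoid(b);
   LLVMDisposeBuilder(b);
   return fn;
}

int main(void)
{
   struct draw_state state = { 1 };
   LLVMModuleRef module = LLVMModuleCreateWithName("draw");
   build_vertex_pipeline(module, &state);
   LLVMDumpModule(module);   /* the human-readable IR mentioned above */
   LLVMDisposeModule(module);
   return 0;
}

The point is just that what gets generated, and how, is decided at draw time from the bound state, and the optimizer then cleans up whatever naive IR falls out of that.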

A good example of that is our usage of allocas for all variables. Initially all variables were in registers, but I switched the code to use allocas for a very simple reason: doing flow control in SOA mode when everything was in registers was tough, since it meant keeping track of the PHI nodes ourselves. Not to mention that we had no good way of doing indirect addressing in that scheme. Using allocas makes our code a lot simpler. In the end, thanks to LLVM optimization passes (mem2reg in this case), virtually every usage of an alloca is eliminated and replaced with direct register access.
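For the curious, here's a minimal standalone sketch (again, not llvmpipe code) of the alloca-plus-mem2reg trick via the LLVM C API, roughly as the API looked around this time; newer LLVM releases renamed LLVMBuildLoad to LLVMBuildLoad2 and moved some of the pass headers:

/* Minimal sketch of the alloca + mem2reg pattern, not llvmpipe code.
 * Uses the legacy pass-manager C bindings of the LLVM 2.x/3.x era. */
#include <stdio.h>
#include <llvm-c/Core.h>
#include <llvm-c/Transforms/Scalar.h>

int main(void)
{
   LLVMModuleRef module = LLVMModuleCreateWithName("mem2reg_demo");
   LLVMTypeRef i32 = LLVMInt32Type();
   LLVMTypeRef fn_type = LLVMFunctionType(i32, &i32, 1, 0);
   LLVMValueRef fn = LLVMAddFunction(module, "twice", fn_type);
   LLVMBuilderRef b = LLVMCreateBuilder();

   LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(fn, "entry"));

   /* Keep the variable in memory: no PHI nodes to maintain by hand,
    * and indirect addressing is just pointer arithmetic. */
   LLVMValueRef var = LLVMBuildAlloca(b, i32, "x");
   LLVMBuildStore(b, LLVMGetParam(fn, 0), var);
   LLVMValueRef loaded = LLVMBuildLoad(b, var, "x_val");
   LLVMValueRef doubled = LLVMBuildAdd(b, loaded, loaded, "x2");
   LLVMBuildRet(b, doubled);

   fprintf(stderr, "--- before mem2reg ---\n");
   LLVMDumpModule(module);

   /* mem2reg promotes the alloca/store/load dance to plain SSA
    * registers, so the generated code doesn't pay for our laziness. */
   LLVMPassManagerRef fpm = LLVMCreateFunctionPassManagerForModule(module);
   LLVMAddPromoteMemoryToRegisterPass(fpm);
   LLVMInitializeFunctionPassManager(fpm);
   LLVMRunFunctionPassManager(fpm, fn);

   fprintf(stderr, "--- after mem2reg ---\n");
   LLVMDumpModule(module);

   LLVMDisposeBuilder(b);
   LLVMDisposePassManager(fpm);
   LLVMDisposeModule(module);
   return 0;
}

Dumping the module before and after the pass shows the alloca/store/load sequence collapsing into a single SSA value.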

Anyway, with the new geometry paths the improvements are quite substantial. For vertex-processing-dominated tests the improvements are between 2x and about 6x (e.g. geartrain went from 35fps to 110fps); for cases dominated by fragment processing it's obviously a lot less (e.g. openarena went from about 25fps to about 32fps). All in all, llvmpipe is looking real good.

9 comments:

Tom said...

You really write the funniest posts. You could ask for money ;)

The LLVM stuff rules! I really hope that FOSS 3D will become self aware soon and start ruling the world!

Unknown said...

Great results, keep working!
;)

And thanks for all the stuff you do.

Anonymous said...

Did you mean SSA instead of SOA?

Zack said...

No, I meant SOA (structure of arrays).

makomk said...

I seem to recall that using alloca for all local variables and letting the LLVM mem2reg optimizer convert them to registers is actually the recommended approach. The documentation claims mem2reg is a fairly cheap optimization.

Jos Poortvliet said...

Hmmm. SOA is short for Sexually Transmitted Disease in Dutch. Somehow it still feels right if *you* use it, however...

Anonymous said...

The only thing I can think about when I read this post is "would Flash playback on my i915 be better if the Intel drivers were using Gallium" (I gather they use shaders to speed up various conversions). I guess I'll never know...

Anonymous said...

I came for the post, I left wishing for flash's demise. If there was ever a piece of software I wanted crucified, flash is it. I am not Steve Jobs, but I wish I was, then I could rent your mom by the hour.

Anonymous said...

Hi Zack

Thanks for enlightening me and the rest of the world about the inner workings of the graphics pipeline et al.

And for the entertainment ;-).