Tuesday, April 27, 2010

Geometry Processing - A love story

It's still early in the year, but I feel like this is the favorite for "best computer-science-related blog title of 2010". I'd include 2009 in that as well, but realistically speaking I probably wrote something equally stupid last year, and I don't have the kind of time required to read what I write.

Last week I merged the new geometry processing pipeline and hooked it into our new software driver, llvmpipe. It actually went smoother than I thought it would, i.e. it just worked. I celebrated by visiting the ER, a harsh reminder that my lasting fascination with martial arts will be the end of me (it will pay dividends once llvmpipe becomes self-aware though).

The codepaths are part of the Gallium3D Draw module. It's fairly simple once you get into it: at a draw call we generate the optimal vertex pipeline for the currently set state. LLVM makes this stuff a lot easier. First of all, we get human-readable LLVM IR, which is a lot easier than assembly to go over if something goes wrong. Running LLVM optimization passes over a fairly significant amount of code is also a lot easier than having to hand-optimize assembly code generation. Part of it is that geometry processing is composed of a few fairly distinct phases (e.g. fetch, shade, clip, assemble, emit), and the fact that they all can and will change depending on the currently set state makes it difficult to code-generate optimal code by hand. That is, unless you have a compiler framework like LLVM. Then you don't care and hope LLVM will bail you out in cases where you end up doing something stupid for the sake of code simplicity.

A good example of that is our usage of allocas for all variables. Initially all variables were in registers, but I switched the code to use allocas for a very simple reason: doing flow control in SoA mode when everything was in registers was tough, because it meant keeping track of the PHI nodes. Not to mention that we had no good way of doing indirect addressing in that scheme. Using allocas makes our code a lot simpler. In the end, thanks to LLVM optimization passes (mem2reg in this case), virtually every usage of an alloca is eliminated and replaced with direct register access.

Anyway, with the new geometry paths the improvements are quite substantial. For vertex-processing-dominated tests the improvements range from about 2x to 6x (geartrain, for example, went from 35fps to 110fps); for cases dominated by fragment processing it's obviously a lot less (e.g. openarena went from about 25fps to about 32fps). All in all, llvmpipe is looking real good.