Lately I've been working on the Gallium3D i965 driver. I had to read up some documentation because, as it turned out, being a "graphics ninja" did not give me intrinsic knowledge of all graphics hardware ever built. I know! I was as shocked as you are right now.
It started with the fact that I wanted to experiment with the layout of vectors in shaders for the code generation using LLVM facilities.
To experiment with different layouts I've decided to write i965 driver for Gallium3D and experiment with LLVM code generation for i965 in that driver. Keith Whitwell and I have been hacking on it and it's going pretty well. It's amazing how much code we've removed from the old i965 driver while porting it to Gallium3D. It was a rather nice feeling to see so much of the complexity of the driver disappear.
The great thing about writing a Gallium3D driver is that a lot of the complexity of the high level API goes away, as it's being moved to "state tracker". State tracker is responsible for all API specific state handling and tricky conversions. The driver never sees the state tracker, it implements a very thin interface, which corresponds rather closely to the way modern hardware works.
One of my favorite changes is the new way we handle state changes. It used to be that the driver had to check whether any of its state changed and if it did upload it before drawing anything. It turned out to be a rather serious bottleneck and it made reading our driver a little painful.
In Gallium3D we went away from that. Now we use similar semantics to what Direct3D10 and OpenGL3 (will) use. Which is that states are largely immutable objects. Their usage follows the
- create from a template
- bind the state to make it active for subsequent rendering calls
- delete when not needed anymore
When I wrote the constant state objects code I ported our simple i915 driver and, even though on i915 we don't have hardware state caching just doing the state conversions in create calls improved the performance in simple examples by about 15fps. For more complicated examples where the state changes are more frequent it will be a lot more. Not even mentioning drivers which will do full state caching in the hardware where this is going to fly like Superman with diarrhea.
Less complexity in the driver and faster code, is what I think love is all about. Granted that my idea of love might be a smidge skewed on account of me being crazy and all but no one can argue with the "simpler/faster" being 'awesome'.
And to really top this graphics talk off, a picture of me naked:
In retrospect not my best day. The lighting was all wrong...