There are a number of great software renderers out there, SwiftShader and DirectX WARP being the two most widely known. Unfortunately GNU/Linux, and Free Software in general, didn't have a modern software renderer. That's about to change thanks to a project started by José Fonseca. The project is known as llvmpipe (obviously a development name). José decided that the way forward was to write a driver for Gallium that generates the rendering paths at run-time from the currently set state. LLVM is used to generate and optimize the code.
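To make the "generate code from the current state" idea a bit more concrete, here is a toy sketch in Python. It is not llvmpipe code and does not use LLVM: the state tuple `(alpha_test, blend)`, the 0.5 alpha threshold and the function names are all made up for illustration. The point is only that branches on state are resolved once, when the function is generated, and the result is cached per state.

```python
# Hypothetical illustration: specialize a fragment function on render state.
_cache = {}

def build_fragment_fn(state):
    """Generate and compile a fragment function for one frozen state.

    `state` is a tuple (alpha_test, blend); both flags are invented here.
    """
    if state in _cache:
        return _cache[state]
    alpha_test, blend = state
    lines = ["def fragment(src, dst):"]
    if alpha_test:
        # Emitted only when the state enables alpha testing.
        lines.append("    if src[3] < 0.5:")
        lines.append("        return dst")
    if blend:
        lines.append("    a = src[3]")
        lines.append("    return tuple(s * a + d * (1.0 - a) for s, d in zip(src, dst))")
    else:
        lines.append("    return src")
    namespace = {}
    exec("\n".join(lines), namespace)  # stand-in for the real JIT step
    fn = namespace["fragment"]
    _cache[state] = fn
    return fn
```

A function built for `(False, False)` contains no alpha test and no blend at all, so the per-fragment path carries no state checks. A real driver would emit LLVM IR rather than Python source, but the specialize-then-cache shape is the same.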
Besides the idea itself, my favorite part of the driver is the software rasterizer. Keith Whitwell, Brian Paul and José implemented a threaded, tiled rasterizer which performs very nicely and scales pretty well with the number of cores. I'm sure one of them will write more about it when they have a bit of spare time.
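For readers who haven't seen a tiled rasterizer before, the general shape is roughly the following. This is a generic sketch, not the llvmpipe implementation: the 64-pixel tile size, the bounding-box binning and all function names are assumptions, and `render_tile` is only a placeholder for the real per-tile work.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 64  # tile edge in pixels; an arbitrary choice for this sketch

def bin_triangles(triangles, width, height):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box touches it."""
    bins = {}
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the framebuffer.
        x0 = max(int(min(xs)), 0)
        y0 = max(int(min(ys)), 0)
        x1 = min(int(max(xs)), width - 1)
        y1 = min(int(max(ys)), height - 1)
        if x0 > x1 or y0 > y1:
            continue  # bounding box lies outside the framebuffer
        for ty in range(y0 // TILE, y1 // TILE + 1):
            for tx in range(x0 // TILE, x1 // TILE + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins

def render_tile(item):
    """Placeholder for per-tile rasterization: report how many triangles the tile holds."""
    tile, indices = item
    return tile, len(indices)

def render(triangles, width, height, workers=4):
    bins = bin_triangles(triangles, width, height)
    # Tiles do not share pixels, so each one can go to a separate worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(render_tile, sorted(bins.items())))
```

Because no two tiles touch the same pixels, the workers need no locking on the framebuffer, which is what lets this kind of design scale with the number of cores. (CPython threads won't show that speedup for pure Python work; the sketch is only about structure.)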
Currently the entire fragment pipeline is code-generated. Over the last two weeks I've been implementing the vertex pipeline, which I'm hoping to merge soon (hence the light smile). Code generating the entire vertex pipeline isn't exactly trivial, but one can divide it into individual pieces, which makes it a bit easier. Start with the vertex shader, then go back and do the fetch and translate, then move forward again and do the emit, then go back and do the viewport transformations and clipping and so on; finally, combine all the pieces together.
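As a plain (not code-generated) illustration of those pieces and how they chain together, here is a small Python sketch. Everything in it is an assumption made for the example: the vertex layout of three little-endian floats, the uniform-scale "shader", and the clip step that merely rejects vertices instead of splitting primitives.

```python
import struct

def fetch(buffer, stride, index):
    """Fetch and translate: read three little-endian floats and widen to (x, y, z, 1)."""
    x, y, z = struct.unpack_from("<3f", buffer, index * stride)
    return (x, y, z, 1.0)

def vertex_shader(position, scale):
    """Placeholder shader: a uniform scale standing in for a real transform."""
    x, y, z, w = position
    return (x * scale, y * scale, z * scale, w)

def inside_clip_volume(clip):
    """Trivial clip test: -w <= x, y, z <= w. Real clipping also splits primitives."""
    x, y, z, w = clip
    return all(-w <= c <= w for c in (x, y, z))

def viewport(clip, width, height):
    """Perspective divide, then map normalized coordinates to window coordinates."""
    x, y, z, w = clip
    return ((x / w * 0.5 + 0.5) * width, (y / w * 0.5 + 0.5) * height, z / w)

def run_vertex_pipeline(buffer, stride, count, scale, width, height):
    """Combine the pieces: fetch -> shade -> clip test -> viewport -> emit."""
    emitted = []
    for index in range(count):
        clip = vertex_shader(fetch(buffer, stride, index), scale)
        if inside_clip_volume(clip):
            emitted.append(viewport(clip, width, height))
    return emitted
```

Each stage here is a separate function with a narrow interface, which is what makes it possible to implement and test them one at a time before fusing them into a single generated routine.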
In between working on the vertex pipeline I've been filling in some missing pieces in the shader compilation, in particular the control flow. We use the SOA layout, which always makes control flow a bit tricky. I've just committed support for loops and the only thing left is support for subroutines in shaders, so I think we're in pretty good shape. We can't rock the speedos quite yet, but we're getting there. It's my new measurement for software quality - could it pull off the speedos look? There are few things in this world that can.
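Why SOA makes control flow tricky: every "register" holds one value per lane, so a branch or loop can't simply jump, because different lanes may want to go different ways. The usual answer is an execution mask. The sketch below shows that general technique in Python; it is not the code committed to the driver, and the function names and the doubling loop are invented for the example.

```python
def masked_select(mask, new, old):
    """Per-lane select: take `new` where the lane is active, keep `old` elsewhere."""
    return [n if m else o for m, n, o in zip(mask, new, old)]

def soa_loop(x, limit):
    """Run `while x < limit: x *= 2; count += 1` for every lane at once.

    Assumes positive inputs so that every lane eventually terminates.
    """
    lanes = len(x)
    count = [0] * lanes
    mask = [True] * lanes  # execution mask: which lanes are still in the loop
    while True:
        # Lanes that fail the condition leave the loop and stay disabled.
        mask = [m and (v < limit) for m, v in zip(mask, x)]
        if not any(mask):
            break
        # The loop body is computed for all lanes, then committed under the mask.
        x = masked_select(mask, [v * 2 for v in x], x)
        count = masked_select(mask, [c + 1 for c in count], count)
    return x, count
```

The loop keeps running until the last lane is done, so the cost is set by the slowest lane; lanes that finished early just have their results masked out.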
Keep in mind that we haven't even started optimizing it yet, as we're still implementing features. Even so, on my Xeon E5405 the driver runs the anholt.dm_68 OpenArena demo at 25 fps (albeit with some artifacts), which is quite frankly pretty impressive, especially compared to the old Mesa3D software renderer, which runs the same demo on the same machine at 3.5 fps.