If you've been following graphics developments in the 2D world over the last few years, you've probably seen a number of blogs and articles complaining about performance, in particular about how slow 2D is on GPUs. Have you ever wondered why it's possible to make this completely smooth but your desktop still sometimes feels sluggish?
Bad model
For some weird reason ("neglect" being one of them) the 2D rendering model hasn't evolved at all in the last few years. That is, if it has evolved at all since the very first "draw line" became a function call. Draw line, draw rectangle, draw image and blit this were simply joined by fill path, stroke path, a few extra composition modes and such. At its very core, though, the model remained the same, meaning lots of calls to draw an equally large number of small primitives.
This worked well because technically zero, or almost zero, setup code was necessary to start rendering. Then GPUs became prevalent and they could do amazing things, but to get them to do anything you had to upload the data and the commands that would tell them what to do. With time, more and more data had to be sent to the GPU to describe increasingly complex and larger scenes. It made sense to optimize the upload process (I keep calling them "uploads", but "GPU downloads" is closer to the true meaning) by allowing an entire resource to be uploaded once and then referred to via a handle. Buffers, shaders, and the addition of new shading stages (tessellation, geometry) were all meant to reduce the amount of data that had to be uploaded to the GPU before every rendering.
At least for games and well designed 3D software. 2D stuck to its old model of "make the GPU download everything on every draw request". It worked OK because most of the user interface was static and rather boring, so performance was never much of an issue. Plus, in many cases the huge setup costs are offset by the fact that Graphics Processing Units are really good at processing graphics.
Each application is composed of multiple widgets, each widget draws itself using multiple primitives (pixmaps, rectangles, lines, paths), and each primitive first needs to upload the data the GPU needs to render it. It's like that because from the 2D api's perspective there's no object persistence. The api has no idea that you keep re-rendering the same button over and over again. All it sees is another "draw rectangle" or "draw path" call, which it will complete.
On each frame the same data is copied to the GPU over and over again. It's not very efficient, is it? There's a limited number of optimizations you can do in this model. Some of the more obvious ones include: 1) caching the resources that are already on the GPU (pixmaps, glyphs and the like) and referring to them via handles instead of re-uploading them, and 2) batching primitives that share state so they can be flushed in fewer, larger draw calls.
But the real problem is that you keep making the GPU download the same data every frame and unfortunately that is really hard to fix in this model.
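To make that concrete, here's a rough sketch of what an immediate-mode 2D backend typically ends up doing for every single "draw rectangle" request (illustrative code, not any specific engine's; scratchBuffer is assumed to be a GL buffer created elsewhere and a trivial shader is assumed to be bound):

    // Old model: the geometry for this rectangle is rebuilt on the CPU and
    // streamed to the GPU on every repaint, even if nothing has changed.
    void drawRect(float x, float y, float w, float h)
    {
        const float verts[8] = {
            x,     y,
            x + w, y,
            x,     y + h,
            x + w, y + h,
        };
        glBindBuffer(GL_ARRAY_BUFFER, scratchBuffer);
        glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STREAM_DRAW);
        glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, 0);
        glEnableVertexAttribArray(0);
        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    }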
Fixing the model
It all boils down to creating some kind of a store where the lifetime of an object/model is known. This way the scene knows exactly what objects are being rendered, and before rendering begins it can initialize and upload all the data the items need to be rendered. Then rendering is just that - rendering. Data transfers are limited to object addition/removal or significant changes to their properties, and are further limited by the fact that a lot of the state can always be reused. Note that trivial things like changing the texture (e.g. on hover/push) don't require any additional transfers, and things like translations can be limited to just two floats (translation in x and y), which are usually shared by multiple primitives (e.g. in a pushbutton they would be used by the background texture and the label texture/glyphs).
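To illustrate (a minimal sketch on top of plain GL, not the scene graph's actual API): the geometry gets uploaded once, when the item enters the scene, and moving the item afterwards costs exactly two floats.

    struct RectNode {
        GLuint vbo;
        float tx, ty;   // shared translation, e.g. for a whole pushbutton

        void create(float w, float h) {
            const float verts[8] = { 0, 0,  w, 0,  0, h,  w, h };
            glGenBuffers(1, &vbo);
            glBindBuffer(GL_ARRAY_BUFFER, vbo);
            // uploaded once; lives on the GPU for the lifetime of the item
            glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
        }

        void render(GLint translationUniform) {
            glBindBuffer(GL_ARRAY_BUFFER, vbo);
            glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, 0);
            glEnableVertexAttribArray(0);
            glUniform2f(translationUniform, tx, ty);   // the only per-frame data
            glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
        }
    };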
It would seem like the addition of QGraphicsView was a good time to change the 2D model, but that wasn't really possible because people like their QPainter. No one likes it when a tool they have been using for a while and are fairly familiar with is suddenly taken away. Completely changing the model required a more drastic move.
QML and scene-graph
QML fundamentally changes the way we create interfaces and it's very neat. From the api perspective it's not much different from JavaFX, and one could argue about which one is neater/better, but QML allows us to almost completely get rid of the old 2D rendering model, and that's why I love it! That side-effect of moving to QML is likely the most significant change we've made to accelerated 2D in a long time. The new Qt scene graph is a very important project that can make a huge difference to the performance, look and feel of 2D interfaces.
Give it a try. If you don't have OpenGL working, no worries - it will work fine with Mesa3D on top of llvmpipe.
A nice project would be doing the same in web engines. We have all the info there, but we decompose it into the draw line, draw path, draw rectangle, draw image calls. Aside from the canvas object, which needs the old-style painters, everything is there to make accelerated web engines a lot better at rendering content.
36 comments:
Nice article, but what about normal rendering? would all kde software have to be ported to qml to take advantage of this?
I'm not quite sure, but isn't IE9 doing this: http://blogs.msdn.com/b/ie/archive/2010/09/10/the-architecture-of-full-hardware-acceleration-of-all-web-page-content.aspx
I actually don't know how Direct2D, which is used there, works.
When are we going to see glyph generation in the GPU?
How about resolution independent UIs that get turned into device coordinates on the GPU?
Or JIT anti-aliasing on the GPU when the GPU knows the object's final location? Why are apps doing anti-aliasing?
You do realize that you can use OpenGL to display in 2D using ALL the functions available in 3D??
All that is required is a call to glOrtho() to enable parallel projection (no depth as such, but you can still use the Z buffer).
* http://www.opengl.org/sdk/docs/man/xhtml/glOrtho.xml
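Something along these lines is all the setup you need (a sketch, assuming a window that is width by height pixels):

    // Parallel (orthographic) projection mapping GL units straight to pixels;
    // the Z buffer is still available for layering if you want it.
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0.0, width, height, 0.0, -1.0, 1.0);   // y grows downwards, like most 2D APIs
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    // After this, plain glBegin(GL_QUADS)/glVertex2f() calls draw in pixel coordinates.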
You get full hardware acceleration (which, unless it is a very bad gfx chip, is considerably faster than the CPU).
OpenGL is a breeze to use and is cross platform.
I shall even point you in the direction of a tutorial for using SDL with OpenGL (SDL is just a slightly more convenient wrapper that removes tedious setup code).
* http://www.sdltutorials.com/sdl-opengl-tutorial-basics/
You will have arcade smooth visuals in no time.
About the only thing of note is that modern TFT monitors (compared to a CRT-based monitor) don't really like fast movement. With 3D you can nearly get away with it, but in 2D you will notice that anything moving faster than 2 pixels a frame may, depending on the manufacturer, start to blur.
@damian: Yea, it unfortunately would. It's not as difficult as it seems - the entire logic stays the same, you just remove void paintEvent() {...} and replace it with some qml text. Plus everyone is looking into switching at least the animating portions of their uis to qml anyway.
@Anonymous: I'm not sure what IE9 is doing. D2D, though, is still largely the old model. They did abstract a lot of the state into objects, which helps quite a bit.
@Jon: 1) Good question. I'm not sure if it's going to happen anytime soon. It's neat, but it's a lot slower than blitting the glyphs from textures using extended blending (to get subpixel rendering). And webpages don't transform glyphs enough for it to be an improvement in quality.
2) In the scene graph case the transformations would of course happen all on the GPU.
3) In Qt, when rendering in GL, we weren't anti-aliasing on the CPU unless the users explicitly requested high-quality anti-aliasing. Otherwise, whatever method the GPU preferred was used. I always thought that was good enough, but others disagreed. We still do it because it still makes a big difference though.
@Anonymous: Really, guy? You couldn't stop yourself from making me have even less faith in humankind, could you? Maybe I should start deleting comments like that, because otherwise I'd have to start pointing out things like links to sources from a couple of years ago that do what you complain we don't (like http://qt.gitorious.org/qt/qt/blobs/4.7/src/opengl/gl2paintengineex/qpaintengineex_opengl2.cpp ), or point out that I explicitly said in the blog post to which you're responding that that's not the biggest problem, or shake my head at the idea of using SDL in Qt applications when that same blog post says that's not the problem, or point out that if you're trying to be condescending then knowing at least a little bit about what the hell you're talking about helps... I could also do what I usually do in these cases and ignore it, but somehow this time I just didn't feel like it.
The biggest problem isn't really the amount of data sent to the GPU for every frame, but instead the fact that each primitive in a 2D UI is drawn individually. This is in contrast to game engines, which try to render objects in large batches and generally attempt to minimize the amount of state changes between objects.
The new QML scene graph is one necessary step in solving this, but I think we will also need to learn some lessons from the game industry when it comes to the content creation pipeline. Where is the UI editor that, for instance, automatically packs QML elements into a shared, compressed texture atlas?
All 2D graphics want to be NeWS someday.
You should look at WPF. It has cached buffers of GPU commands for each visible element. The buffer is updated with new draw commands only when the element is changed. Rendering is just sending these command buffers (which live in the GPU memory) to the GPU in the right order.
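Roughly like this (a sketch in toolkit-neutral terms, not WPF's actual API): each element keeps a recorded list of draw commands that is rebuilt only when the element changes and merely replayed every frame.

    #include <vector>

    struct Command {                          // one retained draw command, simplified
        enum Op { FillRect, DrawImage, DrawGlyphs } op;
        float x, y, w, h;
    };

    struct Element {
        std::vector<Command> commands;        // the cached drawing for this element
        bool dirty = true;

        void rebuildIfNeeded() {
            if (!dirty)
                return;                       // unchanged: keep replaying the old list
            commands.clear();
            commands.push_back({Command::FillRect,   0, 0, 100, 30});  // background
            commands.push_back({Command::DrawGlyphs, 8, 20,  0,  0});  // label
            dirty = false;
        }
    };
    // The renderer then just walks the visible elements in order and replays their
    // cached commands; no widget code runs and no geometry is rebuilt per frame.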
I love the idea, but...
This would potentially make code less portable (blitting to any graphics adapter is relatively easy... implementing OpenGL... not so much), and then what happens with software such as VNC? Would it need to implement its own software renderer for OpenGL?
QML eh? biggest advance? :) Funny that EFL paved the way here and was doing this years before QML was a twinkle in Nokia... trolltech's eyes. QML has definitely examined EFL and taken heavily from it. It was doing the "we can use both software and OpenGL to do this rendering" long before Qt...
But yes - 2D was neglected. Its model is stale and wrong. EFL started to change that. It's an uphill battle to change the mindsets of developers though.
First, unlike the second anonymous poster, I have been following your blog for a while, and anyone who has would know you've thoroughly investigated the OpenGL pipeline. Obviously the second anonymous poster has never paid attention to anything like Loop-Blinn, etc.
I'm not sure what the answer is, but I agree that there are some deep, fundamental issues with quality 2D rendering in real time (HD, anti-alias, vector, 60fps). I have been investigating this subject and following it, as well as this blog and others who are interested in high quality, real-time, high-resolution, 2D rendering. Many will think that this already exists, and it does, to some limited extent, but not to the extent of providing a large, powerful, open canvas for drawing and interaction where shapes and data are interacted with, and generated on the fly as in 3D games.
If one wants to draw a string on the screen, and some rectangles, or some vectors, current technology works fine. But, if you want to start generating the drawing data, not just specifying it, it becomes much more complex.
When I can write a program in an interpreted language and render it in real time, in HD, at high frame rates, I will be very pleased. Basically, what I want to know is when I can have software like Flash or Processing, but with output like Adobe Illustrator, in real time and HD.
If that were an easy request, Microsoft, Adobe, and Java would have answered it long ago. They haven't because there are still fundamental problems in 2D graphics. Keep up the good work.
redpicture
@Sami Kyöstilä: yea, batching is certainly an issue, but as I mentioned you can do a lot of batching in the old model (that was subpoint #2 in the list I posted). Of course knowing ahead of time what exactly will be rendered makes batching a lot simpler. I agree with your points about the creation pipeline, but it looks like the Nokia guys are looking at multiple projects there (Qt Creator and exporters for Photoshop and Gimp), so hopefully that will be fixed soonish.
@Anonymous: Yea, things like VNC would have to have something like llvmpipe running. Either that or we could always fall back to the old model in those cases.
@raster: Not really. You'd probably be right saying that JavaFX had a lot of impact on it, but not EFL. In fact QML as it stands uses the old model (qgraphicsview); it's the new scene-graph (link in the blog) that makes it interesting from the graphics perspective. For the scene-graph, game engines and proper usage of GL are bigger muses than anything else. Of course if you wanted you could implement QML on top of EFL, or more specifically Evas, but for us (us as in KDE/Qt) writing the scene-graph is a better option.
@Anonymous: Yea, definitely. For now, if you have working GL (and if not, llvmpipe is always an option) I'd suggest cloning the Qt scene-graph (link in the blog) and giving it a shot. There are some examples in the examples directory. See how far it is from your ideal, and if you have any suggestions the Qt bugtracker (even though it's not as user friendly as it could be) is waiting :)
"For some weird reason ("neglect" being one of them) 2D rendering model hasn't evolved at all in the last few years."
You seem to forget that Edje (from the Enlightenment project) has already had these ideas implemented for years, and was used in QEdje, which was actually the start of QML.
actually evas is a scene graph. not saying that kde should change from qt to efl - but qml and qscenegraph are much newer than evas or edje, for example. evas first started doing its thing back in 2001, with opengl and software engines to boot. it's been a scene graph with multiple abstracted rendering pipelines since then. edje is the "ui in data file loaded/interpreted at runtime" model like qml - built on top of... a scene graph... which is evas :) unless i totally mis-understand qt... :)
This post would have been awesome in 1997 but, as others have already explained, these problems have long since been solved. Even Microsoft's Windows Presentation Foundation has provided this kind of functionality as standard for over 4 years. Our own libraries have been providing much more advanced functionality for over 13 years.
The fact that people seem to chuckle and move on, or have some form of a "wtf" reaction whenever I mention NeWS in a positive manner bothers me.
It was far superior to X, but lost because it was non-free, and incredibly expensive.
Look to NeWS for ideas, yo.
"The NeWS Book" is available cheap on amazon, and is full of good ideas and insight on 2D rendering, particularly of vector graphics. The specifics of the rendering, implementation language etc can be kind of glossed-over, as can the sections on programming in PS, though the concepts behind the NeWS API in the programming sections are worth reading.
As a side note,
I frankly don't see why we need SVG et al, when extending PS to support layers would have been enough. Perfect.
Oh, and there are actual 100% feature-complete implementations.
I'm not saying SVG is bad! It's very good, I just don't see the need for another format that people have to spend time and effort implementing and that has to be installed on every system that wants to view or otherwise process it.
Duplication of effort frustrates me when it's unnecessary.
That said, I hope other 3D APIs come to X with Gallium3D drivers. I hear GL's not the easiest to implement properly, and not the easiest 3D api to use. The fact that it assumes a C-family language, or at least a comfort with that programming style, doesn't appeal to me - that's all I know personally.
I don't have a lot of expectation of that, but I have hope.
Is it possible that this is to do with backward compatibility, so that low-end graphics cards/chips still work - much in the same way VGA is still the default when all else fails and/or until the OS has loaded more advanced drivers, X and so on?
To be honest I don't think I've felt any system to be "sluggish" except when loading from disk, or when a high (CPU) demand job is running in the background such as a Blender render, or when a process has gone rogue due to a bug.
Then again my first computer was based on a Motorola 6502 processor and 32k of ram and could run defender as well as any purpose built arcade box.
@vtorri: Comment above explains your comment.
@raster: Yes, and qgraphicsview is technically a scene-graph as well, as was qcanvas, as were many, many other projects. Declarative languages were there before as well; it's not like HTML with javascript was never used. So if we're using "came before" as "based on my code", then surely Evas is based on QCanvas and Edje is based on Qt UI (you know, the stuff Qt Designer has been generating for close to 10 years now) :) Also, not to critique Evas, but afaik you don't minimize state changes, you just batch them and flush them all together in shader_array_flush.
@Flying Frog Consultancy Ltd: this comment would be irrelevant even in 1997. It's not the scene-graph by itself, and it's not QML by itself - it's the combination of both that makes it what it is. I thought I made that clear. There are lots of scene-graphs and lots of declarative languages - they both matter. It's not that people don't like you, it's that you didn't give them a compelling reason to switch to your product; accelerated vector graphics libs are everywhere.
@Zack: Yes, we were using a different declarative language (not QML) when we did this in 1997, of course.
@Flying Frog Consultancy Ltd.: Good stuff. Of course I'd point out that, since this is computers and not magic we're talking about, instead of posting bitter comments you could simply show people the code, the benchmarks and the amazing interfaces you've created to prove how much better you were in 1997, but hey, this is more fun, ain't it?
@Zack: The code is only available under commercial license, of course, but as you're being so friendly I'm happy to tell you more about the history of our product line. The original version was 50,000 lines of C++ code that could zoom and pan around the Postscript Tiger full screen at 100fps on an nVidia RIVA 128 with a 300MHz Intel Pentium. A colleague integrated OCaml support which became our declarative language of choice and, ultimately, led to the entire code base being rewritten in OCaml. That system was used to build various custom demos for companies. For example, High Energy Magic's SpotCode product had an interactive shop front demo that users could control using camera phones. That demo was an Acme travel agent where the user could fly around a map of the world to book flights. We also advised Wolfram Research before they released their own solution in Mathematica 6 (2007). Today, we still use the same code (albeit translated from OCaml to F#) to power our current products such as F# for Visualization and we have used it to build custom presentations for customers such as Microsoft.
@Flying Frog Consultancy Ltd.: me: "this is what we're doing on gnu/linux, in particular in kde, this is why it's broken, this is how we're fixing it", you: "Pff, we did it, but different and not in kde, and it's a secret". Good stuff!
Sorry, I did not read all comments, so maybe this has been covered.
The CORE problem is not any hardware architecture or anything like that at all. The core problem is that current APIs (like Windows GDI, Qt's QWidget::paintEvent() and such) are doing it the way they are doing it: repaint every time the window is invalidated. This is plain wrong - a (partial) repaint should be made when the state of the window changes, not when it needs to be (re-)displayed on the screen. It should always be the case that you are not painting to the screen but to a buffer, which will then be blitted to the screen whenever the window (or parts of it) needs to be repainted. And this buffer does not need re-invalidation, e.g. when the application is minimized and then restored. This is really the fault of the APIs, not of GPUs or other hardware architectures. Actually, Windows took a few steps in this direction. Windows 7's drawing mechanics have undergone a complete re-write to make the small previews in the taskbar's tooltip and the alt-tab dialog possible. Windows no longer draw to the screen, but to a buffer, which is then re-used (e.g. for scaling into the alt-tab dialog) by the Windows desktop.
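In other words (a rough sketch with made-up names, just to illustrate): the widget paints into its own backing buffer at the moment its state changes, and invalidation merely blits that buffer.

    #include <cstdint>
    #include <string>
    #include <vector>

    struct Image {                        // a plain ARGB backing store
        int width = 0, height = 0;
        std::vector<uint32_t> pixels;
    };

    class BufferedWidget {
    public:
        void setLabel(const std::string &text) {
            label = text;
            paintToBuffer();              // repaint happens here, on the state change...
        }
        void handleExpose() {
            blitToScreen(buffer);         // ...not here; expose is just a cheap copy
        }
    private:
        void paintToBuffer() { /* draw `label` into `buffer` with the usual painter */ }
        void blitToScreen(const Image &) { /* copy the buffer to the window surface */ }
        Image buffer;
        std::string label;
    };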
And by the way, I totally don't get your response to Anonymous (talking about using OpenGL for 2D). His point is perfectly valid.
I think you are blaming the wrong folks.
CAPTCHA: snesup. makes me want to play mario kart.
actually the evas gl engine does batch and minimize state changes. it does it quite aggressively. it has a number of parallel geometry pipes it keeps going, and as long as you won't get incorrect ordering when rendering, it will batch newer draws with old ones that match the same state. by default it maintains 32 pipes (unless you are on specific gpus where it actually doesn't help - on these it keeps only 1 around). you can have up to 128 (a recompile can increase this limit - it's a #define - but at runtime you can set EVAS_GL_PIPES_MAX to the maximum number of parallel pipes to maintain at a time). if you disable the pipes - or rather bring it down to a single pipe - then one scene (drawing icons with labels) results in 92 flushes (92 gldrawarrays) per frame. with it on, even at the default, this goes down to 4 per frame instead of 92. it certainly does work.
but ymmv depending on driver/gpu/platform. i've tested across quite a few - fglrx on a radeon hd 4650 saw a massive speedup, like a 200-300% framerate increase from memory; cedric tells me on his eee he sees a 30% speedup; on an sgx540 i've seen a good 30% speedup - WHEN it actually finds an optimal path (and my tests show it's pretty good at doing so); nvidia tegra 2 shows no speedups at all; and on recent nvidia desktop gpus (GT220 for example) it's no win, while on older ones (8600 GTS) it's a big win - a 75% speedup.
so yes - evas does do this state change minimization quite well :)
as for who came first - i'm getting at the "use opengl for regular 2D rendering, but to do so effectively you need to change the rendering model from immediate-mode drawbox, drawline etc. to more of a scene graph, as WELL as abstract the rendering to the point where you can just slide in opengl etc." and that's something Qt has only started doing in very recent times. :)
the hard bit is getting people to use a new model and break away from the immediate-mode mindset and codebases.
This is a tangent but I just wanted to describe the bit of history I was involved in:
Symbian OS had a reactive drawing window-server - when a part of the screen needed to be drawn, the client that had that bit of screen was woken up to draw. Very Windows-like.
It worked OK with low RAM - there was a screen buffer, usually not even double-buffered, so it worked.
The key thing was that the draw commands were serialised and sent over IPC to the window server to do the actual drawing, where it could enforce clipping.
This worked horribly when semi-transparent windows were added.
So a 'redraw store' was added to the server so that it would store all the draw commands for windows in a buffer and 'replay' them server-side when it needed to redraw part of a window.
The initial implementation was horrid and my team put a lot of effort into speeding it up, but the concept was sound and a big step forward.
We had dreams of translating the primitives we got from clients to OpenVG or display lists - OpenVG seemed promising, then it faded away, then it seemed to become viable again, but we never jumped before UIQ evaporated.
The implementation was always complicated by the old APIs for 'direct screen access' and 'getpixel()', which needed to be supported even when UIQ was very careful not to use them.
@JonathanWilson ... look at EFL (Evas and friends) - it does just what you ask. It introduces a new model (scene graph), BUT does it with multiple render targets. The default is an optimised software engine perfectly capable of realtime display even with all the fancy bits on - you don't NEED OpenGL acceleration for it to work well. In addition there is an OpenGL rendering engine (just select which you want at runtime) that can do all the same rendering of 2D scene graph elements, but using your GPU and its drivers. So it provides a forward-moving path, allowing GL to be used when/if the drivers are solid and you have the hardware, and software to be used otherwise (with many other rendering targets supported too).
@ Daniel Albuschat ...
yes - the problem is how you expose painting TO the app or even the widgets themselves. but... requiring EVERYTHING (every button, every list item, ...) to be a buffer will mean you'll have no memory left very quickly. you can't do that. you need to re-paint. you just have to move the painting out of the view of apps and put it well below the toolkit/widget set. deal in objects (a rectangle, an image, a text string etc.) and just manipulate them, stack them and change their properties. let state management figure out how to re-draw such changes.
the fact that in windows 7 you have previews is simply a by-product of forcing all windows to render not to the fb, but to backing pixmaps. this happens in x11 when you use a compositor too. it just happens to consume quite prodigious amounts of memory. you'll go through dozens of mb before you know it - and 100's of mb are easily used up.
in the end, whether there is a buffer or not should be transparent to the app - or a toolkit. it should be down at a lower layer, which should manage that: how to redraw, what to redraw and how to minimize it, if that is needed.
@Daniel Albuschat: we're already doing that. On two levels, in fact. Composition introduces one backing pixmap, which is simply blitted on window moves and such, and another is used by the toolkits (in Qt it's in QWindowSurface) to blit locally. As to the other comment being valid: no, it's not, we've been doing this for years.
@raster: when I'm talking about minimizing state changes, I'm not talking about per-frame - as I mentioned, that can be done in the old model (and we've been doing that too) - I'm talking about over the lifetime of the application. For example, once you initialize a button its four coordinates should never be uploaded again; there should be a permanent buffer object which is simply bound whenever it's being rendered. Furthermore, things like moves should be reduced to just sending a translation in x and y floats to a newly bound vertex shader (or a full matrix with a generic mapping vertex shader, which would still amount to a lot less data than translating on the client side and resending coordinates for every primitive). No one is doing that at the moment and that's really what I'm talking about.
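As a sketch, the kind of vertex shader I mean would look roughly like this (illustrative GLSL kept in a C++ string; the names are made up):

    static const char *vertexShaderSrc =
        "attribute vec2 position;\n"     // uploaded once, when the item is created
        "uniform vec2 translation;\n"    // the only data that changes on a move
        "uniform mat4 projection;\n"     // pixel-to-clip-space mapping
        "void main() {\n"
        "    gl_Position = projection * vec4(position + translation, 0.0, 1.0);\n"
        "}\n";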
I completely agree that getting people off the old model is difficult. With HTML5 apps becoming more and more common, at least we're getting away from the old model and towards a declarative "think in terms of objects" approach, which is good.
(btw, your response to Daniel isn't correct - Qt and iirc GTK+ are doing that; effectively we do triple buffering when a composition manager is running. Kristian and I talked about it a long time ago and we didn't see a way around it. topic for another blog maybe)
@Will: interesting, thanks.
In fact the current mainline Evas OpenGL backend doesn't do that anymore, but the previous one was building display lists to reduce rebuilding stuff from one frame to another. I added that a few years ago. It was an improvement over not using them, but when the new OpenGL backend written by raster came in, it was much faster than the previous one even without this kind of improvement. I don't know what raster thinks about adding back this kind of trick, but once you have a stateful canvas it's really easy to add the needed logic.
So the biggest issue is to get people to use an object model instead of direct rendering. I hope adoption will get faster now that all modern frameworks are going in that direction.
We already accelerate in webkit the same things we accelerate for QML, i.e. the scene graph for animations and transforms. See, for example, http://labs.qt.nokia.com/2010/05/17/qtwebkit-now-accelerates-css-animations-3d-transforms/.
So, both Webkit and QML make pretty good use of the GPU, and the difference is more about productivity than about hardware acceleration.
@Anonymous: No, you don't. QGraphicsView uses the old model.
Dude, I need someone that doesn't give me support but that tells me "Read that", first time I see a graphics ninja. Can you help me?
I'm at iampowerslave which is a hotmail e-mail if you understand me.
Very interesting discussions going on, and I may even remember some of the things I've learned reading it!
Since this is a post on 2D rendering and toolkits in general, I was wondering what your (and reader's) thoughts are on Morphic? Morphic is an object oriented 2D graphics system which was originally built as part of Self, and has since been ported to Smalltalk (eg. Squeak), SVG+Javascript (Lively Kernel) and even Qt (the experimental "Lively for Qt" project).
As far as I understand it, Morphic is essentially a very elaborate way to organise the primitive shapes you want to draw, and thus is very much the old-school model mentioned in your post; however, applications live at such a high level, and communicate only via late-bound message sending, that they are very much the "object model" way of doing things (especially since Smalltalk first formally defined objects ;).
Also of relevance is Juan Vuletich's "Morphic 3" project, which is trying to disconnect Morphic's 2D graphics from the pixels, screen, resolution and even from the coordinate system; then rendering it at whichever zoom level is desired using sampling theory. Sounds rather ninja-like if you ask me... :)
Hey, Zack, how can I be as awesome as you?
The most i can do is write bash/python scripts and program a little in C#/C++
Well, I'm a Linux user which makes me slightly awesome, but it's not enough.
You should write a guide :p
Great work, man.
I wish I had enough knowledge and skills to be able to write drivers. Especially ones as complicated as graphic drivers
What do you think about wayland? Does it solve this? Maybe it should be fixed before it's (again) too late..
Quite honestly, I'm surprised this isn't a lot more common. I guess old APIs are hard to break XD. Just imagine the optimizations we can have over the next decade - if we really tackle this, Qt can become an extraordinary platform for even the most dainty mobile hardware. I'm excited for how things will change.
Sometimes I feel like open source toolkits just kick way too much ass for how little they're noticed. Perhaps because most of the modern world has been using overpowered computers for the past five years, it's hard to remember the small, important things. Doing well because you can, not because you have to.
I really love your blog, by the way- you've helped me, someone who loves graphics but is horrible with the terminology and mechanics of it all, to understand more about how we can improve one of the most essential and dynamic parts of our software.