Tuesday 8 September 2009

Challenges in 2D acceleration

GPU-based 2D acceleration has been a hot topic in Linux for years. The debate is whether the GPU should be used for rendering at all and, if so, what should be accelerated. The first attempts with XAA gave hardly any speed advantage and offloaded only some core rendering operations to the graphics HW. Once the X Render Extension started providing nice functionality for modern applications, like alpha blending, drop shadows and translucency, the question of whether the GPU should be used needed to be revisited. EXA made it possible to HW accelerate Render extension requests and thereby improve the X.Org Server's 2D performance.

The latest twist in this field is UXA, which is based on the EXA code base and uses the GEM memory manager. With memory management moved to the kernel side (with GEM), it seems that the equation of accelerating X is finally getting into shape, but there still seem to be challenges, especially with text/glyph rendering. More on this on Carl Worth's web pages.


But how do these acceleration architectures do in action? As I was about to reinstall my laptop anyway, which has an Intel 82852/855GM graphics device (old and integrated, I know...), I decided to put these architectures to the test.
As the distro I selected the latest Ubuntu, 9.04. The actual benchmarking was done with mx11perf, mainly because it already had a nice benchmark script covering quite a nice variety of X11 operations. Some of the most interesting results from this run can be found below.

Rendering of basic rect:

Test              EXA                        UXA                        XAA                        NoAccel
Rect 8x8 Src      10410/sec (0.67 Mpix/s)    7598/sec (0.49 Mpix/s)     5527/sec (0.35 Mpix/s)     263154/sec (16.84 Mpix/s)
Rect 32x32 Src    9316/sec (9.54 Mpix/s)     6701/sec (6.86 Mpix/s)     10050/sec (10.29 Mpix/s)   33070/sec (33.86 Mpix/s)
Rect 512x256 Src  3950/sec (517.77 Mpix/s)   3862/sec (506.19 Mpix/s)   6245/sec (818.65 Mpix/s)   299/sec (39.25 Mpix/s)
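
The Mpix/s figures in the table above appear to be simply the operation rate multiplied by the rectangle area. A minimal sketch of that arithmetic, assuming 1 Mpix means 10^6 pixels (the function name is mine, not something from the benchmark script):

```python
def mpix_per_sec(ops_per_sec, width, height):
    """Pixel throughput in Mpix/s for a width x height operation."""
    return ops_per_sec * width * height / 1e6


# EXA column of the rect results: (label, width, height, ops/sec)
exa_rects = [
    ("Rect 8x8 Src", 8, 8, 10410),
    ("Rect 32x32 Src", 32, 32, 9316),
    ("Rect 512x256 Src", 512, 256, 3950),
]

for label, w, h, rate in exa_rects:
    print(f"{label}: {mpix_per_sec(rate, w, h):.2f} Mpix/s")
```

The first two rows reproduce the published values. The last one comes out slightly below 517.77, presumably because the operation rate shown in the results is itself rounded.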

Then a Copy:

Test                   EXA                        UXA                        XAA                        NoAccel
Copy (Render) 32x32    8884/sec (9.10 Mpix/s)     6362/sec (6.51 Mpix/s)     6320/sec (6.47 Mpix/s)     7467/sec (7.65 Mpix/s)
Copy (Render) 128x128  10909/sec (178.74 Mpix/s)  10845/sec (177.69 Mpix/s)  12141/sec (198.92 Mpix/s)  530/sec (8.69 Mpix/s)

Composite:

Test                        EXA                        UXA                        XAA                        NoAccel
Composite (Src 16) 32x32    5612/sec (5.75 Mpix/s)     1254/sec (1.28 Mpix/s)     5165/sec (5.29 Mpix/s)     14802/sec (15.16 Mpix/s)
Composite (Src 16) 512x256  147/sec (19.32 Mpix/s)     35/sec (4.70 Mpix/s)       307/sec (40.30 Mpix/s)     127/sec (16.76 Mpix/s)
Composite (Src 32) 32x32    7087/sec (7.26 Mpix/s)     6382/sec (6.54 Mpix/s)     6115/sec (6.26 Mpix/s)     30312/sec (31.04 Mpix/s)
Composite (Src 32) 512x256  1845/sec (241.89 Mpix/s)   3538/sec (463.78 Mpix/s)   1731/sec (226.92 Mpix/s)   286/sec (37.60 Mpix/s)
(With this one it is interesting to see how far UXA falls behind at 16-bit depth. Why so? Also, at 24- and 32-bit depths UXA seemed to perform really nicely with the specially sized 512x256 rects.)

And finally text:

Test       EXA                        UXA                        XAA                        NoAccel
Text 8px   1726/sec (1.49 Mpix/s)     2295/sec (1.98 Mpix/s)     2490/sec (2.14 Mpix/s)     6830/sec (5.87 Mpix/s)
Text 12px  1658/sec (2.39 Mpix/s)     2265/sec (3.26 Mpix/s)     2274/sec (3.28 Mpix/s)     4550/sec (6.54 Mpix/s)
Text 24px  2099/sec (9.89 Mpix/s)     2383/sec (11.24 Mpix/s)    1978/sec (9.32 Mpix/s)     1846/sec (8.69 Mpix/s)
(This shows that rendering small areas and text is something where the CPU can still outperform the GPU.)
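
To make the small-versus-large pattern easier to see, the rates can be expressed relative to the NoAccel column. This is only a sketch over a few rows copied from the results above; the helper name and the choice of rows are mine:

```python
# ops/sec copied from the results above: {test: {method: rate}}
results = {
    "Rect 8x8 Src":     {"EXA": 10410, "UXA": 7598, "XAA": 5527, "NoAccel": 263154},
    "Rect 512x256 Src": {"EXA": 3950, "UXA": 3862, "XAA": 6245, "NoAccel": 299},
    "Text 8px":         {"EXA": 1726, "UXA": 2295, "XAA": 2490, "NoAccel": 6830},
    "Text 24px":        {"EXA": 2099, "UXA": 2383, "XAA": 1978, "NoAccel": 1846},
}


def speedup_over_noaccel(rates):
    """Return each accelerated method's rate divided by the NoAccel rate."""
    base = rates["NoAccel"]
    return {method: rate / base
            for method, rate in rates.items() if method != "NoAccel"}


for test, rates in results.items():
    ratios = speedup_over_noaccel(rates)
    line = ", ".join(f"{method} {ratio:.2f}x" for method, ratio in ratios.items())
    print(f"{test}: {line}")
```

A ratio below 1 means the unaccelerated software path was faster for that test, and a ratio above 1 means the accelerated path won.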


At the end mx11mark also gives a total score. By these figures EXA and, actually, XAA get the highest scores, both at basically the same total of ~80. XAA gets there mainly because of its really high figures in pure rect rendering. UXA and NoAccel are also even, with a total score of ~20: NoAccel does well with small-area renders and UXA with 512x256 rects.


OK, my selection from these was the default one: EXA. It seems that at least with out-of-the-box UXA there is still quite a lot of work to be done, and I would definitely be interested to see how UXA performs on, for example, a tuned Intel 965 graphics device.

Thursday 27 August 2009

The new Maemo 5 is finally out

This long-speculated and long-awaited "software behind our mobile computer" has finally been released. See http://maemo.nokia.com/

Some interesting characteristics of this device
  • Including Clutter based compositing manager
  • Using EGL and OpenGL ES 2.0 for rendering, and of course also providing the GLES API for developers
  • Including Xorg X window system
  • Support for Flash (not just the h.264 converted videos)

On the engine room side, all this will be running on TI's OMAP 3430 with PowerVR's SGX for HW accelerated graphics. Cool!

Sunday 16 August 2009

Compositing Window Manager - how it works

Compiz and other compositors have been living in the x86 world for a while already, but these compositors are also making their way to mobile devices. To understand a bit better how these compositing managers work, I decided to take a closer look at the de facto Linux compositing window manager, Compiz. I know that there are several good articles about compositing in general (like composite_howto) and that Compiz as such is not aimed at mobile devices, but the idea was to understand at a higher level how compositing works on the Linux platform.
For this I set up a separate Xgl session and used good old ltrace to see what happens under the hood. So, off we go...


When Compiz is launched, the first thing it does is query whether the needed extensions are in place, by calling functions like XCompositeQueryExtension, XDamageQueryExtension, etc. This ensures that the infrastructure is there.
The actual magic starts to happen when Compiz calls XCompositeRedirectSubwindows, which requests the X server to redirect the entire window hierarchy to off-screen buffers. This includes current and future windows. The redirection is done at the X server level and is completely transparent to the applications, so it does not require any changes to them.

In Compiz's case the XCompositeRedirectSubwindows call requests manual redirection (CompositeRedirectManual). This means that the server does not track damage for these windows internally and update the screen on its own; instead Compiz registers a damage listener for each window. Updating the screen is then done by Compiz itself, which lets it do the final presentation.

The next step is to notify the X server that Compiz wishes to draw to a window of the X server itself, called the Composite Overlay Window. This is done by calling XCompositeGetOverlayWindow. The overlay window sits above all other windows and provides a surface for the compositing manager to draw to. The challenge with the overlay window comes when the user generates input events (key presses, mouse clicks, etc.), which need to be passed through the overlay window to the actual X windows for interaction. Compiz uses XFixes shapes to make this happen, with a set of function calls: XFixesCreateRegion, XFixesSetWindowShapeRegion, XFixesSetWindowShapeRegion, XFixesDestroyRegion.


So now we have the basics up and running and are ready for the applications.

The first thing to do for an application running under the compositing manager is to create a damage listener for its window. This is done by calling XDamageCreate. With this call Compiz asks to be notified through an event whenever the damage state changes.

For the applications we also need the off-screen storage, as we have told the X server that we would like rendering to go to off-screen buffers. This is done by calling XCompositeNameWindowPixmap. The call gives us a window-sized pixmap and returns a handle to it as a reference to the off-screen storage. When rendering the screen to the display, Compiz uses this handle to bind the pixmap as a texture with the GLX (not EGL) texture_from_pixmap extension, via glXBindTexImageEXT. Compiz uses these GLXPixmaps to do the final rendering to the screen.
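
The flow described above (clients draw off-screen, damage marks what changed, the compositor does the final presentation) can be illustrated with a toy model. This is only a conceptual sketch: it does not call Xlib, Composite, Damage or GLX, and every class and method name in it is hypothetical. A pixmap is a small grid of characters and the overlay is the "screen".

```python
class Window:
    """A redirected window: the client draws into an off-screen pixmap."""

    def __init__(self, name, x, y, width, height, fill):
        self.name = name
        self.x, self.y = x, y
        self.pixmap = [[fill] * width for _ in range(height)]
        self.damaged = True  # fresh content not yet composited

    def draw(self, fill):
        """Client rendering lands in the pixmap, never directly on screen."""
        for row in self.pixmap:
            for i in range(len(row)):
                row[i] = fill
        self.damaged = True  # stand-in for a damage notification


class Compositor:
    """Owns the overlay and repaints it from the window pixmaps."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.overlay = [["."] * width for _ in range(height)]
        self.windows = []  # bottom-to-top stacking order

    def add_window(self, window):
        self.windows.append(window)

    def repaint_if_damaged(self):
        """Recompose the overlay only when some window reported damage."""
        if not any(w.damaged for w in self.windows):
            return False
        self.overlay = [["."] * self.width for _ in range(self.height)]
        for w in self.windows:
            for dy, row in enumerate(w.pixmap):
                for dx, pixel in enumerate(row):
                    sx, sy = w.x + dx, w.y + dy
                    if 0 <= sx < self.width and 0 <= sy < self.height:
                        self.overlay[sy][sx] = pixel
            w.damaged = False
        return True

    def screen(self):
        return ["".join(row) for row in self.overlay]


comp = Compositor(8, 4)
term = Window("terminal", 0, 0, 4, 2, "T")
edit = Window("editor", 2, 1, 4, 2, "E")
comp.add_window(term)
comp.add_window(edit)
comp.repaint_if_damaged()
print("\n".join(comp.screen()))
```

Calling edit.draw("e") changes only the editor's pixmap; the overlay stays as it was until the next repaint_if_damaged() call, which mirrors the manual redirection mode where the compositing manager, not the server, decides when the screen is updated.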