Tuesday 8 September 2009

Challenges in 2D acceleration

GPU-based 2D acceleration has been a hot topic in Linux for years. The debate is whether the GPU should be used for rendering at all and, if so, what should be accelerated. The first attempts with XAA gave practically no speed advantage and offloaded only a few core rendering operations to the graphics HW. Once the X Render Extension brought functionality that modern applications want, such as alpha blending, drop shadows and translucency, the question of whether the GPU should be used had to be revisited. EXA made it possible to HW accelerate Render extension requests and thereby improve the 2D performance of the X.Org Server.

The latest twist in this field is UXA, which is based on the EXA code base and uses the GEM memory manager. With memory management moved to the kernel side (GEM), it seems that the equation of accelerating X is finally getting into shape, but there still seem to be challenges, especially with text/glyph rendering. More on this on Carl Worth's web pages.


But how do these acceleration architectures do in action? As I was about to reinstall my laptop anyway (Intel 82852/855GM Graphics Device; old and integrated, I know...), I decided to put these architectures to the test.
As the distro I selected the latest Ubuntu, 9.04. The actual benchmarking was done with mx11perf, mainly because it already had a nice benchmark script covering quite a nice variety of X11 operations. Some of the most interesting results from this run can be found below.

Rendering of basic rect:

Test | EXA | UXA | XAA | NoAccel
Rect 8x8 Src | 10410/sec (0.67 Mpix/s) | 7598/sec (0.49 Mpix/s) | 5527/sec (0.35 Mpix/s) | 263154/sec (16.84 Mpix/s)
Rect 32x32 Src | 9316/sec (9.54 Mpix/s) | 6701/sec (6.86 Mpix/s) | 10050/sec (10.29 Mpix/s) | 33070/sec (33.86 Mpix/s)
Rect 512x256 Src | 3950/sec (517.77 Mpix/s) | 3862/sec (506.19 Mpix/s) | 6245/sec (818.65 Mpix/s) | 299/sec (39.25 Mpix/s)
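
The Mpix/s figures in parentheses appear to be simply the operation rate multiplied by the rectangle area. A minimal sketch of that arithmetic, using the EXA column from the rect table (the function name `mpix_per_sec` is my own, not something from the benchmark tool):

```python
# Pixel throughput implied by an operation rate and a rectangle size.
# Assumption: the tool reports rate * width * height, in millions of pixels.
def mpix_per_sec(ops_per_sec, width, height):
    """Return millions of pixels per second for width x height rectangles."""
    return ops_per_sec * width * height / 1e6

# EXA column of the rect table: (width, height, ops/sec as printed)
exa_rects = [(8, 8, 10410), (32, 32, 9316), (512, 256, 3950)]

for w, h, rate in exa_rects:
    print(f"{w}x{h}: {mpix_per_sec(rate, w, h):.2f} Mpix/s")
```

The two small sizes reproduce the table exactly (0.67 and 9.54 Mpix/s). The 512x256 case comes out at 517.73 rather than the printed 517.77, presumably because the tool computes from an unrounded rate.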

Then a Copy:

Test | EXA | UXA | XAA | NoAccel
Copy (Render) 32x32 | 8884/sec (9.10 Mpix/s) | 6362/sec (6.51 Mpix/s) | 6320/sec (6.47 Mpix/s) | 7467/sec (7.65 Mpix/s)
Copy (Render) 128x128 | 10909/sec (178.74 Mpix/s) | 10845/sec (177.69 Mpix/s) | 12141/sec (198.92 Mpix/s) | 530/sec (8.69 Mpix/s)

Composite:

Test | EXA | UXA | XAA | NoAccel
Composite (Src 16) 32x32 | 5612/sec (5.75 Mpix/s) | 1254/sec (1.28 Mpix/s) | 5165/sec (5.29 Mpix/s) | 14802/sec (15.16 Mpix/s)
Composite (Src 16) 512x256 | 147/sec (19.32 Mpix/s) | 35/sec (4.70 Mpix/s) | 307/sec (40.30 Mpix/s) | 127/sec (16.76 Mpix/s)
Composite (Src 32) 32x32 | 7087/sec (7.26 Mpix/s) | 6382/sec (6.54 Mpix/s) | 6115/sec (6.26 Mpix/s) | 30312/sec (31.04 Mpix/s)
Composite (Src 32) 512x256 | 1845/sec (241.89 Mpix/s) | 3538/sec (463.78 Mpix/s) | 1731/sec (226.92 Mpix/s) | 286/sec (37.60 Mpix/s)
(With this one it is interesting to see how poorly UXA does at 16-bit depth. Why so? On the other hand, with 24- and 32-bit depths UXA seemed to perform really nicely with the special-sized 512x256 rects.)

and finally text:

Test | EXA | UXA | XAA | NoAccel
Text 8px | 1726/sec (1.49 Mpix/s) | 2295/sec (1.98 Mpix/s) | 2490/sec (2.14 Mpix/s) | 6830/sec (5.87 Mpix/s)
Text 12px | 1658/sec (2.39 Mpix/s) | 2265/sec (3.26 Mpix/s) | 2274/sec (3.28 Mpix/s) | 4550/sec (6.54 Mpix/s)
Text 24px | 2099/sec (9.89 Mpix/s) | 2383/sec (11.24 Mpix/s) | 1978/sec (9.32 Mpix/s) | 1846/sec (8.69 Mpix/s)
(This shows that rendering small areas and text is something where the CPU can still outperform the GPU.)
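
To make the small-versus-large pattern easier to see, here is a rough sketch that divides each accelerated rate by the NoAccel rate for a handful of rows copied from the tables above. The row selection and the helper name `speedups` are my own choices for illustration; a ratio below 1 means the unaccelerated (CPU) path was faster.

```python
# ops/sec copied from the tables above, in the order EXA, UXA, XAA, NoAccel
results = {
    "Rect 8x8": (10410, 7598, 5527, 263154),
    "Rect 512x256": (3950, 3862, 6245, 299),
    "Composite (Src 32) 32x32": (7087, 6382, 6115, 30312),
    "Composite (Src 32) 512x256": (1845, 3538, 1731, 286),
    "Text 12px": (1658, 2265, 2274, 4550),
}
BACKENDS = ("EXA", "UXA", "XAA")

def speedups(row):
    """Ratio of each accelerated backend's rate to the NoAccel rate."""
    *accelerated, noaccel = row
    return {name: rate / noaccel for name, rate in zip(BACKENDS, accelerated)}

for test, row in results.items():
    ratios = ", ".join(f"{n} {r:.2f}x" for n, r in speedups(row).items())
    print(f"{test}: {ratios}")
```

For the 8x8 rect every accelerated backend lands far below 1x (EXA at roughly 0.04x), while for the 512x256 rect EXA is roughly 13x faster than NoAccel, which is the same story the tables tell.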


At the end mx11mark also gives a total score. By these figures EXA and, actually, XAA get the highest scores, both at basically the same ~80 total. XAA gets there mainly thanks to its really high figures in pure rect rendering. UXA and NoAccel are also even with each other at a ~20 total score, NoAccel doing well with small-area renders and UXA with 512x256 rects.


OK, my selection from these was the default one: EXA. It seems that at least with out-of-the-box UXA there is still quite a lot of work to be done, and I would definitely be interested to see how UXA performs on, for example, a tuned Intel 965 graphics device.