Monday 13 December 2010

SceneGraph in toolkits

SceneGraph architectures can be found in quite many of the newly introduced toolkit architectures: Clutter with its latest rework to make its scene graph as retained as possible, Qt with its new SceneGraph architecture, and also Mozilla's Layers architecture takes its ideas from the scene graph model. Earlier this has been the way games render themselves, but now this model is starting to define how toolkits render our 2D content.
The first idea with a scene graph is to take full benefit of GPU rendering. Instead of having the applications do the rendering themselves, in a SceneGraph the compositing is done in retained mode by constructing a tree structure of the scene to be visualized. Still, the scene graph problem setting for toolkits/widgets is quite different compared to games. When we are discussing 2D UIs we do not have complex geometry, fancy viewport transformations or dynamic lighting. Instead we are mainly talking about getting our textures displayed on the screen and about blending them. This difference also puts requirements on the actual traversal algorithm and renderer to ensure that we get everything out of the GPU from the 2D toolkit's point of view. The problem statement is also aiming more at optimizing performance than at managing the complexity of the view.

With the GPU there is one particular thing that it cannot execute efficiently - changing states. What GPUs are designed to do is to load in a shader program and do their stuff. Simple. With the imperative way of doing the rendering, where the view is constructed by the application hierarchically rendering its scene, you are bound to do a lot of state changes. This is because every single application is in charge of rendering itself - widget by widget and line by line, constantly loading content to the GPU. So what we want to do is to feed our GPU as much data as possible and let it do its job, by grouping our geometry, shaders and texture data together.
The most important data (or node) from the toolkit's point of view would be our appearance or geometry node defining our graphics primitives (vertex data and texture info - the data required for rendering your widget). This is data that we want to store not only per frame but for the lifetime of an application, to minimize the state changes.
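
As a rough illustration, a renderer can sort the collected scene graph nodes by shader and texture before issuing the draw calls, so state is switched once per group instead of once per widget. A minimal sketch - the node layout and naming are my own, not from any particular toolkit, and vertex attribute setup is omitted:

#include <algorithm>
#include <vector>
#include <GLES2/gl2.h>

// One renderable item collected from the scene graph traversal.
struct RenderNode {
    GLuint program;   // shader program used by this node
    GLuint texture;   // texture bound when drawing this node
    GLuint vbo;       // vertex buffer holding the node's geometry
    GLsizei count;    // number of vertices to draw
};

// Issue draw calls grouped by (program, texture) to minimize state changes.
void renderBatched(std::vector<RenderNode>& nodes)
{
    std::sort(nodes.begin(), nodes.end(),
              [](const RenderNode& a, const RenderNode& b) {
                  return a.program != b.program ? a.program < b.program
                                                : a.texture < b.texture;
              });

    GLuint currentProgram = 0, currentTexture = 0;
    for (const RenderNode& n : nodes) {
        if (n.program != currentProgram) {        // switch shader only when needed
            glUseProgram(n.program);
            currentProgram = n.program;
        }
        if (n.texture != currentTexture) {        // switch texture only when needed
            glBindTexture(GL_TEXTURE_2D, n.texture);
            currentTexture = n.texture;
        }
        glBindBuffer(GL_ARRAY_BUFFER, n.vbo);     // vertex attribute setup omitted
        glDrawArrays(GL_TRIANGLE_STRIP, 0, n.count);
    }
}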

The second important item is the drawing itself. Although GPUs are designed for drawing, unnecessary drawing is naturally something that we don't want to do. An easy way to improve in this area is to draw less, or draw only what is visible on the screen. This is exactly what our SceneGraph architectures want to do. With its tree structure the SceneGraph model can smartly use the z-buffer to avoid overdraw. By this I don't mean the clipping and depth buffering that the GPUs are able to do at draw time, but rather limiting the objects actually sent to the GPU. This is done by selecting the rendering order smartly (front-to-back when possible) or, for example in Mozilla's Layers case, placing the transparent areas into their own layer. The challenge is of course that with modern UIs we are talking about a lot of transparency, which forces us to do back-to-front rendering.
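
In practice this usually ends up as a two-pass traversal: opaque nodes drawn front-to-back with the depth test rejecting hidden pixels, and translucent nodes drawn back-to-front with blending. A hedged sketch of that idea - node fields and the draw() helper are placeholders of my own:

#include <algorithm>
#include <vector>
#include <GLES2/gl2.h>

struct SceneNode {
    float z;          // depth from the traversal; smaller z assumed closer here
    bool opaque;      // true when the node has no translucency
    // ... geometry, shader and texture handles would live here
};

static void draw(const SceneNode&) { /* bind buffers and issue the GL draw call */ }

void renderScene(std::vector<SceneNode> nodes)
{
    std::vector<SceneNode> opaque, translucent;
    for (const SceneNode& n : nodes)
        (n.opaque ? opaque : translucent).push_back(n);

    // Pass 1: opaque content front-to-back; the depth test kills the overdraw.
    std::sort(opaque.begin(), opaque.end(),
              [](const SceneNode& a, const SceneNode& b) { return a.z < b.z; });
    glEnable(GL_DEPTH_TEST);
    glDisable(GL_BLEND);
    for (const SceneNode& n : opaque) draw(n);

    // Pass 2: translucent content back-to-front so blending stays correct.
    std::sort(translucent.begin(), translucent.end(),
              [](const SceneNode& a, const SceneNode& b) { return a.z > b.z; });
    glDepthMask(GL_FALSE);                         // keep depth from pass 1, don't write it
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);   // premultiplied alpha assumed
    for (const SceneNode& n : translucent) draw(n);
    glDepthMask(GL_TRUE);
}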

There are of course also challenges. Scene graphs are balancing between memory consumption (caching too much) and performance. Another challenge is the different nature of different GPUs and drivers, as they sometimes behave a little differently and do different things well.
With toolkit scene graphs the challenge is also the target: are they only aiming at optimized 2D rendering, or would they also like to serve games.

Sunday 17 October 2010

SW that matters

During the past few weeks/months we have been seeing a lot of news about the future HW of different vendors. Qualcomm has been pushing out news about their 1.5GHz dual-core Snapdragon (ARM Cortex-A8 based, Qualcomm modified), Samsung about their Orion, a 1GHz dual-core ARM Cortex-A9, and Marvell its tri-core Armada 628 (once again ARM based, ARMv7 cores) which can have the main two cores running at up to 1.5GHz and the third at 624MHz. Quite impressive, right?

Let's first think about the figures that the different vendors are providing. What do we do with all of that power? At least from my point of view this is an excellent question.
Take for example the first generation iPhone - an ARM11 downclocked to run at 412MHz, a PowerVR MBX Lite (optimized for low power consumption instead of speed) and 128MB of RAM. Even with these figures on paper, the first iPhone can still beat many of the devices on the market today. And bear in mind that the iPhone 2G was announced already at the beginning of 2007.
Why is that? Well, the SW and especially the framework has been built with respect for the underlying HW. And because Apple was ready to make compromises with the device - no multitasking, which makes things a lot simpler (limited support exists nowadays), a simple set of features which really work, an extremely simple and limited set of widgets without theming, and the 480x320 pixel resolution combined with a color depth of 18 bits (helping a lot with memory bandwidth usage [1]) - focusing on simplicity and usability.

And it's not only the framework that matters on the SW side - badly implemented drivers, or SW which does not make use of the architecture and the features a chip offers (separate cores for video, image processing, blitting, etc.), will affect the overall performance.


The other thing with these nice MHz figures is the battery consumption. I myself would at least like to get my hands on one touch-screen based device which would really last longer than my working day. Of course this also depends on how the power management of the device is done, which once again comes back to well designed and implemented SW...

Instead of getting the MHz up, how about as an ultimate goal getting the CPU to run as slowly as possible and really shifting the operations to the GPU. The GPU can in many cases achieve far better efficiency than the CPU. This would of course still require something from the SW - a framework built in such a way that it can really utilize the GPU, and not just make it do some fancy effects when switching between applications.


[1] 320x480 pixels, 18 bits per pixel, rendered 60 times per second vs. a frame holding 800x480 pixels (as the Nexus has).
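
As a rough back-of-the-envelope calculation (my own numbers, with the same 18-bit depth assumed for both so that only the resolution changes): 320 x 480 x 18 bits is roughly 0.35MB per frame, or about 21MB/s at 60fps. At 800 x 480 the same depth already gives roughly 0.86MB per frame, about 52MB/s - some 2.5 times more data to push just because of the bigger screen, before even touching higher color depths.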

Sunday 12 September 2010

Standards, agreements and fragmentation

The graphics area can be seen through your Open Source "standards" and industry _standards_. Even when discussing only Linux, and not even going into Microsoft land, the field is starting to be quite messy.

The fragmentation on the open source side is in many areas caused by the fact that the communities are competing with each other. They want to ensure that they can bring out as good SW as possible, as soon as possible - naturally. What this causes is that instead of making an effort and developing something together, we fragment. Good examples of this are your TTM vs. GEM, classic Mesa vs. the Mesa Gallium state tracker, or EXA/XAA/UXA.

Then we have our industry standards by Khronos. What Khronos defines is a set of open standards mainly for graphics and media. We have the de facto OpenGL/ES and its counterparts GLX and EGL, and then we have something else. We have things like OpenWF (display and composition), which in the OSS world is known as KMS (with limited composition support), OpenKODE with the OSS counterpart being SDL, and OpenMAX which in the OSS world is tackled with VA-API, XvMC or VDPAU. Well, you get the point.

So why should we care about this? What this all means is that there is a choice to make when developing a Linux based system - following the industry standards or selecting one of the open source paths. Often the proprietary drivers do follow the industry standards, but they have not adopted any of the OSS ways of providing the functionality - and why would they, as the OSS "standards" are often vendor specific. This means that the reasonable way to build our platform would be the industry standards, but then you would lose the possibility to try out the new nice things which are not available on these platforms. A difficult choice, and ideally a choice that we wouldn't need to make.

Sunday 5 September 2010

UI done right

I have always been a fan of what the guys at TAT are doing. These guys have really understood the whole thing of making great looking UIs in a way that respects the underlying HW and takes every last bit out of it.

Their latest creation Velvet is just that - an innovative, simple, great looking UI. Nice job!


PS. It would be nice to understand how these guys work. You have your UI designer, framework developer and lower level graphics guy all working seamlessly together...

Tuesday 3 August 2010

New Gnome-Shell release ... reminds me of something

Gnome-Shell had a new release published in the middle of last month. The new release is now dependent on GTK 3.0 and, at least according to different sources, should be more stable and better performing - ready for a test drive.

So what is this Gnome-Shell?
Well, Gnome-Shell is your integrated window manager, compositor, and application launcher. It uses a Clutter canvas as a scene into which your GtkWindows are reparented as actors. The main logic behind this is in Mutter, a fork of Metacity.

The architecture is done in such a way that the window manager, compositor, and application launcher are tightly coupled together. With this approach there is no need to provide any additional abstraction layers for different WM/CMs, or any scene graph abstraction. With this the GNOME guys also have full control of the stack. Because of the architecture Gnome-Shell has been receiving quite a bit of comments, especially from the Compiz people, as the approach will make other WM/CMs' lives quite difficult.

With this architecture, it is actually funny how much it reminds me of the Hildon architecture in Maemo 5 - compositing with a Clutter canvas as the scene, an application launcher integrated into the Hildon desktop, a window manager integrated into the desktop, etc., and everything nicely coupled together.


As for the user experience,
launching the latest and greatest Gnome-Shell on my Intel 855GM based device was an extremely slow experience. Gnome-Shell was really unusable. A bit of Googling revealed that syncing to VBlank, which is enabled by default, might be the root cause of the problem. "export CLUTTER_VBLANK=none" seemed to be the cure. I wonder why this happens... (a bit of debugging might be in place). This trick at least makes it testable.
The main thing that I noted with Gnome-Shell is that you really want to learn to use the basics - your ALT+F2 key combo. The new Application Area at least did not convince me. Maybe as the usage is going more towards the basics it would be time to put the good old Awesome back in use - going really back to basics...

Thursday 22 July 2010

When acceleration came to the browser

HW acceleration in website rendering has become a hot topic along with HTML5. Mozilla's answer to this demand is their new Layers architecture.
Mozilla's Layers allow the browser to take suitable parts of the rendering to the GPU and, of course, along with that to offer some eye candy to the end users. The idea is to divide the logically different parts of the rendered content into different layers, allowing a better division of work between the CPU and the GPU.

The layers are structured as a layer tree from which the final webpage is constructed. To simplify, there is a container (representing the surface) and its leaves (holding the actual graphical data). The final render is constructed from the textures generated from the leaves and composited to the screen.


The basic layer infra includes ThebesLayers, ContainerLayers and a LayerManager.

The manager is responsible for constructing the layer tree and rendering it to the destination context. The rendering always happens as a result of a transaction on the tree, and only touches the visible region parts of the texture.

The container layer's task is to group together the child layers beneath it. It represents the surface into which its children are composited and handles the Z order of its children.

Thebes in general is nothing new. It has been part of the Mozilla gfx architecture for quite a while already, as a C++ wrapper for the Cairo APIs and as an API for text handling.
The ThebesLayer abstracts a Thebes surface which is used to display items that will be rendered (surprise, surprise) using Thebes. So a Cairo image surface drawn to by Cairo. The question is of course - why not do this directly with Cairo? Why the wrapper? - There are some discussions about this in the comments of Bas's blog.
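
To make the structure concrete, here is a heavily simplified, purely illustrative sketch of the layer tree idea - container layers compositing their children as the result of a transaction - written with my own class and method names, not the actual Gecko interfaces:

#include <memory>
#include <vector>

// A leaf holding rasterized content (in Gecko terms, roughly a ThebesLayer).
struct Layer {
    virtual ~Layer() {}
    virtual void composite() = 0;   // draw this layer into its parent's surface
};

struct ContentLayer : Layer {
    // In the real thing this would wrap a surface/texture with the page content.
    void composite() override { /* blend the cached texture into the parent */ }
};

// Groups children and composites them in Z order (roughly a ContainerLayer).
struct ContainerLayer : Layer {
    std::vector<std::unique_ptr<Layer>> children;   // kept in back-to-front order
    void composite() override {
        for (auto& child : children)
            child->composite();
    }
};

// Owns the tree and renders it as one transaction (roughly the LayerManager).
struct LayerTreeManager {
    std::unique_ptr<ContainerLayer> root;
    void endTransaction() {          // all changes are applied, then drawn once
        if (root)
            root->composite();
    }
};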

There is a bit of an older bug in the Mozilla bugzilla explaining this basic functionality in detail. In principle, what you get with the basic functionality is what you already have - Cairo content shown on the screen. Of course, besides the basic functionality you will also need some additional layers for video, something for WebGL, plus you would also like to have an optimized way of doing the rendering. So it's not all just about the basics.


What from my point of view makes these layers especially interesting are the plans to bring them also to Fennec 2.0, along with other goodies like Electrolysis. It will most definitely be interesting to see how it performs and behaves in a more limited environment.

BTW, the first bits of layers are already available through the latest nightly builds of Firefox 4. The usage of the layers is still limited to Windows/GL and full-screen video only, but the sources are available for seeing and testing.

Monday 5 July 2010

Difficulty of developing Open Source Linux drivers for GPU

During the last few days there has been an active discussion ongoing on the dri-devel list about open source graphics drivers. The discussion was initiated by Qualcomm's RFC message - [RFC] Qualcomm 2D/3D graphics driver.

Qualcomm's Linux team has been working on open sourced kernel drivers for their chipset. What they have done so far is build a DRM driver with a limited set of supported functionality to enable DRI2. Their own userland parts of the drivers still utilize the KGSL (Kernel Graphics Support Layer) interface to allocate memory, making DRM act more as an API layer to be closer to the mainstream Linux graphics stack. Qualcomm does still have plans to move to a standard design in the future - possibly TTM.


On the dri-devel list the opinion about drivers with userland binary blobs was made quite clear - if you aren't going to open the whole driver, don't try to push it to mainline. Basically the door was slammed on Qualcomm's efforts to push the drivers in.

Between the lines I think there were some excellent points made in this discussion.
The bottom line is that the community cannot work with the drivers if there is a dependency on closed source userland - this is definitely clear. With this kind of a setup it is just impossible to work with the drivers. On the other hand, if the community is going to bombard down all the drivers with closed userland parts (even without finding out what the attempts to open source them are), how are we ever going to see open sourced mobile GPU drivers? From my point of view the community also has the means to guide this work in the right direction - by giving comments on the request. It really seems that the Qualcomm guys have the enthusiasm to develop open source drivers. How about making sure that these guys take the correct steps also with the userland, and making sure that the kernel bits are as they should be?

Wednesday 30 June 2010

MeeGo for handsets getting its UI

MeeGo for handsets has taken its next step and is now capable of providing the MeeGo User Experience also for mobile users - see the blog entry.
There are some nice screenshots and a video showing its functionality. Unfortunately the downloadable image is only available for the Moorestown based Aava device. The N900 image still needs to be compiled by hand - instructions from here and repo from here.

Sunday 9 May 2010

GLES1.1/2.0 on Mesa DRI and a bit about Gallium

Mesa has had GLES 1.1/2.0 support already since last year through Gallium state trackers. What this means is that Gallium based drivers are capable of providing GLES support for end users. Now there is also activity ongoing to get the Mesa GLES1/2 support to work with the DRI drivers and, along with that, to expose multiple GL APIs without separate state trackers on the Gallium side.
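
From the application's point of view nothing changes: a GLES 2.0 context is still requested through EGL, and it is up to the Mesa stack underneath to route it to a Gallium state tracker or a DRI driver. A minimal, hedged example of that request - error handling and the native window/surface setup are left out:

#include <EGL/egl.h>

// Request an EGL config and context for OpenGL ES 2.0.
EGLContext createGles2Context(EGLNativeDisplayType nativeDisplay)
{
    EGLDisplay display = eglGetDisplay(nativeDisplay);
    eglInitialize(display, 0, 0);

    const EGLint configAttribs[] = {
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,   // ask for a GLES2-capable config
        EGL_SURFACE_TYPE,    EGL_WINDOW_BIT,
        EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8,
        EGL_NONE
    };
    EGLConfig config;
    EGLint numConfigs = 0;
    eglChooseConfig(display, configAttribs, &config, 1, &numConfigs);

    const EGLint contextAttribs[] = {
        EGL_CONTEXT_CLIENT_VERSION, 2,             // GLES 2.0 context
        EGL_NONE
    };
    return eglCreateContext(display, config, EGL_NO_CONTEXT, contextAttribs);
}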

This work on GLES 1.1/2.0 on DRI, made by Kristian, received good feedback also from Gallium driver developers and has now been merged into Mesa.


The discussion about classic Mesa vs. the Mesa Gallium state tracker in general has not lately been this productive (see the Mesa thread). The Intel guys clearly want to stay with the classic drivers, and who could actually blame them after all the work that Intel has put into getting their drivers to perform. Currently the Nouveau camp is the only one that is really utilising Gallium3D, and the RedHat guys are going to ship the latest free software NVIDIA bits with the upcoming Fedora 13.

Wednesday 7 April 2010

Multitouch support for X

This is not directly graphics related, but about an interesting enabler for modern UIs anyway...

XInput2/MPX was introduced already in XServer 1.7 and the Linux kernel has had multitouch support since 2.6.30. So we should have case closed and a working multitouch solution for the Linux environment. Right?
Well, what we currently have are a couple of reference implementations done to show off the Linux multitouch capabilities, but the full pipeline coming directly from upstream is still missing. When we are discussing X we should be capable of supporting all the different use cases, from simple gestures like pinch all the way to cases where we have multiple users interacting simultaneously with a big multitouch device. This makes things complicated.
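
The kernel side of that pipeline is the evdev multitouch protocol: touch contacts are reported as groups of ABS_MT_* events separated by SYN_MT_REPORT, with SYN_REPORT closing the whole frame. A rough sketch of reading those events - the device path and the amount of error handling are of course my own simplifications:

#include <fcntl.h>
#include <unistd.h>
#include <linux/input.h>
#include <cstdio>

int main()
{
    // Hypothetical device node - the right event device varies per system.
    int fd = open("/dev/input/event2", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct input_event ev;
    int x = 0, y = 0;
    while (read(fd, &ev, sizeof(ev)) == sizeof(ev)) {
        if (ev.type == EV_ABS && ev.code == ABS_MT_POSITION_X)
            x = ev.value;                        // contact X coordinate
        else if (ev.type == EV_ABS && ev.code == ABS_MT_POSITION_Y)
            y = ev.value;                        // contact Y coordinate
        else if (ev.type == EV_SYN && ev.code == SYN_MT_REPORT)
            printf("contact at %d,%d\n", x, y);  // one contact complete
        else if (ev.type == EV_SYN && ev.code == SYN_REPORT)
            printf("-- end of touch frame --\n");
    }
    close(fd);
    return 0;
}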

Of the current reference implementations, the people at ENAC have been working quite actively on multitouch solutions - directly on top of Linux, plus they have also patched the existing evdev driver to propagate multiple pointers through MPX. Another reference implementation of a multitouch X driver is done by Henrik Rydberg - found here. Still, neither of these solutions has made it to the official XServer.

Now there has been a discussion ongoing on the xorg-devel mailing list about what would be a clear first step to include support for multitouch features in X. This first step would be limited to a single focus / input point with auxiliary information attached to the event. Most importantly, X would propagate these events to the right clients.
The solution would be mainly for indirect touch devices (like touchpads), but would also cover single-application input cases. From the mobile device point of view, where we would have a simple window manager with the topmost application fullscreen, this solution would already cover all your favourite two finger pinch gestures - making it basically a full fledged solution for these kinds of devices.

Thursday 11 March 2010

DRI2 driver for Mesa EGL

A couple of weeks ago there was an announcement that Wayland would be moving to Mesa EGL and dropping its dependency on its own Wayland specific Eagle EGL stack. Eagle is a non-conformant EGL loader for the DRI drivers, originally developed by Kristian Høgsberg as a "side product" of his DRI2 work. Eagle, being lightweight and integrating EGL directly with KMS/DRM, was the solution for Wayland - at least until recently.

The major reason allowing Wayland to jump to Mesa EGL is Kristian's work behind the DRI2 driver for EGL and on getting the Eagle EGL parts officially into Mesa. This work - EGL on DRI2 - is first aiming to implement the support through X, but the longer term goal seems to be a standalone implementation - working directly with KMS/DRM and making it suitable for Wayland's purposes.

The current plan is that the DRI2 parts for Mesa EGL will be integrated into the upcoming Mesa 7.8 - planned to be released during this month. This is good news especially for Wayland. Along with the Mesa integration the Wayland enablers are now even more mainlined. This move will most definitely make Wayland itself more attractive to the outside world.

Monday 15 February 2010

Challenges in 2D acceleration - how about the future?

Achieving great 2D graphics performance from the Linux graphics stack has most definitely been a challenge for years: trying to get the XRender extension more usable with XAA/EXA/UXA (see my previous blog post), utilising all-software image backends, utilising GL directly to accelerate 2D, and so on. The best solution has really been the CPU rasterization backend, based on the fact that the CPU simply runs at a higher clock speed and quite many of these 2D operations require raw processing power.
So are we stuck just continuing the everlasting optimisation of the existing solutions? Maybe not.

Now it seems that there might be some light at the end of the tunnel. Cairo has been getting some quite impressive performance figures with its cairo-drm backend. Instead of using the XServer to do the rendering, or building an abstraction of XRender with a GL backend (what glitz was doing), it has the capability to use the GEM buffers directly and issue the rendering commands directly to the HW. This allows direct access to the GPU and gives quite impressive performance figures.
The thing is still that cairo-drm is an experimental backend and that it is limited to Intel HW (i915 to be specific).

Given the performance figures - what cairo-drm can really do - the goal for the guys is now to get this same performance out of the GL interface directly. This would first of all remove the limitations mentioned above and second, it would also allow all the rendering to happen under one driver - OpenGL. This of course would mean that there would be only one graphics stack to tune and maintain, and it would allow the CPU to do other stuff. The cairo-gl project aims to do exactly this.
The cairo-gl backend would also make the solution usable with HW other than Intel's. The question of course is how well the solution performs on non-Intel HW.
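
The nice thing for applications is that the Cairo drawing code itself does not change between these backends - only the surface creation does. A small sketch against the plain image (CPU rasterization) backend; with cairo-drm or the experimental cairo-gl backend only the surface setup at the top would be swapped out:

#include <cairo.h>

int main()
{
    // CPU-rasterized image surface; a cairo-gl/cairo-drm surface would replace this.
    cairo_surface_t* surface =
        cairo_image_surface_create(CAIRO_FORMAT_ARGB32, 256, 256);
    cairo_t* cr = cairo_create(surface);

    // The actual drawing code stays backend independent.
    cairo_set_source_rgba(cr, 0.2, 0.4, 0.8, 1.0);
    cairo_rectangle(cr, 32, 32, 192, 192);
    cairo_fill(cr);

    cairo_surface_write_to_png(surface, "out.png");
    cairo_destroy(cr);
    cairo_surface_destroy(surface);
    return 0;
}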


Time will show how the cairo-gl backend performs and whether it is capable of beating the CPU rasterization backends.
More info about the progress from:

1. http://anholt.livejournal.com/
2. http://ickle.wordpress.com/

Saturday 23 January 2010

N900/Maemo5 and different texture formats

Garage.maemo.org is providing a nice tool for benchmarking the different texture formats that the Maemo 5, SGX based device supports. The tool, GLMemPerf, is meant for measuring the texture memory bandwidth performance of an OpenGL ES 2.0 implementation with a set of different kinds of precompressed textures. As output it blits the different textures from its /data folder to the screen and provides a nice set of information on how the different formats and texture processing methods affect the overall performance.


From the results, the compressed texture formats (IMG specific PVRTC and ETC1) clearly give a performance benefit - as one could expect. The IMG PVR Texture Compression format has 2-bits-per-pixel and 4-bits-per-pixel options, encoding the source pixels into 2 or 4 bits per texel. This means that each texel is encoded into fewer bits than in the equivalent uncompressed texture, which of course means less load on the memory bandwidth.
The downside with compression is that the overall quality of the render will suffer. The quality hit still depends on the compressed content, so it might be worth checking how 2 vs. 4 bits per pixel affects it.
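
In GLES 2.0 such precompressed data is handed to the driver with glCompressedTexImage2D; the format tokens come from the GL_IMG_texture_compression_pvrtc and GL_OES_compressed_ETC1_RGB8_texture extensions. A hedged sketch - the data pointer and sizes are placeholders, a real loader would read them from the PVR/ETC file header:

#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

// Upload one mip level of precompressed PVRTC 4bpp data to a texture.
GLuint uploadPvrtc4bppTexture(const void* data, GLsizei width, GLsizei height,
                              GLsizei dataSize)
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    // PVRTC requires power-of-two dimensions; the data stays compressed in video memory.
    glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                           GL_COMPRESSED_RGB_PVRTC_4BPPV1_IMG,
                           width, height, 0, dataSize, data);

    // For ETC1 the call is the same with GL_ETC1_RGB8_OES as the format.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}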

The other thing to note is how the textures are stored and accessed in memory.
SGX, like many other GPUs, benefits greatly from the use of power-of-two sized textures - which improves the implementation of texture mapping when converting texture coordinates to texel coordinates. It might be worth rounding the textures up to the nearest power of two to get the benefit (a small helper for that below). Note also that the PVRTC formats require the textures to be power-of-two sized.
The other thing related to memory access is texture twiddling. Twiddling is by default done for all textures uploaded to the SGX. What the twiddling does is re-arrange the texture samples to optimize the memory access for different operations. One thing to note is that with pixmaps you will not get this benefit, as in the case of pixmaps you don't upload the textures to video memory but instead source the pixels directly.
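
The rounding-up mentioned above is a one-liner worth having around when preparing textures (my own helper, nothing SGX specific):

#include <cstdio>

// Round v up to the next power of two (valid for 32-bit values, v > 0).
unsigned nextPowerOfTwo(unsigned v)
{
    v--;
    v |= v >> 1;  v |= v >> 2;  v |= v >> 4;
    v |= v >> 8;  v |= v >> 16;
    return v + 1;
}

int main()
{
    // e.g. a 480x300 source image would go into a 512x512 texture.
    printf("%u x %u\n", nextPowerOfTwo(480), nextPowerOfTwo(300));
    return 0;
}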


How about generating and processing your textures? For this IMGTec provides a nice tool called PVRTexTool. The tool has a version for command line usage and also a GUI based tool for seeing in more detail the output of your changes - for example when wanting to do a comparison between the different compression methods. PVRTexTool supports your normal RGB/ARGB formats plus the texture compression formats PVRTC and ETC.