Saturday, March 21, 2009

AA1: let's optimize!

I didn't expect to wait three months before posting something new here.

I've been so busy at work that I had no spare time for programming at home.
In February I had to set up a simple web server and decided to host it on my AA1. I installed and configured MySQL, Apache and PHP. It was weird, to say the least, to see that little baby running a website.

Recently I had the chance to relax for a couple of days, so I decided to remove the web server from my AA1 and see if I could fix my octree (see my previous post).

To sum up, here's where things stood: 5 fps without octree acceleration, 29 fps with it enabled (but with corrupted rendering, including missing parts of the scene).

I started by looking at the mesh corruption and discovered that some tests were missing from the octree generation code. After fixing them, the reference model (the Sponza Atrium) ran at 15-16 fps, regardless of the resolution. This makes sense, since the previous frame rate referred to a corrupted mesh (some parts were not drawn).

I tried different octree configurations, adjusting the minimum node size and the maximum number of primitives per node. As expected, when the entire model is in view the speed drops to 10 fps, rising to 40-50 fps when looking at a boundary.
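The two parameters above act as stopping criteria while the tree is built. What follows is a minimal C++ sketch of such a build, not the actual code from my framework: the types `Vec3`, `Aabb`, `OctreeNode` and the function `build` are hypothetical, nodes are assumed to be cubes, and each triangle is represented only by its bounding box and stored in every leaf that box overlaps (a conservative test; real generation code may use exact triangle/box tests).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Vec3 { float x, y, z; };

// Axis-aligned bounding box (hypothetical helper type).
struct Aabb {
    Vec3 min, max;
    bool overlaps(const Aabb& o) const {
        return min.x <= o.max.x && max.x >= o.min.x &&
               min.y <= o.max.y && max.y >= o.min.y &&
               min.z <= o.max.z && max.z >= o.min.z;
    }
};

struct OctreeNode {
    Aabb bounds;
    std::vector<std::size_t> triangles;  // triangle indices, filled only in leaves
    std::array<std::unique_ptr<OctreeNode>, 8> children;
    bool isLeaf() const { return !children[0]; }
};

// Subdivides until a node holds at most maxPrims triangles
// or its edge length has shrunk to minSize.
std::unique_ptr<OctreeNode> build(const Aabb& bounds,
                                  const std::vector<Aabb>& triBoxes,
                                  const std::vector<std::size_t>& candidates,
                                  std::size_t maxPrims, float minSize) {
    assert(minSize > 0.0f);  // otherwise recursion might never stop
    auto node = std::make_unique<OctreeNode>();
    node->bounds = bounds;
    // Keep only the parent's triangles whose bounding box touches this node.
    for (std::size_t t : candidates)
        if (triBoxes[t].overlaps(bounds)) node->triangles.push_back(t);

    const float size = bounds.max.x - bounds.min.x;  // cubic nodes assumed
    if (node->triangles.size() <= maxPrims || size <= minSize)
        return node;  // leaf

    const Vec3 c{(bounds.min.x + bounds.max.x) * 0.5f,
                 (bounds.min.y + bounds.max.y) * 0.5f,
                 (bounds.min.z + bounds.max.z) * 0.5f};
    for (int i = 0; i < 8; ++i) {
        // Bits 0, 1, 2 of i select the upper half along x, y, z.
        Aabb child;
        child.min = {(i & 1) ? c.x : bounds.min.x, (i & 2) ? c.y : bounds.min.y,
                     (i & 4) ? c.z : bounds.min.z};
        child.max = {(i & 1) ? bounds.max.x : c.x, (i & 2) ? bounds.max.y : c.y,
                     (i & 4) ? bounds.max.z : c.z};
        node->children[i] = build(child, triBoxes, node->triangles, maxPrims, minSize);
    }
    node->triangles.clear();  // interior nodes keep no triangles
    return node;
}
```

Because a triangle straddling a split lands in more than one leaf, the renderer has to avoid drawing it twice (for example with a per-frame "already drawn" flag). A smaller minimum node size and a lower primitive limit give tighter culling but more nodes, and therefore more draw calls.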

I was not yet satisfied with the resulting performance, so I dug into my code to look for room for improvement. I noticed that the code building the primitives didn't take MinIndex and NumVertices into account.
In general it's not a problem to set those values to 0 and NumberOfVerticesInVB (the total vertex count of the vertex buffer), as modern GPUs are quite tolerant and these parameters are a legacy of an older era.

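On the Intel 945GM, however, vertex shaders aren't implemented in hardware but emulated on the CPU. Considering that the AA1 is powered by a tiny 1.6 GHz Atom, it's easy to imagine how such simple details can make a huge difference: the vertex range declared in each draw call tells the emulated pipeline which vertices to process, so an overly wide range wastes CPU time on vertices the call never uses.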
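Since the declared range drives how many vertices the CPU processes per call, it pays to compute the tightest range for each batch (for instance each octree leaf) once, at build time. Here is a minimal sketch, assuming a 16-bit index buffer kept in a `std::vector`; `VertexRange` and `computeVertexRange` are hypothetical names, and the result is meant for the MinIndex and NumVertices arguments of `IDirect3DDevice9::DrawIndexedPrimitive`, with the index values taken as stored in the buffer (i.e. relative to BaseVertexIndex).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Tightest vertex range referenced by a slice of an index buffer
// (hypothetical helper type).
struct VertexRange {
    std::uint32_t minIndex;     // value for MinIndex
    std::uint32_t numVertices;  // value for NumVertices
};

VertexRange computeVertexRange(const std::vector<std::uint16_t>& indices,
                               std::size_t startIndex, std::size_t indexCount) {
    assert(indexCount > 0 && startIndex + indexCount <= indices.size());
    const auto first = indices.begin() + static_cast<std::ptrdiff_t>(startIndex);
    const auto last = first + static_cast<std::ptrdiff_t>(indexCount);
    // Smallest and largest vertex index used by this batch.
    const auto mm = std::minmax_element(first, last);
    return {*mm.first, static_cast<std::uint32_t>(*mm.second - *mm.first + 1)};
}
```

For a batch using indices 100 to 103 this yields MinIndex = 100 and NumVertices = 4, instead of 0 and the size of the whole vertex buffer.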

After some tests and tweaks, the final frame rates for the octree-accelerated Sponza Atrium are:
- minimum 36 fps
- average 45 fps
- max 65 fps

Not bad for a 65-70k poly scene.

I'm looking for new ways to use my AA1. It's great for email, surfing the web, writing documents, MSN and everything you usually do with a netbook, but I'd like to do some (graphics) programming on it. Unfortunately I can't use the framework/engine, as compiling a simple application based on it (in both release and debug) produces more than 2 GB of intermediate data. It takes almost half an hour to compile on a pretty decent dual-core desktop machine; I don't want to know how long it would take on the little baby.

That's why I brought back from the grave (aka a 250 GB 2.5" USB HDD) a small framework for 4 KB intros. After recompiling it with VS2005 and tweaking the linker settings, I managed to shave off an extra 23 bytes. Oh... and the thing runs on my Vista-based notebook. Wow.

I'd like to install a version of Visual Studio and the latest DirectX SDK, but I have some doubts. I'd prefer VS2005, but VS2005 Express doesn't include the Windows SDK, and installing the entire Windows SDK on such a small SSD (or SD card) doesn't sound like a good idea. The same applies to the DirectX SDK: out of 900 MB I barely need 100 MB.

I'll probably save an image of the SSD so that I can easily revert to my current configuration, then install VS2008 and try to configure the DirectX SDK manually, copying only the files I need and pointing the VS IDE at the correct directories. I wonder how the VS2008 compiler will perform compared to the one I'm currently using. My worst nightmare is that the /QIfist option, deprecated in VS2005, might have been removed from VS2008. I hope that's not the case.
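For context: /QIfist tells the compiler to skip the `_ftol` helper when converting floating-point values to integers and to use the FPU's integer store directly. That is faster, but the result then follows the FPU's current rounding mode (round-to-nearest by default) instead of C's truncation toward zero. The sketch below only illustrates that semantic difference; the function names are hypothetical, and `std::lrint` is the C++11 `<cmath>` function, used here as a portable stand-in for "convert with the current rounding mode", not something VS2005/VS2008 necessarily provide.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// What a plain cast (and therefore _ftol) does: truncate toward zero.
inline long truncateToInt(float f) { return static_cast<long>(f); }

// What a conversion under the current rounding mode does, i.e. the behavior
// you get when /QIfist skips _ftol and the FPU is left in its default mode.
inline long roundWithCurrentMode(float f) {
    assert(std::fegetround() == FE_TONEAREST);  // default mode assumed
    return std::lrint(f);  // round to nearest, ties to even
}
```

Code built with /QIfist therefore has to tolerate `2.7f` becoming 3 rather than 2, or set the rounding mode to truncation itself.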

Feel free to drop a comment if you have any info about VS2008 code generation.