Sunday, 5 July 2009

A small update: work and framework

I've spent the last two months working on a small framework at home, while at work I've entirely rewritten the scene import system.

The old import system was implemented from scratch without any kind of design, and it got bigger and bigger as new features were added. In the end I found myself with a couple of huge classes, 400+ KB of code and memory leaks all around. WOW.
Every time I had to modify something I felt disgusted opening those dirty files. Cleanly extending the import system was nearly impossible; you could only add more crap to it. It was supposed to be an emergency fix, but it stayed there for far too long. IMHO there were a couple of good ideas in it, but it was impossible to take advantage of them because everything implemented around them was a huge mess. BTW, the new import system is up and running and performs well: it's faster, cleaner, more powerful, extensible and free of memory leaks.

As for the framework, it works but some parts are still missing. I'm planning to start updating this blog more often and to write a series of posts about the making of the framework.

Looking back, the last two months weren't a great time: it's awful to write 900 KB of code without seeing anything exciting on the LCD panel.

I hope the fun part is going to start asap.

Friday, 8 May 2009

AA1: configuration done. Time to write a small framework.

I managed to properly configure the AA1 for development a couple of days after my previous post.

After downloading and burning VS2008 express SP1 to a DVD (it's only 750+ MBs, what a waste of space!), I made an image of the internal 8GB drive to revert to the previous configuration in case something went wrong.

I wasn't able to install VS2008 to my second drive (16GB SDHC), since the operation isn't supported on removable drives. It seems there's a way to force the additional storage to be seen as a permanent disk, but this applies to the EEE and needs a device driver change. I went for drive C.

As for DirectX, I picked the August 2008 SDK (the one I use on my notebook) and installed it with the minimum set of components.

VS2008 works reasonably well. As expected, compilation times aren't that great (I suppose the SD is the bottleneck). It starts quite fast but takes some time to close.

I suggest disabling IntelliSense; there are a couple of ways to do it.
Rename the DLL located at:
\VC\vcpackages\feacp.dll
or disable it via macros.

The little baby still works and I can compile/run a simple piece of network code.

I've been busy at work, but last weekend I took the code for 4kb intros and tried to optimize it further. The exe is now 27 bytes smaller and runs everywhere, while the older one had a lot of compatibility issues.

Speaking of work, hopefully I will be able to release new screenshots later this month. This should include an IOTD submission on gamedev.

I need to write a small framework for testing stuff, and I'd like it to work on the AA1.
I'm going to use it for a tiny project which should come with documentation about design choices and implementation details. I could use this blog to comment on my work daily, so when it's done I'll just have to copy-n-paste my blog entries.

I hope I'm going to update this blog more often. :)

Saturday, 21 March 2009

AA1: let's optimize!

I didn't expect to wait three months before posting something new here.

I've been so busy at work that I had no spare time to spend on programming at home.
In February I had to set up a simple web server and decided to do it on my AA1. I installed and configured MySQL, Apache and PHP. It was weird, to say the least, to see that little baby run a website.

Recently I had the chance to relax for a couple of days, so I decided to remove the web server from my AA1 and see if I could fix my octree (see my previous post).

To sum up, here's the situation: 5fps without octree acceleration, 29 fps when enabling it (but with corrupted rendering, including missing parts of the scene).

I started by looking at the mesh corruption, and I discovered some tests were missing while generating the octree. After fixing that, the reference model (Sponza Atrium) ran at 15-16 fps, regardless of the resolution. That makes sense, as the previous framerate referred to a corrupted mesh (some parts were not drawn).

I tried different octree configurations (adjusting the minimum node size and the maximum number of primitives per node). As expected, with the entire model in view the speed drops to 10 fps, going up to 40-50 fps when looking at a boundary.
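To make the two tuning knobs concrete, here is a minimal sketch of an octree builder driven by a minimum node size and a maximum primitive count per node. It is only an illustration, not the code from my engine: the names (Node, build, leaves, max_prims, min_half) are made up for this example, and each primitive is assigned to a single child by its centroid, which is a simplification (a real builder also has to deal with triangles straddling several nodes).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    center: tuple                                   # cube center (x, y, z)
    half: float                                     # half of the cube edge length
    prims: list = field(default_factory=list)       # primitive ids (leaves only)
    children: list = field(default_factory=list)    # empty for a leaf


def build(centroids, ids, center, half, max_prims, min_half):
    """Recursively split a cube until it is small enough or nearly empty."""
    node = Node(center, half)
    # The two tuning knobs: stop when the node holds few primitives
    # or when it has reached the minimum allowed size.
    if len(ids) <= max_prims or half <= min_half:
        node.prims = list(ids)
        return node
    buckets = [[] for _ in range(8)]
    for i in ids:
        x, y, z = centroids[i]
        # Bit 0 = +x half, bit 1 = +y half, bit 2 = +z half.
        octant = int(x >= center[0]) | (int(y >= center[1]) << 1) | (int(z >= center[2]) << 2)
        buckets[octant].append(i)
    q = half / 2.0
    for octant, bucket in enumerate(buckets):
        if not bucket:
            continue  # empty children are not created
        child_center = (
            center[0] + (q if octant & 1 else -q),
            center[1] + (q if octant & 2 else -q),
            center[2] + (q if octant & 4 else -q),
        )
        node.children.append(build(centroids, bucket, child_center, q, max_prims, min_half))
    return node


def leaves(node):
    """Return all leaf nodes below (and including) the given node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

A lower max_prims or min_half gives more, smaller leaves (finer culling, more draw calls), while higher values give fewer, bigger batches. Which one wins depends on the hardware, which is why I had to try several configurations.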

I was not yet satisfied with the resulting performance, so I decided to dig into my code to check if there was some room for improvement. I noticed the code building primitives didn't take MinIndex and NumVertices into account.
In general it's not a problem to set those values to 0 and NumberOfVerticesInVB, as modern GPUs are quite tolerant and these parameters date back to an earlier era.

In the case of the Intel 945GM, vertex shaders aren't implemented in HW but are emulated by the CPU. Considering the AA1 is powered by a tiny 1.6 GHz Atom, it's easy to imagine how some simple details can make a huge difference.
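As an illustration of what "taking MinIndex and NumVertices into account" means, here is a small sketch that computes the tight values for one batch of an index buffer. The function name vertex_range and the plain Python list standing in for the index buffer are assumptions made for this example; it also assumes a triangle list (three indices per primitive).

```python
def vertex_range(indices, start_index, primitive_count):
    """Return (min_index, num_vertices) for a triangle-list batch.

    min_index is the lowest vertex index referenced by the batch and
    num_vertices is the size of the vertex span it touches, i.e. the
    values to pass instead of 0 and the whole vertex buffer size.
    """
    used = indices[start_index:start_index + primitive_count * 3]
    lo, hi = min(used), max(used)
    return lo, hi - lo + 1


# Example: two triangles using vertices 10..13, then one using 0..2.
indices = [10, 11, 12, 12, 11, 13, 0, 1, 2]
print(vertex_range(indices, 0, 2))
print(vertex_range(indices, 6, 1))
```

With software vertex processing the whole declared span presumably gets transformed on the CPU, so declaring 4 vertices instead of the entire vertex buffer for every octree node is exactly the kind of simple detail that adds up.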

After some tests and tweaks, the final fps count for the octree-accelerated Sponza Atrium is now:
- minimum 36 fps
- average 45 fps
- max 65 fps

Not bad for a 65-70k poly scene.

I'm looking for new ways to use my AA1. It's cool to use email, surf the web, write documents, use MSN and everything you usually do with a netbook, but I'd like to do some (graphics) programming on it. Unfortunately I can't use the framework/engine, as compiling a simple application based on it (in release and debug) produces more than 2 GB of intermediate data. It takes almost half an hour to compile on a pretty decent dual-core desktop machine; I don't want to know how long it would take on the little baby.

That's why I brought back from the grave (aka a 250GB 2.5" USB HDD) a small framework for 4kb intros. After recompiling it in VS2005 and tweaking the link step I managed to squeeze out an extra 23 bytes. Oh... and the thing runs on my Vista-based notebook. Wow.

I'd like to install a VS and the latest DirectX SDK, but I have some doubts. I'd like to go with VS2005, but VS2005 Express doesn't include the Windows SDK and it doesn't sound like a good idea to install the entire Windows SDK on such a small SSD drive (or SD card). The same applies to DirectX: out of 900 MB I barely need 100 MB.

Probably I'll save an image of the SSD, so that I'll be able to easily revert to my current configuration, then I'll install VS2008 and try to manually configure the DX SDK by copying only the files I need and forcing the VS IDE to point at the correct directories. I wonder how the VS2008 compiler is going to perform compared to the one I'm currently using. My worst nightmare is that the /QIfist option, deprecated in VS2005, may have been removed from VS2008. I hope that's not the case.

Feel free to drop a comment if you have any info about VS2008 code generation.

Friday, 2 January 2009

AA1: rocks! 2009: Rocks!

I haven't updated this blog since September; I hope I'll write more posts this year.
Actually I have 5-6 posts in a "draft" state, as they need to be polished before appearing here.

I recently got a white Acer Aspire One. This little baby has already undergone a series of heavy modifications:
- RAM increased to 1.5 GB
- BIOS updated to rev 3309
- extra 16 GB SD
- internal SD reformatted with 32 KB clusters
- two installations of Windows XP, the one currently running being an nLited XP SP3
- created a 256 MB ramdisk to store temporary files
- removed virtual memory/pagefile.sys file

Boot time is approx 35-40 secs while shutdown takes 40-45.

I'm happy, as I bought it at a supermarket for a good price: 199€. The only problem with my configuration is the BIOS update. It seems some LCD panels have problems when brightness is set to minimum. Mine was fine, but Acer increased the minimum brightness in BIOS rev 3309, which results in shorter battery life. I knew about this BIOS problem, but I was forced to upgrade because the original version wasn't able to properly detect/use the additional 1 GB of 667 MHz DDR2 RAM. I know the baby uses 533 MHz DDR2, but the only memory available (and cheap) was clocked at 667 MHz.

Hope Acer will fix this problem ASAP.

I've been able to launch a couple of simple apps I wrote. They run, but some of them are vertex shader limited. The problem is that the Intel 945 doesn't support hardware vertex shaders; they are emulated in software. It's weird to see an application run at the same speed at 320x200 and 1024x600.

I tried my sound streaming code, but the external SD card is slow. Of course, when using the ramdisk everything is fine. The problem isn't the sound itself, as I can stream it from an SD card without glitches; it's that the audio decoding thread takes too much time, which slows down the rendering thread. Sounds like a good test.

Rendering seems fine until I generate a huge octree (65-70k polys). In that case I get visual artifacts and the app slows down to 4-5 fps. If I render stuff as single meshes, I get 29 fps. When "octreeing" a 5k poly scene I get up to 250 fps.
Skinning also works seamlessly, including animation mixing.

I've spent the first day of 2009 sleeping. Kind of. The problem is I came back home at 11:30 AM, far too tired to work. :D

Happy new year!

Saturday, 20 September 2008

SSGI. Version 2.0

I've had some spare time to waste on my SSGI implementation.

I had two VS projects, one dating back to January (whose renders are available in my previous post about SSGI) and the other modified in March.

I don't remember exactly which modifications I made, but the second one looks different from the first, although it suffers from the same problems.

In my previous post I wrote about how impossible it was to fine-tune SSGI. Let's look again at one of the shots.

As you can see, there are many problems:



1- "E" gets blurred. Since I gather samples around a pixel and every surrounding pixel emits light, the resulting image looks "haloed".
2- There's fake lighting. By fake lighting I mean the shape gets too much light. I'd like SSGI not to start a lighting war against a standard lighting model. SSGI should add a modest contribution; it's not supposed to create fake lights.
3- You can clearly see a halo the size of my filter kernel. This is awful and gets worse when you move the camera. It's due to the SSAO-ish nature of the algorithm, but there are some tricks to reduce the effect.
4- This one is impossible to see, as it's related to the way I'm combining the diffuse and SSGI buffers. Since the contribution is too heavy, I was forced to scale the SSGI buffer AND blend it with the diffuse buffer. I'd like to be able to simply add the SSGI buffer.

Since the algorithm suffers from the aforementioned problems, the results are:

1- it's impossible to clearly see the fine details of a texture. That blurred look could be OK for a dream-like scene, but it's not going to help you render realistic scenes.
2- coherence with local lighting is lost. Lights supposed to gently illuminate geometry produce overly bright areas. It's going to be a nightmare to tune.
3- the effect is quite awful when moving the camera. Do I need to say more?
4- the artist isn't able to control the overall look of the scene.
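
To illustrate point 4, here is a tiny per-pixel sketch of the two ways of combining the buffers. It is only one possible reading of "scale AND blend": the function names, the lerp formula and the scale/blend parameters are assumptions made for this example, not the actual shader code. Colors are RGB tuples in the 0..1 range.

```python
def combine_scale_and_blend(diffuse, ssgi, scale, blend):
    """Scale the SSGI color, then lerp between diffuse and the scaled SSGI.

    Two knobs (scale, blend) that interact with each other, which is part
    of why this kind of combine is hard to tune.
    """
    return tuple(d + (s * scale - d) * blend for d, s in zip(diffuse, ssgi))


def combine_add(diffuse, ssgi):
    """The simple combine: add the SSGI contribution and clamp to 1."""
    return tuple(min(1.0, d + s) for d, s in zip(diffuse, ssgi))


diffuse = (0.5, 0.5, 0.5)
ssgi = (0.25, 0.125, 0.75)
print(combine_scale_and_blend(diffuse, ssgi, 0.5, 0.5))
print(combine_add(diffuse, ssgi))
```

A plain add only behaves well when the SSGI buffer already holds a modest contribution, which is exactly what the old version could not guarantee.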

I took the modded version and looked for possible solutions. After some work and tests, I came up with a version that I think is better than the previous one. It still suffers from some problems, such as haloing, but I have some ideas to improve it further.

My goal was to create a "gentle" SSGI shader adding subtle details to the scene.


No SSGI


SSGI

It's hardly noticeable, but things get better when adding simple dot(n,l) lighting to the reference image. Here's a closer shot.


No SSGI


SSGI

The "cool" look of the old shots, a-la photon mapping, is still here, but is noticeable mainly when looking at small, flat details:


No SSGI


SSGI

I also ran a simple test on "Sponza Atrium". SSGI haloing is still here and it's a bit too bright but, as I said, it's now easy to tweak.


No SSGI


SSGI

I'm planning to improve the algorithm; in particular, I'd like to remove the halos and integrate it into a full-featured render system.

Tuesday, 16 September 2008

(Almost) Back from holidays! Ray Marching inside.

My holidays are almost gone. Next Monday I'm expected to be in front of my PC working on some things we didn't complete last month.

I know I'm going to have a headache soon, since ATM we have two branches of the engine. The first branch is the "good" one and dates back to mid-July; the other one has been our testing lab for an entire month.
Of course it's buggy, but it looks very promising. We're going to fix the second branch and then we'll merge the two. I can't wait to see the new stuff running on the clean version of the engine.

I hope someday I'll be able to post a couple of pics and comment them.

Now, let's move to something better: programming stuff.

Unsurprisingly, as this is a recurring topic in CG, it seems ray marching/ray tracing is going to be the "next big thing". Again.

Maybe you'll want to check this link and have a look at OMPF forums.

I'm not going to share my opinion about the role rays will play in next-gen engines, as "it's hard to make predictions - especially about the future", although I have to admit I've always been fascinated by "alternative" (non polygon based) rendering algorithms.

To put it simply, a couple of months ago I decided to write a (very) simple ray marcher in my spare time, just to start playing with rays and GPUs.

Today I've added lighting and I'd like to share some pics and links.

First of all, if you're interested in ray marching or distance field rendering you should jump to IQ's website. There's a lot of great stuff there, and he also has a developer journal on gamedev.net which definitely deserves (more than just) a look.

I started playing with a heightmap renderer, just to test the basic ray marching machinery.

The idea is quite simple: draw a full-screen quad, fire a ray for each pixel on screen and step along it until it collides with the underlying "geometry", which in the case of a heightmap is a 3D point made up of the texel coordinates and the color of the texel itself.

I've decided to go for a true 3D raymarcher, which means I'm able to freely move the camera along any axis. After setting up the rays and displaying stuff around, I noticed my raymarcher "featured" awful visual artifacts. I borrowed a binary search step from my parallax mapping implementation and it solved them. I wanted to render a heightmap a-la "Comanche: Maximum Overkill". Here's a shot of the first version:


Vintage power!
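
For reference, the marching loop plus the binary search refinement can be sketched on the CPU like this. It is a simplified illustration rather than the actual shader: the names (height_at, march), the fixed step size and the nearest-texel lookup are assumptions made for this example, and the heightmap is a plain list of rows with y as the up axis.

```python
def height_at(heightmap, x, z):
    """Nearest-texel height lookup, clamped to the map borders."""
    rows, cols = len(heightmap), len(heightmap[0])
    ix = min(max(int(x), 0), cols - 1)
    iz = min(max(int(z), 0), rows - 1)
    return heightmap[iz][ix]


def point_on_ray(origin, direction, t):
    return tuple(origin[k] + direction[k] * t for k in range(3))


def march(heightmap, origin, direction, step=0.25, max_steps=256, refine_steps=8):
    """Return the ray parameter t of the first hit, or None on a miss."""
    t_prev = 0.0
    for i in range(1, max_steps + 1):
        t = i * step
        x, y, z = point_on_ray(origin, direction, t)
        if y <= height_at(heightmap, x, z):
            # The hit lies somewhere between the last sample above the
            # terrain and this one: narrow it down with a binary search.
            lo, hi = t_prev, t
            for _ in range(refine_steps):
                mid = 0.5 * (lo + hi)
                x, y, z = point_on_ray(origin, direction, mid)
                if y <= height_at(heightmap, x, z):
                    hi = mid
                else:
                    lo = mid
            return hi
        t_prev = t
    return None


flat = [[1.0] * 4 for _ in range(4)]
print(march(flat, (1.5, 3.0, 1.5), (0.0, -1.0, 0.0)))
```

Without the refinement the hit point snaps to multiples of the step size, which shows up as stair-stepping; that is one plausible source of the artifacts mentioned above.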

I expected it would be enough to enable bilinear filtering and carefully pick the proper subpixel position to let the GPU smooth my voxels.
Of course that was not the case, and I had to write a specific texture fetching routine to smooth them.
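
I won't go into the details of the routine here, but as a rough idea of what a hand-written smoothing fetch can look like, here is a generic manual bilinear interpolation over a heightmap. This is an assumption made for illustration (the name bilinear_height and the clamping policy are made up), not necessarily what my shader does.

```python
import math


def bilinear_height(heightmap, x, z):
    """Manually interpolate the four texels surrounding (x, z)."""
    rows, cols = len(heightmap), len(heightmap[0])
    # Clamp the sample position to the valid texel range.
    x = min(max(x, 0.0), cols - 1.0)
    z = min(max(z, 0.0), rows - 1.0)
    x0, z0 = int(math.floor(x)), int(math.floor(z))
    x1, z1 = min(x0 + 1, cols - 1), min(z0 + 1, rows - 1)
    fx, fz = x - x0, z - z0
    # Interpolate along x on the two rows, then along z between them.
    top = heightmap[z0][x0] * (1.0 - fx) + heightmap[z0][x1] * fx
    bottom = heightmap[z1][x0] * (1.0 - fx) + heightmap[z1][x1] * fx
    return top * (1.0 - fz) + bottom * fz


hm = [[0.0, 2.0],
      [4.0, 6.0]]
print(bilinear_height(hm, 0.5, 0.5))
```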

Here's the second version:


That's better

I've added a simple dot(n,l) lighting model. Note that lighting is calculated via a "blocky" texture fetch, so it's not as cool as the "geometry":


Blocky lighting on near geometry
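
For the record, dot(n,l) lighting on a heightmap boils down to something like the sketch below: a normal estimated from neighbouring texels via central differences, then a clamped dot product with the light direction. The names (normal_at, lambert) and the y-up convention are assumptions made for this example. Since the normal comes from per-texel fetches, it is constant over each texel, hence the blocky look.

```python
import math


def normal_at(heightmap, ix, iz):
    """Estimate the surface normal at a texel via central differences."""
    rows, cols = len(heightmap), len(heightmap[0])

    def h(x, z):
        # Clamp to the borders so edge texels still get a normal.
        return heightmap[min(max(z, 0), rows - 1)][min(max(x, 0), cols - 1)]

    dx = h(ix + 1, iz) - h(ix - 1, iz)
    dz = h(ix, iz + 1) - h(ix, iz - 1)
    # The 2.0 is the distance (in texels) between the two samples on each axis.
    n = (-dx, 2.0, -dz)
    length = math.sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2])
    return (n[0] / length, n[1] / length, n[2] / length)


def lambert(n, l):
    """Clamped dot(n, l); both vectors are expected to be normalized."""
    return max(0.0, n[0] * l[0] + n[1] * l[1] + n[2] * l[2])


slope = [[0.0, 1.0, 2.0] for _ in range(3)]   # height grows along +x
n = normal_at(slope, 1, 1)
print(n, lambert(n, (0.0, 1.0, 0.0)))
```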

Of course the ray marcher also works with volume textures and functions (check IQ's papers for that).

I plan to improve the heightmap renderer; in particular, I'd like to speed it up by implementing a cone step mapping algorithm. I'd also like to generate the heightmap and color textures via a shader instead of loading them from image files.

Maybe I can get a 4kb intro out of this simple raymarcher..

Friday, 11 July 2008

SSGI

I recently read an interesting post by Wolfgang Engel here.

The idea is to calculate indirect lighting in screen space via a technique which resembles SSAO.

I played with something similar a few months ago (in January). At that time, the engine had a very simple rendering pipeline.

After playing with SSGI a couple of days, here's the result I got:



Now, a few months later, the engine has a complete "render system" but there's no support for SSGI.

The main problem with my implementation was its (lack of) tweakability. Furthermore, it's heavily scene dependent and I've been so busy on the engine that I never got the chance to fix some artifacts (partly derived from my old SSAO implementation).

I also took some comparison shots to show the effects of SSGI contribution.


Albedo-only


With SSGI

I'd like to have some spare time to play with SSGI and see what it looks like after being fixed and applied to an image coming from a complete render system.

If I ever get a good SSGI, I'll definitely post something here.