The multithreaded pipeline is almost working.
I still have to clean up the code, remove hacks, change the way the frame data is handled and test everything extensively.
On Thursday I turned off my PC at 10:15 pm, so I didn't have much time to play with the pipeline!
The first thing I did on Friday morning was to change my singlethreaded pipeline so that it could run exactly like the multithreaded one does.
I prepared a simple test scene (Sponza Atrium imported as separate meshes, plus 500 cubes, each with a unique VB/IB pair and each moving left/right with a simple sin(x) instruction) and started measuring rendering speed.
The first results in dx10 windowed mode were encouraging:
- singlethreaded 133 fps
- multithreaded 177 fps
This improvement comes only from the separation of the scene update thread and the rendering thread, both running at full speed.
Reducing the scene update time and performing interpolation widens the difference in windowed mode:
- singlethreaded 133 fps
- multithreaded 195 fps
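A minimal sketch of how the two threads could be decoupled with interpolation. This is not the framework's actual code: `Snapshot`, `SnapshotBuffer` and `updateScene` are hypothetical names, and the per-cube phase shift is an assumption. The scene thread would publish timestamped snapshots at its own pace, while the rendering thread samples an interpolated state whenever it needs a frame.

```cpp
#include <cmath>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical scene snapshot: one x offset per cube, plus the simulation
// time at which it was produced.
struct Snapshot {
    double time = 0.0;
    std::vector<float> x;
};

// Scene-side update: every cube slides left/right following sin(t).
// The per-cube phase shift is an assumption, added only for variety.
Snapshot updateScene(double time, std::size_t cubeCount) {
    Snapshot s;
    s.time = time;
    s.x.resize(cubeCount);
    for (std::size_t i = 0; i < cubeCount; ++i)
        s.x[i] = static_cast<float>(std::sin(time + 0.1 * static_cast<double>(i)));
    return s;
}

// Keeps the two most recent snapshots. The scene thread calls publish(),
// the rendering thread calls sample(); a mutex guards the short critical
// sections.
class SnapshotBuffer {
public:
    void publish(Snapshot next) {
        std::lock_guard<std::mutex> lock(mutex_);
        previous_ = std::move(current_);
        current_ = std::move(next);
    }

    // Linearly interpolates cube positions at renderTime, clamped to the
    // interval covered by the two stored snapshots. Until two compatible
    // snapshots exist, the latest one is returned unchanged.
    std::vector<float> sample(double renderTime) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (previous_.x.size() != current_.x.size()) return current_.x;
        const double span = current_.time - previous_.time;
        double t = span > 0.0 ? (renderTime - previous_.time) / span : 1.0;
        if (t < 0.0) t = 0.0;
        if (t > 1.0) t = 1.0;
        std::vector<float> out(current_.x.size());
        for (std::size_t i = 0; i < out.size(); ++i)
            out[i] = static_cast<float>(previous_.x[i] +
                                        (current_.x[i] - previous_.x[i]) * t);
        return out;
    }

private:
    mutable std::mutex mutex_;
    Snapshot previous_;
    Snapshot current_;
};
```

With something like this, the scene thread can tick less often (cheaper updates) while the rendering thread keeps producing smooth frames by sampling in between.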
The final results for all configurations in fullscreen mode are:
- dx9 singlethreaded 84 fps
- dx9 multithreaded 104 fps
- dx10 singlethreaded 170 fps
- dx10 multithreaded 275 fps
The dx9 version is clearly limited by a GPU bottleneck. The trick speeding up dx10 seems to be related to the fact that the vertex format/vertex shader linkage is the same for all meshes. While dx9 recalculates it whenever a shader/VB change is requested, in dx10 it is the user's responsibility to link them, so there's no state change when switching VBs.
Despite being GPU-limited, the dx9 version still shows an improvement of approximately 20-25% when using the multithreaded pipeline.
What's next?
As I said, I'd like to clean up and extend the code, then I'll probably look at parallelizing my pipeline further. In particular, I'd like to take a closer look at jobs, job pools, job graphs and schedulers. There's still room for improvement there.
Friday, 11 September 2009
Sunday, 6 September 2009
Life on the Other Side
I'm still here and the framework is getting better and better!
Except for a couple of trips to Isola d'Elba, I've spent my summer working on different engine/framework designs.
The framework is the testbed for new things which may (or may not) find their way into the engine. As for the features I've been implementing in my framework, let me name two:
- DX10 support
- multithreaded rendering pipeline (work in progress)
As for D3D10, it took just a couple of days to implement the rendering subsystem. It's far from perfect but the tests look promising. For simple scenes the speed is comparable to what I get with the old D3D9 subsystem.
A more complex scene, where hundreds of objects have to be rendered, is up to 3 times faster in D3D10!
Multithreading the framework took a lot of design time and 5 different implementations to reach a decent level of independence. At the moment there's only separation between the scene thread and the rendering thread. The speed increase is approximately 5-10%, but the scene thread is lightweight, so I expect to see major improvements once the entities feature complex behaviour.
On the scene side, I'm in the process of completing the design (and of course the implementation) of the entity update. The ultimate goal is to be able to update the entities independently (in theory with pools containing N entities).
I don't expect to need this level of control over granularity, but I'd like the system to be flexible enough to easily support it.
The part I'm working on right now is, as I said, the way an entity can be updated.
The idea is to let an entity expose different parameter sets, then only work on them so that the code is as reusable as possible.
From a multithreaded point of view, the parameter update processes could be grouped (think of a job pool) and executed independently, then the current scene gets rendered.
Thus the multithreading involves two different levels:
level 1: job pools -> scene
level 2: scene -> rendering pipeline
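A rough sketch of what level 1 could look like, under my own assumptions: `TransformParams`, `makeUpdateJobs` and `runJobPool` are hypothetical names, not the framework's API. Each job updates one pool of parameter sets, jobs are independent of each other, and the call returns only when all of them are done, at which point the scene could be handed over to the rendering pipeline (level 2).

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical parameter set exposed by an entity: the update code only
// touches this data, never the entity itself.
struct TransformParams {
    float x = 0.0f;
    float speed = 0.0f;
};

// A job updates one contiguous pool of parameter sets.
using Job = std::function<void()>;

// Runs every job on a few worker threads and waits for all of them.
// Jobs are handed out through an atomic counter; they must not write to
// shared data.
void runJobPool(const std::vector<Job>& jobs, unsigned workerCount) {
    if (workerCount == 0) workerCount = 1;
    std::atomic<std::size_t> next{0};
    auto worker = [&]() {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= jobs.size()) return;
            jobs[i]();
        }
    };
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w) workers.emplace_back(worker);
    for (std::thread& t : workers) t.join();
}

// Splits the parameter sets into pools of poolSize entries and builds one
// job per pool; each job advances its own slice by dt. The params vector
// must outlive the returned jobs.
std::vector<Job> makeUpdateJobs(std::vector<TransformParams>& params,
                                std::size_t poolSize, float dt) {
    if (poolSize == 0) poolSize = 1;
    std::vector<Job> jobs;
    for (std::size_t begin = 0; begin < params.size(); begin += poolSize) {
        const std::size_t end = std::min(begin + poolSize, params.size());
        jobs.push_back([&params, begin, end, dt]() {
            for (std::size_t i = begin; i < end; ++i)
                params[i].x += params[i].speed * dt;
        });
    }
    return jobs;
}
```

Setting the pool size to 1 gives one job per entity, which is the finest granularity mentioned above; larger pools reduce scheduling overhead.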
Hopefully next week I'll have the final design and implementation.
I still think it would be cool to write a series of posts about how the framework has been designed and implemented, but I'd like to complete the overall design before writing them.
Sunday, 5 July 2009
A small update: work and framework
I've been spending the last two months working on a small framework, while at work I've entirely rewritten the scene import system.
The import system was implemented from scratch without any kind of design and it got bigger and bigger as new features were added. In the end I found myself with a couple of huge classes, 400+ KB of code and memory leaks all around. WOW.
When I had to modify something, I felt disgusted every time I opened those dirty files. Extending the import system cleanly was nearly impossible; you could only add more crap to it. Of course the system was supposed to be an emergency fix, but it stayed there for far too long. IMHO there were a couple of good ideas, but it was impossible to take advantage of them, as everything implemented around them was nothing but a huge mess. BTW the new import system is running and performs well: it's faster, cleaner, more powerful, extensible and free of memory leaks.
As for the framework, it works but some parts are still missing. I'm planning to start updating this blog more often, and to write a series of posts about the making of the framework.
Thinking about the last two months, it wasn't a good time, as it's awful to write 900 KB of code without seeing anything exciting on the LCD panel.
I hope the fun part is going to start ASAP.
Friday, 8 May 2009
AA1: configuration done. Time to write a small framework.
I managed to properly configure the AA1 for development a couple of days after my previous post.
After downloading and burning VS2008 Express SP1 to a DVD (it's only 750+ MB, what a waste of space!), I made an image of the internal 8GB drive so I could revert to the previous configuration in case something went wrong.
I wasn't able to install VS2008 to my second drive (16GB SDHC), since the operation isn't supported on removable drives. It seems there's a way to force the additional storage to be seen as a permanent disk, but this applies to the EEE and needs a device driver change. I went for drive C.
As for DirectX, I picked the August 2008 SDK (the one I use on my notebook) and installed it with the minimum set of components.
VS2008 works reasonably well (as expected, compilation times aren't that great; I suppose the SD is the bottleneck). It starts quite fast, but it takes some time to close.
I suggest disabling IntelliSense; there are a couple of ways to do it.
Rename the DLL located at:
\VC\vcpackages\feacp.dll
or disable it via macros.
The little baby still works and I can compile/run a simple piece of network code.
I've been busy at work, but last weekend I took the code for 4kb intros and tried to optimize it further. The exe is now 27 bytes smaller and runs everywhere, while the older one had a lot of compatibility issues.
Speaking of work, hopefully I will be able to release new screenshots later this month. This should include an IOTD submission on gamedev.
I need to write a small framework for testing stuff, and I'd like it to work on the AA1.
I'm going to use it for a tiny project which should come with documentation about design choices and implementation details. I could use this blog to comment on my work daily, so when it's done I'll just have to copy-n-paste my blog entries.
I hope I'm going to update this blog more often. :)
Saturday, 21 March 2009
AA1: let's optimize!
I didn't expect to wait three months before posting something new here.
I've been so busy at work I had no spare time to spend on programming at home.
In February I had to set up a simple web server and decided to host it on my AA1. I installed and configured MySQL, Apache and PHP. It was weird, to say the least, to see that little baby run a website.
Recently I had the chance to relax for a couple of days, so I decided to remove the web server from my AA1 and see if I could fix my octree (see my previous post).
To sum up, here's the situation: 5 fps without octree acceleration, 29 fps when enabling it (but with corrupted rendering, including missing parts of the scene).
I started looking at the mesh corruption and discovered that some tests were missing while generating the octree. After fixing that, the reference model (Sponza Atrium) ran at 15-16 fps, regardless of the resolution. That could make sense, as the previous framerate referred to a corrupted mesh (some parts were not drawn).
I tried different octree configurations (adjusting the minimum node size and the maximum number of primitives per node). As expected, when the entire model is in view the speed decreases to 10 fps, going up to 40-50 fps when looking at a boundary.
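To show how those two knobs interact, here is a simplified octree build. It is only a sketch under my own assumptions, not the code from my framework: primitives are reduced to their centroids, and `OctreeConfig`, `OctreeNode` and `buildOctree` are hypothetical names. A node is split only while it holds more than `maxPrimitives` entries and its children would still be larger than `minNodeSize`.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

// The two build parameters mentioned above (hypothetical names).
struct OctreeConfig {
    float minNodeSize;          // children may not get an edge this small
    std::size_t maxPrimitives;  // split only when a node holds more than this
};

struct OctreeNode {
    Vec3 center{};
    float halfSize = 0.0f;                                 // half of the edge length
    std::vector<std::size_t> primitives;                   // filled in leaves only
    std::array<std::unique_ptr<OctreeNode>, 8> children;   // all null in a leaf
    bool isLeaf() const { return !children[0]; }
};

// Recursively builds the node covering the cube (center, halfSize) for the
// given primitive indices.
std::unique_ptr<OctreeNode> buildOctree(const std::vector<Vec3>& centroids,
                                        std::vector<std::size_t> indices,
                                        Vec3 center, float halfSize,
                                        const OctreeConfig& cfg) {
    auto node = std::make_unique<OctreeNode>();
    node->center = center;
    node->halfSize = halfSize;
    // A child's edge length equals this node's halfSize: stop when the node
    // is sparse enough or when its children would be too small.
    if (indices.size() <= cfg.maxPrimitives || halfSize <= cfg.minNodeSize) {
        node->primitives = std::move(indices);
        return node;
    }
    // Distribute primitives over the eight octants (bit 0 = x, 1 = y, 2 = z).
    std::array<std::vector<std::size_t>, 8> buckets;
    for (std::size_t i : indices) {
        const Vec3& p = centroids[i];
        const int octant = (p.x >= center.x ? 1 : 0) |
                           (p.y >= center.y ? 2 : 0) |
                           (p.z >= center.z ? 4 : 0);
        buckets[octant].push_back(i);
    }
    const float q = halfSize * 0.5f;
    for (int o = 0; o < 8; ++o) {
        const Vec3 childCenter{center.x + ((o & 1) ? q : -q),
                               center.y + ((o & 2) ? q : -q),
                               center.z + ((o & 4) ? q : -q)};
        node->children[o] = buildOctree(centroids, std::move(buckets[o]),
                                        childCenter, q, cfg);
    }
    return node;
}

// Counts the leaves below (and including) a node.
std::size_t countLeaves(const OctreeNode& n) {
    if (n.isLeaf()) return 1;
    std::size_t total = 0;
    for (const auto& c : n.children) total += countLeaves(*c);
    return total;
}

// Counts the primitives stored in all leaves below a node.
std::size_t countPrimitives(const OctreeNode& n) {
    if (n.isLeaf()) return n.primitives.size();
    std::size_t total = 0;
    for (const auto& c : n.children) total += countPrimitives(*c);
    return total;
}
```

Smaller nodes mean tighter culling but more draw calls, which is exactly the trade-off I had to balance on the AA1.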
I was not yet satisfied with the resulting performance, so I decided to dig into my code to check if there was some room for improvement. I noticed the code building primitives didn't take MinIndex and NumVertices into account.
In general it's not a problem to set those values to 0 and NumberOfVerticesInVB, as modern GPUs are quite tolerant and these parameters are a legacy of an older era.
In the case of the Intel 945GM, vertex shaders aren't implemented in HW, but are emulated by the CPU. Considering the AA1 is powered by a tiny 1.6 GHz Atom, it's easy to imagine how some simple details can make a huge difference.
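A sketch of how the tight values could be computed for each draw call, assuming 16-bit indices and a base vertex index of 0. `VertexRange` and `computeVertexRange` are hypothetical names. The two results map to the MinVertexIndex and NumVertices arguments of `IDirect3DDevice9::DrawIndexedPrimitive`, so a software vertex pipeline only has to process the vertices the call really touches.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Vertex range touched by one draw call.
struct VertexRange {
    std::uint32_t minIndex;     // smallest vertex index referenced
    std::uint32_t numVertices;  // size of the span [minIndex, maxIndex]
};

// Scans indexCount indices starting at startIndex and returns the span of
// vertices they reference. Returns {0, 0} for an empty or invalid range.
VertexRange computeVertexRange(const std::vector<std::uint16_t>& indices,
                               std::size_t startIndex, std::size_t indexCount) {
    if (indexCount == 0 || startIndex + indexCount > indices.size())
        return {0, 0};
    const auto first = indices.begin() + static_cast<std::ptrdiff_t>(startIndex);
    const auto last = first + static_cast<std::ptrdiff_t>(indexCount);
    const auto mm = std::minmax_element(first, last);
    const std::uint32_t lo = *mm.first;
    const std::uint32_t hi = *mm.second;
    return {lo, hi - lo + 1};
}
```

The scan only needs to run once per octree node at build time; the results can be stored next to the node's start index and primitive count.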
After some tests and tweaks, the final fps count for the octree-accelerated Sponza Atrium is now:
- minimum 36 fps
- average 45 fps
- max 65 fps
Not bad for a 65-70k poly scene.
I'm looking for new ways to use my AA1. It's cool to use email, surf the web, write documents, use MSN and do everything you usually do with a netbook, but I'd like to do some (graphics) programming on it. Unfortunately I can't use the framework/engine, as compiling a simple application based on it (in release and debug) produces more than 2GB of intermediate data. It takes almost half an hour to compile on a pretty decent dual-core desktop machine; I don't want to know how long it would take on the little baby.
That's why I brought back from the grave (aka a 250GB 2.5" USB HDD) a small framework for 4kb intros. After recompiling it in VS2005 and tweaking the linkage, I managed to get an extra 23-byte optimization. Oh... and the thing runs on my Vista-based notebook. Wow.
I'd like to install Visual Studio and the latest DirectX SDK, but I have some doubts. I'd like to go with VS2005, but VS2005 Express doesn't include the Windows SDK, and it doesn't sound like a good idea to install the entire Windows SDK on such a small SSD (or SD card). The same applies to DirectX: out of 900 MB I barely need 100 MB.
I'll probably save an image of the SSD, so that I'll be able to easily revert to my current configuration, then I'll install VS2008 and try to manually configure the DX SDK by copying only the files I need and forcing the VS IDE to point at the correct directories. I wonder how the VS2008 compiler is going to perform compared to the one I'm currently using. My worst nightmare is that the /QIfist option, deprecated in VS2005, could have been removed from VS2008. I hope that's not the case.
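For those who never used it: /QIfist tells the compiler to convert floats to integers with the FPU's current rounding mode (round to nearest by default) instead of calling the `_ftol` helper, which truncates. That removes a CRT dependency, which is why it matters for 4kb intros, but it also changes the results. The portable sketch below only illustrates that difference; the function names are hypothetical and `std::lrint` stands in for the FPU behaviour.

```cpp
#include <cmath>

// Standard C++ conversion: truncates toward zero.
int toIntTruncate(float v) {
    return static_cast<int>(v);
}

// Conversion using the current floating-point rounding mode, which is
// round to nearest (ties to even) unless the program changed it.
int toIntCurrentMode(float v) {
    return static_cast<int>(std::lrint(v));
}
```

Code that relies on truncation (for example when computing array indices from positive floats) has to be reviewed before switching from one behaviour to the other.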
Feel free to drop a comment if you have any information about VS2008 code generation.
Friday, 2 January 2009
AA1: rocks! 2009: Rocks!
I've not updated this blog since September; I hope I'll write more posts this year.
Actually I have 5-6 posts which are in a "draft" state, as they need to be polished before appearing here.
I recently got a white Acer Aspire One. This little baby has already undergone a series of heavy modifications:
- RAM increased to 1.5 GB
- BIOS updated to rev 3309
- extra 16 GB SD card
- internal SD reformatted with 32 KB clusters
- two installations of Windows XP, the one currently running being an nLited XP SP3
- created a 256 MB ramdisk to store temporary files
- removed virtual memory/pagefile.sys file
Boot time is approx 35-40 secs while shutdown takes 40-45.
I'm happy, as I bought it in a supermarket at a good price: 199€. The only problem with my configuration is the BIOS update. It seems some LCD panels have problems when brightness is set to minimum. Mine was fine, but Acer increased the minimum brightness in BIOS rev 3309, which results in shorter battery life. I knew about this BIOS problem, but I was forced to upgrade because the original version wasn't able to properly detect/use the additional 1 GB of 667 MHz DDR2 RAM. I know the baby has 533 MHz DDR2, but the only memory available (and cheap) was clocked at 667 MHz.
I hope Acer will fix this problem ASAP.
I've been able to launch a couple of simple apps I wrote. They run, but some of them are vertex shader limited. The problem is that the Intel 945 doesn't support hardware vertex shaders, so they are emulated in software. It's weird to see an application run at the same speed at 320x200 and 1024x600.
I tried my sound streaming code, but the external SD card is slow. Of course when using the ramdisk everything is fine. The problem isn't the sound itself, as I can stream it without glitches from an SD. The problem is that the thread decoding audio takes too much time, so the rendering thread gets slower. Sounds like a good test.
Rendering seems fine until I generate a huge octree (65-70k polys). In that case I get visual artifacts and the app slows down to 4-5 fps. If I render stuff as single meshes, I get 29 fps. When "octreeing" a 5k-poly scene I get up to 250 fps.
Skinning also works seamlessly, including animation mixing.
I've spent the first day of 2009 sleeping. Kind of. The problem is I came back home at 11:30 AM, too tired to work. :D
Happy new year!
Saturday, 20 September 2008
SSGI. Version 2.0
I've had some spare time to waste on my SSGI implementation.
I had two VS projects, one dating back to January (whose renders are available in my previous post about SSGI) and the other modified in March.
I don't remember exactly which modifications I made, but it looks different from the first one, although it suffers from the same problems.
In my previous post I spent a few words on how impossible it was to fine-tune SSGI. Let's look again at one of the shots.
As you can see, there are many problems:

1- "E" gets blurred. Since I gather samples around a pixel and every surrounding pixel emits light, the resulting image looks "haloed".
2- There's fake lighting. By fake lighting I mean the shape gets too much light. I'd like SSGI not to start a lighting war against a standard lighting model. SSGI should add a modest contribution; it's not supposed to create fake lights.
3- You can clearly see a halo representing my filter kernel size; this is awful and gets worse when you move the camera. This is due to the SSAOish nature of the algorithm, but there are some tricks to reduce this effect.
4- This is impossible to see, as it's related to the way I'm combining the diffuse and SSGI buffers. Since the contribution is too heavy, I've been forced to scale the SSGI buffer AND blend it with the diffuse buffer. I'd like to be able to simply add the SSGI buffer.
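To make point 4 concrete, here is a CPU-side sketch of the two ways of combining the buffers, written per pixel. It is one possible reading of the description above, not my actual shader: `Color`, `scale`, `blend` and both function names are hypothetical, and clamping to 1.0 assumes an LDR render target.

```cpp
#include <algorithm>

struct Color { float r, g, b; };

// Old approach: scale the SSGI term, then blend it with the diffuse color.
// Both 'scale' and 'blend' are tuning knobs that fight each other.
Color combineScaledBlend(Color diffuse, Color ssgi, float scale, float blend) {
    auto mix = [blend](float a, float b) { return a + (b - a) * blend; };
    return {mix(diffuse.r, ssgi.r * scale),
            mix(diffuse.g, ssgi.g * scale),
            mix(diffuse.b, ssgi.b * scale)};
}

// Desired approach: SSGI is a small additive contribution on top of the
// diffuse color, clamped to the displayable range.
Color combineAdditive(Color diffuse, Color ssgi) {
    auto add = [](float a, float b) { return std::min(a + b, 1.0f); };
    return {add(diffuse.r, ssgi.r),
            add(diffuse.g, ssgi.g),
            add(diffuse.b, ssgi.b)};
}
```

The additive form only works if the SSGI term is already subtle, which is the whole point of the changes described in this post.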
Since the algorithm suffers from the aforementioned problems, the results are:
1- it's impossible to clearly see the fine details of a texture. That blurred look could be ok for a dream-like scene, but it's not going to help you render realistic scenes.
2- coherence with local lighting is lost. Lights supposed to gently illuminate geometry produce overly bright areas. It's going to be a nightmare to tune it.
3- the effect is quite awful when moving the camera. Do I need to say more?
4- the artist isn't able to control the overall look of the scene.
I took the modded version and looked for possible solutions. After some work and tests, I came up with a version I think is better than the previous one. It still suffers from some problems like haloing, but I have some ideas to improve it further.
My goal was to create a "gentle" SSGI shader adding subtle details to the scene.

No SSGI

SSGI
It's hardly noticeable, but things get better when adding simple dot(n,l) lighting to the reference image. Here's a closer shot.

No SSGI

SSGI
The "cool" look of the old shots, à la photon mapping, is still here but is noticeable when looking at small, flat details:

No SSGI

SSGI
I also ran a simple test on "Sponza Atrium". SSGI haloing is still here and is a bit too bright, but as I said, now it's easily tweakable.

No SSGI

SSGI
I'm planning to improve the algorithm; in particular I'd like to remove halos and integrate it into a full-featured render system.