Friday, 11 September 2009

Multithreading: a small update

The multithreaded pipeline is almost working.

I still have to clean up the code, remove hacks, change the way the frame data is handled, and test everything extensively.

On Thursday I turned off my PC at 10:15 pm, so I didn't have much time to play with the pipeline!

The first thing I did on Friday morning was change my singlethreaded pipeline so that it runs exactly like the multithreaded one.

I prepared a simple test scene (the Sponza atrium imported as separate meshes, plus 500 cubes, each with a unique VB/IB pair and each moving left/right driven by a simple sin(x)) and started measuring rendering speed.

The first results in dx10 windowed mode were encouraging:

- singlethreaded 133 fps
- multithreaded 177 fps

This improvement comes only from the separation of the scene update thread and the rendering thread, both running at full speed.
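The separation described above can be sketched as two threads sharing the latest frame data. This is only a minimal illustration, not the framework's actual code: `FrameData`, `FrameMailbox` and `runDecoupled` are hypothetical names, and a mutex-protected copy is assumed as the hand-off mechanism. The update thread publishes scene states as fast as it can, and the render thread draws whatever state is newest without waiting for a fresh one.

```cpp
#include <cassert>
#include <cmath>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical snapshot of what the renderer needs for one frame.
struct FrameData {
    int tick = 0;              // index of the update that produced this state
    std::vector<float> cubeX;  // x offset of each cube
};

// Hypothetical mailbox: the update thread publishes, the render thread copies.
class FrameMailbox {
public:
    void publish(const FrameData& f) {
        std::lock_guard<std::mutex> lock(m_);
        latest_ = f;
    }
    FrameData latest() const {
        std::lock_guard<std::mutex> lock(m_);
        return latest_;
    }
private:
    mutable std::mutex m_;
    FrameData latest_;
};

struct RunStats { int updates; int frames; int lastTick; };

// Runs a fixed number of updates and frames on two independent threads.
RunStats runDecoupled(int updateCount, int frameCount, int cubeCount) {
    assert(updateCount > 0 && frameCount > 0 && cubeCount > 0);
    FrameMailbox mailbox;
    int framesRendered = 0;

    std::thread updater([&] {
        FrameData f;
        f.cubeX.resize(cubeCount);
        for (int t = 1; t <= updateCount; ++t) {
            f.tick = t;
            for (int i = 0; i < cubeCount; ++i)
                f.cubeX[i] = std::sin(0.01f * t + i);  // left/right motion
            mailbox.publish(f);
        }
    });

    std::thread renderer([&] {
        for (int n = 0; n < frameCount; ++n) {
            FrameData f = mailbox.latest();  // never waits for a new update
            (void)f;                         // a real renderer would draw here
            ++framesRendered;
        }
    });

    updater.join();
    renderer.join();
    return { updateCount, framesRendered, mailbox.latest().tick };
}
```

Copying the whole frame under a lock is the simplest hand-off to reason about; a real pipeline would more likely swap buffers to avoid the copy.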

Reducing the scene update time and performing interpolation widens the difference in windowed mode:

- singlethreaded 133 fps
- multithreaded 195 fps
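The post doesn't show how the interpolation is done. One common reading, assumed here, is that the scene updates less often and the render thread blends between the two most recent scene states according to the current render time. `Snapshot` and `interpolateX` are hypothetical names used only for this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical scene state for one object at a given update time.
struct Snapshot {
    double time;  // when the scene thread produced this state, in seconds
    float x;      // object position along x
};

// Linearly blends between the two latest snapshots.
// renderTime is clamped to [prev.time, curr.time], so this never extrapolates.
float interpolateX(const Snapshot& prev, const Snapshot& curr, double renderTime) {
    const double span = curr.time - prev.time;
    if (span <= 0.0) return curr.x;  // degenerate interval: use the newest state
    double alpha = (renderTime - prev.time) / span;
    alpha = std::max(0.0, std::min(1.0, alpha));
    return static_cast<float>(prev.x + alpha * (curr.x - prev.x));
}
```

With snapshots at t = 0 (x = 0) and t = 1 (x = 10), a frame rendered at t = 0.25 places the object at x = 2.5, so the renderer can produce distinct frames even when no new update has arrived.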

The final figures (in fps) for all configurations in fullscreen mode are:

- dx9 singlethreaded 84
- dx9 multithreaded 104
- dx10 singlethreaded 170
- dx10 multithreaded 275

The dx9 version is clearly limited by a GPU bottleneck. The dx10 speedup seems to come from the fact that the vertex format/vertex shader linkage is the same for all meshes. dx9 recalculates it whenever a shader/VB change is requested, whereas in dx10 it is the user's responsibility to link them, so there's no state change when switching VBs.

Despite being GPU-limited, the dx9 version still shows an improvement of approximately 20-25% when using the multithreaded pipeline (in fullscreen, going from 84 to 104 is about 24%).

What's next?

As I said, I'd like to clean up and extend the code, then I'll probably look at parallelizing my pipeline further. In particular, I'd like to take a closer look at jobs, job pools, job graphs and schedulers. There's still room for improvement there.

Sunday, 6 September 2009

Life on the Other Side

I'm still here and the framework is getting better and better!

Except for a couple of trips to Isola d'Elba I've spent my summer working on different engine/framework designs.

The framework is the testbed for new things that may (or may not) find their way into the engine. Of the features I've been implementing in the framework, let me name two:

- DX10 support
- multithreaded rendering pipeline (work in progress)

As for D3D10, it took just a couple of days to implement the rendering subsystem. It's far from perfect, but the tests look promising. For simple scenes the speed is comparable to what I get with the old D3D9 subsystem.
A more complex scene with hundreds of objects to render is up to 3 times faster in D3D10!

Multithreading the framework took a lot of design time and 5 different implementations to reach a decent level of independence. At the moment the only separation is between the scene thread and the rendering thread. The speed increase is approximately 5-10%, but the scene thread is lightweight, so I expect to see major improvements once the entities feature complex behaviour.

On the scene side, I'm in the process of completing the design (and of course the implementation) of the entity update. The ultimate goal is to be able to update the entities independently (in theory, in pools containing N entities).

I don't expect to need this level of control over granularity, but I'd like the system to be flexible enough to support it easily.

The part I'm working on right now is, as I said, the way an entity can be updated.

The idea is to let an entity expose different parameter sets, and then have the update code work only on those sets, so that the code is as reusable as possible.
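As a rough illustration of that idea (the design was not final when this was written, so every name here is hypothetical): an entity exposes named parameter sets, and an update routine touches only a parameter set, never the entity type itself. The set name "position" and the fixed amplitude are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Hypothetical: a named group of floats that an entity exposes for updating.
using ParameterSet = std::vector<float>;

struct Entity {
    std::map<std::string, ParameterSet> parameterSets;
};

// Reusable update: knows only about a parameter set, not about the entity.
void oscillate(ParameterSet& position, float time, float amplitude) {
    assert(!position.empty());
    position[0] = amplitude * std::sin(time);  // move left/right along x
}

// Applies the update to any entity that exposes a "position" set.
// Returns false when the entity has no such set, leaving it untouched.
bool updateEntity(Entity& e, float time) {
    auto it = e.parameterSets.find("position");
    if (it == e.parameterSets.end()) return false;
    oscillate(it->second, time, 2.0f);
    return true;
}
```

Because `oscillate` sees only a `ParameterSet`, the same routine works for a cube, a light or a camera, as long as each exposes a set with the expected layout.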

From a multithreading point of view, the parameter update processes could be grouped (think of a job pool) and executed independently; then the current scene gets rendered.

Thus the multithreading involves two different levels:

level 1: job pools -> scene
level 2: scene -> rendering pipeline
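The two levels can be sketched as follows. This is an assumption-laden illustration rather than the framework's design: `std::async` stands in for a real job pool, entities are reduced to a single float each, and `updateRange`, `updateScene` and `renderScene` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// One job: update the entities in [begin, end). Ranges never overlap,
// so jobs can run concurrently without locking.
void updateRange(std::vector<float>& xs, std::size_t begin, std::size_t end, float t) {
    for (std::size_t i = begin; i < end; ++i)
        xs[i] = std::sin(t + static_cast<float>(i));
}

// Level 1 (job pools -> scene): one job per pool of poolSize entities.
// Returns the number of jobs launched; all jobs have finished on return.
std::size_t updateScene(std::vector<float>& xs, std::size_t poolSize, float t) {
    assert(poolSize > 0);
    std::vector<std::future<void>> jobs;
    for (std::size_t begin = 0; begin < xs.size(); begin += poolSize) {
        const std::size_t end = std::min(begin + poolSize, xs.size());
        jobs.push_back(std::async(std::launch::async, updateRange,
                                  std::ref(xs), begin, end, t));
    }
    for (auto& job : jobs) job.get();  // scene is complete only after every job
    return jobs.size();
}

// Level 2 (scene -> rendering pipeline): stand-in that "draws" every entity
// and returns how many were submitted.
std::size_t renderScene(const std::vector<float>& xs) {
    return xs.size();
}
```

With 500 entities and pools of 64, level 1 launches 8 jobs (seven full pools and one of 52), and level 2 runs only after all of them have completed.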

Hopefully next week I'll have the final design and implementation.

I still think it would be cool to write a series of posts about how the framework has been designed and implemented, but I'd like to complete the overall design before writing them.