Wednesday, December 23, 2009
Simple tone generator tutorial
I've posted a simple tone generator tutorial here. Feel free to post comments.
Monday, November 2, 2009
Playing with Qt, part one: configure your environment
I've been using wxWidgets at work for some time and I wanted to try something different for my framework, so I decided to give Qt a try.
I had trouble finding good documentation, so here's a small guide on how to configure Qt with MSVC in a few simple steps.
First of all, I assume you have a copy of MSVC 2005 (not an Express version). The problem with Express versions is that they don't support plugins, so the integration with them might not be optimal.
Are you ready? Let's go.
1- Download a copy of Qt from here. Get the SDK, Windows version.
2- Install the SDK.
3- Add the environment variable QTDIR = installdir\version\qt. In my case I installed Qt in c:\qt and the version is 2009.04, so QTDIR = c:\qt\2009.04\qt. To add an environment variable, right-click the Computer icon on your desktop and select "Properties". A window opens; select "Advanced system settings", then "Environment Variables...".
4- Copy the following text: "configure -no-sql-sqlite -no-qt3support -no-opengl -platform win32-msvc2005 -no-libtiff -no-dbus -no-phonon -no-phonon-backend -no-webkit". Open Notepad, paste it and save it as "qt-configure.bat" in "installdir\version\qt". You can play with the options, but that configuration works for me; I found it somewhere and it looks fine.
5- Launch MSVC and open the "Visual Studio command prompt" from the Tools menu; it comes with the MSVC environment variables already set. Alternatively, open a command prompt manually (Start->Run->cmd.exe) and run vcvars32.bat from your MSVC install dir (under vc\bin).
6- Type "cd installdir\version\qt". In my case: "cd c:\qt\2009.04\qt".
7- Execute the .bat file by typing "qt-configure" or "qt-configure.bat" and wait until Qt generates all the makefiles for MSVC.
8- Launch the compilation by typing "nmake".
9- Wait until the compilation completes. It's going to take a long time.
10- Close the command prompt and MSVC, then download the add-in and install it. At the time of writing the latest version is 1.1.1.
11- Reboot, start MSVC and go to the Qt menu->Qt Options. Add a new Qt version; in my case the path is c:\qt\2009.04\qt.
Now you can use Qt with MSVC: you can easily create Qt applications and Qt Designer plugins.
Problems I have encountered so far are:
- After installing the add-in, every time you run or compile, MSVC expands the entire solution tree and scrolls down to the end of it. I hope they'll fix it soon.
- The Qt Designer plugin app wizard generates code that needs a small modification to be loaded correctly by the Designer. I'll discuss this problem in another post.
I'm using Windows Vista 32-bit and everything works.
Tuesday, October 20, 2009
Be linear or be wrong: get rid of gamma correction!
This is a recurring topic.
Everything you need to know about gamma correction and linear color spaces is available for free, in GPU Gems 3, here.
When I received my copy of GPU Gems 3, it took me a while to correctly understand what chapter 24 was talking about. Call me dumb, but I think that chapter is a complex description of a simple problem.
I don't pretend to do better, but here's a simple description of the problem (together with a solution). I won't cover sRGB textures or mipmap generation.
This is nothing but a very simple post. After all, many people just load textures as standard images, ask the rendering API to generate mipmaps for them and draw stuff on screen.
The problem
The human eye is more sensitive to dark colors, and the precision of a display is limited to 256 shades (8 bits) per color channel. In order to increase the number of shades we can distinguish on a monitor/LCD panel, the display applies a function to each value. This operation is called "gamma correction" and is typically c = x^y, where x is the original shade, y is something between 2.0 and 2.4 and c is the perceived color.
For simplicity, I will assume a gamma correction factor of 2, thus y = 2.0.
It is important to understand that your display is doing something RIGHT NOW to make images look "better". The web page you're looking at is corrected by your display according to what your eyes see best.
So?
The problem is that, in theory, intensity should scale linearly. If 1.0 is full intensity, then 0.5 should result in half the intensity. When gamma correction is applied we get:
1.0*1.0 = 1.0 -> full intensity (correct)
0.5*0.5 = 0.25 -> 1/4 intensity (wrong)
Thus, a gamma corrected color space is not linear.
How things can get wrong: an example
Alice the graphic artist has to pick a solid grey color that fits well with a white background. She creates a new white image, then draws a solid grey rectangle. She changes the rectangle color until she finds a good one, picks that color and saves a 1x1 image.
Bob the programmer receives the 1x1 image to be applied as a texture to an interface button.
Alice is happy, she thinks that dark grey, of approximately 1/4th the intensity of full white, is going to fit well.
Bob is happy, he got a 1x1 image and all he needs to do is to load that texture and draw the button.
They are wrong.
The color received by Bob IS NOT THE SAME COLOR Alice saw on her display.
Alice had the perception of a color whose intensity is (0.25,0.25,0.25), since she saw something gamma corrected. The color saved in the image is actually (0.5,0.5,0.5)!
Bob draws the control... and the color is fine.
Bob and Alice think the problem is solved (actually, they never thought displaying a color was a problem in the first place), but they don't know there's a subtle mistake in their rendering process.
Why did Bob see the correct color?
Bob loads the texture (0.5,0.5,0.5), then renders the button background color. The display applies gamma correction so:
0.5*0.5 = 0.25
This is the same color Alice saw on her display.
How can things go wrong?
Bob is asked to apply a simple diffuse lighting model to the interface, so he goes for the standard dot(N,L).
Now, let's do the math. We assume dot(N,L) for a given pixel is 0.8.
We have 0.5*0.8 = 0.4
Then the display applies gamma correction:
0.4 * 0.4 = 0.16
We are multiplying the original color by 0.8. That means we want 80% of the original intensity.
Alice saw a color intensity of 1/4th (approx 0.25) on her screen, so we should get a color intensity of 0.20. But we have 0.16 instead of 0.20!
Obviously there's a mistake, as the output color is darker than the one we expected. It's a huge error: 20% darker than expected!
What's the problem?
Problem number one: the original color isn't the one Alice saw on her display.
Problem number two: the display remaps our color in a non linear color space.
What's the solution?
The solution is simple and is divided into two steps. Let's see the first.
The original color is not gamma corrected, as its intensity is 0.5. We need to work on the same color intensity Alice saw on her display.
So the first step after the texture sampling is to apply gamma correction:
0.5*0.5 = 0.25
Now we are working on the proper color shade.
0.25*0.8 = 0.2
This is the color we expect to see.
Bob tries to render the interface and gets something very dark. Too dark.
Bob forgot the display applies gamma correction AGAIN, so:
0.2*0.2 = 0.04
Thus the second step required is to cancel out the gamma correction, by applying the inverse operation, just before returning the color in our pixel shader.
0.2^0.5 = 0.4472
The display will gamma-correct 0.4472, so:
0.4472*0.4472 = 0.19998
Apart from the limited precision, Bob is now seeing the correct color shade.
In brief, the solution is the following:
- get the color
- apply gamma correction
- perform operations
- apply inverse gamma correction
- output to screen (this will cancel out the previous step)
Note the same also applies to constants like the color of a light.
It's easy to see how all those mistakes lead to wrong rendering output, especially when dealing with multiple lights.
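To make the flow concrete, here's a minimal HLSL sketch of those steps (D3D9-style, assuming a gamma of 2.0; the sampler, light constant and input names are made up for illustration, not taken from a real shader):

sampler2D g_DiffuseSampler; // hypothetical diffuse texture sampler
float3 g_LightColor;        // hypothetical light color constant, authored in gamma space

float4 LinearSpacePS( float2 uv : TEXCOORD0, float3 N : TEXCOORD1, float3 L : TEXCOORD2 ) : COLOR0
{
    float4 col = tex2D( g_DiffuseSampler, uv );  // the texture stores gamma-space data
    col.rgb = col.rgb * col.rgb;                 // step 1: apply gamma correction, we are now in linear space
    float3 light = g_LightColor * g_LightColor;  // constants need the same treatment
    col.rgb *= light * saturate( dot( normalize( N ), normalize( L ) ) ); // step 2: lighting math in linear space
    col.rgb = sqrt( col.rgb );                   // step 3: inverse correction; the display will cancel it out
    return col;
}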
Be careful
Just a couple of hints.
Unless you are storing intermediate data in a buffer with 16 bits per channel, NEVER store gamma-corrected colors in buffers, or you'll get horrible banding. The problem is that a gamma-corrected color requires more precision than the one available in an 8-bit channel. Let's do the math:
1.0/255.0 = 0.003921
This is the step between each intensity for an 8-bit channel. You can't be more precise than that.
"color as an 8-bit value in the image" vs "float you get in your pixel shader"
0 = 0.0
1 = 0.003921
2 = 0.003921+0.003921
.....
254 = 1.0f-0.003921
255 = 1.0
If you apply gamma correction and store the results in an 8-bit per channel buffer, you can calculate the minimum original color you can still represent.
0.003921^0.5 = 0.06261
No color below 0.06261 can be represented.
Which color is 0.06261 in your image?
0.06261*255.0 = 15.96
That means all colors between 0 and 15 will become 0 in your intermediate 8-bit buffer if the float-to-int conversion truncates. If it rounds to the nearest integer, roughly the colors from 0 to 11 become 0 while the ones from 12 to 15 become 1. Either way, it's not good.
You may ask: what happens to colors greater than 15?
The same principle applies.
Your image (8-bit numbers) has a color X and a color X+1.
Your shader interprets them as X/255.0 and X/255.0 + 0.0039. When you apply gamma correction, the difference between the two colors becomes so small there's no way to distinguish them: for example, codes 16 and 17 become 0.0039 and 0.0044 after squaring, and stored in an 8-bit buffer they both end up as 1.
By the time you save the color you have lost information, and the result when you read it back is an awful rendering full of color bands.
The lesson is: IF THE INTERMEDIATE BUFFER HAS 8 BITS PER CHANNEL, ALWAYS STORE THE GAMMA-UNCORRECTED (ORIGINAL) DATA.
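If your shader works in linear space, that means converting back just before writing to the 8-bit target and re-applying the correction when reading it back. A couple of fragments as a sketch (gamma 2.0 assumed, variable and sampler names made up):

outCol.rgb = sqrt( linearCol.rgb ); // end of the writing pass: store gamma-space (original) data, so dark shades keep their precision

float3 stored = tex2D( g_IntermediateSampler, uv ).rgb; // reading pass: sample the intermediate buffer
stored = stored * stored;                               // back to linear space for further math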
Another solution is to use sRGB textures; have a look at the GPU Gems 3 chapter for that.
Which shader instructions should I use?
It's simple, assuming a 2.0 gamma.
Apply gamma correction:
col = col*col;
Cancel display gamma correction:
col = sqrt(col);
Assuming a 2.2 gamma, things are a bit different.
Apply gamma correction:
col = pow(col,2.2f);
Cancel display gamma correction:
col = pow(col,1.0f/2.2f);
Note: if you're using alpha testing or alpha blending, save the alpha channel value to a temporary variable before applying the color space transformations, then restore it. The alpha channel in the original image is fine as it is; normal maps and height data are also fine.
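For example, in the gamma 2.0 case (same fragment style as above):

float alpha = col.a;          // keep the original alpha untouched
col.rgb = col.rgb * col.rgb;  // linearize only the color channels
// ... lighting / blending math in linear space ...
col.rgb = sqrt( col.rgb );    // back to gamma space for the display
col.a = alpha;                // restore the alpha for the alpha test / blending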
What does a correct linear rendering look like?
Sorry for the bad quality; here's a simple sphere with a single spot light.
Left: wrong rendering. Right: correct rendering.
They look different from the ones in GPU Gems 3 because the ambient term is zero.
Sunday, October 18, 2009
Working with D3D10: fullscreen quads inside.
As I already pointed out, the framework has D3D9 and D3D10 support.
The D3D10 Rendering Subsystem has been implemented from scratch this summer in two sessions, 4-5 hours each. While I have been working with D3D9 for a long time, that was the first (and until today, the only) time I wrote D3D10 code.
Here's the story in brief.
I had a running D3D9 rendering subsystem class, derived from a base class. I decided to give D3D10 a try, so I looked around for hints and tutorials.
I started with the DirectX 10 documentation and this. IMHO tutorials often aren't the best way to learn something (especially when it comes to properly detecting errors, allocating/releasing resources, cleverly storing objects, etc.), but they are great if you use them as reference (working) code.
I adapted my D3D10 code so that it could fit the rendering subsystem requirements, keeping YAGNI in mind.
The next day I had almost everything I needed already working: shaders, textures (materials), shader parameters, vertex and index buffers, render targets. The reference sample at that time was simple but feature rich: a 3D quad with a texture, coloured and zoomed via shader parameters, rendered to a render target which is then used as the texture for the same quad in another rendering pass.
Now, three months later, I'm working on the D3D10 rendering subsystem again.
What's good is that since then I've had to add very little functionality to my D3D10 rendering subsystem. Despite my limited knowledge and the fact I don't take advantage of useful features like constant buffers, the subsystem is fast.
What's bad is that today I "discovered" something very important is missing: rendering a full-screen quad. Hey, after all that's what YAGNI is about: I need it now, so I'm going to implement it.
The rendering subsystem lets the user draw a rectangle via this method:
virtual NxBool DrawRect( NxFloat i_fX0, NxFloat i_fY0, NxFloat i_fX1, NxFloat i_fY1 ) = 0;
In D3D9 the implementation is straightforward: I have a local array of per-vertex data initialized according to the submitted parameters, I set a proper vertex format with position and UVs, and I use DrawPrimitiveUP.
In D3D10, without such a mechanism, there are two options:
- use a common VB/IB pair and apply a proper transformation
- don't use buffers at all
As for the second solution, DirectX documentation covers this subject.
I decided to write something along those lines; here are a few hints:
1- Don't use a triangle list, use a triangle strip (4 vertices instead of 6). Be careful with the vertex order. When I have to render an ABCD quad (A=(0,0) B=(0,1) C=(1,1) D=(1,0))*, the order I'm using is "BCAD". The reason is that the strip's second triangle is built from the last two vertices of the first one, so those two must be the vertices of the shared edge.
2- I haven't tried this idea, but if you need a simple fullscreen quad you could render a single triangle with vertices ((0,0) (2,0) (0,2))* and scale your vertex data by a factor of two. Clipping should avoid the extra calculations and in theory you get your good ol' fullscreen quad. Since this solution doesn't work with arbitrary rects, I didn't implement it.
What's great is that I don't have to worry about pixel-texel center alignment. If you don't know what I'm talking about and you are using D3D9, you definitely need to have a look here before starting to develop any postprocessing shader.
When using the framework, the main difference between the D3D9 and D3D10 implementations is the set of constraints on the postprocessing shader. In D3D9 the user has to write a simple passthrough vertex shader and provide two variables, one for the half pixel width and another for the half pixel height. In D3D10 the vertex shader is more complex, and the required parameters are the quad coordinates X0,Y0 and X1,Y1.
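Just to give the idea, here's a minimal sketch of what such a vertex shader could look like in D3D10 HLSL. It is not my actual code: g_Rect is a made-up constant holding X0,Y0,X1,Y1 in [0,1] screen space, and the corner index comes from SV_VertexID so no vertex buffer is needed.

float4 g_Rect; // hypothetical constant: x = X0, y = Y0, z = X1, w = Y1, in [0,1] screen space*

struct VSOut
{
    float4 pos : SV_Position;
    float2 uv  : TEXCOORD0;
};

// drawn as a 4-vertex triangle strip with no vertex buffer bound
VSOut ScreenRectVS( uint id : SV_VertexID )
{
    VSOut o;
    float2 corner = float2( id & 1, id >> 1 );             // (0,0) (1,0) (0,1) (1,1)
    float2 screen = lerp( g_Rect.xy, g_Rect.zw, corner );  // interpolate between the rect corners
    o.pos = float4( screen.x * 2.0f - 1.0f, 1.0f - screen.y * 2.0f, 0.0f, 1.0f ); // to clip space, y flipped
    o.uv  = corner;
    return o;
}

Winding order and the culling state still need to match, and the coordinate convention note (*) at the bottom of the post applies here as well.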
The next step was to create a specific class for screen rects. I want to be able to push screen rects into a scene, so that building interfaces or postprocessors is easy.
When testing compatibility I tried both the multithreaded and singlethreaded pipelines, and they worked. Then I checked my D3D9 rendering subsystem and nothing was shown on screen.
It seems the subsystem doesn't correctly expose shader parameters. Yes, I haven't implemented it yet because of YAGNI!
Now I know what I'm going to implement today... ;)
* To make the post easier to understand I'm assuming the reference coordinate system has (0,0) at the top left corner and (1,1) at the bottom right, which of course is NOT the case with D3D.
Saturday, October 17, 2009
Multithreading, parameters update rate, etc.
Big news: the first (huge) step of the multithreaded architecture has been completed.
Pros:
- I can switch from a singlethreaded pipeline to a multithreaded one with one line of code.
- I can adjust the scene update speed. Even on the fly!
- The system is robust, error free and works well regardless of the pressure put on the CPU and/or the GPU.
- When the CPU is under heavy pressure, there's a tremendous improvement. Last tests show 250fps for the multithreaded version vs 60fps in singlethread.
Cons:
- the system doesn't scale linearly with N cores.
My main concern for the last month was the way the scene entities get updated over time, in particular which parts of the pipeline should be responsible for the updates.
I wanted a straightforward way to update parameters on a per-frame basis. The old one was quite hacky and probably not as fast as expected.
The problem I had to face was about multiple "reference times".
Some parameters should be updated every frame, some every N milliseconds, some at unknown times (think about network data), some are updated upon request from an external class.
All those different timings must coexist. You may ask: what does this have to do with rendering speed? As layers of abstraction and containers are added to cope with the different timings, the "distance" between the data needed and the class receiving the data widens.
Unless there is a "direct" way to map the correct data to the correct parameter, that distance is going to affect the rendering speed. How? Why? If the scene thread is responsible for updating data, why should the rendering thread be affected? Isn't that separation the main purpose of a multithreaded pipeline?
The answer to all those questions is: rendering speed is affected because some updates are performed by the rendering thread.
You can't update a shader parameter from the scene thread, as you would be changing it while the rendering thread is running. Even if a multithreaded device didn't slow everything down, synching something that is set so many times during a single rendering frame would be very bad, performance-wise.
The problem is that reducing the number of abstraction layers makes it impossible to keep multiple reference timings coherent, while increasing it widens the distance the rendering thread has to cover to get the data it needs.
The results are ok, as the rendering thread can render a static 50k poly model + 500 animated objects (each with 7 animated parameters) at a speed of approximately 250fps.
I'm fine with 3500 parameters updated per frame at 250fps+rendering, considering some of them can be grouped to increase speed. After all my notebook has a T5550 and an 8600M GT, definitely not the fastest hardware around.
I guess I should now try to scale linearly with N cores and in general work on job pools/graphs and schedulers, but I miss graphics stuff...
Friday, September 11, 2009
Multithreading: a small update
The multithreaded pipeline is almost working.
I still have to clean the code, remove hacks, change the way the frame data is handled and extensively test everything.
Thursday I turned off my pc at 10.15pm, so I didn't have too much time to play with the pipeline!
The first thing I did Friday morning was to change my singlethreaded pipeline so that it could run exactly like the multithreaded one does.
I prepared a simple test scene (the sponza atrium imported as separate meshes, plus 500 cubes, each with a unique VB/IB pair and each moving left/right with a simple sin(x) instruction) and started measuring rendering speed.
The first results in dx10 windowed mode were encouraging:
- singlethreaded 133 fps
- multithreaded 177 fps
This improvement comes only from the separation of the scene update thread and the rendering thread, both running at full speed.
Reducing the scene update time and performing interpolation widens the difference in windowed mode:
- singlethreaded 133 fps
- multithreaded 195 fps
The final performance figures for all configurations in fullscreen mode are:
- dx9 singlethreaded: 84 fps
- dx9 multithreaded: 104 fps
- dx10 singlethreaded: 170 fps
- dx10 multithreaded: 275 fps
The dx9 version is clearly limited by a GPU bottleneck. The trick speeding up dx10 seems to be that the vertex format/vertex shader linkage is the same for all meshes. While dx9 recalculates it whenever a shader/VB change is requested, in dx10 it is the user's responsibility to link them, so there's no state change when switching VBs.
Despite being GPU-limited, the dx9 version still shows an improvement of approximately 20-25% when using the multithreaded pipeline.
What's next?
As I said, I'd like to clean and extend the code, then I'll probably have a look at further parallelizing my pipeline. In particular I'd like to take a closer look at jobs/job pools/job graphs and schedulers. There's still room for improvement there.
Sunday, September 6, 2009
Life on the Other Side
I'm still here and the framework is getting better and better!
Except for a couple of trips to Isola d'Elba I've spent my summer working on different engine/framework designs.
The framework is the testbed for new things which could (or could not) find their way into the engine. As for the features I've been implementing in my framework, let me name two:
- DX10 support
- multithreaded rendering pipeline (work in progress)
As for D3D10, it took just a couple of days to implement the rendering subsystem. It's far from perfect, but the tests look promising. For simple scenes the speed is comparable to the one I get with the old D3D9 subsystem.
A more complex scene with hundreds of objects to be rendered is up to 3 times faster in D3D10!
Multithreading the framework took a lot of design time and 5 different implementations to reach a decent level of independence. ATM there's only separation between the scene thread and the rendering thread. The speed increase is approx 5-10%, but the scene thread is lightweight, so I expect major improvements once the entities feature complex behaviour.
On the scene side, I'm in the process of completing the design (and of course the implementation) of the entity updates. The ultimate goal is to be able to update the entities independently (in theory with pools containing N entities).
I don't expect to need this level of control over granularity, but I'd like the system to be flexible enough to easily support it.
The part I'm working on right now is, as I said, the way an entity can be updated.
The idea is to let an entity expose different parameter sets, then work only on those, so that the code is as reusable as possible.
From a multithreading POV, the parameter update processes could be grouped (think of a job pool) and executed independently, then the current scene gets rendered.
Thus the multithreading involves two different levels:
level 1: job pools -> scene
level 2: scene -> rendering pipeline
Hopefully next week I'll have the final design and implementation.
I still think it would be cool to write a series of posts about how the framework has been designed and implemented, but I'd like to complete the overall design before writing them.
Sunday, July 5, 2009
A small update: work and framework
I've spent the last two months working on a small framework, while at work I've entirely rewritten the scene import system.
The import system was implemented from scratch without any kind of design, and it got bigger and bigger as new features were added. In the end I found myself with a couple of huge classes, 400+ KB of code and memory leaks all around. WOW.
When I had to modify something I felt disgusted every time I opened those dirty files. Cleanly extending the import system was nearly impossible; you could only add more crap to it. Of course the system was supposed to be an emergency fix, but it stayed there for far too long. IMHO there were a couple of good ideas, but it was impossible to take advantage of them, as everything implemented around them was nothing but a huge mess. BTW, the new import system is up and running and performs well: faster, cleaner, more powerful, extendable and without memory leaks.
As for the framework, it works but some parts are still missing. I'm planning to start updating this blog more often and write a series of posts about the making of the framework.
Thinking about the last two months, it wasn't a good time, as it's awful to write 900KB of code without seeing anything exciting on the LCD panel.
I hope the fun part is going to start asap.
Friday, May 8, 2009
AA1: configuration done. Time to write a small framework.
I managed to properly configure the AA1 for development a couple of days after my previous post.
After downloading and burning VS2008 Express SP1 to a DVD (it's only 750+ MB, what a waste of space!), I made an image of the internal 8GB drive so I could revert to the previous configuration in case something went wrong.
I wasn't able to install VS2008 on my second drive (a 16GB SDHC card), since the operation isn't supported on removable drives. It seems there's a way to force the additional storage to be seen as a permanent disk, but that applies to the EEE and needs a device driver change. I went for drive C.
As for DirectX, I picked the August 2008 SDK (the one I use on my notebook) and installed it with minimal components.
VS2008 works reasonably well (as expected, compilation times aren't that great... I suppose the SD is the bottleneck), starts quite fast but takes some time to close.
I suggest disabling IntelliSense; there are a couple of ways to do it.
Change the dll filename located at:
\VC\vcpackages\feacp.dll
or disable it via macros.
The little baby still works and I can compile/run a simple piece of network code.
I've been busy at work, but last weekend I took my 4kb intro code and tried to optimize it further. The exe is now 27 bytes smaller and runs everywhere, while the old one had a lot of compatibility issues.
Speaking about work, hopefully I will be able to release new screenshots later this month. This should include an IOTD submission on gamedev.
I need to write a small framework for testing stuff, and I'd like it to work on an AA1.
I'm going to use it for a tiny project which should come with documentation about design choices and implementation details. I could use this blog to comment on my work daily, so when it's done I'll just have to copy-and-paste my blog entries.
I hope I'm going to update this blog more often. :)
Saturday, March 21, 2009
AA1: let's optimize!
I didn't expect to wait three months before posting something new here.
I've been so busy at work I had no spare time to spend on programming at home.
In February I had to set up a simple web server and decided to do it on my AA1. I installed and configured MySQL, Apache and PHP. It was weird, to say the least, to see that little baby run a website.
Recently I had the chance to relax for a couple of days and decided to remove the web server from my AA1 and see if I could fix my octree (see my previous post).
To sum up, here's the situation: 5fps without octree acceleration, 29 fps when enabling it (but with corrupted rendering, including missing parts of the scene).
I started looking at the mesh corruption and discovered some tests were missing when generating the octree. After fixing it, the reference model (sponza atrium) ran at 15-16 fps regardless of the resolution. That makes sense, as the previous framerate referred to a corrupted mesh (some parts were not drawn).
I tried different octree configurations (adjusting the minimum node size and the maximum number of primitives per node). As expected, when the entire model is visible the speed decreases to 10fps, going up to 40-50fps when looking at a boundary.
I was still not satisfied with the resulting performance and decided to dig into my code to check if there was some room for improvement. I noticed the code building primitives didn't take MinIndex and NumVertices into account.
In general it's not a problem to set those values to 0 and NumberOfVerticesInVB, as modern GPUs are quite tolerant and these parameters come from an older age.
In the case of the Intel 945GM, however, vertex shaders aren't implemented in HW but emulated by the CPU. Considering the AA1 is powered by a tiny 1.6GHz Atom, it's easy to imagine how such simple details can make a huge difference.
After some tests and tweaks now the final fps count for the octree-accelerated sponza atrium is:
- minimum 36 fps
- average 45 fps
- max 65 fps
Not bad for a 65-70k poly scene.
I'm looking for new ways to use my AA1. It's cool to read email, surf the web, write documents, use MSN and everything else you usually do with a netbook, but I'd like to do some (graphics) programming on it. Unluckily I can't use the framework/engine, as compiling a simple application based on it (in release and debug) produces more than 2GB of intermediate data. It takes almost half an hour to compile on a pretty decent dual-core desktop machine; I don't want to know how long it would take on the little baby.
That's why I took back from the grave (aka a 250GB 2.5" USB HDD) a small framework for 4kb intros. After recompiling it in VS2005 and tweaking the linker settings I managed to shave off an extra 23 bytes. Oh... and the thing now runs on my Vista-based notebook. Wow.
I'd like to install VS and the latest DirectX SDK, but I have some doubts. I'd like to go with VS2005, but VS2005 Express doesn't include the Windows SDK, and it doesn't sound like a good idea to install the entire Windows SDK on such a small SSD (or SD card). The same applies to DirectX: out of 900MB I barely need 100MB.
I'll probably save an image of the SSD, so that I can easily revert to my current configuration, then install VS2008 and try to manually configure the DX SDK by copying only the files I need and pointing the VS IDE at the correct directories. I wonder how the VS2008 compiler will perform compared to the one I'm currently using. My worst nightmare is that the /QIfist option, deprecated in VS2005, could have been removed from VS2008. I hope that's not the case.
Feel free to drop a comment if you have info about VS2008 code generation.
Friday, January 2, 2009
AA1: rocks! 2009: Rocks!
I haven't updated this blog since September; I hope I'll write more posts this year.
Actually I have 5-6 posts in a "draft" state, as they need to be polished before appearing here.
I recently got a white Acer Aspire One. This little baby has already undergone a series of heavy modifications:
- RAM increased to 1.5GB
- BIOS updated to rev 3309
- extra 16GB SD card
- internal SSD reformatted with 32KB clusters
- two installations of Windows XP, the one currently running being an nLited XP SP3
- created a 256MB ramdisk to store temporary files
- removed the virtual memory/pagefile.sys file
Boot time is approx 35-40 secs while shutdown takes 40-45.
I'm happy, as I bought it in a supermarket at a good price: 199€. The only problem with my configuration is the BIOS update. It seems some LCD panels have problems when brightness is set to minimum. Mine was fine, but Acer increased the minimum brightness in the 3309 BIOS rev, which results in shorter battery life. I knew about this BIOS problem, but I was forced to upgrade because the original version wasn't able to properly detect/use the additional 1GB of 667MHz DDR2 RAM. I know the baby takes 533MHz DDR2, but the only cheap memory available was clocked at 667MHz.
Hope Acer fixes this problem ASAP.
I've been able to launch a couple of simple apps I wrote. They run, but some of them are vertex shader limited. The problem is the Intel 945 doesn't support hardware vertex shaders; they are emulated in software. It's weird to see an application run at the same speed at 320x200 and at 1024x600.
I tried my sound streaming code, but the external SD card is slow. Of course, when using the ramdisk everything is fine. The problem isn't the sound itself, as I can stream it from SD without glitches. The problem is that the audio decoding thread takes too much time, so the rendering thread gets slower. Sounds like a good test case.
Rendering seems fine until I generate a huge octree (65-70k polys). In that case I get visual artifacts and the app slows down to 4-5 fps. If I render everything as single meshes, I get 29fps. When "octreeing" a 5k poly scene I get up to 250fps.
Skinning also works seamlessly, including animation mixing.
I spent the first day of 2009 sleeping. Kind of. The problem is I came back home at 11.30AM, too tired to work. :D
Happy new year!