Hardware Skinning for Irrlicht 1.7

christianclavet
Posts: 1638
Joined: Mon Apr 30, 2007 3:24 am
Location: Montreal, CANADA


Post by christianclavet »

(Edited 11/27/2010)
Hi, I've been asking for this feature, but it's not fully ready yet.

BlindSide created a vertex shader that does this and submitted it to the forum. The submitted version had memory leaks, which I fixed.
BlindSide used it only to check that it worked. Most of the code here is based on BlindSide's work, and the class instancing trick came from Andres. FMX removed the lighting from the test so we could really see the skinning performance, which was really nice. So now you can choose to test with or without lighting.

I've made it work in Irrlicht 1.7.2 (it should also work on 1.7.1), and then created separate files to use as an include, so we only need to issue a simple command to activate the feature on our skinned model.

So to use this with your own skinned meshes, you only need to do this:

1. Add the include:

#include "HardwareSkinCallback.h"
2. Set up the mesh. You only need to pass your device and your animated mesh to activate it (before the run loop):

HWSkinCB::getInstance()->setupNode(device,dwarf);
The current example now shows 36 dwarves to check performance on crowds, since that is mostly what I would use this feature for.

Current performance on my system (EVGA NVIDIA GTX 460, 2.66 GHz Intel quad-core, OS: Vista):
Drawing 36 dwarves (6 x 6)
OpenGL
494 FPS No vertex lighting + hardware skinning
170 FPS Vertex lighting + hardware skinning
110 FPS No lighting + software skinning
108 FPS Lighting + software skinning
303 FPS Skinning disabled

DirectX 9
585 FPS No vertex lighting + hardware skinning
353 FPS Vertex lighting + hardware skinning
121 FPS No lighting + software skinning
---- Lighting + software skinning not working (no lights on the model)
381 FPS Skinning disabled

Current limits:
8 lights
80 joints max for OpenGL
57 joints max for DirectX

[WARNING] - BlindSide's shader code seems to work only on GLSL 4.0 (as on my primary PC)! I tested it on my HTPC (Radeon 4350, Pentium, GLSL 3.3) and the shader gives 9 errors and fails to compile. I found this out after Lazerblade's message... We'll need to convert the shader to an older version of GLSL, as my current knowledge of shaders is insufficient for the task. Until then, the OpenGL version will only work on very recent cards... (I wonder how shadowslair managed to test OpenGL?!)

The example uses a 1/60 s refresh interval for the shader callback. The parameter is in milliseconds; you can disable the refresh interval (the constants are then updated every frame) by passing 0.

I've created an MSVC 2008 project with all the files (including the shaders). The shaders are HLSL for DirectX and GLSL for OpenGL. I've tested both and did not see memory leaks. The archive also contains a binary, so it will be easier to check.

This really takes advantage of your video card's shader processing power, but it would surely be pointless on an old card (from before the unified shader architecture) or a really weak one (integrated). If your system can't run a recent game, this will not give you a big improvement...

You can download it here (1.48 MB): http://www.clavet.org/files/Hardware_Skinning.zip (last update 26/11/2010)

Thanks a lot to BlindSide and FMX for providing the code and the shaders!
Last edited by christianclavet on Mon Nov 29, 2010 6:46 pm, edited 14 times in total.
shadowslair
Posts: 758
Joined: Mon Mar 31, 2008 3:32 pm
Location: Bulgaria

Post by shadowslair »

Haha!
With: 580 fps
Without: 830 fps
Conclusion: my hardware sucks! =/
:D

EDIT: And I wonder what's the benefit of the "getInstance()" method? What's the problem with calling "new" and keeping the pointer? Seems pointless to me. And I don't see where you drop the device ("createDevice") and the so precious HwSkinCallback pointer? :roll:
"Although we walk on the ground and step in the mud... our dreams and endeavors reach the immense skies..."
christianclavet

Post by christianclavet »

Hi, I improved the demo a little. Since it's about hardware skinning, I think we should see more of a difference with a batch of characters than with a single one. The class now also hints the mesh as dynamic (EHM_DYNAMIC), hoping to get more performance from the VBO.

shadowslair, "getInstance" lets me retrieve the instance of the class that is kept in a static pointer (I think it works as a cheap singleton). Using new would surely work too, but this was simpler for me.
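For anyone wondering what the "static pointer" approach looks like, here is a minimal sketch of the pattern, not the actual HWSkinCB class. The names `SkinCallbackSingleton`, `getInstance()`, `destroyInstance()` and `registerNode()` are hypothetical; `destroyInstance()` shows one way to release the shared pointer that the question about dropping it refers to. Note that a real Irrlicht shader callback derives from `IShaderConstantSetCallBack`, which is reference counted, so real cleanup would go through `drop()` rather than a plain `delete`.

```cpp
// Hypothetical stand-in for a shader callback class; not the real HWSkinCB.
class SkinCallbackSingleton
{
public:
    // Returns the single shared instance, creating it on first use.
    static SkinCallbackSingleton* getInstance()
    {
        if (!instance)
            instance = new SkinCallbackSingleton();
        return instance;
    }

    // Releases the shared instance; a later getInstance() creates a fresh one.
    static void destroyInstance()
    {
        delete instance;
        instance = nullptr;
    }

    void registerNode() { ++nodeCount; }
    int registeredNodes() const { return nodeCount; }

private:
    SkinCallbackSingleton() = default;
    // Copying would create a second instance, so it is disabled.
    SkinCallbackSingleton(const SkinCallbackSingleton&) = delete;
    SkinCallbackSingleton& operator=(const SkinCallbackSingleton&) = delete;

    int nodeCount = 0;
    static SkinCallbackSingleton* instance;
};

// Definition of the static pointer that holds the shared instance.
SkinCallbackSingleton* SkinCallbackSingleton::instance = nullptr;
```

The trade-off is the one raised above: every caller can reach the object without passing a pointer around, but someone still has to call the cleanup function once at shutdown.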

Thanks for your input; you were right, I was not dropping the device in main.cpp. I added it, but I doubt execution will ever get there (as it goes through an infinite loop). Anyway, it's always good to have it. I also updated the class a little: it now checks the mesh when it goes into the callback, clears the pointers (arrays) better, and has a destructor. I checked my memory again for any sign of leaks and did not see anything (1.56 GB used, and still 1.56 GB after 15 min). If you have ideas to propose, we could improve this further...

This should now also work in a subclass, as in this thread (if you use this with other shaders):
http://irrlicht.sourceforge.net/phpBB2/ ... hp?t=34253


I simply replaced the archive; it's about the same size. Check the link in the first post.

[EDIT] Did some tests here in OpenGL:
Non-animated mesh (all skinning disabled) = 255 FPS
Skinned mesh (hardware) = 155 FPS
Skinned mesh (software) = 103 FPS

I wonder if there is a way to get even more performance (I lost 100 FPS just by activating hardware skinning on the 36 dwarves).
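Since FPS is the inverse of frame time, FPS differences are easier to compare once converted to milliseconds per frame. A minimal sketch of that conversion applied to the OpenGL numbers above; `frameTimeMs` and `addedCostMs` are hypothetical helpers, and the per-dwarf figure in the note below assumes the extra cost is spread evenly over the 36 dwarves.

```cpp
// Converts frames per second into milliseconds per frame.
double frameTimeMs(double fps)
{
    return 1000.0 / fps;
}

// Extra milliseconds per frame when going from fastFps down to slowFps.
double addedCostMs(double fastFps, double slowFps)
{
    return frameTimeMs(slowFps) - frameTimeMs(fastFps);
}
```

With the numbers above, 255 FPS is about 3.92 ms per frame and 155 FPS about 6.45 ms, so hardware skinning adds roughly 2.5 ms per frame (about 0.07 ms per dwarf), while software skinning at 103 FPS (about 9.71 ms) adds roughly 5.8 ms.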

Also, is there a way to increase the maximum number of joints to at least 80? My test with a 3DS biped shows 71 joints (I had around 30 bones). There is a limit set inside the shader itself; right now the limit set by BlindSide is 55 joints.
Last edited by christianclavet on Thu Nov 25, 2010 10:50 pm, edited 5 times in total.
lazerblade
Posts: 194
Joined: Thu Mar 18, 2010 3:31 am

Post by lazerblade »

Yay!!!! :D

This saved me from switching rendering engines. :wink:
LazerBlade

When your mind is racing, make sure it's not racing in a circle.

3d game engine: http://sites.google.com/site/lazerbladegames/home/ray3d
lazerBlade blog: http://lazerbladegames.blogspot.com/
fmx

Post by fmx »

I had a quick look at this and spotted something you should be aware of:
there is no need to worry about 8 light sources in your implementation just yet!

The hardware-skinning shader callback has to send 8*3 + 8*4 = 56 floats to the shader for every model, every frame.
Not only are these constant (so they can be precomputed and passed to the shaders only once per frame), but they can also easily be compressed into shorts or bytes; there is no need for floats.

Commenting out the light-information lines immediately doubled the FPS of your hardware-skinning implementation on my system, which is good enough to impress even the most cynical of programmers :wink:

In HardwareSkinCallback.cpp

/*
		// Up to 8 lights; unused slots stay zero.
		// Fixed-size stack arrays: no heap allocation per callback.
		f32 lightPosArray[8 * 3] = {0};
		f32 lightColArray[8 * 4] = {0};

		IVideoDriver* driver = services->getVideoDriver();
		u32 lightCount = driver->getDynamicLightCount();
		if (lightCount > 8)
			lightCount = 8;

		for (u32 i = 0; i < lightCount; ++i)
		{
			const SLight& light = driver->getDynamicLight(i);

			lightPosArray[i * 3 + 0] = light.Position.X;
			lightPosArray[i * 3 + 1] = light.Position.Y;
			lightPosArray[i * 3 + 2] = light.Position.Z;

			lightColArray[i * 4 + 0] = light.DiffuseColor.r;
			lightColArray[i * 4 + 1] = light.DiffuseColor.g;
			lightColArray[i * 4 + 2] = light.DiffuseColor.b;
			lightColArray[i * 4 + 3] = light.DiffuseColor.a;
		}

		// Upload once, after all 8 slots are filled.
		services->setVertexShaderConstant("lightPosArray", lightPosArray, 8 * 3);
		services->setVertexShaderConstant("lightColorArray", lightColArray, 8 * 4);
*/
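The point that the light values are the same for every model could be sketched as a small cache that rebuilds the flat light arrays only when the frame number changes, so all 36 dwarves in one frame reuse the same copy. This is a minimal sketch assuming the caller supplies a frame counter; `LightInput`, `LightBlockCache` and `frameId` are hypothetical and not part of Irrlicht or the HWSkinCB code. The upload itself would still happen per model here; only the packing work is shared.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical light description: position + RGBA diffuse color.
struct LightInput
{
    float pos[3];
    float color[4];
};

// Rebuilds the flat constant arrays at most once per frame.
class LightBlockCache
{
public:
    static const std::size_t kMaxLights = 8;

    // Returns true if the arrays were rebuilt during this call.
    bool update(unsigned frameId, const std::vector<LightInput>& lights)
    {
        if (hasData && frameId == lastFrame)
            return false; // same frame: reuse arrays built for an earlier model

        for (std::size_t i = 0; i < kMaxLights; ++i)
        {
            const bool used = i < lights.size();
            for (std::size_t k = 0; k < 3; ++k)
                positions[i * 3 + k] = used ? lights[i].pos[k] : 0.0f;
            for (std::size_t k = 0; k < 4; ++k)
                colors[i * 4 + k] = used ? lights[i].color[k] : 0.0f;
        }
        lastFrame = frameId;
        hasData = true;
        ++rebuilds;
        return true;
    }

    float positions[kMaxLights * 3] = {};
    float colors[kMaxLights * 4] = {};
    unsigned rebuilds = 0;

private:
    unsigned lastFrame = 0;
    bool hasData = false;
};
```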
Bear_130278
Posts: 237
Joined: Mon Jan 16, 2006 1:18 pm
Location: Odessa,Russian Federation

Post by Bear_130278 »

With: 23 fps
Without: 47 fps
Lol 8))
Do you like VODKA???
Virion
Competition winner
Posts: 2148
Joined: Mon Dec 18, 2006 5:04 am

Post by Virion »

Can someone share a compiled binary? I'm too lazy to compile, but I would like to do the testing.
fmx

Post by fmx »

After making some changes, I get:

DX
without HWS: 108 fps
with HWS: 280 fps

GL
without HWS: 106 fps
with HWS: 402 fps

I commented out all lighting calculation code in the source and shaders, and set GL to use VS_2 instead of VS_3
Last edited by fmx on Mon Nov 29, 2010 4:39 pm, edited 1 time in total.
christianclavet

Post by christianclavet »

Thanks, FMX.

OK. The info there seemed mostly aimed at DirectX; I haven't seen it applied in OpenGL. A more flexible vertex format would surely handle both vertex lighting and skinning. Thanks a lot for explaining!

Have you found a way to increase the joint count now that we're not using lighting anymore? I would need at least 71 joints (3DS Max base Biped), and the default max is 55.

With the current changes there is no vertex lighting, but my frame rate has gone up to 370 FPS with hardware skinning (GL) and 110 FPS without. That's a major increase!

Next, I'd like to try limiting the constant refresh to once every 1/60 s with a timer check and see if it goes even faster. :twisted:

Thanks for sharing!

[EDIT]
Wow! It worked!! :)
Refreshing the constants at 60 FPS increased the rendering speed to 535 FPS! The shader constants are now refreshed at a 60 FPS interval while the rendering speed is boosted. I don't see any slowdown or lag in the animation (why would we need animations to refresh faster than 60 FPS anyway?).

I added your changes to the code and also added a new parameter when setting up the mesh: we can now define the refresh interval for updating the shader constants. In the example (main.cpp), I set the dwarf's refresh to 60 FPS by passing 17 ms as the parameter.
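The timer-based refresh can be sketched as a small throttle: the constants are re-sent only when at least `intervalMs` milliseconds have passed since the last refresh, and an interval of 0 refreshes on every call. This is a minimal sketch; `RefreshThrottle`, `shouldRefresh` and `setInterval` are hypothetical names, and the current time is passed in so the logic stays testable (in Irrlicht it could come from `device->getTimer()->getTime()`). 17 ms is 1000/60 ≈ 16.7 ms rounded up.

```cpp
#include <cstdint>

class RefreshThrottle
{
public:
    // intervalMs == 0 disables throttling (refresh on every call).
    explicit RefreshThrottle(std::uint32_t intervalMs) : interval(intervalMs) {}

    // Returns true when the shader constants should be re-sent at time nowMs.
    bool shouldRefresh(std::uint32_t nowMs)
    {
        // Unsigned subtraction keeps working when the millisecond timer wraps.
        if (interval == 0 || !started || nowMs - lastRefresh >= interval)
        {
            lastRefresh = nowMs;
            started = true;
            return true;
        }
        return false;
    }

    // Lets the interval change after setup, e.g. for distant characters.
    void setInterval(std::uint32_t intervalMs) { interval = intervalMs; }

private:
    std::uint32_t interval;
    std::uint32_t lastRefresh = 0;
    bool started = false;
};
```

The time check costs one subtraction and comparison per call, which is cheap next to uploading a full set of joint matrices.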

This could be useful, for example, if you have far-away characters: you could lower their animation refresh rate even further to gain more FPS (perhaps by adding another command to change the refresh interval once the mesh is set).
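The distance idea could look like a function that picks a refresh interval from the distance to the camera. The thresholds and intervals below are made-up values for illustration, and `refreshIntervalForDistance` is a hypothetical helper; its result would be passed to whatever command changes the refresh interval once the mesh is set.

```cpp
#include <cassert>
#include <cstdint>

// Picks a shader-constant refresh interval (ms) from the distance to the camera.
// Thresholds are illustrative only; tune them for your scene scale.
std::uint32_t refreshIntervalForDistance(float distance)
{
    assert(distance >= 0.0f);
    if (distance < 200.0f)
        return 17;  // near: about 60 updates per second
    if (distance < 600.0f)
        return 33;  // mid: about 30 updates per second
    return 67;      // far: about 15 updates per second
}
```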

The archive has also been updated with a binary, so you won't have to recompile it to check it out. See the first post. And thanks again, FMX!
[EDIT] Found a small problem in DX, fixed it, and updated the archive again. If you test DX and have a problem, download the updated archive.
macron12388
Posts: 126
Joined: Wed Sep 29, 2010 8:23 pm

Post by macron12388 »

That is a good increase in performance! Yet, like you said, why would we need something that refreshes that fast? I doubt many people have monitors that refresh at more than 60/70 Hz. Still, that pool of extra performance is nice to have: you can draw on it to add other resource-expensive features and still stay at 60 FPS.
shadowslair
Posts: 758
Joined: Mon Mar 31, 2008 3:32 pm
Location: Bulgaria

Post by shadowslair »

There must be something wrong with the latest version, christianclavet. With fmx's version I get:
without 37 -> 84 with
with yours I get:
without 37 -> 39 with
:D
christianclavet

Post by christianclavet »

Hi, shadowslair.

It's probably the timer check: on your 'little card', the call to check the timer eats too much FPS. Is it an integrated card? What type of video card do you have? If you can recompile, can you set the refresh to 0 ms to see if it improves? (This is in main.cpp, where you set up the mesh.)

I've updated the version again, so we can enable vertex lighting again. Just set the material flag EMF_LIGHTING to true and the shader will use lighting again. When you launch the example, you now have a choice to test it with or without (vertex) lighting. I had to create a separate shader for GL because the boolean check was eating a lot of FPS (I lost almost 40 FPS just checking whether lighting was enabled). I did not lose any FPS using separate shaders.

I've added another test option: skinning completely disabled (neither software nor hardware). That should give an idea of the performance if the meshes were completely static. With that option, the FPS was lower than with hardware skinning activated... (250 FPS, versus 500+ with hardware skinning)

For some reason, the GL shader seems to support more joints than the DX shader. I was able to set the max to 80 joints (tested with my Biped mesh, and the skinning was OK). It still runs with DX, but the skinning is affected (the max I was able to reach there is 57 joints).
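A back-of-the-envelope way to reason about the joint limit: a joint uploaded as a 4x4 float matrix occupies four 4-component constant registers, so the limit is roughly the register budget left after the other constants, divided by the registers per joint. This is a sketch under assumptions: 256 is the minimum number of vertex shader float constant registers guaranteed by Direct3D 9 shader models 2.0 and 3.0, and the reserved count of 28 in the test is a placeholder chosen only so the arithmetic lands on the observed 57, not a measurement of the actual shaders. GLSL reports its own, driver-dependent budget in float components (`GL_MAX_VERTEX_UNIFORM_COMPONENTS`), which is one plausible reason the GL shader accepts more joints. `maxJoints` and `componentsToVec4` are hypothetical helpers.

```cpp
#include <cassert>

// Estimates how many joint matrices fit in a vertex shader's constant budget.
// registersPerJoint is 4 for a full 4x4 float matrix.
int maxJoints(int totalVec4Registers, int reservedVec4Registers, int registersPerJoint)
{
    assert(registersPerJoint > 0);
    assert(totalVec4Registers >= reservedVec4Registers);
    return (totalVec4Registers - reservedVec4Registers) / registersPerJoint;
}

// Converts a GLSL uniform budget given in float components into vec4 registers.
int componentsToVec4(int components)
{
    return components / 4;
}
```

Uploading each joint as a 4x3 matrix (three registers, with the constant last row rebuilt in the shader) is a common way to stretch the same budget further.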
Virion

Post by Virion »

thx for the binary.

DX, with: 145
DX, without: 110

OGL, with: 340
OGL, without: 100
shadowslair
Posts: 758
Joined: Mon Mar 31, 2008 3:32 pm
Location: Bulgaria

Post by shadowslair »

christianclavet wrote:Hi, Shadowlair. Surely, the timer check that take too much time on your 'little card' the call to check the timer eat too much fps. Do you have something integrated? What is your type of video card? If you can recompile can you set the refresh to 0ms to see if it improve? (This is in main.cpp, when you set the mesh)
Hi. The card is a Radeon 9550 =). Too late to compile it right now; I just wondered what the reason was. Maybe I'll compile it later. :wink:
christianclavet

Post by christianclavet »

Yep! That explains everything. That card was released in 2004! It has separate vertex and pixel shader pipelines, not the unified architecture of the new cards.

Here are some of the specs from the AMD site:
4 parallel pixel pipelines
2 programmable vertex shader pipelines
128-bit dual-channel DDR memory interface

DirectX® 9.0 Vertex Shaders
Vertex programs up to 65,280 instructions with flow control
DirectX® 9.0 Pixel Shaders
Up to 1,536 instructions and 16 textures per rendering pass
32 temporary and constant registers
128-bit, 64-bit & 32-bit per pixel floating point color formats
Multiple Render Target (MRT) support
Complete feature set also supported in OpenGL® via extensions
I would not recommend using this on that type of card. I was even surprised you got better performance! I doubt it could run Crysis! :lol:
With a refresh of 0 ms, this card should get close to FMX's result (almost the same, except for the timer check). Nice to know it can improve performance even on these old cards!