SSE vector3df and matrix4

Simpe
Posts: 13
Joined: Tue Nov 30, 2010 12:54 pm

Post by Simpe »

... Or in a place where you can perform multiple vector operations at once, such as a brute-force culling algorithm.

However, I'm assuming that you've been testing on x86/x64, which are out-of-order CPUs, and they can be pretty awesome at times at swallowing "bad code". If you compare that to an in-order CPU (Xbox 360/PS3/mobile devices etc.), you'll get completely different results. If Irrlicht is meant to be used on non-x86/x64 platforms, SSE (or the platform's SIMD equivalent) should definitely be implemented for performance.

Or if you're going to evaluate something like SSE, at least try multiple platforms, specifically platforms that benefit from it.
Last edited by Simpe on Wed Dec 15, 2010 1:04 pm, edited 1 time in total.
fmx

Post by fmx »

Simpe, you have a point, but remember that most uses of Irrlicht are currently on desktop systems which don't use out-of-order CPUs.

I think more tests should be done before this is outright rejected.
I'm developing a custom GLES 2.0 based renderer for my iOS engine (iPhones, etc.) and I make use of many base Irrlicht types, including Vector2D, Vector3D and Matrix4.

devsh, if you can post your SSE versions of these, then I can benchmark the performance of your changes on my iPhone 4.
devsh
Competition winner
Posts: 2057
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK

Post by devsh »

I have posted my matrix class, but the functions are incomplete. Either just copy them from the class and use pointer arithmetic to treat the 4 __m128 like 16 floats, or complete the SSE implementation; the reference is pretty easy to follow. SSE is most useful with matrices, so I'd start with that.

You will have to test combinations of SSE functions against normal functions. In my tests, stuff like multiplying by another matrix, assignment from a scalar, and especially computing the inverse was always clearly faster, and assignment of 4 __m128 is as fast as memcpy or faster. Other stuff I found slower, like some assignments (identity), and transposing can be slower with -O3 and -ffast-math.
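For anyone wanting to run that kind of comparison, here is a minimal sketch (not the posted matrix class) of a scalar 4x4 multiply next to an SSE one, so the two can be checked against each other before timing them. Mat4, mulScalar, mulSSE and maxAbsDiff are hypothetical names made up for this example; it assumes a row-major layout, a C++11 compiler and an x86/x64 target.

```cpp
#include <xmmintrin.h>  // SSE intrinsics; x86/x64 only
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for a 4x4 float matrix: row-major, 16-byte aligned.
struct alignas(16) Mat4 {
    float m[16];
};

// Plain scalar reference: out = a * b.
Mat4 mulScalar(const Mat4& a, const Mat4& b) {
    Mat4 out;
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a.m[r * 4 + k] * b.m[k * 4 + c];
            out.m[r * 4 + c] = sum;
        }
    }
    return out;
}

// SSE version: every output row is a linear combination of the rows of b,
// weighted by the four entries of the matching row of a.
Mat4 mulSSE(const Mat4& a, const Mat4& b) {
    // _mm_load_ps / _mm_store_ps require 16-byte aligned addresses.
    assert(reinterpret_cast<std::uintptr_t>(b.m) % 16 == 0);
    const __m128 b0 = _mm_load_ps(b.m + 0);
    const __m128 b1 = _mm_load_ps(b.m + 4);
    const __m128 b2 = _mm_load_ps(b.m + 8);
    const __m128 b3 = _mm_load_ps(b.m + 12);

    Mat4 out;
    for (int r = 0; r < 4; ++r) {
        // Broadcast one scalar of a into all four lanes, then scale a row of b.
        __m128 acc = _mm_mul_ps(_mm_set1_ps(a.m[r * 4 + 0]), b0);
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(a.m[r * 4 + 1]), b1));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(a.m[r * 4 + 2]), b2));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(a.m[r * 4 + 3]), b3));
        _mm_store_ps(out.m + r * 4, acc);
    }
    return out;
}

// Largest absolute element-wise difference between two matrices.
float maxAbsDiff(const Mat4& x, const Mat4& y) {
    float d = 0.0f;
    for (int i = 0; i < 16; ++i)
        d = std::fmax(d, std::fabs(x.m[i] - y.m[i]));
    return d;
}
```

Once maxAbsDiff(mulScalar(a, b), mulSSE(a, b)) stays near zero for your inputs, wrap each call in a timing loop and compare them with the optimization flags you actually ship, since the results can change with flags like -O3 and -ffast-math.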
Simpe
Posts: 13
Joined: Tue Nov 30, 2010 12:54 pm

Post by Simpe »

fmx wrote: Simpe, you have a point, but remember that most uses of Irrlicht are currently on desktop systems which don't use out-of-order CPUs.
I think you meant it the other way around (most desktop systems are out-of-order execution CPUs)... possibly because I typoed it in my post :P

But yeah, most Irrlicht users run on desktop systems, but for those who don't I'd say that something like this is extremely important since it makes a huge difference in performance. Just like vcalls do on in-order machines ;)
fmx

Post by fmx »

:oops: I honestly had no idea. My experience is limited to consumer desktop PCs and iPhones; I still have a lot to learn about other hardware and differences in CPU architecture.
BlindSide
Admin
Posts: 2821
Joined: Thu Dec 08, 2005 9:09 am
Location: NZ!

Post by BlindSide »

You're gonna need to use SOA (structure-of-arrays) form if you want a decent speed boost, and that would require rewriting a lot of the algorithms in Irrlicht.

One trick I found useful is to use a SOA3Vector class which holds four 3-dimensional vectors and performs all the ordinary operations on 4 vectors at once, so as long as you always have a long list of data to perform operations on, you should be fine :)
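A rough sketch of what such a class could look like, assuming SSE on x86/x64. Only the name SOA3Vector comes from the post above; the members, fromArray, dot and lane are hypothetical and made up for illustration.

```cpp
#include <xmmintrin.h>  // SSE intrinsics; x86/x64 only
#include <cassert>

// Hypothetical SoA container: four 3D vectors stored as one register of
// x components, one of y components and one of z components.
struct SOA3Vector {
    __m128 x, y, z;

    // Pack four separate (x, y, z) triples; v[i] is the i-th vector.
    static SOA3Vector fromArray(const float v[4][3]) {
        SOA3Vector r;
        r.x = _mm_setr_ps(v[0][0], v[1][0], v[2][0], v[3][0]);
        r.y = _mm_setr_ps(v[0][1], v[1][1], v[2][1], v[3][1]);
        r.z = _mm_setr_ps(v[0][2], v[1][2], v[2][2], v[3][2]);
        return r;
    }

    // Four vector additions with three instructions.
    SOA3Vector operator+(const SOA3Vector& o) const {
        SOA3Vector r;
        r.x = _mm_add_ps(x, o.x);
        r.y = _mm_add_ps(y, o.y);
        r.z = _mm_add_ps(z, o.z);
        return r;
    }

    // Four dot products at once; lane i holds dot(this[i], o[i]).
    __m128 dot(const SOA3Vector& o) const {
        __m128 d = _mm_mul_ps(x, o.x);
        d = _mm_add_ps(d, _mm_mul_ps(y, o.y));
        d = _mm_add_ps(d, _mm_mul_ps(z, o.z));
        return d;
    }
};

// Read a single lane (0..3) out of a register, for checking results.
inline float lane(__m128 v, int i) {
    assert(i >= 0 && i < 4);
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[i];
}
```

The point of the layout is that no shuffles or horizontal adds are needed: every lane does the same work on a different vector, which is why it only pays off when there is a long list of vectors to feed it.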
devsh
Competition winner
Posts: 2057
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK

Re: SSE vector3df and matrix4

Post by devsh »

I'm resurrecting this effort because the CPU is dragging down the performance of Build a World.

The previous code I made is completely unusable because it's not proper SSE.

This time the vector3d, matrix4, rect2d and aabbox classes will all be 16-byte aligned and padded to 4 floats, even on normal assignment or variable declaration.

We'll provide an aligned16 call which will work on both Windows and Linux, as well as _SSSE3_ #ifdefs and #elses, so that Irrlicht can be compiled without those.

We'll release the whole thing when it's done and OpenGL 3.2 compliance is in.

Our Irrlicht is always merged with the latest stable version (1.8 now, but merging with 1.8.1).
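To make the alignment and #ifdef idea concrete, here is a minimal sketch under stated assumptions; it is not the code from this effort. ALIGN16, USE_SSE, Vec3Padded, isAligned16 and add are hypothetical names, and the sketch keys off the compiler-defined __SSE__ (GCC/Clang) and _M_X64 / _M_IX86_FP (MSVC) macros rather than the _SSSE3_ guard mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portability macro: 16-byte alignment spelled for MSVC
// (Windows) and for GCC/Clang (Linux and others).
#if defined(_MSC_VER)
#  define ALIGN16 __declspec(align(16))
#else
#  define ALIGN16 __attribute__((aligned(16)))
#endif

// Assumption: use the compiler's own SSE feature macros as the switch,
// so the same source still builds on targets without SSE.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  include <xmmintrin.h>
#  define USE_SSE 1
#else
#  define USE_SSE 0
#endif

// A 3D vector padded to 4 floats; w carries no meaning, it only pads.
struct ALIGN16 Vec3Padded {
    float x, y, z, w;
};

static_assert(sizeof(Vec3Padded) == 16, "must be padded to 4 floats");
static_assert(alignof(Vec3Padded) == 16, "must be 16-byte aligned");

inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Component-wise add with an SSE path and a scalar fallback.
inline Vec3Padded add(const Vec3Padded& a, const Vec3Padded& b) {
    Vec3Padded r;
#if USE_SSE
    // Aligned loads and stores fault on misaligned addresses.
    assert(isAligned16(&a) && isAligned16(&b));
    const __m128 va = _mm_load_ps(reinterpret_cast<const float*>(&a));
    const __m128 vb = _mm_load_ps(reinterpret_cast<const float*>(&b));
    _mm_store_ps(reinterpret_cast<float*>(&r), _mm_add_ps(va, vb));
#else
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    r.z = a.z + b.z;
    r.w = a.w + b.w;
#endif
    return r;
}
```

Note that the type-level alignment only covers stack and static objects automatically; memory that comes from plain malloc or pre-C++17 operator new is not guaranteed to be 16-byte aligned on every platform, which is presumably where a dedicated aligned allocation call comes in.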
Granyte
Posts: 850
Joined: Tue Jan 25, 2011 11:07 pm

Re: SSE vector3df and matrix4

Post by Granyte »

I have been working on an SSE implementation of matrix4; it's been able to make my FPS go up 40% in some of my math-heavy situations.
So far only the matrix work has been done, as padding the vector to 4 components ended poorly. Have you gotten it to work?
devsh
Competition winner
Posts: 2057
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK

Re: SSE vector3df and matrix4

Post by devsh »

So far, postponed until some game features are in and the OGL 3.2 core context gets sorted.
devsh
Competition winner
Posts: 2057
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK

Re: SSE vector3df and matrix4

Post by devsh »

Only just sorted out the proper implementation... First I'm going to make/publish the classes, and then you need to change the actual type of the matrix etc.

head over to http://irrlicht.sourceforge.net/forum/v ... =9&t=50230