TODO in quaternion.h

Squareys
Posts: 18
Joined: Tue Mar 11, 2014 8:09 pm
Location: Konstanz, Germany

Post by Squareys »

Hello all!

I just stumbled upon this TODO code on line 242 in quaternion.h:

 
        const f32 scale = sqrtf(diag) * 2.0f; // get scale from diagonal
 
        // TODO: speed this up
        X = (m[6] - m[9]) / scale;
        Y = (m[8] - m[2]) / scale;
        Z = (m[1] - m[4]) / scale;
        W = 0.25f * scale;
 
My suggestion is simply to precompute the inverse of the scale and multiply instead of dividing:

 
        const f32 scale = (sqrtf(diag) * 2.0f); // get scale from diagonal
        const f32 scaleinv = 1.0f / scale; // inverse of scale for speedup
 
        X = (m[6] - m[9]) * scaleinv;
        Y = (m[8] - m[2]) * scaleinv;
        Z = (m[1] - m[4]) * scaleinv;
        W = 0.25f * scale; // f suffix keeps this in f32, as in the original
 
The following three TODOs are analogous to this one, except that they only need scaleinv. Does anybody have further suggestions?
Looking forward to what you come up with.

Greetings!
Squareys


PS: I'm wondering if inline assembly would be overdoing it/help at all. Maybe somebody more experienced with this topic
may want to comment on that.
Squareys @ facebook > https://www.facebook.com/Squareys
Squareys @ twitter > https://twitter.com/squareys

VhiteRabbit @ facebook > https://facebook.com/vhiterabbit
VhiteRabbit @ twitter > https://twitter.com/vhiterabbitvr
hendu
Posts: 2600
Joined: Sat Dec 18, 2010 12:53 pm

Re: TODO in quaternion.h

Post by hendu »

That causes a precision loss, which may matter when playing with quaternions.
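
To make the concern concrete, here is a small standalone sketch. It uses plain `float` instead of Irrlicht's `f32`, and the helper names `divide`, `viaReciprocal` and `printComparison` are made up for illustration. A division is rounded once; multiplying by a precomputed reciprocal rounds twice (once for `1/s`, once for the product), so the two results can differ in the last bit:

```cpp
#include <cassert>
#include <cstdio>

// Direct division: the quotient is rounded once.
float divide(float x, float s)
{
    assert(s != 0.0f);
    return x / s;
}

// Reciprocal trick: 1/s is rounded, then the product is rounded again.
float viaReciprocal(float x, float s)
{
    assert(s != 0.0f);
    const float inv = 1.0f / s;
    return x * inv;
}

// Prints both results with enough digits to tell adjacent floats apart.
void printComparison(float x, float s)
{
    std::printf("%g / %g     = %.9g\n", x, s, divide(x, s));
    std::printf("%g * (1/%g) = %.9g\n", x, s, viaReciprocal(x, s));
}
```

Assuming strict single-precision arithmetic (for example SSE on x86-64), `printComparison(5.0f, 3.0f)` should show two values that differ in the last place, while `printComparison(8.0f, 2.0f)` agrees because 1/2 is exactly representable. On a build that keeps temporaries in wider registers the two may agree more often.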
REDDemon
Developer
Posts: 1044
Joined: Tue Aug 31, 2010 8:06 pm
Location: Genova (Italy)

Re: TODO in quaternion.h

Post by REDDemon »

At least on x86 you'll get greater precision by using temporary f64 values. (The arithmetic operations take the same time; the only instructions that change are the loads and stores between registers and cache. Mobile is another story, of course: I don't know which instructions Android phones etc. have.)

You can also skip the multiplication by 2 (which is an extra instruction even if the compiler decides to optimize it).

 
        const f64 scale = f64(sqrtf(diag)); // get scale from diagonal
        const f64 scaleinv = 0.5 / scale; // inverse of the doubled scale, for speedup
 
        X = f32(  (f64(m[6]) - f64(m[9])) * scaleinv  );
        Y = f32(  (f64(m[8]) - f64(m[2])) * scaleinv  );
        Z = f32(  (f64(m[1]) - f64(m[4])) * scaleinv  );
        W = f32( 0.50 * scale );
 
To profile against this variant, which keeps only the inverse:

 
        const f64 scaleinv = 0.5 /  f64(sqrtf(diag)); 
 
        X = f32(  (f64(m[6]) - f64(m[9])) * scaleinv  );
        Y = f32(  (f64(m[8]) - f64(m[2])) * scaleinv  );
        Z = f32(  (f64(m[1]) - f64(m[4])) * scaleinv  );
        W = f32( 0.25 / scaleinv );
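
One way to set up that comparison is sketched below. It is self-contained rather than tied to Irrlicht: `f32`/`f64` are local typedefs, `Quat`, `variantA`, `variantB` and `timeVariant` are hypothetical names, and `diag` and the matrix `m` are simply passed in. Timings from a loop like this are only indicative, since the optimizer and the surrounding code matter a lot.

```cpp
#include <cassert>
#include <chrono>
#include <math.h>

typedef float  f32;
typedef double f64;

struct Quat { f32 X, Y, Z, W; };

// First snippet: keep both the scale and its (halved) inverse.
Quat variantA(f32 diag, const f32* m)
{
    assert(diag > 0.0f); // this code path is only meant for a positive diagonal
    const f64 scale = f64(sqrtf(diag));
    const f64 scaleinv = 0.5 / scale;
    Quat q;
    q.X = f32((f64(m[6]) - f64(m[9])) * scaleinv);
    q.Y = f32((f64(m[8]) - f64(m[2])) * scaleinv);
    q.Z = f32((f64(m[1]) - f64(m[4])) * scaleinv);
    q.W = f32(0.50 * scale);
    return q;
}

// Second snippet: keep only the inverse; W costs one extra division.
Quat variantB(f32 diag, const f32* m)
{
    assert(diag > 0.0f);
    const f64 scaleinv = 0.5 / f64(sqrtf(diag));
    Quat q;
    q.X = f32((f64(m[6]) - f64(m[9])) * scaleinv);
    q.Y = f32((f64(m[8]) - f64(m[2])) * scaleinv);
    q.Z = f32((f64(m[1]) - f64(m[4])) * scaleinv);
    q.W = f32(0.25 / scaleinv);
    return q;
}

// Calls fn `iterations` times and returns the elapsed seconds.
// Results are accumulated into `sink` so the loop is not optimized away.
template <typename Fn>
double timeVariant(Fn fn, const f32* m, int iterations, f32& sink)
{
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        const f32 diag = 1.0f + f32(i % 100) * 0.01f; // vary the input slightly
        const Quat q = fn(diag, m);
        sink += q.X + q.Y + q.Z + q.W;
    }
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling `timeVariant(variantA, m, n, sink)` and `timeVariant(variantB, m, n, sink)` with the same 16-element matrix and a large `n` gives two numbers to compare; comparing the returned `Quat` values against a higher-precision reference would cover the precision side.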
 
 
The compiler can't do this kind of optimization on its own (unless you explicitly opt in to relaxed floating-point rules), because it changes how results are rounded and could break a ton of code.

At some point Irrlicht switched its project optimization settings, which at least for GCC meant going from the x87 extended-precision registers to SSE registers (which work at plain 32- and 64-bit precision, even for scalar, non-vectorized code). That could potentially have reduced precision for all users, in all places where temporary f32 values were used, because a float temporary held in an x87 register has 80 bits O_O.
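
Whether a particular build evaluates `float` temporaries at higher precision can be checked with the standard `FLT_EVAL_METHOD` macro from `<cfloat>` (C++11). A minimal sketch; the helper names `describeEvalMethod` and `printEvalMethod` are made up, and the "typical for" remarks in the strings are rules of thumb rather than guarantees:

```cpp
#include <cfloat>
#include <cstdio>

// Maps the standard FLT_EVAL_METHOD values to a readable description.
const char* describeEvalMethod(int method)
{
    switch (method)
    {
    case 0:  return "each operation is evaluated in its own type (typical for SSE)";
    case 1:  return "float and double are evaluated as double";
    case 2:  return "everything is evaluated as long double (typical for x87)";
    default: return "indeterminable or implementation-defined";
    }
}

// Reports the evaluation method of the current build.
void printEvalMethod()
{
    const int method = int(FLT_EVAL_METHOD);
    std::printf("FLT_EVAL_METHOD = %d: %s\n", method, describeEvalMethod(method));
}
```

Running `printEvalMethod()` once per build configuration shows whether changing compiler flags also changed the precision of intermediate results.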

Someone should take the actual Irrlicht code and all the snippets in this thread, and profile both performance and precision :D
Junior Irrlicht Developer.
Real value in social networks is not about "increasing" number of followers, but about getting in touch with Amazing people.
- by Me