TODO in quaternion.h

Squareys · Post by **Squareys** » Mon Sep 08, 2014 3:05 pm

Hello all!

I just stumbled upon this TODO code on line 242 in quaternion.h:

 
        const f32 scale = sqrtf(diag) * 2.0f; // get scale from diagonal
 
        // TODO: speed this up
        X = (m[6] - m[9]) / scale;
        Y = (m[8] - m[2]) / scale;
        Z = (m[1] - m[4]) / scale;
        W = 0.25f * scale;

My suggestion is simply:


 
        const f32 scale = sqrtf(diag) * 2.0f; // get scale from diagonal
        const f32 scaleinv = 1.0f / scale; // inverse of scale for speedup
 
        X = (m[6] - m[9]) * scaleinv;
        Y = (m[8] - m[2]) * scaleinv;
        Z = (m[1] - m[4]) * scaleinv;
        W = 0.25f * scale; // f32 literal, so the product is not promoted to double

The following three TODOs are analogous to this one, except that they only need scaleinv. Does anybody have further suggestions?
Looking forward to what you come up with.

Greetings!
Squareys

PS: I'm wondering whether inline assembly would help at all or would be overdoing it. Perhaps somebody more experienced with this topic
would like to comment on that.

hendu · Post by **hendu** » Mon Sep 08, 2014 7:31 pm

That causes a precision loss, which may matter when playing with quaternions.

REDDemon · Post by **REDDemon** » Fri Sep 12, 2014 1:30 am

At least on x86, you'll get greater precision by using temporary f64 values. (The arithmetic operations take the same time; the only instructions that change are the loads and stores between registers and cache. Mobile is another story, of course: I don't know which instructions Android phones etc. have.)

Also, you can skip the multiplication by 2 (which is an extra instruction, even if the compiler decides to optimize it).


 
        const f64 scale = f64(sqrtf(diag)); // get scale from diagonal
        const f64 scaleinv = 0.5 / scale; // inverse of scale for speedup
 
        X = f32(  (f64(m[6]) - f64(m[9])) * scaleinv  );
        Y = f32(  (f64(m[8]) - f64(m[2])) * scaleinv  );
        Z = f32(  (f64(m[1]) - f64(m[4])) * scaleinv  );
        W = f32( 0.50 * scale );

To profile against:


 
        const f64 scaleinv = 0.5 / f64(sqrtf(diag));
 
        X = f32(  (f64(m[6]) - f64(m[9])) * scaleinv  );
        Y = f32(  (f64(m[8]) - f64(m[2])) * scaleinv  );
        Z = f32(  (f64(m[1]) - f64(m[4])) * scaleinv  );
        W = f32( 0.25 / scaleinv );

The compiler can't do these kinds of optimizations on its own, because they could break a ton of code.
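A single concrete case shows why the compiler has to keep the division: replacing it with a reciprocal multiply can change the rounded result. The function name below is hypothetical. The sketch assumes IEEE 754 single precision, and `volatile` is only there to force each intermediate to be stored as a 32-bit float.

```cpp
// Returns true when x / s and x * (1 / s) round to different f32 values.
// Hypothetical helper for illustration; not part of Irrlicht.
bool reciprocalChangesResult(float x, float s) {
    volatile float quotient = x / s;       // correctly rounded quotient
    volatile float inverse = 1.0f / s;     // reciprocal, rounded to f32
    volatile float product = x * inverse;  // second rounding on top of the first
    return quotient != product;
}
```

For example, `reciprocalChangesResult(5.0f, 3.0f)` is true, because 1/3 is not exactly representable, while `reciprocalChangesResult(5.0f, 2.0f)` is false, because 1/2 is exact. GCC only performs this substitution by itself when asked to, for instance with `-freciprocal-math` (implied by `-ffast-math`).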

At some point Irrlicht switched the project's optimization level, which at least for GCC means going from the x87 extended-precision registers to SSE registers (which compute at 32- or 64-bit precision, even for scalar code). That could potentially have reduced precision for all users, everywhere temporary f32 values were used (because a float held in an x87 register has 80 bits O_O).

Someone should take the actual Irrlicht code and all the snippets in this thread and profile both performance and precision.
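As a starting point for the precision half of that comparison, here is a rough sketch that runs three of the variants from this thread against an f64 reference. Everything outside the three formulas is an assumption made for the sketch: the `f32`/`f64` typedefs stand in for Irrlicht's, `num` stands in for a difference such as `m[6] - m[9]`, and the function names and sample ranges are invented. Timing is left out because it depends heavily on compiler, flags and CPU.

```cpp
#include <cmath>

// Assumption: stand-ins for Irrlicht's f32/f64 typedefs.
typedef float f32;
typedef double f64;

struct XW { f32 X; f32 W; };  // only two components, to keep the sketch short

// Code as quoted in the first post: divide by scale.
XW divideF32(f32 diag, f32 num) {
    const f32 scale = std::sqrt(diag) * 2.0f;  // float overload, same as sqrtf
    XW r = { num / scale, 0.25f * scale };
    return r;
}

// Squareys' suggestion: multiply by an f32 reciprocal.
XW reciprocalF32(f32 diag, f32 num) {
    const f32 scale = std::sqrt(diag) * 2.0f;
    const f32 scaleinv = 1.0f / scale;
    XW r = { num * scaleinv, 0.25f * scale };
    return r;
}

// REDDemon's first snippet: f64 temporaries, f32 square root as posted.
XW reciprocalF64(f32 diag, f32 num) {
    const f64 scale = f64(std::sqrt(diag));
    const f64 scaleinv = 0.5 / scale;
    XW r = { f32(f64(num) * scaleinv), f32(0.5 * scale) };
    return r;
}

typedef XW (*Variant)(f32, f32);

// Largest relative error of X against an all-f64 reference over a small grid.
f64 maxRelativeErrorX(Variant variant) {
    f64 worst = 0.0;
    for (int i = 1; i <= 1000; ++i) {
        const f32 diag = 4.0f * f32(i) / 1000.0f;  // sample diag in (0, 4]
        for (int j = -50; j <= 50; ++j) {
            if (j == 0) continue;                  // avoid dividing by a zero reference
            const f32 num = f32(j) / 25.0f;        // sample num in [-2, 2]
            const f64 exact = f64(num) / (2.0 * std::sqrt(f64(diag)));
            const f64 err = std::fabs((f64(variant(diag, num).X) - exact) / exact);
            if (err > worst) worst = err;
        }
    }
    return worst;
}
```

Printing `maxRelativeErrorX(divideF32)`, `maxRelativeErrorX(reciprocalF32)` and `maxRelativeErrorX(reciprocalF64)` side by side gives a first impression. Note that the f64 variant still takes the square root in f32, as in the post, so part of the rounding error is shared by all three.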
