[fixed]XML reader can't read files produced by XML writer

Postby nburlock » Tue Oct 21, 2008 3:56 am

I've just finished writing the Linux implementation of IrrFontTool, but I've been having trouble with getFont rejecting the produced XML font file. The problem seems to be that the XML writer is producing a file with four bytes per character, which is the size of a wchar_t on 64 bit Linux, but the reader seems to only be able to handle two bytes per character in an XML file.

I have a couple of questions for someone knowledgeable:
1) Is there some sort of "quick fix" that I haven't heard about for 64bit systems that makes this work?
2) If this is a problem, then is it that the reader should be able to handle four byte characters, or is it that the writer shouldn't be producing four byte characters?

As soon as I've got this out of the way, I'll be able to finish testing the Linux implementation of IrrFontTool and release it.

Postby Dorth » Tue Oct 21, 2008 6:24 am

Just a thought: great way to make an entrance ^^
Finding a bug and extending Irrlicht in your first 2 posts. Nice :)

Postby rogerborg » Tue Oct 21, 2008 10:52 am

nburlock wrote:1) Is there some sort of "quick fix" that I haven't heard about for 64bit systems that makes this work?


I believe that wchar_t is 32 bits by default with gcc on Linux, regardless of the CPU architecture being targeted. -fshort-wchar should force it to be 16 bits.
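
For anyone who wants to check which case applies to their own build, here is a minimal sketch. The function names are made up for illustration, and it assumes wchar_t is either 16 or 32 bits wide, which holds for the common desktop compilers but is not guaranteed by the standard.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Width of wchar_t in bits for the compiler and flags currently in use.
inline std::size_t wchar_bits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// True when wchar_t is four bytes wide (assuming 8-bit bytes),
// false when it is two bytes wide.
inline bool wchar_is_32bit()
{
    return wchar_bits() == 32;
}

// Small self-check; call it from main() to run it.
// Assumption: only 16-bit and 32-bit wchar_t are expected here.
inline void check_wchar_width()
{
    assert(wchar_bits() == 16 || wchar_bits() == 32);
}
```

Building the same file with and without -fshort-wchar and comparing the result of wchar_bits() should show the difference.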


nburlock wrote:2) If this is a problem, then is it that the reader should be able to handle four byte characters, or is it that the writer shouldn't be producing four byte characters?


It's a fundamental problem with wchar_t, which is why it's not a good type for data exchange. It would be great if Irrlicht defined its own wide type instead, perhaps a UCS-2 type (since UTF-16 brings its own sizing problems to the party).
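
To make the data exchange point concrete, here is a rough sketch of writing text with a fixed two bytes per character, whatever sizeof(wchar_t) happens to be. to_ucs2_le is a hypothetical helper, not an Irrlicht function, and replacing anything that does not fit in UCS-2 with '?' is just one possible policy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of Irrlicht): encode a wide string as
// UCS-2 little-endian bytes, always two bytes per character.
// Values above 0xFFFF and surrogate values become '?'.
std::vector<unsigned char> to_ucs2_le(const std::wstring& text)
{
    std::vector<unsigned char> out;
    out.reserve(text.size() * 2);
    for (std::wstring::size_type i = 0; i < text.size(); ++i)
    {
        // wchar_t may be signed; a negative value becomes a huge number
        // here and is caught by the range check below.
        unsigned long cp = static_cast<unsigned long>(text[i]);
        if (cp > 0xFFFFul || (cp >= 0xD800ul && cp <= 0xDFFFul))
            cp = '?';
        out.push_back(static_cast<unsigned char>(cp & 0xFF));        // low byte first
        out.push_back(static_cast<unsigned char>((cp >> 8) & 0xFF)); // then high byte
    }
    return out;
}

// Usage: "<a" becomes the four bytes 3C 00 61 00 on any platform.
inline void example_to_ucs2_le()
{
    const std::vector<unsigned char> bytes = to_ucs2_le(L"<a");
    assert(bytes.size() == 4);
    assert(bytes[0] == 0x3C && bytes[1] == 0x00);
    assert(bytes[2] == 0x61 && bytes[3] == 0x00);
}
```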

Hmm, I'm meandering here. I guess I should actually look into doing a patch for this, although robustly testing it across all platforms will be interesting.

Postby nburlock » Tue Oct 21, 2008 12:20 pm

rogerborg wrote:-fshort-wchar should force it to be 16 bits.

Great info, thanks for that.

I've logged it as a bug:

https://sourceforge.net/tracker2/?func=detail&aid=2184294&group_id=74339&atid=540676

Postby CuteAlien » Tue Oct 21, 2008 2:24 pm

I don't think that's the problem. All XML files produced by Irrlicht on Linux (unfortunately) always use 4 bytes per character, and it can usually read them too.

I have no experience with the IrrFontTool, but search around in the forum; I remember seeing a few threads about it already.

Postby nburlock » Tue Oct 21, 2008 4:31 pm

The problem is happening inside the read method of IXMLReader. If I give it a four byte per char file (created by Font Tool), that function will fail. If I strip the extra 2 bytes out of each char in the XML file, then read will work. I've also noticed that Text Editor (Ubuntu's Wordpad equivalent) can't open the four byte per char XML file (it thinks it's a binary file), while Firefox can.

I went back and had Irrlicht create the simplest possible XML file, just a header and one tag, and the same problem is present. I checked the file in a Hex editor, and apart from the Unicode header in the first two bytes of the file, 0xFFFE, everything else is one character value followed by 3 zero bytes which should be legal. Again, Firefox can open this file, but Text Editor and Irrlicht can't.
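
If you don't have a hex editor to hand, a few lines of code will do the same job. This is only a sketch: hex_dump is a made-up helper, and you would feed it the first few bytes read from the XML file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Format the first `count` bytes of a buffer as space-separated hex pairs.
std::string hex_dump(const unsigned char* bytes, std::size_t count)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < count; ++i)
    {
        if (i != 0)
            out += ' ';
        out += digits[bytes[i] >> 4];   // high nibble
        out += digits[bytes[i] & 0x0F]; // low nibble
    }
    return out;
}

// Usage: a '<' stored as four bytes, value first and then three zero bytes.
inline void example_hex_dump()
{
    const unsigned char lessThan[4] = { 0x3C, 0x00, 0x00, 0x00 };
    assert(hex_dump(lessThan, 4) == "3C 00 00 00");
}
```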

Postby CuteAlien » Tue Oct 21, 2008 5:30 pm

Irrlicht checks for the following formats:

Code:
const unsigned char UTF8[] = {0xEF, 0xBB, 0xBF}; // 0xEFBBBF;
const int UTF16_BE = 0xFFFE;
const int UTF16_LE = 0xFEFF;
const int UTF32_BE = 0xFFFE0000;
const int UTF32_LE = 0x0000FEFF;


So 0xFFFE on its own would be UTF16_BE; only if it is followed by 0000 is it UTF32_BE.
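
To illustrate why the order of the checks matters, here is a byte-by-byte sketch of BOM detection. It is not the irrXML code: the enum and function names are invented, and the marks are written as bytes in file order (as given in the Unicode standard) rather than as the integer constants quoted above.

```cpp
#include <cassert>
#include <cstddef>

enum TextFormat
{
    FORMAT_UNKNOWN,
    FORMAT_UTF8,
    FORMAT_UTF16_LE,
    FORMAT_UTF16_BE,
    FORMAT_UTF32_LE,
    FORMAT_UTF32_BE
};

// True if `data` begins with the `bomSize` bytes in `bom`.
static bool starts_with(const unsigned char* data, std::size_t size,
                        const unsigned char* bom, std::size_t bomSize)
{
    if (size < bomSize)
        return false;
    for (std::size_t i = 0; i < bomSize; ++i)
        if (data[i] != bom[i])
            return false;
    return true;
}

TextFormat detect_bom(const unsigned char* data, std::size_t size)
{
    static const unsigned char utf32le[] = { 0xFF, 0xFE, 0x00, 0x00 };
    static const unsigned char utf32be[] = { 0x00, 0x00, 0xFE, 0xFF };
    static const unsigned char utf8[]    = { 0xEF, 0xBB, 0xBF };
    static const unsigned char utf16le[] = { 0xFF, 0xFE };
    static const unsigned char utf16be[] = { 0xFE, 0xFF };

    // The 4-byte marks go first: the UTF-32 LE mark begins with the
    // same two bytes as the UTF-16 LE mark.
    if (starts_with(data, size, utf32le, 4)) return FORMAT_UTF32_LE;
    if (starts_with(data, size, utf32be, 4)) return FORMAT_UTF32_BE;
    if (starts_with(data, size, utf8, 3))    return FORMAT_UTF8;
    if (starts_with(data, size, utf16le, 2)) return FORMAT_UTF16_LE;
    if (starts_with(data, size, utf16be, 2)) return FORMAT_UTF16_BE;
    return FORMAT_UNKNOWN;
}
```

Comparing bytes rather than reinterpreting the buffer as a wider integer keeps the result independent of the host's endianness and of how wide any particular integer type is.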

I'm not really an expert on IrrXML, but I often use UTF-32 files with Irrlicht, so I would be surprised to see a problem there. Which version of Irrlicht are you using?

Postby nburlock » Wed Oct 22, 2008 12:19 am

I'm running 1.4.2

I've tracked the problem down. It starts at line 573 of CXMLReaderImpl.h:

Code:
char32* data32 = reinterpret_cast<char32*>(data8);


Then, the following is defined a little further on:

Code:
const int UTF32_BE = 0xFFFE0000;
const int UTF32_LE = 0x0000FEFF;


Two if statements are used to determine if the first four bytes of the file are big endian (line 587) or little endian (line 594):

Code:
if (size >= 4 && data32[0] == (char32)UTF32_BE)

if (size >= 4 && data32[0] == (char32)UTF32_LE)


Both tests fail because:
Code:
data32[0] = 0x0000FEFF
(char32) UTF32_BE = 0xFFFE0000
(char32) UTF32_LE = 0xFEFF


And the code goes on to determine that it's a 2 byte character file of type UTF16_LE, which is why it doesn't work. This will need someone with more experience of the system to say what needs to be fixed.

Postby CuteAlien » Wed Oct 22, 2008 3:53 am

Looks like something for hybrid (I guess he's currently on holiday, as he hasn't posted in the last few days and it's holiday time in his area).

Still, I don't really get it, as 0xFEFF should be equal to 0x0000FEFF, so it should recognize UTF32_LE in that 'if' clause.

Postby nburlock » Wed Oct 22, 2008 4:58 am

I mistyped the value of data32[0] in my previous post; it's actually 0x3C0000FFFE.

char32 is defined as an unsigned long, which is eight bytes on my 64 bit system. That explains why this isn't working, because it's comparing the first 8 bytes of the file against a four byte value. The following code demonstrates the problem:

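Code:
#include <cstdio>

typedef unsigned long char32; // as in 1.4.2; eight bytes on 64-bit Linux

int main()
{
    // The first eight bytes of the file: the Unicode header, then '<' (0x3C)
    unsigned char data8[8] = { 0xFE,0xFF,0x00,0x00,0x3C,0x00,0x00,0x00 };
    char32* data32 = reinterpret_cast<char32*>(&data8[0]);
    const int UTF32_BE = 0xFFFE0000;
    const int UTF32_LE = 0x0000FEFF;

    // With an eight-byte char32, data32[0] spans all eight bytes, including the '<'
    if (data32[0] == (char32)UTF32_BE)
        printf("big endian\n");

    if (data32[0] == (char32)UTF32_LE)
        printf("little endian\n");
    return 0;
}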

So then I guess that the solution is to change char32 to some type that is four bytes long on all platforms.
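
As a rough sketch of what that could look like: the type name char32_fixed and the helper read_u32_le are invented for this example, and it assumes a compiler that provides <cstdint> and static_assert (C++11). It is not a patch against CXMLReaderImpl.h.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical replacement for `typedef unsigned long char32;`
typedef std::uint32_t char32_fixed;
static_assert(sizeof(char32_fixed) * CHAR_BIT == 32,
              "char32_fixed must be exactly 32 bits");

// Assemble a little-endian 32-bit value from four bytes. There is no
// reinterpret_cast, so the result does not depend on alignment, on the
// host's endianness, or on sizeof(long).
inline char32_fixed read_u32_le(const unsigned char* p)
{
    return  static_cast<char32_fixed>(p[0])
         | (static_cast<char32_fixed>(p[1]) << 8)
         | (static_cast<char32_fixed>(p[2]) << 16)
         | (static_cast<char32_fixed>(p[3]) << 24);
}

// Usage with the same eight bytes as in the demonstration above:
// each read now covers exactly four bytes.
inline void example_fixed_width()
{
    const unsigned char data8[8] = { 0xFE,0xFF,0x00,0x00,0x3C,0x00,0x00,0x00 };
    assert(read_u32_le(data8) == 0x0000FFFEu);     // header bytes only
    assert(read_u32_le(data8 + 4) == 0x0000003Cu); // the '<' on its own
}
```

With a type like this, the '<' that follows the header no longer leaks into the value being compared against the 32-bit constants.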

Postby CuteAlien » Wed Oct 22, 2008 6:57 am

nburlock wrote:So then I guess that the solution is to change char32 to some type that is four bytes long on all platforms.


Yes, that sounds like a rather good idea :-)

Postby nburlock » Wed Oct 22, 2008 9:17 am

I've posted the info to the bug report, but I'm not going to post a patch - I've no idea what types are constant across all the different compilers and platforms Irrlicht supports :P
Last edited by nburlock on Thu Oct 30, 2008 3:17 am, edited 1 time in total.

Postby hybrid » Wed Oct 22, 2008 7:04 pm

Hmm, a long type is indeed not a good idea. I also thought that I had fixed the 64-bit problems some months ago, but I'll check when I'm home from my holidays.

Postby rogerborg » Mon Nov 17, 2008 10:45 pm

Do we just want char32 to be an unsigned 32 bit type?

Presumably u32 is an unsigned 32 bit type, even on a 64 bit system?

Unfortunately, we can't just "typedef u32 char32", since that farks up the string<char32> type defined by CXMLReaderImpl ( operator += (const unsigned int i) is the same as operator += (T c) )

What a pretty pickle!

Postby vitek » Tue Nov 18, 2008 5:45 am

rogerborg wrote:Do we just want char32 to be an unsigned 32 bit type?

I wouldn't think so. I think that a char32 should be a 32-bit integral type that has the same signedness as a char.

rogerborg wrote:Presumably u32 is an unsigned 32 bit type, even on a 64 bit system?

Yeah, it should.

rogerborg wrote:that farks up the string<char32> type defined by CXMLReaderImpl (operator += (const unsigned int i) is the same as operator += (T c) )

There are ways around this. One would be to just remove the operator overloading and use unique method names. Of course that breaks source compatibility for some users. Another way is to use SFINAE and remove one of the overloads when T is unsigned int.
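
A minimal sketch of the SFINAE option, under some loud assumptions: tiny_string is a made-up stand-in and not Irrlicht's core::string, it stores its characters in a std::vector for brevity, and it relies on C++11 (default template arguments for function templates and std::enable_if).

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a string template; not Irrlicht's core::string.
template <typename T>
class tiny_string
{
public:
    // Append one character.
    tiny_string& operator+=(T c)
    {
        chars.push_back(c);
        return *this;
    }

    // Append the decimal digits of a number. Written as a template so that
    // SFINAE can remove it when T is unsigned int, where it would otherwise
    // have the same signature as the overload above.
    template <typename U = T,
              typename std::enable_if<!std::is_same<U, unsigned int>::value, int>::type = 0>
    tiny_string& operator+=(unsigned int number)
    {
        std::vector<T> digits;
        do
        {
            digits.insert(digits.begin(), static_cast<T>('0' + number % 10));
            number /= 10;
        } while (number != 0);
        chars.insert(chars.end(), digits.begin(), digits.end());
        return *this;
    }

    std::vector<T> chars;
};

// Usage: both instantiations compile, and only tiny_string<char>
// has the number-appending overload.
inline void example_tiny_string()
{
    tiny_string<char> narrow;
    narrow += 'x';
    narrow += 42u; // appends '4' and '2'
    assert(narrow.chars.size() == 3);

    tiny_string<unsigned int> wide;
    wide += 65u;   // appends one character with value 65
    assert(wide.chars.size() == 1);
}
```

The cost is that += with an unsigned int quietly changes meaning for that one instantiation, which may or may not be acceptable to existing users.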

Travis