I tried the "trick" I had used with TinyXML to correct the strings, and it seems to work with irrXML as well. My primary interest is to at least have accented characters working (LATIN1 would be fine), so I will have something to use until Irrlicht supports that fully. (By the way, great work on this! I just hope the devs will add these to Irrlicht one day.)
As for my "trick", I will have to check how the strings come out on Linux, as the Irrlicht string will surely be different on that platform than on Windows. My sources are UTF-8 XML files. Thanks, Nalin.
If you're curious, here is what I've done:
// Convert accents from loaded XML files (irrXML).
// WARNING: Tested only on Windows;
// might not work on Linux or other platforms.
core::stringw winconvert(const core::stringw& str)
{
    // Characters above 255 are re-aligned by this offset to match LATIN1.
    // Reference table:
    // http://www.utf8-chartable.de/unicode-utf8-table.pl
    const u32 offset = 65216;

    core::stringw textline = L"";
    for (u32 a = 0; a < str.size(); a++)
    {
        // Check the character directly (converted to unsigned 32-bit)
        const u32 base = (u32)str[a];
        if (base < 256) // Standard characters
        {
            textline.append(str[a]);
        }
        else if (base >= offset && (base - offset) < 256)
        {
            // Re-aligned character; everything else above 255 is ignored
            textline.append((wchar_t)(base - offset));
        }
    }
    return textline;
}
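To make the offset arithmetic easier to follow, here is a small standalone sketch of the same kind of re-alignment on a single value, without any Irrlicht types. The helper name `realign` is made up for illustration. The explanation in the comments is my assumption and not something I have verified in irrXML: each UTF-8 byte seems to arrive sign-extended into a 16-bit wide character, so the second byte of "é" (UTF-8 `C3 A9`) shows up as 0xFFA9 = 65449.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Irrlicht): re-align one garbled value.
// Returns the LATIN1 code point, or 0 when the value is not in the
// re-aligned range and would be dropped.
std::uint32_t realign(std::uint32_t base)
{
    const std::uint32_t offset = 65216; // 0xFEC0, same constant as above

    if (base < 256)
        return base;          // plain ASCII/LATIN1 value, kept as-is

    if (base >= offset && base - offset < 256)
        return base - offset; // e.g. 65449 - 65216 = 233 (0xE9, "é")

    return 0;                 // e.g. 65475 (0xFFC3, the lead byte) is dropped
}
```

Under that assumption, this only gives the right letter for characters whose UTF-8 lead byte is `C3` (U+00C0 to U+00FF), which would explain why not every accented character comes out correctly.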
Then, when I want to "convert" the irrXML string to a valid Irrlicht stringw
(here, getting the stringw into "description" and sending it to a listbox for verification):
description = xml->getAttributeValue("description");
list->addItem(winconvert(description).c_str());
Please note that not all of the accented characters are "parsed", as this was only used for testing. I think there could be an even better way to do this, but I need to know how the Irrlicht string table is built (does it match the ASCII/Latin-1 table?). If it matches, I could simply take the number resulting from the offset and convert it back into a character in the string. Right now I'm checking character by character; I think it works fine, but it is not hyper efficient.
EDIT: Cleaned up the code, having found a better way to get the proper LATIN1 characters. The character is then replaced in the string.
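On the question of a more direct conversion: one option that should not depend on the platform is to decode the UTF-8 bytes themselves instead of fixing the wide string afterwards. The sketch below is only an illustration under assumptions. It uses plain `std::string` / `std::wstring` rather than Irrlicht types, the function name `utf8ToLatin1Wide` is made up, and it assumes you can get at the raw UTF-8 bytes of the attribute as a `char` string. It keeps ASCII and the two-byte sequences that map to U+0080 through U+00FF, and skips everything else.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into wide characters, keeping only
// code points up to U+00FF (the LATIN1 range).
std::wstring utf8ToLatin1Wide(const std::string& in)
{
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80)
        {
            out += static_cast<wchar_t>(c); // plain ASCII
        }
        else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size())
        {
            const unsigned char next = static_cast<unsigned char>(in[i + 1]);
            if ((next & 0xC0) == 0x80) // valid continuation byte
            {
                // Combine 5 bits of the lead byte with 6 bits of the next byte
                out += static_cast<wchar_t>(((c & 0x1F) << 6) | (next & 0x3F));
                ++i; // the continuation byte has been consumed
            }
        }
        // Longer sequences and stray bytes are ignored in this sketch.
    }
    return out;
}
```

For example, the bytes `63 61 66 C3 A9` ("café" in UTF-8) come out as four wide characters ending in 0xE9. Because the lead byte `C2` is handled too, characters such as "©" (`C2 A9`) keep their own value instead of being shifted by the offset.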