Support auto utf cjk line wrapping

benau
Posts: 3
Joined: Fri Aug 28, 2015 9:27 am

Support auto utf cjk line wrapping

Post by benau »

Hi,

I recently ran into an issue in SuperTuxKart, a game that uses Irrlicht 1.8: it has problems displaying Chinese text.

When a Chinese string is too long to fit in the display area, Irrlicht doesn't know where to split it, because Chinese, Japanese and Korean text normally has no spaces between characters.
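To make the problem concrete, here is a minimal sketch of a wrapper that only breaks at spaces. It is not Irrlicht's code: `wrapOnSpaces` is a hypothetical name, `std::wstring` stands in for `core::stringw`, and width is counted in characters instead of being measured with a font.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Greedy wrapping that only breaks at spaces (runs of spaces are collapsed).
// Width is counted in characters as a stand-in for a real font measurement.
std::vector<std::wstring> wrapOnSpaces(const std::wstring& text, std::size_t maxChars)
{
    assert(maxChars > 0);
    std::vector<std::wstring> lines;
    std::wstring line, word;
    // One trailing space flushes the final word through the same path.
    for (wchar_t c : text + L' ')
    {
        if (c != L' ') { word += c; continue; }
        if (word.empty()) continue;
        if (!line.empty() && line.size() + 1 + word.size() > maxChars)
        {
            lines.push_back(line); // current line is full, start a new one
            line.clear();
        }
        if (!line.empty()) line += L' ';
        line += word;
        word.clear();
    }
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```

With a limit of 10 characters, "the quick brown fox" is split into two lines, while an eight-character Chinese string with a limit of 3 comes back as a single overlong line, because the whole string counts as one word.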

This is what it looked like before. Ignore the patch on that page; it was not good enough.
https://github.com/supertuxkart/stk-code/issues/2110

Here is what it looks like with a better patch, which handles most of the UTF line-breaking rules. Don't use the patch on that page either, as it is a diff against the previous one.
https://github.com/supertuxkart/stk-code/issues/2297

I emailed niko about the patch; he said it's OK and that I should post it here too.

Also, niko: I added one more breaking condition compared to the patch I sent you; details are below.

Comments and testing are welcome.

Patch:

Index: include/utfwrapping.h
===================================================================
--- include/utfwrapping.h (revision 0)
+++ include/utfwrapping.h (working copy)
@@ -0,0 +1,104 @@
+// Copyright (C) 2015 Ben Au
+// This file is part of the "Irrlicht Engine".
+// For conditions of distribution and use, see copyright notice in irrlicht.h
+
+namespace irr
+{
+namespace gui
+{
+
+//List of characters that must not start or end a line in Chinese/Japanese/Korean text.
+//Only commonly used, full-width characters are included.
+//Use full-width characters when writing CJK, e.g. "。" instead of ".".
+//Add more characters if needed.
+//For the full list, see http://webapp.docx4java.org/OnlineDemo/ ... nsoku.html
+
+bool UtfNoStarting (wchar_t c)
+{
+ switch (c)
+ {
+ case 12293: //々
+ return true;
+ case 12297: //〉
+ return true;
+ case 12299: //》
+ return true;
+ case 12301: //」
+ return true;
+ case 65373: //}
+ return true;
+ case 12309: //〕
+ return true;
+ case 65289: //)
+ return true;
+ case 12303: //』
+ return true;
+ case 12305: //】
+ return true;
+ case 12311: //〗
+ return true;
+ case 65281: //!
+ return true;
+ case 65285: //%
+ return true;
+ case 65311: //?
+ return true;
+ case 65344: //`
+ return true;
+ case 65292: //,
+ return true;
+ case 65306: //:
+ return true;
+ case 65307: //;
+ return true;
+ case 65294: //.
+ return true;
+ case 12290: //。
+ return true;
+ case 12289: //、
+ return true;
+ default:
+ return false;
+ }
+}
+
+bool UtfNoEnding (wchar_t c)
+{
+ switch (c)
+ {
+ case 12296: //〈
+ return true;
+ case 12298: //《
+ return true;
+ case 12300: //「
+ return true;
+ case 65371: //{
+ return true;
+ case 12308: //〔
+ return true;
+ case 65288: //(
+ return true;
+ case 12302: //『
+ return true;
+ case 12304: //【
+ return true;
+ case 12310: //〖
+ return true;
+ default:
+ return false;
+ }
+}
+
+//Helper function
+
+bool breakable (wchar_t c)
+{
+ if ((c > 12287 && c < 40960) || //Common CJK words
+ (c > 44031 && c < 55204) || //Hangul
+ (c > 63743 && c < 64256) || //More Chinese
+ c == 173 || c == L' ' || c == 0) //Soft hyphen and white space
+ return true;
+ return false;
+}
+} // end namespace gui
+} // end namespace irr
Index: source/Irrlicht/CGUIStaticText.cpp
===================================================================
--- source/Irrlicht/CGUIStaticText.cpp (revision 5121)
+++ source/Irrlicht/CGUIStaticText.cpp (working copy)
@@ -10,6 +10,7 @@
#include "IGUIFont.h"
#include "IVideoDriver.h"
#include "rect.h"
+#include "utfwrapping.h"

namespace irr
{
@@ -350,88 +351,77 @@
lineBreak = true;
c = '\0';
}
+ word += c;

- bool isWhitespace = (c == L' ' || c == 0);
- if ( !isWhitespace )
+ if (word.size())
{
- // part of a word
- word += c;
- }
+ const s32 wordlgth = font->getDimension(word.c_str()).Width;

- if ( isWhitespace || i == (size-1))
- {
- if (word.size())
- {
- // here comes the next whitespace, look if
- // we must break the last word to the next line.
- const s32 whitelgth = font->getDimension(whitespace.c_str()).Width;
- const s32 wordlgth = font->getDimension(word.c_str()).Width;
-
- if (wordlgth > elWidth)
+ if (length && (length + wordlgth > elWidth))
+ { // too long to fit inside
+ // break to next line
+ unsigned int where = 1;
+ while (where != line.size()) //Find the first breakable position
{
- // This word is too long to fit in the available space, look for
- // the Unicode Soft HYphen (SHY / 00AD) character for a place to
- // break the word at
- int where = word.findFirst( wchar_t(0x00AD) );
- if (where != -1)
+ if (UtfNoEnding(Text[i - where]) || //Keep unsuitable characters away from
+ UtfNoStarting(Text[i - where]) || //the start or end of a line
+ UtfNoStarting(Text[i + 1 - where])) //Handle runs of several characters that must not start a line
{
- core::stringw first = word.subString(0, where);
- core::stringw second = word.subString(where, word.size() - where);
- BrokenText.push_back(line + first + L"-");
- const s32 secondLength = font->getDimension(second.c_str()).Width;
-
- length = secondLength;
- line = second;
+ where++;
+ continue;
}
+ if (breakable(Text[i - where]))
+ break;
else
- {
- // No soft hyphen found, so there's nothing more we can do
- // break to next line
- if (length)
- BrokenText.push_back(line);
- length = wordlgth;
- line = word;
- }
+ where++;
}
- else if (length && (length + wordlgth + whitelgth > elWidth))
+ if (where != line.size())
{
- // break to next line
- BrokenText.push_back(line);
- length = wordlgth;
- line = word;
+ core::stringw first = line.subString(0, line.size() + 1 - where);
+ core::stringw second = line.subString(line.size() + 1 - where , where - 1);
+ if (first.lastChar() == wchar_t(0x00AD))
+ BrokenText.push_back(first + L"-"); //Print the Unicode Soft HYphen (SHY / 00AD) character
+ else
+ BrokenText.push_back(first);
+ const s32 secondLength = font->getDimension(second.c_str()).Width;
+
+ length = secondLength + wordlgth;
+ line = second + word;
}
+ else if (breakable(c) || UtfNoEnding(c) || UtfNoStarting(c)) //Unusual case
+ {
+ BrokenText.push_back(line); //Force breaking to next line too if last word is breakable,
+ line = word; //it happens when someone writes too many non-newline-starting
+ length = wordlgth; //chars in the first line, so we ignore the rules.
+ }
+ // No suitable place to break words, so there's nothing more we can do
+ // break to next line
else
{
- // add word to line
- line += whitespace;
line += word;
- length += whitelgth + wordlgth;
+ length += wordlgth;
}
-
- word = L"";
- whitespace = L"";
}
-
- if ( isWhitespace )
+ else
{
- whitespace += c;
+ line += word;
+ length += wordlgth;
}

- // compute line break
- if (lineBreak)
- {
- line += whitespace;
- line += word;
- BrokenText.push_back(line);
- line = L"";
- word = L"";
- whitespace = L"";
- length = 0;
- }
+ word = L"";
+
}
+ // compute line break
+ if (lineBreak)
+ {
+ line += word;
+ BrokenText.push_back(line);
+ line = L"";
+ word = L"";
+ length = 0;
+ }
}

- line += whitespace;
line += word;
BrokenText.push_back(line);
}
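For readers who want to try the breaking rules outside Irrlicht, here is a minimal standalone sketch in the same spirit as the loop in the patch. It is a simplification, not the patch's exact logic: a row may end between two characters when the first one is breakable and allowed to end a line, and the second one is allowed to start a line. The names `noStart`, `noEnd`, `canBreakAfter`, `findBreak` and `splitRow` are hypothetical, the character lists are cut down to a handful of code points, and `std::wstring` is used instead of `core::stringw`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, shortened stand-ins for the patch's three predicates.
bool noStart(wchar_t c) { return c == 0x3002 || c == 0xFF0C || c == 0x300D; } // 。 , 」
bool noEnd(wchar_t c)   { return c == 0x300C || c == 0xFF08; }                // 「 (
bool canBreakAfter(wchar_t c)
{
    return (c > 12287 && c < 40960) || (c > 44031 && c < 55204) ||
           (c > 63743 && c < 64256) || c == 0x00AD || c == L' ';
}

// Returns how many leading characters of `line` stay on the current row,
// or 0 when no legal break position exists.
std::size_t findBreak(const std::wstring& line)
{
    if (line.size() < 2)
        return 0;
    for (std::size_t keep = line.size() - 1; keep > 0; --keep)
    {
        const wchar_t last = line[keep - 1]; // would end the current row
        const wchar_t next = line[keep];     // would start the next row
        if (canBreakAfter(last) && !noEnd(last) && !noStart(next))
            return keep;
    }
    return 0;
}

// Splits `line` at that position; without a legal break the whole line stays put.
std::pair<std::wstring, std::wstring> splitRow(const std::wstring& line)
{
    const std::size_t keep = findBreak(line);
    assert(keep < line.size() || line.empty()); // findBreak never keeps the whole line
    if (keep == 0)
        return {line, std::wstring()};
    return {line.substr(0, keep), line.substr(keep)};
}
```

Scanning backwards from the end keeps as much text as possible on the current row. For "你好世。" the scan skips the position in front of "。" and breaks after "好" instead, so the full stop moves to the next row together with "世".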
CuteAlien
Admin
Posts: 9628
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Re: Support auto utf cjk line wrapping

Post by CuteAlien »

Hi benau, thanks for the patch. I did receive it in chat yesterday; you had just already left by the time I noticed :-)
I'll look at it, but it will probably take some time, as I already have too many other bugs and patches in the queue.

Also, this is related to something I have wanted to rewrite for half a decade *sigh* (text wrapping should be in its own class. Right now every element handles it on its own, so if you fix it in one place, it is still broken in the other elements that wrap text).
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
benau
Posts: 3
Joined: Fri Aug 28, 2015 9:27 am

Re: Support auto utf cjk line wrapping

Post by benau »

Thanks for the quick reply.

Feel free to tell me if there's anything I can help with.
benau
Posts: 3
Joined: Fri Aug 28, 2015 9:27 am

Re: Support auto utf cjk line wrapping

Post by benau »

By the way, you said text wrapping should be in its own class. Is this what you have in mind?

BrokenText.push_back ...

if lgth > width

doWrapping ...

With the doWrapping definition outside of the irr gui class?
And my patch code would go into doWrapping, so it can be reused elsewhere?

As far as I can see, only one other file uses the word-wrap bool: CGUIEditBox.

Is that correct?
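One possible reading of this idea, as a rough sketch only: a free function that knows nothing about GUI elements and receives the font measurement and the break rule as callbacks. Everything here is an assumption for illustration, not existing Irrlicht API: the name `doWrapping`, its signature, the `MeasureFn` and `BreakFn` aliases, and the use of `std::wstring` instead of `core::stringw`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback types: how wide is this string, and may a row end
// between the characters `prev` and `next`?
using MeasureFn = std::function<int(const std::wstring&)>;
using BreakFn   = std::function<bool(wchar_t prev, wchar_t next)>;

std::vector<std::wstring> doWrapping(const std::wstring& text, int maxWidth,
                                     const MeasureFn& measure, const BreakFn& canBreak)
{
    assert(maxWidth > 0);
    std::vector<std::wstring> rows;
    std::wstring row;
    for (wchar_t c : text)
    {
        if (c == L'\n') { rows.push_back(row); row.clear(); continue; } // hard break
        row += c;
        if (measure(row) <= maxWidth)
            continue;

        // Row overflowed: look backwards for the last legal break position.
        std::size_t keep = row.size() - 1;
        while (keep > 0 && !canBreak(row[keep - 1], row[keep]))
            --keep;
        if (keep == 0)
            continue; // no legal break yet, let the row overflow for now
        rows.push_back(row.substr(0, keep));
        row.erase(0, keep);
    }
    rows.push_back(row);
    return rows;
}
```

An element such as a static text or an edit box could then pass a callback that wraps `font->getDimension(...).Width` as `measure` and a rule built from the patch's `UtfNoStarting`, `UtfNoEnding` and `breakable` as `canBreak`, so the breaking rules live in one place.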
CuteAlien
Admin
Posts: 9628
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Re: Support auto utf cjk line wrapping

Post by CuteAlien »

Nothing for you to worry about. It's a larger topic: text wrapping is needed by several GUI elements and could even be used in some text scene nodes. The current solution is also somewhat strange, as in some GUI elements getText can return different line endings than were passed to setText.