[no bug, but changed now] string::split, another error

You discovered a bug in the engine, and you are sure that it is not a problem of your code? Just post it in here. Please read the bug posting guidelines first.
Post Reply
manni63
Posts: 12
Joined: Tue Sep 27, 2016 6:23 am

[no bug, but changed now] string::split, another error

Post by manni63 »

When the flag ignoreEmptyTokens is set to false and the part of the string after the last separator is empty, no empty string is pushed back to the array.
I changed the last three lines of code as follows:

Code: Select all

 
 //    if ((used - 1) > lastpos)
  //    changed to:      
        if (!ignoreEmptyTokens ||(used - 1) > lastpos)
            ret.push_back(string<T,TAlloc>(&array[lastpos], (used - 1) - lastpos));
        return ret.size()-oldSize;
 
and it seems to work correctly, but I didn't test any other case.
CuteAlien
Admin
Posts: 9628
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: string::split, another error

Post by CuteAlien »

Thanks, I'm going to check it!
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
CuteAlien
Admin
Posts: 9628
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: string::split, another error

Post by CuteAlien »

Doesn't seem to be a bug. It can't have an empty string after an separator as separator cause the code in the loop above to create a new token. And the only way to get an empty string should be to have a string which only contains of a separator. Thought maybe I didn't understand your bug in this case - hard to tell without example. I made a few, but seemed fine with those. Please provide always examples for such bug-reports, that makes things easier to reproduce.

I rewrote the function now anyway because I didn't like how it adds the separator to the token after the separator. Because when the flags to keep empty strings and keep seperators had been set we could end up with 2 tokens for a single character. And documentation is kinda unspecified here anyway - just saying when keeping delimiters then merging the tokens will end up being the original string. Which is still the case with my new implementation.

New test can be found here: https://bitbucket.org/mzeilfelder/irr-p ... ew-default

edit: Keeping seperator is probably a useless feature anyway. Generally tokenizer implementations just kick it out.
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
manni63
Posts: 12
Joined: Tue Sep 27, 2016 6:23 am

Re: [no bug, but changed now] string::split, another error

Post by manni63 »

I think there was a missunderstanding of the problem, sorry for my bad explanation.
With the new code from trunk I used the following test code:

Code: Select all

 
#include <irrlicht.h>
#include <iostream>
 
using namespace irr;
using namespace core;
 
int main(int argc, char *argv[])
 
{
    array<string<c8> > split1;
    array<string<c8> > split2;
 
    u32 splitsize1;
    u32 splitsize2;
 
    string<c8> str1 ("1;2;3;4;5");
    string<c8> str2 ("1;2;3;4;");
 
    splitsize1=str1.split(split1,";",1,false,false);
    splitsize2=str2.split(split2,";",1,false,false);
 
    std::cout <<"Array split1 contains "<<splitsize1<< " Elements\n";
    for (s32 i=0; i<splitsize1;i++)
    {
        std::cout <<"Index "<< i << ": Value: "<< split1[i].c_str()<<"\n";
    }
 
    std::cout <<"Array split2 contains "<<splitsize2<< " Elements\n";
    for (s32 i=0; i<splitsize2;i++)
    {
        std::cout <<"Index "<< i << ": Value: "<< split2[i].c_str()<<"\n";
    }
}
 
 
with the followingoutput result:

Array split1 contains 5 Elements
Index 0: Value: 1
Index 1: Value: 2
Index 2: Value: 3
Index 3: Value: 4
Index 4: Value: 5
Array split2 contains 4 Elements
Index 0: Value: 1
Index 1: Value: 2
Index 2: Value: 3
Index 3: Value: 4

The first string splits correctly, but in the second string I get only 4 substrings, the last one, that is empty, is missing.
The reason for this is, that the case of ignoreEmptyTokens set to false is not considered for the last pushback outside of the loop.

I think the code should be as follows:

Code: Select all

 
        u32 split(container& ret, const T* const delimiter, u32 countDelimiters=1, bool ignoreEmptyTokens=true, bool keepSeparators=false) const
        {
            if (!delimiter)
                return 0;
 
            const u32 oldSize=ret.size();
 
            u32 tokenStartIdx = 0;
            for (u32 i=0; i<used; ++i)
            {
                for (u32 j=0; j<countDelimiters; ++j)
                {
                    if (array[i] == delimiter[j])
                    {
                        if ( keepSeparators )
                        {
                            ret.push_back(string<T,TAlloc>(&array[tokenStartIdx], i+1 - tokenStartIdx));
                        }
                        else
                        {
                            if (i - tokenStartIdx > 0)
                                ret.push_back(string<T,TAlloc>(&array[tokenStartIdx], i - tokenStartIdx));
                            else if ( !ignoreEmptyTokens )
                                ret.push_back(string<T,TAlloc>());
                        }
                        tokenStartIdx = i+1;
                        break;
                    }
                }
            }
            if ((used - 1) > tokenStartIdx)
                ret.push_back(string<T,TAlloc>(&array[tokenStartIdx], (used - 1) - tokenStartIdx)); 
// change:
            else if ( !ignoreEmptyTokens )
                ret.push_back(string<T,TAlloc>());
// end change
            return ret.size()-oldSize;
        }
 
Now I get the following output:

Array split1 contains 5 Elements
Index 0: Value: 1
Index 1: Value: 2
Index 2: Value: 3
Index 3: Value: 4
Index 4: Value: 5
Array split2 contains 5 Elements
Index 0: Value: 1
Index 1: Value: 2
Index 2: Value: 3
Index 3: Value: 4
Index 4: Value:

Now the second string is correctly split into 5 substrings, where the last one is empty.
CuteAlien
Admin
Posts: 9628
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: [no bug, but changed now] string::split, another error

Post by CuteAlien »

OK, I get what you mean. Basically I think of the seperator as the end of a token while you think of it as a splitter (so I didn't consider there to be any token after it but only before). I guess your interpretation makes also sense given the name of the function. Maybe I should make that yet another parameter.
I'll rewrite the implementation once more (and also put the seperators into their own tokens when they should be kept which maybe makes even sense than adding them to the start or end of one token).
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
manni63
Posts: 12
Joined: Tue Sep 27, 2016 6:23 am

Re: [no bug, but changed now] string::split, another error

Post by manni63 »

It is the behaviour of split routines I know from other libraries with string manipulation (e.g. Qt), so I expected that this will work in the same way.
I don't know if it makes sense to introduce a new parameter.
CuteAlien
Admin
Posts: 9628
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: [no bug, but changed now] string::split, another error

Post by CuteAlien »

Another paramter shouldn't cost too much as it's only checked once at the end. Thought finding a good name for it will be hard :-)
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
manni63
Posts: 12
Joined: Tue Sep 27, 2016 6:23 am

Re: [no bug, but changed now] string::split, another error

Post by manni63 »

I think less about the effort to test a flag, but about the user friendliness of the interface. As you mentioned, it is not only hard to find a name, but also to explain its effect to the user.
As I see so far, the only difference between this two split interpretations is an empty string at the end of an array, that can easily be ignored or deleted if it is not used.
I think it is more important get an intuitive interface and a documentation easy to understand.
CuteAlien
Admin
Posts: 9628
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: [no bug, but changed now] string::split, another error

Post by CuteAlien »

Good point. Thought I intuitively expected it work the other way... ;-)
That wouldn't matter really, but unfortunately this also breaks now downward compatiblity when I change it without flag :-(
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
manni63
Posts: 12
Joined: Tue Sep 27, 2016 6:23 am

Re: [no bug, but changed now] string::split, another error

Post by manni63 »

I agree completely with you concerning the downward compatibility and I have the following thoughts about this:
If the flag ignoreEmptyTokens is set to true, all is compatible with the old version,
If the flag ignoreEmptyTokens is set to false, the old version didn't work at all (see my post from 10/27) and I'm not sure if it makes sense to keep here compatibility.
So the only compatibility problem concerns the flag keepSeparators, and I don't know what to do here. I never had a case so far where this is useful.
CuteAlien
Admin
Posts: 9628
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: [no bug, but changed now] string::split, another error

Post by CuteAlien »

No problem, I've changed (and will change some more) the keepSeparators anyway. So yeah, probably good idea, thanks.
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
CuteAlien
Admin
Posts: 9628
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: [no bug, but changed now] string::split, another error

Post by CuteAlien »

Sorry, couldn't bring myself to re-code it once more back then immediately and then new todo's kept piling up fast and I never got back to this one :-(
Just in case you ever look back at this thread: It's now changed according to your wishes (I hope) in svn trunk [r6007].
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
Post Reply