I have a C++ function that accepts strings in the format below:

<WORD>: [VALUE]; <ANOTHER WORD>: [VALUE]; ...

This is the function:

std::wstring ExtractSubStringFromString(const std::wstring String, const std::wstring SubString) {

    std::wstring S = std::wstring(String), SS = std::wstring(SubString), NS;
    size_t ColonCount = 0, SeparatorCount = 0; WCHAR Separator = L';';

    ColonCount = std::count(S.begin(), S.end(), L':');
    SeparatorCount = std::count(S.begin(), S.end(), Separator);

    if ((SS.find(Separator) != std::wstring::npos) || (SeparatorCount > ColonCount))
    {
        // SEPARATOR NEEDS TO BE ESCAPED, BUT I DON'T KNOW HOW TO DO THIS.
    }

    if (S.find(SS) != std::wstring::npos)
    {
        NS = S.substr(S.find(SS) + SS.length() + 1);

        if (NS.find(Separator) != std::wstring::npos) { NS = NS.substr(0, NS.find(Separator)); }
        if (NS[NS.length() - 1] == L']') { NS.pop_back(); }

        return NS;
    }
    return L"";
}

The above function correctly outputs MANGO if I use it like this:

ExtractSubStringFromString(L"[VALUE: MANGO; DATA: NOTHING]", L"VALUE")

However, when the value contains separators that I tried to escape by doubling them (;;), as in the following string, I still get MANGO instead of ;MANGO;:

ExtractSubStringFromString(L"[VALUE: ;;MANGO;;; DATA: NOTHING]", L"VALUE")

Here, the value assigner is the colon and the separator is the semicolon. I want to allow users to pass literal colons and semicolons to my function by doubling them, just as double quotes, single quotes and other characters are escaped in many scripting and programming languages, and in the parameters of many command-line programs.

I have thought hard about this but could not come up with a way to do it. Can anyone please help me with this?

Thanks in advance.

Blueeyes789
  • 1
    *doubling extra ones* -- Why not follow the de facto convention for things like this and prepend "\" to the character if it is deemed to be a literal character instead of a delimiter? Doubling items like this makes the job harder, IMO -- when you see a "\", you know that the next character is considered a literal character with no special meaning. – PaulMcKenzie Aug 03 '17 at 14:46
  • 1
    I would suggest looking up json - why reinvent the wheel? – UKMonkey Aug 03 '17 at 14:58
  • @AlexG Then what if string contains `;;;;`? – Blueeyes789 Aug 03 '17 at 15:04
  • @AlexG Thanks! Your solution worked fine! You should post it as an answer for me to accept. :-) It looks like that it may be the escaping mechanism of many programs. ;-) – Blueeyes789 Aug 03 '17 at 15:45

2 Answers

3

You should search the string for ;; and replace each occurrence with a temporary filler character or string, which can later be swapped back for a literal ;.

So basically:

1) Search through the string and replace all instances of ;; with \tempFill
- It would be best to pick a combination of characters that would be highly unlikely to be in the original string.
2) Parse the string
3) Replace all instances of \tempFill with ;

Note: It would be wise to assert that your \tempFill (or whatever you choose as the filler) does not occur in the original string, to prevent a bug. You could use a character such as \n and make sure there are none in the original string.

Disclaimer: I can almost guarantee there are cleaner and more efficient ways to do this but this is the simplest way to do it.
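The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names ReplaceAll and ExtractValue are hypothetical, the filler is assumed to be L"\n", the enclosing [] are stripped first, and only doubled semicolons are handled (doubled colons are left out).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replace every occurrence of `from` with `to`,
// scanning left to right so that ";;;" becomes filler + ";".
std::wstring ReplaceAll(std::wstring text, const std::wstring& from, const std::wstring& to) {
    for (size_t pos = 0; (pos = text.find(from, pos)) != std::wstring::npos; pos += to.length()) {
        text.replace(pos, from.length(), to);
    }
    return text;
}

// Hypothetical function: returns the value stored under `key`, or L"" if absent.
std::wstring ExtractValue(std::wstring input, const std::wstring& key) {
    const std::wstring filler = L"\n";  // assumption: never part of real input
    assert(input.find(filler) == std::wstring::npos);

    // Strip the enclosing brackets if they are present.
    if (!input.empty() && input.front() == L'[') input.erase(0, 1);
    if (!input.empty() && input.back() == L']') input.pop_back();

    // Step 1: hide escaped separators behind the filler.
    input = ReplaceAll(input, L";;", filler);

    // Step 2: split on the remaining (real) separators and look for the key.
    size_t start = 0;
    while (start <= input.length()) {
        size_t end = input.find(L';', start);
        if (end == std::wstring::npos) end = input.length();

        std::wstring field = input.substr(start, end - start);
        field.erase(0, field.find_first_not_of(L' '));  // drop leading spaces

        size_t colon = field.find(L':');
        if (colon != std::wstring::npos && field.compare(0, colon, key) == 0) {
            std::wstring value = field.substr(colon + 1);
            if (!value.empty() && value.front() == L' ') value.erase(0, 1);
            // Step 3: turn the filler back into a literal separator.
            return ReplaceAll(value, filler, L";");
        }
        start = end + 1;
    }
    return L"";
}
```

With this sketch, ExtractValue(L"[VALUE: ;;MANGO;;; DATA: NOTHING]", L"VALUE") yields ;MANGO;. Because the replacement pairs semicolons from left to right, a run of three is read as one escaped semicolon followed by a real separator.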

2

First, since the substring (the key) does not need to be split, I assume that it does not need to be pre-processed to filter escaped separators.

Then, on the main string, the simplest way IMHO is to filter the escaped separators while you search for them in the string. Pseudo code (assuming the enclosing [] have been removed):

last_index = begin_of_string
index_of_current_substring = begin_of_string
loop: search a separator starting at last_index - if not found exit loop
    ok: found one at ix
    if char at ix+1 is a separator (meaning we have an escaped separator)
        remove character at ix from string by copying all characters after it one step to the left
        last_index = ix+1
        continue loop
    else this is a true separator
        search a colon in [ index_of_current_substring, ix [
        if not found: error incorrect string
        say found at c
        compare key_string with string[ index_of_current_substring, c [
        if equal - ok we found the key
            value is string[ c+2 (skip a space after the colon), ix [
            return value - search is finished
        else - it is not our key, just continue searching
            index_of_current_substring = ix+1
            last_index = index_of_current_substring
            continue loop

It should now be easy to convert that to C++
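One possible conversion is sketched below. The function name FindValue is hypothetical, and the sketch departs from the pseudo code in three labeled ways: the end of the string is treated as a final separator so that the last field is also checked, spaces before a key are skipped, and a field without a colon is simply skipped instead of raising an error.

```cpp
#include <string>

// Hypothetical name. Assumes the enclosing [] have already been removed.
// Returns the value stored under `key`, or L"" if the key is not found.
std::wstring FindValue(std::wstring s, const std::wstring& key) {
    const wchar_t sep = L';';
    const size_t npos = std::wstring::npos;
    size_t last = 0;        // where the next separator search starts
    size_t fieldStart = 0;  // start of the current "KEY: VALUE" field

    for (;;) {
        size_t ix = s.find(sep, last);

        // Escaped separator: drop one ';' and keep the other as a literal.
        if (ix != npos && ix + 1 < s.length() && s[ix + 1] == sep) {
            s.erase(ix, 1);
            last = ix + 1;
            continue;
        }

        // True separator, or end of string (assumption: acts as a separator).
        size_t end = (ix == npos) ? s.length() : ix;

        // Assumption: spaces between a separator and the next key are ignored.
        size_t keyStart = s.find_first_not_of(L' ', fieldStart);
        size_t colon = s.find(L':', fieldStart);

        if (keyStart != npos && colon != npos && colon < end &&
            s.compare(keyStart, colon - keyStart, key) == 0) {
            size_t valueStart = colon + 1;
            if (valueStart < end && s[valueStart] == L' ') ++valueStart;  // skip one space
            return s.substr(valueStart, end - valueStart);
        }

        if (ix == npos) return L"";  // no more fields: key not found

        // Not our key: move on to the next field.
        fieldStart = ix + 1;
        last = fieldStart;
    }
}
```

For example, FindValue(L"VALUE: ;;MANGO;;; DATA: NOTHING", L"VALUE") gives ;MANGO; and the same call with L"DATA" gives NOTHING.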

Serge Ballesta