
I'm very new to regex and C++, so please go easy on me :)


Given a string like this one:

Input:

string s = "<ph0/>Hello StackOverflow! Thank you for helping! <ph1/>"

I want to replace the ph0 and ph1 tags with __ent_00000_ and __ent_00001_ respectively, so in the end I'd like my output to be:

Output:

string s = "__ent_00000_Hello StackOverflow! Thank you for helping! __ent_00001_"



I would also like to do the reverse, i.e.:

Input:

string s = "__ent_00000_Bye bye StackOverflow!  __ent_00001_"

Output:

string s = "<ph0/>Bye bye StackOverflow!  <ph1/>"


This should work for any arbitrary number of tags in a string! So the idea is simply to replace the text around each number but keep the number itself intact!

My idea was to use regex_replace, but maybe there's another way. I'm open to any other solution that works!


Example with multiple tags:

Input:

string input = "Restaurant is closed<ph0/> <ph1/> <ph2/> <ph3/> | <ph4/> <ph5/>alert<ph6/>We are not serving meals<ph7/> <ph8/> <ph9/> <ph10/> | <ph11/> <ph12/>sorry!"

Output:

string output = "Restaurant is closed__ent_00000_ __ent_00001_ __ent_00002_ __ent_00003_ | __ent_00004_ __ent_00005_alert__ent_00006_We are not serving meals__ent_00007_ __ent_00008_ __ent_00009_ __ent_00010_ | __ent_00011_ __ent_00012_sorry!"

Thank you and have a nice day! :)

Tanveer Badar
José Rodrigues

2 Answers


You are setting yourself up for failure if your first solution to this problem involves regular expressions. Please don't use them where a simple string replace will suffice!

If each tag occurs only once, all you really need to do is call string::replace on it. If tags occur multiple times, you can use Boost's replace_all() algorithm instead.
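As a rough sketch of what the no-regex route could look like for an arbitrary number of tags, the function below scans the input with std::string::find and copies the digits across by hand. The name ph_to_ent is made up for illustration, and the loop is hand-rolled rather than built on string::replace or replace_all, because the number has to be carried over and zero-padded to five digits.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: rewrites every "<phN/>" as "__ent_NNNNN_" without <regex>.
std::string ph_to_ent(const std::string& in) {
    const std::string open = "<ph";
    const std::string close = "/>";
    std::string out;
    std::size_t pos = 0;  // start of the text not yet copied to out
    while (true) {
        const std::size_t start = in.find(open, pos);
        if (start == std::string::npos) break;

        // Find the run of digits that follows "<ph".
        const std::size_t d = start + open.size();
        std::size_t e = d;
        while (e < in.size() && std::isdigit(static_cast<unsigned char>(in[e]))) ++e;

        const bool is_tag = e > d && in.compare(e, close.size(), close) == 0;
        if (!is_tag) {
            // Not a real tag: copy everything through "<ph" and keep scanning.
            out.append(in, pos, d - pos);
            pos = d;
            continue;
        }

        const std::string digits = in.substr(d, e - d);
        out.append(in, pos, start - pos);  // text before the tag
        out += "__ent_";
        if (digits.size() < 5) out.append(5 - digits.size(), '0');  // left-pad with zeros
        out += digits;
        out += '_';
        pos = e + close.size();  // continue after "/>"
    }
    out.append(in, pos, std::string::npos);  // trailing text after the last tag
    return out;
}
```

Calling ph_to_ent on the first input from the question should give the output the question asks for. Text that merely starts with <ph but is not followed by digits and /> is copied through unchanged.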

Tanveer Badar

The first case is something C++ regex cannot handle out of the box, because you need to replace the capture in Group 1 with a zero-left-padded number. That requires a callback, which may be implemented like this:

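#include <algorithm>
#include <cstdio>
#include <iostream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>

// Replaces every match of re in [first, last) with the string returned by f(match).
template<class BidirIt, class Traits, class CharT, class UnaryFunction>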
std::basic_string<CharT> regex_replace(BidirIt first, BidirIt last,
    const std::basic_regex<CharT,Traits>& re, UnaryFunction f)
{
    std::basic_string<CharT> s;

    typename std::match_results<BidirIt>::difference_type
        positionOfLastMatch = 0;
    auto endOfLastMatch = first;

    auto callback = [&](const std::match_results<BidirIt>& match)
    {
        auto positionOfThisMatch = match.position(0);
        auto diff = positionOfThisMatch - positionOfLastMatch;

        auto startOfThisMatch = endOfLastMatch;
        std::advance(startOfThisMatch, diff);

        s.append(endOfLastMatch, startOfThisMatch);
        s.append(f(match));

        auto lengthOfMatch = match.length(0);

        positionOfLastMatch = positionOfThisMatch + lengthOfMatch;

        endOfLastMatch = startOfThisMatch;
        std::advance(endOfLastMatch, lengthOfMatch);
    };

    // Use the iterator type of the template parameter, not only std::string's.
    std::regex_iterator<BidirIt> begin(first, last, re), end;
    std::for_each(begin, end, callback);

    s.append(endOfLastMatch, last);

    return s;
}

template<class Traits, class CharT, class UnaryFunction>
std::string regex_replace(const std::string& s,
    const std::basic_regex<CharT,Traits>& re, UnaryFunction f)
{
    return regex_replace(s.cbegin(), s.cend(), re, f);
}

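// Formats the number captured in Group 1 as a five-digit, zero-padded entity.
std::string callback_to(const std::smatch& m) {
    std::stringstream s;
    char buffer[16]; // large enough for any int printed with "%05d"
    std::snprintf(buffer, sizeof buffer, "%05d", std::stoi(m.str(1)));
    s << "__ent_" << buffer << "_";
    return s.str();
}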

Then, inside main, you can use it like this:

std::string s = "Restaurant is closed<ph0/> <ph1/> <ph2/> <ph3/> | <ph4/> <ph5/>alert<ph6/>We are not serving meals<ph7/> <ph8/> <ph9/> <ph10/> | <ph11/> <ph12/>sorry!";
std::regex reg_to("<ph(\\d+)/>");
std::cout << regex_replace(s, reg_to, callback_to) << std::endl;
// => Restaurant is closed__ent_00000_ __ent_00001_ __ent_00002_ __ent_00003_ | __ent_00004_ __ent_00005_alert__ent_00006_We are not serving meals__ent_00007_ __ent_00008_ __ent_00009_ __ent_00010_ | __ent_00011_ __ent_00012_sorry!

The regex is simple: <ph(\d+)/> matches <ph, then one or more digits captured in Group 1, and then />. Inside the callback, the Group 1 text is converted with stoi and formatted with "%05d" into buffer, which zero-pads the number to five digits, and s << "__ent_" << buffer << "_"; fills the string stream with the final text.
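If the generic callback overload feels like too much machinery, the same replacement can be sketched as a single loop over std::sregex_iterator. This is only an illustration under a couple of assumptions: the function name to_entities is made up, and the zero-padding is done with std::setw and std::setfill on the captured digits instead of the sprintf call, which sidesteps both the fixed-size buffer and the stoi conversion.

```cpp
#include <iomanip>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: same pattern as above, with the callback logic inlined.
std::string to_entities(const std::string& in) {
    static const std::regex tag("<ph(\\d+)/>");
    std::ostringstream out;
    auto tail = in.cbegin();  // first character not yet written to out

    for (std::sregex_iterator it(in.cbegin(), in.cend(), tag), end; it != end; ++it) {
        const std::smatch& m = *it;
        out << std::string(tail, m[0].first);  // text between the matches
        // setw applies to the next output only; digits longer than 5 are kept whole.
        out << "__ent_" << std::setw(5) << std::setfill('0') << m.str(1) << '_';
        tail = m[0].second;                    // continue after this match
    }
    out << std::string(tail, in.cend());       // text after the last match
    return out.str();
}
```

Because the digits are padded as text, a tag number with more than five digits is passed through unpadded rather than truncated.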

The opposite replacement is simple and straightforward:

std::string t = "__ent_00000_Bye bye StackOverflow!  __ent_00001_";
std::regex reg_from("__ent_0*(\\d+)_");
std::cout << std::regex_replace(t, reg_from, "<ph$1/>") << std::endl;
// => <ph0/>Bye bye StackOverflow!  <ph1/>

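The __ent_0*(\d+)_ pattern matches __ent_, then zero or more 0 chars, then captures one or more digits into Group 1, and finally matches _. Since \d+ needs at least one digit, an all-zero number such as 00000 still leaves a single 0 in Group 1. The replacement is <ph, then the Group 1 value, and then />.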
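For convenience, the reverse direction can be wrapped in a small function. The name from_entities is only a placeholder for this sketch; the pattern and replacement string are the ones shown above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turns "__ent_NNNNN_" back into "<phN/>".
std::string from_entities(const std::string& in) {
    // 0* drops the zero padding; (\d+) keeps at least one digit, so 00000 becomes 0.
    static const std::regex ent("__ent_0*(\\d+)_");
    return std::regex_replace(in, ent, "<ph$1/>");
}
```

Note that the padding is not recoverable: a tag originally written with leading zeros (for example <ph007/>) would come back as <ph7/> after a round trip.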

See the regex demo.

Wiktor Stribiżew