I am parsing a text file using Boost.Regex in C++. I am looking for '\' characters in the file. The file also contains Unicode escapes that start with '\u'. Is there a way to tell a plain '\' apart from a '\u' escape? The following is the content of test.txt, the file I am parsing:
"ID": "\u01FE234DA - this is id ",
"speed": "96\/78",
"avg": "\u01FE234DA avg\83"
Here is my attempt:
#include <boost/regex.hpp>
#include <string>
#include <iostream>
#include <fstream>
using namespace std;
const int BUFSIZE = 500;
int main(int argc, char** argv) {
    if (argc < 2) {
        cout << "Pass the input file" << endl;
        return 1;
    }
    boost::regex re("\\\\+");   // one or more backslashes
    string file(argv[1]);
    char buf[BUFSIZE];
    boost::regex uni("\\\\u+"); // a backslash followed by one or more 'u'
    ifstream in(file.c_str());
    // Loop on the read itself; testing eof() before reading can process a stale buffer.
    while (in.getline(buf, BUFSIZE))
    {
        if (boost::regex_search(buf, re))
        {
            cout << buf << endl;
            cout << "(\\) found" << endl;
            if (boost::regex_search(buf, uni)) {
                cout << buf << endl;
                cout << "unicode found" << endl;
            }
        }
    }
}
When I run the above code, it prints the following:
"ID": "\u01FE234DA - this is id ",
(\) found
"ID": "\u01FE234DA - this is id ",
unicode found
"speed": "96\/78",
(\) found
"avg": "\u01FE234DA avg\83"
(\) found
"avg": "\u01FE234DA avg\83"
unicode found
Instead, I want the following:
"ID": "\u01FE234DA - this is id ",
unicode found
"speed": "96\/78",
(\) found
"avg": "\u01FE234DA avg\83"
(\) and unicode found
I think the code cannot distinguish '\' from '\u', but I am not sure what to change or where.
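To make the desired classification concrete, here is a minimal sketch of one possible approach, not a confirmed solution. It assumes a "plain" backslash means a backslash that is not immediately followed by 'u', which can be written with a negative lookahead. The helper name classify is hypothetical, and the sketch uses std::regex (C++11) rather than boost::regex so that it builds without Boost; the same pattern is expected to work with Boost's default Perl syntax, but that is an assumption.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the original code): returns the message
// that should be printed for one input line, or an empty string if the line
// contains no backslash at all.
std::string classify(const std::string& line) {
    // A backslash followed by 'u': the start of a \u escape.
    static const std::regex unicode_re("\\\\u");
    // A backslash NOT followed by 'u' (negative lookahead).
    static const std::regex plain_re("\\\\(?!u)");

    const bool has_unicode = std::regex_search(line, unicode_re);
    const bool has_plain = std::regex_search(line, plain_re);

    if (has_unicode && has_plain) return "(\\) and unicode found";
    if (has_unicode) return "unicode found";
    if (has_plain) return "(\\) found";
    return "";
}
```

Calling classify on each line read from test.txt and printing the line followed by the returned message (when it is not empty) should give the three results listed above. One limitation of this sketch: an escaped backslash followed by a literal 'u' (that is, '\\u' in the file) would still be reported as a Unicode escape.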