This may seem like a strange question. The input of my algorithm is a stream of JSON strings made up of syntactically correct JSON blocks, at least for all blocks but one. A typical block in the stream has this structure:
{
"comment",
{
"author":"X",
"body":"Hello world",
"json_metadata":"{\"tags\":[\"hello, world\"],\"community\":\"programming\",\"app\":\"application_for_publish\"}",
"parent_author":"waggy6",
"parent_permlink":"programming_in_c",
"permlink":"re-author-programming_in_c-20180916t035418244z",
"title":"some_title"
}
}
Everything works fine until I reach the following block, which I don't know how to parse. The field that gives me trouble is "json_metadata":
{
"comment",
{
"author": "Y",
"body": "Hello another world!",
"json_metadata": "\"{\\\"tags\\\":[\\\"hello\\\",\\\"world\\\"],\\\"app\\\":\\\"application_for_publish_content\\\",\\\"format\\\":\\\"markdown+html\\\",\\\"pollid\\\":\\\"p_id\\\",\\\"image\\\":[\\\"https://un.useful.url/path/image.png\\\"]}\"",
"parent_author": "",
"parent_permlink": "helloworld",
"permlink": "hello_world_programming_in_c-2017319t94958596z",
"title": "Hello World in C!"
}
}
It looks as if this field was serialized (escaped) twice when the data was acquired.
I'm using rapidjson as the parsing tool, in C++.
The relevant piece of code is the following:
static std::string parseNode(const Value &node){
string toret = "";
if (node.IsBool()) toret = toret + to_string(node.GetBool());
else if (node.IsInt()) toret = toret + to_string(node.GetInt());
else if (node.IsUint()) toret = toret + to_string(node.GetUint());
else if (node.IsInt64()) toret = toret + to_string(node.GetInt64());
else if (node.IsUint64()) toret = toret + to_string(node.GetUint64());
else if (node.IsDouble()) toret = toret + to_string(node.GetDouble());
else if (node.IsString()) toret = toret + node.GetString();
else if (node.IsArray()) toret = toret + parseArray(node); // parse the given array
else if (node.IsObject()) toret = toret + parseObject(node); // parse the given object
return toret;
}
...
std::string search_member(Value& js, std::string member){
Value::ConstMemberIterator itr = js.FindMember(member.c_str());
std::string els = "";
if(itr != js.MemberEnd())
els = parseNode(itr->value) + " ";
return els;
}
...
// op_struct type is Value*; it is the Value* that refers to all the fields of the block
std::string json_m = (*op_struct)["json_metadata"].GetString(); // a rapidjson Value is not implicitly convertible to std::string
std::string elements = "";
if((json_m.compare("") != 0) && (json_m.compare("{}") != 0) && (json_m.compare("\"\"") != 0)){
Document js;
js.Parse<0>(json_m.c_str());
elements = elements + search_member(js, "community") + search_member(js, "tags") + search_member(js, "app");
}
Comment * comment = new Comment(title + " " + body + " " + elements, auth);
...
The problem occurs at the js.FindMember(member.c_str()); line in the search_member() function. js.Parse<0>(json_m.c_str()); accepts the input as valid JSON, and indeed it is valid. The input is:
"\"{\\\"tags\\\":[\\\"hello\\\",\\\"world\\\"],\\\"app\\\":\\\"application_for_publish_content\\\",\\\"format\\\":\\\"markdown+html\\\",\\\"pollid\\\":\\\"p_id\\\",\\\"image\\\":[\\\"https://un.useful.url/path/image.png\\\"]}\""
But the result of parsing it is a single JSON string rather than an object:
"{\"tags\":[\"hello\",\"world\"],\"app\":\"application_for_publish_content\",\"format\":\"markdown+html\",\"pollid\":\"p_id\",\"image\"
For this reason, FindMember() cannot find any tags, community or app field, since the parsed document is a string, not an object.
My question is: is there any way (other than just skipping this block) to recognize such special cases?