Some data files that I need to read / parse have headers in the style:

level0var = value0
level0var.level1field = value1 
level0var.level1array[11].level2field = value2
...

In other words, they look like nested C-style structs and arrays, but none of these are declared in the header: I need to infer the structure as I read.

My plan was to use the famous nlohmann::json library to store this, because its flexibility would allow me to change the structure of the data during parsing, and save the header in a more readable form.

I read the assignments in as lhs = rhs, and both of those are strings. Given json header; to deal with the unknown, variable depth of the structures I want to do something like

// std::string lhs = "level0var.level1field.level2field";
// std::string rhs = "value2"; 
auto current_level = header;
while ( lhs != "" ) {
   auto between = lhs.substr ( 0, lhs.find_first_of ( "." ) );
   lhs.erase ( 0, lhs.find_first_of ( "." ) + 1 );
   if ( lhs == between ) break;
   current_level [ between ] = json ( {} );
   current_level = current_level [ between ];
}
current_level = rhs;
std::cout << std::setw(4) << header;

for every line that has at least 1 struct level (leaving the arrays for now).

The strange thing is that using this loop, the only thing the last line returns is null, whereas when I use

header [ "level0var" ] [ "level1field" ] [ "level2field" ] = rhs;
std::cout << std::setw(4) << header;

it gives the expected result:

{
    "level0var": {
        "level1field": {
           "level2field": "value2"
        } 
    }
}

Is there a way to build this hierarchical structure iteratively (without supplying it as a whole)? Once I know how to do structs, I hope the arrays will be easy!

The example I made at work does not run on coliru (which does not have the JSON library I guess).

alle_meije
  • json is a recursive structure. So you can build the inner json into a `nlohmann::json` object, and then add it to another `nlohmann::json` object that represents the outer one. – wohlstad Apr 05 '22 at 09:18
  • You probably want `auto& current_level = header;` instead of mutating a copy. – Jarod42 Apr 05 '22 at 09:31
  • Ah thanks @wohlstad -- does that mean that the 'forward declaration' way using `current_level [ between ] = json ( {} );` will not work? As I said in the post, it is possible to build the header structure myself first (as a `map` or something), but I was hoping not to have to do that. – alle_meije Apr 05 '22 at 09:31
  • @Jarod42 thanks for the tip! I just tried it, but that results in a core dump: `terminate called after throwing an instance of 'nlohmann::detail::type_error'` `what(): [json.exception.type_error.305] cannot use operator[] with a string argument with string` `Aborted (core dumped)` – alle_meije Apr 05 '22 at 09:35
  • What type is `current_level` ? If it's not pointer-like, that assignment probably isn't doing what you want and expect. Specifically, if it's a reference it can't be reseated, and if it's an object then every assignment is a copy assignment. – Useless Apr 05 '22 at 10:31
  • Actually, talking about "pointer-like" semantics, it looks like you should just be using this library's `json_pointer` type anyway, as it directly supports arbitrary nesting and key concatenation. – Useless Apr 05 '22 at 10:37
  • That may be a 'cleaner' way to do it! Do you think that would be able to deal with the array situation as well? I still have a feeling that involves some 'ugly' code... – alle_meije Apr 06 '22 at 07:48

2 Answers

This is indeed trivial to do with json_pointer: the operator[] overload that takes a json_pointer (which relies on the json_pointer::get_unchecked() function) already does all this work for you.

The only effort is to convert your .-separated key into the /-separated path it expects.

#include <nlohmann/json.hpp>

#include <algorithm>
#include <iomanip>   // std::setw
#include <iostream>
#include <string>

using json = nlohmann::json;

// Turn "a.b.c" into the JSON pointer path "/a/b/c".
std::string dots_to_path(std::string key)
{
    std::replace(key.begin(), key.end(), '.', '/');
    key.insert(0, 1, '/');
    return key;
}

int main() {

    json header;

    std::string lhs = "level0var.level1field.level2field";
    std::string rhs = "value1";

    // operator[] with a json_pointer creates the missing levels on the way down
    json::json_pointer key{dots_to_path(lhs)};
    header[key] = rhs;
    std::cout << std::setw(4) << header;
}

For future reference, the linked code was extended to transform keys including array indices, like level0var.level1field[1].level2field -> /level0var/level1field/1/level2field.

At this stage, it might be cleaner to tokenize the original string and simply append each token with json_pointer::operator/=, but since it's not in the original question scope I'll leave that as an exercise for the reader.
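
As a rough sketch of that tokenizing idea, using only the standard library: the function below walks the key once and emits one `/token` per non-empty token, so a `].` sequence cannot produce a double slash. The name `key_to_pointer_path` is hypothetical (it is not part of nlohmann::json), and the escaping of `~` and `/` follows the JSON pointer rules as I understand them.

```cpp
#include <string>

// Hypothetical helper, not part of nlohmann::json:
// converts a header key such as "a.b[3].c" into the path "/a/b/3/c".
std::string key_to_pointer_path(const std::string& key)
{
    std::string path;
    bool at_token_start = true;  // true until the current token gets its first character
    for (char c : key) {
        if (c == '.' || c == '[' || c == ']') {
            at_token_start = true;  // separators only end the current token
            continue;
        }
        if (at_token_start) {
            path += '/';            // exactly one slash per non-empty token
            at_token_start = false;
        }
        // JSON pointer escaping: "~" becomes "~0" and "/" becomes "~1"
        if (c == '~')      path += "~0";
        else if (c == '/') path += "~1";
        else               path += c;
    }
    return path;
}
```

The result can be handed to `json::json_pointer` in the same way as the output of `dots_to_path` above. Appending each token with `json_pointer::operator/=` instead of building a string should work just as well.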

Useless
  • Wow. I did not realise what you can do with the path separator in the key. Just found out that if you replace the `.` *and* the `[` and `]`, the numbers are automatically converted to array indices. Pure magic. Only watch out for double slashes. The new code https://ideone.com/93mgLa creates an array, even puts a 'null' in front for element 0! – alle_meije Apr 06 '22 at 08:44

I managed to achieve what I understood you wanted with this code:

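#include <nlohmann/json.hpp>

#include <iomanip>
#include <iostream>
#include <string>

using namespace nlohmann;

int main()
{
    json header;
    std::string lhs = "level0var.level1field.level2field";
    std::string rhs = "value2";
    json * current_level = &header;   // pointer into header, so nothing is copied
    while (lhs != "") {
        auto between = lhs.substr(0, lhs.find_first_of("."));
        lhs.erase(0, lhs.find_first_of(".") + 1);
        // start a fresh child under this key (this replaces anything already stored there)
        (*current_level)[between] = json({});
        current_level = &((*current_level)[between]);   // descend one level
        if (lhs == between) break;   // without a '.' nothing was erased, so this was the last token
    }
    *current_level = rhs;
    std::cout << std::setw(4) << header;
}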

A few notes:

  1. I haven't managed to use nlohmann::json_pointer although it seemed useful at first.
    Instead I used a "normal" raw pointer. Some pointer semantics are required in order to refer to the existing json structure you have.

  2. I moved the condition for exiting the loop to the end of the loop (otherwise the innermost level was missing).

  3. To be honest, I am not sure my solution is the best one. Messing with raw pointers in this case is something that has to be done very carefully. But you can try it if you like.
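
One thing to watch when the loop runs over several header lines: assigning a fresh value at every level replaces whatever an earlier line stored under the same key. The sketch below shows the same pointer walk, but it creates a level only when it is missing. To keep it self-contained it uses a stand-in `Node` type built from the standard library (an assumption for illustration, not nlohmann::json), and `assign` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Stand-in for a JSON object tree (illustration only, not nlohmann::json).
struct Node {
    std::string value;                                      // leaf payload, empty for inner nodes
    std::map<std::string, std::unique_ptr<Node>> children;  // named sub-levels
};

// Walks "a.b.c" down from root, creating only the missing levels,
// and stores value in the node reached at the end.
void assign(Node& root, const std::string& dotted_key, const std::string& value)
{
    Node* current = &root;  // a pointer can be re-aimed at each level; a reference cannot
    std::size_t start = 0;
    while (true) {
        const std::size_t dot = dotted_key.find('.', start);
        const std::size_t len = (dot == std::string::npos) ? dot : dot - start;
        const std::string token = dotted_key.substr(start, len);
        std::unique_ptr<Node>& child = current->children[token];
        if (!child) child = std::make_unique<Node>();  // create only when absent
        current = child.get();                         // descend without copying
        if (dot == std::string::npos) break;           // last token reached
        start = dot + 1;
    }
    current->value = value;
}
```

Calling `assign(root, "level0var.level1field", "value1")` and then `assign(root, "level0var.level2field", "value2")` leaves both fields under `level0var`, because the second call reuses the existing level instead of resetting it.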

wohlstad
  • Amazing, this is what I was after indeed: the while-loop route produces the same result as the manual input. But: the while version works for `level3field` _etc_. as well! So what I overlooked was that assignment makes a copy rather than a reference. As messing with pointers goes, this is acceptable :). I'll try sorting the arrays now. – alle_meije Apr 05 '22 at 13:01
  • Sorry, I did change the accepted answer from this to the path strings one. The json library turns path strings with numbers into arrays. I hope that will save us both a lot of work - thanks again! – alle_meije Apr 06 '22 at 08:32
  • Sure. It's an elegant solution. – wohlstad Apr 06 '22 at 17:14