
I have a simple class that takes a character array and parses it into a JSON object. It stores that object internally and provides a getter.

class JSONContainer {
public:
    explicit JSONContainer(const char* const json) {
        std::string t(json);

        _json = new nlohmann::basic_json(json);
    }

    ~JSONContainer() {
        delete _json;
    }

    nlohmann::json *j() {
        return _json;
    }

private:
    nlohmann::json* _json;
};

If I instantiate the class with something simple like ...

{"data": [100,100]}

it works, but once the string grows to around 1000 characters or more, the incoming character array gets corrupted when I try to copy json into a std::string.

                      // incoming json {"data": [100,100,100,100,100...
std::string t(json);  // turns into "ÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ..." after this line

I have no idea what could be causing this. The one thing I thought to check was whether the null terminator was present at the end of json, and I always found it.

Appreciate the help!

Additional context for comments ...

This is the method calling the constructor above ...

std::shared_ptr<void> JSONSerDes::deserialize(const char *serializedData) {
    auto *ct = new JSONContainer(serializedData);
    return std::shared_ptr<void>(ct);
}

Going up the stack to the main function, note the line deserializedData = t->deserialize(serializedData); ...

...
    // declare intermediate data
    const char* serializedData;
    std::shared_ptr<void> deserializedData;

    // for each data set size, run each test
    for (const int testSize: sizeTestsB) {
        // generate the test data, imitate data coming from python program
        PyObject* td = data(testSize);

        for (const std::unique_ptr<SerDesTest>& t: tests) {
            // log the start
            startTest(t->type(), testSize, currentTest, totalTests);

            // mark start, ser/des mark end
            start = std::chrono::steady_clock::now();

            serializedData = t->serialize(td);                                      // Python -> Redis
            checkpoints.push_back(checkpoint(t->type(), testSize,  "PythonToRedis", start));

            deserializedData = t->deserialize(serializedData);            // Redis -> Container
            checkpoints.push_back(checkpoint(t->type(), testSize,  "RedisToContainer", start));
...

This is the function used to turn the Python object into a character array. dumps is a function from Python's json module. I may be misunderstanding the lifetime of the character array.

const char* JSONSerDes::serialize(PyObject * pyJson) {
    // convert pyobject to boost python object
    boost::python::object d = boost::python::extract<boost::python::object>(pyJson);

    // call the dumps function and capture the return value
    return boost::python::extract<const char*>(dumps(d));
}
Tyler Weiss
  • Could you show the code of the construction, including lifetime-related code of the constructor argument ? – SR_ Sep 07 '21 at 06:09
  • @SR_ Changed the post to provide more context. – Tyler Weiss Sep 08 '21 at 03:15
  • What does `t->serialize(td)` return? My guess is that it returns a pointer to a freed `std::string`, so the value managed to remain alive for short strings (small string optimization) but fails for long strings (which overflow onto the heap). The character `Í` is hex 0xCD, which is a common debugging fill value. According to [this table](https://en.wikipedia.org/wiki/Magic_number_(programming)#Magic_debug_values), it is used by MSVC for uninitialized heap data. (Note also that `shared_ptr` is going to be a memory leak, since it won't know how to destruct the `JSONContainer`.) – Raymond Chen Sep 08 '21 at 03:26
  • @RaymondChen it returns a character array, meant to be a buffer of data that can be written to a file / written to redis. I included the method in my post as well. I couldn't figure out how to create a json object on the heap so that's why i created that container object. I'm still learning as you can see so any suggestions are much appreciated – Tyler Weiss Sep 08 '21 at 03:35
  • I updated my post to include a new container that has a destructor, same behavior – Tyler Weiss Sep 08 '21 at 03:55
  • That's too complex for a test. Declare a const char * and initialize it with a long string literal. Instantiate a JSONContainer with this const char * as argument. Does it break ? – SR_ Sep 08 '21 at 04:52
  • Another test you can make, is to return a string object from JSONSerDes::serialize(PyObject*) and then use this string's c_str() as argument : does it break ? – SR_ Sep 08 '21 at 05:12
  • You can't return a character array in C++. If you try, it just returns a pointer to the first element of the array. You are probably returning a pointer to an array that is going out of scope, so by the time you use the pointer, it is invalid. – Raymond Chen Sep 08 '21 at 12:58

1 Answer

I figured out what was wrong. When this handle

boost::python::object rv = dumps(d);

falls out of scope, the data seems to get deleted, which makes me think the "extracted" pointer was just referencing the handle's internal data rather than a copy made for the requested type.

I changed my serialize method to copy the data into a new buffer that I allocate on the heap.

const char* JSONSerDes::serialize(PyObject * pyJson) {
    // convert pyobject to boost python object
    boost::python::object d = boost::python::extract<boost::python::object>(pyJson);

    // keep the result of dumps() alive in a named handle while we read from it
    boost::python::object rv = dumps(d);
    const char* prv = boost::python::extract<const char*>(rv);
    size_t prvLen = strlen(prv);

    // copy the characters, including the null terminator, into a buffer we own
    char* rvBuffer = new char[prvLen + 1];
    memcpy(rvBuffer, prv, prvLen + 1);

    // the caller owns rvBuffer and must release it with delete[]
    return rvBuffer;
}
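
A variation that might be worth considering is to return a std::string instead of a raw buffer, so nobody has to remember to delete[] it. The sketch below only illustrates the lifetime rule and does not use Boost.Python: OwnedText and serializeCopy are hypothetical names, with OwnedText standing in for the handle returned by dumps (an object that owns its characters and frees them when it is destroyed).

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the handle returned by dumps():
// it owns its character data and frees it in the destructor.
class OwnedText {
public:
    explicit OwnedText(const std::string& s) : data_(new char[s.size() + 1]) {
        std::memcpy(data_, s.c_str(), s.size() + 1);  // includes the '\0'
    }
    ~OwnedText() { delete[] data_; }

    OwnedText(const OwnedText&) = delete;
    OwnedText& operator=(const OwnedText&) = delete;

    // Pointer into the internal buffer; valid only while *this is alive.
    const char* c_str() const { return data_; }

private:
    char* data_;
};

// Copy the characters into a std::string while the owner is still alive.
// The returned string owns its own storage, so it stays valid after rv dies.
std::string serializeCopy(const std::string& payload) {
    OwnedText rv(payload);            // plays the role of `rv = dumps(d)`
    return std::string(rv.c_str());   // copy is made before rv is destroyed
}
```

With something like this, the caller would keep the std::string alive for as long as it needs the data and pass its c_str() to the deserialize step.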
Tyler Weiss