
For our application we have the following scenario:

First, we receive a large amount of data (in some cases more than 100 MB) through a 3rd-party API and pass it into our class via a constructor, like this:

class DataInputer
{
public:
    DataInputer(int id, const std::string& data) : m_id(id), m_data(data) {}
    int handle() { /* Do some stuff */ } 
private:
    int m_id;
    std::string m_data;
};

The chain of invocation going into our class DataInputer looks like:

int dataInputHandler()
{
    std::string inputStringFromThirdParty = GiveMeStringFrom3rdPartyMagic(); // <- 1.
    int inputIntFromThirdParty = GiveMeIntFrom3rdPartyMagic();

    return DataInputer(inputIntFromThirdParty, inputStringFromThirdParty).handle();
}

We have some control over how dataInputHandler handles its string (the line marked 1. is where the string is created as an actual object), but no control over what GiveMeStringFrom3rdPartyMagic uses internally to provide it (in case it matters, the data arrives from somewhere over a network connection). As a consolation, we have full control over the DataInputer class.

What the application is supposed to do is hold on to the string and its associated integer ID until a later point, when it can send the string to another component (over a different network connection), provided that component supplies a valid ID (this is the short description). The problem is that we can't (don't want to) do this in the handle method of DataInputer, because that would block it for an unknown amount of time.

As a rudimentary solution, we were thinking of creating an "in-memory" string store for all the strings that come in from the various network clients, which boils down to a:

std::map<int, std::string> idStringStore;

where the int is the ID of the string, the string is the actual data, and DataInputer::handle does something like idStringStore.emplace(m_id, m_data);

The problem is that unnecessarily copying a string on the order of hundreds of megabytes can be very time-consuming, so I would like to ask the community for recommendations or best practices for scenarios like this.

An important mention: we are bound to C++11 for now :(

Ferenc Deak
  • Sounds like what you need is to *move* the string into the map and then have the class hold a "reference" to the string. ("reference" could be a `string_view` so you keep copy semantics) – NathanOliver Jan 06 '20 at 14:09
  • @NathanOliver sorry, I forgot to mention initially we are bound to C++11 for now – Ferenc Deak Jan 06 '20 at 14:14
  • No worries. You can replace `string_view` with just holding a pointer or iterator to the string as well if you want to keep the default copy semantics. Moving is a C++11 feature so you still get that performance gain. – NathanOliver Jan 06 '20 at 14:15
  • If the goal is to eliminate the copies of `inputDataFromThirdParty`, it's feasible, but both `DataInputer::DataInputer` and `dataInputHandler` would have to change to properly push the data through via move semantics. – WhozCraig Jan 06 '20 at 14:15
  • Won't the references to the string become invalid once `DataInputer(inputIntFromThirdParty, inputDataFromThirdParty).handle();` has finished and `dataInputHandler` returns? (It is a method call on a temporary object.) – Ferenc Deak Jan 06 '20 at 14:17
  • Not if you store the string in the map and have the reference point to that. You'd have to rejigger the workflow a little, but that's not a big deal. Encapsulating it in a `make_dataInputer` function would work nicely. – NathanOliver Jan 06 '20 at 14:24
  • @NathanOliver Thanks, let me see what can I come up with :) – Ferenc Deak Jan 06 '20 at 14:29
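The approach suggested in the comments (move the string into the map, then let the class hold a pointer to the stored string) could look roughly like the sketch below. It is only an illustration under assumptions: the body of `make_dataInputer`, the pointer member, and the `data()` accessor are hypothetical and not part of the original code. It relies on the fact that pointers to `std::map` elements stay valid when other elements are inserted.

```cpp
#include <map>
#include <string>
#include <utility>

// In-memory store from the question; the map owns the strings.
std::map<int, std::string> idStringStore;

class DataInputer
{
public:
    // Holds a non-owning pointer, so copying a DataInputer stays cheap.
    DataInputer(int id, const std::string* data) : m_id(id), m_data(data) {}

    // Hypothetical accessor, added only for this illustration.
    const std::string& data() const { return *m_data; }

private:
    int m_id;
    const std::string* m_data; // dangles if the map entry is erased
};

// Hypothetical factory: moves the payload into the store (no deep copy)
// and returns an object that refers to the stored string.
// Assumption: duplicate IDs are ruled out by the caller; if the ID already
// exists, the returned object refers to the previously stored string.
DataInputer make_dataInputer(int id, std::string&& data)
{
    auto result = idStringStore.emplace(id, std::move(data));
    return DataInputer(id, &result.first->second);
}
```

The pointer remains usable after the temporary `DataInputer` is gone because the string lives in the map, but whoever erases the entry later must make sure no `DataInputer` still refers to it.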

1 Answer


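Use move semantics to pass the 3rd-party data into your DataInputer constructor. Note that the std::move in the initializer list is required: data is a named rvalue reference and therefore an lvalue inside the constructor, so without std::move the string would be copied into m_data: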

class DataInputer
{
public:
    DataInputer(int id, std::string&& data) : m_id(id), m_data(std::move(data)) {}
    int handle() { /* Do some stuff */ } 
private:
    int m_id;
    std::string m_data;
};

And pass GiveMeStringFrom3rdPartyMagic() directly as an argument to the constructor, so that the returned temporary binds to the rvalue reference, instead of first storing it in inputStringFromThirdParty.

int dataInputHandler()
{
    int inputIntFromThirdParty = GiveMeIntFrom3rdPartyMagic();

    return DataInputer(inputIntFromThirdParty, GiveMeStringFrom3rdPartyMagic()).handle();
}

From there, you can move the string onward into a std::map, or any other standard container, in the same way. The point is that move semantics are, in general, the tool for avoiding needless copies.
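As a rough C++11 sketch of how the pieces might fit together, handle() could move m_data into the idStringStore map from the question, so the payload is never deep-copied on its way into the store. The return-code convention (0 on success, -1 for a duplicate ID) and the duplicate check are assumptions made for illustration only.

```cpp
#include <map>
#include <string>
#include <utility>

// In-memory store from the question.
std::map<int, std::string> idStringStore;

class DataInputer
{
public:
    DataInputer(int id, std::string&& data) : m_id(id), m_data(std::move(data)) {}

    // Moves the payload into the store instead of copying it.
    // After a successful call, m_data is in a valid but unspecified state.
    // Assumed convention: returns 0 on success, -1 if the ID already exists.
    int handle()
    {
        if (idStringStore.count(m_id) != 0)
            return -1; // keep the existing entry and leave m_data untouched
        idStringStore.emplace(m_id, std::move(m_data));
        return 0;
    }

private:
    int m_id;
    std::string m_data;
};
```

Because the map owns the string once handle() returns, it no longer matters that the DataInputer temporary is destroyed at the end of the full expression in dataInputHandler. If several threads feed the store, access to the map would additionally need to be synchronized.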

Matthew M.
  • There is an issue with the DataInputer being constructed as a temporary that immediately goes out of scope when `handle()` returns. That's a different issue than you asked about, however. – Matthew M. Jan 06 '20 at 14:41