19

How can I take ownership of std::string char data without copying and without keeping the source std::string object? (I want to use move semantics, but between different types.)

I use the C++11 Clang compiler and Boost.

Basically I want to do something equivalent to this:

{
    std::string s("Possibly very long user string");
    const char* mine = s.c_str();

    // 'mine' will be passed along,
    pass(mine);

    //Made-up call
    s.release_data();

    // 's' should not release data, but it should properly destroy itself otherwise.
}

To clarify, I do need to get rid of the std::string further down the road. The code deals with both string and binary data and should handle them in the same format. And I do want the data from the std::string, because it comes from another code layer that works with std::string.

To give more perspective on where I run into this: for example, I have an asynchronous socket wrapper that should be able to take both std::string and binary data from the user for writing. Both "API" write versions (taking std::string or raw binary data) internally resolve to the same (binary) write. I need to avoid any copying, as the string may be long.

WriteId     write( std::unique_ptr< std::string > strToWrite )
{

    // Convert std::string data to contiguous byte storage
    // that will be further passed along to other
    // functions (also with move semantics).
    // strToWrite->c_str() would be a solution to my problem
    // if I could tell strToWrite to simply give up its
    // ownership. Is there a way?

    std::unique_ptr<std::vector<char> > dataToWrite= ??

    //
    scheduledWrite( std::move(dataToWrite) );
}

void scheduledWrite( std::unique_ptr< std::vector<char> > data)
{
    …
}

std::unique_ptr is used in this example only to illustrate ownership transfer: any other approach with the same semantics is fine with me.

I am wondering about solutions to this specific case (with the std::string char buffer) and to this sort of problem with strings, streams and similar types in general: tips on how to approach moving buffers around between string, stream, std containers and buffer types.

I would also appreciate tips and links on C++ design approaches and specific techniques for passing buffer data between different APIs/types without copying. I mention streams but am not using them, because I'm shaky on that subject.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
minsk
  • 755
  • 1
  • 8
  • 16
  • 1
    You can't, because there is no way you can reclaim the memory safely. At one point you ought to release the buffer, so why not keep the string all the way down, which does this automatically ? – Alexandre C. Jul 02 '12 at 21:32
  • You better write your own string implementation – Gigi Jul 02 '12 at 21:33
  • 4
    `std::unique_ptr<char[]>` would be the only thing that allows anything similar. – ildjarn Jul 02 '12 at 21:33
  • @ildjarn: that is exactly my question: how to convert unique_ptr<std::string> to unique_ptr<char[]> or equivalent without copying data. – minsk Jul 02 '12 at 22:07
  • @minsk : You can't; you would need to start with `std::unique_ptr<char[]>` instead of `std::string` to begin with. – ildjarn Jul 02 '12 at 22:10
  • @Gigi: as I mentioned in my question, std::string "comes from another code layer that works with std::string". The only thing I want to require from that layer is to pass ownership of the string. So I can impose that std::string is allocated on the heap (to have it in unique_ptr or equivalent) but not change the data type of std::string itself. The problem is to steal that std::string data into my own "string" or whatever implementation. – minsk Jul 02 '12 at 22:11
  • 2
    @minsk : I think everyone is clear on your scenario, but you're not getting it -- _it isn't possible_. ;-] – ildjarn Jul 02 '12 at 22:23
  • 1
    Also, you are aware that `std::string` stores binary data (including embedded nulls) just fine, right? Are you _sure_ you can't just continue using `std::string`? – ildjarn Jul 02 '12 at 22:32
  • @minsk : `std::string` holds _any_ `char`, it's not limited to printable characters. If you had problems previously it's because you used `std::string::c_str()` -- when storing binary data you must use `std::string::data()` instead. – ildjarn Jul 02 '12 at 23:24
  • @ildjarn How would I check data size with embedded nulls, from [http://www.cplusplus.com/](http://www.cplusplus.com/reference/string/string/size/) , size() "Returns a count of the number of characters in the string.", "string::length is an alias of string::size, returning both the exact same value" – minsk Jul 03 '12 at 00:02
  • 3
    @minsk : Er, with `size()` or `length()` -- neither of those care about embedded nulls (and use [cppreference](http://cppreference.com/) rather than cplusplus.com if you want reliable information :-]). – ildjarn Jul 03 '12 at 00:04
  • @ildjarn That's good to know. It's not clear at all from the specs, especially why length() would do that. But still, the question was about unifying and storing passed std::string and raw binary data from the user. If I simply choose to use std::string for both internally, I would have the opposite problem: moving a raw buffer (dynamically allocated by the user) into my std::string. – minsk Jul 03 '12 at 00:51
  • What data type does the user allocate? And can you force them to change it? – ildjarn Jul 03 '12 at 01:23
  • @ildjarn Raw byte data: new char[] or any abstraction of it. The module in question should allow user to send both (null terminated) std::string and also raw binary data. I am free to choose any specific signatures as long as it allows user to pass these two types depending on their needs, ie pointer with location to heap holding bytes or std::string (std::string is also on the heap). I dont want to force any signature that would just transfer this problem from mine to their layer. – minsk Jul 03 '12 at 02:11
  • @minsk : **Again**, given that `std::string` is _not_ null-terminated, and can contain _any_ characters including non-textual and NUL characters, why not just mandate `std::string`? Anyway, if you really need this, your internal code could work in terms of `boost::variant<std::unique_ptr<char[]>, std::string>` ([docs](http://www.boost.org/libs/variant/)), allowing the user to pass either and losing no efficiency on your part (just requiring two implementations inside the visitor). – ildjarn Jul 03 '12 at 02:15
  • @ildjarn Just checked, you are wrong about std::string, both length and size return # of characters up to the first embedded null. Test it for yourself. And you shouldn't give answers if you didn't try it and dont know it: std::string s("\a\0b"); size_t test = s.length(); size_t testSize = s.size(); – minsk Jul 03 '12 at 02:37
  • PS: the test == testSize == 1 – minsk Jul 03 '12 at 02:39
  • 2
    @minsk : The _C++ standard_ says **you** are wrong; §21.4.4/1: "_Returns:_ A count of the number of char-like objects currently in the string." You shouldn't rely on a single implementation to have the correct behavior, you should rely on the standard for _mandated_ behavior; standard library implementations have bugs too! – ildjarn Jul 03 '12 at 02:40
  • Probable duplicate: [C++: Is it possible to detach the char\* pointer from an std::string object?](https://stackoverflow.com/q/8699212) – Peter Cordes Nov 11 '21 at 14:08
  • re: the argument in comments about `.size()` vs. `.length()` from 10 years ago: a std::string can include `0` bytes, but the `string(char*)` constructor is not a possible way to get them there, because it takes a C string. So the test is invalid, not a bug in the library. – Peter Cordes Nov 11 '21 at 14:11
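
The point made in the last comment can be checked with a small sketch. The two helper names below are made up for illustration. The first passes the length explicitly, so the embedded NUL is kept; the second goes through the C-string constructor, which stops at the first NUL before std::string ever sees the rest.

```cpp
#include <string>

// Hypothetical helper: builds the string with an explicit length,
// so all 3 bytes ('\a', '\0', 'b') are stored.
inline std::string make_with_length() {
    return std::string("\a\0b", 3);
}

// Hypothetical helper: uses the const char* constructor, which treats
// its argument as a C string and therefore stops at the first '\0'.
inline std::string make_from_c_string() {
    return std::string("\a\0b");
}
```

With these, make_with_length().size() is 3 while make_from_c_string().size() is 1, which matches the result reported in the test above without any library bug being involved.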

3 Answers

12

How can I take ownership of std::string char data without copying and without keeping the source std::string object? (I want to use move semantics, but between different types.)

You cannot do this safely.

For a specific implementation, and in some circumstances, you could do something awful like use aliasing to modify private member variables inside the string to trick the string into thinking it no longer owns a buffer. But even if you're willing to try this, it won't always work. For example, consider the small string optimization, where a string does not hold a pointer to some external buffer; the data is stored inside the string object itself, so there is no separate allocation to take over.
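
As a rough illustration of the small string optimization, the sketch below checks whether a string's character buffer lies inside the string object itself. The function name stores_data_inline is made up for this example, and whether a short string is stored inline at all is an implementation detail the standard does not specify.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if the character buffer of s is located
// inside the std::string object itself (i.e. no separate heap buffer is used).
inline bool stores_data_inline(const std::string& s) {
    const std::uintptr_t object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const std::uintptr_t object_end = object_begin + sizeof(std::string);
    const std::uintptr_t buffer = reinterpret_cast<std::uintptr_t>(s.data());
    return buffer >= object_begin && buffer < object_end;
}
```

On common implementations this tends to return true for something like std::string("hi") and false for a string of a thousand characters, which is why any trick that "detaches" the buffer cannot work in general.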


If you want to avoid copying you could consider changing the interface to scheduledWrite. One possibility is something like:

template<typename Container>
void scheduledWrite(Container data)
{
    // requires data[i], data.size(), and &data[n] == &data[0] + n for n in [0, size)
    …
}

// move resources from object owned by a unique_ptr
WriteId write( std::unique_ptr< std::vector<char> > vecToWrite)
{
    scheduledWrite(std::move(*vecToWrite));
}

WriteId write( std::unique_ptr< std::string > strToWrite)
{
    scheduledWrite(std::move(*strToWrite));
}

// move resources from object passed by value (callers also have to take care to avoid copies)
WriteId write(std::string strToWrite)
{
    scheduledWrite(std::move(strToWrite));
}

// assume ownership of raw pointer
// requires data to have been allocated with new char[]
WriteId write(char const *data,size_t size) // you could also accept an allocator or deallocation function and make ptr_adapter deal with it
{
    struct ptr_adapter {
        std::unique_ptr<char const []> ptr;
        size_t m_size;
        char const &operator[] (size_t i) const { return ptr[i]; }
        size_t size() const { return m_size; }
    };

    // unique_ptr's pointer constructor is explicit, so it has to be spelled out here
    scheduledWrite(ptr_adapter{std::unique_ptr<char const []>(data), size});
}
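
A self-contained sketch of the same idea follows. The names PendingWrite and enqueue are hypothetical and stand in for whatever the real write queue looks like; the only point is that the container is moved into the sink, so a std::vector<char> or a long std::string hands over its buffer instead of copying it.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical holder for one pending write. It owns the container it was given.
template <typename Container>
struct PendingWrite {
    Container data;

    // Address of the first byte, or nullptr when there is nothing to write.
    const char* bytes() const { return data.empty() ? nullptr : &data[0]; }
    std::size_t size() const { return data.size(); }
};

// Hypothetical sink: takes the container by value and moves it into the holder.
// Callers pass std::move(x) so that no byte of the buffer is copied.
template <typename Container>
PendingWrite<Container> enqueue(Container data) {
    return PendingWrite<Container>{std::move(data)};
}
```

A caller would write something like enqueue(std::move(myVector)) or enqueue(std::move(myString)). For std::vector the heap buffer is the same one before and after; for std::string that is only the case when the string is not stored inline.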
bames53
  • 86,085
  • 15
  • 179
  • 244
  • 3
    @minsk: It's quite reasonable to want that, unfortunately it's just not possible, because the class isn't designed to allow it. – Benjamin Lindley Jul 02 '12 at 21:34
  • @minsk: You don't know how the buffer is supposed to be released. Since there is no `release` member, you can't achieve what you want with `string`. – Alexandre C. Jul 02 '12 at 21:41
  • Those are good points: small string optimization and knowing how to release another implementation buffer. What about std::stringstream, can I move std::string into std::stringstream which exposes its buffers? Those are both std objects, and std::stringstream is aware of std::string.. I would really want to find a solution which avoids copying and allows part of the code to work with strings :( – minsk Jul 02 '12 at 22:00
  • @Alexandre: I dont want to keep std::string all the way because i want to internally unify implementation for string or binary data. Otherwise i have to keep track of two version. – minsk Jul 02 '12 at 22:05
  • @minsk : "*What about std::stringstream, can i move std::string into std stream somehow?*" Nope, `std::basic_stringbuf<>` takes its string argument by const-reference. – ildjarn Jul 02 '12 at 22:09
  • This looks a lot like the proposed `std::array_ref` and `std::string_ref` classes for C++1y. – deft_code Jul 03 '12 at 01:07
2

You could use polymorphism to resolve this. The base type is the interface to your unified data buffer implementation. Then you would have two derived classes. One for std::string as the source, and the other uses your own data representation.

struct MyData {
    virtual void * data () = 0;
    virtual const void * data () const = 0;
    virtual unsigned len () const = 0;
    virtual ~MyData () {}
};

struct MyStringData : public MyData {
    std::string data_src_;
    //...
};

struct MyBufferData : public MyData {
    MyBuffer data_src_;
    //...
};
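
One way the elided parts might be filled in is sketched below. It is only an assumption about what the derived classes could look like: MyBuffer is not defined in this answer, so a std::vector<char> stands in for it, the non-const data() overload is left out for brevity, and the constructors and the total_length helper are made up for the example.

```cpp
#include <string>
#include <utility>
#include <vector>

// Reduced version of the interface above (const access only).
struct MyData {
    virtual const void* data() const = 0;
    virtual unsigned len() const = 0;
    virtual ~MyData() {}
};

// Wraps a std::string; the string is moved in, not copied.
struct MyStringData : public MyData {
    explicit MyStringData(std::string src) : data_src_(std::move(src)) {}
    const void* data() const override { return data_src_.data(); }
    unsigned len() const override { return static_cast<unsigned>(data_src_.size()); }
    std::string data_src_;
};

// Wraps raw bytes; std::vector<char> is a stand-in for MyBuffer.
struct MyBufferData : public MyData {
    explicit MyBufferData(std::vector<char> src) : data_src_(std::move(src)) {}
    const void* data() const override { return data_src_.data(); }
    unsigned len() const override { return static_cast<unsigned>(data_src_.size()); }
    std::vector<char> data_src_;
};

// Hypothetical consumer that only sees the common interface.
inline unsigned total_length(const MyData& a, const MyData& b) {
    return a.len() + b.len();
}
```

Code further down the line then works with MyData references or pointers and never needs to know which of the two sources the bytes came from.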
jxh
  • 69,070
  • 8
  • 110
  • 193
  • user315052 I marked this answer up because it is a solution and thx for answering. But I would avoid this approach for a number of reasons including possible virtual inheritance hit, type safety, management issues; it imposes a data type (MyData) further down the road. May become very cumbersome. I'll have to have some sort of unique access around data_src_, on top of it i'll have to new MyData and wrap that (to pass it around including to other threads). If I have to go with a wrapper, I'd rather use less intrusive & safer approach without virtual, suggested in 1st answer by bames53 – minsk Jul 03 '12 at 00:32
2

This class takes ownership of a string using move semantics and shared_ptr:

struct charbuffer
{
  charbuffer()
  {}

  charbuffer(size_t n, char c)
  : _data(std::make_shared<std::string>(n, c))
  {}

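  // std::move added per the comment below: str is a named rvalue reference
  // (an lvalue here), so without it the string would be copied
  explicit charbuffer(std::string&& str)
  : _data(std::make_shared<std::string>(std::move(str)))
  {}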

  charbuffer(const charbuffer& other)
  : _data(other._data)
  {}

  charbuffer(charbuffer&& other)
  {
    swap(other);
  }

  charbuffer& operator=(charbuffer other)
  {
    swap(other);
    return *this;
  }

  void swap(charbuffer& other)
  {
    using std::swap;
    swap(_data, other._data);
  }

  char& operator[](int i)
  { 
    return (*_data)[i];
  } 

  char operator[](int i) const
  { 
    return (*_data)[i];
  } 

  size_t size() const
  {
    return _data->size();
  }

  bool valid() const
  {
    // shared_ptr's conversion to bool is explicit, so it must be cast in a return
    return static_cast<bool>(_data);
  }

private:
  std::shared_ptr<std::string> _data;

};

Example usage:

std::string s("possibly very long user string");

charbuffer cb(std::move(s)); // s is now moved from (valid but unspecified, typically empty)

// use charbuffer...
Gigi
  • 4,953
  • 24
  • 25
  • As far as I understand, the charbuffer that is moved from will then hold an empty shared_ptr (the same one that is default-constructed in the move constructor), so when the moved-from charbuffer goes out of scope and its destructor is called, nothing happens. – Gigi Jul 02 '12 at 22:45
  • You're 100% right, not sure what I was thinking now. :-P Sorry for the noise. – ildjarn Jul 02 '12 at 22:49
  • The move ctor `charbuffer(std::string&& str)` does not actually move from the string. You are missing a call to `std::move` in the initialisation. It should be `_data(std::make_shared<std::string>(std::move(str)))`. – TimeS Jul 16 '19 at 08:45