4

A common pattern in C programming involves a variable length structure such as:

typedef struct {
    int length;
    char data[1];
} MyBuffer;

where data isn't literally an array of [1]. Instead, its variable length is defined by length.

The structure is allocated like:

MyBuffer* pBuff = malloc(sizeof(MyBuffer) + 100);

I want to use the same pattern in C++ code, using new/delete instead of malloc/free.

Can this same pattern be used in C++ code? How?

EDIT Since several answers and comments are suggesting I switch to std::vector:

I am supplied with the structure definition MyBuffer from a 3rd party, C library.
In my C++ app, I need to allocate the buffer and call a function in the C library.

On "my side" of the boundary, I prefer to keep things C++, and allocate this structure the C++-way, but I still need to pass it to a C-library that won't understand anything like a std::vector.

abelenky
  • 2
    In C++, you don't do this, you just use a std::vector (or if you are looking for string functionality, std::string) – IdeaHat Sep 06 '13 at 15:56
  • 2
    This is an old pattern: C99 lets you write `typedef struct {int length;char data[];} MyBuffer;` – Sergey Kalinichenko Sep 06 '13 at 15:56
  • Certainly, it can. I guess you have a need for using this kind of pattern rather than something from the standard library which takes care of a lot of the bookkeeping? – John Sep 06 '13 at 15:57
  • @MadScienceDreams: I'm writing a C++ app, but with a library provided in C. So I don't get to change the definition of the structure. I need to allocate the structure properly in C++, then send it to the C library. – abelenky Sep 06 '13 at 15:58
  • 3
    Personally, I'd use a `std::vector<char> buff(sizeof(MyBuffer) + 100); MyBuffer* pBuff=(MyBuffer*)&buff[0];` This has the added bonus that the vector is cleaned up when it goes out of scope. Using `new` isn't going to help you here, I'm afraid. – IdeaHat Sep 06 '13 at 16:03
  • @abelenky Just to clarify, I'm not proposing switching to vector, just switching the memory management to vector. – IdeaHat Sep 06 '13 at 16:09

7 Answers

6

If you need to maintain compatibility with the existing C code you are using, then it works with C++, pretty much unchanged (just a need to cast the return from malloc()).

#include <stdlib.h>

typedef struct {
    int length;
    char data[1];
} MyBuffer;

void f() {

    MyBuffer* pBuff = (MyBuffer *)malloc(sizeof(MyBuffer) + 100);
}

This compiles without issue using g++.

If you are concerned about managing the memory allocated by malloc(), you could create a class to manage it and expose the MyBuffer pointer via a member function, or hand the pointer to a smart pointer with free as the deleter, for example:

std::shared_ptr<MyBuffer> buf((MyBuffer *)malloc(sizeof(MyBuffer) + 100), free);

Which is pretty cumbersome, I'll admit...
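To make that less cumbersome, the malloc() call and the deleter can be hidden behind a small factory function. This is only a sketch, assuming C++11: make_my_buffer, FreeDeleter and MyBufferPtr are names I made up for illustration, and the struct is repeated so the snippet stands alone.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Same layout as the struct supplied by the C library.
typedef struct {
    int length;
    char data[1];
} MyBuffer;

// Deleter that hands the block back to free(), matching malloc().
struct FreeDeleter {
    void operator()(MyBuffer* p) const { std::free(p); }
};

typedef std::unique_ptr<MyBuffer, FreeDeleter> MyBufferPtr;

// Hypothetical helper (not part of any library): allocates the struct plus
// extra_bytes of trailing storage and records the length.
MyBufferPtr make_my_buffer(std::size_t extra_bytes) {
    assert(extra_bytes <= static_cast<std::size_t>(INT_MAX));
    void* raw = std::malloc(sizeof(MyBuffer) + extra_bytes);
    if (!raw) throw std::bad_alloc();
    MyBuffer* buf = static_cast<MyBuffer*>(raw);
    buf->length = static_cast<int>(extra_bytes);
    return MyBufferPtr(buf);
}
```

A caller would write something like `MyBufferPtr buf = make_my_buffer(100);` and pass `buf.get()` to the C function; the block is freed when buf goes out of scope.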

trojanfoe
5

This is not idiomatic in C++, nor is it needed. The language provides std::vector<unsigned char>, which wraps up the size and buffer for you, or, if the size is fixed at compile time, C++11 provides std::array<unsigned char, N>.

EDIT: The key to note here is don't allocate the vector by itself on the heap! If you put the vector by-value on the stack or within another object and properly size it at construction you'll use exactly the same number of allocations as the C-version (one). BUT you'll be using the idiomatic language feature AND prevent yourself from making a wide variety of memory errors and/or leaks.
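For the case where the struct layout is dictated by a C library, one way to apply this is to let the vector own the raw bytes and place the struct at the front of them. The following is a rough sketch, assuming C++11; VectorBackedBuffer is a made-up name, and the code assumes the vector's storage is suitably aligned for MyBuffer (true for the default allocator on mainstream platforms, and checked with an assert here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Same layout as the struct supplied by the C library.
typedef struct {
    int length;
    char data[1];
} MyBuffer;

// Hypothetical wrapper: the vector owns the bytes, the MyBuffer lives in them.
class VectorBackedBuffer {
public:
    explicit VectorBackedBuffer(std::size_t data_bytes)
        : storage_(sizeof(MyBuffer) + data_bytes) {
        // Assumption: the allocator returns memory aligned for MyBuffer.
        assert(reinterpret_cast<std::uintptr_t>(storage_.data())
               % alignof(MyBuffer) == 0);
        // Placement new starts the lifetime of a MyBuffer at the front.
        MyBuffer* p = new (storage_.data()) MyBuffer;
        p->length = static_cast<int>(data_bytes);
    }

    // Pointer suitable for handing to the C library.
    MyBuffer* get() { return reinterpret_cast<MyBuffer*>(storage_.data()); }

private:
    std::vector<unsigned char> storage_;  // one heap allocation, freed automatically
};
```

Declared as a local variable or a data member, this performs a single allocation, the same as the malloc version, and needs no explicit cleanup.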

Mark B
  • 3
    This C++ solution, like so many STL solutions, involves at least twice as many calls to malloc (or something like it) under the hood. Reducing such calls (and pointer indirections required by having a pointer to a dynamic object with a pointer to another dynamic object) is the reason behind the C idiom. – Mike Housky Sep 06 '13 at 16:09
  • @MikeHousky: `std::vector` doesn't do that much under the hood, it certainly doesn't store pointers in another dynamic object. Really, it's little more than a type-safe and polymorphic (but fully compile-time resolved) encapsulation of the C idiom. Of course if you use its resizing capabilities, it will need multiple mallocs, but you don't need to do that. – leftaroundabout Sep 06 '13 at 16:30
  • 2
    @Mike Housky See my edit. If you're using `vector` in a sane way it shouldn't have any more allocations than the C version. – Mark B Sep 06 '13 at 16:44
  • How then does std::vector support push_back()? – Mike Housky Sep 06 '13 at 19:13
  • @MarkB,lefroundabout: I think I now see what you are saying. I've been picturing a struct that lives longer than the function that created it, requiring a new vector and a vector* pointer. If the vector<> can live as a variable instead of a dynamic object, then I agree. No double pointers. – Mike Housky Sep 06 '13 at 20:09
  • @Mike Housky: Yes, standard containers are meant to be used as variables in and of themselves, so they can manage the memory for you. Doing something like `vector* v = new vector;` destroys the point of having the standard containers in the first place. – In silico Sep 06 '13 at 21:29
3

I think this will do the trick, give or take a couple syntax errors.

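#include <cstddef>

class MyBuffer {
 // Important, this class must not have any virtual methods.
 public:
  // Placement form, used as `new (data_length) MyBuffer`. The compiler passes
  // sizeof(MyBuffer) as `size`; data_length is the number of extra bytes.
  void* operator new(std::size_t size, std::size_t data_length) {
    MyBuffer* buffer = reinterpret_cast<MyBuffer*>(new char[size + data_length]);
    buffer->length = static_cast<int>(data_length);
    return buffer;
  }

  // The storage came from new char[], so release it the same way.
  void operator delete(void* p) { delete[] static_cast<char*>(p); }

 private:
  int length;
  char data[1];
};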

Edit:

One major downside of this technique is that it's fairly common for debug builds to override the global operator new, providing runtime checks for buffer overflows and memory leaks. I'm not sure how well this would interact with non-standard implementations of the global operator new.

Kennet Belenky
2

You can use a template in C++, a feature C lacks. The template parameter determines the size of the array.

template <unsigned N>
struct MyBufferTemplate {
    int length;
    char data[N];
};

If N is not known at compile time, you can define a set of reasonably sized values and perform a best-fit allocation.

However, if that seems too wasteful of memory, then another approach is to define an interface over std::vector<int> (this is based on a comment by MadScienceDreams).

struct MyBufferAdapter {
    MyBufferAdapter (int databytes = 1)
        : buf_(1 + (databytes+sizeof(int))/sizeof(int)) { buf_[0] = databytes; }
    void resize (int newdatabytes) {
        int newlength = 1 + (newdatabytes+sizeof(int))/sizeof(int);
        buf_.resize(newlength);
        buf_[0] = newdatabytes;
    }
    int & length () { return buf_[0]; }
    int length () const { return buf_[0]; }
    char * data () { return reinterpret_cast<char *>(&buf_[1]); }
    const char * data () const { return reinterpret_cast<const char *>(&buf_[1]); }
    operator MyBuffer * () { return reinterpret_cast<MyBuffer *>(&buf_[0]); }
    operator const MyBuffer * () const {
        return reinterpret_cast<const MyBuffer *>(&buf_[0]);
    }
private:
    std::vector<int> buf_;
};
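The adapter only works if data starts exactly one int after the beginning of the struct, since element 0 of the vector stands in for length and element 1 onward for data. A compile-time check along these lines (a sketch, assuming C++11 for static_assert) would catch a platform where that does not hold:

```cpp
#include <cstddef>

// Same layout as the struct supplied by the C library.
typedef struct {
    int length;
    char data[1];
} MyBuffer;

// The vector<int> trick maps buf_[0] onto 'length' and &buf_[1] onto 'data',
// so both offsets must line up with whole ints.
static_assert(offsetof(MyBuffer, length) == 0,
              "length must be the first member");
static_assert(offsetof(MyBuffer, data) == sizeof(int),
              "data must immediately follow length");
```

If either assertion fails, the reinterpret_cast to MyBuffer * in the adapter would point the C library at the wrong bytes.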
jxh
  • 4
    @MikeHousky: There is nothing in my solution that uses VLA. – jxh Sep 06 '13 at 16:15
  • @Mike Housky In no case can `N` (a template parameter) be a runtime variable. – Mark B Sep 06 '13 at 17:08
  • Deleting old comments: Misread "in the case that" as "in this case". Many apologies. Now the problem is that the problem really does need a run-time variable for the length. I guess I should post an alt answer instead of carping from the sidelines. – Mike Housky Sep 06 '13 at 17:12
1

Personally, I'd go with malloc and free. However, you could go all out with new[], placement new and delete[]:

#include <new>

struct MyBuffer {
    int length;
    char data[1];
};

MyBuffer* make_a_buffer(int size)
{
    // allocate buffer large enough for what we want
    char* raw_memory = new char[sizeof(MyBuffer) + size];

    // call placement new to put a MyBuffer in the raw memory
    MyBuffer* buffer = new (raw_memory) MyBuffer;
    buffer->length = size;
    return buffer;
}

void destroy_a_buffer(MyBuffer* buffer)
{
    // in this case, MyBuffer has a trivial (default) destructor, so this isn't
    // really needed, but in other cases you may need to call the
    // destructor
    //
    // NOTE: there is placement new, but no placement delete
    // this is the only way to correctly destroy the object
    buffer->~MyBuffer();

    // we've destroyed the object, and now we need to release the
    // memory, luckily we know we got it from new[], so we can
    // delete[] it
    delete[] static_cast<char*>(static_cast<void*>(buffer));
}
Max Lybbert
  • Why are you performing nested static_casts on the last line? – Kennet Belenky Sep 06 '13 at 17:10
  • @KennetBelenky: `delete[]` requires that I pass it a pointer that is the same type as what `new[]` returned. A C-style cast or a `reinterpret_cast` is a more common way to convert the pointer's type, but I prefer `static_cast`-ing through `void*`. – Max Lybbert Sep 06 '13 at 17:18
  • Found it: http://lists.boost.org/Archives/boost/2002/10/37185.php (the reason I prefer chained `static_cast`s to `reinterpret_cast`). – Max Lybbert Sep 06 '13 at 17:43
  • Sure, I get why you need one static cast. What I don't understand is why you need to static cast it to void* first. In the absence of polymorphism, I fail to see how the nested casts are any different from just `static_cast<char*>(buffer)`. – Kennet Belenky Sep 06 '13 at 17:47
  • @KennetBelenky: `static_cast` can convert related pointer types directly ( boost.org/doc/libs/1_39_0/libs/conversion/cast.htm ), or it can convert any `T*` to `void*`, and `void*` to any `T*`. That is, `static_cast<char*>(buffer)` is a compile error unless buffer is a `void*`. – Max Lybbert Sep 06 '13 at 19:07
1

If you're bound to using the C struct, but want to use a better approach in C++, you can use a combination of templates and inheritance:

#include <iostream>
#include <memory>
#include <stdlib.h>

// Here's your C struct.
// Old C-style usage would be:  MyBuffer* pBuff = malloc(sizeof(MyBuffer) + 100);
// Which effectively gives you a 101-byte array for 'data'.
// (1 for the original array, +100).
typedef struct {
    int length;
    char data[1];
} MyBuffer;

// This defines a generic template that inherits from whatever you want, and
// adds some padded space to the end of it.  The 'extra_bytes' is equivalent
// to the '+100' you used to do in the c-style malloc trick (i.e. this still
// allocates a 101-byte array for 'data').
template<typename T, size_t extra_bytes>
struct ExtraBytes : public T {
  char padding[extra_bytes];
};

// If you just want to wrap your one struct, you can use this.  The constructor
// even sets the length for you.
template<size_t array_size>
struct MyBufferWrapper : public MyBuffer {
  char padding[array_size - 1];  // 1 is already allocated to 'data'
  MyBufferWrapper() {
    length = array_size;
  }
};

int main(int, char**) {
  MyBuffer normal;
  std::cout << "Sizeof normal MyBuffer = " << sizeof(normal) << "\tlength = "
         << normal.length << "\n";  // length is uninitialized

  MyBuffer* pBuff = static_cast<MyBuffer*>(malloc(sizeof(MyBuffer) + 100));
  std::cout << "Sizeof malloc'd MyBuffer = " << sizeof(*pBuff) << "\tlength = "
         << pBuff->length << "\n";  // length is uninitialized

  ExtraBytes<MyBuffer, 100> extra_bytes;
  std::cout << "Sizeof templated ExtraBytes = " << sizeof(extra_bytes)
         << "\tlength = " << extra_bytes.length << "\n";  // length is uninitialized

  MyBufferWrapper<100> wrapper;
  std::cout << "Sizeof Wrapped MyBuffer = " << sizeof(wrapper)
         << "\tlength = " << wrapper.length << "\n";  // length is set to 100

  // If you really want heap allocation, a smart pointer can own the wrapper:
  auto heap = std::make_shared<MyBufferWrapper<100>>();
  std::cout << "Sizeof heap-allocated Wrapper = " << sizeof(*heap)
         << "\tlength = " << heap->length << "\n";  // length is 100

  return 0;
}

Note that with this approach, you don't need to use malloc/free nor new/delete. You just declare your MyBufferWrapper with whatever arraysize you want, it gets allocated on the stack, and you use it (you can treat it like a normal MyBuffer). If you want to use heap-allocated memory, you can just use std::unique_ptr or std::shared_ptr.
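Passing the wrapper to the C library needs no cast, because a pointer to the derived struct converts implicitly to MyBuffer*. Here is a small sketch of that; c_library_length is a made-up stand-in for the third-party function, and the struct and wrapper are repeated so the snippet stands alone. Note that this relies on the compiler placing padding directly after data, which is what mainstream compilers do but is an assumption rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the struct supplied by the C library.
typedef struct {
    int length;
    char data[1];
} MyBuffer;

template<std::size_t array_size>
struct MyBufferWrapper : public MyBuffer {
    char padding[array_size - 1];  // 1 byte is already provided by 'data'
    MyBufferWrapper() { length = static_cast<int>(array_size); }
};

// Hypothetical stand-in for a function exported by the C library.
extern "C" int c_library_length(const MyBuffer* buf) {
    return buf->length;
}

int wrapped_length() {
    MyBufferWrapper<100> wrapper;
    // Derived-to-base pointer conversion: no cast required.
    return c_library_length(&wrapper);
}
```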

Tim
  • This partially solved the problem. The another part is the C pattern works for runtime size, and the template solution will only work when the buffer size is known at compile time. – ZijingWu Sep 07 '13 at 13:36
1

One C++ "ish" way to tackle this is to describe the buffer itself as a "trivially-copyable" (C++11 lingo, was "POD" for "plain old data" in C++98 & 2003) struct, with the microscopic exception that it has a private constructor to prevent instantiation. Then construct a pointer object for that struct. Here's a complete-but-trivial program with that idea:

#include <cstdlib>
#include <cstring>

struct MyBuffer
{
    int length;
    char data[1];
private:
    MyBuffer() {}
    MyBuffer& operator =(MyBuffer& other) { return other; }
};

class MyBufferPointer
{
    MyBuffer *bufptr_;

    static std::size_t getsize(std::size_t array_size)
    {
        return sizeof (MyBuffer) + array_size * sizeof (char);
    }

    static MyBuffer *getbuf(std::size_t array_length)
    {
        std::size_t sz = getsize(array_length);
        return static_cast<MyBuffer*>( malloc(sz) );
    }

public:
    MyBufferPointer() { bufptr_ = NULL; }

    MyBufferPointer(std::size_t array_length)
    {
        bufptr_ = getbuf(array_length);
        bufptr_->length = array_length;
    }

    MyBufferPointer(const MyBufferPointer &other)
    {
        const MyBuffer *op = other.bufptr_;
        if (op == NULL)
        {
            bufptr_ = NULL;
        }
        else
        {
            bufptr_ = getbuf(op->length);
            bufptr_->length = op->length;
            std::size_t sz = op->length * sizeof op->data[0];
            std::memmove( bufptr_->data, op->data, sz );
        }
    }

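    MyBufferPointer& operator =(const MyBufferPointer &other)
    {
        if (this == &other)
            return *this;   // self-assignment: nothing to do

        // Build the copy first, then release the old buffer so it is not leaked.
        MyBuffer *fresh = NULL;
        const MyBuffer *op = other.bufptr_;
        if (op != NULL)
        {
            fresh = getbuf(op->length);
            fresh->length = op->length;
            std::size_t sz = op->length * sizeof op->data[0];
            std::memmove( fresh->data, op->data, sz );
        }
        free(bufptr_);
        bufptr_ = fresh;
        return *this;
    }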

    ~MyBufferPointer() { if (bufptr_) free(bufptr_); }

    std::size_t size() const
    {
        return bufptr_ ? bufptr_->length : 0;
    }

    // convenience operations for access to the data array:
    char &operator [](std::size_t index) { return bufptr_->data[index]; }
    char at(std::size_t index) const { return bufptr_->data[index]; }
    MyBuffer* c_buffer() { return bufptr_; }
};

#include <iostream>
using namespace std;

int main()
{
    MyBufferPointer bufp;
    cout << "bufp().size() = " << bufp.size()
         << ", c_buffer=" << bufp.c_buffer() << endl;

    bufp = MyBufferPointer(100);
    cout << "bufp().size() = " << bufp.size()
         << ", c_buffer=" << bufp.c_buffer() << endl;
    return 0;
}

The MyBuffer struct is the layout for the C data area, only with private constructor and assignment operator declarations to prevent instantiation or attempt to copy (neither of which will work properly, either in C or C++.) The MyBufferPointer class encapsulates that as a C++ style char[] array, overloading the [] operator.

This still uses malloc(), not new. The memory image needed to satisfy those C APIs you mentioned needs the variable-length struct, and you can't get that in a standard C++ class created by new. This just provides a C++ wrapper to give a single point of struct creation in that class (in the static member functions getsize() and getbuf()); and guaranteed deletion of the buffer when the pointer goes out of scope. You could add resize(), to_string(), substring() or whatever methods you want.

The performance should be identical to the C struct accessed by an ordinary pointer, after optimization, since the methods are declared in-class and simple enough to be inlined.

abelenky
Mike Housky