1

In my C++ application, I use an external library that exposes a C API. Some of the C functions take arrays of strings as input and use char** for that:

void c_api_function(char** symbols, int count);

(Note: I think a pointer to const would be more appropriate, but it seems as if const correctness was not important for the library authors.)

The strings must use a specific encoding.

Currently, in order to call the API, I first convert the strings to the correct encoding and store the result in a vector<string>. Then I create a vector<char*> that can be passed to the C API:

std::string encode(std::string const& symbol);

void call_api(std::vector<std::string> const& symbols)
{
    std::vector<std::string> encoded_symbols;
    for (auto const& s : symbols)
    {
        encoded_symbols.push_back(encode(s));
    }
    std::vector<char*> encoded_symbol_ptrs;
    for (auto const& s : encoded_symbols)
    {
        encoded_symbols_ptrs.push_back(s.data());
    }

    c_api_function(encoded_symbols_ptrs.data(), (int)encoded_symbols_ptrs.size());
}

I dont like this approach, because I need two vectors. The first vector ensures that the strings are kept alive, the second vector can be passed to the API. Is there a way that only uses a single container, but still uses automatic memory management? If necessary, I can freely change the signature of the encode function, for example, using std::unique_ptr as return value.

pschill
  • 5,055
  • 1
  • 21
  • 42
  • You can't pass a C++ std::vector as a char** parameter to a C API. – paulsm4 Oct 25 '21 at 08:24
  • 2
    I think this is about as good as it can get. (Encoding to`unique_ptr` instead of `string` won't help - you would still need a separate container for the pointers.) – molbdnilo Oct 25 '21 at 08:26
  • Generally when I see `void c_api_function(char**);` in a C program I would expect that function to return a C string by reference. – hetepeperfan Oct 25 '21 at 08:27
  • How does `c_api_function` know how many elements the array has? – molbdnilo Oct 25 '21 at 08:29
  • @molbdnilo The array could be NULL terminated. – hetepeperfan Oct 25 '21 at 08:31
  • I forgot to add the string count as argument. – pschill Oct 25 '21 at 08:36
  • 1
    You are passing 'encoded_symbols' to 'c_api_function' - it won't compile- wrong types, you want to pass 'encoded_symbol_ptrs' instead. And one more thing - calling push_back() in loop is a bad practice if you known the final size of the array, since it ends with unnecessary relocations – crayze Oct 25 '21 at 08:43

2 Answers2

0
//
// This is how to do this with the smallest number of allocations as possible (best performance)
//
// I guess: if string contains only ascii encode has no job to do and then input == output,
// then allocation isn't necessary and input string can be used to pass to c_api_function()
// I would recommend to change the encode() to do not generate output string if encode has nothing to do
// in that case if encode() returns true (encoding was made: input != output) and output will be store in 'out'
// if encode() returns false then input == output and 'out' isn't used
//
bool encode(const std::string& symbol_in, std::string& out);

void call_api(const std::vector<std::string>& symbol_in) {
    const size_t c = symbol_in.size(); // size usually have cost in operation: end - begin :)
    if (c > 0x7FFFFFFF) { // good idea if you must convert from size_t to int further
      throw std::overflow_error("..we have a problem here..");
    }

    auto it_in = symbol_in.cbegin(); // const iterator for input
    auto it_end = symbol_in.cend(); // const iterator for input end
    
    std::vector<std::string> encoded_symbols(c); // allocate array of string, but some std::string items may not be used if encode will return false
    std::vector<const char*> encoded_symbols_raw(c); // array of C raw pointers

    auto it_out = encoded_symbols.begin(); // iterator for std::string objects output
    auto raw_out = encoded_symbols_raw.begin(); // iterator for raw output

    for (; it_in != it_end; ++it_in, ++it_out, ++raw_out) {
      if (encode(*it_in, *it_out)) { // if *it_out contains encoding result:
        *raw_out = it_out->c_str(); // set std::string buffer as raw pointer
      }
      else {
        *raw_out = it_in->c_str(); // no encoding needed - just pass input string buffer
      }
    }

    c_api_function((char**)encoded_symbols_raw.data(), (int)c);
}
crayze
  • 386
  • 3
  • 12
0

Since encoded_symbols only exists within your call_api() function, I would prefer to have an object that contains the all the encoded symbols and a member function which acts upon them. As an executable example:

#include <iostream>
#include <memory>
#include <string>
#include <vector>

void c_api_function(char** symbols, int count)
{
    for(int x = 0; x < count; ++x)
    {
        std::cout << symbols[x] << '\n';
    }
}

std::string encode(std::string const& symbol)
{
    return symbol;
}

class EncodedSymbols
{
    friend void call_api(EncodedSymbols& symbols);
    friend void call_api(std::vector<std::string> const& symbols);

public:
    EncodedSymbols(const EncodedSymbols& other) = delete;
    EncodedSymbols(EncodedSymbols&& other) = default;

    ~EncodedSymbols()
    {
        for(int x = 0; x < number_of_encoded_symbols; ++x)
        {
            delete[] encoded_symbol_array[x];
        }
    }

    static EncodedSymbols create_from(const std::vector<std::string>& symbols)
    {
        EncodedSymbols obj;

        obj.encoded_symbol_array = std::make_unique<char*[]>(symbols.size());
        obj.number_of_encoded_symbols = symbols.size();

        for(int x = 0; x < symbols.size(); ++x)
        {
            const std::string encoded = encode(symbols[x]);

            obj.encoded_symbol_array[x] = new char[encoded.length() + 1];
            std::copy(encoded.begin(), encoded.end(), obj.encoded_symbol_array[x]);
            obj.encoded_symbol_array[x][encoded.length()] = '\0';
        }

        return obj;
    }

    void call_api()
    {
        c_api_function(encoded_symbol_array.get(), number_of_encoded_symbols);
    }

private:
    EncodedSymbols() = default;

    std::unique_ptr<char*[]> encoded_symbol_array;
    int number_of_encoded_symbols;
};

void call_api(EncodedSymbols& symbols)
{
    c_api_function(symbols.encoded_symbol_array.get(), 
        symbols.number_of_encoded_symbols);
}

void call_api(std::vector<std::string> const& symbols)
{
    auto encoded = EncodedSymbols::create_from(symbols);

    c_api_function(encoded.encoded_symbol_array.get(), 
        encoded.number_of_encoded_symbols);
}

int main()
{
    std::vector<std::string> symbols{"one", "two"};
    auto encoded_symbols = EncodedSymbols::create_from(symbols);

    encoded_symbols.call_api();
    call_api(encoded_symbols);
    call_api(symbols);

    return 0;
}

If there are other functions in your C library that act upon encoded symbols then (in my mind) it makes more sense to put them in a class. All the manual memory management can be hidden behind a nice interface.

If you prefer, you can also have a bare function which acts upon an EncodedSymbols instance. I have included that variant too.

As a third alternative, you could keep your current function prototype and use the EncodedSymbols type for RAII. I have shown that too in my example.

Daniel Dearlove
  • 565
  • 2
  • 12