7

I have a template<typename T> function that takes a const vector<T>&. Inside the function, I use the vector's cbegin(), cend(), size(), and operator[]. As far as I understand, both string and vector store their elements contiguously, so I was wondering whether I could reuse the function for both types in an elegant manner.

Can a std::string be reinterpreted as a std::vector of (the appropriate) char_type? If so, what would the limitations be?

Trow Way
Anzurio

7 Answers

14

If you make your template take a plain const T& and use only the functions that both vector and string share (begin(), end(), and so on), your code will work with both types.
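A minimal sketch of that idea, assuming the function only needs size() and operator[]. The name count_equal and its behaviour are hypothetical and chosen purely for illustration; they are not from the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts how many elements of `c` equal `value`.
// It relies only on size() and operator[], which std::vector and
// std::basic_string both provide, so one template serves both types.
template <typename Container>
std::size_t count_equal(const Container& c,
                        const typename Container::value_type& value)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        if (c[i] == value) {
            ++n;
        }
    }
    return n;
}
```

Calling count_equal(myVector, 'a') and count_equal(myString, 'a') instantiates the template once per container type, which is the source-level (not binary-level) sharing discussed in the comments.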

Zan Lynx
  • Is there any implementation in which the **generated** code is shared (and not just the **source** code)? – 6502 Oct 08 '15 at 21:50
  • @6502: The generated code will not be shared unless the standard library authors went to some extreme trouble to make it happen. But really why do you care? When optimized the iterator and operator[] operations compile to just a few machine instructions each. It isn't a big deal. – Zan Lynx Oct 08 '15 at 21:57
  • @ZanLynx: the whole function will be duplicated, not just the `[]` access code. It could be a big function... – 6502 Oct 08 '15 at 22:02
  • @6502 a quality linker will eliminate identical functions – Yakk - Adam Nevraumont Oct 08 '15 at 22:48
7

Go the STL way and use iterators: accept an iterator to the beginning and an iterator to the end. The function will then work with every container, and even with non-containers such as streams.
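As a rough illustration, the sketch below runs one iterator-pair function over a vector, a string, and a stream. The names count_in_range and count_a_everywhere are hypothetical, and std::istreambuf_iterator is just one way to read a stream character by character.

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: counts the elements in [first, last) equal to `value`.
// Any input iterator works, so the source of the data does not matter.
template <typename InputIt, typename T>
std::size_t count_in_range(InputIt first, InputIt last, const T& value)
{
    std::size_t n = 0;
    for (; first != last; ++first) {
        if (*first == value) {
            ++n;
        }
    }
    return n;
}

// Illustrative usage: the same function applied to a vector, a string,
// and the characters of a stream.
std::size_t count_a_everywhere()
{
    const std::vector<char> v{'a', 'b', 'a'};
    const std::string s = "banana";
    std::istringstream in("alpha");

    return count_in_range(v.cbegin(), v.cend(), 'a')
         + count_in_range(s.cbegin(), s.cend(), 'a')
         + count_in_range(std::istreambuf_iterator<char>(in),
                          std::istreambuf_iterator<char>(), 'a');
}
```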

SergeyA
6

There is no guarantee that the layout of string and vector is the same. In theory it could be, but it probably isn't in any common implementation, so you can't do this safely. See Zan's answer for a better solution.

Let me explain. Suppose I am a standard library implementer and decide to implement std::string like so:

template ...
class basic_string {
public:
    ...
private:
    CharT* mData;
    size_t mSize;
};

and decide to implement std::vector like so...

template ...
class vector {
public:
    ...
private:
    T* mEnd;
    T* mBegin;
};

When you reinterpret_cast<string*>(&myVector), you wind up interpreting the pointer to the end of your data as the pointer to the start of your data, and the pointer to the start of your data as the size of your data. If the padding between members differs, or there are extra members, it could get even weirder and more broken than that.

So yes, for this to have any chance of working they both need to store contiguous data, but a lot more than that would also have to be identical between the two implementations.
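If an actual string or vector object is needed rather than a reinterpretation, one safe option is to copy the characters through the iterator-range constructors that both types have. The function names below are hypothetical; this is only a sketch, and it costs a copy.

```cpp
#include <string>
#include <vector>

// Builds a new string from the vector's characters (copies the data).
std::string to_string_copy(const std::vector<char>& v)
{
    return std::string(v.begin(), v.end());
}

// Builds a new vector from the string's characters (copies the data).
std::vector<char> to_vector_copy(const std::string& s)
{
    return std::vector<char>(s.begin(), s.end());
}
```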

David
  • Both `string` and `vector` use contiguous array for storing data. It means that a non-templated function taking just memory addresses can be used. – Andrey Nasonov Oct 08 '15 at 21:49
  • @AndreyNasonov You're wrong. Updated answer to explain it. Please don't downvote before you understand :( – David Oct 08 '15 at 22:12
  • I'm not talking about field layout and particular implementation. I'm talking about data representation. Both `string` and `vector` provide `data()` function pointing to the first element. It is guaranteed that it is a contiguous piece of memory. – Andrey Nasonov Oct 08 '15 at 22:15
  • @AndreyNasonov Are you saying that calling certain functions on `reinterpret_cast<string*>(&myVector)` will work? I can't quite tell what it is you're saying, but that's not right. Yes they both point to contiguous data... that doesn't change anything I've said... – David Oct 08 '15 at 22:21
  • 1
    What's the problem with making the function signature `(const T *begin, const T *end)` and calling `f(v.data(), v.data() + v.size())`? I'm talking ONLY about what the `data()` method returns. It returns the same for `vector` and `string`. – Andrey Nasonov Oct 08 '15 at 22:23
  • @AndreyNasonov nothing is wrong with that but I was answering the question as to whether you can reinterpret_cast a vector to a string safely... I think that's clear, no? – David Oct 08 '15 at 22:31
6

std::experimental::array_view<const char> (see N4512) represents a contiguous buffer of chars.

Writing your own is not hard, and it solves this problem and (in my experience) many more.

Both string and vector are compatible with an array view.

This lets you move your implementation into a .cpp file (and not expose it), gives you the same performance as taking a std::vector<T> const& (and probably the same implementation), avoids duplicating code, and uses lightweight contiguous-buffer type erasure (which is full of tasty keywords).
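A hand-rolled view might look roughly like the sketch below. This is a hypothetical minimal type, not the interface from the proposal, and count_spaces is an invented example of a non-template function that could be declared in a header and defined in a .cpp file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal read-only view over a contiguous buffer.
// It does not own the data; the viewed container must outlive the view.
template <typename T>
class array_view {
public:
    array_view(const T* data, std::size_t size) : data_(data), size_(size) {}

    // Implicit conversions from the two containers in the question.
    array_view(const std::vector<T>& v) : data_(v.data()), size_(v.size()) {}
    array_view(const std::basic_string<T>& s) : data_(s.data()), size_(s.size()) {}

    const T* begin() const { return data_; }
    const T* end() const { return data_ + size_; }
    std::size_t size() const { return size_; }
    const T& operator[](std::size_t i) const { return data_[i]; }

private:
    const T* data_;
    std::size_t size_;
};

// Non-template function: a single compiled body serves both
// std::string and std::vector<char> callers.
std::size_t count_spaces(array_view<char> text)
{
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') {
            ++n;
        }
    }
    return n;
}
```

A caller can pass a std::string or a std::vector<char> directly, and the converting constructors build the view. On a C++20 compiler, std::span<const char> covers the same role.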

Community
Yakk - Adam Nevraumont
2

If the key point is that you want to access a contiguous area of memory where instances of a specific char type are stored, then you could define your function as

void myfunc(const CType *p, int size) {
     ...
}

to make it clear that you assume they must be adjacent in memory.

Then for example to pass the content of a vector the code is simply

myfunc(&myvect[0], myvect.size());

and for a string

myfunc(mystr.data(), mystr.size());

or

myfunc(buffer, n);

for an array.
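A small sketch of the same pointer-plus-size pattern follows. The function sum_chars is a hypothetical stand-in for myfunc, and std::size_t is this sketch's own choice for the size parameter (see the comments for the signed versus unsigned debate). It uses data() for the vector as well, because &myvect[0] is undefined behaviour when the vector is empty, while data() is always safe to call.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for myfunc: sums the character codes in a buffer.
long sum_chars(const char* p, std::size_t size)
{
    long total = 0;
    for (std::size_t i = 0; i < size; ++i) {
        total += p[i];
    }
    return total;
}

long sum_vector(const std::vector<char>& v)
{
    // data() may be called on an empty vector; &v[0] may not.
    return sum_chars(v.data(), v.size());
}

long sum_string(const std::string& s)
{
    return sum_chars(s.data(), s.size());
}
```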

6502
  • 1
    This approach has an advantage: it does not use templates. But please change `int` to `size_t`. – Andrey Nasonov Oct 08 '15 at 21:51
  • @SergeyA: Writing a template (in the implementations I know) will share the **source** code, generating however distinct code for distinct types. – 6502 Oct 08 '15 at 21:52
  • 1
    @AndreyNasonov, why would this be an advantage? In my view, it's a drawback. – SergeyA Oct 08 '15 at 21:52
  • @6502, you are referring to code bloat? It is overhyped. Especially in the world of inlining. – SergeyA Oct 08 '15 at 21:53
  • @AndreyNasonov: `int` is IMO a better type for size. `size_t` is an historical accident dating back to 16-bit era. Just because the standard library is doomed to it for backward compatibility reasons I'm not punishing myself repeating the same mistake in my code. – 6502 Oct 08 '15 at 21:53
  • 2
    @6502, that's just plain wrong. I mean, sizes. int is usually a 32-bit type, size_t is 32 or 64 bits depending on pointer size. Code which uses ints for sizes is error prone. – SergeyA Oct 08 '15 at 21:54
  • @6502, Agree, I hate `size_t` too because it is unsigned. – Andrey Nasonov Oct 08 '15 at 21:55
  • @SergeyA: don't be fooled by the name. `unsigned` doesn't mean "non-negative" but instead "member of the Z_{2^n} modulo ring". You really think it does make sense to say that the size of a vector is a member of a modulo ring? Actually using unsigned types for size is the source of many bugs. – 6502 Oct 08 '15 at 21:58
  • @SergeyA, Because I do not want to think about mixing signed and unsigned types. I want to use only signed types. – Andrey Nasonov Oct 08 '15 at 22:03
  • @6502 mixing unsigned and signed may be an issue but since a vector could be larger than what an int can hold you have now introduced another bug. – NathanOliver Oct 08 '15 at 22:47
  • @NathanOliver: Like I said, 16-bit machines were the reason we have unsigned `size_t` values. In my opinion even back then it was the wrong choice (if 15 bits are not enough now, 16 won't be enough either damn soon). Making that choice now would be just inexcusable (quantities are still unsigned only because of backward compatibility). The problem with `unsigned` is that while `int`s have problematic behavior around huge numbers/quantities that are rarely used in programs, `unsigned` has problematic behavior around 0, which is an incredibly common value. – 6502 Oct 09 '15 at 06:39
  • If you want signed, use `std::ptrdiff_t`. However, `int` for a size is just silly, at least on a 64-bit platform. One of my (small) company's servers has 512GiB RAM. There's freaking *telephones* today with more than 2GiB memory. So unless you're writing for "lesser" embedded systems, or writing small utilities, avoid using `int` for sizes. – Arne Vogel Oct 09 '15 at 09:20
  • @ArneVogel: the problem is not using a specific type for sizes, the problem is using an `unsigned` type, because the C++ semantics for `unsigned` are special and just plain wrong for the size of a container (especially the implicit promotion rules). Using a `long long` however makes perfect sense. – 6502 Oct 09 '15 at 09:35
1

You can't directly cast a std::vector to a std::string or vice versa. But the iterators that STL containers provide let you iterate over a vector and a string in the same way. And if your function requires random access to the container, both support it.

#include <iostream>
#include <string>
#include <vector>

// Prints every element in the range [begin, end), one per line.
template<typename Iterator>
void iterator_function(Iterator begin, Iterator end)
{
  for(Iterator it = begin; it != end; ++it)
  {
    std::cout << *it << std::endl;
  }
}

int main()
{
  std::vector<char> str1 {'a', 'b', 'c'};
  std::string str2 = "abc";

  iterator_function(str1.begin(), str1.end());
  iterator_function(str2.begin(), str2.end());
}

Both of those last two function calls would print the same thing.

Now, if you wanted a non-template version that handles only characters, whether they are stored in a string or in a vector, you could write something that iterates over the internal array.

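#include <cstddef>
#include <iostream>

// Prints every character of a contiguous buffer, one per line.
// std::size_t matches the type returned by size() on both containers.
void array_function(const char * array, std::size_t length)
{
  for(std::size_t i = 0; i < length; ++i)
  {
    std::cout << array[i] << std::endl;
  }
}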

Both functions would do the same thing in the following scenarios.

std::vector<char> str1 {'a', 'b', 'c'};
std::string str2 = "abc";

iterator_function(str1.begin(), str1.end());
iterator_function(str2.begin(), str2.end());
array_function(str1.data(), str1.size());
array_function(str2.data(), str2.size());

There are always multiple ways to solve a problem, and depending on what you have available any number of them might work. Try both and see which works better for your application. If you don't know the iterator type, iterating over the raw char array is useful. If you know you will always have the template type to pass in, the iterator template might be more useful.

Connor Hollis
1

The way your question is put at the moment is a bit confusing. If you mean to be asking "is it safe to cast a std::vector type to a std::string type or vice versa if the vector happens to contain char values of the appropriate type?", the answer is: no way, don't even think about it! If you're asking: "can I access the contiguous memory of non-empty sequences of char type if they're of the type std::vector or std::string?" then the answer is, yes you can (with the data() member function).