
Let's say I want to build a vector container that, unlike std::vector, allows uninitialized storage. The usage of the container, say vec <T>, would be roughly like this:

  • The user explicitly requests that the vector allocate N uninitialized elements, like this:

    vec <T> a(N, no_init);

  • At some point when data are known, user explicitly initializes an element at position n using arguments args...:

    a.init(n, args...);

  • OR, equivalently, constructs the element manually:

    new (&a[n]) T(args...);

  • Other operations may initialize or copy elements in bulk (like std::uninitialized_copy), but that's only for convenience; the basic underlying operation is the same.

  • After completing some task, the vector may be left with some elements initialized and others not. The vector does not hold any extra information, so eventually, before releasing memory, it either destructs all elements anyway, or only destructs depending on T.
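
A minimal sketch of the interface described in the list above, under several assumptions: the names `no_init_t`, `data_` and `size_` are hypothetical, storage comes from plain `operator new` (so over-aligned types are not handled), copying is disabled for brevity, and the destructor only releases memory without running any element destructor. That last choice is just one of the two policies mentioned above, not a recommendation.

```cpp
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical tag type used to request uninitialized storage.
struct no_init_t {};
constexpr no_init_t no_init{};

template <class T>
class vec {
    T*          data_;
    std::size_t size_;

public:
    // Allocates raw storage for n elements; no constructor is run.
    vec(std::size_t n, no_init_t)
        : data_(static_cast<T*>(::operator new(n * sizeof(T)))), size_(n) {}

    vec(const vec&) = delete;
    vec& operator=(const vec&) = delete;

    // Releases memory only; no element destructor is called here.
    ~vec() { ::operator delete(data_); }

    std::size_t size() const { return size_; }

    // Access is only valid for elements that have already been constructed.
    T&       operator[](std::size_t n)       { return data_[n]; }
    const T& operator[](std::size_t n) const { return data_[n]; }

    // Constructs element n in place from args...
    template <class... Args>
    T& init(std::size_t n, Args&&... args) {
        return *::new (static_cast<void*>(data_ + n)) T(std::forward<Args>(args)...);
    }
};
```

With this sketch, `vec<int> a(4, no_init); a.init(2, 7);` leaves `a[2] == 7` and the other three elements uninitialized.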

I am pretty sure this can be done, only I am not sure of the consequences. Naturally we'd like this structure to be safe for all types T assuming the user does not attempt to use an uninitialized element before constructing it. This may sound like a strong assumption but accessing elements only within the vector's range is not so different an assumption and it's so common.

So my questions are:

  1. For which types would it be safe to allow this kind of uninitialized operation as in vec <T> a(no_init)? I guess is_pod would be ok and most probably is_trivial as well. I wouldn't like to put more constraints than necessary.

  2. Should destruction be performed always or only for some types? Would the same constraint be ok as above? How about is_trivially_destructible? The idea is that destructing an element that has not been constructed or vice versa (not destructing a constructed element) should do no harm.

  3. Is there a major flaw in this attempt, other than the apparent risk of putting more responsibility on the user?

The whole point is that when a user does need such functionality for performance, lower-level solutions like std::get_temporary_buffer or manual allocation (e.g. with operator new()) may be more risky in terms of leaking. I know about std::vector::emplace_back() but that's really not the same thing.

iavr
  • Do you want to develop a container that acts like `std::vector` after you call `reserve`? From [this ref](http://www.cplusplus.com/reference/vector/vector/reserve/): "Requests that the vector capacity be at least enough to contain n elements." – wesley.mesquita Feb 05 '14 at 18:42
  • Sounds like it's not vector that you want at all. What's wrong with an associative container? – Lightness Races in Orbit Feb 05 '14 at 18:46
  • @wesley.mesquita Only partially. Yes, I want the allocation that `reserve` does, but I also want the data fully accessible. E.g. `size()` should include these allocated elements. – iavr Feb 05 '14 at 18:46
  • @LightnessRacesinOrbit Well, I'm talking about a sequence container. E.g. with a POD type, one could even use `memcpy` and still know that memory will be released eventually. – iavr Feb 05 '14 at 18:51
  • destruction isn't clear to me... this could only work if the container knows which elements are real and which are uninitialized. otherwise, as you said, you need `is_trivially_destructible`. – Karoly Horvath Feb 05 '14 at 18:54
  • @KarolyHorvath It's still not very clear to me either. My hope is that without this information (so that no extra space is needed), I would be allowed to make mistakes that do no harm (as I say in point 2: destructing an element that has not been constructed or not destructing a constructed one). E.g. for `int` that would be safe. – iavr Feb 05 '14 at 18:57
  • @iavr Interesting question, and I'm sure you can pull it off writing your own allocator. However, I question the utility of this effort. As you've noted, `T` must be trivially destructible, and I'm having a hard time imagining a type that satisfies that condition, but is so expensive to construct that you need to go through all this trouble. – Praetorian Feb 05 '14 at 19:08
  • I believe such a vector would not initially require you to commit memory for its buffer (only reserve address space), so you could maintain a very large address space while committing only the pages required for initialized objects. – marcinj Feb 05 '14 at 19:16
  • @Praetorian You mean an allocator whose `construct()` (and maybe `destroy`) do nothing? That's interesting and would save much trouble indeed. But the question remains: for which types should this be allowed? Maybe a struct that contains a large built-in array could well make sense. – iavr Feb 05 '14 at 19:20
  • @marcin_j I am not sure how that would be possible, given that the user may initialize elements in any order depending on the algorithm? – iavr Feb 05 '14 at 19:24
  • If avoiding the value-initialization performed by `vector::resize()` and/or `vector::vector(size_t)` is sufficient for your problem, take a look at [this answer](http://stackoverflow.com/a/21028912/923854). – Casey Feb 05 '14 at 20:32
  • @Jarod42 These versions of `resize()` in `std::vector` and `boost::container::vector` allow for value-initialized, not uninitialized elements, if I understand correctly. As pointed out by Praetorian and shown by Casey above (thanks), a custom allocator is a way to implement the desired behaviour, without implementing a new container. But, whatever the implementation, my question basically is: for which types `T` is it safe to allow such behaviour? – iavr Feb 06 '14 at 10:42

1 Answer


To answer the questions:

  1. No restriction on T: if it works for standard containers, it works for yours.
  2. Destruction is conditional: you can statically disable it if std::is_trivially_destructible<T>; otherwise you must track constructed elements and destroy only those that were actually constructed.
  3. I don't see a major flaw in your idea, but make sure it is worth it: profile your use-case and check that you really spend a lot of time initializing elements.

I'm assuming that you implement your container as a block of contiguous memory of size size() * sizeof(T). Also, if the element's destructor must be called, i.e. !std::is_trivially_destructible<T>, you need additional storage, such as a std::vector<bool> of size() elements, used to flag elements for destruction.

Basically, if T is trivially destructible, you just init when the user asks and don't bother with destroying anything. Else, things are a little more tricky and you need to track which element was constructed and which is uninitialized, so that you only destroy what's needed.

  • up-sizing or container creation:
    1. if !std::is_trivially_destructible<T> resize flags storage accordingly
    2. Memory allocation
    3. Optional initialization depending on what the user asked:
      • no_init => if !std::is_trivially_destructible<T>, flag elements as non-initialized. Else do nothing.
      • (Args...) => if std::is_constructible<T, Args...>, call that constructor for each element. If !std::is_trivially_destructible<T>, flag elements as constructed.
  • down-sizing or container destruction:
    1. Optional destruction:
      • If std::is_trivially_destructible<T> do nothing
      • else for each element, if it is flagged as constructed, call its destructor
    2. Memory deallocation
    3. If !std::is_trivially_destructible<T> resize flags storage accordingly
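
The steps above can be sketched as follows. This is only an illustration under stated assumptions: `tracked_vec`, `needs_flags`, `probe` and `live_after_demo` are hypothetical names, the size is fixed at construction (no resizing), storage comes from plain `operator new`, and the flag vector is kept as an empty member for trivially destructible types rather than being removed entirely.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

template <class T>
class tracked_vec {
    static constexpr bool needs_flags = !std::is_trivially_destructible<T>::value;

    std::size_t       size_;
    std::vector<bool> constructed_;  // stays empty when T is trivially destructible
    T*                data_;         // allocated last, so a failed allocation leaks nothing

public:
    // Allocates n raw slots; every slot starts out flagged as not constructed.
    explicit tracked_vec(std::size_t n)
        : size_(n),
          constructed_(needs_flags ? n : 0, false),
          data_(static_cast<T*>(::operator new(n * sizeof(T)))) {}

    tracked_vec(const tracked_vec&) = delete;
    tracked_vec& operator=(const tracked_vec&) = delete;

    // Destroys only the elements flagged as constructed, then frees the memory.
    ~tracked_vec() {
        for (std::size_t i = 0; i < size_; ++i) destroy(i);
        ::operator delete(data_);
    }

    std::size_t size() const { return size_; }
    T& operator[](std::size_t n) { return data_[n]; }

    // Constructs element n in place; a previously constructed element is destroyed first.
    template <class... Args>
    T& init(std::size_t n, Args&&... args) {
        destroy(n);
        T* p = ::new (static_cast<void*>(data_ + n)) T(std::forward<Args>(args)...);
        if (needs_flags) constructed_[n] = true;
        return *p;
    }

    // Runs the destructor of element n only if it is flagged as constructed.
    void destroy(std::size_t n) {
        if (needs_flags && constructed_[n]) {
            data_[n].~T();
            constructed_[n] = false;
        }
    }
};

// Hypothetical probe type that counts live instances through an external counter.
struct probe {
    int* live;
    explicit probe(int* counter) : live(counter) { ++*live; }
    probe(const probe&) = delete;
    ~probe() { --*live; }
};

// Returns how many probes are still alive after the container has been destroyed.
inline int live_after_demo() {
    int live = 0;
    {
        tracked_vec<probe> v(3);  // three raw slots, nothing constructed
        v.init(1, &live);         // one live probe
        v.init(1, &live);         // old probe destroyed first, still one live probe
    }                             // destructor destroys slot 1 only
    return live;
}
```

For a trivially destructible `T` such as `int`, the flag vector stays empty and `destroy` does nothing, which corresponds to the "do nothing" branches in the list.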

From a performance point of view, if T is trivially destructible, things are great. If it has a destructor, the picture is more mixed: you save some constructor/destructor calls, but you need to maintain additional flag storage. In the end it depends on whether your constructors/destructors are expensive enough.

Also, as some suggested in the comments, you could just use an associative array based on std::unordered_map, add a size_t vector_size field, implement resize and override size. That way, uninitialized elements would not even be stored. On the other hand, indexing would be slower.
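
A rough sketch of that alternative, assuming composition rather than inheritance from std::unordered_map. The names `sparse_vec`, `elems_`, `vector_size_` and `initialized` are hypothetical, and no bounds checking against the logical size is done:

```cpp
#include <cstddef>
#include <tuple>
#include <unordered_map>
#include <utility>

template <class T>
class sparse_vec {
    std::unordered_map<std::size_t, T> elems_;  // holds initialized elements only
    std::size_t vector_size_ = 0;               // logical size, independent of elems_.size()

public:
    explicit sparse_vec(std::size_t n) : vector_size_(n) {}

    // Reports the logical size, not the number of stored elements.
    std::size_t size() const { return vector_size_; }

    // Shrinking drops stored elements whose index falls outside the new size.
    void resize(std::size_t n) {
        if (n < vector_size_) {
            for (auto it = elems_.begin(); it != elems_.end();) {
                if (it->first >= n) it = elems_.erase(it);
                else ++it;
            }
        }
        vector_size_ = n;
    }

    bool initialized(std::size_t n) const { return elems_.count(n) != 0; }

    // Constructs (or replaces) element n from args...
    template <class... Args>
    T& init(std::size_t n, Args&&... args) {
        elems_.erase(n);
        return elems_.emplace(std::piecewise_construct,
                              std::forward_as_tuple(n),
                              std::forward_as_tuple(std::forward<Args>(args)...))
                     .first->second;
    }

    // Throws std::out_of_range when element n has not been initialized.
    T& operator[](std::size_t n) { return elems_.at(n); }
};
```

Here every initialized element is destroyed by the map itself, so no flags are needed, at the cost of a hash lookup on each access.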

Antoine
  • Thanks a lot for your answer! I think you have gone a bit too far. To keep things simple, I mentioned in my question that "the vector does not hold any extra information" so there are no flags and decisions are "flat" rather than "per element". This means that for some types `T`, `no_init` should be disabled hence behaviour should be just like `std::vector`. The questions were roughly "which are those types?" and "what should I do at destruction in the absence of flags? destruct elements or not?". Could you please elaborate a bit given these constraints? – iavr Feb 27 '14 at 17:41
  • One more thing. A type may be `is_trivially_destructible`, yet it may have its own constructor, and worse, the constructor may allocate some resource. This may be against the [rule of three](http://stackoverflow.com/questions/4172722/what-is-the-rule-of-three), but still it may mean that I should be more conservative and use additionally `is_trivially_constructible` or even `is_trivial`. Is this right? I have to admit I have always been lost when trying to see the exact definition of all these traits and the implications in my problem. – iavr Feb 27 '14 at 17:49
  • If the user defined a custom destructor which frees some memory then the class is not trivially destructible: http://www.cplusplus.com/reference/type_traits/is_trivially_destructible/ If there is no such destructor, but the constructor still allocates some memory, then it's a memory leak and an error on the user side, not in the logic of the container. – Antoine Feb 27 '14 at 19:11
  • Being more conservative by only handling trivially constructible types does not solve the problem, since the type may still be a class that has some methods which leak memory. Even is_trivial allows for methods which may do bad things. Really the only fool-proof option here is to restrict to something like is_integral. But from a design point of view, it's not the container's responsibility to assume the contained type is broken... – Antoine Feb 27 '14 at 19:17
  • Ok, just to see if I get this right. I only check if the type `is_trivially_destructible`, in which case I allow `no_init` operations and never destruct any element, without flags and regardless whether the element was initialized or not. If this same type does allocate resources in a constructor or any other method, it's just a broken type that would leak anyway, and not the container's responsibility. On the other hand, if the type is not trivially destructible, just behave like `std::vector`. Right? – iavr Feb 27 '14 at 22:03
  • Yeah exactly, sorry if I complicated things - I didn't get that you weren't interested in per-element behaviour. Also you may be interested in this article which addresses similar issues: in some cases you can avoid calling the constructor if you know that `memset` puts the object in a valid state: http://seanmiddleditch.com/journal/2013/04/zero-trivial-constructors-and-destructors/ – Antoine Feb 28 '14 at 10:07
  • That's great! I'm not interested in per-element behaviour, as a trade-off between space/time efficiency and design/code complexity. I think this choice is clear-cut, needs minimal extra code from both designer and user, has only performance gain (the larger the elements, the larger the gain), has no space or time overhead, and "you don't pay if you don't use" according to C++ principles. Whether I will work on an allocator for `std::vector` or start from scratch will depend on other issues, because it's part of a larger idea. Anyway, thanks again! – iavr Feb 28 '14 at 11:42
  • You're welcome ;) Also since you mentioned a larger idea, this made me think about http://www.boost.org/doc/libs/1_55_0/libs/flyweight/doc/index.html It's different from what you asked but (maybe!) it might give you ideas for your design? – Antoine Mar 03 '14 at 14:51