8

I myself am convinced that in a project I'm working on signed integers are the best choice in the majority of cases, even though the value contained within can never be negative. (Simpler reverse for loops, less chance for bugs, etc., in particular for integers which can only hold values between 0 and, say, 20, anyway.)

The majority of the places where this goes wrong is a simple iteration of a std::vector, often this used to be an array in the past and has been changed to a std::vector later. So these loops generally look like this:

for (int i = 0; i < someVector.size(); ++i) { /* do stuff */ }

Because this pattern is used so often, the amount of compiler warning spam about this comparison between signed and unsigned type tends to hide more useful warnings. Note that we definitely do not have vectors with more then INT_MAX elements, and note that until now we used two ways to fix compiler warning:

for (unsigned i = 0; i < someVector.size(); ++i) { /*do stuff*/ }

This usually works but might silently break if the loop contains any code like 'if (i-1 >= 0) ...', etc.

for (int i = 0; i < static_cast<int>(someVector.size()); ++i) { /*do stuff*/ }

This change does not have any side effects, but it does make the loop a lot less readable. (And it's more typing.)

So I came up with the following idea:

template <typename T> struct vector : public std::vector<T>
{
    typedef std::vector<T> base;

    int size() const     { return base::size(); }
    int max_size() const { return base::max_size(); }
    int capacity() const { return base::capacity(); }

    vector()                  : base() {}
    vector(int n)             : base(n) {}
    vector(int n, const T& t) : base(n, t) {}
    vector(const base& other) : base(other) {}
};

template <typename Key, typename Data> struct map : public std::map<Key, Data>
{
    typedef std::map<Key, Data> base;
    typedef typename base::key_compare key_compare;

    int size() const     { return base::size(); }
    int max_size() const { return base::max_size(); }

    int erase(const Key& k) { return base::erase(k); }
    int count(const Key& k) { return base::count(k); }

    map() : base() {}
    map(const key_compare& comp) : base(comp) {}
    template <class InputIterator> map(InputIterator f, InputIterator l) : base(f, l) {}
    template <class InputIterator> map(InputIterator f, InputIterator l, const key_compare& comp) : base(f, l, comp) {}
    map(const base& other) : base(other) {}
};

// TODO: similar code for other container types

What you see is basically the STL classes with the methods which return size_type overridden to return just 'int'. The constructors are needed because these aren't inherited.

What would you think of this as a developer, if you'd see a solution like this in an existing codebase?

Would you think 'whaa, they're redefining the STL, what a huge WTF!', or would you think this is a nice simple solution to prevent bugs and increase readability. Or maybe you'd rather see we had spent (half) a day or so on changing all these loops to use std::vector<>::iterator?

(In particular if this solution was combined with banning the use of unsigned types for anything but raw data (e.g. unsigned char) and bit masks.)

Tobi
  • 78,067
  • 5
  • 32
  • 37

8 Answers8

7

Don't derive publicly from STL containers. They have nonvirtual destructors which invokes undefined behaviour if anyone deletes one of your objects through a pointer-to base. If you must derive e.g. from a vector, do it privately and expose the parts you need to expose with using declarations.

Here, I'd just use a size_t as the loop variable. It's simple and readable. The poster who commented that using an int index exposes you as a n00b is correct. However, using an iterator to loop over a vector exposes you as a slightly more experienced n00b - one who doesn't realize that the subscript operator for vector is constant time. (vector<T>::size_type is accurate, but needlessly verbose IMO).

fizzer
  • 13,551
  • 9
  • 39
  • 61
  • as ChrisN correctly commented, vector::size_t is not necassarily std::size_t . so if you would do size_t i = someVector.size() - 1 , you do not necassarily endup with (size_t)-1 . that means your loop condition then is broken. – Johannes Schaub - litb Nov 09 '08 at 13:08
  • 1
    No, size_type for the default allocator is always size_t. – fizzer Nov 09 '08 at 13:17
  • yes, but vector::size_type is NOT defined in terms of the default allocator in my copy of the standard (i did this mistake too in my own commentar above). It's defined in terms of the Container requirements as a implementation defined unsigned integral type. – Johannes Schaub - litb Nov 09 '08 at 13:29
  • That's a good spot - I missed that, thanks. But as size_t will never be narrower than size_type, I don't see the problem. – fizzer Nov 09 '08 at 13:53
  • See my changed answer on where problems will arise if you loop backwards :) And I've tried to look up where you got it from that size_t is never narrower as size_type. Haven't found it :) – Johannes Schaub - litb Nov 09 '08 at 14:03
  • size_t and ptrdiff_t are both at least 65535. Though that doesn't say anything about their real values in implementations of c89 / c++03. ptrdiff_t positive max could be 2^31-1, while size_t max could be 65535. This then would mean size_t is narrower then. Any ideas? – Johannes Schaub - litb Nov 09 '08 at 14:09
  • OK, I see it if you loop backwards and the vector is empty. Nobody told me we were looping backwards. – fizzer Nov 09 '08 at 14:18
  • Oh you replied again - look I'm half way through varnishing a door. I will get back to you again when I can give it proper attention, 30 mins – fizzer Nov 09 '08 at 14:20
  • 2
    One way to get there is size_t is large enough to hold size of any object, and vector is guaranteed to be backed by contiguous storage, so size_type <= size_t – fizzer Nov 09 '08 at 14:28
  • I think that's a good point. Tho formally, using std::size_t is wrong, by your example in reality std::size_t will always be at least as large as ptrdiff_t (so it works for forward looping). Tho the problem with looping backward remains :p – Johannes Schaub - litb Nov 09 '08 at 14:37
4

While I don't think "use iterators, otherwise you look n00b" is a good solution to the problem, deriving from std::vector appears much worse than that.

First, developers do expect vector to be std:.vector, and map to be std::map. Second, your solution does not scale for other containers, or for other classes/libraries that interact with containers.

Yes, iterators are ugly, iterator loops are not very well readable, and typedefs only cover up the mess. But at least, they do scale, and they are the canonical solution.

My solution? an stl-for-each macro. That is not without problems (mainly, it is a macro, yuck), but it gets across the meaning. It is not as advanced as e.g. this one, but does the job.

peterchen
  • 40,917
  • 20
  • 104
  • 186
  • BTW, VC++ has "for each" keyword out of the box. It's not portable, of course, but it's there if you don't care of portability. – wasker Nov 09 '08 at 15:04
3

Skip the index

The easiest approach is to sidestep the problem by using iterators, range-based for loops, or algorithms:

for (auto it = begin(v); it != end(v); ++it) { ... }
for (const auto &x : v) { ... }
std::for_each(v.begin(), v.end(), ...);

This is a nice solution if you don't actually need the index value. It also handles reverse loops easily.

Use an appropriate unsigned type

Another approach is to use the container's size type.

for (std::vector<T>::size_type i = 0; i < v.size(); ++i) { ... }

You can also use std::size_t (from <cstddef>). There are those who (correctly) point out that std::size_t may not be the same type as std::vector<T>::size_type (though it usually is). You can, however, be assured that the container's size_type will fit in a std::size_t. So everything is fine, unless you use certain styles for reverse loops. My preferred style for a reverse loop is this:

for (std::size_t i = v.size(); i-- > 0; ) { ... }

With this style, you can safely use std::size_t, even if it's a larger type than std::vector<T>::size_type. The style of reverse loops shown in some of the other answers require casting a -1 to exactly the right type and thus cannot use the easier-to-type std::size_t.

Use a signed type (carefully!)

If you really want to use a signed type (or if your style guide practically demands one), like int, then you can use this tiny function template that checks the underlying assumption in debug builds and makes the conversion explicit so that you don't get the compiler warning message:

#include <cassert>
#include <cstddef>
#include <limits>

template <typename ContainerType>
constexpr int size_as_int(const ContainerType &c) {
    const auto size = c.size();  // if no auto, use `typename ContainerType::size_type`
    assert(size <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    return static_cast<int>(size);
}

Now you can write:

for (int i = 0; i < size_as_int(v); ++i) { ... }

Or reverse loops in the traditional manner:

for (int i = size_as_int(v) - 1; i >= 0; --i) { ... }

The size_as_int trick is only slightly more typing than the loops with the implicit conversions, you get the underlying assumption checked at runtime, you silence the compiler warning with the explicit cast, you get the same speed as non-debug builds because it will almost certainly be inlined, and the optimized object code shouldn't be any larger because the template doesn't do anything the compiler wasn't already doing implicitly.

Adrian McCarthy
  • 45,555
  • 16
  • 123
  • 175
3

I made this community wiki... Please edit it. I don't agree with the advice against "int" anymore. I now see it as not bad.

Yes, i agree with Richard. You should never use 'int' as the counting variable in a loop like those. The following is how you might want to do various loops using indices (althought there is little reason to, occasionally this can be useful).

Forward

for(std::vector<int>::size_type i = 0; i < someVector.size(); i++) {
    /* ... */
}

Backward

You can do this, which is perfectly defined behaivor:

for(std::vector<int>::size_type i = someVector.size() - 1; 
    i != (std::vector<int>::size_type) -1; i--) {
    /* ... */
}

Soon, with c++1x (next C++ version) coming along nicely, you can do it like this:

for(auto i = someVector.size() - 1; i != (decltype(i)) -1; i--) {
    /* ... */
}

Decrementing below 0 will cause i to wrap around, because it is unsigned.

But unsigned will make bugs slurp in

That should never be an argument to make it the wrong way (using 'int').

Why not use std::size_t above?

The C++ Standard defines in 23.1 p5 Container Requirements, that T::size_type , for T being some Container, that this type is some implementation defined unsigned integral type. Now, using std::size_t for i above will let bugs slurp in silently. If T::size_type is less or greater than std::size_t, then it will overflow i, or not even get up to (std::size_t)-1 if someVector.size() == 0. Likewise, the condition of the loop would have been broken completely.

Johannes Schaub - litb
  • 496,577
  • 130
  • 894
  • 1,212
  • The appropriate type to use would be std::vector::size_type, not std::size_t. – ChrisN Nov 09 '08 at 12:41
  • You are right. I confused it with string::size_type , which is always std::size_t. I will change it accordingly – Johannes Schaub - litb Nov 09 '08 at 12:55
  • Although I personally am not convinced about the advantages of using unsigned in C++ (i.e. the std::vector::size_type solution does not prevent expressions based on the loop index to be promoted to unsigned quietly) I am accepting this answer as it gives a good overview of what one can do for others IMHO. – Tobi Apr 04 '11 at 09:20
3

Definitely use an iterator. Soon you will be able to use the 'auto' type, for better readability (one of your concerns) like this:

for (auto i = someVector.begin();
     i != someVector.end();
     ++i)
Tim Weiler
  • 31
  • 2
  • Heh. As soon as this syntax is available in mainstream Linux distributions (there is already is maybe?), MinGW and Visual Studio, I'll definitely gonna use this. Looks so much better then std::vector<>::iterator it = someVector.begin() etc... – Tobi Nov 09 '08 at 15:43
  • oops slots of typos in that too, well suppose the meaning is clear enough :-) – Tobi Nov 09 '08 at 15:44
  • No "auto" in GCC yet, not even bleeding edge betas AFAIK. http://gcc.gnu.org/gcc-4.4/cxx0x_status.html – richq Nov 28 '08 at 18:08
0

vector.size() returns a size_t var, so just change int to size_t and it should be fine.

Richard's answer is more correct, except that it's a lot of work for a simple loop.

Lodle
  • 31,277
  • 19
  • 64
  • 91
0

You're overthinking the problem.

Using a size_t variable is preferable, but if you don't trust your programmers to use unsigned correctly, go with the cast and just deal with the ugliness. Get an intern to change them all and don't worry about it after that. Turn on warnings as errors and no new ones will creep in. Your loops may be "ugly" now, but you can understand that as the consequences of your religious stance on signed versus unsigned.

Dan Olson
  • 1,282
  • 9
  • 7
0

I notice that people have very different opinions about this subject. I have also an opinion which does not convince others, so it makes sense to search for support by some guru’s, and I found the CPP core guidelines:

https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines

maintained by Bjarne Stroustrup and Herb Sutter, and their last update, upon which I base the information below, is of April 10, 2022.

Please take a look at the following code rules:

  • ES.100: Don’t mix signed and unsigned arithmetic
  • ES.101: Use unsigned types for bit manipulation
  • ES.102: Use signed types for arithmetic
  • ES.107: Don’t use unsigned for subscripts, prefer gsl::index

So, supposing that we want to index in a for loop and for some reason the range based for loop is not the appropriate solution, then using an unsigned type is also not the preferred solution. The suggested solution is using gsl::index.

But in case you don’t have gsl around and you don’t want to introduce it, what then?

In that case I would suggest to have a utility template function as suggested by Adrian McCarthy: size_as_int