I have been wondering about the rationale behind the design of std::string's substr(pos, len) method for a while now. It still does not make sense to me, so I decided to ask the experts. The function throws a std::out_of_range exception if the pos argument exceeds the string length. This can be inconvenient (even annoying) at times, but my real concern is consistency and the principle of least surprise. It turns out that the "end" position pos+len of the substring is allowed to exceed the string length. Disallowing this for the beginning but not for the end feels inconsistent to me. Allowing it for the end hints, to me, at the interpretation:
return all characters at positions pos <= i < pos+len
However, I would then expect the function to return an empty string for values of pos exceeding the string length, instead of throwing an exception. As a side note, with this interpretation it would even be sensible to allow negative values of pos (provided it had a signed type).
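To make the asymmetry concrete, here is a minimal sketch of the edge cases described above. The helper names substr_throws and demo_substr_bounds are made up for this illustration; only std::string::substr and std::out_of_range come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: reports whether s.substr(pos, len) throws std::out_of_range.
bool substr_throws(const std::string& s, std::size_t pos, std::size_t len) {
    try {
        (void)s.substr(pos, len);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// Hypothetical demo function; call it from main() to run the checks.
void demo_substr_bounds() {
    const std::string s = "hello";        // s.size() == 5
    assert(s.substr(3, 100) == "lo");     // pos + len past the end: len is clamped
    assert(s.substr(5, 100).empty());     // pos == size(): empty result, no exception
    assert(!substr_throws(s, 5, 0));
    assert(substr_throws(s, 6, 0));       // pos > size(): throws even for len == 0
}
```

Note that the last case throws even though the requested length is zero, which is the behavior questioned here.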
This leaves me with the following questions:
- Does this design appear logical to you? Sensible? Do you have a satisfactory way to resolve the inconsistency?
The only possible explanation I can come up with is compatibility with null-terminated strings. With null termination it does not matter if the specified length exceeds the end, while starting beyond the null character is a memory bug. However, std::string does not rely on null termination to find its end and instead keeps track of the length of the string. If that is the true reason, then personally I would call it a very bad one.
- Is there an advantage in terms of performance? I would actually be surprised.
- Am I overlooking an advantage in terms of usability? Maybe a standard idiom or use case in conjunction with other functions, like find? Here, too, my impression is that returning an empty string would have the potential to simplify some code.
- Is there any way to change the behavior of substr in the future? I guess not, since silently breaking existing code is much worse than living with this twist...?
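For reference, the "return an empty string" semantics mentioned in the usability question could be sketched as a small wrapper. The names substr_or_empty and from_equals are hypothetical and not part of the standard library. This is only an illustration of the expected behavior, not a proposal for how substr should be implemented.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the standard library: returns the characters
// at positions pos <= i < pos + len that exist in s, and an empty string
// (instead of throwing) if pos lies at or beyond the end.
std::string substr_or_empty(const std::string& s,
                            std::size_t pos,
                            std::size_t len = std::string::npos) {
    if (pos >= s.size()) {
        return std::string();
    }
    return s.substr(pos, len);  // safe: pos < size(), and substr clamps len
}

// Hypothetical example use: everything from the first '=' onward, or "" if
// there is none. find returns npos when nothing is found, a position that
// plain substr would reject with std::out_of_range.
std::string from_equals(const std::string& s) {
    return substr_or_empty(s, s.find('='));
}
```

With plain substr, the result of find has to be checked against npos before the call. With the wrapper, the "not found" case simply yields an empty string.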