
In this code I'm filling the vector alpha with letters from 'a' to 'z':

#include <numeric>  // std::iota
#include <vector>
std::vector<char> alpha(26);
std::iota(alpha.begin(), alpha.end(), 'a');

I assume this will not work with every character encoding. Can anyone confirm or deny?

If that is the case, is there a solid alternative?

anastaciu
  • I'm not sure if this is what you are asking for, but for instance in Spanish we have 'ñ' between 'n' and 'o', so this doesn't work for us, not even in ASCII. – Ion Larrañaga Mar 26 '21 at 14:21
  • This will not work. It definitely will not work with a collating sequence such as EBCDIC. Basically, assume that alphabetic characters can be all over the place in the collating sequence, and not necessarily contiguous. – PaulMcKenzie Mar 26 '21 at 14:24
  • @IonLarrañaga, that's news to me, I thought ASCII would be the same everywhere. I was asking more about other encodings that do not have sequential character codes, like the one mentioned by PaulMcKenzie. I was wondering if iota would adapt to these. – anastaciu Mar 26 '21 at 14:26
  • @PaulMcKenzie I see. – anastaciu Mar 26 '21 at 14:27
  • If it adapted, it wouldn't be `std::iota`. C++ could use a lot more locale work in its Standard Library, though. – sweenish Mar 26 '21 at 14:27
  • C++ would need a function, maybe called `next_char`, that actually increments an iterator of some sort to the next available character, based on locale. Don't think anything like that exists in current C++, but I may be wrong. – PaulMcKenzie Mar 26 '21 at 14:28
  • @sweenish yes, I agree. – anastaciu Mar 26 '21 at 14:30
  • @PaulMcKenzie, you see, it could be done, it's just bad will ;) – anastaciu Mar 26 '21 at 14:31
  • I think the only ASCII (and EBCDIC) character sequence you can assume is contiguous is `0` to `9` – Blastfurnace Mar 26 '21 at 14:31
  • @Blastfurnace, yes, digits must be sequential. I was wondering whether it would, out of good will, adapt to the locale, or if there is something that will. – anastaciu Mar 26 '21 at 14:32
  • @anastaciu -- I think Asian languages like Japanese have the 0-9, but also local numeric characters, so `isdigit()` gets very interesting in these cases. – PaulMcKenzie Mar 26 '21 at 14:33
  • @PaulMcKenzie, is that right? Always learning here. – anastaciu Mar 26 '21 at 14:35
  • @anastaciu I mean... ASCII is the same everywhere. It's just that if you want to create an array with our alphabet, after 'n' you have to go to a much higher character code to get the 'ñ' and then go back to 'o', so std::iota, which just increments the char, doesn't work – Ion Larrañaga Mar 26 '21 at 14:36
  • @IonLarrañaga I see what you mean. – anastaciu Mar 26 '21 at 14:38
  • Japanese uses (from what I remember) horizontal lines for some digit characters, so for Japanese locale, `std::isdigit` will return `true` for those characters. – PaulMcKenzie Mar 26 '21 at 14:40
  • @PaulMcKenzie Jesus, and here I was thinking that dealing with Portuguese was hard. I see now that I have it easy. – anastaciu Mar 26 '21 at 14:42
  • @PaulMcKenzie: `〇〡〢〣〤〥` ? (U+3020-U+3029) – MSalters Mar 26 '21 at 14:49
  • @MSalters -- Thanks for that. – PaulMcKenzie Mar 26 '21 at 14:50
  • @IonLarrañaga ASCII is the same everywhere. Spanish is not in the ASCII range, so there is no 'ñ' (U+00F1) between 'n' (U+006E) and 'o' (U+006F) – Remy Lebeau Mar 26 '21 at 15:45

1 Answer


The behavior of std::iota is very simple:

Fills the range [first, last) with sequentially increasing values, starting with value and repetitively evaluating ++value.
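
The quoted behavior can be sketched in a few lines. This is only an illustration, not the actual library implementation: `iota_sketch` and `fill_from_a` are hypothetical names introduced here. The point is that nothing in the loop knows about alphabets or locales; it only stores a value and increments it.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of the standard library: it mirrors the
// documented behavior of std::iota (assign the value, then increment it).
template <class ForwardIt, class T>
void iota_sketch(ForwardIt first, ForwardIt last, T value) {
    for (; first != last; ++first) {
        *first = value;  // store the current value
        ++value;         // plain increment of the code; no alphabet logic
    }
}

// Hypothetical helper: fills 26 chars starting at 'a' using the sketch.
// The result is 'a'..'z' only if the encoding stores them consecutively.
std::vector<char> fill_from_a() {
    std::vector<char> alpha(26);
    iota_sketch(alpha.begin(), alpha.end(), 'a');
    assert(alpha.front() == 'a');  // the first element is always the start value
    return alpha;
}
```

Whatever `fill_from_a()` returns, it matches what `std::iota` produces for the same arguments; whether that is the alphabet depends entirely on the encoding.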

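This means your code works only when the encoding gives the characters 'a', 'b', ..., 'z' consecutive values, each exactly one greater than the previous. ASCII does, so your code works there. In any encoding where these characters are not in increasing order, or where other characters are interspersed between 'a' and 'z', it will not work. EBCDIC, for example, keeps the letters in increasing order but leaves gaps after 'i' and after 'r'. The C++ standard guarantees consecutive values only for the digits '0' to '9'.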
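
One possible alternative, sketched here under the assumption that only the 26 basic Latin letters are wanted, is to spell the alphabet out in a string literal and copy it. The compiler then chooses the correct code for each letter in the execution character set, so no assumption about contiguity is needed. `make_alphabet` is a hypothetical name, not a library function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the alphabet from a literal instead of
// incrementing character codes, so it does not depend on the letters
// being contiguous in the encoding.
std::vector<char> make_alphabet() {
    static const std::string letters = "abcdefghijklmnopqrstuvwxyz";
    assert(letters.size() == 26);  // guards against a typo in the literal
    return std::vector<char>(letters.begin(), letters.end());
}
```

This does not handle locale-specific alphabets such as the Spanish 'ñ' mentioned in the comments; for those, the literal would have to list the extra letters explicitly and a wider character type would probably be needed.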

cigien
  • Yes, I suspected as much. I was wondering if there was something missing from there. – anastaciu Mar 26 '21 at 14:45
  • @anastaciu Not with `std::iota`, since it relies on `++` to do its job. If `++` doesn't do what you want, `iota` won't work. – cigien Mar 26 '21 at 14:47
  • So in your infinite wisdom about C++ libraries ;) you think there is something that will? No pressure. – anastaciu Mar 26 '21 at 14:48
  • @anastaciu There are almost certainly libraries that will do this (I couldn't refer you to one though, I'd have to search). In the standard library, there's nothing though, and as far as I'm aware, there's nothing in the pipeline to do this either. – cigien Mar 26 '21 at 14:50
  • @anastaciu You could write a `utf8char` class (not super hard) and let `operator++` step to the next (valid) code point. You could then have a `std::vector` and let `iota` fill it. – Ted Lyngmo Mar 26 '21 at 15:00
  • @TedLyngmo that's a nice suggestion, though the not super hard part is debatable :-) – anastaciu Mar 26 '21 at 15:06
  • @anastaciu I guess :-) I always find conversion between all the different formats confusing, but found writing one class dealing with strict Unicode pretty straightforward. It was a very long time ago so I may have simplified it in my memory. :) – Ted Lyngmo Mar 26 '21 at 15:23
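
The idea from the comments above, a type whose `operator++` decides what "next" means so that `std::iota` can still do the filling, can be sketched in a much simpler form than a full UTF-8 class. The `Letter` type and `make_alphabet_with_iota` below are hypothetical and only step through a fixed table of the 26 basic letters; a real `utf8char` would need actual code point handling.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical type: steps through a fixed table of letters instead of
// incrementing the raw character code.
class Letter {
public:
    explicit Letter(std::size_t index = 0) : index_(index) {}

    // Pre-increment moves to the next table entry; std::iota calls this.
    Letter& operator++() {
        ++index_;
        return *this;
    }

    // Implicit conversion lets std::iota assign a Letter to a char element.
    operator char() const {
        assert(index_ < table().size());  // reading past 'z' is a logic error
        return table()[index_];
    }

private:
    static const std::string& table() {
        static const std::string letters = "abcdefghijklmnopqrstuvwxyz";
        return letters;
    }

    std::size_t index_;
};

// Fills the vector through std::iota, which now advances via Letter::operator++.
std::vector<char> make_alphabet_with_iota() {
    std::vector<char> alpha(26);
    std::iota(alpha.begin(), alpha.end(), Letter(0));
    return alpha;
}
```

The design choice is that the ordering lives in the table, not in the encoding, so the same `std::iota` call works regardless of how the character codes are laid out.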