Swift's notion of string components and iteration has changed in Beta 4. From the guide, we see:
Every instance of Swift’s Character type represents a single extended grapheme cluster. An extended grapheme cluster is a sequence of one or more Unicode scalars that (when combined) produce a single human-readable character.
This has some interesting side effects:
let str1 = "abc"
let str2 = "\u{20DD}def"
countElements(str1) // 3
countElements(str2) // 4
countElements(str1+str2) // 6 ≠ 3+4 !!!
That's because the c
and \u{20DD}
combine to form c⃝. Also notice that we're using countElements
. In order to figure out the length of the string, Swift actually has to iterate through the whole string and figure out where the actual grapheme divisions are, so it takes O(n) time.
We can also see the effect on different encodings:
Array((str1+str2).utf8) // [97, 98, 99, 226, 131, 157, 100, 101, 102]
Array((str1+str2).utf16) // [97, 98, 99, 8413, 100, 101, 102]
Another issue, as your error says, is that String
's IndexType
is no longer convertible from an integer literal: you can't perform random access on the string by specifying an offset. Instead, you can use startIndex
and advance
to move forward some distance in the string, for example str[str.startIndex]
or str[advance(str.startIndex, distance)]
.
Or you can define your own helper functions in the meantime:
func at<C: Collection>(c: C, i: C.IndexType.DistanceType) -> C.GeneratorType.Element {
return c[advance(c.startIndex, i)]
}
func take<C: protocol<Collection, Sliceable>>(c: C, n: C.IndexType.DistanceType) -> C.SliceType {
return c[c.startIndex..<advance(c.startIndex, n)]
}
at(str1+str2, 3) // d
take(str1+str2, 2) // ab
Obviously there are some improvements that could (and probably will) be made in future updates. You may want to file a bug with your concerns. In the long run, supporting grapheme clusters correctly was probably a good decision, but it makes string access a little more painful in the meantime.