let nsString = NSString("Some string")
let nsRange = NSRange(5...10)
type(of: nsString.substring(with: nsRange))
// => String.Type

How can I do this but get an NSString back instead of a String? I'm not looking for a solution that uses Range<String.Index>; I know how to do it that way, but for what I'm doing the speed difference is noticeable. I'd like to keep things in the NSString world.
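In Objective-C, `-substringWithRange:` returns an `NSString`, but Swift imports `substring(with:)` as returning `String`. A minimal sketch, assuming an `NSString`-typed result is all that is needed, is to wrap the returned value with `NSString(string:)`. This still passes through a `String`, so it does not remove the conversion cost the question is worried about. The helper name `nsSubstring(of:with:)` is hypothetical.

```swift
import Foundation

// Hypothetical helper: returns the text in `range` typed as NSString.
// Swift imports substring(with:) as returning String, so the result is
// wrapped back into an NSString; this is a conversion, not a zero-cost view.
func nsSubstring(of source: NSString, with range: NSRange) -> NSString {
    return NSString(string: source.substring(with: range))
}

let nsString = NSString(string: "Some string")
// Same range as NSRange(5...10): location 5, length 6.
let piece = nsSubstring(of: nsString, with: NSRange(location: 5, length: 6))
print(piece.length)  // 6
```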

Benchmark code:

let string = String(repeating: "This is it. ", count: 10000)
let i1 = 19000
let i2 = 19020

// time computation
func tc(computation: (Int, Int) -> Void) {
    let startTime = DispatchTime.now()
    for i in 0..<100 {
        computation(i1 + i, i2 + i)
    }
    let endTime = DispatchTime.now()
    let ns = (endTime.uptimeNanoseconds - startTime.uptimeNanoseconds)
    print("Time: \(ns)")
}



tc { (s1, s2) in
  let start = string.index(string.startIndex, offsetBy: s1)
  let end = string.index(start, offsetBy: s2 - s1)
  string[start..<end]
}

tc { (s1, s2) in
  (string as NSString).substring(with: NSRange(s1..<s2))
}

String block: 55_394_353ns

NSString block: 1_389_647ns - 39.86x faster
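As Martin R suggests in the comments below, when the offsets are UTF-16 code-unit offsets (the units `NSRange` uses), a Swift `String` can be sliced by advancing through its `utf16` view instead of counting `Character`s. A minimal sketch; the helper name `substring(of:utf16From:to:)` is hypothetical, and it assumes both offsets are in bounds and do not fall inside a surrogate pair.

```swift
import Foundation

// Hypothetical helper: slices `string` using UTF-16 code-unit offsets,
// the same units NSString and NSRange use. Offsetting through the utf16
// view skips grapheme-cluster breaking. Assumes 0 <= s1 <= s2 <= utf16.count.
func substring(of string: String, utf16From s1: Int, to s2: Int) -> Substring {
    let start = string.utf16.index(string.startIndex, offsetBy: s1)
    // Advance from `start` instead of walking from startIndex a second time.
    let end = string.utf16.index(start, offsetBy: s2 - s1)
    return string[start..<end]
}

let string = String(repeating: "This is it. ", count: 10000)
let viaUTF16 = substring(of: string, utf16From: 19000, to: 19020)
let viaNSString = NSString(string: string)
    .substring(with: NSRange(location: 19000, length: 20))
print(viaUTF16 == viaNSString)  // true
```

According to the later comments, Rob measured this as faster than the `NSString` version in an optimized build and Peter R reproduced that result; actual timings will depend on the string contents and build settings.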

Peter R
  • This is not directly related, but it may be helpful: [swift-which-types-to-use-nsstring-or-string](https://stackoverflow.com/questions/24038629/swift-which-types-to-use-nsstring-or-string) – Mohamad Ghaith Alzin Jun 12 '22 at 06:37
  • @MohamadGhaithAlzin I don't agree that Swift's native types are more optimized. My tests show that getting a substring via String and Range takes 28x as long as doing so via an NSRange/NSString. 36.7ms vs 1.3ms for 100 iterations. That's not insignificant for a call that happens many times per keypress. And that test was for a short string, I imagine it's much worse for longer strings – Peter R Jun 12 '22 at 06:59
  • @MohamadGhaithAlzin I did the test on a 200k char string, and the results get much worse. 583ms for String/Range vs 1.56ms for NSString/NSRange, (both for 100 iterations). – Peter R Jun 12 '22 at 07:06
  • Are your tests done in an optimized/release build? – Rob Jun 12 '22 at 07:14
  • @Rob They're not, but unless Swift can do away with iterating each glyph in a string to convert an Int index to a String.Index, I see no way that this: `let start = string.index(string.startIndex, offsetBy: s1); let end = string.index(string.startIndex, offsetBy: s2); let range = start.. – Peter R Jun 12 '22 at 07:21
  • @Rob I did an optimized build, 110k char string, 10 iterations. NSRange: 0.37ms, Range 25.6ms – Peter R Jun 12 '22 at 07:35
  • Can you add your benchmark code to the question? – Martin R Jun 12 '22 at 08:35
  • Your two code blocks do not do the same thing: the first one interprets `s1` and `s2` as `Character` counts (i.e. extended grapheme clusters), and the second one as UTF-16 code unit counts (the units of `NSString`). The results are *different* if the string contains emojis, flags, or other characters outside the “Basic Multilingual Plane.” – Martin R Jun 12 '22 at 10:12
  • Btw, the first version can be made a bit faster by computing the end index as `string.index(start, offsetBy: s2 - s1)`, but counting extended grapheme clusters is still slower than counting UTF-16 code units. – Martin R Jun 12 '22 at 10:16
  • Related question: https://stackoverflow.com/questions/65570094/swift-obj-c-interop-prevent-bridging-of-foundation-classes – Cristik Jun 12 '22 at 10:46
  • @MartinR Is there another way to do what I'm doing in the second block, but using `String`? To my understanding, `String`'s way of indexing forces me to use character counts rather than code points. The indexes I'll be getting from the server will be code point offsets rather than character offsets, so even aside from the performance difference, it seems like another reason to go with NSString. – Peter R Jun 12 '22 at 11:21
  • For UTF-16 code unit based indices you can do `let start = string.utf16.index(string.startIndex, offsetBy: s1)` and `let end = string.utf16.index(start, offsetBy: s2 - s1)`, which should be considerably faster. – Martin R Jun 12 '22 at 11:38
  • So this seems to be an XY problem: what you actually want is not “make NSString substring using an NSString” but fast Swift string subscripting based on UTF-16 indices. I would suggest that you update the question accordingly. “The indexes I'll be getting from the server will be codepoint offsets” is relevant information. – Martin R Jun 12 '22 at 11:48
  • Martin is right. Using `utf16` with `String` is much faster, and is even an order of magnitude faster than `NSString` approach, using your benchmark. – Rob Jun 12 '22 at 15:02
  • And this assumes that you do not need the proper handling of characters represented by multiple unicode scalars. Consider `"‍‍‍"`. What is the length of that string? Of that `NSString`? – Rob Jun 12 '22 at 15:25
  • @Rob How are you getting that it's an order of magnitude faster than `NSString`? I just ran some tests and the NSString version still takes about 1/23 of the time of the String version, even using the utf16 code supplied by @MartinR. – Peter R Jun 12 '22 at 21:10
  • Awesome. Thanks for this @Rob. I was indeed able to repeat your results. I'll go with `String` using `utf16` as @MartinR suggested. – Peter R Jun 12 '22 at 23:52
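Martin R's point that the two benchmark blocks interpret the offsets differently is easy to see with a string containing a character outside the Basic Multilingual Plane. A small sketch; the sample string `"a😀bc"` and the variable names are arbitrary choices. U+1F600 occupies two UTF-16 code units, so the same offsets 1 and 3 select different text depending on the unit being counted.

```swift
import Foundation

let sample = "a😀bc"

// Character (extended grapheme cluster) offsets, as in the String benchmark block.
let cStart = sample.index(sample.startIndex, offsetBy: 1)
let cEnd = sample.index(cStart, offsetBy: 2)
let byCharacters = sample[cStart..<cEnd]   // "😀b"

// UTF-16 code-unit offsets, the units NSString and NSRange use.
let uStart = sample.utf16.index(sample.startIndex, offsetBy: 1)
let uEnd = sample.utf16.index(uStart, offsetBy: 2)
let byUTF16 = sample[uStart..<uEnd]        // "😀"

// NSString agrees with the UTF-16 interpretation.
let byNSString = NSString(string: sample)
    .substring(with: NSRange(location: 1, length: 2))  // "😀"

print(sample.count, sample.utf16.count)    // 4 5
```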

2 Answers


Can't you cast NSString to String and vice versa?

let string = NSString("nsString") as String
let nsString = String("string") as NSString
Moose
  • Yes, but there's a cost to that, which is why I was looking for a solution that didn't require casting back and forth from `String`. – Peter R Jun 12 '22 at 06:47

These are being done 100 times, right?

let start = string.index(string.startIndex, offsetBy: s1)
let end = string.index(string.startIndex, offsetBy: s2)

How about doing this instead:

let start = string.index(string.startIndex, offsetBy: i1)
let end = string.index(string.startIndex, offsetBy: i2)

// tc passes two offsets; they are ignored here because the indices are precomputed
tc { _, _ in
    _ = string[start..<end]
}

Which one is more optimized now?

  • The 100x was just to give more meaningful times in the benchmark than doing it once. In the actual code, it would be done an arbitrary amount of times, each with a different set of index numbers. I will update the code to reflect that. – Peter R Jun 12 '22 at 11:18
  • @PeterR Since you are comparing the time it takes to get a `substring` from the original `string`, it is more reasonable to compare only the code that does that! What is the logic right now behind computing `start` and `end` 100 times? – Mohamad Ghaith Alzin Jun 12 '22 at 11:56
  • @PeterR Have you tried doing it as I mentioned only once? – Mohamad Ghaith Alzin Jun 12 '22 at 12:00
  • What is a faster way to get a substring? – Peter R Jun 12 '22 at 21:16