
As the title says, I tried to prove to myself that COW (copy-on-write) is supported for String in Swift, but I could not find a proof. I verified COW for Array and Dictionary with the following code:

import Foundation  // needed for String(format:)

func address(of object: UnsafeRawPointer) -> String {
    let addr = Int(bitPattern: object)
    return String(format: "%p", addr)
}

var xArray = [20, 30, 40, 50, 60]
var yArray = xArray

// These two addresses were the same
address(of: xArray) 
address(of: yArray)

yArray[0] = 200
// The address of yArray got changed
address(of: yArray)

But for the String type, the same check did not work.

var xString = "Hello World"
var yString = xString

// These two addresses were different
address(of: xString)
address(of: yString)

I also copied this test function from the official Swift source repository.

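// Reinterprets the string's in-memory representation as three words and
// returns the first and last. Assumes String is exactly three words wide
// (the _StringCore layout this was written for); unsafeBitCast traps otherwise.
func _rawIdentifier(s: String) -> (UInt, UInt) {
    let triple = unsafeBitCast(s, to: (UInt, UInt, UInt).self)
    let minusCount = (triple.0, triple.2)
    return minusCount
}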

But this function seems to reinterpret only the value that is pointed to, not the address, so two different String variables with the same value have the same raw identifier. This still does not prove COW to me.

var xString = "Hello World"
var yString = "Hello" + " World" 

// These two rawIdentifiers were the same
_rawIdentifier(s: xString)
_rawIdentifier(s: yString)

So how does COW work for the String type in Swift?

Wu_
  • 2
    You could just look at the source code: https://github.com/apple/swift/blob/master/stdlib/public/core/String.swift – Palle Oct 14 '17 at 17:53
  • Apparently the compiler recognizes that "Hello World" and "Hello" + " World" are the same string literal, and creates only a single storage for them. Try the same with *different* strings. – Martin R Oct 14 '17 at 17:56
  • The compiler does constant folding even in unoptimised builds (it's a so-called ["guaranteed optimisation"](https://github.com/apple/swift/blob/master/docs/SIL.rst#guaranteed-optimization-and-diagnostic-passes)), so as Martin says, `"Hello" + " World"` is folded into `"Hello World"`. You could also do `var yString = "Hello"; yString += " World"` to notice a difference (the buffer will also gain an owner in that case, as it's now dynamically, rather than statically, allocated) – Hamish Oct 14 '17 at 18:15
  • A word of caution about `_rawIdentifier`: Using `unsafeBitCast` bypasses retain count operations, so it is fully possible for you to be looking at dangling pointers (i.e the string buffer gets released while you're still looking at the string value) – attempting to dereference those would then be *undefined behaviour*. In the code you took it from, the callers used `_fixLifetime` in order to guarantee this couldn't happen. You can use `withExtendedLifetime(_:_:)` to ensure this too (or use an `UnsafeMutablePointer` and rebind the memory). – Hamish Oct 14 '17 at 18:54
  • @Hamish: But `_rawIdentifier()` does not dereference the pointer, or does it? – Martin R Oct 14 '17 at 19:11
  • @MartinR It doesn't; but OP should be careful what he does with the result (i.e. don't use the values as bit patterns for `UnsafePointer`s and then try to inspect the pointees). It was more of a pre-emptive warning :) – Hamish Oct 14 '17 at 19:13
  • @Palle it's a deep blue sea; can you please point out which code to look for? – maddy Sep 18 '18 at 19:03

1 Answer


The compiler creates only a single storage for both "Hello World" and "Hello" + " World".

You can verify that for example by examining the assembly code obtained from

swiftc -emit-assembly cow.swift

which defines only a single string literal

    .section    __TEXT,__cstring,cstring_literals
L___unnamed_1:
    .asciz  "Hello World"

As soon as the string is mutated, the address of the string storage buffer (the first member of that "magic" tuple, actually _baseAddress of struct _StringCore, defined in StringCore.swift) changes:

var xString = "Hello World"
var yString = "Hello" + " World"

print(_rawIdentifier(s: xString)) // (4300325536, 0)
print(_rawIdentifier(s: yString)) // (4300325536, 0)

yString.append("!")
print(_rawIdentifier(s: yString)) // (4322384560, 4322384528)
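The same sharing-then-copying behavior can be sketched without `unsafeBitCast`, assuming a toolchain where `String.withUTF8` is available (Swift 5.1 or later, as far as I know). The helper `utf8Address(of:)` is hypothetical, and the string is deliberately built at runtime and made long, on the assumption that short strings may be stored inline rather than in a shared heap buffer. The addresses are only compared, never dereferenced.

```swift
// Hypothetical helper: address of the string's contiguous UTF-8 storage, as an Int.
// `withUTF8` is a mutating method, hence the `inout` parameter.
func utf8Address(of string: inout String) -> Int {
    return string.withUTF8 { Int(bitPattern: $0.baseAddress) }
}

// Built at runtime and long enough (assumption) to live in a heap buffer.
var xString = String(repeating: "Hello World ", count: 10)
var yString = xString

// Before any mutation both variables are expected to share one buffer.
let sharedBefore = utf8Address(of: &xString) == utf8Address(of: &yString)

// Mutating yString forces it to get its own buffer, because xString
// still refers to the original one.
yString.append("!")
let sharedAfter = utf8Address(of: &xString) == utf8Address(of: &yString)

print(sharedBefore, sharedAfter)
```

This relies on implementation details of the standard library, so treat it as an observation aid rather than a guarantee.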

And why does your

func address(of object: UnsafeRawPointer) -> String

function show the same values for xArray and yArray, but not for xString and yString?

Passing an array to a function taking an unsafe pointer passes the address of the first array element, which is the same for both arrays if they share the storage.
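A minimal sketch of the same array check that makes the "address of the first element" explicit through `withUnsafeBufferPointer` instead of the implicit conversion. The helper name `baseAddress(of:)` is made up for illustration.

```swift
// Hypothetical helper: address of the first element of the array's storage, as an Int.
func baseAddress(of array: [Int]) -> Int {
    return array.withUnsafeBufferPointer { buffer in
        Int(bitPattern: buffer.baseAddress)
    }
}

let xArray = [20, 30, 40, 50, 60]
var yArray = xArray

// Both arrays share one storage buffer until one of them is mutated.
let sharedBefore = baseAddress(of: xArray) == baseAddress(of: yArray)

// The write triggers the copy: yArray gets its own buffer.
yArray[0] = 200
let sharedAfter = baseAddress(of: xArray) == baseAddress(of: yArray)

print(sharedBefore, sharedAfter)
```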

Passing a string to a function taking an unsafe pointer passes a pointer to a temporary UTF-8 representation of the string. That address can be different in each call, even for the same string.

This behavior is documented in the "Using Swift with Cocoa and Objective-C" reference for UnsafePointer<T> arguments, but apparently works the same for UnsafeRawPointer arguments.
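To see that the pointer really refers to a temporary UTF-8 copy rather than to the string's own storage, one can read the bytes back inside the callee. This is only a sketch; the helper `bytes(at:count:)` is hypothetical, and the pointer must not be kept after the call returns.

```swift
// Hypothetical helper: copies `count` bytes starting at `pointer` into an array.
func bytes(at pointer: UnsafeRawPointer, count: Int) -> [UInt8] {
    return (0..<count).map { pointer.load(fromByteOffset: $0, as: UInt8.self) }
}

let greeting = "Hi"

// The string is converted to a temporary NUL-terminated UTF-8 buffer that is
// valid only for the duration of this call.
let copied = bytes(at: greeting, count: 3)

print(copied)  // the two UTF-8 code units of "Hi" followed by the terminating 0
```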

Martin R
  • 1
    For anyone that's interested in the second number in the tuple, and why it changes on appending: that's a pointer to the string buffer's owner. The buffer has an owner when it's dynamically allocated (but not when statically allocated, such as is the case with a string literal). The owner is just a class instance that takes care of the reference counting for the buffer. When the owner is deallocated, the buffer is released. – Hamish Oct 14 '17 at 18:42
  • I believe under the hood for native storage, a single `ManagedBuffer` instance is used for a dynamic string buffer; so the raw buffer pointer (first element of tuple) just points to the start of the body of that instance, and the owner points to the start of the header. – Hamish Oct 14 '17 at 18:42