Why String.CharacterView is not a MutableCollection

In the previous article I discussed why Set and Dictionary donʼt conform to MutableCollection and RangeReplaceableCollection. Today Iʼd like to do the same for String.CharacterView.

CharacterView does conform to RangeReplaceableCollection but not to MutableCollection. Why? A string is clearly mutable; it seems logical that it should adopt this protocol. Again, we need to consider the protocolʼs semantics.

The documentation for MutableCollection specifies these requirements:

The MutableCollection protocol allows changing the values of a collection’s elements but not the length of the collection itself. …

A value stored into a subscript of a MutableCollection instance must subsequently be accessible at that same position. That is, for a mutable collection instance a, index i, and value x, the two sets of assignments in the following code sample must be equivalent:

a[i] = x
let y = a[i]
// Must be equivalent to:
a[i] = x
let y = x
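
To make the requirement concrete, here is a minimal sketch of the round-trip check, applied to Array, which does conform to MutableCollection. The helper `upholdsRoundTrip` is a hypothetical name invented for this illustration, not a standard library function, and the code assumes a current Swift toolchain (it uses `C.Element`, which is not available in Swift 3).

```swift
// Hypothetical helper (not part of the standard library): stores x at i and
// reports whether the value can be read back at the same index and whether
// the number of elements stayed the same.
func upholdsRoundTrip<C: MutableCollection>(
    _ collection: inout C, at i: C.Index, storing x: C.Element
) -> Bool where C.Element: Equatable {
    let countBefore = collection.count
    collection[i] = x
    return collection[i] == x && collection.count == countBefore
}

var numbers = [10, 20, 30]
let arrayOK = upholdsRoundTrip(&numbers, at: 1, storing: 99)
print(arrayOK, numbers) // prints "true [10, 99, 30]"
```

For an array, both conditions hold trivially because every element occupies a fixed-size slot.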

Replacing characters can invalidate indices

Letʼs try this with a CharacterView. Weʼll start with an initial string a and an index i that points to the "_" character, which we want to replace with an emoji (our new value x):

var a = "Grinning face: _".characters
let i = a.index(of: "_")!
// Verify a[i]
a[i] // → "_"

let x: Character = "😀"

We canʼt say a[i] = x to perform the replacement because of the missing MutableCollection conformance, but we can replicate the behavior with replaceSubrange(_:with:):

// Workaround for a[i] = x
a.replaceSubrange(i...i, with: CollectionOfOne(x))

// The replacement worked:
String(a) // → "Grinning face: 😀"
// Now a[i] should return x, i.e. "😀":
a[i] // → "�" WRONG!

a[i] returns � (the Unicode replacement character U+FFFD), which is a sign that something went wrong. The call to replaceSubrange worked, but the index i is no longer valid. We can see why if we take a look at its underlying structure using the dump function:

dump(i)
/* →
▿ Swift.String.CharacterView.Index
  ▿ _base: Swift.String.UnicodeScalarView.Index
    - _position: 15
  - _countUTF16: 1
*/

The index stores two values: a position in the stringʼs underlying storage (measured in UTF-16 code units) and the length of the Character in UTF-16 code units. While the original character "_" had length 1, its emoji replacement is two UTF-16 code units long, which we can verify by computing a new index:

let newIndex = a.index(of: "😀")!
dump(newIndex)
/* →
▿ Swift.String.CharacterView.Index
  ▿ _base: Swift.String.UnicodeScalarView.Index
    - _position: 15
  - _countUTF16: 2
*/
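
The same difference in length can be checked without relying on the indexʼs private fields. This is a small sketch that counts the UTF-16 code units of each character directly; the constant names are made up for the example.

```swift
let underscore: Character = "_"
let emoji: Character = "😀" // U+1F600, outside the Basic Multilingual Plane

// Convert each Character to a String and count its UTF-16 code units.
let underscoreUnits = String(underscore).utf16.count // 1
let emojiUnits = String(emoji).utf16.count           // 2 (a surrogate pair)
print(underscoreUnits, emojiUnits) // prints "1 2"
```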

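This shows that mutating a single Character can invalidate that Characterʼs index: the length stored in the index no longer matches the element at that position. Therefore String.CharacterView canʼt conform to MutableCollection without violating the protocolʼs semantics. RangeReplaceableCollection has different semantics: its documentation states that replaceSubrange(_:with:) may invalidate existing indices, which is why CharacterView can conform to it.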
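
In practice this means an index obtained before a replacement shouldnʼt be reused afterwards. One possible approach, sketched below, is to remember the elementʼs offset from the start and derive a fresh index after the mutation. The sketch assumes a current Swift toolchain, where String itself is the collection of Characters and `firstIndex(of:)` replaces `index(of:)`; the variable names are invented for the example.

```swift
var text = "Grinning face: _"
let replacement: Character = "😀"

// Index of the underscore, valid only for the string as it is now.
let old = text.firstIndex(of: "_")!
// Remember the position as a Character offset (this walk is O(n)).
let offset = text.distance(from: text.startIndex, to: old)

text.replaceSubrange(old...old, with: CollectionOfOne(replacement))

// Derive a new index from the mutated string instead of reusing `old`.
let fresh = text.index(text.startIndex, offsetBy: offset)
let readBack = text[fresh]
print(readBack) // prints "😀"
```

Recomputing the index costs a linear walk over the string, so this is a trade-off rather than a free fix.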

Character has a variable-length encoding

Let me add one more observation regarding the other requirement of MutableCollection quoted above, namely that mutation must not affect the collectionʼs length. CharacterView arguably fulfills this requirement: every Character counts as exactly one element regardless of how it is encoded, so replacing one with another should maintain the stringʼs overall length.

However, the Characterʼs size in the underlying storage is not the same for all characters, so replacing a single Character can potentially make it necessary to move the subsequent text forward or backward in memory by a few bytes to make room for the replacement. This would make the simple subscript assignment potentially an O(n) operation, and subscripting is supposed to be O(1). The following quote is from the documentation for Collection:

Types that conform to Collection are expected to provide the startIndex and endIndex properties and subscript access to elements as O(1) operations. Types that are not able to guarantee that expected performance must document the departure, because many collection operations depend on O(1) subscripting performance for their own performance guarantees.

Granted, this only talks about the subscript getter and MutableCollection doesnʼt say anything about the expected setter performance, but itʼs probably reasonable to assume it should have the same characteristics.
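
The storage argument can be illustrated with a short sketch that compares the number of Characters with the number of UTF-16 code units before and after a replacement. It assumes a current Swift toolchain (String used directly instead of CharacterView) and counts UTF-16 code units to match the discussion above, although newer Swift versions may store strings differently internally. The trailing "!" stands in for the text that has to move.

```swift
var line = "Grinning face: _!"
let face: Character = "😀"

let charsBefore = line.count       // 17 Characters
let unitsBefore = line.utf16.count // 17 UTF-16 code units

let target = line.firstIndex(of: "_")!
line.replaceSubrange(target...target, with: CollectionOfOne(face))

let charsAfter = line.count        // still 17 Characters
let unitsAfter = line.utf16.count  // 18 code units: the "!" had to shift by one
print(charsBefore, charsAfter, unitsBefore, unitsAfter) // prints "17 17 17 18"
```

The element count is unchanged, but the encoded size grows, so everything after the replaced character has to be moved.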

Unicode edge cases

The final potential issue for CharacterViewʼs hypothetical MutableCollection conformance is Unicode and the complexities it brings. The existence of combining characters means that replacing a single Character can actually change the stringʼs length (measured in Characters) if the new character combines with its preceding character.

In the following example, we replace the underscore in "1_" with U+20E3 COMBINING ENCLOSING KEYCAP, which then combines with the character before it:

var s = "1_".characters
s.count // → 2, as expected
let idx = s.index(of: "_")!

// U+20E3 COMBINING ENCLOSING KEYCAP
let keycap: Character = "\u{20E3}"

// Replace _ with keycap
s.replaceSubrange(idx...idx, with: [keycap])

// The keycap has combined with the preceding character:
String(s) // → "1⃣"
// Length is now 1
s.count // → 1

The resulting string "1⃣" is only 1 Character long, again violating the MutableCollection semantics. The old index idx should not be used any longer either: it was computed for a string that had a separate Character at that position, and that Character no longer exists on its own.

Arguably, the main point this example makes is not so much that it would violate the MutableCollection semantics, but that Unicode is complex and can easily be misused: you should generally not operate on combining marks separately from their base characters. If a few obscure Unicode-related edge cases were the only situations where a string type could not uphold a protocolʼs semantics, that alone would probably not be reason enough to forgo the protocol. Unicode is so complicated that thereʼs a good chance its features will never perfectly satisfy the constraints of a generic collection protocol.

For this reason, String will probably become a Collection again in Swift 4, acknowledging that some operations may violate strict Collection semantics for “degenerate” cases.

As weʼve seen, however, Unicode is not the deciding factor in this case. Index invalidation on replacement and the variable size of a Character in the underlying storage already make CharacterView incompatible with MutableCollection semantics.
