What’s in a Collection?

This is an excerpt from the Collection Protocols chapter in the upcoming new edition of Advanced Swift (somewhat amended to fit in a blog post). Chris Eidhof and I are almost done updating the book for Swift 3. It will be out real soon now.

Collections in Swift are very powerful but also very complex. If you want to implement your own custom collection type, you need to understand how the Collection protocol works. And even if all you want to do is use the familiar collection types from the standard library, we think it’s worth learning how things work, not least because it can be a big help in decoding what the compiler wants to tell you with its error messages.

In this article, we’d like to discuss the Collection protocol’s associated types. This might seem like an obscure topic, but we think understanding what the associated types do and why they are needed is key to understanding collections in Swift.

Overview

Collection has five associated types. They are declared as follows (this is not the actual code because Index is declared in IndexableBase, but you get the idea):

protocol Collection: Indexable, Sequence {
    associatedtype Iterator: IteratorProtocol = IndexingIterator<Self>
    associatedtype SubSequence: IndexableBase, Sequence = Slice<Self>
    associatedtype Index: Comparable // declared in IndexableBase
    associatedtype IndexDistance: SignedInteger = Int
    associatedtype Indices: IndexableBase, Sequence = DefaultIndices<Self>
    ...
}

The first four are inherited from the base protocols Sequence, Indexable, and IndexableBase1; Collection restates all of them except Index with tighter constraints or different default values, though.

Notice that Collection provides defaults for all but one of its associated types — conforming types only have to specify an Index type. Even though you don’t have to care much about the other associated types, let’s go through them one by one.

Iterator

Inherited from Sequence. Sequences provide access to their elements by creating an iterator. The iterator produces the values of the sequence one at a time and keeps track of its iteration state as it traverses through the sequence.

Iterators have an associated type of their own called Element. The Element type specifies the type of the values the iterator produces. For example, the element type of the iterator for String.CharacterView is Character. By extension, the iterator also defines its sequence’s element type; the fact that Element is an associated type of IteratorProtocol is why you often see references to Iterator.Element in method signatures or generic constraints for Sequence and Collection.

The default iterator type for collections is IndexingIterator<Self>. This is a pretty simple struct that wraps the collection and uses the collection’s own indices to step over each element. Most collections in the standard library use IndexingIterator as their iterator. There should be little reason to write your own iterator type for a custom collection.

SubSequence

Also inherited from Sequence, but Collection restates this type with tighter constraints: a collection’s SubSequence should itself also be a Collection. (We say “should” rather than “must” because this constraint is currently not fully expressible in the type system.)

SubSequence is used as the return type for operations that return slices of the original collection:

  • prefix and suffix — take the first or last n elements.
  • dropFirst and dropLast — return subsequences where the first or last n elements have been removed.
  • split — break up the sequence at the specified separator elements and return an array of subsequences.
  • subscript with a Range<Index> argument — return a slice containing the elements at a range of indices.

The default subsequence type for collections is Slice<Self>, which wraps the original collection (similar to IndexingIterator) and stores the slice’s start and end index in terms of the base collection.

It can make sense for a collection to customize its SubSequence type, especially if it can be Self (i.e. a slice of the collection has the same type as the collection itself). A standard library type for which this is the case is String.CharacterView, which makes working with string slices more convenient. A counterexample is Array whose slice type is ArraySlice.

Index

An index represents a position in the collection. Every collection has two special indices, startIndex and endIndex. The startIndex designates the collection’s first element, and endIndex is the index that comes after the last element in the collection. An index should be a dumb value that only stores the minimal amount of information required to describe an element’s position. In particular, indices should not keep a reference to their collection if at all possible. The only requirement for a collection’s Index is that it must be Comparable, which is another way of saying that indices have a defined order.

We are used to indices being integers as is the case for arrays, but integer indices don’t work for every data structure. Again, take String.CharacterView as an example. Characters in Swift are variable-size; if you wanted to use integer indices, you’d have two options:

  1. Make the index represent an offset into the string’s internal storage. This is very efficient; accessing an element at a given index is an O(1) operation. But there would be gaps in the index range. For example, if the character at index 0 had twice the normal size, the next character would be at index 2 — accessing an element at index 1 would either trigger a fatal error or be undefined behavior. This would be a huge violation of user expectations.
  2. Make the index n represent the n-th character in the string. This is consistent with user expectations — there wouldn’t be any gaps in the index range. However, accessing an element at a given index is now an O(n) operation; the string must start at the beginning and traverse all elements before the given index to determine where the desired character is stored. This is a big no-no; users expect subscripting on an index to give direct element access in constant time.

As a result, String.CharacterView.Index is an opaque value that points to a position in the string’s internal storage buffer. It really is just a wrapper for a single Int offset, but that is an implementation detail that is of no interest to users of the collection.

Choosing the correct index type is a choice every collections needs to make individually. That’s why the Index type is the only associated type that doesn’t have a default.

IndexDistance

A signed integer type that represents the number of steps between two indices. There should be no reason to change this from the default Int.

Indices

The return type of the collection’s indices property. It represents a collection containing all indices that are valid for subscripting the base collection, in ascending order. Note that the endIndex is not included because it signifies the “past the end” position and thus is not a valid subscript argument.

In Swift 2, the indices property returned a Range<Index> that could be used to iterate over all valid indices in the collection. In Swift 3, Range<Index> is no longer iterable because indices can no longer be advanced on their own (it is now up to the collection to advance an index). The Indices type replaces Range<Index> to keep index iterations working.

The default Indices type is the imaginatively named DefaultIndices<Self>. Like Slice, it is a pretty simple wrapper for the base collection and a start and end index — it needs to keep a reference to the base collection to be able to advance the indices. This can cause a surprising performance issue if you mutate the collection while iterating over its indices: if the collection is implemented using copy-on-write (as all collections in the standard library are), the extra reference to the collection can trigger an unnecessary copy to be made when the loop is entered.

We cover copy-on-write extensively in the book. For now, it’s enough to know that if your custom collection can provide an alternative Indices type that does not need to keep a reference to the base collection, doing so is a worthwhile optimization. This is possible for all collections whose index math does not rely on the collection itself, like arrays. If your index is an integer type you can use CountableRange<Index>. This is how the definition would look for a custom queue type (we implement this type in the book):

extension Queue: Collection {
    ...
    
    typealias Indices = CountableRange<Int>
    
    var indices: CountableRange<Int> {
        return startIndex..<endIndex
    }
}
  1. Feel free to treat the presence of Indexable and IndexableBase as an implementation detail. These protocols were introduced to work around the limitation that the type system doesn’t support recursive protocol constraints, i.e. constraints on Collection’s associated types that reference Collection itself.

    Allowing this is part of the highly anticipated better generics support we will hopefully see within the next year. When it happens, we expect Indexable and IndexableBase to be removed and their functionality folded into Collection↩︎