This is an excerpt from the Collection Protocols chapter in the upcoming new edition of Advanced Swift (somewhat amended to fit in a blog post). Chris Eidhof and I are almost done updating the book for Swift 3. It will be out real soon now.
Collections in Swift are very powerful but also very complex. If you want to implement your own custom collection type, you need to understand how the Collection
protocol works. And even if all you want to do is use the familiar collection types from the standard library, we think it’s worth learning how things work, not least because it can be a big help in decoding what the compiler wants to tell you with its error messages.
In this article, we’d like to discuss the Collection
protocol’s associated types. This might seem like an obscure topic, but we think understanding what the associated types do and why they are needed is key to understanding collections in Swift.
Overview
Collection
has five associated types. They are declared as follows (this is not the actual code because Index
is declared in IndexableBase
, but you get the idea):
protocol Collection: Indexable, Sequence {
associatedtype Iterator: IteratorProtocol = IndexingIterator<Self>
associatedtype SubSequence: IndexableBase, Sequence = Slice<Self>
associatedtype Index: Comparable // declared in IndexableBase
associatedtype IndexDistance: SignedInteger = Int
associatedtype Indices: IndexableBase, Sequence = DefaultIndices<Self>
...
}
The first four are inherited from the base protocols Sequence
, Indexable
, and IndexableBase
1; Collection
restates all of them except Index
with tighter constraints or different default values, though.
Notice that Collection
provides defaults for all but one of its associated types — conforming types only have to specify an Index
type. Even though you don’t have to care much about the other associated types, let’s go through them one by one.
Iterator
Inherited from Sequence
. Sequences provide access to their elements by creating an iterator. The iterator produces the values of the sequence one at a time and keeps track of its iteration state as it traverses through the sequence.
Iterators have an associated type of their own called Element
. The Element
type specifies the type of the values the iterator produces. For example, the element type of the iterator for String.CharacterView
is Character
. By extension, the iterator also defines its sequence’s element type; the fact that Element
is an associated type of IteratorProtocol
is why you often see references to Iterator.Element
in method signatures or generic constraints for Sequence
and Collection
.
The default iterator type for collections is IndexingIterator<Self>
. This is a pretty simple struct that wraps the collection and uses the collection’s own indices to step over each element. Most collections in the standard library use IndexingIterator
as their iterator. There should be little reason to write your own iterator type for a custom collection.
SubSequence
Also inherited from Sequence
, but Collection
restates this type with tighter constraints: a collection’s SubSequence
should itself also be a Collection
. (We say “should” rather than “must” because this constraint is currently not fully expressible in the type system.)
SubSequence
is used as the return type for operations that return slices of the original collection:
prefix
andsuffix
— take the first or last n elements.dropFirst
anddropLast
— return subsequences where the first or last n elements have been removed.split
— break up the sequence at the specified separator elements and return an array of subsequences.subscript
with aRange<Index>
argument — return a slice containing the elements at a range of indices.
The default subsequence type for collections is Slice<Self>
, which wraps the original collection (similar to IndexingIterator
) and stores the slice’s start and end index in terms of the base collection.
It can make sense for a collection to customize its SubSequence
type, especially if it can be Self
(i.e. a slice of the collection has the same type as the collection itself). A standard library type for which this is the case is String.CharacterView
, which makes working with string slices more convenient. A counterexample is Array
whose slice type is ArraySlice
.
Index
An index represents a position in the collection. Every collection has two special indices, startIndex
and endIndex
. The startIndex
designates the collection’s first element, and endIndex
is the index that comes after the last element in the collection. An index should be a dumb value that only stores the minimal amount of information required to describe an element’s position. In particular, indices should not keep a reference to their collection if at all possible. The only requirement for a collection’s Index
is that it must be Comparable
, which is another way of saying that indices have a defined order.
We are used to indices being integers as is the case for arrays, but integer indices don’t work for every data structure. Again, take String.CharacterView
as an example. Characters in Swift are variable-size; if you wanted to use integer indices, you’d have two options:
- Make the index represent an offset into the string’s internal storage. This is very efficient; accessing an element at a given index is an O(1) operation. But there would be gaps in the index range. For example, if the character at index 0 had twice the normal size, the next character would be at index 2 — accessing an element at index 1 would either trigger a fatal error or be undefined behavior. This would be a huge violation of user expectations.
- Make the index
n
represent the n-th character in the string. This is consistent with user expectations — there wouldn’t be any gaps in the index range. However, accessing an element at a given index is now an O(n) operation; the string must start at the beginning and traverse all elements before the given index to determine where the desired character is stored. This is a big no-no; users expect subscripting on an index to give direct element access in constant time.
As a result, String.CharacterView.Index
is an opaque value that points to a position in the string’s internal storage buffer. It really is just a wrapper for a single Int
offset, but that is an implementation detail that is of no interest to users of the collection.
Choosing the correct index type is a choice every collections needs to make individually. That’s why the Index
type is the only associated type that doesn’t have a default.
IndexDistance
A signed integer type that represents the number of steps between two indices. There should be no reason to change this from the default Int
.
Indices
The return type of the collection’s indices
property. It represents a collection containing all indices that are valid for subscripting the base collection, in ascending order. Note that the endIndex
is not included because it signifies the “past the end” position and thus is not a valid subscript argument.
In Swift 2, the indices
property returned a Range<Index>
that could be used to iterate over all valid indices in the collection. In Swift 3, Range<Index>
is no longer iterable because indices can no longer be advanced on their own (it is now up to the collection to advance an index). The Indices
type replaces Range<Index>
to keep index iterations working.
The default Indices
type is the imaginatively named DefaultIndices<Self>
. Like Slice
, it is a pretty simple wrapper for the base collection and a start and end index — it needs to keep a reference to the base collection to be able to advance the indices. This can cause a surprising performance issue if you mutate the collection while iterating over its indices: if the collection is implemented using copy-on-write (as all collections in the standard library are), the extra reference to the collection can trigger an unnecessary copy to be made when the loop is entered.
We cover copy-on-write extensively in the book. For now, it’s enough to know that if your custom collection can provide an alternative Indices
type that does not need to keep a reference to the base collection, doing so is a worthwhile optimization. This is possible for all collections whose index math does not rely on the collection itself, like arrays. If your index is an integer type you can use CountableRange<Index>
. This is how the definition would look for a custom queue type (we implement this type in the book):
extension Queue: Collection {
...
typealias Indices = CountableRange<Int>
var indices: CountableRange<Int> {
return startIndex..<endIndex
}
}
-
Feel free to treat the presence of
Indexable
andIndexableBase
as an implementation detail. These protocols were introduced to work around the limitation that the type system doesn’t support recursive protocol constraints, i.e. constraints onCollection
’s associated types that referenceCollection
itself.Allowing this is part of the highly anticipated better generics support we will hopefully see within the next year. When it happens, we expect
Indexable
andIndexableBase
to be removed and their functionality folded intoCollection
. ↩︎