Suppose we have a string that ends in a specific pattern, like this one:
let str = "abc,def,…[many fields]…,1234"
We want to extract the last field, i.e. the substring after the last comma, i.e. "1234"
.
Splitting the entire string
One solution might be to split the string at every comma and take the last element of the resulting array:
if let lastField = str.split(separator: ",").last {
// lastField is "1234" ✅
}
But this is quite wasteful if the string is potentially very long.
Finding the index of the last comma
It would be more efficient to find the index of last comma in the string and then take a substring from there to the end. It makes sense to start the search from the back, not only for performance reasons (assume the last field is known to be short), but also because the first portion of the string may contain commas we’re not interested in.
lastIndex(of:)
Solving this task is trivial in Swift 4.2, thanks to the new lastIndex(of:)
method, introduced with Swift Evolution proposal SE-0204:
if let lastCommaIndex = str.lastIndex(of: ",") {
let lastField = str[lastCommaIndex...].dropFirst()
// → "1234" ✅
}
Note the call to dropFirst
. The substring str[lastCommaIndex...]
includes the comma.
Manual implementation
But let’s assume for a minute that lastIndex(of:)
isn’t available to us. We could write a loop to accomplish the task, manually traversing through the string (from the back) until we find a match.
Let’s wrap the algorithm in an extension on String
to make it reusable:
extension String {
func lastIndex1(of char: Character) -> String.Index? {
var cursor = endIndex
while cursor > startIndex {
formIndex(before: &cursor)
if self[cursor] == char {
return cursor
}
}
return nil
}
}
This code is essentially the same as the standard library’s implementation for lastIndex(of:)
, if not quite as generic (the standard library version is implemented on BidirectionalCollection
). And it does the job:
if let lastCommaIndex = str.lastIndex1(of: ",") {
let lastField = str[lastCommaIndex...].dropFirst()
// → "1234" ✅
}
By composing reversed
and firstIndex(of:)
This is all well and good, but there’s another way to solve this task, and it’s what I want to talk about in this article — not because it’s a particularly practical example, but because I believe it can teach a lot about the design of the Swift standard library.
We could also implement our own version of lastIndex(of:)
by composing other Collection
APIs as follows:
extension String {
func lastIndex2(of char: Character) -> String.Index? {
guard let reversedIndex = reversed().firstIndex(of: char) else {
return nil
}
return index(before: reversedIndex.base)
}
}
It may not be immediately apparent why this works, so let’s go through it step by step. The guard
statement reverses the string before looking up the index of the search character:
guard let reversedIndex = reversed().firstIndex(of: char)
Reversing the string makes intuitive sense because we want to search it from the back, but it raises two questions:
-
Isn’t reversing the whole string expensive?
-
What can we do with an index into the reversed string? Is there a way to turn it back into an index for the original string?
1. Isn’t reversing the whole string expensive?
Answer: no, because reversed
is implemented lazily. The reversed
method doesn’t perform any work. All it does is wrap its input in a ReversedCollection
value:
extension BidirectionalCollection {
func reversed2() -> ReversedCollection2<Self> {
return ReversedCollection2(base: self)
}
}
(The code I show here and in the following is my own reimplementation of ReversedCollection
. I named it ReversedCollection2
to avoid conflicts with the standard library type, but both implementations are more or less the same.)
At its most basic, ReversedCollection
is a simple generic wrapper for any BidirectionalCollection
:
struct ReversedCollection2<Base: BidirectionalCollection> {
var base: Base
}
Even if the input string is many megabytes long, calling reversed()
on it is effectively free because String
uses copy-on-write. The actual string data isn’t copied until one of the copies is mutated.
Conforming to Collection
Next, the code calls firstIndex(of:)
1 on the ReversedCollection
. firstIndex(of:)
is part of the Collection
protocol, so ReversedCollection
must conform to Collection
. Let’s walk through a reimplementation of this conformance.
Index type
Every Collection
must have an index type. For ReversedCollection
, we can use the same idea for our index that we used for the collection itself: the index type should be a simple wrapper around the base collection’s index:
extension ReversedCollection2 {
struct Index {
var base: Base.Index
}
}
Collection indices must be Comparable
, so we have to add that conformance as well. Since we know that the base
value is also Comparable
, we can forward to it in our implementation. But note that we have to invert the logic since our index type must model the reversed relationship: the greatest base
index represents the smallest reversed index, and vice versa:
extension ReversedCollection2.Index: Comparable {
static func <(lhs: ReversedCollection2.Index, rhs: ReversedCollection2.Index) -> Bool {
// Inverted logic compared to base
return lhs.base > rhs.base
}
}
Collection
implementation
Having laid the groundwork, we can add the Collection
conformance. The idea is to base our implementation on the base collection, remembering that we have to invert everything: the reversed collection’s startIndex
is the base’s endIndex
(and vice versa), index(after:)
should call base.index(before:)
, and so on.
extension ReversedCollection2: Collection {
typealias Element = Base.Element
var startIndex: Index {
return Index(base: base.endIndex)
}
var endIndex: Index {
return Index(base: base.startIndex)
}
func index(after idx: Index) -> Index {
return Index(base: base.index(before: idx.base))
}
subscript(position: Index) -> Element {
return base[base.index(before: position.base)]
}
}
By the way, our reliance on base.index(before:)
is the reason why ReversedCollection
’s generic parameter must be constrained to BidirectionalCollection
. A plain Collection
has no index(before:)
method and thus cannot be traversed in reverse.
There’s one more thing to consider: a collection’s endIndex
signifies the “one past the end” position, not the position of the last element. By using base.endIndex
as the reversed collection’s startIndex
, we have effectively shifted all reversed indices one element to the right. This is why the subscript implementation accesses the element at base.index(before: position.base)
rather than position.base
.
It’s wrappers all the way down
Going back to the snippet we were discussing in our lastIndex2
implementation, the expression:
str.reversed().firstIndex(of: char)
// type: ReversedCollection<String>.Index?
returns an optional ReversedCollection<String>.Index
, which as we’ve seen is a wrapper around the corresponding index in the unreversed string.
2. Is there a way to turn it back into an index for the original string?
Answer: yes, because the standard library implements ReversedCollection
pretty much in the same way I have shown here, only with a few more bells and whistles. Read the code here.
And luckily, ReversedCollection.Index.base
is public2, which allows us to get back from the reversed index an index in the underlying base collection. Just as above, we have to remember to shift the index back, as the documentation reminds us:
To find the position that corresponds with this index in the original, underlying collection, use that collection’s
index(before:)
method with the base property.
Which is why the last line of our lastIndex2
implementation returns the index before the match:
return index(before: reversedIndex.base)
Conclusion
Arguably, the above is a pointless exercise without much practical relevance. However, I believe that understanding how and why this works can teach you a lot about the design of collection types and protocol hierarchy in the standard library. The Sequence
and Collection
APIs are designed to be composable, and to provide useful operations for implementing your own algorithms. Dave Abrahams’s WWDC 2018 talk Embracing Algorithms gives some great examples how to do this. It’s my favorite WWDC 2018 session.
ReversedCollection
isn’t the only lazy sequence/collection wrapper in the standard library. enumerated
, joined
, and zip
all work in exactly the same way.
And by using lazy
you can apply the same general pattern of returning a wrapper value that defers the actual work to map
, filter
, and other normally eager operations — the difference is that these wrappers store not only the base collection, but also the transformation function you pass in.
Why are some operation always lazy and others eager by default? The standard library’s policy is to make things lazy by default if the laziness is unambiguously
. If, on the other hand, there’s a significant downside to the lazy implementation, provide the lazy version under the free
lazy
property. Ben Cohen:
For example,
lazy.map
performs the mapping every time you iterate, which could be very costly on multiple iteration, and also captures and stores the mapping closure, which can be risky if it is stateful.
-
firstIndex(of:)
was calledindex(of:)
before Swift 4.2. ↩︎ -
ReversedCollection._base
is public, too, though hidden from code completion. ↩︎