Categories
Foundation iOS Swift

Merging sorted arrays with duplicates in Swift

The other day, I needed a way for merging sorted arrays containing structs. In addition, not just merging arrays, I also needed to handle duplicates. The duplicate handling needed to prioritize elements being inserted to the existing array. The case for that was adding an updated element to the existing array. In general, sounded like a leet code exercise, but something what I actually needed.

The fact that we are dealing with structs and need to take care of duplicates, then one way for that would be rebuilding the resulting array every time when merging new elements. Looping over existing elements allows doing the duplicate detection and in the end, we just need to loop over existing elements once and new elements twice (twice due to duplicates handling). Identification for elements are served by an id property. Since we’ll rebuild the resulting array we need a way to figure out how to sort elements, therefore, we can use a typical areInIncreasingOrder closure familiar from other sort related APIs. Since we discussed using id for identification, we require that Array elements conform to Identifiable protocol.

extension RandomAccessCollection {
func sortedMerged(
with otherSorted: [Element],
areInIncreasingOrder: (Element, Element) -> Bool
) -> [Element] where Element: Identifiable {

This interface will allow us to detect duplicates and keep the resulting array sorted after inserting new elements.

The core of the function is looping over existing array and new/other elements and adding the smaller element to the resulting array. Then advancing the index of the existing array or the new/other elements array, depending on which was just inserted, to the resulting array. One of the requirements is that we should prefer elements from the new/other elements array. Therefore, each time we try to add an element from the existing array to the resulting array, we should check that this element is not present in the new/other elements array. Such lookup is easy to implement with a Set which contains ids of all the elements in the new elements array. If we put everything together, the function looks like this.

extension RandomAccessCollection {
func sortedMerged(
with otherSorted: [Element],
areInIncreasingOrder: (Element, Element) -> Bool
) -> [Element] where Element: Identifiable {
let otherIds = Set<Element.ID>(otherSorted.map(\.id))
var result = [Element]()
result.reserveCapacity(count + otherSorted.count)
var currentIndex = startIndex
var otherIndex = otherSorted.startIndex
while currentIndex < endIndex, otherIndex < otherSorted.endIndex {
if areInIncreasingOrder(self[currentIndex], otherSorted[otherIndex]) {
// Prefer elements from the other collection over elements in the existing collection
if !otherIds.contains(self[currentIndex].id) {
result.append(self[currentIndex])
}
currentIndex = self.index(after: currentIndex)
} else {
result.append(otherSorted[otherIndex])
otherIndex = otherSorted.index(after: otherIndex)
}
}
// The other sorted array was exhausted, add remaining elements from the existing array
while currentIndex < endIndex {
// Prefer elements from the other collection over elements in the existing collection
if !otherIds.contains(self[currentIndex].id) {
result.append(self[currentIndex])
}
currentIndex = self.index(after: currentIndex)
}
// The existing sorted array was exhausted, add remaining elements from the other array
if otherIndex < otherSorted.endIndex {
result.append(contentsOf: otherSorted[otherIndex…])
}
return result
}
}

Here is an example of how to use it. The example involves inserting elements where some are duplicates, with one of the properties has changed.

struct Item: Identifiable {
let id: String
let date: Date
}
let referenceDate = Date(timeIntervalSince1970: 1711282131)
let original: [Item] = [
Item(id: "1", date: referenceDate.addingTimeInterval(1.0)),
Item(id: "2", date: referenceDate.addingTimeInterval(2.0)),
Item(id: "3", date: referenceDate.addingTimeInterval(3.0)),
Item(id: "4", date: referenceDate.addingTimeInterval(4.0)),
Item(id: "5", date: referenceDate.addingTimeInterval(5.0)),
]
let other: [Item] = [
Item(id: "3", date: referenceDate.addingTimeInterval(1.5)),
Item(id: "7", date: referenceDate.addingTimeInterval(2.5)),
Item(id: "4", date: referenceDate.addingTimeInterval(4.0)),
Item(id: "5", date: referenceDate.addingTimeInterval(5.5)),
Item(id: "6", date: referenceDate.addingTimeInterval(8.0)),
]
let result = original.sortedMerged(with: other, areInIncreasingOrder: { $0.date < $1.date })
result.forEach { item in
print("\(item.id) – \(item.date.timeIntervalSince(referenceDate))")
}
// 1 – 1.0
// 3 – 1.5
// 2 – 2.0
// 7 – 2.5
// 4 – 4.0
// 5 – 5.5
// 6 – 8.0

If this was helpful, please let me know on Mastodon@toomasvahter or Twitter @toomasvahter. Feel free to subscribe to RSS feed. Thank you for reading.