Building a memory cache for a file reader in Swift

In the previous blog post Reading data from a file with DispatchIO I built a small FileReader which enabled reading data from a file for random byte ranges. Signal Path uses a similar file reader but in addition it also caches read data in some interactions where there are a lot of read requests active, for example, while recording. Therefore, in this blog post we’ll build a small file data cache which stores data chunks for byte ranges. The cache needs to be performant, otherwise it would be faster to just read the data from the file if it is slow.

Defining an interface for the cache

The interface of the FileMemoryCache needs to provide a way for retrieving data, storing data, and limiting the overall memory usage.

final class FileMemoryCache {
init(byteCountLimit: Int)
/// Returns data for the byte range
func data(for byteRange: CountableRange<Int>) -> DispatchData?
/// Caches data for the byte range and evicts older data when exceeding byte count limit
func set(_ data: DispatchData, byteRange: CountableRange<Int>)
/// Removes everything from the cache
func removeAll()
}

Storing and retrieving cached data

The strategy of caching is to use a sorted array where the array is sorted by the first index of a byte range. Items in the array are structs containing a byte range and data. Having a sorted array makes it easier to merge data chunks when retrieving data for a longer byte range than the individual array items provide. As the aim is to keep it as fast as possible then data is merged only when cache is accessed which reduces the need to copy memory multiple times. The downside is that the cache can store overlapping chunks of data which reduces the efficiency. But on the other hand it is fast. Sorted array also enables to use binary search if linear search happens to be too slow with large amount of cached data chunks.

final class FileMemoryCache {
private var sortedItems = [StorageItem]()
struct StorageItem {
let byteRange: CountableRange<Int>
let data: DispatchData
}
}

Let’s start with the store method which takes a data and a byte range it corresponds to. For keeping the array sorted, we’ll use the first index of the byte range as the attribute for sorting. We’ll also make sure that the current item at the insertion index does not already contain the byte range we are trying to cache. If it does contain, we can safely ignore the current store request as the data is already cached. Additionally, we’ll check if the previous item in the array as it might also contain the byte range.

func set(_ data: DispatchData, byteRange: CountableRange<Int>) {
if let insertionIndex = sortedItems.firstIndex(where: { $0.byteRange.startIndex >= byteRange.startIndex }) {
let canDiscard: Bool = {
guard !sortedItems[insertionIndex].byteRange.contains(byteRange) else { return true }
guard insertionIndex > sortedItems.startIndex else { return false }
return sortedItems[insertionIndex.advanced(by: -1)].byteRange.contains(byteRange)
}()
if !canDiscard {
sortedItems.insert(StorageItem(byteRange: byteRange, data: data), at: insertionIndex)
}
}
else {
sortedItems.append(StorageItem(byteRange: byteRange, data: data))
}
}

When it comes to retrieving data then we’ll need to find intersecting cached data chunks and merge them if needed. Therefore, the first step is to collect cached data chunks which intersect with the requested range. For keeping it as simple as possible, will just loop over the array without using binary search and collect the items which are intersecting. The next step after that is to make sure the whole requested byte range is already present in the cache. This can be checked quite easily with IndexSet which contains indexes for every requested bytes. By removing byte indexes in intersecting ranges, the index set becomes empty. It means that every byte index is cached.

func data(for byteRange: CountableRange<Int>) -> DispatchData? {
// Find items which intersect with the search range
guard let firstIntersectingItemIndex = sortedItems.firstIndex(where: { byteRange.intersects($0.byteRange) }) else { return nil }
let intersectingItems = sortedItems[firstIntersectingItemIndex].prefix(while: { byteRange.intersects($0.byteRange) })
// Check if the whole range is covered
var byteIndexesInCache = IndexSet(integersIn: byteRange)
intersectingItems.forEach({ byteIndexesInCache.remove(integersIn: $0.byteRange) })
guard byteIndexesInCache.isEmpty else { return nil }
//
}

We’ll need to copy intersecting byte indexes into a final contiguous data. It could be that the whole cached data chunk can be copied over or just a part of it. After figuring out which part of the cached data chunk to copy, the next step is to calculate the byte index in the resulting memory region which starts with the zero byte index. The implementation can be seen below which includes byte range index conversions.

func data(for byteRange: CountableRange<Int>) -> DispatchData? {
//
guard let byteRangePointer = malloc(byteRange.count) else { return nil }
for storageItem in intersectingItems {
let rangeToCopyInData: CountableRange<Int> = {
switch (byteRange.contains(storageItem.byteRange.startIndex), byteRange.contains(storageItem.byteRange.endIndex 1)) {
case (true, true):
// Everything: |+++++++++++++|
return 0..<storageItem.byteRange.count
case (false, true):
// End: |—++++++++++|
return byteRange.startIndex storageItem.byteRange.startIndex..<storageItem.byteRange.count
case (true, false):
// Start: |++++++++++—|
return 0..<byteRange.endIndex storageItem.byteRange.startIndex
case (false, false):
// Middle: |—+++++++—|
return byteRange.startIndex storageItem.byteRange.startIndex..<byteRange.endIndex storageItem.byteRange.startIndex
}
}()
// Find the byte range in the allocated memory range where to copy the cached data section
let destinationByteIndex = storageItem.byteRange.startIndex byteRange.startIndex + rangeToCopyInData.startIndex
let destinationPointer = byteRangePointer.advanced(by: destinationByteIndex)
let destinationBufferPointer = UnsafeMutableRawBufferPointer(start: destinationPointer, count: rangeToCopyInData.count)
storageItem.data.copyBytes(to: destinationBufferPointer, from: rangeToCopyInData)
// Ignore remaining intersections because everything is copied
if destinationByteIndex + rangeToCopyInData.count >= byteRange.endIndex {
break
}
}
let byteRangeBufferPointer = UnsafeRawBufferPointer(start: byteRangePointer, count: byteRange.count)
return DispatchData(bytesNoCopy: byteRangeBufferPointer, deallocator: .free)
}

Summary

We built a simple cache for storing data chunks for byte ranges. We used a sorted array for storing individual data chunks which made it easier to merge them into one when the cache is accessed. Make sure to check the full implementation on GitHub which also includes evicting cached data.

If this was helpful, please let me know on Twitter @toomasvahter. Feel free to subscribe to RSS feed. Thank you for reading.

Example Project

FileReaderPlayground (Xcode 12.4)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s