In the previous blog post Reading data from a file with DispatchIO I built a small FileReader which enabled reading data from a file for random byte ranges. Signal Path uses a similar file reader but in addition it also caches read data in some interactions where there are a lot of read requests active, for example, while recording. Therefore, in this blog post we’ll build a small file data cache which stores data chunks for byte ranges. The cache needs to be performant, otherwise it would be faster to just read the data from the file if it is slow.
Defining an interface for the cache
The interface of the FileMemoryCache needs to provide a way for retrieving data, storing data, and limiting the overall memory usage.
Storing and retrieving cached data
The strategy of caching is to use a sorted array where the array is sorted by the first index of a byte range. Items in the array are structs containing a byte range and data. Having a sorted array makes it easier to merge data chunks when retrieving data for a longer byte range than the individual array items provide. As the aim is to keep it as fast as possible then data is merged only when cache is accessed which reduces the need to copy memory multiple times. The downside is that the cache can store overlapping chunks of data which reduces the efficiency. But on the other hand it is fast. Sorted array also enables to use binary search if linear search happens to be too slow with large amount of cached data chunks.
Let’s start with the store method which takes a data and a byte range it corresponds to. For keeping the array sorted, we’ll use the first index of the byte range as the attribute for sorting. We’ll also make sure that the current item at the insertion index does not already contain the byte range we are trying to cache. If it does contain, we can safely ignore the current store request as the data is already cached. Additionally, we’ll check if the previous item in the array as it might also contain the byte range.
When it comes to retrieving data then we’ll need to find intersecting cached data chunks and merge them if needed. Therefore, the first step is to collect cached data chunks which intersect with the requested range. For keeping it as simple as possible, will just loop over the array without using binary search and collect the items which are intersecting. The next step after that is to make sure the whole requested byte range is already present in the cache. This can be checked quite easily with IndexSet which contains indexes for every requested bytes. By removing byte indexes in intersecting ranges, the index set becomes empty. It means that every byte index is cached.
We’ll need to copy intersecting byte indexes into a final contiguous data. It could be that the whole cached data chunk can be copied over or just a part of it. After figuring out which part of the cached data chunk to copy, the next step is to calculate the byte index in the resulting memory region which starts with the zero byte index. The implementation can be seen below which includes byte range index conversions.
Summary
We built a simple cache for storing data chunks for byte ranges. We used a sorted array for storing individual data chunks which made it easier to merge them into one when the cache is accessed. Make sure to check the full implementation on GitHub which also includes evicting cached data.
If this was helpful, please let me know on Mastodon@toomasvahter or Twitter @toomasvahter. Feel free to subscribe to RSS feed. Thank you for reading.
Example Project
FileReaderPlayground (Xcode 12.4)