Reading data from a file with DispatchIO

Signal Path is an app that works with large files, even ones several gigabytes in size. The app reads ranges from the file and visualizes the data. Signal Path relies heavily on DispatchIO for efficient access to data in a file. The aim of this blog post is to build a FileReader class that wraps DispatchIO and provides similar functionality.

DispatchIO manages a file descriptor and coordinates file accesses. A DispatchIO object can be created by providing a path to the file and specifying the channel type: stream or random access. In the context of this post, we are interested in random access, as we would like to read arbitrary ranges of bytes from a file. Therefore, let’s start by defining an interface for the FileReader.

final class FileReader {
    init(fileURL: URL)

    /// Opens the I/O channel with random access semantics. Returns false if opening fails.
    func open() -> Bool

    /// Closes the opened I/O channel.
    func close()

    /// Reads data at the given byte range.
    func read(byteRange: CountableRange<Int>, queue: DispatchQueue = .main, completionHandler: @escaping (DispatchData?) -> Void)
}
The interface for the class wrapping DispatchIO.

The interface is pretty straightforward, with functions to open and close the file and an asynchronous read function. DispatchIO returns the read data as DispatchData, which manages one or more regions of memory (not necessarily contiguous) and can be converted into Data if needed.
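As a rough illustration of the conversion, the sketch below builds a small DispatchData value by hand and copies its bytes into Data and into a plain array. The byte values and variable names are made up for the example; in the FileReader, the DispatchData value would come from the read callback instead.

```swift
import Foundation

// Build a DispatchData value from two separately created chunks.
let head: [UInt8] = [0x01, 0x02]
let tail: [UInt8] = [0x03, 0x04]
var dispatchData = head.withUnsafeBytes { DispatchData(bytes: $0) }
dispatchData.append(tail.withUnsafeBytes { DispatchData(bytes: $0) })

// DispatchData is a collection of UInt8, so its bytes can be copied
// into Foundation's Data or into a plain array.
let data = Data(dispatchData)
let bytes = [UInt8](dispatchData)
print(data.count, bytes)
```

Both conversions copy the bytes, so for very large reads it may be preferable to keep working with DispatchData directly.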

We can use the DispatchIO initialiser that takes a path argument when dealing with file paths. Note that the queue parameter on DispatchIO only specifies which queue is used for the cleanup handler; in most cases .main will suffice. After creating the channel, we can additionally control whether data is returned once or partially with multiple handler callbacks. In our case, we would like to get a single callback, and therefore we need to set the low-water mark to Int.max. Then the read data is delivered with a single callback.

func open() -> Bool {
    guard channel == nil else { return true }
    guard let path = (fileURL.path as NSString).utf8String else { return false }
    // O_RDONLY opens the file for reading only; mode is ignored unless the file is created
    channel = DispatchIO(type: .random, path: path, oflag: O_RDONLY, mode: 0, queue: .main, cleanupHandler: { error in
        print("Closed a channel with status: \(error)")
    })
    guard let channel = channel else { return false }
    // Load the whole requested byte range at once
    channel.setLimit(lowWater: .max)
    print("Opened a channel at \(fileURL)")
    return true
}

func close() {
    channel?.close()
    channel = nil
}

Reading from a file requires having an opened channel and defining a byte range.

func read(byteRange: CountableRange<Int>, queue: DispatchQueue = .main, completionHandler: @escaping (DispatchData?) -> Void) {
    if let channel = channel {
        channel.read(offset: off_t(byteRange.startIndex), length: byteRange.count, queue: queue, ioHandler: { done, data, error in
            // done: whether the read has finished; error: 0 on success, otherwise an errno value
            print(done, data?.count ?? -1, error)
            completionHandler(data)
        })
    }
    else {
        print("Channel is closed")
        completionHandler(nil)
    }
}
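The read function above forwards every ioHandler invocation to the completion handler. If the handler does fire more than once for a single read (for example, when the low-water mark is left at its default), one way to still deliver a single result is to accumulate the chunks until done is true. The following is a minimal sketch under that assumption, not the post's implementation: readAll, ChunkBuffer, the queue label, and the temporary file are hypothetical names introduced for illustration.

```swift
import Foundation

/// Hypothetical box for collecting chunks across ioHandler invocations.
final class ChunkBuffer {
    var data = DispatchData.empty
}

/// Hypothetical helper: reads `length` bytes starting at `offset` and calls
/// `completion` exactly once, after the channel reports `done`.
func readAll(from channel: DispatchIO, offset: Int, length: Int,
             queue: DispatchQueue, completion: @escaping (DispatchData?) -> Void) {
    let buffer = ChunkBuffer()
    channel.read(offset: off_t(offset), length: length, queue: queue) { done, chunk, error in
        if let chunk = chunk {
            buffer.data.append(chunk)
        }
        guard done else { return } // more invocations will follow
        completion(error == 0 ? buffer.data : nil)
    }
}

// Usage: write 100 known bytes to a temporary file, then read bytes 10..<15.
let sampleBytes: [UInt8] = (0..<100).map { UInt8($0) }
let sampleURL = FileManager.default.temporaryDirectory.appendingPathComponent(UUID().uuidString)
try Data(sampleBytes).write(to: sampleURL)

let ioQueue = DispatchQueue(label: "example.filereader")
let sampleChannel: DispatchIO? = DispatchIO(type: .random, path: sampleURL.path, oflag: O_RDONLY, mode: 0,
                                            queue: ioQueue, cleanupHandler: { _ in })
let semaphore = DispatchSemaphore(value: 0)
var readBytes: [UInt8] = []
if let channel = sampleChannel {
    readAll(from: channel, offset: 10, length: 5, queue: ioQueue) { data in
        if let data = data {
            readBytes = [UInt8](data)
        }
        semaphore.signal()
    }
    semaphore.wait() // block only for the sake of this command-line example
    channel.close()
}
try? FileManager.default.removeItem(at: sampleURL)
print(readBytes)
```

The semaphore is there only so the script waits for the asynchronous read; in an app, the completion handler would simply be dispatched to the caller's queue.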

And the file reader can be used like this:

let fileURL = Bundle.main.url(forResource: "DataFile", withExtension: nil)!
let reader = FileReader(fileURL: fileURL)
if reader.open() {
    reader.read(byteRange: 0..<20) { data in
        if let data = data {
            print("Read bytes: \(data.map({ UInt8($0) }))")
        }
        else {
            print("Failed to read data")
        }
    }
}
else {
    print("Failed to open")
}

Summary

DispatchIO provides an efficient way of accessing raw bytes in a file. Wrapping it in a FileReader class gives us a compact interface for working with file data. Please check out the playground, which contains the full implementation and the example: FileReaderPlayground.

If this was helpful, please let me know on Mastodon @toomasvahter or Twitter @toomasvahter. Feel free to subscribe to the RSS feed. Thank you for reading.

Example Project

FileReaderPlayground (Xcode 12.4)

Creating persistent data store on iOS

Storing data persistently on iOS is something that is needed quite often. In this post, we are going to look at how to build a persistent data store and how to store image data.

Initialising the persistent data store

The persistent data store is an object managing a folder on disk. It allows writing and reading data asynchronously.
First, we need to create a folder in which to store all the files. As every instance of the data store should manage its own folder, we will add a name argument to the initialiser. Then we can create a folder with that name in the user’s documents folder. As writing and reading data is an expensive operation, we are going to offload the work to a concurrent DispatchQueue. A concurrent dispatch queue allows us to read multiple files at the same time (more about this a bit later).

final class PersistentDataStore {
    let name: String
    private let dataStoreURL: URL
    private let queue: DispatchQueue

    init(name: String) throws {
        self.name = name
        queue = DispatchQueue(label: "com.augmentedcode.persistentdatastore", qos: .userInitiated, attributes: .concurrent, autoreleaseFrequency: .workItem)
        let documentsURL = try FileManager.default.url(for: .documentDirectory, in: .userDomainMask, appropriateFor: nil, create: false)
        dataStoreURL = documentsURL.appendingPathComponent(name, isDirectory: true)
        try FileManager.default.createDirectory(at: dataStoreURL, withIntermediateDirectories: true, attributes: nil)
    }
}

Storing data asynchronously

The method for storing data on disk takes a closure, an identifier, and a completion handler. The closure transforms an object into data; for example, it could transform a UIImage into Data. Because this transformation can be slow, it runs on the same background queue that writes the data to a file. Using a closure gives us a flexible API that we can extend with convenience methods.

typealias Identifier = String

enum Result {
    case failed(Error)
    case noData
    case success(Identifier)
}

func storeData(_ dataProvider: @escaping () -> (Data?), identifier: Identifier = UUID().uuidString, completionHandler block: @escaping (Result) -> ()) {
    queue.async(flags: .barrier) {
        let url = self.url(forIdentifier: identifier)
        guard let data = dataProvider(), !data.isEmpty else {
            DispatchQueue.main.async {
                block(.noData)
            }
            return
        }
        do {
            try data.write(to: url, options: .atomic)
            DispatchQueue.main.async {
                block(.success(identifier))
            }
        }
        catch {
            DispatchQueue.main.async {
                block(.failed(error))
            }
        }
    }
}

// Example (adding data to data store with unique identifier):
persistentStore.storeData({ () -> (Data?) in
    return image.jpegData(compressionQuality: 1.0)
}) { (result) in
    switch result {
    case .success(let identifier):
        print("Stored data successfully with identifier \(identifier).")
    case .noData:
        print("No data to store.")
    case .failed(let error):
        print("Failed storing data with error \(error)")
    }
}

The identifier is used internally as a filename, and the default argument creates a unique identifier. Therefore, when a data store consumer would like to replace an existing file, it can supply that file’s identifier; otherwise, a new file is created.
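The listings call url(forIdentifier:), which is not shown in this post. A minimal sketch of what such a helper might look like, assuming the identifier is simply appended to the store’s folder URL as a file name. It is written here as a free function with an explicit folder parameter (a hypothetical signature) so that it runs on its own; inside PersistentDataStore it would presumably be a private method using dataStoreURL.

```swift
import Foundation

/// Hypothetical stand-in for the store's url(forIdentifier:) helper:
/// maps an identifier to a file URL inside the given folder.
func url(forIdentifier identifier: String, in folderURL: URL) -> URL {
    return folderURL.appendingPathComponent(identifier, isDirectory: false)
}

// Usage with a made-up folder name.
let folderURL = FileManager.default.temporaryDirectory.appendingPathComponent("ExampleStore", isDirectory: true)
let fileURL = url(forIdentifier: "my_identifier", in: folderURL)
print(fileURL.lastPathComponent)
```

With this mapping, storing twice under the same identifier writes to the same path, which is what makes replacing a file possible.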
The completion handler receives a Result enum. The Result enum has three cases: success, noData (the transformation produced no data), and failed (writing the data failed). The associated value of success is the identifier, and failed carries the error object.
It is important to note that the work item is submitted with the barrier flag. A barrier means that when the DispatchQueue starts to handle the work item, it first waits until all previously submitted work items have finished running, and work items submitted after it do not start until it has finished. In other words, we never update a file on disk while another request is busy reading it.
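The ordering a barrier gives on a concurrent queue can be observed with a small stand-alone sketch. The queue label, the EventLog class, and the sleep durations are made up for the example; no files are involved, and the "reads" and "write" are just log entries.

```swift
import Foundation

/// Hypothetical thread-safe log used to record the order in which work items finish.
final class EventLog {
    private let lock = NSLock()
    private var storage: [String] = []

    func add(_ event: String) {
        lock.lock()
        storage.append(event)
        lock.unlock()
    }

    var events: [String] {
        lock.lock()
        defer { lock.unlock() }
        return storage
    }
}

let log = EventLog()
let storeQueue = DispatchQueue(label: "example.barrier", attributes: .concurrent)
let group = DispatchGroup()

// Two slow "reads" that may run at the same time.
for index in 1...2 {
    storeQueue.async(group: group) {
        Thread.sleep(forTimeInterval: 0.05)
        log.add("read \(index) finished")
    }
}

// The "write" is a barrier: it starts only after both reads have finished.
storeQueue.async(group: group, flags: .barrier) {
    Thread.sleep(forTimeInterval: 0.05)
    log.add("write finished")
}

// A "read" submitted after the barrier starts only once the write is done.
storeQueue.async(group: group) {
    log.add("read 3 finished")
}

group.wait()
print(log.events)
```

The relative order of the first two reads can vary from run to run, but the write always lands after both of them and before the third read.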

Loading data asynchronously

Load data is a generic method allowing the data transformation closure to return a specific type (e.g. transforming Data to UIImage). In short, load data reads a file from disk and transforms it into a different type. As the transformation can be a lengthy task, it again runs on the background queue and will not cause any hiccups in the UI.

func loadData<T>(forIdentifier identifier: Identifier, dataTransformer: @escaping (Data) -> (T?), completionHandler block: @escaping (T?) -> ()) {
    queue.async {
        let url = self.url(forIdentifier: identifier)
        guard FileManager.default.fileExists(atPath: url.path) else {
            DispatchQueue.main.async {
                block(nil)
            }
            return
        }
        do {
            let data = try Data(contentsOf: url, options: .mappedIfSafe)
            let object = dataTransformer(data)
            DispatchQueue.main.async {
                block(object)
            }
        }
        catch {
            print("Failed reading data at URL \(url).")
            DispatchQueue.main.async {
                block(nil)
            }
        }
    }
}

// Example
persistentStore.loadData(forIdentifier: "my_identifier", dataTransformer: { UIImage(data: $0) }) { (image) in
    guard let image = image else {
        print("Failed loading image.")
        return
    }
    print(image)
}

Removing data asynchronously

Removing a single file or all of the files is pretty straightforward. As we are modifying files on disk, we will use a barrier again, together with FileManager’s removeItem(at:) and contentsOfDirectory(at:includingPropertiesForKeys:options:).

func removeData(forIdentifier identifier: Identifier) {
    queue.async(flags: .barrier) {
        let url = self.url(forIdentifier: identifier)
        guard FileManager.default.fileExists(atPath: url.path) else { return }
        do {
            try FileManager.default.removeItem(at: url)
        }
        catch {
            print("Failed removing file at URL \(url) with error \(error).")
        }
    }
}

func removeAll() {
    queue.async(flags: .barrier) {
        do {
            let urls = try FileManager.default.contentsOfDirectory(at: self.dataStoreURL, includingPropertiesForKeys: nil, options: [])
            try urls.forEach({ try FileManager.default.removeItem(at: $0) })
        }
        catch {
            print("Failed removing all files with error \(error).")
        }
    }
}

Extension for storing images

It is easy to extend the PersistentDataStore with convenience methods for storing a specific type of data. This allows us to hide the technical details of transforming an image to data and vice versa. Moreover, the call site becomes easier to read, as the data transformation closure is no longer visible.

extension PersistentDataStore {
    func loadImage(forIdentifier identifier: Identifier, completionHandler block: @escaping (UIImage?) -> (Void)) {
        loadData(forIdentifier: identifier, dataTransformer: { UIImage(data: $0) }, completionHandler: block)
    }

    func storeImage(_ image: UIImage, identifier: String = UUID().uuidString, completionHandler handler: @escaping (Result) -> ()) {
        storeData({ image.jpegData(compressionQuality: 1.0) }, identifier: identifier, completionHandler: handler)
    }
}

// Examples:
persistentStore.storeImage(image) { (result) in
    print(result)
}
persistentStore.loadImage(forIdentifier: "my_identifier") { (image) -> (Void) in
    guard let image = image else {
        print("Failed loading image.")
        return
    }
    print(image)
}
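The same pattern should carry over to other types. As a sketch, the closures below have the shapes that storeData and loadData expect, () -> Data? and (Data) -> T?, but for a Codable value encoded as JSON. Note, makeDataProvider, and makeDataTransformer are hypothetical names for this illustration, and the example round-trips the value in memory rather than through the store.

```swift
import Foundation

// Hypothetical model type used only for this illustration.
struct Note: Codable, Equatable {
    let title: String
    let body: String
}

/// Builds a closure with the shape storeData expects: () -> Data?
func makeDataProvider<T: Encodable>(for value: T) -> () -> Data? {
    return { try? JSONEncoder().encode(value) }
}

/// Builds a closure with the shape loadData expects: (Data) -> T?
func makeDataTransformer<T: Decodable>(for type: T.Type) -> (Data) -> T? {
    return { try? JSONDecoder().decode(type, from: $0) }
}

// Round trip in memory: encode the note, then decode it again.
let note = Note(title: "Groceries", body: "Milk, eggs")
let provider = makeDataProvider(for: note)
let transformer = makeDataTransformer(for: Note.self)
let restored = provider().flatMap(transformer)
print(restored == note)
```

Convenience methods analogous to storeImage and loadImage could then pass these closures to storeData and loadData.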

Summary

We created a persistent data store that is performant and has a flexible API. The API can be extended easily to support any other data transformation. In addition, it uses a concurrent queue with barriers so that a file is never written or removed while it is being read.

Playground

PersistentDataStore (GitHub) Xcode 10, Swift 4.2