Categories
iOS Swift

Avoiding subtle mistake when guarding mutable state with DispatchQueue

Last week, I spent quite a bit of time on investigating an issue which sometimes happened, sometimes did not. There was quite a bit of code involved running on multiple threads, so tracking it down was not so simple. No surprise to find that this was a concurrency issue. The issue lied in the implementation of guarding a mutable state with DispatchQueue. The goal of the blog post is to remind us again a pattern which looks nice at first but actually can cause issues along the road.

Let’s have a look at an example where we have a Storage class which holds data in a dictionary where keys are IDs and values are Data instances. There are multiple ways for guarding the mutable state. In the example, we are using a concurrent DispatchQueue. Concurrent queues are not as optimized as serial queues, but the reasoning here is that we store large data blobs and concurrent reading gives us a slight benefit over serial reading. With concurrent queues we must make sure all the reading operations have finished before we mutate the shared state, and therefore we use the barrier flag which tells the queue to wait until all the enqueued tasks are finished.

final class Storage {
private let queue = DispatchQueue(label: "myexample", attributes: .concurrent)
private var _contents = [String: Data]()
private var contents: [String: Data] {
get {
queue.sync { _contents }
}
set {
queue.async(flags: .barrier) { self._contents = newValue }
}
}
func store(_ data: Data, forIdentifier id: String) {
contents[id] = data
}
// …
}
view raw Storage.swift hosted with ❤ by GitHub

The snippet above might look pretty nice at first, since all the logic around synchronization is in one place, and we can use the contents property in other functions without needing to think about using the queue. For validating that it works correctly, we can add a unit test.

func testThreadSafety() throws {
let iterations = 100
let storage = Storage()
DispatchQueue.concurrentPerform(iterations: iterations) { index in
storage.store(Data(), forIdentifier: "\(index)")
}
XCTAssertEqual(storage.numberOfItems, iterations)
}
view raw Test.swift hosted with ❤ by GitHub

The test fails because we actually have a problem in the Storage class. The problem is that contents[id] = data does two operations on the queue: firstly, reading the current state using the property getter and then setting the new modified dictionary with the setter. Let’s walk this through with an example where thread A calls the store function and tries to add a new key “d” and thread B calls the store function at the same time and tries to add a new key “e”. The flow might look something like this:

A calls the getter and gets an instance of the dictionary with keys “a, b, c”. Before the thread A calls the setter, thread B already had a chance to read the dictionary as well and gets the same keys “a, b, c”. Thread A reaches the point where it calls the setter and inserts modified dictionary with keys”a, b, c, d” and just after that the thread B does the same but tries to insert dictionary with keys “a, b, c, e”. When the queue ends processing all the work items, the key “d” is going to be lost, since the thread B managed to read the shared dictionary state before the thread A modified it. The morale of the story is that when modifying a shared state, we must make sure that reading the initial state and setting a new value must be synchronized and can’t happen as separate work items on the synchronizing queue. This happened here, since using the dictionaries subscript first runs the getter and then the setter.

The suggestion how to fix such issues is to use a single queue and making sure that read and write happen within the same work item.

func store(_ data: Data, forIdentifier id: String) {
// Incorrect because read and write happen in separate blocks on the queue
// contents[id] = data
// Correct
queue.async(flags: .barrier) {
self._contents[id] = data
}
}
view raw Fixed.swift hosted with ❤ by GitHub

An alternative approach to this Storage class’ implementation with new concurrency features in mind could be using the new actor type instead. But keep in mind that in that case we need to use await when accessing the storage since actors are part of the structured concurrency in Swift. Using the await keyword in turn requires having async context available, so it might not be straight-forward to adopt.

actor Storage {
private var contents = [String: Data]()
func store(_ data: Data, forIdentifier id: String) {
contents[id] = data
}
var numberOfItems: Int { contents.count }
}
// Example:
// await storage.store(data, forIdentifier: id)
view raw Actor.swift hosted with ❤ by GitHub

If this was helpful, please let me know on Mastodon@toomasvahter or Twitter @toomasvahter. Feel free to subscribe to RSS feed. Thank you for reading.

Categories
iOS Swift UIKit

Creating persistent data store on iOS

Storing data persistently on iOS is something what is needed quite often. In this post, we are going to look into how to build a persistent data store and how to store image data.

Initialising the persistent data store

Persistent data store is an object managing a folder on disk. It allows writing and reading data asynchronously.
Firstly, we need to create a folder where to store all the files. As every instance of the data store should manage its own folder, we will add an argument name to the initialiser. Then we can create a folder in user’s documents folder with that name. As writing and reading data is an expensive operation, we are going to offload the work to a concurrent DispatchQueue. Concurrent dispatch queue allows us to read multiple files at the same time (more about it a bit later).

final class PersistentDataStore {
let name: String
private let dataStoreURL: URL
private let queue: DispatchQueue
init(name: String) throws {
self.name = name
queue = DispatchQueue(label: "com.augmentedcode.persistentdatastore", qos: .userInitiated, attributes: .concurrent, autoreleaseFrequency: .workItem)
let documentsURL = try FileManager.default.url(for: .documentDirectory, in: .userDomainMask, appropriateFor: nil, create: false)
dataStoreURL = documentsURL.appendingPathComponent(name, isDirectory: true)
try FileManager.default.createDirectory(at: dataStoreURL, withIntermediateDirectories: true, attributes: nil)
}
}

Storing data asynchronously

Method for storing data on disk consists of closure, identifier and completion handler. This allows us to create a closure what transforms object to data. For example, it could transform UIImage to Data. Secondly, this transformation, possibly slow operation, can be offloaded to the same thread writing the data into a file. Using closure gives us a flexible API what we can extend with convenience methods.

typealias Identifier = String
enum Result {
case failed(Error)
case noData
case success(Identifier)
}
func storeData(_ dataProvider: @escaping () -> (Data?), identifier: Identifier = UUID().uuidString, completionHandler block: @escaping (Result) -> ()) {
queue.async(flags: .barrier) {
let url = self.url(forIdentifier: identifier)
guard let data = dataProvider(), !data.isEmpty else {
DispatchQueue.main.async {
block(.noData)
}
return
}
do {
try data.write(to: url, options: .atomic)
DispatchQueue.main.async {
block(.success(identifier))
}
}
catch {
DispatchQueue.main.async {
block(.failed(error))
}
}
}
}
// Example (adding data to data store with unique identifier):
persistentStore.storeData({ () -> (Data?) in
return image.jpegData(compressionQuality: 1.0)
}) { (result) in
switch result {
case .success(let identifier):
print("Stored data successfully with identifier \(identifier).")
case .noData:
print("No data to store.")
case .failed(let error):
print("Failed storing data with error \(error)")
}
}

Identifier is internally used as a filename and default implementation creates unique identifier. Therefore, when data store consumer would like to replace the current file, it can supply an identifier, otherwise new file is created.
Completion handler contains a Result enum type. Result enum consists of three cases: success, transformation failure and data writing failure. Success’ associated value is identifier, failure contains error object and transformation failure is equal to noData.
Important to note here is that the work item has barrier specified. Barrier means that when DispatchQueue starts to handle the work item, it will wait until all the previous work items have finished running. Meaning, we will never try to update a file on disk when some other request is busy reading it.

Loading data asynchronously

Load data is generic method allowing the data transformation closure to return a specific type (e.g. transforming Data to UIImage). Shortly, load data reads file from disk and transforms it into a different type. As transformation can be a lengthy task, it is yet again running on the background thread and will not cause any hiccups in the UI.

func loadData<T>(forIdentifier identifier: Identifier, dataTransformer: @escaping (Data) -> (T?), completionHandler block: @escaping (T?) -> ()) {
queue.async {
let url = self.url(forIdentifier: identifier)
guard FileManager.default.fileExists(atPath: url.path) else {
DispatchQueue.main.async {
block(nil)
}
return
}
do {
let data = try Data(contentsOf: url, options: .mappedIfSafe)
let object = dataTransformer(data)
DispatchQueue.main.async {
block(object)
}
}
catch {
print("Failed reading data at URL \(url).")
DispatchQueue.main.async {
block(nil)
}
}
}
}
// Example
persistentStore.loadData(forIdentifier: "my_identifier", dataTransformer: { UIImage(data: $0) }) { (image) in
guard let image = image else {
print("Failed loading image.")
return
}
print(image)
}

Removing data asynchronously

Removing a single file or all of the files is pretty straight-forward. As we are modifying files on disk, we will use barrier again and then FileManager’s removeItem(at:) together with contentsOfDirectory(at:includingPropertiesForKeys:options:).

func removeData(forIdentifier identifier: Identifier) {
queue.async(flags: .barrier) {
let url = self.url(forIdentifier: identifier)
guard FileManager.default.fileExists(atPath: url.path) else { return }
do {
try FileManager.default.removeItem(at: url)
}
catch {
print("Failed removing file at URL \(url) with error \(error).")
}
}
}
func removeAll() {
queue.async(flags: .barrier) {
do {
let urls = try FileManager.default.contentsOfDirectory(at: self.dataStoreURL, includingPropertiesForKeys: nil, options: [])
try urls.forEach({ try FileManager.default.removeItem(at: $0) })
}
catch {
print("Failed removing all files with error \(error).")
}
}
}

Extension for storing images

It is easy to extend the PersistentDataStore with convenience methods for storing a specific type of data. This allows us to hide the technical details of transforming image to data and vice-versa. Moreover, calling the method gets easier to read as data transformation closure is not visible anymore.

extension PersistentDataStore {
func loadImage(forIdentifier identifier: Identifier, completionHandler block: @escaping (UIImage?) -> (Void)) {
loadData(forIdentifier: identifier, dataTransformer: { UIImage(data: $0) }, completionHandler: block)
}
func storeImage(_ image: UIImage, identifier: String = UUID().uuidString, completionHandler handler: @escaping (Result) -> ()) {
storeData({ image.jpegData(compressionQuality: 1.0) }, identifier: identifier, completionHandler: handler)
}
}
// Examples:
persistentStore.storeImage(image) { (result) in
print(result)
}
persistentStore.loadImage(forIdentifier: "my_identifier") { (image) -> (Void) in
guard let image = image else {
print("Failed loading image.")
return
}
print(image)
}

Summary

We created a persistent data store what is performant and has a flexible API. API can be extended easily to support any other data transformation. In addition, it uses thread-safe techniques for making sure data never gets corrupted.

Playground

PersistentDataStore (GitHub) Xcode 10, Swift 4.2

References

DispatchQueues (Apple)
dispatch_barrier_async (Apple)