Avoiding a subtle mistake when guarding mutable state with DispatchQueue

Last week, I spent quite a bit of time investigating an issue that happened only intermittently. Quite a bit of code running on multiple threads was involved, so tracking it down was not simple. Unsurprisingly, it turned out to be a concurrency issue. The issue lay in how a piece of mutable state was guarded with a DispatchQueue. The goal of this blog post is to remind us of a pattern which looks nice at first but can actually cause issues down the road.

Let’s have a look at an example where we have a Storage class which holds data in a dictionary where keys are IDs and values are Data instances. There are multiple ways of guarding mutable state. In the example, we are using a concurrent DispatchQueue. Concurrent queues are not as optimized as serial queues, but the reasoning here is that we store large data blobs and concurrent reading gives us a slight benefit over serial reading. With concurrent queues we must make sure all the reading operations have finished before we mutate the shared state, and therefore we use the barrier flag, which tells the queue to wait until all previously enqueued tasks are finished and then to run the write on its own.

	import Foundation

	final class Storage {
	    private let queue = DispatchQueue(label: "myexample", attributes: .concurrent)
	    private var _contents = [String: Data]()

	    private var contents: [String: Data] {
	        get {
	            // Reads may run concurrently with other reads
	            queue.sync { _contents }
	        }
	        set {
	            // The barrier waits for earlier work items before writing
	            queue.async(flags: .barrier) { self._contents = newValue }
	        }
	    }

	    func store(_ data: Data, forIdentifier id: String) {
	        contents[id] = data
	    }
	    // …
	}


The snippet above might look pretty nice at first, since all the logic around synchronization is in one place, and we can use the contents property in other functions without needing to think about the queue. To validate that it works correctly, we can add a unit test.

	func testThreadSafety() throws {
	    let iterations = 100
	    let storage = Storage()
	    DispatchQueue.concurrentPerform(iterations: iterations) { index in
	        storage.store(Data(), forIdentifier: "\(index)")
	    }
	    XCTAssertEqual(storage.numberOfItems, iterations)
	}


The test fails because we actually have a problem in the Storage class. The problem is that contents[id] = data performs two separate operations on the queue: first it reads the current state using the property getter, and then it sets the modified dictionary with the setter. Let’s walk this through with an example where thread A calls the store function and tries to add a new key “d” and thread B calls the store function at the same time and tries to add a new key “e”. The flow might look something like this:

Thread A calls the getter and gets an instance of the dictionary with keys “a, b, c”. Before thread A calls the setter, thread B has already had a chance to read the dictionary as well and gets the same keys “a, b, c”. Thread A then calls the setter and writes the modified dictionary with keys “a, b, c, d”, and just after that thread B does the same with keys “a, b, c, e”. When the queue has processed all the work items, the key “d” is lost, since thread B read the shared dictionary before thread A modified it.

The moral of the story is that when modifying shared state, reading the initial state and setting the new value must be synchronized together and can’t happen as separate work items on the synchronizing queue. That is what happened here, since using the dictionary’s subscript first runs the getter and then the setter.
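
The interleaving described above can be replayed deterministically on a single thread, which makes the lost update easier to see. This is only a sketch: RacyStorage is a hypothetical, reduced copy of the Storage class with contents made non-private and pre-filled with keys a, b and c. The @unchecked Sendable annotation is my addition so that the snippet also compiles with strict concurrency checking.

```swift
import Foundation

// Hypothetical reduced copy of Storage; every access still goes through the queue.
final class RacyStorage: @unchecked Sendable {
    private let queue = DispatchQueue(label: "myexample.racy", attributes: .concurrent)
    private var _contents: [String: Data] = ["a": Data(), "b": Data(), "c": Data()]

    var contents: [String: Data] {
        get { queue.sync { _contents } }
        set { queue.async(flags: .barrier) { self._contents = newValue } }
    }
}

let racyStorage = RacyStorage()

// `racyStorage.contents[id] = data` behaves like three steps:
//   var copy = racyStorage.contents   // getter: one work item
//   copy[id] = data                   // mutation of a local copy, off the queue
//   racyStorage.contents = copy       // setter: a second work item

// Replay of the flow from the text.
var copyA = racyStorage.contents       // "thread A" reads a, b, c
var copyB = racyStorage.contents       // "thread B" reads a, b, c before A writes
copyA["d"] = Data()
racyStorage.contents = copyA           // A writes a, b, c, d
copyB["e"] = Data()
racyStorage.contents = copyB           // B overwrites with a, b, c, e

// This sync read is enqueued after both barrier writes, so it sees the final state.
let finalKeys = racyStorage.contents.keys.sorted()
print(finalKeys) // ["a", "b", "c", "e"]
```

Nothing in this replay is unsafe at the memory level, since every access to the dictionary goes through the queue. The data is still wrong, because the queue only protects each work item and not the read-modify-write sequence as a whole.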

The way to fix such issues is to use a single queue and make sure that the read and the write happen within the same work item.

	func store(_ data: Data, forIdentifier id: String) {
	    // Incorrect because read and write happen in separate blocks on the queue
	    // contents[id] = data

	    // Correct
	    queue.async(flags: .barrier) {
	        self._contents[id] = data
	    }
	}
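
For completeness, here is a minimal sketch of how the whole class could look with the fix applied, together with the same check as in the unit test. The numberOfItems implementation and the data(forIdentifier:) reader are assumptions on my part, since the original snippet elides the rest of the class, and @unchecked Sendable is added only so that the sketch compiles with strict concurrency checking.

```swift
import Foundation

final class Storage: @unchecked Sendable {
    private let queue = DispatchQueue(label: "myexample", attributes: .concurrent)
    private var _contents = [String: Data]()

    // Read and write happen inside one barrier work item.
    func store(_ data: Data, forIdentifier id: String) {
        queue.async(flags: .barrier) {
            self._contents[id] = data
        }
    }

    // Hypothetical reader, not part of the original snippet.
    func data(forIdentifier id: String) -> Data? {
        queue.sync { _contents[id] }
    }

    // Assumed implementation of the property used by the unit test.
    var numberOfItems: Int {
        queue.sync { _contents.count }
    }
}

let iterations = 100
let storage = Storage()
DispatchQueue.concurrentPerform(iterations: iterations) { index in
    storage.store(Data(), forIdentifier: "\(index)")
}

// concurrentPerform returns after all stores are enqueued, and this sync read
// runs only after the barrier writes submitted before it have finished.
let storedCount = storage.numberOfItems
print(storedCount == iterations)
```

Readers still go through queue.sync, so they can run concurrently with each other, while every mutation is a single exclusive work item.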


An alternative approach to implementing this Storage class with the newer concurrency features in mind is to use an actor instead. Keep in mind that in that case we need to use await when accessing the storage from outside the actor, since actors are part of Swift’s concurrency model. Using the await keyword in turn requires an async context, so it might not be straightforward to adopt.

	actor Storage {
	    private var contents = [String: Data]()

	    func store(_ data: Data, forIdentifier id: String) {
	        contents[id] = data
	    }

	    var numberOfItems: Int { contents.count }
	}

	// Example:
	// await storage.store(data, forIdentifier: id)


If this was helpful, please let me know on Mastodon @toomasvahter or Twitter @toomasvahter. Feel free to subscribe to the RSS feed. Thank you for reading.
