Lock-based programming may not be the best approach to building large concurrent programs.
March 01, 2005
URL:http://www.drdobbs.com/cpp/the-trouble-with-locks/184401930
In my most recent articles [1, 2], I presented reasons why concurrency (for example, multithreading) will be the next revolution in the way we develop software — a sea change of the same order as the object-oriented revolution. I also stated that "the vast majority of programmers today don't grok concurrency, just as the vast majority of programmers 15 years ago didn't yet grok objects."
In this column, I'd like to consider just one question that several people wrote to ask, namely: "Is concurrency really that hard?" In particular, a few readers felt that lock-based programming is well understood; it is, after all, the status quo mainstream solution to concurrency control.
So, "is concurrency really that hard?" My short answer is this:
Unfortunately, today's reality is that only thoughtful experts can write explicitly concurrent programs that are correct and efficient. This is because today's programming models for concurrency are subtle, intricate, and fraught with pitfalls that easily (and frequently) result in unforeseen races (i.e., program corruption), deadlocks (i.e., program lockup), and performance cliffs (e.g., priority inversion, convoying, and sometimes complete loss of parallelism and/or even worse performance than a single-threaded program). And even when a correct and efficient concurrent program is written, it takes great care to maintain: it's usually brittle and difficult to maintain correctly, because current programming models set a very high bar of expertise required to reason reliably about the operation of concurrent programs, so that apparently innocuous changes to a working concurrent program can (and commonly do, in practice) render it entirely or intermittently nonworking in unintended and unexpected ways. Because getting it right and keeping it right is so difficult, at many major software companies there is a veritable priesthood of gurus who write and maintain the core concurrent code.
Some people think I'm overstating this, so let me amplify. In this article, I'll focus on just the narrow question of how to write a lock-based program correctly, meaning that it works (avoids data corruption) and doesn't hang (avoids deadlock and livelock). That's pretty much the minimum requirement to write a program that runs at all.
Question: Is the following code thread-safe? If it is, why is it safe? If it isn't always, under what conditions is it thread-safe?
T Add( T& a, T& b ) {
    T result;
    // ... read a and b and set result to proper values ...
    return result;
}
There are a lot of possibilities here. Let's consider some of the major ones.
Assume that reading a T object isn't an atomic operation. Then, if a and/or b are accessible from another thread, we have a classic race condition: While we are reading the values of a and/or b, some other thread might be changing those objects, resulting in blowing up the program (if you're lucky, say, by causing the object to follow an internal pointer some other thread just deleted) or reading corrupt values.
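To make "reading a T isn't atomic" concrete, here is one possible T (the fields and the invariant are illustrative assumptions, not from the article) whose two fields must change together; a concurrent reader that interleaves between the two writes observes a torn, inconsistent value:

```cpp
#include <cassert>

// Hypothetical T: the invariant is that hi == lo + 1 at all times.
struct T {
    int lo = 0;
    int hi = 1;
};

void Update(T& t, int v) {
    t.lo = v;       // a concurrent reader arriving here sees the new lo...
    t.hi = v + 1;   // ...but the stale hi: a classic torn read
}

bool Consistent(const T& t) { return t.hi == t.lo + 1; }
```

Single-threaded, the invariant always holds after Update returns; only unsynchronized concurrent access can observe the intermediate state between the two stores.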
How would you solve that? Please stop and think about it before reading on...
Ready? Okay: Now please stop a little longer and think about your solution some more, and consider whether it might have any further holes, before reading on...
Now that you've thought about it deeply, let's consider some alternatives.
Today's typical lock-based approach is to acquire locks so that uses of a and b on one thread won't interleave. Typically, this is done by acquiring a lock on an explicit synchronization object (a mutex, for instance) that covers both objects, or by acquiring locks on an implicit mutex associated with the objects themselves. To acquire a lock that covers both objects, Add has to know what that lock is, either because a and b and their lock are globals (but then why pass a and b as parameters?) or because the caller acquires the lock outside of Add (which is usually preferable). To acquire a lock on each object individually, we could write:
T SharedAdd( T& a, T& b ) {
    T result;
    lock locka( a );    // lock is a helper whose constructor acquires a lock
    lock lockb( b );    // and whose destructor releases the lock
    // ... read a and b and set result to proper values ...
    return result;
}                       // release the locks
I renamed the function because probably most T objects are never shared, and we don't want to incur the cost of the lock on most Add operations.
Aside: In some cases, we might even be more efficient and have a reader-writer lock that allows multiple concurrent readers, but to do that we have to guarantee at least that the readers won't physically change the object, including any internal data indirectly reachable by the object; this is a much stronger requirement than just plain old const because const is logical and shallow, not physical and deep.
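That aside can be sketched with C++17's std::shared_mutex (the struct and member names here are assumptions for illustration, not the article's hypothetical lock helper): readers take shared locks, so multiple reading calls can proceed concurrently, while any writer takes an exclusive lock.

```cpp
#include <mutex>
#include <shared_mutex>

// Hypothetical shared object: a value plus its own reader-writer lock.
struct Shared {
    int value = 0;
    mutable std::shared_mutex rw;
};

int SharedAdd(const Shared& a, const Shared& b) {
    // Shared (reader) locks: many readers may hold these concurrently.
    // The lock-ordering caveats discussed in the article still apply.
    std::shared_lock la(a.rw);
    std::shared_lock lb(b.rw);
    return a.value + b.value;   // physically read-only access
}

void Set(Shared& s, int v) {
    std::unique_lock lk(s.rw);  // exclusive (writer) lock
    s.value = v;
}
```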
Is this enough to make the function thread-safe? No. For one thing, there's no way to guarantee that all code that uses a and b will respect the lock. There has to be an enforced discipline to ensure everyone plays by the locking rules to take a lock (and it has to be the same lock) before touching a or b.
So let's assume you have a nice baseball bat (or Nerf gun) and can go around your office and force everyone on your team to play by the rules, so that nobody ever forgets to acquire the right locks when touching a or b. Now are we thread-safe? No, for a completely different reason: SharedAdd as written above is a recipe for deadlock. For a simple example, consider the following code that calls SharedAdd:
T global1, global2;

// on thread A
SharedAdd( global1, global2 );

// on thread B
SharedAdd( global2, global1 );
One fine day, thread A acquires the lock on global1, gets interrupted by (or runs concurrently with) thread B, which acquires the lock on global2, and then both wait forever for each other's locks. Not a great place to be.
I'm sure a good number of readers saw this one coming; good for you. The next question is, how would you fix this? One approach would be to recognize that we're okay as long as the locks are always acquired in the same order, and smart folks like Andrei have published helper types that make it easier to automate that. Let's roll our own: Let's have SharedAdd harden itself against this by consistently taking the locks in a given global order, say by the objects' addresses; and we'll even be smart and remember that the Standard doesn't allow direct comparisons between pointers to arbitrary objects, but does allow them using the std::less library helper:
T SharedAdd( T& a, T& b ) {
    T result;
    lock lock1( less<T*>()( &a, &b ) ? a : b );
    lock lock2( less<T*>()( &a, &b ) ? b : a );
    // ... read a and b and set result to proper values ...
    return result;
}   // release the locks
Now the problematic code we showed earlier is fine, because regardless of the order in which global1 and global2 are passed in, we will always lock the one having the lower address first, and so the deadlock cannot happen...right?
True, it cannot happen in that example. But any other code in the world might have acquired a lock on whichever one has the higher address, then called SharedAdd. Ouch. And it could be code we don't control, such as third-party libraries we only have binaries to. (And it could be code that hasn't been written yet and won't exist until a few years from now, such as new features or new overrides of virtual functions. More on this in a moment.)
But let's keep pushing on the lock-based approaches for a moment. Let's assume that every piece of code in our universe is under our control and all of it always strictly plays fair and never gets the locking order wrong, not even by mistake. Quite an assumption, but let's assume.
Is that enough to be thread-safe? Not in general, for the simple reason that this approach only works if the only locks you ever acquire are acquired not only in the same order but at the same time, so that some code somewhere knows what all the locks in use at a given time will be and can acquire them at once. Put another way, locking operations can't safely nest. That's a problem, because it basically rules out safe locking in layered applications and libraries, which covers most of the world's significant software.
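"Acquire them at once" is what the later C++ standard library's std::lock (C++11) and std::scoped_lock (C++17) do: handed all the mutexes up front, they lock them using a deadlock-avoidance algorithm. A sketch, assuming each object carries its own mutex (the names are illustrative):

```cpp
#include <mutex>

// Hypothetical object that carries its own mutex.
struct Obj {
    int value = 0;
    std::mutex m;
};

int SharedAdd(Obj& a, Obj& b) {
    // scoped_lock acquires both mutexes together using a built-in
    // deadlock-avoidance algorithm, so argument order doesn't matter...
    std::scoped_lock guard(a.m, b.m);
    return a.value + b.value;
}
// ...but this helps only when all the locks are known at one call site;
// it cannot rescue a caller that already holds one of the locks.
```

Note that this addresses only the single-call-site case; it does not make locking compose across layers, which is the point that follows.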
To make this more concrete, consider: What does SharedAdd have to do to calculate the new values? For example, in the "read a and b and set result to proper values" code, can it call (possibly virtual) member functions of a or b? Can it call helper functions? And does whatever it calls, and all the functions those things call, and so on, ever try to acquire any locks? If even one part of the code that gets executed inside the a and b locks caused us to call a function somewhere that could try to acquire a lock, we're open to deadlock again, even if every function everywhere in the code globally respected the discipline to acquire locks in order.
Consider: What if during "read a and b and set result to proper values" we end up calling a function that tries to acquire a lock on an object c whose address is between a's and b's? That, too, is a recipe for deadlock, because if we're doing that while some other thread already holds the lock for c, and then that thread tries to lock a or b (say, by calling SharedAdd(a,b)), we're toast. Kaput.
This demonstrates the fundamental issue that lock-based synchronization isn't composable, meaning that you can't take two concurrency-correct lock-based pieces of code and glue them together into a bigger concurrency-correct piece of code. Building a large system using lock-based programming is like building a house using bricks that are made out of different kinds of volatile materials that are stable in isolation but not in all combinations, so that they often work well and stack nicely but occasionally they cause violent explosions when the wrong two bricks happen to touch.
That's why calling into any unknown code while holding a lock is a recipe for deadlock.
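One defensive pattern follows directly from this: never hold a lock across the call into unknown code. Instead, copy what you need while holding the lock, release it, and only then call out. A sketch (the callback and all names are hypothetical):

```cpp
#include <functional>
#include <mutex>

std::mutex m;
int shared_value = 42;

int NotifyObserver(const std::function<int(int)>& callback) {
    int snapshot;
    {
        std::lock_guard<std::mutex> guard(m);
        snapshot = shared_value;   // touch shared state only under the lock
    }   // lock released here
    // The unknown code runs outside our lock: even if it tries to
    // acquire m (or any other lock), this path cannot deadlock on m.
    return callback(snapshot);
}
```

The trade-off is that the callback sees a snapshot that may be stale by the time it runs, which is often acceptable and always better than deadlock.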
Let me share one data point that illustrates both that lock-based programming is hard and that very few people are actually doing much of it: Many shipping frameworks, even very popular ones by very savvy companies, do dangerous stuff that they're just getting away with because developers aren't writing much lock-based code. Here's an example pointed out to me by Chris Brumme [5]: Look in just about any framework, even ones that are designed for concurrency, including the Java libraries and the .NET Framework, and you will find code that calls a virtual function while holding a lock, which is a recipe for deadlock. (Think about it: Deadlock can happen anytime you hold a lock and try to acquire another one, unless you can rule out that another thread might try to acquire the locks in the other order. So if you're holding a lock and then call a virtual function that could or should be overridden by code outside your framework, which by definition means you're calling out into unknown and unknowable black-box code... One shudders.) And the folks who write those libraries are smart people. Locking is the status quo, but it's not good enough for widespread use.
There are other lock-based solutions we haven't talked about. One is to use monitors, or objects that lock themselves so as to guarantee that all member functions are synchronized with respect to each other, so that no two threads can be executing member functions on the same object at the same time. Examples in the field include Java's Vector and the .NET Framework's SyncHashTable.
Fundamentally, just the fact that monitors use locks shows that they're open to the same issues as all uses of locks, including deadlock if their implementations ever call into unknown code while holding the object's lock. And such internally locked objects have their own additional pitfalls, notably that they are limited and don't solve the specific problem we've been considering, which requires atomic use of both a and b at the same time (and locking each individual member function of a and b isn't enough to do that).
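A minimal monitor-style object makes that last limitation concrete (this sketch is illustrative, not Java's Vector itself): every member function is individually synchronized, yet a compound operation over two such objects is still not atomic.

```cpp
#include <mutex>

// Monitor-style object: every member function takes the object's own lock.
class MonitorInt {
    mutable std::mutex m;
    int value = 0;
public:
    int  get() const { std::lock_guard<std::mutex> g(m); return value; }
    void set(int v)  { std::lock_guard<std::mutex> g(m); value = v; }
};

int Add(const MonitorInt& a, const MonitorInt& b) {
    // Each get() is synchronized on its own, but another thread can
    // modify a or b between the two calls: the pair of reads is not
    // atomic, which is exactly the problem described above.
    return a.get() + b.get();
}
```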
In this article, I've focused on only the one narrow question of how to write a lock-based program correctly, meaning that it works (avoids data corruption) and doesn't hang (avoids deadlock and livelock). That's the minimum requirement to write a program that runs at all. Along the way, I've noted reasons why many concurrent programs in the field today aren't actually correct, but are just getting away with lock-based programming flaws because concurrency isn't yet in routine widespread use.
I have virtually ignored the equally difficult question of how to write a lock-based program efficiently, one that actually scales decently when multiple processors are available and avoids all the common pitfalls, from priority inversion to serialization to convoying to aggressive locking, that can easily render it no better, and even worse, than a single-threaded program.
I've also ignored lock-free programming, which avoids nearly all the problems of lock-based programming but creates even more; experts readily admit that lock-free programming is much harder still than writing lock-based code.
Personally, I think that the most important sentence in [1] and [2] was the fourth and last conclusion: "4. Finally, programming languages and systems will increasingly be forced to deal well with concurrency." Before concurrency becomes accessible for routine use by mainstream developers, we will need significantly better, still-to-be-designed, higher-level language abstractions for concurrency than are available in any language on any platform. Lock-based programming is our status quo and it isn't enough; it is known to be not composable, hard to use, and full of latent races (where you forgot to lock) and deadlocks (where you lock the wrong way). Comparing the concurrency world to the programming language world: basics such as semaphores have been around almost as long as assembler and are at the same level as assembler; locks are like structured programming with goto; and we need an "OO" level of abstraction for concurrency.
The good news is that many people are working on this problem. As the next-generation programming models become available, they will be a key factor in opening the gates to the broad and strong adoption of concurrent programming that will let us make our programs simpler (by separating logical control flows); more responsive (by accomplishing more work during idle time such as waiting for disk and memory); and fully able to exploit the continuing exponential gains in processor throughput.
[1] Sutter, H. "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software," Dr. Dobb's Journal, March 2005. Available online at http://www.gotw.ca/publications/concurrency-ddj.htm.
[2] Sutter, H. "The Concurrency Revolution," C/C++ Users Journal, February 2005. This is an abbreviated version of [1].
[3] Alexandrescu, A. "Lock-Free Data Structures," C/C++ Users Journal, October 2004.
[4] Alexandrescu, A. and M. Michael. "Lock-Free Data Structures with Hazard Pointers," C/C++ Users Journal, December 2004.
[5] http://blogs.msdn.com/cbrumme.
Herb Sutter (http://www.gotw.ca/) chairs the ISO C++ Standards committee and is an architect in Microsoft's Developer Division. His most recent books are Exceptional C++ Style and C++ Coding Standards (Addison-Wesley).