Monday, April 12, 2010

Modern Multiprocessing

I've been thinking a lot lately about the way we accomplish multiprocessing. We've seen a significant change in the operation of Moore's law for CPU speeds: today's CPUs are about the same speed as those of a few years ago, but they have more cores, and more virtual processors on those cores. This is great for heavily-loaded servers, which have plenty of distinct tasks to place on those cores and VCPUs, but not so useful for users working with single-threaded applications.

Why are most applications still single-threaded? There are lots of good reasons. Threaded code is harder to write, and not just because it requires careful analysis and use of synchronization primitives: many common tasks are difficult to meaningfully parallelize without careful control over inter-thread communication, and in a portable application you don't have that kind of control. Threaded code generally performs badly on single-CPU systems, which are still common. Some popular languages still make threading difficult, at least in a portable fashion. And threads are still relatively heavyweight entities in most operating systems: you don't spawn ten threads to mergesort a 100-item array.

Some of these problems will go away with a little more time, but some will get worse. NUMA architectures can make sharing data between threads slow. Hyperthreading and its interaction with processor caches add yet another level of unpredictability.

We know how to build massively parallel systems that run massively parallel algorithms. What is still unknown is how to build portable, simple software that can run efficiently across a variety of architectures. This is a problem of practice, not theory, and there's lots of interesting work going on in this area.

Of course, there are languages designed explicitly to support communication, such as Limbo, Erlang, Haskell, and Clojure. For the most part, these languages are structured as communicating sequential processes, which is to say that they represent multiprocessing as a set of sequential threads that pass information to one another. Problems of thread safety are handled by the languages themselves, but mapping the parallelism to available resources is generally left to the programmer or administrator.
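To make the "sequential threads that pass information to one another" idea concrete, here is a minimal sketch in Python rather than in any of the languages above. Each stage is plain sequential code, and the only thing the stages share is a pair of queues standing in for channels. The names (`producer`, `squarer`, `run_pipeline`, `DONE`) are mine, invented for illustration, and standard-library threads and queues are only an approximation of what those languages build in.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the stream


def producer(out_q, items):
    """Sequential stage: send each item downstream, then signal completion."""
    for item in items:
        out_q.put(item)
    out_q.put(DONE)


def squarer(in_q, out_q):
    """Sequential stage: receive a value, square it, pass it on."""
    while True:
        item = in_q.get()
        if item is DONE:
            out_q.put(DONE)  # propagate the end-of-stream marker
            return
        out_q.put(item * item)


def run_pipeline(items):
    """Wire two stages together with bounded queues and collect the output."""
    # Bounded queues give back-pressure: a fast stage blocks until the
    # slower one catches up, instead of buffering without limit.
    a = queue.Queue(maxsize=4)
    b = queue.Queue(maxsize=4)
    threads = [
        threading.Thread(target=producer, args=(a, items)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for t in threads:
        t.start()

    results = []
    while True:
        item = b.get()
        if item is DONE:
            break
        results.append(item)

    for t in threads:
        t.join()
    return results
```

Calling `run_pipeline([1, 2, 3])` returns `[1, 4, 9]`. No stage touches another stage's state, so there are no locks to get wrong. What the sketch does not decide is how many stages to run or where, which is exactly the resource-mapping question that remains with the programmer.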

One interesting project is Apple's Grand Central Dispatch. It defines a simple but highly expressive closure syntax (a block) and a mechanism to dynamically schedule execution of such closures (queues). Critically, the GCD library takes care of scaling the parallelism of the queue processing appropriately to the underlying hardware. On a single-threaded CPU, this amounts to cooperative multitasking, but on parallel hardware the operating system can dynamically allocate virtual CPUs to applications needing more parallelism.
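The shape of that idea, closures handed to a queue while a library decides how many workers to run, can be sketched without Apple's toolchain. What follows is not GCD's API. It is a rough Python stand-in built on `concurrent.futures`, and `dispatch_all` and `make_word_counter` are hypothetical names of my own. Two caveats: the pool size is fixed once from `os.cpu_count()`, whereas GCD adjusts dynamically, and in CPython the global interpreter lock keeps CPU-bound closures from actually running in parallel on threads.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def dispatch_all(tasks, max_workers=None):
    """Run zero-argument callables on a pool sized to the machine.

    Results come back in submission order. The caller never says how
    many threads to use unless it overrides max_workers explicitly.
    """
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def make_word_counter(text):
    """Build a closure that captures `text`, loosely analogous to a block."""
    return lambda: len(text.split())


if __name__ == "__main__":
    documents = ["one two three", "four five", "six"]
    print(dispatch_all([make_word_counter(d) for d in documents]))
```

The calling code stays the same on one core or sixteen; only the number of workers behind the queue changes.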

This topic seems to come up often in my various pursuits, so I will return to it again.

Want to work on Amanda?

I've not made any secret of the fact that I want more people hacking on Amanda. This is both for selfish reasons -- many hands make light work -- and for altruistic reasons -- a broader community of developers can provide better governance for the project and long-term continuity. With a few notable exceptions, I haven't had a lot of satisfaction.

I think part of the reason is that Amanda has a steep learning curve, even within the new Perl code. The time to climb that curve is a big investment, and folks with only a small itch to scratch can't afford it.

In an effort to sweeten the pot, we (Zmanda) are offering to pay for flexible work on Amanda. Part-time or full-time, on your own schedule. Your choice of projects. Support and gratitude from the other hackers. And the option to become a full Zmanda employee if that's your bent.

Here are some possible projects, to pique your interest:

  • MySQL application (to round out the set with ampgsql)
  • Cyrus Imapd application (gnutar doesn't deal well with the application's tiny files and hard links)
  • OpenSSL for network transport, using certificates and keys for authentication
  • Database-backed backup catalog
  • Amvault upgrade
  • Handle Logical EOM (LEOM) on all devices that support it, drastically reducing the number of parts Amanda writes
  • Support for more cloud backends than just S3
  • Parallel writes to multiple devices

If you're interested, contact me and we'll work something out!


The postings on this site are my own and don't necessarily represent the opinions of my employer, or anyone else.