Summary: The common fallacy that concurrent tasks
require threads is debunked. I am not an expert on threads and feel
awkward writing this essay, but I have not found any concise
formulation of this argument anywhere and had to write it down. Feel
free to correct me.
An event queue is even cute.
Definitions
The following are some of the terms used below. I don't insist that
these are the correct definitions; I simply need them to explain the
differences.
Kernel Scheduling Entity (KSE)
The unit of kernel scheduling. Something that can receive
timeslices from the kernel, be preempted, and block on kernel events.
Task
A sequence of actions to be performed by a program while keeping
the same context.
Coroutines
Concurrently executing pieces of code within the same program that
use cooperative multitasking to ensure each makes progress.
Thread
The term is very much overloaded. I use it here to refer to POSIX
threads, Microsoft Windows threads, Solaris threads, etc. Details of
implementations can differ (e.g., FreeBSD has several different POSIX
thread implementations that work differently).
Process
I'm afraid to try to give a definition of a process after all the
giants who failed. You know what a process is. An operating system
process, a KSE with its own address space.
Event loop
A common technique of organizing a program that obtains external
events of some kind and processes them as they come.
An example
Suppose we need to write a program that:
Binds to a TCP socket and listens for connections.
Within each TCP connection, implements a calculator that reads
lines of arithmetic expressions and outputs their results. The client
can either enter things like ``(3+4)*6'' or can enter
assignments like ``size=6+15/3''; variables to which
something was assigned before can be used in subsequent expressions.
Different requests are separated by newlines. The requests can be
pipelined; i.e., the client does not need to wait for a response to a
previous request before sending the next one.
How would you solve this toy problem? Stop reading until you
formulate the answer.
Programmers that come from different traditions can suggest
different ways of implementing it:
With processes: a process enters a blocking accept()
loop; when a new connection is established, a child process is spawned
with a fork() call; any given child process handles its
own session: it reads requests, keeps a data structure for the context
(perhaps a hash with variable names as keys), and writes responses.
There's no communication between the children, and very little
communication between a child and the parent (the parent needs to
spawn the child, then read and discard or record its exit status).
With threads: a similar architecture with threads replacing
processes.
With an event loop: only a single routine executes; a data
structure records the state of the system; the data structure could
be, e.g., hash of hashes (client identifiers such as sockets on first
level of hashing and variable names on the second level of hashing);
another data structure is used to record partial inputs (e.g., a hash
with client identifiers as keys and strings as values); an analogous
data structure holds outputs to be sent to different clients; all
sockets are put into non-blocking mode; a select() loop
governs program execution.
Do you fit into any of these categories? If you've selected the
solution with processes, you're probably a UNIX programmer who hasn't
been exposed to Solaris for too long; you might enjoy reading this.
If you've selected an event loop, you already know what I'm going to
say; don't waste your time---go program something. If you've selected
the threads solution, I wrote this for you. If you didn't get the
event loop business or think that parallelism (in the sense of
multiple tasks) necessarily requires threads, I would like to point
out that it does not.
(If your solution doesn't fit into this trichotomy, you can go back
to writing your program that constructively extends Rice's theorem by
taking two programs, performer and detector, on input, and always, as
long as the detector can print ``no'', always stops and prints ``yes''
or ``no'', and prints ``yes'' with performer on input, outputs a
program that represents the same function as the performer and that
elicits ``no'' from the detector. I know, junior-level nonsense.
Well then, try it in Lisp. Sorry about the digression; I do need to
deal with hecklers.)
Other solutions
Many other solutions for the problem exist [C10K]. They are,
generally, either refinements of one of the approaches (e.g., a
particular mechanism, such as select(),
poll(), /dev/poll, or kqueue()
to obtain notification of events), or combinations of the approaches
(e.g., an event loop running within multiple threads). These details
can be extremely important and difficult to get right [DARC04], but
here only the basic approach is discussed.
The Polemics
The widely cited argument against threads [OUST95] actually advises to
use threads for high-end applications (e.g., databases) and
applications that require true CPU concurrency on multi-processor
systems. It advises to mostly keep even multi-threaded code
single-threaded, with just event-specific threads. Interestingly, the
author both claims that threads can have (usually have? always have?)
poor performance and advises to use them only in performance-critical
contexts, reserving event programming for distributed systems, GUIs,
and the like.
What is widely seen as an attempt to refute [BEHR03] the
anti-thread thesis uses, as its core argument, the fact that
threads and events are dual (this was known for a long time [LAUE78]).
Thus, you can do the same things with the two mechanisms, with
essentially the same performance. Therefore, one should use the
mechanism that's more natural for the task at hand. The authors
proceed to claim that for most tasks, threads are the more natural
approach; two very valid points are (i) the difficulty of error
handling with events and (ii) the greater probability of memory leaks
with events. The authors admit that for fan-in and fan-out
operations, such as multicasting, the event model is more natural.
But (yes, there is a ``but'' the size of a blue whale), the paper
states that there is no well-performing thread implementation (other
than what the authors rigged up for their paper). The authors propose
a number of improvements to address performance and other limitations.
Until these improvements are understood, implemented, deployed, and
tested at least by a few of popular operating systems, the authors'
argument that threads can work well is also an argument that
they currently do not. Incidentally, the authors argue for a
user-level thread implementation with cooperative multitasking, with
the kernel knowing little about threads. Most of the recent movement
in this area is towards more kernel involvement in threads, not less
(e.g., Solaris 10 replaces an N:M implementation, where multiple
threads, implementing multiple tasks, can exist within a KSE, with a
1:1 implementation, where one thread is exactly one KSE).
So, should I use threads or events?
In theory, a proper implementation of threads is as good as a proper
implementation of events, so one can choose freely the mechanism that
is more convenient. If it's convenient to use a thread for each task,
do so; if fan-out makes it more convenient to write the application as
an event loop, do so.
However, in practice, currently, since all threading libraries are
so bad, you should seriously consider using events. Here are the
possible reasons to use threads:
You need concurrency for a performance-critical application, and a
hardware source of concurrency exists;
You don't need portability and are using a good threads library
not available to the rest of the world (and threads are the more
natural approach to your problem);
You were taught thread-based programming, and cannot think
off-hand of another way to implement tasks.
The first reason is a solid one. The second is dubious: you trade off
portability for being able to write more natural code; if one always
did that, almost all programs would have been written in
domain-specific languages running on exotic platforms. The third,
needless to say, is not a reason at all.
OK, I'm using threads. How many should I have?
What is your reason for using threads? If it's 3 (that you want to
avoid using events), I can't really help you, as I don't consider this
a valid reason. If it's 2 (that it's more natural and hurts nothing),
then use as many threads as you have tasks or whatever feels natural.
If it's 1 (that you need to exploit potential parallelism in
hardware), then the number should be related to the number of things
that can happen in parallel.
The number of parallel processes that can happen in a computer is
related to the number of CPUs, disks (disk I/O events, unfortunately,
cannot be efficiently used in event-based programs on UNIX: even if
you set the file descriptor to nonblocking, you will still wait, as
select() will always report that file descriptors for
regular files are ready, even if it'll take 10ms for a disk seek
before data comes), and various other I/O channels that behave like
disks. The number of threads should be related to this degree of
potential parallelism; not necessarily equal (performance might be
better if the number of threads is the degree of parallelism plus or
times some small constant), but related to it, rather than the number
of tasks. Note that as the number of threads gets larger than the
number of CPUs, the potential for unnecessary context switches
increases, so the small constant mentioned above should actually be
small.