Determinism

For most of my professional research career, I have sought more deterministic mechanisms for solving various engineering problems. My focus has always been on systems that combine the clean and neat world of computation with the messy and unpredictable physical world (cyber-physical systems). Given the messiness and unpredictability of the real world, why the obsession with determinism[1]?

An argument that I hear frequently against a focus on determinism goes something like this: In the real world, things will go wrong. Nothing is really predictable. Even deterministic models will be violated in practice, so why bother with deterministic models? Why not, instead, assume everything is random and design your system to tolerate this randomness?

Perhaps at least in part because of this thinking, engineers accept nondeterminism where they should not. In concurrent and distributed software, for example, although many software engineers have moved beyond the wildly nondeterministic mechanisms of threads, the replacements fall short. Publish-and-subscribe frameworks (such as ROS and MQTT) and actor-based frameworks (such as Akka and Erlang) deliver messages in nondeterministic order. Even frameworks designed specifically for safety-critical systems (such as AUTOSAR and the standards governing PLCs) have unexpected nondeterminism. In all of these frameworks, it is possible to construct deterministic programs, but doing so requires considerable expertise in concurrent software, expertise that is well outside the comfort zone of most application engineers. So the application engineers tweak priorities, insert ad-hoc delays, test their implementations extensively, and hope for the best.

Perhaps determinism is just too difficult to achieve. Is it? We have done it before, and the benefits have been enormous. TCP, the linchpin of the Internet, provides reliable in-order packet delivery. CRC checks in computer memories ensure that what is read is what was written. Synchronous digital logic design, the foundation of almost all modern electronics, is a deterministic model overlaid on the randomness of sloshing electrons. Why do we rely so heavily on determinism at these lower levels, and then abandon it at higher levels?

Today, computer memories have replaced ledgers in finance and law. Digital signatures have become an acceptable way to finalize legal contracts. Stock trades execute without human intervention or paper records. None of these would be possible without deterministic models and our ability to build physical systems that are highly faithful to these models. Electronic circuits perform billions of arithmetic operations per second and go for years without errors.

I believe that this lack of commitment to determinism at higher levels is based on a misunderstanding of the concept. I claim that determinism is a property of models, not of physical systems. To understand the implications of this observation, we have to first understand that models are used by engineers in two very different ways. An engineering model is a specification of how a physical system should behave, not a model of how the physical system does behave (the latter is a scientific model). When you have a model that defines how a system should behave, then you get, for free, the notion of a fault. A fault is a behavior that deviates from the specification.

The existence of faults does not undermine the value of deterministic models. In fact, the very notion of a fault is strengthened by deterministic models because they define more clearly what behavior a physical system should have. Detecting faults, therefore, is easier. A cyclic-redundancy check (CRC), for example, detects at least some violations of a simple deterministic model of a computer memory. This enables fault-tolerant design, where the system reacts in predictable ways to faults.
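
To make the idea concrete, here is a minimal sketch in Python. It assumes a toy "memory" whose specification is simply that a read returns exactly what was written. The function names `store` and `load` are hypothetical, chosen only for illustration; `zlib.crc32` is the standard-library CRC-32 routine.

```python
import zlib

def store(data: bytes) -> int:
    """Compute the CRC-32 that is recorded alongside the written data."""
    return zlib.crc32(data)

def load(data: bytes, checksum: int) -> bytes:
    """Return the data if it matches the recorded checksum; otherwise report a fault."""
    if zlib.crc32(data) != checksum:
        raise ValueError("fault: data read differs from data written")
    return data

word = b"determinism"
crc = store(word)

# Under the model's assumptions, a read returns what was written.
assert load(word, crc) == word

# Flip a single bit to emulate a physical fault (hypothetical scenario).
corrupted = bytes([word[0] ^ 0x01]) + word[1:]
try:
    load(corrupted, crc)
    fault_detected = False
except ValueError:
    fault_detected = True
```

Because the model states exactly what a read should return, the deviation is unambiguous, and the caller can react to it in a predictable way rather than silently using bad data.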

Every realization of an engineering model can exhibit faults. When we successfully build a physical system that reliably behaves like a model, it does so only under certain assumptions. No computer will correctly execute a program if it overheats, is crushed, or is submerged in salt water. The model is faithfully emulated only under the assumption that none of these things has happened.

Making the assumptions clear also has value. For example, in a distributed system, if you can assume that there is an upper bound on communication latency, then the absence of a message conveys information. Of course, that assumption may be violated in practice. So may the assumption that two 32-bit numbers are added correctly, but, in practice, violation of this second assumption is so unlikely (because of the determinism of synchronous digital logic) that we don’t feel compelled to check it. A good design makes assumptions explicit, assesses the likelihood that the assumptions will be violated, and provides detection and fault-handling mechanisms if that likelihood is higher than we would like.
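
The latency example can be sketched as follows. This is only an illustration under stated assumptions: sender and receiver clocks are taken to be perfectly synchronized, times are integer milliseconds, and the bound `MAX_LATENCY_MS` and both function names are hypothetical.

```python
MAX_LATENCY_MS = 10  # assumed upper bound on communication latency (hypothetical value)

def no_message_pending(now_ms: int, sent_by_ms: int,
                       max_latency_ms: int = MAX_LATENCY_MS) -> bool:
    """True if any message sent at or before sent_by_ms must already have arrived.

    If this holds and nothing has arrived, the receiver may conclude that
    no such message was sent: the absence of a message conveys information.
    """
    return now_ms >= sent_by_ms + max_latency_ms

def assumption_violated(sent_ms: int, arrived_ms: int,
                        max_latency_ms: int = MAX_LATENCY_MS) -> bool:
    """True if an observed message took longer than the assumed bound (a fault)."""
    return arrived_ms - sent_ms > max_latency_ms

# At time 105 a message sent at time 100 could still be in flight.
too_early = no_message_pending(now_ms=105, sent_by_ms=100)
# At time 110 the bound has elapsed, so silence is informative.
late_enough = no_message_pending(now_ms=110, sent_by_ms=100)
# A message that takes 15 ms reveals that the assumption was violated.
fault = assumption_violated(sent_ms=100, arrived_ms=115)
```

The second function is the detection mechanism: because the assumption is explicit and quantified, a violation can be recognized and handled instead of going unnoticed.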

A deterministic model, together with clearly stated and quantified assumptions under which a physical realization emulates the model, enables efficient designs that can react in predictable ways to faults. Hence, despite Murphy’s Law, deterministic models are useful, even in the face of unpredictable failures.

For cyber-physical systems, deterministic models play an important role on both sides of the divide between the cyber and the physical. For example, on the physical side, deterministic differential equation models can be useful descriptions of how a robot arm should behave, particularly if coupled with probabilistic models of how it may actually behave. On the cyber side, for distributed software, clear specifications of how the components should coordinate are useful, particularly if coupled with probabilistic models of network behavior that may compromise these specifications. In both cases, we are talking about a combination of engineering and scientific models.

In the context of parallel and distributed software, we can do better than we are doing today. There are many deterministic models that have achieved modest success in certain circles, but they are far from widespread. These include dataflow models, discrete-event systems, synchronous-reactive models, logical-execution-time-based models, process networks, and more. Although they are widely used in several engineering communities (electronic hardware and mechanical engineering, for example), the software engineering community has failed to recognize their value. And that reticence has spilled over into fields that should not accept unnecessary nondeterminism, such as robotics, where ROS is widespread.

One way to make a commitment to determinism in concurrent and distributed software is to adopt a semantic notion of time and explicitly time-stamp events. This is done in discrete-event system simulators and hardware design languages, but it is rare in software systems. Such models introduce a logical notion of time that makes the models easy to understand and verify. With an additional constraint that messages with identical timestamps be processed in a well-defined order, the model becomes deterministic. The job of the runtime system, then, is to ensure that every component processes events in timestamp order. This ensures that the physical realization conforms with the model, exhibiting physical time properties that conform with the logical time model. This is the principle behind the recently introduced reactor model, which is realized in the Lingua Franca language, the current manifestation of my obsession with determinism.
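
The following Python sketch illustrates timestamp-ordered processing with a well-defined tie-break. It is not the Lingua Franca runtime or its API; the `Event` type, the `port` tie-breaker, and the function name are assumptions made for illustration. It also assumes that all events are available before processing begins, whereas a real runtime must additionally decide when it is safe to process an event.

```python
import heapq
import random
from typing import List, NamedTuple

class Event(NamedTuple):
    timestamp: int  # logical time of the event
    port: int       # assumed tie-breaker: a fixed order among input ports
    value: str

def process_in_timestamp_order(arrivals: List[Event]) -> List[str]:
    """Process events by (timestamp, port), regardless of arrival order."""
    queue = list(arrivals)
    heapq.heapify(queue)  # tuples compare by timestamp first, then port
    return [heapq.heappop(queue).value for _ in range(len(queue))]

events = [
    Event(2, 0, "c"),
    Event(1, 1, "b"),  # same timestamp as "a"; the port number breaks the tie
    Event(1, 0, "a"),
    Event(3, 0, "d"),
]

# Emulate nondeterministic network delivery by shuffling the arrival order.
rng = random.Random(0)
observed_orders = set()
for _ in range(20):
    shuffled = events[:]
    rng.shuffle(shuffled)
    observed_orders.add(tuple(process_in_timestamp_order(shuffled)))
```

However the arrivals are shuffled, `observed_orders` ends up containing a single processing order, which is what makes the behavior repeatable and testable.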

I was recently approached by a large company that makes and sells safety-critical cyber-physical systems. They were interested in creating “digital twins” in the form of simulation models of a deployed system. These would be used for virtual prototyping and for fault detection. They were finding that their simulations were unable to capture the timing of software execution sufficiently precisely, and they asked if my group could help them improve the accuracy of their simulations. I said no. Moreover, I pointed out that they were trying to build scientific models, whereas they should instead be building engineering models. An engineering model with a logical notion of time will be easy to simulate accurately. If they can then ensure with high confidence that the runtime system conforms with the engineering model, then the engineering model will be a faithful digital twin. If the engineering model also avoids unnecessary nondeterminism, admitting only whatever nondeterminism is intrinsic in the application, then simulation becomes an effective testing mechanism for a virtual prototype. Test vectors can be defined for a range of operating conditions, and since correct behavior will be well defined, identifying design flaws will be far easier.

Determinism is an extremely valuable tool in our engineering toolbox. We should resist any temptation to give it up because it seems too hard to achieve. We have repeatedly proven that it is not so hard to achieve, and the benefits we have reaped are considerable. Because of that determinism, I can be fairly sure that you are seeing the same words in this blog that I am. Whether you understand those words in the way that I do is another question entirely.


  1. This blog summarizes a position more fully developed and defended in a paper with the same title that is to appear in the ACM Transactions on Embedded Computing Systems (TECS) in May 2021 (DOI: http://dx.doi.org/10.1145/3453652). Please see that paper for citations.

Author bio: Edward A. Lee has been working on embedded software systems for 40 years, and after detours through Yale, MIT, and Bell Labs, landed at Berkeley, where he is now Professor of the Graduate School in EECS. His research is focused on cyber-physical systems, where he strives to make composable, secure, and verifiable timing-sensitive systems. Recently, he has branched out and published two books on the philosophy of technology.

Disclaimer: Any views or opinions represented in this blog are personal, belong solely to the blog post authors and do not represent those of ACM SIGBED or its parent organization, ACM.