Safety-Critical Neural Networks

As an undergraduate, I wanted to learn to play Go. In my AI class, we learned that although we have pretty good algorithms for games like checkers and chess, even an intermediate Go player can outperform the best computer program in existence. This was a good opportunity, I thought, to show I could beat a computer at something objective other than filling out CAPTCHAs. After downloading GnuGo and practicing occasionally for a few months, I was good enough to be competitive without a handicap, sometimes even winning. These days, I couldn’t even come close to beating a state-of-the-art Go program. Even the best professional human Go players can no longer beat the best computers. Game-playing algorithms have had an algorithmic breakthrough, and this has come from neural networks and deep learning.

The breakthrough in the current deep learning revolution was the undeniable success in addressing long standing problems in computer vision close to ten years ago. Nowadays, neural networks dominate the field, and cannot be ignored by researchers or practitioners. For certain image classification problems, it is not unusual for computer algorithms to even outperform humans. See this for yourself by uploading some old photos to google photos. The neural-network algorithms will identify who is in what photo and create person-specific albums, linking 10-20 year old photos with new ones. This amazingly even works in large group photos with small faces and in videos.

The explosion in deep learning research and applications has not been confined to game playing and image classification. Almost every field has at least attempted to use neural networks to investigate what is possible. Cyber-physical systems (CPS), where computers interact with the physical world, are no different. Although learning-based controllers have long been looked at in control theory, recently this has taken off, with many CPS conferences at a minimum featuring a session or two on how neural networks can solve important classes of problems.

One important distinction with CPS, however, is that these systems are not confined to the digital world. Behaviors of a neural network in a CPS may manifest in the physical world. CPS are often safety-critical and cannot tolerate any mistakes. Do neural networks make mistakes? Certainly for vision problems it’s well-known that there can be what are called adversarial images. These are images with slight perturbations that look the same to a human but classify incorrectly. An image of a stop sign, for example, can be slightly modified to classify as a speed limit sign. This is unsettling for safety-critical systems. If the system looks like it’s working correctly, it may pass all the system tests and even work well in the field for a time, only to fail when some adversarial set of sensor readings occurs.

This problem needs to be addressed before neural networks are deployed in safety-critical applications. Fortunately, there has been a surge of recent results in the field of neural network verification.

Neural Network Verification

The field of formal verification deals with mathematically proving properties about system models. Traditionally this may have been finite state machines or software programs. Although neural networks are sometimes considered as black boxes, we computer scientists know this isn’t true. Every operation performed is exact and known, often a matrix multiplication or a simple nonlinear activation function such as a sigmoid, arctan, or ReLU. Neural networks are a large organized graph of fully-known basic operations, not a black box. Neural network verification tries to prove properties about this cascade of operations.

What are we trying to prove? 

When proving properties about a piece of software, the source code is not enough. All programs are correct in some sense, in that they will always compute what is written in the program. To make the verification question meaningful, you also need preconditions and postconditions. Preconditions are assumptions on the inputs to the program. Postconditions are properties about the outputs. The useful verification question in this context becomes: ”Given any input that satisfies the preconditions, will the postconditions of the output always be true?”

Similarly, when proving properties about a neural network, the network is not enough. The network will always compute what it computes. You also need the equivalent of preconditions and postconditions. The simplest way to do this is provide preconditions and postconditions on the network itself. Given some predicate over the possible inputs, will the outputs always satisfy some other predicate? This is the open-loop neural network verification problem. In the image classification context, we can propose open-loop questions like: “Given a this image of a stop sign, will some bounded perturbation on some of the pixels cause the network to classify it as a speed-limit sign?”

In CPS, another type of verification problem is important. If the neural network is controlling some aspects of a physical system, does the physical system, which continuously responds to the neural network’s commands, stay physically safe? This requires reasoning over multiple invocations of the neural network, as well as the evolution of the physical system, which may be provided mathematically using differential equations. This harder verification problem is the closed-loop neural network verification problem.

What is currently possible in neural network verification?

Defining problems is a good start, but can we actually solve open and closed-loop neural network verification problems? The answer, of course, depends on the specific model and properties. Formal methods as an area is interesting in this respect. Models and proof algorithms seem completely mathematical in nature, but the problems being solved are usually undecidable or NP-Complete. Evaluating if a proposed formal method is good, therefore, depends on empirical results. Does the analysis method work in practice on the systems we care about?

An important development in this direction, published in CAV 2017 with now over 500 citations, is the Reluplex verification paper. In addition to clarifying the verification problem and providing a clever verification approach based on a modification of the Simplex linear programming algorithm, the paper proposes an important set of benchmarks based on the ACAS Xu system. This system is designed to prevent mid-air collisions of aircraft by issuing turn advisory outputs—strong left, weak left, clear-of-conflict, weak right, or strong right—given input values for the aircraft relative positions, headings, and velocities. Originally created as a multi-gigabyte lookup table of rules to follow, the system was compressed using a series of neural networks to save memory, with the motivation that avionics hardware is often resource constrained. Did this compression of the look-up table as a neural network affect the system’s operation? That is a verification question.

The paper provides about 200 specific verification queries—network-specification pairs—with runtimes ranging from seconds to days, including some (originally) unsolved instances. Ten simple open-loop verification properties are considered. This includes things like property 3, which states that if the two aircraft are in a head-on collision situation, a turn advisory will be issued. This can be stated in an open-loop fashion by providing acceptable preconditions to the network, ranges of states that are considered a head-on collision scenario. The acceptable output is that the minimum output, which is used to select the current turn advisory, must not be clear-of-conflict.

The image below shows the output advisory issued by the network for different positions of an intruder aircraft, which appears to meet the specification that a turn advisory is issued when the intruder is close enough. This was produced by sampling the network, however. Is there an adversarial state that doesn’t issue a turn? This is the open-loop verification problem checked by property 3.


Research on neural network verification has appeared in a variety of top conferences such as EMSOFT, CAV, FM, HSCC, ICCPS, POPL, USENIX Security, NeurIPS, AAAI and many others. Special-purpose workshops on the topic have also recently started such as the International Workshop on Verification of Neural Networks (VNN) and the Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS). This short article cannot attempt to review the entire field. For that, surveys have been produced recently on the neural network verification topic, and informative overview talks from top researchers are also available online.

Our group has pursued the verification strategy of propagating sets of states geometrically layer by layer through a network. To do this exactly for ReLU networks such as the ones in ACAS Xu, you must be able to perform two geometric operations: (1) linear transformations of sets in a Euclidean space, and (2) intersections of these sets. This would not be efficient if we were using polytopes given by their linear constraints, as computing general linear transformations is inefficient in high dimensions. Other representations have been considered for the problem, such as zonotopes (linear transformations of a unit box), which are efficient for linear transformations but do not support intersections. Instead, we have explored using the star set representation, sometimes called AH-polytopes or affine representations. Star sets are 2-tuples consisting of a linear transformation part and a polytope part, which are kept separate. The linear transformation of a star set can be done efficiently by modifying the linear transformation part, while taking intersections with star sets only modifies the polytope part. Star sets can therefore analyze neural networks exactly, without overapproximation error, providing visual depictions of the execution of a neural network layer by layer, such as the one below.


Here, we could optimize over the output set to check if the clear-of-conflict output is ever less than the other outputs, in order to verify the head-on-collision specification. As shown in the animation, randomly sampling the input space is generally a bad strategy, since large regions of possible outputs may not be seen even when many simulations are executed. This analysis can also be used for test-case generation, finding the closest point in the input space corresponding to some dangerous region in the output space.

A more interesting question with ACAS Xu asks if the network is repeatedly invoked and the aircraft obeys the advisories, will a collision actually be avoided? This is now a closed-loop neural network verification problem. Simulations show that, from a set of initial states, it looks like collisions are avoided:


Again, star set based analysis can be used to prove such a property, equivalent to running an infinite number of simulations. This requires alternatively reasoning over executions of the network and executions of the physical system’s differential questions. Shown below are the decision points obtained in the above simulation video with each advisory (left), as well as the result of set-based analysis with star sets (right).


Is the neural network verification problem solved?

No. The biggest problem with verification methods is scalability, which is related to runtime. If you can analyze the same network faster it means you can analyze bigger networks in the same amount of time.

Using star sets makes exact verification possible, but directly implementing this method can be slow. Our group’s upcoming CAV 2020 paper shows how to make this 10-100x faster, using combinations of implementation optimizations and selective use of overapproximation methods to prune the search. Using zonotope overapproximations, for example, we can eliminate much of the computation needed with the star set verification method. This is another theme common in formal methods: use overapproximations as much as possible, and revert to slower, more exact methods, only when it’s needed. With the improvements, we are now able to verify in minutes or seconds some properties that needed days to check in the original Reluplex work three years ago.

The ACAS Xu networks are fairly small, just 300 neurons. Large image classification networks may have millions of neurons. Although certain types of exact analysis is possible even for these larger networks, overapproximation methods are probably a better fit for this domain. Beyond verifying properties, approaches try to use formal verification methods to improve network training, which is another important application.

In other areas of formal verification, competitions have driven progress. There are yearly competitions in SAT and SMT solving, as well as software verification. This summer the neural network research community will hold the first VNN-COMP verification competition on the open-loop neural network verification problem, with results expected in a few months. The second year of the AINNCS verification competition is also underway, which looks at benchmarks for the closed-loop verification problem. Both of these accept outside benchmarks of interest, to guide the algorithm and tool developers to benchmarks that matter in practice (please submit some if you see these methods as useful for your application). In the years to come, solver performance will likely continue to increase by several orders of magnitude, making verification of practical systems increasingly feasible.

Author bio:  Stanley Bak (@StanleyBak) is joining the Department of Computer Science at Stony Brook University as an Assistant Professor starting in Fall 2020. He investigates the verification of autonomy, cyber-physical systems, and neural networks. He strives to develop practical formal methods that are both scalable and useful, which demands developing new theory, programming efficient tools and building experimental systems.

Stanley Bak received a Bachelor’s degree in Computer Science from Rensselaer Polytechnic Institute (RPI) in 2007 (summa cum laude), and a Master’s degree in Computer Science from the University of Illinois at Urbana-Champaign (UIUC) in 2009. He completed his PhD from the Department of Computer Science at UIUC in 2013. He received the Founders Award of Excellence for his undergraduate research at RPI in 2004, the Debra and Ira Cohen Graduate Fellowship from UIUC twice, in 2008 and 2009, and was awarded the Science, Mathematics and Research for Transformation (SMART) Scholarship from 2009 to 2013. From 2013 to 2018, Stanley was a Research Computer Scientist at the US Air Force Research Lab (AFRL), both in the Information Directorate in Rome, NY, and in the Aerospace Systems Directorate in Dayton, OH. He helped run Safe Sky Analytics, a research consulting company investigating verification and autonomous systems, and performed teaching at Georgetown University before joining Stony Brook University as an assistant professor in Fall 2020. 

Disclaimer: Any views or opinions represented in this blog are personal, belong solely to the blog post authors and do not represent those of ACM SIGBED or its parent organization, ACM.

Leave a comment