[EMSOFT 2021 Best Paper] Verified Lustre normalization with node subsampling
“Verified Lustre normalization with node subsampling” by Timothy Bourke, Basile Pesin, Paul Jeanmaire, and Marc Pouzet is the best paper winner of the International Conference on Embedded Software (EMSOFT) in 2021. This blog post gives an overview of the work. The Coq development described in the paper is available online. There is also a page of links between the article text and the Coq development.
It’s difficult to imagine an engineer describing a system without using a block diagram showing the main functional units and the data flowing between them. Tools such as MathWorks Simulink, National Instruments LabVIEW, and Ansys Scade Suite, to name but a few, let engineers program with block diagrams, which are then used as a basis for simulation, testing, verification, and code generation. Scade Suite, in particular, is used in this way to develop safety-critical control software. The figure below presents a screenshot of a simple system in Scade Suite.
The blocks in such diagrams can be seen as functions from streams of inputs to streams of outputs. Compilation of blocks produces imperative code that is executed cyclically to successively sample inputs and compute outputs. This is the basis for the academic programming language Lustre which preceded the richer Scade language. The Vélus project aims to formalize the semantics of block diagram languages like Lustre and Scade and to prove their compilation schemes correct in the interactive theorem prover Coq. Coq is a purely functional programming system that also permits writing mathematical specifications and proofs. Vélus builds on the CompCert verified C compiler to provide an end-to-end proof between a dataflow semantics for source programs and the imperative semantics of the assembler generated from them. The end result is a compiler that you can run in a terminal or a web page to transform Lustre via C into assembly. A machine-checked proof guarantees that cyclically executing the generated code accurately reproduces the streams specified by the source program.
Compilation
The block diagrams shown in the screenshot above are written as follows in the textual syntax of Lustre.
Each block is declared with an input/output interface and defined by a set of mutually recursive equations. This program waits for the input i to change from false to true and then sustains its output o for n cycles. We won’t explain the defining equations here, but you can find a detailed description on pages 2 and 3 of our article. This example, taken from the Lucid Synchrone user manual, is typical of the control operators provided by the Scade Suite and Simulink libraries.
The program is read from the input file by a parser and represented in memory as an abstract syntax tree. The compiler is structured as a succession of functions each of which transforms an abstract syntax tree from a source language to a target language. The structure of the Vélus compiler is shown below.
In this diagram, the boxes represent languages and the arrows represent compilation functions. After parsing and the addition of type annotations, the source program is in the Lustre language, it is then successively transformed into a restricted form called NLustre, a language of transition systems called Stc, a simple imperative language called Obc, and finally to CompCert’s Clight. Some of the functions (transcription, i-translation, …) map between different languages, others (unnesting & distribution, fusion optimization, …) rewrite the program in the same language.
Correctness
The general schema for a single compilation pass is shown below. The upper half shows the compilation function that transforms a source program into a target program.
To determine whether the function is correct, we first need to define what is expected. This is done by giving a meaning, or semantics, to the source and target programs, and then equating their input/output behaviours. For dataflow programs, i.e., in Lustre or NLustre, the meaning is defined by associating each variable and subexpression with a stream of values. For transition systems, i.e., in Stc, the meaning is defined by an initial state and a relation between current and successor states. For imperative programs, i.e., in Obc or Clight, the meaning is a succession of updates to variables. In Coq, these meanings are expressed by sets of predicates that associate the syntax of a program to a mathematical object, i.e., a stream, transition relation, or state relation. A correctness lemma assumes various conditions on typing and causality, and states a relation between the source and target semantic models. Its proof invariably involves induction over the syntax of a program followed by case analysis over all constructions in the language. From a certain point of view, we state the problem and then the proof assistant generates (very many) conditions to check. Unfortunately, the reality is more complex: inductive proofs require carefully crafted invariants and reasoning in a proof assistant is fastidious.
The overall correctness theorem for Vélus is a composition of the individual correctness lemmas and CompCert’s correctness theorem. It provides a machine-checked relation between a dataflow model of the source language and the trace-based model of the generated assembly code.
Our Article
We have been developing Vélus for the last seven years. Our article at EMSOFT 2021 both addresses the normalisation pass that transforms Lustre into NLustre and enriches the language with sampled arguments. In practical terms, our compiler now supports the usual syntax of Lustre, not just the restricted form available in previous versions of Vélus.
Normalization
We decomposed normalization into the three functions visible in red at the top of the Compiler Architecture Diagram above: unnesting and distribution, expression initialization, and transcription. The overall result on the example program is shown below.
The basic idea is to simplify expressions by isolating any element that requires an internal state into its own equation. This is why the above program contains distinct equations for the fby
operators that introduce unit delays and for block instances like the use of count_down
within rising_edge_retrigger
. We must also ensure that the fby
equations are initialized by a constant. This transformation is a first step toward the generation of imperative code that explicitly initializes and updates internal state variables.
Notice that the transformed program contains many new local variables, namely those whose names contain the $
symbol. Normally, introducing new names is trivial: the compiler can simply increment a counter. This is not so easy to do, however, in a purely functional programming language. Rather, it is necessary to explicitly track which names have been used and thread this information through the compilation functions (technically, we use a “monadic” form). There are a lot of tedious proof obligations about the freshness of introduced names.
The treatment of names is tedious but conceptually simple. The most difficult problem was reasoning about the correctness of activation conditions. Each expression in a Lustre program is associated with an activation condition that ultimately determines when it should be computed. An activation condition is represented by an annotation, called a clock type, that becomes a boolean condition in the generated code. For an annotation to be correct, it must only evaluate to true in instants when the corresponding stream has a value. In other words, the static annotations used to generate code must agree with the rhythms of streams decreed by the semantic model. For technical reasons, this property had to be assumed in the model for NLustre, but we were able to eliminate it from the model for Lustre introduced in our article. The new model is thus simpler and closer to pen-and-paper semantic models, but the correctness property has to be proved. We did this by expressing the dependency relation between variables as an acyclic graph with an associated induction principle.
Node subsampling
The term node subsampling refers to the definition and use of blocks having some inputs and outputs that are not always provided. This feature is quite specialized but it does allow a version of the current operator to be implemented as a library function. The source code is shown below.
This block has three inputs: a default value d, a variable representing when a signal is available ck, and the signal itself x. The annotation on x indicates that its value is only available in instants when ck is true. Without going into the details, the body of the block defines a buffer that is updated with the value of x when it is given and otherwise holds the initial or previous value.
Adding node subsampling support to the compiler required updates to the semantic models, type systems, and proofs, as expected, but we were surprised to find it was also necessary to adapt the compilation functions that generate Clight. To understand why, consider that each block is compiled into a pair of step and reset functions, much like the setup and loop functions used when programming the Arduino, for example. The functions generated for current have the following prototypes.
When fun$step$current is called in a cycle, the value of x will only have been calculated if ck is true. Lustre’s type system and compilation scheme guarantee that the value of x is only ever used when it has been calculated but, unfortunately, the C standard does not allow passing undefined values via function calls. The C99 standard states (§6.3.2.1) if an lvalue does not designate an object when it is evaluated, the behavior is undefined, and, furthermore (§6.5.2.2), that in preparing for the call to a function, the arguments are evaluated, and each parameter is assigned the value of the corresponding argument. It is unlikely that we would have thought of this detail ourselves, but we could not avoid confronting it in our proof because the semantic model for Clight precisely encodes the informal text from the C99 document. Specifically, the eval_exprlist function specifies the successive evaluation of arguments. It relies on another function, sem_cast which rejects casting uninitialized values (Vundef
) to anything other than the void
type.
There are many possible solutions to this problem. We chose to implement a simple heuristic that adds explicit initializations for local variables that may otherwise be passed uninitialized into a function.
Conclusion
The CompCert project showed how to specify and verify a C compiler for embedded applications in an interactive theorem prover. We build on this work to specify and compile features of the block-diagram languages used to program some types of embedded control software. The features treated in our EMSOFT article permit the compilation of concise, natural Lustre programs. For instance, beforehand a programmer would have to write the rising-edge retrigger example in the normalized form shown above, whereas afterward our compiler accepts the source program presented at the start of the article. Perhaps more importantly, the semantic predicates presented in the article and encoded in the Coq proof assistant are now quite close to the formal, but non-“mechanized” definitions used in modern articles. The invariants and proof techniques that we developed permit to formally connect this source-level model to the code generated from it. The proof involves a succession of quite technical invariants, lemmas, and intermediate models. External reviewers, however, need only consider the source semantic model and the other definitions used in the overall correctness theorem—the proof assistant validates the fine-grained reasoning.
Acknowledgment
We would like to thank Baptiste Pauget and Jean-Louis Colaço for their help with Scade Suite, and also the EMSOFT 2021 program committee and its chairs, Linh Thi Xuan Phan and David Broman, for organizing the conference and reviewing our submission.
Authors: Timothy Bourke is a Research Scientist at Inria Paris in the PARKAS team. His research is centered on rigorous approaches to modelling, programming, and verifying embedded control systems and their physical environments. He focuses particularly on applying interactive theorem provers and synchronous languages to practical problems. He teaches at the École normale supérieure and the École polytechnique.
Basile Pesin is a French doctoral student with the Inria PARKAS team. He is interested in synchronous programming, embedded software, language and compiler design, and proof assistants.
Paul Jeanmaire is a doctoral student in the Inria PARKAS team. He is interested in formal verification, proof assistants, and semantic models for embedded programming languages.
Marc Pouzet is a Professor at Ecole normale supérieure and head of the Inria PARKAS team. His research focuses on the design, semantics and implementation of trustworthy languages to implement critical real-time software. His work on Lucid synchrone blends ideas from synchronous and typed functional languages. The Scade 6 synchronous language developed by ANSYS incorporates some of his work on language features, type-based analyses, and compilation techniques.
Disclaimer: Any views or opinions represented in this blog are personal, belong solely to the blog post authors, and do not represent those of ACM SIGBED or its parent organization, ACM.