Time-predictable hardware acceleration: we should not overlook FPGA-based SoC
“Cyber-physical systems (CPS) are facing a revolution […]”: many scientific papers published in recent years open with a sentence like this one to introduce new results of disparate kinds. Well, I do not want to sound rhetorical nor to bore the reader from the very beginning, but I think this post, too, should begin with a sentence of that kind, as what I am going to discuss next pertains to something that, in my humble opinion, can really be called “a revolution”, at least to some extent.
The introduction of modern sensors such as high-resolution cameras and lidars, together with the impressive performance achieved by machine learning algorithms, is pushing completely new requirements onto software frameworks for next-generation CPS and is hence inevitably steering their design and development. These requirements demand the processing of huge amounts of data in real time via complex, massively parallel computations, confronting CPS designers with challenges that can hardly be compared with those faced in past years. Indeed, while it is true that most CPS software has been characterized by a continuous increase in complexity over the last decade, matching these new requirements really means making a “giant step” in complexity. This is especially true when considering their implications in a critical domain such as that of CPS, which, differently from general-purpose computing, demands rigorous software engineering and the use of analytical models to comply with stringent safety certification standards.
The most popular examples of such next-generation CPS are autonomous vehicles, which are characterized by heavy computing workloads generated by perception tasks (such as lane and object detection, object tracking, and point cloud segmentation) that crunch a lot of data in real time and make extensive use of machine learning algorithms.
Chip manufacturers are chasing these trends by evolving embedded computing platforms towards heterogeneous designs that integrate asymmetric multicore processors with hardware accelerators, such as general-purpose graphics processing units (GPGPUs) and field-programmable gate arrays (FPGAs). These platforms are de facto establishing themselves as power-efficient solutions to match the requirements of next-generation CPS, at the cost of both a paradigm shift in the design and development of software and a considerable increase in the effort required to analyze and configure the internal mechanisms of the platforms.
The last revolution faced by embedded computing platforms was the advent of multicore processors, which originated many issues that still today make it difficult to achieve timing predictability in the execution of critical software. Although multicore processors stimulated interesting and challenging research in the field of real-time systems, they also gave rise to a string of software “hacks” conceived just to work around hardware platforms that were mostly not designed for predictable execution. For instance, this has been the case for dealing with the contention experienced by different cores at the micro-architectural level (shared cache levels, memory controllers, etc.), which is notably one of the most important sources of unpredictability in multicores.
I fear that the same can happen with heterogeneous computing platforms! While with multicores, in most cases, the only option to build real-time systems with commercial off-the-shelf platforms was probably to use software workarounds, for hardware accelerators we may have an interesting opportunity that should not be overlooked by researchers.
The case of FPGA-based SoC
The landscape of heterogeneous platforms offers a wide spectrum of hardware accelerators that includes the very popular GPGPUs, digital signal processors (DSPs), manycores, application-specific integrated circuits (ASICs), and FPGAs. The latter are, in my opinion, particularly attractive for critical systems.
Systems-on-chip (SoCs) that integrate both multicore processors and an FPGA fabric are indeed a very promising solution for realizing time-predictable embedded computing systems with hardware acceleration. While FPGAs were traditionally regarded as solutions for prototyping circuits, nowadays they are increasingly used to deploy energy-efficient, yet powerful and flexible, hardware accelerators that exhibit competitive performance with respect to other technologies (e.g., GPGPUs).
Differently from other solutions, FPGA-based accelerators are particularly suitable for timing predictability for at least two main reasons. First, they are very often characterized by a very regular, clock-level timing behavior: they behave similarly to state machines that evolve at each clock beat and hence, when running in isolation, their execution manifests very little timing fluctuation. Second, since the hardware design deployed on the FPGA is under the full control of the designer, FPGA-based accelerators allow for explicit control of the memory traffic they generate. Indeed, it is possible both to deploy memories on the FPGA that work similarly to scratchpads (i.e., memories controlled by the programmer/designer, which are notably far more predictable than caches) and to regulate the memory traffic with custom arbitration policies. For instance, if one wants to reserve a certain memory bandwidth for a hardware accelerator that accesses an off-chip shared memory, it is sufficient to deploy a minimal module on the FPGA that budgets the memory transactions issued by the accelerator over time.
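At a behavioral level, such a transaction budgeter can be sketched as a simple token-bucket regulator. The C model below is purely illustrative (all names and parameters are my own assumptions, not vendor IP): it grants an accelerator at most `budget` memory transactions per replenishment `period`, measured in clock cycles, which is exactly the kind of logic that fits in a minimal FPGA module.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the FPGA arbitration module sketched
 * above: a token-bucket regulator that budgets the memory transactions
 * an accelerator may issue over time. */
typedef struct {
    uint32_t budget;      /* transactions allowed per period */
    uint32_t tokens;      /* transactions left in the current period */
    uint32_t period;      /* period length in clock cycles */
    uint32_t next_refill; /* cycle at which tokens are replenished */
} mem_regulator_t;

void regulator_init(mem_regulator_t *r, uint32_t budget, uint32_t period) {
    r->budget = budget;
    r->tokens = budget;
    r->period = period;
    r->next_refill = period;
}

/* Called once per clock cycle with the accelerator's transaction request;
 * returns true if the transaction is granted in this cycle. */
bool regulator_grant(mem_regulator_t *r, uint32_t cycle, bool request) {
    if (cycle >= r->next_refill) {   /* period boundary: refill tokens */
        r->tokens = r->budget;
        r->next_refill += r->period;
    }
    if (request && r->tokens > 0) {
        r->tokens--;                 /* consume one token */
        return true;
    }
    return false;                    /* stalled: budget exhausted */
}
```

With a budget of 2 transactions every 4 cycles, a continuously requesting accelerator is granted in the first two cycles of each period and stalled in the remaining two, bounding the bandwidth it can steal from other masters.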
Modern FPGAs are also attractive because they offer an interesting and unique feature named dynamic partial reconfiguration (DPR), which allows reprogramming a portion of the FPGA area at run time while the modules deployed on the rest of the area continue to operate. DPR enables area virtualization, making it possible to realize efficient and cost-effective systems that deploy more accelerators than can be statically programmed on the FPGA. Differently from heterogeneous platforms that include GPGPUs, FPGA-based SoCs have also proved to be particularly open, i.e., the vendors provide several details on their internals; they are therefore more suitable for building accurate models to study their timing behavior and for properly configuring the platform to enhance predictability (e.g., see the CLARE project). Finally, it is worth mentioning that FPGAs are already used in safety-critical domains (railway, aerospace, etc.). As such, their vendors already possess the industrial culture needed to comply with stringent certification standards, and some system integrators already have experience with these platforms in safety-critical systems.
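To give a concrete intuition of area virtualization, here is an entirely hypothetical C model (illustrative names, and a deliberately simple round-robin eviction policy) that time-multiplexes more accelerators than the available reconfigurable slots, counting the partial reconfigurations that would be triggered. A real DPR manager would instead load per-slot partial bitstreams through the platform's reconfiguration port.

```c
#define NUM_SLOTS 2
#define NO_ACCEL  (-1)

/* Toy model of FPGA area virtualization through DPR: NUM_SLOTS
 * reconfigurable regions are shared among more accelerators than can
 * be programmed statically. */
typedef struct {
    int  slot[NUM_SLOTS];   /* accelerator currently loaded, or NO_ACCEL */
    int  victim;            /* round-robin eviction pointer */
    long reconfigs;         /* partial reconfigurations issued so far */
} dpr_manager_t;

void dpr_init(dpr_manager_t *m) {
    for (int i = 0; i < NUM_SLOTS; i++) m->slot[i] = NO_ACCEL;
    m->victim = 0;
    m->reconfigs = 0;
}

/* Returns the slot hosting accelerator `id`, triggering (and counting)
 * a partial reconfiguration when the accelerator is not loaded yet. */
int dpr_acquire(dpr_manager_t *m, int id) {
    for (int i = 0; i < NUM_SLOTS; i++)
        if (m->slot[i] == id) return i;      /* hit: no reconfiguration */
    int s = m->victim;                       /* miss: evict round-robin */
    m->victim = (m->victim + 1) % NUM_SLOTS;
    m->slot[s] = id;                         /* models the bitstream load */
    m->reconfigs++;
    return s;
}
```

The interesting design point, from a real-time perspective, is that the reconfiguration count (and hence the reconfiguration delay, which is far from negligible on current devices) depends on the request sequence, so any timing analysis of a DPR-based system must bound it.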
Drawbacks and limitations of FPGA-based SoC
Looking at all the advantages mentioned above, it seems that FPGA-based SoCs are a perfect match for implementing hardware-accelerated critical systems. However, to date, they still receive limited attention compared to other platforms. The reasons for this restrained adoption are manifold and possibly also of a non-technical nature.
My personal take is that, to properly benefit from the power of FPGA-based SoCs, today’s technologies still require considerable expertise in hardware design and, compared with classical software development, involve complex design flows in which some steps must be performed manually (e.g., geometrically placing the accelerators on the FPGA area and connecting bus signals). Although FPGA vendors are pushing to improve the programmability of FPGAs, e.g., by means of high-level synthesis (HLS) and other development tools, some design steps still lack proper automated support and are hence tedious and time-consuming to accomplish. These issues are de facto limiting the exposure of these platforms to software-oriented developers and represent a barrier to their adoption. Overall, new development and design tools are required, and the research communities can play a key role in solving fundamental problems on the way to enhancing the usability of FPGA-based SoCs. The reader may check out the AMPERE project, funded by the European Union’s Horizon 2020 research and innovation programme, which is committed to addressing some of these challenges.
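To give software-oriented readers a flavor of what HLS development looks like, here is a minimal vector-addition kernel written in C with Vitis-HLS-style pragmas. The kernel and its interface choices are illustrative assumptions on my side, not a recommended design: a plain C compiler simply ignores the pragmas, while the HLS tool uses them to pipeline the loop and generate bus interfaces for the arguments.

```c
#include <stdint.h>

#define N 1024

/* Illustrative HLS kernel: element-wise saturating addition of two
 * vectors. The pragmas follow the Vitis HLS style: m_axi maps the
 * arrays to an AXI master interface towards shared memory, and
 * PIPELINE II=1 asks for one loop iteration per clock cycle. */
void vadd(const int32_t a[N], const int32_t b[N], int32_t out[N]) {
#pragma HLS INTERFACE m_axi port=a   bundle=gmem
#pragma HLS INTERFACE m_axi port=b   bundle=gmem
#pragma HLS INTERFACE m_axi port=out bundle=gmem
    for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1
        int64_t s = (int64_t)a[i] + (int64_t)b[i];
        if (s > INT32_MAX) s = INT32_MAX;   /* saturate on overflow */
        if (s < INT32_MIN) s = INT32_MIN;
        out[i] = (int32_t)s;
    }
}
```

Writing the kernel, however, is only the first step of the flow: the synthesized module must then be placed on the FPGA area and wired to the bus, which is precisely where the manual, tedious steps mentioned above come in.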
Author bio: Alessandro Biondi is a tenure-track Assistant Professor at the Real-Time Systems (ReTiS) Laboratory of the Scuola Superiore Sant’Anna. He graduated (cum laude) in Computer Engineering at the University of Pisa, Italy, within the excellence program, and received a Ph.D. in computer engineering at the Scuola Superiore Sant’Anna. In 2016, he was a visiting scholar at the Max Planck Institute for Software Systems (Germany). His research interests include real-time multiprocessor and heterogeneous systems, the design and implementation of real-time operating systems and hypervisors, timing analysis, cyber-physical systems, and synchronization protocols. He was the recipient of six Best Paper Awards, one Outstanding Paper Award, the ACM SIGBED Early Career Award 2019, and the EDAA Dissertation Award 2017.
Disclaimer: Any views or opinions represented in this blog are personal, belong solely to the blog post authors and do not represent those of ACM SIGBED or its parent organization, ACM.