
Christoph Grimm, Klaus Schneider, Carna Zivkovic (Eds.)

22. Workshop „Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen“

April 8–9, 2019, in Kaiserslautern

MBMV 2019



Workshop of the GMM/ITG/GI technical groups 3 and 4, organized by the VDE and the Technische Universität Kaiserslautern

VDE VERLAG GMBH


Bibliographic information of the Deutsche Nationalbibliothek: The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

ISBN 978-3-8007-4945-4 (CD-ROM)
ISBN 978-3-8007-4946-1 (E-Book)

© 2019 VDE VERLAG GMBH · Berlin · Offenbach, Bismarckstraße 33, 10625 Berlin, www.vde-verlag.de

All rights reserved.

The work is protected by copyright. Any use outside the narrow limits of copyright law without the publisher's consent is prohibited and punishable by law. The reproduction of common names, trade names, product descriptions, etc., even without special marking, does not justify the assumption that such names are to be regarded as free in the sense of trademark legislation and may be used by anyone. It cannot be concluded from this publication that the solutions described are free of industrial property rights (e.g., patents, utility models). The publisher accepts no liability for the correctness and usability of the published programs, circuits, and other arrangements or instructions, nor for the correctness of the technical content of the work. The applicable statutory and regulatory provisions as well as the technical rules (e.g., the VDE body of standards) in their currently valid versions must be observed.

Produced in Germany


Preface

We are pleased to present this proceedings volume containing the selected contributions of the 22nd Workshop on Modeling and Verification of Circuits and Systems, MBMV 2019. MBMV is the annual workshop of the GMM/ITG/GI technical groups 3 and 4 and was held on April 8–9, 2019, in Kaiserslautern.

The goal of the workshop is to bring together experts from industry and academia to discuss new trends, results, and current questions in the field of modeling and verification of circuits and systems.

This volume is a selected collection of 8 scientific contributions and 8 further survey talks. Survey talks report on already published results or on work in progress and are published as short summaries in this volume. Scientific contributions present new results and are additionally published as full papers in IEEE Xplore.

The workshop offered an attractive two-day program with two interesting invited talks from industry and research. The editors thank the invited speakers for accepting our invitations and contributing to the success of the workshop. Many thanks also go to all authors for their contributions to the conference program and to this volume.

We also thank all reviewers for the time and effort they spent on providing valuable discussions and reviewing the submitted contributions.

Last but not least, a big thank-you goes to the VDE team for their dedicated support in managing the conference and for publishing the conference contributions through VDE VERLAG and the IEEE proceedings.

Kaiserslautern, March 22, 2019

Christoph Grimm
Klaus Schneider
Carna Zivkovic


Program Committee

Christoph Scholl, Albert-Ludwigs University of Freiburg
Christoph Grimm, TU Kaiserslautern
Christian Haubelt, University of Rostock
Christoph Jäschke, IBM Research
Carna Zivkovic, TU Kaiserslautern
Daniel Große, University of Bremen
Frank Slomka, University of Ulm
Frank Oppenheimer, OFFIS
Jens Brandt, Hochschule Niederrhein
Jens Schönherr, HTW Dresden
Jürgen Ruf, Bosch Sensortec
Jürgen Teich, University of Erlangen-Nuremberg
Klaus Schneider, TU Kaiserslautern
Markus Wedler, Synopsys GmbH
Michael Glass, University of Ulm
Oliver Bringmann, University of Tübingen
Rolf Drechsler, University of Bremen
Robert Wille, JKU Linz
Raik Brinkmann, OneSpin Solutions
Thomas Klotz, Bosch Sensortec GmbH
Thomas Kropf, Robert Bosch and University of Tübingen
Ulrich Heinkel, TU Chemnitz
Wolfgang Kunz, TU Kaiserslautern
Wolfgang Ecker, Infineon Technologies AG
Wolfgang Müller, University of Paderborn

Program Chairs and Organization

Christoph Grimm, TU Kaiserslautern
Klaus Schneider, TU Kaiserslautern
Carna Zivkovic, TU Kaiserslautern

Additional Reviewers

Beichler, Benjamin
Ecker, Wolfgang
Gis, Daniel
Herdt, Vladimir
Kölsch, Johannes
Le, Hoang M.
Ratzke, Axel
Streit, Franz-Josef


Table of Contents

Keynotes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Verification

1 fbPDR: In-depth Combination of Forward and Backward Analysis in Property Directed Reachability (Extended Abstract) . . . 2
Tobias Seufert and Christoph Scholl

2 ACCESS: HW/SW-Co-Equivalence Checking for Firmware Optimization . . . 4
Michael Schwarz, Dominik Stoffel and Wolfgang Kunz

3 Inductive Proof Rules Beyond Safety Properties . . . 8
Martin Köhler and Klaus Schneider

4 Approximation of Neural Networks for Verification . . . 17
Fin Hendrik Bahnsen and Goerschwin Fey

HW/SW Systems

5 Self-Explaining Digital Systems – Some Technical Steps . . . 27
Goerschwin Fey and Rolf Drechsler

6 Measuring NoC Fault Tolerance with Performability . . . 35
Jie Hou and Martin Radetzki

7 Analyse sicherheitskritischer Software für RISC-V-Prozessoren . . . 36
Peer Adelt, Bastian Koppelmann, Wolfgang Müller and Christoph Scheytt

8 Logical Analysis of Distributed Systems: The Importance of Being Constructive . . . 41
Michael Mendler

Design Methods and Optimization

9 Optimization Framework for Hardware Design of Engine Control Units . . . 42
Iryna Kmitina, Nico Bannow, Christoph Grimm, Daniel Zielinski and Carna Zivkovic

10 Logic Optimization of Majority-Inverter Graphs . . . 50
Heinz Riener, Eleonora Testa, Winston Haaswijk, Alan Mishchenko, Luca Amaru, Giovanni De Micheli and Mathias Soeken

11 How to Keep 4-Eyes Principle in a Design and Property Generation Flow . . . 54
Keerthikumara Devarajegowda, Wolfgang Ecker and Wolfgang Kunz


12 Automated Sensor Firmware Development – Generation, Optimization, and Analysis . . . 60
Jens Rudolf, Manuel Strobel, Joscha-Joel Benz, Christian Haubelt, Martin Radetzki and Oliver Bringmann

System Design

13 SEMAS – System Engineering Methodology for Automated Systems – The World Described in Layers . . . 72
Markus Hedderich, Markus Heimberger and Axel Klekamp

14 Ein Ansatz für die agile, verteilte Entwicklung Cyber-Physischer „Systems of Systems“ (Work in Progress) . . . 81
Christoph Grimm, Frank Wawrzik and Carna Zivkovic

15 Processor Hardware Security Vulnerabilities and their Detection by Unique Program Execution Checking . . . 85
Mohammad Rahmani Fadiheh, Dominik Stoffel, Clark Barrett, Subhasish Mitra and Wolfgang Kunz

16 Model-Based Configuration of a Coarse-Grained Reconfigurable Architecture . . . 87
Jens Froemmer, Nico Bannow, Axel Aue, Christoph Grimm and Klaus Schneider


Keynote 1: Certified IC Integrity: Thriving Towards Correct, Safe, Secure, and Trusted Circuits

Tobias Welp, OneSpin Solutions

Nowadays, integrated circuits are routinely deployed within safety- and security-critical systems. By definition, malfunction of these circuits can have grave consequences and as such must be avoided. However, the integrity of integrated circuits is threatened in all phases of their life cycles: while the design phase is, e.g., particularly susceptible to design bugs, the implementation and manufacture phases are prime targets for trojan insertion. After the deployment of the system, cosmic radiation may cause faults through soft errors.

Academia and industry have developed an ecosystem of technologies to mitigate the risks associated with these threats. While formal verification has been adopted in industry to complement simulation-based approaches for detecting design bugs, the risk of random errors is alleviated through software and hardware safety mechanisms, and the efficacy of these mechanisms can be formally verified. Security concerns can be addressed with trojan detection tools and rigorous equivalence checking from design to manufacture. Parallel to the technology development, standards are being devised that provide recommended practices and guide vendors in their quest for reliable integrated systems.

Keynote 2: GdR SoC² and LIP6

Daniela Genius, Campus Pierre et Marie Curie, Sorbonne Université – LIP6

This keynote gives a (rather partial) look into the research landscape in France. The members of a GdR (Groupe de Recherche) are academic researchers working on a common topic; in the case of the GdR SoC², these are systems-on-chip, embedded systems, and the Internet of Things. SoC² emerged from the GdR SoC-SiP (System on Chip – System in Package). We show how the GdR SoC² is structured, how its annual cycle is organized (working-group meetings, annual symposium, barcamp), and how the GdR cooperates with the other research institutions in France. The LIP6 (Laboratoire d'Informatique de Paris-VI) is located at Sorbonne Université in the center of Paris and is a research institution operated jointly with the CNRS (Centre National de la Recherche Scientifique). It has more than 200 permanent staff and about as many temporary staff (doctoral students and postdocs), organized in 4 axes with a total of 22 research teams, making it the largest computer science institute in France. The research at LIP6 ranges from algorithmics through artificial intelligence and computer networks to computer hardware. LIP6 has numerous industrial and international partners. In this keynote, we attempt to describe both institutions comprehensively and to share some personal impressions from more than 15 years of work in this context.


fbPDR: In-depth Combination of Forward and Backward Analysis in Property Directed Reachability (Extended Abstract)

Tobias Seufert and Christoph Scholl
University of Freiburg, Freiburg, Germany, {seufert,scholl}@informatik.uni-freiburg.de

Abstract

We describe a thoroughly interwoven forward and backward version of PDR/IC3 called fbPDR. Motivated by the complementary strengths of PDR and Reverse PDR, fbPDR enables beneficial collaboration between the two and lifts the combination to a new level. We lay the theoretical foundations for sharing information between PDR and Reverse PDR and demonstrate the effectiveness of our approach on benchmarks from the Hardware Model Checking Competition.

1 Introduction

Nowadays, PDR (or IC3) [1] [2] is considered one of the most powerful methods in hardware verification. Unlike other methods, it does not unroll a transition relation; instead, it incrementally strengthens a proof until a safe invariant or a counterexample is found. PDR in its usual definition has a 'fixed direction': it considers overapproximations of the state sets reachable from the initial states in k or fewer steps. This work is motivated by observations already made in [3] showing that a combination of (forward) PDR and (backward) Reverse PDR is worthwhile. Reverse PDR computes overapproximations of the state sets which can reach the unsafe states in k or fewer steps. In [4] we examine Reverse PDR thoroughly and enable communication between PDR and Reverse PDR via proof obligations. In our most recent work, we lift the combination of PDR and Reverse PDR to the next level [5]. Our algorithm truly intertwines PDR and Reverse PDR reasoning by strengthening one trace using blocked cubes learnt from the other. We show that both communication via proof obligations and strengthening one trace with information from the other can indeed be used successfully in combination. Here we describe our forward and backward version of PDR/IC3 called fbPDR. Both PDR and Reverse PDR profit from all the information gathered by their counterpart, and we observe a significant speedup both in finding counterexamples and in finding safe inductive invariants.

2 Communication via Proof Obligations

One kind of information exchange between the two directions of PDR takes place on the basis of proof obligations. The sets of proof obligations of the original forward PDR represent underapproximations of the set of states from which the unsafe states (unsafe) can be reached. The reason is that a proof obligation (s, k, d) in time frame k is a state s for which it has been shown that there is a path of length d from s to unsafe. Therefore, PDR is obliged to prove that s is unreachable within ≤ k steps from the initial states (init). Hence, in Reverse PDR the set of unsafe states can be extended by the proof obligations from forward PDR as a "target enlargement". Extending unsafe in Reverse PDR has two effects:

1.) During Reverse PDR, the intersection of a cube s that can be reached from init (i.e., of a proof obligation in Reverse PDR) with the extended unsafe may now be non-empty, because of a non-empty intersection of s with a proof obligation from forward PDR. Due to this non-empty intersection, a counterexample, i.e., a trace from init to unsafe, has been found, where the first part of the trace (reaching s from init) has been constructed by Reverse PDR and the second part (reaching unsafe from s) has been constructed by forward PDR (see the sketch below).

2.) Sometimes in (Reverse) PDR the generalization of blocked cubes (learnt clauses) s into s₁ for unreachable proof obligations (technically, this corresponds to unsatisfiable SAT solver calls that allow for literal dropping, see e.g. [2]) may be unnecessarily large, such that it prevents early convergence of the procedure. In the combined algorithm, unnecessarily large generalizations are restricted by the stronger requirement that a generalized cube s₂ must not intersect the extended unsafe states, not only unsafe as in the original Reverse PDR. If a larger s₁ that contains states from which unsafe can be reached (as has been proved by forward PDR) were removed by learning the clause ¬s₁, this clause could not be part of any safe inductive invariant (since it excludes states which reach unsafe eventually).

Of course, a dual argumentation is possible for transferring information from Reverse PDR to PDR: sets of proof obligations of Reverse PDR represent underapproximations of the set of states which can be reached from init. So in the original PDR, the set of initial states can be extended by these proof obligations.
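To make the exchange concrete, here is a minimal, illustrative Python sketch (not the authors' implementation) of the intersection test between a Reverse-PDR proof obligation and the forward proof obligations used as target enlargement; the variable names and toy obligations are hypothetical.

```python
def cubes_intersect(c1, c2):
    """Two cubes (partial assignments var -> bool) share a common state
    iff they assign no shared variable opposite values."""
    return all(c2[v] == b for v, b in c1.items() if v in c2)

# forward PDR obligations: (cube, frame k, distance d to unsafe)
forward_obs = [({"x0": True, "x1": False}, 2, 3)]
# Reverse PDR obligations: (cube, frame k', distance d' from init)
reverse_obs = [({"x1": False, "x2": True}, 1, 2)]

for fc, _, d in forward_obs:
    for rc, _, d2 in reverse_obs:
        if cubes_intersect(fc, rc):
            # init --(d2 steps)--> s --(d steps)--> unsafe
            print(f"counterexample of length {d2 + d} found")
```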

3 Learning New Lemmas from Reverse PDR

Basically, PDR gathers information in terms of proof obligations and learnt clauses (lemmas). In the previous section, we presented the capabilities of sharing the proof obligation part of this information. Now we want to focus on learning new clauses (lemmas) for PDR from lemmas in Reverse PDR. Note that in the following our analysis considers only the transfer of information from Reverse PDR to PDR; however, all procedures also apply the other way around, considering the characteristics of PDR and Reverse PDR.

Reverse PDR maintains a trace RR_0, RR_1, ..., RR_N of clause sets. RR_n represents an overapproximation of the states which are able to reach unsafe within 0 ≤ j ≤ n steps. PDR maintains a trace R_0, R_1, ..., R_N of clause sets, where R_n represents an overapproximation of the states reachable from init within 0 ≤ j ≤ n steps. Syntactically, all clause sets are supersets of the ones with higher indices, i.e., R_i ⊇ R_{i+1} and RR_i ⊇ RR_{i+1}. In contrast, if we consider R_i (RR_i) semantically as the state sets represented by the corresponding clause sets, we have R_i ⊆ R_{i+1} and RR_i ⊆ RR_{i+1}. For a single Reverse PDR clause (lemma) c ∈ RR_{N−i}, i.e., a blocked cube s with c = ¬s, it is not clear how to make use of this information in PDR, where we work with underapproximations of the states reaching unsafe (i.e., proof obligations) and overapproximations of the states reachable from init (i.e., lemmas). In contrast, by looking at the sets RR_{N−i} as a whole, we can extract useful information:

Theorem 1. Given a Reverse PDR trace of length N and a PDR trace of length N′, let s be an arbitrary cube with s ⊆ RR_{N−(i+1)} for some 0 ≤ i ≤ N−2. If we strengthen the PDR trace by blocking s in all frames 1 ≤ k ≤ min(i, N′), i.e., by setting R_k := R_k ∧ ¬s, then in the resulting PDR trace the state sets R_k still overapproximate the sets of states reachable from init in ≤ k steps. Moreover, the syntactical inclusion of clause sets R_{j+1} ⊆ R_j for 1 ≤ j ≤ N′−1 and the semantical inclusion R_j ⊆ R_{j+1} for 0 ≤ j ≤ N′−1 are preserved by the strengthening.

A comprehensive proof can be found in [5]. The intuition behind this is that RR_{N−(i+1)} (after discharging all proof obligations for the trace RR_0, ..., RR_{N−1}) excludes an overapproximation of the states which are reachable from init within ≤ i steps. Thus RR_{N−(i+1)} is an underapproximation of all states which are not reachable from init within ≤ i steps. Excluding (blocking) such states from R_k with 1 ≤ k ≤ min(i, N′) in forward PDR maintains the property that R_k overapproximates the states reachable from init in ≤ k steps.

To strengthen a PDR trace according to Thm. 1, we have to extract subcubes from the RR_{N−(i+1)} provided by Reverse PDR. RR_{N−(i+1)} is given as a CNF; thus extracting all subcubes amounts to a CNF-to-DNF conversion, and extracting a restricted number of good, i.e., short, subcubes means computing only a part of the corresponding DNF. The naive way of CNF-to-DNF conversion using the law of distributivity can lead to exponential growth. Another possibility is to negate the CNF, use the Plaisted-Greenbaum transformation [6] for translating the DNF into CNF, and negate the result again. However, this method may have disadvantages as well: the number of computed cubes is linear in the size of the CNF, but we may be interested in even more condensed information to be transferred. Moreover, we have to introduce new auxiliary variables which act as additional state-space variables. Thus, we use a SAT-based method to pick a small number of preferably short and informative subcubes [5] (a brute-force illustration follows).
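As an illustration of what subcube extraction computes, the following brute-force Python sketch finds one short subcube (a short implicant) of a small CNF by literal dropping; the actual fbPDR implementation uses a SAT-based method [5] instead of this exponential enumeration, and the example CNF is made up.

```python
from itertools import product

def satisfies(assign, cnf):          # assign: dict var -> bool
    return all(any(assign[abs(l)] == (l > 0) for l in cl) for cl in cnf)

def cube_implies(cube, cnf, nvars):  # every extension of cube satisfies cnf?
    free = [v for v in range(1, nvars + 1) if v not in cube]
    for bits in product([False, True], repeat=len(free)):
        a = dict(cube, **dict(zip(free, bits)))
        if not satisfies(a, cnf):
            return False
    return True

def short_subcube(cnf, nvars):
    for bits in product([False, True], repeat=nvars):      # find a model
        a = dict(zip(range(1, nvars + 1), bits))
        if satisfies(a, cnf):
            cube = dict(a)
            for v in list(cube):                           # literal dropping
                trial = {u: b for u, b in cube.items() if u != v}
                if cube_implies(trial, cnf, nvars):
                    cube = trial
            return cube
    return None

# (x1 or x2) and (not x1 or x2): x2 = True alone already implies the CNF
print(short_subcube([[1, 2], [-1, 2]], 2))   # -> {2: True}
```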

4 fbPDR

Our implementation fbPDR¹ runs PDR and Reverse PDR in alternation (30 s slices) and implements both communication methods presented in Sects. 2 and 3. We also implemented the feature that PDR and Reverse PDR traces with lengths N and N′ can always be extended to length max(N, N′), and all proof obligations which represent counterexamples of length ≤ max(N, N′) can immediately be discharged. Experiments on the benchmark sets of HWMCC'15 and '17 show promising results comparing the latest version of fbPDR to the version only communicating via proof obligations [4] as well as to the default configurations of ABC's PDR² and ic3ref³ (see Fig. 1).

[Figure 1: Comparison of implementations — cactus plot of runtime (time in seconds) over the number of solved problem instances for: both collaboration strategies (fbPDR), Comb. PDR [DATE 2018], ic3ref, and ABC PDR.]

References

[1] A. R. Bradley, "SAT-based model checking without unrolling," in VMCAI, 2011, pp. 70–87.

[2] N. Eén, A. Mishchenko, and R. K. Brayton, "Efficient implementation of property directed reachability," in FMCAD, 2011, pp. 125–134.

[3] T. Seufert and C. Scholl, "Sequential verification using Reverse PDR," in MBMV, 2017, pp. 79–89.

[4] ——, "Combining PDR and Reverse PDR for hardware model checking," in DATE, 2018, pp. 49–54.

[5] ——, "fbPDR: In-depth combination of forward and backward analysis in property directed reachability," in DATE, 2019 (to appear).

[6] D. A. Plaisted and S. Greenbaum, "A structure-preserving clause form translation," Journal of Symbolic Computation, vol. 2, pp. 293–304, 1986.

¹ We provide result tables and binaries under https://www.dropbox.com/s/ckbzq6kd10aebod/fbPDR.zip?dl=0.

² https://bitbucket.org/alanmi/abc, downloaded on 9/10/2017
³ https://github.com/IC3ref, downloaded in Sep. 2016


ACCESS: HW/SW-Co-Equivalence Checking for Firmware Optimization
Michael Schwarz, Dominik Stoffel, and Wolfgang Kunz
TU Kaiserslautern, Germany, <surname>@eit.uni-kl.de

Kurzfassung

Tight resource constraints require adapting embedded systems to their respective application. Optimizations of the firmware or of the HW/SW interface are often necessary to meet these constraints. However, they frequently entail changes in the communication between the firmware and the hardware peripherals, which can introduce errors and compromise the functional correctness of the overall system. This work presents a formal method for HW/SW co-equivalence checking to verify the correct input/output behavior of peripherals under modified firmware.

Abstract

Customizing embedded computing platforms to specific application domains often necessitates optimizing the firmware and/or the HW/SW interface under tight resource constraints. Such optimizations frequently alter the communication between the firmware and the peripheral devices, possibly compromising the functional correctness of the input/output behavior of the embedded system. This paper proposes a formal HW/SW co-equivalence checking technique for verifying correct I/O behavior of peripherals under a modified firmware.

1 Introduction

The internet of things (IoT) opens new opportunities but increases the already existing pressure of tight resource constraints on embedded computing platforms. Many visions of the IoT domain demand extremely low design and production costs (< 1 $/device) and involve an IoT network consisting of rather small devices, each with a well-defined set of specific functionalities.

A common strategy to reduce costs is to use available intellectual property (IP), which requires no or only minimal further hardware (HW) design effort, and to implement the desired device functionality in software (SW) by reusing existing SW and generic libraries. This minimizes design effort and costs, yet often fails to meet important non-functional design targets such as power consumption and execution time due to the inefficient use of the hardware platform by the generic software.

A large portion of an embedded system's chip area is consumed by memory, even more so when relying on software to implement a functionality which otherwise would be implemented by dedicated hardware. One way to reduce the required instruction memory lies in optimizing the HW/SW interfaces for the peripherals of the system. These interfaces are usually implemented using memory-mapped I/O in a number of API functions for accessing each of the system's peripherals. We can increase code reuse by unifying access sequences to multiple peripheral devices, leading to fewer API functions. This, however, may change the I/O behavior of the software w.r.t. a particular peripheral device, for example, because the API function now uses a different number of load/store accesses or has changed their order.

Compared with arithmetic instructions, load/store operations are more expensive in terms of the energy budget, as their execution always involves some action by secondary HW (bus systems, memory, etc.) and might make energy-saving schemes, e.g., using sleep modes, less effective. Therefore, customizing a computing platform usually entails a number of firmware optimizations concerning the access and communication structures between HW and SW. These interactions are often complex, which makes optimizing the HW/SW interface prone to errors. In general, a compiler cannot perform these optimizations because it lacks detailed knowledge about the system's HW. Instead, the firmware developer or HW designer makes such adjustments manually when customizing the platform. Changes to the HW/SW interface, as considered in this paper, in fact constitute an optimization step that takes place after the regular optimizations performed by the compiler. The class of optimizations considered here typically leads to local modifications of the I/O behavior of the program but does not change its global control flow. Yet, the considered optimizations pose significant challenges to designers and SW developers. They may easily compromise the functional correctness of the I/O behavior of the embedded system, e.g., by triggering unanticipated side effects in the HW. As a response, the driver or the HW device may move into an unexpected or undefined state, which in turn can lead to a failure of the whole system. Hence, even though the modifications are locally contained, their verification remains a challenging prospect.

Any notion of equivalence that only considers the SW cannot be appropriate to solve this verification problem. Since we compare programs with different I/O behavior, equivalence can only be stated by considering their effect on the HW. In the general case, this would require proving HW/SW co-equivalence modulo latency over the entirety of the system for as many clock cycles as are needed to cover the overall run time of the software. Obviously, this is an infeasibly complex task for realistic applications.

This paper proposes a new formal approach to this problem, which can be feasible in many practical settings. The approach exploits the locality of the induced changes, drastically reducing the complexity of the proof task, by performing several small cycle-accurate proofs for the peripheral. In each local partition, the global context of the complete program is taken into account in a time-abstract fashion. By over-approximating the initial state space of the peripheral for each partition to be considered, we can conduct a series of cycle-accurate local proofs, each spanning small time intervals, whereas the equivalence of the entire system follows as a compositional result from the equivalence of all segments examined in this way. The authors are not aware of any other verification method that is able to address the stated problem in a similar fashion.

2 Preliminaries

Our approach considers low-level driver SW interacting with HW. We chose the program netlist (PN) [1] as the computational SW model. It represents all execution paths in a HW-dependent way and can easily be integrated with the actual HW. This allows precise examination of the combined behavior of the HW/SW system. The PN is based on a representation of the SW as an execution graph (EXG), which is obtained by unrolling the machine program using a SAT solver to compute branch conditions and loop bounds. The EXG is a directed acyclic graph whose nodes represent specific instructions in the machine program. A program netlist is a direct translation of the EXG into a combinational circuit in VHDL or Verilog. The PN model generated for a given SW contains the following information relevant to our approach (a data-structure sketch follows the list):

• all possible execution paths,
• all possible input/output access sequences to peripheral hardware (HW) components and to shared memory,
• the address spaces reached by every instruction,
• all possible effects of the program on the program-visible HW registers.
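A possible (purely hypothetical) data-structure view of an EXG, only to fix intuition; the actual program netlist tooling of [1] works on machine code and circuits, and all names and addresses below are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ExgNode:
    pc: int                           # address of the machine instruction
    kind: str                         # e.g. "alu", "load", "store", "branch"
    access_addr: Optional[int] = None # device register for load/store nodes
    succs: List["ExgNode"] = field(default_factory=list)

# a tiny straight-line EXG: alu -> store(CTRL) -> load(STATUS)
n2 = ExgNode(pc=0x108, kind="load", access_addr=0x40000004)
n1 = ExgNode(pc=0x104, kind="store", access_addr=0x40000000, succs=[n2])
n0 = ExgNode(pc=0x100, kind="alu", succs=[n1])
```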

3 Method

The goal of the proposed approach is to formally verify that two software program variants trigger the same behavior in a set of peripheral devices, so that the I/O behavior of the combined HW/SW system is identical for both. In the analysis, the peripheral devices are assumed to be functionally independent of each other, i.e., there exist no hardware side effects of one device on another. This assumption is usually fulfilled in practice.

3.1 HW/SW Interaction

A peripheral device communicates with the software over its register interface. In addition, it has inputs and outputs to the environment for providing its service. The notion of sequential hardware equivalence compares the inputs and outputs of two digital circuits clock cycle by clock cycle. In most practical cases, this notion is too strict for our problem. The overall I/O behavior of the HW/SW system may vary regarding timing at the clock-cycle scale. If the inputs and outputs of the peripheral device under the control of two versions of a software are functionally equivalent but differ only in timing, the common notion of HW/SW co-equivalence modulo latency can be applied. This notion of equivalence leads to prohibitively complex proof problems for the types of systems considered here when attempting the verification over the full runtime of the software. In many practical scenarios, however, the proof problem can be decomposed into manageable subproblems. As already elaborated in the introduction, the optimization of the firmware usually changes the program behavior only locally. This is particularly true for the optimizations that target program memory footprint by increasing code reuse in the HW/SW interfaces of IoT platforms.

In our approach, we consider each peripheral device individually. We partition both the original software and the revised software into segments. The segmentation is chosen such that corresponding software segments induce the same behavior in the peripheral device, i.e., corresponding segments are expected to be functionally equivalent w.r.t. this device. Such a segmentation is possible if the optimizations are local and can be identified and contained in a software segment. For example, a sequence of device configuration steps may have been replaced by a different sequence after introducing a new API function, however, without changing the effect on the device HW. Such a sequence would be placed in one segment. The I/O timing within each such segment is modeled with clock-cycle accuracy, while the time intervals between the segments are modeled abstractly. Hence, our notion of equivalence is indeed HW/SW co-equivalence modulo latency, however, per SW segment.

For the formal equivalence checks we use program netlists as described in Sec. 2 to model the interaction between the software and the device registers. A program netlist represents all execution paths between an entry and an exit point in the software. It models the software behavior at the ISA level, including load/store accesses to the peripheral device registers. By combining a program netlist with an RTL model of the peripheral device, we can reason about the behavior of the hardware under the control of the firmware with clock-cycle accuracy. We consider execution paths of the software implicitly, using the execution graph (EXG) of Sec. 2. For the following definition, keep in mind that an EXG is a directed acyclic graph (DAG).

Definition 1 (Linear EXG Segment) Consider an execution graph (EXG) G(V, E). A linear segment of the EXG is a subset W ⊆ V of EXG nodes, W = {v_0, v_1, ..., v_n}, lying on a path from the start node v_0 to the end node v_n, i.e., (v_i, v_{i+1}) ∈ E for 0 ≤ i < n.

If the segment contains a branch node, then only one of the alternative immediate successor nodes can be included in the same segment.

Definition 2 (EXG Cover) A set C = {W_1, W_2, ..., W_m} of linear segments of an execution graph G(V, E) such that W_1 ∪ W_2 ∪ ... ∪ W_m = V is called a cover of G.

For equivalence checking, we model both the original firmware and the revised firmware by the execution graphs G_1(V_1, E_1) and G_2(V_2, E_2), respectively. Each execution graph is fully decomposed into the same number m of linear segments such that we obtain a cover C_1 for G_1 and a cover C_2 for G_2. The decomposition is chosen such that there is a one-to-one mapping between the elements of C_1 and C_2.

Definition 3 (Segment Mapping) Given two EXG covers C_1 and C_2 with |C_1| = |C_2| = m, a bijection M : C_1 → C_2 is called a segment mapping.

For every linear segment W_{1,i} of the original EXG there is a corresponding linear segment W_{2,i} of the revised EXG (a toy encoding of Defs. 1–3 is sketched below).
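Defs. 1–3 can be paraphrased in a few lines of Python; this is an independent toy encoding (node names, edge sets, and covers are made up for illustration), not part of the ACCESS tool.

```python
def is_linear_segment(seg, edges):
    """A linear segment: consecutive nodes are connected by EXG edges."""
    return all((seg[i], seg[i + 1]) in edges for i in range(len(seg) - 1))

def segment_mapping(cover1, cover2, edges1, edges2):
    """A positional one-to-one mapping M between two equally sized covers."""
    assert len(cover1) == len(cover2)                 # bijection M: C1 -> C2
    assert all(is_linear_segment(s, edges1) for s in cover1)
    assert all(is_linear_segment(s, edges2) for s in cover2)
    return list(zip(cover1, cover2))

edges = {("a", "b"), ("b", "c")}
print(segment_mapping([["a", "b"], ["c"]], [["a", "b"], ["c"]], edges, edges))
```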


The segments and a mapping M are defined such that corresponding segments of the original and the revised software are expected to induce the same behavior of the peripheral device. If a segment of the original software contains load and store instructions accessing the register interface of the peripheral, then the load and store instructions of the corresponding segment in the revised software are expected to generate the same behavior in the peripheral. This is checked with clock-cycle accuracy with the model described shortly.

Definition 4 (Access Sequence) An access sequence of a linear segment W of an EXG is a sequence of accesses (z_1, z_2, ...) to the peripheral device registers, corresponding to load/store instructions in W. Each access z_i has a type ("read" or "write") and an address representing a device register.

Accesses generated by load/store instructions in the software produce read/write transactions at the peripheral device registers. The actual clock cycles at which these transactions occur depend on various parameters of the computing platform.

Definition 5 (Access Pattern) An access pattern is a sequence of sets of logic values produced at the peripheral device register interface as a result of an access sequence.

An access pattern is the RTL view of an access sequence of Def. 4, which is the programmer's view of the communication with the device. An access pattern is a clock-cycle-accurate "waveform" of read and write transactions at the register interface. The individual RTL transactions may vary in timing and may be separated by "idle" periods, i.e., clock cycles of inactivity at the register interface. A single access sequence therefore corresponds to a multitude of possible access patterns (sketched below). The segmentation of the software allows decoupling individual activities of the peripheral device temporally from each other, reducing the overall complexity of the verification task. Segment mapping according to Def. 3 is mainly a manual task, and is usually straightforward, particularly when the firmware has been optimized by manual steps as well.
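The relation between Defs. 4 and 5 can be made concrete with a small Python sketch (hypothetical encoding, not the paper's formal model): an access sequence is timing-abstract, and enumerating bounded idle insertions yields the corresponding access patterns.

```python
from itertools import product

def access_patterns(seq, max_idle):
    """All cycle-accurate schedules of `seq`: each access may be preceded
    by up to `max_idle` idle cycles (None = idle cycle)."""
    for gaps in product(range(max_idle + 1), repeat=len(seq)):
        pattern = []
        for idle, acc in zip(gaps, seq):
            pattern += [None] * idle + [acc]
        yield pattern

seq = [("write", 0x00), ("read", 0x04)]        # an access sequence (Def. 4)
print(list(access_patterns(seq, max_idle=1)))  # its 4 access patterns (Def. 5)
```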

3.2 Computational Model

Once we have partitioned the software into segments, we can build a computational model and formulate an equivalence checking problem for this model. Fig. 1 shows the overall model, called ACCESS-global. The execution graphs for both the golden and the revised firmware are translated into program netlists (PN1 and PN2). The boxes in Fig. 1 represent instruction cells, i.e., combinational subcircuits describing the modification of the program state under a specific instruction. The boxes with solid outlines represent load/store instructions communicating with the peripheral under consideration. The boxes with dashed outlines represent other kinds of instruction cells. The trapezoid shapes represent "merge cells" [1], auxiliary constructs where computations of different branches are merged. The behavior of the peripheral is represented by a combinational circuit module called ACCESS-local. It is, essentially, an unrolling of the peripheral device logic into a number of time frames (similar to bounded model checking (BMC)) such that different access patterns as in Def. 5 can be applied and evaluated.

[Figure 1: ACCESS-global: PN miter — the two program netlists PN1 and PN2 with instruction cells for the accesses a, b, ..., g, f; each pair of corresponding segments feeds one ACCESS-local equivalence check.]

ACCESS-local contains two instances of the peripheral (as in a classical equivalence checking miter) whose outputs are compared. One instance of the peripheral is controlled by device accesses from PN1, the other by device accesses from PN2. Both program netlists are connected to the same primary inputs, i.e., they model the original and the revised firmware under the same initial memory content and under the same inputs received from other sources (OS, I/O). (This is not shown in Fig. 1.)

An ACCESS-local module is created for every pair of code segments from the segment mapping M (Def. 3). For each ACCESS-local module, an equivalence check is carried out comparing the two instances of the peripheral under the influence of the original and the revised firmware, respectively. For the example in Fig. 1, the computational model is constructed for three equivalence checks, each comparing a pair of software segments with regard to the peripheral behavior. The first one checks whether the access sequence (a, b) in PN1 generates the same behavior in the peripheral as the access sequence (b, a) in PN2. The second check evaluates whether the sequence (c, d, e) in PN1 produces the same effects in the peripheral as the sequence (e, g) in PN2. The last check is trivial: it compares the instruction sequence (f) in PN1 with (f) in PN2. Even though the accesses (f) themselves may be identical, the values written depend on the computations in PN1 and PN2 and may be different. Any difference can be detected because the computational model for an ACCESS-local check includes the full program netlists, not just the compared segments, even though only the accesses from a segment are connected to an ACCESS-local module.

[Figure 2: ACCESS-local: device miter — two unrolled instances of the peripheral HW, driven by the access sequences from PN-1 and PN-2 and followed by grace-period time frames; the hardware is modelled in Verilog/VHDL, the access timing is modelled by properties, and a SAT check compares the ending states for equality.]

Fig. 2 illustrates the internals of an ACCESS-local module. Two instances of the peripheral are unrolled for a finite number of time frames. The unrolling begins at an arbitrary state (cf. Sec. 3.3) which is, however, the same for both instances. Each instance is controlled by an access sequence: one from PN1, the other one from PN2, as illustrated in the figure. The number of time frames is large enough to accommodate the longest of the two access sequences, plus additional clock cycles of delay between the individual transactions, plus an additional user-defined "grace period", discussed below. The ACCESS-local construct models all access patterns (Def. 5) derived from the two access sequences (Def. 4) and all possible inter-access delays up to a user-specified maximum number of time frames. The outputs of the peripheral device as well as the ending state reached at the end of the time-frame expansion are compared for equality.

What does this equivalence check verify? It checks that the behavior of the peripheral device is the same for both the segment of the original and the segment of the revised software, modulo timing variations. The proof is conservative because it considers the device behavior from an arbitrary starting state, i.e., the proof includes the state which the device is in when the respective software segment begins execution. If all ACCESS-local equivalence checks (one for each pair of mapped software segments) are successful, then the peripheral device behavior is the same for the golden and the revised firmware as a whole.

This follows by induction. Step: ACCESS-local proves the following: if a peripheral device starts from the same state at the beginning of the revised software segment as it does at the beginning of the original software segment, then, after either segment, it arrives in the same ending state. Base: the peripheral device is in the same state (the reset state) at the beginning of the first segment of either software.

Grace period. The ACCESS-local approach verifies whether two different access sequences can drive a peripheral device into the same ending state, modulo differences in timing. The computational model must accommodate a number of clock cycles of hardware action in the device after the last access in a sequence has occurred, before the ending state has been reached. We call this the "grace period" of the equivalence check. It leads to additional time frames in the unrolling, as illustrated in Fig. 2 (a brute-force simulation sketch of this check follows).
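The following self-contained Python sketch mimics an ACCESS-local check by explicit simulation of a toy peripheral; the real method unrolls the RTL model and proves this symbolically with SAT, so the brute-force loops, the `step` function, and the `max_idle`/`grace` bounds here are only illustrative.

```python
from itertools import product

def patterns(seq, max_idle):
    for gaps in product(range(max_idle + 1), repeat=len(seq)):
        p = []
        for idle, acc in zip(gaps, seq):
            p += [None] * idle + [acc]
        yield p

def run(state, pattern, step, grace):
    for acc in pattern + [None] * grace:   # trailing Nones = grace period
        state = step(state, acc)
    return state

def access_local_check(seq1, seq2, start_states, step, max_idle, grace):
    for s0 in start_states:                # arbitrary common starting state
        for p1 in patterns(seq1, max_idle):
            for p2 in patterns(seq2, max_idle):
                if run(s0, p1, step, grace) != run(s0, p2, step, grace):
                    return ("not equivalent", s0, p1, p2)
    return ("equivalent modulo timing",)

# toy device: state records which registers were ever written; write order
# is irrelevant for it, so the swapped store sequences are equivalent
step = lambda st, acc: st | {acc[1]} if acc and acc[0] == "write" else st
print(access_local_check([("write", 0), ("write", 4)],
                         [("write", 4), ("write", 0)],
                         [frozenset()], step, max_idle=1, grace=2))
```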

3.3 ACCESS Device Check

If an ACCESS-local check is successful, the compared software segments are guaranteed to be equivalent w.r.t. the device behavior. If a firmware optimization is not correct, then the ACCESS-local check is guaranteed to fail for the corresponding segments. However, the opposite is not true, i.e., if an ACCESS-local check fails, this does not necessarily mean that the compared SW segments are not equivalent. The reason for this lies in the division of the overall verification problem into manageable subproblems. For the inductive argument of Sec. 3.2 to be applicable, the ACCESS-local construct must be conservative by analyzing the behavior of the peripheral from an arbitrary starting state. This may lead to spurious counterexamples if the starting state of the counterexample is not reachable in the combined HW/SW system.

For example, consider a communication peripheral that needs to be configured and then can be used to transmit data. Normal usage performs these steps separately, and it may, in fact, be a requirement that no configuration happens when the device is busy. Let us formulate this requirement as a constraint in the form of the implication busy ⟹ ¬config. Assume now that we are running an ACCESS-local equivalence check to compare two SW segments that configure the peripheral. Since the ACCESS-local computational model begins at an arbitrary state, it may return a counterexample in which the constraint is violated, i.e., in which the device begins in a busy state and receives a config command. This counterexample is, most likely, spurious because the device cannot be in a busy state when the SW is still in the device configuration phase. In view of the whole HW/SW system, the starting state of the counterexample is unreachable.

When an ACCESS-local check fails, the verification engineer needs to manually analyze the counterexample. If the counterexample is spurious because a constraint was violated, the property instance needs to be strengthened by adding a constraint (in fact, an invariant of the HW/SW system). In our example, the implication busy ⟹ ¬config would be added (see the sketch below).

Usually, a peripheral device imposes several constraints like this on the software using it. Such constraints are a consequence of the device's inner architecture, and, often enough, not all of these constraints are documented for the software programmer. Discovering undocumented constraints through debugging ACCESS-local checks may be tedious because the counterexample can also result from plain firmware inequivalence or from an actual bug in the device hardware.

We can use the basic idea of the ACCESS-local approach to specifically search for undocumented device preconditions as well as for bugs. We construct an ACCESS-local model as in Fig. 2 but do not connect it to any program netlists. Instead, we control both instances of the device hardware with an access sequence that is arbitrary but the same for both instances. For each instance, the ACCESS-local construct models all access patterns (of a given fixed length), i.e., all valid waveforms in all timing variations that represent the abstract access sequence. The problem formulation searches for a pair of access patterns that induce a different ending state in the device. If a solution exists, then we have found an access sequence for which the device behavior depends on the actual access timing. Such an access sequence either violates a constraint unknown so far or exposes a bug in the device hardware.

We call this check the ACCESS Device Check. Every constraint found in this check is iteratively added to a constraint set. This set is used to prevent the same spurious counterexamples from being returned again in further runs of the Device Check. The complete constraint set is then used in the ACCESS-local equivalence checks with the effect that all spurious counterexamples are prevented.
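In the toy setting of the simulation sketch above, adding a discovered constraint simply filters the starting states considered by the check; the `admissible` helper, the predicate interface, and the `busy` flag below are hypothetical.

```python
def admissible(start_states, constraints):
    """Keep only starting states that satisfy every known invariant of the
    combined HW/SW system; this removes spurious counterexamples."""
    return [s for s in start_states if all(c(s) for c in constraints)]

# busy => not(config phase): a busy device cannot be the starting state of
# a segment that configures the peripheral
constraints = [lambda s: not s.get("busy", False)]
print(admissible([{"busy": False}, {"busy": True}], constraints))
# -> [{'busy': False}]
```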

4 Conclusion

This paper has presented a novel method for proving the correctness of code transformations modifying a firmware I/O sequence accessing a peripheral device. Experiments are in preparation, and first results are promising.

5 References

[1] B. Schmidt, C. Villarraga, T. Fehmel, J. Bormann, M. Wedler, M. Nguyen, D. Stoffel, and W. Kunz, "A new formal verification approach for hardware-dependent embedded system software," IPSJ Transactions on System LSI Design Methodology (Special Issue on ASPDAC-2013), vol. 6, pp. 135–145, 2013.


Inductive Proof Rules Beyond Safety Properties
Martin Köhler, Klaus Schneider
TU Kaiserslautern, Germany

Abstract

The verification of temporal logic properties of reactive systems has been classically introduced as a model checking problem where one has to check that a temporal logic property holds on all initial states of a considered state transition system. A major breakthrough has been achieved by symbolic model checking, first by fixpoint computations where the state sets were stored as canonical normal forms of propositional logic formulas like BDDs, and later as bounded and SAT-based model checking procedures using SAT solvers. Interpolation-based model checking finally paved the way for induction-based methods like property-directed reachability (PDR) that are currently the most efficient verification procedures. However, PDR is so far only used for the verification of safety properties, i.e., to prove that a desired property holds on all reachable states. In this paper, we prove the correctness and completeness of induction rules for general fixpoint formulas that extend Park's fixpoint induction rules. We instantiate these rules for all CTL operators so that induction becomes available for all CTL properties. Moreover, we develop further induction rules for liveness properties by considering liveness properties as bounded safety properties. In general, we therefore point out that induction-based proof methods are not limited to safety properties.

1 Introduction

The verification of temporal logic properties of reactive systems is without doubt one of the big success stories of computer science [32, 24, 3, 54]. The initial breakthrough was achieved with the introduction of symbolic model checking [19, 20] of temporal logic properties, where state transition systems as well as sets of states are represented by propositional logic formulas that are themselves typically represented as binary decision diagrams (BDDs) [17, 49, 59]. Using symbolic representations, transition systems with more than 10^20 states were verified even around 1990 [19, 20].

After the breakthrough with symbolic model checking [19, 20], many sophisticated methods were developed to further increase the performance of the verification tools. The strong improvements of SAT solvers using clause learning, i.e., non-chronological backtracking [10], motivated early on the use of SAT solvers for the verification of reactive systems [9]. Since BDDs are canonical normal forms, one can quickly check whether a fixpoint has been reached by just comparing two pointers. Without canonical normal forms, this check however leads to a SAT problem of its own. Moreover, computing the predecessor states of a symbolically represented set of states requires boolean quantification, which is not available in SAT solvers. Thus, fixpoint-based verification became a problem after the introduction of SAT solvers.

For this reason, bounded model checking [7, 23] was introduced next, where the fixpoints were replaced by fixpoint approximations that were obtained by unrolling the transition relation a given number of times (see the formula below). This way, the existence of errors can be proved for greatest fixpoints like safety properties, while the absence of errors can only be proved for least fixpoints like liveness properties. Completeness for both was added later for finite state systems by checking terminal cycles that have to occur on paths due to the finitely many states. However, unrolling the fixpoint formulas and state transition relations leads to really big formulas that also challenged SAT solvers.
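For reference, the standard BMC unrolling (textbook form, not a formula from this paper) reduces the search for a violation of a safety property Gφ within k steps to propositional satisfiability:

```latex
% a violation of G phi within k steps exists iff this formula is satisfiable:
\[
  I(s_0) \;\wedge\; \bigwedge_{i=0}^{k-1} R(s_i, s_{i+1})
         \;\wedge\; \bigvee_{i=0}^{k} \lnot\varphi(s_i)
\]
```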

Interpolation-based model checking [45, 46, 47, 1] replaced the exact computation of predecessor states, which requires boolean quantification, by computing an approximation of the same using Craig interpolation [25] (avoiding boolean quantification). This way, McMillan was able to replace fixpoint-based model checking entirely by SAT solvers without sacrificing the completeness of the decision procedures.

Induction-based reasoning was developed as an alternative to the fixpoint-based reasoning of the original symbolic model checking procedures. Induction is a very old proof rule that dates back at least to Peano, who popularized it for the axiomatization of the natural numbers. Park [50] already discovered that induction can be used for proving general greatest fixpoints as well (see the rules below), and induction was also used in many special verification procedures [55, 57, 48, 29, 2, 22, 33]. Induction is very attractive and efficient in the sense that only the induction base and the induction step have to be proved as two proof goals, independent of the number of fixpoint iteration steps. However, it has the disadvantage that one usually has to find a suitable inductive assertion, since some properties cannot be proved by a single induction step even though they are valid.
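For concreteness, Park's rule and its dual can be stated in their standard lattice-theoretic form (for a monotone function F over a complete lattice):

```latex
% any prefixpoint bounds the least fixpoint; dually, any postfixpoint
% is below the greatest fixpoint:
\[
  \frac{F(X) \sqsubseteq X}{\mu Y.\, F(Y) \sqsubseteq X}
  \qquad\qquad
  \frac{X \sqsubseteq F(X)}{X \sqsubseteq \nu Y.\, F(Y)}
\]
```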

Incremental inductive reasoning as used by property directed reachability (PDR) [12, 13, 14, 28, 31, 34, 35, 58] turned out to be a clever combination of SAT solvers with induction rules: whenever a proof attempt fails, it is analyzed whether the counterexample found is a true one or just a counterexample to the induction step. In the latter case, the proof attempt is repeated with a stronger inductive assertion that is automatically constructed by analyzing the reachability of the states provoking the counterexample (analogous to CEGAR for the generation of good abstractions by means of counterexamples).

PDR is currently considered to be the most efficient verification method for safety properties. The core algorithm was introduced in [12] for hardware model checking problems and has been implemented in a tool called IC3 (Incremental Construction of Inductive Clauses for Indubitable Correctness). In essence, given a symbolic representation of a state transition system K and a state property ϕ, the algorithm tries to prove that ϕ holds on all reachable states of K by means of induction. To this end, it first checks whether ϕ holds on all initial states (induction base), and then checks whether ϕ holds on all successor states of those states that satisfy ϕ (induction step). However, the latter may fail even though ϕ holds on all reachable states, since there may exist unreachable states satisfying ϕ that have successor states that do not satisfy ϕ. Such states are called counterexamples to induction (CTIs) and have to be incrementally learned and excluded from consideration by the PDR method (an explicit-state sketch follows).
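The base/step structure and the notion of a CTI can be illustrated by an explicit-state Python sketch; real PDR performs these checks symbolically with a SAT solver, and the toy transition system below is made up.

```python
def check_inductive(init, trans, phi, states):
    # induction base: phi holds on all initial states
    for s in init:
        if not phi(s):
            return ("base fails", s)
    # induction step: phi-states only lead to phi-states
    for s in states:
        if phi(s):
            for t in trans.get(s, ()):
                if not phi(t):
                    # s is a counterexample to induction (CTI): a possibly
                    # unreachable phi-state with a phi-violating successor
                    return ("step fails (CTI)", s)
    return ("phi is inductive", None)

states = {0, 1, 2, 3}
trans  = {0: {1}, 1: {1}, 2: {3}}    # state 2 is unreachable from init
print(check_inductive(init={0}, trans=trans,
                      phi=lambda s: s != 3, states=states))
# -> ('step fails (CTI)', 2): phi holds on all reachable states, yet the
#    unreachable phi-state 2 breaks plain induction; PDR would learn a
#    clause excluding such states.
```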

PDR is very efficient since, in the best case, it may just use a SAT or SMT solver to prove the induction base and the induction step. To this end, it has no need to compute fixpoints as done by symbolic model checking [19, 20], neither to unroll the transition relation as required in bounded model checking [23, 8], nor to construct Craig interpolants as required by interpolation-based model checking [45]. It has meanwhile been integrated in many model checkers like nuXmv¹, ABC [16], IIMV², PdTRAV³, and Kind2⁴, and many detailed optimizations have been added: in particular, [28] suggested a couple of modifications to the original PDR method to improve its performance and to simplify its implementation.

In this paper, we show that the proof principle of induction is not at all limited to safety properties. First, we recall Park's induction rule [50] for proving greatest fixpoints, and second, we present a new inductive proof rule for least fixpoints. Both rules are correct and complete, as stated by Theorem 4.2 of this paper. We instantiate these rules as special induction rules for all CTL formulas as shown in Figure 1. Moreover, we develop in Theorem 5.1 a further special induction rule for proving liveness properties that considers liveness as a bounded safety property. The results of this paper clearly demonstrate that the induction principle is not limited to safety properties, so that inductive proof procedures similar to PDR may also be developed for other properties.

¹ https://nuxmv.fbk.eu
² http://ecee.colorado.edu/~bradleya
³ http://fmgroup.polito.it
⁴ http://kind2-mc.github.io/kind2

The outline of the paper is as follows: In the next section, we briefly compare our work with related work. Section 3 explains the notation used in the rest of the paper. Section 4 is the main part of our paper, where we first prove Park's fixpoint induction rule that can be used to verify greatest fixpoints and to disprove least fixpoints by means of induction. This rule cannot be used to verify least fixpoints and to disprove greatest fixpoints. For the latter, we prove in Section 4 a new induction rule that can prove arbitrary least fixpoints by a sequence of assertions. Section 5 introduces a further induction rule for liveness properties by carefully analyzing why Park's rule fails for liveness properties.

2 Related Work

Despite the big interest in and high potential of PDR, it was so far almost only used for the verification of safety properties: The only exceptions we are aware of are [15, 14, 34, 36]. References [15, 14] discuss the algorithm FAIR that uses PDR for searching reachable fair cycles (also called 'lasso' structures) for proving fairness properties. To that end, FAIR describes SCC-closed sets by a sequence of inductive assertions, which is used to introduce a progress measure, since leaving an SCC-closed set makes it impossible to revisit that set afterwards.

References [14, 34] present IICTL, an incremental inductive verification method for CTL properties. In contrast to fixpoint-based model checking, the sets of states satisfying CTL formulas are not exactly computed, but estimated as either under- or over-approximations of the true sets. Undecided states may have to be decided by reachability (PDR) or fair cycle requests (FAIR). IICTL adapts the style of local model checking and is therefore proof-goal oriented, but the basic methods used are still the classic PDR and the fair cycle detection FAIR.

Also [22, 36] exploit the structure of the state transition system for proving liveness properties: [22] converts liveness properties to persistence properties of the form FGϕ (stating that except for finitely many exceptions, ϕ holds always). The k-LIVENESS algorithm of [22] counts the number of times ϕ can be false on a path. If that number is bounded for all paths, we know that AFGϕ holds. Finally, [36] compares the k-LIVENESS approach [22] with the FAIR algorithm presented in [15].

Hence, the existing approaches for proving liveness properties are not really based on fixpoint induction of the liveness property. In this paper, we therefore propose a completely new approach where new fixpoint induction rules are established for proving not only liveness, but even general least fixpoint properties. We present the formal background and prove the correctness of our fixpoint induction rules, while further papers have to discuss their implementation and experimental evaluation.


3 Preliminaries

We assume that a state transition system K = (V, ΨI, ΨR) is symbolically represented by means of a finite set of boolean variables V, and propositional formulas ΨI and ΨR for its initial states and its transitions, respectively. A state s ⊆ V of K is a subset of V such that exactly those variables hold in the state that belong to s, while all other variables are false⁵.

As usual for symbolic representations, every propositional formula ϕ over the variables V is associated with a set of states ⟦ϕ⟧_K ⊆ 2^V of the transition system, which are those states that satisfy ϕ if the states are viewed as variable assignments. Analogously, every propositional formula over the variables V and a related set V′ denotes a set of transitions, so that the assignments to the variables V and V′ correspond to the current and next state, respectively.

In the following, we assume that every state of K has at least one successor state, i.e., there are no finite paths in K. This is no severe restriction, since for reactive systems only infinite computations matter, and all (unwanted) finite paths can be quickly removed.

To reason about temporal relationships of states, we define the existential/universal predecessor/successor states of a state set Qi ⊆ S of a transition system over the state set S = 2^V as follows:

• pre_∃^R(Q2) := {s1 ∈ S | ∃s2. (s1, s2) ∈ R ∧ s2 ∈ Q2}

• pre_∀^R(Q2) := {s1 ∈ S | ∀s2. (s1, s2) ∈ R → s2 ∈ Q2}

• suc_∃^R(Q1) := {s2 ∈ S | ∃s1. (s1, s2) ∈ R ∧ s1 ∈ Q1}

• suc_∀^R(Q1) := {s2 ∈ S | ∀s1. (s1, s2) ∈ R → s1 ∈ Q1}
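To make these operators concrete, here is a minimal explicit-state sketch (our own illustration, not part of the paper) that computes them for a transition relation given as a set of pairs:

```python
# Minimal explicit-state sketch (illustration only): the four
# predecessor/successor operators over a transition relation R,
# where states are plain Python objects and R is a set of pairs.

def pre_exists(R, Q2):
    # states having SOME successor in Q2
    return {s1 for (s1, s2) in R if s2 in Q2}

def pre_forall(R, S, Q2):
    # states whose successors ALL lie in Q2 (needs the full state set S)
    return {s1 for s1 in S
            if all(s2 in Q2 for (t1, s2) in R if t1 == s1)}

def suc_exists(R, Q1):
    # states having SOME predecessor in Q1
    return {s2 for (s1, s2) in R if s1 in Q1}

def suc_forall(R, S, Q1):
    # states whose predecessors ALL lie in Q1
    return {s2 for s2 in S
            if all(s1 in Q1 for (s1, t2) in R if t2 == s2)}

# Example: a three-state cycle 0 -> 1 -> 2 -> 0
S = {0, 1, 2}
R = {(0, 1), (1, 2), (2, 0)}
assert pre_exists(R, {1}) == {0}
assert suc_exists(R, {1}) == {2}
```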

State transition systems K = (V, ΨI, ΨR) are the models of reactive systems that are used to verify temporal logic properties. It is well-known that the µ-calculus is a very expressive logic that covers most other specification logics for reactive systems in terms of expressiveness. The formulas of the µ-calculus over variables V are defined as follows:

Definition 3.1 (µ-Calculus Syntax) The set of µ-calculus formulas Lµ over variables V is the least set of formulas satisfying the following properties, provided that ϕ, ψ ∈ Lµ and x ∉ V:

• variables: V ⊆ Lµ

• propositional operators: ¬ϕ, ϕ ∧ ψ, ϕ ∨ ψ ∈ Lµ

• modal operators: ♦ϕ, □ϕ, ←♦ϕ, ←□ϕ ∈ Lµ

• fixpoint operators: µx.ϕ ∈ Lµ and νx.ϕ ∈ Lµ

It is moreover required that all occurrences of the bound variable x in the fixpoint formulas µx.ϕ and νx.ϕ are positive (i.e., covered by an even number of negations). Monotonicity follows from this syntactic restriction. The following sections rely on monotonicity in order to ensure the existence of fixpoints.

⁵ Thus, we identify states with their variable assignments. If different states with the same variable assignments were wanted, one may use additional non-observable variables.

For every state transition system K = (V, ΨI, ΨR), every µ-calculus formula represents a set of states, which is recursively defined as follows:

Definition 3.2 (µ-Calculus Semantics) Given a state transition system K = (V, ΨI, ΨR) with state set S and a µ-calculus formula Φ ∈ Lµ, the following recursively defined function determines a set of states ⟦Φ⟧_K of K as the semantics of Φ:

• ⟦x⟧_K := {s ∈ S | x ∈ s}

• ⟦¬ϕ⟧_K := S \ ⟦ϕ⟧_K

• ⟦ϕ ∧ ψ⟧_K := ⟦ϕ⟧_K ∩ ⟦ψ⟧_K

• ⟦ϕ ∨ ψ⟧_K := ⟦ϕ⟧_K ∪ ⟦ψ⟧_K

• ⟦♦ϕ⟧_K := pre_∃^R(⟦ϕ⟧_K)

• ⟦□ϕ⟧_K := pre_∀^R(⟦ϕ⟧_K)

• ⟦←♦ϕ⟧_K := suc_∃^R(⟦ϕ⟧_K)

• ⟦←□ϕ⟧_K := suc_∀^R(⟦ϕ⟧_K)

• ⟦µx.ϕ⟧_K is the least fixpoint of f(Q) := ⟦ϕ⟧_{K_x^Q}

• ⟦νx.ϕ⟧_K is the greatest fixpoint of f(Q) := ⟦ϕ⟧_{K_x^Q}

The function f(Q) := ⟦ϕ⟧_{K_x^Q} is called the state transformer of ϕ and x. For its definition, we use the modified Kripke structure K_x^Q that changes the variable assignment of variable x ∉ V such that exactly the states in Q satisfy variable x.

Note that K_x^Q is simply obtained by adding x ↔ ΨQ as a conjunct to the transition relation ΨR of K if Q = ⟦ΨQ⟧_K holds, i.e., ΨQ represents Q.

Model checking means to verify that all initial states ΨI satisfy a given µ-calculus formula Φ, i.e., to prove that ⟦ΨI⟧_K ⊆ ⟦Φ⟧_K holds. There are two fundamentally different approaches to prove this, which are called global and local model checking. In global model checking, one first computes the set of states ⟦Φ⟧_K and then checks whether ⟦ΨI⟧_K ⊆ ⟦Φ⟧_K holds. To this end, the set of states ⟦Φ⟧_K is recursively computed according to the above definition, where the least and greatest fixpoints are computed by the Tarski/Knaster theorem as follows:

Theorem 3.1 (Tarski/Knaster) For any state transition system K = (V, ΨI, ΨR) and any µ-calculus formula ϕ and variable x, we consider the following sequences of sets of states:

• Q0 := {} and Q_{i+1} := ⟦ϕ⟧_{K_x^{Q_i}}

• P0 := S and P_{i+1} := ⟦ϕ⟧_{K_x^{P_i}}

The sequence Q_i is non-decreasing and converges to the least fixpoint ⟦µx.ϕ⟧_K, and the sequence P_i is non-increasing and converges to the greatest fixpoint ⟦νx.ϕ⟧_K.
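As an illustration (our own sketch, assuming an explicit-state representation rather than the symbolic one used above), the Tarski/Knaster iteration can be implemented directly; EF and EG are computed via the fixpoint characterizations EFp = µx. p ∨ ♦x and EGp = νx. p ∧ ♦x that appear later in Definition 3.3:

```python
# Illustrative sketch (not from the paper): Tarski/Knaster iteration
# for EF and EG over an explicit transition relation.

def pre_exists(R, Q):
    return {s1 for (s1, s2) in R if s2 in Q}

def lfp(f, S):
    # least fixpoint: iterate from the empty set upwards
    Q = set()
    while True:
        Q_next = f(Q)
        if Q_next == Q:
            return Q
        Q = Q_next

def gfp(f, S):
    # greatest fixpoint: iterate from the full state set downwards
    P = set(S)
    while True:
        P_next = f(P)
        if P_next == P:
            return P
        P = P_next

def EF(S, R, P):   # mu x. P or <>x
    return lfp(lambda Q: P | pre_exists(R, Q), S)

def EG(S, R, P):   # nu x. P and <>x
    return gfp(lambda Q: P & pre_exists(R, Q), S)

# Example: 0 -> 1 -> 2 -> 2 with property p = {2}
S = {0, 1, 2}
R = {(0, 1), (1, 2), (2, 2)}
assert EF(S, R, {2}) == {0, 1, 2}
assert EG(S, R, {2}) == {2}
```

Since the state transformers are monotone and the state set is finite, both loops are guaranteed to terminate.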

In local model checking, the proof goal ⟦ΨI⟧_K ⊆ ⟦Φ⟧_K is decomposed into subgoals according to the syntax of Φ and the transition relation of K. The latter reveals a strong relationship between µ-calculus model checking and the solution of parity games [26, 27, 11]. Interestingly, the complexity of these two equivalent problems has just recently been proved to be quasi-polynomial [21, 30]. Due to their deductive proof style, induction rules are traditionally more related to local model checking, but a new branch of research explores the relationship between fixpoint computations and inductive reasoning.

While the µ-calculus is very expressive [54], its formulas are often difficult to read. For this reason, other specification logics were defined, and while these have historically different roots, we consider in the following the CTL temporal logic as being defined by the following macros for µ-calculus formulas:

Definition 3.3 (Temporal Logic CTL) The set of CTL temporal logic formulas is defined in terms of µ-calculus formulas as follows:

• EXϕ = ♦ϕ
• EGϕ = νx. ϕ ∧ ♦x
• EFϕ = µx. ϕ ∨ ♦x
• E[ϕ U ψ] = µx. ψ ∨ (ϕ ∧ ♦x)   (strong until)
• E[ϕ U ψ] = νx. ψ ∨ (ϕ ∧ ♦x)   (weak until)
• E[ϕ B ψ] = µx. ¬ψ ∧ (ϕ ∨ ♦x)  (strong before)
• E[ϕ B ψ] = νx. ¬ψ ∧ (ϕ ∨ ♦x)  (weak before)

• AXϕ = □ϕ
• AGϕ = νx. ϕ ∧ □x
• AFϕ = µx. ϕ ∨ □x
• A[ϕ U ψ] = µx. ψ ∨ (ϕ ∧ □x)   (strong until)
• A[ϕ U ψ] = νx. ψ ∨ (ϕ ∧ □x)   (weak until)
• A[ϕ B ψ] = µx. ¬ψ ∧ (ϕ ∨ □x)  (strong before)
• A[ϕ B ψ] = νx. ¬ψ ∧ (ϕ ∨ □x)  (weak before)

In CTL formulas, each path quantifier E and A occurs in combination with one of the temporal logic operators X, G, F, [· U ·], and [· B ·], the latter two each in a weak and a strong variant (while in the more expressive logic CTL* any combination of path quantifiers and temporal logic operators is allowed). In general, Eϕ states that there is an outgoing path that satisfies ϕ, and Aϕ states that all outgoing paths satisfy ϕ. Xϕ states that ϕ holds on the next state of the path, Gϕ states that ϕ holds on all states of the path, and Fϕ states that ϕ holds on at least one state of the path. The weak [ϕ U ψ] states that ϕ must hold until ψ holds on a path, and the strong variant additionally asserts that ψ must hold at least once there. Finally, the weak [ϕ B ψ] states that ϕ must hold before ψ holds on a path, and the strong variant additionally asserts that ϕ must hold at least once there.

Symbolic model checking of CTL formulas [19, 20, 32] was the breakthrough of model checking. This was done by global model checking of the related µ-calculus formulas according to the Tarski/Knaster theorem. However, the computation of the fixpoints sometimes requires many iterations, and also checking after each iteration whether the fixpoint has been reached may be computationally expensive (while it is cheap when using BDDs). For this reason, many sophisticated improvements were suggested, in particular, to benefit from the enormous power of today's SAT solvers.

In particular, property directed reachability (PDR) [12, 13, 14, 28, 31, 34, 35, 58] introduced a sophisticated proof engine that is nowadays considered to be the most efficient algorithm to prove safety properties, i.e., properties of the form AGϕ where ϕ is a propositional logic formula. To understand PDR, one first has to consider the following simple induction rule:

    ΨI → ϕ    ϕ → □ϕ
    ─────────────────
    ΨI → AGϕ

This means that we prove the validity of ΨI → ϕ and ϕ → □ϕ to conclude ΨI → AGϕ, i.e., that all paths outgoing from initial states always satisfy ϕ, which clearly holds iff all reachable states satisfy ϕ. The above rule is obviously correct, but unfortunately not complete, i.e., there are state transition systems K and properties AGϕ such that ΨI → AGϕ holds but ϕ → □ϕ is not valid (while the other assumption ΨI → ϕ must then be valid). The reason for this is that there may be unreachable states that satisfy ϕ, but one of their successors does not satisfy ϕ. These states are called counterexamples to induction. To make the above rule complete, it is well-known that one has to find an inductive assertion ψ so that the following rule can be used:

    ΨI → ψ    ψ → □ψ    ψ → ϕ
    ──────────────────────────
    ΨI → AGϕ

The above rule is correct and complete, i.e., it is always possible to find a suitable formula ψ such that the above rule can prove any valid safety property. For example, choose any ψ that exactly represents the reachable states to see the completeness. Without computing the reachable states, PDR came up with a kind of learning from counterexamples to induction, so that a sequence of inductive assertions ψi is determined until a proof with the following rule succeeds or a counterexample turns out to be a real counterexample:

    ΨI → ψ0    ⋀_{i=0}^{k−1} (ψi → ψi+1 ∧ □ψi+1)    ψk → ϕ ∧ ψk−1
    ─────────────────────────────────────────────────────────────
    ΨI → AGϕ

PDR is able to automatically determine the assertions ψi, which approximate a fixpoint computation of the reachable states. Besides the original papers, see [38, 39] for a gentle and brief introduction to PDR.
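As an illustration of the basic rule with an inductive assertion ψ, the following minimal sketch (our own, not from the paper) checks the three premises with the Z3 SMT solver on a toy system; the encoding of the transition relation is a hypothetical example. Each implication is validated by asserting its negation and checking for unsatisfiability.

```python
# Minimal sketch (assumes the z3-solver package; illustration only):
# checking the rule  Psi_I -> psi,  psi -> []psi,  psi -> phi
# for a 2-bit counter that counts modulo 3 (so value 3 is unreachable).
from z3 import Bools, And, Not, Implies, Solver, substitute, unsat

x0, x1, y0, y1 = Bools('x0 x1 y0 y1')        # current/next state bits

init = And(Not(x0), Not(x1))                 # counter starts at 0
trans = And(y0 == And(Not(x0), Not(x1)),     # hypothetical encoding of
            y1 == x0)                        # "increment modulo 3"
phi = Not(And(x0, x1))                       # safety: value != 3
psi = Not(And(x0, x1))                       # candidate inductive assertion

def valid(f):
    s = Solver()
    s.add(Not(f))                            # f is valid iff ~f is unsat
    return s.check() == unsat

psi_next = substitute(psi, (x0, y0), (x1, y1))
base = valid(Implies(init, psi))
step = valid(Implies(And(psi, trans), psi_next))
safe = valid(Implies(psi, phi))
print(base, step, safe)                      # all True => Psi_I -> AG phi
```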

4 Fixpoint Induction Rules

As explained in the previous section, induction has recently gained increased interest for the verification of safety properties. In this paper, we point out that the induction principle is not limited to safety properties, and can instead be applied to general fixpoints. This has already been remarked by David Park [50], who pointed out an induction rule for fixpoints. One of the rules that we present is exactly Park's fixpoint induction rule; the others, however, are not related to Park's rule. To point out the differences, we first consider Park's fixpoint induction rule.

Fixpoint induction rules are best understood by introducing the notion of pre- and post-fixpoints, which can be defined in a general partially ordered set as follows:


Definition 4.1 (Pre/Post-Fixpoints) For every function f : D → D over a partially ordered set (D, ⊑), we define:

• x is a fixpoint of function f if f(x) = x.
• x is a pre-fixpoint of function f if f(x) ⊑ x.
• x is a post-fixpoint of function f if x ⊑ f(x).

It is clear that every fixpoint is both a pre-fixpoint andalso a post-fixpoint. The converse may not be the case.

Lemma 4.1 (Pre/Post-Fixpoints) The least fixpoint is contained in any pre-fixpoint, and the greatest fixpoint contains any post-fixpoint. Moreover, we even have

• µx. f(x) = ⊓ {x | f(x) ⊑ x}

• νx. f(x) = ⊔ {x | x ⊑ f(x)}

Proof: We prove by induction that f^n(⊥) ⊑ x holds for any n ∈ ℕ and every pre-fixpoint x: The induction base is clear, since f^0(⊥) = ⊥ ⊑ x holds. For the induction step, we know by the induction hypothesis that f^n(⊥) ⊑ x holds, so that by monotonicity of f, we obtain f^{n+1}(⊥) ⊑ f(x). By transitivity of ⊑, and the fact that x is a pre-fixpoint, we conclude that f^{n+1}(⊥) ⊑ f(x) ⊑ x. Since µx. f(x) = lim_{n→∞} f^n(⊥) (Tarski/Knaster iteration), it follows that µx. f(x) ⊑ x for any arbitrary pre-fixpoint x.

Analogously, we can prove that x ⊑ f^n(⊤) holds for any n ∈ ℕ and any arbitrary post-fixpoint x: The induction base is clear, since x ⊑ f^0(⊤) = ⊤ holds. For the induction step, we know by the induction hypothesis that x ⊑ f^n(⊤) holds, so that by monotonicity of f, we obtain f(x) ⊑ f^{n+1}(⊤). By transitivity of ⊑, and the fact that x is a post-fixpoint, we conclude that x ⊑ f(x) ⊑ f^{n+1}(⊤). Since νx. f(x) = lim_{n→∞} f^n(⊤) (Tarski/Knaster iteration), it follows that x ⊑ νx. f(x) for any arbitrary post-fixpoint x.

By the proven facts, it follows that the following relations hold:

• µx. f(x) ⊑ ⊓ {x | f(x) ⊑ x}

• ⊔ {x | x ⊑ f(x)} ⊑ νx. f(x)

The converse relations are clear, since µx. f(x) and νx. f(x) are both pre- and post-fixpoints and thus even the least and greatest ones. □

Immediate consequences of the above lemma are the following fixpoint induction rules, which we can now prove to be correct and complete:

Theorem 4.1 (Park's Fixpoint Induction [50]) The following rules are correct for all formulas ϕ, ψ, where [ϕ]^ψ_x denotes the replacement of all occurrences of variable x in formula ϕ by formula ψ:

    [ϕ]^ψ_x → ψ
    ────────────
    (µx.ϕ) → ψ

    ψ → [ϕ]^ψ_x
    ────────────
    ψ → νx.ϕ

Proof: For the proof, note that the validities of [ϕ]^ψ_x → ψ and ψ → [ϕ]^ψ_x mean that ⟦ψ⟧_K is a pre- and post-fixpoint of the function f(Q) := ⟦ϕ⟧_{K_x^Q}, respectively. The rest follows directly from the previous lemma. □

The above fixpoint induction rules have already been noted by Park and are attributed to him [50]. These rules allow us to prove greatest fixpoints and to disprove least fixpoints. To prove least fixpoints and to disprove greatest fixpoints, the following theorem can be used, which furthermore explains how one can construct inductive assertions ψi in an incremental manner:

Theorem 4.2 (Incremental Fixpoint Induction) The following rules are correct and complete:

    ΨI → ψ    ψ → [ϕ]^ψ_x
    ──────────────────────
    ΨI → νx.ϕ

    ψ0 → [ϕ]^false_x    ⋀_{i=0}^{n−1} (ψi → ψi+1)    ⋀_{i=0}^{n−1} (ψi+1 → [ϕ]^{ψi}_x)    ΨI → ψn
    ─────────────────────────────────────────────────────────────────────────────────────────────
    ΨI → µx.ϕ

In both rules, the assertions ψ and ψi, respectively, are post-fixpoints of the state transformer associated with µx.ϕ, i.e., f(Q) := ⟦ϕ⟧_{K_x^Q}.

Proof: The correctness of the first rule is obvious: By Park's fixpoint induction, we have ψ → νx.ϕ, which together with the other assumption ΨI → ψ implies that the initial states imply the greatest fixpoint. The assumption ψ → [ϕ]^ψ_x directly asserts that ψ is a post-fixpoint, and using ψ := νx.ϕ proves the completeness.

Now consider the proofs for the least fixpoint rule: We first prove by induction that for all i ∈ ℕ the implication ψi → µx.ϕ is valid: For the induction base, we have to prove that ψ0 → µx.ϕ holds, which can be concluded from the assumption ψ0 → [ϕ]^false_x and the fact⁶ that [ϕ]^false_x → µx.ϕ, so that the induction base is proved.

For the induction step, we have by the induction hypothesis ψi → µx.ϕ, so that we can conclude by monotonicity⁷ of ϕ that [ϕ]^{ψi}_x → [ϕ]^{µx.ϕ}_x holds. Now note that [ϕ]^{µx.ϕ}_x is equivalent to µx.ϕ, since the latter is a fixpoint of ϕ. Hence, we have [ϕ]^{ψi}_x → µx.ϕ, so that by our assumption ψi+1 → [ϕ]^{ψi}_x and transitivity of → we conclude that ψi+1 → µx.ϕ holds.

Hence, we conclude

    ⟦ψ0⟧_K ⊆ ⟦ψ1⟧_K ⊆ ... ⊆ ⟦ψn⟧_K ⊆ ⟦µx.ϕ⟧_K

so that our assertions form an increasing chain that is contained in the least fixpoint µx.ϕ. If we finally have ΨI → ψn, the chain has covered the initial states, so that we can conclude ΨI → µx.ϕ.

⁶ Note that the Tarski/Knaster fixpoint iteration of µx.ϕ starts with x0 := false, x1 := [ϕ]^false_x, ... and forms a monotonically increasing sequence that converges to µx.ϕ.

⁷ This means that α → β implies [ϕ]^α_x → [ϕ]^β_x.


To prove the completeness, we have to find properties ψi such that the rule can be used to prove any valid property µx.ϕ. To this end, we define ψ0 := [ϕ]^false_x and ψi+1 := [ϕ]^{ψi}_x. Clearly, the sequence of properties ψi converges to µx.ϕ, and it will reach that fixpoint for some i = n, since we only consider finite transition systems. These properties then fulfill all assumptions of the above rule.

Finally, it is easily seen that ψi → [ϕ]^{ψi}_x, i.e., that ψi denotes a post-fixpoint: Simply note that the two assumptions ψi → ψi+1 and ψi+1 → [ϕ]^{ψi}_x directly imply ψi → [ϕ]^{ψi}_x, so that ψi is a post-fixpoint. □

The above theorem is the core of our paper. While the fixpoint induction rule due to Park as given in Theorem 4.1 was already known, the second rule of Theorem 4.2 has, to the best of our knowledge, not been considered before and is first presented in this paper.

The intention of the above rule for least fixpoints is to determine the properties ψi incrementally as under-approximations of the least fixpoint that finally should cover the initial states. We start by selecting any subset of [ϕ]^false_x to determine ψ0. If ΨI → ψ0 holds, the proof is done. Otherwise, there is an initial state that does not belong to ψ0, so that we have to extend our approximation from ψi to ψi+1, which must still be a subset of [ϕ]^{ψi}_x, until finally ΨI → ψn holds or a real counterexample is found that disproves ΨI → µx.ϕ.
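For the special case µx. β ∨ ♦x (i.e., EFβ), the incremental rule admits a direct explicit-state reading; the following sketch (our own illustration, not the symbolic PDR-style procedure envisioned here) grows the chain ψ0 ⊆ ψ1 ⊆ ... until the initial states are covered or the chain stabilizes:

```python
# Explicit-state sketch (illustration only): grow under-approximations
# psi_0, psi_1, ... of the least fixpoint mu x. beta or <>x (i.e. EF beta)
# until the initial states are covered or the chain stabilizes.

def pre_exists(R, Q):
    return {s1 for (s1, s2) in R if s2 in Q}

def prove_EF(S, R, init, beta):
    psi = set(beta)                    # psi_0 := [phi]^false_x = beta
    while not init <= psi:
        nxt = psi | beta | pre_exists(R, psi)
        if nxt == psi:                 # chain stabilized below the goal:
            return False               # a real counterexample exists
        psi = nxt                      # psi_{i+1} -> beta or <>psi_i holds
    return True                        # Psi_I -> psi_n, so Psi_I -> EF beta

S = {0, 1, 2, 3}
R = {(0, 1), (1, 2), (2, 2), (3, 3)}
assert prove_EF(S, R, init={0}, beta={2}) is True
assert prove_EF(S, R, init={3}, beta={2}) is False
```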

Another way to explain our proof rule for least fixpoints is as follows: It is well-known that the Tarski/Knaster iteration for least fixpoints may also be started with any post-fixpoint that is less than or equal to the least fixpoint [54]. The induction rule for least fixpoints of Theorem 4.2 may also be understood as a counterpart of that optimization, since the proof of the above theorem revealed that each ψi is a post-fixpoint less than or equal to the least fixpoint µx.ϕ.

At first glance, our proof rule for least fixpoints may look similar to the proof rule used by PDR. However, it is completely different: In contrast to our rule for least fixpoints, PDR does not perform induction on the fixpoint of AGϕ, but on the fixpoint representation of the reachable states, to show that these are included in ϕ.

Having proven Theorem 4.2, we can instantiate the proof rules by Definition 3.3 to obtain the proof rules for CTL as shown in Figure 1 (for EGϕ and EFϕ, we refer to the first rule of each pair here). Based on this, we also immediately obtain the following estimations of the sets of states that satisfy these formulas:

• ⟦EGα⟧_K ⊆ ⟦α⟧_K

• ⟦β⟧_K ⊆ ⟦EFβ⟧_K

• ⟦β⟧_K ⊆ ⟦E[α U β]⟧_K ⊆ ⟦E[α U β]⟧_K ⊆ ⟦α ∨ β⟧_K (strong until contained in weak until)

• ⟦α ∧ ¬β⟧_K ⊆ ⟦E[α B β]⟧_K ⊆ ⟦E[α B β]⟧_K ⊆ ⟦¬β⟧_K (strong before contained in weak before)

The proofs of these inclusions are straightforward: We simply have to remark that [ϕ]^false_x is an under-approximation of µx.ϕ and that [ϕ]^true_x is an over-approximation of νx.ϕ.

The above lower estimations can be used as initial assertions ψ0 (as already indicated by the proof rules themselves), and the upper estimations tell us a superset for extending a given sequence of assertions.

5 Further Rules for Liveness

Park's induction rule for greatest fixpoints used in Theorem 4.1 is based on the fact that the greatest fixpoint contains all post-fixpoints. This yields induction rules for proving greatest fixpoints, but not for least fixpoints. We have developed with Theorem 4.2 an additional rule for proving least fixpoints that aims at deriving a sequence of properties below the least fixpoint that finally cover the initial states. In this section, we consider further induction rules for liveness properties.

To this end, we first study in detail why the Park induction rule fails to prove liveness properties (and least fixpoints in general). If we applied it to prove AFϕ and EFϕ, we would obtain the following incorrect (!) rules:

    ΨI → ψ    ψ → ϕ ∨ □ψ
    ─────────────────────
    ΨI →?? AFϕ

    ΨI → ψ    ψ → ϕ ∨ ♦ψ
    ─────────────────────
    ΨI →?? EFϕ

The right-hand side assumptions of the rules ensure that ψ is a post-fixpoint of the state transformers f(Q) := ⟦ϕ ∨ □x⟧_{K_x^Q} and g(Q) := ⟦ϕ ∨ ♦x⟧_{K_x^Q}, respectively. However, while we can guarantee that every post-fixpoint is contained in the corresponding greatest fixpoint, we cannot guarantee that it is also contained in the least fixpoint.

The above rules are therefore not correct: For example, consider the transition system shown in Figure 2, which consists of a cycle of states s0 → ... → sn−1 → s0 such that all states on the cycle satisfy ¬ϕ ∧ ψ, and assume that s0 is the only initial state. Thus, on every state of the cycle, we have ♦ψ, thus also ψ → ϕ ∨ ♦ψ, and we also have ΨI → ψ, but EFϕ does not hold on this transition system (and neither does AFϕ).
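The failure is easy to reproduce concretely; this small sketch (our own illustration) builds a two-state ψ-cycle and checks that the premises of the bogus EF rule hold although EFϕ is false:

```python
# Sketch (illustration only): a psi-cycle without any phi-state satisfies
# the premises of the incorrect rule, yet EF phi is false.

def pre_exists(R, Q):
    return {s1 for (s1, s2) in R if s2 in Q}

S = {0, 1}
R = {(0, 1), (1, 0)}           # the cycle s0 -> s1 -> s0
init, psi, phi = {0}, {0, 1}, set()

# premises of the incorrect rule: Psi_I -> psi and psi -> phi or <>psi
assert init <= psi
assert psi <= phi | pre_exists(R, psi)

# but EF phi = mu x. phi or <>x is empty here, so Psi_I -> EF phi fails
ef = set(phi)
while True:
    nxt = phi | pre_exists(R, ef)
    if nxt == ef:
        break
    ef = nxt
assert not (init <= ef)
```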

Still, it makes sense to discuss further conditions that could make the above rules work. Intuitively, the assumption ψ → ϕ ∨ ♦ψ demands that for every state that satisfies our inductive assertion ψ, either ϕ holds or there is a successor state that also satisfies ψ. The only reason why the rule may still fail is the counterexample mentioned above, i.e., that we never reach a state where ϕ holds and instead only find successor states within ψ. That this is the only kind of counterexample is stated in the following lemma:

Lemma 5.1 (Inductive Assertions for Liveness) The following rules are valid:

    ψ → ϕ ∨ □ψ
    ───────────────
    ψ → A(Gψ ∨ Fϕ)

    ψ → ϕ ∨ ♦ψ
    ───────────────
    ψ → EGψ ∨ EFϕ


    ΨI → ψ    ψ → α ∧ ♦ψ
    ─────────────────────
    ΨI → EGα

    ΨI → ψ0    ⋀_{i=0}^{n−1} (ψi → ψi+1 ∧ ♦ψi)    ψn → ψn−1
    ────────────────────────────────────────────────────────
    ΨI → EGϕ

    ψ0 → β    ⋀_{i=0}^{n−1} (ψi → ψi+1)    ⋀_{i=0}^{n−1} (ψi+1 → β ∨ ♦ψi)    ΨI → ψn
    ─────────────────────────────────────────────────────────────────────────────────
    ΨI → EFβ

    ΨI → ψ0    ⋀_{i=0}^{n−1} (ψi → ϕ ∨ ♦ψi+1)    ψn → ϕ
    ────────────────────────────────────────────────────
    ΨI → EFϕ

    ΨI → ψ    ψ → β ∨ (α ∧ ♦ψ)
    ───────────────────────────
    ΨI → E[α U β]

    ψ0 → β    ⋀_{i=0}^{n−1} (ψi → ψi+1)    ⋀_{i=0}^{n−1} (ψi+1 → β ∨ (α ∧ ♦ψi))    ΨI → ψn
    ───────────────────────────────────────────────────────────────────────────────────────
    ΨI → E[α U β]

    ΨI → ψ    ψ → ¬β ∧ (α ∨ ♦ψ)
    ────────────────────────────
    ΨI → E[α B β]

    ψ0 → α ∧ ¬β    ⋀_{i=0}^{n−1} (ψi → ψi+1)    ⋀_{i=0}^{n−1} (ψi+1 → ¬β ∧ (α ∨ ♦ψi))    ΨI → ψn
    ───────────────────────────────────────────────────────────────────────────────────────────
    ΨI → E[α B β]

Figure 1 Induction Rules for Temporal Logic Formulas: In addition to the rules shown above, the same rules are correct and complete when E and ♦ are replaced with A and □, respectively. For the until and before operators, the single-assertion rule proves the weak variant (the ν-fixpoint of Definition 3.3) and the incremental rule proves the strong variant (the µ-fixpoint).

[Figure 2: a cycle of states s0 → ... → sn−1 → s0, where every state on the cycle is labeled ¬ϕ ∧ ψ; drawing not reproduced.]

Figure 2 Counterexample for Applying Park's Induction Rules for Liveness.

Proof: To prove the first rule, note that A(Gψ ∨ Fϕ) is equivalent to the CTL formula A[ϕ B (¬ψ ∧ ¬AFϕ)], and the CTL formula A[α B β] can be expressed as the fixpoint νx. ¬β ∧ (α ∨ □x). Therefore, A(Gψ ∨ Fϕ) is equivalent to the fixpoint νx. (ψ ∨ AFϕ) ∧ (ϕ ∨ □x). Hence, if ψ → ϕ ∨ □ψ holds, it follows that ψ is a post-fixpoint of νx. (ψ ∨ AFϕ) ∧ (ϕ ∨ □x), so that the correctness of the rule follows from the Park induction rules (note that ψ → ϕ ∨ □ψ implies ψ → (ψ ∨ AFϕ) ∧ (ϕ ∨ □ψ)).

To prove the second rule, note first the following equivalences:

    ψ → EGψ ∨ EFϕ
    ⇔ ψ → (ψ ∧ ♦EGψ) ∨ ϕ ∨ ♦EFϕ
    ⇔ ψ → ♦EGψ ∨ ϕ ∨ ♦EFϕ
    ⇔ ψ → ϕ ∨ ♦EGψ ∨ ♦EFϕ
    ⇔ ψ → ϕ ∨ ♦(EGψ ∨ EFϕ)

Now assume ψ holds on an arbitrary state si of the transition system. By our assumption ψ → ϕ ∨ ♦ψ, we conclude by modus ponens that also ϕ ∨ ♦ψ holds on si. In case ϕ holds on si, we conclude with the above equivalences that ψ → EGψ ∨ EFϕ holds on si. Otherwise, it remains to prove that si satisfies ♦(EGψ ∨ EFϕ) in case it satisfies ψ ∧ ¬ϕ ∧ ♦ψ. Since si satisfies ♦ψ, there is a successor state si+1 of si that satisfies ψ. With the same reasoning as done for si, we now get two cases: in the first one, where si+1 satisfies ϕ, the proof is done; otherwise, we end up with the task to prove that si+1 satisfies ♦(EGψ ∨ EFϕ) in case it satisfies ψ ∧ ¬ϕ ∧ ♦ψ. Repeating this reasoning generates a sequence of states s0, ..., sn−1 until the final state sn−1 has already appeared in the sequence (since we only consider finite state transition systems). This closes a loop, so that we have found a path in the transition system on which each state satisfies ψ ∧ ¬ϕ. It follows that all of these states satisfy EGψ, so that the open case of our proof is closed. □

The above lemma tells us which conclusions we really can draw from the assumptions ψ → ϕ ∨ ♦ψ and ψ → ϕ ∨ □ψ. The lemma specifies in some sense the set of post-fixpoints of the state transformers f(Q) := ⟦ϕ ∨ □x⟧_{K_x^Q} and g(Q) := ⟦ϕ ∨ ♦x⟧_{K_x^Q}, respectively.

As can be seen, we cannot conclude AFϕ and EFϕ, respectively, but at least the weaker formulas A(Gψ ∨ Fϕ) and EGψ ∨ EFϕ, respectively. Moreover, the lemma shows that the only counterexamples to the correctness of the rules discussed at the beginning of this section are those where there are ψ-cycles that do not contain a ϕ-state, i.e., the counterexample given above is essentially the only one.

Hence, to prove liveness properties AFϕ and EFϕ, we additionally have to ensure the absence of ψ-cycles that exclude ϕ. In essence, this means that for every ψ-state, we are allowed to delay the validity of ϕ finitely often, but not infinitely often. To this end, we need to introduce some kind of progress measure, i.e., a Noetherian ordering ξ such that whenever a ψ-state s does not satisfy ϕ, then ξ(s) ⊐ ξ(s′) holds for at least one or for all (depending on the rule) successor state(s) s′ of s. Such rules are often used in termination proofs in the Hoare calculus, where a Noetherian order is used to prove termination.

Another idea is to work with a sequence of asser-tions that make progress towards the ϕ-states:


Theorem 5.1 (Induction Rules for Liveness) The following rules are correct and complete:

    ΨI → ψ0    ⋀_{i=0}^{n−1} (ψi → ϕ ∨ □ψi+1)    ψn → ϕ
    ────────────────────────────────────────────────────
    ΨI → AFϕ

    ΨI → ψ0    ⋀_{i=0}^{n−1} (ψi → ϕ ∨ ♦ψi+1)    ψn → ϕ
    ────────────────────────────────────────────────────
    ΨI → EFϕ

Proof: The correctness of the above rules follows by reconsidering the proofs of Lemma 5.1. The finiteness of the sequence of assertions ψ0, ..., ψn enforces that ϕ-states are reached after at most n steps. The liveness properties are this way proved as bounded liveness properties, and as such as safety properties [18, 56]. The completeness follows by choosing either the sequence of successor states starting from the initial states or the sequence of predecessor states starting from the ϕ-states. □

Proving liveness properties as bounded safety properties [6] is another well-known approach, which we can formulate as induction rules as shown above. We have therefore shown that induction is a principle that also works well for liveness.
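As an illustration of the AFϕ rule (our own explicit-state sketch; a PDR-like engine would of course work symbolically), the premises can be checked directly for candidate assertions ψ0, ..., ψn, where ψi plays the role of "every path reaches ϕ within n − i steps":

```python
# Sketch (illustration only): checking the premises of the AF rule
#   Psi_I -> psi_0,  psi_i -> phi or []psi_{i+1},  psi_n -> phi
# for explicitly given assertion sets psi_0, ..., psi_n.

def pre_forall(R, S, Q):
    # states ALL of whose successors lie in Q
    return {s for s in S
            if all(t in Q for (u, t) in R if u == s)}

def check_AF_rule(S, R, init, phi, psis):
    n = len(psis) - 1
    if not init <= psis[0]:
        return False
    for i in range(n):
        # psi_i -> phi or []psi_{i+1}
        if not psis[i] <= phi | pre_forall(R, S, psis[i + 1]):
            return False
    return psis[n] <= phi          # psi_n -> phi

# 0 -> 1 -> 2, state 2 loops; phi = {2}: every path reaches phi in <= 2 steps
S, R = {0, 1, 2}, {(0, 1), (1, 2), (2, 2)}
psis = [{0, 1, 2}, {1, 2}, {2}]    # psi_i: "phi reached within 2 - i steps"
assert check_AF_rule(S, R, init={0}, phi={2}, psis=psis)
```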

6 Conclusions and Future Work

We have proved the correctness and completeness of induction rules for proving least and greatest fixpoints of the µ-calculus. While the induction rules for proving greatest fixpoints are already known as Park's fixpoint induction rules, the rules we developed for proving least fixpoints are new. We instantiated these rules as special induction rules for all CTL formulas. Moreover, we developed in Theorem 5.1 a further special induction rule for proving liveness properties that considers liveness as a bounded safety property. The results of this paper clearly demonstrate that the induction principle is not limited to safety properties, so that inductive proof procedures similar to PDR may also be developed for other properties.

While the proof rules shown in this paper indicate that the use of induction for the verification of temporal logic properties can be significantly extended, there are many problems that still have to be solved: We already pointed out that proving universal inductiveness ϕ → □ϕ of a property ϕ can be done by standard SAT solvers, while QSAT solvers or BDDs are required to prove existential inductiveness ϕ → ♦ϕ. Future work has to consider how these cases can be solved efficiently, e.g., for special cases like deterministic transition systems. Moreover, new heuristics have to be developed for the incremental construction of assertions for the induction rules, similar to PDR.

7 References

[1] A. Albarghouthi and K. McMillan. Beautiful interpolants. In N. Sharygina and H. Veith, editors, Computer Aided Verification (CAV), volume 8044 of LNCS, pages 313–329, Saint Petersburg, Russia, 2013. Springer.
[2] R. Armoni, L. Fix, R. Fraer, S. Huddleston, N. Piterman, and M. Vardi. SAT-based induction for temporal safety properties. Electronic Notes in Theoretical Computer Science (ENTCS), 119:3–16, 2005.
[3] C. Baier and J.-P. Katoen. Principles of Model Checking. MIT Press, Cambridge, Massachusetts, USA, 2008.
[4] M. Ben-Ari. Mathematical Logic for Computer Science. Springer, 3rd edition, 2012.
[5] M. Ben-Ari, Z. Manna, and A. Pnueli. The temporal logic of branching time. In Principles of Programming Languages (POPL), pages 164–176, 1981.
[6] A. Biere, C. Artho, and V. Schuppan. Liveness checking as safety checking. Electronic Notes in Theoretical Computer Science (ENTCS), 66(2):160–177, December 2002.
[7] A. Biere, A. Cimatti, E. Clarke, M. Fujita, and Y. Zhu. Symbolic model checking using SAT procedures instead of BDDs. In Design Automation Conference (DAC), pages 317–320, New Orleans, Louisiana, USA, 1999. ACM.
[8] A. Biere, A. Cimatti, E. Clarke, O. Strichman, and Y. Zhu. Bounded model checking. In M. Zelkowitz, editor, Advances in Computers, volume 58, pages 118–149. Academic Press, 2003.
[9] A. Biere, E. Clarke, R. Raimi, and Y. Zhu. Verifying safety properties of a PowerPC microprocessor using symbolic model checking without BDDs. In N. Halbwachs and D. Peled, editors, Computer Aided Verification (CAV), volume 1633 of LNCS, pages 60–71, Trento, Italy, 1999. Springer.
[10] A. Biere, M. Heule, H. van Maaren, and T. Walsh, editors. Handbook of Satisfiability, volume 185 of Frontiers in Artificial Intelligence and Applications. IOS Press, 2009.
[11] J. Bradfield and I. Walukiewicz. The mu-calculus and model checking. In E. Clarke, T. Henzinger, H. Veith, and R. Bloem, editors, Handbook of Model Checking, chapter 26, pages 871–919. Springer, 2018.
[12] A. Bradley. SAT-based model checking without unrolling. In R. Jhala and D. Schmidt, editors, Verification, Model Checking, and Abstract Interpretation (VMCAI), volume 6538 of LNCS, pages 70–87, Austin, Texas, USA, 2011. Springer.
[13] A. Bradley. IC3 and beyond: Incremental, inductive verification. In P. Madhusudan and S. Seshia, editors, Computer Aided Verification (CAV), volume 7358 of LNCS, page 4, Berkeley, California, USA, 2012. Springer.
[14] A. Bradley. Understanding IC3. In A. Cimatti and R. Sebastiani, editors, Theory and Applications of Satisfiability Testing (SAT), volume 7317 of LNCS, pages 1–14, Trento, Italy, 2012. Springer.
[15] A. Bradley, F. Somenzi, Z. Hassan, and Y. Zhang. An incremental approach to model checking progress properties. In P. Bjesse and A. Slobodová, editors, Formal Methods in Computer-Aided Design (FMCAD), pages 144–153, Austin, Texas, USA, 2011. IEEE Computer Society.
[16] R. Brayton and A. Mishchenko. ABC: An academic industrial-strength verification tool. In T. Touili, B. Cook, and P. Jackson, editors, Computer Aided Verification (CAV), volume 6174 of LNCS, pages 24–40, Edinburgh, Scotland, UK, 2010. Springer.
[17] R. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers (T-C), 35(8):677–691, August 1986.
[18] J. Burch. Verifying liveness properties by verifying safety properties. In E. Clarke and R. Kurshan, editors, Computer Aided Verification (CAV), volume 531 of LNCS, pages 224–232, New Brunswick, New Jersey, USA, 1991. Springer.
[19] J. Burch, E. Clarke, K. McMillan, D. Dill, and L. Hwang. Symbolic model checking: 10^20 states and beyond. In Logic in Computer Science (LICS), pages 1–33, Washington, District of Columbia, USA, 1990. IEEE Computer Society.
[20] J. Burch, E. Clarke, K. McMillan, D. Dill, and L. Hwang. Symbolic model checking: 10^20 states and beyond. Information and Computation, 98(2):142–170, June 1992.
[21] C. Calude, S. Jain, B. Khoussainov, W. Li, and F. Stephan. Deciding parity games in quasipolynomial time. In H. Hatami, P. McKenzie, and V. King, editors, Symposium on Theory of Computing (STOC), pages 252–263, Montreal, QC, Canada, 2017. ACM.
[22] K. Claessen and N. Sörensson. A liveness checking algorithm that counts. In G. Cabodi and S. Singh, editors, Formal Methods in Computer-Aided Design (FMCAD), pages 52–59, Cambridge, UK, 2012. ACM and IEEE Computer Society.
[23] E. Clarke, A. Biere, R. Raimi, and Y. Zhu. Bounded model checking using satisfiability solving. Formal Methods in System Design (FMSD), 19(1):7–34, July 2001.
[24] E. Clarke, T. Henzinger, H. Veith, and R. Bloem, editors. Handbook of Model Checking. Springer, 2018.
[25] W. Craig. Three uses of the Herbrand-Gentzen theorem in relating model theory and proof theory. The Journal of Symbolic Logic, 22(3):269–285, 1957.
[26] E. Emerson and C. Jutla. Tree automata, µ-calculus and determinacy. In Foundations of Computer Science (FOCS), pages 368–377, San Juan, Puerto Rico, 1991.
[27] E. Emerson, C. Jutla, and A. Sistla. On model checking for the µ-calculus and its fragments. Theoretical Computer Science (TCS), 258(1-2):491–522, 2001.
[28] N. Eén, A. Mishchenko, and R. Brayton. Efficient implementation of property directed reachability. In P. Bjesse and A. Slobodová, editors, Formal Methods in Computer-Aided Design (FMCAD), pages 125–134, Austin, Texas, USA, 2011. IEEE Computer Society.
[29] N. Eén and N. Sörensson. Temporal induction by incremental SAT solving. Electronic Notes in Theoretical Computer Science (ENTCS), 89(4), 2003.
[30] J. Fearnley, S. Jain, S. Schewe, F. Stephan, and D. Wojtczak. An ordered approach to solving parity games in quasi polynomial time and quasi linear space. In H. Erdogmus and K. Havelund, editors, Model Checking Software (SPIN), pages 112–121, Santa Barbara, CA, USA, 2017. ACM.
[31] A. Griggio and M. Roveri. Comparing different variants of the IC3 algorithm for hardware model checking. Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 35(6):1026–1039, June 2016.
[32] O. Grumberg and H. Veith, editors. 25 Years of Model Checking – History, Achievements, Perspectives, volume 5000 of LNCS. Springer, 2008.
[33] A. Gurfinkel and A. Ivrii. K-induction without unrolling. In D. Stewart and G. Weissenbacher, editors, Formal Methods in Computer-Aided Design (FMCAD), pages 148–155, Vienna, Austria, 2017. ACM.
[34] Z. Hassan, A. Bradley, and F. Somenzi. Incremental, inductive CTL model checking. In P. Madhusudan and S. Seshia, editors, Computer Aided Verification (CAV), volume 7358 of LNCS, pages 532–547, Berkeley, California, USA, 2012. Springer.
[35] Z. Hassan, A. Bradley, and F. Somenzi. Better generalization in IC3. In B. Jobstmann and S. Ray, editors, Formal Methods in Computer-Aided Design (FMCAD), pages 157–164, Portland, Oregon, USA, 2013. IEEE Computer Society.
[36] A. Ivrii, Z. Nevo, and J. Baumgartner. k-FAIR = k-LIVENESS + FAIR: revisiting SAT-based liveness algorithms. In Formal Methods in Computer-Aided Design (FMCAD), pages 1–5, Austin, TX, USA, 2018. IEEE Computer Society.
[37] Y. Kesten and A. Pnueli. A complete proof system for QPTL. In Logic in Computer Science (LICS), pages 2–12, San Diego, California, USA, 1995. IEEE Computer Society.
[38] X. Li and K. Schneider. Control-flow guided clause generation for property directed reachability. In M. Vechev, editor, High-Level Design Validation and Test Workshop (HLDVT), pages 17–24, Santa Cruz, USA, 2016. IEEE Computer Society.
[39] X. Li and K. Schneider. Control-flow guided property directed reachability for synchronous programs. In E. Leonard and K. Schneider, editors, Formal Methods and Models for Codesign (MEMOCODE), pages 23–33, Kanpur, India, 2016. IEEE Computer Society.
[40] Z. Manna and A. Pnueli. Verification of concurrent programs: Temporal proof principles. In D. Kozen, editor, Logics of Programs, volume 131 of LNCS, pages 200–252, Yorktown Heights, New York, USA, 1982. Springer.
[41] Z. Manna and A. Pnueli. How to cook a temporal proof system for your pet language. In Principles of Programming Languages (POPL), pages 141–154, New York, New York, USA, 1983. ACM.
[42] Z. Manna and A. Pnueli. Adequate proof principles for invariance and liveness properties of concurrent programs. Science of Computer Programming, 4(3):257–290, 1984.
[43] Z. Manna and A. Pnueli. Completing the temporal picture. Theoretical Computer Science (TCS), 83(1):97–130, June 1991.
[44] K. McMillan. Circular compositional reasoning about liveness. In L. Pierre and T. Kropf, editors, Correct Hardware Design and Verification Methods (CHARME), volume 1703 of LNCS, pages 342–346, Bad Herrenalb, Germany, 1999. Springer.
[45] K. McMillan. Interpolation and SAT-based model checking. In W. Hunt and F. Somenzi, editors, Computer Aided Verification (CAV), volume 2725 of LNCS, pages 1–13, Boulder, Colorado, USA, 2003. Springer.
[46] K. McMillan. An interpolating theorem prover. In K. Jensen and A. Podelski, editors, Tools and Algorithms for the Construction and Analysis of Systems (TACAS), volume 2988 of LNCS, pages 16–30, Barcelona, Spain, 2004. Springer.
[47] K. McMillan. Applications of Craig interpolants in model checking. In N. Halbwachs and L. Zuck, editors, Tools and Algorithms for the Construction and Analysis of Systems (TACAS), volume 3440 of LNCS, pages 1–12, Edinburgh, Scotland, UK, 2005. Springer.
[48] K. McMillan, S. Qadeer, and J. Saxe. Induction in compositional model checking. In E. Emerson and A. Sistla, editors, Computer Aided Verification (CAV), volume 1855 of LNCS, pages 312–327, Chicago, Illinois, USA, 2000. Springer.
[49] C. Meinel and T. Theobald. Algorithms and Data Structures in VLSI Design: OBDD – Foundations and Applications. Springer, 1998.
[50] D. Park. Fixpoint induction and proof of program semantics. In B. Melzer and D. Michie, editors, Machine Intelligence, volume 5, pages 59–78. Edinburgh University Press, 1970.
[51] A. Pnueli and Y. Kesten. A deductive proof system for CTL*. In L. Brim, P. Jancar, M. Kretínský, and A. Kucera, editors, Concurrency Theory (CONCUR), volume 2421 of LNCS, pages 24–40, Brno, Czech Republic, 2002. Springer.
[52] A. Pnueli, S. Ruah, and L. Zuck. Automatic deductive verification with invisible invariants. In T. Margaria and W. Yi, editors, Tools and Algorithms for the Construction and Analysis of Systems (TACAS), volume 2031 of LNCS, pages 82–97, Genoa, Italy, 2001. Springer.
[53] A. Pnueli and E. Shahar. A platform combining deductive with algorithmic verification. In R. Alur and T. Henzinger, editors, Computer Aided Verification (CAV), volume 1102 of LNCS, pages 184–195, New Brunswick, New Jersey, USA, 1996. Springer.
[54] K. Schneider. Verification of Reactive Systems – Formal Methods and Algorithms. Texts in Theoretical Computer Science (EATCS Series). Springer, 2003.
[55] K. Schneider, R. Kumar, and T. Kropf. Alternative proof procedures for finite-state machines in higher-order logic. In J. Joyce and C.-J. Seger, editors, Theorem Proving in Higher Order Logics (TPHOL), volume 780 of LNCS, pages 213–226, Vancouver, British Columbia, Canada, 1994. Springer.
[56] V. Schuppan. Liveness Checking as Safety Checking to Find Shortest Counterexamples to Linear Time Properties. PhD thesis, Swiss Federal Institute of Technology Zurich, Zurich, Switzerland, 2006.
[57] M. Sheeran, S. Singh, and G. Stålmarck. Checking safety properties using induction and a SAT-solver. In W. Hunt and S. Johnson, editors, Formal Methods in Computer-Aided Design (FMCAD), volume 1954 of LNCS, pages 108–125, Austin, Texas, USA, 2000. Springer.
[58] F. Somenzi and A. Bradley. IC3: where monolithic and incremental meet. In P. Bjesse and A. Slobodová, editors, Formal Methods in Computer-Aided Design (FMCAD), pages 3–8, Austin, Texas, USA, 2011. IEEE Computer Society.
[59] I. Wegener. Branching Programs and Binary Decision Diagrams. Monographs on Discrete Mathematics and Applications. Society for Industrial and Applied Mathematics (SIAM), 2000.


Approximation of Neural Networks for Verification

Fin Hendrik Bahnsen* and Goerschwin Fey†

* Email: [email protected]
† Email: [email protected]

Institute of Embedded Systems, Hamburg University of Technology, Hamburg, Germany

Abstract

Statistical learning methods enable the adaptation of artificial neural networks (ANN) to complex problems. Meanwhile, formal properties can be verified on small ANNs under simplified assumptions. First we show a simple algorithm to convert neural networks into a system of equations with boundary conditions. In particular, we discuss how non-linear functions may be approximated. In experiments we study the impact of this approximation on the validity of the proof of formal guarantees.

1 Introduction

Artificial neural networks (ANN) [1] are statistical models and are extremely flexible classification and regression tools [2]. In domains where established algorithms fail, ANNs achieve impressive results, e.g. in image and speech recognition [3, 4]. Therefore ANNs are widely applied in nearly all domains where statistics provides accurate models. In future scenarios, ANNs could take over complex tasks in safety-critical systems, e.g. in the domain of autonomous vehicles or drones [5, 6].

Statistical learning methods are used to adapt an ANN to a specific problem [2]. The aim of this learning process is to achieve generalisation of the ANN [7]. In simple terms, the ANN is intended to provide correct results for inputs that have not been used during learning. In general, representative example solutions of the problem must be used in the learning process in order to solve the problem with the ANN also on unseen problem instances in a suitable way. Nevertheless, the ANN can provide incorrect results for some inputs, e.g. [8]. Inaccuracies in the calculated mapping between input and output can be intentionally exploited to falsify the result of the ANN, e.g. [9, 10]. Thus, the use of ANNs for safety-relevant applications and devices is critical and ultimately prevented.

Learned ANNs encode the problem in their own representation, so that proving a formal property is not obvious for humans. Applying a theorem proof engine, e.g. a solver for Satisfiability Modulo Theories (SMT), is not too difficult, but results in large run times and requires approximations in certain cases. Thus, at first only checking very small ANNs of up to 20 neurons for formal properties was feasible, e.g. [11, 12]. Depending on the SMT solver in use, non-linear dependencies may directly be modeled, too [13]. A newer approach called Reluplex modifies the simplex algorithm contained in an SMT proof engine to verify formal properties on substantially larger ANNs [14]. This proof engine can also be used to check the global robustness of an ANN [15]. Reluplex directly tackles the problem of non-linear activation functions contained in an ANN, but limits the proof algorithm to ANNs using the Rectified Linear Unit (ReLU) [16] as activation function.

We discuss an algorithm to translate an ANN into an SMT problem. We follow the idea of Reluplex. However, for the translation we use only step-wise linear functions that approximate the non-linear activation functions in an ANN. By doing this, the translation is not limited to ReLU functions. This is important since various different functions are used in ANNs [17]. In addition, this is motivated by efficient SMT solvers for linear functions. In experiments we study the impact of the approximation on the quality of results for proving formal guarantees. We first consider Boolean functions, since formal guarantees can easily be derived. Then, we consider a case study on a non-linear classification problem. In the experiments the quality of the approximation suffices to match the exact results.

The structure of this paper is as follows: In Section 2 we introduce ANNs in a formal way, and in Section 3 we discuss some commonly used activation functions. In Section 4 we present a motivating example. An algorithmic way to translate an ANN to a set of assertions for SMT solving is described in Section 5. Our experiments investigating step-wise linearly approximated activation functions are presented in Section 6. Finally, we give a conclusion in Section 7.

2 Artificial Neural Networks

As already mentioned, ANNs can be used for both classification and regression problems. Such problems can generally be modeled using a mapping f⃗ : ℝ^m → ℝ^n with m inputs and n outputs. ANNs model the relation between an input vector x⃗0 and an output vector ŷ⃗. The ANN output ŷ⃗ = x⃗k of a k-layer feed-forward ANN (multilayer perceptron, short MLP) with input x⃗0 is given recursively by

    x⃗_{i+1} = σi(Wi · x⃗i + b⃗i) = σi(h⃗i)   with 0 ≤ i < k,   (1)

where Wi is the weight matrix, b⃗i is the bias vector, and σi(·) is the non-linear activation function of layer i, respectively. The activation functions σi are evaluated component-wise for vectors.


Figure 1 Concept of an MLP with inputs (green), hidden neurons (yellow) and outputs (red). Neurons and layers that are not drawn are implied by dots and dashed arrows, respectively. [diagram not reproduced]

Figure 2 Calculus graph of the q-th neuron of the i-th layer with p inputs. The weights w_i^(q,·) beside the edges are multiplied with the value of the edge source node. [diagram not reproduced]

A representation of a general MLP can be seen in Figure 1. Since the number of hidden neurons is determined neither by the dimension m of the input nor by the dimension n of the output, α and β are two arbitrary integers that account for this. As indicated in the figure, there can be any number of hidden layers. In Figure 2 a single neuron of the (i+1)-th layer, here the q-th neuron, is shown. In both figures the value in the superscript denotes the component of a vector or a matrix. First, all outputs of the previous layer are multiplied with the corresponding weights w_i^(q,·) and then summed up with the bias value b_i^(q) of the neuron. This is the weighted sum, which is written as a matrix-vector multiplication for all neurons of one layer as shown in Eq. (1). For later use we denote this sum as h_i^(q), which is the argument passed to the activation function σi.

In general, an ANN is trained by optimizing all Wi and b⃗i for known (x⃗0, y⃗)-tuples to minimize the model loss function L(ŷ⃗, y⃗), which statistically measures the deviation between the expected real output y⃗ and the inferred output ŷ⃗.
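The recursive definition in Eq. (1) translates directly into code; the following minimal NumPy sketch (our illustration, not the authors' tool) evaluates an MLP for given weights, biases, and activation functions:

```python
# Minimal sketch (illustration only): evaluating Eq. (1) for a k-layer MLP.
import numpy as np

def mlp_forward(x0, weights, biases, activations):
    # x_{i+1} = sigma_i(W_i x_i + b_i), applied layer by layer
    x = np.asarray(x0, dtype=float)
    for W, b, sigma in zip(weights, biases, activations):
        h = W @ x + b            # the weighted sum h_i
        x = sigma(h)             # component-wise activation
    return x

# a made-up 2-3-1 network with tanh hidden layer and sigmoid output
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
bs = [rng.normal(size=3), rng.normal(size=1)]
acts = [np.tanh, lambda h: 1.0 / (1.0 + np.exp(-h))]
print(mlp_forward([0.5, -1.0], Ws, bs, acts))   # a single value in (0, 1)
```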

Figure 3 Softplus (blue; dashed) and ReLU (red; solid) functions. [plot not reproduced]

3 Activation Functions

If a linear function is selected for all activation functions σi of an ANN, Eq. (1) can be collapsed into a single matrix. In this case, the ANN would be a linear model and could only represent linear relationships. Therefore a non-linear function is usually chosen for σi. A single discontinuity in the domain of σi is sufficient, so that complex non-linear correlations can be approximated by the ANN.

Another task of the activation function is the mapping of the calculated values of h⃗i into a desired range. For example, ranges between 0 and 1 are common for the interpretation of the values as probabilities, or between -1 and 1 in order to take anti-correlations in the ANN into account. The choice of the activation function strongly depends on the problem to be solved and on how the values have to be interpreted. In the following, the most frequently used activation functions together with cost-efficient linear approximations are introduced. Here, cost-efficient means that only a few constraints are needed to model the respective function.

3.1 Softplus and ReLU

The concept of the activation function can also be motivated in the manner of a biological nerve cell. Only when a certain threshold value is exceeded, which is given for an ANN by the bias b⃗i, does the neuron transmit a signal to the following neurons. Historically, an activation function that replicates the so-called activation potential of a biological nerve cell was an obvious choice. The softplus function fulfills exactly this purpose and is given by

    softplus(x) = ln(e^x + 1).   (2)

With the motivation to minimize the computational effort for ANNs with softplus activation functions, the much simpler Rectified Linear Unit (relu) function was investigated, which can be calculated by

    relu(x) = max(0, x).   (3)

With the maximum function, the relu function can be traced back to the simple case distinction

    relu(x) = 0 for x ≤ 0,
              x for x > 0.   (4)

Both functions are shown in Figure 3. Neural networks with relu activation functions provide consistently good results, so that the use of relu activation functions is now established [16].

3.2 Sigmoid and Hard Sigmoid

During statistical learning, the derivative of the activation function must be calculated. The Sigmoid function

    sig(x) = 1 / (1 + e^{−x})   (5)

is established in the area of ANNs due to its simple derivative. The derivative

    sig′(x) = sig(x) · (1 − sig(x))   (6)

uses the original function value. If the function is implemented efficiently for a quick evaluation of an ANN, then the derivative can generally be computed efficiently, too. Since the range of the function is bounded between 0 and 1, the Sigmoid function is usually used to interpret the result as a probability. A very simple but accepted approximation of the Sigmoid function is the step-wise linear Hard Sigmoid function

    hsig(x) = 0          for x ≤ −2.5,
              0.2x + 0.5 for −2.5 < x < 2.5,
              1          for x ≥ 2.5.   (7)

Figure 5 Hyperbolic Tangent (blue; dashed) and the step-wise linear approximation function Hard Hyperbolic Tangent (red; solid). [plot not reproduced]

Figure 4 Sigmoid function (blue; dashed) and the step-wise linear approximation function hard sigmoid (red; solid). [plot not reproduced]

The Sigmoid function and the approximated Hard Sigmoid function are plotted in Figure 4. Another equation used later is the inverse Sigmoid function, which is given by

    logit(x) = sig^{−1}(x) = ln(x / (1 − x)).   (8)

3.3 Hyperbolic Tangent

Strictly speaking, the tangens hyperbolicus function

    tanh(x) = (e^{2x} − 1) / (e^{2x} + 1)   (9)

as well as the Sigmoid function belong to the class of logistic functions, and they have comparable properties. An important difference is that the range of the function is between -1 and 1. As already mentioned, this property is usually used so that the ANN itself maps anti-correlations via the activation function. Therefore the function is often used in hidden layers and for so-called recurrent ANNs containing inner-layer and self connections. We define the step-wise linear Hard Hyperbolic Tangent function

    htanh(x) = −1 for x ≤ −1,
               x  for −1 < x < 1,
               1  for x ≥ 1.   (10)

The Hyperbolic Tangent and its approximation are both plotted in Figure 5.
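The approximations of Eqs. (4), (7) and (10) are one-liners; the following sketch (our illustration) implements them next to their exact counterparts and prints the maximum deviation on a sample interval:

```python
# Sketch (illustration only): the activation functions of Section 3 and
# their step-wise linear approximations, Eqs. (2)-(10).
import numpy as np

def softplus(x): return np.log(np.exp(x) + 1.0)          # Eq. (2)
def relu(x):     return np.maximum(0.0, x)               # Eq. (3)/(4)
def sig(x):      return 1.0 / (1.0 + np.exp(-x))         # Eq. (5)
def hsig(x):     return np.clip(0.2 * x + 0.5, 0.0, 1.0) # Eq. (7)
def tanh(x):     return np.tanh(x)                       # Eq. (9)
def htanh(x):    return np.clip(x, -1.0, 1.0)            # Eq. (10)

xs = np.linspace(-8.0, 8.0, 1601)
for exact, approx in [(softplus, relu), (sig, hsig), (tanh, htanh)]:
    err = np.max(np.abs(exact(xs) - approx(xs)))
    print(exact.__name__, approx.__name__, round(float(err), 3))
# softplus/relu differ by ln(2) ~ 0.693 at x = 0; sig/hsig by ~ 0.08
# at x = +-2.5; tanh/htanh by ~ 0.24 near x = +-1.
```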

4 Motivating Example

As mentioned above, SMT solvers can handle linear functions quite effectively. A linearization of the activation function of an ANN can be realized, but it needs a large number of constraints if the precision of floating point numbers must be matched. Ultimately, performance is the limiting factor for proofs on ANNs.

Figure 6 Flowchart of the translation tool. Data structures which need to be given as input are shown in green. (Diagram nodes: ANN Layer, Input Bounds, Property to Check; per neuron: Weighted Sum and Activation, the latter via Alg. 1; output: Assertions. Diagram not reproduced.)

Replacing the activation function with one of the already presented piece-wise linearized functions is therefore of interest. With a simple and motivating example we show how this replacement of the activation function can result in a different ANN output.

A simple exclusive-or operation

    y = xor(a, b) = a ⊕ b   (11)

of two Boolean variables a and b is a good example of a non-linearly separable problem [18]. A perceptron with a single neuron as shown in Figure 2 cannot solve this problem. As described in [2], at least one hidden layer with non-linear activation functions is needed to fully classify an xor operation. We use this example to motivate our investigation. We have created an MLP with 3 neurons, which takes Boolean variables as inputs x⃗0 = (a, b) and outputs the xor-connection as x⃗2 = y. Adjusting the ANN for all four possible assignments of a and b requires about 800 training epochs. For the selected example, one specific trained ANN can be specified according to Eq. (1) with

    x⃗1 = σ( ( −4.24   3.50 ; 1.59  −1.55 ) · x⃗0 + ( −2.28 ; −0.92 ) )   (12)

    x⃗2 = σ( ( 3.20   1.99 ) · x⃗1 − 1.30 ),   (13)

where the sigmoid function sig(·) from Eq. (5) is used as the activation function σ(·) during training. This concrete ANN is one complete representation of the xor operation, as long as the sigmoid function is used as the activation function. The ANN reproduces correct outputs for the four possible assignments of a and b if the result is rounded to 1 or 0, respectively.

If the hard sigmoid function hsig(·) in Eq. (7) is chosen as approximation for the activation function, the example for the input \vec{x}_0 = (1, 0) gives the wrong output 0 instead of 1. In the case of this specific example, the ANN could be trained a little longer and would then also represent the xor completely for the approximation with the hsig function. But, in general, this option may not be feasible:

Figure 7 Modified ANN architecture with split neurons. The translation tool uses this kind of representation to define unique symbols which are used in the assertions.

• The training has to be stopped at a certain point to prevent the ANN from so-called overfitting and from losing its generalisation.

• The continuation of the training does not necessarily improve the quality for an evaluation of the ANN with approximated activation functions.

Alternatively, a model may be trained directly with the approximated activation function, so that a verification or a proof of the ANN does not contain any inaccuracy for the activation functions. However, this excludes models and problems where this is not suitable; e.g., an ANN can represent continuous mappings only to a limited extent with activation functions that are only piece-wise continuously differentiable. This problem is also known as ringing in signal theory [19]. This motivates a further investigation of the encountered approximation error.

5 ANN Translation to SMT

We have developed a tool to translate ANNs for SMT solvers. Our algorithm first iterates over the layers of an ANN and defines named variables, so-called symbols, which can then be used in equations. We call these equations assertions, because they describe the problem mathematically and limit the problem by boundary conditions. Figure 6 gives an overview of the information flow in our tool. The ANN model is iterated layer-wise along the hierarchy of the model. The neurons included in one layer are each translated to an assertion. The activation function is defined either directly or via Algorithm 1, defined later. In addition to the ANN model, input restrictions and a property to be checked are translated into assertions.

Our tool reads the weight matrices W_i and the bias vectors \vec{b}_i from the model definition. We transform the ANN into floating point constraints as defined in the SMT-lib standard. The weighted sum for \vec{h}_i in Eq. (1) adds assertions to the SMT instance for each neuron.
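As an illustration, the weighted sum of one layer could be asserted as follows with the z3 Python API; the sketch uses Real arithmetic for brevity, whereas our tool emits SMT-lib floating point constraints:

    from z3 import Real, Solver

    W = [[-4.24, 3.50], [1.59, -1.55]]        # W_1 and b_1 from Eq. (12)
    b = [-2.28, -0.92]

    x = [Real(f'x0_{j}') for j in range(2)]   # input symbols of the layer
    h = [Real(f'h1_{i}') for i in range(2)]   # symbols for the weighted sums

    s = Solver()
    for i in range(2):
        # one assertion per neuron: h_i == sum_j W[i][j] * x_j + b_i
        s.add(h[i] == W[i][0] * x[0] + W[i][1] * x[1] + b[i])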

In Figure 7 the values calculated for \vec{h}_i and \vec{h}_{i+1} are depicted in blue. These are available as symbol definitions after the previous step. The activation function for each neuron uses a fresh symbol that corresponds to the value for \vec{x}_{i+1} in Eq. (1) (depicted in yellow in the figure). The presented piece-wise linear activation functions relu, hsig and htanh from Eqs. (4), (7) and (10) can be directly converted into an if-then-else assertion. The SMT solver treats such assertions by a case split.
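For example, the Hard Sigmoid from Eq. (7) maps directly onto a single nested if-then-else term (a z3py sketch, again over Reals):

    from z3 import Real, If, Solver

    h = Real('h1_0')   # weighted sum of a neuron (blue in Figure 7)
    a = Real('x2_0')   # fresh symbol for the activated value (yellow in Figure 7)

    s = Solver()
    s.add(a == If(h <= -2.5, 0,
                  If(h >= 2.5, 1, 0.2 * h + 0.5)))   # Eq. (7) as a case split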

The non-linear activation functions softplus, sigmoid and tanh may be approximated by their counterparts relu, hsig, and htanh, respectively. This only requires a few assertions in the SMT problem. However, the approximation is rather coarse. Alternatively, we can choose a finer approximation. For this purpose, we use Algorithm 1 to automatically approximate these activation functions with a step function. The algorithm uses multiple nested if-then-else assertions, one per approximation point. The length of the calculated assertion equals the number of support points, which is given by the interval length stop − start divided by the approximation precision. These calculated assertions are much longer than those for the piece-wise linear relu, hsig and htanh activation functions. Note that ANNs trained for hsig, htanh or relu functions do not require approximation in SMT.

On a technical level, the translation must be performed according to the hierarchy of the ANN model, which may contain nested ANN layers in complex networks. Otherwise, symbols needed for an assertion may not be defined. Without describing this step in detail, we construct an incremental graph structure containing all operations of the ANN model in an ordered way and automatically derive the names of the symbols from it.

Another important part of the translation is the assumptions about the inputs. These can significantly affect the runtime of an SMT solver. Boundaries either result from the problem itself or can be calculated from the maximum and minimum values of the training or test data sets of an ANN model.

6 Experiments

The experiments investigate the output of different ANN models combined with multiple activation functions. In a first experiment, models for Boolean functions were trained. The simplicity of these expressions allows to create ANN models with exact results by training the models for a complete mapping of the expressions. Further, these models are used to investigate the behavior using piece-wise linear approximated functions. Based on these basic experiments, we apply the techniques to a more complex non-linear problem. The results are qualitative statements about the approximation accuracy in different situations leading to a better understanding of the approximation behavior of ANNs.

All models were trained on a machine with Intel [email protected], Nvidia GeForce 930MX and 32 GB RAM.

Algorithm 1 Approximate Assertions for Activation Function

input:  symbol: a symbol reference
        func: a function reference to be approximated
        start, stop: a tuple representing the interval [start, stop] the function func is approximated in
        precision: the approximation precision
output: a nested if-assertion corresponding to symbol, representing the function referenced by func

1  last_assertion := create constant for func(start)
2  for s := start; s < stop; s += precision do
3      c := create constant for func(s)
4      last_assertion := create if-then-else assertion (if symbol <= s then c else last_assertion)
5  return last_assertion
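A Python transcription of Algorithm 1 using z3 terms might look as follows. Note that this sketch builds the nesting from the top of the interval downwards, so that the outermost if-then-else tests the smallest support point; a straight ascending construction would let the largest threshold shadow all inner cases:

    import math
    from z3 import Real, RealVal, If

    def approx_assertion(symbol, func, start, stop, precision):
        # step-function approximation of func on [start, stop] (Algorithm 1);
        # (stop - start) / precision nested if-then-else terms are created
        expr = RealVal(func(stop))          # value beyond the last support point
        s = stop - precision
        while s >= start:
            expr = If(symbol <= s, RealVal(func(s)), expr)
            s -= precision
        return expr

    # sig on [-6, 6] with precision 0.008 as used in Section 6.2
    h = Real('h1_0')
    sig_term = approx_assertion(h, lambda v: 1.0 / (1.0 + math.exp(-v)),
                                -6.0, 6.0, 0.008)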

The ANN models were created using Keras [20] with Google Tensorflow [21] as back-end, using the python3 language [22]. For our experiments including SMT checking we used the Microsoft z3 solver [23].

6.1 Boolean Expressions

For the experiments described in this section the following Boolean expressions were selected:

y1 = a ⊕ b  (14)
y2 = a ∨ b ∨ c  (15)
y3 = a ∧ b ∧ c  (16)
y4 = (a ∨ b ∨ ¬c) ∧ (¬b ∨ d) ∧ (¬a ∨ d) ∧ e  (17)

The expression in Eq. (14) was chosen to consider a well-known linearly non-separable problem. The expressions in Eq. (15) and Eq. (16) model OR and AND gates of 3 variables, respectively. They were selected to analyze the base blocks of each Boolean expression independently. The fourth expression in Eq. (17) is an arbitrary Boolean expression in conjunctive normal form (CNF).

In our experiment we record the approximation behavior of the piece-wise linear approximation functions in comparison to the associated activation functions. We train models with the exact activation function and evaluate the ANN, without further adaptation, with the approximated activation function. The training data set consists of all possible assignments for the variables that occur in the expressions in Eqs. (14 – 17). An evaluation of the ANN on a test data set is not necessary due to this special problem; the training set is a complete representation of the Boolean expression.
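The complete training set can be enumerated directly; for instance, for Eq. (17) (a sketch, variable names are ours):

    from itertools import product
    import numpy as np

    # all 2^5 assignments of (a, b, c, d, e) and the value of Eq. (17)
    X = np.array(list(product([0, 1], repeat=5)), dtype=float)
    y = np.array([float((a or b or not c) and (not b or d) and (not a or d) and e)
                  for a, b, c, d, e in product([0, 1], repeat=5)])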

Figure 8 Evolution of the model loss for the activation functions together with the corresponding approximations against the training epoch for a full training cycle.

We have implemented a mechanism that monitors the model accuracy during the training process and stops the training as soon as the ANN reaches an accuracy of 100 %. At this point, the mapping based on the exact activation function is already perfect. The quality of approximations using the linearized activation function can then be measured by the accuracy of the ANN with the replaced approximated activation function. To be clear, the learned weight matrices and bias vectors remain unchanged. For optimization the adadelta algorithm [24] was used, which corresponds to the state of the art and is typically best suited for classification problems.

For the expressions in Eqs. (15 – 17), ANNs with two hidden layers of 6 neurons each were used, while for the expression in Eq. (14) an ANN with a total of 3 neurons, as in the example in Section 4, was used. Independent of the examined activation function, a sigmoid activation was applied to the output neuron to represent the result value of the Boolean expression. This is necessary because the model loss is calculated by the binary cross-entropy.
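The following tf.keras sketch outlines the procedure for Eqs. (15) – (17): train with the exact activation, then rebuild the same topology with the hard counterpart and copy the learned weights unchanged (the accuracy-monitoring stop callback is omitted for brevity; expressing htanh via clipping is our assumption):

    import tensorflow as tf

    def build(act):
        # two hidden layers of 6 neurons; sigmoid output for binary cross-entropy
        return tf.keras.Sequential([
            tf.keras.layers.Dense(6, activation=act, input_shape=(3,)),
            tf.keras.layers.Dense(6, activation=act),
            tf.keras.layers.Dense(1, activation='sigmoid'),
        ])

    exact = build('tanh')
    exact.compile(optimizer='adadelta', loss='binary_crossentropy',
                  metrics=['accuracy'])
    # exact.fit(X, y, epochs=20000, verbose=0)  # stop once accuracy reaches 100 %

    approx = build(lambda t: tf.clip_by_value(t, -1.0, 1.0))  # htanh, Eq. (10)
    approx.compile(loss='binary_crossentropy', metrics=['accuracy'])
    approx.set_weights(exact.get_weights())    # weights and biases unchanged
    # L_apx, acc_apx = approx.evaluate(X, y, verbose=0)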

Since the used ANNs are very small, training a model takes only a few seconds and at most 2 minutes. We trained the ANNs 20 times for each pair of activation function and approximation function as well as for each expression from Eqs. (14 – 17) with randomly seeded weights. The result averaged over all 20 runs is shown in Table 1. If a run did not converge over 20,000 epochs, it was repeated. The table also contains information on the achieved loss values measured via the exact activation function Lact and with the approximated function Lapx in the model. Since the accuracy of the ANNs for the exact activation function is by definition 100 %, only the accuracy measured by the approximated activation function is included as acc in percent in the table.

Table 1 Average values over 20 training cycles.

Func. (act / apx)   Eq.    Epochs   Lact     Lapx     accapx (%)
sig / hsig          (14)   870      0.4918   0.5012   69.99
                    (15)   616      0.2026   0.2310   83.12
                    (16)   781      0.1781   0.1936   84.99
                    (17)   1153     0.0654   0.0879   96.88
softplus / relu     (14)   894      0.3985   0.3732   85.00
                    (15)   323      0.1237   0.0912   98.13
                    (16)   378      0.1637   0.5037   70.62
                    (17)   750      0.0435   0.2049   93.13
tanh / htanh        (14)   575      0.4766   0.4455   91.25
                    (15)   148      0.2372   0.2033   100.00
                    (16)   284      0.1723   0.1325   100.00
                    (17)   363      0.1263   0.0957   99.06

The loss L of the trained ANNs for the evaluation with the exact activation function and with the corresponding approximated function versus the training epoch is depicted in Figure 8 for a full training cycle. Training with backpropagation essentially depends on the derivative of the activation function. Because the derivative of the sigmoid function has a maximum value smaller than 1 (see Eq. (6)), the learning progress and training performance with the sigmoid function are lower than with the other activation functions; the softplus function as well as the tanh function have a maximum value of 1 in their derivatives. Accordingly, significantly fewer training epochs are required for these functions before the ANN completely maps the Boolean expression.

In addition, values are highlighted in the table: particularly high values are marked in the column for accuracy. In column Lapx, values are marked which are below the loss of the exact activation function Lact. At least on the investigated expressions, the hard hyperbolic tangent function solves the problem better and faster than the tanh function (compare Figure 8). This is also confirmed by the achieved accuracies.

For the sigmoid function and the tanh function, the loss evaluated on the approximated function follows the course of the model loss. This indicates that these functions are very suitable approximations. The experiment proves this only empirically for the investigated problem class, but adaptation to more complex problems seems feasible.

The relu function shows a very non-robust behavior as approximation of the softplus function. We study this by repeating the experiment multiple times. In Figure 9 the loss evaluated on the relu function is plotted against the training epoch for a model trained with softplus activation. The approximation shows random behavior and spontaneous instabilities occur. Therefore, the approximation is not robust. In addition, this is shown by the fact that runs occurred where the loss diverged.

6.2 Property Checking

In this experiment we check the models we trained in Section 6.1 for the expressions in Eqs. (14 – 17). The ANNs are translated to SMT assertions using the algorithm presented in Section 5. The sig function was approximated with Algorithm 1 on the interval from -6 to 6, the tanh and softplus functions on the interval from -4 to 4, with a precision of 0.008 in both cases.

Figure 9 Several training runs for the expression in Eq. (17) with softplus training function. Shown is the loss of the relu function as approximation for the softplus function. The lowest curve starting at about 0.4 (pink) is the loss of the softplus function of an individual run. The convergence behavior is very different; the green curve diverges. Some curves (brown, green, purple) show spontaneous instabilities during the training.

Table 2 Results of equivalence checking with the Microsoft z3 solver on the models trained in Section 6.1.

Func. (train / assert)   Eq.    result   assertions   time (sec)
sig / sig                (14)   unsat    9            1.512
                         (15)   unsat    30           26.967
                         (16)   unsat    30           25.271
                         (17)   sat      32           163.537
sig / hsig               (14)   sat      9            2.488
                         (15)   sat      30           7.141
                         (16)   sat      30           7.410
                         (17)   sat      32           8.170
softplus / softplus      (14)   unsat    9            4.023
                         (15)   unsat    30           13.840
                         (16)   unsat    30           13.866
                         (17)   sat      32           20.047
softplus / relu          (14)   sat      9            2.388
                         (15)   unsat    30           5.956
                         (16)   sat      30           6.483
                         (17)   sat      32           7.566
tanh / tanh              (14)   unsat    9            0.982
                         (15)   unsat    30           12.415
                         (16)   unsat    30           14.153
                         (17)   sat      32           21.307
tanh / htanh             (14)   sat      9            0.578
                         (15)   unsat    30           6.425
                         (16)   unsat    30           6.512
                         (17)   sat      32           7.723

As illustrated in Figure 6, in addition to the ANN model, boundary conditions for the model inputs and a property to be checked are required. For the examples examined, these are obvious: since Boolean expressions are considered here, the inputs can only take the values 1 or 0. As property we choose the Boolean expression itself. Thus, the check performed is an equivalence check; the SMT solver completely checks the learned mapping.

For this purpose we invert the Boolean expression and add it as an assertion to the SMT instance. If the solver finds a satisfying assignment, the check returns satisfiable (sat) as result. In this case the mapping is not perfect: the SMT solver has found an assignment for which the Boolean expression is not matched, and the solver specifies it. If there is no such assignment, the result is unsatisfiable (unsat), and the mapping is proven to be complete for the given model and under the approximations for the SMT representation. Since the ANN accurately models the expressions learned on the Boolean functions, this allows an analysis of the translation algorithm and the approximations it contains.
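A sketch of this equivalence check for Eq. (15) with z3py (the layer assertions defining the output symbol y are elided; rounding the output at 0.5 is our assumption):

    from z3 import Real, Or, Not, Solver, sat

    a, b, c = Real('a'), Real('b'), Real('c')
    y = Real('y')   # ANN output symbol, defined by the layer assertions of Sec. 5

    s = Solver()
    # ... layer assertions defining y from a, b, c go here ...
    for v in (a, b, c):
        s.add(Or(v == 0, v == 1))        # Boolean input bounds

    expr = Or(a == 1, b == 1, c == 1)    # Eq. (15): y2 = a or b or c
    s.add(Not(expr == (y >= 0.5)))       # inverted property: look for a mismatch

    if s.check() == sat:
        print('sat:', s.model())         # mapping not perfect, counterexample found
    else:
        print('unsat: mapping proven complete')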

Table 2 shows the results we calculated with the Microsoft z3 solver [23]. The table contains, section by section, the results for all expressions for different combinations of activation functions. The left function in the first column is the function with which the model was trained. The right function in the first column indicates which function was used in the assertion representation.

For the Boolean expression from Eq. (17) it turns out that the representation that we obtain with Algorithm 1 for sig, softplus and tanh is not sufficient to capture the expected perfect mapping. In this case the precision in Algorithm 1 would have to be increased to get the correct result. The particularly good results we achieved for the expressions in Eq. (15) and Eq. (16) using the htanh function in training under tanh are also proven by the solver. As expected, the runtime of the solver for relu, hsig and htanh is significantly shorter than for the functions softplus, sig and tanh, which are much more complex to evaluate.

6.3 Robustness Analysis

For the robustness analysis, we use another simple classification problem. Random points are calculated with a multivariate Gaussian distribution N(\vec{\mu}, \kappa), where \vec{\mu} is the center of the distribution and \kappa is the co-variance matrix. The ANN should estimate the distance from the center \vec{\mu} of the distribution. Five discrete distance classes are defined for this purpose. A point always belongs to the class of the next larger boundary; for example, a point that has a geometric distance of 3.5 LU (Length Units) to the center of the distribution is classified in distance class 4. A point in the center of the distribution belongs to distance class 1. All points with a distance larger than 4 belong to class 5. An exemplary scenario for which a comparable model could be used is the determination of the distance from the measured values of two distance sensors of an autonomous vehicle.

Figure 10 Visualization of the ANN output for different training scenarios (panels: sig, softplus, tanh; sig → hsig, softplus → relu, tanh → htanh; hsig, relu, htanh). Well classified examples are depicted in blue (grayscale print: dark gray) while misclassified examples are depicted in orange (grayscale print: light gray). Black circles mark the defined distance classes.
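The data set for this experiment can be generated along these lines (a sketch; center, co-variance and sample count are hypothetical placeholders):

    import numpy as np

    rng = np.random.default_rng(42)
    mu = np.array([10.0, 12.0])              # hypothetical center
    kappa = np.array([[4.0, 0.0],
                      [0.0, 4.0]])           # hypothetical co-variance matrix

    points = rng.multivariate_normal(mu, kappa, size=5000)
    dist = np.linalg.norm(points - mu, axis=1)          # geometric distance in LU
    labels = np.clip(np.ceil(dist), 1, 5).astype(int)   # classes 1..5, > 4 LU -> 5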

A network size of 20 neurons in each of two hidden layers was used for the following experiments. Since this is a multiclass problem, the model loss was calculated by the categorical cross-entropy and the model adjustment was done with a Nesterov-momentum-enabled Adam optimizer [25, 26]. For the example, all data was drawn from a random distribution. Overfitting was prevented by an early stopping mechanism that monitors the performance of the model on an independent test set. The achieved model accuracies are depicted in Table 3.

Table 3 Training results evaluated on the test data set.

act        apx     Lact     Lapx     accact (%)   accapx (%)
sig        hsig    0.1809   0.5303   95.36        77.99
softplus   relu    0.1648   4.5480   93.99        36.39
tanh       htanh   0.2191   0.5842   92.83        79.07
hsig       -       0.1674   -        93.63        -
relu       -       0.1343   -        94.67        -
htanh      -       0.2312   -        91.25        -

We have trained the described ANN for all presented activation functions and their approximations. In Figure 10 the result of the ANN on the test dataset is evaluated: data points that the ANN places in the correct class are shown against those that are wrongly classified by the ANN. According to expectation, the trained ANNs do not reach 100 % accuracy (see Table 3). Especially at the transition between two distance classes, it is more likely that wrong decisions are made. Furthermore, the figure contains an investigation comparable to that from Section 6.1: the ANN was trained on an exact activation function, while the approximated function was used for evaluation. For these three cases the result generally deteriorates. However, the graphical evaluation also shows that for the approximations with hsig or htanh functions there is an improvement in the classification of distance class 1. In the scope of future work, a safety property will be investigated, for example a collision detector which requires zero incorrect classifications for points in distance class 1.

7 Conclusion

Besides the case studies in the previous section, a contribution of this work is a generalized interface to translate ANNs into a graph structure. Having this graph, the translation into a symbolic problem is straightforward; with the current state of our tool, almost all typical network topologies can be translated. However, the verification of an ANN remains limited by the runtime complexity.

Our case study has clearly shown that the approximation of softplus correlations by relu functions is of limited use. In contrast, approximations for sigmoid and tanh activation functions can even improve the result in individual cases. The proof of robustness properties can be made only with certain restrictions. Our future work will focus on these limitations and try to determine them automatically.

8 References

[1] S. I. Gallant, "Perceptron-based learning algorithms," IEEE Transactions on Neural Networks, vol. 1, no. 2, pp. 179–191, 1990.

[2] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.

[3] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.

[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[5] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., "End to end learning for self-driving cars," arXiv preprint arXiv:1604.07316, 2016.

[6] K. D. Julian, J. Lopez, J. S. Brush, M. P. Owen, and M. J. Kochenderfer, "Policy compression for aircraft collision avoidance systems," in Digital Avionics Systems Conference (DASC), 2016 IEEE/AIAA 35th. IEEE, 2016, pp. 1–10.

[7] D. M. Hawkins, "The problem of overfitting," Journal of Chemical Information and Computer Sciences, vol. 44, no. 1, pp. 1–12, 2004.

[8] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199, 2013.

[9] A. Kurakin, I. Goodfellow, and S. Bengio, "Adversarial examples in the physical world," arXiv preprint arXiv:1607.02533, 2016.

[10] I. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," arXiv preprint arXiv:1412.6572, 2014.

[11] L. Pulina and A. Tacchella, "An abstraction-refinement approach to verification of artificial neural networks," in International Conference on Computer Aided Verification. Springer, 2010, pp. 243–257.

[12] L. Pulina and A. Tacchella, "Challenging SMT solvers to verify neural networks," AI Communications, vol. 25, no. 2, pp. 117–135, 2012.

[13] K. Scheibler, L. Winterer, R. Wimmer, and B. Becker, "Towards verification of artificial neural networks," in MBMV, 2015, pp. 30–40.

[14] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer, "Reluplex: An efficient SMT solver for verifying deep neural networks," in International Conference on Computer Aided Verification. Springer, 2017, pp. 97–117.

[15] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer, "Towards proving the adversarial robustness of deep neural networks," arXiv preprint arXiv:1709.02802, 2017.

[16] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.

[17] R. Hecht-Nielsen, "Theory of the backpropagation neural network," Neural Networks for Perception, pp. 65–93, 1992.

[18] Z. Yanling, D. Bimin, and W. Zhanrong, "Analysis and study of perceptron to solve XOR problem," in Autonomous Decentralized System, 2002. The 2nd International Workshop on. IEEE, 2002, pp. 168–173.

[19] H. W. Johnson, M. Graham et al., High-Speed Digital Design: A Handbook of Black Magic. Prentice Hall, Upper Saddle River, NJ, 1993, vol. 1.

[20] F. Chollet et al., "Keras," https://keras.io, 2015.

[21] M. Abadi et al., "Tensorflow: A system for large-scale machine learning," in OSDI, vol. 16, 2016, pp. 265–283.

[22] G. Rossum, "Python reference manual," 1995.

[23] L. De Moura and N. Bjørner, "Z3: An efficient SMT solver," in International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 2008, pp. 337–340.

[24] M. D. Zeiler, "Adadelta: An adaptive learning rate method," arXiv preprint arXiv:1212.5701, 2012.

[25] I. Sutskever, J. Martens, G. E. Dahl, and G. E. Hinton, "On the importance of initialization and momentum in deep learning," in ICML (3), vol. 28, 2013, pp. 1139–1147.

[26] T. Dozat, "Incorporating Nesterov momentum into Adam," 2016.


Self-Explaining Digital Systems – Some Technical Steps

Goerschwin Fey1, Rolf Drechsler2,3

1Hamburg University of Technology, 21071 Hamburg
2University of Bremen, 28359 Bremen
3DFKI, 28359 Bremen

Abstract

Today's increasingly complex adaptable and autonomous systems are hard to design and difficult to use. Partly this is due to problems in understanding why a system executes certain actions. We propose to extend digital systems such that they can explain their actions to users and designers. We formalize this as self-explanation and show how to implement and verify a self-explaining system. A robot controller serves as proof-of-concept for self-explanation.

1 Introduction

Digital systems continuously increase in their complexity due to the integration of various new features. Systems handle failures and have complex decision mechanisms for adaptability and autonomy. Understanding why a system performs certain actions becomes more and more difficult for users. Also designers have to cope with the complexity while developing the system or parts of it. The main difficulties are the inaccessibility of the inner logic of the digital system or a lack in understanding all the details. An explanation for actions executed by a digital system unveils the reasons for these actions and, by this, can serve various purposes.

From the outside a user may be puzzled why a technical device performs a certain action, e.g., "why does the traffic light turn red?" In simple cases the user will know the reason, e.g., "a pedestrian pushed the button, so pedestrians get green light, cars get red light". In more complex cases, explanations for actions may not be as easily accessible. When the digital system that controls the larger technical device provides an explanation, the user can understand why something happens. This raises the user's confidence in the correct behavior. The explanation for actions required in this case must refer to external input to the system, e.g., through sensors, and to an abstraction of the internal state that is understandable for a user.

Also designers of digital systems can benefit from explanations. A typical design task is debugging, where a designer has to find the reason for certain actions executed by a digital system. Depending on the current design task a designer may use the same explanations that help users. Additionally, more detailed explanations, e.g., justifying data exchange between functional units, may be useful. Thus, debugging and development are supported by explanations giving simple access points for a designer justifying the system's execution paths. At design time a designer can use explanations to understand the relation between the specification and the implementation.

Correctness of the system is validated through explanations if these explanations provide an alternative view that justifies the actual output. For in-field operation, explanations may even be exploited for monitoring as a side-check that validates the actual execution of the system to detect failures and unexpected usage. In particular, problems are

detected earlier when explanations cannot be generated, are not well-formed, or are not consistent with respect to the actual behavior.

Given a digital system, the question is how to provide an explanation for observable actions online. While on first sight this mainly concerns functional aspects, also non-functional aspects like actual power consumption or response time of the system deserve explanations.

During online operation either the system itself or some dedicated additional entity must provide the explanations. This incurs a cost, e.g., for storing historical data that explains and, by this, also justifies current and future actions. This overhead must be kept as low as possible.

A non-trivial challenge is to provide concise explanations in a cost-efficient way. While some actions of a system may have very simple explanations, e.g., "the power-on button has been pressed", other actions may require a deep understanding of the system, e.g., "when the distance to an energy source is large and the battery level is low, we save energy by reducing light as well as speed and move towards the energy source". Such an explanation may in turn require knowledge about what an energy source is, what thresholds are used, and how the system detects where the next energy source may be found.

Our contributions are the following:

• We formalize explanations and define what a self-explaining system is. We explain how to verify whether a system is self-explaining.

• We provide a technical solution for explanations on the functional level and explain how to automatically infer and to extend explanations to non-functional aspects.

• We consider a robot controller implemented at the register transfer level in Verilog as a case study.

The paper is structured as follows: While there is no directly related work, Section 2 considers the aspect of explanation in other areas. Section 3 formalizes explanations and defines a self-explaining system, its implementation and verification. Section 4 studies a self-explaining controller for an autonomous robot and explains how explanations may automatically be inferred. Section 5 draws conclusions.


2 Related Work

The concept of self-explaining digital systems is new but related to explanation as understood in other domains. Thus, there is no tightly related work. But various communities are interested in a deeper understanding of systems, implementations, or algorithms for different reasons, as discussed in the following.

Causation has a long history in philosophy where [15] is a more recent approach that relates events and their causes in chains such that one event can cause a next one. Often underlying hidden relations make this simple approach controversial. A rigorous mathematical approach instead can use statistical models to cope with non-understood as well as truly non-deterministic dependencies [26]. Artificial intelligence, particularly in the form of artificial neural networks, made significant progress in the recent past modeling such relations. However, given an artificial neural network it is not understandable how it internally processes data, e.g., what kind of features from the data samples are used or extracted, how they are represented, etc. First approaches to reconstruct this information in the input space have been proposed [9, 21].

Decision procedures are a class of very complex algorithms producing results needed to formally certify the integrity of systems. The pairing of complexity and certification stimulated the search for understanding the verdict provided by a decision procedure. Typically, this verdict either yields a feasible solution to some task, e.g., a satisfying assignment in case of a Boolean satisfiability (SAT) solver, or denies the existence of any solution at all, e.g., unsatisfiability in case of SAT solving. A feasible solution can easily be checked. Understanding why some task cannot be solved is more difficult. Proofs [10, 30], unsatisfiable cores [25] or Craig interpolants [11] provide natural explanations.

Understanding complex programs is a tedious task requiring tool support [29]. One example is the analysis of data-flow in programs and of root causes for certain output. Static [28] and dynamic [13] slicing show how specific data has been produced by a program. Dynamic dependency graphs track the behavior, e.g., to extract formal properties [19].

Debugging circuits is hard due to the lack of observability into a chip. Trace buffers provide an opportunity to record internal signals [5]. The careful selection of signals [18] and their processing allows to reconstruct longer traces. Coupling with software extensions allows to much more accurately pinpoint time windows for recording [16].

Verification requires a deep understanding of a system's functionality. Model checking is a well established and automated approach for formal verification. Typically, logic languages like Linear Temporal Logic (LTL), Computation Tree Logic (CTL), or SystemVerilog Assertions (SVA) are used to express properties that are then verified. These properties summarize the functionality of a design in a different way and thus explain the behavior. Verification methodology [2, 1] ensures that properties capture an abstraction rather than the technical details of an implementation.

Beyond pure design-time verification is the idea of proof carrying code to allow for a simplified online verification before execution [24].

Self-awareness of computing systems [12] on various levels has been proposed as a concept to improve online adaption and optimization. Application areas range from the hardware level to the coordination of production processes, e.g., [23, 27]. The information extracted for self-awareness relates to explanation, usually focused towards a specific optimization goal.

While all these aspects relate to explanation, self-explanation has been rarely discussed. For organic computing, self-explanation has been postulated as a useful concept for increasing acceptance by users [22]. The human-oriented aspect has intensively been studied in intelligent human computer interfaces and support systems [20]. Self-explanation has also been proposed for software systems, although limited to the narrow domain of agent-based software [7], and has mainly been studied in the form of ontologies for information retrieval [8]. Expert systems as one very relevant domain in artificial intelligence formalize and reason on knowledge within a specific context with the goal to diagnose, control, and/or explain. Particularly, real-time expert systems have been proposed, e.g., for fault tolerance [17]. Aspects like online reasoning on formalized knowledge have been considered in this domain.

The overview in [6] introduced an abstract concept for self-explanation and also discusses in-field verification and security under reconfiguration. However, that paper lacks the technical details on self-explanation as it does neither provide a formalization nor an implementation approach nor a case study.

This brief overview of very diverse works in several fields shows that understanding a system has a long tradition and is extremely important. Recent advances in autonomy and complexity reinforce this demand. In contrast to previous work, we show how to turn a given digital system into a self-explaining system.

3 Self-Explanation

Figure 1 gives a high-level view of self-explanation as proposed here. The digital system is enhanced by a layer for self-explanation that holds a – potentially abstracted – model of the system. Any action executed by the system at a certain point in time is an event (bold black arrows in the figure). The explanation layer stores events and their immediate causes as an explanation and provides a unique tag to the system (dotted black arrows). While processing data, the system relates follow-up events to previous ones based on these tags (blue zig-zag arrows). Besides events, references to the specification can provide causes for actions. The user or designer may retrieve an explanation for events observable at the output of the system as a cause-effect chain (green dots connected by arrows). This cause-effect chain only refers to input provided to the system, the – abstracted – system model, and the specification.



Figure 1 Approach

In the following we formalize self-explanation and provide an approach for implementation and verification. We also propose a conceptual framework that uses different layers for making explanations more digestible for designers and users, respectively.

3.1 Formalizing Explanations

We consider explanations in terms of cause-effect relationships. Before defining explanations we describe our system model. The system is represented by a set of variables V composed of disjoint sets of input variables I, output variables O, and internal variables. A variable is mapped to a value at any time while the system executes.

This system model is quite general. For a digital system a variable may correspond to a variable in software or to a signal in (digital) hardware. For a cyber-physical system a variable may also represent the activation of an actuator or a message sent over the network.

Based on this system model we introduce our notion of actions, events, causes, and explanations to formalize them afterwards. An action of a system fixes a subset of variables to certain values.¹ An observable action fixes observable output values of the system. An input action fixes input variables that are not controlled by the system, but by the environment. An action executed at a specific point in time by the running system is an event. We assume that a set of requirements is available for the system from a precise specification. A cause is an event or a requirement. An explanation for an event consists of one or more causes.

These terms now need more formal definitions to reason about explanations. An action assigns values to a subset of either I, O or V \ (O ∪ I) of the variables introduced above. We define an ordered set of actions 𝒜 with i(a) for a ∈ 𝒜 providing the unique index of a, a set of requirements R and the set of explanations E ⊆ 𝒜 × ℕ × 2^R × 2^𝒜 × ℕ^{|𝒜|}. An explanation e = (a, t, R, A, T) ∈ E relates the action a with unique tag t, i.e., the event (a, t), to its causes. The tag t may be thought of as the value of a system-wide wall-clock time when executing the action. However, such a strong notion of timing is not mandatory. Having the same tag for a particular action occurring at most once is sufficient for our purpose and is easier to implement in an asynchronous distributed system. The vector T in an explanation relates all actions in A to their unique tags using the index function i(a) such that a ∈ A is related to the event (a, T_{i(a)}), where T_j denotes the jth element of vector T. Since A ⊆ 𝒜, the relation |A| ≤ |T| holds, so unused tags in T are simply disregarded. Technically, the reference to prior events directly refers to their explanations. Note that there may be multiple justifications for the same action, e.g., the light may be turned off because there is sufficient ambient light or because the battery is low. We require such ambiguities to be resolved during run time based on the actual implementation of the system.

Lewis [15] requires for counterfactual dependence of an event e on its cause c that c → e and ¬c → ¬e. However, an event is identified with precisely the actual occurrence of this event. There may be alternative ways to cause a similar event, but the actual event e was precisely due to the cause c. Consider the event that "the window was broken by a stone thrown by Joe". The window may have alternatively been broken by a ball thrown by Joe, but this would have been a different "window broken" event. Lewis achieves this precision by associating the actual event e with a proposition O(e) that is true iff e occurs and false otherwise. These propositions allow to abstract from the imprecise natural language. Here we achieve this precision by adding tags to actions.

Lewis [15] defines causation as a transitive relationship where the cause of an event is an event itself that has its own causes. Similarly, we go from an event to the causes and from these to their causes until reaching requirements or inputs of the system.

¹An extension of our formalism could consider more complex actions that include a certain series of assignments over time, e.g., to first send an address and afterwards data over a communication channel. However, for simplicity we assume here that an appropriate abstraction layer is available. Nonetheless, multiple valuations of the variables may be associated to the same action, e.g., the action "moving towards front left" may abstract from the radius of the curve followed by the system.

Definition 1 For an explanation e = (a, t, R, A, T), the immediate set of explanations is given by E(e) = {e′ = (a′, t′, R′, A′, T′) ∈ E | a′ ∈ A and t′ = T_{i(a′)}}.

Definition 2 We define the full set of explanations E*(e) as the transitive extension of E(e) with respect to the causing events, i.e., if e′ = (a′, t′, R′, A′, T′) ∈ E*(e) and there exists e′′ = (a′′, t′′, R′′, A′′, T′′) ∈ E with a′′ ∈ A′ and t′′ = T′_{i(a′′)}, then e′′ ∈ E*(e).

Now we define well-formed explanations that provide a unique explanation for any action and must ultimately be explained by input data and requirements only:

Definition 3 A set of explanations E is well-formed iff

1. for any e = (a, t, R, A, T) ∈ E there does not exist e′ = (a, t, R′, A′, T′) ∈ E*(e) with (R, A, T) ≠ (R′, A′, T′),

2. for any e ∈ E, if e′ = (a′, t′, R′, A′, T′) ∈ E*(e), then for any a′′ ∈ A′ \ A′↓I, where A′↓I is the set of actions in A′ that fix values of inputs I, there exists (a′′, t′′, R′′, A′′, T′′) ∈ E*(e).



Figure 2 Implementation

Note that our notation is similar to classical message-based events for formalizing asynchronous distributed systems, e.g., used in the seminal work of Lamport [14] that explains how to deduce a system-wide wall-clock time. An important difference is, however, that in our case the execution of the system perfectly describes the order of events. An explanation then captures this order without additional mechanisms for synchronization. The only requirement is the association of an action with a unique tag to form an event, i.e., any action occurs at most once with a particular tag.

Our formalism provides the basis for extending an actual digital system to provide explanations for each observable action. The sets of variables, actions, requirements, and explanations are freely defined. This leaves freedom to decide on the granularity of explanations available during run time, e.g., whether an action only captures the driving direction of a robot or the precise values of motor control signals.

The set of observable actions must be derived at first. The methodology must ensure that for each possible system output there is a related action.

Definition 4 A set of observable actions is complete with respect to a given system iff for any observable output of the system there exists a related observable action.

Definition 5 A set of explanations is complete iff it is well-formed and explains all observable actions.

Definition 6 A digital system is self-explaining iff it has a complete set of observable actions and creates a complete set of explanations.

3.2 Implementation

Practically, explanations are produced by adding appropriate statements to the design description. To create the cause-effect chain, we think of functional units that are connected to each other. A functional unit may be a hardware module or a software function. To produce explanations for the actions, each unit records the actions and their explanations from preceding units together with the input. By this, data being processed can always be related to its causes; likewise, actions triggered by that data can be associated to their causes.

In the following we describe a concrete approach to implement a self-explaining digital system. Functional units derive causes for their actions. We associate an explanation unit for storage, reference, and usage of explanations to each functional unit. Whenever a functional unit executes an action, the cause for that action is forwarded to

Figure 3 Robot (lightsensor (ls), push-buttons (pb), microphones (mi), active wheels (wl) left (l) / right (r), passive wheel; sides: front (f), back (b))

the explanation unit. The explanation unit then provides unique tags for the action to form an event, merges it with the cause, and stores the resulting explanation. Other functional units query the explanation unit to associate incoming data with an event and its explanation. This information must then be passed jointly while processing the data to provide the causes for an action. Figure 2 illustrates this. Functional unit FU1 executes an action a passed to functional unit FU2. The cause c of a is stored in explanation unit EU1 that provides a unique tag t. FU2 refers to the event (a, t) to derive causes for its own actions.

For this step we rely on the designer to enhance the implementation with functionality to pass causes and drive explanation units by adding appropriate code. The designer also decides whether actions are defined in terms of existing variables of the design or whether new variables are introduced to allow for abstraction.
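To make the mechanism concrete, a behavioral model of an explanation unit can be sketched in a few lines of Python (the paper realizes it as a Verilog memory module; the wrap-around of tags anticipates Section 4.1):

    class ExplanationUnit:
        """Memory that stores an event's causes; the address is the unique tag."""

        def __init__(self, entries):
            self.memory = [None] * entries
            self.next_tag = 0

        def store(self, action, requirements, causing_events):
            tag = self.next_tag
            self.memory[tag] = (action, requirements, causing_events)
            self.next_tag = (tag + 1) % len(self.memory)   # wrap-around arithmetic
            return tag                                     # event = (action, tag)

        def lookup(self, tag):
            return self.memory[tag]

    # FU1 stores the cause of its action; FU2 later refers to the event (a, t)
    eu1 = ExplanationUnit(entries=32)
    t = eu1.store(action='a', requirements=['R2'],
                  causing_events=[('Sensors', 17), ('Power', 2)])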

4 Case Study

We apply our approach to a small robot controller that explains actions executed with the controlled robot. Figure 3 shows an abstract view of the actual robot. The robot has wheels on the left and on the right side, each equipped with a motor that is commanded by the robot controller. The passive wheel on the back turns freely such that by commanding the two motors the robot controller can steer and move the robot. The main sensors of the robot are light sensors and microphones on the four sides as well as eight push-buttons at its corners to detect collisions.

The specification in Table 1 describes the functionality. The robot controller moves the robot towards the noise detected by the microphones as long as the power levels indicated by the battery are sufficient. When power gets low, the controller steers towards the light detected by the light sensors. Upon a collision detected by a push-button, the robot turns away from that button's contact point.

The four boxes shown in Figure 4 implement the robot controller in Verilog modules. Thus, in this case a functional unit directly corresponds to a Verilog module. Sensors and battery provide input data to the controller that provides output to the motors. The battery state directly impacts the motor speed.


Table 1 Specification

No.   content
R0    There are three battery levels: strong, medium, low.
R1    If battery level is strong, move towards noise.
R2    Unless battery level is strong, move towards light.
R3    If battery level is low, use only half speed.
R4    If push-button is pressed, move towards other direction, overriding requirements R0 to R3.

Figure 4 Modules of the robot controller (Main, Motor, Power, Sensor; connected to the actuators, sensors, and battery)

4.1 Adding Explanations

We consider cause-effect chains on the unit level where actions fix the output values. Each module is equipped with an extra output that provides all causes for an action to form an explanation for each event. All kinds of actions are known in advance, so their dependence on causes is hard-coded into the system. Each module explains all its output data. The causes for the output data of one module are generated by the preceding module's actions and requirements, so the causes explaining an action are encoded as a bit string for referencing them.

The explanations of the robot controller already make an abstraction from actual data. For example, instead of explaining the precise speed of the two motors, this is abstracted to one of the driving directions "straight", "forward left", "forward right", or "turn".

To have reproducible explanations and record their dependence, we equip every module with a separate explanation module. The explanation module stores explanations and provides unique tags for events. An explanation module essentially is a memory that stores the explanation, which is a bit vector encoding the causes for an action. The memory address serves as the unique tag for the event associated to the current action of the respective module. By this, the unique tag also serves as reference to the explanation for the event. This tag is accessible for subsequent modules to produce their explanations. Uniqueness of these tags for events is subject to the limited size of the memory. By using a simple wrap-around arithmetic for the memory addresses, the size of the memory in the explanation module decides on the length of the history that can be recorded.

Table 2 Implementation sizes
Column "entries": number of addresses in explanation units
Columns "#state bits" and "#gates": size of the implementation

                    entries   #state bits   #gates
no explanation      -         113           5,692
with explanation    4         437           8,643
with explanation    32        2,250         21,714
with explanation    256       16,605       123,572

For example, the main module's explanations always depend on the actions of the sensor module and the power module together with the respective requirements. Receiving data from the power module or the sensor module corresponds to an action of these modules associated to an explanation with a unique tag. The main module stores the unique tags for the explanations to generate the explanation for its own action. This totals to 20 bits; in our implementation we used 24 bits to conveniently represent direction and sensors using hexadecimal digits. Explanations for the other modules have different lengths depending on their needs.

4.2 Results

Figure 5 shows an excerpt of the recorded explanations where nodes denote events and edges lead from a cause to an event. In this excerpt, sensor input and power state ultimately explain driving direction and speed. Node "Main: 21" gives an explanation with unique tag "21" for the main module. According to the powerstate medium (node "Power: 02") and requirement R2, the robot goes "straight" to the lightsensors "ls". This is one reason for the observable actions in nodes "Motor_left: 21" and "Motor_right: 21". The other reason is the current powerstate.

Figure 6 shows a similar explanation, but the powerstate changed from low to medium after deciding direction and speed in the main module and before adjusting the speed in the motor driver. Whether this is wanted or not depends on the implementation. Definitely, this explanation gives some insight into the behavior.

The original design has 257 lines of code, extensions for self-explanation require 119 lines, and the explanation unit has 28 lines. Table 2 gives an impression about the design and the cost for explanation. The numbers of state bits and gates are shown for four configurations: the plain robot controller without explanation and with explanation with sizes of 4, 32, and 256 entries in the memories of the explanation modules. In the table these memories are counted as state bits plus the decoding and encoding logic that adds to the gates in the circuit. For memories with 256 entries about 2 KByte of memory are required (the numbers in the table count bits). Note that the encoding of explanations was not optimized for size. The main aims were a simplified implementation and easily separable reasons in a hexadecimal representation. A rather typical implementation of the controller would use microcontrollers connected over buses instead of pure Verilog. In that case a 2 KByte overhead for explanation would be rather small.


Figure 5 First excerpt from the explanations (nodes: "Main: 21, Act: straight ls, powerNotStrong"; "Motor_left: 21"; "Motor_right: 21"; "Power: 02, Act: medium"; "Sensors: 17, Act: changed: ls")

Figure 6 Second excerpt from the explanations (nodes: "Sensors: 1f, Act: changed: ls"; "Main: 37, Act: turn ls, powerNotStrong"; "Motor_left: 37"; "Motor_right: 37"; "Power: 03, Act: low"; "Power: 04, Act: medium")

Table 3 Wrap around in tags for a trace of 10,000 cycles
Column "entries": number of addresses in explanation units
Other columns: number of wrap arounds for unique tags of modules

entries   Main   Motor_left   Motor_right   Power   Sensor
4         269    252          252           15      158
32        32     30           30            1       18
256       3      2            2             0       1

The number of entries in the memories decides for how long an explanation can be traced back before the unique tags for explanations wrap to zero again, i.e., are not unique anymore. This restricts the self-explanation to recent history. Table 3 shows how many times the tags were set back to zero for the different explanation units in a run of 10,000 cycles. The numbers of wrap arounds per module are different as the number of events also differs between the modules. Some of the events of one module do not necessarily trigger a follow-up event in a subsequent module, e.g., values of the microphones are only relevant if the robot currently follows the sound. With 256 entries the length of the history increases to approximately 3,300 cycles for the main module having 3 wrap arounds.

Obviously, optimizations in the size required for explanations are possible, e.g., by adjusting the number of entries of explanation units per module or by encoding explanations in fewer bits. But this is not the scope of this paper, which focuses on the concept of self-explanation.

4.3 Reasoning about Explanations

Having the design enhanced with explanations immediately supports a user or a designer in understanding the design's actions. Additionally, consistency of the explanations and the related actions is an interesting question. Due to the abstraction, e.g., in case of the driving direction, it may not be fully clear what kind of actions precisely correspond to an explanation. We give some examples how to clarify this using model checking. We assume that the reader is familiar with model checking [4], so we do not provide any details for this process.

Considering the main module, some facts can be analyzed by model checking, e.g., if the explanation of the main module says a certain action means moving "straight", this should imply that both motors are commanded to move in the same direction with the same speed. Indeed the simple robot controller always moves forward at full speed. In CTL this is proven using the formula:

AG( exp[23:20] = straight → ( speed_left[7:0] = 255 ∧ speed_right[7:0] = 255 ∧ direction_right = fwd ∧ direction_left = fwd ) )

The 24-bit vector "exp" refers to the explanation of the main module where only the bits corresponding to the description of the action are selected; the 8-bit vectors "speed_right" and "speed_left" correspond to the speed for the left and right motor, respectively; likewise the "direction" variables.

Similar facts can be formalized for other situations. Using a more expressive language like SVA, properties may be formulated in an even nicer way, e.g., using expressions over bit-vectors. The underlying concepts for explanation remain the same.

4.4 Extensions

Currently, an action is defined to be a variable assignment. In practice, more complex actions may be of interest, e.g., performing a burst access to a communication resource. Appropriate extensions are possible by allowing a more general specification of an action, e.g., in terms of a formal property language that describes conditional sequential traces.

We propose completeness and well-formedness as basic criteria for self-explanation. Further properties of interest are aspects like determinism or consistency with an environment model. The systems considered here are limited to generating explanations for themselves and out of the available view onto the environment, which is largely unknown to the system. If the system itself incorporates a more detailed model of the environment, the expected impact on the environment can also be incorporated into the explanations. This provides an even deeper insight for the observer of the system and would immediately allow judging the consistency of explanations with the actual behavior. Potentially this serves as the basis for an autonomous diagnosis loop.

Non-functional aspects like reaction time or power consumption similarly require self-reflexive functionality in the system, e.g., to determine the current processing load or current sensor activity, and a prediction of future activities. This again can be seen as a model of the environment within the digital system.


4.5 Automated Inference

Manually enhancing a design for self-explanation may be time consuming, so further automation is useful. Technically, one option to automatically derive explanations is the use of model checking engines. Given a precise specification of an observable action in terms of a formal language, model checking can derive all possible ways to execute this observable action. Logic queries [3] may serve as a natural tool to identify causes. Deriving these causes in terms of inputs of a functional unit, and then continuing to preceding functional units, allows well-formed explanations to be derived automatically. Completeness must be ensured by formalizing all observable actions properly.
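For illustration (our sketch in the style of [3], not a formulation from the paper), such a query replaces a subformula by the placeholder "?", and a query solver computes the strongest fill-in over the inputs of a functional unit, e.g.:

AG( ? → exp[23:20] = straight )

whose solutions enumerate the input conditions under which the "straight" explanation is produced.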

5 Conclusions

Future complex systems driving real-world processes must be self-explaining. Naturally, our proposal is just one technical solution that cannot consider many of the alternative ways to create a self-explaining system. Our paper provides a formal notion of self-explanation and a proof-of-concept realization. We studied a robot controller as a use case. We gave an idea of how to automatically provide self-explanations. The extension to reactive systems in general, and to systems where new actions may be defined on the fly, remains for future work.

6 References

[1] P. Basu, S. Das, A. Banerjee, P. Dasgupta, P. P. Chakrabarti, C. R. Mohan, L. Fix, and R. Armoni. Design-intent coverage: A new paradigm for formal property verification. IEEE Trans. on CAD, 25(10):1922–1934, 2006.

[2] Jörg Bormann. Complete Functional Verification. PhD thesis, University of Kaiserslautern, 2009. English translation 2017.

[3] William Chan. Temporal-logic queries. In Computer Aided Verification, volume 1855 of Lecture Notes in Computer Science, pages 450–463, 2000.

[4] Edmund M. Clarke, Orna Grumberg, and Doron Peled. Model Checking. MIT Press, 2001.

[5] Sergej Deutsch and Krishnendu Chakrabarty. Massive signal tracing using on-chip DRAM for in-system silicon debug. In Int'l Test Conf., pages 1–10, 2014.

[6] Rolf Drechsler, Christoph Lüth, Görschwin Fey, and Tim Güneysu. Towards self-explaining digital systems: A design methodology for the next generation. In International Verification and Security Workshop (IVSW), pages 1–6, 2018.

[7] Johannes Fähndrich, Sebastian Ahrndt, and Sahin Albayrak. Towards self-explaining agents. In Trends in Practical Applications of Agents and Multiagent Systems, pages 147–154, 2013.

[8] Johannes Fähndrich, Sebastian Ahrndt, and Sahin Albayrak. Self-explanation through semantic annotation: A survey. In Position Papers of the 2015 Federated Conference on Computer Science and Information Systems (FedCSIS), 2015.

[9] Raphael Féraud and Fabrice Clérot. A methodology to explain neural network classification. Neural Networks, 15(2):237–246, 2002.

[10] Eugene Goldberg and Yakov Novikov. Verification of proofs of unsatisfiability for CNF formulas. In Design, Automation and Test in Europe, pages 886–891, 2003.

[11] B. Keng and A. Veneris. Scaling VLSI design debugging with interpolation. In Int'l Conf. on Formal Methods in CAD, pages 144–151, 2009.

[12] Jeffrey O. Kephart and David M. Chess. The vision of autonomic computing. Computer, 36(1):41–50, 2003.

[13] Bogdan Korel and Janusz Laski. Dynamic program slicing. Information Processing Letters, 29(3):155–163, 1988.

[14] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, July 1978.

[15] David Lewis. Causation. Journal of Philosophy, 70(17):556–567, 1973.

[16] David Lin, Eshan Singh, Clark Barrett, and Subhasish Mitra. A structured approach to post-silicon validation and debug using symbolic quick error detection. In Int'l Test Conf., pages 1–10, 2015.

[17] Wei Liu. Real-time fault-tolerant control systems. In Cornelius T. Leondes, editor, Expert Systems, pages 267–304. Academic Press, 2002.

[18] Xiao Liu and Qiang Xu. Trace signal selection for visibility enhancement in post-silicon validation. In Design, Automation and Test in Europe, pages 1338–1343, 2009.

[19] Jan Malburg, Tino Flenker, and Görschwin Fey. Property mining using dynamic dependency graphs. In ASP Design Automation Conf., pages 244–250, 2017.

[20] Mark T. Maybury and Wolfgang Wahlster, editors. Readings in Intelligent User Interfaces. Morgan Kaufmann Publishers Inc., 1998.

[21] Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15, 2018.

[22] Christian Müller-Schloer and Sven Tomforde. Organic Computing – Technical Systems for Survival in the Real World. Birkhäuser, 2017.

[23] Mischa Möstl, Johannes Schlatow, Rolf Ernst, Henry Hoffmann, Arif Merchant, and Alexander Shraer. Self-aware systems for the internet-of-things. In International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pages 1–9, 2016.


[24] George C. Necula and Peter Lee. Safe, untrusted agents using proof-carrying code. In Giovanni Vigna, editor, Mobile Agents and Security, pages 61–91. Springer Berlin Heidelberg, 1998.

[25] Yoonna Oh, Maher N. Mneimneh, Zaher S. Andraus, Karem A. Sakallah, and Igor L. Markov. AMUSE: a minimally-unsatisfiable subformula extractor. In Design Automation Conf., pages 518–523, 2004.

[26] Judea Pearl. Causality. Cambridge University Press, 2010.

[27] Lydia C. Siafara, Hedyeh A. Kholerdi, Aleksey Bratukhin, Nima Taherinejad, and Axel Jantsch. SAMBA – an architecture for adaptive cognitive control of distributed cyber-physical production systems based on its self-awareness. Elektrotechnik und Informationstechnik, 135(3):270–277, 2018.

[28] Mark Weiser. Program slicing. In International Conference on Software Engineering, pages 439–449, 1981.

[29] Steven Woods and Qiang Yang. The program understanding problem: analysis and a heuristic approach. In International Conference on Software Engineering, pages 6–15, 1996.

[30] Lintao Zhang and Sharad Malik. Validating SAT solvers using an independent resolution-based checker: Practical implementations and other applications. In Design, Automation and Test in Europe, pages 880–885, 2003.


Measuring NoC fault tolerance with performability
Jie Hou, Martin Radetzki
Chair of Embedded Systems, University of Stuttgart, Germany

Abstract

With the development of integration technology, transistor density has increased drastically. This enables microprocessors to evolve from single-core to many-core architectures, which can satisfy the increasing demand for higher on-chip computing power. As traditional bus-based communication limits the performance of such systems, the Network-on-Chip (NoC) was proposed as an interconnection network for them. It is a highly scalable, bandwidth-efficient, packet-switched network containing a certain number of routers and links. Different network topologies have been proposed and researched in the NoC area; a widely used topology is the mesh, in which redundant routes exist [1]. Technology scaling increases the susceptibility to failures in the NoC's components [2]. However, fault-tolerant routing algorithms take advantage of redundant routes to bypass faults. Therefore, a mesh-based NoC can be classified as a fault-tolerant system. Performance analysis of such systems requires taking into account the impact of faults and the likelihood of their occurrence. This can be achieved with the concept of performability, which combines the measures of performance and reliability [3].

Different modeling techniques can be used for combining performance and reliability analysis. The Markov reward model is the commonly used tool for performability analysis. It extends a continuous-time Markov chain (CTMC) by assigning a reward to each state [4, 5, 6]. When a system moves from one operational state to another because of a failure or a repair, the performance level may change. Such system behavior is normally modeled by different states in a CTMC. Usually, a CTMC is described by a generator matrix, which contains information about states and the transitions between them. A reward assigned to a state denotes the performance level provided by the system while it is in that state. Three classes of performability measures are commonly used: long-term steady-state, transient, and cumulative performabilities.

In our work, we apply the Markov reward model to investigate performabilities of mesh-based NoCs under fault-tolerant routing, based on two performance metrics: communication time and fault resilience. We demonstrate an applicable method to model a mesh-based NoC as a CTMC. As the size of meshes increases, the number of states in their corresponding CTMCs grows very quickly. To be able to evaluate performabilities of large-size meshes, we propose novel approaches to computing the communication time and fault resilience quickly. The accuracy and speedup of our approaches were compared with a cycle-accurate NoC simulator named wormsim [7]. Based on the experimental results, we conclude that our approaches to estimating communication time and computing fault resilience are on average 78x and 100x faster, respectively, compared to wormsim. We further compare the performabilities of three different routing algorithms. Moreover, we investigate how performability develops with scaling towards larger NoCs.
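To make the underlying idea concrete, here is a minimal sketch (the two-state model, rates, and rewards are our illustration, not the models used in the paper): with failure rate lambda, repair rate mu, and a reward per state, the long-term steady-state performability is the reward weighted by the steady-state probabilities.

#include <stdio.h>

/* Two-state Markov reward model (sketch): state "up" with reward
   r_up, state "degraded" with reward r_deg; lambda = failure rate
   (up -> degraded), mu = repair rate (degraded -> up). */
int main(void)
{
    double lambda = 1e-4, mu = 1e-2;  /* illustrative rates */
    double r_up = 1.0, r_deg = 0.6;   /* illustrative performance levels */

    double pi_up  = mu / (lambda + mu);      /* steady-state probability */
    double pi_deg = lambda / (lambda + mu);

    printf("steady-state performability: %f\n",
           pi_up * r_up + pi_deg * r_deg);
    return 0;
}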

References

[1] W. J. Dally and B. Towles, "Route packets, not wires: On-chip interconnection networks," in Design Automation Conference, 2001. Proceedings. IEEE, 2001, pp. 684–689.

[2] M. Radetzki, C. Feng, X. Zhao, and A. Jantsch, "Methods for fault tolerance in networks-on-chip," ACM Computing Surveys (CSUR), vol. 46, no. 1, p. 8, 2013.

[3] J. F. Meyer, "On evaluating the performability of degradable computing systems," IEEE Transactions on Computers, no. 8, pp. 720–731, 1980.

[4] K. S. Trivedi, E. C. Andrade, and F. Machida, "Combining performance and availability analysis in practice," Advances in Computers, vol. 84, pp. 1–38, 2012.

[5] B. R. Haverkort, "Markovian models for performance and dependability evaluation," Lecture Notes in Computer Science, vol. 2090, pp. 38–83, 2001.

[6] R. Smith, K. S. Trivedi, and A. Ramesh, "Performability analysis: measures, an algorithm, and a case study," IEEE Transactions on Computers, vol. 37, no. 4, pp. 406–417, 1988.

[7] worm_sim, "Cycle accurate NoC simulator," http://www.ece.cmu.edu/~sld/software/index.php, 2005, [Online; accessed 20-July-2017].


Analyse sicherheitskritischer Software für RISC-V Prozessoren
Analysis of Security Sensitive Software for RISC-V Processors
Peer Adelt, Bastian Koppelmann, Wolfgang Mueller, Christoph Scheytt
Heinz Nixdorf Institut, Paderborn, Germany, {adelt,kbastian,wmueller,cscheytt}@hni.upb.de

Kurzfassung

In this article we present a method for non-invasive dynamic memory and IO analysis with QEMU for safety-critical embedded software for the RISC-V instruction set architecture. The implementation is based on an extension of the Tiny Code Generator (TCG) of the open-source CPU emulator QEMU by the dynamic identification of accesses to data memory and to IO devices connected to the CPU. We demonstrate the functionality of the method with an experimental setup in which a door locking controller is connected to a RISC-V processor via a serial UART interface. This scenario shows that unauthorized access to the UART interface can be uncovered at an early stage, and an attack on an access control can thus be detected.

Abstract

In this article we introduce a method for the non-invasive dynamic memory and IO analysis based on QEMU for safety-critical embedded software for the RISC-V instruction set architecture. The implementation is based on an enhancement of the Tiny Code Generator (TCG) of the open source CPU emulator QEMU by the dynamic identification of accesses to data memory as well as to IO devices connected to the CPU. We demonstrate the functionality of the method by a setup in which a door locking controller is connected to a RISC-V processor via a serial UART interface. That scenario shows that unauthorized access to the UART interface can be detected at an early stage and an attack on an access control can thus be identified.

1 Introduction

The design of embedded systems has always been a challenging trade-off between speed and power consumption. Internet-of-Things devices in particular usually have very limited resources, such as computing power, memory size, and IO communication bandwidth, which further complicates the development of software for secure applications. These restrictions, and their deployment in a networked environment, also make them highly susceptible to attack scenarios of all kinds.

During the software development process, a system integrator does not always have the source code of all components available, but is forced to integrate third-party components, which are frequently delivered only as object code. Although third-party vendors usually have high credibility, there remains a small risk that such components increase the vulnerability of the systems. The cause frequently lies in intended or unintended malfunctions, such as stack overflows, data leaks, or unauthorized accesses to peripheral devices, which are mostly hard or impossible to identify without special analysis techniques.

Dedicated tools for dynamic memory and IO analysis on the RISC-V platform, for identifying accesses to sensitive memory regions and IO accesses, are currently not available. For general memory analysis of compiled RISC-V software, ObjDump from the GNU Binary Utilities [5] is available, among others. However, it can only be used for static analyses, e.g., for examining the memory layout of a binary program. Another static analysis tool is StackAnalyzer by AbsInt GmbH [6], for statically computing the worst-case call graph and determining the maximum memory requirement of the stack. Dynamic analyses are, by principle, not possible with these two tools, as they neither execute nor simulate the program. Valgrind [7] does in principle support dynamic analysis of memory accesses for Linux, Android, and macOS applications. However, it is not available for RISC-V-based systems, it does not support the analysis of bare-metal programs, and it offers no special support for examining safety-critical software components.

We present a method for non-invasive memory and IO access analysis for safety-critical embedded software for the RISC-V ISA [1]. For validation, we implemented the method as the QEMU Memory Tracer (QMT). QMT extends the Tiny Code Generator (TCG) of the open-source CPU emulator QEMU [2] by the identification of read and write accesses to security-relevant memory regions and IO devices.


Our approach uncovers potentially illegal accesses to safety-critical memory regions during CPU emulation in a controlled environment, without the software having to run on the final target hardware. In this way, targeted and effective countermeasures against well-defined attack scenarios can be taken and evaluated before the final software rollout.

The remainder of this article is structured as follows. Section 2 gives an overview of the RISC-V instructions required for our analysis tool and of the components relevant to its development. Section 3 describes how we extended QEMU for tracing memory and IO accesses and for determining the execution context. Section 4 demonstrates, by means of a case study, how QMT can be used for the efficient detection of unauthorized accesses to an IO device. Section 5 gives a short summary of the work presented in this article.

2 QEMU RISC-V Extension

QEMU is an efficient open-source CPU emulator based on just-in-time compilation [2]. The key component in translating target code into host-compatible binary code is the so-called Tiny Code Generator (TCG). Translation is performed at the level of so-called basic blocks, where a block begins at a label or after a jump instruction and extends to the next branch instruction. For each newly encountered, not yet translated instruction of the current basic block, the TCG generates an intermediate, functionally equivalent, and platform-independent three-address code, referred to in the QEMU context as TCG microcode or tiny code. An example of the translation of a simple RISC-V basic block is given in Table 1. After all target-platform instructions (left) of the current block have been translated, the generated TCG microcode (middle) is optimized and, in the last step, converted into host-compatible, natively executable binary code (right).

We implemented QMT as an extension of the QEMU TCG for the RISC-V instruction set architecture, which has been supported since QEMU release v2.12. RISC-V is a load-store architecture, i.e., all accesses to memory take place exclusively through load and store instructions. In contrast, all ALU operations operate only on CPU registers and, where applicable, on values encoded immediately in the instruction word, but never directly on memory cells.

For monitoring memory and IO accesses, besides the load and store instructions, all control-flow instructions used for calling and returning from program routines are also of interest. For memory and IO analysis, both direct jumps and indirect jumps to addresses stored in CPU registers therefore have to be analyzed.

Table 1 Two-stage translation of a RISC-V basic block into TCG microcode and x86 host code

RISC-V ASM        TCG microcode              x86 ASM
addi x5,x0,5      movi_i32 x5,0x5            mov 0x5,ebx
                                             mov ebx,0x70(ebp)
addi x5,x5,5000   movi_i32 t0,0x1388         mov 0x70(ebp),ebx
                  add_i32 x5,x5,t0           add 0x1388,ebx
                                             mov ebx,0x70(ebp)
jalr 2040000a     movi_i32 t0,0x20400008     mov ebp,(esp)
                  movi_i32 t1,0x2            mov 0x20400008,ebx
                  movi_i32 t2,0x6007277c     mov ebx,0x4(esp)
                  call t2,0x0,0,env,t0,t1    mov 0x2,ebx
                  movi_i32 PC,0x2040000a     mov ebx,0x8(esp)
                                             call 0x6007277c
                                             mov 0x2040000a,ebx
                                             mov ebx,0x8(ebp)
                  exit_tb 0x0                xor eax,eax
                                             jmp 0x621dfeb4

In addition to the load and store instructions, tracing the control flow also allows the recorded memory accesses to be assigned to their execution context, which is explained in more detail in the following sections.

3 Dynamic Monitoring of Memory and IO Accesses and of the Control Flow

To extend the QEMU Tiny Code Generator with dynamic IO and memory analysis, the translation functions for all relevant RISC-V load/store and control-flow instructions were extended such that, in addition to generating functionally equivalent TCG microcode and the corresponding x86 host code, calls to our own helper functions are generated. Helper functions are TCG functions that make it possible to flexibly extend the code generation process with custom functionality in order to monitor particular actions. The following sections explain the principle of memory, IO, and control-flow tracing.
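As an illustration only (the identifier and signature below are ours, not part of the QEMU or QMT interface), the instrumentation emits, per load/store, a call whose target could look like the following C function:

#include <inttypes.h>
#include <stdio.h>

/* Illustrative target of a generated helper call; name and signature
   are our own sketch, not the actual QEMU/QMT interface. */
void qmt_log_mem_access(const char *insn, uint64_t addr,
                        uint64_t value, int is_store)
{
    fprintf(stderr, "%s %s 0x%" PRIx64 " = 0x%" PRIx64 "\n",
            is_store ? "store" : "load", insn, addr, value);
}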

3.1 Memory Accesses

The minimal instruction set RV64I defined in the user-level ISA specification provides a set of load and store instructions for word widths between 8 and 64 bits and for signed and unsigned data [3].

Figure 1 shows the encoding of the RISC-V load instruction. The memory address is given by register rs1 plus an offset, and the result is stored in register rd. The funct3 field determines the word width and whether the operation is signed.


Figure 1 RISC-V load instruction encoding (from the RISC-V user-level ISA specification [3]): bits 31–20 imm[11:0] (offset), 19–15 rs1 (base), 14–12 funct3 (width), 11–7 rd (dest), 6–0 opcode (LOAD)

Figure 2 shows the encoding of the RISC-V store instruction. Analogously to the load instruction, the memory address is given by register rs1 plus an offset, and the word width and signedness are determined by funct3. The datum to be written resides in register rs2.

Figure 2 RISC-V store instruction encoding (from the RISC-V user-level ISA specification [3]): bits 31–25 imm[11:5] (offset[11:5]), 24–20 rs2 (src), 19–15 rs1 (base), 14–12 funct3 (width), 11–7 imm[4:0] (offset[4:0]), 6–0 opcode (STORE)

In addition, the floating-point standard extensions designated by the letters F and D define further load and store instructions that operate on floating-point registers instead of the general-purpose integer CPU registers. Owing to their similarity to the regular load and store instructions, we omit a depiction of their encoding here. To track all memory accesses defined in the RISC-V standard, the following information must be recorded during the execution of such instructions (a sketch of a corresponding trace record follows the list):

• The name of the current load or store instruction. The name also implies the word width (8, 16, 32, or 64 bits) and whether the instruction is signed.

• The address of the accessed memory cell in the data memory of the CPU.

• The content of the memory cell in the data memory of the CPU that was read or written by this instruction.
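A record covering exactly these three items could look as follows (a sketch with field names of our choosing):

#include <stdint.h>

/* One trace record per executed load/store instruction (sketch). */
struct qmt_mem_event {
    const char *insn;  /* e.g. "lw", "sb"; implies width and signedness */
    uint64_t    addr;  /* address of the accessed cell in data memory */
    uint64_t    value; /* content read from or written to that cell */
};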

3.2 Accesses to MMIO Devices

The RISC-V ISA provides no special instructions for using attached input/output devices. Instead, peripherals are used exclusively according to the MMIO principle, where all devices are mapped into the visible memory of the CPU. Consequently, as with random-access memory, only load and store instructions are used to access them. For the CPU core, it therefore makes no difference whether it accesses RAM or a peripheral device. Given appropriate knowledge of the memory layout of the target platform, QMT can thus also be used to trace accesses to MMIO devices. Only the memory locations where the peripherals are mapped into the CPU's address space have to be monitored (see the range check sketched below).
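A minimal sketch of such monitoring, assuming the memory map is known (the UART address below is the one appearing in Table 2; struct and function names are ours):

#include <stdint.h>
#include <stddef.h>

/* Watch windows of memory-mapped devices; an access is an MMIO
   access iff its address falls into one of these ranges (sketch). */
struct mmio_region { uint64_t base; uint64_t size; const char *device; };

static const struct mmio_region watched[] = {
    { 0x10013000, 0x1000, "UART_tx" },  /* address as in Table 2 */
};

const char *watched_device(uint64_t addr)
{
    for (size_t i = 0; i < sizeof watched / sizeof watched[0]; i++)
        if (addr - watched[i].base < watched[i].size)
            return watched[i].device;
    return NULL;  /* ordinary RAM access */
}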

3.3 Control-Flow Instructions

Even when tracing is restricted to the relevant memory regions, it is only of limited help if the program routines or the execution contexts in which the accesses took place are unknown. Particularly for safety-critical software, it is of greatest interest to show that certain IO devices, data, or libraries, such as private keys or signature functions, are used only by known, trustworthy software.

To analyze such scenarios, the control flow must be recorded at the level of executed program routines. For this purpose, the direct jump instruction jump-and-link (JAL) and the register-indirect jump-and-link-register (JALR) are extended in the QEMU Tiny Code Generator. The direct jump instruction JAL jumps to an address encoded immediately in the instruction word, whereas JALR, as a register-indirect jump, sets the program counter to the address stored in the operand register, possibly adding an offset.

Figure 3 shows the encoding of the RISC-V JAL instruction. The jump target is the offset encoded in the instruction word, which is sign-extended to the memory address width and added to the PC. The return address is PC + 4, i.e., the instruction immediately following the JAL instruction. It is stored in register rd.

Figure 3 RISC-V JAL instruction encoding (from the RISC-V user-level ISA specification [3]): bits 31 imm[20], 30–21 imm[10:1], 20 imm[11], 19–12 imm[19:12] (together offset[20:1]), 11–7 rd (dest), 6–0 opcode (JAL)

The JALR instruction is depicted in Figure 4. Analogously to the JAL instruction, the return address is stored in register rd. However, no direct jump takes place; instead, the jump target is computed as the sum of the address stored in register rs1 and the signed offset encoded in the instruction word.

Figure 4 RISC-V JALR instruction encoding (from the RISC-V user-level ISA specification [3]): bits 31–20 imm[11:0] (offset), 19–15 rs1 (base), 14–12 funct3 (= 0), 11–7 rd (dest), 6–0 opcode (JALR)

According to the standard RISC-V calling convention [4], a routine call is expressed by a JAL or JALR instruction in which the link register x1 is selected to store the return address. A return from a routine is in turn expressed by a JALR instruction in which the return address is not stored (expressed by storing to the constant zero register x0) and the link register x1 is selected as the jump target. While tracing the control flow, the following information is recorded (a sketch of the resulting call/return classification follows the list):

• The position of the JAL(R) instruction in the program memory of the CPU. This also yields the return address, which is the next instruction after the jump.


• The jump target. For JAL this is an offset that is encoded immediately in the instruction word and added directly to the program counter (PC). For the register-indirect jump JALR, the target address is the sum of operand register rs1 and an offset encoded in the instruction word. The PC is then set to the computed value.

• The symbol name of the originating routine (i.e., the routine containing the JAL(R) instruction). This is recorded only if symbol information is present in the target binary.

• The symbol name of the target routine. As with the symbol name of the originating routine, it is recorded only if corresponding symbol information is available.
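The call/return classification described above can be stated compactly; a sketch following the convention in [4], with names of our choosing:

/* Classify a JALR according to the RISC-V standard calling
   convention: rd = x1 marks a routine call, rd = x0 together with
   rs1 = x1 marks a routine return. */
enum cf_kind { CF_CALL, CF_RETURN, CF_PLAIN_JUMP };

enum cf_kind classify_jalr(unsigned rd, unsigned rs1)
{
    if (rd == 1)              /* link register x1 selected: call */
        return CF_CALL;
    if (rd == 0 && rs1 == 1)  /* no link, jump via x1: return */
        return CF_RETURN;
    return CF_PLAIN_JUMP;     /* e.g. computed goto or tail call */
}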

In the implementation, the control-flow analysis can easily be restricted to relevant security-sensitive program sections. For this purpose, the symbol name or the start address of a routine must be specified. Program flow outside the context of the specified routine is then not considered in the analysis. This is particularly useful if, e.g., for embedded software written in the C programming language, only the program flow within the bounds of the main function is to be logged, but not the microcontroller-dependent startup code.

4 Analysis of Safety-Critical Software with QMT

The following example demonstrates our analysis tool QMT by means of a dynamic IO analysis. In this example, a RISC-V processor is used that has neither a memory management unit (MMU) nor a memory protection unit (MPU). It is connected to an electronic door locking system via a UART (Universal Asynchronous Receiver and Transmitter) interface. A UART is an asynchronous serial interface found on PCs and microcontrollers for sending and receiving data. Well-known UART implementations include RS-232 and RS-485, which are used, among other places, in industrial environments. The general structure is shown in Figure 5 and described in more detail in the following section.

We assume that the program memory of the RISC-V processor contains two programs and a program library that controls the locking system through the UART interface. We assume that no source code is available for either program, so they are linked as object code into the final RISC-V binary. The first program is authorized to use the library to send locking and unlocking commands to the electronic locking system over the UART interface.

Figure 5 Two programs each access the UART interface of the RISC-V processor. (Program 1: tested software with permission to access the UART via the door-access software of the UART library; Program 2: untested software without access permission to the UART; both run above the RISC-V UART interface.)

Any access by the second program, whether to the library or directly to the UART of the RISC-V CPU, is not intended and is therefore to be regarded as a potentially illegal access. In the figure, permitted accesses are shown as solid bold arrows and potentially illegal accesses as dashed arrows.

Table 2 shows a possible trace for the RISC-V scenario described above, created during execution in QMT. The first column shows the kind of instruction, i.e., whether it is a function call, a return, or an IO access. The second column shows the corresponding address in program memory for a control-flow instruction, and the MMIO address in the data memory of the virtual RISC-V CPU for an IO access. The last column contains the symbol name of the currently executed routine or, for IO accesses, the name of the monitored hardware device; in this example, UART_tx in each case. For clarity, the content of the data sent over the UART interface is not listed in the table.

Table 2 Results of the dynamic memory analysis

Operation    Address       Symbol name   Assessment
call         0x20400066    main
call         0x204001b6    program1      authorized
call         0x2040014c    door_open     authorized
store byte   0x10013000    UART_tx       authorized
...          ...           ...
store byte   0x10013000    UART_tx       authorized
return       0x20400142    door_open     authorized
return       0x20400156    program1      authorized
...          ...           ...
call         0x204001b8    program2      illegal
store byte   0x10013000    UART_tx       illegal
...          ...           ...
store byte   0x10013000    UART_tx       illegal
return       0x204001ac    program2      illegal
return       0x204001c4    main


In this scenario, program 1 first legitimately accesses the door_open function of the door-control library. Subsequently, program 2 also writes into the memory region of the UART, illegitimately and bypassing the program logic of the door_open function, thereby attempting to gain access to the door control. In the trace log, this illegal access attempt by program 2 is easily recognizable, which makes it possible to take appropriate countermeasures and, if necessary, to arrange for program 2 to be classified as faulty and no longer integrated into the binary program in the next iteration.

5 Summary and Outlook

In this article we have presented a non-invasive method, implemented in the analysis tool QMT, for examining the behavior of binary software with respect to read and write accesses to data memory and to all input/output devices mapped into the visible address space of the CPU. QMT was implemented as a dynamic memory and IO analysis tool for the RISC-V platform, built on the CPU emulator QEMU. We were able to show that QMT can efficiently log all peripheral accesses, including their execution context, which enables an efficient assessment with respect to potential security risks. As an example we used a door control connected to a RISC-V processor via a serial UART interface. The unauthorized access to a door locking system by an untested function, whose source code was not available, could easily be identified. In real operation, more elaborate analyses, such as the use of logic analyzers, would have been necessary for this. First results have shown that the runtime is only insignificantly affected by the analysis: the measured overhead of the additional analysis integrated into QEMU was always below 5%. Since this value is essentially attributable to file accesses to the log file, there is still considerable potential for optimization.

Embedded software usually contains interrupt service routines (ISRs). For these, the RISC-V ISA defines specific instructions for returning from ISRs (e.g., mret, sret, uret) that enable an indirect control transfer and are frequently used in low-level code, for example to implement context switches. These instructions are currently not considered in the control-flow analysis of our implementation, as they are not yet fully supported in the current RISC-V TCG version. We plan to extend QMT to the control-flow analysis of interrupts as soon as this support is available. Furthermore, we plan to implement a whitelisting scheme for specifying permitted memory accesses. A system integrator could thereby explicitly specify which program routines may access which memory regions. The analysis results could then be evaluated automatically, and all accesses not on the whitelist automatically regarded as illegal. This would be especially helpful for iterative memory access analyses, as it eliminates the repeated manual assessment of the analysis results.

Acknowledgements

The work presented here was supported by the BMBF within the funding projects COMPACT (no. 01IS17028) and Safe4I (no. 01IS17032).

6 References

[1] RISC-V homepage: https://www.riscv.org/ [Online]. Accessed 04.01.2019.

[2] F. Bellard: QEMU, a Fast and Portable Dynamic Translator. In ATEC '05: Proceedings of the Annual Conference on USENIX Annual Technical Conference, Anaheim, CA, USA, 2005.

[3] RISC-V User-Level Instruction Set Architecture Specification: https://www.riscv.org/specifications/ [Online]. Accessed 04.01.2019.

[4] Calling Convention for RISC-V: https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf [Online]. Accessed 04.01.2019.

[5] The GNU Binary Utilities – objdump: http://web.mit.edu/gnu/doc/html/binutils_5.html [Online]. Accessed 21.01.2019.

[6] StackAnalyzer – Stack Usage Analysis: https://www.absint.com/stackanalyzer/index.htm [Online]. Accessed 21.01.2019.

[7] Valgrind homepage: http://valgrind.org [Online]. Accessed 21.01.2019.


Logical Analysis of Distributed Systems: The Importance of Being Constructive

Michael Mendler, Uni Bamberg

The design and analysis of complex distributed systems proceeds along numerous levels of abstractions. One key abstraction step for reducing complexity is the passage from analog transistor electronics to synchronously clocked digital circuits. This significantly simplifies the modelling from continuous differential equations over the real numbers to discrete Mealy automata over two-valued Boolean algebra. Although typically taken for granted, this step is magic. How do we obtain clock synchronization from asynchronous communication of continuous values? How do we decide on the discrete meaning of continuous signals without a synchronization clock? From a logical perspective, the possibility of synchronization is paradoxical and appears “out of thin air.”

This talk revisits the logical justification of the synchronous abstraction claiming that correctness arguments, in so far as they are not merely reductions, must intrinsically depend on reasoning in classical logic. This is studied at the circuit level, where all software reductions must end. The well-known result that some synchronization elements cannot be implemented in delay-insensitive circuits is related to “Berry’s Thesis” according to which digital circuits are delay-insensitive if and only if they are provably correct in constructive logic. More technically, we introduce Muller Logic, a modal temporal logic to specify the asynchronous behaviour of digital circuits under the General Multiple Winner Model of signal switching. We show how non-inertial delays give rise to a constructive fragment of Muller logic while inertial delays are inherently non-constructive. This gives a logical explanation for why inertial delays can be used to build arbiters, memory-cells and other synchronization elements, while non-inertial delays are not powerful enough. Though these results are tentative, they indicate the importance of logical constructiveness for metastable-free discrete abstractions of physical behavior.



Optimization Framework for Hardware Design of Engine Control Units

Iryna Kmitina1, Nico Bannow1, Christoph Grimm2, Daniel Zielinski1, Carna Zivkovic2
1Robert Bosch GmbH, Schwieberdingen, Germany, {Iryna.Kmitina, Nico.Bannow, Daniel.Zielinski}@de.bosch.com
2Kaiserslautern University of Technology, Kaiserslautern, Germany, {Grimm, Zivkovic}@cs.uni-kl.de

Abstract

An increasing number of vehicle functionalities, shorter development time, and competitive market prices constitute conflicting objectives with multiple constraints for the development of engine control units (ECUs). Hence, there is a need for an assessment framework to overcome the above-mentioned challenges during the design of ECUs. This paper presents an outline of an automated optimization framework that reduces the effort in ECU design. We briefly describe the ECU requirements and an optimization approach that finds Pareto-optimal solutions for defined design objectives. The preliminary results of the framework indicate a significant speed-up in the design of ECUs.

Kurzfassung

A growing number of vehicle functionalities, shorter development times, and competitive market prices constitute conflicting objectives with multiple constraints for the development of engine control units. It is therefore necessary to support the decision-making process in order to overcome these challenges in the design of control units. This paper gives an overview of an automated optimization framework that reduces the effort in control unit design. It describes the requirements on a control unit and an optimization approach that finds Pareto-optimal solutions for defined optimization objectives. The preliminary results of the framework show a considerable speed-up in the development of control units.

1 Introduction

The diversification of propulsion concepts for vehicles includes classic combustion engines, electrical engines, and their combination, together with new comfort features for autonomous driving. It leads to a steadily growing number of vehicle functions in the powertrain. Due to an equipment rate of 100% in vehicles with an internal combustion engine, the engine control unit is being used as an integration platform not only for engine control functions as such, but also for many other functions in the vehicle, leading to ECUs with up to ~330 inputs/outputs (I/O). Despite the changes and increasing variance in propulsion systems (i.e., electrification) and the resulting changes in the electrical/electronic (E/E) architecture of the powertrain, the number of functions is still growing.

The ECU design (also known as the engineering process) is coupled with increasing requirements with respect to, e.g., environmental conditions, housing size, power dissipation, or safety and security, as well as increasing cost pressure. It therefore evolves into a multi-objective optimization problem of growing complexity. In this paper, vehicle functions are assigned to hardware (HW) components and are hence referred to as HW functions. The chosen integrated circuits (ICs) and other electronic components define the total area of the printed circuit board (PCB) and also influence the housing type, size, and material (sheet metal or plastic). An example of an ECU is illustrated in Figure 1.

Figure 1 Example of a vehicle ECU (housing top, ICs, PCB, housing bottom)

The engineering process of a new ECU, based on an effective assignment of HW functions to ICs, is basically still a manual process that relies on expert knowledge from previous ECU generations and projects. Especially in the case of large changes compared to previous projects, the engineering process could benefit from decision-making support. The main challenge in obtaining the most suitable mapping of HW functions to existing ICs is the high complexity of the solution space. This can be addressed by adapting suitable optimization approaches and carefully selecting optimization settings depending on the characteristics of each individual problem. The mathematical formulation of the objectives and free optimization parameters poses a further challenging task, which impacts the applicability of existing approaches. Beyond the above-mentioned challenges, one has to consider the complexity of a generic construction of the framework to deal with various optimization use cases.

This paper introduces an automated, optimization-based framework that provides new possibilities for the design of engine control units. It helps the engineer to collate and analyze all relevant electronic characteristics of vehicle functions automatically. It applies both the customers' and the supplier's requirements as ECU design restrictions to generate feasible solutions, and it performs HW function mapping based on numerical optimization with metaheuristic algorithms.

The outline of the paper is as follows: Section 2 gives a brief introduction to the problem and to related work in the field of HW function mapping. Section 3 describes the optimization framework, including the formulation of the HW function mapping problem. Section 4 shows the results of mapping studies. Finally, Section 5 presents the conclusion and future work.

2 Related work

Automotive embedded systems play a dominant role in vehicle innovations. Resource allocation in embedded systems is also known as a configuration design problem [1-4]. The configuration problem is a specialization of the general design problem in which no new components (e.g., ICs) may be added to the problem to be solved [1]. The mapping task is classified as a combinatorial optimization problem and is known to be NP-hard [5-7]. The NP-hardness of HW function mapping was shown by several researchers and is summarized in [8].

However, to the best of our knowledge, there are only a few approaches considering HW function mapping for ECUs. One of these is the hardware platform optimization tool [9]. The tool was developed for mapping functions to application-specific integrated circuits (ASICs) in single ECU projects. The goal of this work is the sequential optimization of two objectives, costs and power dissipation. For this purpose, two optimization methods were applied, one per objective: mixed-integer linear programming (MILP) and the A* search algorithm. However, due to the sequential steps, the objectives cannot be optimized simultaneously. Furthermore, the structure of the tool restricts the exchangeability of optimization methods and its extendibility with additional objectives. Another approach, proposed by Lutz [5], is based on Evolutionary Algorithms (EA). Hardware modules are assigned to ASICs. Here, HW modules mostly have abstract requirements and contain different tasks, ranging from simple power switches up to complex integrated circuits. In this approach, the internal structure of the ASICs is still unknown. The author's primary idea is to define the optimal manufacturing technology for the functional scope of prospective ASICs.

E/E architecture design is related to ECU mapping: here, the connections of all electronic control units, sensors, and actuators in a single vehicle are taken into account. The aim of the optimization is to achieve an efficient allocation of software and hardware components to electronic control units [6]. Several researchers have studied exact, heuristic, and metaheuristic methods to solve such mapping problems. As mapping is an NP-hard problem, exact methods tend to be quite slow for larger problem dimensions [8]. A mapping technique for electronic control unit networks that overcomes the problems arising from outsized Binary Decision Diagrams is presented in [10]; this technique is, however, efficient for Boolean objective functions only. A MILP approach targeting the energy efficiency of E/E architectures is proposed in [7], but it is efficient only for small to medium-sized problems with a single linear objective function. In [11], EA were used to map software components to electronic control units, considering two objectives: the data transmission reliability between components and the communication overhead. A mapping of multimedia applications to architecture components such as processors and memories is provided in [12]. Three objectives were considered: maximum processing time, power consumption, and total cost of the architecture. The explicit definition of the multimedia application parameters for mapping to the architecture components is, however, missing.

Other relevant works are research activities that use Satisfiability Modulo Theories (SMT) solvers [13, 14]. Due to long computational times, the νZ extension within the solver can be applied only to non-complex ECU optimization problems.

Although most of these investigated mapping techniques employ widespread metaheuristic optimization approaches, they are strictly adapted to a concrete problem formulation and the available input data. This hampers the direct application of these techniques to our problem formulation. Hence, we propose a generic construction of an assessment framework for the development of engine control units that complements best practices and allows higher flexibility.

3 Automated optimization framework

3.1 Framework structure

The ECU engineering process is the result of a mapping procedure in which the customers' required hardware functions, described by their electronic characteristics, are assigned to appropriate ECU electronic components (hereinafter referred to as ASICs only). For simpler handling, we divided the automated optimization framework for the ECU engineering process into two main parts: solution space preparation and optimization (Figure 2).

Figure 2 Schematic diagram of the optimization framework

The framework takes “Databases” as its input (see Fig. 2). These include information on the customers' requirements and on the supplier's electronic components. Continuous changes in the customers' ECU project requirements, in the available, planned, and expiring electronic components of the supplier, and in market trends are thereby taken into consideration for the ECU engineering. The framework automatically loads and structures this information from the databases. For this purpose, we designed an object-relational data model for data processing and integrated it into the framework.

After the database excerpt, an ECU engineer first chooses an optimization use case in the “Solution space preparation” part of the framework (Fig. 2). There are three primary use cases in ECU design: optimization of a single ECU; optimization of different ECU variants of one vehicle type, e.g., gasoline; and optimization of all vehicle types as an ECU portfolio to cover a target market of the automotive industry. The use cases are elaborated and integrated into the framework by preparing and manipulating the data model accordingly. In this paper, we present results for the single-ECU use case. The whole solution space is prepared for a chosen ECU project. In this step, the customers' and the component supplier's restrictions are checked and the solution space is reduced. For example, not all available electronic components are suitable for realizing the required HW functions; this depends on technical requirements such as the ECU environmental conditions, housing size, ambient temperature, etc. Finally, we obtain a solution space that includes all valid hardware function realizations for the chosen use case.

In the “Optimization” part (Fig. 2), a generic optimization procedure is implemented. The input data for this part is the prepared solution space. The applied optimization algorithm proposes values for the input (free) parameters within given lower and upper limits. The framework then generates an ECU design based on the proposed free parameter values. The new design is then evaluated with respect to constraint violations and objectives. Depending on the engine type – for example, standard or premium segment – the importance of individual optimization objectives and constraints varies. Therefore, there are no fixed objectives and constraints for all ECU projects. Because of this, we designed our framework to allow activating and deactivating objectives and setting optional constraints. After evaluation of the objectives, the optimizer iteratively modifies the free parameters so that the costs of the given objectives are minimized without violating any of the existing constraints. This iterative optimization process continues until a termination criterion is reached. Finally, the “Output” of the framework is a set of Pareto solutions, i.e., optimal mappings of hardware functions to ASICs in an ECU.
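To make the interaction between the optimizer and the framework concrete, the following minimal Python sketch outlines one possible realization of this loop. It is illustrative only: the names bounds, map_design, and the random proposal step stand in for the framework's actual data model and for the metaheuristic optimizer discussed in Section 3.3.

```python
import random

def optimize(solution_space, objectives, constraints, max_iter=100):
    """Sketch of the iterative loop: propose free parameters, derive an
    ECU design, evaluate constraints and objectives, keep Pareto points."""
    pareto = []  # list of (objective_values, design) pairs
    for _ in range(max_iter):
        # 1) Propose integer free parameter values within the given limits.
        x = [random.randint(lo, hi) for lo, hi in solution_space.bounds]
        # 2) The framework derives an ECU design from the proposal.
        design = solution_space.map_design(x)
        # 3) Evaluate constraints g(x) <= 0; skip infeasible designs.
        if any(g(design) > 0 for g in constraints):
            continue
        # 4) Evaluate objectives and update the Pareto archive.
        f = tuple(obj(design) for obj in objectives)
        dominated = any(all(p <= q for p, q in zip(other, f))
                        for other, _ in pareto)
        if not dominated:
            pareto = [(o, d) for o, d in pareto
                      if not all(p <= q for p, q in zip(f, o))]
            pareto.append((f, design))
    return pareto
```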

3.2 Mapping procedure

In this work, the peculiarity of the HW function mapping procedure is that it is carried out in two stages.

In the first stage, HW functions are mapped to abstract building blocks. The term building block is used for entities of ASICs. These can be various high-side and low-side switch power stages for driving different current loads, sensor supplies, and communication transceivers, e.g., SPI, CAN bus, or Ethernet. For mapping HW functions to building blocks, each HW function is further classified. The classification includes information such as:

– HW function subsumption, e.g., as a power switch, half bridge, communication, or sensor supply,
– maximum load current,
– automotive safety integrity level,
– after-run mode.

One HW function can be realized by a single building block or by a combination of several building blocks. Moreover, one HW function may have more than one such realization possibility.

The second stage is the mapping of these abstract building blocks to ASICs. ASICs have a fixed structure and can contain one or more building blocks. During the mapping process, each building block must be mapped to a single ASIC. In this way, the mapping of one HW function can be realized by one or several ASICs, depending on the available building blocks.

An example of a mapping proposal is given in Fig. 3. The HW function F1 can be realized by the combination of two building blocks b1 and b2. These building blocks are mapped to the two electronic components ASIC1 and ASIC2, respectively. ASIC1 also provides a realization for HW function F2 via the building block b3.

The above-mentioned model of the HW function mapping procedure was extended to fit our optimization problem formulation and was afterwards integrated into our framework.
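A minimal data model for this two-stage mapping could look as follows. This is a sketch under our own naming assumptions; the class and field names do not reflect the framework's internal object-relational model.

```python
from dataclasses import dataclass

@dataclass
class BuildingBlock:
    name: str          # e.g. "high-side switch 2A" or "CAN transceiver"
    kind: str          # classification, e.g. "power_switch", "sensor_supply"
    resistance: float  # electrical resistance in ohm (relevant for power stages)

@dataclass
class Asic:
    name: str
    cost: float        # part cost, used later in the cost objective
    area: float        # package area on the PCB
    blocks: list       # BuildingBlock instances contained in the ASIC

@dataclass
class HwFunction:
    name: str          # e.g. "injector valve 1"
    load_current: float  # maximum load current in ampere
    realizations: list   # alternative building-block combinations

# Stage 1 (cf. Fig. 3): F1 is realized by the block combination [b1, b2];
# stage 2: b1 is mapped to ASIC1 and b2 to ASIC2, while b3 inside ASIC1
# realizes F2.
```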


Figure 3 Example of mapping HW functions to ASICs of an ECU

The assignment of functions to ASICs via building blocks gives rise to a large variety of mapping possibilities in the ECU design. After both mapping stages, an ASIC may implement part of a single HW function, a complete HW function, or even several different HW functions or parts of them.

As mentioned in Section 3.1, the characteristics of the electronic components (electrical parameters and costs of the building blocks and ASICs) are stored in object-specific attributes. During the evaluation of constraints and objectives, the values are extracted from the attributes of the objects (e.g., building blocks). This construct is required to allow a generic configuration of the framework.

The different mapping optimization objectives of an ECU design have to be evaluated simultaneously, as they influence the mapping choice of HW functions to electronic components. For instance, the total cost can be decreased by mapping HW functions to low-priced ASICs or by mapping the maximum possible number of HW functions to a single ASIC. The total PCB area can be decreased by choosing ASICs with smaller packages or by mapping several functions to one ASIC. Power dissipation can be decreased by using parallel circuit configurations and selecting discrete power stages with low resistance. The best mapping should be chosen as the best tradeoff between all of the multiple optimization objectives, so as to satisfy key performance indicators such as overall cost, environmental conditions (e.g., ambient temperature), and ECU housing type and size. Furthermore, it considers the main cost effect chain – the thermal path.

3.3 Optimization strategy

The mapping problem is a constrained multi-objective integer optimization problem and is formulated as:

minimize f(x) subject to g(x) ≤ 0

Here, f(x) is the vector that contains the values of the different objectives, the vector x contains the free parameters, and the vector g(x) contains the constraints.

An essential task in the selection of an efficient optimization strategy is the comprehension of the problem. In [15], various classification criteria for optimization problems were introduced. The most significant characteristics of our ECU optimization task are as follows:

– the given problem is a multi-objective optimization problem,
– the search space is multi-dimensional,
– the free parameters accept only integer values,
– constraints define the set of feasible solutions within the search space,
– the objective and constraint functions include both linear and non-linear functions, and
– it is a large-scale combinatorial optimization problem.

The choice of a convenient optimization method for our problem is non-trivial. According to the classification of both optimization problems and optimization algorithms, metaheuristic algorithms appear to be suitable for our problem formulation [5, 8, 16-18]. These methods are able to produce Pareto-optimal solutions as well as to solve integer and large-scale combinatorial problems. Over the last few decades, several researchers have investigated intelligence-based optimization algorithms inspired by nature. In [16], the authors reviewed approaches such as Evolutionary Optimization, the Fern strategy, Ant Colony Optimization, Particle Swarm Optimization (PSO), and Artificial Neural Nets. The study shows that EA and PSO have the capacity to propose better solutions. In some tests, PSO converges faster towards the assumed best values, but it may also tend to get stuck in local optima. For further investigation, we applied several metaheuristic techniques. In this paper, we present results obtained with the PSO method.

In multi-objective optimization, it is not possible to find a single best solution; instead, several objectives have to be optimized simultaneously. A very common method for solving such problems is the weighted sum method [19, 20]: the objectives are aggregated into a single objective function by weighting factors. However, this approach is not appropriate for concave problems, and the weighting factors have to be adjusted for each objective function. Instead, the multi-objective PSO method seems promising for these types of problems, as no aggregation of the objectives is required.

The PSO technique was introduced in 1995 by Kennedy and Eberhart as an artificial implementation of swarm intelligence [21] and has undergone many changes up to today [22, 23]. The method is similar to a genetic algorithm in that it generates a random-based population. However, each individual of the population – a particle – additionally gets a randomized velocity. Three adjustable factors are used in the velocity update of the particles: inertia, cognitive, and social. The optimization performance is affected by the weighting of these factors, the number of particles, and the number of iterations. There are no general rules for setting these factors for every optimization problem. However, proposals from previous investigations [16, 22, 23] might help to get started more effectively.
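The velocity update at the core of PSO can be sketched as follows. This is the classic formulation after [21]; the concrete coefficient values are placeholders, and the rounding to integers reflects our integer-valued free parameters rather than a general PSO requirement.

```python
import random

def update_particle(x, v, p_best, g_best, bounds,
                    w=0.7, c1=1.5, c2=1.5):
    """One PSO velocity/position update for an integer-valued particle.

    w  -- inertia factor (keeps part of the previous velocity)
    c1 -- cognitive factor (pull towards the particle's own best)
    c2 -- social factor (pull towards the swarm's best)
    """
    new_x, new_v = [], []
    for xi, vi, pi, gi, (lo, hi) in zip(x, v, p_best, g_best, bounds):
        r1, r2 = random.random(), random.random()
        vel = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        pos = round(xi + vel)          # integer free parameters
        pos = max(lo, min(hi, pos))    # keep within the given limits
        new_x.append(pos)
        new_v.append(vel)
    return new_x, new_v
```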

Three main conflicting objectives were identified for the ECU optimization. The first and most important objective is the minimization of the total ECU cost. In the presented work, the costs comprise the ASIC costs as well as the costs for the housing, the PCB, and manufacturing and testing. Each ASIC has a fixed price over the manufacturing years; furthermore, this price varies with the ECU production volume. The cost objective is thus directly influenced by the choice and quantity of the ASICs. For every proposed set of free parameter values, the HW function mapping implemented in the framework is carried out, and the resulting costs caused by the chosen ASICs are determined as

$f_1(\mathbf{x}) = \sum_{i=1}^{|A|} c_{a_i} \, q_{a_i}$   (1)

where the vector x contains the free optimization parameters, each ASIC $a_i$ belongs to the set of all available ASICs, $a_1, \ldots, a_{|A|} \in A$, $c_{a_i}$ denotes the cost of ASIC $a_i$, and $q_{a_i} \in \mathbb{Z}_{\geq 0}$ its quantity. If no HW function is mapped to an ASIC $a_i$, then $q_{a_i} = 0$.

The second objective is the total area of all ASICs required on the PCB. This area mainly depends on the integrated circuit package types of the chosen ASICs. The connectivity between the used building blocks also influences the PCB area; for the sake of simplification, it is treated as an overhead added to the ASIC area, together with ancillary circuits such as capacitors and resistors and the required layout space:

$f_2(\mathbf{x}) = \sum_{i=1}^{|A|} s_{a_i} \, q_{a_i}$   (2)

where $s_{a_i}$ is the area required by the used ASIC $a_i$.

The last of the three objectives is the total power

consumption of the ECU. The thermal path is one of the most critical and cost-relevant cause-effect chains in an ECU. The power dissipation produced by the electronic components is calculated from the load currents of the HW functions and the electrical resistances of the used building blocks of the chosen ASICs:

$f_3(\mathbf{x}) = \sum_{i=1}^{|F|} \sum_{j=1}^{|B|} I_{F_i}^2 \, R_{b_j}$   (3)

where $I_{F_i}$ is the load current of the HW function $F_i$ and $R_{b_j}$ the electrical resistance of the building block $b_j$; each building block $b_j$ belongs to the set of all available building blocks, $b_j \in B$. The power dissipation also depends on the type of connection of the power switches (in series or parallel), which is taken into account in the framework.
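Assuming the toy data model sketched in Section 3.2 above, the three objectives (1)-(3) can be evaluated roughly as follows. This is an illustrative sketch only; e.g., series/parallel power-stage configurations and volume-dependent ASIC prices are not modeled here.

```python
def evaluate_objectives(mapping):
    """Compute (f1, f2, f3) for a candidate mapping.

    `mapping` is assumed to provide:
      - asic_quantities: dict {Asic: quantity q_a}
      - assignments: list of (HwFunction, BuildingBlock) pairs
    """
    f1 = sum(a.cost * q for a, q in mapping.asic_quantities.items())  # (1) cost
    f2 = sum(a.area * q for a, q in mapping.asic_quantities.items())  # (2) PCB area
    f3 = sum(fn.load_current ** 2 * blk.resistance                    # (3) power
             for fn, blk in mapping.assignments)
    return f1, f2, f3
```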

In addition to the mentioned objectives, obligatory constraints are involved in the optimization process:

– the maximum number of building blocks of each ASIC, and
– the maximum allowed power dissipation of an ASIC, depending on its package.

Objective functions can also be formulated as additional mapping constraints for the optimization. The mapping procedure in the framework is realized in such a way that the constraint on the maximum number of building blocks per ASIC is never violated. Some value combinations of the free parameters may violate other constraints, and unfortunately such violations can only be identified after the mapping of a new ECU. A penalty approach is used to deal with this problem [24, 25]: a dynamically adjusted penalty factor is applied to the objectives to penalize these free parameter values and to increase the probability of obtaining valid outcomes.
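A dynamically adjusted penalty can be sketched as follows. The growth schedule shown here (penalty weight increasing with the iteration count) is one common variant from the literature [24, 25], not necessarily the exact scheme used in the framework.

```python
def penalized_objectives(objectives, violations, iteration, c=0.5):
    """Add a penalty that grows with the iteration count.

    objectives -- tuple of raw objective values (f1, f2, f3)
    violations -- list of constraint violation amounts, max(0, g(x))
    """
    penalty = c * (1 + iteration) * sum(v ** 2 for v in violations)
    return tuple(f + penalty for f in objectives)
```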

4 Study results

This section presents the results of applying the proposed framework to real-world ECUs. The HW functions of different projects have been mapped to several ASICs inside a single ECU. A workstation equipped with an Intel(R) Xeon(R) CPU E3-1505M @ 2.80 GHz and 32 GB RAM was used for this study.

Depending on the number of HW functions in an ECU project, there is a different number of possible mappings. For example, in a case with 10 HW functions and 5 possible realizations for each function, we would get 5^10 ≈ 9.8 million possibilities for the HW function mapping, i.e., ECU designs, not counting permutations. For real projects, the number of mapping combinations is much larger and grows rapidly with the number of HW functions (Figure 4).

Figure 4 Influence of a project's number of HW functions on the combinatorial size

The estimation in Figure 5 reflects the influence of the number of HW functions on the calculation time of a single optimization iteration.

Figure 5 Influence of a project's number of HW functions on the calculation time for one iteration

In this example, one iteration consists of 100 swarm particles; this means that 100 objective evaluations, i.e., ECU designs, were performed per iteration. Evidently, an increasing number of HW functions in a project has a direct influence on the problem complexity and the computational time. Moreover, projects with a higher number of HW functions require more iterations for the optimization to converge. A compromise between the most suitable ECU designs and a short computational time has to be found by selecting the optimization settings for each individual ECU project.

An optimization study with a realistic application example is provided: an ECU project including 41 HW functions was carried out. Considering the influence of the number of HW functions, the following optimization settings were used: 500 iterations and 300 swarm particles. The overall runtime was approximately 2 hours. Figure 6 illustrates the Pareto front obtained by our optimization framework.

Figure 6 Pareto front for an ECU design study

Three objectives were minimized – cost, area, and power dissipation, as defined by equations (1)-(3). The obtained Pareto front solutions are compared against each other as normalized values; the lowest cost, area, and power dissipation values are each represented by the factor 1.0. Each Pareto front solution represents a different ECU design. A comparison between the most deviating solutions reveals a power dissipation range from factor 2.0 down to 1.0, while the cost factor of the same solutions moves in the opposite direction from 1.0 to 1.5; the area factor increases by almost 1.5 as well. The best compromises can be found in the middle of the Pareto front, with cost factors below 1.2, area factors around 1.18, and power dissipation factors around 1.1.

To compare the objectives of the ECU designs of this study, a polar diagram can be used, as shown in Figure 7. It allows all objectives to be evaluated at once, which becomes especially important with a higher number of objectives. Three samples were selected: two extreme solutions (sample 1 with the smallest and sample 2 with the largest power dissipation factor) as well as one compromise solution from the middle part of the Pareto front (sample 3).

Figure 7 Polar diagram of the ECU designs

The better a design meets the objective values, the closer its curve is to the center point. It can be seen that there is no single best solution for all objectives, since they move in opposite directions. Each ECU design has different advantages and disadvantages. For example, sample 2 achieves better cost and area reduction than the other samples, while samples 1 and 3 are better in power dissipation, which allows a wider range of ECU housing types to be used.

The optimization results of this study were evaluated against the empirical results of ECU engineers and found to be well acceptable. Especially for sample 3, no significant discrepancies were observed between the generated mapping of functions to ASICs and what an ECU engineer would implement.

5 Conclusions and future work

In this paper, an automated optimization framework for the hardware design of ECUs was presented. The mapping procedure was adopted, and suitable optimization techniques for this multi-objective problem formulation were analyzed. Automated input data processing was coupled with the optimization part and integrated into the framework. The optimization part was designed such that optimization methods can be exchanged and additional objectives can be added with minimal effort. In this study, the multi-objective PSO method was integrated, which gives a reliable representation of the Pareto front. The relationships between the number of HW functions, the number of ECU design combinations, and the calculation effort were studied to help adjust the optimization settings effectively. The results obtained from the framework were compared with empirical results from ECU engineers; only minor and irrelevant discrepancies in the mapping of functions to ASICs were observed. The results of this study show that the presented framework is efficient and can provide reasonable ECU design suggestions. Moreover, it proves able to support engineers in shortening the ECU development time while coping with steadily growing functional complexity, and in finding global optima.



In the future, we will implement other optimization methods and compare their performance in order to obtain better optimization results. The structure of the framework will be further refined: in particular, ECU designs are not only engineered to fulfill the requirements of an individual ECU project. The next steps will address optimization-based design for ECU variants and architecture evaluation for entire ECU portfolios. Depending on the ECU production quantities and the electronic components used, a compromise over the whole portfolio will provide an overall cost-optimal solution. One major optimization task will be the balancing between electronics and mechanics design. Another focus of future work is usability improvements that allow ECU engineers to easily use the tool in their daily work.

6 Literature

[1] Charles J. Petrie: Automated Configuration Problem Solving. Springer, New York, NY, 2012, ISBN 978-1-4614-4531-9. http://dx.doi.org/10.1007/978-1-4614-4532-6.

[2] Frans Sanen, Eddy Truyen, and Wouter Joosen: Mapping problem-space to solution-space features: a feature interaction approach. In Proceedings of the Eighth International Conference on Generative Programming and Component Engineering (GPCE '09). ACM, New York, NY, USA, 167-176, 2009. http://dx.doi.org/10.1145/1621607.1621633.

[3] Andreas Günter and Christian Kühn: Knowledge-based configuration – survey and future directions. In: Puppe, F. (ed.): XPS-99: Knowledge-Based Systems. Survey and Future Directions. Lecture Notes in Computer Science, vol. 1570. Springer, Berlin, Heidelberg, 1999.

[4] Bob Wielinga and Guus Schreiber: Configuration-design problem solving. IEEE Expert, vol. 12, no. 2, 49-56, 1997. http://dx.doi.org/10.1109/64.585104.

[5] Bernd Lutz: Neue Partitionierungskonzepte für Kraftfahrzeug-Steuergeräte [Novel partitioning concepts for vehicle control units]. Ph.D. dissertation, University of Tübingen, 2011.

[6] Bernd Hardung: Optimization of the Allocation of Functions in Vehicle Networks. Ph.D. dissertation, University of Erlangen-Nürnberg, 2006.

[7] Gregor Walla, Andreas Herkersdorf, André S. Enger, Andreas Barthels, and Hans-Ulrich Michel: An automotive specific MILP model targeting power-aware function partitioning. In International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), Agios Konstantinos, 299-306, 2014. http://dx.doi.org/10.1109/SAMOS.2014.6893224.

[8] M. C. Bhuvaneswari (Ed.): Application of Evolutionary Algorithms for Multi-objective Optimization in VLSI and Embedded Systems. Springer, New Delhi, 2015, ISBN 978-81-322-1957-6. http://dx.doi.org/10.1007/978-81-322-1958-3.

[9] Thomas Fritzsche and Stephan Schulteis: Hardware platform optimization tool Homer. Unpublished, 2016.

[10] Michael Glaß, Martin Lukasiewycz, Felix Reimann, Christian Haubelt, and Jürgen Teich: Symbolic reliability analysis and optimization of ECU networks. In Design, Automation and Test in Europe (DATE), 158-163, 2008. http://dx.doi.org/10.1109/DATE.2008.4484679.

[11] Irene Moser and Sanaz Mostaghim: The automotive deployment problem: A practical application for constrained multiobjective evolutionary optimization. In IEEE Congress on Evolutionary Computation, 1-8, 2010. http://dx.doi.org/10.1109/CEC.2010.5585991.

[12] Cagkan Erbas, Selin C. Erbas, and Andy D. Pimentel: A multiobjective optimization model for exploring multiprocessor mappings of process networks. In First IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and Systems Synthesis, Newport Beach, CA, USA, 182-187, 2003. http://dx.doi.org/10.1109/CODESS.2003.1275280.

[13] Nikolaj Bjørner, Anh-Dung Phan, and Lars Fleckenstein: νZ – An optimizing SMT solver. In: Baier, C., Tinelli, C. (eds.): Tools and Algorithms for the Construction and Analysis of Systems (TACAS). Lecture Notes in Computer Science, vol. 9035. Springer, Berlin, Heidelberg, 194-199, 2015. https://doi.org/10.1007/978-3-662-46681-0_14.

[14] Mohammad-H. Tayarani-N. and Adam Prügel-Bennett: On the landscape of combinatorial optimization problems. IEEE Transactions on Evolutionary Computation, vol. 18, no. 3, 420-434, 2014. https://doi.org/10.1109/TEVC.2013.2281502.

[15] Anh-Tuan Nguyen, Sigrid Reiter, and Philippe Rigo: A review on simulation-based optimization methods applied to building performance analysis. Applied Energy, vol. 113, 1043-1058, 2014. https://doi.org/10.1016/j.apenergy.2013.08.061.

[16] Rolf Steinbuch and Simon Gekeler (Eds.): Bionic Optimization in Structural Design. Springer-Verlag Berlin Heidelberg, 2016, ISBN 978-3-662-46595-0. https://doi.org/10.1007/978-3-662-46596-7.

[17] Giorgio Chiandussi, Marco Codegone, Simone Ferrero, and F. E. Varesio: Comparison of multi-objective optimization methodologies for engineering applications. Computers & Mathematics with Applications, vol. 63, no. 5, 912-942, 2012. http://dx.doi.org/10.1016/j.camwa.2011.11.057.

[18] Sergey Rodzin and Lada Rodzina: Theory of bionic optimization and its application to evolutionary synthesis of digital devices. In Proc. of IEEE East-West Design & Test Symposium (EWDTS 2014), Kiev, 1-5, 2014. http://dx.doi.org/10.1109/EWDTS.2014.7027058.

[19] R. Timothy Marler and Jasbir S. Arora: The weighted sum method for multiobjective optimization: New insights. Structural and Multidisciplinary Optimization, vol. 41, 853-862, 2010. https://doi.org/10.1007/s00158-009-0460-7.

[20] I. Y. Kim and O. L. de Weck: Adaptive weighted sum method for multiobjective optimization: A new method for Pareto front generation. Structural and Multidisciplinary Optimization, vol. 31, 105-116, 2006. https://doi.org/10.1007/s00158-005-0557-6.

[21] Russell Eberhart and Jim Kennedy: A new optimizer using particle swarm theory. In Proceedings of the Sixth International Symposium on Micro Machine and Human Science, 39-43, 1995. https://doi.org/10.1109/MHS.1995.494215.

[22] Vagelis Plevris and Manolis Papadrakakis: A hybrid particle swarm – gradient algorithm for global structural optimization. Computer-Aided Civil and Infrastructure Engineering, vol. 26, 48-68, 2011. https://doi.org/10.1111/j.1467-8667.2010.00664.x.

[23] Amitava Chatterjee and Patrick Siarry: Nonlinear inertia weight variation for dynamic adaptation in particle swarm optimization. Computers & Operations Research, vol. 33, 859-871, 2006. https://doi.org/10.1016/j.cor.2004.08.012.

[24] Marco Cavazzuti: Optimization Methods: From Theory to Design. Springer-Verlag Berlin Heidelberg, 2013, ISBN 978-3-642-31186-4. https://doi.org/10.1007/978-3-642-31187-1.

[25] Éric Walter: Numerical Methods and Optimization: A Consumer Guide. Springer International Publishing, 2014, ISBN 978-3-319-07670-6. https://doi.org/10.1007/978-3-319-07671-3.


Logic Optimization of Majority-Inverter Graphs

Heinz Riener1, Eleonora Testa1, Winston Haaswijk1, Alan Mishchenko2, Luca Amarú3, Giovanni De Micheli1, Mathias Soeken1

1EPFL, Lausanne, Switzerland; 2UC Berkeley, CA, United States; 3Synopsys Inc., Sunnyvale, CA, United States

Kurzfassung

Majority-Inverter Graphen (MIGs) sind eine Multi-Level-Logikrepräsentation von Booleschen Funktionen mit bemerkenswerten algebraischen und Booleschen Eigenschaften, die effiziente Logikoptimierungen über die Möglichkeiten konventioneller Logikrepräsentationen hinaus erlauben. In dieser Arbeit überblicken wir zwei moderne Logikoptimierungsmethoden für MIGs: Cut Rewriting und Cut Resubstitution. Beide Algorithmen sind generisch und können auf beliebige graph-basierte Logikrepräsentationen angewandt werden. Wir beschreiben sie in einem vereinheitlichten Framework und präsentieren experimentelle Ergebnisse für die Größenoptimierung von MIGs unter Verwendung der EPFL Benchmarks.

Abstract

Majority-inverter graphs (MIGs) are a multi-level logic representation of Boolean functions with remarkable algebraic and Boolean properties that enable efficient logic optimizations beyond the capabilities of conventional logic representations. In this paper, we survey two state-of-the-art logic optimization methods for MIGs: cut rewriting and cut resubstitution. Both algorithms are generic and can be applied to arbitrary graph-based logic representations. We describe them in a unified framework and show experimental results for MIG size optimization using the EPFL combinational benchmark suite.

1 Introduction

Logic optimization of multi-level Boolean networks plays an important role in automated design flows for digital systems and is responsible for substantial area and delay reductions [1, 2]. These logic optimizations are commonly carried out on a simple and technology-independent representation of the digital logic. Particularly homogeneous data structures, such as and-inverter graphs (AIGs) [3, 4] – composed of two-input ANDs and inverters – or majority-inverter graphs (MIGs) [5] – composed of majority-of-three gates and inverters – have proven to be successful. Structural hashing on the intermediate representation ensures that no two nodes have identical incoming edges. Arbitrary Boolean networks can be transformed into AIGs or MIGs, for which a repertoire of scalable optimization techniques is available [6].

Recently, MIGs have received much attention due to their remarkable algebraic and Boolean properties. On the one hand, MIGs share many characteristics of AIGs, such that simple and efficient optimizations are possible. On the other hand, MIGs generalize AIGs and enable a more compact representation of logic functions. The logic AND x ∧ y of two functions x and y can be represented by the majority expression ⟨0xy⟩, assigning the third input to constant 0. Consequently, all AIGs are convertible to MIGs without increasing the number of nodes. Figure 1 illustrates the compactness of MIGs by showing the function prime5(x1, ..., x5) = [(x5 ... x1)_2 is prime] represented using ANDs, ORs, and complemented edges (on the left) and as a MIG (on the right).

Figure 1 Example of (a) a logic representation using AND, OR, and complemented edges and (b) a MIG representation of the function prime5(x1, ..., x5) = [(x5 ... x1)_2 is prime]. Majority-3, AND, and OR nodes are labeled M, ∧, and ∨, respectively. Complemented edges are drawn using dashed lines.
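The majority-of-three function underlying MIG nodes, and the AND-as-majority identity x ∧ y = ⟨0xy⟩ mentioned above, can be checked with a few lines of Python. This is a didactic sketch only, unrelated to any MIG library code.

```python
from itertools import product

def maj(a, b, c):
    """Majority-of-three: true iff at least two inputs are true."""
    return (a and b) or (a and c) or (b and c)

# x AND y equals the majority expression <0 x y> for all input values.
assert all(maj(0, x, y) == (x and y) for x, y in product([0, 1], repeat=2))
# Dually, x OR y equals <1 x y>.
assert all(maj(1, x, y) == (x or y) for x, y in product([0, 1], repeat=2))
```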

The focus of this paper lies on two Boolean optimization techniques:

1) Boolean rewriting is a coarse-grained optimization technique that iteratively selects small parts of a Boolean network and replaces them with more compact implementations in order to reduce the overall number of nodes, while maintaining the global output functions of the Boolean network.

2) Boolean resubstitution is a more fine-grained technique that reexpresses the Boolean functions of particular nodes using nodes already present in the Boolean network. Nodes which are no longer used (including nodes in their transitive fan-ins) can then be removed from the Boolean network.

Effective implementations of both ideas are available for AIGs [6, 7]; they exploit peephole optimization techniques using cuts, truth tables, and pre-computation in order to scale to large Boolean networks. We survey state-of-the-art generalizations of Boolean rewriting and Boolean resubstitution that are applicable to arbitrary graph-based logic representations. In particular, we discuss cut rewriting [8], an on-the-fly rewriting technique using exact synthesis [9, 10], and cut resubstitution [11], a scalable rule-based resubstitution technique. Both techniques are DAG-aware and exploit structural hashing to obtain a gain even when a smaller part of the logic is replaced with a larger one, by reusing already existing logic in the Boolean network. The two techniques are implemented in the EPFL logic synthesis library mockturtle¹ [12]. In our experiments using the EPFL combinational benchmark suite, we show that the proposed techniques are capable of reducing the benchmark sizes by 23.54% in 392.72 s when applied interleaved until convergence.

2 Preliminaries

A Boolean network N is a directed acyclic graph (DAG). Each node corresponds to a logic gate; each directed edge (n, m) is a wire connecting node n with node m. The fanin, respectively fanout, of a node n ∈ N are the incoming, respectively outgoing, edges of the node. The primary inputs (PIs) are the nodes of the Boolean network without fanin; the primary outputs (POs) are the nodes without fanout. All other nodes in the Boolean network are gates.

A cut is a pair (r, L), where r is a node, called root, and L is a set of nodes, called leaves, such that 1) each path from any primary input to r passes through at least one leaf and 2) for each leaf l ∈ L, there is at least one path from a primary input to r passing through l and not through any other leaf. The cover N.cover(r, L) of a cut (r, L) of network N is the set of all nodes n ∈ N that appear on a path from any l ∈ L to r, including r but excluding the leaves.

A fanout-free cone (FFC) of a node r is a cut (r, L) such that no node r′ ∈ N.cover(r, L) with r′ ≠ r has a parent node outside of N.cover(r, L). The maximum fanout-free cone (MFFC) of a node r is its largest FFC. In other words, the MFFC of a node contains all the logic used exclusively by that node. When a node is removed or substituted, the logic in its MFFC can be removed as well [13].

¹ EPFL Logic Synthesis Lib., https://github.com/lsils/lstools-showcase
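The cover of a cut, as defined above, can be computed by a simple backward traversal from the root that stops at the leaves. The following sketch assumes a network given as a dict mapping each node to its fanin list; this is our own minimal representation, not mockturtle's API.

```python
def cover(fanins, root, leaves):
    """Nodes on paths from the leaves to `root` (root included,
    leaves excluded), i.e. N.cover(root, leaves)."""
    leaves = set(leaves)
    result, stack = set(), [root]
    while stack:
        n = stack.pop()
        if n in leaves or n in result:
            continue
        result.add(n)
        stack.extend(fanins.get(n, []))
    return result

# Example: a chain pi -> a -> b with cut (b, {a}) has cover {b}.
assert cover({"b": ["a"], "a": ["pi"]}, "b", {"a"}) == {"b"}
```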

3 Cut Rewriting

Algorithm 1 shows the pseudo code of cut rewriting. The algorithm starts by enumerating all cuts of network N with cut size k and cut limit p using cut enumeration techniques [14, 15, 13]. Since cuts found by cut enumeration may not be FFCs, DAG-aware rewriting techniques [7] are used to compute the gain of possible replacement candidates. After all replacement candidates and their gains have been computed, the algorithm finds a set of replacement candidates that maximizes the overall gain.

Next, an empty graph G(V, E) is initialized, which is constructed while enumerating replacement candidates for the cuts. The graph has vertices V for cuts, and an edge in E if two cuts have overlapping logic and can therefore not be replaced simultaneously. Each vertex is also assigned the root node r′ of a best replacement candidate and the potential gain when the cut is replaced by r′. The replacements for the cuts are constructed in the network with dangling root nodes while computing the potential gains. On termination, all remaining dangling nodes are recursively removed from the network.

For each cut (r, L), the algorithm enumerates possible replacements (r′, L), either by looking the replacements up in a pre-computed database of best implementations or on-the-fly using SAT-based exact synthesis. The replacements are not required to be size-optimal. The runtime of exact synthesis can be controlled by setting thresholds on the conflict limit of the SAT solver. For each replacement candidate, the gain is stored together with the best replacement candidate [9]. If a replacement with root r′ that leads to a gain can be found, a vertex (r, L, r′) for the cut is added to G, i.e., the cut (r, L) can be replaced by the cut (r′, L). Afterwards, edges are added to G for each two cuts that have overlapping covers. To obtain a good subset of non-conflicting replacement candidates, we heuristically solve the maximum weighted independent vertex set problem on G with respect to the gain weights using the greedy algorithm GWMIN [16], which provides an approximation guarantee of finding a solution with a weight of at least α(G)/∆, where ∆ is the degree of G and α(G) is the weight of the globally optimal solution.
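GWMIN itself is easy to state: repeatedly pick the vertex that maximizes weight divided by (degree + 1), add it to the solution, and delete it together with its neighbors [16]. A compact sketch, assuming the conflict graph as an adjacency dict with the gains as vertex weights:

```python
def gwmin(adj, weight):
    """Greedy weighted maximal independent set after GWMIN [16].

    adj    -- dict {vertex: set of neighbouring vertices}
    weight -- dict {vertex: gain of the vertex's best replacement}
    """
    remaining = {v: set(ns) for v, ns in adj.items()}
    chosen = []
    while remaining:
        # GWMIN rule: pick the vertex maximizing w(v) / (deg(v) + 1).
        v = max(remaining, key=lambda u: weight[u] / (len(remaining[u]) + 1))
        chosen.append(v)
        # Delete v and its whole neighbourhood from the graph.
        dead = remaining[v] | {v}
        for u in dead:
            remaining.pop(u, None)
        for u in remaining:
            remaining[u] -= dead
    return chosen
```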

4 Cut Resubstitution

Algorithm 2 shows the pseudo code of cut resubstitution. The algorithm iterates over all nodes r in a given network N, identifies possible node replacements r′ of r using existing logic in N, and resubstitutes r with r′ if the overall number of nodes in the logic network is reduced. For each node r, first a reconvergence-driven cut [6] with cut size limit k is computed. Next, from the same node r, an MFFC M is constructed to estimate how many nodes can be freed if r is replaced. Each node of the cut which is not part of M is considered a potential candidate for replacing r and added to a list D of divisors. The local functions of the nodes n within the reconvergence-driven cut are computed using truth tables.


Input: Boolean network N, cut size k, cut limit p
Set C ← N.enumerateCuts(k, p);
Set T ← N.simulateCuts(C);
Set G ← (V = ∅, E = ∅);
foreach node r ∈ N do
    Set M ← N.computeMFFC(r);
    if |M| = 1 then continue;
    foreach leaves L ∈ C(r) do
        r′ ← N.computeBestReplacement(r, L, T);
        if r′ ≠ ⊥ then G.addVertex(r, L, r′);
    end
end
foreach L1 ∈ C(r1) and L2 ∈ C(r2) do
    if N.cover(r1, L1) ∩ N.cover(r2, L2) ≠ ∅ then
        G.addEdge((r1, L1) – (r2, L2));
    end
end
Set V′ ← G.maximalIndependentVertexSet();
foreach (r, L, r′) ∈ V′ do
    N.replaceNode(r, r′);
end
return N;

Algorithm 1: Cut rewriting

Data: Logic network N, cut size k
Result: Optimized logic network
foreach node r ∈ N do
    Set L ← N.computeReconvDrivenCut(r, k);
    Set M ← N.computeMFFC(r, L);
    Set D ← N.collectDivisors(r, L, M);
    Set T ← N.simulate(L, D);
    Set r′ ← N.resubKernel(r, D, M, T);
    if r′ ≠ ⊥ then N.replaceNode(r, r′);
end
return N;

Algorithm 2: Cut resubstitution

The core of the algorithm is a rule-based resubstitution kernel that identifies possible replacements of r using the divisors in D. If a possible replacement r′ is found by the resubstitution kernel, then r is replaced with r′ and the Boolean network is updated. If no replacement is found (i.e., the kernel returns ⊥), the algorithm continues with the next node.

The actual resubstitutions are computed by the resubstitution kernel, which compares divisors and suggests possible replacements. The resubstitution kernel contains those parts of the resubstitution algorithm which have to be customized for the logic representation in use. In particular, a resubstitution kernel defines resubstitution rules and filtering rules:

1. A resubstitution rule is a simple, repetitive test to determine if a given node can be reexpressed with divisors using a fixed resubstitution pattern. For instance, a 1-AND resubstitution rule tests for each pair of candidate divisors d1, d2 ∈ D with d1 ≠ d2 whether r = d1 ∧ d2.

2. A filtering rule implements a necessary or sufficient condition to pre-filter the divisors in D, with the objective of reducing the number of tests in the resubstitution rules. For instance, in order to speed up 1-AND resubstitution, one may pre-compute those divisors U ⊆ D that imply r, i.e., d ∈ U iff d ∈ D and d ⇒ r. Filtering rules lead to performance improvements if the filters can be leveraged by multiple resubstitution rules.
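With truth tables stored as integer bit-masks, both a resubstitution rule and an implication-based filter reduce to cheap bitwise operations. The following is our own minimal sketch of a 1-AND test; mockturtle implements this differently and, for MIGs, uses the majority rules listed below. Note that the filter used here is the necessary condition for the AND pattern, namely that r implies each divisor taking part in r = d1 ∧ d2.

```python
from itertools import combinations

def one_and_resub(tt_r, divisors, care):
    """Try to reexpress root r as d1 AND d2 over truth-table bit-masks.

    tt_r     -- truth table of r as an integer bit-mask
    divisors -- dict {name: truth table}
    care     -- bit-mask of valid input assignments
    """
    # Filtering rule: if r = d1 & d2, then r implies each divisor,
    # so only divisors covering r can take part in the pattern.
    candidates = {n: tt for n, tt in divisors.items()
                  if tt_r & ~tt & care == 0}
    # Resubstitution rule: test the fixed pattern on all divisor pairs.
    for (n1, t1), (n2, t2) in combinations(candidates.items(), 2):
        if (t1 & t2 & care) == (tt_r & care):
            return (n1, n2)
    return None

# Two-variable example: r = d1 & d2 over the four minterms.
assert one_and_resub(0b1000, {"d1": 0b1100, "d2": 0b1010}, 0b1111) == ("d1", "d2")
```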

For MIGs, we consider five resubstitution rules:

1. Constant resubstitution replaces r if it is equivalent to a Boolean constant 0 or 1.

2. Divisor resubstitution replaces r if it is equivalent to a divisor in the current cut or its complement.

3. Relevance resubstitution replaces r if one of its children can be replaced by a divisor.

4. 1-MAJ resubstitution replaces r with one newly added majority gate using three divisors from the current cut.

5. 2-MAJ resubstitution replaces r with two newly added majority gates using five divisors from the current cut.

5 Experiments

We have implemented the proposed algorithms in C++17 using the EPFL logic synthesis library mockturtle [12] in a generic way, such that they can in principle be applied to arbitrary logic representations. We present MIG size optimization results for the EPFL combinational benchmark suite. We apply cut rewriting (RW) using a database of best MIGs [8] and cut resubstitution (RS) using a resubstitution kernel specifically designed for MIGs [11], which adds at most two MIG nodes to the Boolean network. Both techniques, RW and RS, are applied to the Boolean network interleaved until convergence. Table 1 is organized as follows: the first three columns name the benchmarks (Name) and show the initial size (Size) and depth (Depth) of the circuits. The next four columns present the results after size optimization, i.e., the reduced size (Size) and depth (Depth) of the benchmarks, the number of iterations until convergence (It), and the runtime (Time). One iteration refers to one execution of cut rewriting and cut resubstitution. The last column (Improv.) shows the size reduction compared to the initial benchmarks. Overall, the proposed size optimization flow achieves an average size reduction of 23.54% (108954 MIG nodes) in 392.72 s.

6 Conclusion

We have presented two state-of-the-art methods for the logic optimization of Boolean networks: cut rewriting and cut resubstitution. Both techniques are generic and can be applied to arbitrary logic representations. Both algorithms leverage DAG-awareness, cut-based computations, and truth tables to scale to large Boolean networks.


Table 1 Size optimization of the EPFL benchmarks. The first Size/Depth pair gives the initial circuit; the second pair gives the result after (RW · RS)+ optimization, followed by the number of iterations (It), the runtime, and the size improvement.

Name         Size    Depth    Size    Depth   It   Time [s]   Improv. [%]
adder         1020     255      512     130    3      0.14       49.80
arbiter      11839      87    11839      87    1      2.10        0.00
bar           3336      12     3073      13    2      0.65        7.88
cavlc          693      16      602      16    4      3.49       13.13
ctrl           174      10       81      10    3      0.04       53.45
dec            304       3      304       3    1      0.02        0.00
div          57247    4372    36154    4337    6     46.52       36.85
hyp         214335   24801   162416   16795    4    251.36       24.22
i2c           1342      20     1180      18    5      0.51       12.07
int2float      260      16      209      16    3      0.10       19.62
log2         32060     444    30387     422    6     25.07        5.22
max           2865     287     2301     208    4      1.02       19.69
mem_ctrl     46836     114    41757     113    5     25.58       10.84
multiplier   27062     274    24496     273    4     12.78        9.48
priority       978     250      683     181    5      0.39       30.16
router         257      54      244      53    2      0.07        5.06
sin           5416     225     4910     196   13      9.13        9.34
sqrt         24618    5058    11433    4131    5     14.05       53.56
square       18484     250    17137     131    4      8.03        7.29
voter        13758      70     4822      53   10      6.04       64.95
Total       462884           353930                 392.72       23.54

We have described both algorithms in a unified framework and have shown experimental results for MIG size optimization using the EPFL combinational benchmark suite.

7 Acknowledgments

This research was supported by the Swiss National Science Foundation (200021-169084 MAJesty), by the European Research Council in the project H2020-ERC-2014-ADG 669354 CyberCare, and by the SRC contracts "SAT-based methods for scalable synthesis and verification" and "Deep integration of computation engines for scalability in synthesis and verification".

8 Literatur

[1] R. K. Brayton, G. D. Hachtel, and A. L. Sangiovanni-Vincentelli, “Multilevel logic synthesis,” Proceed-ings of the IEEE, vol. 78, no. 2, pp. 264–300, 1990.

[2] G. De Micheli, Synthesis and Optimization of DigitalCircuits. McGraw-Hill, 1994.

[3] L. Hellerman, “A catalog of three-variable Or-invertand And-invert logical circuits,” TEC, vol. 12,no. 3, pp. 198–223, 1963. [Online]. Available:http://dx.doi.org/10.1109/PGEC.1963.263531

[4] A. Kuehlmann, V. Paruthi, F. Krohm, andM. K. Ganai, “Robust Boolean reasoningfor equivalence checking and functional prop-erty verification,” TCAD, vol. 21, no. 12,pp. 1377–1394, 2002. [Online]. Available:http://dx.doi.org/10.1109/TCAD.2002.804386

[5] L. Amarú, P. Gaillardon, and G. De Micheli,“Majority-Inverter Graph: A New Paradigmfor Logic Optimization,” IEEE Transactions onComputer-Aided Design of Integrated Circuits andSystems, vol. 35, no. 5, pp. 806–819, 2016.

[6] A. Mishchenko and R. K. Brayton, “Scalable logicsynthesis using a simple circuit structure,” in Int’lWorkshop on Logic and Synthesis, 2006, pp. 15–22.

[7] A. Mishchenko, S. Chatterjee, and R. K.Brayton, “DAG-aware AIG rewriting a freshlook at combinational logic synthesis,” inDAC, 2006, pp. 532–535. [Online]. Available:http://doi.acm.org/10.1145/1146909.1147048

[8] H. Riener, W. Hasswijk, A. Mishchenko, G. DeMicheli, and M. Soeken, “On-the-fly and DAG-aware: Rewriting Boolean networks with exact syn-thesis,” in DATE, 2019, to appear.

[9] W. Haaswijk, A. Mishchenko, M. Soeken, andG. De Micheli, “SAT based exact synthe-sis using DAG topology families,” in Proceed-ings of the 55th Annual Design Automation Confer-ence, DAC 2018, San Francisco, CA, USA, June 24-29, 2018, 2018, pp. 53:1–53:6. [Online]. Available:https://doi.org/10.1145/3195970.3196111

[10] W. Haaswijk, M. Soeken, A. Mishchenko, and G. DeMicheli, “SAT-Based Exact Synthesis: Encodings,Topology Families, and Parallelism,” IEEE Transac-tions on Computer-Aided Design of Integrated Cir-cuits and Systems, 2019, to appear.

[11] H. Riener, E. Testa, L. Amaru, M. Soeken, andG. De Micheli, “Size optimization of MIGs withan application to QCA and STMG technologies,”in NANOARCH, 2018, pp. 157–162. [Online].Available: https://doi.org/10.1145/3232195.3232202

[12] M. Soeken, H. Riener, W. Haaswijk, andG. De Micheli, “The EPFL logic synthesis li-braries,” Computer Science - Logic in Computer Sci-ence, vol. abs/1805.05121, 2018. [Online]. Available:http://arxiv.org/abs/1805.05121

[13] A. Mishchenko, S. Cho, S. Chatterjee, andR. K. Brayton, “Combinational and sequen-tial mapping with priority cuts,” in IC-CAD, 2007, pp. 354–361. [Online]. Available:http://dx.doi.org/10.1109/ICCAD.2007.4397290

[14] J. Cong and Y. Ding, “On area/depth trade-offin LUT-based FPGA technology mapping,” TVLSI,vol. 2, no. 2, pp. 137–148, 1994. [Online]. Available:http://dx.doi.org/10.1109/92.285741

[15] J. Cong, C. Wu, and Y. Ding, “Cut ranking and prun-ing: Enabling a general and efficient FPGA mappingsolution,” in FPGA, 1999, pp. 29–35. [Online]. Avail-able: http://doi.acm.org/10.1145/296399.296425

[16] S. Sakai, M. Togasaki, and K. Yamazaki, “Anote on greedy algorithms for the maximumweighted independent set problem,” Discrete AppliedMathematics, vol. 126, no. 2-3, pp. 313–322, 2003.[Online]. Available: https://doi.org/10.1016/S0166-218X(02)00205-6


How to Keep 4-Eyes Principle in a Design and Property Generation Flow

Keerthikumara Devarajegowda1,2, Wolfgang Ecker1,3, Wolfgang Kunz1,2

1Infineon Technologies AG, 2Technische Universität Kaiserslautern, 3Technische Universität München

Email: [email protected], [email protected], [email protected]

Zusammenfassung—Ein bedeutendes Problem in der Halbleiterindustrie ist die 'design productivity', die wegen der konstanten Komplexitätssteigerung von System-on-Chips und FPGAs schnell zurückgeht. Eine wichtige Gegenmaßnahme ist die Automatisierung von Entwicklungsschritten wie RTL-Design und Properties (Eigenschaften) für die funktionale Verifikation. Methoden, die sowohl Design- als auch Verifikationsaufgaben automatisieren, müssen das Vier-Augen-Prinzip als Grundvoraussetzung erfüllen, um die Qualität der generierten Designs gewährleisten zu können. In diesem Paper präsentieren wir eine Generierungsmethode, die speziell dafür entwickelt wurde, dem Vier-Augen-Prinzip zu genügen, und die gleichzeitig dafür genutzt wird, sowohl RTL-Designs als auch Properties (Eigenschaften) für die funktionale Verifikation zu generieren. Diese Methode wurde bereits für die Generierung eines RISC-V-Prozessorkerns und weiterer Peripheriegeräte erfolgreich eingesetzt. Diese Automatisierungsmethode führte zu einer bedeutenden Verminderung des dafür nötigen manuellen Aufwands und gleichzeitig zu einer Verbesserung der Qualität der generierten Designs.

Abstract—A significant issue in the semiconductor industry is 'design productivity', which is diminishing fast due to the constant growth in the complexity of system-on-chips and FPGAs. An important countermeasure is automating development tasks such as RTL design and properties for functional verification. Frameworks that automate both design and verification tasks must satisfy the 4-eyes principle as a basic requirement to ensure the quality of the generated designs. In this paper, we present a generation approach that was developed to obey the 4-eyes principle and is used to generate both RTL designs and properties for verification. The approach has been successfully used to generate a RISC-V processor core and several peripheral devices. The automation approach has resulted in a significant reduction of the manual effort needed and in improvements in the quality of the generated designs.

Index Terms—Design Automation, 4-Eyes Principle, Automation Frameworks, Model Based Generation

I. INTRODUCTION

As the electronic industry has grown constantly over the last decades, the complexity of System-on-Chips (SoCs) and FPGAs has grown at the same rate. With the ever-increasing demand for enhanced functions and special features, designs are expected to grow further in complexity. While complex systems with abundant features enhance the applicability and usability for the end consumer, the development processes of such systems increasingly miss the original schedules [7]. Missing deadlines can have catastrophic consequences, as the time-to-market (TTM) is a key parameter for a product's revenue. Therefore, companies are constantly striving for novel methods to drive the development processes efficiently.

Hardware design generation is seen as the next productivity driver after IP reuse. Several design generation techniques and methods exist that report significant gains compared to manual coding [1], [3], [6], [10], [12], [14]. The main intention behind generating hardware designs is to fast-forward the development process, which in turn leads to a shorter time-to-market (TTM) for the products. However, due to the complexity of hardware designs, their functional verification requires at least 50% of the overall development time [7]. Hardware designs are verified to ensure the quality and functionality of the Register Transfer Level (RTL) code. In order to avoid the productivity loss due to verification efforts, several techniques exist to automate testbench development [4], [8], [9], [13].

A fundamental requirement of the hardware design flow is to obey the 4-Eyes Principle (4EP). The 4EP (also known as the two-person rule) is a well-known control mechanism widely followed in decision-making processes for critical applications or operations. According to the rule, any process or decision must be carried out by at least two people. The 4EP is necessary to avoid errors or bugs resulting from human mistakes and is crucial for safety-critical and secure applications. The 4EP is also applied in management-level decisions, the publishing of important documents, software development flows, etc. In a typical hardware design flow, the RTL code is verified by a different individual (the verification engineer) than the design engineer in order to obey the 4EP.

Automatic code generation frameworks are developed once and are employed over several projects. In addition to improving productivity, generation leads to better code quality and efficiency. Any error in these frameworks needs to be fixed only once and in one place. Nevertheless, generation flows are also susceptible to unknown or hidden bugs. In order to ensure that unknown bugs in the automation frameworks do not hide real bugs in the RTL design, it is important to apply the 4EP to automation flows as well. In other words, RTL and verification code (property) generation must take separate paths from the specifications to the target code.

Developing code generators is itself a complex development process and requires considerable resources. A systematic approach is needed such that the generation flow addresses different target languages on different platforms. Our generation approach is formulated considering the following required aspects:

1) Productivity: We use an automation framework that utilizes meta-models to capture the abstract structured specifications of the intended hardware system. The automation framework provides an infrastructure for creating different alternatives of the intended system without much manual effort.

2) Ease of developing code generators: Inside the automation framework, we adapt the Model Driven Architecture (MDA) proposed by the Object Management Group (OMG) for developing code generators. The approach splits the generation into multiple layers and uses model-to-model transformations as a way of building flexible generators.

3) Multiple target views: Adapting MDA for code generation helps to build models that are independent of platform-specific languages. This in turn allows generating the intended system in multiple target languages by mapping the platform-independent models to platform-specific models.

4) 4EP: The generation of RTL and properties satisfies the 4EP, as the flows take independent paths from the single source (the specifications). The generated properties are used to formally verify the RTL behavior in a model checker; hence, additional details of the RTL implementation are needed. A special binding model is used to contribute the grey-box details needed for property generation.

The rest of the paper is outlined as follows: in Section II, we provide a high-level view of a hardware design flow obeying the 4EP. Section III describes the automation approach for generating RTL and properties while still obeying the 4EP. In Section IV, we demonstrate various aspects of the flow by considering a RISC-V (Reduced Instruction Set Computer) CPU as the running example. A short summary in Section V completes the paper.

II. 4-EYES PRINCIPLE IN HARDWARE DESIGN FLOWS

As mentioned earlier, the 4EP is a key mechanism applied to avoid human errors in decision-making processes. Similarly, in the hardware industry, the 4EP is commonly employed in development flows. Typically, the first step in a hardware design flow is to capture the customer's requirements and formulate the product details with technical specifications. After this step, as shown in the left part of Fig. 1, the design and concept engineers create an architectural description.

Design engineers write the RTL description in a preferred Hardware Description Language (HDL) to realize the intended hardware. This step is controlled by HDL coding guidelines (manual) and linting checks¹ (automatic). The design engineers also verify (with directed or random tests or assertions) and fix the issues in the RTL code for the intended behaviour.

¹Linting is a feature provided by EDA tools; it checks for common errors that lead to buggy scenarios, or for problems that can be caught by static analysis.

Figure 1: Hardware design flow obeying 4EP

Surveys suggest that design engineers spend half of their time verifying the RTL code [7]. However, verification performed by the designers alone cannot be considered sufficient for releasing the design for production, as it does not satisfy the 4EP.

In order to obey the 4EP and avoid bugs occurring due to human errors, a parallel verification flow, shown in the right part of Fig. 1, is followed. A verification engineer interprets the specification and implements test cases/properties in a Hardware Verification Language (HVL) to verify whether the implementation meets the specification. Depending on aspects such as design size, required quality and project schedules, different verification techniques such as simulation, formal verification or emulation are used [7]. The testbench comprising all the test cases is evaluated against the RTL design in a commercially available EDA tool (simulator, model checker or emulator). The verification of the RTL code is considered complete once all the expected results (functional and coverage metrics) are met. Additionally, the testbench is reviewed by another verification engineer to ensure the quality of the test cases/properties.

III. 4-EYES PRINCIPLE IN HARDWARE AUTOMATION FLOW

While automation frameworks help to improve the productivity and efficiency of development flows, developing the code generators itself requires a systematic approach. Ecker et al. proposed an automation framework for system-level synthesis based on meta-modeling and reported significant productivity gains [15]. Additionally, the OMG proposed a model-driven approach for the systematic development of code generators [11]. The approach adopted for generating hardware designs and corresponding properties follows the MDA principle for code generators within an automation framework based on meta-modeling.

Fig. 2 depicts a high-level view of the RTL (left part) and property (right part) generation flows. The middle part represents a binding model that provides the grey-box information required by the properties, as they are used in a model checker to formally verify the RTL behavior.


Figure 2: Automation framework for generating RTL and properties

As described in Section II, the development process starts with the specifications of the intended system. A common practice in industry is to capture these specifications in a context-sensitive (informal) language (e.g., English or German). Capturing the specifications in a context-free (formal) format is the entry point to the flow. To achieve this, for a given specification, a meta-model is created using a UML class diagram to represent the abstract structure of the intended design. Additionally, the behavior of the structures is described (via attributes) as abstract expressions. These formal specifications are then used as the single source for RTL and property generation, as shown in Fig. 2. We call these formal specification models Model-of-Things (MoTs). In the original MDA, MoTs correspond to the Computation Independent Model (CIM).

A. RTL Generation

Ecker et al. propose using a Python-based Hardware Domain Specific Language (HDSL) to describe the intended hardware system [5]. The RTL generation flow shown in Fig. 2 uses a similar DSL coded in Python, which extracts the required information from the formal specification models and creates a blueprint of the micro-architecture. For example, for a RISC-based CPU, this step involves describing the pipeline architecture (2/3/4/5-stage pipelines) comprising both control and datapath. The micro-architecture is a component tree that is independent of any platform-specific details and is shown as the Model-of-Design (MoD) in Fig. 2. The MoD is an instance of a meta-model called MetaRTL, which defines the attributes of each component and the semantics for the interconnection of all components.

The MoD corresponds to the Platform Independent Model (PIM) in the original MDA and is an abstract representation of the intended hardware. The MoD is mapped to a less abstract model called Model-of-View (MoV). The MoV is an Abstract Syntax Tree (AST) of the target language and corresponds to the Platform Specific Model (PSM) in the MDA definition. The grammar of the target language (e.g., VHDL) is captured in an EBNF-like notation called View Language Description (VLD), and a meta-model is constructed from this notation using the automation framework. Hence, the MoV is an instance of the VLD, and an automatically generated un-parser maps the MoD to the RTL code in VHDL, Verilog or SystemVerilog.
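To make the layering tangible, the following Python sketch shows a toy component tree and a naive un-parser to a VHDL view. It is illustrative only: MetaRTL, the real MoD/MoV classes, and the generated un-parsers are not public, so all class and function names here are our own (the ALU ports are borrowed from Fig. 6).

class Component:
    """Toy stand-in for a MoD node; the real MetaRTL classes are not public."""
    def __init__(self, name, ports=(), children=()):
        self.name = name                  # component attribute
        self.ports = list(ports)          # (name, direction, width) tuples
        self.children = list(children)    # sub-components of the tree

def unparse_vhdl(comp):
    """Naive 'un-parser': map the platform-independent tree to a VHDL view."""
    decls = ";\n        ".join(
        f"{n} : {d} std_logic_vector({w - 1} downto 0)"
        for n, d, w in comp.ports)
    return (f"entity {comp.name} is\n    port (\n        {decls}\n    );\n"
            f"end entity {comp.name};")

alu = Component("ALU", ports=[("reg_src1", "in", 32), ("reg_src2", "in", 32),
                              ("op_select", "in", 2), ("alu_out", "out", 32)])
print(unparse_vhdl(alu))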

B. Property Generation

Similar to the RTL generation approach, properties are automated by following the MDA principle for code generation, as shown in Fig. 2. The flow starts from the formal specifications and takes a separate path to the view files (e.g., properties in SystemVerilog Assertions). This separation of the generation flows is needed to obey the 4-Eyes Principle. However, as the generated properties are used in a model checking tool to verify the RTL, additional RTL details must be included in the properties. To this end, a separate binding model is used, as described in Section III-C.

The property generation flow uses a Python-coded DSL to describe the property traces, as outlined in [5]. The property DSL is used to extract the specification information and describe the abstract property models (Model-of-Property (MoP)). An abstract property model is an expression tree that spans multiple time points (clock cycles). In other words, a MoP contains an expression tree that is expected to be satisfied by the RTL code. The property model MoP is an instance of the meta-model MetaPROP, which defines an abstract expression tree.
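As a rough illustration of such a model, the sketch below encodes a two-part temporal implication as a small Python expression tree; the real MetaPROP classes are not public, so the node type is invented and only the operator names (IMPLY, EQ, TIME_OFF) are borrowed from Fig. 7.

class Node:
    """Toy expression-tree node; operator names follow Fig. 7."""
    def __init__(self, op, *operands, offset=0):
        self.op = op                # e.g. "IMPLY", "EQ", "TIME_OFF"
        self.operands = operands    # child nodes or leaf symbols
        self.offset = offset        # time offset in clock cycles

# "an R-type add in decode implies rd_Data == rs1 + rs2 three cycles later"
assume = Node("EQ", "inst[6:0]", "0110011")
prove = Node("TIME_OFF", Node("EQ", "rd_Data", "gpr_rs1_data + gpr_rs2_data"),
             offset=3)
mop = Node("IMPLY", assume, prove)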

Finally, the MoP is mapped to a less abstract, language-specific model (MoV) to generate the properties in a specific target language. The flow currently supports properties in SystemVerilog Assertions (SVA) and the InTerval Language (ITL).


C. Binding

Formal verification of RTL designs using model checking tools has several proven benefits: it ensures high-quality verification due to exhaustive analysis, does not require explicit input stimuli, produces short debug traces, and is automatic in nature [16]. However, formal verification requires a grey-box verification approach in order to avoid false failures and to speed up the proof runs. In other words, certain internal details of the RTL implementation must be encoded in the properties. Therefore, we create a model called Model-of-Binding (MoB), as shown in Fig. 2, which holds the RTL information needed for the properties.

Figure 3: UML class diagram representing the meta-model of binding

The MoB is an instance of a meta-model, which is shown as a UML class diagram in Fig. 3. The meta-model consists of two main parts: component details (class Component) and property module details (class PropertyModule). The root node has a composition relation to both classes. A component can be a top block and is composed of zero-to-many sub-components and multiple ports. A property module has multiple variables (class Variable) encoded in properties. Additionally, each variable has a reference to a port signal of a component (relation RTLPort), as shown.

As shown in Fig. 2, the MoD is a component tree that contains the micro-architecture implementation of the intended system. Hence, the MoB is automatically populated by iterating through the MoD components. Additionally, the generated MoB is appended with a list of the variables used to define the property traces (in the DSL for properties). After this step, for each variable, a port signal of a component is referenced manually. For cases where a specific RTL port signal is not available, a boolean expression is defined instead, constructed with port signals as its symbols. Finally, for variables with port signal references, macros are generated in the target file. This improves the readability of the view files and additionally absorbs any change in the RTL description (i.e., changes in the MoD due to changes in the DSL for RTL).
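The MoB classes themselves are not public; the toy Python sketch below only illustrates the mechanics just described, i.e. walking a component tree to collect RTL port paths and emitting one `define per manually bound property variable, mirroring the macro output of Fig. 9. All names are our own.

class Component:  # toy stand-in for a MoD/MoB component node
    def __init__(self, name, ports=(), children=()):
        self.name, self.ports, self.children = name, list(ports), list(children)

def collect_ports(comp, prefix=""):
    """Yield (port name, full RTL hierarchy path) for every port in the tree."""
    path = prefix + comp.name
    for port in comp.ports:
        yield port, path + "." + port
    for child in comp.children:
        yield from collect_ports(child, path + ".")

def emit_macros(bindings, port_paths):
    """bindings: property variable -> RTL port (added manually in the MoB)."""
    return "\n".join("`define %s %s" % (var, port_paths[port])
                     for var, port in bindings.items())

alu = Component("ALU", ports=["reg_src1", "reg_src2", "op_select", "alu_out"])
paths = dict(collect_ports(alu, "CPUCore_top."))
print(emit_macros({"alu_result": "alu_out"}, paths))
# -> `define alu_result CPUCore_top.ALU.alu_out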

IV. APPLICATION

To demonstrate the feasibility and effectiveness of the RTL and property generation flows, various architectural alternatives (1/4/5-stage pipelines) of a RISC-V core [2] are generated. In addition, the flow has been applied to generate and verify peripheral devices such as a timer, a data bus, register interfaces and a connectivity unit.

A. Formal Specification (MoTs)

We consider the automation of the RISC-V core as the running example to describe the various steps outlined in Fig. 2. As explained in Section III, the first step in the generation approach is to analyze the informal specifications and create an abstract model that captures the high-level requirements. For this purpose, a meta-model called MetaRISC is created to model RISC-based Instruction Set Architectures (ISAs). The UML class diagram of MetaRISC is shown in Fig. 4.

The MetaRISC meta-model can be seen as three main parts: instruction encoding (RangeNode, Parameter, Opt, ParameterEncoding), instruction behavior (Instruction, InstructionBehavior, ObjectSelect) and architectural state (ObjectProperties). The instruction encoding describes how instructions are encoded in the instruction word. The instruction behavior describes the sequence of operations performed by an instruction, and the architectural state represents the state of the CPU. An instance of MetaRISC is a valid model of an ISA. For example, one extension of the RISC-V ISA, the 32-bit Base Integer Instruction Set [2], is a valid instance of MetaRISC. An XML snippet of the MoT showing the ADD instruction and its behavior is given in Fig. 5. A significant aspect of MoTs is that they do not contain any information about the micro-architecture to be implemented. This in turn gives the DSL for RTL the flexibility to define alternative micro-architectures.

B. RTL Generation

Once the MoTs are defined, a Python-coded DSL is written to describe the micro-architecture of the intended hardware. A significant feature of the DSL is its user-friendliness for describing hardware. The infrastructure provided by the automation framework enables APIs that are optimized for describing hardware. Fig. 6 shows a code snippet implementing a simplified Arithmetic Logic Unit (ALU) with four operations (addition, subtraction, logical AND, logical OR) in the DSL for RTL. The port signals reg_src1, reg_src2, op_select and alu_out define the interfaces of the ALU block. The ALU block is interfaced with the pipeline stages inside a class that instantiates all component classes. The use of Python as the automation language enables describing multiple micro-architecture alternatives. After this step, the flow generates the RTL files by mapping the MoD to the MoV, as described in Section III-A.

C. Property Generation and Binding

Similar to the RTL generation flow, a Python-coded DSL is used to describe the abstract property models.


Figure 4: MetaRISC: A meta-model for modeling RISC-based ISA specifications (simplified)

<Instruction>
  <Name>add instruction: register-type</Name>
  <Mnemonic>add</Mnemonic>
  <InstructionBehavior>
    <ObjectSelect>
      <DataFlowString>
        gpr_rd_addr = HWPLUS(rd_addr, Literal("1'd0"))
      </DataFlowString>
      <OutEdgeName>gpr_rd_addr</OutEdgeName>
    </ObjectSelect>
    <Name>reg_write</Name>
    <TargetObjectRef>3</TargetObjectRef>
    <DataFlowString>
      rd_Data = HWPLUS(gpr_rs2_data, gpr_rs1_data)
    </DataFlowString>
    <OutEdgeName>rd_Data</OutEdgeName>
  </InstructionBehavior>
  <Active>True</Active>
</Instruction>

Figure 5: ADD instruction behavior in MoT

class ALU(Structure):
    def __init__(self, *args, **kwargs):
        super(ALU, self).__init__(*args, **kwargs)
        ## ports
        self.reg_src1 = Port(Direction="IN")
        self.reg_src2 = Port(Direction="IN")
        self.op_select = Port(Direction="IN")
        self.alu_out = Port(Direction="OUT")
        ## alu operations
        and2 = self.reg_src1 & self.reg_src2
        or2 = self.reg_src1 | self.reg_src2
        sub2 = self.reg_src1 - self.reg_src2
        add2 = self.reg_src1 + self.reg_src2
        ## select different operations
        ins = [and2, or2, sub2, add2]
        sel = self.op_select
        self.alu_out = Mux(Ins=ins, Sel=sel)

Figure 6: DSL for RTL: Describing a simple ALU

MoTs (MetaRISC instances) are decoded to extract the supported instructions (Active tag in Fig. 5), the instruction encoding and the corresponding instruction behavior (DataFlowString tag in Fig. 5). A property model (MoP) is defined for each instruction encoding type (e.g., jump, store, load, branch, arithmetic, etc.). A Python code snippet describing a temporal trace for verifying a register-type instruction is shown in Fig. 7.

def register_type_prop(stages, op):
    ## antecedent
    assume = LAND(EQ(inst_decode, 1),
                  EQ(SLICE(inst, 6, 0), R_TYPE),
                  EQ(SLICE(inst, 14, 12), op))
    ## consequent
    result = self.inst.getDataFlowString()
    target = self.inst.getOutEdgeName()
    prove = TIME_OFF(stages, EQ(target, result))
    ## return MoP
    return IMPLY(assume, prove)

Figure 7: DSL for Properties: Property trace for reg-type instructions (simplified)

The Python function register_type_prop() returns an expression tree with the IMPLY operator as the root; the function is iterated over all register-type instructions. Based on the CPU core architecture to be verified, the number of pipeline stages (e.g., 5-stage) is passed as an argument (2/4/5-stage pipelines) to the function. The expression root consists of two sub-expressions: assume and prove. The assume part constrains the property to start from the decode stage and assumes a register-type instruction. The prove part extracts the data flow string (getDataFlowString()) and the output edge name (getOutEdgeName()) from the MoT. The defined MoP is an abstract model of all instructions that belong to the same encoding type.

The property trace is defined without the RTL port signal information. In order to bind the property and RTL information, the MoB is generated from the MoD (containing a component tree) and the variables used for describing the properties. An example MoB instance generated from the MoD of the ALU and the property DSL is shown in Fig. 8. The MoB instance is manually edited to add references between the property module variables and the RTL port signals. A set of macros is generated for all the references in the MoB. Finally, the MoPs are mapped to the MoV layer to generate the properties in a preferred target language (SVA or ITL). A generated property for verifying the ADD instruction behavior, together with the macros generated to bind the RTL signals, is shown in Fig. 9 (in SVA).


<Component type="Structure">
  <Name>ALU</Name>
  <IntClassId>1</IntClassId>
  <Component>
    <Name>HWPLUS</Name>
    ... ports ...
  </Component>
  <Component>
    <Name>HWMINUS</Name>
    ... ports ...
  </Component>
  <Component>
    <Name>BAND</Name>
    ... ports ...
  </Component>
  <Component>
    <Name>BOR</Name>
    ... ports ...
  </Component>
  <Component>
    <Name>MUX</Name>
    ... ports ...
  </Component>
  <Port>
    <IntClassId>2</IntClassId>
    <Name>reg_src1</Name>
    <Direction>IN</Direction>
    <Bitwidth>32</Bitwidth>
  </Port>
  <Port>
    <IntClassId>3</IntClassId>
    <Name>reg_src2</Name>
    <Direction>IN</Direction>
    <Bitwidth>32</Bitwidth>
  </Port>
  <Port>
    <IntClassId>4</IntClassId>
    <Name>op_select</Name>
    <Direction>IN</Direction>
    <Bitwidth>2</Bitwidth>
  </Port>
  <Port>
    <IntClassId>5</IntClassId>
    <Name>alu_out</Name>
    <Direction>OUT</Direction>
    <Bitwidth>32</Bitwidth>
  </Port>
</Component>
<PropertyModule>
  ...
  <variable>
    <Name>alu_result</Name>
    <RTLPortReference>5</RTLPortReference>
  </variable>
  ...
</PropertyModule>

Figure 8: MoB instance generated from MoD and DSL for Properties

// ADD instruction Property in SVA
property register_type_prop_ADD;
  @(posedge clk)
  disable iff (reset)
  pc == pc+4 &
  inst[6:0] == "0110011" &
  inst[14:12] == "000" |->
  ##3 rd_Data == gpr_rs1_data + gpr_rs2_data;
endproperty
---------------------------------------------------------
// macro to replace destination register write value
`define rd_Data CPUCore_top.register_file.reg_wr_val

// macro to replace source register 1
`define gpr_rs1_data CPUCore_top.register_file.reg_src1

// macro to replace source register 2
`define gpr_rs2_data CPUCore_top.register_file.reg_src2

// macro to replace alu result
`define alu_result CPUCore_top.ALU.alu_out

Figure 9: Property for verifying ADD instruction behavior in SVA and macros to bind RTL signals

V. SUMMARY

We proposed an approach for automating both RTL and verification code while following the 4EP. The approach uses DSLs to describe RTL and properties in a high-level language and presents a novel design automation flow. The combined use of a meta-model based automation framework and the adaptation of the MDA vision for code generation has significantly improved the productivity and the quality of the designs. The approach translates informal specifications into formal specifications as its starting step. The generation flows for RTL and properties take separate paths from the formal specifications to obey the 4EP and to ensure that unknown RTL bugs are not hidden by the generation flow. The presented flow is an ideal alternative to the tedious manual coding of RTL designs with existing HDLs, as the DSLs are custom-tailored to describe the intended hardware without considering platform-specific details.

REFERENCES

[1] SpinalHDL. https://spinalhdl.github.io/SpinalDoc-RTD/index.html. [Online: accessed 26-Nov-18].

[2] Krste Asanovic and David A. Patterson. Instruction sets should be free: The case for RISC-V. Technical Report UCB/EECS-2014-146, EECS Department, University of California, Berkeley, Aug 2014.

[3] J. Bachrach, H. Vo, B. C. Richards, Y. Lee, A. Waterman, R. Avizienis, J. Wawrzynek, and K. Asanovic. Chisel: constructing hardware in a Scala embedded language. In The 49th Annual Design Automation Conference, DAC, CA, USA, 2012.

[4] K. Devarajegowda and W. Ecker. Metamodel Based Automation of Properties for Pre-Silicon Verification. In 2018 IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), Oct 2018.

[5] K. Devarajegowda, J. Schreiner, R. Findenig, and W. Ecker. Python based Framework for HDSLs with an underlying Formal Semantics. In Proceedings of the 36th International Conference on Computer-Aided Design, ICCAD '17, New York, NY, USA, 2017. ACM.

[6] W. Ecker and J. Schreiner. Introducing Model-of-Things (MoT) and Model-of-Design (MoD) for simpler and more efficient hardware generators. In 2016 IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), pages 1–6, Sept 2016.

[7] Harry Foster. Trends in functional verification: a 2016 industry study. In Design and Verification Conference (DVCon) 2017, San Jose, California, USA.

[8] Namdo Kim, Young-Nam Yun, Young-Rae Cho, J. B. Kim, and Byeong Min. How to automate millions of lines of top-level UVM testbench and handle huge register classes. In 2012 International SoC Design Conference (ISOCC), pages 405–407, Nov 2012.

[9] D. Sheridan, L. Liu, H. Kim, and S. Vasudevan. A coverage guided mining approach for automatic generation of succinct assertions. In 2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems, pages 68–73, Jan 2014.

[10] Shinya Takamaeda-Yamazaki. Pyverilog: A Python-Based Hardware Design Processing Toolkit for Verilog HDL. In Kentaro Sano, Dimitrios Soudris, Michael Hübner, and Pedro C. Diniz, editors, Applied Reconfigurable Computing, Cham, 2015. Springer International Publishing.

[11] F. Truyen. The Fast Guide to Model Driven Architecture.

[12] J. Urdahl, S. Udupi, T. Ludwig, D. Stoffel, and W. Kunz. Properties First? A New Design Methodology for Hardware, and Its Perspectives in Safety Analysis. In Proceedings of the 35th ICCAD, 2016. ACM.

[13] S. Vasudevan, D. Sheridan, S. Patel, D. Tcheng, B. Tuohy, and D. Johnson. Goldmine: Automatic assertion generation using data mining and static analysis. In DATE 2010, pages 626–629, March 2010.

[14] Bogdan Vukobratovic. HW design: A functional approach. https://www.pygears.org/. [Online: accessed 10-Jan-19].

[15] W. Ecker, M. Velten, L. Zafari, and A. Goyal. The Metamodeling Approach to System Level Synthesis. In Gerhard Fettweis and Wolfgang Nebel, editors, DATE. European Design and Automation Association, 2014.

[16] Bruce Wile, John C. Goss, and Wolfgang Roesner. Comprehensive Functional Verification. Morgan Kaufmann Publishers, 2005.


Automated Sensor Firmware Development - Generation, Optimization, and Analysis

Jens Rudolf∗, Manuel Strobel†, Joscha Benz‡, Christian Haubelt∗, Martin Radetzki†, and Oliver Bringmann‡

Institute of Applied Microelectronics and Computer Engineering, University of Rostock, 18051 Rostock, Germany∗
{jens.rudolf, christian.haubelt}@uni-rostock.de
Chair of Embedded Systems, University of Stuttgart, 70569 Stuttgart, Germany†
{manuel.strobel, martin.radetzki}@informatik.uni-stuttgart.de
Embedded Systems Department, University of Tübingen, 72076 Tübingen, Germany‡
{joscha-joel.benz, oliver.bringmann}@uni-tuebingen.de

Kurzfassung

The design of embedded systems is a growing challenge, not least because of the steadily increasing complexity of these systems and requirements such as real-time processing and high reliability under limited memory capacities, strict power-consumption constraints, and short development cycles. More recently, many new techniques for automated generation, analysis, and optimization in different phases of the design flow have therefore been investigated. The evaluation of these methods, however, usually concentrates on the respective field of research and rarely considers the overarching effects that arise from the interaction of different design methods. In this work we present a complete, automated design flow that covers the aspects of model-based generation, analysis, and optimization of firmware for embedded systems, and demonstrate it on a virtual system prototype of a typical inertial sensor node. We show the integration of different design and evaluation methods on a realistic example and motivate, as an outlook, how the gathered information can be used to improve the system design.

Abstract

Embedded systems design has lately become particularly challenging due to fast-increasing system complexities, real-time demands and reliability requirements. At the same time, designs are constrained by stringent power budgets, limited memory capacity and short time-to-market. Recently, various methods to automatically generate, analyze, and optimize different stages of the design flow have been investigated. The evaluation of these methods, however, often focuses on a single associated field of research and thus may not consider the domain-crossing effects that stem from the interaction of different design methods. This paper presents a complete and fully automated workflow that covers model-based generation, analysis, and optimization aspects for embedded firmware and demonstrates it on a virtual system prototype of a typical inertial sensor node. We illustrate the integration of different design and evaluation methods on a realistic example and show potential opportunities for applying the gathered information in order to improve the design.

1 Introduction

The design of embedded systems is subject to steadily increasing complexity. Higher functionality and performance are in direct contrast to the given power consumption and timing budgets of an existing system design. For non-stationary embedded devices, the most severe design restriction is certainly power consumption. On the other hand, more and more data is generated by today's applications and even (pre-)processed in place on the embedded device. This increases the system load and affects the timing characteristics negatively. In sum, these facts lead to contradictory constraints as well as large and non-trivial design spaces. To face these challenges, automated design methods for constraint-aware hardware/software co-design are essential in order to keep the system design process manageable at reasonable time and effort.

1.1 Motivation

In this context, different research fields have evolved in the domain of electronic design automation over the years. This includes, e.g., model-based development and generation, automated optimization methods, and fast yet accurate system analysis and simulation, to name just a few. The evaluation of recent methods is, however, often limited to the scope of the associated research field and rarely goes beyond it. The impact of single components, and moreover their interaction within a hardware/software co-design workflow, remains open after all. As a consequence, positive as well as negative effects that originate from the interaction of different design methods are not actively studied. In the worst case, this leads to unused potential and possibilities, which motivates the work presented in the following.


1.2 Contribution

Using the example of a smart sensor node, this paper introduces a complete and fully automated embedded system design workflow, including generation, optimization, and analysis aspects. Beyond addressing the single components of this flow, the main focus is put on the interaction of the individual stages, which are defined as follows:

1. Model-based generation of firmware code for a sensor node with an integrated processing unit.

2. System simulation with power and timing analysis using a virtual system prototype.

3. Automated optimization of the memory subsystem.

4. Post-optimization simulation coupled with a second power and timing analysis step.

Beyond that, a virtual system prototype of a sensor hub system and its integration with the above methods and corresponding implementations is presented. With that, the combined evaluation of otherwise isolated development steps is demonstrated and discussed. Finally, the possibility of a step-wise refinement of a system design at hand through the interaction of the above stages in a feedback loop is outlined.

1.3 Document Structure

The state of the art in the relevant areas of model-based sensor firmware development, embedded system memory optimization methods, and highly accurate timing simulation is discussed in Section 2. Afterwards, Section 3 introduces the overall system design workflow, followed by a description of the individual building blocks. The ARM-based demonstration platform is presented in Section 4, followed by an outlook on potential extensions and future work in Section 5. Results and evaluation for exemplary use cases are given in Section 6. Section 7 concludes this paper.

2 Background and Related Work

The electronic design automation fields that are combined in this work, i.e. generation, optimization, and analysis, are non-overlapping in terms of related work. The following section therefore categorizes and discusses the state of the art for the individual workflow blocks separately. To the best of our knowledge, approaches that deal with a combined consideration comparable to what is presented in this paper have not been discussed in the literature to this day.

2.1 Model-based Sensor Firmware Development and Generation

Firmware running on modern embedded sensor subsystems has recently become increasingly complex due to a rising number of tasks carried out directly on the integrated processing unit [1]. Its development thus requires a model-based approach in order to meet today's constrained budgets and time-to-market.

Different dataflow-based models of computation (MoC), e.g. synchronous dataflow (SDF) by Lee et al. [2], cyclo-static dataflow (CSDF) by Bilsen et al. [3], or scenario-aware dataflow (SADF) by Theelen et al. [4], have been studied extensively and proven suitable for the design and optimization of digital signal processing (DSP) applications [5] in the past. These MoC provide useful information on important aspects of the system under design, e.g. consistency, deadlock-freeness as well as throughput; they allow for optimization of, for example, latency [6] and memory usage [7]; and they enable the automatic synthesis of a periodic admissible sequential schedule (PASS) [8].

Tools and frameworks such as Ptolemy II [9] and others have emerged, enabling developers to compose, evaluate and visualize such models. These tools provide sophisticated algorithms for model transformation and optimization, and both software synthesis for DSP targets [10] as well as synthesis of parallel hardware implementations [11] has been conducted successfully.

However, fully automatic firmware synthesis for low-resource embedded systems such as microelectromechanical (MEMS) sensor nodes and hubs remains critical due to the stringent requirements on code and data size as well as overall system energy consumption. To this day, available solutions lack the ability to configure the hardware's power modes. Furthermore, they provide no mechanism to automatically integrate pre-existing C or C++ algorithm code and thus do not allow easy reuse of well-tested and platform-optimized implementations without manual integration effort.

For our embedded design workflow, we propose a hybrid approach to automatically generate a complete operational sensor firmware binary from a dataflow model, which configures the sensor to use the optimal available power modes for the given task and leverages software reuse by integrating with existing target platform code.

2.2 Embedded System Memory Optimization Methods

In embedded system design, hardware and software are often developed side by side. This allows direct influence on the structure of the memory subsystem, which is of high interest due to the large share of the power consumption attributed to it. For Static Random-Access Memory (SRAM), still the dominant memory technology in embedded devices, corresponding figures in the literature go, depending on the application, far beyond 50% of the overall system energy budget [12]. Potential optimization subjects in this connection are memory allocation, i.e. how many memory instances or banks to use and of which size, and application binding, i.e. the mapping of application code and data to memory units.

Sensor nodes or hubs can be seen as low-end embedded devices, a class of systems that rarely comes with cache memory. Instead, a widely adopted concept for embedded systems in this class are so-called scratch-pad memories, a static form of caching memory. That is to say, the scratch-pad content is statically determined at system design time, which keeps the run-time overhead that comes from otherwise required dynamic cache coherence protocols at a minimum. The authors of [13], for example, present a method based on dynamic programming that assigns the scratch-pad content with a focus on either performance or energy minimization. Menichelli et al. [12] put the focus on energy minimization only and add the consideration of power-down phases for the main memory to their optimization problem.

For designs without cache or scratch-pad, different methods for memory partitioning have been presented. Benini et al. [14] developed an automated partitioning algorithm for on-chip SRAM based on exhaustive search. Their method searches for the globally optimal memory banking with minimum energy consumption. The authors of [15] use a genetic algorithm instead and include the interconnect in their considerations.

A fact common to all of the above methods is the already fixed memory allocation. That is, the number and size of memory instances is given; hence only the banking allows for different variations, wherefore quite some optimization potential is lost. The authors of [16] address this point and present an optimization model that allows the combined optimization of memory instance allocation and application binding of address ranges to instances.

Subsequent work extends the partitioning concept by the use of memory low-power modes. The approach of Steinfeld et al. [17] can be named here. Unfortunately, it is limited to memory blocks of equal size and only considers simplistic low-power mode activation schemes. The authors of [18] improve on that work by introducing a more fine-grained low-power mode activation model and the possibility to define peak power constraints.

2.3 Timing Simulation

Timing analysis is an important part of embedded system design. Especially during design space exploration, it is necessary to be able to evaluate the performance of the system under development to allow informed design decisions. To that end, it is crucial that such an evaluation is as fast as possible in order to keep the exploration process efficient. Furthermore, timing analysis has to yield sufficiently accurate results to be useful as input for design decisions. Several different approaches to timing analysis have been developed with distinct requirements in mind. Those that are suited for early performance evaluation during design space exploration are dynamic approaches based on simulation or execution traces.

A very fast, yet accurate approach to timing simulation is source-level timing simulation (SLTS), which has been worked on extensively [19], [20], [21]. These contributions have in common that the timing simulation can be run on a simulation host that is usually much faster than the target platform. Similar approaches based on machine learning have been proposed [22], [23], which also allow the timing simulation to run on a much faster simulation host, while the resulting timing estimate represents the performance on the target system. One major drawback of approaches with differing simulation host and target system is the impossibility of accurately simulating the timing of low-level software such as embedded firmware.

Hence, several techniques have been proposed to handle low-level software as well: hybrid- and binary-level simulation (BLS). Binary-level approaches simulate the timing behaviour during execution of a target binary, usually executed on some kind of virtual prototype (VP) of the target system. There are approaches which integrate timing simulation into a SystemC-based system-level simulation [24]. Although binary-level simulations are usually slower than source-level simulations, the advantage of binary-level simulation techniques is their flexibility, as they allow the timing simulation of any software that can be simulated by the virtual prototype in use. Furthermore, by choosing different kinds of VP, it is possible to trade off simulation speed and accuracy.

Hybrid simulation methods try to overcome the drawbacks of cross-platform approaches like SLTS and machine-learning based techniques by combining those with BLS. Wang et al. proposed a hybrid approach that combines host-compiled source-level simulation with an instruction-level simulation for target-dependent software [25]. Another approach that combines SLTS with a SystemC-based VP relies on annotating the target source code before compiling it for the executing VP [26].

3 System Design Workflow

In this section, we first give a short overview of our system design workflow, followed by a more detailed description of the steps and their interaction with each other and with the virtual system prototype (VSP). In addition, we provide an in-depth discussion of the methodologies implemented in each step.

Figure 1 Overview of system design workflow

The system design workflow, as briefly introduced in Section 1, is shown in Figure 1. It starts with a model-based generation of sensor firmware (see Figure 1 - step 1). More specifically, an actor-oriented dataflow model of the software is transformed to C or C++ source code, while the actors are scheduled for minimal energy consumption. The resulting firmware is then compiled and run on the VSP, which is used for functional verification of the sensor firmware.


Next, the analysis step is executed (see Figure 1 - step 2/4). This step consists of timing simulation of the executed firmware as well as power simulation of the memory subsystem. These analyses rely on additional optional input, specified in a JSON-based data exchange format. The latter was designed to allow efficient communication and interaction between the different steps and applied methodologies. In general, it enables the description of all system aspects that are relevant for the different steps of our design workflow, such as:

• Hardware characteristics, for example the memory subsystem.

• Firmware characteristics, for example the input source file and function.

• Constraints, for example power and timing constraints.

However, not all aspects that are covered by the exchange format are relevant to the methodologies presented in this work. Therefore, only the relevant points are outlined in the following as part of the individual design step descriptions.

The timing analysis (see Figure 1 - step 2/4) interfaces with the VSP to obtain the binary basic block execution order. Hence, the start address of each executed basic block is passed to the analysis for an online, possibly context-sensitive timing analysis. In addition to the obligatory total simulated execution time of the firmware, the timing analysis optionally yields a report in JSON format containing an evaluation of all specified timing constraints. The memory subsystem power analysis (see Figure 1 - step 2/4), on the other hand, communicates with the VSP to be informed of any memory access. This information is used to trigger a power state machine that simulates the energy and power behavior of the memory subsystem. Thus, the power simulation yields total energy consumption and peak power figures. Furthermore, optionally defined power constraints are evaluated, and the corresponding results are provided via the data exchange format.

Next, the optimization step (see Figure 1 - step 3) is executed. For this, all memory accesses during system simulation on the VSP are captured and further used to derive memory access statistics for the individual parts of the firmware, referred to as profiles. This data set is then used to find an energy-optimized partitioning of the memory subsystem. On top of that, the utilization of memory low-power states is evaluated and optimized. Finally, additional instructions for power-mode switching are inserted, followed by a reorganization of the firmware in order to match the optimized memory structure.

In step four, the resulting optimized binary is executed on the virtual system prototype to re-run the power and timing simulation. The resulting timing and power information can be used to further improve the design in another iteration of the generation and optimization steps - either manually or automatically. Note that such a feedback loop can be realized using our current tooling environment, since all necessary information can be easily shared and exchanged between the individual stages using the data exchange format. Section 5 contains a detailed discussion of how such an iterative workflow could improve automated sensor firmware design.

In the following, every workflow step is outlined in detail.
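To make the access-triggered power bookkeeping described above more concrete, the following minimal Python sketch accumulates idle and access energy from a stream of memory-access events. It is illustrative only: the actual VSP callback interface, the full power state machine, and the real energy figures are not published here, so all names and numbers are assumptions.

class MemoryPowerModel:
    # assumed per-access energies (J) and idle power (W); real figures differ
    E_READ, E_WRITE, P_IDLE = 5e-12, 7e-12, 1e-6

    def __init__(self):
        self.energy, self.last_access_time = 0.0, 0.0

    def on_memory_access(self, kind, time_s):
        # idle (static) energy accumulated since the previous access ...
        self.energy += self.P_IDLE * (time_s - self.last_access_time)
        # ... plus the dynamic energy of this access
        self.energy += self.E_READ if kind == "read" else self.E_WRITE
        self.last_access_time = time_s

model = MemoryPowerModel()
for kind, t in [("read", 1e-6), ("write", 2e-6), ("read", 5e-6)]:
    model.on_memory_access(kind, t)  # events as a VSP might report them
print(f"total energy: {model.energy:.3e} J")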

3.1 Sensor Firmware Generation

Automatic model-based firmware generation enables developers to handle the increasing complexity of today's implementations. However, code and data size as well as energy consumption must be minimal to meet the stringent requirements of low-resource embedded devices such as sensor nodes. In this section, we introduce a novel methodology to generate energy-efficient sensor firmware from dataflow models as the first stage of our automatic embedded design workflow.

Our proposed approach consists of four distinct steps: 1) model composition, 2) actor scheduling, 3) code emission, 4) compile and link. Figure 2 outlines these steps for a simple sensor firmware example that reads acceleration samples from the sensor, applies a detection algorithm for a tap gesture with an input data rate of 200 Hz to the signal, and feeds the result into a system output.

Figure 2 Firmware generation outline: 1) model composition; 2) actor scheduling; 3) code emission; 4) compile and link

The following paragraphs explain the particular steps in more detail.

1) Model composition As a first step, the developer composes the firmware model from signal flow components, so-called actors. These actors define the data processing operations. They are connected via channels to read input data and pass on their computation results. Furthermore, actors can be annotated with attributes, e.g. a required input data rate or latency. In order to assist model composition, we provide a Python-based framework including a library of selected actors for signal input, output and gesture detection. The functions and classes are inspired by the Ptolemy II project [9]. To extend the functionality, the developer may add new actors by deriving from the available base classes in the library. Listing 1 presents the necessary Python code to compose the model from Figure 2 with our framework, available in the dataflow module.

2) Actor scheduling In order to create a schedule, the actor-based model is transformed into an SDF graph. Actors form the nodes and channels the edges. Edge production and consumption rates are derived from the data rate requirements of the respective actors. The production rates of source nodes are used to provide the energy-optimal sensor configuration. A consistency check is performed by computing the rank of the connection matrix [2], and symbolic execution is conducted to prove deadlock-freeness and create a repetition vector (a minimal sketch of these checks is given after Listing 1). A simple list scheduling is applied to generate a periodic sequential schedule [8].

Listing 1 Model input via Python dataflow module

from dataflow import (Accelerometer,
                      CompositeActor, Stdout, TapDetector)

model = CompositeActor('model')
Acc = Accelerometer('Acc', model)
Tap = TapDetector('Tap', model, rate=200)
Out = Stdout('Out', model)
model.connect(Acc.output, Tap.input)
model.connect(Tap.output, Out.input)
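The scheduler internals are not published; the following Python sketch merely illustrates the consistency check and repetition-vector computation mentioned above for the Acc -> Tap -> Out chain of Figure 2, using sympy and the SDF convention of Lee et al. [2] (production rates positive, consumption rates negative). All variable names are ours.

from math import lcm
from sympy import Matrix

# Topology (connection) matrix: one row per channel, one column per actor
# (Acc, Tap, Out). Row 1: Acc -> Tap; Row 2: Tap -> Out; all rates are 1.
gamma = Matrix([[1, -1, 0],
                [0, 1, -1]])

# Sample rates are consistent iff rank(gamma) == number of actors - 1 [2].
assert gamma.rank() == gamma.cols - 1, "inconsistent SDF graph"

# Repetition vector: smallest positive integer solution of gamma * q = 0.
basis = gamma.nullspace()[0]
scale = lcm(*(term.q for term in basis))      # clear rational denominators
q = [int(term * scale) for term in basis]
print(q)  # [1, 1, 1] -> fire each actor once per schedule period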

3) Code emission In this step, the generated actor schedule as well as the derived sensor configuration is transformed into C code. A single source file is generated, containing the main() function that implements the configuration logic, e.g. for setting the correct sensor data rate, and the schedule execution. Memory buffers are instantiated to enable data exchange between the actors along the channels in the model. The sizes of these buffers are derived from the maximum edge marks in the dataflow graph. Preprocessor statements are inserted to include the headers that contain the necessary declarations to access the sensor modes (Figure 2 - Runtime) and the actor firing functions (Figure 2 - Actors). As we do not generate the implementation of these functions from the dataflow model, they have to pre-exist in source form or as a pre-compiled library for the selected target platform. The schedule execution is emitted as an infinite loop, which invokes the actors' firing functions according to the sequential schedule. At the end of each period, a sleep statement is inserted to put the processor into a low-power mode, which further reduces the energy consumption. Here, the source actors (sensors) are responsible for waking up the system once the next sample is available.
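The exact layout of the emitted C file is not reproduced in the paper; the Python sketch below is a hypothetical minimal emitter for the steps just described, reusing the actor firing function names from Figure 2 (Acc_exec() etc.). runtime_configure_sensor(), ODR_200HZ and runtime_sleep() are invented placeholders for the Runtime facilities.

def emit_main(schedule, buffers):
    lines = ['#include "runtime.h"   /* hypothetical Runtime header */',
             '#include "actors.h"    /* actor firing function declarations */',
             '']
    for name, size in buffers.items():
        # channel buffers, sized by the maximum edge marks
        lines.append(f"static int32_t {name}[{size}];")
    lines += ['',
              'int main(void) {',
              '    runtime_configure_sensor(ODR_200HZ);  /* invented call */',
              '    for (;;) {']
    for actor in schedule:
        # one period of the periodic sequential schedule
        lines.append(f'        {actor}_exec();')
    lines += ['        runtime_sleep();  /* woken up by the source actor */',
              '    }',
              '}']
    return '\n'.join(lines)

print(emit_main(["Acc", "Tap", "Out"], {"acc_to_tap": 1, "tap_to_out": 1}))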

4) Compiling and linking In the last step, the generated source code, which contains the sensor configuration and the implementation of the periodic sequential actor schedule, is compiled using the corresponding target toolchain. The hardware-specific functions as well as the implementations of the actor firing functions are supplied either in source code form or as pre-compiled libraries during the linking stage. For our demonstration platform, we used the open-source GNU ARM EABI GCC toolchain and compiled the code for the ARMv6 CPU architecture combined with the softfloat ABI.

In the end, we get a fully operational firmware binary that is executable on the target platform. In our proposed workflow, this generated firmware is subject to further analysis and optimization steps, as detailed in the next sections.

3.2 Memory Subsystem Optimization

The optimization of the memory subsystem at system design time can be highly relevant for reducing energy consumption and for satisfying peak power restrictions (cf. Section 2.2). This section describes the corresponding methodology that has been integrated into the system design workflow according to Figure 1. An overview of the memory subsystem optimization flow for the sensor hub system design at hand is depicted in Figure 3. It is subdivided into three subsequent steps as follows.

Figure 3 Overview of memory subsystem optimization

In a first step, referred to as optimization stage 1, memory allocation and profile binding are determined. In this context, each profile denotes a single function or data block of the firmware application. The allocation describes the set of memories to use, including cell technology, storage capacity, and banking. The binding defines the mapping of every profile that is part of the application to the set of allocated memories. The main input for this optimization step is a collection of profiling information, i.e. detailed memory access statistics, as obtained for the investigated firmware from simulation using the virtual system prototype (VSP). This set of information is represented in the previously discussed JSON-based data exchange format and includes the following values per profile (a hypothetical example record is sketched after the list):

• Profile type, distinguishing code and data

• Profile name or label, respectively

• Address range and size

• Duty cycle, i.e. the number of cycles this profile is in use

• Number of read accesses

• Number of write accesses

• Dependency vector, reflecting the accumulated dependencies of this profile on every other profile in the code and data flow graph
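As an illustration only, these per-profile records could be parsed into a C structure like the following; the field names are our assumptions mirroring the list above, not the actual schema of the exchange format.

/* Hypothetical in-memory representation of one profile record */
typedef struct {
    int                 is_code;     /* profile type: code or data      */
    const char         *name;        /* profile name or label           */
    unsigned            address;     /* start of the address range      */
    unsigned            size;        /* size in bytes                   */
    unsigned long long  duty_cycle;  /* cycles this profile is in use   */
    unsigned long long  reads;       /* number of read accesses         */
    unsigned long long  writes;      /* number of write accesses        */
    double             *dependency;  /* accumulated dependencies on
                                        every other profile             */
} profile_t;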

Beyond that, a set of memory characteristics is required as input. The representation of this data set is again based on the data exchange format and comprises energy and power figures for all memory components that are available to the system design engineer. The optimization potential that motivates this step originates from the fact that, in terms of energy efficiency, it is cheaper to access a small SRAM instance than a large one. The main reason is the tremendous impact of the memory periphery, i.e. address logic, amplifiers, or pre-charging units, which increases considerably with memory size. Hence, it is beneficial to place frequently accessed application profiles in memory instances that are as small as possible in order to reduce the overall energy consumption.


For the solution of this optimization problem, different methods and corresponding implementations are available and integrated into the workflow. This includes, on the one hand, the combined optimization of allocation and binding using integer linear programming, which guarantees the global energy minimum for the given set of input data [16]. On the other hand, different heuristics for the efficient clustering of inter-dependent profiles, e.g. modularity or min-cut, are available for this purpose [18].

The following second step (cf. optimization stage 2) is directly based on allocation and binding but also relies on the previously discussed input information of stage 1 (cf. Figure 3). It is only executed if memory low-power modes are available. If so, this optimization method allows the determination of an optimal schedule for the set of available operation modes. In the case of SRAM memories, static power consumption basically originates from leakage currents, whose reduction can be achieved, for example, by using low-power modes based on biasing methods and/or power gating. The optimization goal of this stage is consequently the minimization of static power consumption for the set of allocated memories. The used optimization model captures the mentioned inputs and optimization goal and is implemented as a mixed-integer quadratic program [18]. After being solved, it yields a low-power mode configuration vector for every function within the set of application profiles. This vector consists of one element per allocated memory, and every element is taken from the set of available low-power modes.

In the final step, the results of both optimization stages are fed back to the firmware application level in a transparent and fully automated way. On the one hand, this includes the generation of a linker script, which reflects the determined memory allocation. On the other hand, several code modifications are applied at the assembly level. These are:

• The placement of application profiles according to the binding, with respect to the memory allocation and the generated linker script.

• The insertion of code snippets for the controlled activation and deactivation of the determined memory low-power modes on relevant transitions between single (code) profiles, based on the result of optimization stage 2.

Finally, the modified assembly and linker script are fed to the assembler and linker of the cross-compilation toolchain again in order to generate the final firmware binary that reflects all optimization results.
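Conceptually, the inserted snippets behave like the following C equivalent. This is a hedged sketch only: the register address and mode encodings are illustrative assumptions, and the actual insertion happens at the assembly level as described above.

/* Illustrative only: register address and mode encodings are assumptions. */
#define MEM_PWR_CFG      (*(volatile unsigned *)0x40000000u)
#define MODE_ACTIVE      0u
#define MODE_LIGHT_SLEEP 1u
#define MODE_DEEP_SLEEP  2u

static void set_mem_modes(unsigned m1_mode, unsigned m2_mode)
{
    /* steer the low-power modes of two memory instances via the
       memory-mapped power-mode configuration register (cf. Section 4.2) */
    MEM_PWR_CFG = (m1_mode << 0) | (m2_mode << 2);
}

Such a call would be inserted, for example, right before a transition from a code profile bound to one memory to a profile bound to another.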

3.3 Timing Simulation

In this section, we first give a detailed explanation of the simulation approach in use as well as the underlying timing model. In addition, we discuss the online timing constraint evaluation that is executed along with the timing simulation. Figure 4 shows an overview of the timing analysis approach we integrated in our VSP, which is based on work proposed in [24]. Since we use a binary-level approach, only the target binary is required for the simulation to work.

Figure 4 Overview of binary-level timing simulation (offline analysis: static timing analysis of the target binary against a timing model, producing the TimingDB; virtual system prototype: functional simulation provides the basic block execution order to timing selection and timing simulation)

The original source code can simplify the specification of timing constraints, which is discussed later in this section. First, the target binary is used to perform a static timing analysis based on a timing model for the target hardware. In this case, it models a platform similar to our VSP. More specifically, the modeled target consists of an ARMv6 microcontroller, memory, and no caches. As this microcontroller has a simple two-stage pipeline and uses static branch prediction (always predicting not taken), we do not model the pipeline explicitly. We rather use an instruction-latency based model, which considers branches to account for penalties due to mispredictions. Hence, in contrast to [24], a context-sensitive timing analysis or simulation is neither necessary nor done in this work.

As a result, the static timing analysis yields a so-called TimingDB or timing database, which is a data structure that allows analysis-agnostic storage of the information required for fast and accurate timing simulation on binary and source level [27], [28]. More specifically, a TimingDB can store context-sensitive basic block timings as well as the information necessary to reconstruct the binary-level control flow. Note that all timings are stored in terms of cycles to allow execution-frequency-independent simulation.

During execution of the target binary, the timing simulation is triggered on each executed basic block and receives the start address of that basic block as well as the last address of the preceding basic block (see Figure 4 - Functional Simulation). Based on the information contained in the previously generated timing database, it is possible to derive the corresponding execution time as well as potentially occurring delays due to branch mispredictions. Using this information, the total execution time is accumulated during functional simulation and reported before returning from sc_main.

An optional step of the timing simulation process is the evaluation of timing constraints. To that end, parts of the data exchange format are designed specifically to model various constraints at different granularity. Note that Section 4.3 contains a more detailed discussion of possible constraints, all of which contain one or more attributes used to describe the program entity to be constrained. Both binary- and source-level entities are supported, the most fine-grained being binary-level basic blocks. The latter are identified by address, while source-level entities are identified by source file and line.


As the timing simulation is done at the binary level, it is necessary to map source-level entities to binary basic blocks. That way, it is possible to evaluate a constraint each time a corresponding basic block is executed. Let us assume we define a constraint to restrict the WCET of a function called handle_interrupt. To be able to evaluate that constraint, we need to measure the simulated execution time between the first and the last executed basic block of that function. Since a binary-level function may have multiple entry and exit points, we instead use each pair of caller and return-to basic block of a function. Hence, the mapping between entities (and therefore constraints) and binary basic blocks also works with complex control flow and run-time calculated call targets. To be able to map information about source-level entities to binary basic blocks, we use a matching algorithm based on [29].

Once all timing constraints can be mapped to binary basic blocks, we create an annotated binary-level control-flow graph. In this CFG, we annotate each basic block with one or more timing constraint references. Thus, each time such a basic block is entered during functional simulation, the simulated timing is recorded and passed to each of the annotated constraints. Internally, each constraint is implemented using a finite-state machine (FSM), which processes consecutive invocations of a constraint.

After simulation, a result file is generated that satisfies the data exchange format and contains evaluation information for all constraints. This file lists each defined constraint, including the actual execution time determined during simulation and a boolean field that signals the violation of a constraint.
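As a sketch of how such a constraint FSM might operate (names and layout are our assumptions, not the framework's actual implementation), consider a point-to-point delay constraint that is advanced on every annotated basic block:

/* Hypothetical point-to-point delay constraint FSM */
typedef struct {
    unsigned           from_bb, to_bb;  /* mapped binary basic blocks   */
    unsigned long long budget;          /* allowed delay in cycles      */
    unsigned long long start;           /* cycle count at activation    */
    int                measuring;       /* FSM state: idle or measuring */
    int                violated;        /* flag for the result file     */
} p2p_constraint_t;

/* invoked whenever an annotated basic block is entered */
static void constraint_step(p2p_constraint_t *c, unsigned bb,
                            unsigned long long now)
{
    if (!c->measuring && bb == c->from_bb) {
        c->start = now;                  /* entity_from reached: start  */
        c->measuring = 1;
    } else if (c->measuring && bb == c->to_bb) {
        if (now - c->start > c->budget)  /* entity_to reached: check    */
            c->violated = 1;
        c->measuring = 0;
    }
}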

4 Demonstration Platform

To demonstrate our proposed workflow, we implemented a SystemC virtual system prototype (VSP) for a microelectromechanical systems (MEMS) based sensor hub, which shows some similarities to a Bosch Sensortec BMF055 [30]. We instrumented it with interfaces for a memory profiler and a timing model to enable timing simulation. The next three subsections present these particular components in more detail.

4.1 Sensor Hub Virtual System Prototype

The SystemC virtual system prototype is shown in Figure 5 and consists of three parts, which are based on components available from the open source SoCLib project [31]:

• A 32 bit ARMv6 processor model with support for the ARM11 and Thumb instruction sets, running at 100 MHz;

• A configurable memory model that serves as system main memory;

• A simplified model of a triaxial MEMS acceleration sensor that provides a three-dimensional acceleration signal with 16 bit resolution per axis, sampled at 1600 Hz.

Figure 5 Virtual prototype of a MEMS sensor hub (ARMv6 core, RAM holding the firmware, and an acceleration sensor with IRQ line, connected via a VCI interconnect; profiler and timing interfaces attach the memory profiler and the timing simulation)

An m-to-n interconnect implementing the TLM 2.0 based virtual component interface (VCI) connects the three components and provides memory-mapped read and write accesses to the firmware executing on the processor. The sensor component has a single interrupt request (IRQ) line that directly connects to the first IRQ input of the processor model. It operates at a constant output rate of 1600 Hz and raises an IRQ once a new sample is available in its data registers. It reads the actual acceleration samples from a given CSV file containing a pre-recorded signal from a real sensor.

The processor model implements an interpreting instruction set simulator for both ARM11 and Thumb instructions. It features classic ARM exception handling and a debugger interface to connect to the GNU Debugger. We extended it with a mechanism to interface with the timing simulation explained in Section 3.3, which traces the taken branches and jumps during program execution within a distinct timing model.

The memory component models a classic static random-access memory (SRAM) of configurable size and structure. Additionally, it is equipped with an interface to interact with the memory profiling mechanism as presented in the next subsection.

4.2 Memory Profiler

The memory profiler consists of two parts: on the one hand, an access measurement unit (AMU) that captures and resolves memory accesses, which are further processed into detailed access statistics; on the other hand, a power state machine (PSM) that is triggered on the occurrence of relevant events. The latter allows the generation of detailed power and energy figures for the memory subsystem and its individual components. According to Figure 5, the profiler is directly attached to the memory model of the virtual system prototype. This is realized via the following basic interface (cf. Listing 2).

Listing 2 Memory profiler interface

unsigned read(unsigned address);
void write(unsigned address,
           unsigned data,
           unsigned mask);

Based on the passed address parameter, the AMU is able to resolve the affected memory unit and application profile. The collection of all measurement points results in the memory access statistics that are used for optimization as discussed in Section 3.2.
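A minimal sketch of such a resolution step follows; it assumes the profiler holds a table of profile address ranges, e.g. built from the linker map, and is not the actual implementation.

/* Hypothetical address resolution inside the AMU */
typedef struct {
    unsigned           base, size;    /* profile address range */
    unsigned long long reads, writes; /* access counters       */
} amu_entry_t;

static amu_entry_t *amu_resolve(amu_entry_t *tab, int n, unsigned address)
{
    for (int i = 0; i < n; ++i)       /* linear scan for brevity */
        if (address - tab[i].base < tab[i].size)  /* unsigned compare also
                                                     rejects address < base */
            return &tab[i];
    return 0;                         /* access to an unmapped region */
}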


Beyond the sampling of accesses, the PSM models the behavior of the memory in terms of energy and power consumption. It is triggered whenever:

• An instruction fetch occurs

• Memory is accessed for a read/write

• The memory-mapped power-mode configuration register of the memory subsystem is modified

Each of the above events leads to a change within the state space as illustrated in Figure 6.

Figure 6 Memory PSM state space (active states: idle and read/write access; low-power states: light sleep and deep sleep)

The implemented memory power model supports two SRAM low-power modes. These are light sleep, based on source biasing, and deep sleep, which adds power gating of the memory periphery on top of the light sleep state. The applied energy and power figures are based on memory simulation with the tool CACTI [32] and follow information as provided by the author of [33]. Overall, the following analysis results are provided by the power state machine when attached to a simulation run of the virtual system prototype (a minimal sketch of the PSM bookkeeping follows the list):

• Energy consumption per memory instance from reading/writing

• Passive power consumption per memory instance (depending on idle mode periods)

• Energy and timing penalty originating from low-power mode changes

• Average power consumption based on the system frequency

• Peak power consumption of the memory subsystem within the simulated period
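As a sketch of how the PSM bookkeeping might work (state names follow Figure 6, while the update scheme and characterization fields are our assumptions): on every event, the static energy for the time spent in the previous state is charged, plus the dynamic cost of the event itself.

/* Hypothetical PSM bookkeeping; the figures would come from the
   CACTI-based characterization data mentioned above. */
typedef enum { ST_IDLE, ST_READ, ST_WRITE,
               ST_LIGHT_SLEEP, ST_DEEP_SLEEP } psm_state_t;

typedef struct {
    psm_state_t        state;
    unsigned long long last_cycle; /* cycle of the previous event   */
    double             energy;     /* accumulated energy            */
    const double      *p_static;   /* static power per state        */
    double             e_access;   /* dynamic energy per read/write */
} psm_t;

static void psm_event(psm_t *m, psm_state_t next, unsigned long long now)
{
    /* static share for the period spent in the current state */
    m->energy += (double)(now - m->last_cycle) * m->p_static[m->state];
    if (next == ST_READ || next == ST_WRITE)
        m->energy += m->e_access;  /* dynamic share of the access */
    m->state = next;
    m->last_cycle = now;
}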

4.3 Timing Simulation

As mentioned in Section 3.3, the timing analysis is responsible for two tasks: first, simulating the timing in cycles according to the underlying timing model, based on the basic block execution order provided by the VSP; second, evaluating optionally defined timing constraints, which allows the generation of fine-grained timing information. To support these tasks, our timing simulation framework is attached to the VSP using a simple interface, as shown in Listing 3. Based on this information, the timing simulation is able to reconstruct the number of simulated cycles during the execution of the VSP.

Listing 3 Interface to timing simulation

uint64_t simulate_bb(addr_t branch_inst,
                     addr_t target_bb);
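For illustration, the VSP side might accumulate cycles roughly as follows; this is a hedged sketch, and the surrounding names as well as the addr_t definition are our assumptions.

#include <stdint.h>
typedef uint32_t addr_t;    /* assumed 32 bit addresses on the ARMv6 target */
uint64_t simulate_bb(addr_t branch_inst, addr_t target_bb); /* cf. Listing 3 */

static uint64_t total_cycles;

/* called by the instruction set simulator whenever control
   transfers to a new basic block */
static void on_basic_block(addr_t branch_inst, addr_t target_bb)
{
    total_cycles += simulate_bb(branch_inst, target_bb);
}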

In addition, the simulated timings are used for an online evaluation of timing constraints. As the methodology behind this process was already discussed in Section 3.3, this section focuses on outlining the possible constraints and how they can be described. There is one basic constraint, namely the point-to-point delay, the definition of which is shown in Listing 4. A point-to-point delay can be used to check the number of executed cycles between two entities of a program. More specifically, these entities may be:

• binary-level basic blocks;

• binary-/source-level functions;

• source-line information;

while source-line information consists of a file name and a line number.

Listing 4 Point-to-point delay constraint

{
    "type": "p2p-delay",
    "entity_from": "entity-id",
    "entity_to": "entity-id",
    "delay": time_value
}

In summary, the timing simulation can provide very fine-grained timing information based on a single run of the virtual system prototype. For example, it is possible to get timing information for each function of a firmware individually. Moreover, a point-to-point delay constraint can also be used to generate timing information for a loop or a sub-loop part of a function.

5 Outlook

In this section, we give another concrete example of the presented methodologies and their interaction with the VSP. On top of that use case, we illustrate how the workflow can be extended to further improve the process of designing and developing embedded sensor firmware.

Figure 7 Extended system design workflow (sensor FW, virtual system prototype providing the basic block execution order and memory accesses, and the analysis, optimization, and generation steps; the feedback path is marked as future work)


Note that this is a hypothetical example, which we did not actually run through our workflow. It is rather meant to motivate possible benefits and use cases for future work based on our approach. The possible extensions to our workflow are highlighted in green in Figure 7. As explained in Section 3, the first step consists of a model-based generation of sensor firmware. More specifically, the software is generated from an actor-based description and contains a power-optimized actor schedule. Thus, we first take a look at the corresponding models that are the input for our generation step. Figure 8a shows an actor-based dataflow model of the example firmware, while Figure 8b shows the corresponding SDF graph that is created during step one of our proposed flow.

Figure 8 Dataflow descriptions of example firmware: (a) dataflow model (Acc, Pickup @ 50 Hz, Wakeup @ 200 Hz, GPIO, IRQ); (b) corresponding SDF graph (Acc, Filter, Pickup, Wakeup, IRQ, GPIO with annotated token rates)

The software represented by these models is a simple pickup detection that may be used in a smartphone to turn on the device's display on such an event. Of course, an application like this would have both power and timing constraints that have to be respected. Especially in the case of a mobile device, the lowest possible power consumption is necessary to prolong battery life. Based on the SDF graph, the generation step creates a cyclic actor schedule that minimizes power consumption. Finally, the actual firmware is generated; the resulting C code is shown in Listing 5.

Listing 5 Example firmware

void main() {
    actor_acc_init();
    actor_filter_init();
    actor_wakeup_init();
    actor_pickup_init();
    actor_irq_init();
    actor_gpio_init();

    while (1) {
        for (int i = 0; i < 3; ++i) {
            actor_acc_exec();
            actor_wakeup_exec();
            actor_irq_exec();
        }
        actor_acc_exec();
        actor_wakeup_exec();
        actor_filter_exec();
        actor_pickup_exec();
        actor_irq_exec();
        actor_gpio_exec();
    }
}
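Note how the schedule reflects the rate ratio of the SDF graph: with Wakeup running at 200 Hz and Pickup at 50 Hz, the acc, wakeup, and irq actors fire four times per period (three times in the for loop plus once after it), while filter, pickup, and gpio fire only once.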

Next, the firmware is compiled and executed on the VSP, as indicated by Figure 7. During this first simulation run of the virtual system prototype, both timing and power analyses are run. These analyses may show that all constraints are already met, but there might still be potential to reduce power consumption. Thus, the next step, memory subsystem power optimization, is executed. As a result, most of the functions shown in Listing 5 are moved to a smaller, more efficient memory. Moreover, the optimization inserts instructions that implement a power-optimized handling of memory power modes. Thus, each time a certain memory is not needed, it can be put into a reduced power state.

What follows is another iteration of the analysis step to check whether the constraints are still met. Again, it is possible that this is the case, and at this point the design workflow would end with a valid sensor firmware that considers all non-functional requirements. Nevertheless, it is also possible that the previously inserted instructions to handle memory power modes introduce a run-time overhead that causes a violation of timing constraints. In that case, the information provided by the analyses would be fed back into the optimization step (cf. Figure 7). Assume, for example, that the optimization assigned the functions actor_acc_exec and actor_wakeup_exec to different memories. Hence, additional instructions for power-mode changes would be executed when transitioning between those functions. Using the per-function timings as provided by the timing simulation, a system designer could then decide to merge these two functions. This way, the optimization could be re-run while making sure that the overhead introduced by placing actor_acc_exec and actor_wakeup_exec into different memories is eliminated.

Subsequent iterations could allow further design decisions that improve the firmware or lead to a final, power-optimized version of it that also adheres to the specified timing constraints. As already mentioned, the extended workflow described in this section is future work, which should focus on automating the feedback loop briefly outlined in this example.

6 Results and Evaluation

The following evaluation of the presented system design workflow serves a twofold purpose. On the one hand, it proves the functionality of the automated flow with its components (cf. Section 3) as well as its correct interaction with the virtual system prototype (VSP) (cf. Section 4). On the other hand, the resulting analysis figures from different development stages and for individual parts of the design demonstrate the strength of this combined approach and moreover reveal several indicators for a step-wise improvement of a design at hand (cf. Section 5).

Table 1 General benchmark application details

benchmark        description                LOC   size [KB]
bitcount         bit manipulation tests      780        7.9
crc32            cyclic redundancy check     511      311.7
dijkstra         shortest path problem       749      281.1
qsort            quick sort algorithm        549      203.7
rijndael         AES encryption test        1515      340.8
sha              160-bit hash generation     726      320.3
stationary_tap   gesture detection demo     1205        6.6


Table 2 Results of simulation and analysis (baseline)

benchmark        simulated    execution   total        average      peak
                 cycles [#]   time [ms]   energy [µJ]  power [µW]   power [µW]
bitcount          82928064       829.2        586.3        707.0        966.9
crc32              7484813        74.8        930.2     12,428.9     19,884.8
dijkstra         134494985     1,344.9     13,708.0     10,192.2     19,884.8
qsort             13063895       130.6        724.7      5,547.9     10,754.6
rijndael         140125582     1,401.2     13,262.3      9,464.6     19,884.8
sha               63749858       637.4      5,083.3      7,973.8     19,884.8
stationary_tap    36971921       369.7        188.4        509.6        966.9

Table 3 Results of simulation and analysis (after optimization)

benchmark        penalty      total        execution   penalty       total         average      peak
                 cycles [#]   cycles [#]   time [ms]   energy [nJ]   energy [µJ]   power [µW]   power [µW]
bitcount                 48    120504265      1,205.0    23.5×10⁻³         356.0        295.4        535.3
crc32                     9      7485063         74.8     7.6×10⁻³         930.1     12,426.5     19,885.7
dijkstra                381    134532457      1,345.3          9.0       3,371.4      2,506.0     10,767.7
qsort                     5     18367911        183.6          0.2         671.8      3,657.5     10,755.8
rijndael              58474    141568081      1,415.6      3,530.2       2,826.7      1,996.7     19,890.7
sha                     164     63752902        637.5          7.4         856.2      1,343.0     19,888.8
stationary_tap         3003     36991685        369.9          1.6          61.3        165.8        535.7

All experiments have been carried out for a fixed system frequency of 100 MHz and using the following tools for simulation, cross-compilation, optimization, and code generation: g++ (GCC) 5.4.1, SystemC 2.3.1 with TLM 2.0.3, arm-none-eabi-gcc (GCC) 8.2.0, GNU Binutils 2.27.51, LLVM version 7.0.0, AMPL version 20111121, and Gurobi 8.1.0. We compare a real-world gesture detection sensor firmware example (stationary_tap) to different exemplary benchmarks of the MiBench suite [34]. All considered examples are listed in Table 1 along with a basic description, the extent of the firmware in lines of code (LOC), and the required memory space after compilation with optimization level -Os. Please note that even though the focus in the following discussion is put on total values, the provided analysis results are not limited to that. That means a more fine-grained evaluation in terms of timing and power is possible, even down to the level of single application profiles and memory instances.

Based on the firmware and the resulting binary, as possibly obtained from model-based generation, a baseline simulation and analysis round is carried out on the VSP first. The corresponding results are listed in Table 2. The timing simulation yields simulated cycles and accurate execution time values, whereas the memory power state machine provides energy and power figures characterizing the memory subsystem. Already striking in this table is the considerable difference in peak power between the benchmarks.

Table 4 Memory subsystem optimization statistics

benchmark        memory   time      energy    peak power
                 count    penalty   savings   reduction
bitcount            7     45.31%    39.28%     44.63%
crc32               4      0.00%     0.02%      0.00%
dijkstra            5      0.03%    75.41%     45.85%
qsort               4     40.60%     7.31%     -0.01%
rijndael            5      1.03%    78.69%     -0.03%
sha                 5      0.00%    83.16%     -0.02%
stationary_tap      5      0.05%    67.45%     44.59%

This difference originates from the size of the memory instance that is attached to the VSP: bitcount and stationary_tap get by with a memory block of 8K, while all other applications require 256K or more. As a consequence, and in case of a peak power constraint violation, it can be worthwhile to investigate the possibilities of reducing the firmware memory footprint already at this development stage.

Based on the application binary and the baseline analysis results, the memory subsystem optimization is carried out (cf. Section 3.2). For the stationary_tap demo, this optimization step yields a memory allocation consisting of 5 memories (cf. Table 4) and the corresponding binding of all code and data blocks to these memories. Further, all memories are assumed to support an active operation mode (A) and two low-power modes, light sleep (L) and deep sleep (D). The corresponding low-power mode schedule for stationary_tap is given in Listing 6.

Listing 6 Low-power mode schedule for stationary_tap

M1 M2 M3 M4 M5
A  L  A  D  D   __divsi3
A  L  A  D  D   __aeabi_idivmod
A  A  D  D  D   main
A  L  A  D  D   printf
A  L  A  D  D   stap_do_step
A  L  D  A  D   stap_initialize

All these results are finally fed back to the firmware assembly level and recompiled with a modified linker script in order to obtain a binary that reflects all optimization aspects. Based on that, a second simulation and analysis round on the VSP is executed, which yields the figures in Table 3 and Table 4. For the stationary_tap sensor application, a considerable energy consumption and peak power reduction is achieved at only a minimal timing penalty.

Altogether, it is clearly visible that energy savings as well as peak power reduction go hand in hand with a timing penalty, which is caused by the additionally inserted code. In comparison to the baseline firmware, these code snippets are used for steering the memory low-power mode configuration. In some cases, the resulting penalty is disproportionately high, e.g. for the bitcount and qsort benchmarks (cf. Table 4).


This is striking because the corresponding number of penalty cycles due to actual mode changes is extremely low. A follow-up investigation of these examples reveals that both applications make heavy use of pointer-based function calls, which are not traceable by the post-optimization code generation step. This feedback information can be used to adjust the source code accordingly, so that unneeded power mode management code can be avoided. Thereby, an improvement towards considerable energy savings at a negligible time penalty, as given for example for the dijkstra or stationary_tap demo, becomes possible.

Another noticeable figure is the high number of penalty cycles accumulated for the rijndael encryption benchmark. A closer co-investigation of the optimized memory binding and low-power mode schedule reveals that two highly active but related functions have been separated by the optimization, a case comparable to the example presented in Section 5. One option to reduce this penalty is to feed this information back to the firmware model, where a combination of these nodes will avoid the later separation. This way, the number of penalty cycles can be reduced, possibly in exchange for a decrease in energy savings. This trade-off can easily be evaluated by iterative workflow executions.

In summary, these exemplary use cases show how the set of information provided by our combined workflow can be used in a profitable way. Especially in the presence of timing and peak power constraints, this iterative approach is clearly a highly interesting option that allows a convenient exploration of the design space in the form of different trade-offs.

7 Conclusion

Motivated by the combined evaluation of different aspects in a top-down embedded system design workflow, this paper unites different contributions from the fields of model-based generation, memory optimization, and accurate timing simulation. The individual methods have been integrated with a virtual system prototype of an ARM-based sensor hub, which allows automated simulation, analysis, and optimization. Experiments for both a real-world application and representative benchmarks prove the functionality and strength of this approach. Beyond the individual potential of the single workflow parts, it is the combination of analysis figures from different perspectives that allows new conclusions to be drawn. Especially in the presence of strict constraints, e.g. in terms of power or timing, such information is of high interest to the system designer. The continuation of this work will put the focus on the automation of such a design feedback loop for the repetitive and step-wise improvement of a system design at hand.

Acknowledgment

This contribution is funded as part of the CONFIRM project (project labels 16ES0567, 16ES0568 and 16ES0569) within the research program ICT 2020 by the German Federal Ministry of Education and Research (BMBF) and supported by the industrial partners Infineon Technologies AG, Robert Bosch GmbH, Intel Deutschland AG, and Mentor Graphics GmbH.

References

[1] J.-P. Wolff, S. Stieber, T. Rankl, and R. Dorsch, “Improving always-on gesture recognition power efficiency for android devices using sensor hubs,” in 2016 IEEE Intl. Conference on Computational Science and Engineering (CSE), IEEE Intl. Conference on Embedded and Ubiquitous Computing (EUC), and 15th Intl. Symposium on Distributed Computing and Applications for Business Engineering (DCABES). IEEE, 2016, pp. 64–67.

[2] E. A. Lee and D. G. Messerschmitt, “Synchronous data flow,” Proceedings of the IEEE, vol. 75, no. 9, pp. 1235–1245, 1987.

[3] G. Bilsen, M. Engels, R. Lauwereins, and J. Peperstraete, “Cycle-static dataflow,” IEEE Transactions on Signal Processing, vol. 44, no. 2, pp. 397–408, 1996.

[4] B. D. Theelen, M. C. Geilen, S. Stuijk, S. V. Gheorghita, T. Basten, J. P. Voeten, and A. H. Ghamarian, “Scenario-aware dataflow,” Technical Report ESR-2008-08, 2008.

[5] F. Grützmacher, B. Beichler, C. Haubelt, and B. Theelen, “Dataflow-based modeling and performance analysis for online gesture recognition,” in 2016 2nd International Workshop on Modelling, Analysis, and Control of Complex CPS (CPS Data). IEEE, 2016, pp. 1–8.

[6] A. H. Ghamarian, S. Stuijk, T. Basten, M. Geilen, and B. D. Theelen, “Latency minimization for synchronous data flow graphs,” in 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007). IEEE, 2007, pp. 189–196.

[7] M. Geilen, T. Basten, and S. Stuijk, “Minimising buffer requirements of synchronous dataflow graphs with model checking,” in Proceedings of the 42nd Design Automation Conference. IEEE, 2005, pp. 819–824.

[8] E. A. Lee and D. G. Messerschmitt, “Static scheduling of synchronous data flow programs for digital signal processing,” IEEE Transactions on Computers, vol. 100, no. 1, pp. 24–35, 1987.

[9] J. Davis II, M. Goel, C. Hylands, B. Kienhuis, E. A. Lee, J. Liu, X. Liu, L. Muliadi, S. Neuendorffer, J. Reekie et al., “Overview of the Ptolemy project,” ERL Technical Report UCB/ERL, Tech. Rep., 1999.

[10] J. L. Pino, S. Ha, E. A. Lee, and J. T. Buck, “Software synthesis for DSP using Ptolemy,” Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, vol. 9, no. 1-2, pp. 7–21, 1995.

[11] M. C. Williamson and E. A. Lee, “Synthesis of parallel hardware implementations from synchronous dataflow graph specifications,” in Conference Record of the Thirtieth Asilomar Conference on Signals, Systems and Computers. IEEE, 1996, pp. 1340–1343.


[12] F. Menichelli and M. Olivieri, “Static minimization of total energy consumption in memory subsystem for scratchpad-based systems-on-chips,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 2, pp. 161–171, 2009.

[13] F. Angiolini, L. Benini, and A. Caprara, “An efficient profile-based algorithm for scratchpad memory partitioning,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 11, pp. 1660–1676, 2005.

[14] L. Benini, A. Macii, and M. Poncino, “A recursive algorithm for low-power memory partitioning,” in Proc. of the 2000 International Symposium on Low Power Electronics and Design (ISLPED’00), 2000, pp. 78–83.

[15] S. Srinivasan, F. Angiolini, M. Ruggiero, L. Benini, and N. Vijaykrishnan, “Simultaneous memory and bus partitioning for SoC architectures,” in Proc. of the 2005 IEEE International SOC Conference. IEEE, 2005, pp. 125–128.

[16] M. Strobel, M. Eggenberger, and M. Radetzki, “Low power memory allocation and mapping for area-constrained systems-on-chips,” EURASIP Journal on Embedded Systems, vol. 2017, no. 1, 2016.

[17] L. Steinfeld, M. Ritt, F. Silveira, and L. Carro, Low-Power Processors Require Effective Memory Partitioning. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 73–81. [Online]. Available: https://doi.org/10.1007/978-3-642-38853-8_7

[18] M. Strobel and M. Radetzki, “Design-time optimization techniques for low-power embedded memory subsystems,” in Proc. of the 1st International Workshop on Embedded Software for Industrial IOT (ESIIT), 2018.

[19] D. Mueller-Gritschneder, K. Lu, and U. Schlichtmann, “Control-flow-driven source level timing annotation for embedded software models on transaction level,” in 2011 14th Euromicro Conference on Digital System Design. IEEE, Aug. 2011, pp. 600–607.

[20] S. Schulz and O. Bringmann, “Accelerating source-level timing simulation,” in Proceedings of the 2016 Conference on Design, Automation & Test in Europe, ser. DATE ’16. San Jose, CA, USA: EDA Consortium, 2016, pp. 1574–1579.

[21] J. Benz, C. Gerum, and O. Bringmann, “Advancing source-level timing simulation using loop acceleration,” in 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE). Dresden, Germany: IEEE, Mar. 2018, pp. 1393–1398.

[22] X. Zheng, L. K. John, and A. Gerstlauer, “Accurate phase-level cross-platform power and performance estimation,” in Proceedings of the 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), 2016, pp. 1–6.

[23] X. Zheng, H. Vikalo, S. Song, L. K. John, and A. Gerstlauer, “Sampling-based binary-level cross-platform performance estimation,” in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, pp. 1709–1714.

[24] S. Ottlik, S. Stattelmann, A. Viehl, W. Rosenstiel, and O. Bringmann, “Context-sensitive timing simulation of binary embedded software,” in Proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 2014, pp. 14:1–14:10.

[25] Z. Wang and J. Henkel, “HyCoS,” in Proceedings of the Eighth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS ’12), 2012, p. 133.

[26] T. Meyerowitz, A. Sangiovanni-Vincentelli, M. Sauermann, and D. Langen, “Source-level timing annotation and simulation for a heterogeneous multiprocessor,” 2008, pp. 276–279.

[27] S. Ottlik, J. M. Borrmann, S. Asbach, A. Viehl, W. Rosenstiel, and O. Bringmann, “Trace-based context-sensitive timing simulation considering execution path variations,” in 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC). Macao, Macao: IEEE, Jan. 2016, pp. 159–165. [Online]. Available: http://ieeexplore.ieee.org/document/7428005/

[28] S. Ottlik, C. Gerum, A. Viehl, W. Rosenstiel, and O. Bringmann, “Context-sensitive timing automata for fast source level simulation,” in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017. Lausanne, Switzerland: IEEE, Mar. 2017, pp. 512–517. [Online]. Available: http://ieeexplore.ieee.org/document/7927042/

[29] S. Stattelmann, O. Bringmann, and W. Rosenstiel, “Fast and accurate source-level simulation of software timing considering complex code optimizations,” in Proceedings of the 48th Design Automation Conference, 2011, pp. 486–491.

[30] Bosch Sensortec GmbH, “BMF055 Application Note,” Tech. Rep., 2016. [Online]. Available: https://ae-bst.resource.bosch.com/media/_tech/media/datasheets/BST-BMF055-DS000.pdf

[31] SoCLib Consortium et al., “The SoCLib project: An integrated system-on-chip modelling and simulation platform,” CNRS, Tech. Rep., 2003, last visited 01/14/2019. [Online]. Available: http://www.soclib.fr

[32] N. Muralimanohar, R. Balasubramonian, and N. P. Jouppi, “CACTI 6.0: A tool to model large caches,” HP Laboratories, Tech. Rep. HPL-2009-85, 2009, last visited 01/10/2019. [Online]. Available: http://www.hpl.hp.com/techreports/2009/HPL-2009-85.pdf

[33] L. Minwell, “Advanced power management in embedded memory subsystems,” 2011, last visited 01/20/2019. [Online]. Available: https://www.design-reuse.com/articles/26402/power-management-in-embedded-memory-subsystems.html

[34] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown, “MiBench: A free, commercially representative embedded benchmark suite,” in Proc. of the 2001 IEEE International Workshop on Workload Characterization. IEEE, 2001.


SEMAS – System Engineering Methodology for Automated Systems | The world described in layers

Markus Hedderich, Markus Heimberger, Axel Klekamp, Valeo Schalter und Sensoren GmbH, 74354 Bietigheim-Bissingen, Germany, [email protected]

Abstract

The development and provision of functions and systems in the context of "autonomous driving" is highly complex. With each new generation and with an increasing SAE level, more and more concepts and solutions are required. As a result of the technical and functional complexity and the abundance of regulations and standards, an ever-increasing degree of integration between the end customer, the vehicle manufacturers, and their suppliers becomes necessary. Also, the development teams within these organizations have reached a size that requires a dedicated documentation and communication structure. Automotive SPICE essentially provides an applicable and efficient process description, but leaves open the concrete methodology for creating the content. The methodology presented here describes in detail how the function and performance of "Automated Driving Systems" can be analyzed in a model-based way and used to derive architectures. The methodology pursues a holistic approach that covers the development process from end-customer requirements to implementation in hardware and software. This is done in a SysML/UML modeling environment in conjunction with a requirement management tool, in which the content is then linked to the ASPICE processes.

1 Introduction

The methodology presented here focuses on the technical and functional aspects. Regardless of who is responsible for which parts of the overall system, abstraction layers are introduced, which allow a purposeful consideration of all aspects relevant to the development. These are aspects such as the function, the associated demands on performance and reliability, and the consequent limitations and design decisions.

As in Automotive SPICE, alternating requirements and architectures are the basic tools of the Layer Model. Each level comes with a process that fully describes the processes, interactions, and thus the functions of the layer, structurally dividing and specifying the layer into individual elements. For each of these elements, characteristics and requirements are defined that describe the expected response to a scenario occurring at the level. These requirements are a specification for the suppliers of the elements at the level below. This process is continuous and can be repeated as often as the complexity of the overall system or the number of links in the supply chain requires.

After this introduction, a generic layer model is introduced in chapter 2, which includes layers, elements, roles, and relationships. From a given example, the SEMAS layer model is extracted and described in chapter 3. In chapter 3.1, the fundamental goals of the SEMAS layers are specified, and in chapter 3.2 the use of the model-based approach is outlined. Chapters 3.3 to 3.6 describe the 4 layers of the SEMAS layer model. Although SEMAS addresses 4 aspects/objectives of system engineering (listed in List 2), this document only covers the functional and structural aspects. Further reading will be provided in other publications currently in preparation.

It is also important to note that the modeling approach presented here is only one half of the methodology. The requirements and all of their content remain in the requirement management tool chain and are not imported and processed on the SysML/UML side. Everything relevant to design and architecture decisions and consequences has its origin inside the SysML/UML model and is transferred as structural information (headings, links, chapter structure) to the requirements management tool chain. Therefore, no requirement diagrams are found in this approach, as all requirement-related content remains on the requirement management tool side.

2 The generic layer model

The generic layer model in Figure 1 basically consists of levels that are hierarchically arranged one below the other. Each level is assigned one or more elements. Each level contains overall the same functionality, but with a different level of detail. The deeper you descend into the levels, the more detailed and finely granular the structure and functionality is described. The top level consists of only one element A, to which several elements of the underlying level are assigned. In turn, all elements of the second level are assigned one or more elements of the next level. This process can be repeated over any number of levels, depending on how complex the system to be described is.


Figure 1 The generic layer model

In addition to the levels and the elements, there are also participants who are in different relationships to the elements and commit, in different roles, to certain deliveries to the elements (Figure 2).

Figure 2 The roles of a participant

Each element has an owner, who has the responsibility to divide his element into sub-elements so as to break down the complexity of his element and delegate it to subcontractors. In this way, the owner enters into a stakeholder relationship with each of the aggregated elements and undertakes to formulate requirements with regard to the sub-elements. At the same time, the owners of the sub-elements take on the role of the supplier in relation to the main element, which is connected with the obligation to define restrictions. In addition, the participant enters into an indirect obligation to use the requirements of its parent as application scenarios. This results in the situation that all participants of the non-highest and non-lowest levels have to take all 3 roles in the system towards different elements. It remains to be noted that this ensures that, for all elements of the "middle levels", all 4 obligations are met, as shown in Figure 3. The four commitments are the creation of requirements for the element, the consideration and analysis of the scenarios from the higher level, the creation of a design for the decomposition of function and structure, and the creation of restrictions towards the stakeholder.

Figure 3 The Layer Element and its relations to the roles

The elements have a specific meaning for the participant through their relationship. The element of the Stakeholder is the Context in which his element interacts and must fulfill the formulated requirements. The Owner's element is the System whose creation and design is being worked on. The elements of the Suppliers are System Elements from which the owner builds his system.

Figure 4 The participant and his relations and views to the Layer Elements

The relationships shown in Figure 4 are valid for all non-top and non-bottom-level elements and participants. Looking at the generic layer model from Figure 1 from the perspective of the owner of the element C5, taking into account the relationships in Figure 2, results in a specific view of the layer model (Figure 5).

Figure 5 The layer model from the perspective of the Owner of C5

For a participant, not only the color-coded elements are important, but also all elements with a direct neighborhood relationship to the colored elements. Since function always means interaction between the element in scope and the environment, a consideration of an element from a functional point of view makes sense only within specific contexts. In the example given here, the relevant elements of the layer model would thus be as follows:


Figure 6 The relevant elements from the perspective of the Owner of C5

3 The SEMAS layer model

SEMAS uses a concrete interpretation of the generic layer model. It consists of the 4 levels of the example shown in Figure 6. The representation of the layers is changed: where the generic layer model (Figure 3) uses only elements assigned to layers, the view on SEMAS is expanded. First, the elements aggregated to a parent element are placed onto an expanded representation of the parent element, as shown in Figure 7.

Figure 7 Transformation of layers between the generic and the SEMAS layer model

As the SEMAS model is specifically tailored from the perspective of the Automation System supplier, the generic layers and elements get a concrete meaning and a specific scope, as systematically shown in Figure 8.

Figure 8 Concrete scope of layers and elements in SEMAS

The SEMAS layer model extends the concept of the generic layer model elements (A) by dividing them into requirements, architecture, and constraints, as visualized in Figure 9.

Figure 9 Extension of the generic layer element to Requirements, Constraints and Design

Dependencies in the form of horizontal requirements or constraints are not considered. Any dependency between two elements of the same layer is enforced to be taken into account as a constraint on the upper element. By this, the visibility of side effects and limitations brought into the overall system by any element has to be considered in the design of the upper-lying element. This mechanism, under the responsibility of the owners, leads to the need to resolve any inconsistency or unrealistic targets with the stakeholder on the next higher layer, instead of just identifying and declaring horizontal dependencies or implementation constraints, which are often brought in by software or hardware at the implementation level.

In addition, the gray elements in Figure 6, which were introduced via their neighborhood relationships, are treated as external actors, since only the externally observable behavior towards the User & Stakeholder Scope is important.

The complete SEMAS layer model structure is shown in Figure 10.

This results in a requirement analysis perspective for the top two levels, but requires a design to allow for a complete and accurate analysis. In the following 4 chapters, all 4 levels are presented with their specific meaning for the Tier 1.

1. The Real World layer describes the behavior of the automated OEM vehicle in interaction with its surroundings and the driver.

2. The OEM Vehicle layer describes the interaction of the automation system with the components installed in the vehicle and the requirements for perception and interpretation of the environment.

3. The Automation System layer describes the interaction between the elements contained in the Automation System.

4. The System Elements layer describes the internal structure of the sub-components of each System Element.

List 1 The Layers of the SEMAS Layer Model


Figure 10 The SEMAS Layer Model

3.1 Fundamental objectives of the SEMAS layers

The fundamental objective of each level is to break down the complexity of the system:

• functionally into sub-functions and features

• performance-related into sub-KPIs

• structurally into sub-elements

• temporally into sub-processes

List 2 Fundamental objectives of SEMAS layers

For each objective, SEMAS provides a dedicated process for each layer. It is the central goal to perform these 4 processes in a consistent way, so that function and structure are clearly related. This means that sub-functions and features can always be assigned to separate sub-elements. Thus, all 4 processes go hand in hand through the levels and can be brought into a clear context at all times. The processes provide input and output connectors in such a way that bi-directional traceability is generated automatically, as the outgoing links at the end of a layer are the required inputs of the next layer.

3.2 Model-based Analysis

All 4 objectives from List 2 are addressed in SEMAS with dedicated methodologies in a centralized engineering model. This document is dedicated to the processes of functional and structural decomposition. KPI decomposition and temporal decomposition will be presented in the near future. For the functional decomposition, SEMAS uses a scheme [1] that subdivides end-user functions into scenarios. These scenarios are modeled and developed in hierarchical UML use case diagrams, where all relevant actors and use cases are identified and specified. In a further step, the features for the Automation System are extracted from the scenarios and broken down by functional decomposition to a granularity level where they can be assigned to the sub-elements of the Automation System. For this, SEMAS uses the method according to SysCARS [3]. The structural decomposition takes place through logical architectures, which hierarchically carry out a structural decomposition of the relevant elements from level to level. For this purpose, SEMAS provides a design pattern for all levels that was derived from the experience of a Tier 1 platform that has been in production for many years. In SEMAS, all four processes from List 2 are strongly interlinked, so that features of the functional decomposition are necessarily assigned to elements of the logical architecture. If the logical architecture does not provide a suitable "infrastructure" for the features, the necessary adaptations in the logical architecture are made. It is important that such adjustments are always made within the given design pattern.

3.3 Real World Layer

The Real World layer is the highest layer in SEMAS (Figure 10) and represents the activities in the real world, in our case the externally observable behavior of the OEM vehicle on public roads or private grounds. The core is the interaction of the vehicle with its environment: the response to an occurring event within given time intervals in a clearly specified manner. The following aspects must be considered:

• Conformity to applicable laws and standards

• Quality and availability of functions

• Requirements for performance and reliability

• Functional safety

• Response times to critical events

• Consistent interaction with the driver

List 3 The most important factors of the Real World

The Real World in particular is highly complex. A vehicle that moves in public space is confronted with countless situations and environments. It is impossible, and not necessary, to create a "complete and comprehensive" description of reality. Rather, it is necessary to identify all the scenarios relevant to a function and to specify them from the specific perspective of the OEM vehicle (the second layer in Figure 10).

In order to fulfill the criteria of its most important factors, these scenario definitions have to:

• For completeness, contain all relevant functional aspects

• Describe precisely the required behavior of the OEM vehicle in interaction with its environment

• Describe precisely the physical, local, and temporal events in the environment

List 4 The goals of the Real World layer

The basis of the behavioral analysis is the logical architecture pattern. It describes a functional cycle of the OEM vehicle, its physical behavior (inertia, etc.), the road surface, and the environment including the driver. Additional sources of information (cloud & GPS) also have an influence on vehicle behavior and are taken into account.


3.3.1 Logical Architecture

The logical architecture pattern (Figure 11) is a central component of the methodology and is seen as the starting point of the Real World layer modeling. Any enhancements due to the advancing development of new functions and scenarios should be easy to insert. At the heart of the Real World level is the OEM vehicle, here called scope, which interacts with its environment via several logical interfaces. Apart from the two sources of information "Cloud" and "GPS", there are 2 significant working cycles that directly influence the real-time behavior of the vehicle: on the one hand, the loop of driver and OEM vehicle, and on the other, the loop of the OEM vehicle and its environment. In order to separate intrinsic and extrinsic properties of the vehicle, the physical characteristics of the vehicle are represented as an external element.

Figure 11 Logical Architecture Pattern of the Real World Layer

3.3.2 Functional Decomposition

At the Real World layer (Figure 10), there is no parent Stakeholder who formulates requirements towards the layer. The functionality must be extracted from an analysis. Proof of completeness is extremely difficult, as there is no superior reference. Orientation is provided by the automotive market, especially the segments "Active Safety" and "Autonomous Driving". General definitions of driving functions have been established, and both [6] and [7] define terms and basic descriptions of these functions. The modeling of functions on the Real World layer is done in line with the ISO 26262 "Functional Hierarchy" [2], with a hierarchical, level-based process that identifies the end-user functions at the top level and then, in more and more detailed case distinctions, increases coverage and detail. In doing so, first functional limits are also set, such as in which speed range of the OEM vehicle which function should be available, or up to which distance a vehicle should be followed. From these definitions it is possible to derive both important factors relating to functional safety and the definition of functional limits, as in chapter 3.3.4.

In the first step, the total scope of the functionality of the

Real World level is modeled as the sum of End-User

Functions (Figure 12).

Figure 12 Top-most Level of the Function Tree

At the second level (Figure 13), the End-User Functions are divided into scenario catalogs. These scenario catalogs are use cases of end-customer features [1]. This level represents the main and standard use cases, which occur often and represent the actual core of the function. Looking at the end-customer functions and their use cases, it can be seen that the behavior of the vehicle does not fundamentally differ in many scenarios of different functions. There are several functions that control, for example, the vehicle laterally or longitudinally. Even within concrete scenarios, there are huge similarities. For this reason, the second level identifies the shared and exclusive scenarios and provides them as reusable scenario catalogs to the functions above.

Figure 13 End-User Functions and their Scenario Catalogs

Each scenario catalog includes a number of concrete use cases of a particular type, for example how to control the speed and acceleration of the vehicle when following a preceding vehicle. Many functions are basically similar in a scenario, but differ in detail or in the set performance, speed, precision or availability limits. In this case, it is still desirable to reuse the scenario between multiple functions, but to manage the differences through variant management, so that a scenario catalog can represent different occurrences across different versions. Two possible variants of the same scenario catalog are visualized in Figure 14, where the only difference is Scenario Catalog 1b v1 vs. v2. The differences are in many cases related to operation ranges or quality metrics, like speed ranges or distances to be covered. Often these variants identify the functional delta between an entry product and a high-performance product, which both act in the same Operation Domain but provide a different quality level of functionality.
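How such catalog variants can be managed is illustrated by the following minimal Python sketch; the class, the catalog name and the parameter values are invented for illustration and are not part of SEMAS:

from dataclasses import dataclass, field

@dataclass
class ScenarioCatalog:
    # A reusable scenario catalog; a variant overrides selected parameters
    # while sharing all scenarios with its base version.
    name: str
    version: str
    parameters: dict = field(default_factory=dict)

    def variant(self, version, **overrides):
        return ScenarioCatalog(self.name, version, {**self.parameters, **overrides})

# Shared catalog "follow a preceding vehicle" (hypothetical limits).
follow_v1 = ScenarioCatalog("FollowPrecedingVehicle", "v1",
                            {"speed_range_kmh": (0, 130), "max_distance_m": 120})

# Entry product vs. high-performance product in the same Operation Domain:
# only the operation ranges differ between the variants.
follow_v2 = follow_v1.variant("v2", speed_range_kmh=(0, 210), max_distance_m=200)

print(follow_v1)
print(follow_v2)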

On the third level, the scenario catalogs are broken down into scenarios or smaller scenario catalogs. In particular, the differences between the catalog variants are defined, and rarer scenarios and borderline cases are worked out. At this level, the different variants can be represented as different combinations of scenarios.

Figure 14 Scenario Catalog Variants

It should be noted that not all scenario catalogs are used multiple times. There are also exclusive scenarios for each function, for example the activation logic of the function or scenarios describing the interaction with the driver, in which the driver perceives the functions as independent services.

The third and lowest level represents the chapter structure of the requirement management document of the Real World layer. There, the functional and non-functional requirements are formulated for each lowest-level scenario catalog, as marked in Figure 15.

Figure 15 Lowest Layer of the Scenario Catalogs and Chapter Structure of the Requirements Document

By continuously refining the scenarios and designing the actors as described in the next chapter, the scenarios become tangible, and with growing coverage of the actors, potential risk scenarios become apparent and are added to the catalogs.

By synchronizing the scenario catalogs with the requirements, it is ensured that, with continuous analysis, the Real World layer encompasses the relevant scope.

3.3.3 Precise Description of Behavior

The first question is which external actors trigger activities: which influences exist in the environment, and which interaction is required, in detail, at which point in time? The actors are identified directly in the already created scenario catalogs. For each scenario catalog on the second-lowest level (Figure 16), a UseCase diagram is created that shows the actors for exactly this catalog.


Figure 16 UML UseCase Diagram of Scenario Catalog 2 v1 of Figure 14

All actors must be created as an instance of one of the gray blocks of Figure 11. Excluded is the OEM vehicle, which is the central element of the scenarios and the scope of the requirements description. This ensures that all scenarios and their elements follow the same structure and that at the next level all System Features (Figure 20) in chapter 3.4.2 can be merged.

When creating the UseCase diagrams for the scenario catalogs, the actors are taken from a common library. By reusing the actors between all UseCases, we ensure that the architecture on the next layer identifies all external interfaces. This includes other road users, infrastructure elements and the driver of the OEM vehicle including occupants if relevant (seat belt, ...). Interactions directly between the actors are not shown in this document, but are of critical importance in many more complex scenarios or contribute to the basic understanding of the scenario.
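The role of the common actor library can be pictured with a small Python sketch; all actor, block and catalog names below are invented, and SEMAS itself maintains this library inside the SysML model:

# Every actor used in any UseCase diagram must instantiate one of the
# logical architecture pattern blocks (hypothetical names).
PATTERN_BLOCKS = {"Driver", "RoadUser", "Infrastructure", "Environment"}

ACTOR_LIBRARY = {                     # actor -> pattern block it instantiates
    "Driver": "Driver",
    "PrecedingVehicle": "RoadUser",
    "Pedestrian": "RoadUser",
    "TrafficLight": "Infrastructure",
}

use_cases = {                         # scenario catalog -> actors from the library
    "FollowPrecedingVehicle v1": ["Driver", "PrecedingVehicle"],
    "PedestrianEmergencyBrake v1": ["Pedestrian", "Driver"],
}

for catalog, actors in use_cases.items():
    for actor in actors:
        assert ACTOR_LIBRARY[actor] in PATTERN_BLOCKS   # fails for unknown actors
    print(catalog, "->", sorted(set(actors)))

# The union of all actors used anywhere is the set of external interfaces
# that the architecture on the next layer has to provide.
print("external interfaces:", sorted({a for actors in use_cases.values() for a in actors}))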

3.3.4 Determining Temporal and Local Limits

In determining the limits of dynamic systems, there is a direct coupling of temporal and local boundaries. A vehicle moving at higher speed has a longer braking distance, and with a longer braking distance, the sensor system of an automatic braking system must have a higher detection range in order to be able to initiate braking early enough. When determining the applicable limits of this example, the maximum possible intrinsic velocity and the maximum detection distance of the sensor are only two variables of the problem. The mass of the vehicle, and thus its momentum, the surface conditions of the road and many other factors have a direct impact on the behavior of the vehicle. In order to be able to dynamically change values that have already been predicted and are refined iteratively, a methodology for modeling these processes is needed. This process is part of SEMAS, but is not covered in this document.
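Nevertheless, the basic coupling can be made tangible with textbook point-mass kinematics. The following Python sketch estimates the required detection range from the vehicle speed, an assumed reaction time and an assumed friction coefficient; the numeric values are illustrative assumptions, not SEMAS figures:

G = 9.81        # gravitational acceleration in m/s^2
MU = 0.7        # assumed tire/road friction coefficient (dry asphalt)
T_REACT = 0.3   # assumed system reaction time in s

def required_detection_range_m(speed_kmh):
    v = speed_kmh / 3.6                  # speed in m/s
    reaction_dist = v * T_REACT          # distance traveled before braking starts
    braking_dist = v * v / (2 * MU * G)  # distance to standstill, constant deceleration
    return reaction_dist + braking_dist

for kmh in (30, 50, 100, 130):
    print(f"{kmh:>3} km/h -> sensor range of at least {required_detection_range_m(kmh):6.1f} m")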


3.4 The OEM Vehicle Layer

The OEM Vehicle level from Figure 10 defines the processes within the OEM vehicle. This involves the interaction between vehicle-internal components and functions. It has proven in practice not to choose a too fine-granular partitioning of functions and components, as the complexity of the architectures and the number of possible architectures would be extremely high. To address the diversity of OEM vehicle architectures on the market, a shift of functions between the components is achieved by configuring the components. Such a configuration is desirable in any case, as suppliers follow platform strategies to adapt to ever-changing vehicle architectures. In SEMAS, this level is considered from the point of view of the Tier1 supplier.

The most important factors of this level are:

• The consolidation and mediation of requirements from all relevant scenario catalogs

• The degree of freedom in the allocation of functions to components

• The decomposition of the merged requirements to the elements within the OEM vehicle

List 5 Objectives on OEM Vehicle layer

3.4.1 Structural Decomposition

Again, the logical architecture pattern shown in Figure 11 is the foundation of the level modeling. The model of the Real World layer is extended by a view of the OEM Vehicle internal elements. The given ports of the OEM Vehicle from Figure 11 are connected, on the inside of the vehicle, with the internal elements identified in Figure 17. Now the automation system of the next-lower level is the center of attention. The execution cycles from the previous level are further refined. It shows that the internal elements serve as a link between the external elements and the Automation System. Most importantly, the Environment Behavior actors are directly linked to the Automation System, because it is responsible for monitoring the environment, being the eyes of the OEM vehicle.

Figure 17 Logical Architecture Pattern of the OEM Vehicle Layer

3.4.2 Functional Decomposition

At this level, requirements are derived from the scenarios of the Real World layer. Each scenario catalog is further detailed by converting the use cases, like the ones from Figure 16, into sequence diagrams as shown in Figure 18. For this purpose, the scenario catalog elements of the second level are modeled with the actors of the use cases as lifelines. Since we are here on the white-box level of the OEM vehicle, the sequence diagrams are expanded with actors from the vehicle interior, shown as the dark green actors in Figure 18.


Figure 18 Analysis of a Scenario Catalog on OEM Vehicle Level

Again, the actors must be instantiated from the gray elements of the logical architecture in Figure 17 and shared between all scenarios via a global library. The new actors are identified by analyzing the information flows between the Real World actors and the OEM Vehicle, represented by the center lifeline in Figure 18. From this, the most efficient processes are found and worked out. In the next step, the process analysis extracts the System Features (SFeats) that the Automation System has to provide within the processes found (Figure 20).

Figure 19 The System Features (SFeats) of the Automation System


The System Features are building blocks of the End-User Functions (Figure 12), extracted from concrete scenarios. The System Features are organized centrally and reused between all the scenarios. There are different types of features, for example environmental awareness, decision making, planning and the generation of output signals. The detailed requirements are derived from many scenarios and are merged and negotiated against each other, as schematically shown in Figure 20.

Figure 20 Negotiation of Requirements from different Real World Scenario Catalogs (RWSC) to OEM Vehicle layer System Features (SysFeat)

All scenarios produce concrete requirements, for example for the detection of pedestrians. These requirements define different values for detection distance, latency and precision, but all of them must be met by the Automation System. That means any contradiction must be identified and eliminated. In general, all requirements towards the same actor must be mediated, and a final set of consistent requirements needs to be produced. Here, the first plausibility check and mediation is carried out.
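The mediation step can be pictured with a minimal Python sketch that merges the per-catalog requirements on one System Feature by always keeping the strictest value; the catalog names, attributes and numbers are invented for illustration:

# Several Real World scenario catalogs constrain the same System Feature
# ("PedestrianDetection"); the merged requirement must satisfy all of them.
requirements = {
    "CityEmergencyBrake v1": {"detection_range_m": 40, "latency_ms": 150},
    "CrossingPedestrian v1": {"detection_range_m": 60, "latency_ms": 200},
    "NightOperation v1":     {"detection_range_m": 50, "latency_ms": 100},
}

stricter = {"detection_range_m": max,   # a larger range is stricter
            "latency_ms": min}          # a smaller latency is stricter

merged = {}
for attrs in requirements.values():
    for name, value in attrs.items():
        merged[name] = value if name not in merged else stricter[name](merged[name], value)

print("mediated requirement:", merged)
# -> {'detection_range_m': 60, 'latency_ms': 100}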

On the OEM Vehicle layer, the System Features form the chapter structure of the requirement management document, in which the actual requirements are worked out and linked to the "Real World" layer requirements.

3.5 The Automation System Layer

From the perspective of the Tier1, we reach the Owner level (Figure 6). From here on, the architecture is no longer the result of analyzing the interaction between actors and the scope (Figure 18), but represents a design process over several components. In addition, the temporal behavior of the elements on this layer changes from physical rules and deadlines to time-cyclic or event-controlled discrete processes.

The most important factors are:

• Efficient implementation of solutions

• Robustness, reliability, availability

• Scalability, reuse, flexibility

List 6 Objectives on Automation System Layer

3.5.1 Structural Decomposition

SEMAS also provides a logical architecture pattern for the Automation System. It is characterized by reuse and by technological aspects like resource consumption, bandwidth and potential safety classification. A similar pattern is presented in [4].

Figure 21 Logical Architecture Pattern of the Automation System Layer

Any component, here called System Element, is instantiated from one of the elements of the logical architecture pattern in Figure 21.

3.5.2 Functional Decomposition

The requirements for the Automation System have been modeled as System Features (Figure 19) at the OEM Vehicle layer. The strategy is to first break down the System Features before designing and implementing concrete algorithmic or physical solutions. SEMAS uses functional decomposition according to SysCARS [3]. Therefore, the System Features are decomposed to the right level of granularity, where they can be assigned to the elements of the logical architecture (Figure 21) of the system. The break-down is done through Activity diagrams: for each System Feature, a dedicated Activity diagram is created and modeled. The actions displayed in Figure 22 stand for System Element Features to be implemented by the System Element they are allocated to. Allocation is modeled through the SysML method of Call Behavior, as shown here:

Figure 22 Break-Down of a System Feature to several System Element Features (SyElement Features) with allocation to System Elements


With all the break-downs of the System Features into System Element Features and their assignment as Call Behavior Operations to the System Elements, the System Elements acquire the Features they have to implement.

Figure 23 Decomposition of the Automation System in System Elements and assigned System Element Features (SyElement Features)

Each System Element in the model represents a System Element Requirement Specification, which is synchronized with the requirement management tool. Here in Figure 23, in the same way as shown in Figure 16 for the Automation System, the System Element Features form the chapter structure of the requirement specification.
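The bookkeeping behind this step can be illustrated with a minimal Python sketch: inverting the feature-to-element allocation yields, per System Element, the list of features that become the chapters of its requirement specification (all names below are invented):

# (system feature, allocated SyElement Feature, System Element)
allocation = [
    ("ObjectFusion",   "PreprocessRadar", "SensorModule"),
    ("ObjectFusion",   "FuseTracks",      "FusionUnit"),
    ("TrajectoryPlan", "PlanLateralPath", "PlanningUnit"),
    ("TrajectoryPlan", "FuseTracks",      "FusionUnit"),   # a reused feature
]

spec_chapters = {}
for _sys_feat, element_feature, element in allocation:
    chapters = spec_chapters.setdefault(element, [])
    if element_feature not in chapters:   # each feature becomes one chapter
        chapters.append(element_feature)

for element, chapters in spec_chapters.items():
    print(element, "specification chapters:", chapters)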

3.6 The System Element Layer

This layer is responsible for specifying the System Element internals. Depending on the complexity of the internal structure and function, a component design process can directly interface with the upper System Element Features. If the System Element is of higher complexity or is a delivery of an external entity, the System Element Requirement Specification acts as the stakeholder document for the supplier's development process.

4 Summary

SEMAS is a systems engineering methodology that combines model-based systems engineering with traditional requirements engineering. The SEMAS Layer Model is used to perform functional analysis, structural design and the decomposition of functions, features, timing and performance KPIs. By its nature, the SEMAS Layer Model creates bi-directional traceability, as all layers have dedicated process outputs that connect natively to the inputs of the next layer. SEMAS uses a generic layer model from the perspective of the Tier1 Automation System supplier to model the requirements and constraints of the end user, the OEM, the Tier1 itself and its suppliers. Each layer of the SEMAS Layer Model has a dedicated mission.

1. The Real World Layer analyses the needs of the End-User by specifying the expected behavior of the OEM Vehicle in interaction with its environment.

2. The OEM Vehicle Layer specifies the expected behavior of the Automation System in interaction with other vehicle-interior systems.

3. The Automation System Layer specifies the responsibilities of each System Element and allocates features to each element.

4. The System Element Layer specifies the internal structure and functions of each System Element.

This document has given an overview of the function, feature and structural decomposition and of the interconnection of the functional and structural processes of the SEMAS layers.

5 Abbreviations

SAE – Society of Automotive Engineers; also the short name for SAE J3016, a standard describing the classification and definition of terms of Autonomous Driving

Tier1 – the second layer in the supply chain after the OEM (Original Equipment Manufacturer); especially used in the automotive industry

KPI – Key Performance Indicator

6 Literature

[1] Pegasus Symposium – Requirements & Conditions, Stand 4 – Scenario Description, Nov. 2017

[2] ISO 26262: Road Vehicles – Functional Safety, 2018

[3] Pique, J.-D.: SysCARS – SysML for embedded automotive systems, ERTS2 2014, 2014

[4] Eberle, U.: Automatisiertes Fahren und Entwicklungsprozesse, Pegasus, Nov. 2017, p. 54

[5] The SPICE User Group 2005–2007, ISO/IEC 15504, pp. 49–60

[6] Bartels, A., Eberle, U., Knapp, A.: Adaptive – Deliverable D2.1 – System Classification and Glossary, Feb. 2015

[7] EuroNCAP: Assessment Protocol – Safety Assist, Version 8.0.2, Nov. 2


Ein Ansatz für die agile, verteilte Entwicklung Cyber-Physischer „Systems of Systems“ (Work in Progress)

An Approach for Agile, Distributed Development of Cyber-Physical "Systems of Systems"

Prof. Dr. Christoph Grimm, Dipl.-Ing. Frank Wawrzik, Dr.-Ing. Carna Zivkovic

TU Kaiserslautern, AG Entwurf von Cyber-Physikalischen Systemen, Kaiserslautern, e-mail: {grimm|wawrzik|zivkovic}@cs.uni-kl.de


Abstract

This paper provides a recent overview of an approach to develop cyber-physical "systems of systems". The approach enables continuous and integrated testing and verification of the consistency of the system from early development phases to production and operation. The approach is based on abstract and thereby domain-independent modeling on the semantic level. For this, functions and properties are represented through ontologies and rules, which capture dependencies and context knowledge. Through the use of internet standards (OWL, SWRL, JSON) as well as the architecture of the tool support, distributed development in particular is supported.

1 Introduction

Cyber-physical "systems of systems" are characterized by high complexity and heterogeneity. As a rule, several companies are involved in their development, and they do not necessarily work "synchronously" or in a waterfall model. Rather, such systems are increasingly developed in an agile manner. Interestingly, it is precisely the increased complexity and heterogeneity that force an ever more agile approach: knowledge about use cases and other problems often emerges only in late phases, when prototypes are tested, or increasingly only during operation. Development and operation then merge (the "DevOps" continuum). This development leads to new requirements for development processes and tools. So far, however, there are hardly any tools and methods that meet these requirements. The application of modeling languages and simulation succeeds up to the point at which the systems become too heterogeneous or too complex, or no models are available anymore. Therefore, far more abstract models from systems engineering (SE) or product lifecycle management (PLM) are applied. These often lack the semantics to verify systems comprehensively.

This paper gives an overview of a verification methodology that combines models from SE and PLM with methods of the Semantic Web in order to enable continuous verification of the overall system. The verification methodology is comparable to property refinement, e.g. [1, 2], in particular to the verification strategy for mixed-signal systems in [3]. Signals are abstracted away; instead, properties and constraints are attached to functions, which are defined by an ontology and shared via a URL in OWL (Web Ontology Language, [9]). This is complemented by rules modeled with SWRL (Semantic Web Rule Language, [8]). SWRL is used in [4, 5] for modeling factories and in [6] for integrating product design and production processes across several manufacturers/suppliers. [7] uses SWRL to derive implicit context knowledge about requirements for a European rail traffic management system. Rules are used there to derive local relationships. In comparison to [4, 5, 6, 7], this paper shows how the internet standards OWL and SWRL can be used to analyze constraints and their system-wide dependencies quantitatively and continuously during the development process with a (symbolic) constraint net, and thereby to build a bridge to system verification. Section 2 gives a short introduction to the context of this work. Section 3 gives an overview of the tool AGILA and the use of OWL/SWRL and JSON in this tool. Section 4 gives an outlook on future work.

2 Background

UML is a graphical, formal language for modeling software. SysML [10] is an extension of UML (with slight modifications) for modeling systems. It complements the UML capabilities for modeling behavior with capabilities for modeling requirements ("Requirement Diagram") and structures ("Structure Diagram") through hierarchical decomposition ("Block Definition Diagram", "Internal Block Diagram"), parametric dependencies ("Parametric Diagram") and package diagrams. In contrast, the textual, formal language Web Ontology Language (OWL) is often used to represent knowledge so that a "reasoner" can automatically draw conclusions from it. The two languages can presumably even be translated completely into each other. Relevant differences, however, result from the ecosystem of tools and standards that has grown around each language. An introduction to the terminology and semantics of both languages would be too complex here, but many terms can be explained quickly by analogies to well-known object-oriented programming. Table 1 contrasts essential terms from the field of knowledge engineering with their equivalents from object-oriented programming languages and explains them briefly. An ontology usually combines a taxonomy (isA relationships between objects) with a conceptualization (InstanceOf relationships). The taxonomy models abstract knowledge, while the conceptualization introduces an explicit distinction between the abstract knowledge of the taxonomy and "existing" things to which the knowledge is applicable.

SWRL is an extension of OWL and represents rules. Rules are implication rules and have the form:

antecedent → consequent (1)

Both the antecedent and the consequent consist of a conjunction (∧) of "atoms". Atoms are classes, instances, object properties, data properties, values and SWRL built-ins. Free variables are marked by the prefix "?". An example from [8]:

parent(?x, ?y) ∧ brother(?y, ?z) → uncle(?x, ?z) (2)

Expressions on numeric values can be written as functional dependencies, e.g. [8]:

?x = op:numeric-add(3, ?z) (3)

An ontology is consistent if at least one interpretation (assignment of values) exists that satisfies all rules and axioms.

Ontology term | OO programming | Explanation

Concept, Class | Class | Describes a superset of abstract elements with identical properties.

Subclass, Superclass | Inheritance; subclass | A subclass inherits all properties of its superclass; isA describes the relationship of a subclass to its superclass.

Individual (Instance) | Instance | Concretization of a class; InstanceOf relationship from an instance to its class.

Object property | Property | Relationship between two classes or two instances.

Datatype property | Variable | Allows a concrete value of a datatype, e.g. String or Real, to be assigned to an instance.

Annotation | Comment | Annotations serve the readability of the ontology and give context about the use of the terms.

Table 1 Terms of OO programming vs. terms of ontologies

3 SWRL for Modeling Dependencies and Constraints

3.1 AGILA and its Interfaces at a Glance

The development and verification tool AGILA is currently being developed within the BMBF project "GENIAL!" and the ECSEL project "Arrowhead Tools". Figure 1 gives an overview of the architecture of the tool in an application scenario spanning the entire product life cycle. AGILA is realized as a database with a web interface. First, a knowledge base must be created, or it may already exist from a previous project. The knowledge base can be created, for example, with Protégé. It comprises known knowledge about a system to be developed. This is done with Semantic Web tools, e.g. Protégé, which build on the OWL and SWRL formats. Requirements and a hierarchical decomposition of the system into parts or subfunctions are carried out within systems engineering (SE), e.g. with the tool CAMEO. A translation from SysML to OWL is the subject of future work. During development and test, the specified properties of each part or subfunction are determined by characterization (cf. [3]) and fed into AGILA, which continuously verifies the consistency of the overall system over the product lifetime. For the consistency check, a constraint solver based on AADD [3], implemented in Java and using Z3, is employed.


Figure 1 AGILA and its interfaces.

The total brake power of the car is the brake power of the brakes plus that of the engine:

Brakes(?b) ∧ hasBrakePower(?b, ?bpb) ∧ Engine(?e) ∧ hasBrakePower(?e, ?bpe) ∧ swrlb:add(?totalBrakePower, ?bpb, ?bpe) -> hasBrakePower(MyCar, ?totalBrakePower)

Deceleration power > 4 × drive power:

hasBrakePower(MyCar, ?totalBrakePower) ∧ Engine(?e) ∧ hasMotorPower(?e, ?kWMotor) ∧ swrlb:multiply(?kWMotor4, ?kWMotor, 4.0) ∧ swrlb:greaterThan(?totalBrakePower, ?kWMotor4) -> brakesConstraintFulfilled(MyCar)

Table 2 SWRL rules for brake power, drive and deceleration


3.2 Modeling of Knowledge and Design

Modeling in OWL is essentially oriented towards SysML in order to enable interoperability. The focus is the modeling of the hierarchical decomposition by "Block Definition Diagrams". These are modeled by the object property hasParts. The knowledge base consists of

• a taxonomy of possible refinements of a system or a function, which models all possible architecture variants. These are modeled in OWL by the object property isA and the corresponding object properties hasParts;

• properties and their dependencies on each other. These are modeled in OWL and SWRL. They comprise

– dependencies within one hierarchy level;

– hierarchical dependencies and constraints of the properties of a system on the properties of its parts.

For example, a car has the functions drive and deceleration. Properties of these functions are, e.g., drive power and deceleration power. These can be modeled in SWRL as shown in Table 2.

A design is modeled by instantiating concepts of the knowledge base and is refined successively. An instance always refers to a concept of the knowledge base via the InstanceOf relationship. Data and object properties must always be a refinement of the concept. In addition, the specified constraints must be satisfied. (System) properties and their dependencies, which are modeled by SWRL rules, are assigned values or value ranges (intervals), and the consistency of the ontology is ensured.
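How such rule-based dependencies can be evaluated over value ranges is sketched below with plain intervals in Python. This is a toy stand-in for the AADD/Z3-based constraint net, and all numeric values are made up:

def add(a, b):       # interval addition
    return (a[0] + b[0], a[1] + b[1])

def scale(a, k):     # interval multiplication with a positive scalar
    return (a[0] * k, a[1] * k)

brake_power_brakes = (300.0, 400.0)  # kW, property of the brakes (interval)
brake_power_engine = (20.0, 40.0)    # kW, engine braking contribution
motor_power        = (70.0, 90.0)    # kW, drive power

# Rule 1 of Table 2: the total brake power is the sum of both contributions.
total_brake_power = add(brake_power_brakes, brake_power_engine)

# Rule 2 of Table 2 (constraint): total brake power > 4 * drive power.
bound = scale(motor_power, 4.0)
proven   = total_brake_power[0] > bound[1]  # holds for every interpretation
possible = total_brake_power[1] > bound[0]  # holds for at least one interpretation

print("total brake power:", total_brake_power, " required: >", bound)
print("consistent (some interpretation exists):", possible, "| proven:", proven)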

4 State of the Implementation, Discussion and Outlook

This paper has given an overview of the capabilities of SWRL for modeling constraints and dependencies. SWRL makes it possible to bring together arithmetic computations for determining and monitoring constraints on the one hand and the modeling of possible architectures on the other. An advantage of using SWRL is that it is supported and standardized in the tool world of knowledge modeling. However, it turns out that even the "human readable" syntax of SWRL is quite complex and not intuitive, even for simple relationships. This is a serious drawback, since the experts whose knowledge is captured are usually not SWRL experts (and vice versa). Suitable simplifications or alternatives remain to be found. Currently, the ontologies and the constraint net are two separately implemented software packages. The integration of these parts is the subject of ongoing work.


5 Literature

[1] Jean-Raymond Abrial (1988): "The B Tool (Abstract)". In Robin E. Bloomfield, Lynn S. Marshall and Roger B. Jones: VDM – The Way Ahead, Proc. 2nd VDM-Europe Symposium. Lecture Notes in Computer Science 328, Springer, pp. 86–87. doi:10.1007/3-540-50214-9_8. ISBN 978-3-540-50214-2.

[2] Boerger, E.: Formal Aspects of Computing (2003) 15: 237. https://doi.org/10.1007/s00165-003-0012-7

[3] C. Zivkovic, C. Grimm, M. Olbrich, O. Scharf, E. Barke: "Hierarchical verification of AMS systems with affine arithmetic decision diagrams," IEEE TCAD, pp. 1–1, 2018, ISSN 0278-0070. doi:10.1109/TCAD.2018.2864238.

[4] Fortineau, V., Paviot, T., Louis-Sidney, L., Lamouri, S. (2012): SWRL as a rule language for ontology-based models in power plant design. International Conference on Product Lifecycle Management (PLM 12), IFIP, Montréal, Canada.

[5] Mun, D., Lee, S., Kim, B., Han, S.: ISO 15926-based data repository and related web services for sharing lifecycle data of process plants. Volume 38, International Conference on Product Lifecycle Management, Bath, UK (2006), 713–725.

[6] Asmae Abadi, Hussain Ben-Azza, Souhail Sekkat: Improving integrated product design using SWRL rules expression and ontology-based reasoning. Procedia Computer Science, 2018, 127.

[7] Khaled Bahloul, Marwa Sahnoun, Takrak Chaari: Using SWRL rules for requirements engineering: Applications to ERTMS/SRS specifications. International Conference on Digital Arts, Media and Technology (ICDAMT), 2018.

[8] (online, 12.02.2019) SWRL: A Semantic Web Rule Language Combining OWL and RuleML, https://www.w3.org/Submission/SWRL/

[9] (online, 12.02.2019) OWL: Web Ontology Language Overview, https://www.w3.org/TR/owl-features/

[10] S. Friedenthal, A. Moore, R. Steiner: OMG Systems Modeling Language Tutorial, INCOSE 2008, http://www.uml-sysml.org/documentation/sysml-tutorial-incose-2.2mo


Processor Hardware Security Vulnerabilities and their Detection by Unique Program Execution Checking

Mohammad Rahmani Fadiheh (a), Dominik Stoffel (a), Clark Barrett (b), Subhasish Mitra (b) and Wolfgang Kunz (a)

(a) Technische Universität Kaiserslautern, Kaiserslautern, Germany
(b) Stanford University, Stanford, US

Abstract

The recently discovered attacks on modern high-end processor architectures, known as Spectre and Meltdown, have attracted great public attention. These attacks are enabled by information transfer over so-called covert channels, through which confidential data can be read out without any explicit exchange of information between the confidential (protected) memory and the attacker. In the current discussion, the view is held that the existence of such covert channels is mainly due to optimizations in complex processor architectures, such as speculative execution and out-of-order execution. In order to rule out the new security vulnerabilities, abandoning such optimizations has been demanded. This work shows, however, that the problem is not limited to highly optimized processor architectures: we report on a new class of covert channels that can also occur in processors of lower complexity, i.e., without speculative execution or out-of-order execution. Such simpler processors are found today in a wide variety of everyday applications, from the smart home and the Internet of Things to autonomous driving. In this work we present a systematic procedure for preventing such covert channels, called "Unique Program Execution Checking" (UPEC). While all covert channel attacks discovered so far were found by "intensive thinking" of researchers, UPEC finds the weaknesses that can be exploited for covert channels fully automatically and, thanks to formal techniques, exhaustively.

1 Extended Abstract

Recent discovery of security attacks in advanced processors, known as Spectre and Meltdown, has resulted in high public alertness about security of hardware. The root cause of these attacks is information leakage across microarchitectural "covert channels" that reveal secret data without any explicit information flow between the secret and the attacker.

A digital designer has numerous degrees of freedom for designing the microarchitecture of a given ISA specification. However, the same degrees of freedom that the designer uses for optimizing a processor design may also lead to microarchitectural side effects that can be exploited in security attacks. In fact, it is possible that, depending on the data that is processed, one and the same program may behave slightly differently in terms of what data is stored in which registers and at which time points. These differences only affect detailed timing at the microarchitectural level and have no impact on the correct functioning of the program at the ISA level, as seen by the programmer. However, if these subtle alterations of program execution at the microarchitectural level can be caused by secret data, this may open a "side channel". An attacker may trigger and observe these alterations to infer secret information.

In microarchitectural side channel attacks, the possible leakage of secret information is based on some microarchitectural resource which creates an information channel between different software (SW) processes that share this resource. In these scenarios, the attacker process alone is not capable of controlling both ends of a side channel. In order to steal secret information, it must interact with another process initiated by the system, the "victim process", which manipulates the secret. This condition for an attack actually allows for remedies at the SW level which are typically applied to security-critical SW components like encryption algorithms. They prohibit the information flow at one end of the channel which is owned by the victim process.

This general picture was extended by the demonstration of the Spectre and Meltdown attacks. They constitute a new class of microarchitectural side channel attacks which are based on so-called "covert channels". These are special cases of microarchitectural side channels in which the attacker controls both ends of the channel, the part that triggers the side effect and the part that observes it. In this scenario, a single user-level attacker program can establish a microarchitectural side channel that can leak the secret although it is not manipulated by any other program. Such hardware (HW) covert channels not only can corrupt the usefulness of encryption and secure authentication schemes, but can steal data essentially anywhere in the system.

Many sources believe that such covert channels are intrinsic to highly advanced processor architectures based on speculation and out-of-order execution, suggesting that such security risks can be avoided by staying away from high-end processors. This paper [1], however, shows that the problem is of wider scope: we present the Orc Attack,


a new class of covert channel attacks which is possible in average-complexity processors with in-order pipelining, as they are mainstream in applications ranging from Internet-of-Things to Autonomous Systems. Rather than exploiting out-of-order or speculative execution features of a processor, the Orc attack is based on the orchestration of component communication in an SoC, such as Read-After-Write hazard handling in the core-to-cache interface.

The Orc attack shows that subtle design changes in standard RTL processor designs, such as adding or removing a buffer, can open or close a covert channel. Although specific to a particular design, such vulnerabilities may inflict serious damage once such a covert channel becomes known in a specific product. The new insight that the existence of covert channels does not rely on certain types of processors but on decisions in the RTL design phase underlines the challenge in capturing such vulnerabilities and calls for methods which can deal with the high complexity of RTL models and do not rely on a priori knowledge about the possible attacks.

We present a new approach as a foundation for remedy against covert channels: while all previous attacks were found by clever thinking of human attackers, this paper presents an automated and formal method called "Unique Program Execution Checking" which detects and locates vulnerabilities to covert channels systematically, including those to covert channels unknown so far.

UPEC aims to prove that any instruction sequence executes uniquely regardless of the content of memory in its protected regions. By using a tailor-made verification model, the UPEC property is formalized as a CTL formula. Proving this property for the RTL design of a microarchitecture will find all possible paths in the design in which secret data can traverse and possibly leak to a malicious software. As a result, the hardware designer can obtain precise information to comprehensively assess the impact of design decisions on the security of the system.

The verification model consists of two identical instances of the SoC under verification, in which also the memories hold the same set of values except for the memory location of a defined secret. Based on this model, UPEC proves that instructions execute uniquely and leave a unique footprint. Any mismatch between the two SoC instances regarding the microarchitectural or architectural footprint of an instruction sequence execution is an alarm for secret data propagation or a leakage path; a toy sketch of this two-instance construction follows the list below. These paths have to be found through an iterative application of UPEC in combination with inductive proofs to rule out certain safe propagation paths. By discovering all possible leakage and propagation paths, UPEC can prove whether or not a protected memory region is observable through direct memory access or timing side channels. As a result, the confidentiality requirement can be formally verified using the proposed method.

The CTL property can be rewritten as an Interval Property on a bounded circuit model. This allows for techniques to mitigate computational complexity and makes the method applicable to realistic processor designs.

Based on UPEC, for the first time, covert channels can be detected by a systematic and largely automated analysis rather than only by anticipating the clever thinking of a possible attacker. Unlike previous works which target certain attack scenarios or suspicious paths, UPEC does not rely on any a priori knowledge about the attack scenario and can even detect previously unknown HW vulnerabilities, as demonstrated by the discovery of the Orc attack in our experiments. Furthermore, it works on the RTL design rather than on abstract models, which makes it possible to capture vulnerabilities which have been introduced in the RTL design stage and which are not visible in more abstract models.

We explored the effectiveness of UPEC by targeting different design variants of RocketChip, an open-source RISC-V SoC generator. The considered RocketChip design is a single-core SoC with an in-order pipelined processor and separate data and instruction level-1 caches. Our experiments have shown that:

• UPEC is capable of finding HW vulnerabilities which do not corrupt functional correctness, but which still open back doors for covert channel attacks.

• UPEC requires a reasonable amount of manual work and computational power.

• UPEC can detect bugs in memory protection units which lead to violation of confidentiality.
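The two-instance idea behind the UPEC verification model can be pictured with a deliberately simplified toy model in Python. This is not the UPEC tool, which proves a CTL/interval property on the actual RTL; the machine, its protection mechanism and the timing bug below are invented solely to show how differing footprints expose a potential covert channel:

# Two identical machines whose memories agree everywhere except in the
# protected (secret) cell; identical programs must leave identical footprints.
class ToyMachine:
    def __init__(self, memory, protected):
        self.mem, self.protected = list(memory), protected
        self.reg, self.cycles = 0, 0

    def load(self, addr):
        # Architectural protection: reads of the secret cell return 0 ...
        self.reg = 0 if addr == self.protected else self.mem[addr]
        # ... but an (invented) microarchitectural timing bug still
        # makes the latency depend on the secret value:
        self.cycles += 1 + (self.mem[self.protected] & 1)

    def footprint(self):
        return (self.reg, self.cycles)    # everything an attacker can observe

program = [0, 2, 3]                               # attacker-level load sequence
m1 = ToyMachine([7, 42, 9, 1], protected=1)       # secret value 42
m2 = ToyMachine([7, 13, 9, 1], protected=1)       # secret value 13, rest equal

for addr in program:
    m1.load(addr)
    m2.load(addr)

print("footprints:", m1.footprint(), m2.footprint())
print("unique program execution:", m1.footprint() == m2.footprint())  # False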

Literature

[1] M. R. Fadiheh, D. Stoffel, C. Barrett, S. Mitra, and W. Kunz, "Processor hardware security vulnerabilities and their detection by unique program execution checking," in Design, Automation & Test in Europe Conference (DATE), 2019 (to appear).


Modellbasierte Konfiguration einer grobkörnigen rekonfigurierbaren Architektur

Model-Based Configuration of a Coarse-Grained Reconfigurable Architecture

Jens Frömmer, Dr. Nico Bannow, Axel Aue, Prof. Dr. Christoph Grimm, Prof. Dr. Klaus Schneider

Robert Bosch GmbH, Stuttgart, Germany, {jens.froemmer, nico.bannow, axel.aue}@de.bosch.com

TU Kaiserslautern, Kaiserslautern, Germany, {froemmer, grimm, schneider}@cs.uni-kl.de


Abstract

Domain-specific coarse-grained reconfigurable architectures offer a performance and efficiency close to hardwired implementations when accelerating multiple selected data-intensive algorithms. However, this requires transferring the algorithms onto the coarse-grained reconfigurable architecture in a way that leads to the highest possible utilization. This paper introduces a modeling framework for a model-based configuration workflow that enables algorithm experts and hardware experts to collaboratively and efficiently transfer algorithms onto a coarse-grained reconfigurable architecture. The concept was validated by the implementation and testing of respective prototypes.

1 Introduction

In the automotive industry, algorithms for data-based modeling, advanced control theory, advanced signal processing, and physical modeling are expected to run on the next generation of engine control unit (ECU) microcontrollers. Hence, rising performance requirements have to be addressed by these microcontrollers despite the high cost constraints in the ECU market. A novel coarse-grained reconfigurable architecture (CGRA), called Data Flow Architecture (DFA), has been developed at Bosch in accordance with the requirements and constraints of the automotive field. This paper contributes a model-based approach to transfer algorithms onto the DFA:

1. A Simulink-based modeling framework enables the modeling of algorithms with the DFA-specific processing elements, called base blocks, and simulates the execution of the modeled algorithm on the DFA. The modeling framework offers different levels of abstraction from the configuration options of the DFA via a custom library and masks. Hence, DFA experts can access all functional configuration options available in the DFA, while algorithm experts without extensive knowledge of the DFA can rely on abstracted configuration options or predefined elements.

2. A configuration workflow imports the DFA-based models of algorithms as well as models of a DFA hardware accelerator to enable the mapping of algorithms onto the DFA and the generation of a respective DFA configuration. The output of the code generator comprises a SystemC test bench and a VHDL test bench. The meta-models and the code generator are based on the Eclipse Modeling Framework (EMF) and Eclipse Xpand.

1.1 Transferring Algorithms onto a CGRA

Creating highly optimized configurations manually at the register level is time-consuming, error-prone, and requires a deep understanding of the targeted CGRA as well as of the selected algorithm. Hence, a framework or tooling is required to enable DFA experts as well as non-DFA experts to transfer algorithms onto the DFA - each on their own and in collaboration with each other. Any framework or tooling has to allow for the efficient utilization of the target architecture and the modeling as well as the verification of algorithms before implementing them in hardware [1]. Earlier compilers, such as DRESC [2] and further


approaches [3], enable the use of high level languages to program a CGRA, but usually assume a mesh-like interconnect, homogeneous processing elements, or other properties that do not apply to the DFA. More recent research improves upon different aspects, e.g. support of route sharing [4], register-awareness [5], or memory-awareness [6]. The Split-Push Kernel Mapping (SPKM) [7] explicitly extends the range of supported CGRAs. As finding the optimal mapping of an algorithm onto a CGRA is in general NP-complete [8], compilers often rely on heuristics like the SPKM and, thus, achieve only a limited utilization of the targeted CGRA. As the DFA features runtime-reconfigurable and heterogeneous base blocks that are connected via a sparse crossbar and offer complex configuration options, existing compiler or mapping techniques cannot simply be applied to the DFA.

The introduction of a specific instruction set architecture (ISA), similar to the stream-dataflow ISA [9], may lead to a high utilization, but typically requires deep knowledge of the CGRA. Hence, a specific ISA presents an initial hurdle for the widespread acceptance of the CGRA. Furthermore, a new ISA may be difficult to integrate into established workflows and tools, especially in the case of model-driven automotive software engineering.

The application in the automotive field demands the acceleration of selected algorithms and a high performance-per-size ratio. An effective usage of the DFA resources has a higher priority than the transfer of arbitrary algorithms. Therefore, a model-based approach offering a graphical user interface is preferred. Compared to a compiler, a model-based workflow requires manual work for each algorithm, but allows for an interactive mapping onto the CGRA that is not limited by a fixed compiler implementation. A modeling framework that initially abstracts from the CGRA promises better accessibility than a specific ISA. Little work has been published on structural and graphical approaches to program a CGRA. One example is MorphoSys [10], which offers a graphical user interface (GUI) called mView to generate the application context file and to simulate the processing elements.

1.2 Background and Related Work

The modeling framework as well as the configuration workflow are based on existing tools and frameworks in order to integrate well into existing software engineering workflows and to speed up the development of the respective prototypes.

1.2.1 Data Flow Architecture

The DFA is an architecture for flexible hardware accelerators. A DFA hardware accelerator is to be integrated into the microcontroller, mapped into the global address space to be accessible by the masters, and has a master interface for direct memory access capabilities with a high bandwidth. The base blocks of the DFA are connected via a sparse crossbar and implement control, load, store, and mathematical operations. Additionally, other functionalities, such as a software core, can be integrated as base blocks.


Figure 1 Simplified representation of a DFA configuration to compute the dot product.

The base blocks may only write to their successors, but not read from their predecessors. Only the control, load, and store base blocks may access the local SRAM or global memory. Each base block offers configuration options to specify its exact operation, e.g. to define the memory access pattern of a load block. Hence, a DFA configuration comprises configurations of the global registers, the base blocks, the SRAM, and possibly debug features. From an algorithmic perspective, a DFA configuration describes a data flow: the configuration of the load and store base blocks determines which data flows in and out of the operations as well as the order of the data. The configuration of the operational base blocks, like the multiply-accumulate (MAC) base block, determines the types and the order of the operations. The execution of an operation depends on the availability of the required operands. The emulation of data flow graphs gives the DFA its name, while it is not closely related to classical data flow architectures, such as the Manchester Prototype Dataflow Computer [11]. Figure 1 visualizes the DFA-specific computation of the dot product. The design space of the DFA especially includes the structure and the available connections of the interconnect as well as the types, the quantities, and the features of the base blocks.

As a detailed description of the DFA would go well beyond the scope of this paper and is not required for the introduction of the configuration workflow, it is published separately.

During the introduction, the term base block (BB) refers to a BB of the DFA. Starting from Section 2, these are called hardware base blocks (HWBBs) to differentiate them from the virtual base blocks (VBBs) of the modeling framework. These VBBs are the functional counterparts of the HWBBs.
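The dataflow principle behind Figure 1 can be mimicked in a few lines of Python: the load blocks stream operands, the MAC block fires whenever both operands are available, and the store block writes the accumulated result. This is a conceptual sketch, not the DFA programming interface:

from collections import deque

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

load_a = deque(a)   # LOAD a[i]: streams one operand per firing
load_b = deque(b)   # LOAD b[i]
accu = 0.0          # MAC: multiply-accumulate register

while load_a and load_b:                      # fire only if both operands are ready
    accu += load_a.popleft() * load_b.popleft()

result = accu                                 # STORE a.b
print("dot product:", result)                 # -> 32.0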

1.2.2 Simulink

Simulink offers a broad library of blocks, which can be used to graphically model dynamic systems in block diagrams. Those blocks have input and output ports, can be parametrized, and allow for the modeling of hierarchical structures. The mathematics of a Simulink block are expressed in Table 1. Simulink blocks are connected to each other via signals linking output to input ports. The output values may change over time, and signals can be of complex as well as multi-dimensional data types.


t — simulation time
x = [xc, xd] — states
xc — continuous states
xd — discrete states
xdk — discrete states at time step k
y = f0(t, x, u) — outputs
ẋc = fd(t, x, u) — derivatives
xdk+1 = fu(t, xc, xdk, u) — update

Table 1 Function parameters and equations that represent a Simulink block. Output, derivative, and update calculation depend on the current simulation time, states, and inputs.

Continuous and discrete states are both supported. Having modeled a dynamic system in such a block diagram, it can be simulated to analyze its behavior using fixed- or variable-step solvers. As Simulink is an established software engineering tool in the automotive industry including Bosch and a graphical data flow modeling language [12], it is the starting point of the development of the modeling framework.

1.2.2.1 Simulation

The simulation of a Simulink model is executed in several phases [13]: model compilation, link phase, and simulation loop phase. During the first phase (model compilation), the model is converted to an executable form. To that end, block parameters are evaluated, signal attributes as well as block execution order and sample times are determined, the model hierarchy is flattened, and blocks are optimized. In the second phase (link phase), the required memory is allocated and the method execution order is determined. Finally, an initialization is performed once in the simulation loop phase, before states and outputs are computed at each time step, as visualized in Figure 2. The time steps are determined in conformance with the defined simulation start and stop times as well as the block sample times. The computation of states and outputs is performed for each block in the previously scheduled execution order.


Figure 2 The simulation loop of Simulink.

The time-based simulation of Simulink differs from typical hardware simulation kernels like the ones of SystemC [14] and VHDL [15], which use discrete event simulation. In case of Simulink, the time always progresses after the computations of the states and the outputs. Thus, there are no equivalents to the delta cycles of hardware simulation kernels, which do not induce a progress in time. Additionally, high level communication channels, such as blocking, are not supported by Simulink. However, using fixed sample times as well as discrete states allows for a cycle-like behavior. The missing delta cycles and high level communication channels can partly be compensated by exploiting the state and output computations of user-defined Simulink blocks.
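The phases of the simulation loop can be mirrored in a small Python sketch, a schematic re-creation of Figure 2 rather than the Simulink engine, using one discrete unit-delay block as an example:

class UnitDelay:
    # Discrete block: y_k = xd_k and xd_{k+1} = u_k (cf. Table 1).
    def __init__(self):
        self.state, self.u, self.y = 0.0, 0.0, 0.0
    def output(self, t):
        self.y = self.state
    def update(self, t):
        self.state = self.u

blocks = [UnitDelay()]                # scheduled execution order
t, dt, t_stop = 0.0, 0.1, 0.5         # fixed-step solver settings

while t <= t_stop:                    # simulation loop
    blocks[0].u = t                   # input signal u(t) = t
    for b in blocks:                  # 1) calculate outputs
        b.output(t)
    for b in blocks:                  # 2) update discrete states
        b.update(t)
    print(f"t={t:.1f}  y={blocks[0].y:.1f}")
    t += dt                           # 3) advance to the next time step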

1.2.2.2 S-Functions

Simulink supports different kinds of user-defined blocks: subsystem blocks, Matlab function blocks, Matlab system blocks, S-Function blocks, and masked blocks. The functionality of an S-Function (block) is defined by Matlab, Fortran, C, or C++ code [16]. A mask is used to parametrize a block. The Matlab execution engine can automatically load and execute S-Functions after their compilation. S-Functions may contain continuous, discrete, or hybrid systems.

S-Functions implement callback methods that are called from the simulation engine of Simulink. There are methods for each stage of the simulation loop phase. The initialization, the output calculation, and the update of the discrete states are the most relevant phases for the modeling framework.
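The callback structure described above can be illustrated with a minimal Level-2 C++ S-Function skeleton. The callback names (mdlInitializeSizes, mdlOutputs, mdlUpdate) belong to the real S-Function API; the block behavior shown, a unit delay with a fixed sample time, is only a sketch and not one of the VBBs of this paper:

#define S_FUNCTION_NAME  sketch_delay   // hypothetical block name
#define S_FUNCTION_LEVEL 2
#include "simstruc.h"

static void mdlInitializeSizes(SimStruct *S) {
    ssSetNumSFcnParams(S, 0);                      // no mask parameters
    if (!ssSetNumInputPorts(S, 1)) return;
    ssSetInputPortWidth(S, 0, 1);
    ssSetInputPortDirectFeedThrough(S, 0, 0);      // output uses stored state only
    if (!ssSetNumOutputPorts(S, 1)) return;
    ssSetOutputPortWidth(S, 0, 1);
    ssSetNumDWork(S, 1);                           // one internal state vector
    ssSetDWorkWidth(S, 0, 1);
    ssSetDWorkDataType(S, 0, SS_DOUBLE);
    ssSetNumSampleTimes(S, 1);
}

static void mdlInitializeSampleTimes(SimStruct *S) {
    ssSetSampleTime(S, 0, 1.0);                    // fixed sample time emulating a clock cycle
    ssSetOffsetTime(S, 0, 0.0);
}

// Output phase: expose the value computed during the previous update phase.
static void mdlOutputs(SimStruct *S, int_T tid) {
    const real_T *state = (const real_T *)ssGetDWork(S, 0);
    real_T *y = ssGetOutputPortRealSignal(S, 0);
    y[0] = state[0];
}

// Update phase: compute the value that becomes visible one time step later.
#define MDL_UPDATE
static void mdlUpdate(SimStruct *S, int_T tid) {
    InputRealPtrsType u = ssGetInputPortRealSignalPtrs(S, 0);
    real_T *state = (real_T *)ssGetDWork(S, 0);
    state[0] = *u[0];
}

static void mdlTerminate(SimStruct *S) {}

#ifdef MATLAB_MEX_FILE
#include "simulink.c"   // MEX-file interface mechanism
#else
#include "cg_sfun.h"    // code generation registration
#endif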

1.2.3 Eclipse

Eclipse describes itself [17] as “a mature, scalable and commercially-friendly environment for open source software collaboration and innovation.” The top-level Eclipse project contains a framework for tool integration as well as a Java development environment, which is built using the framework for tool integration. Due to its platform-based approach, Eclipse is easily extensible. According to a recent survey [18], Eclipse is one of the most popular development environments. This and the EMF are the main reasons why Eclipse was chosen as a basis for the configuration workflow.

1.2.3.1 Eclipse Modeling Framework

The EMF provides a meta-model called Ecore as well as a number of code generation options [19]. Its objective is to increase the productivity when developing a tool that manipulates a structured data model. To that end, it allows for the modeling of a data meta-model on the basis of the Ecore meta-model. Given such a data meta-model, Java code can be generated in order to:

• create, directly access, and manipulate models,

• access models using adapter classes, and

• edit models via a generated editor.

The EMF supports model specifications described in the XML Metadata Interchange (XMI) format, which especially includes support for the eXtensible Markup Language (XML) and the Unified Modeling Language (UML). Furthermore, models can be defined via Java interfaces. The modeling of meta-models via Ecore as well as the code generation options, especially the generated editors and the direct access to the models, cover many features required by the configuration workflow.


Therefore, the configuration workflow builds upon the EMF.

1.2.4 Code generation

The Eclipse Modeling Project includes three template-based model-to-text transformation projects supporting the EMF: Acceleo, Jet, and Xpand. According to Syriani et al. [20], template-based code generation may be described “as a synthesis technique that uses templates in order to produce a textual artifact, such as source code, called the output.” A template consists of a static and a dynamic part. Text fragments that directly appear in the output are considered static parts, whereas dynamic parts consist of meta-code. This meta-code is executed by the template engine, thus computing the output based on run-time input. The meta-code also describes what kind of run-time input is expected. Usually, it is a model or a part of a model.

While Acceleo, Jet, and Xpand are all viable solutions for generating the configuration and test benches, an earlier comparison [21] favors Xpand for a number of reasons, such as Xpand being the most powerful of the three or the missing domain-specific support in the case of Jet. Unlike Jet and Acceleo, Xpand offers straightforward means to pass an arbitrary number of models to a template. As multiple models, e.g. a mapping model and a data flow model, need to be evaluated in parallel for the generation of a DFA configuration, this is a required feature for the configuration workflow. Hence, the code generation templates of the configuration workflow use Xpand.

1.3 Outline

The remainder of this paper is structured as follows: Firstly, Section 2 presents the concept of the Simulink modeling framework and the configuration workflow. Secondly, Section 3 elaborates how these concepts are realized based on the above introduced tools and frameworks. Thirdly, Section 4 briefly reviews the chosen tools and frameworks before introducing a proof of concept by testing the prototypes. Finally, Section 5 summarizes and discusses the findings of this paper.

2 Concept and Requirements

Although they are separate components, the modeling framework and the configuration workflow must work well together.

2.1 Modeling Framework

As explained before, the HWBBs resemble data flow operations that can be connected to each other. For each HWBB, a counterpart is implemented in Simulink as an S-Function that offers the same behavior and the same configuration options, but only at the functional level. These counterparts are the VBBs. The functionally identical configuration options of the HWBBs and the VBBs simplify the mapping framework and the code generation of the configuration workflow.

A major challenge regarding the Simulink modeling framework is the realization of a behavior similar to real hardware. The VBBs implemented as S-Functions must:

• delay their output by one time step,

• block their execution in case of unfulfilled data dependencies,

• block their data output in case of a subsequent blocking VBB,

• be independent of the execution order, and

• resemble the functionality as well as the configuration options of the HWBBs.

Aligned to the simulation loop of Simulink, a rather simple solution to delay the output by one time step is to perform the computation of new values during the update phase. The computed values are stored in internal states and copied to the output during the next time step. In case of unavailable inputs, a base block computes no values and signals invalid outputs. As the VBBs are connected to each other and may have multiple inputs and outputs, the following case is possible: The successor of a VBB cannot continue its computation because another required input is unavailable. Thus, the prior VBB has to wait for its successor to continue, or its outputs need to be buffered. Of course, the VBBs must produce the same output values in the same order at identical time steps, independently of their execution order.
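A minimal C++ sketch of this one-step delay, with assumed names and a placeholder operation, illustrates the separation between the update phase (compute) and the output phase (expose):

// Staged computation as used by the VBBs: the output phase runs first in
// every time step and exposes the result of the previous update phase.
struct DelayedBlock {
    double state = 0.0;      // value computed during the previous update phase
    bool   valid = false;    // signals whether that value is usable

    // Output phase: expose last step's result, delayed by one time step.
    void output(double& out, bool& outValid) const {
        out = state;
        outValid = valid;
    }

    // Update phase: compute the value that becomes visible next time step.
    void update(double in, bool inValid) {
        state = in + 1.0;    // placeholder for the configured operation
        valid = inValid;     // unavailable inputs yield an invalid output
    }
};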

2.2 Configuration Workflow

Given an algorithm and a DFA hardware accelerator, a configuration of the DFA hardware accelerator that results in the computation of the algorithm can be derived as depicted in Figure 3.

Figure 3 The configuration workflow comprises all necessary steps to transfer an algorithm to the DFA: Step 1a creates the data flow model, Step 1b the DFA hardware model, and Step 1c the mapping model, before Step 2 generates the configuration and test benches.

The first step is to create and import a corresponding Simulink model using the modeling framework. In parallel, a representation of the DFA hardware accelerator must specify the relevant characteristics: the structure and available connections of the interconnect as well as the quantities, types, and features of the HWBBs.


This information is available in IP-XACT, which is an XML format for language- and vendor-neutral descriptions of electronic circuit designs [22]. The availability of the data flow model and the hardware model enables the mapping of VBBs onto HWBBs via references. A variety of mapping options may be explored, since several HWBBs may match a VBB. Furthermore, the HWBBs offer features for resource sharing: multiple execution units as well as register sets per HWBB and multiple transfer cycles per port. Finally, the code generator can create a VHDL as well as a SystemC test bench including the configuration of the DFA hardware accelerator by evaluating the data flow model, the hardware model, and the mapping model. Importing and evaluating the hardware model enables the generation of the respective architecture in VHDL and SystemC.

2.2.1 Importing a Simulink Model

Importing the Simulink models into custom data flow models allows for a clear interface separating the configuration workflow from the modeling framework. The separation results in a configuration workflow that is independent of future changes to Simulink, especially changes to the structure of the Simulink models. Furthermore, it enables the use of diverse modeling frameworks. By using an Ecore meta-model for the data flow model, the EMF can generate the Java classes to directly load, manipulate, and store the data flow models, including a model editor.

An importer can traverse and evaluate the Simulink model using the XML Path Language (XPath), since a Simulink model consists of compressed XML files. The meta-model of those XML files must be reverse engineered and implemented manually based on exemplary models, since there is no XML schema available for the Simulink XML file format.

2.2.2 Mapping Framework

As the data flow model and the hardware model are based on the EMF, there are two options for the mapping model: 1. combining the data flow and hardware model with the addition of the mapping information, or 2. referencing VBBs and HWBBs using the respective identifiers. The first option would lead to a single model that contains all information required for the code generation. As this model would also contain redundant information, option two is preferred, resulting in three strictly separated models.

2.2.3 Code Generation

The use of Xpand templates combined with supportive Java classes enables the evaluation of multiple models in parallel. The Xpand templates can be separated into the generation of the DFA configuration including test benches in VHDL and SystemC as well as the generation of the architectural description in VHDL and SystemC.

3 Implementation

Prototypes of the modeling framework and configuration workflow were implemented to provide a proof of concept and a starting point for the future tool development regarding the DFA.

3.1 Modeling Framework

Figure 4 visualizes the modeling framework: The core of the modeling framework, which is based on Simulink, is a custom library of VBBs that are implemented as S-Functions. These VBBs equal the HWBBs in their functionality and their configuration options. Additionally, different variants of these VBBs are available that offer predefined configurations and simplified masks to match certain operations, such as a multiplication or a subtraction. All VBBs use a fixed sample time to emulate clock cycles. An algorithm modeled with the VBBs can be simulated to ensure that it behaves as intended. Predefined models of algorithms are available in the library as well. The communication of the VBBs relies on custom bus objects that are stored in a data dictionary. A data dictionary is a persistent storage that may store bus objects, parameters, and other data defining the behavior of a model. A data dictionary can be linked to a Simulink model in order to allow for access to the stored data. A bus object may consist of multiple signals.

Figure 4 The components of the modeling framework: a library containing the base blocks, configured base blocks, and algorithm models, as well as a data dictionary holding the bus objects.

In general, methods, macros, and constants that realize shared functionalities are implemented in respective header files to avoid redundancies and to allow for reuse and extensibility. For example, Simulink-, callback-, bus-, and parameter-related functionalities are encapsulated in header files.

3.1.1 Types of Virtual Base Blocks

The VBBs may be categorized as: accessing memory, performing mathematical operations, and controlling the data flow. The local SRAM of the DFA is emulated via memory blocks that allocate a one-dimensional array, provide a pointer for read access, and store incoming data at the address attached to the data. Test bench blocks are memory blocks that initialize this one-dimensional array with algorithm-specific test data and check incoming data against the expected results. The memory blocks may only be connected to the memory-accessing VBBs, which are the load and the store VBBs. The load and the store VBBs allow access to the data via their address generation, which can compute complex memory access patterns.


The memory-accessing VBBs are usually connected to VBBs performing mathematical operations like addition, subtraction, multiplication, accumulation, the exponential function, and more. Additionally, there are VBBs that perform special operations, which either control the data flow or are specific to the modeling framework. The latter cannot be found in an actual DFA hardware accelerator. A simple example is the DataToConfig VBB that converts a DATABUS into a CONFIGBUS to reconfigure a subsequent VBB on the basis of a computed value contained in the DATABUS.
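The memory block behavior described above can be sketched in a few lines of C++; the class and method names are assumptions for illustration, not the actual implementation:

#include <cstddef>
#include <vector>

// Emulated local SRAM: a one-dimensional array with pointer-based read
// access and writes that carry their target address along with the data.
class MemoryBlock {
    std::vector<double> mem;
public:
    explicit MemoryBlock(std::size_t words) : mem(words, 0.0) {}

    // Read access as used by the load VBBs (READBUS-style).
    const double* readPtr() const { return mem.data(); }

    // Write access as used by the store VBBs (WRITEBUS-style): the
    // incoming data is stored at the address attached to it.
    void store(std::size_t addr, double value) {
        if (addr < mem.size()) mem[addr] = value;
    }
};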

3.1.2 Bus Objects

There are five different types of bus objects: ACCUBUS, CONFIGBUS, DATABUS, READBUS, and WRITEBUS. VBBs that accumulate values use the ACCUBUS to internally store an accumulated value. The CONFIGBUS contains one or multiple values and targets, which specify configuration parameters. The DATABUS transports data that was loaded from the memory or computed. Only the memory blocks as well as the memory-accessing VBBs use the READBUS and the WRITEBUS. The READBUS may contain a pointer to a one-dimensional array, while the WRITEBUS may contain a value and an address. All of the bus objects include valid flags. The DATABUS and the CONFIGBUS additionally include information about the loop level, e.g. to control when to output and reset an accumulated value.
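Rendered as plain C++ structs, the five bus objects could look as follows; the field names are assumptions derived from the descriptions above, not the actual bus definitions:

#include <cstdint>

// Every bus object carries a valid flag; DATABUS and CONFIGBUS
// additionally carry loop level information.
struct AccuBus   { double accumulated; bool valid; };
struct ConfigBus { double value; std::uint32_t target;   // configuration parameter target
                   std::uint8_t loopLevel; bool valid; };
struct DataBus   { double value; std::uint8_t loopLevel; bool valid; };
struct ReadBus   { const double* data; bool valid; };     // pointer to a one-dimensional array
struct WriteBus  { double value; std::uint32_t address; bool valid; };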

3.1.3 Block Synchronization

When starting the simulation or in case of data dependencies, a mechanism is required to either buffer an arbitrary amount of data in between the VBBs or to postpone further computations until the next VBB is ready. For example, an add VBB, which computes the sum of two inputs A and B, is connected to a second add VBB. The input D of the second add VBB is unavailable. Consequently, the second add VBB cannot process C to compute E, as highlighted in Figure 5.

Figure 5 A simple example of a data flow that requires a synchronization method such as blocking or a buffer: a first ADD computes C from A and B, while a second ADD cannot compute E because its input D is unavailable.

As HWBBs would block in such a scenario, two approaches implementing a synchronization have been evaluated: feedback connections and a callback functionality. Additional feedback connections from a subsequent VBB to its previous VBB allow a VBB to accept or decline a provided input. However, for each output port an additional input port is required. This results in a very confusing connectivity structure even for small models. The callback functionality is not visible as connections in the model while providing the same functionality: The second add VBB calls a function of the first add VBB via a callback pointer, setting a flag which signals that the subsequent VBB is not ready for new inputs or, in other words, has not consumed the provided input yet. Therefore, the first add VBB stops its computation and waits. As the circumstances require, the first add VBB also calls the callback function of the previous VBB in order to block the computation of the previous VBB. Whenever D becomes available, the second add VBB continues its computation and calls the callback function of the first add VBB. This time, it sets a flag signaling that the subsequent VBB is ready for new values and has consumed the provided input. In addition to the flag, the callback function requires any VBB to send a port index. Thereby, and by having separate flags for each output port, a VBB can wait for all subsequent VBBs to become ready.

All VBBs contain and utilize the callback functionality, except for the memory blocks. The memory blocks do not require any synchronization, since memory access conflicts etc. are not modeled.
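A condensed C++ sketch of this callback mechanism (all names are illustrative) shows the per-port ready flags and the rule that a VBB may only compute once all successors have consumed its outputs:

#include <cstddef>
#include <vector>

// Per-VBB synchronization state: one ready flag per output port, set by
// the successor connected to that port via the callback.
class VbbSync {
    std::vector<bool> portReady;
public:
    explicit VbbSync(std::size_t outPorts) : portReady(outPorts, true) {}

    // Callback invoked by a successor VBB: it passes its port index and
    // whether it has consumed the provided input (ready for new values).
    void setReady(std::size_t port, bool ready) { portReady[port] = ready; }

    // A VBB stalls until every successor has consumed the provided input.
    bool mayCompute() const {
        for (bool ready : portReady)
            if (!ready) return false;
        return true;
    }
};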

3.1.4 Configuration Parameters

Different masks per VBB allow for different levels of abstraction and provide the actual S-Functions with the configuration parameters. Figure 6 presents the mask of a MAC VBB including a few configuration options related to the accumulator of the MAC VBB: For example, the internal source of the accumulated value may either be the result of the multiplication or the result of the addition. While the library of the prototype includes only a limited number of masks or abstraction levels, future extensions are simple and fast due to the mask editor of Simulink. Matlab scripts enhance the masks via the callback and the initialization methods of the masks.

Figure 6 A screenshot of the mask of a MAC VBB in Simulink.

3.1.5 Predefined Algorithms

The predefined algorithms are subsystem blocks with a mask that contains an initialization script. This script generates, connects, and configures the underlying VBBs in accordance with the algorithm-specific configuration parameters of the mask. For example, the mask of a matrix multiplication block requires the sizes of the input matrices.


Using the provided sizes, the initialization script computes the dynamic configuration values, such as the base addresses of the individual matrices. The user is able to explore the resulting VBBs and their configurations by opening the generated subsystem. The predefined algorithms not only simplify the simulation and testing of various input dimensions, but also abstract from the configuration options of the underlying VBBs.

3.1.6 Virtual Base Block

The core functionalities of the VBB that are not related to the simulation engine of Simulink, such as the evaluation and generation of loop level information, mathematical operations, and the address generation, were implemented as independent C++ classes to enable reuse. All classes inherit from a base class that defines generic attributes and methods as well as a virtual operation method.
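The class hierarchy could look roughly as follows; this is a sketch of the stated design (a common base class with a virtual operation method), with all identifiers assumed:

#include <vector>

// Common base class of the VBB operation classes: generic attributes and
// methods plus a virtual operation method overridden per operation type.
class VbbOperation {
public:
    virtual ~VbbOperation() = default;
    // Performs the configured operation on the available operands.
    virtual double operate(const std::vector<double>& operands) = 0;
};

// Example operation class: an add/sub unit whose mode is set via a
// configuration parameter (cf. PARAM_OPERATION_ADDSUB in Listing 1).
class AddSub : public VbbOperation {
    bool subtract;
public:
    explicit AddSub(bool sub) : subtract(sub) {}
    double operate(const std::vector<double>& ops) override {
        return subtract ? ops[0] - ops[1] : ops[0] + ops[1];
    }
};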

3.1.6.1 Initialization Phase

The initialization sets the sample time, declares the port types and the number of ports per type, checks the parameters, and allocates memory for the internal states. Many of the initialization values are derived from the configuration parameters, e.g. the number of ports per type. Additionally, the class that corresponds to the VBB is instantiated and provided with the respective configuration parameters. Afterwards, the simulation loop starts: For every time step during the loop phase, the output methods of all VBBs are invoked, followed by all update methods.

3.1.6.2 Update Phase

First of all, the inputs are evaluated and copied to internal states, if these are not blocked by previous operands. Depending on the configuration, the execution of a VBB may consume the operands. In case of CONFIGBUS inputs, the contained reconfiguration values are applied, if the targeted states may be overwritten. Afterwards, the update methods of the VBBs perform the actual operation to compute the next output values and states: If all operands that are required for the configured operation are available, the operands are passed on to the instance of the class that corresponds to the VBB. The operation method of the same instance is invoked and, if the output is enabled by the loop level, the result of the operation is assigned to the next update of the outputs. Finally, the VBB informs its previous VBBs, which provided the inputs, whether it is ready for new inputs via the callback functionality explained above.

3.1.6.3 Output Phase

The output methods of the VBBs assign the updates that were computed during the update phase of the previous time step to the respective states and outputs. For example, the flags indicating whether subsequent VBBs are ready for new inputs are not updated directly via the callback functionality. Instead, the updated flags are stored as copies before their assignment during the output phase. This ensures a consistent behavior that is independent of the execution order of the VBBs.

3.2 Configuration Workflow

The IP-XACT models of the DFA hardware architecture were defined outside the scope of the configuration workflow. However, the IP-XACT XML schemas were imported into the EMF, and the respective Java code to load, store, and manipulate IP-XACT models, including a simple editor, was generated. Therefore, those IP-XACT models can directly be passed to the code generator.

3.3 Simulink XML

The Simulink model description in XML can either be obtained by decompressing the Simulink model file or by exporting the Simulink model to an XML file via the save_system command. The latter may not be available in all versions of Simulink.

The XML file of a Simulink model is structured as follows: Apart from additional model information, such as the editor and simulation settings, the description of the model itself is encompassed by the <System> tag. It comprises <Block> and <Line> tags, which describe the Simulink blocks and their connections. The attributes of the <Block> tag define the name and the type of the block. The configuration parameters of the mask of the block are within <P> tags encompassed by <Object> and <Array> tags. Connections are realized via references to blocks and ports within the <Line> tags.

3.4 Data Flow Meta-Model

Naturally, the description of a Simulink model is well suited to describe a data flow. Hence, the data flow meta-model of the configuration workflow is quite similar to the structure within the <System> tag of a Simulink model. The main difference is the introduction of specific block types corresponding to the VBBs. The respective Java code to directly load, store, and manipulate the data flow models as well as a simple editor was generated via the EMF.

3.5 Simulink Model Importer

The Simulink model importer was implemented in Java and traverses a Simulink XML file using XPath. In parallel, the importer creates and fills a new data flow model. The information extracted from the Simulink model comprises the names, the types, the parameters, and the connections of the VBBs and memory blocks.

3.6 Mapping Meta-Model

The mapping meta-model was implemented using the EMF and its code generation options, too. Each entry of a mapping model references exactly one VBB and one HWBB. Supplementary information specifies the execution unit and the register set of the HWBB.
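Conceptually, a mapping entry therefore carries two references plus the resource-sharing selection. A hypothetical C++ rendering (the actual meta-model is an Ecore model with generated Java classes) could be:

#include <string>

// One entry of a mapping model: it references exactly one VBB and one
// HWBB and selects the execution unit (split) and register set (set).
struct MappingEntry {
    std::string vbbId;   // identifier of the virtual base block
    std::string hwbbId;  // identifier of the IP-XACT component instance
    int split;           // execution unit of the HWBB
    int set;             // register set of the HWBB
};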

3.7 Code Generation

The code generator shown in Figure 7 comprises the GUI, the generator, and the utility classes implemented in Java as well as the configuration and test bench templates, the base block templates, and the architecture templates implemented in Xpand.


The GUI collects the required models: data flow model, hardware model, and mapping model. Furthermore, the GUI passes the models on to the generator and invokes it.

Figure 7 A visualization of the components of the code generator, where the arrows indicate the instantiation or usage of another component: the GUI invokes the generator, which uses the configuration and test bench templates, the base block configuration templates, the architecture templates, and the utility classes.

The generator starts the code generation of Xpand using the configuration and test bench templates as well as the architecture templates. Each target, currently VHDL and SystemC, requires an explicit configuration or test bench template as well as an architecture template. Of course, the architecture templates are only required in case the architecture is not already described in the targeted SystemC or VHDL model. If the architecture is already described in VHDL or SystemC, the identifiers of the HWBBs in SystemC and VHDL must comply with the identifiers of the HWBBs in IP-XACT.

3.7.1 Templates

The configuration and test bench templates contain static initializations, e.g. providing test data, and invoke the generic base block configuration templates by traversing all entries of the mapping model. There is one base block configuration template for each HWBB type. The base block configuration templates require the data flow model, the hardware model, and the mapping model as parameters. In a base block configuration template, the respective configuration parameters of the VBB are extracted from the data flow model and used to generate address-data tuples that configure a HWBB. Listing 1 shows the template for an address-data tuple, where ms_system refers to the data flow model, block to the VBB, and getParam to a method of the utility classes. The latter returns the configuration value of a specified configuration parameter of the VBB.

A more complex example is the specification of the target address of a HWBB: The subsequent VBBs of a VBB are found in the data flow model, their HWBB counterparts are identified in the mapping model, and the respective addresses of the HWBBs are derived from the hardware model. As queries that require the parallel traversal of multiple models are difficult or even impossible to realize directly in the Xpand templates, the utility classes extend the templates.

i_BB_Ex_fAddSub[<<split>>][<<bb>>][regId + BB_EX_ADDSUB__REG_ID__OPERATION] =
    <<getParam(block, "PARAM_OPERATION_ADDSUB", ms_system)>>;

Listing 1 An excerpt of the generic configuration template for a HWBB. The names of the constants were adapted to enable a proper layout.

3.7.2 Utility Classes

The utility classes implement several methods, which may be seen as queries. These queries often traverse the same parts of a model multiple times. In order to reduce the runtime, the utility classes set up data structures holding the respective model information, which is gained during the first traversal. For example, the getParam query must traverse all configuration parameters of a VBB to find and return a single configuration value. By creating a hash map that is initially filled with all parameters, the parameters of the VBB need to be traversed only once. The utility classes for the base block configurations and the architecture are independent of each other.
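The caching idea behind getParam is sketched below in C++ for consistency with the other sketches (the actual utility classes are implemented in Java, and all names here are assumptions): the parameters are read once into a hash map, so subsequent queries become constant-time lookups instead of model traversals.

#include <string>
#include <unordered_map>

// Caches the configuration parameters of one VBB after the first traversal.
class ParamCache {
    std::unordered_map<std::string, std::string> params;
public:
    // Filled once while traversing the parameters of the VBB.
    void put(const std::string& paramType, const std::string& value) {
        params[paramType] = value;
    }

    // Subsequent getParam-style queries are O(1) hash map lookups.
    std::string getParam(const std::string& paramType) const {
        auto it = params.find(paramType);
        return it == params.end() ? std::string() : it->second;
    }
};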

4 Results

In order to prepare for future work on the modeling framework and the configuration workflow, the following subsection briefly reviews the utilized frameworks and tools before the proof of concept is presented.

4.1 Benefits and Limitations of the Chosen Frameworks and Tools

The use of Simulink, Eclipse, EMF, and Xpand enabled the development of the prototypes for the modeling framework and the configuration workflow in a reasonable amount of time. Besides the initially elaborated reasons for the use of these tools and frameworks, additional benefits and limitations became apparent.

4.1.1 Simulink

When extending Simulink in the foreseen ways, the functionality and the tools of Simulink, such as the simulation data inspector, remain applicable. Hence, developers who are familiar with Simulink only need to understand the functionality of the VBBs and are not required to learn a new tool. The simulation data inspector is well suited to analyze the simulation behavior of a modeled algorithm. As the VBBs are S-Functions, they can be combined with other Simulink blocks.

Apart from these benefits, several limitations became apparent during the implementation of the prototype:


• the block masks offer only limited customization and scripting options,

• all methods of a C(++) S-Function are static,

• bus objects have to be created in advance and made available to any model,

• the data types are insufficient, and

• there is no XML schema available for Simulink models.

These limitations are not a criticism, but the result of using Simulink for a (simplified) simulation of digital hardware, which Simulink was not designed for. However, most of the limitations were overcome or can be overcome in future work.

4.1.2 Eclipse, EMF, and Xpand

Creating meta-models and generating the Java code to manipulate respective models was straightforward due to the use of Eclipse and EMF. The same held true for the import of the IP-XACT XML schema and the parallel evaluation of an arbitrary number of models using Xpand. However, the documentation was not as extensive and clear as the documentation of Simulink.

4.2 Proof of Concept

In order to validate the proposed modeling framework and the configuration workflow, synthetic tests as well as a full run-through were performed on the basis of the implemented prototypes. The test and development environment is listed in Table 2.

Software               Version
Operating system       Windows 10, 64 bit
Matlab, Simulink       R2016b
C(++) MEX compiler     MinGW GCC 4.9.2 from TDM
Java                   1.8 (JRE)
Eclipse                Neon.3 Release (4.6.3)
EMF                    2.12.0.v20160526-0356
Xpand/Xtend            2.2.0.v201605260315

Table 2 The software versions that were used to implement and test the prototypes of the modeling framework and the configuration workflow.

4.2.1 Synthetic Tests

Various synthetic tests were performed throughout and after the development of the modeling framework to validate the simulation behavior. In particular, the callback functionality as well as the exploitation of the update and the output phase to separate the computation of new outputs and states from their actual assignment are critical features of the modeling framework. These features must not only work for one VBB, but also in the case of several communicating VBBs. In order to test these features, multiple models were set up as test benches and run with different execution orders of the VBBs.

Figure 8 A model used to examine the influence of the block execution order on the behavior of the VBBs: four load VBBs (BB_Load_0_0 to BB_Load_0_3) feed four add/sub VBBs (BB_Addsub_1_0 to BB_Addsub_1_3) and a test bench block; the assigned execution order is annotated at each block.

Figure 8 illustrates a Simulink model that was used to exercise stress tests regarding the behavior of the VBBs: Two data paths with different pipeline lengths end in a single VBB. The model was simulated with varying execution orders specified via the block properties. The assigned order can be seen at the top right of the blocks. In addition to the debug output, the outputs of the individual blocks were logged via the data inspector of Simulink. All simulations yielded the exact same outputs, confirming that the behavior of the VBBs is insensitive to the block execution order.

Additionally, models covering specific scenarios like two synchronized sub-algorithms and models testing individual base block features like the computation of complex memory access patterns were simulated. The results were checked against the expected behavior of the DFA. All synthetic tests were successful, showing that the models created via the proposed modeling framework behave on a functional level as the DFA does.

4.2.2 Modeled Algorithms

Algorithms ranging from a simple matrix multiplication to complex Bosch-internal algorithms with series relevance, such as the Gaussian process [23], were successfully modeled and simulated. Besides, these algorithms were added as library elements with respective masks and Matlab scripts, which create, connect, and configure the underlying VBBs. An example of a matrix multiplication model can be seen in Figure 9: The model computes A×B, where A and B are matrices. A memory block holds the input matrices and the expected result matrix. The load and store VBBs are connected to the memory block to read the input matrices and write the result matrix. The first load VBB reads the input matrix B column-wise, while the second load VBB reads the input matrix A row-wise. The elements of the columns and the rows are sent to the subsequent MAC VBB. The MAC VBB multiplies the incoming values and accumulates the results. The attached loop level information indicates the end of a column or row. After each row or column, the MAC VBB outputs the accumulated value to the store VBB and resets it to zero.


Figure 9 A matrix multiplication subsystem that was generated via a masked library element: a test bench memory block for an (n×m)·(n×l) matrix multiplication is connected to two load VBBs (BB_Load_0_0, BB_Load_0_1), a MAC VBB (BB_Mac_0_0), and a store VBB (BB_Store_0_0).

The values sent by the second load VBB as well as the valid flags of the outputs of the MAC VBB are depicted in Figure 10.

Figure 10 The output of the simulation data inspector of Simulink for a load and a MAC VBB of the model presented in Figure 9: the signals sub_matrix_mult/BB_Load_0_1:1.value and sub_matrix_mult/BB_Mac_0_0:1.valid are plotted over the simulation time.

After one initialization cycle or time step, the load VBB outputs a new value at each cycle. After a row, which in the given example contains the four elements [1,2,3,4], was sent by the load VBB to the MAC VBB, the MAC VBB outputs the accumulated value with the valid flag set to true.
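As a plain C++ reference for the data flow just described (this computes the same result as the modeled subsystem, but is not the DFA implementation itself):

#include <cstddef>
#include <vector>

// Reference matrix multiplication R = A x B with row-major storage:
// for each result element, a MAC-style accumulator sums one row of A
// against one column of B and is reset after the element is emitted.
std::vector<double> matMul(const std::vector<double>& A,  // n x m
                           const std::vector<double>& B,  // m x l
                           int n, int m, int l) {
    std::vector<double> R(static_cast<std::size_t>(n) * l, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < l; ++j) {
            double acc = 0.0;                 // accumulator of the MAC VBB
            for (int k = 0; k < m; ++k)       // row of A against column of B
                acc += A[i * m + k] * B[k * l + j];
            R[i * l + j] = acc;               // output value; acc resets to zero
        }
    return R;
}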

4.2.3 Full Run-Through

The algorithms mentioned above were not only successfully modeled and simulated in Simulink, but also used to perform full run-throughs of the configuration workflow. Complex algorithms are split into sub-algorithms. The presented artefacts of this section stem from an algorithm that consists of two sub-algorithms (a reference computation is sketched after the list):

1. a vector normalization y[i] = a[i] · x[i] + b[i] and

2. a mean squared error mse = ∑_{i=1}^{N} c[i] · (y[i] − d)².
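A minimal C++ reference for these two sub-algorithms (again a sketch, independent of the generated DFA configuration):

#include <cstddef>
#include <vector>

// 1. vector normalization: y[i] = a[i] * x[i] + b[i]
std::vector<double> normalize(const std::vector<double>& a,
                              const std::vector<double>& x,
                              const std::vector<double>& b) {
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = a[i] * x[i] + b[i];
    return y;
}

// 2. weighted mean squared error: mse = sum over i of c[i] * (y[i] - d)^2
double meanSquaredError(const std::vector<double>& c,
                        const std::vector<double>& y, double d) {
    double mse = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i)
        mse += c[i] * (y[i] - d) * (y[i] - d);
    return mse;
}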

4.2.3.1 Data Flow Models

Comparing the data flow models to the original Simulink models showed that all VBBs, connections, and parameters were extracted. Listing 2 presents an exemplary excerpt of an imported Simulink model and comprises a connection realized via the <line> tag, a VBB realized via the <msblock> tag of the type MS_BB_AddSub, its input and output ports, and a parameter PARAM_OPERATION_ADDSUB configuring the VBB to perform a subtraction instead of an addition. The presented parameter matches the earlier example of a code generation template in Listing 1.

<ms_system>
  <line sourcePort="//@ms_system/@ms_block.13/@outport.4"
        targetPort="//@ms_system/@ms_block.5/@inport.0"
        sourceBlock="//@ms_system/@ms_block.13"
        targetBlock="//@ms_system/@ms_block.5"/>
  <ms_block xsi:type="dFAModelSimulink:MS_BB_AddSub" name="BB_Addsub_1_0">
    <outport id="1"/>
    <inport id="1"/>
    <inport id="2"/>
    <param value="1" paramType="PARAM_OPERATION_ADDSUB"/>

Listing 2 An excerpt of the imported Simulink model of an algorithm computing a vector normalization and the mean squared error.

4.2.3.2 Mapping Models

The VBBs of the imported data flow models were mapped onto the HWBBs of an IP-XACT model describing the quantities, the types, and the features of the HWBBs as well as the interconnect of a DFA hardware accelerator. Figure 11 shows an example of such a mapping.

Figure 11 A mapping file opened in the editor, which was generated via the EMF.

The mapping model is opened in the tree editor, which lists several mapping entries. The selected entry contains a reference to an IP-XACT component instance, which represents a load HWBB, and a reference to a load VBB. The entry also defines the execution unit and register set of the load HWBB via the properties set and split. If multiple sets are available, mapping data flow operations onto the same split but different sets results in the execution unit being shared.

4.2.3.3 Code generation

Finally, the code generator was provided with the IP-XACT-based hardware architecture models, the data flow models, and the mapping models. The resulting SystemC and VHDL test benches were successfully compiled and executed/simulated.


With respect to the excerpts of a template in Listing 1 and of a data flow model in Listing 2, Listing 3 presents the generated configuration of a HWBB in the SystemC model: A HWBB is accessed via an array of pointers that uses the splits, sets, and registers as indices. The configuration value REG_MASK_OPERATION_SEL_ADDSUB_SUB is written to the operation register BB_EX_ADDSUB__REG_ID__OPERATION of the first set of the first split of the HWBB BB_Ex_fAddSub. Hence, the HWBB BB_Ex_fAddSub subtracts its incoming operands.

i_BB_Ex_fAddSub[BB_SUBSPLIT_0][BB_SET_0][regId + BB_EX_ADDSUB__REG_ID__OPERATION] =
    REG_MASK_OPERATION_SEL_ADDSUB_SUB;

Listing 3 An excerpt of the generated SystemC code to configure a HWBB of a DFA hardware accelerator to perform a subtraction. The names of the constants were adapted to enable a proper layout.

5 Conclusion and Discussion

The DFA is a CGRA that features runtime-reconfigurable and heterogeneous base blocks. These base blocks are connected via a sparse crossbar and offer complex configuration options. Besides, the DFA targets the next generation of ECUs. Manual configurations, a compiler for a high level language, such as C(++), a specific ISA, and a model-based approach were considered to transfer algorithms onto the DFA. The application in the automotive field demands the acceleration of selected algorithms and a high performance per size ratio instead of the support of arbitrary algorithms. Hence, the advantages of an interactive optimization and the high accessibility of a model-based approach outweighed the disadvantages, especially the required effort per algorithm.

This paper proposed a modeling framework based on Simulink and the DFA to model and simulate algorithms. The pivotal features of the modeling framework are:

• a library of VBBs and algorithms that offer different levels of abstraction, allowing both DFA and non-DFA experts to model new algorithms, and

• a simulation behavior that matches the behavior of the DFA on the functional level.

The modeling framework was validated by the successful modeling and simulation of several algorithms ranging from simple mathematical operations, such as matrix multiplications, to complex Bosch-internal algorithms with series relevance. This paper also proposed a configuration workflow to transfer a modeled algorithm onto actual DFA hardware. The configuration workflow relies on three types of models: data flow models, hardware models, and mapping models. Eclipse and EMF were used to generate Java code, including a simple editor, to load, store, and manipulate these models. As Xpand handles multiple model files per template well, Xpand templates and respective utility classes were implemented to generate the configuration and test benches in VHDL as well as SystemC. The configuration workflow was successfully validated based on the same range of algorithms as the modeling framework. These algorithms were imported into the data flow models and mapped onto the HWBBs. The data flow models, mapping models, and hardware models were passed on to the code generator. The resulting configurations and test benches in SystemC and VHDL were successfully compiled and executed.

All in all, the concepts of the modeling framework and the configuration workflow were successfully realized as prototypes. These prototypes proved the proposed model-based approach viable. Naturally, prototypes are by no means complete implementations: So far, the modeling framework supports only two hierarchical layers. Thus, an algorithm may not consist of sub-algorithms, but only of VBBs. Furthermore, the generated editors of the configuration workflow are simple tree editors. The Eclipse Graphical Modeling Project promises an easy development of graphical editors based on the EMF and the Graphical Editing Framework. Currently, the mapping must be performed manually. Automating the mapping would allow for a completely automatic configuration generation, apart from the creation of the model of the algorithm via the modeling framework. These open issues and opportunities as well as a model-based optimization framework for designing different DFA hardware accelerator architectures are future work.

6 References

[1] R. Tessier, K. Pocek, and A. DeHon. Reconfigurable computing architectures. Proceedings of the IEEE, 103(3):332–354, March 2015. ISSN 0018-9219. doi: 10.1109/JPROC.2014.2386883.

[2] Bingfeng Mei, S. Vernalde, D. Verkest, H. De Man, and R. Lauwereins. DRESC: a retargetable compiler for coarse-grained reconfigurable architectures. In 2002 IEEE International Conference on Field-Programmable Technology (FPT) Proceedings, pages 166–173, Dec 2002. doi: 10.1109/FPT.2002.1188678.

[3] Jong-eun Lee, Kiyoung Choi, and Nikil D. Dutt. Compilation approach for coarse-grained reconfigurable architectures. IEEE Design & Test of Computers, 20(1):26–33, 2003. doi: 10.1109/MDT.2003.1173050.

[4] Liang Chen and Tulika Mitra. Graph minor approach for application mapping on CGRAs. ACM Trans. Reconfigurable Technol. Syst., 7(3):21:1–21:25, September 2014. ISSN 1936-7406. doi: 10.1145/2655242.

[5] Mahdi Hamzeh, Aviral Shrivastava, and Sarma Vrudhula. REGIMap: Register-aware application mapping on coarse-grained reconfigurable architectures (CGRAs). In Proceedings of the 50th Annual Design Automation Conference, DAC ’13, pages 18:1–18:10, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2071-9. doi: 10.1145/2463209.2488756.

[6] Yongjoo Kim, Jongeun Lee, Aviral Shrivastava, Jonghee Yoon, and Yunheung Paek. Memory-aware application mapping on coarse-grained reconfigurable arrays. In Yale N. Patt, Pierfrancesco Foglia, Evelyn Duesterwald, Paolo Faraboschi, and Xavier Martorell, editors, High Performance Embedded Architectures and Compilers, pages 171–185, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg. ISBN 978-3-642-11515-8.

[7] Jonghee W. Yoon, Aviral Shrivastava, Sanghyun Park, Minwook Ahn, Reiley Jeyapaul, and Yunheung Paek. SPKM: A novel graph drawing based algorithm for application mapping onto coarse-grained reconfigurable architectures. In Proceedings of the 2008 Asia and South Pacific Design Automation Conference, ASP-DAC ’08, pages 776–782, Los Alamitos, CA, USA, 2008. IEEE Computer Society Press. ISBN 978-1-4244-1922-7.

[8] Michael R. Garey and David S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1990. ISBN 0716710455.

[9] Tony Nowatzki, Vinay Gangadhar, Newsha Ardalani, and Karthikeyan Sankaralingam. Stream-dataflow acceleration. In Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA ’17, pages 416–429, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4892-8. doi: 10.1145/3079856.3080255.

[10] Hartej Singh, Guangming Lu, Eliseu Filho, Rafael Maestre, Ming-Hau Lee, Fadi Kurdahi, and Nader Bagherzadeh. MorphoSys: Case study of a reconfigurable computing system targeting multimedia applications. In Proceedings of the 37th Annual Design Automation Conference, DAC ’00, pages 573–578, New York, NY, USA, 2000. ACM. ISBN 1-58113-187-9. doi: 10.1145/337292.337583.

[11] J. R. Gurd, C. C. Kirkham, and I. Watson. The Manchester prototype dataflow computer. Commun. ACM, 28(1):34–52, January 1985. ISSN 0001-0782. doi: 10.1145/2465.2468.

[12] Florian Bock, Daniel Homm, Sebastian Siegl, and Reinhard German. A taxonomy for tools, processes and languages in automotive software engineering. CoRR, abs/1601.03528, 2015. doi: 10.5121/csit.2016.60121.

[13] MathWorks. Simulation phases in dynamic systems, 2019. URL https://www.mathworks.com/help/simulink/ug/simulating-dynamic-systems.html. [Online; accessed 09-January-2019].

[14] IEEE Standard for Standard SystemC Language Reference Manual. IEEE Std 1666-2011 (Revision of IEEE Std 1666-2005), pages 1–638, Jan 2012. doi: 10.1109/IEEESTD.2012.6134619.

[15] IEEE Standard VHDL Language Reference Manual. IEEE Std 1076-2008 (Revision of IEEE Std 1076-2002), pages c1–626, Jan 2009. doi: 10.1109/IEEESTD.2009.4772740.

[16] MathWorks. How S-functions work, 2019. URL https://www.mathworks.com/help/simulink/sfg/how-s-functions-work.html. [Online; accessed 09-January-2019].

[17] Eclipse. Eclipse, 2019. URL https://eclipse.org/. [Online; accessed 09-January-2019].

[18] StackOverflow. Developer survey results 2018, 2018. URL https://insights.stackoverflow.com/survey/2018. [Online; accessed 09-January-2019].

[19] Eclipse. Eclipse Modeling Framework (EMF), 2019. URL https://eclipse.org/modeling/emf/. [Online; accessed 09-January-2019].

[20] Eugene Syriani, Lechanceux Luhunu, and Houari Sahraoui. Systematic mapping study of template-based code generation. Computer Languages, Systems & Structures, 52:43–62, 2018. ISSN 1477-8424. doi: 10.1016/j.cl.2017.11.003.

[21] Benjamin Klatt. Xpand: A closer look at the model2text transformation language. Technical report, Chair for Software Design and Quality (SDQ), Institute for Program Structures and Data Organization (IPD), University of Karlsruhe, Germany, 2007.

[22] V. Berman. Standards: The P1685 IP-XACT IP metadata standard. IEEE Design & Test of Computers, 23(4):316–317, April 2006. ISSN 0740-7475. doi: 10.1109/MDT.2006.104.

[23] K. Röpke and C. Gühmann (editors), in cooperation with 65 co-authors. Design of Experiments (DoE) in Powertrain Development, chapter Data-based Models on the ECU by R. Diener et al., pages 227–241. expert verlag GmbH, 2015. ISBN 978-3-8169-3316-8.
