Probabilistic Safety Analysis of Executable Models · 2018-07-11 · 1. Introduction and Motivation...

Probabilistic Safety Analysis of Executable Models

Dissertation submitted to the Faculty of Applied Computer Science of the University of Augsburg

for the academic degree of Dr. rer. nat.

by Johannes Severin Leupolz

19.3.2018

First reviewer: Prof. Dr. Wolfgang Reif
Second reviewer: Prof. Dr. Alexander Knapp

Date of the oral examination: 18.6.2018

For my family.

Acknowledgments

At this point, I would like to thank several people who have accompanied and supported me over the last years.

First of all, I thank my doctoral advisor Prof. Dr. Wolfgang Reif, who gave me the opportunity to do research in an exciting environment and supported me with numerous pieces of advice in writing this dissertation. I would also like to thank Prof. Dr. Alexander Knapp, who helped me with my many questions, especially concerning the formal foundations and typography with LaTeX.

I would also like to thank my many colleagues at the chair, whose pleasant atmosphere made my work easier. I especially thank Stefan Bodenmüller, Axel Habermaier, Kuzman Katkalov, Jörg Pfähler, Hella Ponsar, and Gerhard Schellhorn, who always stood by me with advice and assistance, especially when proofreading earlier research contributions and this thesis.

My deepest thanks go to my entire family, who have believed in me my whole life and encouraged me again and again. This applies in particular to my partner Sarah Gräßle, who gave me strength time and again and had my back wherever she could.

Johannes Leupolz


Abstract

Classical software verification focuses on answering the question of whether the implementation of a piece of software conforms to its specification. Verification plays an essential role in safety-critical domains like railway, automotive, aviation, and medical devices. Another crucial aspect in those domains is the analysis of what happens if a specification-conforming system is embedded into a dangerously behaving environment or if parts of the system (e.g., sensors or radio devices) malfunction. Even under such problematic circumstances, the operation of a safety-critical system should not lead to accidents or cause any other form of harm.

Traditional safety techniques like fault tree analysis describe how an upper bound of the hazard probability can be estimated from the probabilities of the component faults, but these traditional safety-analysis techniques have not been designed for software-intensive systems. Because of their complex behavior, such software-intensive systems are hard to analyze. This thesis presents an approach to modeling and analyzing such systems probabilistically using executable modeling languages, i.e., modeling languages that model behavior in an executable way; as a consequence, the approach mitigates problems that arise in the probabilistic analysis of software-intensive systems.

Available programming languages can easily be extended to executable modeling languages. As the first contribution, the executable modeling language S# (pronounced “safety sharp”) was developed in the context of this thesis. S# is an imperative, object-oriented language that supports modeling both probabilistic and nondeterministic behavior; it was specifically designed for the modeling of safety-critical systems. This thesis uses six case studies to demonstrate how to conduct safety analysis using the executable modeling approach and to evaluate its efficiency. One of these case studies, the hemodialysis machine, is used to demonstrate the modeling and analysis capabilities of the S# modeling framework.

The well-known Markov chains and Markov decision processes are useful for model checking-based analyses of systems that encompass probabilistic behavior; however, both kinds of probabilistic systems are less suited for the efficient analysis of executable models. Another contribution of this thesis consists of extensions of those probabilistic systems that are optimized for the efficient analysis of executable models. This thesis provides algorithms to calculate the reachability probability for both extended probabilistic systems and proves their correctness. A simplified probabilistic language serves as a representative of all executable modeling languages and is given a formal semantics using the introduced probabilistic systems.

As the third contribution, algorithms are presented that generate probabilistic systems from executable models by executing them, in contrast to other approaches that use complex model transformations. The introduced algorithms are not restricted to safety analysis. In addition, optimizations are presented that significantly reduce the generation time for safety analysis.

The executable modeling approach enables different kinds of probabilistic safety analyses. As the last contribution, those analyses are applied to the case studies, and the results and the insights they provide are discussed.

Contents

1. Introduction and Motivation
1.1. Outline
1.2. Contributions

2. Overview of the Case Studies
2.1. Radio-based Railroad Crossing
2.2. Height Control System
2.3. Hemodialysis Machine
2.4. Pressure Tank
2.5. Abstract Examples
2.5.1. Dead Reckoning
2.5.2. Degraded Mode
2.6. Summary and related work

3. Conventional Safety Analysis
3.1. System Safety
3.2. Faults, Failures, and Errors
3.3. Probabilities of Fault Occurrences
3.4. Using Mean Time To Failures in Discrete Probability Spaces
3.5. Fault Tree Analysis

4. Safety Analysis of Executable Models
4.1. Executable Models in the Development Lifecycle
4.2. Model of Computation
4.3. The S# modeling framework
4.3.1. Modeling Behavior with S#
4.3.2. Modeling Faults in S#
4.3.3. Analyzing Models with S#
4.4. S# Model of a Hemodialysis Machine
4.4.1. Structure of the Model
4.4.2. Controller Specification using State Machines
4.4.3. Flow Concept
4.4.4. Modeling Fluid Flows
4.4.5. Modeling a Pump
4.4.6. Probabilistic Safety Analysis
4.5. Lustre Model of Pressure Tank
4.6. Discussion and Related Work

5. Probabilistic Systems
5.1. Generic Markov Chains
5.1.1. Paths and Probability Measure
5.1.2. Unbounded Reachability
5.1.3. Bounded Reachability
5.1.4. Generic Markov Chains with Initial Distribution
5.1.5. Comparison to Standard Markov Chains
5.2. Choice-Aware Markov Decision Processes
5.2.1. Paths and Probability Measure
5.2.2. Unbounded Reachability
5.2.3. Bounded Reachability
5.2.4. Choice-Aware Markov decision Process with Initial Choice
5.2.5. Comparison to Standard Markov Decision Processes
5.3. Related Work

6. Model Checking Probabilistic Systems
6.1. Sparse Labeled Markov Chains
6.2. Sparse Labeled Choice-Aware Markov Decision Processes
6.3. Evaluation
6.4. Related work

7. Generating Probabilistic Systems from Executable Models
7.1. Formalization of Executable Models
7.2. State Space Traversal
7.3. Generating Labeled Markov Chains from Executable Models
7.3.1. Labeled Markov Chain Semantics of an Executable Model
7.3.2. Generating the Transition Information of a State
7.3.3. Evaluation
7.4. Generating Labeled Choice-aware Markov Decision Processes from Executable Models
7.4.1. Choice-aware Markov decision process semantics of an executable model
7.4.2. Generating the Transition Information of a State
7.4.3. Evaluation
7.5. Implementing the Past Time Operator Once
7.6. Interfacing Lustre
7.7. Discussion and Related Work

8. Optimizations
8.1. Multi-Core State Traversal
8.2. Static Fault Forwarding
8.3. Early Termination

9. Analyzing the Impact of Faults using Executable Models
9.1. Quantitative Evaluation based on the Deductive Cause Consequence Analysis
9.1.1. Height Control System
9.1.2. Hemodialysis Machine
9.2. Quantitative Evaluation using Probabilistic Model Checking
9.2.1. Height Control System
9.2.2. Hemodialysis Machine
9.3. Evaluation of the Impact of Single Fault Probabilities
9.3.1. Degraded Mode
9.3.2. Height Control System
9.3.3. Hemodialysis Machine
9.4. Evaluation of Design Variants using Probabilistic Systems
9.5. Evaluation using Conditional Probabilities and Bayesian networks
9.5.1. Dead Reckoning
9.5.2. Radio-based Railroad Crossing

10. Conclusions and Outlook
10.1. Results
10.2. Outlook

A. Architecture of S#

B. Mathematical Preliminaries
B.1. Probability Theory
B.1.1. Conditional Probability
B.1.2. Geometric Distribution
B.1.3. Exponential Distribution
B.1.4. Product Experiment
B.1.5. Bayesian Networks
B.2. Order Theory
B.2.1. Partial Order
B.2.2. Complete Partially Ordered Set
B.2.3. Function Spaces

C. Bayesian Network of the Radio-based railroad crossing

1. Introduction and Motivation

Safety-critical systems are systems that might have a negative impact on the environment, cause high follow-up costs, or even have catastrophic consequences for human life when they malfunction. Unfortunately, there are examples of past accidents involving such systems even though they had been rigorously analyzed for their safety. The reasons for such accidents may be overlooked causes like software bugs, unexpected behavior, sensor faults, or a combination thereof. The most prominent examples of systems where software bugs led to accidents are the Therac-25 radiation therapy machine and the Ariane 5 rocket. Between 1985 and 1987, software bugs in the Therac-25 radiation therapy machine led to at least six accidents in which patients were exposed to too much radiation [Lev95]. In the second example, on the first test flight of the Ariane 5 in 1996, a software bug in the control software led to the self-destruction of the rocket [LL96]. But there are also examples where sensor faults or unexpected behavior led to accidents; e.g., in 2016, the Mars lander Schiaparelli was lost because an attitude-estimation error during the landing triggered a premature release of the parachute [Vin16].

One goal of rigorous safety analysis is to detect the weaknesses of a system in order to develop appropriate countermeasures against them. Another goal is to estimate the safety of a system quantitatively in order to assess whether the system is safe enough for its use in the field. Today, software is an important innovation factor, and safety-critical systems are becoming increasingly software-intensive. Software-intensive systems are hard to analyze because of their complex behavior. Traditional safety-analysis techniques like fault tree analysis have not been designed for such software-intensive systems. Therefore, new safety-analysis techniques are being researched to analyze such systems more effectively.

This thesis encourages the use of executable models for safety analysis. An executable model of a system of interest can be regarded as a program whose code describes the abstract behavior of

• its controllers,
• its sensors,
• its actuators,
• relevant parts of its physical environment, and
• faults that may occur in the context of the system.
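To make this list concrete, the following minimal sketch is written in plain Python rather than S#. The tank, sensor, pump, and controller classes and the `stuck_fault` flag are hypothetical and do not reproduce any case study of this thesis. For simplicity, the fault activation is passed in as a fixed parameter; an analysis tool would instead treat the activation of a fault as a probabilistic or nondeterministic choice.

```python
from dataclasses import dataclass


@dataclass
class Tank:
    """Physical environment: a tank that is filled by a pump (hypothetical)."""
    level: int = 0
    capacity: int = 10


@dataclass
class Sensor:
    """Level sensor; `stuck_fault` is a hypothetical fault that suppresses detection."""
    threshold: int = 8
    stuck_fault: bool = False  # fault activation, fixed for a whole run in this sketch

    def reports_full(self, tank: Tank) -> bool:
        if self.stuck_fault:
            return False  # fault effect: the sensor never reports "full"
        return tank.level >= self.threshold


@dataclass
class Pump:
    """Actuator: adds one unit to the tank per step while it is enabled."""
    enabled: bool = True

    def act(self, tank: Tank) -> None:
        if self.enabled:
            tank.level += 1


class Controller:
    """Controller: switches the pump off as soon as the sensor reports "full"."""

    def update(self, sensor: Sensor, tank: Tank, pump: Pump) -> None:
        if sensor.reports_full(tank):
            pump.enabled = False


def run(steps: int, stuck_fault: bool) -> bool:
    """Execute the model for `steps` steps; return True if the tank ever overflows."""
    tank, sensor, pump = Tank(), Sensor(stuck_fault=stuck_fault), Pump()
    controller = Controller()
    for _ in range(steps):
        controller.update(sensor, tank, pump)  # controller reads the sensor
        pump.act(tank)                         # actuator changes the environment
        if tank.level > tank.capacity:         # hazard: the tank overflows
            return True
    return False
```

Executing `run(20, stuck_fault=False)` never reaches the hazard because the pump is switched off at level 8, whereas `run(20, stuck_fault=True)` overflows after eleven steps.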



Because of their executability, there is a strong relationship between executable modeling languages and programming languages. However, in contrast to executable models, real controller programs contain low-level details about their input/output behavior. Executable models are not intended to be run on real controllers. Thus, programs and models differ from each other in the degree of detail and in the scope of elements they contain. Many available programming languages can easily be extended to executable modeling languages.

For this thesis, two executable languages are used to model safety-critical systems, which underlines that the presented approach is not limited to one particular executable modeling language. The data-flow oriented programming language Lustre was turned into a modeling language with only small enhancements. Furthermore, the executable modeling language S# (pronounced “safety sharp”) was developed in the context of this thesis. S# is a domain-specific language (DSL) embedded in C# that supports modeling both probabilistic and nondeterministic behavior, and it is specially tailored for the analysis of safety-critical systems. In addition, S# offers language features for modeling faults and fault effects intuitively. For the probabilistic analysis of safety-critical systems, these powerful features make S# the means of choice in many cases. In the context of this thesis, algorithms and data structures have been developed to enable efficient probabilistic analyses of executable models using model checking. These data structures and algorithms are especially advantageous for safety analysis, where statements about faults are made. This thesis also provides a formal foundation of the approach to support its validity. Case studies from different domains provide evidence that the proposed techniques also work for realistic scenarios. The impact of faults in those case studies is analyzed using the approach of this thesis, and the results are discussed.

The S# repository [ISSE18] contains the reference implementation of all algorithms, the S# modeling language, the interface to the Lustre language, models of all case studies, the detailed results of the evaluations presented in this thesis, the scripts that have been used for the evaluations, and a supplementary document with proofs that are rather technical. All evaluations have been conducted on a 4-core desktop CPU with 16 GB of RAM.

1.1. Outline

This section gives an overview of the thesis. The discussion of related work is not presented en bloc but split into several parts located in the relevant chapters.

Three larger case studies with applications in the railway, traffic control, and medical sectors are presented in chapter 2. Each of those case studies has different characteristics that challenge the model checking in different ways. A pressure tank consisting of a couple of electronic components, which is the classic example in fault tree analysis, is introduced as the fourth case study. Moreover, two smaller examples are introduced that point out problems that might arise in the probabilistic analysis of systems with dynamic behavior. The following chapters use the introduced case studies to demonstrate the approach and to evaluate the efficiency of the implementation and its algorithms.

Chapter 3 gives a brief overview of the field of safety analysis in industry. It establishes the terminology used in the remainder of this thesis by defining terms like fault, failure, and error. The chapter also explains the importance of including the environment when analyzing safety-critical systems and gives a brief overview of how fault tree analysis is used in industry to conduct quantitative safety analysis. All in all, this chapter should provide enough background for readers without prior knowledge of safety analysis.

Chapter 4 provides a practice-oriented introduction to how executable modeling languages can be used for safety analysis. To this end, it briefly describes how the analysis of such executable models can be integrated into the development life-cycle of a safety-critical system. Also, a fitting model of computation is described. The S# modeling language is first presented on an abstract example to show how to model components, faults, fault effects, probabilistic choice, and nondeterministic choice. Then, a model of the hemodialysis machine case study is presented to demonstrate the benefits of the object-oriented features of S# for larger models. It also shows how physical behavior can be modeled and how the behavior of controllers can be modeled using state machines. Thereafter, the model of a hardware pressure tank is presented in the data-flow language Lustre; data-flow oriented languages are inherently parallel and thus well suited for hardware models.

In the literature, various formal underpinnings have been proposed to reason about the probability of specific events in a dynamically behaving system. This thesis summarizes these formal underpinnings under the term probabilistic systems. Chapter 5 introduces two new probabilistic systems, namely labeled Markov chains and labeled choice-aware Markov decision processes. Both probabilistic systems have been designed in this thesis for the efficient probabilistic safety analysis of executable models. Moreover, iteration operators to calculate the probability of reaching certain states (within both a bounded and an unbounded number of steps) are introduced on a formal level. The newly introduced probabilistic systems are then evaluated against classic Markov chains and Markov decision processes in terms of state-space size and number of transitions.

Chapter 6 provides concrete data structures and algorithms that can be used to implement model checking for the probabilistic systems of the previous chapter. The data structures allow compact representations of the state space by not storing zero values. Further, the data is stored in plain arrays that enable efficient traversal by allowing direct index access. Finally, the speed of the algorithms is evaluated on the larger case studies.

The topic of chapter 7 is how to generate probabilistic systems from executable models. So-called formal probabilistic programs are introduced as a simplified abstraction of executable models in any executable language. Those formal probabilistic programs are given both a purely probabilistic semantics and a semantics encompassing both nondeterministic and probabilistic choice. The main part of this chapter introduces algorithms that generate a probabilistic system from an executable model by actually executing the model. The chapter further describes how formulas using the once-operator (an operator that allows talking about the past) are treated during the generation of a probabilistic system. The chapter also briefly describes how the Lustre language can be interfaced with the language-agnostic algorithms of this chapter. Finally, the case studies are used to evaluate the generation time of the algorithms.

Chapter 8 presents three optimizations that can be applied during the model generation. The first optimization introduces multi-core model checking. The second optimization makes use of static optimizations concerning transient faults, which can be made during the model traversal under certain circumstances. This optimization is especially useful for safety analysis. The third optimization leverages the insight that it is often not necessary to generate a complete probabilistic system from an executable model. All optimizations are evaluated on the case studies.

Chapter 9 demonstrates different analysis techniques that can be used to estimate how different faults affect a hazard. The exact hazard probabilities are calculated directly from the executable models using probabilistic model checking. The results are compared with results obtained by applying the rare event approximation of the established fault tree analysis. The chapter also shows how different design variants of a system can be compared using probabilistic model checking. Moreover, a technique based on conditional probabilities is introduced to reveal weak spots of a system. Finally, the chapter shows how a Bayesian network can be used to interpret the interdependencies of different faults.

Chapter 10 concludes the thesis by summarizing its findings and giving an outlook on future work.

1.2. Contributions

This thesis has four main contributions that together greatly improve the accuracy of the probabilistic analysis of safety-critical systems.

Executable modeling language S# for the probabilistic analysis of safety-critical systems. The executable modeling language S# was created specifically for the modeling and probabilistic analysis of safety-critical systems. S# is very expressive and simple to grasp for developers with experience in object-oriented programming. Compared to similar approaches, S# models can be highly modularized as a consequence of adopting C#'s object-oriented concepts. Using Visual Studio, engineers can use the standard C# debugging tools to simulate their models [HLR15]. Formal analysis techniques are integrated into the tool chain and can be used without detailed knowledge of the underlying formal foundations, which makes it easier for engineers to focus on the models (see also [VDW12]). In addition, algorithms have been developed for executable models that can reveal which combinations of unexpected behavior in the environment and which faults on the component level (e.g., failing sensors) could result in a hazard. Furthermore, probability estimates of hazards can be calculated based on the algorithms introduced in this thesis.

Probabilistic systems that enable the efficient model checking of executable models. This thesis introduces labeled Markov chains and labeled choice-aware Markov decision processes. Both have been developed with executable models in mind and allow their efficient analysis. Note that the application of each probabilistic system is not limited to a specific executable language. In the evaluation of the railway case study, the labeled Markov chain representation was even smaller by a factor of 7 than an equivalent standard Markov chain representation. For analyses that contain both nondeterministic and probabilistic choice, the results are even more noteworthy: the introduced choice-aware Markov decision processes are considerably more compact and easier to analyze than their standard Markov decision process counterparts. Under certain circumstances, this compact representation is necessary to make the analysis possible in the first place. Correctness proofs are provided for the algorithms that are used to calculate the probability of reaching certain states in a probabilistic system. Also, data structures are provided that allow an efficient analysis. All algorithms have been evaluated on six case studies.
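As an illustration of reachability calculations on compact, array-based data, the following is a minimal Python sketch. It assumes an ordinary Markov chain stored in the common compressed sparse row (CSR) layout, in which only nonzero transitions are stored and the transitions of a state are found by direct index access; the class and function names are hypothetical, and the labeled Markov chains of this thesis carry additional information that is not modeled here. The unbounded case uses plain value iteration with a heuristic stopping criterion, which is not a guaranteed error bound.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SparseMarkovChain:
    """Markov chain in CSR layout: the outgoing transitions of state s occupy the
    index range row_start[s] .. row_start[s + 1] - 1 of `target` and `probability`;
    zero probabilities are not stored."""
    row_start: List[int]
    target: List[int]
    probability: List[float]

    def num_states(self) -> int:
        return len(self.row_start) - 1


def _iterate(chain: SparseMarkovChain, goal: Set[int], x: List[float]) -> List[float]:
    """One iteration step: x'(s) = sum over t of p(s, t) * x(t); goal states keep 1."""
    new_x = x[:]
    for s in range(chain.num_states()):
        if s not in goal:
            new_x[s] = sum(chain.probability[i] * x[chain.target[i]]
                           for i in range(chain.row_start[s], chain.row_start[s + 1]))
    return new_x


def bounded_reachability(chain, goal, steps):
    """Probability of reaching `goal` within `steps` steps, for every start state."""
    x = [1.0 if s in goal else 0.0 for s in range(chain.num_states())]
    for _ in range(steps):
        x = _iterate(chain, goal, x)
    return x


def unbounded_reachability(chain, goal, epsilon=1e-12, max_iterations=100_000):
    """Value iteration from below; stops when the largest change drops below
    `epsilon` (a heuristic criterion, not a guaranteed error bound)."""
    x = [1.0 if s in goal else 0.0 for s in range(chain.num_states())]
    for _ in range(max_iterations):
        new_x = _iterate(chain, goal, x)
        if max(abs(a - b) for a, b in zip(new_x, x)) < epsilon:
            return new_x
        x = new_x
    return x


# Example: state 0 stays in 0 (0.5), reaches hazard state 1 (0.1) or safe state 2 (0.4).
chain = SparseMarkovChain(row_start=[0, 3, 4, 5],
                          target=[0, 1, 2, 1, 2],
                          probability=[0.5, 0.1, 0.4, 1.0, 1.0])
within_two_steps = bounded_reachability(chain, {1}, 2)[0]  # approximately 0.15
eventually = unbounded_reachability(chain, {1})[0]         # approximately 0.2
```

Storing the targets and probabilities in flat arrays keeps each iteration a single linear pass over the stored transitions, which is the property the compact representation is meant to exploit.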

Generation and analysis of probabilistic systems from executable models. As the third contribution, algorithms are presented that generate a probabilistic system from an executable model by actually executing the model. This stands in contrast to the more common approaches where modeling languages are designed to be relatively close to their formal foundation and thus easier to analyze, or where models are transformed syntactically into an easier-to-analyze modeling language. The combination of two elements makes it possible to analyze larger systems using the approach of this thesis: first, executable models differentiate between micro and macro steps, and second, states only record those variables that are relevant after a macro step; hence, many temporary variables and the program counter can be omitted from the states. Other safety-analysis tools that allow the simulation of their models can also benefit from this approach. When models are simulated, they are actually executed; therefore, such simulation engines can often be upgraded to probabilistic model checkers by adding the algorithms of this thesis. Furthermore, three optimizations are presented that can be applied during the model generation. Applying all three optimizations reduces the generation time significantly.
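One way to picture generation by execution is the following minimal Python sketch; it is not the S# implementation. It assumes that a model's macro step is a deterministic Python function that receives a `choose` callback for its probabilistic choices. The generator re-executes the step once per combination of choice outcomes (depth-first backtracking over the recorded choices) and stores only the state reached after the macro step, not intermediate variables. All names (`ChoiceEnumerator`, `successors`, `generate_markov_chain`, the two-sensor model) are hypothetical, and nondeterministic choice is not handled.

```python
from collections import deque
from fractions import Fraction


class ChoiceEnumerator:
    """Enumerates all outcome combinations of the probabilistic choices that a
    deterministic step function makes, by re-executing it (depth-first)."""

    def __init__(self):
        self.stack = []  # one [chosen index, number of options] entry per choice
        self.depth = 0   # number of choices made in the current execution
        self.probability = Fraction(1)

    def choose(self, options):
        """options: list of (probability, value) pairs; returns one value."""
        if self.depth == len(self.stack):
            self.stack.append([0, len(options)])  # new choice: start with option 0
        index = self.stack[self.depth][0]
        self.depth += 1
        probability, value = options[index]
        self.probability *= probability
        return value

    def prepare_next_execution(self):
        """Advance to the next untried combination; return False when all are done."""
        del self.stack[self.depth:]
        while self.stack:
            self.stack[-1][0] += 1
            if self.stack[-1][0] < self.stack[-1][1]:
                break
            self.stack.pop()  # all options of the last choice tried: backtrack
        self.depth = 0
        self.probability = Fraction(1)
        return bool(self.stack)


def successors(step, state):
    """Execute one macro step per choice combination; merge equal successor states."""
    enumerator = ChoiceEnumerator()
    distribution = {}
    while True:
        target = step(state, enumerator.choose)
        distribution[target] = distribution.get(target, 0) + enumerator.probability
        if not enumerator.prepare_next_execution():
            return distribution


def generate_markov_chain(step, initial_state):
    """Breadth-first traversal; states must be hashable, and only the state after
    each macro step is stored."""
    transitions = {}
    queue = deque([initial_state])
    while queue:
        state = queue.popleft()
        if state in transitions:
            continue
        transitions[state] = successors(step, state)
        queue.extend(t for t in transitions[state] if t not in transitions)
    return transitions


def step(state, choose):
    """Hypothetical model: two sensors, each failing permanently with probability 1/10 per step."""
    failed_a, failed_b = state
    failed_a = failed_a or choose([(Fraction(9, 10), False), (Fraction(1, 10), True)])
    failed_b = failed_b or choose([(Fraction(9, 10), False), (Fraction(1, 10), True)])
    return (failed_a, failed_b)


chain = generate_markov_chain(step, (False, False))
```

From the initial state, the sketch executes the step four times and obtains the successors with probabilities 81/100, 9/100, 9/100, and 1/100; in total, the generated chain has four states.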

Analysis techniques for the impact analysis of faults using executable models. As the fourth contribution, this thesis shows how executable models can be used to assess the impact of faults. Hazard probabilities can be calculated directly from executable models using probabilistic model checking, which delivers a more accurate estimate than traditional approaches. Moreover, different design alternatives of a system can be assessed faster because the object-oriented foundation of S# facilitates the modeling of such alternatives and the probabilistic model checking works fully automatically. It is also possible to measure the impact of the probability of a fault on a hazard probability based on an executable model. A technique based on probabilistic model checking and conditional probabilities can reveal that certain faults have a higher impact on the hazard probability than others. As a last technique, this thesis shows how Bayesian networks that are generated from an executable model can be used to understand the interdependencies between different faults.
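The following minimal Python sketch contrasts an exact hazard probability with the rare event approximation of fault tree analysis and computes one simple conditional measure, P(fault | hazard). It assumes a static toy system with three hypothetical, independent faults A, B, and C, where the hazard occurs if A and B occur together or if C occurs (minimal cut sets {A, B} and {C}). The sketch enumerates all fault combinations instead of model checking an executable model, so it ignores the dynamic behavior that this thesis targets, and the conditional measure shown is only an illustration, not necessarily the exact measure used in the thesis.

```python
from fractions import Fraction
from itertools import product

# Hypothetical fault probabilities; the faults are assumed to be independent.
FAULT_PROBABILITIES = {"A": Fraction(1, 10), "B": Fraction(1, 5), "C": Fraction(1, 100)}
MINIMAL_CUT_SETS = [{"A", "B"}, {"C"}]


def hazard(active):
    """Hypothetical static system: hazard if A and B occur together, or if C occurs."""
    return (active["A"] and active["B"]) or active["C"]


def fault_combinations(fault_probabilities):
    """Yield every fault activation together with its probability (independent faults)."""
    names = list(fault_probabilities)
    for values in product([False, True], repeat=len(names)):
        active = dict(zip(names, values))
        probability = Fraction(1)
        for name in names:
            p = fault_probabilities[name]
            probability *= p if active[name] else 1 - p
        yield active, probability


def exact_hazard_probability(fault_probabilities):
    return sum(p for active, p in fault_combinations(fault_probabilities) if hazard(active))


def rare_event_approximation(cut_sets, fault_probabilities):
    """Sum over the minimal cut sets of the product of their fault probabilities."""
    total = Fraction(0)
    for cut_set in cut_sets:
        term = Fraction(1)
        for name in cut_set:
            term *= fault_probabilities[name]
        total += term
    return total


def conditional_fault_probability(fault, fault_probabilities):
    """P(fault | hazard) = P(fault and hazard) / P(hazard)."""
    joint = sum(p for active, p in fault_combinations(fault_probabilities)
                if hazard(active) and active[fault])
    return joint / exact_hazard_probability(fault_probabilities)


exact = exact_hazard_probability(FAULT_PROBABILITIES)                            # 149/5000
approximation = rare_event_approximation(MINIMAL_CUT_SETS, FAULT_PROBABILITIES)  # 3/100
```

In this toy setting, the rare event approximation (3/100) slightly overestimates the exact value (149/5000 = 0.0298); because it sums the cut-set probabilities, it cannot underestimate a hazard that is exactly the union of its cut sets. Conditioning on the hazard yields P(A | hazard) = 104/149 and P(C | hazard) = 50/149, so A is involved in most hazardous combinations even though C alone suffices to cause the hazard.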


Jack Godell: I know the vibration was not normal

– The China Syndrome (1979)

2. Overview of the Case Studies

In this chapter, case studies from different disciplines are introduced, namely from the aerospace, medical, and transportation sectors. It is well known that errors in these disciplines might have high follow-up costs or even a severe impact on the physical health of people. Applying rigorous safety analysis methods in those disciplines provides more insight into the safety of such a system. This insight might support the assessment of whether a system is safe enough for its use. Alternatively, this insight could be used to propose changes to the system that further decrease its risk. As part of this work, each case study has been modeled and analyzed probabilistically using an executable modeling language. Each of these case studies poses different challenges for modeling and for automatic analysis using model checking.

Three of these case studies served as running examples for the modeling and analysis using executable models in previous publications [LHR16; LHR18; LK+17; HLR16; HLR15; HK+16]. The S# repository [ISSE18] contains complete executable models of all case studies described here.

2.1. Radio-based Railroad Crossing

In the railway sector, much technical equipment is located trackside, e.g., electronic transponders called balises, which are placed on the track and send passing trains their position, and other devices that register passing trains. At intersections of roads and railway lines, active protection systems that lower barriers when a train approaches, the so-called level crossings, are often installed. Such level crossings traditionally require a lot of trackside equipment. The radio-based railroad crossing replaces much of this trackside equipment with onboard computations of the train position and radio-based communication between the train and the crossing, yielding a more cost-efficient and robust system [ORS05]. In such a system, the control computer of the train is always aware of its position and has a database of all level crossings on its track, and both the train and the crossing have radio communication modules.

When the train approaches a crossing, the following sequence is executed (see figure 2.1):


it is therefore not possible to analyze each system part separately: the fault occurrence in one part influences the behavior in the other part, e.g., if the acknowledgment message of the crossing is delayed due to a BarrierSensorFailure, then the controller of the train needs to retrieve the position of the train more often. Probabilistic safety analysis can only reveal such correlations when the system is inspected as a whole. Another challenge for the probabilistic safety analysis is that parts of the necessary statistical data might not be at hand, e.g., the probability of a lost message. A probabilistic analysis should then still provide as much insight as possible. There are also some things to consider from the technical viewpoint: the model of the case study must be large enough to provide valuable information but at the same time small enough to allow a rigorous and automatic analysis. Moreover, the variables of such a model span a relatively large state space of which only a small part is actually reachable, and most states of the case study have only a low number of successor states.

2.2. Height Control System

Figure 2.2.: Height Control System

The second case study addresses a height control system for road traffic management described by Ortmeier et al. in [OR+03]. The (New) Elbe Tunnel is a tunnel below the river Elbe in Hamburg and is part of the highway A7 in Germany. In 2013, around 123,000 vehicles passed the Elbe Tunnel on an average weekday, 16% of which were commercial vehicles [Fre15]. Originally, the Elbe Tunnel consisted of 3 tubes with 2 lanes each. In 2002, the Elbe Tunnel was extended by a fourth tube. This new tube is higher than the old tubes and allows over-high vehicles (OHV, vehicles with a height over 4 meters) to pass the tunnel, which was not possible with the old tubes. High vehicles (HV, vehicles of at most 4 meters but higher than passenger cars) and passenger cars are able to use any tube. It was necessary to design a height control system that prevents OHVs from entering one of the old, smaller tubes: the control system should trigger an alarm and switch the traffic lights to red to lock the tunnel entrance.

To create a height control, engineers can use overhead detectors (ODs) and light barriers (LBs): Light barriers can easily be mounted at a height at which their light ray is interrupted by OHVs only. But such LBs can often only be installed at locations where they check multiple lanes at once, which makes them unable to determine the lane of the interruption. On the other hand, ODs can be placed at a greater height above specific lanes, but that often makes them unable to distinguish between OHVs and ordinary HVs. Nevertheless, the combination of an LB and an OD can detect whether an OHV is passing a specific lane at a specific position.

One of the many possible designs for a height control system using ODs and LBs is depicted in figure 2.2. The height control system is decomposed into PreControl, MainControl, and EndControl. PreControl activates MainControl whenever an OHV is recognized by the light barrier LBpre. On activation, MainControl (re-)starts an internal timer and increases an internal counter that keeps track of how many OHVs are in the area between LBpre and LBmain. When the timer runs out or the internal counter reaches 0, MainControl becomes inactive again. When MainControl is active and both LBmain and OD^right_main detect a vehicle, the internal counter is decremented and EndControl is activated. When the active MainControl gets a signal from LBmain but does not detect a vehicle on the correct lane, or detects a vehicle on both lanes, an alarm is triggered. EndControl also (re-)starts its own internal timer whenever it is activated and deactivates itself when this timer elapses. When EndControl is active and ODend detects a vehicle, an alarm is triggered. Due to structural measures, it is not possible to switch lanes after ODend.
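The controller logic just described can be sketched as a small state machine. The following Python code is a hypothetical, simplified illustration, not the S# model analyzed in this thesis: the timer durations MAIN_TIMEOUT and END_TIMEOUT are assumed values, the left-lane detector reading od_left_main is an assumption used to express "a vehicle on both lanes", and one call to step represents one discrete time step.

```python
from dataclasses import dataclass

# Hypothetical timer durations in discrete steps; the text does not fix them.
MAIN_TIMEOUT = 8
END_TIMEOUT = 4


@dataclass
class HeightControl:
    """Simplified sketch of the PreControl / MainControl / EndControl design."""
    main_counter: int = 0  # OHVs assumed to be between LBpre and LBmain
    main_timer: int = 0    # remaining steps of MainControl's timer
    end_timer: int = 0     # remaining steps of EndControl's timer

    def main_active(self) -> bool:
        return self.main_timer > 0 and self.main_counter > 0

    def step(self, lb_pre: bool, lb_main: bool, od_left_main: bool,
             od_right_main: bool, od_end: bool) -> bool:
        """Processes the sensor readings of one step; returns True on alarm."""
        alarm = False
        # PreControl: an OHV at LBpre (re-)activates MainControl.
        if lb_pre:
            self.main_timer = MAIN_TIMEOUT
            self.main_counter += 1
        # MainControl: check on which lane the OHV passes LBmain.
        if self.main_active() and lb_main:
            if od_right_main and not od_left_main:
                self.main_counter -= 1
                self.end_timer = END_TIMEOUT  # (re-)activate EndControl
            else:
                alarm = True  # no vehicle on the right lane, or on both lanes
        # EndControl: a vehicle at ODend while active triggers an alarm.
        if self.end_timer > 0 and od_end:
            alarm = True
        self._tick()
        return alarm

    def _tick(self) -> None:
        """Advances both timers; MainControl deactivates on timeout or count 0."""
        if self.main_timer > 0:
            self.main_timer -= 1
        if self.main_timer == 0 or self.main_counter == 0:
            self.main_timer = 0
            self.main_counter = 0
        if self.end_timer > 0:
            self.end_timer -= 1


hc = HeightControl()
print(hc.step(True, False, False, False, False))  # OHV passes LBpre: False
print(hc.step(False, True, False, True, False))   # OHV on the right lane: False
print(hc.step(False, False, False, False, True))  # vehicle at ODend: True
```

The last call shows one way the design can raise a false alarm in this sketch: any vehicle detected by ODend while EndControl is active, including an ordinary HV, triggers the alarm.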

Hazards and faults. There are two hazards to consider: collisions, when an OHV enters one of the old tubes, and false alarms. Sensors are vulnerable to two kinds of faults: false detections, i.e., detecting something non-present (false positive), and misdetections, i.e., failing to detect something present (false negative). Seven types of faults were considered in the later analysis:

• LightBarrierFalseDetection: False detection of a light barrier.

• LightBarrierMisdetection: Misdetection of a light barrier.

• OverheadDetectorFalseDetection: False detection of an overhead detector.

• OverheadDetectorMisdetection: Misdetection of an overhead detector.

• LeftHV: High vehicle changes to the left lane even if it is prohibited by traffic laws.

• LeftOHV: Overheight vehicle changes to the left lane.

• SlowTraffic: Slow traffic.

Challenges for the probabilistic safety analysis. It is not hard to validate that the described design works when only one OHV drives through the height control and no sensor fails. But there are many scenarios in which several OHVs drive through the height control in various ways. On top of that, there is a vast number of combinations in which sensor faults may lead to a wrong internal representation of the environment in the controller. With classical methods, it is tedious to validate whether a design is appropriate and robust enough under all circumstances. It is even harder to conduct a probabilistic safety analysis on a system with such dynamic behavior. Chapter 9 shows how a conventional probabilistic assessment of the case study fails. Moreover, even when the analysis reveals how the design could be improved, a conventional approach would require the validation of the new design to start from scratch, even if only one slight detail was changed, because that change might have a big impact in an unexpected part of the system. One challenge is to find a design that prevents collisions and false alarms even in the presence of sensor faults. To be useful in practice, the probabilistic analysis must be fast. This is a massive challenge on the technical side, because the many possibilities of how a vehicle could behave in each step lead to a large number of successor states. In terms of its state space, this is the largest case study: the number of transitions is very high. Furthermore, the calculation of successor states itself is computationally expensive.

2.3. Hemodialysis Machine

Figure 2.3.: Hemodialysis Machine

The third case study is a hemodialysis machine. These machines have a direct influence on the chemical composition of a patient’s blood and thus form a safety-critical system. The human body creates metabolic waste products like urea and minerals. Usually, the kidneys are responsible for the removal of these waste products from the blood. When the kidneys fail, a hemodialysis machine can be used for this removal instead. The description of a hemodialysis machine given here is based on a case study description written to evaluate formal languages and on a training handbook for dialysis technicians [Mas16; CD+06].

A hemodialysis machine (see figure 2.3) consists of three basic elements: the extracorporeal blood circuit (ECB), the dialyzer, and the dialyzing fluid delivery system (DFDS). Medical staff uses syringes to connect the artery and the vein of the patient to the ECB.

The main purpose of the ECB is to deliver the blood from the patient to the dialyzer and back again to the patient. A blood pump creates suction to pump the patient’s blood through the ECB. The heparin pump adds heparin to the patient’s blood to prevent blood clotting. The arterial and venous pressure transducers deliver blood pressure values to allow their monitoring. The venous tubing valve provides another safety measure that prevents unsafe blood from reentering the patient: whenever the safety detector detects contaminated blood or gas in the blood, the venous tubing valve is closed and no blood can reenter the patient.

The dialyzer itself is part of two fluid flows: a blood flow and a dialyzing fluid flow. Inside the dialyzer, a semipermeable membrane separates these two flows. At the membrane, small-sized waste products pass from the blood side to the dialyzing fluid side. Additionally, large-sized waste products can be removed from the blood side by creating suction on the dialyzing fluid side (ultrafiltration).

The incoming dialyzing fluid of the dialyzer is produced by the DFDS in several steps. The balance chamber acts as a buffer for dialyzing fluid. The safety bypass pipes dialyzing fluid with the wrong temperature to the drain instead of to the dialyzer.

Hazards and faults. There are two hazards to consider: the unsuccessful dialysis, i.e., the hemodialysis finishes but the patient’s blood is still not completely cleaned, and the contamination, i.e., the blood that flows back from the hemodialysis machine to the patient is either contaminated or too cold, with severe consequences. Nine faults and defects were considered in the later analysis:

• BloodPumpDefect: The blood pump of the ECB does not create suction.

• DialyzerMembraneRupturesFault: The membrane of the dialyzer ruptures. The dialyzing fluid inside the dialyzer is contaminated by blood, and the chemical composition of the blood inside the dialyzer is abnormal.

• DialyzingFluidPreparationPumpDefect: The pump that pumps the fresh dialyzing fluid to the balance chamber does not create any suction towards the water supply.

• SafetyBypassFault: The safety bypass can no longer relay the dialyzing fluid into the drain. Therefore, the bypass forwards all dialyzing fluid into the dialyzer, even if the dialyzing fluid does not meet the temperature constraints.

• WaterHeaterDefect: The water preparation no longer heats the incoming water.

• PumpToBalanceChamberDefect: The pump that pumps dialyzing fluid from the dialyzer back to the balance chamber is defective.

• SafetyDetectorDefect: The safety detector signals that the passing blood flow is all right even if it is contaminated.

• ValveDoesNotClose: The venous tubing valve can no longer be closed.

• UltrafiltrationPumpDefect: The pump for ultrafiltration is defective.

Figure 2.4.: Pressure Tank

Challenges for the probabilistic safety analysis. The case study consists of several physical components such as tubing valves, pumps, drip chambers, and the dialyzer itself: many independent components that are interconnected by fluid flows. To adequately express the causal dependencies between these components and to be able to calculate the probability of a hazard, it is necessary to model the fluid flows that interconnect the components. Also, a model of the patient has to be created. The modeling language must allow systematic decomposition to manage the complexity. Hydrodynamic laws must be approximated in a way that is both precise and simple enough to allow fast calculations, which is essential for fast model checking. In terms of source lines of code, this is the largest case study.

2.4. Pressure Tank

The pressure tank system is the standard case study for fault tree analysis and is described by Vesely et al. in [VU81] and [VD+02]. The case study of this section deviates slightly from the aforementioned descriptions. The pressure tank system is depicted in figure 2.4 and consists of the pressure tank itself and a control circuit that ensures that the pressure tank is not filled for more than 60 seconds. The control circuit, consisting of the timer T, the switch Switch, the sensor S, and the two relays K1 and K2, forms the controller. To start the filling procedure, a user must press the Switch. As soon as the Switch is pressed, the electric circuit is closed and current flows through the branches C1, C3, and C4. The relays K1 and K2 are energized. An energized relay closes its contact (the “switch” it controls) and keeps the contact closed as long as the relay itself is energized. Therefore, the energized relay K2 closes its contact and the circuit C5 is closed: the Pump starts to fill the Tank with fluid from the reservoir. The closed contact of K1 leads to a closed circuit (branches C2 and C3) that keeps itself energized. Moreover, the timer T starts its countdown.

When the sensor S measures a pressure level above a certain threshold, its contact opens and branch C4 is no longer energized: the countdown of T restarts, the contact of K2 opens, the Pump stops, and the Tank instantly depletes. Due to the depleted Tank, S closes its contact again and the filling procedure automatically restarts. For the evaluation, the threshold level of the sensor was set to the pressure level reached after 40 seconds of constant filling.

When the timer times out, it opens its contact and K1 opens, whereby the whole circuit becomes currentless (assuming Switch is not pressed). Hence, the timer is a safety measure. For the evaluation, a timeout of 45 seconds was selected. To restart the interrupted filling procedure, Switch needs to be pressed again.
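The interplay of sensor, timer, and relays can be made concrete with a small discrete-time simulation. The following Python sketch is a simplification under stated assumptions, not the Lustre model used in this thesis: one step corresponds to one second, the Switch is pressed once at time 0 and then released, the faults FS, FK1, and FK2 are permanent from the start, and the tank ruptures after more than 60 seconds of constant filling. The function simulate_pressure_tank and its parameters are hypothetical.

```python
from itertools import product

THRESHOLD = 40  # seconds of filling after which S opens (from the text)
TIMEOUT = 45    # seconds after which T times out (from the text)
RUPTURE = 60    # the tank ruptures after more than 60 s of constant filling


def simulate_pressure_tank(fs=False, fk1=False, fk2=False, horizon=300):
    """Returns True if the tank ruptures within `horizon` seconds.

    Simplifying assumptions: one step is one second, the Switch is pressed
    once at time 0 and released, and all faults are permanent.
    """
    k1_closed = True  # K1 latches itself via C2/C3 after the Switch press
    k2_closed = True  # K2 is energized by the initial Switch press
    timer = 0         # seconds counted by T since its last restart
    filling = 0       # seconds of constant filling
    for _ in range(horizon):
        s_closed = fs or filling < THRESHOLD   # FS: S never opens
        c4_energized = k1_closed and s_closed  # C4 powers K2 and T
        k2_closed = c4_energized or (fk2 and k2_closed)  # FK2: stays closed
        filling = filling + 1 if k2_closed else 0  # tank depletes instantly
        if filling > RUPTURE:
            return True
        if c4_energized:
            timer += 1
            if timer >= TIMEOUT:
                k1_closed = fk1  # K1 opens unless FK1 is present
        else:
            timer = 0  # the countdown of T restarts
    return False


for fs, fk1, fk2 in product((False, True), repeat=3):
    print(f"FS={fs!s:5} FK1={fk1!s:5} FK2={fk2!s:5} "
          f"rupture={simulate_pressure_tank(fs, fk1, fk2)}")
```

Under these assumptions, enumerating all eight fault combinations shows that the tank ruptures exactly when (FS ∧ FK1) ∨ FK2 holds, which is the condition that the fault tree analysis of the pressure tank in chapter 3 derives for the rupture hazard.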

Hazards and faults. The two main hazards are that the tank ruptures and that the tank does not provide enough pressure. The first hazard is severe, because a rupturing tank might injure nearby people or damage the system that contains the pressure tank. Depending on the application area of the tank, the second hazard is either only annoying or also severe if the containing system is itself safety-critical, e.g., a satellite system. Three faults were considered in the later analysis, although the original description includes more faults:

• FK1: K1 does not open its contact after losing current.

• FK2: K2 does not open its contact after losing current.

• FS: S does not open when pressure in Tank is above its threshold.

Challenges for the probabilistic safety analysis. The system itself is a pure hardware system without a software-based controller. With some effort, such systems can be modeled using an imperative modeling language, but languages designed for data flow are more suitable for this purpose. Therefore, this case study is used to demonstrate how an executable model can be created in a data flow-based language and analyzed probabilistically with the algorithms and techniques developed in this thesis.

2.5. Abstract Examples

The last two examples are rather small and are not complete systems by themselves, but they point out problems that might arise in the probabilistic analysis of systems with dynamic behavior.


Figure 2.5.: Dead Reckoning ((a) UML state machine with the labels Obtain Fix, Dead Reckoning, Validate Fix, Calculate Advancement, after 2 sec, [Fix invalid], and after 3 sec; (b) Application example)

2.5.1. Dead Reckoning

The process of estimating the position using only the last known position (called fix), the elapsed time, speed, and direction is called dead reckoning (sic) [Zot02]. In contrast to other position estimation techniques like satellite-based navigation with GPS, this technique is prone to cumulative errors. Figure 2.5b provides an application example of the dead reckoning component: the train retrieves the fixed position from a balise on the track. From then on, dead reckoning is used to approximate the position.

In this thesis, an abstract example of a dead reckoning component is analyzed. The behavior of this abstract component is given in figure 2.5a. In the first step, the fix is obtained and immediately validated. If the fix is invalid, the advancement is computed using either a sensor value or a calculated approximation. Even when the fix was valid, the advancement is computed in this way after 2 seconds. After a total time of 3 seconds, the component is shut down to keep the state space small for the analysis. The component monitors itself and is able to detect faults.

Hazards and faults. The hazard in this example is that the advanced position cannot be computed due to a detected sensor fault and a detected calculation fault. Three faults were considered in the later analysis:

• FC: Approximation fault in the calculation.

• FF: Fault in retrieving the fixed position.

• FS: Sensor fault.
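The position update behind dead reckoning, as defined at the beginning of this section, fits in a few lines. The following Python sketch is a hypothetical two-dimensional illustration (flat coordinates in meters, speed in meters per second, constant speed and heading since the fix), not the component analyzed in this thesis. The helper position_error shows why errors accumulate: a constant bias in the measured speed produces a position error that grows linearly with the time elapsed since the fix.

```python
import math


def dead_reckoning(fix, speed, heading, elapsed):
    """Estimates the current position from the last fix (x, y), the speed,
    the heading (in radians), and the time elapsed since the fix."""
    distance = speed * elapsed
    x, y = fix
    return (x + distance * math.cos(heading), y + distance * math.sin(heading))


def position_error(true_speed, measured_speed, elapsed):
    """Distance between the true and the estimated position when only the
    measured speed is wrong (same fix and heading)."""
    true_position = dead_reckoning((0.0, 0.0), true_speed, 0.0, elapsed)
    estimate = dead_reckoning((0.0, 0.0), measured_speed, 0.0, elapsed)
    return math.dist(true_position, estimate)


# A 1% speed error after 10 s and after 100 s since the last fix.
print(position_error(20.0, 20.2, 10))   # roughly 2 m
print(position_error(20.0, 20.2, 100))  # roughly 20 m
```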

Challenges for the probabilistic safety analysis. This example is no challenge for the probabilistic safety analysis per se. Its state space consists of only 25 states. This tiny size allows its manual analysis, which was useful for the validation and the debugging of the algorithms developed for this thesis. In addition, this small example has an interesting property that makes a further investigation worthwhile: although the fault FF has no influence on the safety of the system from a qualitative perspective, it has a major influence on the probability of the hazard.

Figure 2.6.: Degraded Mode ((a) UML state machine with the labels Self test Precise Sensor, Use Precise Sensor, Use Reliable Sensor, [Self test passed], and [Self test failed]; (b) Application example)

2.5.2. Degraded Mode

Degraded modes of a system are modes in which the system still offers important parts of its functionality even when errors have been detected or subsystems are defective. Many safety-critical systems have degraded modes. In this thesis, the determination of the train location is used as a constructed example of such a degraded mode (see figure 2.6): a train can use a Precise Sensor (e.g., GPS) or a Reliable Sensor (e.g., based on Dead Reckoning) to determine its position. The sensor is selected after a self test of the Precise Sensor at system start. The self test checks whether the location provided by the Precise Sensor matches the known position of the train.

Hazards and faults. The hazard in this example is that the location estimated with the active sensor deviates from the actual location by more than a defined threshold. The measurement fault of the Precise Sensor (MeasureSignalFault) was considered as the only fault in the analyses.

Challenges for the probabilistic safety analysis. This example is not a challenge for the probabilistic safety analysis either, but it also has an interesting property: an increase of the fault probability does not necessarily mean an increase of the hazard probability. Although this example only serves a demonstrative purpose, larger systems may also be vulnerable to similar effects. Hence, this example highlights the demand for a detailed analysis that is not biased by such hidden effects.
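One hypothetical mechanism that produces such a non-monotonic effect can be shown with a toy calculation; it is an illustrative assumption, not the model analyzed in this thesis. Suppose the MeasureSignalFault is permanent and occurs with probability p in each step, the self test happens in the first step and switches to the Reliable Sensor if the fault is already present, the Reliable Sensor is assumed never to cause the hazard, and otherwise the hazard occurs if the fault appears during the remaining n steps. The hazard probability is then (1 − p)(1 − (1 − p)^n).

```python
def hazard_probability(p, n):
    """Toy degraded-mode model (hypothetical): a permanent fault occurs with
    probability p per step; a fault present during the self test in the first
    step is detected and the train falls back to the Reliable Sensor."""
    passes_self_test = 1 - p        # no fault during the self test
    fault_later = 1 - (1 - p) ** n  # fault appears while the Precise Sensor is used
    return passes_self_test * fault_later


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}: hazard probability = {hazard_probability(p, 3):.4f}")
```

With n = 3, the printed hazard probability rises from p = 0.1 to p = 0.3 and falls for larger p: a fault that is very likely to be present at system start is also very likely to be caught by the self test.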


| Case study | Railroad | Height Control | Hemodialysis | Pressure Tank | Dead Reckoning | Degraded Mode |
| --- | --- | --- | --- | --- | --- | --- |
| Language | S# | S# | S# | Lustre | S# | S# |
| Number of Hazards | 1 | 2 | 2 | 1 | 3 | 1 |
| Number of Faults | 7 | 13 | 9 | 3 | 3 | 1 |
| Files | 14 | 26 | 39 | 1 | 1 | 1 |
| Source Lines of Code | 388 | 741 | 2027 | 29 | 75 | 165 |

Table 2.1.: Overview of the case studies

2.6. Summary and related work

The presented case studies are interesting representatives of safety-critical systems due to their different natures:

• the radio-based railroad crossing is a system where most states have only few successors (sparse state space),

• the height control system represents a system where most states have many successors (dense state space),

• the hemodialysis machine is a system whose controller itself is not complicated but whose interconnected, controlled components make modeling hard,

• and the pressure tank is a pure hardware system that is best modeled with a data flow-based language.

Table 2.1 gives an estimate of the size of the created models. The first three case studies are the largest not only in complexity but also in terms of source lines of code¹.

The radio-based railroad crossing, the height control system, the hemodialysis machine, and the pressure tank have already been analyzed with other approaches, cf. [ORS05; OR+03; HG+12; BS+16; VU81; VD+02], although none of them used an approach based on executable models. Güdemann created an executable model in the Scade language and conducted a qualitative safety analysis of the radio-based railroad crossing using the Scade suite [GOR07]. In contrast to the toolset of this thesis, the Scade suite does not allow probabilistic analyses.

The presented case studies are not the only ones that have been analyzed with the executable model approach using the toolset of this thesis, but at the time of writing they are the only ones that have been analyzed probabilistically using model checking. A model of a simplified version of the pressure tank example with a computer-based controller is presented by Habermaier et al. in [HLR15; HK+16], and an example of a self-organizing system in [HE+15; KH+16].

¹ Lines of code without comments and empty lines.


M: They’re toxins that destroy the body and the brain, caused by eating too much red meat and white bread. Too many dry martinis!

Bond: Then I shall cut out the white bread, sir.

– Never Say Never Again (1983)

3. Conventional Safety Analysis

Definitions vary considerably when talking about safety. Therefore, this chapter provides definitions of the terminology used in the remainder of this thesis. This chapter also explains the importance of including the environment when analyzing a safety-critical system. Finally, the chapter presents the fault tree analysis and its quantitative evaluation, which also comprises concepts like failure rates and mean time to failure.

3.1. System Safety

Ideal safety is the freedom from catastrophic consequences on human life or the environment (see [AL+04; Lev95; Sto96; ISO10]). Other common definitions regard safety as the situation in which “risk is judged to be acceptable”. Definitions like this that use the relative term “acceptable” have flaws: a relative definition of “safe” might lead to a situation where a system is considered safe because it meets a given definition of “acceptable”, even though better system designs are possible. In contrast, an ideal definition does not blur the concept of safety. Leveson discusses the differences between the ideal and the relative definition of “safety” in more detail in [Lev95].

Although in most cases it is not possible to design a system that is safe in the ideal sense, it is still possible to design a system that is safe enough. The question of when a system is safe enough is a so-called trans-scientific question, for which, among other things, moral and political issues must be considered [Lev95]. The remainder of this thesis does not address this question any further. Still, the goal of the techniques developed in this thesis is to provide information about the degree of safety of a system, which can support the decision whether a specific system is safe enough.

Reliability is the probability that a system provides its expected functionality over a given period of time under specified conditions (see also similar definitions in [Lev95; Sto96]). The term dependability integrates several favorable aspects like safety, reliability, availability, maintainability, and other quality aspects [AL+04].

A hazard is given by conditions that have the potential to cause harm or damage [ISO10].

In this thesis, this also encompasses cases with a high economic loss when a system does not


3.3. Probabilities of Fault Occurrences

success within $k$ trials is $1-(1-p)^k$ (see equation (B.2) on page 165). This formula can be used to determine how probable it is to fail within $k$ demands. Ironically, the occurrence of a fault is treated as “success”. Thus, given the probability $p$ of a fault to occur per demand, the reliability in dependence of the number of demands $k$ is given by $R(k) = (1-p)^k$, the probability to fail within $k$ demands by $F(k) = 1-(1-p)^k$, and the expected number of demands until the fault occurs by $1/p$.

The other case, modeling a fault with the probability of failing per time, is more common.
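Before turning to faults that occur per time, the per-demand formulas $R(k)$, $F(k)$, and $1/p$ can be sketched directly in code. In the following Python example, the function names and the fault probability p = 0.01 per demand are assumptions for illustration; a small seeded simulation additionally checks that the mean number of demands until the first fault is close to $1/p$.

```python
import random


def reliability(p, k):
    """R(k) = (1 - p)^k: probability that no fault occurs within k demands."""
    return (1 - p) ** k


def failure_probability(p, k):
    """F(k) = 1 - (1 - p)^k: probability that the fault occurs within k demands."""
    return 1 - reliability(p, k)


def expected_demands(p):
    """Mean of the geometric distribution: expected demands until the fault."""
    return 1 / p


def demands_until_fault(p, rng):
    """Draws one sample: the index of the first demand on which the fault occurs."""
    demands = 1
    while rng.random() >= p:  # no fault on this demand
        demands += 1
    return demands


p = 0.01  # assumed fault probability per demand
print(failure_probability(p, 100))  # about 0.63

rng = random.Random(42)
samples = [demands_until_fault(p, rng) for _ in range(10_000)]
print(sum(samples) / len(samples), expected_demands(p))  # sample mean vs. 1/p
```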

Component suppliers often publish the mean time to failure (MTTF) of their components, i.e., the expected amount of time until a component fails. The definition of the MTTF used here is based on the exponential distribution (see appendix B.1.3 on page 165). The failure rate λ is given by 1/MTTF.

Figure 3.5.: Bathtub curve

For many kinds of components, the failure rate (and thus also the mean time to failure) is assumed to be constant. This can be justified with the observations depicted in figure 3.5 (see also [Sto96; WP18]):

• due to manufacturing faults, components have a higher probability to fail at the beginning (“infant mortality”),

• at the end of the component life cycle, effects of aging become noticeable (“wear-out”),

• and during the whole life time, random failures occur.

The graph of the observed failure rate resembles a bathtub. For safety-critical systems, the problems with infant mortality can be avoided by extensive testing, and the problems with wear-out by replacing the components in time during maintenance. Thus, in the life span of interest, the failure rate is constant: the probability to fail does not depend on the time a component has already been running.

According to the exponential distribution, the reliability in dependence of the time $t$ is given by $R(t) = e^{-\lambda t}$, and the probability to fail within time $t$ by $F(t) = 1 - e^{-\lambda t}$.

There is another fact about the mean time to failure that is counterintuitive at first: after a component has been active for the mean time to failure, the probability that it has failed in this time is about 63% and therefore higher than 50%. The reason is that, given an arbitrary failure rate λ, the reliability at time MTTF is $R(\mathrm{MTTF}) = e^{-\lambda (1/\lambda)} = e^{-1} \approx 0.37$.

Figure 3.6.: Discrete geometric distribution and continuous exponential distribution
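The 63% figure does not depend on the concrete MTTF, which a short computation makes visible. The following Python sketch evaluates $R(t)$ and $F(t)$ with $\lambda = 1/\mathrm{MTTF}$; the MTTF of 50,000 hours and the function names are hypothetical choices for illustration.

```python
import math


def reliability(t, mttf):
    """R(t) = exp(-lambda * t) with the constant failure rate lambda = 1 / MTTF."""
    return math.exp(-t / mttf)


def failure_probability(t, mttf):
    """F(t) = 1 - exp(-lambda * t): the component fails within time t."""
    return 1 - reliability(t, mttf)


mttf = 50_000.0  # assumed MTTF in hours
for t in (0.5 * mttf, mttf, 2 * mttf):
    print(f"t = {t:>9.0f} h: F(t) = {failure_probability(t, mttf):.3f}")
```

At $t = \mathrm{MTTF}$, the printed failure probability is $1 - e^{-1} \approx 0.632$; any other MTTF gives the same value at that point.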

3.4. Using Mean Time To Failures in Discrete Probability Spaces

Many formal analysis tools and simulation tools expect probabilities to be provided in a discrete probability space. As explained in appendix B.1.3 on page 165, the geometric distribution is the discrete counterpart of the exponential distribution due to the memoryless property of both distributions. Therefore, if the probability to fail is modeled with an exponential distribution, a corresponding probability $p$ for the geometric distribution can be derived. Assuming that one discrete time step corresponds to $t_d$ units of time and λ is defined for $t_c$ units of time, then $p = 1 - e^{-\lambda \frac{t_d}{t_c}}$.

Example. Let λ be defined for $t_c = 3600$ seconds, and let one discrete step correspond to $t_d = 10$ seconds. Then $p = 1 - e^{-\lambda \frac{t_d}{t_c}}$, and the probability to fail within 60 seconds can be calculated either by $1 - (1-p)^{\frac{60}{t_d}} = 1 - (1-p)^6$ or by $1 - e^{-\lambda \frac{60}{t_c}} = 1 - e^{-\lambda \frac{1}{60}}$. Figure 3.6 compares the continuous distribution of the example with its corresponding discrete approximation: the graph shows that for $k = 500, 1000, \ldots, 9500$, $F(k) = 1 - (1-p)^k = 1 - e^{-\lambda \frac{t_d}{t_c} k}$. This is also true for all $k = 0, 1, 2, \ldots$.
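The conversion in the example can be checked numerically. The following Python sketch assumes a hypothetical failure rate of λ = 0.5 per hour ($t_c = 3600$ seconds) and one step of $t_d = 10$ seconds; the helper step_probability is an illustrative name. It compares both ways of computing the probability to fail within 60 seconds and checks that $1-(1-p)^k$ matches the exponential distribution function at every step $k$ shown in figure 3.6.

```python
import math


def step_probability(lam, t_d, t_c):
    """Per-step fault probability p = 1 - exp(-lam * t_d / t_c)."""
    return 1 - math.exp(-lam * t_d / t_c)


lam = 0.5             # assumed failure rate: 0.5 failures per t_c
t_c, t_d = 3600, 10   # lam refers to one hour; one step lasts 10 seconds
p = step_probability(lam, t_d, t_c)

# Probability to fail within 60 seconds: discrete (6 steps) vs. continuous.
discrete_60 = 1 - (1 - p) ** (60 // t_d)
continuous_60 = 1 - math.exp(-lam * 60 / t_c)
print(discrete_60, continuous_60)

# F(k) of the geometric distribution equals the exponential CDF at each step.
matches = all(
    math.isclose(1 - (1 - p) ** k, 1 - math.exp(-lam * t_d / t_c * k),
                 rel_tol=1e-9, abs_tol=1e-12)
    for k in range(0, 10_000, 500)
)
print(matches)  # True
```

The agreement is not a coincidence of the chosen numbers: since $1 - p = e^{-\lambda t_d / t_c}$, raising it to the power $k$ yields exactly the exponential term.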

To sum up, formal analysis tools and simulation tools designed for discrete probability spaces can simply transform MTTFs into their discrete equivalents.

3.5. Fault Tree Analysis

Given the time a system should be in use, it is possible to estimate the probability that a certain fault occurs using the previous formulas. The next step is to estimate the probability of a certain hazard. Several methods have been developed to derive how faults of components are related to hazards on the system level. One of those methods is the fault tree analysis (FTA), a top-down analysis method [VU81]: starting from a hazard, the analyst goes backward and tries to find its causes. These causes themselves can have causes. Going backward, the causes are refined stepwise until the so-called basic events, i.e., faults at the component level, have been reached. Every node in a fault tree is called an event. Every event in a fault tree should be an unwanted situation and not nominal behavior.

Figure 3.7.: Fault tree pressure tank

Figure 3.7 shows an (incomplete) fault tree for the pressure tank example of section 2.4. Starting from the hazard “tank ruptures after pumping has started”, the fault tree explains how this hazard could be triggered. The hazard is at the top of the fault tree. It is refined stepwise:

1. “Tank ruptures after pumping has started” is refined to “Tank is filled for more than 60 s” using an inhibit-gate. The conditioning event of the inhibit-gate, “Tank ruptures after 60 s of constant filling”, specifies the side condition of this refinement.

2. The event “Tank is filled for more than 60 s” is directly refined, without a gate, to “K2 is closed for more than 60 s”.

3. The event “K2 is closed for more than 60 s” is refined using the or-gate to the event “K2 is energized for more than 60 s” and the basic event “K2 does not open” (short FK2), a component fault that is not further refined. The semantics of the or-gate says that one of the lower events is enough for the upper event to appear.

4. The event “K2 is energized for more than 60 s” is refined using the and-gate to the basic events “S does not open” (short FS) and “K1 does not open” (short FK1). The semantics of the and-gate says that all lower events are required for the upper event to appear.

In this simple example, every event can be described as a Boolean expression. Using the Boolean semantics, a formula can be derived that states how the basic events lead to the hazard: $(FS \land FK_1) \lor FK_2$. From the disjunctive normal form of such a formula, the cut sets can be derived directly: $CS = \{\{FS, FK_1\}, \{FK_2\}\}$. A cut set Γ is minimal if no proper subset $\Gamma' \subset \Gamma$ is in $CS$. Using this rule, the minimal cut sets $MCS$ can be derived from $CS$ by removing those cut sets that are not minimal. The minimal cut sets have an interesting property: if at least one fault of each minimal cut set can be prevented, the hazard becomes impossible and the system is safe (concerning this hazard). A singleton in $MCS$ is called a single point of failure. Single points of failure are weaknesses of the system where designers must be cautious.

Based on the qualitative results, a fault tree can be evaluated quantitatively. Faults are usually assumed to be stochastically independent. Using the formulas described above, the occurrence probability of a fault $f_i$ in a given time or given number of demands can be calculated. This can be seen as a probabilistic experiment with the two outcomes $f_i^+$ (the fault occurs) and $f_i^-$ (the fault does not occur). For faults $f_1, \ldots, f_n$, let $(\Omega_{f_i}, \Pr_{f_i})$ be the discrete probability space of this experiment. A product experiment $(\Omega, \Pr_\Omega)$ (see appendix B.1.4) can be created based on these discrete probability spaces. Using this, a minimal cut set can be interpreted as the event that consists of every matching outcome. The hazard can be interpreted as the union of such events. Therefore, the hazard probability can be approximated by

$$\Pr_\Omega(H) = \Pr_\Omega\Big(\bigcup_{\Gamma \in MCS} \{\omega \in \Omega \mid \omega \text{ matches } \Gamma\}\Big).$$
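The minimization rule for cut sets can be written as a small filter. In this Python sketch, the function names are hypothetical, and the input deliberately contains the non-minimal cut set {FS, FK1, FK2} to show that it is removed; singletons in the result are the single points of failure.

```python
def minimal_cut_sets(cut_sets):
    """Keeps only the cut sets that have no proper subset in the collection."""
    candidates = {frozenset(cs) for cs in cut_sets}
    return {cs for cs in candidates
            if not any(other < cs for other in candidates)}


def single_points_of_failure(mcs):
    """Faults that form a minimal cut set on their own."""
    return {next(iter(cs)) for cs in mcs if len(cs) == 1}


cut_sets = [{"FS", "FK1"}, {"FK2"}, {"FS", "FK1", "FK2"}]
mcs = minimal_cut_sets(cut_sets)
print(sorted(sorted(cs) for cs in mcs))  # [['FK1', 'FS'], ['FK2']]
print(single_points_of_failure(mcs))     # {'FK2'}
```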

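Applying the formula to the pressure tank example,

$$
\begin{aligned}
\Pr_\Omega(H) &= \Pr_\Omega\big((\{FS^+\} \times \{FK_1^+\} \times \Omega_{FK_2}) \cup (\Omega_{FS} \times \Omega_{FK_1} \times \{FK_2^+\})\big)\\
&= \Pr_\Omega\big(\{(FS^+, FK_1^+, FK_2^+), (FS^+, FK_1^+, FK_2^-), (FS^+, FK_1^-, FK_2^+),\\
&\qquad\quad (FS^-, FK_1^+, FK_2^+), (FS^-, FK_1^-, FK_2^+)\}\big).
\end{aligned}
$$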
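The union above can be evaluated by brute force over the product experiment. The following Python sketch assumes independent faults with hypothetical probabilities (they are not taken from the case study); it enumerates all $2^3$ outcomes, keeps the five that contain at least one minimal cut set, and sums their probabilities.

```python
from itertools import product

# Hypothetical fault probabilities (not taken from the case study).
PR = {"FS": 0.1, "FK1": 0.2, "FK2": 0.05}
MCS = [{"FS", "FK1"}, {"FK2"}]


def hazard_probability(pr, mcs):
    """Sums the probabilities of all outcomes of the product experiment
    that contain (match) at least one minimal cut set."""
    faults = sorted(pr)
    total = 0.0
    for outcome in product((True, False), repeat=len(faults)):
        occurred = {f for f, happened in zip(faults, outcome) if happened}
        if any(cut_set <= occurred for cut_set in mcs):
            weight = 1.0
            for f, happened in zip(faults, outcome):
                weight *= pr[f] if happened else 1 - pr[f]
            total += weight
    return total


print(hazard_probability(PR, MCS))  # approximately 0.069
```

Exhaustive enumeration grows exponentially with the number of faults, so it only serves to make the definition concrete for small examples like this one.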

The rare event approximation can be applied when the probability of each fault is very small. The rare event approximation states that the probability of a hazard $H$ can be approximated by summing up the probabilities of each minimal cut set leading to $H$. This is an upper bound for the probability of the hazard. The probability of one minimal cut set is obtained by multiplying the probabilities of the faults inside that minimal cut set. More formally,

$$\Pr_{rea}(H) = \sum_{\Gamma \in MCS} \prod_{f \in \Gamma} \Pr(f),$$

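where $\Pr(f)$ abbreviates the probability of fault $f$, i.e., $\Pr_f(f^+)$. Note that the outcome $(FS^+, FK_1^+, FK_2^+)$ is counted twice by the rare event approximation:

$$
\begin{aligned}
\Pr_\Omega(H) &= \Pr_\Omega\big((\{FS^+\} \times \{FK_1^+\} \times \Omega_{FK_2}) \cup (\Omega_{FS} \times \Omega_{FK_1} \times \{FK_2^+\})\big)\\
&\le \Pr_\Omega(\{FS^+\} \times \{FK_1^+\} \times \Omega_{FK_2}) + \Pr_\Omega(\Omega_{FS} \times \Omega_{FK_1} \times \{FK_2^+\})\\
&= \Pr_{FS}(FS^+) \cdot \Pr_{FK_1}(FK_1^+) \cdot 1 + 1 \cdot 1 \cdot \Pr_{FK_2}(FK_2^+)\\
&= \Pr_{rea}(H).
\end{aligned}
$$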
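The double counting can be made explicit with numbers. The following Python sketch uses the same kind of hypothetical, deliberately large fault probabilities as before (not taken from the case study) so that the effect is visible: it computes the rare event approximation, the exact value by inclusion-exclusion for the two minimal cut sets, and checks that the difference is precisely the probability of the outcome in which all three faults occur.

```python
import math

# Hypothetical fault probabilities (not taken from the case study).
PR = {"FS": 0.1, "FK1": 0.2, "FK2": 0.05}
MCS = [{"FS", "FK1"}, {"FK2"}]


def rare_event_approximation(pr, mcs):
    """Sum over the minimal cut sets of the product of their fault probabilities."""
    return sum(math.prod(pr[f] for f in cut_set) for cut_set in mcs)


approx = rare_event_approximation(PR, MCS)
# Exact value for the cut sets {FS, FK1} and {FK2} by inclusion-exclusion.
exact = PR["FS"] * PR["FK1"] + PR["FK2"] - PR["FS"] * PR["FK1"] * PR["FK2"]
double_counted = PR["FS"] * PR["FK1"] * PR["FK2"]

print(approx, exact)                                 # approximately 0.07 and 0.069
print(math.isclose(approx - exact, double_counted))  # True
```

For realistic, very small fault probabilities, the double-counted products become negligible, which is why the approximation is acceptable in the rare event case.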

The quantitative analysis with fault trees has several weaknesses: faults that occur per demand are hard to deal with, some combinations of faults are counted more than once, the assumption of independence of faults is hard to keep up when common cause faults come into play, and systems with highly dynamic behavior are hard to capture using this technique. Probabilistic model checking does not have these problems. Using the techniques of this thesis, chapter 9 confirms that the conventional analysis delivers acceptable results for certain systems like the hemodialysis machine, but is useless for dynamic systems like the height control system.

The goal of the fault tree evaluation is to provide the order of magnitude of the hazard probability. The authors of the fault tree handbook emphasize that “extreme precision is not required (and is not believed!) in a fault tree evaluation” [VU81]. Interestingly, the fault tree analysis was originally developed for quantitative analysis but is more commonly used for qualitative analysis today [Lev95].


Alice: How long is forever?

White Rabbit: Sometimes, just one second.

– Alice’s Adventures in Wonderland (1865)

4. Safety Analysis of Executable Models

The focus of this chapter is to give practitioners an introduction to modeling and analyzing a system probabilistically with the approach introduced in this thesis. To this end, the two languages S# and Lustre, which can both be used for modeling, are introduced using smaller case studies. S# is the main modeling language of this thesis. Moreover, the chapter presents an excerpt of the S# model of a hemodialysis machine to emphasize that the approach also works for real-sized systems.

Section 4.1 briefly shows how executable models can be integrated into the development lifecycle of a safety-critical system. Section 4.2 describes the model of computation based on micro and macro steps that is used in this thesis. Section 4.3 introduces S#, the main executable modeling language of this thesis. For this purpose, the section demonstrates how a model of the dead reckoning example of section 2.5 can be created with imperative programming constructs. It also presents the S# fault injection approach, i.e., how faults and their effects are modeled, distinguishing between per-time and per-demand faults. Code that shows how to start qualitative and probabilistic safety analyses is given at the end of the section. Section 4.4 presents how the hemodialysis machine from section 2.3 is modeled in S#. The hemodialysis machine is a larger case study with many components. Hence, the system is first decomposed into components, which is shown in the form of a SysML block definition diagram. Next, the controller of the case study is modeled using the state machines provided by the S#-DSL. Afterwards, a model of the fluid flows that interconnect the components is created. Hydrodynamic laws are only approximated, which is essential for fast model checking. The section concludes with a probabilistic safety analysis of the case study. Section 4.5 shows a model of the pressure tank (see section 2.4) created in the programming language Lustre, which can also be used as an executable modeling language. Lustre is a data-flow language that is well suited for modeling circuits. The use of Lustre in this case study shows that the approach is generally applicable regardless of the modeling language. Section 4.6 reports relevant findings gathered during the creation of the models and discusses related approaches from other research groups.

The S# model of the hemodialysis machine and its qualitative analysis are published in [LHR16], and the probabilistic extension in [LHR18]. Götz analyzed a Lustre model of a hardware version of the pressure tank without explicitly modeled faults in [Göt17].

4.1. Executable Models in the Development Lifecycle

During the development of a safety-critical system, several activities are used to identify and assess potential hazards [Lev95; Sto96]:

• The preliminary hazard identification (PHI) is used to identify critical system hazards. It is started early in the concept exploration phase of the system development, which makes it possible to handle risks involving hazards appropriately in early design stages. One technique to identify hazards is the hazard and operability study (HAZOP), which uses informal “what-if” questions to estimate the consequences of certain events, e.g., “what would be the effect if the speed is increased?” The technique supports creative thinking about when and which hazards arise.

• The preliminary hazard analysis (PHA) uses the results of the PHI. It considers the relations between the functional requirements and the identified hazards; safety engineers can also use HAZOP for this. The PHA provides an initial assessment of those hazards. Its results are collected in the preliminary hazard report, which includes, among others, the safety objectives of the system and justifications for the initial assessments.

• The system hazard analysis (SHA) is used when the design matures. It includes a detailed study of the system, even in degraded situations. The purpose of the SHA is to recommend changes to the system and to evaluate the consequences of different design decisions for system safety. The SHA is performed several times in an iterative process as the design evolves.

• In the system risk assessment (SRA), the consequences of hazards and their occurrence probabilities are investigated. Also, integrity levels are assigned to the various components of the system based on the results of the SHA.

Safety analysis using executable models is especially useful for the system hazard analysis. Figure 4.1 shows how safety analysis using executable models can be employed: rectangles with round corners depict activities, and rectangles without round corners depict artifacts. Using the informal (preliminary) system specification, the nominal behavior of the system (the behavior not including any faults) can be modeled as a (formal) executable model. The desired system properties from the specification can be modeled as well. Using a formal functional analysis, it can be verified whether the system behaves as required when no fault occurs. If problems are found, the preliminary system specification can be fixed. The nominal model of the system can then be enriched to an extended system model, which also contains information on how faults affect the system behavior. Based on the results of the PHI, the hazards can be formalized in terms of the executable model.

The extended system model in conjunction with the formalized hazards can be used for

4.3. The S# modeling framework

In contrast to the final implementation of controller software, S# models are analyzable: controller software is executable, but it often also contains low-level details like network protocols or the interfaces to the underlying operating system. S# models usually do not contain those details. Instead, software is described at a higher level of abstraction, and the model also includes behavioral details about the surrounding environment and faults. Even nondeterminism and probabilism can be modeled in S#. An example from the aviation domain illustrates why it is important to include non-controller components in the model: a plane should never retract its landing gear while it is on the ground. A faulty sensor may give the controller the wrong impression that the plane is already airborne, leading to a retraction of the landing gear. In this case, it is not enough to prove that the controller never retracts the landing gear when it senses that the plane is on the ground. A safety analysis should also prove that a fault-tolerant controller never retracts the landing gear when the plane actually is on the ground, even when a sensing fault occurs.

Compared to low-level formalisms like Kripke structures, Markov chains, or Markov decision processes, the S# modeling language has a significantly higher level of expressiveness: it provides users with a rich modeling language based on the industrial-grade programming language C#.

The following subsections give a brief introduction to creating and analyzing S# models. As S# is based on C#, some prior knowledge of object-oriented programming is required.
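The landing gear argument above can be illustrated with a tiny exhaustive check in plain C#. Everything in it is hypothetical: the functions Sensed and Retracts and the simplistic controller, which is deliberately not fault tolerant. The check only shows that a property stated over sensed values can hold while the corresponding property over the actual environment fails once a sensor fault is possible.

```csharp
using System;

// Hypothetical environment and controller: a faulty sensor reports the opposite of the truth,
// and the (not fault-tolerant) controller retracts the gear iff it senses "airborne".
bool Sensed(bool onGround, bool sensorFault) => sensorFault ? !onGround : onGround;
bool Retracts(bool sensedOnGround) => !sensedOnGround;

bool controllerPropertyHolds = true; // never retract when the controller senses "on ground"
bool systemPropertyHolds = true;     // never retract when the plane actually is on the ground

// Exhaustively check all combinations of the actual situation and the sensor fault.
foreach (bool onGround in new[] { true, false })
    foreach (bool sensorFault in new[] { false, true })
    {
        bool sensed = Sensed(onGround, sensorFault);
        bool retract = Retracts(sensed);
        if (sensed && retract) controllerPropertyHolds = false;
        if (onGround && retract) systemPropertyHolds = false;
    }

Console.WriteLine($"controller-level property holds: {controllerPropertyHolds}");
Console.WriteLine($"system-level property holds: {systemPropertyHolds}");
```

The counterexample is the combination of a plane on the ground with a faulty sensor, which only becomes visible because the environment (the actual position) is part of the check.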

4.3.1. Modeling Behavior with S#

This introduction to S# creates a model of the “dead reckoning” example of section 2.5 on page 14. The following listing presents how the controller of the example can be modeled in a strongly abstracted way. This strong abstraction serves only an illustrative purpose; a real model would contain more details.

 1 public class DeadReckoningComponent : Component {
 2     public int Step;
 3     public bool CalculationError;
 4     public bool SensorValueWrong;
 5     public bool NoFixAvailable;
 6
 7     public Formula Hazard => CalculationError && SensorValueWrong;
 8
 9     public override void Update() {
10         if (Step >= 3)
11             return;
12         if (Step == 0)
13             RequestFix();
14         if (Step == 2 || NoFixAvailable) {
15             CheckSensor();
16             CalculatePosition();
17         }
18         Step++;
19     }
20
21     public virtual void RequestFix() {
22         // Get fix position
23         NoFixAvailable = false;
24     }
25
26     public virtual void CalculatePosition() {
27         // Calculate new position
28         CalculationError = false;
29     }
30
31     public virtual void CheckSensor() {
32         // Measure data from own sensor
33     }
34 }

In line 1, the component DeadReckoningComponent, which inherits from Component, is declared. Component is part of the S#-DSL. As specified by the description in section 2.5, the system is only intended to run for 3 seconds. Passing time must be modeled explicitly in S#; the time passing between two macro steps is 1 second. For this purpose, line 2 introduces a variable Step, which is initially 0. The component has three methods, namely RequestFix (lines 21-24) to get the fix position in the first step, CheckSensor (lines 31-33) to get data from the sensor, and CalculatePosition (lines 26-29) to estimate the position if the sensor data is invalid.

Due to the abstract nature of the example, the model only records whether those actions could be executed in a satisfactory way: if not, the corresponding variables NoFixAvailable, SensorValueWrong, and CalculationError (declared in lines 3-5) are set to true to indicate a problem. In line 7, the expression-bodied member Hazard denotes the hazard that the sensor data is invalid and the calculation was wrong at the same time. Of course, in a real model of a safety-critical system, the hazard should never be formulated in terms of internal variables of a controller. This example ignores this best practice to keep the model concise and the state space small, which allows a closer look at the state space later in this section.

The actual behavior of a component is modeled in its Update-method. The Component-class of S# provides an empty implementation of this Update-method, which is overridden in lines 9-19: if 3 steps (which correspond to 3 seconds) have passed, do nothing. If it is the first step, try to get the fix position by calling the method RequestFix. If the fix position could not be retrieved or if two seconds have passed, use CheckSensor and CalculatePosition to estimate the position. Note that line 14 uses if and not else if; therefore, CheckSensor and CalculatePosition can also be called in step 0. At the end of the Update-method, Step is incremented.
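To make the control flow of Update tangible, the following sketch re-creates it in plain C# without the S# runtime. It is an illustration only: the local function RunSteps and its parameter fixFails are hypothetical, and fixFails stands in for RequestFix reporting that no fix is available. The resulting log shows that CheckSensor and CalculatePosition run only in step 2 in the nominal case, but already in step 0 (and in every later step) when no fix is available, because line 14 uses if rather than else if.

```csharp
using System;
using System.Collections.Generic;

// Plain C# re-creation of the control flow of DeadReckoningComponent.Update (no S# runtime).
// `fixFails` is a hypothetical stand-in for RequestFix reporting that no fix is available.
List<string> RunSteps(bool fixFails)
{
    var log = new List<string>();
    int step = 0;                // Step, initially 0
    bool noFixAvailable = false; // NoFixAvailable

    for (int macroStep = 0; macroStep < 5; macroStep++) // a few more macro steps than needed
    {
        if (step >= 3)
        {
            log.Add($"macro step {macroStep}: nothing to do");
            continue;                    // the early return: Step is not incremented any more
        }
        if (step == 0)
        {
            noFixAvailable = fixFails;   // RequestFix()
            log.Add($"step {step}: RequestFix");
        }
        if (step == 2 || noFixAvailable) // a plain if, not else if
            log.Add($"step {step}: CheckSensor, CalculatePosition");
        step++;
    }
    return log;
}

foreach (var entry in RunSteps(fixFails: false)) Console.WriteLine(entry);
Console.WriteLine("---");
foreach (var entry in RunSteps(fixFails: true)) Console.WriteLine(entry);
```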



considered. Those nondeterministic choices can be resolved differently each time Choose is called. S# can consider the best-case and the worst-case outcomes of choices when calculating the probability of a formula like Hazard. S# combines the best-case and the worst-case probability into a probability range that contains the probability of the analyzed formula for every possible resolution of the choices. Probabilistic choices, in contrast, can be used when statistics on how a choice resolves have been gathered and should be taken into account when evaluating the choice. Both types of choices are presented in the following abstract example:

 1 public class AbstractExample : Component {
 2     int Y = ;
 3     Formula E => Y== ;
 4
 5     public override void Update() {
 6         if (Y != )
 7             return;
 8         bool L = Choose(
 9             new Option<bool>(new Probability(0.6), true),
10             new Option<bool>(new Probability(0.4), false)
11         );
12         Y = ;
13         if (L)
14             Y = Choose( , );
15     }
16 }

In the nondeterministic case, Choose takes a list of possible outcomes as arguments and returns one of them. In line 14 of the example, either or is assigned to the class variable Y. In the probabilistic case, Choose takes a list of options as arguments. Each option contains a probability and the outcome associated with this probability, and Choose returns one of the outcomes. This is shown in lines 8-11, where the local variable L is assigned either true with a probability of 0.6 or false with a probability of 0.4. Any finite number of choices is possible in a step; also, probabilistic and nondeterministic choices can be combined in any order.

If in each step a certain variable of a component is always written to before it is read from, this variable may be marked as [Hidden]. Such hiding can be used to reduce both the size of a state and the size of the state space. The following listing provides an example. The variable L of the previous example was transformed into a field of the class to be able to express the formula EO, which is Y!= || L.

public class HiddenExample : Component {
    int Y = ;
    bool L = false;
    Formula EO => Y!= || L;
    Formula E => Y== ;

    public override void Update() {
        L = false;
        if (Y != )
            return;
        Y = ;
        L = Choose(
            new Option<bool>(new Probability(0.6), true),
            new Option<bool>(new Probability(0.4), false)
        );
        if (L)
            Y = Choose( , );
    }
}

The reachable states of HiddenExample are (Y= ,L=false), (Y= ,L=false), (Y= ,L=true), (Y= ,L=true), (Y= ,L=false), and (Y= ,L=false), whereas AbstractExample only has the states (Y= ), (Y= ), (Y= ), and (Y= ). Note that the state (Y= ,L=false) is reached by executing Update on the state (Y= ,L=true). AbstractExample has fewer states, and its states are smaller; each of these properties makes model checking more efficient. Adding the attribute [Hidden] in front of the field L removes L from the state space while retaining the ability to use L in formulas. When using this feature, modelers must ensure that the variable is indeed written before it is read in each step. One unique feature of the approach of this thesis is that hidden variables can still be used to define hazards and other formulas even though they are not visible in the state space.
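The difference in state counts can be reproduced with a small breadth-first search over the successor function of HiddenExample.Update. Since the integer literals of the listing are not legible, the sketch assumes the hypothetical values 0 (initial value), 1 (value assigned before the choice), and 2 and 3 (the two nondeterministic outcomes); probabilities are irrelevant for reachability and are omitted. Projecting each state onto Y imitates the effect of [Hidden] on L. With these assumptions, the search yields six and four states, matching the counts stated above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Successor states of HiddenExample.Update for a state (Y, L). The integer literals of the
// listing are not legible; 0, 1, 2, 3 are hypothetical placeholders.
IEnumerable<(int Y, bool L)> Successors((int Y, bool L) s)
{
    if (s.Y != 0)                  // L = false; if (Y != 0) return;
    {
        yield return (s.Y, false);
        yield break;
    }
    yield return (1, false);       // Y = 1; L resolved to false
    yield return (2, true);        // L resolved to true, Y = Choose(2, 3) resolved to 2
    yield return (3, true);        // ... resolved to 3
}

// Breadth-first search for all states reachable from the initial state.
HashSet<T> Reachable<T>(T initial, Func<T, IEnumerable<T>> next)
{
    var seen = new HashSet<T> { initial };
    var work = new Queue<T>();
    work.Enqueue(initial);
    while (work.Count > 0)
        foreach (var successor in next(work.Dequeue()))
            if (seen.Add(successor))
                work.Enqueue(successor);
    return seen;
}

// L stored in the state vector, as in HiddenExample without the attribute.
var withL = Reachable<(int Y, bool L)>((0, false), Successors);
// L marked [Hidden]: only Y is part of the state; L is still computed within the step.
var hiddenL = Reachable<int>(0, y => Successors((y, false)).Select(t => t.Y));

Console.WriteLine($"states with L: {withL.Count}, states with [Hidden] L: {hiddenL.Count}");
```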

4.3.2. Modeling Faults in S#

S# strictly distinguishes between faults and fault effects. A fault indicates that “something went wrong”, and a fault effect determines what happens when a fault occurs. A fault might also have more than one fault effect. This separation allows modeling common cause faults, i.e., faults that might impact several distinct parts of the model. The process of adding faults to a model is commonly called fault injection.

The S#-DSL offers two kinds of faults: transient faults and permanent faults. Permanent faults never disappear once they have appeared, e.g., a sensor with a physical defect. Transient faults, in contrast, may disappear again, e.g., a failed radio transmission may succeed on the next trial. The S#-DSL uses Choose internally to determine whether a fault occurred. Additionally, permanent faults use a state variable to memorize whether they have been activated before.


The following listing integrates the three faults FC (fault in computation), FF (fault in retrieving the fixed position), and FS (sensor fault) into DeadReckoningComponent:

 1 public class DeadReckoningComponent : Component {
 2     /*...*/
 3
 4     public Fault FC, FF, FS;
 5
 6     public DeadReckoningComponent() {
 7         FC = new TransientFault();
 8         FC.ProbabilityOfOccurrence = new Probability(0.01);
 9
10         FF = new TransientFault();
11         FF.ProbabilityOfOccurrence = new Probability(0.4);
12
13         FS = new PermanentFault();
14         FS.ProbabilityOfOccurrence = new Probability(0.05);
15     }
16
17     [FaultEffect(Fault = nameof(FC))]
18     public class FCEffect : DeadReckoningComponent {
19         public override void CalculatePosition() {
20             // The calculated estimate is flawed, but can be better next time step
21             CalculationError = true;
22         }
23     }
24
25     [FaultEffect(Fault = nameof(FF))]
26     public class FFEffect : DeadReckoningComponent {
27         public override void RequestFix() {
28             // Fix position is flawed or not available
29             NoFixAvailable = true;
30         }
31     }
32
33     [FaultEffect(Fault = nameof(FS))]
34     public class FSEffect : DeadReckoningComponent {
35         public override void CheckSensor() {
36             // Sensor retrieves from now on wrong values
37             SensorValueWrong = true;
38         }
39     }
40 }

In the model, FC and FF are transient faults, and FS is a permanent fault, as set in the constructor in lines 6-15. The probability of a fault is set in the property ProbabilityOfOccurrence of the fault, as shown in lines 8, 11, and 14. If the component manufacturer provides the fault probability as an MTTF, the probability can be discretized as demonstrated in section 3.4. If the probability is not known, the fault occurrence can also be modeled as nondeterministic by setting ProbabilityOfOccurrence to null.

A fault effect is modeled by adding nested classes to the affected component. Such fault effect classes are derived from the affected component and are tagged with the attribute FaultEffect. By setting the fault in the parameter of the FaultEffect-attribute, the fault effect is connected to its triggering fault. The methods inside the fault effect override the methods of the original behavior when one of its faults is active, e.g., CalculatePosition in lines 19-22 overrides CalculatePosition of the original listing.

S# allows modeling both per-time and per-demand faults. S# permanent faults are per-time faults by default. A per-time fault corresponds to a fault with a demand at each time step. Therefore, S# treats per-time faults as a special case of per-demand faults with a demand at the start of each step. A fault can explicitly be set as a per-time fault:

FF.DemandType = Fault.DemandTypes.OnStartOfStep;

There are two ways to model a per-demand fault. S# can detect demands at run-time by interpreting the first method call of a fault effect in a step as a demand. This is the standard mode for transient faults. Modelers can explicitly set this mode for a fault:

FF.DemandType = Fault.DemandTypes.OnMethodCall;

Sometimes, modelers want to model demands explicitly. To this end, a method must be provided that is evaluated at the beginning of each step to determine whether there was a fault demand in this step. This method can be provided in the form of a lambda expression, e.g., () => Step > 1, which returns true whenever Step is larger than one. To use a lambda method to determine the demand, the DemandType must be adjusted, and the lambda method must be assigned to the HasCustomDemand-property:

FF.DemandType = Fault.DemandTypes.OnCustom;

FF.HasCustomDemand = () => Step > 1;

The fault injection only extends the behavior of the system. Assuming that no faults occur, the traces of the extended model, i.e., the model with injected faults, should correspond to the traces of the nominal system model. The fault injection technique of S# ensures this automatically. This is easy to achieve because faults and fault effects can simply be stripped away: the nominal model must never depend on them, and such a dependency is regarded as a modeling bug that can be detected statically.

4.3.3. Analyzing Models with S#

The benefit of the strongly abstracted case study is that the corresponding labeled Markov chain is relatively small, with only 25 states. Models with small state spaces have been useful for debugging purposes during the development of S#. By removing the fault FF, the size can be reduced even further to 11 states, as shown in figure 4.5. The rectangles depict the states. The valuation of the variables is given in a brief form inside the rectangles: the current Step is written down directly; the tag CE is shown if and only if the variable CalculationError is true in the state, and likewise the tag SVW for SensorValueWrong; finally, the state contains the tag FS if and only if the permanent fault FS has occurred. Transient faults do not manifest themselves in the state because they are only modeled internally via local variables, which leads to gains in efficiency. Two values are given on each transition between states: first, the probability of the transition; second, the label “t” if the formula Hazard is satisfied by the transition, and “f” otherwise.

Figure 4.5.: State space of the dead reckoning example with faults FS and FC but not FF

Readers may notice that the labels are attached to the transitions and not to the states; labeling states is the standard in competing model checking approaches. Transitions are labeled for efficiency reasons, as explained and evaluated in chapter 5. How the transitions can be derived computationally from the model is explained in chapter 7. Several interesting observations can now be made. The first state (Step=0) has only two outgoing transitions: CalculatePosition, which would lead to a demand of the transient fault FC, is not called in this state, but the permanent, per-time fault FS might be activated, which leads to two successor states. The corresponding states of time step 2 have either two or four outgoing transitions, depending on whether the permanent fault FS has already been activated. In short, traces are sequences of transitions; they are defined more precisely in chapter 5. There are three traces leading to a transition labeled “t”. The sum of their probabilities is the probability of the hazard:

1.0 · 0.95 · 0.95 · 0.0005+ 1.0 · 0.95 · 0.05 · 0.01+ 1.0 · 0.05 · 1.0 · 0.01 = 0.00142625.
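The sum can be reproduced by an exhaustive walk over the Markov chain. The following sketch is a hand-written plain C# recursion, not the S# model checker. The probabilities FS = 0.05 and FC = 0.01 are read off from the factors of the sum (0.95 = 1 - 0.05 and 0.0005 = 0.05 · 0.01). The probability of FF is not legible in the listing of the faults; 0.4 is an assumption here, chosen because it reproduces the value 0.00200837 that S# reports for the model with all three faults. As assumed in the sketch, FS may become active at the start of each step, FC is demanded by every call of CalculatePosition, FF is demanded by the call of RequestFix in step 0, and the walk stops at the first hazard (reachability).

```csharp
using System;

// Hand-written exhaustive walk over the dead reckoning example (not the S# model checker).
double HazardProbability(double pFS, double pFC, double pFF)
{
    // Probability of reaching Hazard (CalculationError && SensorValueWrong) from the state
    // before executing Update in the given step.
    double Walk(int step, bool fsActive, bool noFix)
    {
        if (step >= 3) return 0.0; // Update does nothing any more
        double total = 0.0;
        // Permanent fault FS: stays active once activated; otherwise it may occur now.
        var fsOutcomes = fsActive
            ? new[] { (Active: true, P: 1.0) }
            : new[] { (Active: true, P: pFS), (Active: false, P: 1 - pFS) };
        foreach (var fs in fsOutcomes)
        {
            // Transient fault FF can only strike when RequestFix is called in step 0.
            var fixOutcomes = step == 0
                ? new[] { (NoFix: true, P: pFF), (NoFix: false, P: 1 - pFF) }
                : new[] { (NoFix: noFix, P: 1.0) };
            foreach (var fix in fixOutcomes)
            {
                double p = fs.P * fix.P;
                bool calculates = step == 2 || fix.NoFix;
                // CheckSensor sets SensorValueWrong if FS is active, and CalculatePosition,
                // which runs right after it, sets CalculationError if FC occurs.
                double pHazardNow = calculates && fs.Active ? pFC : 0.0;
                total += p * (pHazardNow + (1 - pHazardNow) * Walk(step + 1, fs.Active, fix.NoFix));
            }
        }
        return total;
    }
    return Walk(0, fsActive: false, noFix: false);
}

double withoutFF = HazardProbability(pFS: 0.05, pFC: 0.01, pFF: 0.0); // FF removed
double withFF = HazardProbability(pFS: 0.05, pFC: 0.01, pFF: 0.4);    // assumed value for FF
Console.WriteLine($"without FF: {withoutFF:F8}, with FF: {withFF:F8}");
```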

The fully automated probabilistic model checking of S# confirms this result.

S# offers a range of safety analysis methods for the different phases in the development lifecycle of a safety-critical system (recall figure 4.1).

For the functional analysis, an invariant checker can be used that ignores the probabilities in the model. The following listing shows how to check that CalculationError never becomes true when the fault FC is never activated:

1 var model = new DeadReckoningModel();
2 model.Component.FC.Activation = Activation.Suppressed;
3 var result = SafetySharpModelChecker.CheckInvariant(model,
      !model.Component.CalculationError);
4 Console.WriteLine(result.FormulaHolds);

S# reports true as the result, indicating that the invariant holds. The invariant checker can be applied to nominal models as well as to extended models. As shown in line 2, it is often handy to deactivate certain faults that have been introduced by the fault injection and that might invalidate formulas that are valid in the corresponding nominal model.

For the qualitative safety analysis, S# can automatically calculate the minimal critical sets using the DCCA. How this is achieved technically is described in [HK+16]. To conduct the DCCA, the user only has to pass the model and the hazard as a propositional logic formula:

var model = new DeadReckoningModel();
var analysis = new SafetySharpSafetyAnalysis();
var result = analysis.ComputeMinimalCriticalSets(model, model.Component.Hazard);
Console.WriteLine(result);
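For a model this small, the result of the DCCA can be cross-checked by brute force. The following plain C# sketch enumerates every subset Γ of {FC, FF, FS}, lets only the faults in Γ occur (nondeterministically, with all other faults suppressed), checks whether some trace reaches the hazard, and keeps the inclusion-minimal critical subsets. The hand-coded model behavior is an assumption that mirrors the listings above; the sketch only illustrates the definition of minimal critical sets, while the technique S# actually uses is the one described in [HK+16].

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] faults = { "FC", "FF", "FS" };

// Qualitative variant of the dead reckoning example: a fault in `allowed` may or may not
// occur, all other faults are suppressed. Returns whether some trace reaches the hazard.
bool HazardReachable(ISet<string> allowed)
{
    bool Explore(int step, bool fsActive, bool noFix)
    {
        if (step >= 3) return false;
        bool[] fsOptions = fsActive ? new[] { true }
            : allowed.Contains("FS") ? new[] { true, false } : new[] { false };
        foreach (bool fs in fsOptions)
        {
            bool[] fixOptions = step != 0 ? new[] { noFix }
                : allowed.Contains("FF") ? new[] { true, false } : new[] { false };
            foreach (bool nf in fixOptions)
            {
                bool calculates = step == 2 || nf;
                // CheckSensor with an active FS followed by a CalculatePosition hit by FC.
                if (calculates && fs && allowed.Contains("FC")) return true;
                if (Explore(step + 1, fs, nf)) return true;
            }
        }
        return false;
    }
    return Explore(0, false, false);
}

// A fault set is critical if the hazard is reachable when only its faults may occur.
var critical = new List<HashSet<string>>();
for (int mask = 0; mask < (1 << faults.Length); mask++)
{
    var subset = new HashSet<string>(faults.Where((name, i) => (mask & (1 << i)) != 0));
    if (HazardReachable(subset)) critical.Add(subset);
}
// Keep only the critical sets that have no critical proper subset.
var minimal = critical
    .Where(c => !critical.Any(o => o.Count < c.Count && o.IsSubsetOf(c)))
    .ToList();

foreach (var set in minimal)
    Console.WriteLine("{" + string.Join(", ", set.OrderBy(f => f)) + "}");
```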

S# derives a single minimal critical fault set Γ = {FC, FS} from the model.

For the probabilistic safety analysis, S# derives the probability directly from the traces. This produces more accurate estimates than deriving it from the minimal cut sets. Two different techniques are integrated: one for purely probabilistic models and one for models that employ both nondeterministic and probabilistic choices. The reason is that purely probabilistic models can be checked faster by orders of magnitude, as shown by the evaluations in the later chapters. If checks are conducted purely probabilistically, nondeterministic choices are interpreted as probabilistic choices with a uniform distribution. The probabilistic safety analysis can be conducted with either a bounded or an unbounded number of steps. In this example, the probability of reaching the hazard is calculated when all three faults are possible:

var model = new DeadReckoningModel();
var resultBounded =
    SafetySharpModelChecker.CalculateProbabilityToReachStateBounded(
        model, model.Component.Hazard, 10);
Console.WriteLine(resultBounded);
var resultUnbounded =
    SafetySharpModelChecker.CalculateProbabilityToReachState(
        model, model.Component.Hazard);
Console.WriteLine(resultUnbounded);

S# calculates a probability of 0.00200837 both for the bounded case with at most 10 steps and for the unbounded case. The probability is slightly higher than the probability obtained when the fault FF is not considered.

Finally, the probability range between the best-case and the worst-case probability is calculated using the mode that adequately supports both nondeterministic and probabilistic choices:

var model = new DeadReckoningModel();
var result =
    SafetySharpModelChecker.CalculateProbabilityRangeToReachStateBounded(
        model, model.Component.Hazard, );
Console.WriteLine(result);

As the model is purely probabilistic, S# correctly calculates a probability range with identical bounds, [0.00200837, 0.00200837].
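For a model with a genuine nondeterministic choice, the two bounds differ. The following plain C# sketch computes them by hand for one step of AbstractExample, using the probabilities 0.6 and 0.4 from the text and hypothetical values for the integer literals that are not legible in its listing (E is assumed to test one of the two nondeterministic outcomes). It also shows the uniform interpretation of nondeterministic choices that a purely probabilistic check uses.

```csharp
using System;
using System.Linq;

// One macro step of AbstractExample from its initial state, computed by hand.
// 0.6 / 0.4 are the probabilities from the text; the integer values are hypothetical
// placeholders for the illegible literals, and E is assumed to test the outcome 2.
const double pTrue = 0.6, pFalse = 0.4;
const int assignedIfFalse = 1;   // value of Y when L is false
const int target = 2;            // E is Y == target
int[] nondetOutcomes = { 2, 3 }; // Y = Choose(2, 3) when L is true

// For a fixed resolution of the nondeterministic choice, only the probabilistic choice remains.
double ProbabilityOfE(int resolved) =>
    pTrue * (resolved == target ? 1.0 : 0.0) + pFalse * (assignedIfFalse == target ? 1.0 : 0.0);

// Best and worst case over all resolutions span the probability range.
double min = nondetOutcomes.Min(v => ProbabilityOfE(v));
double max = nondetOutcomes.Max(v => ProbabilityOfE(v));
// A purely probabilistic check treats the nondeterministic choice as uniformly distributed.
double uniform = nondetOutcomes.Average(v => ProbabilityOfE(v));

Console.WriteLine($"Pr(E) in [{min}, {max}], uniform interpretation: {uniform}");
```

The uniform value lies inside the range, but it is only one of many possible resolutions; the range is what bounds every resolution.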

4.4. S# Model of a Hemodialysis Machine

The object-oriented features of S# enable the systematic modeling of larger systems, such as the hemodialysis machine of section 2.3. This section demonstrates how such a medical device can be modeled so that automatic safety analysis is possible. Chapter 9 later discusses the results of its safety analysis. The complete model is available in the S# repository [ISSE18].


Figure 4.6.: HemodialysisMachine Bdd

4.4.1. Structure of the Model

The hemodialysis machine is first decomposed into subcomponents to manage its complexity, which is possible due to the object-oriented features of S#. Figure 4.6 shows the decomposition using the standard system modeling language SysML [Obj15].

The Specification contains the Patient and the HdMachine (hemodialysis machine). The patient must be included in the model to be able to express hazards that concern the patient. The HdMachine itself consists of several parts, namely the Dialyzer, the ControlSystem, the DialyzingFluidDeliverySystem, and the ExtraCorporealBloodCircuit. DialyzingFluidDeliverySystem and ExtraCorporealBloodCircuit themselves consist of several parts (e.g., WaterSupply). The ControlSystem only contains references to these subparts because they are physically not part of the ControlSystem. Figure 2.3 on page 11 shows the interconnection between those parts.

4.4.2. Controller Specification using State Machines

There are several established ways to model the behavior of controllers. S# supports modeling controllers both with state machines and with sequential code, and even allows nesting state machines with sequential code. The following listing shows an excerpt of the control system of the hemodialysis machine model.


 1 public enum TherapyPhase {
 2     InitiationPhase,
 3     EndingPhase
 4 }
 5 public class ControlSystem : Component {
 6     public int TimeStepsLeft = 6; // hard code 6 time steps
 7     // references to components
 8     private readonly VenousSafetyDetector VenousSafetyDetector;
 9     private readonly VenousTubingValve VenousTubingValve;
10     /* (other components and constructor left out for brevity) */
11     public StateMachine<TherapyPhase> CurrentTherapyPhase =
           TherapyPhase.InitiationPhase;
12
13     public void StepOfMainTherapy() { /* behavior left out for brevity */ }
14     public void ShutdownMotors() {
15         VenousTubingValve.CloseValve();
16         TimeStepsLeft = 0;
17         ArterialBloodPump.SpeedOfMotor = 0;
18         UltraFiltrationPump.PumpSpeed = 0;
19         PumpToBalanceChamber.PumpSpeed = 0;
20         DialyzingFluidPreparation.PumpSpeed = 0;
21     }
22     public override void Update() {
23         CurrentTherapyPhase.Transition(
24             from: TherapyPhase.InitiationPhase,
25             to: TherapyPhase.InitiationPhase,
26             guard: TimeStepsLeft > 0 &&
                   !VenousSafetyDetector.DetectedGasOrContaminatedBlood,
27             action: StepOfMainTherapy
28         )
29         .Transition(
30             from: TherapyPhase.InitiationPhase,
31             to: TherapyPhase.EndingPhase,
32             guard: TimeStepsLeft <= 0 ||
                   VenousSafetyDetector.DetectedGasOrContaminatedBlood,
33             action: ShutdownMotors
34         );
35     }
36 }

First, the possible therapy phases InitiationPhase and EndingPhase are declared in the enumeration in lines 1-4. In the following lines, the ControlSystem is declared. Initially, the remaining time of the dialysis is 6 hours, and a constant time of 1 hour passes between two macro steps. Therefore, the class contains the field TimeStepsLeft of type int with the initial value 6 in line 6. The S#-DSL provides a generic state machine whose states are determined by instantiating the generic type StateMachine<T> with an enumeration T of the desired states. One such state machine is CurrentTherapyPhase in line 11, whose states are given by the enumeration TherapyPhase. The initial active state is set to InitiationPhase. In lines 8 and 9, the ControlSystem contains references to components that are not part of the ControlSystem itself, like VenousSafetyDetector of type VenousSafetyDetector. The concrete instances of the references are set in the constructor, which is not shown in the excerpt for brevity. The references are marked as read-only because they cannot be changed during model checking. The class ControlSystem also contains the methods StepOfMainTherapy and ShutdownMotors, whose behavior uses the previously declared references. Methods may be called from state machines or from other methods. Finally, the Update-method of ControlSystem contains two transitions of the state machine CurrentTherapyPhase. The first transition is a reflexive transition from the initial state InitiationPhase to itself. A transition is usable when the active state of the state machine is the from-state of the transition and the guard of the transition evaluates to true. Each time the Update-method is called, the state machine selects an arbitrary usable transition, executes its action, and sets its to-state as the next active state. The action can be any method of the containing component.
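The transition semantics can be imitated in a few lines of plain C#. The sketch is hypothetical and does not use the S# StateMachine<T> API: states are strings, and each transition is a tuple of source, target, guard, and effect. Because the body of StepOfMainTherapy is not shown, it is assumed here to consume one time step by decrementing TimeStepsLeft; the detector is assumed never to fire. Since the two guards are mutually exclusive, taking the first usable transition is equivalent to choosing an arbitrary one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain C# imitation of guarded transitions (not the S# StateMachine<T> API).
string active = "InitiationPhase";           // initial active state
int timeStepsLeft = 6;                       // TimeStepsLeft: 6 one-hour steps
bool detectedGasOrContaminatedBlood = false; // assumption: the detector never fires here

var transitions = new List<(string From, string To, Func<bool> Guard, Action Effect)>
{
    // Reflexive transition; StepOfMainTherapy is assumed to consume one time step.
    ("InitiationPhase", "InitiationPhase",
        () => timeStepsLeft > 0 && !detectedGasOrContaminatedBlood,
        () => timeStepsLeft--),
    // Switch to the ending phase; the effect mirrors the TimeStepsLeft part of ShutdownMotors.
    ("InitiationPhase", "EndingPhase",
        () => timeStepsLeft <= 0 || detectedGasOrContaminatedBlood,
        () => timeStepsLeft = 0),
};

void Update()
{
    // A transition is usable if its source is the active state and its guard holds.
    var usable = transitions.Where(t => t.From == active && t.Guard()).ToList();
    if (usable.Count == 0) return; // nothing usable: the active state stays unchanged
    var chosen = usable[0];        // the guards are exclusive, so the choice is irrelevant
    chosen.Effect();
    active = chosen.To;
}

for (int step = 1; step <= 8; step++)
{
    Update();
    Console.WriteLine($"after step {step}: {active}, TimeStepsLeft = {timeStepsLeft}");
}
```

Under these assumptions, the machine stays in InitiationPhase for six steps, switches to EndingPhase in the seventh, and has no usable transition afterwards.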

4.4.3. Flow Concept

The model of the control system would be sufficient to verify whether the control system fulfills a particular specification. However, non-controller components must also be included in the model to conduct a complete safety analysis. Typically, a controller makes assumptions about its environment, and these assumptions may be wrong: the perceived state may differ from the actual state, especially when sensors fail or deliver erroneous values. Furthermore, hazards are typically expressed in terms of a system's environment, e.g., contaminated blood entering the vein of the patient.

To express this hazard in S#, the relevant parts of the controller's environment have to be included in the model. In addition, modeling the environment makes it possible to calculate the probability of a hazard. The fluid flows need to be expressed for an adequate model of the non-controller elements of the hemodialysis machine and their interconnections. Fluid flows obey complex physical laws such as Bernoulli's principle. Fortunately, to create a useful model of many fluid flows, it is not necessary to take all details of these laws into account; it is sufficient to model the basic principles necessary for the case study. For the subsequent model checking, it is more important that the calculations can be performed efficiently. Of course, the better the hydrodynamic laws are represented, the better the results. This subsection demonstrates how simple fluid flows can be modeled efficiently in S#. The implementation is generic and can be reused for any acyclic fluid flow where no backflow is possible, i.e., the direction in which the fluids flow is fixed. Finally, this subsection presents the model of the pump for dialyzing fluid used in the hemodialysis machine.


Figure 4.7.: Example of a simple fluid flow illustrating the flow concept

Our model separates the information on how each flow component works from how all flow components form a common flow. The flow between these components is only declared by connecting them. This separation increases the reuse of any modeled flow component, which corresponds to the design pattern “low coupling”. By contrast, models that encode assumptions about the flow in each component are harder to understand and to revise, and they are thus more likely to contain bugs. Imagine that a fluid flow from a water supply to a drain should be modeled. A pump that sits between the water supply and the drain determines the exact amount of water that flows in each step. It is desirable to treat the water supply, the pump, and the drain as separate, independent flow components. The challenge is to create an adequate model (where the pump determines the amount of fluid emitted by the water supply) and to adhere to the low coupling paradigm at the same time. When a pump adds dynamic pressure to a flow system, a flow with a certain speed results from the law of conservation of energy. The basic idea of our simplified model is to see a fluid flow as a bidirectional flow (see figure 4.7). In the backward direction, there is a suction representing the added dynamic pressure. This pressure determines the emitted fluid. In the forward direction, there is a flow of a specific amount of fluid, which results from the compensation of the added dynamic pressure. In the example, the pump exerts a suction on the water supply, which is its predecessor in the flow. Based on the incoming suction, the water supply can determine the exact amount of fluid to emit in the direction of the pump. To calculate the amount of fluid that arrives at the drain, the following sequence is executed:

1. The Drain notifies its predecessor that it is able to receive any amount of fluid (Send Backward = any amount).

2. The Pump receives the suction information of the Drain (= suction y). It calculates the suction x based on suction y. In this case, y = "any amount of fluid", so x is set to the number of units of fluid the Pump moves in a time step. Then, the Pump notifies its predecessor that it wants x units of fluid (Update Backward = suction x).

3. The Water Supply receives the suction information of the Pump (Received Backward = suction x).

4. The Water Supply emits fluid. It knows the amount of fluid to emit from the previously received suction (Send Forward = x units fluid).

5. The Pump forwards the received fluid to its successor (Update Forward = x units fluid).

6. The Drain receives the fluid (Received Forward = x units fluid).

Figure 4.8.: Internal Block Diagram of simple fluid flow

This concept might seem over-engineered for this small example, but adhering to it allows implementing a pump in a similar way as the safety valve, the drip chambers, or any other fluid component through which a flow runs. On the other hand, it is simple enough to allow fast calculations, which is essential for fast model checking. Adhering more closely to hydrodynamic laws would provide better results but would slow down the analysis significantly, because the pressure equilibrium would have to be calculated using iterative methods.
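The six steps above can be mimicked in a few lines of Python. This is a hypothetical, simplified sketch, not S# code: the classes only mirror the roles of WaterSupply, Pump, and Drain, `None` stands for the suction "any amount of fluid", the pump's suction rule follows the SetMainFlowSuction method of the Pump model in subsection 4.4.5, and a supply that receives no custom suction is assumed to emit nothing. The pump speeds are arbitrary example values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Suction:
    # None models "any amount of fluid" (source-dependent suction);
    # an int models a custom suction of that many units per step.
    custom_value: Optional[int]


class WaterSupply:
    def __init__(self) -> None:
        self.requested = 0

    def receive_backward(self, suction: Suction) -> None:
        # Assumption: without a custom suction, nothing is drawn from the supply.
        self.requested = suction.custom_value if suction.custom_value is not None else 0

    def send_forward(self) -> int:
        # The supply emits exactly the amount requested by the received suction.
        return self.requested


class Pump:
    def __init__(self, pump_speed: int) -> None:
        self.pump_speed = pump_speed

    def update_backward(self, from_successor: Suction) -> Suction:
        # Mirrors SetMainFlowSuction: add the pump speed to the successor's suction.
        if from_successor.custom_value is None:
            return Suction(self.pump_speed)
        return Suction(from_successor.custom_value + self.pump_speed)

    def update_forward(self, from_predecessor: int) -> int:
        # Mirrors SetMainFlow: incoming fluid is forwarded unchanged.
        return from_predecessor


class Drain:
    def __init__(self) -> None:
        self.received = 0

    def send_backward(self) -> Suction:
        # The drain accepts any amount of fluid.
        return Suction(None)

    def receive_forward(self, fluid: int) -> None:
        self.received = fluid


def simulate_step(supply: WaterSupply, inner: List[Pump], drain: Drain) -> int:
    """One time step of an acyclic, linear flow: suction backward, then fluid forward."""
    suction = drain.send_backward()            # step 1
    for component in reversed(inner):          # step 2, once per inner component
        suction = component.update_backward(suction)
    supply.receive_backward(suction)           # step 3
    fluid = supply.send_forward()              # step 4
    for component in inner:                    # step 5, once per inner component
        fluid = component.update_forward(fluid)
    drain.receive_forward(fluid)               # step 6
    return drain.received


print(simulate_step(WaterSupply(), [Pump(pump_speed=3)], Drain()))  # 3
```

Because each component only talks to its direct neighbors, a second pump can be inserted without changing any other class; under this sketch's suction rule, two pumps in series with speeds 2 and 3 deliver 5 units per step.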

4.4.4. Modeling Fluid Flows

The SysML Internal Block Diagram in figure 4.8 depicts the structure of the simple fluid flow model of figure 4.7 in S#. The diagram declares each flow component as a separate component declaration with its behavior. An instance of a component declaration is denoted by InstanceName : DeclarationName, where InstanceName is optional. Each component in the example contains a port named MainFlow, but the type of this port differs per component: WaterSupply has a port of type FlowSource with the property Outgoing, Drain a port of type FlowSink with the property Incoming, and Pump a port of type FlowInToOut with both properties. A flow is established by connecting the Outgoing properties with the Incoming properties; the dashed arrows depict these connections. The following listing presents an excerpt of the S# code that creates the simple fluid flow:

var supply = new WaterSupply();
var pump = new Pump();
var drain = new Drain();
var combinator = new DialyzingFluidFlowCombinator();
pump.PumpSpeed = ;
combinator.ConnectOutWithIn(supply.MainFlow, pump.MainFlow);
combinator.ConnectOutWithIn(pump.MainFlow, drain.MainFlow);
combinator.CommitFlow();

The classes WaterSupply, Pump, and Drain contain generic "templates" of how water supplies, pumps, and drains work, respectively. These generic templates need to be instantiated to use them in a specific flow. The flow components supply, pump, and drain are instantiated by calling the constructors of their corresponding classes. A DialyzingFluidFlowCombinator is instantiated to establish the flow between the instances of the flow components. The speed of the pump is set to  units per step. Whether a unit should be interpreted as a liter, 250 milliliters, or something else is the responsibility of the modeler; in the hemodialysis model, one unit of fluid corresponds to 100 milliliters. At some places, fluid flows are split into sub-flows (e.g., before pressure transducers, recall figure 2.3 on page 11).

Figure 4.9.: Internal Block Diagram of a model in which a flow splits

Figure 4.9 shows an example where a flow splits into two sub-flows, each of which ends in a Drain. Such splitting flows are realized with ports of the type FlowSplitter. A FlowSplitter has multiple Outgoing properties, which can be connected to the Incoming properties of ports in different flow components. To merge several flows, FlowMerger has also been implemented. FlowMerger is dual to FlowSplitter and is not described here any further.

4.4.5. Modeling a Pump

The following listing contains the model of the pump, which is instantiated twice in the full model, namely as PumpToBalanceChamber and as UltraFiltrationPump (see figure 4.6):

public class Pump : Component {
    public readonly FlowInToOut<DialyzingFluid, Suction> MainFlow;
    public int PumpSpeed = ;

    public Pump() {
        MainFlow = new FlowInToOut<DialyzingFluid, Suction>();
        // Called when a suction arrives from the successor.
        MainFlow.UpdateBackward = SetMainFlowSuction;
        // Called when fluid arrives from the predecessor.
        MainFlow.UpdateForward = SetMainFlow;
    }

    // Incoming fluid is forwarded unchanged.
    public DialyzingFluid SetMainFlow(DialyzingFluid fromPredecessor) { return fromPredecessor; }

    public virtual Suction SetMainFlowSuction(Suction fromSuccessor) {
        Suction toPredecessor;
        toPredecessor.SuctionType = SuctionType.CustomSuction;
        if (fromSuccessor.SuctionType == SuctionType.SourceDependentSuction)
            toPredecessor.CustomSuctionValue = PumpSpeed;
        else
            // Add the pump's own suction to the suction requested by the successor.
            toPredecessor.CustomSuctionValue =
                fromSuccessor.CustomSuctionValue + PumpSpeed;
        return toPredecessor;
    }
}

The port MainFlow contains the two delegates UpdateBackward and UpdateForward, which determine the methods to call when a suction is received from the successor and when a fluid element is received from the predecessor, respectively. In the constructor, the methods SetMainFlowSuction and SetMainFlow are assigned to these delegates. Every time the port receives a suction, the method SetMainFlowSuction is called. It creates a suction of the size PumpSpeed on the predecessor, increased by the custom suction of the successor if the successor requests one. Incoming fluids are simply forwarded to the successor.

4.4.6. Probabilistic Safety Analysis

As mentioned earlier, faults may lead to a situation in which the perceived state of a controller differs from the actual state. The following listing extends the model of the pump with the fault PumpDefect and its effect:

public class Pump : Component {
    /* code as above */

    public readonly Fault PumpDefect = new PermanentFault();

    // While PumpDefect is active, this override replaces SetMainFlowSuction:
    // the suction no longer depends on PumpSpeed or on the successor.
    [FaultEffect(Fault = nameof(PumpDefect))]
    public class PumpDefectEffect : Pump {
        public override Suction SetMainFlowSuction(Suction fromSuccessor) {
            Suction toPredecessor;
            toPredecessor.SuctionType = SuctionType.CustomSuction;
            toPredecessor.CustomSuctionValue = ;
            return toPredecessor;
        }
    }
}

Table 4.1 provides the probabilities of the faults described in section 2.3. The current fault probabilities in the model are speculative and only serve to illustrate the feasibility of the approach.



Fault | Type | Probability
BloodPumpDefect | permanent, per-time | 1.0 × 10⁻⁵
DialyzerMembraneRupturesFault | permanent, per-time | 1.0 × 10⁻⁵
DialyzingFluidPreparationPumpDefect | permanent, per-demand | 1.0 × 10⁻⁵
SafetyBypassFault | permanent, per-time | 1.0 × 10⁻³
WaterHeaterDefect | permanent, per-time | 1.0 × 10⁻²
PumpToBalanceChamberDefect | permanent, per-time | 1.0 × 10⁻⁵
SafetyDetectorDefect | permanent, per-time | 1.0 × 10⁻⁷
ValveDoesNotClose | permanent, per-time | 1.0 × 10⁻⁵
UltrafiltrationPumpDefect | permanent, per-time | 1.0 × 10⁻³

Table 4.1.: Fault probabilities in the hemodialysis case study

To calculate the probabilities of a blood contamination and of an unsuccessful dialysis, S#'s SafetySharpModelChecker.CalculateProbabilityToReachState can be used; it returns 5.27 × 10⁻² and 3.50 × 10⁻³, respectively. Later, chapter 9 analyzes the safety in greater detail by considering the effects of different fault probabilities.

4.5. Lustre Model of Pressure Tank

Most algorithms for the safety analysis are agnostic of the modeling language. Therefore, any programming language that satisfies certain conditions can be used as an executable modeling language; Lustre is such a language. Chapter 7 provides the technical preconditions and details how this can be achieved.

Lustre is a synchronous, data-flow-oriented programming language [JRH18]. In such a synchronous language, time is discretized, a complete cycle of the program is executed in each discrete time step, and the program reacts instantaneously to its inputs. To be able to run such a synchronous program on specific hardware, the developer must ensure that the target hardware can execute a program cycle fast enough. The step semantics of Lustre fits well with the model of computation described in section 4.2: one cycle of a Lustre program corresponds to a macro step of the executable model.

In a data-flow-oriented language, the output of a program is given by a set of equations. Each of these equations can depend on other equations as long as these dependencies are acyclic. Therefore, a data flow can be visualized as a network of operators, where each operator has a (possibly empty) set of inputs and outputs. Scade, a data-flow-oriented language based on Lustre, provides both a graphical and a textual way to create data-flow programs.
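The synchronous step semantics can be illustrated with a small Python sketch. It simulates the hypothetical Lustre equation `n = 0 -> if reset then 0 else pre(n) + 1;`, which is not part of the pressure tank model: each loop iteration is one synchronous cycle, `pre(n)` holds the value of `n` from the previous cycle (nil, modeled as `None`, in the first cycle), and `->` selects its left operand in the first cycle and its right operand in all later cycles.

```python
from typing import List, Optional


def run_counter(resets: List[bool]) -> List[int]:
    """Simulate the hypothetical Lustre equation
         n = 0 -> if reset then 0 else pre(n) + 1;
    one synchronous cycle per input value."""
    outputs: List[int] = []
    pre_n: Optional[int] = None  # pre(n) is nil in the first cycle
    for cycle, reset in enumerate(resets):
        if cycle == 0:
            n = 0  # first cycle: left operand of ->
        else:
            # later cycles: right operand of ->, which may safely use pre(n)
            assert pre_n is not None
            n = 0 if reset else pre_n + 1
        outputs.append(n)
        pre_n = n  # becomes pre(n) in the next cycle
    return outputs


print(run_counter([False, False, True, False, False]))  # [0, 1, 0, 1, 2]
```

Because `->` selects its left operand in the first cycle, the nil value of `pre(n)` never influences the output.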


The following listing contains an extended Lustre model of the pressure tank (see section 2.4 on page 13). The time between two macro steps is 5 seconds.

 1  const
 2    max_time= ;
 3    max_level= ;
 4    min_sensed_pressure= ;
 5
 6  node TANK(fault_k , fault_k , fault_sensor: bool) returns (level: int);
 7  var
 8    sensor, switch, k , k , timer: bool; -- these are the contacts
 9    c _c : bool;
10    time: int;
11  let
12    sensor = true ->
13      if (level >= min_sensed_pressure and not fault_sensor) then false
14      else true;
15
16    switch = true -> false;
17
18    c _c = pre(switch) or (false -> pre(k ));
19
20    k = false -> (pre(timer) and c _c ) or (pre(k ) and fault_k );
21
22    k = false -> (pre(sensor) and c _c ) or (pre(k ) and fault_k );
23
24    time = ->
25      -- timer is energized
26      if pre(sensor) and c _c then
27        if pre(time) >= max_time then max_time else pre(time) +
28      -- timer is not energized
29      else
30        ;
31
32    timer = time < max_time;
33
34    level = ->
35      -- pump is energized
36      if k then
37        if pre(level) >= max_level then max_level else pre(level) +
38      -- pump is not energized
39      else
40        ;
41  tel

Three constants are defined in lines 1-4: max_time gives the maximal time the timer stays closed when it is energized, max_level gives the maximal level of the pressure tank, and min_sensed_pressure gives the minimal pressure level at which the sensor opens. The model itself is given by the node TANK. The node has three inputs fault_k , fault_k , and fault_sensor of type bool for the three faults, each of which is either true or false in a macro step. The node generates one output level of type int that contains the pressure level after each macro step. The state variables of the node are declared in lines 7-10. The variables sensor, switch, … indicate the states of the contacts of the corresponding components, which can be either closed (true) or open (false). The auxiliary variable c _c indicates whether the switch or the relay K1 was closed in the previous macro step. The variable time contains how long the timer has been energized.

The value of each variable is determined exactly once. The order of the equations does not matter, because Lustre determines the evaluation order by itself. This is possible because there are no cyclic dependencies. To break dependencies, the unary previous operator pre can be used: it evaluates its parameter using the values calculated in the previous step. In the first step, the pre value of a variable is the undefined value nil. The binary initialization operator -> can be used to initialize variables: in the first step, the value on the left side of the operator is used, and in all other steps the value on the right side. In Lustre, the conditional if then else is an expression in the functional sense, i.e., it always returns a value.

The behavior of the node is specified by the equations between let and tel (lines 11-41). The first equation is already an excellent example: due to the initialization operator, it evaluates to true in the first step, and it depends on the variable level, which is defined later in the listing but evaluated earlier.

Table 4.2 shows an exemplary execution of the pressure tank model. To improve readability, true and false are abbreviated to ⊤ and a dot, respectively. In step 9 of the depicted scenario, the sensor did not react to the high pressure level due to a fault. Even though the timer, the safety mechanism of the system, opened in step 10, the relay K1 did not open, due to another fault. Therefore, a pressure level of 60 was reached in step 13. This scenario is thus a witness that a pressure level of 60 is possible when faults occur.

Step | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
fault_k | . | . | . | . | . | . | . | . | . | ⊤ | ⊤ | ⊤ | ⊤
fault_k | . | . | . | . | . | . | . | . | . | . | . | . | ⊤
fault_sensor | . | . | . | . | . | . | . | . | ⊤ | ⊤ | ⊤ | ⊤ | ⊤
level | 0 | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60
sensor | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤
switch | ⊤ | . | . | . | . | . | . | . | . | . | . | . | .
k | . | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤
k | . | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤
timer | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | . | . | . | .
c _c | nil | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤ | ⊤
time | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 9 | 9 | 9

Table 4.2.: Example of pressure tank model execution

In contrast to S#, Lustre has no special means for fault injection. Therefore, faults have to be integrated in a way that ensures that the observable behavior equals the nominal behavior when all faults are permanently false. One technique to achieve this is conservative integration [HG+12].

For the probabilistic safety analysis of the pressure tank model, the hazard "level ≥ 60" was used in conjunction with the fault probabilities given in table 4.3. The resulting probability that the hazard occurs within the first 25 steps is 3.00 × 10⁻⁵.

Fault | Type | Probability
fault_k | permanent, per-time | 3.0 × 10⁻⁶
fault_k | permanent, per-time | 3.0 × 10⁻⁶
fault_sensor | permanent, per-time | 1.0 × 10⁻⁵

Table 4.3.: Fault probabilities in the pressure tank case study

In the prototypical implementation, the hazard formula and the fault probabilities have been hard-coded, but the implementation is straightforward to extend: fault probabilities could be given in the const section of a Lustre model, and a hazard could be encoded as a Lustre node with the system output as its input and a single bool as its output. Note that such a hazard node must not use the previous operator or the initialization operator, because hazards should correspond to propositional logic formulas.

4.6. Discussion and Related Work


This section summarizes the experiences from modeling and analyzing the aforementioned case studies. These experiences are also related to various other tools and approaches for modeling and analyzing safety-critical systems.

Modeling languages do not have to be invented from scratch. With the unconventional choice of using the C# programming language as the foundation of the S# modeling language, S# benefits from the experience of C#'s language and tool designers: S# inherits C#'s mature imperative, object-oriented language features and tooling support at the cost of minor syntactic overhead in the models. To make further modeling paradigms available, a textual notation for state machines was embedded into S#, which required no extensions to C#'s syntax. It was also not necessary to introduce new syntax for probabilistic and nondeterministic choice; the Choose method of the Component class provides an intuitive way to express both. For systems that are best modeled as dataflow systems, adequate languages are also already available, as demonstrated by the pressure tank model in Lustre. This thesis shows how existing programming languages can be used as modeling languages and still be analyzed probabilistically.

Use the most adequate modeling paradigms for the best model quality. There are several modeling paradigms to express the dynamic behavior in formal models. The Lustre language is an instance of the dataflow paradigm, where the data flows from inputs to outputs in a network of interconnected components [Ber07]. An example of the imperative paradigm is SPIN's input language Promela, where behavior is expressed sequentially in a C-like syntax [Hol04]. Modelica follows the equation paradigm, where the relations of different components are expressed as physical equations [Mod14]. Languages like SAML are based on the state machine paradigm, where behavior is expressed as transitions between states [LSO12].

It makes sense to include different modeling paradigms in one modeling language and thus allow models to use the best of several worlds. For instance, the modeling languages of Compass and Scade support both dataflows and state machines [Nol15; Ber07]. In S#, on the other hand, behavior can be modeled both imperatively and with state machines. Furthermore, object-oriented methods can be used to emulate flows, as demonstrated by the hemodialysis machine. The model of the height control system uses imperative modeling language features, whereas state machines are used for the onboard computer of the train [ISSE18]. Dataflows are appropriate to model the non-controller parts of the hemodialysis machine; however, adding support for equations to S#, similar to Modelica, would likely make the models more comprehensible.

There are several paradigms to express the structure of models. Most component-oriented modeling languages like Compass and Scade depend on a rigorous part-of hierarchy of the components, where no component may directly access another component that is not a physical part of it. In S# and SysML, by contrast, components may also have references to other components that are not physically part of them. This allows for greater flexibility while modeling: in the listing of the hemodialysis machine controller in subsection 4.4.2 on page 43, for instance, the controller can access the pump and the pressure sensor using references even though they are not physically part of the controller; the actual implementation of the controller is likely to work in a similar way. In modeling languages that insist on a rigorous part-of hierarchy, buses and networking components have to be included in the models, which increases clutter even though a higher level of abstraction would suffice.


Det. Polhaus: Heavy. What is it?
Sam Spade: The, uh, stuff that dreams are made of.

– The Maltese Falcon (1941)

5. Probabilistic Systems

In the literature, various formal underpinnings have been proposed to reason about the probability of specific events in a dynamically behaving system. In this thesis, such formal underpinnings are summarized under the term probabilistic systems. This chapter introduces two probabilistic systems, namely generic Markov chains and choice-aware Markov decision processes, which are later used for the probabilistic safety analysis of executable models.

Roughly speaking, a Markov chain (MC) contains for each state a probability distribution over potential successor states. MCs can be used to analyze purely probabilistic models, i.e., models without nondeterministic choice. Section 5.1 describes generic Markov chains, a slightly generalized version of MCs that has been developed for this thesis to suit executable models better than standard MCs.

Markov decision processes (MDPs) combine nondeterministic choice and probabilistic choice: for each state, an MDP contains a nondeterministic choice between different probability distributions over potential successor states. An intuitive way to look at a transition in an MDP is that first one probability distribution of the active state is selected nondeterministically, and then this probability distribution is used to select a successor state probabilistically. Choice-aware Markov decision processes (CMDPs) are a major extension of the well-known MDPs: to define the transitions of a state, there may be multiple consecutive nondeterministic and probabilistic choices in any order, in contrast to MDPs, which are limited to exactly one nondeterministic choice followed by exactly one probabilistic choice per transition. In CMDPs, the choices are "saved" explicitly in the structure (hence "choice-aware"). CMDPs enable the probabilistic analysis of executable models that contain both probabilistic and nondeterministic choices; they are described in more detail in section 5.2.

Both section 5.1 (discussing generic MCs) and section 5.2 (discussing generic CMDPs) have the same outline. This parallel structure is useful because the definitions and proofs of the generic CMDPs are based on those of the generic MCs. First, the probabilistic systems are introduced formally. Afterwards, paths and a probability measure based on these paths are defined for the respective system. Then, iteration operators to calculate the probability of certain events in a bounded number of steps are presented and proved to be correct.¹ Thereafter, calculating the probability of certain events in an unbounded number of steps is discussed, and proofs are provided that the operators of the bounded case can also be used to approximate the unbounded case. Finally, the probabilistic systems are compared with their standard variants using evaluations of the case studies. Section 5.3 discusses related work on probabilistic systems.

Generic MCs have been published in [LK+17]. This thesis provides the first description of CMDPs. Standard Markov chains and standard Markov decision processes have been discussed by many authors before, e.g., in [BK08; Bai98]. The definitions of generic MCs and CMDPs are structurally close to their standard counterparts. The proofs of this thesis that the iteration operators can be used to calculate the unbounded reachability in generic MCs and CMDPs reuse proof ideas from Baier, who made a similar proof for standard MCs and standard MDPs [Bai98]. Some of the less interesting but more technical proofs can be found in the supplementary document in the S# repository [ISSE18].

5.1. Generic Markov Chains

As the first probabilistic system, generic Markov chains (generic MCs/GMCs), which are suited to model purely probabilistic systems, are introduced.

Definition 5.1 (Generic Markov Chain). A generic Markov chain $\mathcal{M}$ is represented by the tuple $(S, \Theta, \tau, R)$ consisting of

• a countable set $S$ of states,
• a countable set $\Theta$ of targets with a target state function $\tau : \Theta \to S$, and
• a transition distribution function $R : S \to \mathit{Dists}(\Theta)$,

where $\mathit{Dists}(\Theta) = \{\mu : \Theta \to [0, 1] \mid \sum_{\theta \in \Theta} \mu(\theta) = 1\}$.
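Definition 5.1 maps directly onto a small data structure. The following Python sketch is a hypothetical illustration, not part of S#: distributions are stored sparsely as dictionaries (targets with probability 0 are omitted), and `validate` checks the conditions of the definition up to floating-point tolerance. The instance for figure 5.1 uses the probabilities that appear in Examples 5.4 and 5.5; the text does not state $\tau(\theta_3)$, so mapping $\theta_3$ to $s_2$ is an assumption, and $\theta_4$ is taken to carry no label.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable

State = Hashable
Target = Hashable


@dataclass
class GenericMC:
    """A generic Markov chain (S, Θ, τ, R); Θ is given by the keys of tau."""
    states: FrozenSet[State]
    tau: Dict[Target, State]             # target state function τ : Θ → S
    R: Dict[State, Dict[Target, float]]  # R(s) = µ_s, stored sparsely (zero entries omitted)

    def validate(self) -> None:
        """Raise ValueError unless τ maps into S and every µ_s is a distribution."""
        for t, s in self.tau.items():
            if s not in self.states:
                raise ValueError(f"tau({t!r}) = {s!r} is not a state")
        for s in self.states:
            mu = self.R.get(s, {})
            for t, p in mu.items():
                if t not in self.tau:
                    raise ValueError(f"mu_{s} uses unknown target {t!r}")
                if not 0.0 <= p <= 1.0:
                    raise ValueError(f"mu_{s}({t!r}) = {p} is not in [0, 1]")
            if not math.isclose(sum(mu.values()), 1.0):
                raise ValueError(f"mu_{s} does not sum to 1")


def standard_mc(R: Dict[State, Dict[State, float]]) -> GenericMC:
    """Standard Markov chain: the states are the targets and τ is the identity."""
    states = frozenset(R)
    return GenericMC(states, {s: s for s in states}, R)


# Figure 5.1; t1..t5 stand for θ1..θ5. τ(θ3) = "s2" is an assumption.
FIG_5_1 = GenericMC(
    states=frozenset({"s1", "s2", "s3"}),
    tau={"t1": "s1", "t2": "s2", "t3": "s2", "t4": "s3", "t5": "s3"},
    R={"s1": {"t1": 0.1, "t2": 0.2, "t5": 0.7}, "s2": {"t3": 1.0}, "s3": {"t4": 1.0}},
)
LABELS = {"t1": {"e0"}, "t2": {"e0"}, "t3": {"e0", "e1"}, "t4": set(), "t5": {"e0"}}
# As an L-labeled MC with L = 2^{e0, e1}, each target is the pair (labels, target state).
LABELED_TARGETS = {t: (frozenset(LABELS[t]), FIG_5_1.tau[t]) for t in LABELS}

FIG_5_1.validate()
```

Keeping the targets separate from the states is what allows two different transitions into the same state, such as $\theta_4$ and $\theta_5$ into $s_3$, to be distinguished.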

Let $\mu_s$ be a shorthand for $R(s)$. By convention, $S^{\mathcal{M}}$ stands for the states of $\mathcal{M}$, $\Theta^{\mathcal{M}}$ for its targets, and so on. This notation is used in the remainder when the corresponding generic Markov chain is not clear from the context.

By choosing the states as targets, i.e., $\Theta = S$, and the identity function as $\tau$, one obtains the probabilistic system commonly known as Markov chains. Such Markov chains are called standard Markov chains in the remainder.

This work introduces another useful manifestation of generic Markov chains, the so-called labeled Markov chains. An $L$-labeled Markov chain is a generic Markov chain where $\Theta$ is chosen to be $L \times S$ for some set of labels $L$, with $\tau(\ell, s) = s$. In contrast to approaches where states are labeled, the labeling belongs to the transitions. Shifting the labeling onto the transitions is one reason for the efficiency of the algorithms in this thesis, as discussed in more detail in the evaluation section.

¹ The implementations in chapter 6 are based on these operators.

target | labels
θ1 | e0
θ2 | e0
θ3 | e0, e1
θ4 | (none)
θ5 | e0

Figure 5.1.: Labeled Markov chain with 3 states

Figure 5.1 depicts an example of a $2^{\{e_0, e_1\}}$-labeled MC. The square nodes represent the states, e.g., node 1 represents $s_1$. The distribution function of a state is depicted by the white-headed outgoing arcs of its node. Each arc is associated with a target, e.g., the reflexive arc of state 1 is associated with $\theta_1$. The node an arc points to corresponds to the target state of its target, e.g., $\tau(\theta_1) = s_1$. The labels of each target are given in the table. This labeled Markov chain can be used to calculate the probability that a target carrying a label such as $e_0$ is eventually reached from a specific state. How this can be formally expressed and calculated is the topic of the remainder of this section.

5.1.1. Paths and Probability Measure

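Let $\mathcal{M} = (S, \Theta, \tau, R)$ be a generic MC. A finite path $\pi = \theta_1 \ldots \theta_n \in \Theta^*$ of $\mathcal{M}$ is a sequence of targets where $\mu_{\tau(\theta_i)}(\theta_{i+1}) > 0$ for all $1 \le i < n$. The empty sequence, denoted $\varnothing$, is also a valid finite path. $\mathit{FinPaths}^{\mathcal{M}}$ is the set of all finite paths of $\mathcal{M}$. An anchored finite path $\pi = [s, \theta_1 \theta_2 \ldots \theta_n] \in S \times \mathit{FinPaths}^{\mathcal{M}}$ of $\mathcal{M}$ is a pair of a state and a finite path where $\mu_s(\theta_1) > 0$. Let $|\pi| = n$ be the length of $\pi = [s, \theta_1 \theta_2 \ldots \theta_n]$. $\overline{\mathit{FinPaths}}^{\mathcal{M}}$ is the set of all anchored finite paths of $\mathcal{M}$, and $\overline{\mathit{FinPaths}}^{\mathcal{M}}_s = (\{s\} \times \mathit{FinPaths}^{\mathcal{M}}) \cap \overline{\mathit{FinPaths}}^{\mathcal{M}}$ is the set of all anchored finite paths starting from $s$. Let $\overline{\tau} : \overline{\mathit{FinPaths}}^{\mathcal{M}} \to S$ be the function that returns the last state of an anchored finite path, i.e., $\overline{\tau}([s, \varnothing]) = s$, and $\overline{\tau}([s, \theta_1 \ldots \theta_n]) = \tau(\theta_n)$ otherwise. Given an anchored finite path $\pi = [s, \theta_1 \ldots \theta_n]$ and a target $\theta$, let $\pi\theta = [s, \theta_1 \ldots \theta_n \theta]$ be the concatenation of the path with the target. Sometimes only a subset $\Pi \subseteq \overline{\mathit{FinPaths}}^{\mathcal{M}}$ is of interest; such a subset $\Pi$ is then called the paths of interest.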

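An infinite path $\pi = \theta_1 \theta_2 \theta_3 \ldots \in \Theta^\omega$ of $\mathcal{M}$ is a sequence of targets where $\mu_{\tau(\theta_i)}(\theta_{i+1}) > 0$ for all $i \ge 1$. $\mathit{InfPaths}^{\mathcal{M}}$ is the set of all infinite paths of $\mathcal{M}$. An anchored infinite path $\pi = [s, \theta_1 \theta_2 \theta_3 \ldots] \in S \times \mathit{InfPaths}^{\mathcal{M}}$ of $\mathcal{M}$ is a pair of a state and an infinite path where $\mu_s(\theta_1) > 0$. $\overline{\mathit{InfPaths}}^{\mathcal{M}}$ is the set of all anchored infinite paths of $\mathcal{M}$, and $\overline{\mathit{InfPaths}}^{\mathcal{M}}_s = (\{s\} \times \mathit{InfPaths}^{\mathcal{M}}) \cap \overline{\mathit{InfPaths}}^{\mathcal{M}}$ is the set of all anchored infinite paths starting from $s$. Let $\pi[..k] = \theta_1 \ldots \theta_k$ denote the $k$-th prefix of $\pi = \theta_1 \theta_2 \theta_3 \ldots$, and let $\mathit{Pref}(\pi) = \{\pi' \mid \pi' \text{ is a prefix of } \pi\}$ denote the set of all prefixes of $\pi$. Note that $\varnothing$ is a prefix of every infinite path. Also, let $\mathit{Pref}([s, \pi]) = \{[s, \pi'] \mid \pi' \in \mathit{Pref}(\pi)\}$ denote the set of all anchored prefixes of $[s, \pi]$.

In the remainder, both finite and infinite paths are sometimes just called paths when it is clear from the context which one is meant.

The cylinder set $\pi^{\uparrow\mathcal{M}}$ of the anchored finite path $\pi \in \overline{\mathit{FinPaths}}^{\mathcal{M}}$ contains all anchored infinite paths of $\mathcal{M}$ that start with $\pi$. More formally, $\pi^{\uparrow\mathcal{M}} = \{\pi' \in \overline{\mathit{InfPaths}}^{\mathcal{M}} \mid \pi \in \mathit{Pref}(\pi')\}$. For $\Pi \subseteq \overline{\mathit{FinPaths}}^{\mathcal{M}}$, let $\Pi^{\uparrow\mathcal{M}} = \bigcup_{\pi \in \Pi} \pi^{\uparrow\mathcal{M}}$. If $\mathcal{M}$ is clear from the context, $\pi^{\uparrow}$ abbreviates $\pi^{\uparrow\mathcal{M}}$, and $\Pi^{\uparrow}$ abbreviates $\Pi^{\uparrow\mathcal{M}}$.

The cylinder sets of a generic Markov chain are the basic elements for defining its probability measure.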

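Definition 5.2 (Probability Space of a Generic Markov Chain starting in a certain state). The probability space $(\Omega^{\mathcal{M}}_s, \mathcal{E}^{\mathcal{M}}_s, \Pr^{\mathcal{M}}_s)$ of a generic Markov chain $\mathcal{M} = (S, \Theta, \tau, R)$ starting in state $s \in S^{\mathcal{M}}$ consists of

• the sample space $\Omega^{\mathcal{M}}_s = \overline{\mathit{InfPaths}}^{\mathcal{M}}_s$, and
• the events $\mathcal{E}^{\mathcal{M}}_s$ as the smallest σ-algebra on $\Omega^{\mathcal{M}}_s$ that contains $\{\pi^{\uparrow} \mid \pi \in \overline{\mathit{FinPaths}}^{\mathcal{M}}_s\}$, and
• the probability measure $\Pr^{\mathcal{M}}_s$ with

$$\Pr^{\mathcal{M}}_s([s, \theta_1 \ldots \theta_n]^{\uparrow}) = \mu_s(\theta_1) \cdot \prod_{1 \le i < n} \mu_{\tau(\theta_i)}(\theta_{i+1}) .$$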
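The probability of a cylinder set is a product along the path. A hypothetical Python sketch for the chain of figure 5.1 (targets t1..t5 stand for $\theta_1$..$\theta_5$; $\tau(\theta_3) = s_2$ is an assumption because the text does not state it):

```python
import math
from typing import Dict, Sequence

# Figure 5.1; t1..t5 stand for θ1..θ5. τ(θ3) = "s2" is an assumption.
TAU: Dict[str, str] = {"t1": "s1", "t2": "s2", "t3": "s2", "t4": "s3", "t5": "s3"}
MU: Dict[str, Dict[str, float]] = {
    "s1": {"t1": 0.1, "t2": 0.2, "t5": 0.7},
    "s2": {"t3": 1.0},
    "s3": {"t4": 1.0},
}


def cylinder_probability(s: str, path: Sequence[str]) -> float:
    """Pr_s([s, θ1 … θn]↑) = µ_s(θ1) · ∏_{1≤i<n} µ_τ(θi)(θ_{i+1}).

    The empty path yields 1, because its cylinder is the whole sample space.
    A sequence of known targets that is not a path of the chain yields 0.
    """
    prob, state = 1.0, s
    for theta in path:
        prob *= MU[state].get(theta, 0.0)  # µ of the current state, 0 outside its support
        state = TAU[theta]                 # move on to the target state τ(θ)
    return prob


print(cylinder_probability("s1", ["t5", "t4", "t4"]))  # 0.7

# Lemma 5.3: split off the first distribution.
path = ["t1", "t2", "t3"]
lhs = cylinder_probability("s1", path)
rhs = MU["s1"][path[0]] * cylinder_probability(TAU[path[0]], path[1:])
print(math.isclose(lhs, rhs))  # True
```

The same function can be used to check numerically that the cylinders of the non-overlapping paths $[s_1, \theta_1]$, $[s_1, \theta_2]$, and $[s_1, \theta_5]$ add up to 1.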

Using this probability measure, the probability of a countable set $\Pi$ of anchored finite paths starting from state $s$ can be calculated, provided that the paths are non-overlapping, i.e., $\pi^{\uparrow} \cap \pi'^{\uparrow} = \emptyset$ for all $\pi, \pi' \in \Pi$ with $\pi \ne \pi'$. This is achieved by summing up the probabilities of the individual anchored finite paths in the set:

$$\Pr^{\mathcal{M}}_s(\Pi^{\uparrow}) = \Pr^{\mathcal{M}}_s\Big(\bigcup_{\pi \in \Pi} \pi^{\uparrow}\Big) = \sum_{\pi \in \Pi} \Pr^{\mathcal{M}}_s(\pi^{\uparrow}) .$$

The non-overlapping property ensures that each path is counted exactly once. To allow this simple summation, $\Pi$ needs to be a subset of $\overline{\mathit{FinPaths}}^{\mathcal{M}}_s$. Otherwise, the set may contain paths starting in different states, whose probabilities are defined in other probability spaces. In general, summing up probabilities from different probability spaces is not well defined; such sums may even exceed 1. For that reason, the probability of an arbitrary set $\Pi \subseteq \overline{\mathit{FinPaths}}^{\mathcal{M}}$ cannot be calculated, but the probability of those paths in $\Pi$ that start in state $s$ can. Therefore, the restriction of $\Pi$ to these paths is defined as $\Pi_s = \Pi \cap \overline{\mathit{FinPaths}}^{\mathcal{M}}_s$.

To conclude this subsection, a lemma that "extracts" the first distribution from an anchored path is given; it is used in later proofs.


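Lemma 5.3. Let $\mathcal{M} = (S, \Theta, \tau, R)$ be a generic Markov chain and $[s, \theta_1 \ldots \theta_n]$ an anchored finite path with $n \ge 1$. Then

$$\Pr^{\mathcal{M}}_s([s, \theta_1 \ldots \theta_n]^{\uparrow\mathcal{M}}) = \mu_s(\theta_1) \cdot \Pr^{\mathcal{M}}_{\tau(\theta_1)}([\tau(\theta_1), \theta_2 \ldots \theta_n]^{\uparrow}) .$$

Proof. For $n = 1$, both sides equal $\mu_s(\theta_1)$, because $[\tau(\theta_1), \varnothing]^{\uparrow}$ is the whole sample space $\Omega^{\mathcal{M}}_{\tau(\theta_1)}$. For $n \ge 2$, expanding the definitions yields

$$
\begin{aligned}
\Pr^{\mathcal{M}}_s([s, \theta_1 \ldots \theta_n]^{\uparrow})
&= \mu_s(\theta_1) \cdot \prod_{1 \le i < n} \mu_{\tau(\theta_i)}(\theta_{i+1}) \\
&= \mu_s(\theta_1) \cdot \mu_{\tau(\theta_1)}(\theta_2) \cdot \prod_{2 \le i < n} \mu_{\tau(\theta_i)}(\theta_{i+1}) \\
&= \mu_s(\theta_1) \cdot \Pr^{\mathcal{M}}_{\tau(\theta_1)}([\tau(\theta_1), \theta_2 \ldots \theta_n]^{\uparrow}) .
\end{aligned}
$$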

5.1.2. Unbounded Reachability

The goal is to compute the probability of reaching targets in $\Theta_2 \subseteq \Theta^{\mathcal{M}}$ while staying in the targets $\Theta_1 \subseteq \Theta^{\mathcal{M}}$ before. Using the temporal until operator $\mathsf{U}$, the finite paths in question can be formally defined as

$$\overline{\mathit{FinPaths}}^{\mathcal{M}}(\Theta_1 \,\mathsf{U}\, \Theta_2) = \{[s, \theta_1 \ldots \theta_n] \in \overline{\mathit{FinPaths}}^{\mathcal{M}} \mid \theta_n \in \Theta_2 \wedge \forall\, 1 \le i < n .\ \theta_i \in \Theta_1\} .$$
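Membership in this set only inspects the target sequence of an anchored path. A minimal Python predicate, assuming the sequence already belongs to a valid anchored path of the chain (function names are illustrative; t1..t5 stand for $\theta_1$..$\theta_5$ of figure 5.1):

```python
from typing import AbstractSet, Sequence


def satisfies_until(path: Sequence[str], theta1: AbstractSet[str], theta2: AbstractSet[str]) -> bool:
    """Until condition of FinPaths(Θ1 U Θ2) for the targets θ1 … θn of an anchored path.

    Assumes the path is already a valid anchored path of the chain; only
    n ≥ 1, θn ∈ Θ2, and θi ∈ Θ1 for all 1 ≤ i < n are checked.
    """
    return len(path) >= 1 and path[-1] in theta2 and all(t in theta1 for t in path[:-1])


def satisfies_eventually(path: Sequence[str], all_targets: AbstractSet[str], theta2: AbstractSet[str]) -> bool:
    """F Θ2 is the special case Θ1 = Θ."""
    return satisfies_until(path, all_targets, theta2)


ALL = {"t1", "t2", "t3", "t4", "t5"}
print(satisfies_eventually(["t1", "t2", "t3"], ALL, {"t3"}))  # True
print(satisfies_until(["t1", "t2", "t3"], {"t2"}, {"t3"}))   # False: t1 is not in Θ1 = {t2}
```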

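The probability of these paths cannot be computed directly as the probability of the set of paths of interest; the paths must first be restricted to those that start in a specific state $s$. Hence, $\Pr^{\mathcal{M}}_s(\Theta_1 \,\mathsf{U}\, \Theta_2)$ abbreviates $\Pr^{\mathcal{M}}_s(\Pi_s^{\uparrow})$ with $\Pi = \overline{\mathit{FinPaths}}^{\mathcal{M}}(\Theta_1 \,\mathsf{U}\, \Theta_2)$. Also, $\Pr^{\mathcal{M}}_s(\mathsf{F}\, \Theta_2)$ abbreviates $\Pr^{\mathcal{M}}_s(\Theta^{\mathcal{M}} \,\mathsf{U}\, \Theta_2)$ for the case without any restriction on the targets taken before a target in $\Theta_2$ is reached, i.e., $\Theta_1 = \Theta^{\mathcal{M}}$.

This subsection presents an algorithm to calculate the probability $\Pr^{\mathcal{M}}_s(\Theta_1 \,\mathsf{U}\, \Theta_2)$. First, however, an auxiliary operator is defined. The auxiliary operator is not intended to be implemented but is used in the proofs; the final operator is derived from it.

Two sets are important in the definition of the auxiliary operator: $\{\theta \mid \pi\theta \in \Pi\}$ is the set of targets that extend a given anchored path $\pi$ to a path in $\Pi \subseteq \overline{\mathit{FinPaths}}^{\mathcal{M}}$, and $\{\theta \mid \pi\theta \notin \Pi\}$ is the set of all other targets. Let $\Pi \subseteq \overline{\mathit{FinPaths}}^{\mathcal{M}}$ be the anchored paths of interest; then the auxiliary (iteration) operator

$$\mathcal{F} : \big((\overline{\mathit{FinPaths}}^{\mathcal{M}} \setminus \Pi) \to [0, 1]\big) \to \big((\overline{\mathit{FinPaths}}^{\mathcal{M}} \setminus \Pi) \to [0, 1]\big)$$

is given by

$$\mathcal{F}(p)(\pi) = \sum_{\theta \mid \pi\theta \in \Pi} \mu_{\overline{\tau}(\pi)}(\theta) + \sum_{\theta \mid \pi\theta \notin \Pi} \mu_{\overline{\tau}(\pi)}(\theta) \cdot p(\pi\theta) .$$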
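Applying $\mathcal{F}$ $i$ times to the start value $\mathbf{0}$ (the constant zero function) can be unrolled into a recursion on $i$, because $\mathcal{F}^i(\mathbf{0})(\pi)$ only needs $\mathcal{F}^{i-1}(\mathbf{0})$ on the one-target extensions $\pi\theta$. The following Python sketch is a naive illustration of the auxiliary operator and not the implementation of chapter 6: its cost grows with the number of paths of length up to $|\pi| + i$. $\Pi$ is passed as a membership predicate, so infinite sets such as $\mathsf{F}\,\{\theta_3\}$ can be used. The chain is figure 5.1 with $\tau(\theta_3) = s_2$ assumed; none of the values computed here depends on that choice.

```python
from typing import Callable, Dict, Tuple

Path = Tuple[str, ...]

# Figure 5.1; t1..t5 stand for θ1..θ5. τ(θ3) = "s2" is an assumption.
TAU: Dict[str, str] = {"t1": "s1", "t2": "s2", "t3": "s2", "t4": "s3", "t5": "s3"}
MU: Dict[str, Dict[str, float]] = {
    "s1": {"t1": 0.1, "t2": 0.2, "t5": 0.7},
    "s2": {"t3": 1.0},
    "s3": {"t4": 1.0},
}


def last_state(s: str, path: Path) -> str:
    """τ̄([s, ∅]) = s and τ̄([s, θ1 … θn]) = τ(θn)."""
    return TAU[path[-1]] if path else s


def iterate(s: str, path: Path, in_pi: Callable[[str, Path], bool], i: int) -> float:
    """F^i(0)([s, path]) for an anchored path [s, path] that is not in Π."""
    if i == 0:
        return 0.0  # start value: the function 0
    total = 0.0
    # Targets outside the support of µ contribute 0 to both sums and are skipped.
    for theta, prob in MU[last_state(s, path)].items():
        extended = path + (theta,)
        if in_pi(s, extended):
            total += prob  # first sum: πθ ∈ Π
        else:
            total += prob * iterate(s, extended, in_pi, i - 1)  # second sum: πθ ∉ Π
    return total


def pi_54(s: str, path: Path) -> bool:
    """Example 5.4: Π = {[s1, θ5 θ4 θ4]}."""
    return s == "s1" and path == ("t5", "t4", "t4")


def pi_55(s: str, path: Path) -> bool:
    """Example 5.5: Π = F {θ3}, i.e., all anchored paths whose last target is θ3."""
    return len(path) > 0 and path[-1] == "t3"


print([round(iterate("s1", (), pi_54, i), 4) for i in range(1, 5)])  # [0.0, 0.0, 0.7, 0.7]
print([round(iterate("s1", (), pi_55, i), 4) for i in range(1, 5)])  # [0.0, 0.2, 0.22, 0.222]
```

The two printed rows reproduce the entries of $[s_1, \varnothing]$ in tables 5.1 and 5.2, where blank entries stand for 0.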


Figure 5.2.: Unwound state space

The idea of the iteration operator is that, when applied iteratively to a start value, it eventually delivers for each $\pi \in \overline{\mathit{FinPaths}}^{\mathcal{M}} \setminus \Pi$ the probability of reaching a path in $\Pi$, given that the path $\pi$ has already been taken. This resembles the idea of a posterior probability. Figure 5.2 illustrates the idea by example: suppose only the path $[s_1, \theta_5 \theta_4 \theta_4]$ is of interest and the path $[s_1, \theta_5]$ has already been taken; then the fixed point of the operator should return the probability 1 for this path, because from this point on, the probability of reaching $\Pi$ is 1. In contrast, the path $[s_1, \theta_2]$ can never reach a path in $\Pi$; therefore, the probability for that path should be 0.

The function $\mathbf{0} : (\overline{\mathit{FinPaths}}^{\mathcal{M}} \setminus \Pi) \to [0, 1]$ with $\mathbf{0}(\pi) = 0$ serves as the start value of the iteration. Note that the function space $(\overline{\mathit{FinPaths}}^{\mathcal{M}} \setminus \Pi) \to [0, 1]$ is itself a complete partially ordered set with $\mathbf{0}$ as its bottom element, where the order relation is defined pointwise. Therefore, $\mathbf{0}$ is a reasonable start value, which is confirmed by a later proof. Appendix B.2 provides details about the notation and more mathematical background on complete partially ordered sets.

The next example provides some intuition for the iteration operator.

Example 5.4. Let $\mathcal{M}$ be the labeled Markov chain of figure 5.1, and let $\Pi = \{[s_1, \theta_5 \theta_4 \theta_4]\}$ be the paths of interest. The values of $\mathcal{F}^i(\mathbf{0})(\pi)$ in iterations $1 \le i \le 4$ are given in table 5.1: the table header contains the number of the iteration, and the first column the anchored finite path $\pi$. Entries with a value of 0 are left blank to make the table visually clearer. Note that these are not the probabilities of the corresponding paths; rather, they are the probabilities of reaching a path in $\Pi$ given that the path $\pi$ has already been taken. After 3 iterations, the entry of $[s_1, \varnothing]$ equals $\Pr^{\mathcal{M}}_{s_1}(\Pi^{\uparrow})$.

π | i = 1 | i = 2 | i = 3 | i = 4
[s1, ∅] |  |  | 0.7 | 0.7
[s2, ∅] |  |  |  | 
[s1, θ5] |  | 1.0 | 1.0 | 1.0
[s1, θ5 θ4] | 1.0 | 1.0 | 1.0 | 1.0

Table 5.1.: Example for the iteration operator with Π = {[s1, θ5 θ4 θ4]}

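Let p_i(π) be a shorthand for F^i(0)(π). The entry of [s1, θ5 θ4] in iteration 1 can be calculated by

$$
\begin{aligned}
p_1([s_1, \theta_5\theta_4])
&= \sum_{\theta \mid [s_1,\theta_5\theta_4]\theta \in \Pi} \mu_{\tau([s_1,\theta_5\theta_4])}(\theta)
 + \sum_{\theta \mid [s_1,\theta_5\theta_4]\theta \notin \Pi} \mu_{\tau([s_1,\theta_5\theta_4])}(\theta) \cdot p_0([s_1,\theta_5\theta_4]\theta) \\
&= \sum_{\theta \in \{\theta_4\}} \mu_{s_3}(\theta)
 + \sum_{\theta \in \emptyset} \mu_{s_3}(\theta) \cdot p_0([s_1,\theta_5\theta_4\theta]) \\
&= \mu_{s_3}(\theta_4) = 1 .
\end{aligned}
$$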

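And the entry of [s1,∅] in iteration 1 can be calculated by

$$
\begin{aligned}
p_1([s_1,\emptyset])
&= \sum_{\theta \mid [s_1,\emptyset]\theta \in \Pi} \mu_{\tau([s_1,\emptyset])}(\theta)
 + \sum_{\theta \mid [s_1,\emptyset]\theta \notin \Pi} \mu_{\tau([s_1,\emptyset])}(\theta) \cdot p_0([s_1,\emptyset]\theta) \\
&= \sum_{\theta \in \emptyset} \mu_{s_1}(\theta)
 + \sum_{\theta \in \{\theta_1,\theta_2,\theta_5\}} \mu_{s_1}(\theta) \cdot p_0([s_1,\theta]) \\
&= \mu_{s_1}(\theta_1) \cdot p_0([s_1,\theta_1]) + \mu_{s_1}(\theta_2) \cdot p_0([s_1,\theta_2]) + \mu_{s_1}(\theta_5) \cdot p_0([s_1,\theta_5]) = 0 .
\end{aligned}
$$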

Example 5.5. In this example, the paths of interest are given using the temporal operator F. Let M be the labeled Markov chain of figure 5.1, and let Π = F θ3 = {[s2, θ3], [s1, θ2 θ3], [s1, θ1 θ2 θ3], [s1, θ1 θ1 θ2 θ3], …} be the paths of interest. The values of F^i(0)(π) in iterations 1 ≤ i ≤ 4 are given in table 5.2.

Note that all entries in the table ending with state s1 (all π where τ(π) = s1) have the same values. The reason is that, due to the structure of Π, {θ | πθ ∈ Π} and {θ | πθ ∉ Π} always yield the same targets for these paths. This property is used later in the proof of theorem 5.10. Although no fixed point is reached after 4 iterations, the entries of [s1,∅] increase steadily with each iteration and stay below Pr^M_{s1}(Π↑).



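| π               | 1   | 2   | 3    | 4     |
|-----------------|-----|-----|------|-------|
| [s1,∅]          |     | 0.2 | 0.22 | 0.222 |
| [s2,∅]          | 1.0 | 1.0 | 1.0  | 1.0   |
| [s3,∅]          |     |     |      |       |
| [s1, θ1]        |     | 0.2 | 0.22 | 0.222 |
| [s1, θ2]        | 1.0 | 1.0 | 1.0  | 1.0   |
| [s1, θ1 θ1]     |     | 0.2 | 0.22 | 0.222 |
| [s1, θ1 θ2]     | 1.0 | 1.0 | 1.0  | 1.0   |
| [s1, θ1 θ1 θ1]  |     | 0.2 | 0.22 | 0.222 |
| [s1, θ1 θ1 θ2]  | 1.0 | 1.0 | 1.0  | 1.0   |

Table 5.2.: Example for the iteration operator with Π = F θ3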

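The entry of [s1,∅] in iteration 4 can be calculated by

$$
\begin{aligned}
p_4([s_1,\emptyset])
&= \sum_{\theta \mid [s_1,\emptyset]\theta \in \Pi} \mu_{\tau([s_1,\emptyset])}(\theta)
 + \sum_{\theta \mid [s_1,\emptyset]\theta \notin \Pi} \mu_{\tau([s_1,\emptyset])}(\theta) \cdot p_3([s_1,\emptyset]\theta) \\
&= \sum_{\theta \in \emptyset} \mu_{s_1}(\theta)
 + \sum_{\theta \in \{\theta_1,\theta_2,\theta_5\}} \mu_{s_1}(\theta) \cdot p_3([s_1,\theta]) \\
&= \mu_{s_1}(\theta_1) \cdot p_3([s_1,\theta_1]) + \mu_{s_1}(\theta_2) \cdot p_3([s_1,\theta_2]) + \mu_{s_1}(\theta_5) \cdot p_3([s_1,\theta_5]) \\
&= 0.1 \cdot 0.22 + 0.2 \cdot 1.0 + 0.7 \cdot 0 = 0.222 .
\end{aligned}
$$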

To calculate the entries of anchored paths with a length of 3, the entries of some anchored paths with a length of 4 are needed, and these are not shown in table 5.2.
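The unfolding used in examples 5.4 and 5.5 can be replayed mechanically. The following Python sketch computes F^i(0)(π) by expanding the defining sum i times. The chain is an assumption. Its probabilities (μs1(θ1) = 0.1, μs1(θ2) = 0.2, μs1(θ5) = 0.7, μs2(θ3) = μs3(θ4) = 1) are taken from the calculations above. The target states τ(θ1) = s1, τ(θ2) = τ(θ3) = s2 and τ(θ4) = τ(θ5) = s3 are our reading of the examples rather than a copy of figure 5.1. Targets θi are written t1, t2, …, and the paths of interest are passed as a hypothetical membership predicate `in_pi`.

```python
from fractions import Fraction
from typing import Callable, Dict, Tuple

# Assumed chain: MU[s][t] is the probability of target t in state s,
# TAU[t] is the target state of t.
MU: Dict[str, Dict[str, Fraction]] = {
    "s1": {"t1": Fraction(1, 10), "t2": Fraction(2, 10), "t5": Fraction(7, 10)},
    "s2": {"t3": Fraction(1)},
    "s3": {"t4": Fraction(1)},
}
TAU: Dict[str, str] = {"t1": "s1", "t2": "s2", "t3": "s2", "t4": "s3", "t5": "s3"}

# An anchored finite path [s, t_1 ... t_n] is represented as (s, (t_1, ..., t_n)).
Path = Tuple[str, Tuple[str, ...]]


def last_state(path: Path) -> str:
    """tau(pi): the start state for the empty path, else the state of the last target."""
    start, targets = path
    return TAU[targets[-1]] if targets else start


def iterate(i: int, path: Path, in_pi: Callable[[Path], bool]) -> Fraction:
    """F^i(0)(path) for a path not in Pi, obtained by unfolding F i times."""
    if i == 0:
        return Fraction(0)  # the bottom element 0
    start, targets = path
    total = Fraction(0)
    for theta, prob in MU[last_state(path)].items():
        extended = (start, targets + (theta,))
        if in_pi(extended):
            total += prob  # first sum: a path of interest is reached in one step
        else:
            total += prob * iterate(i - 1, extended, in_pi)  # second sum
    return total


def pi_54(p: Path) -> bool:
    """Example 5.4: Pi = {[s1, t5 t4 t4]}."""
    return p == ("s1", ("t5", "t4", "t4"))


def pi_55(p: Path) -> bool:
    """Example 5.5: Pi = F t3, i.e., non-empty paths whose last target is t3."""
    return len(p[1]) > 0 and p[1][-1] == "t3"


for i in range(1, 5):
    print(i, iterate(i, ("s1", ()), pi_54), iterate(i, ("s1", ()), pi_55))
```

Exact fractions avoid rounding noise when the results are compared with tables 5.1 and 5.2. The unfolding is exponential in i, so it is only suitable for checking small tables and is not a model-checking procedure.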

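Thus, the actual probability of the paths of interest Π starting in a certain state s can also be given by

$$
p_*([s,\emptyset]) = \Pr_s(\{\pi_* \mid [s,\emptyset]\pi_* \in \Pi\}{\uparrow}) = \Pr_{\tau([s,\emptyset])}(\{\pi_* \mid [s,\emptyset]\pi_* \in \Pi\}{\uparrow}) .
$$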

To keep the upcoming definitions concise, path junctions are defined and two lemmas are provided. Given two anchored paths π1 = [s1, θ1,1 … θ1,m] and π2 = [s2, θ2,1 … θ2,n] with τ(π1) = s2, let

$$
\pi_1\pi_2 = [s_1, \theta_{1,1} \ldots \theta_{1,m}\, \theta_{2,1} \ldots \theta_{2,n}]
$$

be the junction of the paths. For an anchored finite path π, let {π∗ | ππ∗ ∈ Π} be those paths of interest that start at the end of π.

Lemma 5.6. Let M be a generic Markov chain, Π ⊆ FinPaths^M, and π ∈ Π. Then Pr_{τ(π)}({π∗ | ππ∗ ∈ Π}↑) = 1.

Figure 5.3 gives an intuition why lemma 5.6 holds. In the example, π = [s, θ1 θ2] is an anchored path of Π (illustrated by the light gray area). The cylinder of the residual π∗ = [τ(π),∅] contains all infinite paths InfPaths^M_{τ(θ2)} that start from τ(π∗) = τ(θ2), as illustrated by the dark gray area.² This is the full sample space, and therefore, the probability is 1. The detailed proof is in the supplementary document in the S#-repository [ISSE18].

Figure 5.3.: Intuition for lemma 5.6

Lemma 5.7. Let M be a generic Markov chain, Π ⊆ FinPaths^M, π ∈ FinPaths^M, and θ ∈ {θ | πθ ∈ Π}. Then Pr_{τ(θ)}({π∗ | πθπ∗ ∈ Π}↑) = 1.

Proof. Apply lemma 5.6 to the path πθ ∈ Π and note that τ(πθ) = τ(θ).

Figure 5.4.: Example for lemma 5.7

Example 5.8. Let (S, Θ, τ, R) be the generic Markov chain from figure 5.4. Let Π = {[s0, θ1 θ2]}, and let π = [s0, θ1]. Then Pr_{τ(θ2)}({π∗ | πθ2π∗ ∈ Π}↑) = 1. This can be derived directly by applying lemma 5.7, because πθ2 ∈ Π. The probability can also be derived the old-fashioned way, as follows. The paths of interest are

$$
\{\pi_* \mid [s_0,\theta_1]\theta_2\pi_* \in \Pi\} = \{\pi_* \mid [s_0,\theta_1\theta_2]\pi_* \in \Pi\} = \{[s_2,\emptyset]\} .
$$

² The asterisk (∗) is used as an index for the residual.


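The cylinder of the paths of interest is

$$
\{\pi_* \mid [s_0,\theta_1]\theta_2\pi_* \in \Pi\}{\uparrow} = \{[s_2, \theta_4\theta_5^{\omega}]\} .
$$

Thus, the probability is

$$
\Pr_{\tau(\theta_2)}(\{\pi_* \mid [s_0,\theta_1]\theta_2\pi_* \in \Pi\}{\uparrow}) = \Pr_{\tau(\theta_2)}(\{[s_2, \theta_4\theta_5^{\omega}]\}) = 1 .
$$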

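To provide an example where the lemma cannot be applied, let π = [s0,∅]. Since πθ1 = [s0, θ1] ∉ Π, lemma 5.7 does not apply, and indeed

$$
\Pr_{\tau(\theta_1)}(\{\pi_* \mid \pi\theta_1\pi_* \in \Pi\}{\uparrow}) = 0.3 .
$$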

The next proposition is the centerpiece for the unbounded model checking algorithm.

Proposition 5.9. Let M = (S, Θ, τ, R) be a generic Markov chain, and let Π ⊆ FinPaths^M be the paths of interest. Let p∗ : FinPaths^M \ Π → [0, 1] with p∗(π) = Pr_{τ(π)}({π∗ | ππ∗ ∈ Π}↑). Then p∗ is the least fixed point of the operator F : (FinPaths^M \ Π → [0, 1]) → (FinPaths^M \ Π → [0, 1]), which is given by

$$
F(p)(\pi) = \sum_{\theta \mid \pi\theta \in \Pi} \mu_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta \notin \Pi} \mu_{\tau(\pi)}(\theta) \cdot p(\pi\theta) .
$$

Proof. The operator F is monotone and preserves suprema; hence, it has a least fixed point according to proposition B.1³. Let f be the least fixed point of F.

“f ≤ p∗”: Let π ∈ FinPaths^M \ Π. We decompose {π∗ | ππ∗ ∈ Π} into disjoint sets. For θ ∈ Θ, {[τ(π), θ]π∗ | πθπ∗ ∈ Π} is the set of finite path suffixes that start with θ and correspond to paths of interest. The set {π∗ | ππ∗ ∈ Π} can be split into the subsets corresponding to the targets θ ∈ {θ | πθ ∈ Π} and θ ∈ {θ | πθ ∉ Π}, respectively. Thus, {π∗ | ππ∗ ∈ Π} is equal to the disjoint union of sets

$$
\{\pi_* \mid \pi\pi_* \in \Pi\} =
\bigcup_{\theta \mid \pi\theta \in \Pi} \{[\tau(\pi),\theta]\pi_* \mid \pi\theta\pi_* \in \Pi\}
\;\cup
\bigcup_{\theta \mid \pi\theta \notin \Pi} \{[\tau(\pi),\theta]\pi_* \mid \pi\theta\pi_* \in \Pi\} .
$$

Note that this decomposition only works because π ∉ Π. Otherwise, [τ(π),∅] ∈ {π∗ | ππ∗ ∈ Π} would be missing from the decomposition.

³ A brief excerpt of order theory is provided in appendix B.2.


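For θ ∈ Θ, we derive by term expansion, applying lemma 5.3, and using basic set theory:

$$
\begin{aligned}
&\Pr_{\tau(\pi)}(\{[\tau(\pi),\theta]\pi_* \mid \pi\theta\pi_* \in \Pi\}{\uparrow}) \\
&\quad= \Pr_{\tau(\pi)}(\{[\tau(\pi), \theta\,\theta_1\theta_2\ldots\theta_n] \mid [\tau(\theta), \theta_1\theta_2\ldots\theta_n] \in \{\pi_* \mid \pi\theta\pi_* \in \Pi\}\}{\uparrow}) \\
&\quad= \mu_{\tau(\pi)}(\theta) \cdot \Pr_{\tau(\theta)}(\{[\tau(\theta), \theta_1\theta_2\ldots\theta_n] \mid [\tau(\theta), \theta_1\theta_2\ldots\theta_n] \in \{\pi_* \mid \pi\theta\pi_* \in \Pi\}\}{\uparrow}) \\
&\quad= \mu_{\tau(\pi)}(\theta) \cdot \Pr_{\tau(\theta)}(\{\pi_* \mid \pi\theta\pi_* \in \Pi\}{\uparrow}) .
\end{aligned}
$$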

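For θ ∈ {θ | πθ ∈ Π}, we know by lemma 5.7 that

$$
\begin{aligned}
\Pr_{\tau(\pi)}(\{[\tau(\pi),\theta]\pi_* \mid \pi\theta\pi_* \in \Pi\}{\uparrow})
&= \mu_{\tau(\pi)}(\theta) \cdot \Pr_{\tau(\theta)}(\{\pi_* \mid \pi\theta\pi_* \in \Pi\}{\uparrow}) \\
&= \mu_{\tau(\pi)}(\theta) .
\end{aligned}
$$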

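For θ ∈ {θ | πθ ∉ Π}, the function p∗ is defined for πθ; hence,

$$
\begin{aligned}
\Pr_{\tau(\pi)}(\{[\tau(\pi),\theta]\pi_* \mid \pi\theta\pi_* \in \Pi\}{\uparrow})
&= \mu_{\tau(\pi)}(\theta) \cdot \Pr_{\tau(\theta)}(\{\pi_* \mid \pi\theta\pi_* \in \Pi\}{\uparrow}) \\
&= \mu_{\tau(\pi)}(\theta) \cdot \Pr_{\tau(\pi\theta)}(\{\pi_* \mid \pi\theta\pi_* \in \Pi\}{\uparrow}) \\
&= \mu_{\tau(\pi)}(\theta) \cdot p_*(\pi\theta) .
\end{aligned}
$$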

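Thus,

$$
\begin{aligned}
p_*(\pi)
&= \Pr_{\tau(\pi)}(\{\pi_* \mid \pi\pi_* \in \Pi\}{\uparrow}) \\
&= \Pr_{\tau(\pi)}\Big(\bigcup_{\theta \mid \pi\theta \in \Pi} \{[\tau(\pi),\theta]\pi_* \mid \pi\theta\pi_* \in \Pi\}{\uparrow}
 \;\cup \bigcup_{\theta \mid \pi\theta \notin \Pi} \{[\tau(\pi),\theta]\pi_* \mid \pi\theta\pi_* \in \Pi\}{\uparrow}\Big) \\
&= \sum_{\theta \mid \pi\theta \in \Pi} \mu_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta \notin \Pi} \mu_{\tau(\pi)}(\theta) \cdot p_*(\pi\theta) \\
&= F(p_*)(\pi) .
\end{aligned}
$$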

Thus, p∗ is a fixed point of F. Since f is the least fixed point of F, we conclude that f(π) ≤ p∗(π) for all π ∈ FinPaths^M \ Π.


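“p∗ ≤ f”: Notice that

$$
\{\pi_* \mid \pi\pi_* \in \Pi \wedge |\pi_*| \le 0\}
\subseteq \{\pi_* \mid \pi\pi_* \in \Pi \wedge |\pi_*| \le 1\}
\subseteq \{\pi_* \mid \pi\pi_* \in \Pi \wedge |\pi_*| \le 2\}
\subseteq \ldots
$$

and {π∗ | ππ∗ ∈ Π} = ⋃_k {π∗ | ππ∗ ∈ Π ∧ |π∗| ≤ k}. With p_k(π) = Pr_{τ(π)}({π∗ | ππ∗ ∈ Π ∧ |π∗| ≤ k}↑), continuity of the probability measure from below yields p∗ = lim_{k→∞} p_k. By induction on k, we get p_k(π) ≤ f(π) for all π ∈ FinPaths^M \ Π. Thus, p∗ ≤ f.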

From p∗ ≥ f and f ≥ p∗, we conclude that p∗ = f is the least fixed point of F .

Finally, the next theorem introduces an iteration operator for GMCs that is indeed implementable. Its proof relies on the previous proposition.

Theorem 5.10. Let M be a generic Markov chain, let Π = FinPaths^M(Θ1 U Θ2), and let F : (S → [0, 1]) → (S → [0, 1]) be the iteration operator given by

$$
F(p)(s) = \sum_{\theta \in \Theta_2} \mu_s(\theta) + \sum_{\theta \in \Theta_1 \setminus \Theta_2} \mu_s(\theta) \cdot p(\tau(\theta)) .
$$

Then Pr^M_s(Θ1 U Θ2) = (lfp F)(s).

Proof. Given Π = FinPaths^M(Θ1 U Θ2), let Π′ be those finite anchored paths of the Markov chain that adhere to Θ1 \ Θ2, i.e., Π′ = {[s, θ1 … θn] ∈ FinPaths^M | ∀1 ≤ i ≤ n. θi ∈ Θ1 \ Θ2}. Let π ∈ Π′. Notice that {π∗ | ππ∗ ∈ Π} = {π∗ | [τ(π),∅]π∗ ∈ Π}. Thus, p∗(π) = p∗([τ(π),∅]). For s ∈ S, let 0(s) = 0. By induction on k, we know F^k(0)(π) = F^k(0)([τ(π),∅]) = F^k(0)(τ(π)), where F applied to paths denotes the operator of proposition 5.9 and F applied to states denotes the operator of this theorem. By using proposition 5.9, we finally get Pr^M_s(Θ1 U Θ2) = p∗([s,∅]) = (lfp F)([s,∅]) = (lfp F)(s).
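Theorem 5.10 makes value iteration over states possible. The following minimal Python sketch performs Kleene iteration with the state operator F, starting from the zero function, until two successive iterates differ by at most `eps`. This stopping rule is a common heuristic, not a sound error bound. The chain is an assumption reconstructed from the numbers in examples 5.4 and 5.5: μs1(θ1) = 0.1, μs1(θ2) = 0.2, μs1(θ5) = 0.7 and μs2(θ3) = μs3(θ4) = 1, with τ(θ1) = s1, τ(θ2) = τ(θ3) = s2 and τ(θ4) = τ(θ5) = s3 as our reading of those examples. Targets θi are written t1, t2, …. The last lines check that the constant function 1 is also a fixed point here, which is why the theorem requires the least one.

```python
from typing import Dict, Set

# Assumed chain: MU[s][t] = probability of target t in state s, TAU[t] = target state.
MU: Dict[str, Dict[str, float]] = {
    "s1": {"t1": 0.1, "t2": 0.2, "t5": 0.7},
    "s2": {"t3": 1.0},
    "s3": {"t4": 1.0},
}
TAU: Dict[str, str] = {"t1": "s1", "t2": "s2", "t3": "s2", "t4": "s3", "t5": "s3"}


def until_operator(p: Dict[str, float], theta1: Set[str], theta2: Set[str]) -> Dict[str, float]:
    """One application of F: targets in theta2 count fully; targets in theta1 minus theta2
    contribute their probability times the current value of their target state."""
    rest = theta1 - theta2
    return {
        s: sum((pr for th, pr in dist.items() if th in theta2), 0.0)
        + sum((pr * p[TAU[th]] for th, pr in dist.items() if th in rest), 0.0)
        for s, dist in MU.items()
    }


def until_probability(theta1: Set[str], theta2: Set[str], eps: float = 1e-12,
                      max_iter: int = 100_000) -> Dict[str, float]:
    """Approximate (lfp F)(s) by iterating F from the zero function (the bottom element)."""
    p = {s: 0.0 for s in MU}
    for _ in range(max_iter):
        q = until_operator(p, theta1, theta2)
        if max(abs(q[s] - p[s]) for s in MU) <= eps:  # heuristic stopping rule
            return q
        p = q
    return p


ALL_TARGETS = set(TAU)
probs = until_probability(ALL_TARGETS, {"t3"})  # Pr_s(F t3) for every state s
print(probs)

# The constant function 1 is also a fixed point of F here, but not the least one.
ones = {s: 1.0 for s in MU}
print(until_operator(ones, ALL_TARGETS, {"t3"}))
```

Under the assumed chain, the value for s1 approaches 2/9 ≈ 0.2222 from below. This matches the steadily increasing entries of [s1,∅] in table 5.2. Starting from the constant function 1 instead of 0 would stop immediately at the wrong answer 1.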

The next subsection shows that the iteration operator of the unbounded case can beused for the bounded case as well.

5.1.3. Bounded Reachability

Analogously to the unbounded probability of reaching certain paths, which was described in the previous subsection, a bounded variant can be defined. It is the probability of reaching specific targets within a bounded number of steps, i.e., the probability of those paths that reach targets Θ2 ⊆ Θ^M, are restricted to the targets Θ1 ⊆ Θ^M before, and have a maximal length of k ∈ N0. More formally (introducing the bounded temporal until operator U≤k en passant),

$$
\mathit{FinPaths}^M(\Theta_1\,\mathrm{U}^{\le k}\,\Theta_2) = \{[s,\theta_1\ldots\theta_n] \in \mathit{FinPaths}^M \mid n \le k \wedge \theta_n \in \Theta_2 \wedge \forall 1 \le i < n.\ \theta_i \in \Theta_1\} .
$$

| π               | 1   | 2   | 3    | 4    |
|-----------------|-----|-----|------|------|
| [s1,∅]          |     | 0.2 | 0.22 | 0.22 |
| [s2,∅]          | 1.0 | 1.0 | 1.0  | 1.0  |
| [s3,∅]          |     |     |      |      |
| [s1, θ1]        |     | 0.2 | 0.2  | 0.2  |
| [s1, θ2]        | 1.0 | 1.0 | 1.0  | 1.0  |
| [s1, θ1 θ1]     |     |     |      |      |
| [s1, θ1 θ2]     | 1.0 | 1.0 | 1.0  | 1.0  |
| [s1, θ1 θ1 θ1]  |     |     |      |      |
| [s1, θ1 θ1 θ2]  |     |     |      |      |

Table 5.3.: Example for the iteration operator with Π = F≤3 θ3

As in the unbounded case, the computation of these probabilities cannot directly be reduced to the computation of the probability of the set of paths of interest. A restriction to those paths that start with a specific s is necessary. In the following, Pr^M_s(Θ1 U≤k Θ2) abbreviates Pr^M_s(Π_s↑) with Π = FinPaths^M(Θ1 U≤k Θ2). For those cases in which Θ1 = Θ^M, i.e., those cases without any restrictions on the targets that must be satisfied before reaching a target in Θ2, Pr^M_s(F≤k Θ2) abbreviates Pr^M_s(Θ^M U≤k Θ2).

The auxiliary iteration operator F of the unbounded case also serves as foundation forthis algorithm. This is demonstrated in the next example.

Example 5.11. In this example, the paths of interest are given using the temporal operator F≤k. Let M be the labeled Markov chain of figure 5.1, and let Π = F≤3 θ3 = {[s1, θ1 θ2 θ3], [s1, θ2 θ3], [s2, θ3], [s1, θ2 θ3 θ3], [s2, θ3 θ3], [s2, θ3 θ3 θ3]} be the paths of interest. The values of F^i(0)(π) in iterations 1 ≤ i ≤ 4 are given in table 5.3. Note that after 3 iterations, the entries of [s1,∅] equal Pr^M_{s1}(F≤3 θ3).

The next proposition is the centerpiece for the bounded model checking algorithm.

Proposition 5.12. Let M = (S, Θ, τ, R) be a generic MC, and let Π ⊆ FinPaths^M be the paths of interest. For π ∈ FinPaths^M \ Π, let p_n(π) = Pr^M_{τ(π)}({π∗ | ππ∗ ∈ Π ∧ |π∗| ≤ n}↑), and let F : (FinPaths^M \ Π → [0, 1]) → (FinPaths^M \ Π → [0, 1]) be the iteration operator given by

$$
F(p)(\pi) = \sum_{\theta \mid \pi\theta \in \Pi} \mu_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta \notin \Pi} \mu_{\tau(\pi)}(\theta) \cdot p(\pi\theta) .
$$

Then F^n(0)(π) = p_n(π).

Proof. By induction on n. For the base case n = 0: {π∗ | ππ∗ ∈ Π ∧ |π∗| ≤ 0} = ∅, because π ∉ Π. Hence, p_0(π) = Pr^M_{τ(π)}({π∗ | ππ∗ ∈ Π ∧ |π∗| ≤ 0}↑) = 0 = F^0(0)(π). For the induction step n = k + 1, {π∗ | ππ∗ ∈ Π ∧ |π∗| ≤ k + 1} can be decomposed into disjoint sets, using reasoning analogous to the proof of proposition 5.9.

Finally, the next theorem introduces an iteration operator for GMCs that is indeed implementable. Its proof relies on the previous proposition.

Theorem 5.13. Let M = (S, Θ, τ, R) be a generic Markov chain, let Π = FinPaths^M(Θ1 U≤k Θ2), for s ∈ S let 0(s) = 0, and let F : (S → [0, 1]) → (S → [0, 1]) be the iteration operator given by

$$
F(p)(s) = \sum_{\theta \in \Theta_2} \mu_s(\theta) + \sum_{\theta \in \Theta_1 \setminus \Theta_2} \mu_s(\theta) \cdot p(\tau(\theta)) .
$$

Then Pr^M_s(Θ1 U≤k Θ2) = F^k(0)(s).

Proof. Given Π = FinPaths^M(Θ1 U≤k Θ2), let Π′ be those finite anchored paths of the Markov chain that adhere to Θ1 \ Θ2, i.e., Π′ = {[s, θ1 … θn] ∈ FinPaths^M | ∀1 ≤ i ≤ n. θi ∈ Θ1 \ Θ2}. For all 0 ≤ i ≤ k and all π ∈ Π′ with |π| ≤ k − i,

$$
F^i(0)(\pi) = F^i(0)(\tau(\pi)) . \tag{1}
$$

The proof is by induction on i. For the base case i = 0, F^0(0)(π) = F^0(0)(τ(π)) holds trivially, because 0(π) = 0(τ(π)). For the induction step (i = i′ + 1): from the definition of Π′ and the assumption |π| ≤ k − i, we derive {θ | πθ ∈ Π} = Θ2 and {θ | πθ ∉ Π} = Θ1 \ Θ2. Hence,

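$$
\begin{aligned}
F^{i'+1}(0)(\pi)
&= F(F^{i'}(0))(\pi) \\
&= \sum_{\theta \mid \pi\theta \in \Pi} \mu_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta \notin \Pi} \mu_{\tau(\pi)}(\theta) \cdot F^{i'}(0)(\pi\theta) \\
&= \sum_{\theta \in \Theta_2} \mu_{\tau(\pi)}(\theta) + \sum_{\theta \in \Theta_1 \setminus \Theta_2} \mu_{\tau(\pi)}(\theta) \cdot F^{i'}(0)(\tau(\theta)) \\
&= F^{i'+1}(0)(\tau(\pi)) .
\end{aligned}
$$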


Now, let i = k. Equation (1) can be used for [s,∅], because |[s,∅]| = 0 ≤ k − i. By using proposition 5.12, we finally get Pr^M_s(Θ1 U≤k Θ2) = p_k([s,∅]) = F^k(0)([s,∅]) = F^k(0)(s).
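Theorem 5.13 says that, in the bounded case, exactly k applications of the state operator suffice and no convergence check is needed. The following Python sketch computes F^k(0)(s) with exact fractions. It cross-checks the result against a brute-force sum over the paths that hit Θ2 for the first time after at most k targets. The chain is an assumption reconstructed from the worked examples: μs1(θ1) = 0.1, μs1(θ2) = 0.2, μs1(θ5) = 0.7 and μs2(θ3) = μs3(θ4) = 1, with τ(θ1) = s1, τ(θ2) = τ(θ3) = s2 and τ(θ4) = τ(θ5) = s3. The function names are illustrative.

```python
from fractions import Fraction
from typing import Dict, Set

# Assumed chain: MU[s][t] = probability of target t in state s, TAU[t] = target state.
MU: Dict[str, Dict[str, Fraction]] = {
    "s1": {"t1": Fraction(1, 10), "t2": Fraction(2, 10), "t5": Fraction(7, 10)},
    "s2": {"t3": Fraction(1)},
    "s3": {"t4": Fraction(1)},
}
TAU: Dict[str, str] = {"t1": "s1", "t2": "s2", "t3": "s2", "t4": "s3", "t5": "s3"}


def bounded_until(theta1: Set[str], theta2: Set[str], k: int) -> Dict[str, Fraction]:
    """F^k(0) for the operator of theorem 5.13: exactly k applications, no stopping rule."""
    rest = theta1 - theta2
    p = {s: Fraction(0) for s in MU}
    for _ in range(k):
        p = {
            s: sum((pr for th, pr in dist.items() if th in theta2), Fraction(0))
            + sum((pr * p[TAU[th]] for th, pr in dist.items() if th in rest), Fraction(0))
            for s, dist in MU.items()
        }
    return p


def bounded_until_by_paths(s: str, theta1: Set[str], theta2: Set[str], k: int) -> Fraction:
    """Sum the cylinder probabilities of the paths from s that hit theta2 for the first
    time within k targets, having taken only theta1 targets before."""
    def walk(state: str, prob: Fraction, steps: int) -> Fraction:
        if steps == k:
            return Fraction(0)  # no step budget left
        total = Fraction(0)
        for th, pr in MU[state].items():
            if th in theta2:
                total += prob * pr  # first hit of theta2: the path ends here
            elif th in theta1:
                total += walk(TAU[th], prob * pr, steps + 1)
        return total

    return walk(s, Fraction(1), 0)


ALL_TARGETS = set(TAU)
print(bounded_until(ALL_TARGETS, {"t3"}, 3))  # Pr_s(F<=3 t3) for every state s
```

Under the assumed chain, the value for s1 is 11/50 = 0.22. This agrees with the entry of [s1,∅] in table 5.3 from iteration 3 on.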

5.1.4. Generic Markov Chains with Initial Distribution

[Figure: labeled Markov chain with initial distribution; target labels θ0: {e0}, θ1: {e0}, θ2: {e0}, θ3: {e0, e1}, θ4: ∅, θ5: {e0}]

Figure 5.5.: Labeled Markov chain with initial distribution and 3 states

Definition 5.14 (Generic Markov Chain with initial distribution). A generic Markov chain with initial distribution M is represented by the tuple (S, Θ, τ, R, µinit) consisting of

• a finite set S of states, and
• a finite set Θ of targets with a target state function τ : Θ → S, and
• a transition distribution function R : S → Dists(Θ), and
• an initial probability distribution µinit ∈ Dists(Θ),

where Dists(Θ) = {µ : Θ → [0, 1] | Σ_{θ∈Θ} µ(θ) = 1}.

Analogous to GMCs, there is also a labeled variant of generic Markov chains with initial distribution. Figure 5.5 extends the LMC of figure 5.1 with an initial distribution.

Let M = (S, Θ, τ, R, µinit) be a GMC with initial distribution. The definitions for the generic Markov chain (S, Θ, τ, R) are also valid for M. In addition, new definitions related to the initial probability distribution are introduced. An initial finite path πinit = θ0 θ1 … θn ∈ Θ∗ of M is a sequence where µ^M_init(θ0) > 0 and [τ(θ0), θ1 … θn] is an anchored finite path. FinPaths^M_init is the set of all initial finite paths of M. An initial infinite path πinit = θ0 θ1 θ2 … ∈ Θ^ω of M is a sequence where µ^M_init(θ0) > 0 and [τ(θ0), θ1 θ2 …] is an anchored infinite path. InfPaths^M_init is the set of all initial infinite paths of M. The cylinder set πinit↑M of the finite path πinit ∈ FinPaths^M_init contains all infinite paths of M that start with πinit. More formally, πinit↑M = {π′init ∈ InfPaths^M_init | πinit ∈ Pref(π′init)}.

Definition 5.15 (Probability Space of a generic Markov chain with initial distribution). The probability space (Ω^M, E^M, Pr^M) of a generic Markov chain with initial distribution M = (S, Θ, τ, R, µinit) consists of

• the sample space Ω^M = InfPaths^M_init, and
• the events E^M as the smallest σ-algebra on Ω^M that contains {πinit↑M | πinit ∈ FinPaths^M_init}, and
• the probability measure Pr^M with

$$
\Pr^M(\theta_0 \ldots \theta_n{\uparrow}^M) = \mu^M_{\mathit{init}}(\theta_0) \cdot \prod_{0 \le i < n} \mu_{\tau^M(\theta_i)}(\theta_{i+1}) .
$$

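The definitions of FinPaths^M(Θ1 U≤k Θ2) and FinPaths^M(Θ1 U Θ2) can be lifted to generic Markov chains with initial distribution:

$$
\mathit{FinPaths}^M_{\mathit{init}}(\Theta_1\,\mathrm{U}^{\le k}\,\Theta_2) = \{\theta_0 \ldots \theta_n \in \mathit{FinPaths}^M_{\mathit{init}} \mid n \le k \wedge \theta_n \in \Theta_2 \wedge \forall 0 \le i < n.\ \theta_i \in \Theta_1\} ,
$$

and

$$
\mathit{FinPaths}^M_{\mathit{init}}(\Theta_1\,\mathrm{U}\,\Theta_2) = \{\theta_0 \ldots \theta_n \in \mathit{FinPaths}^M_{\mathit{init}} \mid \theta_n \in \Theta_2 \wedge \forall 0 \le i < n.\ \theta_i \in \Theta_1\} .
$$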

Note that the initial transition does not affect the bound k. In the following, Pr^M(Θ1 U≤k Θ2) abbreviates Pr^M(FinPaths^M_init(Θ1 U≤k Θ2)↑). For the unbounded case, Pr^M(Θ1 U Θ2) abbreviates Pr^M(FinPaths^M_init(Θ1 U Θ2)↑). For those cases without any restrictions on the targets that must be satisfied before reaching a target in Θ2, i.e., Θ1 = Θ^M, the abbreviations Pr^M(F≤k Θ2) = Pr^M(Θ^M U≤k Θ2) and Pr^M(F Θ2) = Pr^M(Θ^M U Θ2) are introduced as well. It is easy to see that

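$$
\Pr^M(\Theta_1\,\mathrm{U}^{\le k}\,\Theta_2) = \sum_{\theta \in \Theta_2} \mu^M_{\mathit{init}}(\theta) + \sum_{\theta \in \Theta_1 \setminus \Theta_2} \mu^M_{\mathit{init}}(\theta) \cdot \Pr^M_{\tau(\theta)}(\Theta_1\,\mathrm{U}^{\le k}\,\Theta_2) ,
$$

and

$$
\Pr^M(\Theta_1\,\mathrm{U}\,\Theta_2) = \sum_{\theta \in \Theta_2} \mu^M_{\mathit{init}}(\theta) + \sum_{\theta \in \Theta_1 \setminus \Theta_2} \mu^M_{\mathit{init}}(\theta) \cdot \Pr^M_{\tau(\theta)}(\Theta_1\,\mathrm{U}\,\Theta_2) .
$$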
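The decompositions above reduce a chain with initial distribution to per-state values. The following minimal sketch evaluates the bounded decomposition and cross-checks it against the cylinder probabilities of definition 5.15. It makes two assumptions. First, the chain is reconstructed from the worked examples: μs1(θ1) = 0.1, μs1(θ2) = 0.2, μs1(θ5) = 0.7 and μs2(θ3) = μs3(θ4) = 1, with τ(θ1) = s1, τ(θ2) = τ(θ3) = s2 and τ(θ4) = τ(θ5) = s3. Second, the initial distribution µinit(θ1) = µinit(θ2) = 1/2 is purely hypothetical and is not the one of figure 5.5. As stated above, the initial target does not count towards the bound k.

```python
from fractions import Fraction
from typing import Dict, Set, Tuple

# Assumed chain and a hypothetical initial distribution over targets.
MU: Dict[str, Dict[str, Fraction]] = {
    "s1": {"t1": Fraction(1, 10), "t2": Fraction(2, 10), "t5": Fraction(7, 10)},
    "s2": {"t3": Fraction(1)},
    "s3": {"t4": Fraction(1)},
}
TAU: Dict[str, str] = {"t1": "s1", "t2": "s2", "t3": "s2", "t4": "s3", "t5": "s3"}
MU_INIT: Dict[str, Fraction] = {"t1": Fraction(1, 2), "t2": Fraction(1, 2)}


def per_state_bounded(theta1: Set[str], theta2: Set[str], k: int) -> Dict[str, Fraction]:
    """Pr_s(Theta1 U<=k Theta2) for every state s, computed as F^k(0)(s)."""
    rest = theta1 - theta2
    p = {s: Fraction(0) for s in MU}
    for _ in range(k):
        p = {s: sum((pr for th, pr in d.items() if th in theta2), Fraction(0))
                + sum((pr * p[TAU[th]] for th, pr in d.items() if th in rest), Fraction(0))
             for s, d in MU.items()}
    return p


def init_bounded(theta1: Set[str], theta2: Set[str], k: int) -> Fraction:
    """The decomposition over the initial distribution."""
    per_state = per_state_bounded(theta1, theta2, k)
    direct = sum((pr for th, pr in MU_INIT.items() if th in theta2), Fraction(0))
    via = sum((pr * per_state[TAU[th]] for th, pr in MU_INIT.items()
               if th in theta1 - theta2), Fraction(0))
    return direct + via


def cylinder_probability(path: Tuple[str, ...]) -> Fraction:
    """mu_init(theta_0) times the product of mu_{tau(theta_i)}(theta_{i+1})."""
    prob = MU_INIT.get(path[0], Fraction(0))
    for prev, nxt in zip(path, path[1:]):
        prob *= MU[TAU[prev]].get(nxt, Fraction(0))
    return prob


def init_bounded_by_paths(theta1: Set[str], theta2: Set[str], k: int) -> Fraction:
    """Sum the cylinder probabilities of initial paths theta_0 ... theta_n (n <= k) that
    reach Theta2 for the first time at theta_n after Theta1 targets only."""
    total = Fraction(0)
    frontier = [(th,) for th in MU_INIT]
    while frontier:
        path = frontier.pop()
        last = path[-1]
        if last in theta2:
            total += cylinder_probability(path)
        elif last in theta1 and len(path) - 1 < k:  # theta_0 is not counted in n
            frontier.extend(path + (th,) for th in MU[TAU[last]])
    return total


ALL_TARGETS = set(TAU)
print(init_bounded(ALL_TARGETS, {"t3"}, 3), init_bounded_by_paths(ALL_TARGETS, {"t3"}, 3))
```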

In the upcoming chapters, Pr(·) abbreviates Pr^M(·), and labeled Markov chains with initial distributions are simply referred to as labeled Markov chains if this is clear from the context.

5.1.5. Comparison to Standard Markov Chains

Although labeled Markov chains are only as expressive as standard Markov chains, they can be more compact, especially in the context of executable models. There are at least two ways to convert a 2^E-labeled Markov chain M = (S, R, µinit) with targets Θ = 2^E × S into a standard Markov chain M′ = (S′, R′, µ′init).

In the first way, each state of the standard Markov chain corresponds to either a state or a target in the labeled Markov chain. Hence,

• S′ = S ∪ Θ, and
• R′(s) = µ with µ((Λ′, s′)) = R(s)(Λ′, s′) for all targets (Λ′, s′) ∈ Θ and 0 otherwise, and
• R′((Λ, s)) = µ with µ(s) = 1 and 0 otherwise, and
• µ′init(s) = 0, and
• µ′init((Λ, s)) = µinit(Λ, s).

This transformation has several drawbacks. It doubles the number of required iterations, it increases the number of states considerably, and formulas using the until operator need to be adapted. Therefore, this transformation is not considered further.

The other transformation is more interesting: targets in the labeled Markov chain are “shifted” to states in the standard Markov chain:

• S′ = Θ, and
• R′((Λ, s)) = µ with µ((Λ′, s′)) = R(s)(Λ′, s′), and
• µ′init((Λ, s)) = µinit(Λ, s).
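The “shifting” transformation can be sketched in a few lines of Python. The sketch assumes a hypothetical two-state chain in which a fault only changes the label of a transition, not its successor state. It also restricts S′ to targets that occur with positive probability, rather than using all of 2^E × S. It then counts the states and transitions (targets with positive probability) of both chains.

```python
from typing import Dict, FrozenSet, Tuple

Target = Tuple[FrozenSet[str], str]  # (label set Λ, target state s)

# Hypothetical 2^E-labeled Markov chain: from "a", a fault is observed with
# probability 0.1, but both outcomes lead to the same state "b".
R: Dict[str, Dict[Target, float]] = {
    "a": {(frozenset({"fault"}), "b"): 0.1, (frozenset(), "b"): 0.9},
    "b": {(frozenset(), "a"): 1.0},
}
MU_INIT: Dict[Target, float] = {(frozenset(), "a"): 1.0}


def shift(r: Dict[str, Dict[Target, float]], mu_init: Dict[Target, float]):
    """Every occurring target (Λ, s) becomes a state with R'((Λ, s)) = R(s)."""
    targets = {t for dist in r.values() for t in dist} | set(mu_init)
    r_std = {t: dict(r[t[1]]) for t in targets}
    return r_std, dict(mu_init)


def size(chain) -> Tuple[int, int]:
    """(number of states, number of transitions with positive probability)."""
    return len(chain), sum(sum(1 for p in dist.values() if p > 0) for dist in chain.values())


R_std, mu_std = shift(R, MU_INIT)
print("labeled:", size(R), "standard:", size(R_std))
```

In this toy chain, the fault label alone splits state b into two states of the standard chain (3 instead of 2 states, 4 instead of 3 transitions). This illustrates, on a very small scale, the effect discussed for the Railroad case study. The sketch is not how the numbers in tables 5.4 and 5.5 were obtained.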

In the following, the transitions of a state are those targets to which the transition distribution function assigns a probability greater than zero. Note that transitions and targets are different. Table 5.4 and table 5.5 show how labeled Markov chains compare with standard Markov chains. The standard Markov chains were obtained by applying the “shifting” transformation. The entry “exceeded” denotes that the memory limit was exceeded during the transformation.

In table 5.5, the labeling contains only the hazard. If only one formula is of interest, labeled Markov chains have no advantage over standard Markov chains. Table 5.4 contains the results when both the hazard and the occurrence of faults (denoted by Observed Faults) are contained in the labeling. In this case, the labeled Markov chain is almost always more efficient. In the Railroad case study, labeled Markov chains were even necessary to check several formulas at once. In the Railroad case study, it mostly makes no difference whether a fault occurs or not; it only leads to a different labeling. For such systems, labeled Markov chains are particularly advantageous (factor 7.08). In contrast, in the Height Control case study, the occurrence of a fault almost always affects the successor state, but the labeled Markov chain is still clearly smaller (factor 1.67). In the Hemodialysis case study, most faults are permanent, and therefore the improvement shrinks (factor 1.03). To sum up, labeled Markov chains are useful when several formulas are analyzed at the same time, especially when the model contains transient faults. Hence, they are especially valuable for safety analysis, where different faults are of interest at the same time.

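|                          | Railroad   | Height Control | Hemodialysis | …    | …      | …     |
|--------------------------|------------|----------------|--------------|------|--------|-------|
| Number of Hazards        | 1          | 2              | 2            | 1    | 3      | 1     |
| Number of Faults         | 7          | 12             | 9            | 3    | 3      | 1     |
| Observed Faults          | 7          | 1              | 9            | 3    | 3      | 1     |
| Labeled MC: States       | 2,587,933  | 2,186,964      | 294,871      | 156  | 25     | 30    |
| Labeled MC: Transitions  | 26,221,633 | 249,842,821    | 3,372,067    | 416  | 56     | 51    |
| Standard MC: States      | 18,328,393 | 3,662,761      | 302,977      | 156  | 29     | 30    |
| Standard MC: Transitions | exceeded   | 410,089,816    | 3,415,819    | 416  | 62     | 51    |
| State Ratio              | 7.08 ×     | 1.67 ×         | 1.03 ×       | 1.0 ×| 1.16 × | 1.0 × |
| Transition Ratio         | −          | 1.64 ×         | 1.01 ×       | 1.0 ×| 1.11 × | 1.0 × |

Table 5.4.: Comparison of labeled Markov chains with standard Markov chains when faults and hazards are observable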



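|                                                              | Railroad  | Height Control | Hemodialysis | …     | …     | …     |
|--------------------------------------------------------------|-----------|----------------|--------------|-------|-------|-------|
| Labeled MC: States                                           | 2,587,933 | 2,186,964      | 294,871      | 156   | 25    | 30    |
| Labeled MC: Transitions                                      | 9,100,662 | 187,511,131    | 3,372,067    | 416   | 56    | 51    |
| No faults observed vs. faults observed (table 5.4): State Ratio      | 1.0 ×  | 1.0 ×  | 1.0 × | 1.0 × | 1.0 × | 1.0 × |
| No faults observed vs. faults observed (table 5.4): Transition Ratio | 2.88 × | 1.33 × | 1.0 × | 1.0 × | 1.0 × | 1.0 × |
| Standard MC: States                                          | 2,587,933 | 2,186,964      | 294,871      | 156   | 25    | 30    |
| Standard MC: Transitions                                     | 9,100,662 | 187,511,131    | 3,372,067    | 416   | 56    | 51    |
| State Ratio                                                  | 1.0 ×     | 1.0 ×          | 1.0 ×        | 1.0 × | 1.0 × | 1.0 × |
| Transition Ratio                                             | 1.0 ×     | 1.0 ×          | 1.0 ×        | 1.0 × | 1.0 × | 1.0 × |

Table 5.5.: Comparison of labeled Markov chains with standard Markov chains when only hazards are observable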

5.2. Choice-Aware Markov Decision Processes


After the explanations of generic Markov chains, choice-aware Markov decision processes (CMDPs) are introduced as the second probabilistic system. CMDPs are well suited for models that comprise both probabilistic and nondeterministic choice. Before CMDPs are defined, their choices and their choice transition function are introduced. These enable CMDPs to have multiple consecutive nondeterministic and probabilistic choices in any order.

Let Θ be a finite set of targets, C a finite set of choices, and Dists(C) = {µC : C → [0, 1] | Σ_{c∈C} µC(c) = 1} the probability distributions over the choices C. Furthermore, let succ : C → Θ + Dists(C) + 2^C be the choice transition function that maps each choice to a target, a probability distribution, or a non-empty set of choices. A set of choices is used to model nondeterminism. The sequence of choices ϱ = c0 c1 … cn ∈ C∗ is a choice path of the choice transition function succ if the following holds for all non-final choices. Either succ returns a probability distribution that assigns a probability greater than 0 to the successor (probabilism), or succ returns a set of choices that includes the successor (nondeterminism). More formally, for all 0 ≤ i < n, either µC = succ(ci) with µC(ci+1) > 0 or ci+1 ∈ succ(ci). A choice path c0 c1 … cn with n ≥ 1 forms a cycle if c0 = cn. A choice transition function succ is acyclic if no choice path of succ forms a cycle. A choice path c0 c1 … cn is terminal if succ(cn) ∈ Θ. Note that an acyclic choice transition function may also be called terminal or finite, because applying it iteratively to its results eventually terminates.
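The conditions on choice paths translate directly into code. The following Python sketch uses a hypothetical encoding of succ as a dictionary whose entries are ("target", θ), ("dist", {choice: probability}) or ("nondet", [choices]). This encoding is not the data structure of any particular tool. The sketch checks acyclicity with a depth-first search and enumerates the terminal choice paths that start at a given choice.

```python
from typing import Dict, List, Tuple

Succ = Dict[int, Tuple[str, object]]

# Hypothetical acyclic choice transition function.
SUCC: Succ = {
    1: ("dist", {2: 0.5, 3: 0.5}),
    2: ("nondet", [4, 5]),
    3: ("target", "t3"),
    4: ("target", "t1"),
    5: ("target", "t2"),
}


def successors(succ: Succ, c: int) -> List[int]:
    """Choices that may follow c on a choice path (positive probability or nondeterministic)."""
    kind, value = succ[c]
    if kind == "target":
        return []
    if kind == "dist":
        return [d for d, p in value.items() if p > 0]
    return list(value)


def is_acyclic(succ: Succ) -> bool:
    """Depth-first search: an edge back to a choice on the current stack closes a cycle."""
    on_stack, done = set(), set()

    def visit(c: int) -> bool:
        on_stack.add(c)
        for d in successors(succ, c):
            if d in on_stack or (d not in done and not visit(d)):
                return False
        on_stack.discard(c)
        done.add(c)
        return True

    return all(c in done or visit(c) for c in succ)


def terminal_paths(succ: Succ, c: int) -> List[List[int]]:
    """All terminal choice paths starting at c (the recursion terminates for acyclic succ)."""
    kind, _ = succ[c]
    if kind == "target":
        return [[c]]
    return [[c] + rest for d in successors(succ, c) for rest in terminal_paths(succ, d)]


for path in terminal_paths(SUCC, 1):
    print(path, "->", SUCC[path[-1]][1])  # the choice path target tau(rho)
```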

Definition 5.16 (Choice-aware Markov Decision Process). A choice-aware Markov decision process M is represented by the tuple (S, Θ, τ, C, succ, C) consisting of

• a finite set S of states, and
• a finite set Θ of targets with a target state function τ : Θ → S, and
• a finite set C of choices with an acyclic choice transition function succ : C → Θ + Dists(C) + 2^C, and
• a function C : S → C that determines the root choice of each state,

where Dists(C) = {µC : C → [0, 1] | Σ_{c∈C} µC(c) = 1}.

By convention, S^M stands for M's states, Θ^M for M's targets, and so on. This is used in the remainder when the corresponding choice-aware Markov decision process is not clear from the context.

This work introduces another useful manifestation of choice-aware Markov decision processes, the so-called labeled choice-aware Markov decision processes. An L-labeled choice-aware Markov decision process is a choice-aware Markov decision process where Θ is chosen to be L × S for some set of labels L with τ(ℓ, s) = s.

Figure 5.6 depicts an example of a 2^E-labeled CMDP with E = {e0, e1}. The smaller, round-bordered square nodes represent choices, and the larger square nodes with a number in brackets represent the states, e.g., state node [1] represents s1. The brackets in state nodes make it easier to distinguish them from choice nodes. The root choice of a state is depicted by a sole arc from the corresponding state node to a choice node. If the choice transition function returns a probability distribution for a choice, the outgoing white-headed arcs from the choice depict the distribution (with the probabilities as labels), e.g., the outgoing arcs from choice node 1 to choice nodes 2 and 3. Black-headed arcs without labels are used for nondeterministic choices, e.g., from choice node 2 to choice nodes 4 and 5. Black-headed arcs with targets as labels are used for target choices. The destination node of such an arc is the state node of the target's target state, e.g., from choice node 4 to state node [1]. The labels of each target are given in the table. This labeled CMDP can be used to calculate the worst-case and the best-case probability that a label like e0 is eventually reached from a specific state. How this can be formally expressed and calculated is the topic of the remainder of this section.

[Figure: labeled choice-aware Markov decision process; target labels θ1: {e0}, θ2: {e0}, θ3: {e0, e1}, θ4: ∅, θ5: {e0}]

Figure 5.6.: Example of a labeled choice-aware Markov decision process

5.2.1. Paths and Probability Measure

Let M = (S, Θ, τ, C, succ, C) be a CMDP with choice transition function succ. A choice path ϱ = c0 c1 … cn ∈ C∗ of M is a choice path of succ. Paths^M_choice is the set of all choice paths of M. Given a choice path ϱ = c0 c1 … cn, let ϱ[i] = ci denote the i-th choice for all 0 ≤ i ≤ n. Given a terminal choice path ϱ = c0 c1 … cn, let τ(ϱ) be the choice path target of ϱ, i.e., τ(ϱ) = succ(cn). This is well defined, because succ(cn) ∈ Θ in a terminal choice path. Let s ⇝^M θ (read as ‘s leads to θ’) be satisfied if and only if there exists a terminal choice path ϱ with C(s) = ϱ[0] and τ(ϱ) = θ.

A finite path π = θ1 θ2 … θn ∈ Θ∗ of M is a sequence of targets where τ(θi) ⇝^M θi+1 for all 1 ≤ i < n. The empty sequence ∅ is also a valid finite path. FinPaths^M is the set of all finite paths of M. An anchored finite path π = [s, θ1 θ2 … θn] ∈ S × FinPaths^M of M is a pair of a state and a finite path where s ⇝^M θ1. Let |π| = n be the path length of π = [s, θ1 θ2 … θn]. FinPaths^M is the set of all anchored finite paths of M. FinPaths^M_s = ({s} × FinPaths^M) ∩ FinPaths^M is the set of all anchored finite paths starting from s. Let τ : FinPaths^M → S^M be the last state of an anchored finite path [s, θ1 … θn], i.e., τ([s,∅]) = s and τ([s, θ1 … θn]) = τ(θn) otherwise. Given an anchored finite path π = [s, θ1 … θn] and a target θ, let πθ = [s, θ1 … θn θ] be the concatenation of the finite path with the target.

An infinite path π = θ1 θ2 θ3 … ∈ Θ^ω of M is a sequence of targets where τ(θi) ⇝^M θi+1 for all i ≥ 1. InfPaths^M is the set of all infinite paths of M. An anchored infinite path π = [s, θ1 θ2 θ3 …] ∈ S × InfPaths^M of M is a pair of a state and an infinite path where s ⇝^M θ1. InfPaths^M is the set of all anchored infinite paths of M. InfPaths^M_s = ({s} × InfPaths^M) ∩ InfPaths^M is the set of all anchored infinite paths starting from s. Let π[..k] = θ1 θ2 … θk denote the k-th prefix of π = θ1 θ2 θ3 …, and let Pref(π) = {π′ | π′ is a prefix of π} denote the set of all prefixes of π. Also, let Pref([s, π]) = {[s, π′] | π′ ∈ Pref(π)} denote the set of all anchored prefixes of [s, π]. In the remainder, both finite and infinite paths are sometimes just called paths when it is clear from the context which one is meant.

A probability measure cannot be defined directly for a CMDP, because a nondeterministic choice is not a “probabilistic construct”. With a so-called scheduler, the nondeterminism in a CMDP can be resolved, which results in a GMC with a properly defined probability measure. GMCs that result from such schedulers provide a view onto the CMDP from the scheduler's perspective. By using different schedulers, a broader view onto the CMDP can be obtained. By consulting all schedulers, minimum and maximum probabilities can be derived. There are two kinds of schedulers in CMDPs: choice schedulers, which resolve the nondeterministic choices of choice paths, and step schedulers, which select the choice scheduler for each step between two states.

Definition 5.17 (Choice Scheduler of a Choice-aware Markov Decision Process). Let M = (S, Θ, τ, C, succ, C) be a CMDP. A choice scheduler for M is a partial function c : C ⇀ C such that for all c ∈ C, succ(c) ∈ 2^C implies c(c) ∈ succ(c).

Let C(M) be the finite set of all possible choice schedulers for M; let C denote C(M) if M is clear from the context. Note that the number of choice schedulers in C(M) is finite, because C is finite and there is only a limited number of combinations of possible successors for the nondeterministic choices. The choice path ϱ = c0 c1 … cn of M is a c-path if and only if the choice scheduler c agrees with all its nondeterministic choices, i.e., if and only if for all 0 ≤ i < n, succ(ci) ∈ 2^C implies c(ci) = ci+1.

Definition 5.18 (Markov Chain of a Choice Scheduler). Let M = (S, Θ, τ, C, succ, C) be a CMDP with choice transition function succ, and let c be a choice scheduler for M. The standard Markov chain M_c induced by c is given by M_c = (C ∪ Θ, R), where

• R(c)(succ(c)) = 1 if succ(c) ∈ Θ, and
• R(c) = succ(c) if succ(c) ∈ Dists(C), and
• R(c)(c′) = 1 if succ(c) ∈ 2^C and c(c) = c′, and
• R(θ)(θ) = 1, and
• R(·)(·) = 0 otherwise.


For state s and choice scheduler c, let µ^c_s be the choice distribution with µ^c_s(θ) = Pr^{M_c}_{C(s)}(F θ).
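For an acyclic succ, the choice distribution µ^c_s can be computed by a simple recursion instead of building M_c explicitly, because every terminal choice path ends in exactly one target. The following minimal sketch uses a hypothetical dictionary encoding of succ, with entries ("target", θ), ("dist", {choice: probability}) or ("nondet", [choices]), and a hypothetical root choice function ROOT. Choice schedulers are represented as maps from every nondeterministic choice to one of its successors. Enumerating them with `itertools.product` also illustrates why C(M) is finite.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from typing import Dict, Iterator, Tuple

Succ = Dict[int, Tuple[str, object]]

# Hypothetical CMDP fragment: one state s1 whose root choice is 1.
SUCC: Succ = {
    1: ("dist", {2: Fraction(1, 2), 3: Fraction(1, 2)}),
    2: ("nondet", [4, 5]),
    3: ("target", "t3"),
    4: ("target", "t1"),
    5: ("target", "t2"),
}
ROOT: Dict[str, int] = {"s1": 1}  # hypothetical root choice function C


def choice_schedulers(succ: Succ) -> Iterator[Dict[int, int]]:
    """All maps that pick one successor for every nondeterministic choice."""
    nondet = [c for c, (kind, _) in succ.items() if kind == "nondet"]
    for picks in product(*(succ[c][1] for c in nondet)):
        yield dict(zip(nondet, picks))


def choice_distribution(succ: Succ, sched: Dict[int, int], c: int) -> Dict[str, Fraction]:
    """Probability of reaching each target from choice c when sched resolves nondeterminism."""
    kind, value = succ[c]
    if kind == "target":
        return {value: Fraction(1)}
    if kind == "nondet":
        return choice_distribution(succ, sched, sched[c])
    result: Dict[str, Fraction] = defaultdict(Fraction)
    for d, p in value.items():
        for theta, q in choice_distribution(succ, sched, d).items():
            result[theta] += p * q
    return dict(result)


def target_probability_range(state: str, theta: str) -> Tuple[Fraction, Fraction]:
    """Minimum and maximum of mu^c_state(theta) over all choice schedulers c."""
    values = [choice_distribution(SUCC, sched, ROOT[state]).get(theta, Fraction(0))
              for sched in choice_schedulers(SUCC)]
    return min(values), max(values)


for sched in choice_schedulers(SUCC):
    print(sched, choice_distribution(SUCC, sched, ROOT["s1"]))
```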

Definition 5.19 (Step Scheduler of a Choice-aware Markov Decision Process). Let M = (S, Θ, τ, C, succ, C) be a CMDP. A step scheduler for M is a function s : FinPaths^M → (C ⇀ C) such that for all anchored finite paths π ∈ FinPaths^M, s(π) is a choice scheduler for M.

Let S(M) be the countable set of all possible step schedulers for M; let S denote S(M) if M is clear from the context. Let s ⇝^c θ (read as ‘by adhering to c, s leads to θ’) be satisfied if and only if there exists a terminal choice path ϱ = c0 … cn with C^M(s) = ϱ[0] and τ(ϱ) = θ such that, for all 0 ≤ i < n, succ(ci) ∈ 2^C implies c(ci) = ci+1.

The finite anchored path [s, θ1 θ2 … θn] of M is a finite s-path if the scheduler s agrees with all its nondeterministic choices, i.e., iff s ⇝^{c0} θ1 with c0 = s([s,∅]), and for all 1 ≤ i < n, τ(θi) ⇝^{ci} θi+1 with ci = s([s, θ1 θ2 … θi]). Let FinPaths^{M,s} denote the set of all finite s-paths of M, and let FinPaths^{M,s}_s = FinPaths^M_s ∩ FinPaths^{M,s} denote the set of all finite s-paths of M starting from s.

Analogously, the anchored infinite path π = [s, θ1 θ2 θ3 …] of M is an s-path if the scheduler s agrees with all its nondeterministic choices, i.e., iff s ⇝^{c0} θ1 with c0 = s([s,∅]), and for all i ≥ 1, τ(θi) ⇝^{ci} θi+1 with ci = s([s, θ1 θ2 … θi]). Let InfPaths^{M,s} denote the set of all infinite s-paths of M, and let InfPaths^{M,s}_s = InfPaths^M_s ∩ InfPaths^{M,s} denote the set of all infinite s-paths of M starting from s.

Definition 5.20 (Markov Chain of a Step Scheduler). Let M = (S, Θ, τ, C, succ, C) be a CMDP and s a step scheduler for M. The standard Markov chain M_s induced by s is given by (FinPaths^{M,s}, R_s) with the transition distribution function R_s : FinPaths^{M,s} → Dists(FinPaths^{M,s}), where, for π ∈ FinPaths^{M,s} with |π| = n, R_s(π) = µ with c = s(π) such that µ(πθn+1) = µ^c_{τ(π)}(θn+1).

Each finite s-path [s, θ1 θ2 … θn] ∈ FinPaths^{M,s} of M has a corresponding anchored finite path [[s,∅], [s, θ1] [s, θ1 θ2] … [s, θ1 θ2 … θn]] ∈ FinPaths^{M_s}_s in the MC M_s, and vice versa. Analogously, each infinite s-path π ∈ InfPaths^{M,s} of the CMDP M has a corresponding anchored infinite path π_c in the generic Markov chain M_s, and vice versa. This correspondence makes it possible to use the probability measure Pr^{M_s}_s induced by the MC M_s to define the probability space of the choice-aware Markov decision process M with its associated step scheduler s.

The cylinder set π↑M,s of the anchored finite path π contains all anchored infinite paths of M that start with π and adhere to s. More formally, π↑M,s = {π′ ∈ InfPaths^{M,s} | π ∈ Pref(π′)}. Let Π ⊆ FinPaths^{M,s}_s; then Π↑M,s = ⋃_{π∈Π} π↑M,s.

Definition 5.21 (Probability Space of a Choice-aware Markov Decision Process induced by a Step Scheduler starting in a certain state). The probability space (Ω^{M,s}_s, E^{M,s}_s, Pr^{M,s}_s) of a choice-aware Markov decision process M induced by a step scheduler s starting in state s ∈ S^M consists of

• the sample space Ω^{M,s}_s = InfPaths^{M,s}_s, and
• the events E^{M,s}_s as the smallest σ-algebra on Ω^{M,s}_s that contains {π↑M,s | π ∈ FinPaths^{M,s}_s}, and
• the probability measure Pr^{M,s}_s with Pr^{M,s}_s(π↑M,s) = Pr^{M_s}_{[s,∅]}(π_c↑M_s), where π_c is the corresponding path of π.

Note that the probability measure cannot generally be used on the union of paths of different step schedulers, even if the paths all start in the same state. Thus, given an arbitrary Π ⊆ FinPaths^M, the probability of these paths cannot be calculated. However, the probability of the s-paths in Π that start from state s can be calculated. Therefore, the restriction to those paths is defined as Π^s_s = Π ∩ FinPaths^{M,s}_s.

5.2.2. Unbounded Reachability

Analogous to GMCs, this subsection defines the unbounded reachability for CMDPs. The paths can be denoted by

$$
\mathit{FinPaths}^M(\Theta_1\,\mathrm{U}\,\Theta_2) = \{[s,\theta_1\ldots\theta_n] \in \mathit{FinPaths}^M \mid \theta_n \in \Theta_2 \wedge \forall 1 \le i < n.\ \theta_i \in \Theta_1\} .
$$

As for GMCs, the computation of this probability cannot directly be reduced to the computation of the probability of the set of paths of interest. A restriction to the s-paths that start with a specific s is necessary. The usual abbreviations are introduced: Pr^{M,s}_s(Θ1 U Θ2) abbreviates Pr^{M,s}_s(Π^s_s↑M,s) with Π = FinPaths^M(Θ1 U Θ2), and Pr^{M,s}_s(F Θ2) abbreviates Pr^{M,s}_s(Θ^M U Θ2).

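Given two anchored paths $\pi_1 = [s_1, \theta^1_1\ldots\theta^1_m]$ and $\pi_2 = [s_2, \theta^2_1\ldots\theta^2_n]$ with $\tau(\pi_1) = s_2$, let $\pi_1\pi_2 = [s_1, \theta^1_1\ldots\theta^1_m\,\theta^2_1\ldots\theta^2_n]$ be the junction of the paths. Given a step scheduler $\mathfrak{s}$ and an anchored finite path π, let $\mathfrak{s}[\pi] : \mathit{FinPaths}^{\mathcal{M}}_{\tau(\pi)} \to \mathfrak{C}$ (the set of choice schedulers) with $\mathfrak{s}[\pi](\pi') = \mathfrak{s}(\pi\pi')$ be the step scheduler of the residual. Given a step scheduler $\mathfrak{s}$ and $\pi \in \mathit{FinPaths}^{\mathcal{M}}$, the set $\{\pi^* \mid \pi\pi^* \in \Pi^{\mathfrak{s}}\}$ contains those $\mathfrak{s}$-paths of interest that start at the end of π. The set $\{\theta \mid \pi\theta \in \Pi\}$ describes the targets of an anchored path that are covered by $\Pi \subseteq \mathit{FinPaths}^{\mathcal{M}}$, and $\{\theta \mid \pi\theta \notin \Pi\}$ describes the other targets of $\mathcal{M}$.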

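Let $\mathbf{0} : \mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1]$ with $\mathbf{0}(\pi) = 0$ be the bottom element of the function space $\mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1]$, on which the iteration operators of type $(\mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1]) \to (\mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1])$ below operate.

Given a certain step scheduler $\mathfrak{s}$, an iteration operator can be used to calculate the probabilities to reach certain paths of interest.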



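Lemma 5.22. Let $\mathcal{M} = (S, \Theta, \tau, C, \mathit{succ}, \mathcal{C})$ be a choice-aware Markov decision process, let $\mathfrak{s}$ be a step scheduler, and let $\Pi \subseteq \mathit{FinPaths}^{\mathcal{M}}$ be the paths of interest. For $\pi \in \mathit{FinPaths}^{\mathcal{M}}\setminus\Pi$, let $p_* : \mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1]$ be given by

$$p_*(\pi) = \Pr^{\mathcal{M},\mathfrak{s}[\pi]}_{\tau(\pi)}\bigl(\{\pi^* \mid \pi\pi^* \in \Pi^{\mathfrak{s}}\}^{\uparrow\mathcal{M},\mathfrak{s}[\pi]}\bigr),$$

and let $F^{\mathfrak{s}} : (\mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1]) \to (\mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1])$ be the iteration operator which is given by

$$F^{\mathfrak{s}}(p)(\pi) = \sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta)\cdot p(\pi\theta) \quad\text{with } \mathfrak{c} = \mathfrak{s}(\pi).$$

Then, $p_*$ is the least fixed point of the operator $F^{\mathfrak{s}}$.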

The proof of lemma 5.22 is rather technical and works by unfolding the definitions. The detailed proof can be found in the supplementary document [ISSE18]. Next, an iteration operator for the minimal probability is introduced, and it is proven that this operator can be used to calculate the minimal reachability probability.

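Proposition 5.23. Let $\mathcal{M} = (S, \Theta, \tau, C, \mathit{succ}, \mathcal{C})$ be a choice-aware Markov decision process, and let $\Pi \subseteq \mathit{FinPaths}^{\mathcal{M}}$ be the paths of interest. Let $p^{\min}_* : \mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1]$ with

$$p^{\min}_*(\pi) = \inf_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s}[\pi]}_{\tau(\pi)}\bigl(\{\pi^* \mid \pi\pi^* \in \Pi^{\mathfrak{s}}\}^{\uparrow\mathcal{M},\mathfrak{s}[\pi]}\bigr).$$

Then, $p^{\min}_*$ is the least fixed point of the iteration operator for the minimal probability $F^{\min} : (\mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1]) \to (\mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1])$, which is given by

$$F^{\min}(p)(\pi) = \min_{\mathfrak{c}\in\mathfrak{C}} \Bigl(\sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta)\cdot p(\pi\theta)\Bigr).$$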

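Proof. $F^{\min}$ is monotone and preserves suprema. Let $f^{\min}$ be the least fixed point of $F^{\min}$, i.e., $f^{\min} = \mathrm{lfp}(F^{\min})$.

“$p^{\min}_* \le f^{\min}$”:

For each $\pi \in \mathit{FinPaths}^{\mathcal{M}}\setminus\Pi$, we choose a $\mathfrak{c}_\pi \in \mathfrak{C}$ such that

$$f^{\min}(\pi) = \sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}_\pi}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}_\pi}_{\tau(\pi)}(\theta)\cdot f^{\min}(\pi\theta).$$

Such a $\mathfrak{c}_\pi$ exists because $f^{\min} = F^{\min}(f^{\min})$ and the minimum in the definition of $F^{\min}$ ranges over the finite set $\mathfrak{C}$, so it is attained.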



Let $\mathfrak{s}$ be the step scheduler with $\mathfrak{s}(\pi) = \mathfrak{c}_\pi$.

Using lemma 5.22, we know that $p^{\mathfrak{s}}_*$ with $p^{\mathfrak{s}}_*(\pi) = \Pr^{\mathcal{M},\mathfrak{s}[\pi]}_{\tau(\pi)}(\{\pi^* \mid \pi\pi^* \in \Pi^{\mathfrak{s}}\}^{\uparrow\mathcal{M},\mathfrak{s}[\pi]})$ is the least fixed point of $F^{\mathfrak{s}}$. Furthermore, by the construction of $\mathfrak{s}$, $F^{\mathfrak{s}}(f^{\min}) = f^{\min}$. Thus, $f^{\min}$ is a fixed point of $F^{\mathfrak{s}}$. Since every fixed point is greater than or equal to the least fixed point, $f^{\min} \ge \mathrm{lfp}(F^{\mathfrak{s}})$. Also, $p^{\min}_* \le p^{\mathfrak{s}}_*$: for every π, $p^{\min}_*(\pi)$ is the infimum over all step schedulers, and $\mathfrak{s}$ is one of them. Altogether, $p^{\min}_* \le p^{\mathfrak{s}}_* = \mathrm{lfp}(F^{\mathfrak{s}}) \le f^{\min}$.

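“$f^{\min} \le p^{\min}_*$”:

Let $\mathfrak{s} \in \mathfrak{S}$ be an arbitrary step scheduler and $\pi \in \mathit{FinPaths}^{\mathcal{M}}\setminus\Pi$ an arbitrary anchored finite path. Furthermore, let $\mathfrak{c} = \mathfrak{s}(\pi)$ and let $p^{\mathfrak{s}}_*$ be the least fixed point of the operator $F^{\mathfrak{s}}$ (using lemma 5.22). Applying the operator $F^{\mathfrak{s}}$ to its fixed point does not change the probability of any anchored finite path; thus,

$$p^{\mathfrak{s}}_*(\pi) = \sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta)\cdot p^{\mathfrak{s}}_*(\pi\theta).$$

For each θ with $\pi\theta \notin \Pi$, we have $p^{\min}_*(\pi\theta) \le p^{\mathfrak{s}}_*(\pi\theta)$, because $p^{\min}_*$ is the infimum over all step schedulers. Therefore,

$$p^{\mathfrak{s}}_*(\pi) \ge \sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta)\cdot p^{\min}_*(\pi\theta).$$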

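For a finite set $X \subseteq [0,1]$ and an element $x \in X$, the minimum of X cannot be greater than x, i.e., $x \ge \min(X)$. Applying this fact to the finite set of values obtained for all choice schedulers, which contains the value for $\mathfrak{c}$, we derive that the last expression is greater than or equal to

$$\min_{\mathfrak{c}'\in\mathfrak{C}} \Bigl(\sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta)\cdot p^{\min}_*(\pi\theta)\Bigr).$$

By the definition of $F^{\min}$, this expression equals $F^{\min}(p^{\min}_*)(\pi)$.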



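Because we made no assumptions on $\mathfrak{s}$ and π, we know that $p^{\mathfrak{s}}_*(\pi) \ge F^{\min}(p^{\min}_*)(\pi)$ holds for every step scheduler $\mathfrak{s}$ and every π. Taking the infimum over all step schedulers, we also obtain $p^{\min}_* \ge F^{\min}(p^{\min}_*)$. Applying proposition B.1, we get $p^{\min}_* \ge \mathrm{lfp}(F^{\min}) = f^{\min}$.

From $p^{\min}_* \le f^{\min}$ and $f^{\min} \le p^{\min}_*$, we conclude $f^{\min} = p^{\min}_*$.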

Analogously, the iteration operator for the maximal probability in the unbounded case can be defined and proven correct.

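Proposition 5.24. Let $\mathcal{M} = (S, \Theta, \tau, C, \mathit{succ}, \mathcal{C})$ be a choice-aware Markov decision process, and let $\Pi \subseteq \mathit{FinPaths}^{\mathcal{M}}$ be the paths of interest. For $\pi \in \mathit{FinPaths}^{\mathcal{M}}\setminus\Pi$, let $p^{\max}_* : \mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1]$ be given by

$$p^{\max}_*(\pi) = \sup_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s}[\pi]}_{\tau(\pi)}\bigl(\{\pi^* \mid \pi\pi^* \in \Pi^{\mathfrak{s}}\}^{\uparrow\mathcal{M},\mathfrak{s}[\pi]}\bigr).$$

Then, $p^{\max}_*$ is the least fixed point of the operator $F^{\max} : (\mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1]) \to (\mathit{FinPaths}^{\mathcal{M}}\setminus\Pi \to [0,1])$, which is given by

$$F^{\max}(p)(\pi) = \max_{\mathfrak{c}\in\mathfrak{C}} \Bigl(\sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta)\cdot p(\pi\theta)\Bigr).$$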

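Proof. $F^{\max}$ is monotone and preserves suprema. Let $f^{\max}$ be the least fixed point of $F^{\max}$.

“$p^{\max}_* \ge f^{\max}$”:

Let $\pi \in \mathit{FinPaths}^{\mathcal{M}}\setminus\Pi$. Now we choose some $\mathfrak{c}' \in \mathfrak{C}$ such that

$$F^{\max}(p^{\max}_*)(\pi) = \sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta)\cdot p^{\max}_*(\pi\theta). \qquad (1)$$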

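For each $\epsilon > 0$ and each θ with $\pi\theta \notin \Pi$, we choose a step scheduler $\mathfrak{s}_{\epsilon,\theta} \in \mathfrak{S}$ with

$$p^{\max}_*(\pi\theta) \le p^{\mathfrak{s}_{\epsilon,\theta}}_*(\pi\theta) + \epsilon, \qquad (2)$$

where $p^{\mathfrak{s}_{\epsilon,\theta}}_*$ is the least fixed point of the operator $F^{\mathfrak{s}_{\epsilon,\theta}}$ defined in lemma 5.22. Such a step scheduler exists because $p^{\max}_*(\pi\theta)$ is a supremum over all step schedulers.

Let $\mathfrak{s}_\epsilon$ be a step scheduler with $\mathfrak{s}_\epsilon(\pi) = \mathfrak{c}'$ and, for each θ with $\pi\theta \notin \Pi$ and each $\pi'$ with $\pi\theta\pi' \in \mathit{FinPaths}^{\mathcal{M}}\setminus\Pi$, $\mathfrak{s}_\epsilon(\pi\theta\pi') = \mathfrak{s}_{\epsilon,\theta}(\pi\theta\pi')$. Simply put, $\mathfrak{s}_\epsilon$ uses $\mathfrak{c}'$ for the anchored path π and afterwards follows the step schedulers $\mathfrak{s}_{\epsilon,\theta}$, which guarantee that $p^{\max}_*(\pi\theta)$ is approximated within the bound ε.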



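By construction, $p^{\mathfrak{s}_\epsilon}_*$ and $p^{\mathfrak{s}_{\epsilon,\theta}}_*$ are equal for anchored paths starting with πθ, i.e., $p^{\mathfrak{s}_\epsilon}_*(\pi\theta) = p^{\mathfrak{s}_{\epsilon,\theta}}_*(\pi\theta)$. Equation (2) can be rearranged to $p^{\mathfrak{s}_{\epsilon,\theta}}_*(\pi\theta) \ge p^{\max}_*(\pi\theta) - \epsilon$. Thus,

$$p^{\mathfrak{s}_\epsilon}_*(\pi\theta) \ge p^{\max}_*(\pi\theta) - \epsilon. \qquad (3)$$

By its definition, $p^{\max}_*(\pi)$ is the supremum of the probability for π over all step schedulers. In conjunction with lemma 5.22,

$$p^{\max}_*(\pi) \ge p^{\mathfrak{s}_\epsilon}_*(\pi). \qquad (4)$$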

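Using lemma 5.22, the fact that applying the operator $F^{\mathfrak{s}_\epsilon}$ to one of its fixed points does not change its value, and the construction of $\mathfrak{s}_\epsilon$ with $\mathfrak{s}_\epsilon(\pi) = \mathfrak{c}'$,

$$p^{\mathfrak{s}_\epsilon}_*(\pi) = \sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta)\cdot p^{\mathfrak{s}_\epsilon}_*(\pi\theta). \qquad (5)$$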

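Applying equation (3) to the right-hand side of equation (5), we get

$$p^{\mathfrak{s}_\epsilon}_*(\pi) \ge \sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta)\cdot \bigl(p^{\max}_*(\pi\theta) - \epsilon\bigr). \qquad (6)$$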

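By simple arithmetic, ε can be pulled out of the right sum, which gives $\epsilon' = \epsilon\cdot\sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta)$. We are not interested in its actual value; only the fact $0 \le \epsilon' \le \epsilon$ matters. Thus,

$$p^{\mathfrak{s}_\epsilon}_*(\pi) \ge \sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta)\cdot p^{\max}_*(\pi\theta) - \epsilon'. \qquad (7)$$

Using (1), we finally get

$$p^{\mathfrak{s}_\epsilon}_*(\pi) \ge F^{\max}(p^{\max}_*)(\pi) - \epsilon'. \qquad (8)$$

Using equations (4) and (8), we get

$$p^{\max}_*(\pi) \ge F^{\max}(p^{\max}_*)(\pi) - \epsilon'.$$

Since $\epsilon' \le \epsilon$ and $\epsilon > 0$ was arbitrary, $p^{\max}_*(\pi) \ge F^{\max}(p^{\max}_*)(\pi)$ follows. As π was arbitrary, $p^{\max}_* \ge F^{\max}(p^{\max}_*)$, and by proposition B.1, $p^{\max}_* \ge \mathrm{lfp}(F^{\max}) = f^{\max}$.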

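“$f^{\max} \ge p^{\max}_*$”:

Let $p^{\mathfrak{s}}_n(\pi) = \Pr^{\mathcal{M},\mathfrak{s}[\pi]}_{\tau(\pi)}\bigl(\{\pi^* \mid \pi\pi^* \in \Pi^{\mathfrak{s}} \wedge |\pi^*| \le n\}^{\uparrow\mathcal{M},\mathfrak{s}[\pi]}\bigr)$.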



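Because the sets $\{\pi^* \mid \pi\pi^* \in \Pi^{\mathfrak{s}} \wedge |\pi^*| \le n\}$ grow with n and their union is $\{\pi^* \mid \pi\pi^* \in \Pi^{\mathfrak{s}}\}$, the continuity of probability measures from below yields

$$p^{\mathfrak{s}}_*(\pi) = \lim_{n\to\infty} p^{\mathfrak{s}}_n(\pi). \qquad (9)$$

Using arguments from proposition 5.12 and lemma 5.22, it is easy to see that $p^{\mathfrak{s}}_0(\pi) = 0$, and

$$p^{\mathfrak{s}}_{n+1}(\pi) = \sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta)\cdot p^{\mathfrak{s}}_n(\pi\theta) \quad\text{with } \mathfrak{c} = \mathfrak{s}(\pi).$$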

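Now we show by induction over n that

$$p^{\mathfrak{s}}_n \le f^{\max}. \qquad (10)$$

The inequality holds in the base case n = 0, as $p^{\mathfrak{s}}_0(\pi) = 0 \le f^{\max}(\pi)$. For the induction step,

$$p^{\mathfrak{s}}_{n+1}(\pi) = \sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta)\cdot p^{\mathfrak{s}}_n(\pi\theta).$$

Using the induction hypothesis,

$$p^{\mathfrak{s}}_{n+1}(\pi) \le \sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}}_{\tau(\pi)}(\theta)\cdot f^{\max}(\pi\theta).$$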

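The right-hand side is less than or equal to the corresponding term for the maximizing choice scheduler; thus,

$$p^{\mathfrak{s}}_{n+1}(\pi) \le \max_{\mathfrak{c}'\in\mathfrak{C}} \Bigl(\sum_{\theta \mid \pi\theta\in\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta) + \sum_{\theta \mid \pi\theta\notin\Pi} \mu^{\mathfrak{c}'}_{\tau(\pi)}(\theta)\cdot f^{\max}(\pi\theta)\Bigr).$$

Now the right-hand side can be rewritten using the operator $F^{\max}$:

$$p^{\mathfrak{s}}_{n+1}(\pi) \le F^{\max}(f^{\max})(\pi).$$

Because $f^{\max}$ is a fixed point of $F^{\max}$, $F^{\max}(f^{\max}) = f^{\max}$. Thus, $p^{\mathfrak{s}}_{n+1}(\pi) \le f^{\max}(\pi)$. Therefore, (10) holds for all n.

Using (9) and (10), $p^{\mathfrak{s}}_*(\pi) \le f^{\max}(\pi)$. Finally, $p^{\max}_*(\pi) = \sup_{\mathfrak{s}\in\mathfrak{S}} p^{\mathfrak{s}}_*(\pi) \le f^{\max}(\pi)$.

From $p^{\max}_* \ge f^{\max}$ and $f^{\max} \ge p^{\max}_*$, we conclude $f^{\max} = p^{\max}_*$.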

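The next theorem introduces iteration operators for CMDPs that can actually be implemented, because they operate on the finite set of states instead of the infinite set of anchored finite paths.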



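Theorem 5.25. Let $\mathcal{M} = (S, \Theta, \tau, C, \mathit{succ}, \mathcal{C})$ be a choice-aware Markov decision process, let $\Pi = \mathit{FinPaths}^{\mathcal{M}}(\Theta_1\,\mathsf{U}\,\Theta_2)$, and let $F^{\min} : (S \to [0,1]) \to (S \to [0,1])$ be the iteration operator which is given by

$$F^{\min}(p)(s) = \min_{\mathfrak{c}\in\mathfrak{C}} \Bigl(\sum_{\theta\in\Theta_2} \mu^{\mathfrak{c}}_s(\theta) + \sum_{\theta\in\Theta_1\setminus\Theta_2} \mu^{\mathfrak{c}}_s(\theta)\cdot p(\tau(\theta))\Bigr),$$

and let $F^{\max} : (S \to [0,1]) \to (S \to [0,1])$ be the iteration operator which is given by

$$F^{\max}(p)(s) = \max_{\mathfrak{c}\in\mathfrak{C}} \Bigl(\sum_{\theta\in\Theta_2} \mu^{\mathfrak{c}}_s(\theta) + \sum_{\theta\in\Theta_1\setminus\Theta_2} \mu^{\mathfrak{c}}_s(\theta)\cdot p(\tau(\theta))\Bigr).$$

Then,

$$\inf_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s}}_s(\Theta_1\,\mathsf{U}\,\Theta_2) = \mathrm{lfp}(F^{\min})(s), \text{ and}$$
$$\sup_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s}}_s(\Theta_1\,\mathsf{U}\,\Theta_2) = \mathrm{lfp}(F^{\max})(s).$$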

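Proof. Analogous to theorem 5.10, using propositions 5.23 and 5.24. Use also that $\mathfrak{s}[[s,\emptyset]] \subseteq \mathfrak{s}$, i.e., the residual of $\mathfrak{s}$ for the empty anchored path $[s,\emptyset]$ is the restriction of $\mathfrak{s}$ to the paths starting in s.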

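This subsection finally shows how the value of the iteration operator $F^{\min}$, which requires a minimizing choice scheduler, can be calculated efficiently. Let $\mathcal{M} = (S, \Theta, \tau, C, \mathit{succ}, \mathcal{C})$ be a CMDP with choice transition function succ, let $s \in S$ be an arbitrary state, and let θ be an arbitrary target. Recall that C is finite and succ is acyclic, and that $\mu^{\mathfrak{c}}_s(\theta)$ is defined as $\mu^{\mathfrak{c}}_s(\theta) = \Pr^{\mathcal{M}_{\mathfrak{c}}}_{\mathcal{C}(s)}(\mathsf{F}\,\theta)$. Hence, the Markov chain $\mathcal{M}_{\mathfrak{c}}$ is acyclic if the reflexive transitions of the states in $\mathcal{M}_{\mathfrak{c}}$ that emerge from Θ are ignored (see definition 5.18 on page 75). Therefore, given a choice scheduler $\mathfrak{c} \in \mathfrak{C}$, the probability $\mu^{\mathfrak{c}}_s(\theta)$ can be calculated by $\mathit{cprob}_{\mathfrak{c}}(\mathcal{C}(s))$ using the recursive operator $\mathit{cprob}_{\mathfrak{c}}$, which is given by

$$\mathit{cprob}_{\mathfrak{c}}(c) = \begin{cases} 1 & \text{if } \mathit{succ}(c) = \theta,\\ \sum_{c'\in C} \mu(c')\cdot\mathit{cprob}_{\mathfrak{c}}(c') & \text{if } \mu = \mathit{succ}(c) \in \mathit{Dists}(C),\\ \mathit{cprob}_{\mathfrak{c}}(c') & \text{if } \mathit{succ}(c) \in 2^C \text{ and } \mathfrak{c}(c) = c',\\ 0 & \text{otherwise.} \end{cases}$$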
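The recursion $\mathit{cprob}_{\mathfrak{c}}$ can be sketched in Python as follows. This is a minimal sketch under assumptions, not the reference implementation: succ is encoded as a dictionary whose values are tagged tuples (`"target"`, `"prob"`, `"nondet"`), and the choice scheduler is a dictionary from each nondeterministic choice to the successor it selects. All names and the small choice graph are hypothetical. Termination relies on succ being acyclic.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding of the choice transition function succ:
#   ("target", theta)          succ(c) is a target
#   ("prob", {c2: prob, ...})  succ(c) is a distribution over choices
#   ("nondet", [c2, ...])      succ(c) is a set of choices
Succ = Dict[str, Tuple[str, Union[str, Dict[str, float], List[str]]]]
ChoiceScheduler = Dict[str, str]  # nondeterministic choice -> selected successor


def cprob(succ: Succ, sched: ChoiceScheduler, choice: str, theta: str) -> float:
    """Probability to reach target theta from `choice` when the nondeterministic
    choices are resolved by `sched` (terminates because succ is acyclic)."""
    kind, payload = succ[choice]
    if kind == "target":
        return 1.0 if payload == theta else 0.0
    if kind == "prob":
        return sum(p * cprob(succ, sched, c, theta) for c, p in payload.items())
    if kind == "nondet":
        selected = sched[choice]
        assert selected in payload, "a choice scheduler must select a successor"
        return cprob(succ, sched, selected, theta)
    raise ValueError(f"unknown choice kind {kind!r}")


# Hypothetical root choice: with probability 0.6 a nondeterministic choice
# between targets t1 and t2, with probability 0.4 the target t3.
succ: Succ = {
    "root": ("prob", {"a": 0.6, "b": 0.4}),
    "a": ("nondet", ["c", "d"]),
    "b": ("target", "t3"),
    "c": ("target", "t1"),
    "d": ("target", "t2"),
}
print(cprob(succ, {"a": "c"}, "root", "t1"))  # prints 0.6
```

Calling `cprob` once per target yields the whole choice distribution $\mu^{\mathfrak{c}}_s$; a practical implementation would memoize shared sub-choices instead of recomputing them.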

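Let $\Pi = \mathit{FinPaths}^{\mathcal{M}}(\Theta_1\,\mathsf{U}\,\Theta_2)$ be the paths of interest, and let $p : S \to [0,1]$ be the state probabilities. The recursive operator cprob can be extended to calculate the value of $F^{\min}(p)(s)$. The minimum choice probability operator $\mathit{cprob}^{\min}$ is given by

$$\mathit{cprob}^{\min}(p, c) = \begin{cases} 1 & \text{if } \mathit{succ}(c) \in \Theta_2,\\ p(\tau(\theta)) & \text{if } \theta = \mathit{succ}(c) \in \Theta_1\setminus\Theta_2,\\ 0 & \text{if } \mathit{succ}(c) \in \Theta\setminus(\Theta_1\cup\Theta_2),\\ \sum_{c'\in C} \mu(c')\cdot\mathit{cprob}^{\min}(p, c') & \text{if } \mu = \mathit{succ}(c) \in \mathit{Dists}(C),\\ \min_{c'\in\mathit{succ}(c)} \mathit{cprob}^{\min}(p, c') & \text{if } \mathit{succ}(c) \in 2^C. \end{cases}$$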



It can be shown that $F^{\min}(p)(s) = \mathit{cprob}^{\min}(p, \mathcal{C}(s))$. A dual operator $\mathit{cprob}^{\max}$ for the maximum probability can be defined analogously.
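A possible Python sketch of $\mathit{cprob}^{\min}$, its dual $\mathit{cprob}^{\max}$, and the resulting value iteration on states is shown below. It follows the case distinction above and evaluates $F^{\min}(p)(s)$ as $\mathit{cprob}^{\min}(p, \mathcal{C}(s))$; the tagged-tuple encoding, the names `cprob_opt` and `iterate`, and the four-state example are assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Label = FrozenSet[str]
Succ = Dict[str, tuple]  # hypothetical: ("target", t) | ("prob", {c: q}) | ("nondet", [c, ...])


def cprob_opt(succ: Succ, targets: Dict[int, Tuple[Label, int]], p: Dict[int, float],
              choice: str, in_theta1: Callable[[Label], bool],
              in_theta2: Callable[[Label], bool], opt=min) -> float:
    """cprob^min (opt=min) or cprob^max (opt=max) of `choice` for state probabilities p."""
    kind, payload = succ[choice]
    if kind == "target":
        label, state = targets[payload]
        if in_theta2(label):
            return 1.0
        return p[state] if in_theta1(label) else 0.0  # p(tau(theta)) or 0
    if kind == "prob":
        return sum(q * cprob_opt(succ, targets, p, c, in_theta1, in_theta2, opt)
                   for c, q in payload.items())
    # nondeterministic choice: optimize locally over its successors
    return opt(cprob_opt(succ, targets, p, c, in_theta1, in_theta2, opt) for c in payload)


def iterate(succ: Succ, targets, root: Dict[int, str], in_theta1, in_theta2,
            opt, k: int) -> Dict[int, float]:
    """(F^opt)^k applied to the zero function; F^opt(p)(s) = cprob_opt(p, root[s])."""
    p = {s: 0.0 for s in root}
    for _ in range(k):
        p = {s: cprob_opt(succ, targets, p, root[s], in_theta1, in_theta2, opt)
             for s in root}
    return p


# Hypothetical four-state CMDP: in state 0, with probability 0.6 a nondeterministic
# choice between the targets 1 and 2 follows, with probability 0.4 the target 3.
succ: Succ = {
    "c1": ("prob", {"c2": 0.6, "c3": 0.4}),
    "c2": ("nondet", ["c4", "c5"]),
    "c3": ("target", 3), "c4": ("target", 1), "c5": ("target", 2),
    "c6": ("target", 6), "c7": ("target", 5), "c8": ("target", 4),
}
targets = {1: (frozenset({"e0"}), 1), 2: (frozenset({"e0"}), 2),
           3: (frozenset({"e0", "e1"}), 3), 4: (frozenset(), 1),
           5: (frozenset({"e0"}), 2), 6: (frozenset({"e0", "e1"}), 3)}
root = {0: "c1", 1: "c8", 2: "c7", 3: "c6"}


def anything(label: Label) -> bool:
    return True


def not_e0(label: Label) -> bool:
    return "e0" not in label


p_min = iterate(succ, targets, root, anything, not_e0, min, 50)
p_max = iterate(succ, targets, root, anything, not_e0, max, 50)
print(p_min[0], p_max[0])  # prints 0.0 0.6
```

Resolving each nondeterministic choice locally is what makes this efficient: instead of enumerating all choice schedulers, every choice is visited once per iteration.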

5.2.3. Bounded Reachability

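The goal is to compute the probability to reach specific targets within a bounded number of steps, i.e., the probability of those paths that reach a target in $\Theta_2 \subseteq \Theta^{\mathcal{M}}$ within at most k steps and visit only targets in $\Theta_1$ before. Analogously to GMCs, this can be denoted using the bounded until operator $\mathsf{U}^{\le k}$ as

$$\mathit{FinPaths}^{\mathcal{M}}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2) = \{[s,\theta_1\ldots\theta_n] \in \mathit{FinPaths}^{\mathcal{M}} \mid n \le k \wedge \theta_n\in\Theta_2 \wedge \forall 1\le i<n.\ \theta_i\in\Theta_1\}.$$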
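As a small illustration of the two path sets, the following hypothetical Python predicate checks whether an anchored path $[s, \theta_1\ldots\theta_n]$, given by its targets, belongs to $\mathit{FinPaths}^{\mathcal{M}}(\Theta_1\,\mathsf{U}\,\Theta_2)$ (no bound) or to $\mathit{FinPaths}^{\mathcal{M}}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2)$. It assumes n ≥ 1, since $\theta_n$ must exist; target sets are passed as predicates.

```python
from typing import Callable, Optional, Sequence


def in_until_paths(targets: Sequence[str],
                   in_theta1: Callable[[str], bool],
                   in_theta2: Callable[[str], bool],
                   k: Optional[int] = None) -> bool:
    """Membership of [s, theta_1 ... theta_n] in FinPaths(Theta1 U Theta2)
    if k is None, otherwise in FinPaths(Theta1 U<=k Theta2)."""
    n = len(targets)
    if n == 0:                   # theta_n must exist
        return False
    if k is not None and n > k:  # bounded until requires n <= k
        return False
    *prefix, last = targets
    return in_theta2(last) and all(in_theta1(t) for t in prefix)


theta1, theta2 = {"a"}, {"g"}
path = ["a", "a", "g"]
print(in_until_paths(path, theta1.__contains__, theta2.__contains__))       # True
print(in_until_paths(path, theta1.__contains__, theta2.__contains__, k=2))  # False
```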

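As before, the computation of this probability cannot directly be reduced to the computation of the probability of the set of paths of interest, and step schedulers must be taken into account. Subsequently, $\Pr^{\mathcal{M},\mathfrak{s}}_s(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2)$ abbreviates $\Pr^{\mathcal{M},\mathfrak{s}}_s((\Pi^{\mathfrak{s}}_s)^{\uparrow\mathcal{M},\mathfrak{s}})$ with $\Pi = \mathit{FinPaths}^{\mathcal{M}}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2)$. For those cases without any restriction on the targets that are visited before a target in $\Theta_2$ is reached, $\Pr^{\mathcal{M},\mathfrak{s}}_s(\mathsf{F}^{\le k}\,\Theta_2)$ abbreviates $\Pr^{\mathcal{M},\mathfrak{s}}_s(\Theta^{\mathcal{M}}\,\mathsf{U}^{\le k}\,\Theta_2)$.

Finally, the next theorem shows that the iteration operators of the unbounded case can be used for the bounded case as well.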

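Theorem 5.26. Let $\mathcal{M} = (S, \Theta, \tau, C, \mathit{succ}, \mathcal{C})$ be a choice-aware Markov decision process, and let $\Pi = \mathit{FinPaths}^{\mathcal{M}}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2)$ be the paths of interest. Let $F^{\min} : (S \to [0,1]) \to (S \to [0,1])$ be the iteration operator which is given by

$$F^{\min}(p)(s) = \min_{\mathfrak{c}\in\mathfrak{C}} \Bigl(\sum_{\theta\in\Theta_2} \mu^{\mathfrak{c}}_s(\theta) + \sum_{\theta\in\Theta_1\setminus\Theta_2} \mu^{\mathfrak{c}}_s(\theta)\cdot p(\tau(\theta))\Bigr),$$

and let $F^{\max} : (S \to [0,1]) \to (S \to [0,1])$ be the iteration operator which is given by

$$F^{\max}(p)(s) = \max_{\mathfrak{c}\in\mathfrak{C}} \Bigl(\sum_{\theta\in\Theta_2} \mu^{\mathfrak{c}}_s(\theta) + \sum_{\theta\in\Theta_1\setminus\Theta_2} \mu^{\mathfrak{c}}_s(\theta)\cdot p(\tau(\theta))\Bigr).$$

Then,

$$\inf_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s}}_s(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2) = (F^{\min})^k(\mathbf{0})(s), \text{ and}$$
$$\sup_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s}}_s(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2) = (F^{\max})^k(\mathbf{0})(s).$$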

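Proof. Analogous to theorem 5.13. Use also that $\mathfrak{s}[[s,\emptyset]] \subseteq \mathfrak{s}$, i.e., the residual of $\mathfrak{s}$ for the empty anchored path $[s,\emptyset]$ is the restriction of $\mathfrak{s}$ to the paths starting in s.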
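To make theorem 5.26 concrete, the first two iterations starting from the zero function $\mathbf{0}$ (which maps every state to 0) can be unfolded by hand. This is a worked illustration derived from the definition of $F^{\min}$ above, not an additional result; in the first step the second sum vanishes because $\mathbf{0}(\tau(\theta)) = 0$.

```latex
\begin{align*}
(F^{\min})^{1}(\mathbf{0})(s)
  &= \min_{\mathfrak{c}\in\mathfrak{C}} \sum_{\theta\in\Theta_2} \mu^{\mathfrak{c}}_s(\theta)
   = \inf_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s}}_s(\Theta_1\,\mathsf{U}^{\le 1}\,\Theta_2),\\
(F^{\min})^{2}(\mathbf{0})(s)
  &= \min_{\mathfrak{c}\in\mathfrak{C}} \Bigl(\sum_{\theta\in\Theta_2} \mu^{\mathfrak{c}}_s(\theta)
   + \sum_{\theta\in\Theta_1\setminus\Theta_2} \mu^{\mathfrak{c}}_s(\theta)\cdot
     (F^{\min})^{1}(\mathbf{0})(\tau(\theta))\Bigr).
\end{align*}
```

The same unfolding with max instead of min yields the first two iterations of $F^{\max}$.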



Target labels: θ0: e0, θ1: e0, θ2: e0, θ3: e0 and e1, θ4: no label, θ5: e0

Figure 5.7.: Example of Labeled choice-aware Markov Decision Process

5.2.4. Choice-Aware Markov Decision Process with Initial Choice

Definition 5.27 (Generic Choice-aware Markov Decision Process with Initial Choice). A choice-aware Markov decision process $\mathcal{M}$ with initial choice is represented by the tuple $(S, \Theta, \tau, C, \mathit{succ}, \mathcal{C}, c_{\mathit{init}})$ consisting of

• a finite set S of states, and
• a finite set Θ of targets with a target state function $\tau : \Theta \to S$, and
• a finite set C of choices with an acyclic choice transition function $\mathit{succ} : C \to \Theta + \mathit{Dists}(C) + 2^C$, and
• a function $\mathcal{C} : S \to C$ that determines the root choice of each state, and
• an initial choice $c_{\mathit{init}} \in C$,

where $\mathit{Dists}(C) = \{\mu_C : C \to [0,1] \mid \sum_{c\in C} \mu_C(c) = 1\}$.

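Analogous to CMDPs, there also exists a labeled manifestation of choice-aware Markov decision processes with initial choice. Figure 5.7 extends the labeled CMDP of figure 5.6 with an initial choice.

Let $\mathcal{M} = (S, \Theta, \tau, C, \mathit{succ}, \mathcal{C}, c_{\mathit{init}})$ be a CMDP with initial choice. The definitions for the CMDP $(S, \Theta, \tau, C, \mathit{succ}, \mathcal{C})$ are also valid for $\mathcal{M}$. In addition, new definitions related to the initial choice are introduced. An initial finite path $\pi_{\mathit{init}} = \varrho_0\varrho_1\ldots\varrho_n \in (\mathit{Paths}^{\mathcal{M}}_{\mathit{choice}})^*$ of $\mathcal{M}$ is a finite path where $\varrho_0[0] = c^{\mathcal{M}}_{\mathit{init}}$. $\mathit{FinPaths}^{\mathcal{M}}_{\mathit{init}}$ is the set of all initial finite paths of $\mathcal{M}$. An initial infinite path $\pi_{\mathit{init}} = \varrho_0\varrho_1\varrho_2\ldots \in (\mathit{Paths}^{\mathcal{M}}_{\mathit{choice}})^\omega$ of $\mathcal{M}$ is an infinite path where $\varrho_0[0] = c^{\mathcal{M}}_{\mathit{init}}$. $\mathit{InfPaths}^{\mathcal{M}}_{\mathit{init}}$ is the set of all initial infinite paths of $\mathcal{M}$. Given a choice scheduler $\mathfrak{c}$, let $\mu^{\mathfrak{c}}_{\mathit{init}}$ be the choice distribution with $\mu^{\mathfrak{c}}_{\mathit{init}}(\theta) = \Pr^{\mathcal{M}_{\mathfrak{c}}}_{c_{\mathit{init}}}(\mathsf{F}\,\theta)$ using the Markov chain of definition 5.18.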

Recall that the probability space of generic Markov chains starting in a certain state (see definition 5.2) was extended to a probability space for generic Markov chains in definition 5.15. The probability space of a choice-aware Markov decision process induced by a step scheduler starting in a certain state (see definition 5.21) can also be extended to a variant with an initial choice scheduler $\mathfrak{c}_{\mathit{init}}$; this extension is not described further here. Let $\Pr^{\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}}$ be the probability measure defined in such a way. Furthermore, let $\Pi^{\uparrow\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}}$ denote the cylinder set of $\Pi \subseteq \mathit{FinPaths}^{\mathcal{M}}_{\mathit{init}}$, and let $\Pi^{\mathfrak{s},\mathfrak{c}_{\mathit{init}}}$ denote the restriction in the obvious way.



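The definitions of $\mathit{FinPaths}^{\mathcal{M}}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2)$ and $\mathit{FinPaths}^{\mathcal{M}}(\Theta_1\,\mathsf{U}\,\Theta_2)$ can be lifted to choice-aware Markov decision processes with initial choice:

$$\mathit{FinPaths}^{\mathcal{M}}_{\mathit{init}}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2) = \{\theta_0\ldots\theta_n \in \mathit{FinPaths}^{\mathcal{M}}_{\mathit{init}} \mid n \le k \wedge \theta_n\in\Theta_2 \wedge \forall 0\le i<n.\ \theta_i\in\Theta_1\}, \text{ and}$$
$$\mathit{FinPaths}^{\mathcal{M}}_{\mathit{init}}(\Theta_1\,\mathsf{U}\,\Theta_2) = \{\theta_0\ldots\theta_n \in \mathit{FinPaths}^{\mathcal{M}}_{\mathit{init}} \mid \theta_n\in\Theta_2 \wedge \forall 0\le i<n.\ \theta_i\in\Theta_1\}.$$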

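Subsequently, $\Pr^{\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2)$ abbreviates $\Pr^{\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}}((\Pi^{\mathfrak{s},\mathfrak{c}_{\mathit{init}}})^{\uparrow\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}})$ with $\Pi = \mathit{FinPaths}^{\mathcal{M}}_{\mathit{init}}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2)$. For the unbounded case, let $\Pr^{\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}}(\Theta_1\,\mathsf{U}\,\Theta_2)$ abbreviate $\Pr^{\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}}((\Pi^{\mathfrak{s},\mathfrak{c}_{\mathit{init}}})^{\uparrow\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}})$ with $\Pi = \mathit{FinPaths}^{\mathcal{M}}_{\mathit{init}}(\Theta_1\,\mathsf{U}\,\Theta_2)$. The operators $\mathsf{F}^{\le k}$ and $\mathsf{F}$ also denote abbreviations of $\mathsf{U}^{\le k}$ and $\mathsf{U}$ in the obvious ways.

To conclude the discussion about CMDPs with initial choice, this subsection states without proof that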

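$$\min_{\mathfrak{c}_{\mathit{init}}\in\mathfrak{C}} \inf_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2) = \min_{\mathfrak{c}\in\mathfrak{C}} \Bigl(\sum_{\theta\in\Theta_2} \mu^{\mathfrak{c}}_{\mathit{init}}(\theta) + \sum_{\theta\in\Theta_1\setminus\Theta_2} \mu^{\mathfrak{c}}_{\mathit{init}}(\theta)\cdot \inf_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s}}_{\tau(\theta)}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2)\Bigr).$$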

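Analogous statements can also be made for

$$\max_{\mathfrak{c}_{\mathit{init}}\in\mathfrak{C}} \sup_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2),$$
$$\min_{\mathfrak{c}_{\mathit{init}}\in\mathfrak{C}} \inf_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}}(\Theta_1\,\mathsf{U}\,\Theta_2), \text{ and}$$
$$\max_{\mathfrak{c}_{\mathit{init}}\in\mathfrak{C}} \sup_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}}(\Theta_1\,\mathsf{U}\,\Theta_2).$$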

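In the upcoming chapters, $\Pr_{\min}(\cdot)$ abbreviates $\min_{\mathfrak{c}_{\mathit{init}}\in\mathfrak{C}} \inf_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}}(\cdot)$, and $\Pr_{\max}(\cdot)$ abbreviates $\max_{\mathfrak{c}_{\mathit{init}}\in\mathfrak{C}} \sup_{\mathfrak{s}\in\mathfrak{S}} \Pr^{\mathcal{M},\mathfrak{s},\mathfrak{c}_{\mathit{init}}}(\cdot)$. Furthermore, labeled choice-aware Markov decision processes with initial choice are simply referred to as labeled choice-aware Markov decision processes if it is clear from the context.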

5.2.5. Comparison to Standard Markov Decision Processes

The CMDPs that have been treated in this section are a major extension of standard Markov decision processes (MDPs). Labeled CMDPs are not more expressive than standard Markov decision processes, but they make analyzing executable models feasible, as illustrated in this subsection. Before CMDPs are compared with MDPs, MDPs are introduced.



Definition 5.28 (Markov Decision Process). A Markov decision process $\mathcal{M}$ is represented by the tuple $(S, \mathit{Dists}, \mathit{Steps}, \mu_{\mathit{init}})$ consisting of

• a finite set of states S, and
• a finite set of probability distributions $\mathit{Dists} \subseteq (S \to [0,1])$ such that $\sum_{s\in S} \mu(s) = 1$ for all $\mu \in \mathit{Dists}$, and
• the transition function $\mathit{Steps} : S \to 2^{\mathit{Dists}}$, and
• the initial probability distribution $\mu_{\mathit{init}} \in \mathit{Dists}$.

(a) Example of MDP, (b) Transformed CMDP

Figure 5.8.: Transforming an MDP to a CMDP

Figure 5.8a shows an example of an MDP. The example contains two states. The distributions are depicted by point-shaped nodes. The distributions of each state are indicated by solid-headed arcs.

As already indicated in the introduction of the chapter, an intuitive way to look at a transition in an MDP is that, first, one probability distribution of the active state is selected nondeterministically, and second, this probability distribution is used to select a successor state probabilistically. Hence, a standard Markov decision process $(S', \mathit{Dists}', \mathit{Steps}', \mu'_{\mathit{init}})$ can be expressed as a choice-aware Markov decision process $(S, \Theta, \tau, C, \mathit{succ}, \mathcal{C}, c_{\mathit{init}})$ with

• $S = S'$, and
• $\Theta = S'$ and the identity function as τ, and
• $C = 2^{\mathit{Dists}'} + \mathit{Dists}' + S'$ with $\mathit{succ}(\{\mu_0, \mu_1, \ldots, \mu_n\}) = \{\mu_0, \mu_1, \ldots, \mu_n\}$, $\mathit{succ}(\mu) = \mu$, and $\mathit{succ}(s) = s$, and
• $\mathcal{C} = \mathit{Steps}'$, and
• $c_{\mathit{init}} = \mu'_{\mathit{init}}$.
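The encoding above can be sketched in Python as follows. The tags `"set"`, `"dist"`, and `"state"` are a hypothetical way to model the disjoint sum $C = 2^{\mathit{Dists}'} + \mathit{Dists}' + S'$, distributions are tuples of (state, probability) pairs so that they can serve as dictionary keys, and the two-state MDP is made up. As a sanity check, one maximizing evaluation step on the CMDP is compared with the same step computed directly on the MDP.

```python
from typing import Dict, List, Tuple

Dist = Tuple[Tuple[str, float], ...]  # hashable distribution ((state, probability), ...)


def mdp_to_cmdp(states: List[str], steps: Dict[str, List[Dist]], mu_init: Dist):
    """Encode an MDP (S', Dists', Steps', mu'_init) as a CMDP.

    ("set", frozenset) is a nondeterministic choice, ("dist", mu) a probabilistic
    choice, and ("state", s) a target choice (Theta = S', tau = identity).
    """
    succ: Dict[tuple, tuple] = {}
    for s in states:
        succ[("state", s)] = ("target", s)  # succ(s) = s

    def dist_choice(mu: Dist) -> tuple:
        succ[("dist", mu)] = ("prob", {("state", t): q for t, q in mu})  # succ(mu) = mu
        return ("dist", mu)

    root: Dict[str, tuple] = {}
    for s in states:
        options = frozenset(steps[s])
        succ[("set", options)] = ("nondet", [dist_choice(mu) for mu in sorted(options)])
        root[s] = ("set", options)  # root choice of s is Steps'(s)
    c_init = dist_choice(mu_init)   # c_init = mu'_init
    return succ, root, c_init


def evaluate_max(succ: Dict[tuple, tuple], choice: tuple, v: Dict[str, float]) -> float:
    """Value of a choice for state values v, maximizing at nondeterministic choices."""
    kind, payload = succ[choice]
    if kind == "target":
        return v[payload]
    if kind == "prob":
        return sum(q * evaluate_max(succ, c, v) for c, q in payload.items())
    return max(evaluate_max(succ, c, v) for c in payload)


def mdp_step_max(steps: Dict[str, List[Dist]], s: str, v: Dict[str, float]) -> float:
    """The same quantity computed directly on the MDP."""
    return max(sum(q * v[t] for t, q in mu) for mu in steps[s])


# Hypothetical two-state MDP
states = ["s0", "s1"]
steps = {"s0": [(("s0", 0.5), ("s1", 0.5)), (("s1", 1.0),)],
         "s1": [(("s1", 1.0),)]}
succ, root, c_init = mdp_to_cmdp(states, steps, (("s0", 1.0),))
v = {"s0": 0.2, "s1": 1.0}
for s in states:
    print(s, evaluate_max(succ, root[s], v), mdp_step_max(steps, s, v))
```

Every choice path of the resulting CMDP has the fixed shape nondeterministic, probabilistic, target, which is exactly the structure of one MDP transition.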



(a) Merging two probabilistic choices, (b) Pulling nondeterministic choice in front of probabilistic choice

Figure 5.9.: Flattening

The other way around, every CMDP can be transformed into an equally expressive MDP. There are at least three ways to transform a labeled CMDP into a standard MDP. The reference implementation contains the source code of all three transformations [ISSE18]. As all three ways perform poorly, only the idea behind those three transformations is described here.

The basic idea of the direct transformation is to select the choices of the CMDP as states of the MDP. The disadvantage of the direct way is that a lot of new states are introduced. There is an even more severe problem: a transition from a state s to its successor s′ always takes one “step” (choice path) in the CMDP; but in the MDP, there are several intermediate states between the transformed state s and the transformed successor s′, and the distance between transformed states is not guaranteed to be constant because the choice paths may have different lengths. After such a direct transformation, bounded model checking is not useful anymore (but bounded model checking is useful for safety analysis, as shown in chapter 9).

The second transformation, called distance preserving, circumvents the problem of the variable distance. To do this, the maximal distance between two states is calculated; by adding artificial states, the distance between any two states is aligned to the maximal distance. The problem of this transformation is that many new states are added to the MDP.

The third transformation tries to flatten the choice paths. The choices and the choice transition function define a choice graph. This choice graph is normalized stepwise as indicated by figure 5.9: the graph is searched for situations that violate the rule that only a probabilistic choice may follow a nondeterministic choice. There are three cases where this rule is broken. In the first case (illustrated by figure 5.9a), a probabilistic choice follows a probabilistic choice. Such cases can simply be merged as indicated. The same is true for the case when a nondeterministic choice follows a nondeterministic choice. In the third case (illustrated by figure 5.9b), a nondeterministic choice follows a probabilistic choice. Then, the nondeterministic choice can be pulled in front of the probabilistic choice, which adds additional probability distributions. Roughly speaking, the nondeterministic choice selects a choice resolver for which a distribution is given. The number of distributions in the final MDP is therefore equal to the number of possible choice schedulers, and there might be a lot of choice schedulers. The evaluation shows that this number explodes in practice.

The results of the evaluations on the case studies are depicted in table 5.6 (page 90) and table 5.7 (page 91). The number of transitions in a CMDP is determined by the number of choices for which the choice transition function succ returns a target, and the number of transitions in an MDP by the number of non-zero values of its probability distributions Dists. The evaluations show that none of the described transformations is satisfactory. In many cases, the memory limit was exceeded, because too much space for states and transitions had to be allocated. The exploding number of choice schedulers was indeed a problem for the flattening transformation. On examples with a really small state space, it seems to be the transformation of choice, but even in the relatively small Pressure Tank example, the number of transitions exploded (factor > 100), and for the larger case studies the transformation did not provide any result at all. Therefore, labeled CMDPs are best model checked directly.
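The following Python sketch illustrates the flattening idea and why it explodes. It is not the reference implementation [ISSE18]; it assumes a tree-shaped choice graph (no shared sub-choices) and the same hypothetical tagged-tuple encoding of succ used in the earlier sketches. `flatten` returns the distributions over targets offered by the single nondeterministic choice that remains in front: nested probabilistic choices are merged, nested nondeterministic choices are concatenated, and nondeterministic choices below a probabilistic choice are pulled in front via a Cartesian product.

```python
from itertools import product
from typing import Dict, List

Distribution = Dict[str, float]


def flatten(succ: Dict[str, tuple], choice: str) -> List[Distribution]:
    """Distributions over targets offered after flattening `choice`
    (one per combination of resolutions of nested nondeterministic choices)."""
    kind, payload = succ[choice]
    if kind == "target":
        return [{payload: 1.0}]
    if kind == "nondet":
        # nondeterministic after nondeterministic: merge into one choice
        return [dist for c in payload for dist in flatten(succ, c)]
    # probabilistic choice: merge nested probabilistic choices and pull the
    # nondeterministic choices of all branches in front (Cartesian product)
    branches = [(p, flatten(succ, c)) for c, p in payload.items()]
    result: List[Distribution] = []
    for combination in product(*(options for _, options in branches)):
        merged: Distribution = {}
        for (p, _), dist in zip(branches, combination):
            for target, q in dist.items():
                merged[target] = merged.get(target, 0.0) + p * q
        result.append(merged)
    return result


def independent_branches(n: int) -> Dict[str, tuple]:
    """Hypothetical choice graph: a uniform probabilistic choice over n branches,
    each being a nondeterministic choice between two targets."""
    succ = {"root": ("prob", {f"b{i}": 1.0 / n for i in range(n)})}
    for i in range(n):
        succ[f"b{i}"] = ("nondet", [f"x{i}", f"y{i}"])
        succ[f"x{i}"] = ("target", f"x{i}")
        succ[f"y{i}"] = ("target", f"y{i}")
    return succ


for n in (1, 5, 10, 15):
    print(n, len(flatten(independent_branches(n), "root")))  # 2, 32, 1024, 32768
```

With only 2n + 1 choices below the root, the flattened MDP already needs 2^n distributions for this single state, which matches the explosion of transitions observed for the flattening transformation.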

5.3. Related Work

There are many formalisms that allow probabilistic analyses of discrete systems:

• probabilistic automata are a variation of Markov decision processes that focus on the composition of different models [Sto02],
• continuous-time Markov chains replace the discrete transitions of Markov chains by exponentially distributed delays [BK08],
• Markov automata unify Markov decision processes and continuous-time Markov chains by adding exponentially distributed delays to Markov decision processes [EHZ10],
• probabilistic timed automata add real-time behavior to Markov decision processes [Bea03],
• stochastic timed automata add exponentially distributed delays to probabilistic timed automata [BB+14], and
• the alternating models of Hansson et al. allow in each state either a nondeterministic choice or a probabilistic choice, similar to choice paths [HJ90].

Even if executable models can be transformed into those formalisms, no efficient model checking is possible for larger systems. However, it seems reasonable to extend those formalisms to better support executable models, in a similar way as choice-aware Markov decision processes extend standard Markov decision processes.

The action-labeled fully probabilistic systems are a variation of the labeled Markov chains [Bai98]; however, the iteration algorithm introduced in this chapter is more efficient, because it only needs to store the probabilities of each state and not of each sequence of possible actions. The author is not aware of any other MDP-like formalism that resembles the idea of choice-aware Markov decision processes, which distinguish between inner “micro steps” (here, choice paths) and the outer steps.

                        (1)          (2)        (3)          (4)       (5)      (6)
Labeled CMDP
  States                2,587,933    exceeded   294,871      156       25       30
  Choices               87,832,472   exceeded   6,449,262    3,091     86       162
  Transitions           45,210,203   exceeded   3,372,067    1,624     56       108
Standard MDP (direct)
  States                exceeded     exceeded   3,120,289    1,506     43       54
  Transitions           exceeded     exceeded   6,233,131    2,974     76       132
  State Ratio           −            −          10.58 ×      9.65 ×    1.72 ×   1.8 ×
  Transition Ratio      −            −          1.85 ×       1.83 ×    1.36 ×   1.22 ×
Standard MDP (distance preserving)
  Max Distance          exceeded     exceeded   9            6         3        3
  States                exceeded     exceeded   18,950,848   4,662     118      172
  Transitions           exceeded     exceeded   22,063,690   6,130     151      250
  State Ratio           −            −          64.27 ×      29.88 ×   4.72 ×   5.73 ×
  Transition Ratio      −            −          6.54 ×       3.77 ×    2.70 ×   2.31 ×
Standard MDP (flattening)
  States                exceeded     exceeded   exceeded     156       29       30
  Transitions           exceeded     exceeded   exceeded     184,664   68       108
  State Ratio           −            −          −            1.0 ×     1.16 ×   1.0 ×
  Transition Ratio      −            −          −            113.71 ×  1.21 ×   1.0 ×

Table 5.6.: Comparison of labeled CMDPs with standard MDPs when faults and hazards are observable. Columns (1) to (6) correspond to the case studies; “exceeded” marks runs in which the memory limit was exceeded.




                        (1)          (2)        (3)          (4)       (5)      (6)
Labeled CMDP
  States                2,587,933    exceeded   294,871      156       25       30
  Choices               87,832,472   exceeded   6,449,262    3,091     86       162
  Transitions           45,210,203   exceeded   3,372,067    1,624     56       108
Standard MDP (direct)
  States                42,622,269   exceeded   3,084,007    1,506     39       54
  Transitions           85,244,539   exceeded   6,161,203    2,974     70       132
  State Ratio           16.47 ×      −          10.46 ×      9.65 ×    1.56 ×   1.8 ×
  Transition Ratio      1.89 ×       −          1.83 ×       1.83 ×    1.25 ×   1.22 ×
Standard MDP (distance preserving)
  Max Distance          exceeded     exceeded   9            6         3        3
  States                exceeded     exceeded   18,666,395   4,662     106      172
  Transitions           exceeded     exceeded   21,743,591   6,130     137      250
  State Ratio           −            −          63.30 ×      29.88 ×   4.24 ×   5.73 ×
  Transition Ratio      −            −          6.45 ×       3.77 ×    2.45 ×   2.31 ×
Standard MDP (flattening)
  States                exceeded     exceeded   exceeded     156       25       30
  Transitions           exceeded     exceeded   exceeded     184,664   62       108
  State Ratio           −            −          −            1.0 ×     1.0 ×    1.0 ×
  Transition Ratio      −            −          −            113.71 ×  1.11 ×   1.0 ×

Table 5.7.: Comparison of labeled CMDPs with standard MDPs when only hazards are observable. Columns (1) to (6) correspond to the case studies; “exceeded” marks runs in which the memory limit was exceeded.


Sherlock Holmes: It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth.

– The Adventure of the Beryl Coronet (1892)

6. Model Checking Probabilistic Systems

In the previous chapter, labeled Markov chains (LMCs) and labeled choice-aware Markov decision processes (LCMDPs) have been introduced: the chapter contained formal descriptions and a mathematical function to calculate the bounded and unbounded probability to reach certain states of those probabilistic systems. The current chapter provides concrete data structures and algorithms that can be used to implement LMCs and LCMDPs. Those data structures and algorithms are the foundation of the probabilistic analysis of executable models using model checking. Those targets for which the target distribution function returns a value greater than zero are called transitions of a state. The evaluations in chapter 5 showed that the average number of transitions per state is smaller than the number of states by orders of magnitude (cf. table 5.4 on page 72 and table 5.6 on page 90). Such a state space is called sparse. Because of the sparseness, it makes sense to save only the transitions and their probabilities, and not the full distributions with a lot of zero values. Hence, a trick from numerical analysis can be borrowed: the sparse matrix data structure only saves the non-zero values of matrices.

This chapter is intended for developers who want to implement LMCs or LCMDPs, which might also be useful apart from the safety analysis of executable models. It is recommended to have read the introductions of section 5.1 on page 56 and section 5.2 on page 73 to get an idea of the characteristic features of LMCs and LCMDPs, respectively. HiddenExample on page 36 with hiding applied is used as running example to illustrate the sparse data structures for LMCs and LCMDPs. How those probabilistic systems are generated from the model is the topic of chapter 7.

Section 6.1 and section 6.2 show the data structures and algorithms for LMCs and LCMDPs, respectively. Thereafter, section 6.3 demonstrates that these algorithms and data structures are indeed applicable to the analysis of the case studies. Finally, section 6.4 presents some related approaches.

Parker provides sparse data structures and model checking algorithms for standard Markov chains and Markov decision processes in [Par02].


(a) Illustration; target labels: θ0: e0, θ1: e0, θ2: e0, θ3: e0 and e1, θ4: no label, θ5: e0

(b) Data structure:
states (elements, from): [0] (3, 1), [1] (1, 4), [2] (1, 5), [3] (1, 6)
transitions (probability, label, state): [0] (1.0, e0, 0), [1] (0.3, e0, 1), [2] (0.3, e0, 2), [3] (0.4, {e0, e1}, 3), [4] (1.0, -, 1), [5] (1.0, e0, 2), [6] (1.0, {e0, e1}, 3)
initialTransitionFrom: 0, initialTransitionCount: 1

Figure 6.1.: Example of a labeled Markov chain with an initial distribution

6.1. Sparse Labeled Markov Chains

A labeled Markov chain is represented by the tuple (S, Θ, R, µinit) (cf. definition 5.14), in which each target in Θ is given by a label and a state. Figure 6.1a shows an illustration of an LMC. A unique number in {0, …, |S| − 1} is allocated to each state. A representation of such an LMC in computer memory is depicted in figure 6.1b:

• transitions is an array of the record type probability × label × state. The data of the probability field comes from the initial probability distribution µinit and the probability distribution µs of each state s. The corresponding targets are denormalized into the label fields and state fields. Note that only those transitions are saved for which the target distribution function returns a value greater than zero.
• states is an array of the record type elements × from that contains for each state the index of the first transition in the transitions array and the number of elements in the transitions array. Inside the transitions array, the elements of each state are stored sequentially.
• initialTransitionFrom is an integer that contains the index of the first initial transition in the transitions array.
• initialTransitionCount is an integer that contains the number of initial transitions.

Note that in the example, both transition 3 and transition 6 contain the information of target θ3. Previous evaluations on the case studies show that the number of targets is only slightly smaller than the number of such denormalized transitions. Even though the denormalization might lead to a small increase in memory usage, those LMCs are still smaller than comparable standard Markov chains. In exchange, the denormalization has several benefits: it makes model checking faster, because it removes one indirection and keeps data that is accessed together close in memory, which makes caching more efficient. Also, it makes the generation of an LMC easier and faster, because no lookup for a preexisting target is necessary.

The memory of an LMC model depends on the width of the integers used. With ordinary unsigned integers with a fixed width of 32 bits, an entry in the states array requires 8 bytes. A probability is best modeled as a double-precision floating point number (8 bytes). An integer with 32 bits can be used to save up to 32 formulas into one label. All in all, a transition requires at least 16 bytes (8 bytes for the probability, 4 for the label, and 4 for the state). Hence, a model like the railroad crossing with 2,587,933 states and 26,221,633 transitions requires about 20 megabytes for the states and about 400 megabytes for the transitions. Using integers with a fixed width of 64 bits (as done in the reference implementation) almost doubles the required memory, but the size is still easily manageable for any state-of-the-art desktop computer.

Section 5.1 states that the probability to reach a target satisfying Θ2 within at most k steps, while visiting only targets satisfying Θ1 before, can be calculated by

$$\Pr(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2) = \sum_{\theta\in\Theta_2} \mu^{\mathcal{M}}_{\mathit{init}}(\theta) + \sum_{\theta\in\Theta_1\setminus\Theta_2} \mu^{\mathcal{M}}_{\mathit{init}}(\theta)\cdot \Pr^{\mathcal{M}}_{\tau(\theta)}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2).$$

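According to theorem 5.13, the iteration operator F applied k times to a start value that maps each state to zero can be used to calculate the probability $\Pr^{\mathcal{M}}_{\tau(\theta)}(\Theta_1\,\mathsf{U}^{\le k}\,\Theta_2)$:

$$F(p)(s) = \sum_{\theta\in\Theta_2} \mu_s(\theta) + \sum_{\theta\in\Theta_1\setminus\Theta_2} \mu_s(\theta)\cdot p(\tau(\theta)).$$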

Algorithm 6.1 shows an implementation for the probabilistic model checking based onthe given data structures.



Algorithm 6.1: reachability probability in a labeled Markov chain with initial distribution

1  method LmcModelChecker::CalculateReachabilityProbability(lmc, Θ1, Θ2, k)
2    returns probability of Θ1 U Θ2 or Θ1 U≤k Θ2 depending on stop condition
3    var xold := new double[lmc.stateCount]
4    var xnew := new double[lmc.stateCount]
5    var i := 0
6    while i < k do
7      swap xold with xnew
8      for each s ∈ lmc.states do
9        var sum := 0.0
10       var l := s.from
11       var h := s.from + s.elements - 1
12       for each t ∈ lmc.transitions[l] . . . lmc.transitions[h] do
13         if t.label satisfies Θ2 then
14           sum += t.probability
15         else if t.label satisfies Θ1 then
16           sum += t.probability * xold[t.state]
17         else
18           do nothing
19       xnew[s] = sum
20     i++
21   var finalsum := 0.0
22   var l := lmc.initialTransitionFrom
23   var h := lmc.initialTransitionFrom + lmc.initialTransitionCount - 1
24   for each t ∈ lmc.transitions[l] . . . lmc.transitions[h] do
25     if t.label satisfies Θ2 then
26       finalsum += t.probability
27     else if t.label satisfies Θ1 then
28       finalsum += t.probability * xnew[t.state]
29     else
30       do nothing
31   return finalsum
32 end method

The arrays xold and xnew contain for each state the probability to satisfy the input formula in the previous and the current iteration, respectively. In lines 3 and 4, xold and xnew are declared as arrays of double-precision floating point numbers; each element of those arrays is initialized with the probability 0.0. Lines 6-20 implement the iteration operator, and lines 22-30 the calculation of the final probability, which also considers the initial distribution. A small remark on the satisfaction checks in lines 13, 15, 25, and 27: the formulas Θ1 and Θ2 may be arbitrary propositional logic formulas that use the elements of the label as atomic propositions, e.g., Θ1 = e0 ∧ e1 is a valid formula. The result of the evaluation does not change between iterations; therefore, it is advantageous to precalculate the result for each transition and store it in an array to avoid redundant evaluations.
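To make the loop structure of algorithm 6.1 concrete, the following Python sketch performs the same bounded computation on a sparse transition array. It is a minimal illustration under stated assumptions, not the C# reference implementation: the names SparseLmc, Transition, bounded_until, initial_from, and initial_count are hypothetical, labels are modeled as frozen sets of formula names, and Θ1 and Θ2 are passed as Python predicates over labels. As suggested above, the satisfaction of Θ1 and Θ2 is precomputed once per transition.

```python
from typing import Callable, FrozenSet, List, NamedTuple, Tuple

Label = FrozenSet[str]
Formula = Callable[[Label], bool]     # propositional formula over the label elements


class Transition(NamedTuple):
    state: int            # target state (index into `states`)
    probability: float
    label: Label


class SparseLmc(NamedTuple):
    states: List[Tuple[int, int]]     # (from, elements) into `transitions`, per state
    transitions: List[Transition]
    initial_from: int                 # initial distribution, stored like a state's transitions
    initial_count: int


def bounded_until(lmc: SparseLmc, theta1: Formula, theta2: Formula, k: int) -> float:
    """Apply F k times to the all-zero vector, then weight with the initial distribution."""
    # Labels do not change between iterations: evaluate the formulas once per transition.
    sat2 = [theta2(t.label) for t in lmc.transitions]
    sat1 = [theta1(t.label) for t in lmc.transitions]

    def weighted_sum(first: int, count: int, x: List[float]) -> float:
        total = 0.0
        for i in range(first, first + count):
            t = lmc.transitions[i]
            if sat2[i]:
                total += t.probability
            elif sat1[i]:                 # theta1 holds but theta2 does not
                total += t.probability * x[t.state]
        return total

    x_old = [0.0] * len(lmc.states)
    x_new = [0.0] * len(lmc.states)
    for _ in range(k):
        x_old, x_new = x_new, x_old       # reuse both arrays, as in algorithm 6.1
        for s, (first, count) in enumerate(lmc.states):
            x_new[s] = weighted_sum(first, count, x_old)
    return weighted_sum(lmc.initial_from, lmc.initial_count, x_new)


# Hypothetical two-state example: state 0 loops with label {e0} or moves to state 1 with {e1}.
E0, E1 = frozenset({"e0"}), frozenset({"e1"})
example = SparseLmc(
    states=[(0, 2), (2, 1)],
    transitions=[Transition(0, 0.5, E0), Transition(1, 0.5, E1),
                 Transition(1, 1.0, E1), Transition(0, 1.0, E0)],
    initial_from=3,
    initial_count=1,
)
has_e0: Formula = lambda label: "e0" in label
has_e1: Formula = lambda label: "e1" in label
```

For this example, k = 2 yields 0.75: after the initial transition into state 0, an e1-labeled transition is taken within two steps with probability 0.5 + 0.5 · 0.5.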


According to theorem 5.10, the unbounded case works very similarly. However, the exact value is only reached in the limit, i.e., after infinitely many iterations. This is of course not feasible; in practice, an approximation that is precise enough is calculated. To adapt the algorithm to the unbounded case, only the condition of the while-loop in line 6 needs to be replaced with a stop condition, which might be a precision criterion, a maximal number of iterations, or simply a timeout.
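One possible stop condition for the unbounded case is sketched below in Python: the iteration operator is applied until the largest change of any state's value falls below a threshold ε, with a maximal number of iterations as a fallback. The function name iterate_until_converged and the parameters epsilon and max_iterations are illustrative assumptions; the operator is passed as a function, here the operator F of a hypothetical two-state chain whose exact unbounded value for state 0 is 1.

```python
from typing import Callable, List, Tuple


def iterate_until_converged(step: Callable[[List[float]], List[float]], n: int,
                            epsilon: float = 1e-9,
                            max_iterations: int = 100_000) -> Tuple[List[float], int]:
    """Apply the iteration operator `step` to the all-zero start vector until the
    largest per-state change is below `epsilon` or `max_iterations` is reached."""
    x = [0.0] * n
    for iteration in range(1, max_iterations + 1):
        x_next = step(x)
        delta = max((abs(a - b) for a, b in zip(x_next, x)), default=0.0)
        x = x_next
        if delta < epsilon:                 # precision criterion
            return x, iteration
    return x, max_iterations                # iteration limit as fallback


# Operator F of a hypothetical two-state chain: from state 0, Theta2 is reached with 0.5
# and state 0 is re-entered via a Theta1-transition with 0.5; state 1 reaches Theta2 surely.
def step(x: List[float]) -> List[float]:
    return [0.5 * x[0] + 0.5, 1.0]


x, iterations = iterate_until_converged(step, 2, epsilon=1e-10)
```

Such an absolute-difference criterion is a heuristic: a small change between two iterations does not in general bound the distance to the exact value, which is one reason to combine it with an iteration limit or a timeout.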

6.2. Sparse Labeled Choice-Aware Markov Decision Processes

(a) Illustration (targets and their labels: θ0: e0; θ1: e0; θ2: e0; θ3: e0, e1; θ4: no label; θ5: e0)

(b) Data structure

states
index   choice
[0]     1
[1]     8
[2]     7
[3]     6

choices
index   type   probability   from   to
0       t      -             -      0
1       p      -             2      3
2       nd     0.6           4      5
3       t      0.4           -      3
4       t      -             -      1
5       t      -             -      2
6       t      -             -      6
7       t      -             -      5
8       t      -             -      4

targets
index   label    state
0       e0       [0]
1       e0       [1]
2       e0       [2]
3       e0, e1   [3]
4       (none)   [1]
5       e0       [2]
6       e0, e1   [3]

initialChoice: 0

Figure 6.2.: Example of a labeled choice-aware Markov decision process



A labeled choice-aware Markov decision process (LMDP) is represented by the tuple (S, Θ, C, C, cinit) with the choice transition function succ (cf. definition 5.27), in which each target in Θ is given by a label and a state. Figure 6.2a shows an illustration of an LMDP. A unique number between 0 and |S| − 1 is allocated to each state. A representation of such an LMDP in computer memory is depicted in figure 6.2b.

• states is an integer array that contains for each state the index of its first (root) choice in the choices-array.

• choices is an array of the record type type × probability × from × to. Depending on the type-field, the other fields have different meanings:

– if the choice is a target choice (t), then the to-field indicates the target of thechoice. The from-field has no meaning.

– if the choice is a nondeterministic choice (nd), the from-field indicates the index of the first nondeterministic option in the choices-array, and the to-field indicates the last (inclusive). The successor choices are stored sequentially.

– if the choice is a probabilistic choice (p), the from-field indicates the index of the first probabilistic option in the choices-array, and the to-field indicates the last (inclusive). The probability of each option is stored in the probability-field of the respective successor choice. For this reason, the probability-field of choice 1 in the example has no relevance, in contrast to the probability-fields of choices 2 and 3. A probabilistic choice is never the last choice on a path; hence, a successor choice always exists. Only those options are stored for which the target distribution function returns a value greater than zero.

Note that this representation requires that succ leads to a tree-like structure, i.e., no choice has more than one incoming edge.

• targets is an array of the record type label × state.

• initialChoice is an integer that contains the index of the initial choice in the choices-array.

Usually, the number of targets is lower than the number of choices (see table 5.7 on page 91). In contrast to the LMC data structure, the targets have not been denormalized into the choices-array, because this decision saves more memory. The targets-array contains θ3 twice, namely as target 3 and target 6. As with LMCs, the generation algorithm does not check whether a matching entry already exists, to avoid the slowdown caused by a lookup.

In the presented data structure, the probability values are included directly in the choices-array. Another option is to create a distinct array with only probability values; but then each choice still needs a field that provides the index of its entry in this array. This could save some space if this index is chosen as a 32-bit integer. The requirement of a tree-like structure is no handicap, because the LMDP generation algorithm in the next section generates a tree-like structure.

Using ordinary unsigned integers with a fixed width of 32 bits, an entry in the states-array requires 4 bytes. The type-field in the choices-array only requires 1 byte, the from-field and the to-field 4 bytes each, and the probability-field again 8 bytes. All in all, a choice requires at least 17 bytes. An entry in the targets-array requires 8 bytes. Hence, a model like the railroad crossing with 2,587,933 states, 87,832,472 choices, and 45,210,203 targets requires about 10 megabytes for the states, about 1.4 gigabytes for the choices, and about 350 megabytes for the targets. This size is still easily manageable for any state-of-the-art desktop computer.

Section 5.1 states that the minimal probability to reach targets satisfying Θ2 within at most k steps, while passing only targets satisfying Θ1 before, can be calculated by

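$$\mathrm{Pr}_{\min}(\Theta_1 \,\mathrm{U}^{\le k}\, \Theta_2) = \min_{c \in C} \Big( \sum_{\theta \in \Theta_2} \mu^{c}_{\mathit{init}}(\theta) + \sum_{\theta \in \Theta_1 \setminus \Theta_2} \mu^{c}_{\mathit{init}}(\theta) \cdot \inf_{\mathfrak{s} \in \mathfrak{S}} \mathrm{Pr}^{M,\mathfrak{s}}_{\tau(\theta)}(\Theta_1 \,\mathrm{U}^{\le k}\, \Theta_2) \Big).$$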

According to theorem 5.26, the iteration operator F_min, applied k times to a start value that maps each state to zero, can be used to calculate the probability $\inf_{\mathfrak{s} \in \mathfrak{S}} \mathrm{Pr}^{M,\mathfrak{s}}_{s}(\Theta_1 \,\mathrm{U}^{\le k}\, \Theta_2)$.

Therefore, the skeleton of the algorithm is very similar to the one of algorithm 6.1.

Algorithm 6.2: minimal reachability probability in a labeled CMDP with initial choice (part 1)

1  method LcmdpModelChecker::CalculateMinimalReachabilityProbability(lmdp, Θ1, Θ2)
2    returns probability of Θ1 U Θ2 or Θ1 U≤k Θ2 depending on stop condition
3    var xold := new double[lmdp.stateCount]
4    var xnew := new double[lmdp.stateCount]
5    while stop condition not met do
6      swap xold with xnew
7      for each s ∈ lmdp.states do
8        xnew[s] := CalculateMinimalProbabilityOfChoice(lmdp, Θ1, Θ2, xold, s.choice)
9    return CalculateMinimalProbabilityOfChoice(lmdp, Θ1, Θ2, xnew, lmdp.initialChoice)
10 end method

The algorithm CalculateMinimalReachabilityProbability delegates the actual calculation to CalculateMinimalProbabilityOfChoice in lines 8 and 9. Based on the operator cprob_min (see page 83), this algorithm can be implemented as follows:

Algorithm 6.3: minimal reachability probability in a labeled CMDP with initial choice (part 2)

11 method LcmdpModelChecker::CalculateMinimalProbabilityOfChoice(lmdp, Θ1, Θ2, stateProbabilities, choice)
12   returns minimal probability of choice
13   if choice.type == target then
14     var t := lmdp.targets[choice.to]
15     if t.label satisfies Θ2 then
16       return 1.0
17     else if t.label satisfies Θ1 then
18       return stateProbabilities[t.state]
19     else
20       return 0.0
21   else if choice.type == nondeterministic then
22     var pSmallest := positive infinity
23     for each childChoice ∈ lmdp.choices[choice.from] .. lmdp.choices[choice.to] do
24       var pOfChild := CalculateMinimalProbabilityOfChoice(lmdp, .., childChoice)
25       if pOfChild < pSmallest then
26         pSmallest := pOfChild
27     return pSmallest
28   else if choice.type == probabilistic then
29     var sum := 0.0
30     for each childChoice ∈ lmdp.choices[choice.from] .. lmdp.choices[choice.to] do
31       var transitionProbability := childChoice.probability
32       var pOfChild := CalculateMinimalProbabilityOfChoice(lmdp, .., childChoice)
33       sum += transitionProbability * pOfChild
34     return sum
35   else
36     throw exception, because this point should be unreachable
37 end method

There are no special surprises in the implementation. Starting from a certain choice, the algorithm recursively walks through the choices. If a target choice is encountered, the algorithm returns 1.0 if the associated label satisfies Θ2, returns the probability of the associated state in stateProbabilities (i.e., xold during the iteration and xnew for the initial choice) if Θ1 holds but not Θ2, and otherwise returns 0.0. If a nondeterministic choice is encountered, the minimal probability among its children is returned. If a probabilistic choice is encountered, the probabilities of its children are weighted and summed up. Care must be taken in line 31 that the probability-field of childChoice is used and not that of the choice itself, because of the shift described earlier. The algorithm terminates on a valid LMDP, because every choice path ends in a target.

The algorithms for the maximal reachability probability and the unbounded variants work analogously.
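The recursion of algorithm 6.3 and the iteration of algorithm 6.2 can be sketched in Python on the data structure of figure 6.2b. This is an illustrative sketch, not the reference implementation: the class and function names are hypothetical, the fields marked "-" in the figure are filled with 0, the formulas Θ1 = e0 and Θ2 = e1 are example choices, and the parameter pick selects min for the minimal and max for the analogous maximal reachability probability. As in line 31, the probability of a probabilistic branch is read from the child choice.

```python
from typing import Callable, FrozenSet, Iterable, List, NamedTuple, Sequence

Label = FrozenSet[str]
Formula = Callable[[Label], bool]
Pick = Callable[[Iterable[float]], float]   # min for Pr_min, max for Pr_max


class Choice(NamedTuple):
    kind: str           # "t" (target), "nd" (nondeterministic), "p" (probabilistic)
    probability: float  # branch probability, only read when the parent is a "p" choice
    first: int          # "from": first child choice of "nd"/"p"; unused for "t"
    last: int           # "to": last child (inclusive) of "nd"/"p", or target index of "t"


class Target(NamedTuple):
    label: Label
    state: int


class SparseLmdp(NamedTuple):
    states: List[int]   # root choice of each state
    choices: List[Choice]
    targets: List[Target]
    initial_choice: int


def choice_probability(m: SparseLmdp, theta1: Formula, theta2: Formula,
                       x: Sequence[float], c: int, pick: Pick) -> float:
    choice = m.choices[c]
    if choice.kind == "t":
        target = m.targets[choice.last]
        if theta2(target.label):
            return 1.0
        if theta1(target.label):
            return x[target.state]
        return 0.0
    children = range(choice.first, choice.last + 1)
    if choice.kind == "nd":
        return pick(choice_probability(m, theta1, theta2, x, d, pick) for d in children)
    if choice.kind == "p":
        # The branch probability is stored in the child choice, not in `choice` itself.
        return sum(m.choices[d].probability * choice_probability(m, theta1, theta2, x, d, pick)
                   for d in children)
    raise ValueError(f"unknown choice type {choice.kind!r}")


def bounded_until(m: SparseLmdp, theta1: Formula, theta2: Formula, k: int,
                  pick: Pick = min) -> float:
    x = [0.0] * len(m.states)
    for _ in range(k):
        # The comprehension reads the previous vector x before x is rebound.
        x = [choice_probability(m, theta1, theta2, x, root, pick) for root in m.states]
    return choice_probability(m, theta1, theta2, x, m.initial_choice, pick)


# Figure 6.2(b); fields shown as "-" in the figure are filled with 0.
E0, E01, NONE = frozenset({"e0"}), frozenset({"e0", "e1"}), frozenset()
fig62 = SparseLmdp(
    states=[1, 8, 7, 6],
    choices=[Choice("t", 0.0, 0, 0), Choice("p", 0.0, 2, 3), Choice("nd", 0.6, 4, 5),
             Choice("t", 0.4, 0, 3), Choice("t", 0.0, 0, 1), Choice("t", 0.0, 0, 2),
             Choice("t", 0.0, 0, 6), Choice("t", 0.0, 0, 5), Choice("t", 0.0, 0, 4)],
    targets=[Target(E0, 0), Target(E0, 1), Target(E0, 2), Target(E01, 3),
             Target(NONE, 1), Target(E0, 2), Target(E01, 3)],
    initial_choice=0,
)
has_e0: Formula = lambda label: "e0" in label
has_e1: Formula = lambda label: "e1" in label
p_min = bounded_until(fig62, has_e0, has_e1, 50)            # 0.4
p_max = bounded_until(fig62, has_e0, has_e1, 50, pick=max)  # 0.4
```

In this example, minimum and maximum coincide at 0.4 for every k ≥ 1: with probability 0.4 the initial state reaches θ3, whereas both nondeterministic options lead to states from which no e1-labeled target is reachable.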

6.3. Evaluation

Table 6.1 and table 6.2 show the performance of the model checking algorithms for probabilistic systems based on sparse data structures. For the analyses of the CMDPs, the occurrence of two faults was modeled nondeterministically: communication messages may be lost nondeterministically in the Railroad case study (MessageDropped), and the dialysis membrane may rupture nondeterministically in the Hemodialysis case study (DialyzerMembraneRupturesFault). The results show that the algorithms are fast enough for the application; every calculation finished within 10 minutes whenever the state space fit into memory. The main reason is that a relatively low number of iterations is sufficient for the safety analysis of the case study models. The results also show that the analysis of the CMDPs requires considerably more time than the analysis of the labeled MCs. The reasons are that the LMC algorithm needs no recursive function calls and that the LMCs contain fewer transitions: in the LMCs, the transitions of each state do not contain two entries with the same label and target, because such transitions have been merged beforehand. For the CMDPs, the number of observed faults (i.e., the number of faults in the labeling) made no difference; therefore, only the results where all faults were observed are shown (cf. table 5.6 and table 5.7).

6.4. Related work

There are different ways to model check probabilistic systems. For systems with a small number of states, it is sufficient to transform a probabilistic system into a system of linear equations and use standard methods to find the exact solution. However, for larger systems of linear equations this is no longer feasible, and iterative methods like the Gauss–Seidel method are applied to find an approximation.

The biggest problem with probabilistic systems is that they easily exceed the available memory. The approach of this thesis is to remove as many temporary variables as possible from the state space to reduce the memory consumption. Another approach is to use symbolic representations of the state space; this is the approach encouraged by the PRISM model checker [KNP11; Par02], which uses Multi-Terminal Binary Decision Diagrams (MTBDDs), an extension of the well-known Binary Decision Diagrams (BDDs) that enables a space-efficient storage of a probabilistic system in memory. However, this approach has some drawbacks: only specialized modeling languages are suited for the efficient generation of an MTBDD model, and model checking with MTBDDs is in most cases slower than with sparse data structures, as the evaluations of Parker show [Par02]. For this reason, the PRISM model checker implements different computation engines:

• the MTBDD engine stores both the state space and the state probabilities (the probabilities of each state during the calculation) as MTBDDs,

• the explicit engine stores both the state space and the state probabilities as sparse data structures,

• the hybrid engine stores the state space as an MTBDD and the state probabilities as a sparse data structure, and

• the sparse engine creates the state space as an MTBDD and then converts it to a sparse data structure.

At the time of writing, PRISM is the reference for model checking of probabilistic systems, but PRISM has no support for labeled Markov chains or labeled choice-aware Markov decision processes.

A model checker that is limited to Markov chains is MRMC [KZ+11; Zap08]. The algorithms of MRMC are written in C; earlier evaluations showed that the implementation of MRMC is about 10 times faster than the unoptimized C#-based reference implementation. The reference implementation of this thesis could be accelerated by using data structures that do not perform bounds checks on every array access, by exploiting CPU extensions like SIMD (single instruction, multiple data), and by enabling multi-core model checking.

Case study       Number of faults   States
Railroad         7                  2,587,933
Height Control   12                 2,186,964
Hemodialysis     9                  294,871

Case study       Hazard                Model checking iterations   Probability
Railroad         Potential Collision   50                          0.12
Height Control   Collision             50                          2.00 × 10⁻⁷
Height Control   False Alarm           50                          0.054
Hemodialysis     Unsuccessful          10                          0.053
Hemodialysis     Contamination         10                          3.5 × 10⁻⁴

Case study       Observed faults   Transitions    Model checking time
Railroad         7                 26,221,633     1 m 07 s
Height Control   1                 249,842,821    9 m 46 s (Collision), 9 m 50 s (False Alarm)
Hemodialysis     9                 3,372,067      1 s (Unsuccessful), 1 s (Contamination)
Railroad         0                 9,100,662      1 s
Height Control   0                 187,511,131    7 m 34 s (Collision), 7 m 43 s (False Alarm)
Hemodialysis     0                 3,372,067      < 1 s (Unsuccessful), < 1 s (Contamination)

Table 6.1.: Evaluation of the model checking efficiency of sparse labeled Markov chains

Case study       Number of faults   Observed faults   States      Choices      Transitions
Railroad         7                  7                 2,587,933   87,832,472   45,210,203
Height Control   12                 0                 exceeded    exceeded     exceeded
Hemodialysis     9                  9                 294,871     6,449,262    3,372,067

Case study       Hazard                Iterations   Min probability   Max probability   Model checking time   Slow-down towards LMC
Railroad         Potential Collision   50           2.24 × 10⁻⁶       0.14              6 m 07 s              5.48 × / 14.68 ×
Height Control   Collision             50           -                 -                 -                     -
Height Control   False Alarm           50           -                 -                 -                     -
Hemodialysis     Unsuccessful          10           0.04              1                 6 s                   ≈ 6 ×
Hemodialysis     Contamination         10           1.98 × 10⁻⁵       4.16 × 10⁻⁴       7 s                   ≈ 7 ×

Table 6.2.: Evaluation of the model checking efficiency of sparse labeled CMDPs


Wizard: Pay no attention to that man behind the curtain!

– The Wizard of Oz (1939)

7. Generating Probabilistic Systems from Executable Models

The topic of this chapter is how to generate labeled Markov chains (LMCs) and labeled choice-aware Markov decision processes (LMDPs) from an executable model. Chapter 4 showed how the languages S# and Lustre can be used to model safety-critical systems as executable models. In contrast to those high-level languages, this chapter introduces formal probabilistic programs, a simplified imperative programming language that represents the more complex executable languages. As a result, formal semantics can be provided without having to dwell on the details of the more complex languages. For executable models based on formal probabilistic programs, this chapter presents both a purely probabilistic semantics (based on LMCs) and a semantics comprising probabilistic and nondeterministic choice (based on LMDPs). The main part of this chapter introduces algorithms that generate an LMC or an LMDP from an executable model by actually executing the model. This stands in contrast to the more common approaches, in which modeling languages are designed to be relatively close to their formal foundation and thus easier to analyze, or in which models are transformed syntactically into an easier-to-analyze modeling language. Again, HiddenExample (page 36) with hiding applied is used as running example to illustrate the algorithms.

Section 7.1 formally introduces formal probabilistic programs and executable models. Then, section 7.2 presents the generic state space traversal algorithm for executable models. Afterwards, section 7.3 and section 7.4 show how labeled Markov chains and labeled choice-aware Markov decision processes are generated, respectively. Section 7.5 shows how the once-operator, which is used to talk about the past, is normalized during the model generation. Then, section 7.6 shows how the algorithms can be used in conjunction with the Lustre programming language. Finally, section 7.7 compares and discusses the approach of this chapter with other approaches.

The approach of this chapter was inspired by the state space traversal technique of LTSmin [KL+15]. This chapter describes the generation of probabilistic systems; the definition and the generation of (qualitative-only) fault-aware Kripke structures, which are used for the DCCA, are described in [HLR16; Hab17]. The basic idea of the labeled Markov chain generation has been published in [LK+17].

7.1. Formalization of Executable Models

In this section, a simplified imperative modeling language is introduced that includes all language features necessary to demonstrate the approach of this thesis. Compared to S#, this modeling language abstracts away all high-level features like classes, methods, and other convenience functions that facilitate modeling. It is also simpler than Lustre, because it has no clock operators and causality does not need to be considered. The behavior of each executable model can still be reduced to the simplified language, because the introduced formal language is Turing-complete. As the main benefit, the simple semantics allows focusing on the essentials of this approach.

Adding probabilistic and nondeterministic functionality to a simplified imperative language [Plo04] has also been done for other languages: the guarded command language has been extended by Gretz et al. [GKM14], and Promela by Baier et al. [BCG04]. In contrast to these approaches, the formal probabilistic programs here add a nondeterministic and a probabilistic variant of the variable assignment. This makes sense, because such an assignment can simply be mapped to a method call in a traditional imperative language (e.g., the Choose-operator of S# on page 35).

The valuations of variables are stored in variable environments σ ∈ Σ ≜ VL ∪ VS → Val. Variables are divided into local variables VL, i.e., variables that are not persisted between macro steps, and state variables VS. A variable environment maps each variable v ∈ VL ∪ VS to a basic value ν ∈ Val, e.g., an integer, a boolean, or a double, which can be serialized into a finite, fixed amount of memory.

This thesis leaves the expression language Expr intentionally underspecified; the only assumption is that the evaluation of e ∈ Expr is side-effect free and that its semantics is given by a function E⟦e⟧ : Σ → Val.

The statements of formal probabilistic programs comprise the usual statements skip, sequential composition, the conditional statement, the while-loop, and the standard variable assignment. Additionally, the language has two variable assignment statements with the overloaded name choose: the parameter of the probabilistic variant is a list of possible outcomes, each accompanied by a probability, whereas the outcomes of the nondeterministic variant are not accompanied by probabilities.

l…: if y == … then
      l…: l := choose((…, true),
                      (…, false));
      l…: y := …;
      l…: if l then
            l…: y := choose(…, …)
          else
            l…: skip
          fi
    else
      l…: skip
    fi

Figure 7.1.: Example of a probabilistic formal program

ρ ∈ Stm ::= skip
          | ρ1 ; ρ2
          | if e then ρ1 else ρ2 fi
          | while e do ρ od
          | v := e
          | v := choose((p1, e1), . . . , (pn, en))
          | v := choose(e1, . . . , en)

For each probabilistic choice, the probabilities have to sum up to 1, i.e., ∑_{i≤n} pi = 1. The parameters of both the probabilistic and the nondeterministic variable assignment need to contain at least one possible outcome.

A formal probabilistic program is given by a statement. As an example of a probabilistic formal program, figure 7.1 shows the manually converted Update-method of the S# model HiddenExample of page 36.

Based on formal probabilistic programs, executable models can be formalized as follows.

Definition 7.1 (Executable Models). An executable model M is represented by the tuple (E, VS, VL, ρE, ρI), which contains

• a finite set of label expressions E ⊆ Expr,
• a finite set of state variables VS,
• a finite set of local variables VL such that VL ∩ VS = ∅,
• a terminating execution program ρE ∈ Stm, and
• a terminating initialization program ρI ∈ Stm.

The execution program is executed in each macro step. The label expressions correspond to the formulas of interest that shall be part of the labeling of the probabilistic system; the state variables represent the variables to be stored during model checking. By convention, E^M stands for M's label expressions, V^M_S for M's state variables, and so on. Given a variable environment σ, the state variable environment σS is the part of the variable environment that talks about the state variables, i.e., σS ∈ VS → Val with σS(v) = σ(v) for all v ∈ VS. The serialized form of a state variable environment is called state vector.

Example 7.2. The S# model HiddenExample with hiding applied is equivalent to the executable model M = (E, VS, VL, ρE, ρI) with

• the label expressions E = {e0, e1} with e0 = y != … || l == true and e1 = y == …,
• the singleton set of state variables VS = {y},
• the singleton set of local variables VL = {l},
• the formal program of figure 7.1 as execution program ρE, and
• the statement skip as initialization program ρI.

7.2. State Space Traversal

The basic strategy to derive a probabilistic system from an executable model by executionis to traverse the state space and collect the states and the transition information en passant.This is shown in the following algorithm that is split into two parts. The first part showshow initial states and their corresponding transitions are added to a probabilistic system.

Algorithm 7.1: executable model traversal (part 1)

1  method ModelTraverser::TraverseModel(M : executable model)
2    returns probabilistic system
3    ps := empty data structure for probabilistic system
4    ti := calculate transition information by executing ρ_I^M
5    for all stateVector ∈ state vectors of all targets in ti do
6      (stateIndex, hasBeenAdded) := stateStorage.AddState(stateVector)
7      replace stateVector by stateIndex in ti
8      if hasBeenAdded then
9        statesToTraverse.Push(stateIndex)
10   add transformed ti to ps

The variable stateStorage is an instance of a state storage, a class that stores and managesstate vectors, with the methods

• AddState: byte[ ] → int × bool, which takes a state vector as argument and has two output parameters: the first returns the index that the state storage assigned to the state vector, and the second (hasBeenAdded) expresses whether the state vector was newly added, i.e., whether the state storage did not contain it before the method was called, and

• RetrieveStateVector: int → byte[ ], which returns the state vector given its index.

Two different indexes must not point to the same state vector in the state storage. Also, the indexes are required to be dense, i.e., the indexes must lie in the range between 0 (inclusive) and the number of elements in the state storage (exclusive). The probabilistic systems are not interested in the concrete state vector, but only in a unique index that represents a state; therefore, the state vector is replaced in the transition information in line 7. What the transition information looks like and how it is added to the probabilistic system depends heavily on the kind of probabilistic system that should be created. Sections 7.3 and 7.4 provide more details on how this can be implemented for LMCs and LMDPs.

Having the indexes of the initial states in the list statesToTraverse, the second part shows how the state space is traversed.

Algorithm 7.2: executable model traversal (part 2)

11   while ¬ statesToTraverse.IsEmpty do
12     var s := statesToTraverse.Pop()
13     var oldStateVector := stateStorage.RetrieveStateVector(s)
14     ti := calculate transition information by executing ρ_E^M on oldStateVector
15     for all stateVector ∈ state vectors of all targets in ti do
16       (stateIndex, hasBeenAdded) := stateStorage.AddState(stateVector)
17       replace stateVector by stateIndex in ti
18       if hasBeenAdded then
19         statesToTraverse.Push(stateIndex)
20     add transformed ti to ps
21   return ps
22 end method

The state space is traversed state by state. At the beginning of the while-loop, a state is taken from the list of states that have not been traversed yet. The data structure of statesToTraverse determines the traversal order. The algorithm can be parallelized, and in such a setting the traversal might be faster if the implementation does not insist on a strict depth-first or breadth-first search order. After selecting the state, its transition information is computed and added in the same way as the initial transition information in the first part of the algorithm. Finally, the loop in line 11 terminates when all states have been encountered. Note that every state is added exactly once to the state storage; therefore, the algorithm processes each state exactly once, and its transition information is added exactly once. The following algorithm gives a very simple implementation of a state storage.


Algorithm 7.3: AddState

1 method StateStorage::AddState(stateVector : byte[])
2   returns (stateIndex : int, hasBeenAdded : bool)
3   if stateVector ∈ vectorToIndex then
4     return (vectorToIndex[stateVector], false)
5   var freshIndex := next free index
6   vectorToIndex[stateVector] := freshIndex
7   indexToVector[freshIndex] := stateVector
8   return (freshIndex, true)
9 end method

StateStorage has two hash maps: vectorToIndex, which maps a state vector to an index, and indexToVector, which maps an index to a state vector. The map indexToVector is necessary for the method RetrieveStateVector. The reference implementation uses a far more complex implementation of a state storage that is optimized for multi-core model checking. The optimized version is based on the state storage presented by Laarman in [Laa14]. The original state storage of Laarman does not guarantee that the indexes assigned by AddState are dense. For this thesis, the algorithm was slightly extended so that it returns dense indexes.
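Algorithms 7.1 to 7.3 can be sketched in Python as a worklist traversal over a dense state storage. The sketch is a simplified, single-threaded illustration under assumptions: state vectors are Python bytes objects, transition information is a list of (probability, label, state vector) triples as used for LMCs in section 7.3, the model is abstracted by two hypothetical callables, initial and execute, that stand in for executing ρ_I and ρ_E, and all names are illustrative. Backing indexToVector by a list makes the dense indexing explicit.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

StateVector = bytes
TransitionInfo = List[Tuple[float, frozenset, StateVector]]  # (probability, label, successor)
IndexedInfo = List[Tuple[float, frozenset, int]]              # successor replaced by its index


class StateStorage:
    """Duplicate-free storage that assigns dense indexes 0, 1, 2, ... (cf. algorithm 7.3)."""

    def __init__(self) -> None:
        self._vector_to_index: Dict[StateVector, int] = {}
        self._index_to_vector: List[StateVector] = []  # a list suffices because indexes are dense

    def add_state(self, vector: StateVector) -> Tuple[int, bool]:
        index = self._vector_to_index.get(vector)
        if index is not None:
            return index, False
        index = len(self._index_to_vector)              # next free index
        self._vector_to_index[vector] = index
        self._index_to_vector.append(vector)
        return index, True

    def retrieve_state_vector(self, index: int) -> StateVector:
        return self._index_to_vector[index]


def traverse(initial: Callable[[], TransitionInfo],
             execute: Callable[[StateVector], TransitionInfo]
             ) -> Tuple[IndexedInfo, Dict[int, IndexedInfo], StateStorage]:
    storage = StateStorage()
    states_to_traverse: Deque[int] = deque()

    def replace_vectors_by_indexes(info: TransitionInfo) -> IndexedInfo:
        indexed = []
        for probability, label, vector in info:
            index, has_been_added = storage.add_state(vector)
            if has_been_added:              # each state is scheduled exactly once
                states_to_traverse.append(index)
            indexed.append((probability, label, index))
        return indexed

    initial_info = replace_vectors_by_indexes(initial())
    transitions: Dict[int, IndexedInfo] = {}
    while states_to_traverse:
        s = states_to_traverse.pop()        # LIFO: depth-first; popleft() gives breadth-first
        old_vector = storage.retrieve_state_vector(s)
        transitions[s] = replace_vectors_by_indexes(execute(old_vector))
    return initial_info, transitions, storage


# Hypothetical model: a one-byte counter modulo 3 that always increments.
initial_info, transitions, storage = traverse(
    lambda: [(1.0, frozenset(), bytes([0]))],
    lambda v: [(1.0, frozenset(), bytes([(v[0] + 1) % 3]))],
)
```

Whether the worklist is used as a stack or as a queue only changes the traversal order; in both cases every state is added to the storage, and therefore processed, exactly once.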

7.3. Generating Labeled Markov Chains from Executable Models

Given an executable model that should be analyzed, the executable model traversal algorithm depends on transition information that must be generated for each state of the model. This section describes what such transition information looks like for labeled Markov chains and how it is created by actually executing the model. But first, the next subsection gives a semantics for executable models that allows their probabilistic interpretation. This underlines the soundness of the approach.

7.3.1. Labeled Markov Chain Semantics of an Executable Model

Before a semantics is provided for complete executable models, a structural operational semantics is provided for their formal probabilistic programs. A structural operational semantics for a programming language describes how to interpret a program stepwise. Given a formal probabilistic program, the idea is to define a Markov chain whose states represent the possible states of the program.

Let si −→ µ denote that µ ∈ Dists(S) and R(si) = µ. Let si −→ sj be a shorthand that denotes si −→ µ with µ(sj) = 1 and µ(s) = 0 for all other states s.


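$$\langle \mathtt{skip}, \sigma \rangle \longrightarrow \langle \mathit{step}, \sigma \rangle \qquad \text{(skip)}$$

$$\frac{\langle \rho_1, \sigma \rangle \longrightarrow \mu_1}{\langle \rho_1 \,;\, \rho_2, \sigma \rangle \longrightarrow \mu} \quad \text{with } \mu(\langle \rho_1' \,;\, \rho_2, \sigma' \rangle) = \mu_1(\langle \rho_1', \sigma' \rangle),\ \mu(\langle \rho_2, \sigma' \rangle) = \mu_1(\langle \mathit{step}, \sigma' \rangle), \text{ and } \mu(\cdot) = 0 \text{ otherwise} \qquad \text{(seq)}$$

$$\langle \mathtt{if}\ e\ \mathtt{then}\ \rho_1\ \mathtt{else}\ \rho_2\ \mathtt{fi}, \sigma \rangle \longrightarrow \langle \rho_1, \sigma \rangle, \quad \text{if } \mathcal{E}[\![e]\!]\sigma = \mathit{true} \qquad \text{(if true)}$$

$$\langle \mathtt{if}\ e\ \mathtt{then}\ \rho_1\ \mathtt{else}\ \rho_2\ \mathtt{fi}, \sigma \rangle \longrightarrow \langle \rho_2, \sigma \rangle, \quad \text{if } \mathcal{E}[\![e]\!]\sigma = \mathit{false} \qquad \text{(if false)}$$

$$\langle \mathtt{while}\ e\ \mathtt{do}\ \rho\ \mathtt{od}, \sigma \rangle \longrightarrow \langle \rho \,;\, \mathtt{while}\ e\ \mathtt{do}\ \rho\ \mathtt{od}, \sigma \rangle, \quad \text{if } \mathcal{E}[\![e]\!]\sigma = \mathit{true} \qquad \text{(while true)}$$

$$\langle \mathtt{while}\ e\ \mathtt{do}\ \rho\ \mathtt{od}, \sigma \rangle \longrightarrow \langle \mathit{step}, \sigma \rangle, \quad \text{if } \mathcal{E}[\![e]\!]\sigma = \mathit{false} \qquad \text{(while false)}$$

$$\langle v := e, \sigma \rangle \longrightarrow \langle \mathit{step}, \sigma[v \mapsto \mathcal{E}[\![e]\!]\sigma] \rangle \qquad \text{(assign)}$$

$$\langle v := \mathtt{choose}((p_1, e_1), \ldots, (p_n, e_n)), \sigma \rangle \longrightarrow \mu \quad \text{with } \mu(\langle \mathit{step}, \sigma[v \mapsto \mathcal{E}[\![e_i]\!]\sigma] \rangle) = p_i,\ 1 \le i \le n, \text{ and } \mu(s) = 0 \text{ otherwise} \qquad \text{(por)}$$

Figure 7.2.: Structural operational semantics defining the inference rules that are used by the purely probabilistic semantics of executable models.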

Figure 7.2 provides the inference rules for the structural operational semantics of formal probabilistic programs. The usual definitions are adapted to fit into the quantitative setting. Consider the sequential composition (seq): assume that a distribution µ1 for state ⟨ρ1, σ⟩ exists. By applying (seq), the distribution µ of state ⟨ρ1 ; ρ2, σ⟩ can be inferred. Indeed, when ρ1 is the first statement in a sequence of statements and executing ρ1 on its own in a certain variable environment results in a distribution of variable environments, the subsequent statement of the sequence can be executed in each of the resulting variable environments. By applying the rule (por) in a state where a probabilistic assignment is made, a distribution over the states with the corresponding outcomes emerges. In the purely probabilistic setting, the statement for nondeterministic choice is interpreted as a uniform distribution over its possible outcomes; hence, v := choose(e1, . . . , en) is syntactic sugar for v := choose((1/n, e1), . . . , (1/n, en)).

How to apply the rules is illustrated by figure 7.3, where the inference rules given in figure 7.2 are applied to the running example of figure 7.1 on page 107. Each node represents a statement with a variable environment. To keep the node labeling concise, the figure uses the reference numbers of the statements and abbreviates false to f and true to t. The distributions are depicted by (distribution) arrows with solid heads. When a distribution has only one target state with probability 1, the solid arrow ends directly in the node representing that state. Otherwise, a set of (probability) arrows with empty heads starts at the head of the distribution arrow and ends in the target states. Distribution arrows are numbered and contain the name of the inference rule by which they are justified and, if required, the arrow number of the premise's distribution.

Figure 7.3.: Inference rules applied to the formal probabilistic program of figure 7.1

The idea is to define the states of the Markov chain semantics as S = (Stm ∪ {step}) × Σ. These states can be used as the foundation of the inference rules for all formal probabilistic programs. But these states are not suitable to define a Markov chain, because the set of states in a Markov chain must be countable and Stm is uncountable (e.g., because a real number may be assigned to a variable). Thus, only the states relevant for a specific terminating formal probabilistic program ρ shall be regarded. Therefore, the set of variable environments Σ is assumed to be finite by only considering finite variable domains, and only the subset Stmρ ⊂ Stm of partial programs relevant for ρ is regarded in the definition of S. Stmρ is defined as the set of all partial programs in ρ, extended by the unwound loop ρ′ ; while e do ρ′ od for each loop while e do ρ′ od in ρ. With all these restrictions, S is not only countable but also finite.

Definition 7.3 (Markov Chain Semantics of a formal probabilistic program). The Markov chain semantics of a formal probabilistic program ρ with initial variable environment σ0, denoted M⟦ρ⟧σ0, is defined as the standard Markov chain (Sρ, Rρ, µρ_init) where

• Sρ = (Stmρ ∪ {step}) × Σ,
• Rρ(⟨ρ′, σ⟩) = µ ⇐⇒ ⟨ρ′, σ⟩ −→ µ, and Rρ(⟨step, σ⟩)(⟨step, σ⟩) = 1 and 0 otherwise, and
• µρ_init(⟨ρ, σ0⟩) = 1 and 0 otherwise.

Based on the Markov chains induced by formal probabilistic programs representing single macro steps, the semantics of a complete executable model M = (E, VS, VL, ρE, ρI) is defined. To keep the notation succinct, let Σ(Λ, σS) denote those variable environments in which exactly the expressions e ∈ Λ ⊆ E evaluate to true and the state variables coincide with σS. More formally,

$$\Sigma(\Lambda, \sigma_S) = \{\sigma_L \cup \sigma_S \mid \forall e \in E.\ e \in \Lambda \leftrightarrow \mathcal{E}[\![e]\!](\sigma_L \cup \sigma_S) = \mathit{true}\}.$$

Let $\iota_L : V_L \to \mathit{Val}$ be the local variable environment that assigns to every local variable its initial default value, and let $\iota^L_S \in \Sigma$ be the variable environment that assigns every variable its initial default value. For adding the initialization of local variables to a state variable environment σS, let $\sigma^L_S = \iota_L \cup \sigma_S$.



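Definition 7.4 (Labeled Markov chain semantics of an executable model). The $2^E$-labeled Markov chain $M = (S_M, R_M, \mu^M_{init})$ induced by an executable model $M = (E, V_S, V_L, \rho_E, \rho_I)$ is defined as
• $S_M = V_S \to \mathit{Val}$,
• $R_M(\sigma_S)(\Lambda', \sigma'_S) = \sum_{\sigma' \in \Sigma(\Lambda', \sigma'_S)} \Pr_{\mathcal{M}[\![\rho_E]\!]\sigma^L_S}(\mathrm{F}\,\langle \mathit{step}, \sigma'\rangle)$, and
• $\mu^M_{init}(\Lambda', \sigma'_S) = \sum_{\sigma' \in \Sigma(\Lambda', \sigma'_S)} \Pr_{\mathcal{M}[\![\rho_I]\!]\iota^L_S}(\mathrm{F}\,\langle \mathit{step}, \sigma'\rangle)$.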

Note that $R_M$ and $\mu^M_{init}$ are only well-defined because ρE and ρI must terminate in valid executable models (see definition 7.1); if a formal program looped infinitely, the probability distributions would not sum to 1. Also note that the local variable environment produced in a macro step is not recorded in the states of the labeled Markov chain, but it contributes to the evaluation of the expressions; removing local variables from the state space is the primary source of the efficiency of the approach described in this thesis.
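To make definition 7.4 concrete, the following Python sketch computes $R_M(\sigma_S)(\Lambda', \sigma'_S)$ for the source state y = 0 of the running example (example 7.2). It assumes that the micro-step chain $\mathcal{M}[\![\rho_E]\!]\sigma^L_S$ has already been condensed to the choice and final nodes listed in section 7.3.2 (choice nodes 0 and 1, final nodes 2 to 4), with branch probabilities and labels taken from figure 7.4; the dictionary encoding and the function names are hypothetical. Because the chain is acyclic, Pr(F ⟨step, σ′⟩) can be obtained by propagating probability mass forward in topological order, and the sum over Σ(Λ′, σ′S) becomes a projection of each final environment onto its label and its state variables, which drops the local variable l.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical encoding of the condensed micro-step chain of the running example.
# Each choice node maps to a list of (probability, successor) pairs.
SUCCESSORS = {
    0: [(Fraction(6, 10), 1), (Fraction(4, 10), 2)],  # l := choose((0.6, true), (0.4, false))
    1: [(Fraction(1, 2), 3), (Fraction(1, 2), 4)],    # y := choose(2, 3), assumed uniform
}
# Final nodes <step, sigma'>: full environment (state + local) and label set (figure 7.4).
FINAL = {
    2: ({"y": 1, "l": False}, frozenset({"e0", "e1"})),
    3: ({"y": 2, "l": True}, frozenset({"e0"})),
    4: ({"y": 3, "l": True}, frozenset({"e0"})),
}
STATE_VARS = ("y",)  # V_S; the local variable l is projected away


def reach_probabilities(initial):
    """Pr(F node) for every node of the acyclic chain, by forward propagation."""
    prob = defaultdict(Fraction)
    prob[initial] = Fraction(1)
    # The node numbers happen to form a topological order here.
    for node in sorted(set(SUCCESSORS) | set(FINAL)):
        for p, succ in SUCCESSORS.get(node, []):
            prob[succ] += prob[node] * p
    return prob


def macro_step_distribution(initial=0):
    """Sum the reach probabilities of all final environments that share the
    same label set and the same state-variable part (definition 7.4)."""
    prob = reach_probabilities(initial)
    dist = defaultdict(Fraction)
    for node, (env, label) in FINAL.items():
        state = tuple(env[v] for v in STATE_VARS)
        dist[(label, state)] += prob[node]
    return dict(dist)
```

Calling macro_step_distribution() yields 3/10 for ({e0}, y = 2) and for ({e0}, y = 3) and 2/5 for ({e0, e1}, y = 1), in line with figure 7.4. Exact fractions avoid rounding noise. In this small example no two final environments collapse onto the same pair; they would if two traces ended with equal y and equal labels but different values of l.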

7.3.2. Generating the Transition Information of a State

This subsection shows what the transition information of a state looks like and how it is generated for a labeled Markov chain. To demonstrate the idea of the algorithm, the executable model of example 7.2 on page 108 is used. As the source state whose transition information is to be calculated, the state with σS(y) = 0 is used. Let v = ν denote the (serialized) state vector of the variable environment σS(v) = ν; e.g., y = 0 is the state vector of σS(y) = 0.

The following algorithm provides the frame for calculating the transition information, in this case an array of (probability × label × stateVector).

Algorithm 7.4: Calculation of transition information

1 method LmcTraverser::CalculateTransitionInformation(ρ, oldStateVector)
2   returns transition information
3   var newTransitions : (probability × label × stateVector)[*] := [ ]
4   while choiceResolver.NextChoices() do
5     restore the variable environment by deserializing oldStateVector
6     execute ρ
7     var prob := choiceResolver.GetProbabilityOfTrace()
8     var label := {e ∈ E | variable environment satisfies e}
9     var newStateVector := serialize current variable environment
10    newTransitions += (prob, label, newStateVector)
11  return newTransitions
12 end method

The choiceResolver determines the outcomes of each choice in the next execution of the probabilistic program ρ. Each time its method NextChoices is called, the choiceResolver selects another combination of outcomes if there is an untreated combination left; otherwise, NextChoices returns false and the calculation of the transition information finishes. How NextChoices works is addressed later in this section. Every call of ρ uses another combination of choice outcomes, but every call must start with the state variable environment of oldStateVector; hence, the state environment is deserialized before ρ is called. After the execution has finished, a new transition is created with the probability of the micro step trace (which is determined by the probabilities of the choices), the label of the transition, and the state vector of the successor state.

newTransitions
index | probability | label | stateVector
0 | 0.3 | {e0} | y = 2
1 | 0.3 | {e0} | y = 3
2 | 0.4 | {e0, e1} | y = 1

Figure 7.4.: Example of transition information obtained during the generation of a labeled Markov chain

Figure 7.4 shows the result of CalculateTransitionInformation applied to the execution program of the running example with state vector y = 0. Pseudo code of the choose-operator and the NextChoices-method is provided later by algorithm 7.6 (page 118) and algorithm 7.5 (page 117), respectively. But first, the procedure is illustrated on the running example to provide an intuition of the algorithms. This result is computed in four iterations, as explained below. For the explanation, the graph of figure 7.3 is condensed such that it only contains the nodes of the probabilistic choices. Compare figure 7.5a with figure 7.3:

• choice node 0 represents micro state (l2;l3;l4, f, 0),
• choice node 1 represents micro state (l5, t, 1),
• final node 2 represents micro state (step, f, 1),
• final node 3 represents micro state (step, t, 2), and
• final node 4 represents micro state (step, t, 3).

Each condensed graph represents a different point in time during the calculation of the traversal information. The solid nodes and arcs in the condensed graphs express that the choice resolver is aware of the corresponding choices, in contrast to the dashed nodes and arcs. For the calculation, a choice stack is used, which is an array of (probability × currentOption × numberOfOptions). The first iteration of the while loop of algorithm 7.4 proceeds as follows (illustrated in figure 7.5):
1. After the deserialization, the variable environment at the start of the iteration is σ(y) = 0 and σ(l) = false. The choice stack is empty.
2. As the first instruction, the conditional statement if y==0 then … else … fi is executed. The condition is satisfied, so the then-branch gets executed.

3. The first choice l:=choose((0.6, true), (0.4, false)) gets executed. This choice has not been encountered before, so its first option is selected. The value of l is updated to true. The probability of the selected option, the number of the selected option, and the total number of options are saved on the choice stack [(0.6, 0, 2)].

(a) Step 1 (b) Step 3 (c) Step 6
Figure 7.5.: Calculation of the traversal information of the running example (Iteration 1)

4. The statement y:=1 gets executed; therefore, y is updated to 1.
5. The conditional statement if l then … else … fi is executed; the then-branch is taken.
6. The next choice y:=choose(2, 3) is encountered. This choice has not been encountered before either, so its first option is selected. The value of y is updated to 2. Because no probability was provided for this choice and the current analysis is purely probabilistic, a uniform distribution is assumed (probability 0.5). The probability multiplied with the previous probability entry in the choice stack (0.5 · 0.6 = 0.3), the number of the selected option, and the total number of options are appended to the choice stack [(0.6, 0, 2), (0.3, 0, 2)].
7. The code execution has finished. The variable environment is σ(y) = 2 and σ(l) = true. Therefore, the expression e0: y!= || l==true is satisfied, but not the expression e1: y==1. The probability on top of the choice stack is the final probability of the micro step trace. A new transition is saved with the probability 0.3, the label {e0}, and the state vector y=2. The local variable is discarded.

(a) Step 1 (b) Step 2 (c) Step 3


The second iteration (figure 7.6):
1. After the deserialization, the variable environment is reset to σ(y) = 0 and σ(l) = false. The deepest currentOption of the choice stack is incremented to predetermine the chosen options for the ongoing iteration. The resulting choice stack is [(0.6, 0, 2), (0.3, 1, 2)]. In figure 7.6, the unfilled circles at the origins of the edges indicate the predetermined choices.

2. When the first choice l:=choose((0.6, true), (0.4, false)) is encountered, the predetermined option 0 on the choice stack is selected. The value of l is updated to true. Because a predetermined value was selected, the choice stack is not updated.
3. When the second choice y:=choose(2, 3) is encountered, the predetermined option 1 on the choice stack is selected. The value of y is updated to 3. Because a predetermined value was selected, no new entry is pushed on the choice stack; however, since this is the last entry, its probability is updated to the probability of the choice (0.5) multiplied with the previous probability entry in the choice stack (0.6).
4. The code execution has finished. The variable environment is σ(y) = 3 and σ(l) = true. In this case, too, only expression e0 is satisfied. A new transition is saved with the probability 0.3, the label {e0}, and the state vector y=3. The local variable is discarded.

(a) Step 1 (b) Step 2


The third iteration (figure 7.7):
1. After the deserialization, the variable environment is reset to σ(y) = 0 and σ(l) = false. Note that after incrementing the deepest currentOption of the choice stack, the choice stack is equal to [(0.6, 0, 2), (0.3, 2, 2)]. Every option of the deepest choice in the choice stack has been enumerated: the currentOption of the deepest choice stack entry is equal to its numberOfOptions, which can easily be detected. Therefore, the deepest entry is popped from the stack and the next deepest currentOption is incremented. The resulting choice stack is [(0.6, 1, 2)].

2. When the first choice l:=choose((0.6, true), (0.4, false)) is encountered, the predetermined option 1 on the choice stack is selected. The value of l is updated to false. Because a predetermined value was selected, no new entry is pushed on the choice stack. The probability of the last entry is updated to the probability of the choice (0.4). There is no previous entry on the choice stack; therefore, there is no need to multiply the probability with a previous probability entry. To make the implementation easier, the previous probability entry of the empty choice stack is defined as 1.

3. The code execution has finished. The variable environment is σ(y) = 1 and σ(l) = false. In this case, both expressions e0 and e1 are satisfied. A new transition is saved with the probability 0.4, the label {e0, e1}, and the state vector y=1. The local variable is discarded.

(a) Step 1


The fourth iteration (figure 7.8):
1. After incrementing the deepest currentOption of the choice stack, the choice stack is equal to [(0.4, 2, 2)]. Therefore, the deepest entry is popped from the stack.
2. The choice stack is empty and it is not the first iteration. Hence, all combinations of possible outcomes have been encountered, and the calculation of the transitions is finished.
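The four iterations above can be replayed with the following Python sketch of algorithms 7.4, 7.5, and 7.6. It rests on several assumptions: the execution program is a hypothetical reconstruction of the part of the running example that is reachable from y = 0 (the inner else-branch is assumed to leave y unchanged, which agrees with figure 7.4); only the label expression e1 (y==1) is evaluated, because the constant in e0 is not recoverable here; a fresh resolver per source state stands in for "first execution of NextChoices for a given stateVector"; and the names Entry, ChoiceResolver, and execution_program are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    probability: float      # probability of the trace prefix up to and including this choice
    current_option: int
    number_of_options: int


class ChoiceResolver:
    """Sketch of NextChoices (algorithm 7.5) and choose (algorithm 7.6)."""

    def __init__(self):
        self.choice_stack = []
        self.choice_depth = -1
        self.first_call = True

    def next_choices(self):
        self.choice_depth = -1
        if self.first_call:
            self.first_call = False
            return True
        while self.choice_stack:
            last = self.choice_stack.pop()
            if last.number_of_options > last.current_option + 1:
                last.current_option += 1
                self.choice_stack.append(last)
                return True
        return False

    def choose(self, *options):
        """options are (probability, value) pairs."""
        previous = (self.choice_stack[self.choice_depth].probability
                    if self.choice_depth >= 0 else 1.0)
        self.choice_depth += 1
        size = len(self.choice_stack)
        if self.choice_depth < size:                      # predetermined option
            idx = self.choice_stack[self.choice_depth].current_option
            if self.choice_depth == size - 1:             # deepest entry: refresh probability
                self.choice_stack[self.choice_depth].probability = options[idx][0] * previous
            return options[idx][1]
        # Choice encountered for the first time: take option 0 and remember it.
        self.choice_stack.append(Entry(options[0][0] * previous, 0, len(options)))
        return options[0][1]

    def probability_of_trace(self):
        return self.choice_stack[-1].probability if self.choice_stack else 1.0


def execution_program(env, resolver):
    """Hypothetical reconstruction of the running example (reachable part from y = 0)."""
    env["l"] = False                                      # local variable, default value
    if env["y"] == 0:
        env["l"] = resolver.choose((0.6, True), (0.4, False))
        env["y"] = 1
        if env["l"]:
            env["y"] = resolver.choose((0.5, 2), (0.5, 3))  # no probabilities given: uniform
    return env


def calculate_transition_information(old_state):
    resolver = ChoiceResolver()
    new_transitions = []
    while resolver.next_choices():
        env = execution_program(dict(old_state), resolver)  # always restart from oldStateVector
        label = frozenset({"e1"}) if env["y"] == 1 else frozenset()  # e0 omitted
        # Only the state variable y is kept; the local variable l is discarded.
        new_transitions.append((resolver.probability_of_trace(), label, {"y": env["y"]}))
    return new_transitions
```

calculate_transition_information({"y": 0}) returns three transitions with probabilities 0.3, 0.3, and 0.4 (up to floating-point rounding) and state vectors y = 2, y = 3, and y = 1, in the order of figure 7.4. Note how the program is always re-executed from the beginning and never interrupted; the resolver only steers the outcomes of choose.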

Pseudo code of the choose-operator and the NextChoices-method is provided by algorithm 7.6 and algorithm 7.5, respectively.

Algorithm 7.5: NextChoices

1 method ChoiceResolver::NextChoices( )
2   returns true if there is still another untreated combination of choice outcomes, otherwise false
3   choiceDepth := -1;
4   if first execution of NextChoices for a given stateVector then
5     return true;
6   while choiceStack not empty do
7     var lastEntry := choiceStack.Pop();
8     if lastEntry.numberOfOptions > lastEntry.currentOption + 1 then
9       lastEntry.currentOption := lastEntry.currentOption + 1
10      choiceStack.Push(lastEntry)
11      return true
12  return false
13 end method



Algorithm 7.6: Executing a probabilistic choose

1 method ChoiceResolver::choose( (p0,e0) . . . (pn,en) )
2   returns an outcome of the input options
3   var previousProbabilityEntry := choiceStack[choiceDepth].probability, or 1.0 if non-existent
4   choiceDepth := choiceDepth + 1
5   var choiceStackSize := number of elements in choiceStack
6   if choiceDepth < choiceStackSize then
7     var idx := choiceStack[choiceDepth].currentOption
8     if choiceDepth = choiceStackSize - 1 then
9       choiceStack[choiceDepth].probability := p_idx · previousProbabilityEntry
10    return e_idx
11  var valueCount := n + 1
12  choiceStack.Push( (p0 · previousProbabilityEntry, 0, valueCount) )
13  return e0
14 end method

The algorithm for NextChoices holds no surprises. The probability of a choice is saved on the choice stack to reduce the number of floating point multiplications, which makes the calculation of the transition information a bit faster. For this reason, the algorithm of choose uses the probability of its predecessor on the choice stack to calculate the probability of a new entry (line 3); if it is the first choice, a value of 1.0 is assumed to avoid a later case distinction. There are three interesting cases to consider:

• if there is no entry on the choice stack for the current choice, the choice is encountered for the first time, and a new entry for the choice is pushed on the stack (lines 11-13),

• if there is an entry on the stack that is not the last entry, the preselected option on the stack is selected and returned (lines 7 and 10), and

• if there is an entry on the stack that is the last entry, the probability on the stack gets updated, and the selected value is returned (lines 7-10).

Updating the probability in the second case would have no effect, because the option of a non-deepest entry has not changed since its probability was last computed. Omitting it saves some floating point multiplications.

After the calculation of the transition information of a state has finished, all transitions that have the same label and state vector are merged by summing up their probabilities. In practice, it was faster to sort the transitions using an algorithm in O(n log n) before merging them, which avoids the quadratic complexity that arises when, for each transition, the full array has to be iterated to find a merging partner. This is quite trivial and not shown in this thesis; the interested reader is referred to the reference implementation [ISSE18]. After merging the transitions, the traversal algorithm replaces the state vectors in the transition information with the index of the state in the final sparse data structure (algorithm 7.1 on page 108). Finally, the algorithm appends the traversal information to the data structure of the probabilistic system.
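The sort-then-merge step can be sketched in Python as follows. This is not the reference implementation [ISSE18]; it assumes that labels are frozensets of expression names and that state vectors are serialized as bytes, and merge_transitions is a hypothetical name. Sorting by (label, state vector) places all merge partners next to each other, so itertools.groupby can sum their probabilities in a single linear pass, giving O(n log n) overall instead of a quadratic pairwise search.

```python
from itertools import groupby


def merge_transitions(transitions):
    """transitions: list of (probability, label, state_vector) triples, where
    label is a frozenset of expression names and state_vector is bytes."""
    def key(transition):
        _, label, state_vector = transition
        # frozensets are not totally ordered, so compare their sorted contents
        return (tuple(sorted(label)), state_vector)

    merged = []
    for (label, state_vector), group in groupby(sorted(transitions, key=key), key=key):
        merged.append((sum(p for p, _, _ in group), frozenset(label), state_vector))
    return merged


example = [
    (0.125, frozenset({"e0"}), b"\x02"),
    (0.5, frozenset({"e0", "e1"}), b"\x01"),
    (0.375, frozenset({"e0"}), b"\x02"),
]
print(merge_transitions(example))
```

Here the two transitions with label {e0} and state vector b"\x02" are merged into a single transition with probability 0.5, while the transition with label {e0, e1} is kept unchanged.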




Number of Faults | 7 | 12 | 9
Observed Faults | 7 | 1 | 9
States | 2,587,933 | 2,186,964 | 294,871
Transitions | 26,221,633 | 249,842,821 | 3,372,067
Traversal time | 27 s | 50 m 42 s | 7 s

Hazard | Potential Collision | Collision | False Alarm | Unsuccessful | Contamination
Probability | 0.12 | 2.00 × 10⁻⁷ | 0.054 | 0.052 | 2.4 × 10⁻⁴
Model Checking time | 1 m 07 s | 9 m 46 s | 9 m 50 s | < 1 s | < 1 s

Table 7.1.: Evaluation of the generation time of sparse labeled Markov chains when all faults and hazards are observable

Number of Faults | 7 | 12 | 9
Observed Faults | 0 | 0 | 0
States | 2,587,933 | 2,186,964 | 294,871
Transitions | 9,100,662 | 187,511,131 | 3,372,067
Traversal time | 22 s | 49 m 59 s | 5 s

Probability | 0.12 | 2.00 × 10⁻⁷ | 0.054 | 0.052 | 2.4 × 10⁻⁴
Model Checking time | 25 s | 7 m 34 s | 7 m 43 s | < 1 s | < 1 s

Table 7.2.: Evaluation of the generation time of sparse labeled Markov chains when only hazards are observable

The labeled Markov chain of figure 6.1 on page 94 is the result of the state space traversal of this section.

An imperative host language like S# can directly call the choice method, and the execution program does not need to be interrupted while it is executed. As a result, no hacks or specialized reimplementation of the virtual machine that executes the code are necessary. After the execution program finishes, the state can be serialized. Using this principle, there is no need to save non-state variables at any time, because each execution begins at the start of the program. In contrast, an alternative approach that interrupts the control flow needs to save all variables and also the program counter. The chosen approach recalculates intermediate steps multiple times but saves a lot of serialization and deserialization effort. The evaluation in the next subsection shows that the approach works quite well in practice.



7.3.3. Evaluation

Table 7.1 and table 7.2 show the evaluation on the case studies. The number of formulas in the labels had little impact on the generation time (27 s versus 22 s, 50 m 42 s versus 49 m 59 s, and 7 s versus 5 s). The generation of the probabilistic system for the Height Control case study performed especially poorly, taking about 50 minutes. For this reason, chapter 8 provides some optimizations that accelerate the generation.

7.4. Generating Labeled Choice-aware Markov Decision Processes from Executable Models

This section describes what the transition information looks like for labeled choice-aware Markov decision processes and how it is created by actually executing the model. But first, the next subsection gives a semantics comprising both nondeterministic and probabilistic choice for executable models.

7.4.1. Choice-aware Markov decision process semantics of an executable model

Before a semantics is provided for complete executable models, a structural operational semantics is provided for their formal probabilistic programs. Given a formal probabilistic program, the idea is to define a set of choices (in the CMDP sense, see definition 5.16 on page 73), where each choice represents a possible state of the program, together with an appropriate choice transition function for these choices.

Let $c_i \longrightarrow \mu$ denote that $\mu \in \mathit{Dists}(C)$ and $\mathit{succ}(c_i) = \mu$. Let $c_i \longrightarrow c_j$ denote that $\mathit{succ}(c_i) \in 2^C$ with $c_j \in \mathit{succ}(c_i)$. Figure 7.9 provides the inference rules for the structural operational semantics of formal probabilistic programs. They are very similar to those of figure 7.2, with small differences: the statement v := choose(e1, . . . , en) is no longer syntactic sugar but a real part of the language, so a rule has to be introduced for it. Furthermore, the sequential composition no longer has to treat only distributions, but also nondeterministic choices; for this reason, the rules (seq1) and (seq2) are introduced. Also, all deterministic statements are implemented as nondeterministic choices with only one successor. In contrast, figure 7.2 defines the inference rules of the deterministic statements by a probability distribution in which the only successor has a probability of 1. The inference rules here could also be defined using such a probability distribution for those statements; the preferred way is only a matter of taste.

$$\langle \mathit{skip}, \sigma\rangle \longrightarrow \langle \mathit{step}, \sigma\rangle \quad (\mathit{skip})$$

$$\frac{\langle \rho_1, \sigma\rangle \longrightarrow \langle \rho'_1, \sigma'\rangle}{\langle \rho_1 ; \rho_2, \sigma\rangle \longrightarrow \langle \rho'_1 ; \rho_2, \sigma'\rangle} \quad (\mathit{seq1}) \qquad \frac{\langle \rho_1, \sigma\rangle \longrightarrow \langle \mathit{step}, \sigma'\rangle}{\langle \rho_1 ; \rho_2, \sigma\rangle \longrightarrow \langle \rho_2, \sigma'\rangle} \quad (\mathit{seq2})$$

$$\frac{\langle \rho_1, \sigma\rangle \longrightarrow \mu_1}{\langle \rho_1 ; \rho_2, \sigma\rangle \longrightarrow \mu} \quad \text{with } \mu(\langle \rho'_1 ; \rho_2, \sigma'\rangle) = \mu_1(\langle \rho'_1, \sigma'\rangle),\ \mu(\langle \rho_2, \sigma'\rangle) = \mu_1(\langle \mathit{step}, \sigma'\rangle), \text{ and } \mu(\cdot) = 0 \text{ otherwise} \quad (\mathit{seqp})$$

$$\langle \mathbf{if}\ e\ \mathbf{then}\ \rho_1\ \mathbf{else}\ \rho_2\ \mathbf{fi}, \sigma\rangle \longrightarrow \langle \rho_1, \sigma\rangle, \text{ if } \mathcal{E}[\![e]\!]\sigma = \mathit{true} \quad (\mathit{if\ true})$$

$$\langle \mathbf{if}\ e\ \mathbf{then}\ \rho_1\ \mathbf{else}\ \rho_2\ \mathbf{fi}, \sigma\rangle \longrightarrow \langle \rho_2, \sigma\rangle, \text{ if } \mathcal{E}[\![e]\!]\sigma = \mathit{false} \quad (\mathit{if\ false})$$

$$\langle \mathbf{while}\ e\ \mathbf{do}\ \rho\ \mathbf{od}, \sigma\rangle \longrightarrow \langle \rho ; \mathbf{while}\ e\ \mathbf{do}\ \rho\ \mathbf{od}, \sigma\rangle, \text{ if } \mathcal{E}[\![e]\!]\sigma = \mathit{true} \quad (\mathit{while\ true})$$

$$\langle \mathbf{while}\ e\ \mathbf{do}\ \rho\ \mathbf{od}, \sigma\rangle \longrightarrow \langle \mathit{step}, \sigma\rangle, \text{ if } \mathcal{E}[\![e]\!]\sigma = \mathit{false} \quad (\mathit{while\ false})$$

$$\langle v := e, \sigma\rangle \longrightarrow \langle \mathit{step}, \sigma[v \mapsto \mathcal{E}[\![e]\!]\sigma]\rangle \quad (\mathit{assign})$$

$$\langle v := \mathit{choose}((p_1, e_1), \ldots, (p_n, e_n)), \sigma\rangle \longrightarrow \mu \quad \text{with } \mu(\langle \mathit{step}, \sigma[v \mapsto \mathcal{E}[\![e_i]\!]\sigma]\rangle) = p_i,\ 1 \le i \le n, \text{ and } \mu(c) = 0 \text{ otherwise} \quad (\mathit{por})$$

$$\langle v := \mathit{choose}(e_1, \ldots, e_n), \sigma\rangle \longrightarrow \langle \mathit{step}, \sigma[v \mapsto \mathcal{E}[\![e_i]\!]\sigma]\rangle \text{ with } 1 \le i \le n \quad (\mathit{or})$$

Figure 7.9.: Structural operational semantics defining the inference rules that are used by the semantics of executable models that comprises both probabilistic and nondeterministic choice.

How to apply the rules is illustrated in figure 7.10, where the inference rules given in figure 7.9 are applied to the running example of figure 7.1 on page 107. The distributions are depicted by (distribution) arrows with solid heads into an intermediate point-shaped node, from which a set of (probability) arrows with empty heads start at the head of the distribution arrow and end in the target state. Nondeterministic transitions are depicted by solid arrows that end directly in the nodes representing the successors. The non-probability arrows are numbered and labeled with the name of the inference rule by which they are justified and, if required, the arrow number of the premise's distribution.

If the alternative definition using a probability distribution for deterministic statements were used, rule (seqp) would have to be applied instead of rule (seq2) in the inference example in figure 7.10; therefore, the rule (seq2) would be redundant in that example. The rules (seq1) and (seq2) are not redundant in general, because there are other programs in which these rules are necessary, e.g., l:= ; y:=choose( , ).

Note that the inference rules provide for each choice of the form ⟨ρ, σ⟩ either exactly one distribution or a nondeterministic set of successors. ⟨ρ, σ⟩ is called probabilistic in the former case and nondeterministic in the latter. Let $\iota_L$, $\iota^L_S$, and $\sigma^L_S$ be defined as in section 7.3.1. The choices $\mathcal{C}_M = (\mathit{Stm}_{\rho_E} \cup \mathit{Stm}_{\rho_I} \cup \{\mathit{step}\}) \times \Sigma$ and the inference rules provide the foundation of the formal semantics of executable models.

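Definition 7.5 (Labeled choice-aware Markov decision process semantics of an executable model). The $2^E$-labeled choice-aware Markov decision process $M = (S_M, \mathcal{C}_M, \mathit{succ}_M, C_M, c^M_{init})$ induced by an executable model $M = (E, V_S, V_L, \rho_E, \rho_I)$ is defined as
• $S_M = V_S \to \mathit{Val}$,
• $\mathcal{C}_M = (\mathit{Stm}_{\rho_E} \cup \mathit{Stm}_{\rho_I} \cup \{\mathit{step}\}) \times \Sigma$,
• $\mathit{succ}_M(\langle \rho, \sigma\rangle) = \mu \iff \langle \rho, \sigma\rangle \longrightarrow \mu$ if ⟨ρ, σ⟩ is probabilistic,
• $\mathit{succ}_M(\langle \rho, \sigma\rangle) = \{\langle \rho', \sigma'\rangle \mid \langle \rho, \sigma\rangle \longrightarrow \langle \rho', \sigma'\rangle\}$ if ⟨ρ, σ⟩ is nondeterministic,
• $\mathit{succ}_M(\langle \mathit{step}, \sigma\rangle) = (\Lambda, \sigma_S)$ with $\Lambda = \{e \in E \mid \mathcal{E}[\![e]\!](\sigma) = \mathit{true}\}$,
• $C_M(\sigma_S) = (\rho_E, \sigma^L_S)$, and
• $c^M_{init} = (\rho_I, \iota^L_S)$.

Figure 7.10.: Inference rules applied to the formal probabilistic program of figure 7.1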

Note that succM is acyclic because $\mathcal{C}_M$ is finite and both ρE and ρI must terminate in valid executable models (see definition 7.1). Also note that the local variable environment produced in a macro step is not recorded in the states of the choice-aware Markov decision process, similar to the labeled Markov chain semantics of the previous section.

7.4.2. Generating the Transition Information of a State

This subsection shows what the transition information of a state looks like and how it is computed when generating a labeled choice-aware Markov decision process.

The following algorithm provides the frame for calculating the transition information, in this case an array of (type × probability × from × to) for the new choices and an array of (label × stateVector) for the targets.

Algorithm 7.7: Calculation of transition information

1 method LmdpTraverser::CalculateTransitionInformation(ρ, oldStateVector)
2   returns transition information
3   var newChoices : (type × probability × from × to)[*] := [ ]
4   var newTargets : (label × stateVector)[*] := [ ]
5   while choiceResolver.NextChoices() do
6     restore the variable environment by deserializing oldStateVector
7     execute ρ
8     var label := {e ∈ E | variable environment satisfies e}
9     var newStateVector := serialize current variable environment
10    newTargets += (label, newStateVector)
11  return (newChoices, newTargets)
12 end method

The algorithm works similarly to algorithm 7.4 of the labeled Markov chain case; the main differences are the calculated outputs and the use of a choice resolver that can handle the nondeterministic case. How the modified NextChoices works is addressed later in this section. For the calculation, a choice stack is used, which is an array of (currentOption × numberOfOptions × choiceIndex). Compared to the choice stack of the LMC case, the probability entry is missing and the choiceIndex is added.

newChoices
index | type | probability | from | to
0 | p | - | 1 | 2
1 | nd | 0.6 | 3 | 4
2 | t | 0.4 | - | 2
3 | t | - | - | 0
4 | t | - | - | 1

newTargets
index | label | stateVector
0 | {e0} | y = 2
1 | {e0} | y = 3
2 | {e0, e1} | y = 1

(a) Data structures (b) Visualization of newChoices

Figure 7.11.: Example of transition information obtained during the generation of a labeled choice-aware Markov decision process

Figure 7.11 shows the result of CalculateTransitionInformation applied to the execution program of the running example with state vector y = 0. The basic idea behind the algorithm is to collect the encountered choices in the data structure newChoices during the traversal; put simply, the condensed graph of figure 7.3 is calculated on the fly (see figure 7.11b).

Later, pseudo code of the probabilistic choose-operator and the NextChoices-method is provided by algorithm 7.9 (page 125) and algorithm 7.8 (page 125), respectively. But first, the approach is illustrated on the running example. The result is computed in four iterations; the first iteration is explained below.

After the deserialization, the variable environment at the start of the iteration is σ(y) = 0 and σ(l) = false. The choice stack is empty. The status of newChoices is shown in figure 7.12a; one initial choice is set up whose to-field is not known yet.

The first choice l:=choose((0.6, true), (0.4, false)) gets executed. This choice has not been encountered before, so its first option is selected. The value of l is updated to true. The number of the selected option, the total number of options, and the index of the current choice are saved on the choice stack [(0, 2, 0)]. Two new entries are created in newChoices; the probabilities of the options are saved in the probability-fields of those new entries. The from-field and the to-field of the old entry are set to the indexes of the new entries, and its type-field is set to probabilistic. The index of the current choice is set to the index in the from-field. The status of newChoices is shown in figure 7.12b.

The next choice y:=choose(2, 3) is encountered. Now real nondeterminism is possible. This choice has not been encountered before either, so its first option is selected. The value of y is updated to 2. The number of the selected option, the total number of options, and the index of the current choice are appended to the choice stack [(0, 2, 0), (0, 2, 2)]. Two new entries are created in newChoices; the probability-fields of those new entries are left untouched. The from-field and the to-field of the old entry are set to the indexes of the new entries, and its type-field is set to nondeterministic. The index of the current choice is set to the index in the from-field.

The code execution has finished. The variable environment is σ(y) = 2 and σ(l) = true. Therefore, the expression e0: y!= || l==true is satisfied, but not the expression e1: y==1. A new target is saved with the label {e0} and the state vector y=2. The local variable is discarded. The status of newChoices is shown in figure 7.12c.

newChoices
index | type | probability | from | to
0 | t | - | - | ?
(a) At the beginning

newChoices
index | type | probability | from | to
0 | p | - | 1 | 2
1 | t | 0.6 | - | ?
2 | t | 0.4 | - | ?
(b) After the first choice

newChoices
index | type | probability | from | to
0 | p | - | 1 | 2
1 | nd | 0.6 | 3 | 4
2 | t | 0.4 | - | ?
3 | t | - | - | 0
4 | t | - | - | ?
(c) After the second choice

Every time a new choice node is encountered, new entries are created in the newChoices data structure; apart from that, the other iterations work analogously to the algorithms of section 7.3.2. Pseudo code of the probabilistic choose-operator and the NextChoices-method is provided by algorithm 7.9 and algorithm 7.8, respectively.

Algorithm 7.8: NextChoices

1 method ChoiceResolver::NextChoices( )
2   returns true if there is still another untreated combination of choice outcomes, otherwise false
3   choiceDepth := -1;
4   if first execution of NextChoices for a given stateVector then
5     return true;
6   while choiceStack not empty do
7     var lastEntry := choiceStack.Pop();
8     if lastEntry.numberOfOptions > lastEntry.currentOption + 1 then
9       lastEntry.currentOption := lastEntry.currentOption + 1
10      lastEntry.choiceIndex := lastEntry.choiceIndex + 1
11      currentChoiceIndex := lastEntry.choiceIndex
12      choiceStack.Push(lastEntry)
13      return true
14  return false
15 end method

Algorithm 7.9: Executing a probabilistic choose

1 method ChoiceResolver::choose( (p0,e0) . . . (pn,en) )
2   returns an outcome of the input options
3   choiceDepth := choiceDepth + 1
4   var choiceStackSize := number of elements in choiceStack
5   if choiceDepth < choiceStackSize then
6     var idx := choiceStack[choiceDepth].currentOption
7     return e_idx
8   var valueCount := n + 1
9   var oldChoiceIndex := currentChoiceIndex
10  currentChoiceIndex := nextFreeChoiceIndex
11  nextFreeChoiceIndex := nextFreeChoiceIndex + valueCount
12  newChoices[oldChoiceIndex].type := probabilistic
13  newChoices[oldChoiceIndex].from := currentChoiceIndex
14  newChoices[oldChoiceIndex].to := currentChoiceIndex + valueCount - 1
15  for each i ∈ [0..n] do
16    newChoices[currentChoiceIndex + i].probability := p_i
17  choiceStack.Push( (0, valueCount, currentChoiceIndex) )
18  return e0
19 end method

The variable nextFreeChoiceIndex is initially 0. The algorithm NextChoices here works similarly to the one of the purely probabilistic case (cf. algorithm 7.5 on page 117); it is only adapted to keep track of the currentChoiceIndex and the choiceIndexes in the choice stack.

In contrast to algorithm 7.6 on page 118, the algorithm for the probabilistic choose here only considers two cases:
• if there is no entry on the choice stack for the current choice, the choice is encountered for the first time, a new entry for the choice is pushed on the choice stack, and new entries are added to the newChoices array (lines 8-18),

• otherwise, the preselected option on the stack is selected and returned (lines 6 and 7).
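The following Python sketch replays algorithms 7.7 to 7.9 on the running example and reproduces the newChoices and newTargets arrays of figure 7.11. Several details are assumptions that the pseudo code leaves open: entry 0 of newChoices is the initial choice of figure 7.12a, so nextFreeChoiceIndex starts at 1 here (the text states an initial value of 0); freshly created entries get the type t until they are overwritten, as in figure 7.12b; the nondeterministic choose is implemented analogously to the probabilistic one, but without probabilities; and the to-field of the choice reached at the end of a trace is set to the index of the new target. The execution program is a hypothetical reconstruction of the part reachable from y = 0, labels are omitted, and all class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChoiceEntry:
    kind: str = "t"                       # "p", "nd", or "t"; provisional "t" until overwritten
    probability: Optional[float] = None
    from_: Optional[int] = None
    to: Optional[int] = None              # for kind "t": index into newTargets


@dataclass
class StackEntry:
    current_option: int
    number_of_options: int
    choice_index: int                     # newChoices index of the selected option


class CmdpChoiceResolver:
    def __init__(self):
        self.new_choices: List[ChoiceEntry] = [ChoiceEntry()]  # entry 0: initial choice
        self.choice_stack: List[StackEntry] = []
        self.choice_depth = -1
        self.current_choice_index = 0
        self.next_free_choice_index = 1   # assumption: entry 0 is already in use
        self.first_call = True

    def next_choices(self):               # cf. algorithm 7.8
        self.choice_depth = -1
        if self.first_call:
            self.first_call = False
            return True
        while self.choice_stack:
            last = self.choice_stack.pop()
            if last.number_of_options > last.current_option + 1:
                last.current_option += 1
                last.choice_index += 1
                self.current_choice_index = last.choice_index
                self.choice_stack.append(last)
                return True
        return False

    def _choose(self, kind, values, probabilities):  # cf. algorithm 7.9
        self.choice_depth += 1
        if self.choice_depth < len(self.choice_stack):  # predetermined option
            return values[self.choice_stack[self.choice_depth].current_option]
        count = len(values)
        old = self.current_choice_index
        self.current_choice_index = self.next_free_choice_index
        self.next_free_choice_index += count
        parent = self.new_choices[old]
        parent.kind = kind
        parent.from_ = self.current_choice_index
        parent.to = self.current_choice_index + count - 1
        for i in range(count):
            p = probabilities[i] if probabilities is not None else None
            self.new_choices.append(ChoiceEntry(probability=p))
        self.choice_stack.append(StackEntry(0, count, self.current_choice_index))
        return values[0]

    def choose_probabilistic(self, *options):         # (probability, value) pairs
        return self._choose("p", [v for _, v in options], [p for p, _ in options])

    def choose_nondeterministic(self, *values):
        return self._choose("nd", list(values), None)


def execution_program(env, r):
    """Hypothetical reconstruction of the running example (reachable part from y = 0)."""
    env["l"] = False
    if env["y"] == 0:
        env["l"] = r.choose_probabilistic((0.6, True), (0.4, False))
        env["y"] = 1
        if env["l"]:
            env["y"] = r.choose_nondeterministic(2, 3)  # real nondeterminism
    return env


def calculate_transition_information(old_state):        # cf. algorithm 7.7
    r = CmdpChoiceResolver()
    new_targets = []
    while r.next_choices():
        env = execution_program(dict(old_state), r)
        r.new_choices[r.current_choice_index].to = len(new_targets)  # link leaf to target
        new_targets.append({"y": env["y"]})                          # local l is discarded
    return r.new_choices, new_targets
```

For the state y = 0, the returned newChoices entries are (p, -, 1, 2), (nd, 0.6, 3, 4), (t, 0.4, -, 2), (t, -, -, 0), and (t, -, -, 1), and newTargets holds y = 2, y = 3, and y = 1, as in figure 7.11. Unlike the LMC sketch, no probabilities are multiplied along a trace; they stay attached to the choice entries and are only combined later by the model checker.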

The nondeterministic choose works analogously to the probabilistic choose described here.

After the calculation of the transition information of a state has finished, the traversal algorithm replaces the state vectors in the newTargets array of the transition information with the index of the state in the final sparse data structure (algorithm 7.1 on page 108). Finally, the algorithm appends the traversal information to the data structure of the probabilistic system. In most cases, the indexes used in the newChoices array are already occupied in the data structure of the sparse labeled CMDP. Therefore, new indexes in the data structure of the probabilistic system have to be reserved and the new entries have to be adjusted accordingly.

The labeled choice-aware Markov decision process of figure 6.2 on page 97 is the result of the state space traversal algorithm of this section. The evaluation in the next subsection shows that the approach works for most, but not all, examples.

7.4.3. Evaluation

As table 7.3 shows, the algorithm does not work for the Height Control case study, because too many entries in the choices array are required (memory exceeded). The purely probabilistic algorithm generates far fewer entries, because matching transitions are merged before they are integrated into the sparse labeled Markov chain. Unfortunately, this is not possible here. For the case studies that work, the algorithm is still slower than in the purely probabilistic case (cf. table 7.2 on page 119); e.g., the traversal of the first case study takes 29 s instead of 22 s.



Number of Faults | 7 | 12 | 9
Observed Faults | 7 | 0 | 9
States | 2,587,933 | exceeded | 294,871
Choices | 87,832,472 | exceeded | 6,449,262
Transitions | 45,210,203 | exceeded | 3,372,067
Traversal time | 29 s | exceeded | 5 s

Min Probability | 2.24 × 10⁻⁶ | - | - | 0.04 | 1.98 × 10⁻⁵
Max Probability | 0.14 | - | - | 1 | 4.16 × 10⁻⁴
Model Checking time | 6 m 07 s | - | - | 41 s | 43 s

Table 7.3.: Evaluation of the generation time of a sparse labeled CMDP when only hazards are observable

7.5. Implementing the Past Time Operator Once

Sometimes, it is useful to talk about the past in a formula. Hence, the model checker of this thesis borrows the once-operator from linear temporal logic with past [Mar03; BC03]. The following semi-formal definition gives an idea of the operator: Let $\pi_{init} = \theta_0\,\theta_1\,\theta_2 \ldots \in \mathit{InfPaths}^{M}_{init}$ be an initial infinite path, and

$$f ::= e \mid \mathrm{O}\, f' \mid \mathrm{F}\, f'.$$

Then, $(\pi_{init}, i) \models f$ (f holds in π_init at time i) is inductively defined as follows:

$$\begin{aligned}
(\pi_{init}, i) \models e &\iff \theta_i \text{ satisfies expression } e\\
(\pi_{init}, i) \models \mathrm{O}\, f' &\iff \exists j \in [0, i].\ (\pi_{init}, j) \models f'\\
(\pi_{init}, i) \models \mathrm{F}\, f' &\iff \exists j \in [i, \infty).\ (\pi_{init}, j) \models f'.
\end{aligned}$$

A formula f is valid on a path π_init iff $(\pi_{init}, 0) \models f$. Let $\Pr(f) = \Pr(\{\pi_{init} \mid f \text{ is valid on } \pi_{init}\})$. The full past time temporal logic has more operators, but the small excerpt presented here is sufficient for the upcoming analyses.

If the operand of a once formula is a state formula, the once formula can be normalized to a plain state formula during the model generation. To do this, a fresh variable is added to the state vector that represents whether the operand of the once formula has been satisfied before. This once-variable is, of course, initially false. When the formulas are evaluated during the model generation, the value of the once-variable is updated (cf. line 8 of algorithm 7.4 on page 113): if the value of the once-variable is already true, the variable keeps its value; otherwise, the value is set to true if and only if the operand of the once formula is satisfied.
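A minimal Python sketch of this normalization for example 7.6 follows. It assumes that the final variable environments of the macro steps are already known (taken from figures 7.4 and 7.13; for state 1, figure 7.13b implies y = 2 and l == false at the end of the macro step), and update_once and successor_with_once are hypothetical names. The updated once-variable is used when evaluating the normalized formula, because O includes the current time point (j ∈ [0, i]).

```python
def update_once(once_before, operand_holds):
    """Once-variable after a macro step: it stays true, or becomes true
    exactly when the operand holds in the current step."""
    return once_before or operand_holds


def successor_with_once(state, env):
    """Extend the successor state vector by once1 (operand: l == true) and
    evaluate the normalized state formula once1 && l == false."""
    once1 = update_once(state["once1"], env["l"] is True)
    normalized_e2 = once1 and env["l"] is False
    return {"y": env["y"], "once1": once1}, normalized_e2


# State 0 (once1 initially false) with the three final environments of figure 7.4.
state0 = {"y": 0, "once1": False}
for env in ({"y": 2, "l": True}, {"y": 3, "l": True}, {"y": 1, "l": False}):
    print(successor_with_once(state0, env))
# State 1 of figure 7.13b: y = 2 with once1 set; the macro step ends with l == false.
print(successor_with_once({"y": 2, "once1": True}, {"y": 2, "l": False}))
```

The first three calls set once1 for y = 2 and y = 3 but not for y = 1, matching the once1 column of figure 7.13a; only the last call makes the normalized formula true, which corresponds to the label e2 of state 1.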



newTransitions
index | probability | label | stateVector
0 | 0.3 | {e0} | y = 2, once1
1 | 0.3 | {e0} | y = 3, once1
2 | 0.4 | {e0, e1} | y = 1, ¬once1
(a) State 0

newTransitions
index | probability | label | stateVector
0 | 1.0 | {e0, e2} | y = 2, once1
(b) State 1

Figure 7.13.: Example of transition information obtained during the generation of a labeled Markov chain with a normalized once formula

Example 7.6. Let M be a small modification of the executable model of example 7.2 with the label expressions E = {e0, e1, e2}, where e0 = y!=… || l==true, e1 = y==…, and e2 = (O l==true) && l==false.

The once-variable once1 is introduced for the normalization during the model generation. Figure 7.13 shows the generated transition information of state 0 and state 1. On the basis of the once-variable, the label expression F((O l==true) && l==false) can be evaluated, because it is equivalent to F(once1 && l==false).

One application of the once-operator is the calculation of the probability of a minimal cut set for the safety analysis. Let Γ = {FS, FK1} be a minimal cut set; then its exact probability can be calculated with the approach of this thesis by

$$\Pr(\Gamma \text{ occurs}) = \Pr(\text{both } \mathit{FS} \text{ and } \mathit{FK1} \text{ occur at least once}) = \Pr(\mathbf{F}((\mathbf{O}\, \mathit{FS}) \wedge (\mathbf{O}\, \mathit{FK1}))).$$

The probability of a temporal logic formula that uses the finally operator more than once, e.g., $(\mathbf{F}\, e_1) \wedge (\mathbf{F}\, e_2)$, cannot be calculated directly. To circumvent this problem, such formulas can be expressed as equivalent verifiable formulas using the once-operator, e.g., $(\mathbf{F}\, e_1) \wedge (\mathbf{F}\, e_2) \equiv \mathbf{F}((\mathbf{O}\, e_1) \wedge (\mathbf{O}\, e_2))$.
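The equivalence of (F e1) ∧ (F e2) and F((O e1) ∧ (O e2)) can be sanity-checked by brute force. The following Python sketch is only a finite-trace approximation, not a proof: it interprets F and O over all finite traces up to a fixed length (an assumption; the thesis defines the semantics on infinite paths), encodes each position as a pair of booleans for e1 and e2, and compares both sides on every trace. All function names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

Position = Tuple[bool, bool]          # (e1 holds, e2 holds) at one position
Trace = List[Position]
Pred = Callable[[Trace, int], bool]   # predicate evaluated at a position


def finally_(trace: Trace, i: int, pred: Pred) -> bool:
    """F pred at position i: pred holds at some position j >= i."""
    return any(pred(trace, j) for j in range(i, len(trace)))


def once(trace: Trace, i: int, pred: Pred) -> bool:
    """O pred at position i: pred holds at some position j in [0, i]."""
    return any(pred(trace, j) for j in range(0, i + 1))


def e1(trace: Trace, j: int) -> bool:
    return trace[j][0]


def e2(trace: Trace, j: int) -> bool:
    return trace[j][1]


def lhs(trace: Trace) -> bool:
    """(F e1) and (F e2), evaluated at position 0."""
    return finally_(trace, 0, e1) and finally_(trace, 0, e2)


def rhs(trace: Trace) -> bool:
    """F((O e1) and (O e2)), evaluated at position 0."""
    def both_seen(t: Trace, j: int) -> bool:
        return once(t, j, e1) and once(t, j, e2)
    return finally_(trace, 0, both_seen)


def check_equivalence(max_len: int) -> int:
    """Compare both sides on every trace of length 1..max_len; return the count."""
    letters = list(product([False, True], repeat=2))
    checked = 0
    for n in range(1, max_len + 1):
        for trace in product(letters, repeat=n):
            assert lhs(list(trace)) == rhs(list(trace)), trace
            checked += 1
    return checked


if __name__ == "__main__":
    print(check_equivalence(5))
```

The right-hand side holds at the later of the two positions where e1 and e2 first occur, which is exactly why a single finally operator suffices once the history is tracked by the once-operator.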

The once-operator is also useful for calculating conditional probabilities in the safety analysis. The probability that a sensor fault occurred during the runtime of the system, given that the hazard occurs, can be calculated by

$$\Pr(\text{sensor fault occurs} \mid \text{hazard occurs}) = \frac{\Pr(\text{both sensor fault and hazard occur})}{\Pr(\text{hazard occurs})} = \frac{\Pr((\mathbf{F}\,\text{sensor fault}) \wedge (\mathbf{F}\,\text{hazard}))}{\Pr(\mathbf{F}\,\text{hazard})} = \frac{\Pr(\mathbf{F}((\mathbf{O}\,\text{sensor fault}) \wedge (\mathbf{O}\,\text{hazard})))}{\Pr(\mathbf{F}\,\text{hazard})}.$$

Unfortunately, the normalization increases the number of states in the generated probabilistic system. Table 7.4 shows that the number of states may increase by a factor of about 14 in the Railroad case study, where many transient faults are involved. In the Hemodialysis case study, which has no transient faults, the increase is negligible.

                      Railroad      Height Control   Hemodialysis
Number of hazards     1             2                2
Number of faults      7             12               9
Observed faults       7             1                9

Without once
  States              2,587,933     2,186,964        294,871
  Transitions         26,221,633    249,842,821      3,372,067
  Traversal time      27 s          50 m 42 s        7 s

With once
  States              35,140,992    exceeded         299,809
  Transitions         334,015,285   exceeded         3,415,540
  Traversal time      3 m 03 s      -                7 s

Size increase
  State ratio         13.58 ×       -                1.01 ×
  Transition ratio    12.74 ×       -                1.01 ×

Table 7.4.: Evaluation of the generation time of a sparse labeled Markov chain having a once formula

7.6. Interfacing Lustre

This section shows how the programming language Lustre was extended to an executable modeling language that supports probabilistic choice. The integration was done by Götz in



about two months and is described in detail in [Göt17]. The extensible architecture of the reference implementation of S# and the language-agnostic algorithms facilitated the integration. Appendix A explains the architecture in greater detail.

A Lustre program determines how its outputs change depending on its inputs in one discrete step, and a Lustre program is executed in a series of discrete steps. This fits the micro- and macro-step semantics of executable models perfectly. The inputs of a Lustre program can be used to indicate whether faults occur in a step, as demonstrated by the pressure tank example in section 4.5 on page 50. Using an interpreter for Lustre, the execution program $\rho_E$ is given by algorithm 7.10.

Algorithm 7.10: Execution program in Lustre

1 function MacroStep
2   for each f ∈ faults do
3     determine if f is active in the step based on its demand type
4     set the input variable of f in the model's main node
5   Execute one step of the model in the Lustre interpreter

To generate machine code from a Lustre model, the Lustre v4 toolbox transforms the code in several steps [Ray00]: first, all nested nodes of the main node are inlined; then, the node is transformed into the intermediate format called oc5; afterwards, the code in the intermediate format is transformed into C code, which is finally compiled using an off-the-shelf C compiler. Instead of reinventing the wheel, Götz created an interpreter for the intermediate format to enable probabilistic model checking with the S# framework [Göt17].

The Lustre model of the pressure tank described in section 4.5 was transformed into the intermediate format to allow its probabilistic analysis. The following listing contains an excerpt of the model in the intermediate format.

oc5:
module: TANK

signals:
: input:fault_k - single: bool:
: input:fault_k - single: bool:
: input:fault_sensor - single: bool:
: output:level - single:
end:

variables:
: $
: $
: $
: $
: $
: $
...
end:

actions:
...
: call:$ ( ) ($ ( ,# ))
: call:$ ( ) (# )
...
: if:
: if:
: call:$ ( ) (# )
...
end:

states:
startpoint:
calls:
: ( < > )( < > )
: ( )( ) ...
: ( )( ) ...
...
end:

endmodule:

The following description gives a first impression of the intermediate code; the technical reference of the intermediate code by Plaice et al. provides a detailed description [PS98]. The code describes a finite state automaton. The inputs and outputs of the node are given in the signals-section. In the variables-section, the type of each of the 10 variables is declared; $ indicates that the variable is a boolean, and $ that it is an integer. The actions-section contains primitive statements that operate on the variables or describe a case distinction: action describes that an integer is assigned (denoted by call:$ ) to the variable with index with the constant value , and action describes the case distinction based on the boolean variable with the index . Finally, the states-section describes the transitions of the state machine: in state , the actions , , , , , , are executed in that order until the case distinction is encountered. If returns true, the state machine switches to state , otherwise to state . A step is finished after the state machine switches its state.

The prototypical implementation simply uses the variables in the variables-section as the state vector. This section demonstrated how an existing programming language can be extended to an executable modeling language when a macro step can be expressed as a finite executable program in that language and when variables can be serialized into and deserialized from a state vector. Habermaier describes the serialization and deserialization of S# in [Hab17].
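To make this execution model more concrete, the following Python sketch interprets a tiny, made-up automaton in the spirit of the description above: each state owns a list of actions that are executed in order until a case distinction on a boolean variable (or an unconditional jump) selects the successor state, which ends the step. The data layout (`Assign`, `Branch`, `Goto`, `run_step`) is a hypothetical simplification and does not follow the real oc5 syntax of [PS98]; the sketch also records the automaton state in the state vector, which is an assumption beyond the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Assign:
    """Assign the value computed by `expr` to variable `target`."""
    target: int
    expr: Callable[[List[object]], object]


@dataclass
class Branch:
    """Case distinction on boolean variable `cond`; selects the successor state."""
    cond: int
    if_true: str
    if_false: str


@dataclass
class Goto:
    """Unconditional switch to the successor state."""
    target: str


Action = Union[Assign, Branch, Goto]


def run_step(states: Dict[str, List[Action]], current: str,
             variables: List[object]) -> str:
    """Execute one step: run the actions of `current` until the automaton switches state."""
    for action in states[current]:
        if isinstance(action, Assign):
            variables[action.target] = action.expr(variables)
        elif isinstance(action, Branch):
            return action.if_true if variables[action.cond] else action.if_false
        else:
            return action.target
    raise ValueError(f"state {current!r} never switches its state")


def state_vector(current: str, variables: List[object]) -> Tuple[object, ...]:
    """Serialize automaton state and variables into a hashable state vector."""
    return (current, *variables)


if __name__ == "__main__":
    # Hypothetical model: variable 0 is a fault input, 1 a fill level, 2 a flag.
    program = {
        "filling": [
            Assign(1, lambda v: v[1] if v[0] else v[1] + 1),
            Assign(2, lambda v: v[1] >= 2),
            Branch(2, "full", "filling"),
        ],
        "full": [Goto("full")],
    }
    variables: List[object] = [False, 0, False]
    current = "filling"
    for _ in range(3):
        current = run_step(program, current, variables)
        print(state_vector(current, variables))
```

Since the state vector is a plain tuple, it can be stored in a hash-based state storage and deserialized by simply copying its entries back into the variable list before the next macro step.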




There are two competing tools that specialize in the probabilistic analysis of safety-critical systems: VECS and Compass. Both rely on the standard approach of model transformations and use probabilistic model checkers like PRISM or MRMC [KNP11; KZ+11]. The Safety Analysis Modeling Language (SAML) is an extension of the PRISM input language that facilitates its application to the analysis of safety-critical systems [LSO12]. VECS transforms a SAML model directly into the PRISM input language for its quantitative analysis. The approach of Compass is somewhat more sophisticated [BC+09; Ngu13]: the Compass toolset transforms SLIM models into the NuSMV input language. Afterwards, the state space is exported to an intermediate file, on which bisimulation is applied and probabilities are annotated by the tool SigRef. Finally, the resulting Markov chain is model checked by MRMC.

In contrast, this thesis creates the probabilistic systems by executing the models. Hence, it is more closely related to the approach of the model checker LiQuor [CB06; Cie11]. LiQuor is a model checker for the Probmela modeling language, an extension of the Promela language that allows probabilistic choice [BCG04]. Probmela is well suited for the analysis of the concurrency of several processes. LiQuor transforms a Probmela model into an intermediate format that is explicitly traversed. To do so, the complete variable environment and the program counters of the processes are recorded in the state space. Although LiQuor supports atomic regions, they are not used to reduce the number of temporary variables. The approach introduced in this thesis tries to record only the bare necessities in the state space. Furthermore, with the approach of this thesis, intermediate (micro) states do not appear in the state space. Another approach to model check Probmela models is to use the standard approach and transform the model into the PRISM language [CB+08].

It is hard to make a fair comparison between these tools and the approach of this thesis due to their different models of computation. For instance, it took about 740 lines to create a scaled-down Compass version of the railroad crossing model that is semantically similar to the S# version written in about 400 lines of C# code [Bil12]. Compass performs a qualitative safety analysis of this model in about 21 minutes. Conducting a probabilistic analysis of the railroad crossing with Compass was not necessary, because the faster qualitative analysis already took longer than the 6 minutes of the probabilistic analysis with S#. Of course, the comparison is unfair, as forcing Compass semantics onto executable models might likewise slow down analyses.


8. Optimizations

This chapter presents three optimizations that can be applied during the model generation. The first optimization introduces multi-core model checking (section 8.1). The second optimization makes use of static optimizations concerning transient faults, which can be made during the model traversal under certain circumstances (section 8.2). The third optimization exploits the fact that it is often not necessary to generate the complete probabilistic system from an executable model (section 8.3). By applying all three optimizations, the generation time of the Height Control case study could be reduced from about 50 minutes to 3 minutes.

8.1. Multi-Core State Traversal

Almost all processors of modern computers have multiple cores, i.e., multiple independent processing units that access the same memory. To use multiple cores during the model generation, the algorithms have to be adapted. The approach was inspired by LTSmin [KL+15]. The multi-core support has a major influence on the architecture, because an exact clone of the executable model, with its own variables, has to be created for each core. Then, each core can execute its own clone. The detailed architecture is shown in appendix A. Furthermore, the state storage must allow concurrent access. That is why the reference implementation uses a variant of the state storage presented by Laarman in [LPW10; Laa14], which is optimized for multi-core model checking. For the parallel traversal of the state space, algorithm 7.1 on page 108 needs to be parallelized; this was done by integrating ideas of the depth-first search algorithm of Sanders [San97]. The transition information of each state is calculated exactly once, because each state is processed exactly once. Finally, it must be possible to append the transition information to the data structure of a probabilistic system in parallel. For the sparse labeled Markov chain, this can simply be done by reserving sequential space for the transitions before writing the entries, using the ReserveSpace-method of the following listing.



Algorithm 8.1: Reserve space

1 method Helpers::AtomicallyAddAndReturnOld(valueLocation : pointer to int, valueToAdd : int)
2   returns the value before the addition
3   while true do
4     var valueBefore := *valueLocation
5     var newValue := valueBefore + valueToAdd
6     if compare-and-swap(valueLocation, valueBefore, newValue) then
7       return valueBefore
8 end method

9 method LmcBuilder::ReserveSpace(elementsToReserve : int)
10   returns index of first reserved element
11   var index := Helpers.AtomicallyAddAndReturnOld(&reservedCounter, elementsToReserve)
12   check if (index + elementsToReserve) exceeds the memory
13   return index
14 end method

The algorithm uses a variant of the well-known atomic adder based on the compare-and-swap operation. The atomic compare-and-swap operation compares the contents of valueLocation with valueBefore and stores newValue in valueLocation only if the two are equal, which is considered a success. It returns true on success and false otherwise. For labeled choice-aware Markov decision processes, this works analogously.

The evaluations on the case studies in table 8.1 and table 8.2 show that, in most cases, the model generation with four cores was about three times faster than with one core.
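The retry loop of AtomicallyAddAndReturnOld and its use in ReserveSpace can be sketched in Python. The Python standard library offers no atomic compare-and-swap on integers, so this sketch emulates the operation with a `threading.Lock` (an assumption made purely for illustration; it does not reproduce the lock-free behavior of algorithm 8.1). The class and function names are hypothetical; the demo checks that ranges reserved concurrently by several threads never overlap.

```python
import threading
from typing import List, Tuple


class AtomicInt:
    """Integer cell with an emulated compare-and-swap (a lock guards check-and-set)."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        """Store `new` only if the cell still holds `expected`; report success."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def atomically_add_and_return_old(cell: AtomicInt, value_to_add: int) -> int:
    """Retry until the compare-and-swap succeeds; return the value before the addition."""
    while True:
        value_before = cell.load()
        if cell.compare_and_swap(value_before, value_before + value_to_add):
            return value_before


class LmcBuilder:
    """Hypothetical builder that hands out disjoint slices of a fixed-size buffer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.reserved_counter = AtomicInt(0)

    def reserve_space(self, elements_to_reserve: int) -> int:
        index = atomically_add_and_return_old(self.reserved_counter, elements_to_reserve)
        if index + elements_to_reserve > self.capacity:
            raise MemoryError("transition buffer exhausted")
        return index


def demo(threads: int = 4, reservations: int = 1000, size: int = 3) -> List[Tuple[int, int]]:
    """Reserve `size` slots `reservations` times per thread; return sorted [start, end) ranges."""
    builder = LmcBuilder(capacity=threads * reservations * size)
    ranges: List[Tuple[int, int]] = []
    ranges_lock = threading.Lock()

    def worker() -> None:
        local = [(i, i + size) for i in (builder.reserve_space(size) for _ in range(reservations))]
        with ranges_lock:
            ranges.extend(local)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sorted(ranges)
```

A native implementation would map compare_and_swap to a hardware instruction; in .NET, for example, `System.Threading.Interlocked.CompareExchange` provides this operation.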




                          Railroad    Height Control   Hemodialysis   Smaller case studies
States                    2,587,933   2,186,964        294,871        156     25     30
Transitions               9,100,662   187,511,131      3,372,067      416     56     51
Traversal time (1 core)   22 s        49 m 59 s        5 s            < 1 s   < 1 s  < 1 s
Traversal time (4 cores)  6 s         14 m 31 s        1.6 s          < 1 s   < 1 s  < 1 s
Speed up                  3.66 ×      3.44 ×           3.12 ×         -       -      -

Table 8.1.: Evaluation of the generation time of sparse labeled Markov chains using multi-core state traversal


                          Railroad     Height Control   Hemodialysis   Smaller case studies
States                    2,587,933    exceeded         294,871        156     25     30
Choices                   87,832,472   exceeded         6,449,262      3,091   86     162
Transitions               45,210,203   exceeded         3,372,067      1,624   56     108
Traversal time (1 core)   29 s         exceeded         5 s            < 1 s   < 1 s  < 1 s
Traversal time (4 cores)  9 s          exceeded         1.7 s          < 1 s   < 1 s  < 1 s
Speed up                  3.22 ×       -                2.94 ×         -       -      -

Table 8.2.: Evaluation of the generation time of a sparse labeled CMDP using multi-core state traversal



8.2. Static Fault Forwarding

The second optimization is tailored to the analysis of safety-critical systems with S# for models that contain transient faults. The next listing gives an example of a model where this optimization can be used.

1  private class C : Component {
2    private readonly Fault _f = new TransientFault();
3    private int _x;
4
5    public virtual bool CalculateY() {
6      return _x == …;
7    }
8
9    [FaultEffect(Fault = nameof(_f))]
10   public class FaultEffect : C {
11     public override bool CalculateY() {
12       return true;
13     }
14   }
15 }

In this model, neither the nominal behavior nor the fault effect of the method CalculateY has a side effect on any variable; they just return values that are determined by the current state or are even constant. The S# compiler automatically detects transient faults for which all associated methods are of that kind. It transforms the model of the example into the following C# code (simplified).

1  private class C : Component {
2    [Hidden] private bool _f = false;
3    private int _x;
4
5    public virtual bool CalculateY() {
6      var fActive = Choose(false, true);
7      var fChoiceIndex = ChoiceResolver.choiceDepth;
8      var fResult = true;
9      if (!fActive) {
10       var resultWithoutFault = _x == …;
11       if (resultWithoutFault == fResult)
12         ChoiceResolver.ForwardChoice(fChoiceIndex);
13       fResult = resultWithoutFault;
14     }
15     return fResult;
16   }
17 }



The Choose in line 6 determines whether the transient fault is active. Due to the fixed traversal order (see chapter 7), the case where the fault is not active is analyzed first. The variable fChoiceIndex stores the depth of the choice. The variable fResult contains the result that is returned by an active fault effect (cf. line 12 of the untransformed listing). When the case in which the fault is not active is encountered (the case that is analyzed first), the if-branch in lines 9-13 is executed. The actual result of the method is calculated (line 10), and if it is equal to the result of an active fault, it makes no difference whether the transient fault was active or not. In this case, the Choose of line 6 is made void in line 12.

Algorithm 8.2 and algorithm 8.3 provide the implementations of ForwardChoice for labeled Markov chains and labeled choice-aware Markov decision processes, respectively.

Algorithm 8.2: ForwardChoice in labeled Markov chains

1 method ChoiceResolver::ForwardChoice(idxToForward : int)
2   returns nothing
3   var parentProbOfIdxToForward := choiceStack[idxToForward-1].probability
4     or 1.0 if idxToForward has no parent
5   var currentProbOfIdxToForward := choiceStack[idxToForward].probability
6   var differenceProbabilityToAdd := parentProbOfIdxToForward - currentProbOfIdxToForward
7   choiceStack[choiceDepth].probability += differenceProbabilityToAdd
8   choiceStack[idxToForward].numberOfOptions := 1
9 end method

For labeled Markov chains, the complete probability of the voided branch is calculated in line 6 and added to the current branch in line 7. This works because choiceStack[choiceDepth].probability is not overwritten by Choose until all relevant choice paths have been traversed (cf. line 8 in algorithm 7.6 on page 118).
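The effect of forwarding on the probability mass can be illustrated with a Python sketch that enumerates the two branches of the transient-fault choice in CalculateY. It assumes a fault probability p and replaces the comparison of line 10 (whose constant is elided in the listing) with a hypothetical `nominal` callable. The sketch mirrors only the arithmetic of algorithm 8.2 (the parent probability minus the probability of the inactive branch is added to the current branch), not the actual choice stack of S#.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Dict, Tuple

FAULT_RESULT = True  # value returned by the fault effect of CalculateY


def explore(nominal: Callable[[], bool], p: Fraction,
            forwarding: bool) -> Tuple[Dict[bool, Fraction], int]:
    """Distribution of CalculateY's result and the number of explored branches."""
    parent_prob = Fraction(1)  # probability of the path before the Choose
    dist: Dict[bool, Fraction] = defaultdict(Fraction)
    branches = 0

    # First branch (explored first): the fault is not active.
    current_prob = parent_prob * (1 - p)
    result = nominal()
    branches += 1
    if forwarding and result == FAULT_RESULT:
        # The active branch would return the same value: void it and add
        # its probability mass (parent - current) to the current branch.
        current_prob += parent_prob - current_prob
        dist[result] += current_prob
        return dict(dist), branches
    dist[result] += current_prob

    # Second branch: the fault is active and its effect returns FAULT_RESULT.
    branches += 1
    dist[FAULT_RESULT] += parent_prob * p
    return dict(dist), branches


if __name__ == "__main__":
    p = Fraction(1, 100)
    for nominal in (lambda: True, lambda: False):
        print(explore(nominal, p, forwarding=False), explore(nominal, p, forwarding=True))
```

The distribution of results is identical with and without forwarding; only the number of explored branches drops when the nominal result coincides with the fault effect, which is what reduces the traversal time.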

Algorithm 8.3: ForwardChoice in labeled choice-aware Markov decision processes

1 method ChoiceResolver::ForwardChoice(idxToForward : int)
2   returns nothing
3   var choiceIdToForward := choiceStack[idxToForward].choiceIndex
4   var choiceIdOfSibling := choiceIdToForward + 1
5   newChoices[choiceIdOfSibling].type := forward
6   newChoices[choiceIdOfSibling].to := currentChoiceIndex
7   choiceStack[idxToForward].numberOfOptions := 1
8 end method

For labeled choice-aware Markov decision processes, the situation is simpler. ForwardChoice can be implemented by adding the choice type “forward” to the relevant data structures. Note that algorithm 6.3 must be modified to handle the new choice type accordingly.

Table 8.3 shows that the Height Control case study benefits strongly from the optimization (factor 5) in the labeled Markov chain case. The optimization has no influence on the number of transitions, because transitions with the same target state and labeling are merged in any case. The generation of labeled choice-aware Markov decision processes also benefits; a factor of 1.5 was achieved for the Railroad case study (table 8.4). The optimization had no influence on the Hemodialysis case study, because it was not applicable to any fault.

                            Railroad    Height Control   Hemodialysis   Smaller case studies
No forwarding
  States                    2,587,933   2,186,964        294,871        156     25     30
  Transitions               9,100,662   187,511,131      3,372,067      416     56     51
  Traversal time (4 cores)  6 s         14 m 31 s        1.6 s          < 1 s   < 1 s  < 1 s
With forwarding
  States                    2,587,933   2,186,964        294,871        156     25     30
  Transitions               9,100,662   187,511,131      3,372,067      416     56     51
  Traversal time (4 cores)  5 s         2 m 55 s         1.6 s          < 1 s   < 1 s  < 1 s
  Speed up                  1.2 ×       5.0 ×            1.0 ×          -       -      -

Table 8.3.: Evaluation of the generation time of sparse labeled Markov chains using static fault forwarding

                            Railroad     Height Control   Hemodialysis   Smaller case studies
No forwarding
  States                    2,587,933    exceeded         294,871        156     25     30
  Choices                   87,832,472   exceeded         6,449,262      3,091   86     162
  Transitions               45,210,203   exceeded         3,372,067      1,624   56     108
  Traversal time (4 cores)  9 s          exceeded         1.7 s          < 1 s   < 1 s  < 1 s
With forwarding
  States                    2,587,933    exceeded         294,871        156     25     30
  Choices                   69,172,070   exceeded         6,449,262      3,091   86     162
  Transitions               33,907,889   exceeded         3,372,067      1,624   56     108
  Traversal time (4 cores)  6 s          exceeded         1.7 s          < 1 s   < 1 s  < 1 s
  Speed up                  1.5 ×        -                1.0 ×          -       -      -

Table 8.4.: Evaluation of the generation time of a sparse labeled CMDP using static fault forwarding

Habermaier introduced a similar fault optimization for the DCCA in [Hab17]. Note that the optimization only works if the fault occurrence is not part of any formula.



Figure 8.1.: Early termination ((a) without optimization; (b) with optimization)

8.3. Early Termination

It is often not necessary to enumerate the complete probabilistic system of an executable model. Assume that a state formula e0 is given and only the formula F e0 is to be checked. Then it is reasonable to stop the traversal whenever e0 is encountered. This is illustrated in figure 8.1: it is not necessary to find the states 2, 3, and 4, because they do not change the probability of F e0. When the optimization is enabled, a stuttering state S is created, and the target state of each transition with the label e0 is set to state S.

If this optimization is enabled, the state space has to be enumerated separately for every formula. Also, the traversal time depends heavily on the checked formula; e.g., checking the formula F true ends the traversal almost immediately. Table 8.5 and table 8.6 show the results of the optimization applied to the larger case studies. The generation time of the smaller case studies is already less than a second; for this reason, they are left out. The optimization is especially useful for the Hemodialysis case study. The other case studies benefit less, because the hazards always occur in states that are far away from the initial states.
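The following Python sketch illustrates early termination on a small, made-up Markov chain (figure 8.1 is not reproduced here, so the graph is an assumption). It assumes that a transition carries the label e0 exactly when its target satisfies e0 and that the initial state does not satisfy e0. Transitions into e0-states are redirected to a stuttering state `S`; the sketch checks that Pr(F e0), approximated by value iteration, is the same with and without the optimization, while fewer states are explored. A production tool would rather solve the linear equation system than iterate a fixed number of times.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# state -> list of (probability, successor state)
Chain = Dict[int, List[Tuple[float, int]]]
STUTTER = -1  # hypothetical identifier of the stuttering state S


def traverse(chain: Chain, init: int, e0: Set[int],
             early_termination: bool) -> Chain:
    """Breadth-first traversal of `chain` from `init` (assumed not to satisfy e0).

    With early termination, every transition into an e0-state is redirected
    to the stuttering state, so successors of e0-states are never explored.
    """
    explored: Chain = {}
    queue, seen = deque([init]), {init}
    while queue:
        state = queue.popleft()
        explored[state] = []
        for prob, target in chain[state]:
            if early_termination and target in e0:
                explored[state].append((prob, STUTTER))
                continue
            explored[state].append((prob, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    if early_termination:
        explored[STUTTER] = [(1.0, STUTTER)]  # S loops on itself
    return explored


def prob_finally(chain: Chain, init: int, targets: Set[int],
                 iterations: int = 200) -> float:
    """Approximate Pr(F targets) from `init` by value iteration."""
    x = {s: (1.0 if s in targets else 0.0) for s in chain}
    for _ in range(iterations):
        x = {s: 1.0 if s in targets else sum(p * x[t] for p, t in chain[s])
             for s in chain}
    return x[init]


if __name__ == "__main__":
    # Hypothetical chain: state 2 satisfies e0; states 3 and 4 are only
    # reachable through state 2.
    chain = {0: [(0.5, 1), (0.5, 2)], 1: [(1.0, 1)], 2: [(1.0, 3)],
             3: [(1.0, 4)], 4: [(1.0, 4)]}
    full = traverse(chain, 0, {2}, early_termination=False)
    early = traverse(chain, 0, {2}, early_termination=True)
    print(len(full), prob_finally(full, 0, {2}))
    print(len(early), prob_finally(early, 0, {2, STUTTER}))
```

The saving grows with the size of the part of the state space that is only reachable through e0-states, which explains why the benefit depends so strongly on where the hazard occurs.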




                       Railroad          Height Control                Hemodialysis
Number of faults       7                 12                            9
States                 2,587,933         2,186,964                     294,871
Transitions            9,100,662         187,511,131                   3,372,067
Full traversal time    5 s               2 m 55 s                      1.6 s

Hazard                 Poss. Collision   Collision     False Alarm     Unsuccessful   Contamination
States                 2,059,082         2,002,728     1,899,456       59,899         188,237
Transitions            7,134,793         170,817,001   164,531,497     655,497        2,523,971
Traversal time         5 s               2 m 39 s      2 m 45 s        < 1 s          1.2 s
Speed up               1.0 ×             1.1 ×         1.1 ×           ≈ 2 ×          1.3 ×
Model checking time    21 s              7 m 17 s      7 m 00 s        3 s            12 s

Table 8.5.: Evaluation of the generation time of sparse labeled Markov chains using early termination


                                      Railroad          Height Control              Hemodialysis
Number of faults                      7                 12                          9
States                                2,587,933         exceeded                    294,871
Choices                               87,832,472        exceeded                    6,449,262
Transitions                           45,210,203        exceeded                    3,372,067
Full traversal time                   9 s               -                           1.7 s

Hazard                                Poss. Collision   Collision   False Alarm     Unsuccessful   Contamination
Model checking time (full traversal)  6 m 07 s          -           -               41 s           43 s
States                                2,059,082         exceeded    exceeded        59,899         188,237
Choices                               61,522,580        exceeded    exceeded        1,846,051      5,050,751
Transitions                           30,246,867        exceeded    exceeded        952,975        2,619,494
Traversal time                        6 s               -           -               1.2 s          1.2 s
Speed up                              1.5 ×             -           -               1.4 ×          1.4 ×
Model checking time                   4 m 36 s          -           -               12 s           37 s

Table 8.6.: Evaluation of the generation time of a sparse labeled CMDP using early termination


Rick: And remember, this gun is pointed right at your heart.

Renault: That is my least vulnerable spot.
– Casablanca (1942)

9. Analyzing the Impact of Faults using Executable Models

This chapter shows different analysis techniques that can be used to estimate how different faults affect a hazard. The case studies are used to introduce these techniques. This also provides evidence that the proposed techniques work with realistic case studies and that practitioners can benefit from the safety analysis using executable models. The results make it easier to identify the weak spots in a system. Keep in mind that the fault probabilities stated in the examples are speculative and only serve to illustrate the different techniques.

The technique demonstrated in section 9.1 combines the minimal critical sets, which can be calculated with the Deductive Cause Consequence Analysis, with the rare event approximation of the fault tree analysis. This delivers a very coarse approximation of the hazard probability. In section 9.2, the exact hazard probabilities are calculated directly from the executable models using probabilistic model checking, and the results are compared with those of section 9.1. Section 9.3 analyzes the impact of the probability of a certain fault on the hazard probability by conducting a series of probabilistic model checking runs in which the fault probability is slightly changed in each run. In section 9.4, different design variants of a system are compared using probabilistic model checking. Finally, section 9.5 shows a technique that can reveal that certain faults may increase the hazard probability even if traditional analyses might overlook them. For this, conditional probabilities are used. Moreover, the section demonstrates how Bayesian networks can be used to illustrate the relationships between different faults, making it easier to find the weak spots of a system.

The results that refer to the height control case study are published in [LK+17], and the results that refer to the hemodialysis machine case study are published in [LHR18]. Fritsch provides the evaluations based on Bayesian networks in [Fri17].



9.1. Quantitative Evaluation based on the Deductive Cause Consequence Analysis

In traditional fault tree-based methodologies, a fault tree is first created by hand before minimal cut sets can be derived from it [VU81]. This is costly and labor-intensive. By contrast, the Deductive Cause Consequence Analysis (DCCA) is a fully automated, model checking-based safety analysis technique. A set of component faults Γ is a critical set for a hazard H if and only if it is possible that H occurs while, before that, at most the faults in Γ have occurred. A critical set Γ is minimal if no proper subset Γ′ ⊂ Γ is critical. The result of the DCCA is the set of all minimal critical sets, abbreviated by MCS. Habermaier gives detailed algorithms of the DCCA and formal proofs of their correctness and completeness [Hab17]. Section 4.3.3 on page 41 shows how to apply the DCCA to an S# model.

Minimal critical sets describe which combinations of faults lead to a certain hazard and are closely related to the minimal cut sets of the fault tree analysis. Therefore, it makes sense to apply the rare event approximation of the fault tree analysis to minimal critical sets. Recall the equation of the rare event approximation from section 3.5 on page 27,

$$\mathrm{Pr}_{rea}(H) = \sum_{\Gamma \in \mathit{MCS}} \prod_{f \in \Gamma} \Pr(f),$$

where $\Pr(f)$ abbreviates the probability of fault $f$.

The idea of applying the rare event approximation to model checking results is not new; it is also used by Bozzano et al. [BV03] and Güdemann [Güd11]. In the following subsections, the quantitative evaluation based on the DCCA is applied to the models of the Height Control System and the Hemodialysis Machine.

9.1.1. Height Control System

Before the rare event approximation can be applied, the minimal critical sets must be calculated. Table 9.1a shows the result of the DCCA applied to the model of the height control system. The cardinality of the minimal critical sets gives a first impression of the safety of the system. Hazard “Collision” has 5 minimal critical sets, each with a cardinality of 2, whereas hazard “False Alarm” has 5 minimal critical sets, all of which are single points of failure. The results of the DCCA show that, as desired, the system is more prone to false alarms than to actual collisions. For each minimal critical set, the DCCA also provides a detailed scenario of how its faults lead to a hazard. For example, in the scenario of the fault set {LeftOHV, MisdetectionLB-Pre}, an overheight vehicle drives on the left lane and the PreControl does not detect the vehicle, which is why all other sensors do not get activated. Hence, the tunnel is not closed when the overheight vehicle tries to enter the low tube of the tunnel. If only one of these two faults and no other fault occurs, the system is still safe. Thus, knowing the minimal cut sets also means knowing the weak spots of a system.

Hazard / Minimal critical sets

Collision: an overheight vehicle enters a low tube
  (1) {LeftOHV, SlowTraffic}
  (2) {LeftOHV, MisdetectionOD-End-Left}
  (3) {LeftOHV, MisdetectionLB-Main}
  (4) {FalseDetectionLB-Main, LeftOHV}
  (5) {LeftOHV, MisdetectionLB-Pre}

False Alarm: all OHVs on the right, but tunnel gets closed
  (1) {LeftHV}
  (2) {FalseDetectionOD-End-Left}
  (3) {FalseDetectionOD-Main-Left}
  (4) {FalseDetectionLB-Main}
  (5) {MisdetectionOD-Main-Right}

(a) DCCA results

Fault                              Pr_1(Fault)
LightBarrierFalseDetection         5 × 10^-3
LightBarrierMisdetection           1 × 10^-4
OverheadDetectorFalseDetection     5 × 10^-3
OverheadDetectorMisdetection       1 × 10^-4
LeftHV                             1 × 10^-2
LeftOHV                            1 × 10^-3
SlowTraffic                        1 × 10^-1

(b) Fault probabilities

Table 9.1.: DCCA-based evaluation of the height control case study

Table 9.1b specifies the probabilities that are used in the following analysis. The probability $\mathrm{Pr}_1(f)$ of a fault $f$ denotes the chance that the corresponding fault occurs within one time step and is defined for the discrete probability space. To use the rare event approximation, the probability of failing at least once in the time period under consideration is required. It makes no sense to consider an infinite time horizon, because most non-terminating systems eventually fail, i.e., the probability of a hazard with an infinite time horizon is 1. The probability of failing within a certain number of time steps $k$ can easily be derived from the given probabilities using the geometric distribution: let the time under consideration be $k = 50$ time steps; then the accumulated probability of the fault $f$ = LightBarrierFalseDetection occurring within 50 time steps is

$$\mathrm{Pr}_{50}(f) = 1 - (1 - \mathrm{Pr}_1(f))^{50} \approx 0.22.$$

For the next calculation, the fact that some faults are only relevant in some system states is ignored. It is also assumed that a fault of a minimal critical set is present when it occurs in any of these 50 time steps. This is clearly an over-approximation. Applying the formula for $\mathrm{Pr}_{rea}(H)$ to the minimal cut sets calculated earlier gives these probabilities:

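$$\mathrm{Pr}_{rea}(\text{Collision}) = \mathrm{Pr}_{50}(\text{LeftOHV}) \cdot \mathrm{Pr}_{50}(\text{SlowTraffic}) + \ldots \approx 5 \times 10^{-3},$$

$$\mathrm{Pr}_{rea}(\text{False Alarm}) = \mathrm{Pr}_{50}(\text{LeftHV}) + \ldots \approx 1.07.$$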

The calculated probability for a false alarm is greater than 1 and therefore not a valid probability. The reason is that the assumption of the rare event approximation is not satisfied: with $\mathrm{Pr}_{50}(\text{LeftHV}) \approx 0.39$, LeftHV is certainly not a rare event.
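The numbers for hazard “False Alarm” can be reproduced with a few lines of Python. The sketch encodes the five minimal critical sets of this hazard from table 9.1a and maps each fault instance to its fault type in table 9.1b; this mapping from sensor instances to fault types is an assumption based on the fault names. It computes the probability over k steps with the geometric distribution and then applies the rare event approximation.

```python
from math import prod
from typing import Dict, FrozenSet, Iterable

# Per-step probabilities Pr_1 from table 9.1b.
PR1: Dict[str, float] = {
    "LightBarrierFalseDetection": 5e-3,
    "LightBarrierMisdetection": 1e-4,
    "OverheadDetectorFalseDetection": 5e-3,
    "OverheadDetectorMisdetection": 1e-4,
    "LeftHV": 1e-2,
    "LeftOHV": 1e-3,
    "SlowTraffic": 1e-1,
}

# Assumed mapping of fault instances (table 9.1a) to fault types (table 9.1b).
FAULT_TYPE: Dict[str, str] = {
    "LeftHV": "LeftHV",
    "FalseDetectionOD-End-Left": "OverheadDetectorFalseDetection",
    "FalseDetectionOD-Main-Left": "OverheadDetectorFalseDetection",
    "FalseDetectionLB-Main": "LightBarrierFalseDetection",
    "MisdetectionOD-Main-Right": "OverheadDetectorMisdetection",
}

# Hazard "False Alarm": five single points of failure.
FALSE_ALARM_MCS = [frozenset({fault}) for fault in FAULT_TYPE]


def pr_k(pr1: float, k: int) -> float:
    """Probability that a fault occurs at least once within k steps (geometric distribution)."""
    return 1.0 - (1.0 - pr1) ** k


def pr_rea(mcs: Iterable[FrozenSet[str]], k: int) -> float:
    """Rare event approximation: sum over minimal critical sets of the product of Pr_k(f)."""
    return sum(prod(pr_k(PR1[FAULT_TYPE[f]], k) for f in gamma) for gamma in mcs)


if __name__ == "__main__":
    print(round(pr_k(PR1["LightBarrierFalseDetection"], 50), 2))  # Pr_50 of one fault
    print(round(pr_rea(FALSE_ALARM_MCS, 50), 2))  # exceeds 1: not a valid probability
    print(pr_rea(FALSE_ALARM_MCS, 1))  # one relevant time step per fault
```

With k = 1, the same function yields the variant in which each fault is relevant in only one time step.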



Hazard / Minimal critical sets

Dialysis Unsuccessful: blood is not cleaned and dialysis finished
  (1) {DialyzingFluidPreparationPumpDefect}
  (2) {WaterHeaterDefect}
  (3) {PumpToBalanceChamberDefect}
  (4) {UltrafiltrationPumpDefect}
  (5) {BloodPumpDefect}
  (6) {DialyzerMembraneRupturesFault}

Contamination: blood entering the vein of the patient is contaminated
  (1) {SafetyBypassFault, WaterHeaterDefect}
  (2) {DialyzerMembraneRupturesFault, SafetyDetectorDefect}
  (3) {DialyzerMembraneRupturesFault, ValveDoesNotClose}

(a) DCCA results

Fault                                  Probability
BloodPumpDefect                        1.0 × 10^-5
DialyzerMembraneRupturesFault          1.0 × 10^-5
DialyzingFluidPreparationPumpDefect    1.0 × 10^-5
SafetyBypassFault                      1.0 × 10^-3
WaterHeaterDefect                      1.0 × 10^-2
PumpToBalanceChamberDefect             1.0 × 10^-5
SafetyDetectorDefect                   1.0 × 10^-7
ValveDoesNotClose                      1.0 × 10^-5
UltrafiltrationPumpDefect              1.0 × 10^-3

(b) Fault probabilities

Table 9.2.: DCCA-based evaluation of the hemodialysis machine case study

Next, assume that each of these transient faults is only relevant in exactly one of these 50 time steps. Applying the formula for $\mathrm{Pr}_{rea}(H)$ to the minimal cut sets calculated earlier gives these probabilities:

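$$\mathrm{Pr}_{rea}(\text{Collision}) = \mathrm{Pr}_1(\text{LeftOHV}) \cdot \mathrm{Pr}_1(\text{SlowTraffic}) + \ldots \approx 1.05 \times 10^{-4},$$

$$\mathrm{Pr}_{rea}(\text{False Alarm}) = \mathrm{Pr}_1(\text{LeftHV}) + \ldots \approx 2.51 \times 10^{-2}.$$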

Note that neither assumption is justified, but together they demonstrate how strongly such assumptions influence the resulting probabilities. There are several reasons that make the results at least highly inaccurate, e.g., the ordering of faults is also important and some faults are only relevant in certain steps. This clearly shows that it is difficult to apply traditional methods to systems that have complex logic and a dynamically behaving environment. Later in this chapter, an accurate probability is calculated based on probabilistic model checking of the executable model, and the accurate results are compared to the approximation using the DCCA with the rare event approximation.

9.1.2. Hemodialysis Machine

Similar to the height control case study, the approach can also be applied to the hemodialysis machine. Table 9.2a shows the result of the DCCA applied to the model of the hemodialysis machine. For instance, the rupture of the membrane in combination with the broken valve is a minimal cut set for the hazard “blood entering the vein of the patient is contaminated”, as blood is contaminated in the dialyzer and the valve is unable to prevent the contaminated blood from entering the vein of the patient. This minimal critical set is denoted as {DialyzerMembraneRupturesFault, ValveDoesNotClose}. The DCCA revealed six minimal cut sets of size one for the first hazard; thus, they form single points of failure. For the second hazard, the DCCA finds three minimal cut sets of size two and no single point of failure. Based on the DCCA results, the rare event approximation can also be applied here; table 9.2b provides the necessary probabilities:

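$$\Pr_{\mathit{rea}}(\mathit{Unsuccessful}) = \Pr_{6}(\mathit{DialyzingFluidPreparationPumpDefect}) \approx 0.066, \qquad \Pr_{\mathit{rea}}(\mathit{Contamination}) = \Pr_{6}(\mathit{SafetyBypassFault}) \cdot \Pr_{6}(\mathit{WaterHeaterDefect}) + \dots \approx 3.6 \times 10^{-4}.$$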

In contrast to the height control evaluation, these results are very close to the accurate results derived by probabilistic model checking. This is discussed in more detail in the next section.

9.2. Quantitative Evaluation using Probabilistic Model Checking

The quantitative evaluation using probabilistic model checking does not derive the probability of a hazard from the minimal critical sets; instead, it derives the probability directly from the traces using model checking. This produces more accurate estimates and is especially useful when the controllers have complex behavior that is hard to analyze with traditional approaches. For example, when a controller detects a fault in a component and decides to switch to a spare component or to enter a degraded mode, this has a direct effect on the traces.


Probabilistic model checking yields the following accurate results for the probability of each hazard occurring within 50 time steps in the model of the height control system:

$$\Pr_{M}(\mathit{Collision}) \approx 2.00 \times 10^{-7}, \qquad \Pr_{M}(\mathit{False\ Alarm}) \approx 5.45 \times 10^{-2}.$$

Compared to the accurate result, the collision probabilities of the rare event approximation calculated in the previous section are quite high: both $5 \times 10^{-3}$ (assuming that a transient fault at any time leads to a hazard) and $1.05 \times 10^{-4}$ (assuming that a transient fault at one specific time leads to a hazard) are larger than the accurate result by orders of magnitude. The reason is that the ordering of faults is also important and that some faults are only relevant in certain steps. On the other hand, the probability of a false alarm ($2.51 \times 10^{-2}$) is estimated too low when it is assumed that each fault only occurs in one time step.



It is possible to make a better approximation with traditional methods, but this requires deep knowledge of the exact order of the fault activations that lead to a hazard. Such details could be added to a formula, but this is far from trivial, because in such a formula fault probabilities cannot simply be multiplied with each other. For systems with highly complex behavior, these approaches might not be feasible. Because it looks into the traces directly, the probabilistic model checking approach is not affected by this problem.

Another source of the difference is that the probabilistic model checking approach does not adhere to the “no miracles rule” of the fault tree analysis [VU81]. Simply put, the “no miracles rule” states that if one fault would prevent a more severe situation, it is assumed that this fault does not occur. Thus, when the first light barrier is defective and its defect would prevent the light barrier from activating the height control system at all, which leads to no false alarm, the fault tree analysis assumes that the first light barrier works correctly in these cases. Model checking does not adhere to this rule, because it looks at the traces directly.

The precise model checking-based probabilistic analysis does not make the qualitative analysis with the DCCA redundant, because the minimal critical sets can help explain the results of the quantitative analysis and reinforce their validity. The probabilistic analysis is not intended to replace traditional analysis, but to serve as an additional means for engineers.
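Why fault probabilities cannot simply be multiplied when the order of activation matters can be illustrated with a small, hypothetical example that is not part of the case studies: two permanent faults A and B whose first occurrences are geometrically distributed (B.1), and a hazard that arises only if A first occurs strictly before B within n steps. The Python sketch compares the exact probability, obtained by summing over the steps of both first occurrences, with the plain product of the probabilities that each fault occurs within n steps. With equal probabilities, the exact value is less than half of the product.

```python
def first_occurrence(p: float, k: int) -> float:
    """Pr(first occurrence in step k): geometric distribution (B.1)."""
    return (1.0 - p) ** (k - 1) * p


def within(p: float, n: int) -> float:
    """Pr(at least one occurrence within n steps): equation (B.2)."""
    return 1.0 - (1.0 - p) ** n


def exact_a_before_b(p_a: float, p_b: float, n: int) -> float:
    """Exact Pr(A first occurs in step i and B first occurs in a later step j <= n)."""
    return sum(first_occurrence(p_a, i) * first_occurrence(p_b, j)
               for i in range(1, n + 1)
               for j in range(i + 1, n + 1))


p_a = p_b = 0.01   # hypothetical per-step probabilities
n = 50             # analysis horizon in steps
exact = exact_a_before_b(p_a, p_b, n)
product_estimate = within(p_a, n) * within(p_b, n)  # ignores the order of A and B
print(f"exact: {exact:.4e}, product of probabilities: {product_estimate:.4e}")
```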


Probabilistic model checking yields the following results for the model of the hemodialysis machine:

$$\Pr_{M}(\mathit{Unsuccessful}) \approx 0.053, \qquad \Pr_{M}(\mathit{Contamination}) \approx 3.5 \times 10^{-4}.$$

These probabilities resemble the results of the rare event approximation. They are slightly smaller because the result of the model checking is more precise, whereas the rare event approximation is meant to provide an upper bound. This result shows that the rare event approximation indeed delivers good estimates for systems without complex dynamic behavior.

Probabilistic model checking also allows an easy analysis of models with both transient and permanent faults. Setting WaterHeaterDefect as the only transient fault and keeping all others as permanent faults gives the following result:

$$\Pr_{M}(\mathit{Unsuccessful}) \approx 0.014, \qquad \Pr_{M}(\mathit{Contamination}) \approx 2.0 \times 10^{-4}.$$

Clearly, the probability of an unsuccessful dialysis decreases when the fault only occurs temporarily, because the hemodialysis machine can return to normal operation when the water heater works again. In addition, the probability of the second hazard is reduced because, as the qualitative analysis showed previously, a defective water heater is only a problem when the safety valve is defective.

Probabilistic model checking has even more to offer. Assume that the probability that the membrane of the dialyzer ruptures is unknown or that the failure rate is not constant (see section 3.3 on page 22). In such cases, the fault occurrence of DialyzerMembraneRupturesFault can be left unspecified. Then the fault is assumed to occur nondeterministically. Hence, a probability range can be calculated (assuming fault WaterHeaterDefect is permanent):

$$\Pr^{\mathit{range}}_{M}(\mathit{Unsuccessful}) \approx [0.04,\ 1.0], \qquad \Pr^{\mathit{range}}_{M}(\mathit{Contamination}) \approx [1.98 \times 10^{-5},\ 4.16 \times 10^{-4}].$$

When the membrane of the dialyzer ruptures in the first step, the dialysis obviously cannot be successful; the worst-case probability is therefore 1.0. In the best case, the dialysis is unsuccessful with a probability of only 0.04, which is only slightly better than the previously mentioned probability of 0.053 obtained when the probability of the fault is set. The probability of Contamination in the worst case is only about twice the probability in the best case; the safety measures seem to be effective.

9.3. Evaluation of the Impact of Single Fault Probabilities

Additionally, the probabilistic safety analysis can be used to derive a graph that shows how the probability of a certain fault affects the hazard probability. This can easily be achieved by conducting a series of model checking runs in which the fault probability is slightly changed in each run. In S#, this is done by

    var result = SafetySharpModelChecker.ConductQuantitativeParametricAnalysis(model, parameter);

where parameter contains the analysis parameters, such as the formula to check, the number of model checking runs, the fault probability to vary, and the range of that probability.
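Conceptually, such a parametric analysis is a loop of independent model checking runs over sampled values of one fault probability. The following Python sketch illustrates only this idea. It is not the implementation of the S# call above. `hazard_probability` is a hypothetical closed-form stand-in for a single model checking run, and `linear_samples` and `parametric_analysis` are illustrative names. The sampled range mirrors the light barrier study below (25 linear samples between $10^{-6}$ and $10^{-2}$).

```python
from typing import Callable, List, Tuple


def linear_samples(lower: float, upper: float, count: int) -> List[float]:
    """`count` evenly spaced values from lower to upper (both inclusive, count >= 2)."""
    step = (upper - lower) / (count - 1)
    return [lower + i * step for i in range(count)]


def parametric_analysis(check: Callable[[float], float], lower: float,
                        upper: float, count: int) -> List[Tuple[float, float]]:
    """One independent analysis run per sampled fault probability."""
    return [(p, check(p)) for p in linear_samples(lower, upper, count)]


def hazard_probability(p_fault: float, steps: int = 50) -> float:
    """Hypothetical stand-in for one model checking run: here the hazard simply
    occurs if the fault occurs at least once within `steps` steps."""
    return 1.0 - (1.0 - p_fault) ** steps


curve = parametric_analysis(hazard_probability, 1.0e-6, 1.0e-2, 25)
for p, pr in curve[:3]:
    print(f"fault probability {p:.3e} -> hazard probability {pr:.3e}")
```

Because the runs are independent, they could also be distributed over several cores; the sketch keeps them sequential for clarity.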

9.3.1. Degraded Mode

The degraded mode case study (see section 2.5.2 on page 16) disproves the common misconception that a higher probability of a fault always leads to a higher hazard probability. This may not be the case when systems can detect fault occurrences and switch to a degraded mode. The degraded mode case study serves as a representative of such systems: the system performs a self-check at system start and switches to a degraded mode if the self-check detects a problem. The higher the probability of “MeasureSignalFault”, the more likely the degraded mode is entered, which reliably prevents the hazard “position estimation deviates”. On the other hand, if the fault probability is smaller, the probability of entering the degraded mode is also smaller, and in the normal operation mode the hazard is more likely. Figure 9.1 illustrates how these effects add up: the curve is clearly not monotonic. Although this example only serves a demonstrative purpose, larger systems may also be vulnerable to similar effects that linger in some subcomponents.

Figure 9.1.: Impact of MeasureSignalFault on the hazard “position estimation deviates” in the degraded mode case study
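The non-monotonic effect can be reproduced with a deliberately tiny, hypothetical model; this is not the degraded mode case study itself. A fault occurs independently in each of n steps with probability p. If it is present at the start-up self-check (step 1), the system enters the degraded mode and the hazard can no longer occur. Otherwise, any occurrence in steps 2 to n causes the hazard. Then $\Pr(\mathit{hazard}) = (1-p)\,(1-(1-p)^{n-1})$, which is zero for both $p = 0$ and $p = 1$ and positive in between.

```python
def hazard_probability(p: float, steps: int) -> float:
    """Toy model (hypothetical): a fault present at the start-up self-check (step 1)
    switches the system to the degraded mode, which prevents the hazard; otherwise
    any fault occurrence in steps 2..steps leads to the hazard."""
    no_fault_at_self_check = 1.0 - p
    fault_later = 1.0 - (1.0 - p) ** (steps - 1)
    return no_fault_at_self_check * fault_later


samples = [i / 10 for i in range(11)]                # p = 0.0, 0.1, ..., 1.0
curve = [hazard_probability(p, steps=5) for p in samples]
peak = max(range(len(curve)), key=curve.__getitem__)
for p, pr in zip(samples, curve):
    print(f"p = {p:.1f}: Pr(hazard) = {pr:.4f}")
print("maximum at p =", samples[peak])
```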


The quality of the sensors plays the most crucial role for the safety of the height control system. Thus, the impact of the quality of the light barriers is analyzed by varying the probability of a false detection. The graphs in figure 9.2 illustrate the impact of the quality of the light barrier on the hazards Collision and False Alarm and also on the probability that the height control system prevents a collision (Prevention). The probability of a false detection of the light barriers is varied in a range between $1.0 \times 10^{-6}$ and $1.0 \times 10^{-2}$, while all other fault probabilities keep their values (see table 9.1b). 25 linearly spaced sample points were used.

The quality of the light barriers has a minor impact on the probability of a collision, as shown by figure 9.2a. When a false detection becomes more probable, a collision becomes more probable as well. Nevertheless, the leftmost and rightmost sample values only differ by $9.1 \times 10^{-9}$. The graph seems to be almost linear, with its inflection point somewhere in the middle of the graph.

The quality of the light barriers has a clear impact on the probability of a false alarm, as shown by figure 9.2b. When a false detection becomes more probable, a false alarm becomes more probable as well. This graph shows a clear curve with its inflection point somewhere in the middle of the graph. Comparing the leftmost and rightmost sample values, the probability of the hazard almost doubled. But increasing the quality of the sensors even more only has a minor effect.



Figure 9.2.: Impact of the quality of the light barrier in the height control system case study; panels: (a) effect on Collision, (b) effect on False Alarm, (c) effect on Collision Prevention

The probability of a collision and the probability of preventing a collision are obviously negatively correlated when the behavior of the drivers is the same. This is also clearly shown in the graph of figure 9.2c: the worse the light barriers, the worse the chance to prevent a collision.


Figure 9.3 shows the effect of the probability of a defective water heater on both hazards of the hemodialysis machine case study. For the analysis of each hazard, 25 sample points with probabilities in the range from $10 \times 10^{-3}$ to $10 \times 10^{-1}$ were used. These analyses confirm the results of the DCCAs in table 9.2a, where the water heater is a single point of failure for the hazard “Dialysis unsuccessful” and part of a minimal critical set of the hazard “Contamination”.

Figure 9.3.: Impact of the probability of a defective water heater in the hemodialysis machine case study

9.4. Evaluation of Design Variants using Probabilistic Systems

How reliable a safety-critical system finally is depends strongly on design choices made in the first phases of its development. There might be several design variants that all fulfill the functional requirements but have different implications for the system’s safety. In these phases, a traditional analysis would often be too time-consuming and expensive to be applied to every variant, because a separate fault tree needs to be created manually for each variant. The early rigorous analysis of design variants with S# can calculate minimal critical sets automatically; furthermore, probability estimates of the hazards can be calculated in cases in which the traditional analysis would need expensive prototypes. Thus, S#’s DCCA and its quantitative analysis using probabilistic model checking can make life easier for engineers.

Four design variants of the height control system (called Original, PreImproved, NoCounterT, and NoCounter) have been analyzed with S#. The variant Original denotes the original design as described in section 2.2 on page 9. PreImproved is a variant where two additional sensors have been added to the PreControl to improve its detection rate. Both NoCounter and NoCounterT remove the MainControl’s counter to reduce false alarms. This requires only a change of the logic of the controller and does not require adding or removing any sensors. The logic of NoCounterT is designed to be more tolerant of the sensor input than NoCounter. Prior work [OR+03] discusses the design variants in greater detail.

Table 9.3 shows the results of the safety analyses of the four design variants. The results of the DCCA are summarized in the first four rows of table 9.3b and table 9.3c. For the probabilistic safety analyses, the probability of each hazard occurring within 50 time steps is analyzed for the four design variants. Moreover, four different sets of fault probabilities, given in table 9.3a, are used for each safety analysis. The first set, StandardQuality, contains the probabilities the system design starts with. BetterLightBarrier deviates from this basis by decreasing the false detection probability of the light barriers by three orders of magnitude. The third set, BetterSensors, notably decreases the fault probabilities of all sensors. In the last set, BetterDrivers, only the fault probabilities of external factors are decreased, i.e., faults the designers of the height control cannot influence. The results of the probabilistic analyses are summarized starting with row 5 of the corresponding tables.

| Fault | StandardQuality | BetterLightBarrier | BetterSensors | BetterDrivers |
|---|---|---|---|---|
| LightBarrierFalseDetection | $5 \times 10^{-3}$ | $5 \times 10^{-6}$ | $5 \times 10^{-6}$ | $5 \times 10^{-3}$ |
| LightBarrierMisdetection | $1 \times 10^{-4}$ | $1 \times 10^{-4}$ | $1 \times 10^{-6}$ | $1 \times 10^{-4}$ |
| OverheadDetectorFalseDetection | $5 \times 10^{-3}$ | $5 \times 10^{-3}$ | $5 \times 10^{-6}$ | $5 \times 10^{-3}$ |
| OverheadDetectorMisdetection | $1 \times 10^{-4}$ | $1 \times 10^{-4}$ | $1 \times 10^{-6}$ | $1 \times 10^{-4}$ |
| LeftHV | $1 \times 10^{-2}$ | $1 \times 10^{-2}$ | $1 \times 10^{-2}$ | $1 \times 10^{-4}$ |
| LeftOHV | $1 \times 10^{-3}$ | $1 \times 10^{-3}$ | $1 \times 10^{-3}$ | $1 \times 10^{-5}$ |
| SlowTraffic | $1 \times 10^{-1}$ | $1 \times 10^{-1}$ | $1 \times 10^{-1}$ | $1 \times 10^{-3}$ |

(a) Probabilities used for the analysis of the design variants

| Variant | Original | PreImproved | NoCounterT | NoCounter |
|---|---|---|---|---|
| Faults | 13 | 17 | 13 | 13 |
| Time in sec – DCCA | 1 | 4 | 1 | 2 |
| MCS # | 5 | 6 | 4 | 4 |
| MCS ∅ | 2 | 2.3 | 2 | 2 |
| Time in min – Gen | 02:37 | 06:11 | 01:02 | 01:35 |
| Time in min – Calc | 07:13 | 09:44 | 02:40 | 02:43 |
| States | 2,002,728 | 2,002,728 | 847,308 | 847,308 |
| Comp. Transitions | 616,840,849 | 1,273,222,777 | 247,131,751 | 355,869,721 |
| Saved Transitions | 170,817,001 | 225,965,737 | 64,753,771 | 64,753,771 |
| Pr(StandardQuality) | $2.00 \times 10^{-7}$ | $1.99 \times 10^{-7}$ | $1.56 \times 10^{-7}$ | $1.50 \times 10^{-7}$ |
| Pr(BetterLightBarrier) | $1.95 \times 10^{-7}$ | $1.94 \times 10^{-7}$ | $1.58 \times 10^{-7}$ | $1.54 \times 10^{-7}$ |
| Pr(BetterSensors) | $4.32 \times 10^{-8}$ | $4.31 \times 10^{-8}$ | $1.17 \times 10^{-8}$ | $1.16 \times 10^{-8}$ |
| Pr(BetterDrivers) | $1.57 \times 10^{-9}$ | $1.57 \times 10^{-9}$ | $1.47 \times 10^{-9}$ | $1.45 \times 10^{-9}$ |

(b) Evaluation results of the hazard “Collision”

| Variant | Original | PreImproved | NoCounterT | NoCounter |
|---|---|---|---|---|
| Faults | 13 | 17 | 13 | 13 |
| Time in sec – DCCA | 4 | 8 | 13 | 3 |
| MCS # | 5 | 5 | 4 | 5 |
| MCS ∅ | 1 | 1 | 1.5 | 1 |
| Time in min – Gen | 02:44 | 06:13 | 01:04 | 01:38 |
| Time in min – Calc | 06:58 | 09:01 | 02:36 | 02:38 |
| States | 1,899,456 | 1,899,456 | 803,616 | 803,616 |
| Comp. Transitions | 635,826,673 | 1,312,411,465 | 254,738,251 | 366,823,081 |
| Saved Transitions | 164,531,497 | 217,405,417 | 62,646,091 | 62,646,091 |
| Pr(StandardQuality) | $5.45 \times 10^{-2}$ | $5.46 \times 10^{-2}$ | $3.24 \times 10^{-2}$ | $6.11 \times 10^{-2}$ |
| Pr(BetterLightBarrier) | $4.20 \times 10^{-2}$ | $4.20 \times 10^{-2}$ | $3.23 \times 10^{-2}$ | $4.23 \times 10^{-2}$ |
| Pr(BetterSensors) | $3.79 \times 10^{-3}$ | $3.79 \times 10^{-3}$ | $3.60 \times 10^{-3}$ | $3.85 \times 10^{-3}$ |
| Pr(BetterDrivers) | $5.01 \times 10^{-2}$ | $5.02 \times 10^{-2}$ | $2.93 \times 10^{-2}$ | $5.75 \times 10^{-2}$ |

(c) Evaluation results of the hazard “False Alarm”

Table 9.3.: Analysis of design variants of the height control system case study

The results in the tables show that the number of states and transitions had a major impact on the model checking time. The number of faults for the variant PreImproved increased compared to Original because of its two additional sensors. The variants NoCounterT and NoCounter have the same number of faults as Original because only the controller software was changed. The qualitative analyses using the DCCA were executed in between 1 and 13 seconds (row “Time in sec – DCCA”). The results of the DCCAs are condensed into the two metrics “number of minimal critical sets” (MCS #) and “average cardinality of the minimal critical sets” (MCS ∅). Within a variant and hazard, the different fault probability parameter sets do not change the number of states, the number of transitions, or the probabilistic model checking times (rows “Time in min – Gen” and “Time in min – Calc”), but only the resulting probability. The resulting probabilities are shown in the rows Pr(·), where the argument is the name of the respective probability set. In each case, the DCCA is faster than the probabilistic analysis by orders of magnitude.

Still, the probabilistic analyses provide a more differentiated view than the DCCA. The analyses confirm the common assumption that the quality of the sensors has a major impact on the safety of the system. Apart from that, comparing the probabilities yields several more interesting findings for the height control:

1. The better the quality of the sensors, the closer the hazard probabilities of the different variants.
2. For low-quality sensors, the selection of a better variant can almost halve the probability of a false alarm.
3. Better drivers only have a minor impact on the probability of a false alarm.
4. Even when the minimal cut sets are equal, NoCounterT and NoCounter have different hazard probabilities. NoCounter should be preferred over NoCounterT when the probability of a collision is to be reduced, but this comes at the price of a major increase in false alarms.
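The findings above can be checked directly against the probabilities in table 9.3. The following Python sketch copies some rows of tables 9.3b and 9.3c and computes the ratios that findings 1 to 4 refer to. The dictionary layout and the helper `spread` are only illustrative.

```python
# Hazard probabilities copied from tables 9.3b and 9.3c.
# Column order: Original, PreImproved, NoCounterT, NoCounter.
ORIGINAL, PRE_IMPROVED, NO_COUNTER_T, NO_COUNTER = range(4)

false_alarm = {
    "StandardQuality": [5.45e-2, 5.46e-2, 3.24e-2, 6.11e-2],
    "BetterSensors":   [3.79e-3, 3.79e-3, 3.60e-3, 3.85e-3],
    "BetterDrivers":   [5.01e-2, 5.02e-2, 2.93e-2, 5.75e-2],
}
collision = {"StandardQuality": [2.00e-7, 1.99e-7, 1.56e-7, 1.50e-7]}


def spread(values):
    """Smallest divided by largest value: 1.0 means all variants are equal."""
    return min(values) / max(values)


# Finding 1: with better sensors, the variants lie closer together.
spread_standard = spread(false_alarm["StandardQuality"])
spread_better_sensors = spread(false_alarm["BetterSensors"])

# Finding 2: NoCounterT instead of Original with standard-quality sensors.
fa = false_alarm["StandardQuality"]
ratio_variant = fa[NO_COUNTER_T] / fa[ORIGINAL]

# Finding 3: better drivers barely change the false alarm probability of Original.
ratio_drivers = false_alarm["BetterDrivers"][ORIGINAL] / fa[ORIGINAL]

# Finding 4: NoCounter has fewer collisions but more false alarms than NoCounterT.
trade_off = (collision["StandardQuality"][NO_COUNTER] < collision["StandardQuality"][NO_COUNTER_T]
             and fa[NO_COUNTER] > fa[NO_COUNTER_T])

print(f"spread: {spread_standard:.2f} -> {spread_better_sensors:.2f}")
print(f"NoCounterT/Original: {ratio_variant:.2f}, BetterDrivers/StandardQuality: {ratio_drivers:.2f}")
print("NoCounter trade-off:", trade_off)
```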

The results show that the probabilistic analysis is useful for evaluating different design variants.

9.5. Evaluation using Conditional Probabilities and Bayesian networks

Executable models also allow calculating the conditional probabilities of certain events, as shown in section 7.5. Conditional probabilities can reveal that certain faults have a higher impact on the hazard probability than others.

Let $x$ and $y$ be state formulas; then, for the purposes of this section, the joint probability distribution of the positive events is defined by $\Pr(x^+, y^+) = \Pr(X = x, Y = y) = \Pr((\mathbf{F}\,x) \land (\mathbf{F}\,y))$. Thus, $x^+, y^+$ identifies the traces in which both $x$ and $y$ are satisfied at least once.

Note that probability distributions that contain negative events, like $\Pr(x^+, y^-) =$



9.5.1. Dead Reckoning

This section shows some analyses of the Dead Reckoning case study, which was introduced in section 2.5, using the probabilities stated in section 4.3.2. Using the joint probability distribution, the probability of the hazard can be split according to whether the fault $F_F$ occurred or not:

$$\Pr(f_F^+) = 0.4, \qquad \Pr(h^+) = 0.00200837, \qquad \Pr(h^+, f_F^+) = 0.00115262, \qquad \Pr(h^+, f_F^-) = 0.00085575.$$

Note that $\Pr(h^+) = \Pr(h^+, f_F^+) + \Pr(h^+, f_F^-)$. Recall that the fault set $\Gamma = \{F_C, F_S\}$ is the only minimal critical set. The first analysis shows that even though $F_F$ is not in $\Gamma$, the fault has a major influence on the hazard. The following conditional probability distributions confirm the impression that $F_F$ has a major influence on the safety of the system:

$$\Pr(f_C^+ \mid f_F^+) = 0.029701 \quad \text{and} \quad \Pr(f_C^+ \mid f_F^-) = 0.01.$$

If fault $F_F$ occurs, the probability of fault $F_C$ almost triples.

Fritsch created a Bayesian network generation algorithm that is based on the well-known PC algorithm and uses the algorithms of this thesis for its calculation [Fri17; SGS93]. The Bayesian network in figure 9.5 was derived using Fritsch’s Bayesian network generation algorithm. The network confirms the impression that $F_F$ has a major influence on the safety of the system.

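Nodes $F_F$, $F_S$, $F_C$, $\Gamma$, and $H$ (edges $F_F \to F_C$, $F_S \to \Gamma$, $F_C \to \Gamma$, $F_F \to H$, $\Gamma \to H$) with the conditional probability tables:

• $\Pr(f_F^+) = 0.4$
• $\Pr(f_S^+) = 0.4013$
• $\Pr(f_C^+ \mid f_F^+) = 0.029701$, $\Pr(f_C^+ \mid f_F^-) = 0.01$
• $\Pr(\gamma^+ \mid f_S^+, f_C^+) = 1.0$, $\Pr(\gamma^+ \mid f_S^+, f_C^-) = 0.0$, $\Pr(\gamma^+ \mid f_S^-, f_C^+) = 0.0$, $\Pr(\gamma^+ \mid f_S^-, f_C^-) = 0.0$
• $\Pr(h^+ \mid f_F^+, \gamma^+) = 0.24178$, $\Pr(h^+ \mid f_F^+, \gamma^-) = 0.0$, $\Pr(h^+ \mid f_F^-, \gamma^+) = 0.35544$, $\Pr(h^+ \mid f_F^-, \gamma^-) = 0.0$

Figure 9.5.: Bayesian network of the Dead Reckoning case study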
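The tables of figure 9.5 factorize the joint distribution as $\Pr(f_F)\,\Pr(f_S)\,\Pr(f_C \mid f_F)\,\Pr(\gamma \mid f_S, f_C)\,\Pr(h \mid f_F, \gamma)$. Summing out the remaining variables reproduces the hazard probabilities reported in section 9.5.1, up to the rounding of the table entries. The following Python sketch does this by brute-force enumeration, which is feasible for five Boolean nodes; larger networks would need variable elimination or a similar technique. The ratio of the conditional hazard probabilities, about 2, is derived here and is not stated in the original analysis.

```python
from itertools import product

# Conditional probability tables of figure 9.5 (probability of the positive event).
P_FF = 0.4                                   # Pr(fF+)
P_FS = 0.4013                                # Pr(fS+)
P_FC = {True: 0.029701, False: 0.01}         # Pr(fC+ | fF)
P_GAMMA = {(True, True): 1.0, (True, False): 0.0,
           (False, True): 0.0, (False, False): 0.0}   # Pr(γ+ | fS, fC)
P_H = {(True, True): 0.24178, (True, False): 0.0,
       (False, True): 0.35544, (False, False): 0.0}   # Pr(h+ | fF, γ)


def bern(p_true: float, value: bool) -> float:
    """Probability of a Boolean outcome, given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(ff: bool, fs: bool, fc: bool, gamma: bool, h: bool) -> float:
    """Joint probability from the network factorization."""
    return (bern(P_FF, ff) * bern(P_FS, fs) * bern(P_FC[ff], fc)
            * bern(P_GAMMA[(fs, fc)], gamma) * bern(P_H[(ff, gamma)], h))


def pr(event) -> float:
    """Marginalize: sum the joint over all assignments that satisfy `event`."""
    return sum(joint(*a) for a in product([True, False], repeat=5) if event(*a))


pr_h = pr(lambda ff, fs, fc, gamma, h: h)
pr_h_ff = pr(lambda ff, fs, fc, gamma, h: h and ff)
pr_h_not_ff = pr(lambda ff, fs, fc, gamma, h: h and not ff)
ratio = (pr_h_ff / P_FF) / (pr_h_not_ff / (1.0 - P_FF))   # Pr(h+ | fF+) / Pr(h+ | fF-)
print(f"Pr(h+) = {pr_h:.8f}, Pr(h+, fF+) = {pr_h_ff:.8f}, Pr(h+, fF-) = {pr_h_not_ff:.8f}")
print(f"Pr(h+ | fF+) / Pr(h+ | fF-) = {ratio:.2f}")
```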


9.5.2. Radio-based Railroad Crossing

Figure 9.6.: Bayesian network of the Radio-based Railroad Crossing case study

In his thesis, Fritsch applied his Bayesian network generation algorithm to the Radio-based Railroad Crossing case study [Fri17]. His results are briefly presented in this subsection. Note that all faults in this case study are modeled as per-demand faults. The graph of the network is shown in figure 9.6, and the detailed distributions are provided in appendix C. The graph itself already delivers some interesting findings:

• If the BarrierMotorFault occurs, the BarrierSensorFailure becomes more likely. The reason is that the barrier sensor is queried more often, which makes the per-demand fault more likely to occur.
• The occurrence probabilities of the faults BarrierMotorStuck and OdometerPositionOffset correlate even though they are in different, isolated system parts. Hence, dependencies that have an impact on the fault occurrence probabilities exist across system borders.
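The first finding follows from the per-demand fault model. If each query of the barrier sensor fails independently with some probability $p$, then the probability of at least one failure in $k$ queries is $1 - (1-p)^k$ (equation (B.2)), which grows with $k$. The following Python sketch uses a hypothetical $p$, not the value used in the case study:

```python
def pr_at_least_one_failure(p_per_demand: float, demands: int) -> float:
    """Probability that a per-demand fault occurs at least once in `demands`
    independent demands (equation (B.2))."""
    return 1.0 - (1.0 - p_per_demand) ** demands


p = 1e-3  # hypothetical per-demand failure probability of the barrier sensor
for demands in (1, 10, 100):
    print(demands, f"{pr_at_least_one_failure(p, demands):.4e}")
```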

Fritsch provides several other interesting findings in his thesis.

This section showed that the static view provided by Bayesian networks is a valuable means to get a better understanding of a safety-critical system. Such networks can be generated from executable models using the approach introduced in this thesis.


10. Conclusions and Outlook

The safety of a software-intensive system is often hard to quantify. Models that contain an abstract view of the behavior of the system and the faults involved can be used for the quantification. This thesis turned the spotlight on executable modeling languages that can be used to create such models: it demonstrated what such modeling languages should look like, how executable models in such languages can be analyzed probabilistically based on model checking, and how the impact of faults on hazards can be analyzed using executable models.

10.1. Results

A central finding of this thesis is that modeling languages do not have to be invented from scratch. For this reason, the executable modeling language S#, which was created in the context of this thesis, was developed as an embedded Domain Specific Language (DSL) of the established programming language C#. S# was specially tailored to the analysis of safety-critical systems and supports modeling both probabilistic and nondeterministic behavior. Engineers experienced in writing object-oriented software do not need long training to be able to apply S# to their assignments, because modeling with S# is not very different from programming with C#. Due to its expressive modeling language, S# facilitates the modeling of complex systems. In particular, its object-oriented nature makes S# applicable to the modeling of larger systems. A model of a hemodialysis machine was used to demonstrate the modeling and analysis capabilities of the S# modeling framework.

Furthermore, this thesis demonstrated how executable models can be model checked probabilistically by explicitly executing the models instead of using model transformations, as competing approaches do. This approach allows the removal of temporary variables from the state space and, as a consequence, enables the probabilistic model checking of models with a large number of such variables. The introduced algorithms and data structures that enable probabilistic analysis are not limited to S# and can be added with little effort to other executable languages. This is demonstrated by integrating the approach into the executable programming language Lustre. Formal proofs underline the correctness of the basic model checking algorithms.

In the end, this thesis shows by example how practitioners can benefit from safety analysis using executable models: among others, different design alternatives can be assessed quantitatively, the effect of fault probabilities on the hazard probability can be measured, and Bayesian networks that show the interdependencies between different faults can be generated. Such results can support experts in deciding how to improve the safety of a system.

10.2. Outlook

Even though this thesis made it possible to analyze larger systems probabilistically, there are still open research questions in this field. The following paragraphs provide some ideas for further research.

The combination of two principles made it possible to analyze larger systems: the first principle was to differentiate between micro and macro steps, and the second principle was to record only a part of the variables in the states to get rid of many temporary variables, which consequently reduces the state space. How this is done depends strongly on the modeling language. In S#, local variables of methods are one example of such temporary variables, but so are variables that are written before they are read in each macro step. Right now, these variables must be identified manually. To automate this, one idea is to use compiler construction techniques to identify such variables automatically in the byte code of S# models. Furthermore, the byte code could also be optimized to enable a faster model traversal.

Currently, there is no explicit notion of time in S#. Another research idea is to add a modeling artifact to S# that explicitly expresses time. With such an artifact, motion equations, timer events, and failure rates could be expressed more naturally. It would not be necessary to express time-dependent behavior implicitly in macro step transitions. The framework could convert terms in the model that contain time into sampled macro steps. In a second step, the basic formalisms could be enhanced to support clock regions and thus enable native support of time.

As a third research idea, a modeling artifact for rewards could be added. With rewards, energy consumption or financial aspects (e.g., cost and payoff) could be modeled. Based on this, the expected reward could be calculated. There is much preliminary work on how rewards can be expressed at a lower level in probabilistic systems [KNP11; KZ+11].

A field that is closely related to model-based analysis is model-based diagnosis. In model-based diagnosis, the task is to find the reasons why a failure occurred [Nyb99]. The last idea is to investigate to what extent Bayesian networks generated from executable models could be used for diagnosis.


A. Architecture of S#

Most algorithms for the safety analysis of S# models are language agnostic. Therefore, the S# toolset was refactored into a C#-specific part and a language-agnostic part called ISSE Safety Checking, which accounts for most of the code (see Table A.1). This chapter summarizes the architecture. Other programming languages can be extended to executable modeling languages using the ISSE Safety Checking part. The reference implementation contains such an extension of the synchronous programming language Lustre (see section 7.6).

The architecture was designed so that orthogonal features can be combined; the axes are
• simulation and model checking,
• nondeterministic choice, probabilistic choice, and a combination of both,
• different optimizations, and
• single-core and multi-core state traversal.

Figure A.1 contains the language-agnostic part. The parallelized model traverser (algorithm 7.1 on page 108) is implemented in the classes Worker, ModelTraverser, and LoadBalancer. The optimizations are all implemented as independent classes that inherit from either ITransitionModifier or IBatchedTransitionAction. Their methods are called by the model traverser. The abstract class ExecutableModel is the class that an executable modeling language must implement to be supported by the reference implementation. Different implementations of the abstract class ExecutedModel exist in the reference implementation: one for labeled Markov chain-based model checking, one for labeled choice-aware Markov decision process-based model checking, and one for the DCCA and fault-aware model checking.

Figure A.2 shows the S#-dependent part. The class RuntimeModel implements the abstract class ExecutableModel to make S# a modeling language that can be analyzed with the infrastructure provided by ISSE Safety Checking.

Figure A.3 shows the classes of the labeled Markov chain-based model checking. The class LmcModelChecker contains the data structures and algorithms of chapter 6. The algorithms introduced in chapter 7 are distributed among the other classes.

Table A.1 contains the source lines of code broken down into separate features. The names have been slightly changed to better fit the terminology of this thesis. The source code is available at [ISSE18].
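The division of labor between the model traverser and the pluggable optimizations can be illustrated with a small Python analogue of the pattern: a breadth-first traverser that passes the transitions of each state through every registered transition modifier. All names in the sketch (`TransitionModifier`, `DropZeroProbability`, `traverse`, `toy_successors`) are hypothetical and only echo the roles described above; they are not the C# interfaces of the reference implementation. The parallelization via Worker and LoadBalancer is omitted.

```python
from collections import deque
from typing import Callable, Dict, List, Protocol, Tuple

Transition = Tuple[int, float]  # (target state, probability)


class TransitionModifier(Protocol):
    """Hypothetical analogue of a pluggable optimization."""
    def modify(self, source: int, transitions: List[Transition]) -> List[Transition]: ...


class DropZeroProbability:
    """Example optimization: remove transitions with probability 0."""
    def modify(self, source: int, transitions: List[Transition]) -> List[Transition]:
        return [(t, p) for t, p in transitions if p > 0.0]


def traverse(initial: int, successors: Callable[[int], List[Transition]],
             modifiers: List[TransitionModifier]) -> Dict[int, List[Transition]]:
    """Breadth-first state traversal; every modifier sees the transitions of each state."""
    graph: Dict[int, List[Transition]] = {}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if state in graph:
            continue
        transitions = successors(state)
        for modifier in modifiers:
            transitions = modifier.modify(state, transitions)
        graph[state] = transitions
        queue.extend(t for t, _ in transitions if t not in graph)
    return graph


def toy_successors(s: int) -> List[Transition]:
    """Toy model: a counter modulo 3 that advances with probability 0.9 and
    stays otherwise; the zero-probability reset to 0 is pruned by the modifier."""
    return [((s + 1) % 3, 0.9), (s, 0.1), (0, 0.0)]


graph = traverse(0, toy_successors, [DropZeroProbability()])
print(graph)
```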


Figure A.1.: Relevant classes of the language agnostic safety checking part

Figure A.2.: Relevant classes of the S# dependent part


Figure A.3.: Relevant classes of the labeled Markov chain part

| Part | Files | Source Lines of Code |
|---|---|---|
| ISSE Safety Checking | 201 | 19931 |
| ↳ Common Architecture | 119 | 9952 |
| ↳ Fault Aware Kripke Structures | 24 | 1646 |
| ↳ Markov Chains | 29 | 3682 |
| ↳ Markov Processes | 29 | 4651 |
| Safety Sharp | 116 | 8376 |
| ↳ Compiler | 52 | 4064 |
| ↳ Runtime | 37 | 3038 |
| ↳ Modeling Artifacts | 27 | 1274 |
| Lustre Runtime | 20 | 2407 |
| Bayesian Analysis | 18 | 2083 |
| Tests | 698 | 37818 |
| Total | 1053 | 70615 |

Table A.1.: Source lines of code of the reference implementation


B. Mathematical Preliminaries

B.1. Probability Theory

A probabilistic experiment is a procedure that can be repeated infinitely often and has a well-defined set of possible outcomes. Probability spaces model such experiments. Generally, a probability space is given by $(\Omega, E, \Pr)$. A sample space $\Omega$ is the set of all possible outcomes. For example, when tossing a die with 6 faces, each face is a possible outcome, and the sample space can be defined as $\Omega = \{1, 2, 3, 4, 5, 6\}$. Given a sample space $\Omega$, the set of events $E \subseteq 2^{\Omega}$ can be defined, where each event of $E$ is a set of zero or more outcomes. To be able to define a sound measure on it, $E$ needs to be a $\sigma$-algebra on $\Omega$. The pair $(\Omega, E)$ is called a measurable space. A probability measure $\Pr$ assigns to each event of $E$ a probability $p \in [0, 1]$. Furthermore, $\Pr$ must satisfy the countable additivity property and $\Pr(\Omega) = 1$. Discrete probability spaces are special cases of probability spaces: the sample space $\Omega$ of a discrete probability space is countable, the events are given by all subsets of $\Omega$, i.e., $E = 2^{\Omega}$, and the probability measure $\Pr$ can be derived from a probability mass function $\mathit{pmf} : \Omega \to [0, 1]$ with $\sum_{\omega \in \Omega} \mathit{pmf}(\omega) = 1$. Given such a $\mathit{pmf}$,

$$\Pr(A) = \sum_{\omega \in A} \mathit{pmf}(\omega) \quad \text{for all } A \subseteq \Omega.$$

A discrete probability space is denoted by $(\Omega, \Pr)$.

A more detailed introduction to probability spaces and probability measures can be found in standard introductions to probability theory, e.g., [Fel66].
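The die example can be written down directly as a discrete probability space: a pmf on $\Omega = \{1, \dots, 6\}$ and $\Pr(A)$ as the sum of the pmf over $A$. The following Python sketch assumes a fair die and uses exact fractions:

```python
from fractions import Fraction

omega = range(1, 7)                          # sample space of a six-sided die
pmf = {w: Fraction(1, 6) for w in omega}     # probability mass function (fair die assumed)
assert sum(pmf.values()) == 1                # the pmf sums to 1 over the sample space


def pr(event) -> Fraction:
    """Pr(A) = sum of pmf(w) over all outcomes w in A."""
    return sum((pmf[w] for w in event), Fraction(0))


even = {2, 4, 6}
print(pr(even))   # 1/2
```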

B.1.1. Conditional Probability

An exhaustive introduction to conditional probabilities is provided by [Nea04].

Let $E$ and $F$ be events such that $\Pr(F) \neq 0$. Then the conditional probability of $E$ given $F$ is given by

$$\Pr(E \mid F) = \frac{\Pr(E \cap F)}{\Pr(F)}.$$

Given a probability space $(\Omega, \Pr)$, a random variable $X$ is a function on $\Omega$; let $X = x$ represent the event $\{e \in \Omega \mid X(e) = x\}$. Then $\Pr(X = x)$ is called the probability distribution of the random variable $X$.
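Continuing the fair die example, conditioning on an event $F$ restricts the outcomes to $F$ and renormalizes their probabilities. The following Python sketch uses $E = \{6\}$ and $F = \{2, 4, 6\}$. It also shows a Boolean random variable, `is_even`, as a plain function on $\Omega$. The fair die and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

pmf = {w: Fraction(1, 6) for w in range(1, 7)}   # fair die (assumption)


def pr(event) -> Fraction:
    """Pr(A) as the sum of the pmf over A."""
    return sum((pmf[w] for w in event), Fraction(0))


def pr_given(e, f) -> Fraction:
    """Pr(E | F) = Pr(E ∩ F) / Pr(F); undefined if Pr(F) = 0."""
    pf = pr(f)
    if pf == 0:
        raise ValueError("conditioning event has probability 0")
    return pr(set(e) & set(f)) / pf


def is_even(w: int) -> bool:
    """A Boolean random variable X on the sample space."""
    return w % 2 == 0


print(pr_given({6}, {2, 4, 6}))              # 1/3
print(pr({w for w in pmf if is_even(w)}))    # Pr(X = true) = 1/2
```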



Figure B.1.: Venn Diagram (events $x_1^+$, $x_2^+$, $x_3^+$)

Given two random variables $X$ and $Y$ that are defined on a probability space, let $\Pr(X = x, Y = y)$ be the joint probability distribution of $X$ and $Y$.

$X$ and $Y$ are independent (denoted by $X \perp\!\!\!\perp Y$) if $\Pr(X = x, Y = y) = \Pr(X = x) \cdot \Pr(Y = y)$ for all $x$ and $y$. Given the random variables $X$, $Y$, and $Z$, $X$ and $Y$ are said to be conditionally independent given the set $Z$ (denoted by $X \perp\!\!\!\perp Y \mid Z = z$) if $\Pr(X = x, Y = y \mid Z = z) = \Pr(X = x \mid Z = z) \cdot \Pr(Y = y \mid Z = z)$ for all $x$, $y$, and $z$. Dependence, the opposite of independence, is denoted by the symbol $\not\perp\!\!\!\perp$, e.g., $X \not\perp\!\!\!\perp Y$.

Let $X_1, \dots, X_n$ be random variables; then let $x_1, \dots, x_n$ be a shorthand for $X_1 = x_1, \dots, X_n = x_n$. Let $X$ be a random variable with the Boolean domain; then let $x^+$ denote the (positive) event $X = \mathit{true}$, and $x^-$ denote the (negative) event $X = \mathit{false}$.

Let $X_1$, $X_2$, and $X_3$ be random variables with Boolean domains. Then $\Pr(x_1^+, x_2^+, x_3^-)$ can be calculated using only the probabilities of positive events (cf. figure B.1):

$$\Pr(x_1^+, x_2^+, x_3^-) = \Pr(x_1^+, x_2^+) - \Pr(x_1^+, x_2^+, x_3^+).$$

In general, if $0 < m \le n$, every value of $\Pr(x_1^+, \dots, x_m^+, x_{m+1}^-, \dots, x_n^-)$ can be derived from probabilities of positive events only:

$$\Pr(x_1^+, \dots, x_m^+, x_{m+1}^-, \dots, x_n^-) = \Pr(x_1^+, \dots, x_m^+) - \sum_{k=1}^{n-m} (-1)^{k+1} \sum_{\substack{J \subseteq \{1, \dots, n-m\} \\ |J| = k}} \Pr\Big(\bigcap_{j \in J} \big(x_1^+ \cap \dots \cap x_m^+ \cap x_{m+j}^+\big)\Big).$$

Fritsch provides a detailed explanation of the formula based on the inclusion–exclusion principle in [Fri17].
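The formula can be checked numerically. The following Python sketch draws a random joint distribution over $n$ Boolean variables. It then computes the probability of $x_1^+, \dots, x_m^+, x_{m+1}^-, \dots, x_n^-$ twice: once by direct summation, and once using only probabilities of positive events via the inclusion–exclusion sum above. It is an illustration written for this text, not Fritsch's implementation; variables are 0-indexed in the code.

```python
import itertools
import random


def random_joint(n: int, seed: int = 1):
    """Random probability distribution over all assignments of n Boolean variables."""
    rng = random.Random(seed)
    weights = {a: rng.random() for a in itertools.product([True, False], repeat=n)}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def pr_positive(joint, indices) -> float:
    """Probability of the conjunction of the positive events x_i^+ for i in indices."""
    return sum(p for a, p in joint.items() if all(a[i] for i in indices))


def pr_mixed_brute_force(joint, m: int, n: int) -> float:
    """Pr(x_1^+, ..., x_m^+, x_{m+1}^-, ..., x_n^-) by direct summation."""
    return sum(p for a, p in joint.items() if all(a[:m]) and not any(a[m:n]))


def pr_mixed_inclusion_exclusion(joint, m: int, n: int) -> float:
    """The same probability, using only probabilities of positive events."""
    positives = list(range(m))
    result = pr_positive(joint, positives)
    for k in range(1, n - m + 1):
        for subset in itertools.combinations(range(m, n), k):
            result -= (-1) ** (k + 1) * pr_positive(joint, positives + list(subset))
    return result


joint = random_joint(4)
print(pr_mixed_brute_force(joint, 2, 4), pr_mixed_inclusion_exclusion(joint, 2, 4))
```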

B.1.2. Geometric Distribution

The geometric distribution is the probability distribution of the number of Bernoulli trials needed to “repeat until the first success” [DBÇ15]. In Bernoulli trials, each trial is independent of the previous trials. Let $p$ be the probability of success, and let the integer $k$ be the number of the trial with the first success. For $k \in \mathbb{Z}^+$:

$$\Pr(X = k) = (1 - p)^{k-1} \cdot p \tag{B.1}$$

The probability of at least one success within $k$ trials is

$$\Pr(X \le k) = 1 - (1-p)^k \tag{B.2}$$

The expected number of trials until the first success is

$$E(X) = 1/p \tag{B.3}$$
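A minimal Python sketch of (B.1) to (B.3), assuming an illustrative success probability $p = 1/4$: it confirms with exact `Fraction` arithmetic that (B.2) is the partial sum of (B.1), and approximates the series $\sum_k k \cdot \Pr(X = k)$ to recover (B.3). The function names are hypothetical.

```python
from fractions import Fraction


def geom_pmf(k, p):
    """(B.1): probability that the first success occurs in trial k (k >= 1)."""
    return (1 - p) ** (k - 1) * p


def geom_cdf(k, p):
    """(B.2): probability of at least one success within k trials."""
    return 1 - (1 - p) ** k


p = Fraction(1, 4)  # illustrative success probability

# (B.2) is the partial sum of (B.1); exact thanks to Fraction arithmetic.
for k in range(1, 20):
    assert sum(geom_pmf(i, p) for i in range(1, k + 1)) == geom_cdf(k, p)

# (B.3): the truncated series sum_k k * Pr(X = k) approaches 1/p = 4;
# the omitted tail is negligible for 2000 terms.
approx_mean = sum(k * geom_pmf(k, 0.25) for k in range(1, 2000))
print(approx_mean)  # close to 4
```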

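The geometric distribution is memoryless, i.e., given that the first success has not yet occurred, the probability of at least one success within the next $x$ trials does not depend on the number $t$ of previous unsuccessful trials. Using $\Pr(X > k) = (1-p)^k$ (the first $k$ trials all fail) and the fact that $X > t + x$ implies $X > t$, for all $t, x \in \mathbb{N}$ and $p < 1$:

$$\Pr(X > t + x \mid X > t) = \frac{\Pr(X > t + x \text{ and } X > t)}{\Pr(X > t)} = \frac{\Pr(X > t + x)}{\Pr(X > t)} = \frac{(1-p)^{t+x}}{(1-p)^t} = (1-p)^x = \Pr(X > x).$$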
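The derivation can also be replayed numerically. The sketch below derives $\Pr(X > k)$ from (B.1) alone, rather than from the closed form, and checks memorylessness exactly for small $t$ and $x$; $p = 1/3$ and the helper names are illustrative assumptions.

```python
from fractions import Fraction


def geom_pmf(k, p):
    """(B.1): Pr(X = k) = (1 - p)^(k - 1) * p."""
    return (1 - p) ** (k - 1) * p


def survival(k, p):
    """Pr(X > k), derived from (B.1) as 1 - sum_{i=1..k} Pr(X = i)."""
    return 1 - sum((geom_pmf(i, p) for i in range(1, k + 1)), Fraction(0))


def conditional_survival(t, x, p):
    """Pr(X > t + x | X > t) = Pr(X > t + x) / Pr(X > t), since X > t + x implies X > t."""
    return survival(t + x, p) / survival(t, p)


p = Fraction(1, 3)  # illustrative success probability
for t in range(8):
    for x in range(8):
        # Memorylessness: previous failures do not change the outlook.
        assert conditional_survival(t, x, p) == survival(x, p) == (1 - p) ** x
```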

B.1.3. Exponential Distribution

The exponential distribution is the continuous analogue of the geometric distribution, i.e., events occur continuously and independently at a constant average rate. It is especially useful for modeling the time until an event occurs. Given a rate $\lambda > 0$, the probability that the first success occurs no later than time $t$ is

$$\Pr(X \le t) = \int_0^t \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda t} \tag{B.4}$$

The expected time until the first success is

$$E(X) = 1/\lambda \tag{B.5}$$

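Since $\Pr(X > t) = 1 - \Pr(X \le t) = e^{-\lambda t}$ by (B.4), the exponential distribution is memoryless:

$$\Pr(X > t + x \mid X > t) = \frac{\Pr(X > t + x \text{ and } X > t)}{\Pr(X > t)} = \frac{\Pr(X > t + x)}{\Pr(X > t)} = \frac{e^{-\lambda(t+x)}}{e^{-\lambda t}} = e^{-\lambda x} = \Pr(X > x).$$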
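The following Python sketch cross-checks (B.4), (B.5), and memorylessness numerically for an illustrative rate $\lambda = 0.5$. The integral in (B.4) is approximated with a composite trapezoidal rule, and (B.5) is obtained via the standard identity $E(X) = \int_0^\infty \Pr(X > x)\,dx$ for non-negative random variables, truncated at a point where the tail is negligible. All names are hypothetical helpers, not part of any tool described in this thesis.

```python
import math


def exp_cdf(t, lam):
    """(B.4): Pr(X <= t) = 1 - e^(-lam * t)."""
    return 1.0 - math.exp(-lam * t)


def exp_survival(t, lam):
    """Pr(X > t) = e^(-lam * t)."""
    return math.exp(-lam * t)


def trapezoid(f, a, b, steps=100_000):
    """Composite trapezoidal rule; used only as a numerical cross-check."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


lam = 0.5    # illustrative rate
t_end = 3.0  # illustrative time bound

# (B.4): integrating the density lam * e^(-lam * x) from 0 to t_end.
numeric_cdf = trapezoid(lambda x: lam * math.exp(-lam * x), 0.0, t_end)

# (B.5): E(X) as the integral of Pr(X > x), truncated at 80 (tail ~ e^-40).
numeric_mean = trapezoid(lambda x: exp_survival(x, lam), 0.0, 80.0)

# Memorylessness: Pr(X > t + x | X > t) = Pr(X > x).
t, x = 2.0, 1.5
conditional = exp_survival(t + x, lam) / exp_survival(t, lam)
```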

Thus, both the geometric distribution and the exponential distribution are memoryless.



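Figure B.2.: Simple Bayesian network. Nodes: Fault $F_1$, Fault $F_2$, Minimal Cut Set $\Gamma$, and Hazard $H$, with edges $F_1 \to \Gamma$, $F_2 \to \Gamma$, and $\Gamma \to H$. Conditional probabilities annotated at the nodes:
$\Pr(f_1^+) = 0.02$, $\Pr(f_2^+) = 0.01$;
$\Pr(\gamma^+ \mid f_1^+, f_2^+) = 1$, $\Pr(\gamma^+ \mid f_1^-, f_2^+) = 0$, $\Pr(\gamma^+ \mid f_1^+, f_2^-) = 0$, $\Pr(\gamma^+ \mid f_1^-, f_2^-) = 0$;
$\Pr(h^+ \mid \gamma^+) = 0.8$, $\Pr(h^+ \mid \gamma^-) = 0.0$.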

B.1.4. Product Experiment

For $1 \le i \le n$, let $(\Omega_i, \Pr_i)$ be discrete probability spaces. Then the discrete probability space $(\Omega, \Pr)$ of the product experiment is given by

• the sample space $\Omega = \Omega_1 \times \dots \times \Omega_n$, and
• the probability measure $\Pr : \Omega \to [0, 1]$ with

$$\Pr(a_1, \dots, a_n) = \prod_{i=1}^{n} \Pr_i(a_i), \quad (a_1, \dots, a_n) \in \Omega = \Omega_1 \times \dots \times \Omega_n.$$

More details and proofs for the product experiment are provided by [Kre05].
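A minimal Python sketch of the product experiment, representing each discrete probability space as a dictionary from outcomes to probabilities (an illustrative encoding, not the implementation used in this thesis). The example assumes two independently occurring faults with the probabilities $\Pr(f_1^+) = 0.02$ and $\Pr(f_2^+) = 0.01$ from figure B.2; `product_space` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product
from math import prod


def product_space(*spaces):
    """Product experiment of discrete probability spaces.

    Each space is a dict mapping outcomes to probabilities; the result maps
    tuples (a1, ..., an) to Pr_1(a1) * ... * Pr_n(an).
    """
    return {
        outcome: prod(space[a] for space, a in zip(spaces, outcome))
        for outcome in product(*spaces)
    }


# Two independent fault occurrences (probabilities as in figure B.2).
fault1 = {True: Fraction(2, 100), False: Fraction(98, 100)}
fault2 = {True: Fraction(1, 100), False: Fraction(99, 100)}
omega = product_space(fault1, fault2)
```

Each of the four outcomes receives the product of its component probabilities, e.g., $\Pr(f_1^+, f_2^+) = 0.02 \cdot 0.01 = 1/5000$, and all probabilities sum to 1.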

B.1.5. Bayesian Networks

A Bayesian network is a model that represents a set of random variables and their conditional dependencies using a directed acyclic graph. Figure B.2 shows a simple Bayesian network with the random variables $F_1$, $F_2$, $\Gamma$, and $H$. The conditional probabilities of each random variable are written next to its node in the graph. The probabilities of the negative events are omitted because they can easily be derived from those of the positive events, e.g., $\Pr(f_1^-) = 1 - \Pr(f_1^+) = 0.98$ and $\Pr(h^- \mid \gamma^+) = 1 - \Pr(h^+ \mid \gamma^+) = 0.2$.

Consider a Bayesian network with the random variables $X_1, X_2, \dots, X_n$. The joint probability of an event $x_1, x_2, \dots, x_n$ can be calculated from the conditional probabilities of the network. Let $pa(x)$ denote the matching events of the parents of node $x$, where a parent is a direct predecessor in the graph; semi-formally, $pa(X = x) = \{X_i = x_i \mid X_i \text{ is a parent of } X\}$. If $\Pr(pa(x_i)) \neq 0$ for all $1 \le i \le n$, then

$$\Pr(x_1, x_2, \dots, x_n) = \Pr(x_1 \mid pa(x_1)) \cdot \Pr(x_2 \mid pa(x_2)) \cdots \Pr(x_n \mid pa(x_n)),$$

where $\Pr(x \mid \emptyset) = \Pr(x)$. Thus, various probabilities can be inferred in the example, e.g.,

$$\Pr(h^+, \gamma^+, f_1^+, f_2^+) = \Pr(h^+ \mid \gamma^+) \cdot \Pr(\gamma^+ \mid f_1^+, f_2^+) \cdot \Pr(f_1^+) \cdot \Pr(f_2^+) = 0.8 \cdot 1 \cdot 0.02 \cdot 0.01 = 0.00016.$$

Using Bayes’ theorem and marginalization (not described here), it is also possible to infer probabilities like $\Pr(h^-, \gamma^+)$, $\Pr(h^+)$, $\Pr(h^+ \mid f_1^+)$, and $\Pr(f_1^+ \mid h^+)$. A comprehensive introduction to Bayesian networks is provided by [Nea04].
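To make these inferences concrete, the following Python sketch encodes the conditional probabilities of figure B.2, computes joint probabilities with the product of $\Pr(x_i \mid pa(x_i))$, and obtains marginal and conditional probabilities by brute-force summation over all $2^4$ assignments. This enumeration is only a sketch that is feasible for tiny networks; it is not an efficient inference algorithm, and all names are hypothetical.

```python
from itertools import product

# Conditional probabilities of figure B.2 (probabilities of positive events).
PR_F1 = 0.02
PR_F2 = 0.01
PR_GAMMA = {(True, True): 1.0, (False, True): 0.0,
            (True, False): 0.0, (False, False): 0.0}  # key: (f1, f2)
PR_H = {True: 0.8, False: 0.0}  # key: gamma


def bern(p_true, value):
    """Probability of a Boolean outcome given Pr(true) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(h, gamma, f1, f2):
    """Joint probability as the product of Pr(x_i | pa(x_i))."""
    return (bern(PR_H[gamma], h) * bern(PR_GAMMA[(f1, f2)], gamma)
            * bern(PR_F1, f1) * bern(PR_F2, f2))


def pr(**fixed):
    """Marginal probability of a partial assignment, summing out all other variables."""
    names = ("h", "gamma", "f1", "f2")
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(**assignment)
    return total


p_h = pr(h=True)                                    # Pr(h+)
p_h_given_f1 = pr(h=True, f1=True) / pr(f1=True)    # Pr(h+ | f1+)
p_f1_given_h = pr(h=True, f1=True) / pr(h=True)     # Pr(f1+ | h+)
p_hneg_gamma = pr(h=False, gamma=True)              # Pr(h-, gamma+)
```

Up to floating-point rounding, this yields $\Pr(h^+) = 0.00016$, $\Pr(h^+ \mid f_1^+) = 0.008$, $\Pr(f_1^+ \mid h^+) = 1$, and $\Pr(h^-, \gamma^+) = 0.00004$. The value $\Pr(f_1^+ \mid h^+) = 1$ reflects that, in this network, the hazard can only occur if both faults are present.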

B.2. Order Theory

This section recalls some basics of order theory. A beginner-friendly introduction is provided by Davey and Priestley in [DP02]. A brief summary with a focus on probability theory is provided by Baier in [Bai98]. As Davey and Priestley observe, there is currently no consistent terminology across different authors; this also applies to [DP02] and [Bai98]. This section keeps the definitions simple, tailors them to the application, and sticks closely to the definitions of [DP02].

B.2.1. Partial Order

A partial order on a set $P$ is a binary relation $\le$ on $P$ that is reflexive, antisymmetric, and transitive; more formally, for all $x, y, z \in P$,

• $x \le x$,
• $x \le y$ and $y \le x$ implies $x = y$, and
• $x \le y$ and $y \le z$ implies $x \le z$.

A set equipped with a partial order $\le$ is called a partially ordered set.
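For finite sets, the three axioms can be checked by brute force. The following Python sketch (`is_partial_order` and `divides` are hypothetical helper names) confirms that divisibility on the divisors of 12 is a partial order, while the strict order $<$ fails reflexivity and “same parity” fails antisymmetry.

```python
from itertools import product


def is_partial_order(elements, leq):
    """Check reflexivity, antisymmetry, and transitivity of `leq` on a finite set."""
    reflexive = all(leq(x, x) for x in elements)
    antisymmetric = all(x == y or not (leq(x, y) and leq(y, x))
                        for x, y in product(elements, repeat=2))
    transitive = all(leq(x, z) or not (leq(x, y) and leq(y, z))
                     for x, y, z in product(elements, repeat=3))
    return reflexive and antisymmetric and transitive


def divides(a, b):
    return b % a == 0


divisors_12 = [1, 2, 3, 4, 6, 12]
print(is_partial_order(divisors_12, divides))                       # True
print(is_partial_order(divisors_12, lambda a, b: a < b))            # False: not reflexive
print(is_partial_order([1, 2, 3, 4], lambda a, b: a % 2 == b % 2))  # False: not antisymmetric
```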

B.2.2. Complete Partially Ordered Set

Let $P$ be a partially ordered set. An element $x \in P$ is an upper bound of $S \subseteq P$ if $s \le x$ for all $s \in S$. The set of all upper bounds $S^u$ is given by $\{x \in P \mid \forall s \in S.\ s \le x\}$. If $S^u$ has a least element $y$, i.e., $y \in S^u$ and $y \le x$ for all upper bounds $x$ of $S$, then $y$ is called the supremum of $S$; let $\sup S$ denote the supremum of $S$ if it exists.

Dually, an element $x \in P$ is a lower bound of $S$ if $s \ge x$ for all $s \in S$. The set of all lower bounds $S^l$ is given by $\{x \in P \mid \forall s \in S.\ s \ge x\}$. If $S^l$ has a greatest element $y$, then $y$ is called the infimum of $S$; let $\inf S$ denote the infimum of $S$ if it exists.

An element $\bot \in P$ is called bottom if $\bot \le x$ for all $x \in P$. A non-empty subset $S$ of $P$ is called directed if for every pair of elements $x, y \in S$, there exists $z \in S$ such that $z \in \{x, y\}^u$.



A partially ordered set $P$ is a complete partially ordered set (CPO) if $P$ has a bottom element and $\sup S$ exists for each directed subset $S$ of $P$.

This thesis uses the term “complete partially ordered set” (CPO) as Davey and Priestley do. In contrast, Baier uses the term “directed-complete partial order” (DCPO) for the same concept, and other authors use the term “domain”. These terms are not defined consistently in the literature, so caution is advised.
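On a finite poset, these notions can be computed directly. The following Python sketch (hypothetical helper names; divisibility on the divisors of 12 as an illustrative poset) computes suprema and infima and tests directedness. In a finite poset, every directed subset contains a greatest element, which is then its supremum; hence every finite poset with a bottom element is a CPO.

```python
def divides(a, b):
    return b % a == 0


def upper_bounds(P, leq, S):
    return [x for x in P if all(leq(s, x) for s in S)]


def lower_bounds(P, leq, S):
    return [x for x in P if all(leq(x, s) for s in S)]


def supremum(P, leq, S):
    """Least upper bound of S in P, or None if it does not exist."""
    ub = upper_bounds(P, leq, S)
    least = [y for y in ub if all(leq(y, x) for x in ub)]
    return least[0] if least else None


def infimum(P, leq, S):
    """Greatest lower bound of S in P, or None if it does not exist."""
    lb = lower_bounds(P, leq, S)
    greatest = [y for y in lb if all(leq(x, y) for x in lb)]
    return greatest[0] if greatest else None


def is_directed(leq, S):
    """S is non-empty and every pair x, y in S has an upper bound z inside S."""
    return bool(S) and all(any(leq(x, z) and leq(y, z) for z in S)
                           for x in S for y in S)


D12 = [1, 2, 3, 4, 6, 12]  # divisors of 12, ordered by divisibility
```

Here, $\sup\{4, 6\} = 12$ and $\inf\{4, 6\} = 2$ (least common multiple and greatest common divisor), $\{4, 6\}$ is not directed, and in the poset $\{1, 2, 3\}$ the set $\{2, 3\}$ has no supremum at all. The supremum of the empty set is the bottom element 1.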

B.2.3. Function Spaces

If $X$ is a set and $P$ a CPO with bottom element $\bot_P$, then the function space $X \to P$ has a corresponding partial order where $f_1 \le f_2$ if and only if $f_1(x) \le f_2(x)$ for all $x \in X$. Thus, $X \to P$ is itself a CPO whose bottom element is the function $\bot$ with $\bot(x) = \bot_P$ for all $x \in X$, and for each directed set of functions $\Xi \subseteq (X \to P)$, the supremum $\sup \Xi$ is given by $(\sup \Xi)(x) = \sup\{f(x) \mid f \in \Xi\}$.

Each compact interval of real numbers $[a, b]$ with $a < b$ is a CPO (where the partial order is given by the natural order of real numbers). Therefore, the function space $X \to [a, b]$ is also a CPO.

Given a function space $X \to [a, b]$ and a non-empty set of functions $\Xi \subseteq (X \to [a, b])$, let $\sup_{f \in \Xi} f$ denote the function $g : X \to [a, b]$ with $g(x) = \sup_{f \in \Xi} f(x) = \sup\{f(x) \mid f \in \Xi\}$. If $\Xi$ is a family of functions $\{f_i \mid i \in I\}$, then let $\sup_{i \in I} f_i$ denote $\sup_{f \in \Xi} f$. The infimum functions $\inf_{f \in \Xi} f$ and $\inf_{i \in I} f_i$ are defined dually.

A function $F : (X \to [a, b]) \to (X \to [a, b])$ is called monotone if and only if for all $f_x, f_y : X \to [a, b]$, $f_x \le f_y$ implies $F(f_x) \le F(f_y)$.

A function $F : (X \to [a, b]) \to (X \to [a, b])$ is said to preserve suprema if and only if for all non-empty directed sets $\Xi \subseteq (X \to [a, b])$, $F(\sup_{f \in \Xi} f) = \sup_{f \in \Xi} F(f)$.

The following proposition can be derived using standard order-theoretic arguments (the Knaster–Tarski and Kleene fixed-point theorems):

Proposition B.1. Let $F : (X \to [0, 1]) \to (X \to [0, 1])$ be a monotone operator. Then $F$ has a least fixed point $\mathrm{lfp}(F)$, which is given by

$$\mathrm{lfp}(F) = \inf_{f \in \Xi_F^{\ge}} f \quad\text{where}\quad \Xi_F^{\ge} = \{f : X \to [0, 1] \mid f \ge F(f)\}.$$

If $F$ preserves suprema, then

$$\mathrm{lfp}(F) = \sup_{n \ge 0} F^n(\mathbf{0}),$$

where $\mathbf{0}(x) = 0$ for all $x \in X$.
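A typical application of proposition B.1 in probabilistic model checking is the computation of reachability probabilities as a least fixed point. The following Python sketch assumes a small hypothetical Markov chain and the operator $F(f)(s) = 1$ if $s$ is the goal state and $F(f)(s) = \sum_{s'} P(s, s') \cdot f(s')$ otherwise, which is monotone and preserves suprema; it approximates $\sup_{n \ge 0} F^n$ applied to the zero function by iterating from that function. The state names and helpers are illustrative.

```python
def lfp_by_iteration(F, X, tol=1e-12, max_iter=100_000):
    """Approximate lfp(F) = sup_n F^n(0) by Kleene iteration from the zero function.

    Stopping once successive iterates differ by less than `tol` is a heuristic:
    it does not by itself bound the distance to lfp(F).
    """
    f = {x: 0.0 for x in X}
    for _ in range(max_iter):
        g = F(f)
        if max(abs(g[x] - f[x]) for x in X) < tol:
            return g
        f = g
    return f


# Hypothetical Markov chain: from s0, stay with 0.5, reach "goal" or "fail" with 0.25 each.
P = {
    "s0": {"s0": 0.5, "goal": 0.25, "fail": 0.25},
    "goal": {"goal": 1.0},
    "fail": {"fail": 1.0},
}


def reach_goal(f):
    """Monotone operator whose least fixed point is Pr(eventually reach goal)."""
    return {s: 1.0 if s == "goal" else sum(p * f[t] for t, p in P[s].items())
            for s in P}


result = lfp_by_iteration(reach_goal, P)
print(result["s0"])  # close to 0.5
```

The least fixed point matters here: the function with value 1 in every state is also a fixed point of this operator (because of the self-loop in `fail`), but only the least fixed point describes the reachability probability of 0.5 from `s0`.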


C. Bayesian Network of the Radio-based Railroad Crossing

This chapter contains the detailed results of applying Fritsch’s Bayesian network generation algorithm [Fri17] to the Radio-based Railroad Crossing case study of section 2.1.

Brakes Pr(brakes)

brakes+ 0.00458948199352323

BarrierMotor Pr(barrierMotor)

barrierMotor+ 0.205557824560254


BarrierSensor    BarrierMotor    Pr(barrierSensor | barrierMotor)
barrierSensor+   barrierMotor+   0.000602783195189369
barrierSensor+   barrierMotor−   0.000599863641499371

OdometerPosition    BarrierMotor    BarrierSensor    Pr(odometerPosition | . . . )
odometerPosition+   barrierMotor+   barrierSensor+   0.544641076528702
odometerPosition+   barrierMotor−   barrierSensor+   0.544625766807146
odometerPosition+   barrierMotor+   barrierSensor−   0.545197005954442
odometerPosition+   barrierMotor−   barrierSensor−   0.545173314524603

OdometerSpeed    BarrierMotor    BarrierSensor    OdometerPosition    Pr(odometerSpeed | . . . )
odometerSpeed+   barrierMotor+   barrierSensor+   odometerPosition+   0.542862508015731
odometerSpeed+   barrierMotor−   barrierSensor+   odometerPosition+   0.542816563390154
odometerSpeed+   barrierMotor+   barrierSensor−   odometerPosition+   0.544277891013947
odometerSpeed+   barrierMotor−   barrierSensor−   odometerPosition+   0.544205280520717
odometerSpeed+   barrierMotor+   barrierSensor+   odometerPosition−   0.545793902052672
odometerSpeed+   barrierMotor−   barrierSensor+   odometerPosition−   0.545807046047289
odometerSpeed+   barrierMotor+   barrierSensor−   odometerPosition−   0.545215125336568
odometerSpeed+   barrierMotor−   barrierSensor−   odometerPosition−   0.545234006575906

H    C_BS    C_OS    C_OP    C_MB    Pr(h | c_BS, c_OS, c_OP, c_MB)
h+   c_BS+   c_OS+   c_OP+   c_MB+   0.550216896031335
h+   c_BS−   c_OS+   c_OP+   c_MB+   0.213073792962948
h+   c_BS+   c_OS−   c_OP+   c_MB+   0.510295247329104
h+   c_BS−   c_OS−   c_OP+   c_MB+   0.107477138576541
h+   c_BS+   c_OS+   c_OP−   c_MB+   0.504417564886788
h+   c_BS−   c_OS+   c_OP−   c_MB+   0.122964129057371
h+   c_BS+   c_OS−   c_OP−   c_MB+   0.450319545245138
h+   c_BS−   c_OS−   c_OP−   c_MB+   3.41448999085584 × 10^−8
h+   c_BS+   c_OS+   c_OP+   c_MB−   0.543232608384295
h+   c_BS−   c_OS+   c_OP+   c_MB−   0.20448927439055
h+   c_BS+   c_OS−   c_OP+   c_MB−   0.510077470533965
h+   c_BS−   c_OS−   c_OP+   c_MB−   0.10743286766725
h+   c_BS+   c_OS+   c_OP−   c_MB−   0.497957475761332
h+   c_BS−   c_OS+   c_OP−   c_MB−   0.114977246791061
h+   c_BS+   c_OS−   c_OP−   c_MB−   0.450124680013939
h+   c_BS−   c_OS−   c_OP−   c_MB−   1.10205146490094 × 10^−14


List of Figures

2.1. Radio-based Railroad Crossing
2.2. Height Control System
2.3. Hemodialysis Machine
2.4. Pressure Tank
2.5. Dead Reckoning
2.6. Degraded Mode

3.1. Typical control loop of a safety-critical system
3.2. Pressure tank case study as instantiation of the typical control loop
3.3. Example of fault propagation
3.4. Sources of faults and failures in a control loop
3.5. Bathtub curve
3.6. Discrete geometric distribution and continuous exponential distribution
3.7. Fault tree pressure tank

4.1. Integration of executable models in the development of a safety-critical system
4.2. Model of Computation
4.3. Positioning of S#
4.4. Macro step in S#
4.5. State space of the dead reckoning example with faults FS and FC but not FF
4.6. HemodialysisMachine Bdd
4.7. Example of a simple fluid flow illustrating the flow concept
4.8. Internal Block Diagram of simple fluid flow
4.9. Internal Block Diagram of a model in which a flow splits

5.1. Labeled Markov chain with 3 states
5.2. Unwound state space
5.3. Intuition for lemma 5.6
5.4. Example for lemma 5.7
5.5. Labeled Markov chain initial distribution and 3 states


5.6. Example of labeled choice-aware Markov decision process
5.7. Example of Labeled choice-aware Markov Decision Process
5.8. Transforming a MDP to a CMDP
5.9. Flattening

6.1. Example of a labeled Markov chain with an initial distribution
6.2. Example of Labeled choice-aware Markov Decision Process

7.1. Example of a probabilistic formal program
7.2. Structural operational semantics defining the inference rules that are used by the purely probabilistic semantics of executable models
7.3. Inference rules applied to the formal probabilistic program of figure 7.1
7.4. Example of transition information obtained during the generation of a labeled Markov chain
7.5. Calculation of the traversal information of the running example (Iteration 1)
7.6. Calculation of the traversal information of the running example (Iteration 2)
7.7. Calculation of the traversal information of the running example (Iteration 3)
7.8. Calculation of the traversal information of the running example (Iteration 4)
7.9. Structural operational semantics defining the inference rules that are used by the semantics of executable models that comprises both probabilistic and nondeterministic choice
7.10. Inference rules applied to the formal probabilistic program of figure 7.1
7.11. Example of transition information obtained during the generation of a labeled Markov chain
7.12. Calculation of the traversal information of the running example (Iteration 1)
7.13. Example of transition information obtained during the generation of a labeled Markov chain with a normalized once formula

8.1. Early termination

9.1. Impact of MeasureSignalFault on the hazard “position estimation deviates” in the degraded mode case study
9.2. Impact of the quality of the light barrier in the height control system case study
9.3. Impact of the probability of a defected water heater in the hemodialysis machine case study
9.4. Types of reasoning with Bayesian networks
9.5. Bayesian network of the Dead Reckoning case study
9.6. Bayesian network of the Radio-based Railroad Crossing case study

A.1. Relevant classes of the language agnostic safety checking part


A.2. Relevant classes of the S# dependent part
A.3. Relevant classes of the labeled Markov chain part

B.1. Venn Diagram
B.2. Simple Bayesian network


List of Tables

2.1. Overview over the case studies

4.1. Fault probabilities in the hemodialysis case study
4.2. Example of pressure tank model execution
4.3. Fault probabilities in the pressure tank case study

5.1. Example for the iteration operator with Π = [s1, θ5 θ4 θ4]
5.2. Example for the iteration operator with Π = F θ3
5.3. Example for the iteration operator with Π = F≤3 θ3
5.4. Comparison of labeled Markov chains with standard Markov chains when faults and hazards are observable
5.5. Comparison of labeled Markov chains with standard Markov chains when only hazards are observable
5.6. Comparison of labeled CMDP with standard MDPs when faults and hazards are observable
5.7. Comparison of labeled CMDP with standard MDPs when only hazards are observable

6.1. Evaluation of the model checking efficiency of sparse labeled Markov chains
6.2. Evaluation of the model checking efficiency of sparse labeled CMDPs

7.1. Evaluation of the generation time of a sparse labeled Markov chains when all faults and hazards are observable
7.2. Evaluation of the generation time of a sparse labeled Markov chains when only hazards are observable
7.3. Evaluation of the generation time of a sparse labeled CMDP when only hazards are observable
7.4. Evaluation of the generation time of a sparse labeled Markov chain having a Once-Formula


8.1. Evaluation of the generation time of a sparse labeled Markov chains using multi-core state traversal
8.2. Evaluation of the generation time of a sparse labeled CMDP using multi-core state traversal
8.3. Evaluation of the generation time of a sparse labeled Markov chains using static fault forwarding
8.4. Evaluation of the generation time of a sparse labeled CMDP using static fault forwarding
8.5. Evaluation of the generation time of a sparse labeled Markov chains using early termination
8.6. Evaluation of the generation time of a sparse labeled CMDP using early termination

9.1. DCCA-based evaluation of the height control case study
9.2. DCCA-based evaluation of the hemodialysis machine case study
9.3. Analysis of design variants of the height control system case study

A.1. Source lines of code of the reference implementation


Bibliography

[AL+04] A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr. “Basic Concepts and Taxonomy of Dependable and Secure Computing.” In: Dependable and Secure Computing 1.1 (2004), pp. 11–33.

[Bai98] C. Baier. “On the Algorithmic Verification of Probabilistic Systems.” Habilitation. Universität Mannheim, 1998.

[BB+14] N. Bertrand, P. Bouyer, T. Brihaye, Q. Menet, C. Baier, M. Größer, and M. Jurdziński. “Stochastic Timed Automata.” In: Logical Methods in Computer Science 10.4:6 (December 2014).

[BC+09] M. Bozzano, A. Cimatti, J.-P. Katoen, V. Nguyen, T. Noll, and M. Roveri. “The COMPASS Approach: Correctness, Modelling and Performability of Aerospace Systems.” In: Computer Safety, Reliability, and Security. Springer, 2009, pp. 173–186.

[BC03] M. Benedetti and A. Cimatti. “Bounded Model Checking for Past LTL.” In: Tools and Algorithms for the Construction and Analysis of Systems. Ed. by H. Garavel and J. Hatcliff. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003, pp. 18–33. isbn: 978-3-540-36577-8.

[BCG04] C. Baier, F. Ciesinski, and M. Größer. “PROBMELA: A Modeling Language for Communicating Probabilistic Systems.” In: Proc. 2nd ACM-IEEE Intl. Conf. Formal Methods and Models for Codesign (MEMOCODE’2004). IEEE, 2004, pp. 57–66.

[Bea03] D. Beauquier. “On Probabilistic Timed Automata.” In: Theor. Comput. Sci. 292.1 (January 2003), pp. 65–84. issn: 0304-3975.

[Ber07] G. Berry. “SCADE: Synchronous Design and Validation of Embedded Control Software.” In: Next Generation Design and Verification Methodologies for Distributed Embedded Control Systems. Springer, 2007. isbn: 978-1-4020-6253-7.

[Bil12] M. Billes. “Evaluation von Werkzeugen zur Sicherheitsanalyse.” Bachelor thesis. Institute for Software and Systems Engineering, 2012.

[BK08] C. Baier and J.-P. Katoen. Principles of Model Checking. Cambridge, MA: MIT Press, 2008. isbn: 978-0-262-02649-9.

[BS+16] M. J. Butler, K.-D. Schewe, A. Mashkoor, and M. Biró, eds. Abstract State Machines, Alloy, B, TLA, VDM, and Z, Proceedings. Vol. 9675. Lecture Notes in Computer Science. Springer, 2016. isbn: 978-3-319-33599-5.

[But03] J. Butcher. The Numerical Analysis of Ordinary Differential Equations: Runge-Kutta and General Linear Methods. 2nd. Wiley, 2003.

[BV03] M. Bozzano and A. Villafiorita. “Improving System Reliability via Model Checking: The FSAP/NuSMV-SA Safety Analysis Platform.” In: Computer Safety, Reliability, and Security. Vol. 2788. Lect. Notes Comp. Sci. Springer, 2003, pp. 49–62.

[CB+08] F. Ciesinski, C. Baier, M. Groesser, and D. Parker. “Generating compact MTBDD-representations from Probmela specifications.” In: Proc. 15th International SPIN Workshop on Model Checking of Software (SPIN’08). Vol. 5156. LNCS. Springer, 2008, pp. 60–76.

[CB06] F. Ciesinski and C. Baier. “LiQuor: A tool for Qualitative and Quantitative Linear Time analysis of Reactive Systems.” In: Third International Conference on the Quantitative Evaluation of Systems (QEST’06). September 2006, pp. 131–132.

[CD+06] J. Curtis, K. Delaney, P. O’Kane, B. Roshto, and J. Sweeney. “Hemodialysis Devices.” In: Core Curriculum for the Dialysis Technician: A Comprehensive Review of Hemodialysis, 4th edition. Medical Education Institute, 2006.

[Cie11] F. Ciesinski. “High-level modelling and efficient analysis of randomized protocols.” PhD thesis. Technische Universität Dresden, 2011.

[DBÇ15] D. M. Diez, C. D. Barr, and M. Çetinkaya-Rundel. OpenIntro Statistics. OpenIntro, Inc., 2015. isbn: 9781943450053.

[DP02] B. Davey and H. Priestley. Introduction to Lattices and Order. Cambridge mathematical textbooks. Cambridge University Press, 2002. isbn: 9780521784511.

[EHZ10] C. Eisentraut, H. Hermanns, and L. Zhang. “On Probabilistic Automata in Continuous Time.” In: 2010 25th Annual IEEE Symposium on Logic in Computer Science. July 2010, pp. 342–351.

[Fel66] W. Feller. An Introduction to Probability Theory and Its Applications, Vol. 1. Wiley, 1966. isbn: 0471257087.

[Fre15] Freie und Hansestadt Hamburg, Behörde für Wirtschaft, Verkehr und Innovation. Durchschnittliche tägliche KFZ-Verkehrsstärken an Werktagen. Hamburg 2013. German. 2015. url: http://www.hamburg.de/bwvi/start-verkehrsbelastung/ (visited on 08/28/2015).

[Fri17] S. Fritsch. “Data-Science-basierte Auswirkungsanalyse von Komponentenfehlern in sicherheitskritischen Systemen.” MA thesis. Institute for Software and Systems Engineering, 2017.

[GKM14] F. Gretz, J.-P. Katoen, and A. McIver. “Operational Versus Weakest Pre-expectation Semantics for the Probabilistic Guarded Command Language.” In: Performance Eval. 73 (2014), pp. 110–132. issn: 0166-5316.

[GOR07] M. Güdemann, F. Ortmeier, and W. Reif. “Computer Safety, Reliability, and Security.” In: Springer, 2007. Chap. Using Deductive Cause-Consequence Analysis (DCCA) with SCADE, pp. 465–478.

[Göt17] M. Götz. “Erweiterung des Compilers einer datenflussorientierten Programmiersprache zur Analyse sicherheitskritischer Systeme.” Bachelor thesis. Institute for Software and Systems Engineering, 2017.

[Güd11] M. Güdemann. “Qualitative and Quantitative Formal Model-Based Safety Analysis.” Dissertation. Magdeburg, Univ., 2011.

[Hab17] A. Habermaier. “Design Time and Run Time Formal Safety Analysis using Executable Models.” Dissertation. University of Augsburg, 2017.

[HE+15] A. Habermaier, B. Eberhardinger, H. Seebach, J. Leupolz, and W. Reif. “Runtime Model-Based Safety Analysis of Self-Organizing Systems with S#.” In: Self-Adaptive and Self-Organizing Systems Workshops. 2015, pp. 128–133.

[HG+12] A. Habermaier, M. Güdemann, F. Ortmeier, W. Reif, and G. Schellhorn. “The ForMoSA Approach to Qualitative and Quantitative Model-Based Safety Analysis.” In: Railway Safety, Reliability, and Security. IGI Global, 2012, pp. 65–114.

[HJ90] H. Hansson and B. Jonsson. “A calculus for communicating systems with time and probabilities.” In: Proceedings 11th Real-Time Systems Symposium. December 1990, pp. 278–287.

[HK+16] A. Habermaier, A. Knapp, J. Leupolz, and W. Reif. “Fault-Aware Modeling and Specification for Efficient Formal Safety Analysis.” In: Proc. Joint 21st Intl. Wsh. Formal Methods for Industrial Critical Systems and 16th Intl. Wsh. Automated Verification of Critical Systems (FMICS-AVoCS’16). Ed. by M. H. ter Beek, S. Gnesi, and A. Knapp. Vol. 9933. Lect. Notes Comp. Sci. Springer, 2016, pp. 97–114.

[HLR15] A. Habermaier, J. Leupolz, and W. Reif. “Executable Specifications of Safety-Critical Systems with S#.” In: Proc. of Dependable Control of Discrete Systems. IFAC, 2015, pp. 60–65.

[HLR16] A. Habermaier, J. Leupolz, and W. Reif. “Unified Simulation, Visualization, and Formal Analysis of Safety-Critical Systems with S#.” In: Proc. Joint 21st Intl. Wsh. Formal Methods for Industrial Critical Systems and 16th Intl. Wsh. Automated Verification of Critical Systems (FMICS-AVoCS’16). Ed. by M. H. ter Beek, S. Gnesi, and A. Knapp. Vol. 9933. Lect. Notes Comp. Sci. Springer, 2016, pp. 150–167.

[Hol04] G. Holzmann. The SPIN Model Checker. Addison-Wesley, 2004.

[ISO10] ISO. ISO 24765: Systems and software engineering – Vocabulary. 2010.

[ISSE18] Institute of Software and Systems Engineering. S# Source Repository and Wiki. 2018. (Visited on 01/30/2018).

[JRH18] E. Jahier, P. Raymond, and N. Halbwachs. The lustre v6 reference manual. January 29, 2018. url: http://www-verimag.imag.fr/DIST-TOOLS/SYNCHRONE/lustre-v6/doc/lv6-ref-man.pdf (visited on 02/19/2018).

[KH+16] D. Klumpp, A. Habermaier, B. Eberhardinger, and H. Seebach. “Optimising Runtime Safety Analysis Efficiency for Self-Organising Systems.” In: Self-Adaptive and Self-Organizing Systems Workshops. IEEE, 2016.

[KL+15] G. Kant, A. Laarman, J. Meijer, J. van de Pol, S. Blom, and T. van Dijk. “LTSmin: High-Performance Language-Independent Model Checking.” In: Tools and Algorithms for the Construction and Analysis of Systems. Vol. 9035. Lect. Notes Comp. Sci. Springer, 2015, pp. 692–707.

[KN04] K. B. Korb and A. E. Nicholson. Bayesian Artificial Intelligence. Series in computer science and data analysis. Boca Raton: Chapman & Hall/CRC, 2004. isbn: 1-58488-387-1.

[KNP11] M. Kwiatkowska, G. Norman, and D. Parker. “PRISM 4.0: Verification of Probabilistic Real-Time Systems.” In: Computer Aided Verification. Vol. 6806. Lect. Notes Comp. Sci. Springer, 2011, pp. 585–591.

[Kre05] U. Krengel. Einführung in die Wahrscheinlichkeitstheorie und Statistik. Vieweg+Teubner Verlag, 2005. isbn: 978-3-8348-0063-3.

[KZ+11] J.-P. Katoen, I. Zapreev, E. Hahn, H. Hermanns, and D. Jansen. “The Ins and Outs of the Probabilistic Model Checker MRMC.” In: Perform. Eval. 68.2 (February 2011), pp. 90–104. issn: 0166-5316.

[Laa14] A. W. Laarman. “Scalable Multi-Core Model Checking.” PhD thesis. Enschede, The Netherlands, 2014.

[Lev11] N. Leveson. Engineering a Safer World. MIT Press, 2011.

[Lev95] N. Leveson. Safeware: system safety and computers. 1995.

[LHR16] J. Leupolz, A. Habermaier, and W. Reif. “Safety Analysis of a Hemodialysis Machine with S#.” In: EuroAsiaSPI 2016 Industrial Proceedings. Whitebox, 2016.

[LHR18] J. Leupolz, A. Habermaier, and W. Reif. “Quantitative and Qualitative Safety Analysis of a Hemodialysis Machine with S#.” In: Journal of Software: Evolution and Process. Wiley, 2018.

[LK+17] J. Leupolz, A. Knapp, A. Habermaier, and W. Reif. “Qualitative and Quantitative Analysis of Safety-Critical Systems with S#.” In: International Journal on Software Tools for Technology Transfer. Springer, 2017.

[LL96] G. Le Lann. The Ariane 5 Flight 501 Failure - A Case Study in System Engineering for Computing Systems. Research Report RR-3079. Projet REFLECS. INRIA, 1996.

[LPW10] A. Laarman, J. van de Pol, and M. Weber. “Boosting multi-core reachability performance with shared hash tables.” In: Formal Methods in Computer Aided Design. October 2010, pp. 247–255.

[LSO12] M. Lipaczewski, S. Struck, and F. Ortmeier. “Using Tool-Supported Model Based Safety Analysis – Progress and Experiences in SAML Development.” In: High-Assurance Systems Engineering. IEEE, 2012, pp. 159–166.

[Mar03] N. Markey. “Temporal Logic with Past is Exponentially More Succinct.” In: EATCS Bulletin. Vol. 79. European Association for Theoretical Computer Science, 2003, pp. 122–128.

[Mas16] A. Mashkoor. “The Hemodialysis Machine Case Study.” In: Abstract State Machines, Alloy, B, TLA, VDM, and Z. Ed. by M. Butler, K.-D. Schewe, A. Mashkoor, and M. Biró. Vol. 9675. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2016, pp. 329–343. isbn: 978-3-319-33600-8.

[Mod14] Modelica Association. Modelica – A Unified Object-Oriented Language for Systems Modeling, Language Specification, Version 3.3. 2014.

[Nea04] R. E. Neapolitan. Learning Bayesian networks. Prentice Hall series in artificial intelligence. Upper Saddle River, NJ: Pearson Prentice Hall, 2004. isbn: 0-13-012534-2.

[Ngu13] V. Y. Nguyen. “Trustworthy spacecraft design using formal methods.” Dissertation. RWTH Aachen University, 2013.

[Nol15] T. Noll. “Safety, Dependability and Performance Analysis of Aerospace Systems.” In: Formal Techniques for Safety-Critical Systems. Vol. 476. CCIS. Springer, 2015, pp. 17–31.

[Nyb99] M. Nyberg. “Model Based Fault Diagnosis: Methods, Theory, and Automotive Engine Applications.” PhD thesis. Linköping University, 1999.

[Obj15] Object Management Group. OMG Systems Modeling Language, Version 1.4. 2015.

[OR+03] F. Ortmeier, W. Reif, G. Schellhorn, A. Thums, and B. H. and Helmut Trapp-schuh. “Safety Analysis of the Height Control System for the Elbtunnel.” In:Reliability Engineering and System Safety 81.3 (2003), pp. 259–268.

[ORS05] F. Ortmeier, W. Reif, and G. Schellhorn. “Formal Safety Analysis of aRadio-Based Railroad Crossing Using Deductive Cause-Consequence Analysis(DCCA).” In:Dependable Computing. Vol. 3463. Lect. Notes Comp. Sci. Springer,2005, pp. 210–224.

[Par02] D. Parker. “Implementation of Symbolic Model Checking for Probabilistic Sys-tems.” PhD thesis. University of Birmingham, 2002.

[Plo04] G. D. Plotkin. “A structural approach to operational semantics.” In: The Journalof Logic and Algebraic Programming 60-61 (2004), pp. 17–139.

[PS98] J. Plaice and J.-B. Saint. The LUSTRE-ESTEREL portable format - Version oc5.Tech. rep. INRIA, Sophia Antipolis, 1998.

[Ray00] P. Raymond. LUSTRE-V4 manual (draft). 2000.

[San97] P. Sanders. “Lastverteilungsalgorithmen für parallele Tiefensuche.” Disserta-tion. Informatik für Ingenieure und Naturwissenschaftler, 1997.

[SGS93] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. Vol. 81.Lecture Notes in Statistics. New York, NY: Springer New York, 1993. isbn:978-1-4612-7650-0.

[Sto02] M. Stoelinga. “An introduction to probabilistic automata.” In: Bulletin of theEuropean Association for Theoretical Computer Science (2002), pp. 176–198.

[Sto96] N. Storey. Safety Critical Computer Systems. Addison-Wesley, 1996.

[tGK16] M. H. ter Beek, S. Gnesi, and A. Knapp, eds. Proc. Joint 21st Intl. Wsh. Formal Methods for Industrial Critical Systems and 16th Intl. Wsh. Automated Verification of Critical Systems (FMICS-AVoCS’16). Vol. 9933. Lect. Notes Comp. Sci. Springer, 2016.

[VD+02] W. Vesely, J. Dugan, J. Fragola, J. Minarick III, and J. Railsback. Fault Tree Handbook with Aerospace Applications. Handbook. Washington, DC: National Aeronautics and Space Administration, 2002.

[VDW12] W. Visser, M. Dwyer, and M. Whalen. “The hidden models of model checking.” In: Software & Systems Modeling 11.4 (2012). issn: 1619-1366.

[Vin16] K. Viner, ed. Mars lander smashed into ground at 540km/h after misjudging its altitude. The Guardian. November 24, 2016. url: https://www.theguardian.com/science/2016/nov/24/mars-lander-smashed-into-ground-at-540kmh-after-misjudging-its-altitude (visited on 03/10/2018).

[VU81] W. Vesely and U.S. Nuclear Regulatory Commission. Division of Systems and Reliability Research. Fault Tree Handbook. Systems and Reliability Research, Office of Nuclear Regulatory Research, U.S. Nuclear Regulatory Commission, 1981.

[WP18] Wikipedia contributors. Bathtub curve — Wikipedia, The Free Encyclopedia. 2018. url: https://en.wikipedia.org/w/index.php?title=Bathtub_curve&oldid=830276154 (visited on 03/18/2018).

[Zap08] I. S. Zapreev. “Model Checking Markov Chains: Techniques and Tools.” PhD thesis. University of Twente, 2008.

[Zot02] E. Zotti, ed. Is “dead reckoning” short for “deduced reckoning”? The Straight Dope. November 21, 2002. url: http://www.straightdope.com/columns/read/2053/is-dead-reckoning-short-for-deduced-reckoning/ (visited on 02/19/2018).