
Bayesian Prediction for Stochastic Process Models in Reliability

by

Simone Hermann

Dissertation

submitted for the academic degree of Doktor der Naturwissenschaften (Dr. rer. nat.) at the Fakultät Statistik of the Technische Universität Dortmund


Submitted: 17 October 2016

Date of the oral examination: 21 December 2016

First referee: Prof. Dr. Christine Müller

Second referee: Prof. Dr. Katja Ickstadt

Chair of the examination committee: Prof. Dr. Jörg Rahnenführer


Contents

1. Introduction

2. Foundations for Modeling, Sampling and Prediction
   2.1. Main Model
   2.2. Sampling Algorithms
        2.2.1. Metropolis Hastings (MH) Algorithm
        2.2.2. Gibbs Sampler
        2.2.3. Metropolis within Gibbs Algorithm
        2.2.4. Adaptive MCMC
        2.2.5. Rejection Sampling
        2.2.6. Inversion Method
   2.3. Sensitivity Analysis
   2.4. Bayesian Prediction
   2.5. Influence of the Euler Approximation

I. Bayesian Analysis and Prediction for Models Based on Continuous Processes

3. Diffusion Process
   3.1. Single Series Model
   3.2. Hierarchical Model
   3.3. Application

4. Hidden Diffusion Process
   4.1. Single Series Model
   4.2. Hierarchical Model
   4.3. Simulation Study and Application

II. Bayesian Analysis and Prediction for Non-continuous Processes

5. Non-homogeneous Poisson Process
   5.1. Model Description, Inference and Prediction
   5.2. Application

6. Jump Diffusion Process
   6.1. Model Description, Inference and Prediction
   6.2. Application

7. Jump Regression Model
   7.1. Model Description, Inference and Prediction
   7.2. Application

8. Description of the R Package BaPreStoPro

9. Discussion and Outlook

A. Notations

B. Further Details


1. Introduction

The following work has been developed in the project B5 "Statistical methods for damage processes under cyclic load" of the Collaborative Research Center "Statistical modeling of nonlinear dynamic processes" (SFB 823) of the German Research Foundation (DFG). The aim of the project is to consider material fatigue from a statistical point of view, whereby modeling and forecasts are desired. Since Bayesian analysis is a strong tool for prediction, this thesis provides full Bayesian inference, estimation as well as prediction, for models that can be used to predict material fatigue in various materials. In this thesis, we will be mainly concerned with two different materials.

Firstly, Virkler et al. (1979) conducted experiments with an aluminum alloy, where 68 specimens were set under cyclic load to investigate crack length propagation. Since aluminum alloy is a homogeneous material, the crack length propagation process appears to be continuous without any irregularities or jumps.

Secondly, Reinhard Maurer, Guido Heeke and Jens Heinrich conducted fatigue experiments on prestressed concrete under a low cyclic stress level within the SFB 823 to investigate the long life fatigue strength. Since prestressed concrete is a combination of concrete and tension wires, which break over time, this crack width process exhibits irregular jumps.

A model often used for crack growth in the engineering literature is the deterministic process defined by the Paris-Erdogan equation

da/dN = C (∆K)^m,   (1.1)

where a denotes the crack length and N the number of cycles corresponding to the crack length a. C and m are material-dependent constants. ∆K denotes the stress intensity factor range, which depends on the square root of a. This leads to an autoregressive process whose differences increase with the growth of the process itself. For further information see, for example, Sobczyk and Spencer (1992).
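To make the autoregressive structure of (1.1) concrete, the following is a minimal R sketch that integrates the equation over load cycles. It assumes, purely for illustration, ∆K = √a and hypothetical values for C, m and the initial crack length; these are not fitted material constants.

```r
## Minimal sketch of the discretized Paris-Erdogan equation (1.1):
## a_{N+1} = a_N + C * (Delta K(a_N))^m, assuming Delta K proportional to sqrt(a).
## C, m and a0 are hypothetical illustration values, not fitted constants.
paris_growth <- function(a0 = 9, C = 1e-4, m = 2.5, cycles = 10000) {
  a <- numeric(cycles + 1)
  a[1] <- a0
  for (N in 1:cycles) {
    dK <- sqrt(a[N])              # stress intensity factor range (illustrative)
    a[N + 1] <- a[N] + C * dK^m   # increment grows with the crack length itself
  }
  a
}
crack <- paris_growth()
plot(crack, type = "l", xlab = "cycle N", ylab = "crack length a")
```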

Although all the methods described in the following can be applied in general, for the specific modeling in the application to the presented data sets, the process (1.1) will be taken as a basis.

Firstly, for modeling of continuous crack propagation processes, a general diffusion process will be presented based on a stochastic differential equation (SDE). For generalization, we refrain from solving the SDE to an explicit solution and approximate the SDE with the widely used Euler-Maruyama approximation, see, for example, Fuchs (2013).

Secondly, for the jump process of the non-continuous crack propagation, a non-homogeneous Poisson process (NHPP) will be used.

Thirdly, both components are combined in a jump diffusion process for modeling of the crack propagation process. This process will be defined by an SDE as well.

For the respective models, in most cases, Bayesian estimation is straightforward or presented in existing literature. We will encounter two models with an arising filtering problem and present a solution which makes it possible to estimate the model parameters. Firstly, a hidden diffusion model will be considered and a particle Gibbs sampler for the filtering of the unobserved diffusion will be applied. Secondly, the Poisson process within the jump diffusion will be assumed to be unobserved, where we propose a simple filtering step within the Gibbs sampler. Note that for the underlying prestressed concrete data set, the counting process is observed.

Little research has been done in the field of prediction for stochastic processes without an existing explicit solution of the SDE, both for general diffusion and for jump diffusion. In the case of an explicit solution, a distribution of the process in each time point is available and can be taken for a simple Bayesian prediction. This is explained, for example, for the Ornstein-Uhlenbeck and the Cox-Ingersoll-Ross process in a mixed effects model in Dion, Hermann and Samson (2016) and implemented in the R package mixedsde, see Dion, Samson and Hermann (2016).

If the process is approximated by the Euler-Maruyama scheme, this prediction can become inaccurate for large time differences. A common solution in the Bayesian community is to sample trajectories with samples from the posterior distribution and take these for prediction, namely the pointwise mean for a point prediction or quantiles for prediction intervals, respectively. The analogous frequentist solution is sampling of trajectories with the point estimates of the parameters. Both procedures will be compared to the methods presented in this thesis.

In our project within the SFB 823, prediction for the far future is desired, and, in the following, we present two Bayesian prediction methods for stochastic processes based on the Euler approximation, one for trajectories of the process and one for a predictive distribution for the process in one time point.

The thesis is structured as follows. In the next section, the model is presented, the sampling algorithms used are explained, a short sensitivity analysis with motivation of prior distributions is performed and, as the main part of the thesis, Bayesian prediction for stochastic processes based on the transition density, which can be the one resulting from the Euler approximation, is presented. Afterwards, the implication of different time distances for the Euler approximation is investigated.

Sections 3 and 4, joined in Chapter I, consider the general diffusion process. In Section 4, the diffusion is hidden in a regression model, where the estimation is complicated through a filtering problem.

In Chapter II, firstly, Section 5 introduces the NHPP and secondly, Section 6 joins diffusion and counting process into a jump diffusion. Thirdly, Section 7 presents a non-linear regression model including the NHPP, which is a result of the interdisciplinary work in the project with engineers, see Heeke, Hermann et al. (2015).

Section 8 describes the package BaPreStoPro (Bayesian Prediction of Stochastic Processes), developed as part of this thesis. All the methods described in this thesis are implemented in R, see R Core Team (2015), and collected in the package BaPreStoPro, see Hermann (2016a).


2. Foundations for Modeling, Sampling and Prediction

The goal of statistics is to reconstruct real phenomena with a specific uncertainty in a model to allow some kind of inference, for example, for the future. A wide class of models is the one of Markov processes, which have the property that the current state only depends on the previous state. One example would be the crack length in a specific material, which could be observed every day or every minute or, as in our cases, over a certain number of load cycles. In the first case one would assume a discrete-time Markov process. For measurements every minute or over load cycles, a continuous-time one would also be suitable. In the case of non-equidistant time points, a continuous-time choice would be suitable as well. The examples in this thesis are all concerned with crack growth in different materials, for example aluminum alloy, a homogeneous material, or prestressed concrete, a non-homogeneous material. All have in common that the current crack size only depends on the crack size before and on the increment, which by its nature is a Markov process. To be flexible with the observation time distances, we will assume a continuous-time process. In the case of homogeneous material, we can assume a continuous process. This will be done by a diffusion driven by a Wiener process. In the case of a non-homogeneous material, we observe irregular jumps, which make a non-continuous process suitable. Here, we will assume a jump diffusion process. To join all the presented models in one representation, we consider the model described in the following section.

2.1. Main Model

In this thesis, we consider the one-dimensional stochastic process Yt, t ∈ [0, T ], on the time interval [0, T ] defined by the stochastic differential equation (SDE) given by

dYt = b(ϕ, t, Yt) dt + s(γ, t, Yt) dWt + h(η, t, Yt) dNt, Y0 = y0(ϕ), (2.1)

where Wt, t ∈ [0, T ], denotes a Brownian motion on [0, T ], and Nt, t ∈ [0, T ], denotes a non-homogeneous Poisson process with cumulative intensity function E[Nt] = ∫_0^t λξ(u) du = Λξ(t), which has to be bounded on [0, T ]. Brownian motion and Poisson process have to be stochastically independent. y0 will be assumed to be constant in some cases, but for the hierarchical and the hidden diffusion model, this will be a function of the location parameter ϕ. We could also define the process on the time interval [0, ∞). But our observation time interval is naturally bounded, since we observe material fatigue until failure. Strictly speaking, T in this context is a random variable, but we neglect this fact and assume T large enough that the observation time points 0 ≤ t0 < t1 < ... < tn ≤ T are located in the interval [0, T ].

Stochastic differential equations as in (2.1) are well-investigated, see, e.g., Øksendal and Sulem (2005); Shimizu and Yoshida (2006); Klebaner (2005); Protter (2005); Fuchs (2013). Throughout this thesis, we consider a parametric estimation approach, this means, the functions b, s and h are assumed to be known. Under the Lipschitz conditions

(b(ϕ, t, x) − b(ϕ, t, y))² ≤ C1 (x − y)²,
(s(γ, t, x) − s(γ, t, y))² ≤ C2 (x − y)²,
(h(η, t, x) − h(η, t, y))² ≤ C3 (x − y)²

and the linear growth conditions

b(ϕ, t, y)² ≤ D1 (1 + y²), s(γ, t, y)² ≤ D2 (1 + y²), h(η, t, y)² ≤ D3 (1 + y²)

for all t ∈ [0, T ], x, y ∈ R, with Ci and Di, i = 1, 2, 3, positive constants, the SDE (2.1) has a unique strong solution, see Ikeda and Watanabe (1989) for further details.

The resulting process is closely related to the class of Lévy processes, see, e.g., Protter (2005). For general functions b, s, h and Λξ, the process defined by the SDE (2.1) does not have stationary increments, in contrast to a classical Lévy process. According to the definition in Cont and Tankov (2004) and Kluge (2005), our process fits in the class of time-inhomogeneous Lévy processes.

In this thesis, the parameters in equation (2.1) have the following dimensions: ϕ ∈ R^p will be assumed to be a vector, γ ∈ (0, ∞) and η ∈ R will be assumed to be scalars, whereby extensions to a multi-dimensional vector would be easy to implement. Function s depends on γ because in most cases it will be s(γ, t, y) = γ or s(γ, t, y) = γy, but for inference γ², i.e. the parameter of the resulting variance, will be estimated.

During this thesis, we consider different specific cases of the process, for example, the continuous diffusion process, i.e. h(η, t, y) = 0, or the Poisson process itself, i.e. h(η, t, y) = 1, b(ϕ, t, y) = s(γ, t, y) = 0, y0 = 0.

If the SDE (2.1) has an explicit solution, one has a transition density that enters the estimation. But in the case that the process has no explicit representation, the SDE can be approximated in time points 0 ≤ t0 < t1 < ... < tn ≤ T. A widely used approximation scheme is the Euler-Maruyama, or short Euler, approximation, which is given by

Y0 = y0(ϕ),   (2.2)
Yi = Yi−1 + b(ϕ, ti−1, Yi−1) ∆i + s(γ, ti−1, Yi−1) √∆i ζi + h(η, ti−1, Yi−1) ∆Ni,
ζi ∼ N(0, 1) i.i.d., ∆Ni ∼ Pois(Λξ(ti) − Λξ(ti−1)), ∆i = ti − ti−1, i = 1, ..., n,

with i.i.d. meaning independent and identically distributed. For jump diffusions, see Platen and Bruti-Liberati (2010), or for ordinary diffusions, i.e. h(η, t, y) = 0, Fuchs (2013). Conditional on the jump term, the approximated variables have the normal transition density

p(Yi|Yi−1, ϕ, γ, η, ∆Ni) = 1 / (√(2π∆i) s(γ, ti−1, Yi−1)) · exp( −(Yi − Yi−1 − b(ϕ, ti−1, Yi−1) ∆i − h(η, ti−1, Yi−1) ∆Ni)² / (2 s²(γ, ti−1, Yi−1) ∆i) ).
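As a brief illustration of scheme (2.2), the following is a minimal R sketch (not the BaPreStoPro implementation) that simulates one trajectory for user-supplied functions b, s, h and a cumulative intensity Λξ; all function and parameter choices in the usage example are assumptions made only for this sketch.

```r
## Minimal sketch of the Euler-Maruyama scheme (2.2) for the jump diffusion (2.1).
## b, s, h are functions of (parameter, t, y); Lambda is the cumulative intensity of the NHPP.
euler_jump_diffusion <- function(times, y0, b, s, h, Lambda, phi, gamma, eta, xi) {
  n <- length(times)
  Y <- numeric(n)
  Y[1] <- y0
  for (i in 2:n) {
    dt <- times[i] - times[i - 1]
    dN <- rpois(1, Lambda(xi, times[i]) - Lambda(xi, times[i - 1]))  # Poisson increment
    Y[i] <- Y[i - 1] + b(phi, times[i - 1], Y[i - 1]) * dt +
      s(gamma, times[i - 1], Y[i - 1]) * sqrt(dt) * rnorm(1) +
      h(eta, times[i - 1], Y[i - 1]) * dN
  }
  Y
}

## Usage sketch: pure diffusion (h = 0) with b = phi * y and s = gamma * y,
## the setting used in Example 1 of Section 2.2.1.
set.seed(1)
times <- seq(0, 1, by = 0.02)
Y <- euler_jump_diffusion(times, y0 = 1,
                          b = function(phi, t, y) phi * y,
                          s = function(gamma, t, y) gamma * y,
                          h = function(eta, t, y) 0,
                          Lambda = function(xi, t) 0,
                          phi = 2, gamma = 1, eta = 0, xi = 0)
```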

In most cases, the true underlying transition density cannot be assumed to be normal. A good overview of possibly resulting problems of biased estimations can be seen in Sørensen (2004) or Beskos et al. (2006) for the case h(η, t, y) = 0.

The main scope of this thesis is prediction. For the different specific cases of the process in (2.1), prediction procedures will be presented. In the case where the Euler approximation is used, the prediction is also based on the same discretized variables as used in the estimation, which circumvents the problem of possibly biased estimations.

In the following, we use the notation θ = (ϕ, γ², η, ξ) for the parameter vector, or parts of it in the respective model, and Y(n) = (Y0, Y1, ..., Yn) for the vector of Euler approximated variables.

2.2. Sampling Algorithms

In Bayesian estimation, the posterior distributions are often not explicitly available. Therefore, sampling algorithms may be used for approximation. In the following, the algorithms used in this thesis are described. The first one is the most popular in the class of Markov Chain Monte Carlo (MCMC) algorithms.

2.2.1. Metropolis Hastings (MH) Algorithm

We want to sample from the posterior distribution

p(θ|Y(n)) ∝ p(Y(n)|θ) · p(θ).

At first, choose a proper proposal density q(θ∗|θ). A minimal necessary condition is that

⋃_{θ ∈ supp p(·|Y(n))} supp q(·|θ) ⊃ supp p(·|Y(n)),

with supp denoting the support of a function. In words, each point of the posterior's support has to be reachable by the Markov chain based on that proposal density. The formal condition would also be true for the density of the Dirac measure at θ. Therefore, this is only the minimal necessary condition; the proposal density has to be chosen wisely for a good approximation of the posterior distribution.

At second, choose a starting value θ∗_0 with p(θ∗_0|Y(n)) > 0. For k = 1, ..., K, draw θ∗ ∼ q(·|θ∗_{k−1}) and set θ∗_k = θ∗ with acceptance probability

ρ(θ∗, θ∗_{k−1}) = min( [p(Y(n)|θ∗) · p(θ∗) · q(θ∗_{k−1}|θ∗)] / [p(Y(n)|θ∗_{k−1}) · p(θ∗_{k−1}) · q(θ∗|θ∗_{k−1})] , 1 )

and θ∗_k = θ∗_{k−1} with probability 1 − ρ(θ∗, θ∗_{k−1}). Then, the stationary distribution of the Markov chain θ∗_k, k = 1, ..., K, is equal to the posterior p(θ|Y(n)), see Robert and Casella (2004), p. 272.

In the special case of a symmetric proposal density, i.e. q(θ∗|θ∗_{k−1}) = q(θ∗_{k−1}|θ∗), the acceptance probability reduces to

ρ(θ∗, θ∗_{k−1}) = min( [p(Y(n)|θ∗) · p(θ∗)] / [p(Y(n)|θ∗_{k−1}) · p(θ∗_{k−1})] , 1 ).


One example is the density of the N(θ∗_{k−1}, sd²) distribution with mean equal to the last iteration step θ∗_{k−1} and a fixed standard deviation sd. This is also known as a symmetric random walk, see Robert and Casella (2004), p. 206.

A crucial step of the algorithm is the choice of the proposal density. Theoretically, the minimal necessary condition given above suffices. But in practice, a good approximation of the posterior density is of interest. If the acceptance probability ρ(θ∗, θ∗_{k−1}) is large, many samples are accepted, but the new one moves only little from the old iteration step and the support of the distribution will be explored slowly. On the other hand, if the acceptance probability is small, only few candidates are accepted, which is also not suitable for a continuous distribution. Rosenthal (2011) calculated an optimal acceptance rate for the MH algorithm, which is 0.234. To reach this, an adaptive algorithm might be a solution, which will be presented below.

The choice of the starting point is also an important step of the algorithm. If it is not well chosen, the chain needs many iterations to reach the stationary distribution and all these samples are skipped; this is called the burn-in phase. We will give an example to make this clearer.

Example 1
We consider process (2.1) with h(η, t, y) = 0, s(γ, t, y) = γy and b(ϕ, t, y) = ϕy. It is approximated by

Yi = Yi−1 (1 + ϕ ∆i) + γ Yi−1 √∆i ζi, i = 1, ..., n,

with Y0 = y0 and ζi ∼ N(0, 1). We consider the process with y0 = 1, ϕ = 2, γ = 1 in t0 = 0, t1 = 0.02, ..., t50 = 1.

With prior distribution ϕ ∼ N(2, 4) and fixed γ² = 1 for a simple MH algorithm for ϕ, the posterior is proportional to

p(ϕ|Y(n), γ²) ∝ ∏_{i=1}^{n} p(Yi|Yi−1, ϕ, γ²) · p(ϕ)   (2.3)

= ∏_{i=1}^{n} 1/√(2πγ²Y²_{i−1}∆i) exp( −(Yi − Yi−1(1 + ϕ∆i))² / (2γ²Y²_{i−1}∆i) ) · 1/√(2π·4) exp( −(ϕ − 2)² / (2·4) )

= ∏_{i=1}^{n} 1/(0.2√π Y_{i−1}) exp( −(Yi − Yi−1(1 + 0.02ϕ))² / (0.04 Y²_{i−1}) ) · 1/(2√(2π)) exp( −(ϕ − 2)² / 8 ).

We start the algorithm with ϕ∗_0 = 10, draw the new candidate from the normal distribution with mean equal to the last sample and compare the proposal standard deviations sd ∈ {5, 1, 0.1}. In Figure 1, we see the resulting Markov chains. The acceptance rate with the large standard deviation is 0.23, which is close to the optimal one calculated in Rosenthal (2011). The first 1 000 chain iterations can be seen in Figure 1(a) and there are constant phases in which no new candidate is accepted. In 1(b), the first 1 000 iterations with standard deviation equal to 1 are displayed. The acceptance rate is 0.67, which is larger than the optimal one. But with a close look at the chain, this result would also be good for the approximation of the posterior distribution. The stationary distribution is reached after 100 samples and the distribution support is well filled. With a look at Figures 1(d) and 1(e), displaying 10 000 iterations of the respective chains, the difference is very small. Another picture appears in Figures 1(c) and 1(f), which show the chains for the small proposal standard deviation 0.1. The acceptance rate is 0.96, which is large. To get an equally good approximation of the posterior as for the two other chains, many more iterations would have to be drawn and a high thinning rate would be necessary.
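For concreteness, a minimal R sketch of this random walk MH sampler for ϕ (a simplified stand-in for the actual implementation; the data vector Y, the step ∆i = 0.02 and the proposal standard deviation are assumed to be given, e.g. from the Euler sketch in Section 2.1) could read:

```r
## Minimal random walk MH sketch for phi in Example 1 (gamma^2 = 1 fixed).
## Y: observed series, dt: time step Delta_i, sd: proposal standard deviation.
log_post <- function(phi, Y, dt) {
  n <- length(Y)
  incr <- Y[-1] - Y[-n] * (1 + phi * dt)              # Yi - Yi-1 (1 + phi Delta_i)
  sum(dnorm(incr, mean = 0, sd = abs(Y[-n]) * sqrt(dt), log = TRUE)) +
    dnorm(phi, mean = 2, sd = 2, log = TRUE)          # prior N(2, 4)
}
mh_phi <- function(Y, dt, K = 10000, sd = 1, phi0 = 10) {
  chain <- numeric(K)
  phi <- phi0
  for (k in 1:K) {
    cand <- rnorm(1, phi, sd)                         # symmetric random walk proposal
    if (log(runif(1)) < log_post(cand, Y, dt) - log_post(phi, Y, dt)) phi <- cand
    chain[k] <- phi
  }
  chain
}
chain <- mh_phi(Y, dt = 0.02, sd = 1)
```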

[Figure 1: Markov chains based on the different proposal standard deviations, with stationary distribution (2.3) and starting value ϕ∗_0 = 10; panels (a)-(c): sd = 5, 1, 0.1 with 1 000 iterations, panels (d)-(f): sd = 5, 1, 0.1 with 10 000 iterations; x-axis: iteration, y-axis: ϕ.]

In the following, θ∗_1, ..., θ∗_K will always be understood as the resulting samples from the posterior, already reduced by the burn-in period and the thinning rate.

In the case of a high-dimensional vector θ, it might be difficult to find a proper proposal density for the MH algorithm to update all components simultaneously. In this case, the Gibbs sampler can be more suitable.

2.2.2. Gibbs Sampler

For a high dimensional parameter vector θ = (θ1, ..., θd), where each θj, j = 1, ..., d, can also be a vector itself, the idea is to sample iteratively from the full conditional posterior distributions of the components p(θj|Y(n), θ1, ..., θj−1, θj+1, ..., θd), j = 1, ..., d.


Choose starting values θ∗_{2,0}, ..., θ∗_{d,0} and for k = 1, ..., K draw

θ∗_{1,k} ∼ p(θ1|Y(n), θ∗_{2,k−1}, ..., θ∗_{d,k−1}),
θ∗_{2,k} ∼ p(θ2|Y(n), θ∗_{1,k}, θ∗_{3,k−1}, ..., θ∗_{d,k−1}),
...
θ∗_{d−1,k} ∼ p(θd−1|Y(n), θ∗_{1,k}, ..., θ∗_{d−2,k}, θ∗_{d,k−1}),
θ∗_{d,k} ∼ p(θd|Y(n), θ∗_{1,k}, ..., θ∗_{d−1,k}).

The resulting Markov chain θ∗_k = (θ∗_{1,k}, ..., θ∗_{d,k}), k = 1, ..., K, has stationary distribution p(θ|Y(n)), see Robert and Casella (2004), p. 372, or, for comparison, Carlin and Louis (2009).

2.2.3. Metropolis within Gibbs Algorithm

In the case of not explicitly available full conditional posteriors, one step of the Gibbs sampler can be conducted by a MH algorithm. The original work of Metropolis et al. (1953) introduced what we now call the Metropolis within Gibbs algorithm. Here, we restrict to the case of only one component θj, j ∈ {1, ..., d}, to be sampled by a MH step. Of course, this can be done for several components. In addition, for notation simplicity, we assume the jth component to be independent from the others in their prior distribution. This means p(θ) = p(θj) p(θ1, ..., θj−1, θj+1, ..., θd). We choose a proper proposal density q(·|θ∗_{1,k}, ..., θ∗_{j−1,k}, θ∗_{j,k−1}, ..., θ∗_{d,k−1}) for θj similar to the MH algorithm and starting values θ∗_{2,0}, ..., θ∗_{d,0}, if j = 1 also θ∗_{1,0}. For k = 1, ..., K draw

θ∗_{1,k} ∼ p(θ1|Y(n), θ∗_{2,k−1}, ..., θ∗_{d,k−1}),
...
θ∗_{j−1,k} ∼ p(θj−1|Y(n), θ∗_{1,k}, ..., θ∗_{j−2,k}, θ∗_{j,k−1}, ..., θ∗_{d,k−1}),

θ∗_j ∼ q(θj|θ∗_{1,k}, ..., θ∗_{j−1,k}, θ∗_{j,k−1}, ..., θ∗_{d,k−1}) and accept θ∗_{j,k} = θ∗_j with probability

ρ(θ∗_j, θ∗_{j,k−1}) = min( 1, [p(Y(n)|θ∗_{1,k}, ..., θ∗_{j−1,k}, θ∗_j, θ∗_{j+1,k−1}, ..., θ∗_{d,k−1}) · p(θ∗_j) · q(θ∗_{j,k−1}|θ∗_{1,k}, ..., θ∗_{j−1,k}, θ∗_j, θ∗_{j+1,k−1}, ..., θ∗_{d,k−1})] / [p(Y(n)|θ∗_{1,k}, ..., θ∗_{j−1,k}, θ∗_{j,k−1}, ..., θ∗_{d,k−1}) · p(θ∗_{j,k−1}) · q(θ∗_j|θ∗_{1,k}, ..., θ∗_{j−1,k}, θ∗_{j,k−1}, ..., θ∗_{d,k−1})] )

and θ∗_{j,k} = θ∗_{j,k−1} with probability 1 − ρ(θ∗_j, θ∗_{j,k−1}),

θ∗_{j+1,k} ∼ p(θj+1|Y(n), θ∗_{1,k}, ..., θ∗_{j,k}, θ∗_{j+2,k−1}, ..., θ∗_{d,k−1}),
...
θ∗_{d,k} ∼ p(θd|Y(n), θ∗_{1,k}, ..., θ∗_{d−1,k}).

See Robert and Casella (2004), p. 392, for further details.


2.2.4. Adaptive MCMC

As seen in the description of the MH algorithm in Section 2.2.1, the quality of the algorithm strongly depends on the proposal density. The remaining question is how to choose a suitable proposal variance. This variance controls the approximation quality of the distribution of interest. We have seen the effect in Figure 1. This problem is tackled in Rosenthal (2011) with an adaptive MH step using the proposal distribution N(θ∗_{k−1}, e^{2l}) with l as the logarithm of the standard deviation and θ∗_{k−1} the previous iteration state. This parameter is chosen so that the acceptance rate is close to the optimal one, which is 0.234 for a classical MH algorithm and 0.44 for the Metropolis within Gibbs sampler, see Rosenthal (2011). The idea is to divide the chain into batches, for example of 50 iterations. After every batch, the current acceptance rate is calculated. If it is outside a specific range, it is adapted, whereby we will choose [0.3, 0.6] in the following calculations. Rosenthal (2011) proposes to add, resp. subtract, an adaptation amount δ(N) = min(0.1, N^{−1/2}) to, resp. from, l after the Nth batch.
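A minimal sketch of this batch-wise adaptation rule in R, under the assumptions just stated (batch size 50, target range [0.3, 0.6]; the vector of acceptance indicators per batch is assumed to be tracked by the surrounding MH loop):

```r
## Sketch of the batch-wise adaptation of the log proposal standard deviation l.
## 'accepted' is assumed to be a logical vector of acceptance indicators of the current batch.
adapt_log_sd <- function(l, accepted, batch_index, lower = 0.3, upper = 0.6) {
  rate <- mean(accepted)                        # acceptance rate of the current batch
  delta <- min(0.1, 1 / sqrt(batch_index))      # adaptation amount delta(N)
  if (rate < lower) l <- l - delta              # too few acceptances: shrink the proposal sd
  if (rate > upper) l <- l + delta              # too many acceptances: widen the proposal sd
  l                                             # proposal standard deviation is then exp(l)
}
```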

The presented methods consider sampling from the posterior distribution of θ. The MH algorithm and the Gibbs sampler are very suitable in the case of easily and fast computable densities. We will present a predictive distribution which will also not be explicitly available and whose calculation is computationally costly. In the case of an expensive computation of the density of interest, the following algorithms are more suitable. Because we sample from the predictive density, we switch the notation from θ to y.

2.2.5. Rejection Sampling

One possibility to draw random samples from a distribution with density p(y) without Markov chains is to choose a so-called envelope (or proposal) density g(y), from which it is easy to draw, and accept or reject the samples. This envelope density has to fulfill the condition that there is some constant A so that p(y) ≤ A g(y) for all y ∈ supp p(·).

To draw one sample of a distribution with density p(y), conduct the following steps.

(i) Generate a realization u from the uniform distribution U(0, 1) and independently one sample y∗ from g(y).

(ii) If u ≤ p(y∗) / (A g(y∗)), accept y∗ as a sample from the distribution with density p; otherwise, reject the candidate.

One challenge of this algorithm is the choice of the density g(y). One possibility is to choose a candidate area [yl, yu], which should include the support of p(y), and set g(y) = 1/(yu − yl) · 1_{[yl, yu]}(y) equal to the density of the uniform distribution on the candidate area. With A = (yu − yl) · max_{y ∈ [yl, yu]} p(y), this density fulfills the condition for the algorithm. In Figure 2, we see an illustration of the algorithm. The black line marks the density of interest and in gray, we see the rejection area. If this is large, the algorithm gets inefficient, because many samples are rejected. The smaller the gray area is, the more efficient the algorithm is.

[Figure 2: Illustration of rejection sampling; black line: the density of interest, in gray: the rejection area on the candidate interval [yl, yu].]

Another possibility is to fix a vector of points yl = y1 < y2 < ... < yC = yu. Then, calculate p(y1), ..., p(yC) and

(i) generate a realization u from the uniform distribution U(0, max_{j=1,...,C} p(yj)) and independently one index j ∼ U{1, ..., C}.

(ii) If u ≤ p(yj), accept yj as a sample from the distribution with density p; otherwise, reject the candidate.

In the case of many samples to draw from the same distribution with density p, the second procedure can be much faster. We will investigate the runtimes later on. See Devroye (1986) for further details of the algorithm.
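A minimal R sketch of the first variant with a uniform envelope on [yl, yu] is given below; the target density dens and the bounds are user-supplied assumptions, and the density is assumed to be bounded on the interval.

```r
## Minimal rejection sampling sketch with a uniform envelope on [yl, yu].
## dens: (possibly unnormalized) target density; yl, yu: candidate area bounds.
reject_sample <- function(dens, yl, yu, n = 1) {
  A_g <- optimize(dens, c(yl, yu), maximum = TRUE)$objective  # A * g(y) = max of dens on [yl, yu]
  out <- numeric(0)
  while (length(out) < n) {
    y_star <- runif(1, yl, yu)                                # proposal from the uniform envelope
    if (runif(1) <= dens(y_star) / A_g) out <- c(out, y_star) # accept with probability p / (A g)
  }
  out
}
```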

2.2.6. Inversion Method

Another possibility to draw random samples from a continuous distribution with distribution function F(y) = ∫_{−∞}^{y} p(z) dz is the inversion method. For U ∼ U(0, 1), the random variable F^{−1}(U) has the distribution of interest. In many cases, this inverse function F^{−1} is not calculable. However, one possibility is to fix an interval [yl, yu] and to choose a vector of points yl = y1 < y2 < ... < yC = yu. Then, one calculates F(y1), ..., F(yC) and, for a realization u from the uniform distribution, min{y ∈ {y1, ..., yC} | F(y) ≥ u} can be seen as a sample from F. In the case of many samples to draw from the same distribution F, this is a very suitable choice.

In the case of only few, or one, samples to draw, we present the binary search algorithm. Here, we also fix an interval [yl, yu] and a realization u from the uniform distribution, and in each step F((yl + yu)/2) is compared to u.

This means, to draw one sample of distribution F, we choose a fineness degree fn and conduct the following.

(i) Generate a realization u from the uniform distribution U(0, 1).

(ii) As long as yu − yl > fn:
     if F((yl + yu)/2) < u, set yl = (yl + yu)/2,
     else set yu = (yl + yu)/2.

(iii) Return (yl + yu)/2 as a sample of distribution F.

In Figure 3, we see an illustration of the binary search algorithm.

[Figure 3: Illustration of the inversion method with binary search: in each step the interval [yl, yu] is halved according to the comparison of F((yl + yu)/2) with u.]

The problem of finding a suitable interval [yl, yu] is the same for rejection sampling and the inversion method. The only difference is that rejection sampling is very sensitive in terms of runtime when the interval is too large, because many candidates are rejected. See, for example, Devroye (1986) for further details of the algorithm.

For both sampling methods, rejection sampling as well as the inversion method, we have presented two versions. At first, one sample can be drawn by running the firstly presented steps (i) and (ii) of the rejection sampling, see Section 2.2.5, or steps (i)-(iii) of the inversion method. This can be repeated as often as desired. This is fast if only few samples have to be drawn, but can take long for a large number of samples. At second, based on a fixed candidate area, the distribution function, respectively the density, is calculated for a vector of candidates, among which a random sample is drawn. To draw only few samples, this can be inefficient, because it takes long to calculate the distribution function for a large vector of candidates. But to draw a large amount of samples, this is much faster than drawing each sample separately without a candidate area. We will investigate this effect later on.

2.3. Sensitivity Analysis

In an informative Bayesian analysis, one usually is interested in the influence of the prior parameters. In reality, it will often be the case that no prior knowledge is available and, therefore, it is important to know what happens to the estimations if the prior parameters are not correctly chosen. We again use Example 1. In Figure 4, we first motivate the choice of the normal distribution for the location parameter and the inverse gamma (IG) distribution for the variance parameter.

[Figure 4: Motivation of prior distribution families for Example 1; top: comparison of the likelihood with respect to ϕ, p(Y(n)|ϕ, γ² = 1) (a), and density of a suitable prior N(2, 1) distribution (b); bottom: comparison of the likelihood with respect to γ², p(Y(n)|ϕ = 2, γ²) (c), and density of a suitable prior IG(10, 9) distribution (d).]

We compare three sample sizes n ∈ {20, 100, 500} between t0 = 0 and tn = 1 and each two wrong prior means with three different prior variances. For ϕ ∼ N(mϕ, Vϕ), we run the estimations with mϕ ∈ {1, 3}, whereby the parameter used for the simulation is ϕ = 2, and Vϕ ∈ {0.01, 0.5, 4} to compare large, medium and small influence of the prior distribution. For γ² ∼ IG(aγ, bγ) – with chosen value γ² = 1 in the simulation – we compare the prior means 0.5 and 1.5 and, therefore, choose aγ ∈ {3, 20, 100} and bγ ∈ {(aγ − 1)·0.5, (aγ − 1)·1.5}. This can be seen with a look at the expectation and variance of the IG(aγ, bγ) distribution:

E[γ²] = bγ / (aγ − 1),   Var(γ²) = bγ² / ((aγ − 1)²(aγ − 2))
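As a small numerical check of this parameterization, one can compute the implied prior mean and standard deviation for the hyperparameter grid above (a sketch; the values are exactly those used in the sensitivity analysis):

```r
## Prior mean and standard deviation of IG(a, b) for the hyperparameter grid above.
ig_moments <- function(a, b) c(mean = b / (a - 1), sd = b / ((a - 1) * sqrt(a - 2)))
ig_moments(a = 3,   b = (3 - 1) * 0.5)    # prior mean 0.5, large standard deviation
ig_moments(a = 100, b = (100 - 1) * 1.5)  # prior mean 1.5, small standard deviation
```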

For each parameter setting, we sample from the posterior with a Metropolis within Gibbs sampler. Details of the estimation itself are given in Section 3. For the sensitivity analysis of ϕ, we fix the prior parameters of γ² to aγ = 3 and bγ = 2, which leads to prior mean and standard deviation equal to the chosen value γ² = 1. In Figure 6, we see the resulting posterior densities. In the left column, the prior mean is mϕ = 1 and in the right column, it is mϕ = 3.


[Figure 5: Prior densities for ϕ with Vϕ ∈ {0.01, 0.5, 4} (top: (a) N(1, Vϕ) densities, (b) N(3, Vϕ) densities) and for γ² with aγ ∈ {3, 20, 100} (bottom: (c) IG(aγ, (aγ − 1)·0.5) densities, (d) IG(aγ, (aγ − 1)·1.5) densities).]

For all considered sample sizes, the results are very similar. The short dotted line marks the chosen value ϕ = 2 used for the simulation. The results are mirrored with respect to that line. In all cases, we can see that the very small variance, illustrated by the black line, can fix the posterior to the prior mean location. But even the red line, Vϕ = 0.5, leads to a biased estimation. The green line, Vϕ = 4, leads to good results. Only in the case of sample size n = 20, which is very small for a process with autoregressive variance, the posterior location is not exactly correct. By comparison of the left and right plots, it becomes clear that this is a result of only one simulation, since both posterior densities have the same shape and the same location despite the different prior locations. The credibility interval would include the value ϕ = 2.

In the sensitivity analysis of γ², we fix mϕ = 2 and Vϕ = 4 for the same reason as above. In Figure 7, we see the resulting posterior densities. In the left column, we compare each of the prior variances bγ²/((aγ − 1)²(aγ − 2)) ∈ {0.0026, 0.0139, 0.25} and in the right column bγ²/((aγ − 1)²(aγ − 2)) ∈ {0.023, 0.125, 2.25}. We see that in all cases the small prior variance, i.e. aγ = 100, leads to biased estimates. For the small sample size as well as for n = 100, also the red line is influenced by the prior distribution.


[Figure 6: Posterior densities of ϕ for n = 20, 100, 500 with the corresponding prior densities in Figures 5(a) and 5(b); left: prior mean mϕ = 1, right: prior mean mϕ = 3; lines: Vϕ = 0.01, 0.5, 4, prior mean and chosen value.]

But in all cases, the green line, i.e. aγ = 3, shows good results. In the middle row, the posterior location does not lie at the chosen parameter γ² = 1. But a comparison of the support of the left and the right plot, both calculated with the same simulated realization of the sample Y(n), shows that this is due to the fact of only one simulation, similar to the upper row of Figure 6.

[Figure 7: Posterior densities of γ² for n = 20, 100, 500 with the corresponding prior densities in Figures 5(c) and 5(d); left: prior mean bγ/(aγ − 1) = 0.5, right: prior mean bγ/(aγ − 1) = 1.5; lines: aγ = 100, 20, 3, prior mean and chosen value.]

In summary, if no expert knowledge is available, the prior parameters will be chosen so that the expected value and the standard deviation are equal to each other, which would mean for the above example: Vϕ = ϕ² and aγ = 3. This will be done in the following investigations.


2.4. Bayesian Prediction

In Meeker and Escobar (1998), two different prediction problems are presented. Firstly, observations of one unit or individual are available and predictions of future observations for a new individual are of interest. This case is referred to as new-sample prediction. Secondly, future observations for the current individual have to be predicted. This is called within-sample prediction. We will use these names in the following as well.

There are several ways to predict a new observation variable Y∗ at time t∗. In the frequentist estimation, in many cases, a predictive distribution is not calculable and, therefore, trajectories are simulated from the model with point estimates plugged in, see, for example, Chiquet et al. (2009). In Vidoni (2004), a predictive density for discrete-time stochastic processes is calculated, but prediction is only made for the next observation Yn+1. In the Bayesian approach, it is common to take the samples from the posterior distribution, θ∗_1, ..., θ∗_K, and to simulate one trajectory with each of them. Altogether, K trajectories provide prediction samples, see, for example, Weinberg et al. (2007). If the density p(Y∗|θ) is explicitly available as it is, for example, in the case of an explicit solution of the SDE (2.1), the predictive distribution can be calculated directly by a sampling procedure using the density p(Yt∗|θ). This is presented in Blanke and Bosq (2015) for the Poisson and the Ornstein-Uhlenbeck (OU) process, in Jokiel-Rokita et al. (2014) for doubly stochastic Poisson processes, and in Dion, Hermann and Samson (2016) for the OU and the Cox-Ingersoll-Ross process within a mixed effects model. For a non-linear regression model see, for example, Yuan and Ji (2015); Robinson and Crowder (2000), which would be similar.

In the following, a novel approach to predict stochastic processes, based on the Euler approximation, is presented. This procedure is valuable for all processes defined by (2.1), including the estimation uncertainty as well as possible uncertainties from latent variables. We will first present an algorithm leading to a pointwise distribution of Y∗ and afterwards, a second algorithm yielding a vector of process variables.

In Bayesian prediction, one naturally has a predictive distribution and can derive the point prediction or prediction intervals from it. The predictive distribution for a point of interest Y∗ is given by

p(Y∗|Y(n)) = ∫ p(Y∗|Y(n), θ) · p(θ|Y(n)) dθ,

which often is not analytically available. But if we have samples θ∗_1, ..., θ∗_K ∼ p(θ|Y(n)) from the posterior distribution, we can approximate the predictive density by

p(Y∗|Y(n)) = ∫ p(Y∗|Y(n), θ) · p(θ|Y(n)) dθ ≈ (1/K) ∑_{k=1}^{K} p(Y∗|Y(n), θ∗_k).

In the case of within-sample prediction, the Markov property p(Y∗|Y(n), θ) = p(Y∗|Yn, θ) can be used. For a new-sample prediction, Y∗ is a point of a new series and p(Y∗|Y(n), θ) = p(Y∗|θ).


In the following, two algorithms for prediction are presented. The first algorithm considers the predictive distribution of one point of interest that can lie far away from the last observed value. In the case of a new series, this is equivalent to predicting a point that lies at a large distance from the starting point. Since we base the model on the SDE (2.1), the Euler approximation can perform poorly for large time differences. This problem will be tackled. A second algorithm is presented to sample a trajectory vector. Sampling from an arbitrary multivariate distribution is not self-evident. Both algorithms can be used equally for the prediction of a new series as well as for the prediction of the further development of the series. We declare the specific points with (ia) and (ib) in the algorithms.

Let us assume we want to predict the process in time point t∗ ≫ tn, respectively t∗ ≫ t0. For large time distances, the Euler approximation can be inaccurate. Therefore, the idea of the algorithm is to go to the value of interest in M steps and in each step to include the predictive distribution of the previous step. Therefore, we choose a sampling partition τ0 < τ1 < ... < τM = t∗ with τ0 ∈ {t0, tn}, for example

τm = m · ∆∗ + τ0, m = 0, ..., M, ∆∗ := (t∗ − τ0)/M.

Let p(Y∗_{m+1}|Y∗_m, θ) denote the transition density of the Euler approximated variables Y∗_0 ∈ {y0, Yn}, Y∗_1, ..., Y∗_M = Y∗ in τ0, τ1, ..., τM.

In the following, we denote with θ∗_1, ..., θ∗_K samples from the posterior distribution p(θ|Y(n)) for the models without latent variables. In the specific models with latent variables, the hierarchical and the jump models, the first step is to predict the latent variables, the random effect in the hierarchical model and the Poisson process in the jump models. For the jump diffusion, prediction of the Poisson process variables Nτ1, ..., NτM is needed as a first step, see for example Hermann et al. (2016a). Then, θ∗_1, ..., θ∗_K contain the posterior samples of the parameters as well as the prediction samples of the Poisson process variables. For the hierarchical model, for new-sample predictions, samples from the predictive distribution of the random effects are drawn; details are given in Section 3.2. In the following, θ∗_k, k = 1, ..., K, will denote the vector of posterior and predictive samples jointly. We sample from the pointwise predictive distribution by running the following algorithm.

Algorithm 1
Take samples θ∗_k ∼ p(θ|Y(n)), k = 1, ..., K, from the posterior distribution, respectively from the predictive distribution of latent variables.

(i) Set starting values
    (ia) Y∗(k)_0 := Yn, k = 1, ..., K: within-sample prediction,
    (ib) Y∗(k)_0 := y0, k = 1, ..., K: new-sample prediction.

(ii) For m = 1, ..., M draw K samples

Y∗(1)_m, ..., Y∗(K)_m ∼ (1/K) ∑_{k=1}^{K} p(Y∗_m | Y∗(k)_{m−1}, θ∗_k).


Sampling from this density can be conducted with the inversion method or rejection sampling as described before. The advantage of this algorithm is that we can use a fixed vector of candidates, calculate the density in the second step only once and draw the K samples vectorially, which makes it faster than methods that calculate the density separately for each sample, as explained at the end of Section 2.2.6. We will see the different runtimes later on.
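A minimal R sketch of Algorithm 1 for a pure diffusion (h = 0) is given below. It is a simplified stand-in, not the BaPreStoPro implementation: the posterior sample matrix theta, the transition density trans_dens and the candidate grid are assumptions of this sketch, and a simple discrete draw on the candidate grid replaces the inversion/rejection step.

```r
## Minimal sketch of Algorithm 1: pointwise predictive samples at the partition tau.
## theta: K x p matrix of posterior samples; trans_dens(y, y_prev, theta_k, dt) gives the
## Euler transition density, vectorized over y; grid: fixed candidate vector.
predict_pointwise <- function(theta, y_start, tau, grid, trans_dens) {
  K <- nrow(theta)
  Y <- rep(y_start, K)                            # step (i): starting values Y*(k)_0
  for (m in 2:length(tau)) {
    dt <- tau[m] - tau[m - 1]
    ## mixture density (1/K) sum_k p(y | Y*(k)_{m-1}, theta*_k), evaluated once on the grid
    dens <- rowMeans(sapply(1:K, function(k) trans_dens(grid, Y[k], theta[k, ], dt)))
    ## draw K samples vectorially from the grid, proportional to the mixture density
    Y <- sample(grid, K, replace = TRUE, prob = dens)
  }
  Y                                               # samples from p(Y*_M | Y_(n)) at t* = tau[M]
}
```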

Algorithm 1 yields samples from the predictive distribution p(Y∗_m|Y(n)), m = 1, ..., M, for each of the points τ1, ..., τM, but no trajectories.

This can be seen as follows. For M = 2 it is

Y∗(r)_1 ∼ (1/K) ∑_{k=1}^{K} p(Y∗_1|Y∗(k)_0, θ∗_k) ≈ ∫ p(Y∗_1|Y∗_0, θ) p(θ|Y(n)) dθ = p(Y∗_1|Y(n)), r = 1, ..., K,

for Y∗_0 ∈ {y0, Yn}. In the second step it is

Y∗(r)_2 ∼ (1/K) ∑_{k=1}^{K} p(Y∗_2|Y∗(k)_1, θ∗_k) ≈ ∫ p(Y∗_2|Y∗_1, θ) p(Y∗_1, θ|Y(n)) d(Y∗_1, θ) = p(Y∗_2|Y(n))

for r = 1, ..., K. There are three approximations in one step, as can be seen by

p(Y∗_2|Y(n)) = ∫ p(Y∗_2|Y∗_1, θ) p(Y∗_1, θ|Y(n)) d(Y∗_1, θ)
≈ ∫ p(Y∗_2|Y∗_1, θ) p(Y∗_1|Y(n)) p(θ|Y(n)) d(Y∗_1, θ)   (2.4)
≈ (1/(K1 K2)) ∑_{k1=1}^{K1} ∑_{k2=1}^{K2} p(Y∗_2|Y∗(k1)_1, θ∗_{k2})   (2.5)
≈ (1/K) ∑_{k=1}^{K} p(Y∗_2|Y∗(k)_1, θ∗_k).   (2.6)

Approximation (2.4) is related to the fact that p(Y∗_1, θ|Y(n)) = p(Y∗_1|θ, Y(n)) p(θ|Y(n)). In addition, it is

p(Y∗_1|θ, Y(n)) = p(Y∗_1, θ, Y(n)) / p(θ, Y(n)) = p(Y∗_1, θ, Y(n)) / (p(θ|Y(n)) p(Y(n)))
= [p(θ|Y∗_1, Y(n)) / p(θ|Y(n))] · [p(Y∗_1, Y(n)) / p(Y(n))]
= [p(θ|Y∗_1, Y(n)) / p(θ|Y(n))] · p(Y∗_1|Y(n)) ≈ p(Y∗_1|Y(n)).   (2.7)

For large n, the term p(θ|Y∗_1, Y(n)) / p(θ|Y(n)) is close to 1. In practice, the variables Y∗(k)_1, k = 1, ..., K, are even drawn from the predictive distribution based on p(θ|Y(n)) and, therefore, will not have additional information for the parameter.


Approximation (2.5) is explained by p(Y∗_1|Y(n)) ≈ (1/K1) ∑_{k1=1}^{K1} 1_{Y∗(k1)_1}(Y∗_1) and p(θ|Y(n)) ≈ (1/K2) ∑_{k2=1}^{K2} 1_{θ∗_{k2}}(θ), which can be arbitrarily good according to the choice of K1 and K2.

Approximation (2.6) is due to the fact that the expression (2.5) is computationally very costly. We will investigate this approximation through Example 1. In Figure 8, we see both densities, expression (2.5) and (2.6), for three different sample sizes K = K1 = K2 ∈ {10, 100, 1000}. For K = 10, a small difference is identifiable, but even for K = 100, the difference is small. In practice, even sample size K = 1000 is small for the approximation of the density, and there is no difference between the two curves. The runtimes for drawing K samples from the displayed densities are calculated ten times with an Intel(R) Core(TM) i5-3470 CPU (3.20GHz), 8 GB RAM, Windows 7 64bit personal computer. The average times are displayed in Table 1. For K = 10, the difference is negligible. But for growing sample size, this runtime saving decides over the feasibility of the algorithm.

[Figure 8: Investigation of approximation (2.6) with Example 1; dotted lines: the double sum (1/(K1 K2)) ∑_{k1} ∑_{k2} p(Y∗_2|Y∗(k1)_1, θ∗_{k2}), solid lines: the single sum (1/K) ∑_k p(Y∗_2|Y∗(k)_1, θ∗_k), for K = K1 = K2 ∈ {10, 100, 1000}.]

Table 1: Overview of the runtimes in seconds for expression (2.5) and expression (2.6), averaged over 10 runs.

                      K = K1 = K2 = 10   K = K1 = K2 = 100   K = K1 = K2 = 1000
    expression (2.5)  0.038              1.694               199.136
    expression (2.6)  0.021              0.040               0.187

Altogether, the approximations (2.4)-(2.6) are justified and Y∗(1)_2, ..., Y∗(K)_2 can be seen as samples from p(Y∗_2|Y(n)). For arbitrary M, we can assume Y∗(k)_{M−1} ∼ p(Y∗_{M−1}|Y(n)), k = 1, ..., K, and by induction

Y∗(r)_M ∼ (1/K) ∑_{k=1}^{K} p(Y∗_M|Y∗(k)_{M−1}, θ∗_k) ≈ ∫ p(Y∗_M|Y∗_{M−1}, θ) p(Y∗_{M−1}, θ|Y(n)) d(Y∗_{M−1}, θ) = p(Y∗_M|Y(n))

follows for r = 1, ..., K with the same approximations as described before.

The following algorithm is made for the case that trajectories are the value of interest. Here, it is possible not to draw exactly M samples from the trajectory, but to choose a critical value yc and to stop drawing when it is reached. In addition, it is possible to draw L samples with L ≠ K.

Algorithm 2
Take samples θ∗_k ∼ p(θ|Y(n)), k = 1, ..., K, from the posterior distribution, respectively from the predictive distribution of latent variables. For l = 1, ..., L repeat the following steps.

(i) Set m = 1 and
    (ia) Y∗(l)_0 := Yn: within-sample prediction,
    (ib) Y∗(l)_0 := y0: new-sample prediction.

(ii) For m = 1, 2, ... draw one sample

Y∗(l)_m ∼ (1/K) ∑_{k=1}^{K} p(Y∗_m|Y∗(l)_{m−1}, θ∗_k).
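A minimal R sketch of Algorithm 2 under the same assumptions as the sketch of Algorithm 1 above (posterior matrix theta, transition density trans_dens, candidate grid; a discrete draw on the grid replaces the inversion/rejection step):

```r
## Minimal sketch of Algorithm 2: L predictive trajectories, each step drawn from
## the posterior-mixture transition density.
predict_trajectories <- function(theta, y_start, tau, grid, trans_dens, L = 100) {
  K <- nrow(theta)
  traj <- matrix(NA, nrow = L, ncol = length(tau))
  traj[, 1] <- y_start
  for (l in 1:L) {
    for (m in 2:length(tau)) {
      dt <- tau[m] - tau[m - 1]
      dens <- rowMeans(sapply(1:K, function(k)
        trans_dens(grid, traj[l, m - 1], theta[k, ], dt)))
      traj[l, m] <- sample(grid, 1, prob = dens)   # one draw from the mixture density
    }
  }
  traj
}
```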

For notational simplicity, in the following, we restrict to the case of M trajectory entries.

The reason why this procedure leads to L independent samples of (Y∗_1, ..., Y∗_M) can be seen as follows. At first, it is

p(Y∗_m|Y∗(l)_{m−1}, Y(n)) = ∫ p(Y∗_m|Y∗(l)_{m−1}, θ) p(θ|Y∗(l)_{m−1}, Y(n)) dθ
≈ ∫ p(Y∗_m|Y∗(l)_{m−1}, θ) p(θ|Y(n)) dθ   (2.8)
≈ (1/K) ∑_{k=1}^{K} p(Y∗_m|Y∗(l)_{m−1}, θ∗_k), m = 1, ..., M.

The first equation is just the Markov property p(Y∗_m|Y∗(l)_{m−1}, Y(n), θ) = p(Y∗_m|Y∗(l)_{m−1}, θ). The last approximation is the same as in (2.5). Approximation (2.8) is the same as in (2.7) and, therefore, the same approximation as in (2.4). For large n, one sample Y∗(l)_{m−1} does not affect the posterior. Moreover, Y∗(l)_{m−1} is itself drawn from the predictive distribution based on the same posterior samples. Therefore, even in finite sample sizes, the effect is negligible.

Afterwards, in iteration l, it is approximated

Y∗(l)_1 ∼ p(Y∗_1|Y(n)),
Y∗(l)_2 ∼ p(Y∗_2|Y∗(l)_1, Y(n)),
...
Y∗(l)_M ∼ p(Y∗_M|Y∗(l)_{M−1}, Y(n)).

Using the Markov property, the joint density is

p(Y∗_1, ..., Y∗_M|Y(n)) = p(Y∗_1|Y(n)) · p(Y∗_2|Y∗_1, Y(n)) · ... · p(Y∗_M|Y∗_{M−1}, Y(n))

and we get L independent samples from p(Y∗_1, ..., Y∗_M|Y(n)) with Algorithm 2.


To see the difference between the two sampling Algorithms 1 and 2, we will compare them in the setting of Example 1.

We run the following simulation study. 500 series with n = 20 are simulated to have a high estimation uncertainty. To simulate time-continuity, between each two time points, 99 equidistant points are inserted, the trajectories from the model are drawn, and afterwards, every 100th point is taken as an observation. This means, for t0 = 0, t1 = 0.05, ..., t20 = 1 and ∆ = ∆i = 0.05, we simulate the series Ỹ0, Ỹ1, ..., Ỹ2000 in t̃0 = 0, t̃1 = 0.0005, ..., t̃2000 = 1 with ∆̃ = 0.0005. Afterwards, Y0 = Ỹ0, Y1 = Ỹ100, ..., Y20 = Ỹ2000 form the observed variables. See Iacus (2008); Phillips (1973) for further details. In Table 2, an overview of the various time distances of the Euler approximation, as used for simulation, estimation and prediction, can be found.

Table 2: Overview of the time distances used for the Euler approximation in the simulation of the data series, the observation variables, the prediction and the simulation of the new series.

               ∆i for        ∆i of observation   ∆∗i for        ∆i for
               simulation    variables           prediction     new series
    Figure 9   0.0005        0.05                0.05           0.0005
    Figure 11  0.0005        0.05                0.05           0.05/0.0005
    Figure 14  0.0005        0.05                0.05           0.05/0.005/0.0005/0.00005
    Figure 15  0.000005      0.05                0.05           0.05/0.000005
    Figure 16  0.02          0.02                see (2.11)     0.02
    Figure 17  0.02          0.02                see (2.11)     0.02
    Figure 18  0.000002      0.02                see (2.11)     0.000002
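As an illustration of this simulation design, a minimal self-contained R sketch of the fine-grid simulation and subsampling for one series of Example 1 (the parameter values are those used in the study above):

```r
## Sketch of the quasi-continuous simulation design: simulate Example 1 on a fine
## grid (Delta = 0.0005) and keep every 100th point as observation (Delta_i = 0.05).
set.seed(2)
dt_fine <- 0.0005
phi <- 2; gamma <- 1
Y_fine <- numeric(2001); Y_fine[1] <- 1
for (i in 2:2001)
  Y_fine[i] <- Y_fine[i - 1] * (1 + phi * dt_fine) +
    gamma * Y_fine[i - 1] * sqrt(dt_fine) * rnorm(1)
obs <- Y_fine[seq(1, 2001, by = 100)]   # 21 observed variables Y_0, ..., Y_20
```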

Afterwards, the parameters are estimated; information will be given in Section 3. With K = 10 000 samples from the posterior distribution, we will compare four prediction methods. The first two are the ones presented in Algorithms 1 and 2. The third method is simple sampling from the model based on the point estimation, as often used in the frequentist approach. This procedure fits the definition of naive prediction in Meeker and Escobar (1998) and will, therefore, be named naive sampling in the following. In detail, we run the following algorithm.

Algorithm 3 (Naive sampling)
Let θ̄ = (1/K) ∑_{k=1}^{K} θ∗_k be the approximated posterior mean. For l = 1, ..., L repeat the following.

(i) Set m = 1 and
    (ia) Y∗(l)_0 := Yn: within-sample prediction,
    (ib) Y∗(l)_0 := y0: new-sample prediction.

(ii) For m = 1, 2, ... draw one sample

Y∗(l)_m ∼ p(Y∗_m|Y∗(l)_{m−1}, θ̄).


The fourth one is the straightforward sampling common in Bayesian analysis, where each of the K posterior samples is taken to simulate one trajectory from the model. All K trajectories would also form a prediction, where the pointwise mean or quantiles could be used for point prediction or prediction intervals. We will name it posterior sampling in the following.

Algorithm 4 (Posterior sampling)
Take samples θ*_k ∼ p(θ | Y(n)), k = 1, ..., K, from the posterior distribution, respectively from the predictive distribution of the latent variables. For l = 1, ..., L ≤ K repeat the following.

(i) Set m = 1 and
  (ia) Y*_0^(l) := Yn: within-sample prediction,
  (ib) Y*_0^(l) := y0: new-sample prediction.

(ii) For m = 1, 2, ... draw one sample Y*_m^(l) ∼ p(Y*_m | Y*_{m−1}^(l), θ*_l).
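For illustration, the following minimal R sketch contrasts the naive sampling (Algorithm 3) with the posterior sampling (Algorithm 4) for the Euler scheme of Example 1, i.e. b(ϕ, t, y) = ϕy and s(γ, t, y) = γy as implied by the explicit solution (2.10). It is not the implementation used in this thesis; the posterior samples and all numerical values are illustrative stand-ins.

```r
## Illustrative stand-ins for K posterior samples of (phi, gamma^2) and the last observation
set.seed(1)
K <- 1000; L <- 1000
phi    <- rnorm(K, 2, 0.1)
gamma2 <- 1 / rgamma(K, 50, 10)
y_n    <- 5                      # last observed value, start of within-sample prediction
M      <- 20; dt <- 0.05         # prediction grid tau_1, ..., tau_M with step dt

## Euler path for b(phi, t, y) = phi*y, s(gamma, t, y) = gamma*y
euler_path <- function(y0, phi, gamma2, dt, M) {
  y <- numeric(M); y_prev <- y0
  for (m in 1:M) {
    y[m] <- rnorm(1, y_prev + phi * y_prev * dt, sqrt(gamma2) * y_prev * sqrt(dt))
    y_prev <- y[m]
  }
  y
}

## Algorithm 3 (naive sampling): plug in the posterior means, simulate L trajectories
naive <- replicate(L, euler_path(y_n, mean(phi), mean(gamma2), dt, M))

## Algorithm 4 (posterior sampling): one trajectory per posterior sample
post <- sapply(1:K, function(k) euler_path(y_n, phi[k], gamma2[k], dt, M))

## pointwise 95% prediction intervals (rows: lower/upper, columns: tau_1, ..., tau_M)
ci_naive <- apply(naive, 1, quantile, probs = c(0.025, 0.975))
ci_post  <- apply(post,  1, quantile, probs = c(0.025, 0.975))
```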

We need a criterion for the quality of the procedures. The coverage rate is an obvious criterion, since the level has to be kept, but the interval size is also of interest. Gneiting and Raftery (2007) propose an interval score as a combination of the interval size, which should be small, and the coverage rate, which should be equal to or larger than the level 1 − α. The score is given by

S(l, u, y) = u − l + (2/α)(l − y) 1_(−∞,l)(y) + (2/α)(y − u) 1_(u,∞)(y).   (2.9)

Here, l denotes the lower and u the upper bound of the credible or prediction interval and y the true, i.e. the observed, value. In the case of a non-covering interval, the deviation from the respective bound is heavily penalized, namely by the deviation itself multiplied by 2/α. Throughout this thesis, we use α = 0.05, which means that the difference between the observed value and the interval bound is penalized with the factor 2/α = 40.
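A direct R implementation of (2.9) is short; the interval bounds and observations in the two example calls are made up.

```r
## Interval score (2.9) for lower bound l, upper bound u and observed value y
interval_score <- function(l, u, y, alpha = 0.05) {
  (u - l) + (2 / alpha) * (l - y) * (y < l) + (2 / alpha) * (y - u) * (y > u)
}
interval_score(l = 1.2, u = 3.4, y = 2.0)  # covering interval: score = width 2.2
interval_score(l = 1.2, u = 3.4, y = 3.6)  # non-covering: 2.2 + 40 * 0.2 = 10.2
```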

For the prediction quality, we simulate a set of 5000 new series in the same way as described above, i.e. with time distance ∆/100, and investigate for each of the 500 prediction intervals over time whether they include the new series or not. All results are averaged and can be seen in Figure 9. The coverage rates are depicted in Figure 9(a), where we can see that the blue line, i.e. the posterior sampling, leads to the highest rates, in the end even higher than the level 0.95. The proposed methods, Algorithm 1 in black and Algorithm 2 in red, perform better than the naive sampling, but do not hold the level. As explained before, we are interested in precise prediction, which also takes the interval sizes into account. For that reason, we have a look at the interval score (2.9). The intervals of the posterior sampling are so large that they lead to a higher score than the other methods despite the higher coverage rate.

Figure 10 shows two examples of the simulated series with the corresponding prediction intervals for the four methods. In gray, the 5000 simulated new series are shown. All methods have the lower bound in common, but the upper interval bounds differ.


Figure 9: Example 1: Comparison of the prediction methods in Algorithms 1 and 2, the naive sampling, Algorithm 3, and the posterior sampling, Algorithm 4, each depicted for time points t1 = 0.05, ..., t20 = 1. Panels: (a) coverage rates, (b) average interval score over time.

The posterior sampling, i.e. the blue line, leads to the largest intervals, which is not surprising, since we simulate with all the posterior samples instead of integrating them out. The difference between the red and the green lines, i.e. Algorithm 2 and the naive sampling, is small, and the lines are almost parallel. The procedures are comparable; the only difference is that Algorithm 2 includes the estimation uncertainty, whereas the naive sampling only takes the point estimates into account, so that the only uncertainty stems from the model. The smallest intervals are given by the black line, i.e. Algorithm 1, which is also not surprising, because it calculates pointwise intervals instead of simultaneous bands.

Figure 10: Prediction intervals with Algorithms 1 and 2, the naive sampling, Algorithm 3, and the posterior sampling, Algorithm 4, for two exemplary series; in gray: the 5000 simulated new series.

With the Euler approximation, we changed the variables for the inference. In the following, we compare what happens if we simulate new series from the model assumed for the estimation, without any simulation of time-continuity. That means, we simulate the series only in the time points t0 = 0, t1 = 0.05, ..., t20 = 1. The resulting coverage rates are shown in Figure 11 in solid lines. The difference between the algorithms stays the same, but the coverage rate is above the level in the beginning and decreases afterwards.


For comparison, the coverage rates from Figure 9(a) are displayed in dotted lines. We will investigate the influence of the Euler approximation in detail in the next section.

Figure 11: Comparison of the coverage rates of Figure 9(a) (dotted lines) to the coverage rates based on the Euler approximation variables in time points t0 = 0, t1 = 0.05, ..., t20 = 1 without simulation of time-continuity (solid lines).

Since the methods have to be applicable in practice, runtimes are an important factor when deciding on a procedure. For Example 1, we test the runtimes with n = 50 and the proposed sampling algorithms. We take K ∈ {1000, 10 000} samples from the posterior and calculate, in each step, K = L ∈ {1000, 10 000} simulations from the predictive distribution to have a fair comparison. In addition, in each procedure, the candidate area to sample Y*_{m+1} is chosen in the same way by

[ min_{k=1,...,K} Y*_m^(k) + b(ϕ̄, τm, Ȳ*_m)∆* − 5 s(√γ̄², τm, Ȳ*_m),  max_{k=1,...,K} Y*_m^(k) + b(ϕ̄, τm, Ȳ*_m)∆* + 5 s(√γ̄², τm, Ȳ*_m) ]

with ϕ̄ and γ̄² the point estimates, i.e. the posterior means, and Ȳ*_m = (1/K) ∑_{k=1}^K Y*_m^(k).

As explained in Section 2.2.6, we compare the vectorial sampling to the methods drawing separately, each for the rejection sampling and the inversion method. As mentioned, the vectorial version is inefficient when only one sample is drawn; therefore, it is not used for Algorithm 2. In the case of vectorial sampling, we fix for comparison C ∈ {1000, 5000} equidistant points yl = y1 < y2 < ... < yC = yu between the lower and upper bound of the interval. In the other case, we choose a fineness degree of (yu − yl)/C to have a fair comparison. Again, the calculations are made with an Intel(R) Core(TM) i5-3470 CPU (3.20GHz), 8 GB RAM, Windows 7 64bit personal computer.
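The following minimal R sketch illustrates the vectorial inversion method on such a candidate grid for one prediction step of Algorithm 1, assuming the normal Euler transition of Example 1 (b(ϕ, t, y) = ϕy, s(γ, t, y) = γy); the predictive and posterior samples are illustrative stand-ins.

```r
set.seed(1)
K <- 1000; dt <- 0.05
y_m    <- rnorm(K, 5, 0.3)                 # current predictive samples Y*_m^(k)
phi    <- rnorm(K, 2, 0.1)                 # posterior samples of phi
gamma2 <- 1 / rgamma(K, 50, 10)            # posterior samples of gamma^2
mu_k   <- y_m + phi * y_m * dt             # Euler means
sd_k   <- sqrt(gamma2) * y_m * sqrt(dt)    # Euler standard deviations

## candidate grid with C equidistant points between the lower and upper bound
C  <- 1000
yl <- min(mu_k) - 5 * mean(sd_k); yu <- max(mu_k) + 5 * mean(sd_k)
grid <- seq(yl, yu, length.out = C)

## pointwise predictive density (1/K) sum_k N(grid; mu_k, sd_k) and its discretized cdf
dens <- rowMeans(sapply(1:K, function(k) dnorm(grid, mu_k[k], sd_k[k])))
cdf  <- cumsum(dens) / sum(dens)

## inversion: K draws of Y*_{m+1} at once from the discretized distribution function
y_next <- grid[findInterval(runif(K), cdf) + 1]
```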

In Table 3, the resulting runtimes are displayed, averaged over 10 runs. We clearly see the advantage of Algorithm 1, which becomes apparent in the vectorial sampling. In brackets, the variances between the ten runs of the algorithms are given. As expected, the rejection sampling has higher variances, since the quality of the algorithm strongly depends on the candidate area: many samples are rejected if the candidate area is too large.


Table 3: Overview of the runtimes in seconds of the two presented algorithms for the rejection sampling (RS) and the inversion method (IM), averaged over 10 runs. In brackets: variances.

                  Algorithm 1                                                         Algorithm 2
                  vectorial
                  C = 1000      C = 5000      C = 1000         C = 5000         C = 1000        C = 5000
RS   K = 10^3     10.1 (0.04)   39.5 (0.03)   144.1 (42.7)     139.6 (6.4)      149.2 (22.3)    140.5 (15.8)
     K = 10^4     90.6 (1.4)    320 (0.7)     12 544 (35 493)  12 247 (141 312) 12 225 (50 399) 11 886 (49 696)
IM   K = 10^3     8.5 (0.01)    40.9 (0.4)    93.6 (3.4)       116.7 (0.8)      89.9 (0.8)      109.9 (0.02)
     K = 10^4     66.2 (0.1)    313 (0.02)    7668 (521.4)     9845 (23 383)    6979 (4452)     8853 (2880)

2.5. Influence of the Euler Approximation

In the following, we will consider the strengths and weaknesses of the Euler approximation. Originally, we assume a continuous-time process, which we approximate by a time series with possibly non-equidistant time points. Of course, both models behave differently. In the case of an explicit solution, this approximation should be avoided and inference should be made with the true transition density. However, in many cases, the model is defined by an SDE without an explicit solution, or a solution exists but is not easy to calculate. In these cases, the only possibility is an approximation. More sophisticated schemes, such as the Milstein or the Runge-Kutta scheme, exist, see for example Iacus (2008); Fuchs (2013); Bruti-Liberati and Platen (2007), but in this thesis the Euler approximation is used. From the theoretical point of view, the Euler scheme can be seen as a starting point, and it can be future work to extend the theory to other schemes. However, for the data sets considered in this work, inference and prediction based on the Euler approximation work well.

First, we will compare, by means of the first example, the explicit solution to the Euler approximated variables simulated with different time distances. Second, we have a look at the results of the simulation study of the last section and consider the quality of the prediction intervals for the Euler variables with different time distances. Third, we investigate what happens if the time distances of the estimation variables and of the prediction variables differ.


Simulating the Euler error

The process in Example 1 has the explicit solution

Yt = y0 exp(ϕt − (γ²/2)t + γWt),   (2.10)

also known as geometric Brownian motion, see, e.g., Øksendal (2003). Therefore, we can compare the process simulated with the explicit solution to the process simulated with the Euler approximation based on different time distances. We investigate the process in time points with ∆i = 0.01 and compare the Euler approximation variables in (2.2) based on the time distances ∆i/10, ∆i/100, ∆i/1000 and ∆i/10 000. We simulate 100 series of the Brownian motion Wt, t ∈ [0, 1], in the time points with distance ∆i/10 000, so that all approximations are based on the same realizations.
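As a small illustration of this comparison (not the simulation study itself), the R sketch below evaluates the explicit solution (2.10) and the Euler scheme on a coarse and on a fine grid driven by the same Brownian path; the grid sizes and parameter values are illustrative.

```r
set.seed(1)
y0 <- 1; phi <- 2; gamma <- 0.5
dt_fine <- 0.001                            # finest grid
t_fine  <- seq(0, 1, by = dt_fine)
W <- c(0, cumsum(rnorm(length(t_fine) - 1, sd = sqrt(dt_fine))))  # Brownian path

## explicit solution (2.10) on the fine grid
exact <- y0 * exp(phi * t_fine - gamma^2 / 2 * t_fine + gamma * W)

## Euler scheme on every 'step'-th point of the fine grid, same Brownian increments
euler_on_grid <- function(step) {
  idx <- seq(1, length(t_fine), by = step)
  y <- numeric(length(idx)); y[1] <- y0
  for (i in 2:length(idx)) {
    dt <- t_fine[idx[i]] - t_fine[idx[i - 1]]
    dW <- W[idx[i]] - W[idx[i - 1]]
    y[i] <- y[i - 1] + phi * y[i - 1] * dt + gamma * y[i - 1] * dW
  }
  y
}
eul_coarse <- euler_on_grid(10)   # time distance 0.01
eul_fine   <- euler_on_grid(1)    # time distance 0.001

## squared Euler error at the coarse time points
err <- (exact[seq(1, length(t_fine), by = 10)] - eul_coarse)^2
```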

In Figure 12, we see the average of the squared differences between (2.10) and the Euler approximated process in the time points t1 = 0.01, ..., t100 = 1. The differences between the squared errors are large. However, Figure 13 shows one example of the 100 repetitions of Figure 12.

Figure 12: Squared differences between the continuous-time process (2.10) and the Euler approximated variables based on the time distances ∆i = 0.01, ∆i/10 = 0.01·10^−1, ∆i/100 = 0.01·10^−2, ∆i/1000 = 0.01·10^−3 and ∆i/10 000 = 0.01·10^−4, averaged over 100 repetitions, in time points t1 = 0.01, ..., t100 = 1.

There, the lines are visually hard to distinguish. This means that, for a real data set where no explicit solution of the SDE is available, the Euler scheme is a reasonable tool to fit the model approximately to the data.

Quality of prediction with Euler approximated variables for continuous-time process

As we have seen in the simulation study of the last section, the Euler approximation has a strong influence on the prediction quality. Similar to Figure 11, Figure 14 shows the coverage rates of the four prediction methods if we draw the new series from the process in Example 1 with time distances ∆i, ∆i/10, ∆i/100 (the grid used for the simulations for estimation and prediction) and ∆i/1000. We can see that the coverage rates for the simulations drawn with the same Euler variables as used for estimation and prediction are the highest, while there are nearly no differences between the other three cases.


Figure 13: The Euler approximated variables, based on the time distances ∆i = 0.01, ∆i/10 = 0.01·10^−1, ∆i/100 = 0.01·10^−2, ∆i/1000 = 0.01·10^−3 and ∆i/10 000 = 0.01·10^−4, of one of the 100 repetitions of Figure 12, in time points t1 = 0.01, ..., t100 = 1.

Figure 14: Comparison of the prediction interval coverage rates for the new series simulated on different grids to simulate time-continuity, each with Algorithms 1 and 2, the naive sampling, Algorithm 3, and the posterior sampling, Algorithm 4, depicted for time points t1 = 0.05, ..., t20 = 1. Panels: (a) Algorithm 1, (b) Algorithm 2, (c) Algorithm 3, naive sampling, (d) Algorithm 4, posterior sampling.

The same simulation was made for a similar process with the same drift function but with a constant variance function. This process results in a linear curve, for which the Euler approximation


works equally well for all time distances. Therefore, the coverage rates are equal, regardless of whether the new series are simulated with the time distances of the simulation or with those of the estimation.

To investigate this effect more closely, we try a non-linear drift function:

Example 2
We consider the process (2.1) with h(η, t, y) = 0, s(γ, t, y) = γ and b(ϕ, t, y) = ϕ2 y (1 − y/ϕ1), which is known as the logistic curve in the literature, see, for example, Donnet et al. (2010). The process is approximated by

Yi = Yi−1 + ϕ2 Yi−1 (1 − Yi−1/ϕ1) ∆i + γ √∆i ζi,  i = 1, ..., n,

with Y0 = y0 and ζi ∼ N(0, 1). We simulate random samples of the process with y0 = 1, ϕ = (10, 7), γ = 1/2 and t ∈ [0, 1].
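A minimal R sketch of this Euler simulation with the values just given (the grid size n = 20 is an illustrative choice):

```r
set.seed(1)
phi <- c(10, 7); gamma <- 0.5; y0 <- 1
n  <- 20; dt <- 1 / n
y  <- numeric(n + 1); y[1] <- y0
for (i in 2:(n + 1)) {
  drift <- phi[2] * y[i - 1] * (1 - y[i - 1] / phi[1])   # b(phi, t, y)
  y[i]  <- y[i - 1] + drift * dt + gamma * sqrt(dt) * rnorm(1)
}
```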

We repeated the same simulation study as for Example 1, but with a 10 000 times finer grid for the time points to simulate time-continuity, see also Table 2. In Figure 15, we can clearly see the difference between the coverage rates for the 5000 simulated data series: in solid lines with the time distances of the observation variables, and in dotted lines with the time distances used for the simulation, that means 10 000 times finer than the observation variables to simulate the continuous-time process.

Figure 15: Comparison of the prediction interval coverage rates for the new series simulated on the grid used for simulation (dotted lines) and on the grid of the estimation variables (solid lines) in the non-linear Example 2, each with Algorithms 1 and 2, the naive sampling, Algorithm 3, and the posterior sampling, Algorithm 4, depicted for time points t1 = 0.05, ..., t20 = 1.

Of course, in real data examples, one can only assume a model from which the data stem. Originally, we started with a continuous-time stochastic process, which can only be simulated by the Euler approximation with very small time distances, as we have done for the simulations.


But the process is only observed discretely. The described algorithms are developed to achieve a finer grid for the prediction. The question is whether the prediction quality increases with smaller or larger time distances compared to those of the data used for estimation.

Different time distances in estimation and prediction

In the following, we want to investigate the consequences of different time distances in the estimation and the prediction. If the time distances are equal, the Euler approximation is just a model change, and prediction works with the estimates of the same model. But in the case of non-equidistant time points, or of predictions made with other time differences ∆* than those of the data, ∆i, i = 1, ..., n, this could be a problem.

We run a simulation study similar to the one before, but with n = 50. Here, we only consider Algorithm 1, since the Euler approximation is the same for all methods, Algorithms 1 to 4, and, therefore, the results would be virtually the same.

We will investigate the topic for two examples in the following. Firstly, the process of Example 1, which is a diffusion with linear drift and autoregressive variance, will be used. Secondly, Example 2 with non-linear drift function and constant variance will be considered.

We here simulate 100 series with n = 50 and t0 = 0, ..., tn = 1, i.e. ∆i = ∆ = 0.02, but without a smaller grid for the simulation of time-continuity. Hence, we should only see the effect of the different time distances for estimation and prediction without any side effects. We compare the following time differences for prediction:

∆* ∈ {∆/4, 2∆/7, ∆/3, 2∆/5, ∆/2, 2∆/3, ∆, 1.5∆, 2∆, 2.5∆, 3∆, 3.5∆, 4∆}.   (2.11)

After calculating the prediction intervals, we draw 5000 new series with the time distance ∆ of the estimation variables. In Figures 16 and 17, the average coverage rates over all 100 prediction intervals and all 5000 new series are depicted for the two examples.

In Figure 16, we see the resulting coverage rates and the average interval score for Example 1. In red, ∆* = ∆ = 0.02 is marked. The darker lines are for smaller time differences, the lighter lines for larger time differences. The prediction intervals based on the larger time distances have higher coverage rates and smaller interval scores in the second half of the time range.

For Example 2, a totally different picture occurs in Figure 17. The non-linear drift function has a large effect. Large time distances lead to a small coverage rate, which lies clearly below the level in the first half of the time range. In all cases, the red line lies in the middle and close to the level. Therefore, we recommend taking the time difference of the observation series. If the observations are not equidistant, a small time distance is suitable for prediction, since the dark lines do not lie far away from the red one.


Figure 16: Investigation of different time distances for prediction with Example 1 (Algorithm 1); legend: values of ∆*; red line: ∆* = 0.02, which is the time distance of the simulations used for estimation. In (a): amount of prediction intervals covering the 5000 new simulated series; in (b): the corresponding interval scores, both in time points t0 = 0, t1 = 0.02, ..., t50 = 1.

Until now, we only simulated the Euler approximation variables without simulating time-continuity. It is interesting whether the results of Figures 16 and 17 change if we simulate the process with smaller time distances. In Figure 18, we see the results of the same simulation study as for Figure 17, but with simulation of the process on a 10 000 times finer grid, see also Table 2. The coverage rates are all a bit smaller than in Figure 17, which is consistent with the results from Section 2.4.


Figure 17: Investigation of different time distances for prediction with Example 2 (Algorithm 1); legend: values of ∆*; red line: ∆* = 0.02, which is the time distance of the simulations used for estimation. In (a): amount of prediction intervals covering the 5000 new simulated series; in (b): the corresponding interval scores, both in time points t0 = 0, t1 = 0.02, ..., t50 = 1.

Relative to each other, however, the curves provide the same picture as Figure 17.

To conclude the results of this section: the prediction for the continuous-time process becomes better with small time distances, but, on the other hand, the prediction quality is best if the time distances in the prediction are the same as in the simulated data used for estimation.


Figure 18: Repeated simulation study from Figure 17, but with simulations made on a 10 000 times finer grid than the observation variables. Panels: (a) coverage rates, (b) average interval scores.

In summary, a good solution could be a data augmentation approach for the estimation: assume Euler variables based on smaller time distances, which are augmented inside a Gibbs sampler since they are not completely observed; afterwards, the same Euler variables are used for the prediction procedure. A suitable data augmentation scheme is presented in Fuchs (2013). Implementing this in the procedures of this thesis will be future work.

In the remainder of the thesis, simulation studies will be made without simulating time-continuity. That means, simulated data will be generated with the same Euler variables as used in the estimation and prediction, to avoid the need of distinguishing between the quality of the prediction procedure and the quality of the Euler approximation.


Part I. Bayesian Analysis and Prediction for Models Based on Continuous Processes

In many research areas of engineering, material fatigue plays an important role. Experiments are often very expensive because they take a long time and the constructions are costly. To extract as much information as possible from existing experiments and to predict fatigue, statistical models are a valuable tool. One of these existing experiments was conducted by Virkler et al. (1979). Sixty-eight replicate constant amplitude tests in an aluminum alloy were carried out to investigate fatigue crack propagation. In each of these tests, the number of cycles that leads to fixed crack lengths was observed. Contrary to the natural assumption that a value is observed at fixed times, here the time is the dependent variable and the crack length is the independent variable. In Figure 19, we can see the number of cycle counts divided by 10 000 plotted against the crack lengths in mm. There are 68 observation series, each with 164 observation points.

Figure 19: Observation series of the 68 experiments of Virkler et al. (1979); x-axis: crack length, y-axis: cycle counts / 10 000.

In Hermann et al. (2016b), we compared several growth curves for these data and applied each in a regression and in a diffusion model, both hierarchical in order to include knowledge from all the curves in the estimation. Prediction for the further development of one series was made. It turned out that the non-linear regression model can compete with the diffusion model for prediction in the far future. For predictions in the near future, the diffusion model is more suitable, because the Markov property leads to a small uncertainty close to the last observed value.

In the data series in Figure 19, no jumps are recognizable, which leads to the assumption of a continuous stochastic process. Therefore, throughout this chapter, we investigate the process defined by the SDE in (2.1) with the special case of h(η, t, y) = 0, at first


for the diffusion process itself and afterwards, in Section 4, for a noisily observed diffusion.


3. Diffusion Process

In this first section of Part I, we present a single series model with the estimation and prediction procedure and afterwards extend the approach to the hierarchical model. Since prediction of the further development of one series in the hierarchical model is exactly the same as in the single series model, we restrict ourselves in the hierarchical model to the case of prediction of a new series.

3.1. Single Series Model

Estimation

In the following, an estimation procedure based on the Euler approximation of the process defined by the SDE (2.1) with h(η, t, y) = 0 is given. We here restrict ourselves to the simple case of estimation with the transition density of the Euler approximated variables in (2.2). Fuchs (2013) presents a data augmentation approach for the estimation based on the SDE in the case of large inter-observation times in the data.

The transition distribution is given in the Bayes model

Yi | Yi−1, ϕ, γ² ∼ N(Yi−1 + b(ϕ, ti−1, Yi−1) ∆i, s²(γ, ti−1, Yi−1) ∆i), i = 1, ..., n,   (3.1)
ϕ | mϕ, Vϕ ∼ N(mϕ, Vϕ), Vϕ = diag(v²_1, ..., v²_p),
γ² | aγ, bγ ∼ IG(aγ, bγ).

Therefore, the likelihood of the observation variables is given by

p(Y0, ..., Yn | ϕ, γ²) = ∏_{i=1}^n p(Yi | Yi−1, ϕ, γ²)   (3.2)
= (1/√(2π))^n ∏_{i=1}^n 1/(√∆i s(γ, ti−1, Yi−1)) exp( −(Yi − Yi−1 − b(ϕ, ti−1, Yi−1)∆i)² / (2∆i s²(γ, ti−1, Yi−1)) ).

In the case of a non-linear function b, an MH algorithm for ϕ is suitable, as described in Section 2.2.1, with θ = ϕ. For generality, only this case is considered. Parts of ϕ which are linear in b could be estimated with a conjugate prior, see, for example, Donnet et al. (2010).

In the case of a specific variance function s(γ, t, y) of the form s(γ, t, y) = γ · s(t, y), a full conditional posterior can be calculated. Examples are s(t, y) = 1 for constant variance behavior, s(t, y) = t for a variance structure depending on the time, or s(t, y) = y, which leads to an autoregressive variance depending on the current value of the process.

In this case, the full conditional posterior of γ² is given by

γ² | Y(n), ϕ ∼ IG( aγ + n/2, bγ + (1/2) ∑_{i=1}^n (Yi − Yi−1 − b(ϕ, ti−1, Yi−1)∆i)² / (s²(ti−1, Yi−1)∆i) ).


This can be seen by

p(γ² | Y(n), ϕ) ∝ p(γ²) p(Y(n) | γ², ϕ)
∝ (γ²)^(−aγ−1) exp(−bγ/γ²) · (γ²)^(−n/2) exp( −(1/(2γ²)) ∑_{i=1}^n (Yi − Yi−1 − b(ϕ, ti−1, Yi−1)∆i)² / (s²(ti−1, Yi−1)∆i) )
= (γ²)^(−(aγ+n/2)−1) exp( −(1/γ²) ( bγ + (1/2) ∑_{i=1}^n (Yi − Yi−1 − b(ϕ, ti−1, Yi−1)∆i)² / (s²(ti−1, Yi−1)∆i) ) ).

In the application later on, as well as in the package BaPreStoPro, only s(γ, t, y) = γ · s(t, y) is considered. In the case of a general variance function s(γ, t, y), estimation of γ² can be conducted through an MH algorithm as well.

Sampling from the posterior of both parameters ϕ and γ² jointly can be performed with a Metropolis within Gibbs sampler, as described in Section 2.2.2, with θ = (θ1, θ2) = (ϕ, γ²).
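A minimal R sketch of such a Metropolis within Gibbs sampler is given below, assuming, purely for illustration, the drift b(ϕ, t, y) = ϕy and the variance function s(t, y) = y of Example 1; the simulated data, prior values and proposal scale are illustrative and not the settings used in this thesis.

```r
set.seed(1)
## illustrative data from the Euler scheme of Example 1 (b(phi,t,y) = phi*y, s(t,y) = y)
n <- 50; dt <- rep(0.02, n); t <- c(0, cumsum(dt))
y <- cumprod(c(1, 1 + 2 * dt + 0.5 * sqrt(dt) * rnorm(n)))

b <- function(phi, t, y) phi * y
s <- function(t, y) y
loglik <- function(phi, gamma2) {              # likelihood (3.2) on the log scale
  mu <- y[-(n + 1)] + b(phi, t[-(n + 1)], y[-(n + 1)]) * dt
  sd <- sqrt(gamma2) * s(t[-(n + 1)], y[-(n + 1)]) * sqrt(dt)
  sum(dnorm(y[-1], mu, sd, log = TRUE))
}

m_phi <- 2; v_phi <- 4                         # N(m_phi, v_phi) prior for phi
a_g <- 3; b_g <- 1                             # IG(a_g, b_g) prior for gamma^2
K <- 5000; phi <- 2; gamma2 <- 0.25
out <- matrix(NA, K, 2, dimnames = list(NULL, c("phi", "gamma2")))
for (k in 1:K) {
  ## MH step for phi with a normal random walk proposal
  phi_new <- rnorm(1, phi, 0.1)
  log_acc <- loglik(phi_new, gamma2) - loglik(phi, gamma2) +
             dnorm(phi_new, m_phi, sqrt(v_phi), log = TRUE) -
             dnorm(phi,     m_phi, sqrt(v_phi), log = TRUE)
  if (log(runif(1)) < log_acc) phi <- phi_new
  ## Gibbs step for gamma^2 from its inverse gamma full conditional
  res    <- y[-1] - y[-(n + 1)] - b(phi, t[-(n + 1)], y[-(n + 1)]) * dt
  gamma2 <- 1 / rgamma(1, a_g + n / 2,
                       b_g + 0.5 * sum(res^2 / (s(t[-(n + 1)], y[-(n + 1)])^2 * dt)))
  out[k, ] <- c(phi, gamma2)
}
```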

Prediction

As explained in detail in Section 2.4, predictions can be made with one of the two presented Algorithms 1 and 2. The transition density p(Y*_m | Y*_{m−1}, θ) is the density of the normal distribution

N( Y*_{m−1} + b(ϕ, τ_{m−1}, Y*_{m−1})∆*, s²(γ, τ_{m−1}, Y*_{m−1})∆* )   (3.3)

with θ = (ϕ, γ²) and starting value Y*_0^(k) = Yn for within-sample prediction or Y*_0^(k) = y0 for new-sample prediction, k = 1, ..., K.

3.2. Hierarchical Model

In many cases, not only one series is observed. In the example of crack growth, several test specimens are stressed over time. The fatigue behavior should be the same for all specimens, but there are probably small differences between the individuals. This effect is taken into account by a hierarchical model. This means that the parameters, in our case only the location parameter ϕ, are assumed to be random with a mixture distribution. We here restrict ourselves to the case of the normal distribution, but other choices are possible as well with little effort.

We define the Euler approximated model

Yij | Yi−1,j, ϕj, γ² ∼ N( Yi−1,j + b(ϕj, ti−1,j, Yi−1,j) ∆ij, s²(γ, ti−1,j, Yi−1,j)∆ij ), i = 1, ..., nj,
ϕj | µ, Ω ∼ N(µ, Ω) i.i.d., j = 1, ..., J, Ω = diag(ω²_1, ..., ω²_p),   (3.4)
µ | mµ, Vµ ∼ N(mµ, Vµ), Vµ = diag(v²_1, ..., v²_p),
ω²_r | aω,r, bω,r ∼ IG(aω,r, bω,r), r = 1, ..., p,
γ² | aγ, bγ ∼ IG(aγ, bγ)

with ∆ij = tij − ti−1,j and Y0j = y0(ϕj), j = 1, ..., J. Denote by Y(nj,j) the jth observation vector and by Y(n·,J) all observations {Yij}, i = 1, ..., nj, j = 1, ..., J.


Estimation

As mentioned, the estimation procedure can also be found in Hermann et al. (2016b) for fixed y0 = y0(ϕ).

Since Y0j depends on the random effect, the likelihood is given by

p(Y(n·,J) | ϕ1, ..., ϕJ, γ²) = ∏_{j=1}^J p(Y0j | ϕj) ∏_{i=1}^{nj} p(Yij | Yi−1,j, ϕj, γ²)

with p(Y0j | ϕj) = 1_{y0(ϕj)}(Y0j).

We assume a conjugate normal prior distribution for µ with mean mµ and diagonal variance matrix Vµ. In Hermann et al. (2016b), the matrix representation of the posterior distribution can be found. Since Vµ and Ω are assumed to be diagonal in our model, we can calculate the posterior distribution for each component of µ. It is

p(µr | ϕ1, ..., ϕJ, Ω) ∝ p(µr) p(ϕ1, ..., ϕJ | µ, Ω)
∝ exp( −(µr − mµ,r)² / (2v²_r) ) · exp( −(1/(2ω²_r)) ∑_{j=1}^J ((ϕj)_r − µr)² )
∝ exp( −(1/2) { (µ²_r − 2µr mµ,r)/v²_r + (1/ω²_r) ∑_{j=1}^J (µ²_r − 2(ϕj)_r µr) } )
= exp( −(1/2) { µ²_r (1/v²_r + J/ω²_r) − 2µr ( mµ,r/v²_r + ∑_{j=1}^J (ϕj)_r/ω²_r ) } )
∝ exp( −(1/2) (1/v²_r + J/ω²_r) { µr − (1/v²_r + J/ω²_r)^(−1) ( mµ,r/v²_r + ∑_{j=1}^J (ϕj)_r/ω²_r ) }² ),

which is proportional to the density of the

N( (1/v²_r + J/ω²_r)^(−1) ( mµ,r/v²_r + ∑_{j=1}^J (ϕj)_r/ω²_r ),  (1/v²_r + J/ω²_r)^(−1) )

distribution, r = 1, ..., p.

In the case of correlations between the random effects, i.e. a non-diagonal matrix Ω, one could take the Wishart prior distribution, which is also conjugate to the normal likelihood, see, for example, Donnet et al. (2010). Choosing the inverse gamma distribution for the diagonal elements, we get the conditional posterior distribution

ω²_r | (ϕ1)_r, ..., (ϕJ)_r, µr ∼ IG( aω,r + J/2, bω,r + (1/2) ∑_{j=1}^J ((ϕj)_r − µr)² ), r = 1, ..., p.

The calculation is analogous to the posterior of γ2 in the single series model.
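For illustration, the two conjugate updates can be written in a few lines of R; the current values of (ϕj)_r, the prior parameters and J below are illustrative stand-ins, not the values used in the application.

```r
set.seed(1)
J <- 68
phi_r <- rnorm(J, 25, 2)        # current samples of (phi_j)_r, j = 1, ..., J
m_mu <- 25; v2 <- 625           # prior N(m_mu, v2) for mu_r
a_w  <- 3;  b_w <- 10           # prior IG(a_w, b_w) for omega_r^2
omega2 <- 4                     # current value of omega_r^2

## full conditional of mu_r: normal with precision 1/v2 + J/omega2
prec <- 1 / v2 + J / omega2
mu_r <- rnorm(1, (m_mu / v2 + sum(phi_r) / omega2) / prec, sqrt(1 / prec))

## full conditional of omega_r^2: inverse gamma
omega2 <- 1 / rgamma(1, a_w + J / 2, b_w + 0.5 * sum((phi_r - mu_r)^2))
```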


The full conditional posterior of γ² is, similarly to the single series model, given by

γ² | Y(n·,J), ϕ1, ..., ϕJ ∼ IG( aγ + ∑_{j=1}^J nj/2, bγ + (1/2) ∑_{j=1}^J ∑_{i=1}^{nj} (Yij − Yi−1,j − b(ϕj, ti−1,j, Yi−1,j)∆ij)² / (s²(ti−1,j, Yi−1,j)∆ij) ).

After choosing starting values ϕ*_{j0}, j = 1, ..., J, for the MH step and µ*_0, Ω*_0 and γ²*_0 for the Gibbs sampler, we draw for k = 1, ..., K from

ϕ*_{jk} ∼ p(ϕj | Y(nj,j), γ²*_{k−1}, µ*_{k−1}, Ω*_{k−1}), j = 1, ..., J,
µ*_k ∼ p(µ | ϕ*_{1k}, ..., ϕ*_{Jk}, Ω*_{k−1}),
Ω*_k ∼ p(Ω | ϕ*_{1k}, ..., ϕ*_{Jk}, µ*_k),
γ²*_k ∼ p(γ² | Y(n·,J), ϕ*_{1k}, ..., ϕ*_{Jk}).

Prediction

As already mentioned, for within-sample prediction for one of the series, for example the last series J, the prediction in the hierarchical model is exactly the same as in the single series model. The transition density is the same as in (3.3) with θ = (ϕJ, γ²), and the procedure starts with Y*_0^(k) = Y_{nJ}, k = 1, ..., K.

In the hierarchical model, prediction for a new series is different. We first have to calculate the predictive distribution of the unobserved random effect for the new series, say the (J+1)th series. It is given by

p(ϕJ+1 | Y(n·,J)) = ∫ p(ϕJ+1 | µ, Ω) p(µ, Ω | Y(n·,J)) d(µ, Ω)   (3.5)
≈ (1/K) ∑_{k=1}^K p(ϕJ+1 | µ*_k, Ω*_k)

with µ*_k and Ω*_k resulting from the Gibbs sampler, see, for example, Dion, Hermann and Samson (2016). Again, this is not a known distribution. Sampling from this density can be realized through one of the sampling methods in Sections 2.2.5 or 2.2.6, e.g. the inversion method. With the samples ϕ*_{J+1,1}, ..., ϕ*_{J+1,K} from the predictive distribution, we can compute a prediction for a future observation variable Y*_{J+1} in t*_{J+1} based on the Euler approximated variables with Algorithm 1 or 2, where θ*_k = (ϕ*_{J+1,k}, γ²*_k) and the first step begins with Y*_0^(k) = y0(ϕ*_{J+1,k}), because we predict a new series.
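A minimal R sketch of this inversion step for one component of ϕJ+1, approximating the mixture in (3.5) on a grid; the posterior samples of µ and Ω and the grid range are illustrative stand-ins for the Gibbs sampler output.

```r
set.seed(1)
K <- 1000
mu_r     <- rnorm(K, 25, 0.5)        # posterior samples of mu_r (stand-ins)
omega2_r <- 1 / rgamma(K, 20, 40)    # posterior samples of omega_r^2 (stand-ins)

## mixture density (3.5) for one component of phi_{J+1}, evaluated on a grid
grid <- seq(15, 35, length.out = 2000)
dens <- rowMeans(sapply(1:K, function(k) dnorm(grid, mu_r[k], sqrt(omega2_r[k]))))
cdf  <- cumsum(dens) / sum(dens)

## inversion method: K draws of (phi_{J+1})_r from the discretized distribution
phi_new <- grid[findInterval(runif(K), cdf) + 1]
```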

3.3. Application

For the hierarchical model, a wide simulation study for the case of prediction of the further development of one of the series is presented in Hermann et al. (2016b) using Algorithm 1. In addition, six growth curves were applied to the data set of Virkler et al. (1979). The result was that


the best-fitting curves are one curve derived from the Paris-Erdogan equation (1.1) and the Weibull curve. We here restrict ourselves to the curve

b(ϕ, t, y) = (ϕ2/t)(ϕ1 − y)

with ϕ ∈ (0, ∞)², derived from the Paris-Erdogan equation.

In the following, we will compare the prediction results in the single series and the hierarchical model. For both, we assume a constant variance function s(t, y) = 1. In Hermann et al. (2016b), the starting value for the location parameter ϕ is (25, 1.8). Here, we choose γ²*_0 = 0.1 and take the last of the series, i.e. the 68th, for a pre-estimation. The resulting posterior means are taken as starting points, for µ*_0 in the hierarchical and for ϕ*_0 in the single series model, whereby γ²*_0 is the same for both models. In addition, for the hierarchical model, we set Ω*_0 = µ*_0/100. These are the starting parameters for the Gibbs sampler and are also taken as the prior means. The prior parameters are chosen so that the standard deviation is equal to the mean. This choice is based on the results of the sensitivity analysis in Section 2.3, where estimation works well even with a wrong prior mean. Therefore, we avoid a too informative prior by setting the standard deviation equal to the mean.

To verify the prediction results, we truncate one of the series for the estimation, here we arbitrarily choose the 10th, after the first 80 observations and predict the further development. This means that, for the single series model, the posterior depends only on the first 80 observations of the 10th series. For the hierarchical model, besides the 68th series used for the pre-estimation, all remaining 66 series are used in addition. In Figure 20, we compare the different prediction methods for the two models. Firstly, for the single series model, Figure 20(a) shows the 67 series except the 10th in gray lines, the 10th series up to the 80th observation point depicted by circles and the observations to be predicted as dots. Algorithms 1 to 3 lead to very similar prediction intervals. Algorithm 4, in blue lines, has larger intervals, similar to the results in the previous section. In Figure 20(b), we see the interesting part in detail without the gray lines, where we can see that the curve is not followed well by the intervals. The aim of our modeling is first to find a model that describes the data well and second to calculate a precise prediction. Therefore, even if the blue lines include the observed values, this is not a satisfying result. In Figure 20(c), we see the results for the hierarchical model, again with the 67 series except the 10th in gray lines, the part of the 10th series used for estimation in circles and the part to be predicted as dots. In detail, Figure 20(d) shows that the prediction intervals follow the curve much better than in the single series model.

In Figure 21, we compare the two models, single series and hierarchical, more closely for Algorithm 1. First, the intervals are smaller in the hierarchical model than in the single series model. One reason can be the more informative prior distribution for the random effect through the mixture distribution, compared to a fixed chosen prior distribution for a fixed effect in the single series model, which leads to a smaller variance estimate. Second, the prediction intervals follow the curve better. Here, again, the reason might be that the estimation of the random effect is more precise for the data series in the hierarchical than in the single series model, since information from the other series enters. In summary, we can say that the hierarchical model performs much better, with the drawback of larger runtimes.


Figure 20: 95% prediction intervals for the 10th series of the Virkler data with the four different prediction methods, Algorithms 1 and 2, the naive sampling, Algorithm 3, and the posterior sampling, Algorithm 4; top: in the single series model, bottom: in the hierarchical model. Panels: (a) prediction results in the single series model, (b) Figure 20(a) zoomed in on the interesting area, (c) prediction results in the hierarchical model, (d) Figure 20(c) zoomed in on the interesting area.

The reason is that, in every Gibbs sampler iteration, the likelihood in the MH step of the random effects has to be calculated for all series.


Figure 21: 95% prediction intervals for Algorithm 1 with the single series and the hierarchical model for the 10th series.


4. Hidden Diffusion Process

We here consider a stochastic model to describe a noisily observed diffusion. This can be suitable if two types of uncertainty are assumed: firstly, the randomness of the crack growth as described by the diffusion process itself, and secondly, a measurement error according to the experimental setup. In Gunkel et al. (2012), for example, crack growth is observed by taking photos of the test specimens at several time points during the experiment. Evaluating data from these images is a very sophisticated procedure and exhibits a high measurement variance, which is the same over time and can, therefore, be represented by an additive error term in a regression model, whereby the crack growth process itself follows a diffusion model.

For the case of a mixed model and of an explicit solution of the diffusion process, an estimation procedure is described in Donnet et al. (2010). But diffusion processes are often defined by an SDE without a known explicit solution. In the following, we propose how to estimate the parameters of the hidden diffusion model based on the Euler approximation scheme.

In the described model, the diffusion process is latent and has to be estimated. Without an explicit solution that results in a known multivariate distribution, this step is not self-evident. Our model fits into the class of hidden Markov models and is closely related to discrete-time state space models. The underlying problem of estimating a high-dimensional latent variable is well known in the literature. Several filtering approaches are available, see, for example, Liu and Chen (1998); Doucet et al. (2000); Del Moral et al. (2006); Carvalho et al. (2010); Barber et al. (2011). In the following, we will briefly present the idea of Andrieu et al. (2010) and implement the particle Gibbs sampler introduced in their work. The results of this section are ongoing work and are not published yet.

Analogously to the previous section, we will start with the single series model and extend it afterwards to a hierarchical model.

4.1. Single Series Model

Similar to the previous section, we start with a model for one data series

Zi = Y_{ti} + ϵi,   (4.1)
dYt = b(ϕ, t, Yt) dt + s(γ, t, Yt) dWt, Y0 = y0(ϕ),
ϵi ∼ N(0, σ²) i.i.d., i = 0, ..., n.

As mentioned, we will approximate the hidden diffusion process with the Euler approximation in


(2.2) with h(η, t, y) = 0. We assume the following Bayes model

Zi | Yi, σ² ∼ N(Yi, σ²),   (4.2)
Yi | Yi−1, ϕ, γ² ∼ N(Yi−1 + b(ϕ, ti−1, Yi−1) ∆i, s²(γ, ti−1, Yi−1)∆i), i = 1, ..., n,
ϕ | mϕ, Vϕ ∼ N(mϕ, Vϕ), Vϕ = diag(v²_1, ..., v²_p),
γ² | aγ, bγ ∼ IG(aγ, bγ),
σ² | aσ, bσ ∼ IG(aσ, bσ).

We pool the model parameters into θ = (ϕ, γ², σ²) and, for simplicity, introduce the additional short notation Z(n) = (Z0, ..., Zn).

Estimation

In the paper of Donnet et al. (2010), the estimation procedure is described for the special case of the Gompertz process, i.e. b(ϕ, t, y) = BC e^(−Ct) y, ϕ = (A, B, C), and s(γ, t, y) = γy, because in this case there exists a solution of the process that leads to a multivariate normal likelihood for the vector of logarithmic observations. But the question arises whether it is possible to develop a procedure based on the SDE without an explicit solution.

The discretized model (4.2) is also known as a state space model in the literature, and several filtering approaches are available. We here present the idea of Andrieu et al. (2010), called the particle Gibbs sampler, for the special case of our model. This procedure works like a usual Gibbs sampler as explained in Section 2.2.2, where one step is the sampling of the latent variable Y(n) to approximate its density p(Y(n) | Z(n), θ). This is done by sequentially approximating p(Y(i) | Z(i), θ), i = 1, ..., n, with L particles Y^l_1, ..., Y^l_i, l = 1, ..., L, which is called sequential Monte Carlo (SMC). The procedure is an extension of sequential importance sampling, see Liu and Chen (1998); Doucet et al. (2000); Del Moral et al. (2006). Classically, Y^l_i is drawn iteratively from the importance, or proposal, density p(Yi | Yi−1, Zi, θ), which is the optimal one, see Liu and Chen (1998); Doucet et al. (2000), and importance weights W^l_i, l = 1, ..., L, i = 1, ..., n, are calculated. This can lead to a poor approximation of p(Y(n) | Z(n), θ) if many of the weights are close to 0. In the sequential Monte Carlo algorithm, in each step, an index A^l_i ∼ p(·) = ∑_{l=1}^L W^l_i 1_l(·) is drawn, and Y_i^{A^l_i} is taken as the ancestor of Y^l_{i+1}. This means that trajectories with higher probability are continued more often than less probable trajectories. An illustration is given in Figure 22 in a toy example with L = 5 particles. In Figure 22(b), the dots mark the example observations, while Figure 22(a) shows the particles that are drawn during the algorithm. One can see that some trajectories end when they are less probable than others. In these points, two or more paths start from a more probable trajectory. In Figure 22(b), the resulting trajectories can be seen, which are estimates of the latent Y(n).

In detail, it is

p(Yi|Yi−1, Zi, θ) ∝ p(Yi|Yi−1, ϕ, γ2) · p(Zi|Yi, σ2).


Figure 22: Example result of the sequential Monte Carlo algorithm, (a): the particles that are drawn during the algorithm, (b): the resulting trajectories.

Both are normal densities, which leads to

Yi | Yi−1, Zi, θ ∼ N( (1/(s²(γ, ti−1, Yi−1)∆i) + 1/σ²)^(−1) ( (Yi−1 + b(ϕ, ti−1, Yi−1)∆i)/(s²(γ, ti−1, Yi−1)∆i) + Zi/σ² ),  (1/(s²(γ, ti−1, Yi−1)∆i) + 1/σ²)^(−1) ).

This will be the proposal distribution for the increments.

For notational simplicity, we define p_{1:L}(k | p1, ..., pL) = ∑_{l=1}^L pl 1{l = k}. The SMC algorithm for fixed θ = (ϕ, γ², σ²) is given as follows.

Algorithm 5 (Sequential Monte Carlo (SMC))
To draw L particles, run the following steps:

i = 0: Set Y^l_0 = y0(ϕ) and W^l_0 = 1/L, W_0 = (W^1_0, ..., W^L_0).

i = 1, ..., n:
(i) Draw A^l_{i−1} ∼ p_{1:L}(· | W_{i−1}), l = 1, ..., L,
(ii) set Y^l_j = Y_j^{A^l_{i−1}}, j = 1, ..., i − 1, l = 1, ..., L,
(iii) draw Y^l_i ∼ p(Yi | Y^l_{i−1}, Zi, θ), l = 1, ..., L,
(iv) set w^l_i = p(Y^l_i | Y^l_{i−1}, ϕ, γ²) p(Zi | Y^l_i, σ²) / p(Y^l_i | Y^l_{i−1}, Zi, θ) = p(Zi | Y^l_{i−1}, θ), l = 1, ..., L,
(v) set W^l_i = w^l_i / ∑_{j=1}^L w^j_i, l = 1, ..., L, and Wi = (W^1_i, ..., W^L_i).

For further details and theoretical background, see Andrieu et al. (2009) and Andrieu et al. (2010).


Point (iv) can be seen from the calculation of the denominator

p(Yi | Yi−1, Zi, θ) = p(Yi | Yi−1, θ) p(Zi | Yi, Yi−1, θ) / p(Zi | Yi−1, θ) = p(Yi | Yi−1, ϕ, γ²) p(Zi | Yi, σ²) / p(Zi | Yi−1, θ).

To obtain one sample for the rest of the Gibbs sampler, we draw j ∼ p_{1:L}(· | Wn) and set, in the kth iteration, Y^{(k)}_{(n)} = (Y^j_0, ..., Y^j_n).

Algorithm 5 is extended by a conditional sequential Monte Carlo (CSMC) step. This means that, in each iteration of the Gibbs sampler, the particle of the last iteration step, Y^{(k)}_{(n)}, is fixed for the sequential Monte Carlo step and is surely one of the particles. For reasons of clarity and comprehension, the extensions are explained in Appendix B.
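A minimal R sketch of one SMC sweep (Algorithm 5, without the conditional extension) for fixed θ is shown below; it assumes, purely for illustration, b(ϕ, t, y) = ϕy, s(γ, t, y) = γy and a fixed starting point y0 = 1, and all parameter values and data are illustrative.

```r
set.seed(1)
phi <- 2; gamma2 <- 0.25; sigma2 <- 0.1; dt <- 0.01
n <- 100; L <- 100

## illustrative data Z_0, ..., Z_n from model (4.2) with y0 = 1
y_true <- cumprod(c(1, 1 + phi * dt + sqrt(gamma2 * dt) * rnorm(n)))
Z <- y_true + rnorm(n + 1, sd = sqrt(sigma2))

Y <- matrix(1, nrow = n + 1, ncol = L)             # particles, row i = time point t_{i-1}
W <- rep(1 / L, L)
for (i in 2:(n + 1)) {
  A <- sample(L, L, replace = TRUE, prob = W)      # (i)  draw ancestor indices
  Y[1:(i - 1), ] <- Y[1:(i - 1), A, drop = FALSE]  # (ii) reassign trajectories
  y_prev <- Y[i - 1, ]
  m <- y_prev + phi * y_prev * dt                  # Euler mean
  v <- gamma2 * y_prev^2 * dt                      # Euler variance
  prec <- 1 / v + 1 / sigma2
  Y[i, ] <- rnorm(L, (m / v + Z[i] / sigma2) / prec, sqrt(1 / prec))  # (iii) proposal
  w <- dnorm(Z[i], m, sqrt(v + sigma2))            # (iv) weights p(Z_i | Y_{i-1}, theta)
  W <- w / sum(w)                                  # (v)  normalize
}
j <- sample(L, 1, prob = W)                        # one trajectory for the Gibbs sweep
Y_hat <- Y[, j]
```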

In the following, we present the full conditional posteriors of the parameters ϕ, γ² and σ², conditional on Y(n). The full conditional posterior distribution of γ² is exactly the same as in Section 3. For ϕ, a small difference in the likelihood appears because of the starting point y0(ϕ). The posterior does not only depend on the likelihood of Y(n), but also on the starting observation Z0, as can be seen by

p(ϕ | Z(n), Y(n), γ², σ²) ∝ p(ϕ | Y(n), γ², σ²) · p(Z(n) | ϕ, Y(n), γ², σ²)
= p(ϕ | Y(n), γ²) · p(Z0 | y0(ϕ), σ²) · ∏_{i=1}^n p(Zi | Yi, σ²)
∝ p(ϕ | Y(n), γ²) · p(Z0 | ϕ, σ²),

where p(ϕ | Y(n), γ²) is proportional to the likelihood of Y(n) in (3.2) multiplied by the normal prior density, and p(Z0 | ϕ, σ²) is the density of the N(y0(ϕ), σ²) distribution.

Finally, with the prior σ² ∼ IG(aσ, bσ), we get the posterior

σ² | Z(n), Y(n) ∼ IG( aσ + (n + 1)/2, bσ + (1/2) ∑_{i=0}^n (Zi − Yi)² ).

Choose starting parameters θ*_0 = (ϕ*_0, γ²*_0, σ²*_0) and draw the starting latent variable Y^{(0)}_{(n)} with the ordinary SMC algorithm. Afterwards, conduct the particle Gibbs sampler for k = 1, ..., K through

ϕ*_k ∼ p(ϕ | Z(n), Y^{(k−1)}_{(n)}, γ²*_{k−1}, σ²*_{k−1}),
Y^{(k)}_{(n)} ∼ p(Y(n) | Z(n), ϕ*_k, γ²*_{k−1}, σ²*_{k−1}),
γ²*_k ∼ p(γ² | Y^{(k)}_{(n)}, ϕ*_k),
σ²*_k ∼ p(σ² | Z(n), Y^{(k)}_{(n)}),

where Y^{(k)}_{(n)} is drawn by the CSMC for k = 1, ..., K.

The resulting samples θ*_k = (ϕ*_k, γ²*_k, σ²*_k), k = 1, ..., K, can be seen as samples from the posterior p(θ | Z(n)).


Prediction

To calculate a predictive distribution for Z* in t*, we first simulate from the predictive distribution of Y* in t* with one of the algorithms in Sections 2.2.5 or 2.2.6. The transition density is the same as in Section 3, namely the density of the normal distribution in (3.3). Algorithm 1 or 2 is then conducted with θ = (ϕ, γ²) and starting values Y*_0^(k) = Y^{(k)}_n, k = 1, ..., K, stemming from the particle Gibbs sampler in the case of within-sample prediction, and Y*_0^(k) = y0(ϕ*_k), k = 1, ..., K, in the case of new-sample prediction. This yields samples Y*_1^(k), ..., Y*_M^(k) in the time points τ1, ..., τM = t*, k = 1, ..., K, for the latent diffusion variables.

For the prediction of Z*, we use the samples Y*_M^(k) ∼ p(Y* | Z(n)), k = 1, ..., K, and the samples σ²*_k, k = 1, ..., K, from the Gibbs sampler and calculate the distribution

p(Z* | Z(n)) = ∫ p(Z* | Y*, σ²) · p(Y*, σ² | Z(n)) d(Y*, σ²)   (4.3)
≈ (1/K) ∑_{k=1}^K p(Z* | Y*_M^(k), σ²*_k).

Since Y* and σ² are independent, this approximation is straightforward. We will see a simulation study and the application to the data set of Virkler et al. (1979) in Section 4.3.
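In R, turning this mixture approximation into a pointwise prediction interval for Z* is a short composition step, sketched below with illustrative stand-ins for the predicted diffusion values and the posterior samples of σ².

```r
set.seed(1)
K <- 10000
Y_star <- rnorm(K, 7, 0.8)                  # stand-ins for samples from p(Y* | Z_(n))
sigma2 <- 1 / rgamma(K, 60, 6)              # stand-ins for posterior samples of sigma^2
Z_star <- rnorm(K, Y_star, sqrt(sigma2))    # one draw of Z* per pair (Y*^(k), sigma^2*_k)
quantile(Z_star, c(0.025, 0.975))           # pointwise 95% prediction interval for Z*
```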

4.2. Hierarchical Model

We here present an extension of the hierarchical diffusion model of Section 3 and add an i.i.d. regression error. Of course, several extensions are possible. As mentioned above, mixture distributions other than the normal one could be of interest. In addition, other variance structures for the regression error can be suitable here; for example, a time-dependent variance function for the regression error variable could be chosen. But this is future work and beyond the scope of this thesis.

We define the Euler approximated Bayes model

Zij = Yij + ϵij,   (4.4)
Yij | Yi−1,j, ϕj, γ² ∼ N( Yi−1,j + b(ϕj, ti−1,j, Yi−1,j) ∆ij, s²(γ, ti−1,j, Yi−1,j)∆ij ),
ϕj | µ, Ω ∼ N(µ, Ω), Ω = diag(ω²_1, ..., ω²_p),
ϵij ∼ N(0, σ²), i = 1, ..., nj, j = 1, ..., J,
µ | mµ, Vµ ∼ N(mµ, Vµ), Vµ = diag(v²_1, ..., v²_p),
ω²_r | aω,r, bω,r ∼ IG(aω,r, bω,r), r = 1, ..., p,
γ² | aγ, bγ ∼ IG(aγ, bγ),
σ² | aσ, bσ ∼ IG(aσ, bσ)

with Y0j = y0(ϕj), j = 1, ..., J. Denote by Z(nj,j) the jth observation vector and by Z(n·,J) all observations {Zij}, i = 1, ..., nj, j = 1, ..., J.


Estimation

Analogously to the single series model, the estimation of the latent variables Y(n1,1), ..., Y(nJ,J) is the crucial estimation step and will be conducted by the CSMC algorithm in each of the particle Gibbs sampler iterations.

Based on the estimates of these latent variables, the estimation of ϕ1, ..., ϕJ, µ, Ω and γ² is the same as for the hierarchical diffusion model in Section 3. The full conditional posterior of σ² is, similarly to the single series model, given by

σ² | Z(n·,J), Y(n·,J) ∼ IG( aσ + ∑_{j=1}^J (nj + 1)/2, bσ + (1/2) ∑_{j=1}^J ∑_{i=0}^{nj} (Zij − Yij)² ).

Prediction

Analogously to Section 3, we here restrict ourselves to the case of prediction for a new series, since prediction of the further development of one observed series is exactly the same as for the single series model.

Prediction for a new series J+1 is similar to the hierarchical diffusion model in Section 3. Analogously to the predictive distribution of ϕJ+1 in (3.5), it is

p(ϕJ+1 | Z(n·,J)) = ∫ p(ϕJ+1 | µ, Ω) p(µ, Ω | Z(n·,J)) d(µ, Ω)
≈ (1/K) ∑_{k=1}^K p(ϕJ+1 | µ*_k, Ω*_k)

with µ*_k and Ω*_k resulting from the particle Gibbs sampler.

With the samples ϕ*_{J+1,1}, ..., ϕ*_{J+1,K} from the predictive distribution, we can, similarly to Section 3, compute a prediction for a future process variable Y*_{J+1} based on the Euler approximated variables with Algorithm 1 or 2, where θ*_k = (ϕ*_{J+1,k}, γ²*_k) and Y*_0^(k) = y0(ϕ*_{J+1,k}), k = 1, ..., K, since we predict a new series.

For the prediction of Z*_{J+1}, we use the predicted variables Y*_M^(k), k = 1, ..., K, and the samples σ²*_k, k = 1, ..., K, from the particle Gibbs sampler and calculate the pointwise distribution as in (4.3).

4.3. Simulation Study and Application

In this section, we will run a simulation study for the single series model of Section 4.1 and show results for one simulation example for the hierarchical model of the last section. Afterwards, both models are applied to the data set of Virkler et al. (1979) and, in addition, the hierarchical model is applied to the wear data set of Hermann and Ruggeri (2016).

We first run a simulation study to investigate the quality of the particle Gibbs sampler. We again use Example 1. In addition, we choose an error variance σ² = 0.1 and the starting point


y0(ϕ) = ϕ2, which is to be estimated. This leads to ϕ = (2, 1) for a comparable process. We simulate trajectories in t0 = 0, t1 = 0.01, ..., t100 = 1 for the estimation and make predictions for the further development in τ1 = 1.01, ..., τM = 1.5, M = 50. Since, in comparison to the study in Section 2.4, we have long data series with n = 100 for the estimation, we expect a small estimation uncertainty and that the prediction with the naive sampling can compete. We simulate 500 series for estimation and prediction. We here start the Gibbs sampler with the chosen values, which are also taken as prior means. The prior parameters are chosen so that the standard deviation is equal to the mean. It will be future work to try several more diffuse starting and prior values. The number of particles for the SMC algorithm is here chosen as L = 100. It will also be future work to investigate how many particles are at least necessary for an effective algorithm. There is a trade-off between running time and quality of the algorithm: many particles lead to a high quality of the algorithm but take a long time, and vice versa.

We iterate the particle Gibbs sampler 101 000 times; after a burn-in phase of 1000 iterations and with a thinning rate of 10, we get K = 10 000 samples from the posterior. In Figure 23, we see the 95% credible intervals of the parameters for the first 100 series. The coverage rates of all 500 credible intervals can be seen in Table 4, which suggests that the algorithm works well.

Figure 23: Estimation results in the simulation study for the single series model based on the particle Gibbs sampler: 95% credible intervals of the parameters for the first 100 series, horizontal line: chosen value. Panels: (a) credible intervals of ϕ1, (b) credible intervals of ϕ2, (c) credible intervals of γ², (d) credible intervals of σ².


Table 4: Coverage rates of the 95% credible intervals of the parameters in the hidden diffusion model.

      ϕ1       ϕ2       γ2       σ2
   0.982    0.942     0.97     0.96

In Figure 24, we see for one example of the 500 series the result of the CSMC algorithm inside the particle Gibbs sampler. The green lines mark the estimated latent diffusion processes, and the dots mark the simulated data points used for estimation. For the prediction of the further development, the last trajectory points are used as starting points for the prediction algorithm.

Figure 24: Result of the CSMC algorithm for one example series of the simulation study for the single series model; in green: the estimated latent variables Y(k)_(n), k = 1, ..., K = 10 000, from the particle Gibbs sampler; dots: the simulated data used for estimation.

In the following, we have a look at the prediction results. For each of the 500 simulated series and corresponding prediction intervals, we simulate 1000 new series, each starting with the last simulated point of Y100.

One example can be seen in Figure 25(a) for the latent diffusion process and in Figure 25(b) for the observation variables. Again, the posterior sampling leads to the largest intervals, whereas the differences between Algorithms 1 to 3 are small. In addition, because of the small size of σ2, the difference between the two figures for the latent and the observed variables is small. Therefore, we concentrate on the observation variables.

For the 500 simulated series, we have 500 corresponding prediction intervals, leading to a coverage rate and an averaged interval score for the 1000 newly simulated series each. Both can be seen in Figures 25(c) and 25(d). First, we see that the posterior sampling has, with large intervals, a high coverage rate close to the nominal level, but nevertheless the highest interval scores. Second, the results of the naive sampling and the trajectory-wise Algorithm 2 are close to each other, with a slightly higher interval score for the naive sampling. In comparison to the simulation studies in Section 2.4, with n = 100 we have a large amount of observations, which leads to a small estimation uncertainty; therefore, both algorithms work similarly. Third, the coverage rates of Algorithm 1 are the smallest, with 0.89 at the end, but its interval scores lie between the lines of Algorithm 2 and the naive sampling because of its small, i.e. precise, intervals.
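Both evaluation criteria can be computed directly from the newly simulated series and the pointwise prediction bounds. The sketch below computes the empirical coverage rate and the interval score in the form of Gneiting and Raftery (2007), which is presumably the definition used in Section 2.4; the argument names are illustrative.

```r
## Empirical coverage and interval score for one time point,
## given lower/upper prediction bounds and the values of the newly simulated series.
coverage.rate <- function(lower, upper, x) mean(x >= lower & x <= upper)

interval.score <- function(lower, upper, x, alpha = 0.05) {
  (upper - lower) +
    2 / alpha * (lower - x) * (x < lower) +
    2 / alpha * (x - upper) * (x > upper)
}

## Example with illustrative numbers: bounds (l, u) and 1000 simulated values x
# l <- 40; u <- 60; x <- rnorm(1000, mean = 50, sd = 5)
# coverage.rate(l, u, x); mean(interval.score(l, u, x))
```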


Figure 25: Prediction results for one series simulated from Example 1: comparison of the prediction methods in Algorithms 1 and 2, the naive sampling (Algorithm 3) and the posterior sampling (Algorithm 4). (a): prediction intervals for the latent diffusion variable Y∗, (b): prediction intervals for the observation variable Z∗, (c): proportion of prediction intervals covering the observation variables of the 1000 newly simulated series, (d): the corresponding interval scores, both in time points τ1 = 1.01, ..., τ50 = t∗ = 1.5.

As already mentioned, the algorithm is very slow. Therefore, for the hierarchical hidden diffusion model, we only run one simulation to show some results. It will be future work to implement the algorithm in another language like C to make it faster. The parameters are similar to above, except for µ = (2, 1) and Ω = (1, 0.04). We simulate J = 50 samples of the random effect and, therefore, J series, each with one random effect, in t0 = 0, t1 = 0.01, ..., t100 = 1. For the estimation, we truncate the Jth series after the first half and predict the second half afterwards.

The starting values and prior means are chosen equal to the values used for the simulation, and the prior parameters are set so that the standard deviation is equal to the mean. In Figure 26, we see the resulting chains with 11 000 iterations. The horizontal red lines mark the chosen values. All chains look stationary with a good mixing, and the posteriors are close to the chosen values.


Figure 26: Markov chains of the parameters µ, Ω, γ2 and σ2 from the particle Gibbs sampler in the hierarchical hidden diffusion model for one simulated data set; in red: the chosen values for the simulation.

We take a burn-in phase of 1000 and a thinning rate of 10 and run the prediction algorithm with the remaining samples. We here restrict ourselves to Algorithm 1, since the differences between the algorithms were investigated in the study before. The prediction results with Algorithm 1 for the Jth series, which was truncated for the estimation, can be seen in Figure 27. On the left side, the underlying latent diffusion variables are displayed, and on the right side the observation variables.


Figure 27: Prediction results for the Jth series in the hierarchical simulated data set: prediction for the latent variables (left) and the future observations (right).


Application to the Virkler data

We have already seen the data set of Virkler et al. (1979) in the model motivation and application of the diffusion model in Section 3.3. We here fit the hidden diffusion model to the data. For good starting parameters, we use the values from Section 3.3 of the single series model and σ2∗_0 = 0.1. We first estimate the single series hidden diffusion model for the last, i.e. the 68th, series of the Virkler data. The resulting point estimates are taken as the starting values for the further inference, for the single series and the hierarchical model. In addition, we set Ω∗_0 = µ∗_0/100.

Since the estimation algorithm is computationally expensive, we here take only the first 10 of the series into account for the hierarchical model. For both models, the tenth series is truncated after the 80th observation to evaluate the prediction afterwards. We compare the predictions with the single series and the hierarchical model for this tenth series. For both models, we iterate the particle Gibbs sampler 100 000 times and take a burn-in phase of 25 000 and a thinning rate of 10. The resulting samples in the hierarchical model for γ2 and σ2 can be seen in Figure 28. Both chains converge, and the posterior locations are very small.

Figure 28: Results for the Virkler data in the hierarchical model (4.4): Markov chains of γ2 (left) and σ2 (right), reduced by the burn-in phase and the thinning rate.

In Figure 29, we see the estimated latent variables in green. The variance γ2 is so small that the latent diffusion looks like a solid line. In addition, the difference between the observed points and the estimated diffusions is very small; therefore, the small location of the posterior of σ2 is not surprising.

In Figure 30, we see the prediction results for the tenth Virkler series with the hierarchical and the single series model, both with Algorithm 1. The point predictions, i.e. the median of the predictive distribution, are marked with dotted lines. Similar to the results in Section 3.3, the intervals with the hierarchical model are smaller and follow the curve better.

In addition, it is interesting to compare the results with the ones from the previous section. In Figure 31, we compare the hidden diffusion model with the observed diffusion model from Section 3, in Figure 31(a) for the single series model and in Figure 31(b) for the hierarchical model. In the single series model, there is no recognizable difference, whereas for the hierarchical model, the hidden diffusion model seems a bit more appropriate, even though information from only 9 additional series is used instead of 66 in the diffusion model.

Figure 29: Estimation of the latent diffusion variables of the 10th truncated Virkler series in the hierarchical model (4.4); axes: crack length vs. cycle counts / 10 000.

Figure 30: 95% prediction intervals and the pointwise median for the 10th series of the Virkler data set based on Algorithm 1; left: whole series, right: zoomed into the interesting area; in red: the hierarchical model, in blue: the single series model; axes: crack length vs. cycle counts / 10 000.

Figure 31: Comparison of the hidden and the observed diffusion models: 95% prediction intervals and the pointwise median for the 10th series of the Virkler data set; (a): single series model, (b): hierarchical model.


Application to wear degradation in cylinder liners

In the following, we will apply the model to the wear of cylinder liners in marine diesel engines. A detailed explanation of the data set can be found in Hermann and Ruggeri (2016). The data concern the wear process of liners protecting cylinders in marine diesel engines. The thickness of the liner walls is measured over time in 10^4 hours, starting at 100 millimeters. The whole data set can be seen as black points in Figure 34, where we take the wear increment as observed variable, which is the same as the thickness decrement from 100. We have observations from 30 cylinder liners over time with non-equidistant time points, and every series has one to four observations besides the starting point, which makes this a sparse data set. Therefore, we restrict ourselves to a one-parameter drift function based on the Paris-Erdogan equation (1.1) with m = 0, namely b(ϕ, t, y) = ϕ, and use a constant variance function s(t, y) = 1. All series start in y0(ϕ) = 0 since in t0 = 0, no wear exists. We start the algorithm with µ∗_0 = 0.8, Ω∗_0 = 0.01, γ2 = 0.001 and σ2 = 0.01, which are taken as prior means as well. Again, the prior parameters are chosen so that the standard deviation is equal to the mean. The starting values are chosen manually by simulating the process at the time points of the data set and comparing the data to the simulated set.

In Figure 32, we see the 95% credible intervals of the random effects for the 30 series. They differ only slightly because the data series are sparse and the mixture distribution has much influence.

Figure 32: 95% credible intervals for the random effects ϕ1 for the wear data set.

In Figure 33, the chains of the parameters can be seen. We ran the Gibbs sampler with 101 000 iterations and took a burn-in phase of 11 000 samples and a thinning rate of 10. The four chains converge and show a good mixing.

In Figure 34, the prediction result for a new series with Algorithm 1 is displayed. We see in solid red lines the 95% prediction intervals and in dotted lines the pointwise median of the predictive distribution. One could imagine a saturation curve in the data, which the assumed model does not take into account. Nevertheless, the prediction intervals enclose the data except for three points in the beginning. The same data set has been modeled in Hermann and Ruggeri (2016) with a jump diffusion model, which will be introduced as the Merton model in Section 6. In Figure 34(b), we see the comparison of those results to Figure 34(a). The lines are very similar, but the model presented here leads to slightly smaller intervals while including the same amount of data points.


Figure 33: Markov chains of µ, Ω, γ2 and σ2, reduced by the burn-in phase and the thinning rate, in the hierarchical hidden diffusion model (4.4) for the wear data set.

As described in detail in the work of Hermann and Ruggeri (2016), the jump diffusion model is nicely motivated by the intuition of tiny and large soot particles in the cylinder liners. The large soot particles are responsible for the most relevant wear increments, represented by the Poisson process in the jump diffusion. Compared to this heuristically motivated, sophisticated model, the hidden diffusion model presented here is a bit simpler but leads to similar, maybe slightly better, results. From the computational point of view, both models are very complex, with filtering of latent variables: the Poisson process variables in the jump diffusion model and the diffusion process in the hidden diffusion model.


Figure 34: 95% prediction intervals for a new series of wear in cylinder liners with the hierarchical hidden diffusion model (4.4) (in red) and the jump diffusion process model (6.2) introduced in Section 6 (in blue); (a): prediction result with the hierarchical hidden diffusion model, dotted lines: the pointwise median, (b): comparison with the results in Hermann and Ruggeri (2016) with a jump diffusion model; axes: time vs. wear.


Part II.
Bayesian Analysis and Prediction for Non-continuous Processes

In constructional engineering, material fatigue is a relevant topic of research because it is important to predict the lifetime of, for example, bridges. Experiments in this field, especially under low loadings, are very rare because they are very expensive and take several months. Maurer and Heeke (2010) carried out five experiments where prestressed concrete beams with initial cracks have been put under cyclic load. Recently, five new experiments were conducted, see Szugat et al. (2016). Each prestressed concrete beam contains 35 tension wires, which break at random time points. Therefore, the resulting crack widths, which can be seen in Figure 35 for two of the experiments, exhibit jumps with increasing frequency that influence the crack growth process substantially. Structure-borne noise measurements during the experiments provide information concerning the break times of the tension wires, which match the observed jumps in the crack width data.

Figure 35: Crack width data resulting from the first two experiments of Maurer and Heeke (2010); left: 200 MPa, right: 455 MPa; axes: load cycles in mio. vs. crack width in mm.

In the following, the wire breaking process will be modeled by a non-homogeneous Poisson process (NHPP). NHPPs are well known in the literature. Meeker and Escobar (1998), for example, give a historical overview of the literature employing NHPPs in reliability theory, see pp. 413–420, and present maximum likelihood estimators for several intensity functions. Sobczyk and Spencer (1992) introduce cumulative jump models based on homogeneous and non-homogeneous Poisson processes to describe crack growth as a discontinuous random process.

We will see two approaches for including the NHPP in a model for the crack width process. Firstly, in Section 6, the whole jump diffusion model in (2.1) will be presented. Secondly, in Section 7, a non-linear regression model based on a physically motivated function including the NHPP will be proposed.


5. Non-homogeneous Poisson Process

In this section, we consider the counting process representing the wire breaking process. A Bayesian approach for NHPPs can be found in Kuo and Yang (1996) with application in software reliability, in Yuan et al. (2009) for modeling pitting corrosion in steam generator tubes and in Pievatolo and Ruggeri (2004) for the reliability of complex repairable systems. A good overview of the Bayesian analysis of NHPPs can be found in Ríos Insua et al. (2012).

5.1. Model Description, Inference and Prediction

The Poisson process is nested in (2.1) by setting b(ϕ, t, y) = 0, s(γ, t, y) = 0, h(η, t, y) = 1 and y0 = 0, whereby the differences Nti − Nti−1, i = 1, ..., n, are independent and Poisson distributed with parameter Λξ(ti) − Λξ(ti−1).

In Hermann et al. (2016a), a comparison of two different types of Λξ(t) can be found for the data set, one for polynomial and one for exponential growth:

Λξ(t) = (t/β)^α,  ξ = (α, β) ∈ [1, ∞) × (0, ∞),   (5.1)

Λξ(t) = exp(at − b) − exp(−b),  ξ = (a, b) ∈ (0, ∞) × R,   (5.2)

where the first function (5.1) has the nice property that the derivative of Λξ(t) is the hazard rate of the Weibull distribution, and the homogeneous case of the process is nested by setting α = 1. This specification is also known as the power law in the literature, see for example Yu et al. (2007). The second function (5.2) represents an exponential intensity rate that takes up the idea of the Paris-Erdogan equation in (1.1) with the special case m = 2. This equation would lead to a function f(t) = b̃·exp(at) = exp(at − b) with b̃ = exp(−b). This function has to be pulled down to zero in t = 0, which leads to the cumulative intensity exp(at − b) − exp(−b). Of course, one could try an approach based on the general Paris-Erdogan equation with a third parameter m. Since we have four or five observed event times, it is difficult to estimate more than two parameters. Therefore, we restrict ourselves to the two-parameter approach with m = 2.

Estimation

Note that the counting process Nt, t ∈ [0, T], is stochastically completely defined by its occurrence times 0 < T1 < ... < TI ≤ T. In the following, we assume that the process is observed up to time T, i.e. T = tn.

The first occurrence time has probability

P(T1 > t | ξ) = P(Nt = 0 | ξ) = e^{−Λξ(t)},

which leads to the distribution function FT1(t) = 1 − e^{−Λξ(t)} with density p(T1 | ξ) = λξ(T1) · e^{−Λξ(T1)}. For the second occurrence time, it is

P(T2 > t | T1, ξ) = P(Nt − NT1 = 0 | T1, ξ) = e^{−(Λξ(t)−Λξ(T1))}

and the resulting density is p(T2 | T1, ξ) = λξ(T2) · e^{−(Λξ(T2)−Λξ(T1))}. For the general case, it is

p(Ti | Ti−1, ξ) = λξ(Ti) · e^{−(Λξ(Ti)−Λξ(Ti−1))},  i = 1, ..., I,   (5.3)

with T0 = 0 and Λξ(0) = 0. Since we investigate the counting process on a compact interval [0, T], we also have the information that TI is the last occurrence up to time T, i.e. P(NT − NTI = 0 | TI, ξ) = e^{−(Λξ(T)−Λξ(TI))}. Altogether, we get the likelihood

p(N(n) | ξ) = p(T1 | ξ) · ∏_{i=2}^{I} p(Ti | Ti−1, ξ) · P(NT − NTI = 0 | TI, ξ)   (5.4)
            = e^{−Λξ(T1)} · ∏_{i=1}^{I} λξ(Ti) · e^{∑_{i=2}^{I} (−Λξ(Ti)+Λξ(Ti−1))} · e^{−Λξ(T)+Λξ(TI)}
            = e^{−Λξ(T)} · ∏_{i=1}^{I} λξ(Ti),

see also Ríos Insua et al. (2012).

Based on this likelihood, there is no obvious conjugate prior distribution for the whole vector ξ, so we will use the MH algorithm with θ = ξ, see Section 2.2.1. In the case of the two intensity functions in (5.1) and (5.2), a proposal density with positive support is suitable. In the BaPreStoPro package, the lognormal density is implemented for that case. For other intensity functions with possibly negative parameters, the normal proposal density is implemented as well.

In Hermann et al. (2016a), a non-informative approach without any prior specification is used. In the data set of Maurer and Heeke (2010), we have a maximum of 15 observed jumps. Therefore, prior knowledge would have to be justified, since it will have a large influence with few observations, and a non-informative approach is very suitable.
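As a concrete illustration of this estimation step, the following R sketch evaluates the log-likelihood corresponding to (5.4) for the power-law intensity (5.1) and runs a simple random-walk MH sampler on the log scale (equivalent to a lognormal proposal) with the flat prior p(ξ) = 1. Object names and tuning constants are illustrative and not taken from the BaPreStoPro implementation.

```r
## Log-likelihood of a NHPP observed on [0, T] via its event times, cf. (5.4),
## here with the power-law cumulative intensity (5.1): Lambda(t) = (t / beta)^alpha.
Lambda <- function(t, xi) (t / xi[2])^xi[1]
lambda <- function(t, xi) xi[1] / xi[2] * (t / xi[2])^(xi[1] - 1)   # derivative of Lambda

loglik.nhpp <- function(xi, event.times, T.obs) {
  -Lambda(T.obs, xi) + sum(log(lambda(event.times, xi)))
}

## Random-walk MH on log(xi), i.e. a lognormal proposal, with flat prior p(xi) = 1.
## (The constraint alpha >= 1 from (5.1) is ignored here for simplicity.)
mh.nhpp <- function(event.times, T.obs, xi.start, n.iter = 11000, prop.sd = 0.1) {
  chain <- matrix(NA_real_, n.iter, length(xi.start))
  xi <- xi.start
  for (k in seq_len(n.iter)) {
    xi.prop <- exp(log(xi) + rnorm(length(xi), sd = prop.sd))
    ## Hastings correction for the multiplicative random walk: q(x|x')/q(x'|x) = x'/x
    log.acc <- loglik.nhpp(xi.prop, event.times, T.obs) -
               loglik.nhpp(xi, event.times, T.obs) +
               sum(log(xi.prop)) - sum(log(xi))
    if (log(runif(1)) < log.acc) xi <- xi.prop
    chain[k, ] <- xi
  }
  chain
}
```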

Prediction

Based on the resulting MH samples ξ∗_1, ..., ξ∗_K ∼ p(ξ | N(n)), a predictive distribution can be approximated.

There are two possibilities. Firstly, the occurrence times and their predictive distribution can be of interest. Secondly, we are interested in the Poisson process variables.

If we want to predict the occurrence times, both Algorithms 1 and 2 are suitable, because the process Ti, i = 1, 2, ..., is a Markov process itself with transition density (5.3). For within-sample prediction, set Y∗(k)_0 = T, k = 1, ..., K, and otherwise Y∗(k)_0 = 0, k = 1, ..., K. With the notations of the algorithm, it is θ = ξ.

If we want to predict the Poisson process variables, in the case of Algorithm 1 we start with Y∗(k)_0 = 0, k = 1, ..., K, in the case of new-sample prediction, and with Y∗(k)_0 = NT, k = 1, ..., K, otherwise. The transition density is given by

P(Nti = l | Nti−1) = P(Nti − Nti−1 = l − Nti−1 | Nti−1)
                   = 1/(l − Nti−1)! · (Λξ(ti) − Λξ(ti−1))^{l−Nti−1} · exp(−(Λξ(ti) − Λξ(ti−1))) · 1_N(l − Nti−1).

If we want to predict a whole trajectory of the process, it can be more efficient to sample the event times, since the calculation of the predictive density is computationally costly and the process stays constant over periods. In the notation of Algorithm 2, it is θ = ξ and the starting values are Y∗(k)_0 = 0, k = 1, ..., K, in the case of prediction for a new series and Y∗(k)_0 = T, k = 1, ..., K, in the case of prediction of the further development, for example the next jump. Here, the time points are τ1 = 1, ..., τM = M. In Algorithm 2, step (iii), we mentioned the case that also a critical value yc can be defined, up to which the samples should run. This can be very suitable for the series of event times of the Poisson process. If prediction of the Poisson process up to a time point t∗ is the value of interest, this critical value would be yc = t∗. The prediction procedure leads to trajectory samples T∗(l)_1 < T∗(l)_2 < ... < T∗(l)_M or, in the case of stopping at a critical value, T∗(l)_1 < T∗(l)_2 < ... < T∗(l)_{Ml}, l = 1, ..., L. The trajectory samples of the Poisson process can be calculated as follows:

N∗(l)_t = j : T∗(l)_j ≤ t < T∗(l)_{j+1},  l = 1, ..., L.   (5.5)

5.2. Application

In Hermann et al. (2016a), a simulation study with 1000 series each was carried out for both presented intensity functions (5.1) and (5.2) to validate the presented estimation and prediction procedure for the Poisson process as well as for the jump diffusion process in Section 6. We here restrict ourselves to the application to the presented data set.

We will concentrate on the first data series and compare the results for the two intensity rates in (5.1) and (5.2). For the Weibull function (5.1), we start the MH algorithm in ξ∗_0 = (5, 0.5) and for the exponential function (5.2) in ξ∗_0 = (4, 1); both are chosen manually with a look at the wire breaking process in comparison to the cumulative intensity function. As mentioned, we here use a non-informative specification, i.e. p(ξ) = 1, because we do not have expert knowledge. We have only few data points, and any prior specification would have a large effect and would have to be justified.

For the estimation, we take into account the first 10 wire breaking times and calculate a prediction for the next five wire breaks. We draw 11 000 samples for the Weibull function and take a burn-in phase of 1000 and a thinning rate of 10. For the exponential function, the chains need longer to converge and do not have a good mixing. We draw 110 000 samples, and with a burn-in of 10 000 and a thinning rate of 100, the posterior is approximated well.

For the prediction, we use Algorithm 2 for the event times and calculate the variables in (5.5) for the wire breaking process. On the left side of Figure 36, we see the resulting prediction intervals for the eleventh up to the fifteenth event time, based on the estimation with the first ten event times. The corresponding prediction intervals for the NHPP, i.e. the wire breaking process, can be seen on the right side of the figure.

Figure 36: 95% prediction intervals for the first observed wire breaking series of the data set. Estimations are done with the first 10 observed event times, predictions are calculated for the next 5 wire breaking times. Left: predictions for the event times, right: prediction for the NHPP, both based on the exponential (5.2) (blue) and the Weibull (5.1) (red) intensity rates.

In line with the findings in Hermann et al. (2016a), the exponential intensity rate yields a better result than the Weibull function. Nevertheless, the prediction intervals cover the true values for both curves.

Until now, the intensity rate has been a deterministic function. One suitable extension is to include the heuristic fact that the same load is distributed over fewer wires after wire breaks. The stress range of the experiment would then be considered as a respective physical parameter.

Outlook: self-exciting process

Searching for a possibility to include the stress range in the intensity rate, we start with heuristics. In Figure 37, we see the load on the I tension wires over time. In the beginning of the experiment, the load is distributed over 35 wires. When one wire breaks, the load is distributed over only 34, and so on. Therefore, the idea is to use this fact in the intensity rate. But in the framework of NHPPs, the intensity rate has to be deterministic and may not depend on the past of the process. To allow for this, we turn to the class of self-exciting processes, see, for example, Ríos Insua et al. (2012).

We assume the wire breaking process to have distribution

Nt | Nr, r < t ∼ Pois(Λξ(t))

in time t, whereby Λξ(t) is a random process based on Nr, r < t.

To continue the idea from Figure 37, we assume for the cumulative intensity function

E[Nt | Nr, r < t] = Λξ(t) = ∑_{i=1}^{Nt−} fξ(s/(35−(i−1))) (Ti − Ti−1) + fξ(s/(35−Nt−)) (t − T_{Nt−})   (5.6)


Figure 37: Load over time for the Ith wire, dependent on the stress range s.

with T0 = 0 and Nt− = lim_{s↑t} Ns. Differentiation yields the intensity rate λξ(t) = fξ(s/(35 − Nt−)).

This class of load sharing models is well known in the reliability literature, see, for example, Kvam and Lu (2007); Nakagawa (2007); Crowder et al. (1994).

In Szugat et al. (2016), a simulation-free prediction method has been developed and applied to the data set of ten wire breaking series. It turns out that for the function fξ(x) = exp(−ξ1 + ξ2 log(x)), ξ ∈ R²_+, the model can predict the data very well. In Müller et al. (2016), two simulation-free approximate prediction methods have been presented, one based on the delta method and one based on data depth, and compared for three different choices of fξ. High coverage rates of the prediction intervals lead to the assumption that the model is a good choice for the data set. At the moment, a master student, Laura Marquis, is working on the Bayesian analysis for the model.
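For illustration, the cumulative intensity (5.6) can be evaluated piecewise from the break times observed before t, the stress range and a choice of fξ; below, fξ(x) = exp(−ξ1 + ξ2 log(x)) as in Szugat et al. (2016) is used. This is only a sketch with illustrative variable names.

```r
## Cumulative intensity (5.6) of the load-sharing model at time t,
## given the break times before t and the stress range s.
f.xi <- function(x, xi) exp(-xi[1] + xi[2] * log(x))   # choice of f from Szugat et al. (2016)

Lambda.loadshare <- function(t, break.times, s, xi, n.wires = 35) {
  Tb <- c(0, break.times[break.times < t])   # T_0 = 0 and all breaks before t
  i  <- seq_along(Tb)                        # after (i-1) breaks, 35-(i-1) wires carry the load
  slopes <- f.xi(s / (n.wires - (i - 1)), xi)
  sum(slopes[-length(Tb)] * diff(Tb)) + slopes[length(Tb)] * (t - Tb[length(Tb)])
}
```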


6. Jump Diffusion Process

We here consider the full model in (2.1). The idea of the jump process for the data is formulated in Hermann et al. (2016a), where a two-stage modeling procedure for an SDE with explicit solution is presented. Firstly, estimation and prediction are presented for the NHPP. Secondly, the posterior distribution of the parameters in the jump diffusion process, modeling the crack width, is approximated by samples drawn by a Gibbs sampler. These parameter samples and the counting process samples, drawn from the predictive distribution of the first modeling step, are employed to approximate the predictive distribution for the jump diffusion process.

In Hermann and Ruggeri (2016), the same process is considered based on an unobserved homogeneous Poisson process (HPP) and applied to wear degradation in cylinder liners. The procedure is extended here to the non-homogeneous case.

6.1. Model Description, Inference and Prediction

In Hermann et al. (2016a), the SDE (2.1) with the special cases

b(ϕ, t, y) = ϕy, s(γ, t, y) = γy, h(η, t, y) = ηy

is presented, which is given by

dYt = ϕYt dt + γYt dWt + ηYt dNt, Y0 = y0. (6.1)

This is a stochastic extension of the simplified version of the Paris-Erdogan equation dYt = ϕYt dt, as mentioned above in (1.1). In addition, under a homogeneous Poisson process, (6.1) is known as the Merton model, introduced in Merton (1976) for modeling stock returns. The SDE has a unique strong solution, which is given by

Yt = y0 · exp((ϕ − γ2/2) t + γWt + log(1 + η) Nt)   (6.2)

for η > −1. This can be seen by direct calculation using Itô's formula for non-continuous semimartingales, see Øksendal and Sulem (2005), p. 7, Example 1.15. As mentioned, inference and prediction for this process are done in Hermann et al. (2016a) for an observed NHPP and in Hermann and Ruggeri (2016) for an unobserved HPP.

We here restrict ourselves to the case of an SDE without explicit solution and consider the Euler approximation in (2.2) for the general SDE (2.1). In addition, we restrict ourselves to the case of an unobserved Poisson process and mention the simplification for the case where it is observed.


Based on the approximation variables in (2.2), the parameter vector θ = (ϕ, γ2, η, ξ) and the definition ∆Ni = Nti − Nti−1, we assume the Bayesian model

Yi | Yi−1, ∆Ni, θ ∼ N(Yi−1 + b(ϕ, ti−1, Yi−1)∆i + h(η, ti−1, Yi−1)∆Ni, s2(γ, ti−1, Yi−1)∆i),
∆Ni ∼ Pois(Λξ(ti) − Λξ(ti−1)), i = 1, ..., n,   (6.3)
ϕ ∼ N(mϕ, vϕ),
γ2 ∼ IG(aγ, bγ),
η ∼ N(mη, vη),
ξ ∼ p(ξ) = 1.

Estimation

Define ∆Λξ,i := Λξ(ti) − Λξ(ti−1) and ∆Yi = Yti − Yti−1. This leads to the likelihood conditional on the Poisson process variables

p(Y(n) | θ, N(n)) = ∏_{i=1}^{n} 1/(√(2π∆i) s(γ, ti−1, Yi−1)) · exp(−(∆Yi − b(ϕ, ti−1, Yi−1)∆i − h(η, ti−1, Yi−1)∆Ni)² / (2 s2(γ, ti−1, Yi−1)∆i)).

This likelihood and the prior distributions from the model definition suffice for the MH steps inside the Metropolis within Gibbs sampler for the parameters ϕ, γ2 and η. The likelihood of ξ is the same as in Section 5, given in (5.4). In the case of an observed Poisson process, the estimation of all the parameters ϕ, γ2, η and ξ is implemented with a Metropolis within Gibbs sampler.

The observability of the Poisson process is a special feature of the data set of Heeke, Hermann et al. (2015). In the following, we consider a filtering procedure for the process given in (2.1), respectively the Bayes model (6.3) for the Euler approximated variables, in the case of an unobserved Poisson process. One step for the estimation of N(n) is included in the Gibbs sampler.

Since multivariate sampling is always a challenge, we will reduce the problem to the posteriors of the independent differences ∆Ni ∼ Pois(∆Λξ,i). It is

∆Yi | ∆Ni, Yi−1, ϕ, γ2, η ∼ N(b(ϕ, ti−1, Yi−1)∆i + h(η, ti−1, Yi−1)∆Ni, s2(γ, ti−1, Yi−1)∆i).

Therefore, the posterior density for ∆Ni is given by

p(∆Ni | ∆Yi, Yi−1, ϕ, γ2, η, ξ) ∝ exp(−∆Λξ,i)/(∆Ni)! · (∆Λξ,i)^{∆Ni} · 1/(√(2π∆i) s(γ, ti−1, Yi−1)) · exp(−(∆Yi − b(ϕ, ti−1, Yi−1)∆i − h(η, ti−1, Yi−1)∆Ni)² / (2 s2(γ, ti−1, Yi−1)∆i)),

which is not a density of a known distribution family. We propose the following sampling procedure, which is simply the inversion method, see Section 2.2.6, for the kth Gibbs sampler iteration.


Algorithm 6 (Filtering of the Poisson process variables)
For all i = 1, ..., n − 1:

(i) choose R ∈ N (large enough) for the candidate set {0, ..., R},

(ii) calculate the posterior probabilities p0, p1, ..., pR with pj = p(j | ∆Yi, Yi−1, ϕ∗_{k−1}, γ2∗_{k−1}, η∗_{k−1}, ξ∗_{k−1}), j = 0, 1, ..., R,

(iii) draw u ∼ U(0, ∑_{j=0}^{R} pj) and calculate m = min{r : ∑_{j=0}^{r} pj ≥ u}, and

(iv) set ∆N^k_i = m.

Additionally, set N^k_(n) = (0, ∆N^k_1, ..., ∑_{i=1}^{n} ∆N^k_i).

Of course, the proposed filtering procedure is computationally costly. One alternative would be a simple MH step for each ∆Ni, i = 1, ..., n − 1. But the whole Gibbs sampler becomes inefficient with such an MH step, since all other parameters, except γ2, are updated in each iteration k, yet the acceptance rate is very high due to the calculated proposal density. Another alternative is a particle filtering procedure, see Johannes et al. (2009), but this is computationally more expensive than our proposed method, since a number of particles has to be drawn. In our model, a posterior of ∆Ni can be calculated; therefore, the proposed method is suitable.

After choosing starting values ϕ∗_0, η∗_0, γ2∗_0, ξ∗_0, the Metropolis within Gibbs sampler explained in Section 2.2.3 unites all estimation steps for θ = (N(n), ξ, ϕ, γ2, η) as follows:

N^k_(n) ∼ p(N(n) | Y(n), ϕ∗_{k−1}, η∗_{k−1}, γ2∗_{k−1}, ξ∗_{k−1}),
ξ∗_k ∼ p(ξ | N^k_(n)),
ϕ∗_k ∼ p(ϕ | Y(n), N^k_(n), η∗_{k−1}, γ2∗_{k−1}),
η∗_k ∼ p(η | Y(n), N^k_(n), ϕ∗_k, γ2∗_{k−1}),
γ2∗_k ∼ p(γ2 | Y(n), N^k_(n), ϕ∗_k, η∗_k),  k = 1, ..., K.

Of course, for special cases of the functions b, s and h, full conditional posteriors could be analytically available, possibly with conjugate priors. But, for the sake of generality, this will not be done here.

In the case of observed NHPP variables, the first step of the Gibbs sampler can be skipped; the remaining steps stay the same.


Prediction

As already seen in the estimation procedure, N(n) is a latent variable and is, therefore, treated like an additional parameter. Similar to the prediction of a random effect in Sections 3 and 4, we first need a prediction of N(n), which has already been described in Section 5.

In the case of Algorithm 2, the predictive samples N∗(k)_τm, k = 1, ..., K, m = 1, ..., M, from the NHPP, derived from (5.5), are used for the prediction of Y∗ in t∗. With θ = (N(n), ξ, ϕ, γ2, η), the transition density in Algorithms 1 and 2 is the density of the distribution given by

Y∗_m | Y∗_{m−1}, Nτm, Nτm−1, θ ∼ N(Y∗_{m−1} + b(ϕ, τ∗_{m−1}, Y∗_{m−1})∆∗ + h(η, τ∗_{m−1}, Y∗_{m−1})(Nτm − Nτm−1), s2(γ, τ∗_{m−1}, Y∗_{m−1})∆∗).

Starting values are Y∗(l)_0 := Yn in the case of prediction of the further development of the current series and Y∗(l)_0 := y0 in the case of prediction for a new series.

In the case of Algorithm 1, the starting values and the transition density are the same. There are two possibilities for the latent Poisson process variables. Firstly, the samples N∗(k)_τm, k = 1, ..., K, m = 1, ..., M, derived from (5.5) and drawn with Algorithm 2, can be taken. Secondly, we can sample predicted values of the differences ∆Nm, since we do not need trajectories of the latent Poisson process variables to include in the stepwise Algorithm 1. In that case, we sample for m = 1, ..., M:

∆N∗(k)_m ∼ p(∆Nm | N(n)) ≈ (1/K) ∑_{k=1}^{K} exp(−(Λξ∗_k(τm) − Λξ∗_k(τm−1))) / (∆Nm)! · (Λξ∗_k(τm) − Λξ∗_k(τm−1))^{∆Nm}.
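Sampling from this mixture is straightforward: for each draw, pick one posterior sample ξ∗_k at random and generate a Poisson variate with the corresponding increment of the cumulative intensity. A minimal R sketch with illustrative names, where Lambda is the cumulative intensity function:

```r
## Draw predicted increments Delta N*_m from the posterior predictive mixture:
## choose a posterior sample xi*_k uniformly and draw Pois(Lambda(tau_m) - Lambda(tau_{m-1})).
draw.dN.pred <- function(tau.prev, tau, xi.samples, Lambda, n.draws = 1) {
  k <- sample(nrow(xi.samples), n.draws, replace = TRUE)
  means <- apply(xi.samples[k, , drop = FALSE], 1,
                 function(xi) Lambda(tau, xi) - Lambda(tau.prev, xi))
  rpois(n.draws, means)
}
```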

Comparison of the explicit and the Euler approximated process

In the following, we will investigate the difference between the explicit solution and the Euler approximated process by an example. Model (6.2) will be compared to the Euler approximation of the SDE in (6.1). We arbitrarily set the parameters ϕ = 0.5, η = 0.1, γ2 = 0.01 and ξ = (2, 0.2) with the power law intensity rate function in (5.1). We start the process in y0 = 0.5 and simulate in t0 = 0, ..., t100 = 1. Both processes can be seen in Figure 38 on the right, with the corresponding Poisson process on the left. The difference between the curves is small but recognizable in the last quarter of the curve.
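The comparison can be reproduced with a few lines of R: both representations are driven by the same Brownian increments and the same Poisson path, the first via the explicit solution (6.2) and the second via the Euler scheme for (6.1). The NHPP is simulated through its independent Poisson increments; all object names are illustrative.

```r
## Simulate (6.2) and the Euler scheme for (6.1) on the same noise, cf. Figure 38.
set.seed(1)
phi <- 0.5; eta <- 0.1; gamma2 <- 0.01; xi <- c(2, 0.2); y0 <- 0.5
times  <- seq(0, 1, length.out = 101)
dt     <- diff(times)
Lambda <- function(t, xi) (t / xi[2])^xi[1]              # power law (5.1)

dW <- rnorm(length(dt), sd = sqrt(dt))                   # Brownian increments
dN <- rpois(length(dt), diff(Lambda(times, xi)))         # NHPP increments
W  <- c(0, cumsum(dW)); N <- c(0, cumsum(dN))

## explicit solution (6.2)
Y.explicit <- y0 * exp((phi - gamma2 / 2) * times + sqrt(gamma2) * W + log(1 + eta) * N)

## Euler approximation of (6.1)
Y.euler <- numeric(length(times)); Y.euler[1] <- y0
for (i in seq_along(dt)) {
  Y.euler[i + 1] <- Y.euler[i] + phi * Y.euler[i] * dt[i] +
    sqrt(gamma2) * Y.euler[i] * dW[i] + eta * Y.euler[i] * dN[i]
}

## matplot(times, cbind(Y.explicit, Y.euler), type = "l", lty = 1:2)
```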

Now, we will treat the Euler simulated curve with the estimation procedure from the explicit solution, and the other way round: the curve simulated from (6.2) will be estimated with the posteriors based on the Euler approximation variables. In both cases, we keep the Poisson process variables as observed. In Figure 39, the resulting posterior densities for the parameters ϕ, γ2 and η can be seen. The solid lines are posterior densities based on the Euler simulated variables, the dotted lines on the variables from (6.2). First, we can summarize that both procedures, applied to their corresponding data set, lead to similar posterior densities (red lines). The effect of the estimation on the wrong data set is small for ϕ. The effect for γ2 can be ascribed to the biased estimation of η. The jump height parameter η seems to be sensitive to the choice of the model, Euler or explicit.


Figure 38: Left: simulated Poisson process; right: simulated process with formula (6.2) (solid) and with the Euler approximation (dotted).

Figure 39: Comparison of posterior densities for ϕ (top left), γ2 (top right) and η (bottom left). Solid lines: posterior based on the series simulated with Euler; dotted lines: based on the series simulated with the explicit solution (6.2); black lines: estimation each with the other procedure; red lines: estimation each with the corresponding procedure.

Looking at Figure 40, it becomes clear that the biased estimation of η does not affect the prediction. The red lines mark the 95% prediction intervals. The predictive distribution was approximated with Algorithm 2 based on K = 10^4 samples from the posterior; in each step, L = 10^4 samples are drawn. The solid red lines are the prediction intervals for the solid black curve simulated with the explicit solution, based on estimation and prediction with the Euler approximation, whereas the dotted red lines mark the prediction intervals for the dotted black curve simulated with the Euler scheme, based on estimation and prediction with the explicit solution (6.2). The difference between the red curves, solid and dotted, is negligible.

Figure 40: Prediction results; in black: simulated process with formula (6.2) (solid) and with the Euler approximation (dotted); in red: 95% prediction intervals, solid: estimation and prediction with the Euler approximation for the solid black curve simulated with the explicit solution, dotted: estimation and prediction with (6.2) for the dotted black curve simulated with Euler.

6.2. Application

Again, we will apply the presented methods to the first series of the data set presented in Heeke, Hermann et al. (2015). Due to the results from Section 5, we restrict ourselves here to the exponential intensity rate (5.2). Similar to the work in Hermann et al. (2016a), we consider the model in Section 6.1, but based on the Euler approximation instead of the explicit solution (6.2). In the preparation of this thesis, both models, the Euler approximated and the explicit process, were compared and the results were equal. We choose the functions

b(ϕ, t, y) = ϕy, s(γ, t, y) = γy, h(η, t, y) = ηy and Λξ(t) = exp(ξ1t − ξ2) − exp(−ξ2)

and the following starting values

η = 0.1, ϕ = 0.1, γ = 0.1, ξ = (1, 0.5),

which are again chosen manually by simulating the model and comparing with the data series. In contrast to the calculations in Hermann et al. (2016a), which were based on the conjugate priors for the normal likelihood of the logarithmic variables, we here can also use a non-informative approach, as done in the following. As mentioned, we also did the calculations for the explicit model, using the starting values for the prior means and the standard deviations, and the results were equal. This is not surprising because of the large amount of data points.


At first, we consider the new filtering algorithm for the whole data series. For the whole Gibbs sampler, we draw 101 000 chain iterations, take a burn-in of 1000 samples and a thinning rate of 10. In Figure 41, the estimated Poisson process variables are displayed as green lines, in comparison to the black line that is assumed to be the observed one. Because the wire breaks are only observed through sound measurements, it is impossible to say whether one or more wires are broken at the same time. This leads to an additional measurement uncertainty that is not taken into account yet. The question arises whether it would be useful to filter the counting process instead of taking it as observed. We can see that up to the fifth broken wire, the lines are equal. Afterwards, the filtering algorithm assumes two broken wires at some times instead of only one at all event times. In addition, there is only little variation between the green lines, which means that the algorithm very often draws the same trajectory for the NHPP variables. This could be a hint that the algorithm is sure about the estimation of the latent process, since the trajectories are drawn independently of each other. Of course, these findings are fully dependent on the model, which is just an assumption for the underlying data series. In addition, this algorithm is dependent on the starting values, as discussed in Hermann and Ruggeri (2016), because the likelihood has local maxima in which the algorithm can end up, depending on the starting values.

Figure 41: Filtered counting process variables from the Gibbs sampler (green); in black: the trajectory observed based on sound measurements.

In the following, we want to investigate the prediction results. Again, we truncate the series after the first 10 observed jumps for the estimation and predict the further development. All algorithm specifications are the same as before. For the prediction, we use Algorithm 2. In Figure 42, we compare the prediction results for the investigated model based on the estimations with the observed Poisson process variables and with the filtered one. At the top, the whole series is shown. In the bottom figure, we only see the points to be predicted. One can see that the intervals based on the samples from the Gibbs sampler with the filtering step for the counting process, in blue, are smaller and that the median of the predictive distribution, i.e. the dotted blue line, gets closer to the observed data than the red dotted line, which is the median of the predictive distribution based on the observed Poisson process.

Figure 42: 95% prediction intervals with Algorithm 2 for the first series with the jump diffusion model. In blue: prediction results based on the Gibbs sampler with filtering Algorithm 6; in red: based on the Gibbs sampler with fixed, i.e. observed, Poisson process variables. Top: whole series, bottom: zoomed into the interesting area of predicted values.


7. Jump Regression Model

During the interdisciplinary work in the project B5 "Statistical methods for damage processes under cyclic load" of the collaborative research center 823, a physically motivated function has been developed to describe the crack width based on the observed jump process. This idea is formulated in the PhD thesis of Guido Heeke, see Heeke (2016), p. 129, Section 6.1.3, or the joint work Heeke, Hermann et al. (2015).

Based on the observed jump process Nt, t ∈ [0, T], the concrete crack growth process can be recalculated with the time-dependent crack width function

w(t) = (1 − k(t)) · (∆σ(t))² · AP(t) / (0.72 · π · fctm · EP · √(AP(t)))

with

k(t) = kl + (ks − kl) · e^{t·c},   AP(t) = AP · (1 − Nt/35),

∆σ(t) = σ2 · AP/AP(t) − σ1 = σ2 · 1/(1 − Nt/35) − σ1,

and kl, ks, c, AP, σ1, σ2, fctm, EP constants, which are approximately, but not exactly, known.

The first step is to transform the function into a more flexible parametric version with estimable parameters, because some of the constants in that function are only approximate values. We obtain

w(t, Nt, ϕ) = (ϕ1 − ϕ2 · exp(−t · ϕ3)) · 1/√(h(Nt)) · (h(Nt) − ϕ4)²,

where ϕ1 = (1 − kl) · √AP · σ2² / (0.72 · π · fctm · EP), ϕ2 = (ks − kl) · √AP · σ2² / (0.72 · π · fctm · EP), ϕ3 = −c, ϕ4 = σ1/σ2 and h(x) = 1/(1 − x/35) = 35/(35 − x). The function h is motivated as follows. As seen in Section 5, in the beginning of the experiment, the load is distributed over 35 wires. With every broken wire, the function h(x) makes a jump and stays constant in between. The corresponding calculation can be found in Appendix B. Of course, other parameterizations are possible. However, if more than four parameters are chosen, they will not be identifiable. We have to mention that this function only describes the increments of the crack growth, not the position itself. In practice, the starting value has to be subtracted from the data and w(t, Nt, ϕ) − w(t0, Nt0, ϕ) has to be used. For notational simplicity, we skip this fact here.

7.1. Model Description, Inference and Prediction

The simplest possibility to include this function in a stochastic model is a non-linear regression model dependent on the Poisson process variables. The corresponding Bayes model is given by

Yi = w(ti, Nti, ϕ) + ϵi,
ϵi ∼ N(0, σ2) i.i.d.,
Nti ∼ Pois(Λξ(ti)), i = 0, ..., n,
ϕ | mϕ, Vϕ ∼ N(mϕ, Vϕ),
σ2 | aσ, bσ ∼ IG(aσ, bσ),
ξ ∼ p(ξ) = 1.

Here, ϵi and Nti have to be stochastically independent. Conditional on the counting process variables Nt0, Nt1, ..., Ntn, we have a regression model whose parameters can be estimated based on the likelihood. Bayesian inference for non-linear regression models can be found in Carlin and Louis (2009). Following from

Yi | Nti, ϕ, σ2 ∼ N(w(ti, Nti, ϕ), σ2), i = 0, ..., n,

the conditional likelihood is a product of normal densities. Because of the non-linear function, we have no closed form of the posterior of ϕ. Therefore, a Metropolis within Gibbs sampler will be used with θ = (ϕ, σ2, ξ), whereby

σ2 | Y(n), N(n), ϕ ∼ IG(aσ + (n + 1)/2, bσ + (1/2) ∑_{i=0}^{n} (Yi − w(ti, Nti, ϕ))²).

The Gibbs sampler unites the three estimation steps as follows:

ϕ∗_k ∼ p(ϕ | Y(n), N(n), σ2∗_{k−1}),
σ2∗_k ∼ p(σ2 | Y(n), N(n), ϕ∗_k),
ξ∗_k ∼ p(ξ | N(n)),  k = 1, ..., K.
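The second of these steps is a standard conjugate update: σ2 is drawn from the inverse gamma full conditional stated above. In R this can be done via the reciprocal of a gamma variate; a minimal sketch with illustrative names, where w.fun is the regression function from above:

```r
## Conjugate update of sigma^2 from its inverse gamma full conditional,
## IG(a_sigma + (n + 1) / 2, b_sigma + 0.5 * sum((Y - w)^2)).
draw.sigma2 <- function(Y, t, N, phi, a.sigma, b.sigma, w.fun) {
  resid <- Y - w.fun(t, N, phi)
  shape <- a.sigma + length(Y) / 2           # length(Y) = n + 1 observations i = 0, ..., n
  rate  <- b.sigma + 0.5 * sum(resid^2)
  1 / rgamma(1, shape = shape, rate = rate)  # X ~ Gamma(shape, rate)  =>  1/X ~ IG(shape, rate)
}
```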

Here, we have expert knowledge for the parameter ϕ, which can be used for the prior parameters. If we denote the vector resulting from the physical constants given by the expert by ϕ0, we assume mϕ = √diag(Vϕ) = ϕ0.

With the samples ϕ∗_1, σ∗2_1, ..., ϕ∗_K, σ∗2_K from the posterior distribution, we can approximate the predictive distribution

p(Y∗_i | Y(n), N(n))   (7.1)
  = ∫ p(Y∗_i | ϕ, σ2, N∗_ti) · p(ϕ, σ2 | Y(n), N(n)) · p(N∗_ti | N(n)) d(ϕ, σ2, N∗_ti)
  ≈ (1/K) ∑_{k=1}^{K} p(Y∗_i | ϕ∗_k, σ∗2_k, N∗(k)_ti),  i = 0, ..., n,

with N∗(k)_ti, k = 1, ..., K, being the samples from the predictive distribution of the Poisson process defined in (5.5). The distribution in (7.1) can be approximated with rejection sampling or the inversion method, see Section 2.


In Heeke, Hermann et al. (2015), we found out that the first part of the regression function, ϕ1 − ϕ2 · exp(−t · ϕ3), only describes the very beginning of the series and stays constant for the rest. However, we are mainly interested in the prediction for the last part of the experiment, and we lose efficiency by estimating needless parameters. Therefore, we simplify the function w to

ws(Nt, ϕ) = ϕ1 · 1/√(h(Nt)) · (h(Nt) − ϕ2)²

and estimate the parameters similarly to the regression model defined above. For the estimation procedure, only the likelihood has to be changed in the MH step of the posterior sampling for ϕ. For the predictive distribution (7.1), only the new likelihood has to be used.

Outlook: filtering of the unobserved Poisson process

For this approach, again, the observed Poisson process is special to the data set of Heeke, Hermann et al. (2015). Filtering an unobserved Poisson process is not as straightforward as in the jump diffusion model, since the likelihood does not only depend on the independent differences of Nt. One possibility would be the particle Gibbs sampler with the CSMC algorithm step for the latent N(n) described in Section 4; alternatively, the particle filtering developed for jump diffusions in Johannes et al. (2009) might be extendable to the current model, but this will be future work.

7.2. Application

Analogous to the last two sections, we here consider the first series of the crack width data, seen in Figure 35. In contrast to the evaluations in Heeke, Hermann et al. (2015), we here use the exponential intensity rate (5.2), due to the results in Section 5 and in Hermann et al. (2016a). For comparison with the jump diffusion model, we here also truncate the series after the 10th jump, use the first part for estimation and predict the further development. Since we have seen two functions w and ws, we will compare the prediction results for both curves. For the first function, as mentioned, we have expert knowledge that leads to the starting values ϕ∗_0 = (4.36, 2.03, 20, 0.64). For the simplified curve, we choose (ϕ∗_{1,0} − ϕ∗_{2,0} · exp(−ϕ∗_{3,0}), ϕ∗_{4,0}) as starting parameter. For the intensity rate parameter, we again start with ξ∗_0 = (1, 0.5) and for the variance, we choose σ2 = 1. Except for ξ, all prior parameters are chosen so that the prior mean and standard deviation are equal to the starting values.

For the prediction, we first sample event times with Algorithm 2 and calculate the predictive samples for the wire breaking process in (5.5). Afterwards, we sample from the predictive distribution in (7.1). The pointwise 95% prediction intervals can be seen in Figure 43, the whole series on the left and only the points to be predicted on the right side. In addition, the prediction intervals with the jump diffusion model from Section 6, based on the estimations with the observed Poisson process, are displayed for comparison. First, we can say that the jump diffusion model leads to smaller intervals, and up to the fourth jump its prediction intervals cover the true values. The jump regression model leads to larger intervals that cover all observed data points. The blue line shows the results for the simplified curve, which are smoother than those from the original curve. The reason is a larger estimated variance: in the model based on the function w, we have a posterior mean of 0.004, and for ws it is 0.046, which is much larger, so the intervals look smoother. But, even with a higher estimated variance, the intervals are smaller and cover the true values.

Figure 43: Prediction for the first observed series of the crack width data set. Estimations are done with the series up to 10 observed event times, predictions are made for the further development. Left: whole series, right: zoomed into the interesting area, both based on the exponential intensity rate (5.2). In addition: comparison with the jump diffusion model (in green).


8. Description of the R Package BaPreStoPro

In this thesis, we have seen many models with different estimation and prediction procedures. Beside the theoretical properties, the practical realization is important as well. Implementation can be time-consuming and, for example for the particle Gibbs sampler, very complicated and error-prone. An R package is a convenient tool for users with real data. The package presented in the following provides solutions for the inference described in this thesis. In detail, for each presented model, an S4 class is constructed, wherein all the model details are stored. For each model class, simulation, estimation and prediction methods are available. In addition, for the estimation output, plot functions are implemented for a convenient visualization of the results.

In addition to the models described before, a non-linear regression model with the corresponding hierarchical version is also implemented for an easy comparison. These models are given by

Yi | ϕ, γ2 ∼ N(f(ϕ, ti), γ2 s2(ti)), i = 0, ..., n,
ϕ | mϕ, Vϕ ∼ N(mϕ, Vϕ), Vϕ = diag(v1², ..., vp²),
γ2 | aγ, bγ ∼ IG(aγ, bγ)

for the single series model and by

Yij | ϕj, γ2 ∼ N(f(ϕj, ti,j), γ2 s2(ti,j)), i = 1, ..., nj,
ϕj | µ, Ω ∼ N(µ, Ω) i.i.d., j = 1, ..., J, Ω = diag(ω1², ..., ωp²),
µ | mµ, Vµ ∼ N(mµ, Vµ), Vµ = diag(v1², ..., vp²),
ωr² | aω,r, bω,r ∼ IG(aω,r, bω,r), r = 1, ..., p,
γ2 | aγ, bγ ∼ IG(aγ, bγ)

for the hierarchical model.
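A minimal sketch of how the single series regression model could be set up with the constructor set.to.class described below; the concrete regression function f(ϕ, t), the constant variance function and the parameter values are only illustrative assumptions:

fun <- function(phi, t) phi[1] * (1 - exp(-phi[2] * t))  # illustrative f(phi, t)
sT.fun <- function(t, y) 1                               # illustrative s2(t) = 1, constant over time
model.reg <- set.to.class("Regression", parameter = list(phi = c(5, 2), gamma2 = 0.1),
                          fun = fun, sT.fun = sT.fun)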

The package BaPreStoPro is structured as follows. For each of the models, there is one S4 model class, as can be seen in the second column of Table 5.

They all are constructed by the function set.to.class with the main input parameter class.name, denoting the name of the model class, and the input parameters parameter, prior, start, b.fun, s.fun, h.fun, sT.fun, y0.fun, fun, Lambda and priorDensity. An overview is given in Table 6. prior is optional; if missing, it is calculated from the list entries of parameter so that mean and standard deviation of the prior distribution are equal to the values of parameter. Analogously, start, if missing, is set equal to parameter. b.fun defines the function b(ϕ, t, y) in all (jump) diffusion models, s.fun defines the function s(γ2, t, y), and h.fun the function h(η, t, y) in the jump diffusion model. sT.fun stands for the variance function s(t, y) in the (mixed) diffusion models as well as for s2(t) in the (mixed) regression models. y0.fun denotes the starting value function y0(ϕ) for the hidden and mixed diffusion models. fun is the regression function w(t, Nt, ϕ) in the jump regression and f(ϕ, t) in the (mixed) regression model. Lambda is, for all models containing the NHPP, the cumulative intensity function Λξ(t).


Model | Class object (simulate & estimate) | Class object (plot & predict)
diffusion | Diffusion | est.Diffusion
hierarchical diffusion | mixedDiffusion | est.mixedDiffusion
hidden diffusion | hiddenDiffusion | est.hiddenDiffusion
hierarchical hidden diffusion | hiddenmixedDiffusion | est.hiddenmixedDiffusion
NHPP | NHPP | est.NHPP
jump diffusion | jumpDiffusion | est.jumpDiffusion
jump diffusion (6.2) | Merton | est.Merton
jump regression | jumpRegression | est.jumpRegression
regression | Regression | est.Regression
hierarchical regression | mixedRegression | est.mixedRegression

Table 5: Overview of the class names for the corresponding methods in the different models. For each model in the left column, a model class is available for which a simulation and an estimation method are implemented. For the resulting estimation class objects in the right column, a plot and a prediction method are realized.

input parameter | meaning
parameter | values for the starting and the prior values, if they are not specified
prior | list of prior values; if not specified, chosen so that mean and standard deviation are equal to the values in parameter
start | list of starting values; if not specified, equal to the values in parameter
b.fun | function b(ϕ, t, y) in all (jump) diffusion models
s.fun | function s(γ2, t, y) in the jump diffusion model
h.fun | function h(η, t, y) in the jump diffusion model
sT.fun | variance function s(t, y) in the (mixed) diffusion models as well as s2(t) in the (mixed) regression models
y0.fun | starting value function y0(ϕ) for the hidden and mixed diffusion models
fun | regression function w(t, Nt, ϕ) in the jump regression and f(ϕ, t) in the (mixed) regression model
Lambda | cumulative intensity function Λξ(t)
priorDensity | list of prior densities in the jump diffusion models

Table 6: Overview of the input parameters of the function set.to.class.

priorDensity is, for the jump diffusion model, a list of prior densities, for the Merton model and the NHPP a prior density for ξ; defaults are non-informative approaches.

For each of the model classes, a simulation method simulate is available. With the respective data set, the method estimate can be applied to the model class with the additional input parameters t, i.e. the time vector (t0, t1, ..., tn), data, denoting the respective data set, which can be a vector, a matrix or a list in the respective model, and the number of Markov chain iterations, nMCMC. In addition, there is the possibility to choose the proposal standard deviation for the MH algorithms, propSd. If this is not specified, it is chosen proportional to the starting values, dependent on the model. If adapt = TRUE, this proposal standard deviation is just the starting one and is adapted after every 50 iterations if the acceptance rate is smaller than 30% or larger than 60%; see Rosenthal (2011) for adaptive MCMC. The proposal density itself can also be chosen between the normal and the log-normal distribution, proposal = "normal" or "lognormal".


Output of the estimate method is a new S4 class composed of "est." and the model class name, see the third column of Table 5.
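For example, assuming a model object created by set.to.class and observations Y at time points t, a hedged sketch of an estimation call using these proposal settings could look as follows (whether propSd is a single value or one value per parameter is an assumption here):

est <- estimate(model, t, Y, nMCMC = 11000, propSd = 0.1, adapt = TRUE,
                proposal = "lognormal")  # adaptive MH with log-normal proposals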

This estimation class object is an input parameter of the method predict. One of the main input parameters is pred.alg, which denotes the prediction algorithm. Algorithm 1 (default) is named "Distribution". Algorithm 2 is implemented with pred.alg = "Trajectory". As mentioned, two additional sampling prediction algorithms are implemented. With pred.alg = "simpleTrajectory", the presented Algorithm 3 is implemented, and with pred.alg = "simpleBayesTrajectory", Algorithm 4 is conducted. The second main input parameter is which.series, where one can decide between "new", i.e. τ0 = t0, and "current", i.e. τ0 = tn. An overview can be seen in Table 7.

input parameter | choices | description
pred.alg | "Distribution" | Algorithm 1
         | "Trajectory" | Algorithm 2
         | "simpleTrajectory" | Algorithm 3
         | "simpleBayesTrajectory" | Algorithm 4
which.series | "current" | τ0 = tn
             | "new" | τ0 = t0

Table 7: Overview of the main input parameters of the prediction method predict.

In the following, we go through the examples given during the thesis in detail.

Diffusion

The process given in Example 1 is defined by the functions s(γ, t, y) = γ s(y) = γ y and b(ϕ, t, y) = ϕ y, and the parameters y0 = 1, ϕ = 2 and γ = 1. We translate that to R code:

b.fun <- function(phi, t, y) phi * y
sT.fun <- function(t, y) y
model <- set.to.class("Diffusion", parameter = list(phi = 2, gamma2 = 1),
                      b.fun = b.fun, sT.fun = sT.fun)
t <- seq(0, 1, by = 0.01)
Y <- simulate(model, seed = 123, t = t, y0 = 1)

The simulated trajectory can be seen in Figure 44. There is an optional input parameter mw, default 1, whose inverse serves as mesh width for simulating time-continuity of the process. For example, if mw = 10, nine points are equidistantly added between each two points of t and the process is simulated on the finer time grid. Afterwards, every tenth simulated point is returned as the simulated series. We here restrict to mw = 1 to avoid biased estimations.
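For illustration, a simulation on a ten times finer internal grid could be requested as follows (a sketch using only the parameters introduced above):

# nine intermediate points are added between consecutive entries of t;
# the returned series contains the values on the original grid t only
Y.fine <- simulate(model, seed = 123, t = t, y0 = 1, mw = 10)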

Estimation is done by the method estimate as follows:



Figure 44: Trajectory created with method simulate for the diffusion model.

est <- estimate(model, t, Y, 21000)
plot(est)

Output of the method estimate for the class object Diffusion is a new class object est.Diffusion, which contains all information of the model as well as the data and the estimation results. In addition, it contains a proposal for the thinning rate and the burn-in phase. Both are input parameters in the following methods; if they are missing, the proposed values are taken.

The proposed burnIn is calculated by dividing the Markov chains into 10 blocks and calculating the 95% credibility intervals and the respective means. Starting with the first one, a block is added to the burn-in as long as its mean is not in the credibility interval of the following block or vice versa. The thinning rate is proposed as the smallest lag which leads to a chain autocorrelation of less than 80%. It is not easy to automate these choices, so it is highly recommended by the author to verify the chains manually, for example with the R package coda, see Plummer et al. (2006).
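As an illustration of the thinning heuristic (a sketch, not the package's internal code), the proposed lag could be computed from a single chain as follows:

# smallest lag at which the autocorrelation of the chain drops below 80%
propose.thinning <- function(chain, max.lag = 100) {
  ac <- acf(chain, lag.max = max.lag, plot = FALSE)$acf[-1]  # lags 1, ..., max.lag
  lag <- which(ac < 0.8)[1]
  if (is.na(lag)) max.lag else lag
}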

The Markov chains are visualized with the method plot, which has several options for the input parameter style, namely chains, acf and density. The first one can be seen in Figure 45, the second one shows the autocorrelation functions of the chains and the third one the resulting posterior densities. Another important input parameter is reduced. If it is set to TRUE, the chains are reduced by the burn-in phase and the thinning rate, both optional input parameters but also proposed by the estimation method.

The class object est.Diffusion is used for the method predict. Thinning rate and burn-in phase are also input parameters of this method.

burnIn <- 1000
thinning <- 2  # possibly small, only for example


[Figure 45: two trace plots, φ1 and γ2, against the iteration index.]

Figure 45: Markov chains produced by method estimate and visualized with method plot for the diffusion process in Figure 44.

pred <- predict(est, burnIn = burnIn, thinning = thinning,
                b.fun.mat = function(phi, t, y) phi[,1] * y)

Since the function b.fun is stored in the est.Diffusion object, the input parameter b.fun.mat is not necessary, but the algorithm gets much faster with the matrix-wise function definition. If plot.prediction = TRUE, the (1 − level) prediction intervals are plotted with the observation series used for estimation. This leads to Figure 46(a). Default value for the input parameter pred.alg, i.e. the prediction algorithm, is "Distribution", which denotes the pointwise prediction Algorithm 1.

[Figure 46: two panels of Yt against t; (a) which.series = "new", (b) which.series = "current".]

Figure 46: 95% prediction intervals of method predict, visualized if the input parameter plot.prediction is TRUE (default).

One important input parameter of the method predict is which.series. Default is "new", which leads to prediction of a new series, starting in the first point of the observation series. The second option is "current", which yields a prediction of the further development of the observed series.


If no vector of time points t is specified (also an input parameter), the input variable M2pred can be used (default: 10), which is the number of points that will be predicted with the time distances of the observation time vector. This can be seen as follows:

pred.current <- predict(est, b.fun.mat = function(phi, t, y) phi[,1] * y,
                        which.series = "current", M2pred = 20)

The result can be seen in Figure 46(b). The default starting point of the prediction is, for which.series = "new", the starting point of the observed series, and for which.series = "current" the last observation point. If desired, this can be changed with the input parameter y.start, which denotes the point Y∗(k)0, k = 1, ..., K.

Algorithm 2 is implemented with the input parameter pred.alg = "Trajectory". Here, the number of drawn trajectories L can be chosen with sample.length; default is L = K. In Hermann (2016b), the two presented prediction methods are compared with two commonly used sampling procedures. For the first one, pred.alg = "simpleTrajectory", the point estimates, i.e. the posterior means, are plugged into the model definition and sample.length trajectories are drawn (default: the number of posterior samples K). For the second one, pred.alg = "simpleBayesTrajectory", each of the K posterior samples is plugged into the model definition and one trajectory is drawn. If sample.length is specified and smaller than K, the first sample.length posterior samples are taken.

There is one additional feature of the method predict. For comparison, a one-step Euler prediction is implemented, which would be similar to Algorithm 1 with M = 1. In that case, sampling is not necessary and quantiles can be calculated directly. With Euler.interval = TRUE, for each of the time points the level/2 and 1 − level/2 quantiles of the predictive distribution are calculated.
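A hedged sketch of these two options, reusing the est object from above and only the argument names introduced in this section:

# Algorithm 3: plug in the posterior means and draw 100 trajectories
pred.simple <- predict(est, pred.alg = "simpleTrajectory", sample.length = 100)
# one-step Euler prediction intervals, calculated without sampling
pred.euler <- predict(est, Euler.interval = TRUE)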

Mixed diffusion

We simulate a model similar to Example 1, but in a mixture model. Here, we can also assume the starting point to be random, i.e. y0(ϕ) = ϕ2 and b(ϕ, t, y) = ϕ1 y. For a similar model as before, we choose µ = (2, 1) and Ω = diag(1, 0.04). We sample J = 50 series as follows.

J <- 50
mu <- c(2, 1)
Omega <- c(1, 0.04)
phi <- cbind(rnorm(J, mu[1], sqrt(Omega[1])), rnorm(J, mu[2], sqrt(Omega[2])))
gamma2 <- 1
y0.fun <- function(phi, t) phi[2]
b.fun <- function(phi, t, y) phi[1] * y
sT.fun <- function(t, y) y

model <- set.to.class("mixedDiffusion", y0.fun = y0.fun,
                      b.fun = b.fun, sT.fun = sT.fun,
                      parameter = list(phi = phi, mu = mu, Omega = Omega, gamma2 = gamma2))

t <- seq(0, 1, by = 0.01)
series <- simulate(model, seed = 123, t = t, plot.series = FALSE)

The variable series is a J × 101 matrix. To estimate with the whole data set, this would also be the input parameter for the method estimate. In this case, all series would be equally long, i.e. n1 = ... = nJ. Otherwise, it is possible to include a list.

t.list <- lapply(1:(J-1), function(i) t)
t.list[[J]] <- t[1:50]
data.list <- lapply(1:(J-1), function(i) series[i, ])
data.list[[J]] <- series[J, 1:50]
est <- estimate(model, t.list, data.list, 11000)

For the hierarchical models, there is an additional style option, called "int.phi", meaning the credible intervals for the random effects. The point in the middle of the intervals marks the posterior median. In addition, the input parameter par2plot, which is a logical vector, contains TRUE for every parameter to be plotted and FALSE otherwise. For comparison with starting or prior values or, in a simulation study, with the chosen values, phi itself can be included.

plot(est, style = "int.phi", phi = phi, par2plot = c(T, F))
legend("bottomleft", c("true value", "posterior median", "95% credible interval"),
       col = c(2, 1, 1), lty = c(-1, -1, 1), pch = c(20, 20, -1), cex = 0.7)

The result can be seen in Figure 47. Except for one, all credible intervals include the simulated values.


Figure 47: 95% credible intervals for the first component of the random effect in the mixed model, created with plot(..., style = "int.phi").

Now, we want to make a prediction for the further development of the last, i.e. the Jth, series.


pred <- predict(est, t = t[50:101], which.series = "current", ind.pred = J,
                b.fun.mat = function(phi, t, y) phi[,1] * y,
                burnIn = 1000, thinning = 2)

lines(t[50:101], series[J, 50:101], lty = 2)

We can see the result in Figure 48. The solid black line marks the part of the Jth series that is used for estimation. The dotted line is the simulated part that is skipped in the estimation and for which prediction is made. The red lines are the 95% prediction intervals. Output of the method predict is a 5 000 × 52 matrix containing 5 000 samples of the predictive distribution in each of the 52 time points.
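From this matrix, pointwise interval bounds can be computed directly with base R, for example:

# 2.5% and 97.5% quantiles of the 5 000 predictive samples at each of the 52 time points
pred.bounds <- apply(pred, 2, quantile, probs = c(0.025, 0.975))
dim(pred.bounds)  # 2 x 52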


Figure 48: 95% prediction intervals for the last series of the mixed diffusion model.

Jump diffusion

We consider the following process.

Example 3
We will give an example by the process (2.1) with b(ϕ, t, y) = ϕ, s(γ, t, y) = γ, h(η, t, y) = ηy and Λξ(t) = (t/ξ2)^ξ1. It is approximated by

Yi = Yi−1 + ϕ ∆i + γ √∆i ζi + η Yi−1 ∆Ni, i = 1, ..., n,

with Y0 = y0, ζi ∼ N(0, 1) and ∆Ni ∼ Pois(Λξ(ti) − Λξ(ti−1)). We fix y0 = 0.5, ϕ = 0.2, γ = 0.5, η = 0.1, ξ = (2, 0.2) and simulate random trajectories in t ∈ [0, 1].

We translate the example to the language of the package. In the package, the parameter η is called theta; all others are the same as in the text. This leads to


b.fun <- function(phi, t, y) phi
s.fun <- function(gamma2, t, y) sqrt(gamma2)
h.fun <- function(theta, t, y) theta * y
Lambda <- function(t, xi) (t/xi[2])^xi[1]
model <- set.to.class("jumpDiffusion",
                      parameter = list(phi = 0.2, gamma2 = 0.25, theta = 0.1, xi = c(2, 0.2)),
                      b.fun = b.fun, s.fun = s.fun, h.fun = h.fun, Lambda = Lambda)

t <- seq(0, 1, by = 0.01)
series <- simulate(model, seed = 123, t = t, y0 = 0.5, plot.series = FALSE)

In the jump diffusion model, there are two possibilities. In the first case, the Poisson process is not observed. This would be implemented by

est.hidden <- estimate(model, t, series$Y, 11000)
plot(est.hidden, par2plot = c(rep(F, 5), T), reduced = T)
lines(t, series$N, lwd = 2, col = 2)
legend("topleft", c("filtered", "simulated"), col = 1:2, lwd = 1:2)

The method plot has the input parameter par2plot, a logical vector containing TRUE for every parameter to be plotted and FALSE otherwise. The order is (phi, theta, gamma2, xi, N). This leads to Figure 49, which compares the filtered trajectories of the Poisson process with the simulated series in red.


Figure 49: Example 3: Estimation of the latent variable N(n), i.e. the Poisson process vector; in red: the simulated series.

In the case of observed Poisson process variables, both observations Y(n) and N(n) have to be joined in a list with entries Y and N.
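A short sketch of this case, reusing the simulated series from above:

# estimation with observed Poisson process: pass Y and N together in a list
est.obs <- estimate(model, t, list(Y = series$Y, N = series$N), 11000)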

For the prediction of the jump diffusion with the methods presented in Section 2, it can be decided for each of the processes, the latent Poisson process as well as the jump diffusion, whether Algorithm 1 or 2 is desired. We have already seen the input parameter pred.alg, which is here used for the prediction of the jump diffusion. The additional input parameter pred.alg.N denotes the prediction algorithm for the Poisson process and can be chosen between "Distribution" and "Trajectory". In the first case, pointwise sampling from the predictive distribution of the independent differences is conducted. In the second case, Algorithm 2 is run for the event times of the process. Default is "Trajectory", since the combination pred.alg = "Trajectory" and pred.alg.N = "Distribution" does not make sense and should not be chosen accidentally. The parameter Lambda.mat serves, similar to b.fun.mat in the diffusion model, to speed up the sampling algorithm.
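A hedged sketch of such a prediction call; the matrix-wise form of Lambda.mat (one row of xi per posterior sample) is an assumption analogous to b.fun.mat:

pred.jd <- predict(est.hidden, pred.alg = "Trajectory", pred.alg.N = "Trajectory",
                   Lambda.mat = function(t, xi) (t / xi[, 2])^xi[, 1])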


9. Discussion and Outlook

In this thesis, a new Bayesian prediction approach for stochastic processes, general diffusions as well as jump diffusions, based on stochastic differential equations, has been developed. An overview of the considered models is given in Figure 50.

In the case of data following a continuous model, the diffusion model, see Section 3, or the hidden diffusion model, see Section 4, is suitable. The hidden diffusion model has the advantage of two different uncertainty structures: one of the diffusion model itself, where the variance grows over time, and one of the additional error, whose variance stays constant over time. This can be suitable in applications where the crack growth process itself cannot be directly observed and a clear distinction between measuring error and random growth process is desired. On the other side, estimation for this model is time-consuming. A particle filtering algorithm has been applied for the latent diffusion process and combined with the estimation of the parameters in a particle Gibbs sampler. Here, in a first step, predictions are made for the latent diffusion process and, in a second step, these predictive variables are included in the predictive distribution of the observation variables. Both estimation and prediction have longer runtimes for the hidden diffusion model than for the observed diffusion. Hence, if a fast evaluation is necessary, the observed diffusion model is more appropriate, with the disadvantage of a rougher description of the uncertainty structure.

In addition, we have seen both models in their hierarchical versions. If several series are available, stemming from experiments on different individuals, the hierarchical model is preferable to the single series model. But if time plays a role in the evaluation, estimation is much faster for only one series than for the whole hierarchical model with many series. Hence, if only a first, fast look at a model is desired, the single series model is a good choice.

In the case of data containing jumps, one of the non-continuous models is suitable. If the counting process itself is of interest, as it is, for example, for the number of broken tension wires over time, the Poisson process presented in Section 5 is a good choice. In some cases, a deterministic intensity function does not describe the whole uncertainty structure; then a load sharing model can be more appropriate.

For modeling data following a piecewise continuous process containing jumps, like, for example, the concrete crack width process, a jump diffusion or a jump regression model can be suitable, dependent on the underlying uncertainty structure. In the jump diffusion model, the variance of the Brownian motion part can grow over time, proportional to the current process, or both, which makes the model very flexible. In the jump regression model, the process is assumed to follow a deterministic curve, dependent on the NHPP, and the additive error has constant variance over time, whereby an extension with a time-dependent variance function is possible and already implemented in the package BaPreStoPro.

In both jump models, the Poisson process is latent and, in most cases, not observed. For the jump diffusion, a filtering algorithm is already available and implemented in the package. For the jump regression model, this is future work.


[Figure 50 is a tree-structured overview diagram; its text content is as follows.]

Bayesian Prediction
  Continuous Processes (Chap. I)
    Diffusion Process (Sec. 3)
      - Data: continuous, without jumps
      - Model: diffusion model, only one uncertainty structure
      - Prediction: based on the posterior samples from the Gibbs sampler
    Hidden Diffusion Model (Sec. 4)
      - Data: without jumps but with a high measuring error
      - Model: diffusion is only noisily observed, two different uncertainty structures
      - Estimation: particle Gibbs sampler, long runtimes because of the filtering algorithm
      - Prediction: (1.) for the latent diffusion based on the posterior samples, (2.) for the observation variables based on the predictive samples of the diffusion and the posterior samples
    Hierarchical Model
      - Data: experiments of several individuals
      - Model: all series follow the same process but with varying parameter from a mixture (here: normal) distribution
      - Prediction: based on the posterior samples from the (particle) Gibbs sampler and the predictive samples for the random effect
  Non-continuous Processes (Chap. II)
    Poisson Process (Sec. 5)
      - Data: counting process, e.g. number of broken wires over time
      - Model: non-homogeneous, deterministic intensity function
      - Prediction: based on the posterior samples from the Metropolis-Hastings algorithm
    Jump Diffusion Process (Sec. 6)
      - Data: piecewise continuous process with jumps
      - Model: flexible variance structure: time-dependent or autoregressive, latent Poisson process
      - Estimation: filtering of the Poisson process within the Gibbs sampler, if unobserved
      - Prediction: based on the posterior samples from the Gibbs sampler and the predictive samples of the Poisson process
    Jump Regression Model (Sec. 7)
      - Data: follow a deterministic function between jumps, with additive error
      - Model: constant variance over time, time-dependent extension possible, latent Poisson process
      - Estimation: Gibbs sampler based on the observed Poisson process, filtering is future work
      - Prediction: based on the posterior samples from the Gibbs sampler and the predictive samples of the Poisson process
  R Package BaPreStoPro (Sec. 8)
    - for each of the models: one S4 class
    - for each model class: simulation, estimation and prediction methods
    - plot methods for estimation results

Figure 50: Overview of considered models in this thesis.


For prediction, the Markov property is a big advantage of the jump diffusion model. Starting at the last observed value, the prediction uncertainty is small in the beginning and grows over time. This effect is expected heuristically. But, of course, this advantage depends on the application. In experiments with a high measuring error, the jump regression model can be appropriate, because a high prediction uncertainty is justified even close to the last observed value.

Outlook

The presented prediction methods are based on the widely used Euler approximation for the SDE, which leads to a universally applicable approach. Of course, other, more sophisticated approximation schemes might have a smaller error with respect to the true process, whose explicit solution might be unknown. It will be future work to investigate other approximation methods for a generalization of the presented methods. The Euler approximation is thought of as a starting point, which is applicable to all of the models considered in this thesis.

In Section 6, we have seen the jump diffusion process based on the non-homogeneous Poisson process. For the presented crack width data, we assume to observe the underlying counting process based on sound measurements during the experiment. In this thesis, we presented a filtering algorithm for the latent Poisson process. The evaluation of one of the data set series raises the question whether at some event times more than one wire breaks at the same time, which makes a large difference for estimation and even for prediction. A closer look at this problem will be future work. For the crack width data, 10 experiments have currently been conducted, most of them under different conditions. Here, a hierarchical jump diffusion model could be of interest, where covariates like the stress range will be considered.

In both jump models, the jump diffusion as well as the jump regression model, it could be of interest to replace the latent Poisson process with the already mentioned self-exciting process. Here, a filtering algorithm would have to be considered for both models.

An R package has been developed, where all of the presented models are implemented as S4 classes and simulation, estimation and prediction methods are available. By now, all code is written in R. It will be future work to implement parts of it in C to make the functions, especially the filtering algorithms, much faster. Also from the methodological side, extensions are conceivable. For example, by now, estimation is only based on the likelihood of the Euler approximated variables. Fuchs (2013), for example, presents a data augmentation approach which makes it possible to get a smaller Euler approximation error. This would make the estimation part more sophisticated. For comparison, it would also be desirable to give the user the opportunity to include their own transition density, which might stem from an explicit solution of the process. By now, the implementation is based on the drift, variance and jump height functions given in (2.1). In addition, a filtering algorithm for the jump regression model would be a nice extension.


Acknowledgement

This work has been supported by the Collaborative Research Center “Statistical modeling of non-linear dynamic processes” (SFB 823) of the German Research Foundation (DFG) in project B5 “Statistical methods for damage processes under cyclic load”. We want to thank Professor Reinhard Maurer, Guido Heeke and Jens Heinrich for their experiments and for providing us the data. In addition, we want to thank Professor B. M. Hillberry for his experiments and Eric J. Tuegel for providing us the data.

In addition, I want to thank Adeline Samson for her ideas and helpful discussions about the particle Gibbs sampler and the productive work on the package for mixed diffusion models. In particular, I want to thank Fabrizio Ruggeri for the opportunity to work with him on the filtering of the latent Poisson process in the jump diffusion model with application to wear in cylinder liners.

Personal Acknowledgements

First and foremost, I want to thank Christoph Falkenau, without whom I would never have written this thesis. During the five years of my doctorate there were highs and, above all, lows, and it was always Christoph who had an open ear and a few helpful words for me.

I also want to thank my two supervisors, Christine Müller and Katja Ickstadt, for the freedom I had from the very beginning. Even though something different was originally planned thematically, I was able to pursue and realize my own ideas independently. The two research stays, each of which advanced my work considerably, were also immediately and willingly supported by you.

Most of all, I want to thank my husband Eric for his support and backing, especially in the final phase of the doctorate. Thanks to you, there is now so much more in my life than just science.


References

Andrieu, C., A. Doucet, and R. Holenstein (2009). “Particle Markov Chain Monte Carlo for Efficient Numerical Simulation”. Monte Carlo and quasi-Monte Carlo methods 2008. Berlin Heidelberg: Springer, pp. 45–60.

Andrieu, C., A. Doucet, and R. Holenstein (2010). “Particle Markov Chain Monte Carlo Methods”. Journal of the Royal Statistical Society B 72, pp. 269–342.

Barber, D., A. T. Cemgil, and S. Chiappa (2011). Bayesian Time Series Models. New York: Cambridge University Press.

Beskos, A., O. Papaspiliopoulos, G. O. Roberts, and P. Fearnhead (2006). “Exact and Computationally Efficient Likelihood-Based Estimation for Discretely Observed Diffusion Processes”. Journal of the Royal Statistical Society B 68, pp. 333–382.

Blanke, D. and D. Bosq (2015). “Bayesian Prediction for Stochastic Processes: Theory and Applications”. Sankhya: The Indian Journal of Statistics 77, pp. 79–105.

Bruti-Liberati, N. and E. Platen (2007). “Strong Approximations of Stochastic Differential Equations with Jumps”. Journal of Computational and Applied Mathematics 205, pp. 982–1001.

Carlin, B. P. and T. A. Louis (2009). Bayesian Methods for Data Analysis. Boca Raton: Chapman & Hall.

Carvalho, C. M., M. S. Johannes, H. F. Lopes, and N. G. Polson (2010). “Particle Learning and Smoothing”. Statistical Science 25, pp. 88–106.

Chiquet, J., N. Limnios, and M. Eid (2009). “Piecewise Deterministic Markov Processes Applied to Fatigue Crack Growth”. Journal of Statistical Planning and Inference 139, pp. 1657–1667.

Cont, R. and P. Tankov (2004). Financial Modelling with Jump Processes. London: CRC Press.

Crowder, M. J., A. C. Kimber, R. L. Smith, and T. J. Sweeting (1994). Statistical Analysis of Reliability Data. London: Chapman & Hall.

Del Moral, P., A. Doucet, and A. Jasra (2006). “Sequential Monte Carlo Samplers”. Journal of the Royal Statistical Society B 68, pp. 411–436.

Devroye, L. (1986). Non-Uniform Random Variate Generation. New York: Springer.

Dion, C., S. Hermann, and A. Samson (2016a). “Mixedsde: an R Package to Fit Mixed Stochastic Differential Equations”. hal-01305574.

Dion, C., A. Samson, and S. Hermann (2016b). mixedsde: Estimation Methods for Stochastic Differential Mixed Effects Models. R package version 1.0. url: http://CRAN.R-project.org/package=mixedsde.


Donnet, S., J.-L. Foulley, and A. Samson (2010). “Bayesian Analysis of Growth Curves Using Mixture Models Defined by Stochastic Differential Equations”. Biometrics 66, pp. 733–741.

Doucet, A., S. Godsill, and C. Andrieu (2000). “On Sequential Monte Carlo Sampling Methods for Bayesian Filtering”. Statistics and Computing 10, pp. 197–208.

Fuchs, C. (2013). Inference for Diffusion Processes. Berlin Heidelberg: Springer.

Gneiting, T. and A. E. Raftery (2007). “Strictly Proper Scoring Rules, Prediction and Estimation”. Journal of the American Statistical Association 102, pp. 359–378.

Gunkel, C., A. Stepper, A. C. Müller, and Ch. H. Müller (2012). “Micro Crack Detection with Dijkstra’s Shortest Path Algorithm”. Machine Vision and Applications 23, pp. 589–601.

Heeke, G. (2016). “Untersuchungen zur Ermüdungsfestigkeit von Betonstahl und Spannstahl im Zeit- und Dauerfestigkeitsbereich mit sehr hohen Lastwechselzahlen”. PhD thesis. TU Dortmund, Fachbereich Architektur und Bauingenieurwesen.

Heeke, G., S. Hermann, R. Maurer, K. Ickstadt, and Ch. H. Müller (2015). “Stochastic Modeling and Statistical Analysis of Fatigue Tests on Prestressed Concrete Beams under Cyclic Loadings”. SFB 823 Discussion Paper 25/2015.

Hermann, S. (2016a). BaPreStoPro: Bayesian Prediction of Stochastic Processes. R package version 0.1. url: http://CRAN.R-project.org/package=BaPreStoPro.

Hermann, S. (2016b). “Bayesian Prediction for Stochastic Processes based on the Euler Approximation Scheme”. SFB 823 Discussion Paper 27/2016.

Hermann, S. and F. Ruggeri (2016). “Modelling Wear in Cylinder Liners”. Quality and Reliability Engineering International. doi: 10.1002/qre.2061.

Hermann, S., K. Ickstadt, and Ch. H. Müller (2016a). “Bayesian Prediction for a Jump Diffusion Process with Application to Crack Growth in Fatigue Experiments”. Reliability Engineering & System Safety. doi: 10.1016/j.ress.2016.08.012.

Hermann, S., K. Ickstadt, and Ch. H. Müller (2016b). “Bayesian Prediction of Crack Growth Based on a Hierarchical Diffusion Model”. Applied Stochastic Models in Business and Industry 32, pp. 494–510.

Iacus, S. M. (2008). Simulation and Inference for Stochastic Differential Equations. New York: Springer.

Ikeda, N. and S. Watanabe (1989). Stochastic Differential Equations and Diffusion Processes. 2nd ed. Amsterdam, Oxford, New York: North-Holland.

Johannes, M. S., N. G. Polson, and J. R. Stroud (2009). “Optimal Filtering of Jump Diffusions: Extracting Latent States from Asset Prices”. The Review of Financial Studies 22, pp. 2759–2799.


Jokiel-Rokita, A., D. Lazar, and R. Magiera (2014). “Bayesian Prediction in Doubly Stochastic Poisson Process”. Metrika 77, pp. 1023–1039.

Klebaner, F. C. (2005). Stochastic Calculus with Applications. London: Imperial College Press.

Kluge, W. (2005). “Time-inhomogeneous Lévy processes in interest rate and credit risk models”. PhD thesis. University of Freiburg.

Kuo, L. and T. Y. Yang (1996). “Bayesian Computation for Nonhomogeneous Poisson Processes in Software Reliability”. Journal of the American Statistical Association 91, pp. 763–773.

Kvam, P. H. and J.-C. Lu (2007). “Load-Sharing Models”. Encyclopedia of Statistics in Quality and Reliability.

Liu, J. S. and R. Chen (1998). “Sequential Monte Carlo Methods for Dynamic Systems”. Journal of the American Statistical Association 93, pp. 1032–1044.

Maurer, R. and G. Heeke (2010). Ermüdungsfestigkeit von Spannstählen aus einer Älteren Spannbetonbrücke. Tech. rep. 76. Dortmund: Landesbetrieb Straßenbau NRW, TU Dortmund.

Meeker, W. Q. and L. A. Escobar (1998). Statistical Methods for Reliability Data. New York: John Wiley & Sons.

Merton, R. C. (1976). “Option Pricing When Underlying Stock Returns are Discontinuous”. Journal of Financial Economics 3, pp. 125–144.

Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller (1953). “Equation of State Calculations by Fast Computing Machines”. The Journal of Chemical Physics 21, pp. 1087–1092.

Müller, C. H., S. Szugat, and R. Maurer (2016). “Simulation Free Prediction Intervals for a State Dependent Failure Process Using Accelerated Lifetime Experiments”. SFB 823 Discussion Paper 45/2016.

Nakagawa, T. (2007). Shock and Damage Models in Reliability Theory. London: Springer.

Øksendal, B. (2003). Stochastic Differential Equations: An Introduction with Applications. Heidelberg, New York: Springer.

Øksendal, B. and A. Sulem (2005). Applied Stochastic Control of Jump Diffusions. Berlin Heidelberg: Springer.

Phillips, P. C. B. (1973). “The Problem of Identification in Finite Parameter Continuous Time Models”. Journal of Econometrics 1, pp. 351–362.

Pievatolo, A. and F. Ruggeri (2004). “Bayesian Reliability Analysis of Complex Repairable Systems”. Applied Stochastic Models in Business and Industry 20, pp. 253–264.

Platen, E. and N. Bruti-Liberati (2010). Numerical Solution of Stochastic Differential Equations with Jumps in Finance. 1st ed. Berlin Heidelberg: Springer.


Plummer, M., N. Best, K. Cowles, and K. Vines (2006). “CODA: Convergence Diagnosis and Output Analysis for MCMC”. R News 6, pp. 7–11. url: http://CRAN.R-project.org/doc/Rnews/.

Protter, P. E. (2005). Stochastic Integration and Differential Equations. Berlin Heidelberg: Springer.

R Core Team (2015). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. url: https://www.R-project.org/.

Ríos Insua, D., F. Ruggeri, and M. P. Wiper (2012). Bayesian Analysis of Stochastic Process Models. United Kingdom: John Wiley & Sons, Ltd.

Robert, C. P. and G. Casella (2004). Monte Carlo Statistical Methods. New York: Springer.

Robinson, M. E. and M. J. Crowder (2000). “Bayesian Methods for a Growth-Curve Degradation Model with Repeated Measures”. Lifetime Data Analysis 6, pp. 357–374.

Rosenthal, J. S. (2011). “Optimal Proposal Distributions and Adaptive MCMC”. Handbook of Markov Chain Monte Carlo, pp. 93–112.

Shimizu, Y. and N. Yoshida (2006). “Estimation of Parameters for Diffusion Processes with Jumps from Discrete Observations”. Statistical Inference for Stochastic Processes 9, pp. 227–277.

Sobczyk, K. and B. F. Spencer (1992). Random Fatigue: From Data to Theory. London: Academic Press Limited.

Sørensen, H. (2004). “Parametric Inference for Diffusion Processes Observed at Discrete Points in Time: a Survey”. International Statistical Review 72, pp. 337–354.

Szugat, S., J. Heinrich, R. Maurer, and Ch. H. Müller (2016). “Prediction Intervals for the Failure Time of Prestressed Concrete Beams”. Advances in Materials Science and Engineering. doi: 10.1155/2016/9605450.

Vidoni, P. (2004). “Improved Prediction Intervals for Stochastic Process Models”. Journal of Time Series Analysis 25, pp. 137–154.

Virkler, D. A., B. M. Hillberry, and P. K. Goel (1979). “The Statistical Nature of Fatigue Crack Propagation”. Journal of Engineering Materials and Technology 101, pp. 148–153.

Weinberg, J., L. D. Brown, and J. R. Stroud (2007). “Bayesian Forecasting of an Inhomogeneous Poisson Process with Application to Call Center Data”. Journal of the American Statistical Association 102, pp. 1185–1198.

Yu, J.-W., G.-L. Tian, and M.-L. Tang (2007). “Predictive Analyses for Nonhomogeneous Poisson Processes with Power Law Using Bayesian Approach”. Computational Statistics & Data Analysis 51, pp. 4254–4268.


Yuan, T. and Y. Ji (2015). “A Hierarchical Bayesian Degradation Model for Heterogeneous Data”. IEEE Transactions on Reliability 64, pp. 63–70.

Yuan, X.-X., D. Mao, and M. D. Pandey (2009). “A Bayesian Approach to Modeling and Predicting Pitting Flaws in Steam Generator Tubes”. Reliability Engineering and System Safety 94, pp. 1838–1847.


A. Notations

t time in [0, T ]

Yt, t ∈ [0, T ] stochastic process

Wt, t ∈ [0, T ] Brownian motion

Nt, t ∈ [0, T ] Poisson process

b(ϕ, t, y) drift function

s(γ, t, y) variance function

h(η, t, y) jump height function

Λξ(t) cumulative intensity function

λξ(t) intensity function, derivative of Λξ(t)

θ parameter vector, e.g., (ϕ, γ2, η, ξ), resp. dummy for arbitrary parameter

n number of observations

nj , j = 1, ..., J in the hierarchical model: number of observations in series j

J in the hierarchical model: number of series, e.g. test specimen

I number of jumps

y0 = y0(ϕ) starting point (almost surely) of the stochastic process Yt, t ∈ [0, T ]

Y(n) = (Y0, ..., Yn) vector of Euler approximated variables

Y(nj ,j) in the hierarchical model: the jth observation vector

Y(n·,J) in the hierarchical model: all the observations Yij, i = 0, ..., nj, j = 1, ..., J

t(n) = (t0, ..., tn) vector of time points

tij , i = 1, ..., nj ; j = 1, ..., J observation time points in the hierarchical models

∆i = ti − ti−1, i = 1, ..., n observation time distances

∆ij = tij − ti−1,j observation time distances in the hierarchical models, i = 1, ..., nj ; j = 1, ..., J

T1, ..., TI event times of the counting process

q(θ∗|θ) proposal density of MH algorithm

p(θ|Y(n)) posterior density of θ

p(Ym+1|Ym, θ) transition density of Markov process Ym, m = 0, 1, 2, ...

mθ, Vθ, aθ, bθ prior parameters of the normal, resp. the IG distribution of θ

θ∗k, k = 1, ..., K posterior samples

t∗, Y ∗ Y ∗ is the variable to predict at time t∗

τ0 ∈ {t0, tn}, τ1, ..., τM = t∗ partition of the interval [t0, t∗], resp. [tn, t∗]

∆∗ time difference (t∗ − τ0)/M

Y ∗0 ∈ {y0, Yn}, Y ∗1 , ..., Y ∗M prediction variables in τ0, τ1, ..., τM


B. Further Details

Calculation for Section 7

Beginning with the function

w(t) = [(1 − k(t)) · (∆σ(t))² · AP(t)] / [0.72 · π · fctm · EP · √AP(t)]

with

k(t) = kl + (ks − kl) · exp(t · c),
AP(t) = AP · (1 − Nt/35) = AP · h(Nt)⁻¹,
∆σ(t) = σ2 · AP/AP(t) − σ1 = σ2 · 1/(1 − Nt/35) − σ1 = σ2 · h(Nt) − σ1,

with kl, ks, c, AP, σ1, σ2, fctm, EP constants and h(x) = 1/(1 − x/35). We abbreviate ν = 0.72 · π · fctm · EP and get

w(t) = [(1 − k(t)) · (∆σ(t))² · AP(t)] / [ν · √AP(t)]
     = (1/ν) · (1 − k(t)) · (∆σ(t))² · √AP(t)
     = (1/ν) · (1 − k(t)) · (σ2 · h(Nt) − σ1)² · √(AP · h(Nt)⁻¹)
     = [√AP / (ν · √h(Nt))] · (1 − k(t)) · (σ2 · h(Nt) − σ1)²
     = [√AP / (ν · √h(Nt))] · (1 − kl − (ks − kl) · exp(t · c)) · σ2² · (h(Nt) − σ1/σ2)²
     = [√AP · σ2² / ν] · ((1 − kl) − (ks − kl) · exp(t · c)) · (1/√h(Nt)) · (h(Nt) − σ1/σ2)²
     = (ϕ1 − ϕ2 · exp(−t · ϕ3)) · (1/√h(Nt)) · (h(Nt) − ϕ4)²

with

• ϕ1 = (1 − kl) · √AP · σ2² / (0.72 · π · fctm · EP),
• ϕ2 = (ks − kl) · √AP · σ2² / (0.72 · π · fctm · EP),
• ϕ3 = −c,
• ϕ4 = σ1/σ2.
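In the notation of the jump regression model, this reparametrized curve can be coded, for instance, as the fun argument of set.to.class (a sketch; the argument order of fun is an assumption):

h <- function(x) 1 / (1 - x / 35)
w <- function(phi, t, N) (phi[1] - phi[2] * exp(-t * phi[3])) / sqrt(h(N)) * (h(N) - phi[4])^2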


Conditional SMC

We here recall the SMC algorithm from Section 4 and add the changes for the conditional SMC (CSMC). Remember, we draw j ∼ p_{1:L}(·|W_n) and set Y(n) = (Y_0^j, ..., Y_n^j) with the weights and the particles from the SMC before.

We introduce the notion of the ancestral lineage B_(n)^l := (B_0^l, ..., B_n^l), l = 1, ..., L. It is recursively defined by B_n^l = l and B_i^l := A_i^{B_{i+1}^l}, i = n−1, ..., 0. In Figure 51, we see an illustration of the SMC with L = 5 particles and n = 3 iteration steps.

A_1 = (A_1^1, ..., A_1^5) = (1, 1, 3, 3, 4)
A_2 = (A_2^1, ..., A_2^5) = (2, 2, 4, 3, 4)

Figure 51: Illustration of SMC; B_(3)^1 = (1, 2, 1), B_(3)^2 = (1, 2, 2), B_(3)^3 = (3, 4, 3), B_(3)^4 = (3, 3, 4), B_(3)^5 = (3, 4, 5).

We denote with M(L − 1, (p_1, ..., p_L)) the multinomial distribution with parameter L − 1 and probability vector (p_1, ..., p_L). For the CSMC in the next iteration of the Gibbs sampler, set Y_0^{B_0^j} = Y_0 and replace (i) and (iii) in the description of the SMC by

(ia) – Sample offspring variables O_{i−1} = (O_{i−1}^1, ..., O_{i−1}^L) ∼ M(L − 1, W_{i−1}) and set o := O_{i−1,B_{i−1}^j} + 1, with O_{i−1,B_{i−1}^j} the B_{i−1}^j-th component,
     – sample h_1, ..., h_o ∼ U{1, ..., B_{i−1}^j − 1, B_{i−1}^j + 1, ..., L},
     – set A_{i−1}^l = B_{i−1}^j for l ∈ {B_i^j, h_1, ..., h_o} and
       A_{i−1}^l ∼ p_{1,...,B_i^j−1,B_i^j+1,...,L}(·|W_{i−1}^{−B_i^j}) for l ∈ {1, ..., L} \ {B_i^j, h_1, ..., h_o},
       whereby W_{i−1}^{−B_i^j} = (W_{i−1}^1, ..., W_{i−1}^{B_i^j−1}, W_{i−1}^{B_i^j+1}, ..., W_{i−1}^L).

(iiia) Set Y_i^{B_i^j} = Y_i and draw Y_i^l ∼ p(Y_i | Y_{i−1}^l, Z_i, θ), l = 1, ..., B_i^j − 1, B_i^j + 1, ..., L.

See Andrieu et al. (2009), and note the errata at the end.


Statutory Declaration

I hereby declare that I have written this dissertation independently and have not used any aids other than those stated. The dissertation has not been submitted to any other faculty. I declare that I have not previously terminated a doctoral procedure without success and that no revocation of a previously acquired doctoral degree exists.

Simone Hermann
