The Shape of Bayes Tests of Power One

Offprint (Sonderdruck) from the Albert-Ludwigs-Universität Freiburg. HANS RUDOLF LERCHE, The Shape of Bayes Tests of Power One. Originally published in: The Annals of Statistics 14 (1986), pp. 1030-1048.


The Annals of Statistics, 1986, Vol. 14, No. 3, 1030-1048

THE SHAPE OF BAYES TESTS OF POWER ONE¹

BY HANS RUDOLF LERCHE

University of Heidelberg

The problem of determining Bayes tests of power one (without an indifference zone) is considered for Brownian motion with unknown drift. When we let the unit sampling cost depend on the underlying parameter in a natural way, it turns out that a simple Bayes rule is approximately optimal. Such a rule stops sampling when the posterior probability of the hypothesis is too small.

1. Introduction. Let $W(t)$ denote Brownian motion with unknown drift $\theta \in \mathbb{R}$ and $P_\theta$ the associated measure. We consider the following sequential decision problem. Let $F$ be the prior on $\mathbb{R}$ given by $F = \gamma\delta_0 + (1-\gamma)\varphi(\sqrt{r}\,\theta)\sqrt{r}\,d\theta$ with $0 < \gamma < 1$ and $\varphi(x) = (1/\sqrt{2\pi})e^{-x^2/2}$, consisting of a point mass at $\{\theta = 0\}$ and a smooth normal part on $\{\theta \neq 0\}$. Let the sampling cost be $c\theta^2$, with $c > 0$, for the observation of $W$ per unit time when the underlying measure is $P_\theta$. We assume also a loss function which is equal to 1 if $\theta = 0$ and we decide in favor of "$\theta \neq 0$," and which is identically 0 if $\theta \neq 0$. A statistical test consists of a stopping time $T$ of Brownian motion, where stopping means a decision in favor of "$\theta \neq 0$."

The Bayes risk for this problem is then given by

(1.1)  $\rho(T) = \gamma P_0(T < \infty) + (1-\gamma)c\int_{-\infty}^{\infty}\theta^2 E_\theta T\,\varphi(\sqrt{r}\,\theta)\sqrt{r}\,d\theta.$

In this paper we investigate the "optimal" stopping rule $T_c^*$ which minimizes $\rho(T)$.

For the cost $c$ sufficiently small, $T_c^*$ is a test of power one for the decision problem $H_0: \theta = 0$ versus $H_1: \theta \neq 0$. This is by definition a stopping time $T$ which satisfies the conditions

(1.2)  $P_0(T < \infty) < 1,$

(1.3)  $P_\theta(T < \infty) = 1 \quad\text{if } \theta \neq 0.$

Here stopping also means a decision in favor of "$\theta \neq 0$." For a discussion of tests of power one see Robbins (1970). A similar problem has been studied by Pollak (1978), who assumed an indifference zone in the parameter space. The type of prior assumed here was once proposed by Jeffreys (1948).

A basic idea of this paper is to let the sampling cost depend on the underlying parameter in a natural way. At first view the cost term "$c\theta^2$" has an unusual structure.

Received May 1984; revised October 1985.
¹This work was supported by the Deutsche Forschungsgemeinschaft and by the National Science Foundation at MSRI, Berkeley, under NSF Grant MCS81-20790.
AMS 1980 subject classifications. Primary 62L15; secondary 62C10.
Key words and phrases. Sequential Bayes tests for composite hypotheses, tests of power one, simple Bayes rules.

The factor $\theta^2/2$ is the Kullback-Leibler information number $E_\theta\log(dP_{\theta,1}/dP_{0,1})$, which quantifies the separability of the measures $P_\theta$ and $P_0$. Its meaning becomes apparent by the following consideration. Let us consider two testing problems with simple hypotheses:

(1) $H_0: \theta = 0$ versus $H_1: \theta = \theta_1$,
(2) $H_0: \theta = 0$ versus $H_1: \theta = \theta_2$

with $\theta_i > 0$, $i = 1,2$. Let $t_i$, $i = 1,2$, denote the sampling lengths. Then the level-$\alpha$ Neyman-Pearson tests for both problems have the same error probabilities if and only if $\theta_1^2 t_1 = \theta_2^2 t_2$. [This follows from the power function of a Neyman-Pearson test of level $\alpha$: $\Phi(-c_\alpha + \theta\sqrt{t})$.] Thus the factor $\theta^2$ standardizes the sampling lengths in such a way that the embedded simple testing problems are of equal difficulty. Besides this statistical aspect there is a basic mathematical reason for this choice of the sampling costs. Since in our decision problem (1.1) an indifference zone does not occur, and since $E_0 T = \infty$ [as $P_0(T < \infty) < 1$], we have $\lim_{\theta\to 0} E_\theta T = \infty$. More information about the singularity is provided by a lemma of Darling and Robbins (1967) [see also Robbins and Siegmund (1973) and Wald (1947), page 197]. It states that for every stopping time $T$ with $P_0(T < \infty) < 1$

(1.4)  $E_\theta T \ge 2b/\theta^2, \quad\text{where } b = -\log P_0(T < \infty).$

Equality in (1.4) holds for the special stopping rule

(1.5)  $T_\theta = \inf\Big\{t > 0 \,\Big|\, \frac{dP_{\theta,t}}{dP_{0,t}} \ge e^b\Big\}.$

Here $dP_{\theta,t}/dP_{0,t}$ denotes the likelihood ratio (Radon-Nikodym derivative) of $P_\theta$ with respect to $P_0$ given the path $W(u)$, $0 \le u \le t$. It is given by

$\frac{dP_{\theta,t}}{dP_{0,t}} = \exp\big(\theta W(t) - \tfrac12\theta^2 t\big).$
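As a concrete illustration of (1.4) and (1.5) (an editorial sketch, not part of the original paper; the drift, the level $b$, the step size, and the horizon are arbitrary choices), one can simulate the rule (1.5) on a discretized Brownian path. Up to discretization error, the mean stopping time under $P_\theta$ matches the bound $2b/\theta^2$, and the stopping probability under $P_0$ is close to $e^{-b}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sprt_times(theta_true, theta, b, n_paths=2000, dt=0.01, t_max=100.0):
    """Stopping times of rule (1.5): stop when theta*W(t) - theta^2*t/2 >= b,
    along n_paths discretized Brownian paths with drift theta_true."""
    n_steps = int(t_max / dt)
    w = np.zeros(n_paths)
    tau = np.full(n_paths, np.inf)
    for i in range(1, n_steps + 1):
        w += theta_true * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
        hit = (theta * w - 0.5 * theta**2 * (i * dt) >= b) & ~np.isfinite(tau)
        tau[hit] = i * dt
    return tau

theta, b = 1.0, 2.0
tau = sprt_times(theta, theta, b)
print("E_theta T ~", tau[np.isfinite(tau)].mean(), "; bound 2b/theta^2 =", 2 * b / theta**2)
tau0 = sprt_times(0.0, theta, b)
print("P_0(T < inf) ~", np.isfinite(tau0).mean(), "; e^(-b) =", np.exp(-b))
```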

According to (1.4) the expected sample size $E_\theta T$ of a test of power one, considered as a function of $\theta$, has a pole at $\theta = 0$. The choice of "$c$" or "$c|\theta|$" instead of "$c\theta^2$" would imply that tests of power one have an infinite Bayes risk, since

$\int |\theta|^i E_\theta T\,\varphi(\sqrt{r}\,\theta)\sqrt{r}\,d\theta = \infty \quad\text{for } i = 0,1.$

A precise description of the pole of $E_\theta T$ is given by Robbins and Siegmund (1973) and Jennen and Lerche (1982). The sampling costs "$c\theta^2$" remove the nonintegrability of the singularity of $E_\theta T$ for a large class of tests of power one, although $\lim_{\theta\to 0}\theta^2 E_\theta T = \infty$ still holds [by the corollary on page 102 of Robbins and Siegmund (1973)]. For instance, for all tests of power one defined by

$T = \inf\{t > 0 \mid |W(t)| \ge \psi(t)\},$

where the function $\psi(t)$ is concave and $\psi(t) = o(t^{2/3-\varepsilon})$ when $t\to\infty$ (with $\varepsilon > 0$ arbitrarily small), the Bayes risk (1.1) is finite. This follows from the inequality $|\theta| E_\theta T \le \psi(E_\theta T)$, which is a consequence of Wald's lemma and Jensen's inequality. Therefore, by the choice of the sampling costs as "$c\theta^2$," the concept of Bayes tests of power one becomes an interesting topic to study.
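Numerically, the role of the factor $\theta^2$ is easy to see. The sketch below (editorial; it plugs the order of magnitude $E_\theta T \approx 2b/\theta^2$ from (1.4) into the mixing integral, with arbitrary $b$, $r$, and grid) shows the $i = 2$ integral stabilizing while the $i = 0, 1$ variants blow up as the cut-off near $\theta = 0$ shrinks.

```python
import numpy as np

r, b = 1.0, 2.0
phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

for eps in (1e-2, 1e-4, 1e-6):                     # cut-off of the singularity at 0
    theta, dth = np.linspace(eps, 10.0, 2_000_000, retstep=True)
    weight = phi(np.sqrt(r) * theta) * np.sqrt(r)  # prior density on (0, inf)
    ET = 2 * b / theta**2                          # heuristic order from (1.4)
    for i in (0, 1, 2):
        val = np.sum(theta**i * ET * weight) * dth
        print(f"cut-off {eps:.0e}, i={i}: {val:.3e}")
```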

The related problem for simple hypotheses can be solved easily. The Bayes risk, given by

(1.6)  $\rho(T) = \gamma P_0(T < \infty) + (1-\gamma)c\,\theta^2 E_\theta T,$

using statement (1.4), is minimized by the stopping rule

(1.7) Tc* = inf { t > 01W(t) ^ log a/0 +-liet}

with a = y(2(1 — y)c)' provided a> 1. In this case the minimal Bayes risk isgiven by

(1.8) p(T,*) = 2(1 — y)c[log a + 1].

When $a \le 1$, $T_c^* = 0$, and $\rho(T_c^*) = \gamma$. (For more details see the end of the proof of Theorem 2.) Here the choice of the sampling costs leads to a solution not depending on $\theta$. This becomes obvious when one expresses $T_c^*$ in another way. It can be rewritten as

$T_c^* = \inf\Big\{t > 0 \,\Big|\, \gamma(W(t),t) \le \frac{2c}{1+2c}\Big\},$

where

$\gamma(x,t) = \frac{\gamma}{\gamma + (1-\gamma)\dfrac{dP_{\theta,t}}{dP_{0,t}}(x)}$

denotes the posterior mass of the parameter "0" at $(x,t)$ with respect to the prior $F = \gamma\delta_0 + (1-\gamma)\delta_\theta$. Thus $T_c^*$ has the intuitive meaning "stop when the posterior mass of the hypothesis "0" is too small." This is a simple Bayes rule or, equivalently, the one-sided sequential probability ratio test (1.5).

The following study shows that a simple Bayes rule, which stops when the posterior probability of the hypothesis "$\theta = 0$" is too small, is approximately optimal for the risk (1.1). For precise statements see Theorems 2 and 3 and the corollaries. This simple Bayes rule is of the type (1.5) with a boundary equal to

$\psi(t) = \Big((t+r)\Big(\log\frac{t+r}{r} + 2\log b\Big)\Big)^{1/2}, \quad\text{where } b = \frac{\gamma}{2(1-\gamma)c}.$

For large $t$ this boundary asymptotically grows like $(t\log t)^{1/2}$, which is faster than the limiting growth rate $(2t\log\log t)^{1/2}$ of the law of the iterated logarithm. As a consequence of our results, the minimal Bayes risk can be approximated by that of simple Bayes rules within $o(c)$ when $c\to 0$ (Theorem 4). Simple Bayes rules for the same type of prior which we use (Jeffreys' priors) were already discussed by Cornfield (1966).
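The growth comparison with the law of the iterated logarithm can be tabulated directly (editorial sketch; $\gamma$, $c$, $r$ are arbitrary values):

```python
import numpy as np

gamma, c, r = 0.5, 1e-3, 1.0
b = gamma / (2 * (1 - gamma) * c)

def psi(t):
    """Simple Bayes boundary ((t+r)(log((t+r)/r) + 2 log b))^(1/2)."""
    return np.sqrt((t + r) * (np.log((t + r) / r) + 2 * np.log(b)))

for t in (1e2, 1e4, 1e6, 1e8):
    lil = np.sqrt(2 * t * np.log(np.log(t)))   # LIL envelope (2t log log t)^(1/2)
    print(f"t={t:.0e}: psi={psi(t):10.1f}  LIL={lil:10.1f}  ratio={psi(t)/lil:.2f}")
```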

Similar results hold for exponential families with general priors, although one has to make a careful analysis of the overshoot effect, following the ideas of Lorden (1977), to derive an $o(c)$-approximation for the minimal Bayes risk. These results will be published elsewhere. The proofs for the case of exponential families are more technical, since special approximation arguments are needed. The nice feature of the Brownian motion case is that most expressions can be calculated exactly. Thus no approximations are needed and the proofs become simple.

This paper is organized as follows: Theorem 1 states the existence of an optimal (Bayes) stopping rule $T_c^*$. Theorem 2 gives upper and lower bounds for $T_c^*$, which make it possible to derive its asymptotic shape when $c\to 0$ or $t\to\infty$. Theorem 3 refines these bounds, which yields the above-mentioned $o(c)$-approximation of the minimal Bayes risk. Theorem 5 treats the one-sided case.

The results have some meaning for sequential clinical trials. These aspects are discussed in more detail in a subsequent paper. Historical facts are mentioned in Lerche (1985). A further result connected with the costs $c\theta^2$ is the exact Bayes property of the repeated significance test. For that see Lerche (1985, 1986).

2. Preliminaries. We need the following notations. The Brownian motion $W$ with drift $\theta$ starting at time $t$ in point $x$ is understood as a measure $P_\theta^{(x,t)}$ on the space $C[t,\infty)$ of continuous functions on $[t,\infty)$. $\mathscr F_{s,t}$ denotes the $\sigma$-algebra on $C[t,\infty)$ which is generated by $W(u)$, $t \le u \le s$. The restriction of the measure $P_\theta^{(x,t)}$ to $\mathscr F_{s,t}$ is denoted by $P_{\theta,s}^{(x,t)}$. This notation is also used for stopping times $S$ instead of fixed times $s$. When the process starts at 0 at time 0, then we very often skip the superindex and write just $\mathscr F_s$, $P_{\theta,s}$. The Borel $\sigma$-algebra on the parameter space $\mathbb R$ is denoted by $\mathscr B$. For $F = \gamma\delta_0 + (1-\gamma)\varphi(\sqrt{r}\,\theta)\sqrt{r}\,d\theta$, let $\bar P$ be the measure given by $\bar P(dW,d\theta) = P_\theta(dW)F(d\theta)$, and let $P = \int P_\theta F(d\theta)$ denote its projection onto the path space. Let $F_{x,t}$ denote the posterior distribution given that the process $W(t) = x$. This means that for $A \in \mathscr F_t$ and $B \in \mathscr B$, $\int_A F_{W(t),t}(B)\,P(dW) = \bar P(A\times B)$ holds. Thus the Bayes risk (1.1) can be rewritten as

(2.1)  $\rho(T) = \int_{\{T<\infty\}}\Big(F_{W(T),T}(\{0\}) + cT\int_{-\infty}^{\infty}\theta^2 F_{W(T),T}(d\theta)\Big)\,dP.$

Let $P^{(x,t)}$ denote the conditional distribution of the process under $P$ given $W(t) = x$. It can be represented as $P^{(x,t)} = \int P_\theta^{(x,t)}F_{x,t}(d\theta)$.

We define the posterior risk at the space-time point $(x,t)$ for a stopping rule $T \ge t$ as

(2.2)  $\rho(x,t,T) = \int_{\{T<\infty\}}\Big(F_{W(T),T}(\{0\}) + c(T-t)\int_{-\infty}^{\infty}\theta^2 F_{W(T),T}(d\theta)\Big)\,dP^{(x,t)}.$

The minimal posterior risk at $(x,t)$ is defined as

(2.3)  $\rho(x,t) = \inf_T\,\rho(x,t,T),$

where the infimum is taken over all stopping times of the process $(W(s),s)$ starting at $(x,t)$, including $T_t \equiv t$. For $T_t$ the risk is given by

(2.4)  $\gamma(x,t) = \rho(x,t,T_t) = F_{x,t}(\{0\})$

and therefore the inequality $\rho(x,t) \le \gamma(x,t)$ holds. The quantity $\rho(x,t,T) + ct\int\theta^2 F_{x,t}(d\theta)$ represents the loss when the process runs without stopping up to $(x,t)$ and is stopped at $T \ge t$.

The following theorem states that an optimal (Bayes) stopping rule exists which minimizes (2.1), and characterizes it. Let $\mathscr W^*(c) = \{(y,s) \mid \rho(y,s) < \gamma(y,s)\}$ and

(2.5)  $T_c^* = \inf\{s \mid (W(s),s) \notin \mathscr W^*(c)\}.$

THEOREM 1. The stopping rule $T_c^*\,(\ge t)$ of the space-time process $(W(t),t)$ minimizes the risk (2.2) for all starting points $(x,t)$.

This type of result is well known. Its statement is usually called the principle of dynamic programming. The result follows from the theory of optimal stopping for Markov processes [cf. Shiryayev (1978), page 127] applied to the space-time process $(W(t),t)$. We note that $W(t)$ under the measure $P$ is a diffusion process which satisfies the stochastic differential equation $dW(t) = (1-\gamma(W(t),t))W(t)/(t+r)\,dt + dX(t)$, where $X(t)$ is a standard Brownian motion [cf. Liptser and Shiryayev (1977), page 258].
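The diffusion under $P$ can be simulated once $\gamma(x,t)$ is available in closed form [this is formula (2.6)-(2.7) just below]. A minimal Euler-Maruyama sketch (editorial; step size, horizon, and parameter values are arbitrary):

```python
import numpy as np

gamma0, r = 0.5, 1.0
rng = np.random.default_rng(1)

def gamma_post(x, t):
    """Posterior mass of {theta = 0}; closed form (2.6)-(2.7)."""
    g = np.sqrt(r / (t + r)) * np.exp(x**2 / (2 * (t + r)))
    return gamma0 / (gamma0 + (1 - gamma0) * g)

# Euler-Maruyama for dW = (1 - gamma(W,t)) W/(t+r) dt + dX under the mixture
dt, n_steps = 1e-3, 50_000
w = 0.0
for i in range(n_steps):
    t = i * dt
    drift = (1 - gamma_post(w, t)) * w / (t + r)   # posterior mean of theta
    w += drift * dt + np.sqrt(dt) * rng.standard_normal()
print("W(50) =", w, "; gamma(W(50), 50) =", gamma_post(w, 50.0))
```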

The stopping risk can be calculated by (2.4) as

(2.6)  $\gamma(x,t) = \frac{\gamma}{\gamma + (1-\gamma)g(x,t)}$

with

(2.7)  $g(x,t) = \int\frac{dP_{\theta,t}}{dP_{0,t}}(x)\,\varphi(\sqrt{r}\,\theta)\sqrt{r}\,d\theta = \int\exp\big(\theta x - \tfrac12\theta^2 t\big)\varphi(\sqrt{r}\,\theta)\sqrt{r}\,d\theta = \sqrt{\frac{r}{t+r}}\,\exp\Big(\frac{x^2}{2(t+r)}\Big).$

We note that on $\{\theta \neq 0\}$

(2.8)  $F_{x,t}(d\theta) = (1-\gamma(x,t))\,G_{x,t}(d\theta), \quad\text{where } G_{x,t} = N\Big(\frac{x}{t+r},\frac{1}{t+r}\Big),$

holds. Here $N(\mu,\sigma^2)$ denotes the normal distribution with mean $\mu$ and variance $\sigma^2$.
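The Gaussian integral in (2.7) is quickly verified by quadrature (editorial check; the point $(x,t)$ and the grid are arbitrary):

```python
import numpy as np

r, x, t = 1.0, 1.3, 4.0
theta, dth = np.linspace(-30.0, 30.0, 400_001, retstep=True)
integrand = np.exp(theta * x - 0.5 * theta**2 * t) \
            * np.sqrt(r / (2 * np.pi)) * np.exp(-r * theta**2 / 2)
numeric = np.sum(integrand) * dth
closed = np.sqrt(r / (t + r)) * np.exp(x**2 / (2 * (t + r)))
print(numeric, closed)   # agree to quadrature accuracy
```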

The exact calculation of the minimal posterior risk $\rho(x,t)$ seems to be impossible for this problem. We can only derive upper and lower bounds for it. To get those we will rewrite the posterior risk in an appropriate form.


LEMMA 1.

(2.9)  $\rho(x,t,T) = \gamma(x,t)P_0^{(x,t)}(t \le T < \infty) + (1-\gamma(x,t))\,c\int\theta^2 E_\theta^{(x,t)}(T-t)\,G_{x,t}(d\theta).$

The posterior risk has the same form as the Bayes risk (1.1), with the slight difference that the process starts in the space-time point $(x,t)$, stops at $T \ge t$, and has as prior $F_{x,t} = \gamma(x,t)\delta_0 + (1-\gamma(x,t))G_{x,t}$, the posterior at the point $(x,t)$.

The proof of the lemma is a direct consequence of the preceding definitions and of the following basic fact about posterior distributions: the posterior of Brownian motion starting at $(x,t)$ with prior $F_{x,t}$ at the point $(W(S),S)$ is given by $F_{W(S),S}$.

3. Results for two-sided tests. The continuation region $\mathscr W^*(c)$ of the optimal stopping rule for the Bayes risk (1.1) is now approximated by upper and lower bounding regions of the space-time plane. These bounds are refined in Theorem 3. The bounding regions are given by sets of the type $\mathscr W(\lambda) := \{(x,t) \mid \gamma(x,t) > \lambda\}$.

THEOREM 2. There exists a constant $M > 2$ such that for every $c > 0$

(3.1)  $\mathscr W\Big(\frac{Mc}{1+Mc}\Big) \subset \mathscr W^*(c) \subset \mathscr W\Big(\frac{2c}{1+2c}\Big)$

holds.

REMARK 1. Let $T_\lambda = \inf\{t > 0 \mid (W(t),t) \notin \mathscr W(\lambda)\}$. Then (3.1) translates to

$T_{Mc/(1+Mc)} \le T_c^* \le T_{2c/(1+2c)}.$

REMARK 2. The theorem holds also for the more general prior

$F = \gamma\delta_0 + (1-\gamma)\varphi\big(\sqrt{r}\,(\theta-\mu)\big)\sqrt{r}\,d\theta$

by exactly the same arguments.
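Since $\gamma(x,t)$ is explicit, each region $\mathscr W(\lambda)$ has an explicit boundary in the $(x,t)$-plane: by (2.6)-(2.7), $\gamma(x,t) = \lambda$ is equivalent to $x^2 = (t+r)\big[\log((t+r)/r) + 2\log\big(\gamma(1-\lambda)/((1-\gamma)\lambda)\big)\big]$. The sketch below (editorial; $M = 10$ is a placeholder for the unspecified constant of the theorem) prints the resulting sandwich of stopping boundaries:

```python
import numpy as np

gamma, c, r, M = 0.5, 1e-3, 1.0, 10.0   # M = 10 is a placeholder value

def x_boundary(lam, t):
    """Positive root of gamma(x,t) = lam, i.e. the boundary of W(lam)."""
    a = gamma * (1 - lam) / ((1 - gamma) * lam)
    return np.sqrt((t + r) * (np.log((t + r) / r) + 2 * np.log(a)))

for t in (1.0, 10.0, 100.0):
    inner = x_boundary(M * c / (1 + M * c), t)   # stop earlier (lower bound region)
    outer = x_boundary(2 * c / (1 + 2 * c), t)   # stop later (upper bound region)
    print(f"t={t:6.1f}: {inner:6.2f} <= optimal boundary <= {outer:6.2f}")
```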

PROOF. At first we prove the lower inclusion of (3.1), which is the more difficult part. We show that for all points $(x,t) \in \mathscr W(Mc/(1+Mc))$ ($M$ will be specified during the proof) there exist stopping times $S_{(x,t)}$ of the process $(W(s),s)$ starting at $(x,t)$, such that

(3.2)  $\rho(x,t,S_{(x,t)}) < \gamma(x,t)$

holds. Since by definition $\rho(x,t) \le \rho(x,t,S_{(x,t)})$, it follows from (3.2) and Theorem 1 that $(x,t) \in \mathscr W^*(c)$. We choose the stopping times as

$S_{(x,t)} = \inf\{s > t \mid \gamma(W(s),s) \le Qc\},$


where the constant $Q > 1$ will be defined below in such a way that $Qc < 1$. [In fact the stopping times $S_{(x,t)}$ all arise from the same stopping time $T_{Qc}$, by changing the starting point of the process.] We need several representations of $S_{(x,t)}$ during the proof:

(3.3)  $S_{(x,t)} = \inf\{s > t \mid F_{W(s),s}(\{0\}) \le Qc\}$
$\qquad = \inf\Big\{s > t \,\Big|\, \int\frac{dP_{\theta,s}^{(x,t)}}{dP_{0,s}^{(x,t)}}\,G_{x,t}(d\theta) \ge b(x,t)\Big\}$
$\qquad = \inf\Big\{s > t \,\Big|\, \sqrt{\frac{t+r}{s+r}}\,\exp\Big(\frac12\Big(\frac{W(s)^2}{s+r} - \frac{x^2}{t+r}\Big)\Big) \ge b(x,t)\Big\},$

where

$b(x,t) = \frac{\gamma(x,t)(1-Qc)}{(1-\gamma(x,t))Qc}.$

The first equality holds by definition. The second equality follows by the calculation

$F_{W(s),s}(\{0\}) = \frac{\gamma(x,t)}{\gamma(x,t) + (1-\gamma(x,t))h(x,t,w,s)}$

with

$h(x,t,w,s) = \int\frac{dP_{\theta,s}^{(x,t)}}{dP_{0,s}^{(x,t)}}\,G_{x,t}(d\theta).$

The third equality follows by the following calculation [note here that $G_{x,t} = N(x/(t+r),\,1/(t+r))$]:

$\int\frac{dP_{\theta,s}^{(x,t)}}{dP_{0,s}^{(x,t)}}\,G_{x,t}(d\theta) = \int\exp\big(\theta(W(s)-x) - \tfrac12\theta^2(s-t)\big)\sqrt{\frac{t+r}{2\pi}}\,\exp\Big(-\frac{t+r}{2}\Big(\theta - \frac{x}{t+r}\Big)^2\Big)\,d\theta$
$\qquad = \exp\Big(-\frac{x^2}{2(t+r)}\Big)\int\exp\big(\theta W(s) - \tfrac12\theta^2(s+r)\big)\sqrt{\frac{t+r}{2\pi}}\,d\theta$
$\qquad = \sqrt{\frac{t+r}{s+r}}\,\exp\Big(\frac12\Big(\frac{W(s)^2}{s+r} - \frac{x^2}{t+r}\Big)\Big).$

We start now to estimate the posterior risk for $S_{(x,t)}$, which is given according to Lemma 1 by

(3.4)  $\rho(x,t,S_{(x,t)}) = \gamma(x,t)P_0^{(x,t)}(t < S_{(x,t)} < \infty) + (1-\gamma(x,t))\,c\int\theta^2(S_{(x,t)}-t)\,dQ^{(x,t)}$

with the new notation $Q^{(x,t)}(dW,d\theta) = P_\theta^{(x,t)}(dW)\,G_{x,t}(d\theta)$. We will also use $\bar Q^{(x,t)}(dW) = \int P_\theta^{(x,t)}(dW)\,G_{x,t}(d\theta)$ and will from now on simply write $S$ instead of $S_{(x,t)}$.

Then we get for the first term

(3.5)  $P_0^{(x,t)}(t < S < \infty) = b(x,t)^{-1} = \frac{Qc}{1-Qc}\cdot\frac{1-\gamma(x,t)}{\gamma(x,t)}.$

This follows from a well-known martingale argument [see Robbins and Siegmund (1970), Lemma 1] by using (3.3):

$P_0^{(x,t)}(t < S < s_0) = \int_{\{t<S<s_0\}}\frac{dP_{0,S}^{(x,t)}}{d\bar Q_S^{(x,t)}}\,d\bar Q^{(x,t)} = b(x,t)^{-1}\,\bar Q^{(x,t)}(t < S < s_0).$

Since $\bar Q^{(x,t)}(t < S < s_0) \uparrow 1$ as $s_0 \to \infty$, (3.5) follows. We note that for $(x,t) \in \mathscr W(Qc)$, $b(x,t) > 1$ holds and that thus the probability in (3.5) is less than one.

To estimate the second term of (3.4) we rewrite the integral. Since on $\{S < \infty\}$ we have

$Q^{(x,t)}(dW,d\theta) = N\Big(\frac{W(S)}{S+r},\frac{1}{S+r}\Big)(d\theta)\,\bar Q^{(x,t)}(dW),$

we get for the integral

(3.6)  $\int\theta^2(S-t)\,dQ^{(x,t)} = \int\theta^2(S+r)\,dQ^{(x,t)} - \int\theta^2(t+r)\,dQ^{(x,t)}$
$\qquad = \int\Big(\int\theta^2(S+r)\,N\Big(\frac{W(S)}{S+r},\frac{1}{S+r}\Big)(d\theta)\Big)\,d\bar Q^{(x,t)} - \int\theta^2(t+r)\,G_{x,t}(d\theta)$
$\qquad = \int\Big(\frac{W(S)^2}{S+r} - \frac{x^2}{t+r}\Big)\,d\bar Q^{(x,t)}.$

Using now the third form in (3.3) of the stopping rule $S$ yields

(3.7)  $\int\Big(\frac{W(S)^2}{S+r} - \frac{x^2}{t+r}\Big)\,d\bar Q^{(x,t)} = 2\log b(x,t) + \int\log\Big(\frac{S+r}{t+r}\Big)\,d\bar Q^{(x,t)}.$


Let $\alpha > 3$. We show now that there exists a constant $0 < C_\alpha < \infty$ with

(3.8)  $\int\log\Big(\frac{S+r}{t+r}\Big)\,d\bar Q^{(x,t)} \le C_\alpha\Big[\int\theta^2(S-t)\,dQ^{(x,t)}\Big]^{1/\alpha}.$

Then (3.6), (3.7), and (3.8) yield

(3.9)  $\int\theta^2(S-t)\,dQ^{(x,t)} \le 2\log b(x,t) + C_\alpha\Big[\int\theta^2(S-t)\,dQ^{(x,t)}\Big]^{1/\alpha},$

from which one can derive (3.2), as will be explained below. The proof of (3.8) runs as follows. By using the inequality $\log(1+x) \le K_\alpha x^{1/\alpha}$ for $x \ge 0$, we get by Hölder's inequality

(3.10)  $\int\log\Big(\frac{S+r}{t+r}\Big)\,d\bar Q^{(x,t)} = \int\log\Big(1 + \frac{S-t}{t+r}\Big)\,d\bar Q^{(x,t)}$
$\qquad \le K_\alpha\int\Big(\frac{S-t}{t+r}\Big)^{1/\alpha}\,d\bar Q^{(x,t)}$
$\qquad = K_\alpha\int\big(\theta^2(S-t)\big)^{1/\alpha}\big(\theta^2(t+r)\big)^{-1/\alpha}\,dQ^{(x,t)}$
$\qquad \le K_\alpha\Big[\int\theta^2(S-t)\,dQ^{(x,t)}\Big]^{1/\alpha}\Big[\int\big(\theta^2(t+r)\big)^{-1/(\alpha-1)}\,G_{x,t}(d\theta)\Big]^{(\alpha-1)/\alpha}.$

But since $G_{x,t} = N(x/(t+r),\,1/(t+r))$ we get for $\alpha > 3$

$\int\big(\theta^2(t+r)\big)^{-1/(\alpha-1)}\,G_{x,t}(d\theta) \le \int\big(\theta^2(t+r)\big)^{-1/(\alpha-1)}\,N\Big(0,\frac{1}{t+r}\Big)(d\theta) = \int |y|^{-2/(\alpha-1)}\,N(0,1)(dy) < \infty.$

We now put

$C_\alpha = K_\alpha\Big[\int |y|^{-2/(\alpha-1)}\,N(0,1)(dy)\Big]^{(\alpha-1)/\alpha}$

and get finally (3.8) and (3.9).

Let $b > 1$ be given. Then by (3.9) there exists a constant $B > 2$ such that for all $b(x,t) \ge b$

(3.11)  $\int\theta^2(S-t)\,dQ^{(x,t)} \le B\log b(x,t)$

holds. Now we choose $Q = B/(1+Bc)$ and $M = bB$. Then for $(x,t) \in \mathscr W(Mc/(1+Mc))$ we have

$b(x,t) = \frac{1-Qc}{Qc}\cdot\frac{\gamma(x,t)}{1-\gamma(x,t)} = \frac{\gamma(x,t)}{Bc(1-\gamma(x,t))} > \frac{Mc}{Bc} = b,$


and by (3.4), (3.5), and (3.11) we get further

$\rho(x,t,S) \le \gamma(x,t)b(x,t)^{-1} + (1-\gamma(x,t))Bc\log b(x,t)$
$\qquad = \gamma(x,t)b(x,t)^{-1}\big(1 + \log b(x,t)\big)$
$\qquad < \gamma(x,t).$

The last inequality follows from the inequality $x(1+\log x^{-1}) < 1$ for $x < 1$, since $b(x,t) > b > 1$. This proves (3.2).
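The elementary inequality used in the last step is easily spot-checked (editorial):

```python
import numpy as np

x = np.linspace(1e-9, 1.0, 1_000_000, endpoint=False)
assert np.all(x * (1 + np.log(1 / x)) < 1.0)
print("x(1 + log(1/x)) < 1 on (0, 1): verified on the grid")
```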

Now we prove the upper inclusion. We show

(3.12)  $\rho(x,t) = \gamma(x,t) \quad\text{if } \gamma(x,t) \le \frac{2c}{1+2c}.$

This implies the upper inclusion of statement (3.1). The method of proof consists in comparing the Bayes rule $T_c^*$ with the best rule if $\theta$ were known.

For the Bayes rule $T_c^*$ we always have

$\gamma(x,t) \ge \rho(x,t) = \rho(x,t,T_c^*)$
$\quad = \gamma(x,t)P_0^{(x,t)}(t \le T_c^* < \infty) + (1-\gamma(x,t))\,c\int\theta^2(T_c^*-t)\,dQ^{(x,t)}$
$\quad = \int_{-\infty}^{\infty}\Big[\gamma(x,t)P_0^{(x,t)}(t \le T_c^* < \infty) + (1-\gamma(x,t))\,c\,\theta^2 E_\theta^{(x,t)}(T_c^*-t)\Big]\,G_{x,t}(d\theta).$

Let the process $W$ start in $x$ and let $W(u) = z$. Under the transformation $y = \theta(z-x)$, $s = \theta^2(u-t)$, Brownian motion with drift $\theta$ (resp. 0) goes over into Brownian motion with drift 1 (resp. 0). With $S_\theta = \theta^2(T_c^*-t)$ we get

$= \int_{-\infty}^{\infty}\Big[\gamma(x,t)P_0^{(0,0)}(0 \le S_\theta < \infty) + (1-\gamma(x,t))\,c\,E_1^{(0,0)}S_\theta\Big]\,G_{x,t}(d\theta)$
$\ge \inf_S\Big[\gamma(x,t)P_0^{(0,0)}(0 \le S < \infty) + (1-\gamma(x,t))\,c\,E_1^{(0,0)}S\Big] =: \bar\rho(x,t).$

But $\bar\rho(x,t)$ is the minimal Bayes risk of (1.6) with $\gamma = \gamma(x,t)$. We determine now its Bayes stopping set. By (1.4)

$\bar\rho(x,t) = \min_{0<p\le 1}\big(\gamma p + 2(1-\gamma)c\log p^{-1}\big) = \gamma p_0 + 2(1-\gamma)c\log p_0^{-1}$

with $p_0 = \big(2(1-\gamma)c/\gamma\big)\wedge 1$ ($p_0$ denotes the stopping probability). Thus

$\bar\rho(x,t) = 2(1-\gamma)c\Big[1 + \log\frac{\gamma}{2(1-\gamma)c}\Big] \quad\text{if } \frac{2(1-\gamma)c}{\gamma} < 1$

and

$\bar\rho(x,t) = \gamma \quad\text{if } \frac{2(1-\gamma)c}{\gamma} \ge 1.$


The Bayes stopping region is therefore equal to

$\{(x,t) \mid \gamma(x,t) = \bar\rho(x,t)\} = \{(x,t) \mid \gamma(x,t) \le \gamma_0\},$

where $\gamma_0 = 2c(1+2c)^{-1}$; $\gamma_0$ is determined by the equation

$p_0 = \frac{2(1-\gamma_0)c}{\gamma_0} = 1. \qquad\Box$

We now derive a refinement of the statement of Theorem 2. For this we need a somewhat more general notation. If $h(t)$ is a positive function of time, we shall denote by $\mathscr W(h(\cdot)) = \{(x,t) \mid \gamma(x,t) > h(t)\}$.

THEOREM 3. For every $c > 0$ there exists a bounded function $\bar\varepsilon(\cdot) \ge c$ with

(a) $\bar\varepsilon(t)/c \to 1$ when $t\to\infty$ for every fixed $c$, and
(b) $\sup_{0<t<\infty}\bar\varepsilon(t)/c \to 1$ when $c\to 0$,

such that

(3.13)  $\mathscr W\Big(\frac{2\bar\varepsilon(\cdot)}{1+2\bar\varepsilon(\cdot)}\Big) \subset \mathscr W^*(c) \subset \mathscr W\Big(\frac{2c}{1+2c}\Big)$

holds.

The theorem states that for $c$ small or $t$ large the optimal stopping region is very near to its upper bound $\mathscr W(2c/(1+2c))$. The proof of Theorem 3, which is deferred to the end of this section, will show that the upper bound of $\bar\varepsilon(\cdot)/c$ is a bit larger than $M/2$, where $M$ is the constant appearing in Theorem 2.

Several conclusions can be drawn from the theorem. Let $\psi_c^*(t) = \inf\{x > 0 \mid \rho(x,t) = \gamma(x,t)\}$. By Theorem 2 this definition makes sense. Thus by the symmetry of the problem

$T_c^* = \inf\{t \mid |W(t)| \ge \psi_c^*(t)\}.$

COROLLARY 1.

$\psi_c^*(t) = \Big[(t+r)\Big(\log\Big(\frac{t+r}{r}\Big) + 2\log\frac{\gamma}{2(1-\gamma)c} + o(1)\Big)\Big]^{1/2}$

when $t\to\infty$.

COROLLARY 2.

$\psi_c^*(t) = \Big[(t+r)\Big(2\log\frac{\gamma}{2(1-\gamma)c} + \log\Big(\frac{t+r}{r}\Big) + o(1)\Big)\Big]^{1/2}$

uniformly in $t$ when $c\to 0$.

COROLLARY 3. For every $\varepsilon > 0$ there exists a $c_0 > 0$ such that

$T_{2c(1+\varepsilon)/(1+2c(1+\varepsilon))} \le T_c^* \le T_{2c/(1+2c)}$

for all $c_0 \ge c > 0$.


We can combine Corollary 3 with some recent results about boundary crossing distributions to get the minimal Bayes risk for (1.1) up to an $o(c)$-term. A related $O(c)$-result for the Bayes risk has been obtained by Pollak (1978), when there is an indifference zone in the parameter space.

THEOREM 4.

(3.14)  $0 \le \rho(T_{2c/(1+2c)}) - \rho(T_c^*) = o(c)$ when $c\to 0$.

The minimal Bayes risk for (1.1) is given by

(3.15)  $\rho(T_c^*) = 2(1-\gamma)c\big[\log b + \tfrac12\log\log b + 1 + \tfrac12\log 2 - A + o(1)\big]$

when $c\to 0$. Here

$b = \frac{\gamma}{2(1-\gamma)c} \quad\text{and}\quad A = 2\int_0^\infty \log x\,\varphi(x)\,dx.$

REMARK. Comparing statement (3.15) with the related formula (1.8) for the simple testing problem shows that the additional term $2(1-\gamma)c\big[\tfrac12\log(2\log b) - A + o(1)\big]$ appears in the minimal Bayes risk. This is caused by the ignorance about the parameter $\theta \neq 0$.
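The constant $A$ is explicit, and a small sketch makes (3.15) concrete (editorial; the closed-form value $A = -(\log 2 + \gamma_{\mathrm{EM}})/2$, with $\gamma_{\mathrm{EM}}$ the Euler-Mascheroni constant, is a standard normal-distribution fact quoted here only as a cross-check):

```python
import numpy as np
from scipy.integrate import quad

phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
A, _ = quad(lambda x: 2 * np.log(x) * phi(x), 0, np.inf)
print("A =", A, "; cross-check:", -(np.log(2) + np.euler_gamma) / 2)

gamma = 0.5
for c in (1e-2, 1e-3, 1e-4):
    b = gamma / (2 * (1 - gamma) * c)
    simple = 2 * (1 - gamma) * c * (np.log(b) + 1)                        # (1.8)
    composite = 2 * (1 - gamma) * c * (np.log(b) + 0.5 * np.log(np.log(b))
                                       + 1 + 0.5 * np.log(2) - A)        # (3.15), o(1) dropped
    print(f"c={c:.0e}: simple {simple:.4e}, composite {composite:.4e}")
```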

PROOF. From Corollary 3 it follows that

(3.16)  $\rho(T_{2c(1+\varepsilon)/(1+2c(1+\varepsilon))}) - \gamma\big[P_0(T_{2c(1+\varepsilon)/(1+2c(1+\varepsilon))} < \infty) - P_0(T_{2c/(1+2c)} < \infty)\big] \le \rho(T_c^*) \le \rho(T_{2c/(1+2c)}).$

We now show that the right- and left-hand sides of (3.16) differ from each other only by an $o(c)$-term. Formula (3.5) yields

(3.17)  $P_0(T_{2c(1+\varepsilon)/(1+2c(1+\varepsilon))} < \infty) - P_0(T_{2c/(1+2c)} < \infty) \le \varepsilon b^{-1} = O(\varepsilon c).$

Now we compute $\rho(T_{2c/(1+2c)})$. We write from now on for simplicity $T$ instead of $T_{2c/(1+2c)}$. By (3.5) and (3.7) for $x = 0$, $t = 0$,

(3.18)  $\rho(T) = 2(1-\gamma)c\Big[1 + \log b + \frac12\int\log\Big(\frac{T+r}{r}\Big)\,d\bar Q\Big].$

The integral on the right-hand side can be calculated by using Theorem 5 of Jennen and Lerche (1981). The following result is intuitively plausible by virtue of the relation

$P_\theta\{T/\log b \to 2\theta^{-2}\} = 1$ (as $c \to 0$).

We note

(3.19)  $\int\log\Big(\frac{T+r}{r}\Big)\,d\bar Q = \log(2\log b) - 2A + o(1).$

Combining (3.19) with (3.18) yields

(3.20)  $\rho(T_{2c/(1+2c)}) = 2(1-\gamma)c\Big[\log b + \frac{\log(2\log b)}{2} + 1 - A + o(1)\Big].$

From (3.17) and (3.20) it follows also that

(3.21)  $\rho(T_{2c(1+\varepsilon)/(1+2c(1+\varepsilon))}) = \rho(T_{2c/(1+2c)}) + O(\varepsilon c).$

Statement (3.21) together with (3.16) and (3.17) yields (3.14), and (3.14) together with (3.20) yields (3.15). $\Box$

PROOF OF THEOREM 3. The upper inclusion of (3.13) is already proved by (3.12). Now we prove the lower one. For the stopping times

(3.22)  $S_{(x,t)} = \inf\Big\{s \ge t \,\Big|\, \gamma(W(s),s) \le \frac{2c}{1+2c}\Big\},$

we show that

(3.23)  $\rho(x,t,S_{(x,t)}) < \gamma(x,t) \quad\text{for } (x,t) \in \mathscr W\Big(\frac{2\bar\varepsilon(\cdot)}{1+2\bar\varepsilon(\cdot)}\Big)\setminus\mathscr W\Big(\frac{Mc}{1+Mc}\Big),$

where $\bar\varepsilon(\cdot)$ will be specified below. $M$ is the constant of Theorem 2. Then (3.1) together with (3.23) implies the lower inclusion of (3.13).

Now we define $\bar\varepsilon(t)$. We note that for the stopping times (3.22), by (3.9) with $\alpha = 4$ and

$b(x,t) = \frac{\gamma(x,t)}{2(1-\gamma(x,t))c},$

the inequality

(3.24)  $\int\theta^2(S-t)\,dQ^{(x,t)} \le 2\log b(x,t) + C(x,t)\Big[\int\theta^2(S-t)\,dQ^{(x,t)}\Big]^{1/4}$

holds. The constants $C(x,t)$ are given by

$C(x,t) = K_4\Big(\int\big(\theta^2(t+r)\big)^{-1/3}\,G_{x,t}(d\theta)\Big)^{3/4} = K_4\Big(\int |y|^{-2/3}\,N\big(\bar\theta\sqrt{t+r},1\big)(dy)\Big)^{3/4}$

with $\bar\theta = x/(t+r)$.

Let $\psi_c^+$ and $\psi_c^-$ denote the positive and negative branches of the solution of the implicit equation $\gamma(\psi_c^\pm(t),t) = Mc/(1+Mc)$. By symmetry $\psi_c^\pm \equiv \pm\psi_c$, where $\psi_c$ is given by

(3.25)  $\psi_c(t) = \Big[(t+r)\Big(\log\Big(\frac{t+r}{r}\Big) + 2\log\frac{\gamma}{(1-\gamma)Mc}\Big)\Big]^{1/2}.$

We choose

(3.26)  $\varepsilon(t,c) = \Big[-\log\Big(1 - C\big(\psi_c(t),t\big)^{1/4}\Big)\Big] \wedge \log(M/2)$

and put $\tilde\varepsilon(t) = c\exp(\varepsilon(t,c))$. Let $a > 1$. Let $d(a) = \inf\{y > 1 \mid a\log(ay) < ay - 1\}$; $d(a)$ is uniquely determined. We define $\bar\varepsilon(t) = d\big(\tilde\varepsilon(t)/c\big)\,\tilde\varepsilon(t)$.

Now we claim that $\bar\varepsilon(\cdot)/c$ has the demanded properties (a) and (b). We note that $C(x,t)$ depends only on $|\bar\theta|\sqrt{t+r} = |x|/\sqrt{t+r}$. Evaluating $|\bar\theta|\sqrt{t+r}$ at the graphs $(\pm\psi_c(t),t)$ yields by (3.25)

$|\bar\theta|\sqrt{t+r} = \Big[\log\Big(\frac{t+r}{r}\Big) + 2\log\frac{\gamma}{(1-\gamma)Mc}\Big]^{1/2},$

which tends to infinity, uniformly in $t$ when $c\to 0$, or when $t\to\infty$. Consequently, $C(\pm\psi_c(t),t) \to 0$ and therefore by (3.26) $\varepsilon(t,c) \to 0$, uniformly in $t$ when $c\to 0$, or when $t\to\infty$. Since $d(a)\to 1$ as $a\to 1$, the properties (a) and (b) follow.

Now we show (3.23). As a first step we prove

(3.27)  $c\int\theta^2(S-t)\,dQ^{(x,t)} \le 2\tilde\varepsilon(t)\log b(x,t)$

for

$(x,t) \in \mathscr W\Big(\frac{2\tilde\varepsilon(\cdot)}{1+2\tilde\varepsilon(\cdot)}\Big)\setminus\mathscr W\Big(\frac{Mc}{1+Mc}\Big).$

By (3.26) we can assume that $\tilde\varepsilon(t) < Mc/2$. Let $H(x,t) = \int\theta^2(S-t)\,dQ^{(x,t)}$. Then we have from (3.6), (3.7), and (3.24) (with the $x$ and $t$ variables suppressed)

(3.28)  $2\log b \le H \le 2\log b + CH^{1/4}.$

Let $C_1(t) = C(\psi_c(t),t)$. Then $0 < C(x,t) \le C_1(t)$ holds on $\mathscr W(Mc/(1+Mc))^c$ and therefore

(3.29)  $\big(1 - C_1 H^{-3/4}\big)H \le 2\log b.$

If

(3.30)  $b(x,t) \ge \exp\big(\tfrac12 C_1(t)\big)$

holds for

$(x,t) \in \mathscr W\Big(\frac{2\tilde\varepsilon(\cdot)}{1+2\tilde\varepsilon(\cdot)}\Big)\setminus\mathscr W\Big(\frac{Mc}{1+Mc}\Big),$

then we get from the left-hand side of (3.28) $C_1(t) \le H(x,t)$ and therefore from (3.29)

$H(x,t) \le 2\log\big(b(x,t)\big)\big(1 - C_1(t)^{1/4}\big)^{-1}.$

But this yields (3.27).

It is left to show that (3.30) holds. An elementary calculation shows that

$\big(1 - C_1(t)^{1/4}\big)^{-1} \ge \exp\big(\tfrac12 C_1(t)\big) \quad\text{for } 0 \le C_1(t) \le 1.$

Then

$b(x,t) = \frac{\gamma(x,t)}{2(1-\gamma(x,t))c} \ge \frac{\gamma(x,t)\exp\big(\tfrac12 C_1(t)\big)}{2(1-\gamma(x,t))\tilde\varepsilon(t)} > \exp\big(\tfrac12 C_1(t)\big).$

The first inequality holds since $\tilde\varepsilon(t) = c\big(1 - C_1(t)^{1/4}\big)^{-1}$, and the last inequality follows from the definition of $\mathscr W\big(2\tilde\varepsilon(\cdot)/(1+2\tilde\varepsilon(\cdot))\big)$. This proves (3.30) and completes the proof of (3.27).

Combining now (3.4) and (3.5) with (3.27) yields for the stopping times (3.22) the estimate for the Bayes risks

(3.31)  $\rho(x,t,S) \le \gamma(x,t)b(x,t)^{-1} + 2(1-\gamma(x,t))\tilde\varepsilon(t)\log b(x,t)$

with

$b(x,t) = \frac{\gamma(x,t)}{2(1-\gamma(x,t))c} \quad\text{on } \mathscr W\Big(\frac{2\tilde\varepsilon(\cdot)}{1+2\tilde\varepsilon(\cdot)}\Big)\setminus\mathscr W\Big(\frac{Mc}{1+Mc}\Big).$

We assume now that

$(x,t) \in \mathscr W\Big(\frac{2\bar\varepsilon(\cdot)}{1+2\bar\varepsilon(\cdot)}\Big)\setminus\mathscr W\Big(\frac{Mc}{1+Mc}\Big)$

and estimate the right-hand side of (3.31) further. It is equal to

(3.32)  $\gamma(x,t)b(x,t)^{-1}\big[1 + (\tilde\varepsilon(t)/c)\log b(x,t)\big] = \gamma(x,t)\big[h(x,t)\tilde\varepsilon(t)/c\big]^{-1}\big[1 + (\tilde\varepsilon(t)/c)\log\big(h(x,t)\tilde\varepsilon(t)/c\big)\big]$

with $h(x,t) = \gamma(x,t)/\big(2(1-\gamma(x,t))\tilde\varepsilon(t)\big)$. Since $(ay)^{-1}\big(1 + a\log(ay)\big) < 1$ for $y > d(a)$, and since on $\mathscr W\big(2\bar\varepsilon(\cdot)/(1+2\bar\varepsilon(\cdot))\big)$, $h(x,t) > d\big(\tilde\varepsilon(t)/c\big)$ by the definition of $\bar\varepsilon$, it follows that expression (3.32) is strictly less than $\gamma(x,t)$. This yields (3.23) and completes the proof of Theorem 3. $\Box$

4. Results for one-sided tests. In this section we consider the Bayes risk given by

(4.1)  $\rho(T) = \gamma P_0(T < \infty) + (1-\gamma)c\int_0^\infty\theta^2 E_\theta T\,\varphi(\sqrt{r}\,\theta)\,2\sqrt{r}\,d\theta.$

For it we can characterize the minimizing stopping rule $T_c^*$ (it also exists) by results similar to those for the two-sided case.


If not mentioned otherwise, we will use the same notation as in the preceding sections for the corresponding objects here. For instance,

$\gamma(x,t) = F_{x,t}(\{0\}) = \Big[1 + \frac{1-\gamma}{\gamma}\int_0^\infty\exp\big(\theta x - \tfrac12\theta^2 t\big)\varphi(\sqrt{r}\,\theta)\,2\sqrt{r}\,d\theta\Big]^{-1},$

and also $\rho(x,t,T)$, $\rho(x,t)$, $\mathscr W^*(c)$, $\mathscr W(Kc)$, etc. The prior on $[0,\infty)$ is given by

$F = \gamma\delta_0 + (1-\gamma)\varphi(\sqrt{r}\,\theta)\,2\sqrt{r}\,d\theta.$

The posterior at $(x,t)$ can be represented as

$F_{x,t} = \gamma(x,t)\delta_0 + (1-\gamma(x,t))H_{x,t},$

where

$H_{x,t} = N\Big(\frac{x}{t+r},\frac{1}{t+r}\Big)\Big/\Phi\Big(\frac{x}{\sqrt{t+r}}\Big)$

on $(0,\infty)$; that is, $H_{x,t}$ is the normal distribution $N(x/(t+r),\,1/(t+r))$ conditioned on $(0,\infty)$. We only state the analogous result to Theorem 2. The counterpart to Theorem 3 holds also and can be proved in exactly the same way as Theorem 3.
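The one-sided computations below rest on moments of this truncated normal posterior. An editorial quadrature check of the second-moment identity that enters (4.6), $(t+r)\int\theta^2 H_{x,t}(d\theta) = 1 + x^2/(t+r) + h\big(x/\sqrt{t+r}\big)$ with $h(y) = y\varphi(y)/\Phi(y)$ (the values of $x$, $t$, $r$ are arbitrary):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

x, t, r = 0.8, 3.0, 1.0
m, s = x / (t + r), 1.0 / np.sqrt(t + r)

# density of H_{x,t}: N(m, s^2) restricted to (0, inf), normalized by Phi(x/sqrt(t+r))
Z = norm.cdf(x / np.sqrt(t + r))
dens = lambda th: norm.pdf(th, loc=m, scale=s) / Z
second_moment, _ = quad(lambda th: th**2 * dens(th), 0, np.inf)

h = lambda y: y * norm.pdf(y) / norm.cdf(y)
identity = (1 + x**2 / (t + r) + h(x / np.sqrt(t + r))) / (t + r)
print(second_moment, identity)   # agree to quadrature accuracy
```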

THEOREM 5. There exists a constant $K > 2$ such that for every $c > 0$

(4.2)  $\mathscr W\Big(\frac{Kc}{1+Kc}\Big) \subset \mathscr W^*(c) \subset \mathscr W\Big(\frac{2c}{1+2c}\Big).$

PROOF. The proof of the upper inclusion of (4.2) runs exactly along the same lines as that of (3.1). For the lower inclusion we show that for all $(x,t) \in \mathscr W(Kc/(1+Kc))$ there exists a stopping time $S_{(x,t)}$ of the process $(W(s),s)$ starting at $(x,t)$ such that

(4.3)  $\rho(x,t,S_{(x,t)}) < \gamma(x,t)$

holds. Let $Q$ denote a constant which satisfies $Qc < 1$. We choose

$S_{(x,t)} = \inf\{s > t \mid \gamma(W(s),s) \le Qc\},$

which can be rewritten as

(4.4)  $S_{(x,t)} = \inf\Big\{s > t \,\Big|\, \int_0^\infty\frac{dP_{\theta,s}^{(x,t)}}{dP_{0,s}^{(x,t)}}\,H_{x,t}(d\theta) \ge b(x,t)\Big\}$
$\qquad = \inf\Big\{s > t \,\Big|\, \sqrt{\frac{t+r}{s+r}}\,\exp\Big(\frac12\Big(\frac{W(s)^2}{s+r} - \frac{x^2}{t+r}\Big)\Big)\,\Phi\Big(\frac{W(s)}{\sqrt{s+r}}\Big)\Big/\Phi\Big(\frac{x}{\sqrt{t+r}}\Big) \ge b(x,t)\Big\}$

with

$b(x,t) = \frac{\gamma(x,t)(1-Qc)}{(1-\gamma(x,t))Qc} \quad\text{and}\quad \Phi(y) = \int_{-\infty}^y\varphi(x)\,dx.$

The posterior risk at $(x,t)$ can be represented as

(4.5)  $\rho(x,t,S_{(x,t)}) = \gamma(x,t)P_0^{(x,t)}(t < S_{(x,t)} < \infty) + (1-\gamma(x,t))\,c\int_0^\infty\theta^2 E_\theta^{(x,t)}(S_{(x,t)}-t)\,H_{x,t}(d\theta).$

From here on we write $S$ instead of $S_{(x,t)}$. The same martingale argument as for (3.5) yields

$P_0^{(x,t)}(t < S < \infty) = b(x,t)^{-1}.$

The estimate of the other part of the Bayes risk (4.5) is a bit more complicated than that of the corresponding part of (3.4). It can be expressed, after some calculations similar to those of (3.6), as follows:

(4.6)  $\int_0^\infty\theta^2 E_\theta^{(x,t)}(S-t)\,H_{x,t}(d\theta) = \int\Big(\frac{W(S)^2}{S+r} - \frac{x^2}{t+r}\Big)\,d\bar Q^{(x,t)} + \int\Big[h\Big(\frac{W(S)}{\sqrt{S+r}}\Big) - h\Big(\frac{x}{\sqrt{t+r}}\Big)\Big]\,d\bar Q^{(x,t)}$

with

$h(y) = y\varphi(y)/\Phi(y) \quad\text{and}\quad \bar Q^{(x,t)} = \int_0^\infty P_\theta^{(x,t)}\,H_{x,t}(d\theta).$

Using the defining equation (4.4) of the stopping time $S$ yields

(4.7)  $\int_0^\infty\theta^2 E_\theta^{(x,t)}(S-t)\,H_{x,t}(d\theta) = 2\log b + \int\log\Big(\frac{S+r}{t+r}\Big)\,d\bar Q^{(x,t)}$
$\qquad + \int\Big[h\Big(\frac{W(S)}{\sqrt{S+r}}\Big) - h\Big(\frac{x}{\sqrt{t+r}}\Big)\Big]\,d\bar Q^{(x,t)} - 2\int\Big[g\Big(\frac{W(S)}{\sqrt{S+r}}\Big) - g\Big(\frac{x}{\sqrt{t+r}}\Big)\Big]\,d\bar Q^{(x,t)}$

with $g(y) = \log\Phi(y)$.

with g( y) = log to( y).Now after some calculations we get

xh( W(S) ) h

( V x ) 2 [g (AISW(: )r) g (ilt+ r)]t+r

(4.7)

fx/147(tS+)/r 11S+r ( h,( y ) 2g,(y)) dy

= imsols+ r 0( y) rjx/Ift_7_,„0( y) [— (1 + y 2 ) — h(y)1 ply.

But this integral is always negative. It is obvious that the integrand is negative

(4.8)


for positive values of $y$. That it is also negative for negative $y$-values can be seen as follows. We have to show that

(4.9)  $-(1+y^2) - y\varphi(y)/\Phi(y) \le 0 \quad\text{for } y < 0,$

which is equivalent to

$-(1+y^2)\big(1-\Phi(y)\big) + y\varphi(y) \le 0 \quad\text{for } y > 0$

and to

(4.10)  $1 - \Phi(y) \ge \frac{y}{1+y^2}\,\varphi(y) \quad\text{for } y > 0.$
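Inequality (4.10) is a Mills-ratio-type bound and can be spot-checked numerically (editorial; `norm.sf` is used instead of `1 - norm.cdf` to avoid cancellation in the tail):

```python
import numpy as np
from scipy.stats import norm

y = np.linspace(0.0, 10.0, 100_001)
lhs = norm.sf(y)                        # 1 - Phi(y), computed stably
rhs = y / (1 + y**2) * norm.pdf(y)
assert np.all(lhs >= rhs)
print("(4.10) holds on the grid")
```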

Both sides of (4.10) vanish at $y = \infty$, and the derivative of the left-hand side is always smaller than that of the right-hand side, i.e.,

$-\varphi(y) < -\varphi(y)\Big(1 - \frac{2}{(1+y^2)^2}\Big) \quad\text{for all } y > 0.$

This yields (4.9) and therefore the integrand in (4.8) is always negative.

It is left to show that $W(S)/\sqrt{S+r} > x/\sqrt{t+r}$. Now let $Kc/(1+Kc) > Qc$. Then $(x,t) \in \mathscr W(Kc/(1+Kc))$ implies $\gamma(x,t) > Qc$, which yields $b(x,t) > 1$. This together with (4.4) implies for $S > t$ the inequality

$\exp\Big(\frac{x^2}{2(t+r)}\Big)\Phi\Big(\frac{x}{\sqrt{t+r}}\Big) < \sqrt{\frac{t+r}{S+r}}\,\exp\Big(\frac{W(S)^2}{2(S+r)}\Big)\Phi\Big(\frac{W(S)}{\sqrt{S+r}}\Big) \le \exp\Big(\frac{W(S)^2}{2(S+r)}\Big)\Phi\Big(\frac{W(S)}{\sqrt{S+r}}\Big).$

Since the function $y \mapsto \exp(y^2/2)\Phi(y)$ is increasing, this yields $W(S)/\sqrt{S+r} > x/\sqrt{t+r}$. Thus the expression in (4.8) is always negative, which by (4.7) yields

$\int_0^\infty\theta^2 E_\theta^{(x,t)}(S-t)\,H_{x,t}(d\theta) \le 2\log b + \int\log\Big(\frac{S+r}{t+r}\Big)\,d\bar Q^{(x,t)}.$

The rest of the proof is similar to that of Theorem 2 from (3.7) on. $\Box$

Acknowledgments. The author wishes to thank W. Ehm, C. Jennen, I. Johnstone, C. Klaasen, S. Lalley, T. L. Lai, D. W. Miller, T. Sellke, and D. Siegmund for stimulating discussions, and the Associate Editor and the referees for their careful work.

REFERENCES

CORNFIELD, J. (1966). A Bayesian test of some classical hypotheses—with application to sequential clinical trials. J. Amer. Statist. Assoc. 61 577-594.
DARLING, D. and ROBBINS, H. (1967). Iterated logarithm inequalities. Proc. Nat. Acad. Sci. U.S.A. 57 1188-1192.
JEFFREYS, H. (1948). Theory of Probability, 2nd ed. Clarendon, Oxford.
JENNEN, C. and LERCHE, H. R. (1981). First exit densities of Brownian motion through one-sided moving boundaries. Z. Wahrsch. verw. Gebiete 55 133-148.
JENNEN, C. and LERCHE, H. R. (1982). Asymptotic densities of stopping times associated with tests of power one. Z. Wahrsch. verw. Gebiete 61 501-511.
LERCHE, H. R. (1985). On the optimality of sequential tests with parabolic boundaries. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer (L. Le Cam and R. Olshen, eds.) 2 579-597. Wadsworth, Monterey, Calif.
LERCHE, H. R. (1986). An optimal property of the repeated significance test. Proc. Nat. Acad. Sci. U.S.A. 83 1546-1548.
LIPTSER, R. S. and SHIRYAYEV, A. N. (1977). Statistics of Random Processes I. Springer, Berlin.
LORDEN, G. (1977). Nearly-optimal sequential tests for finitely many parameter values. Ann. Statist. 5 1-21.
POLLAK, M. (1978). Optimality and almost optimality of mixture stopping rules. Ann. Statist. 6 910-916.
ROBBINS, H. (1970). Statistical methods related to the law of the iterated logarithm. Ann. Math. Statist. 41 1397-1409.
ROBBINS, H. and SIEGMUND, D. (1970). Boundary crossing probabilities for the Wiener process and partial sums. Ann. Math. Statist. 41 1410-1429.
ROBBINS, H. and SIEGMUND, D. (1973). Statistical tests of power one and their integral representation of solutions of certain partial differential equations. Bull. Inst. Math. Acad. Sinica 1 93-120.
SHIRYAYEV, A. N. (1978). Optimal Stopping Rules. Springer, Berlin.
WALD, A. (1947). Sequential Analysis. Wiley, New York.

UNIVERSITÄT HEIDELBERG
INSTITUT FÜR ANGEWANDTE MATHEMATIK
IM NEUENHEIMER FELD 294
6900 HEIDELBERG 1
WEST GERMANY