

Statistical Science
2006, Vol. 21, No. 1, 70–98
DOI 10.1214/088342305000000467
© Institute of Mathematical Statistics, 2006

The Sources of Kolmogorov’s Grundbegriffe

Glenn Shafer and Vladimir Vovk

Abstract. Andrei Kolmogorov’s Grundbegriffe der Wahrscheinlichkeitsrechnung put probability’s modern mathematical formalism in place. It also provided a philosophy of probability—an explanation of how the formalism can be connected to the world of experience. In this article, we examine the sources of these two aspects of the Grundbegriffe—the work of the earlier scholars whose ideas Kolmogorov synthesized.

Key words and phrases: Axioms for probability, Borel, classical probability, Cournot’s principle, frequentism, Grundbegriffe der Wahrscheinlichkeitsrechnung, history of probability, Kolmogorov, measure theory.

Glenn Shafer is Professor, Rutgers Business School, Newark, New Jersey 07102, USA and Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK (e-mail: [email protected]). Vladimir Vovk is Professor, Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK (e-mail: [email protected]).

1. INTRODUCTION

Andrei Kolmogorov’s Grundbegriffe der Wahrscheinlichkeitsrechnung, which set out the axiomatic basis for modern probability theory, appeared in 1933. Four years later, in his opening address to an international colloquium at the University of Geneva, Maurice Fréchet praised Kolmogorov for organizing a theory Émile Borel had created many years earlier by combining countable additivity with classical probability. Fréchet (1938b, page 54) put the matter this way in the written version of his address:

It was at the moment when Mr. Borel introduced this new kind of additivity into the calculus of probability—in 1909, that is to say—that all the elements needed to formulate explicitly the whole body of axioms of (modernized classical) probability theory came together.

It is not enough to have all the ideas in mind, to recall them now and then; one must make sure that their totality is sufficient, bring them together explicitly, and take responsibility for saying that nothing further is needed in order to construct the theory.

This is what Mr. Kolmogorov did. This is his achievement. (And we do not believe he wanted to claim any others, so far as the axiomatic theory is concerned.)

Perhaps not everyone in Fréchet’s audience agreed that Borel had put everything on the table, but surely many saw the Grundbegriffe as a work of synthesis. In Kolmogorov’s axioms and in his way of relating his axioms to the world of experience, they must have seen traces of the work of many others—the work of Borel, yes, but also the work of Fréchet himself, and that of Cantelli, Chuprov, Lévy, Steinhaus, Ulam and von Mises.

Today, what Fréchet and his contemporaries knew is no longer known. We know Kolmogorov and what came after; we have mostly forgotten what came before. This is the nature of intellectual progress, but it has left many modern students with the impression that Kolmogorov’s axiomatization was born full grown—a sudden brilliant triumph over confusion and chaos.

To understand the synthesis represented by the Grundbegriffe, we need a broad view of the foundations of probability and the advance of measure theory from 1900 to 1930. We need to understand how measure theory became more abstract during those decades, and we need to recall what others were saying about axioms for probability, about Cournot’s principle and about the relationship of probability with measure and frequency. Our review of these topics draws mainly on work by authors listed by Kolmogorov in the Grundbegriffe’s bibliography, especially Sergei Bernstein, Émile Borel, Francesco Cantelli, Maurice Fréchet, Paul Lévy, Antoni Łomnicki, Evgeny Slutsky, Hugo Steinhaus and Richard von Mises.

We are interested not only in Kolmogorov’s mathematical formalism, but also in his philosophy of probability—how he proposed to relate the mathematical formalism to the real world. In a letter to Fréchet, Kolmogorov (1939) wrote, “You are also right in attributing to me the opinion that the formal axiomatization should be accompanied by an analysis of its real meaning.” Kolmogorov devoted only two pages of the Grundbegriffe to such an analysis, but the question was more important to him than this brevity might suggest. We can study any mathematical formalism we like, but we have the right to call it probability only if we can explain how it relates to the phenomena classically treated by probability theory.

We begin by looking at the classical foundation that Kolmogorov’s measure-theoretic foundation replaced: equally likely cases. In Section 2 we review how probability was defined in terms of equally likely cases, how the rules of the calculus of probability were derived from this definition and how this calculus was related to the real world by Cournot’s principle. We also look at some paradoxes discussed at the time.

In Section 3 we sketch the development of measure theory and its increasing entanglement with probability during the first three decades of the twentieth century. This story centers on Borel, who introduced countable additivity into pure mathematics in the 1890s and then brought it to the center of probability theory, as Fréchet noted, in 1909, when he first stated and more or less proved the strong law of large numbers for coin tossing. However, the story also features Lebesgue, Radon, Fréchet, Daniell, Wiener, Steinhaus and Kolmogorov himself.

Inspired partly by Borel and partly by the challenge issued by Hilbert in 1900, a whole series of mathematicians proposed abstract frameworks for probability during the three decades we are emphasizing. In Section 4 we look at some of these, beginning with the doctoral dissertations by Rudolf Laemmel and Ugo Broggi in the first decade of the century and including an early contribution by Kolmogorov, written in 1927, five years before he started work on the Grundbegriffe.

In Section 5 we finally turn to the Grundbegriffe itself. Our review of it will confirm what Fréchet said in 1937 and what Kolmogorov says in the preface: it was a synthesis and a manual, not a report on new research. Like any textbook, its mathematics was novel for most of its readers, but its real originality was rhetorical and philosophical.

2. THE CLASSICAL FOUNDATION

The classical foundation of probability theory, which begins with the notion of equally likely cases, held sway for 200 years. Its elements were put in place early in the eighteenth century, and they remained in place in the early twentieth century. Even today the classical foundation is used in teaching probability.

Although twentieth century proponents of new approaches were fond of deriding the classical foundation as naive or circular, it can be defended. Its basic mathematics can be explained in a few words, and it can be related to the real world by Cournot’s principle, the principle that an event with small or zero probability will not occur. This principle was advocated in France and Russia in the early years of the twentieth century, but disputed in Germany. Kolmogorov retained it in the Grundbegriffe.

In this section we review the mathematics of equally likely cases and recount the discussion of Cournot’s principle, contrasting the support for it in France with German efforts to find other ways to relate equally likely cases to the real world. We also discuss two paradoxes, contrived at the end of the nineteenth century by Joseph Bertrand, which illustrate the care that must be taken with the concept of relative probability. The lack of consensus on how to make philosophical sense of equally likely cases and the confusion revealed by Bertrand’s paradoxes were two sources of dissatisfaction with the classical theory.

2.1 The Classical Calculus

The classical definition of probability was formulated by Jacob Bernoulli (1713) in Ars Conjectandi and Abraham de Moivre (1718) in The Doctrine of Chances: the probability of an event is the ratio of the number of equally likely cases that favor it to the total number of equally likely cases possible under the circumstances.

From this definition, de Moivre derived two rules for probability. The theorem of total probability, or the addition theorem, says that if A and B cannot both happen, then

\[
\begin{aligned}
\text{probability of } A \text{ or } B \text{ happening}
&= \frac{\#\text{ of cases favoring } A \text{ or } B}{\text{total } \#\text{ of cases}} \\
&= \frac{\#\text{ of cases favoring } A}{\text{total } \#\text{ of cases}}
 + \frac{\#\text{ of cases favoring } B}{\text{total } \#\text{ of cases}} \\
&= (\text{probability of } A) + (\text{probability of } B).
\end{aligned}
\]

The theorem of compound probability, or the multiplication theorem, says

\[
\begin{aligned}
\text{probability of both } A \text{ and } B \text{ happening}
&= \frac{\#\text{ of cases favoring both } A \text{ and } B}{\text{total } \#\text{ of cases}} \\
&= \frac{\#\text{ of cases favoring } A}{\text{total } \#\text{ of cases}}
 \cdot \frac{\#\text{ of cases favoring both } A \text{ and } B}{\#\text{ of cases favoring } A} \\
&= (\text{probability of } A) \cdot (\text{probability of } B \text{ if } A \text{ happens}).
\end{aligned}
\]

These arguments were still standard fare in probability textbooks at the beginning of the twentieth century, including the great treatises by Henri Poincaré (1896) in France, Andrei Markov (1900) in Russia and Emanuel Czuber (1903) in Germany. Some years later we find them in Guido Castelnuovo’s (1919) Italian textbook, which has been held out as the acme of the genre (Onicescu, 1967).
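The two counting arguments can be checked mechanically on a small set of equally likely cases. The sketch below is only an illustration: the choice of two fair dice, the particular events and the helper name `probability` are assumptions made here, not anything taken from the classical texts.

```python
from fractions import Fraction
from itertools import product

# Assumed illustration: the 36 equally likely cases for two fair dice.
cases = list(product(range(1, 7), repeat=2))

def probability(event, given=None):
    """Classical probability: favorable cases over possible cases.

    If `given` is supplied, only the cases favoring `given` count as possible,
    which yields the "probability of B if A happens".
    """
    possible = [c for c in cases if given is None or given(c)]
    favorable = [c for c in possible if event(c)]
    return Fraction(len(favorable), len(possible))

# Addition theorem: A (sum is 7) and B (sum is 11) cannot both happen.
a = lambda c: c[0] + c[1] == 7
b = lambda c: c[0] + c[1] == 11
total = probability(lambda c: a(c) or b(c))
assert total == probability(a) + probability(b)

# Multiplication theorem: A (first die even) and B (sum is 8).
first_even = lambda c: c[0] % 2 == 0
sum_eight = lambda c: c[0] + c[1] == 8
compound = probability(lambda c: first_even(c) and sum_eight(c))
assert compound == probability(first_even) * probability(sum_eight, given=first_even)

print(total, compound)  # 2/9 1/12
```

Exact fractions are used so that both theorems hold as equalities between ratios of counts, exactly as in the derivation, rather than as approximate floating-point identities.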

Geometric probability was incorporated into the classical theory in the early nineteenth century. Instead of counting equally likely cases, one measures their geometric extension—their area or volume. However, probability is still a ratio, and the rules of total and compound probability are still theorems. This was explained by Antoine-Augustin Cournot (1843, page 29) in his influential treatise on probability and statistics, Exposition de la théorie des chances et des probabilités. This understanding of geometric probability did not change in the early twentieth century, when Borel and Lebesgue expanded the class of sets for which we can define geometric extension. We may now have more events with which to work, but we define and study geometric probabilities as before. Cournot would have seen nothing novel in Felix Hausdorff’s (1914, pages 416–417) definition of probability in the chapter on measure theory in his treatise on set theory.

The classical calculus was enriched at the beginning of the twentieth century by a formal and universal notation for relative probabilities. Hausdorff (1901) introduced the symbol $p_F(E)$ for what he called the relative Wahrscheinlichkeit von E, posito F (relative probability of E given F). Hausdorff explained that this notation can be used for any two events E and F, no matter what their temporal or logical relationship, and that it allows one to streamline Poincaré’s proofs of the addition and multiplication theorems. Hausdorff’s notation was adopted by Czuber in 1903. Kolmogorov used it in the Grundbegriffe, and it persisted, especially in the German literature, until the middle of the twentieth century, when it was displaced by the more flexible $P(E \mid F)$, which Harold Jeffreys (1931) introduced in his Scientific Inference.

2.2 Cournot’s Principle

An event with very small probability is morally impossible: it will not happen. Equivalently, an event with very high probability is morally certain: it will happen. This principle was first formulated within mathematical probability by Jacob Bernoulli. In his Ars Conjectandi, published in 1713, Bernoulli proved a celebrated theorem: in a sufficiently long sequence of independent trials of an event, there is a very high probability that the frequency with which the event happens will be close to its probability. Bernoulli explained that we can treat the very high probability as moral certainty and so use the frequency of the event as an estimate of its probability.
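Bernoulli’s theorem can be made concrete by computing, for a given number of trials, the exact probability that the frequency falls within a chosen distance of the event’s probability. The following is a minimal sketch under assumptions made for illustration only: the event has probability 1/2, “close” means within 1/10, and the function name `prob_frequency_close` is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_frequency_close(n, p, eps):
    """Exact probability that the frequency k/n of an event of probability p,
    in n independent trials, lies within eps of p."""
    return sum(
        comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - p) <= eps
    )

p = Fraction(1, 2)     # assumed probability of the event
eps = Fraction(1, 10)  # assumed meaning of "close"

results = {n: prob_frequency_close(n, p, eps) for n in (10, 100, 1000)}
for n, value in results.items():
    print(n, float(value))

assert results[10] == Fraction(21, 32)
```

The probability climbs toward one as the number of trials grows. Treating it as moral certainty for large enough n is the further, non-mathematical step that Bernoulli added to the theorem.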

Probabilistic moral certainty was widely discussed in the eighteenth century. In the 1760s, the French savant Jean d’Alembert muddled matters by questioning whether the prototypical event of very small probability, a long run of many happenings of an event as likely to fail as happen on each trial, is possible at all. A run of a hundred may be metaphysically possible, he felt, but it is physically impossible. It has never happened and never will happen (d’Alembert, 1761, 1767; Daston, 1979). Buffon (1777) argued that the distinction between moral and physical certainty is one of degree. An event with probability 9999/10,000 is morally certain; an event with much greater probability, such as the rising of the sun, is physically certain (Loveland, 2001).

Cournot, a mathematician now remembered as an economist and a philosopher of science (Martin, 1996, 1998), gave the discussion a nineteenth century cast. Being equipped with the idea of geometric probability, Cournot could talk about probabilities that are vanishingly small. He brought physics to the foreground. It may be mathematically possible, he argued, for a heavy cone to stand in equilibrium on its vertex, but it is physically impossible. The event’s probability is vanishingly small. Similarly, it is physically impossible for the frequency of an event in a long sequence of trials to differ substantially from the event’s probability (Cournot, 1843, pages 57 and 106).

In the second half of the nineteenth century, the principle that an event with a vanishingly small probability will not happen took on a real role in physics, most saliently in Ludwig Boltzmann’s statistical understanding of the second law of thermodynamics. As Boltzmann explained in the 1870s, dissipative processes are irreversible because the probability of a state with entropy far from the maximum is vanishingly small (von Plato, 1994, page 80; Seneta, 1997). Also notable was Henri Poincaré’s use of the principle in celestial mechanics. Poincaré’s (1890) recurrence theorem says that an isolated mechanical system confined to a bounded region of its phase space will eventually return arbitrarily close to its initial state, provided only that this initial state is not exceptional. The states for which the recurrence does not hold are exceptional inasmuch as they are contained in subregions whose total volume is arbitrarily small.

Saying that an event of very small or vanishingly small probability will not happen is one thing. Saying that probability theory gains empirical meaning only by ruling out the happening of such events is another. Cournot (1843, page 78) seems to have been the first to say explicitly that probability theory does gain empirical meaning only by declaring events of vanishingly small probability to be impossible:

. . . The physically impossible event is therefore the one that has infinitely small probability, and only this remark gives substance—objective and phenomenal value—to the theory of mathematical probability.

[The phrase “objective and phenomenal” refers to Kant’s distinction between the noumenon, or thing-in-itself, and the phenomenon, or object of experience (Daston, 1994).] After the Second World War, some authors began to use “Cournot’s principle” for the principle that an event of very small or zero probability singled out in advance will not happen, especially when this principle is advanced as the unique means by which a probability model is given empirical meaning.

2.2.1 The viewpoint of the French probabilists. In the early decades of the twentieth century, probability theory was beginning to be understood as pure mathematics. What does this pure mathematics have to do with the real world? The mathematicians who revived research in probability theory in France during these decades, Émile Borel, Jacques Hadamard, Maurice Fréchet and Paul Lévy, made the connection by treating events of small or zero probability as impossible.

Borel explained this repeatedly, often in a style more literary than mathematical or philosophical (Borel, 1906, 1909b, 1914, 1930). Borel’s many discussions of the considerations that go into assessing the boundaries of practical certainty culminated in a classification more refined than Buffon’s. A probability of $10^{-6}$, he decided, is negligible at the human scale, a probability of $10^{-15}$ at the terrestrial scale and a probability of $10^{-50}$ at the cosmic scale (Borel, 1939, pages 6–7).

Hadamard, the preeminent analyst who did pathbreaking work on Markov chains in the 1920s (Bru, 2003), made the point in a different way. Probability theory, he said, is based on two notions: the notion of perfectly equivalent (equally likely) events and the notion of a very unlikely event (Hadamard, 1922, page 289). Perfect equivalence is a mathematical assumption which cannot be verified. In practice, equivalence is not perfect—one of the grains in a cup of sand may be more likely than another to hit the ground first when they are thrown out of the cup—but this need not prevent us from applying the principle of the very unlikely event. Even if the grains are not exactly the same, the probability of any particular one hitting the ground first is negligibly small. Hadamard was the teacher of both Fréchet and Lévy.

Among the French mathematicians of this period, it was Lévy who expressed most clearly the thesis that Cournot’s principle is probability’s only bridge to reality. In his Calcul des probabilités (Lévy, 1925) Lévy emphasized the different roles of Hadamard’s two notions. The notion of equally likely events, Lévy explained, suffices as a foundation for the mathematics of probability, but so long as we base our reasoning only on this notion, our probabilities are merely subjective. It is the notion of a very unlikely event that permits the results of the mathematical theory to take on practical significance (Lévy, 1925, pages 21, 34; see also Lévy, 1937, page 3). Combining the notion of a very unlikely event with Bernoulli’s theorem, we obtain the notion of the objective probability of an event, a physical constant that is measured by frequency. Objective probability, in Lévy’s view, is entirely analogous to length and weight, other physical constants whose empirical meaning is also defined by methods established for measuring them to a reasonable approximation (Lévy, 1925, pages 29–30).

By the time he undertook to write the Grundbegriffe, Kolmogorov must have been very familiar with Lévy’s views. He had cited Lévy’s 1925 book in his 1931 article on Markov processes and subsequently, during his visit to France, had spent a great deal of time talking with Lévy about probability. He could also have learned about Cournot’s principle from the Russian literature. The champion of the principle in Russia had been Chuprov, who became professor of statistics in Petersburg in 1910. Chuprov put Cournot’s principle—which he called Cournot’s lemma—at the heart of this project; it was, he said, a basic principle of the logic of the probable (Chuprov, 1910; Sheynin, 1996, pages 95–96). Markov, who also worked in Petersburg, learned about the burgeoning field of mathematical statistics from Chuprov (Ondar, 1981), and we see an echo of Cournot’s principle in Markov’s (1912, page 12 of the German edition) textbook:

The closer the probability of an event is to one, the more reason we have to expect the event to happen and not to expect its opposite to happen.

In practical questions, we are forced to regard as certain events whose probability comes more or less close to one, and to regard as impossible events whose probability is small.

Consequently, one of the most important tasks of probability theory is to identify those events whose probabilities come close to one or zero.

The Russian statistician Evgeny Slutsky discussed Chuprov’s views in his influential article on limit theorems (Slutsky, 1925). Kolmogorov included Lévy’s book and Slutsky’s article in his bibliography, but not Chuprov’s book. An opponent of the Bolsheviks, Chuprov was abroad when they seized power, and he never returned home. He remained active in Sweden and Germany, but his health soon failed, and he died in 1926 at the age of 52.

2.2.2 Strong and weak forms of Cournot’s principle. Cournot’s principle has many variations. Like probability, moral certainty can be subjective or objective. Some authors make moral certainty sound truly equivalent to absolute certainty; others emphasize its pragmatic meaning.

For our story, it is important to distinguish between the strong and weak forms of the principle (Fréchet, 1951, page 6; Martin, 2003). The strong form refers to an event of small or zero probability that we single out in advance of a single trial: it says the event will not happen on that trial. The weak form says that an event with very small probability will happen very rarely in repeated trials.

Borel, Lévy and Kolmogorov all subscribed to Cournot’s principle in its strong form. In this form, the principle combines with Bernoulli’s theorem to produce the unequivocal conclusion that an event’s probability will be approximated by its frequency in a particular sufficiently long sequence of independent trials. It also provides a direct foundation for statistical testing. If the meaning of probability resides precisely in the nonhappening of small-probability events singled out in advance, then we need no additional principles to justify rejecting a hypothesis that gives small probability to an event we single out in advance and then observe to happen.
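The testing logic in the preceding paragraph can be sketched in a few lines. Everything specific here is an assumption chosen for illustration: the hypothesis of a fair coin, 100 tosses, the event “at least 80 heads” fixed before the trial, and a negligibility cutoff of $10^{-6}$ borrowed from the human scale in Borel’s classification. The names `upper_tail` and `reject_fair_coin` are hypothetical.

```python
from fractions import Fraction
from math import comb

def upper_tail(n, k_min, p=Fraction(1, 2)):
    """Exact probability of at least k_min happenings in n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Assumed cutoff for "small probability" (Borel's human scale).
NEGLIGIBLE = Fraction(1, 10**6)

# The event is singled out in advance, before any toss is observed.
N_TOSSES = 100
K_MIN = 80
event_probability = upper_tail(N_TOSSES, K_MIN)

def reject_fair_coin(observed_heads):
    """Reject the hypothesis only if the pre-specified event has negligible
    probability under it and is then observed to happen."""
    return event_probability < NEGLIGIBLE and observed_heads >= K_MIN

print(float(event_probability))
print(reject_fair_coin(83), reject_fair_coin(55))  # True False
```

The order of operations matters: the event and the cutoff are fixed first, and the observation enters only at the last step. Choosing the event after seeing the data would not be an application of the strong form as stated above.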

Other authors, including Chuprov, enunciated Cournot’s principle in its weak form, and this can lead in a different direction. The weak principle combines with Bernoulli’s theorem to produce the conclusion that an event’s probability will usually be approximated by its frequency in a sufficiently long sequence of independent trials, a general principle that has the weak principle as a special case. This was pointed out in the famous textbook by Castelnuovo (1919, page 108). On page 3, Castelnuovo called the general principle the empirical law of chance:

In a series of trials repeated a large number of times under identical conditions, each of the possible events happens with a (relative) frequency that gradually equals its probability. The approximation usually improves as the number of trials increases.

Although the special case where the probability is close to 1 is sufficient to imply the general principle, Castelnuovo preferred to begin his introduction to the meaning of probability by enunciating the general principle, and so he can be considered a frequentist. His approach was influential. Maurice Fréchet and Maurice Halbwachs adopted it in their textbook in 1924. It brought Fréchet to the same understanding of objective probability as Lévy: objective probability is a physical constant that is measured by frequency (Fréchet, 1938a, page 5; 1938b, pages 45–46).

The weak point of Castelnuovo and Fréchet’s position lies in the modesty of their conclusion: they conclude only that an event’s probability is usually approximated by its frequency. When we estimate a probability from an observed frequency, we are taking a further step: we are assuming that what usually happens has happened in the particular case. This step requires the strong form of Cournot’s principle. According to Kolmogorov (1956, page 240 of the 1965 English edition), it is a reasonable step only if we have some reason to assume that the position of the particular case among other potential ones “is a regular one, that is, that it has no special features.”

2.2.3 British indifference and German skepticism. The mathematicians who worked on probability in France in the early twentieth century were unusual in the extent to which they delved into the philosophical side of their subject. Poincaré had made a mark in the philosophy of science as well as in mathematics, and Borel, Fréchet and Lévy tried to emulate him. The situation in Britain and Germany was different.

In Britain there was little mathematical work in probability proper in this period. In the nineteenth century, British interest in probability had been practical and philosophical, not mathematical (Porter, 1986, page 74ff). Robert Leslie Ellis (1849) and John Venn (1888) accepted the usefulness of probability, but insisted on defining it directly in terms of frequency, leaving no role for Bernoulli’s theorem and Cournot’s principle (Daston, 1994). These attitudes persisted even after Pearson and Fisher brought Britain into a leadership role in mathematical statistics. The British statisticians had no puzzle to solve concerning how to link probability to the world. They were interested in reasoning directly about frequencies.

In contrast with Britain, Germany did see a substantial amount of mathematical work in probability during the first decades of the twentieth century, much of it published in German by Scandinavians and eastern Europeans, but few German mathematicians of the first rank fancied themselves philosophers. The Germans were already pioneering the division of labor to which we are now accustomed, between mathematicians who prove theorems about probability, and philosophers, logicians, statisticians and scientists who analyze the meaning of probability. Many German statisticians believed that one must decide what level of probability will count as practical certainty in order to apply probability theory (von Bortkiewicz, 1901, page 825; Bohlmann, 1901, page 861), but German philosophers did not give Cournot’s principle a central role.

The most cogent and influential of the German philosophers who discussed probability in the late nineteenth century was Johannes von Kries (1886), whose Principien der Wahrscheinlichkeitsrechnung first appeared in 1886. von Kries rejected what he called the orthodox philosophy of Laplace and the mathematicians who followed him. As von Kries saw it, these mathematicians began with a subjective concept of probability, but then claimed to establish the existence of objective probabilities by means of a so-called law of large numbers, which they erroneously derived by combining Bernoulli’s theorem with the belief that small probabilities can be neglected. Having both subjective and objective probabilities at their disposal, these mathematicians then used Bayes’ theorem to reason about objective probabilities for almost any question where many observations are available. All this, von Kries believed, was nonsense. The notion that an event with very small probability is impossible was, in von Kries’ eyes, simply d’Alembert’s mistake.

von Kries believed that objective probabilities sometimes exist, but only under conditions where equally likely cases can legitimately be identified. Two conditions, he thought, are needed:

• Each case is produced by equally many of the possible arrangements of the circumstances, and this remains true when we look back in time to earlier circumstances that led to the current ones. In this sense, the relative sizes of the cases are natural.

• Nothing besides these circumstances affects our expectation about the cases. In this sense, the Spielräume are insensitive. [In German, Spiel means game or play, and Raum (plural Räume) means room or space. In most contexts, Spielraum can be translated as leeway or room for maneuver. For von Kries the Spielraum for each case was the set of all arrangements of the circumstances that produce it.]

von Kries’ principle of the Spielräume was that objec-tive probabilities can be calculated from equally likelycases when these conditions are satisfied. He consid-ered this principle analogous to Kant’s principle thateverything that exists has a cause. Kant thought thatwe cannot reason at all without the principle of causeand effect. von Kries thought that we cannot reasonabout objective probabilities without the principle ofthe Spielräume.

Even when an event has an objective probability, von Kries saw no legitimacy in the law of large numbers. Bernoulli’s theorem is valid, he thought, but it tells us only that a large deviation of an event’s frequency from its probability is just as unlikely as some other unlikely event, say a long run of successes. What will actually happen is another matter. This disagreement between Cournot and von Kries can be seen as a quibble about words. Do we say that an event will not happen (Cournot) or do we say merely that it is as unlikely as some other event we do not expect to happen (von Kries)? Either way, we proceed as if it will not happen. However, the quibbling has its reasons. Cournot wanted to make a definite prediction, because this provides a bridge from probability theory to the world of phenomena (the real world, as those who have not studied Kant would say). von Kries thought he had a different way to connect probability theory with phenomena.

von Kries’ critique of moral certainty and the law of large numbers was widely accepted in Germany (Kamlah, 1983). Czuber, in the influential textbook we have already mentioned, named Bernoulli, d’Alembert, Buffon and De Morgan as advocates of moral certainty and declared them all wrong; the concept of moral certainty, he said, violates the fundamental insight that an event of ever so small a probability can still happen (Czuber, 1843, page 15; see also Meinong, 1915, page 591).

This wariness about ruling out the happening of events whose probability is merely very small does not seem to have prevented acceptance of the idea that zero probability represents impossibility. Beginning with Wiman’s work on continued fractions in 1900, mathematicians writing in German worked on showing that various sets have measure zero, and everyone understood that the point was to show that these sets are impossible (see Felix Bernstein, 1912, page 419). This suggests a great gulf between zero probability and merely small probability. One does not sense such a gulf in the writings of Borel and his French colleagues; as we have seen, the vanishingly small, for them, was merely an idealization of the very small.

von Kries’ principle of the Spielräume did not endure, because no one knew how to use it, but his project of providing a Kantian justification for the uniform distribution of probabilities remained alive in German philosophy in the first decades of the twentieth century (Meinong, 1915; Reichenbach, 1916). John Maynard Keynes (1921) brought it into the English literature, where it continues to echo, to the extent that today’s probabilists, when asked about the philosophical grounding of the classical theory of probability, are more likely to think about arguments for a uniform distribution of probabilities than about Cournot’s principle.

2.3 Bertrand’s Paradoxes

How do we know cases are equally likely, and when something happens, do the cases that remain possible remain equally likely? In the decades before the Grundbegriffe, these questions were frequently discussed in the context of paradoxes formulated by Joseph Bertrand, an influential French mathematician, in a textbook published in 1889.

We now look at discussions by other authors of two of Bertrand’s paradoxes: Poincaré’s discussion of the paradox of the three jewelry boxes and Borel’s discussion of the paradox of the great circle. (In the literature of the period, “Bertrand’s paradox” usually referred to a third paradox, concerning two possible interpretations of the idea of choosing a random chord on a circle. Determining a chord by choosing two random points on the circumference is not the same as determining it by choosing a random distance from the center and then a random orientation.) The paradox of the great circle was also discussed by Kolmogorov and is now sometimes called the Borel–Kolmogorov paradox.

2.3.1 The paradox of the three jewelry boxes. This paradox, laid out by Bertrand (1889, pages 2–3), involves three identical jewelry boxes, each with two drawers. Box A has gold medals in both drawers, box B has silver medals in both, and box C has a gold medal in one and a silver medal in the other. Suppose we choose a box at random. It will be box C with probability 1/3. Now suppose we open at random one of the drawers in the box we have chosen. There are two possibilities for what we find:

• We find a gold medal. In this case, only two possibilities remain: the other drawer has a gold medal (we have chosen box A) or the other drawer has a silver medal (we have chosen box C).

• We find a silver medal. Here also, only two possibilities remain: the other drawer has a gold medal (we have chosen box C) or the other drawer has a silver medal (we have chosen box B).

Either way, it seems, there are now two cases, one of which is that we have chosen box C. So the probability that we have chosen box C is now 1/2.

Bertrand himself did not accept the conclusion that opening the drawer would change the probability of having box C from 1/3 to 1/2, and Poincaré (1912, pages 26–27) gave an explanation: Suppose the drawers in each box are labeled (where we cannot see) α and β, and suppose the gold medal in box C is in drawer α. Then there are six equally likely cases for the drawer we open:

1. Box A, drawer α: gold medal.
2. Box A, drawer β: gold medal.
3. Box B, drawer α: silver medal.
4. Box B, drawer β: silver medal.
5. Box C, drawer α: gold medal.
6. Box C, drawer β: silver medal.

When we find a gold medal, say, in the drawer we have opened, three of these cases remain possible: case 1, case 2 and case 5. Of the three, only one favors our having our hands on box C, so the probability for box C is still 1/3.
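Poincaré’s count can be checked mechanically. The following is a minimal sketch, not part of the original discussion: it lists the six equally likely cases and conditions on the medal found. The names `cases` and `prob_box_given_medal` are ours, chosen for illustration.

```python
from fractions import Fraction

# Poincaré's six equally likely cases: (box, drawer opened, medal found).
cases = [
    ("A", "alpha", "gold"),
    ("A", "beta", "gold"),
    ("B", "alpha", "silver"),
    ("B", "beta", "silver"),
    ("C", "alpha", "gold"),
    ("C", "beta", "silver"),
]

def prob_box_given_medal(box, medal):
    """Share of `box` among the equally likely cases consistent with `medal`."""
    remaining = [c for c in cases if c[2] == medal]
    favorable = [c for c in remaining if c[0] == box]
    return Fraction(len(favorable), len(remaining))

print(prob_box_given_medal("C", "gold"))    # 1/3
print(prob_box_given_medal("C", "silver"))  # 1/3
```

The paradoxical value 1/2 arises only if one conditions on the coarser list of boxes rather than on the drawers actually opened.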

2.3.2 The paradox of the great circle. Bertrand (1889, pages 6–7) begins with a simple question: if we choose at random two points on the surface of a sphere, what is the probability that the distance between them is less than 10′?

By symmetry, we can suppose that the first point is known. So one way to answer the question is to calculate the proportion of a sphere’s surface that lies within 10′ of a given point. This is 2.1 × 10^−6.

Bertrand also found a different answer. After fixing the first point, he said, we can also assume that we know the great circle that connects the two points, because the possible chances are the same on great circles through the first point. There are 360 degrees (2160 arcs of 10′ each) in this great circle. Only the points in the two neighboring arcs are within 10′ of the first point, and so the probability sought is 2/2160, or 9.3 × 10^−4. This is many times larger than the probability found by the first method. Bertrand considered both answers equally valid, the original question being ill-posed. The concept of choosing points at random on a sphere was not, he said, sufficiently precise.
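Both figures are easy to reproduce. The sketch below assumes the standard formula for the area of a spherical cap, which gives the fraction (1 − cos θ)/2 of the sphere within angular distance θ of a point; the variable names are ours.

```python
import math

theta = math.radians(10 / 60)  # 10 minutes of arc, in radians

# First method: fraction of the sphere's surface within angular distance
# theta of a fixed point. (1 - cos(theta)) / 2 equals sin(theta / 2) ** 2,
# which is numerically better behaved for small angles.
p_area = math.sin(theta / 2) ** 2

# Second method: two arcs of 10' out of the 2160 such arcs on a great circle.
p_arc = 2 / 2160

print(f"first method:  {p_area:.1e}")
print(f"second method: {p_arc:.1e}")
print(f"ratio: about {p_arc / p_area:.0f}")
```

The ratio of the two answers is of the order of several hundred, which is what "many times larger" amounts to here.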

In his own probability textbook Borel (1909b, pages 100–104) explained that Bertrand was mistaken. Bertrand’s first method, based on the assumption that equal areas on the sphere have equal chances of containing the second point, is correct. His second method, based on the assumption that equal arcs on a great circle have equal chances of containing it, is incorrect. Writing M and M′ for the two points to be chosen at random on the sphere, Borel explained Bertrand’s mistake as follows:

. . . The error begins when, after fixing the point M and the great circle, one assumes that the probability of M′ being on a given arc of the great circle is proportional to the length of that arc. If the arcs have no width, then in order to speak rigorously, we must assign the value zero to the probability that M and M′ are on the circle. In order to avoid this factor of zero, which makes any calculation impossible, one must consider a thin bundle of great circles all going through M, and then it is obvious that there is a greater probability for M′ to be situated in a vicinity 90 degrees from M than in the vicinity of M itself (Fig. 13).

To give this argument practical content, Borel discussed how one might measure the longitude of a point on the surface of the earth. If we use astronomical observations, then we are measuring an angle, and errors in the measurement of the angle correspond to wider distances on the ground at the equator than at the poles. If we instead use geodesic measurements, say with a line of markers on each of many meridians, then to keep the markers out of each other’s way, we must make them thinner and thinner as we approach the poles.
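Borel’s thin-bundle argument can be written out in spherical coordinates. The derivation below is a sketch in our own notation (θ for the angular distance from M, φ for longitude, δ for the angular width of the bundle), not Borel’s; it assumes the uniform area distribution that underlies Bertrand’s first method.

```latex
% Uniform distribution on the unit sphere, with M placed at the pole:
\[
  dP = \frac{\sin\theta \, d\theta \, d\varphi}{4\pi},
  \qquad 0 \le \theta \le \pi, \quad 0 \le \varphi < 2\pi .
\]
% A thin bundle of great circles through M is a double wedge of
% longitudes, [\varphi_0, \varphi_0 + \delta] together with
% [\varphi_0 + \pi, \varphi_0 + \pi + \delta]. Its probability is
\[
  P(\text{bundle}) = \frac{2\delta}{4\pi} \int_0^{\pi} \sin\theta \, d\theta
                   = \frac{4\delta}{4\pi} .
\]
% Conditioning on the bundle, the width \delta cancels:
\[
  P\bigl(\theta \in d\theta \mid \text{bundle}\bigr)
  = \frac{2\delta \sin\theta \, d\theta / 4\pi}{4\delta / 4\pi}
  = \tfrac{1}{2} \sin\theta \, d\theta ,
\]
% so the conditional density is largest at \theta = 90^\circ and
\[
  P\bigl(\theta < \theta_0 \mid \text{bundle}\bigr)
  = \tfrac{1}{2} \bigl( 1 - \cos\theta_0 \bigr),
\]
% which is the answer given by the first method.
```

Because δ cancels before any limit is taken, letting the bundle shrink to a single great circle leaves the density proportional to sin θ rather than uniform in arc length.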

2.3.3 Appraisal. Poincaré, Borel and others who understood the principles of the classical theory were able to resolve the paradoxes that Bertrand contrived. Two principles emerge from the resolutions they offered:

• The equally likely cases must be detailed enough to represent new information (e.g., we find a gold medal) in all relevant detail. The remaining equally likely cases will then remain equally likely.

• We may need to consider the real observed event of nonzero probability that is represented in an idealized way by an event of zero probability (e.g., a randomly chosen point falls on a particular meridian). We should pass to the limit only after absorbing the new information.

Not everyone found it easy to apply these principles, however, and the confusion surrounding the paradoxes was another source of dissatisfaction with the classical theory.

FIG. 1. Borel’s Figure 13.

3. MEASURE-THEORETIC PROBABILITY BEFORE THE GRUNDBEGRIFFE

A discussion of the relationship between measure and probability in the first decades of the twentieth century must navigate many pitfalls, because measure theory itself evolved, beginning as a theory about the measurability of sets of real numbers and then becoming more general and abstract. Probability theory followed along, but since the meaning of measure was changing, we can easily misunderstand things said at the time about the relationship between the two theories.

The development of theories of measure and integration during the late nineteenth and early twentieth centuries has been studied extensively (Hawkins, 1975; Pier, 1994a). Here we offer only a bare-bones sketch, beginning with Borel and Lebesgue, and touching on those steps that proved most significant for the foundations of probability. We discuss the work of Carathéodory, Radon, Fréchet and Nikodym, who made measure primary and integral secondary, as well as the contrasting approach of Daniell, who took integration to be basic, and Wiener, who applied Daniell’s methods to Brownian motion. Then we discuss Borel’s strong law of large numbers, which focused attention on measure rather than on integration. After looking at Steinhaus’ axiomatization of Borel’s denumerable probability, we turn to Kolmogorov’s use of measure theory in probability in the 1920s.

3.1 Measure Theory from Borel to Fréchet

Émile Borel is considered the founder of measure theory. Whereas Peano and Jordan had extended the concept of length from intervals to a larger class of sets of real numbers by approximating the sets inside and outside with finite unions of intervals, Borel used countable unions. His motivation came from complex analysis. In his doctoral dissertation Borel (1895) studied certain series that were known to diverge on a dense set of points on a closed curve and hence, it was thought, could not be continued analytically into the region bounded by the curve. Roughly speaking, Borel discovered that the set of points where divergence occurred, although dense, can be covered by a countable number of intervals with arbitrarily small total length. Elsewhere on the curve (almost everywhere, we would say now) the series does converge and so analytic continuation is possible (Hawkins, 1975, Section 4.2). This discovery led Borel to a new theory of measurability for subsets of [0,1] (Borel, 1898).
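The idea that a dense countable set can be covered by intervals of arbitrarily small total length is easy to illustrate. The sketch below uses the textbook example of the rationals in [0,1], not the divergence set from Borel’s dissertation, and gives the k-th point an interval of length ε/2^(k+1); the function names and the cutoff on denominators are ours.

```python
from fractions import Fraction
from math import gcd

def rationals_in_unit_interval(max_denominator):
    """Rationals p/q in [0, 1], in lowest terms, with q <= max_denominator."""
    yield Fraction(0)
    yield Fraction(1)
    for q in range(2, max_denominator + 1):
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)

def cover(points, epsilon):
    """Centre an open interval of length epsilon / 2**(k+1) on the k-th point."""
    intervals = []
    for k, x in enumerate(points):
        half = epsilon / 2 ** (k + 2)
        intervals.append((x - half, x + half))
    return intervals

epsilon = Fraction(1, 100)
points = list(rationals_in_unit_interval(12))
intervals = cover(points, epsilon)
total_length = sum(b - a for a, b in intervals)

print(len(points), "points covered; total length below epsilon:", total_length < epsilon)
```

However many points of the enumeration are included, the total length stays below ε, because the lengths form a geometric series with sum at most ε.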

Borel’s innovation was quickly seized upon by Henri Lebesgue, who made it the basis for his powerful theory of integration (Lebesgue, 1901). We now speak of Lebesgue measure on the real numbers R and on the n-dimensional space R^n, and of the Lebesgue integral in these spaces. We need not review Lebesgue’s theory, but we should mention one theorem, the precursor of the Radon–Nikodym theorem: any countably additive and absolutely continuous set function on the real numbers is an indefinite integral. This result first appeared in Lebesgue (1904) (see Hawkins, 1975, page 145; Pier, 1994a, page 524). Lebesgue generalized it to R^n in 1910 (Hawkins, 1975, page 186).

Wacław Sierpinski (1918) gave an axiomatic treatment of Lebesgue measure. In this note, important to us because of the use Hugo Steinhaus later made of it, Sierpinski characterized the class of Lebesgue measurable sets as the smallest class K of sets that satisfy the following conditions:

I. For every set E in K, there is a nonnegative number µ(E) that will be its measure and will satisfy conditions II, III, IV and V.

II. Every finite closed interval is in K and has its length as its measure.

III. The class K is closed under finite and countable unions of disjoint elements, and µ is finitely and countably additive.

IV. If E1 ⊃ E2, and E1 and E2 are in K, then E1 \ E2 is in K.

V. If E is in K and µ(E) = 0, then any subset of E is in K.

An arbitrary class K that satisfies these conditions is not necessarily a field; there is no requirement that the intersection of two of K’s elements also be in K.

Lebesgue’s measure theory was first made abstract by Johann Radon (1913). Radon unified Lebesgue and Stieltjes integration by generalizing integration with respect to Lebesgue measure to integration with respect to any countably additive set function on the Borel sets in R^n. The generalization included a version of the theorem of Lebesgue we just mentioned: if a countably additive set function g on R^n is absolutely continuous with respect to another countably additive set function f, then g is an indefinite integral with respect to f (Hawkins, 1975, page 189).

Constantin Carathéodory was also influential in drawing attention to measures on Euclidean spaces other than Lebesgue measure. Carathéodory (1914) gave axioms for outer measure in a q-dimensional space, derived the notion of measure and applied these ideas not only to Lebesgue measure on Euclidean spaces, but also to lower dimensional measures on Euclidean space which assign lengths to curves, areas to surfaces and so forth (Hochkirchen, 1999). Carathéodory also recast Lebesgue’s theory of integration to make measure even more fundamental; in his textbook (Carathéodory, 1918) on real functions, he defined the integral of a positive function on a subset of R^n as the (n+1)-dimensional measure of the region between the subset and the function’s graph (Bourbaki, 1994, page 228).

It was Fréchet who first went beyond Euclidean space. Fréchet (1915a, b) observed that much of Radon’s reasoning does not depend on the assumption that one is working in R^n. One can reason in the same way in a much larger space, such as a space of functions. Any space will do, so long as the countably additive set function is defined on a σ-field of its subsets, as Radon had required. Fréchet did not, however, manage to generalize Radon’s theorem on absolute continuity to the fully abstract framework. This generalization, now called the Radon–Nikodym theorem, was obtained by Otton Nikodym fifteen years later (Nikodym, 1930).

Did Fréchet himself have probability in mind when he proposed a calculus that allows integration over function space? Probably so. An integral is a mean value. In a Euclidean space this might be a mean value with respect to a distribution of mass or electrical charge, but we cannot distribute mass or charge over a space of functions. The only thing we can imagine distributing over such a space is probability or frequency. However, Fréchet thought of probability as an application of mathematics, not as a branch of pure mathematics itself, so he did not think he was axiomatizing probability theory.

It was Kolmogorov who first called Fréchet’s theory a foundation for probability theory. He put the matter this way in the preface to the Grundbegriffe:

. . . After Lebesgue’s investigations, the analogy between the measure of a set and the probability of an event, as well as between the integral of a function and the mathematical expectation of a random variable, was clear. This analogy could be extended further; for example, many properties of independent random variables are completely analogous to corresponding properties of orthogonal functions. But in order to base probability theory on this analogy, one still needed to liberate the theory of measure and integration from the geometric elements still in the foreground with Lebesgue. This liberation was accomplished by Fréchet.

It should not be inferred from this passage that Fréchet and Kolmogorov used “measure” in the way we do today. Fréchet may have liberated measure and integration from their geometric roots, but Fréchet and Kolmogorov continued to reserve the word measure for geometric settings. Throughout the 1930s, what we now call a measure, they called an additive set function. The usage to which we are now accustomed became standard only after the Second World War.

3.2 Daniell’s Integral and Wiener’s Differential Space

Percy Daniell, an Englishman working at the Rice Institute in Houston, Texas, introduced his integral in a series of articles (Daniell, 1918, 1919a, b, 1920) in the Annals of Mathematics.

Like Fréchet, Daniell considered an abstract set E, but instead of beginning with an additive set function on subsets of E, he began with what he called an integral on E: a linear operator on some class T0 of real-valued functions on E. The class T0 might consist of all continuous functions (if E is endowed with a topology) or perhaps all step functions. Applying Lebesgue’s methods in this general setting, Daniell extended the linear operator to a wider class T1 of functions on E, the summable functions. In this way, the Riemann integral is extended to the Lebesgue integral, the Stieltjes integral is extended to the Radon integral and so on (Daniell, 1918). Using ideas from Fréchet’s dissertation, Daniell also gave examples in infinite-dimensional spaces (Daniell, 1919a, b). Daniell (1921) even used his theory of integration to construct a theory of Brownian motion. However, he did not succeed in gaining recognition for this last contribution; it seems to have been completely ignored until Stephen Stigler spotted it in the 1970s (Stigler, 1973).

The American ex-child prodigy and polymath Norbert Wiener, when he came upon Daniell’s 1918 and July 1919 articles (Daniell, 1918, 1919a), was in a better position than Daniell himself to appreciate and advertise their remarkable potential for probability (Wiener, 1956; Masani, 1990). Having studied philosophy as well as mathematics, Wiener was well aware of the intellectual significance of Brownian motion and of Einstein’s mathematical model for it.

In November 1919, Wiener submitted his first article (Wiener, 1920) on Daniell’s integral to the Annals of Mathematics, the journal where Daniell’s four articles on it had appeared. This article did not yet discuss Brownian motion; it merely laid out a general method for setting up a Daniell integral when the underlying space E is a function space. However, by August 1920, Wiener was in France to explain his ideas on Brownian motion to Fréchet and Lévy (Segal, 1992, page 397). He followed up with a series of articles (Wiener, 1921a, b), including a later much celebrated article on “differential-space” (Wiener, 1923).

Wiener’s basic idea was simple. Suppose we want to formalize the notion of Brownian motion for a finite time interval, say 0 ≤ t ≤ 1. A realized path is a function on [0,1]. We want to define mean values for certain functionals (real-valued functions of the realized path). To set up a Daniell integral that gives these mean values, Wiener took T0 to consist of functionals that depend only on the path’s values at a finite number of time points. One can find the mean value of such a functional using Gaussian probabilities for the changes from each time point to the next. Extending this integral by Daniell’s method, he succeeded in defining mean values for a wide class of functionals. In particular, he obtained probabilities (mean values for indicator functions) for certain sets of paths. He showed that the set of continuous paths has probability 1, while the set of differentiable paths has probability 0.
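The mean value of a functional in T0 can be approximated numerically. The sketch below is a Monte Carlo illustration rather than Wiener’s construction: it assumes a path that starts at 0 and whose change over an interval is Gaussian with mean 0 and variance equal to the elapsed time, and it estimates the mean of the functional x(s)·x(t), which under these assumptions is min(s, t). All function names are ours.

```python
import random

def sample_path_values(times, rng):
    """Path values at increasing times, built from independent Gaussian
    changes whose variance equals the elapsed time (path starts at 0)."""
    values, x, prev = [], 0.0, 0.0
    for t in times:
        x += rng.gauss(0.0, (t - prev) ** 0.5)
        values.append(x)
        prev = t
    return values

def mean_value(functional, times, n_samples=50_000, seed=0):
    """Monte Carlo estimate of the mean of a functional that depends only on
    the path's values at the given finite set of time points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += functional(sample_path_values(times, rng))
    return total / n_samples

times = [0.25, 0.75]
estimate = mean_value(lambda v: v[0] * v[1], times)
print(estimate)  # should be close to min(0.25, 0.75) = 0.25
```

Only finitely many time points enter the computation; the extension to functionals of the whole path is exactly what Daniell’s method supplied.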

It is now commonplace to translate this work into Kolmogorov’s measure-theoretic framework. Kiyoshi Itô, for example, in a commentary published along with Wiener’s articles from this period in Volume 1 of Wiener’s collected works (Wiener, 1976–1985, page 515), wrote as follows concerning Wiener’s 1923 article:

Having investigated the differential space from various directions, Wiener defines the Wiener measure as a σ-additive probability measure by means of Daniell’s theory of integral.

It should not be thought, however, that Wiener defined a σ-additive probability measure and then found mean values as integrals with respect to that measure. Rather, as we just explained, he started with mean values and used Daniell’s theory to obtain more. This Daniellian approach to probability, making mean value basic and probability secondary, has long taken a back seat to Kolmogorov’s approach, but it still has its supporters (Haberman, 1996; Whittle, 2000).

3.3 Borel’s Denumerable Probability

Impressive as it was and still is, Wiener’s work played little role in the story leading to Kolmogorov’s Grundbegriffe. The starring role was played instead by Borel.

In retrospect, Borel’s use of measure theory in complex analysis in the 1890s already looks like probabilistic reasoning. Especially striking in this respect is the argument Borel gave for his claim that a Taylor series will usually diverge on the boundary of its circle of convergence (Borel, 1897). In general, he asserted, successive coefficients of the Taylor series, or at least successive groups of coefficients, are independent. He showed that each group of coefficients determines an arc on the circle, that the sum of lengths of the arcs diverges and that the Taylor series will diverge at a point on the circle if it belongs to infinitely many of the arcs. The arcs being independent and the sum of their lengths being infinite, a given point must be in infinitely many of them. To make sense of this argument, we must evidently take “in general” to mean that the coefficients are chosen at random and take “independent” to mean probabilistically independent; the conclusion then follows by what we now call the Borel–Cantelli lemma. Borel himself used probabilistic language when he reviewed this work in 1912 (Borel, 1912; Kahane, 1994). In the 1890s, however, Borel did not see complex analysis as a domain for probability, which is concerned with events in the real world.

In the new century, Borel did begin to explore the implications for probability of his and Lebesgue’s work on measure and integration (Bru, 2001). His first comments came in an article in 1905 (Borel, 1905), where he pointed out that the new theory justified Poincaré’s intuition that a point chosen at random from a line segment would be incommensurable with probability 1 and called attention to Anders Wiman’s (1900, 1901) work on continued fractions, which had been inspired by the question of the stability of planetary motions, as an application of measure theory to probability.

Then, in 1909, Borel published a startling result: his strong law of large numbers (Borel, 1909a). This new result strengthened measure theory’s connection both with geometric probability and with the heart of classical probability theory, the concept of independent trials. Considered as a statement in geometric probability, the law says that the fraction of 1’s in the binary expansion of a real number chosen at random from [0,1] converges to 1/2 with probability 1. Considered as a statement about independent trials (we may use the language of coin tossing, though Borel did not), it says that the fraction of heads in a denumerable sequence of independent tosses of a fair coin converges to 1/2 with probability 1. Borel explained the geometric interpretation and he asserted that the result can be established using measure theory (Borel, 1909a, Section I.8). However, he set measure theory aside for philosophical reasons and provided an imperfect proof using denumerable versions of the rules of total and compound probability. It was left to others, most immediately Faber (1910, page 400) and Hausdorff (1914), to give rigorous measure-theoretic proofs (Doob, 1989, 1994; von Plato, 1994).
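A finite simulation cannot establish convergence with probability 1, but it shows the behavior the strong law describes. The sketch below generates the first n binary digits of a random number as independent fair coin tosses and reports the fraction of 1’s; the function name and the seed are arbitrary choices of ours.

```python
import random

def fraction_of_ones(n_digits, rng):
    """Fraction of 1's among n_digits binary digits, each digit being an
    independent fair coin toss."""
    ones = sum(rng.getrandbits(1) for _ in range(n_digits))
    return ones / n_digits

rng = random.Random(1909)
for n in (10, 1_000, 100_000):
    print(n, fraction_of_ones(n, rng))
```

The printed fractions typically settle near 1/2 as n grows, which is the pattern Borel’s theorem asserts for almost every real number in [0,1].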

Borel’s discomfort with a measure-theoretic treatment can be attributed to his unwillingness to assume countable additivity for probability (Barone and Novikoff, 1978; von Plato, 1994). He saw no logical absurdity in a countably infinite number of zero probabilities adding to a nonzero probability, and so instead of general appeals to countable additivity he preferred arguments that derive probabilities as limits as the number of trials increases (Borel, 1909a, Section I.4). Such arguments seemed to him stronger than formal appeals to countable additivity, because they exhibit the finitary pictures that are idealized by the infinitary pictures. He saw even more fundamental problems in the idea that Lebesgue measure can model a random choice (von Plato, 1994, pages 36–56; Knobloch, 2001). How can we choose a real number at random when most real numbers are not even definable in any constructive sense?

Although Hausdorff did not hesitate to equate Lebesgue measure with probability, his account of Borel’s strong law, in his Grundzüge der Mengenlehre (Hausdorff, 1914, pages 419–421), treated it as a theorem about real numbers: the set of numbers in [0,1] with binary expansions for which the proportion of 1’s converges to 1/2 has Lebesgue measure 1. Later, Francesco Paolo Cantelli (1916a, b, 1917) rediscovered the strong law (he neglected, in any case, to cite Borel) and extended it to the more general result that the average of bounded random variables will converge to their mean with arbitrarily high probability. Cantelli’s work inspired other authors to study the strong law and to sort out different concepts of probabilistic convergence.

By the early 1920s, it seemed to some that there were two different versions of Borel’s strong law: one concerned with real numbers and one concerned with probability. Hugo Steinhaus (1923) proposed to clarify matters by axiomatizing Borel’s theory of denumerable probability along the lines of Sierpinski’s axiomatization of Lebesgue measure. Writing A for the set of all infinite sequences of ρ’s and η’s (ρ for “rouge” and η for “noir”; now we are playing red or black rather than heads or tails), Steinhaus proposed the following axioms for a class K of subsets of A and a real-valued function µ that gives probabilities for the elements of K:

I. µ(E) ≥ 0 for all E ∈ K.

II. 1. For any finite sequence e of ρ’s and η’s, the subset E of A consisting of all infinite sequences that begin with e is in K.
    2. If two such sequences e1 and e2 differ in only one place, then µ(E1) = µ(E2), where E1 and E2 are the corresponding sets.
    3. µ(A) = 1.

III. K is closed under finite and countable unions of disjoint elements, and µ is finitely and countably additive.

IV. If E1 ⊃ E2, and E1 and E2 are in K, then E1 \ E2 is in K.

V. If E is in K and µ(E) = 0, then any subset of E is in K.

Sierpinski’s axioms for Lebesgue measure consisted of I, III, IV and V, together with an axiom that says that the measure µ(J) of an interval J is its length. This last axiom being demonstrably equivalent to Steinhaus’ axiom II, Steinhaus concluded that the theory of probability for an infinite sequence of binary trials is isomorphic with the theory of Lebesgue measure.

To show that his axiom II is equivalent to setting the measures of intervals equal to their length, Steinhaus used the Rademacher functions, the nth Rademacher function being the function that assigns a real number the value 1 or −1 depending on whether the nth digit in its dyadic expansion is 0 or 1. He also used these functions, which are independent random variables, in deriving Borel’s strong law and related results. The work by Rademacher (1922) and Steinhaus marked the beginning of the Polish school of “independent functions,” which made important contributions to probability theory during the period between the wars (Holgate, 1997).
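The independence of the Rademacher functions under Lebesgue measure can be verified exactly on a dyadic grid. The sketch below follows the sign convention stated above (+1 for digit 0, −1 for digit 1); because the first N functions are constant on each dyadic interval of length 2^−N, averaging over interval midpoints gives exact integrals. The helper names are ours.

```python
from fractions import Fraction
from itertools import product

def rademacher(n, x):
    """n-th Rademacher function: +1 if the n-th binary digit of x is 0, -1 if it is 1."""
    digit = int(x * 2 ** n) % 2
    return 1 if digit == 0 else -1

N = 4
# Midpoints of the 2**N dyadic intervals of length 2**-N in [0, 1).
grid = [Fraction(2 * j + 1, 2 ** (N + 1)) for j in range(2 ** N)]

def lebesgue_mean(f):
    """Exact integral over [0, 1) of a function constant on those intervals."""
    return sum(Fraction(f(x)) for x in grid) / len(grid)

# Each function has mean 0, and distinct functions are orthogonal.
print(lebesgue_mean(lambda x: rademacher(1, x)))
print(lebesgue_mean(lambda x: rademacher(2, x) * rademacher(3, x)))

# Independence: every sign pattern of (r1, r2, r3) has measure (1/2)**3.
for signs in product((1, -1), repeat=3):
    measure = lebesgue_mean(
        lambda x: all(rademacher(n + 1, x) == s for n, s in enumerate(signs))
    )
    print(signs, measure)
```

Each of the eight sign patterns receives measure 1/8, which is the sense in which the digits of a point chosen at random from [0,1] behave like independent binary trials.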

3.4 Kolmogorov Enters the Stage

Although Steinhaus considered only binary trials in 1923, his reference to Borel’s more general concept of denumerable probability pointed to generalizations. We find such a generalization in Kolmogorov’s first article on probability, co-authored by Khinchin (Khinchin and Kolmogorov, 1925), which showed that a series of discrete random variables y1 + y2 + ··· will converge with probability 1 when the series of means and the series of variances both converge. The first section of the article, due to Khinchin, spells out how to represent the random variables as functions on [0,1]: divide the interval into segments with lengths equal to the probabilities for y1’s possible values, then divide each of these segments into smaller segments with lengths proportional to the probabilities for y2’s possible values and so on. This, Khinchin noted with a nod to Rademacher and Steinhaus, reduces the problem to a problem about Lebesgue measure. This reduction was useful because the rules for working with Lebesgue measure were clear, while Borel’s picture of denumerable probability remained murky.
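Khinchin’s subdivision of [0,1] can be carried out explicitly for the first two variables. The sketch below is an illustration under assumptions of ours: the two distributions are invented numbers, and y2 is taken to have the same distribution on every segment, as is the case for independent variables.

```python
from fractions import Fraction

def subdivide(segments, distribution):
    """Split each segment (left, right, values) into pieces whose lengths are
    proportional to the probabilities in `distribution`."""
    refined = []
    for left, right, values in segments:
        start = left
        for value, prob in distribution:
            end = start + (right - left) * prob
            refined.append((start, end, values + (value,)))
            start = end
    return refined

# Hypothetical distributions for y1 and y2 (illustrative numbers only).
dist_y1 = [(-1, Fraction(1, 4)), (2, Fraction(3, 4))]
dist_y2 = [(0, Fraction(1, 3)), (5, Fraction(2, 3))]

segments = [(Fraction(0), Fraction(1), ())]
for dist in (dist_y1, dist_y2):
    segments = subdivide(segments, dist)

def y(k, x):
    """Value of y_k (k = 1 or 2), regarded as a function of the point x in [0, 1)."""
    for left, right, values in segments:
        if left <= x < right:
            return values[k - 1]
    raise ValueError("x must lie in [0, 1)")

def length_where(condition):
    """Total length of the segments on which (y1, y2) satisfies `condition`."""
    return sum(right - left for left, right, values in segments if condition(values))

print(y(1, Fraction(1, 10)), y(2, Fraction(1, 10)))
print(length_where(lambda v: v == (2, 5)))  # (3/4) * (2/3) = 1/2
```

The length of the set where y1 and y2 take given values equals the product of the two probabilities, so statements about the variables become statements about Lebesgue measure on [0,1].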

Dissatisfaction with this detour into Lebesgue measure must have been one impetus for the Grundbegriffe (Doob, 1989, page 818). Kolmogorov made no such detour in his next article on the convergence of sums of independent random variables. In this sole-authored article (Kolmogorov, 1928), he took probabilities and expected values as his starting point, but even then he did not appeal to Fréchet’s countably additive calculus. Instead, he worked with finite additivity and then stated an explicit ad hoc definition when he passed to a limit. For example, he defined the probability P that the series $\sum_{n=1}^{\infty} y_n$ converges by the equation

\[
  P = \lim_{\eta \to 0} \, \lim_{n \to \infty} \, \lim_{N \to \infty}
      W\left[ \operatorname{Max} \left| \sum_{k=n}^{p} y_k \right|_{p=n}^{N} < \eta \right],
\]

where W(E) denotes the probability of the event E. [This formula does not appear in the Russian (Kolmogorov, 1986) and English (Kolmogorov, 1992) translations provided in Kolmogorov’s collected works; there the argument has been modernized so as to eliminate it.] This recalls the way Borel proceeded in 1909: think through each passage to the limit.

It was in his seminal article on Markov processes (Kolmogorov, 1931) that Kolmogorov first explicitly and freely used Fréchet’s calculus as his framework for probability. In this article, Kolmogorov considered a system with a set of states A. For any two time points t1 and t2 (t1 < t2), any state x ∈ A and any element E in a collection F of subsets of A, he wrote P(t1, x, t2, E) for the probability, when the system is in state x at time t1, that it will be in a state in E at time t2. Citing Fréchet, Kolmogorov assumed that P is countably additive as a function of E and that F is closed under differences and countable unions, and contains the empty set, all singletons and A. However, the focus was not on Fréchet; it was on the equation that ties together the transition probabilities, now called the Chapman–Kolmogorov equation. The article launched the study of this equation by purely analytical methods, a study that kept probabilists occupied for 50 years.
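For orientation, the equation can be written in the notation of the preceding paragraph. This is the standard modern statement, given here as a sketch; the integral notation is an assumption and is not claimed to reproduce the typography of the 1931 article.

```latex
% Chapman--Kolmogorov equation for intermediate times t_1 < t_2 < t_3,
% any state x in A and any set E in F:
\[
  P(t_1, x, t_3, E) \;=\; \int_{A} P(t_2, y, t_3, E)\, P(t_1, x, t_2, dy).
\]
% When A is finite or countable, the integral reduces to a sum over the
% intermediate state y:
\[
  P(t_1, x, t_3, E) \;=\; \sum_{y \in A} P(t_1, x, t_2, \{y\})\, P(t_2, y, t_3, E).
\]
```

Read in words: to go from x at time t1 into E at time t3, the system must pass through some state y at the intermediate time t2, and the probabilities of the two legs are combined over all such y.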

As many commentators have noted, the 1931 article makes no reference to probabilities for trajectories. There is no suggestion that such probabilities are needed for a stochastic process to be well defined. Consistent transition probabilities, it seems, are enough. Bachelier (1900, 1910, 1912) is cited as the first to study continuous-time stochastic processes, but Wiener is not cited.

4. HILBERT’S SIXTH PROBLEM

At the beginning of the twentieth century, many mathematicians were dissatisfied with what they saw as a lack of clarity and rigor in the probability calculus. The whole calculus seemed to be concerned with concepts that lie outside mathematics: event, trial, randomness, probability. As Henri Poincaré wrote, “one can hardly give a satisfactory definition of probability” (Poincaré, 1912, page 24).

The most celebrated call for clarification came from David Hilbert. The sixth of the twenty-three open problems that Hilbert presented to the International Congress of Mathematicians in Paris in 1900 was to treat axiomatically, after the model of geometry, those parts of physics in which mathematics already played an outstanding role, especially probability and mechanics (Hilbert, 1902; Hochkirchen, 1999). To explain what he meant by axioms for probability, Hilbert cited Georg Bohlmann, who had labeled the rules of total and compound probability axioms rather than theorems in his lectures on the mathematics of life insurance (Bohlmann, 1901). In addition to a logical investigation of these axioms, Hilbert called for a “rigorous and satisfactory development of the method of average values in mathematical physics, especially in the kinetic theory of gases.”

Hilbert’s call for a mathematical treatment of average values was answered in part by the work on integration that we discussed in the preceding section, but his suggestion that the classical rules for probability should be treated as axioms on the model of geometry was an additional challenge. Among the early responses, we may mention the following:

• In his Zürich dissertation, Rudolf Laemmel (1904) discussed the rules of total and compound probability as axioms, but he stated the rule of compound probability only in the case of independence, a concept he did not explicate. (For excerpts, see Schneider, 1988, pages 359–366.)

• In his Göttingen dissertation, directed by Hilbert himself, Ugo Broggi (1907) gave only two axioms: an axiom stating that the sure event has probability 1, and an axiom stating the rule of total probability. Following tradition, he then defined probability as a ratio (a ratio of numbers of cases in the discrete setting; a ratio of the Lebesgue measures of two sets in the geometric setting) and verified his axioms. He did not state an axiom that corresponds to the classical rule of compound probability. Instead, he gave this name to a rule for calculating the probability of a Cartesian product, which he derived from the definition of geometric probability in terms of Lebesgue measure. (For excerpts, see Schneider, 1988, pages 367–377.) Broggi mistakenly claimed that his axiom of total probability (finite additivity) implied countable additivity (Steinhaus, 1923).

• In an article written in 1920, published in 1923 and listed in the bibliography of the Grundbegriffe, Antoni Łomnicki (1923) proposed that probability should always be understood relative to a density φ on a set M in $\mathbb{R}^r$. Łomnicki defined this probability by combining two of Carathéodory’s ideas: the idea of p-dimensional measure and the idea of defining the integral of a function on a set as the measure of the region between the set and the function’s graph (see Section 3.1 above). The probability of a subset m of M, according to Łomnicki, is the ratio of the measure of the region between m and φ’s graph to the measure of the region between M and this graph. If M is an r-dimensional subset of $\mathbb{R}^r$, then the measure being used is Lebesgue measure on $\mathbb{R}^{r+1}$; if M is a lower dimensional subset of $\mathbb{R}^r$, say p-dimensional, then the measure is the (p + 1)-dimensional Carathéodory measure. This definition covers discrete as well as continuous probability: in the discrete case, M is a set of discrete points, the function φ assigns each point its probability, and the region between a subset m and the graph of φ consists of a line segment for each point in m, whose Carathéodory measure is its length (i.e., the point’s probability). The rule of total probability follows. Like Broggi, Łomnicki treated the rule of compound probability as a rule for relating probabilities on a Cartesian product to probabilities on its components. He did not consider it an axiom, because it holds only if the density itself is a product density.

• In an article published in Russian, Sergei Bernstein (1917) showed that probability theory can be founded on qualitative axioms for numerical coefficients that measure the probabilities of propositions. He also developed this idea in his probability textbook (Bernstein, 1927), and Kolmogorov listed both the article and the book in the bibliography of the Grundbegriffe. John Maynard Keynes included Bernstein’s article in the bibliography of his probability book (Keynes, 1921), but Bernstein’s work was subsequently ignored by English-language authors on qualitative probability. It was first summarized in English in Samuel Kotz’s translation of Leonid E. Maistrov’s (1974) history of probability.

We now discuss at greater length responses by von Mises, Slutsky, Kolmogorov and Cantelli.

4.1 von Mises’ Collectives

The concept of a collective was introduced into the German scientific literature by Gustav Fechner’s (1897) Kollektivmasslehre, which appeared ten years after the author’s death. The concept was quickly taken up by Georg Helm (1902) and Heinrich Bruns (1906).

Fechner wrote about the concept of a Kollektivgegenstand (collective object) or a Kollektivreihe (collective series). It was only later, in Meinong (1915) for example, that we see these names abbreviated to Kollektiv. As the name Kollektivreihe indicates, a Kollektiv is a population of individuals given in a certain order; Fechner called the ordering the Urliste. It was supposed to be irregular (random, we would say). Fechner was a practical scientist, not concerned with the theoretical notion of probability, but as Helm and Bruns realized, probability theory provides a framework for studying collectives.

The concept of a collective was developed by Richard von Mises (1919, 1928, 1931). His contribution was to realize that the concept can be made into a mathematical foundation for probability theory. As von Mises defined it, a collective is an infinite sequence of outcomes that satisfies two axioms:

1. The relative frequency of each outcome converges to a real number (the probability of the outcome) as we look at longer and longer initial segments of the sequence.

2. The relative frequency converges to the same probability in any subsequence selected without knowledge of the future (we may use knowledge of the outcomes so far in deciding whether to include the next outcome in the subsequence).

The second property says we cannot change the odds by selecting a subsequence of trials on which to bet; this is von Mises’ version of the “hypothesis of the impossibility of a gambling system,” and it assures the irregularity of the Urliste.
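The role of the second axiom can be illustrated on finite sequences, with the caveat that von Mises’ axioms concern limits of infinite sequences, so a finite computation is only suggestive. In the sketch below the helper names `relative_frequency` and `select_after`, the particular selection rule, and the seeded pseudorandom sequence are all assumptions made for illustration.

```python
import random


def relative_frequency(outcomes, target=1):
    """Fraction of the outcomes equal to target."""
    return sum(1 for x in outcomes if x == target) / len(outcomes)


def select_after(sequence, trigger):
    """Place selection: keep trial n+1 whenever trial n equals trigger.

    Whether a trial is included depends only on the outcome before it,
    so the rule uses no knowledge of the future.
    """
    return [sequence[i + 1]
            for i in range(len(sequence) - 1)
            if sequence[i] == trigger]


# The alternating sequence 0, 1, 0, 1, ... satisfies axiom 1 (frequency 1/2)
# but fails axiom 2: betting only after a 0 always wins.
alternating = [i % 2 for i in range(10_000)]
print(relative_frequency(alternating))                   # → 0.5
print(relative_frequency(select_after(alternating, 0)))  # → 1.0

# A pseudorandom sequence, by contrast, keeps roughly the same frequency
# in the selected subsequence (fixed seed for reproducibility).
rng = random.Random(0)
coin = [rng.randint(0, 1) for _ in range(100_000)]
print(relative_frequency(coin), relative_frequency(select_after(coin, 0)))
```

The alternating sequence shows why the first axiom alone does not capture irregularity: its limiting frequency exists, yet a simple selection rule changes the odds completely.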

According to von Mises, the purpose of the probability calculus is to identify situations where collectives exist and the probabilities in them are known, and to derive probabilities for other collectives from these given probabilities. He pointed to three domains where probabilities for collectives are known: (1) games of chance where devices are carefully constructed so the axioms will be satisfied, (2) statistical phenomena where the two axioms can be confirmed to a reasonable degree, and (3) branches of theoretical physics where the two axioms play the same hypothetical role as other theoretical assumptions (von Mises, 1931, pages 25–27).

von Mises derived the classical rules of probability, such as the rules for adding and multiplying probabilities, from rules for constructing new collectives from an initial one. He had several laws of large numbers. The simplest was his definition of probability: the probability of an event is the event’s limiting frequency in a collective. Others arose as one constructed further collectives.

The ideas of von Mises were taken up by a number of mathematicians in the 1920s and 1930s. Kolmogorov’s bibliography includes an article by Arthur Copeland (1932) that proposed founding probability theory on particular rules for selecting subsequences in von Mises’ scheme, as well as articles by Karl Dörge (1930), Hans Reichenbach (1932) and Erhard Tornier (1933) that argued for related schemes. But the most prominent mathematicians of the time, including the Göttingen mathematicians (Mac Lane, 1995), the French probabilists and the British statisticians, were hostile or indifferent.

Collectives were given a rigorous mathematical basis by Abraham Wald (1938) and Alonzo Church (1940), but the claim that they provide a foundation for probability was refuted by Jean Ville (1939). Ville pointed out that whereas a collective in von Mises’ sense will not be vulnerable to a gambling system that chooses a subsequence of trials on which to bet, it may still be vulnerable to a more clever gambling system, which also varies the amount of the bet and the outcome on which to bet.

4.2 Slutsky’s Calculus of Valences and Kolmogorov’s General Theory of Measure

In an article published in Russian, Evgeny Slutsky (1922) presented a viewpoint that greatly influenced Kolmogorov. As Kolmogorov (1948) said in an obituary for Slutsky, Slutsky was “the first to give the right picture of the purely mathematical content of probability theory.”

How do we make probability purely mathematical? Markov had claimed to do this in his textbook, but Slutsky did not think Markov had succeeded, because Markov had retained the subjective notion of equipossibility. The solution, Slutsky felt, was to remove both the word “probability” and the notion of equally likely cases from the theory. Instead of beginning with equally likely cases, one should begin by assuming merely that numbers are assigned to cases and that when a case assigned the number α is further subdivided, the numbers assigned to the subcases should add to α. The numbers assigned to cases might be equal or they might not. The addition and multiplication theorems would be theorems in this abstract calculus, but it should not be called the probability calculus. In place of “probability,” he suggested the unfamiliar word valentnost’, or “valence.” (Laemmel had earlier used the German valenz.) Probability would be only one interpretation of the calculus of valences, a calculus fully as abstract as group theory.

Slutsky listed three distinct interpretations of the calculus of valences:

1. Classical probability (equally likely cases).

2. Finite empirical sequences (frequencies).

3. Limits of relative frequencies. (Slutsky remarked that this interpretation is particularly popular with the English school.)

Slutsky did not think probability could be reduced to limiting frequency, because sequences of independent trials have properties that go beyond their possessing limiting frequencies. Initial segments of the sequences have properties that are not imposed by the eventual convergence of the frequency, and the sequences must be irregular in a way that resists the kind of selection discussed by von Mises.

Slutsky’s idea that probability could be an instance of a broader abstract theory was taken up by Kolmogorov in a thought piece in Russian (Kolmogorov, 1929), before his forthright use of Fréchet’s theory in his article on Markov processes in 1930 (Kolmogorov, 1931). Whereas Slutsky had mentioned frequencies as an alternative interpretation of a general calculus, Kolmogorov pointed to more mathematical examples: the distribution of digits in the decimal expansions of irrationals, Lebesgue measure in an n-dimensional cube and the density of a set A of positive integers (the limit as n → ∞ of the fraction of the integers between 1 and n that are in A).

The abstract theory Kolmogorov sketched is concerned with a function M that assigns a nonnegative number M(E) to each element E of a class of subsets of a set A. He called M(E) the measure (mera) of E and he called M a measure specification (meroopredelenie). So as to accommodate all the mathematical examples he had in mind, he assumed, in general, neither that M is countably additive nor that the class of subsets to which it assigns numbers is a field. Instead, he assumed only that when E1 and E2 are disjoint and M assigns a number to two of the three sets E1, E2 and E1 ∪ E2, it also assigns a number to the third, and that

M(E1 ∪ E2) = M(E1) + M(E2)

then holds (cf. Steinhaus’ axioms III and IV). In the case of probability, however, he did suggest (using different words) that M should be countably additive and that the class of subsets to which it assigns numbers should be a field, for only then can we uniquely define probabilities for countable unions and intersections, and this seems necessary to justify arguments involving events such as the convergence of random variables.

He defined the abstract Lebesgue integral of a function f on A, and he commented that countable additivity is to be assumed whenever such an integral is discussed. He wrote $M_{E_1}(E_2) = M(E_1 E_2)/M(E_1)$ “by analogy with the usual concept of relative probability.” He defined independence for partitions, and he commented, no doubt in reference to Borel’s strong law and other results in number theory, that the notion of independence is responsible for the power of probabilistic methods within pure mathematics.

The mathematical core of the Grundbegriffe is already here. Many years later, in his commentary in Volume II of his collected works (Kolmogorov, 1992, page 520), Kolmogorov said that only the set-theoretic treatment of conditional probability and the theory of distributions in infinite products were missing. Also missing, though, is the bold rhetorical move that Kolmogorov made in the Grundbegriffe: giving the abstract theory the name probability.

4.3 The Axioms of Steinhaus and Ulam

In the 1920s and 1930s, the city of Lwów in Poland was a vigorous center of mathematical research, led by Hugo Steinhaus. (Though it was in Poland between the two World Wars, Lwów is now in Ukraine. Its name is spelled differently in different languages: Lwów in Polish, Lviv in Ukrainian and Lvov in Russian. When part of Austria–Hungary and, briefly, Germany, it was Lemberg. Some articles in our bibliography refer to it as Léopol.) In 1929, Steinhaus’ work on limit theorems intersected with Kolmogorov’s, and his approach promoted the idea that probability should be axiomatized in the style of measure theory.

As we saw in Section 3.3, Steinhaus had already, in 1923, formulated axioms for heads and tails isomorphic to Sierpiński’s axioms for Lebesgue measure. This isomorphism had more than a philosophical purpose; Steinhaus used it to prove Borel’s strong law. In a pair of articles written in 1929 and published in 1930 (Steinhaus, 1930a, b), Steinhaus extended his approach to limit theorems that involved an infinite sequence of independent draws θ1, θ2, . . . from the interval [0,1]. His axioms for this case were the same as for the binary case (Steinhaus, 1930b, pages 22–23), except that the second axiom, which determines probabilities for initial finite sequences of heads and tails, was replaced by an axiom that determines probabilities for initial finite sequences θ1, θ2, . . . , θn:

The probability that θi ∈ Θi for i = 1, . . . , n, where the Θi are measurable subsets of [0,1], is

|Θ1| · |Θ2| · · · |Θn|,

where |Θi| is the Lebesgue measure of Θi.

Steinhaus presented his axioms as a “logical extrapolation” of the classical axioms to the case of an infinite number of trials (Steinhaus, 1930b, page 23). They were more or less tacitly used, he asserted, in all classical problems, such as the problem of the gambler’s ruin, where the game as a whole, not merely finitely many rounds, must be considered (Steinhaus, 1930a, page 409). As in the case of heads and tails, Steinhaus showed that there are probabilities that uniquely satisfy his axioms by setting up an isomorphism with Lebesgue measure on [0,1], this time using a sort of Peano curve to map $[0,1]^{\infty}$ onto [0,1]. He used the isomorphism to prove several limit theorems, including one that formalized Borel’s 1897 claim concerning the circle of convergence of a Taylor series with randomly chosen coefficients.

Steinhaus’ axioms were measure-theoretic, but they were not yet abstract. His words suggested that his ideas should apply to all sequences of random variables, not merely ones uniformly distributed, and he even considered the case where the variables were complex-valued rather than real-valued, but he did not step outside the geometric context to consider probability on abstract spaces. This step was taken by Stanisław Ulam, one of Steinhaus’ junior colleagues at Lwów. At the International Congress of Mathematicians in Zürich in 1932, Ulam announced that he and another Lwów mathematician, Zbigniew Łomnicki (a nephew of Antoni Łomnicki), had shown that product measures can be constructed in abstract spaces (Ulam, 1932).

Ulam and Łomnicki’s axioms for a measure m were simple. We can put them in today’s language by saying that m is a probability measure on a σ-algebra that is complete (includes all null sets) and contains all singletons. Ulam announced that from a countable sequence of spaces with such probability measures, one can construct a probability measure that satisfies the same conditions on the product space.

We do not know whether Kolmogorov knew about Ulam’s announcement when he wrote the Grundbegriffe. Ulam’s axioms would have held no novelty for him, but he would presumably have found the result on product measures interesting. When their article finally appeared, Łomnicki and Ulam (1934) listed the same axioms as Ulam’s announcement had, but now cited the Grundbegriffe as authority for them. Kolmogorov (1935) cited their article in turn in a short list of introductory literature in mathematical probability.

4.4 Cantelli’s Abstract Theory

Like Borel, Castelnuovo and Fréchet, Francesco Paolo Cantelli turned to probability after distinguishing himself in other areas of mathematics. It was only in the 1930s, about the same time as the Grundbegriffe appeared, that he introduced his own abstract theory of probability. This theory, which has important affinities with Kolmogorov’s, is developed most clearly in an article included in the Grundbegriffe’s bibliography (Cantelli, 1932) and a lecture he gave in 1933 at the Institut Henri Poincaré in Paris (Cantelli, 1935).

Cantelli (1932) argued for a theory that makes no appeal to empirical notions such as possibility, event, probability or independence. This abstract theory, he said, should begin with a set of points that have finite nonzero measure. This could be any set for which measure is defined, perhaps a set of points on a surface. He wrote m(E) for the area of a subset E. He noted that m(E1 ∪ E2) = m(E1) + m(E2), provided E1 and E2 are disjoint, and 0 ≤ m(E1E2)/m(Ei) ≤ 1 for i = 1, 2. He called E1 and E2 multipliable when m(E1E2) = m(E1)m(E2). Much of probability theory, he noted, including Bernoulli’s law of large numbers and Khinchin’s law of the iterated logarithm, can be carried out at this abstract level.

Cantelli (1935) explained how his abstract theory relates to frequencies in the world. The classical calculus of probability, he said, should be developed for a particular class of events in the world in three steps:

1. Study experimentally the equally likely cases (check that they happen equally frequently), thus justifying experimentally the rules of total and compound probability.

2. Develop an abstract theory based only on the rules of total and compound probability, without reference to their empirical justification.

3. Deduce probabilities from the abstract theory and use them to predict frequencies.

His own theory, Cantelli explained, is the one obtained in the second step.

Cantelli’s 1932 article and 1933 lecture were not really sources for the Grundbegriffe. Kolmogorov’s earlier work (Kolmogorov, 1929, 1931) had already gone well beyond anything Cantelli did in 1932, in both degree of abstraction and mathematical clarity. The 1933 lecture was more abstract, but obviously came too late to influence the Grundbegriffe. However, Cantelli did develop independently of Kolmogorov the project of combining a frequentist interpretation of probability with an abstract axiomatization that retained in some form the classical rules of total and compound probability. This project had been in the air for 30 years.

5. THE GRUNDBEGRIFFE

The Grundbegriffe was an exposition, not another research contribution. In his preface, after acknowledging Fréchet’s work, Kolmogorov said this:

In the pertinent mathematical circles it has been common for some time to construct probability theory in accordance with this general point of view. But a complete presentation of the whole system, free from superfluous complications, has been missing (though a book by Fréchet, [2] in the bibliography, is in preparation).

Kolmogorov aimed to fill this gap, and he did so brilliantly and concisely, in just 62 pages. Fréchet’s much longer book, which finally appeared in two volumes (Fréchet, 1937–1938), is regarded by some as a mere footnote to Kolmogorov’s achievement.

Fréchet’s own evaluation of the Grundbegriffe’s contribution, quoted at the beginning of this article, is correct so far as it goes. Borel had introduced countable additivity into probability in 1909, and in the following 20 years, many authors, including Kolmogorov, had explored its consequences. The Grundbegriffe merely rounded out the picture by explaining that nothing more was needed. However, Kolmogorov’s mathematical achievement, especially his definitive work on the classical limit theorems, had given him the grounds and the authority to say that nothing more was needed.

Moreover, Kolmogorov’s appropriation of the name probability was an important rhetorical achievement, with enduring implications. Slutsky in 1922 and Kolmogorov himself in 1927 had proposed a general theory of additive set functions but had relied on the classical theory to say that probability should be a special case of this general theory. Now Kolmogorov proposed axioms for probability. The numbers in his abstract theory were probabilities, not merely valences or mery. His philosophical justification for proceeding in this way so resembled the justification that Borel and Lévy had given for the classical theory that they could hardly take exception.

It was not really true that nothing more was needed. Those who studied Kolmogorov’s formulation in detail soon realized that his axioms and definitions were inadequate in a number of ways. Most saliently, his treatment of conditional probability was not adequate for the burgeoning theory of Markov processes. In addition, there were other points in the monograph where he could not obtain natural results at the abstract level and had to fall back to the classical examples: discrete probabilities and probabilities in Euclidean spaces. These shortcomings only gave impetus to the new theory, because the project of filling in the gaps provided exciting work for a new generation of probabilists.

In this section we take a fresh look at the Grundbegriffe. We review its six axioms and two ideas that were, as Kolmogorov himself pointed out in his preface, novel at the time: the construction of probabilities on infinite-dimensional spaces (his famous consistency theorem) and the definition of conditional probability using the Radon–Nikodym theorem. Then we look at the explicitly philosophical part of the monograph: the two pages in Chapter I where Kolmogorov explains the empirical origin and meaning of his axioms.

5.1 The Mathematical Framework

Kolmogorov’s six axioms for probability are so familiar that it seems superfluous to repeat them, but so concise that it is easy to do so. We do repeat them and then we discuss the two points just mentioned: the consistency theorem and the treatment of conditional probability and expectation. As we will see, the mathematics was due to earlier authors: Daniell in the case of the consistency theorem and Nikodym in the case of conditional probabilities and expectations. Kolmogorov’s contribution, more rhetorical and philosophical than mathematical, was to bring this mathematics into a framework for probability.

5.1.1 The six axioms. Kolmogorov began with five axioms concerning a set E and a set F of subsets of E, which he called random events:

I. F is a field of sets.

II. F contains the set E.

III. To each set A from F is assigned a nonnegative real number P(A). This number P(A) is called the probability of the event A.

IV. P(E) = 1.

V. If A and B are disjoint, then

P(A ∪ B) = P(A) + P(B).

He then added a sixth axiom, redundant for finite F but independent of the first five axioms for infinite F:

VI. If $A_1 \supseteq A_2 \supseteq \cdots$ is a decreasing sequence of events from F with $\bigcap_{n=1}^{\infty} A_n = \emptyset$, then

$$\lim_{n \to \infty} P(A_n) = 0.$$

This is the axiom of continuity. Given the first five axioms, it is equivalent to countable additivity.
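The equivalence can be checked in a few lines. The following is the standard textbook argument, sketched in the notation of Axioms I–VI; the auxiliary sets $B_k$, $B$ and $C_k$ are introduced here for illustration, and countable additivity is understood for disjoint sequences in F whose union also lies in F.

```latex
% (VI) implies countable additivity.
% Let B_1, B_2, ... be disjoint events in F with B = \bigcup_k B_k in F,
% and put A_n = B \setminus (B_1 \cup \dots \cup B_n), which lies in the field F.
% Then A_1 \supseteq A_2 \supseteq \cdots and \bigcap_n A_n = \emptyset.
\[
  P(B) \;=\; \sum_{k=1}^{n} P(B_k) + P(A_n)
  \quad\text{(Axiom V)},
  \qquad
  \lim_{n\to\infty} P(A_n) = 0
  \quad\text{(Axiom VI)},
\]
\[
  \text{hence}\qquad P(B) \;=\; \sum_{k=1}^{\infty} P(B_k).
\]
% Countable additivity implies (VI).
% Let A_1 \supseteq A_2 \supseteq \cdots in F with empty intersection,
% and put C_k = A_k \setminus A_{k+1}. The C_k are disjoint and
% A_n = \bigcup_{k \ge n} C_k for every n, so
\[
  P(A_n) \;=\; \sum_{k=n}^{\infty} P(C_k) \;\longrightarrow\; 0
  \qquad (n \to \infty),
\]
% because this is the tail of the convergent series
% \sum_{k \ge 1} P(C_k) = P(A_1) \le 1.
```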

The six axioms can be summarized by saying that P is a nonnegative additive set function in the sense of Fréchet with P(E) = 1.

Unlike Fréchet, who had debated countable additivity with de Finetti (Fréchet, 1930; de Finetti, 1930; Cifarelli and Regazzini, 1996), Kolmogorov did not make a substantive argument for it. Instead, he said this (page 14):

. . . Since the new axiom is essential only for infinite fields of probability, it is hardly possible to explain its empirical meaning. . . . In describing any actual observable random process, we can obtain only finite fields of probability. Infinite fields of probability occur only as idealized models of real random processes. This understood, we limit ourselves arbitrarily to models that satisfy Axiom VI. So far this limitation has been found expedient in the most diverse investigations.

This echoes Borel, who adopted countable additivity not as a matter of principle but because he had not encountered circumstances where its rejection seemed expedient (Borel, 1909a, Section I.5). However, Kolmogorov articulated even more clearly than Borel the purely instrumental significance of infinity.

5.1.2 Probability distributions in infinite-dimensional spaces. Suppose, using modern terminology, that $(E_1, F_1), (E_2, F_2), \ldots$ is a sequence of measurable spaces. For each finite set of indices, say $i_1, \ldots, i_n$, write $F^{i_1,\ldots,i_n}$ for the induced σ-algebra in the product space $\prod_{j=1}^{n} E_{i_j}$. Write E for the product of all the $E_i$ and write F for the algebra (not a σ-algebra) that consists of all the cylinder subsets of E corresponding to elements of the various $F^{i_1,\ldots,i_n}$. Suppose we define consistent probability measures for all the marginal spaces $\left(\prod_{j=1}^{n} E_{i_j}, F^{i_1,\ldots,i_n}\right)$. This defines a set function on (E, F). Is it countably additive?

In general, the answer is negative; a counterexample was given by Erik Sparre Andersen and Børge Jessen in 1948, but as we noted in Section 4.3, Ulam had given a positive answer for the case where the marginal measures are product measures. Kolmogorov’s consistency theorem, in Section 4 of Chapter III of the Grundbegriffe, gave a positive answer for another case, where each Ei is a copy of the real numbers and each Fi consists of the Borel sets. (Formally, we should acknowledge, Kolmogorov had a slightly different starting point: finite-dimensional distribution functions, not finite-dimensional measures.)

In his September 1919 article (Daniell, 1919b), Daniell had proven a closely related theorem. Although Kolmogorov did not cite Daniell in the Grundbegriffe, the essential mathematical content of Kolmogorov’s result is already in Daniell’s. This point was recognized quickly; Jessen (1935) called attention to Daniell’s priority in an article that appeared in MIT’s Journal of Mathematics and Physics, together with an article by Wiener that also called attention to Daniell’s result. In a commemoration of Kolmogorov’s early work, Doob (1989) hazards the guess that Kolmogorov was unaware of Daniell’s result when he wrote the Grundbegriffe. This may be true. He would not have been the first author to repeat Daniell’s work; Jessen had presented the result as his own to the Seventh Scandinavian Mathematical Conference in 1929 and had become aware of Daniell’s priority only in time to acknowledge it in a footnote to his contribution to the proceedings (Jessen, 1930).

It is implausible that Kolmogorov was still unaware of Daniell’s construction after the comments by Wiener and Jessen, but in 1948 he again ignored Daniell while claiming the construction of probability measures on infinite products as a Soviet achievement (Gnedenko and Kolmogorov, 1948, Section 3.1). Perhaps this can be dismissed as mere propaganda, but we should also remember that the Grundbegriffe was not meant as a contribution to pure mathematics. Daniell’s and Kolmogorov’s theorems seem almost identical when they are assessed as mathematical discoveries, but they differed in context and purpose. Daniell was not thinking about probability, whereas the slightly different theorem formulated by Kolmogorov was about probability. Neither Daniell nor Wiener undertook to make probability into a conceptually independent branch of mathematics by establishing a general method for representing it measure-theoretically.

Kolmogorov’s theorem was more general than Daniell’s in one respect: Kolmogorov considered an index set of arbitrary cardinality, whereas Daniell considered only denumerable cardinality. This greater generality is merely formal, in two senses: it involves no additional mathematical complications and it has no practical use. The obvious use of a nondenumerable index would be to represent continuous time, and so we might conjecture that Kolmogorov was thinking of making probability statements about trajectories, as Wiener had done in the 1920s. However, Kolmogorov’s construction does not accomplish anything in this direction. The σ-algebra on the product obtained by the construction contains too few sets; in the case of Brownian motion, it does not include the set of continuous trajectories. It took some decades of further research to develop general methods of extension to σ-algebras rich enough to include the infinitary events one typically wants to discuss (Doob, 1953; Bourbaki, 1994, pages 243–245). The topological character of these extensions and the failure of the consistency theorem for arbitrary Cartesian products remain two important caveats to the Grundbegriffe’s thesis that probability is adequately represented by the abstract notion of a probability measure.

5.1.3 Experiments and conditional probability. In the case where A has nonzero probability, Kolmogorov defined P_A(B) in the usual way. He called it bedingte Wahrscheinlichkeit, which translates into English as “conditional probability.” Before the Grundbegriffe, this term was less common than “relative probability.”

Kolmogorov’s treatment of conditional probability and expectation was novel. It began with a set-theoretic formalization of the concept of an experiment (Versuch in German). Here Kolmogorov had in mind a subexperiment of the grand experiment defined by the conditions S. The subexperiment may give only limited information about the outcome ξ of the grand experiment. It defines a partition 𝒜 of the sample space E for the grand experiment: its outcome amounts to specifying which element of 𝒜 contains ξ. Kolmogorov formally identified the subexperiment with 𝒜. Then he introduced the idea of conditional probability relative to 𝒜:

• In the finite case, he wrote P_𝒜(B) for the random variable whose value at each point ξ of E is P_A(B), where A is the element of 𝒜 that contains ξ, and he called this random variable the “conditional probability of B after the experiment 𝒜” (page 12). This random variable is well defined for all the ξ in elements of 𝒜 that have positive probability, and these ξ form an event that has probability 1.

• In the general case, he represented the partition 𝒜 by a function u on E that induces it and he wrote P_u(B) for any random variable that satisfies

\[ P_{\{u \subset A\}}(B) = E_{\{u \subset A\}} P_u(B) \]

for every set A of possible values of u such that the subset {ξ | u(ξ) ∈ A} of E (this is what he meant by {u ⊂ A}) is measurable and has positive probability (page 42). By the Radon–Nikodym theorem (only recently proven by Nikodym), this random variable is unique up to a set of probability 0. Kolmogorov called it the “conditional probability of B with respect to (or knowing) u.” He defined E_u(y), which he called “the conditional expectation of the variable y for a known value of u,” analogously (page 46).

Kolmogorov was doing no new mathematics here; the mathematics is Nikodym’s. However, Kolmogorov was the first to point out that Nikodym’s result can be used to derive conditional probabilities from absolute probabilities.
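The finite case can be made concrete with a small computation. The following Python sketch is an illustration only and is not taken from the Grundbegriffe: the function names, the two-coin sample space and the choice of partition are all assumptions made for the example. It builds the conditional probability relative to a partition as a function on the sample space and checks the defining identity, in the form P(B | A) = E(P_u(B) | A), for events A that are unions of blocks of the partition.

```python
from fractions import Fraction

def conditional_probability(prob, partition, B):
    """Conditional probability of B relative to a finite partition.

    Returns a dict mapping each outcome xi to P_A(B), where A is the block
    of the partition that contains xi. Outcomes lying in blocks of
    probability 0 are left out, because the value is undefined there.
    """
    values = {}
    for block in partition:
        p_block = sum(prob[xi] for xi in block)
        if p_block == 0:
            continue
        p_joint = sum(prob[xi] for xi in block if xi in B)
        for xi in block:
            values[xi] = p_joint / p_block
    return values

def satisfies_defining_identity(prob, cond, B, A):
    """Check P(B | A) == E(cond | A) for an event A of positive probability.

    A is assumed to be a union of blocks of the partition.
    """
    p_A = sum(prob[xi] for xi in A)
    lhs = sum(prob[xi] for xi in A if xi in B) / p_A
    rhs = sum(prob[xi] * cond[xi] for xi in A) / p_A
    return lhs == rhs

# Hypothetical example: two fair coin flips. The subexperiment reveals
# only the result of the first flip, so it induces a two-block partition.
E = ["HH", "HT", "TH", "TT"]
prob = {xi: Fraction(1, 4) for xi in E}
partition = [{"HH", "HT"}, {"TH", "TT"}]
B = {"HH", "HT", "TH"}  # the event "at least one head"

cond = conditional_probability(prob, partition, B)
```

Exact rational arithmetic (`Fraction`) is used so that the identity can be tested with `==` rather than with a floating-point tolerance.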

We should not, incidentally, jump to the conclusion that Kolmogorov had abandoned the emphasis on transition probabilities he had displayed in his 1931 article and now wanted to start the study of stochastic processes with unconditional probabilities. Even in 1935, he recommended the opposite (Kolmogorov, 1935, pages 168–169 of the English translation).

5.1.4 When is conditional probability meaningful? To illustrate his understanding of conditional probability, Kolmogorov discussed Bertrand’s paradox of the great circle, which he called, with no specific reference, a Borelian paradox. His explanation of the paradox was simple but formal. After noting that the probability distribution for the second point conditional on a particular great circle is not uniform, he said:

This demonstrates the inadmissibility of the idea of conditional probability with respect to a given isolated hypothesis with probability zero. One obtains a probability distribution for the latitude on a given great circle only when that great circle is considered as an element of a partition of the entire surface of the sphere into great circles with the given poles (page 45).

This explanation has become part of the culture of probability theory, but it cannot completely replace the more substantive explanations given by Borel.
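The dependence on the partition can be made explicit with a short calculation. The setup below is the standard one for this paradox and is stated here as an assumption rather than taken from the quoted passage: a point is uniformly distributed on the unit sphere, with longitude λ and latitude φ as coordinates.

```latex
% Uniform distribution on the unit sphere in longitude/latitude coordinates
f(\lambda, \varphi) = \frac{1}{4\pi} \cos\varphi,
\qquad 0 \le \lambda < 2\pi, \quad -\tfrac{\pi}{2} \le \varphi \le \tfrac{\pi}{2}.

% Partition into meridians through the given poles: condition on \lambda
f(\varphi \mid \lambda)
  = \frac{f(\lambda, \varphi)}{\int_{-\pi/2}^{\pi/2} f(\lambda, \varphi')\, d\varphi'}
  = \frac{\frac{1}{4\pi}\cos\varphi}{\frac{1}{2\pi}}
  = \tfrac{1}{2}\cos\varphi .

% Partition into parallels: condition on \varphi, for |\varphi| < \pi/2
f(\lambda \mid \varphi)
  = \frac{f(\lambda, \varphi)}{\int_{0}^{2\pi} f(\lambda', \varphi)\, d\lambda'}
  = \frac{\frac{1}{4\pi}\cos\varphi}{\frac{1}{2}\cos\varphi}
  = \frac{1}{2\pi} .
```

Along a meridian, the conditional density of the latitude is (1/2) cos φ, which is concentrated near the equator. Along the equator, treated as one of the parallels, the conditional distribution of the longitude is uniform. Symmetry might suggest the same answer on every great circle; the two answers differ because they come from two different partitions of the sphere.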

Borel insisted that we explain how the measurement on which we will condition is to be carried out. This accords with Kolmogorov’s insistence that a partition be specified, because a procedure for measurement will determine such a partition. Kolmogorov’s explicitness on this point was a philosophical advance. On the other hand, Borel demanded more than the specification of a partition. He demanded that the measurement be specified realistically enough that we can see partitions into events of positive probability, not just a theoretical limiting partition into events of probability 0.

Borel’s demand that we be told how the theoretical partition into events of probability 0 arises as a limit of partitions into events of positive probability again compromises the abstract picture by introducing topological ideas, but this seems to be needed so as to rule out nonsense. This point was widely discussed in the 1940s and 1950s. Dieudonné (1948) and Lévy (1959) gave examples in which the conditional probabilities defined by Kolmogorov do not have versions (functions of ξ for fixed B) that form sensible probability measures (when considered as functions of B for fixed ξ). Gnedenko and Kolmogorov (1949) and Blackwell (1956) formulated conditions on measurable spaces or probability measures that rule out such examples. For modern formulations of these conditions, see Rogers and Williams (2000).

5.2 The empirical origin of the axioms

Kolmogorov devoted about two pages of the Grundbegriffe to the relation between his axioms and the real world. These two pages, a concise statement of Kolmogorov’s frequentist philosophy, are so important to our story that we quote them in full. We then discuss how this philosophy was related to the thinking of his predecessors and how it fared in the decades following 1933.

5.2.1 In Kolmogorov’s own words. Section 2 of Chapter I of the Grundbegriffe is titled “Das Verhältnis zur Erfahrungswelt.” It is only two pages in length. This subsection consists of a translation of the section in its entirety.

The relation to the world of experience

The theory of probability is applied to the real world of experience as follows:

1. Suppose we have a certain system of conditions S, capable of unlimited repetition.

2. We study a fixed circle of phenomena that can arise when the conditions S are realized. In general, these phenomena can come out in different ways in different cases where the conditions are realized. Let E be the set of the different possible variants ξ_1, ξ_2, . . . of the outcomes of the phenomena. Some of these variants might actually not occur. We include in the set E all the variants we regard a priori as possible.

3. If the variant that actually appears when conditions S are realized belongs to a set A that we define in some way, then we say that the event A has taken place.

EXAMPLE. The system of conditions S consists of flipping a coin twice. The circle of phenomena mentioned in point 2 consists of the appearance, on each flip, of heads or tails. It follows that there are four possible variants (elementary events),

namely

heads—heads, heads—tails,

tails—heads, tails—tails.

Consider the event A that there is a repetition. This event consists of the first and fourth elementary events. Every event can similarly be regarded as a set of elementary events.

4. Under certain conditions, that we will not go into further here, we may assume that an event A that does or does not occur under conditions S is assigned a real number P(A) with the following properties:

A. One can be practically certain that if the system of conditions S is repeated a large number of times, n, and the event A occurs m times, then the ratio m/n will differ only slightly from P(A).

B. If P(A) is very small, then one can be practically certain that the event A will not occur on a single realization of the conditions S.

Empirical deduction of the axioms. Usually one can assume that the system F of events A, B, C, . . . that come into consideration and are assigned definite probabilities forms a field that contains E (Axioms I and II and the first half of Axiom III, the existence of the probabilities). It is further evident that 0 ≤ m/n ≤ 1 always holds, so that the second half of Axiom III appears completely natural. We always have m = n for the event E, so we naturally set P(E) = 1 (Axiom IV). Finally, if A and B are mutually incompatible (in other words, the sets A and B are disjoint), then m = m_1 + m_2, where m, m_1 and m_2 are the numbers of experiments in which the events A ∪ B, A and B happen, respectively. It follows that

\[ \frac{m}{n} = \frac{m_1}{n} + \frac{m_2}{n}. \]

So it appears appropriate to set P(A ∪ B) = P(A) + P(B).

REMARK I. If two assertions are both practically certain, then the assertion that they are simultaneously correct is practically certain, though with a little lower degree of certainty. But if the number of assertions is very large, we cannot draw any conclusion whatsoever about making the assertions simultaneously from the practical certainty of each of them individually. So it in no way follows from Principle A that m/n will differ only a little from P(A) in every one of a very large number of series of experiments, where each series consists of n experiments.

REMARK II. By our axioms, the impossible event (the empty set) has the probability P(∅) = 0. But the converse inference, from P(A) = 0 to the impossibility of A, does not by any means follow. By Principle B, the event A’s having probability zero implies only that it is practically impossible that it will happen on a particular unrepeated realization of the conditions S. This by no means implies that the event A will not appear in the course of a sufficiently long series of experiments. When P(A) = 0 and n is very large, we can only say, by Principle A, that the quotient m/n will be very small; it might, for example, be equal to 1/n.
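The “empirical deduction” in the translated passage rests on elementary properties of relative frequencies. A minimal Python sketch, assuming a fair coin simulated with a seeded pseudorandom generator, checks these properties for the two-flip example. The simulation, the names and the number of repetitions are illustrative assumptions and are not part of Kolmogorov’s text.

```python
import random

def realize_conditions(rng):
    """One realization of the conditions S: flip a simulated fair coin twice."""
    return rng.choice("HT") + rng.choice("HT")

def count(event, outcomes):
    """Number m of realizations in which the event (a set of variants) occurs."""
    return sum(1 for xi in outcomes if xi in event)

rng = random.Random(0)          # fixed seed so that the run is reproducible
n = 10_000                      # number of repetitions of the conditions S
outcomes = [realize_conditions(rng) for _ in range(n)]

E = {"HH", "HT", "TH", "TT"}    # all variants regarded a priori as possible
A = {"HH", "TT"}                # the event "there is a repetition"
B = {"HT"}                      # an event incompatible with A

m = count(A | B, outcomes)      # occurrences of A ∪ B
m1 = count(A, outcomes)         # occurrences of A
m2 = count(B, outcomes)         # occurrences of B

freq_A = m1 / n                 # relative frequency, to compare with P(A) = 1/2
print(freq_A)
```

The relations `count(E, outcomes) == n` and `m == m1 + m2` hold for every possible run, which is why they can motivate axioms. The closeness of `freq_A` to 1/2 is a different kind of statement: it is only practically certain, in the sense of Principle A.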

5.2.2 The philosophical synthesis. The philosophy set out in the two pages we have just translated is a synthesis, combining elements of the German and French traditions.

By his own testimony, Kolmogorov drew first and foremost from von Mises. In a footnote, he put the matter this way:

. . . In laying out the assumptions needed to make probability theory applicable to the world of real events, the author has followed in large measure the model provided by Mr. von Mises . . .

The very title of this section of the Grundbegriffe, “Das Verhältnis zur Erfahrungswelt,” echoes the title of the passage in von Mises (1931) that Kolmogorov cites, “Das Verhältnis der Theorie zur Erfahrungswelt,” but Kolmogorov does not discuss collectives. As he explained in a letter to Fréchet in 1939, he thought only a finitary version of this concept would reflect experience truthfully, and a finitary version, unlike von Mises’ infinitary version, could not be made mathematically rigorous. So for mathematics, one should adopt an axiomatic theory “whose practical value can be deduced directly” from a finitary concept of collectives.

Although collectives are in the background, Kolmogorov starts in a way that echoes Chuprov more than von Mises. He writes, as Chuprov (1910, page 149) did, of a system of conditions (Komplex von Bedingungen in German; kompleks uslovii in Russian). Probability is relative to a system of conditions S, and yet further conditions must be satisfied in order for events to be assigned a probability under S. Kolmogorov says nothing more about these conditions, but we may conjecture that he was thinking of the three sources of probabilities mentioned by von Mises: gambling devices, statistical phenomena and physical theory.

Where do von Mises’ two axioms (probability as a limit of relative frequency, and its invariance under selection of subsequences) appear in Kolmogorov’s account? Principle A is obviously a finitary version of von Mises’ axiom that identifies probability as the limit of relative frequency. Principle B, on the other hand, is the strong form of Cournot’s principle (see Section 2.2.2 above). Is it a finitary version of von Mises’ principle of invariance under selection? Evidently. In a collective, von Mises says, we have no way to single out an unusual infinite subsequence. One finitary version of this is that we have no way to single out an unusual single trial. It follows that when we do select a single trial (a single realization of the conditions S, as Kolmogorov puts it), we should not expect anything unusual. In the special case where the probability is very small, the usual is that the event will not happen.

Of course, Principle B, like Principle A, is only satisfied when there is a collective, that is, under certain conditions. Kolmogorov’s insistence on this point is confirmed by the comments we quoted in Section 2.2.2 herein on the importance and nontriviality of the step from “usually” to “in this particular case.”

As Borel and Lévy had explained so many times, Principle A can be deduced from Principle B together with Bernoulli’s theorem, which is a consequence of the axioms. In the framework that Kolmogorov sets up, however, the deduction requires an additional assumption: we must assume that Principle B applies not only to the probabilities specified for repetitions of conditions S, but also to the corresponding probabilities (obtained by assuming independence) for repetitions of n-fold repetitions of S. It is not clear that this additional assumption is appropriate, not only because we might hesitate about independence (see Shiryaev’s comments on page 120 of the third Russian edition of the Grundbegriffe, published in 1998), but also because the enlargement of our model to n-fold repetitions might involve a deterioration in its empirical precision to the extent that we are no longer justified in treating its high-probability predictions as practically certain. Perhaps these considerations justify Kolmogorov’s presenting Principle A as an independent principle alongside Principle B rather than as a consequence of it.
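The deduction of Principle A from Principle B can be written out as follows. This is a sketch under the independence assumption discussed above. It uses Chebyshev’s inequality, which is one standard route to Bernoulli’s theorem and not the argument Bernoulli himself gave; the symbols p and ε are introduced here for illustration.

```latex
% n independent realizations of S; m = number of occurrences of A; p = P(A)
\mathsf{E}\!\left(\frac{m}{n}\right) = p,
\qquad
\operatorname{Var}\!\left(\frac{m}{n}\right) = \frac{p(1-p)}{n}.

% Bernoulli's theorem in quantitative form, via Chebyshev's inequality
P\!\left( \left| \frac{m}{n} - p \right| \ge \varepsilon \right)
  \le \frac{p(1-p)}{n \varepsilon^{2}}
  \le \frac{1}{4 n \varepsilon^{2}} .
```

For fixed ε the bound is very small once n is large, so the event that m/n differs from p by at least ε has very small probability in the model for n-fold repetitions. Applying Principle B to a single realization of that n-fold experiment gives practical certainty that m/n differs from P(A) by less than ε, which is Principle A. The step that needs the additional assumption is the application of Principle B to the product model rather than to S itself.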

Principle A has an independent role in Kolmogorov’s story, however, even if we do regard it as a consequence of Principle B together with Bernoulli’s theorem, because it comes into play at a point that precedes the adoption of the axioms and hence the derivation of Bernoulli’s theorem: it is used to motivate the axioms (cf. Bartlett, 1949). The parallel to the thinking of Lévy is striking. In Lévy’s picture, the notion of equally likely cases motivates the axioms, while Cournot’s principle links the theory with reality. The most important change Kolmogorov makes in this picture is to replace equally likely cases with frequency; frequency now motivates the axioms, but Cournot’s principle remains the most essential link with reality.

In spite of the obvious influence of Borel and Lévy, Kolmogorov cites only von Mises in this section of the Grundbegriffe. Philosophical works by Borel and Lévy, along with those by Slutsky and Cantelli, do appear in the Grundbegriffe’s bibliography, but their appearance is explained only by a sentence in the preface: “The bibliography gives some recent works that should be of interest from a foundational viewpoint.” The emphasis on von Mises may have been motivated in part by political prudence. Whereas Borel and Lévy persisted in speaking of the subjective side of probability, von Mises was an uncompromising frequentist. Whereas Chuprov and Slutsky worked in economics and statistics, von Mises was an applied mathematician, concerned more with aerodynamics than social science, and the relevance of his work on collectives to physics had been established in the Soviet literature by Khinchin (1929; see also Khinchin, 1961, and Siegmund-Schultze, 2004). (For more on the political context, see Blum and Mespoulet, 2003; Lorentz, 2002; Mazliak, 2003; Seneta, 2004.)

5.2.3 Why was Kolmogorov’s philosophy not more influential? Although Kolmogorov never abandoned his formulation of frequentism, his philosophy has not enjoyed the enduring popularity of his axioms. Section 2 of Chapter I of the Grundbegriffe is seldom quoted. Cournot’s principle remained popular in Europe during the 1950s (Shafer and Vovk, 2005), but never gained substantial traction in the United States.

The lack of interest in Kolmogorov’s philosophy during the past half century can be explained in many ways, but one important factor is the awkwardness of extending it to stochastic processes. The first condition in Kolmogorov’s credo is that the system of conditions should be capable of unlimited repetition. When we define a stochastic process in terms of transition probabilities, as in Kolmogorov (1931), this condition may be met, for it may be possible to start a system repeatedly in a given state, but when we focus on probabilities for sets of possible trajectories, we are in a more awkward position. In many applications, there is only one realized trajectory; it is not possible to repeat the experiment to obtain another. Kolmogorov managed to overlook this tension in the Grundbegriffe, where he showed how to represent a discrete-time Markov chain in terms of a single probability measure (Chapter I, Section 6), but did not give such representations for continuous stochastic processes. It became more difficult to ignore the tension after Doob and others succeeded in giving such representations.

6. CONCLUSION

Seven decades later, the Grundbegriffe’s mathematical ideas still set the stage for mathematical probability. Its philosophical ideas, especially Cournot’s principle, also remain powerful, even for those who want to go beyond the measure-theoretic framework (Shafer and Vovk, 2001). As we have tried to show in this article, the endurance of these ideas is not due to Kolmogorov’s originality. Rather, it is due to the presence of the ideas in the very fabric of the work that came before. The Grundbegriffe was a product of its own time.

ACKNOWLEDGMENTS

Glenn Shafer’s research was partially supported by NSF Grant SES-98-19116 to Rutgers University. Vladimir Vovk’s research was partially supported by EPSRC Grant GR/R46670/01, BBSRC Grant 111/BIO14428, MRC Grant S505/65 and EU Grant IST-1999-10226 to Royal Holloway, University of London.

We want to thank the many colleagues who have helped us broaden our understanding of the period discussed in this article. Bernard Bru and Oscar Sheynin were particularly helpful. We also benefited from conversation and correspondence with Pierre Crépel, Elyse Gustafson, Sam Kotz, Steffen Lauritzen, Per Martin-Löf, Thierry Martin, Laurent Mazliak, Paul Miranti, Julie Norton, Nell Painter, Goran Peskir, Andrzej Ruszczynski, J. Laurie Snell, Stephen M. Stigler and Jan von Plato.

We are also grateful for help in locating references. Sheynin gave us direct access to his extensive translations. Vladimir V’yugin helped us locate the original text of Kolmogorov’s 1929 article, and Aleksandr Shen’ gave us a copy of the 1936 Russian translation of the Grundbegriffe. Natalie Borisovets, at Rutgers’ Dana Library, and Mitchell Brown, at Princeton’s Fine Library, have also been exceedingly helpful.

REFERENCES

ANDERSEN, E. S. and JESSEN, B. (1948). On the introduction of measures in infinite product sets. Det Kongelige Danske Videnskabernes Selskab, Matematisk-Fysiske Meddelelser 25(4), 8 pp.

BACHELIER, L. (1900). Théorie de la spéculation. Ann. Sci. École Norm. Supér. (3) 17 21–86. This was Bachelier’s doctoral dissertation. Reprinted in facsimile in 1995 by Éditions Jacques Gabay, Paris. An English translation, by A. J. Boness, appears in P. H. Cootner, ed. (1964). The Random Character of Stock Market Prices 17–78. MIT Press.

BACHELIER, L. (1910). Les probabilités à plusieurs variables. Ann. Sci. École Norm. Supér. (3) 27 339–360.

BACHELIER, L. (1912). Calcul des probabilités. Gauthier-Villars, Paris.

BARONE, J. and NOVIKOFF, A. (1978). A history of the axiomatic formulation of probability from Borel to Kolmogorov. I. Arch. Hist. Exact Sci. 18 123–190.

BARTLETT, M. S. (1949). Probability in logic, mathematics and science. Dialectica 3 104–113.

BAYER, R., ed. (1951). Congrès international de philosophie des sciences, Paris, 1949 4. Calcul des probabilités. Hermann, Paris.

BERNOULLI, J. (1713). Ars Conjectandi. Thurnisius, Basel. This pathbreaking work appeared eight years after Bernoulli’s death. A facsimile reprinting of the original Latin text is sold by Éditions Jacques Gabay, Paris. A German translation appeared in 1899 (Wahrscheinlichkeitsrechnung von Jakob Bernoulli. Anmerkungen von R. Haussner, Ostwald’s Klassiker, Nr. 107–108, Engelmann, Leipzig), with a second edition (Deutsch, Frankfurt) in 1999. A Russian translation of Part IV, which contains Bernoulli’s law of large numbers, appeared in 1986: Я. Бернулли, О законе больших чисел. Nauka, Moscow. It includes a preface by Kolmogorov, dated October 1985, and commentaries by other Russian authors. B. Sung’s English translation of Part IV, dated 1966, remains unpublished but is available in several university libraries in the United States. O. Sheynin’s English translation of Part IV, dated 2005, can be downloaded from www.sheynin.de.

BERNSTEIN, F. (1912). Über eine Anwendung der Mengenlehre auf ein aus der Theorie der säkularen Störungen herrührendes Problem. Math. Ann. 71 417–439.

BERNSTEIN, S. N. (1917). Опыт аксиоматического обоснования теории вероятностей (On the axiomatic foundation of the theory of probability). Сообщения Харьковского Математического Общества (Communications of the Kharkiv Mathematical Society) 15 209–274. Reprinted in S. N. Bernstein (1964). Собрание сочинений 10–60. Nauka, Moscow.

BERNSTEIN, S. N. (1927). Теория вероятностей (Theory of Probability). Государственное Издательство (State Publishing House), Moscow and Leningrad. Second edition 1934, fourth 1946. This work was included in the Grundbegriffe’s bibliography.

BERTRAND, J. (1889). Calcul des probabilités. Gauthier-Villars, Paris. Some copies of the first edition are dated 1888. Second edition 1907. Reprinted by Chelsea, New York, 1972.

BLACKWELL, D. (1956). On a class of probability spaces. Proc. Third Berkeley Symp. Math. Statist. Probab. 2 1–6. Univ. California Press, Berkeley.

BLUM, A. and MESPOULET, M. (2003). L’Anarchie bureaucratique. Statistique et pouvoir sous Staline. Découverte, Paris.

BOHLMANN, G. (1901). Lebensversicherungs-Mathematik. In Encyklopädie der Mathematischen Wissenschaften 1(2) 852–917. Teubner, Leipzig.

BOREL, E. (1895). Sur quelques points de la théorie des fonctions. Ann. Sci. École Norm. Supér. (3) 12 9–55.

BOREL, E. (1897). Sur les séries de Taylor. Acta Math. 20 243–247. Reprinted in Borel (1972) 2 661–665.

BOREL, E. (1898). Leçons sur la théorie des fonctions. Gauthier-Villars, Paris.

BOREL, E. (1905). Remarques sur certaines questions de probabilité. Bull. Soc. Math. France 33 123–128. Reprinted in Borel (1972) 2 985–990.

BOREL, E. (1906). La valeur pratique du calcul des probabilités. La revue du mois 1 424–437. Reprinted in Borel (1972) 2 991–1004.

BOREL, E. (1909a). Les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. Mat. Palermo 27 247–270. Reprinted in Borel (1972) 2 1055–1079.

BOREL, E. (1909b). Éléments de la théorie des probabilités. Gauthier-Villars, Paris. Third edition 1924. The 1950 edition was translated into English by J. E. Freund and published as Elements of the Theory of Probability by Prentice-Hall in 1965.

BOREL, E. (1912). Notice sur les travaux scientifiques. Gauthier-Villars, Paris. Prepared by Borel to support his candidacy to the Académie des Sciences. Reprinted in Borel (1972) 1 119–190.

BOREL, E. (1914). Le Hasard. Alcan, Paris. The first and second editions both appeared in 1914, with later editions in 1920, 1928, 1932, 1938 and 1948.

BOREL, E. (1930). Sur les probabilités universellement négligeables. C. R. Acad. Sci. Paris 190 537–540. Reprinted as Note IV of Borel (1939).

BOREL, E. (1939). Valeur pratique et philosophie des probabilités. Gauthier-Villars, Paris. Reprinted in 1991 by Éditions Jacques Gabay, Paris.

BOREL, E. (1972). Œuvres de Émile Borel. Centre National de la Recherche Scientifique, Paris. Four volumes.


BOURBAKI, N. (pseudonym) (1994). Elements of the History of Mathematics. Springer, Berlin. Translated from the 1984 French edition by J. Meldrum.

BROGGI, U. (1907). Die Axiome der Wahrscheinlichkeitsrechnung. Ph.D. thesis, Universität Göttingen. Excerpts reprinted in Schneider (1988) 359–366.

BRU, B. (2001). Émile Borel. In Statisticians of the Centuries (C. C. Heyde and E. Seneta, eds.) 287–291. Springer, New York.

BRU, B. (2003). Souvenirs de Bologne. Journal de la Société Française de Statistique 144 134–226. Special volume on history.

BRUNS, H. (1906). Wahrscheinlichkeitsrechnung und Kollektivmasslehre. Teubner, Leipzig and Berlin. Available at historical.library.cornell.edu.

BUFFON, G.-L. (1777). Essai d’arithmétique morale. In Supplément à l’Histoire naturelle 4 46–148. Imprimerie Royale, Paris.

CANTELLI, F. P. (1916a). La tendenza ad un limite nel senso del calcolo delle probabilità. Rend. Circ. Mat. Palermo 41 191–201. Reprinted in Cantelli (1958) 175–188.

CANTELLI, F. P. (1916b). Sulla legge dei grandi numeri. Atti Reale Accademia Nazionale Lincei, Memorie Cl. Sc. Fis. 11 329–349. Reprinted in Cantelli (1958) 189–213.

CANTELLI, F. P. (1917). Sulla probabilità come limite della frequenza. Atti Reale Accademia Nazionale Lincei 26 39–45. Reprinted in Cantelli (1958) 214–221.

CANTELLI, F. P. (1932). Una teoria astratta del calcolo delle probabilità. Giornale dell’Istituto Italiano degli Attuari 3 257–265. Reprinted in Cantelli (1958) 289–297.

CANTELLI, F. P. (1935). Considérations sur la convergence dans le calcul des probabilités. Ann. Inst. H. Poincaré 5 3–50. Reprinted in Cantelli (1958) 322–372.

CANTELLI, F. P. (1958). Alcune memorie matematiche. Giuffrè, Milan.

CARATHÉODORY, C. (1914). Über das lineare Mass von Punktmengen – eine Verallgemeinerung des Längenbegriffs. Nachr. Akad. Wiss. Göttingen Math.-Phys. II Kl. 4 404–426.

CARATHÉODORY, C. (1918). Vorlesungen über reelle Funktionen. Teubner, Leipzig and Berlin. Second edition 1927.

CASTELNUOVO, G. (1919). Calcolo delle probabilità. Albrighi e Segati, Milan, Rome, and Naples. Second edition in two volumes, 1926 and 1928. Third edition 1948.

CHUPROV, A. A. (1910). Очерки по теории статистики (Essays on the Theory of Statistics), 2nd ed. Sabashnikov, St. Petersburg. The first edition appeared in 1909. The second edition was reprinted by the State Publishing House, Moscow, in 1959.

CHURCH, A. (1940). On the concept of a random sequence. Bull. Amer. Math. Soc. 46 130–135.

CIFARELLI, D. M. and REGAZZINI, E. (1996). de Finetti’s contribution to probability and statistics. Statist. Sci. 11 253–282.

COPELAND, A. H., SR. (1932). The theory of probability from the point of view of admissible numbers. Ann. Math. Statist. 3 143–156.

COURNOT, A.-A. (1843). Exposition de la théorie des chances et des probabilités. Hachette, Paris. Reprinted in 1984 as Volume I (B. Bru, ed.) of Cournot (1973–1984).

COURNOT, A.-A. (1973–1984). Œuvres complètes. Vrin, Paris. Ten volumes, with an eleventh to appear.

CZUBER, E. (1903). Wahrscheinlichkeitsrechnung und ihre Anwendung auf Fehlerausgleichung, Statistik und Lebensversicherung. Teubner, Leipzig. Second edition 1910, third 1914.

D’ALEMBERT, J. (1761). Réflexions sur le calcul des probabilités. Opuscules mathématiques 2 1–25.

D’ALEMBERT, J. (1767). Doutes et questions sur le calcul des probabilités. Mélanges de littérature, d’histoire, et de philosophie 5 275–304.

DANIELL, P. J. (1918). A general form of integral. Ann. of Math. (2) 19 279–294.

DANIELL, P. J. (1919a). Integrals in an infinite number of dimensions. Ann. of Math. (2) 20 281–288.

DANIELL, P. J. (1919b). Functions of limited variation in an infinite number of dimensions. Ann. of Math. (2) 21 30–38.

DANIELL, P. J. (1920). Further properties of the general integral. Ann. of Math. (2) 21 203–220.

DANIELL, P. J. (1921). Integral products and probability. Amer. J. Math. (2) 43 143–162.

DASTON, L. (1979). d’Alembert’s critique of probability theory. Historia Math. 6 259–279.

DASTON, L. (1994). How probabilities came to be objective and subjective. Historia Math. 21 330–344.

DE FINETTI, B. (1930). A proposito dell’estensione del teorema delle probabilità totali alle classi numerabili. Rend. Reale Instituto Lombardo Sci. Lettere 63 901–905, 1063–1069.

DE FINETTI, B. (1939). Compte rendu critique du colloque de Genève sur la théorie des probabilités. Actualités Scientifiques et Industrielles 766. Hermann, Paris. Number 766 is the eighth fascicle of Wavre (1938–1939).

DE MOIVRE, A. (1718). The Doctrine of Chances: Or, A Method of Calculating the Probability of Events in Play. Pearson, London. Second edition 1738, third 1756.

DIEUDONNÉ, J. (1948). Sur le théorème de Lebesgue–Nikodym. III. Ann. Univ. Grenoble 23 25–53.

DOOB, J. L. (1953). Stochastic Processes. Wiley, New York.

DOOB, J. L. (1989). Kolmogorov’s early work on convergence theory and foundations. Ann. Probab. 17 815–821.

DOOB, J. L. (1994). The development of rigor in mathematical probability, 1900–1950. In Pier (1994b) 157–170. Reprinted in Amer. Math. Monthly 103 (1996) 586–595.

DÖRGE, K. (1930). Zu der von R. von Mises gegebenen Begründung der Wahrscheinlichkeitsrechnung. Math. Z. 32 232–258.

ELLIS, R. L. (1849). On the foundations of the theory of probabilities. Trans. Cambridge Philos. Soc. 8 1–6. The paper was read on February 14, 1842. Part 1 of Volume 8 was published in 1843 or 1844, but Volume 8 was not completed until 1849.

FABER, G. (1910). Über stetige Funktionen. II. Math. Ann. 69 372–443.

FECHNER, G. T. (1897). Kollektivmasslehre. Engelmann, Leipzig. Edited by G. F. Lipps.

FRÉCHET, M. (1915a). Définition de l’intégrale sur un ensemble abstrait. C. R. Acad. Sci. Paris 160 839–840.

FRÉCHET, M. (1915b). Sur l’intégrale d’une fonctionnelle étendue à un ensemble abstrait. Bull. Soc. Math. France 43 248–265.

FRÉCHET, M. (1930). Sur l’extension du théorème des probabilités totales au cas d’une suite infinie d’événements. Rend. Reale Instituto Lombardo Sci. Lettere 63 899–900, 1059–1062.

FRÉCHET, M. (1937–1938). Recherches théoriques modernes sur la théorie des probabilités. Gauthier-Villars, Paris. This work was listed in the Grundbegriffe’s bibliography as in preparation. It consists of two books, Fréchet (1937) and Fréchet (1938a). The two books together constitute Fascicle 3 of Volume 1 of Émile Borel’s Traité du calcul des probabilités et ses applications.

FRÉCHET, M. (1937). Généralités sur les probabilités. Variables aléatoires. Gauthier-Villars, Paris. Second edition 1950. This is Book 1 of Fréchet (1937–1938).

FRÉCHET, M. (1938a). Méthode des fonctions arbitraires. Théorie des événements en chaîne dans le cas d’un nombre fini d’états possibles. Gauthier-Villars, Paris. Second edition 1952. This is Book 2 of Fréchet (1937–1938).

FRÉCHET, M. (1938b). Exposé et discussion de quelques recherches récentes sur les fondements du calcul des probabilités. Actualités Scientifiques et Industrielles 735 23–55. Hermann, Paris. In Wavre (1938–1939), second fascicle, entitled Les fondements du calcul des probabilités.

FRÉCHET, M. (1951). Rapport général sur les travaux du Colloque de Calcul des Probabilités. In Bayer (1951) 3–21.

FRÉCHET, M. and HALBWACHS, M. (1924). Le calcul des probabilités à la portée de tous. Dunod, Paris.

GNEDENKO, B. V. and KOLMOGOROV, A. N. (1948). Теория вероятностей (Probability theory). In Математика в СССР за тридцать лет 1917–1947 (Thirty Years of Soviet Mathematics 1917–1947) 701–727. Гостехиздат, Moscow and Leningrad. English translation in Sheynin (1998) 131–158.

GNEDENKO, B. V. and KOLMOGOROV, A. N. (1949). Предельные распределения для сумм независимых случайных величин. State Publishing House, Moscow. Translated into English by K. L. Chung and published in 1954 as Limit Distributions for Sums of Independent Random Variables, Addison–Wesley, Cambridge, MA, with an appendix by J. L. Doob.

HABERMAN, S. J. (1996). Advanced Statistics 1. Description of Populations. Springer, New York.

HADAMARD, J. (1922). Les principes du calcul des probabilités. Revue de métaphysique et de morale 39 289–293. A slightly longer version of this note, with the title “Les axiomes du calcul des probabilités,” was included in Oeuvres de Jacques Hadamard 4 2161–2162. Centre National de la Recherche Scientifique, Paris, 1968.

HAUSDORFF, F. (1901). Beiträge zur Wahrscheinlichkeitsrechnung. Sitzungsber. Königlich Sächs. Gesellschaft Wiss. Leipz. Math.-Phys. Kl. 53 152–178.

HAUSDORFF, F. (1914). Grundzüge der Mengenlehre. von Veit, Leipzig.

HAWKINS, T. (1975). Lebesgue’s Theory of Integration: Its Origins and Development, 2nd ed. Chelsea, New York. First edition 1970, Univ. Wisconsin Press, Madison. The second edition differs only slightly from the first, but it corrects a consequential error on p. 104. Second edition reprinted in 1979 by Chelsea, New York, and then in 2001 by the American Mathematical Society, Providence, RI.

HELM, G. (1902). Die Wahrscheinlichkeitslehre als Theorie der Kollektivbegriffe. Annalen der Naturphilosophie 1 364–384.

HILBERT, D. (1902). Mathematical problems. Bull. Amer. Math. Soc. 8 437–479. Hilbert’s famous address to the International Congress of Mathematicians in Paris in 1900, in which he listed twenty-three open problems central to mathematics. Translated from the German by M. W. Newson.

HOCHKIRCHEN, T. (1999). Die Axiomatisierung der Wahrscheinlichkeitsrechnung und ihre Kontexte: Von Hilberts sechstem Problem zu Kolmogoroffs Grundbegriffen. Vandenhoeck and Ruprecht, Göttingen.

HOLGATE, P. (1997). Independent functions: Probability and analysis in Poland between the wars. Biometrika 84 161–173.

JEFFREYS, H. (1931). Scientific Inference. Cambridge Univ. Press. Second edition 1957, third 1973.

JESSEN, B. (1930). Über eine Lebesguesche Integrationstheorie für Funktionen unendlich vieler Veränderlichen. In Den Syvende Skandinaviske Matematikerkongress I Oslo 19–22 August 1929 127–138. A. W. Brøggers Boktrykkeri, Oslo.

JESSEN, B. (1935). Some analytical problems relating to probability. J. Math. Phys. Mass. Inst. Tech. 14 24–27.

JOHNSON, N. L. and KOTZ, S., eds. (1997). Leading Personalities in Statistical Sciences. Wiley, New York.

KAHANE, J.-P. (1994). Des séries de Taylor au mouvement brownien, avec un aperçu sur le retour. In Pier (1994b) 415–429.

KAMLAH, A. (1983). Probability as a quasi-theoretical concept—J. V. Kries’ sophisticated account after a century. Erkenntnis 19 239–251.

KEYNES, J. M. (1921). A Treatise on Probability. Macmillan, London.

KHINCHIN, A. YA. (1929). Учение Мизеса о вероятностях и принципы физической статистики (Mises’ work on probability and the principles of statistical physics). Успехи физических наук 9 141–166.

KHINCHIN, A. YA. (1961). On the Mises frequentist theory. Voprosy filosofii (Questions of Philosophy) 15(1, 2) 91–102, 77–89. Published after Khinchin’s death by B. Gnedenko. English translation in Sheynin (1998) 99–137, reproduced with footnotes by R. Siegmund-Schultze in Science in Context 17 (2004) 391–422. We have seen only this English translation, not the original.

KHINCHIN, A. YA. and KOLMOGOROV, A. N. (1925). Über Konvergenz von Reihen, deren Glieder durch den Zufall bestimmt werden. Математический сборник (Sbornik: Mathematics) 32 668–677. Translated into Russian in Kolmogorov (1986) 7–16 and thence into English in Kolmogorov (1992) 1–10.

KNOBLOCH, E. (2001). Emile Borel’s view of probability theory. In Probability Theory: Philosophy, Recent History and Relations to Science (V. F. Hendricks, S. A. Pedersen and K. F. Jørgensen, eds.) 71–95. Kluwer, Dordrecht.

KOLMOGOROV, A. N. (1928). Über die Summen durch den Zufall bestimmter unabhängiger Grössen. Math. Ann. 99 309–319. An addendum appears in 1930: 102 484–488. The article and the addendum are translated into Russian in Kolmogorov (1986) 20–34 and thence into English in Kolmogorov (1992) 15–31.

KOLMOGOROV, A. N. (1929). Общая теория меры и исчисление вероятностей (The General Theory of Measure and the calculus of probability). In Сборник работ Математического раздела, Коммунистическая академия, Секция естественных и точных наук (Collected Works of the Mathematical Section, Communist Academy, Section for Natural and Exact Sciences) 1 8–21. The Socialist Academy was founded in Moscow in 1918 and was renamed The Communist Academy in 1923 (Vucinich, 2000). The date 8 January 1927, which appears at the end of the article in the journal, was omitted when the article was reproduced in the second volume of Kolmogorov’s collected works (Kolmogorov, 1986, 48–58). The English translation (Kolmogorov, 1992, 48–59) modernizes the article’s terminology somewhat: M becomes a “measure” instead of a “measure specification.”

KOLMOGOROV, A. N. (1931). Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Math. Ann. 104 415–458. Dated July 26, 1930. Translated into Russian in Kolmogorov (1986) 60–105 and thence into English in Kolmogorov (1992) 62–108.

KOLMOGOROV, A. N. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer, Berlin. A Russian translation by G. M. Bavli appeared under the title Основные понятия теории вероятностей (Nauka, Moscow) in 1936, with a second edition, slightly expanded by Kolmogorov with the assistance of A. N. Shiryaev, in 1974, and a third edition (FAZIS, Moscow) in 1998. An English translation by N. Morrison appeared under the title Foundations of the Theory of Probability (Chelsea, New York) in 1950, with a second edition in 1956.

KOLMOGOROV, A. N. (1935). О некоторых новых течениях в теории вероятностей (On some modern currents in the theory of probability). In Труды 2-го Всесоюзного математического съезда, Ленинград, 24–30 июня 1934 г. (Proceedings of the 2nd All-Union Mathematical Congress, Leningrad, 24–30 June 1934) 1 (Plenary Sessions and Review Talks) 349–358. Издательство АН СССР, Leningrad and Moscow. English translation in Sheynin (2000) 165–173.

KOLMOGOROV, A. N. (1939). Letter to Maurice Fréchet. Fonds Fréchet, Archives de l’Académie des Sciences, Paris.

KOLMOGOROV, A. N. (1948). Евгений Евгеньевич Слуцкий: Некролог (Obituary for Evgeny Evgenievich Slutsky). Успехи математических наук (Russian Mathematical Surveys) 3(4) 142–151. English translation in Sheynin (1998) 77–88, reprinted in Math. Sci. 27 67–74 (2002).

KOLMOGOROV, A. N. (1956). Теория вероятностей (Probability theory). In Математика, её содержание, методы и значение (A. D. Aleksandrov, A. N. Kolmogorov and M. A. Lavrent’ev, eds.) 2 252–284. Nauka, Moscow. The Russian edition had three volumes. The English translation, Mathematics, Its Content, Methods, and Meaning, was first published in 1962 and 1963 in six volumes by the American Mathematical Society, Providence, RI, and then republished in 1965 in three volumes by the MIT Press, Cambridge, MA. Reprinted by Dover, New York, 1999. Kolmogorov’s chapter occupies pp. 33–71 of Part 4 in the 1963 English edition and pp. 229–264 of Volume 2 in the 1965 English edition.

KOLMOGOROV, A. N. (1986). Избранные труды. Теория вероятностей и математическая статистика. Nauka, Moscow.

KOLMOGOROV, A. N. (1992). Selected Works of A. N. Kolmogorov 2. Probability Theory and Mathematical Statistics. Kluwer, Dordrecht. Translation by G. Lindquist of Kolmogorov (1986).

LAEMMEL, R. (1904). Untersuchungen über die Ermittlung von Wahrscheinlichkeiten. Ph.D. thesis, Universität Zürich. Excerpts reprinted in Schneider (1988) 367–377.

LEBESGUE, H. (1901). Sur une généralisation de l’intégrale définie. C. R. Acad. Sci. Paris 132 1025–1028.

LEBESGUE, H. (1904). Leçons sur l’intégration et la recherche des fonctions primitives. Gauthier-Villars, Paris. Second edition 1928.

LÉVY, P. (1925). Calcul des probabilités. Gauthier-Villars, Paris.

LÉVY, P. (1937). Théorie de l’addition des variables aléatoires. Gauthier-Villars, Paris. Second edition 1954.

LÉVY, P. (1959). Un paradoxe de la théorie des ensembles aléatoires. C. R. Acad. Sci. Paris 248 181–184. Reprinted in Lévy (1973–1980) 6 67–69.

LÉVY, P. (1973–1980). Œuvres de Paul Lévy. Gauthier-Villars, Paris. In six volumes. Edited by D. Dugué.

ŁOMNICKI, A. (1923). Nouveaux fondements du calcul des probabilités (Définition de la probabilité fondée sur la théorie des ensembles). Fund. Math. 4 34–71.

ŁOMNICKI, Z. and ULAM, S. (1934). Sur la théorie de la mesure dans les espaces combinatoires et son application au calcul des probabilités. I. Variables indépendantes. Fund. Math. 23 237–278.

LORENTZ, G. G. (2002). Mathematics and politics in the Soviet Union from 1928 to 1953. J. Approx. Theory 116 169–223.

LOVELAND, J. (2001). Buffon, the certainty of sunrise, and the probabilistic reductio ad absurdum. Arch. Hist. Exact Sci. 55 465–477.

MAC LANE, S. (1995). Mathematics at Göttingen under the Nazis. Notices Amer. Math. Soc. 42 1134–1138.

MAISTROV, L. E. (1974). Probability Theory: A Historical Sketch. Academic Press, New York. Translated and edited by S. Kotz.

MARKOV, A. A. (1900). Исчисление вероятностей (Calculus of Probability). Типография Императорской Академии наук, St. Petersburg. Second edition 1908, fourth 1924.

MARKOV, A. A. (1912). Wahrscheinlichkeitsrechnung. Teubner, Leipzig. Translation of second edition of Markov (1900). Available at historical.library.cornell.edu.

MARTIN, T. (1996). Probabilités et critique philosophique selon Cournot. Vrin, Paris.

MARTIN, T. (1998). Bibliographie cournotienne. Annales littéraires de l’Université de Franche-Comté, Besançon.

MARTIN, T. (2003). Probabilité et certitude. In Probabilités subjectives et rationalité de l’action (T. Martin, ed.) 119–134. CNRS Éditions, Paris.

MASANI, P. R. (1990). Norbert Wiener, 1894–1964. Birkhäuser, Basel.

MAZLIAK, L. (2003). Andrei Nikolaevitch Kolmogorov (1903–1987). Un aperçu de l’homme et de l’œuvre probabiliste. Prépublication PMA-785, Univ. Paris VI. Available at www.proba.jussieu.fr.

MEINONG, A. (1915). Über Möglichkeit und Wahrscheinlichkeit: Beiträge zur Gegenstandstheorie und Erkenntnistheorie. Barth, Leipzig.

NIKODYM, O. (1930). Sur une généralisation des intégrales de M. J. Radon. Fund. Math. 15 131–179.

ONDAR, KH. O., ed. (1981). The Correspondence Between A. A. Markov and A. A. Chuprov on the Theory of Probability and Mathematical Statistics. Springer, New York. Translated from the Russian by C. M. and M. D. Stein.

ONICESCU, O. (1967). Le livre de G. Castelnuovo Calcolo della probabilità e applicazioni comme aboutissant de la suite des grands livres sur les probabilités. In Simposio Internazionale di Geometria Algebrica (Roma, 30 Settembre–5 Ottobre 1965) xxxvii–liii. Edizioni Cremonese, Rome.

PIER, J.-P. (1994a). Intégration et mesure 1900–1950. In Pier (1994b) 517–564.

PIER, J.-P., ed. (1994b). Development of Mathematics 1900–1950. Birkhäuser, Basel.

POINCARÉ, H. (1890). Sur le problème des trois corps et les équations de la dynamique. Acta Math. 13 1–271.

POINCARÉ, H. (1896). Calcul des probabilités. Leçons professées pendant le deuxième semestre 1893–1894. Carré, Paris. Available at historical.library.cornell.edu.

POINCARÉ, H. (1912). Calcul des probabilités. Gauthier-Villars, Paris. Second edition of Poincaré (1896).

PORTER, T. (1986). The Rise of Statistical Thinking, 1820–1900. Princeton University Press, Princeton, NJ.

RADEMACHER, H. (1922). Einige Sätze über Reihen von allgemeinen Orthogonalfunktionen. Math. Ann. 87 112–138.

RADON, J. (1913). Theorie und Anwendungen der absolut additiven Mengenfunktionen. Akad. Wiss. Sitzungsber. Kaiserl. Math.-Nat. Kl. 122 1295–1438. Reprinted in his Gesammelte Abhandlungen 1 45–188. Birkhäuser, Basel, 1987.

REICHENBACH, H. (1916). Der Begriff der Wahrscheinlichkeit für die mathematische Darstellung der Wirklichkeit. Barth, Leipzig.

REICHENBACH, H. (1932). Axiomatik der Wahrscheinlichkeitsrechnung. Math. Z. 34 568–619.

ROGERS, L. C. G. and WILLIAMS, D. (2000). Diffusions, Markov Processes, and Martingales. 1. Foundations, reprinted 2nd ed. Cambridge Univ. Press.

SCHNEIDER, I., ed. (1988). Die Entwicklung der Wahrscheinlichkeitstheorie von den Anfängen bis 1933: Einführungen und Texte. Wissenschaftliche Buchgesellschaft, Darmstadt.

SEGAL, I. E. (1992). Norbert Wiener. November 26, 1894–March 18, 1964. Biographical Memoirs 61 388–436. National Academy of Sciences, Washington.

SENETA, E. (1997). Boltzmann, Ludwig Edward. In Johnson and Kotz (1997) 353–354.

SENETA, E. (2004). Mathematics, religion and Marxism in the Soviet Union in the 1930s. Historia Math. 31 337–367.

SHAFER, G. and VOVK, V. (2001). Probability and Finance: It’s Only a Game! Wiley, New York.

SHAFER, G. and VOVK, V. (2005). The origins and legacy of Kolmogorov’s Grundbegriffe. Working Paper No. 4. Available at www.probabilityandfinance.com.

SHEYNIN, O. (1996). Aleksandr A. Chuprov: Life, Work, Correspondence. The Making of Mathematical Statistics. Vandenhoeck and Ruprecht, Göttingen.

SHEYNIN, O., ed. (1998). From Markov to Kolmogorov. Russian papers on probability and statistics. Containing essays of S. N. Bernstein, A. A. Chuprov, B. V. Gnedenko, A. Ya. Khinchin, A. N. Kolmogorov, A. M. Liapunov, A. A. Markov and V. V. Paevsky. Hänsel-Hohenhausen, Egelsbach, Germany. Translations from Russian into English by the editor. Deutsche Hochschulschriften No. 2514. In microfiche.

SHEYNIN, O., ed. (2000). From Daniel Bernoulli to Urlanis. Still more Russian Papers on Probability and Statistics. Hänsel-Hohenhausen, Egelsbach, Germany. Translations from Russian into English by the editor. Deutsche Hochschulschriften No. 2696. In microfiche.

SIEGMUND-SCHULTZE, R. (2004). Mathematicians forced to philosophize: An introduction to Khinchin’s paper on von Mises’ theory of probability. Sci. Context 17 373–390.

SIERPINSKI, W. (1918). Sur une définition axiomatique des ensembles mesurables (L). Bull. Internat. Acad. Sci. Cracovie A 173–178. Reprinted in W. Sierpinski (1975). Oeuvres choisies 2 256–260. PWN (Polish Scientific Publishers), Warsaw.

SLUTSKY, E. (1922). К вопросу о логических основах теории вероятности (On the question of the logical foundation of the theory of probability). Вестник статистики (Bulletin of Statistics) 12 13–21.

SLUTSKY, E. (1925). Über stochastische Asymptoten und Grenzwerte. Metron 5 3–89.

STEINHAUS, H. (1923). Les probabilités dénombrables et leur rapport à la théorie de la mesure. Fund. Math. 4 286–310.

STEINHAUS, H. (1930a). Über die Wahrscheinlichkeit dafür, daß der Konvergenzkreis einer Potenzreihe ihre natürliche Grenze ist. Math. Z. 31 408–416. Received by the editors 5 August 1929.

STEINHAUS, H. (1930b). Sur la probabilité de la convergence de séries. Première communication. Studia Math. 2 21–39. Received by the editors 24 October 1929.

STIGLER, S. M. (1973). Simon Newcomb, Percy Daniell, and the history of robust estimation 1885–1920. J. Amer. Statist. Assoc. 68 872–879.

TORNIER, E. (1933). Grundlagen der Wahrscheinlichkeitsrechnung. Acta Math. 60 239–380.

ULAM, S. (1932). Zum Massbegriffe in Produkträumen. In Verhandlung des Internationalen Mathematiker-Kongress Zürich 2 118–119.

VENN, J. (1888). The Logic of Chance, 3rd ed. Macmillan, London and New York. First edition 1866, second 1876.

VILLE, J. (1939). Étude critique de la notion de collectif. Gauthier-Villars, Paris. This differs from Ville’s dissertation, which was defended in March 1939, only in that a 17-page introductory chapter replaces the dissertation’s one-page introduction.

VON BORTKIEWICZ, L. (1901). Anwendungen der Wahrscheinlichkeitsrechnung auf Statistik. In Encyklopädie der Mathematischen Wissenschaften 1 821–851. Teubner, Leipzig.

VON KRIES, J. (1886). Die Principien der Wahrscheinlichkeitsrechnung. Eine logische Untersuchung. Mohr, Freiburg. The second edition, which appeared in 1927, reproduced the first without change and added a new 12-page foreword.

VON MISES, R. (1919). Grundlagen der Wahrscheinlichkeitsrechnung. Math. Z. 5 52–99.

VON MISES, R. (1928). Wahrscheinlichkeit, Statistik und Wahrheit. Springer, Vienna. Second edition 1936, third 1951. A posthumous fourth edition, edited by his widow Hilda Geiringer, appeared in 1972. English editions, under the title Probability, Statistics and Truth, appeared in 1939 and 1957.

VON MISES, R. (1931). Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik. Deuticke, Leipzig and Vienna.

VON PLATO, J. (1994). Creating Modern Probability: Its Mathematics, Physics, and Philosophy in Historical Perspective. Cambridge Univ. Press.

VUCINICH, A. (2000). Soviet mathematics and dialectics in the Stalin era. Historia Math. 27 54–76.

WALD, A. (1938). Die Widerspruchfreiheit des Kollectivbegriffes. In Actualités Scientifiques et Industrielles 735 79–99. Hermann, Paris. Titled Les fondements du calcul des probabilités, Number 735 is the second fascicle of Wavre (1938–1939).

WAVRE, R. (1938–1939). Colloque consacré à la théorie des probabilités. Hermann, Paris. This celebrated colloquium, chaired by Maurice Fréchet, was held in October 1937 at the University of Geneva. The proceedings were published by Hermann in eight fascicles in their series Actualités Scientifiques et Industrielles. The first seven fascicles appeared in 1938 as numbers 734 through 740; the eighth, de Finetti’s summary of the colloquium, appeared in 1939 as number 766 (de Finetti, 1939).

WHITTLE, P. (2000). Probability via Expectation, 4th ed. Springer, New York. The first two editions (Penguin, 1970; Wiley, 1976) were titled Probability. The third edition, also by Springer, appeared in 1992.

WIENER, N. (1920). The mean of a functional of arbitrary elements. Ann. of Math. (2) 22 66–72.

WIENER, N. (1921a). The average of an analytical functional. Proc. Natl. Acad. Sci. U.S.A. 7 253–260.

WIENER, N. (1921b). The average of an analytical functional and the Brownian movement. Proc. Natl. Acad. Sci. U.S.A. 7 294–298.

WIENER, N. (1923). Differential-space. J. Math. Phys. Mass. Inst. Tech. 2 131–174.

WIENER, N. (1924). The average value of a functional. Proc. London Math. Soc. 22 454–467.

WIENER, N. (1956). I am a Mathematician. The Later Life of a Prodigy. Doubleday, Garden City, NY.

WIENER, N. (1976–1985). Collected Works with Commentaries. MIT Press, Cambridge, MA. Four volumes. Edited by P. Masani. Volume 1 includes Wiener’s early papers on Brownian motion (Wiener, 1920; Wiener, 1921a; Wiener, 1921b; Wiener, 1923; Wiener, 1924), with a commentary by K. Itô.

WIMAN, A. (1900). Über eine Wahrscheinlichkeitsaufgabe bei Kettenbruchentwicklungen. Öfversigt af Kongliga Svenska Vetenskaps-Akademiens Förhandlingar. Femtiondesjunde Årgången 57 829–841.

WIMAN, A. (1901). Bemerkung über eine von Gyldén aufgeworfene Wahrscheinlichkeitsfrage. Håkan Ohlssons boktrykeri, Lund.