Last update 12 March 2003

Earliest Known Uses of Some of the Words of Probability & Statistics

This page attempts to show the first uses of various words used in Probability & Statistics. It contains words related to probability & statistics that are extracted from the Earliest Known Uses of Some of the Words of Mathematics pages of Jeff Miller with his permission. Research for his pages is ongoing, and the uses cited in this page should not be assumed to be the first uses that occurred unless it is stated that the term was introduced or coined by the mathematician named. If you are able to antedate any of the entries herein, please contact Jeff Miller, a teacher at Gulf High School in New Port Richey, Florida, who maintains the aforementioned pages. See also Jeff Miller's Earliest Uses of Various Mathematical Symbols. Texts in red are by Kees Verduin.

ANCILLARY in the theory of statistical estimation. The term "ancillary statistic" first appears in R. A. Fisher's 1925 "Theory of Statistical Estimation," Proc. Cambr. Philos. Soc. 22. 700-725, although interest in ancillary statistics only gathered momentum in the mid-1930s when Fisher returned to the topic and other authors started contributing to it [John Aldrich, David (1995)].
The phrase ANALYSIS OF VARIANCE appears in 1918 in Sir Ronald Aylmer Fisher, "The Causes of Human Variability," Eugenics Review, 10, 213-220 (David, 1995).

It appears in a paper by Sir Ronald Aylmer Fisher published in 1924, used as if Fisher expected the reader to know what an analysis of variance was. In a 1920 paper, Fisher used the phrase "analysis of total variance" as if he had to explain what such a procedure was.

In The History of Statistics: The Measurement of Uncertainty before 1900, Stephen M. Stigler writes, "Yule derived what we now, following Fisher, call the analysis of variance breakdown." [James A. Landau]

ASSOCIATION (in statistics) is found in 1900 in G. U. Yule, "On the Association of Attributes in Statistics," Philosophical Transactions of the Royal Society of London, Ser. A, 194, 257-319 (David, 1998).
AVERAGE ERROR: more to be added
BAR CHART occurs in Nov. 1914 in W. C. Brinton, "Graphic Methods for Presenting Data. IV. Time Charts," Engineering Magazine, 48, 229-241 (David, 1998).

The form of diagram, however, is much older; there is an example from William Playfair's Commercial and Political Atlas of 1786 at http://www.york.ac.uk/depts/maths/histstat/playfair.gif.

BAR GRAPH is dated 1924 in MWCD10.

Bar graph is found in 1925 in Statistics by B. F. Young: "Bar-graphs in the form of progress charts are used to represent a changing condition such as the output of a factory" (OED2).

BERNOULLI TRIAL is dated 1951 in MWCD10, although James A. Landau has found the phrases "Bernoullian trials" and "Bernoullian series of trials" in 1937 in Introduction to Mathematical Probability by J. V. Uspensky.
BIASED and UNBIASED. Biased errors and unbiased errors (meaning "errors with zero expectation") are found in 1897 in A. L. Bowley, "Relations Between the Accuracy of an Average and That of Its Constituent Parts," Journal of the Royal Statistical Society, 60, 855-866 (David, 1995).

Biased sample is found in 1911 in An Introduction to the Theory of Statistics by G. U. Yule: "Any sample, taken in the way supposed, is likely to be definitely biassed, in the sense that it will not tend to include, even in the long run, equal proportions of the A's and [alpha]'s in the original material" (OED2).

Biased sampling is found in F. Yates, "Some examples of biassed sampling," Ann. Eugen. 6 (1935) [James A. Landau].

BIMODAL appears in 1903 in S. R. Williams, "Variation in Lithobius Forficatus," American Naturalist, 37, 299-312 (David, 1998).
BINOMIAL DISTRIBUTION is found in 1911 in An Introduction to the Theory of Statistics by G. U. Yule: "The binomial distribution,..only becomes approximately normal when n is large, and this limitation must be remembered in applying the table..to cases in which the distribution is strictly binomial" (OED2).
BIVARIATE is found in 1920 in Biometrika XIII. 37: "Thus in 1885 Galton had completed the theory of bi-variate normal correlation" (OED2).
CENTRAL LIMIT THEOREM. In 1919 R. von Mises called the limit theorems Fundamentalsätze der Wahrscheinlichkeitsrechnung in a paper of the same name in Math Z. 4, 1-97.

Central limit theorem appears in the title "Ueber den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung," Math. Z., 15 (1920) by George Polya (1887-1985) [James A. Landau]. Polya apparently coined the term in this paper.

Central limit theorem appears in English in 1937 in Random Variables and Probability Distributions by H. Cramér (David, 1995).

CENTRAL TENDENCY is dated ca. 1928 in MWCD10.

Central tendency is found in 1929 in Kelley & Shen in C. Murchison, Found. Exper. Psychol. 838: "Some investigators have often preferred the median to the mean as a measure of central tendency" (OED2).

CHI SQUARE. Karl Pearson introduced the chi-squared test and the name for it in an article in 1900 in The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. Pearson had been in the habit of writing the exponent in the multivariate normal density as -1/2 chi-squared [James A. Landau, John Aldrich].
CLASSICAL PROBABILITY. This term for probability as defined by Laplace and earlier writers came into use in the 1930s when alternative definitions were widely canvassed. J. V. Uspensky (Introduction to Mathematical Probability, 1937, p. 8) gave the "classical definition," which he favored, and criticized the "new definitions" (von Mises) and "the attempt to build up the theory of probability as an axiomatic science" (Kolmogorov) [John Aldrich].
CLASSICAL statistical inference. The polar pair "classical" and "Bayesian" have figured in discussions of the foundations of statistical inference since the 1960s. The body of work to which "classical" was attached went back only to the 1920s and 1930s but, as Schlaifer wrote in 1959 (Probability and Statistics for Business Decisions, p. 607), "it is expounded in virtually every course on statistics [in the United States] and is adhered to by the great majority of practicing statisticians." Schlaifer and a few others were sponsoring a rejuvenated Bayesian alternative. The "classical" tag may have derived some authority from Neyman's "Outline of a Theory of Statistical Estimation based on the Classical Theory of Probability" (Philosophical Transactions of the Royal Society, 236, (1937), 333-380), one of the classics of classical statistics. The non-classical possibility Neyman had in mind and rejected was the Bayesian theory of Jeffreys. Confusingly, Neyman's "classical theory of probability" has more to do with Kolmogorov and von Mises than with Laplace [John Aldrich].
CLUSTER ANALYSIS is found in 1939 in Cluster Analysis by R. C. Tryon [James A. Landau].
The term COEFFICIENT OF VARIATION appears in 1896 in Karl Pearson, "Regression, Heredity, and Panmixia," Philosophical Transactions of the Royal Society of London, Ser. A. 187, 253-318 (David, 1995). The term is due to Pearson (Cajori 1919, page 382). According to the DSB, he introduced the term in this paper.
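As a modern gloss (the notation is ours, not Pearson's): the coefficient of variation is the standard deviation expressed as a percentage of the mean,

```latex
V = 100\,\frac{\sigma}{\mu}
```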
CONDITIONAL PROBABILITY is found in J. V. Uspensky, Introduction to Mathematical Probability, New York: McGraw-Hill, 1937, page 31:
Let A and B be two events whose probabilities are (A) and (B). It is understood that the probability (A) is determined without any regard to B when nothing is known about the occurrence or nonoccurrence of B. When it is known that B occurred, A may have a different probability, which we shall denote by the symbol (A, B) and call 'conditional probability of A, given that B has actually happened.'
[James A. Landau]
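Uspensky's (A, B) is written P(A | B) today and computed as P(A and B)/P(B). A minimal sketch, with a die example that is ours rather than Uspensky's:

```python
from fractions import Fraction

# Uspensky's (A, B) is the modern conditional probability
# P(A | B) = P(A and B) / P(B).
# Illustration: for one throw of a fair die, let
# A = "the die shows an even number", B = "the die shows more than 3".
A = {2, 4, 6}
B = {4, 5, 6}

p_B = Fraction(len(B), 6)          # P(B) = 3/6
p_A_and_B = Fraction(len(A & B), 6)  # P(A and B) = 2/6
p_A_given_B = p_A_and_B / p_B      # conditional probability of A given B
```

Here knowing that B occurred changes the probability of A from 1/2 to 2/3, exactly the situation Uspensky describes.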
CONFIDENCE INTERVAL was coined by Jerzy Neyman (1894-1981) in 1934 in "On the Two Different Aspects of the Representative Method," Journal of the Royal Statistical Society, 97, 558-625:
The form of this solution consists in determining certain intervals, which I propose to call the confidence intervals..., in which we may assume are contained the values of the estimated characters of the population, the probability of an error in a statement of this sort being equal to or less than 1 - (epsilon), where (epsilon) is any number 0 < (epsilon) < 1, chosen in advance.

CONSISTENCY. The term consistency applied to estimation was introduced by R. A. Fisher in "On the Mathematical Foundations of Theoretical Statistics" (Phil. Trans. R. Soc. 1922). Fisher wrote: "A statistic satisfies the criterion of consistency, if, when it is calculated from the whole population, it is equal to the required parameter."

In the modern literature this notion is usually called Fisher-consistency (a name suggested by Rao) to distinguish it from the more standard notion linked to the limiting behavior of a sequence of estimators. The latter is hinted at in Fisher's writings but was perhaps first set out rigorously by Hotelling in "The Consistency and Ultimate Distribution of Optimum Statistics," Transactions of the American Mathematical Society (1930). [This entry was contributed by John Aldrich, based on David (1995).]

CONTINGENCY TABLE was introduced by Karl Pearson in "On the Theory of Contingency and its Relation to Association and Normal Correlation," which appeared in Drapers' Company Research Memoirs (1904) Biometric Series I:
This result enables us to start from the mathematical theory of independent probability as developed in the elementary text books, and build up from it a generalised theory of association, or, as I term it, contingency. We reach the notion of a pure contingency table, in which the order of the sub-groups is of no importance whatever.
This citation was provided by James A. Landau.
CORRELATION, CORRELATION COEFFICIENT and COEFFICIENT OF CORRELATION. Francis Galton introduced the measurement of correlation (Hald, p. 604). The index of co-relation appears in 1888 in his "Co-Relations and Their Measurement," Proc. R. Soc., 45, 135-145: "The statures of kinsmen are co-related variables; thus, the stature of the father is correlated to that of the adult son,..and so on; but the index of co-relation ... is different in the different cases" (OED2). "Co-relation" soon gave way to "correlation" as in W. F. R. Weldon's "The Variations Occurring in Certain Decapod Crustacea-I. Crangon vulgaris," Proc. R. Soc., 47. (1889 - 1890), pp. 445-453.

The term coefficient of correlation was apparently originated by Edgeworth in 1892, according to Karl Pearson's "Notes on the History of Correlation" (reprinted in Pearson & Kendall, 1970). It appears in 1892 in F. Y. Edgeworth, "Correlated Averages," Philosophical Magazine, 5th Series, 34, 190-204.

Correlation coefficient appears in a paper published in 1895 [James A. Landau].

The OED2 shows a use of coefficient of correlation in 1896 by Pearson in Proc. R. Soc. LIX. 302: "Let r0 be the coefficient of correlation between parent and offspring." David (1995) gives the 1896 paper by Karl Pearson, "Regression, Heredity, and Panmixia," Phil. Trans. R. Soc., Ser. A. 187, 253-318. This paper introduced the product moment formula for estimating correlations--Galton and Edgeworth had used different methods.

Partial correlation. G. U. Yule introduced "net coefficients" for "coefficients of correlation between any two of the variables while eliminating the effects of variations in the third" in "On the Correlation of Total Pauperism with Proportion of Out-Relief" (in Notes and Memoranda) Economic Journal, Vol. 6, (1896), pp. 613-623. Pearson argued that partial and total are more appropriate than net and gross in Karl Pearson & Alice Lee "On the Distribution of Frequency (Variation and Correlation) of the Barometric Height at Divers Stations," Phil. Trans. R. Soc., Ser. A, 190 (1897), pp. 423-469. Yule went fully partial with his 1907 paper "On the Theory of Correlation for any Number of Variables, Treated by a New System of Notation," Proc. R. Soc. Series A, 79, pp. 182-193.

Multiple correlation. At first multiple correlation referred only to the general approach, e.g. by Yule in Economic Journal (1896). The coefficient arrived later. "On the Theory of Correlation" (J. Royal Statist. Soc., 1897, p. 833) refers to a coefficient of double correlation R1 (the correlation of the first variable with the other two). Yule (1907) discussed the coefficient of n-fold correlation R1(23...n). Pearson used the phrases "coefficient of multiple correlation" in his 1914 "On Certain Errors with Regard to Multiple Correlation Occasionally Made by Those Who Have not Adequately Studied this Subject," Biometrika, 10, pp. 181-187, and "multiple correlation coefficient" in his 1915 paper "On the Partial Correlation Ratio," Proc. R. Soc. Series A, 91, pp. 492-498.

[This entry was largely contributed by John Aldrich.]

The term CORRELOGRAM was introduced by H. Wold in 1938 (A Study in the Analysis of Stationary Time Series). There is a plot of empirical serial correlations, i.e. an empirical correlogram, in Yule's "Why Do We Sometimes Get Nonsense Correlations between Time-series ..." Journal of the Royal Statistical Society, 89, (1926), 1-69 (David 2001).
COVARIANCE is found in 1930 in The Genetical Theory of Natural Selection by R. A. Fisher (David, 1998).

Earlier uses of the term covariance are found in mathematics, in a non-statistical sense.

The term CRITERION OF SUFFICIENCY was used by Sir Ronald Aylmer Fisher in his paper "On the Mathematical Foundations of Theoretical Statistics," in Philosophical Transactions of the Royal Society, April 19, 1922: "The complete criterion suggested by our work on the mean square error (7) is: -- That the statistic chosen should summarise the whole of the relevant information supplied by the sample. This may be called the Criterion of Sufficiency" [James A. Landau].
DECILE (in statistics) was introduced by Francis Galton (Hald, p. 604).

Decile appears in 1882 in Francis Galton, Rep. Brit. Assoc. 1881 245: "The Upper Decile is that which is exceeded by one-tenth of an infinitely large group, and which the remaining nine-tenths fall short of. The Lower Decile is the converse of this" (OED2).

DEGREES OF FREEDOM. (See also chi-squared, F-distribution and Student's t-distribution.) Fisher introduced degrees of freedom in connection with Pearson's chi-squared test in the 1922 paper "On the Interpretation of chi-squared from Contingency Tables, and the Calculation of P," J. Royal Statist. Soc., 85, pp. 87-94. He applied the number of degrees of freedom to distributions related to chi-squared--Student's distribution and his own z distribution--in his 1924 paper, "On a Distribution Yielding the Error Functions of Several Well Known Statistics," Proceedings of the International Congress of Mathematics, Toronto, 2, 805-813 [John Aldrich].
DEPENDENT VARIABLE. Subordinate variable appears in English in the 1816 translation of Differential and Integral Calculus by Lacroix: "Treating the subordinate variables as implicit functions of the indepdndent [sic] ones" (OED2).

Dependent variable appears in 1831 in the second edition of Elements of the Differential Calculus (1836) by John Radford Young: "On account of this dependence of the value of the function upon that of the variable the former, that is y, is called the dependent variable, and the latter, x, the independent variable" [James A. Landau].

DIRECT VARIATION. Directly is found in 1743 in W. Emerson, Doctrine of Fluxions: "The Times of describing any Spaces uniformly are as the Spaces directly, and the Velocities reciprocally" (OED2).

Directly proportional is found in 1796 in A Mathematical and Philosophical Dictionary: "Quantities are said to be directly proportional, when the proportion is according to the order of the terms" (OED2).

Direct variation is found in 1856 in Ray's higher arithmetic. The principles of arithmetic, analyzed and practically applied by Joseph Ray (1807-1855):

Variation is a general method of expressing proportion often used, and is either direct or inverse. Direct variation exists between two quantities when they increase together, or decrease together. Thus the distance a ship goes at a uniform rate, varies directly as the time it sails; which means that the ratio of any two distances is equal to the ratio of the corresponding times taken in the same order. Inverse variation exists between two quantities when one increases as the other decreases. Thus, the time in which a piece of work will be done, varies inversely as the number of men employed; which means that the ratio of any two times is equal to the ratio of the numbers of men employed for these times, taken in reverse order.
This citation was taken from the University of Michigan Digital Library [James A. Landau].
DISCRIMINANT ANALYSIS is found in Palmer O. Johnson, "The quantification of qualitative data in discriminant analysis," J. Am. Stat. Assoc. 45, 65-76 (1950).

See also W. G. Cochran and C. I. Bliss, "Discriminant functions with covariance," Ann. Math. Statist. 19 (1948) [James A. Landau].

DISPERSION (in statistics) is found in 1876 in Catalogue of the Special Loan Collection of Scientific Apparatus at the South Kensington Museum by Francis Galton (David, 1998).
The term DISTRIBUTION FUNCTION of a random variable is a translation of the Verteilungsfunktion of R. von Mises "Grundlagen der Wahrscheinlichkeitsrechnung," Math. Zeit. 5, (1919) 52-99.

The English term appears in J. L. Doob's "The Limiting Distributions of Certain Statistics," Annals of Mathematical Statistics, 6, (1935), 160-169.

The term DUMMY VARIABLE is often used when describing the status of a variable like x in a definite integral. A. Church seems to be describing an established usage when he wrote in 1942, "A variable is free in a given expression ... if the expression can be considered as representing a function with that variable as an argument. In the contrary case the variable is called a bound (or apparent or dummy) variable." ("Differentials", American Mathematical Monthly, 49, 390.) [John Aldrich].

In regression analysis a DUMMY VARIABLE indicates the presence (value 1) or absence (value 0) of an attribute.

A JSTOR search found "dummy variables" for social class and for region in H. S. Houthakker's "The Econometrics of Family Budgets" Journal of the Royal Statistical Society A, 115, (1952), 1-28.

A 1957 article by D. B. Suits, "Use of Dummy Variables in Regression Equations" Journal of the American Statistical Association, 52, 548-551, consolidated both the device and the name.

The International Statistical Institute's Dictionary of Statistical Terms objects to the name: the term is "used, rather laxly, to denote an artificial variable expressing qualitative characteristics .... [The] word 'dummy' should be avoided."

Apparently these variables were not dummy enough for Kendall & Buckland, for whom a dummy variable signifies "a quantity written in a mathematical expression in the form of a variable although it represents a constant", e.g. when the constant in the regression equation is represented as a coefficient times a variable that is always unity.

The indicator device, without the name "dummy variable" or any other, was also used by writers on experiments who put the analysis of variance into the format of the general linear hypothesis, e.g. O. Kempthorne in his Design and Analysis of Experiments (1952) [John Aldrich].
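The indicator device described above can be sketched in a few lines. This is a minimal modern illustration, with made-up data, of a Suits-style dummy variable in a least-squares regression:

```python
import numpy as np

# Hypothetical data: y is some response, and the dummy column is
# 1 when the attribute is present, 0 when it is absent.
y = np.array([10.0, 12.0, 11.0, 20.0, 22.0, 21.0])
dummy = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Design matrix: a constant column plus the 0/1 indicator column.
X = np.column_stack([np.ones_like(y), dummy])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

intercept, group_effect = coef
# With a single dummy, intercept is the mean of the dummy = 0 group
# and group_effect is the difference between the two group means.
```

The fitted coefficient on the dummy is simply the difference between the two group means, which is why the device carries qualitative information into a quantitative equation.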

EFFICIENCY. The terms efficiency and efficient applied to estimation were introduced by R. A. Fisher in "On the Mathematical Foundations of Theoretical Statistics" (Phil. Trans. R. Soc. 1922). He described the criterion of efficiency as "satisfied by those statistics which, when derived from large samples, tend to a normal distribution with the least possible standard deviation." He also wrote: "To calculate the efficiency of any given method, we must therefore know the probable error of the statistic calculated by that method, and that of the most efficient statistic which could be used. The square of the ratio of these two quantities then measures the efficiency." Fisher seems not to have known that such calculations had been done by Gauss a century earlier (Gauss (1816) Bestimmung der Genauigkeit der Beobachtungen). However the idea of efficiency in extracting information was novel. [This entry was contributed by John Aldrich, based on David (1995).]
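In modern notation Fisher's verbal rule can be written as follows (the symbols are ours): if PE_T is the probable error of a statistic T in large samples and PE_0 that of the most efficient statistic, then

```latex
\mathrm{Eff}(T) \;=\; \left(\frac{\mathrm{PE}_0}{\mathrm{PE}_T}\right)^{2}, \qquad 0 \le \mathrm{Eff}(T) \le 1 .
```

Since the most efficient statistic has the smallest probable error, the ratio never exceeds one.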
EMPTY SET is found in Walter J. Bruns, "The Introduction of Negative Numbers," The Mathematics Teacher, October 1940: "For our purposes we still need a symbol for an 'empty' set, that means for a multitude containing no element."

Dorothy Geddes and Sally I. Lipsey, "The Hazards of Sets," The Mathematics Teacher, October 1969 has: "The fact that mathematicians refer to the empty set emphasizes the rather unique nature of this set."

An older term is null set, q. v.

EQUIPROBABLE was used in 1921 by John Maynard Keynes in A Treatise on Probability: "A set of exclusive and exhaustive equiprobable alternatives" (OED2).
ESTIMATION. Long before the terminology stabilized around estimation the activity was called calculation, determination or fitting.

The terms estimation and estimate were introduced in R. A. Fisher's "On the Mathematical Foundations of Theoretical Statistics" (Phil. Trans. R. Soc. 1922). He writes (none too helpfully!): "Problems of estimation are those in which it is required to estimate the value of one or more of the population parameters from a random sample of the population." Fisher uses estimate as a substantive sparingly in the paper.

The phrase unbiassed estimate appears in Fisher's Statistical Methods for Research Workers (1925, p. 54) although the idea is much older.

The expression best linear unbiased estimate appears in 1938 in F. N. David and J. Neyman, "Extension of the Markoff Theorem on Least Squares," Statistical Research Memoirs, 2, 105-116. Previously in his "On the Two Different Aspects of the Representative Method" (Journal of the Royal Statistical Society, 97, 558-625) Neyman had used mathematical expectation estimate for unbiased estimate and best linear estimate for best linear unbiased estimate (David, 1995).

The term estimator was introduced in 1939 in E. J. G. Pitman, "The Estimation of the Location and Scale Parameters of a Continuous Population of any Given Form," Biometrika, 30, 391-421. Pitman (pp. 398 & 403) used the term in a specialised sense: his estimators are estimators of location and scale with natural invariance properties. Now estimator is used in a much wider sense so that Neyman's best linear unbiased estimate would be called a best linear unbiased estimator (David, 1995). [This entry was contributed by John Aldrich.]

EVENT has been in probability in English from the beginning. A. De Moivre's The Doctrine of Chances (1718) begins "The Probability of an Event is greater or less, according to the number of chances by which it may happen, compared with the whole number of chances by which it may either happen or fail."

Event took on a technical existence when Kolmogorov in the Grundbegriffe der Wahrscheinlichkeitsrechnung (1933) identified "elementary events" ("elementare Ereignisse") with the elements of a collection E (now called the "sample space") and "random events" ("zufällige Ereignisse") with the elements of a set of subsets of E [John Aldrich].

EXPECTATION. According to A. W. F. Edwards, expectatio occurs in 1657 in Huygens's De Ratiociniis in Ludo Aleae (David 1995).

According to Burton (p. 461), the word expectatio first appears in van Schooten's translation of a tract by Huygens.

The two references above point to the same text, as Huygens's De Ratiociniis in Ludo Aleae was a translation by van Schooten. NB: the word expectatio is used quite frequently throughout the text.
This is the Latin translation by Van Schooten of the first proposition:

Si a vel b expectem, quorum utriusque aeque facile mihi obtingere possit. expectatio mea dicenda est (a+b)/2
This is the Dutch text of Huygens' Van Rekeningh in Spelen van Geluck. This text was published in 1660 but had already been written in 1656.
Als ick gelijcke kans hebbe om a of b te hebben, dit is my so veel weerdt als (a+b)/2
The literal translation of the Dutch text is: If I have an equal chance to get either a or b, this to me is worth as much as (a+b)/2. There is no explicit mention of expectation, only of value, but as the rest of the explanation of the first proposition concentrates on the possible outcomes of a game of chance, expectation is implicitly present.

Expectation appears in English in Browne's 1714 translation of Huygens's De Ratiociniis in Ludo Aleae (David 1995).
This is Browne's 1714 translation of the first proposition:

If I expect a or b, and have an equal chance of gaining either of them, my Expectation is worth (a+b)/2

See also mathematical expectation.
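Huygens's first proposition is, in modern terms, the expected value of a fair two-outcome lottery. A minimal sketch (the numbers are ours):

```python
from fractions import Fraction

# Huygens's first proposition: an equal chance at a or b is worth (a+b)/2.
def expectation(outcomes):
    """outcomes: list of (value, probability) pairs."""
    return sum(p * x for x, p in outcomes)

a, b = Fraction(3), Fraction(7)
value = expectation([(a, Fraction(1, 2)), (b, Fraction(1, 2))])
# value equals (a + b) / 2, as Huygens states
```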

EXTREME VALUE appears in E. J. Gumbel, "Les valeurs extrêmes des distributions statistiques," Ann. Inst. H. Poincaré, 5 (1934).

See also L. H. C. Tippett, "On the extreme individuals and the range of samples taken from a normal population," Biometrika 17 (1925) [James A. Landau].

F DISTRIBUTION. The F distribution was tabulated - and the letter introduced - in G. W. Snedecor's Calculation and Interpretation of Analysis of Variance and Covariance (1934) (David, 1995). The letter was chosen to honor Fisher.

The term F distribution is found in Leo A. Aroian, "A study of R. A. Fisher's z distribution and the related F distribution," Ann. Math. Statist. 12, 429-448 (1941).

The term FACTOR ANALYSIS was introduced by Louis L. Thurstone (1887-1955) in 1931 in "Multiple Factor Analysis," Psychological Review, 38, 406-427: "It is the purpose of this paper to describe a more generally applicable method of factor analysis which has no restrictions as regards group factors and which does not restrict the number of general factors that are operative in producing the correlations" (OED2).
FIDUCIAL PROBABILITY and FIDUCIAL DISTRIBUTION first appeared in R. A. Fisher's 1930 paper "Inverse Probability," Proceedings of the Cambridge Philosophical Society, 26, 528-535 (David (2001)).
The term FLUCTUATION was introduced by F. Y. Edgeworth in 1885 as a measure of dispersion. The fluctuation equals 2s², which is the modulus squared. It could be interpreted as a precursor of Fisher's variance.
FREQUENCY DISTRIBUTION is found in 1895 in Karl Pearson, Phil. Trans. R. Soc. A. CLXXXVI. 412: "A method is given of expressing any frequency distribution by a series of differences of inverse factorials with arbitrary constants" (OED2).
The term FREQUENTIST (one who believes that the probability of an event should be defined as the limit of its relative frequency in a large number of trials) was used by M. G. Kendall in 1949 in Biometrika XXXVI. 104: "It might be thought that the differences between the frequentists and the non-frequentists (if I may call them such) are largely due to the differences of the domains which they purport to cover" (OED2).
GAUSSIAN CURVE (normal curve) appears in a 1902 paper by Karl Pearson [James A. Landau].

Gaussian distribution and Gaussian law were used by Karl Pearson in 1905 in Biometrika IV: "Many of the other remedies which have been proposed to supplement what I venture to call the universally recognised inadequacy of the Gaussian law .. cannot .. effectively describe the chief deviations from the Gaussian distribution" (OED2).

In an essay in the 1971 book Reconsidering Marijuana, Carl Sagan, using the pseudonym "Mr. X," wrote, "I can remember one occasion, taking a shower with my wife while high, in which I had an idea on the origins and invalidities of racism in terms of gaussian distribution curves. I wrote the curves in soap on the shower wall, and went to write the idea down."

The name GAUSS-MARKOV THEOREM for the chief result on least squares and best linear unbiassed estimation in the linear (regression) model has a curious history. David (1998) refers to H. Scheffé's 1959 book Analysis of Variance where the expression "Gauss-Markoff theorem" appears. Before that the name "Markoff theorem" had been popularized by J. Neyman, starting with his "On the Two Different Aspects of the Representative Method" (Journal of the Royal Statistical Society, 97, 558-625). Neyman thought that this contribution from the Russian A. A. Markov had been overlooked in the West. However in 1949 Plackett (Biometrika, 36, 149-157) showed that Markov had done no more than Gauss nearly a century before in 1821/3. (In the nineteenth century the theorem was often referred to as "Gauss's second proof of the method of least squares" - the "first" being a Bayesian argument Gauss published in 1809). Following Plackett, a few authors adopted the expression "Gauss theorem" but "Markov" was well-entrenched and the compromise "Gauss-Markov theorem" has become standard. [This entry was contributed by John Aldrich.]
GEOMETRIC MEAN. The term geometrical mean is found in the 1771 edition of the Encyclopaedia Britannica [James A. Landau].
The term GOODNESS OF FIT is found in the sentence, "The 'percentage error' in ordinate is, of course, only a rough test of the goodness of fit, but I have used it in default of a better." This citation is a footnote in Karl Pearson, "Contributions to the Mathematical Theory of Evolution II Skew Variation in Homogeneous Material," which was in Philosophical Transactions of the Royal Society of London (1895) Series A, vol 186, pp 343-414 [James A. Landau].
The term HARMONIC MEAN is due to Archytas of Tarentum, according to the University of St. Andrews website, which also states that it had been called sub-contrary in earlier times.

The term was also used by Aristotle.

According to the Catholic Encyclopedia, the word harmonic first appears in a work on conics by Philippe de la Hire (1640-1718) published in 1685.

Harmonical mean is found in English in the 1828 Webster dictionary:

Harmonical mean, in arithmetic and algebra, a term used to express certain relations of numbers and quantities, which are supposed to bear an analogy to musical consonances.
Harmonic mean is found in 1851 in Problems in illustration of the principles of plane coordinate geometry by William Walton [University of Michigan Digital Library].

Harmonic mean is also found in 1851 in The principles of the solution of the Senate-house 'riders,' exemplified by the solution of those proposed in the earlier parts of the examinations of the years 1848-1851 by Francis James Jameson: "Prove that the discount on a sum of money is half the harmonic mean between the principal and the interest" [University of Michigan Digital Library].
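Jameson's rider can be verified directly: the true discount on a sum is P - P/(1+r), and this equals half the harmonic mean 2PI/(P+I) of the principal P and the interest I = Pr. A short check with illustrative numbers of our own choosing:

```python
from fractions import Fraction

# Jameson's 1851 rider: the (true) discount on a sum equals half the
# harmonic mean of the principal and the interest on it.
P = Fraction(100)            # principal (the sum due)
r = Fraction(1, 20)          # 5% interest over the period (illustrative)
I = P * r                    # interest on the principal
discount = P - P / (1 + r)   # true discount = principal - present value

harmonic_mean = 2 * P * I / (P + I)
# discount == harmonic_mean / 2, as the rider asserts
```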

HETERO- and HOMOSCEDASTICITY. The terms heteroscedasticity and homoscedasticity were introduced in 1905 by Karl Pearson in "On the general theory of skew correlation and non-linear regression," Drapers' Company Res. Mem. (Biometric Ser.) II. Pearson wrote, "If ... all arrays are equally scattered about their means, I shall speak of the system as a homoscedastic system, otherwise it is a heteroscedastic system." The words derive from the Greek skedastos (capable of being scattered).

Many authors prefer the spelling heteroskedasticity. J. Huston McCulloch (Econometrica 1985) discusses the linguistic aspects and decides for the k-spelling. Pearson recalled that when he set up Biometrika in 1901 Edgeworth had insisted the name be spelled with a k. By 1932 when Econometrica was founded standards had fallen or tastes had changed. [This entry was contributed by John Aldrich, referring to OED2 and David, 1995.]

HISTOGRAM. The term histogram was coined by Karl Pearson.

In Philos. Trans. R. Soc. A. CLXXXVI, (1895) 399 Pearson explained that the term was "introduced by the writer in his lectures on statistics as a term for a common form of graphical representation, i.e., by columns marking as areas the frequency corresponding to the range of their base."

S. M. Stigler writes in his History of Statistics that Pearson used the term in his 1892 lectures on the geometry of statistics.

The earliest citation in the OED2 is from 1891, quoted in E. S. Pearson's Karl Pearson (1938).

HYPOTHESIS TESTING. Test of hypothesis is found in 1928 in J. Neyman and E. S. Pearson, "On the use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference. Part I," Biometrika, 20 A, 175-240 (David, 1995).
INDEPENDENT EVENT and DEPENDENT EVENT are found in 1738 in The Doctrine of Chances by De Moivre: "Two Events are independent, when they have no connexion one with the other, and that the happening of one neither forwards nor obstructs the happening of the other. Two events are dependent, when they are so connected together as that the Probability of either's happening is alter'd by the happening of the other."
INDEPENDENT VARIABLE is found in the 1816 translation of Differential and Integral Calculus by Lacroix: "Treating the subordinate variables as implicit functions of the independent ones" (OED2).
INFORMATION, AMOUNT OF, QUANTITY OF in the theory of statistical estimation. R. A. Fisher first wrote about "the whole of the information which a sample provides" in 1920 (Mon. Not. Roy. Ast. Soc., 80, 769). In 1922-5 he developed the idea that information could be given quantitative expression as minus the expected value of the second derivative of the log-likelihood. The formula for "the amount of information in a single observation" appears in the 1925 "Theory of Statistical Estimation," Proc. Cambr. Philos. Soc. 22. 700-725. In the modern literature the qualification Fisher's information is common, distinguishing Fisher's measure from others originating in the theory of communication as well as in statistics. [John Aldrich and David (1995)].
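Fisher's quantity — minus the expected value of the second derivative of the log-likelihood — can be checked numerically. A sketch for a single Bernoulli(p) observation, where the closed form is I(p) = 1/(p(1 − p)); the finite-difference scheme and parameter value are illustrative assumptions:

```python
import math

def log_lik(x, p):
    """Log-likelihood of one Bernoulli(p) observation x in {0, 1}."""
    return x * math.log(p) + (1 - x) * math.log(1 - p)

def fisher_information(p, h=1e-5):
    """I(p) = -E[ d^2/dp^2 log f(x; p) ], second derivative by
    central differences, expectation over x in {0, 1}."""
    total = 0.0
    for x, prob in ((0, 1 - p), (1, p)):
        d2 = (log_lik(x, p + h) - 2 * log_lik(x, p) + log_lik(x, p - h)) / h**2
        total += prob * d2
    return -total

p = 0.3
print(abs(fisher_information(p) - 1 / (p * (1 - p))) < 1e-3)  # True
```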
INTEGER and WHOLE NUMBER. Writing in Latin, Fibonacci used numerus sanus.

According to Heinz Lueneburg, the term numero sano "was used extensively by Luca Pacioli in his Summa. Before Pacioli, it was already used by Piero della Francesca in his Trattato d'abaco. I also find it in the second edition of Pietro Cataneo's Le pratiche delle due prime matematiche of 1567. I haven't seen the first edition. Counting also Fibonacci's Latin numerus sanus, the word sano was used for at least 350 years to denote an integral (untouched, virginal) number. Besides the words sanus, sano, the words integer, intero, intiero were also used during that time."

The first citation for whole number in the OED2 is from about 1430 in Art of Nombryng ix. EETS 1922:

Of nombres one is lyneal, ano(th)er superficialle, ano(th)er quadrat, ano(th)er cubike or hoole.
In the above quotation (th) represents a thorn. In this use, whole number has the obsolete definition of "a number composed of three prime factors," according to the OED2.

Whole number is found in its modern sense in the title of one of the earliest and most popular arithmetics in the English language, which appeared in 1537 at St. Albans. The work is anonymous, and its long title runs as follows: "An Introduction for to lerne to reken with the Pen and with the Counters, after the true cast of arismetyke or awgrym in hole numbers, and also in broken" (Julio González Cabillón).

Oresme used intégral.

Integer was used as a noun in English in 1571 by Thomas Digges (1546?-1595) in A geometrical practise named Pantometria: "The containing circles Semidimetient being very nighe 11 19/21 for exactly nether by integer nor fraction it can be expressed" (OED2).

Integral number appears in 1658 in Phillips: "In Arithmetick integral numbers are opposed to fraction[s]" (OED2).

Whole number is most frequently defined as Z+, although it is sometimes defined as Z. In Elements of the Integral Calculus (1839) by J. R. Young, the author refers to "a whole number or 0" but later refers to "a positive whole number."

INTERQUARTILE RANGE is found in 1882 in Francis Galton, "Report of the Anthropometric Committee," Report of the 51st Meeting of the British Association for the Advancement of Science, 1881, pp. 245-260: "This gave the upper and lower 'quartile' values, and consequently the 'interquartile' range (which is equal to twice the 'probable error')" (OED2).
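Galton's parenthesis — the interquartile range equals twice the probable error — holds exactly for a normal distribution, where the quartiles sit symmetrically about the mean. A quick check (a unit normal is assumed for illustration):

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)
upper_quartile = dist.inv_cdf(0.75)
lower_quartile = dist.inv_cdf(0.25)
interquartile_range = upper_quartile - lower_quartile

# The probable error is half the central 50% interval, i.e. the
# upper quartile's distance from the mean.
probable_error = upper_quartile
print(abs(interquartile_range - 2 * probable_error) < 1e-12)  # True
print(round(probable_error, 4))  # 0.6745
```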
INTERSECTION (in set theory) is found in Webster's New International Dictionary of 1909.
k-STATISTICS. k-statistics are sample cumulants and were introduced with them by R. A. Fisher in 1929. The term "k-statistic" appears in the 1932 edition of his Statistical Methods for Research Workers [John Aldrich].
KOLMOGOROV-SMIRNOV TEST appears in F. J. Massey Jr., "The Kolmogorov-Smirnov test of goodness of fit," J. Amer. Statist. Ass. 46 (1951).

See also W. Feller, "On the Kolmogorov-Smirnov limit theorems for empirical distributions," Ann. Math. Statist. 19 (1948) [James A. Landau].

KURTOSIS was used by Karl Pearson in 1905 in "Das Fehlergesetz und seine Verallgemeinerungen durch Fechner und Pearson. A Rejoinder," Biometrika, 4, 169-212, in the phrase "the degree of kurtosis." He states therein that he has used the term previously (OED2).
The term LATIN SQUARE was coined by Euler (as quarré latin) in 1782 in Verh. uitgegeven door het Zeeuwsch Genootschap d. Wetensch. te Vlissingen.

Latin square appears in English in 1890 in the title of a paper by Arthur Cayley, "On Latin Squares" in Messenger of Mathematics.

The term was introduced into statistics by R. A. Fisher, according to Tankard (p. 112). Fisher used the term in 1925 in Statistical Methods Res. Workers (OED2).

Graeco-Latin square appears in 1934 in R. A. Fisher and F. Yates, "The 6 x 6 Latin Squares," Proceedings of the Cambridge Philosophical Society 30, 492-507.

LAW OF LARGE NUMBERS. La loi des grands nombres appears in 1835 in Siméon-Denis Poisson (1781-1840), "Recherches sur la Probabilité des Jugements, Principalement en Matière Criminelle," Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences, 1, 473-494 (James, 1998).

According to Porter (p. 12), Poisson coined the term in 1835.

LEPTOKURTIC (and platykurtic and mesokurtic) were introduced by Karl Pearson, who wrote in Biometrika (1905) IV. 173: "Given two frequency distributions which have the same variability as measured by the standard deviation, they may be relatively more or less flat-topped than the normal curve. If more flat-topped I term them platykurtic, if less flat-topped leptokurtic, and if equally flat-topped mesokurtic" (OED2).
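Pearson's three classes can be illustrated by computing the sample kurtosis (the fourth moment about the mean divided by the squared second moment, which is 3 for the normal curve) for a flat-topped, a normal, and a peaked distribution. A sketch with simulated samples — the distributions, sample sizes, and seed are illustrative choices:

```python
import random

def kurtosis(xs):
    """Sample kurtosis m4 / m2**2 (equals 3 for the normal curve)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2

random.seed(0)
uniform = [random.uniform(-1, 1) for _ in range(100_000)]   # platykurtic, ~1.8
normal = [random.gauss(0, 1) for _ in range(100_000)]       # mesokurtic, ~3
laplace = [random.expovariate(1) * random.choice((-1, 1))
           for _ in range(100_000)]                         # leptokurtic, ~6

print(kurtosis(uniform) < kurtosis(normal) < kurtosis(laplace))  # True
```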
LIKELIHOOD. The term was first used in its modern sense in R. A. Fisher's "On the 'Probable Error' of a Coefficient of Correlation Deduced from a Small Sample," Metron, 1, (1921), 3-32.

Formerly, likelihood was a synonym for probability, as it still is in everyday English. (See the entry on maximum likelihood and the passage quoted there for Fisher's attempt to distinguish the two. In 1921 Fisher referred to the value that maximizes the likelihood as "the optimum.")

Likelihood first appeared in a Bayesian context in H. Jeffreys's Theory of Probability (1939) [John Aldrich, based on David (2001)].

LIKELIHOOD PRINCIPLE. This expression burst into print in 1962, appearing in "Likelihood Inference and Time Series" by G. A. Barnard, G. M. Jenkins, C. B. Winsten (Journal of the Royal Statistical Society A, 125, 321-372), "On the Foundations of Statistical Inference" by A. Birnbaum (Journal of the American Statistical Association, 57, 269-306), and L. J. Savage et al. (1962), The Foundations of Statistical Inference. It must have been current for some time because the Savage volume records a conference held in 1959; the term appears in Savage's contribution, so the expression may have been his coinage.

The principle (without a name) can be traced back to R. A. Fisher's writings of the 1920s though its clearest earlier manifestation is in Barnard's 1949 "Statistical Inference" (Journal of the Royal Statistical Society. Series B, 11, 115-149). On these earlier outings the principle attracted little attention.

The LIKELIHOOD RATIO figured in the test theory of J. Neyman and E. S. Pearson from the beginning, "On the Use of Certain Test Criteria for Purposes of Statistical Inference, Part I" Biometrika, (1928), 20A, 175-240. They usually referred to it as the likelihood although the phrase "likelihood ratio" appears incidentally in their "Problem of k Samples," Bulletin Académie Polonaise des Sciences et Lettres, A, (1931) 460-481. This phrase was more often used by others writing about Neyman and Pearson's work, e.g. Brandner "A Test of the Significance of the Difference of the Correlation Coefficients in Normal Bivariate Samples," Biometrika, 25, (1933), 102-109.

The standing of "likelihood ratio" was confirmed by S. S. Wilks's "The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses," Annals of Mathematical Statistics, 9, (1938), 60-62 [John Aldrich, based on David (2001)].

LOSS and LOSS FUNCTION in statistical decision theory. In the paper establishing the subject ("Contributions to the Theory of Statistical Estimation and Testing Hypotheses," Annals of Mathematical Statistics, 10, 299-326) Wald referred to "loss" but used "weight function" for the (modern) loss function. He continued to use weight function, for instance in his book Statistical Decision Functions (1950), while others adopted loss function. Arrow, Blackwell & Girshick's "Bayes and Minimax Solutions of Sequential Decision Problems" (Econometrica, 17, (1949) 213-244) wrote L rather than W for the function and called it the loss function. A paper by Hodges & Lehmann ("Some Problems in Minimax Point Estimation," Annals of Mathematical Statistics, 21, (1950), 182-197) used loss function more freely but retained Wald's W. [John Aldrich, based on David (2001) and JSTOR]
MATHEMATICAL EXPECTATION was used by De Morgan in 1838 in An Essay on Probabilities (1841) 97: "The balance is the average required, and is known by the name of mathematical expectation" (OED2).

See also expectation.

MATHEMATICAL STATISTICS. Mathematische Statistik is found in 1867 in the title Mathematische Statistik und deren Anwendung auf National-Oekonomie und Versicherungs-Wissenschaft by T. Wittstein (David, 1998).
The term MAXIMUM LIKELIHOOD was introduced by Sir Ronald Aylmer Fisher in his paper "On the Mathematical Foundations of Theoretical Statistics," in Philosophical Transactions of the Royal Society, April 19, 1922. In this paper he made clear for the first time the distinction between the mathematical properties of "likelihoods" and "probabilities" (DSB).
The solution of the problems of calculating from a sample the parameters of the hypothetical population, which we have put forward in the method of maximum likelihood, consists, then, simply of choosing such values of these parameters as have the maximum likelihood. Formally, therefore, it resembles the calculation of the mode of an inverse frequency distribution. This resemblance is quite superficial: if the scale of measurement of the hypothetical quantity be altered, the mode must change its position, and can be brought to have any value, by an appropriate change of scale; but the optimum, as the position of maximum likelihood may be called, is entirely unchanged by any such transformation. Likelihood also differs from probability in that it is not a differential element, and is incapable of being integrated: it is assigned to a particular point of the range of variation, not to a particular element of it.

MEAN occurs in English in the sense of a geometric mean in a Middle English manuscript of circa 1450 known as The Art of Numbering: "Lede the rote of o quadrat into the rote of the oþer quadrat, and þan wolle þe meene shew" [Mark Dunn].

In 1571, A geometrical practise named Pantometria by Thomas Digges (1546?-1595) has: "When foure magnitudes are...in continual proportion, the first and the fourth are the extremes, and the second and thirde the meanes" (OED2).

Mean is found in 1755 in Thomas Simpson, "An ATTEMPT to shew the Advantage, arising by Taking the Mean of a Number of Observations, in practical Astronomy," Philosophical Transactions of the Royal Society of London.

MEAN ERROR. The 1845 Encyclopedia Metropolitana has "mean risk of error" (OED2).

Mean error is found in 1853 in A dictionary of arts, manufactures, and mines; containing a clear exposition of their principles and practice by Andrew Ure [University of Michigan Digital Library].

Mean error is found in English in an 1857 translation of Gauss's Theoria motus: "Consequently, if we desire the greatest accuracy, it will be necessary to compute the geocentric place from the elements for the same time, and afterwards to free it from the mean error A, in order that the most accurate position may be obtained. But it will in general be abundantly sufficient if the mean error is referred to the observation nearest to the mean time" [University of Michigan Digital Library].

In 1894 in Phil. Trans. Roy. Soc, Karl Pearson has "error of mean square" as an alternate term for "standard-deviation" (OED2).

In Higher Mathematics for Students of Chemistry and Physics (1912), J. W. Mellor writes:

In Germany, the favourite method is to employ the mean error, which is defined as the error whose square is the mean of the squares of all the errors, or the "error which, if it alone were assumed in all the observations indifferently, would give the same sum of the squares of the errors as that which actually exists." ...

The mean error must not be confused with the "mean of the errors," or, as it is sometimes called, the average error, another standard of comparison defined as the mean of all the errors regardless of sign.

In a footnote, Mellor writes, "Some writers call our 'average error' the 'mean error,' and our 'mean error' the 'error of mean square'" [James A. Landau].
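Mellor's two standards of comparison are, in modern terms, the root-mean-square error and the mean absolute error. A small sketch with illustrative error values:

```python
import math

# Illustrative residuals (assumed data, not from the source):
errors = [0.3, -0.1, 0.4, -0.5, 0.2]

# Mellor's "mean error": the error whose square is the mean of the
# squares of all the errors (root-mean-square error).
mean_error = math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# Mellor's "average error": the mean of all the errors regardless
# of sign (mean absolute error).
average_error = sum(abs(e) for e in errors) / len(errors)

print(round(mean_error, 4), round(average_error, 4))  # 0.3317 0.3
```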
MEAN SQUARE is found in 1845 Encycl. Metrop. (OED2).
The term MEAN SQUARE DEVIATION (apparently meaning variance) appears in a paper published by Sir Ronald Aylmer Fisher in 1920 [James A. Landau].
MEDIAN (in statistics). Valeur médiane was used by Antoine A. Cournot in 1843 in Exposition de la Théorie des Chances et des Probabilités (David, 1998).

Median was used in English by Francis Galton in Report of the British Association for the Advancement of Science in 1881: "The Median, in height, weight, or any other attribute, is the value which is exceeded by one-half of an infinitely large group, and which the other half fall short of" (OED2).

The term METHOD OF LEAST SQUARES was coined by Adrien Marie Legendre (1752-1833), appearing in Sur la Méthode des moindres quarrés [On the method of least squares], the title of an appendix to Nouvelles méthodes pour la détermination des orbites des comètes (1805). The appendix is dated March 6, 1805 [James A. Landau].

"Minimum" and "small" were the early English translations of moindres (David, 1995).

Method of least squares occurs in English in 1825 in the title "On the Method of Least Squares" by J. Ivory in Philosophical Magazine, 65, 3-10.
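The method itself, in its most familiar modern form, fits a line by minimizing the sum of squared residuals. A minimal sketch using the closed-form normal equations (the data points are illustrative):

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x by minimizing the sum of squared residuals."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Noise-free points on y = 1 + 2x are recovered exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = least_squares_line(xs, ys)
print(round(a, 6), round(b, 6))  # 1.0 2.0
```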

MODE was coined by Karl Pearson (1857-1936). He used the term in 1895 in "Skew Variation in Homogeneous Material," Philosophical Transactions of the Royal Society of London, Ser. A, 186, 343-414: "I have found it convenient to use the term mode for the abscissa corresponding to the ordinate of maximum frequency. Thus the 'mean,' the 'mode,' and the 'median' have all distinct characters."
MODULUS (in logarithms) was used by Roger Cotes (1682-1716) in 1722 in Harmonia Mensurarum: Pro diversa magnitudine quantitatis assumptae M, quae adeo vocetur systematis Modulus. Cotes also coined the term ratio modularis (modular ratio) in this work.

Modulus (a coefficient that expresses the degree to which a body possesses a particular property) appears in the 1738 edition of The Doctrine of Chances: or, a Method of Calculating the Probability of Events in Play by Abraham De Moivre (1667-1754) [James A. Landau].

(Corollary 6)...To apply this to particular Examples, it will be necessary to estimate the frequency of an Event's happening or failing by the Square-root of the number which denotes how many Experiments have been, or are designed to be taken, and this Square-root, according as it has been already hinted at in the fourth Corollary, will be as it were the Modulus by which we are to regulate our Estimation, and therefore suppose the number of Experiments to be taken is 3600, and that it were required to assign the Probability of the Event's neither happening oftner than 1850 times, nor more rarely than 1750, which two numbers may be varied at pleasure, provided they be equally distant from the middle Sum 1800, then make the half difference between the two numbers 1850 and 1750, that is, in this case, 50 = s√n; now having supposed 3600 = n, then √n will be 60, which will make it that 50 will be = 60s, and consequently s = 50/60 = 5/6, and therefore if we take the proportion, which in an infinite power, the double Sum of the Terms corresponding to the Interval 5/6 √n, bears to the Sum of all the Terms, we shall have the Probability required exceeding near.

See also Stigler (1986), page 83. The Egyptologist Flinders Petrie (1883) refers to the modulus as a measure of dispersion. His sources are Airy's Theory of Errors (1875, 2nd edition) and De Morgan's Essay on Probability (1838). The modulus equals √2 s. F. Y. Edgeworth also used the modulus in 1885.
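De Moivre's arithmetic can be restated in modern notation. A sketch under modern assumptions: the count of successes in n = 3600 fair trials is approximately normal with mean n/2 and standard deviation √n/2, and the modulus is √2 times the standard deviation:

```python
import math

n = 3600
s = 50 / math.sqrt(n)        # De Moivre's s = 50/60 = 5/6
sigma = math.sqrt(n) / 2     # standard deviation of the count: 30
modulus = math.sqrt(2) * sigma

# Probability that the count falls within 1800 +/- 50, via the error
# function with the deviation measured in units of the modulus.
prob = math.erf(50 / modulus)
print(round(s, 4))  # 0.8333
```

With these figures the computed probability comes out a little above 0.9, which is what De Moivre's normal approximation delivers "exceeding near."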

Modulus (in number theory) was introduced by Gauss in 1801 in Disquisitiones arithmeticae:

Si numerus a numerorum b, c differentiam metitur, b et c secundum a congrui dicuntur, sin minus, incongrui; ipsum a modulum appelamus. Uterque numerorum b, c priori in casu alterius residuum, in posteriori vero nonresiduum vocatur. [If a number a measure the difference between two numbers b and c, b and c are said to be congruent with respect to a, if not, incongruent; a is called the modulus, and each of the numbers b and c the residue of the other in the first case, the non-residue in the latter case.]
Modulus (in number theory) is found in English in 1811 in An Elementary Investigation of the Theory of Numbers by Peter Barlow [James A. Landau].

Modulus (the length of the vector a + bi) is due to Jean Robert Argand (1768-1822) (Cajori 1919, page 265). The term was first used by him in 1814, according to William F. White in A Scrap-Book of Elementary Mathematics (1908).

Modulus for √(a² + b²) was used by Augustin-Louis Cauchy (1789-1857) in 1821.

MOMENT was used in the obsolete sense of "an infinitesimal increment or decrement of a varying quantity" by Isaac Newton in 1704 in De Quadratura Curvarum: "Momenta id est incrementa momentanea synchrona" (OED2).

Moment appears in English in the obsolete sense of "momentum" in 1706 in Synopsis Palmariorum Matheseos by William Jones: "Moment..is compounded of Velocity..and..Weight" (OED2).

Moment of a force appears in 1830 in A Treatise on Mechanics by Henry Kater and Dionysius Lardner (OED2).

Moment was used in a statistics sense by Karl Pearson in October 1893 in Nature: "Now the centre of gravity of the observation curve is found at once, also its area and its first four moments by easy calculation" (OED2).

The phrase method of moments was used in a statistics sense in the first of Karl Pearson's "Contributions to the Mathematical Theory of Evolution" (Phil. Trans. R. Soc. 1894). The method was used to estimate the parameters of a mixture of normal distributions. For several years Pearson used the method on different problems but the name only gained general currency with the publication of his 1902 Biometrika paper "On the systematic fitting of curves to observations and measurements" (David 1995). In "On the Mathematical Foundations of Theoretical Statistics" (Phil. Trans. R. Soc. 1922), Fisher criticized the method for being inefficient compared to his own maximum likelihood method (Hald pp. 650 and 719). [This paragraph was contributed by John Aldrich.]
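In its simplest modern form the method equates sample moments to their theoretical counterparts and solves for the parameters — far simpler than Pearson's normal-mixture problem. A sketch for the gamma distribution (the distribution, parameter values, sample size, and seed are illustrative assumptions), using mean = kθ and variance = kθ²:

```python
import random

random.seed(1)
k_true, theta_true = 4.0, 0.5
sample = [random.gammavariate(k_true, theta_true) for _ in range(200_000)]

# First two sample moments.
n = len(sample)
mean = sum(sample) / n
variance = sum((x - mean) ** 2 for x in sample) / n

# Solve mean = k*theta, variance = k*theta**2 for the parameters.
k_hat = mean ** 2 / variance
theta_hat = variance / mean
print(abs(k_hat - k_true) < 0.1 and abs(theta_hat - theta_true) < 0.05)  # True
```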

MONTE CARLO. The method as well as the name for it were apparently first suggested by John von Neumann and Stanislaw M. Ulam. In an unpublished manuscript, "The Origin of the Monte Carlo Method," dated Apr. 12, 1983, Ulam wrote that the method came to him while playing solitaire during an illness in 1946, and that what seems to be the first written account of the method was given by von Neumann in a letter to Robert Richtmyer of Los Alamos in early 1947.

According to W. L. Winston, the term was coined by Ulam and von Neumann during the project on the feasibility of the atomic bomb, in which nuclear fission was studied by simulation; Monte Carlo was the code name they gave to these simulations.

According to several Internet web pages, the term was coined in 1947 by Nicholas Metropolis, inspired by Ulam's interest in poker during the Manhattan Project of World War II.

Monte Carlo method occurs in the title "The Monte Carlo Method" by Nicholas Metropolis in the Journal of the American Statistical Association 44 (1949).

Monte Carlo method also appears in 1949 in Math. Tables & Other Aids to Computation III: "This method of solution of problems in mathematical physics by sampling techniques based on random walk models constitutes what is known as the 'Monte Carlo' method. The method as well as the name for it were apparently first suggested by John von Neumann and S. M. Ulam" (OED2).
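A minimal sketch of the sampling idea behind the method (the π-estimation example is a standard illustration, not taken from the sources above): draw random points in the unit square and count the fraction landing inside the quarter circle.

```python
import random

random.seed(42)
n = 1_000_000
# A point (x, y) with x, y uniform on [0, 1) lies inside the quarter
# circle when x**2 + y**2 <= 1; that happens with probability pi/4.
inside = sum(random.random() ** 2 + random.random() ** 2 <= 1.0
             for _ in range(n))
pi_estimate = 4 * inside / n
print(abs(pi_estimate - 3.14159265) < 0.01)  # True
```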

MULTIVARIATE is found in J. Wishart, "The generalized product moment distribution in samples from a normal multivariate population," Biometrika 20A, 32 (1928) [James A. Landau].
NON-NORMAL appears in 1929 in Biometrika in the heading: "On the distribution of the ratio of mean to standard deviation in small samples from non-normal universes" (OED2).
NONPARAMETRIC (referring to a statistical inference) is found in 1942 in Jacob Wolfowitz (1910-1981), "Additive Partition Functions and a Class of Statistical Hypotheses," Annals of Mathematical Statistics, 13, 247-279 (David, 1995).
NORMAL (statistics). Normal was used by F. Galton in 1889 in Natural Inheritance. David (1995) writes that Stigler informs him that this is the first use of "normal" unambiguously as a term for the distribution.

Normal probability curve was used by Karl Pearson (1857-1936) in 1893 in Nature 26 Oct. 615/2: "As verification note that for the normal probability curve 3µ2² = µ4 and µ3 = 0" (OED2).

Pearson used normal curve in 1894 in "Contributions to the Mathematical Theory of Evolution":

When a series of measurements gives rise to a normal curve, we may probably assume something approaching a stable condition; there is production and destruction impartially around the mean.
The above quotation is from Porter.

Pearson used normal curve in 1894 in Phil. Trans. R. Soc. A. CLXXXV. 72: "A frequency-curve, which for practical purposes, can be represented by the error curve, will for the remainder of this paper be termed a normal curve."

Normal distribution appears in 1897 in Proc. R. Soc. LXII. 176: "A random selection from a normal distribution" (OED2).

According to Hald, p. 356:

The new error distribution was first of all called the law of error, but many other names came to be used, such as the law of facility of errors, the law of frequency of errors, the Gaussian law of errors, the exponential law, and the typical law of errors. In his paper "Typical laws of heredity" Galton (1877) studied biological variation, and he therefore replaced the term "error" with "deviation," and referring to Quetelet, he called the distribution "the mathematical law of deviation." Chapter 5 in Galton's Natural Inheritance (1889a) is entitled "Normal Variability," and he writes consistently about "The Normal Curve of Distributions," an expression that caught on.
According to Walker (p. 185), Karl Pearson did not coin the term normal curve. She writes, "Galton used it, as did also Lexis, and the writer has not found any reference which seems to be its first use."

Nevertheless, "...Pearson's consistent and exclusive use of this term in his epoch-making publications led to its adoption throughout the statistical community" (DSB).

However, Porter (p. 312) calls normal curve a "Pearsonian neologism."

NORMAL CORRELATION appears in W. F. Sheppard, "On the application of the theory of error to cases of normal distributions and normal correlations," Phil. Trans. A, 192, page 1091, and Proc. Roy. Soc. 62, page 170 (1898) [James A. Landau].
NORMAL DEVIATE is found in 1925 in R. A. Fisher, Statistical Methods: "Table I. shows that the normal deviate falls outside the range +/-1.598193 in 10 per cent of cases" (OED2).
NORMAL LAW was coined by Karl Pearson in 1894, according to Porter (p. 13).
NORMAL POPULATION appears in E. S. Pearson, "A further note on the distribution of range in samples taken from a normal population," Biometrika 18, page 173 (1926) [James A. Landau]. Also see extreme value.
NORMAL SAMPLES is found in R. A. Fisher, "The moments of the distribution for normal samples of measures of departure from normality," Proc. Roy. Soc. A, 130 (1930).
NULL HYPOTHESIS was used in 1935 by Ronald Aylmer Fisher in The Design of Experiments. He writes, "We may speak of this hypothesis as the 'null hypothesis,' and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation."

The "null hypothesis" is often identified with the "hypothesis tested" of J. Neyman and E. S. Pearson's 1933 paper, "On the Problems of the Most Efficient Tests of Statistical Hypotheses" Phil. Trans. Roy. Soc. A (1933), 289-337, and represented by their symbol H0. Neyman did not like the "null hypothesis," arguing (First Course in Probability and Statistics, 1950, p. 259) that "the original term 'hypothesis tested' seems more descriptive." It is not clear, however, that "hypothesis tested" was ever floated as a technical term [John Aldrich].

NULL SET. Null-set appears in 1906 in Theory of Sets and Points by W. H. and G. C. Young (OED2).
ORDINAL. The earliest citation for this term in the OED2 is in 1599 in Percyvall's Dictionarie in Spanish and English enlarged by J. Minsheu, in which the phrase ordinall numerals is found.
P-VALUE is found in 1943 in Statistical Adjustment of Data by W. E. Deming (David, 1998).
PARAMETER (in statistics) is found in 1914 in E. Czuber, Wahrscheinlichkeitsrechnung, Vol. I (David, 1998).

Parameter is found in 1922 in R. A. Fisher, "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society of London, Ser. A. 222, 309-368 (David, 1995).

The term was introduced by Fisher, according to Hald, p. 716.

PERCENTILE appears in 1885 in Francis Galton, "Some Results of the Anthropometric Laboratory," Journal of the Anthropological Institute, 14, 275-287: "The value which 50 per cent. exceeded, and 50 per cent. fell short of, is the Median Value, or the 50th per-centile, and this is practically the same as the Mean Value; its amount is 85 lbs." (OED2).

According to Hald (p. 604), Galton introduced the term.

PERMUTATION first appears in print with its present meaning in Ars Conjectandi by Jacques Bernoulli: "De Permutationibus. Permutationes rerum voco variationes..." (Smith vol. 2, page 528).

Earlier, Leibniz had used the term variationes and Wallis had adopted alternationes (Smith vol. 2, page 528).

PIE CHART is found in 1922 in A. C. Haskell, Graphic Charts in Business (OED2).
POISSON DISTRIBUTION. Poisson's exponential binomial limit appears in 1914 in the title "Tables of Poisson's Exponential Binomial Limit" by Herbert Edward Soper in Biometrika, 10, 25-35 (David, 1995).

Poisson distribution appears in 1922 in Ann. Appl. Biol. IX. 331: "When the statistical examination of these data was commenced it was not anticipated that any clear relationship with the Poisson distribution would be obtained" (OED2).

POPULATION. See sample.
POSTERIOR PROBABILITY and PRIOR PROBABILITY. These contractions of "probability a priori" and "probability a posteriori" were introduced by Wrinch and Jeffreys ("On Certain Fundamental Principles of Scientific Inquiry," Philosophical Magazine, 42, (1921), 369-390). The longer forms were used by Lubbock & Drinkwater-Bethune (On Probability, 1830?) presumably following Laplace (Théorie Analytique des Probabilités (1812)) who wrote of "la probabilité de l'évenement observé, déterminée à priori" though Laplace did not use the à posteriori form [John Aldrich, using David (2001) and Hald (1998, p. 162)].
POWER (of a test) is found in 1933 in J. Neyman and E. S. Pearson, "The Testing of Statistical Hypotheses in Relation to Probabilities A Priori," Proceedings of the Cambridge Philosophical Society, 29, 492-510 (David (2001)).
The term PROBABILITY may appear in Latin in De Ratiociniis in Ludo Aleae (1657) by Christiaan Huygens, since the 1714 English translation has:
As, if any one shou'd lay that he wou'd throw the Number 6 with a single die the first throw, it is indeed uncertain whether he will win or lose; but how much more probability there is that he shou'd lose than win, is easily determin'd, and easily calculated.
This is from the Latin translation by van Schooten of Huygens' introduction:
Ut si quis primo jactu una tessera senarium jacere contendat, incertum quidem an vincet; at quanto verisimilius sit eum perdere quam vincere, reipsa definitum est, calculoque subducitur.
This is the Dutch text of the introduction of Huygens' Van Rekeningh in Spelen van Geluck. This text was published in 1660 but had already been written in 1656.
Als, by exempel. Die met een dobbel-stee(n) ten eerste(n) een ses neemt te werpen / het is onseecker of hy het winnen sal of niet; maer hoe veel minder kans hy heeft om te winnen als om te verliesen / dat is in sich selven seecker / en werdt door reeckeningh uyt-gevonden.
The 1714 translation of Huygens' 9th proposition reads: "TO resolve which, we must observe, First, That there are six several Throws upon one Die, which all have an equal probability of coming up."
This is from the Latin translation by van Schooten of Huygens' 9th proposition:
Ad quas solvendas advertendum est. Primo unius tesserae sex esse jactus diversos, quorum quivis aeque facile eveniat.
This is the Dutch text from the 9th proposition of Huygens' Van Rekeningh in Spelen van Geluck.
Om welcke te solveeren / so moet hier op worden acht genomen. Eerstelijck dat op 1 steen zijn 6 verscheyde werpen / die even licht konnen gebeuren.
Although Huygens uses the word Kans (chance) repeatedly in his Dutch text, van Schooten in his Latin translation seems to rephrase the text each time precisely to avoid using a single term for probability. (See pp. 11-13 in B. L. van der Waerden (ed., 1975), Die Werke von Jakob Bernoulli, Band 3, Birkhäuser Verlag, Basel.)

The opening sentence of De Mensura Sortis (1712) by Abraham de Moivre (1667-1754) is translated:
If p is the number of chances by which a certain event may happen, and q is the number of chances by which it may fail; the happenings as much as the failings have their degree of probability: But if all the chances by which the event may happen or fail were equally easy; the probability of happening will be to the probability of failing as p to q.
The first citation for probability in the OED2 is in 1718 in the title The Doctrine of Chances: or, a Method of Calculating the Probability of Events in Play by De Moivre.

Pascal did not use the term (DSB).

PROBABILITY DENSITY FUNCTION. Probability function appears in J. E. Hilgard, "On the verification of the probability function," Rep. Brit. Ass. (1872).

Wahrscheinlichkeitsdichte appears in 1912 in Wahrscheinlichkeitsrechnung by A. A. Markoff (David, 1998).

In J. V. Uspensky, Introduction to Mathematical Probability (1937), page 264 reads: "The case of continuous F(t), having a continuous derivative f(t) (save for a finite set of points of discontinuity), corresponds to a continuous variable distributed with the density f(t), since F(t) = ∫_{-∞}^{t} f(x) dx" [James A. Landau].

Probability density appears in 1939 in H. Jeffreys, Theory of Probability: "We shall usually write this briefly P(dx|p) = f'(x)dx, dx on the left meaning the proposition that x lies in a particular range dx. f'(x) is called the probability density" (OED2).

Probability density function appears in 1946 in an English translation of Mathematical Methods of Statistics by Harald Cramér. The original appeared in Swedish in 1945 [James A. Landau].

PROBABILITY DISTRIBUTION appears in a paper published by Sir Ronald Aylmer Fisher in 1920 [James A. Landau].
PROBABLE ERROR appears in 1812 in Phil. Mag.: "All that can be gained is, that the errors are as trifling as possible--that they are equally distributed--and that none of them exceed the probable errors of the observation" (OED2).

According to Hald (p. 360), Friedrich Wilhelm Bessel (1784-1846) introduced the term probable error (wahrscheinliche Fehler) without detailed explanation in 1815 in "Ueber den Ort des Polarsterns" in Astronomische Jahrbuch für das Jahr 1818, and in 1816 defined the term in "Untersuchungen über die Bahn des Olbersschen Kometen" in Abh. Math. Kl. Kgl. Akad. Wiss., Berlin. Bessel used the term for the 50% interval around the least-squares estimate.
Also in 1816 Gauss published a paper, Bestimmung der Genauigkeit der Beobachtungen, in which he showed several methods of calculating the probable error. He wrote: "... wir wollen diese Grösse ... den wahrscheinlichen Fehler nennen, und ihn mit r bezeichnen" [... we want to call this quantity ... the probable error, and denote it by r]. His calculations were based on a general dispersion measure Ek = (S(d^k)/n)^(1/k), where S denotes summation over the errors d. Gauss showed that k = 2 gives the most precise value of the probable error: r = 0.6744897 * E2. Notice that E2 is the mean error (i.e. the sample standard deviation, computed with divisor n).

All calculations and constants related to the probable error, starting with Gauss, are based on the assumption that the errors follow a normal distribution. The modern value of the ratio r/E2 is 0.674489750196..., the 0.75 quantile of the standard normal distribution, since half of the probability mass of a normal error lies within ±r of the mean.
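The ratio can be checked numerically. The sketch below is a modern illustration, not taken from any of the works cited: it recovers the constant as the 0.75 quantile of the standard normal distribution and applies it to a small set of invented error values.

```python
from statistics import NormalDist

# The probable error r is the 50% point: P(|error| < r) = 0.5 for
# normally distributed errors, so r/sigma is the 0.75 quantile of
# the standard normal distribution.
ratio = NormalDist().inv_cdf(0.75)
print(round(ratio, 7))  # 0.6744898

# Probable error of a single observation, using Gauss's mean error
# E2 (root mean square with divisor n). The error values below are
# invented for illustration.
errors = [0.3, -0.5, 0.1, 0.7, -0.2, -0.4]
n = len(errors)
E2 = (sum(e * e for e in errors) / n) ** 0.5
r = ratio * E2
```

The same constant appears throughout the probable-error entries below as 0.674489 or 0.6744897, truncated to the precision the nineteenth-century authors needed.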

Probable error is found in 1852 in Report made to the Hon. Thomas Corwin, secretary of the treasury by Richard Sears McCulloh. This book uses the term four times, but on the one occasion where a computation can be seen the writer takes two measurements and refers to the difference between them as the "probable error" [University of Michigan Digital Library].

Probable error is found in 1853 in A dictionary of science, literature & art edited by William Thomas Brande: "... the probable error is the quantity, which is such that there is the same probability of the difference between the determination and the true absolute value of the thing to be determined exceeding or falling short of it. Thus, if twenty measurements of an angle have been made with the theodolite, and the arithmetical mean or average of the whole gives 50° 27' 13"; and if it be an equal wager that the error of this result (either in excess or defect) is less than two seconds, or greater than two seconds, then the probable error of the determination is two seconds" [University of Michigan Digital Library].

Probable error is found in 1853 in A collection of tables and formulae useful in surveying, geodesy, and practical astronomy by Thomas Jefferson Lee. The term is defined, in modern terminology, as the sample standard deviation times 0.674489 divided by the square root of the number of observations [James A. Landau; University of Michigan Digital Library].
Actually, on page 238 of the book mentioned above, T. J. Lee presents two versions of the probable error, r and R: r is the PE of a single observation, with r = 0.674489 * E2 (where E2 = s), and R is the PE of the final result (i.e. of the mean), with R = r / √n.

Probable error is found in 1855 in A treatise on land surveying by William Mitchell Gillespie: "When a number of separate observations of an angle have been made, the mean or average of them all, (obtained by dividing the sum of the readings by their number,) is taken as the true reading. The 'Probable error' of this mean, is the quantity, (minutes or seconds) which is such that there is an even chance of the real error being more or less than it. Thus, if ten measurements of an angle gave a mean of 35° 18', and it was an equal wager that the error of this result, too much or too little, was half a minute, then half a minute would be the 'Probable error' of this determination. This probable error is equal to the square root of the sum of the squares of the errors (i. e. the differences of each observation from the mean) divided by the number of observations, and multiplied by the decimal 0.674489. The same result would be obtained by using what is called 'The weight' of the observation. It is equal to the square of the number of observations divided by twice the sum of the squares of the errors. The 'Probable error' is equal to 0.476936 divided by the square root of the weight" [University of Michigan Digital Library].
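Gillespie's two prescriptions are consistent, since 0.476936 ≈ 0.674489/√2. A short numerical check (the residuals below are invented for illustration, not Gillespie's own data):

```python
import math

# Hypothetical residuals (observation minus mean), in seconds of arc.
errors = [1.2, -0.8, 0.5, -1.5, 0.3, 0.9, -0.4, -0.6, 1.1, -0.7]
n = len(errors)
ss = sum(e * e for e in errors)

# Route 1: PE of the mean = 0.674489 * sqrt(sum of squared errors) / n.
pe1 = 0.674489 * math.sqrt(ss) / n

# Route 2: via the "weight" w = n^2 / (2 * sum of squared errors),
# with PE = 0.476936 / sqrt(w).
w = n * n / (2 * ss)
pe2 = 0.476936 / math.sqrt(w)

print(abs(pe1 - pe2) < 1e-6)  # True: the two routes agree
```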

Probable error is found in 1865 in Spherical astronomy by Franz Brünnow (an English translation by the author of the second German edition): "In any series of errors written in the order of their absolute magnitude and each written as often as it actually occurs, we call that error which stands exactly in the middle, the probable error" [University of Michigan Digital Library].

In 1872 Elem. Nat. Philos. by Thomson & Tait has: "The probable error of the sum or difference of two quantities, affected by independent errors, is the square root of the sum of the squares of their separate probable errors" (OED2).

In 1889 in Natural Inheritance, Galton criticized the term probable error, saying the term was "absurd" and "quite misleading" because it does not refer to what it seems to, the most probable error, which would be zero. He suggested the term Probability Deviation be substituted, opening the way for Pearson to introduce the term standard deviation (Tankard, p. 48).

The term QUARTILE was introduced by Francis Galton (Hald, p. 604).

Higher and lower quartile are found in 1879 in D. McAlister, Proc. R. Soc. XXIX: "As these two measures, with the mean, divide the curve of facility into four equal parts, I propose to call them the 'higher quartile' and the 'lower quartile' respectively. It will be seen that they correspond to the ill-named 'probable errors' of the ordinary theory" (OED2).

Upper and lower quartile appear in 1882 in F. Galton, "Report of the Anthropometric Committee," Report of the 51st Meeting of the British Association for the Advancement of Science, 1881, p. 245-260 (David, 1995).

QUINTILE is found in 1922 in "The Accuracy of the Plating Method of Estimating the Density of Bacterial Populations," Annals of Applied Biology by R. A. Fisher, H. G. Thornton, and W. A. Mackenzie: "Since the 3-plate sets are relatively scanty, we can best test their agreement with theory by dividing the theoretical distribution of 43 values at its quintiles, so that the expectation is the same in each group." There are much earlier uses of this term in astrology [James A. Landau].
RANDOM NUMBER. The phrase "this table of random numbers" is found in 1927 in Tracts for Computers (OED2).

See also L. H. C. Tippett, "Random Sampling Numbers 1927," Tracts for Computers, No. 15 (1927) [James A. Landau].

RANDOM SAMPLE is found in April 1870 in "Notices of Recent Publications," The Princeton review: "We confess that we have never suspected Satan as capable of poetizing in the manner attributed to him in Book IX, of which the following is a random sample."

Random choice appears in the Century Dictionary (1889-1897).

Random selection occurs in 1897 in Proc. R. Soc. LXII. 176: "A random selection from a normal distribution" (OED2).

Random sampling was used by Karl Pearson in 1900 in the title, "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling," Philosophical Magazine 50, 157-175 (OED2).

Random sample is found in 1903 in Biometrika II. 273: "If the whole of a population were taken we should have certain values for its statistical constants, but in actual practice we are only able to take a sample, which should if possible be a random sample" (OED2).

RANDOM WALK. Karl Pearson posed "The Problem of the Random Walk," in the July 27, 1905, issue of Nature (vol. LXXII, p. 294). "A man starts from a point O and walks l yards in a straight line; he then turns through any angle whatever and walks another l yards in a second straight line. He repeats this process n times. I require the probability that after these stretches he is at a distance between r and r + dr from his starting point O." Pearson's objective was to develop a mathematical theory of random migration. In the next issue (vol. LXXII, p. 318) Lord Rayleigh translated the problem into one involving sound, "the composition of n iso-periodic vibrations of unit amplitude and of phases distributed at random," and reported that he had given the solution for large n in 1880 [John Aldrich].
RANDOM VARIABLE. Variabile casuale is found in 1916 in F. P. Cantelli, "La Tendenza ad un limite nel senso del calcolo delle probabilità," Rendiconti del Circolo Matematico di Palermo, 41, 191-201 (David, 1998).

Random variable is found in 1934 in A. Wintner, "On Analytic Convolutions of Bernoulli Distributions," American Journal of Mathematics, 56, 659-663 (David, 1998).

RANDOMIZATION appears in 1926 in R. A. Fisher, "The Arrangement of Field Experiments," Journal of the Ministry of Agriculture of Great Britain, 33, 503-513 (David, 1995).

According to Tankard (p. 112), R. A. Fisher "may ... have coined the term randomization; at any rate, he certainly gave it the important position in statistics that it has today."

RANGE (in statistics) is found in 1848 in H. Lloyd, "On Certain Questions Connected with the Reduction of Magnetical and Meteorological Observations," Proceedings of the Royal Irish Academy, 4, 180-183 (David, 1995).
RANK CORRELATION. Kendall & Stuart vol ii page 494 say that the rank correlation coefficient was introduced by "the eminent psychologist" Spearman in 1906. Pearson's biography of Galton also uses the term "correlation of ranks" [James A. Landau].

Rank correlation appears in 1907 in Drapers' Company Res. Mem. (Biometric Ser.) IV. 25: "No two rank correlations are in the least reliable or comparable unless we assume that the frequency distributions are of the same general character .. provided by the hypothesis of normal distribution. ... Dr. Spearman has suggested that rank in a series should be the character correlated, but he has not taken this rank correlation as merely the stepping stone..to reach the true correlation" (OED2).

REGRESSION. According to the DSB, Francis Galton (1822-1911) discovered the statistical phenomenon of regression and used this term, although he originally termed it "reversion."

Porter (page 289), referring to Galton, writes:

He did, however, change his terminology from "reversion" to "regression," a shift whose significance is not entirely clear. Possibly he simply felt that the latter term expressed more accurately the fact that offspring returned only part way to the mean. More likely, the change reflected his new conviction, first expressed in the same papers in which he introduced the term "regression," that this return to the mean reflected an inherent stability of type, and not merely the reappearance of remote ancestral gemmules.
Charles Darwin used reversion in a biological context in The Origin of Species (1860): "We could not have told, whether these characters in our domestic breeds were reversions or only analogous variations" (OED2).

Galton used the term reversion coefficient in "Typical laws of heredity," Nature 15 (1877), 492-495, 512-514 and 532-533 = Proceedings of the Royal Institution of Great Britain 8 (1877) 282-301.

Galton used regression in a genetics context in "Section H. Anthropology. Opening Address by Francis Galton," Nature, 32, 507-510 (David, 1995).

Galton also used law of regression in 1885, perhaps in the same address.

Karl Pearson used regression and coefficient of regression in 1897 in Phil. Trans. R. Soc.:

The coefficient of regression may be defined as the ratio of the mean deviation of the fraternity from the mean off-spring to the deviation of the parentage from the mean parent. ... From this special definition of regression in relation to parents and offspring, we may pass to a general conception of regression. Let A and B be two correlated organs (variables or measurable characteristics) in the same or different individuals, and let the sub-group of organs B, corresponding to a sub-group of A with a definite value a, be extracted. Let the first of these sub-groups be termed an array, and the second a type. Then we define the coefficient of regression of the array on the type to be the ratio of the mean-deviation of the array from the mean B-organ to the deviation of the type a from the mean A-organ.

The phrase "multiple regression coefficients" appears in the 1903 Biometrika paper "The Law of Ancestral Heredity" by Karl Pearson, G. U. Yule, Norman Blanchard, and Alice Lee. From around 1895 Pearson and Yule had worked on multiple regression and the phrase "double regression" appears in Pearson's paper "Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity, and Panmixia" (Phil. Trans. R. Soc. 1896). [This paragraph was contributed by John Aldrich.]

RISK and RISK FUNCTION (referring to the expected value of the loss in statistical decision theory) first appear in Wald's "Contributions to the Theory of Statistical Estimation and Testing Hypotheses," Annals of Mathematical Statistics, 10, (1939), 299-326 [John Aldrich, based on David (2001)].
SAMPLE. The juxtaposition of sample and population seems to have originated with Karl Pearson writing in 1903 in Biometrika 2, 273. The relevant passage appears in OED2: "If the whole of a population were taken we should have certain values for its statistical constants, but in actual practice we are only able to take a sample ...." Pearson's colleague, the zoologist W. F. R. Weldon, had been using "sample" to refer to collections of observations since 1892. (See also random sample.) [John Aldrich]
SAMPLE SPACE was introduced into statistical theory by J. Neyman and E. S. Pearson, Phil. Trans. Roy. Soc. A (1933), 289-337. It was associated with the representation of a sample comprising n numbers as a point in n-dimensional space, a representation R. A. Fisher had exploited in articles going back to 1915. W. Feller used this notion of sample space in his "Note on regions similar to the sample space," Statist. Res. Mem., Univ. London 2, 117-125 (1938), but in volume one of his Introduction to Probability Theory and its Applications (1950) Feller used the term quite abstractly for the set of outcomes of an experiment. He attributed this general concept to Richard von Mises (1883-1953), who had referred to the Merkmalraum (label space) in writings on the foundations of probability from 1919 onwards [John Aldrich].

The term may have been used earlier by Richard von Mises (1883-1953).

SAMPLING DISTRIBUTION. R. A. Fisher seems to have introduced this term. It appears incidentally in 1922 (JRSS, 85, 598) and then in the title of his 1928 paper "The General Sampling Distribution of the Multiple Correlation Coefficient," Proc. Roy. Soc. A, 121, 654-673.
SCATTER DIAGRAM is found in 1925 in F. C. Mills, Statistical Methods X. 366: "The equation to a straight line, fitted by the method of least squares to the points on the scatter diagram, will express mathematically the average relationship between these two variables" (OED2).

Scattergram is found in 1938 in A. E. Waugh, Elem. Statistical Method: "This is the method of plotting the data on a scatter diagram, or scattergram, in order that one may see the relationship" (OED2).

Scatterplot is found in 1939 in Statistical Dictionary of Terms and Symbols by Kurtz and Edgerton (David, 1998).

SCORE and METHOD OF SCORING in the theory of statistical estimation. The derivative of the log-likelihood function played an important part in R. A. Fisher's theory of maximum likelihood from its beginnings in the 1920s, but the name score is more recent. The "score" was originally associated with a particular genetic application: a family is assigned a score based on the number of children of each category, and there were different ways of scoring associated with different ways of estimating linkage. In a 1935 paper ("The Detection of Linkage with Dominant Abnormalities," Annals of Eugenics, 6, 193) Fisher wrote that, because of the efficiency of maximum likelihood, the "ideal score" is provided by the derivative of the log-likelihood function. In 1948 C. R. Rao used the phrase efficient score (Proc. Cambr. Philos. Soc. 44, 50-57) and score by itself (J. Roy. Statist. Soc., B, 10: 159-203) when writing about maximum likelihood in general, i.e. without reference to the linkage application. Today "score" is so established in this derivative-of-the-log-likelihood sense that the phrases "non-ideal score" or "inefficient score" convey nothing.

In 1946 - still in the genetic context - Fisher ("A System of Scoring Linkage Data, with Special Reference to the Pied Factors in Mice," Amer. Nat., 80: 568-578) described an iterative method for obtaining the maximum likelihood value. Rao's 1948 J. Roy. Statist. Soc. B paper treats the method in a more general framework, and the phrase "Fisher's method of scoring" appears in a comment by Hartley. Fisher had already used the method in a general context in his 1925 "Theory of Statistical Estimation" paper (Proc. Cambr. Philos. Soc. 22: 700-725), but it attracted neither attention nor name. [This entry was contributed by John Aldrich, with some information taken from David (1995).]
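The method of scoring can be sketched in modern form. The example below is illustrative only (the Bernoulli model and the data are invented, not taken from the papers cited): it updates a log-odds parameter by the score divided by the expected Fisher information until the score vanishes at the maximum-likelihood value.

```python
import math

def fisher_scoring(xs, theta=0.0, steps=25):
    """Estimate the log-odds theta of Bernoulli data by scoring:
    theta <- theta + U(theta) / I(theta), where U is the score
    (derivative of the log-likelihood) and I the expected information."""
    n = len(xs)
    s = sum(xs)
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-theta))
        U = s - n * p           # score: d/dtheta of the log-likelihood
        I = n * p * (1.0 - p)   # expected Fisher information
        theta += U / I
    return theta

xs = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # 7 successes in 10 trials
theta_hat = fisher_scoring(xs)
p_hat = 1.0 / (1.0 + math.exp(-theta_hat))
print(round(p_hat, 6))  # 0.7, the maximum-likelihood estimate
```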

SERIAL CORRELATION. The term was introduced by G. U. Yule in his 1926 paper "Why Do We Sometimes Get Nonsense Correlations between Time-series? A Study in Sampling and the Nature of Time-series," Journal of the Royal Statistical Society, 89, 1-69 (David 2001).
The term SET first appears in Paradoxien des Unendlichen (Paradoxes of the Infinite), Hrsg. aus dem schriftlichen Nachlasse des Verfassers von Fr. Prihonsky, C. H. Reclam sen., xi, pp. 157, Leipzig, 1851. This small tract by Bernhard Bolzano (1781-1848) was published three years after his death by a student Bolzano had befriended (Burton, page 592).

Menge (set) is found in Geometrie der Lage (2nd ed., 1856) by Carl Georg Christian von Staudt: "Wenn man die Menge aller in einem und demselben reellen einfoermigen Gebilde enthaltenen reellen Elemente durch n + 1 bezeichnet und mit diesem Ausdrucke, welcher dieselbe Bedeutung auch in den acht folgenden Nummern hat, wie mit einer endlichen Zahl verfaehrt, so ..." [Ken Pledger].

Georg Cantor (1845-1918) did not define the concept of a set in his early works on set theory, according to Walter Purkert in Cantor's Philosophical Views.

Cantor's first definition of a set appears in an 1883 paper: "By a set I understand every multitude which can be conceived as an entity, that is every embodiment [Inbegriff] of defined elements which can be joined into an entirety by a rule." This quotation is taken from Über unendliche lineare Punctmannichfaltigkeiten, Mathematische Annalen, 21 (1883).

In 1895 Cantor used the word Menge in Beiträge zur Begründung der Transfiniten Mengenlehre, Mathematische Annalen, 46 (1895):

By a set we understand every collection [Zusammenfassung] M of definite, well-distinguished objects m of our intuition [Anschauung] or our thinking (which are called the elements of M) brought together to form an entirety.
This translation was taken from Cantor's Philosophical Views by Walter Purkert.
SIGN TEST appears in W. MacStewart, "A note on the power of the sign test," Ann. Math. Statist. 12 (1941) [James A. Landau].
SIGNIFICANCE. Significant is found in 1885 in F. Y. Edgeworth, "Methods of Statistics," Jubilee Volume, Royal Statistical Society, pp. 181-217: "In order to determine whether the observed difference between the mean stature of 2,315 criminals and the mean stature of 8,585 British adult males belonging to the general population is significant [etc.]" (OED2).

Significance is found in 1888 in Logic of Chance by John Venn: "As before, common sense would feel little doubt that such a difference was significant, but it could give no numerical estimate of the significance" (OED2).

Test of significance and significance test are found in 1907 in Biometrika V. 183: " Several other cases of probable error tests of significance deserve reconsideration" (OED2).

Testing the significance is found in "New tables for testing the significance of observations," Metron 5 (3) pp 105-108 (1925) [James A. Landau].

Statistically significant is found in 1931 in L. H. C. Tippett, Methods Statistics: "It is conventional to regard all deviations greater than those with probabilities of 0.05 as real, or statistically significant" (OED2).

Statistical significance is found in 1938 in Journal of Parapsychology: "The primary requirement of statistical significance is met by the results of this investigation" (OED2).

See also rank correlation.

SKEW DISTRIBUTION appears in 1895 in a paper by Karl Pearson [James A. Landau].
SPURIOUS CORRELATION. The term was introduced by Karl Pearson in "On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs," Proc. Royal Society, 60, (1897), 489-498. Pearson showed that correlation between indices u (= x/z) and v (= y/z) was a misleading guide to correlation between x and y. His illustration is:
A quantity of bones are taken from an ossuarium, and are put together in groups which are asserted to be those of individual skeletons. To test this a biologist takes the triplet femur, tibia, humerus, and seeks the correlation between the indices femur/humerus and tibia/humerus. He might reasonably conclude that this correlation marked organic relationship, and believe that the bones had really been put together substantially in their individual grouping. As a matter of fact ... there would be ... a correlation of about 0.4 to 0.5 between these indices had the bones been sorted absolutely at random.
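Pearson's ossuarium illustration is easy to reproduce by simulation. The sketch below is a modern illustration, not Pearson's own calculation; the sample size and distributions are invented, chosen so that all three measurements have equal coefficients of variation, in which case the index correlation comes out near 0.5 even though the lengths themselves are independent.

```python
import random

def pearson_r(u, v):
    """Plain Pearson product-moment correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return suv / (su * sv)

random.seed(1)
n = 20000
# Independent "bone lengths", all with coefficient of variation 0.1.
x = [random.gauss(50, 5) for _ in range(n)]
y = [random.gauss(50, 5) for _ in range(n)]
z = [random.gauss(50, 5) for _ in range(n)]

r_xy = pearson_r(x, y)                            # near 0: independent
r_uv = pearson_r([a / c for a, c in zip(x, z)],
                 [b / c for b, c in zip(y, z)])   # near 0.5: spurious
```

The correlation between the indices is induced entirely by the shared divisor z, just as in Pearson's femur/humerus and tibia/humerus example.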
The term has been applied to other correlation scenarios with potential for misleading inferences. In Student's "The Elimination of Spurious Correlation due to Position in Time or Space" (Biometrika, 10, (1914), 179-180) the source of the spurious correlation is the common trends in the series. In H. A. Simon's "Spurious Correlation: A Causal Interpretation," Journal of the American Statistical Association, 49, (1954), pp. 467-479 the source of the spurious correlation is a common cause acting on the variables. In the recent spurious regression literature in time series econometrics (Granger & Newbold, Journal of Econometrics, 1974) the misleading inference comes about through applying the correlation theory for stationary series to non-stationary series. The dangers of doing this were pointed out by G. U. Yule in his 1926 "Why Do We Sometimes Get Nonsense Correlations between Time-series? A Study in Sampling and the Nature of Time-series," Journal of the Royal Statistical Society, 89, 1-69. (Based on Aldrich 1995)
The term STANDARD DEVIATION was introduced by Karl Pearson (1857-1936) in 1893, "although the idea was by then nearly a century old" (Abbott; Stigler, page 328). According to the DSB:
The term "standard deviation" was introduced in a lecture of 31 January, 1893, as a convenient substitute for the cumbersome "root mean square error" and the older expressions "error of mean square" and "mean error."
The OED2 shows a use of standard deviation in 1894 by Pearson in "Contributions to the Mathematical Theory of Evolution," Philosophical Transactions of the Royal Society of London, Ser. A, 185, 71-110: "Then s will be termed its standard-deviation (error of mean square)."
STANDARD ERROR is found in 1897 in G. U. Yule, "On the Theory of Correlation," Journal of the Royal Statistical Society, 60, 812-854: "We see that σ1√(1 - r²) is the standard error made in estimating x" (OED2).
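Yule's expression can be verified numerically: for the least-squares estimate of x from a correlated variable, the root-mean-square error of estimation equals s1√(1 − r²) exactly in the sample. A sketch with simulated data (all numbers invented for illustration):

```python
import math
import random

random.seed(2)
n = 50000
r_true = 0.6
# Simulated pair (x, y) with correlation about r_true.
y = [random.gauss(0, 1) for _ in range(n)]
x = [r_true * b + math.sqrt(1 - r_true ** 2) * random.gauss(0, 1)
     for b in y]

mx, my = sum(x) / n, sum(y) / n
s1 = math.sqrt(sum((a - mx) ** 2 for a in x) / n)   # sd of x
s2 = math.sqrt(sum((b - my) ** 2 for b in y) / n)   # sd of y
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * s1 * s2)

# Root-mean-square error of the least-squares estimate of x from y.
slope = r * s1 / s2
rmse = math.sqrt(sum((a - mx - slope * (b - my)) ** 2
                     for a, b in zip(x, y)) / n)

# Yule's standard error: s1 * sqrt(1 - r^2).
predicted = s1 * math.sqrt(1 - r * r)
```

The agreement is an algebraic identity of least squares, not merely an asymptotic approximation, so rmse and predicted match to floating-point precision.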
STANDARD SCORE. In 1921 Univ. Illin. Bur. Educ. Res. Bull. has: "Provision is made for comparing a pupil's achievement score..with the norm corresponding to his mental age by dividing his achievement age by the standard score for his mental age. This quotient is called the Achievement Quotient" (OED2).

Standard score is dated 1928 in MWCD10.

STANINE is dated 1944 in MWCD10.

The earliest citation in the OED2 is from the Baltimore Sun, Oct. 1, 1945, "The result .. was a 'stanine' rating (stanine being an invented word, from 'standard of nine')."

Stanines were first used to describe an examinee's performance on a battery of tests constructed for the U. S. Army Air Force during World War II.

STATISTIC (as opposed to parameter) is found in 1922 in R. A. Fisher, "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society of London, Ser. A, 222, 309-368: "These involve the choice of methods of calculating from a sample statistical derivates, or as we shall call them statistics, which are designed to estimate the values of the parameters of the hypothetical population" (OED2).

This term was introduced in 1922 by Fisher, according to Tankard (p. 112).

The term statistic was not well-received initially. Arne Fisher (no relation) asked Fisher, "Where ... did you get that atrocity, a statistic?" (letter, p. 312, in J. H. Bennett (ed.), Statistical Inference and Analysis: Selected Correspondence of R. A. Fisher, 1990). Karl Pearson objected, "Are we also to introduce the words a mathematic, a physic, an electric etc., for parameters or constants of other branches of science?" (Biometrika, 28, (1936), 34-59, p. 49n). [These two quotations were provided by John Aldrich.]

STATISTICS originally referred to political science and it is difficult to determine when the word was first used in a purely mathematical sense. The earliest citation of the word statistics in the OED2 is in 1770 in W. Hooper's translation of Bielfield's Elementary Universal Education: "The science, that is called statistics, teaches us what is the political arrangement of all the modern states of the known world." However, there are earlier citations for statistical and Latin and German forms of statistic, all used in a political sense.

In Webster's dictionary of 1828 the definition of statistics is: "A collection of facts respecting the state of society, the condition of the people in a nation or country, their health, longevity, domestic economy, arts, property and political strength, the state of the country, &c."

STOCHASTIC is found in English as early as 1662 with the obsolete meaning "pertaining to conjecture."

In its modern sense, the term was used in 1917 by Ladislaus Josephowitsch Bortkiewicz (1868-1931) in Die Iterationen 3: "Die an der Wahrscheinlichkeitstheorie orientierte, somit auf 'das Gesetz der Grossen Zahlen' sich gründende Betrachtung empirischer Vielheiten möge als Stochastik ... bezeichnet werden" [The treatment of empirical multiplicities oriented toward probability theory, and thus founded on 'the law of large numbers', may be designated as Stochastik ...] (OED2).

Stochastic process is found in A. N. Kolmogorov, "Sulla forma generale di un processo stocastico omogeneo," Rend. Accad. Lincei Cl. Sci. Fis. Mat. 15 (1) page 805 (1932) [James A. Landau].

Stochastic process is also found in A. Khintchine, "Korrelationstheorie der stationären stochastischen Prozesse," Math. Ann. 109 (1934) [James A. Landau].

Stochastic process occurs in English in "Stochastic processes and statistics," Proc. Natl. Acad. Sci. USA 20 (1934).

STRATIFIED SAMPLING occurs in J. Neyman, "On the two different aspects of the representative method; the method of stratified sampling and the method of purposive selection," J. Roy. Statist. Soc., 97 (1934) [James A. Landau].
STRONG LAW OF LARGE NUMBERS is found in A. N. Kolmogorov, "Sur la loi forte des grands nombres," Comptes Rendus de l'Académie des Sciences, Paris, 191, page 910 (1930) [James A. Landau].
STUDENT'S t-DISTRIBUTION. "Student" was the pseudonym of William Sealy Gosset (1876-1937). Gosset once wrote to R. A. Fisher, "I am sending you a copy of Student's Tables as you are the only man that's ever likely to use them!" The letter appears in Letters from W. S. Gosset to R. A. Fisher, 1915-1936 (1970). Student's tables became very important in statistics but not in the form he first constructed them.

In his 1908 paper, "The Probable Error of a Mean," Biometrika 6, 1-25, Gosset introduced the statistic z for testing hypotheses on the mean of the normal distribution. Gosset used the divisor n, not the modern (n - 1), when he estimated s, and his z is proportional to t, with t = z√(n - 1). Fisher introduced the t form because it fitted in with his theory of degrees of freedom. Fisher's treatment of the distributions based on the normal distribution and the role of degrees of freedom was given in "On a Distribution Yielding the Error Functions of Several Well Known Statistics," Proceedings of the International Congress of Mathematics, Toronto, 2, 805-813. The t symbol appears in this paper, but although the paper was presented in 1924, it was not published until 1928 (Tankard, page 103; David, 1995). According to the OED2, the letter t was chosen arbitrarily. A new symbol suited Fisher, for he was already using z for a statistic of his own (see entry for F).
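The proportionality between Gosset's z and Fisher's t is easy to verify: with the divisor-n standard deviation in z and the modern divisor-(n − 1) form in t, the identity t = z√(n − 1) holds exactly. A sketch with invented data:

```python
import math

data = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9]  # invented sample
mu0 = 5.0                                         # hypothesized mean
n = len(data)
mean = sum(data) / n

# Gosset's z: standard deviation computed with divisor n.
s_n = math.sqrt(sum((v - mean) ** 2 for v in data) / n)
z = (mean - mu0) / s_n

# Fisher's t: divisor (n - 1) and the factor sqrt(n).
s = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
t = (mean - mu0) / (s / math.sqrt(n))

print(abs(t - z * math.sqrt(n - 1)) < 1e-12)  # True
```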

Student's distribution (without "t") appears in 1925 in R. A. Fisher, "Applications of 'Student's' Distribution," Metron 5, 90-104 and in Statistical Methods for Research Workers (1925). The book made Student's distribution famous; it presented new uses for the tables and made the tables generally available.

"Student's" t-distribution appears in 1929 in Nature (OED2).

t-distribution appears (without Student) in A. T. McKay, "Distribution of the coefficient of variation and the extended 't' distribution," J. Roy. Stat. Soc., n. Ser. 95 (1932).

t-test is found in 1932 in R. A. Fisher, Statistical Methods for Research Workers: "The validity of the t-test, as a test of this hypothesis, is therefore absolute" (OED2).

Eisenhart (1979) is the best reference for the evolution of t, although Tankard and Hald also discuss it.

[This entry was largely contributed by John Aldrich.]

STUDENTIZATION. According to Hald (p. 669), William Sealy Gosset (1876-1937) used the term Studentization in a letter to E. S. Pearson of Jan. 29, 1932.

Studentized D2 statistic is found in R. C. Bose and S. N. Roy, "The exact distribution of the Studentized D2 statistic," Sankhya 3 pt. 4 (1935) [James A. Landau].

SUFFICIENT STATISTIC. Criterion of Sufficiency and sufficient statistic appear in 1922 in R. A. Fisher, "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society of London, Ser. A, 222, 309-368:
The statistic chosen should summarise the whole of the relevant information supplied by the sample. This may be called the Criterion of Sufficiency. ... In the case of the normal curve of distribution it is evident that the second moment is a sufficient statistic for estimating the standard deviation.
According to Hald (page 452), Fisher introduced the term sufficiency in a 1922 paper.
TIME SERIES appears in W. M. Persons's "The Correlation of Economic Statistics," Publications of the American Statistical Association, 12, (1910), 287-322 [John Aldrich].
The phrase TIME SERIES ANALYSIS entered circulation at the end of the 1920s, e.g. in S. Kuznets's "On the Analysis of Time Series," Journal of the American Statistical Association, 23, (1928), 398-410, although it only became really popular much later [John Aldrich].
TYPE I ERROR and TYPE II ERROR. In their first joint paper "On the Use of Certain Test Criteria for Purposes of Statistical Inference, Part I," Biometrika, (1928) 20A, 175-240 Neyman and Pearson referred to "the first source of error" and "the second source of error" (David, 1995).

Errors of first and second kind is found in 1933 in J. Neyman and E. S. Pearson, "On the Problems of the Most Efficient Tests of Statistical Hypotheses," Philosophical Transactions of the Royal Society of London, Ser. A (1933), 289-337 (David, 1995).

Type I error and Type II error are found in 1933 in J. Neyman and E. S. Pearson, "The Testing of Statistical Hypotheses in Relation to Probabilities A Priori," Proceedings of the Cambridge Philosophical Society, 29, 492-510 (David, 1995).
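In the now-standard notation (which postdates Neyman and Pearson's original terminology), the two error probabilities are written:

```latex
% Type I error: rejecting the null hypothesis when it is true
\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}),
% Type II error: failing to reject the null hypothesis when it is false
\beta  = P(\text{accept } H_0 \mid H_0 \text{ false}).
```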

UNIFORMLY DISTRIBUTED. Uniform distribution appears in 1937 in Introduction to Mathematical Probability by J. V. Uspensky. Page 237 reads, "A stochastic variable is said to have uniform distribution of probability if probabilities attached to two equal intervals are equal." This is a slight variant of the modern terminology, which would be "a variable is said to be uniformly distributed" or "a variable from the uniform distribution" [James A. Landau].

Uniformly distributed is found in H. Sakamoto, "On the distributions of the product and the quotient of the independent and uniformly distributed random variables," Tohoku Math. J. 49 (1943).
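Uspensky's defining property ("probabilities attached to two equal intervals are equal") can be written in modern notation as follows, where the equal-interval condition is equivalent to a constant density:

```latex
% Uniform distribution on [a,b]: equal subintervals receive equal probability,
% P(X \in I_1) = P(X \in I_2) \text{ whenever } |I_1| = |I_2|, \; I_1, I_2 \subseteq [a,b],
% which holds exactly when the density is constant:
f(x) = \frac{1}{b-a}, \qquad a \le x \le b .
```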

The phrase UNIFORMLY MOST POWERFUL occurs in R. A. Fisher, "Two New Properties of Mathematical Likelihood," Proceedings of the Royal Society, Series A, vol. 144 (1934) [James A. Landau].

UNIMODAL is found in 1904 in F. de Helguero, "Sui massimi delle curve dimorfiche," Biometrika, 3, 84-98 (David, 1995).

UNIVARIATE is found in 1928 in Biometrika XXa. 32: "Various writers struggled with the problems that arise when samples are taken from uni-variate and bi-variate populations" (OED2).

VARIANCE. Edgeworth used fluctuation for twice the square of the standard deviation.

Variance was introduced by Ronald Aylmer Fisher in 1918 in "The Correlation Between Relatives on the Supposition of Mendelian Inheritance," Transactions of the Royal Society of Edinburgh, 52, 399-433: "It is ... desirable in analysing the causes of variability to deal with the square of the standard deviation as the measure of variability. We shall term this quantity the Variance."
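In modern symbols, the relation between Fisher's variance and Edgeworth's earlier fluctuation is simply:

```latex
\operatorname{Var}(X) = \sigma^2, \qquad
\text{Edgeworth's \emph{fluctuation}} = 2\sigma^2 .
```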

VENN DIAGRAM. Euler's scheme of notation is found in 1858 in Elements of logic by Henry Coppée (1821-1895): "Euler's scheme of notation is altogether the one best suited to our purpose, and we shall limit ourselves to the explanation of that. It is essentially an arrangement of three circles, to represent the three terms of a syllogism, and, by their combination, the three propositions" [University of Michigan Digital Library].

Euler's system of notation appears in 1863 in An outline of the necessary laws of thought: a treatise on pure and applied logic by William Thomson (University of Michigan Digital Library).

Euler's notation appears in about 1869 in The principles of logic, for high schools and colleges by Aaron Schuyler (University of Michigan Digital Library).

Euler's diagram appears in 1884 in Elementary Lessons in Logic by W. Stanley Jevons: "Euler's diagram for this proposition may be constructed in the same manner as for the proposition I as follows:..."

Euler's circles appears in 1893 in Logic by William Minto (1845-1893): "The relations between the terms in the four forms are represented by simple diagrams known as Euler's circles."

Euler's circles appears in October 1937 in George W. Hartmann, "Gestalt Psychology and Mathematical Insight," The Mathematics Teacher: "But in the case of 'Euler's circles' as used in elementary demonstrations of formal logic, one literally 'sees' how intimately syllogistic proof is linked to direct sensory perception of the basic pattern. It seems that the famous Swiss mathematician of the eighteenth century was once a tutor by correspondence to a dull-witted Russian princess and devised this method of convincing her of the reality and necessity of certain relations established deductively."

Venn diagram appears in 1918 in A Survey of Symbolic Logic by Clarence Irving Lewis: "This method resembles nothing so much as solution by means of the Venn diagrams" (OED2).

WINSORIZED is found in 1960 in W. J. Dixon, "Simplified Estimation from Censored Normal Samples," The Annals of Mathematical Statistics, 31, 385-391 (David, 1998).

The terms z-STATISTIC and z-DISTRIBUTION were introduced by R. A. Fisher in "On a distribution yielding the error functions of several well-known statistics," Proceedings of the International Mathematical Congress, Toronto (1924) [James A. Landau].

Sources: http://members.aol.com/jeff570/sources.html