Chapter 14—Framing Statistical Questions
Translating Scientific Questions Into Probabilistic And Statistical Questions
The Three Types of Questions
The Steps In Statistical Inference
Chapters 3-10 discussed problems in probability theory. That is, we have been estimating the probability of a composite event resulting from a system in which we know the probabilities of the simple events (the “parameters” of the situation).
Then Chapters 11-13 discussed the underlying philosophy of statistical inference.
Now we turn to inferential-statistical problems. Up until now, we have been estimating the complex probabilities of known universes—the topic of probability. Now as we turn to problems in statistics, we seek to learn the characteristics of an unknown system—the basic probabilities of its simple events and parameters. (Here we note again, however, that in the process of dealing with them, all statistical-inferential problems eventually are converted into problems of pure probability). To assess the characteristics of the system in such problems, we employ the characteristics of the sample(s) that have been drawn from it.
For further discussion on the distinction between inferential statistics and probability theory, see Chapters 1-3.
This chapter begins the topic of hypothesis testing. The issue is whether to adjudge that a particular sample (or samples) come(s) from a particular universe. A two-outcome yes-no universe is discussed first. Then we move on to “measured-data” universes, which are more complex than yes-no outcomes because the variables can take on many values, and because we ask somewhat more complex questions about the relationships of the samples to the universes. This topic is continued in subsequent chapters.
In a typical hypothesis-testing problem presented in this chapter, one sample of hospital patients is treated with a new drug and a second sample is not treated but rather given a “placebo.” After
this question into an operational and testable scientific hypothesis by asking this question: Do doctors in tobacco-economy states differ from doctors in other states in their smoking, and in their beliefs about smoking?
Which numbers would help us answer this question, and how do we interpret those numbers? We were now ready to ask the statistical question: Do doctors in tobacco-economy states “belong to the same universe” (with respect to smoking) as do other doctors? That is, do doctors in tobacco-economy states have the same characteristics (at least, those characteristics we are interested in, smoking in this case) as do other doctors? Later we shall see that the way to proceed is to consider the statistical hypothesis that these doctors do indeed belong to that same universe; that hypothesis and that universe will be called the “benchmark hypothesis” and the “benchmark universe” respectively, or in more conventional usage, the “null hypothesis.”
If the tobacco-economy doctors do indeed belong to the benchmark universe (that is, if the benchmark hypothesis is correct), then there is a 49/50 chance that doctors in some state other than the state in which tobacco is most important will have the highest rate of cigarette smoking. But in fact we observe that the state in which tobacco accounts for the largest proportion of the state’s income, North Carolina, had (as of 1964) a higher proportion of doctors who smoked than any other state. (Furthermore, a lower proportion of doctors in North Carolina than in any other state said that they believed that smoking is a health hazard.) Of course, it is possible that it was just chance that North Carolina doctors smoked most, but the chance is only 1 in 50 if the benchmark hypothesis is correct. Obviously, some state had to have the highest rate, and the chance for any other state was also 1 in 50. But, because our original scientific hypothesis was that North Carolina doctors’ smoking rate would be highest, and we then observed that it was highest even though the chance was only 1 in 50, the observation became interesting and meaningful to us. It means that the chances are strong that there was a connection between the importance of tobacco in the economy of a state and the rate of cigarette smoking among doctors living there (as of 1964).
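The 1-in-50 figure can be checked by simulation. The following is only a minimal sketch in Python, not the procedure used in the text: it assumes that under the benchmark hypothesis each state's smoking rate is an independent draw from one common universe (a uniform distribution is used purely for convenience), and the function name and the choice of state 0 as the state named in advance are hypothetical.

```python
import random

def chance_named_state_is_highest(n_states=50, n_trials=20_000, seed=0):
    """Estimate how often a state named in advance has the highest rate
    when all states belong to the same benchmark universe."""
    rng = random.Random(seed)
    named_state = 0  # hypothetical index standing in for the state picked in advance
    hits = 0
    for _ in range(n_trials):
        # Assumption: every state's rate is drawn from one common universe.
        rates = [rng.random() for _ in range(n_states)]
        highest = max(range(n_states), key=rates.__getitem__)
        if highest == named_state:
            hits += 1
    return hits / n_trials

estimate = chance_named_state_is_highest()
print(f"Estimated chance: {estimate:.3f} (benchmark value 1/50 = 0.020)")
```

The estimate should land close to 0.02, and by symmetry the same holds for whichever state is named beforehand. That is why the observation carries weight only because North Carolina was singled out by the scientific hypothesis before the data were examined.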
To consider this problem from another direction, it would be rare for North Carolina to have the highest smoking rate for doctors if there were no special reason for it; in fact, it would occur only once in fifty times. But, if there were a special reason
it is not feasible for you to obtain a larger sample. Five of six “medicine” patients get well, two of six “no medicine” patients get well. Does the medicine cure the cancer? That is, if future cancer patients take the medicine, will their rate of recovery be higher than if they did not take the medicine?
One way to translate the scientific question into a statistical question is to ask: Do the “medicine” patients belong to the same universe as the “no medicine” patients? That is, we ask whether “medicine” patients still have the same chances of getting well from the cancer as do the “no medicine” patients, or whether the medicine has bettered the chances of those who took it and thus removed them from the original universe, with its original chances of getting well. The original universe, to which the “no medicine” patients must still belong, is the benchmark universe. Shortly we shall see that we proceed by comparing the observed results against the benchmark hypothesis that the “medicine” patients still belong to the benchmark universe; that is, that they still have the same chance of getting well as the “no medicine” patients.
We want to know whether or not the medicine does any good. This question is the same as asking whether patients who take medicine are still in the same population (universe) as “no medicine” patients, or whether they now belong to a different population in which patients have higher chances of getting well.
To recapitulate our translations, we move from asking: Does the medicine cure the cancer? to: Do “medicine” patients have the same chance of getting well as “no medicine” patients? and finally to: Do “medicine” patients belong to the same universe (population) as “no medicine” patients? Remember that “population” in this sense does not refer to the population at large, but rather to a group of cancer sufferers (perhaps an infinitely large group) who have given chances of getting well, on the average. Groups with different chances of getting well are called “different populations” (universes).
Shortly we shall see how to answer this statistical question. We must keep in mind that our ultimate concern in cases like this one is to predict future results of the medicine, that is, to predict whether use of the medicine will lead to a higher recovery rate than would be observed without the medicine.
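As a preview, the statistical question for the medicine example can be put to a simple shuffling experiment. The sketch below is one possible way to do it, offered under stated assumptions rather than as the procedure developed later in the book: it takes the benchmark universe to be all twelve observed outcomes pooled together, deals them at random into two groups of six, and counts how often the “medicine” group does at least as well, relative to the other group, as was actually observed (a one-sided comparison). The function and variable names are hypothetical.

```python
import random

def recovery_shuffle_test(treated_recovered=5, untreated_recovered=2,
                          group_size=6, n_trials=20_000, seed=0):
    """Estimate the chance that a single pooled universe would produce a
    difference in recoveries at least as large as the observed one."""
    rng = random.Random(seed)
    total_recovered = treated_recovered + untreated_recovered
    # Assumed benchmark universe: all 12 outcomes pooled (1 = got well, 0 = did not).
    pooled = [1] * total_recovered + [0] * (2 * group_size - total_recovered)
    observed_diff = treated_recovered - untreated_recovered
    hits = 0
    for _ in range(n_trials):
        rng.shuffle(pooled)
        fake_treated = sum(pooled[:group_size])         # first six play "medicine"
        fake_untreated = total_recovered - fake_treated  # the rest play "no medicine"
        if fake_treated - fake_untreated >= observed_diff:
            hits += 1
    return hits / n_trials

chance = recovery_shuffle_test()
print(f"Chance of a difference this large under the benchmark: {chance:.3f}")
```

Under these assumptions the estimate comes out at roughly 0.12, which suggests that with only six patients per group a single universe would fairly often produce a split as lopsided as five versus two.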
Illustration C
Is method Alpha a better method of teaching reading than method Beta? That is, will method Alpha produce a higher
Translating from a scientific question into a statistical question is mostly a matter of asking the probability that some given benchmark universe (population) will produce one or more observed samples. Notice that we must (at least for general scientific testing purposes) ask about a given universe whose composition we assume to be known, rather than about a range of universes, or about a universe whose properties are unknown. In fact, there is really only one question that probability statistics can answer: Given some particular benchmark universe of some stated composition, what is the probability that an observed sample would come from it? (Please notice the subtle but all-important difference between the words “would come” in the previous sentence, and the word “came.”) A variation of this question is: Given two (or more) samples, what is the probability that they would come from the same universe—that is, that the same universe would produce both of them? In this latter case, the relevant benchmark universe is implicitly the universe whose composition is the two samples combined.
The necessity for stating the characteristics of the universe in question becomes obvious when you think about it for a moment. Probability-statistical testing adds up to comparing a sample with a particular benchmark universe, and asking whether there probably is a difference between the sample and the universe. To carry out this comparison, we ask how likely it is that the benchmark universe would produce a sample like the observed sample. But in order to find out whether or not a universe could produce a given sample, we must ask whether or not some particular universe, with stated characteristics, could produce the sample. There is no doubt that some universe could produce the sample by a random process; in fact, some universe did. The only sensible question, then, is whether or not a particular universe, with stated (or known) characteristics, is likely to produce such a sample. In the case of the medicine, the universe with which we compare the sample who took the medicine is the benchmark universe to which that sample would belong if the medicine had had no effect.
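A small calculation shows why the universe must be stated. The sketch below takes one observed sample (five recoveries among six patients, as in the medicine example) and asks how likely a sample at least that favorable would be from each of several benchmark universes. The three recovery rates are hypothetical values chosen only for illustration, the patients are assumed to be independent draws, and the helper name is an invention for this example.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more successes in n independent draws
    from a universe whose success rate is p (binomial sum)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Hypothetical benchmark universes, each with a stated recovery rate.
for rate in (1 / 3, 0.5, 0.8):
    chance = prob_at_least(5, 6, rate)
    print(f"benchmark recovery rate {rate:.2f}: "
          f"P(5 or more of 6 recover) = {chance:.3f}")
```

The same sample is very unlikely from one stated universe and quite ordinary from another, so the question has no answer until a particular universe has been named.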
This comparison leads to the benchmark (null) hypothesis that the sample comes from a population in which the medicine (or other experimental treatment) seems to have no effect. It is to avoid confusion inherent in the term “null hypothesis” that I replace it with the term “benchmark hypothesis.” The concept of the benchmark (null) hypothesis is not easy to grasp. The best way to learn its meaning is to see how it is used in practice. For example, we say we are willing to be
wish to investigate, the investigation of that model’s behavior, and the interpretation of the results.
Stating the steps to be followed in a procedure is an operational definition of the procedure. My belief in the clarifying power of this device (the operational definition) is embodied in the set of steps given in Chapter 10 for the various aspects of statistical inference. A canonical question-and-answer procedure for testing hypotheses will be found in Chapter 19, and one for confidence intervals will be found in Chapter 20.
We define resampling to include problems in inferential statistics as well as problems in probability as follows: Using the entire set of data you have in hand, or using the given data-generating mechanism (such as a die) that is a model of the process you wish to understand, produce new samples of simulated data, and examine the results of those samples. That’s it in a nutshell. In some cases, it may also be appropriate to amplify this procedure with additional assumptions.
Problems in pure probability may at first seem different in nature than problems in statistical inference. But the same logic as stated in this definition applies to both varieties of problems. The difference is that in probability problems the “model” is known in advance—say, the model implicit in a deck of poker cards plus a game’s rules for dealing and counting the results—rather than the model being assumed to be best estimated by the observed data, as in resampling statistics.
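The parallel between the two kinds of problem can be made concrete. The following is a minimal sketch, assuming Python and hypothetical helper names: the same simulation routine is run once with a model known in advance (a fair die) and once with a model estimated from observed data (here, the six “medicine” patients from the earlier illustration, five of whom got well).

```python
import random

rng = random.Random(0)

def simulate(draw_one, sample_size, statistic, n_trials=10_000):
    """Produce many simulated samples and return the statistic of each."""
    return [statistic([draw_one() for _ in range(sample_size)])
            for _ in range(n_trials)]

# Probability problem: the model is known in advance (a fair six-sided die).
def roll_die():
    return rng.randint(1, 6)

sums = simulate(roll_die, sample_size=2, statistic=sum)
p_seven = sum(s == 7 for s in sums) / len(sums)
print(f"Chance two dice total 7: about {p_seven:.3f}")

# Statistical problem: the model is estimated from the observed data.
observed = [1, 1, 1, 1, 1, 0]  # 1 = got well, 0 = did not

def draw_patient():
    return rng.choice(observed)  # draw with replacement from the sample

recovered_counts = simulate(draw_patient, sample_size=6, statistic=sum)
mean_recovered = sum(recovered_counts) / len(recovered_counts)
print(f"Average recoveries per simulated group of six: {mean_recovered:.2f}")
```

Only the source of the draws changes between the two cases; the steps of producing simulated samples and examining them are identical.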
The hardest job in using probability statistics, and the most important, is to translate the scientific question into a form to which statistics can give a sensible answer. You must translate scientific questions into the appropriate form for statistical operations, so that you know which operations to perform.
This is the part of the job that requires hard, clear thinking (though it is non-mathematical thinking), and it is the part that someone else usually cannot easily do for you.
Once you know exactly which probability-statistical question you want to ask (that is, exactly which probability you want to determine), the rest of the work is relatively easy. The stage at which you are most likely to make mistakes is in stating the