The Pivotal Role Statistics Play in Scientific Research

Modern science is often based on statements of probability and statistical significance. For instance:

  • There is a significant likelihood of a catastrophic meteorite impact on Earth sometime in the next 200,000 years (Bland, 2005).
  • Studies have shown that the probability of developing lung cancer is almost 20 times greater in cigarette smokers than in nonsmokers (ACS, 2004).
  • First-born male children exhibit IQ test scores that are 2.82 points higher than second-born males, a difference that is significant at the 95% confidence level (Kristensen & Bjerkedal, 2007).

To an observer without scientific training, however, scientists can seem to speak in obscure terms. If we should immediately establish a colony on the moon to escape extraterrestrial disaster, why not tell people so? If cigarette smoking causes lung cancer, why not simply say so? And if older children are smarter than their younger siblings, why not let them know?

The reason is that the data do not support such absolute statements. Scientific data rarely lead to absolute conclusions. Not all smokers die from lung cancer: some die prematurely from other diseases, and some never contract the disease at all.

All data show variability, and the role of statistics is to quantify this variability so that scientists can make more accurate statements about their data.

A common misconception is that statistics provide a measure of proof that something is true. In fact, they provide a measure of the probability of observing a certain result. This is a critical distinction. For instance, the American Cancer Society has carried out extensive studies to make statements about the risk of cancer in US citizens. These studies found that rates of lung cancer are higher among cigarette smokers than among nonsmokers. However, not all smokers contracted lung cancer (and, in fact, some nonsmokers did).

Thus, the development of lung cancer is not a simple cause-and-effect relationship but a probability-based event. Using statistical techniques, scientists can put numbers to this probability. Rather than saying only that “if you smoke cigarettes, you are likely to contract lung cancer”, they can state that the probability of developing lung cancer is almost 20 times greater in cigarette smokers than in nonsmokers. Statistics is a powerful tool for quantifying probability, and it is used throughout science.
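To see what "putting numbers to this probability" can look like, here is a minimal Python sketch that computes a relative risk from cohort counts. The counts are invented purely for illustration and are not the American Cancer Society's data, and the function name relative_risk is our own.

```python
def relative_risk(cases_exposed, total_exposed, cases_unexposed, total_unexposed):
    """Ratio of the disease rate in the exposed group to the rate in the unexposed group."""
    risk_exposed = cases_exposed / total_exposed
    risk_unexposed = cases_unexposed / total_unexposed
    return risk_exposed / risk_unexposed


# Hypothetical cohort, made up for illustration only (NOT real study data).
smoker_cases, smokers = 200, 10_000
nonsmoker_cases, nonsmokers = 10, 10_000

rr = relative_risk(smoker_cases, smokers, nonsmoker_cases, nonsmokers)

print(f"Risk among smokers:    {smoker_cases / smokers:.2%}")
print(f"Risk among nonsmokers: {nonsmoker_cases / nonsmokers:.2%}")
print(f"Relative risk:         {rr:.1f}")
```

In this made-up cohort the large majority of smokers never develop the disease, yet their risk is still many times that of nonsmokers. That is the sense in which the relationship is a matter of probability and not of simple cause and effect.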

What is statistics?

Probability and statistics trace their origins to 1654, when a French gambler, Antoine Gombaud, asked the noted philosopher and mathematician Blaise Pascal how one should divide the stakes among players when a game of chance is interrupted prematurely. Pascal posed the question to the lawyer and mathematician Pierre de Fermat, and together the two devised a mathematical system that not only answered Gombaud’s original question but also laid the foundations of modern probability theory and statistics.

From its roots in gambling, statistics has grown into a field of study that involves the development of tests and methods used to quantitatively define the probability of certain outcomes, the variability inherent in data, and the error and uncertainty associated with those outcomes.

As such, statistical methods are used throughout the scientific process, from the design of research questions through data analysis to the final interpretation of data.

Statistics in research design

Many people take statements of probability and likelihood as a sign of uncertainty or weakness in scientific results. However, the use of probability tests and statistical methods in research is an important aspect of science that buttresses scientific conclusions and adds certainty to them.

Statistical methods help to check that data are interpreted correctly and that apparent relationships are meaningful and not simply chance occurrences. A “statistic” is a numerical value that describes some property of a data set. The average (or “mean”) and the standard deviation are among the most commonly used statistics. The variance is the square of the standard deviation.
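As a small illustration, the following Python sketch computes these statistics with the standard library's statistics module. The measurements are invented for the example, and the choice of the sample (n - 1) versions of the standard deviation and variance is our assumption, not something the text specifies.

```python
import math
import statistics

# Hypothetical measurements, invented for illustration.
measurements = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.4]

mean = statistics.mean(measurements)
std_dev = statistics.stdev(measurements)      # sample standard deviation (n - 1 denominator)
variance = statistics.variance(measurements)  # sample variance (n - 1 denominator)

print(f"mean               = {mean:.3f}")
print(f"standard deviation = {std_dev:.3f}")
print(f"variance           = {variance:.4f}")

# The variance is the square of the standard deviation (up to rounding error).
print(math.isclose(std_dev ** 2, variance))
```

The module also offers statistics.pstdev and statistics.pvariance, which divide by n and apply when the data are treated as the whole population and not as a sample.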

Two important concepts in statistics are the “population” and the “sample”. The population is a theoretical concept: an idealized representation of the set of all possible values of some measured quantity. A sample, by contrast, is what we can actually observe and measure, that is, the data we have available for statistical analysis. It is necessarily a limited subset of the population.

In the real world, all we ever have are limited samples, from which we attempt to estimate the properties of the population.

As an analogy, the population might be an infinite jar of chocolates, a certain proportion of which (say 70%) are blue and the rest (30%) are red. We can only draw a finite number of these chocolates, and the draw will generally not come out in exactly the ratio of 70:30. The ratio we measure is called a “sample statistic”. The techniques of statistical science allow us to make optimum use of the sample statistic to obtain the best estimate of the population parameter. Statistical science also allows us to quantify the uncertainty in this estimate.
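The jar analogy can be simulated in a few lines of Python. This is only a sketch under stated assumptions: the infinite jar is modelled as independent draws that each come up blue with probability 0.70, the uncertainty is expressed as an approximate 95% interval using the normal approximation (one common choice among several), and the function names draw_sample and estimate_proportion are hypothetical.

```python
import math
import random


def draw_sample(n, p_blue, rng):
    """Draw n chocolates from an effectively infinite jar; True means blue."""
    return [rng.random() < p_blue for _ in range(n)]


def estimate_proportion(sample, z=1.96):
    """Return the sample proportion and an approximate 95% interval.

    Uses the normal approximation, which is an assumption of this sketch.
    """
    n = len(sample)
    p_hat = sum(sample) / n                       # the sample statistic
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    return p_hat, (p_hat - z * std_err, p_hat + z * std_err)


rng = random.Random(42)  # fixed seed so the run is repeatable
TRUE_P_BLUE = 0.70       # population parameter; unknown in a real study

for n in (20, 200, 2000):
    sample = draw_sample(n, TRUE_P_BLUE, rng)
    p_hat, (low, high) = estimate_proportion(sample)
    print(f"n = {n:4d}: blue fraction = {p_hat:.3f}, approx. 95% interval = ({low:.3f}, {high:.3f})")
```

Each sample gives a somewhat different blue fraction, and the interval around the estimate narrows as more chocolates are drawn. That narrowing is the quantified uncertainty the paragraph above refers to.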

Given the crucial role statistics play in interpreting data, progress in scientific research is hard to imagine without statistical tools and techniques. Statistics help us extract meaningful patterns from data that would otherwise be too variable to interpret, and they help scientists draw well-supported conclusions in their areas of work. Research without statistics would be a barren effort.

