8

Objective

To clarify which traits or attributes let me say that an analysis is inferential or predictive.

Background

I am taking a data science course that touches on inferential and predictive analyses. The explanations (as I understood them) are:

  • Inferential

    Induce a hypothesis from a small sample of a population, and see whether it holds true in the larger/entire population.

    It seems to me this is generalisation. I think inducing that smoking causes lung cancer or that CO2 causes global warming are inferential analyses.

  • Predictive

    Induce a statement of what can happen by measuring variables of an object.

    I think identifying which traits, behaviours, and remarks people react favourably to, and which make a presidential candidate popular enough to become president, is a predictive analysis (this is touched on in the course as well).

Question

I am a bit confused by the two, as it looks to me like there is a grey area or overlap.

Bayesian inference is "inference", but I think it is used for prediction, such as in a spam filter or in identifying fraudulent financial transactions. For instance, a bank may use previous observations of variables (such as IP address, originator country, beneficiary account type, etc.) to predict whether a transaction is fraudulent.
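To make the fraud example concrete, here is a minimal sketch of how a posterior computed with Bayes' theorem can be turned into a prediction. All rates and the decision threshold are hypothetical values chosen for illustration, not real bank data, and a real filter would combine many variables rather than one.

```python
# Hypothetical rates, chosen only for illustration (not real bank data).
p_fraud = 0.01               # prior: share of transactions that are fraudulent
p_flag_given_fraud = 0.60    # P(high-risk originator country | fraud)
p_flag_given_legit = 0.05    # P(high-risk originator country | legitimate)

# Bayes' theorem: P(fraud | flag) = P(flag | fraud) * P(fraud) / P(flag)
p_flag = p_flag_given_fraud * p_fraud + p_flag_given_legit * (1 - p_fraud)
p_fraud_given_flag = p_flag_given_fraud * p_fraud / p_flag

# Prediction step: turn the posterior into a decision (threshold is an assumption).
threshold = 0.10
is_suspicious = p_fraud_given_flag > threshold

print(round(p_fraud_given_flag, 3), is_suspicious)
```

The "inference" here is the posterior probability; the "prediction" is the decision made for one transaction from that probability.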

I suppose the theory of relativity is an inferential analysis that induced a theory/hypothesis from observations and thought experiments, but it also predicted that the path of light would be bent.

Kindly help me understand which must-have attributes categorise an analysis as inferential or predictive.

nbro
mon

3 Answers

9

"What is the question?" by Jeffrey T. Leek and Roger D. Peng has a nice description of the various types of analysis that go into a typical data science workflow. To address your question specifically:

An inferential data analysis quantifies whether an observed pattern will likely hold beyond the data set in hand. This is the most common statistical analysis in the formal scientific literature. An example is a study of whether air pollution correlates with life expectancy at the state level in the United States (9). In nonrandomized experiments, it is usually only possible to determine the existence of a relationship between two measurements, but not the underlying mechanism or the reason for it.

Going beyond an inferential data analysis, which quantifies the relationships at population scale, a predictive data analysis uses a subset of measurements (the features) to predict another measurement (the outcome) on a single person or unit. Web sites like FiveThirtyEight.com use polling data to predict how people will vote in an election. Predictive data analyses only show that you can predict one measurement from another; they do not necessarily explain why that choice of prediction works.

[Figure: data analysis flowchart]

dranxo
7

There is some gray area between the two, but we can still make distinctions.

Inferential statistics is when you are trying to understand what causes a certain outcome. In such analyses there is a specific focus on the independent variables, and you want to make sure you have an interpretable model. For instance, your example of a study examining whether smoking causes lung cancer is inferential. Here you are trying to closely examine the factors that lead to lung cancer, and smoking happens to be one of them.

In predictive analytics you are more interested in using a certain dataset to help you predict future variation in the values of the outcome variable. Here you can make your model as complex as you like, even to the point that it is not interpretable, as long as it gets the job done. A simplified example is a real estate investment company interested in determining which combination of variables predicts a prime price for a certain property, so it can acquire such properties for profit. The potential predictors could be neighborhood income, crime, educational status, distance to a beach, and racial makeup. The primary aim here is to obtain an optimal combination of these variables that provides a better prediction of future house prices.

Here is where it gets murky. Let's say you conduct a study on middle-aged men to determine the risks of heart disease. To do this you measure weight, height, race, income, marital status, cholesterol, education, and a potential serum chemical called "mx34" (just making this up), among others. Let's say you find that the chemical is indeed a good risk factor for heart disease. You have now achieved your inferential objective. Now, satisfied with your new findings, you start to wonder whether you can use these variables to predict who is likely to get heart disease. You want to do this so that you can recommend preventive steps against future heart disease. At that point the same analysis has taken on a predictive objective.
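The two objectives can be put side by side in a minimal sketch. Everything here is an assumption for illustration: the data are synthetic, the outcome is a made-up continuous risk score rather than a diagnosis, and ordinary least squares with a normal-approximation interval stands in for whatever model a real study would use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption, not real measurements): a continuous risk
# score driven by the made-up "mx34" level and by weight, plus noise.
n = 400
mx34 = rng.normal(size=n)
weight = rng.normal(size=n)
risk = 1.0 + 0.8 * mx34 + 0.3 * weight + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), mx34, weight])
train, test = slice(0, 300), slice(300, n)

# Fit ordinary least squares on the training rows.
beta, *_ = np.linalg.lstsq(X[train], risk[train], rcond=None)

# Inferential question: is the mx34 coefficient distinguishable from zero?
resid = risk[train] - X[train] @ beta
dof = X[train].shape[0] - X.shape[1]
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.inv(X[train].T @ X[train])
se = np.sqrt(np.diag(cov))
ci_mx34 = (beta[1] - 1.96 * se[1], beta[1] + 1.96 * se[1])  # approx. 95% interval

# Predictive question: how well does the model do on individuals it has not seen?
pred = X[test] @ beta
mse_model = np.mean((risk[test] - pred) ** 2)
mse_baseline = np.mean((risk[test] - risk[train].mean()) ** 2)

print("mx34 coefficient:", beta[1], "approx. 95% CI:", ci_mx34)
print("held-out MSE:", mse_model, "baseline MSE:", mse_baseline)
```

The interval on the coefficient answers the inferential question about the population, while the held-out error answers the predictive question about individual cases. The same fitted model serves both, which is why the boundary looks blurry.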

AlxRd
  • Thanks for the answer. I think inferential analysis takes a result or finding from a small population (sample patients, to find risk factors of heart disease), then sees whether it applies to a larger population to tell which of them may get heart disease, which seems to me the same as predictive analysis. So does inferential analysis encompass predictive analysis (predictive is a part/subset of inferential)? – mon Dec 26 '15 at 21:18
  • Or … as long as I try to tell what can happen based on observations, it is called predictive, and inferential analysis is by nature predictive because it tries out a hypothesis (an expectation of what will happen) on a larger population? – mon Dec 26 '15 at 21:34
  • I would say yes to your first comment. Inference is invariably about the population, which we try to understand/infer from a well-chosen sample. On the other hand, I am not sure I would call hypothesis testing prediction. – AlxRd Dec 26 '15 at 23:02
1

The same academic paper I was reading that spurred this question for me also gave an answer (from Leo Breiman, a UC Berkeley statistician):

• Prediction. To be able to predict what the responses are going to be to future input variables;

• [Inference]. To [infer] how nature is associating the response variables to the input variables.

Source: http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf

Alex W