“Statistically Insignificant”? Watch out!

Experts can detect non-experts just by the way they use technical terms.  The non-expert says something that no one trained in the field would actually say.  It sets one’s teeth on edge to hear it…

The great fictional detective Nero Wolfe once identified the culprit, who claimed to be a law student, by asking him at dinner if they had taught him to "draft torts."  (Torts are civil wrongs; lawyers draft contracts and pleadings, not torts.)  When the student assured Wolfe that they had, Wolfe knew him to be a fraud.

This brings us to the term "statistically insignificant."  It is a phrase that would not be used by statisticians because it is misleading and has no technical meaning.  It is sometimes substituted when someone means "not statistically significant."  To the casual observer, this might seem to be a distinction without a difference.  Nit-picking.

That view would be wrong. The distinction is important.  Anyone following the markets should understand and appreciate it, because understanding it can provide a clear trading edge over those who do not get it.  I am trying to make this as non-technical as possible, so please read on.

Statistical significance is a technical term related to the measurement of sampling error.  In most cases, researchers are testing whether they can reject a null hypothesis.  Science is very cautious about rejecting a null hypothesis, so tests of significance generally require that the "effect" measured be extremely unlikely to have arisen merely because a sample (perhaps from a survey) differed by chance from the entire population that it is expected to represent.  Statistical significance does not address non-sampling error, which includes many things like poorly designed questions, not identifying the relevant population to be polled, interviewer bias, etc.

Back in the old days, students learned in an early class the difference between statistical significance and substantive significance.  Let’s suppose, for example, that we did a survey of voters in Illinois about the upcoming gubernatorial election.  It found that likely male voters planned to choose the incumbent (a male Democrat) at a rate of 55.3%.  Female voters planned to choose the incumbent over his challenger (a female Republican) at a rate of 54.9%.  This small difference between male and female voters is not very important in a substantive sense.  The headline of the news story might be that men and women see the election the same way.  Despite this main story theme, if the sample were large enough, the difference would be statistically significant.  That would mean only that the difference of 0.4 percentage points was very unlikely to be the result of sampling error.  The null hypothesis of "no difference" could be rejected.

In short, large samples narrow the confidence interval, often called the "margin of error" in journalistic terms.  Making the sample large does not mean that the difference identified is important.  Substantive and statistical significance are two completely different things.
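To put rough numbers on the Illinois example, here is a minimal Python sketch of a two-proportion z-test.  The survey is hypothetical, and the equal group sizes, the sample sizes tried, and the function name are my own assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value for the null hypothesis p1 == p2.

    Assumes two independent samples of equal size (an illustrative
    simplification) and the usual normal approximation.
    """
    pooled = (p1 + p2) / 2  # pooled share when group sizes are equal
    standard_error = sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z = (p1 - p2) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same 55.3% vs. 54.9% split, increasingly large (assumed) samples.
for n in (1_000, 100_000, 1_000_000):
    p_value = two_proportion_p_value(0.553, 0.549, n)
    print(f"n per group = {n:>9,}: two-sided p-value = {p_value:.4g}")
```

The gap between men and women is identical in every row.  Only the sample size changes, and with it the p-value, which is why a "significant" result says nothing about whether the gap matters.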

If anyone is still reading at this point, let’s check out how it applies to market data.

In his influential and widely-read blog, The Big Picture, Barry Ritholtz delved into the new home starts data from the Census Bureau.  Barry wrote as follows:

"Here is the data point released by the Census Bureau:

Privately-owned housing starts in September were at a seasonally adjusted annual rate of 1,772,000. This is 5.9 percent (±8.9%)*

Single-family housing starts in September were at a rate of 1,426,000; this is 4.3 percent (±8.4%)* above the August figure of 1,367,000.

What is the mathematical significance of this release? ABSOLUTELY ZERO. Any datapoint below the margin of error is statistically insignificant.

As the Census Bureau notes:

* 90% confidence interval includes zero. The Census Bureau does not have sufficient statistical evidence to conclude that the actual change is different from zero.

Insufficient evidence to conclude the change is different from zero.
So September starts up 5.9% with a +/- 8.9% error rate means nothing.
Single Family Home starts of 4.3% and a +/- 8.4% margin is meaningless."

Before analyzing this, let me make a few points:

  • We are not necessarily taking issue with anything related to housing, or the blip Barry identifies in his later discussion.
  • We agree that this is a "noisy" series and one that is difficult to interpret.
  • We applaud Barry’s effort to educate his readers and highlight the issues in the data.
  • The point we are making is technical but important.  Virtually everyone commenting on the markets says similar things.  Barry is just providing a convenient example for the illustration.

Now to the analysis —

To say that the results are "statistically insignificant" or that there is "absolutely zero" mathematical significance is incorrect.  If Barry wants to test this, I will construct a game where we draw marbles out of a cloaked container holding a 60-40 mix of blue and white marbles.  The sample size will not be enough to attain "statistical significance."  We will each start with a stack of hundred-dollar bills.  I get to choose the color based upon a series of sample draws that do not attain statistical significance.  He has to take the other side.  I will swiftly prove that just because something is not "statistically significant" does not mean that it has no value.  If he agrees to play long enough, I’ll fly him to Chicago for the game!
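For readers who want to check the marble game, here is a rough Python sketch.  The rules are my assumptions, since the game above is only outlined: I see five sample draws (with replacement), bet on whichever color showed up more often, and then one more marble is drawn for an even-money $100 bet.  The function names are hypothetical.

```python
from math import comb


def prob_pick_majority(n_draws, true_share=0.6):
    """Chance that the sample majority matches the true majority color.

    Assumes an odd number of draws (no ties), made with replacement.
    """
    return sum(
        comb(n_draws, k) * true_share**k * (1 - true_share) ** (n_draws - k)
        for k in range(n_draws // 2 + 1, n_draws + 1)
    )


def win_probability(n_draws, true_share=0.6):
    """Chance that the next marble matches the color chosen from the sample."""
    p_right = prob_pick_majority(n_draws, true_share)
    return p_right * true_share + (1 - p_right) * (1 - true_share)


def smallest_two_sided_p(n_draws):
    """p-value of the most lopsided sample possible, against a 50-50 mix."""
    return min(1.0, 2 * 0.5**n_draws)


n = 5
edge_per_bet = 100 * (2 * win_probability(n) - 1)
print(f"smallest possible p-value with {n} draws: {smallest_two_sided_p(n):.4f}")
print(f"chance I pick the majority color: {prob_pick_majority(n):.4f}")
print(f"chance I win each bet: {win_probability(n):.4f}")
print(f"expected gain per $100 bet: ${edge_per_bet:.2f}")
```

Under these assumptions, no sample of five can reach the conventional 5% significance level, yet the player who uses the sample still wins more than half the bets.  The edge is modest per bet, which is why the game needs to be played "long enough."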

What would I (or any expert statistician) conclude from the actual data cited?

  • The single most likely value for new housing starts is an increase of 5.9%.
  • The increase in housing starts is statistically significant at a level of about 70%.  That is, we can be about 70% sure that the actual increase is not zero.  The Census Bureau uses 90%.  Journal writers use 95% or 99%.  The choice of confidence level is arbitrary.
  • If we could know the "true" increase in housing starts (which we never will), that number is just as likely to be 11.8% as it is to be zero.
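The "about 70%" figure can be roughly reproduced from the published numbers.  This is a sketch, assuming the ±8.9% is the half-width of a symmetric 90% confidence interval and that the sampling error is approximately normal; the variable names are mine, not the Census Bureau's.

```python
from math import isclose
from statistics import NormalDist

std_normal = NormalDist()

estimate = 5.9        # reported percent change in housing starts
half_width_90 = 8.9   # reported +/- figure, read as a 90% interval half-width

# A 90% two-sided interval spans about 1.645 standard errors on each side.
z_for_90 = std_normal.inv_cdf(0.95)
standard_error = half_width_90 / z_for_90

# Widest two-sided confidence level whose interval still excludes zero.
z_score = estimate / standard_error
implied_confidence = 2 * std_normal.cdf(z_score) - 1
print(f"standard error: about {standard_error:.2f} points")
print(f"implied confidence that the change is not zero: {implied_confidence:.0%}")

# Zero and 11.8% sit the same distance from the 5.9% estimate,
# so a symmetric sampling distribution treats them alike.
sampling = NormalDist(mu=estimate, sigma=standard_error)
density_at_zero = sampling.pdf(0.0)
density_at_double = sampling.pdf(2 * estimate)
print(isclose(density_at_zero, density_at_double))
```

Changing the assumed interval convention (for example, a one-sided test) would shift the percentage, so treat the result as a ballpark rather than an official figure.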

If you want to read more about substantive significance (called oomph in this excellent essay) check out this source.  Among other things, it points out that 96% of the articles in the leading economic journal misused statistical significance during the 1980s.  Barry is far from alone here!

Bottom line for investors and traders:  Much of what we see comes from surveys of one sort or another.  The information does have value, but any single data point may be suspect, requiring careful interpretation.  We have tried to help in the educational process with our Payroll Employment Game, where players get to see how the survey results affect their predictions.  Please give it a try and read the excellent technical notes by my colleague Allen Russell.

2 comments

  • dude June 3, 2011  

    hi, im a fan of stas and irony so this page tickled me. herein, you claim to never take advice from someone who uses the term in question however:
    ‘What is the mathematical significance of this release? ABSOLUTELY ZERO. Any datapoint below the margin of error is “”statistically insignificant””. ‘
    while i agree with you intelectually i’ve found that the use of that term is more a misnomer, and it is more accurate to say the term used should be, statistically irrelevant. as in, it is irrelevent that x occured in (any range in) this analisys whatsoever and should otherwise not be counted or weighed as heavily (if at all).
    though arguing semantics back and forth IS a great way to get off track.
