The pathological science: Psychology, skepticism, and statistics: September 2016

[Warning: Very Basic Stuff contained within]

Last weekend I was at a conference where there was an interesting tutorial-style talk on the Reliable Change Index (RCI). The RCI is generally used to try to determine whether a single person has displayed 'real' change over the course of some intervention. (In other words, could an observed change in score plausibly have occurred purely due to random measurement error?)

I have some conceptual problems with the RCI in terms of whether it tells us anything we really want to know (which I'll save for another day), but it was an interesting and well-delivered presentation. That said, I want to pick on an idea that was mentioned in the talk, and that I've heard others repeat recently.

The idea relates to extending the RCI beyond single cases. Specifically, the speaker suggested that in a group study, a mean difference smaller than the standard error of measurement might be spurious (i.e., purely the result of measurement error), even if the mean difference is statistically significant. His reasoning was that a statistical significance test focuses on sampling error, not measurement error.

Now, for a single case, a change in score that is less than the standard error of measurement is indeed one that would be quite consistent with a null hypothesis that the true score of the participant has not actually changed. (This isn't to say that this null is true, just that the observation isn't overtly inconsistent with the null.) The RCI framework formalises this idea further by:

Using the standard error of measurement (SEM) to calculate the standard error of the difference, Sdiff = sqrt(2*SEM^2). Since both the pre score and the post score are subject to errors of measurement, the standard error of the difference is somewhat larger than the SEM (sqrt(2), or about 1.41, times as large).
Using 1.96*Sdiff as the cut-off for reliable change, drawing on the usual goal of a 5% Type 1 error rate.
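
For concreteness, here is a minimal Python sketch of those two steps. The function name and the example numbers (an SEM of 3 points, a score moving from 20 to 25) are made up for illustration and don't come from the talk.

```python
import math

def reliable_change(pre, post, sem, z_crit=1.96):
    """Return (s_diff, rci, is_reliable) for one person's pre and post scores."""
    s_diff = math.sqrt(2 * sem ** 2)   # standard error of the difference
    rci = (post - pre) / s_diff        # observed change in units of s_diff
    return s_diff, rci, abs(rci) > z_crit

# Hypothetical case: SEM of 3 points, score goes from 20 to 25.
s_diff, rci, reliable = reliable_change(pre=20, post=25, sem=3.0)
print(f"Sdiff = {s_diff:.2f}, RCI = {rci:.2f}, reliable change: {reliable}")
```

With these numbers the 5-point change falls short of 1.96*Sdiff, so it would not be flagged as reliable; a 10-point change would be.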

All good so far. However, if we are comparing two sample means, the picture changes. At each time point we now have multiple observations (for different people), each with a different quantity of measurement error. The mean of the measurement error across people will itself have a variance that is less than the variance of the measurement error variable itself. This should be intuitively obvious: The variance of the mean of a sample of observations is always less than the variance of the underlying variable itself (well, provided the sample has N > 1 and the observations aren't perfectly correlated.)

In fact, when the sample is reasonably large, the standard error of the mean of the measurement error for the sample will be drastically less than the standard error of measurement itself. So an observation that a mean difference is less than the standard error of measurement is not necessarily consistent with the null hypothesis of no true change occurring.
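
To put a rough number on "drastically less", here is a small sketch. It assumes measurement errors that are independent across people, normally distributed, with mean zero and standard deviation equal to the SEM; under that assumption the standard deviation of the sample-mean error is SEM/sqrt(N). The function names and the values (SEM of 3, N of 100) are hypothetical.

```python
import math
import random
import statistics

def se_of_mean_error(sem, n):
    """Analytic SD of the mean of n independent measurement errors."""
    return sem / math.sqrt(n)

def simulate_mean_error_sd(sem, n, reps=2000, seed=1):
    """Empirical SD of the mean measurement error across many simulated samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0, sem) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)

analytic = se_of_mean_error(sem=3.0, n=100)
simulated = simulate_mean_error_sd(sem=3.0, n=100)
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

With an SEM of 3 and 100 people, the analytic value is 3/sqrt(100) = 0.3, a tenth of the SEM, and the simulated value should land close to that.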

So do we need to calculate the standard error of measurement for a particular sample, and use that along with a significance test (or Bayesian test) when assessing mean differences?

No.

Standard inferential tests do not only deal with sampling error. Any test you're likely to use to look at a mean difference will include an error term (often, but not necessarily, assumed to be normal and i.i.d. with mean zero). This error term bundles up any source of purely unsystematic random variability in the dependent variable - including both sampling error and unsystematic measurement error. So your standard inferential test already deals with unsystematic measurement error. Looking at the standard error of measurement in a group analysis tells you nothing extra about the likelihood that a true effect exists.
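
If you'd like to see this rather than take my word for it, here is a rough simulation sketch. Everything in it is an assumption chosen for illustration: a pre/post design with 200 people, normally distributed true scores, independent normal measurement error with an SEM of 3 at each time point, the same true change for everyone, and a normal approximation to the paired t-test's critical value (reasonable at this sample size).

```python
import math
import random
import statistics

def rejection_rate(true_change, n=200, sem=3.0, reps=1000, seed=7):
    """Share of simulated pre/post studies where a paired test rejects at the 5% level."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(0.975)  # normal approximation to the t cut-off
    rejections = 0
    for _ in range(reps):
        diffs = []
        for _ in range(n):
            true_score = rng.gauss(50, 10)                        # differs across people
            pre = true_score + rng.gauss(0, sem)                  # measurement error, time 1
            post = true_score + true_change + rng.gauss(0, sem)   # measurement error, time 2
            diffs.append(post - pre)
        # Paired test statistic: the error term here is estimated from the
        # observed differences, which already contain the measurement error.
        t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
        if abs(t) > z_crit:
            rejections += 1
    return rejections / reps

null_rate = rejection_rate(true_change=0.0)  # no true change at all
power = rejection_rate(true_change=1.0)      # true change of 1 point, well under the SEM of 3
print(f"false positive rate: {null_rate:.3f}, power for a 1-point change: {power:.3f}")
```

Under these assumptions, the false positive rate should sit near the nominal 5% even though every score is contaminated by measurement error, and a true change of one third of the SEM should be detected most of the time. No separate check against the SEM is needed for either result.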

Wednesday, September 7, 2016

What if the mean difference is less than the standard error of measurement?