Wednesday, May 4, 2016

Is good fit of a latent variable model positive evidence for measurement validity?

This week I reviewed a paper attempting to explain why latent variable modelling is useful for working out whether a measure is valid. (Latent variable modelling here meaning either exploratory or confirmatory factor analysis.) The author drew on Borsboom, Mellenbergh and van Heerden's theory of validity.

In Borsboom et al.'s theory, a test is a valid measure of an attribute if:
 1) The attribute exists; and
 2) Variation in the attribute causes variation in the measurement outcomes.

Therefore, the author suggested that latent variable models - models in which unobserved "latent" variables have causal effects on observed indicators - are useful for testing the validity of a measure. For example, if you hypothesise that a test is a valid measure of a specific individual attribute, fit a unidimensional CFA model, and find "good" fit, then this supports the idea that the measure is valid. (We'll set aside for the moment the controversy surrounding what constitutes "good" fit of a latent variable model.)
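In lavaan syntax, this line of reasoning amounts to fitting a one-factor CFA and checking global fit. A minimal sketch (the item names and the data frame mydata here are hypothetical placeholders, not from the paper in question):

require(lavaan)
#hypothetical: five items presumed to measure a single attribute
model <- 'attribute =~ item1 + item2 + item3 + item4 + item5'
#fit <- cfa(model, data = mydata)
#fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
#"good" fit is then taken as evidence that the measure is valid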

Now I don't want to pick on the paper I reviewed too much here - this is a line of reasoning that I suspect a lot of psychologists explicitly or implicitly follow when fitting latent variable models (or measurement models, anyway). I've certainly published conventional psychometric papers that are at least indirectly based on this line of reasoning (example). But the more I think about it, the more it seems to me that this line of reasoning just doesn't work at all.

Why? The problem is the auxiliary hypothesis of conditional independence.

When we're examining the validity of a set of items as a measure of some attribute, we will typically have a substantive hypothesis that variation in the attribute causes variation in the item responses. This is fine. The problem is that this hypothesis is only testable in conjunction with the auxiliary hypothesis that, controlling for the effects of the latent attribute, the item responses are uncorrelated with each other (the assumption of conditional independence). At most, we might be able to free some of these error correlations; we cannot allow all of them to be freely estimated, or the model will not be identified.
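To make the identification point concrete, here is a minimal lavaan sketch (again with hypothetical item names and a placeholder data frame): the ~~ operator frees an individual error covariance, but freeing all of them at once is impossible.

#With five indicators, the sample covariance matrix supplies only
#5*(5+1)/2 = 15 unique elements, while a one-factor model with all
#ten error covariances free would need 20 free parameters
#(4 loadings + 1 factor variance + 5 error variances + 10 error
#covariances): more parameters than data points, so not identified.
model.partial <- '
  attribute =~ item1 + item2 + item3 + item4 + item5
  item1 ~~ item2   #free a single error covariance
'
#fit.partial <- cfa(model.partial, data = mydata)   #mydata: placeholder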

Problematically, the assumption of conditional independence (also called local independence) is typically not part of the substantive hypothesis we are testing: if variation in the attribute causes variation in the measurement outcomes, then the measure is valid, regardless of whether conditional independence holds. There are occasional cases where we are genuinely interested in trying to explain the correlations between item scores - e.g. the g explanation for the positive manifold among IQ tests - but for the most part, conditional independence is just an assumption we make for convenience, not part of the substantive theory. In Popper's terms, conditional independence is an auxiliary hypothesis.

Importantly, conditional independence is also an auxiliary hypothesis that typically isn't very plausible: for a pair of items, it means assuming that the response to one item has exactly zero direct effect on the response to the other, and that, aside from the specified latent variable, there exists no other variable whatsoever that has a direct effect on both items.
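As a toy illustration of what the assumption asserts, here is a sketch in base R (the effect sizes are arbitrary): two items driven only by a shared latent variable are marginally correlated, but become uncorrelated once the latent variable is partialled out.

set.seed(1)
eta <- rnorm(10000)              #the latent variable
y1 <- 0.5 * eta + rnorm(10000)   #item 1: latent effect plus unique error
y2 <- 0.5 * eta + rnorm(10000)   #item 2: latent effect plus unique error
cor(y1, y2)                      #marginal correlation, roughly 0.2
#partialling out the latent variable leaves ~zero residual correlation
cor(resid(lm(y1 ~ eta)), resid(lm(y2 ~ eta)))

Conditional independence says this residual correlation is exactly zero for every pair of items - a strong claim when items share wording, method, and context.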

What this all means is that if a hypothesised latent variable model doesn't fit the data well, it could be because the test isn't a valid measure of the attribute - but it could also be that the test is valid and the assumption of conditional independence simply doesn't hold: in other words, the items have relationships with one another that aren't perfectly explained by shared effects of the latent variable.

To some extent, I suspect researchers are aware of this: it might be part of the reason why most researchers use fairly lax standards for judging the fit of latent variable models, and why many researchers are reasonably open to post-hoc modifications that try to account for problematic error correlations (a sketch of that workflow follows).
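That post-hoc workflow typically looks something like the following sketch (it reuses mod1 and data1 from the Scenario 1 simulation below, so run that code first; which error covariance to free is whatever the diagnostics happen to flag):

resid(mod1, type = "cor")   #residual correlations: where is the misfit?
modindices(mod1)            #modification indices for candidate extra parameters
#then re-fit, freeing the most problematic error covariance, e.g.:
mod1b <- cfa('latent.var =~ V1 + V2 + V3 + V4 + V5
              V1 ~~ V2', data = data1)
summary(mod1b, fit.measures = TRUE)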

But what I think is less widely appreciated is that breaches of conditional independence can also lead to the opposite problem: a finding that a latent variable model fits "well", with significant positive loadings of the items on the latent variable, despite the latent variable actually having no effect on any of the items. For a unidimensional model, this can occur when the error correlations are homogeneous but the latent variable has no true effect: equal positive correlations between all the items (a compound-symmetric correlation matrix) are exactly the structure implied by a one-factor model with equal loadings, so the model can fit perfectly even though the "factor" reflects nothing but correlated errors.

The R simulations below demonstrate examples of both cases.
require(lavaan)
require(MASS)
 
#Scenario 1: Latent variable does affect observed outcomes
#but lack of conditional independence means model fits poorly
 
  set.seed(123) #for replicability
  latent.var = rnorm(1000, 0, 1) #Standard normal latent variable
 
  #In the population, error correlations vary between 0 and 0.3 in size
  Sigma1 = matrix(runif(25, 0, 0.3), ncol = 5)
  #mirror the lower triangle so Sigma1 is symmetric
  #(mvrnorm expects a valid, symmetric covariance matrix)
  Sigma1[upper.tri(Sigma1)] <- t(Sigma1)[upper.tri(Sigma1)]
  diag(Sigma1) <- rep(1, times = 5) 
  errors1 = mvrnorm(n = 1000, mu = rep(0, times = 5), Sigma = Sigma1)
 
  #The latent variable has true effect of beta = 0.5 on all items
  data1 = as.data.frame(apply(errors1, 2, FUN = function(x){
    x+latent.var*0.5})) 
 
  #fit a unidimensional latent variable model to the data
  #assuming conditional independence
  mod1 = cfa('latent.var =~ V1 + V2 + V3 + V4 + V5', data = data1) 
  summary(mod1, fit.measures = TRUE) 
  #The model fits poorly per the chi-square and RMSEA
  #yet the latent variable does have positive effects 
  #on the observed outcomes
  #I.e., the observed measure IS valid
  #yet the latent variable model doesn't fit 
  #due to the lack of conditional independence.
 
 
#Scenario 2: No effects of latent variable on observed outcomes
#but lack of conditional independence means
#model fits well (one latent, five indicators)
 
  set.seed(123) #for replicability
 
  #There is a standard normal latent variable
  latent.var = rnorm(1000, 0, 1) 
 
  #In the population, the error correlation matrix is homogeneous 
  #with all error correlations equalling 0.3
  Sigma2 = matrix(rep(0.3, times = 25), ncol = 5)
  diag(Sigma2) <- rep(1, times = 5) 
  errors2 = mvrnorm(n = 1000, mu = rep(0, times = 5), Sigma = Sigma2)
 
  #The latent variable has no effect on any of the variables. 
  #(so observed variables are just the errors)
  data2 = as.data.frame(apply(errors2, 2, FUN = function(x){
    x+latent.var*0})) 
 
  #fit a unidimensional latent variable model to the data
  #assuming conditional independence
  mod2 = cfa('latent.var =~ V1 + V2 + V3 + V4 + V5', data = data2) 
  summary(mod2, fit.measures = TRUE) 
 
  #The model fits extremely well by any measure, 
  #and all the estimated effects of the latent variable on observed 
  #variables are positive and significant. 
  #Yet in reality the latent variable does not have a causal effect 
  #on observed outcomes; the measure is not valid.
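  #Why? A compound-symmetric correlation matrix (all inter-item
  #correlations equal to 0.3) is exactly the structure implied by a
  #one-factor model with equal standardised loadings of sqrt(0.3),
  #about 0.55. The estimated standardised loadings should all sit
  #near that value (sample estimates will wobble with the seed):
  standardizedSolution(mod2)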



Does this mean that latent variable modelling has no place in psychometric validation research? Probably not. But I certainly think we need to be more aware that the statistical models we're testing when we estimate latent variable models can be very different from the substantive hypotheses we're trying to test. When conditional independence is an assumption of convenience, rather than part of the substantive theory we want to test, the fit of a latent variable model (whether good or poor) probably doesn't tell us an awful lot.