Wednesday, May 4, 2016

Is good fit of a latent variable model positive evidence for measurement validity?

This week I reviewed a paper attempting to explain why latent variable modelling is useful for working out whether a measure is valid. (By latent variable modelling I mean either exploratory or confirmatory factor analysis.) The author drew on Borsboom, Mellenbergh and van Heerden's theory of validity.

In Borsboom et al's theory, a test is a valid measure of an attribute if:
 1) The attribute exists
 2) Variation in the attribute causes variation in the measurement outcomes.

Therefore, the author suggested that latent variable models - which specify unobserved "latent" variables that have causal effects on observed indicators - are useful for testing the validity of a measure. For example, if you hypothesise that a test is a valid measure of a specific individual attribute, fit a unidimensional CFA model, and find "good" fit, then this supports the idea that the measure is valid. (We'll set aside for the moment the controversy surrounding what constitutes "good" fit of a latent variable model.)

Now I don't want to pick on the paper I reviewed too much here - this is a line of reasoning that I suspect a lot of psychologists explicitly or implicitly follow when fitting latent variable models (or measurement models anyway). I've certainly published conventional psychometric papers that are at least indirectly based on this line of reasoning (example). But the more I think about it, the more it seems to me that this line of reasoning just doesn't work at all.

Why? The problem is the auxiliary hypothesis of conditional independence.

When we're examining the validity of a set of items as a measure of some attribute, we will typically have a substantive hypothesis that variation in the attribute causes variation in the item responses. This is fine. The problem is that this hypothesis is only testable in conjunction with the hypothesis that, controlling for the effects of the latent attribute, the item responses are uncorrelated with each other (the assumption of conditional independence). At most, we might be able to free some of these error correlations, but we cannot allow all of them to be freely estimated, otherwise the model will be unidentifiable.
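
This identification point is easy to check by counting free parameters against the distinct variances and covariances the data supply. A back-of-envelope sketch in R (plain arithmetic, not a fitted model):

  p = 5 #number of observed indicators
  moments = p*(p + 1)/2 #15 distinct variances and covariances
  #Under conditional independence (factor variance fixed to 1),
  #we estimate p loadings and p error variances:
  params.ci = p + p #10 parameters, so df = 5 and the model is testable
  #Freeing every error covariance adds choose(p, 2) parameters:
  params.free = p + p + choose(p, 2) #20 parameters > 15 moments,
                                     #so the model is not identified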

Problematically, the assumption of conditional independence (also called local independence) is typically not part of the substantive hypothesis we are testing - if variation in the attribute causes variation in the measurement outcomes, then the measure is valid, regardless of whether conditional independence holds. There are occasional cases where we are genuinely interested in trying to explain correlations between item scores - e.g., the g explanation for the positive manifold of IQ tests - but for the most part, conditional independence is just an assumption we make for convenience, not a part of the substantive theory. In Popper's terms, it is an auxiliary hypothesis.

Importantly, conditional independence is also an auxiliary hypothesis that typically isn't very plausible: for any pair of items, it requires that responses to one item have exactly zero effect on responses to the other, and that aside from the specified latent variable, no other variable whatsoever has a direct effect on both items.
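
In matrix terms, conditional independence says the model-implied covariance matrix is the outer product of the loadings plus a diagonal error covariance matrix, so every observed covariance is attributed entirely to the latent variable. A minimal sketch in R, with made-up loadings and error variances (the numbers are purely illustrative):

  lambda = rep(0.5, times = 5) #hypothetical loadings
  theta = rep(0.75, times = 5) #hypothetical error variances
  #Conditional independence: the error covariance matrix is diagonal,
  #so every off-diagonal element of Sigma is due to the latent variable
  Sigma.implied = lambda %*% t(lambda) + diag(theta)
  Sigma.implied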

What this all means is that if an hypothesised latent variable model doesn't fit the data well, it could be because the test isn't a valid measure of the attribute, but it could also be that the test is valid and the assumption of conditional independence simply doesn't hold: in other words, the items have relationships with one another that aren't perfectly explained by shared effects of the latent variable.

To some extent, I suspect researchers are aware of this: it might be part of the reason why most researchers use fairly lax standards for assessing the fit of latent variable models, and why many are reasonably open to post-hoc modifications that try to account for problematic error correlations (Scenario 1 below ends with a sketch of what such tinkering can look like).

But what I think is less widely appreciated is that breaches of conditional independence can also lead to the opposite problem: a finding that a latent variable model fits "well", with significant positive loadings of the latent variable on the items, despite the latent variable actually having no effect on any of them. For a unidimensional model, this can occur when the error correlations are homogeneous but the latent variable has no true effect.

I have attached simulations below demonstrating examples of both cases.
require(lavaan)
require(MASS)
 
#Scenario 1: Latent variable does affect observed outcomes
#but lack of conditional independence means model fits poorly
 
  set.seed(123) #for replicability
  latent.var = rnorm(1000, 0, 1) #Standard normal latent variable
 
  #In the population, error correlations vary between 0 and 0.3 in size
  Sigma1 = matrix(runif(25, 0, 0.3), ncol = 5)
  #Mirror the lower triangle so Sigma1 is symmetric, as mvrnorm expects
  #(a raw matrix of uniform draws is not symmetric)
  Sigma1[upper.tri(Sigma1)] = t(Sigma1)[upper.tri(Sigma1)]
  diag(Sigma1) <- rep(1, times = 5) 
  errors1 = mvrnorm(n = 1000, mu = rep(0, times = 5), Sigma = Sigma1)
 
  #The latent variable has true effect of beta = 0.5 on all items
  data1 = as.data.frame(apply(errors1, 2, FUN = function(x){
    x+latent.var*0.5})) 
 
  #fit a unidimensional latent variable model to the data
  #assuming conditional independence
  mod1 = cfa('latent.var =~ V1 + V2 + V3 + V4 + V5', data = data1) 
  summary(mod1, fit.measures = TRUE) 
  #The model fits poorly per the chi-square and RMSEA
  #yet the latent variable does have positive effects 
  #on the observed outcomes
  #I.e., the observed measure IS valid
  #yet the latent variable model doesn't fit 
  #due to the lack of conditional independence.
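 
#An aside: the post-hoc tinkering mentioned earlier might look like
#the following. This is only a sketch: the error correlation freed
#below (V1 ~~ V2) is chosen purely for illustration.
  mi = modindices(mod1)
  mi[mi$op == "~~" & mi$lhs != mi$rhs, ] #candidate error correlations
  mod1b = cfa('latent.var =~ V1 + V2 + V3 + V4 + V5
               V1 ~~ V2', data = data1)
  summary(mod1b, fit.measures = TRUE)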
 
 
#Scenario 2: No effects of latent variable on observed outcomes
#but lack of conditional independence means
#model fits well (one latent, five indicators)
 
  set.seed(123) #for replicability
 
  #There is a standard normal latent variable
  latent.var = rnorm(1000, 0, 1) 
 
  #In the population, the error correlation matrix is homogeneous 
  #with all error correlations equalling 0.3
  Sigma2 = matrix(rep(0.3, times = 25), ncol = 5)
  diag(Sigma2) <- rep(1, times = 5) 
  errors2 = mvrnorm(n = 1000, mu = rep(0, times = 5), Sigma = Sigma2)
 
  #The latent variable has no effect on any of the variables. 
  #(so observed variables are just the errors)
  data2 = as.data.frame(apply(errors2, 2, FUN = function(x){
    x+latent.var*0})) 
 
  #fit a unidimensional latent variable model to the data
  #assuming conditional independence
  mod2 = cfa('latent.var =~ V1 + V2 + V3 + V4 + V5', data = data2) 
  summary(mod2, fit.measures = TRUE) 
 
  #The model fits extremely well by any measure, 
  #and all the estimated effects of the latent variable on observed 
  #variables are positive and significant. 
  #Yet in reality the latent variable does not have a causal effect 
  #on observed outcomes; the measure is not valid.



Does this mean that latent variable modelling has no place in psychometric validation research? Probably not. But certainly I think we need to be more aware that the statistical models we're testing when we estimate latent variable models can be very different from the substantive hypotheses we're trying to test. When conditional independence is an assumption, rather than a part of the substantive theory we want to test, the fit of a latent variable model (whether good or poor) probably doesn't tell us an awful lot.

4 comments:

  1. The covariance structure implied by the 1-factor model with equal loadings is that of equal covariances. Of course, such data can also be thought of as generated by a completely unstructured (saturated) model, where all covariances just happen to be the same. But there is no parsimony in this representation. More formally and generally, if Sigma = LL' + Psi, where Psi is diagonal, it's obviously also true that an alternative representation is Sigma = Psi*, where Psi* is not diagonal. We don't need simulations to show that if data are generated from Sigma = Psi*, where Psi* just happens to meet the constraints of a 1-factor model, then the 1-factor model will fit the data.

    1. Hi Vika, thanks for your comment. Yes, you're absolutely right that the simulation isn't strictly necessary. I just think that simulation can be a nice tool for demonstrating an idea and/or checking intuitions. (Especially for folks like me whose mathematical ability isn't wonderfully strong.)

      You're also obviously right that a saturated model is not parsimonious.

      However, the goal in this context is not to find a model that provides a parsimonious representation of the data. The goal is to test an hypothesis that, in reality, there exists a specific latent attribute that has causal effects on the observed indicators. That is, our hypothesis is that the measure is valid according to Borsboom's theory of validity (which takes a realist ontological stance towards latent variables).

      For a specific test, the hypothesis above may be false - i.e., the unobserved attribute that the test is intended to measure does not actually have causal effects on the measurement outcomes, or does not itself exist at all. And yet the error terms - the variance in the observed outcomes not explained by the hypothesised latent variable - may still have a homogeneous covariance structure, or another covariance structure that results in good apparent fit of the estimated model. Thus we might find "good" fit despite the test being invalid.

    2. Hi Matt, the test is not invalid. There is no test for the existence of a latent variable. The test is on the consequence of that existence, namely a particular hypothesized model-implied covariance structure. If the model is true, it implies a certain covariance structure on the data, and this consequence can be tested. If the test rejects the hypothesized covariance structure, the latent variable model is called into question. But if the test shows the covariance structure is consistent with the model, it of course does not follow that the model is true, only that it's one of the possible representations of the data, and it happens to be a particularly parsimonious one. Even within the class of structured models, there are numerous equivalent models to a given model that will produce identical statistical fit.

    3. Hi again Vika. I think maybe there is some confusion here arising from dual meanings of the word "test". My post and comment above are not about the validity of statistical tests of goodness of fit for latent variable models. What I'm addressing is the measurement validity of psychometric tests.

      An hypothesis that a test is a valid measure of an attribute (given Borsboom's definition of validity, discussed in the post) does not actually imply a certain covariance structure of the data. We only get to the point of being able to propose a testable covariance structure by adding additional assumptions about the covariances of the errors (e.g., conditional independence).

      Hence a statistical test of goodness of fit may be a trustworthy measure of the accuracy of the statistical model specified, but not of the substantive hypothesis we actually want to test. (That substantive hypothesis being that the psychometric test is a valid measure of the intended attribute).
