Why This Matters: The Foundation of Trust in Testing
You've studied reliability, and you know that a test needs to produce consistent scores. But here's the thing: a bathroom scale could show you as five pounds heavier than you really are every single day. That's consistent, but it isn't measuring your weight correctly. That scale is reliable but not valid.
This is exactly why validity matters in psychological testing. When you're sitting across from a client, making recommendations about their career path, their mental health treatment, or their ability to parent their children, you need to know that the tests you're using actually measure what they claim to measure. Validity is about accuracy and truth, not just consistency.
The EPPP wants you to understand three main types of validity: content validity, construct validity, and criterion-related validity. In this lesson, we're focusing on the first two. These concepts will show up on your exam, and more importantly, they'll shape how you evaluate and select tests throughout your career.
The Modern Understanding of Validity
Here's something important: the field has evolved in how it talks about validity. The traditional view treated content, construct, and criterion-related validity as three separate types. The current Standards for Educational and Psychological Testing (2014) describes validity as one unified concept, defined as "the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests."
Think about what that means. We're not just asking "Is this test valid?" We're asking "Is this test valid for this specific purpose, with this specific population, in this specific context?" It's a more nuanced question.
The Standards identify five sources of validity evidence:
- Evidence based on test content
- Evidence based on response processes
- Evidence based on internal structure
- Evidence based on relationships with other variables
- Evidence based on consequences of testing
For the EPPP, you still need to know the traditional three types because that's the language the exam uses. The good news? Those three types fit right into these five sources of evidence.
Content Validity: Does the Test Cover What It Should?
Content validity is straightforward: it's about whether a test actually samples the domain it claims to measure. This is especially important for achievement tests, job knowledge tests, and work samples.
{{M}}Imagine you're hiring a new therapist for your practice, and you want to test their knowledge of cognitive-behavioral therapy. You create a 50-question test, but all the questions are about Beck's cognitive therapy for depression. Nothing about behavioral activation, exposure therapy, cognitive restructuring for anxiety, or any CBT applications beyond depression. Even if your test is perfectly reliable, it doesn't have good content validity because it's not representative of the full domain of CBT knowledge.{{/M}}
How Content Validity Is Established
Content validity isn't about statistics or correlation coefficients. It's established during test development through a systematic process:
Step 1: Clearly Define the Domain. You need to know exactly what you're trying to measure. If you're creating a test of parenting knowledge, what does "parenting knowledge" include? Infant care? Discipline strategies? Developmental milestones? Safety practices? All of the above?
Step 2: Sample Representatively. Once the domain is defined, test items must represent all important aspects of that domain proportionally. If developmental milestones make up 30% of essential parenting knowledge, then roughly 30% of your test items should cover that topic.
Step 3: Expert Review. Subject matter experts systematically review the items to ensure they're appropriate, accurate, and comprehensive. These experts verify that the test covers all critical aspects of the domain and doesn't over-emphasize some areas while neglecting others.
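The proportional sampling in Step 2 can be sketched as a small test blueprint. This is only an illustration: the content areas and percentage weights below are hypothetical, and `allocate_items` is a made-up helper name, not part of any standard tool.

```python
def allocate_items(weights, total_items):
    """Split total_items across content areas in proportion to their weights."""
    total_weight = sum(weights.values())
    exact = {area: total_items * w / total_weight for area, w in weights.items()}
    # Start from the whole-number part of each area's share.
    counts = {area: int(share) for area, share in exact.items()}
    # Give any leftover items to the areas with the largest fractional parts.
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(exact, key=lambda a: exact[a] - counts[a], reverse=True)
    for area in by_remainder[:leftover]:
        counts[area] += 1
    return counts

# Hypothetical domain weights (percent of "essential parenting knowledge").
blueprint = {
    "developmental milestones": 30,
    "discipline strategies": 25,
    "infant care": 25,
    "safety practices": 20,
}

item_counts = allocate_items(blueprint, total_items=50)
for area, n in item_counts.items():
    print(f"{area}: {n} items")
```

In practice the weights themselves would come from the domain definition and expert judgment in Steps 1 and 3; the code only handles the arithmetic.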
Content Validity vs. Face Validity
Students often confuse content validity with face validity, so let's clear this up now.
Face validity refers to whether a test looks valid to the people taking it. It's not actual validity at all. It's about appearances and perceptions.
Sometimes face validity is helpful. {{M}}If you're administering a career interest inventory to college students, and the questions seem relevant and professional, they're more likely to take it seriously and answer thoughtfully. But if the questions seem random or silly, they might rush through it without genuine reflection.{{/M}}
Sometimes face validity is actually undesirable. Tests measuring socially sensitive topics like honesty, criminal thinking, or prejudice often work better when they're not face valid. If it's obvious what you're measuring, people can fake their responses to look better.
Here's a simple way to remember the difference:
- Content validity: Does the test actually cover the domain comprehensively? (Expert judgment, systematic)
- Face validity: Does the test look like it covers the domain? (Test-taker perception, superficial)
Construct Validity: Measuring What You Can't See
Now we move into trickier territory. Construct validity is essential when you're measuring hypothetical traits: things you can't directly observe but must infer from behavior.
What are these constructs? Intelligence, personality traits, motivation, depression, anxiety, self-esteem, locus of control. You can't directly measure any of these things the way you measure height or blood pressure. Instead, you observe behaviors, responses, and patterns, then make inferences about the underlying construct.
{{M}}Think about how you know someone is your friend. You can't see "friendship" directly. Instead, you observe that they text you regularly, remember your birthday, support you during tough times, and enjoy spending time with you. From these observable behaviors, you infer the existence of friendship. Psychological constructs work the same way.{{/M}}
The Challenge of Validation
Because you can't directly observe constructs, proving that a test measures them is complex. You need to gather multiple types of evidence. Two critical pieces of this evidence are convergent validity and divergent validity.
Convergent Validity: Scores on your test should correlate highly with scores on other measures of the same or similar constructs. If you develop a new depression measure, it should correlate strongly with established depression measures. If it doesn't, something's wrong.
Divergent Validity (also called discriminant validity): Scores on your test should have low correlations with measures of unrelated constructs. Your new depression measure shouldn't correlate highly with a test of spatial reasoning ability, because depression and spatial reasoning aren't related. If they do correlate strongly, your depression test might be measuring something else entirely.
Together, convergent and divergent validity help you build a case that your test measures what you think it measures and not something else.
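The convergent and divergent checks described above both come down to correlation coefficients. Here is a minimal sketch, assuming Pearson correlations and using made-up scores for six examinees; the variable names and data are hypothetical and exist only to show the pattern you would hope to see.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores (illustration only, not real data).
new_depression = [10, 14, 18, 22, 26, 30]          # the measure being validated
established_depression = [12, 15, 17, 24, 25, 33]  # same construct
spatial_reasoning = [52, 47, 55, 49, 53, 50]       # unrelated construct

r_convergent = pearson_r(new_depression, established_depression)
r_divergent = pearson_r(new_depression, spatial_reasoning)

print(f"Convergent r (should be high): {r_convergent:.2f}")
print(f"Divergent r (should be near zero): {r_divergent:.2f}")
```

Six examinees is far too few for a real validation study; the point is only that convergent evidence shows up as a high coefficient and divergent evidence as a low one.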
The Multitrait-Multimethod Matrix: A Powerful Tool
One of the most elegant ways to assess construct validity is the multitrait-multimethod matrix, developed by Campbell and Fiske. Yes, it sounds complicated, but the logic is actually straightforward.
The basic idea: You assess your construct using multiple methods (self-report, observer ratings, behavioral tasks, etc.) and you also include measures of different constructs using those same methods. Then you compare all the correlations.
Breaking Down the Matrix
Let's work through a concrete example. Suppose you've developed a new self-report test to measure sociability in middle-school students. You know from research that sociability is unrelated to impulsivity. To validate your test, you administer four measures:
- Your new self-report sociability test (the one you're validating)
- A teacher report sociability test (same trait, different method)
- A self-report impulsivity test (different trait, same method)
- A teacher report impulsivity test (different trait, different method)
Now you correlate all pairs of scores. Four specific correlations matter for validation:
The Four Key Coefficients
| Coefficient Type | What It Compares | What You Want | What It Tells You |
|---|---|---|---|
| Monotrait-Monomethod | Same trait, same method | High | Reliability of your test |
| Monotrait-Heteromethod | Same trait, different method | High | Convergent validity |
| Heterotrait-Monomethod | Different trait, same method | Low | Divergent validity |
| Heterotrait-Heteromethod | Different trait, different method | Low | Divergent validity |
Monotrait-Monomethod (same trait, same method): This is actually a reliability coefficient for your test, something like coefficient alpha or test-retest reliability. It should be high.
Monotrait-Heteromethod (same trait, different method): This is the correlation between your self-report sociability test and the teacher-report sociability test. A high correlation here provides evidence of convergent validity. Both methods are measuring sociability, and they agree.
Heterotrait-Monomethod (different trait, same method): This is the correlation between your self-report sociability test and the self-report impulsivity test. A low correlation here provides evidence of divergent validity. Even though both use the same method (self-report), they're measuring different things, so they shouldn't correlate much. A high correlation here would suggest the scores are being driven by the shared method rather than by the trait.
Heterotrait-Heteromethod (different trait, different method): This is the correlation between your self-report sociability test and the teacher-report impulsivity test. This should also be low, providing more evidence of divergent validity. Different traits measured in different ways shouldn't correlate.
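To make the four labels concrete, the sketch below tags each correlation by comparing the trait and method of the two measures involved. The measure names and every correlation value are hypothetical; only the labelling logic follows the definitions above.

```python
# Each measure is described by (trait, method).
MEASURES = {
    "self_sociability": ("sociability", "self-report"),
    "teacher_sociability": ("sociability", "teacher-report"),
    "self_impulsivity": ("impulsivity", "self-report"),
    "teacher_impulsivity": ("impulsivity", "teacher-report"),
}

def coefficient_type(measure_a, measure_b):
    """Name the multitrait-multimethod coefficient for a pair of measures."""
    trait_a, method_a = MEASURES[measure_a]
    trait_b, method_b = MEASURES[measure_b]
    trait = "monotrait" if trait_a == trait_b else "heterotrait"
    method = "monomethod" if method_a == method_b else "heteromethod"
    return f"{trait}-{method}"

# Hypothetical correlations of the new self-report sociability test with each
# measure. The first entry stands in for the test's own reliability coefficient.
observed = {
    "self_sociability": 0.88,
    "teacher_sociability": 0.62,
    "self_impulsivity": 0.21,
    "teacher_impulsivity": 0.08,
}

labelled = {
    coefficient_type("self_sociability", other): r for other, r in observed.items()
}

# The convergent coefficient should exceed both divergent coefficients.
supports_construct_validity = (
    labelled["monotrait-heteromethod"] > labelled["heterotrait-monomethod"]
    and labelled["monotrait-heteromethod"] > labelled["heterotrait-heteromethod"]
)

for name, r in labelled.items():
    print(f"{name}: {r:.2f}")
print("Pattern supports construct validity:", supports_construct_validity)
```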
What Makes the Matrix Powerful
The matrix is elegant because it separates trait variance (what you actually want to measure) from method variance (artificial similarity created by using the same assessment method). {{M}}It's like having multiple witnesses describe the same event. If all the witnesses who saw the event from the north side tell similar stories, that's convergent evidence. If their stories don't match what the witnesses on the south side saw (a different event), that's divergent evidence. And if two witnesses on the north side tell wildly different stories, you know something's wrong with at least one of their accounts.{{/M}}
Factor Analysis: Unveiling Hidden Structure
Factor analysis is a statistical technique used for multiple purposes, including assessing construct validity. It's more complex than the multitrait-multimethod matrix, but for the EPPP, you mainly need to understand what the output means.
The Basic Process
Factor analysis involves four main steps:
- Administration: Give your test to a sample of people, along with tests of related and unrelated constructs
- Correlation: Calculate correlations between all pairs of tests and create a correlation matrix
- Factor Extraction: Use those correlations to derive factors (underlying dimensions that explain the pattern of correlations)
- Rotation: Rotate the factors to make them easier to interpret, then interpret and name them
The rotation step might sound odd, but it's necessary because the initial factor solution is often mathematically sound but psychologically meaningless. Rotation redistributes the variance to create factors that make conceptual sense.
Reading a Rotated Factor Matrix
Let's look at a concrete example. Suppose you've developed a new test of locus of control (Test A). You administer it along with two established locus of control tests (Tests B and C) and three self-esteem tests (Tests D, E, and F). Research shows that locus of control and self-esteem aren't correlated, so this is a good way to test divergent validity.
After running the factor analysis, you get this rotated factor matrix:
| Test | Factor I | Factor II | Communality |
|---|---|---|---|
| Test A (LOC) | .80 | .10 | .65 |
| Test B (LOC) | .85 | .12 | .74 |
| Test C (LOC) | .76 | .15 | .60 |
| Test D (SE) | .13 | .85 | .74 |
| Test E (SE) | .14 | .76 | .60 |
| Test F (SE) | .25 | .70 | .55 |
Understanding Factor Loadings
The numbers under Factor I and Factor II are factor loadings, which are essentially correlation coefficients between each test and each factor. They range from -1.00 to +1.00, just like regular correlations.
To interpret a factor loading, square it to see how much variability in the test is explained by that factor:
- Test A's loading on Factor I is .80, which squared is .64 or 64%
- Test A's loading on Factor II is .10, which squared is .01 or 1%
So Factor I explains 64% of the variance in Test A, while Factor II explains only 1%.
Understanding Communality
The communality (rightmost column) tells you the total proportion of variance in each test explained by all the factors combined. For Test A, the communality is .65, meaning 65% of the variance in Test A scores is explained by Factors I and II together.
If communality isn't provided, you can calculate it (when factors are orthogonal/uncorrelated) by squaring each loading and adding them up:
- Test A: (.80)² + (.10)² = .64 + .01 = .65
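The same arithmetic can be run for every test at once. The sketch below uses the loadings from the rotated factor matrix and assumes orthogonal factors, as stated above; the function names are illustrative only, and results may differ from a published table in the last digit because of rounding.

```python
# Rotated loadings from the factor matrix: (Factor I, Factor II).
loadings = {
    "Test A": (0.80, 0.10),
    "Test B": (0.85, 0.12),
    "Test C": (0.76, 0.15),
    "Test D": (0.13, 0.85),
    "Test E": (0.14, 0.76),
    "Test F": (0.25, 0.70),
}

def variance_explained(loading):
    """Proportion of a test's variance explained by one factor."""
    return loading ** 2

def communality(test_loadings):
    """Sum of squared loadings; only valid when the factors are orthogonal."""
    return sum(variance_explained(l) for l in test_loadings)

communalities = {test: communality(ls) for test, ls in loadings.items()}

for test, (f1, f2) in loadings.items():
    print(
        f"{test}: Factor I explains {variance_explained(f1):.0%}, "
        f"Factor II explains {variance_explained(f2):.0%}, "
        f"communality = {communalities[test]:.2f}"
    )
```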
Interpreting the Factors
Here's where you become a detective. Look at the pattern of loadings:
- Tests A, B, and C (all locus of control tests) load highly on Factor I and minimally on Factor II
- Tests D, E, and F (all self-esteem tests) load highly on Factor II and minimally on Factor I
The pattern is clear: Factor I is "Locus of Control" and Factor II is "Self-Esteem."
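Reading the pattern of loadings can also be sketched in code. The example below flags the factors on which each test loads "saliently." The .40 cutoff is an assumption chosen for illustration, not a rule from this lesson, and the names are hypothetical.

```python
SALIENT_CUTOFF = 0.40  # assumed threshold for a "high" loading (illustrative)

loadings = {
    "Test A (LOC)": {"Factor I": 0.80, "Factor II": 0.10},
    "Test B (LOC)": {"Factor I": 0.85, "Factor II": 0.12},
    "Test C (LOC)": {"Factor I": 0.76, "Factor II": 0.15},
    "Test D (SE)": {"Factor I": 0.13, "Factor II": 0.85},
    "Test E (SE)": {"Factor I": 0.14, "Factor II": 0.76},
    "Test F (SE)": {"Factor I": 0.25, "Factor II": 0.70},
}

def salient_factors(test_loadings, cutoff=SALIENT_CUTOFF):
    """Return the factors on which a test's absolute loading meets the cutoff."""
    return [factor for factor, l in test_loadings.items() if abs(l) >= cutoff]

assignment = {test: salient_factors(ls) for test, ls in loadings.items()}

# A clean pattern means every test loads saliently on exactly one factor.
clean_pattern = all(len(factors) == 1 for factors in assignment.values())

for test, factors in assignment.items():
    print(f"{test}: {', '.join(factors)}")
print("Each test loads on exactly one factor:", clean_pattern)
```

Naming the factors ("Locus of Control," "Self-Esteem") is still a judgment call based on what the tests in each cluster have in common; the code only groups them.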
What This Tells You About Validity
For your new Test A, this provides strong evidence of both convergent and divergent validity:
- Convergent validity: Test A loads highly (.80) on the same factor as other locus of control tests
- Divergent validity: Test A loads minimally (.10) on the self-esteem factor, showing it's not confusing the two unrelated constructs
{{M}}Think of factor analysis like organizing your music library. You might notice that certain songs always get played together. They cluster. Some songs cluster with other jazz tracks, some with rock, some with classical. The factors are like genres: hidden organizing principles that explain why certain things go together. If your new "jazz" song clusters with other jazz songs and not with death metal, you've got good evidence it's actually jazz.{{/M}}
Common Misconceptions to Avoid
Misconception 1: "Content validity is just about making sure test items look relevant."
Reality: Content validity is systematic and expert-driven, not superficial. It requires clearly defining the domain, proportionally sampling all aspects of it, and having experts verify coverage. It's not about appearances; it's about comprehensive, representative sampling.
Misconception 2: "If a test has high reliability, it must be valid."
Reality: Reliability is necessary but not sufficient for validity. That bathroom scale that consistently shows you five pounds heavier is reliable but not valid. High reliability means consistent measurement; validity means accurate measurement of the intended construct.
Misconception 3: "Face validity is the same as content validity."
Reality: Face validity is about perception and appearance; content validity is about actual, systematic domain coverage. Face validity isn't even real validity. It's just about whether test-takers think the test looks appropriate.
Misconception 4: "Construct validity is established by a single study showing convergent validity."
Reality: Construct validity is built through accumulated evidence from multiple studies using various methods. You need both convergent validity (correlating with what you should correlate with) and divergent validity (not correlating with what you shouldn't correlate with). One study showing your depression test correlates with another depression test is a good start, but it's not sufficient.
Misconception 5: "In a factor analysis, higher communality always means a test is better."
Reality: Communality tells you how much variance is explained by the extracted factors, but it doesn't tell you if those are the right factors. A test could have high communality but still load on the wrong factor, showing poor construct validity.
Practical Applications in Your Future Career
Understanding content and construct validity isn't just about passing the EPPP. It shapes your practice:
Selecting Assessment Tools: When you're choosing a test for clinical, educational, or organizational use, you'll evaluate its validity evidence. Is there evidence that this ADHD screening tool samples the full range of ADHD symptoms (content validity)? Is there evidence that scores correlate with clinical diagnosis and don't just reflect general anxiety (construct validity)?
Interpreting Test Results: Knowing a test's validity limitations helps you interpret scores cautiously. If a measure has weak divergent validity, you know to be careful about making strong interpretations.
Communicating with Clients and Courts: When you write reports or provide testimony, you may need to explain why you selected certain tests. "This measure has strong content validity, with systematic expert review ensuring comprehensive coverage of the relevant skill domain" is more compelling than "I like this test."
Developing New Measures: If you ever create questionnaires, protocols, or measures for research or practice, you'll use these principles to ensure they're valid.
Practice Tips for Remembering These Concepts
For Content Validity:
- Associate "content" with "comprehensive coverage", like a comprehensive exam that covers all course content
- Remember: Content validity = Expert review + Representative sampling
- Think of achievement tests and job samples as prime examples
For Construct Validity:
- Remember it's for hypothetical traits you can't directly see
- Use the phrase "Converge with friends, diverge from strangers" to remember that tests should correlate with similar measures and not correlate with different measures
For the Multitrait-Multimethod Matrix:
Use a two-letter shorthand for the four coefficients (trait first, then method):
- M-M: Monotrait-Monomethod (reliability; should be high)
- M-H: Monotrait-Heteromethod (convergent validity; should be high)
- H-M: Heterotrait-Monomethod (divergent validity; should be low)
- H-H: Heterotrait-Heteromethod (divergent validity; should be low)
The pattern: Same traits should correlate high, different traits should correlate low.
For Factor Analysis:
- Factor loadings = correlations between tests and factors
- Communality = total variance explained (think "common" = "combined")
- Square loadings to get percentage of variance explained
- Tests measuring the same construct should load on the same factor
Creating a Quick Study Table:
| Validity Type | What It Assesses | How It's Established | Key Example |
|---|---|---|---|
| Content | Domain coverage | Expert review, systematic sampling | Achievement tests, work samples |
| Construct | Hypothetical traits | Convergent + divergent validity | Intelligence, personality, depression |
Key Takeaways
- Validity is about accuracy: Reliability means consistency; validity means actually measuring what you intend to measure.
- Content validity is established through systematic domain definition, representative sampling, and expert review. It's crucial for achievement tests and work samples.
- Face validity is not real validity. It's just whether a test looks appropriate to examinees. Sometimes helpful, sometimes undesirable.
- Construct validity is essential for measuring hypothetical traits that can't be directly observed. It requires accumulated evidence.
- Convergent validity: Your test should correlate highly with other measures of the same construct.
- Divergent/discriminant validity: Your test should correlate minimally with measures of unrelated constructs.
- Multitrait-multimethod matrix assesses validity by comparing correlations across different traits and methods:
  - Same trait, same method = reliability (high)
  - Same trait, different method = convergent validity (high)
  - Different trait, same method = divergent validity (low)
  - Different trait, different method = divergent validity (low)
- Factor analysis reveals underlying dimensions:
  - Factor loadings show correlations between tests and factors
  - Square loadings to get percentage of variance explained
  - Communality shows total variance explained by all factors
  - Tests of the same construct should load on the same factor
- The modern view treats validity as a unitary concept with multiple sources of evidence, but the EPPP uses the traditional three-type framework.
- Application matters: These concepts guide test selection, result interpretation, and professional communication throughout your career.
When you encounter validity questions on the EPPP, slow down and identify what type of validity is being asked about. Is it about domain coverage (content)? About measuring a hypothetical trait (construct)? About comparing to a criterion (criterion-related, covered separately)? Understanding the distinctions will help you confidently select the correct answer.
