Why Test Score Interpretation Actually Matters for Your Career
You've just administered a cognitive test to a client. She scored a 75. Now what? Does that mean she answered 75% correctly? That she's the 75th best performer? That her IQ is 75? Without knowing how to interpret test scores, you're essentially reading tea leaves instead of providing professional psychological services.
Here's the reality: Every time you write a report, talk to a family about their child's testing results, or make a treatment recommendation, you're translating numbers into meaningful information that affects people's lives. A parent might hear "16th percentile" and panic, thinking their child got only 16% of the items correct. Your job is to know exactly what that number means and communicate it clearly. This isn't just exam material—it's the foundation of ethical psychological practice.
The Two Big Families of Test Scores
Think about how we evaluate performance in everyday life. When you get your annual performance review at work, your boss might say one of two things: "You sold more than 90% of people on your team" (comparing you to others) or "You hit 95% of your sales targets" (comparing you to a standard). These represent the two main ways psychologists interpret test scores: norm-referenced and criterion-referenced approaches.
Norm-referenced scores tell you how someone performed compared to a reference group. It's like checking your marathon time against thousands of other runners. Criterion-referenced scores tell you how someone performed against a specific standard or goal. It's like checking whether you finished the marathon under 4 hours, regardless of what anyone else did.
Both approaches have their place. Norm-referenced scores help you understand where someone falls in the distribution of abilities. Criterion-referenced scores help you determine whether someone has mastered specific content or skills.
Norm-Referenced Scores: The Art of Comparison
Percentile Ranks: Everyone's Favorite (and Most Misunderstood)
A percentile rank tells you what percentage of people in the reference group scored at or below a particular score. If your client scores at the 82nd percentile, that means 82% of people in the comparison group scored the same or lower.
Here's where people mess up constantly: A percentile rank is NOT the same as a percentage correct. Your client could answer only 40% of questions correctly but still be at the 82nd percentile if everyone else in the reference group also struggled with the test. Similarly, answering 90% correctly doesn't automatically put you at the 90th percentile—maybe everyone else also got 90% or higher.
Think of percentile ranks like waiting in line for concert tickets. If you're at the 82nd percentile, you're ahead of 82% of people in line. Your position in line (percentile) has nothing to do with how long you've been waiting (percentage correct)—it only matters how many people are behind you.
A crucial technical detail: Converting raw scores to percentile ranks creates what's called a "nonlinear transformation." This happens because the percentile distribution is forced to be flat (rectangular): roughly 1% of scores goes into each percentile rank slot. The top 1% of scores is assigned to the 99th percentile, the next 1% to the 98th, and so on down to the 1st. This forced distribution means that percentile ranks don't preserve the intervals between scores that existed in the original data: raw-score differences near the mean get stretched apart, while differences in the tails get compressed.
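The "at or below" definition can be coded directly. This is a minimal sketch with a made-up norm group; the function name and data are illustrative, not from any published test:

```python
def percentile_rank(score, norm_group):
    """Percent of the norm group scoring at or below `score`."""
    at_or_below = sum(1 for s in norm_group if s <= score)
    return 100 * at_or_below / len(norm_group)

# Hypothetical norm-group raw scores
norm_group = [52, 55, 58, 60, 60, 61, 63, 65, 70, 88]

print(percentile_rank(60, norm_group))  # 50.0 -- half the group scored 60 or lower
print(percentile_rank(70, norm_group))  # 90.0
```

Notice the nonlinearity: moving from a raw 60 to 61 changes the percentile rank as much as moving from 70 all the way to 88, because scores cluster near the middle and thin out in the tails.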
Standard Scores: Speaking the Language of Standard Deviations
Standard scores express performance in terms of standard deviations from the mean. Unlike percentile ranks, converting raw scores to standard scores creates a "linear transformation"—the shape of the distribution stays the same, just with different numbers on the axis. Imagine taking a photo and resizing it; the proportions stay identical.
Let's meet the standard score family:
| Score Type | Mean | Standard Deviation | Range |
|---|---|---|---|
| Z-score | 0 | 1.0 | Usually -3 to +3 |
| T-score | 50 | 10 | Usually 20 to 80 |
| IQ score (Wechsler/SB-5) | 100 | 15 | Usually 55 to 145 |
| Stanine | 5 | 2 | 1 to 9 |
Z-Scores: The Foundation
Z-scores are the most straightforward. A z-score directly tells you how many standard deviations away from the mean someone scored. The formula is simple:
z = (X – M) / SD
Where X is the person's raw score, M is the mean, and SD is the standard deviation.
Let's say your client scores 110 on a test where the mean is 100 and the standard deviation is 5. Their z-score would be: (110 – 100) / 5 = +2.0. They scored exactly two standard deviations above the mean.
Think of standard deviations like zones on a map spreading out from a central point. The mean is downtown, and each standard deviation is another neighborhood farther from the center. A z-score tells you which neighborhood someone's score lives in.
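The formula above is simple enough to sketch in a few lines of code, reusing the example numbers from this section (test mean 100, SD 5, raw score 110):

```python
def z_score(x, mean, sd):
    """How many standard deviations a raw score falls from the group mean."""
    return (x - mean) / sd

print(z_score(110, 100, 5))   # 2.0 -- two SDs above the mean
print(z_score(90, 100, 5))    # -2.0 -- two SDs below the mean
```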
T-Scores: Adding Comfort
Many psychologists prefer T-scores because they eliminate negative numbers and decimals. With a mean of 50 and standard deviation of 10, T-scores feel more intuitive. A T-score of 40 means "one standard deviation below the mean" but looks less intimidating than z = -1.0.
This is especially helpful when explaining results to clients. Saying "Your score was 40, compared to an average of 50" sounds less alarming than "Your score was negative one."
IQ Scores: The Most Famous
Full-scale IQ scores on major intelligence tests (like the Wechsler scales and Stanford-Binet) have a mean of 100 and standard deviation of 15. This standardization happened decades ago and has stuck. An IQ of 85 means someone scored one standard deviation below the mean. An IQ of 130 means two standard deviations above.
Stanines: Chunking Information
Stanines (short for "standard nine") range from 1 to 9 with a mean of 5 and standard deviation of 2. What makes stanines unique is that each level (except the extremes of 1 and 9) represents half a standard deviation. A stanine of 5 captures scores from .25 SD below the mean to .25 SD above the mean.
Stanines work like a simplified rating system. Instead of dealing with precise scores, you're grouping performance into nine broad categories. It's similar to how streaming services might rate movies from 1 to 5 stars—you lose some precision but gain simplicity.
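The half-SD bands described above can be expressed as cut points on the z-scale. This is a sketch under the standard stanine boundaries (stanine 5 spans z = -0.25 to +0.25, with each band a half SD wide):

```python
def stanine(z):
    """Map a z-score to a stanine (1-9) using half-SD bands."""
    boundaries = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]
    s = 1
    for b in boundaries:
        if z > b:
            s += 1  # each boundary crossed moves up one stanine
    return s

print(stanine(0.0))   # 5 -- the mean sits in the middle band
print(stanine(1.0))   # 7
print(stanine(-2.5))  # 1 -- extremes are open-ended
```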
The Critical Conversions You Must Memorize
For the EPPP, you need to quickly convert between different standard scores and know their percentile equivalents in a normal distribution. Here's the cheat sheet that should become automatic:
| Standard Deviations | Z-Score | T-Score | IQ Score | Percentile Rank |
|---|---|---|---|---|
| +2 SD | +2.0 | 70 | 130 | 98 |
| +1 SD | +1.0 | 60 | 115 | 84 |
| Mean | 0 | 50 | 100 | 50 |
| -1 SD | -1.0 | 40 | 85 | 16 |
| -2 SD | -2.0 | 30 | 70 | 2 |
Notice the pattern in percentile ranks: In a normal distribution, approximately 68% of scores fall within one standard deviation of the mean (between the 16th and 84th percentiles), and about 95% fall within two standard deviations (between the 2nd and 98th percentiles).
Here's a memory trick: The percentile ranks follow a pattern of 2, 16, 50, 84, 98. Notice that 16 and 84 add up to 100, as do 2 and 98. The distribution is symmetrical.
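Because every standard score is just a linear rescaling of z, the whole cheat sheet collapses into one small function. This sketch uses Python's standard-library normal distribution for the percentile column (which assumes a normal distribution, as the table does):

```python
from statistics import NormalDist

def from_z(z):
    """Convert a z-score into the other common standard-score metrics."""
    return {
        "T": 50 + 10 * z,                         # T-score: M = 50, SD = 10
        "IQ": 100 + 15 * z,                       # Wechsler/SB-5: M = 100, SD = 15
        "percentile": 100 * NormalDist().cdf(z),  # normal-curve area at or below z
    }

scores = from_z(1.0)
print(round(scores["T"]), round(scores["IQ"]), round(scores["percentile"]))
# 60 115 84 -- the +1 SD row of the table
```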
Criterion-Referenced Scores: Meeting a Standard
Percentage Scores: Straightforward Mastery
Percentage scores tell you what portion of the test content someone answered correctly. If a test has 150 questions and your client answers 75 correctly, they achieved 50%. Simple.
These scores shine when you're assessing mastery of specific content. Imagine creating a test to determine whether therapy trainees can correctly identify suicide risk factors. You might decide that correctly identifying 80% or more means they've mastered this critical knowledge. That 80% becomes your cutoff score—the minimum threshold for passing.
This approach is common in licensing exams (including parts of psychology licensure) and educational settings. The question isn't "How did you do compared to others?" but rather "Did you demonstrate sufficient competence?"
Expectancy Tables: Predicting Future Performance
Expectancy tables take a different approach to criterion-referenced interpretation. They show you the probability of different outcomes based on a test score. These tables come from validity studies that examine the relationship between a predictor (like a test) and a criterion (like job performance).
Let's say a company conducted a study where they gave current employees a cognitive ability test and also collected their performance ratings. They might create an expectancy table that looks like this:
| Test Score Range | Below Average Performance | Average Performance | Above Average Performance |
|---|---|---|---|
| 90-100 | 10% | 35% | 55% |
| 80-89 | 25% | 45% | 30% |
| 70-79 | 45% | 40% | 15% |
Now when a job applicant scores between 90-100, you can tell them (or the hiring manager) that based on past data, there's a 55% chance they'll be an above-average performer, a 35% chance they'll be average, and only a 10% chance they'll be below average.
This is more honest than a regression equation that predicts a single outcome. Instead, expectancy tables acknowledge uncertainty. They're like checking weather forecasts—you get probabilities rather than absolute predictions. "There's a 70% chance of rain" helps you decide whether to carry an umbrella better than a definitive but potentially wrong "It will rain."
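An expectancy table is essentially a lookup from score band to outcome probabilities. Here is a minimal sketch using the hypothetical table above (the bands, labels, and probabilities are the example's invented data):

```python
# Hypothetical expectancy table: score band -> probability of each outcome
EXPECTANCY = {
    (90, 100): {"below": 0.10, "average": 0.35, "above": 0.55},
    (80, 89):  {"below": 0.25, "average": 0.45, "above": 0.30},
    (70, 79):  {"below": 0.45, "average": 0.40, "above": 0.15},
}

def outcome_probabilities(score):
    """Probabilities of each performance outcome for a given test score."""
    for (low, high), probs in EXPECTANCY.items():
        if low <= score <= high:
            return probs
    return None  # score falls outside the ranges studied

print(outcome_probabilities(93))
# {'below': 0.1, 'average': 0.35, 'above': 0.55}
```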
Using Test Scores for Selection Decisions
When organizations use tests to make hiring or admission decisions, they typically employ one of three methods: cutoff scores, ranking, or banding.
Cutoff Scores: The Pass/Fail Approach
With cutoff scores, anyone scoring at or above the cutoff gets selected (or moves forward in the process). This method works well when you need to ensure a minimum level of competence.
One way to set a cutoff is to link test scores to actual performance. For example, if you know that people who score below 75 on your test almost always perform poorly on the job, you'd set 75 as your cutoff. Anyone scoring 75 or higher continues in the hiring process.
Ranking: Top-Down Selection
Ranking is like elite college admissions. You rank all candidates from highest to lowest based on test scores, then select from the top down until you've filled all available positions. If you need to hire 10 people and 100 apply, you select the top 10 scorers.
This method maximizes the average test performance of your selected group but can be controversial because it treats small score differences as meaningful. Is someone who scored 87 really a better candidate than someone who scored 86?
Banding: Acknowledging Measurement Error
Banding represents a more nuanced approach. Instead of treating every score as precise, banding groups scores into ranges based on the test's standard error of measurement. Scores within the same band are considered statistically equivalent—the differences between them are likely due to measurement error rather than true differences in ability.
Here's how it works: If a test has a standard error of measurement of 5 points, you might create five-point bands such as 91-95, 86-90, 81-85, and so on. All candidates scoring 91-95 are treated as equivalent. You'd consider all candidates in the highest band before moving to the next band down.
Within each band, you can use other factors (experience, interview performance, specific skills) to make selection decisions. This approach acknowledges that tests aren't perfect measuring tools—they're thermometers that might be off by a degree or two.
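A fixed (non-sliding) banding scheme can be sketched as follows. The band width here is taken as roughly the standard error of measurement; real systems often derive it differently (e.g., 1.96 × SEM × √2 for a 95% confidence band), so treat this as illustrative:

```python
def make_bands(scores, top_score, band_width):
    """Group scores into fixed bands of `band_width` points, working down from the top."""
    bands = []
    high = top_score
    remaining = sorted(scores, reverse=True)
    while remaining:
        low = high - band_width + 1
        band = [s for s in remaining if low <= s <= high]
        if band:
            bands.append((low, high, band))
        remaining = [s for s in remaining if s < low]
        high = low - 1
    return bands

applicants = [95, 93, 91, 88, 84, 79, 78]
for low, high, members in make_bands(applicants, 95, 5):
    print(f"{low}-{high}: {members}")
# 91-95: [95, 93, 91]
# 86-90: [88]
# 81-85: [84]
# 76-80: [79, 78]
```

Everyone inside a band is treated as tied on the test, so the hiring decision within the 91-95 band would turn on other factors like interviews or experience.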
Advocates argue that banding can reduce adverse impact on minority groups who may score lower on average on certain tests. By treating similar scores as equivalent and allowing other factors to enter the decision, banding can increase diversity without abandoning merit-based selection. However, this approach remains somewhat controversial in personnel psychology.
Common Misconceptions That Trip Up Even Smart Students
Misconception 1: Percentile ranks and percentage scores are the same thing. They're completely different. A percentile rank compares you to others; a percentage score tells you how much content you got right. You could get 60% correct and be at the 90th percentile if the test was extremely difficult for everyone.
Misconception 2: The distance between percentile ranks is uniform. Wrong. Percentile ranks are squeezed together near the mean and spread out at the extremes. The difference between the 50th and 55th percentiles represents far fewer raw score points than the difference between the 94th and 99th percentiles. This happens because scores cluster around the mean in a normal distribution.
Misconception 3: Standard scores can only be calculated for normally distributed data. Not true. You can calculate standard scores for any distribution, and the resulting distribution will have the same shape as the original. Separately, psychologists sometimes apply a nonlinear "normalizing" transformation to force scores into a normal shape, but this is appropriate only when the sample is large and representative and the departure from normality is thought to reflect the test or sample rather than the underlying trait.
Misconception 4: A criterion-referenced test doesn't need norms. While criterion-referenced tests focus on mastery of content rather than comparison to others, norms can still be useful. Knowing that 90% of qualified professionals score above 80% on your mastery test helps you set appropriate cutoff scores.
Misconception 5: Higher standard deviations mean better performance. The standard deviation is just a measure of spread—how much scores vary around the mean. A high standard deviation means scores are spread out; a low standard deviation means they're clustered together. Neither is inherently better or worse.
Practice Tips for Making This Stick
Create a conversion anchor chart. Write out the table with z-scores, T-scores, IQ scores, and percentile ranks. Keep it visible while studying. Quiz yourself: "If someone is at the 84th percentile, what's their z-score?" (Answer: +1.0) "What IQ score corresponds to that?" (Answer: 115)
Use the formula in practice. Don't just memorize z = (X – M) / SD. Actually calculate z-scores from scratch with different examples. Make up your own scenarios: "The mean reaction time is 500ms with an SD of 50ms. My client's reaction time was 425ms. What's their z-score?" Work through it: (425 – 500) / 50 = -1.5.
Draw the normal curve. Sketch it out and mark the percentile ranks at -2 SD (2nd percentile), -1 SD (16th percentile), mean (50th percentile), +1 SD (84th percentile), and +2 SD (98th percentile). This visual becomes automatic with repetition.
Connect to real clinical examples. When studying case vignettes for the EPPP, translate the scores mentioned. "The client scored at the 2nd percentile on processing speed" should immediately trigger: "That's -2 SD, a z-score of -2.0, which corresponds to a standard score of 70 on that index."
Remember the percentile pattern: 2, 16, 50, 84, 98. These correspond to -2, -1, 0, +1, and +2 standard deviations. The middle 68% of scores fall between the 16th and 84th percentiles. The middle 95% fall between the 2nd and 98th percentiles.
For banding, focus on the concept. You probably won't need to calculate bands on the exam, but you should understand that banding groups similar scores together based on measurement error, allowing consideration of factors beyond test scores within bands.
Key Takeaways
- Norm-referenced scores compare performance to a reference group; criterion-referenced scores compare performance to a standard or predict outcomes
- Percentile ranks indicate the percentage who scored at or below a given score; they are NOT the same as percentage correct
- Converting to percentile ranks creates a nonlinear transformation (flat distribution); converting to standard scores creates a linear transformation (preserves shape)
- Z-scores have M = 0, SD = 1.0; calculated as z = (X – M) / SD
- T-scores have M = 50, SD = 10
- IQ scores (Wechsler/Stanford-Binet) have M = 100, SD = 15
- Stanines have M = 5, SD = 2, range from 1 to 9
- In a normal distribution, memorize these percentile-standard deviation pairs: 2nd percentile = -2 SD, 16th percentile = -1 SD, 50th percentile = mean, 84th percentile = +1 SD, 98th percentile = +2 SD
- Expectancy tables show probabilities of different criterion outcomes based on predictor scores
- Banding groups similar test scores together based on measurement error, treating scores within bands as equivalent
- Selection methods include cutoff scores (minimum threshold), ranking (top-down selection), and banding (grouping scores within error ranges)
Understanding test score interpretation isn't just about passing the EPPP—it's about being able to look at any test result and immediately know what it means for your client's life. Master these conversions, understand the logic behind different score types, and you'll be able to confidently interpret and explain psychological test results throughout your career.
