Why Test Score Interpretation Matters for Your Psychology Career
You've just completed an assessment with a client. They look at you expectantly and ask, "So what does my score mean? Am I doing okay?" This is the moment where your understanding of test score interpretation becomes critical. You can't just say "You got 75 correct." You need to put that number in context, explain what it reveals, and help your client understand their results in a meaningful way.
Test score interpretation isn't just about crunching numbers. It's about translating raw data into insights that can guide treatment, inform hiring decisions, support educational planning, and help people understand themselves better. For the EPPP, you'll need to master two major approaches: norm-referenced scores (comparing someone to others) and criterion-referenced scores (measuring against a standard). Let's break this down.
The Foundation: Raw Scores Need Context
A raw score by itself tells you almost nothing. {{M}}If I told you I ran a race in 8 minutes, you'd immediately ask: "How long was the race? What's a good time? How did others do?"{{/M}} The same principle applies to psychological testing. When a client scores 75 on a test, that number only becomes meaningful when we have a reference point.
This is where standardization comes in. When test developers create a test, they administer it to a large, representative sample of people known as the standardization sample. This process establishes uniform procedures for giving and scoring the test, and it creates norms (comparison data) that let us interpret individual scores.
Norm-Referenced Scores: The Comparison Approach
Norm-referenced scores answer the question: "How did this person do compared to others?" The goal is to make distinctions among individuals based on the trait or ability being measured. There are two main types you'll need to know: percentile ranks and standard scores.
Percentile Ranks: Where Do You Stand?
A percentile rank tells you the percentage of people in the reference group who scored at or below a particular score. If your client receives a percentile rank of 82, it means 82% of people in the comparison group scored the same or lower.
Here's something crucial to understand: percentile ranks use a nonlinear transformation. Every raw score distribution (whether it's bell-shaped, skewed, or lumpy) gets converted into a perfectly flat (rectangular) distribution. Why? Because exactly 1% of scores get assigned to each percentile rank. The highest 1% become percentile rank 100, the next 1% become percentile rank 99, and so on.
{{M}}Think of it like a TV singing competition where contestants get ranked from 1 to 100, regardless of whether the talent pool is evenly distributed or if there's a huge gap between the top two singers.{{/M}} The ranking system forces an even distribution even when the actual talent differences aren't even at all.
This creates an important limitation: the difference between percentile ranks of 50 and 60 doesn't represent the same raw score difference as the gap between percentile ranks of 90 and 100. The scores get squeezed and stretched in the conversion process.
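To see the squeezing and stretching in action, here is a minimal Python sketch using the "at or below" definition given above. The `reference` list is a made-up, positively skewed comparison group (not real norm data), and `percentile_rank` is just an illustrative helper name.

```python
def percentile_rank(score, reference_scores):
    """Percentage of the reference group scoring at or below `score`."""
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100 * at_or_below / len(reference_scores)

# Hypothetical, positively skewed reference group of 20 raw scores
reference = [12, 14, 15, 15, 16, 16, 17, 17, 18, 18,
             19, 19, 20, 21, 22, 24, 27, 31, 38, 50]

for raw in (18, 19, 31, 50):
    print(f"raw score {raw} -> percentile rank {percentile_rank(raw, reference):.0f}")
```

In this invented group, moving from percentile rank 50 to 60 takes a single raw point (18 to 19), while moving from 90 to 100 takes 19 raw points (31 to 50).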
Standard Scores: Speaking the Language of Standard Deviations
Standard scores tell you how far someone's score falls from the average, measured in standard deviations. Unlike percentile ranks, converting raw scores to standard scores is a linear transformation, so the shape of the distribution stays the same. If your raw scores form a bell curve, your standard scores will too.
Let's look at the four types of standard scores you need to know:
Z-Scores: The Foundation
Z-scores have a mean of 0 and a standard deviation of 1.0. They express scores directly in standard deviation units. A z-score of -1.0 means the person scored one standard deviation below average. A z-score of +2.0 means two standard deviations above average.
The formula is straightforward: z = (X – M)/SD
Where X is the person's raw score, M is the mean, and SD is the standard deviation.
Let's say your client scores 110 on a test where the average is 100 and the standard deviation is 5. Their z-score would be: (110 – 100)/5 = +2.0.
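If it helps to check your arithmetic, the formula can be sketched in a few lines of Python. The function name `z_score` is only an illustrative choice; the numbers are the ones from the example above.

```python
def z_score(x, mean, sd):
    """Distance of raw score x from the mean, in standard deviation units."""
    return (x - mean) / sd

print(z_score(110, 100, 5))  # the client above: 2.0
print(z_score(95, 100, 5))   # a score one SD below the mean: -1.0
```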
T-Scores: Avoiding Negative Numbers
T-scores have a mean of 50 and a standard deviation of 10. They're essentially z-scores that have been rescaled to avoid negative numbers and decimals, which can confuse clients. A T-score of 40 means one standard deviation below average. A T-score of 70 means two standard deviations above average.
IQ Scores: The Most Recognized Standard
Full-scale IQ scores on tests like the Wechsler scales and Stanford-Binet have a mean of 100 and a standard deviation of 15. An IQ of 85 means one standard deviation below average. An IQ of 130 means two standard deviations above average.
Stanines: Dividing Into Ninths
Stanines (short for "standard nine") range from 1 to 9, with a mean of 5 and a standard deviation of 2. Stanines 2 through 8 each span half a standard deviation, while the extremes (1 and 9) are open-ended and take in everything more than 1.75 standard deviations below or above the mean. A stanine of 5 includes scores from .25 standard deviations below to .25 standard deviations above the mean.
The Critical Table: Standard Score Equivalents
Here's the table you absolutely must memorize for the EPPP. It shows how different standard scores relate to each other and to percentile ranks in a normal distribution:
| Standard Deviations | Z-Score | T-Score | IQ Score | Stanine | Percentile Rank |
|---|---|---|---|---|---|
| -2 SD | -2.0 | 30 | 70 | 1 | 2 |
| -1 SD | -1.0 | 40 | 85 | 3 | 16 |
| Mean | 0 | 50 | 100 | 5 | 50 |
| +1 SD | +1.0 | 60 | 115 | 7 | 84 |
| +2 SD | +2.0 | 70 | 130 | 9 | 98 |
Notice the pattern for percentile ranks: 2, 16, 50, 84, 98. These correspond to -2 SD, -1 SD, mean, +1 SD, and +2 SD. Committing this to memory will help you answer multiple EPPP questions.
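One way to convince yourself the table hangs together is to rebuild it from z-scores. The sketch below assumes a normal distribution and uses Python's standard-library `statistics.NormalDist` for the percentile column; the helper name `equivalents` and the rounding choices are assumptions made for illustration, not an official conversion procedure.

```python
import math
from statistics import NormalDist

def equivalents(z):
    """Convert a z-score to the other standard scores, assuming normality."""
    stanine = math.floor(2 * z + 5.5)      # half-SD-wide bands centered on 5
    stanine = min(9, max(1, stanine))      # stanines 1 and 9 are open-ended
    return {
        "z": z,
        "T": 50 + 10 * z,
        "IQ": 100 + 15 * z,
        "stanine": stanine,
        "percentile": round(NormalDist().cdf(z) * 100),
    }

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(equivalents(z))
```

Each printed row should match the corresponding row of the table, including the 2, 16, 50, 84, 98 pattern.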
A Note on Normalization
Sometimes test developers want to transform a non-normal distribution into a normal one. This is called normalization and involves a nonlinear transformation. However, this is typically only done when the raw score distribution already approximates a normal curve. You're just smoothing out minor irregularities, not forcing wildly skewed data into a bell shape.
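For the curious, here is a rough sketch of what a nonlinear normalizing step can look like: find each score's cumulative proportion in the reference group, then look up the z-score that cuts off the same proportion of a normal curve. This is one common textbook-style approach, not necessarily what any particular test publisher does. The midpoint convention for ties and the name `normalized_z` are assumptions, and the function is only meant for scores that actually occur in the reference group.

```python
from statistics import NormalDist

def normalized_z(score, reference_scores):
    """Normalized z-score for a score that appears in the reference group."""
    n = len(reference_scores)
    below = sum(1 for s in reference_scores if s < score)
    equal = sum(1 for s in reference_scores if s == score)
    # Midpoint convention (an assumption): count half of the tied scores
    proportion = (below + 0.5 * equal) / n
    return NormalDist().inv_cdf(proportion)

reference = [1, 2, 3, 4, 5]  # hypothetical raw scores
for raw in reference:
    print(raw, round(normalized_z(raw, reference), 2))
```

Because the z-scores come from ranks rather than from the formula z = (X – M)/SD, the spacing between scores can change, which is what makes the transformation nonlinear.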
Criterion-Referenced Scores: Measuring Against a Standard
While norm-referenced scores compare people to each other, criterion-referenced scores evaluate whether someone has achieved a specific level of competence or mastery. The question shifts from "How do you compare to others?" to "Can you do what needs to be done?"
Percentage Scores: The Mastery Approach
Percentage scores simply indicate what proportion of test content the person answered correctly. Answer 75 out of 150 questions correctly, and you get 50%.
{{M}}This is like getting a driver's license. You need to demonstrate you can perform the required skills, regardless of how well others drive.{{/M}} When organizations use percentage scores for decision-making, they typically establish a cutoff score (like 80% correct) to determine who has achieved mastery.
This approach makes the most sense when you have a clearly defined content domain and you want to know if someone has learned it. {{M}}If you're training new therapists in suicide risk assessment, you care whether they can accurately identify warning signs, not whether they're in the top 20% of trainees.{{/M}}
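The mastery logic is simple enough to sketch directly. The 80% cutoff below is just the example figure mentioned above, and both function names are illustrative.

```python
def percentage_score(correct, total):
    """Percent of items answered correctly."""
    return 100 * correct / total

def has_mastery(correct, total, cutoff_pct=80):
    """True if the percentage score meets or exceeds the cutoff."""
    return percentage_score(correct, total) >= cutoff_pct

print(percentage_score(75, 150))  # 50.0
print(has_mastery(75, 150))       # False
print(has_mastery(120, 150))      # True (exactly 80%)
```

Notice that nothing in this calculation depends on how anyone else performed.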
Expectancy Tables: Predicting Future Performance
Expectancy tables bridge the gap between test scores and real-world outcomes. Instead of predicting a single outcome score (like a regression equation does), an expectancy table shows the distribution of outcomes for people at different score levels.
Here's how they work in practice: Suppose a company validates a selection test by giving it to current employees along with performance evaluations. They can then create a table showing what percentage of people at different test score ranges achieved different performance levels.
For example:
| Test Score Range | Below Average Performance | Average Performance | Above Average Performance |
|---|---|---|---|
| 90-100 | 10% | 35% | 55% |
| 80-89 | 20% | 45% | 35% |
| 70-79 | 40% | 40% | 20% |
| 60-69 | 60% | 30% | 10% |
{{M}}This gives hiring managers a realistic picture, like checking restaurant reviews before visiting. You see the full range of experiences, not just an average rating.{{/M}} A job applicant who scores 95 still has a 10% chance of underperforming, but their odds are much better than someone who scores 65.
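If you want to see where the percentages in a table like this come from, the sketch below builds a tiny expectancy table from validation records. The `records` data, the band boundaries, and the function name are all hypothetical and chosen only to keep the example short; a real table would rest on a much larger sample.

```python
from collections import Counter

BANDS = [(90, 100), (80, 89), (70, 79), (60, 69)]
LEVELS = ["below", "average", "above"]

# Hypothetical validation data: (test score, performance rating)
records = [(95, "above"), (92, "above"), (97, "average"), (90, "below"),
           (65, "below"), (62, "below"), (68, "average"), (60, "above")]

def expectancy_table(records):
    """Percent of people at each performance level, per test score band."""
    counts = {band: Counter() for band in BANDS}
    for score, level in records:
        for low, high in BANDS:
            if low <= score <= high:
                counts[(low, high)][level] += 1
                break
    table = {}
    for band, c in counts.items():
        total = sum(c.values())
        if total:  # skip bands with no observations
            table[band] = {lvl: round(100 * c[lvl] / total) for lvl in LEVELS}
    return table

for band, row in expectancy_table(records).items():
    print(band, row)
```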
Using Test Scores for Selection Decisions
When organizations use tests for hiring, promotions, or admissions, they need a strategy for making decisions. Three common approaches are cutoff scores, ranking, and banding.
Cutoff Scores: Pass or Fail
With a cutoff approach, you set a minimum score and select everyone at or above that threshold. One method for determining the cutoff is to link test scores to criterion scores (like job performance ratings) and use the test score associated with the lowest acceptable performance as your cutoff.
{{M}}It's similar to setting a minimum GPA for graduate school admission. It's a clear, objective line in the sand.{{/M}} The challenge is setting the cutoff at the right level. Too high, and you exclude qualified candidates; too low, and you admit people who'll struggle.
Ranking: Top-Down Selection
With ranking, you order all candidates from highest to lowest score and select from the top until you've filled all positions. This maximizes the average test score of selected candidates, but it has drawbacks.
The main issue is that it treats all score differences as meaningful, even tiny ones that might just reflect measurement error. {{M}}If two candidates score 87 and 86, ranking treats the one-point difference as significant, even though on a different day their scores might reverse due to normal fluctuation.{{/M}}
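To make the contrast between these first two strategies concrete, here is a small sketch. The candidate names, scores, and the cutoff of 80 are invented for illustration.

```python
# Hypothetical applicant pool: name -> test score
candidates = {"Avery": 91, "Blake": 87, "Casey": 86, "Devon": 78, "Ellis": 72}

def select_by_cutoff(scores, cutoff):
    """Everyone at or above the cutoff is selected."""
    return [name for name, s in scores.items() if s >= cutoff]

def select_top_down(scores, openings):
    """Rank from highest to lowest and fill the openings from the top."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:openings]

print(select_by_cutoff(candidates, 80))  # ['Avery', 'Blake', 'Casey']
print(select_top_down(candidates, 2))    # ['Avery', 'Blake']
```

With two openings, top-down ranking picks Blake over Casey on the strength of a single point, which is exactly the kind of difference that may reflect nothing more than measurement error.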
Banding: Accounting for Measurement Error
Banding groups scores into ranges based on the test's standard error of measurement. All scores within a band are considered statistically equivalent. You can't confidently say one is truly higher than another.
Here's how it works: The band containing the highest scores gets considered first. All candidates within that band are evaluated using additional criteria like experience, interviews, or interpersonal skills. Only after that band is exhausted do you move to the next band down.
The rationale is sound: Small score differences often reflect test unreliability rather than true ability differences. {{M}}Two candidates who score 85 and 88 might actually have identical true abilities, with the three-point difference just reflecting the noise inherent in any measurement.{{/M}}
Advocates particularly note that banding can reduce adverse impact. Because bands often include members of groups that tend to receive lower test scores on average, these candidates remain in consideration alongside others in the same band. Selection then happens based on factors unrelated to group membership, potentially increasing diversity without sacrificing test validity.
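Here is a rough sketch of how a top band might be computed. The text above only says that bands are based on the standard error of measurement. The specific recipe used here (SEM = SD × √(1 − reliability), then a band width of 1.96 times the standard error of the difference between two scores) is one commonly described convention and should be read as an assumption, as should the made-up SD, reliability, and candidate scores.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement."""
    return sd * math.sqrt(1 - reliability)

def band_width(sd, reliability, z=1.96):
    """Width of a band, based on the standard error of the difference."""
    sed = sem(sd, reliability) * math.sqrt(2)
    return z * sed

def top_band(scores, width):
    """Candidates whose scores fall within `width` of the highest score."""
    top = max(scores.values())
    return sorted(name for name, s in scores.items() if s >= top - width)

# Hypothetical test (SD = 10, reliability = .90) and applicant pool
candidates = {"Avery": 91, "Blake": 88, "Casey": 85, "Devon": 80, "Ellis": 72}
width = band_width(sd=10, reliability=0.90)

print(round(width, 2))
print(top_band(candidates, width))
```

Under these assumptions, Avery, Blake, and Casey land in the same top band, so the choice among them would rest on the additional criteria rather than on their test scores.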
Common Misconceptions Students Have
Misconception 1: "Percentile ranks and percentage scores are the same thing."
No. A percentile rank of 75 means that 75% of people in the comparison group scored at or below your score. A percentage score of 75 means you answered 75% of questions correctly. These are completely different metrics. You could answer 90% of questions correctly but still have a percentile rank of 50 if half of the comparison group scored higher than you did.
Misconception 2: "The distance between any two percentile ranks represents the same difference in ability."
Not true. Because percentile ranks use a nonlinear transformation, the raw score difference between percentile ranks 50 and 60 is typically smaller than the difference between percentile ranks 90 and 95, even though it spans twice as many percentile points. Most scores cluster around the middle of a normal distribution, so small raw score changes create larger percentile rank changes in that region.
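You can check this with the normal curve itself. The sketch below assumes a normal distribution and uses the standard library's `NormalDist.inv_cdf` to convert percentile ranks back into z-scores; `z_for_percentile` is an illustrative helper name.

```python
from statistics import NormalDist

def z_for_percentile(pr):
    """z-score that cuts off the lowest `pr` percent of a normal curve."""
    return NormalDist().inv_cdf(pr / 100)

gap_middle = z_for_percentile(60) - z_for_percentile(50)
gap_tail = z_for_percentile(95) - z_for_percentile(90)

print(f"PR 50 to 60 spans {gap_middle:.2f} SD")  # about 0.25 SD
print(f"PR 90 to 95 spans {gap_tail:.2f} SD")    # about 0.36 SD
```

Ten percentile points in the middle cover less ground than five percentile points in the tail.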
Misconception 3: "Standard scores can only be used with normally distributed data."
False. Standard scores preserve the shape of the raw score distribution through linear transformation. You can calculate z-scores and T-scores for any distribution. Normalization (forcing a distribution to be normal) is a separate, optional step that uses nonlinear transformation.
Misconception 4: "An IQ of 115 is 15% above average."
This confuses standard deviations with percentages. An IQ of 115 is one standard deviation above the mean of 100, which corresponds to approximately the 84th percentile, meaning the person scored higher than about 84% of people, not 15% above average.
Misconception 5: "Criterion-referenced scores don't compare people to each other at all."
While the primary purpose is evaluating mastery of content, criterion-referenced scores can still reveal differences among individuals. If five trainees score 60%, 70%, 80%, 90%, and 95% on a criterion-referenced test with an 80% cutoff, you can see both who passed the mastery threshold and how they compare to each other.
Practice Tips for Remembering This Material
Create a master conversion chart. Write out the table showing standard score equivalents and percentile ranks, then practice filling it in from memory. Do this daily until it's automatic.
Use the "2, 16, 50, 84, 98" percentile rank pattern. These five numbers unlock most EPPP questions about percentile ranks in normal distributions. They correspond to -2 SD, -1 SD, mean, +1 SD, and +2 SD. Notice that moving one standard deviation from the mean changes the percentile rank by 34 points (50 to 84, or 50 to 16).
Remember the means and standard deviations. Make flashcards:
- Z-scores: M = 0, SD = 1
- T-scores: M = 50, SD = 10
- IQ scores: M = 100, SD = 15
- Stanines: M = 5, SD = 2
Distinguish norm vs. criterion by asking the question. Norm-referenced: "Compared to whom?" Criterion-referenced: "Can they do it?"
Practice calculating z-scores. Take any set of numbers, calculate the mean and standard deviation, then convert a few scores to z-scores using the formula: z = (X – M)/SD. This builds intuition about what z-scores represent.
Connect standard scores to each other. If you know one type, you can figure out the others. A z-score of +1.0 equals a T-score of 60 (just multiply by 10 and add 50), which equals an IQ of 115 (multiply by 15 and add 100), which equals approximately stanine 7, which equals approximately the 84th percentile rank.
Understand nonlinear vs. linear transformations intuitively. Linear keeps the shape; nonlinear changes it. Converting raw scores to z-scores keeps the distribution shape. Converting to percentile ranks flattens it. This distinction appears frequently on the EPPP.
For banding, remember it's about measurement error. The key concept is that the standard error of measurement tells us how much uncertainty exists around any score. Banding acknowledges this uncertainty by treating nearby scores as equivalent.
Real-World Applications
Understanding test score interpretation isn't just about passing the EPPP. It's essential for competent practice:
Clinical assessment: When you administer cognitive tests, personality inventories, or symptom measures, you'll need to explain results to clients in understandable terms. Knowing that their T-score of 65 on an anxiety measure means they're scoring higher than about 93% of people helps them understand the significance.
Educational planning: School psychologists use standard scores to determine eligibility for special education services. Understanding that a child with an IQ of 85 is one standard deviation below the mean, at roughly the 16th percentile (not just "15 points below average"), matters for accurate interpretation.
Forensic work: When providing expert testimony about test results, you need to explain what scores mean to judges and juries who have no statistical background. Can you clearly articulate the difference between a percentile rank and a percentage score?
Personnel selection: Industrial-organizational psychologists use these concepts when helping organizations make hiring decisions. Understanding the tradeoffs between cutoff scores, ranking, and banding affects both selection validity and fairness.
Communicating with other professionals: When writing reports or consulting with colleagues, using standard scores appropriately ensures clear communication. Reporting that a client scored "at the 84th percentile with a z-score of +1.0" is more precise than saying they scored "above average."
Key Takeaways
- Raw scores need context. Test standardization with representative samples provides the comparison data that makes scores interpretable.
- Norm-referenced scores compare individuals to a reference group. The two main types are percentile ranks and standard scores.
- Percentile ranks use nonlinear transformation. Every distribution becomes flat (rectangular), with 1% of scores assigned to each percentile rank.
- Standard scores use linear transformation. The distribution shape stays the same. They express performance in standard deviation units from the mean.
- Memorize standard score equivalents: Z (M=0, SD=1), T (M=50, SD=10), IQ (M=100, SD=15), Stanine (M=5, SD=2).
- Remember the percentile rank landmarks: In a normal distribution, percentile ranks of 2, 16, 50, 84, and 98 correspond to -2 SD, -1 SD, mean, +1 SD, and +2 SD respectively.
- Criterion-referenced scores evaluate mastery or competence against a predetermined standard, not comparison to others.
- Percentage scores indicate proportion correct. They're often combined with cutoff scores to determine mastery.
- Expectancy tables show the distribution of criterion outcomes for different predictor score ranges, providing more nuanced prediction than point estimates.
- Three selection approaches exist: Cutoff scores (pass/fail), ranking (top-down selection), and banding (grouping scores within measurement error ranges).
- Banding can reduce adverse impact by treating statistically equivalent scores as interchangeable and allowing selection based on additional factors within bands.
Understanding test score interpretation transforms you from someone who can administer tests into someone who can meaningfully interpret and communicate what those scores reveal. This knowledge serves your clients, your organization, and your professional competence throughout your career.
