Employee Selection – Evaluation of Techniques

3, 5, 6: Organizational Psychology

Why Employee Selection Techniques Matter (And Why You Need to Know This Cold)

Here's the deal: Organizations spend huge amounts of money hiring people. Sometimes they get it right and hire someone amazing. Other times, they hire someone who causes chaos, costs money, and eventually quits or gets fired. The difference? Usually, it's how they evaluated their selection techniques.

For the EPPP, this topic bridges organizational psychology with measurement principles. You'll need to understand not just what makes a good selection tool, but how we determine if it's actually worth using. This isn't abstract theory: understanding these concepts means you could genuinely help a company save hundreds of thousands of dollars or avoid costly discrimination lawsuits.

Let's break down how psychologists evaluate whether a hiring tool (like a test, interview, or work sample) is worth using.

The Four Big Questions Every Selection Technique Must Answer

Before a company uses any new hiring tool, psychologists ask four critical questions:

  1. Is it reliable and valid? (Does it measure something consistently and accurately?)
  2. Does it have incremental validity? (Does it actually improve our hiring decisions?)
  3. Will it cause adverse impact? (Does it unfairly screen out protected groups?)
  4. Does it have good utility? (Is the financial return worth the investment?)

Let's tackle each one.

Reliability and Validity: The Foundation

Reliability: Consistency is Key

Reliability means the selection tool gives you consistent results. {{M}}Think of reliability like your bathroom scale. If you step on it five times in a row and get five wildly different numbers, that scale isn't reliable. It's measuring inconsistently due to random error. But if it gives you the same number each time, you know it's at least consistent.{{/M}}

A reliability coefficient ranges from 0 to 1.0. The closer to 1.0, the less random error is messing with your measurements. Most good selection tools aim for reliability coefficients above .70 or .80.

There are different ways to check reliability:

  • Test-retest: Does the same person get similar scores at different times?
  • Internal consistency: Do different items on the test measure the same thing?
  • Inter-rater: Do different evaluators give similar scores?

Here's the catch: A selection tool can be perfectly reliable but still useless for hiring. {{M}}Your bathroom scale might consistently tell you that you weigh 150 pounds, but if you actually weigh 180 pounds, that scale is reliable but not valid. It's consistently measuring the wrong thing.{{/M}}

Validity: Measuring What Matters

Validity answers the question: "Does this tool actually measure what we think it measures?" For the EPPP, you need to know three types of validity inside and out.

Content Validity: Sampling the Right Stuff

Content validity means the selection tool adequately covers the knowledge or skills needed for the job. You establish content validity by:

  • Basing the tool on a thorough job analysis
  • Having subject matter experts review it

{{M}}Imagine you're hiring a barista, and your selection test only asks about coffee bean origins but nothing about operating an espresso machine, handling customer complaints, or working during a rush. That test has poor content validity. It's missing huge chunks of what the job actually requires.{{/M}}

Content validity is crucial for:

  • Job knowledge tests
  • Work samples
  • Structured interviews based on job requirements

Construct Validity: Measuring the Right Trait

Construct validity means the tool actually measures the psychological trait (construct) it claims to measure. A construct is a hypothetical trait like intelligence, conscientiousness, or emotional stability.

To establish construct validity, you'd show that:

  • Scores correlate highly with other valid measures of the same construct (convergent validity)
  • Scores don't correlate too highly with measures of different constructs (discriminant validity)

{{M}}If you create a test claiming to measure emotional intelligence, but scores on your test correlate more strongly with regular IQ tests than with other emotional intelligence measures, you've got a construct validity problem. Your test might just be measuring general intelligence in disguise.{{/M}}

Construct validity matters most for:

  • Intelligence tests
  • Personality assessments
  • Any tool claiming to measure a specific psychological trait

Criterion-Related Validity: Predicting Performance

This is the big one for employee selection. Criterion-related validity answers: "Do scores on this selection tool predict actual job performance?"

You calculate a validity coefficient by correlating predictor scores (the selection tool) with criterion scores (job performance measures). This coefficient ranges from -1.0 to +1.0:

  • +1.0 = perfect positive relationship (higher test scores = better performance)
  • 0 = no relationship at all
  • -1.0 = perfect negative relationship (higher test scores = worse performance)

In reality, validity coefficients for selection tools usually fall between .20 and .50. Even a coefficient of .30 can be valuable under the right conditions (more on that soon).
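The correlation described above can be sketched in a few lines of Python. This is a minimal illustration, not a full validation study: the function name `validity_coefficient` and the applicant data are made up for the example, and it computes a plain Pearson correlation between predictor and criterion scores.

```python
from math import sqrt


def validity_coefficient(predictor_scores, criterion_scores):
    """Pearson correlation between predictor and criterion scores."""
    n = len(predictor_scores)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")

    mean_x = sum(predictor_scores) / n
    mean_y = sum(criterion_scores) / n

    # Sums of cross-products and squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(predictor_scores, criterion_scores))
    ss_x = sum((x - mean_x) ** 2 for x in predictor_scores)
    ss_y = sum((y - mean_y) ** 2 for y in criterion_scores)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")

    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: selection test scores and later supervisor ratings.
test_scores = [52, 61, 47, 70, 58, 65, 43, 75]
performance_ratings = [3.1, 3.4, 2.9, 3.6, 3.8, 3.0, 2.7, 4.1]

r = validity_coefficient(test_scores, performance_ratings)
print(f"validity coefficient r = {r:.2f}")
```

The result always falls between -1.0 and +1.0, matching the range described above.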

Here's a comparison of the three validity types:

Validity Type | Key Question | Best For | How to Establish
Content | Does it sample the job content? | Job knowledge tests, work samples | Job analysis + expert review
Construct | Does it measure the trait it claims? | Intelligence tests, personality tests | Correlations with other measures
Criterion-Related | Does it predict job performance? | Any predictor when performance prediction matters | Correlate predictor with performance

Incremental Validity: Does Adding This Tool Actually Help?

So you've got a valid selection tool. Great! But here's the million-dollar question: If you add this new tool to your current hiring process, will you actually make better hiring decisions?

Incremental validity is the increase in decision-making accuracy you get from adding a new predictor. Three factors determine whether a new tool will improve your decisions:

Factor 1: The Validity Coefficient

Obviously, a tool with a higher criterion-related validity coefficient will improve decisions more. But even tools with modest validity (around .30) can help under certain conditions.

Factor 2: The Selection Ratio

The selection ratio is calculated as:

Selection Ratio = Number of people you'll hire ÷ Total number of applicants

  • A selection ratio of .10 means you'll hire 1 out of every 10 applicants (low ratio)
  • A selection ratio of .90 means you'll hire 9 out of every 10 applicants (high ratio)

A low selection ratio is better because you're choosier. {{M}}It's like being accepted to a competitive graduate program versus a program that admits almost everyone who applies. When you're selective, each additional piece of information helps you differentiate between many qualified candidates.{{/M}}

Factor 3: The Base Rate

The base rate is the percentage of people hired under your current system who turn out to be successful employees.

  • High base rate (.80): 80% of people you hire succeed → Your current system is already working well
  • Moderate base rate (.50): 50% succeed → Lots of room for improvement
  • Low base rate (.20): Only 20% succeed → Something's seriously wrong (probably not the selection process)

A moderate base rate (around .50) gives you the most room for improvement. Here's why:

If your base rate is already .80, your current system is doing great, so adding a new tool probably won't help much. {{M}}It's like trying to optimize an already excellent recipe. Sure, you might make it slightly better, but you won't see dramatic improvement.{{/M}}

If your base rate is very low (.20), the problem probably isn't the selection process. It's something else, like terrible training, awful management, or unrealistic job demands. {{M}}It's like if 80% of people who join your gym quit within a month. The problem isn't your membership screening process; the problem is probably your gym.{{/M}}

The Taylor-Russell Tables: Putting It All Together

The Taylor-Russell tables let you estimate how much a new predictor will improve hiring success for different combinations of validity coefficients, selection ratios, and base rates.

Here's an example from the tables:

  • Current base rate: .50 (50% of current hires are successful)
  • New predictor's validity coefficient: .30 (fairly modest)
  • Selection ratio: .10 (hiring 1 out of 10 applicants)

Result: Adding the new predictor increases the success rate to 71%, a gain of 21 percentage points!

But change the selection ratio to .90 (hiring 9 out of 10 applicants), and that same predictor only increases success to 53%, a gain of barely 3 percentage points.

This is why tech companies and consulting firms can use selection tools with moderate validity and still see big benefits: they have very low selection ratios (hundreds of applicants for each position).
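To see where numbers like these come from, here is a rough sketch that approximates a Taylor-Russell entry numerically. It assumes predictor and criterion scores follow a bivariate normal distribution (the assumption behind the tables) and uses simple midpoint integration. The function name and step count are illustrative choices, and results may differ from the published tables in the last digit because of rounding.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def success_rate_among_selected(validity, selection_ratio, base_rate,
                                steps=20000):
    """Approximate P(successful | selected) under bivariate normality.

    Assumes -1 < validity < 1 and that selection_ratio and base_rate
    are strictly between 0 and 1.
    """
    # Cutoffs on the standardized predictor (x) and criterion (y).
    x_cut = STD_NORMAL.inv_cdf(1 - selection_ratio)
    y_cut = STD_NORMAL.inv_cdf(1 - base_rate)

    upper = 8.0  # the normal density is negligible beyond this point
    width = (upper - x_cut) / steps
    residual_sd = sqrt(1 - validity ** 2)

    joint = 0.0  # P(selected AND successful)
    for i in range(steps):
        x = x_cut + (i + 0.5) * width
        # Given predictor score x, the criterion is normal with
        # mean validity * x and SD residual_sd.
        p_success = 1 - STD_NORMAL.cdf((y_cut - validity * x) / residual_sd)
        joint += STD_NORMAL.pdf(x) * p_success * width

    return joint / selection_ratio


selective = success_rate_among_selected(0.30, 0.10, 0.50)
lenient = success_rate_among_selected(0.30, 0.90, 0.50)
print(f"selection ratio .10: {selective:.2f}")
print(f"selection ratio .90: {lenient:.2f}")
```

With a validity of .30 and a base rate of .50, the selective case comes out at about .71, while the lenient case lands only slightly above the .50 base rate.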

Adverse Impact: Making Sure Your Tool Is Fair

Even if a selection tool is valid and improves decisions, you can't use it if it unfairly discriminates against protected groups. Adverse impact (also called disparate impact) occurs when a selection method negatively affects protected group members compared to majority group members.

The Uniform Guidelines on Employee Selection Procedures outlines two main problems that cause adverse impact:

Test Unfairness

Test unfairness happens when members of one group consistently score lower on a selection test, but this difference doesn't appear in their actual job performance.

{{M}}Imagine two employees: Marcus and Jennifer. They both receive the same excellent performance ratings from their supervisor. But when they applied for their jobs, Jennifer scored much higher than Marcus on the selection test, and actually, most male applicants scored lower than female applicants. If men and women perform equally well on the job but women consistently outscore men on the selection test, that test is unfair to men.{{/M}}

The test is predicting something other than job performance, and as a result, qualified men are being screened out unfairly.

Differential Validity

Differential validity occurs when a selection tool has significantly different validity coefficients for different groups.

Example:

  • The test's validity coefficient for men: .70 (strong predictor)
  • The test's validity coefficient for women: .20 (weak predictor)

This means the test predicts job performance well for men but poorly for women. It's not working the same way for both groups.

The 80% Rule (Four-Fifths Rule)

The Uniform Guidelines describes a practical method for detecting adverse impact: the 80% rule.

Adverse impact is occurring when the hiring rate for a protected group is less than 80% of the hiring rate for the majority group.

Here's the calculation:

  1. Calculate hiring rate for majority group: (Number hired ÷ Number applied) × 100
  2. Calculate hiring rate for protected group: (Number hired ÷ Number applied) × 100
  3. Divide protected group rate by majority group rate
  4. If the result is less than .80, adverse impact is likely occurring

Example:

  • White applicants: 70% hired
  • African American applicants: 50% hired
  • Calculation: 50% ÷ 70% = .71
  • Result: .71 < .80, so adverse impact is occurring

The minimum acceptable hiring rate would be: .70 × .80 = .56 (56%)
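The calculation above is easy to script. The sketch below is one possible way to do it; the function names are invented for illustration, and the applicant counts are hypothetical numbers chosen to reproduce the 70% and 50% hiring rates from the example.

```python
def selection_rate(hired, applied):
    """Proportion of applicants in a group who were hired."""
    if applied <= 0:
        raise ValueError("applied must be positive")
    return hired / applied


def impact_ratio(protected_rate, majority_rate):
    """Protected group's hiring rate divided by the majority group's."""
    if majority_rate <= 0:
        raise ValueError("majority_rate must be positive")
    return protected_rate / majority_rate


def shows_adverse_impact(protected_rate, majority_rate, threshold=0.80):
    """True when the impact ratio falls below the four-fifths threshold."""
    return impact_ratio(protected_rate, majority_rate) < threshold


# Hypothetical counts matching the example: 70% vs. 50% hired.
majority_rate = selection_rate(hired=70, applied=100)
protected_rate = selection_rate(hired=50, applied=100)

ratio = impact_ratio(protected_rate, majority_rate)
print(f"impact ratio = {ratio:.2f}")                                 # 0.71
print(f"adverse impact flagged: "
      f"{shows_adverse_impact(protected_rate, majority_rate)}")      # True
print(f"minimum protected-group rate = {majority_rate * 0.80:.2f}")  # 0.56
```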

What Employers Can Do About Adverse Impact

If a court determines adverse impact is occurring, the employer has three options:

  1. Replace the procedure with one that doesn't have adverse impact
  2. Modify the procedure so it no longer has adverse impact
  3. Defend the procedure by showing it's job-related and no alternative exists

To show a procedure is job-related, the employer must demonstrate:

Defense | What It Means | Example
Validity | The procedure has adequate criterion-related, content, or construct validity | Selection test scores correlate .45 with job performance ratings
Business Necessity | The requirement is necessary for safe and efficient operation | Airline requiring pilots to have excellent vision
Bona Fide Occupational Qualification (BFOQ) | The requirement is essential for normal business operations | Religious school requiring teachers to be members of that religion

Important EPPP note: BFOQ can apply to gender, age, religion, and national origin, but never to race.

Utility Analysis: Show Me the Money

Finally, even if a selection tool is valid, improves decisions, and doesn't cause adverse impact, organizations want to know: Is it worth the cost?

Utility analysis calculates the economic return on investment from using a selection tool. The most commonly cited formula is the Brogden-Cronbach-Gleser formula, which estimates utility in dollars based on:

  • Number of people hired
  • The test's validity coefficient
  • Standard deviation of job performance in dollars (how much difference there is between good and poor performers)
  • Cost of testing

{{M}}Think of utility analysis like deciding whether to pay for a premium dating app subscription. Sure, the premium features might help you find better matches (validity), but is the improvement worth the monthly fee? That depends on how much you value the outcome, how many people you're meeting, and what the subscription costs.{{/M}}

Utility analysis often shows that even modest validity coefficients produce substantial dollar returns when:

  • You're hiring many people (large N)
  • The difference between good and poor performers is large (high standard deviation of performance)
  • The test is relatively inexpensive

For a company hiring 100 employees per year where the difference between a good and poor performer is worth $50,000 annually, even a test with a validity coefficient of .30 can yield hundreds of thousands of dollars in returns.
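As a rough sketch of how such an estimate might be computed, the code below uses the commonly presented form of the Brogden-Cronbach-Gleser formula. Beyond the four inputs listed above, that form also includes the average standardized predictor score of the people hired and their expected tenure. Every number here is a hypothetical input, and the $50,000 figure is treated loosely as the standard deviation of performance in dollars for illustration only.

```python
def bcg_utility_gain(n_hired, tenure_years, validity, sd_performance_dollars,
                     mean_z_selected, n_applicants, cost_per_applicant):
    """Estimated dollar gain from using a selection tool.

    gain = (hires x tenure x validity x SD of performance in dollars
            x mean standardized predictor score of those hired)
           - (applicants tested x cost per applicant)
    """
    benefit = (n_hired * tenure_years * validity
               * sd_performance_dollars * mean_z_selected)
    testing_cost = n_applicants * cost_per_applicant
    return benefit - testing_cost


# Hypothetical inputs, loosely based on the example in the text.
gain = bcg_utility_gain(
    n_hired=100,
    tenure_years=1,                # assumed: value counted for one year
    validity=0.30,
    sd_performance_dollars=50_000,
    mean_z_selected=0.5,           # assumed average z-score of hires
    n_applicants=200,              # assumed applicant pool
    cost_per_applicant=50,         # assumed testing cost in dollars
)
print(f"estimated utility gain: ${gain:,.0f}")
```

Under these assumed inputs the estimate comes to about $740,000. Changing the assumed tenure, average z-score, or testing cost shifts the result considerably, which is why utility estimates should be read as ballpark figures.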

Common Misconceptions Students Have

Misconception 1: "Higher reliability always means higher validity."

Reality: Not at all. A test can be perfectly reliable (consistent) but completely invalid (measuring the wrong thing). Reliability is necessary for validity, but it doesn't guarantee it.

Misconception 2: "Only high validity coefficients are useful."

Reality: Even modest validity coefficients (.20-.40) can substantially improve hiring when the selection ratio is low and base rate is moderate. Check the Taylor-Russell tables!

Misconception 3: "If a test is valid, it's legal to use."

Reality: Even valid tests can cause adverse impact. You must evaluate both validity and fairness. Many valid tests have been found illegal because they disproportionately screened out protected groups without sufficient job-relatedness.

Misconception 4: "Content validity is enough for any job-related test."

Reality: Content validity is important for job knowledge tests and work samples, but if you want to predict future performance (not just sample current knowledge), you need criterion-related validity.

Misconception 5: "The 80% rule is a strict legal requirement."

Reality: The 80% rule is a guideline for detecting possible adverse impact. Courts consider multiple factors, not just this calculation. However, falling below 80% is likely to trigger scrutiny.

Practice Tips for Remembering This Material

For reliability and validity types: Create a simple acronym or visual. The "three Cs of validity" can help:

  • Content: Does it cover the job Content?
  • Construct: Does it measure the Construct?
  • Criterion: Does it predict the Criterion?

For incremental validity factors: Remember "VBS": Validity coefficient, Base rate, Selection ratio. {{M}}Think of it like adjusting three dials to see if a new hire tool is worth adding.{{/M}}

For the 80% rule: Practice the calculation with different numbers until it's automatic. Make up scenarios: "If 60% of men are hired, what's the minimum for women?" (.60 × .80 = .48 or 48%)

For adverse impact defenses: Remember "V-B-B" (Validity, Business necessity, BFOQ), and lock in that race can never be a BFOQ.

For Taylor-Russell conditions: Remember that you want a "Goldilocks" base rate (not too high, not too low), a low selection ratio (more choosy), and as high a validity coefficient as you can get.

Key Takeaways

  • Reliability = consistency of measurement (ranges from 0 to 1.0)
  • Three validity types you must know:
    • Content validity: Does it sample the job content? (for knowledge tests, work samples)
    • Construct validity: Does it measure the intended trait? (for intelligence, personality tests)
    • Criterion-related validity: Does it predict job performance? (coefficient from -1.0 to +1.0)
  • Incremental validity increases when:
    • The validity coefficient is higher
    • The selection ratio is lower (more selective)
    • The base rate is moderate (around .50)
  • Taylor-Russell tables estimate how much a new predictor improves hiring success
  • Adverse impact occurs when a selection method disproportionately affects protected groups
    • Test unfairness: Score differences don't reflect performance differences
    • Differential validity: Different validity coefficients for different groups
  • 80% rule: Adverse impact is likely when the protected group's hiring rate is less than 80% of the majority group's hiring rate
  • Three defenses for adverse impact: Validity, Business necessity, BFOQ (never for race)
  • Utility analysis calculates the dollar value of using a selection tool (Brogden-Cronbach-Gleser formula)

Master these concepts, and you'll handle any EPPP question about evaluating selection techniques. More importantly, you'll understand how organizations make evidence-based hiring decisions that are both effective and fair.
