Types of Variables and Data

7: Research Methods & Statistics

Why Variables and Data Types Matter for Your EPPP Success

Understanding variables and data types isn't just about passing a test question. It's the foundation for reading research, evaluating treatment effectiveness, and making evidence-based decisions in your future practice. When you read that a new therapy approach "significantly reduces depression," you need to know what variables were measured, how they were measured, and whether the conclusions make sense. This lesson will break down these concepts so they stick with you not just for exam day, but for your entire career.

The Cast of Characters: Types of Variables

Every research study tells a story with different players. Let's meet the main characters and understand their roles.

Independent and Dependent Variables: The "What Affects What" Relationship

The independent variable is what researchers manipulate or compare: it's the "cause" side of the equation. The dependent variable is what they measure to see if anything changed: it's the "effect" side.

{{M}}Think of planning a dinner party where you're testing whether background music affects how much your guests enjoy the meal. The type of music (jazz, classical, or silence) is your independent variable: you control it. Your guests' enjoyment ratings are the dependent variable: you measure them to see if the music made a difference.{{/M}}

Here's a practical trick that works every time: Convert any study description into this question format: "What are the effects of [independent variable] on [dependent variable]?"

Let's practice:

  • Study: Researchers compare cognitive-behavioral therapy, medication, and combined treatment for reducing panic attacks.
  • Question: "What are the effects of treatment type on panic attack frequency?"
  • Answer: Treatment type (with three levels: CBT, medication, combined) is the independent variable. Panic attack frequency is the dependent variable.

Remember: The independent variable always has at least two levels. You're always comparing something to something else. Treatment versus no treatment, high dose versus low dose, Group A versus Group B.

Moderator Variables: The "It Depends" Factor

A moderator variable changes the strength or direction of the relationship between your independent and dependent variables. {{M}}It's like how the effectiveness of your morning coffee (independent variable) on your alertness (dependent variable) might depend on whether you're a regular coffee drinker or someone who rarely has caffeine (moderator variable). The same cup affects different people differently.{{/M}}

Clinical example: You're studying whether exposure therapy reduces social anxiety. You find it works great for some clients but not others. When you dig deeper, you discover that clients with strong social support networks improve more than those without support. Social support is a moderator variable. It changes how effective the treatment is.
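To see moderation in numbers, here is a minimal Python sketch. The improvement scores are invented for illustration (they do not come from any real study), and the names `scores` and `treatment_effect` are hypothetical. The idea is to compute the treatment effect separately at each level of the moderator. If those effects differ, the moderator is changing how well the treatment works.

```python
from statistics import mean

# Hypothetical improvement scores (higher = more improvement).
# Invented for illustration; not real study data.
scores = {
    ("exposure", "high_support"): [12, 14, 13, 15],
    ("control", "high_support"): [4, 5, 3, 4],
    ("exposure", "low_support"): [6, 5, 7, 6],
    ("control", "low_support"): [4, 3, 5, 4],
}


def treatment_effect(support_level):
    """Mean improvement with exposure therapy minus mean improvement in control."""
    treated = mean(scores[("exposure", support_level)])
    untreated = mean(scores[("control", support_level)])
    return treated - untreated


effect_high = treatment_effect("high_support")
effect_low = treatment_effect("low_support")

print("Effect with high social support:", effect_high)
print("Effect with low social support:", effect_low)
```

In this made-up data the treatment helps both groups, but the effect is much larger for clients with high social support. That difference between the two effects is what "moderation" means.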

Mediator Variables: The "Why It Works" Explanation

A mediator variable explains why or how an independent variable affects a dependent variable. It's the mechanism, the middle step in the process.

{{M}}If you notice that people who start meal planning (independent variable) lose weight (dependent variable), the mediator might be that meal planning leads to fewer impulsive fast-food purchases, which then leads to weight loss. The reduced fast-food consumption is the mediator. It explains the connection.{{/M}}

In psychotherapy research, cognitive therapy for depression assumes this pathway:

  • Therapy (independent variable) → Changes in negative thinking patterns (mediator) → Reduced depressive symptoms (dependent variable)

The mediator answers the question: "What's happening between the treatment and the outcome?"

Extraneous Variables: The Unwanted Guests

Extraneous variables (often called confounding variables when they vary along with the independent variable) are factors you didn't plan for that can distort your results. They make it hard to know if your independent variable really caused the changes you observed.

{{M}}Imagine you're comparing two exercise classes for stress reduction. Class A meets at 6 AM and Class B meets at 6 PM. If Class B shows better results, is it because of something special about the instructor, or because people who choose evening classes are already less stressed (since they don't have to wake up early)? Time of day becomes an extraneous variable that confounds your results.{{/M}}

Good research design tries to control or eliminate extraneous variables. That's why randomization matters. It helps spread out potential confounds evenly across groups.
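The effect of randomization can be sketched with a short simulation. This is an illustration under stated assumptions, not a real dataset: the baseline stress scores are randomly generated, the sample size of 2,000 is arbitrary, and all variable names are hypothetical. The sketch shuffles participants into two groups and then checks that an extraneous variable (baseline stress) ends up with nearly the same average in each group.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical baseline stress scores (the extraneous variable) for 2,000 people.
baseline_stress = [rng.gauss(50, 10) for _ in range(2000)]

# Random assignment: shuffle the participant indices, then split them in half.
indices = list(range(len(baseline_stress)))
rng.shuffle(indices)
group_a = indices[:1000]
group_b = indices[1000:]

# Compare the groups on the extraneous variable before any treatment happens.
mean_a = mean([baseline_stress[i] for i in group_a])
mean_b = mean([baseline_stress[i] for i in group_b])

print("Mean baseline stress, Group A:", round(mean_a, 2))
print("Mean baseline stress, Group B:", round(mean_b, 2))
```

Because chance alone decides who lands in each group, the two baseline means should be close. Compare that with letting people pick their own 6 AM or 6 PM class, where the groups may differ before the study even starts.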

Scales of Measurement: How We Quantify Experience

Not all data is created equal. The way we measure something determines what we can say about it and which statistical tests we can use. Let's work through the four scales from simplest to most informative.

Nominal Scale: Categories Without Order

Nominal scales simply sort people into categories with no inherent ranking. These are labels, nothing more.

Examples:

  • Gender identity
  • Eye color
  • Diagnosis (depression, anxiety, PTSD)
  • Treatment group assignment (Group A, Group B)

You can count how many people fall into each category, but you can't say one category is "more" or "better" than another in a quantitative sense. {{M}}It's like organizing your streaming services. Netflix, Hulu, and Disney+ are different categories, but none is mathematically "greater" than the others.{{/M}}

A special type of nominal variable is the dichotomous variable, which has only two categories (yes/no, treated/untreated, improved/not improved).

Ordinal Scale: Ranked but Unequal Steps

Ordinal scales put people in order, but the distances between rankings aren't necessarily equal.

{{M}}Think about how you rate your stress level: "not stressed," "somewhat stressed," "very stressed," and "extremely stressed." You know "very stressed" is worse than "somewhat stressed," but the jump between categories might not be the same size each time. The difference between "not" and "somewhat" might be smaller than the difference between "very" and "extremely."{{/M}}

Common examples:

  • Likert scales (strongly agree to strongly disagree)
  • Class rankings (1st, 2nd, 3rd place)
  • Education level (high school, bachelor's, master's, doctorate)

With ordinal data, you know the order but not the exact distance between points.

Interval Scale: Equal Distances, No True Zero

Interval scales have equal distances between adjacent points, but no absolute zero point where the quality being measured completely disappears.

IQ scores are the classic example. The difference between IQ scores of 100 and 101 equals the difference between 130 and 131. However, an IQ of 0 doesn't mean "no intelligence," so there's no true zero point on the scale.

Most standardized psychological tests produce interval data. The key feature: equal intervals mean you can add and subtract scores meaningfully, but you can't make ratio statements.

Ratio Scale: The Full Package

Ratio scales have everything: ordered categories, equal intervals, and an absolute zero point that means "none of this quality exists."

Examples:

  • Weight in pounds
  • Age in years
  • Number of therapy sessions attended
  • Reaction time in milliseconds
  • Annual income in dollars

The absolute zero point lets you make ratio statements. Someone who weighs 200 pounds is exactly twice as heavy as someone who weighs 100 pounds. Someone who attended 10 sessions attended twice as many as someone who attended 5 sessions.

Here's the key distinction between interval and ratio: You cannot say someone with an IQ of 150 is 1.5 times as intelligent as someone with an IQ of 100 (interval scale), but you can say someone who's 60 years old is twice as old as someone who's 30 (ratio scale).
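One way to see why ratio statements fail on an interval scale: because the zero point is arbitrary, you could shift every score by a constant and the scale would carry the same information. The Python sketch below tries that with a hypothetical 50-point shift (the shift and the helper name `describe` are assumptions made purely for illustration). Differences survive the shift, but ratios do not.

```python
def describe(a, b):
    """Return the difference and the ratio between two scores."""
    return {"difference": a - b, "ratio": a / b}


# Interval scale: IQ scores, before and after a hypothetical relabeling
# that adds 50 points to every score (the zero point is arbitrary).
iq_original = describe(150, 100)
iq_relabeled = describe(150 + 50, 100 + 50)

# Ratio scale: age in years. Zero years really means "no age", so the
# zero point cannot be moved and the ratio is meaningful.
age = describe(60, 30)

print("IQ original: ", iq_original)
print("IQ relabeled:", iq_relabeled)
print("Age:         ", age)
```

The 50-point difference is the same in both IQ versions, while the "1.5 times" ratio changes as soon as the zero point moves. That is why the ratio was never meaningful in the first place.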

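| Scale | Order? | Equal Intervals? | True Zero? | Example |
| --- | --- | --- | --- | --- |
| Nominal | No | No | No | Diagnosis categories |
| Ordinal | Yes | No | No | Likert ratings |
| Interval | Yes | Yes | No | IQ scores |
| Ratio | Yes | Yes | Yes | Number of symptoms |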
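The summary above works like a three-question checklist, which can be sketched as a small Python function. The function name `classify_scale` and its parameters are hypothetical, chosen just for this illustration. It asks the questions in order (Is there an order? Are the intervals equal? Is there a true zero?) and stops at the first "no."

```python
def classify_scale(ordered, equal_intervals, true_zero):
    """Name the measurement scale from its three defining properties."""
    if not ordered:
        return "nominal"
    if not equal_intervals:
        return "ordinal"
    if not true_zero:
        return "interval"
    return "ratio"


# Examples taken from this lesson.
print(classify_scale(False, False, False))  # diagnosis categories
print(classify_scale(True, False, False))   # Likert ratings
print(classify_scale(True, True, False))    # IQ scores
print(classify_scale(True, True, True))     # number of symptoms
```

On exam questions, running through the same three questions in your head is usually enough to identify the scale.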

Visualizing Data: Choosing the Right Graph

Your measurement scale determines which graph type is appropriate. Using the wrong graph for your data type is like trying to eat soup with a fork: technically possible, but definitely not right.

Bar Graphs: For Categories

Bar graphs work with nominal and ordinal data. Each category gets its own bar, and the bars are separated by spaces to show they're distinct categories, not a continuous scale.

{{M}}If you're displaying how many clients at your clinic have different primary diagnoses (depression, anxiety, OCD, PTSD), you'd use a bar graph. Each diagnosis is a separate category, and the space between bars reinforces that these aren't points on a continuum.{{/M}}

Histograms: For Continuous Data

Histograms are used with interval and ratio data. The bars touch each other because the data represents a continuous range of scores, not separate categories.

{{M}}If you're showing the distribution of depression severity scores (measured on a scale from 0-100) among your clients, you'd use a histogram. The touching bars show this is a continuous scale where someone could theoretically score anywhere along the range.{{/M}}

Frequency Polygons (Line Graphs): Another Option for Continuous Data

Frequency polygons also work with interval and ratio data. Instead of bars, you place dots above each score or score interval and connect them with lines, creating a smooth shape that makes patterns easy to see.
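Behind both a histogram and a frequency polygon is the same step: grouping continuous scores into equal-width score intervals and counting how many fall in each. Here is a minimal Python sketch of that step. The depression scores are invented, the interval width of 10 is an arbitrary choice, and `bin_scores` is a hypothetical helper name.

```python
from collections import Counter


def bin_scores(scores, width):
    """Count how many scores fall in each interval [start, start + width)."""
    counts = Counter((score // width) * width for score in scores)
    return dict(sorted(counts.items()))


# Hypothetical depression severity scores on a 0-100 scale.
depression_scores = [12, 18, 25, 27, 31, 33, 35, 38, 41, 44, 47, 52, 58, 63, 71]

histogram_counts = bin_scores(depression_scores, 10)

# A rough text version of the histogram: one row per score interval.
for start, count in histogram_counts.items():
    print(f"{start:>3}-{start + 9:<3} {'#' * count}")
```

A histogram draws each count as a bar touching its neighbors. A frequency polygon plots a dot at each count and connects the dots. The counts themselves are identical either way.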

Understanding Distribution Shapes: What the Data's Shape Tells You

When you graph data, the shape of the distribution reveals important information about your sample and helps you choose appropriate statistical analyses.

The Normal Distribution: The Ideal Bell Curve

The normal distribution is symmetrical and bell-shaped. It has special properties that make it beloved by statisticians:

  1. The mean, median, and mode all equal the same value (they're all at the center peak)
  2. About 68% of scores fall within one standard deviation of the mean
  3. About 95% fall within two standard deviations
  4. About 99.7% fall within three standard deviations

{{M}}If you measure the wait times for appointments at a large clinic and get a normal distribution with a mean of 15 minutes and standard deviation of 5 minutes, you know that about 68% of clients wait between 10 and 20 minutes, about 95% wait between 5 and 25 minutes, and virtually everyone waits between 0 and 30 minutes.{{/M}}
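You can check these percentages yourself. The sketch below uses Python's standard-library `statistics.NormalDist` with the mean (15 minutes) and standard deviation (5 minutes) from the wait-time example. The helper name `proportion_within` is hypothetical. It computes the share of a normal distribution that falls within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import NormalDist

# Wait times from the example: mean of 15 minutes, standard deviation of 5.
wait_times = NormalDist(mu=15, sigma=5)


def proportion_within(k):
    """Proportion of the distribution within k standard deviations of the mean."""
    lower = wait_times.mean - k * wait_times.stdev
    upper = wait_times.mean + k * wait_times.stdev
    return wait_times.cdf(upper) - wait_times.cdf(lower)


within_1 = proportion_within(1)  # between 10 and 20 minutes
within_2 = proportion_within(2)  # between 5 and 25 minutes
within_3 = proportion_within(3)  # between 0 and 30 minutes

for k, p in ((1, within_1), (2, within_2), (3, within_3)):
    print(f"Within {k} SD: {p:.1%}")
```

The results come out to roughly 68%, 95%, and 99.7%. The same proportions hold for any normal distribution, whatever its mean and standard deviation.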

Many statistical tests assume your data follows a normal distribution, so understanding this pattern matters for analysis decisions.

Skewed Distributions: When Data Piles to One Side

Skewed distributions are asymmetrical, with most scores bunched on one side and a few extreme scores creating a tail on the other side.

Negatively skewed distribution: Most scores are high, with a few low scores stretching out the left tail. {{M}}Think of scores on an easy exam where most students get As and Bs, but a few students who didn't study get Ds and Fs.{{/M}}

Positively skewed distribution: Most scores are low, with a few high scores stretching out the right tail. {{M}}Annual income in the general population is positively skewed. Most people earn moderate amounts, but a few high earners create a long tail on the high end.{{/M}}

Here's the memory trick: The tail tells the tale. The tail points toward the skew direction. Negative skew = tail points left (toward negative numbers). Positive skew = tail points right (toward positive numbers).

In skewed distributions, the mean, median, and mode separate:

  • The mean gets pulled toward the tail (toward the extreme scores)
  • The mode stays with the hump (where most scores cluster)
  • The median sits in between

| Distribution Type | Mean Position | Median Position | Mode Position |
| --- | --- | --- | --- |
| Negatively Skewed | Lowest value | Middle | Highest value |
| Normal | All three equal | All three equal | All three equal |
| Positively Skewed | Highest value | Middle | Lowest value |

When data is skewed, the median is usually better than the mean for representing a typical score because the mean gets distorted by those extreme values.
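A quick worked example makes the pattern concrete. The incomes below are invented (in thousands of dollars) and include one very high earner to create a positive skew. The variable names are hypothetical. The sketch uses Python's standard `statistics` module to compute all three measures.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes in thousands of dollars.
# One extreme value (250) creates a long right tail: positive skew.
incomes = [30, 35, 35, 35, 40, 45, 50, 60, 250]

income_mode = mode(incomes)      # the most common value (the hump)
income_median = median(incomes)  # the middle value
income_mean = mean(incomes)      # pulled toward the extreme score

print("Mode:  ", income_mode)
print("Median:", income_median)
print("Mean:  ", round(income_mean, 1))
```

Here the mode is lowest, the median is in the middle, and the mean is highest, which matches the pattern for a positively skewed distribution. The single extreme income drags the mean well above what most people in this sample earn, which is why the median is the better "typical" value.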

Leptokurtic and Platykurtic: Peak Height Variations

These describe how pointy or flat a distribution looks compared to a normal curve.

Leptokurtic distribution: Sharper peak and heavier tails than normal. {{M}}Picture a test with very predictable content: most scores bunch tightly around the average, yet the few scores that do stray from it land surprisingly far out.{{/M}}

Platykurtic distribution: Flatter peak and thinner tails than normal. {{M}}Imagine a test so unpredictable that scores spread out more evenly across the range, with no strong clustering in the middle and few extreme outliers.{{/M}}

These terms come up less frequently on the EPPP than skewness, but they're worth knowing for recognition purposes.
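If you want a number to go with the labels, one common choice is excess kurtosis: it is positive for leptokurtic distributions, negative for platykurtic ones, and zero for a normal curve. The Python sketch below is a minimal illustration using the population formula (the fourth central moment divided by the squared variance, minus 3). The two samples and the function name `excess_kurtosis` are made up for this example.

```python
from statistics import fmean


def excess_kurtosis(data):
    """Population excess kurtosis: m4 / m2**2 - 3 (zero for a normal curve)."""
    center = fmean(data)
    m2 = fmean([(x - center) ** 2 for x in data])  # variance (2nd central moment)
    m4 = fmean([(x - center) ** 4 for x in data])  # 4th central moment
    return m4 / m2 ** 2 - 3


# Hypothetical samples, invented for illustration.
peaked_scores = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # tight cluster, two far-out scores
flat_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]          # spread evenly across the range

kurtosis_peaked = excess_kurtosis(peaked_scores)
kurtosis_flat = excess_kurtosis(flat_scores)

print("Peaked sample:", round(kurtosis_peaked, 2))
print("Flat sample:  ", round(kurtosis_flat, 2))
```

The tightly clustered sample produces a positive value (leptokurtic) and the evenly spread sample produces a negative value (platykurtic). You won't need to compute this on the EPPP, but seeing the sign flip can help the two terms stick.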

Common Misconceptions and Exam Traps

Misconception 1: "The independent variable is always manipulated by the researcher."

Not quite. While researchers often assign participants to different treatment conditions (true manipulation), comparing pre-existing groups also involves an independent variable. When you compare therapy outcomes for clients with different attachment styles, attachment style is your independent variable even though you didn't create or assign it.

Misconception 2: "Moderator and mediator are basically the same thing."

They're fundamentally different. A moderator changes whether or how much something works (it modifies the effect). A mediator explains why something works (it's part of the causal chain). Moderators answer "for whom?" while mediators answer "how?"

Misconception 3: "Higher level scales are always better."

While ratio scales provide more information than nominal scales, the appropriate scale depends on what you're measuring. You can't measure someone's diagnosis category on a ratio scale. Some variables are inherently categorical. Use the scale that matches your construct.

Misconception 4: "If bars touch, it's always a histogram."

The touching bars in a histogram reflect continuous data, but make sure the data is actually interval or ratio before calling it a histogram. The graph type should match your measurement scale.

Misconception 5: "In a positively skewed distribution, most scores are positive."

No! The direction of skew refers to where the tail points, not the sign of the scores. In a positively skewed distribution, the tail points toward higher (more positive) values, but most scores are actually on the lower end. The few extreme high scores create the positive tail.

Memory Aids and Study Strategies

For Independent vs. Dependent Variables: Create the question "What are the effects of ___ on ___?" The first blank is always your independent variable, the second is your dependent variable.

For Moderator vs. Mediator:

  • Moderator: Sounds like "moderate" or "modify". It modifies the strength of the effect
  • Mediator: Think of the word "medium" or "middle". It's the middle step that explains the connection

For Scales of Measurement (NOIR): Use the acronym NOIR (French for "black"): Nominal, Ordinal, Interval, Ratio. They go from least to most information.

For Skewness: Visual memory: Draw a quick distribution and notice where the tail points. That tells you the direction of the skew. Then remember: mean follows tail, mode follows hump, median in the middle.

For Bar Graph vs. Histogram:

  • Bar graph = Breaks between bars (for categories)
  • Histogram = Hugging bars (for continuous data)

Practice Exercise: Take any research abstract from a psychology journal and identify all five variable types: What's the independent variable? What's dependent? Can you spot any moderators or mediators mentioned? What about potential extraneous variables they tried to control?

Key Takeaways

  • Independent variables are what researchers manipulate or compare; dependent variables are what they measure as outcomes
  • Moderator variables change the strength or direction of effects (they modify relationships)
  • Mediator variables explain why or how effects occur (they're the mechanism)
  • Extraneous variables are unintended factors that confound your results
  • Nominal scales provide categories without order (diagnosis, gender)
  • Ordinal scales provide order without equal intervals (rankings, Likert scales)
  • Interval scales have equal intervals but no true zero (IQ, most standardized tests)
  • Ratio scales have equal intervals and a true zero (age, weight, count data)
  • Bar graphs (with spaces between bars) display nominal and ordinal data
  • Histograms and frequency polygons (continuous) display interval and ratio data
  • In a normal distribution, mean = median = mode, and scores follow the 68-95-99.7 rule
  • In skewed distributions, the tail tells the direction, and the mean gets pulled toward the tail
  • Leptokurtic = pointy peak; platykurtic = flat peak

Understanding these fundamentals will help you analyze research questions correctly on the EPPP and evaluate scientific evidence throughout your career. When you can identify what type of variable you're dealing with and what its measurement properties are, you're equipped to think critically about research claims and statistical conclusions.
