πŸ“š Stats Cheatsheet


Your friendly guide to statistical tests for marketing research

Welcome to Your Stats Survival Guide! πŸŽ‰

Let's be real: statistics can feel like learning a new language. But here's the secret β€” you don't need to memorize formulas or become a math wizard. You just need to know which test to use when and how to read the results.

This guide is written in plain English (mostly!), with real marketing examples you'll actually encounter. Think of it as your stats translator.

🎯 Quick Decision Tree:

  • Comparing 2 groups? β†’ T-Test
  • Comparing 3+ groups? β†’ ANOVA (then Post-Hoc)
  • Looking at categories? β†’ Chi-Square
  • Predicting continuous outcome? β†’ Multiple Regression
  • Predicting yes/no? β†’ Logistic Regression
  • Measuring relationships? β†’ Correlation
  • Comparing percentages? β†’ Z-Test for Proportions
  • Weird/skewed data? β†’ Mann-Whitney (non-parametric)
  • Finding customer segments? β†’ Cluster Analysis

πŸ“Š T-Test

"The classic comparison tool"

What is it?

The t-test compares the average (mean) between two groups to see if they're statistically different. It's like asking: "Is Version A really better than Version B, or did I just get lucky?"

When to Use It:

  • βœ“ Comparing 2 groups (and only 2!)
  • βœ“ Your data is continuous (numbers like prices, time, ratings)
  • βœ“ Data is roughly normal-ish (bell curve-ish)
  • βœ“ You want to compare means

Three Flavors:

  • Independent: Two separate groups (Ad A vs Ad B)
  • Paired: Same group, before/after (Pre-campaign vs Post-campaign)
  • One-Sample: Your group vs a known value (Your NPS vs industry average of 50)

🎬 Real Example:

Scenario: You're testing two email subject lines for your Black Friday campaign.

  • β€’ Subject A: "50% Off Everything!" β†’ sent to 1,000 people β†’ avg. order value: $85
  • β€’ Subject B: "Last Chance: Huge Savings!" β†’ sent to 1,000 people β†’ avg. order value: $92

Question: Is that $7 difference real, or just random luck?

Answer: Run an independent t-test! If p < 0.05, Subject B is legitimately better. πŸŽ‰
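Here's what that looks like in Python with SciPy. The order values below are simulated β€” the group means and sizes match the scenario, but the $20 spread is a made-up assumption, since the raw campaign data isn't shown:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated order values -- means and group sizes match the scenario,
# the $20 standard deviation is a made-up assumption
subject_a = rng.normal(loc=85, scale=20, size=1000)
subject_b = rng.normal(loc=92, scale=20, size=1000)

t_stat, p_value = stats.ttest_ind(subject_a, subject_b)

# Cohen's d: the mean difference in standard-deviation units
pooled_sd = np.sqrt((subject_a.var(ddof=1) + subject_b.var(ddof=1)) / 2)
d = (subject_b.mean() - subject_a.mean()) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, Cohen's d = {d:.2f}")
```

If `p_value` comes back below 0.05, the difference is probably real β€” and Cohen's d tells you whether it's big enough to matter for the business.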

πŸ“– How to Read Results:

t-statistic: How many "standard errors" apart the groups are. Bigger = more different.
p-value: If p < 0.05, the difference is "statistically significant" (probably not random).
Cohen's d (effect size): How BIG is the difference? β‰ˆ0.2 = small, β‰ˆ0.5 = medium, β‰ˆ0.8+ = large.
95% Confidence Interval: "We're 95% sure the true difference is between X and Y."

⚠️ Watch Out For:

  • β€’ Only works for 2 groups β€” if you have 3+, use ANOVA instead!
  • β€’ Needs decent sample sizes (at least 30 per group is safer)
  • β€’ Assumes data is roughly normal (not super skewed)
  • β€’ Statistical significance β‰  practical importance! A difference can be "significant" but tiny and meaningless for business.
🎯

ANOVA

"T-test's big brother for multiple groups"

What is it?

ANOVA (Analysis of Variance) compares means across 3 or more groups. It tells you: "Is at least one group different from the others?" Think of it as a t-test on steroids.

When to Use It:

  • βœ“ Comparing 3+ groups
  • βœ“ Continuous outcome variable (sales, satisfaction, etc.)
  • βœ“ Want to know if any groups differ
  • βœ“ All groups have similar variance

Two Types:

  • One-Way ANOVA: One factor (e.g., comparing 3 ad designs)
  • Two-Way ANOVA: Two factors at once (e.g., ad design Γ— time of day)

🎬 Real Example:

Scenario: You're testing 4 different promotional strategies for your loyalty program:

  • β€’ Strategy A: 10% discount β†’ avg. new signups: 45/week
  • β€’ Strategy B: Free shipping β†’ avg. new signups: 52/week
  • β€’ Strategy C: Points multiplier β†’ avg. new signups: 61/week
  • β€’ Strategy D: Exclusive access β†’ avg. new signups: 48/week

Question: Are these differences real or random noise?

Answer: Run one-way ANOVA! If F-statistic is significant (p < 0.05), at least one strategy works differently.
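A quick Python sketch of that ANOVA, using simulated weekly counts β€” the means match the scenario, but the 12-week window and week-to-week spread (sd = 6) are made-up assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated weekly signup counts -- means match the scenario, the
# 12-week window and sd = 6 spread are made-up assumptions
a = rng.normal(45, 6, 12)  # 10% discount
b = rng.normal(52, 6, 12)  # free shipping
c = rng.normal(61, 6, 12)  # points multiplier
d = rng.normal(48, 6, 12)  # exclusive access

f_stat, p_value = stats.f_oneway(a, b, c, d)

# Eta-squared: share of total variance explained by strategy
grand = np.concatenate([a, b, c, d])
ss_between = sum(len(g) * (g.mean() - grand.mean()) ** 2 for g in (a, b, c, d))
ss_total = ((grand - grand.mean()) ** 2).sum()
eta_sq = ss_between / ss_total

print(f"F = {f_stat:.2f}, p = {p_value:.4g}, eta^2 = {eta_sq:.2f}")
```

A significant F only says "something differs" β€” you'd follow up with a post-hoc test (covered below) to find out which strategy stands out.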

πŸ“– How to Read Results:

F-statistic: Ratio of between-group variance to within-group variance. Bigger = groups differ more.
p-value: If p < 0.05, at least one group is significantly different (but doesn't tell you which one!).
Eta-squared (Ξ·Β²): Effect size. Shows % of variance explained by groups. 0.01 = small, 0.06 = medium, 0.14+ = large.
πŸ” Pro Tip: ANOVA only tells you "something is different" β€” you need post-hoc tests (like Tukey's HSD) to figure out WHICH groups differ!

⚠️ Watch Out For:

  • β€’ ANOVA doesn't tell you WHICH groups differ β€” only that at least one does!
  • β€’ Assumes equal variances across groups (Levene's test can check this)
  • β€’ More sensitive to outliers than t-tests
  • β€’ With many groups, you increase the chance of false positives (Type I error)
🎲

Chi-Square Test

"For when you're counting categories"

What is it?

Chi-square tests relationships between categorical variables (groups, types, categories). It's all about counts β€” not averages! Think: "Do men and women prefer different products?"

When to Use It:

  • βœ“ Comparing categories (not numbers)
  • βœ“ Data is frequency counts (how many in each group)
  • βœ“ Want to test independence or goodness of fit
  • βœ“ Each cell has at least 5 expected observations

Two Flavors:

  • Goodness of Fit: Does your data match expected distribution?
  • Test of Independence: Are two categorical variables related?

🎬 Real Example:

Scenario: You want to know if customer age group affects product preference.

Age Group | Product A | Product B | Product C
18-25     | 120       | 45        | 35
26-40     | 85        | 110       | 55
41+       | 40        | 75        | 90

Question: Is product choice independent of age, or are they related?

Answer: Chi-square test of independence! If p < 0.05, age and product preference ARE related.
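Here's that test in Python, using the exact counts from the table above:

```python
import numpy as np
from scipy import stats

# Observed counts straight from the age-group x product table above
observed = np.array([
    [120,  45, 35],   # 18-25
    [ 85, 110, 55],   # 26-40
    [ 40,  75, 90],   # 41+
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

# Cramer's V: strength of the association, from 0 (none) to 1 (perfect)
n = observed.sum()
cramers_v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))

print(f"chi2 = {chi2:.1f}, df = {dof}, p = {p_value:.3g}, V = {cramers_v:.2f}")
```

Note that `chi2_contingency` also hands back the expected counts β€” worth eyeballing to confirm every cell's expected value is at least 5.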

πŸ“– How to Read Results:

χ² (Chi-square value): How much observed counts differ from expected. Bigger = more association.
p-value: If p < 0.05, the variables are significantly related (not independent).
CramΓ©r's V (effect size): Strength of association. 0-0.1 = weak, 0.1-0.3 = moderate, 0.3+ = strong.
Degrees of freedom (df): Calculated as (rows - 1) Γ— (columns - 1). Just needed for looking up critical values.

⚠️ Watch Out For:

  • β€’ Only for categorical data β€” can't use it on continuous variables!
  • β€’ Needs at least 5 expected observations in each cell (use Fisher's exact test if cells are small)
  • β€’ Doesn't show direction β€” tells you IF there's a relationship, not HOW variables relate
  • β€’ Sensitive to sample size β€” huge samples make even tiny differences "significant"
πŸ”—

Correlation

"Measuring how things move together"

What is it?

Correlation measures the strength and direction of the relationship between two continuous variables. Key word: relationship, not causation! Just because two things correlate doesn't mean one causes the other.

When to Use It:

  • βœ“ Both variables are continuous (numbers)
  • βœ“ Want to know if two things are related
  • βœ“ Not trying to predict one from the other (that's regression)
  • βœ“ Relationship is roughly linear (straight line-ish)

The r Value:

  • r = +1: Perfect positive correlation (as X ↑, Y ↑)
  • r = 0: No correlation (no relationship)
  • r = -1: Perfect negative correlation (as X ↑, Y ↓)
  • |r| > 0.7: Strong correlation
  • |r| = 0.3-0.7: Moderate correlation
  • |r| < 0.3: Weak correlation

🎬 Real Example:

Scenario: You suspect that your advertising spend and sales revenue are related.

  • β€’ Collect data for 12 months
  • β€’ X-axis: Monthly ad spend ($1K, $2K, $3K, etc.)
  • β€’ Y-axis: Monthly sales revenue

Question: Is there a relationship between ad spend and sales?

Answer: Calculate Pearson's r! If r = 0.85 (strong positive), more ad spend is associated with higher sales. If p < 0.05, the correlation is statistically significant (not due to chance).
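In Python that's a one-liner with SciPy. The twelve monthly figures below are made up purely to show the mechanics (ad spend and revenue both in $k):

```python
from scipy import stats

# 12 months of illustrative numbers (ad spend in $k, revenue in $k) --
# made up to show the mechanics, not real campaign data
ad_spend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
revenue = [12, 15, 14, 20, 22, 21, 27, 30, 29, 33, 36, 38]

r, p_value = stats.pearsonr(ad_spend, revenue)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}, p = {p_value:.3g}")
```

Always scatter-plot the two variables first β€” Pearson's r only captures straight-line relationships, so a strong curve can hide behind a modest r.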

πŸ“– How to Read Results:

r (Pearson's correlation): Ranges from -1 to +1. Sign shows direction, magnitude shows strength.
rΒ² (R-squared): % of variance in Y explained by X. If r = 0.8, then rΒ² = 0.64 = 64% explained.
p-value: If p < 0.05, the correlation is "real" (not random noise).
🚨 BIG WARNING: Correlation β‰  Causation! Just because ice cream sales and drowning rates correlate doesn't mean ice cream causes drowning. (Both increase in summer!)

⚠️ Watch Out For:

  • β€’ Correlation DOES NOT imply causation! (Can't say this enough!)
  • β€’ Outliers can dramatically affect r
  • β€’ Only measures linear relationships (misses curves)
  • β€’ Spurious correlations are everywhere β€” always ask "does this make sense?"
πŸ“ˆ

Regression

"Predicting the future (sort of)"

What is it?

Regression lets you predict one variable from another. It's like correlation's practical cousin. Instead of just saying "these are related," regression says "if X increases by 1, Y will increase by [this much]."

When to Use It:

  • βœ“ Want to predict an outcome (Y) from predictors (X)
  • βœ“ Both variables are continuous
  • βœ“ Relationship is roughly linear
  • βœ“ Need to quantify how much X affects Y

The Equation:

Y = bβ‚€ + b₁(X)
  • Y: What you're predicting (dependent variable)
  • X: What you're using to predict (independent variable)
  • bβ‚€: Intercept (Y when X = 0)
  • b₁: Slope (change in Y for each 1-unit increase in X)

🎬 Real Example:

Scenario: You want to predict monthly sales based on your Facebook ad spend.

  • β€’ Collected 10 months of data
  • β€’ Run regression: Sales = $5,000 + $4.50(Ad Spend)

What this means:

  • β€’ Intercept ($5,000): Base sales with zero ad spend
  • β€’ Slope ($4.50): For every $1 spent on ads, sales increase by $4.50

Prediction: If you spend $1,000 on ads β†’ Expected sales = $5,000 + $4.50(1,000) = $9,500
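A quick SciPy sketch of that fit. The ten monthly data points are illustrative β€” generated by hand around the example's Sales = $5,000 + $4.50Γ—(Ad Spend) line, so the fitted coefficients should land close to it:

```python
from scipy import stats

# Ten months of illustrative data, generated around the example's
# Sales = $5,000 + $4.50 x AdSpend relationship (not real numbers)
ad_spend = [200, 400, 500, 700, 800, 1000, 1100, 1300, 1500, 1600]
sales = [5900, 6800, 7300, 8200, 8500, 9600, 9900, 10800, 11700, 12200]

result = stats.linregress(ad_spend, sales)
print(f"Sales = {result.intercept:.0f} + {result.slope:.2f} * AdSpend")
print(f"R^2 = {result.rvalue ** 2:.3f}, p = {result.pvalue:.3g}")

# Predict sales for $1,000 of ad spend (inside the observed range!)
predicted = result.intercept + result.slope * 1000
print(f"Predicted sales at $1,000 ad spend: ${predicted:,.0f}")
```

Notice the prediction stays inside the range of observed ad spend ($200-$1,600) β€” extrapolating beyond it is exactly the trap the warnings below describe.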

πŸ“– How to Read Results:

Slope (b₁): The "bang for your buck." How much Y changes for each unit of X.
RΒ² (R-squared): % of variance explained by the model. 0.75 = model explains 75% of variation in Y.
p-value for slope: If p < 0.05, X is a significant predictor of Y.
Standard error: How much predictions typically miss by. Lower = better predictions.

⚠️ Watch Out For:

  • β€’ Don't extrapolate beyond your data! Model breaks down outside observed range.
  • β€’ Outliers can throw off the entire line
  • β€’ Just because model is significant doesn't mean it's useful (check RΒ²!)
  • β€’ Correlation β‰  Causation still applies! Regression doesn't prove X causes Y.
πŸ”€

Mann-Whitney U Test

"When your data is weird"

What is it?

The Mann-Whitney U test is the non-parametric alternative to the t-test. It compares medians instead of means, and it works when your data is skewed, has outliers, or isn't normal.

When to Use It:

  • βœ“ Comparing 2 independent groups
  • βœ“ Data is skewed or has outliers
  • βœ“ Data is ordinal (rankings, Likert scales)
  • βœ“ Sample sizes are small
  • βœ“ T-test assumptions are violated

Key Difference from T-Test:

  • T-test: Compares means, assumes normal distribution
  • Mann-Whitney: Compares distributions/ranks, no normality assumption
  • Think of it as: "Does one group tend to have higher values than the other?"

🎬 Real Example:

Scenario: You're comparing customer satisfaction ratings (1-5 stars) between two store locations.

  • β€’ Store A: Mostly 3-4 stars, few outliers at 1 star
  • β€’ Store B: More consistent 4-5 stars
  • β€’ Data is ordinal (Likert scale) and skewed

Question: Is Store B significantly better?

Answer: Use Mann-Whitney U test instead of t-test! It's better suited for ordinal, non-normal data. If p < 0.05, Store B has significantly higher ratings.
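In Python it looks like this. The star ratings below are illustrative made-up samples matching the scenario's pattern (Store A mostly 3-4 stars with a few 1-star outliers, Store B consistently 4-5):

```python
from scipy import stats

# Illustrative 1-5 star ratings -- ordinal and skewed, so Mann-Whitney
# is a better fit than a t-test
store_a = [3, 4, 3, 4, 3, 1, 4, 3, 2, 4, 3, 1, 4, 3, 3]
store_b = [5, 4, 5, 4, 5, 5, 4, 4, 5, 4, 5, 4, 4, 5, 5]

u_stat, p_value = stats.mannwhitneyu(store_a, store_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4g}")
```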

πŸ“– How to Read Results:

U statistic: Built from all pairwise comparisons β€” how often one group's values beat the other's. The further U is from the "no difference" midpoint, the bigger the separation (the smaller reported U is usually what software shows).
Z-score: Standardized version of U. Used for larger samples (n > 20).
p-value: If p < 0.05, the distributions differ significantly.

⚠️ Watch Out For:

  • β€’ Less powerful than t-test if data IS normal (harder to detect real differences)
  • β€’ Only for 2 groups β€” if you have 3+, use Kruskal-Wallis instead
  • β€’ Tests if distributions differ, not just medians (can be significant even if medians are same!)
πŸ“Š

Z-Test for Proportions

"Comparing percentages like a pro"

What is it?

The Z-test for proportions compares percentages between two groups. Think: conversion rates, click-through rates, success rates β€” anything that's "X out of Y people did something."

When to Use It:

  • βœ“ Comparing two proportions/percentages
  • βœ“ Data is binary (yes/no, click/no click)
  • βœ“ Large enough sample (at least 10 successes and failures in each group)
  • βœ“ Independent samples

Common Use Cases:

  • β€’ A/B testing conversion rates
  • β€’ Comparing click-through rates
  • β€’ Survey response differences
  • β€’ Win rates between strategies

🎬 Real Example:

Scenario: You're testing two landing page designs for sign-ups.

  • β€’ Page A: 1,200 visitors β†’ 84 sign-ups β†’ 7.0% conversion
  • β€’ Page B: 1,200 visitors β†’ 108 sign-ups β†’ 9.0% conversion

Question: Is that 2% difference real or just luck?

Answer: Run a Z-test for proportions! If p < 0.05, Page B legitimately has a higher conversion rate. Ship it! πŸš€
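The test itself is simple enough to compute by hand in Python, using the exact counts from the scenario:

```python
import math
from scipy import stats

# Sign-up counts from the two landing pages above
conv_a, n_a = 84, 1200    # Page A: 7.0%
conv_b, n_b = 108, 1200   # Page B: 9.0%

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled proportion
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_b - p_a) / se
p_value = 2 * stats.norm.sf(abs(z))               # two-sided p-value
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Fun twist: with these exact counts the two-sided p-value lands around 0.07 β€” just shy of 0.05. A good reminder to actually run the numbers (and maybe collect more data) before shipping; a one-sided test, if justified in advance, would roughly halve that p-value.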

πŸ“– How to Read Results:

Z-score: How many standard errors apart the proportions are. Bigger Z = more different.
p-value: If p < 0.05, the difference is statistically significant.
Difference in proportions: Absolute difference (e.g., 9.0% - 7.0% = 2.0% improvement).
πŸ’‘ Pro Tip: Even if p < 0.05, ask: "Is this difference meaningful for business?" A 0.1% improvement might be "significant" but not worth the effort!

⚠️ Watch Out For:

  • β€’ Needs large samples (at least 10 successes/failures per group)
  • β€’ Small differences can be "significant" with huge samples but not actionable
  • β€’ Assumes independence β€” don't use for before/after on same people
🎲

Bayesian Statistics

"A different way of thinking about data"

What is it?

Bayesian statistics flips traditional stats on its head. Instead of asking "what's the probability of seeing this data IF my hypothesis is true?" (frequentist), it asks "what's the probability my hypothesis is true GIVEN the data I've seen?" Mind = blown. 🀯

Key Differences from Frequentist:

  • βœ“ Incorporates prior knowledge
  • βœ“ Updates beliefs as new data arrives
  • βœ“ Gives probability that hypothesis is true
  • βœ“ Can make decisions with less data
  • βœ“ More intuitive interpretation

The Core Equation:

Posterior ∝ Prior Γ— Likelihood (proportional to β€” a normalizing constant turns it into a proper probability)
  • Prior: What you believed before seeing data
  • Likelihood: What the data says
  • Posterior: Your updated belief

🎬 Real Example:

Scenario: You're running an A/B test for a new checkout flow.

  • β€’ Prior belief: Based on past tests, you think conversion rate is around 5%
  • β€’ Observed data: New flow gets 6.2% conversion (58 out of 935 users)
  • β€’ Posterior belief: Combining prior + data, you now believe it's 5.8%

Bayesian Answer: "There's an 87% probability that Version B is better than Version A."

Frequentist Answer: "We reject the null hypothesis at p < 0.05."

See the difference? Bayesian gives you the probability directly β€” way more useful for business decisions!
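For conversion rates, the standard Bayesian machinery is the Beta-Binomial model, and it fits in a few lines of Python. Version B's 58/935 matches the scenario; Version A's 45/935 is a made-up assumption, since the scenario doesn't give A's counts:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Beta(1, 1) is a flat "no strong opinion" prior; swap in one built
# from past campaign data if you have it
a_prior, b_prior = 1, 1

# Version B matches the scenario (58/935); Version A's 45/935 is a
# made-up assumption, since the scenario doesn't give A's counts
conv_a, n_a = 45, 935
conv_b, n_b = 58, 935

# Conjugate update: posterior = Beta(prior + successes, prior + failures)
post_a = stats.beta(a_prior + conv_a, b_prior + n_a - conv_a)
post_b = stats.beta(a_prior + conv_b, b_prior + n_b - conv_b)

# Monte Carlo estimate of P(B > A): sample both posteriors, compare
draws = 100_000
p_b_beats_a = (post_b.rvs(draws, random_state=rng)
               > post_a.rvs(draws, random_state=rng)).mean()
print(f"P(B > A) is about {p_b_beats_a:.0%}")
```

That final number is the headline Bayesian deliverable: a direct probability that B beats A, ready to drop into a business conversation.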

πŸ“– Key Concepts:

Prior Distribution: Your belief before seeing data. Can be "uninformative" (no strong opinion) or "informative" (based on past data).
Credible Interval (CI): Bayesian version of confidence interval. "95% CI [0.05, 0.08]" means "95% chance the true value is between 5% and 8%." (This is what people THINK confidence intervals mean!)
P(B > A): Direct probability that B is better than A. E.g., "82% chance B beats A."

πŸ’‘ Why Marketers Love Bayesian:

  • β€’ Actionable results: Get probability of success, not just "significant or not"
  • β€’ Faster decisions: Can stop tests early when you have enough evidence
  • β€’ Incorporates domain knowledge: Use past campaign data as priors
  • β€’ Easier to explain: "87% chance B is better" > "p = 0.03 with 97% CI"

⚠️ Watch Out For:

  • β€’ Prior choice matters! Different priors can give different results
  • β€’ More complex to calculate (but our calculator does it for you!)
  • β€’ Not universally accepted in academic journals (yet)
  • β€’ Can be computationally intensive for complex models
πŸ“Š

Multiple Regression

"Predict with multiple factors, not just one"

What is it?

Multiple regression lets you predict an outcome using multiple predictor variables at once. Instead of "Does ad spend affect sales?", you can ask "Do ad spend, seasonality, AND pricing all affect sales?" It tells you each variable's unique contribution while controlling for the others.

🎬 Real Example:

Predicting monthly sales from:

  • β€’ Ad spend ($)
  • β€’ Season (1=peak, 0=off)
  • β€’ Price ($)

Result: Sales = 45 + 3.2Γ—(AdSpend) + 12Γ—(Season) - 1.5Γ—(Price)

Each coefficient shows the effect of that variable holding others constant.
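Here's a minimal sketch of that fit in Python using plain least squares. The data is simulated around the example's equation, so the recovered coefficients should come out close to 3.2, 12, and -1.5 (in practice you'd reach for statsmodels or scikit-learn, which add p-values and diagnostics):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated monthly data generated around the example's equation
# Sales = 45 + 3.2*AdSpend + 12*Season - 1.5*Price (plus noise)
n = 60
ad_spend = rng.uniform(1, 10, n)              # in $k
season = rng.integers(0, 2, n).astype(float)  # 1 = peak, 0 = off
price = rng.uniform(8, 15, n)
sales = 45 + 3.2 * ad_spend + 12 * season - 1.5 * price + rng.normal(0, 2, n)

# Ordinary least squares via a design matrix with an intercept column
X = np.column_stack([np.ones(n), ad_spend, season, price])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
intercept, b_ad, b_season, b_price = coef

# R^2: share of variance in sales the model explains
residuals = sales - X @ coef
r_squared = 1 - residuals.var() / sales.var()

print(f"Sales = {intercept:.1f} + {b_ad:.2f}*AdSpend + "
      f"{b_season:.1f}*Season + {b_price:.2f}*Price  (R^2 = {r_squared:.2f})")
```

Each fitted coefficient is the effect of that variable with the others held constant β€” exactly the "unique contribution" idea from the description above.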

When to use it:

  • βœ“ Multiple factors affect your outcome
  • βœ“ Want to control for confounding variables
  • βœ“ Need to know which factors matter most
  • βœ“ Building a predictive model

Key Metrics:

  • RΒ²: % of variance explained (higher = better fit)
  • Coefficients: Effect of each variable
  • P-values: Which predictors are significant
  • VIF: Checks for multicollinearity (<5 is good)

πŸ“– How to Interpret:

"For every $1,000 increase in ad spend, sales increase by $3,200 (holding season and price constant)." The key is "holding others constant" β€” that's what makes multiple regression powerful!

⚠️ Watch Out For:

  • β€’ Multicollinearity: Predictors too correlated (check VIF > 10)
  • β€’ Overfitting: Too many predictors for sample size
  • β€’ Linearity: Assumes linear relationships
  • β€’ Outliers: Can heavily influence results
πŸ”

Post-Hoc Tests

"ANOVA said they differ... but WHO differs from WHO?"

What is it?

ANOVA tells you "at least one group is different" but doesn't say which ones. Post-hoc tests (like Tukey HSD or Bonferroni) do pairwise comparisons to find out exactly which groups differ, while controlling for multiple testing errors.

🎬 Real Example:

Comparing email open rates for 4 subject line types:

  • β€’ Questions: 14.5%
  • β€’ Urgency: 19.2%
  • β€’ Personalized: 15.1%
  • β€’ Emoji: 17.3%

ANOVA: "At least one is different (p < 0.001)"

Post-Hoc (Tukey): "Urgency significantly beats Questions and Personalized (p < 0.05), but not Emoji."
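The ANOVA-then-Tukey workflow looks like this in Python (SciPy 1.8+ ships `tukey_hsd`). The per-send open rates are simulated β€” the means match the scenario, but the 20-send sample and sd = 2 spread are made-up assumptions:

```python
import numpy as np
from scipy import stats  # stats.tukey_hsd needs SciPy >= 1.8

rng = np.random.default_rng(11)

# Simulated open rates (%) for 20 sends per style -- means match the
# scenario, the spread (sd = 2) is a made-up assumption
questions = rng.normal(14.5, 2, 20)
urgency = rng.normal(19.2, 2, 20)
personalized = rng.normal(15.1, 2, 20)
emoji = rng.normal(17.3, 2, 20)

# Step 1: overall ANOVA
f_stat, p_anova = stats.f_oneway(questions, urgency, personalized, emoji)

# Step 2: only if that's significant, run all pairwise Tukey comparisons
result = stats.tukey_hsd(questions, urgency, personalized, emoji)
p_urg_vs_q = result.pvalue[1, 0]   # urgency (index 1) vs questions (index 0)

print(f"ANOVA p = {p_anova:.3g}; Urgency vs Questions p = {p_urg_vs_q:.3g}")
```

`result.pvalue` is a full pairwise matrix, so you can read off every comparison β€” already adjusted for the multiple-testing problem Tukey's HSD exists to control.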

Tukey HSD:

  • βœ“ Most common choice
  • βœ“ Controls family-wise error rate
  • βœ“ Best for similar sample sizes
  • βœ“ More powerful than Bonferroni

Bonferroni:

  • βœ“ Very conservative (fewer false positives)
  • βœ“ Works with unequal sample sizes
  • βœ“ Easy to understand (divides Ξ±)
  • βœ“ Good for few comparisons

πŸ“– How to Interpret:

"Post-hoc Tukey tests showed Urgency-based subject lines (M=19.2%) significantly outperformed Question-based (M=14.5%, p=0.003) and Personalized (M=15.1%, p=0.008) subject lines."

⚠️ Watch Out For:

  • β€’ Only run AFTER significant ANOVA! Don't fish for significance
  • β€’ Too many groups: Power drops with many comparisons
  • β€’ Bonferroni too conservative: May miss real differences
🎯

Logistic Regression

"Predicting yes/no outcomes (will they buy? will they click?)"

What is it?

Logistic regression predicts binary outcomes (yes/no, 0/1, success/failure) based on predictor variables. Unlike regular regression, it gives you a probability between 0 and 1. Perfect for "Will customer X convert?" or "Will email Y get clicked?"

🎬 Real Example:

Predicting customer purchase (yes/no) from:

  • β€’ Age
  • β€’ Income
  • β€’ Number of website visits

Result: "For every $10k increase in income, odds of purchase increase by 2.3x (p < 0.001)"

Model accuracy: 84%, with precision of 78% for predicting purchases.
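To show what's under the hood, here's a bare-bones logistic fit in numpy on simulated data with a single predictor (website visits); the true coefficient of 0.5 per visit is a made-up assumption. In practice you'd use statsmodels or scikit-learn, which handle multiple predictors and give you p-values:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated customers (illustrative): more site visits -> higher
# purchase probability, via an assumed true coefficient of 0.5 per visit
n = 500
visits = rng.poisson(5, n).astype(float)
true_logit = -3 + 0.5 * visits
purchase = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Fit by plain gradient ascent on the log-likelihood -- libraries like
# statsmodels or scikit-learn do this for you with better optimizers
X = np.column_stack([np.ones(n), visits])
w = np.zeros(2)
for _ in range(10_000):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.05 * X.T @ (purchase - p) / n

odds_ratio = np.exp(w[1])   # odds multiplier per extra visit
accuracy = ((1 / (1 + np.exp(-X @ w)) > 0.5) == purchase).mean()
print(f"Odds ratio per visit = {odds_ratio:.2f}, accuracy = {accuracy:.0%}")
```

Exponentiating a fitted coefficient is how you get the odds ratios described below: `np.exp(w[1])` is the factor by which the odds of purchase multiply for each extra visit.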

When to use it:

  • βœ“ Outcome is binary (yes/no)
  • βœ“ Want to predict probabilities
  • βœ“ Multiple predictors available
  • βœ“ Building classification models

Key Metrics:

  • Odds Ratios: How much each variable changes odds
  • Accuracy: % correctly predicted
  • Precision: % of predicted Yes that are really Yes
  • Recall: % of actual Yes that we caught

πŸ“– How to Interpret:

Odds Ratio = 2.3 for Income: "Each $10k income increase is associated with 2.3Γ— higher odds of purchasing."

Confusion Matrix: Shows true positives, false positives, etc. β€” essential for evaluating model quality.

⚠️ Watch Out For:

  • β€’ Imbalanced data: If 95% are "No", model might just predict "No" always
  • β€’ Overfitting: Too many predictors for sample size
  • β€’ Multicollinearity: Correlated predictors cause problems
  • β€’ Sample size: Need enough of BOTH outcomes
πŸ”΅

Cluster Analysis (K-Means)

"Finding natural groups in your customer data"

What is it?

Cluster analysis groups similar data points together without predefined labels. K-means is the most common method β€” it finds k groups by minimizing distance within clusters. Perfect for customer segmentation: "Are there natural customer types in my data?"

🎬 Real Example:

Clustering 500 customers based on:

  • β€’ Age
  • β€’ Income
  • β€’ Purchase frequency
  • β€’ Average order value

Found 3 segments:

  • β€’ Cluster 1: Young, low-income, infrequent buyers (25%)
  • β€’ Cluster 2: Middle-aged, high-income, frequent buyers (35%)
  • β€’ Cluster 3: Mature, mid-income, regular buyers (40%)

When to use it:

  • βœ“ Customer segmentation
  • βœ“ Market research
  • βœ“ Finding patterns in data
  • βœ“ Personalization strategies

Key Concepts:

  • k: Number of clusters (you choose)
  • Centroids: Center point of each cluster
  • WCSS: Within-cluster sum of squares (lower = tighter)
  • Elbow plot: Helps choose optimal k

πŸ“– How to Interpret:

Look at cluster centroids (average values) to understand what makes each segment unique. Then target each segment differently: VIP treatment for high-value cluster, reactivation campaigns for low-engagement cluster, etc.

⚠️ Watch Out For:

  • β€’ Choosing k: Use elbow plot, but it's partly subjective
  • β€’ Scale matters: Normalize variables first (age 20-80, income $20k-$200k)
  • β€’ Random initialization: Different runs can give different results
  • β€’ Assumes spherical clusters: Won't find complex shapes

πŸ” Quick Reference: Which Test Do I Use?

Question | Data Type | # of Groups | Test to Use
Are two group means different? | Continuous, normal | 2 | T-Test
Are 3+ group means different? | Continuous, normal | 3+ | ANOVA
Are two variables related? | Categorical | 2+ | Chi-Square
Can I predict Y from X? | Both continuous | N/A | Regression
How strongly are X and Y related? | Both continuous | N/A | Correlation
Are two groups different? (non-normal data) | Ordinal or skewed | 2 | Mann-Whitney
Are two percentages different? | Binary/proportions | 2 | Z-Test (Proportions)
What's the probability B beats A? | Any | 2 | Bayesian A/B Test

πŸ’‘ Key Tips & Best Practices

βœ… DO:

  • β€’ Check your data type FIRST (continuous, categorical, ordinal?)
  • β€’ Always look at effect sizes, not just p-values!
  • β€’ Visualize your data before running tests
  • β€’ Report confidence/credible intervals
  • β€’ Ask: "Is this statistically AND practically significant?"
  • β€’ Use the right test for your data (don't force it!)

❌ DON'T:

  • β€’ Confuse correlation with causation
  • β€’ Cherry-pick results until you get p < 0.05 (p-hacking!)
  • β€’ Ignore assumptions (normality, independence, etc.)
  • β€’ Run multiple tests without correcting for multiple comparisons
  • β€’ Use t-test for 3+ groups (use ANOVA!)
  • β€’ Treat "not significant" as "no effect" (absence of evidence β‰  evidence of absence)

🎯 The Golden Rule:

Stats is a tool, not a magic 8-ball. Numbers can guide decisions, but they can't tell you what to do. Always combine statistical results with domain knowledge, business context, and common sense.

And remember: p < 0.05 doesn't mean your idea is good β€” it just means it's probably not random. A tiny, meaningless effect can still be "statistically significant" with enough data!