Question 1

What is linear regression and how does it work?

Accepted Answer

Linear regression finds the best-fitting straight line through a set of data points by minimising the sum of squared vertical distances (residuals) between each point and the line. This is the ordinary least squares (OLS) method. The result is the equation y = mx + b. The slope m is the predicted change in Y per one-unit increase in X. The intercept b is the predicted Y value when X = 0. Example: advertising spend (X) vs revenue (Y). If m = 5.2, each extra $1 of advertising predicts $5.20 more revenue. If b = 1,200, the model predicts $1,200 of revenue with zero advertising (the fixed base).
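The least squares fit described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production routine: the function name `fit_line` and the advertising figures are made up, and the data are constructed to lie exactly on y = 5.2x + 1200 so the recovered slope and intercept match the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared X deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (X) and revenue (Y) on an exact line.
ad_spend = [100, 200, 300, 400, 500]
revenue = [5.2 * x + 1200 for x in ad_spend]

m, b = fit_line(ad_spend, revenue)
print(f"slope = {m:.2f}, intercept = {b:.2f}")
```

Real data will not sit exactly on a line, so the fitted m and b are estimates rather than exact recoveries.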

Question 2

How do I interpret R-squared in linear regression?

Accepted Answer

R-squared (R²) is the proportion of variance in Y explained by the model. R² = 0.85 means 85% of the variation in Y is explained by X and 15% is left unexplained. Typical expectations vary by field, and these are rough conventions rather than formal cut-offs: physics and engineering expect R² above 0.90; business analytics considers 0.70–0.89 strong; economics and social sciences often accept 0.30–0.50 as meaningful, because human behaviour involves many variables beyond any one predictor. R² of 1.0 is a perfect fit (rare in real data). R² near 0 means the model explains almost nothing. Important: R² does not prove causation and does not tell you whether the model's assumptions are met.
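As a rough sketch of where the number comes from, R² can be computed as 1 minus the ratio of unexplained variation (squared residuals) to total variation around the mean of Y. The function name `r_squared` and the five data points are illustrative assumptions; the line y = 0.6x + 2.2 is the least squares fit for these particular points.

```python
def r_squared(ys, predictions):
    """R² = 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    # Variation the model leaves unexplained.
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    # Total variation of Y around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data; the least squares line for these points is y = 0.6x + 2.2.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predictions = [0.6 * x + 2.2 for x in xs]

r2 = r_squared(ys, predictions)
print(f"R² = {r2:.2f}")  # about 0.60: the line explains roughly 60% of the variance
```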

Question 3

What is the difference between simple and multiple linear regression?

Accepted Answer

Simple linear regression uses one X to predict Y: y = mx + b. Multiple linear regression uses two or more X variables to predict Y: y = b₀ + b₁x₁ + b₂x₂ + ... Simple example: predicting salary (Y) from years of experience (X). Multiple example: predicting salary from years of experience, education level, and location. Adding predictors never lowers R² on the data used to fit the model, even when the new predictors are irrelevant, so a higher R² alone does not show that the larger model is better. Compare models with adjusted R², which penalises extra predictors, and beware of overfitting with too many predictors and too few observations. As a minimum guideline, use 10 to 20 observations per predictor variable.
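One common way to account for the overfitting risk is adjusted R², which shrinks R² according to the number of predictors relative to the sample size. The sketch below uses the standard formula; the function names and the example figures (R² = 0.80, 5 predictors) are illustrative assumptions, not values from a real model.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def guideline_sample_range(n_predictors, low=10, high=20):
    """Observations suggested by the 10-to-20-per-predictor rule of thumb."""
    return n_predictors * low, n_predictors * high


# Same raw R², same number of predictors, very different sample sizes.
small_sample = adjusted_r_squared(0.80, n_obs=20, n_predictors=5)
large_sample = adjusted_r_squared(0.80, n_obs=200, n_predictors=5)
print(f"n = 20:  adjusted R² = {small_sample:.3f}")
print(f"n = 200: adjusted R² = {large_sample:.3f}")
print("guideline for 5 predictors:", guideline_sample_range(5))
```

The penalty is much larger with 20 observations than with 200, which is the numerical form of the warning about too many predictors and too few observations.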

Question 4

How do you interpret the slope in linear regression?

Accepted Answer

The slope (m) in y = mx + b is the predicted change in Y for each one-unit increase in X. In multiple regression, each slope is interpreted holding the other predictors constant. Positive slope: Y increases as X increases. Example: m = 3.5 in a study hours vs test score regression means each additional hour of studying predicts 3.5 more points. Negative slope: Y decreases as X increases. Example: m = -0.8 in a price vs demand model means each $1 price increase predicts 0.8 fewer units sold. Slope of zero: X has no linear predictive relationship with Y. The slope's p-value indicates how compatible the data are with a true slope of zero. With p = 0.40, an estimated slope of 3.5 cannot be reliably distinguished from no relationship at all.
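The significance check on a slope rests on a t-statistic: the slope divided by its standard error, compared against a t distribution with n - 2 degrees of freedom. The sketch below computes that statistic by hand. The data are made up, the function name is hypothetical, and the critical value 3.182 is the standard two-sided 5% table value for 3 degrees of freedom; an exact p-value would need a statistics library.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (slope, standard_error, t) for a one-predictor least squares fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(ss_res / (n - 2) / sxx)
    return slope, std_err, slope / std_err


# Made-up study hours and quiz scores (5 points, so 3 degrees of freedom).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

slope, std_err, t = slope_t_statistic(hours, scores)
T_CRITICAL_DF3 = 3.182  # two-sided 5% critical value, 3 degrees of freedom
significant = abs(t) > T_CRITICAL_DF3
print(f"slope = {slope:.2f}, SE = {std_err:.3f}, t = {t:.2f}, significant: {significant}")
```

Here the slope is positive but |t| falls below the critical value, so with only five points the data do not rule out a true slope of zero at the 5% level.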

Question 5

What is the difference between regression and correlation?

Accepted Answer

Correlation measures the strength and direction of the linear relationship between X and Y. Its output is the Pearson coefficient r, which ranges from -1 to +1. Regression quantifies the relationship and gives a prediction equation. Its output is y = mx + b. Key differences: correlation is symmetric (r between X and Y equals r between Y and X). Regression is directional (predicting Y from X gives a different equation than predicting X from Y). Correlation says "these two variables move together." Regression says "for every one-unit increase in X, Y changes by m units." Use correlation to explore relationships, regression to make predictions. In simple linear regression, R-squared equals the square of the Pearson correlation coefficient (r²). With more than one predictor this identity no longer applies to any single pairwise r.
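The symmetric vs directional distinction can be checked numerically. In this sketch (made-up data, hypothetical function name), swapping X and Y leaves r unchanged but produces a different slope, and the product of the two slopes equals r².

```python
import math


def correlation_and_slopes(xs, ys):
    """Return (r, slope of Y on X, slope of X on Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    return r, sxy / sxx, sxy / syy


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy, slope_y_on_x, slope_x_on_y = correlation_and_slopes(xs, ys)
r_yx, _, _ = correlation_and_slopes(ys, xs)

print(f"r(X, Y) = {r_xy:.4f}, r(Y, X) = {r_yx:.4f}")  # identical
print(f"slope predicting Y from X = {slope_y_on_x:.2f}")
print(f"slope predicting X from Y = {slope_x_on_y:.2f}")
print(f"r² = {r_xy ** 2:.2f}, product of slopes = {slope_y_on_x * slope_x_on_y:.2f}")
```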

Question 6

What are the assumptions of linear regression and why do they matter?

Accepted Answer

Five core assumptions underpin the usual interpretation of regression results. Linearity: the relationship between X and Y is genuinely linear. It is violated when the scatter plot shows a curve; fix by transforming variables (log, square root) or using polynomial regression. Independence: observations are independent of each other. It is often violated in time-series data (autocorrelation); check with the Durbin-Watson test. Homoscedasticity: residuals have constant variance across all X values. It is violated when residuals fan out, which is visible in residual vs fitted plots. Normality of residuals: residuals are approximately normally distributed. This matters most for small samples and is less critical with n above 30 (central limit theorem). No severe outliers: extreme values can disproportionately pull the regression line. Check with leverage and Cook's distance statistics. Violating an assumption does not necessarily invalidate a regression, but it means the results require careful interpretation and possibly model adjustments.
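As one concrete example of checking an assumption, the Durbin-Watson statistic can be computed directly from the residuals in their time order. It ranges from 0 to 4: values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The function name and the residual series below are illustrative assumptions, and a formal test would compare the statistic against tabulated bounds.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    diff_sq = sum((residuals[i] - residuals[i - 1]) ** 2
                  for i in range(1, len(residuals)))
    resid_sq = sum(e ** 2 for e in residuals)
    return diff_sq / resid_sq


# Residuals (observed minus predicted), listed in time order. All made up.
alternating = [1, -1, 1, -1]          # sign flips every step
drifting = [1, 1, 1, 1]               # same sign throughout
mixed = [-0.8, 0.6, 1.0, -0.6, -0.2]  # no obvious pattern

dw_alternating = durbin_watson(alternating)
dw_drifting = durbin_watson(drifting)
dw_mixed = durbin_watson(mixed)
print(dw_alternating, dw_drifting, round(dw_mixed, 2))
```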

Question 7

How many data points do you need for linear regression?

Accepted Answer

Absolute minimum: 3 data points. With only 2, the fitted line passes through both points exactly, so R² = 1 regardless of the data and tells you nothing. Practical minimum for meaningful results: 10 to 20 observations. With fewer than 10, regression coefficients are unreliable and confidence intervals are extremely wide. Recommended for reliable inference: n ≥ 30. At around this size the central limit theorem makes the sampling distribution of the slope and intercept approximately normal even when the residuals are not, so p-values and confidence intervals become more trustworthy. For multiple regression, the guideline is 10 to 20 observations per predictor variable, so a model with 5 predictors needs at least 50 to 100 observations. The more variable your data (high scatter around the line), the larger the sample needed to detect a statistically significant slope. R-squared tends to be inflated in small samples: an R² of 0.95 from 5 data points is far less meaningful than an R² of 0.70 from 200 points.
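The two-point case is easy to verify. In this sketch (hypothetical function name, arbitrary points), fitting a line to any two points with different X values leaves zero residual, so R² comes out as 1 no matter what the points are.

```python
def fit_and_r_squared(xs, ys):
    """Least squares fit with one predictor; returns (slope, intercept, R²)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Two arbitrary points: the line through them is y = 2x + 1.
slope, intercept, r2 = fit_and_r_squared([1, 3], [3, 7])
print(f"slope = {slope}, intercept = {intercept}, R² = {r2}")
```

A perfect R² here reflects the lack of data, not the quality of the model.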

R² Value	Interpretation	Fit Quality	Typical Fields
0.90 – 1.00	90-100% of variance explained	Excellent	Physics, engineering, controlled experiments
0.70 – 0.89	70-89% of variance explained	Good	Biology, chemistry, business analytics
0.50 – 0.69	50-69% of variance explained	Moderate	Economics, social sciences, marketing
0.30 – 0.49	30-49% of variance explained	Weak	Psychology, behavioral research
0.00 – 0.29	0-29% of variance explained	Poor	Consider non-linear models or additional variables

Field	Example Application	X Variable	Y Variable
Business	Sales forecasting	Advertising spend	Revenue
Economics	Demand modeling	Price	Quantity demanded
Science	Physical laws verification	Force applied	Acceleration
Medicine	Dosage response	Drug dosage	Patient response
Real Estate	Property valuation	Square footage	Sale price
Education	Performance prediction	Study hours	Test scores
