Linear Regression

Linear Regression is one of the simplest and most widely used algorithms in machine learning.
It is used to predict a continuous target variable based on one or more input features.

The key idea is to fit a linear equation to the observed data:

\[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon\]

Where:

  • $y$ is the target variable
  • $x_1, x_2, \dots, x_n$ are input features
  • $\beta_0$ is the intercept
  • $\beta_1, \dots, \beta_n$ are coefficients
  • $\epsilon$ is the error term
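Given these definitions, a prediction is just the intercept plus the dot product of the coefficients with the feature vector. A minimal sketch (the coefficient and feature values are illustrative, not from the text):

```python
import numpy as np

# Illustrative values: beta_0 is the intercept, betas holds beta_1, beta_2
beta_0 = 1.0
betas = np.array([2.0, -0.5])
x = np.array([3.0, 4.0])  # one observation with two features

# y_hat = beta_0 + beta_1 * x_1 + beta_2 * x_2
y_hat = beta_0 + betas @ x
print(y_hat)  # 1.0 + 2.0*3.0 - 0.5*4.0 = 5.0
```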

How It Works

  1. Fit a line that minimizes the difference between predicted and actual values.
     This is usually done by minimizing the Mean Squared Error (MSE):
\[\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2\]
  2. Estimate the coefficients, typically via Ordinary Least Squares (OLS).
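For OLS the estimate has a closed form, $\hat{\beta} = (X^T X)^{-1} X^T y$. A minimal NumPy sketch of both steps, reusing the small dataset from the example below (solving via `lstsq` rather than an explicit inverse, which is the numerically stable choice):

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Prepend a column of ones so the intercept beta_0 is estimated too
X_design = np.hstack([np.ones((X.shape[0], 1)), X])

# Solve the least-squares problem (equivalent to the normal equation)
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta)  # [intercept, slope] = [2.2, 0.6]

# MSE of the fitted line
y_hat = X_design @ beta
mse = np.mean((y - y_hat) ** 2)
print(mse)  # 0.48
```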

Assumptions

Linear Regression works best when the following assumptions hold:

  1. Linearity: Relationship between features and target is linear
  2. Independence: Observations are independent
  3. Homoscedasticity: Constant variance of errors
  4. Normality: Errors are normally distributed
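The error-related assumptions (homoscedasticity and normality) are usually checked by inspecting the residuals after fitting. A minimal sketch on synthetic data (the data-generating values here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 + 0.6 * X[:, 0] + rng.normal(0, 1.0, size=100)  # true noise std = 1.0

# Fit via least squares, then examine residuals
X_design = np.hstack([np.ones((100, 1)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
residuals = y - X_design @ beta

# With an intercept, OLS residuals have mean exactly 0 (up to rounding).
# Roughly constant spread across fitted values suggests homoscedasticity;
# a symmetric, bell-shaped residual histogram suggests normality.
print(residuals.mean())  # ~0
print(residuals.std())   # should be near the true noise std of 1.0
```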

Simple Example in Python

from sklearn.linear_model import LinearRegression
import numpy as np

# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Fit linear regression model
model = LinearRegression()
model.fit(X, y)

print("Coefficient:", model.coef_)
print("Intercept:", model.intercept_)