Linear Regression

Linear Regression is one of the simplest and most widely used algorithms in machine learning.
It is used to predict a continuous target variable based on one or more input features.

The key idea is to fit a linear equation to the observed data:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon $$

Where:

  • $y$ is the target variable
  • $x_1, x_2, \dots, x_n$ are input features
  • $\beta_0$ is the intercept
  • $\beta_1, \dots, \beta_n$ are coefficients
  • $\epsilon$ is the error term
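To make the equation concrete, here is a minimal sketch of computing a single prediction from it. The coefficient values and the two-feature observation below are hypothetical, chosen purely for illustration; the error term $\epsilon$ is omitted because it is unobserved.

```python
import numpy as np

# Hypothetical coefficients (not fitted from any data)
beta_0 = 1.0                    # intercept
beta = np.array([2.0, -0.5])    # coefficients for x_1, x_2

# One observation with two features
x = np.array([3.0, 4.0])

# Prediction from the linear equation (epsilon omitted)
y_hat = beta_0 + beta @ x
print(y_hat)  # 1.0 + 2.0*3.0 - 0.5*4.0 = 5.0
```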

How It Works

  1. Fit a line that minimizes the difference between predicted and actual values.
    This is usually done by minimizing the Mean Squared Error (MSE):

$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

  2. Estimate coefficients using methods like Ordinary Least Squares (OLS).
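The two steps above can be sketched directly with NumPy: OLS has a closed-form solution, $\hat{\beta} = (X^\top X)^{-1} X^\top y$, obtained by setting the gradient of the MSE to zero. The toy data below is illustrative.

```python
import numpy as np

# Toy data: one feature, five observations
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Prepend a column of ones so the intercept beta_0 is estimated too
X_design = np.hstack([np.ones((len(X), 1)), X])

# OLS closed form: solve (X^T X) beta = X^T y
beta = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)

# Evaluate the fit with the MSE defined above
y_hat = X_design @ beta
mse = np.mean((y - y_hat) ** 2)
print("Intercept, slope:", beta)  # [2.2, 0.6]
print("MSE:", mse)                # 0.48
```

Using `np.linalg.solve` instead of explicitly inverting $X^\top X$ is the standard, numerically safer way to apply the closed form.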

Assumptions

Linear Regression works best when the following assumptions hold:

  1. Linearity: Relationship between features and target is linear
  2. Independence: Observations are independent
  3. Homoscedasticity: Constant variance of errors
  4. Normality: Errors are normally distributed
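A quick way to probe these assumptions in practice is to inspect the residuals of a fitted line. The sketch below, on illustrative toy data, checks two informal signals: with an intercept, OLS residuals average to zero, and their spread should look similar across the feature range (homoscedasticity). This is a rough diagnostic, not a formal test.

```python
import numpy as np

# Illustrative data; fit a line and examine the residuals
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# With an intercept, OLS residuals sum to zero by construction
print("Mean residual:", residuals.mean())

# Homoscedasticity hint: spread should be similar for low vs. high x
low, high = residuals[: len(x) // 2], residuals[len(x) // 2 :]
print("Std (low x):", low.std(), "Std (high x):", high.std())
```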

Simple Example in Python

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Fit linear regression model
model = LinearRegression()
model.fit(X, y)

print("Coefficient:", model.coef_)
print("Intercept:", model.intercept_)
```