# Ordinary Least Squares (OLS) Derivation

## Understanding Ordinary Least Squares (OLS)

Ordinary Least Squares (OLS) is a statistical method for estimating the parameters of a linear regression model, and it is one of the most commonly used methods in data science and machine learning.

OLS chooses the parameter values that minimize the sum of squared errors between the observed values and the values predicted by the model. The derivation takes the partial derivative of this sum with respect to each parameter, sets each derivative equal to zero, and solves the resulting equations for the parameters. Provided the independent variable is not the same for every observation, this yields a closed-form solution for simple linear regression.

In simple linear regression, the model is a straight line, ƒ(x) = α + βx, that relates an independent variable x to the predicted value of a response variable y. The term “ordinary least squares” comes from the fact that the method minimizes the sum of squared errors, where each error is an observation minus its prediction.

## Ordinary Least Squares (OLS) Derivation

Consider a linear regression model with one independent variable:

$$Y_i = \alpha + \beta X_i + \varepsilon_i$$

where Yi is the response variable, Xi is the independent variable, α and β are the unknown coefficients, and εi is the error term. The estimates of the coefficients are written α̂ (alpha hat) and β̂ (beta hat).

To estimate α̂ and β̂ using the OLS method, we want to minimize the sum of squared errors (SSE) between the predicted values Ŷi and the actual values Yi:

$$\mathrm{SSE} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

where n is the number of observations.

The predicted values Ŷi are given by:

Ŷi = α̂ + β̂Xi
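To make the objective concrete, here is a minimal sketch that evaluates the SSE for two candidate lines. The data values and the function name `sse` are made up for illustration and do not come from the text.

```python
import numpy as np

def sse(alpha_hat, beta_hat, x, y):
    """Sum of squared errors for the line y_hat = alpha_hat + beta_hat * x."""
    y_hat = alpha_hat + beta_hat * x      # predicted values
    residuals = y - y_hat                 # observed minus predicted
    return float(np.sum(residuals ** 2))

# Made-up illustrative data (an assumption, not from the text).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

sse_a = sse(1.0, 1.0, x, y)   # candidate line: y_hat = 1 + x
sse_b = sse(2.2, 0.6, x, y)   # candidate line: y_hat = 2.2 + 0.6x
print(sse_a, sse_b)
```

On this data the second candidate has the smaller SSE, so by the least squares criterion it is the better fit of the two.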

To find the values of α̂ and β̂ that minimize SSE, we take the partial derivatives of SSE with respect to α̂ and β̂ and set them equal to zero:

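$$
\begin{aligned}
\frac{\partial\,\mathrm{SSE}}{\partial \hat{\alpha}} &= -2 \sum_{i=1}^{n} (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \\
\frac{\partial\,\mathrm{SSE}}{\partial \hat{\beta}} &= -2 \sum_{i=1}^{n} X_i (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0
\end{aligned}
$$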
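One way to sanity-check the two partial derivatives is to compare them with a numerical approximation. The sketch below does this with central finite differences on made-up data; the function names, the step size `h`, and the evaluation point are all assumptions chosen for illustration.

```python
import numpy as np

def sse(alpha_hat, beta_hat, x, y):
    """Sum of squared errors for the line alpha_hat + beta_hat * x."""
    return float(np.sum((y - alpha_hat - beta_hat * x) ** 2))

def sse_gradient(alpha_hat, beta_hat, x, y):
    """Analytic partial derivatives of SSE w.r.t. alpha_hat and beta_hat."""
    residuals = y - alpha_hat - beta_hat * x
    d_alpha = -2.0 * np.sum(residuals)
    d_beta = -2.0 * np.sum(x * residuals)
    return float(d_alpha), float(d_beta)

# Made-up illustrative data (an assumption, not from the text).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

a0, b0 = 1.0, 1.0   # arbitrary point at which to compare the derivatives
h = 1e-6            # finite-difference step

# Central differences approximate each partial derivative numerically.
num_d_alpha = (sse(a0 + h, b0, x, y) - sse(a0 - h, b0, x, y)) / (2 * h)
num_d_beta = (sse(a0, b0 + h, x, y) - sse(a0, b0 - h, x, y)) / (2 * h)

grad = sse_gradient(a0, b0, x, y)
print(grad, (num_d_alpha, num_d_beta))

# At the least squares solution for this data both derivatives vanish.
grad_at_min = sse_gradient(2.2, 0.6, x, y)
print(grad_at_min)
```

The analytic and numerical values should agree up to rounding error, and both derivatives should be essentially zero at the minimizing pair.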

Solving these equations simultaneously gives the OLS estimators. Dividing the first equation by –2n yields the intercept in terms of the slope:

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$

Substituting this expression into the second equation and solving for β̂ gives:

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

where X̄ and Ȳ are the sample means of X and Y, respectively, and n is the number of observations. The slope is defined only when the Xi are not all equal, since otherwise the denominator is zero.
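The closed-form estimators translate directly into a few lines of NumPy. This is a minimal sketch on made-up data; the helper name `ols_simple` is hypothetical, and `np.polyfit` is used only as an independent cross-check.

```python
import numpy as np

def ols_simple(x, y):
    """Closed-form OLS estimates (alpha_hat, beta_hat) for one regressor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)              # sum of squared deviations of X
    if s_xx == 0:
        raise ValueError("All X values are identical; the slope is undefined.")
    s_xy = np.sum((x - x_bar) * (y - y_bar))     # sum of cross-deviations
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return float(alpha_hat), float(beta_hat)

# Made-up illustrative data (an assumption, not from the text).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

alpha_hat, beta_hat = ols_simple(x, y)

# Cross-check: a degree-1 polynomial fit returns [slope, intercept].
slope_ref, intercept_ref = np.polyfit(x, y, deg=1)
print(alpha_hat, beta_hat, intercept_ref, slope_ref)
```

For this data the sample means are 3 and 4, the cross-deviation sum is 6, and the squared-deviation sum is 10, so the slope is 0.6 and the intercept is 4 – 0.6 × 3 = 2.2.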

Once we have estimated α̂ and β̂, we can use them to make predictions for new values of X by plugging them into the equation:

Ŷ = α̂ + β̂X
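As a small illustration of the prediction step, the sketch below plugs new X values into the fitted line. The estimates 2.2 and 0.6 and the new X values are hypothetical numbers chosen for the example.

```python
import numpy as np

# Hypothetical estimates, assumed to come from an earlier OLS fit.
alpha_hat, beta_hat = 2.2, 0.6

def predict(x_new, alpha_hat, beta_hat):
    """Predicted responses for new X values under the fitted line."""
    return alpha_hat + beta_hat * np.asarray(x_new, dtype=float)

y_new = predict([6.0, 7.5], alpha_hat, beta_hat)
print(y_new)
```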

### Properties of Least Squares Estimators under the Gauss-Markov Theorem

The Gauss-Markov theorem provides a set of conditions under which the Ordinary Least Squares (OLS) estimator is the Best Linear Unbiased Estimator (BLUE). The properties of the OLS estimator can be summarized as follows:

1. Unbiasedness: The OLS estimator is unbiased, which means that its expected value is equal to the true population parameter. In other words, individual estimates may fall above or below the true value, but their average over repeated samples equals it.
2. Efficiency: Among all linear unbiased estimators, the OLS estimator has the smallest variance. This means that no other linear unbiased estimator is more precise. A biased or nonlinear estimator can still have a smaller mean squared error.
3. Consistency: As the sample size n approaches infinity, the OLS estimator converges in probability to the true population parameter, given suitable conditions on how the data grow. In other words, as the sample size becomes larger, the estimates concentrate more tightly around the true value.
4. Normality: If the error term is normally distributed, the OLS estimator is normally distributed as well. Without normal errors, it is approximately normal in large samples under additional conditions. This is useful for constructing confidence intervals and hypothesis tests for estimated parameters.
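Unbiasedness can be illustrated with a small simulation: draw many samples from a known model, estimate the coefficients on each, and average the estimates. The sketch below assumes a hypothetical true model (α = 1, β = 2), fixed X values, and normal errors with standard deviation 1.5. None of these values come from the text, and normal errors are used only for convenience.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the run is repeatable

# Assumed "true" model for the simulation (hypothetical values).
alpha_true, beta_true, sigma = 1.0, 2.0, 1.5
x = np.linspace(0.0, 10.0, 30)            # X values held fixed across samples
n_reps = 5000                             # number of simulated samples

x_bar = x.mean()
s_xx = np.sum((x - x_bar) ** 2)

# Each row of y is one simulated sample of len(x) observations.
errors = rng.normal(0.0, sigma, size=(n_reps, x.size))
y = alpha_true + beta_true * x + errors

# Closed-form OLS estimates, computed for every row at once.
y_bar = y.mean(axis=1, keepdims=True)
beta_hats = np.sum((x - x_bar) * (y - y_bar), axis=1) / s_xx
alpha_hats = y_bar[:, 0] - beta_hats * x_bar

mean_alpha_hat = alpha_hats.mean()
mean_beta_hat = beta_hats.mean()
var_beta_hat = beta_hats.var()
theory_var_beta = sigma ** 2 / s_xx       # textbook variance of the slope estimate

print(mean_alpha_hat, mean_beta_hat)
print(var_beta_hat, theory_var_beta)
```

Individual estimates scatter around the true values, while their averages should land close to 1 and 2. The empirical variance of the slope estimates should also be close to σ² divided by the sum of squared deviations of X.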

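The Gauss-Markov theorem guarantees the first two properties, unbiasedness and efficiency. Its conditions are:

1. Linearity: The regression model is linear in the parameters.
2. Strict exogeneity: The error term has a conditional mean of zero given the independent variables, which implies that it is uncorrelated with them.
3. No perfect multicollinearity: There is no perfect linear relationship among the independent variables.
4. Homoscedasticity and no autocorrelation: The error term has constant variance, and the errors of different observations are uncorrelated with each other.

Two further conditions are not part of the theorem but support the remaining properties:

5. Normality: The error term is normally distributed. This gives exact normality of the estimator in finite samples.
6. Sample size: The sample size is sufficiently large. This supports consistency and approximate normality when the errors are not normal.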

If the Gauss-Markov conditions are met, then the OLS estimator is the Best Linear Unbiased Estimator (BLUE), meaning it has the smallest variance among all linear unbiased estimators. In other words, within that class, no estimator of the population parameters in a linear regression model is more precise than OLS.
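The "best" in BLUE can be illustrated by comparing OLS with another linear unbiased estimator of the slope. The sketch below uses the slope of the line through the first and last observations as that competitor. The competitor and the simulation settings (true α = 1, β = 2, normal errors with standard deviation 1.5) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)            # fixed seed so the run is repeatable

# Assumed "true" model for the simulation (hypothetical values).
alpha_true, beta_true, sigma = 1.0, 2.0, 1.5
x = np.linspace(0.0, 10.0, 30)            # X values held fixed across samples
n_reps = 5000

errors = rng.normal(0.0, sigma, size=(n_reps, x.size))
y = alpha_true + beta_true * x + errors   # one simulated sample per row

# OLS slope for every simulated sample.
x_bar = x.mean()
s_xx = np.sum((x - x_bar) ** 2)
y_bar = y.mean(axis=1, keepdims=True)
ols_slopes = np.sum((x - x_bar) * (y - y_bar), axis=1) / s_xx

# Competitor: slope through the first and last points only.
# It is linear in Y and unbiased, but it ignores the other observations.
endpoint_slopes = (y[:, -1] - y[:, 0]) / (x[-1] - x[0])

var_ols = ols_slopes.var()
var_endpoint = endpoint_slopes.var()
print(ols_slopes.mean(), endpoint_slopes.mean())
print(var_ols, var_endpoint)
```

Both estimators should average close to the true slope, but the OLS estimates should show a clearly smaller variance, which is what the theorem predicts for any linear unbiased competitor.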