Gauss–Markov theorem

In statistics, the Gauss–Markov theorem, named after Carl Friedrich Gauss and Andrey Markov, states that in a linear regression model in which the errors have expectation zero, are uncorrelated, and have equal variances, the best linear unbiased estimator (BLUE) of the coefficients is given by the ordinary least squares (OLS) estimator, provided it exists. Here "best" means giving the lowest variance of the estimate among all linear unbiased estimators. The errors do not need to be normal, nor do they need to be independent and identically distributed; they only need to be uncorrelated with mean zero and homoscedastic with finite variance. The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator (which also drops linearity) or ridge regression.
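To see the comparison the theorem makes, here is a minimal Monte Carlo sketch in Python, assuming the simplest case of one regressor and no constant, $y_i = \beta x_i + \varepsilon_i$. It compares OLS, $\hat\beta = \sum x_i y_i / \sum x_i^2$, with a second linear unbiased estimator, $\sum y_i / \sum x_i$, and with a one-regressor ridge estimator, $\sum x_i y_i / (\sum x_i^2 + \lambda)$, which is linear but biased. The design points, $\beta = 2$, $\lambda = 20$, the uniform error distribution, and all function names are illustrative assumptions, not taken from the article.

```python
import random
import statistics

def ols_slope(x, y):
    # OLS for y_i = beta * x_i + eps_i (no constant): sum(x_i * y_i) / sum(x_i ** 2)
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def ratio_slope(x, y):
    # Hypothetical competitor: sum(y_i) / sum(x_i). It is linear in y and unbiased,
    # because E[sum(y_i)] = beta * sum(x_i) when every E[eps_i] = 0.
    return sum(y) / sum(x)

def ridge_slope(x, y, lam=20.0):
    # One-regressor ridge: sum(x_i * y_i) / (sum(x_i ** 2) + lam). Linear but biased.
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)

def simulate(x, beta=2.0, half_width=1.0, trials=20000, seed=0):
    # Errors are i.i.d. uniform on [-half_width, half_width]: zero mean,
    # equal variance, uncorrelated, and deliberately non-normal.
    rng = random.Random(seed)
    draws = {"ols": [], "ratio": [], "ridge": []}
    for _ in range(trials):
        y = [beta * xi + rng.uniform(-half_width, half_width) for xi in x]
        draws["ols"].append(ols_slope(x, y))
        draws["ratio"].append(ratio_slope(x, y))
        draws["ridge"].append(ridge_slope(x, y))
    return draws

x = [1.0, 2.0, 3.0, 4.0, 10.0]
draws = simulate(x)
for name, est in draws.items():
    # Print the sample mean and sample variance of each estimator
    print(name, round(statistics.mean(est), 4), round(statistics.variance(est), 5))
```

For these inputs the theoretical values, with $\sigma^2 = 1/3$ for errors uniform on $[-1, 1]$, are a variance of $\sigma^2/\sum x_i^2 \approx 0.00256$ for OLS, $n\sigma^2/(\sum x_i)^2 \approx 0.00417$ for the ratio estimator, and $\sigma^2\sum x_i^2/(\sum x_i^2+\lambda)^2 \approx 0.00193$ for the ridge estimator, whose mean is $2 \cdot 130/150 \approx 1.733$ rather than 2. The simulated figures should land near these, not exactly on them. OLS beats the other unbiased estimator (by the Cauchy–Schwarz inequality, $(\sum x_i)^2 \le n\sum x_i^2$), while the biased ridge estimator beats OLS on variance, which is why unbiasedness cannot be dropped from the theorem.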

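Suppose we have, in matrix notation,

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \mathbf{y}, \boldsymbol{\varepsilon} \in \mathbb{R}^{n},\ \boldsymbol{\beta} \in \mathbb{R}^{K},\ X \in \mathbb{R}^{n \times K},$$

expanding to

$$y_{i} = \sum_{j=1}^{K} \beta_{j} X_{ij} + \varepsilon_{i} \qquad \forall\, i = 1, 2, \ldots, n,$$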

where $\beta _{j}$ are non-random but unobservable parameters, $X_{ij}$ are non-random and observable (called the "explanatory variables"), $\varepsilon _{i}$ are random, and so $y_{i}$ are random. The random variables $\varepsilon _{i}$ are called the "disturbance", "noise" or simply "error" (to be contrasted with "residual" later in the article; see errors and residuals in statistics). Note that to include a constant in the model above, one can introduce the constant as a variable $\beta _{K+1}$ with a newly introduced last column of $X$ being unity, i.e., $X_{i(K+1)}=1$ for all $i$.
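As a minimal sketch of the model with a constant handled this way, the Python below appends a column of ones to a hypothetical design matrix, generates each $y_i$ from the expanded model, and computes the OLS estimate by solving the normal equations $X^{\mathsf T}X\hat{\boldsymbol\beta} = X^{\mathsf T}\mathbf{y}$. The helper names, the chosen coefficients, and the coin-flip errors $\varepsilon_i = \pm\sigma$ (zero mean, equal variance, independent, but not normal) are illustrative assumptions, not part of the article. The last fitted coefficient is the estimate of the constant $\beta_{K+1}$.

```python
import random

def add_intercept(X):
    # Append a last column of ones: X[i][K] == 1 plays the role of X_{i(K+1)} = 1.
    return [list(row) + [1.0] for row in X]

def simulate(X, beta, sigma, rng):
    # Expanded model: y_i = sum_j beta_j * X_ij + eps_i.
    # Hypothetical errors: eps_i is +sigma or -sigma with equal probability,
    # so mean zero, variance sigma**2, independent across i, and not normal.
    return [sum(b * xij for b, xij in zip(beta, row)) + rng.choice((-sigma, sigma))
            for row in X]

def solve(A, b):
    # Solve the square system A z = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z

def ols(X, y):
    # OLS from the normal equations (X^T X) beta_hat = X^T y; assumes full column rank.
    K = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(K)] for a in range(K)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(K)]
    return solve(XtX, Xty)

rng = random.Random(1)
X_raw = [[rng.uniform(0, 10), rng.uniform(-5, 5)] for _ in range(1000)]  # n = 1000, K = 2
X = add_intercept(X_raw)        # n x (K + 1); last column is all ones
beta_true = [2.0, -0.5, 3.0]    # last entry is the constant beta_{K+1}
y = simulate(X, beta_true, sigma=1.0, rng=rng)
beta_hat = ols(X, y)
print([round(b, 2) for b in beta_hat])  # close to beta_true, not exactly equal
```

With the noise switched off (`sigma=0.0`) the fit recovers `beta_true` up to rounding error, which is a quick check that the column of ones really does carry the constant. Forming $X^{\mathsf T}X$ explicitly is fine for a small illustration, but QR- or SVD-based least-squares solvers are generally preferred numerically when the columns of $X$ are nearly collinear.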

...
Wikipedia