Define a general linear estimator $\tilde\beta$ of the true $\beta$,
\begin{align*}\tilde\beta &= Ay\tag{1a}\end{align*}
Given this generalisation, employing the regressor matrix $X$, we can summarise $\widehat\beta$ as a special case of $\tilde\beta$ with
\begin{align*}
A &= [\, C+ (X'X)^{-1} X'\,]\tag{1b}\\
\tilde\beta &= [\, C+ (X'X)^{-1} X'\,]\, y \tag{1c}
\end{align*}
such that with $C=0$, $\tilde\beta = \widehat\beta$ and $A = (X'X)^{-1}X'$. In terms of the error,
\begin{align*}
\widehat{\beta} &= \beta + [\,C+ (X'X)^{-1}X'\,]u \quad\\
\Rightarrow\widehat{\beta} &= \beta + Au. \quad(C=0)\end{align*}
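These identities can be checked numerically. A minimal NumPy sketch with simulated data (all quantities hypothetical), confirming that $\widehat\beta = Ay$ with $A=(X'X)^{-1}X'$ matches the least-squares solution and the error form $\widehat\beta = \beta + Au$:

```python
import numpy as np

# Hypothetical simulated data for the linear model y = X beta + u.
rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0, 0.5])
u = rng.normal(size=n)
y = X @ beta + u

A = np.linalg.inv(X.T @ X) @ X.T       # A with C = 0
beta_hat = A @ y                       # beta_hat = A y
# Agrees with the standard least-squares solver ...
assert np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0])
# ... and with the error form beta_hat = beta + A u, since A X = I.
assert np.allclose(beta_hat, beta + A @ u)
```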
All this implies that, with $C=0$, the variance is
\begin{align*}
\text{var}( \tilde\beta )\\
&= \mathsf{E} \bigl[ ( \tilde{\beta} - \beta)( \tilde{\beta} - \beta)' \bigr] = \mathsf{E} \bigl[ ( \widehat{\beta} - \beta)( \widehat{\beta} - \beta)' \bigr]\\
&= \mathsf{E} \bigl[ (X'X)^{-1} X' u \; \bigl((X'X)^{-1}X'u\bigr)' \bigr] \\
&= \mathsf{E} \bigl[ A u \,(Au)'\bigr] = \mathsf{E} \bigl[ A u\, u' A' \bigr]\\
&= \sigma^2 A A' \qquad \bigl(\text{using } \mathsf{E}[uu'] = \sigma^2 I\bigr)
\end{align*}
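With $C=0$ the product $AA'$ collapses to $(X'X)^{-1}$, so $\text{var}(\widehat\beta)=\sigma^2(X'X)^{-1}$. A quick numerical check of that collapse, on a hypothetical $X$:

```python
import numpy as np

# Hypothetical regressor matrix.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
A = np.linalg.inv(X.T @ X) @ X.T       # A with C = 0
# A A' = (X'X)^{-1} X' X (X'X)^{-1} = (X'X)^{-1}
assert np.allclose(A @ A.T, np.linalg.inv(X.T @ X))
```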
Now return to the general case of $\tilde\beta$ [alt. Greene notation], with
\begin{equation*}
A = C+(X'X)^{-1}X'\tag{2}
\end{equation*}
such that
\begin{equation*}
\tilde\beta = Cy+(X'X)^{-1}X'y\tag{3}
\end{equation*}
and
\begin{equation*}
\tilde\beta - \widehat\beta = Cy.\tag{4}
\end{equation*}
[Greene writes (2) as $C = D + \dots$ and (4) as $Dy = b_0 - b$.]
Given that $y = X\beta + u$, from (1a) it follows that
$$\tilde\beta = AX \beta + Au$$
For a stochastic $X$, zero bias requires
$$\mathsf{E}[\,Ay \mid X\,] = \mathsf{E}[\,(AX\beta + Au) \mid X\,] = \beta.$$
The linear regression assumptions tell us
$$\mathsf{E}(u \mid X) = 0 \;\Rightarrow\; \mathsf{E}(AX\beta \mid X) = AX\beta = \beta.$$
But this can hold for all $\beta$ if and only if
\begin{equation*}
AX = I\quad (k \times k) \; \therefore \text{from (2), } CX = 0.
\end{equation*}
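The restriction $CX=0$ is easy to exhibit numerically. One hypothetical way to construct such a $C$ (not from the source; an illustrative device) is $C = DM$ for arbitrary $D$, where $M = I - X(X'X)^{-1}X'$ satisfies $MX=0$; any such $C$ then gives $AX=I$:

```python
import numpy as np

# Hypothetical X and D; M is the residual-maker matrix with M X = 0.
rng = np.random.default_rng(2)
n, k = 40, 3
X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
C = rng.normal(size=(k, n)) @ M        # C X = 0 by construction
A = C + np.linalg.inv(X.T @ X) @ X.T
assert np.allclose(C @ X, np.zeros((k, k)))
assert np.allclose(A @ X, np.eye(k))   # the unbiasedness condition A X = I
```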
Gauss Markov Proof – Part 3
(see preceding videos for proof parts 1 & 2)
Similarly, if we make use of the common assumption that $X$ is non-stochastic and define $C$ as an arbitrary non-stochastic $k \times n$ matrix, then we have an alternative route to this strong restriction on the matrix $C$. Substituting for $y$ in (3),
\begin{align*}
\tilde\beta &= CX\beta + (X'X)^{-1}X'X\beta + \bigl(C+(X'X)^{-1}X'\bigr)u\\
&= CX\beta +\beta + \bigl(C+(X'X)^{-1}X'\bigr)u
\end{align*}
Taking expectations of both sides, and again remembering the key underlying assumption of the LR model,
$$\mathsf{E}(u) = 0, \quad \text{giving} \quad \mathsf{E}(\tilde\beta) = CX\beta +\beta.$$
To ensure zero bias, $\mathsf{E}(\tilde\beta)$ must equal $\beta$; therefore $CX = 0$ (shown for both stochastic and non-stochastic $X$). Revisiting $\text{var}(\tilde\beta)$ for the case where $C\neq0$, we simply substitute the entire expression for $A$ (page 12-1):
\begin{align*}
\text{var}( \tilde\beta )
&=\sigma^2 \bigl( C+ (X'X)^{-1} X'\bigr)\bigl( C+ (X'X)^{-1} X'\bigr)'\\
&=\sigma^2 \bigl( C+ (X'X)^{-1} X'\bigr)\bigl( C'+ [(X'X)^{-1} X']'\bigr)\\
&=\sigma^2 \Bigl( CC'+ (X'X)^{-1} X' C'+ C\,[(X'X)^{-1} X']' + \underbrace{(X'X)^{-1} X' X}_{=\,I}\,[(X'X)^{-1}]'\Bigr)\\
&=\sigma^2 \Bigl( CC'+ \underbrace{(X'X)^{-1} (CX)'+ CX\,[(X'X)^{-1}]'}_{CX\,=\,0:\;\,0\,+\,0\,=\,0} + [(X'X)^{-1}]'\Bigr)
\end{align*}
$X'X$ is a symmetric matrix; hence, using $(A')^{-1} = (A^{-1})'$ and $(AB)' = B'A'$,
\begin{align*}
[(X'X)^{-1}]' &= [(X'X)']^{-1} = (X'X)^{-1}\\
\therefore \qquad \text{var}( \tilde\beta ) &= \sigma^2 CC'+ \sigma^2 (X'X)^{-1}
\end{align*}
QED
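This variance result can be verified numerically: for any $C$ with $CX=0$ (built here via the annihilator matrix, a hypothetical construction for illustration), $\sigma^2 AA'$ equals $\sigma^2(CC' + (X'X)^{-1})$, and the excess $\sigma^2 CC'$ is positive semi-definite, which is the efficiency claim:

```python
import numpy as np

# Hypothetical inputs: regressors X, error variance sigma^2, arbitrary D.
rng = np.random.default_rng(3)
n, k, sigma2 = 40, 3, 2.0
X = rng.normal(size=(n, k))
XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T      # annihilator: M X = 0
C = rng.normal(size=(k, n)) @ M        # hence C X = 0
A = C + XtX_inv @ X.T
var_tilde = sigma2 * A @ A.T           # var(beta_tilde) = sigma^2 A A'
assert np.allclose(var_tilde, sigma2 * (C @ C.T + XtX_inv))
# var(beta_tilde) - var(beta_hat) = sigma^2 C C' is positive semi-definite.
assert np.all(np.linalg.eigvalsh(var_tilde - sigma2 * XtX_inv) >= -1e-10)
```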
Now since $y = X\beta + u$,
$$Cy = CX\beta + Cu$$
and with $CX = 0$,
$$Cy = Cu.$$
Returning to (4): $\tilde\beta$ and $\widehat\beta$ must each have expectation equal to the true value $\beta$ in order to be unbiased. This implies that the conditional mean of $\tilde\beta - \widehat\beta$ is zero. We see this again by noting that, from (4),
\begin{equation*} \tilde\beta - \widehat\beta = Cu \end{equation*}
and by recalling that $\mathsf{E}(u \mid X) = 0.$
We confirm that the covariance matrix of $\widehat\beta$ and $\tilde\beta - \widehat\beta$ is zero:
\begin{align*}
\mathsf{E} \bigl[ ( \widehat{\beta} - \beta)( \tilde{\beta} - \widehat\beta)' \bigr]
=& \;\mathsf{E} \bigl[ (X'X)^{-1} X' u \; u'C' \bigr] \\
=& \;(X'X)^{-1} X' \,\sigma^2 I\, C'\\
=& \;\sigma^2 (X'X)^{-1} (CX)'\\
=& \;0 \quad (CX=0)
\end{align*}
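The vanishing cross term can also be seen numerically: $(X'X)^{-1}X'(\sigma^2 I)C' = \sigma^2 (X'X)^{-1}(CX)'$, which is zero whenever $CX=0$. A sketch with hypothetical inputs:

```python
import numpy as np

# Hypothetical X and C with C X = 0, built via the annihilator matrix.
rng = np.random.default_rng(4)
n, k, sigma2 = 40, 3, 2.0
X = rng.normal(size=(n, k))
XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T
C = rng.normal(size=(k, n)) @ M        # C X = 0 by construction
# Cross-covariance of beta_hat and (beta_tilde - beta_hat):
cov = XtX_inv @ X.T @ (sigma2 * np.eye(n)) @ C.T
assert np.allclose(cov, np.zeros((k, k)))
```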
Implication: $\tilde\beta = \widehat\beta$ plus a random component $Cy$, which has zero mean and is uncorrelated with $\widehat\beta$. The random component simply adds noise to the efficient estimator $\widehat\beta$. Therefore,
\begin{align*}
\text{var}( \tilde\beta ) =&\; \text{var}\bigl[ \widehat{\beta} + (\tilde\beta - \widehat\beta) \bigr]\\
=&\; \text{var}( \widehat{\beta} + Cy)\\
=&\; \text{var}( \widehat{\beta}) + \text{var}(Cy) + 0
\end{align*}