Portfolio Theory

Skip to:
1. How Should a Risk Manager Choose the Portfolio Weights?
2. Riskiness of Asset Returns
3. Riskiness of Portfolio Returns
4. The Basis for Portfolio Analysis
5. Constrained Optimisation
6. Solving the Constrained Problem Using the Lagrange Method
7. The General Case Lagrangian Solution
8. Calculate the Portfolio Variance $\sigma^2_p$

Consider $n$ risky assets, $\{ S_1, S_2, \cdots, S_n \}$ with prices known today, time $t$. Our motivation is to build a portfolio with time horizon:
$$t \rightarrow t + \tau$$
For each asset we define the return
$$r_i = \dfrac{S_i(t+\tau) - S_i (t)}{S_i(t)}$$
We tend to work with returns rather than dollars. Note that
$$S_i(t+\tau) = S_i (t)(1+ r_i) $$
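As a quick numerical illustration (prices are made-up placeholder values), the return definition and the relation $S_i(t+\tau) = S_i(t)(1+r_i)$ can be checked in a few lines of Python:

```python
# Simple (arithmetic) return over one period, as defined above.
def simple_return(price_now, price_later):
    """r_i = (S_i(t + tau) - S_i(t)) / S_i(t)."""
    return (price_later - price_now) / price_now

prices_t = [100.0, 50.0, 20.0]       # S_i(t), illustrative
prices_t_tau = [110.0, 48.0, 21.0]   # S_i(t + tau), illustrative

returns = [simple_return(s0, s1) for s0, s1 in zip(prices_t, prices_t_tau)]
print(returns)  # [0.1, -0.04, 0.05]

# Sanity check: S_i(t + tau) = S_i(t) * (1 + r_i)
assert all(abs(s0 * (1 + r) - s1) < 1e-12
           for s0, s1, r in zip(prices_t, prices_t_tau, returns))
```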

We deploy a strategy to buy or short sell $\alpha_i$ shares in each asset $i$. Our portfolio value at time $t$ will then be given by
$$ \Pi (t) = \sum_{i = 1}^n \alpha_i S_i (t) $$
and at time $t+\tau$, the portfolio value will be given by
$$ \Pi (t+\tau) = \sum_{i = 1}^n \alpha_i S_i (t)(1+ r_i) $$
The portfolio profit is therefore
$$\Pi (t +\tau) - \Pi (t) = \sum_{i = 1}^n \alpha_i S_i (t)((1+ r_i)-1) $$
$$ = \sum_{i = 1}^n \alpha_i S_i (t) r_i $$
and the portfolio return is
$$r_p = \dfrac{\Pi (t+ \tau) - \Pi (t)}{\Pi (t)} = \dfrac{\sum_{i = 1}^n \alpha_i S_i (t) r_i}{\Pi (t)} $$

where $r_i$, the return on asset $i$, is a random variable. The portfolio weight $w_i$ of asset $i$ is defined as
$$w_i = \dfrac{\alpha_i S_i (t)}{\Pi (t)}, \hspace{0.35in} i = 1, \cdots, n $$which is the ratio of the amount invested in asset $i$ to the total investment.

By definition, the portfolio weights $(w_i)_{i=1}^n$ sum to one
$$\sum_{i = 1}^n w_i = \dfrac{\sum_{i = 1}^n \alpha_i S_i (t) }{\Pi (t)} = \dfrac{\Pi (t) }{\Pi (t)} = 1 $$
so any set of weights $w_1, \cdots, w_n$ satisfying this relation is a feasible set of portfolio weights.
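A small numerical sketch (illustrative holdings and prices, with a short position included) confirms that the weights sum to one and that the weighted sum of asset returns reproduces the portfolio return:

```python
# Portfolio weights w_i = alpha_i * S_i(t) / Pi(t); they sum to one.
# Holdings, prices and returns are illustrative (negative alpha_i = short).
holdings = [30.0, -10.0, 50.0]       # alpha_i
prices_t = [100.0, 50.0, 20.0]       # S_i(t)
returns = [0.10, -0.04, 0.05]        # r_i over the horizon

value_t = sum(a * s for a, s in zip(holdings, prices_t))          # Pi(t)
weights = [a * s / value_t for a, s in zip(holdings, prices_t)]   # w_i
assert abs(sum(weights) - 1.0) < 1e-12

# The weighted sum of asset returns reproduces the portfolio return
value_t_tau = sum(a * s * (1 + r)
                  for a, s, r in zip(holdings, prices_t, returns))  # Pi(t+tau)
r_p = (value_t_tau - value_t) / value_t
assert abs(r_p - sum(w * r for w, r in zip(weights, returns))) < 1e-12
```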

1. How Should a Risk Manager Choose the Portfolio Weights?

Given a target return $r_p$ for a portfolio, how do we choose the weights of the constituent assets such that
$$r_p = \sum_{i = 1}^n w_i r_i$$
It might be useful to revisit some probability theory at this point.

2. Riskiness of Asset Returns

From the simple relation above it is clear that a portfolio manager will want to know how the returns $r_i$ vary about their mean values, say $\mu_i$. For each of our returns $\{r_1, r_2, \cdots, r_n\}$ we write
$$\mathbb{E} \bigl[ r_i \bigr] = \mu_i \quad i = 1, \cdots, n $$so that $\mu_i$ measures the potential reward, the expected return on asset $i$. Of interest is how much the asset return $r_i$ deviates from the mean return $\mu_i$
$$r_i - \mu_i$$which is

  • large if $r_i$ fluctuates wildly around $\mu_i$
  • small if $r_i$ sticks closely to $\mu_i$

As deviations can be negative (a loss), the expectation (mean) of the squared deviation from the mean is more informative. This leads to the variance of asset returns
$$\sigma_i^2 = \mathbb{E} \bigl[ (r_i - \mu_i)^2 \bigr]$$
where the square root of the variance, $\sigma_i$, is the volatility of the asset's returns. The volatility and variance quantify the riskiness of asset $i$. In order to capture the influence assets within a portfolio have on each other, we measure the covariance of returns between pairs of assets $i$ and $j$
$$\sigma_{ij} = \mathbb{E} \bigl[ (r_i - \mu_i)(r_j - \mu_j) \bigr]$$
Covariance may be conveniently scaled by the volatilities of the pair of assets. This scaling defines the correlation
$$\rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_{i}\sigma_{j}}$$It can be shown that $-1 \leq \rho_{ij} \leq 1 $, interpreted as follows

  • if $\rho_{ij} = 0$, the assets are uncorrelated
  • if $\rho_{ij} = 1$, the assets are perfectly positively correlated and move in the same direction
  • if $\rho_{ij} = -1$, the assets are perfectly negatively correlated and move in opposing directions
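These quantities are straightforward to estimate from data. The sketch below computes means, variances, covariance and correlation (population-style formulas) for two short, made-up return series in plain Python:

```python
# Illustrative return series for two assets (made-up numbers)
r1 = [0.02, -0.01, 0.03, 0.00]
r2 = [0.01, -0.02, 0.02, 0.01]

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # population covariance: E[(x - mu_x)(y - mu_y)]
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

mu1, mu2 = mean(r1), mean(r2)            # expected returns mu_i
var1, var2 = cov(r1, r1), cov(r2, r2)    # variances sigma_i^2
sigma12 = cov(r1, r2)                    # covariance sigma_12
rho12 = sigma12 / (var1 ** 0.5 * var2 ** 0.5)

assert -1.0 <= rho12 <= 1.0              # correlation always lies in [-1, 1]
```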

3. Riskiness of Portfolio Returns

Now that we have established a means of measuring the riskiness of asset returns, we can look more closely at the potential reward of a portfolio, the expected portfolio returns. Once again the portfolio return
$$r_p = \sum_{i = 1}^n w_i r_i = w_1 r_1 + w_2 r_2 + \cdots + w_n r_n$$ and the potential reward, the expectation of the portfolio return, is
\begin{align}\mu_p = \mathbb{E} \bigl[r_p \bigr] &= \mathbb{E} \bigl[ w_1 r_1 + w_2 r_2 + \cdots + w_n r_n \bigr]\\
&= w_1 \mathbb{E} \bigl[ r_1 \bigr] + w_2 \mathbb{E} \bigl[ r_2 \bigr] + \cdots + w_n \mathbb{E} \bigl[ r_n \bigr]\\
&=\sum_{i = 1}^n w_i \mu_i
\end{align}We can express this using vectors. Define the vector of returns, the vector of expected asset returns and weight vector respectively
$$\mathbf{r} = \left[\begin{array}{c}r_1 \\r_2 \\ \vdots \\r_n\end{array}\right], \quad \mathbf{e} = \mathbb{E} \bigl[ \mathbf{r} \bigr] = \left[\begin{array}{c}\mu_1 \\\mu_2 \\ \vdots \\\mu_n\end{array}\right]\quad \text{and }\quad \mathbf{w} = \left[\begin{array}{c}w_1 \\w_2 \\ \vdots \\w_n\end{array}\right]$$from linear algebra we know that the inner product of two vectors $\mathbf{w}$ and $\mathbf{e}$ is
$$\mathbf{w}^T\mathbf{e} = \sum_{i = 1}^n w_i \mu_i = \mu_p $$ the expected portfolio return. Similarly, the inner product of $\mathbf{w}$ and $\mathbf{r}$ is
$$\mathbf{w}^T\mathbf{r} = \sum_{i = 1}^n w_i r_i = r_p $$ the return on the portfolio such that,
$$r_p - \mu_p = \mathbf{w}^T(\mathbf{r} - \mathbf{e})$$is the excess portfolio return. As we did with a single asset $i$ above, we can now quantify the riskiness of a portfolio $p$. As before, variance is the riskiness measure of interest, but this time we require the portfolio variance
$$\sigma_p^2 = \mathbb{E} \bigl[(r_p - \mu_p)^2 \bigr]$$The variance of a portfolio's returns arises from the assets in the portfolio. As in the case of a single asset $i$ there is the variance $\sigma_i^2$ of the asset. Considering a portfolio comprised of an additional asset $j$, the covariances $\sigma_{ij}$ and $\sigma_{ji}$ will need to be considered, as will the variance $\sigma_j^2$ of asset $j$. The covariance may be zero but we should not automatically assume so.
$$\sigma_{ij} = \mathbb{E} \bigl[ (r_i - \mu_i)(r_j - \mu_j) \bigr] = \sigma_{ji}$$
For an entire portfolio $p$
$$\sigma_p^2 = \mathbb{E} \bigl[(r_p - \mu_p)^2 \bigr] = \sum_{i=1}^n w_i^2 \sigma_i^2 + \sum_{i \neq j}w_i w_j \sigma_{ij} $$
or, factoring the weights out of the expectation,
$$\sigma_p^2=\mathbf{w}^T\mathbb{E} \bigl[(\mathbf{r} - \mathbf{e})(\mathbf{r} - \mathbf{e})^T \bigr]\mathbf{w}$$
where $(\mathbf{r} - \mathbf{e})(\mathbf{r} - \mathbf{e})^T$ is an $n \times n$ matrix whose $(i,j)^{\text{th}}$ element is
$$(r_i - \mu_i)(r_j - \mu_j)\qquad [1 \leq i,j \leq n ]$$We define the expectation of this matrix as $\mathbf{V}$, the covariance matrix holding all the covariance information between pairs of assets, such that the portfolio variance becomes
$$\sigma_p^2=\mathbf{w}^T \mathbf{V} \mathbf{w}$$
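The equivalence between the double-sum form and the quadratic form $\mathbf{w}^T \mathbf{V} \mathbf{w}$ is easy to verify numerically. The weights and covariance matrix below are illustrative assumptions:

```python
# Illustrative weights and symmetric covariance matrix (sigma_ij = sigma_ji)
w = [0.6, 0.4]
V = [[0.04, 0.006],
     [0.006, 0.09]]
n = len(w)

# Quadratic form: w^T V w
var_p = sum(w[i] * V[i][j] * w[j] for i in range(n) for j in range(n))

# Double-sum form: own-variance terms plus cross-covariance terms
own = sum(w[i] ** 2 * V[i][i] for i in range(n))
cross = sum(w[i] * w[j] * V[i][j]
            for i in range(n) for j in range(n) if i != j)
assert abs(var_p - (own + cross)) < 1e-15
```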

4. The Basis for Portfolio Analysis

To choose portfolio weights, we must remember that the weights add up to 1
$$ \sum_{i = 1}^n w_i = 1$$or algebraically,
$$\mathbf{w}^T \mathbf{1} = \sum_{i = 1}^n w_i \cdot 1 = 1$$
We say that such a $\mathbf{w}$ is a feasible vector. Using it as the portfolio weight vector, the potential reward is
$$\mu_p = \mathbf{w}^T\mathbf{e}$$
and the risk associated with this reward is
$$\sigma_p^2=\mathbf{w}^T \mathbf{V} \mathbf{w}$$
Clearly a portfolio manager desires to maximise $\mu_p$ and minimise $\sigma_p^2$ in order to optimise the portfolio of assets. Practitioners, including some regulators of public utilities, typically fix a desired level of expected return (for example $\mu_p = 8.4\%$). The manager must then seek a $\mathbf{w}$ that achieves the desired portfolio return $\mu_p$ while keeping the risk, measured by the variance $\sigma_p^2$, at a minimum. This $\mathbf{w}$ optimises the portfolio. Let us denote this sought-after feasible weight vector by $\mathbf{w^*}$; as we shall see, it is a stationary point of the optimisation problem.

From simple one-dimensional calculus we know that for a function $f(x)$ to have a minimum at $x^*$, its

  • First derivative with respect to $x$ must equal zero
  • Second derivative with respect to $x$ must be positive

when evaluated at $x^*$. In our vector context, the same applies for the minimum of $\sigma_p^2 \equiv f(\mathbf{w})$: the first derivative (gradient) vector
$$\nabla f(\mathbf{w^*}) \equiv \left[\begin{array}{c}\dfrac{\partial{f} (\mathbf{w^*})}{\partial{w_1}} \\ \dfrac{\partial{f} (\mathbf{w^*})}{\partial{w_2}} \\ \vdots \\ \dfrac{\partial{f} (\mathbf{w^*})}{\partial{w_n}}\end{array}\right] = \left[\begin{array}{c}0 \\ 0 \\ \vdots \\ 0 \end{array}\right]$$
and the Hessian, the $n \times n$ matrix of second derivatives,
$$\nabla^2f(\mathbf{w^*}) \equiv \left[\begin{array}{cccc}\dfrac{\partial^2{f} (\mathbf{w^*})}{\partial{w_1^2}} & \dfrac{\partial^2{f} (\mathbf{w^*})}{\partial{w_1}\partial{w_2}} & \cdots & \dfrac{\partial^2{f} (\mathbf{w^*})}{\partial{w_1}\partial{w_n}}\\ \dfrac{\partial^2{f} (\mathbf{w^*})}{\partial{w_2}\partial{w_1}} & \dfrac{\partial^2{f} (\mathbf{w^*})}{\partial{w_2^2}} & \cdots & \dfrac{\partial^2{f} (\mathbf{w^*})}{\partial{w_2}\partial{w_n}}\\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2{f} (\mathbf{w^*})}{\partial{w_n}\partial{w_1}} & \dfrac{\partial^2{f} (\mathbf{w^*})}{\partial{w_n}\partial{w_2}} & \cdots & \dfrac{\partial^2{f} (\mathbf{w^*})}{\partial{w_n^2}} \end{array}\right]$$
This leads to the question:

what is the matrix analogue of $ \dfrac{\partial^2{f} (\mathbf{w^*})}{\partial{w^2}} > 0$?

Answer:

A symmetric matrix $A$ is positive definite if $\mathbf{w}^T \mathbf{A} \mathbf{w} >0$ for all $\mathbf{w} \neq \mathbf{0}.$

We can conclude that $\mathbf{w^*}$ is a local minimum of $f(\mathbf{w})$ if $$\nabla f(\mathbf{w^*}) = \mathbf{0}$$ and $$\nabla^2f(\mathbf{w^*}) $$ is positive definite.
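Both conditions can be checked numerically. The sketch below (illustrative matrix, plain Python) verifies the gradient of $f(\mathbf{w}) = \mathbf{w}^T \mathbf{V} \mathbf{w}$ against a finite-difference estimate, and tests positive definiteness of a symmetric $2 \times 2$ matrix via Sylvester's criterion (all leading principal minors positive), which for symmetric matrices is equivalent to the definition above:

```python
# Illustrative symmetric covariance matrix V and weight vector w
V = [[0.04, 0.006],
     [0.006, 0.09]]
w = [0.6, 0.4]

def f(w):
    # f(w) = w^T V w
    return sum(w[i] * V[i][j] * w[j] for i in range(2) for j in range(2))

# Central finite differences reproduce the exact gradient 2 V w
h = 1e-6
grad_fd = []
for k in range(2):
    wp, wm = list(w), list(w)
    wp[k] += h
    wm[k] -= h
    grad_fd.append((f(wp) - f(wm)) / (2 * h))
grad_exact = [2 * sum(V[i][j] * w[j] for j in range(2)) for i in range(2)]
assert all(abs(a - b) < 1e-8 for a, b in zip(grad_fd, grad_exact))

# Sylvester's criterion for a symmetric 2x2 matrix: positive definite
# iff both leading principal minors are positive
def is_positive_definite_2x2(A):
    return A[0][0] > 0 and A[0][0] * A[1][1] - A[0][1] * A[1][0] > 0

assert is_positive_definite_2x2(V)
# Perfectly correlated assets (rho = 1) make V singular, hence not PD
assert not is_positive_definite_2x2([[0.04, 0.06], [0.06, 0.09]])
```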

5. Constrained Optimisation

The objective function of primary concern is
$$\sigma_p^2 \equiv f(\mathbf{w}) =\mathbf{w}^T \mathbf{V} \mathbf{w}$$
This is the vector analogue of the one-dimensional $f(w)=vw^2$, whose first derivative $f'(w)=2vw$ vanishes, giving a minimum, at $w=0$. Similarly, the unconstrained $f(\mathbf{w})$ is minimized when $\mathbf{w = 0}$. This leaves the question of whether the second derivative $\nabla^2f(\mathbf{w})=\mathbf{2V}$, the matrix analogue of $f''(w) = 2v$, is positive definite. Well recall that
$$\sigma_p^2 =\mathbf{w}^T \mathbf{V} \mathbf{w}= \mathbb{E} \bigl[(r_p – \mu_p)^2 \bigr]$$
The expectation of a square is always nonnegative, and is strictly positive provided $r_p$ is not identically equal to $\mu_p$. Hence, as long as $\mathbf{w \neq 0}$ (and no asset's return is an exact linear combination of the others'), $\mathbf{w}^T \mathbf{V} \mathbf{w} > 0$ and $\mathbf{V}$ is positive definite. The second derivative $\nabla^2f(\mathbf{w})=\mathbf{2V}$ is therefore positive definite.

We can explicitly state therefore, that

$\mathbf{w = 0}$ does not satisfy the requirement for a positive definite $\nabla^2f(\mathbf{w})$, the second derivative of the objective function; only $\mathbf{w} \neq \mathbf{0}$ does. The minimisation must therefore be carried out over $\mathbf{w} \neq \mathbf{0}$,

while being feasible,

$$\mathbf{w}^T \mathbf{1} = 1 \quad \Rightarrow \quad \mathbf{w}^T \mathbf{1} -1 = 0$$
(we say that $\mathbf{w}$ is a feasible vector)

and while attaining the desired return

$$\mu_p = \mathbf{w}^T\mathbf{e}\quad \Rightarrow \quad \mathbf{w}^T\mathbf{e} \;- \mu_p = 0 $$

We therefore have a constrained optimisation problem. This brings us to the mathematics of constrained optimisation using Lagrange multipliers.

6. Solving the Constrained Problem Using the Lagrange Method

The Lagrange method provides a way to solve constrained problems by adapting them into unconstrained problems. In our case we have two constraints, $g_1(\mathbf{w})$ and $g_2(\mathbf{w})$. We seek to minimize our original function
$$f(\mathbf{w}) =\mathbf{w}^T \mathbf{V} \mathbf{w}$$ subject to
$$g_1(\mathbf{w}): \quad \mathbf{w}^T \mathbf{1} -1 = 0$$
$$g_2(\mathbf{w}): \quad \mathbf{w}^T\mathbf{e} \;- \mu_p = 0 $$
With two constraints, we introduce two new parameters $\lambda_1$, $\lambda_2$, the Lagrange multipliers, into an adapted objective function, the Lagrangian, which is unconstrained. For algebraic convenience, the objective function to minimize is written as
$$f(\mathbf{w}) =\frac{1}{2}\mathbf{w}^T \mathbf{V} \mathbf{w}$$
The $\frac{1}{2}$ does not affect the solution, as $f(\mathbf{w})$ is minimised by the same $\mathbf{w}$ either way, but it yields neater first and second derivatives and hence much neater first order Lagrangian conditions and equations, as we shall see.

We therefore minimize the unconstrained Lagrangian function
$$\mathcal{L} (\mathbf{w},\lambda_1, \lambda_2) = f(\mathbf{w}) - \lambda_1 g_1(\mathbf{w}) - \lambda_2 g_2(\mathbf{w}).$$ Suppose the unconstrained solution, the values of $\mathbf{w}, \lambda_1, \lambda_2$ that minimize $\mathcal{L} (\mathbf{w},\lambda_1, \lambda_2)$ are $\mathbf{w^*}, \lambda_1^*, \lambda_2^*$. Then, the $\mathbf{w}$ component of the solution set is precisely the solution to the constrained problem of minimizing $f(\mathbf{w})$ subject to $g_1(\mathbf{w})$ and $g_2(\mathbf{w})$.

7. The General Case Lagrangian Solution

Lagrangian Function:
$$\mathcal{L} (\mathbf{w},\lambda_1, \lambda_2) = \frac{1}{2}\mathbf{w}^T \mathbf{Vw} - \lambda_1 (\mathbf{w}^T \mathbf{1} -1) - \lambda_2 (\mathbf{w}^T\mathbf{e} - \mu_p ).$$
First Order Conditions:
\begin{align}\nabla \mathcal{L}(\mathbf{w^*}, \lambda_1^*, \lambda_2^*) &= \mathbf{0}\\ \\
\Rightarrow \; \nabla_{\mathbf{w}}\mathcal{L}(\mathbf{w^*}, \lambda_1^*, \lambda_2^*) &= \mathbf{Vw^*} - \lambda_1^* \mathbf{1} - \lambda_2^* \mathbf{e} = \mathbf{0} \tag{7.1}\\ \\
\dfrac{\partial \mathcal{L}}{\partial{\lambda}_1} &= -(\mathbf{w^*}^T \mathbf{1} -1)= 0\tag{7.2}\\ \\
\dfrac{\partial \mathcal{L}}{\partial{\lambda}_2} &= -(\mathbf{w^*}^T\mathbf{e} - \mu_p )= 0\tag{7.3}
\end{align}
Now solve for $\mathbf{w^*}$, $\lambda_1^*$ and $\lambda_2^*$. Properties of symmetric, invertible $n\times n$ matrices and their determinants will come in handy.

• Symmetry: $A^T = A$
• Invertibility: if $A$ is invertible, then
$$AA^{-1} = I_n = A^{-1}A$$ where $I_n$ is the identity matrix.
• If $A$ is symmetric and invertible, $A^{-1}$ is also symmetric.

From 7.1,
\begin{align}
\mathbf{Vw^*} &= \lambda_1^* \mathbf{1}\; + \lambda_2^* \mathbf{e} \\ \\
\Rightarrow \; \mathbf{w^*} &= \lambda_1^* V^{-1} \mathbf{1}\; + \lambda_2^* V^{-1} \mathbf{e}\tag{7.4}
\end{align}
Substituting $7.4$ into constraints $7.2$ and $7.3$, noting that $\mathbf{w^*}^T\mathbf{1} \equiv \mathbf{1}^T \mathbf{w^*} = 1$,
\begin{equation}\lambda_1^* \mathbf{1}^T V^{-1} \mathbf{1}\; + \lambda_2^* \mathbf{1}^T V^{-1} \mathbf{e} = 1 \tag{7.5}\end{equation}
and noting that $\mathbf{w^*}^T\mathbf{e} \equiv \mathbf{e}^T\mathbf{w^*}=\mu_p$,
\begin{align}
\lambda_1^* \mathbf{e}^T V^{-1} \mathbf{1}\; + \lambda_2^* \mathbf{e}^T V^{-1} \mathbf{e} = \mu_p \tag{7.6}\end{align}
Solving $7.5$ and $7.6$ simultaneously gives $\lambda_1^*$ and $\lambda_2^*$. Note that because $V^{-1}$ is symmetric, $\mathbf{e}^T V^{-1}\mathbf{1} = \mathbf{1}^T V^{-1} \mathbf{e}$. We therefore have a matrix system:

$$\left[\begin{array}{cc}\mathbf{1}^T V^{-1} \mathbf{1} & \mathbf{1}^T V^{-1} \mathbf{e} \\ \mathbf{e}^T V^{-1} \mathbf{1} & \mathbf{e}^T V^{-1} \mathbf{e}\end{array}\right]\left[\begin{array}{c}\lambda_1^* \\ \lambda_2^* \end{array}\right] \equiv \left[\begin{array}{cc}C & A \\ A & B \end{array}\right]\left[\begin{array}{c}\lambda_1^* \\ \lambda_2^* \end{array}\right] = \left[\begin{array}{c}1 \\ \mu_p \end{array}\right]$$
The determinant of the $2 \times 2$ matrix, $D = BC - A^2$, allows us to determine its inverse $$\frac{1}{D}\left[\begin{array}{cc}B & -A \\ -A & C \end{array}\right]$$
Hence,
$$\left[\begin{array}{c}\lambda_1^* \\ \lambda_2^* \end{array}\right] = \frac{1}{D}\left[\begin{array}{cc}B & -A \\ -A & C \end{array}\right] \left[\begin{array}{c}1 \\ \mu_p \end{array}\right]$$
$\Rightarrow$
$$ \lambda_1^*= \frac{1}{D}(B -A\mu_p)$$
$$ \lambda_2^* = \frac{1}{D}(C \mu_p – A)$$
Substituting these values of $ \lambda_1^*$ and $ \lambda_2^*$ into $7.4$ gives $\mathbf{w^*}$, the vector of optimal weights
\begin{align}\mathbf{w^*} &= \frac{1}{D}(B -A\mu_p) V^{-1} \mathbf{1}\; + \frac{1}{D}(C \mu_p – A) V^{-1} \mathbf{e}\\ \\
&= \frac{1}{D}\Biggl[BV^{-1}\mathbf{1} - A V^{-1}\mathbf{e} \Biggr] + \frac{1}{D}\Biggl[C V^{-1}\mathbf{e} - AV^{-1} \mathbf{1} \Biggr]\mu_p\\ \\
& = \mathbf{g} + \mathbf{h}\mu_p
\end{align}
where $\mathbf{g}$ and $\mathbf{h}$ are constant column vectors that can be constructed from the initial mean and covariance information.
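The closed-form solution can be sanity-checked numerically. The sketch below works through equations 7.4 to 7.6 for two assets, so that $V^{-1}$ can be written with the explicit $2 \times 2$ inverse formula; the covariance matrix, expected returns and target $\mu_p$ are all illustrative assumptions:

```python
# Numerical sketch of the Lagrangian solution for two illustrative assets.
V = [[0.04, 0.006],
     [0.006, 0.09]]
e = [0.08, 0.12]     # expected asset returns mu_i (assumed)
mu_p = 0.10          # target portfolio return (assumed)

# Invert V with the explicit 2x2 formula
detV = V[0][0] * V[1][1] - V[0][1] * V[1][0]
Vinv = [[ V[1][1] / detV, -V[0][1] / detV],
        [-V[1][0] / detV,  V[0][0] / detV]]

def quad(x, M, y):
    # the bilinear form x^T M y
    return sum(x[i] * M[i][j] * y[j] for i in range(2) for j in range(2))

ones = [1.0, 1.0]
C = quad(ones, Vinv, ones)   # 1^T V^{-1} 1
A = quad(ones, Vinv, e)      # 1^T V^{-1} e
B = quad(e, Vinv, e)         # e^T V^{-1} e
D = B * C - A * A

# Multipliers from the closed form
lam1 = (B - A * mu_p) / D
lam2 = (C * mu_p - A) / D

# w* = lam1 * V^{-1} 1 + lam2 * V^{-1} e   (equation 7.4)
w = [lam1 * sum(Vinv[i][j] * ones[j] for j in range(2))
     + lam2 * sum(Vinv[i][j] * e[j] for j in range(2))
     for i in range(2)]

# Both constraints hold at the optimum
assert abs(sum(w) - 1.0) < 1e-10                                  # w^T 1 = 1
assert abs(sum(wi * ei for wi, ei in zip(w, e)) - mu_p) < 1e-10   # w^T e = mu_p
```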

8. Calculate the Portfolio Variance $\sigma^2_p$

\begin{align}
\sigma_p^2 &=\mathbf{w}^T \mathbf{V} \mathbf{w}\\ \\
&= \mathbf{w}^T V \Biggl[\frac{1}{D}\Bigl(BV^{-1}\mathbf{1} - A V^{-1}\mathbf{e} \Bigr) + \frac{1}{D}\Bigl(C V^{-1}\mathbf{e} - AV^{-1} \mathbf{1} \Bigr)\mu_p \Biggr]\\ \\
& \text{recall } VV^{-1} = I_n \; ,\\ \\
&= \frac{1}{D} \mathbf{w}^T \Biggl[B\mathbf{1} - A \mathbf{e} + \mu_p(C \mathbf{e} - A \mathbf{1}) \Biggr]
\end{align}
But $\mathbf{w}^T \mathbf{1} = 1$ and $\mathbf{w}^T \mathbf{e}=\mu_p$, therefore
\begin{align}
\sigma_p^2 &= \frac{1}{D} (B - A\mu_p + C\mu_p^2 - A\mu_p) \hspace{2in}
\blacksquare
\end{align}
Now, multiplying the variance by $\frac{C}{C}$
\begin{align}\sigma_p^2 &= \frac{C}{D} (\frac{B}{C} - \frac{A}{C}\mu_p + \mu_p^2 - \frac{A}{C}\mu_p)\\ \\
&= \frac{C}{D}(\frac{B}{C} - 2\frac{A}{C}\mu_p + \mu_p^2 )\\ \\
\text{and completing the square, }\qquad & \\ \\
& = \frac{C}{D}\Biggl[\Biggl(\mu_p - \frac{A}{C} \Biggr)^2 - \frac{A^2}{C^2} + \frac{B}{C} \Biggr]\\ \\
& = \frac{C}{D}\Biggl[\Biggl(\mu_p - \frac{A}{C} \Biggr)^2 + \frac{BC- A^2}{C^2} \Biggr]\\ \\
& \text{and since } D = BC - A^2, \\ \\
& = \frac{C}{D}\Biggl[\Biggl(\mu_p - \frac{A}{C} \Biggr)^2 + \frac{D}{C^2} \Biggr]\\ \\
\sigma_p^2 & = \frac{C}{D}\Biggl(\mu_p - \frac{A}{C} \Biggr)^2 + \frac{1}{C}\\ \\
\sigma_p^2 C \; - \; \frac{C^2}{D}\Biggl(\mu_p - \frac{A}{C} \Biggr)^2 & = 1\\ \\
\frac{\sigma_p^2}{\frac{1}{C}} \; - \; \dfrac{\Biggl(\mu_p - \frac{A}{C} \Biggr)^2}{\frac{D}{C^2}} & = 1
\end{align}
This is a hyperbola in the $(\sigma_p, \mu_p)$ plane with vertex $$\Bigl(\sqrt{\dfrac{1}{C}},\dfrac{A}{C} \Bigr)$$ and asymptotes $$\mu_p=\dfrac{A}{C}\pm \sqrt{\frac{D}{C}}\,\sigma_p.$$This hyperbola is the optimal portfolio frontier: portfolios that lie on it are optimal. Observe, however, that the two branches of the frontier give two levels of mean return $\mu_p$ for a given level of risk $\sigma_p$. The upper branch gives the higher return and is therefore efficient; the lower branch yields a lower return and is therefore inefficient.
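As a closing check, the frontier formula $\sigma_p^2 = \frac{C}{D}\bigl(\mu_p - \frac{A}{C}\bigr)^2 + \frac{1}{C}$ can be compared against the direct variance $\mathbf{w^*}^T \mathbf{V} \mathbf{w^*}$ of the optimal portfolio. The two-asset numbers below are illustrative assumptions, with $V^{-1}$ written via the explicit $2 \times 2$ inverse formula:

```python
# Two illustrative assets; all inputs are assumed numbers.
V = [[0.04, 0.006],
     [0.006, 0.09]]
e = [0.08, 0.12]
mu_p = 0.10

detV = V[0][0] * V[1][1] - V[0][1] * V[1][0]
Vinv = [[ V[1][1] / detV, -V[0][1] / detV],
        [-V[1][0] / detV,  V[0][0] / detV]]

def quad(x, M, y):
    # the bilinear form x^T M y
    return sum(x[i] * M[i][j] * y[j] for i in range(2) for j in range(2))

ones = [1.0, 1.0]
C = quad(ones, Vinv, ones)
A = quad(ones, Vinv, e)
B = quad(e, Vinv, e)
D = B * C - A * A

# Optimal weights w* = lam1 V^{-1} 1 + lam2 V^{-1} e
lam1, lam2 = (B - A * mu_p) / D, (C * mu_p - A) / D
w = [lam1 * sum(Vinv[i][j] for j in range(2))
     + lam2 * sum(Vinv[i][j] * e[j] for j in range(2))
     for i in range(2)]

direct = quad(w, V, w)                               # w*^T V w*
frontier = (C / D) * (mu_p - A / C) ** 2 + 1.0 / C   # hyperbola formula
assert abs(direct - frontier) < 1e-10
```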