In the mid-2000s, just before the credit crisis, operational risk (risk from factors beyond your control, such as a rogue trader or an IT system crash) came into sharper focus, and alternative methods for determining and capturing risk were being explored. The theory has been applied most successfully in the natural sciences to mitigate natural disasters: in flood defences, dykes and dams, for example guiding town planners in specifying the height of a dam.
Hydrologists seek to determine a dam height which will not be breached; a breach is a genuinely rare event. In finance, the equivalent threshold is violated far more often. Consider that VaR$_\alpha$ is typically calculated at a confidence level of $\alpha = 0.99$ or $0.95$ (or somewhere in between), capturing losses that occur 1% or 5% of the time, i.e. roughly once every 100 or 20 trading days, if not more frequently, as required by some regulators. This is not really a rare event. A stock market does not lose 25% of its value every 100 days, or two and a half times in a trading year, but it could happen once every 10 years. It then becomes questionable why the entire distribution needs to be modelled. Extreme Value Theory (EVT) therefore focuses on what happens in the tail of the distribution.
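To make the point concrete, here is a minimal sketch, assuming normally distributed daily returns purely for illustration (the drift, volatility and sample size are arbitrary choices, not from the text): the 99% VaR threshold is, by construction, breached about once every 100 trading days.

```python
import numpy as np

# A minimal sketch, assuming normal daily returns purely for illustration:
# the 99% VaR is just the 1st percentile of the loss distribution, so it
# is breached roughly once every 100 trading days -- not a rare event.
rng = np.random.default_rng(42)
losses = -rng.normal(loc=0.0005, scale=0.01, size=10_000)  # daily losses

var_99 = np.quantile(losses, 0.99)       # 99% VaR as a loss quantile
breach_rate = np.mean(losses > var_99)   # fraction of days VaR is breached
```

The breach rate sits near 1%, confirming that VaR at conventional confidence levels describes routine rather than truly extreme losses.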
Because this risk management method focuses on rare events, the regulator does not require it to be calculated as often as, say, VaR. There have been some interesting projects applying historical financial data to this model to see when and where extreme threshold violations occurred.
EVT is used to determine the buffer fund which needs to be available in the rare event of a crash, for example (analogous to the height of the dam in the hydrology example).
The motivation for the method arises from the central limit theorem (CLT). Consider a sequence of $iid$ random variables (random draws in an experiment); the CLT describes the behaviour of their average as the number of draws grows large. In terms of a distribution, the CLT focuses on the central part of the distribution. Keep in mind that for a plain average
$$ S_n = \dfrac{1}{n} \sum_{k = 1}^n X_k \quad , \quad n \geq 1 $$
in the limit, the average $S_n$ converges. The CLT concerns averages of the $X_k$ variables, which have a common distribution. With EVT, however, we are interested neither in the averages nor in the central part of the distribution: we seek the minimum or maximum of a sequence of random variables as they evolve through time. The averages of the sequence, as they stand, do not help in any meaningful way. However, if the average $S_n$ is shifted and scaled then we obtain something more useful. We therefore define the following
SHIFT sequence($n$): $a_n$
SCALE sequence($n$): $b_n$
such that
$$ \dfrac{S_n - a_n}{b_n} \tag{1}$$
is a new random variable indexed by $n \geq 1$, with $a_n \in \mathbb{R}$ and $b_n > 0$.
Now let us recall the central limit theorem in this context. If the underlying variables $X_k$ have mean $\mu = \mathbb{E}X_k$ and finite variance $\sigma^2 = \mathbb{E} [ ( X_k - \mu )^{2}] < \infty$, the CLT stipulates
$$ \lim_{n\rightarrow \infty} \sqrt{n} \, \dfrac{S_n - \mu}{\sigma} \sim N(0, 1) $$
By comparison with $(1)$, we see that applying the shift and scale sequences amounts to standardising the variable in the usual way, by subtracting the mean and dividing by the standard deviation,
$$ \dfrac{S_n - a_n}{b_n} \equiv \dfrac{S_n - \mu}{\sigma}$$ and to ensure convergence in distribution and avoid a degenerate limit as $n$ increases, we multiply by $\sqrt{n}$ and define the new variable $A_n$
$$ A_n = \sqrt{n} \, \dfrac{S_n - \mu}{\sigma} \equiv \dfrac{S_n - a_n}{b_n}$$
It follows that we have defined the shift and scale sequences such that
$$\mu = a_n$$
$$ \dfrac{\sqrt{n}}{\sigma} = \dfrac{1}{b_n} \; \Rightarrow b_n = \dfrac{\sigma}{\sqrt{n}}$$
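The role of the shift $a_n = \mu$ and scale $b_n = \sigma/\sqrt{n}$ can be checked by simulation. The sketch below uses Exponential(1) draws (so $\mu = \sigma = 1$; the sample sizes and seed are illustrative choices, not from the text) and confirms that $A_n$ is approximately standard normal.

```python
import numpy as np

# Sketch: A_n = sqrt(n) (S_n - mu) / sigma should be close to N(0, 1).
# Exponential(1) draws are used as an example, so mu = sigma = 1;
# the sample size, trial count and seed are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n, trials = 1_000, 10_000
mu, sigma = 1.0, 1.0                                   # mean and std of Exp(1)

S_n = rng.exponential(size=(trials, n)).mean(axis=1)   # plain averages
A_n = np.sqrt(n) * (S_n - mu) / sigma                  # shifted and scaled
```

The empirical mean and standard deviation of `A_n` come out close to 0 and 1, as the CLT predicts.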
The question that must be asked, by analogy with $A_n$, is: are there shift sequences $a_n$ and scale sequences $b_n$ such that the maximum-loss random variable $\mathcal{L}_n^{(max)}$, when shifted and scaled, reveals a recognisable distribution as $n$ gets large? That is,
$$ \lim_{n\rightarrow \infty} \dfrac{\mathcal{L}_n^{(max)} - a_n}{b_n} \sim \text{a non-degenerate distribution} $$
where,
$$ \mathcal{L}_n^{(max)} = \max (L_1, L_2, \cdots, L_n).$$
Define the right-hand end point of the loss distribution as the point $x_{\text{end}}$ below which the cumulative distribution function $F(x)$ is still strictly less than 1:
$$x_{\text{end}} = \sup\{ x \in \mathbb{R} : F(x) < 1 \}$$
This is the point by which all probability has accumulated (analogous to infinity for the Normal CDF). We can then say
\begin{equation*} F(x) \begin{cases} < 1 & \quad \text{ if } x < x_{\text{end}}\\ = 1 & \quad \text{ if } x \geq x_{\text{end}}\\ \end{cases} \end{equation*}
We now show that, without shifting and scaling, the distribution of the maximum becomes degenerate as $n$ increases.
1. $F(x)$, without being shifted and scaled, is degenerate
\begin{align}
F_n^{\text{max}} (x) &= \mathbb{P} \bigl[ \mathcal{L}_n^{(max)} \leqslant x\bigr]\\ \\
& = \mathbb{P} \bigl[ \max (L_1, L_2, \cdots, L_n) \leqslant x \bigr]\\ \\
&= \mathbb{P} \bigl[ L_1 \leqslant x \bigr]\cdot \mathbb{P} \bigl[ L_2 \leqslant x \bigr]\cdots \mathbb{P} \bigl[ L_n \leqslant x \bigr]
\end{align}
assuming the losses are independent. Each factor above is strictly less than 1 for all $x < x_{\text{end}}$, so their product is smaller still, tending to zero as $n$ increases. Assuming identically distributed losses,
$$F_n^{\text{max}} (x) = \bigl( F(x) \bigr)^n.$$
Consequently,
\begin{equation*} \lim_{n\rightarrow \infty} F_n^{\text{max}}(x) = \begin{cases} 0 & \quad \text{ if } x < x_{\text{end}}\\ 1 & \quad \text{ if } x \geqslant x_{\text{end}}\\ \end{cases} \end{equation*}
which is a degenerate distribution.
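This collapse can be seen numerically. A small sketch with Uniform(0, 1) losses (so $x_{\text{end}} = 1$; the distribution and the evaluation points are illustrative choices):

```python
import numpy as np

# Sketch: for Uniform(0, 1) losses, x_end = 1 and F_n^max(x) = F(x)^n.
# Every entry below x_end collapses towards 0; at x >= x_end it stays at 1.
F = lambda x: np.clip(x, 0.0, 1.0)   # uniform CDF

x = np.array([0.5, 0.9, 0.99, 1.0])  # illustrative evaluation points
limit = F(x) ** 10_000               # F_n^max at a large n
```

Even at $x = 0.99$, just below $x_{\text{end}}$, the value is vanishingly small, while at $x = 1$ it is exactly 1: the degenerate step function derived above.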
2. Characteristics of The Probability Distribution of Extremely Rare Events
The shift and scale sequences can be used to establish a key characteristic of any probability distribution that can capture extremely rare events.
\begin{align}
\mathbb{P} \biggl[ \dfrac{\mathcal{L}_n^{(max)} - a_n}{b_n} \leqslant x \biggr] &= \mathbb{P} \bigl[ \mathcal{L}_n^{(max)} \leqslant a_n + b_n x \bigr] = \mathbb{P} \bigl[ \max (L_1, L_2, \cdots, L_n) \leqslant a_n + b_n x \bigr]\\ \\
&=\biggl(F(a_n + b_n x)\biggr)^n \equiv G(x)
\end{align}
where $G(x)$ denotes the non-degenerate distribution obtained in the limit $n \rightarrow \infty$. Thus far, the theory applies to extreme events in general; there has been no restriction to financially extreme events. The next step is to examine a range of known distributions by shifting and scaling them and then analysing their tail characteristics.
3. What happens if the tail of the random loss has the Exponential distribution?
\begin{equation*}
\mathbb{P} \bigl[ \mathcal{L}_k \leqslant x \bigr] = \begin{cases} 1 - e^{-x} & \quad x \geqslant 0 \\ 0 & \quad x < 0 \end{cases} \tag{3.1}
\end{equation*}
For the exponential distribution, as well as for the other two distributions examined for their tail properties, we take advantage of a particular known limit result in deciding how to shift and scale $\mathcal{L}_k$:
$$ e^x = \lim_{n\rightarrow \infty} \biggl(1 + \dfrac{x}{n} \biggr)^n\quad x\in \mathbb{R} \tag{3.2}$$
As this first distribution that we seek to shift and scale is itself the exponential distribution, hardly any manipulation is required. It is just useful to remember that
$$e^{\log x} = x, \quad \text{writing } \log \text{ for the natural logarithm } \log_e \text{ throughout.}$$
recalling that
$$\biggl(F(a_n + b_n x)\biggr)^n \equiv G(x)$$
is the limiting relation we seek when applying the shift and scale sequences to the distribution $F(x)$, which in this case is the exponential distribution. It is fairly easy to see that the choice $a_n = \log(n)$ and $b_n = 1$ works:
\begin{align}
\mathbb{P} \bigl[ \mathcal{L}_k \leqslant a_n + b_n x \bigr] =\biggl(F(a_n + b_n x)\biggr)^n &= \begin{cases} \bigl(1 - e^{-(a_n + b_n x)}\bigr)^n & \quad a_n + b_n x \geqslant 0 \\ 0 & \quad a_n + b_n x < 0 \end{cases}
\\ \\
&= \begin{cases} \bigl(1 - e^{-(\log(n) + x)}\bigr)^n & \quad x \geqslant -\log(n) \\ 0 & \quad x < -\log(n) \end{cases}\\ \\
&= \begin{cases} \bigl(1 - e^{\log(1/n) - x}\bigr)^n & \quad x \geqslant -\log(n) \\ 0 & \quad x < -\log(n) \end{cases}\\ \\
&= \begin{cases} \biggl(1 - \dfrac{e^{-x}}{n}\biggr)^n & \quad x \geqslant -\log(n) \\ 0 & \quad x < -\log(n) \end{cases}\\ \\
&= \begin{cases} \biggl(1 + \dfrac{-e^{-x}}{n}\biggr)^n & \quad x \geqslant -\log(n) \\ 0 & \quad x < -\log(n) \end{cases}
\end{align}
and by comparing with $(3.2)$ we can conclude that
$$ \lim_{n\rightarrow \infty} \biggl(F(a_n + b_n x)\biggr)^n \equiv G(x) = e^{-e^{-x}}$$
for all $x$, since the boundary $-\log(n) \rightarrow -\infty$ as $n \rightarrow \infty$.
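This convergence can be checked by simulation. The sketch below (sample sizes and seed are illustrative choices) compares the empirical distribution of shifted maxima of Exp(1) draws against the Gumbel CDF $e^{-e^{-x}}$.

```python
import numpy as np

# Sketch: maxima of n Exp(1) losses, shifted by a_n = log(n) (b_n = 1),
# should be close in distribution to G(x) = exp(-exp(-x)) for large n.
# n, trial count and seed are arbitrary illustrative choices.
rng = np.random.default_rng(1)
n, trials = 500, 20_000

maxima = rng.exponential(size=(trials, n)).max(axis=1) - np.log(n)

# Largest gap between the empirical CDF and the Gumbel CDF at a few points.
err = max(abs(np.mean(maxima <= x) - np.exp(-np.exp(-x)))
          for x in (-1.0, 0.0, 1.0, 2.0))
```

The gap shrinks as $n$ grows, in line with the limit derived above.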
4. What happens if the tail of the random loss has the Pareto distribution?
\begin{equation*}
\mathbb{P} \bigl[ \mathcal{L}_k \leqslant x \bigr] = \begin{cases} 1 - Ax^{-\alpha} & \quad x > A^{1 / {\alpha}} \\ 0 & \quad \text{otherwise} \end{cases} \tag{4.1}
\end{equation*}
again,
$$\biggl(F(a_n + b_n x)\biggr)^n \equiv G(x)$$
but this time let $a_n = 0$ and $b_n = \bigl( A\cdot n \bigr)^{\frac{1}{\alpha}}$ for all $n \geqslant 1$
\begin{align}
\mathbb{P} \bigl[ \mathcal{L}_k \leqslant a_n + b_n x \bigr] =\biggl(F(a_n + b_n x)\biggr)^n &= \biggl(F\bigl((A\cdot n)^{\frac{1}{\alpha}} x \bigr)\biggr)^n \\ \\
&= \begin{cases} \biggl(1 - A\bigl(\underbrace{ (A \cdot n)^{\frac{1}{\alpha}} x }_{\text{scaled } x \text{ in } (4.1) }\bigr)^{-\alpha} \biggr)^n \quad & (A \cdot n)^{\frac{1}{\alpha}}\; x > A^{\frac{1}{\alpha}} \\ \\ 0 & \text{otherwise} \end{cases}
\end{align}
Examine the exponents carefully. The exponent $1/\alpha$ internal to the scaling coefficient $A\cdot n$ cancels against the external $-\alpha$, so $\bigl((A \cdot n)^{1/\alpha}\bigr)^{-\alpha} = (A \cdot n)^{-1}$: the sign survives as the exponent $-1$, and the $-\alpha$ remains on $x$. Hence,
\begin{align}
\biggl(F\bigl((A\cdot n)^{\frac{1}{\alpha}} x \bigr)\biggr)^n &= \begin{cases} \biggl(1 - A \cdot \dfrac{x^{-\alpha}}{A n}\biggr)^n \qquad x > \dfrac{1}{n^{1/ \alpha}}\\ \\
0 \qquad \text{otherwise}
\end{cases}\\ \\
&= \begin{cases} \biggl(1 - \dfrac{x^{-\alpha}}{n}\biggr)^n \qquad x > \dfrac{1}{n^{1/ \alpha}}\\ \\
0 \qquad \text{otherwise}
\end{cases}\\ \\
&= \begin{cases} \biggl(1 + \dfrac{- x^{-\alpha}}{n}\biggr)^n \qquad x > \dfrac{1}{n^{1/ \alpha}}\\ \\
0 \qquad \text{otherwise}
\end{cases}
\end{align}
Keep in mind that $x^{-\alpha} = \frac{1}{x^\alpha}$. Also note that $\dfrac{1}{n^{1/ \alpha}} \rightarrow 0 $ as $n \rightarrow \infty$. By comparing with $(3.2)$ we can conclude that
\begin{align}
\lim_{n\rightarrow \infty} \biggl(F\bigl((A\cdot n)^{\frac{1}{\alpha}} x \bigr)\biggr)^n \equiv G(x) =
\begin{cases} e^{-x^{-\alpha}} \quad x > 0 \\ \\
0 \qquad \text{otherwise}
\end{cases}
\end{align}
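Again this can be checked numerically. The sketch below takes $A = 1$ and $\alpha = 2$ (illustrative values), samples the Pareto distribution by inverse transform, and compares the scaled maxima with the Fréchet CDF $e^{-x^{-\alpha}}$.

```python
import numpy as np

# Sketch: Pareto losses with A = 1, alpha = 2, i.e. F(x) = 1 - x^(-2) for
# x > 1, sampled by inverse transform. Scaling the maximum by
# b_n = (A n)^(1/alpha) = sqrt(n) should give the Frechet CDF exp(-x^(-2)).
# alpha, n, trial count and seed are arbitrary illustrative choices.
rng = np.random.default_rng(2)
alpha, n, trials = 2.0, 500, 20_000

u = rng.uniform(size=(trials, n))
pareto = u ** (-1.0 / alpha)                    # inverse-CDF samples of F
maxima = pareto.max(axis=1) / n ** (1.0 / alpha)

# Largest gap between the empirical CDF and the Frechet CDF at a few points.
err = max(abs(np.mean(maxima <= x) - np.exp(-x ** -alpha))
          for x in (0.5, 1.0, 2.0))
```

Here no shift is needed ($a_n = 0$); the heavy tail is tamed by scale alone, unlike the exponential case, which needed a shift.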
5. What happens if the tail of the random loss has the Uniform distribution?
\begin{equation*}
\mathbb{P} \bigl[ \mathcal{L}_k \leqslant x \bigr] = \begin{cases} 1 & \quad x \geqslant 1 \\
x & \quad x \in [ 0, 1 ]
\\ 0 & \quad x \leqslant 0
\end{cases}
\end{equation*}
Let $a_n = 1$, $b_n = \frac{1}{n}$, $n \geqslant 1$:
\begin{align}
\mathbb{P} \bigl[ \mathcal{L}_k \leqslant a_n + b_n x \bigr] =\biggl(F(a_n + b_n x)\biggr)^n
&= \biggl(F \bigl(1 + \frac{x}{n} \bigr)\biggr)^n \\ \\
&= \begin{cases} 1 & \quad 1 + \frac{x}{n} \geqslant 1 \\
\bigl(1 + \frac{x}{n} \bigr)^n & \quad 0 \leqslant 1 + \frac{x}{n} < 1
\\ 0 & \quad 1 + \frac{x}{n} \leqslant 0
\end{cases}\\ \\
&= \begin{cases} 1 & \quad x \geqslant 0 \\
\bigl(1 + \frac{x}{n} \bigr)^n & \quad -n \leqslant x < 0
\\ 0 & \quad x < -n
\end{cases}\\ \\
&\;\xrightarrow[n\rightarrow\infty]{}\; \begin{cases} 1 & \quad x \geqslant 0 \\
e^x & \quad x < 0
\end{cases} \quad \text{by } (3.2)
\end{align}
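A simulation sketch confirms this limit as well (sample sizes and seed are illustrative choices):

```python
import numpy as np

# Sketch: maxima of n Uniform(0, 1) draws, shifted by a_n = 1 and scaled
# by b_n = 1/n, should approach the limit CDF: e^x for x < 0, 1 for x >= 0.
# n, trial count and seed are arbitrary illustrative choices.
rng = np.random.default_rng(3)
n, trials = 500, 20_000

maxima = (rng.uniform(size=(trials, n)).max(axis=1) - 1.0) * n

# Largest gap between the empirical CDF and the limit CDF at a few points.
err = max(abs(np.mean(maxima <= x) - np.exp(x)) for x in (-2.0, -1.0, -0.5))
```

Note that the shifted and scaled maxima are all negative: the support ends at $x_{\text{end}} = 1$, and the limit lives entirely to the left of zero.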
Observe that we get a different limiting distribution type depending on the starting distribution of the tail random variable. We say that the starting distribution is attracted, in the limit, to its limiting distribution (its domain of attraction). This is how we arrive at the conjecture that there are three types of non-degenerate limiting distributions.
6. Fisher–Tippett–Gnedenko theorem or the Extreme Value Theorem
The preceding investigations of the tail properties of particular known distributions lead to the conclusion that there are only three types of non-degenerate extreme (or limiting) distributions of the maximum, determined by the thickness of the tail of the starting distribution. This is the Extreme Value Theorem.
- Fréchet:
\begin{equation*}
\Phi_{\alpha} (x) = \begin{cases} e^{-1/x^{\alpha}} & \quad x > 0 \quad \alpha > 0\\ 0 & \quad x \leqslant 0 \end{cases}
\end{equation*}
A good example is the Pareto distribution, which has fat, slowly decaying tails.
- Gumbel:
$$ \Lambda (x) = e^{-e^{-x}}$$
A good example is the Exponential distribution, which has moderately decaying tails.
- Weibull:
\begin{equation*}
\Psi_{\alpha} (x) = \begin{cases} 1 & \quad x \geqslant 0 \\ e^{-{(-x)}^{\alpha}} & \quad x < 0 \quad \alpha >0 \end{cases}
\end{equation*}
The ultimate thin-tailed case, with extremely quickly decaying or virtually no tails; a good example is the Uniform distribution. Clearly for financial applications this is of no interest.
So the theorem states:
if the distribution of a normalised maximum converges, then the limit has to be one of this particular class of distributions.
As we have seen, a normalised maximum does not always converge, so the theorem's emphasis is on the if.
7. A Generalised Definition of Non-Degenerate Distributions
The three limiting types can be combined into a single general definition of the non-degenerate extreme distribution:
$$H_\xi(x) = \exp \biggl[ - (1 + \xi x)^ {- \frac{1}{\xi}}\biggr]$$
defined when $1 + \xi x > 0$. For $\xi > 0$, that is
$$ x > -\dfrac{1}{\xi} \qquad \text{and } H_\xi(x) = 0 \text{ otherwise,}$$
with the Gumbel case recovered in the limit $\xi \rightarrow 0$.
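A small numerical sketch of this unification; the helper `H` below is hypothetical (an illustrative implementation, not from the text), with the $\xi = 0$ branch given by the Gumbel form.

```python
import numpy as np

# Hypothetical helper implementing the generalised CDF H_xi sketched above;
# the xi = 0 branch uses the Gumbel limit exp(-exp(-x)).
def H(x, xi):
    if abs(xi) < 1e-12:               # xi = 0: Gumbel limit
        return np.exp(-np.exp(-x))
    t = 1.0 + xi * x
    if t <= 0.0:                      # outside the support 1 + xi x > 0
        return 0.0 if xi > 0 else 1.0
    return np.exp(-t ** (-1.0 / xi))

# As xi -> 0, H(x, xi) approaches the Gumbel CDF at every x.
gap = max(abs(H(x, 1e-8) - H(x, 0.0)) for x in (-1.0, 0.0, 1.0, 2.0))
```

Positive $\xi$ gives the heavy-tailed (Fréchet-type) case, negative $\xi$ the bounded (Weibull-type) case, and the gap to the Gumbel CDF vanishes as $\xi \rightarrow 0$.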
8. Application to a large Financial Loss (u)
Pickands (1975)