Bayes’ Theorem
The $n$ events $B_{i}, i=1, 2, \ldots , n$, are mutually exclusive, as shown in the image below. Therefore, $P(B_{i}, B_{j})=0 \ \forall \ i \neq j$.
Considering the entire sample space, we have $\sum_{i=1}^{n}P(B_{i})=1$. Then, the probability of a random event $A$ is expressed as shown in equation [1]:
$P(A)=\sum_{i=1}^{n}P(A \mid B_{i})P(B_{i})$ [1]
Equation [1] is called the total probability theorem.
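As a quick numerical check of equation [1], the marginal probability $P(A)$ can be computed directly from a partition. The priors and conditional probabilities below are made-up illustrative values, not from the text:

```python
# Total probability theorem: P(A) = sum_i P(A|B_i) * P(B_i).
# The B_i form a partition: mutually exclusive and exhaustive.

# Hypothetical partition priors P(B_i); they must sum to 1.
p_B = [0.5, 0.3, 0.2]

# Hypothetical conditional probabilities P(A | B_i).
p_A_given_B = [0.1, 0.4, 0.7]

# Equation [1]: marginalize over the partition.
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))

print(p_A)  # 0.5*0.1 + 0.3*0.4 + 0.2*0.7 = 0.31
```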
In order to derive Bayes’ theorem, we first need conditional probability. The conditional probability of $B_{i}$ given $A$ is:
$P(B_{i} \mid A)=\frac{P(A \mid B_{i})P(B_{i})}{P(A)}$ [2]
If we substitute the law of total probability into equation [2], we get:
$P(B_{i} \mid A)=\frac{P(A \mid B_{i})P(B_{i})}{\sum_{j=1}^{n}P(A \mid B_{j})P(B_{j})}$ [3]
Equation [3] is called Bayes’ theorem.
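Equation [3] can be sketched numerically by dividing each term of the total-probability sum by the sum itself; the numbers here are the same hypothetical values used above for equation [1]:

```python
# Bayes' theorem (equation [3]):
#   P(B_i | A) = P(A|B_i) P(B_i) / sum_j P(A|B_j) P(B_j)
# Hypothetical priors and likelihoods for illustration only.
p_B = [0.5, 0.3, 0.2]
p_A_given_B = [0.1, 0.4, 0.7]

# Denominator: total probability of A (equation [1]).
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))

# Posterior over the partition after observing A.
posterior = [pa * pb / p_A for pa, pb in zip(p_A_given_B, p_B)]

print(posterior)  # posteriors over a partition always sum to 1
```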
It can also be expressed using the probability density function as shown in equation [4].
$p_{X \mid Y}(x \mid y)=\frac{p_{Y \mid X}(y \mid x)p_{X}(x)}{\int_{-\infty}^{\infty}p_{Y \mid X}(y \mid x)p_{X}(x)dx} = \frac{p_{XY}(x,y)}{p_{Y}(y)}$ [4]

where $p_{X}(x)$ is the prior probability density function and $p_{X \mid Y}(x \mid y)$ is the posterior probability density function.
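The density form in equation [4] can be illustrated with a minimal sketch: a Gaussian prior $p_{X}(x)$, a Gaussian likelihood $p_{Y \mid X}(y \mid x)$, and a Riemann sum for the normalizing integral. The model parameters and the observed value are assumptions chosen for the example:

```python
import math

def gauss(x, mu, sigma):
    """Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Assumed model: prior p_X(x) = N(0, 1); likelihood p_{Y|X}(y|x) = N(y; x, 0.5^2).
prior = lambda x: gauss(x, 0.0, 1.0)
likelihood = lambda y, x: gauss(y, x, 0.5)

y_obs = 1.0  # hypothetical observation

# Evidence p_Y(y): Riemann sum of likelihood * prior over a wide x-grid,
# approximating the integral in the denominator of equation [4].
n, lo, hi = 20000, -8.0, 8.0
dx = (hi - lo) / n
evidence = sum(likelihood(y_obs, lo + k * dx) * prior(lo + k * dx) for k in range(n + 1)) * dx

# Posterior density p_{X|Y}(x | y_obs), evaluated pointwise.
posterior_at = lambda x: likelihood(y_obs, x) * prior(x) / evidence

print(posterior_at(0.8))
```

For this conjugate Gaussian pair the posterior is again Gaussian, which gives a closed-form check on the numerical result.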
When events and random variables are mixed, Bayes’ theorem is given by equation [5].
$P(B_{i} \mid y) = \frac{p_{Y \mid B_{i}}(y \mid B_{i})P(B_{i})}{\sum_{j=1}^{n}p_{Y \mid B_{j}}(y \mid B_{j})P(B_{j})}$ [5]
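Equation [5] can be sketched as a discrete posterior over hypotheses computed from continuous likelihoods. The two hypotheses, their priors, the Gaussian likelihood parameters, and the measurement are all hypothetical values for illustration:

```python
import math

def gauss(y, mu, sigma):
    """Gaussian density N(y; mu, sigma^2)."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical setup: two hypotheses B_1, B_2, each assigning Y a
# different Gaussian density p_{Y|B_i}(y | B_i).
priors = [0.6, 0.4]
likelihoods = [lambda y: gauss(y, 0.0, 1.0),  # Y | B_1 ~ N(0, 1)
               lambda y: gauss(y, 3.0, 1.0)]  # Y | B_2 ~ N(3, 1)

y_obs = 2.0  # hypothetical measurement

# Equation [5]: mix densities and prior probabilities, then normalize.
num = [L(y_obs) * p for L, p in zip(likelihoods, priors)]
posterior = [v / sum(num) for v in num]

print(posterior)  # y_obs = 2.0 lies closer to 3, so B_2 is favored
```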