Probability
Experiments whose results are not known in advance, even though we can say what is likely to happen, are called random experiments. The set of all possible outcomes of a random experiment is called the sample space, and a set of outcomes (a subset of the sample space) is called an event.
The probability $P(A)$ that event $A$ occurs in a sample space $S$ is defined as any number that satisfies the following three axioms:
- Axiom 1 : Probability is always greater than or equal to 0 ($P(A) \geq 0$).
- Axiom 2 : The probability of the sample space is 1 ($P(S)=1$).
- Axiom 3 : In the case of mutually exclusive events A and B, the relationship $P(A \cup B) = P(A) + P(B)$ holds. Mutually exclusive events mean that $A \cap B = \varnothing$, and $\cup$, $\cap$, and $\varnothing$ denote union, intersection, and the empty set, respectively.
Probability is thus defined axiomatically; any assignment of numbers satisfying these axioms is a valid probability. From the three axioms above, we know that $P(\varnothing) = 0$ and $P(A) = 1 - P(\bar{A})$. Here, $\bar{A}$ denotes the complement of event $A$.
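As a quick illustration of the complement rule (not part of the original argument), the sketch below simulates a fair six-sided die with NumPy; the event $A$ = "the roll is even" and the sample size are assumptions chosen for the example.

```python
import numpy as np

# Minimal sketch: estimate P(A) and P(complement of A) for a fair die,
# where A = "the roll is even". The sample size is an arbitrary choice.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)   # outcomes e in the sample space {1,...,6}

p_A = np.mean(rolls % 2 == 0)              # relative frequency of event A
p_not_A = np.mean(rolls % 2 == 1)          # relative frequency of the complement

print(p_A, p_not_A, p_A + p_not_A)         # p_A + p_not_A equals 1
```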
Random Variable
A random variable $X \equiv X(e)$ is defined as a function that assigns one real number to each element ($e$) in a sample space. A random variable is written in uppercase, and the value that the random variable takes is written in lowercase.
For example, $X(e) = x$ means that the real number corresponding to the random variable in the random experiment is $x$. In short, it can be written as $X = x$.
The domain of a random variable is the sample space, and its range is the entire real number line, $-\infty < X < \infty$.
Since an event is a set of elements $e$ resulting from a random experiment, there is a corresponding real number interval $I$ for each event $A$. Therefore, if the probability of event $A$ is $P(A)$, the probability that the random variable $X$ belongs to the real number interval $I$ is $P(X \in I) = P(A)$.
Additionally, if a random variable ($X$) takes discrete values, it is called a discrete random variable; if it takes continuous values, it is called a continuous random variable.
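A minimal sketch of a random variable as a function on a sample space follows; the coin-toss sample space and the particular 0/1 assignment are illustrative assumptions.

```python
# Minimal sketch: a random variable X(e) maps each outcome e of the
# sample space to a real number. Here the sample space is a coin toss.
sample_space = ["heads", "tails"]

def X(e):
    """Random variable: assigns 1.0 to heads and 0.0 to tails."""
    return 1.0 if e == "heads" else 0.0

# X = x notation: for the outcome e = "heads", X(e) = x with x = 1.0
for e in sample_space:
    print(f"X({e!r}) = {X(e)}")
```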
Probability Distribution Function, Probability Density Function and Probability Mass Function
Probability Distribution Function
Since $(X \leq x)$ represents an event, it is possible to calculate the probability $P(X \leq x)$ for that event. The probability distribution function of a random variable $X$, denoted $F_{X}(x)$, is defined as $P(X \leq x)$.
The probability distribution function is expressed as follows:
$F_{X}(x) = P(X \leq x)$ [1]
According to the definition, $F_{X}(-\infty) = 0$ and $F_{X}(\infty) = 1$. Additionally, if $\Delta x \geq 0$, then $F_{X}(x+\Delta x) \geq F_{X}(x)$.
Probability Density Function
The probability density function $p_{X}(x)$ is defined as a function that satisfies the following equation [2]:
$\int_{-\infty}^{x} p_{X}(x)dx = P(X \leq x) = F_{X}(x)$ [2]
According to the definition above, if the probability distribution function is differentiable, the probability density function can be expressed as follows:
$p_{X}(x) = \frac{dF_{X}(x)}{dx} = \lim_{\Delta x \rightarrow 0} \frac{F_{X}(x+\Delta x) - F_{X}(x)}{\Delta x}$ [3]
The probability that the random variable $X$ belongs to the interval $(a, b]$ is calculated using the probability density function as follows:
$P(a < X \leq b) = F_{X}(b) - F_{X}(a) = \int^{b}_{a} p_{X}(x)dx$ [4]
According to the definition of the probability density function, $p_{X}(x) \geq 0$ and $\int_{- \infty}^{\infty} p_{X}(x) dx = 1$.
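As a numerical sanity check of these properties (using an assumed standard normal density, not a distribution from the text), the sketch below verifies that the density integrates to approximately 1 and that the resulting distribution function is nondecreasing.

```python
import numpy as np

# Minimal sketch with an assumed example density: the standard normal.
x = np.linspace(-10.0, 10.0, 20_001)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # p_X(x)
dx = x[1] - x[0]

total = np.sum(p) * dx                        # total probability, should be ~1
F = np.cumsum(p) * dx                         # crude F_X(x): integral of p_X up to x

print("integral of p_X over the real line:", total)
print("F_X is nondecreasing:", bool(np.all(np.diff(F) >= 0)))
```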
Probability Mass Function
A discrete random variable $X$ uses a probability mass function $\omega_{X}(x_{i})$ instead of a probability density function.
$\omega_{X}(x_{i})=P(X=x_{i}), i= 1, \ldots , n$ [5]
where $x_{i}, i = 1, \ldots, n$, are all the values that the discrete random variable can take.
By using the Dirac delta function ($\delta(x)$), the probability mass function can be expressed in the form of a probability density function.
$p_{X}(x)=\sum^{n}_{i=1}\omega_{X}(x_{i})\delta (x-x_{i})$ [6]
Note that the Dirac delta function is defined by the following two properties:
$\delta(x) = 0 \ \text{for} \ x \neq 0, \qquad \int_{- \infty}^{\infty} \delta(x)dx = 1$
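To illustrate the delta-train representation (the fair-die distribution and the interval below are assumed for the example), probabilities over an interval reduce to summing the weights $\omega_{X}(x_{i})$ of the points inside that interval:

```python
import numpy as np

# Minimal sketch: a discrete pmf stored as points x_i with weights w_i,
# mirroring p_X(x) = sum_i w_i * delta(x - x_i) for a fair six-sided die.
x_i = np.arange(1, 7)
w_i = np.full(6, 1 / 6)

def prob_interval(a, b):
    """P(a < X <= b): integrating the delta-train density amounts to
    summing the weights of the points x_i lying in (a, b]."""
    mask = (x_i > a) & (x_i <= b)
    return w_i[mask].sum()

print(prob_interval(2, 5))   # P(2 < X <= 5) = 3/6 = 0.5
```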
Joint Probability Function
The joint probability distribution function $F_{XY}(x,y)$ of random variables $X$ and $Y$ is defined as the probability of joint events as follows:
$F_{XY}(x, y) = P((X \leq x) \cap (Y \leq y))$ [7]
The joint probability density function $p_{XY}(x, y)$ is derived from the joint probability distribution function as follows:
$F_{XY}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} p_{XY}(x, y)dxdy$ [8]
If $F_{XY}(x, y)$ is differentiable, the joint probability density function can be expressed as follows:
$p_{XY}(x, y) = \frac{\partial^{2}F_{XY}(x, y)}{\partial x \partial y}$ [9]
Since $F_{X}(x) = F_{XY}(x, \infty)$, the probability density function of $X$ can be obtained using equation [10]. This is called the marginal density function of $X$.
$p_{X}(x) = \int_{-\infty}^{\infty}p_{XY}(x ,y)dy$ [10]
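The marginalization in equation [10] can be checked numerically; the sketch below assumes a joint density of two independent standard normals on a grid and recovers $p_{X}(x)$ by integrating over $y$.

```python
import numpy as np

# Minimal sketch with an assumed joint density: independent standard
# normals, p_XY(x, y) = p_X(x) * p_Y(y), discretized on a grid.
x = np.linspace(-6, 6, 601)
y = np.linspace(-6, 6, 601)
dy = y[1] - y[0]

gauss = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
p_xy = gauss(x)[:, None] * gauss(y)[None, :]     # joint density on the grid

# Marginal density of X: integrate the joint density over y (equation [10]).
p_x_marginal = np.sum(p_xy, axis=1) * dy

print(np.max(np.abs(p_x_marginal - gauss(x))))   # close to 0: matches p_X(x)
```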
Conditional Probability
The probability that event $A$ occurs given that event $B$ has occurred is called the conditional probability of $A$ given $B$, and it is defined by equation [11], where $P(A, B)$ denotes the joint probability $P(A \cap B)$.
$P(A \mid B) = \frac{P(A,B)}{P(B)}$ [11]
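Equation [11] can be verified by simulation; in the sketch below, the two-dice experiment and the events $A$ and $B$ are assumptions chosen for illustration.

```python
import numpy as np

# Minimal sketch: check P(A | B) = P(A, B) / P(B) by simulation.
# Assumed events for two fair dice: A = "first die shows 6",
# B = "the sum of the dice is at least 9".
rng = np.random.default_rng(0)
d1 = rng.integers(1, 7, size=200_000)
d2 = rng.integers(1, 7, size=200_000)

A = d1 == 6
B = (d1 + d2) >= 9

p_B = np.mean(B)
p_AB = np.mean(A & B)
p_A_given_B = np.mean(A[B])        # direct conditional relative frequency

print(p_A_given_B, p_AB / p_B)     # the two estimates agree
```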
The conditional probability density function $p_{X\mid Y}(x\mid y)$ of $X$, given that the random variable $Y$ is $y$, is defined as the probability that the event $(X \leq x)$ will occur given $Y = y$, as shown in equation [12].
$P(X \leq x \mid Y = y) = \int_{-\infty}^{x} p_{X\mid Y}(x\mid y)dx$ [12]
If event $A$ is $(X \leq x)$ and the condition $Y = y$ is interpreted as the infinitesimal interval $(y < Y \leq y + dy)$ for event $B$, then $p_{X\mid Y}(x\mid y)$ is derived from equation [11] as follows:
$p_{X\mid Y}(x\mid y) = \frac{p_{XY}(x, y)}{p_{Y}(y)}, p_{Y}(y) \neq 0$ [13]
The conditional probability of event $A$ occurring given that $X = x$ is:
$P(A \mid X = x) = \frac{p(A, x)}{p_{X}(x)}, \quad p_{X}(x) \neq 0$ [14]
where $p(A, x)$ denotes the joint probability-density of event $A$ and $X = x$.
Conversely, the conditional probability density function of $X$ given event $A$ is:
$p_{X \mid A}(x \mid A) = \frac{p(A, x)}{P(A)}, \quad P(A) \neq 0$ [15]
Independent Random Variable
If the joint probability of events $A$ and $B$ equals the product of the probabilities of $A$ and $B$, then $A$ and $B$ are called independent events:
$P(A, B) = P(A)P(B)$ [16]
If the joint probability of the $n$ events $A_{i}, i=1, \ldots, n$, satisfies equation [17], the events are called independent.
$P(\bigcap_{i=1}^{n}A_{i}) = \prod_{i=1}^{n}P(A_{i})$ [17]
Likewise, if the joint probability density function of $n$ random variables $X_{1}, \ldots, X_{n}$ satisfies equation [18], the random variables are independent.
$p_{X_{1} \cdots X_{n}}(x_{1}, \ldots, x_{n}) = \prod_{i=1}^{n}p_{X_{i}}(x_{i})$ [18]
If two random variables $X$ and $Y$ are independent, the conditional probability density function becomes a function independent of the condition, as shown in equation [19].
$p_{X\mid Y}(x \mid y) = p_{X}(x)$ [19]
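A quick empirical check of independence (the distributions of $X$ and $Y$ and the thresholds below are assumed): for independent random variables, $P(X \leq a, Y \leq b)$ should factor into $P(X \leq a)P(Y \leq b)$.

```python
import numpy as np

# Minimal sketch: for independent X and Y, P(X <= a, Y <= b) should equal
# P(X <= a) * P(Y <= b). X, Y and the thresholds a, b are assumed choices.
rng = np.random.default_rng(0)
x = rng.normal(size=200_000)
y = rng.uniform(size=200_000)
a, b = 0.5, 0.3

joint = np.mean((x <= a) & (y <= b))
product = np.mean(x <= a) * np.mean(y <= b)

print(joint, product)    # approximately equal for independent samples
```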
Function of Random Variables
If the random variable $Y$ is given as a function $Y = g(X)$ of the random variable $X$, the probability of the event $Y \leq y$ is the same as the probability that the random variable $X$ belongs to the interval $I_{x}$ satisfying $g(X) \leq y$. Thus, the probability distribution function can be calculated as shown in equation [20].
$F_{Y}(y) = P(Y \leq y) = P(g(X) \leq y) = P(X \in I_{x})$ [20]
For example, let’s assume the functional relationship between two random variables $X$ and $Y$ is $Y = 2X + 3$. Then, the interval of $X$ satisfying $2X + 3 \leq y$ is $X \leq \frac{y-3}{2}$, so the distribution function of $Y$ is calculated as shown in equation [21].
$F_{Y}(y) = P(2X + 3 \leq y) = P\left(X \leq \frac{y-3}{2}\right) = F_{X}\left(\frac{y-3}{2}\right)$ [21]
The probability density function of $Y$ can be calculated as shown in equation [22].
$p_{Y}(y) = \frac{dF_{Y}(y)}{dy} = \frac{d}{dy}\left[F_{X}\left(\frac{y-3}{2}\right)\right] = \frac{1}{2}p_{X}\left(\frac{y-3}{2}\right)$ [22]
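Equation [22] can be checked by sampling; the sketch below assumes $X$ is standard normal (purely for illustration) and compares a normalized histogram of $Y = 2X + 3$ with $\frac{1}{2}p_{X}(\frac{y-3}{2})$.

```python
import numpy as np

# Minimal sketch: check p_Y(y) = (1/2) p_X((y-3)/2) for Y = 2X + 3 by
# sampling. X is assumed to be standard normal purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=200_000)
y = 2 * x + 3

gauss = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

# Compare a normalized histogram of y with the transformed density.
counts, edges = np.histogram(y, bins=80, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
predicted = 0.5 * gauss((centers - 3) / 2)

print(np.max(np.abs(counts - predicted)))   # small: histogram matches (1/2) p_X((y-3)/2)
```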
It is also possible to calculate the probability density function of a function of two random variables. Let’s assume that $Z$ is the sum of two random variables $X$ and $Y$: $Z = X + Y$. If we know the joint probability density function $p_{XY}(x, y)$, then $P(Z \leq z)$ is equivalent to $P(X + Y \leq z)$, so the probability distribution function of $Z$ is given by equation [23]:
$F_{Z}(z) = P(X+Y \leq z) =\int_{-\infty}^{\infty} \int_{-\infty}^{z-x} p_{XY}(x, y)dydx$ [23]
$\therefore$ The probability density function of $Z$ is given by equation [24]:
$p_{Z}(z) = \frac{dF_{Z}(z)}{dz} = \int_{-\infty}^{\infty}\frac{d}{dz}\int_{-\infty}^{z-x}p_{XY}(x,y)\,dy\,dx$ [24]
Next, applying the Leibniz integral rule (equation [28]) to equation [24] gives equation [25].
$p_{Z}(z) = \int_{-\infty}^{\infty}p_{XY}(x, z-x)dx$ [25]
Likewise, equation [25] can also be written as equation [26].
$p_{Z}(z) = \int_{-\infty}^{\infty}p_{XY}(z-y,y)dy$ [26]
If $X$ and $Y$ are independent, then $p_{XY}(x, z-x) = p_{X}(x)p_{Y}(z-x)$, and equation [25] becomes a convolution, as shown in equation [27].
$p_{Z}(z) = \int_{-\infty}^{\infty}p_{X}(x)p_{Y}(z-x)dx \equiv p_{X}(z) \ast p_{Y}(z)$ [27]
Leibniz Integral Rule
$\frac{d}{dx}\int_{a(x)}^{b(x)}f(x,t)dt = f(x, b(x)) \cdot b'(x) - f(x, a(x)) \cdot a'(x) + \int_{a(x)}^{b(x)} \frac{\partial}{\partial x} f(x, t) dt$ [28]
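The convolution in equation [27] can be evaluated numerically; the sketch below assumes two independent standard normal densities, for which the sum is known to be normal with variance 2.

```python
import numpy as np

# Minimal sketch: numerically evaluate the convolution in equation [27]
# for two assumed independent standard normal densities. The analytic
# answer is a zero-mean normal with variance 2.
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
gauss = lambda t, var=1.0: np.exp(-t**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

z = np.linspace(-6, 6, 121)
p_z = np.array([np.sum(gauss(x) * gauss(zi - x)) * dx for zi in z])

print(np.max(np.abs(p_z - gauss(z, var=2.0))))   # close to 0: matches N(0, 2)
```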
Sampling
A sample extracted from the random variable $X$, whose probability density function is $p_{X}(x)$, is written as follows:
$x \sim p_{X}(x)$ [29]
Assume that $N$ samples extracted from a random variable are $\{x^{(1)}, x^{(2)}, \ldots, x^{(N)}\}$. If each sample is extracted independently and with equal probability, the probability of each sample being extracted is given by equation [30].
$P(X = x^{(i)}) = \frac{1}{N}, \quad i = 1, \ldots, N$ [30]
As shown in equation [30], if each sample is drawn independently from the same distribution, the extracted samples are called IID (independent and identically distributed) samples. By using equation [6], we can approximate $p_{X}(x)$ as follows:
$p_{X}(x) \approx \frac{1}{N}\sum_{i=1}^{N} \delta(x-x^{(i)})$ [31]
Then we can calculate the probability $P(x < X \leq x + \Delta x)$ of $X$ belonging to the interval $(x, x + \Delta x]$ as shown in equation [32].
$P(x < X \leq x + \Delta x) \approx \frac{1}{N} \sum_{i=1}^{N} \int_{x}^{x+\Delta x} \delta (u - x^{(i)})\,du = \frac{\text{the number of samples that belong to the interval } (x, x + \Delta x]}{N}$ [32]
Therefore, a histogram that counts the number of samples in each bin has the same shape as this approximation of the probability density function $p_{X}(x)$. One difference is that the area under a probability density function must be 1, so if the histogram is normalized to unit area, its shape comes even closer to the probability density function.
For example, let’s approximate the probability density function of $Z$ from the example below by drawing 10,000 samples each from $X$ and $Y$. The figure below shows the probability density function of $Z$ approximated from the samples $z^{(i)} = x^{(i)} + y^{(i)}$.
As the figure shows, the approximation closely matches the analytical result of the example.
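A sketch of how such a figure could be produced (the sample size of 10,000 follows the text; the bin count and random seed are assumptions): draw the samples, form $z^{(i)} = x^{(i)} + y^{(i)}$, and compare the normalized histogram with the triangular density derived in the example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Minimal sketch: approximate p_Z(z) for Z = X + Y with X, Y ~ Uniform(0, 1),
# using a normalized histogram of 10,000 samples (equation [32]).
rng = np.random.default_rng(0)
N = 10_000
x = rng.uniform(0.0, 1.0, size=N)
y = rng.uniform(0.0, 1.0, size=N)
z = x + y

plt.hist(z, bins=50, density=True, alpha=0.5, label="normalized histogram of z")

# Analytical triangular density derived in the example.
zz = np.linspace(0.0, 2.0, 201)
p_z = np.where(zz <= 1.0, zz, 2.0 - zz)
plt.plot(zz, p_z, "r", label="analytical p_Z(z)")
plt.xlabel("z")
plt.ylabel("p_Z(z)")
plt.legend()
plt.show()
```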
Example
Question.
Assume that $X$ and $Y$ are independent random variables whose probability density functions are:
$p_{X}(x)=\left\{\begin{matrix} 1, \ \ 0\leq x \leq 1\\ 0, \ \ \text{otherwise} \end{matrix}\right.$
$p_{Y}(y)=\left\{\begin{matrix} 1, \ \ 0\leq y \leq 1\\ 0, \ \ \text{otherwise} \end{matrix}\right.$
Find the probability density function of $Z = X + Y$.
Answer.
Since $X$ and $Y$ are independent, equation [27] gives $p_{Z}(z) = \int_{-\infty}^{\infty} p_{X}(x)p_{Y}(z-x)dx$. The integrand is nonzero only when $0 \leq x \leq 1$ and $0 \leq z-x \leq 1$, that is, when $\max(0, z-1) \leq x \leq \min(1, z)$. Therefore,
- $p_{Z}(z)=0, z <0$
- $p_{Z}(z)= \int_{0}^{z}dx = z, 0 \leq z \leq 1$
- $p_{Z}(z)= \int_{z-1}^{1}dx = 2-z, 1 < z \leq 2$
- $p_{Z}(z)= 0, z > 2$