Basic Calculation

\[(A + B)^T = A^T + B^T \]

\[(AB)^T = B^TA^T\]

\[(A^T)^{-1} = (A^{-1})^T\]

\[(AB)^{-1} = B^{-1}A^{-1}\]
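
The four identities above can be spot-checked on small matrices. The following is a minimal sketch in plain Python using exact fractions; the helper names (`transpose`, `add`, `matmul`, `inverse_2x2`) and the two example matrices are illustrative assumptions, and the inverse identities only make sense when the matrices involved are invertible.

```python
from fractions import Fraction


def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def add(a, b):
    """Entry-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a, b):
    """Matrix product: rows of a against columns of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical invertible example matrices.
A = [[2, 1], [5, 3]]
B = [[1, 2], [3, 4]]

sum_rule = transpose(add(A, B)) == add(transpose(A), transpose(B))
product_rule = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
inverse_transpose_rule = inverse_2x2(transpose(A)) == transpose(inverse_2x2(A))
inverse_product_rule = inverse_2x2(matmul(A, B)) == matmul(inverse_2x2(B), inverse_2x2(A))
```

A check on one pair of matrices is not a proof, but it is a quick way to catch a reversed factor order in the product rules.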

Probability and Statistics

Random Variable

In probability and statistics, a random variable, random quantity, aleatory variable, or stochastic variable is a variable whose possible values are numerical outcomes of a random phenomenon.

A random variable is defined as a function that maps outcomes to numerical quantities (labels), typically real numbers. As a function, a random variable is required to be measurable, which rules out certain pathological cases where the quantity which the random variable returns is infinitely sensitive to small changes in the outcome.

A random variable has a probability distribution, which specifies the probability that its value falls in any given interval. Random variables can be:

  • discrete, taking any of a specified finite or countable list of values, endowed with a probability mass function characteristic of the random variable’s probability distribution;
  • continuous, taking any numerical value in an interval or collection of intervals, via a probability density function that is characteristic of the random variable’s probability distribution;
  • a mixture of both types.

Probability Distribution

In probability theory and statistics, a probability distribution is a mathematical function that, stated in simple terms, can be thought of as providing the probabilities of occurrence of different possible outcomes in an experiment.

For instance, if the random variable X is used to denote the outcome of a coin toss (‘the experiment’), then the probability distribution of X would take the value 0.5 for X=heads, and 0.5 for X=tails (assuming the coin is fair).
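
As a rough sketch of the coin-toss example, the random variable can be written as a mapping from outcomes to numbers, and its distribution as a table of probabilities. The numeric labels (1 for heads, 0 for tails) and the variable names are assumptions made for illustration only.

```python
from fractions import Fraction

# Sample space of the experiment, and a hypothetical numeric labelling:
# the random variable X maps each outcome to a number.
outcomes = ["heads", "tails"]
X = {"heads": 1, "tails": 0}

# Fair coin: each outcome has probability 1/2.
outcome_prob = {outcome: Fraction(1, 2) for outcome in outcomes}

# Distribution of X: total probability attached to each value X can take.
pmf = {}
for outcome, p in outcome_prob.items():
    pmf[X[outcome]] = pmf.get(X[outcome], 0) + p

total = sum(pmf.values())  # a valid distribution sums to 1
```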

In more technical terms, the probability distribution is a description of a random phenomenon in terms of the probabilities of events.

Probability distributions are generally divided into two classes:

  • discrete probability distribution: probability mass function (the set of possible outcomes is discrete)
  • continuous probability distribution: probability density functions (the set of possible outcomes can take on values in a continuous range)


The following terms are used for non-cumulative probability distribution functions:

  • Distribution, frequency distribution: a table that displays the frequency of various outcomes in a sample.
  • Probability distribution: a table that displays the probabilities of various outcomes in a sample. It could be called a “normalized frequency distribution table”, where the relative frequencies of all outcomes sum to 1.
  • Distribution function: a functional form of the frequency distribution table.
  • Probability distribution function: a functional form of the probability distribution table. It could be called a “normalized frequency distribution function”, where the area under the graph equals 1.


  • Probability mass, Probability mass function, p.m.f., Discrete probability distribution function: for discrete random variables.
  • Categorical distribution: for discrete random variables with a finite set of values.
  • Probability density, Probability density function, p.d.f., Continuous probability distribution function: most often reserved for continuous random variables.

The following terms are somewhat ambiguous as they can refer to non-cumulative or cumulative distributions, depending on authors’ preferences:

  • Probability distribution function: continuous or discrete, non-cumulative or cumulative.
  • Probability function: even more ambiguous, can mean any of the above or other things.

*Basic terms:*

  • Mode: for a discrete random variable, the value with highest probability (the location at which the probability mass function has its peak); for a continuous random variable, a location at which the probability density function has a local peak.
  • Support: the smallest closed set whose complement has probability zero.
  • Head: the range of values where the pmf or pdf is relatively high.
  • Tail: the complement of the head within the support; the large set of values where the pmf or pdf is relatively low.
  • Expected value or mean: the weighted average of the possible values, using their probabilities as their weights; or the continuous analog thereof.
  • Median: the value such that the set of values less than the median, and the set greater than the median, each have probabilities no greater than one-half.
  • Variance: the second moment of the pmf or pdf about the mean; an important measure of the dispersion of the distribution.
  • Standard deviation: the square root of the variance, and hence another measure of dispersion.
  • Symmetry: a property of some distributions in which the portion of the distribution to the left of a specific value is a mirror image of the portion to its right.
  • Skewness: a measure of the extent to which a pmf or pdf “leans” to one side of its mean. The third standardized moment of the distribution.
  • Kurtosis: a measure of the “fatness” of the tails of a pmf or pdf. The fourth standardized moment of the distribution.
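
Several of the terms above can be computed directly from a probability mass function. The sketch below uses a small made-up pmf; the values, the probabilities, and the helper `moment_about` are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical pmf on the values 1, 2, 3 (probabilities sum to 1).
pmf = {1: Fraction(1, 2), 2: Fraction(3, 10), 3: Fraction(1, 5)}


def moment_about(pmf, center, order):
    """Moment of the given order about `center`: sum of p * (x - center)^order."""
    return sum(p * (x - center) ** order for x, p in pmf.items())


# Expected value: possible values weighted by their probabilities.
mean = sum(x * p for x, p in pmf.items())

# Variance: second moment about the mean; standard deviation: its square root.
variance = moment_about(pmf, mean, 2)
std_dev = sqrt(variance)

# Skewness and kurtosis: third and fourth standardized moments.
skewness = float(moment_about(pmf, mean, 3)) / std_dev ** 3
kurtosis = float(moment_about(pmf, mean, 4)) / std_dev ** 4

# Mode: the value with the highest probability.
mode = max(pmf, key=pmf.get)

# Support of a finite pmf: the values that carry positive probability.
support = sorted(x for x, p in pmf.items() if p > 0)
```

For this pmf most of the mass sits on the smallest value, so the skewness comes out positive: the distribution leans toward a longer right tail.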

Cumulative Distribution Function

Because a probability distribution P on the real line is determined by the probability of a scalar random variable X being in a half-open interval (−∞, x], the probability distribution is completely characterized by its cumulative distribution function:

\[ F(x) = P[X \leq x] \quad \text{for all } x \in \mathbb{R}. \]
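
For a discrete random variable, the cumulative distribution function evaluated at a point x is just the total mass on values not exceeding x. A minimal sketch, assuming a fair six-sided die as the example distribution:

```python
from fractions import Fraction

# Fair six-sided die, used here purely as an illustration.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def cdf(x):
    """P[X <= x]: add up the mass on every value not exceeding x."""
    return sum(p for value, p in pmf.items() if value <= x)
```

Under these assumptions `cdf(3)` and `cdf(3.5)` agree, because the function only jumps at the six possible values and is flat in between.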

Discrete Probability Distribution

A discrete probability distribution is a probability distribution characterized by a probability mass function. Thus, the distribution of a random variable X is discrete, and X is called a discrete random variable, if

\[ \sum_{u} P(X=u) = 1 \]

Continuous Probability Distribution

A continuous probability distribution is a probability distribution whose cumulative distribution function is continuous. Most often such a distribution is specified by a probability density function f, in which case

\[ P[a \leq X \leq b] = \int_{a}^{b} f(x)dx \]

\[ F(x) = \mu((-\infty, x]) = \int_{-\infty}^{x} f(t)\,dt. \]
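
The link between the density and the cumulative distribution function can be checked numerically. The sketch below assumes an exponential distribution with a made-up rate parameter and uses a simple midpoint rule; `RATE`, `integrate`, and the interval endpoints are illustrative choices, not part of the text above.

```python
from math import exp

RATE = 2.0  # hypothetical rate parameter of an exponential distribution


def pdf(x):
    """Density f(x) of the exponential distribution."""
    return RATE * exp(-RATE * x) if x >= 0 else 0.0


def cdf(x):
    """Closed-form cumulative distribution function F(x)."""
    return 1.0 - exp(-RATE * x) if x >= 0 else 0.0


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


a, b = 0.5, 1.5
prob_from_pdf = integrate(pdf, a, b)  # integral of f over [a, b]
prob_from_cdf = cdf(b) - cdf(a)       # F(b) - F(a)
```

The two numbers should agree up to the discretization error of the midpoint rule, which shrinks as `steps` grows.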


In probability and statistics, population mean and expected value are used synonymously to refer to one measure of the central tendency either of a probability distribution or of the random variable characterized by that distribution.

In the case of a discrete probability distribution of a random variable X, the mean is equal to the sum over every possible value weighted by the probability of that value:

\[ \mu=\sum xP(x) \]

An analogous formula applies to the case of a continuous probability distribution. Moreover, for some distributions the mean is infinite: for example, when the probability of the value \(2^n\) is \(\frac{1}{2^{n}}\) for n = 1, 2, 3, ….
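
To see why such a mean is infinite, read the example as a variable taking the value \(2^n\) with probability \(\frac{1}{2^n}\). Every term of the sum then contributes exactly 1, so the partial sums grow without bound even though the probabilities themselves sum to 1. A small sketch, with `partial_mean` as an illustrative helper name:

```python
from fractions import Fraction


def partial_mean(terms):
    """Sum of x * P(x) over the first `terms` values x = 2^n with P(x) = 1/2^n."""
    return sum(Fraction(2 ** n) * Fraction(1, 2 ** n) for n in range(1, terms + 1))


# Partial sums of the mean after 1, 10, and 100 terms.
partial_sums = [partial_mean(k) for k in (1, 10, 100)]

# The probabilities alone approach 1: after 100 terms only 1/2^100 is missing.
total_probability_100 = sum(Fraction(1, 2 ** n) for n in range(1, 101))
```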

Types of Mean

Pythagorean Mean

Arithmetic Mean

\[ \overline{x}=\frac{x_1+x_2+...+x_n}{n} \]

Geometric Mean

\[ \overline{x}=(\prod_{i=1}^{n} x_i)^{\frac{1}{n}} \]

Harmonic Mean

The harmonic mean is an average which is useful for sets of numbers which are defined in relation to some unit, for example speed (distance per unit of time).

\[ \overline{x}=n\cdot (\sum_{i=1}^{n} \frac{1}{x_i})^{-1} \]

Relationship between AM, GM, and HM

For positive numbers, AM, GM, and HM satisfy these inequalities, with equality exactly when all the numbers are equal:

\[ AM \geq GM \geq HM \]
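
The three Pythagorean means and their ordering can be sketched as follows. The function names and the two example speeds are assumptions for illustration; all inputs are taken to be positive.

```python
from math import prod


def arithmetic_mean(xs):
    """Sum of the values divided by their count."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """n-th root of the product of n positive values."""
    return prod(xs) ** (1 / len(xs))


def harmonic_mean(xs):
    """Count divided by the sum of reciprocals of positive values."""
    return len(xs) / sum(1 / x for x in xs)


# Hypothetical speeds driven over two legs of equal distance.
speeds = [40.0, 60.0]
am = arithmetic_mean(speeds)
gm = geometric_mean(speeds)
hm = harmonic_mean(speeds)
```

Here the harmonic mean is the physically meaningful average speed for the whole trip, since the slower leg takes more time; the arithmetic mean would overstate it.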

Mean of a Probability Distribution

The mean of a probability distribution is the long-run arithmetic average value of a random variable having that distribution. In this context, it is also known as the expected value.

For a discrete probability distribution, the mean is given by \(\sum xP(x)\).

For a continuous distribution, the mean is \(\int_{-\infty}^{\infty}xf(x)dx\).

Generalized Means

Power Mean

The generalized mean, also known as the power mean or Hölder mean, is an abstraction of the quadratic, arithmetic, geometric and harmonic means. It is defined for a set of n positive numbers \(x_i\) by

\[ \overline{x}(m)=(\frac{1}{n}\cdot\sum_{i=1}^{n} x_i^m)^{\frac{1}{m}} \]

By choosing different values for the parameter m, the following types of means are obtained:

  • \(m\to\infty\): maximum of \(x_i\)
  • \(m=2\): quadratic mean
  • \(m=1\): arithmetic mean
  • \(m\to 0\): geometric mean
  • \(m=-1\): harmonic mean
  • \(m\to -\infty\): minimum of \(x_i\)

More generally, for a continuous, strictly monotonic function f, the generalized f-mean is

\[ \overline{x}=f^{-1}(\frac{1}{n}\cdot\sum_{i=1}^{n}f(x_i)) \]

  • \(f(x)=x\): arithmetic mean
  • \(f(x)=\frac{1}{x}\): harmonic mean
  • \(f(x)=x^m\): power mean
  • \(f(x)=\ln x\): geometric mean
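
Both generalizations are short to write down in code. This is a sketch under the assumption that all inputs are positive; `power_mean`, `f_mean`, and the sample data are illustrative names, and the case m = 0 is handled as the geometric-mean limit rather than by the formula itself.

```python
from math import exp, log


def power_mean(xs, m):
    """Power mean of positive numbers; m = 0 is treated as the geometric-mean limit."""
    if m == 0:
        return exp(sum(log(x) for x in xs) / len(xs))
    return (sum(x ** m for x in xs) / len(xs)) ** (1 / m)


def f_mean(xs, f, f_inverse):
    """Apply f, take the arithmetic mean, then map back with the inverse of f."""
    return f_inverse(sum(f(x) for x in xs) / len(xs))


data = [1.0, 2.0, 4.0]

quadratic = power_mean(data, 2)
arithmetic = power_mean(data, 1)
geometric = power_mean(data, 0)
harmonic = power_mean(data, -1)

# The same geometric and harmonic means, obtained through the f-mean form.
geometric_via_f = f_mean(data, log, exp)
harmonic_via_f = f_mean(data, lambda x: 1 / x, lambda y: 1 / y)
```

Large positive or negative values of m push the result toward the maximum or minimum of the data, which matches the limiting cases in the list above.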

Weighted Arithmetic Mean

\[\overline{x}=\frac{\sum_{i=1}^{n}w_i\cdot x_i}{\sum_{i=1}^{n}w_i}\]
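
A minimal sketch of the weighted arithmetic mean, assuming the weights are non-negative and do not all vanish. The function name and the grades-and-credits example are made up for illustration.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical course grades weighted by credit hours.
grades = [90.0, 80.0, 70.0]
credits = [3.0, 2.0, 1.0]
weighted = weighted_mean(grades, credits)

# With equal weights the result reduces to the ordinary arithmetic mean.
unweighted = weighted_mean(grades, [1.0, 1.0, 1.0])
```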

Expected Value

In probability theory, the expected value of a random variable, intuitively, is the long-run average value of repetitions of the experiment it represents. Less roughly, the law of large numbers states that the arithmetic mean of the values almost surely converges to the expected value as the number of repetitions approaches infinity.

The expected value is also known as the expectation, mathematical expectation, EV, average, mean value, mean, or first moment.

The expected value is a key aspect of how one characterizes a probability distribution; it is one type of location parameter. By contrast, the variance is a measure of dispersion of the possible values of the random variable around the expected value.
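
The long-run-average interpretation can be illustrated with a simulation. The sketch below assumes a fair six-sided die and a fixed random seed so that runs are repeatable; the sample sizes are arbitrary choices.

```python
import random
from fractions import Fraction

faces = range(1, 7)

# Exact expected value of a fair die: each face weighted by probability 1/6.
expected = sum(Fraction(face, 6) for face in faces)

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Arithmetic mean of n simulated die rolls."""
    return sum(rng.choice(faces) for _ in range(n)) / n


# Sample means for increasingly long runs of the experiment.
means = {n: sample_mean(n) for n in (100, 10_000, 100_000)}
```

The sample means need not approach the expected value monotonically, but for long runs they should typically land close to it.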


Univariate discrete random variable, finite case

\[ E[X] = \frac{\sum_{i=1}^{k} x_ip_i}{\sum_{i=1}^{k} p_i} = \frac{\sum_{i=1}^{k} x_ip_i}{1} = \sum_{i=1}^{k} x_ip_i \]

Univariate discrete random variable, countably infinite case

\[ E[X] = \sum_{i=1}^{\infty} x_ip_i \]

If this series does not converge absolutely, we say that the expected value of X does not exist.

Univariate continuous random variable

If the probability distribution of X admits a probability density function f(x), then the expected value can be computed as

\[ E[X] = \int_{-\infty}^{\infty}xf(x)dx \]

provided the integral converges.

General Definition

In general, if X is a random variable defined on a probability space (Ω, Σ, P), then the expected value of X, denoted by E[X], ⟨X⟩, or \(\overline{X}\), is defined as the Lebesgue integral

\[ E[X] = \int_{\Omega}XdP = \int_{\Omega}X(\omega)dP(\omega) \]

provided this integral exists.

The expected value of a measurable function of X, g(X), given that X has a probability density function f(x), is given by the inner product of f and g:

\[ E[g(X)] = \int_{-\infty}^{\infty} g(x)f(x)dx \]



\[ E[c] = c \]

\[ E[E[X]] = E[X] \]


\[ E[X+Y] = E[X] + E[Y] \]

\[ E[aX] = aE[X] \]

\[ E[aX+bY+c] = aE[X] + bE[Y] + c \]
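
Linearity holds whether or not X and Y are independent. The sketch below checks it exactly on a small joint distribution in which X and Y are deliberately dependent; the joint pmf, the constants a, b, c, and the helper `expect` are assumptions for illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y); X and Y are deliberately dependent.
joint = {
    (0, 0): Fraction(1, 2),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(3, 10),
}


def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


a, b, c = 3, -2, 5

# Left-hand side: expectation of the combined variable aX + bY + c.
lhs = expect(lambda x, y: a * x + b * y + c)

# Right-hand side: the same combination of the individual expectations.
rhs = a * expect(lambda x, y: x) + b * expect(lambda x, y: y) + c

# The expectation of a constant is the constant itself.
constant = expect(lambda x, y: 7)
```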

Iterated Expectation

Iterated expectation for discrete random variables

\[ E[X|Y=y] = \sum_{x} x\cdot P(X=x|Y=y) \]

\[ E[X] = E[E[X|Y]] \]
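
For discrete variables, iterated expectation can be verified by first averaging X within each value of Y and then averaging those conditional means over the distribution of Y. A sketch on a made-up joint pmf; the table and the helper names are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y).
joint = {
    (1, 0): Fraction(1, 4),
    (2, 0): Fraction(1, 4),
    (2, 1): Fraction(1, 6),
    (3, 1): Fraction(1, 3),
}

# Marginal distribution of Y: sum the joint pmf over x.
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p


def cond_exp_x_given(y):
    """E[X | Y = y]: sum of x * P(X = x | Y = y)."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y]


# Direct expectation of X from the joint pmf.
e_x = sum(x * p for (x, _), p in joint.items())

# Outer expectation over Y of the inner conditional expectations.
e_of_cond = sum(cond_exp_x_given(y) * p for y, p in p_y.items())
```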

Iterated expectation for continuous random variables

\[ E[X] = E[E[X|Y]] \]

\[ \text{if } X \leq Y, \text{ then } E[X] \leq E[Y] \]


If one considers the joint probability density function of X and Y, say j(x,y), then the expectation of XY is

\[ E[XY] = \int\int xy\,j(x,y)\,dx\,dy \]

In general, the expected value operator is not multiplicative, i.e. E[XY] is not necessarily equal to E[X]·E[Y]. The amount by which multiplicativity fails is called the covariance:

\[ Cov(X,Y) = E[XY] - E[X]E[Y] \]
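
The covariance formula can be evaluated exactly on small discrete examples. The sketch below compares a dependent pair with an independent pair built from the same marginals; both joint tables and the function name `covariance` are assumptions for illustration.

```python
from fractions import Fraction


def covariance(joint):
    """Cov(X, Y) = E[XY] - E[X]E[Y] for a joint pmf given as {(x, y): p}."""
    e_x = sum(p * x for (x, _), p in joint.items())
    e_y = sum(p * y for (_, y), p in joint.items())
    e_xy = sum(p * x * y for (x, y), p in joint.items())
    return e_xy - e_x * e_y


# Hypothetical dependent pair: X and Y tend to take the same value.
dependent = {
    (0, 0): Fraction(1, 2),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(3, 10),
}

# Independent pair with the same marginals: joint pmf is the product of marginals.
marginal = {0: Fraction(3, 5), 1: Fraction(2, 5)}
independent = {(x, y): px * py for x, px in marginal.items() for y, py in marginal.items()}

cov_dependent = covariance(dependent)
cov_independent = covariance(independent)
```

For the independent pair the expectation is multiplicative, so the covariance vanishes. The converse does not hold in general: zero covariance does not imply independence.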



Standard Deviation