3. Probablistic methods

This time we will dive into probability, a topic that seems unrelated to graph theory but sometimes is extremely useful in proofs and analysis.

Definitions¶

A random variable is just a variable that we take very randomly. For example, the outcome of rolling a standard six-sided dice and the outcome of a coin toss are both random variables.

We denote the probability that certain event \(A\) will happen as \(P(A)\) . For instance, if \(D_6\) is the outcome of rolling a regular dice, then

\[P(D_1=1)=P(D_2=2)=\cdots =P(D_6=6)=\frac16\]

Or if we know that some event will never happen, such as rolling a 7 on a regular dice, then we say \(P(D_6=7)=0\).

If we rolled a regular dice infinitely many times, what will we expect to get as an average of our outcome? This value is the expected value \(E[X]\) of a random variable \(X\). A formal way to say it is:

\[ E(X)=\sum_{x}P(X=x)\cdot x \]

In our previous example,

\[ E(D_6)=\frac16\cdot1+\frac16\cdot2+\cdots+\frac16\cdot6=3.5 \]

In statistics and probability theory, variance is a measure of dispersion from the average value. It is calculated by taking the sum of the square of the differences between each number in a data set and the average value. If we have a data set with \(n\) values \(x_1,x_2,\dots,x_n\), then the variance is

\[ \frac{\sum(x_i-\bar{x})^2}{n} \]

We can generalize this notion for a random variable \(X\),

\[ \text{Var}(X) =E(|X-E(X)|^2)=E(|X|^2)-E(|X|)^2 \]

Exercises:

What is the expected value of flippinga coin if we say head is 1 and tail is 0?
What is the expected value of the sum of the outcome of 2 coins? 3 coins? What patterns do you find?
What is the variance of the outcome of a standard dice?

We say two events \(A\) and \(B\) are independent of each other, if \(A\)'s outcome doesn't affect \(B\)'s outcome. Similarly, we say two random variables are independent if their values don't affect each other. For example, getting a head on your first flip does not mean you will get a head on your second flip. More formally:

\[ P(A\cap B)=P(A)P(B) \]

It means that the probability of both event happening is equal to the probability of \(A\) multiplies the probability of \(B\). For example, the probability of getting two heads in a row is \(0.5\times 0.5=0.25\).

Denote \(P(A|B)\) as the probability of \(A\) when we know that \(B\) occurs. This is called the conditional probability. It is defined as:

\[ P(A|B) = \frac{P(A\cap B)}{P(B)} \]

This requires \(P(B) > 0\), otherwise this event will never happen and thus \(P(A|B)\) is undefined. When \(A\) and \(B\) are independent, we have \(P(A|B)=P(A)\) , since it doesn't matter if \(B\) happens or not.

Let's see some examples for conditional probability:

Alice rolled two dices in a row and she got no greater than 3 in her first roll. What is the probability that she will get a sum of 10?
If in a school there are \(40\%\) boys and \(60\%\) girls, and \(10\%\) of the students are in the school's track-and-field team. If \(5\%\) of all the boys are in the track-and-field team, what's the percentage of girls that are in the team? What's the percentage of the team that are girls?

By the definition of conditional probability we are able to get a famous theorem in probability theory and statistics:

Bayes Theorem¶

(Bayes Theorem) Let \(A,B\) be two events,

\[ P(A|B)=\frac{P(B|A)\cdot P(A)}{P(B)} \]

Here's a nice video made by 3Blue1Brown, a famous math youtuber, if you would like to understand Bayes Theorem in further details.

Markov's Inequality¶

(Markov's Inequality) Let \(X\) be a non-negative random variable. Then for any positive real \(\lambda >0\),

\[ P(X\ge \lambda)\le \frac{E(X)}{\lambda} \]

Proof: This should be quite straightforward by using the definition of expected value. We have

\[ E(X)=\sum_{x}x\cdot P(X=x)\ge \sum_{x\ge \lambda}x\cdot P(X=x)\ge \lambda\cdot P(X\ge \lambda) \]

Intuitively, this suggests that \(X\) is not likely to be a lot higher than \(E(X)\). For example, if the average grade on AP calculus final exam is 70%, we would know that the upper bound of students getting higher than 90% is

\[ P(X\ge 90)\le \frac{E[X]}{90}=\frac79\approx77.8\% \]

However, the equality holds only when actually \(7/9\) of the class is getting exactly 90% and \(2/9\) is getting zero, which is rare in real life situations. How do we improve our inequality?

Chebyshev's Inequality¶

(Chebyshev's Inequality) Let \(X\) be any random variable with finite expected value and variance. For every positive real number \(a\),

\[ P(|X-E(X)|\ge a)\le \frac{\text{Var}(X)}{a^2} \]

This is actually an extension of Markov's inequality.

Proof: Let \(Y=(X-E(X))^2\), then \(Y\) is a non-negative random variable with \(E(Y)=\text{Var}(X)\) by definition. By Markov's inequality:

\[ P(Y\ge a^2)\le \frac{E(Y)}{a^2}=\frac{\text{Var}(X)}{a^2} \]

The event \(Y\ge a^2\) is equivalent to \(|X-E(X)|\ge a\), thus the original inequality holds.

Chebyshev's inequality gives a bound on the probability that \(X\) is far from \(E[X]\).

Linearity of Expectations¶

Linearity of expectations is the property that the expected value of the sum of random variables is equal to the sum of their individual expected values, regardless of whether they are independent. Take one of the previous exercises as an example, we have already seen that the expected value of the sum of \(n\) coins is \(n/2\). Formally, the theorem states:

For random variables \(X\) and \(Y\),

\[ E(X+Y)=E(X) + E(Y) \]

More generally, for \(n\) random variables \(X_1,X_2,\dots,X_n\),

\[ E(\sum_{i=1}^n X_i) = \sum_{i=1}^n E(X_i) \]

Proof: By definition,

\[ \begin{aligned} E(X+Y) &=\sum_{x}\sum_{y}(x+y)P(X=x,Y=y)\\ &= \sum_{x}\sum_{y}xP(X=x,Y=y) + \sum_{x}\sum_{y}yP(X=x,Y=y)\\ &= \sum_{x}x\cdot P(X=x)+\sum_{y}y\cdot P(Y=y)\\ &= E(X) + E(Y) \end{aligned} \]

This proof can be generalized to \(n\) variable by induction.

Remark: By exercise \(1\), we should be able to see that

\[ E(c_1X_1+c_2X_2+\cdots+c_nX_n)=c_1E(X_1) + c_2E(X_2)+\cdots + c_nE(X_n) \]

Exercises:

Prove another property of linearity, where \(c\) is any real number and \(X\) is a random variable:

\[ E(c\cdot X)=c\cdot E(X) \]

One may wonder if linearity holds for variance. Prove that for two independent random variables \(X,Y\), we have

\[ \text{Var}(X+Y) = \text{Var}(X)+\text{Var}(Y) \]

In a party, there are \(n\) people who are assigned a name tag. They now decide to shuffle their name tags randomly and give each person a name tag back. Let \(S\) be the number of people that have got their name tag. Find \(E[S]\).
Let \(S\) be \(0.abcabcabc\dots\), where \(a,b,c\) are integers chosen uniformly at random from \(0\) to \(9\). What is the expected value of \(S\)?