Probability distributions

The distribution, or probability distribution, of a random variable tells us how likely it is that the random variable will have a particular value.

The distribution of a discrete variable

Let's take a discrete random variable. For example, we could keep track of how many pints of beer the local drunkard, Bambula, drinks in a day. By following Bambula for one whole month, we constructed the following table:

Day   Pints of beer     Day   Pints of beer     Day   Pints of beer
  1    5                 11   10                 21    4
  2   10                 12    5                 22    6
  3    4                 13   12                 23   10
  4    4                 14   10                 24   10
  5    5                 15   12                 25    4
  6    4                 16   12                 26    5
  7    0                 17    6                 27   10
  8    5                 18   10                 28    5
  9    4                 19    6                 29    0
 10    5                 20   10                 30   10

We can see that Bambula was still pacing himself at the start of the month, but after that he was downing one beer after another. The week starting on day 11 must have been wild. Let's label this discrete random variable X. We are now interested in the probability that Bambula drinks exactly x pints of beer on any one day of the month.

Before we calculate the probability itself, let's construct a simple frequency graph. For each number of pints that Bambula drank on some day, we count the days on which it occurred and plot that count. For example, Bambula drank ten pints on nine days in total, so for the value ten we get a bar of height nine:

From the graph we can read, for example, that Bambula drank six pints on three days of the month.

What is the probability that Bambula drank ten pints in a day? Formally, we denote the probability function by P and write the query for ten pints as P(X = 10). This asks: what is the probability that the random variable X takes the value ten?

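The probability is equal to the relative frequency of the given value, that is, the number of days with that value divided by the total number of days. So let's also construct a graph of relative frequencies:

Since ten pints occurred on 9 of the 30 days, we can now say that P(X = 10) = 9/30 = 0.3, or 30%.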
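The two graphs can also be reproduced as a short computation. The following is a minimal Python sketch, not part of the original text: the list `pints` is copied from the table above, and the names `frequency` and `relative` are chosen here for illustration only.

```python
from collections import Counter
from fractions import Fraction

# Pints per day for days 1..30, copied from the table above.
pints = [
    5, 10, 4, 4, 5, 4, 0, 5, 4, 5,        # days 1-10
    10, 5, 12, 10, 12, 12, 6, 10, 6, 10,  # days 11-20
    4, 6, 10, 10, 4, 5, 10, 5, 0, 10,     # days 21-30
]

n_days = len(pints)
frequency = Counter(pints)  # absolute frequency of each value

# Relative frequency = absolute frequency / number of observed days.
relative = {value: Fraction(count, n_days) for value, count in frequency.items()}

for value in sorted(frequency):
    bar = "#" * frequency[value]  # one '#' per day, a text stand-in for the bar chart
    print(f"{value:>2} pints | {bar:<9} | {frequency[value]} days | {float(relative[value]):.3f}")

print("P(X = 10) =", float(relative[10]))  # prints P(X = 10) = 0.3
```

The relative frequencies of all values add up to exactly 1, which is why they can be read as probabilities.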

Statistical vs. mathematical probability

Let's take a classic six-sided die as an example and roll it 600 times. In terms of classical mathematical probability, we have an equal chance of rolling a one, a two, ..., a six: each face has probability $\frac16$. So, purely theoretically, each face should come up exactly 100 times in six hundred rolls.

Of course, this is unlikely to happen. In an actual experiment we might get the following results:

Number on the die     Number of rolls     Relative frequency
1                     105                 0.1750
2                     103                 0.1717
3                      90                 0.1500
4                      96                 0.1600
5                     100                 0.1667
6                     106                 0.1767

We can see that the relative frequency, and therefore the probability obtained from our data, of rolling a one is 0.175, i.e. 17.5%. This is close to the mathematical probability $\frac16 = 0.1666\ldots$, but it is not equal to it.
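To compare all six faces with $\frac16$ at once, the relative frequencies can be recomputed from the counts. This is a small illustrative sketch in Python; the counts come from the table above, while the names `rolls`, `empirical` and `theoretical` are introduced here.

```python
from fractions import Fraction

# Number of times each face came up in the 600 rolls (from the table above).
rolls = {1: 105, 2: 103, 3: 90, 4: 96, 5: 100, 6: 106}

total = sum(rolls.values())      # total number of rolls
theoretical = Fraction(1, 6)     # mathematical probability of each face

# Relative frequency of each face, kept as exact fractions.
empirical = {face: Fraction(count, total) for face, count in rolls.items()}

for face in sorted(empirical):
    diff = float(empirical[face] - theoretical)
    print(f"face {face}: {float(empirical[face]):.4f} (differs from 1/6 by {diff:+.4f})")
```

Only face 5, with exactly 100 rolls, matches the mathematical probability; every other face is slightly above or below it.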

This does not mean that either probability is wrong. Constructing a perfectly balanced die is no easy task, and actually carrying out the test of 600 rolls can be just as difficult.

Thus, mathematical probability describes ideal conditions, in which we would roll a perfectly balanced die indefinitely. The longer we rolled such a die, the closer the relative frequency of each face would get to the theoretical mathematical probability.
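We cannot roll a die indefinitely, but we can imitate many rolls on a computer. The sketch below is an illustration under stated assumptions, not an experiment from the text: Python's pseudo-random generator stands in for an ideally balanced die, the helper `relative_frequencies` is a name made up here, and the seed is fixed only so that repeated runs give the same numbers.

```python
import random

def relative_frequencies(n_rolls, rng):
    """Simulate n_rolls rolls of a fair die and return the relative frequency of each face."""
    counts = [0] * 6
    for _ in range(n_rolls):
        counts[rng.randint(1, 6) - 1] += 1  # randint includes both endpoints
    return [count / n_rolls for count in counts]

rng = random.Random(0)  # fixed seed: an arbitrary choice for repeatability
for n in (60, 600, 60_000):
    freqs = relative_frequencies(n, rng)
    worst = max(abs(f - 1 / 6) for f in freqs)
    print(f"{n:>6} rolls: largest deviation from 1/6 = {worst:.4f}")
```

With more rolls, the largest deviation from $\frac16$ typically becomes smaller, although any single run can fluctuate.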

We therefore distinguish between two probabilities: the mathematical probability, which is based on idealized conditions, and the statistical (or empirical) probability, which we calculate from the relative frequencies in our data.

To calculate the statistical probability, we should have a sufficiently large data set. For example, we only followed our drunkard Bambula for one month; it would have been much better to follow him for at least a year.

The distribution of a continuous variable

For a continuous variable, we do not specify the probability at a point, but only on an interval. The probability is equal to the area under the curve, so we need an integral to calculate it. A graph for a continuous variable might look like this; for example, it could come from measuring some deviation in cm:

Probability distribution of a continuous variable

On the x axis we have the values in cm, and on the y axis the relative frequencies. Suppose we only measured values in the interval $[-4, 4]$. The empirical probability that a measured deviation lies in this interval is thus 1, i.e. 100%.

How is area related to this? We say that the area $\int_{-4}^{4} p(x)\,\mathrm{d}x$, where p is the probability density function, gives us exactly that 100%. If we asked for the empirical probability that the value lies in the interval $[0, 4]$, we would get a picture like this:

Illustration of the empirical probability for $x \in [0, 4]$

The previously highlighted part corresponded to 100%; this one corresponds to 50%, since its area is clearly half as large. More precisely, we would express the statistical probability as the ratio of the area over $[0, 4]$ to the area over the whole measured range:

$$ \frac{\int_{0}^{4} p(x)\,\mathrm{d}x}{\int_{-4}^{4} p(x)\,\mathrm{d}x}. $$
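The ratio of the area over $[0, 4]$ to the area over $[-4, 4]$ can be checked numerically. The sketch below assumes a hypothetical density, a symmetric triangle on $[-4, 4]$ with its peak at 0, because the actual curve from the figure is not given as a formula; the helper `area` is a plain trapezoidal rule written for this example.

```python
def p(x):
    """Hypothetical density: a triangle on [-4, 4] peaking at 0 (not the curve from the figure)."""
    return max(0.0, (4 - abs(x)) / 16)

def area(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))

whole = area(p, -4, 4)  # area over the full measured range, about 1
right = area(p, 0, 4)   # area over [0, 4]

print(f"area over [-4, 4]: {whole:.4f}")
print(f"P(0 <= X <= 4):    {right / whole:.4f}")
```

Any density that is symmetric about 0 gives the same 50% here; for a skewed curve the ratio would differ.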

The law of large numbers

Related to the previous section is the well-known law of large numbers. Informally, it states that, given a large number of independent trials, the relative frequencies (the statistical probability) will be close to the mathematical probability.

We can illustrate this with a coin toss. We can get heads or tails, and each side has probability $\frac12$, i.e. 50%. If we flip a coin four times, we may get heads three times and tails once. The relative frequencies are then $\frac34$ and $\frac14$, which is quite far from $\frac12$.

If we flip ten times in total, we may get seven heads and three tails. The relative frequencies are $\frac{7}{10}$ and $\frac{3}{10}$. This is still a long way from $\frac12$, but closer than the previous frequencies.

We could go on like this. If we flipped a coin a thousand times, we might get 520 heads and 480 tails. The relative frequencies are then $\frac{13}{25}$ and $\frac{12}{25}$, which are very close to one half (for comparison, $\frac{13}{25} = 0.52$).

Note that although the relative frequencies are much closer to one half, the absolute frequencies are much further from the "ideal" frequency. When we flipped the coin ten times, the ideal absolute frequency for each side was 5; that is, we would get five heads and five tails. Since we got heads seven times, the difference is two flips: 7 − 5 = 2. Two flips came out differently than they would have in the ideal situation.

However, when we flipped the coin 1,000 times, it came up heads 520 times, a difference of 20 flips from the ideal, because ideally each side should have come up 500 times. So it would seem that the more times we flip, the further from the ideal our results get.

But that doesn't matter. The law of large numbers does not claim that the absolute frequencies will come close to the ideal, only that the relative frequencies will. And they do get closer: the absolute frequencies may drift away from the ideal while the relative frequencies approach it.
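The contrast between absolute and relative frequencies can be tabulated for the three coin experiments described above. This is a small Python sketch using only the numbers from the text; the function name `gaps` is introduced here for illustration.

```python
def gaps(flips, heads):
    """Return (absolute gap, relative gap) between the observed heads and the ideal half."""
    ideal = flips / 2                        # ideal absolute frequency of heads
    absolute_gap = abs(heads - ideal)        # distance measured in flips
    relative_gap = abs(heads / flips - 0.5)  # distance measured in relative frequency
    return absolute_gap, relative_gap

# (number of flips, number of heads) from the three examples in the text
experiments = [(4, 3), (10, 7), (1000, 520)]

for flips, heads in experiments:
    absolute_gap, relative_gap = gaps(flips, heads)
    print(f"{flips:>4} flips: absolute gap = {absolute_gap:>2.0f} flips, "
          f"relative gap = {relative_gap:.2f}")
```

The absolute gap grows from 1 to 2 to 20 flips, while the relative gap shrinks from 0.25 to 0.20 to 0.02, which is exactly the behaviour the law of large numbers describes.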