Basic probability theory explained
Let's face it:
Probability theory is not easy. Part of the reason is that probabilities can be combined, included and excluded with one another, which increases the complexity.
We can use probability theory to reason that our odds of getting a 6 when rolling a die must be 1/6. But, if we extend the experiment to include 3 dice, the probability of rolling a 6 on all of them seems less obvious.
In this section, we will start with basic probability theory, then look for ways to work with more complex problems.
Let’s start by defining the concept of probability:
The word itself is relatively self-explanatory, but what does it mean when we say the probability of getting a 6 when rolling a die is 1/6? Mathematically, since 6 times 1/6 equals 100%, does that mean that we are sure to get a 6 after 6 throws?
- As you probably already know or have guessed, probabilities are concerned with results in the long run, that is, the probability that an event would occur if a die were rolled an infinite number of times.
If an experiment were performed only a few times, the outcomes would appear to be random. In other words, you have no guarantee of getting a 6 in 6 rolls of a die. However, if you had the patience to roll the die a billion times, the proportion of sixes would be approximately 1/6.
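This long-run behavior can be sketched with a short simulation; the seed and trial counts below are arbitrary choices for illustration:

```python
import random

random.seed(42)  # arbitrary seed so the run is repeatable

# Roll a fair die n times and record the proportion of sixes.
# As n grows, the proportion settles near the theoretical 1/6 ≈ 0.1667.
for n in (10, 1_000, 1_000_000):
    sixes = sum(1 for _ in range(n) if random.randint(1, 6) == 6)
    print(f"{n:>9} rolls: proportion of sixes = {sixes / n:.4f}")
```

With only 10 rolls the proportion swings widely; by a million rolls it is pinned close to 1/6.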
Three Different Types of Probabilities
In the “real world,” it is rare to work with exact probabilities. Often, we estimate probabilities from a sample. Such estimated probabilities are described by experimental probability theory.
The name refers to the fact that we must experiment to find the likelihood of an event occurring.
These types of estimates are not exact, as die-roll probabilities are, but larger samples produce more accurate estimates. In other words, a sample of 250 million Americans would offer a more accurate estimate of the proportion of American voters who would vote for Obama than a sample of only 10 million Americans.
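The effect of sample size can be illustrated with a small sampling simulation; the 52% "true" share below is a hypothetical figure, not actual polling data:

```python
import random

random.seed(0)          # arbitrary seed for repeatability
TRUE_SHARE = 0.52       # hypothetical true share of voters for a candidate

def estimate(sample_size):
    """Poll a random sample and return the estimated share."""
    votes = sum(1 for _ in range(sample_size) if random.random() < TRUE_SHARE)
    return votes / sample_size

# Larger samples tend to land closer to the true 0.52.
for n in (100, 10_000, 1_000_000):
    print(f"sample of {n:>9}: estimated share = {estimate(n):.4f}")
```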
Unlike the roll of a die—a scenario in which we know there are 6 possible outcomes—we encounter a number of situations every day where we do not have this kind of information available. If you were to one day bet all of your savings on a new horse that had never raced before, your estimate of the probability that the horse would win would greatly depend on your own subjective assessment of the horse’s skills.
The likelihood of a specific event (Hi) occurring must be the sum of all probabilities belonging to the event. With a single roll of the die, the probability of rolling "at least 5" must be the sum of the probabilities of rolling a "5" and a "6", i.e., 1/6 + 1/6 = 1/3.
Before we look at the rules for calculating probabilities, we must understand the concepts of outcome, sample space and incident. We can define an outcome as the result of an experiment.
An experiment leads one’s thoughts to white lab coats and Frankenstein, but in principle, an experiment simply reflects a particular action. The action might be tossing a coin or playing in the Wimbledon finals. Both examples contain a clear outcome.
With a toss of a coin, we either get heads or tails, and in the Wimbledon finals, a player can either win or lose. The sample space (U) can be defined as all the possible outcomes of an experiment.
When we throw a single die, the sample space is defined as U = {1, 2, 3, 4, 5, 6}, i.e., 6 possible outcomes. An incident (Hi) is defined as the outcome, or set of outcomes, for which we want to calculate the probability.
If you know you would win a coin toss by getting "heads," you could define that incident as such: H = {Heads}. If you know you would win a die roll if the result was greater than 3, you could define the incident as follows: H = {4, 5, 6}. The figure below illustrates the concepts.
The box represents the experiment and all the possible outcomes (U). The circle represents the event H.
H̄ is the complement, which represents the outcomes not included in the incident H. Together, H and H̄ make up the total sample space U.
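These definitions map directly onto sets; a minimal sketch using the die example, with H chosen as "rolling at least 5":

```python
# Sample space U for one die, an event H, and its complement.
U = {1, 2, 3, 4, 5, 6}
H = {5, 6}        # event: rolling at least 5
H_bar = U - H     # complement: the outcomes not included in H

assert H | H_bar == U          # H and its complement make up all of U
assert H & H_bar == set()      # ...and they share no outcomes
print(len(H) / len(U))         # a priori probability of H: 2/6
```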
Imagine you were in the running to win a million dollars. The raffle is played by randomly drawing one of 100 numbered balls.
If the ball drawn had the number 1 on it, you would win. I hope you agree that the probability of selecting a winning ball, ball number 1, must necessarily be 1/100, or 1%. In other words, we have found the probability of one event—the winning ball coming out—out of all the possible outcomes.
Formally, this is called the probability of an event, labeled P (event). In this example, that event would be P (the winning ball comes out). The probabilities calculated in the raffle example are called a priori probabilities.
A priori refers to the fact that we can calculate the exact probability before the incident occurs. In other words, given the knowledge we have of the experiment, we can reason that the probability of drawing a winning ball should be 1/100.
A priori probabilities are based on the fundamental assumption that all outcomes are equally probable. For our probability of drawing a winning ball to be true, it is necessary that the balls are designed the same way.
There should, for example, be no difference in weight or size.
If 20 of the 100 balls available were winning balls, the likelihood of winning would be written as follows: P(winning ball) = 20/100 = 1/5 = 20%.
The basis for calculating a priori probabilities is that we must know the number of possible outcomes and be able to count the number of outcomes in the event we are interested in.
Now, if we assume that the logistics manager for Post Denmark would like to know the probability that the sorting machine will make a mistake, he must observe the machine in a given period and then count the number of errors.
The question is, how long must the manager observe the machine to learn the true probability that it will make an error? You will hopefully agree that 5 minutes would be on the low side, but what about observing the machine for a whole day or week?
On the one hand, we will, ceteris paribus, approach a more accurate probability that the machine will make an error the longer we observe it. On the other hand, it seems intuitive that we would probably get two different results if we observed the machine during two different weeks.
Thus, we would obtain different probabilities for machine failure, which can be illustrated as follows:
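This can be sketched with a simulation; the 0.2% error rate and the item counts below are hypothetical, chosen only to show that two observation periods yield two different estimates:

```python
import random

random.seed(1)              # arbitrary seed for repeatability
TRUE_ERROR_RATE = 0.002     # hypothetical rate, unknown to the observer

def observed_rate(n_items):
    """Watch the machine sort n_items and return the observed error rate."""
    errors = sum(1 for _ in range(n_items) if random.random() < TRUE_ERROR_RATE)
    return errors / n_items

week1 = observed_rate(50_000)
week2 = observed_rate(50_000)
print(week1, week2)  # two periods typically give slightly different estimates
```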
In contrast to objective probabilities, which we looked at in the example of dice rolling, here we must consider two important facts.
Firstly, we cannot pre-calculate the probability that the machine will make an error. We would need to conduct an experiment in which the machine’s errors were counted over a period of time.
Secondly, we can see that the estimated probability changes with each experiment. We are not dealing with exact probabilities, but with estimated (approximate) probabilities.
When considering probabilities, we have thus far dealt with situations in which data can be either measured objectively or obtained from estimates. Subjective probabilities fall outside both categories.
As the name suggests, subjective probabilities are based on experiences and feelings, not numbers. We are surrounded by subjective probabilities on a daily basis.
For example, your sense of whether a person is telling the truth is often a subjective assessment. Every day, many of our actions are, more or less unconsciously, guided by subjective probabilities. We could call these instinctive estimates.
Intersection ("AND" Event)
So far, we have discussed the probabilities of a single incident occurring, such as the probability that a single die would roll a 6 or that a sorting machine at Post Denmark would make mistakes.
What we will now see is how we can combine probabilities and thereby calculate the probability that two or more different events will occur.
Basically, events can be combined in two ways: either as the probability of event “A and B” or event “A or B”. When considering the intersection, we seek the probability that two events will occur simultaneously.
That’s the common area, which we can illustrate with the following Venn diagram:
The area both circles have in common is called the intersection, which is the blue area in the diagram.
EXAMPLE: Suppose we have one white and one black die. We want to know the probability of getting a 6 with both. We know that the probability of rolling a 6 with a single die is 1/6, so how can we calculate the probability that both dice will roll sixes? Calculation of the event: P(white 6 and black 6) = 1/6 × 1/6 = 1/36 ≈ 2.78%.
It may surprise us that the probability of getting two 6s is only about 2.78% when we also consider that the probability of rolling a single 6 is about 17%.
Why is there such a big difference, and why are we six times less likely to roll two 6s than one 6? If we illustrate the possible outcomes for two dice, you will quickly see why.
With a single die, we have six possible outcomes, so we know the probability of rolling a 6 must be 1/6. However, with two dice, our sample space does not merely double; it increases six-fold, to a total of 36 outcomes. This explains why the probability of rolling two 6s (blue box) must be six times smaller than the probability of getting a 6 with a single die.
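The 36-outcome sample space can be enumerated directly to confirm the calculation:

```python
from itertools import product

# All 36 equally likely (white, black) outcomes for two dice.
outcomes = list(product(range(1, 7), repeat=2))
both_sixes = [o for o in outcomes if o == (6, 6)]

print(len(outcomes))                     # 36 possible outcomes
print(len(both_sixes) / len(outcomes))   # 1/36, about 2.8%
```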
The intersection is not limited to two incidents, since the possible combinations are, in principle, infinite. For example, the probability of all three dice rolling 6s would be: P = 1/6 × 1/6 × 1/6 = 1/216 ≈ 0.46%.
Union ("OR" Event)
In contrast to an intersection, where the events A and B must both occur, a union has less demanding criteria.
With a union, we want to find the probability that at least one of the incidents will occur. To illustrate this, we can use a Venn diagram, where the union represents the total area of both circles.
The union is met when either A or B or both A and B occurs. If we, for simplicity’s sake, reuse the example of the two dice, this union would refer to the probability that the white die rolls a 6, the black die rolls a 6, or that they both do it.
EXAMPLE: If we continue the example of the white and black die, what is the probability that the union will occur, i.e., that either the white die will roll a 6, the black die will roll a 6, or both will? P(A or B) = P(A) + P(B) − P(A and B) = 1/6 + 1/6 − 1/36 = 11/36 ≈ 30.6%.
The reason we subtract the intersection from the sum of A and B is that the intersection is part of both A and B. If we simply added the probabilities of A and B, we would include the intersection twice—see the dark blue area in cell "66" below.
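Counting the union directly over the same 36 outcomes shows why the intersection must be subtracted once:

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # (white, black)
a = {o for o in outcomes if o[0] == 6}            # white die shows 6
b = {o for o in outcomes if o[1] == 6}            # black die shows 6

# |A| + |B| counts the outcome (6, 6) twice; the union counts it once.
print(len(a) + len(b))       # 12
print(len(a | b))            # 11, so P(A or B) = 11/36
```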
A complementary event is the opposite of the incident we just defined. If the event (A) is defined as getting "tails" on a coin toss, the complementary event (Ā) would be getting "heads."
When we use complementary probabilities, it is often to calculate the intersection or the union in a simpler way.
Suppose we have an assembly line where two control mechanisms ensure that defective items are discarded. Each control mechanism is 99% accurate and there is only a 1% probability of error. As the production manager, you are interested in knowing the probability that a defective item would slip through both control mechanisms without being detected. This probability can be solved using the union, where we first find the probability that the error would be caught by control 1, control 2 or both controls: P(caught) = 0.99 + 0.99 − 0.99 × 0.99 = 0.9999. The probability that a defective item slips through is then 1 − 0.9999 = 0.0001, or 0.01%.
Instead of using the union, we can solve the problem in a simpler (more elegant) way by using the complement.
Rather than finding the probability that an error will be detected by one or both controls, we can just find the probability that an error will not be discovered and subsequently subtract this number from 1, which is equivalent to our total probability (100%).
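The two routes to the answer can be written out in a few lines; the 1% miss rate is the figure from the example, and the controls are assumed independent:

```python
P_MISS = 0.01  # each control misses a defect with probability 1%

# Direct route: a defect slips through only if BOTH controls miss it.
p_slip = P_MISS * P_MISS    # 0.0001, i.e. 0.01%

# Complement route: probability at least one control catches it.
p_caught = 1 - p_slip       # 0.9999

print(p_slip, p_caught)
```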
NB: Note that the complementary event is the opposite of the original incident and is marked with a bar over the letter.
Conditional Probabilities (Dependent Events)
So far, we have looked at events as independent events, i.e., experiments in which outcomes are not dependent on or affected by one another.
The vertical line between A and B in P(A | B) means "A on condition of B," or "A given that B has occurred."
We still have a white and a black die and events like "A: rolling a 1 on the white die" and "B: rolling a 6 on the black die". Does that mean we have two independent events? Since the outcome of the black die in no way affects the outcome of the white die, we have the following calculation: P(A | B) = P(A) = 1/6.
In other words, the occurrence of event "B" does not influence the probability of event "A". This confirms the rule of independence: P(A | B) = P(A). Everything has an opposite and, as you probably guessed, there are some situations where we cannot assume independence between A and B. In such cases, the rule is: P(A | B) ≠ P(A).
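Independence of the two dice events can be verified by enumerating all 36 outcomes:

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # (white, black)

# A: white die shows 1;  B: black die shows 6.
p_a = sum(1 for w, b in outcomes if w == 1) / len(outcomes)
b_outcomes = [(w, b) for w, b in outcomes if b == 6]
p_a_given_b = sum(1 for w, b in b_outcomes if w == 1) / len(b_outcomes)

print(p_a, p_a_given_b)   # both 1/6: knowing B tells us nothing about A
```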
Dependency means that event A influences the probability of event B, or vice versa. That does not necessarily mean that event B influences the probability of event A occurring.
EXAMPLE: Suppose we have a lotto game with 10 numbered balls. To win, we must get ball number 1. We know that the probability of getting a certain ball on the first attempt must necessarily be 1 out of 10. Thus, we define the event: A = {ball number 1 is drawn}, with P(A) = 1/10.
But what if we don't draw ball number 1 on the first attempt? When the next ball is drawn, there will necessarily be a probability of 1/9 that ball number 1 will be drawn. The probability of drawing ball number 1 has increased from 1/10 to 1/9 on the second attempt. This confirms the rule of dependence: P(ball 1 on second draw | ball 1 not drawn first) = 1/9 ≠ 1/10 = P(A).
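The dependence can be checked with a simulation of the draw; the seed and trial count are arbitrary:

```python
import random

random.seed(7)   # arbitrary seed for repeatability
N = 200_000

# Estimate P(ball 1 on the second draw | ball 1 not drawn first)
# for a draw without replacement from balls numbered 1..10.
hits = trials = 0
for _ in range(N):
    balls = list(range(1, 11))
    random.shuffle(balls)
    if balls[0] != 1:           # condition: first draw was not ball 1
        trials += 1
        hits += balls[1] == 1

print(hits / trials)   # close to 1/9 ≈ 0.111, up from the unconditional 1/10
```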