Minds on

Share Your Thoughts

When a company fills an aluminum can with pop for sale, it prints the volume in millilitres (mL) on the can. Often, the filling machine is set to fill each can with 355 mL of liquid.

This is an image of a can of coca cola with a label of 355 mL.
by Coca-Cola Ltd.

What is the probability that a pop can, which has been filled with 355 mL of liquid, and advertises 355 mL, actually contains exactly 355 mL?

 
Action.

Recall the following information from Unit 5.

Continuous vs. Discrete

The difference between the two is whether the data is obtained by measuring or by counting.

Continuous Data

Definition: Continuous data is data that is obtained by measuring. Many kinds of quantities can be measured, including time and distance. Because another value can always exist between any two measured values, there are infinitely many possible values. Measured data is therefore continuous and is organized into intervals.

Example

How long does it take you to get to school?  

This is an image of a person's hand holding a running stopwatch.

Your answer here could be 20 minutes, for example. Suppose the data was being organized into the following intervals:

  • 15-20 minutes,
  • 20-25 minutes,
  • 25-30 minutes.  

What interval would your time go into? You would have to look at how precisely your time was measured.

As a continuous data point, a time of 20 minutes is really either slightly higher than 20 (20.000000001, for example) or slightly lower than 20 (19.999999999, for example). So you would always be able to decide which interval your time belongs to.

 

Discrete Data

Definition: Discrete data is data that is obtained by counting. This is the type of data you focused on in the first half of this course. Discrete data points, unlike continuous ones, do not have other possible values between neighbouring values. In the examples below, there are a finite number of possibilities.

Examples

  1. How many heads do you get when you flip a coin 3 times? The only possibilities are 0, 1, 2 and 3. There are no values between these.
     
  2. What percent did you score on your G1 driver's test?  Here, even though the numbers can be decimals, they were obtained by counting the score.  Making the score a percent does not change it from a discrete data point to a continuous one.  Even if the instructor used half or quarter points, it is not possible to score every value on the number line (even decimals) between 0 and 40. 
 

Calculating Probabilities

We have spent the first half of the course learning how to calculate probabilities for discrete distributions: count the number of possible ways to arrive at a specific outcome, then divide by the total number of possible ways to arrive at all outcomes. As discrete variables are obtained by counting, discrete probabilities are calculated by counting.

In a discrete probability histogram, each of the outcomes would have its own unique probability, and you can read the probabilities of each outcome off of the graph.

This is a histogram displaying the probability of each number of heads when flipping 3 coins.
Probability of x, where x represents the number of heads flipped when flipping 3 coins.
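
The counting approach described above can be sketched in a few lines of Python. This is only an illustration, assuming fair coins so that every sequence of heads and tails is equally likely. The function name `heads_distribution` is made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(num_coins):
    """Return {number of heads: probability} for fair coins, found by counting."""
    # List every equally likely sequence, e.g. ('H', 'T', 'H').
    outcomes = list(product("HT", repeat=num_coins))
    # Count how many sequences give each number of heads.
    ways = Counter(sequence.count("H") for sequence in outcomes)
    total = len(outcomes)
    # Probability = ways to get the outcome / total number of outcomes.
    return {heads: Fraction(count, total) for heads, count in sorted(ways.items())}


distribution = heads_distribution(3)
for heads, probability in distribution.items():
    print(heads, probability)
# prints:
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```

Each printed probability corresponds to the height of one bar in a discrete probability histogram, and the four values add to 1.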

The focus of this unit will be on calculating probabilities for continuous variables. The probabilities for continuous variables are calculated for intervals of values rather than for single values.

Definition: Similar to a discrete probability histogram, a continuous probability density graph shows us the percentage or probability of different amounts occurring, except that the bars are drawn over intervals. We can find the probability of a specific interval by finding the area of the graph between the endpoints of that interval. What makes the probability density graph very useful is the fact that the areas of the bars add to 1.

This is most easily seen in a graph of uniformly distributed data:

Record Your Work

This is an image of a histogram that shows the distribution of wait times for a subway train.
This graph shows the percentage distribution of wait times for a subway train that comes every 10 minutes.  This is known as a probability density histogram since the entire area of the bars is 1.

Answer the following questions about the uniformly distributed graph:

  1. What is the probability that a wait time will be between 2 and 5 minutes?
  2. What is the probability that a wait time is more than 6 minutes?
  3. What is the probability that a wait time is less than 2.5 minutes?
  4. What is the probability that a wait time is exactly 4 minutes?

Compare your answers to the solutions below. What did you get correct? What are you having trouble with?

Solutions

  1. 30%
  2. 40%
  3. 25%
  4. 0%
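
The area calculation behind these questions can be sketched in Python. This is a minimal sketch, assuming wait times are spread evenly from 0 to 10 minutes, so the probability of an interval is its width divided by 10. The function name `uniform_probability` and its parameters are made up for this example.

```python
def uniform_probability(low, high, minimum=0.0, maximum=10.0):
    """Probability that a wait time falls between low and high minutes.

    Assumes waits are spread evenly over [minimum, maximum].
    """
    # Keep only the part of the interval where waits can actually occur.
    low = max(low, minimum)
    high = min(high, maximum)
    width = max(high - low, 0.0)
    # Area of the bars = width x height, where height = 1 / (maximum - minimum).
    return width / (maximum - minimum)


print(uniform_probability(2, 5))    # between 2 and 5 minutes
print(uniform_probability(6, 10))   # more than 6 minutes
print(uniform_probability(0, 2.5))  # less than 2.5 minutes
print(uniform_probability(4, 4))    # exactly 4 minutes: an interval of width 0
```

The last call shows why a single exact value has probability 0: it covers an interval with no width, so there is no area above it.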
 

Often, you will see a probability density graph whose intervals do not have a width of 1. See the following graph of the distribution of teacher ages at an Ontario high school.

This is an image of a histogram that shows the age distribution of teachers at a high school.
Age distribution of teachers at an Ontario high school. 
The height of each bar is 0.1428.

Age is a continuous variable. In this example, a teacher's age is the time since they were born. In other words, if a teacher is 35, they are in the 35-40 interval. If a teacher is 40, they are in the 40-45 interval.

Here, the area of the bars does not give probabilities directly. The total area under the graph is A = b × h = 35 × 0.1428 ≈ 5, which is 5 times what it should be.

This is because the interval width is 5. What we can do is relabel the interval boundaries so that 25 becomes 0, 30 becomes 1, 35 becomes 2, and so on.

This would be the same as calling 5 years equal to 1 unit.  

This would be illustrated in the following graph, by taking each age, subtracting 25 and then dividing by 5:

This is an image of a histogram that shows the age distribution of teachers at a high school. On the x-axis, 0 represents age 25.
Age distribution of teachers at an Ontario high school.  
The height of each bar is 0.1428. 
The units represent every 5 years, starting at age 25.

Now we can calculate probabilities involving the teachers by using the area of the bars, because the total area is A = b × h = 7 × 0.1428 ≈ 1.
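
The rescaling step can be checked with a short Python sketch. It assumes the graph has 7 bars covering ages 25 to 60, and that the rounded height 0.1428 stands for exactly 1/7. Both are assumptions based on the total width of 35 years. The names `to_units` and `probability_between` are made up for this example.

```python
from fractions import Fraction

START_AGE = 25      # left edge of the first bar
YEARS_PER_UNIT = 5  # width of each bar in years
NUM_BARS = 7        # assumed: bars cover ages 25 to 60
BAR_HEIGHT = Fraction(1, NUM_BARS)  # assumed exact value behind the rounded 0.1428


def to_units(age):
    """Rescale an age: subtract 25, then divide by 5."""
    return Fraction(age - START_AGE) / YEARS_PER_UNIT


def probability_between(age_low, age_high):
    """Area of the bars between two ages, measured on the rescaled axis."""
    return (to_units(age_high) - to_units(age_low)) * BAR_HEIGHT


print(to_units(25))                 # 0
print(to_units(40))                 # 3
print(probability_between(25, 60))  # 1
```

Using exact fractions instead of 0.1428 avoids the small rounding error that makes 7 × 0.1428 come out slightly less than 1.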

You will calculate probabilities for this graph in the quiz in this activity.

This is an image of a teacher standing in front of a green chalk board.

A common question for discrete probabilities is whether or not the number given is included. For example, if you wanted to find the probability of fewer than 2 heads flipped on 3 coins, you would have to ask whether or not 2 heads is included. Asking "What is the probability of 2 or fewer heads on the flip of 3 coins?" is different from asking "What is the probability of fewer than 2 heads on the flip of 3 coins?" For the first, we could write P(x ≤ 2). For the second, we could write P(x < 2), which is the same as P(x ≤ 1).
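
To see how much the wording matters for a discrete variable, here is a small Python sketch for 3 fair coins. It assumes fair coins, and the variable names are made up for this example.

```python
from fractions import Fraction
from math import comb

NUM_COINS = 3

# P(x heads) = ways to choose which coins land heads / 2^3 equally likely outcomes
pmf = {x: Fraction(comb(NUM_COINS, x), 2 ** NUM_COINS) for x in range(NUM_COINS + 1)}

p_at_most_2 = sum(p for x, p in pmf.items() if x <= 2)
p_less_than_2 = sum(p for x, p in pmf.items() if x < 2)
p_at_most_1 = sum(p for x, p in pmf.items() if x <= 1)

print(p_at_most_2)    # 7/8
print(p_less_than_2)  # 1/2
print(p_at_most_1)    # 1/2
```

Including or excluding the value 2 changes the answer by P(x = 2) = 3/8, which is why the wording of a discrete question must be read carefully.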

For continuous probabilities, since the probability of any specific value is 0%, the probability that a number is less than a certain number is the same as the probability that a number is less than or equal to that certain number. In other words, P(x < 2) is the same as P(x ≤ 2). The areas of the bars over these intervals are the same.

Challenges with Continuous Variables

Unlike discrete variables like the ones we saw in the first half of this course, continuous variables often do not have theoretical probabilities that can be worked out by counting. Often we can only base our probabilities on past data and create a histogram based on a sample of the data. The length of time it takes you to get to school, for example, can be modelled after collecting a sample of the times it takes you to get to school. There will be some variability (the extent to which the numbers differ) in the data, requiring a measure of the mean and the standard deviation, which we will see in the remainder of the unit. We also need to assume that the data will behave in a certain way, for example by being normally distributed around the mean.
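
As a rough illustration of summarizing a sample, the sketch below finds the mean and standard deviation of a set of travel times using Python's built-in `statistics` module. The times in the list are invented purely for illustration and are not real data.

```python
from statistics import mean, stdev

# Hypothetical times to get to school, in minutes (invented for illustration only).
commute_times = [18.5, 21.0, 19.2, 24.8, 20.4, 22.1, 17.9, 20.6]

sample_mean = mean(commute_times)
# stdev gives the sample standard deviation (it divides by n - 1).
sample_sd = stdev(commute_times)

print(round(sample_mean, 2), round(sample_sd, 2))
```

The mean describes the centre of the sample and the standard deviation describes its variability. Both are estimates that would change somewhat if a different sample were collected.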

In Activity 4, you will see some data for hurricane wind speeds. Since it is impossible to calculate the theoretical probability of a hurricane having a wind speed greater than a certain amount, the best we can do is make a mathematical model based on the past data that we have. This may include only a sample of the data, which will need to represent the population. It may also require us to use data that is less accurate, because it was measured by different sources.

Consolidation

Application and Thinking

Revisit the 100 m results discussed in the Introduction. Based on what you have learned in this activity, did Jeneba Tarmoh and Allyson Felix actually tie for 3rd?

What would you recommend to avoid this in the future?  
