Minds on

Consider the Following

Many naturally occurring continuous variables have most of their data near the mean, with values occurring less frequently the farther they are from the mean. Think of a continuous variable whose distribution would take this shape.

This is an image of a histogram with a normal curve over top.

Action


The so-called "Law of Frequency of Error" that Sir Francis Galton mentions in the quote from the introduction is the idea that, given a large amount of data, we can predict how much seemingly chaotic natural things deviate from the average.

The Normal Curve

In naturally occurring variables, we can assume that the modal interval will lie in the middle of the data, at the average. Points then fall above and below the average with decreasing frequency as they get farther from it. The "Law of Frequency of Error" tells us that for every point far above the average there will be a point about equally far below it, making the curve symmetric about the average.

In the following example, the masses of 21000 potatoes are recorded in grams, with an average mass of 150 g and a standard deviation (definition: the distance a typical point is from the average) of 20 g. The standard deviation shows how far points typically fall above and below the average.

For 21000 potatoes, an interval size of 10 g is used, giving the following histogram.  The Normal Curve has been placed over the probability density histogram so you can see how well it describes the data.

This is an image of a histogram with a normal curve on top, showing the distribution of potato mass with 10 g intervals.

If you decrease the interval size to 5 g, you can see, similar to the last activity, the improved fit of the normal curve:

This is an image of a histogram with a normal curve on top, showing the distribution of potato mass with 5 g intervals.

Decreasing to 2 g, it becomes harder to see gaps between the curve and the histogram:

This is an image of a histogram with a normal curve on top, showing the distribution of potato mass with 2 g intervals.

When the interval size is 1 g, the curve almost exactly covers the histogram:

This is an image of a histogram with a normal curve on top, showing the distribution of potato mass with 1 g intervals.
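
The shrinking gaps can also be checked numerically. The sketch below is an idealized illustration, not the potato data itself: it assumes the masses follow a normal model with mean 150 g and standard deviation 20 g exactly, works out the bar heights that model would give for each interval size, and measures the largest vertical gap between a bar and the curve. The function name `largest_gap` and the 70 g to 230 g range are choices made only for this example.

```python
from statistics import NormalDist

# Assumed model for this example: mean 150 g, standard deviation 20 g.
potato_mass = NormalDist(mu=150, sigma=20)

def largest_gap(width, low=70, high=230):
    """Largest vertical gap between an ideal histogram bar and the normal curve."""
    gap = 0.0
    left = low
    while left < high:
        right = left + width
        # Height of a probability density bar = probability in the interval / width.
        bar_height = (potato_mass.cdf(right) - potato_mass.cdf(left)) / width
        # The mean (150 g) is always an interval edge here, so the curve only rises
        # or only falls inside each bar, and the biggest gap is at one of the edges.
        for x in (left, right):
            gap = max(gap, abs(bar_height - potato_mass.pdf(x)))
        left = right
    return gap

for width in (10, 5, 2, 1):
    print(f"{width:>2} g intervals: largest gap = {largest_gap(width):.6f}")
```

Under these assumptions the largest gap shrinks each time the interval size decreases, which matches what the four histograms show. Real data would also include some random scatter that this sketch leaves out.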

Rule of 68, 95, 99.7

When many data points are taken and the mean and standard deviation are calculated, we can expect the probability density graph to follow a predictable curve, so we no longer need the rectangles and their areas to calculate probabilities.

This is an image of a normal curve showing the distribution of potato mass.

The curve always has the same shape, no matter the context, as long as the data can be expected to be normally distributed. What customizes the curve is the mean, which sets its centre, and the standard deviation, which sets its spread.

For every normal curve, we can say that approximately 68% of the data lies within 1 standard deviation of the mean. In this example, about 68% of the potatoes are between 130 and 170 g.

Now, if we call $X$ the random variable that represents the mass of a potato, we would say that $X$ is approximately normally distributed with a mean of 150 and a standard deviation of 20. In symbols, we write: $X \sim N(150, 20^2)$.

We can also write: $P(130 \le X \le 170) \approx 68\%$.

We also know that approximately 95% of the data is within 2 standard deviations of the mean. In this example, 95% of the potatoes are between 110 and 190 g, or $P(110 \le X \le 190) \approx 95\%$.

Finally, we know that approximately 99.7% of the data is within 3 standard deviations of the mean. In this example, 99.7% of the potatoes are between 90 and 210 g, or $P(90 \le X \le 210) \approx 99.7\%$.
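
The three percentages can be checked against the normal model itself. The sketch below assumes the potato example (mean 150 g, standard deviation 20 g) and uses Python's standard `statistics.NormalDist`; the helper name `percent_within` is made up for this illustration.

```python
from statistics import NormalDist

# Assumed model for this example: mean 150 g, standard deviation 20 g.
potato_mass = NormalDist(mu=150, sigma=20)

def percent_within(k):
    """Percent of the distribution within k standard deviations of the mean."""
    low = potato_mass.mean - k * potato_mass.stdev
    high = potato_mass.mean + k * potato_mass.stdev
    # Area under the curve between low and high, as a percent.
    return 100 * (potato_mass.cdf(high) - potato_mass.cdf(low))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {percent_within(k):.2f}%")
```

The model gives roughly 68.27%, 95.45% and 99.73%, so 68, 95 and 99.7 are convenient rounded values rather than exact ones.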

This rule of 68, 95, 99.7 is shown in the following image. Recall that $\mu$ represents the population mean and $\sigma$ represents the population standard deviation. In general, we can show that a random variable, $X$, is normally distributed with a mean, $\mu$, and standard deviation, $\sigma$, by writing $X \sim N(\mu, \sigma^2)$, where $\sigma^2$ represents the variance.

This is an image of the normal distribution, showing the percent within each interval depending on the number of standard deviations above or below the mean.

The benefit of knowing the rule of 68, 95, and 99.7 is that you can calculate many other probabilities just from those three numbers.
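
As a sketch of how that works, the code below splits the three numbers into one-sided bands using only the symmetry of the curve. The helper names `band` and `beyond` are invented for this illustration, and the masses printed assume the potato example (mean 150 g, standard deviation 20 g).

```python
# Percent of data within 1, 2 and 3 standard deviations (the rule itself).
within = {1: 68.0, 2: 95.0, 3: 99.7}

def band(k):
    """Percent between k-1 and k standard deviations on ONE side of the mean."""
    inner = within[k - 1] if k > 1 else 0.0
    # The curve is symmetric, so each side gets half of the difference.
    return (within[k] - inner) / 2

def beyond(k):
    """Percent more than k standard deviations above the mean (one tail)."""
    return (100 - within[k]) / 2

mean, sd = 150, 20  # potato example
for k in (1, 2, 3):
    print(f"{mean + (k - 1) * sd} g to {mean + k * sd} g: about {band(k):.2f}%")
print(f"above {mean + 3 * sd} g: about {beyond(3):.2f}%")
```

Adding or subtracting these bands gives the probability of any interval whose endpoints are a whole number of standard deviations from the mean.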

Work through the following interactive to help understand how to uncover many other probabilities:

NormCurve


Record Your Work

This is an image of potatoes on a conveyor belt.

Given that $X$ represents the random variable for mass of potatoes and is approximately normally distributed with a mean of 150 g and a standard deviation of 20 g, find the following and justify how you arrived at the number:

  1. $P(X \le 150)$

  2. $P(X \le 130)$

  3. $P(X \ge 170)$

  4. $P(X \ge 190)$

  5. $P(X \le 190)$

  6. $P(130 \le X \le 190)$

Would the answers be different if the symbol used was "less than" instead of "less than or equal to"? 

Compare your answers to the solutions below.  What did you get correct?  What are you having trouble with?

Solution

  1. 50%
  2. 16% 
  3. 16% 
  4. 2.5% 
  5. 97.5% 
  6. 81.5%

The following graph could be drawn to help with the above questions:

This is an image of the normal distribution with percentages and mass of potatoes on the horizontal axis.
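
The rule-based solutions above can be compared with the probabilities the normal model gives. This is a minimal sketch assuming the same model (mean 150 g, standard deviation 20 g) and Python's standard `statistics.NormalDist`; the list name `checks` is just for this example.

```python
from statistics import NormalDist

# Assumed model for this example: mean 150 g, standard deviation 20 g.
potato_mass = NormalDist(mu=150, sigma=20)
cdf = potato_mass.cdf  # cdf(a) is the probability that X <= a

# (question, rule-based answer in percent, model probability)
checks = [
    ("P(X <= 150)", 50.0, cdf(150)),
    ("P(X <= 130)", 16.0, cdf(130)),
    ("P(X >= 170)", 16.0, 1 - cdf(170)),
    ("P(X >= 190)", 2.5, 1 - cdf(190)),
    ("P(X <= 190)", 97.5, cdf(190)),
    ("P(130 <= X <= 190)", 81.5, cdf(190) - cdf(130)),
]

for label, rule_answer, model_probability in checks:
    print(f"{label}: rule {rule_answer}%, model {100 * model_probability:.2f}%")
```

Each rule-based answer lands within half a percentage point of the model value. The small differences come from the rule using rounded percentages.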
 
Consolidation


Communication and Thinking

The time it takes to drive from Orangeville to the Vaughan Mills Mall is normally distributed with a mean of 52 minutes and a standard deviation of 5 minutes. Using the knowledge from this activity, what intervals could you estimate that do not include the mean as a maximum or minimum? For example, $P(X \le 52) = 50\%$ is an interval we could estimate, but it includes the mean as the maximum of the interval. Similarly, $P(52 \le X \le 57) \approx 34\%$ is a probability we know from this activity, but it includes the mean as the minimum.

Include at least 7 intervals and their probabilities.  Include a sketch in your answer.
