In this activity, you will learn about polygons made from histograms and how well those polygons approximate the area of the histogram's rectangles. The following video is related to this idea and shows a really neat application of polygons.
Recall that the frequency histogram describes the number of data points in a given interval.
Definition: The frequency polygon connects the midpoints of the tops of all of the bars on a histogram, creating a polygon that approximates the frequency histogram.
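As a rough sketch of this definition, the Python snippet below counts data points in equal-width intervals and returns the midpoints and bar heights that a frequency polygon would connect. The function name `frequency_polygon`, its parameters, and the sample values are hypothetical illustrations. They are not the CO2 data used in this activity.

```python
def frequency_polygon(data, start, width, n_bins):
    """Return (midpoints, counts) for equal-width intervals.

    Interval i covers [start + i*width, start + (i+1)*width).
    Points outside all intervals are ignored.
    """
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)  # index of the interval containing x
        if 0 <= i < n_bins:
            counts[i] += 1
    # The polygon's vertices sit above the middle of each bar.
    mids = [start + (i + 0.5) * width for i in range(n_bins)]
    return mids, counts


# Hypothetical sample data, for illustration only.
sample = [1.2, 1.7, 2.1, 2.4, 2.6, 3.9]
mids, counts = frequency_polygon(sample, start=1, width=1, n_bins=3)
print(list(zip(mids, counts)))  # vertices of the frequency polygon
```

Plotting the pairs in order and joining them with straight lines gives the polygon. Changing `width` shows how the interval size changes its shape.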
Using this data for CO2 emissions of coal plants in the United States seen in Unit 6, follow the instructions to learn about the frequency polygon.
The point of the above exercise was for you to see that smaller interval sizes can make the polygon better approximate the actual data or, in other words, minimize the area that the bars are above or below the polygon.
However, you can't just make the intervals as small as possible, as you saw when you set the interval size to 0.01 for the CO2 emissions data. Just as when you are building a histogram, a reasonable interval size depends on how spread out the data is, how much data was collected, and how precisely the data was collected (that is, how many decimal places were recorded).
In the last activity, you calculated probabilities on any interval by finding the area of the bars that span that interval. This is possible when the total area of the bars is equal to 1. Technology can change the frequency histogram into a relative frequency histogram, which gives the probability (percentage) of data in each of the given intervals. These probabilities can be read off the corresponding frequency table.
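The conversion described here can be sketched in a few lines of Python. This is a minimal illustration, assuming equal-width intervals. The function names and the counts are hypothetical. Dividing each count by the total gives the relative frequency, and dividing that by the interval width gives a bar height whose area equals the relative frequency, so the bar areas add up to 1.

```python
def relative_frequencies(counts):
    """Share of all data points that falls in each interval."""
    total = sum(counts)
    return [c / total for c in counts]


def density_heights(counts, width):
    """Bar heights chosen so that each bar's AREA is its relative frequency."""
    return [r / width for r in relative_frequencies(counts)]


# Hypothetical counts for three intervals of width 0.5.
counts = [2, 3, 1]
width = 0.5
heights = density_heights(counts, width)
total_area = sum(h * width for h in heights)
print(relative_frequencies(counts))
print(heights)
print(total_area)  # should be 1, up to floating-point rounding
```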
Returning to GeoGebra and the data for CO2 emissions of coal plants in the United States, set the interval size of the histogram to 0.1. You can also have the histogram start at 1.9.
Now, you may ask, why do we need to make a probability density curve and have the area under it equal 1? One suitable way to find the probability that a random data point lies within an interval is to add up the relative frequencies (probabilities or percentages) of every interval that is included. If you only wanted to include a portion of an interval, you could add just that proportion of the interval's relative frequency.
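The adding-up approach from this paragraph can be sketched as follows. This is only an illustration under the assumption of equal-width intervals. The function name `interval_probability` and the sample relative frequencies are hypothetical. Each interval contributes its relative frequency multiplied by the fraction of the interval that lies between `a` and `b`.

```python
def interval_probability(rel_freqs, start, width, a, b):
    """Estimate P(a <= X <= b) from a relative frequency table.

    Interval i covers [start + i*width, start + (i+1)*width).
    A partly covered interval contributes the covered proportion
    of its relative frequency.
    """
    total = 0.0
    for i, p in enumerate(rel_freqs):
        left = start + i * width
        right = left + width
        overlap = max(0.0, min(b, right) - max(a, left))
        total += p * overlap / width
    return total


# Hypothetical table: three intervals of width 1 starting at 0.
rel = [0.2, 0.5, 0.3]
# Half of the first interval plus all of the second interval.
print(interval_probability(rel, start=0, width=1, a=0.5, b=2))
```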
The key advantage of the polygon is that you can find a mathematical function that approximates it, and then use technology that applies higher-level mathematics to find the area under the curve on any given interval. (That mathematics is integration, a concept from calculus that finds the area under a curve between two points.)
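To give a feel for what "area under the curve" means here, the sketch below finds the area under a polygon directly, without calculus. It relies on the fact that the region under each straight segment is a trapezoid. The function name `polygon_area` and the vertices are hypothetical, and this is not necessarily how any particular software computes the area.

```python
def polygon_area(xs, ys):
    """Area between a piecewise-linear polygon and the horizontal axis.

    xs must be in increasing order; (xs[i], ys[i]) are the vertices.
    Each segment contributes the area of one trapezoid.
    """
    area = 0.0
    for i in range(len(xs) - 1):
        base = xs[i + 1] - xs[i]
        area += base * (ys[i] + ys[i + 1]) / 2
    return area


# Hypothetical polygon: a single triangular peak of height 1 over [0, 2].
print(polygon_area([0, 1, 2], [0, 1, 0]))
```

For a smooth curve such as the one introduced later in this activity, software uses integration instead of adding trapezoids.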
Term | Meaning | Use |
---|---|---|
Frequency Histogram | A histogram that displays the intervals in which the data points lie on the horizontal axis and the total number of data points in each interval on the vertical axis. | Shows the distribution of the data and the modal interval. |
Frequency Polygon | The polygon that connects the midpoints of the tops of the frequency histogram bars. | Approximates the frequency histogram. |
Relative Frequency Histogram | A histogram that displays the intervals in which the data points lie on the horizontal axis and the percentage of data points in each interval on the vertical axis. | Shows the probability that a data point lies in one of the given intervals on the vertical axis. |
Relative Frequency Polygon | The polygon that connects the midpoints of the tops of the relative frequency histogram bars. | Approximates the relative frequency histogram. |
Probability Density Histogram ("Normalized" in GeoGebra) | A histogram whose bars are drawn so that their areas are equal to their relative frequencies/probabilities/percentages. | Used to create a probability density polygon. |
Probability Density Polygon/Curve | The polygon that connects the midpoints of the tops of the probability density histogram bars. | When the polygon can be approximated with a mathematical function, technology that uses calculus can find the area under it on any interval. |
Now, although finding areas with these complex mathematical processes is not part of this course, you are going to explore one mathematical function that approximates a good number of situations. It is called, suitably, the "Normal Curve."
The data for CO2 emissions above was generated artificially from information about coal plants in the United States. The 486 data points were generated so that they would have an average of 3.5 million tonnes of CO2 emissions each year and a sum of 1.7 billion tonnes. After generating the probability density polygon, you can click to see the normal curve:
The equation for the curve is somewhat complicated: it is an exponential equation with a quadratic exponent.
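For reference, the normal curve is usually written in the standard form below. The symbols are not introduced in this activity: here μ stands for the mean of the data and σ for its standard deviation. The exponent, which contains (x − μ) squared, is the quadratic part.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
```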
Fortunately, to make the curve more accessible, it has useful properties and a chart that shows the area under it for all values less than a given point.
You will learn how to use this chart in this unit.
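The values in such a chart can also be computed. The sketch below is one common way to do so in Python, using the error function `math.erf` from the standard library. The function name `normal_cdf` and its parameters are hypothetical, and the mean and standard deviation default to the standard normal curve (mean 0, standard deviation 1).

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Area under the normal curve for all values less than x."""
    z = (x - mean) / sd  # how many standard deviations x is from the mean
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(normal_cdf(0))            # half of the area lies below the mean
print(round(normal_cdf(1), 4))  # area below one standard deviation above the mean
```

The area between two points is the difference of two such values, for example `normal_cdf(1) - normal_cdf(-1)`.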
For now, you will take a moment to explore where you have already seen this shape: in discrete probability distributions.
Explore the following interactive, which allows you to simulate both the Binomial and Hypergeometric Distributions.
Recall, the Binomial distribution is used when trials are independent. You will choose how many objects to take and the percentage of the objects that are winners.
For the Hypergeometric distribution, trials are dependent, often because objects are not replaced. You will choose how many total objects there are and how many of them are winners.
For both, you will explore the shape of the graph that is made from the number of successes and the probability of success.
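If you want to check the interactive's values by hand, the two distributions can be sketched in Python with `math.comb` (available in Python 3.8 and later). The function names and parameter names below are hypothetical and are not taken from the interactive. For the hypergeometric case, the number of objects drawn is an extra input that the sketch assumes you also choose.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k winners in n independent trials, each with win chance p)."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_pmf(k, total, winners, drawn):
    """P(exactly k winners when `drawn` objects are taken without replacement
    from `total` objects, of which `winners` are winners)."""
    if k < 0 or k > drawn:
        return 0.0
    return comb(winners, k) * comb(total - winners, drawn - k) / comb(total, drawn)


# Hypothetical settings: 10 trials with a 50% chance of winning.
for k in range(11):
    # A crude text bar chart: one '#' for each 1% of probability.
    print(k, "#" * round(100 * binomial_pmf(k, 10, 0.5)))
```

The printed bars rise to a peak in the middle and fall away on both sides, which is the shape to look for in the interactive.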
Using the interactive, answer the following questions.