For each of the following graphs, describe one conclusion that you could draw from it:
1) Out of the people who reported incidents of racial profiling in Ontario in the last 12 months, the following is found:

2) The time it takes surveyed students to get to school is recorded and summarized as follows:

3) Each of the students surveyed about the time it takes to get to school was also asked how many absences they had:

4) Three coins are flipped multiple times, and the number of heads is recorded each time and summarized:

Data can be categorized into many different types. The type of data tells you whether you can organize it into percentages, rank it and calculate an average, or draw a line of best fit and predict future outcomes. The first major distinction to make is whether the data is categorical or numerical.
What is the colour of your eyes? What gender do you self-identify with? How would you rate your perceived mental health? These are all examples of categorical data.

Definition: Categorical data is data that does not take a numerical value, but instead has values that are qualitative or are categories.
We can display categorical data in a bar graph or a circle graph.
There are two types of categorical data, ordinal and nominal.
Definition: Ordinal data is categorical data that can be ranked in a logical way.
Definition: Nominal data is categorical data that has no apparent order to the data.
Categorical data, both nominal and ordinal, can be displayed in either a bar graph or a circle graph. If ordinal data is displayed in a bar graph, the bars should appear in the order that makes sense for the data.
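As a rough illustration of keeping ordinal bars in a sensible order, the sketch below tallies responses to a rating question and prints a text bar graph. The rating scale and the responses are hypothetical values made up for this example; they do not come from any survey in this lesson.

```python
from collections import Counter

# Hypothetical rating scale (ordinal) and responses, for illustration only.
scale = ["Poor", "Fair", "Good", "Very good", "Excellent"]
responses = ["Good", "Fair", "Excellent", "Good", "Poor", "Very good", "Good"]

counts = Counter(responses)

# Follow the logical order of the scale rather than alphabetical order.
ordered = [(level, counts[level]) for level in scale]

for level, n in ordered:
    print(f"{level:<10} {'#' * n}")
```

Sorting the categories alphabetically would put "Excellent" first and "Very good" last, which hides the ranking that makes the data ordinal.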
How many days a week do you work? How long does it take you to get to school? The data collected by both of these questions would be numerical data. There are two types of numerical data: continuous and discrete.
The difference between the two is whether or not they are obtained by measuring or counting.
Definition: Continuous data is data that is obtained by measuring. You can measure data in different ways, including time and distance. Because another value can always exist between any two values, there are infinitely many possible values, so measured data is continuous and is organized into intervals.

How long does it take you to get to school? Your answer here could be 20 minutes, for example. Say the data was being organized into these intervals: 15-20 minutes, 20-25 minutes and 25-30 minutes. Which interval would your time go into? You would have to look at the precision of your time. As a continuous data point, a time reported as 20 minutes is either slightly higher than 20 (20.000000001, for example) or slightly lower than 20 (19.999999999, for example). So you would always be able to decide which interval you belong to.
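The interval decision can be sketched in a few lines of Python. The function name `find_interval` is hypothetical, and the rule that a value exactly on a boundary goes into the higher interval is an assumption made for this sketch, not something the lesson specifies.

```python
def find_interval(minutes, edges):
    """Return the (low, high) interval that contains `minutes`, or None.

    Assumed convention: an interval includes its lower edge and excludes
    its upper edge, so a boundary value falls into the higher interval.
    """
    for low, high in zip(edges, edges[1:]):
        if low <= minutes < high:
            return (low, high)
    return None


edges = [15, 20, 25, 30]  # the intervals 15-20, 20-25 and 25-30 minutes

print(find_interval(19.999999999, edges))  # (15, 20)
print(find_interval(20.000000001, edges))  # (20, 25)
```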
Definition: Discrete data is data that is obtained by counting. This type of data was what you focused on in the first half of this course. Discrete data points, unlike continuous ones, do not have points between points. There are a finite number of possibilities.
Numerical data is typically displayed with a bar graph or a histogram. There is a constant debate as to the difference between the two and which to use. Categorical data can be displayed as a bar graph but not a histogram. Continuous data can be displayed in a histogram but not a bar graph. Discrete data can use both a bar graph and a histogram. Do you see the confusion?
The simplest way of thinking about this is that histograms are used for continuous data and bar graphs are used for discrete data.
In Unit 3 of this course, you displayed probability distributions in a bar graph. It is possible to display them in histograms, although we did not do this. You can also display discrete data as a frequency (total number of each outcome) and as a percentage. Each of these displays will look very similar and result in the same conclusion.
The following outcomes are the number of heads when flipping 3 coins. Display them as a frequency bar graph, a frequency histogram, a probability bar graph, and a probability histogram: 3, 2, 1, 2, 0, 1, 2, 1
Frequency Bar Graph

Probability Bar Graph

Frequency Histogram

Probability Histogram

Note: The following video will demonstrate how to create these graphs in a spreadsheet. Editing was done in Paint to put the labels on the frequency histogram in the center and to put the lines on the probability histogram.
What do you notice about the 4 graphs above? Would any of them give a different conclusion?
75% of the time 1 or 2 heads came up.
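The numbers behind all four graphs can be checked with a short Python sketch. It uses the eight outcomes listed above; the variable names are arbitrary.

```python
from collections import Counter

# Number of heads in each trial of flipping 3 coins (from the example above).
heads = [3, 2, 1, 2, 0, 1, 2, 1]

frequency = Counter(heads)  # how many times each outcome occurred
total = len(heads)
probability = {k: frequency[k] / total for k in sorted(frequency)}

for k in sorted(frequency):
    print(f"{k} heads: frequency {frequency[k]}, probability {probability[k]:.3f}")

# Share of trials with 1 or 2 heads.
p_one_or_two = probability[1] + probability[2]
print(p_one_or_two)  # 0.75
```

The frequency and probability versions differ only in the scale of the vertical axis, which is why all four graphs lead to the same conclusion.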
It is worth noting that in the probability histogram, where each bar has a width of 1, the area of each bar represents its probability. This concept becomes much more useful when we come to probability histograms for continuous data later in the course.
Histograms are much more useful when displaying continuous data.
The following are times that students commute to school. To put an emphasis on the fact that the data is continuous, the data is recorded to 3 decimal places and represents the number of minutes: 23.405, 24.774, 27.344, 21.412, 12.280, 26.799, 16.309, 15.857, 22.287, 2.521, 17.651, 13.518, 30.282, 18.478, 22.705, 19.999, 30.605, 20.835.

The following video demonstrates how the histogram was created:
11 of the 19 respondents took 20 minutes or more to get to school.
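A histogram of this kind can be approximated by sorting the times into intervals. The sketch below assumes 5-minute intervals starting at 0, which may not match the intervals used in the lesson's histogram. Note also that the list as printed above contains 18 values, so tallies computed from it can differ from a summary based on the full set of respondents.

```python
# Commute times in minutes, copied from the list above.
times = [23.405, 24.774, 27.344, 21.412, 12.280, 26.799, 16.309, 15.857,
         22.287, 2.521, 17.651, 13.518, 30.282, 18.478, 22.705, 19.999,
         30.605, 20.835]

width = 5  # assumed interval width in minutes

# Count how many times fall into each interval, keyed by its lower edge.
counts = {}
for t in times:
    low = int(t // width) * width
    counts[low] = counts.get(low, 0) + 1

# Text histogram: one row per interval, including empty ones.
for low in range(0, 35, width):
    n = counts.get(low, 0)
    print(f"{low:>2}-{low + width:<2} | {'#' * n} ({n})")

at_least_20 = sum(1 for t in times if t >= 20)
print(f"{at_least_20} of the {len(times)} listed times are 20 minutes or more")
```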
One final difference in data can be seen in the last example.
Definition: The individual data points given are called microdata. Microdata is incredibly useful because you can go back and make calculations on all of the individual data.
If you had a chart of data already organized into intervals, without the original data set, you would have an example of aggregate data.
Definition: Aggregate data is data that has been summarized in some way such that you can't know what the original microdata was.
The data from the histogram above could be written in a table:

If the original data was not given or was lost, the table would be referred to as aggregate data.
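To see what is lost when only aggregate data is available, the sketch below compares the average computed from the listed microdata with an estimate made from an interval table alone. The 5-minute intervals and the use of interval midpoints are assumptions for this illustration, not the lesson's method.

```python
# Microdata: the individual commute times (minutes) listed earlier.
times = [23.405, 24.774, 27.344, 21.412, 12.280, 26.799, 16.309, 15.857,
         22.287, 2.521, 17.651, 13.518, 30.282, 18.478, 22.705, 19.999,
         30.605, 20.835]

# Aggregate data: counts per assumed 5-minute interval, keyed by lower edge.
width = 5
table = {}
for t in times:
    low = int(t // width) * width
    table[low] = table.get(low, 0) + 1

# With microdata, the average is computed exactly.
exact_mean = sum(times) / len(times)

# With only the table, each time is unknown, so every value in an
# interval is treated as the interval's midpoint (an approximation).
n = sum(table.values())
estimated_mean = sum((low + width / 2) * count for low, count in table.items()) / n

print(f"mean from microdata:      {exact_mean:.3f}")
print(f"estimate from aggregates: {estimated_mean:.3f}")
```

The two results are close but not equal, because the table no longer records where each time sits inside its interval.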
One final distinction when analysing data concerns not so much the data type as how many numerical variables are being displayed.
If you look at the bar graphs, histograms, and circle graphs, they all look at one variable at a time and display the number of times or percentage of times an individual point or interval is counted. These are displays for a one-variable data set. We have ways to summarize one-variable data sets as well. The average is one of those ways, and additional ways will be explored in the next unit.
If you want to compare two numerical variables to each other, you don't use histograms or bar graphs; you use scatter plots.
From before, we had the one variable of commuting times to school. Now, say we collected the student absences for each student as well. This becomes the second variable:

The scatter plot would be as follows:

As the time it takes to get to school increases, so does the number of absences.
There will be a focus on the displays and analysis of one- and two-variable data sets in the next unit.
Use Stats Canada to find a data set to display. Create a graph and make one conclusion based on the graph. Watch the following video for some assistance.