Key Concepts

Understand how to compare data sets displayed in dot plots
Understand how to compare data sets displayed in box plots
Understand how to compare data sets displayed in histograms
Understand how to make observations with data displays

Compare Data Sets Displayed in Dot Plots

How can you use measures of center and spread to compare data sets?

Example 1

Sawyer has narrowed his car search down to two different types of cars. To make an informed decision, he gathers data on the estimated highway fuel efficiency (mpg) of the two types of cars. The dot plots show the data for each type.

If highway fuel efficiency is the most important feature to Sawyer, which type of car should Sawyer purchase?

The data displays suggest that Type 2 cars have better highway fuel efficiency.

The data for Type 1 cars are clustered from 33 to 38, and the data for Type 2 cars are clustered from 35 to 44.

Based on this data display, Sawyer should purchase a car in the Type 2 category.

Vocabulary

An outlier is a data value that is very different from the others. In the data set for Type 2, 51 appears to be an outlier.

Study Tip

Remember that the mean absolute deviation (MAD) is a measure of variability that describes how much the data values are spread out from the mean of the data set.

Sawyer wants more information about the fuel efficiency of each type of car, so he calculates the mean and the mean absolute deviation (MAD) of the two data sets. How can these measures help him make a more informed decision?

Mean Fuel Efficiency

Type 1 Cars: 35.75 mpg

Type 2 Cars: 40.25 mpg

The mean fuel efficiency of Type 2 cars is greater than the mean fuel efficiency of Type 1 cars.

Sawyer also wants to consider how much his data vary to determine reliability.

The mean absolute deviation (MAD) is the mean of the absolute differences between each value in a data set and the mean of the data set.

The MAD helps you determine how much data vary within a particular data set.

To calculate mean absolute deviation, calculate the absolute value of the difference between each data point and the mean. Then find the mean of those differences.

MAD for Type 1 Cars (mean: 35.75)

2.75(2) + 1.75(3) + 0.75(3) + 0.25(3) + 1.25(2) + 2.25(2) + 5.25(1) = 26

The sum of the absolute differences between the data points and the mean is 26.

Divide that sum by the number of data points to find the MAD.

26 ÷ 16 = 1.625 ≈ 1.63

The MAD of Type 1 cars is about 1.63.
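
The calculation above can be checked with a short Python sketch. The list of values is not given in the text; it is inferred from the deviations and counts in the sum (for example, a deviation of 2.75 that occurs twice is taken to be two cars at 33 mpg), so treat it as an assumption. The function name `mean_absolute_deviation` is an illustrative choice.

```python
def mean_absolute_deviation(values):
    """Return the mean of the absolute differences from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

# Assumed Type 1 data (mpg), reconstructed from the deviations in the sum.
type1_mpg = [33, 33, 34, 34, 34, 35, 35, 35, 36, 36, 36, 37, 37, 38, 38, 41]

type1_mean = sum(type1_mpg) / len(type1_mpg)
type1_mad = mean_absolute_deviation(type1_mpg)
print(type1_mean, type1_mad)  # 35.75 1.625
```

The exact value 1.625 is what the text rounds to about 1.63.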

Use a similar process to find the MAD of Type 2 cars.

The MAD of Type 2 cars is about 2.94.

So, while the mean fuel efficiency for Type 2 cars is greater than for Type 1 cars, there is more variation with Type 2 cars.

This could mean that the expected fuel efficiency is less reliable for Type 2 cars.

Try It!

How does the outlier in the second data set affect the mean and the MAD?

The outlier in the second data set makes both the mean and the MAD higher than they would be without it, because the outlier is much greater than the other data values.

Use Structure

Recall that the interquartile range (IQR) is the difference of the third and first quartiles and represents the spread of the middle 50% of the data values.

How does the structure of a box plot represent the IQR?

Example 2

Kaitlyn and Philip go to neighboring high schools, and both are sponsoring charity fundraisers. Kaitlyn claims that students at her school are raising more for charity than the students at Philip’s school. The amounts raised by a random sample of 30 students at each school are shown in the box plots below. Do the data support Kaitlyn’s claim?

Analyze the distribution of values in each data set.

While the minimum and maximum amount of money raised at each school was the same, the spread of data points between the minimum and maximum values varies.

The sample data show that 50% of the students at Kaitlyn’s school raised between $32 and $52. At Philip’s school, 50% of the students in the sample raised between $45 and $56.
Based on the data, 50% of the students at Kaitlyn’s school raised $45 or more. At Philip’s school, 50% raised $50 or more.

The data do not support Kaitlyn’s claim.

Instead, they suggest that individual students at Philip’s school raised more money than individual students at Kaitlyn’s school.

Try It!

How does the IQR compare to the range for each school?

Step 1

The IQR, or interquartile range, is the difference of the third and first quartiles and represents the spread of the middle 50% of the data values.

For Kaitlyn’s school,

IQR = 52 – 32 = 20

For Philip’s school,

IQR = 56 – 45 = 11

Kaitlyn’s school has the greater IQR, which means that the amounts raised by the middle 50% of the students vary more than the amounts raised by the middle 50% of the students from Philip’s school.

Step 2

The range is the difference of the maximum and minimum values and represents the spread of the whole data set.

For Kaitlyn’s school,

range = 68 – 25 = 43

For Philip’s school,

range = 68 – 25 = 43

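Kaitlyn’s school and Philip’s school have the same range, so the overall spread of the data is the same for both schools, even though the middle 50% of the data is more spread out at Kaitlyn’s school.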
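
As a quick check of both steps, the sketch below computes the IQR and the range from the five-number summary values quoted in the example. The dictionary layout and the helper name `spread_measures` are assumptions made for illustration.

```python
# Five-number summaries read from the box plots (values quoted in the text).
schools = {
    "Kaitlyn": {"min": 25, "q1": 32, "median": 45, "q3": 52, "max": 68},
    "Philip": {"min": 25, "q1": 45, "median": 50, "q3": 56, "max": 68},
}

def spread_measures(summary):
    """Return (IQR, range) for one five-number summary."""
    iqr = summary["q3"] - summary["q1"]
    data_range = summary["max"] - summary["min"]
    return iqr, data_range

for name, summary in schools.items():
    iqr, data_range = spread_measures(summary)
    print(f"{name}: IQR = {iqr}, range = {data_range}")
```

The IQR covers 20 of the 43 dollars of range at Kaitlyn’s school but only 11 of 43 at Philip’s school.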

Common Error

When comparing histograms of data sets, be sure the intervals of the histogram are the same.

Compare Data Sets Displayed in Histograms

Example 3

A marketing team compares the ages of a random sample of 30 viewers of two popular new shows to decide which product to advertise during each show. During which show should the marketing team advertise a product that is targeted at adults aged 20-29?

From the data displays, you can make several observations about the data collected by the marketing team.

Of the 30 viewers in the sample, each show has 8 viewers between the ages of 20 and 29.
Show 1 has no viewers between the ages of 20 and 24.
Show 2 has viewers in each subsection of the target range: 20-24 and 25-29.
Show 2 also has viewers in the age brackets just above and just below the target range, who are potential customers.

Based on this sample, the marketing team should advertise during Show 2 because that show has broader appeal.

Try It!

3. If the marketing team wants to advertise a product that is targeted at adults 25-34, during which show should they advertise?

Show 1 has 18 viewers between the ages of 25 and 34, while Show 2 has only 9 viewers in that age range, so the marketing team should advertise during Show 1.

Make Observations With Data Displays

Use Appropriate Tools

You may want to enter the data into a spreadsheet so you can easily sort and perform calculations.

Example 4

Nadia collected data from 15 classmates about the number of text messages they send on school days and the number of text messages they send on non-school days. Nadia organized her data in the tables below. How can you use a box plot to compare the data that she collected?

Step 1 : Calculate the five-number summary for each set of data.

School Day Texts

Minimum : 0

Maximum : 26

Q1 : 9

Median : 17

Q3 : 22

IQR : 13

Non-School Day Texts

Minimum : 0

Maximum : 80

Q1 : 40

Median : 50

Q3 : 60

IQR : 20

Step 2 : Use the information to create a box plot to represent each set of data.

Step 3 : Use the data displays to make observations about the data sets.

Students send far more texts on non-school days than on school days.

There is more variation in the number of texts sent on non-school days than on school days.
One person does not send any texts on non-school days.

This represents an outlier because it is far from the other data values.
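
The lesson calls 0 an outlier because it sits far from the other values. One common convention, which the text does not state and which is used here only as an assumption, flags a value as an outlier when it lies more than 1.5 × IQR below Q1 or above Q3. A minimal sketch using the non-school day summary from Step 1 (the function names are illustrative):

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Non-school day texts: Q1 = 40, Q3 = 60 (from Step 1).
lower, upper = outlier_fences(40, 60)

def is_outlier(value):
    """True when the value falls outside the fences computed above."""
    return value < lower or value > upper

print(lower, upper)                   # 10.0 90.0
print(is_outlier(0), is_outlier(80))  # True False
```

Under this rule the minimum of 0 is an outlier, while the maximum of 80 is not.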

Try It!

a. Provide a possible explanation for each of the observations that was made.

b. Make 2 more observations about the data that Nadia collected.

Students may send far more texts on non-school days than on school days because they may not have homework to do, as they do on school days. There may be more variation in the number of texts sent on non-school days because some students prefer to text much more than others. One person may not send any texts on non-school days because he or she is busy with other activities or does not own a cell phone.

One observation is that 50% of the students sent 17 or more texts on school days and 50 or more texts on non-school days. Another observation is that the greatest number of texts sent on a non-school day was 80.

Dot Plots

Dot plots show how a particular data point fits in with the rest of the data.

For a more specific measure of variability, find the mean absolute deviation.

Box Plots

Box plots show the minimum, the maximum, the quartiles, and the median of the data.

Histogram

Histograms allow you to easily compare data ranges.

How can you use measures of center and spread to compare data sets?

How are the MAD and the IQR similar? How are they different?

When comparing two sets of data, it is common to look at the means. Why might the MAD be a useful piece of information to compare in addition to the mean?

Val says that if the minimum and maximum values of two data sets are the same, the median will be the same. Is Val correct? Explain.

Use the two data sets.

How do the means compare?

How do the MADs compare?

How do the medians compare?

How do the IQRs compare?

Which measures of center and spread are better for comparing data sets A and B? Explain.

1. How can you use measures of center and spread to compare data sets?

Step 1

Measures of center, such as the mean and median, and measures of spread, such as the standard deviation and mean absolute deviation, describe how the values in a data set are distributed.

Step 2

For example, consider the returns on two investment portfolios. If the two portfolios have the same mean return but the first has a larger spread, then the return on the first portfolio is likely to be much larger at some times, and much smaller at others, than the return on the second. Depending on the investor's preference, one portfolio could be more attractive than the other.

Step 3

Similarly, consider temperature data for two cities where the spread is the same but the mean is higher for the first city. Then we can infer that the first city is warmer most of the time.

2. How are the MAD and the IQR similar? How are they different?

Similarity :

Both IQR and MAD are measures of spread.

For both, a larger value indicates a greater spread in the data.

Differences : Changing values that lie between the first and third quartiles, without moving the quartiles themselves, does not affect the IQR but does affect the MAD.

The MAD is much more responsive to extreme data values and outliers, because it uses every data value, while the IQR depends only on the middle 50% of the data.

3. When comparing two sets of data, it is common to look at the means. Why might the MAD be a useful piece of information to compare in addition to the mean?

The MAD is a measure of the spread of the data values. In many situations, the spread matters as much as the mean.

For example, consider an individual deciding between two investment portfolios. Both the average return and the spread of the returns (which reflects the riskiness of the portfolio) need to be taken into account when making the choice.

4. Val says that if the minimum and maximum values of two data sets are the same, the median will be the same. Is Val correct? Explain.

No, Val is incorrect.

Consider the following data sets :

Data set 1 : 1, 4, 6, 8, 10

median = 6

Data set 2 : 1, 4, 7, 8, 10

median = 7

The two data sets have the same minimum and maximum values but different medians.
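
The counterexample can be confirmed in a few lines of Python using the standard library's `statistics.median`. The variable names are illustrative.

```python
from statistics import median

data_set_1 = [1, 4, 6, 8, 10]
data_set_2 = [1, 4, 7, 8, 10]

# Same minimum and maximum, different medians.
for data in (data_set_1, data_set_2):
    print(min(data), max(data), median(data))
# 1 10 6
# 1 10 7
```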

5. How do the means compare?

6. How do the MADs compare?

7. How do the medians compare?

8. How do the IQRs compare?

9. Which measures of center and spread are better for comparing data sets A and B? Explain.

Use the formula mean = Σxᵢ / n,

where xᵢ are the data values and n is the number of observations.

For data set A, the sum of the observations is 1305 and the mean is 1305 / 15 = 87.

For data set B, the sum of the observations is 1248 and the mean is 1248 / 15 = 83.2.

The mean of data set A is greater.

Finding the MAD for data set A : The absolute deviation of each distinct value from the mean is

|76 – 87| = 11

|83 – 87| = 4

|84 – 87| = 3

|85 – 87| = 2

|86 – 87| = 1

|87 – 87| = 0

|89 – 87| = 2

|90 – 87| = 3

|94 – 87| = 7

|98 – 87| = 11

The sum of the absolute deviations, with each deviation multiplied by the number of times its value occurs, is

11(1) + 4(2) + 3(1) + 2(1) + 1(2) + 0(3) + 2(1) + 3(2) + 7(1) + 11(1) = 52.

The mean absolute deviation is

52 / 15 ≈ 3.47

Finding the MAD for data set B : The absolute deviation of each distinct value from the mean is

|70 – 83.2| = 13.2

|75 – 83.2| = 8.2

|80 – 83.2| = 3.2

|81 – 83.2| = 2.2

|84 – 83.2| = 0.8

|87 – 83.2| = 3.8

|88 – 83.2| = 4.8

|89 – 83.2| = 5.8

The sum of the absolute deviations, with each deviation multiplied by the number of times its value occurs, is

13.2(1) + 8.2(2) + 3.2(1) + 2.2(2) + 0.8(1) + 3.8(4) + 4.8(2) + 5.8(2) = 74.4.

The mean absolute deviation is

74.4 / 15 = 4.96

The mean absolute deviation of data set B is greater.
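
The means and MADs for Questions 5 and 6 can be reproduced from the values and counts that appear in the deviation lists. In the sketch below, each dictionary maps a distinct value to the number of times it occurs. The counts are read from the multipliers in the sums; the two occurrences of 90 in data set A are an inference from the total of 15 values and the stated sum of 1305. The helper names are illustrative.

```python
def weighted_mean(freq):
    """Mean of a data set given as {value: count}."""
    n = sum(freq.values())
    return sum(value * count for value, count in freq.items()) / n

def weighted_mad(freq):
    """Mean absolute deviation of a data set given as {value: count}."""
    n = sum(freq.values())
    mean = weighted_mean(freq)
    return sum(abs(value - mean) * count for value, count in freq.items()) / n

# Assumed frequency tables, inferred from the deviation lists.
set_a = {76: 1, 83: 2, 84: 1, 85: 1, 86: 2, 87: 3, 89: 1, 90: 2, 94: 1, 98: 1}
set_b = {70: 1, 75: 2, 80: 1, 81: 2, 84: 1, 87: 4, 88: 2, 89: 2}

for name, freq in (("A", set_a), ("B", set_b)):
    print(name, round(weighted_mean(freq), 2), round(weighted_mad(freq), 2))
```

For data set A the absolute deviations total 52, and 52 / 15 ≈ 3.47, which matches the MAD stated for that set.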

The median of both data sets is 87, so the medians are the same.

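Taking Q1 and Q3 as the medians of the lower and upper halves of each data set, the IQR for data set A is 90 – 84 = 6 and the IQR for data set B is 88 – 80 = 8. The IQR is greater for data set B.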
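
Quartile values depend on the convention used. The sketch below assumes the common textbook method in which Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out when the count is odd. It also assumes the data values implied by the deviation lists in Question 6, since the data sets themselves are not reproduced here. The function name `quartiles` is illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]  # skips the middle value when the count is odd
    return median(lower), median(ordered), median(upper)

# Assumed data, inferred from the deviation lists in Question 6.
set_a = [76, 83, 83, 84, 85, 86, 86, 87, 87, 87, 89, 90, 90, 94, 98]
set_b = [70, 75, 75, 80, 81, 81, 84, 87, 87, 87, 87, 88, 88, 89, 89]

for name, data in (("A", set_a), ("B", set_b)):
    q1, med, q3 = quartiles(data)
    print(name, q1, med, q3, q3 - q1)
# A 84 87 90 6
# B 80 87 88 8
```

Under this convention the IQRs are 6 and 8. Other quartile conventions, such as the interpolation used by many spreadsheets, can give slightly different values.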

The measures of center are mean and median.

The measures of spread are MAD, IQR, and range.

From Question 5, we know the mean for Set A is 87 and the mean for Set B is 83.2.

The means are then close but not the same.

From Question 6, we know the MAD for Set A is about 3.47 and the MAD for Set B is about 4.96.

The MADs differ by about 1.5. Because both values are small, that difference represents noticeably more variability in Set B.

From Question 7, we know the median for Set A is 87 and the median for Set B is 87.

The medians are then the same.

From Question 8, we know the IQR for Set A is 6 and the IQR for Set B is 8.

The IQRs are then close but not the same.

The range for Set A is 98 – 76 = 22 and the range for Set B is 89 – 70 = 19.

The ranges are then close but not the same.

Since the medians are exactly the same but the means are not, the mean is the better measure of center for comparing the two data sets.

The ranges and IQRs are both about the same, while the MADs show that Set B has much more variability than Set A.

The MAD is then the best measure of spread for comparing the two data sets.

The mean and the MAD are the best measures for comparing the two data sets.

Exercise

What is the MAD of the following data:

20, 25, 12, 13, 15, 16, 17, 18

Fill in the blanks
MAD stands for ………………………… (abbreviation)
The middle value is also known as the …………………..
The number of types of data displays is ………………..
For a more specific measure of variability, we need to find ……………
