Key Concepts

Understand how to find measures of center and spread.
Understand how to use appropriate statistics to compare data sets.
Understand how to recognize a normal distribution.
Understand how to classify a data distribution.

Critique and Explain

Chen and Dakota were asked to estimate the mean and median of the following data set.

Chen said, “The middle value is 11. Both the mean and median are approximately 11.” Dakota said, “Most of the data are on the left. I think the mean and median will be about 9, with the mean slightly larger.”

Is either Chen or Dakota correct? Explain.

What strategies could you use to approximate the mean and median?

Which measure of center is more representative in this case, the mean or the median? Explain.

Solution:

Neither Chen nor Dakota is correct: the mean of the histogram is approximately 11 and the median is approximately 8.

The following formulas for grouped data can be used to approximate the mean and median:

Mean = (∑ xᵢfᵢ) / n, where

xᵢ = midpoint of each class interval

fᵢ = frequency of each class

n = total frequency

Median = l + ((n/2 − cf) / f) × h, where

l = lower boundary of the median class

n = total frequency

cf = cumulative frequency of the class preceding the median class

f = frequency of the median class

h = width of the median class
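The two grouped-data formulas above can be sketched in Python. This is a minimal illustration, not the lesson's own working: the class boundaries and frequencies below are hypothetical (the lesson's histogram is not reproduced here), and the function names `grouped_mean` and `grouped_median` are made up for this example. It also assumes all classes have the same width.

```python
def grouped_mean(lower_bounds, width, freqs):
    """Mean = sum(x_i * f_i) / n, where x_i is the midpoint of each class."""
    n = sum(freqs)
    midpoints = [low + width / 2 for low in lower_bounds]
    return sum(x * f for x, f in zip(midpoints, freqs)) / n


def grouped_median(lower_bounds, width, freqs):
    """Median = l + ((n/2 - cf) / f) * h, evaluated for the median class."""
    n = sum(freqs)
    cf = 0  # cumulative frequency of the classes before the current one
    for low, f in zip(lower_bounds, freqs):
        # The median class is the first class whose cumulative count reaches n/2.
        if f > 0 and cf + f >= n / 2:
            return low + (n / 2 - cf) / f * width
        cf += f
    raise ValueError("frequencies must contain at least one positive count")


# Hypothetical histogram (NOT the one from the lesson):
# classes 0-10, 10-20, 20-30, 30-40 with these frequencies.
lower_bounds = [0, 10, 20, 30]
freqs = [2, 5, 2, 1]

print(grouped_mean(lower_bounds, 10, freqs))    # 17.0
print(grouped_median(lower_bounds, 10, freqs))  # 16.0
```

In this made-up example the median class is 10-20, so l = 10, cf = 2, f = 5 and h = 10.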

The mean is more representative in this case, because it lets us describe each data value as above or below the mean. The median is simply the middle value of the data when the data are written in ascending order.

Example 1: Find measures of centre and spread

What are the mean and standard deviation of the following data set?

4, 12, 15, 9, 14, 13, 6, 7, 6, 25, 3, 13, 17, 22, 4, 16

The mean, or average, of a data set is the sum of the values in the data set divided by the number of values in the data set. The standard deviation is a measure of how much the values in a data set vary, or deviate, from the mean. It is a measure of the variability, or spread, of the data.

You can use a spreadsheet to calculate the mean and standard deviation.

The mean and the standard deviation are used together to measure the centre and spread of the data.

The mean is x̄ ≈ 11.6, and the standard deviation is σ ≈ 6.3.
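As an alternative to a spreadsheet, the same two values can be checked with Python's standard `statistics` module. This sketch uses the 16 values that appear in the ordered list in Step 1 of the five-number summary, and it assumes σ means the population standard deviation (dividing by n), which is what reproduces 6.3.

```python
import statistics

# The 16 values from the ordered list in Step 1 of the five-number summary
data = [3, 4, 4, 6, 6, 7, 9, 12, 13, 13, 14, 15, 16, 17, 22, 25]

mean = statistics.mean(data)
sigma = statistics.pstdev(data)  # population standard deviation (divides by n)

print(round(mean, 1))   # 11.6
print(round(sigma, 1))  # 6.3
```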

What is the five-number summary of the data set?

The five-number summary includes the minimum value, first quartile, median, third quartile and maximum value.

Step 1: Rearrange the data in ascending numerical order.

3, 4, 4, 6, 6, 7, 9, 12, 13, 13, 14, 15, 16, 17, 22, 25

Step 2: Note the minimum and maximum values:

Minimum = 3

Maximum = 25

Step 3:

Calculate the median, the number in the middle of the data set. Since there are an even number of values, the median is the average of the two middle values or 12.5.

Step 4:

Calculate the first and third quartiles. The quartiles show how the data are distributed. The first quartile is the median of the lower half of the data, 6. The third quartile is the median of the upper half of the data, 15.5.

These data can be represented in a box-and-whisker plot. Notice that the third quartile is closer to the median than the first quartile is.

The five-number summary of this data set is: minimum = 3, 1st quartile = 6, median = 12.5, 3rd quartile = 15.5, maximum = 25.
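Steps 1 to 4 can be sketched as a short Python function. The name `five_number_summary` is chosen for this example only. It follows the method used in this lesson, where each quartile is the median of one half of the ordered data and the middle value is left out of both halves when the number of values is odd. Spreadsheets and statistics libraries sometimes use other quartile conventions, so their results can differ slightly.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    ordered = sorted(values)                # Step 1: ascending order
    half = len(ordered) // 2
    lower = ordered[:half]                  # lower half of the data
    upper = ordered[len(ordered) - half:]   # upper half of the data
    return (
        ordered[0],                         # Step 2: minimum
        statistics.median(lower),           # Step 4: first quartile
        statistics.median(ordered),         # Step 3: median
        statistics.median(upper),           # Step 4: third quartile
        ordered[-1],                        # Step 2: maximum
    )


data = [3, 4, 4, 6, 6, 7, 9, 12, 13, 13, 14, 15, 16, 17, 22, 25]
print(five_number_summary(data))  # (3, 6.0, 12.5, 15.5, 25)
```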

Try it

List the mean, standard deviation, and five-number summary of the following data set:

3, 4, 9, 12, 12, 14, 15, 19, 30, 32, 33, 34, 34, 35

Solution:

Mean = (3 + 4 + 9 + 12 + 12 + 14 + 15 + 19 + 30 + 32 + 33 + 34 + 34 + 35) / 14

= 286 / 14

≈ 20.4

Sample standard deviation = √( ∑(x − x̄)² / (n − 1) )

= √(1883.43 / 13)

≈ 12.04
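This solution divides by n − 1, which is the sample standard deviation, while a standard deviation written as σ normally divides by n. A short Python check, using the standard `statistics` module, shows how much the two differ for this data set.

```python
import statistics

data = [3, 4, 9, 12, 12, 14, 15, 19, 30, 32, 33, 34, 34, 35]

s = statistics.stdev(data)       # sample standard deviation, divides by n - 1
sigma = statistics.pstdev(data)  # population standard deviation, divides by n

print(round(s, 2))      # 12.04
print(round(sigma, 2))  # 11.6
```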

Five number summary:

Minimum value = 3

Maximum value = 35

Median = mean of the (n/2)th and (n/2 + 1)th observations

= mean of the 7th and 8th observations

= (15 + 19) / 2

= 17

First quartile = 12

Third quartile = 33

Then,

The five-number summary of this data set is: minimum = 3, 1st quartile = 12, median = 17, 3rd quartile = 33, maximum = 35.

Example 2: Use appropriate statistics to compare data sets

How can you describe different types of distributions?

To compare the different types of distributions, look at the shape, the center, and the spread of the distributions.

The standard deviation, range, and interquartile range are three measures of spread. The range of a data set is the difference between the maximum and minimum values. The interquartile range is the difference between the third quartile and the first quartile.

When measuring center and spread, the median and interquartile range are used together, and the mean and standard deviation are used together.

A skewed distribution is one with a shape that is stretched out in either the positive or negative direction. A symmetric distribution has a shape that looks roughly the same when reflected across the mean.

The shape of a distribution can affect the measures of center and spread, and it determines which measures of center and spread best describe the data.

Use appropriate statistics to compare data sets

The mean, median, and mode are all about the same in a symmetric distribution. You can use the mean and the standard deviation to describe the center and spread.

What measures of center and spread would you use for the following data set?

10, 13, 16, 21, 22, 26, 29, 29, 30, 32, 33, 33, 33, 35, 37

You can use a histogram to determine the shape. Since the mean is more affected than the median when a distribution is skewed, it is better to use the median and interquartile range as the measures of center and spread for skewed data such as these. Also, the quartiles show how the data are distributed differently on either side of the center.

The data are already in numerical order.

The median is 29. The range is 37 – 10 = 27, and the interquartile range is 33 – 21 = 12.
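The measures for this data set can be checked with a few lines of Python. This is a sketch that takes each quartile as the median of one half of the ordered data, as in Example 1. It also prints the mean, which falls below the median here, consistent with values bunched at the high end and a tail toward the low end.

```python
import statistics

# Data from the example, already in numerical order
data = [10, 13, 16, 21, 22, 26, 29, 29, 30, 32, 33, 33, 33, 35, 37]

half = len(data) // 2
q1 = statistics.median(data[:half])              # median of the lower half
q3 = statistics.median(data[len(data) - half:])  # median of the upper half

print("mean:", statistics.mean(data))      # mean: 26.6
print("median:", statistics.median(data))  # median: 29
print("range:", max(data) - min(data))     # range: 27
print("IQR:", q3 - q1)                     # IQR: 12
```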

Try it

What are the better measures of center and spread of the following data sets?

55, 55, 57, 57, 57, 58, 58, 59, 59, 61, 61

110, 110, 110, 120, 120, 130, 140, 150, 160, 170, 180, 190

Solution:

Step 1: Make a histogram of the data set.

The histogram is skewed to the left, so it is better to use the median and interquartile range as the measures of center and spread.

110, 110, 110, 120, 120, 130, 140, 150, 160, 170, 180, 190

Solution:

Step 1: Make the histogram for the data.

The histogram is skewed to the right. So, it is better to use the median and interquartile range as the measures of center and spread.
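When no graphing tool is at hand, a rough text histogram is enough to see the shape. The sketch below bins the second data set. The bin width of 20 and the starting edge of 110 are assumptions made for this illustration; the lesson's own histogram may use different bins.

```python
from collections import Counter

data = [110, 110, 110, 120, 120, 130, 140, 150, 160, 170, 180, 190]

bin_width = 20  # assumed bin width; the lesson's histogram may use another
start = 110     # assumed left edge of the first bin

# Bin k holds the values from start + k*bin_width up to,
# but not including, start + (k + 1)*bin_width.
counts = Counter((x - start) // bin_width for x in data)

for k in range(max(counts) + 1):
    low = start + k * bin_width
    print(f"{low}-{low + bin_width - 1}: {'#' * counts[k]}")
```

With these bins the counts are 5, 2, 2, 2 and 1, so the tallest bar is on the left and the tail runs to the right.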

Example 3: Recognize a normal distribution

Are the following variables likely to have a normal distribution?

The heights of all people in a large group.

A normal distribution can be modeled by a particular bell-shaped curve that is symmetric about the mean. This is called the normal curve.

Approximately normal distributions can be found in many real-world situations where the data are symmetric and mostly clustered near the mean.

The heights of people in a large group are likely to be normally distributed.

The probability of landing on each of 8 equal parts of a spinner.

This data set is not normally distributed because each outcome has the same probability of occurring as any other.

The scores on any test.

The scores on any test are often skewed to the left and not normally distributed, because more students will receive higher scores.

The number of children in a family.

The number of children in a family is not normally distributed. The distribution is skewed to the right because many families have 0, 1, 2, or 3 children, but very few families have 10 or more children.

Example 4: Classify a data distribution

How would you classify the following data set? Describe the shape of the distribution and the center and spread of the data.

106, 96, 86, 120, 98, 76, 112, 64, 99, 72, 119, 115, 76, 120, 97

Step 1: Make a histogram of the data.

Step 2: Analyze the shape of the histogram.

Since the data are bunched to the right and have a long tail to the left, the data are skewed left.

Step 3:

Determine the center and spread of the data. Use the median and inter-quartile range.

64, 72, 76, 76, 86, 96, 97, 98, 99, 106, 112, 115, 119, 120, 120

1st quartile = 76, median = 98, 3rd quartile = 115

The interquartile range is 115 – 76 = 39. Notice that the 3rd quartile is closer to the median than the first quartile. This is characteristic of a distribution that is skewed left.

The distribution is skewed left with median 98 and interquartile range 39.
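The quartile comparison in Step 3 can be reproduced with a short Python sketch. It sorts the 15 original values and takes each quartile as the median of one half of the ordered data, leaving out the middle value, which is the convention this lesson follows.

```python
import statistics

data = [106, 96, 86, 120, 98, 76, 112, 64, 99, 72, 119, 115, 76, 120, 97]
ordered = sorted(data)

half = len(ordered) // 2
q1 = statistics.median(ordered[:half])                 # median of the lower half
median = statistics.median(ordered)
q3 = statistics.median(ordered[len(ordered) - half:])  # median of the upper half

print(q1, median, q3)            # 76 98 115
print("IQR:", q3 - q1)           # IQR: 39
print(median - q1, q3 - median)  # 22 17
```

The last line shows the distance from the median to each quartile: 22 below and 17 above, so the upper quartile sits closer to the median.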

Try it

What is the type of distribution and the center and spread of the data?

20, 17, 17, 12, 18, 21, 19, 18, 13, 14, 17, 23, 25

Solution:

Step 1: Arrange the data in ascending order and make a histogram.

12, 13, 14, 17, 17, 17, 18, 18, 19, 20, 21, 23, 25

The histogram is skewed right, so it is better to use the median and interquartile range as the measures of center and spread.

Step 2:

Determine the center and spread of the data. Use the median and inter quartile range.

12, 13, 14, 17, 17, 17, 18, 18, 19, 20, 21, 23, 25

Median = ((13 + 1) / 2)th observation = 7th observation = 18

1st quartile = (14 + 17) / 2

= 15.5

3rd quartile = (20 + 21) / 2

= 20.5

Interquartile range = 20.5 – 15.5 = 5

The distribution is skewed right with median 18 and interquartile range 5.

Concept Summary

Data Distributions

Shapes

For distributions that are approximately normal, use mean and standard deviation to describe the data. For skewed distributions, use median and quartiles to describe the data.

Graphs

Let’s check our knowledge:

Determine the mean, standard deviation, and five-number summary of the following data set.

5, 8, 5, 9, 6, 14, 9, 3, 8, 7, 10, 12

For each data set, describe the shape of the distribution and determine which measures of center and spread best represent the data.

28, 13, 23, 34, 55, 38, 44, 65, 49, 33, 50, 59, 67, 45

12, 2, 14, 4, 1, 6, 11, 7, 8, 5, 9, 10, 8, 15

Answers:

Determine the mean, standard deviation, and five-number summary of the following data set.

5, 8, 5, 9, 6, 14, 9, 3, 8, 7, 10, 12

Solution:

Mean = (5 + 8 + 5 + 9 + 6 + 14 + 9 + 3 + 8 + 7 + 10 + 12) / 12

= 96 / 12

= 8

Standard deviation = √( ∑(x − x̄)² / (n − 1) )

= √(106 / 11)

≈ 3.10

Five number summary:

Minimum = 3

Maximum = 14

Ascending order of the data set: 3, 5, 5, 6, 7, 8, 8, 9, 9, 10, 12, 14

Median = (8 + 8) / 2

= 8

First quartile = (5 + 6) / 2 = 5.5

Third quartile = (9 + 10) / 2 = 9.5

For each data set, describe the shape of the distribution and determine which measures of center and spread best represent the data.

28, 13, 23, 34, 55, 38, 44, 65, 49, 33, 50, 59, 67, 45

12, 2, 14, 4, 1, 6, 11, 7, 8, 5, 9, 10, 8, 15

Solution:

Step 1: Make the histogram for the data set

The shape of the histogram is symmetric.

So, it is better to use standard deviation and mean to describe center and spread.

12, 2, 14, 4, 1, 6, 11, 7, 8, 5, 9, 10, 8, 15

Solution:

Step 1: Make the histogram of the data set

The shape of the histogram is symmetric.

So, it is better to use standard deviation and mean to describe the center and spread.

Exercise

Determine if each situation is likely to be uniformly distributed, normally distributed, skewed left or skewed right.

The age at which people die in the United States.
The number of pets owned by students at your school.
The selling price of cars in 2018.
The test scores from a history test are 88, 95, 92, 60, 86, 78, 95, 98, 92, 96, 70, 80, 89, and 96.
Find the mean and the standard deviation.
Find the five-number summary of the test scores.
Describe the type of distribution.