Mean, Median, Outliers, Mode, Distributions
Introduction:
I went bowling with Mr. Waldman the other day. He didn't really want to go but I twisted his arm. I told him it would help the kids learn, and I knew that he was a softy for education. We rented the Sunday morning 3 hour special for $20 and each of us bowled 10 games. It's the best deal!
I thought I would win but he kicked my butt. I got:
141, 149, 138, 180, 178, 158, 162, 154, 192, 148.
He got:
172, 150, 142, 120, 110, 160, 164, 158, 123, 151.
He beat me 7/10 times! Check it out yourself. I was disappointed since we had a bet that the winner would take the loser to a ballgame of his choice, and Yankee tickets are expensive. So I said -
"Hey, if we had counted total pins then I would have won!"
You can check that I got 1600 total pins, and he got 1450. That means if we spread the total pins over each game equally, it is as if I got 160 each game, and he got 145. In this way, I win each game by 15 points. Unfortunately that is not how we made the bet, and I did really lose.
Still, if you had to judge who would win more games over the long run, you might bet on me. After all, on average I score 15 points more per game than Mr. Waldman does.
Mean - The mean of a list of numbers is calculated by adding the numbers up and dividing by how many numbers are in the list.
In the bowling example, the mean (also called average) of my scores was 1600/10 = 160. Mr. Waldman's mean was 1450/10 = 145.
The mean is a really useful tool for answering certain kinds of questions, like "who would you bet on in a single game of bowling, Professor Shai or Mr. Waldman?" But it does not help with other kinds of questions, like who actually won more games.
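If you like computers, here is a small Python sketch of the bowling arithmetic above. The scores are copied from the lesson; the names my_scores, waldman_scores and mean are just ones I picked for illustration.

```python
# Bowling scores copied from the lesson above.
my_scores = [141, 149, 138, 180, 178, 158, 162, 154, 192, 148]
waldman_scores = [172, 150, 142, 120, 110, 160, 164, 158, 123, 151]


def mean(numbers):
    """Add the numbers up and divide by how many numbers are in the list."""
    return sum(numbers) / len(numbers)


# Go through the games in order and count the ones Mr. Waldman won.
waldman_wins = sum(1 for mine, his in zip(my_scores, waldman_scores) if his > mine)

print(mean(my_scores))       # 160.0
print(mean(waldman_scores))  # 145.0
print(waldman_wins)          # 7
```

The two means answer the "who scores more per game" question, while the win count answers the "who won more games" question. They point in opposite directions here.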
Here is another example:
A new teacher is coming to Schechter and is planning to move into Sharon. He is interested in the housing prices. I do a quick survey of ten houses in town. The selling prices are:
$360,000 $420,000 $489,000 $381,000 $521,000
$463,000 $510,000 $399,000 $612,000 $8,700,000
I drop him a quick email and report that the average (or mean) price of a random sample of ten homes in town is: $1,285,500. You should check this yourself - I do make mistakes with arithmetic.
He says that he took another job where the housing is more affordable. I realized that I goofed up! He obviously thought that a typical house in Sharon costs over a million bucks - a stretch on a teacher's salary. I should have told him that there are some really fancy expensive houses but that most houses are not over a million bucks.
Median - The median of a list of numbers is the middle number of the sorted list.
For example, 5 is the median of the list 5, 19, 2, 3, 8, because when you sort the list to get 2, 3, 5, 8, 19, you can see that 5 is in the middle. Looking at the house price example: If we sort the house prices (in thousands of dollars) we get the list:
360, 381, 399, 420, 463, 489, 510, 521, 612, 8700.
Now we have two middle numbers: 463 and 489. Which one is the median? Mathematicians agreed that in the case where there are two middle numbers, the median should be the mean of the two numbers. In this case, the median is $476,000, the mean of $463,000 and $489,000.
If my goal was to communicate the sense of a "typical" house price, then I was a fool to send this teacher the mean of the list. I should have sent him the median!
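Here is one way the median rule could be sketched in Python, using the house prices from the lesson (in thousands of dollars). The function name median is my own choice, and this is only a sketch of the sort-and-take-the-middle idea described above.

```python
# House prices from the lesson, in thousands of dollars.
prices = [360, 420, 489, 381, 521, 463, 510, 399, 612, 8700]


def median(numbers):
    """Middle number of the sorted list; with two middle numbers, their mean."""
    ordered = sorted(numbers)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: there is exactly one middle number.
        return ordered[middle]
    # Even count: take the mean of the two middle numbers.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 19, 2, 3, 8]))    # 5
print(median(prices))              # 476.0
print(sum(prices) / len(prices))   # 1285.5
```

The median (476) and the mean (1285.5) of the same ten prices are very far apart, which is exactly why the mean gave the new teacher the wrong idea.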
Mode - There is a third measurement of a list called mode which is simply the value in the list that appears most often.
This measurement is useful when there are lots of duplicates in a list. For example, let's say I have a list of favorite ice cream flavors (chocolate, vanilla, or strawberry) for each kid in the school, where 1 means chocolate, 2 means vanilla, and 3 means strawberry. That's 250 numbers whose mean is useless because it will probably be a fraction that matches no flavor at all, and whose median is useless because the order of the flavors is arbitrary. In this distribution, I would be better off simply knowing which number appears the most. This number is the mode, and it would tell me correctly which flavor is most popular or typical.
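A small Python sketch of finding the mode follows. The twelve survey answers are made up for illustration (they are not real school data), and the variable names are my own.

```python
from collections import Counter

# Made-up answers from 12 kids (NOT real data):
# 1 = chocolate, 2 = vanilla, 3 = strawberry
flavors = [1, 3, 1, 2, 1, 3, 2, 1, 1, 2, 3, 1]

# Counter tallies how many times each value appears in the list.
counts = Counter(flavors)

# most_common(1) gives a list holding the single most frequent (value, count) pair.
mode, how_many = counts.most_common(1)[0]

print(mode, how_many)                # 1 6
print(sum(flavors) / len(flavors))   # 1.75
```

The mode says chocolate (1) is the favorite, with 6 votes. The mean of 1.75 does not correspond to any flavor, so it tells us nothing useful here.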
Outlier - An outlier is a value that is nowhere near the other values in the list.
If I had just gotten rid of the $8,700,000 home, I could have safely used mean in the price example (the mean of the other nine prices is about $462,000). It would be fair to delete that value. No teacher is ever going to buy the Burns mansion anyhow. Mr. Smithers haunts the place! The $8,700,000 value is an outlier.
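To see how much one outlier moves the mean, here is a short Python sketch using the same house prices (in thousands of dollars). Dropping the value by hand, as done here, is just for illustration; the variable names are my own.

```python
# House prices from the lesson, in thousands of dollars.
prices = [360, 420, 489, 381, 521, 463, 510, 399, 612, 8700]

# Leave out the one value that is nowhere near the others.
without_outlier = [p for p in prices if p != 8700]

mean_all = sum(prices) / len(prices)
mean_without = sum(without_outlier) / len(without_outlier)

print(mean_all)                 # 1285.5
print(round(mean_without, 1))   # 461.7
```

With the mansion removed, the mean of the remaining nine prices lands close to the median of $476,000 found earlier.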
Distributions -
Lists of numbers come up everywhere: bowling scores, batting averages, house prices, heights of people, cancer cells in a tumor on different days, heart rates, etc.
The way the numbers in a list are distributed is called its distribution. That is a circular definition, but the real definition is too technical for middle school. A few examples may help. Let's consider a list of 300 numbers in the range 1-100, with mean 50. There can be duplicates, and some numbers between 1 and 100 may not appear at all.
Distribution 1 - Uniform
Each number is equally likely to occur in the list. Here is a picture of a uniform distribution:
Each yellow column represents a different range of five numbers, 1-5, 6-10, etc. The height of each column shows how many numbers there are in that range. You should note that each range has about the same number of numbers in it.
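You can make a rough text version of such a chart yourself. The Python sketch below is my own illustration, not the program that drew the picture: it picks 300 random whole numbers from 1 to 100 and counts how many fall in each range of five. The seed value is arbitrary, and the mean of a random list like this will only be close to 50, not exactly 50.

```python
import random

random.seed(1)  # arbitrary seed, only so that the same list comes out every run

# 300 numbers, each value from 1 to 100 equally likely.
numbers = [random.randint(1, 100) for _ in range(300)]

# counts[0] covers 1-5, counts[1] covers 6-10, ... counts[19] covers 96-100.
counts = [0] * 20
for n in numbers:
    counts[(n - 1) // 5] += 1

# Print one row of '#' marks per range, like a chart turned on its side.
for i, count in enumerate(counts):
    low = 5 * i + 1
    print(f"{low:3d}-{low + 4:3d} {'#' * count}")
```

The rows should come out roughly the same length, though random chance makes some a bit longer or shorter than others.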
Distribution 2 - Normal
Numbers closer to the mean are more likely than numbers farther from the mean. Here is a picture of a normal distribution:
The center vertical line is the mean value 50, and the blue line represents how many numbers of each value there are. You can see that the most common number is the mean 50, and as you go further from the mean in either direction, higher (to the right) or lower (to the left), there are fewer of these numbers in the list.
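Here is a similar Python sketch for a normal-looking list. It is only an illustration under my own assumptions: the numbers come from Python's random.gauss with mean 50 and a spread of 15 (a value I picked), and anything outside 1-100 is thrown away so the list stays in range.

```python
import random

random.seed(2)  # arbitrary seed, only so that the same list comes out every run

numbers = []
while len(numbers) < 300:
    # Assumed settings: center 50, spread 15. Round to a whole number.
    value = round(random.gauss(50, 15))
    if 1 <= value <= 100:
        numbers.append(value)

# counts[0] covers 1-10, counts[1] covers 11-20, ... counts[9] covers 91-100.
counts = [0] * 10
for n in numbers:
    counts[(n - 1) // 10] += 1

# Print one row of '#' marks per range, like a chart turned on its side.
for i, count in enumerate(counts):
    low = 10 * i + 1
    print(f"{low:3d}-{low + 9:3d} {'#' * count}")
```

The rows near the middle (41-50 and 51-60) should be the longest, and the rows should shrink as you move toward 1 or toward 100.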
There is a wide variety of distributions, but you should be able to distinguish between these two very different kinds.
Example: A list of adult male heights will be close to a normal distribution. We see a lot of men who are 5 feet 10 inches tall, and far fewer who are less than 5 feet 4 inches or greater than 6 feet 4 inches (that is, 6 inches or more shorter or taller).
Example: Let's number the days of the week Sunday - Saturday from 1-7. A list of the days people were born will be roughly a uniform distribution. We expect to see a similar number of 1's, 2's, etc. through 7's.
Many lists of numbers that come from measuring things are close to normal distributions, hence the name. Some distributions are uniform, and some are neither.
What about a list of dates in a month 1-31 on which people were born? Do you expect there to be more people with dates near the mean? I think not. Do you expect there to be the same number of people with each date? I also think not - why? This distribution would be neither uniform nor normal.
Finally, the picture of a distribution tells you a lot about the mean and the median. In a normal distribution we expect the mean and the median to be about the same. In a distribution where there are just a few values above the mean and many below the mean, the median would be below the mean. If there are many values above the mean and just a few below, then the median would be above the mean.
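Two tiny made-up lists show both lopsided cases. This Python sketch uses the standard statistics module; the lists and their names are hypothetical examples of mine, chosen only to make the effect easy to see.

```python
import statistics

# Made-up list: one value far above the rest, so most values sit below the mean.
few_high = [1, 2, 3, 4, 100]

# Made-up list: one value far below the rest, so most values sit above the mean.
few_low = [1, 97, 98, 99, 100]

for name, numbers in [("few_high", few_high), ("few_low", few_low)]:
    m = statistics.mean(numbers)
    med = statistics.median(numbers)
    if med < m:
        print(name, "- median is below the mean")
    else:
        print(name, "- median is above the mean")
```

For few_high the mean is 22 while the median is 3. For few_low the mean is 79 while the median is 98.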
A normal distribution looks like a bell but it can be tall and narrow or broad and flat. Tall and narrow means that most of the numbers cluster around the mean, while broad and flat means that more values spread out away from the mean. A way to measure how tightly the values cluster around the mean is called the standard deviation. In a normal distribution, about 68% of all data values lie within one standard deviation from the mean, either higher or lower, and about 95% lie within two standard deviations. The standard deviation of a normal distribution is complicated to calculate, but intuitively the larger the number, the flatter the shape of the bell, because a wider area is needed to include the 68%. Here is a good place to learn more about standard deviation.
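For the curious, here is a Python sketch that checks the 68% and 95% figures on a simulated list. Everything in it is an assumption for illustration: the data are 10,000 made-up values from random.gauss with mean 50 and spread 15, and standard_deviation is my own helper, which uses one common recipe (the square root of the average squared distance from the mean).

```python
import math
import random


def standard_deviation(numbers):
    """Square root of the average squared distance from the mean."""
    m = sum(numbers) / len(numbers)
    return math.sqrt(sum((x - m) ** 2 for x in numbers) / len(numbers))


random.seed(3)  # arbitrary seed, only so that the same list comes out every run

# Simulated data (not real measurements): center 50, spread 15.
values = [random.gauss(50, 15) for _ in range(10000)]

m = sum(values) / len(values)
sd = standard_deviation(values)

# Fraction of values within one and within two standard deviations of the mean.
within_one = sum(1 for x in values if abs(x - m) <= sd) / len(values)
within_two = sum(1 for x in values if abs(x - m) <= 2 * sd) / len(values)

print(round(within_one, 2), round(within_two, 2))
```

The two printed fractions should come out near 0.68 and 0.95, though the exact digits depend on the random list.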
Summary:
Mean, median and mode are all measurements of a list of numbers. The purpose of each is to try to get a sense of the "typical" number in the list. Sometimes mean gives you a better sense of typical, sometimes median does, and sometimes mode. Outliers are data values that are way off from the other values, and can often be set aside when you want a sense of the typical value. Distributions of lists vary widely and you can learn a lot about a list of numbers from its distribution.
Problems:
1. Calculate the mean and median of the following data points:
145 167 124 110 128 138 136 135 189 190 120
118 139 127 138 145 143 165 156 143 128 192
2. Make a chart with 110, 120, 130, 140, 150, 160, 170, 180, 190 at the bottom. Then draw a bar upwards next to each number X showing how many numbers in the list are at least X, but less than X+10.
3. Do you think these numbers realistically look like a single person's bowling scores, or like a group of unrelated scores? Why? Does this distribution look more like a uniform distribution or a normal distribution?
4. Do the same three questions with the list below:
123 146 148 139 151 147 152 185 128 149 148 176
125 144 167 163 134 132 138 158 154 172 181
5. For each of the following kinds of lists, would you expect the distribution to look uniform or normal?
a. The number of hours of TV each kid in the middle school watches.
b. The last digit of the address of each middle school student.
c. The heights of kids in the middle school.
d. The number of letters in the last name of each middle school student.
e. The day of the month each middle schooler was born on.
f. The weights of kids in the middle school.
6. Pick one of the choices above and actually gather the data for your grade. Make a chart and look at the distribution.