Math/Science Initiative - Professor Shai

Mean, Median, Outliers, Mode, Distributions

Introduction:

I went bowling with Mr. Waldman the other day.  He didn't really want to go but I twisted his arm.  I told him it would help the kids learn, and I knew that he was a softy for education.  We rented the Sunday morning 3-hour special for $20 and each of us bowled 10 games.  It's the best deal!

I thought I would win but he kicked my butt.  I got:

141, 149, 138, 180, 178, 158, 162, 154, 192, 148.

He got:

172, 150, 142, 120, 110, 160, 164, 158, 123, 151.

He beat me 7/10 times!  Check it out yourself.   I was disappointed since we had a bet that the winner would take the loser to a ballgame of his choice, and Yankee tickets are expensive.  So I said -  

"Hey, if we had counted total pins then I would have won!"  

You can check that I got 1600 total pins, and he got 1450.  That means if we spread the total pins over each game equally, it is as if I got 160 each game, and he got 145.  In this way, I win each game by 15 points.  Unfortunately that is not how we made the bet, and I did really lose.

Still, if you had to judge over the long run who would win more games, you might bet on me.  After all, on average I score 15 points more per game than Mr. Waldman does.

Mean - The mean of a list of numbers is calculated by adding the numbers up and dividing by how many numbers are in  the list.  

In the bowling example, the mean (also called average) of my scores was 1600/10 = 160.  Mr. Waldman's mean was 1450/10 = 145.

The mean is a really useful tool for answering certain kinds of questions, like "who would you bet on in a single game of bowling, Professor Shai or Mr. Waldman?"  But it does not help with other kinds of questions like who actually won more games.
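If you would rather let a computer do the checking, here is a minimal Python sketch of the bowling numbers.  The names shai_scores, waldman_scores, and waldman_wins are just labels chosen for this illustration.  It shows both kinds of questions side by side: the mean, and who actually won more games.

```python
# Scores copied from the bowling story above.
shai_scores = [141, 149, 138, 180, 178, 158, 162, 154, 192, 148]
waldman_scores = [172, 150, 142, 120, 110, 160, 164, 158, 123, 151]


def mean(numbers):
    """Add the numbers up and divide by how many there are."""
    return sum(numbers) / len(numbers)


# Count the games Mr. Waldman won by comparing the scores game by game.
waldman_wins = sum(1 for s, w in zip(shai_scores, waldman_scores) if w > s)

print("Professor Shai: total", sum(shai_scores), "mean", mean(shai_scores))
print("Mr. Waldman:    total", sum(waldman_scores), "mean", mean(waldman_scores))
print("Games won by Mr. Waldman:", waldman_wins, "out of", len(shai_scores))
```

The higher mean and the higher number of wins belong to different bowlers, which is exactly why the bet turned out the way it did.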

Here is another example: 

A new teacher is coming to Schechter and is planning to move into Sharon.  He is interested in the housing prices.  I do a quick survey of ten houses in town.  The selling prices are:  

$360,000    $420,000    $489,000    $381,000    $521,000

$463,000    $510,000    $399,000    $612,000    $8,700,000

I drop him a quick email and report that the average (or mean) price of a random sample of ten homes in town is:  $1,285,500.  You should check this yourself - I do make mistakes with arithmetic.
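One way to check the arithmetic is a short Python sketch like the one below.  The names prices and mean_price are made up for this example; the ten prices are the ones listed above.

```python
# Selling prices in dollars, copied from the survey above.
prices = [360_000, 420_000, 489_000, 381_000, 521_000,
          463_000, 510_000, 399_000, 612_000, 8_700_000]

# Mean = total of the prices divided by how many prices there are.
mean_price = sum(prices) / len(prices)
print(f"Mean price: ${mean_price:,.0f}")
```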

He says that he took another job where the housing is more affordable.  I realized that I goofed up!  He obviously thought that a typical house in Sharon costs over a million bucks - a stretch on a teacher's salary.  I should have told him that there are some really fancy expensive houses but that most houses are not over a million bucks.

Median - The median of a list of numbers is the middle number of the sorted list.  

For example, 5 is the median of the list 5, 19, 2, 3, 8, because when you sort the list to get 2, 3, 5, 8, 19, you can see that 5 is in the middle.  Looking at the house price example:  If we sort the house prices (in thousands of dollars) we get the list:

360, 381, 399, 420, 463, 489, 510, 521, 612, 8700.

Now we have two middle numbers: 463 and 489.  Which one is the median?  Mathematicians agreed that in the case where there are two middle numbers, the median should be the mean of the two numbers.  In this case, the median is $476,000, the mean of $463,000 and $489,000.
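The sorting-and-middle rule can be sketched in Python as follows.  This is one possible way to write it, assuming the list is not empty; the function name median is chosen for this illustration (Python's standard library also has its own statistics.median).

```python
def median(numbers):
    """Return the middle value of the sorted list.

    With an even count there are two middle values, so return their mean.
    """
    ordered = sorted(numbers)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 19, 2, 3, 8]))  # odd count: one middle number

# House prices in thousands of dollars, from the list above.
prices = [360, 381, 399, 420, 463, 489, 510, 521, 612, 8700]
print(median(prices))  # even count: mean of the two middle numbers
```

Notice that the 8700 could be replaced by any number above 489 and the median would not change at all.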

If my goal was to communicate the sense of a "typical" house price, then I was a fool to send this teacher the mean of the list.  I should have sent him the median!

Mode - There is a third measurement of a list called mode which is simply the value in the list that appears most often.  

This measurement is useful when there are lots of duplicates in a list.  For example, let's say I have a list of favorite ice cream flavors (chocolate, vanilla, or strawberry) for each kid in the school, where 1 means chocolate, 2 means vanilla, and 3 means strawberry.  That's 250 numbers whose mean is useless because it will be a fraction that matches no flavor, and whose median is useless because the order of the flavors is arbitrary.  In this distribution, I would be better off simply knowing which number appears the most.  This number is the mode, and it would tell me correctly which flavor is most popular or typical.
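Here is a small Python sketch of finding a mode.  The twelve answers below are invented purely for illustration (they are not real survey data), and the names FLAVORS and answers are hypothetical.

```python
from collections import Counter

# Flavor codes as in the text: 1 = chocolate, 2 = vanilla, 3 = strawberry.
FLAVORS = {1: "chocolate", 2: "vanilla", 3: "strawberry"}

# Hypothetical answers from 12 kids, invented only for this illustration.
answers = [1, 2, 1, 3, 1, 2, 2, 1, 3, 1, 2, 1]

# Counter tallies how many times each value appears in the list.
counts = Counter(answers)
mode, how_many = counts.most_common(1)[0]

print("Mode:", mode, "=", FLAVORS[mode], "chosen", how_many, "times")
print("Mean:", sum(answers) / len(answers))
```

The mean of this made-up list is a fraction between 1 and 2, which is not any flavor.  If two flavors tie for first place, this sketch reports only one of them.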

Outlier - An outlier is a value that is nowhere near the other values in the list.

If I had just gotten rid of the $8,700,000 home, I could have safely used the mean in the price example.  It would be fair to delete that value.  No teacher is ever going to buy the Burns mansion anyhow.  Mr. Smithers haunts the place!  The $8,700,000 value is an outlier.
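To see how much that one value matters, here is a minimal Python sketch that computes the mean with and without it.  The names outlier and trimmed are chosen for this illustration, and deciding that 8700 is the outlier is a judgment made by a person, not something this code works out.

```python
# House prices in thousands of dollars, from the survey above.
prices = [360, 420, 489, 381, 521, 463, 510, 399, 612, 8700]

# Treating 8700 as the outlier is a judgment call from the text, not a rule
# the code discovers on its own.
outlier = 8700
trimmed = [p for p in prices if p != outlier]

mean_all = sum(prices) / len(prices)
mean_trimmed = sum(trimmed) / len(trimmed)

print(f"Mean with the mansion:    {mean_all:.1f}")
print(f"Mean without the mansion: {mean_trimmed:.1f}")
```

Without the mansion the mean drops to roughly $462,000, which is close to the median of $476,000 found earlier.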

Distributions -

Lists of numbers come up everywhere:  bowling scores, batting averages, house prices, heights of people, cancer cells in a tumor on different days, heart rates, etc.

The way the numbers in a list are distributed is called its distribution.  That is a circular definition, but the real definition is too technical for middle school.  A few examples may help.  Let's consider a list of 300 numbers in the range 1-100, with mean 50.  There can be duplicates, and some numbers between 1-100 may not appear at all.

Distribution 1 - Uniform

Each number is equally likely to occur in the list.  Here is a picture of a uniform distribution:

 

Each yellow column represents a different range of five numbers, 1-5, 6-10, etc.  The height of each column shows how many numbers there are in that range.  You should note that each range has about the same number of numbers in it.
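A chart like this can be imitated with a short Python sketch.  It makes up 300 random numbers and prints one row of # marks per range of five.  The seed value and the variable names are arbitrary choices for this illustration, and the random list is not the one used for the picture.

```python
import random

rng = random.Random(1)  # fixed seed (an arbitrary choice) so runs repeat

# 300 numbers, each equally likely to be anything from 1 to 100.
numbers = [rng.randint(1, 100) for _ in range(300)]

# One counter per range of five: 1-5, 6-10, ..., 96-100.
counts = [0] * 20
for n in numbers:
    counts[(n - 1) // 5] += 1

# Print each range followed by one # per number that fell in it.
for i, count in enumerate(counts):
    low = 5 * i + 1
    print(f"{low:3d}-{low + 4:3d} {'#' * count}")
```

With 300 numbers in 20 ranges, each row should have about 15 marks.  A random list will not come out perfectly level, and its mean will be near 50 rather than exactly 50.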

Distribution 2 - Normal

Numbers closer to the mean are more likely than numbers farther from the mean.  Here is a picture of a normal distribution:

The center vertical line is the mean value 50, and the blue line represents how many numbers of each value there are.  You can see that the most common number is the mean 50, and as you go farther from the mean in either direction, higher (to the right) or lower (to the left), there are fewer of these numbers in the list.
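The same bell shape shows up if you make the list with a computer.  The Python sketch below is only an illustration: the spread of 15 (SPREAD), the seed, and the helper name draw are assumptions that do not come from the picture.

```python
import random

rng = random.Random(2)  # fixed seed (an arbitrary choice) so runs repeat

MEAN = 50
SPREAD = 15  # assumed standard deviation; the text does not give one


def draw():
    """Draw one whole number near MEAN, kept inside the range 1-100."""
    value = round(rng.gauss(MEAN, SPREAD))
    return min(100, max(1, value))


numbers = [draw() for _ in range(300)]

# Count the numbers in each range of ten: 1-10, 11-20, ..., 91-100.
counts = [0] * 10
for n in numbers:
    counts[(n - 1) // 10] += 1

# Print each range followed by one # per number that fell in it.
for i, count in enumerate(counts):
    low = 10 * i + 1
    print(f"{low:3d}-{low + 9:3d} {'#' * count}")
```

The rows for 41-50 and 51-60 should be the longest, with the rows getting shorter toward both ends.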

There are a wide variety of distributions.  But you should be able to distinguish between these two very different kinds.  

Example:  A list of adult male heights will be a normal distribution.  We see a lot of men who are 5 feet 10 inches tall, and far fewer who are less than 5 feet 4 inches or greater than 6 feet 4 inches (that is, 6 inches or more shorter or taller).

Example:  Let's number the days of the week Sunday - Saturday from 1-7.  A list of the days people were born will be a uniform distribution.  We expect to see a similar number of 1's, 2's, etc. through 7's.

Many lists of numbers that come from measuring things are close to normal distributions, hence the name.  Some distributions are uniform, and some are neither.

What about a list of dates in a month 1-31 on which people were born?  Do you expect there to be more people with dates near the mean?  I think not.  Do you expect there to be the same number of people with each date?  I also think not - why?  This distribution would be neither uniform nor normal.

Finally, the picture of a distribution tells you a lot about the mean and the median.  In a normal distribution we expect the mean and the median to be about the same.  In a distribution where there are just a few values above the mean and many below the mean, the median would be below the mean.  If there are many values above the mean and just a few below, then the median would be above the mean.
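The two bowling lists from the introduction happen to show both cases.  Here is a quick Python sketch using the mean and median functions from Python's standard statistics module; the list names are labels chosen for this illustration.

```python
from statistics import mean, median

# Scores copied from the bowling story in the introduction.
shai_scores = [141, 149, 138, 180, 178, 158, 162, 154, 192, 148]
waldman_scores = [172, 150, 142, 120, 110, 160, 164, 158, 123, 151]

for name, scores in [("Professor Shai", shai_scores),
                     ("Mr. Waldman", waldman_scores)]:
    print(name, "mean:", mean(scores), "median:", median(scores))
```

My scores have four games above the mean and six below, so the median (156) sits below the mean (160).  Mr. Waldman's scores have four games below his mean and six above, so his median (150.5) sits above his mean (145).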

A normal distribution looks like a bell, but it can be tall and narrow or broad and flat.  Tall and narrow means that most of the numbers cluster around the mean, while broad and flat means that more values spread out away from the mean.  A way to measure how tightly the values cluster around the mean is called the standard deviation.  In a normal distribution, about 68% of all data values lie within one standard deviation of the mean, either higher or lower, and about 95% lie within two standard deviations.  The standard deviation is complicated to calculate, but intuitively the larger the number, the flatter the shape of the bell, because a wider area is needed to include the 68%.  Here is a good place to learn more about standard deviation.
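The calculation is less scary when a computer does it.  The Python sketch below uses one common version of the standard deviation (the square root of the average squared distance from the mean) and tries out the 68% and 95% figures on made-up normal data.  The spreads of 5 and 15, the list size of 10,000, and the function names are all assumptions for this illustration.

```python
import math
import random


def standard_deviation(numbers):
    """Square root of the average squared distance from the mean."""
    m = sum(numbers) / len(numbers)
    return math.sqrt(sum((x - m) ** 2 for x in numbers) / len(numbers))


def fraction_within(numbers, k):
    """Fraction of the values within k standard deviations of the mean."""
    m = sum(numbers) / len(numbers)
    sd = standard_deviation(numbers)
    close = [x for x in numbers if abs(x - m) <= k * sd]
    return len(close) / len(numbers)


rng = random.Random(3)  # fixed seed (an arbitrary choice) so runs repeat

# Simulated normal data: mean 50, with assumed standard deviations 5 and 15.
narrow = [rng.gauss(50, 5) for _ in range(10_000)]
broad = [rng.gauss(50, 15) for _ in range(10_000)]

for label, data in [("narrow", narrow), ("broad", broad)]:
    print(label,
          "sd:", round(standard_deviation(data), 1),
          "within 1 sd:", round(fraction_within(data, 1), 2),
          "within 2 sd:", round(fraction_within(data, 2), 2))
```

Both lists should come out near 68% and 95% even though one bell is three times as wide as the other.  Statisticians also use a slightly different version that divides by one less than the number of values; for big lists the two are nearly the same.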

Summary:

Mean, median and mode are all measurements of a list of numbers.  The purpose of each is to try to get a sense of the "typical" number in the list.  Sometimes mean gives you a better sense of typical, sometimes median does, and sometimes mode.  Outliers are data values that are way off the mean, and can often be fairly discarded when you want a sense of the typical value.  Distributions of lists vary widely and you can learn a lot about a list of numbers from its distribution.

Problems:

1.  Calculate the mean and median of the following data points:
    145   167   124   110   128   138   136   135   189   190   120   118   139
    127   138   145   143   165   156   143   128   192

2.  Make a chart with 110, 120, 130, 140, 150, 160, 170, 180, 190 at the bottom.  Then draw a bar upwards next to each number X showing how many numbers in the list are at least X, but less than X+10.

3. Do you think these numbers realistically look like a single person's bowling scores, or like a group of unrelated scores?  Why?  Does this distribution look more like a uniform distribution or a normal distribution? 

4.  Do the same three questions with the list below:
    123   146   148   139   151   147   152   185   128   149   148   176   125  
    144   167   163   134   132   138   158   154   172   181

5.  For each of the following kinds of lists, would you expect the distribution to look uniform or normal?
    a.  The number of hours of TV each kid in the middle school watches.
    b.  The last digit of the address of each middle school student.
    c.  The heights of kids in the middle school.
    d.  The number of letters in the last name of each middle school student.
    e.  The day of the month each middle schooler was born on.
    f.  The weights of kids in the middle school.

6.  Pick one of the choices above and actually gather the data for your grade.  Make a chart and look at the distribution.


Email me: shai@stonehill.edu
