Introduction to Statistics with M&Ms Using EXCEL
Share the mean and standard deviation of all the sets of
M&M's. Compute the mean of the means for plain and
peanut M&Ms. The standard deviations can be combined
by squaring them, averaging the resulting variances (see Pooled Data below), and then taking the
square root. Use these class means and standard deviations
to plot the Gaussian distributions for plain M&Ms and peanut
M&Ms.
You need to get four sets of data from other students: two sets
of the same kind of M&M's, and two of the other kind.
You will do F and t tests for these sets against your own
set. When you do a set that matches the type you have, say
plain vs plain, the F test should say that the variances are the
same, and you will then do a t test assuming equal variances.
When you do a test using two different kinds of M&Ms, plain
vs peanut, the F test should say that the variances are not
equal, and you should then do a t test assuming unequal
variances.
Sample and Measurements. Each student will receive a
bag containing about 20 M&Ms; they may be plain or peanut
M&Ms. Measure the mass of each M&M in grams on the new
Mettler-Toledo XS204 balances in the balance room. In your
notebook record the color and mass of each M&M. Record all digits from the balance!
Be sure to make the M&Ms "disappear" after you have
weighed them.
Analysis. 1 Descriptive Statistics:
Compute the "descriptive statistics" of your data. This
will usually include the count or sample size, the average, the
standard deviation, the standard error (aka the standard deviation in the mean), and
the 95% confidence limits. The minimum, the maximum, and the
range are also useful.
Excel. Probably the easiest way to
do this is to use the "Descriptive Statistics" function in the Data
Analysis Tool Pak in Excel. Place all your data in a
column of a worksheet. Then click on "Data Analysis ..." in
the Tools menu of Excel, and select "Descriptive Statistics" from
the alphabetical list. The Descriptive Stats routine will want
you to select a range for your data. Check the box for
"Summary Statistics". The "New Worksheet Ply" means that the
output will be placed on a brand new sheet in the workbook.
That probably isn't what you want, but go ahead
and give it a try. A better option is to push the "Output Range"
button, which places the output on the current
worksheet; however, you must select a 15 x 2 set of cells for the
output. Then click the OK button and the results will spill
out onto the sheet. There will be a lot of results! Remove unnecessary statistics such as "skewness"
from the table before you copy it into Word.
If the Data Analysis Tool Pak is not available
on your computer, you can use the functions in the
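spreadsheet. The 95% confidence limits are a range defined
by the average plus or minus t times the standard
error:
$$\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$$
where s is the standard deviation and n is the sample size. The correct value of t can be obtained
using TINV(0.05, n-1), where n-1 is the number of degrees of
freedom.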
Do NOT use the function CONFIDENCE in EXCEL to compute confidence
limits!
The range can be computed as MAX - MIN.
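If you want to check the spreadsheet results outside Excel, the descriptive statistics and confidence limits can be sketched in Python. This is only an illustration under stated assumptions: the function name describe and the five masses are made up, and the t value is passed in by hand (look it up with TINV as described above) rather than computed.

```python
import math
import statistics

def describe(masses, t_crit):
    """Summary statistics for one bag of M&Ms.

    t_crit is the two-tailed t value for n-1 degrees of freedom,
    for example the result of TINV(0.05, n-1) in Excel.
    """
    n = len(masses)
    mean = statistics.mean(masses)
    sd = statistics.stdev(masses)      # sample standard deviation (n-1 denominator)
    se = sd / math.sqrt(n)             # standard error of the mean
    half_width = t_crit * se           # half-width of the confidence interval
    return {
        "count": n,
        "mean": mean,
        "stdev": sd,
        "std_error": se,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
        "min": min(masses),
        "max": max(masses),
        "range": max(masses) - min(masses),
    }

# Hypothetical masses in grams, for illustration only
masses = [0.88, 0.91, 0.86, 0.90, 0.89]
# 2.776 is the two-tailed 95% t value for 4 degrees of freedom, TINV(0.05, 4)
stats = describe(masses, t_crit=2.776)
for name, value in stats.items():
    print(f"{name:10s} {value:.4f}")
```

A real bag of about 20 M&Ms would need the t value for 19 degrees of freedom instead.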
Use the mean and standard deviation to plot a Gaussian curve for
your M&Ms using NORMDIST. Once you have used the data to
find the curve parameters, you don't need the data for
plotting. The function is defined by its mean and standard
deviation, so there is no need to try to compute
the function for each M&M mass; the function should
be plotted over whatever range of numbers lets you see the graph
best.
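The idea that the curve depends only on the mean and standard deviation can be sketched as follows. The function name gaussian_curve, the choice of plus or minus 4 standard deviations, and the example numbers are assumptions for illustration; NormalDist.pdf plays the same role as NORMDIST(x, mean, sd, FALSE).

```python
from statistics import NormalDist

def gaussian_curve(mean, sd, points=101, width=4.0):
    """Return (xs, ys) for a normal curve spanning mean +/- width*sd."""
    dist = NormalDist(mu=mean, sigma=sd)
    lo = mean - width * sd
    step = 2 * width * sd / (points - 1)
    xs = [lo + i * step for i in range(points)]   # evenly spaced x values
    ys = [dist.pdf(x) for x in xs]                # curve height at each x
    return xs, ys

# Hypothetical mean and standard deviation in grams
xs, ys = gaussian_curve(mean=0.888, sd=0.019)
print(f"x runs from {xs[0]:.3f} to {xs[-1]:.3f} g")
```

Note that none of the individual M&M masses appear here; the x values are just an evenly spaced grid chosen to show the whole curve.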
Sort the M&M data by color. Do a t test on the two most
frequent colors in your set of M&Ms. For example, test
whether the average masses of the blue and yellow M&Ms are
significantly different. Use the instructions below for the t
test assuming equal variances.
2 F tests and t tests:
The F test answers the question "Is my standard deviation
significantly different from the standard deviation of another set
of data?" The t test answers the question "Is my mean value
significantly different from the mean value of another set of data?"
You must compare the standard deviation and mean of your
data with those of four other sets of data: two that are the same
type of M&Ms as your own, and two that are the other type of
M&M! For example, if you have peanut M&Ms, you
will compare your data set against two sets of plain M&Ms and
also against two sets of peanut M&Ms. You must always
do the F test first, because its result will determine how you do
the t test.
Excel. You will find that the F
test is available in the Data Analysis Tool Pak. You
must have each set of data on the same spreadsheet to perform the F
test. A 3 x 10 set of cells is needed to contain the output.
Consult your Quant book to interpret the results correctly, because
it can be confusing. The test will give you two F values, F
and Fcritical. If F is farther from 1.00 than Fcritical,
then the standard deviations are different. Arrange the test so that F comes out greater
than 1.00, then check to see if F>Fcrit; if
so, then the standard deviations are different. If not, then
they are not significantly different.
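The comparison of F with Fcrit can be sketched in Python as a cross-check. The function name f_ratio and the two small data sets are hypothetical, and the critical value is typed in by hand rather than computed.

```python
import statistics

def f_ratio(a, b):
    """Return (F, df_num, df_den) with the larger variance on top, so F >= 1."""
    var_a = statistics.variance(a)   # sample variance = standard deviation squared
    var_b = statistics.variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1

# Hypothetical masses in grams, for illustration only
plain_a = [0.88, 0.91, 0.86, 0.90, 0.89]
plain_b = [0.87, 0.92, 0.85, 0.93, 0.88]
f, df1, df2 = f_ratio(plain_a, plain_b)
# 6.39 is the critical value for 4 and 4 degrees of freedom, FINV(0.05, 4, 4)
print(f"F = {f:.2f} with df = ({df1}, {df2}); different: {f > 6.39}")
```

Putting the larger variance in the numerator is what "arrange the test so that F comes out greater than 1.00" amounts to.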
If the Data Analysis Tool Pak is not available on your computer,
you can use the FTEST function in EXCEL. The function returns the
probability of an insignificant difference, but it does not give the
F values. Use FTEST(range1, range2) to find this probability.
Use FINV(probability, df1, df2)
to find the value of F and FINV(0.05, df1, df2) to find the
value of Fcrit. The dfs are the
number of degrees of freedom in each data set, n-1. The first
value df1 must be for the set with the greater standard deviation.
If the probability is <0.05, then the difference
is significant, and you should find that F>Fcrit.
The t test is also in the Data Analysis Tool Pak; however, you must select the t test based on the results
of the F test! Choose either "t Test: Two Sample Assuming Equal Variances"
or "t Test: Two Sample Assuming Unequal Variances",
depending on whether the F test showed no
difference or a difference. You will need a 3 x 14 set of
cells for the output. Again, consult your Quant book to
interpret the results. There will be three T values: T stat, T
critical one-tail, and T critical two-tail. Ignore the
one-tail value. If the absolute value of T stat is
greater than T critical two-tail, then the means are
significantly different.
If the Data Analysis Tool Pak is
not available on your computer, you can use the
TTEST function in EXCEL. The function returns the
probability of an insignificant difference in the means, which is
often called the "P-value". Use TTEST(range1, range2, 2, 2) to
do an Equal Variance test, and
TTEST(range1, range2, 2, 3) to do an Unequal
Variance test. If the P-value is <0.05,
then the difference in the means is significant; if not,
then the means are not significantly different. To
compute tcrit, use TINV(0.05, df), and to compute tstat,
use TINV(probability, df) with the probability returned by TTEST. For the t test, df will
be n1 + n2 - 2, the combined degrees of freedom. If the P-value is <0.05,
then you should find that tstat>tcrit.
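The two versions of the t test can be sketched in Python as a cross-check on the Excel output. The function names and data are hypothetical. The unequal-variance version uses the Welch-Satterthwaite degrees of freedom, which is an assumption here since the handout does not spell out that formula.

```python
import math
import statistics

def t_equal_var(a, b):
    """t statistic and degrees of freedom, pooling the two variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    diff = statistics.mean(a) - statistics.mean(b)
    return diff / math.sqrt(pooled_var * (1 / na + 1 / nb)), df

def t_unequal_var(a, b):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    qa = statistics.variance(a) / na
    qb = statistics.variance(b) / nb
    diff = statistics.mean(a) - statistics.mean(b)
    df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return diff / math.sqrt(qa + qb), df

# Hypothetical masses in grams, for illustration only
plain_a = [0.88, 0.91, 0.86, 0.90, 0.89]
plain_b = [0.87, 0.92, 0.85, 0.93, 0.88]
t, df = t_equal_var(plain_a, plain_b)
# 2.306 is the two-tailed critical value for 8 degrees of freedom, TINV(0.05, 8)
print(f"t = {t:.3f}, df = {df}, different: {abs(t) > 2.306}")
```

As in the spreadsheet, only the absolute value of t is compared with the two-tail critical value.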
3 Pooled Data:
All the M&M data should be pooled so that the mean and standard
deviation for all plain M&Ms and for all peanut M&Ms can be
computed for the entire class. You could merge all the
data into one big spreadsheet, but all you really have to do is
calculate a weighted mean and a
weighted standard deviation.
This requires the mean x_i, the
standard deviation s_i, and the
number of data n_i in each set of
data. Square the standard deviation s_i to get the
variance Var_i of each set. Then combine the means and the variances, weighting each set by its number of data, and take the square root of the weighted variance to get the weighted standard deviation.
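A common way to write these two quantities for k data sets is sketched here. Weighting the variances by the degrees of freedom n_i - 1 is an assumption (it is the usual pooled-variance form); weighting by n_i instead gives nearly the same result for bags of about 20 M&Ms. The symbols x_w and s_w for the weighted mean and weighted standard deviation are introduced only for this illustration.

```latex
x_w = \frac{\sum_{i=1}^{k} n_i \, x_i}{\sum_{i=1}^{k} n_i},
\qquad
s_w = \sqrt{\frac{\sum_{i=1}^{k} (n_i - 1)\,\mathrm{Var}_i}{\sum_{i=1}^{k} (n_i - 1)}},
\qquad
\mathrm{Var}_i = s_i^2
```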
Use the weighted mean and weighted standard deviation to construct
Gaussian curves for each population. These curves are
now functions, so there is no need to try to compute them for each
M&M mass; they should be plotted over whatever range of numbers
lets you see the graphs. The "right" ranges won't be the
same for the two types of M&Ms! But they both need to
appear on the same graph.
Report. You should also look
at the lab report form for this experiment on the MoCoSin network. Under
Catwoman\chemistry\courses\ch220, look at the lab report called MandMs.doc. It has already been
written for you--to some extent. There are places in the
document file where you are allowed to enter data or
conclusions. Do not think that you can just cut and paste
results from the descriptive statistics, the F test, or the t test
directly from Excel into the lab report document. You must
edit the results--remove any numbers that are
irrelevant to the reader, such as skewness or T critical
one-tail.