revised 9 Mar 2013

Comparing More Than Two Means:
One-Way ANOVA

Copyright © 2009–2014 by Stan Brown, Oak Road Systems

Summary:

When you have several means to compare, it’s not valid just to compare all possible pairs with t tests. Instead, you follow a two-stage process:

  1. Are all the means equal? A computation called ANOVA (analysis of variance) answers this question.
  2. If ANOVA shows that the means aren’t all equal, then which means are unequal, and by how much? There are many ways to answer this question (and they give different answers), but we’ll use a process called Tukey’s HSD (Honestly Significant Difference).

Terminology

The factor that varies between samples is called the factor. (Every once in a while things are easy.) The r different values or levels of the factor are called the treatments. Here the factor is the choice of fat and the treatments are the four fats, so r = 4.

The computations to test the means for equality are called a 1-way ANOVA or 1-factor ANOVA.

Example 1: Fat for Frying Donuts

g Fat Absorbed in Batches
         Batches                            x̄      s
Fat 1    64   72   68   77   56   95        72   13.34
Fat 2    78   91   97   82   85   77        85    7.77
Fat 3    75   93   78   71   63   76        76    9.88
Fat 4    55   66   49   64   70   68        62    8.22
source: Snedecor pp 217–218

Hoping to produce a donut that could be marketed to health-conscious consumers, a company tried four different fats to see which one was least absorbed by the donuts during the deep frying process. Each fat was used for six batches of two dozen donuts each, and the table shows the grams of fat absorbed by each batch of donuts.

It looks like donuts absorb the most of Fat 2 and the least of Fat 4, with intermediate amounts of Fat 1 and Fat 3. But there’s a lot of overlap, too: for instance, even though the mean for Fat 2 is much higher than for Fat 1, one sample of Fat 1, 95 g, is higher than five of the six samples of Fat 2.

Nevertheless, the sample means do look different. But what about the population means? In other words, would the four fats be absorbed in different amounts if you made a whole lot of batches of donuts? Do statistics justify choosing one fat over another? This is the basic question of a hypothesis test or significance test: is the difference great enough that you can rule out chance?

If Fats 2 and 4 were the only ones you had data for, you’d do a good old 2-sample t test. So why can’t you do that anyway? Because that would greatly increase your chances of a Type I error. The reasons are given in the Appendix.

By the way, though usually you are interested in the differences between population means with various treatments, you can also estimate the individual means. If you’re interested, see Estimating Individual Treatment Means in the Appendix.

Step 1: ANOVA Test for Equality of All Means

The ANOVA procedure tests these hypotheses:

H0: μ1 = μ2 = ... = μr, all the means are the same

H1: two or more means are different from the others

Let’s test these hypotheses at the α = 0.05 significance level.

You might wonder why you do analysis of variance to test means, but this actually makes sense. The question, remember, is whether the observed difference in means is too large to be the result of random selection. How do you decide whether the difference is too large? You look at the absolute difference of means between treatments (samples), but you also consider the variability within each treatment. Intuitively, if the difference between treatments is a lot bigger than the difference within treatments, you conclude that it’s not due to random chance and there is a real effect.

And this is just how ANOVA works: comparing the variation between groups to the variation within groups. Hence, analysis of variance.

Requirements for ANOVA

  1. You need r simple random samples for the r treatments, and they need to be independent samples. The sample sizes need not be the same, though it’s best if they’re not very different.
  2. The underlying populations should be normally distributed. However, the ANOVA test is robust and moderate departures from normality aren’t a problem, especially if sample sizes are large and equal or nearly equal (Kuzma page 180).
  3. The samples should all have the same standard deviation, theoretically. Because the ANOVA test is robust, Sullivan (page C–21) says it’s good enough if the largest standard deviation is less than double the smallest standard deviation.

    Miller (pages 90–91) is more cautious. When sample sizes are equal but standard deviations are not, the actual p-value will be slightly larger than what you find in the tables. But when sample sizes are unequal, and the smaller samples have the larger standard deviations, the actual p-value “can increase dramatically above” what the tables say, even “without too much disparity” in the standard deviations. “Falsely reporting significant results when the small samples have the larger variances is a serious worry. The lesson to be learned is to balance the experiment [equal sample sizes] if at all possible.”
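As a rough check of requirement 3, the sketch below applies Sullivan’s rule of thumb (largest standard deviation less than double the smallest) to the donut data from Example 1. It uses only Python’s standard statistics module; the function name sd_ratio is an illustrative choice, not something from the sources cited here.

```python
from statistics import stdev

# Grams of fat absorbed per batch, from the donut table in Example 1.
donuts = {
    "Fat 1": [64, 72, 68, 77, 56, 95],
    "Fat 2": [78, 91, 97, 82, 85, 77],
    "Fat 3": [75, 93, 78, 71, 63, 76],
    "Fat 4": [55, 66, 49, 64, 70, 68],
}

def sd_ratio(groups):
    """Return the largest sample standard deviation divided by the smallest."""
    sds = [stdev(values) for values in groups.values()]
    return max(sds) / min(sds)

ratio = sd_ratio(donuts)
print(f"largest s / smallest s = {ratio:.2f}")
if ratio < 2:
    print("rule of thumb satisfied")
else:
    print("standard deviations may be too unequal")
```

The sample sizes here are all equal, so Miller’s more serious worry about small samples with large variances does not arise for this data set.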

Perform a 1-Way ANOVA Test

A 1-way ANOVA tests whether the means of all groups are equal for different levels of one factor, using some fairly lengthy calculations. You could do all the computations by hand as shown in the Appendix, but no one ever does. Instead, you use a calculator or a computer program, such as the TI-83/84 or Excel.

When you use a calculator or computer program to do ANOVA, you get an ANOVA table that looks something like this:

                                SS        df    MS       F       p
Between groups (or “Factor”)    1636.5     3    545.5    5.41    0.0069
Within groups (or “Error”)      2018.0    20    100.9
Total                           3654.5    23

Note that the mean square between treatments, 545.5, is much larger than the mean square within treatments, 100.9. That ratio, between-groups mean square over within-groups mean square, is called an F statistic (F = MSB/MSW = 5.41 in this example). It tells you how much more variability there is between treatment groups than within treatment groups. The larger that ratio, the more confident you feel in rejecting the null hypothesis, which was that all means are equal and there is no treatment effect.

But what you care about is the p-value of 0.0069, obtained from the F distribution. The p-value has the usual interpretation: the probability of the between-treatments MS being ≥5.41 times the within-treatments MS, if the null hypothesis is true, is p = 0.0069.

The p-value is below your significance level of 0.05: it would be quite unlikely to have MSB/MSW this large if there were no real difference among the means. Therefore you reject H0 and accept H1, concluding that the mean absorption of all the fats is not the same.
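If you want to see where the table entries come from without a calculator, here is a minimal sketch in plain Python that builds the SS, df, MS, and F entries from the raw donut data. The function name one_way_anova and the dictionary keys are illustrative assumptions; the arithmetic follows the between-groups and within-groups definitions described above. It does not compute the p-value.

```python
def one_way_anova(groups):
    """Build the 1-way ANOVA table entries from a list of samples."""
    all_values = [x for g in groups for x in g]
    n_total, r = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between groups: each sample mean's squared distance from the grand mean,
    # weighted by the sample size.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: each data point's squared distance from its own sample mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_b, df_w = r - 1, n_total - r
    msb, msw = ssb / df_b, ssw / df_w
    return {"SSB": ssb, "SSW": ssw, "dfB": df_b, "dfW": df_w,
            "MSB": msb, "MSW": msw, "F": msb / msw}

donuts = [
    [64, 72, 68, 77, 56, 95],  # Fat 1
    [78, 91, 97, 82, 85, 77],  # Fat 2
    [75, 93, 78, 71, 63, 76],  # Fat 3
    [55, 66, 49, 64, 70, 68],  # Fat 4
]

table = one_way_anova(donuts)
for name, value in table.items():
    print(f"{name:>4} = {value:.2f}")
```

The function accepts samples of different sizes, so the same sketch should also work for unbalanced experiments.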

An interesting extra parameter can be derived from the ANOVA table; see η²: Strength of Association in the Appendix below.

Now that you know that it does make a difference which fat is used, you naturally want to know which fats are significantly different. This is post-hoc analysis. There are several different post-hoc analyses, and no one is superior on all points, but the most common choice is the Tukey HSD.

Step 2: Tukey HSD for Post-Hoc Analysis

If your ANOVA test shows that the means aren’t all equal, your next step is to determine which means are different, to your level of significance. You can’t just perform a series of t tests, because that would greatly increase your likelihood of a Type I error. So what do you do?

John Tukey gave one answer to this question, the HSD (Honestly Significant Difference) test. You compute something analogous to a t score for each pair of means, but you don’t compare it to the Student’s t distribution. Instead, you use a new distribution called the studentized range or q distribution.

Caution: Perform post-hoc analysis only if the ANOVA test shows a p-value less than your α. If p>α, you don’t know whether the means are all the same or not, and you can’t go fishing for unequal means.

You generally want to know not just which means differ, but by how much they differ (the effect size). The easiest thing is to compute the confidence interval first, and then interpret it for a significant difference in means (or no significant difference). You’ve already seen this relationship between a test of significance at the α level and a 1−α confidence interval: if the interval for a difference excludes 0, the difference is significant at the α level; if the interval includes 0, it is not.

You compute that confidence interval similarly to the confidence interval for the difference of two means, but using the q distribution which avoids the problem of inflating α:

(x̄i − x̄j) ± q(α,r,dfW) · √[(MSW/2)·(1/ni + 1/nj)]

where x̄i and x̄j are the two sample means, ni and nj are the two sample sizes, MSW is the within-groups mean square from the ANOVA table, and q is the critical value of the studentized range for α, the number of treatments or samples r, and the within-groups degrees of freedom dfW. The square-root term is called the standardized error (as opposed to standard error).

Using the studentized range, developed by Tukey, overcomes the problem of inflated significance level that I talked about earlier. If sample sizes are equal, the risk of a Type I error is exactly α, and if sample sizes are unequal it’s less than α: the procedure is conservative. In terms of confidence intervals, if the sample sizes are equal then the confidence level is the stated 1−α, but if the sample sizes are unequal then the actual confidence level is greater than 1−α (NIST section 7.4.7.1).

Estimating Differences of Means

Usually the comparisons are presented in a table, like this one for the example with frying donuts:

Comparison       x̄i−x̄j   Critical q    Standardized   95% Conf Interval   Signif
                          q(α,r,dfW)    error          for μi−μj           at 0.05?
Fat 1 − Fat 2    −13      3.9597        4.1008         −29.2 to 3.2
Fat 1 − Fat 3     −4      3.9597        4.1008         −20.2 to 12.2
Fat 1 − Fat 4     10      3.9597        4.1008          −6.2 to 26.2
Fat 2 − Fat 3      9      3.9597        4.1008          −7.2 to 25.2
Fat 2 − Fat 4     23      3.9597        4.1008           6.8 to 39.2       YES
Fat 3 − Fat 4     14      3.9597        4.1008          −2.2 to 30.2

How do you read the table, and how was it constructed? Look first at the rows. Each row compares one pair of treatments.

If you have r treatments, there will be r(r−1)/2 pairs of means. The “/2” part comes because there’s no need to compare Fat 1 to Fat 2 and then Fat 2 to Fat 1. If Fat 1 is absorbed less than Fat 2, then Fat 2 is absorbed more than Fat 1 and by the same amount.

Now look at the columns. I’ll work through all the columns of the first row with you, and you can interpret the others in the same way.

  1. The row heading tells you which treatments are being compared in this row, and the direction of comparison.
  2. The next column gives the point estimate of difference, which is nothing more than the difference of the two sample means. The sample means of Fat 1 and Fat 2 were 72 and 85, so the difference is −13: the sample average of Fat 1 was 13 g less fat absorbed than the sample average of Fat 2.
  3. Next is critical q, from the confidence interval formula. q(α,r,dfW) depends on the number of treatments and total number of data points, not on the individual treatments, so it’s the same for all rows in any given experiment.

    For this experiment, we had four treatments and dfW from the ANOVA table was 20, so we need q(0.05, 4, 20). Your textbook may have a table of critical values for the studentized range, or you can look up q in an online table such as Lane2 or find it with an online calculator like Lowry2. (Sullivan, the textbook used at TC3, does not have a table of q, and the TI calculators can’t compute it.)

    Different sources give slightly different critical values of q, I suspect because q is extremely difficult to compute. One value I found was q(0.05,4,20) = 3.9597.

  4. The standardized error, √[(MSW/2)·(1/ni + 1/nj)], is the square-root term from Tukey’s formula for the confidence interval.

    In an experiment with unequal sample sizes, the standardized error would vary for comparing different pairs of treatments. But in this experiment, every treatment has six data points, and so the standardized error is the same for every pair of means:

    √[(MSW/2)·(1/6+1/6)] = √[(100.9/2)·(2/6)] = 4.1008

  5. The endpoints of the confidence interval, as usual, are the point estimate plus or minus the critical q times the standardized error. Critical q times the standardized error is 3.9597×4.1008 = 16.2, and the difference of means in the first row is x̄1−x̄2 = −13, so the endpoints of the confidence interval are −13−16.2 = −29.2 and −13+16.2 = 3.2.

    Interpretation: You’re 95% confident that, on average, a batch of 24 donuts absorbs between 29.2 g less and 3.2 g more of Fat 1 than Fat 2.

  6. The last column applies the relation between confidence interval and significance test to say whether there’s a significant difference between the two treatments.

    The confidence interval for the difference between Fat 1 and Fat 2 goes from a negative to a positive, so it does include zero. That means the two fats might have the same or different absorption, so you can’t say whether there’s a difference.

    Caution: It’s generally best not to say that there is no significant difference. Even though that’s literally true, it’s easily misinterpreted to mean that the absorption of the two fats is the same, and you don’t know that. It might be, and it might not be. Stick to neutral language.

    On the other hand, when the endpoints of the confidence interval are both positive or both negative, then 0 is not in the interval and we reject the null hypothesis of equality. In this table, only Fats 2 and 4 have a significant difference.

    Interpretation: Fats 2 and 4 are not equally absorbed in frying donuts, and we’re 95% confident that a batch of 24 donuts absorbs 6.8 g to 39.2 g more of Fat 2 than Fat 4.
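The column-by-column walkthrough above can be sketched in a few lines of Python. This is only an illustration: the critical value q(0.05, 4, 20) = 3.9597 is copied from the text rather than computed, and the function name tukey_intervals and the tuple layout are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt

Q_CRIT = 3.9597  # q(0.05, 4, 20) as quoted in the text; not computed here
MSW = 100.9      # within-groups mean square from the ANOVA table

means = {"Fat 1": 72, "Fat 2": 85, "Fat 3": 76, "Fat 4": 62}
sizes = {"Fat 1": 6, "Fat 2": 6, "Fat 3": 6, "Fat 4": 6}

def tukey_intervals(means, sizes, msw, q_crit):
    """Return one (i, j, difference, low, high, significant) tuple per pair."""
    rows = []
    for a, b in combinations(means, 2):  # r(r-1)/2 pairs, each taken once
        diff = means[a] - means[b]
        std_err = sqrt(msw / 2 * (1 / sizes[a] + 1 / sizes[b]))
        margin = q_crit * std_err
        low, high = diff - margin, diff + margin
        # Significant only when the interval excludes zero.
        rows.append((a, b, diff, low, high, low > 0 or high < 0))
    return rows

rows = tukey_intervals(means, sizes, MSW, Q_CRIT)
for a, b, diff, low, high, signif in rows:
    flag = "YES" if signif else ""
    print(f"{a} - {b}: {diff:4d}  ({low:6.1f}, {high:6.1f})  {flag}")
```

Because the sample sizes are passed per treatment, the same sketch handles unequal sample sizes, where the standardized error differs from pair to pair. You would still need to look up the appropriate critical q for your own r and dfW.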

Other Comparisons

It’s possible to make more complicated comparisons. For instance, with a control group and two treatments you might compare the mean of the control group to the average of the means of the two treatments. Any kind of linear comparison can be done using a procedure developed by Henry Scheffé. A good brief explanation of Scheffé’s method is at NIST section 7.4.7.2.

Tukey’s method is best when you are simultaneously comparing all pairs of means. If you have pre-selected a subset of means to compare, the Bonferroni method (NIST section 7.4.7.3) may be better.

Example 2: Stock Market

5-year Rates of Return
       Financial   Energy   Utilities
       10.76       12.72    11.88
       15.05       13.91     5.86
       17.01        6.43    13.46
        5.07       11.19     9.90
       19.50       18.79     3.95
        8.16       20.73     3.44
       10.38        9.60     7.11
        6.75       17.40    15.70
x̄     11.585      13.846    8.913
s       5.124       4.867    4.530
source: morningstar.com via Sullivan page C–30

A stock analyst randomly selected eight stocks in each of three industries and compiled the five-year rate of return for each stock. The analyst would like to know whether any of the industries have a different rate of return from the others, at the 0.05 significance level.

Solution: The hypotheses are

H0: μF = μE = μU, all three industries have the same average rate of return

H1: the industries don’t all have the same average rate of return

You can use a normal probability plot to assess normality for each sample; see MATH200A Program part 4. The standard deviations of the three samples are fairly close together, so the requirements are met.

Here is the ANOVA table:

                                SS          df    MS         F       p
Between groups (or “Factor”)     97.5931     2    48.7965    2.08    0.1502
Within groups (or “Error”)      493.2577    21    23.4885
Total                           590.8508    23

The F statistic is only 2.08, so the variation between groups is only about double the variation within groups. The high p-value makes you fail to reject H0 and you cannot reach a conclusion about differences between average rates of return for the three industries.
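When you have only the summary rows of a data table (sample sizes, means, and standard deviations), you can still get close to the ANOVA table. The sketch below does that for the three industries, using SSW = ∑(nj−1)sj² from the Appendix. The function name anova_from_summary is an illustrative assumption, and because the means and standard deviations are rounded to three decimals, the results should agree with the table above only approximately.

```python
def anova_from_summary(ns, means, sds):
    """Return (SSB, SSW, F) from per-sample sizes, means, and standard deviations."""
    n_total, r = sum(ns), len(ns)
    # The grand mean is the size-weighted average of the sample means.
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Each sample variance times its degrees of freedom recovers that sample's SS.
    ssw = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    msb, msw = ssb / (r - 1), ssw / (n_total - r)
    return ssb, ssw, msb / msw

# Financial, Energy, Utilities: summary rows of the rates-of-return table.
ssb, ssw, f_stat = anova_from_summary(
    ns=[8, 8, 8],
    means=[11.585, 13.846, 8.913],
    sds=[5.124, 4.867, 4.530],
)
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {f_stat:.2f}")
```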

Since you failed to reject H0 in the initial ANOVA test, you can’t do any sort of post-hoc analysis and look for differences between any particular pairs of means. (Well, you can, but you know in advance that all of the intervals will include zero, meaning that you don’t know whether any particular sector has a different return from any other sector or not.)

Example 3: CRT Lifetimes

Lifetime, hrs
                                            x̄     s
Type A    407   411   409                   409   2.0
Type B    404   406   408   405   402       405   2.2
Type C    410   408   406   408             408   1.6
source: Spiegel, pp 378–379

A company makes three types of high-performance CRTs. A random sample of each type finds the lifetimes shown in the table above. At the 0.05 level, is there a difference in the average lifetimes of the three types?

Solution: Your hypotheses are

H0: μA = μB = μC, the three types have equal mean lifetime

H1: the three types don’t all have the same mean lifetime

Excel or the TI-83/84 gives you this ANOVA table:

                                SS    df    MS    F       p
Between groups (or “Factor”)    36     2    18    4.50    0.0442
Within groups (or “Error”)      36     9     4
Total                           72    11

p<α, so you reject H0 and accept H1, concluding that the three types don’t all have the same mean lifetime.

Since you were able to reject the null hypothesis, you can proceed with post-hoc analysis to determine which means are different and the size of the difference. Here is the table:

Comparison         x̄i−x̄j   Critical q    Standardized   95% Conf Interval   Signif
                            q(α,r,dfW)    error          for μi−μj           at 0.05?
Type A − Type B      4      3.9508        1.0328         −0.1 to 8.1
Type A − Type C      1      3.9508        1.0801         −3.3 to 5.3
Type B − Type C     −3      3.9508        0.9487         −6.7 to 0.7

This result might surprise you: although the three means aren’t all equal, you can’t say that any two of the means are unequal. But when you look more closely at the numbers, this doesn’t seem quite so unreasonable.

First, look at the p-value in the ANOVA table: 0.0442 is below 0.05, yes, but it’s not very far below. If H0 were true, there would still be almost a 4½% chance of getting an F statistic this large. Next, look at the confidence interval for μA−μB. While the interval does include 0, it’s extremely lopsided and almost doesn’t include 0.

Though we’re used to thinking of significance as “either it is or it isn’t”, there are cases where the decision is a close one, and this is one of those cases. And the confidence intervals are computed by a different method than the significance test, using a different distribution. Here again, the decision is a close one. So what we have is two close decisions, based on different computations, one falling slightly on one side of the line and the other falling slightly on the other side of the line. It’s a good reminder that in statistics we’re dealing with probabilities, not certainties.

References

Kuzma
Kuzma, Jan W., and Stephen E. Bohnenblust, Basic Statistics for the Health Sciences 5/e (McGraw-Hill, 2005)
Lane
Lane, David M., HyperStat Online, accessed 26 Dec 2012 at http://davidmlane.com/hyperstat/index.html
Lane2
Critical Values of the Studentized Range (0.05 level) and Critical Values of the Studentized Range (0.01 level), tables for 2–20 samples, accessed 26 Dec 2012 at http://davidmlane.com/hyperstat/sr_05.html and http://davidmlane.com/hyperstat/sr_01.html
Lowry
Lowry, Richard, Concepts and Applications of Inferential Statistics, accessed 26 Dec 2012 at http://vassarstats.net/textbook/
Lowry2
Lowry, Richard, Critical Values of Q, online calculator for 3–10 samples, accessed 26 Dec 2012 at http://www.vassarstats.net/tabs.html#q
Lowry3
Lowry, Richard, One-Way Analysis of Variance for Independent or Correlated Samples, online calculator for 2–5 samples, accessed 26 Dec 2012 at http://vassarstats.net/anova1u.html
Miller
Miller, Rupert G., Jr., Beyond ANOVA: Basics of Applied Statistics (Wiley, 1986)
NIST
NIST/SEMATECH e-Handbook of Statistical Methods, accessed 26 Dec 2012 at http://www.itl.nist.gov/div898/handbook/
Snedecor
Snedecor, George W., and William G. Cochran, Statistical Methods 8/e (Iowa State, 1989)
Spiegel
Spiegel & Stephens, Theory and Problems of Statistics 3/e (McGraw-Hill, 1999)
Sullivan
Sullivan, Michael, Fundamentals of Statistics 3/e (Pearson Prentice Hall, 2011), section C.4 (on CD)

Appendix (The Hard Stuff)

The following sections are for students who want to know more than just the bare bones of how to do a 1-way ANOVA test.

Why Not Just Pick Two Means and Do a t Test?

Remember that you have to set up your hypotheses before you know the data. Before you’ve actually fried the donuts, you have no reason to expect any particular outcome. Specifically, until you have the data you have no reason to think Fats 2 and 4 are any more different than Fats 1 and 4, or any other pair.

Why can’t you collect the data and then select your hypotheses? Because that can put significance on a chance event. For example, a golfer hits a ball and it lands on a particular tuft of grass. The probability of landing on that particular tuft is extremely small, so there’s something different about that particular tuft, right? Obviously not! It’s a logical fallacy to decide what to test after you already have the data.

So if you want to use 2-sample t tests to look for differences among four fats, you would have to test every pair of fats: 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4. That’s six hypotheses in all.

Well, why not do a 0.05 significance test on each pair of means? Remember what a 0.05 significance level means: you’re willing to accept a 5% chance of a Type I error, rejecting H0 when it’s actually true. But if you test six 0.05 hypotheses on the same set of data, you’re much more likely to commit a Type I error. How much more likely? Well, for each hypothesis there’s a 95% chance of escaping a Type I error, but the probability of escaping a Type I error six times in a row is 0.95⁶ = 0.7351. 1−0.7351 = 0.2649, so if you test all six pairs at the 0.05 level, you’re more likely than one chance in four to get a false positive, finding a difference between two fats when there’s actually no difference.

Prob. of Type I Error
r    pairs    α = 0.05    α = 0.01
3      3      0.1426      0.0297
4      6      0.2649      0.0585
5     10      0.4013      0.0956
6     15      0.5367      0.1399

In general, if you have r treatments, there are r(r−1)/2 pairs of means to compare. If you test each pair at significance level α, the overall probability of a Type I error is 1 − (1−α)^[r(r−1)/2]. The table above shows the effective α for various numbers of treatments when the nominal α is 0.05 or 0.01. You can see that testing multiple hypotheses increases your α dramatically. Even with just three treatments, the effective α is almost three times the nominal α. This is clearly unacceptable.
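The effective α is easy to recompute for other values of r. Here is a small sketch; the function name familywise_alpha is an illustrative choice. Like the formula in the text, it treats the pairwise tests as if each one independently had a 1−α chance of avoiding a Type I error.

```python
def familywise_alpha(r, alpha):
    """Overall Type I error probability when every pair of r means is tested at alpha."""
    pairs = r * (r - 1) // 2
    return 1 - (1 - alpha) ** pairs

print(" r  pairs  a=0.05  a=0.01")
for r in range(3, 7):
    pairs = r * (r - 1) // 2
    print(f"{r:2d}  {pairs:5d}  {familywise_alpha(r, 0.05):.4f}  {familywise_alpha(r, 0.01):.4f}")
```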

Why not just lower your alpha? Because as you lower your α you increase your β, the chance of a Type II error. β represents the probability of a false negative, failing to find a difference in fats when there actually is a difference. This, too, is unacceptable.

So you have to find a way to test all the pairs of means at the same time, in one test. The solution is an extension of the t test to multiple samples, and it’s called ANOVA. (If you have only two treatments, ANOVA computes the same p-value as a two-sample t test, but at the cost of extra effort.)

How ANOVA Works

How does the ANOVA procedure compute a p-value? This section shows you the formulas and carries through the computations for the example with fat for frying donuts.

Remember, long ago in a galaxy called Descriptive Statistics, how the variance was defined: find the mean, then for each data point take the square of its difference from the mean. Add up all those squares, and you have SS(x), the sum of squared deviations in x. The variance was SS(x) divided by the degrees of freedom n−1, so it was a kind of average or mean squared deviation. You probably learned the shortcut computational formulas:

SS(x) = ∑x² − (∑x)²/n or SS(x) = ∑x² − n·x̄²

and then

s² = MS(x) = SS(x)/df where df = n−1
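As a quick sanity check, this sketch computes SS(x) for the Fat 1 sample three ways: from the definition and from the two usual shortcut forms, ∑x² − (∑x)²/n and ∑x² minus n times the squared mean. The variable names are illustrative only.

```python
fat1 = [64, 72, 68, 77, 56, 95]  # Fat 1 sample from Example 1
n = len(fat1)
mean = sum(fat1) / n

# Definition: sum of squared deviations from the mean.
ss_definition = sum((x - mean) ** 2 for x in fat1)
# Shortcut forms, which avoid computing each deviation.
sum_sq = sum(x * x for x in fat1)
ss_shortcut_1 = sum_sq - sum(fat1) ** 2 / n
ss_shortcut_2 = sum_sq - n * mean ** 2

variance = ss_definition / (n - 1)  # MS(x) = SS(x)/df
print(ss_definition, ss_shortcut_1, ss_shortcut_2)
print(f"s = {variance ** 0.5:.2f}")
```

The standard deviation printed at the end should match the s column for Fat 1 in the data table.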

In 1-way ANOVA, we extend those concepts a bit. First you partition SS(x) into between-treatments and within-treatments parts, SSB and SSW. Then you compute the mean square deviations:

  • MSB is called the between-treatments mean square, between-groups variance, or factor MS. It measures the variability associated with the different treatment levels or different values of the factor.
  • MSW is called the within-treatments mean square, within-group variance, pooled variance, or error MS. It measures the variability that is not associated with the different treatments.
  • Finally you divide the two to obtain your test statistic, F = MSB/MSW, and you look up the p-value in a table of the F distribution.

    (The F distribution is named after “the celebrated R.A. Fisher” [Kuzma page 176]. You may have already seen the F distribution in computing a different ratio of variances, as part of testing the variances of two populations for equality.)

    There are several ways to compute the variability, but they all come up with the same answers and this method in Spiegel pages 367–368 is as easy as any:

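                                    SS                        df             MS               F
    Between groups (or “Factor”)    SSB = ∑nj·x̄j² − N·x̄²    dfB = r−1      MSB = SSB/dfB    F = MSB/MSW
    Within groups (or “Error”)*     SSW = SStot − SSB         dfW = N−r      MSW = SSW/dfW
    Total*                          SStot = ∑x² − N·x̄²       dftot = N−1
    * or, if you know the standard deviations of the samples,
      SSW = ∑(nj−1)sj²
      SStot = SSB + SSW

    where r is the number of treatments, nj and x̄j are the size and mean of the jth sample, N is the total number of data points, and x̄ is the overall mean of all the data points.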

    You begin with the treatment means x̄j = {72, 85, 76, 62} and the overall mean x̄ = 73.75, then compute

    SSB = (6×72²+6×85²+6×76²+6×62²) − 24×73.75² = 1636.5

    MSB = 1636.5 / 3 = 545.5

    The next step depends on whether you know the standard deviations sj of the samples. If you don’t, then you jump to the third row of the table to compute the overall sum of squares:

    ∑x² = 64² + 72² + 68² + ... + 70² + 68² = 134192

    SStot = ∑x² − N·x̄² = 134192 − 24×73.75² = 3654.5

    Then you find SSW by subtracting the “between” sum of squares SSB from the overall sum of squares SStot:

    SSW = SStot−SSB = 3654.5−1636.5 = 2018.0

    MSW = 2018.0 / 20 = 100.9

    Now you’re almost there. You want to know whether the variability between treatments, MSB, is greater than the variability within treatments, MSW. If it’s enough greater, then you conclude that there is a real difference between at least some of the treatment means and therefore that the factor has a real effect. To determine this, divide

    F = MSB/MSW = 5.41

    This is the F statistic. The F distribution is a one-tailed distribution that depends on both degrees of freedom, dfB and dfW.

    At long last, you look up F=5.41 with 3 and 20 degrees of freedom, and you find a p-value of 0.0069. The interpretation is the usual one: there’s only a 0.0069 chance of getting an F statistic greater than 5.41 (or higher variability between treatments relative to the variability within treatments) if there is actually no difference between treatments. Since the p-value is less than α, you conclude that there is a difference.
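If you have no F table handy, the tail area can be approximated numerically. The sketch below is one way to do it using only Python’s math module. It relies on the standard identity P(F ≥ f) = I_x(dfW/2, dfB/2) with x = dfW/(dfW + dfB·f), where I_x is the regularized incomplete beta function, and evaluates the integral with Simpson’s rule. The function name f_upper_tail, the step count, and the restriction to dfW > 2 are assumptions of this sketch, not part of the procedure in the text.

```python
from math import exp, lgamma, log

def f_upper_tail(f_stat, df1, df2, steps=2000):
    """Approximate P(F >= f_stat) for an F distribution with (df1, df2) df."""
    if f_stat <= 0 or df2 <= 2:
        raise ValueError("this sketch assumes f_stat > 0 and df2 > 2")
    a, b = df2 / 2, df1 / 2
    x = df2 / (df2 + df1 * f_stat)  # upper limit of the beta integral, 0 < x < 1
    log_beta = lgamma(a) + lgamma(b) - lgamma(a + b)

    def integrand(t):
        if t == 0.0:
            return 0.0  # t**(a-1) vanishes at 0 because a > 1
        return exp((a - 1) * log(t) + (b - 1) * log(1 - t) - log_beta)

    # Composite Simpson's rule on [0, x]; steps must be even.
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3

p_donuts = f_upper_tail(5.41, 3, 20)  # Example 1
p_crt = f_upper_tail(4.50, 2, 9)      # Example 3
print(f"donuts: p = {p_donuts:.4f}")
print(f"CRTs:   p = {p_crt:.4f}")
```

In practice a statistics package or calculator does this lookup for you; the point of the sketch is only to show that the p-value is an ordinary tail area.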

    Estimating Individual Treatment Means

    Usually you’re interested in the contrast between two treatments, but you can also estimate the population mean for an individual treatment. You do use a t interval, as you would when you have only one sample, but the standard error and degrees of freedom are different (NIST section 7.4.3.6).

    To compute a confidence interval on an individual mean for the jth treatment, use

    df = dfW

    standard error = √(MSW/nj)

    Therefore the margin of error, which is the half-width of the confidence interval, is

    E = t(α/2,dfW) · √(MSW/nj)

    Example: Refer back to the fats for frying donuts. Estimate the population mean for Fat 2 with 95% confidence. In other words, if you fried a great many batches of donuts in Fat 2, how much fat per batch would be absorbed, on average?

    Solution: First, marshal your data:

    sample mean for Fat 2: x̄2 = 85

    sample size: n2 = 6

    degrees of freedom: dfW = 20 (from the ANOVA table)

    MSW = 100.9 (also from the table)

    1−α = 0.95

    TI-83 or TI-84 users, please see an easy procedure below.

    Computation by Hand

    Begin by finding the critical t. Since 1−α = 0.95, α/2 = 0.025. You therefore need t(0.025,20). You can find this from a table:

    t(0.025,20) = 2.0860

    Next, find the standard error. This is

    standard error = √(MSW/nj) = √(100.9/6) = 4.1008

    Now you’re ready to finish the confidence interval. The margin of error is

    E = t(α/2,df) · √(MSW/nj) = 2.0860×4.1008 = 8.5541

    Therefore the confidence interval is

    μ2 = 85 ± 8.6 g (95% confidence)

    or

    76.4 g ≤ μ2 ≤ 93.6 g (95% confidence)

    Conclusion: You’re 95% confident that the true mean amount of fat absorbed by a batch of donuts fried in Fat 2 is between 76.4 g and 93.6 g.
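The hand computation above can be checked with a few lines of Python. In this sketch the critical value t(0.025,20) = 2.0860 is copied from the text rather than computed, and the variable names are illustrative.

```python
from math import sqrt

T_CRIT = 2.0860  # t(0.025, 20) from a t table, as quoted in the text
MSW = 100.9      # within-groups mean square from the ANOVA table
mean_fat2 = 85   # sample mean for Fat 2
n_fat2 = 6       # sample size for Fat 2

# The standard error uses the pooled MSW, not Fat 2's own standard deviation.
std_err = sqrt(MSW / n_fat2)
margin = T_CRIT * std_err
interval = (mean_fat2 - margin, mean_fat2 + margin)
print(f"standard error = {std_err:.4f}")
print(f"95% CI: {interval[0]:.1f} g to {interval[1]:.1f} g")
```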

    TI-83/84 Procedure

    Your TI calculator is set up to do the necessary calculations, but there’s one glitch because the degrees of freedom is not based on the size of the individual sample, as it is in a regular t interval. So you have to “spoof” the calculator as follows.

    Press [STAT] [◄] [8] to bring up the TInterval screen. First I’ll tell you what to enter; then I’ll explain why.

    [Figure: TI-83/84 input screen for estimating mean, with x̄ = 85, Sx = √(100.9×21/6), n = 21, C-Level = .95]
    [Figure: TI-83/84 output screen for estimating mean, showing (76.446, 93.554)]

    Now, what’s up with n and Sx? Well, the calculator uses n to compute degrees of freedom for critical t as n−1. You want degrees of freedom to be dfW, so you lie to the calculator and enter the value of n as dfW+1 (20+1 = 21).

    But that creates a new problem. The calculator also divides s by √n to come up with the standard error. But you want it to use nj (6) and not your fake n (21). So you have to multiply MSW by dfW+1 and divide by nj to trick the calculator into using the value you actually want.

    By the way, why is MSW inside the square root sign? Because the calculator wants a standard deviation, but MSW is a variance. As you know, standard deviation is the square root of variance.

    All this fakery achieves the desired result: the confidence interval matches the one that you would have if you computed it by hand.
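To convince yourself that the spoofed inputs really give the standard error you want, here is a short check in Python. The names fake_n and fake_sx are illustrative; the values come from the explanation above.

```python
from math import isclose, sqrt

MSW, DF_W, n_j = 100.9, 20, 6

fake_n = DF_W + 1                        # so the calculator uses df = n - 1 = dfW
fake_sx = sqrt(MSW * (DF_W + 1) / n_j)   # inflated Sx to cancel the fake n

calculator_se = fake_sx / sqrt(fake_n)   # what the calculator computes internally
wanted_se = sqrt(MSW / n_j)              # what the formula calls for
matches = isclose(calculator_se, wanted_se)
print(f"Sx to enter = {fake_sx:.4f}, n to enter = {fake_n}")
print(f"standard errors agree: {matches}")
```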

    Conclusion: You’re 95% confident that the true mean amount of fat absorbed by a batch of donuts fried in Fat 2 is between 76.4 g and 93.6 g.

    η²: Strength of Association

    Lowry chapter 14 part 2 mentions a measure that is usually neglected in ANOVA: η². (η is the Greek letter eta, which rhymes with beta.)

    η² = SSB/SStot, the ratio of sum of squares between groups to total sum of squares. For the donut-frying example,

    η² = SSB/SStot = 1636.5 / 3654.5 = 0.45

    What does this tell you? η² measures how much of the total variability in the dependent variable is associated with the variation in treatments. For the donut example, η² = 0.45 tells you that 45% of the variability in fat absorption among the batches is associated with the choice of fat.
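The same ratio can be computed for any ANOVA table. A minimal sketch, using the sums of squares from the three examples on this page (the function name eta_squared is an illustrative choice):

```python
def eta_squared(ss_between, ss_total):
    """Share of total variability associated with the treatments."""
    return ss_between / ss_total

eta_donuts = eta_squared(1636.5, 3654.5)     # Example 1
eta_stocks = eta_squared(97.5931, 590.8508)  # Example 2
eta_crt = eta_squared(36, 72)                # Example 3
print(f"donuts {eta_donuts:.2f}, stocks {eta_stocks:.2f}, CRTs {eta_crt:.2f}")
```

Keep in mind that η² describes the samples; whether the association is statistically significant is still a question for the F test.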


    This page is used in instruction at Tompkins Cortland Community College in Dryden, New York; it’s not an official statement of the College. Please visit www.tc3.edu/instruct/sbrown/ to report errors or ask to copy it.

    For updates and new info, go to http://www.tc3.edu/instruct/sbrown/stat/