Lecture 5 - One-way between-subjects ANOVA


-- thinking up a research project
-- designs with more than two conditions
-- how to test: the need to expand the t-test
-- nomenclature
-- conceptual formula
-- actual formula
-- an example
Follow-up comparisons
-- need for follow-up comparisons
-- problem of probability
-- planned vs. post-hoc
-- Tukey's test
Writing an ANOVA
-- what do we know
-- order

-- writing

SPSS One-way between-subjects ANOVA
-- data entry
-- descriptive stats
-- follow-up comparisons

One-way between-subjects ANOVA

Thinking up a research project

First I should tell you about some of the research I do. I am concerned with memory, but with how people remember in situations more similar to real-world situations than the traditional memory task. I don't use word lists. One of my primary concerns is how the social context affects the way in which people remember. I spend a fair amount of my research effort on questions derived from the basic observation that people talk differently to experimenters in a memory experiment than they do in all other contexts. This means, of course, that much of our knowledge of how and what people remember may be limited to a very strange context.

One variable within social context in which I am currently interested is shared knowledge. What type and how much knowledge do the participants in a memory conversation share? How does this affect the content and organization of the memory conversation?

Basic approach: People come in and read a short story. They then experience a distractor task for a few minutes. Finally they are asked to converse with another person about the story for a specified length of time, usually ten minutes. I then transcribe the tape-recorded conversations and count the occurrence of different types of information (details from the story, inferences from the story, statements of reactions to the story, personal statements, etc.).

Shared knowledge has two components -- knowledge about the item to be discussed and general background knowledge. The basic view is that shared knowledge means that you don't talk about that information and instead talk about other information. This is called being polite in conversation. In previous work I have already begun looking at this in terms of knowledge about the to-be-remembered information (same stories vs. different stories), and yes, people do include fewer details about the story when they both read the same story. Now I want to know if it works the same with the other aspect of shared knowledge -- background knowledge.

Designs with more than 2 conditions
                         Independent Variable       Dependent Variable

                    ---> Group A (Level 1) ---> measure DV
Pop ---> Sample ---> --> Group B (Level 2) ---> measure DV
                    ---> Group C (Level 3) ---> measure DV

IV is familiarity with conversation partner

level 1 is strangers
level 2 is acquaintances
level 3 is roommates



DVs (taken from the 10 minute conversation)

# of details from story
# of inferences from story
# of reactions to story
# of personal comments



Think about predictions!

In this case I have 3 levels of the IV because I want to sample the full range of the IV. Sometimes you have more than 3; the additional levels may serve as extra control conditions or extra experimental conditions. The number of levels, and what they are, is determined by the theoretical questions being asked.

How to test: the need to expand the t-test

We want to know if there are differences among the groups. How can we do this? We could do three t-tests. But there is a problem -- it is the opposite of the gambler's fallacy: we can't treat each test as a separate event, but rather must consider them as a set. We would like to be able to do everything at one time, and that is what an ANOVA allows us to do.


Let me introduce how I will refer to the scores and means in a design like this.

Any score is referred to as Xia

X means a score.
i refers to an individual; substitute a subject number within a group.
a refers to a level of the IV.
Thus Xia refers to a score, for subject i, in group a.
IV A -- Familiarity of Conversation Partners

Subjects within a group are referred to as i.

Level A=1    Level A=2    Level A=3    (could have a level A=4, or more)
X11          X12          X13
X21          X22          X23
 .            .            .
 .            .            .

There are means for every group of subjects, referred to as Ma, and there is a total mean, referred to as MT.
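In code terms, this indexing can be sketched in Python with some hypothetical scores (made-up numbers, not data from the class example):

```python
# Hypothetical scores (e.g., # of personal statements), indexed by group:
# the outer key is the level a of the IV, the position in the list is i.
scores = {
    "strangers":     [2, 3, 1, 2],   # X11, X21, X31, X41 (made-up numbers)
    "acquaintances": [3, 2, 4, 3],
    "roommates":     [6, 5, 7, 6],
}

# Ma: the mean for each group a
group_means = {a: sum(xs) / len(xs) for a, xs in scores.items()}

# MT: the total (grand) mean across every score in every group
all_scores = [x for xs in scores.values() for x in xs]
grand_mean = sum(all_scores) / len(all_scores)

print(group_means)   # strangers 2.0, acquaintances 3.0, roommates 6.0
print(grand_mean)    # about 3.67
```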

Conceptual formula

t = diff between means / variability

F = diff among groups / variability
= between group variance / within group (or error) variance
= part of variability due to IV / part of variability due to error variability



This is why it is called an ANOVA

Analysis of Variance

1 -- partition variance into its components; for the one-way between-subjects ANOVA these are the between-groups and within-groups (error) components
2 -- then you compare those components. Dividing one by the other is a way of comparing their numerical size



Actual formula

Let's work our way to the actual formulas that are involved. Variance is SS/df, so first we have to get the SS.

Partitioning Variance:

1. Start by partitioning deviation
In general:
Deviations = Xi - M
DevT = Xia - MT    how far a score is from the total mean
DevA = Ma - MT     how far a score's group mean is from the total mean
DevE = Xia - Ma    how far a score is from its group mean
It turns out that these things add up: (Xia - MT) = (Ma - MT) + (Xia - Ma)
DevT is the total amount any score deviates from (differs from) the total average.
DevA is the part of the total deviation due to what group a score belongs to; it is how much, on average, membership in the group makes the score differ from the total mean. This is caused by group membership (generally assumed to be the IV, unless you've got a confound in your study).
DevE is the part of the total deviation due to variability within a group; it is how far a score is from what you would expect it to be based on the group membership (given that the best estimate of what scores in the group should do is the mean). This is caused by individual differences and measurement error.



2. Sums of Squares (Sums of the squared deviations)

Then you square the deviations and sum across all individuals in all groups

SST = SUMa SUMi (Xia - MT)^2
SSA = SUMa SUMi (Ma - MT)^2
SSE = SUMa SUMi (Xia - Ma)^2



These things add up too, so we really have partitioned SS
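This partition is easy to check numerically. Here is a Python sketch using made-up scores (three hypothetical groups of four subjects each; not the data from the class example):

```python
# Made-up scores: three hypothetical groups of four subjects each
scores = {
    "strangers":     [2, 3, 1, 2],
    "acquaintances": [3, 2, 4, 3],
    "roommates":     [6, 5, 7, 6],
}
all_scores = [x for xs in scores.values() for x in xs]
m_t = sum(all_scores) / len(all_scores)                    # total mean MT
m_a = {a: sum(xs) / len(xs) for a, xs in scores.items()}   # group means Ma

# Square each deviation and sum across all individuals in all groups
ss_t = sum((x - m_t) ** 2 for xs in scores.values() for x in xs)
ss_a = sum(len(xs) * (m_a[a] - m_t) ** 2 for a, xs in scores.items())
ss_e = sum((x - m_a[a]) ** 2 for a, xs in scores.items() for x in xs)

print(ss_t, ss_a, ss_e)
assert abs(ss_t - (ss_a + ss_e)) < 1e-9    # SST = SSA + SSE
```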


3. Variance = SS/df, so we need dfs

In general, Degrees of Freedom is the number of things free to vary given the number of things known. For ANOVAs, the first part of the deviation formula is the number of things we are working with and the second part is the constant.

A is the number of groups.

Nt is the total number of subjects in the entire experiment.

dfT = Nt - 1
dfA = A - 1
dfE = Nt - A
Things add up here too:
dfT = dfA + dfE



4. Variance is the SS/df or: Mean of the Squared Deviations = MS = Mean Squares




Comparing Variances

You are done with partitioning variance and are ready to compare variances: F = MSA / MSE. This tells you how big your effect is compared to the variability within the groups (how much the numbers randomly move around). Are the observed differences large enough to be attributed to the IV rather than to random chance?


Look up probability on an F table in the back of a stat book, or have the computer give you an exact probability.

Present information in a Source Table.
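The whole computation can be sketched in Python. The group names and scores below are made up for illustration (not the class data), but the function follows the SS, df, MS, and F steps above:

```python
def one_way_anova(groups):
    """Build the pieces of a one-way between-subjects ANOVA source table."""
    all_scores = [x for xs in groups.values() for x in xs]
    n_t, n_groups = len(all_scores), len(groups)
    m_t = sum(all_scores) / n_t                                # total mean MT
    m_a = {a: sum(xs) / len(xs) for a, xs in groups.items()}   # group means Ma

    ss_a = sum(len(xs) * (m_a[a] - m_t) ** 2 for a, xs in groups.items())
    ss_e = sum((x - m_a[a]) ** 2 for a, xs in groups.items() for x in xs)
    df_a, df_e = n_groups - 1, n_t - n_groups
    ms_a, ms_e = ss_a / df_a, ss_e / df_e                      # MS = SS/df
    return {"SSA": ss_a, "SSE": ss_e, "dfA": df_a, "dfE": df_e,
            "MSA": ms_a, "MSE": ms_e, "F": ms_a / ms_e}

table = one_way_anova({
    "strangers":     [2, 3, 1, 2],   # hypothetical scores
    "acquaintances": [3, 2, 4, 3],
    "roommates":     [6, 5, 7, 6],
})
for source in ("SSA", "dfA", "MSA", "SSE", "dfE", "MSE", "F"):
    print(source, table[source])
```

You would still look up the probability of the resulting F (with dfA and dfE) in a table or have the computer give it to you.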


Follow-up comparisons

need for follow-up comparisons

There is a need for additional work beyond the ANOVA when it allows you to conclude that there is an effect of the IV on the DV. This boils down to the difference between what you know with an ANOVA compared to a t-test. With a t-test: you know there is an effect, and you know where the difference is and its direction (that is, which group is higher on the DV than which other group). With an ANOVA: you know only that there is an effect. Where is it? Consider our example:
# of Personal Statements

The ANOVA says that there is some difference among the means. But among which means?

Which is it? S < A < R ? S = A < R ? S = A and A = R and S < R?

In order to know exactly where the difference is, we need to compare the means to one another. We did not do these pairwise comparisons first because we wanted the global comparison (the ANOVA). If the ANOVA is not significant, then we do not do follow-up comparisons, because the ANOVA says there is no difference among the means.

What to do: Compare S to A, S to R, and A to R

problem of probability

The problem of probability is the reason we did not just start making the pairwise comparisons -- doing repeated stats on the same set of data inflates your p level. This is the opposite of the gambler's fallacy. Gambler's fallacy: thinking that independent events can be viewed as related. Researcher's folly: thinking that related events can be treated independently.

Acting as if things were independent: Do those three comparisons at .05 cut off.  Reality is they are related: What is the chance of getting 1 of 3 reaching .05?

This is referred to as the experiment-wide or familywise error rate. Roughly speaking, the individual probabilities add up, but the familywise rate never reaches 1.0.

Use the following formula to compute exactly

p = 1 - (1 - a)^k

a is probability level for each comparison
k is the number of comparisons



In our case, the odds of 1 of 3 comparisons being significant at the .05 level: p = 1 - (1 - .05)^3 = .143
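This arithmetic is easy to verify with a throwaway Python sketch:

```python
# The familywise error rate formula: p = 1 - (1 - a)^k
def familywise_p(a, k):
    """Chance that at least 1 of k comparisons reaches the a cut-off."""
    return 1 - (1 - a) ** k

print(familywise_p(0.05, 3))   # about .143, as computed above
```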

This means when you make your follow-up, or pairwise, or paired, comparisons, you need to do it in a way that controls your experiment wide p-level.

planned vs. post-hoc

Statisticians and Researchers talk about two different types of paired comparisons.

Planned: -- those you intended to do before you conducted the experiment -- generally the decisions are based on theory -- need to control experiment-wide prob only for the number you make

Post-hoc: -- Latin for after the fact, you look at the outcome of experiment and decide which comparisons to make -- you would be willing to make any and all comparisons -- regardless of how many you do, you must account for all possible comparisons in controlling for experiment-wide prob

Say a 5 level, one-way ANOVA: there are 10 possible comparisons

Planned, decide to do 2: p = 1 - (1 - .05)^2 = .098

Post-hoc, decide to do 2: p = 1 - (1 - .05)^10 = .401
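As a Python sketch, the number of possible comparisons and the two error rates for the 5-level example work out like this:

```python
from math import comb

# Familywise error rates for planned vs. post-hoc comparisons
# in a 5-level one-way ANOVA (numbers match the example above).
alpha = 0.05
possible = comb(5, 2)                    # 10 possible pairwise comparisons

planned = 1 - (1 - alpha) ** 2           # control only for the 2 you planned
post_hoc = 1 - (1 - alpha) ** possible   # must account for all 10 possible

print(possible)    # 10
print(planned)     # about .098
print(post_hoc)    # about .401
```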

Is there really a difference between planned and post-hoc? In practice, researchers generally do post-hoc comparisons -- you are going to do all the comparisons (otherwise why include the group?), and you are curious about exactly what happened. Occasionally some comparisons are planned, e.g., when you have multiple control groups: Alzheimer's disease patients compared to college students, healthy elderly, and Korsakoff's patients -- in this case you don't care about comparisons among the control groups.

Tukey's test

There are many statistics that allow you to make pairwise comparisons while controlling the experiment-wide probability. Almost all of them provide a Critical Difference (CD) that the difference between any two means must exceed in order to be significant at the experiment-wide p-level. The one we will use is Tukey's Honestly Significant Difference test (Tukey's HSD, Tukey's, or Tukey's post-hoc).

CD = q [ sqrt (MSE / Na) ]

q is based on the # of means being compared and dfe and cut-off (.01 or .05).
You find this in the back of most stat books.
MSE is straight from your ANOVA
Na is the number of subjects in each group (this assumes equal N).



In our case

CD.05 = q [ sqrt (MSE/Na) ] = 3.53 [ sqrt (2.37/10) ] = 1.72

Thus, for a difference between two groups to be significant, the difference between their means must exceed 1.72.
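The same computation as a Python sketch, plugging in the lecture's values (q = 3.53, MSE = 2.37, Na = 10):

```python
from math import sqrt

q = 3.53      # from a q table: 3 means, dfE = 27, .05 cut-off
mse = 2.37    # straight from the ANOVA
n_a = 10      # subjects per group (assumes equal N)

cd = q * sqrt(mse / n_a)
print(round(cd, 2))   # 1.72
```

Any pair of group means differing by more than the CD is significant at the experiment-wide .05 level.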

In making comparisons with three or more groups, it is easiest to work in a Tukey's table. In the table, put the absolute value of the difference between each pair of group means.

* p < .05

Now we know that roommates made more personal statements than strangers or acquaintances. Those are the differences that the ANOVA picked up. It claimed that there was some difference somewhere, now we know where.

Writing an ANOVA

-- what do we know

- means and standard deviations for each group

- there is an effect of familiarity on the number of personal statements made (ANOVA tells us this)

- Roommates > Acquaintances = Strangers (Tukey's tells us this, but only include if ANOVA says there is an effect)

-- order

1. difference based on ANOVA

2. where the effects are based on Tukey's

3. descriptive stats

Use this order to put the most important information first. Whether or not there is an effect of the IV on the DV is the most important issue. If there is, then you report where the differences are and in what direction (Tukey's told you this). If there was no effect, then you don't do Tukey's because there is no effect to find and you don't report it. Finally, give the descriptives. Do this as part of the paragraph, in a Table, or a Figure (graph).

-- writing

For the example from class

A one-way between-subjects ANOVA found an effect of familiarity on the number of personal statements that subjects made, F(2, 27) = 29.536, p < .001, MSE = 2.37. Tukey's follow-up comparisons found that participants talking with their roommate provided more personal statements than participants talking with an acquaintance or with a stranger (p < .05), and no difference between talking with an acquaintance and talking with a stranger. Table 1 shows the mean number of personal statements provided by participants in each of the familiarity conditions.

In generic terms

A one-way between-subjects ANOVA found an effect of the IV on the DV, F(2, 27) = 29.536, p < .001, MSE = 2.37. Tukey's follow-up comparisons found that the LEVEL OF IV produced more DV than the OTHER LEVEL OF IV (repeat for all comparisons where there was a meaningful difference, p < .05). Table 1 shows the mean DV for participants in each of the IV conditions.

SPSS One-way between-subjects ANOVA

Data entry
As with the t-test: label variables, set up the file, enter the data, save the data. The only difference is in the number of levels of the IV (3 in the example from class, not 2).

Under the Analyze menu.
Pull down to the Compare Means option
Select One-way ANOVA
Put the IV in the Factor box
Put all DVs in the Dependent List box

Descriptive stats
To get descriptive stats, click on the options button
Click in the box beside Descriptive
Click Continue

Follow-up comparisons
To get a Tukey's HSD, click on the post-hoc button
You should notice that there are several from which you can choose
Click on the box beside Tukey
Click Continue

Click OK in the dialogue box and get your output