Outline
__Why summarize__

-- the problem

-- the solution

-- what you lose
__Numbers__

-- central tendency

-- variability
__Graphics__

__Example for Thinking about Central Tendency__

__Why summarize__

__The problem__

Let’s say that I have collected some memory data.

I presented subjects with a short story

Then 10 minutes later had them write down what they could remember.

Now I need to communicate what I found to other researchers.
How shall I do it?

__Can’t see the forest for the trees__

I could just type up the responses and publish them, but that
isn’t very satisfactory and it wastes a lot of space in journals.
Besides, people need some way of summarizing it.

__Summarize__

Based on the Operational Definition.

Memory is the number of idea units in the story that the subjects recalled.
Say there were 80 in the original story. Now we can count how many
each subject got. I could just print all the scores, but even that
can be wasteful and silly if there are other ways to summarize -- which
there are. So we are working toward simple and informative summaries.

__What you lose__

It is also important to know what you lose. Each time you
summarize, you lose the individual instances. When you score each
idea unit as yes or no, you lose what the subject actually wrote.
When you go to a total score, you no longer know which units each
subject recalled. When you sum across subjects, you no longer know
what each subject did. Sometimes you sum in more than one way: say,
for each subject and for each idea unit.
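The two ways of summing can be sketched in a few lines of Python. The recall matrix here is hypothetical (3 subjects and 5 idea units, rather than the 80 units in the story) and is invented only to illustrate the point.

```python
# Hypothetical recall data: 3 subjects x 5 idea units.
# 1 = the subject recalled that idea unit, 0 = did not.
# These values are invented for illustration, not taken from the study.
recall = [
    [1, 0, 1, 1, 0],  # subject 1
    [1, 1, 0, 1, 0],  # subject 2
    [0, 0, 1, 1, 0],  # subject 3
]

# Summing across idea units gives one score per subject.
subject_scores = [sum(row) for row in recall]

# Summing across subjects gives one count per idea unit.
unit_counts = [sum(col) for col in zip(*recall)]

print(subject_scores)  # [3, 3, 2]
print(unit_counts)     # [2, 1, 2, 3, 0]
```

Subjects 1 and 2 both end up with a score of 3 even though they recalled different units, which is exactly the information the score throws away.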

__Numbers__

The most common way of summarizing is numerically and our primary
concerns here are measures of central tendency and variability.

__Central tendency__

Mean, median, and mode

Mean is the numerical average

Median is the central score in rank order

Mode is the most common score

__Mean__: $M = \frac{\sum_{i=1}^{N} X_i}{N}$

$= \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}$

In this case: 1315/20 = 65.75

__Median__: central score

With an odd number of scores, it is the one with an equal number of scores above and below;

with an even number of scores, it is the average of the two middle scores.

In this case: average of scores 10 and 11 = (66 + 66)/2 = 66

__Mode__: The most common score.

66
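As a rough check on the three measures, here is a minimal Python sketch. The 20 original scores are not listed in these notes, so the `scores` list is a hypothetical set constructed only to agree with the summaries given (N = 20, sum = 1315, median = 66, mode = 66).

```python
import statistics

# Hypothetical scores: NOT the original data, just a set of 20 values
# chosen to match the summary numbers in the notes.
scores = [53, 59, 60, 60, 62, 63, 64, 65, 65, 66,
          66, 66, 67, 68, 68, 69, 70, 72, 75, 77]

# Mean: sum of the scores divided by the number of scores.
mean = sum(scores) / len(scores)

# Median: middle score in rank order (average of the two middle
# scores when the number of scores is even).
median = statistics.median(scores)

# Mode: the most common score.
mode = statistics.mode(scores)

print(mean)    # 65.75
print(median)  # 66.0
print(mode)    # 66
```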

__Variability__

Central tendency tells us what the average is, and measures of variability
tell us how the scores are distributed around that average.

No scores are "average" -- most deviate from average

*** We would like to know the average deviation ***

__Deviation__: How far each score is from the mean: $X_i - M$

If you wanted to know the deviation in general, you could just sum these
deviations. But that sum, it turns out, is always 0. That’s a property
of the arithmetic mean.

*** So we can’t get the average straight from the deviations, but that
is the goal ***
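The zero-sum property is easy to see numerically. In this sketch the `scores` list is hypothetical (the original scores are not given in these notes); the property holds for any set of numbers, up to floating-point rounding.

```python
import math

# Hypothetical scores, invented to match the summary values in the notes.
scores = [53, 59, 60, 60, 62, 63, 64, 65, 65, 66,
          66, 66, 67, 68, 68, 69, 70, 72, 75, 77]

mean = sum(scores) / len(scores)

# Deviation of each score from the mean.
deviations = [x - mean for x in scores]

# Positive and negative deviations cancel, so the sum is zero.
total = sum(deviations)
print(math.isclose(total, 0.0, abs_tol=1e-9))  # True
```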

So what we do is square them before we sum them.

__Sum of the Squared Deviations__; Sum of Squares; SS

by definition

$SS = \sum_{i=1}^{N} (X_i - M)^2$

The problem is that this is totaled across all the instances and depends
on the number of observations you have made: the more subjects, the larger
the SS.

*** Now we have a non-zero number and we work back to the average ***

Thus we need the average of this.

*** the sum divided by the number of things***

__Variance__: SS divided by the degrees of freedom: $s^2 = SS/df$

$s^2 = \frac{\sum_{i=1}^{N} (X_i - M)^2}{N - 1}$

*** Could also think of this as the Mean of the Squared Deviations --
Mean Square ***

But the problem here is that the variance is in units of X squared, not X.

__Standard Deviation__: Square root of the Variance:
$s = \sqrt{s^2}$
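The chain from deviations to SS to variance to standard deviation can be sketched as follows. The `scores` list is again hypothetical (built to match the mean of 65.75 in these notes), so the resulting SS, variance, and standard deviation are illustrative only. The last two lines compare the hand computation against Python's `statistics.variance` and `statistics.stdev`, which also divide by N - 1.

```python
import math
import statistics

# Hypothetical scores, invented to match the summary values in the notes.
scores = [53, 59, 60, 60, 62, 63, 64, 65, 65, 66,
          66, 66, 67, 68, 68, 69, 70, 72, 75, 77]

n = len(scores)
mean = sum(scores) / n

# Sum of squares: square each deviation, then add them up.
ss = sum((x - mean) ** 2 for x in scores)

# Variance: SS divided by the degrees of freedom (N - 1).
variance = ss / (n - 1)

# Standard deviation: square root of the variance, back in units of X.
sd = math.sqrt(variance)

print(ss, variance, sd)

# Cross-check against the standard library's sample statistics.
print(math.isclose(variance, statistics.variance(scores)))  # True
print(math.isclose(sd, statistics.stdev(scores)))           # True
```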

__Graphics: Frequency Distributions__

Sometimes it is enough to have just the numbers: mean and
standard deviation. Graphic presentation, however, allows you to
see more about the data. A picture is worth a thousand words.

Frequency Distributions:

x-axis = scores on the measure

y-axis = frequency (number of people with that score)

Choose size of the range: each score or groups?

How to group? Between 7 and 10 groups. Try a few and go with the best picture.

| Group by 3 | Frequency | Group by 4 | Frequency |
|------------|-----------|------------|-----------|
| 52-54      | 1         | 52-55      | 1         |
| 55-57      | 0         | 56-59      | 1         |
| 58-60      | 3         | 60-63      | 4         |
| 61-63      | 2         | 64-67      | 7         |
| 64-66      | 6         | 68-71      | 4         |
| 67-69      | 4         | 72-75      | 2         |
| 70-72      | 2         | 76-80      | 1         |
| 73-75      | 1         |            |           |
| 76-78      | 1         |            |           |
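Trying different group sizes is quick to do in code. The sketch below uses a hypothetical `scores` list (the original scores are not given in these notes; this set was constructed to reproduce the counts in the table) and a made-up helper, `grouped_frequencies`. It uses strictly equal-width groups, so with a width of 4 the last group comes out as 76-79 rather than 76-80.

```python
from collections import Counter

# Hypothetical scores, invented to reproduce the frequencies in the notes.
scores = [53, 59, 60, 60, 62, 63, 64, 65, 65, 66,
          66, 66, 67, 68, 68, 69, 70, 72, 75, 77]

def grouped_frequencies(scores, width, start):
    """Count scores in equal-width groups beginning at `start`."""
    # Group index 0 covers start..start+width-1, index 1 the next range, etc.
    counts = Counter((x - start) // width for x in scores)
    n_groups = (max(scores) - start) // width + 1
    table = []
    for g in range(n_groups):
        low = start + g * width
        high = low + width - 1
        table.append((f"{low}-{high}", counts.get(g, 0)))
    return table

by_3 = grouped_frequencies(scores, width=3, start=52)
by_4 = grouped_frequencies(scores, width=4, start=52)

# Print a simple text histogram for each grouping.
for name, table in (("Group by 3", by_3), ("Group by 4", by_4)):
    print(name)
    for label, freq in table:
        print(f"{label:>6} | {'*' * freq}")
```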

__Types of Distributions__ are: Normal, Skewed, or Bimodal

What does this mean for the mean, median, and mode?
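One way to explore this question is to compare a symmetric set of scores with a skewed one. Both lists below are hypothetical and invented for illustration; the only difference between them is a single extreme high score.

```python
import statistics

# Hypothetical data: a roughly symmetric set of scores...
symmetric = [2, 3, 3, 4, 4, 4, 5, 5, 6]
# ...and the same set with the top score replaced by an extreme value.
skewed = [2, 3, 3, 4, 4, 4, 5, 5, 24]

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(name,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "mode =", statistics.mode(data))
```

In this sketch the extreme score pulls the mean upward while the median and mode stay where they were.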

__Example for Thinking about Central Tendency__

Drawn from the Bellingham Herald

Median family income: $35,225

Washington: The largest family paychecks in the nation
are brought home in the big-city suburbs of the Northeast. At the
other end of the scale, family income on a stretch of South Texas farm
land on the Mexican border is the lowest in the nation. There’s a
connection between what you make and where you live, 1990 census figures
show. The counties where median family income is highest are concentrated
outside Washington, D.C., and New York City. Nationally, the typical
family median income was $35,225 a year. To government statisticians,
a family is a group of related people living together. The Washington
suburb of Fairfax County, VA, had the highest median family income:
$65,201 a year. Most of the lowest-income counties are rural. In
Starr County, Texas, the median family makes $10,903 a year.

Question: What is the ratio of income for poor to rich?

What measure of central tendency?

Compute by family or per capita?

What were those government statisticians trying to do with these numbers?
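For the first question, the arithmetic can be sketched directly from the two figures in the article. Note that this is a ratio of two county medians, not of individual families' incomes; the variable names are just labels for this example.

```python
# Median family incomes quoted in the article (1990 census figures).
starr_county_median = 10_903    # lowest: Starr County, Texas
fairfax_county_median = 65_201  # highest: Fairfax County, VA

# Ratio of poor to rich, and its inverse.
poor_to_rich = starr_county_median / fairfax_county_median
rich_to_poor = fairfax_county_median / starr_county_median

print(round(poor_to_rich, 2))  # 0.17
print(round(rich_to_poor, 1))  # 6.0
```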