
Polling 101: Expected Values

We know, math was not a subject you were planning to sit through when you visited us.  However, in order to lay the foundation for sound polling procedures, we need to cover a few basics.  Consider them refreshers, for us as much as for you.

Random Variables

Like any statistician, a pollster must reduce raw data to a numeric form that represents real-world phenomena.  This mapping from real-world outcomes to numbers is called a random variable (RV).  For example, if you threw one die, you could define the random variable x to represent the outcome.  The mapping to numeric values would simply be the value on the top face of the die after each roll.  Likewise, a pollster could ask a group of people a multiple-choice question and let x represent the number of the response each person selected.

RVs come in two varieties: discrete and continuous.  Discrete random variables take on one value from a finite (or at least countable) list of possible candidates.  The roll of a die and a poll respondent’s answer to a multiple-choice question are examples.  Continuous random variables can take on any value in a given range.  A person’s height is an example of a continuous random variable.  Since polls generally deal with discrete responses, we’ll limit our discussions to discrete RVs.  Besides, the math is easier.

Expected Values

Of the calculations you could make on a random variable, the most fundamental is its expected value, E(X).  Despite the name, E(X) is not necessarily the single most likely outcome; it is the value you would expect to see on average, given enough samples.  An expected value can be calculated on the random variable itself or on any function of it, say g(x).  Before you can calculate an expected value, however, you need to know the random variable’s probability mass function (PMF).  It is a fancy term, but the PMF simply defines the probability of each possible outcome.  If the die is fair, for example, the probability of each whole-number outcome from 1 to 6 is 1/6, and the probability of any other value is 0.

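The general formula for the expected value of g(x) is:

$$E(g(X)) = \sum_{i=1}^{n} g(x_i)\, p(x_i)$$

where x_1 through x_n are the possible outcomes and p(x_i) is the PMF, that is, the probability of outcome x_i.

Formulas provided by mathURL.com.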

If you’re unfamiliar with the notation, the equation basically instructs you to list every possible outcome for X (let’s say there are n of them), run each through the function g(x), and multiply each result by the probability of that outcome occurring.  Sum (i.e. add) these products over all possible outcomes, and you’re done.  It may seem like unnecessary tedium, but we’ll see shortly how this is used to calculate familiar values like the average and standard deviation.
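If it helps to see that recipe as code, here is a minimal Python sketch.  The names expected_value and fair_die are our own inventions for illustration, and storing the PMF as a dictionary of outcome-to-probability pairs is just one convenient assumption.  The choice of g(x) = x² is arbitrary.

```python
from fractions import Fraction

def expected_value(pmf, g):
    """Return E(g(X)) for a discrete RV.

    pmf maps each possible outcome to its probability (hypothetical layout).
    """
    # Run each outcome through g, weight it by its probability, and add up.
    return sum(g(x) * p for x, p in pmf.items())

# PMF of one fair die: each face from 1 to 6 has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid PMF must have probabilities that sum to exactly 1.
assert sum(fair_die.values()) == 1

# Expected value of the square of one roll.
print(expected_value(fair_die, lambda x: x ** 2))  # prints 91/6
```

Using Fraction instead of floating-point numbers keeps the arithmetic exact, which makes small examples like this easier to check by hand.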

The Mean

We’ll look at other choices of g(x) shortly, but when g(x) = x, we call the expected value the mean, the first moment, or simply the average, designated by µ.

For example, let x be the result of a throw of one fair die.  The value you’d expect to see, on average, would be:

$$\mu = E(X) = \sum_{i=1}^{6} x_i\, p(x_i) = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5$$

If you’re wondering what happened to calculating an average by adding up all the outcome values and dividing by the number of outcomes, that’s exactly what we did here.  That shortcut is the special case of this process for uniform distributions, where all outcomes are equally likely.  Note that, although rolling a die produces discrete integer values, the mean is a fraction and will never be rolled itself.  Think of the outcomes of the die as if they were loaded on a beam according to their value (1 is one inch from the left, 2 is two inches from the left, etc.).  Placing a fulcrum at 3.5 would balance the load on the beam perfectly.

Figure: Average of 1 roll of a die
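To get a feel for what “on average, given enough samples” means, a quick simulation can help.  This is only a sketch: the function name average_of_rolls, the fixed seed, and the roll counts are arbitrary choices of ours.  The average of a handful of rolls can land well away from 3.5, while the average of many rolls should settle close to it.

```python
import random

def average_of_rolls(num_rolls, seed=0):
    """Simulate num_rolls throws of a fair die and return their average."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(rng.randint(1, 6) for _ in range(num_rolls))
    return total / num_rolls

# Compare a few sample sizes; larger samples should sit nearer to 3.5.
for num_rolls in (10, 1_000, 100_000):
    print(num_rolls, average_of_rolls(num_rolls))
```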

This is an important concept when conducting polls.  The best questions allow the respondent to provide answers along an ordered continuum, rather than from a group of unordered options.  For example, if we asked 100 people:

How crunchy do you like your peanut butter?

  1. Extra crunchy
  2. Medium crunchy
  3. Smooth

we might calculate a mean response of 2.74 and reasonably conclude that the group leans toward smooth.  However, if we asked:

What’s your favorite primary color?

  1. Red
  2. Green
  3. Blue

calculating a mean of 1.68 would be of little use, because the numbers assigned to the colors are arbitrary labels with no natural order.
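One way to convince yourself is to renumber the color choices and watch the mean move.  In the sketch below, the tallies are hypothetical numbers we made up so that the original numbering gives 1.68; the function name mean_response is likewise our own.

```python
def mean_response(counts, coding):
    """Mean coded response: sum of (code * count) divided by total respondents."""
    total = sum(counts.values())
    return sum(coding[answer] * n for answer, n in counts.items()) / total

# Hypothetical tallies from 100 respondents (invented for illustration only).
counts = {"Red": 50, "Green": 32, "Blue": 18}

original_coding = {"Red": 1, "Green": 2, "Blue": 3}
reversed_coding = {"Blue": 1, "Green": 2, "Red": 3}

print(mean_response(counts, original_coding))  # prints 1.68
print(mean_response(counts, reversed_coding))  # prints 2.32
```

The same 100 answers produce two different means depending only on how we happened to number the options.  With the peanut butter question, by contrast, the order of the options carries real meaning, so the mean does too.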

Variance and Standard Deviation

Suppose you were interested in measuring the spread of the data you collected for your RV.  One idea might be to calculate E(x – µ), the expected difference from the mean, in the hope that a larger value would imply results spread further from the mean.  However, if you plug (x – µ) into the equation, a little algebra shows that E(x – µ) = E(x) – µ = µ – µ = 0, always.  It makes sense, since µ sits exactly at the balance point of all possible values (consider the fulcrum above): the positive and negative differences cancel.

Since we’re really only interested in the magnitude (rather than the sign) of the difference between the value and the mean, statisticians use the square of the difference, which is never negative.  Therefore, with g(x) = (x – µ)², we define the variance, Var(X) or σ², as:

$$\mathrm{Var}(X) = \sigma^2 = E\big((X - \mu)^2\big) = \sum_{i=1}^{n} (x_i - \mu)^2\, p(x_i)$$

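Returning to our fair die example, the variance of one throw is:

$$\sigma^2 = \sum_{i=1}^{6} (x_i - 3.5)^2 \cdot \frac{1}{6} = \frac{6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25}{6} = \frac{17.5}{6} \approx 2.92$$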

Finally, since the variance’s units are the square of the original values’ units, we define the standard deviation, σ, as the square root of the variance.  For the fair die:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{35}{12}} \approx \sqrt{2.92} \approx 1.71$$
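For readers who would rather check these numbers than take our word for them, here is a small Python sketch of the fair die calculations.  The helper expected_value and the dictionary fair_die are hypothetical names of our own, and the cross-check against statistics.pstdev only works because every face is equally likely.

```python
from fractions import Fraction
from math import sqrt
from statistics import pstdev

def expected_value(pmf, g):
    """Return E(g(X)) for a PMF given as {outcome: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

# PMF of one fair die: each face from 1 to 6 has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

mu = expected_value(fair_die, lambda x: x)                     # mean
mean_dev = expected_value(fair_die, lambda x: x - mu)          # E(x - mu)
variance = expected_value(fair_die, lambda x: (x - mu) ** 2)   # E((x - mu)^2)
sigma = sqrt(variance)                                         # standard deviation

print(mu)               # prints 7/2
print(mean_dev)         # prints 0
print(variance)         # prints 35/12
print(round(sigma, 3))  # prints 1.708

# Cross-check: the population standard deviation of the six equally
# likely faces should agree with sigma.
print(abs(sigma - pstdev(range(1, 7))) < 1e-12)  # prints True
```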

OK, that’s enough for today.  We’ve defined a lot of terms we’ll use throughout the coming months and introduced several important functions.  Next, we’ll talk about correlation, dependence, comparisons of two RVs, sample populations, and hypothesis testing, as well as some ideas of our own.  Hang in there.