More Confusion on Healthcare

Another story, this time by the Chicago Tribune, musing about contradictory polling data over healthcare reform.  Some polls suggest support for the public option, others show dissatisfaction with the bills that provide it.  Either the data is wrong, the questions are inconclusive, or the respondents are idiots.  The author, Eric Zorn, and Mark Blumenthal believe it’s the last of these.  It should be no surprise to these veterans that asking people (of any IQ) simple questions about incredibly complex topics leads to contradictory results.  It doesn’t make for the best media soundbites, but decomposing the problem into very small issues and building up a picture based on those ‘mini-responses’ may paint a clearer picture.

But who would want to be on the phone that long with a pollster?

Polling 101: Probability

In our last post on Expected Values, we covered some of the basic tools a pollster might use to analyze a single random variable.  Before we move on to comparing 2 RV’s, we first need to cover one more preliminary concept.

Probability

Probability is a concept that’s easy to understand, but difficult to master.  It is a number between 0 and 1, assigned to an event, that indicates the likelihood of that event occurring.  An event is a collection of outcomes from a given RV.  The event of rolling an even number on one throw of a fair die includes the outcomes 2, 4, and 6.  An event with 0 probability means there’s no chance it will occur; an event with a probability of 1 means that it will occur with absolute certainty.

We calculate probabilities of events when we don’t have a procedure to predict the outcome with certainty (the way we do for, say, the trajectory of a rocket).  Calculating them is a mix of art and science; a variety of mathematical rules are available to help you, but in the end, it’s a matter of analyzing all the available information.

One such rule is that, if all outcomes are equally likely (i.e. a uniform distribution), the probability of an event is the number of outcomes that satisfy the event, divided by the total number of outcomes.  The probability of rolling a 5 on a die is 1/6 (0.167) since only 1 outcome yields this result and there are 6 possible outcomes.  The probability of throwing an even number is 3/6 (1/2 or 0.5) since there are 3 outcomes that yield this result.
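The counting rule above can be sketched in a few lines of Python.  The helper name `probability` and the use of `fractions.Fraction` are our own choices for illustration, and the sketch only applies when every outcome is equally likely.

```python
from fractions import Fraction

def probability(event, outcomes):
    """Probability of an event when all outcomes are equally likely.

    `event` is a predicate (returns True/False for a single outcome);
    `outcomes` is the full list of possible outcomes.
    """
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

die = [1, 2, 3, 4, 5, 6]  # outcomes of one throw of a fair die

p_five = probability(lambda o: o == 5, die)      # 1 favorable outcome out of 6
p_even = probability(lambda o: o % 2 == 0, die)  # 3 favorable outcomes out of 6

print(p_five, float(p_five))
print(p_even, float(p_even))
```

Keeping the result as an exact fraction avoids rounding until you actually need a decimal.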

Probability Functions

We denote the probability of event A as Pr(A).  We can also define a probability density function (PDF) for continuous RV’s, and a probability mass function (PMF) for discrete RV’s.  A PMF gives the probability of each outcome in a single function; a PDF gives a density, and the area under it over a range of values is the probability of landing in that range.  A handful of these functions occur frequently in nature.  One of the most common is the Gaussian, or Normal, function shown below.

[Figure: the Normal PDF.  Courtesy of wikipedia.org]

The Normal PDF forms the well known “bell-shaped curve”, with its mean value, µ, at the center.  We’ll return to the Normal function later, but for now we’ll confine ourselves to discrete PMF’s.  For our die example, the PMF is:

[Figure: PMF of a fair die.  Courtesy of wikipedia.org]

Here we see that the probability of each outcome is the same, 1/6.
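For readers who prefer code to plots, here is a minimal sketch of the same PMF as a Python dictionary.  The names `die_pmf` and `pr` are hypothetical, chosen just for this example.

```python
from fractions import Fraction

# PMF of a fair die: each face maps to its probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def pr(event, pmf):
    """Pr(event): add up the PMF over the outcomes that belong to the event."""
    return sum((p for outcome, p in pmf.items() if outcome in event), Fraction(0))

total = sum(die_pmf.values())    # a valid PMF always sums to 1
p_even = pr({2, 4, 6}, die_pmf)  # the event "rolled an even number"

print(total, p_even)
```

Any outcome missing from the dictionary, such as a 7, simply contributes a probability of 0.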

Polling 101: Expected Values

We know, math was not a subject you were planning to sit through when you visited us.  However, in order to lay the foundation for sound polling procedures, we need to cover a few basics.  Consider them refreshers, for us as much as for you.

Random Variables

Like any statistician, a pollster must reduce raw data to a numeric form to represent real-world phenomena.  This mapping from the real world to numbers is called a random variable (RV).  For example, if you threw one die, you could define the random variable x to represent the outcome.  The mapping to numeric values would simply be the value on the top of the die after each roll.  A pollster could ask a group of people a multiple choice question, where x represents the number of the response selected.

RV’s come in two varieties: discrete and continuous.  Discrete random variables can take on one value from a finite list of possible candidates.  The roll of a die or a poll respondent’s answer to a multiple choice question are examples.  Continuous random variables can take on any value in a given range.  A person’s height is an example of a continuous random variable.  Since polling always deals with discrete responses, we’ll limit our discussions to discrete RV’s.  Besides, the math is easier.

Expected Values

Of the calculations you could make on a random variable, the most fundamental is its expected value, E(X).  E(X) is, as the name suggests, the value you’d expect to see on average, given enough samples (which is not necessarily the single most likely outcome).  An expected value can be calculated on the random variable itself or any function of it, say g(x).  Before you can calculate an expected value, however, you need to know the RV’s probability mass function (PMF).  A fancy term, but the PMF simply defines the probability of each possible outcome.  If the die is fair, for example, the probability of each outcome from 1 to 6 is 1/6, and 0 for all others.

The general formula for the expected value of g(x) is:

$$E(g(X)) = \sum_{i=1}^{n} g(x_i)\,\Pr(X = x_i)$$

Formulas provided by mathURL.com.

If you’re unfamiliar with the notation, the equation basically instructs you to list every possible outcome for X (let’s say there are n of them), run each through the function g(x), and multiply the result by the probability of that outcome occurring.  Sum (i.e. add) these values for each possible outcome, and you’re done.  It may seem like unnecessary tedium, but we’ll see shortly how this is used to calculate familiar values like the average and standard deviation.
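The recipe just described translates almost word for word into code.  This is only a sketch: the function name `expected_value` and the dictionary representation of the PMF are our own assumptions.

```python
from fractions import Fraction

def expected_value(pmf, g=lambda x: x):
    """E(g(X)) for a discrete RV: sum g(x) * Pr(X = x) over every outcome."""
    return sum(g(x) * p for x, p in pmf.items())

# PMF of a fair die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

mean = expected_value(die_pmf)                          # g(x) = x
mean_square = expected_value(die_pmf, lambda x: x * x)  # g(x) = x squared

print(mean, mean_square)
```

Passing a different `g` is all it takes to compute a different expected value from the same PMF.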

The Mean

We’ll look at other values of g(x) shortly, but when g(x) = x, we call the expected value the mean, first moment, or simply the average, designated by µ.

For example, suppose x is the result of a throw of one die.  The value you’d expect to see, on average, would be:

$$\mu = E(X) = \sum_{i=1}^{6} x_i\,\Pr(X = x_i) = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$$

If you’re wondering what happened to calculating averages by adding up all the outcome values and dividing by the number of outcomes, that’s exactly what we did here.  It is a special case of the general process, one that applies to uniform distributions, where all outcomes are equally likely.  Note that, although rolling a die produces discrete integer values, the mean is a fraction and will never be rolled itself.  Think of the outcomes of the die as if they were loaded on a beam according to their value (1 is one inch from the left, 2 is two inches from the left, etc.).  Placing a fulcrum at 3.5 would balance the load on the beam perfectly.
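To see the “given enough samples” idea in action, here is a small simulation sketch.  The seed and the number of rolls are arbitrary choices of ours, and a different seed will give a slightly different average.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Simulate many throws of a fair die.
rolls = [random.randint(1, 6) for _ in range(100_000)]

sample_mean = sum(rolls) / len(rolls)  # lands close to 3.5
mean_was_rolled = 3.5 in rolls         # False: no single roll is ever 3.5

print(sample_mean, mean_was_rolled)
```

The more rolls you simulate, the closer the sample average tends to sit to the mean of 3.5, even though no individual roll can equal it.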

[Figure: Average of 1 roll of a die]

This is an important concept when conducting polls.  The best questions allow the respondent to provide answers along an ordered continuum, rather than from a set of unordered options.  For example, if we asked 100 people:

How crunchy do you like your peanut butter?

  1. Extra crunchy
  2. Medium crunchy
  3. Smooth

we might calculate a mean response of 2.73 and draw reasonable conclusions from it.  However if we asked:

What’s your favorite primary color?

  1. Red
  2. Green
  3. Blue

calculating a mean of 1.68 would be of little use.
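One way to see why is to renumber the options.  In the sketch below, the ten responses are made up purely for illustration; the point is that the mean changes when we shuffle the (arbitrary) numbering of the colors, while the counts per color do not.

```python
from collections import Counter

# Hypothetical answers to the color question, coded 1=Red, 2=Green, 3=Blue.
responses = [1, 1, 3, 2, 1, 3, 1, 2, 1, 1]

def mean(values):
    return sum(values) / len(values)

# The same answers under an equally valid coding: 1=Blue, 2=Red, 3=Green.
recode = {1: 2, 2: 3, 3: 1}
recoded = [recode[r] for r in responses]

mean_original = mean(responses)  # depends on the arbitrary numbering
mean_recoded = mean(recoded)     # same opinions, different "average"
counts = Counter(responses)      # votes per color, unaffected by numbering

print(mean_original, mean_recoded, counts)
```

For an ordered scale like the peanut butter question, the numbering is not arbitrary, so the mean carries real information.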

Variance and Standard Deviation

Suppose you were interested in measuring the spread of the data you collected for your RV.  One idea might be to calculate E(x – µ), or the expected difference from the mean.  A larger E(x – µ) would imply that the results are spread further from the mean value.  However, if you plug (x – µ) into the equation, a little algebra will show you that E(x – µ) is always zero.  It makes sense, since µ is the balance point of all possible values, weighted by their probabilities (consider the fulcrum above).
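For the curious, the “little algebra” can be written out as follows.  The shorthand p_i = Pr(X = x_i) is our own notation, and the sketch assumes a discrete RV with n possible outcomes.

```latex
\begin{align*}
E(X - \mu) &= \sum_{i=1}^{n} (x_i - \mu)\, p_i \\
           &= \sum_{i=1}^{n} x_i\, p_i \;-\; \mu \sum_{i=1}^{n} p_i \\
           &= \mu - \mu \cdot 1 \\
           &= 0
\end{align*}
```

The two facts used are that the probabilities sum to 1 and that the sum of x_i p_i is the definition of µ.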

Since we’re really only interested in the magnitude (rather than the sign) of the difference between the value and the mean, statisticians use the square of the difference, which is never negative.  Therefore, if g(x) = (x – µ)², we define the variance, Var(X) or σ², as:

$$\sigma^2 = \mathrm{Var}(X) = E\big((X - \mu)^2\big) = \sum_{i=1}^{n} (x_i - \mu)^2\,\Pr(X = x_i)$$

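Returning to our fair die example, the variance of one throw is:

$$\sigma^2 = \frac{(1 - 3.5)^2 + (2 - 3.5)^2 + \dots + (6 - 3.5)^2}{6} = \frac{17.5}{6} \approx 2.92$$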

Finally, since the variance’s units are the square of the original values, we define the standard deviation, σ, as the square root of the variance.  For the fair die:

$$\sigma = \sqrt{17.5 / 6} \approx 1.71$$
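The variance and standard deviation of the fair die can also be checked numerically.  This is a minimal sketch; the variable names are our own, and exact fractions are used so that no rounding creeps in before the final square root.

```python
import math
from fractions import Fraction

# PMF of a fair die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Mean: E(X), i.e. g(x) = x.
mu = sum(x * p for x, p in die_pmf.items())

# Variance: E((X - mu)^2), i.e. g(x) = (x - mu)^2.
variance = sum((x - mu) ** 2 * p for x, p in die_pmf.items())

# Standard deviation: square root of the variance (converted to a float).
sigma = math.sqrt(variance)

print(mu, variance, round(sigma, 3))
```

Swapping in a different PMF (a loaded die, say, or the answers to a poll question) requires no change to the three calculations.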

OK, that’s enough for today.  We’ve defined a lot of terms we’ll use throughout the coming months, as well as introduced several important functions.  Next, we’ll talk about correlation, dependence, comparisons of 2 RV’s, sample populations, and hypothesis testing, as well as some ideas of our own.  Hang in there.

Mixed Signals

Collecting public opinion data can be a complicated task, but not nearly as complicated as interpreting it.  A recent Pew survey found that support for sustained troop involvement in Afghanistan fell mostly along party lines (71% of Republicans, 37% of Democrats), with overall approval slipping over the last 3 months (57% to 50%).  With no clear trend indicated, these numbers make it difficult to form a substantial policy based on consensus.

In another poll conducted by Stanford University and the Robert Wood Johnson Foundation, participants were asked if they supported a federal ban on insurance companies denying health care coverage based on pre-existing conditions.  Support fell dramatically when the participants were told what such a ban would cost.

In both cases, pollsters failed to understand the depth of the problem by overlooking the various dimensions of these issues.  Results may depend on a participant’s affiliations or previous knowledge of the subject matter, but also on a myriad of other factors that can’t be explored in a 3-minute phone call or mall interview.  Statisticians call these interrelations coherence, the subtle interaction of several tiny factors painting a larger picture.  Breaking down issues into their intrinsic components is the only way to understand these seeming inconsistencies.

Open Source Polling Redux

It’s been a battle cry for some time now.  “Open Source Polling” is the solution to what’s ailing a seemingly flagging industry.  But as with most catchy cliches, a useful definition is elusive.  Writing in the aftermath of the 2004 elections, Mark Blumenthal comes close, but fails to capture the most important feature: openness.  Jeff Jarvis came closer, arguing that polling should be (among other things) efficient, unbiased, free, and, above all, transparent.  These are some of the core tenets of proloquor.net, tenets we’ll explore in the coming weeks.  Stay tuned.

Welcome!

Welcome to Prolog, proloquor.net’s web log.  Here you can learn more about proloquor.net, the art and science of opinion polling, and anything else that comes to our minds.