
Wednesday, October 01, 2008

How To Be a Poll Junkie 101

Hi Everyone,


I thought I would get the ball rolling on a discussion about how to read all the polls that are coming out.  Here are some of the main issues:

1)  Some thoughts about sampling.  Polls are not a census of the entire electorate - obviously.  They are based on samples of the electorate that the pollsters assume are representative of the entire population.  Now you might say, "Yeah, I know.  Random samples."  But actually, polling techniques are much more complicated than simply taking a random sample of people.  They have to be, when you think about it.  For example, if you conduct a phone poll and draw your numbers from lists of land-line phones, you will almost certainly over-sample people over 50 and under-sample people under 40, because younger people are more likely to use only cell phones.  Since young people are more likely to vote for Obama, that could throw off the poll's results.  Similar problems crop up for door-to-door polls, computer-based polls, etc.  To compensate, polling companies adjust their samples to fit the proportions of each type of voter in the electorate.  That's also where the difference between "likely voter" and "registered voter" polls comes from: pollsters weight their results toward those respondents they believe are more likely to vote.
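
To make that weighting step concrete, here is a minimal sketch in Python. Everything in it is invented for illustration--the two age groups, the raw responses, and the population shares--and real pollsters weight across many more dimensions at once.

```python
# Toy demographic weighting: re-weight raw responses so the sample's age
# mix matches an assumed population. All numbers are invented.

# Raw responses: (age_group, candidate). A land-line sample like this one
# over-represents older respondents.
responses = (
    [("over_50", "McCain")] * 60 + [("over_50", "Obama")] * 40
    + [("under_40", "Obama")] * 15 + [("under_40", "McCain")] * 5
)

# Assumed true share of each age group in the electorate.
population_share = {"over_50": 0.45, "under_40": 0.55}

# Weight each respondent by (population share) / (sample share).
sample_counts = {}
for age, _ in responses:
    sample_counts[age] = sample_counts.get(age, 0) + 1
n = len(responses)
weights = {age: population_share[age] / (count / n)
           for age, count in sample_counts.items()}

# Weighted candidate totals.
totals = {}
for age, candidate in responses:
    totals[candidate] = totals.get(candidate, 0) + weights[age]
total_weight = sum(totals.values())
for candidate in sorted(totals):
    print(f"{candidate}: {100 * totals[candidate] / total_weight:.1f}%")
```

In this made-up sample the raw count has McCain ahead, but the weighted result puts Obama ahead--which is exactly why the weighting choices matter so much.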

2)  This brings up the question of what a "likely voter" is.  Most polling companies use models based on past voting patterns, and these models vary somewhat.  But the honest answer is that no one really knows for sure what a likely voter is this year.  Traditionally, young people and African Americans are much less likely to vote than older, white Americans.  But since Obama's most enthusiastic support comes from young people and African Americans, turnout among these groups could spike, which would cast doubt on most of the "likely voter" models.  In the past, Republicans typically did a little better in likely voter polls.  That's not consistently true this year, mainly because different polling companies are using different models to predict who is a likely voter and who isn't.
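
To show how a likely-voter screen feeds into the numbers, here is a toy sketch. The turnout probabilities are invented, and real models use screening questions and past-vote history rather than flat per-group probabilities.

```python
# Toy likely-voter model: weight each registered-voter respondent by an
# assumed probability that someone in their group actually votes.
# All probabilities and responses are invented.
turnout_model = {"over_50": 0.75, "under_40": 0.45}

respondents = [("over_50", "McCain"), ("over_50", "Obama"),
               ("under_40", "Obama"), ("under_40", "Obama")]

totals = {}
for group, candidate in respondents:
    totals[candidate] = totals.get(candidate, 0) + turnout_model[group]

total = sum(totals.values())
for candidate in sorted(totals):
    print(f"{candidate}: {100 * totals[candidate] / total:.1f}%")
```

Bump the under-40 turnout figure up to, say, 0.65 and the "likely voter" numbers shift noticeably toward Obama--which is the whole reason the models are suspect this year.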

3)  National tracking polls vs. state-by-state polls.  Given that the US President is elected in state-by-state elections, the national tracking polls really don't tell us much at all.  What the national polls can do is tell us a little about regional variations.  Also, one national poll is cheaper to run than 50 state polls, so national polls get done more frequently and can be good bellwether indicators of overall trends in support.
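
To see why the state numbers are what ultimately matter, here is a toy electoral-vote tally. The poll leads are invented; the electoral-vote counts are the actual 2008 figures for these four states.

```python
# Toy illustration: the election is decided by summing electoral votes
# state by state, not by the national popular-vote number.
state_polls = {
    # state: (electoral_votes, current poll leader) -- leaders invented
    "Ohio": (20, "Obama"),
    "Florida": (27, "McCain"),
    "Pennsylvania": (21, "Obama"),
    "Virginia": (13, "Obama"),
}

electoral_votes = {}
for state, (votes, leader) in state_polls.items():
    electoral_votes[leader] = electoral_votes.get(leader, 0) + votes

print(electoral_votes)  # {'Obama': 54, 'McCain': 27} for this toy map
```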

4)  Polling averages are all the rage this time.  There are lots of news sites out there that do variations on a "poll of polls."  Realclearpolitics.com is one; another is Fivethirtyeight.com.  Basically, the idea is to compensate for the fact that some polls may have bias problems or big standard errors that lead to wide variation in numbers from one poll to the next.  The poll-of-polls idea is to average a bunch of polls together to try to wash out some of that variation.  But if the polls being averaged are mostly biased in the same direction, then averaging them won't help that much.  Polling averages also give you a convenient way to see trends over time in a single number; otherwise you'd have to eyeball trends across many separate polls.
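
At its core a "poll of polls" is nothing fancier than this sketch. The poll names and numbers are invented; the real averages at Realclearpolitics.com and Fivethirtyeight.com also weight and age the individual polls.

```python
# Minimal "poll of polls": average several national polls so that the
# poll-to-poll noise partially washes out. Numbers are invented.
polls = {
    "Poll A": 49.0,  # candidate's share, in percent
    "Poll B": 51.0,
    "Poll C": 47.5,
    "Poll D": 50.5,
}

average = sum(polls.values()) / len(polls)
print(f"Poll-of-polls average: {average:.1f}%")  # 49.5% here

# Caveat from the post: if every poll leans the same direction, averaging
# cannot remove that shared bias -- it only smooths the random scatter.
```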

As with most things, you have to be a sophisticated consumer of the data.  You shouldn't read too much into any one poll.  Look at trends and groups of polls.  

There are several people who post regularly or comment regularly on this blog who are FAR better at math than I am.  I'd like to hear what they have to say about reading polls.

UPDATE:  Push polling.  Push polling is when you set up the response you want by asking questions in leading or provocative ways.  For example, you could ask a bunch of questions about whether there should be a Muslim President of the United States, then ask whether the respondent has heard that Barack Hussein Obama is a Muslim, then ask whether they'd vote for Obama or McCain.  Obviously the people who run these polls aren't looking for information.  They're looking to publish their results to generate buzz for their candidate, and maybe to sway a few votes among the respondents to their fake poll.  There are a lot of rumors in the blogosphere about Republicans doing push polls in Florida and Pennsylvania that suggest Obama gives money to the PLO.

13 comments:

Ilyas said...

why bother with complicated fact finding and number crunching when you can just do as the republican machine does:

make something up and keep repeating it until middle America believes it.

The Law Talking Guy said...

Thanks, RBR. I am concerned that the "poll of polls" gets mentioned as if it is more scientific or likely to reflect actual preferences than the individual polls that go into it. I think it is useful for identifying trends, but it is hard to talk about accuracy.

Dr. Strangelove said...

All things being equal, a "poll of polls" should reflect actual preferences better than any single poll. Anyone who has had to sit through any statistics course, however, has suffered through a long catalog of the many ways in which all things might not be equal.

As RbR nicely described, pollsters cannot "measure" the variable they wish to measure directly: (a) they have to select people for their sample, and the selection method can easily introduce bias; (b) they have to ask people their opinions, and the way in which they ask the question can easily introduce bias; and (c) even if they do everything else right, the people they ask might not be honest with them--or with themselves--about their true intentions in the voting booth.

So there is a whole science of polling to try to correct for these factors and eliminate systematic bias. You all know this, I know. Forgive me for belaboring the obvious here. But there are two basic principles of statistics that I always keep in mind when it comes to polls.

First, the Law of Large Numbers says the more measurements of a single random variable you average together, the more accurate your estimate of the mean becomes. It does not matter how that variable is distributed: more measurements get you better results. Second, the Central Limit Theorem goes a step further and says the more random variables you average together, the closer you get to the normal distribution. It does not matter how the individual variables are distributed: combining enough of them yields a bell curve. (The devil is in the details, of course, but this is the gist of it.)

So anyhow, my point in mentioning those two theorems is just that so long as the pollsters try in good faith to eliminate error independently (i.e. they do it in different ways, without colluding), it is perfectly reasonable to believe that averaging all the polls together will produce a more precise, more accurate result. How much better? I have no idea, really. The very basic rule of thumb says uncertainty shrinks as the square root of the number of measurements: e.g. averaging nine similar polls reduces the "plus-or-minus" error to one-third of the original size. But that is the very best case; I'm sure the actual improvement is much less impressive. Still, it should improve a bit.
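
If anyone wants to see the square-root rule in action, here is a little simulation sketch. The "true" support level, the poll size, and the trial count are all invented parameters.

```python
# Simulate the square-root rule: the scatter of an average of k unbiased
# polls shrinks roughly as 1/sqrt(k). All parameters are invented.
import random

TRUE_SUPPORT = 0.50  # assumed "real" level of support
POLL_SIZE = 1000     # respondents per poll

def one_poll():
    """One unbiased poll: fraction of respondents backing the candidate."""
    return sum(random.random() < TRUE_SUPPORT for _ in range(POLL_SIZE)) / POLL_SIZE

def spread_of_average(k, trials=500):
    """Standard deviation of the average of k independent polls."""
    averages = [sum(one_poll() for _ in range(k)) / k for _ in range(trials)]
    mean = sum(averages) / trials
    return (sum((a - mean) ** 2 for a in averages) / trials) ** 0.5

for k in (1, 4, 9):
    print(f"{k} poll(s): spread ~ {100 * spread_of_average(k):.2f} points")
# Expect roughly 1.6, 0.8, and 0.5 points: each step is a 1/sqrt(k) shrink.
```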

Well, looking back on this comment now, it seems like I am not really saying anything except spouting trivialities. Oh well. But it took me a while to write so I'll just submit it anyhow :-)

Dr. Strangelove said...

What I wonder about most is not averaging lots of polls taken on the same day, but rather the averaging of different polls--or even the same poll--across time. Do voters' opinions really change that much, I wonder, and that frequently? Whose opinions keep changing to give us these dramatic ups and downs?

I figure there is a small group of people--oh, let's say 20%--who really just do not have a strong opinion and who probably won't really even know who they are voting for until they are standing in the voting booth. In that case, these folks act more like the identical random variables statisticians love. Or perhaps they are quantum voters: they are neither going to vote for "Obama" nor "McCain" but rather exist in a superposition of "Obama" and "McCain" states, and they only pick one when you ask the question. So quantum mechanics means god not only plays dice--he does push polling :-)

Raised By Republicans said...

Yeah, what would Einstein say? God does play dice with the universe and he uses loaded dice.

I'm glad Dr S brought up push polling. I should have mentioned that. Maybe I'll update the original post with an example.

Anonymous said...

Dr. S, wouldn't the variation depend on who was being sampled? The poll isn't administered to the same people every time. The composition of the sample changes.

The Law Talking Guy said...

To me, it seems like the fallacy of the golden mean is at issue. Each polling agency believes its poll is the best. If I average the Rasmussen and Gallup tracking polls, am I not blindly discounting the likelihood that one of them accurately represents the opinions of the population?

Dr. Strangelove said...

USWest: the error bars are supposed to account for the variation in sample composition. If the variation exceeds the error bars significantly, the most likely explanation is that there has been a real change--something you would likely have seen no matter who you sampled. It just surprises me that people change their minds more than once during the campaign cycle--but then, I'm not an "independent" so I just don't get the mindset, I suppose.

LTG: It's all about rational expectations. If you know only the error bars of the polls, you may expect the average of several polls to be more accurate than you may expect any single poll to be. No other expectation would be rational. If you have additional beliefs about the accuracy and precision of some of the polls (e.g. from the history you believe a certain poll tends to be more accurate, or from the methodology you believe a certain poll is biased in certain direction) you may weight and adjust your average accordingly to reflect these beliefs (instead of just weighting the average based on error bars, as before).

But you should still incorporate all polls, because each one adds some information. Even if you think Gallup is better than Rasmussen, you should still expect that incorporating Rasmussen in your average--at least a little--will provide a more accurate result. And of course it may be that one poll is dead on the money, but you cannot know that--therefore you should not expect it.
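
"Weight by the error bars" has a standard recipe, by the way: an inverse-variance average, where each poll counts in proportion to one over its margin of error squared. Here is a sketch with invented numbers (Gallup and Rasmussen are real pollsters, but these figures are made up):

```python
# Inverse-variance weighted average: polls with smaller margins of error
# count for more. Poll figures are invented.
polls = [
    # (name, candidate share %, margin of error in points)
    ("Gallup",    50.0, 2.0),
    ("Rasmussen", 48.0, 3.0),
    ("Poll C",    51.0, 4.0),
]

weights = [1.0 / moe ** 2 for _, _, moe in polls]
average = sum(w * share for w, (_, share, _) in zip(weights, polls)) / sum(weights)
print(f"Weighted average: {average:.2f}%")

# The combined error bar is smaller than even the best single poll's:
combined_moe = (1.0 / sum(weights)) ** 0.5
print(f"Combined margin of error: ~{combined_moe:.2f} points")  # ~1.5 here
```

Notice that even the widest poll nudges the result a little, which is the point: every poll adds some information.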

Dr. Strangelove said...

A quick note on the preceding comment: if you believe Gallup underestimates Obama by 2 points, you need to correct that before you average. I was including that adjustment when I said you should "weight and adjust" your average based on your beliefs.
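
In code, "adjust, then average" is just a correction applied before the averaging. The 2-point Gallup correction here is the hypothetical from my comment above, not a claim about the real Gallup.

```python
# Correct a believed "house effect" before averaging. The 2-point Gallup
# adjustment is hypothetical, per the comment above; shares are invented.
polls = {"Gallup": 48.0, "Rasmussen": 50.0}
house_effect = {"Gallup": +2.0, "Rasmussen": 0.0}  # believed bias corrections

adjusted = {name: share + house_effect[name] for name, share in polls.items()}
print(sum(adjusted.values()) / len(adjusted))  # 50.0 here, not the raw 49.0
```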

The Law Talking Guy said...

If your beliefs are not based in statistical sampling, how does that aid accuracy?

The Law Talking Guy said...

Push polls should properly be called "rumormongering." Around LA I still meet Jewish people who are being told that Obama is not pro-Israel, or that he is a secret Muslim, or all that stuff. This is a dirty campaign by McCain to win Florida.

Dr. Strangelove said...

LTG writes: "If your beliefs are not based in statistical sampling, how does that aid accuracy?"

Let me take another run at it. I said that you may expect the average of several polls to be more accurate than you may expect any single poll to be. But it matters how you do the averaging. If you know nothing else, you should at least give heavier weights to polls that promise smaller error bars. (I don't know if RCP does this, but they really should. I can't find their methodology anywhere.)

All I'm saying is: if you know more about the polling methodology than just the error bars, you can include that knowledge in your weighting. Just be careful to avoid circular reasoning: if for example Gallup always puts Obama two points over your "poll of polls" average, that does not yet constitute additional knowledge. You cannot use the mere fact that Obama polls high in Gallup to say Gallup's polls deserve less weight or should be downwardly adjusted.

Dr. Strangelove said...

The more I look at fivethirtyeight.com, the more impressed I am. They do exactly what I wanted. Their average weights polls based on error bars as well as other knowledge: historical accuracy and a review of methodology, which together produce what they call "pollster-induced error." They even age the polls, giving more weight to the most recent ones.
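
The aging step is easy to sketch: discount each poll's weight by how old it is. The half-life and the poll numbers below are invented, and fivethirtyeight.com's actual formula is more elaborate (it also folds in sample size and pollster ratings).

```python
# Sketch of recency weighting: a poll's weight decays with its age.
# Half-life and poll figures are invented.
HALF_LIFE_DAYS = 7.0

polls = [
    # (candidate share %, age of poll in days)
    (48.0, 14),
    (49.0, 7),
    (51.0, 1),
]

weights = [0.5 ** (age / HALF_LIFE_DAYS) for _, age in polls]
average = sum(w * share for w, (share, _) in zip(weights, polls)) / sum(weights)
print(f"Recency-weighted average: {average:.2f}%")  # tilts toward the newest poll
```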

Move over, Real Clear Politics. I have a new favorite website!