Bell Curve The Law Talking Guy Raised by Republicans U.S. West
Well, he's kind of had it in for me ever since I accidentally ran over his dog. Actually, replace "accidentally" with "repeatedly," and replace "dog" with "son."

Thursday, October 23, 2008

Calling out 538 -- UPDATED

I love 538. I think they're great, I love their analysis, all the rest. But I have to call them out today for this piece knocking an IBD/TIPP poll. The poll says that 18-24 year-olds are going for McCain 74%-22%, which is clearly ridiculous and needs to be mocked. But the math he uses to do it is pretty bad. Let's go over his assumptions. (I'm hiding it behind the fold for mathophobes.)

Suppose that the true distribution of the 18-24 year old vote is a 15-point edge for Obama. This is a very conservative estimate; most pollsters show a gap of anywhere from 20-35 points among this age range.
If he's assuming a 15-point edge for Obama, I take that to mean that the probability an 18-24 year-old will vote Obama is 0.575 and the probability he will vote McCain is 0.425. It looks like he's taking out undecided/other.
About 9.3 percent of the electorate was between age 18-24 in 2004. Let's assume that the percentage is also 9.3 percent this year. Again, this is a highly conservative estimate. The IBD/TIPP poll has a sample size of 1,060 likely voters, which would imply that about 98 of those voters are in the 18-24 age range.
This seems fair.
What are the odds, given the parameters above, that a random sampling of 98 voters aged 18-24 would distribute themselves 74% to McCain and 22% to Obama?

Using a binomial distribution, the odds are 54,604,929,633-to-1 against. That is, about 55 billion to one.
This is where I object.

First of all, a binomial distribution can't give an outcome where you have 74% to McCain and 22% to Obama, because .74+.22=.96, not 1. You'd need to use a multinomial distribution, and try to figure out somehow what the fraction of "other" voters are in your sample.

So let's do it with a binomial distribution and try to figure out the probability of 74% McCain, 26% Obama, i.e. 73 McCain voters and 25 Obama voters. This comes out to 10 billion to 1. Not the number Nate gets, but still awfully long odds.
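That exact-outcome figure is easy to check. A minimal sketch in Python, using the same assumptions as above (98 young voters, P(McCain) = 0.425):

```python
from math import comb

# Probability of exactly 73 McCain voters out of 98, with
# P(McCain) = 0.425 for each voter and Obama getting the rest.
n, k, p = 98, 73, 0.425
pmf = comb(n, k) * p**k * (1 - p)**(n - k)

print(f"P(X = {k}) = {pmf:.3e}")           # on the order of 1e-10
print(f"odds against: {1/pmf:,.0f} to 1")  # roughly 10 billion to 1
```

This is the single-outcome probability, which is exactly the quantity the next paragraph argues is the wrong thing to compute.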

Okay, but I'm not done objecting. Why? Because the probability of any one outcome is pretty low. As an example, what is the probability that a random person is exactly 183.4 cm tall? Probably pretty low, even though a lot of people are approximately that height. A better probability to compute would be the probability that someone is between 180cm and 185cm tall.

So, a better number to give to show how ridiculous the poll is would be the odds that McCain gets 70% or more of the vote. This is best computed using the central limit theorem and (if my math is right) says that the odds McCain would get 70% or more of the vote in any given poll are about 150 to 1.
Update: Okay, this is wrong. I just plain computed the wrong probability here. Thanks to several commenters who pointed this out. Let's just do it directly since I have Mathematica handy.

Let X be the number of McCain voters in the sample. Then X is a binomial random variable with n=98 and p=0.425. We want to compute P(X >= 69), which equals

P(X >= 69) = sum from k=69 to 98 of C(98,k) * (0.425)^k * (0.575)^(98-k)

Plugging this into Mathematica gives a probability of 2.12127x10^-8, or odds of about 45.8 million to 1.

In the comments, Bob suggests not spotting the pollster 70%, but just starting at 74%. 74% of 98 is 72.5, so let's compute the probability of at least 72 McCain voters. Again using Mathematica, this gives a probability of 5.09459x10^-10, or odds of about 1.96 billion to 1.
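Both Mathematica numbers can be double-checked with an exact tail sum. A sketch in Python, again assuming 98 young voters and P(McCain) = 0.425:

```python
from math import comb

def binom_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

n, p = 98, 0.425

tail_70 = binom_tail(n, p, 69)  # 70% of 98 rounds up to 69 voters
tail_74 = binom_tail(n, p, 72)  # 74% of 98 is 72.5, so at least 72

print(f"P(X >= 69) = {tail_70:.5e}")  # ~2.1e-8, about 46 million to 1
print(f"P(X >= 72) = {tail_74:.5e}")  # ~5.1e-10, about 2 billion to 1
```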

Further, anonymous suggests that Nate not assume that there are 98 18-24 year-olds in the sample, but that he should take into account all the possibilities. I've wasted enough company resources on this problem already, so I'll take his word for it on his math.

In any case, my original two points stand: you can't use a binomial distribution to compute the probability Nate said he did, and computing the probability of getting exactly the result the pollster did is misleading.

It's clear that this poll is an outlier. But don't tell me this was a 55 billion to 1 shot. By the corrected numbers above, something like 46 million to 1 -- or about 2 billion to 1 if you start from the pollster's own 74% -- is more like it.


Raised By Republicans said...

Great analysis Bell Curve!!!

I'm reminded of a great joke. A physicist of some sort is giving a lecture on the Sun. The lecture goes on for an hour or so and among other things he mentions that the Sun will grow into a red giant and consume the Earth in about 4 billion years.

After the lecture, a guy in the back raises his hand and asks, "When did you say the Earth would be destroyed?"
"4 billion years."
"Oh! Phew! For a second there I thought you said 3 billion years."

The Law Talking Guy said...

You should email Nate Silver so he fixes this. Clearly his brain was somewhat flummoxed by the obviously ridiculous polling result. I have the feeling someone at IBD/TIPP just got those figures backwards.

Dr. Strangelove said...

Bell Curve: yes, e-mail this to Nate Silver. This is a blot on the otherwise excellent statistical record of their site. You would be doing them a service.

Anonymous said...

Your math is definitely wrong. The skinny: Use the standard breakdown of probability (the law of total probability), P(A) = SUM P(A|B[i])*P(B[i]), where A is an event and the B[i] partition the sample space. Here, we have

P(D,N such that D<.3*N) =
SUM P(D<.3*N|N) * P(N)

where the sum is over possible values of N. So this just says that the probability that the D (Dem) vote is less than 30% of the youth vote (N) is just the following -- the probability that the Dem vote is less than 30%, GIVEN N youth votes, times the probability of N youth votes -- summed over all possible values of N.

Okay, so I have no desire to do all the math for this, nor to verify any given approximation, so I'm just going to bound the probability. On average, N=98, so I'm going to look at N ranging from 10 to 180. (This is really not restrictive, because the probability of observing N=10 or 180 is less than floating point accuracy, and anything lower or higher will obviously be LESS probable. It zeros whatever it multiplies.) For each N, I calculate the probability of D=.3*N (to be precise, the probability of the floor of that).

Okay, so I now have P(D=.3*N|N) and P(N) for our given poll size of 1060, for N=10,11,...,180. Now we want to know what P(D<=.3*N|N) is. (That is, we want to add in the smaller observations -- we want the probability that Obama gets 30% OR LESS, as you discussed.) I don't want to deal with cdf's for binomials (approximations for binomial pdf's are troublesome enough!), so I'm going to bound it as follows: Notice that, since we're already WELL below the expected value, and since the binomial is unimodal, P(D=.3*N-1|N) < P(D=.3*N|N). That is, anything less than 30% is even less probable. So I'm just going to bound P(D<=.3*N|N) with .3*N*P(D=.3*N|N). That's likely to be WAY bigger than the actual number, but no big whoop. So I multiply that number by P(N) for each N and add 'em up. Roughly, a 1060-person poll where McCain gets 70% or more is a 1 in a million shot. (To be precise, 1 in 961,350 shot.)
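For anyone who wants the exact number rather than the bound, here is a sketch of the full two-stage mixture in Python, assuming P(young voter) = 0.093 and P(Obama | young) = 0.575 as in the post; the loop range 10..180 follows the argument above that everything outside it is negligible:

```python
from math import comb

def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(n, k, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(binom_pmf(n, j, p) for j in range(k + 1))

POLL, P_YOUNG, P_DEM = 1060, 0.093, 0.575

# P(Obama gets <= 30% of the young subsample), mixing over the
# random number N of young voters instead of fixing N = 98.
total = sum(
    binom_pmf(POLL, n, P_YOUNG) * binom_cdf(n, int(0.3 * n), P_DEM)
    for n in range(10, 181)
)

print(f"P(Dem share <= 30% of young voters) = {total:.3e}")
```

Consistent with the comment, the exact mixture comes out smaller than the 1-in-961,350 upper bound.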

And remember, that's just a very generous bound. It's less likely than that.

Dr. Strangelove said...

Here is a rule of thumb for a different (but related) question that I find useful. (I hope I am typing this right.) For most competitive polls, if you survey N people, a good estimate for the purely random standard deviation ("sigma") due to your sampling error is 0.5/sqrt(N). For example, if you survey 100 people, a quick estimate for sigma would be 0.05, or 5%.

For most questions you want to ask about the poll, you usually just need this table:
0.5 sigma = 31%
1 sigma = 16%
1.5 sigma = 7.5%
2 sigma = 2.3%
3 sigma = 0.14%
4 sigma = 0.003%

Here's how you use it. Suppose you poll 100 people and your candidate scores well at 60%, but you already suspect a 3 point bias in favor of your candidate. After incorporating the bias, how likely is it that your candidate's true level of support is actually 50% or less?

(a) N = 100 gives you sigma = 5%
(b) 60% - 50% - 3% bias = 7% required random shift
(c) 7% shift is about 1.5 sigma
(d) 1.5 sigma means a chance of 7.5%

If you had surveyed 1000 people, however, the sigma would have been much smaller (1.6%), so the 7% required random shift would have represented an underlying result more than 4 sigma from the mean, or less than 0.003%.

This is crude, cuts corners, and ignores lots of stuff, but it is handy for back-of-the-envelope calculations.

Incidentally, the plus or minus quoted by polls usually represents a "95% confidence" level, or a spread of 2 sigma in either direction. So for a typical survey of 1000 people, sigma is about 1.6% so a fair estimate of the plus or minus would be about 3.2%. And that is usually about right.
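Dr. Strangelove's rule of thumb is easy to mechanize. A sketch, with his 100-person worked example and the typical 1000-person margin of error:

```python
from math import sqrt, erfc

def quick_sigma(n):
    """Rule of thumb from the comment: sigma ~ 0.5 / sqrt(N)."""
    return 0.5 / sqrt(n)

def one_sided_tail(z):
    """P(Z >= z) for a standard normal, for comparison with the table."""
    return 0.5 * erfc(z / sqrt(2))

# Worked example: 100 people, candidate at 60%, suspected 3-point bias.
sigma = quick_sigma(100)              # 0.05, i.e. 5%
shift = (0.60 - 0.50 - 0.03) / sigma  # 1.4 sigma (the comment rounds to 1.5)

# Typical 1000-person poll: +/- 2 sigma is about 3.2%.
moe = 2 * quick_sigma(1000)

print(f"sigma(100) = {sigma:.3f}, shift = {shift:.1f} sigma")
print(f"P(Z >= 2 sigma) = {one_sided_tail(2):.1%}")   # ~2.3%, matching the table
print(f"margin of error for N=1000: +/- {moe:.1%}")
```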

Dr. Strangelove said...

Anonymous, if that's your real name... :-)

I believe Bell Curve was talking about an estimated 98-person sub-sample in which McCain appeared to have 70% support, where 42.5% would have been expected (not the full 1,060-person sample). However, reversing the question I discussed in my previous comment, this still represents about a six sigma shift from reality, which puts it in the one-in-a-million likelihood range. But my "estimate" is very crude here and I did not repeat Bell Curve's math, or yours, so I am not sure if this is relevant. But I'm a math geek, so I had to chime in...

bell curve said...

Anonymous --

a) Why are you summing over N?
b) How are you getting P(N)?
c) Dr. S is right that I don't care about a 1060-person poll being 70% or more McCain ... which looks like it's what you're calculating.

Dr. S -- you don't want to use the CLT? I never trust these back of the envelope calculations.

The Law Talking Guy said...

I don't have the foggiest idea what it means, but I am going to work the phrase "six sigma shift from reality" into every conversation from here on out.

The Law Talking Guy said...

Oh I see, we're talking standard deviations. Nifty. I like that phrase, though.

Anonymous said...

I'm summing over N (which is the total number of young likely voters surveyed, not the total sample size of 1060) because it's another source of variance. There are two sources of variance to consider. First, the total number of young voters in the sample. Second, the number of those voters who choose Obama (or McCain -- obviously one could do the math either way). So I'm trying to take both into account. It's not a calculation of McCain getting more than 70% in the entire sample, it's a calculation of the probability that McCain gets more than 70% of the young voters in a 1060-person sample (without assuming that the number of young voters exactly equals the mean of 98).

N is also binomial, so P(N) is easy to get. Just think of this as a two-stage process, with both stages being binomial. Call a voter, and the first random variable is Bernoulli -- young or not. Conditional on being young, the next Bernoulli r.v. to consider is Obama/McCain.

If I assume a 98-person subsample of likely young voters, I can apply the same logic and bound the probability as less likely than 1 in 2.25 mil (approximately). Just take P(D=floor(.3*98)), given the parameters we're using, and multiply that by floor(.3*98).

Apologies for using the anon tag, but I don't have a blog or anything like that, and I don't feel like opening a google acct (just an annoyance, and the Phillies are on -- or off, I guess). I was going to reply at 538, but they require that, so I just thought I'd point it out here.

Dr. Strangelove said...

Anon: no need to apologize at all for the anon tag. I did not realize you were attempting to account for the variance in the number of young voters sampled. It seems we are getting numbers in the same ballpark, on the order of one in a million.

Bell Curve: I approximated the end result with a normal distribution, which I guess I thought pretty much was using the CLT... But this is not precisely my area of expertise: I'm just a back-of-the-envelope sort of guy. Maybe you could show your calculation, at least in a little more detail?

Bob said...

I think Bell Curve should stop for a second before emailing Nate Silver.

Firstly, there's a statistical principle here. Bell Curve is completely right that the likelihood of any particular outcome is tiny, and not a good way to measure whether the outcome was "likely" or whether the poll is "suspect". But the usual statistical comparison is what is the probability of getting the actual (sample) result or further from the mean. The right thing to look for, then, is the probability that McCain got 74% or more of the sample.

Saying "McCain's result was 74%, so let's look at the probability that the poll (sample) would come up with _70%_ or greater" is unfairly helping your own argument, since you're throwing in the chunk from 70% to 74%. So that part of the argument needs to be refined in any case.

Here's what I looked at. There's bound to be a few flaws in it, but I think it suggests Nate's numbers are closer to the mark.

First, there's the chi-squared goodness-of-fit test, as described here and here. The chi^2=41.04 for a 98-person sample producing an outcome of 73 McCain, 25 Obama, when the population is 41.25% McCain, 56.35% Obama. That's a big chi^2 value; according to the distribution calculator I just downloaded, that corresponds to a level of significance (for a one-sided tail) of about p=2.18*10^(-11). That's pretty close to Nate's odds (which give a probability of 1.83*10^(-11).) Since his numbers are so specific, and I haven't hesitated to round off, I'm thinking Nate has some exact binomial distribution data to get the precise value, and it looks to me like he's counting the tail, not just the chance of that precise outcome.

But far be it from me to just throw yet another approach out there and not produce some apples to compare to your apples. With a name like Bell Curve, you can't fault the guy for going to the Central Limit Theorem. :)

Okay, the binomial random variable Y is the number of young polled people who said they favored McCain, which has n=98 and we posit p=.425. So the expected value is np=41.65 and the variance is np(1-p)=23.94875.

The Central Limit Theorem says that Z=(Y-41.65)/sqrt(23.94875) is approximately N(0,1). The actual result was 73 for Y, or 6.406135493 for Z. The probability of Z being greater than or equal to 6.406135493 is 7.4627*10^(-11), which again is more in the ballpark of Nate's result than 1 in 150.
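Bob's normal tail can be reproduced with the exact standard-normal survival function (math.erfc). A sketch using the same n, p, and observed count:

```python
from math import sqrt, erfc

n, p = 98, 0.425
mean = n * p              # 41.65
var = n * p * (1 - p)     # 23.94875

z = (73 - mean) / sqrt(var)     # ~6.406
tail = 0.5 * erfc(z / sqrt(2))  # P(Z >= z) for a standard normal

print(f"z = {z:.4f}, P(Z >= z) = {tail:.4e}")  # ~7.5e-11
```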

Also, referring to Dr. S's approximations, the sigma for this case was sqrt(23.949)=4.89, pretty close to his 5% estimate, and the jump of 31.35% was 6.4 sigmas, close to his estimate of 6 sigmas or so. Which don't appear on his table, because 6 sigmas means really really unlikely. (Other than fudging p(1-p) in the sigma, which is a low-error fudge, Dr. S _is_ using the CLT, albeit with some roundoff.)

I don't have the facility to replicate Bell Curve's computation, directly from the binomial distribution, of the probability of getting exactly 73 McCain voters out of 98, so I couldn't say why that number came out at lower odds (=higher probability) than these tests indicate the probability of getting greater than or equal to 73 McCain voters out of 98 should be.

But I'm very suspicious of the 150 to 1 odds Bell Curve ends up with, for the reasons given above.

LTG, it may uninterest you to know that "six sigma" is a business management strategy, (some might say "fad"), that involves co-opting statistical methods for quality management of business processes, or something equally buzzwordy.

Dr. Strangelove said...

Bob: Thanks for reading through all this. And yes, I deliberately fudged the factor of sqrt(p(1-p)) because if your candidate is polling anywhere in the 30%-70% range, that numerical factor is within the range of 0.46 to 0.5, so I figured--what the heck--dividing by two was good enough. (And if p is outside that range, one of the candidates is just toast so there's no need to calculate.)

For the record, Bob is correct that I was too generous: the probability of a six sigma deviation or greater (one-sided) is in fact about one in a billion rather than one in a million.

The Law Talking Guy said...

I just want to say that this is a marvelous conversation and I am enjoying as much of it as I can understand. I'm trying not to feel dumb. If I feel too dumb, I'll start some thread about something that I think I know more about than you all. This may take a while, but I'll figure it out. Slavonic etymology?

On a more serious note, I fear that the level of sophistication displayed in this thread far exceeds anything you will find at most polling operations. This tells you something about the quality of that data we are getting. So I'm starting to think that "polling averages" may be more useful than I had previously thought in trying to gain some insight from a set of flawed (but in a sense randomly flawed) observations and calculations.

jdk said...

My objection to Poblano's math is more fundamental. He shouldn't even be using probabilities at all.
Of course, I also smugly note that I pointed this "problem" out days before he did in my post to his tracking poll primer commentary.

The better approach is to use a p-chart a la Shewhart and Deming, to answer the question: Does it make sense to look for a special cause (typo, fabrication, methodological problem, whatever) for a particular result? If yes, look for the cause; if no, the data is what it is.

Western Electric Rules may also be applied, especially with tracking polls since they form a time series.

Control limits for p-chart calculated as follows:

UCL = p-bar + 3* sqrt(p-bar*(1-p-bar)/N)
LCL = p-bar - 3* sqrt(p-bar*(1-p-bar)/N)

p-bar is the p for ALL samples of the same cohort (18-24). No assumptions about p, just calculate it. The problem, as I had noted before Nate's post, is that it doesn't appear that anyone breaks down the age cohort this way. It seems to always be 18-29.

N is the sample size in question.
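As a sketch of how these limits come out here, assume (hypothetically -- a real p-bar would be pooled across many samples of the same cohort) p-bar = 0.575 for Obama among 18-24 year-olds, the figure used earlier in the thread, with N = 98:

```python
from math import sqrt

def p_chart_limits(p_bar, n):
    """Three-sigma control limits for a proportion, per the formulas above."""
    half_width = 3 * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - half_width, p_bar + half_width

# Hypothetical p-bar: Obama at 0.575 among 18-24s, as assumed in the post.
lcl, ucl = p_chart_limits(0.575, 98)
observed = 0.22  # IBD/TIPP's reported Obama share among 18-24s

print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}, observed = {observed}")
# The observation falls far below the LCL: look for a special cause.
```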

There is a certain amount of cooking that goes on in ALL political polls, because there is no standard way that weighting or screens are used. But the control chart is a very economical way to ferret out when one should look for something or not.

Bell Curve said...

I've updated the post since my math was definitely flawed. Thanks to all the commenters for your great work!

Dr. Strangelove said...

jdk: I am not familiar with a "p chart" or "Western Electric Rules," but I notice that your upper/lower control limits are equivalent to +3/-3 sigma from the mean, using the normal distribution approximation that Bob and I discussed. And for any reasonable value of p (0.3 to 0.7) the limits will be pretty much the same. What is your conclusion here?

Bell Curve: nice work, and thanks for writing the binomial distribution sum explicitly.

A note to anyone else who has made it this far: 538 differs from sites like Pollster, electoral-vote, and RCP in that the other sites use weighted averages to estimate the current mood, while 538 attempts to project the final outcome on election day based on additional assumptions including trend-line regression analysis, demographic data, and historical data from the 2004 election. You can see the "polling average" in the 538 tables, as well as the "projection." I believe Bell Curve is reporting the "projection" line (the one in bold) but to compare 538 with the others, it might be more appropriate to use the "Polling Average" line instead.

jdk said...

If you are not familiar with the p-chart, a control chart, the red bead experiment, Western Electric Rules, Deming, Shewhart, or one of my favorites, the Moving Range/Individual X chart, I think in this day and age "google" and wikipedia seem to be the answer.

Rather than talk about it in probabilistic terms, statistical process control techniques focus on an understanding of variation (what polls are about) through distinguishing between assignable causes (some thing in particular) versus common causes (systemic sources) of variation. Secondarily, the control chart is a test of statistical "stability". Only where there is statistical stability can you actually use sampling to make predictions.

If one alleges that a poll is crap, one is saying that there is an assignable cause of variation. Or alternatively, if there is no statistical stability produced by a system (whatever that system might be), then you cannot really predict anything.


Dr. Strangelove said...

I took your advice and checked Wikipedia.

Regarding "p-charts," Wikipedia notes that, "Control limits for the p-chart are calculated on the basis of the binomial distribution and an approximation based on the central limit theorem."

Regarding "Western Electric Rules," Wikipedia describes them as decision rules for detecting non-randomness in output. Basically, you divide the output into bins or "zones" based on the expected standard deviation and then watch how those bins fill up, compared to how you would expect them to be filled by purely random, normally distributed output. (Excessive "runs" in a single bin, or quantities outside the control limits altogether, are signs of a problem.)

It appears to me that terminology is the only substantive distinction between the "probabilistic" methods we have been discussing and the "statistical process control techniques" you advocate. You are treating a poll as a process rather than a sample, but the analysis is based on the same principles.

jdk said...

"It appears to me that terminology is the only substantive distinction between the "probabilistic" methods we have been discussing and the "statistical process control techniques" you advocate. You are treating a poll as a process rather than a sample, but the analysis is based on the same principles."

No, the analysis is really not based upon the same principles. The computation is the same (in this particular instance) but the analysis is very different.

The difference might be whether a poll is an enumerative study or an analytical study.

No time to develop the philosophical distinction, there is an election going on.
