Tuesday, October 28, 2008

If the polls are skewed...

There's been some concern about the accuracy of polls: are people telling the truth? Is there a skew towards Obama because people are afraid of appearing racist? Is there a skew towards McCain because polls exclude people without landlines (who tend to be younger and more Democratic)?

Well, I can't answer the "accuracy" question with any certainty. However, being the database geek that I am, I can crunch numbers.

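I took the poll data from RealClearPolitics and tossed it into an SQLite database; you can download the database itself here. Using the latest poll data, here's how the electoral votes fall if every state's polling margin is shifted by a fixed number of percentage points toward one candidate. "Undecided" counts the electoral votes of states that come out exactly tied after the shift, and "No winner" means neither candidate reaches the 270 electoral votes needed to win.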

Skew        Obama  McCain  Undecided  Result
McCain +6     263     276          0  McCain
McCain +5     263     215         61  No winner
McCain +4     324     182         33  Obama
McCain +3     357     182          0  Obama
McCain +2     357     182          0  Obama
McCain +1     357     182          0  Obama
No Skew       357     168         14  Obama
Obama +1      371     168          0  Obama
Obama +2      371     168          0  Obama
Obama +3      371     168          0  Obama
Obama +4      371     168          0  Obama
Obama +5      371     158         10  Obama
Obama +6      381     132         26  Obama
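The Result column isn't something the query computes; it follows from the electoral-vote counts. Here is a small Python sketch of that last step, assuming Result simply means "whichever candidate reaches 270 of the 538 electoral votes, otherwise No winner". The function name `result_label` is made up for illustration.

```python
# Hypothetical helper: 270 is a majority of the 538 electoral votes.
WIN_THRESHOLD = 270


def result_label(obama: int, mccain: int) -> str:
    """Name the candidate with at least 270 electoral votes, if any."""
    if obama >= WIN_THRESHOLD:
        return "Obama"
    if mccain >= WIN_THRESHOLD:
        return "McCain"
    return "No winner"


if __name__ == "__main__":
    # (skew, Obama, McCain) taken from three rows of the table above.
    for skew, obama, mccain in [("McCain +6", 263, 276),
                                ("McCain +5", 263, 215),
                                ("McCain +4", 324, 182)]:
        print(skew, result_label(obama, mccain))
```

The McCain +5 row is the interesting one: 61 electoral votes sit in exactly tied states, so neither 263 nor 215 clears the bar.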


In case you want to play with the database:
- ELECTORAL_VOTES maps each state to its number of electoral votes (columns: state, votes).
- POLLS lists every poll (columns: state, poll_date, poll_name, obama, mccain).
- LATEST_POLLS is a view containing the latest polls from POLLS; the query below expects it to carry each state's electoral votes in a votes column as well.
- SKEW is a table containing the integers from -6 to 6.
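For anyone who wants to recreate a database of the same shape from scratch, here is a minimal sketch of the four objects described above, using Python's built-in sqlite3 module. Only the table and column names come from the post; the column types, the ISO-8601 date strings, and especially the LATEST_POLLS definition (most recent poll per state, joined with ELECTORAL_VOTES) are assumptions, and the downloadable database may define them differently.

```python
import sqlite3

# Assumed schema; only the table/column names come from the post.
SCHEMA = """
CREATE TABLE electoral_votes (state TEXT PRIMARY KEY, votes INTEGER);
CREATE TABLE polls (state TEXT, poll_date TEXT, poll_name TEXT,
                    obama INTEGER, mccain INTEGER);
-- Hypothetical view: each state's most recent poll plus its electoral votes.
-- Two polls sharing a state's latest date would both be returned.
CREATE VIEW latest_polls AS
    SELECT p.state, p.poll_date, p.poll_name, p.obama, p.mccain, e.votes
    FROM polls AS p
    JOIN electoral_votes AS e ON e.state = p.state
    WHERE p.poll_date = (SELECT max(q.poll_date) FROM polls AS q
                         WHERE q.state = p.state);
CREATE TABLE skew (skew INTEGER);
"""


def build_schema(conn: sqlite3.Connection) -> None:
    """Create the tables and view, then fill SKEW with -6..6."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO skew VALUES (?)", [(i,) for i in range(-6, 7)])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    # Made-up numbers for a made-up state, just to exercise the view.
    conn.execute("INSERT INTO electoral_votes VALUES ('State A', 20)")
    conn.executemany(
        "INSERT INTO polls VALUES (?, ?, ?, ?, ?)",
        [("State A", "2008-10-20", "Poll 1", 48, 46),
         ("State A", "2008-10-26", "Poll 2", 49, 45)],
    )
    print(conn.execute(
        "SELECT state, poll_name, obama, mccain, votes FROM latest_polls"
    ).fetchall())
```

Storing dates as ISO-8601 text keeps `max(poll_date)` correct, because such strings sort in chronological order.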

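The query that produces the numbers in the table is below. A negative skew shifts every state's margin toward McCain and a positive one toward Obama, so a skew of -6 corresponds to the "McCain +6" row: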
select skew, sum(obama_votes) as obama, sum(mccain_votes) as mccain,
       ifnull(sum(undecided_votes), 0) as undecided
from (select skew,
             -- each comparison is 1 or 0, so a state's votes land in exactly one column
             (mccain - obama > skew) * votes as mccain_votes,
             (mccain - obama < skew) * votes as obama_votes,
             (mccain - obama = skew) * votes as undecided_votes
      from latest_polls
      cross join skew)
group by skew
order by skew;
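To check the query's logic without downloading anything, here is a minimal sketch that runs the same SQL through Python's sqlite3 module on three made-up states. For brevity it substitutes a plain `latest_polls` table with just the columns the query reads (state, obama, mccain, votes) for the real view; the state names, percentages, and vote counts are hypothetical.

```python
import sqlite3

# Hypothetical toy data: (state, obama %, mccain %, electoral votes).
TOY_LATEST_POLLS = [
    ("State A", 50, 45, 10),  # Obama +5
    ("State B", 47, 49, 5),   # McCain +2
    ("State C", 46, 46, 3),   # tied
]

SKEW_QUERY = """
select skew, sum(obama_votes), sum(mccain_votes), ifnull(sum(undecided_votes), 0)
from (select skew,
             (mccain - obama > skew) * votes as mccain_votes,
             (mccain - obama < skew) * votes as obama_votes,
             (mccain - obama = skew) * votes as undecided_votes
      from latest_polls
      cross join skew)
group by skew
order by skew
"""


def run_skew_query(rows):
    """Load rows into an in-memory database and return one tuple per skew."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for the real LATEST_POLLS view, holding only the columns used.
    conn.execute("create table latest_polls "
                 "(state text, obama integer, mccain integer, votes integer)")
    conn.executemany("insert into latest_polls values (?, ?, ?, ?)", rows)
    conn.execute("create table skew (skew integer)")
    conn.executemany("insert into skew values (?)", [(i,) for i in range(-6, 7)])
    result = conn.execute(SKEW_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for skew, obama, mccain, undecided in run_skew_query(TOY_LATEST_POLLS):
        print(skew, obama, mccain, undecided)
```

With this toy data the skew-0 row comes out as (0, 10, 5, 3): State A's 10 votes go to Obama, State B's 5 to McCain, and the tied State C's 3 are undecided. At skew -6 every state's margin exceeds the skew, so all 18 votes go to McCain.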
