Wednesday, November 25, 2009

The NFL and How To Lie With Statistics

(In case you need a refresher on what happened, here's a summary.)

This blog post is a response to this article: Defending Belichick's Fourth Down Decision.

For the record, I disagree with that article and I think that it's a great example of how to mislead with statistics.

A number of points:

First, using probabilities collected from the league as a whole is a fallacy of false analogy. The false analogy is thinking that numbers crunched for every team that has attempted a 4th and 2 will apply to the Patriots on this particular play. It reduces the complexity of football to a mere throw of the die. The oversimplification is drastic and untenable: there is no way you can characterize a situation this complicated by looking at league-wide statistics. Yet people read the article and believe the logic behind it.

It also misleads about Belichick's decision process. In the end, only Belichick knows what went on in his mind. But to imply that Belichick crunched the numbers in his head and made this call based on them is a total red herring. The message was loud and clear, especially to Belichick's defense. Coach Belichick did not trust his defense. Period. Do you think he would have made the same decision if he had his defense from 2002?

Secondly, the numbers used are very dubious. To quote the article:

With 2:08 left and the Colts with only one timeout, a successful 4th-and-2 conversion wins the game for all practical purposes. A conversion on 4th-and-2 would be successful 60 percent of the time. Historically, in a situation with 2:00 left and needing a TD to either win or tie, teams get the TD 53 percent of the time from that field position.

There are a couple of things wrong with this.

  • The article qualifies the 53% with the fact that this was taken from the sample space of situations with 2 minutes left needing a TD to win or tie. But the 4th down statistic is taken without such a qualification. This already skews the numbers we need. This leads me to suspect that the 60 percent number was cherry picked to make the case stronger. Alas, I don't have any way to confirm this.

  • The 4th down probability of 60% is probably too high. First of all, it takes into account both runs and throws. Given that the Pats chose to throw, it is likely that their actual probability was lower. Add to that the fact that the Colts were playing at home with the crowd on their side. Add to that the fact that the Colts have a defense that's unbeaten at this point in the season. Then throw in the unquantifiable factors. The Colts defense took it personally as a "diss" by Belichick. And the Pats defense's morale will take a hit, not just for this game but for the rest of the season.

So, let's recalculate the probabilities using some revised numbers. The two key probabilities here are the probability of converting the 4th down and the probability of the Colts scoring from that field position.

Here's the table if you change the 4th down conversion probability (WP is the Pats' win probability):

60% -> 79% WP
50% -> 74% WP
40% -> 69% WP
30% -> 63% WP

And that is assuming that the Colts, at home, can only score 53% of the time from the 34 yard line with the game on the line.

So let's change that value, assuming that the 4th down conversion is at 50% (giving the Colts an advantage at home). The left column is now the probability of the Colts scoring.

53% -> 74% WP
60% -> 66% WP
70% -> 62% WP

So if we assume a coin flip probability for the 4th down conversion, and give Manning a 70% probability of scoring from that field position, then the probability of a Pats win is only 62%.

Since, as already noted, the numbers quoted in the article are league-wide, it's pretty safe to assume that the actual odds the Pats faced were worse than what is quoted in the article.

It's closer to a coin flip. It's certainly not the 79% winning probability that the article would like you to believe that the Pats had.
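To make the arithmetic easy to re-run, here is a minimal Python sketch. It assumes the simple model that reproduces the article's 79% figure: the Pats win if they convert, and otherwise win only if the Colts fail to score. The function name win_probability and its parameter names are mine, not the article's. The tables above may have been produced with a slightly different model or rounding, so not every row is guaranteed to match.

```python
def win_probability(p_convert, p_colts_score):
    """Pats win probability under an assumed two-outcome model.

    Assumption: a successful conversion wins the game outright, and a
    failed conversion is a win only if the Colts then fail to score.
    """
    return p_convert + (1.0 - p_convert) * (1.0 - p_colts_score)


if __name__ == "__main__":
    # Vary the 4th down conversion probability, Colts scoring fixed at 53%.
    for p in (0.60, 0.50, 0.40, 0.30):
        print("convert %.0f%% -> %.1f%% WP" % (p * 100, win_probability(p, 0.53) * 100))

    # Vary the Colts scoring probability, conversion fixed at 50%.
    for q in (0.53, 0.60, 0.70):
        print("Colts score %.0f%% -> %.1f%% WP" % (q * 100, win_probability(0.50, q) * 100))
```

Plugging in your own guesses for the two probabilities shows how sensitive the final number is to them, which is the whole point of the argument above.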

IMNSHO, it's almost 100% certain that Belichick screwed up this game and the rest of the season big time.

Wednesday, November 18, 2009

pythonic perl 5: Use scalar

To illustrate this point, let's look at the different ways one can get the length of a list in Perl.

Let us say we have an array/list @mylist.

  1. To get the length of @mylist, simply use @mylist in a scalar context. Remember that in Perl, there are 3 contexts: void, scalar and list. Using an array in a scalar context returns the number of elements in it.

  2. Sound easy? Well, yes and no. When reading through code in practice, it's quite painful to have to figure out which context something is being used in. And in fact, such errors will occur sooner or later. Which brings us to this technique. We use the scalar function to force the variable to be evaluated in a scalar context. And this I present to be the pythonic perl way to do it. The question you want to ask at this point is why bother typing extra characters if it's going to be evaluated in a scalar context anyway? There are a number of reasons. First, it clarifies in the programmer's mind that this is exactly what you want to do. Secondly, it makes the code a LOT easier to read 6 months later when you have to go in and fix a bug.

  3. The third way has nothing to do with whether to use scalar or not, and uses the fact that the following is always true: scalar(@mylist) == $#mylist + 1 . My personal preference is to use scalar over this, though some people might prefer it. It is indeed a good alternative to using scalar and is preferable to the first method.

To summarize, the following will return the length of @mylist:

$len1 = @mylist;
$len2 = scalar( @mylist );
$len3 = $#mylist + 1;

How to initialize a hash from a list in Perl

Given the following code:

@mylist = qw( little tech tips );

We want to construct a hash with the elements in @mylist as the keys. We can use the x (that's lowercase x) operator for this. Let's say we will initialize the values to cool.

The code will look like this:

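@myhash{ @mylist } = ("cool") x scalar(@mylist);

Two things to note. First, the parentheses around "cool" are required. They make x repeat a list, so every key gets its own "cool". Without them, x builds the single string "coolcoolcool", which is assigned to the first key only, and the remaining keys get undef. Second, the call to scalar is actually not necessary, since the right-hand operand of x is always evaluated in a scalar context. But doing it this way makes it clear and less error prone.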

Thursday, November 12, 2009

Simple way to do alternation with grep (command line)

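I know that the | character is supposed to be used for alternation, but I couldn't get it to work with command line grep. (By default grep uses basic regular expressions, where an unescaped | is a literal character; using | for alternation requires grep -E.) The easy way I've found to do alternation is to use the -e switch like so: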

grep -e someregex1 -e someregex2 file.txt

Using more than one -e on the command line implies alternation.

Tuesday, November 10, 2009

Open Source and Free on Windows XP

I have a brand new previously owned computer with a fresh install of Windows XP. (I bought it at my company's fire sale.) As I am a bit of a software geek, I love installing new software on my PCs.

Here's a list of open source and free (as in beer) software that I like to install on fresh Windows XP machines.

First up is the browser. Internet Explorer just does not hack it for me. There are a lot of security issues with version 6. Many of the defects have been addressed in versions 7 and 8, but by that point I had gotten used to the following little-known browser.

For the longest time, my browser of choice was K-Meleon, a lightweight browser that uses the Gecko rendering engine. This is the same rendering engine used by the folks at Mozilla. K-Meleon, though, is not as heavy (hence lightweight) and has a lot of features built in. That is, there's no need to screw around with plugins, although you can add Mozilla plugins with a little bit of work. K-Meleon has all the goodies that Mozilla does (tabs, plugins). There are a couple of things that I've grown to like about K-Meleon. First, the search box and address box have been integrated into one box. Secondly, on a right click, you can send a new tab to either the foreground or the background. This one is pretty much a K-Meleon-only feature that I haven't found anywhere else.

I still use K-Meleon occasionally, but it has mostly been replaced by Chrome.

What is good about Chrome? What's not to like? It has the basic features we've come to expect of browsers (tabs, built-in search, extensions, themes). The layout is great. It is fast. The built-in incognito mode is very useful. It's very, very stable. And best of all, it is built on open source code (the Chromium project). I'm hoping for a Linux port of Chrome to show up.

Up next: Open Source and Free Office/Productivity Software on Windows XP.

Wednesday, November 4, 2009

Python backticks

In Python, backticks are an operator that converts the enclosed expression into its string representation.

For example:

>>> print `2+3`
5
>>> print `[ 1,2,3 ]`
[1, 2, 3]

Note that I intentionally changed the formatting in the list example to show that the backtick operator evaluates the expression and prints a representation. The spacing after the comma in the output is different from how I had it formatted in the print statement.
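Backticks are Python 2 syntax only; they were removed in Python 3. They are shorthand for the built-in repr(), so a sketch of the same two examples that runs on either version could look like this (the variable names five and numbers are just for illustration):

```python
# repr() returns the same string that backticks produce in Python 2.
five = repr(2+3)
numbers = repr([ 1,2,3 ])

print(five)     # 5
print(numbers)  # [1, 2, 3]
```

As with the backtick version, the list comes back with normalized spacing because repr() builds the string from the evaluated object, not from the source text.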