Saturday, August 22, 2009

The Mathematics of Hitting Streaks

I hope that someone other than my coauthors will actually be reading these posts once the college football season arrives (when hits to the old page understandably ramped up in past years). One of the upsides of transitioning to a blog is that it provides an easy way to point to other interesting work in the mathematics and statistics of sports.

A pair of papers about hitting streaks has appeared in the past year. What makes them particularly interesting is that they take completely different methodological approaches. Sam Arbesman and Steve Strogatz "examine Joe DiMaggio’s 56-game hitting streak and look at its likelihood, using a number of simple models. And it turns out that, contrary to many people’s expectations, an extreme streak, while unlikely in any given year, is not unlikely to have occurred about once within the history of baseball." Meanwhile, Trent McCotter uses permutation tests to find that there appear to have been significantly more 20-25 game streaks in real life than one would obtain from an independent-games model. You can hear Steve talk more about both studies in a Radiolab podcast from earlier this summer.
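For readers who want to play with the two ideas, here is a minimal Python sketch. It is not the code or the exact models from either paper. It assumes the simplest possible independent-games model, in which a hypothetical player gets a hit in each game with the same fixed probability `p_game`, and it shows a toy version of a permutation test that shuffles the order of a single game log. All function names and parameters here are my own illustrative choices.

```python
import random


def longest_streak(outcomes):
    """Return the length of the longest run of True values (games with a hit)."""
    best = current = 0
    for hit in outcomes:
        current = current + 1 if hit else 0
        best = max(best, current)
    return best


def streak_probability(p_game, n_games, target):
    """Exact probability of at least one streak of `target` or more games,
    assuming independent games with the same hit probability `p_game`
    (an illustrative assumption, not a fitted model). Requires target >= 1."""
    # state[k] = probability that the current streak is exactly k games long
    # and that no streak of length `target` has happened yet.
    state = [0.0] * target
    state[0] = 1.0
    reached = 0.0
    for _ in range(n_games):
        new_state = [0.0] * target
        for k, prob in enumerate(state):
            new_state[0] += prob * (1 - p_game)  # a hitless game resets the streak
            if k + 1 == target:
                reached += prob * p_game  # the streak hits the target length
            else:
                new_state[k + 1] += prob * p_game
        state = new_state
    return reached


def shuffled_longest_streaks(game_log, n_shuffles, seed=0):
    """Shuffle the order of a game log many times and record the longest
    streak in each shuffle. Shuffling keeps the number of games with a hit
    but destroys any game-to-game dependence."""
    rng = random.Random(seed)
    log = list(game_log)
    results = []
    for _ in range(n_shuffles):
        rng.shuffle(log)
        results.append(longest_streak(log))
    return results
```

To use it, one would compare `longest_streak(game_log)` for a real game log against the values returned by `shuffled_longest_streaks(game_log, 10000)`. If the real streak sits far out in the tail of the shuffled values, that hints at something beyond independent games. A test across many players and many streak lengths, as in the study described above, needs more care than this single-player toy.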

Finally, for perhaps the only timely element of this post, Steve has a new book just out this past week, The Calculus of Friendship: What a Teacher and a Student Learned about Life while Corresponding about Math. If it's like everything else Steve does, it will be amazing.

Addition (29Aug): For more discussion about hitting streaks, other streaks, and the way that people tend to overinterpret streaks, check out Leonard Mlodinow's interesting WSJ essay, "The Triumph of the Random."

Another addition (31Aug): Trent McCotter's second N&O column is about hitting streaks, with a decidedly local-to-NC flavor ("Zimmerman best in state at hitting streaks").


Tuesday, August 11, 2009

Random walking through baseball

Now that the new site format appears to be largely up and working, it's time to start digging into a backlog of math-in-sports topics I've wanted to briefly write about. That said, if anyone has a general solution for the seemingly infamous "Publishing your blog is taking longer than expected" problem occasionally afflicting those of us who ftp-publish to other servers, I would love to hear about it, please!

Today's links are all about baseball. No, not the recent Yankees 4-game sweep of the Red Sox (just typing that hurts). Instead, consistent with the title of this site, today is all about random walker rankings applied to baseball players. Well, sort of. Specifically, some of my collaborators and I recently wrote a paper (submitted for publication) studying the network of baseball players defined by the collection of pitcher-batter matchups from 1954 to 2008. Our focus so far is the study of this large network, and one of the (many) ways to try to understand a network is to study some process occurring on that network: enter the biased random walkers that can be used to define a ranking. Of course, the result is a very crude ranking. If one wanted to turn this into a more serious ranking of baseball players, numerous effects could and indeed should be included.
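To give a flavor of how biased random walkers can define a ranking, here is a minimal Python sketch. It is a toy and not the procedure from the paper. It assumes that every matchup can be reduced to a hypothetical (winner, loser) pair, and that a walker sitting at a player picks one of that player's matchups at random and ends up at the winner with probability `p` and at the loser with probability `1 - p`. The function name, the bias value, and the toy data are all illustrative assumptions.

```python
def random_walker_ranking(matchups, p=0.75, steps=200):
    """Return the long-run share of walkers at each player.

    matchups: list of (winner, loser) pairs (a hypothetical simplification).
    p: probability that a walker sides with the winner of a matchup (p > 0.5
       biases walkers toward winners).
    steps: number of update steps; assumes the network is connected so that
           the shares settle down.
    """
    players = sorted({name for pair in matchups for name in pair})

    # Collect the matchups each player was involved in.
    games = {name: [] for name in players}
    for winner, loser in matchups:
        games[winner].append((winner, loser))
        games[loser].append((winner, loser))

    # Start with walkers spread evenly over all players.
    share = {name: 1.0 / len(players) for name in players}
    for _ in range(steps):
        new_share = {name: 0.0 for name in players}
        for name, mass in share.items():
            # Walkers at this player split evenly across the player's matchups.
            per_game = mass / len(games[name])
            for winner, loser in games[name]:
                new_share[winner] += per_game * p
                new_share[loser] += per_game * (1 - p)
        share = new_share
    return share


# Toy data: A beat B and C, and B beat C.
toy = [("A", "B"), ("B", "C"), ("A", "C")]
ranking = random_walker_ranking(toy)
for name in sorted(ranking, key=ranking.get, reverse=True):
    print(name, round(ranking[name], 3))
```

Sorting players by their share of walkers gives the ranking. Even in this toy the crudeness is visible: nothing here accounts for how many times two players faced each other, what actually happened in each plate appearance, or the era in which they played.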

Brandon Keim picked up the story about our work for Wired Science, nicely including some thoughts (both ours and his) about the limitations of using this as a ranking. From there it got some nice attention and further helpful comments, some of which we'll draw on (and acknowledge) when clarifying things in an eventual revision. My coauthor, Mason Porter, has already collected most of the resulting links, including an interview he did with

A big thanks to Brandon for writing such a nice story about our work.

Don't worry, we'll start discussing and adding links to less narcissistic topics soon. Maybe.
