Tuesday, December 4, 2007

Where to Go for Your Advanced Basketball Stats

First off, here's a link to an epic post I slapped up on SportsHub LA about all the games during USC/UCLA weekend, which even got linked on TrueHoop with a glaring grammatical error right there in the quoted paragraph. Hey, it's not like I'm an English major or anything.

Today's Berri/Bynum/Kobe debacle made me finally get around to putting up a kind of "Beginner's Guide To Advanced Basketball Stats," which I've wanted to do for a while. I'm a bit of a stats junkie: when I read "Moneyball" as a kid, a light kind of went off in my head, and I've been a die-hard stats guy ever since. I'm a huge proponent of knowing everything possible about a topic before forming an opinion, and a big reason I don't write about baseball is that guys like Bill James have nearly reduced the game to a science. I don't think there's more informative or insightful work on baseball than what the guys at Baseball Prospectus are doing, and I don't have anywhere near the resources to keep up with them or tell people anything Prospectus can't. I could say "The Yankees should think about the fact that horrible, horrible things often happen to teams that put a lot of stock into young pitching," and mention the A's and Cubs, but BP could easily break out an analysis of age, usage rates, and everything else and have a definite answer where I could only offer a few anecdotes. Because of the team nature of basketball, it's still more art than science, which is why it's much more fun for me to write about.

That being said, there's been a movement to provide Moneyball-type basketball statistics in the last few years, and of course I've been keeping up. Here are my favorite advanced-stats websites:

1. John Hollinger

Synopsis: ESPN.com's resident basketball statistician and thus probably the best-known one of all, which is fortunate, because he's probably better than anyone else at navigating the uneasy divide between basketball science and basketball art.

Signature Statistic: PER (Player Efficiency Rating), a stat that takes all of a player's points, shooting percentage, rebounds, assists, etc. on a per-minute basis and puts them into one individual statistic.

Uniting Theory: Players and teams should not be measured by the gross of what they do over a game or a season, because those numbers are skewed by usage, pace, league conditions, and minutes played, but by what they do on a per-possession basis.

Useful Statistics: True Shooting % (takes free throws and 3-pointers into account and gives a single shooting percentage based on all of them, which is hugely valuable when comparing guys like Shaq and Steve Nash), "pace" factor (how many possessions a team uses in a given game), usage rate (how many possessions a player uses in a given game), assist ratio (what percentage of a player's possessions end in assists), rebound ratio (what percentage of available rebounds a player pulls down).
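For the curious, here's a rough sketch of how True Shooting % is usually computed. Hollinger's exact formula isn't quoted in this post, so treat this as an assumption: it uses the commonly cited version, points divided by twice the sum of field goal attempts and 0.44 times free throw attempts. The function name and the two stat lines are made up for illustration.

```python
def true_shooting_pct(points, fga, fta):
    """True Shooting % via the commonly cited formula (assumed here):
    PTS / (2 * (FGA + 0.44 * FTA)).
    The 0.44 weight estimates how many free throw attempts
    use up one shooting possession."""
    attempts = fga + 0.44 * fta
    if attempts == 0:
        return 0.0  # no shots of any kind, nothing to measure
    return points / (2 * attempts)


# Hypothetical stat lines: a big man who lives at the line
# and a guard who scores mostly on jumpers.
big_man = true_shooting_pct(points=20, fga=12, fta=10)
guard = true_shooting_pct(points=20, fga=14, fta=4)

print(f"big man TS%: {big_man:.3f}")
print(f"guard TS%:   {guard:.3f}")
```

Both guys scored 20, but the formula charges the big man for all those trips to the line, which plain field goal percentage would ignore.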

Sample John Hollinger Column: A great piece about how the point guards that age well are the ones that shoot well, have good size, and pass well, while the ones that do only one or none of those things will fall off rapidly after 30.

Pretentiousness Factor: Low to Moderate. Hollinger believes in his statistics, but knows they don't exist in a vacuum; in his player previews, he includes a more conventional paragraph explaining what about that player may have caused his numbers to fail to describe him, such as that player's defense, how young players can mess up his system, how he'll be playing a different role this year, how a trade may have affected his team, etc. However, if you argue with one of his findings, like that a team's true quality is better calculated by their average scoring margin than their actual wins and losses, prepare to feel his wrath.

2. 82games.com

Synopsis: Essentially a no-frills pile of stats compiled by a small army of "game charters" who watch every game and record things that aren't reflected in a box score.

Signature Statistic: +/-, which was started on 82games.com and subsequently grabbed by the NBA and now exists as the Lenovo Statistic. Their catch-all statistic is the "Roland Rating," which puts +/- along with the player's PER and his defensive counterpart's PER to make an overall rating.
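In case +/- is new to you, here's a minimal sketch of the basic idea as it's generally understood: a player gets credited with his team's scoring margin for every stretch he's on the floor. This is not 82games.com's actual code or data format, and it is not the Roland Rating; the stint layout, the function name, and the player labels are all hypothetical.

```python
from collections import defaultdict


def plus_minus(stints):
    """Sum each player's raw +/- over a list of stints.

    Each stint is (players_on_court, team_points, opponent_points),
    a made-up layout used only for this illustration.
    """
    totals = defaultdict(int)
    for players, team_pts, opp_pts in stints:
        margin = team_pts - opp_pts
        for player in players:
            totals[player] += margin  # everyone on the floor shares the margin
    return dict(totals)


# Three hypothetical stints for one team.
stints = [
    ({"A", "B", "C", "D", "E"}, 12, 8),   # +4 with E on the floor
    ({"A", "B", "C", "D", "F"}, 5, 9),    # -4 after F subs in for E
    ({"B", "C", "D", "E", "F"}, 10, 10),  # even with A on the bench
]

result = plus_minus(stints)
print(result["E"], result["F"], result["A"])
```

Notice that A ends up at zero just by being out there for one good stint and one bad one, which is why raw +/- says as much about your teammates as it does about you.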

Uniting Theory: Hey, there's a lot of things that happen in an NBA game that aren't in a box score! Let's record them!

Useful Statistics: Breakdown of each player's shots into jumpers, "close" shots, and dunks, with how many of each the player takes and their percentage on each; +/- statistics for offense, defense, and rebounding; whether their assists led to 3s, jumpers, layups, or dunks; crunch time statistics; data on which players play well together; production by position...the list goes on.

Sample Column: Their columns are an extension of their site: examinations of interesting things (last-second shot performance, how teams perform after timeouts, who took the most charges, etc.). They've shied away from more ambitious columns like "The value of a Steve Nash" in recent years, although they did just do a nice study on how the pre-season reflects on the actual season.

Pretentiousness Factor: Zero. You can easily spend 20 minutes on 82games.com without seeing a word, and most of their columns are basically "Hey, here's a table! Here's what the table says! I wonder what that means!"

3. The Wages of Wins guys

Synopsis: The first chapter of their book, The Wages of Wins, said that it intended to be basketball's answer to Moneyball. Naturally, I plowed through it. When I finished it, I felt sick to my stomach. That's all I'll say for now.

Uniting Theory: They've run retroactive regression analysis on past games, and have decided that they can then go back and assign "wins" to each player based on his rebounds, assists, and scoring efficiency. Usage means nothing to them: They believe that a player who goes 2 for 3 is more valuable than a player who goes 18 of 30, since a team ends up shooting on every possession anyways. However, they do not believe that carries over to rebounding, and believe that Jason Kidd's 8 rebounds per game is in no way affected by the fact that his big men are poor rebounders. Based on this, they award each player with a number of "wins," even though the numbers don't carry over: the 5 best players each year amass a cumulative win total of over 82, which clearly shows the rating is fluid, but they stick to it like it is absolute. Their unifying theory is a load of crap. I'm an English major who intends never to take another math class in his life, so I'm out of my league with this stuff, but Hollinger and the guys at 82games have both written up pretty convincing cases against the book, so read those, since those guys know what they're talking about better than I do.

Signature Statistic: "WP48" (Wins Produced per 48 minutes), which I described above.

Useful Statistics: None. Everything is about WP or WP48, which are both essentially useless takes on data everyone already has.

Pretentiousness Factor: Extreme. These guys firmly believe that everything they say is right and everyone in the NBA is stupid for not believing them, and never comment when things like Ben Wallace being a horrifying bust happen. In their book, they drop gems like "People fail to consider that points scored in the first quarter count just the same as points scored in the 4th quarter" and "If every team played their mascot, people would conclude that the mascot is an integral part of the game." They're even annoying when they're right, saying things like "People say that Adam Morrison is on his way to Rookie of the Year this year. But what has he done well? Basically, he's shot more than anyone else. So what he's good at is throwing the ball towards the basket." Of course, maybe we deserve to be talked down to: not all of us are able to see into the future and know that Nick Fazekas will be better than Al Horford and Greg Oden. I'm being a bit hypocritical in dishing out venom to these guys when my big problem with them is their attitude, but these guys get under my skin. I'm sorry. Go to the first two sites, and hopefully we can keep anyone from trying to reduce the beauty of basketball to some sort of Excel experiment, and not even being right.

8 comments:

iwatchthenba said...

I think your simple lack of respect for what D. Berri is trying to accomplish is hypocritical. Who are you, someone who admits that they suck at Math, to say their stats are wrong? Maybe you don't like them. That's cool, but in a post that supposedly introduces people to advanced statistics, don't you think badmouthing someone because you personally don't like their attitude is kinda pretentious?

Krolik1157 said...

I also think their conclusions are fallacious, which comes more from common sense than any kind of formulas. Like I said, people who are better at math than I am have debunked their conclusions in a more math-y way over at 82games.com...there's been a whole stat-head debate on this that you can find recapped on Truehoop if you're so inclined. Like I said, I eagerly read their book with the highest of expectations, but came to the conclusion they didn't know what they were doing after reading it as well as more work on their website.

Doctor Dribbles said...

It is a little weird to read a post that teases your "favorite advanced-stats websites," only to find out that you really only enjoy two and want to bag on one. That said, you're totally within your rights to be critical and your opinion sits well with me--Berri's ambitions may be praiseworthy, but his results stink. Plus, it's a disservice to basketball statheads' legitimacy when Berri's work gets picked up in major media and debated as if it's representative of the rest.

There's one factor working in Berri's favor, however: Having more than one pundit never hurts. Berri's presence and contrasting system help us better appreciate Hollinger's work, and competing models no doubt drive all guys in that field to improve, too. And I don't need some advanced model to know that's a good thing.

iwatchthenba said...

My main point was that the post was about an intro to adv stats and yet you spend the majority of the time bashing one particular one. As far as other people discrediting Berri's stats, I simply disagree. I think the stat Berri uses needs to be used in context, and I think that is repeatedly misunderstood by people trying to disprove his results.

ClipperSteve said...

The Nick Fazekas example might not be the best one for debunking WOW. Hollinger is also a big Fazekas fan.

Anonymous said...

Was Berri really making a prediction there about Fazekas, or just applying his stats to the previous college basketball season?

If the latter, there is nothing unusual about the results at all, but Nick Fazekas was a beast for Nevada, and pretty much did it all for them.

Obviously college hoops has its varying degrees of difficulty by conference, so these numbers have to be taken with a grain of salt, but I'm pretty sure Berri's model doesn't make serious predictions or projections about how a college player will actually do in the NBA.

Anonymous said...

On the Fazekas comment, Berri admits using his model on college numbers isn't great for predicting NBA stats, as opposed to his apparent belief that it's perfect when using NBA stats.

I think there are some things wrong with his stats, his valuation of rebounding being the obvious one. But then Hollinger undervalues shooting percentage (my opinion). Berri's numbers do seem to do a pretty good job of predicting how teams will fare, which says something for them.

Anonymous said...

Dberri's tone is definitely irksome, but his statistics do have value. His method of regressing win totals against every boxscore stat we have is faulty because it assumes the boxscore contains all relevant data about a player (and that each player's boxscore stats are independent of every other player's, which is obviously false). But WP is a better measure of a player's value than PPG or Points+Reb+Assists, or any other simple metric based only on some aspects of the boxscore.

Wins Produced can provide insight into players that are performing well in their respective roles - for instance, Rajon Rondo's .350 WP48 stat last year captured his ability to get assists and rebounds at an above average rate, turnovers at a below average rate, and to shoot a high percentage from the field. As a conglomerate stat, it did far more than simply stating that he scored 12 ppg or shot 52%.

Wins Produced is not the catch-all stat Dberri describes it as, but it is a useful metric for simply and quickly understanding a player's boxscore stats.