How Well Did ZiPS Perform for the 2011 Rookie Hitters?
Posted: January 3, 2012
The incredible Dan Szymborski is currently rolling out his ZiPS projections for the 2012 baseball season team-by-team at Baseball Think Factory (most recently the Orioles). These projections cover not only well-established veterans but also minor league players who have little to no track record in the major leagues, which is where I will focus my attention.
Szymborski was kind enough to send me a link to the 2011 projections. I went to Fangraphs and downloaded a table of all of the 2011 rookies (there may be some mistakes, such as Alexi Ogando, who is not actually a rookie). Then I pulled OPS+ from Baseball-Reference, since it is not available at Fangraphs. Finally, I compared the ZiPS projections to the actual totals for those rookies.
ZiPS projects 17 counting stats, four rate stats and one league-adjusted stat. I was able to find the actual production for each of these except RC/27, which I have not compared yet and will not include in this analysis. Fangraphs has a wRC stat, but it is a counting stat (total number of runs) instead of a rate stat (runs created per 27 outs). That still leaves 21 stats to compare, so I consolidated a few of them. I grouped all of the counting stats other than games and at-bats (the playing time stats) together: for each player, I took the absolute difference between what ZiPS projected and what the player actually accumulated in runs, hits, doubles, triples, etc., and added those differences together. This gives a total “counting stat” difference between projection and actual. For playing time, I only looked at at-bats, as I figured it would give me the same basic information as games. For batting average, on-base percentage, slugging percentage and OPS+, I only looked at players with more than 100 at-bats (at-bats rather than plate appearances, since plate appearances are not projected).
The dark line on each of the graphs represents x=y (if ZiPS could perfectly project every statistic), not a trend line.
This category covers at-bats.
The playing time numbers are way off in general. There are maybe 11 players whose playing time was actually in line with what was projected. This is understandable, since it is very difficult to project how much a team will need a player throughout the year and what its plans are for him. Here is that list of 11 players:
The correlation coefficient for at-bats is 0.09 for rookies. The coefficient for all players in 2011 is about 0.16.
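As a rough illustration of how a figure like this can be computed, here is a minimal Python sketch. It assumes the "correlation coefficient" is the ordinary Pearson r between projected and actual at-bats (the post does not say whether the figure is r or r^2). The function name and the sample numbers are made up for illustration and are not the real ZiPS or 2011 data.

```python
from math import sqrt


def pearson_r(projected, actual):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(projected)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_p = sum(projected) / n
    mean_a = sum(actual) / n
    # Co-variation of the two lists around their means
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(projected, actual))
    var_p = sum((p - mean_p) ** 2 for p in projected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_p * var_a)


# Hypothetical at-bat totals, for illustration only.
projected_ab = [586, 450, 300, 120, 510]
actual_ab = [44, 400, 350, 90, 200]
print(round(pearson_r(projected_ab, actual_ab), 2))
```

A value near 1 would mean the projected at-bats rise and fall with the actual at-bats; a value near 0, like the one reported here, means the projection says very little about who actually got the playing time.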
This category covers runs, hits, doubles, triples, home runs, runs batted in, walks, strikeouts, hit by pitches, stolen bases, caught stealing, sacrifice hits and flies, intentional walks and grounded into double plays.
There were 11 players whose total absolute difference across all of the counting stats was less than 100. The best projection was for Luke Hughes. It was off by two runs, four hits, three doubles, two triples, one run batted in, two walks, one stolen base, one caught stealing, two sacrifice flies and three grounded-into-double-plays, and it was exactly right on home runs, strikeouts, hit-by-pitches, sacrifice hits and intentional walks. The worst projection was for Chris Carter, where the main culprit was his 586 projected at-bats versus 44 actual at-bats. It’s difficult to accumulate counting stats when you rarely play. Here is that top 11:
You may notice that 10 of the top 11 in the counting stat category are also in the top 11 of the projected at-bat category. Jemile Weeks is the only player from the at-bat top 11 not in the counting stats top 11 (he comes in 13th) and Chris Stewart is the only player in the counting stats top 11 not in the at-bat top 11 (he comes in 13th also).
The average of these summed absolute differences (the total amount a projection was off across all of the counting stats) is 336.
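The counting-stat comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the spreadsheet actually used: the stat abbreviations, function names and dictionary layout are assumptions. The one real input is the set of per-stat misses quoted above for Luke Hughes, which sum to 21.

```python
# The 15 counting stats in this category (abbreviations are an assumption).
COUNTING_STATS = ["R", "H", "2B", "3B", "HR", "RBI", "BB", "SO",
                  "HBP", "SB", "CS", "SH", "SF", "IBB", "GDP"]


def counting_stat_difference(projected, actual, stats=COUNTING_STATS):
    """Sum of |projected - actual| over the listed stats (missing keys count as 0)."""
    return sum(abs(projected.get(s, 0) - actual.get(s, 0)) for s in stats)


def average_difference(players):
    """Mean total difference over a list of (projected, actual) dict pairs."""
    totals = [counting_stat_difference(p, a) for p, a in players]
    return sum(totals) / len(totals)


# Per-stat misses quoted in the post for Luke Hughes; every other stat was 0.
hughes_misses = {"R": 2, "H": 4, "2B": 3, "3B": 2, "RBI": 1, "BB": 2,
                 "SB": 1, "CS": 1, "SF": 2, "GDP": 3}

# Comparing the misses against an empty (all-zero) line reproduces his total
# without inventing a real stat line for him.
print(counting_stat_difference(hughes_misses, {}))  # 21
```

Because every miss counts equally, a single strikeout off weighs the same as a single home run off, and a large playing-time miss inflates every category at once, which is why Chris Carter lands at the bottom.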
This category covers batting average, on-base percentage and slugging percentage.
The rate stats all show a positive linear relationship between projected and actual values. The r^2 values for each are listed here:
Batting average is the most highly correlated statistic that this analysis covers and it only has an r^2 value of 0.16.
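For readers who want to reproduce this kind of number, the sketch below fits a least-squares line of actual on projected values and reports its r^2, after dropping players at or below the 100 at-bat cutoff. It assumes the cutoff applies to actual at-bats, which the post does not spell out. The field names and the sample players are hypothetical and are not real 2011 data.

```python
def qualified(players, min_ab=100):
    """Keep players with more than min_ab actual at-bats (assumed cutoff)."""
    return [p for p in players if p["actual_ab"] > min_ab]


def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept, plus its r^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("fit is undefined when x or y is constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical rookies, for illustration only.
players = [
    {"actual_ab": 44, "proj_avg": 0.250, "avg": 0.136},
    {"actual_ab": 310, "proj_avg": 0.265, "avg": 0.281},
    {"actual_ab": 205, "proj_avg": 0.240, "avg": 0.222},
    {"actual_ab": 480, "proj_avg": 0.275, "avg": 0.259},
]
sample = qualified(players)
slope, intercept, r2 = linear_fit_r2([p["proj_avg"] for p in sample],
                                     [p["avg"] for p in sample])
print(len(sample), slope > 0, round(r2, 2))
```

For a one-variable linear fit, this r^2 is simply the square of the Pearson correlation, so an r^2 of 0.16 corresponds to a correlation of 0.4.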
OPS+ is the best overall measure of offense for comparing ZiPS to actual rookie performance.
The correlation coefficient for this graph is 0.1, which is not very strong. There is always the possibility that differences in how OPS+ is calculated (park factors, etc.) are large enough to invalidate this comparison. As it stands, however, ZiPS does not do a very good job of projecting rookie OPS+.
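To make the caveat about differing calculations concrete, here is a small sketch of the commonly cited form of OPS+, which is 100 times (OBP divided by league OBP, plus SLG divided by league SLG, minus 1). The park adjustment is deliberately left out, and that adjustment is exactly where two sources can disagree. The league averages and the player line below are placeholders, not real 2011 figures.

```python
def ops_plus(obp, slg, lg_obp, lg_slg):
    """Unadjusted OPS+: 100 is league average; no park factor is applied here."""
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


# Placeholder league averages, for illustration only.
LG_OBP, LG_SLG = 0.320, 0.400

# A hitter exactly at league average in both components scores 100.
print(round(ops_plus(0.320, 0.400, LG_OBP, LG_SLG)))  # 100

# A hypothetical rookie line; how far above or below 100 it lands depends
# entirely on the league (and, in practice, park) baseline it is compared to.
print(round(ops_plus(0.300, 0.380, LG_OBP, LG_SLG), 1))
```

If the projection and the actual figure are scaled against slightly different league or park baselines, the same batting line can produce different OPS+ values, which would add noise to the comparison on top of any real projection error.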
At least in 2011, ZiPS struggled to project the proper playing time for rookie hitters. The projected rates of the counting stats seem to hold up a bit better. Batting average and slugging percentage have among the highest correlations between projected and actual statistics, but they are still below 0.2. Finding some way to improve the rookie playing time numbers (if possible) would greatly increase the usefulness of these projections.