A couple of weeks ago, I asked a co-worker about his fantasy team. He was in the middle of selecting a player for his daily play of Beat the Streak, the game sponsored by MLB.com. He asked for some advice, and then asked if I played. I didn’t, and I explained the odds are very much stacked against you. He thanked me for ruining his fun :-), but my help kept his streak alive for another day.
It also made me revisit playing the game.
Be the first to reach 57 games and win $5.6 Million
Given that you don’t need to spend any money to play, I started thinking about how to go about doing this. The first thing I did was go to the FanGraphs list of probable starters and sort on Batting Average on Balls in Play (BABIP). I then looked for pitchers with a high BABIP and a low strikeout rate. Among today’s starters, Alfredo Simon and Aaron Blair are good examples. Giving up a lot of home runs with few walks would be good, too. Simon gives up the homers, but both he and Blair walk batters. Adam Wainwright and Jered Weaver actually do a better job of meeting all three criteria.
Of course, once I identified the pitchers, I needed to go to the opposing rosters and find hitters with a high BABIP who put the ball in play a lot. Needless to say, this takes time. Given that I have an extensive database at my fingertips, I decided to write a program to see if I could do better.
My first try was to write a program that calculated the expected Hit Average of a batter-pitcher match-up. Hit Average is hits divided by plate appearances. The number of at bats a player gets in a game is variable, but the plate appearances are pretty constant. So I figured a hit average for the batter based on 2016 stats and an opposition hit average for the pitcher based on 2016 stats, regressed those toward the league hit average* for hitters and pitchers with few PA, and used the Log5 method to figure the probability.
*For the league hit average, I only use players whose position at the time of the PA was not pitcher.
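For readers who want to try this at home, here is a minimal Python sketch of that calculation. It is not my actual program. The Log5 formula is the standard one for a batter-pitcher match-up, but the regression step is only one common way to shrink small samples toward the league average, and the prior_pa constant and all of the stat lines in the usage example are made-up values for illustration.

```python
def regressed_hit_average(hits, pa, league_avg, prior_pa=200):
    """Shrink hits/PA toward the league hit average.

    Adds prior_pa plate appearances of league-average performance, so
    players with few PA land close to the league figure. prior_pa=200 is
    an assumed constant, not a value taken from the post.
    """
    return (hits + prior_pa * league_avg) / (pa + prior_pa)


def log5(batter_avg, pitcher_avg, league_avg):
    """Log5 estimate of a hit in one PA for this batter against this pitcher."""
    hit = batter_avg * pitcher_avg / league_avg
    no_hit = (1 - batter_avg) * (1 - pitcher_avg) / (1 - league_avg)
    return hit / (hit + no_hit)


if __name__ == "__main__":
    league = 0.230  # hypothetical league hit average (non-pitchers)
    batter = regressed_hit_average(hits=40, pa=130, league_avg=league)
    pitcher = regressed_hit_average(hits=45, pa=160, league_avg=league)
    print(round(log5(batter, pitcher, league), 3))
```

A useful sanity check on Log5: against a pitcher who allows exactly the league hit average, the result is just the batter's own hit average.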
The results were reasonable, but they were all based on small sample sizes. So I averaged that number with a three-year weighted hit average, again regressed to the 2016 league average for small sample sizes. Each counted 50%; I’m using the result more as a ranking method than calling it a true probability. Here is the top of the list it produced for today:
0.369 — Daniel Murphy batting against Adam Wainwright
0.340 — Ryan Braun batting against Alfredo Simon
0.322 — Jose Altuve batting against Jered Weaver
0.320 — Robinson Cano batting against Phil Hughes
0.312 — Francisco M Lindor batting against Ubaldo Jimenez
0.311 — Martin Prado batting against Aaron D Blair
0.310 — Marcell Ozuna batting against Aaron D Blair
0.304 — Jonathan Lucroy batting against Alfredo Simon
0.304 — Wilson Ramos batting against Adam Wainwright
0.302 — Buster Posey batting against Christopher Rusin
0.302 — Kelby Tomlinson batting against Christopher Rusin
0.302 — Jonathan Villar batting against Alfredo Simon
So it agrees with my method of choosing pitchers. The number leading off is again a ranking, but if you want to think of it as a probability, that’s fine. A player with a .369 probability of getting a hit in a PA would have a probability of 0.84 of getting a hit in four PA, if that’s how you define a game.
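The arithmetic behind that 0.84 can be sketched in a few lines of Python. This assumes every plate appearance is an independent trial with the same hit probability, which is a simplification. The function names are mine for illustration, and the 50/50 blend mirrors the equal weighting described above.

```python
def blended_estimate(current_year, three_year):
    """Average the current-year and three-year estimates, each counting 50%."""
    return 0.5 * current_year + 0.5 * three_year


def prob_hit_in_game(p_per_pa, n_pa=4):
    """Chance of at least one hit in n_pa plate appearances.

    Assumes independent PAs with a constant per-PA hit probability.
    """
    return 1 - (1 - p_per_pa) ** n_pa


def prob_streak(p_per_game, games=57):
    """Chance of a hit in every one of `games` games, assuming independence."""
    return p_per_game ** games


if __name__ == "__main__":
    per_game = prob_hit_in_game(0.369, n_pa=4)
    print(round(per_game, 2))  # 0.84
    print(prob_streak(per_game))
```

Raising even a very good per-game probability to the 57th power is why the odds are so heavily stacked against the player.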
I decided to try another method as well. I built a neural network using the same five parameters: current-year pitcher hit average, three-year weighted pitcher hit average, current-year batter hit average, three-year weighted batter hit average, and the current-year league hit average. I then trained the net, and here is the top of the list it produced for today:
0.302, 0.760 — Buster Posey batting against Christopher Rusin.
0.322, 0.752 — Jose Altuve batting against Jered Weaver.
0.291, 0.745 — Joe Panik batting against Christopher Rusin.
0.320, 0.743 — Robinson Cano batting against Phil Hughes.
0.289, 0.742 — Matt M Duffy batting against Christopher Rusin.
0.369, 0.741 — Daniel Murphy batting against Adam Wainwright.
0.285, 0.737 — Denard Span batting against Christopher Rusin.
0.279, 0.735 — Ben Revere batting against Adam Wainwright.
0.312, 0.734 — Francisco M Lindor batting against Ubaldo Jimenez.
0.302, 0.734 — Kelby Tomlinson batting against Christopher Rusin.
(Dee Gordon was on the list, but he’s not playing due to a drug suspension.)
The first number is the weighted hit average I used in the previous list. The second number is the probability of the player getting a hit in a game started by this pitcher.
The neural net (NN) likes the Giants’ chances against Chris Rusin. The biggest difference between the lists is the absence of Ryan Braun from the NN list. Having looked a bit at the parameters, the NN seems to favor the three-year hit averages over the current-year hit averages, but a big current-year boost counts for something. Murphy drops, but stays high on the NN list because his prior three years are okay, and this particular match-up is good for him.
I am still backtesting to see how well these methods work, but if they can help you beat the streak, I’ll publish these top lists daily. I’ll also continue to work to see if there are better NN combinations for predictions.
from baseballmusings.com http://ift.tt/20OjKEq