Thursday, June 16, 2016

Beat the Streak with Parks

A reader suggested that the Beat the Streak picks posted here every day might be improved by including park data. (This post mostly explains the ideas behind the calculations. In addition, this post shows tests on the neural network (NN).) I decided to try this on the neural network, since I only need to feed the program one more number. For the log5 method, I would need to figure out a park factor, then adjust the probabilities based on that factor. For the NN, I just fed in a three-year weighted hit average for the park, regressed to the current season hit average if the park has fewer than 1,500 plate appearances. I trained the model and ran the same two tests as before. One looks at the results for the highest predicted probability on 160 randomly chosen game days; the other chooses a random player from each of those same days.
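As a rough illustration of the park input described above, here is a minimal sketch. The post does not give the season weights or the exact regression formula, so the 3-2-1 weighting, the linear blend toward the league average, and the name `park_hit_average` are all assumptions made for the example, not the method actually used.

```python
def park_hit_average(seasons, league_avg, weights=(3, 2, 1), min_pa=1500):
    """Weighted park hit average, regressed toward the league average.

    seasons: list of (hits, plate_appearances) tuples, most recent first.
    league_avg: current season hit average to regress toward.
    weights and the blending rule are assumptions, not from the post.
    """
    weighted_hits = sum(w * h for w, (h, _) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (_, pa) in zip(weights, seasons))
    if weighted_pa == 0:
        # No data for this park at all: fall back to the league average.
        return league_avg

    park_avg = weighted_hits / weighted_pa
    raw_pa = sum(pa for _, pa in seasons)
    if raw_pa >= min_pa:
        return park_avg

    # Small sample: blend in proportion to how much of min_pa we have.
    share = raw_pa / min_pa
    return share * park_avg + (1 - share) * league_avg
```

With this blend, a park with only 300 plate appearances gets one fifth of its own average and four fifths of the league average; a park at or above the threshold is used as is.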

Here are the results for picking the highest probability player.

Days: 160, Expected Games with hit: 119.1, Actual Game with hit 129.
Streak Length: 0, Number of times: 5
Streak Length: 1, Number of times: 7
Streak Length: 2, Number of times: 4
Streak Length: 3, Number of times: 1
Streak Length: 4, Number of times: 4
Streak Length: 5, Number of times: 1
Streak Length: 6, Number of times: 2
Streak Length: 7, Number of times: 1
Streak Length: 8, Number of times: 1
Streak Length: 9, Number of times: 1
Streak Length: 10, Number of times: 1
Streak Length: 13, Number of times: 1
Streak Length: 14, Number of times: 1
Streak Length: 16, Number of times: 1
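The counts above add up to 31 streaks, which matches the 31 hitless days (160 minus 129), so the tally appears to record a streak each time a pick fails to get a hit. A minimal sketch of that kind of summary follows; the function name `summarize` and the input format are assumptions for illustration, and the expected total is taken to be the sum of the predicted probabilities.

```python
from collections import Counter


def summarize(probabilities, outcomes):
    """Return (expected hits, actual hits, streak-length counts).

    probabilities: predicted hit probability for each day's pick.
    outcomes: 1 if the pick got a hit that day, 0 otherwise.
    A streak is recorded only when a hitless day ends it, so a run
    still alive on the last day is not counted (an assumption).
    """
    expected = sum(probabilities)
    actual = sum(outcomes)

    streaks = Counter()
    run = 0
    for got_hit in outcomes:
        if got_hit:
            run += 1
        else:
            streaks[run] += 1  # a miss right after a miss is a streak of 0
            run = 0
    return expected, actual, streaks


if __name__ == "__main__":
    expected, actual, streaks = summarize(
        [0.75, 0.74, 0.73, 0.72], [1, 1, 0, 1]
    )
    print(f"Days: 4, Expected: {expected:.1f}, Actual: {actual}")
    for length in sorted(streaks):
        print(f"Streak Length: {length}, Number of times: {streaks[length]}")
```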

The test without the parks came closer on the best hitter: its expected value was 124.9, and it was also right 129 times. So the park model, with an expected value of 119.1, underestimates reality even more.

That’s also true with the random test:

Days: 160, Expected Games with hit: 104.1, Actual Game with hit 111.
Streak Length: 0, Number of times: 18
Streak Length: 1, Number of times: 13
Streak Length: 2, Number of times: 5
Streak Length: 3, Number of times: 3
Streak Length: 4, Number of times: 2
Streak Length: 5, Number of times: 1
Streak Length: 6, Number of times: 3
Streak Length: 7, Number of times: 1
Streak Length: 8, Number of times: 1
Streak Length: 13, Number of times: 2

While the expected value is lower, this test gets one more correct than the original model, and that, too, is interesting. Look at the actual rankings. Here is the run for Friday (the actual run tomorrow will include Thursday’s games) based on the non-park model. The parameters are: pitcher this year, pitcher last three years, batter this year, batter last three years, and this year’s MLB average for position players:

  1. 0.340, 0.769 — Jose Altuve batting against John Lamb. Parameters: [‘0.270655’, ‘0.26125’, ‘0.298’, ‘0.299’, ‘0.230’]
  2. 0.270, 0.760 — Dee Gordon batting against Jonathan Gray. Parameters: [‘0.220’, ‘0.24185’, ‘0.24358’, ‘0.294’, ‘0.230’]
  3. 0.325, 0.745 — Xander Bogaerts batting against Hisashi Iwakuma. Parameters: [‘0.248’, ‘0.237’, ‘0.328’, ‘0.291’, ‘0.230’]
  4. 0.307, 0.743 — Daniel Murphy batting against Christian Friedrich. Parameters: [‘0.215405’, ‘0.244’, ‘0.331’, ‘0.285’, ‘0.230’]
  5. 0.256, 0.740 — Michael Brantley batting against Jose Quintana. Parameters: [‘0.225’, ‘0.242’, ‘0.225485’, ‘0.278’, ‘0.230’]
  6. 0.319, 0.737 — Yunel Escobar batting against Kendall Graveman. Parameters: [‘0.274’, ‘0.261’, ‘0.281’, ‘0.273’, ‘0.230’]
  7. 0.295, 0.734 — David Peralta batting against Adam C Morgan. Parameters: [‘0.27656’, ‘0.2647’, ‘0.237155’, ‘0.267’, ‘0.230’]
  8. 0.238, 0.733 — Ben Revere batting against Christian Friedrich. Parameters: [‘0.215405’, ‘0.244’, ‘0.20399’, ‘0.270’, ‘0.230’]
  9. 0.293, 0.733 — Martin Prado batting against Jonathan Gray. Parameters: [‘0.220’, ‘0.24185’, ‘0.307’, ‘0.277’, ‘0.230’]
  10. 0.277, 0.732 — Francisco Lindor batting against Jose Quintana. Parameters: [‘0.225’, ‘0.242’, ‘0.271’, ‘0.275’, ‘0.230’]
  11. 0.297, 0.727 — Odubel Herrera batting against Robbie Ray. Parameters: [‘0.267’, ‘0.255’, ‘0.259’, ‘0.267’, ‘0.230’]
  12. 0.305, 0.726 — Danny Valencia batting against Matthew Shoemaker. Parameters: [‘0.249’, ‘0.239’, ‘0.301495’, ‘0.275’, ‘0.230’]
  13. 0.259, 0.722 — Miguel Cabrera batting against Yordano Ventura. Parameters: [‘0.218’, ‘0.219’, ‘0.267’, ‘0.277’, ‘0.230’]
  14. 0.272, 0.721 — Christian Yelich batting against Jonathan Gray. Parameters: [‘0.220’, ‘0.24185’, ‘0.275’, ‘0.267’, ‘0.230’]
  15. 0.280, 0.721 — DJ LeMahieu batting against Adam Conley. Parameters: [‘0.233’, ‘0.236’, ‘0.280’, ‘0.270’, ‘0.230’]

I usually leave out Dee Gordon and Michael Brantley because they are currently inactive. Here is the ranking for the model with the park. The last parameter is the three-year park value, batting by position players:

  1. 0.340, 0.727 — Jose Altuve batting against John Lamb. Parameters: [‘0.270655’, ‘0.26125’, ‘0.298’, ‘0.299’, ‘0.230’, ‘0.219’]
  2. 0.270, 0.722 — Dee Gordon batting against Jonathan Gray. Parameters: [‘0.220’, ‘0.24185’, ‘0.24358’, ‘0.294’, ‘0.230’, ‘0.232’]
  3. 0.325, 0.709 — Xander Bogaerts batting against Hisashi Iwakuma. Parameters: [‘0.248’, ‘0.237’, ‘0.328’, ‘0.291’, ‘0.230’, ‘0.246’]
  4. 0.256, 0.704 — Michael Brantley batting against Jose Quintana. Parameters: [‘0.225’, ‘0.242’, ‘0.225485’, ‘0.278’, ‘0.230’, ‘0.238’]
  5. 0.307, 0.704 — Daniel Murphy batting against Christian Friedrich. Parameters: [‘0.215405’, ‘0.244’, ‘0.331’, ‘0.285’, ‘0.230’, ‘0.229’]
  6. 0.319, 0.698 — Yunel Escobar batting against Kendall Graveman. Parameters: [‘0.274’, ‘0.261’, ‘0.281’, ‘0.273’, ‘0.230’, ‘0.229’]
  7. 0.295, 0.697 — David Peralta batting against Adam C Morgan. Parameters: [‘0.27656’, ‘0.2647’, ‘0.237155’, ‘0.267’, ‘0.230’, ‘0.232’]
  8. 0.277, 0.696 — Francisco Lindor batting against Jose Quintana. Parameters: [‘0.225’, ‘0.242’, ‘0.271’, ‘0.275’, ‘0.230’, ‘0.238’]
  9. 0.238, 0.696 — Ben Revere batting against Christian Friedrich. Parameters: [‘0.215405’, ‘0.244’, ‘0.20399’, ‘0.270’, ‘0.230’, ‘0.229’]
  10. 0.293, 0.695 — Martin Prado batting against Jonathan Gray. Parameters: [‘0.220’, ‘0.24185’, ‘0.307’, ‘0.277’, ‘0.230’, ‘0.232’]
  11. 0.297, 0.689 — Odubel Herrera batting against Robbie Ray. Parameters: [‘0.267’, ‘0.255’, ‘0.259’, ‘0.267’, ‘0.230’, ‘0.232’]
  12. 0.305, 0.687 — Danny Valencia batting against Matthew Shoemaker. Parameters: [‘0.249’, ‘0.239’, ‘0.301495’, ‘0.275’, ‘0.230’, ‘0.229’]
  13. 0.259, 0.686 — Miguel Cabrera batting against Yordano Ventura. Parameters: [‘0.218’, ‘0.219’, ‘0.267’, ‘0.277’, ‘0.230’, ‘0.238’]
  14. 0.241, 0.685 — Lorenzo Cain batting against Michael Fulmer. Parameters: [‘0.194’, ‘0.21698’, ‘0.261’, ‘0.275’, ‘0.230’, ‘0.238’]
  15. 0.272, 0.684 — Christian Yelich batting against Jonathan Gray. Parameters: [‘0.220’, ‘0.24185’, ‘0.275’, ‘0.267’, ‘0.230’, ‘0.232’]

The takeaway here is that when someone moves up on the park list, that batter is playing in a park with a higher three-year hit average than the batters he passed. Francisco Lindor, for example, climbs from tenth to eighth; his park value is 0.238, against 0.229 for Ben Revere and 0.232 for Martin Prado. So even though the probabilities are somewhat underestimated, we may actually get a better ranking. I’ll start running both models on a daily basis.
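To make the movement between the two lists easier to see, here is a small sketch that compares them. The player order is copied from the two rankings above; the helper name `rank_changes` is hypothetical and is not part of the program that produced the picks.

```python
def rank_changes(before, after):
    """Map each player in `after` to the number of places gained.

    Positive means the player moved up, negative means down, and
    None means the player was not on the `before` list at all.
    """
    old_rank = {name: i for i, name in enumerate(before, start=1)}
    changes = {}
    for new_rank, name in enumerate(after, start=1):
        if name in old_rank:
            changes[name] = old_rank[name] - new_rank
        else:
            changes[name] = None
    return changes


non_park = [
    "Jose Altuve", "Dee Gordon", "Xander Bogaerts", "Daniel Murphy",
    "Michael Brantley", "Yunel Escobar", "David Peralta", "Ben Revere",
    "Martin Prado", "Francisco Lindor", "Odubel Herrera", "Danny Valencia",
    "Miguel Cabrera", "Christian Yelich", "DJ LeMahieu",
]
park = [
    "Jose Altuve", "Dee Gordon", "Xander Bogaerts", "Michael Brantley",
    "Daniel Murphy", "Yunel Escobar", "David Peralta", "Francisco Lindor",
    "Ben Revere", "Martin Prado", "Odubel Herrera", "Danny Valencia",
    "Miguel Cabrera", "Lorenzo Cain", "Christian Yelich",
]

changes = rank_changes(non_park, park)

if __name__ == "__main__":
    for name, delta in changes.items():
        if delta is None:
            print(f"{name}: new to the top 15")
        elif delta != 0:
            print(f"{name}: {delta:+d}")
```

On these two lists, Lindor gains two places, Brantley gains one, and Lorenzo Cain is the only new entry, replacing DJ LeMahieu.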



from baseballmusings.com http://ift.tt/1YvsCzB
