Stat Splits

From NSB User Guide
Revision as of 08:51, 11 August 2018 by Cslane

Understanding Split Stats in Nostalgia Sim Baseball

This article was written in response to a policy change of August 2016. Though useful information about all stat splits can be found here, it focuses on the way the sim handles left/right matchups when only partial split data is available.

Probabilities for batter-pitcher matchups are exceedingly complex. To get the best set of probabilities, a sim must first create a baseline series of probabilities that takes into consideration players' relative strength compared to the leagues and seasons they actually played in, the ballpark in which the game is taking place, player fatigue, and a few other game-time conditions (e.g., whether the infield is in or deep, whether the lines are being guarded).

On top of this baseline, another complex set of formulas is used to assess the situational context (i.e., batting and pitching splits) in which batter-pitcher matchups take place. There are four main situational contexts used by Nostalgia Sim Baseball:

  1. The handedness of the batter and the pitcher (LHB v LHP, LHB v RHP, RHB v LHP, RHB v RHP)
  2. The location of the matchup (HOME v ROAD)
  3. The state of baserunners (BASES EMPTY v ON BASE)
  4. Whether the batter is pinch hitting

However, these sets of conditions are not weighted equally. #1 is weighted more heavily than #2, and #2 is weighted more heavily than #3. In fact, #3 is called upon only to determine the chance of a batter drawing a walk. #4 is weighted as heavily as #1, although if a pinch hitter exceeds his actual pinch-hit plate appearances, his probabilities regress to his performance mean.

Before August 2016, the sim would only use split data from seasons where all split data was complete. From 1973 on, all split data is complete. Before 1973 there are increasing numbers of games missing from the data record, making it less reliable, although there are complete data sets before 1973 in a number of cases, and the research work continues over at Retrosheet. It's much easier to get complete data for home/road splits than for left/right splits, because the former can be garnered from box score analysis.

Effective August 2016, however, a major initiative was completed so that our sim can use whatever partial left/right split data we have. To do this well required lots of retrofitting, but we are pleased with the results. And we want to share with you how we did it so that there are no mysteries in your players' performances.

Before I go any further, two things:

1. Partial Left/Right Stats are published in the owner software package in the database window on the split tabs. Whenever a partial stat is in use, it is followed by an asterisk. You will also note two columns at the right headed "RPL" and "PERF". These indicate the percentage of real play (RPL) represented in the split--the higher the percentage the more accurate the data--and the performance level (PERF) as a factor (.98 or 1.03 etc.) of the player's total performance.

2. Remember that the nature of probabilities is often different than we suppose. To see the real effects of splits work themselves out, many thousands of plate appearances may be required. In any given simulated season, performance can vary quite wildly. So, just because Ken Caminiti hit .230 against lefties and .265 against righties in 1998, it does not guarantee that he will hit better against righties if he plays on your team. If he plays for you, he will be facing different pitchers in different parks under the direction of different managers. But let's say that over a period of 12 months Caminiti's 1998 season was played 10-15 times across the NSB system. With that sample size we would begin to see the larger probabilities play themselves out. Probabilities are always potential outcomes, not certain outcomes.
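
To get a feel for how noisy a single season can be, the following Python sketch simulates a hitter whose true averages are .230 against lefties and .265 against righties, and counts how often his simulated average against lefties ends up higher anyway. The at-bat counts (150 against lefties, 400 against righties), the simple independent-trial hit model, and the function names are all assumptions made for illustration; this is not how the sim actually resolves plate appearances.

```python
import random


def simulate_avg(true_avg, at_bats, rng):
    """Batting average over `at_bats` independent trials with hit chance `true_avg`."""
    hits = sum(1 for _ in range(at_bats) if rng.random() < true_avg)
    return hits / at_bats


def reversal_rate(avg_vs_l, avg_vs_r, ab_vs_l, ab_vs_r, seasons, seed=0):
    """Fraction of simulated samples in which the split comes out reversed."""
    rng = random.Random(seed)
    reversed_count = 0
    for _ in range(seasons):
        vs_l = simulate_avg(avg_vs_l, ab_vs_l, rng)
        vs_r = simulate_avg(avg_vs_r, ab_vs_r, rng)
        if vs_l > vs_r:
            reversed_count += 1
    return reversed_count / seasons


# One season at a time versus 15 seasons pooled (15 times the at-bats).
single = reversal_rate(0.230, 0.265, 150, 400, seasons=400)
pooled = reversal_rate(0.230, 0.265, 150 * 15, 400 * 15, seasons=400)
print(f"single season reversed: {single:.1%}")
print(f"15 seasons pooled reversed: {pooled:.1%}")
```

Under these assumptions a reversed split is far from rare in a single season, while it all but disappears once the seasons are pooled.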

Now, the big problem we faced in introducing partial splits was how to keep from unfairly prejudicing performance (positive or negative). For example, look at the following stats for Fred Schulte in the 1929 season.

[Image: Schulte.jpg, Fred Schulte's partial left/right split stats for 1929]

In 1929 we have managed to get 78.5% of Schulte's left/right plate appearances. In those appearances, he performed worse than he did overall for 1929. If we allow Schulte's left/right probabilities to be determined by this data, truthful though it is, we would be using left/right data to his disadvantage.
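
In terms of the RPL and PERF columns described earlier, Schulte's 1929 split has an RPL of 78.5 and a PERF below 1.00. As a rough illustration of how two such columns could be derived, here is a minimal Python sketch. It assumes RPL is the share of total plate appearances covered by the known split data, and that PERF is the batting average in the known split data divided by the full-season batting average. The sim's actual performance measure is not specified in this article, and the function names and sample numbers are hypothetical.

```python
def rpl(split_pa, total_pa):
    """Percentage of real play covered by the known split data."""
    if total_pa <= 0:
        raise ValueError("total_pa must be positive")
    return 100.0 * split_pa / total_pa


def perf(split_hits, split_ab, total_hits, total_ab):
    """Known-split batting average as a factor of the full-season average.

    Assumption: batting average stands in for the sim's performance measure.
    """
    if split_ab <= 0 or total_ab <= 0 or total_hits <= 0:
        raise ValueError("need positive at-bats and a nonzero season average")
    return (split_hits / split_ab) / (total_hits / total_ab)


# Hypothetical player: 400 of 500 plate appearances have known left/right data.
print(round(rpl(400, 500), 1))             # 80.0
# He hit .250 in the known split data against .260 for the full season.
print(round(perf(100, 400, 130, 500), 2))  # 0.96
```

A PERF below 1.00, as in this hypothetical case, is the situation described for Schulte: the known split data undersells the player's season.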

So, to correct this, we must create a profile of the remaining 21.5% of his left/right appearances using what we know about the broader tendencies of matchups. When we do this, we get the following adjustments to the left and right sides:

[Image: Schulte-adj.jpg, Schulte's adjusted left/right split stats for 1929]

These adjusted numbers match up exactly to Schulte's total AB, H, BB, 2B, 3B, and HR for the 1929 season. We can't be certain this is the actual 1929 result, but we have created a model profile with a PERF factor now at 1.00. Schulte will now not be penalized for using his left/right splits. We do the same, naturally, for pitchers.
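
The bookkeeping behind this kind of adjustment can be sketched in a few lines of Python. The sketch assumes the missing portion is simply the season totals minus the known split totals, and it hands that remainder to the left and right sides in proportion to their known at-bats. That apportioning rule is a stand-in: the sim models the remainder from broader matchup tendencies, which are not detailed in this article. The function name is hypothetical, and all numbers are invented for illustration; they are not Schulte's.

```python
STATS = ("AB", "H", "BB", "2B", "3B", "HR")


def fill_missing(totals, vs_l, vs_r):
    """Return adjusted (vs_l, vs_r) lines that sum exactly to the season totals."""
    known_ab = vs_l["AB"] + vs_r["AB"]
    if known_ab <= 0:
        raise ValueError("no known split at-bats to apportion by")
    share_l = vs_l["AB"] / known_ab  # assumed apportioning rule, not the sim's
    adj_l, adj_r = {}, {}
    for stat in STATS:
        missing = totals[stat] - vs_l[stat] - vs_r[stat]
        if missing < 0:
            raise ValueError(f"known splits exceed season total for {stat}")
        to_left = round(missing * share_l)
        adj_l[stat] = vs_l[stat] + to_left
        adj_r[stat] = vs_r[stat] + (missing - to_left)  # remainder keeps sums exact
    return adj_l, adj_r


# Invented example: 400 of 500 at-bats have known left/right data.
totals = {"AB": 500, "H": 150, "BB": 50, "2B": 30, "3B": 6, "HR": 10}
vs_l = {"AB": 100, "H": 26, "BB": 9, "2B": 5, "3B": 1, "HR": 2}
vs_r = {"AB": 300, "H": 86, "BB": 29, "2B": 17, "3B": 3, "HR": 5}

adj_l, adj_r = fill_missing(totals, vs_l, vs_r)
print(adj_l)
print(adj_r)
```

Because every missing hit, walk, and at-bat is assigned to one side or the other, the adjusted sides add back up to the season line, which is what brings the combined split performance level with the player's overall performance.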