Calculating Expected Goals 2.0

I wrote a post similar to this a while back, outlining the process for calculating our first version of Expected Goals. This one is going to be harder. Get out your TI-89 calculators, please. (Or you can just use my Expected Goals Cheatsheet.)

Expected Goals is founded on the idea that each shot has a certain probability of going in, based on some important details about that shot. If we add up the probabilities of all a team’s shots, that gives us its Expected Goals. The goal is for this metric to convey the quality of the opportunities a team earns for itself. For shooters and goalkeepers, the details about the shot change a little bit, so pay attention.

The formulas are all based on a logistic regression, which allows us to sort out the influence of each shot’s many details all at once. The formula changes slightly each week because we base the regression on all the data we have, including each week’s new data, but it won’t change by much.

Expected Goals for a Team

  • Start with -0.19
  • Subtract 0.95 if the shot was headed (0.0 if it was kicked or othered).
  • Subtract 0.74 if the shot was taken from a corner kick (by Opta definition).
  • Subtract one of the following amounts for the shot’s location:
    Zone 1 – 0.0
    Zone 2 – 0.93
    Zone 3 – 2.37
    Zone 4 – 2.68
    Zone 5 – 3.55
    Zone 6 – 3.06

Now you have what are called the log odds of that shot going in. To find the odds of the shot going in, raise the number e to the power of the log odds.

Finally, to find the estimated probability of the shot going in, divide the odds by 1 + odds.

Example: Shot from zone 3, header, taken off a corner kick:

-0.19 – 0.95 – 0.74 – 2.37 = -4.25

e^(-4.25) = 0.0143

0.0143 / (1 + 0.0143) = 0.014, or a 1.4% chance of going in.

A team that took one of these shots would earn 0.014 expected goals.
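If you’d rather not punch this into a TI-89, the whole recipe fits in a few lines of Python. This is just a transcription of the team model above (remember the coefficients drift slightly as the regression updates each week); a team’s Expected Goals is the sum of these probabilities over all of its shots.

```python
import math

# Team model coefficients from the list above, on the log-odds scale.
INTERCEPT = -0.19
HEADER = -0.95   # applied when the shot was headed
CORNER = -0.74   # applied when the shot came from a corner kick
ZONE = {1: 0.0, 2: -0.93, 3: -2.37, 4: -2.68, 5: -3.55, 6: -3.06}

def shot_probability(zone, headed=False, from_corner=False):
    """Estimated probability of a shot going in, per the team model."""
    log_odds = INTERCEPT + ZONE[zone] + HEADER * headed + CORNER * from_corner
    odds = math.exp(log_odds)      # odds = e^(log odds)
    return odds / (1 + odds)       # probability = odds / (1 + odds)

# The worked example: a headed shot from zone 3, off a corner kick.
print(round(shot_probability(3, headed=True, from_corner=True), 3))  # 0.014
```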

Expected Goals for Shooter

  • Start with -0.28
  • Subtract 0.83 if the shot was headed (0.0 if it was kicked or othered).
  • Subtract 0.65 if the shot was taken from a corner kick (by Opta definition).
  • Add 2.54 if the shot was a penalty kick.
  • Add 0.71 if the shot was taken on a fastbreak (by Opta definition).
  • Add 0.16 if the shot was taken from a set piece (by Opta definition).
  • Subtract one of the following amounts for the shot’s location:
    Zone 1 – 0.0
    Zone 2 – 1.06
    Zone 3 – 2.32
    Zone 4 – 2.61
    Zone 5 – 3.48
    Zone 6 – 2.99

Now you have what are called the log odds of that shot going in. To find the odds of the shot going in, raise the number e to the power of the log odds.

Finally, to find the estimated probability of the shot going in, divide the odds by 1 + odds.

Example: A penalty kick (taken from the penalty spot, which sits in zone 2):

-0.28 + 2.54 – 1.06 = 1.20

e^(1.20) = 3.320

3.320 / (1 + 3.320) = 0.769, or a 76.9% chance of going in.

A player that took a penalty would gain an additional 0.769 Expected Goals. If he missed, then he would be underperforming his Expected Goals by 0.769.
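The same three steps work for the shooter model; here is a minimal sketch with the coefficients above. The worked example treats the penalty as a zone 2 shot, which is why 1.06 gets subtracted.

```python
import math

SHOOTER_ZONE = {1: 0.0, 2: -1.06, 3: -2.32, 4: -2.61, 5: -3.48, 6: -2.99}

def shooter_probability(zone, headed=False, corner=False, penalty=False,
                        fastbreak=False, set_piece=False):
    """Estimated probability of a shot going in, per the shooter model."""
    log_odds = (-0.28 + SHOOTER_ZONE[zone] - 0.83 * headed - 0.65 * corner
                + 2.54 * penalty + 0.71 * fastbreak + 0.16 * set_piece)
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# The worked example: a penalty kick.
print(round(shooter_probability(2, penalty=True), 3))  # 0.769
```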

Expected Goals for Goalkeeper

*These are calculated only from shots on target.

  • Start with 1.61
  • Subtract 0.72 if the shot was headed (0.0 if it was kicked or othered).
  • Add 1.58 if the shot was a penalty kick.
  • Add 0.42 if the shot was taken from a set piece (by Opta definition).
  • Subtract one of the following amounts for the shot’s location:
    Zone 1 – 0.0
    Zone 2 – 1.10
    Zone 3 – 2.57
    Zone 4 – 2.58
    Zone 5 – 3.33
    Zone 6 – 3.21
  • Subtract 1.37 if the shot was taken toward the middle third of the goal (horizontally).
  • Subtract 0.29 if the shot was taken at the lower half of the goal (vertically).
  • Add 0.35 if the shot was taken outside the width of the six-yard box and was directed toward the far post.

Now you have what are called the log odds of that shot going in. To find the odds of the shot going in, raise the number e to the power of the log odds.

Finally, to find the estimated probability of the shot going in, divide the odds by 1 + odds.

Example: Shot from zone 2, kicked toward lower corner, from the run of play.

1.61 – 1.10 – 0.29 = 0.22

e^(0.22) = 1.246

1.246 / (1 + 1.246) = 0.555, or a 55.5% chance of going in.

A keeper that faced one of these shots would add 0.555 to his Expected Goals against. If he saved it, then he would be outperforming his Expected Goals by 0.555.
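Once more for keepers, placement terms and all. A sketch that again just transcribes the listed coefficients:

```python
import math

KEEPER_ZONE = {1: 0.0, 2: -1.10, 3: -2.57, 4: -2.58, 5: -3.33, 6: -3.21}

def keeper_shot_probability(zone, headed=False, penalty=False, set_piece=False,
                            middle_third=False, lower_half=False,
                            far_post_from_wide=False):
    """Estimated probability of an on-target shot going in, per the keeper model."""
    log_odds = (1.61 + KEEPER_ZONE[zone] - 0.72 * headed + 1.58 * penalty
                + 0.42 * set_piece - 1.37 * middle_third - 0.29 * lower_half
                + 0.35 * far_post_from_wide)
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# The worked example: kicked from zone 2, placed low in a corner, run of play.
print(round(keeper_shot_probability(2, lower_half=True), 3))  # 0.555
```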

Frequently Asked Questions

1. Why a regression model? Why not just subset each shot in a pivot table by its type across all variables?
I think a lot of information–degrees of freedom, we call it–would be lost if I were to partition each shot into a specific type by location, pattern of play, body part, and, for keepers, placement. The regression gets more information about, say, headed shots in general, rather than “headed shots from zone 2 off corner kicks,” of which there are far fewer data points.
2. Why don’t you include info about penalty kicks in the team model?
Penalty kicks are not earned in a stable manner. Teams that get lots of PK’s early in the season are no more likely to get additional PK’s later in the season. Since we want this metric to be predictive at the team level, including penalty kicks would cloud that prediction for teams that have received an extreme number of PK’s thus far.
3. The formula looks quite a bit different for shooters versus for keepers. How is that possible since one is just taking a shot on the other?
There are a few reasons for this. The first is that the regression model for keepers is based only on shots on target. It is meant only to assess their ability to produce quality saves. A different data set leads to different regression results. Also, we are now accounting for the shooter’s placement. It is very possible that corner kicks are finished less often than shots from other patterns of play because they are harder to place. By including shot placement information in the keeper model, the information about whether the shot came off a corner is now no longer needed for assessing the keeper’s ability.
4. Why don’t you include placement for shooters, then?
We wish to assess a shooter’s ability to create goals beyond what’s expected. Part of that skill is placement. When a shooter has recorded more goals than his expected goals, it indicates a player that is outperforming his expectation. It could be because he places well, or that he is deceptive, or he is good at getting opportunities that are better than what the model thinks. In any case, we want the expected goals to reflect the opportunities earned, and thus the actual goals should help us to measure finishing ability to some extent.


Should away teams be more aggressive?

Second-half shot chart – HOUvPOR – April 2014

The Portland Timbers traveled to Houston on Sunday in desperate need of three points to get out of the cellar in the Western Conference. They played well in the first half, outshooting the Dynamo 8 – 7 en route to a 1 – 1 halftime tie while dominating possession. Then Portland came out in the second half much like many away teams do with a tie score: conservatively. The second-half shot charts to the right serve as an indication of the change in strategy.


This conjured up a question that constantly bugs me: should away teams go for wins more often when tied in the second half? Let’s get right to the data. Here is a chart summarizing the offensive aggression of away teams during gamestates when the score is tied and the teams are playing with the same number of players. It shows the proportion of each total earned by the away team in both the first and second halves, with the overall totals (both teams combined) in parentheses.

2013–2014    Goals%          xGoals%          Shots%
1st Half     44.8% (266)     42.3% (282.9)    43.4% (2948)
2nd Half     34.8% (184)     37.4% (168.6)    39.7% (1654)
P-value      0.017           —                0.007

The away team consistently garners 42% to 45% of these primary offensive stats during the first half, and then drops down to the 35%-to-40% range in the second half. For the proportions of goals and shots, those differences are statistically significant (there is no simple test for xGoals%, but it is probably statistically significant as well).
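I didn’t spell out the test above, but a one-sided two-proportion z-test on counts backed out of the table (away share times the total in parentheses) lands essentially on the published p-values. A sketch using statsmodels; the backed-out counts are rounded, so the shots p-value comes out a hair off:

```python
from statsmodels.stats.proportion import proportions_ztest

# Goals: away teams scored ~119 of 266 tied-gamestate goals in first halves
# (44.8%) and ~64 of 184 in second halves (34.8%).
z, p = proportions_ztest(count=[119, 64], nobs=[266, 184], alternative="larger")
print(round(p, 3))  # 0.017

# Shots: ~1279 of 2948 in first halves vs. ~657 of 1654 in second halves.
z, p = proportions_ztest(count=[1279, 657], nobs=[2948, 1654], alternative="larger")
print(round(p, 3))  # 0.008, vs. the 0.007 above (rounding in the counts)
```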

My instinct is that away teams are capable of playing in the second half as they do in the first half, and that these discrepancies are a product of conscious decision making by away coaches and players. Teams likely change strategy in the second half to preserve a tie. Playing more openly would ostensibly increase the chances of both a loss and win, while decreasing the chances of a tie. However, I would think based on the data above that it would increase the chances of a win more so than the chances of a loss. Since a win would earn the away team an extra two points, while a loss would cost it just one, my gut says teams should go for it more often.
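To put rough numbers on that logic, here is a sketch with purely hypothetical outcome probabilities (illustrative, not fitted to the data above). Because a win pays 3 points and a tie pays 1, converting tie probability into an even split of wins and losses nets half a point per unit of probability moved, so an open approach can pay even if it adds as much loss risk as win probability.

```python
# Hypothetical second-half outcome probabilities for a road team level at
# halftime; illustrative only, not estimated from the table above.
conservative = {"win": 0.20, "tie": 0.50, "loss": 0.30}
aggressive   = {"win": 0.27, "tie": 0.36, "loss": 0.37}

def expected_points(p):
    return 3 * p["win"] + 1 * p["tie"]  # losses pay zero

print(expected_points(conservative))  # 1.10
print(expected_points(aggressive))    # 1.17
```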

Are away teams playing conservatively because mindless soccer conventionality tells them that it’s okay to get one point on the road? Is this the self-detrimental risk aversion that plagues coaches in other sports, or are these numbers missing something that could justify the conservative play?

I can’t say that I’ve proven anything, but these data suggest the former.

Looking for the model-busting formula

Well that title is a little contradictory, no? If there’s a formula to beat the model then it should be part of the model and thus no longer a model buster. But I digress. That article about RSL last week sparked some good conversation about figuring out what makes one team’s shots potentially worth more than those of another team. RSL scored 56 goals (by their own bodies) last season, but were only expected to score 44, a 12-goal discrepancy. Before getting into where that came from, here’s how our Expected Goals data values each shot:

  1. Shot Location: Where the shot was taken
  2. Body part: Headed or kicked
  3. Gamestate: xGD is calculated in total, and also specifically during even gamestates when teams are most likely playing more, shall we say, competitively.
  4. Pattern of Play: What the situation on the field was like. For instance, shots taken off corner kicks have a lower chance of going in, likely due to a packed 18-yard box. These things are considered, based on the Opta definitions for pattern of play.

But these exclude some potentially important information, as Steve Fenn and Jared Young pointed out. I would say, based on their comments, that the two primary hindrances to our model are:

  1. How to differentiate between the “sub-zones” of each zone. As Steve put it, was the shot from the far corner of Zone 2, more than 18 yards from goal? Or was it from right up next to zone 1, about 6.5 yards from goal?
  2. How clean a look the shooter got. A proportion of blocked shots could help to explain some of that, but we’re still missing the time component and the goalkeeper’s positioning. How much time did the shooter have to place his shot and how open was the net?

Unfortunately, I can’t go get a better data set right now so hindrance number 1 will have to wait. But I can use the data set that I already have to explore some other trends that may help to identify potential sources of RSL’s ability to finish. My focus here will be on their offense, using some of the ideas from the second point about getting a clean look at goal.

Since we have information about shot placement, let’s look at that first. I broke down each shot on target by which sixth of the goal it targeted to assess RSL’s accuracy and placement. Since the 2013 season, RSL is second in the league in getting its shots on goal (37.25%), and among those shots, RSL places the ball better than any other team. Below is a graphic of the league’s placement rates versus those of RSL over that same time period. (The corner shots were consolidated for this analysis because it didn’t matter to which corner the shot was placed.)

Placement Distribution - RSL vs. League


RSL obviously placed shots where the keeper was least likely to be: the corners. That’s a good strategy, I hear. If I include shot placement in the model, RSL’s 12-goal difference in 2013 completely evaporates. The new model expected them to score 55.87 goals in 2013, almost exactly the 56 they actually scored.

Admittedly, it isn’t earth-shattering news that teams score by shooting at the corners, but I still think it’s important. In baseball, we sometimes assess hitters and pitchers by their batting average on balls in play (BABIP), a success rate during specific instances only when the ball is contacted. It’s obvious that batters with higher BABIPs will also have higher overall batting averages, just like teams that shoot toward the corners will score more goals.

But just because it is obvious doesn’t mean that this information is worthless. On the contrary, baseball’s sabermetricians have figured out that BABIP takes a long time to stabilize, and that a player who is outperforming or underperforming his BABIP is likely to regress. Now that we know that RSL is beating the model due to its shot placement, this raises the question: do accuracy and placement stabilize at the team level?

To some degree, yes! First, there is a relationship between a team’s shots on target totals from the first half of the season and the second half of the season. Between 2011 and 2013, the correlation coefficient for 56 team-seasons was 0.29. Not huge, but it does exist. Looking further, I calculated the differences between teams’ expected goals in our current model and teams’ expected goals in this new shot placement model. The correlation from first half to second half on that one was 0.54.
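For those playing along at home, the split-half check is simple to run. A sketch, assuming a hypothetical team_halves.csv with one row per team-season (2011–2013) and first-half/second-half columns; none of these file or column names are from our actual data.

```python
import pandas as pd

df = pd.read_csv("team_halves.csv")  # hypothetical file and columns

# Shots-on-target rate, first half of season vs. second half (r ~ 0.29).
print(df["sot_rate_1st"].corr(df["sot_rate_2nd"]))

# Gap between the placement model's xG and the current model's xG (r ~ 0.54).
print(df["placement_gap_1st"].corr(df["placement_gap_2nd"]))
```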

To summarize, getting shots on goal can be repeated to a small degree, but where those shots are placed in the goal can be repeated at the team level. There is some stabilization going on. This gives RSL fans hope that at least some of this model-busting is due to a skill that will stick around.

Of course, that still doesn’t tell us why RSL is placing shots well as a team. Are their players more skilled? Or is it the system that creates a greater proportion of wide-open looks?

Seeking details that may indicate a better shot opportunity, I will start with assisted shots. A large proportion of assisted shots may indicate that a team will find open players in front of net more often, thus creating more time and space for shots. However, an assisted shot is no more likely to go in than an unassisted one, and RSL’s 74.9-percent assist rate is only marginally better than the league’s 73.1 percent, anyway. RSL actually scored about six fewer goals than expected on assisted shots, and six more goals than expected on unassisted shots. It becomes apparent that we’re barking up the wrong tree here.*

Are some teams more capable of not getting their shots blocked? If so, then those teams would likely finish better than the league average. One little problem with this theory is that RSL gets its shots blocked more often than the league average. Plus, in 2013, blocked-shot percentages from the first half of the season had a (statistically insignificant) negative correlation with blocked-shot percentages in the second half of the season, suggesting that blocked shots are influenced more by randomness and the defense than by the offense taking the shots.

Maybe some teams get easier looks by forcing rebounds and following them up efficiently. Indeed, in 2013 RSL led the league in “rebound goals scored” with nine, where a rebounded shot is one that occurs within five seconds of the previous shot. That beat their expected goals on those particular shots by 5.6 goals. However, earning rebounds does not appear to be much of a skill, and neither does finishing them. The correlation between first-half and second-half rebound chances was a meager–and statistically insignificant–0.13, while the added value of a “rebound variable” to the expected goals model was virtually unnoticeable. RSL could be the best team at tucking away rebounds, but that’s not a repeatable league-wide skill. And much of that 5.6-goal advantage is explained by the fact that RSL places the ball well, regardless of whether or not the shot came off a rebound.

Jared did some research for us showing that teams that get an extremely high number of shots within a game are less likely to score on each shot. It probably has something to do with going for quantity over quality, and possibly with playing from behind and having to fire away against a packed box. While that applies within a game, it does not seem to apply over the course of a season. Between 2011 and 2013, the correlation between a team’s attempts per game and its finishing rate per attempt was virtually zero.

If RSL spends a lot of time in the lead and very little time playing from behind–true for many winning teams–then its chances may come more often against stretched defenses. RSL spent the fourth most minutes in 2013 with the lead, and the fifth fewest minutes playing from behind. In 2013, there was a 0.47 correlation between teams’ abilities to outperform Expected Goals and the ratio of time they spent in positive versus negative gamestates.

If RSL’s boost in scoring comes mostly from those times when they are in the lead, that would be bad news since their Expected Goals data in even gamestates was not impressive then, and is not impressive now. But if the difference comes more from shot placement, then the team could retain some of its goal-scoring prowess. 8.3 goals of that 12-goal discrepancy I’m trying to explain in 2013 came during even gamestates, when perhaps their ability to place shots helped them to beat the expectations. But the other 4-ish additional goals likely came from spending increased time in positive gamestates. It is my guess that RSL won’t be able to outperform their even gamestate expectation by nearly as much this season, but at this point, I wouldn’t put it past them either.

We come to the unsatisfying conclusion that we still don’t know exactly why RSL is beating the model. Maybe the players are more skilled, maybe the attack leaves defenses out of position, maybe it spent more time in positive gamestates than it “should have.” And maybe RSL just gets a bunch of shots from the closest edge of each zone. Better data sets will hopefully sort this out someday.

*This doesn’t necessarily suggest that assisted shots have no advantage. It could be that assisted shots are more commonly taken by less-skilled finishers, and that unassisted shots are taken by the most-skilled finishers. However, even if that is true, it wouldn’t explain why RSL is finishing better than expected, which is the point of this article.

Real Salt Lake: Perennial Model Buster?

If you take a look back at 2013’s expected goal differentials, probably the biggest outlier was MLS Cup runner-up Real Salt Lake. Expected to score 0.08 fewer goals per game than its opponents, RSL actually scored 0.47 more goals per game. That translates to a discrepancy of about 19 unexplained goals for the whole season. This year, RSL finds itself second in the Western Conference with a massive goal differential of 0.80. However, like last year, its expected goal differential is lagging irritatingly behind at –0.77.

There are two extreme explanations for RSL’s discrepancy in observed versus expected performance, and while the truth probably lies in the middle, I think it’s valuable to start the discussion at the extremes and move in from there.

It could be that RSL plays a style and has the personnel to fool my expected goal differential statistic. Or, it could be that RSL is one lucky son of a bitch. Or XI lucky sons of bitches. Whatever.

Here are some ways that a team could fool expected goal differential:

  1. It could have the best fucking goalkeeper in the league.
  2. It could have players that simply finish better than the league average clip in each defined shot type.
  3. It could have defenders that make shots harder than they appear to be in each defined shot type–perhaps by forcing attackers onto their weak feet, or punching attackers in the balls whilst winding up.
  4. That’s about it.

We are pretty sure that RSL does indeed have the best goalkeeper in the league, and Will and I estimated Nick Rimando’s value at anywhere between about six and eight goals above average* during the 2013 season. That makes up a sizable chunk of the discrepancy, but still leaves at least half unaccounted for.

The finishing ability conversation is still a controversial one, but that’s where we’re likely to see the rest of the difference. RSL scored 56 goals (off their own bodies rather than those of their opponents), but were only expected to score about 44. That 12-goal difference can be conveniently explained by their five top scorers–Alvaro Saborio, Javier Morales, Ned Grabavoy, Olmes Garcia, and Robbie Findley–who scored 36 goals between them while taking shots valued at 25.8 goals. (See: Individual Expected Goals. Yes, it’s biased to look at just the top five goal scorers, but read on.)

Here’s the catch, though. Using the sample of 28 players that recorded at least 50 shots last season and at least 5 shots this season, the correlation coefficient for the goals above expectation statistic is –0.43. It’s negative. Basically, players that were good last year have been bad this year, and players that were bad last year have been good this year. That comes with some caveats–and if the correlation stays negative then that is a topic fit for another whole series of posts–but for our purposes here it suggests that finishing isn’t stable, and thus finishing isn’t really a reliable skill. The fact that RSL players have finished well for the last 14 months means very little for how they will finish in the future.

I said there was a third way to fool expected goal differential–defense–so I should point out that, once we account for Rimando, RSL’s defense allowed about as many goals as expected. Thus the primary culprits of RSL’s ability to outperform expected goal differential have been Nick Rimando and its top five scorers. So now we can move on to the explanation at the other extreme: luck.

RSL has been largely lucky, using the following definition of lucky: scoring goals they can’t hope to score again. A common argument I might expect is that no team could be this “lucky” for this long. If you’re a baseball fan, I urge you to read my piece on Matt Cain, but if not, here’s the point: 19 teams have played soccer in MLS the past two seasons. The probability that at least one of them gets lucky for 1.2 seasons’ worth of games is actually quite high. RSL very well may be that team–on offense, anyway.

Unless RSL’s top scorers are all the outliers–which is not impossible, but unlikely–then RSL is likely in for a rude awakening, and a dogfight for a playoff spot.


*Will’s GSAR statistic is actually Goals Saved Above Replacement, so I had to calibrate.

Predictive strength of Expected Goals 2.0

It is my opinion that a statistic capable of predicting itself—and perhaps more importantly predicting future success—is a superior statistic to one that only correlates to “simultaneous success.” For example, a team’s actual goal differential correlates strongly to its current position in the table, but does not predict the team’s future goal differential or future points earned nearly as well.

I created the expected goals metrics to be predictive at the team level, so without further ado, let’s see how the 2.0 version did in 2013.

Mid-season Split

In predicting future goals scored and allowed, the baseline is to use past goals scored and allowed. In this case, expected goals beats actual goals in its predictive ability by quite a bit.*

Predictor           Response      R²     P-value
GF (first 17)       GF (last 17)  0.155  0.099
xGF (first 17)      GF (last 17)  0.409  0.004
GA (first 17)       GA (last 17)  0.239  0.024
xGA (first 17)      GA (last 17)  0.604  0.000
GD (first 17)       GD (last 17)  0.487  0.000
xGD (first 17)      GD (last 17)  0.800  0.000
xGD (by gamestate)  GD (last 17)  0.805  0.000

Whether you’re interested in offense, defense, or differential, Expected Goals 2.0 outshone actual goals in its ability to predict the future (the future in terms of goal scoring, that is). That 0.800 R-squared figure for xGD 2.0 even beats xGD 1.0, which was calculated at 0.624 by one Steve Fenn. One interesting note: segregating expected goals into even and non-even gamestates gained very little additional predictive ability (R-squared = 0.805).
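For the curious, the regressions behind these tables look something like the sketch below, following the method described in the first footnote: a home-games-played control entered first, then the predictor, with significance read off a sequential (Type I) sum of squares. The file and column names here are hypothetical.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("team_seasons.csv")  # hypothetical: one row per team-season

# The control enters first so the sequential ANOVA credits it before xGD.
fit = smf.ols("gd_last17 ~ home_games_first17 + xgd_first17", data=df).fit()
print(sm.stats.anova_lm(fit))  # Type I (sequential) sums of squares
print(fit.rsquared)
```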

Early-season Split

Most of those statistics above showed some predictive ability over 17 games, but what about over fewer games? How early do these goal-scoring statistics become stable predictors of future goal scoring? I reduced the games played for my predictor variables down to four games—the point of the season we are currently at for most teams—and here are those results.

Predictor           Response      R²     P-value
GF (first 4)        GF (last 30)  0.022  0.538
xGF (first 4)       GF (last 30)  0.140  0.093
GA (first 4)        GA (last 30)  0.003  0.835
xGA (first 4)       GA (last 30)  0.236  0.033
GD (first 4)        GD (last 30)  0.015  0.616
xGD (first 4)       GD (last 30)  0.227  0.028
xGD (by gamestate)  GD (last 30)  0.247  0.104**

Some information early on is just noise, but we see statistically significant correlations from expected goals on defense (xGA) and in differential (xGD) after only four games! Again, we don’t see much improvement, if any at all, from separating xGD into even and non-even gamestates. If we were to look at points in the table as a response variable, or perhaps include information on minutes spent in each gamestate, we might see something different there, but that’s for another week!

Check out the updated 2014 Expected Goals 2.0 tables, which now just might be meaningful in predicting team success for the rest of the season.

*A “home-games-played” variable was used as a control to account for teams whose early schedules were weighted toward one extreme. R-squared values and p-values were derived from a sequential sum of squares, thus reducing the effect of home games played on the p-values.

**Though the R-squared value was higher, splitting up xGD into even and non-even game states seemed to muddle the p-values. The regression was unsure as to where to apportion credit for the explanation, essentially. 

Introducing Expected Goals 2.0 and its Byproducts

Many of the features listed below from our shot-by-shot data for 2013 and 2014 can be found above by hovering over the “Expected Goals 2.0” link.

Last month, I wrote an article explaining our method for calculating Expected Goals 1.0, based only on the six shot locations. Now, we have updated our methods with the cool, new, sleek Expected Goals 2.0.

Recall that in calculating expected goals, the point is to use shot data to effectively suggest how many goals a team or player “should have scored.” This gives us an idea of how typical teams and players finish, given certain types of opportunities, and then allows us to predict how they might do in the future. Using shot locations, if teams are getting a lot of shots from, say, zone 2 (the area around the penalty spot), then they should be scoring a lot of goals.

Expected Goals 2.0 for Teams

Now, in the 2.0 version, it’s not only about shot location. It’s also about whether or not shots are being taken with the head or the foot, and whether or not they come from corner kicks. Data from the 2013 season suggest that not only are header and corner kick shot totals predictive of themselves (stable metrics), but they also lead to lower finishing rates. Thus, teams that fare exceptionally well or poorly in these categories will now see changes in their Expected Goals metrics.

Example: In 2013, Portland took a low percentage of its total shots as headers (15.4%), as well as a low percentage of its total shots from corner kicks (12.3%). Conversely, it allowed higher percentages of those types of shots to its opponents (19.2% and 15.0%, respectively). Presumably, the Timbers’ style of play encourages this behavior, and this is why the 2.0 version of Expected Goal Differential (xGD) liked the Timbers more than the 1.0 version did.

We also calculate Expected Goals 2.0 contextually–specifically during periods of an even score (even gamestate)–for your loin-tickling pleasure.

Expected Goals 2.0 for Players

Another addition from the new data we have is that we can assess players’ finishing ability while controlling for the various types of shots. Players’ goal totals can be compared to their Expected Goals totals in an attempt to quantify their finishing ability. Finishing is still a controversial topic, but it’s this type of data that will help us to separate out good and bad finishers, if those distinctions even exist. Even if finishing is not a repeatable skill, players with consistently high Expected Goals totals may be seen as players that get themselves into dangerous positions on the pitch–perhaps a skill in its own right.

The other primary player influencing any shot is the main guy trying to stop it, the goalkeeper. This data will someday soon be used to assess goalkeepers’ saving abilities, based on the types of shot taken (location, run of play, body part), how well the shot was placed in the goal mouth, and whether the keeper gave up a dangerous rebound. Thus for keepers we will have goals allowed versus expected goals allowed.

Win Expectancy

Win Expectancy is something that exists for both Major League Baseball and the National Football League, and we are now introducing it here for Major League Soccer. When the away team takes the lead in the first 15 minutes, what does that mean for their chances of winning? These are the questions that can be answered by looking at past games in which a similar scenario unfolded. We will keep Win Expectancy charts updated based on 2013 and 2014 data.
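Under the hood, these charts are just conditional frequencies over historical game states. A minimal sketch, assuming a hypothetical game_states.csv with one row per match recording the score at minute 15 and the final result (the file and column names are made up):

```python
import pandas as pd

states = pd.read_csv("game_states.csv")  # hypothetical file and columns

# All past matches where the away side led by exactly one goal at minute 15...
scenario = states[states["away_goals_15"] - states["home_goals_15"] == 1]

# ...and the share of those matches the away team went on to win.
print((scenario["final_result"] == "away_win").mean())
```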

Season Preview: Sporting Kansas City

Sporting Kansas City has been a lot of things in its 18-year existence. It’s been good and bad, in the Western Conference and the Eastern Conference, and it’s been the Wizards and the “Wiz.” However, a more recent transformation began with the hiring of coach and former player Peter Vermes, and continued with the ensuing rebranding of the club. Below you can see the significant boost in attendance that came with a new name and a new park in 2011:

Season  Regular season  Playoffs
2007    11,586          12,442
2008    10,686          10,385
2009    10,053          DNQ
2010    10,287          DNQ
2011    17,810          19,702
2012    19,364          20,894
2013    19,709          20,777

This change culminated in a rapidly expanding fan base that is just as fervent and rabid as any in MLS, anchored by The Cauldron. The club has seen a lot of success in the past two seasons with a US Open Cup win in 2012 and last season’s MLS Cup win at Sporting Park. Things are looking up for Sporting, and this year should yield more of that same success for the defending MLS Cup Champions.

2013 Starting XI

Sporting KC's best XI in 2013

Roster churn: Sporting KC returns 87.7% of its minutes played in 2013 (1st in the East, 2nd in MLS)

Transactions

Player Added     Position  From  |  Player Lost    Position  To
Sal Zizzo        M         POR   |  Jimmy Nielsen  GK        Retirement
Andy Gruenebaum  GK        CLB   |  Kyle Miller    D         Waived
                                 |  Brendan Ruiz   D         Waived

2014 Preview


Median age: 25.5
*Designated player

Major League Soccer has seen teams rise and fall from season to season as quickly as in any other sport. A year ago at this time, most of us thought that the San Jose Earthquakes were a favorite in the West, coming off a 72-goal, 66-point performance in 2012’s regular season. We also probably thought the Portland Timbers would be lucky to slip into the Wildcard play-in game. Previous point totals and playoff results, obviously, must be taken with a grain of salt.

While winning the MLS Cup was likely one of the most important moments in the lives of many Sporting players, it’s not nearly as important as shot data for predicting future success—and SKC limited scoring opportunities better than anyone in the league. Sporting also came in second to the Galaxy in the run for the coveted Golden TI-89 Trophy—given for the best expected goal differential in MLS last season—and it returns players that made up 87.7 percent of the team’s total minutes played last season, good for second in MLS behind Real Salt Lake’s 90.5 percent.

It should be no surprise that teams which finish a season well do little to rock the boat for the coming season. But expected goal differential suggests that Sporting is justified in keeping its unit together (+18.3 xGD), while RSL’s success with its current squadron may not be as sustainable (-4.1 xGD).

While Sporting is losing 12.3 percent of its 2013 playing time, the loss of Jimmy Nielsen to retirement makes up most of that (9.1 percent of the team’s total minutes). Considering that our goalkeeper ratings here on the site, as well as those by our own Will Reno, didn’t like Nielsen much in 2013, this could actually make Sporting better in 2014. That’s scary.

Andy Gruenebaum probably ought to be the opening day starter between the posts, but if Vermes goes with Eric Kronberg, we can suppose it’s because he’s good, and we can suppose that both keepers are better than Nielsen.

Whether Vermes goes with Gruenebaum or Kronberg, we all know it’s that SKC defense that makes the biggest difference. Led by USMNT centerback Matt Besler, Sporting allowed the fewest goals in MLS (30), and more importantly for their 2014 projections, the fewest shots (8.9/game) and the lowest expected goals against (29.8).

Before we leave the defensive part of the pitch, I would be remiss if I did not mention Besler’s secret weapon. Despite getting paid mostly to stop others from scoring, Besler can become an offensive weapon with his throw-ins. Across MLS, about 100 shots were taken directly following throw-ins, and 14 of those were scored. Sporting represented about one-quarter of the entire league’s offensive production from the throw-in, thanks in large part to Besler’s triceps.

Though Sporting’s defense was best in the league, there is room to grow offensively. SKC ranked 5th in MLS in expected goals, but 11th in actual goals. A narrative worth following this season is the relationship between Vermes and his designated player Claudio Bieler. The Argentine/Ecuadorian striker led Sporting with 10 goals in 2013, but he scored only one of those after July 13th. Bieler found himself out of the lineup often as Sporting was making its push for the Supporters’ Shield (for which it finished 2nd behind New York). Vermes justified one such benching simply by saying that it was a “tactical decision.” Bieler may be Sporting’s best goal scorer, but first he has to make the coach happy and actually play. Our Expected Goals 2.0 suggests that Bieler scored 30 percent more goals than an average player would have, given his opportunities. That was good for 16th in MLS among those with at least 50 shots. Kansas City fans could see more goals from its team in 2014 if Bieler can rack up at least 30 starts and maintain last year’s finishing pace.

Another key cog in the offensive machine is Graham Zusi. Though he’s known mostly as a facilitator for others’ shots, Zusi’s six goals in 2013 were a bonus over the 3.7 an average player would be expected to score, given his shot selection. Though the merits of the assists statistic are up for debate, what is not is that Zusi is immensely valuable to Sporting’s possession-based style of play that generates the most efficient shot ratios in the league. And his hair, oh his hair.

While winning the MLS Cup last year is not, by itself, a great predictor of 2014 success for Sporting Kansas City, adding in the fact that their championship was backed by strong predictive statistics means a lot more, and we are likely to see another championship run from Sporting this season. Sporting has few questions to answer, and kicks off 2014 as the favorite in the East. If Bieler settles in for a full season, well, we could see back-to-back MLS Cups in The Blue Hell.

Crowd Sourcing Results

1st place in the Eastern Conference; Sporting Kansas City received 226 of 404 (55.9%) first place votes, and 93.6% of voters felt that Sporting would make the playoffs.

Season Preview: San Jose Earthquakes

Soccer in San Jose has a unique history, going back to the Clash earning MLS’s first ever victory in 1996. The franchise changed its name to the Earthquakes in 1999, and a few years later it started winning MLS Cups thanks to a Landon Donovan-sized gift from Bayer Leverkusen. A young Donovan helped to lead San Jose to MLS Cup wins in 2001 and 2003. But after the 2005 season, the ownership group grew tired of failing to embezzle funds from Silicon Valley’s taxpayers and moved head coach Dominic Kinnear and the rest of the team to Houston. The Earthquakes were not reborn until the 2008 season, and since then they have been mired in a streak of mostly 6th- and 7th-place finishes, with an out-of-the-blue, historic 2012 season sprinkled in.

2013 Starting XI

San Jose’s best XI in 2013

Transactions

Player Added            Position  From                 |  Player Lost         Position  To
Jean-Baptiste Pierazzi  M         Out of nowhere       |  Ramiro Corrales     M         Retired
Atiba Harris            F         Trade from Colorado  |  Nana Attakora       D         Option declined
Billy Schuler           F         Weighted lottery     |  Dan Gargan          D         Option declined
Tommy Thompson          M         Homegrown            |  Marcus Tracy        F         Option declined
Shaun Francis           D         Re-Entry Stage 2     |  Evan Newton         GK        Option declined
Brandon Barklage        D         Re-Entry Stage 2     |  Peter McGlynn       D         Option declined
Bryan Meredith          GK        Free                 |  Cesar Diaz Pizarro  F         Option declined
J.J. Koval              M         SuperDraft           |  Mehdi Ballouchy     M         Out of contract
                                                       |  Justin Morrow       D         Traded to Toronto FC
                                                       |  Rafael Baca         M         Transferred to Cruz Azul
                                                       |  Jaime Alas          M         Loan expired
                                                       |  Marvin Chávez       M         Traded to Colorado
                                                       |  Steven Beitashour   D         Traded to Vancouver

Roster churn: San Jose returns 68.8% of its minutes played from 2013, 14th in MLS and 6th in the Western Conference.

2014 Preview

Coming off a 72-goal, 66-point performance in 2012’s regular season, many thought San Jose would likely find the playoffs again, and even be in the running for an MLS Cup trophy. But 2013 saw the Earthquakes miss out on the playoffs completely, abruptly ending their hot run from the second half of the season. Striker Chris Wondolowski’s past two seasons mirrored those of the team: he led the league with 27 goals in 2012, then failed to reach half that tally in 2013. Our upgraded shot locations data suggest that Wondo scored just 88.5 percent of the goals that a league-average player would be expected to score based on his shot opportunities. Will 2014 see the return of Wondolowski and San Jose to one of the top seeds in the West, or will San Jose prove to be the franchise that has placed 6th or 7th in five of the past six seasons?

San Jose was a perplexing club from a statistical standpoint last season. Our expected goal differential statistics (xGD) really liked the fact that the Earthquakes earned 4.8 shots per game from zone 2—the dangerous area around the penalty spot—which was good for second best in the entire league. San Jose finished with the league’s third-best xGD at +6.8. Those metrics seem to suggest that the second half of last season, when San Jose earned 33 points over 17 games, was more representative of their true ability. Indeed, it’s worth noting that San Jose has been one of the best teams in the league for three-quarters of the past two seasons.

Before we get moving too quickly, though, we have some new data to bring the Goonies partway back to earth. This year’s version of shot locations data breaks shots down by how they were taken, specifically headed versus kicked. With almost all the data in now, it turns out that San Jose took nearly 23 percent of its shots as headers—second only to Seattle—but headers are finished at about half the rate of kicked shots. The upgraded xGD 2.0 pegged the Earthquakes at about +2.0 in 2013, which placed them fifth in the West. Fitting, as our readers picked San Jose to finish 5th this coming season.

Of course, statistics from last year have a hard time determining the effect of losing players like Steve Beitashour and Marvin Chávez (the team’s assist leader in 2012). The major scoring pieces are still there in Wondolowski, Alan Gordon, and Steven Lenhart, but it’s harder to peg down the importance and replaceability of those midfielders and defenders.

If San Jose can continue to generate dangerous opportunities, as they have in each of the past two seasons, then look for the Earthquakes to regain a playoff spot in 2014.

Crowd Sourcing Results

5th place in the Western Conference; 138 voters (31.4%) felt that San Jose will be either a 4th or 5th seed in the playoffs in 2014, but 228 voters (56.4%) projected them to miss the playoffs completely.

Calculating Expected Goal Differential 1.0

The basic premise of expected goal differential is to assess how dangerous a team’s shots are, and how dangerous its opponent’s shots are. A team that gets a lot of dangerous shots inside the box, but doesn’t give up such shots on defense, is likely to be doing something tactically or skillfully, and is likely to be able to reproduce those results.

The challenge to creating expected goal differential (xGD), then, is to obtain data that measures the difficulty of each shot all season long. Our xGD 1.0 utilized six zones on the field to parse out the dangerous shots from those less so. Soon, we will create xGD 2.0 in which shots are not only sorted by location, but also by body part (head vs. foot) and by run of play (typical vs. free kick or penalty). Obviously kicked shots are more dangerous than headed shots, and penalty kicks are more dangerous than other shots from zone two, the location just behind the six-yard box.

So now, for the calculations.

Across the entire league, for all 8,291 shots taken in 2013, we calculate the proportion of shots from each zone that were finished (scored):

Location  Goals  Shots  Finish%
One       129    415    31.1%
Two       451    2547   17.7%
Three     100    1401   7.1%
Four      85     1596   5.3%
Five      51     2190   2.3%
Six       5      142    3.5%

We see that shots from zones one and two are the most dangerous, while shots from farther out or from wider angles are less dangerous. To calculate a team’s offensive “dangerousness,” we count the number of shots the team attempted from each zone, and then multiply each total by the league’s finishing rate for that zone. As an example, here we have Sporting Kansas City’s offensive totals:

Location  Goals  Attempts  Finish%  ExpGoals
One       5      18        31.1%    5.6
Two       29     160       17.7%    28.3
Three     5      78        7.1%     5.6
Four      3      97        5.3%     5.2
Five      2      120       2.3%     2.8
Six       1      17        3.5%     0.6
Total     45     490       9.2%     48.1

Offensively, if SKC had finished at the league average rate from each respective zone, then it would have scored about 48 goals. Now let’s focus on SKC’s defensive shot totals:

Location  Goals  Attempts  Finish%  ExpGoals
One       4      13        31.1%    4.0
Two       17     95        17.7%    16.8
Three     4      54        7.1%     3.9
Four      4      56        5.3%     3.0
Five      1      84        2.3%     2.0
Six       0      4         3.5%     0.1
Total     30     306       9.8%     29.8

Defensively, had SKC allowed the league average finishing rate from each zone, it would have allowed about 30 goals (incidentally, that’s exactly what it did allow, ignoring own goals).

Subtracting expected goals against from expected goals for, we get a team’s expected goal differential. Expected goal differential works so well as a predictor because teams are more capable of repeating their ability to get good (or bad) shots for themselves, and allow good (or bad) shots to their opponents. An extreme game in which a team finishes a high percentage of shots won’t sway that team’s xGD, nor that of its opponents, making xGD a better indicator of “true talent” at the team level.
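The whole xGD 1.0 calculation fits in a few lines. Here it is as a Python sketch, using the league finishing rates and SKC’s zone totals from the tables above:

```python
# League-wide 2013 goals and shots by zone, from the first table above.
LEAGUE = {1: (129, 415), 2: (451, 2547), 3: (100, 1401),
          4: (85, 1596), 5: (51, 2190), 6: (5, 142)}
FINISH = {zone: goals / shots for zone, (goals, shots) in LEAGUE.items()}

def expected_goals(attempts_by_zone):
    """Attempts in each zone times the league finishing rate for that zone."""
    return sum(FINISH[zone] * n for zone, n in attempts_by_zone.items())

skc_for     = {1: 18, 2: 160, 3: 78, 4: 97, 5: 120, 6: 17}
skc_against = {1: 13, 2: 95, 3: 54, 4: 56, 5: 84, 6: 4}

print(round(expected_goals(skc_for), 1))      # 48.1
print(round(expected_goals(skc_against), 1))  # 29.8
print(round(expected_goals(skc_for) - expected_goals(skc_against), 1))  # 18.3
```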

As for xGD 2.0, coming soon to a laptop near you, the main difference is that there will be additional shot types to consider. Instead of just six zones, now there will be six zones broken down by headed and kicked shots (12 total zones) in addition to free kick—and possibly even penalty kick—opportunities (adding, at most, four more shot types). As with xGD 1.0, a team’s attempts for each type of shot will be multiplied by the league’s average finishing rates, and then those totals will be summed to find expected goals for and expected goals against.

Sporting adds Gruenebaum to twiddle thumbs

After Jimmy Nielsen retired on a high note, Sporting Kansas City wasted little time trading for Columbus’ starting No. 1, Andy Gruenebaum. SKC gave up a second-round draft pick to acquire Gruenebaum. Though a second-round pick in MLS is probably not as valuable as it is in, say, the NFL, Sporting has now essentially spent a draft pick on a backup goalkeeper because Vermes named Eric Kronberg the starter for 2014.

“The last two years, [Kronberg’s] been more than ready to try to assume the position,” Vermes said. “The difference is that Jimmy’s been on top of his game.”

Now, I haven’t seen Kronberg play at all because, well, who has? He’s only played 382 minutes over eight seasons—about the equivalent of four full starts. But Vermes’ decision still perplexes me. For instance, Kronberg has played behind Nielsen for some time, and based on 2013 data, Nielsen was not a very good goalkeeper. This from our own Will Reno and this from our shot locations data both suggest that Nielsen was basically “replacement level” this past season. Kronberg is not likely to be much better, if at all, since he was playing behind Nielsen.

Then there’s Gruenebaum. I talked about him on the podcast last week, but here’s the short of it. That same data up there suggests Gruenebaum was one of the better goalkeepers in MLS last season. Both Will and I independently arrived at our statistical ratings, and Will ranked Gruenebaum as the second-best keeper on a per-game basis, while I ranked him as the third-best in the league (among regular starters, by “Goal Ratio”). Nielsen was something like 16th. Kronberg watched Nielsen from the bench.

Obviously, I haven’t been watching Kronberg train as I am not Peter Vermes. But two independent sets of keeper ratings make Gruenebaum sound like a top shelf No. 1, making this a puzzling decision from my, admittedly limited, perspective.