Real Salt Lake: Perennial Model Buster?

If you take a look back at 2013’s expected goal differentials, probably the biggest outlier was MLS Cup runner-up Real Salt Lake. Expected to score 0.08 fewer goals per game than its opponents, RSL actually scored 0.47 more goals per game than its opponents. That is a gap of 0.55 goals per game, which over a 34-game season translates to a discrepancy of about 19 unexplained goals. This year, RSL finds itself second in the Western Conference with a goal differential of a massive 0.80. However, like last year, the expected goal differential is lagging irritatingly behind at –0.77.

There are two extreme explanations for RSL’s discrepancy in observed versus expected performance, and while the truth probably lies in the middle, I think it’s valuable to start the discussion at the extremes and move in from there.

It could be that RSL plays a style and has the personnel to fool my expected goal differential statistic. Or, it could be that RSL is one lucky son of a bitch. Or XI lucky sons of bitches. Whatever.

Here are some ways that a team could fool expected goal differential:

  1. It could have the best fucking goalkeeper in the league.
  2. It could have players that simply finish better than the league average clip in each defined shot type.
  3. It could have defenders that make shots harder than they appear to be in each defined shot type–perhaps by forcing attackers onto their weak feet, or punching attackers in the balls whilst winding up.
  4. That’s about it.

We are pretty sure that RSL does indeed have the best goalkeeper in the league, and Will and I estimated Nick Rimando’s value at anywhere between about six and eight goals above average* during the 2013 season. That makes up a sizable chunk of the discrepancy, but still leaves at least half unaccounted for.

The finishing ability conversation is still a controversial one, but that’s where we’re likely to see the rest of the difference. RSL scored 56 goals (off their own bodies rather than those of their opponents), but were only expected to score about 44. Most of that 12-goal difference can be conveniently explained by their five top scorers–Alvaro Saborio, Javier Morales, Ned Grabavoy, Olmes Garcia, and Robbie Findley–who scored 36 goals between them while taking shots valued at 25.8 goals, about 10 goals above expectation. (See: Individual Expected Goals, and yes it’s biased to look at just the top five goal scorers, but read on.)

Here’s the catch, though. Using the sample of 28 players that recorded at least 50 shots last season and at least 5 shots this season, the correlation coefficient for the goals above expectation statistic is –0.43. It’s negative. Basically, players that were good last year have been bad this year, and players that were bad last year have been good this year. That comes with some caveats–and if the correlation stays negative then that is a topic fit for another whole series of posts–but for our purposes here it suggests that finishing isn’t stable, and thus finishing isn’t really a reliable skill. The fact that RSL players have finished well for the last 14 months means very little for how they will finish in the future.

Since I said there was a third way to fool expected goal differential–defense–I should point out that once we account for Rimando, RSL’s defense allowed about as many goals as expected. Thus the primary culprits of RSL’s ability to outperform expected goal differential have been Nick Rimando and its top five scorers. So now we can move on to the explanation on the other extreme, luck.

RSL has been largely lucky, using the following definition of lucky: Scoring goals they can’t hope to score again. A common argument I might expect is that no team could be this “lucky” for this long. If you’re a baseball fan, I urge you to read my piece on Matt Cain, but if not, here’s the point. 19 teams have played soccer in MLS the past two seasons. The probability that at least one of them gets lucky for 1.2 seasons worth of games is actually quite high. RSL very well may be that team–on offense, anyway.

Unless RSL’s top scorers are all the outliers–which is not impossible, but unlikely–then RSL is likely in for a rude awakening, and a dogfight for a playoff spot.

 

*Will’s GSAR statistic is actually Goals Saved Above Replacement, so I had to calibrate.

12 thoughts on “Real Salt Lake: Perennial Model Buster?”

  1. You left out some other potential reasons for xGD defiance. Mainly it’s the other side of your #3. There are blindspots in the data that drive xG, and thus blindspots in xG. For example, my pet theory is that RSL create shots on the break and/or directly off of throughballs, both of which make the resultant shot more likely to score. There are also blindspots within zones, probably best exemplified by the difference between a shot from the far corner of zone 2 versus straight on and nearly in zone 1. Not impossible that a team skews toward the better part of that zone.

    I also wouldn’t be at all surprised if Alvaro Saborio (and maybe one or two others on the team) is an xG-defying finisher.

  2. Truf. I was kind of encompassing all of the finishing stuff in part 2 with “finish better than the league average clip in each defined shot type.” Of course, RSL players would have to systematically get better shots from within each zone, whether because it was a better section of each zone, or because it was a “cleaner look.” It’s hard to account for clean looks, but the model does take into account whether a shot is from a fastbreak, corner kick, set piece, or other pattern of play, hopefully accounting for at least a little of that.

    It turns out that corner kicks are finished less often than other patterns of play, and here is some data about that:

    Typical proportion of corners: 13.4%
    RSL for: 13.1%
    RSL against: 14.7%

    Unfortunately Opta’s definition of a fastbreak is so strict that there were only 114 shots from fastbreaks in all of MLS in 2013. Thus, that doesn’t really help us much here since RSL could get a lot of clean looks on through balls that aren’t counted as fastbreaks.

    So yes, RSL could still be getting better looks from within each zone, though I’m skeptical.

    Why is it that you think Saborio can outperform his expected goals?

  3. I agree with Steve’s comment. Perhaps you could look at the number of shots they take per game which gives an indication of the defensive density into which they take their shots. Having just seen them live against Philly, they seem adept at creating chances VERY quickly once they gained possession. Whereas Philly is methodical in their movement up the field, RSL seems to choose quick strikes. Perhaps that is a road strategy, I’m not sure.

    • So here are some things I’m going to look into coming up:
      *RSL’s placement of the ball in the goal mouth (corner vs. middle and high vs. low)
      *RSL’s proportion of assisted shots, and whether or not assisted shots correlate to better finishing
      *Correlation between shots taken in game and xGoals. Though my thought is that by adjusting for gamestate (score), that would likely help control for defensive density.

  4. As a faithful RSL fan, I think that what might be missing from your model is that different MLS teams follow different strategies. RSL’s ideal game is one where the midfield passes in tight spaces and works to try to get the defense out of shape before taking a shot. This results in fewer shots taken, but the probability that an individual shot goes in will be higher because the defense isn’t set.

    A contrasting example would be a team who tries to send a lot of balls into the box and hope that their big target forwards can bang through the defenders and poke one in. This strategy would result in more shots taken, but lower chance that each individual shot goes in. I’m thinking of teams like San Jose or Houston here.

    When you lump all of those teams together to get the probability that a shot taken from a particular zone goes in, you’re missing how clear of a look the shooter is getting. As long as RSL creates clearer looks than the average MLS team, they’ll be model breakers, and it’s not because of luck or because their forwards are exceptionally more accurate than the typical MLS forward. It’s because of their style of play.

    Measuring that isn’t easy though. You’d want something like distance to nearest defender when the shot was taken, or something like that, but I don’t think that’s in Opta. But, if my hypothesis is right, you’ll find a negative correlation between number of shots taken in a game and (Goals – xGoals), since teams that take a lot of shots are likely taking lower probability shots, even conditional on the zone, body part, etc.

    Also, I think Steve is right that Saborio is likely to outperform his expected goals. He’s just more accurate than your average shot-taker in MLS, so by definition he will outperform the model. I’d bet that Wondo and Henry are also in that category.

    • Sorry, one other thing: can you explain why RSL could be lucky? I understand your point that with 19 teams playing 1.2 seasons it’s likely that one team will be lucky. But, your model is built off of individual shots, not teams or seasons. What is the probability that a team is lucky over 500+ shots? If I take 19 teams of exactly equal skill and have them take 500 shots each, the probability that one team will be a “model breaker” is pretty low, no?

    • For your second point first, consider a simplified scenario where all shots are 10% likely to go in, and 19 teams each take 500 shots. According to the binomial probability distribution, the probability that a particular team scores at least 12 goals more than the expected 50 is just 4.6%. In that sense, it would be rare for a single “pre-identified” team to get that lucky. However, with 19 teams all recording 500 shots, the probability that at least one team outperforms expectation by at least 12 goals is actually 59.1%. Which means that I would expect one team to do this randomly.

      Now, the binomial distribution is kind of a blunt tool for this example, since in reality there are various probabilities of scoring on a sequence of shots, not just one overall probability. However, the binomial distribution tends to produce a lower bound for variance, so I think it is appropriate for showing the possibility of outperforming expectation in this scenario.
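
A minimal sketch of the simplified scenario above, for anyone who wants to check the arithmetic. It assumes every shot scores with probability 10%, that each of the 19 teams takes exactly 500 shots, and that teams are independent of one another. The function names `binom_tail` and `at_least_one` are labels made up for this illustration, not part of the expected goals model.

```python
from math import comb


def binom_tail(n: int, p: float, k: int) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def at_least_one(p_single: float, teams: int) -> float:
    """P(at least one of `teams` independent teams hits an event of probability p_single)."""
    return 1 - (1 - p_single) ** teams


# Assumptions of the simplified scenario (not RSL's real numbers)
SHOTS = 500
P_GOAL = 0.10
TEAMS = 19
EXPECTED_GOALS = round(SHOTS * P_GOAL)  # 50
THRESHOLD = EXPECTED_GOALS + 12         # at least 62 goals

p_one = binom_tail(SHOTS, P_GOAL, THRESHOLD)
p_any = at_least_one(p_one, TEAMS)

print(f"One pre-identified team: {p_one:.1%}")
print(f"At least one of {TEAMS} teams: {p_any:.1%}")
```

The two printed values should land close to the 4.6% and 59.1% figures quoted above. The step from the first to the second is just the complement rule: the chance that none of 19 independent teams gets that lucky is (1 − p) raised to the 19th power.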

      • Got it; makes sense. A lot of that depends on how far RSL (or any other team) deviates from the model, then. For example, deviating by 12 goals isn’t really a “model buster,” since we kind of expect at least one team to do this. But deviating by 20 goals might be. Also, deviating over more shots would make it less likely as well (e.g. increase n to 1000 instead of 500).

        Also, I realize you were only giving a hypothetical above, but using RSL’s actual numbers the probability is much lower, I think. Specifically, over 2013 and the first 6 games of 2014, RSL has taken 515 shots and scored 67 goals. Using your expected goals in 2013 and 2014, I calculate that you expected RSL to score 9.69% of their shots in 2013 and 10.18% of their shots in 2014. Taking a weighted average of the two (weights are share of shots in 2013 (87%) and 2014 (13%)), I get a blended probability of scoring over the whole sample of 9.76%. Plug those numbers into a binomial and the probability that RSL scores at least 67 goals in 515 shots is just 0.99%. If we had 19 teams exactly like RSL, then, we would expect to see at least one of them outperform by the 16.8 goals that RSL has done only 17.1% of the time. Still not statistically significant, but significantly less likely than the 59.1% you have in your hypothetical!

        Anyway, thanks for the clarification, makes a lot more sense now! Even at 17.1%, it’s easily within the realm of possibility that RSL is just lucky.
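
The same check can be rerun with the RSL figures quoted in this comment. This is a rough sketch under the commenter's own simplification: every shot is treated as having the same blended scoring probability, and the rounded inputs quoted above (9.69%, 10.18%, 87%, 13%) are used, so the results will differ slightly from the comment's unrounded figures. The helper name `binom_tail` is hypothetical.

```python
from math import comb


def binom_tail(n: int, p: float, k: int) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Figures quoted in the comment (rounded, so results are approximate)
shots, goals = 515, 67
p_2013, p_2014 = 0.0969, 0.1018  # expected scoring rate per shot
w_2013, w_2014 = 0.87, 0.13      # share of shots taken in each season
teams = 19

# Blend the two seasons into one per-shot probability
p_blend = w_2013 * p_2013 + w_2014 * p_2014
expected_goals = shots * p_blend
overperformance = goals - expected_goals

# Chance that one given team scores at least 67, then at least one of 19
p_rsl = binom_tail(shots, p_blend, goals)
p_any = 1 - (1 - p_rsl) ** teams

print(f"Blended scoring probability: {p_blend:.2%}")
print(f"Goals above expectation: {overperformance:.1f}")
print(f"One given team: {p_rsl:.2%}")
print(f"At least one of {teams} teams: {p_any:.1%}")
```

Run as written, this should come out near the roughly 1% and 17% figures discussed in the thread. As noted earlier in the thread, a single blended probability likely understates the true variance, so these are better read as a floor on how often luck alone could produce the gap.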

    • All that said, back to your first point. It is definitely possible that RSL has an ability to outperform expected goals. In the days since writing this piece, I have been doing more research on stabilization, and my viewpoint has softened a bit on RSL. I still think they’re over-performing, but not by a full 12 goals. I think that you and Steve and Jared are spot on, and that it’s possible teams can generate “cleaner looks.” Unfortunately, it is impossible to measure the cleanliness of the look with our current data set, but I think I may have found something that RSL does well which also stabilizes to some degree. It doesn’t necessarily suggest that they get cleaner looks, but that would be one possible conclusion.

      The article will go up Friday 🙂

      • Awesome. Definitely looking forward to the article, and thanks for the amazing stats! Much appreciated.

  5. Oh, good point! I was using just 2013’s over-performance, but I was using a total shots figure for both 2013 and 2014. I calculated about the same probability you did at 17% (I wish my students’ abilities to use the binomial distribution were as good as yours), which is obviously still possible by randomness, but gives RSL’s case a little more of a boost. My hunch is that the truth lies in the middle (as it always seems to), that RSL can over-perform consistently but not by this much, and I will hopefully support that adequately on Friday.

    Thanks for hanging out with us!

  6. Pingback: Looking for the model-busting formula | American Soccer Analysis
