The Predictive Power of Shot Locations Data

Two articles in particular inspired me this past week—one by Steve Fenn at the Shin Guardian, and the other by Mark Taylor at The Power of Goals. Steve showed us that, during the 2013 season, the expected goal differentials (xGD) derived from the shot locations data were better than any other statistics available at predicting outcomes in the second half of the season. It can be argued that statistics that are predictive are also stable, indicating underlying skill rather than luck or randomness. Mark came along and showed that the individual zones themselves behave differently. For example, Mark’s analysis suggested that conversion rates (goal scoring rates) are more skill-driven in zones one, two, and three, but more luck-driven or random in zones four, five, and six.

Piecing these fine analyses together, there is reason to believe that a partially regressed version of xGD may be the most predictive. The xGD currently presented on the site regresses all teams fully back to league-average finishing rates. However, one might guess that finishing rates in certain zones are more skill-driven, and thus more predictive. Essentially, we may be losing important information by fully regressing finishing rates to league average within each zone.
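To make the idea of partial regression concrete, here is a minimal sketch. The function names, the blending weight, and the sample numbers are all assumptions for illustration, not the site's actual method. A weight of 0 reproduces full regression to the league rate, and a weight of 1 trusts the team's raw finishing rate completely.

```python
def regress_finishing_rate(team_goals, team_shots, league_rate, weight):
    """Blend a team's observed finishing rate with the league average.

    weight = 0.0 -> full regression (league rate only)
    weight = 1.0 -> no regression (team's raw rate only)
    The weight itself is a hypothetical tuning parameter.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if team_shots == 0:
        return league_rate  # no shots observed, fall back to league average
    team_rate = team_goals / team_shots
    return weight * team_rate + (1.0 - weight) * league_rate


def expected_goals(shots, rate):
    """Expected goals from one zone: attempts times finishing rate."""
    return shots * rate


# Made-up numbers for a single zone: 6 goals on 40 shots, league rate 10%.
league_rate = 0.10
full = regress_finishing_rate(6, 40, league_rate, weight=0.0)
partial = regress_finishing_rate(6, 40, league_rate, weight=0.3)

print("fully regressed rate:", full)
print("partially regressed rate:", round(partial, 4))
print("expected goals, full vs partial:",
      round(expected_goals(40, full), 2),
      round(expected_goals(40, partial), 2))
```

The open question in the rest of this post is whether any weight above zero is justified by the data.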

I assessed the predictive power of finishing rates within each zone by splitting the season into two halves, and then looking at the correlation between finishing rates in each half for each team. The chart is below:

Zone    Correlation    P-value
1          0.11         65.6%
2          0.26         28.0%
3         -0.08         74.6%
4         -0.41          8.2%
5         -0.33         17.3%
6         -0.14         58.5%

Wow. This surprised me when I saw it. There are no statistically significant correlations—especially when the issue of multiple testing is considered—and some of the suggested correlations are actually negative. Without more seasons of data (they’re coming, I promise), my best guess is that finishing rates within each zone are pretty much randomly driven in MLS over 17 games. Thus full regression might be the best way to go in the first half of the season. But just in case…

I grouped zones one, two, and three into the “close-to-the-goal” group, and zones four, five, and six into the “far-from-the-goal” group. The results:

Zone     Correlation    P-value
Close       0.23         34.5%
Far        -0.47          4.1%

Okay, well this is interesting. Yes, the multiple testing problem still exists, but let’s assume for a second there actually is a moderate negative correlation for finishing rates in the “far zone.” Maybe the scouting report gets out by mid-season, and defenses close out faster on good shooters from distance? Or something else? Or this is all a type-I error—I’m still skeptical of that negative correlation.
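To put a rough number on the multiple testing concern, here is a small sketch that applies a Bonferroni correction to the p-values reported in the two tables above. Treating all eight tests as one family is my assumption, and Bonferroni is just one (conservative) way to adjust. Under that assumption the far-zone p-value of 4.1% becomes 32.8% after adjustment.

```python
# P-values copied from the two tables above, written as proportions.
p_values = {
    "zone 1": 0.656, "zone 2": 0.280, "zone 3": 0.746,
    "zone 4": 0.082, "zone 5": 0.173, "zone 6": 0.585,
    "close": 0.345, "far": 0.041,
}


def bonferroni(p_values, alpha=0.05):
    """Return {name: (adjusted_p, is_significant)} under Bonferroni.

    Each p-value is multiplied by the number of tests (capped at 1.0);
    a test is significant only if its raw p-value is <= alpha / m.
    """
    m = len(p_values)
    threshold = alpha / m
    return {
        name: (min(1.0, p * m), p <= threshold)
        for name, p in p_values.items()
    }


results = bonferroni(p_values)
for name, (adjusted, significant) in results.items():
    print(f"{name:7s} adjusted p = {adjusted:.3f}  significant: {significant}")
```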

Without doing that whole song and dance for finishing rates against, I will say that the results were similar. So full regression on finishing rates for now, more research with more data later!
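For anyone who wants to repeat the split-half check described above, here is a minimal sketch in plain Python. The finishing rates below are made up, and the p-value comes from a permutation test, which is a stand-in I chose so the example needs no outside libraries. It is not necessarily the test behind the p-values in the tables.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=5000, seed=0):
    """Two-sided p-value: how often a shuffled pairing gives |r| at
    least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical finishing rates in one zone, one entry per team.
first_half = [0.10, 0.14, 0.08, 0.12, 0.09, 0.11]
second_half = [0.12, 0.09, 0.11, 0.10, 0.13, 0.08]

r = pearson_r(first_half, second_half)
p = permutation_p_value(first_half, second_half)
print(f"split-half correlation: {r:.2f}, permutation p-value: {p:.3f}")
```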

But now, piggybacking onto what Mark found, there do seem to be skill-based differences in how many total goals are scored by zone. In other words, some teams are designed to thrive off of a few chances from higher-scoring zones, while others perhaps are more willing to go for quantity over quality. The last thing I want to check is whether or not the expected goal differentials separated by zone contain more predictive information than when lumped together.

As some of Mark’s work implied, I found that our expected goal differentials inside the box are very predictive of a team’s actual second-half goal differentials inside the box. The correlation coefficient was 0.672, better than simple goal differential, which registered a correlation of 0.546. This means that perhaps the expected goal differentials from zones one, two, and three should get more weight in a prediction formula. Additionally, having a better goal differential outside the box, specifically in zones five and six, is probably not a good thing. That would just mean that a team is taking too many shots from poor scoring zones. In the end, I went with a model that used attempt difference from each zone, and here’s the best model I found.*

Zone             Coefficient    P-value
(Intercept)        -0.61         0.98
Zones 1, 3, 4       1.66         0.29
Zone 2              6.35         0.01
Zones 5, 6         -1.11         0.41

*Extremely similar results to using expected goal differential, since xGD within each zone is a linear function of attempts.

The R-squared for this model was 0.708, beating out the model that just used overall expected goal differential (0.650). The zone that stabilized fastest was zone two, which makes sense since about a third of all attempts come from zone two. Bigger sample sizes help with stabilization. For those curious, the inputs here were attempt differences per game over the first seventeen games, and the response output is predicted total goal differential in the second half of the season.
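Here is a small sketch of how the fitted model above turns first-half attempt differences into a second-half prediction. The coefficients come from the table. The function name, the example team, and the assumption that each grouped input is the combined per-game attempt difference across the zones in that group are mine.

```python
# Coefficients from the model table above.
COEFFICIENTS = {
    "intercept": -0.61,
    "zones_1_3_4": 1.66,
    "zone_2": 6.35,
    "zones_5_6": -1.11,
}


def predict_second_half_gd(diff_134, diff_2, diff_56):
    """Predicted second-half goal differential.

    Inputs are attempt differences per game (attempts for minus attempts
    against) over the first 17 games, grouped by zone. Grouping by summing
    the zones within each group is an assumption of this sketch.
    """
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["zones_1_3_4"] * diff_134
        + COEFFICIENTS["zone_2"] * diff_2
        + COEFFICIENTS["zones_5_6"] * diff_56
    )


# Hypothetical team: +1.0 attempts per game in zones 1/3/4,
# +0.5 in zone 2, and -0.5 in zones 5/6.
prediction = predict_second_half_gd(1.0, 0.5, -0.5)
print(f"predicted second-half goal differential: {prediction:.2f}")
```

Note how the zone two coefficient dominates: half an extra attempt per game there moves the prediction more than a full extra attempt per game across zones one, three, and four.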

Not that there is a closed-the-door conclusion to this research, but I would suggest that each zone contains unique information, and separating those zones out some could strengthen predictions by a measurable amount. I would also suggest that breaking shots down by angle and distance, and then kicked and headed, would be even better. We all have our fantasies.

Montreal Impact And Shot Placement

We like raw numbers around these parts. The lower the common denominator, the better. But we like numbers in general; it’s as if we are… kind of involved. There isn’t much in the way of discrimination. You can take numbers, and they can tell a story. Numbers can be just as biased as any news reporter or general fan, too. They can also give us insight into a specific question that we may have.

A popular question around these parts is simply: why is Montreal so good? This is a club racing towards a shot at the Supporters’ Shield. They sit 4th in the table with 26 points, two points behind leaders FC Dallas, and have at least two games in hand on every club above them in the standings. Obviously, they are in very good shape, with a chance to run away with hardware this season. So how are they doing it?

Well, the one specific point of contention for us is their shooting. Currently the Impact are 5th in the league in shots on target per match and even further down the pipe at 14th in total shots attempted per match. So the question then becomes: how have they scored 1.69 goals a game, good for best in all of MLS?

They’re shooting the lights out. Well, sort of. The ball is ending up in the back of the net at unusually high rates. Matthias and I have pretty much just chalked this up to being an irregularity, an outlier, and one that will eventually see the Impact coming back down to earth.

And yet, they haven’t.

Montreal have the highest goal scoring rate in the league, yet have the same goal differential as the New England Revolution, who sit 11th in the Supporters’ Shield table. Six of their eight wins have been won by a single-goal margin, which tells us they’ve been strong in holding their leads.

It’s obviously something that could, and likely will, get a much fuller investigation as time permits. But I did formulate some interesting thoughts while digging through the Squawka data.

Goal Locations

A good 80% of the goals are in high-percentage conversion locations on the frame, predominantly low and presumably away from the keeper. You can see that trend continues with their overall shot selection.

Shot Locations

The majority of their shots are, again, in great places, with one third of the total shots in the lower half of the frame.

I’m not at this point sold that the Impact are going to come back down to earth with their conversion ratio. It’s not so much that they are taking shots, but the type of shots they are taking. Marco Di Vaio is 36 and with that comes experience and intelligence.

He understands what he’s doing. I believe that his effort to place high percentage shots is not only a skill; it’s purposeful, and it’s a game plan.

I’m not sure if they can continue to win in their +1 goal states, but their defence* has been very good thus far. It’s possible, considering their current form, that they have a legit shot at the Supporters’ Shield at year’s end.

Then again, we just may have to dig deeper into this.**

*Editor’s note: Harrison is turning Redcoat on us.

**Editor’s note: We will.