Tim Howard’s odd odyssey against Ghana

The United States match against Ghana felt odd, didn’t it? After years of focus on ball possession, the United States could barely string together consecutive passes. After all the worrying about an inexperienced defense, the backline held like a tight string and almost pulled off perfection against an onslaught of attacks. But the combination of those two circumstances, coupled with Dempsey’s remarkable early goal, led to a very odd night for Tim Howard.

Howard actually led the USMNT in touches in the match with 61. A goalkeeper. With touches. So far, those 61 touches are the most by any goalkeeper in a match at this World Cup; the average has been 32. The second highest was Bravo’s 58 touches during Chile’s 2-0 smashing of Spain. The games were similar in that both Chile and the United States scored first against a team that would ultimately dominate possession. That can, and did, lead to an odd night for our goalkeeper.

In the Ghana match, Beasley was second to Howard with 60. The resulting 102% ratio of goalkeeper touches to the touches of the team’s busiest outfield player is also the highest of the World Cup. But that got me thinking. What is a typical ratio?

I looked at all 20 games of the World Cup played through June 18th. The average ratio of goalkeeper touches to the busiest outfield player’s touches was 45%, with a standard deviation of 20%. That puts Tim Howard’s number 2.8 standard deviations above the mean. Assuming a normal distribution, that implies the goalkeeper should have the most touches on a team just once in 400 games. That’s less often than once per MLS season. So it’s not unheard of, but it’s pretty irregular, and it definitely highlights some of the oddities of the United States’ win over Ghana: an early goal and a team that failed to keep possession, leaving a backline and goalkeeper that were very busy.
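The normal-distribution step can be checked in a few lines of Python. This is a minimal sketch that assumes, as the text does, that the goalkeeper-to-top-player touch ratio is normally distributed with mean 45% and standard deviation 20%. Using the unrounded ratio (61/60, about 101.7%) gives a z-score of about 2.83, so the tail probability lands in the same ballpark as the once-in-400 figure.

```python
import math


def normal_upper_tail(z: float) -> float:
    """Return P(Z > z) for a standard normal Z, using the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


MEAN_RATIO = 0.45  # average goalkeeper-to-top-player touch ratio (from the text)
SD_RATIO = 0.20    # standard deviation of that ratio (from the text)

howard_ratio = 61 / 60  # Howard's 61 touches vs. Beasley's 60
z = (howard_ratio - MEAN_RATIO) / SD_RATIO
p = normal_upper_tail(z)

print(f"z = {z:.2f}; P(ratio at least this high) = {p:.4f}, roughly 1 in {1 / p:.0f}")
```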

Taking the touches analysis a bit further, I looked at how results differ depending on which position leads a team in touches. I split the team-games into those where a goalkeeper or defender had the most touches and those where a midfielder or forward did. Some interesting things pop out, even though the sample size is small.

Leader in Touches | Team-Games | Goals For (per game) | Goals Against (per game)
GK or D | 19 | 1.37 | 1.84
Mid or FW | 21 | 1.62 | 1.19
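One quick sanity check on this table: every goal scored in these 20 matches is also a goal conceded, so the games-weighted goals for and goals against should add up to roughly the same total. A short Python sketch using only the figures in the table:

```python
# (team-games, goals for per game, goals against per game), taken from the table
groups = {
    "GK or D": (19, 1.37, 1.84),
    "Mid or FW": (21, 1.62, 1.19),
}

total_for = sum(n * gf for n, gf, _ in groups.values())
total_against = sum(n * ga for n, _, ga in groups.values())

for name, (n, gf, ga) in groups.items():
    print(f"{name}: goal difference per game = {gf - ga:+.2f}")

# The two totals should differ only by rounding in the per-game averages.
print(f"goals for = {total_for:.2f}, goals against = {total_against:.2f}")
```

Both totals come out at about 60 goals across the 40 team-games, so the table is internally consistent.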

There is a pretty clear advantage so far when a midfielder or forward leads the team in touches. Of course, there is a cause-and-effect question at play here. Is this the result of one team’s dominance over the other, or is it really more important to have the ball at the feet of the attacking players more often?

Of the 20 World Cup matchups thus far, eight had one team led in touches by a defender while the other team was led in touches by a midfielder or forward. In those games, the team led in touches by a midfielder or forward produced a record of 5-2-1 (W-D-L).

Again, it’s too early to read too much into this data, but it will be interesting to follow through the tournament. The data also suggests tracking where on the pitch touches occur and how that might help describe or predict outcomes.

No matter what, Tim Howard was more involved than any player for the United States on Monday night. Neither he nor the American fans felt comfortable throughout the match, and the touches data justifies that sentiment.


Individual Defensive Statistics: Which Ones Matter and Top 10 MLS Defenders

When a car breaks down, a mechanic’s job is to tell you what caused the failure. He or she can generally pinpoint the problem to a specific part reaching the end of its useful life. But have you ever asked a mechanic why your car is working fine? Or which part deserves the most credit for your car running smoothly? Of course not. That would be a waste of everyone’s time. There are many parts to a car and all are doing their job as designed. We never ask why when things are going well.

The same dilemma exists in assessing soccer defenders. After all, most of how we assess defenders has to do with goals that were not scored. When all the parts of the defense are working as designed, goals are avoided. But which defenders deserve the credit when goals aren’t scored? It’s the soccer version of the pointless car question: which parts deserve the most credit when the car runs smoothly?

To even begin this conversation, we need to take stock of what data exists for soccer defenders. To be explicit, I am going to steer clear of a defender’s offensive capability and focus solely on defensive statistics. Whoscored is the only site that offers a collection of defensive statistics; here is what it tracks, along with its definitions.

  • Blocked Shot: Prevention by an outfield player of an opponent’s shot reaching the goal
  • Clearance: Action by a defending player that temporarily removes the attacking threat on their goal, effectively alleviating pressure on it
  • Interception: Preventing an opponent’s pass from reaching their teammates
  • Offside Won: The last man to step up to catch an opponent in an offside position
  • Tackle: Dispossessing an opponent, whether the tackling player comes away with the ball or not

These are the defensive statistics Whoscored tracks at the individual player level. Of course, the other vital defensive statistic is shots conceded, but those can’t be attributed to any one player. So, do any of these statistics matter? First, there are a couple of assumptions to iron out.

A defender should be judged by the rate at which he accumulates statistics, so these statistics need to be adjusted for the time the opponent has the ball. For example, Player A, who averages 5 clearances per game, might be better than Player B, who averages 6, if Player A’s opposition had the ball 20% less often. That would mean Player A made more clearances given the opportunities provided to him. So I will adjust all metrics by opposition possession.
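A minimal sketch of that possession adjustment, using the Player A/Player B example. The opponent-possession minutes are hypothetical numbers chosen so that Player A’s opponents have the ball 20% less often; they are not from the data set.

```python
def per_possession_minute(actions_per_game: float, opp_possession_minutes: float) -> float:
    """Defensive actions per minute of opponent possession."""
    return actions_per_game / opp_possession_minutes


# Hypothetical: Player B's opponents average 50 minutes of possession per game,
# while Player A's opponents have the ball 20% less (40 minutes).
player_a = per_possession_minute(5, 40.0)  # 5 clearances per game
player_b = per_possession_minute(6, 50.0)  # 6 clearances per game

print(f"Player A: {player_a:.3f} clearances per opponent-possession minute")
print(f"Player B: {player_b:.3f} clearances per opponent-possession minute")
```

Player A comes out at 0.125 clearances per minute against Player B’s 0.120, so A ranks ahead despite the lower raw total.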

Since I am trying to assess which goals are not scored, I am going to look at the numbers at the team level first, because it is only at the team level that goals can be attributed. After that analysis, I will attempt to attribute value to the individual metrics.

sources: whoscored, mlssoccer.com


Here are tackles per minute of opponent possession plotted against goals conceded. Tackles show the strongest correlation of all the variables; in fact, tackles have a slightly stronger correlation to goals against than shots conceded do. Here is a look at shots conceded per minute of opponent possession.

sources: whoscored.com, mlssoccer.com


The two points to the far left represent the LA Galaxy and Sporting Kansas City. They appear adept at limiting shots on goal per minute of opposition possession. They also stand out when looking at offsides won.

Rather than show every graph, here is a table of the defensive statistics, their level of impact and the R squared of the impact in predicting goals against.

Statistic | Goals Avoided per Unit | R Squared
Clearances | -0.041 | 27.1%
Interceptions | -0.036 | 15.1%
Tackles | -0.077 | 39.4%
Offsides Won | -0.113 | 16.0%
Blocks % of Shots | -0.017 | 0.3%

Offsides won is the most impactful of the statistics (it has the greatest slope), but its correlation is weaker than that of tackles or clearances; in other words, there are greater deviations from the trend line. It’s interesting to see that blocks as a percent of shots have almost no impact on goals allowed.
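The difference between slope (impact) and R squared (strength of correlation) is easy to see with a one-variable least-squares fit. The sketch below is a plain-Python simple linear regression; the two small data sets are invented purely for illustration. Both have the same slope, but the second is noisier, so its R squared is lower, which is the position offsides won is in relative to tackles.

```python
def simple_regression(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares fit of y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1 - ss_res / ss_tot


x = [1.0, 2.0, 3.0, 4.0, 5.0]      # a defensive stat per unit of opponent possession (invented)
tight = [1.9, 1.8, 1.7, 1.6, 1.5]  # goals against lying exactly on a line (invented)
noisy = [2.1, 1.6, 1.7, 1.4, 1.7]  # same trend plus zero-mean noise (invented)

for label, y in (("tight", tight), ("noisy", noisy)):
    slope, _, r2 = simple_regression(x, y)
    print(f"{label}: slope = {slope:.3f}, R squared = {r2:.2f}")
```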

This is interesting, but what to make of it all? In an ideal world we could compile these statistics into a single meaningful metric for comparing players. The most obvious statistical way to do that would be a multivariate regression using all of the statistics. The trouble is that, when combined in one regression, the statistics stop being statistically significant predictors. So developing a score that way would be a bit of a fool’s errand.

The other option would be to ignore the predictive strength of the variables and use the goals-avoided results as scalars: multiply them by each player’s statistics, add them up and compile a score. The resulting score would be easy to relate to, since we could say that a player avoids x goals per game. However, this would make offsides won the most important statistic, despite its weak correlation.

To factor in the correlation, we could step outside the realm of sound statistical practice: multiply the goals-avoided scalar by the R squared, then turn that product into an index with the highest metric (tackles) equal to 1. Doing that produces the following table of values for each metric.

Statistic | Goals Avoided per Unit (GApU) | R Squared | GApU x R Squared | Index
Clearances | -0.041 | 27.1% | -0.011 | 0.37
Interceptions | -0.036 | 15.1% | -0.005 | 0.18
Tackles | -0.077 | 39.4% | -0.030 | 1.00
Offsides Won | -0.113 | 16.0% | -0.018 | 0.60
Blocks % of Shots | -0.017 | 0.3% | 0.000 | 0.00

Tackles would be the most important statistic followed by offsides won and then clearances and interceptions. It turns out blocked shots have no material value in estimating goals against.
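The index calculation takes only a few lines of Python. This sketch uses the rounded slopes and R squared values from the table and reproduces the published index to two decimals. It follows the post’s admittedly non-standard weighting; it is not an established statistical method.

```python
# Slopes (goals avoided per unit) and R squared values from the table
stats = {
    "Clearances": (-0.041, 0.271),
    "Interceptions": (-0.036, 0.151),
    "Tackles": (-0.077, 0.394),
    "Offsides Won": (-0.113, 0.160),
    "Blocks % of Shots": (-0.017, 0.003),
}

# Weight each slope by its R squared (the non-standard step described in the text)
weighted = {name: slope * r2 for name, (slope, r2) in stats.items()}

# Scale so the largest-magnitude weight (tackles) equals 1
top = max(abs(v) for v in weighted.values())
index = {name: abs(v) / top for name, v in weighted.items()}

for name, value in index.items():
    print(f"{name}: {value:.2f}")
```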

Before I use these numbers to reveal the top 10 MLS defenders, here are the caveats. Obviously this ranking is missing a few vital elements of defending in soccer. The first major omission is positioning. Often a defender being in the right position deters an offense from making a pass that would increase its chance of scoring. There is no measurement for that, but a defender out of position is obviously not a valuable defender. Clearances, interceptions, tackles and offsides won are clear indicators that the player was probably in position to make the play, and they show the player succeeding in making the necessary play. But attacking attempts that never happened because of good positioning are clearly missing.

The other major omission is the offensive play of the defender. A defender who defends well and also represents an offensive threat is that much more valuable. But I’m not trying to solve for that here; integrating passing and offensive numbers to build a better score for defenders is a subject for another post.

Here are the top 10 MLS defenders based on this score, using data through last week, for players with a minimum of four appearances.

Rank | Name | Team | Tackles | Interceptions | Offsides Won | Clearances | Defender Score
1 | José Gonçalves | New England Rev. | 1.6 | 2.4 | 2 | 11.2 | 7.376
2 | Giancarlo Gonzalez | Columbus Crew | 2.1 | 2.9 | 1.9 | 9.3 | 7.203
3 | Norberto Paparatto | Portland Timbers | 1.8 | 4.8 | 1.3 | 9.3 | 6.885
4 | Carlos Bocanegra | CD Chivas USA | 1.5 | 3.6 | 2.1 | 8.9 | 6.701
5 | Andrew Farrell | New England Rev. | 2.9 | 2.4 | 0.3 | 8.3 | 6.583
6 | Jamison Olave | New York Red Bulls | 1.9 | 3.1 | 1.7 | 6.7 | 5.957
7 | Victor Bernardez | San Jose Quakes | 1.5 | 2.8 | 0.7 | 9.5 | 5.939
8 | Matt Hedges | FC Dallas | 1.5 | 3.9 | 0.9 | 8.5 | 5.887
9 | Eric Avila | CD Chivas USA | 4 | 2.4 | 0.8 | 2.3 | 5.763
10 | Chris Schuler | Real Salt Lake | 1.8 | 2.8 | 0.5 | 8.3 | 5.675
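Working backward from the table, each Defender Score appears to equal the player’s statistics weighted by the rounded index values (tackles 1.00, offsides won 0.60, clearances 0.37, interceptions 0.18). That reading reproduces every listed score, for example 7.376 for Gonçalves. It is inferred from the numbers rather than stated in the post, and the sketch below assumes it.

```python
# Rounded index weights from the index table (blocks have weight 0 and are omitted)
WEIGHTS = {"tackles": 1.00, "interceptions": 0.18, "offsides_won": 0.60, "clearances": 0.37}


def defender_score(tackles: float, interceptions: float, offsides_won: float, clearances: float) -> float:
    """Index-weighted sum of a defender's statistics (weighting inferred from the published scores)."""
    return (
        WEIGHTS["tackles"] * tackles
        + WEIGHTS["interceptions"] * interceptions
        + WEIGHTS["offsides_won"] * offsides_won
        + WEIGHTS["clearances"] * clearances
    )


# José Gonçalves' line from the top 10 table
goncalves = defender_score(tackles=1.6, interceptions=2.4, offsides_won=2, clearances=11.2)
print(f"{goncalves:.3f}")
```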

I find it comforting that, for a new metric, José Gonçalves, the 2013 MLS Defender of the Year, tops the list. There’s a big drop between the top two defenders and Paparatto, and another cliff after Andrew Farrell. But hey, it’s a start.

I hope this was an enlightening ride through the mechanics of defending from a soccer perspective. The next time you’re watching a game, don’t just focus on the breakdowns. Also look for what makes the defense successful.

 

Location Adjusted Total Shots Ratio

Millionaire Malcolm Forbes was famous for his quote, “He who dies with the most toys wins.” And while that might not be the most moral mantra for life, sports fans have a hard time arguing with the logic. After all, a game is about runs, points or goals, and after enough of those it’s about shiny trophy cases. But in the world of sports analysis there is no such victory in the absolute. Analysts need to explain how those runs, points or goals came about. In soccer especially, there is never a complete answer. Goals are exceedingly rare, so explaining mathematically how they grace us with their presence is difficult, to say the least. We’re happy with higher R-squareds and other such geeky descriptive metrics. Have you ever seen a trophy case filled with strong correlations? Nope, all we get is a little blog post and, if we’re lucky, some Twitter praise. Still, we search...

One of the more popular explanations for winning in soccer is Total Shots Ratio (TSR), the percentage of all shots in a team’s games that were taken by that team. A 60% TSR means that a given team took 60% of the total shots fired in the games it played. The logic isn’t hard to grasp: if you take more shots than your opponent, you are likely to score more goals. For the English Premier League, TSR explains 68% of the variance in table points, which is impressive for one statistic. TSR happens to be less important in MLS.
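As a minimal sketch of the definition, TSR is just a share of shots. The season totals below are hypothetical.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """Share of all shots in a team's games that the team itself took."""
    return shots_for / (shots_for + shots_against)


# Hypothetical season totals: 480 shots taken, 320 shots conceded
tsr = total_shots_ratio(480, 320)
print(f"TSR = {tsr:.0%}")  # 60%
```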

data sources: AmericanSoccerAnalysis, mlssoccer.com


In MLS, TSR explains just 37% of the variance in points, likely because of lower finishing rates in MLS compared to the EPL, which make shots less effective. But there are probably several other reasons why TSR is less predictive of points in MLS. A larger percentage of MLS teams employ counterattacking strategies, which significantly affect finishing rates and, in turn, the effectiveness of TSR. But what if shots were weighted to account for their location? It would be logical to assume that better teams take better shots and make it more difficult on the opposing shooters. But does that logic actually manifest itself when predicting points? ASA’s Expected Goals 1.0 worked pretty well, so a TSR adjusted for shot locations ought to work better than the original TSR.

The first thing required is a fair weighting of shots by location. To get it, I divided the finishing rate in each location by the average finishing rate. Here is the resulting table of shot weights.

Location (Zone) | Weighting
1 | 3.14
2 | 1.79
3 | 0.72
4 | 0.54
5 | 0.24

For the sake of simplicity, I have collapsed zones 5 and 6 into a single fifth zone. The table shows that a shot from zone 1 (inside the 6-yard box) is worth 3.14 average shots, while a shot from zone 5 is worth just 0.24 average shots. Adjusting all of the shots in MLS in 2013 yields the following result when attempting to predict table points.
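Here is a minimal sketch of a location-adjusted TSR, assuming it is computed exactly like TSR but on shots multiplied by the zone weights from the table. The shot counts by zone are hypothetical; the team and its opponents take the same raw number of shots, but the team shoots from closer in.

```python
# Shot weights by zone from the weighting table (zones 5 and 6 collapsed into zone 5)
ZONE_WEIGHTS = {1: 3.14, 2: 1.79, 3: 0.72, 4: 0.54, 5: 0.24}


def weighted_shots(shots_by_zone: dict[int, int]) -> float:
    """Convert raw shot counts into 'average-shot equivalents' using the zone weights."""
    return sum(ZONE_WEIGHTS[zone] * count for zone, count in shots_by_zone.items())


def location_adjusted_tsr(shots_for: dict[int, int], shots_against: dict[int, int]) -> float:
    """TSR computed on location-weighted shots instead of raw shot counts."""
    weighted_for = weighted_shots(shots_for)
    weighted_against = weighted_shots(shots_against)
    return weighted_for / (weighted_for + weighted_against)


# Hypothetical season: equal raw volume (500 shots each way), but this team shoots from closer in
team_shots = {1: 30, 2: 180, 3: 150, 4: 60, 5: 80}
opponent_shots = {1: 15, 2: 140, 3: 170, 4: 75, 5: 100}

raw_for = sum(team_shots.values())
raw_against = sum(opponent_shots.values())
raw_tsr = raw_for / (raw_for + raw_against)
la_tsr = location_adjusted_tsr(team_shots, opponent_shots)

print(f"raw TSR = {raw_tsr:.1%}, location-adjusted TSR = {la_tsr:.1%}")
```

When a team and its opponents shoot from the same mix of zones, the adjustment leaves TSR unchanged. It only moves TSR when shot quality differs, which is the kind of shift the post reports for New York and Real Salt Lake.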

data sources: AmericanSoccerAnalysis, mlssoccer.com


You can tell just from eyeballing the dispersion of the data points that the location-adjusted TSR aligns better with points, and the R-squared agrees: there is a 17-percent increase in R-squared. It isn’t just the pure volume of shots; the location of those shots is vital to predicting points in MLS. It would be interesting to see whether location is equally important in the EPL, where TSR is already such a strong predictor.

For the curious, the New York Red Bulls were the team that was best at getting better shots than their opponent. Their TSR improved from 47% to 52% when adjusting for shot location. Real Salt Lake actually took the biggest hit. Their TSR was 53% and their location-adjusted TSR dropped to 48%.

It’s only one season’s worth of data, but with such an impressive increase in the ability to explain the variance in point totals, it confirms that location does matter and that teams are rewarded for taking better shots themselves while pushing their opponents farther from goal. And perhaps soccer analysts have another statistical toy to add to the toy box: Location-Adjusted Total Shots Ratio.

In Defense of the San Jose Earthquakes and American Soccer

Note: This is Part II of a post using a finishing rate model and the binomial distribution to analyze game outcomes. Part I is “Predicting Goals Scored using the Binomial Distribution.”

As if American soccer fans weren’t beaten down enough by the elimination of three MLS clubs from the CONCACAF Champions League, Toluca coach Jose Cardozo questioned the growth of American soccer and criticized the strategy the San Jose Earthquakes employed during Toluca’s penalty-kick win last Wednesday. Mark Watson’s team clearly packed it in defensively and looked to play “1,000 long balls” on the counterattack. It certainly doesn’t make for beautiful, fluid soccer, but was it a smart strategy? Are the Earthquakes really worthy of the criticism?

Perhaps it’s fitting that Toluca is almost 10,000 feet above sea level, because from that altitude the strategy did look like a disaster. Toluca controlled the ball for 71.8% of the match and ripped off 36 shots to the Earthquakes’ 10. It does appear that San Jose was lucky to be level at 1-1 at the end of the match. The fact that Toluca scored only one goal from those 36 shots must have been either bad luck or great defense, right? Or could it have been expected?

The prior post examined using the binomial distribution to predict goals scored, and one of its takeaways was that finishing rates decline as shots increase, and that at high shot totals expected goals decline too, as seen below. This is a function of what I’ll call “defensive density,” basically how many players a team commits to defense. When more players are committed to defending, the offense has the ball more and ultimately takes more shots. But because of that defensive density, the offense is less likely to score on each shot.

source: AmericanSoccerAnalysis

Mapping that curve to an expected goals chart, you can see that the Earthquakes’ expected goals are not that different from Toluca’s, despite the extreme shot differential.

Data sources: American Soccer Analysis, Golazo

Given this shot distribution, let’s apply the binomial distribution model to determine the probability of San Jose advancing to the semifinals of the Champions League. I’m going to use the actual shots and the expected finishing rates to model the outcomes. The number of shots taken can be controlled through Mark Watson’s strategy, but it’s best to use expected finishing rates to simulate the outcomes the Earthquakes were striving for. Going into the match, the Earthquakes needed a 1-1 draw to force a shootout. Any better result (a win or a higher-scoring draw) would have seen them advance, and anything worse would have seen them eliminated.

Inputs:

Toluca Shots: 36

Toluca Expected Finishing Rate: 3.6%

San Jose Shots: 10

San Jose Expected Finishing Rate: 11.2%

Outcomes:

Toluca Win: 39.6%

Toluca 0-0 Draw: 8.3%

Toluca 1-1 Draw: 13.9% x 50% PK Toluca = 6.9%

Total Probability Toluca Advances = 54.9%

 

San Jose Win: 32.3%

San Jose 2-2 or Higher Draw: 5.8%

San Jose 1-1 Draw: 13.9% x 50% PK San Jose = 6.9%

Total Probability San Jose Advances = 45.1%
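These outcome probabilities can be reproduced approximately by treating each team’s goals as an independent binomial variable (n shots, finishing rate p) and summing over scorelines. This Python sketch uses the rounded inputs listed above and, like the text, assumes the two teams’ goal counts are independent and that the shootout is a coin flip. Because the post’s exact unrounded inputs aren’t given, the results land within about half a percentage point of the published figures rather than matching them exactly.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k goals from n shots, each converted independently with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def goal_distribution(shots: int, finishing_rate: float) -> list[float]:
    """Probability of scoring 0, 1, ..., shots goals."""
    return [binomial_pmf(k, shots, finishing_rate) for k in range(shots + 1)]


toluca = goal_distribution(36, 0.036)    # Toluca: 36 shots at a 3.6% expected finishing rate
san_jose = goal_distribution(10, 0.112)  # San Jose: 10 shots at an 11.2% expected finishing rate

toluca_win = draw_0_0 = draw_1_1 = draw_2_2_plus = san_jose_win = 0.0
for t_goals, p_t in enumerate(toluca):
    for s_goals, p_s in enumerate(san_jose):
        p = p_t * p_s  # assumes the two teams' goal counts are independent
        if t_goals > s_goals:
            toluca_win += p
        elif t_goals < s_goals:
            san_jose_win += p
        elif t_goals == 0:
            draw_0_0 += p
        elif t_goals == 1:
            draw_1_1 += p
        else:
            draw_2_2_plus += p

# Per the text: 0-0 sends Toluca through, 1-1 goes to penalties (treated as 50/50),
# and a 2-2 or higher draw sends San Jose through.
toluca_advances = toluca_win + draw_0_0 + 0.5 * draw_1_1
san_jose_advances = san_jose_win + draw_2_2_plus + 0.5 * draw_1_1

print(f"Toluca win {toluca_win:.1%}, 0-0 {draw_0_0:.1%}, 1-1 {draw_1_1:.1%}, "
      f"2-2+ {draw_2_2_plus:.1%}, San Jose win {san_jose_win:.1%}")
print(f"Toluca advances {toluca_advances:.1%}, San Jose advances {san_jose_advances:.1%}")
```

Trying a different scenario, such as the possession-based one the post considers, only requires changing the two goal_distribution calls.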

 

The odds of San Jose advancing with that strategy are clearly not as bad as the 10,000-foot level might indicate. Counterattacking soccer certainly isn’t pretty, but it wouldn’t still exist if it weren’t considered a solid strategy.

It’s difficult, but we can also try to simulate what a “normal” possession-based strategy might have looked like in Toluca. In MLS, the average possession for the home team this year is 52.5%, netting 15.1 shots per game. In Liga MX play, Toluca averages only about 11.4 shots per game, so they are not a prolific shooting team. They are finishing at an excellent 15.2%, which could be the reason San Jose chose to pack it in defensively. The away team in MLS is averaging 10.4 shots per game. If we assume that a more possession-oriented strategy would have produced a typical MLS game, we get the following expected goals outcomes.

Data sources: American Soccer Analysis, Golazo

Notice that the expected goal differential is actually worse for San Jose by 0.05 goals. Though the difference may not be statistically significant, at the very least we can say that San Jose’s strategy was not ridiculous.

Re-running the expected outcomes with the above scenario shows San Jose advancing 43.3% of the time. A strategy that raised the probability of advancing by 1.8 percentage points did not deserve criticism, and certainly not such harsh criticism. It suggests the Earthquakes probably weren’t wrong in their approach to the match. And had we factored in a higher finishing rate for Toluca, the probabilities would favor the counterattacking strategy even more.

Even though the US struck out again in the CONCACAF Champions League, Americans don’t need to take abuse for their style of play. After all, soccer is about winning and, in the case of a tie, advancing. We shouldn’t be ashamed or criticized when we do whatever it takes to move on.

 

Predicting Goals Scored using the Binomial Distribution

Much is made of the use of the Poisson distribution to predict game outcomes in soccer. Much less attention is paid to the binomial distribution. The reason is a matter of convenience. To predict goals using a Poisson distribution, “all” that is needed is the expected goals scored (lambda). To use the binomial distribution, you need to know both the number of shots taken (n) and the rate at which those shots are turned into goals (p). But if you have sufficient data, it may be a better way to analyze certain tactical decisions in a match. First, let’s examine whether the binomial distribution is actually dependable as a model framework.

Here is the chart that shows how frequently a certain number of shots were taken in an MLS match.

source data: AmericanSoccerAnalysis


The chart resembles a binomial distribution with a right skew, except for the big bite taken out of it starting at 14 shots. How many shots are taken in a game is a function of many things, not the least of which are the tactical decisions made by the club. For example, it would be difficult to take 27 shots unless the opposing team were sitting back, defending and not looking to possess the ball. Deliberate counterattacking strategies may well result in few shots taken, but the strategy is supposed to provide chances in a more open field.

Out of curiosity, let’s look at the average shot location by number of shots taken to see if there are any clues about the influence of tactics. To estimate this, I looked at expected goals per shot for each shot total. This has no direct influence on the binomial analysis but could come in handy when we look for applications.

source: AmericanSoccerAnalysis


The average MLS finishing rate was just over 10 percent in 2013. You can see that, at more than 10 shots per game, the expected finishing rate stays constant right at that 10-percent rate. This indicates that above 10 shots, the location distribution of those shots is typical of MLS games. However, at fewer than 10 shots you can see that the expected goal scoring rate dips consistently below 10%. This indicates that teams that take fewer shots in a game also take those shots from worse locations on average.

The next element in the binomial distribution is the actual finishing rate by number of shots taken.

source: AmericanSoccerAnalysis

Here it’s plain that the number of shots taken has a dramatic impact on the finishing rate of each shot. This speaks to the tactics and pace of play involved in taking different numbers of shots. A team able to squeeze off more than 20 shots is likely facing a packed box and a defense less interested in ball possession. What’s fascinating, then, is that teams that take few shots in a game have a significantly higher success rate despite taking their shots from farther out. This suggests that those teams are shooting under significantly less pressure, perhaps on counterattacks where the field of play is more wide open.

Combining the finishing accuracy model curve with number of shots we can project expected goals per game based on number of shots taken.

Expected goals by shots taken

What’s interesting here is that the expected number of goals scored plateaus at about 18 shots and begins to decline after 23 shots. This is most likely a function of the intensity of the defense faced on those shots, because we know the shot locations are not significantly different. This model is the basis by which I will simulate tactical decisions throughout a game in Part II of this post.

Now we have the two key pieces needed to test whether the binomial distribution is a good predictor of goals scored: total shots taken and finishing rate by number of shots taken. As a refresher, since most of us haven’t taken a stats class in a while, the probability mass function of the binomial distribution is:

$$P(X = k) = \binom{n}{k}\, p^{k}\, (1 - p)^{n - k}$$

Where:

n is the number of shots

p is the probability of success in each shot

k is the number of successful shots

Below I compare the actual distribution to the binomial distribution using 13 shots (since 13 is the mode number of shots from 2013’s data set), assuming a 10.05% finishing rate.

source data: AmericanSoccerAnalysis, Finishing Rate model


The binomial distribution underpredicts 2-goal outcomes and overpredicts all other outcomes. Overall, the expected goals are close (1.369 actual vs. 1.362 binomial). The Poisson distribution gives similar results, but the binomial’s average error is 12% lower than the Poisson’s.
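To see how the two candidate models differ for the same game, this sketch evaluates the binomial PMF (n = 13 shots, p = 10.05%) next to a Poisson distribution with the same mean, λ = np. It only compares the two theoretical distributions; the actual 2013 goal counts are not reproduced here. Because the binomial variance np(1 - p) is smaller than the Poisson variance np, the binomial concentrates a little more probability near the mean.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial P(X = k): k goals from n shots with finishing rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson P(X = k) with expected goals lam."""
    return lam**k * exp(-lam) / factorial(k)


SHOTS = 13                    # modal shot count in the 2013 data (from the text)
FINISHING_RATE = 0.1005       # assumed finishing rate (from the text)
lam = SHOTS * FINISHING_RATE  # Poisson with the same mean, n * p

for goals in range(5):
    print(f"{goals} goals: binomial {binomial_pmf(goals, SHOTS, FINISHING_RATE):.3f}, "
          f"Poisson {poisson_pmf(goals, lam):.3f}")
```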

If we take the average of these distributions between 8 and 13 shots (where the sample size is greater than 40), the bumps smooth out.

source data: AmericanSoccerAnalysis, Finishing Rate model


The binomial distribution seems to do well at projecting the actual number of goals scored in a game, and its average error is 23% lower than the Poisson’s. Looking individually at shot totals from 7 to 16, the binomial has 19% lower error if we consider only the 0- and 1-goal outcomes. But so what? Isn’t it nearly impossible to predict the number of shots a team will take in a game? It is. But there may be tactical decisions, like counterattacking, where we can look at shots taken and determine whether the strategy was correct. A model whose final stage of estimation is governed by the binomial distribution appears compelling for that analysis. In Part II I will explore some possible applications of the model.

Jared Young writes for Brotherly Game, SB Nation’s Philadelphia Union blog. This is his first post for American Soccer Analysis, and we’re excited to have him!