Park Effects: How does this apply in MLS?

This sparked my curiosity the other day. Joe Posnanski, who is an amazing baseball writer, wrote about the current MVP race as the MLB season winds down. One of his main points is, of course, offensive production and how you properly place it into context. He mentions park factors in comparing the two hitters in the race, young stud Mike Trout and surefire hall-of-famer Miguel Cabrera. As usual, this discussion of value in baseball spawned chaos in the comments section and led to a follow-up article.

JoePo breaks down park factors crudely but effectively in stating the following:

But the BASICS of Park Factors are the easiest thing imaginable.

All you do is this:

Step one: You take the average runs scored in a ballpark (both teams).

Step two: You take the average runs scored in that team’s road games (both teams).

Step three: You divide the first total by the second.

And that’s all. Park Factors. There is so much contentiousness about Comerica Park but it’s all simple math. This year, the Tigers have scored 355 runs at Comerica and allowed 275 runs. That’s a total of 630 runs in 67 games — 9.4 runs per game.
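Those three steps translate directly into code. Here is a minimal sketch; the road-game totals in the example are made up for illustration, only the Comerica home numbers come from the quote above:

```python
def park_factor(home_runs_scored, home_runs_allowed, home_games,
                road_runs_scored, road_runs_allowed, road_games):
    """Crude park factor: runs per game at home (both teams)
    divided by runs per game on the road (both teams)."""
    home_rpg = (home_runs_scored + home_runs_allowed) / home_games
    road_rpg = (road_runs_scored + road_runs_allowed) / road_games
    return home_rpg / road_rpg

# Posnanski's Comerica example: 355 scored + 275 allowed in 67 home
# games is 9.4 runs per game. The road figures here are invented; a
# result above 1 would mean a hitter-friendly park.
print(round(park_factor(355, 275, 67, 300, 270, 67), 3))
```

A factor of 1.0 means the park is neutral; real park-factor calculations layer on adjustments (multi-year averaging, schedule balance), but the core ratio is exactly this.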

Now, this was brought to my attention by Tom Tango and a mention on his blog. Tango also gives some brief qualifiers that are specific to baseball, as well as mentioning the uncertainty about the exact degree to which park factors affect the race. It’s some great stuff for baseball.

It made me start wondering about park conditions in MLS. It’s obvious that pitches play differently and that they come in different shapes and sizes. This can be done a multitude of ways, as mentioned by Poz and described by Baseball-Reference. It’s already been covered one way by Alex Olshansky of Tempo-Free Soccer in an article for StatsBomb a month ago.

However, Alex used goal differential, which is fine if we’re strictly speaking about an individual park leading to more goals, not necessarily home field advantage. I wrote an article just a month ago about home field advantage, working off the percentage of points won at home versus total points.

Regardless, you can see most teams seem to have an advantage playing at home. But I’m interested in how many of these home locations either limit or increase shots and goal opportunities. Is there a place in MLS where, due to the dimensions or the crowd or some other outside factor, it is better to be a striker than a defender? One that limits, or maybe creates, more goal-scoring opportunities?

It’s certainly an interesting question, and maybe not even one to limit to goals. Maybe we could open it up to turnovers, as that might give us some indication of the quality of the pitch. These are just some thoughts on a Friday afternoon.

MLS Attack Pairings

Today, I was asked simply: which team has the best pairing in MLS? It’s a good question, and oddly one that I’ve been asked a lot. Despite the frequency of requests, it’s something that I have trouble answering. There are a lot of ways to measure performance for attacking personnel, but due to my time constraints I found the easiest way to do this was to go to Squawka and use their attack score.

Below is a listing of teams and their two highest attacking-score combos. Since it’s a purely cumulative stat, I pro-rated it to 90 minutes. As you probably wouldn’t be shocked to find out, Mike Magee, Landon Donovan and Federico Higuaín round out the top three.
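The pro-rating itself is just a per-90 normalization: the cumulative score times 90, divided by minutes played. A quick sketch:

```python
def per_90(cumulative_stat, minutes):
    """Normalize a cumulative stat to a per-90-minutes rate."""
    return cumulative_stat * 90 / minutes

# Mike Magee's line from the table: 582 attack score in 1051 minutes
print(round(per_90(582, 1051)))  # rounds to 50
```

Every "AS per 90" value in the table below is this calculation rounded to the nearest whole number.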

Player Team Minutes Attack Score AS per 90
Mike Magee Chicago 1051 582 50
Patrick Nyarko Chicago 1554 527 31
Carlos Alvarez Chivas USA 1653 360 20
Eric Avila Chivas USA 1634 260 14
Dillon Powers Colorado 2035 576 25
Deshorn Brown Colorado 1800 448 22
Federico Higuaín Columbus 2142 1162 49
Dominic Oduro Columbus 1987 610 28
Dwayne De Rosario DC United 1208 343 26
Kyle Porter DC United 1403 244 16
Blas Perez FC Dallas 1569 584 33
Michel FC Dallas 2004 538 24
Brad Davis Houston 1408 540 35
Will Bruin Houston 1721 472 25
Landon Donovan LA Galaxy 1380 753 49
Robbie Keane LA Galaxy 1320 698 48
Marco Di Vaio Montreal 1868 897 43
Felipe Martins Montreal 1768 535 27
Diego Fagundez New England 1621 613 34
Lee Nguyen New England 2137 527 22
Thierry Henry New York 1952 854 39
Tim Cahill New York 1761 441 23
Sébastien Le Toux Philadelphia 1864 729 35
Conor Casey Philadelphia 1528 667 39
Darlington Nagbe Portland 1895 761 36
Diego Valeri Portland 2072 725 31
Javier Morales RSL 1796 838 42
Ned Grabavoy RSL 2043 467 21
Chris Wondolowski San Jose 1890 530 25
Shea Salinas San Jose 1400 434 28
Eddie Johnson Seattle 1300 461 32
Obafemi Martins Seattle 1024 448 39
Graham Zusi Sporting KC 1860 680 33
Claudio Bieler Sporting KC 1986 620 28
Jonathan Osorio Toronto FC 1164 397 31
Robert Earnshaw Toronto FC 1495 333 20
Camilo Sanvezzo Vancouver 1674 876 47
Kenny Miller Vancouver 1305 506 35

There are a couple of key individuals missing from this list that may or may not “pop out” at you. The first is Philadelphia’s top goal scorer, Jack McInerney. Part of this is due to his missing time with the Men’s National Team during the early rounds of the Gold Cup; the other part is that, outside of his bunches of goals scored early in the season, he hasn’t done much else with his time.

The other name, though less likely to be spotted, is Luis Silva. Since arriving at DC United, he’s posted the club’s top overall score as determined by Squawka, as well as its highest rating on WhoScored. However, he’s only played 5 games and a total of 420 minutes for DCU, so it’s a small sample and I decided to drop him from the listing. This lowered DC United’s combined score rather dramatically, yet it corresponds quite well with whatever attacking combination they can actually muster.

Now, taking all those dynamic duos and adding them together gave us a combined score of the two best attacking players on each team. Here are those in order.

AS per 90
LA Galaxy 97
Vancouver 82
Chicago 80
Crew 76
Philadelphia 74
Seattle 71
Montreal 70
Portland 68
RSL 63
New York 62
Sporting KC 61
Houston 59
FC Dallas 58
New England 56
San Jose 53
Toronto FC 51
Rapids 48
DC United 41
Chivas 34

It’s not a surprise to see LA at the top of any such list. Robbie Keane and Donovan have long been heralded as the best dynamic attacking duo in the league. But if you look beyond those two, the teams are rather surprising. Vancouver, Chicago, Columbus and Philly make up the rest of the top five, with the often-scrutinized contributions of Obafemi Martins and Eddie Johnson falling just outside the grouping.

Another interesting note takes us further toward the discussion of the single best player. While individual performances matter, it’s about team accomplishment rather than singular performances over the stretch of the season. It’s obvious that while Chicago and Columbus have both had outstanding performances from their key men up top, they are lacking something at the team level, such that these individual metrics don’t correspond entirely to the tables at the end of the day.

The Chicago Fire and Goal Mouth Data

This is merely a trial run. I say that because in the last two days I’ve limited the collection of data and then expanded it; it comes down to how it tickles my fancy. The data I have collected is limited for the time being to the Chicago Fire, as a means of comparing a club and its data to the league and trying to make sense of it. This will hopefully develop into a means of somehow attributing value to clubs and their keepers in the future.

Below is a picture of the goal mouth; the data has been collected from the website. Coupled with a previously built image, you can see how Chicago compares to the rest of the league and where a majority of their goals have been scored this season. While numbers are always an important thing, remember that it’s more about ratios and average occurrence than pure accumulation at this juncture. Not all teams have played the same number of games, and they haven’t had the same opportunities.

Shots+Goals and visuals


In addition to the goal mouth visual, here is a field map diagram as it applies to the dimensions of the field. This has already been provided in raw form in the data that Matty has collected and posted in the raw shot data tab, but I wanted another visual to compare against the data above.


The problem is that the two can’t yet be connected: there is no link between the fact that Chicago has allowed 4 goals in section 5 and the fact that they’ve also allowed 5 goals in SoT1 (for ease of the tally, I gave a numerical designation to each location on the goal mouth, starting at the top and working left to right). That is the next collaborative effort I’m working on: gathering both the shot’s location of origin and its placement on goal, and from which specific individual at what time.

This is a very time-intensive task, and it’ll probably take me the rest of the week to complete it just for Chicago. However, I’m taking suggestions on how I could compile this data without hand-jamming it into a flat file. An SQL dump of the current Opta database for the season would go a long way toward compiling this data. But I’m never above a bit of hard work.


A Visual Look At Shots On Target

This is part of my efforts to come up with a zone rating of sorts for goalkeepers. The problem I’m running into at the moment is finding visual information for shots against. If I want to know how good Dan Kennedy was at preventing goals against Columbus in week 1, I have to go to Columbus’ page on Squawka and narrow the shot data to that specific game. Basically, it just boils down to more time digging than I initially planned to devote.

Quickly, here is a visual graphic that I made with the help of Excel. I know it’s not really pretty, but it delivers the data in the manner in which I needed it without getting caught up on eccentric details, details with which I often spend too much time meddling.
Shots+Goals and visuals

There isn’t a lot that this immediately tells you, of course. It’s more of a jumping-off point to start comparing data once it is collected. That’s where the next effort is headed. Which teams are above league average, and which are below? Are they bleeding low-percentage goals, or are they being beaten in an unusual zone? This information, while still miles from complete, moves us in the right direction of knowing more about shots and goals than we did previously.

You’ll notice that I also included shots that are wide but still close to the post. I’m curious whether these shot numbers become inflated when playing teams with “better keepers.” Unfortunately, we need to define what better is. Better than what, exactly? I’m not sure. Again, parameters haven’t been set, and data sets are still being gathered.

This is a fun exercise and one that should, if nothing else, provide us with some excellent insight to teams and their seasons at this point.

Comparing Goalkeepers to Pitchers

Cruising around Twitter is about the most social I get nowadays. It sounds nerdy, and really it is, but it’s amazing the amount of material produced by people smarter than me that you can discover—not to mention the 140-character conversations you can have.

Looking around, I stumbled across an article from about 10 days ago on the site ‘Bring On The Stats‘ by the anonymous author Chase H (aka @chaser_racer32 on Twitter). Chase H makes a good case that Sporting Kansas City’s goalkeeper, Jimmy Nielsen, is—probably gradually—headed for decline. He comes to this conclusion by working through save percentage and shots against per minute, a pretty good tactic with some sound reasoning behind it.

“The table above is sorted by save %, which is pretty self-explanatory; it’s the percentage of shots saved by the keeper. Nielsen has the third-worse save % of all goalkeepers with more than 1400 minutes played. The perfect example of why wins and shutouts are not the best measures for a goalkeeper is the fact that Chivas USA keeper Dan Kennedy has saved a higher percentage of shots than Nielsen, and yet has only recorded 2 shutouts, and the team only has 4 wins. Kennedy has the misfortune of playing for one of the worst teams in the MLS, and he has faced almost 50 more shots than Jimmy Nielsen.

On the flip side, one can argue that because the defense plays so well, generally only the most quality shots make it on goal from the opponent. I do acknowledge that is a very big issue to this study, but to compare Neilsen’s stats from last season with the same defense, we see he saved 74% of the shots he faced while the defense conceded almost exactly the same numbers of shots per minute he played.”
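The two measures in that analysis are simple to compute from box-score totals. A sketch (the function names are mine, and the sample keeper line is purely illustrative):

```python
def save_pct(saves, shots_on_target_faced):
    """Fraction of on-target shots the keeper stopped."""
    return saves / shots_on_target_faced

def shots_against_per_90(shots_on_target_faced, minutes):
    """Rate of on-target shots faced, normalized per 90 minutes."""
    return shots_on_target_faced * 90 / minutes

# Illustrative line: 70 saves on 100 shots on target in 1800 minutes
print(save_pct(70, 100))                # 0.7
print(shots_against_per_90(100, 1800))  # 5.0
```

The second number is the context for the first: a keeper with a middling save percentage behind a leaky defense may be doing more work than one with a shiny percentage who rarely gets tested.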

I’m pretty sure I’ve seen the analogy of baseball pitchers to goalkeepers before—if not from some random person or thing I read, then certainly from Matthias. The point of the comparison is that neither the goalkeeper nor the pitcher has as much influence on goals allowed or runs scored against them as a lot of traditionalists and general fans believe.

In fact, baseball created an individual stat to track exactly what a pitcher controls, and Fangraphs grades him solely on that stat, “FIP.” The stat has been well-documented and was introduced to the general public by writers much more skilled than myself.

Back in the early 2000s, research by Voros McCracken revealed that the amount of balls that fall in for hits against pitchers do not correlate well across seasons. In other words, pitchers have little control over balls in play. McCracken outlined a better way to assess a pitcher’s talent level by looking at results a pitcher can control: strikeouts, walks, hit by pitches, and homeruns.
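The resulting stat, FIP, is just an arithmetic combination of those four outcomes. A sketch of the standard formula (the additive constant, roughly 3.10, is recalculated each season to put FIP on the league’s ERA scale; the pitcher line below is made up):

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching: built only from outcomes the
    pitcher controls (home runs, walks, hit-by-pitches, strikeouts),
    scaled to ERA by a league constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# A hypothetical pitcher: 15 HR, 40 BB, 5 HBP, 180 K in 200 IP
print(round(fip(15, 40, 5, 180, 200), 2))  # 2.95
```

Notice what is absent: hits on balls in play never enter the formula, which is exactly McCracken’s point.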

Finding some reading material on FIP today, and thinking about our podcast about the possibility of whether keepers influence shots on target, sparked some thoughts following the article by Chase H.

The idea of keepers being analogous to pitchers is all well and good. There are certainly some similarities. The problem I’m starting to have, though, is that there may be a better way of looking at it. Pitchers, though minimally, still control aspects of their performance such as ground-ball and fly-ball rates, strikeouts and walks. Keepers could potentially influence opponents psychologically, but truly the only physical tool they have at their disposal, prior to the shot, is their positioning. And positioning frequently corresponds to the defensive placement of a keeper’s teammates and to the opposition that controls possession.

This isn’t quite the like-to-like thinking that most jump into. However, I started reading about another baseball statistic, and it made me think…

One of the differences between UZR and linear weights is that with UZR, the amount of credit that the fielder receives on each play—positive (if he makes an out) or negative (if he allows a hit or an ROE)—depends on how often that particular kind of batted ball, in terms of its location, speed and several other factors, is fielded by an average fielder at the same position. With offensive linear weights, if a batted ball is a hit or an out, the credit that the batter receives is not dependent on where or how hard the ball was hit, or any other parameters.

Maybe we (and by we, I mean me) are looking at keepers the wrong way. Just as it’s wrong to assume that keepers control wins, shutouts and the like, is it any more responsible to assume that goals scored against them are purely their fault? I’m talking about save percentage here.

To test this keeper UZR out, we need to create a set of guidelines in the same manner as what has been set out for UZR. There is also the key limitation that we don’t have 6 years’ worth of data to work from. We barely have 3 years of chalkboard data, and if we use WhoScored or Squawka, we have even less than that.

The other problem is that we don’t know the speed of the shot, and getting the angle of the shot isn’t necessarily easy either. Not that it’s particularly important. My goal this week is to take the shot data from Squawka and put together a visual representation of the six prominent scoring locations, complete with the associated shots-saved data.


The first thing we need to establish is which areas of the frame shots are saved from least often, and how good keepers are at stopping the goals they should stop. This seems rather silly, as I’m sure we can already theorize that the likely goal-scoring locales are the outside marks near the posts. However, we still need numbers, and we still need to know how good teams are at preventing the goals they should prevent.

Controlling for the difficulty of a shot on target by its location on the frame at least starts to give us an intelligent understanding of what goalkeepers are doing right and what they are doing wrong.
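That control-for-location idea can be sketched as a UZR-style “saves above average”: compute the league save rate for each goal-mouth zone, then credit a keeper with the difference between his actual saves and what an average keeper would have stopped on the same shots. All zone labels and rates below are made-up placeholders, not measured values:

```python
# League-average save rate per goal-mouth zone (illustrative numbers;
# zones follow the tally scheme above, top of the frame working
# left to right, so e.g. "SoT1" is top-left).
LEAGUE_SAVE_RATE = {"SoT1": 0.55, "SoT2": 0.80, "SoT3": 0.55,
                    "SoT4": 0.70, "SoT5": 0.90, "SoT6": 0.70}

def saves_above_average(shots_faced):
    """shots_faced: list of (zone, saved) tuples for one keeper.
    Returns actual saves minus the saves an average keeper would be
    expected to make on the same distribution of shots."""
    actual = sum(1 for _, saved in shots_faced if saved)
    expected = sum(LEAGUE_SAVE_RATE[zone] for zone, _ in shots_faced)
    return actual - expected

shots = [("SoT1", False), ("SoT5", True), ("SoT5", True), ("SoT2", True)]
print(round(saves_above_average(shots), 2))  # -0.15
```

This mirrors the UZR logic quoted above: the credit for each save depends on how often that kind of shot is stopped league-wide, not on a flat save-percentage accounting.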

ASA Podcast XIX: The one where we talk ad nauseam about Landon Donovan

Okay, so Drew came up lame with a sore throat, leaving Matty and me to fend for ourselves and sending the podcast into a downward dive: a 30-minute discussion about Landon Donovan. We follow that up with another 30-minute discussion about the CONCACAF Champions League (CCL).

At some point, in perhaps a show of solidarity or more likely a show of that pessimism that could come only from a couple of Mariners fans, we projected good things for each other’s club. I for Portland, and Matty for Seattle.

Despite advertising a second segment about MLS front-office personnel decisions—regarding past players, and whether or not we’ll see any future Ivy League day traders hired into technical director positions—we had to cut it from the podcast due to mass overages in the first segment. We hope to pick up the nerds-vs.-jocks talk next week.

This is one of our longest-running podcasts at 70 minutes, but it’s still a good one. I hope you enjoy!

ASA Podcast XVIII: The One Where We Discuss Defense Influencing Shots

Family in town, waiting out this baby and being home doing nothing sure has made me lazy. So lazy that I really didn’t get around to editing this and putting it together until last night. My apologies for the late posting.

This week we discuss the USMNT and their romp in Eastern Europe, plus a bit about Montreal and Omar Gonzalez. Then we transition to some discussion about whether goalkeepers can influence shots on target. It’s all some interesting stuff, with a lot of giggling by me because I coined a new nickname for Drew.


Noisy Finishing Rates

As a supplement to the stabilization analysis I did last week, I wanted to add the self-predictive powers of finishing rates—basically soccer’s shooting percentage. Team finishing rates can be found both on our MLS Tables and in our Shot Locations analysis, so it would be nice to know if we can trust them.

Last week I split the 2012 and 2013 seasons in half and assessed the simple linear relationships for various statistics between the two halves of each season across all 19 teams. Now I have 2011 data, and we can have even more fun. I included bivariate data from both 2011 and 2012 together, leaving out 2013 since it is not over yet. It is important to note that I am not looking across seasons, only within seasons. To the results!

Stat Correlation Pvalue
Total Attempts
Blocked Shots
Shots on Goal
Shots off Goal
Surprisingly, to me at least, a team’s points earned has been the most stable statistic in MLS (by my linear definition of stability). Not so surprising to me was that total attempts is also one of the most stable. Look down at the very bottom, and you’ll find finishing rates. Check out the graph below:

 Finishing Rates Stabilization 2011-2012

Some teams finish really well early in the season, then flop. Others finish poorly, then turn it on. But there’s no obvious pattern that would allow us to predict second-half finishing rates. In fact, the best prediction for any given team would be to suggest that they will regress to league average, which is exactly what our Luck Table does. It regresses all teams’ finishing rates in each zone back to league averages, then calculates an expected goal differential.
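Stripped of the details, the Luck Table’s regression step amounts to replacing each team’s own finishing rate with the league’s, zone by zone. A minimal sketch; the zone names and finishing rates are illustrative placeholders, not our actual zone definitions:

```python
# League-average finishing rate by shot-location zone (illustrative:
# zone1 = long range, zone2 = mid range, zone3 = close range).
LEAGUE_FINISH = {"zone1": 0.04, "zone2": 0.11, "zone3": 0.25}

def expected_goals(attempts_by_zone):
    """Goals a team would score finishing at league-average rates:
    attempts in each zone times the league finishing rate there."""
    return sum(n * LEAGUE_FINISH[z] for z, n in attempts_by_zone.items())

# A hypothetical team's attempts for and against, by zone
team_for = expected_goals({"zone1": 100, "zone2": 60, "zone3": 20})
team_against = expected_goals({"zone1": 80, "zone2": 50, "zone3": 25})
print(round(team_for - team_against, 2))  # expected goal differential
```

Because finishing rates don’t stabilize, this league-average substitution is a better forecast of a team’s second half than the team’s own first-half rates.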

On a side note, you might be asking yourself why I don’t just use points to predict points. Because this: while the correlation between first-half and second-half points is about 0.438, the correlation between first-half attempts ratios and second-half points is slightly stronger at 0.480. Also, in a multiple regression model where I let both first-half attempts ratio and first-half points duke it out, first-half attempts ratio edges out points for winner of the predictor trophy.

Estimate Std. Error T-stat P-value
Intercept 1.7019 5.97 0.285 77.7%
AttRatio 13.7067 6.32 2.17 3.7%
Points 0.3262 0.19 1.691 10.0%
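That head-to-head is an ordinary least squares fit of second-half points on first-half attempts ratio and first-half points. A sketch with numpy; the team rows here are fabricated, only the structure of the regression matches:

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares fit of y on an intercept plus the
    columns of X; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept, attempts-ratio slope, points slope]

# Fabricated rows: (first-half attempts ratio, first-half points)
X = np.array([[1.10, 28], [0.95, 24], [1.25, 30], [0.85, 18], [1.00, 25]])
y = np.array([30, 22, 33, 17, 24])  # second-half points
print(ols(X, y))
```

With the real data, the attempts-ratio coefficient is the one that survives with a significant p-value (3.7%), which is the sense in which it “edges out points for winner of the predictor trophy.”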

And since this is a post about finishing rates…

Estimate Std. Error T-stat P-value
Intercept -2.243 7.75 -0.29 77.4%
AttRatio 18.570 5.71 3.26 0.3%
Finishing% 63.743 50.08 1.27 21.2%

A good prediction model (on which we are working) will include more than just a team’s attempts ratio, but for now, it is king of the team statistics.

Signal and Noise in MLS

Some Nate Silver guy wrote a whole book about “signal” and “noise” in data, so it must be important, right? Sports produce a lot of statistics, and it turns out that some of those statistics are pretty meaningless—that is, pretty noisy.

A pitcher’s ERA is sitting below 3.00 after eight starts, but he has more walks than strikeouts. Baseball sabermetricians will tell you that the low ERA is mostly noise, but that the high walk rate is a signal for impending doom. An MLS team leads the league in points per match, but it allows more shots than it earns for itself (note: this team is called “Montreal Impact”). Soccer nerds like me will tell you that its position in the standings is mostly noise, and that its low shots ratio is a signal for impending doom—or something worse than first place, anyway.

The reasoning behind both examples above is basically the same. Pitchers’ ERAs, like soccer teams’ points earned, are highly variable and unpredictable, while strikeout-to-walk ratios and shots ratios are more consistent. It’s better to put your money on something consistent and easy to predict, rather than something variable and hard to predict. Duh, right?

So here’s why we like shots data ’round these parts. Below I have provided two charts of MLS data, one from 2012 and one from 2013. I split each season into two parts and then measured the linear predictive power of each stat on itself. Did teams that scored lots of goals early in the season also score lots of goals later in the season? That’s the kind of question answered here.

2012 MLS Stat R2 Pvalue 2013 MLS Stat R2 Pvalue
Blocked Shots 37.1% 0.6% Shots off Goal 34.8% 0.8%
Total Attempts 26.1% 2.5% Total Attempts 34.5% 0.8%
Goals 20.3% 5.3% Shots on Goal 29.4% 1.7%
Points 20.1% 5.5% Points 4.1% 40.7%
Shots on Goal 18.2% 6.9% Blocked Shots 1.7% 60.0%
Shots off Goal 3.6% 43.7% Goals 1.5% 61.6%

As an example of what this means, let’s consider the attempts stat. Remember that an attempt is any effort in the direction of the goal, so basically an attempt is any shot—on target, off target, or blocked. In each of the past two seasons, MLS teams’ attempts totals in the first half of the season were able to help predict their attempts totals in the second half, explaining 26.1% and 34.5% of the variability in second-half attempts, respectively. Those might not seem like high percentages of explanation, but the MLS season is short, and statistically significant predictors are hard to find.
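The split-half check behind those percentages is straightforward: pair each team’s first-half total with its second-half total and square the Pearson correlation. A sketch with numpy (the five team totals below are invented, not MLS data):

```python
import numpy as np

def split_half_r2(first_half, second_half):
    """R-squared of the linear relationship between a stat's
    first-half and second-half totals across teams: how much of the
    second-half variability the first half explains."""
    r = np.corrcoef(first_half, second_half)[0, 1]
    return r ** 2

# Illustrative attempts totals for five teams
first = [150, 180, 160, 200, 170]
second = [155, 175, 158, 190, 172]
print(round(split_half_r2(first, second), 3))
```

An R² near the 26–35% range reported for attempts means the stat carries real signal; an R² like the 1.5% for 2013 goals means the first half tells you almost nothing about the second.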

In baseball, such “self-predictors” have been referred to as “stabilization.” Stabilization is important because, as mentioned above, stabilization means that a stat is consistent, and that a team is likely to replicate its results in the future. This MLS season, points earned during the first 10 matches were essentially worthless at predicting points earned in the second 10 games. Even over the 34 games each team played in 2012, the stabilization for points earned was not as strong as that of attempts or goals scored.*

The next step is figuring out what predicts future points earned, since it does a pretty lame job of predicting itself. But I’ll leave that for another post after I have gathered data going back a few more seasons. The number one takeaway here is that some stats can only tell us what happened, but not what will happen. There is another group of stats that are doubly important because they also stabilize—predicting themselves using smaller sample sizes. Those stabilizing stats (like shot attempts) are the signal amid the sea of noise known most places as “football.”

*Seattle has only played 21 games, so I cannot do 11-and-11 splits yet. Also, as for why shots off goal and blocked shots have essentially switched places, I would wager that’s more due to how they are (somewhat) subjectively categorized, but who knows.