Happy Thanksgiving, all! We have a Conference Finals-centric podcast concerning the outcomes of this past weekend’s matches. A little discussion about formations vs. personnel also came out of it. It’s just us talking and coming up with some random thoughts. It’s not that long, but it’s us and we’re having a good chat… I think Matty at one point insults Sebastián Velásquez’s haircut, so there is that.
We’ve shown time and time again how helpful a team’s shot rates are in projecting how well that team is likely to do going forward. To this point, however, data has always been contained in-season, ignoring what teams did in past seasons. Since most teams keep large percentages of their personnel, it’s worth looking into the predictive power of last season.
We don’t currently have shot locations for previous seasons, but we do have general shot data going back to 2011. This means that I can look at all the 2012 and 2013 teams, and how important their 2011 and 2012 seasons were, respectively. Here goes.
First, I split each of the 2012 and 2013 seasons into two halves, calculating stats from each half. Let’s start by leaving out the previous season’s data. Here is the predictive power of shot rates and finishing rates, where the response variable is second-half goal differential.
|Variable|Coefficient|P-value|
|---|---|---|
|Attempt Diff (first 17)|0.14244|0.00%|
|Finishing Diff (first 17)|77.06047|1.18%|
To summarize, I used total shot attempt differential and finishing rate differential from the first 17 games to predict the goal differential for each team in the final 17 games. Also, I controlled for how many home games each team had remaining. The sample size here is the 56 team-seasons from 2011 through 2013. All three variables are significant in the model, though the individual slopes should be interpreted carefully.*
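To get a feel for what those slopes mean in goals, here is a minimal sketch that multiplies each quoted coefficient by a made-up team's numbers. The team values are hypothetical, the function name is mine, and I'm assuming finishing rate is expressed as a proportion (0.02 rather than 2 percent). The intercept isn't listed above, so these are contributions relative to a baseline, not full projections.

```python
# Slopes quoted in the table above and in the footnote on home games.
ATTEMPT_SLOPE = 0.14244     # goals per shot attempt of first-17 differential
FINISHING_SLOPE = 77.06047  # goals per unit of finishing-rate differential
HOME_SLOPE = 3.37           # goals per additional home game remaining


def slope_contributions(attempt_diff, finishing_diff, extra_home_games):
    """Return each predictor's contribution to projected second-half goal differential.

    The intercept is not published in the post, so these are contributions
    relative to a baseline team rather than complete predictions.
    """
    return {
        "attempts": ATTEMPT_SLOPE * attempt_diff,
        "finishing": FINISHING_SLOPE * finishing_diff,
        "home_games": HOME_SLOPE * extra_home_games,
    }


# Hypothetical team: +40 shot attempts, +0.02 finishing rate, one extra home game.
parts = slope_contributions(40, 0.02, 1)
total = sum(parts.values())
for name, goals in parts.items():
    print(f"{name:10s} {goals:+.2f} goals")
print(f"{'total':10s} {total:+.2f} goals")
```

With a residual standard error above six goals, a total like this is a nudge to the projection, not a forecast to bet the house on.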
The residual standard error for this model is high at 6.4 goals of differential. Soccer is random, and predicting exact goal differentials is impossible, but that doesn’t mean this regression is worthless. The R-squared value is 0.574, though as James Grayson has pointed out to me, the square root of that figure (0.757) makes more intuitive sense. One might say that we are capable of explaining 57.4 percent of the variance in second-half goal differentials, or 75.7 percent of the standard deviation (sort of). Either way, we’re explaining something, and that’s cool.
But we’re here to talk about the effects of last season, so without further mumbo jumbo, the results of a more-involved linear regression:
|Variable|Coefficient|P-value|
|---|---|---|
|Attempt Diff (first 17)|0.12426|0.03%|
|Attempt Diff (last season)|0.02144|28.03%|
|Finishing Diff (first 17)|93.27359|1.14%|
|Finishing Diff (last season)|72.69412|12.09%|
Now we’ve added teams’ shot and finishing differentials from the previous season. Obviously, I had to cut out the 2011 data (since 2010 is not available to me currently), as well as Montreal’s 2012 season (since they made no Impact in 2011**). This left me with a sample size of 37 teams. Though the residual standard error was a little higher at 6.6 goals, the regression now explained 65.2 percent of the variance in second-half goal differential. Larger sample sizes would be nice, and I’ll work on that, but for now it seems that—even halfway through a season—the previous season’s data may improve the projection, especially when it comes to finishing rates.
But what about projecting outcomes for, say, a team’s fourth game of the season? Using its rates from just three games of the current season would lead to shaky projections at best. I theorize that, as a season progresses, the current season’s data get more and more important for the prediction, while the previous season’s data become relatively less important.
My results were most assuredly inconclusive, but leaned in a rather strange direction. The previous season’s shot data was seemingly more helpful in predicting outcomes during the second half of the season than it was in the first half—except, of course, the first few weeks of the season. Specifically, the previous season’s shot data was more helpful for predicting games from weeks 21 to 35 than it was from weeks 6 to 20. This was true for finishing rates, as well, and led me to recheck my data. The data was errorless, and now I’m left to explain why information from a team’s previous season helps project game outcomes in the second half of the current season better than the first half.
Anybody want to take a look? Here are the results of some logistic regression models. Note that the coefficients represent the estimated change in (natural) log odds of a home victory.
|Weeks 6 – 20|Coefficient|P-value|
|---|---|---|
|Home Shot Diff|0.139|0.35%|
|H Shot Diff (previous)|-0.073|29.30%|
|Away Shot Diff|-0.079|7.61%|
|A Shot Diff (previous)|-0.052|47.09%|

|Weeks 21 – 35|Coefficient|P-value|
|---|---|---|
|Home Shot Diff|0.087|19.37%|
|H Shot Diff (previous)|0.181|6.01%|
|Away Shot Diff|-0.096|15.78%|
|A Shot Diff (previous)|-0.181|4.85%|
Later on in the season, during weeks 21 to 35, the previous season’s data actually appears to become more important to the prediction than the current season’s data—both in statistical significance and actual significance. This despite the current season’s shot data being based on an ample sample of at least 19 games (depending on the specific match in the data set). So I guess I’m comfortable saying that last season matters, but I’m still confused—a condition I face daily.
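Since log odds are not the friendliest unit, here is a small sketch that exponentiates the coefficients from the two tables above into odds ratios: the multiplicative change in the odds of a home win for a one-unit increase in each shot differential. The dictionary keys and function name are just labels I made up. The P-values are not carried over, so this says nothing about significance.

```python
import math

# Coefficients copied from the two tables above (change in log odds of a home win).
COEFFICIENTS = {
    "weeks_6_20": {"home": 0.139, "home_prev": -0.073, "away": -0.079, "away_prev": -0.052},
    "weeks_21_35": {"home": 0.087, "home_prev": 0.181, "away": -0.096, "away_prev": -0.181},
}


def odds_ratio(log_odds_change):
    """Convert a change in natural log odds into a multiplicative change in odds."""
    return math.exp(log_odds_change)


for period, coefs in COEFFICIENTS.items():
    for name, beta in coefs.items():
        print(f"{period:12s} {name:10s} odds ratio {odds_ratio(beta):.3f}")
```

An odds ratio above 1 raises the odds of a home win and one below 1 lowers them, so the late-season coefficients of 0.181 and -0.181 work out to roughly a 20 percent increase and a 17 percent decrease in the odds per unit.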
*The model suggests that each additional home game remaining projects a three-goal improvement in differential (3.37, actually). In a vacuum, that makes no sense. However, we are not vacuuming. Teams that have more home games remaining have also played a tougher schedule. Thus the +3.37 coefficient for each additional home game remaining is also adjusting the projection for teams whose shot rates are suffering due to playing on the road more frequently.
**Drew hates me right now.
Hey, guys… we’re back with better audio quality this week. A big thanks to Drew, who put things together last week in my place. Despite technology falling apart around them, Drew and Matty were able to put together a great podcast.
This week on the show we tackle the MLS playoffs, CONCACAF and USMNT dealings, and then some transfer/loan rumors that are out there. It’s a longer podcast, but it’d been a few weeks since we all got together, and things just rolled. I hope you enjoy it.
In the wake of Major League Baseball awarding its MVP to Miguel Cabrera, debates over what “valuable” means have once again flared up. Though soccer and baseball are two incredibly different sports, I think we can apply some of the same logic to both MVP discussions. Major League Soccer has about two weeks remaining before its MVP award is handed out, and we will no doubt encounter many of the same controversies in the soccer blogosphere that appear in baseball every season.
The MVP controversy usually begins with what “valuable” means. I think there’s little doubt in most people’s minds that “valuable” and “skilled” are correlated. The main controversy is how correlated. To some, asking who was the best player in Major League Soccer in 2013 would be equivalent to asking who was the most valuable to his team. To others, there would be some key distinctions, the most common of which is that MVPs must come from teams that reach the post season.
In retort to that thinking, some very astute commenters in a Fangraphs.com forum offered up these nuggets. Hendu for Kutch made the analogy:
“We each want to buy something that costs $1. I’ve got a quarter, 8 nickels, and 10 pennies. My ‘team’ of coins is worth 75 cents and falls short of being able to buy the item. You have one dime and 18 nickels. Your ‘team’ is worth $1, and you successfully buy the item. Is your dime more valuable than my quarter simply because it led to a successful item purchase?”
Mike Trout = quarter and Miguel Cabrera = dime, for those of you not so into baseball, and the question is a good one. Few would argue that the dime is more valuable than the quarter just because it found itself in a position to help buy that scrumptious Twix.
In reply to someone arguing that the quarter had no value because it didn’t lead to the purchase of a desired item, BIP and ndavis910 then chimed in:
“Except not everything costs $1, and at any rate, you would always choose the quarter over the dime when accumulating money for a purchase.”
“Especially when you don’t know the cost of the items until you get to the store. In baseball, a team cannot be sure how many wins it will take to reach the playoffs until the last day of the season. In your example, the quarter is the most valuable piece regardless of whether or not the item cost $1 or $0.75.”
When thinking about attributing value to players like Marco Di Vaio, Mike Magee, Camilo Sanvezzo, Robbie Keane and company, why should it matter where their teams finished? If one believes that Magee, for instance, is the best player in MLS, then does it matter if he took his team from 39 points to 49, versus from 40 points to 50? Either way, it’s still ten points of value in the standings. When Magee was traded to Chicago, neither Chicago nor Magee knew that the Fire was going to need 50 points to make the playoffs. The fact that they got just 49 points shouldn’t negate any of Magee’s value.
If you say that it matters because MLS clubs get real value from extra playoff games, then think about this. Playoff cutoff lines are quite arbitrary. If MLS allowed only the top two teams in from each conference—not completely unreasonable for a league of just 19 teams—then none of the players mentioned above would be considered under this playoffs requirement. Playoffs represent an arbitrary bar that the players competing for the award don’t get to set, and while reaching the playoffs does bring the team measurable revenue and value, basing an award on something outside an individual’s control would, in my opinion, strip the award of its intended meaning and purpose.
Now let’s anticipate the logical counterargument—that players pick up their games in playoff races and play well when it matters most.
For a moment, let’s ignore the fact that little evidence has ever been found in professional sports that players can turn it on and turn it off as needed. This past season, Magee scored seven goals in Chicago’s final nine games, a stretch in which the team averaged 1.56 points per match. That represents a pace that would have gotten the Fire into the playoffs if maintained for the entire season. Di Vaio scored five goals in his last 10 games—I even included that tenth-to-last game in which he scored two goals—in a stretch where Montreal tallied just 0.7 points per match, limping into the playoffs on a tie-breaker with Chicago. Just because one team makes the playoffs doesn’t mean its best player was at his peak when it mattered. Goals are, admittedly, a narrow-minded way to measure a striker’s value, but I think the point is still valid.
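For anyone who wants to check the pace claim, here is a quick sketch that stretches each closing stretch over a full season. It assumes a 34-game regular season and uses the 50-point playoff line mentioned earlier. The function name is mine.

```python
SEASON_GAMES = 34   # assumed length of the 2013 regular season
PLAYOFF_LINE = 50   # points the Fire turned out to need, per the paragraph above


def full_season_pace(points_per_match, games=SEASON_GAMES):
    """Project a points-per-match rate over a full season."""
    return points_per_match * games


chicago_pace = full_season_pace(1.56)   # Chicago's final nine games
montreal_pace = full_season_pace(0.7)   # Montreal's final ten games

print(f"Chicago pace:  {chicago_pace:.1f} points, clears line: {chicago_pace >= PLAYOFF_LINE}")
print(f"Montreal pace: {montreal_pace:.1f} points, clears line: {montreal_pace >= PLAYOFF_LINE}")
```

Chicago's closing pace projects to about 53 points and Montreal's to about 24, which is the whole point of the comparison.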
For me, the Magee-Di Vaio example above may have been no more than an exercise in confirmation bias. I chose to see what I already believed. However, the logic behind the belief that team standings shouldn’t matter to players’ MVP merits is still good stuff, and transcends any biased example I can come up with.
If we’re ready to agree that the MVP award should essentially be given to the best overall player, then we still have a tall task ahead of us. How do we measure skill on the soccer field? That is the 64,000-dollar question, and one we hope to help tackle here at ASA some day. But perhaps it’s not so crazy to think that a guy like Federico Higuain is deserving of the MVP award. If you scoff at that notion, you likely do so because you’ve been trained to think about MVP awards in a certain way.
We’re all about re-thinking things around here.
This week we talked about how cool and hip we are, followed by a discussion of the first legs of the MLS Cup semifinals. We continued with potential changes to MLS’ CONCACAF Champions League berths, Klinsmann’s 23-man roster for the upcoming friendlies versus Scotland and Austria, and the top 50 players in MLS by pass completion percentage. We concluded with a discussion of burritos and proper burrito folding practices.
A look at the 4-2 scoreline may give the appearance that Real Salt Lake shredded Portland’s defense in a wide-open free-for-all. On the contrary, two of RSL’s goals came directly from corner kicks, while a third was courtesy of the generosity and stone touch of Futty Danso (who was also marking Schuler on RSL’s first goal). Credit should of course go to Salt Lake for piling on the pressure, but what really characterized Real Salt Lake’s play on Sunday was not a free-flowing attack, but rather excellent team defense and a commitment to attacking via the flanks.
No Space for Portland
Throughout the match, Real Salt Lake’s defensive shape remained resolute, and never came close to being broken down by Portland’s 4-3-3. Kyle Beckerman was, as ever, the linchpin of RSL’s midfield, leading the team in aerial duels won with 6 (of 7) and tackles (4, tied with Tony Beltran), and contributing 6 clearances. However, the incessant pressure of Sebastian Velazquez and Luis Gil—who it should be noted are 19 and 22 years old, respectively—along with the fullback pairing of Beltran (who led RSL in touches with 76) and Chris Wingert/Lovel Palmer, never allowed any space for Diego Valeri or Darlington Nagbe to work their magic in the midfield. Many of Portland’s forays into the penalty area stemmed from Rodney Wallace collecting the ball in wide positions and sending in listless crosses (0-for-6) that were easily dealt with by Nat Borchers. Forward Ryan Johnson was kept in check all game, limited to a mere 18 touches in his 59 minutes on the field.
The entirety of Portland’s productive offensive output consisted of Will Johnson’s free kick goal, Piquionne’s soaring headed goal, and a 77th minute shot from Alhassan after a slick dribbling spell through the heart of RSL’s midfield. For the entire game, Portland had only two successful dribbles and three successful crosses in the attacking third (one of which was Jewsbury’s beautiful assist).
Defending from the Front
The only change in the starting lineup for Real Salt Lake to start the game was Devon Sandoval replacing an ailing Alvaro Saborio. While few would argue that Sandoval is the better player, his kinetic style, defensive workrate, and ability to get into wide spaces provided problems for the Great Wall of Gambia.
Chalkboards of Devon Sandoval vs. Portland (left) and Alvaro Saborio vs. Los Angeles (right)
As you can see, the defense starts from the front. Sandoval pressured wide all game long, trying to disrupt Portland’s rhythm in the defensive half of the field. Of Sandoval’s 43 actions against Portland, only 11 (25.6%) took place in the center third of the field, compared to 15 of 28 (53.6%) for Saborio against Los Angeles. Sandoval also pressured back more than Saborio did: 8 of 43 (18.6%) actions by Sandoval took place in RSL’s half of the field, compared to a meager 2 of 28 for Saborio (7.1%).
Stretching the Diamond
What really stuck out about the way that Real Salt Lake played, however, was the way that their midfield “diamond” stretched from touchline-to-touchline, with Velazquez manning the left, Gil hugging the right, and Morales drifting from side-to-side, looking for an inch of space wherever he could find it.
Here is a chalkboard of passes attempted by Real Salt Lake, along with the percentage of passes attempted from each section of the field:
And here are all of the passes attempted by Portland, along with the percentage breakdown:
Real Salt Lake attempted only 13.6% of their passes from the central attacking portions of the field, while 64.3% came from the wide attacking areas. Portland, by contrast, attempted 18.9% of their passes from the central attacking areas and 58.6% from the wide attacking zones.
RSL ratio of wide-attacking passes to central-attacking passes: 4.73-to-1
POR ratio of wide-attacking passes to central-attacking passes: 3.10-to-1
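The two ratios above are just the wide share divided by the central share. Here is a tiny sketch with the percentages quoted in the text. The function name is mine, not anything from the chalkboard tool.

```python
def wide_to_central(wide_pct, central_pct):
    """Ratio of wide-attacking passes to central-attacking passes."""
    return wide_pct / central_pct


rsl_ratio = wide_to_central(64.3, 13.6)
por_ratio = wide_to_central(58.6, 18.9)

print(f"RSL {rsl_ratio:.2f}-to-1, POR {por_ratio:.2f}-to-1")
# prints "RSL 4.73-to-1, POR 3.10-to-1"
```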
Real Salt Lake took their chances against Portland’s flank defense rather than try to fight through Will Johnson and Diego Chara. The gambit worked well, as all eight of RSL’s key passes and assists came from wide positions.
Three questions for leg 2 in Portland:
1. Will Saborio be healthy? If so, Sandoval will likely see the bench again as Findley’s speed will serve as an outlet against a high-pressing, possibly desperate Timbers squad, unless…
2. Kreis opts for the 4-2-3-1? Beckerman and Yordany Alvarez were deployed in a double pivot at Los Angeles a few weeks ago, and while the results were not exactly convincing, it perhaps implies (or at least I’m inferring) that Kreis may want to take a more conservative approach on the road in the playoffs.
3. Ryan Johnson or Frederic Piquionne? Ryan Johnson has put in a workmanlike effort thus far in the playoffs, but with his playing time diminishing each game (83 min @ SEA, 69 min v SEA, 59 min @ RSL) and Piquionne finally healthy (and able to leap clear over Nat Borchers), it may be time for Piquionne to crack the starting lineup.
Though our game states data set doesn’t yet include all of 2013, it still includes 137 games. In those 137 games, only five home teams ever went down three goals, and all five teams lost. There were 24 games in which the home team went down two goals, with only one winner (4.2%) and five ties (20.8%). The sample of two-goal games perhaps gives a little hope to the Timbers, but these small sample sizes lend themselves to large margins of error.
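To put a rough number on "large margins of error," here is a sketch using the Wilson score interval. That is a standard textbook interval I'm swapping in for illustration, not something from the game states model, and it treats each game as an independent coin flip with the same probability. The counts are the ones quoted above.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Home teams that went down two goals: 1 win and 5 ties in 24 games.
two_goal_low, two_goal_high = wilson_interval(1 + 5, 24)
# Home teams that went down three goals: 0 wins or ties in 5 games.
three_goal_low, three_goal_high = wilson_interval(0, 5)

print(f"Down two, win or tie:   6/24 = 25.0% (about {two_goal_low:.0%} to {two_goal_high:.0%})")
print(f"Down three, win or tie: 0/5  =  0.0% (about {three_goal_low:.0%} to {three_goal_high:.0%})")
```

Under those assumptions the two-goal sample is compatible with anything from roughly 12 to 45 percent. The five-game sample can't rule out rates above 40 percent, even though nobody in it actually came back.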
It is also important to note that teams that go down two goals at home tend to be bad teams—like Chivas USA, which litters that particular data set. None of the five teams that ever went down three goals at home made the playoffs this year. Only seven of the 24 teams to go down two goals at home made it to the playoffs. Portland is a good team. Depending on your model of preference, the Timbers are somewhere in the top eight. So even if those probabilities up there hypothetically had small margins of error, they still wouldn’t necessarily apply to the Timbers.
Oh, and while we’re talking about extra variables, in those games the teams had less time to come back. To work around these confounding variables, I consulted a couple models, and I controlled for team ability using our expected goal differential. Here’s what I found.
A logistic model suggests that, for each goal of deficit early in a match, the odds of winning are reduced by a factor of about two or three. A tie, though, would also allow Portland to play on. A home team’s chances of winning or tying fall from about 75 percent in a typical game that begins zero-zero to about 25 percent when down two goals. Down three goals, that probability plummets to less than 10 percent. But using this particular logistic regression was dangerous, as I was forced to extrapolate to a situation that never happens during the regular season: starting a game from behind.
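Here is a minimal sketch of how a per-goal odds factor turns into those probabilities. I'm assuming a factor of exactly three and applying it to the win-or-tie odds, which is a simplification. The actual model also controlled for team ability, so treat this as back-of-the-envelope arithmetic, not a reproduction of the regression.

```python
def to_odds(prob):
    """Convert a probability into odds in favor."""
    return prob / (1 - prob)


def to_prob(odds):
    """Convert odds in favor back into a probability."""
    return odds / (1 + odds)


def prob_after_deficit(base_prob, goals_down, odds_factor_per_goal=3.0):
    """Shrink the odds by a fixed factor for each goal of deficit."""
    return to_prob(to_odds(base_prob) / odds_factor_per_goal ** goals_down)


BASE = 0.75  # approximate win-or-tie chance for a home team starting level
for goals_down in range(4):
    print(f"down {goals_down}: {prob_after_deficit(BASE, goals_down):.0%}")
```

With those assumptions the 75 percent starting point drops to 50, 25 and 10 percent at one, two and three goals down, which lands close to the figures quoted above.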
So I went to a linear model. The linear model expects Portland to win by about 0.4 goals. In our model, 15.5 percent of home teams were able to perform at least 1.6 goals above expectation, which is what the Timbers would need to at least force a draw in regulation. Only 4.6 percent of teams performed at least 2.6 goals above expectation. If we just compromise between what the two models are telling us, then the Timbers probably have about a 20-percent chance to pull off a draw in regulation. That probability would have been closer to five percent had Piquionne not finished a beautiful header in stoppage time.
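The 1.6 and 2.6 figures are just the margin Portland needs minus the margin the model already expects. Here is a short sketch of that arithmetic. The "compromise" is done as a plain average of the two models' numbers, which is my assumption about how to split the difference and not necessarily how the 20 percent was reached.

```python
EXPECTED_MARGIN = 0.4  # goals the linear model expects Portland to win by


def required_overperformance(margin_needed, expected_margin=EXPECTED_MARGIN):
    """Goals above expectation needed to win the second leg by a given margin."""
    return margin_needed - expected_margin


level_on_aggregate = required_overperformance(2)  # win by two to erase the 4-2 deficit
win_on_aggregate = required_overperformance(3)    # win by three to advance outright

# Averaging the logistic estimate (about 25%) with the linear one (15.5%).
blended_chance = (0.25 + 0.155) / 2

print(f"To level the tie: {level_on_aggregate:.1f} goals above expectation")
print(f"To win outright:  {win_on_aggregate:.1f} goals above expectation")
print(f"Blended chance of at least levelling: {blended_chance:.0%}")
```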