It is hard to construct probabilistic models for two-legged, home-and-home series from a season of games that were (for the most part) independent of one another. And because our data sets from Opta and MLSsoccer.com only go back to 2011, there isn’t much of a sample to work with come playoff time. Thus I will have to get tricky when trying to construct logical probabilities of victory in these playoff series.
The first thing to point out is that our model is based on regular-season games, which may or may not behave like two-legged playoff series. There is a common belief that the team that plays at home in the second leg has an additional advantage. However, much of that belief likely comes from people like Simon Borg, who likes shitting on data and then presenting it. A reasonable study would need to account for the fact that the team hosting the second leg is probably the better team. One such study attempted to do so for the UEFA Champions League and found that the additional advantage of hosting the second leg was effectively nothing once team skill was controlled for. However, it should be noted that the UEFA Champions League does not always play extra time when aggregate scores are tied.
As Borg notes, 22 of 36 (61.1%) two-legged series in the MLS playoffs have been won by the team that played the second leg at home. However, because those teams tend to be better, much of that is likely due to skill rather than an additional home-field edge. Our models, which give no additional boost to second-leg home teams, projected three second-leg home teams to win in the first round: Portland with 69 percent, Sporting with 66 percent, and New York with 59 percent. Even factoring in RSL’s 46 percent, the average probability of a second-leg home team winning in the first round was almost exactly, you guessed it, 60 percent. With the data currently available, we have chosen not to include an additional boost for second-leg home teams. With that out of the way… moving on!
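As a quick check of that 60 percent figure, here is a small sketch that averages the four pre-series probabilities quoted above and sets the result next to the 22-of-36 historical record. The dictionary and variable names are just for illustration.

```python
# Pre-series probabilities that each second-leg home team advances (from the post)
second_leg_home = {"Portland": 0.69, "Sporting KC": 0.66, "New York": 0.59, "RSL": 0.46}

# Average model probability across the four first-round series
expected_rate = sum(second_leg_home.values()) / len(second_leg_home)

# Historical MLS record of second-leg home teams in two-legged series
historical_rate = 22 / 36

print(f"model's expected rate: {expected_rate:.1%}")
print(f"historical rate:       {historical_rate:.1%}")
```

The two rates land within about a percentage point of each other, which is the point of the paragraph: a model with no second-leg bonus already expects roughly the historical share.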
With two first-leg games down and two to go, two favorites find themselves in opposite positions. Portland takes a one-goal lead back home, while Sporting returns to Kansas City facing a one-goal deficit. My method of projecting each team’s probability of winning its series assumes that teams prefer a regulation win on aggregate to a regulation draw (and a draw to a loss) with the same weighted preferences they had for those outcomes during the regular season. For example, I will treat the Portland-Seattle matchup as though Portland has an early lead in a regular-season-type game, and adjust our model’s probabilities to account for that one-goal lead.
The probabilities will be adjusted based on some game-state research I have been working on. I have included some nifty graphs below to help us out. The two graphs chart the home team’s approximate probability of each of the three possible match outcomes based on two things: the goal differential and the minute mark. These graphs were created from game data through June 8th of this season, and the data were smoothed using a lowess curve.
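The post doesn’t say which lowess implementation or settings were used, so the following is only a minimal pure-Python sketch of the idea: a local linear fit with tricube weights, without the robustness iterations many implementations run by default. Each observation is a minute mark paired with a 0/1 outcome (home win or not), and the smoother turns those into an approximate probability curve. The function name, the `frac` value, and the toy data are all hypothetical, not Opta numbers.

```python
def lowess(xs, ys, frac=0.5):
    """Local linear smoother with tricube weights (no robustness iterations).

    xs, ys: equal-length sequences; frac: share of points used in each local fit.
    Returns the smoothed value at every x in xs.
    """
    n = len(xs)
    k = max(2, min(n, round(frac * n)))  # neighbours per local fit
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point (x0 itself included)
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        # Tricube weights: 1 at x0, falling to 0 at distance h
        w = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0 for x in xs]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        # Weighted least-squares line through the neighbourhood, evaluated at x0
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted


# Hypothetical toy data: minute of each observation and whether the home side won (1) or not (0)
minutes = [5, 12, 20, 28, 35, 44, 51, 60, 68, 75, 83, 90]
home_won = [0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1]
curve = lowess(minutes, home_won, frac=0.6)
for m, p in zip(minutes, curve):
    print(m, round(p, 2))
```

A larger `frac` gives a smoother curve at the cost of washing out real changes by minute; with sparse cells (like early plus-one leads) a local linear fit can also drift outside the 0-to-1 range near the edges.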
Portland essentially leads a home match by one goal in the first minute. A league-average home team in that situation would win with an estimated 75-percent probability and tie with about 20-percent probability.* Another way to say the same thing is that the home team has 3-to-1 (3.00) odds of winning and 1-to-4 (0.25) odds of tying. Through June 8th of this season, typical home teams won with 46-percent probability (0.85 odds) and tied with 29-percent probability (0.41 odds). Thus a typical team increases its odds of winning from 0.85 to about 3.00, a factor of 3.53, with an early one-goal lead. Additionally, a typical team decreases its odds of tying by a factor of about 1.6 (from 0.41 to 0.25) with that one-goal lead.
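The odds conversions and odds ratios in that paragraph can be reproduced with a few lines. This sketch uses only the probabilities quoted in the post; `to_odds` and the variable names are illustrative.

```python
def to_odds(p):
    """Convert a probability to odds in favour, p / (1 - p)."""
    return p / (1.0 - p)

# League-wide home-team baseline through June 8th (figures from the post)
base_win, base_tie = 0.46, 0.29
# Home team leading by one goal early (read off the game-state graphs)
lead_win, lead_tie = 0.75, 0.20

win_ratio = to_odds(lead_win) / to_odds(base_win)  # multiplier on the odds of winning
tie_ratio = to_odds(base_tie) / to_odds(lead_tie)  # factor by which the odds of tying shrink

print(f"win odds {to_odds(base_win):.2f} -> {to_odds(lead_win):.2f}, ratio {win_ratio:.2f}")
print(f"tie odds {to_odds(base_tie):.2f} -> {to_odds(lead_tie):.2f}, shrink factor {tie_ratio:.2f}")
```

Working from the unrounded baseline (0.46 / 0.54 = 0.852) gives a win ratio of about 3.52 rather than 3.53, which comes from dividing by the rounded 0.85; the difference is too small to move the series probabilities.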
Portland’s odds of beating Seattle at home from an even game state are approximately 2.00 (66.7%), and its odds of tying are approximately 0.23 (18.8%). Applying the odds ratios above (2.00 × 3.53 for a win, 0.23 ÷ 1.6 for a tie), one might conjecture that the Timbers’ odds of winning this game on aggregate are about 7.06 (87.6%), and its odds of tying are 0.14 (12.3%). A tie would essentially result in the coin-flipping grand finale known as penalty kicks, and thus Portland’s chances of a Conference Finals berth are 93.8 percent (0.876 + 0.5 × 0.123).
Instead of going all nutzoid on Sporting KC as I did with Portland, I’ll ask you to trust that I followed the same methodology to arrive at my final conclusion. By this use of odds ratios, Sporting’s chances of advancing to the Eastern Conference Finals are about 47.8 percent.** These probabilities will go into the simulation once all first legs are complete, to update the overall Cup probabilities.
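For readers who want to rerun the Portland and Sporting numbers, here is a sketch of the odds-ratio adjustment as described above, assuming the win and tie odds are each scaled independently by the league-average ratios and that a tie on aggregate is a 50/50 penalty shootout. The function and constant names are hypothetical; the SKC inputs are the ones given in the second footnote.

```python
def to_odds(p):
    """Probability -> odds in favour."""
    return p / (1.0 - p)

def to_prob(odds):
    """Odds in favour -> probability."""
    return odds / (1.0 + odds)

def advance_probability(team_win, team_tie, base_win, base_tie, state_win, state_tie):
    """Probability of advancing after scaling a team's even-state odds by odds ratios.

    team_*:  the team's own win/tie probabilities from an even game state
    base_*:  league-average home-team win/tie probabilities
    state_*: league-average home-team win/tie probabilities in the team's game state
    A tie on aggregate is treated as a coin-flip penalty shootout.
    """
    win_odds = to_odds(team_win) * to_odds(state_win) / to_odds(base_win)
    tie_odds = to_odds(team_tie) * to_odds(state_tie) / to_odds(base_tie)
    p_win, p_tie = to_prob(win_odds), to_prob(tie_odds)
    return p_win + 0.5 * p_tie

BASE_WIN, BASE_TIE = 0.46, 0.29  # typical home team through June 8th

# Portland: hosts leg two, effectively up one goal early
portland = advance_probability(2 / 3, 0.188, BASE_WIN, BASE_TIE, 0.75, 0.20)
# Sporting KC: hosts leg two, effectively down one goal early
skc = advance_probability(0.64, 0.26, BASE_WIN, BASE_TIE, 0.20, 0.30)

print(f"Portland: {portland:.1%}")
print(f"Sporting KC: {skc:.1%}")
```

Because the win and tie odds are adjusted separately, nothing forces the adjusted win and tie probabilities to sum to at most 1; for Portland they sum to about 0.9998, leaving almost no room for a loss on aggregate.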
*Due to the small sample of plus-one goal differentials in the first 15 minutes of matches, the graph suggests that a loss is more probable than a tie, even though logic tells us that a team with a one-goal lead should be more likely to draw than to lose. Thus I am using the more stable figures around the 40-to-60-minute marks. The even-goal-differential graph (not shown), as well as the two graphs above, suggest that probabilities don’t begin to change much until the 60th minute, an interesting topic for another day.
**For those wanting to check my math, I assumed typical home teams in SKC’s position would win with 20-percent probability and tie with 30-percent probability. SKC’s probabilities against New England in an even game state would be 64 percent and 26 percent for a win and a tie, respectively.
3 thoughts on “Two-legged Series Probabilities”
I think it’s very logical that a loss is almost as probable as a draw, even with an early one-goal lead.
Think about a theoretical scenario where each team scores 10 goals on average. An early lead doesn’t mean much, and the probability of a draw is very small.
Now in a more realistic scenario, try the same with 1.5 goals on average and you get 19.7% for a draw and 18.2% for a loss.
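The commenter’s 19.7% and 18.2% match what you get if each side’s remaining goals are independent Poisson counts with a mean of 1.5 (that model is an assumption; the comment doesn’t name it). A minimal sketch, with hypothetical function names:

```python
import math

def poisson_pmf(k, mu):
    """P(N = k) for a Poisson count with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def lead_outcomes(lead, mu_leader, mu_trailer, max_goals=30):
    """Win/draw/loss probabilities for the leading side, assuming each side's
    remaining goals are independent Poisson counts (truncated at max_goals)."""
    win = draw = loss = 0.0
    for x in range(max_goals + 1):  # leader's remaining goals
        px = poisson_pmf(x, mu_leader)
        for y in range(max_goals + 1):  # trailer's remaining goals
            p = px * poisson_pmf(y, mu_trailer)
            margin = lead + x - y
            if margin > 0:
                win += p
            elif margin == 0:
                draw += p
            else:
                loss += p
    return win, draw, loss

for mu in (10.0, 1.5):
    win, draw, loss = lead_outcomes(1, mu, mu)
    print(f"mu={mu}: win {win:.1%}, draw {draw:.1%}, loss {loss:.1%}")
```

With a mean of 1.5 goals per side, the draw comes out at about 19.7% and the loss at about 18.2%; with a mean of 10 the draw drops below 10 percent, which is the commenter’s point about high-scoring games.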
Are you using, like, normal probability theory, sort of? Say two teams are expected to score 1.5 goals on average with some standard deviation? The problem is that the game’s outcome is so discrete. Not that you couldn’t still derive usable probabilities from a continuous distribution model, though… it’s a tough one, but I like where your head’s at.
There seems to be some stabilization of the regressed probabilities around the 40-to-60-minute marks, so I went with something close to that. I definitely need to get more data from past seasons and playoffs to get a firmer answer.