The Dixon-Coles Model: Outsmarting the Bookies (Maybe?)

The Dixon-Coles model is one of the best-known statistical models for predicting football scores. The independent Poisson model serves as a decent foundation, but Dixon and Coles, in their seminal 1997 paper, proposed two key refinements to address its shortcomings: a dependence adjustment for low-scoring results and a time weighting of past matches.

The Draw Conundrum: Unveiling the Bivariate Adjustment

One notable weakness of the independent Poisson model is that it misjudges the frequency of low-scoring results, in particular underestimating low-scoring draws. This is where the Dixon-Coles model steps in, introducing a bivariate adjustment to correct this bias. The secret sauce is a parameter denoted rho (ρ), which controls how far, and in which direction, the probabilities of the low-scoring scorelines depart from what independence would give.

Think of it this way:

  • Positive ρ: Shifts probability away from the 0-0 and 1-1 draws and towards the 1-0 and 0-1 results.
  • Negative ρ: Shifts probability towards the 0-0 and 1-1 draws and away from the 1-0 and 0-1 results, so low-scoring draws become more likely than independence would suggest.

Dixon and Coles, in their analysis of English football data, found a negative ρ value, implying more 0-0 and 1-1 draws, and correspondingly fewer 1-0 and 0-1 results, than the independent Poisson model would predict.

The adjustment multiplies the independent Poisson probabilities of the four low-scoring scorelines (0-0, 1-0, 0-1, and 1-1) by a correction factor that depends on ρ and on the two teams' expected goals. All other scorelines are left unchanged. When ρ is zero, every factor equals 1 and the Dixon-Coles model reduces to the independent Poisson model.

Implementation in Code:

The Python code snippet below demonstrates the implementation of the ρ correction.

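def rho_correction(x, y, lambda_x, mu_y, rho):
    """Return the Dixon-Coles correction factor for the scoreline x-y."""
    if x == 0 and y == 0:
        return 1 - (lambda_x * mu_y * rho)
    elif x == 0 and y == 1:
        return 1 + (lambda_x * rho)
    elif x == 1 and y == 0:
        return 1 + (mu_y * rho)
    elif x == 1 and y == 1:
        return 1 - rho
    else:
        # Every other scoreline keeps its independent Poisson probability.
        return 1.0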

In this code, 'x' and 'y' represent the number of goals scored by the home and away teams, respectively. 'lambda_x' and 'mu_y' are the expected goals for the home and away teams, calculated based on their attack and defense strengths and the home advantage factor.
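
To see how the correction feeds into match probabilities, here is a minimal sketch that turns team strengths into a grid of scoreline probabilities. It assumes the multiplicative form in which the home team's expected goals are home attack × away defense × home advantage and the away team's are away attack × home defense. The helper names (poisson_pmf, expected_goals, score_matrix, outcome_probabilities), the max_goals cut-off, and all numbers are illustrative assumptions, not values from the paper.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def rho_correction(x, y, lambda_x, mu_y, rho):
    """Correction factor for the four low-scoring scorelines."""
    if x == 0 and y == 0:
        return 1 - lambda_x * mu_y * rho
    if x == 0 and y == 1:
        return 1 + lambda_x * rho
    if x == 1 and y == 0:
        return 1 + mu_y * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0


def expected_goals(attack_home, defense_home, attack_away, defense_away,
                   home_advantage):
    """Assumed multiplicative form for the two scoring rates."""
    lambda_x = attack_home * defense_away * home_advantage
    mu_y = attack_away * defense_home
    return lambda_x, mu_y


def score_matrix(lambda_x, mu_y, rho, max_goals=10):
    """matrix[x][y] = P(home scores x, away scores y), truncated at max_goals."""
    return [
        [
            rho_correction(x, y, lambda_x, mu_y, rho)
            * poisson_pmf(x, lambda_x)
            * poisson_pmf(y, mu_y)
            for y in range(max_goals + 1)
        ]
        for x in range(max_goals + 1)
    ]


def outcome_probabilities(matrix):
    """Collapse the scoreline grid into (home win, draw, away win)."""
    home = draw = away = 0.0
    for x, row in enumerate(matrix):
        for y, p in enumerate(row):
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    return home, draw, away


# Illustrative strengths only; real values come from fitting the model.
lam, mu = expected_goals(1.2, 0.9, 1.0, 1.1, 1.3)
print(outcome_probabilities(score_matrix(lam, mu, rho=-0.1)))
print(outcome_probabilities(score_matrix(lam, mu, rho=0.0)))
```

Comparing the two printed triples shows the effect of a negative ρ: the draw probability rises relative to the independent case, while the four corrected cells still sum to the same total as before.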

The Time Factor: Weighing Recent Matches More Heavily

Recognizing that team performance is not static but evolves over time, the Dixon-Coles model incorporates a time decay component. This means giving more weight to recent matches when estimating each team's attack and defense strengths. The rationale is that a team's current form is a more reliable indicator of its future performance than results from several months ago.

The weighting function typically used is a negative exponential, φ(t) = exp(-ξt), controlled by a non-negative parameter xi (ξ). A higher ξ value leads to a steeper decline in the weight assigned to older matches, making the model more responsive to recent form. With ξ = 0, all matches are weighted equally.
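
As a rough illustration of how ξ shapes the weights, the sketch below evaluates the exponential weight for matches of different ages. Measuring age in days, the function names (time_weight, half_life), and the example ξ values are assumptions made for this example only.

```python
import math


def time_weight(age, xi):
    """Exponential down-weighting of a match played `age` time units ago."""
    return math.exp(-xi * age)


def half_life(xi):
    """Age at which a match counts half as much as one played today."""
    return math.log(2) / xi


# Hypothetical xi values, with age measured in days.
for xi in (0.001, 0.005, 0.02):
    weights = [round(time_weight(age, xi), 3) for age in (0, 30, 180, 365)]
    print(xi, round(half_life(xi), 1), weights)
```

Thinking in terms of the half-life, ln(2)/ξ, is often easier than reasoning about ξ directly, since it says how old a match must be before it counts half as much as a match played today.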

Expressing the Time Decay Model Mathematically:

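L(α, β, ρ, γ; t) = Π_{k∈K_t} [ τ(x_k, y_k; λ_k, μ_k, ρ) · exp(-λ_k) λ_k^(x_k) / x_k! · exp(-μ_k) μ_k^(y_k) / y_k! ]^φ(t - t_k)

In this equation, t denotes the time at which we are making predictions, K_t represents the set of matches played before time t, t_k is the time at which match k was played, and φ is the weighting function. Note that the weight enters as an exponent on each match's contribution, not as a multiplier, so on the log scale each match's log-probability is simply multiplied by φ(t - t_k). τ is the low-score correction factor, x_k and y_k are the home and away goals in match k, and λ_k and μ_k are the corresponding expected goals. The parameters α (attack), β (defense), γ (home advantage), and ρ keep the same meaning as in the standard Dixon-Coles model.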
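
A common way to work with the time-weighted model is through a weighted log-likelihood, in which each match's log-probability is multiplied by its weight. The sketch below assumes exponential weights and, to stay short, takes each match's expected goals as given rather than deriving them from attack, defense, and home advantage parameters. The tuple layout and the function name weighted_log_likelihood are illustrative assumptions.

```python
import math


def rho_correction(x, y, lambda_x, mu_y, rho):
    """Correction factor for the four low-scoring scorelines."""
    if x == 0 and y == 0:
        return 1 - lambda_x * mu_y * rho
    if x == 0 and y == 1:
        return 1 + lambda_x * rho
    if x == 1 and y == 0:
        return 1 + mu_y * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0


def weighted_log_likelihood(matches, rho, xi):
    """Sum of weight * log-probability over past matches.

    `matches` is an iterable of (x, y, lambda_x, mu_y, age) tuples, where
    `age` is the time elapsed since the match (t - t_k) and both rates
    are strictly positive.
    """
    total = 0.0
    for x, y, lambda_x, mu_y, age in matches:
        tau = rho_correction(x, y, lambda_x, mu_y, rho)
        if tau <= 0:
            # This rho makes a probability non-positive: reject it.
            return float("-inf")
        log_p = (
            math.log(tau)
            + x * math.log(lambda_x) - lambda_x - math.lgamma(x + 1)
            + y * math.log(mu_y) - mu_y - math.lgamma(y + 1)
        )
        total += math.exp(-xi * age) * log_p
    return total


# Two hypothetical matches: one played today, one 100 days ago.
example = [(0, 0, 1.5, 1.1, 0), (2, 1, 1.5, 1.1, 100)]
print(weighted_log_likelihood(example, rho=-0.1, xi=0.005))
```

In a full fit, a numerical optimizer would search over the team parameters and ρ to maximize this quantity for a fixed ξ.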

Determining the Optimal ξ:

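Finding the best value for ξ requires a bit of experimentation. Because ξ sets the weights inside the likelihood, it is not estimated jointly with the other parameters. Instead, one common approach is to fit the model for a range of ξ values and compare the out-of-sample predictions from each fit, using metrics such as the predictive profile log-likelihood (higher is better) or the ranked probability score (lower is better). The ξ value that yields the best predictive accuracy is deemed optimal.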
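
For readers who have not met the ranked probability score, here is a small sketch of how it can be computed for a home/draw/away forecast. The outcomes must be listed in their natural order (home win, draw, away win). The function names and the example forecasts are assumptions for illustration.

```python
def ranked_probability_score(probs, outcome_index):
    """RPS for one match: 0 is a perfect forecast, 1 is the worst possible.

    `probs` are forecast probabilities over ordered outcomes and
    `outcome_index` is the position of the outcome that occurred.
    """
    cumulative_forecast = 0.0
    cumulative_observed = 0.0
    total = 0.0
    # The last cumulative term is always 1 - 1 = 0, so it is skipped.
    for i in range(len(probs) - 1):
        cumulative_forecast += probs[i]
        cumulative_observed += 1.0 if i == outcome_index else 0.0
        total += (cumulative_forecast - cumulative_observed) ** 2
    return total / (len(probs) - 1)


def mean_rps(forecasts, outcomes):
    """Average RPS over a set of held-out matches."""
    scores = [ranked_probability_score(p, o) for p, o in zip(forecasts, outcomes)]
    return sum(scores) / len(scores)


# Hypothetical forecasts (home, draw, away) and observed results.
forecasts = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
outcomes = [0, 1]  # first match: home win, second match: draw
print(mean_rps(forecasts, outcomes))
```

To compare candidate ξ values, one would refit the model for each candidate, forecast matches that were held out of the fit, and keep the ξ with the lowest mean RPS.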

Practical Implementations and Limitations

This blog offers detailed R code examples, while the regista package provides a convenient way to get started with predictions using the model in R.

However, despite its advancements, the Dixon-Coles model does have limitations. One key point is that it relies on the Poisson distribution to model goal scoring, which might not always capture the complexities of real-world football matches. Additionally, the basic model treats a team's attack and defense strengths as constant over the period being fitted. The time weighting only partly relaxes this assumption, since it down-weights old matches rather than modeling how strengths change.

Conclusion: A Powerful Tool, but Not a Crystal Ball

The Dixon-Coles model stands as a powerful tool for predicting football match outcomes. Its bivariate adjustment and time decay component address key weaknesses of the independent Poisson model, leading to more accurate predictions.

However, it's important to remember that it's still a statistical model based on assumptions that might not always perfectly reflect reality. As such, while it can provide valuable insights and inform predictions, it's not a foolproof system for guaranteed betting success.

Source: https://www.ajbuckeconbikesail.net/wkpapers/Airports/MVPoisson/soccer_betting.pdf