How many wins will it take to win the NFC East?
Not bloody many
Intro
tl;dr: after Week 10, a 6- or fewer-win East champ is about as likely as not.
The NFC East is, like, a historically bad division this year.
Just how few wins will be enough to clinch the pod’s reserved playoff spot?
I borrowed FiveThirtyEight’s Elo ratings to model it out.
Methods
Overview
I’ll let FiveThirtyEight handle the math behind their Elo methodology. The basics are:
- The difference between two teams’ ratings maps to a probability each will beat the other.
- Ratings update each week based on each game’s results and each team’s win probability.
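As a minimal sketch of the first bullet, here is the standard Elo expectation formula on the usual 400-point scale. The function name `elo_win_prob` and the example ratings are made up for illustration; they are not part of 538's code or of the simulation below.

```r
# Standard Elo expectation: probability that team A beats team B.
# The 400-point scale matches the 10^(diff / 400) odds used later in this post.
elo_win_prob <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

elo_win_prob(1500, 1500)  # evenly matched teams: 0.5
elo_win_prob(1600, 1500)  # a 100-point favorite wins about 64% of the time
```

The two teams' probabilities always sum to 1, which is why this setup has no room for a tie.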
Starting from the ratings after Week 10, I simulated the rest of the season 10,000 times.
In each simulated week:
- A winner for each game was randomly drawn using the two opponents’ Elo ratings.
- The Elo ratings were updated approximately according to the method 538 described.
After each simulation of the last seven weeks, records in the NFC East were counted up.
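The random draw in each simulated game can be sketched roughly like this. The 100-point Elo gap and the seed are arbitrary assumptions for the example; the weights passed to `sample()` are odds, which it rescales into probabilities on its own.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Odds of a home win per visitor win, from a hypothetical Elo gap of 100 points.
odds_home <- 10^(100 / 400)

# Draw one winner per simulated game; sample() normalizes the weights itself.
draws <- sample(c("Home", "Visitor"), size = 10000, replace = TRUE,
                prob = c(odds_home, 1))

# Share of home wins; should land near odds_home / (odds_home + 1), about 0.64.
mean(draws == "Home")
```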
Limitations
I’m mostly happy with the quality of this simulation approach.
However, there are a few features it has to leave out.
- Elo ratings give only the chance of each team being the winner of each game.
- So things that happen in games other than winning and losing can’t be simulated.
First, I assume no (more) ties will occur.
- Elo doesn’t provide a probability that there is a winner, only that each team will be it.
- So I can’t randomly draw a tie as the outcome of any simulated game.
- NFL ties are rare enough that I don’t think this horribly invalidates the results.
Second, I can only update ratings approximately.
- FiveThirtyEight use an advanced Elo system that updates ratings based on the final score.
- But a pair of ratings doesn’t map to a distribution of scores – only to chances of winning.
- So I can’t simulate each game’s score, and thus I can’t feed it into the update step.
- I simply ignore the adjustment for final score in the update calculation.
Third, I use only the “traditional” ratings provided, not the quarterback-adjusted ones.
- I clearly can’t simulate who will be under center each week for each franchise.
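The simplified update from the second limitation can be sketched as follows. This is the plain Elo update with a fixed K-factor of 20 and no adjustment for the final score; `elo_update` is a hypothetical helper for illustration, not 538's implementation.

```r
k <- 20  # fixed K-factor, the same constant used in the update step of this post

# Plain Elo update: move the rating by K times (actual result - expected result).
elo_update <- function(rating, opponent, won) {
  expected <- 1 / (1 + 10^((opponent - rating) / 400))
  rating + k * (won - expected)
}

elo_update(1500, 1500, won = 1)  # 1510
elo_update(1500, 1500, won = 0)  # 1490
```

Whatever the winner gains, the loser gives up, so the total rating in the league stays constant from week to week.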
Procedure
# load ratings and schedule
ELO <- readxl::read_xlsx(here::here("static", "data", "NFL.xlsx"), "ELO")
ELO <- dplyr::select(ELO, -.data$W, -.data$L, -.data$`T`, -.data$Diff)
ELO <- dplyr::mutate(ELO, bye = 0)
gms <- readxl::read_xlsx(here::here("static", "data", "NFL.xlsx"), "Schedule")
gms <- dplyr::select(gms, -.data$Date, -.data$Time)
gms <- dplyr::group_by(gms, .data$Week)
gms <- dplyr::mutate(gms, game = dplyr::row_number())
gms <- tidyr::gather(gms, "side", "name", .data$Home, .data$Visitor)
gms <- tidyr::nest(gms)
# a function to simulate a week of a season
foo <- function(elo, sch) {
  sch <- dplyr::left_join(elo, sch)
  sch <- dplyr::group_by(sch, .data$game)
  sch <- dplyr::arrange(sch, .data$game, .data$side)
  # there is a 33-point bonus for HFA and a 25-pt bonus for coming off a bye
  sch <- dplyr::mutate(sch, emf = .data$ELO + 33 * (.data$side == "Home") + 25 * .data$bye)
  win <- dplyr::filter(sch, !is.na(.data$game))
  win <- dplyr::summarise(win,
    edf = purrr::reduce(.data$emf, `-`), # difference in Elo after bonuses
    exp = 10^(.data$edf / 400), # expected home wins per visitor win
    win = sample(.data$side, 1, prob = c(.data$exp, 1)) # randomly draw a winner
  )
  sch <- dplyr::left_join(sch, win)
  sch <- dplyr::mutate(sch,
    exp = ifelse(.data$side == "Home", .data$exp, 1), # expected wins per visitor win
    exp = .data$exp / sum(.data$exp), # expected wins per game
    win = as.numeric(.data$win == .data$side), # realized wins
    del = 20 * (.data$win - .data$exp), # calculate Elo change
    del = ifelse(is.na(.data$del), 0, .data$del), # set to 0 for teams on a bye
    ELO = .data$ELO + .data$del, # add up
    w = .data$w + ifelse(is.na(.data$win), 0, .data$win),
    l = .data$l + ifelse(is.na(.data$win), 0, (1 - .data$win)),
    bye = as.numeric(is.na(.data$game)) # set bye flag
  )
  sch <- dplyr::ungroup(sch) # drop the grouping by game
  dplyr::select(sch, .data$ELO:.data$bye)
}
# recursive function to simulate a multiple-week season
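bar <- function(SCH, elo) {
  # base case: no weeks left, so return the final ratings and records
  if (length(SCH) < 1) {
    return(elo)
  }
  # simulate the first remaining week, then recurse on the rest
  baz <- foo(elo, dplyr::first(SCH))
  bar(SCH[-1], baz)
}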
# run ten thousand simulations
sims <- dplyr::tibble(i = 1:10000)
sims <- dplyr::group_by(sims, .data$i)
sims <- dplyr::group_modify(sims, function(.x, .y) {bar(gms$data, ELO)})
# save simulations
saveRDS(sims, here::here("static", "data", "nfl_sims_10_20.rds"))
# reload saved simulations instead of computing them every time I save the post
sims <- readRDS(here::here("static", "data", "nfl_sims_10_20.rds"))
# pick out the NFC East and record the top team's record
sums <- dplyr::filter(sims, .data$Team %in% c("PHI", "NYG", "WAS", "DAL"))
sums <- dplyr::mutate(sums, pct = .data$w + .data$t / 2)
sums <- dplyr::arrange(sums, .data$pct)
sums <- dplyr::summarise(sums,
  W = dplyr::last(.data$w),
  L = dplyr::last(.data$l),
  D = dplyr::last(.data$t),
  P = dplyr::last(.data$pct) / 16,
  X = dplyr::last(.data$Team))
## `summarise()` ungrouping output (override with `.groups` argument)
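A side note on the design: the recursive season driver `bar()` above is a left fold over the list of weeks. Here is a toy sketch of that equivalence using base R's `Reduce()`. The names `step`, `weeks`, `run_recursive`, and `run_fold` are hypothetical stand-ins, and the "state" is just a number rather than a ratings table.

```r
# Hypothetical stand-in for a one-week simulator: takes the current state and
# one week's schedule, and returns the updated state.
step <- function(state, week) state + week

weeks <- list(1, 2, 3)  # placeholder schedule, one element per week

# Recursive driver with the same shape as bar() above.
run_recursive <- function(remaining, state) {
  if (length(remaining) < 1) return(state)
  run_recursive(remaining[-1], step(state, remaining[[1]]))
}

# The same computation written as a left fold with an initial state.
run_fold <- function(all_weeks, state) {
  Reduce(step, all_weeks, state)
}

run_recursive(weeks, 0)  # 6
run_fold(weeks, 0)       # 6
```

For a seven-week schedule the recursion depth is no concern, so either form would do; the fold just makes the "carry the state forward one week at a time" idea explicit.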
Results
How many champs had at most a given number of wins, or at least a given number of losses?
sams <- table(sums$W)
sems <- table(sums$L)
sems <- rev(sems)
cumsum(sams)
##     4     5     6     7     8     9    10
##   102  1552  5305  8437  9706  9973 10000
cumsum(sems)
##    12    11    10     9     8     7     6     5
##    14   559  3049  6473  8812  9736  9973 10000
- In 53% of simulations, the division winner had 6 or fewer wins.
- In 16% of simulations, the division winner had 5 or fewer wins!
- In 30% of simulations, the division winner had 10 or more losses.
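The percentages in these bullets are just the cumulative counts divided by the number of simulations. A quick check, with the counts copied by hand from the output above (the variable names are only for this example):

```r
n_sims <- 10000

# Cumulative champion counts copied from the cumsum() output above.
wins_at_most    <- c(`4` = 102, `5` = 1552, `6` = 5305)
losses_at_least <- c(`12` = 14, `11` = 559, `10` = 3049)

round(100 * wins_at_most / n_sims)     # 1, 16, 53
round(100 * losses_at_least / n_sims)  # 0, 6, 30
```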
Note that these results credit the Eagles with the tie they already have.
- That means I can pick out the Eagles’ chances of winning in particular.
- I can’t do this for other teams because I haven’t counted up tiebreakers.
baz <- addmargins(table(sums$X == "PHI", sums$W), 2)
baz
##
##            4    5    6    7    8    9   10  Sum
##   FALSE   14  457 1497 1168  375   30    0 3541
##   TRUE    88  993 2256 1964  894  237   27 6459
- The Eagles appear about 65% to win the division.
- (Note, 538 currently says 59%. This may point to some imprecision in my simulations!)
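One rough way to gauge how much of that gap could be plain simulation noise is a binomial standard error. This sketch assumes the 10,000 simulations are independent draws, and it says nothing about error in the model itself.

```r
n_sims <- 10000
p_phi  <- 6459 / n_sims  # share of simulations in which the Eagles won the division

# Binomial standard error of a proportion estimated from n_sims independent draws.
se <- sqrt(p_phi * (1 - p_phi) / n_sims)
se  # about 0.0048, i.e. roughly half a percentage point

# Gap to the 59% quoted above, measured in standard errors.
(p_phi - 0.59) / se
```

On that reading, the difference is much larger than sampling noise alone would produce, so it more likely comes from the simplifications listed under Limitations than from running too few simulations.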
print(t(t(baz) / rowSums(t(baz))), digits = 4)
##
##              4      5      6      7      8      9     10    Sum
##   FALSE 0.1373 0.3152 0.3989 0.3729 0.2955 0.1124 0.0000 0.3541
##   TRUE  0.8627 0.6848 0.6011 0.6271 0.7045 0.8876 1.0000 0.6459
- At any given number of wins, the Eagles are the most likely champion, due to their tie.