How many wins will it take to win the NFC East?

Not bloody many

Last updated on Nov 6, 2021

Intro

tl;dr: after Week 10, a 6- or fewer-win East champ is about as likely as not.

The NFC East is, like, a historically bad division this year.

Just how few wins will be enough to clinch the pod’s reserved playoff spot?

I borrowed FiveThirtyEight’s Elo ratings to model it out.

Methods

Overview

I’ll let FiveThirtyEight handle the math behind their Elo methodology. The basics are:

The difference between two teams’ ratings maps to a probability each will beat the other.
Ratings update each week based on each game’s results and each team’s win probability.
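
Here is a minimal sketch of those two basics. It assumes the standard Elo formulas (a 400-point logistic scale) and the K-factor of 20 that shows up in my procedure code; the names `elo_expected` and `elo_update` are mine, not FiveThirtyEight's.

```r
# Chance that team A beats team B, given their ratings (standard Elo curve).
elo_expected <- function(elo_a, elo_b) {
  1 / (1 + 10^((elo_b - elo_a) / 400))
}

# New rating after a game: move toward the result by K times the surprise.
# `won` is 1 for a win and 0 for a loss; k = 20 matches the code in this post.
elo_update <- function(elo, won, expected, k = 20) {
  elo + k * (won - expected)
}

p <- elo_expected(1550, 1450)             # a 100-point favorite
p                                         # about 0.64
elo_update(1550, won = 1, expected = p)   # the favorite gains about 7 points
elo_update(1550, won = 0, expected = p)   # but loses about 13 if upset
```

The asymmetry in the last two lines is the whole point: an expected win moves a rating less than an upset does.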

Starting from the ratings after Week 10, I simulated the rest of the season 10,000 times.

In each simulated week:

A winner for each game was randomly drawn using the two opponents’ Elo ratings.
The Elo ratings were updated approximately according to the method 538 described.
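
One simulated game might look like the sketch below. It assumes the 33-point home-field bonus from my procedure code and leaves out the bye bonus to stay short; `sim_game` and the example ratings are made up for illustration.

```r
# Simulate one game: draw a winner from the Elo win probability, then update.
sim_game <- function(elo_home, elo_away, hfa = 33, k = 20) {
  # home win probability after adding the home-field bonus
  p_home <- 1 / (1 + 10^(-(elo_home + hfa - elo_away) / 400))
  home_won <- as.numeric(runif(1) < p_home)   # random draw of the winner
  delta <- k * (home_won - p_home)            # rating points changing hands
  c(home = elo_home + delta, away = elo_away - delta, home_won = home_won)
}

set.seed(1)
sim_game(1500, 1500)

# Over many draws, the home team should win about as often as p_home says.
wins <- replicate(10000, sim_game(1500, 1500)[["home_won"]])
mean(wins)   # close to 0.547 for evenly rated teams
```

Whatever the home team gains, the visitor loses, so the two ratings always add up to the same total.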

After each simulation of the last seven weeks, records in the NFC East were counted up.

Limitations

I’m mostly happy with the quality of this simulation approach.

However, there are a couple features it has to leave out.

Elo ratings give only the chance of each team being the winner of each game.
So things that happen in games other than winning and losing can’t be simulated.

First, I assume no (more) ties will occur.

Elo gives the probability that each team wins, but not the probability that a game has a winner at all.
So I can’t randomly draw a tie as the outcome of any simulated game.
NFL ties are rare enough that I don’t think this horribly invalidates the results.

Second, I can only update ratings approximately.

FiveThirtyEight use an advanced Elo system that updates ratings based on the final score.
But a pair of ratings doesn’t map to a distribution of scores – only to chances of winning.
So I can’t simulate each game’s score, and thus I can’t feed it into the update step.
I simply ignore the adjustment for final score in the update calculation.

Third, I use only the “traditional” ratings provided, not the quarterback-adjusted ones.

I clearly can’t simulate who will be under center each week for each franchise.

Procedure

# load ratings and schedule
ELO <- readxl::read_xlsx(here::here("static", "data", "NFL.xlsx"), "ELO")
ELO <- dplyr::select(ELO, -.data$W, -.data$L, -.data$`T`, -.data$Diff)
ELO <- dplyr::mutate(ELO, bye = 0)

gms <- readxl::read_xlsx(here::here("static", "data", "NFL.xlsx"), "Schedule")
gms <- dplyr::select(gms, -.data$Date, -.data$Time)

gms <- dplyr::group_by(gms, .data$Week)
gms <- dplyr::mutate(gms, game = dplyr::row_number())
gms <- tidyr::gather(gms, "side", "name", .data$Home, .data$Visitor)
gms <- tidyr::nest(gms)

# a function to simulate a week of a season
foo <- function(elo, sch) {
  sch <- dplyr::left_join(elo, sch)
  sch <- dplyr::group_by(sch, .data$game)
  sch <- dplyr::arrange(sch, .data$game, .data$side)
  
  # there is a 33-point bonus for home-field advantage (HFA) and a 25-point bonus for coming off a bye
  sch <- dplyr::mutate(sch, emf = .data$ELO + 33 * (.data$side == "Home") + 25 * .data$bye)
  
  win <- dplyr::filter(sch, !is.na(.data$game))
  win <- dplyr::summarise(win, 
    edf = purrr::reduce(.data$emf, `-`),                # difference in Elo after bonuses
    exp = 10^(.data$edf / 400),                         # expected home wins per visitor win
    win = sample(.data$side, 1, prob = c(.data$exp, 1)) # randomly draw a winner
  )
  
  sch <- dplyr::left_join(sch, win)
  sch <- dplyr::mutate(sch, 
          exp = ifelse(.data$side == "Home", .data$exp, 1), # expected wins per visitor win
          exp = .data$exp / sum(.data$exp),                 # expected wins per game
          win = as.numeric(.data$win == .data$side),        # realized wins
          del = 20 * (.data$win - .data$exp),               # calculate Elo change
          del = ifelse(is.na(.data$del), 0, .data$del),     # set to 0 for teams on a bye
          ELO = .data$ELO + .data$del,                      # add up
          w   = .data$w   + ifelse(is.na(.data$win), 0, .data$win),
          l   = .data$l   + ifelse(is.na(.data$win), 0, (1 - .data$win)),
          bye = as.numeric(is.na(.data$game))               # set bye flag
        )
  
  sch <- dplyr::group_by(sch)
  dplyr::select(sch, .data$ELO:.data$bye)
}

# recursive function to simulate a multiple-week season
bar <- function(SCH, elo) {
  if(length(SCH) < 1) {return(elo)}
  
  baz <- foo(elo, dplyr::first(SCH))
  
  bar(SCH[-1], baz)
}

# run ten thousand simulations
sims <- dplyr::tibble(i = 1:10000)
sims <- dplyr::group_by(sims, .data$i)
sims <- dplyr::group_modify(sims, function(.x, .y) {bar(gms$data, ELO)})

# save simulations
saveRDS(sims, here::here("static", "data", "nfl_sims_10_20.rds"))

# reload saved simulations instead of computing them every time I save the post
sims <- readRDS(here::here("static", "data", "nfl_sims_10_20.rds"))

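# pick out the NFC East and record the top team's record
sums <- dplyr::filter(sims, .data$Team %in% c("PHI", "NYG", "WAS", "DAL"))
# pct is wins plus half of ties; it is divided by 16 games below to get a win percentage
sums <- dplyr::mutate(sums, pct = .data$w + .data$t / 2)
# sort ascending so that last() within each simulation picks the division leader
sums <- dplyr::arrange(sums, .data$pct)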
sums <- dplyr::summarise(sums,
                         W = dplyr::last(.data$w), 
                         L = dplyr::last(.data$l),
                         D = dplyr::last(.data$t),
                         P = dplyr::last(.data$pct) / 16,
                         X = dplyr::last(.data$Team))

## `summarise()` ungrouping output (override with `.groups` argument)
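
A side note on the recursive `bar()` in the procedure: it is really a fold over the list of weeks, with the ratings table as the running state. The sketch below shows the same pattern with a toy step function standing in for `foo()`; `step`, `season_recursive` and `season_fold` are hypothetical names, and the numbers mean nothing.

```r
# Toy stand-in for a one-week simulator: takes the current state and one
# week's input, returns the new state. Order matters, as it does with Elo.
step <- function(state, week) state * 2 + week

# Recursive version, mirroring the structure used in this post.
season_recursive <- function(weeks, state) {
  if (length(weeks) < 1) return(state)
  season_recursive(weeks[-1], step(state, weeks[[1]]))
}

# Equivalent fold: Reduce() threads the state through each week in order.
season_fold <- function(weeks, state) {
  Reduce(step, weeks, state)
}

weeks <- list(1, 2, 3, 4, 5, 6, 7)
identical(season_recursive(weeks, 0), season_fold(weeks, 0))
```

In this post's terms, something like `purrr::reduce(gms$data, foo, .init = ELO)` should do the same job as `bar(gms$data, ELO)`, though I haven't rerun the simulations that way.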

Results

How many simulated champs finished with at most a given number of wins, or at least a given number of losses?

sams <- table(sums$W)
sems <- table(sums$L)
sems <- rev(sems)

cumsum(sams)

##     4     5     6     7     8     9    10 
##   102  1552  5305  8437  9706  9973 10000

cumsum(sems)

##    12    11    10     9     8     7     6     5 
##    14   559  3049  6473  8812  9736  9973 10000

In 53% of simulations, the division winner had 6 or fewer wins.
In 16% of simulations, the division winner had 5 or fewer wins!
In 30% of simulations, the division winner had 10 or more losses.
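
Those percentages are just the cumulative counts divided by the number of simulations. A quick check, with the counts copied by hand from the output above (the variable names are only for this sketch):

```r
# Cumulative counts copied from the output above (10,000 simulations).
cum_wins   <- c(`4` = 102, `5` = 1552, `6` = 5305, `7` = 8437,
                `8` = 9706, `9` = 9973, `10` = 10000)
cum_losses <- c(`12` = 14, `11` = 559, `10` = 3049, `9` = 6473,
                `8` = 8812, `7` = 9736, `6` = 9973, `5` = 10000)
n_sims <- 10000

round(100 * cum_wins[c("5", "6")] / n_sims)   # 16 and 53
round(100 * cum_losses["10"] / n_sims)        # 30
```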

Note that these results credit the Eagles with the tie they already have.

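Because the Eagles are the only NFC East team with a tie, and I simulate no further ties, they can never finish level with a division rival.
That means I can pick out the Eagles’ chances of winning in particular.
I can’t do this for other teams because I haven’t counted up tiebreakers.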

baz <- addmargins(table(sums$X == "PHI", sums$W), 2)

baz

##        
##            4    5    6    7    8    9   10  Sum
##   FALSE   14  457 1497 1168  375   30    0 3541
##   TRUE    88  993 2256 1964  894  237   27 6459

The Eagles appear about 65% to win the division.
(Note, 538 currently says 59%. This may point to some imprecision in my simulations!)
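
How much of that gap could be plain simulation noise? Here is a rough check that treats the 10,000 simulations as independent draws and uses a binomial standard error. That framing is my assumption for this sketch, not anything from 538's write-up.

```r
# Share of simulations in which the Eagles won the division (from the table).
n_sims <- 10000
p_hat  <- 6459 / n_sims

# Binomial standard error of a simulated proportion.
se <- sqrt(p_hat * (1 - p_hat) / n_sims)
se

# Rough 95% interval for the simulated probability.
p_hat + c(-1.96, 1.96) * se
```

The interval is only about a percentage point wide on either side and sits well above 59%, so the gap looks more like a modeling difference (such as the simplifications listed under Limitations) than run-to-run noise.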

print(t(t(baz) / rowSums(t(baz))), digits = 4)

##        
##              4      5      6      7      8      9     10    Sum
##   FALSE 0.1373 0.3152 0.3989 0.3729 0.2955 0.1124 0.0000 0.3541
##   TRUE  0.8627 0.6848 0.6011 0.6271 0.7045 0.8876 1.0000 0.6459

At any given number of wins, the Eagles are the most likely champion, due to their tie.
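
For what it's worth, the double-transpose trick above can probably be swapped for `prop.table()`. The sketch below rebuilds the counts by hand from the printed table instead of reusing `baz`, so the matrix and the names `counts` and `shares` are just for illustration.

```r
# Counts copied from the table above: rows are "Eagles won the division?",
# columns are the champion's win total.
counts <- matrix(c(14, 457, 1497, 1168, 375, 30,  0,
                   88, 993, 2256, 1964, 894, 237, 27),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("FALSE", "TRUE"), as.character(4:10)))

# Add a Sum column, then turn each column into shares that add up to 1.
with_sum <- cbind(counts, Sum = rowSums(counts))
shares   <- prop.table(with_sum, margin = 2)
round(shares, 4)
```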
