To cite the hyper2 package in publications, please use
Hankin (2017). This script creates and analyses hyper2 object T20
which is a likelihood function for the strengths of the competitors in
the Indian Premier League. Dataframe T20_table has one row for each
T20 IPL match in the period 2008-2017 with the exception of three
no-result matches and seven tied matches, which were removed.
library("hyper2",quietly=TRUE)
library("magrittr",quietly=TRUE)
T20_table <- read.table("T20.txt",header=TRUE)
head(T20_table)
## team1 team2 toss_winner toss_decision match_winner
## 1 SH RCB RCB field SH
## 2 MI RPS RPS field RPS
## 3 GL KKR KKR field KKR
## 4 RPS KXIP KXIP field KXIP
## 5 RCB DD RCB bat RCB
## 6 GL SH SH field SH
nrow(T20_table)
## [1] 633
Object T20 is a likelihood function for the strengths of the 13
teams, and T20_toss is a likelihood function that also includes a
toss strength term. Some details are given in the package help file
T20.Rd (type ?T20 at the R prompt).
What is the probability of a team winning the match, given that they won the toss?
x <- table(T20_table$toss_winner==T20_table$match_winner)
x
##
## FALSE TRUE
## 308 325
binom.test(x,alternative="less")$p.value
## FALSE
## 0.26242
Binomial test against a null of \(p=0.5\) is thus not significant: there is no evidence that winning the toss increases one’s probability of winning the match. Now we consider the decision (to bat first or to field first) made by the toss winner:
table(T20_table$toss_decision)
##
## bat field
## 272 361
a clear majority elect to field first. We can now ask what the probability of the batting team winning the match is:
attach(T20_table)
bat_first_wins <-
((toss_winner==match_winner)&(toss_decision=='bat' )) |
((toss_winner!=match_winner)&(toss_decision=='field'))
sum(bat_first_wins)
## [1] 284
field_first_wins <-
((toss_winner==match_winner)&(toss_decision=='field' )) |
((toss_winner!=match_winner)&(toss_decision=='bat'))
sum(field_first_wins)
## [1] 349
detach(T20_table)
as a consistency check we have
sum(field_first_wins) + sum(bat_first_wins)
## [1] 633
nrow(T20_table)
## [1] 633
agreeing as expected.
We can do some contingency tables:
M <- table(
decision = T20_table$toss_decision,
won_toss_won_match = T20_table$toss_winner==T20_table$match_winner)
dimnames(M) <- list(
decision=c('bat first','field first'),
won_match =c(FALSE,TRUE))
M
## won_match
## decision FALSE TRUE
## bat first 148 124
## field first 160 201
We may reject homogeneity of proportion (Fisher’s exact test, \(p= 0.013\)). Recalling that a slight majority of toss winners elect to field first (361 out of 633), we see that this is a consistent result.
We can now consider the matches as Bernoulli trials and try to infer the teams’ strengths. The first step is to calculate a likelihood function for the strengths:
attach(lapply(T20_table[, ], as.vector)) # vector access to team1, etc
T20 <- hyper2()
for(i in seq_along(team1)){
T20[match_winner[i]] %<>% inc
T20[c(team1[i],team2[i])] %<>% dec
}
T20
## log(CSK^79 * (CSK + DC)^-10 * (CSK + DD)^-16 * (CSK + KKR)^-16 * (CSK + KTK)^-2 * (CSK +
## KXIP)^-17 * (CSK + MI)^-22 * (CSK + PW)^-6 * (CSK + RCB)^-19 * (CSK + RR)^-17 * (CSK +
## SH)^-6 * DC^29 * (DC + DD)^-11 * (DC + KKR)^-9 * (DC + KTK)^-1 * (DC + KXIP)^-10 * (DC +
## MI)^-10 * (DC + PW)^-4 * (DC + RCB)^-11 * (DC + RR)^-9 * DD^62 * (DD + GL)^-4 * (DD +
## KKR)^-19 * (DD + KTK)^-2 * (DD + KXIP)^-20 * (DD + MI)^-20 * (DD + PW)^-5 * (DD +
## RCB)^-18 * (DD + RPS)^-4 * (DD + RR)^-16 * (DD + SH)^-10 * GL^13 * (GL + KKR)^-4 * (GL +
## KXIP)^-4 * (GL + MI)^-4 * (GL + RCB)^-5 * (GL + RPS)^-4 * (GL + SH)^-5 * KKR^77 * (KKR +
## KTK)^-2 * (KKR + KXIP)^-21 * (KKR + MI)^-21 * (KKR + PW)^-5 * (KKR + RCB)^-20 * (KKR +
## RPS)^-4 * (KKR + RR)^-15 * (KKR + SH)^-12 * KTK^6 * (KTK + KXIP)^-1 * (KTK + MI)^-1 *
## (KTK + PW)^-1 * (KTK + RCB)^-2 * (KTK + RR)^-2 * KXIP^70 * (KXIP + MI)^-20 * (KXIP +
## PW)^-6 * (KXIP + RCB)^-20 * (KXIP + RPS)^-4 * (KXIP + RR)^-15 * (KXIP + SH)^-10 * MI^92 *
## (MI + PW)^-6 * (MI + RCB)^-21 * (MI + RPS)^-6 * (MI + RR)^-16 * (MI + SH)^-10 * PW^12 *
## (PW + RCB)^-5 * (PW + RR)^-5 * (PW + SH)^-2 * RCB^73 * (RCB + RPS)^-4 * (RCB + RR)^-15 *
## (RCB + SH)^-10 * RPS^15 * (RPS + SH)^-4 * RR^63 * (RR + SH)^-7 * SH^42)
and maximization is straightforward:
T20_maxp <- maxp(T20)
dotchart(T20_maxp)
So we can test the null that all the teams have the same strength:
a1 <- maxp(T20,give=TRUE)$value # likelihood at the evaluate
a2 <- loglik(indep(equalp(T20)),T20) # likelihood at p1=p2=...=p14
a1-a2
## [1] 14.491
and the above would correspond to a support \(\Lambda\) of about 14.5, or a likelihood ratio of about \(e^\Lambda\simeq 2\times 10^6\). This is not significant by Edwards’s criterion of two units of support per degree of freedom (we have 13df here). Alternatively, we might observe that the statistic \(2\Lambda\) is in the tail region of its asymptotic distribution \(\chi^2_{13}\) given by Wilks, with a p-value of
pchisq(2*14.5,df=13,lower.tail=FALSE)
## [1] 0.0065459
showing a significant result although frankly Edwards’s criterion is
somewhat flaky in this context. Wilks’s theorem is only asymptotic
and it is not clear that the actual sampling distribution is close to
its asymptotic limit of \(\chi^2\). However, it turns out that the
asymptotic approximation is good in this particular case:
inst/T20_analysis.Rmd presents some analysis (it takes a very
long time to run which is why it is not included in this vignette).
We can now attempt to see if the toss makes a difference:
T20_toss <- hyper2(pnames=c('toss',levels(T20_table$team1)))
for(i in seq_along(team1)){
players <- c(team1[i],team2[i])
if(toss_winner[i] == match_winner[i]){ # win toss, win match
T20_toss[c('toss',match_winner[i])] %<>% inc
} else { # win toss, lose match
T20_toss[match_winner[i]] %<>% inc
}
T20_toss[c('toss',players)] %<>% dec
T20_toss
}
and again we can maximize the likelihood:
T20_maxp_toss <- maxp(T20_toss)
dotchart(T20_maxp_toss)
Above, note the small estimated strength of the toss.
The following lines create T20.rda, residing in the data/
directory of the package.
save(T20_table,T20,T20_maxp,T20_toss,file="T20.rda")
hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.