To cite the hyper2 package in publications, please use Hankin (2017). This script creates and analyses hyper2 object T20 which is a likelihood function for the strengths of the competitors in the Indian Premier League. Dataframe T20_table has one row for each T20 IPL match in the period 2008-2017 with the exception of three no-result matches and seven tied matches, which were removed.

library("hyper2",quietly=TRUE)
library("magrittr",quietly=TRUE)
T20_table <- read.table("T20.txt",header=TRUE)
head(T20_table)

##   team1 team2 toss_winner toss_decision match_winner
## 1    SH   RCB         RCB         field           SH
## 2    MI   RPS         RPS         field          RPS
## 3    GL   KKR         KKR         field          KKR
## 4   RPS  KXIP        KXIP         field         KXIP
## 5   RCB    DD         RCB           bat          RCB
## 6    GL    SH          SH         field           SH

nrow(T20_table)

## [1] 633

Object T20 is a likelihood function for the strengths of the 13 teams, and T20_toss is a likelihood function that also includes a toss strength term. Some details are given in the package help file T20.Rd (type ?T20 at the R prompt).

0.1 One-way univariate analysis

What is the probability of a team winning the match, given that they won the toss?

x <- table(T20_table$toss_winner==T20_table$match_winner)
x

## 
## FALSE  TRUE 
##   308   325

binom.test(x,alternative="less")$p.value

##   FALSE 
## 0.26242

Binomial test against a null of \(p=0.5\) is thus not significant: there is no evidence that winning the toss increases one’s probability of winning the match. Now we consider the decision (to bat first or to field first) made by the toss winner:

table(T20_table$toss_decision)

## 
##   bat field 
##   272   361

a clear majority elect to field first. We can now ask what the probability of the batting team winning the match is:

attach(T20_table)
bat_first_wins <- 
((toss_winner==match_winner)&(toss_decision=='bat'  )) | 
((toss_winner!=match_winner)&(toss_decision=='field'))
sum(bat_first_wins)

## [1] 284

field_first_wins <- 
((toss_winner==match_winner)&(toss_decision=='field'  )) | 
((toss_winner!=match_winner)&(toss_decision=='bat'))
sum(field_first_wins)

## [1] 349

detach(T20_table)

as a consistency check we have

sum(field_first_wins) + sum(bat_first_wins)

## [1] 633

nrow(T20_table)

## [1] 633

agreeing as expected.

0.2 Two-way contingency tables

We can do some contingency tables:

M <- table(
decision           = T20_table$toss_decision,
won_toss_won_match = T20_table$toss_winner==T20_table$match_winner)
dimnames(M) <- list(
decision=c('bat first','field first'),
won_match =c(FALSE,TRUE))
M

##              won_match
## decision      FALSE TRUE
##   bat first     148  124
##   field first   160  201

We may reject homogeneity of proportion (Fisher’s exact test, \(p= 0.013\)). Recalling that a slight majority of toss winners elect to field first (361 out of 633), we see that this is a consistent result.

1 Strengths of the individual teams

We can now consider the matches as Bernoulli trials and try to infer the teams’ strengths. The first step is to calculate a likelihood function for the strengths:

attach(lapply(T20_table[, ], as.vector))  # vector access to team1, etc
T20 <- hyper2()
for(i in seq_along(team1)){
   T20[match_winner[i]] %<>% inc
   T20[c(team1[i],team2[i])] %<>% dec
                          
}              
T20

## log(CSK^79 * (CSK + DC)^-10 * (CSK + DD)^-16 * (CSK + KKR)^-16 * (CSK + KTK)^-2 * (CSK +
## KXIP)^-17 * (CSK + MI)^-22 * (CSK + PW)^-6 * (CSK + RCB)^-19 * (CSK + RR)^-17 * (CSK +
## SH)^-6 * DC^29 * (DC + DD)^-11 * (DC + KKR)^-9 * (DC + KTK)^-1 * (DC + KXIP)^-10 * (DC +
## MI)^-10 * (DC + PW)^-4 * (DC + RCB)^-11 * (DC + RR)^-9 * DD^62 * (DD + GL)^-4 * (DD +
## KKR)^-19 * (DD + KTK)^-2 * (DD + KXIP)^-20 * (DD + MI)^-20 * (DD + PW)^-5 * (DD +
## RCB)^-18 * (DD + RPS)^-4 * (DD + RR)^-16 * (DD + SH)^-10 * GL^13 * (GL + KKR)^-4 * (GL +
## KXIP)^-4 * (GL + MI)^-4 * (GL + RCB)^-5 * (GL + RPS)^-4 * (GL + SH)^-5 * KKR^77 * (KKR +
## KTK)^-2 * (KKR + KXIP)^-21 * (KKR + MI)^-21 * (KKR + PW)^-5 * (KKR + RCB)^-20 * (KKR +
## RPS)^-4 * (KKR + RR)^-15 * (KKR + SH)^-12 * KTK^6 * (KTK + KXIP)^-1 * (KTK + MI)^-1 *
## (KTK + PW)^-1 * (KTK + RCB)^-2 * (KTK + RR)^-2 * KXIP^70 * (KXIP + MI)^-20 * (KXIP +
## PW)^-6 * (KXIP + RCB)^-20 * (KXIP + RPS)^-4 * (KXIP + RR)^-15 * (KXIP + SH)^-10 * MI^92 *
## (MI + PW)^-6 * (MI + RCB)^-21 * (MI + RPS)^-6 * (MI + RR)^-16 * (MI + SH)^-10 * PW^12 *
## (PW + RCB)^-5 * (PW + RR)^-5 * (PW + SH)^-2 * RCB^73 * (RCB + RPS)^-4 * (RCB + RR)^-15 *
## (RCB + SH)^-10 * RPS^15 * (RPS + SH)^-4 * RR^63 * (RR + SH)^-7 * SH^42)

and maximization is straightforward:

T20_maxp <- maxp(T20)
dotchart(T20_maxp)

So we can test the null that all the teams have the same strength:

a1 <- maxp(T20,give=TRUE)$value        # likelihood at the evaluate
a2 <- loglik(indep(equalp(T20)),T20)  # likelihood at p1=p2=...=p14
a1-a2

## [1] 14.491

and the above would correspond to a support \(\Lambda\) of about 14.5, or a likelihood ratio of about \(e^\Lambda\simeq 2\times 10^6\). This is not significant by Edwards’s criterion of two units of support per degree of freedom (we have 13df here). Alternatively, we might observe that the statistic \(2\Lambda\) is in the tail region of its asymptotic distribution \(\chi^2_{13}\) given by Wilks, with a p-value of

pchisq(2*14.5,df=13,lower.tail=FALSE)

## [1] 0.0065459

showing a significant result although frankly Edwards’s criterion is somewhat flaky in this context. Wilks’s theorem is only asymptotic and it is not clear that the actual sampling distribution is close to its asymptotic limit of \(\chi^2\). However, it turns out that the asymptotic approximation is good in this particular case: inst/T20_analysis.Rmd presents some analysis (it takes a very long time to run which is why it is not included in this vignette).

2 The effect of the toss

We can now attempt to see if the toss makes a difference:

T20_toss  <- hyper2(pnames=c('toss',levels(T20_table$team1)))
for(i in seq_along(team1)){
      players <- c(team1[i],team2[i])
      if(toss_winner[i] == match_winner[i]){ # win toss, win match
         T20_toss[c('toss',match_winner[i])] %<>% inc
      } else { # win toss, lose match
         T20_toss[match_winner[i]] %<>% inc
      }          
   T20_toss[c('toss',players)] %<>% dec
   T20_toss
}

and again we can maximize the likelihood:

T20_maxp_toss <- maxp(T20_toss)
dotchart(T20_maxp_toss)

Above, note the small estimated strength of the toss.

Package dataset

The following lines create T20.rda, residing in the data/ directory of the package.

save(T20_table,T20,T20_maxp,T20_toss,file="T20.rda")

References

Hankin, R. K. S. 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.