(takes about ten minutes to run without cache)

Abstract

To cite the hyper2 package in publications, please use Hankin (2017). Here I use the hyper2 package to quantify the strength of a circular rock-paper-scissors effect using the hyper2 package, improving on previous results (West and Hankin 2008) which merely established its existence. I use two datasets, a three-player scoreline and a synthetic dataset referring to wear on a seven-tooth gear wheel.

1 Introduction

Consider the following dataset:

chess_table <- as.matrix(read.table("chess.txt"))
dimnames(chess_table) <- list(
    match=c("T_vs_A","A_vs_K","T_vs_K"),
    player=c("Topalov","Anand","Karpov"))
chess_table

##         player
## match    Topalov Anand Karpov
##   T_vs_A      22    13     NA
##   A_vs_K      NA    23     12
##   T_vs_K       8    NA     10

This is documented in chess.Rd and shows that Topalov played Anand \(22+13=35\) times, winning 22 and losing 13 matches. The corresponding likelihood function is chess:

chess <- saffy(chess_table)
chess

## log(Anand^36 * (Anand + Karpov)^-35 * (Anand + Topalov)^-35 * Karpov^22 * (Karpov +
## Topalov)^-18 * Topalov^30)

chess_maxp <- maxp(chess)
chess_maxp

## Topalov   Anand  Karpov 
## 0.40361 0.34052 0.25587

(we refer to this as the “TAK” model). We might ask whether there is evidence for the players having differing strengths:

equalp.test(chess)

## 
##  Constrained support maximization
## 
## data:  chess
## null hypothesis: Topalov = Anand = Karpov
## null estimate:
## Topalov   Anand  Karpov 
## 0.33333 0.33333 0.33333 
## (argmax, constrained optimization)
## Support for null:  -60.997 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
## Topalov   Anand  Karpov 
## 0.40361 0.34052 0.25587 
## (argmax, free optimization)
## Support for alternative:  -60.062 + K
## 
## degrees of freedom: 2
## support difference = 0.93521
## p-value: 0.3925

Thus we fail to reject \(H_0\colon p_T=p_A=p_K\) under T-A-K as the p-value is about 0.39.

2 RPS in chess

The existence of a rock-paper-scissors relationship between the players was discussed in Hankin (2010) and the null of no RPS phenomena could be rejected using the Alymer test.

Here we improve on that work by introducing a monster, M, that accounts for the rock-paper-scissors nature of the players. The RPS monster helps players that benefit from the RPS nature of the result. The simplest nontrivial probability model is that all three real players have the same strength: \(p_T=p_A=p_K\), which we write \(T=A=K\). Under \(T=A=K\), writing \(p_T=p_A=p_K=\alpha\) we can use a binomial test to calculate the p-value for the hypothesis that \(p_M=0\):

binom.test(sum(diag(chess_table)),sum(chess_table,na.rm=TRUE),alternative="greater")

## 
##  Exact binomial test
## 
## data:  sum(diag(chess_table)) and sum(chess_table, na.rm = TRUE)
## number of successes = 55, number of trials = 88, p-value = 0.012
## alternative hypothesis: true probability of success is greater than 0.5
## 95 percent confidence interval:
##  0.53218 1.00000
## sample estimates:
## probability of success 
##                  0.625

Above we see a p-value of about 1.2%, sufficient to reject the null of zero monster strength¹.

Estimating \(M\), the strength of the monster, is also straightforward. First we consider the case where the probability of a procyclic win is known to be \(p\) (a “procyclic win” is a win that would be expected under a strong monster).

\[ \operatorname{Prob}(\mathrm{procyclic}\ \mathrm{win})=\frac{\alpha+M}{2\alpha+M} \]

Here we also have \(3\alpha+M=1\), so to estimate the strength of the monster we would solve

\[ \frac{\alpha+M}{2\alpha+M}=p,\qquad{3\alpha + M}=1 \]

for \(\alpha\) and \(M\), where \(p\) is the probability of a procyclic win. This gives us

\[ \alpha=\frac{1-p}{2-p},\qquad M=\frac{2p-1}{2-p}. \]

If we use the maximum likelihood estimate value for a procyclic win of \(\frac{55}{88}=0.625\) we find that \(\hat{M}=0.18\) approximately. Further, we may express \(p\), the probability of a pro-cyclic win, as \(p=\frac{1+2M}{2+M}\). If we observe \(a\) procyclic wins and \(b\) countercyclic wins a likelihood function for \(M\) would then be

\[ \mathcal{L}(M)= p^a(1-p)^b=\left(\frac{1+2M}{2+M}\right)^a\cdot \left(\frac{1-M}{2+M}\right)^b \]

and here we have \(a=22+23+10=55\) and \(b=13+12+8=33\) (a graph is given below). It is straightforward but not trivial to translate this into hyper3 idiom. Noting that \(3\alpha+M=1\) we define \(\beta=3\alpha\) and then the unit sum constraint is \(\beta+M=1\) and our probabilities are

\[ \frac{\alpha+M;\alpha}{2\alpha+M } = \frac{\beta/3+M;\beta/3}{2\beta/3+M} = \frac{1+2M;1-M}{2+M} \]

(where the last expression is obtained using \(\beta+M=1\)). We may set up a log-likelihood function of \(a,b\) using hyper3 idiom as follows:

TqAqK <- function(a,b){
  H3 <- hyper3(pnames=c("beta","M"))
  H3[c(beta=1/3,M=1)] %<>% inc(a  )
  H3[c(beta=1/3    )] %<>% inc(  b)
  H3[c(beta=2/3,M=1)] %<>% dec(a+b)
  return(H3)
}
H3 <- TqAqK(55,33)
loglik(c(M=0.1,beta=0.9),H3)

## [1] -58.74

loglik(c(beta=0.9,M=0.1),H3)  #  NB: identical!

## [1] -58.74

[above, we note that \(\beta+M=3\alpha+M=1\), so the monster is the fillup value with respect to \(\beta\)]. This is easy to maximize:

optimize(function(M){loglik(c(M=M,beta=1-M),H3)},c(0.01,0.99),maximum=TRUE)

## $maximum
## [1] 0.18182
## 
## $objective
## [1] -58.218

This gives maximum likelihood estimates of \(M=0.18\) as would be expected from the simple binomial model; the probability of a procyclic win is about \(0.625\).

We may relax the assumption of equal strength of the real players by using a more sophisticated hyper2 construction:

H <- hyper2()
H[c("Topalov",                 "M")] %<>% inc(22   )
H[c(          "Anand"             )] %<>% inc(   13)
H[c("Topalov","Anand",         "M")] %<>% dec(22+13)

H[c(          "Anand",         "M")] %<>% inc(   23)
H[c(                  "Karpov"    )] %<>% inc(   12)
H[c(          "Anand","Karpov","M")] %<>% dec(23+12)

H[c("Topalov"                     )] %<>% inc( 8   )
H[c(                  "Karpov","M")] %<>% inc(   10)
H[c("Topalov",        "Karpov","M")] %<>% dec( 8+10)
H

## log(Anand^13 * (Anand + Karpov + M)^-35 * (Anand + M)^23 * (Anand + M + Topalov)^-35 *
## Karpov^12 * (Karpov + M)^10 * (Karpov + M + Topalov)^-18 * (M + Topalov)^22 * Topalov^8)

(in the above, a win by a player helped by the RPS effect has M added to his strength). We can call this the TAKM model, and find the evaluate:

maxp(H)

##   Anand  Karpov       M Topalov 
## 0.28457 0.23384 0.16362 0.31797

We see that, under TAKM, the monster has strength of about 0.16, slightly less than the 0.18 obtained for \(M\) under the equality assumption \(p_T=p_A=p_K\). Further, under TAKM, the three “real” players are again all of approximately equal strength, and we may test this:

samep.test(H,c("Topalov","Anand","Karpov"))

## Warning in optim(theta.old, fun, gradient, control = control, method = method, : one-dimensional optimization by Nelder-Mead is unreliable:
## use "Brent" or optimize() directly
## Warning in optim(theta.old, fun, gradient, control = control, method = method, : one-dimensional optimization by Nelder-Mead is unreliable:
## use "Brent" or optimize() directly

## 
##  Constrained support maximization
## 
## data:  H
## null hypothesis: Topalov = Anand = Karpov
## null estimate:
##   Anand  Karpov       M Topalov 
## 0.27270 0.27270 0.18189 0.27270 
## (argmax, constrained optimization)
## Support for null:  -58.218 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##   Anand  Karpov       M Topalov 
## 0.28457 0.23384 0.16362 0.31797 
## (argmax, free optimization)
## Support for alternative:  -57.957 + K
## 
## degrees of freedom: 2
## support difference = 0.26042
## p-value: 0.77073

(note the estimated monster strengths of 0.16 and 0.18 agree with the previous estimate). Thus the p-value of about 0.77 indicates failure to reject the null (compare 0.39 under TAK). Now we can test \(H_0\colon p_M=0\) under TAKM:

specificp.gt.test(H,"M",0)

## 
##  Constrained support maximization
## 
## data:  H
## null hypothesis: sum p_i=1, M <= 0 (notional)
## null estimate:
##      Anand     Karpov          M    Topalov 
## 3.3334e-01 3.3333e-01 9.9990e-06 3.3332e-01 
## (argmax, constrained optimization)
## Support for null:  -60.997 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##   Anand  Karpov       M Topalov 
## 0.28457 0.23384 0.16362 0.31797 
## (argmax, free optimization)
## Support for alternative:  -57.957 + K
## 
## degrees of freedom: 1
## support difference = 3.0396
## p-value: 0.013678 (one-sided)

(in the test above, note that the players may appear in a different order from previously). The test shows that we may reject the hypothesis that \(p_M=0\) under TAKM with a p-value of \(0.04\) (compare 0.012 from the binomial test).

3 Profile likelihood

We may plot a profile likelihood curve for \(M\):

M <- seq(from=0.001,to=0.4,len=40)
L1 <- profsupp(H,"M",M)
L1 <- L1-max(L1)

L2 <- 55*log(1+2*M)+33*log(1-M)-88*log(2+M)
L2 <- L2-max(L2)

plot(M,L1,type="l",col="black",ylab="log likelihood",xlab="Monster strength")
points(M,L2,type="l",col="red")
abline(h=c(0,-2),lty=2)
legend("topright",col=c("black","red"),lty=1,legend=c("TAKM","T=A=K"))

Above we see support curves for RPS monster strength using TAKM and \(T=A=K\), showing very similar results. Note that the T=A=K curve could have been obtained more nicely with function TqAqK but I haven’t got round to changing it yet.

4 Generalized RPS

West and Hankin (2008) consider a synthetic dataset referring to seven teeth of a gear wheel. Consecutive pairs are repeatedly compared and the most worn tooth is nominated by independent judges. Here a dataset of the same form will be used, but the entries have been changed for pedagogical reasons:

hw <- matrix(c(
    16,05,NA,NA,NA,NA,NA,
    NA,06,09,NA,NA,NA,NA,
    NA,NA,05,11,NA,NA,NA,
    NA,NA,NA,06,12,NA,NA,
    NA,NA,NA,NA,05,09,NA,
    NA,NA,NA,NA,NA,04,07,
    13,NA,NA,NA,NA,NA,05),
    7,7,byrow=TRUE)
colnames(hw) <- paste("t",1:7,sep="")
hw

##      t1 t2 t3 t4 t5 t6 t7
## [1,] 16  5 NA NA NA NA NA
## [2,] NA  6  9 NA NA NA NA
## [3,] NA NA  5 11 NA NA NA
## [4,] NA NA NA  6 12 NA NA
## [5,] NA NA NA NA  5  9 NA
## [6,] NA NA NA NA NA  4  7
## [7,] 13 NA NA NA NA NA  5

Thus, considering the top row, we see that in a total of \(16+5=21\) independent comparisons between t1 and t2, tooth t1 was judged to be the more worn one 16 times, and t2 was judged to be the more worn one \(5\) times (NB: in the above, I have changed the dataset in order to make t1 stronger than the other players while maintaining the RPS phenomenon). We may convert this dataset to a generalized hyper2 object. First a standard Bradley-Terry:

HW <- hyper2()
rs <- -rowSums(hw,na.rm=TRUE)
cs <- colSums(hw,na.rm=TRUE)
HW[t1] <- cs[1]
HW[t2] <- cs[2]
HW[t3] <- cs[3]
HW[t4] <- cs[4]
HW[t5] <- cs[5]
HW[t6] <- cs[6]
HW[t7] <- cs[7]
HW[c(t1,t2)] <- rs[1]
HW[c(t2,t3)] <- rs[2]
HW[c(t3,t4)] <- rs[3]
HW[c(t4,t5)] <- rs[4]
HW[c(t5,t6)] <- rs[5]
HW[c(t6,t7)] <- rs[6]
HW[c(t7,t1)] <- rs[7]
HW

## log(t1^29 * (t1 + t2)^-21 * (t1 + t7)^-18 * t2^11 * (t2 + t3)^-15 * t3^14 * (t3 + t4)^-16
## * t4^17 * (t4 + t5)^-18 * t5^17 * (t5 + t6)^-14 * t6^13 * (t6 + t7)^-11 * t7^12)

(likelihood function HW has no monster). We can then analyse this:

maxp(HW)

##       t1       t2       t3       t4       t5       t6       t7 
## 0.305468 0.063000 0.064540 0.096573 0.138031 0.163460 0.168928

equalp.test(HW)

## 
##  Constrained support maximization
## 
## data:  HW
## null hypothesis: t1 = t2 = t3 = t4 = t5 = t6 = t7
## null estimate:
##      t1      t2      t3      t4      t5      t6      t7 
## 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 
## (argmax, constrained optimization)
## Support for null:  -78.326 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##       t1       t2       t3       t4       t5       t6       t7 
## 0.305468 0.063000 0.064540 0.096573 0.138031 0.163460 0.168928 
## (argmax, free optimization)
## Support for alternative:  -71.966 + K
## 
## degrees of freedom: 6
## support difference = 6.3597
## p-value: 0.047716

Thus according to this model we may reject the hypothesis of equal strength. We now incorporate the monster:

HWM <- hyper2()
tt <- c("t1","t2","t3","t4","t5","t6","t7")
for(i in 1:6){
  HWM[c(tt[i]          )] <-  hw[i,i  ]
  HWM[c(      tt[i+1],M)] <-  hw[i,i+1]
  HWM[c(tt[i],tt[i+1],M)] <- rs[i]
}
HWM[c(tt[7]        )] <-  hw[7,7]
HWM[c(      tt[1],M)] <-  hw[7,1]
HWM[c(tt[7],tt[1],M)] <- rs[7]
HWM

## log((M + t1)^13 * (M + t1 + t2)^-21 * (M + t1 + t7)^-18 * (M + t2)^5 * (M + t2 + t3)^-15
## * (M + t3)^9 * (M + t3 + t4)^-16 * (M + t4)^11 * (M + t4 + t5)^-18 * (M + t5)^12 * (M +
## t5 + t6)^-14 * (M + t6)^9 * (M + t6 + t7)^-11 * (M + t7)^7 * t1^16 * t2^6 * t3^5 * t4^6 *
## t5^5 * t6^4 * t7^5)

Then the same analysis applies:

maxHW <- maxp(HWM)
maxHW

##        M       t1       t2       t3       t4       t5       t6       t7 
## 0.048730 0.376533 0.068949 0.054698 0.071616 0.094507 0.121378 0.163590

and we can test the hypothesis that \(p_M=0\):

specificp.gt.test(HWM,"M",0)

## 
##  Constrained support maximization
## 
## data:  HWM
## null hypothesis: sum p_i=1, M <= 0 (notional)
## null estimate:
##          M         t1         t2         t3         t4         t5         t6         t7 
## 9.9993e-06 1.4458e-01 9.3557e-02 1.1517e-01 1.4602e-01 1.6525e-01 1.7345e-01 1.6196e-01 
## (argmax, constrained optimization)
## Support for null:  -75.757 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##        M       t1       t2       t3       t4       t5       t6       t7 
## 0.048730 0.376533 0.068949 0.054698 0.071616 0.094507 0.121378 0.163590 
## (argmax, free optimization)
## Support for alternative:  -69.986 + K
## 
## degrees of freedom: 1
## support difference = 5.7706
## p-value: 0.00068067 (one-sided)

suggesting that we may reject the hypothesis of zero monster strength with a p-value of about 4.7%, even after accounting for the different strengths of the teeth.

4.1 Analysis of seven-tooth gear with `hyper3`

We can now use hyper3 to analyse the same dataset. Here the likelihood function for pairwise competition between \(p_1\) and \(p_2\), with \(a\) procyclic wins and \(b\) anticyclic wins, is

\[ \left(\frac{\lambda p_1}{\lambda p_1 + p_2}\right)^a \cdot \left(\frac{p_2}{\lambda p_1 + p_2}\right)^b \]

where \(\lambda\) is a quantification of the asymmetry. We can modify the hyper2 idiom to make a hyper3 object instead:

H3seven <- function(lambda){
    out <- hyper3()
    for(i in 1:6){
        out %<>% inc(dirichlet3(powers=hw[i,i:(i+1)],lambda=c(1,lambda)))
    }
    out %<>% inc(dirichlet3(powers=hw[7,c(7,1)],lambda=c(1,lambda)))
    return(out)
}
H3seven(lambda=1.66)

## log( (t1=1)^16 * (t1=1, t2=1.66)^-21 * (t1=1.66)^13 * (t1=1.66, t7=1)^-18 * (t2=1)^6 *
## (t2=1, t3=1.66)^-15 * (t2=1.66)^5 * (t3=1)^5 * (t3=1, t4=1.66)^-16 * (t3=1.66)^9 *
## (t4=1)^6 * (t4=1, t5=1.66)^-18 * (t4=1.66)^11 * (t5=1)^5 * (t5=1, t6=1.66)^-14 *
## (t5=1.66)^12 * (t6=1)^4 * (t6=1, t7=1.66)^-11 * (t6=1.66)^9 * (t7=1)^5 * (t7=1.66)^7)

profl <- function(lambda){
  H <- H3seven(lambda)
  return(loglik(maxp(H),H))
}

First optimize the likelihood:

o <- optimize(profl,c(1.4,1.6),maximum=TRUE)

## $maximum
## [1] 1.4976
## 
## $objective
## [1] -69.986

So now plot a profile likelihood curve:

lam <- seq(from=0.8,to=2.6,len=30)
like3 <- sapply(lam,profl)

plot(lam,like3-o$objective,type='b',ylab="likelihood",xlab="lambda")
abline(h=c(0,-2))
abline(v=c(1,o$maximum))

Package dataset

Following lines create chess.rda, residing in the data/ directory of the package.

save(chess_table,chess,chess_maxp,file="chess.rda")

References

Hankin, R. K. S. 2010. “A Generalization of the Dirichlet Distribution.” Journal of Statistical Software 33 (11): 1–18. https://doi.org/10.18637/jss.v033.i11.

Hankin, R. K. S. 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.

West, Luke J., and R. K. S. Hankin. 2008. “Exact Tests for Two-Way Contingency Tables with Structural Zeros.” Journal of Statistical Software 28 (11). https://doi.org/10.18637/jss.v028/i11.

Note that diagonal entries of chess_table correspond to procyclic wins and off-diagonal entries correspond to anti-cyclic wins. This is due to the fact that chess_table is of the same type as the seven-toothed gear wheel dataset considered below, in which prograde “wins” appear on the diagonal and retrograde wins appear one place to the right (wrapped if necessary)↩︎

Quantifying the rock-paper-scissors effect

R. K. S. Hankin