(takes about ten minutes to run without cache)
To cite the hyper2 package in publications, please use
Hankin (2017). Here I use the hyper2 package to quantify the
strength of a circular rock-paper-scissors effect using the hyper2
package, improving on previous results (West and Hankin 2008) which merely
established its existence. I use two datasets, a three-player
scoreline and a synthetic dataset referring to wear on a seven-tooth
gear wheel.
Consider the following dataset:
chess_table <- as.matrix(read.table("chess.txt"))
dimnames(chess_table) <- list(
match=c("T_vs_A","A_vs_K","T_vs_K"),
player=c("Topalov","Anand","Karpov"))
chess_table
## player
## match Topalov Anand Karpov
## T_vs_A 22 13 NA
## A_vs_K NA 23 12
## T_vs_K 8 NA 10
This is documented in chess.Rd and shows that Topalov played Anand
\(22+13=35\) times, winning 22 and losing 13 matches. The corresponding
likelihood function is chess:
chess <- saffy(chess_table)
chess
## log(Anand^36 * (Anand + Karpov)^-35 * (Anand + Topalov)^-35 * Karpov^22
## * (Karpov + Topalov)^-18 * Topalov^30)
chess_maxp <- maxp(chess)
chess_maxp
## Topalov Anand Karpov
## 0.4036108 0.3405168 0.2558723
(we refer to this as the “TAK” model). We might ask whether there is evidence for the players having differing strengths:
equalp.test(chess)
##
## Constrained support maximization
##
## data: chess
## null hypothesis: Topalov = Anand = Karpov
## null estimate:
## Topalov Anand Karpov
## 0.3333333 0.3333333 0.3333333
## (argmax, constrained optimization)
## Support for null: -60.99695 + K
##
## alternative hypothesis: sum p_i=1
## alternative estimate:
## Topalov Anand Karpov
## 0.4036108 0.3405168 0.2558723
## (argmax, free optimization)
## Support for alternative: -60.06174 + K
##
## degrees of freedom: 2
## support difference = 0.9352125
## p-value: 0.3925025
Thus we fail to reject \(H_0\colon p_T=p_A=p_K\) under T-A-K as the p-value is about 0.39.
The existence of a rock-paper-scissors relationship between the players was discussed in Hankin (2010) and the null of no RPS phenomena could be rejected using the Alymer test.
Here we improve on that work by introducing a monster, M, that
accounts for the rock-paper-scissors nature of the players. The RPS
monster helps players that benefit from the RPS nature of the result.
The simplest nontrivial probability model is that all three real
players have the same strength: \(p_T=p_A=p_K\), which we write \(T=A=K\).
Under \(T=A=K\), writing \(p_T=p_A=p_K=\alpha\) we can use a binomial test
to calculate the p-value for the hypothesis that \(p_M=0\):
binom.test(sum(diag(chess_table)),sum(chess_table,na.rm=TRUE),alternative="greater")
##
## Exact binomial test
##
## data: sum(diag(chess_table)) and sum(chess_table, na.rm = TRUE)
## number of successes = 55, number of trials = 88, p-value = 0.01231
## alternative hypothesis: true probability of success is greater than 0.5
## 95 percent confidence interval:
## 0.5321799 1.0000000
## sample estimates:
## probability of success
## 0.625
Above we see a p-value of about 1.2%, sufficient to reject the null of zero monster strength1.
Estimating \(M\), the strength of the monster, is also straightforward. First we consider the case where the probability of a procyclic win is known to be \(p\) (a “procyclic win” is a win that would be expected under a strong monster).
\[ \operatorname{Prob}(\mathrm{procyclic}\ \mathrm{win})=\frac{\alpha+M}{2\alpha+M} \]
Here we also have \(3\alpha+M=1\), so to estimate the strength of the monster we would solve
\[ \frac{\alpha+M}{2\alpha+M}=p,\qquad{3\alpha + M}=1 \]
for \(\alpha\) and \(M\), where \(p\) is the probability of a procyclic win. This gives us
\[ \alpha=\frac{1-p}{2-p},\qquad M=\frac{2p-1}{2-p}. \]
If we use the maximum likelihood estimate value for a procyclic win of \(\frac{55}{88}=0.625\) we find that \(\hat{M}=0.18\) approximately. Further, we may express \(p\), the probability of a pro-cyclic win, as \(p=\frac{1+2M}{2+M}\). If we observe \(a\) procyclic wins and \(b\) countercyclic wins a likelihood function for \(M\) would then be
\[ \mathcal{L}(M)= p^a(1-p)^b=\left(\frac{1+2M}{2+M}\right)^a\cdot \left(\frac{1-M}{2+M}\right)^b \]
and here we have \(a=22+23+10=55\) and \(b=13+12+8=33\) (a graph is given
below). It is straightforward but not trivial to translate this into
hyper3 idiom. Noting that \(3\alpha+M=1\) we define \(\beta=3\alpha\)
and then the unit sum constraint is \(\beta+M=1\) and our probabilities
are
\[ \frac{\alpha+M;\alpha}{2\alpha+M } = \frac{\beta/3+M;\beta/3}{2\beta/3+M} = \frac{1+2M;1-M}{2+M} \]
(where the last expression is obtained using \(\beta+M=1\)). We may set
up a log-likelihood function of \(a,b\) using hyper3 idiom as follows:
TqAqK <- function(a,b){
H3 <- hyper3(pnames=c("beta","M"))
H3[c(beta=1/3,M=1)] %<>% inc(a )
H3[c(beta=1/3 )] %<>% inc( b)
H3[c(beta=2/3,M=1)] %<>% dec(a+b)
return(H3)
}
H3 <- TqAqK(55,33)
loglik(c(M=0.1,beta=0.9),H3)
## [1] -58.7397
loglik(c(beta=0.9,M=0.1),H3) # NB: identical!
## [1] -58.7397
[above, we note that \(\beta+M=3\alpha+M=1\), so the monster is the fillup value with respect to \(\beta\)]. This is easy to maximize:
optimize(function(M){loglik(c(M=M,beta=1-M),H3)},c(0.01,0.99),maximum=TRUE)
## $maximum
## [1] 0.1818228
##
## $objective
## [1] -58.21756
This gives maximum likelihood estimates of \(M=0.18\) as would be expected from the simple binomial model; the probability of a procyclic win is about \(0.625\).
We may relax the assumption of equal strength of the real players by using a more sophisticated hyper2 construction:
H <- hyper2()
H[c("Topalov", "M")] %<>% inc(22 )
H[c( "Anand" )] %<>% inc( 13)
H[c("Topalov","Anand", "M")] %<>% dec(22+13)
H[c( "Anand", "M")] %<>% inc( 23)
H[c( "Karpov" )] %<>% inc( 12)
H[c( "Anand","Karpov","M")] %<>% dec(23+12)
H[c("Topalov" )] %<>% inc( 8 )
H[c( "Karpov","M")] %<>% inc( 10)
H[c("Topalov", "Karpov","M")] %<>% dec( 8+10)
H
## log(Anand^13 * (Anand + Karpov + M)^-35 * (Anand + M)^23 * (Anand + M +
## Topalov)^-35 * Karpov^12 * (Karpov + M)^10 * (Karpov + M + Topalov)^-18
## * (M + Topalov)^22 * Topalov^8)
(in the above, a win by a player helped by the RPS effect has M
added to his strength). We can call this the TAKM model, and find the
evaluate:
maxp(H)
## Anand Karpov M Topalov
## 0.2845736 0.2338398 0.1636196 0.3179670
We see that, under TAKM, the monster has strength of about 0.16, slightly less than the 0.18 obtained for \(M\) under the equality assumption \(p_T=p_A=p_K\). Further, under TAKM, the three “real” players are again all of approximately equal strength, and we may test this:
samep.test(H,c("Topalov","Anand","Karpov"))
## Warning in optim(theta.old, fun, gradient, control = control, method = method, : one-dimensional optimization by Nelder-Mead is unreliable:
## use "Brent" or optimize() directly
## Warning in optim(theta.old, fun, gradient, control = control, method = method, : one-dimensional optimization by Nelder-Mead is unreliable:
## use "Brent" or optimize() directly
##
## Constrained support maximization
##
## data: H
## null hypothesis: Topalov = Anand = Karpov
## null estimate:
## Anand Karpov M Topalov
## 0.272704 0.272704 0.181888 0.272704
## (argmax, constrained optimization)
## Support for null: -58.21757 + K
##
## alternative hypothesis: sum p_i=1
## alternative estimate:
## Anand Karpov M Topalov
## 0.2845736 0.2338398 0.1636196 0.3179670
## (argmax, free optimization)
## Support for alternative: -57.95715 + K
##
## degrees of freedom: 2
## support difference = 0.260415
## p-value: 0.7707316
(note the estimated monster strengths of 0.16 and 0.18 agree with the previous estimate). Thus the p-value of about 0.77 indicates failure to reject the null (compare 0.39 under TAK). Now we can test \(H_0\colon p_M=0\) under TAKM:
specificp.gt.test(H,"M",0)
##
## Constrained support maximization
##
## data: H
## null hypothesis: sum p_i=1, M <= 0 (notional)
## null estimate:
## Anand Karpov M Topalov
## 3.403344e-01 3.048397e-01 9.999580e-06 3.548160e-01
## (argmax, constrained optimization)
## Support for null: -60.46761 + K
##
## alternative hypothesis: sum p_i=1
## alternative estimate:
## Anand Karpov M Topalov
## 0.2845736 0.2338398 0.1636196 0.3179670
## (argmax, free optimization)
## Support for alternative: -57.95715 + K
##
## degrees of freedom: 1
## support difference = 2.510459
## p-value: 0.0250429 (one-sided)
(in the test above, note that the players may appear in a different order from previously). The test shows that we may reject the hypothesis that \(p_M=0\) under TAKM with a p-value of \(0.04\) (compare 0.012 from the binomial test).
We may plot a profile likelihood curve for \(M\):
M <- seq(from=0.001,to=0.4,len=40)
L1 <- profsupp(H,"M",M)
L1 <- L1-max(L1)
L2 <- 55*log(1+2*M)+33*log(1-M)-88*log(2+M)
L2 <- L2-max(L2)
plot(M,L1,type="l",col="black",ylab="log likelihood",xlab="Monster strength")
points(M,L2,type="l",col="red")
abline(h=c(0,-2),lty=2)
legend("topright",col=c("black","red"),lty=1,legend=c("TAKM","T=A=K"))
Above we see support curves for RPS monster strength using TAKM and
\(T=A=K\), showing very similar results. Note that the T=A=K curve
could have been obtained more nicely with function TqAqK but I
haven’t got round to changing it yet.
West and Hankin (2008) consider a synthetic dataset referring to seven teeth of a gear wheel. Consecutive pairs are repeatedly compared and the most worn tooth is nominated by independent judges. Here a dataset of the same form will be used, but the entries have been changed for pedagogical reasons:
hw <- matrix(c(
16,05,NA,NA,NA,NA,NA,
NA,06,09,NA,NA,NA,NA,
NA,NA,05,11,NA,NA,NA,
NA,NA,NA,06,12,NA,NA,
NA,NA,NA,NA,05,09,NA,
NA,NA,NA,NA,NA,04,07,
13,NA,NA,NA,NA,NA,05),
7,7,byrow=TRUE)
colnames(hw) <- paste("t",1:7,sep="")
hw
## t1 t2 t3 t4 t5 t6 t7
## [1,] 16 5 NA NA NA NA NA
## [2,] NA 6 9 NA NA NA NA
## [3,] NA NA 5 11 NA NA NA
## [4,] NA NA NA 6 12 NA NA
## [5,] NA NA NA NA 5 9 NA
## [6,] NA NA NA NA NA 4 7
## [7,] 13 NA NA NA NA NA 5
Thus, considering the top row, we see that in a total of \(16+5=21\)
independent comparisons between t1 and t2, tooth t1 was judged
to be the more worn one 16 times, and t2 was judged to be the more
worn one \(5\) times (NB: in the above, I have changed the dataset in
order to make t1 stronger than the other players while maintaining
the RPS phenomenon). We may convert this dataset to a generalized
hyper2 object. First a standard Bradley-Terry:
HW <- hyper2()
rs <- -rowSums(hw,na.rm=TRUE)
cs <- colSums(hw,na.rm=TRUE)
HW[t1] <- cs[1]
HW[t2] <- cs[2]
HW[t3] <- cs[3]
HW[t4] <- cs[4]
HW[t5] <- cs[5]
HW[t6] <- cs[6]
HW[t7] <- cs[7]
HW[c(t1,t2)] <- rs[1]
HW[c(t2,t3)] <- rs[2]
HW[c(t3,t4)] <- rs[3]
HW[c(t4,t5)] <- rs[4]
HW[c(t5,t6)] <- rs[5]
HW[c(t6,t7)] <- rs[6]
HW[c(t7,t1)] <- rs[7]
HW
## log(t1^29 * (t1 + t2)^-21 * (t1 + t7)^-18 * t2^11 * (t2 + t3)^-15 *
## t3^14 * (t3 + t4)^-16 * t4^17 * (t4 + t5)^-18 * t5^17 * (t5 + t6)^-14 *
## t6^13 * (t6 + t7)^-11 * t7^12)
(likelihood function HW has no monster). We can then analyse this:
maxp(HW)
## t1 t2 t3 t4 t5 t6 t7
## 0.30547567 0.06300043 0.06453928 0.09656959 0.13802768 0.16345937 0.16892798
equalp.test(HW)
##
## Constrained support maximization
##
## data: HW
## null hypothesis: t1 = t2 = t3 = t4 = t5 = t6 = t7
## null estimate:
## t1 t2 t3 t4 t5 t6 t7
## 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571
## (argmax, constrained optimization)
## Support for null: -78.32563 + K
##
## alternative hypothesis: sum p_i=1
## alternative estimate:
## t1 t2 t3 t4 t5 t6 t7
## 0.30547567 0.06300043 0.06453928 0.09656959 0.13802768 0.16345937 0.16892798
## (argmax, free optimization)
## Support for alternative: -71.96596 + K
##
## degrees of freedom: 6
## support difference = 6.359671
## p-value: 0.04771576
Thus according to this model we may reject the hypothesis of equal strength. We now incorporate the monster:
HWM <- hyper2()
tt <- c("t1","t2","t3","t4","t5","t6","t7")
for(i in 1:6){
HWM[c(tt[i] )] <- hw[i,i ]
HWM[c( tt[i+1],M)] <- hw[i,i+1]
HWM[c(tt[i],tt[i+1],M)] <- rs[i]
}
HWM[c(tt[7] )] <- hw[7,7]
HWM[c( tt[1],M)] <- hw[7,1]
HWM[c(tt[7],tt[1],M)] <- rs[7]
HWM
## log((M + t1)^13 * (M + t1 + t2)^-21 * (M + t1 + t7)^-18 * (M + t2)^5 *
## (M + t2 + t3)^-15 * (M + t3)^9 * (M + t3 + t4)^-16 * (M + t4)^11 * (M +
## t4 + t5)^-18 * (M + t5)^12 * (M + t5 + t6)^-14 * (M + t6)^9 * (M + t6 +
## t7)^-11 * (M + t7)^7 * t1^16 * t2^6 * t3^5 * t4^6 * t5^5 * t6^4 * t7^5)
Then the same analysis applies:
maxHW <- maxp(HWM)
maxHW
## M t1 t2 t3 t4 t5 t6
## 0.04870602 0.37652705 0.06891871 0.05469264 0.07158761 0.09449590 0.12138900
## t7
## 0.16368307
and we can test the hypothesis that \(p_M=0\):
specificp.gt.test(HWM,"M",0)
##
## Constrained support maximization
##
## data: HWM
## null hypothesis: sum p_i=1, M <= 0 (notional)
## null estimate:
## M t1 t2 t3 t4 t5
## 9.999265e-06 1.428914e-01 1.211173e-01 1.328444e-01 1.450258e-01 1.517772e-01
## t6 t7
## 1.540200e-01 1.523139e-01
## (argmax, constrained optimization)
## Support for null: -77.23766 + K
##
## alternative hypothesis: sum p_i=1
## alternative estimate:
## M t1 t2 t3 t4 t5 t6
## 0.04870602 0.37652705 0.06891871 0.05469264 0.07158761 0.09449590 0.12138900
## t7
## 0.16368307
## (argmax, free optimization)
## Support for alternative: -69.98623 + K
##
## degrees of freedom: 1
## support difference = 7.251428
## p-value: 0.0001399473 (one-sided)
suggesting that we may reject the hypothesis of zero monster strength with a p-value of about 4.7%, even after accounting for the different strengths of the teeth.
hyper3We can now use hyper3 to analyse the same dataset. Here the
likelihood function for pairwise competition between \(p_1\) and \(p_2\),
with \(a\) procyclic wins and \(b\) anticyclic wins, is
\[ \left(\frac{\lambda p_1}{\lambda p_1 + p_2}\right)^a \cdot \left(\frac{p_2}{\lambda p_1 + p_2}\right)^b \]
where \(\lambda\) is a quantification of the asymmetry. We can modify
the hyper2 idiom to make a hyper3 object instead:
H3seven <- function(lambda){
out <- hyper3()
for(i in 1:6){
out %<>% inc(dirichlet3(powers=hw[i,i:(i+1)],lambda=c(1,lambda)))
}
out %<>% inc(dirichlet3(powers=hw[7,c(7,1)],lambda=c(1,lambda)))
return(out)
}
H3seven(lambda=1.66)
## log( (t1=1)^16 * (t1=1, t2=1.66)^-21 * (t1=1.66)^13 * (t1=1.66,
## t7=1)^-18 * (t2=1)^6 * (t2=1, t3=1.66)^-15 * (t2=1.66)^5 * (t3=1)^5 *
## (t3=1, t4=1.66)^-16 * (t3=1.66)^9 * (t4=1)^6 * (t4=1, t5=1.66)^-18 *
## (t4=1.66)^11 * (t5=1)^5 * (t5=1, t6=1.66)^-14 * (t5=1.66)^12 * (t6=1)^4
## * (t6=1, t7=1.66)^-11 * (t6=1.66)^9 * (t7=1)^5 * (t7=1.66)^7)
profl <- function(lambda){
H <- H3seven(lambda)
return(loglik(maxp(H),H))
}
First optimize the likelihood:
o <- optimize(profl,c(1.4,1.6),maximum=TRUE)
o
## $maximum
## [1] 1.497558
##
## $objective
## [1] -69.98623
So now plot a profile likelihood curve:
lam <- seq(from=0.8,to=2.6,len=30)
like3 <- sapply(lam,profl)
plot(lam,like3-o$objective,type='b',ylab="likelihood",xlab="lambda")
abline(h=c(0,-2))
abline(v=c(1,o$maximum))
Following lines create chess.rda, residing in the data/ directory of the package.
save(chess_table,chess,chess_maxp,file="chess.rda")
hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.
Note that diagonal entries of chess_table
correspond to procyclic wins and off-diagonal entries correspond to
anti-cyclic wins. This is due to the fact that chess_table is of
the same type as the seven-toothed gear wheel dataset considered
below, in which prograde “wins” appear on the diagonal and retrograde
wins appear one place to the right (wrapped if necessary)↩︎