To cite the hyper2 package in publications, please use Hankin (2017).

This document is a distillation and clarification of inst/plackett_luce_monster.Rmd.

In the following, I use the “race” metaphor: five runners take part in a race and arrive in order: \(a\succ b\succ c\succ d\succ e\). Consider the following Plackett-Luce likelihood function for this observation:

\[\begin{equation} \frac{p_a}{p_a + p_b + p_c + p_d + p_e}\cdot \frac{p_b}{ p_b + p_c + p_d + p_e}\cdot \frac{p_c}{ p_c + p_d + p_e}\cdot \frac{p_d}{ p_d + p_e}\cdot \frac{p_e}{ p_e} \end{equation}\]

Note the use of multiplication between the terms, indicating conditional independence. We may use the multiplicative generalization of Bradley-Terry strengths to introduce an element of non-independence between the terms. There are many ways to do this, but one simple case would be to define equivalence classes of the competitors, each equivalence class comprising mutually supporting runners. Equivalence classes of size one correspond to unsupported runners. The metaphor would be that a runner who has finished the race can support the other members of his equivalence class by cheering, boosting his teammates’ performance. The hyper2 package includes the function cheering(), which implements this functionality.

As an example, consider figure 0.1. This shows a partial probability tree diagram for some of the \(5!=120\) possible order statistics. The standard Plackett-Luce likelihoods have been modified to account for two groups of mutually supporting runners: \(\left\lbrace a,b,c\right\rbrace\) with support term \(\lambda\), and \(\left\lbrace d,e\right\rbrace\) with support term \(\mu\). From START, the first runner to cross the finishing line is associated with standard Plackett-Luce. Taking the top path as an example, we see that the likelihood function for \(a\succ b\succ c\succ d\succ e\) would be

Figure 0.1: Partial probability tree for five competitors \(a\)-\(e\) with mutually supporting subsets \(\left\lbrace a,b,c\right\rbrace\) [with support term \(\lambda\)] and \(\left\lbrace d,e\right\rbrace\) [with support term \(\mu\)], hyper3 approach

\[\begin{equation}\label{asuccb} \frac{p_a}{p_a + p_b + p_c + p_d + p_e}\cdot \frac{\lambda p_b}{\lambda p_b + \lambda p_c + p_d + p_e}\cdot \frac{\lambda p_c}{\lambda p_c + p_d + p_e}\cdot \frac{p_d}{p_d + p_e} \end{equation}\]

In equation \(\ref{asuccb}\), the first term is standard Plackett-Luce: at this point, no one has finished and cheering effects are absent. The second term reflects the fact that competitors \(b\) and \(c\) are supported by competitor \(a\), who has by this point finished the race and is supporting his teammates.
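As a plain-R sanity check (illustrative code, not package API; the function name cheerlik is ours), the modified likelihood above reduces to standard Plackett-Luce when \(\lambda=1\). Note that \(\mu\) plays no role on this path: \(e\)'s support only becomes active in the trivial final term \(\mu p_e/\mu p_e\).

```r
# Equation for a > b > c > d > e with support term lambda for {a,b,c}.
cheerlik <- function(p, lambda){
  p["a"]/sum(p) *
    (lambda*p["b"])/(lambda*p["b"] + lambda*p["c"] + p["d"] + p["e"]) *
    (lambda*p["c"])/(lambda*p["c"] + p["d"] + p["e"]) *
    p["d"]/(p["d"] + p["e"])
}
p <- c(a=0.3, b=0.25, c=0.2, d=0.15, e=0.1)
cheerlik(p, lambda=1)  # identical to standard Plackett-Luce
cheerlik(p, lambda=2)  # mutual support boosts b and c
```

With \(\lambda>1\) the second and third terms increase, so the observation \(a\succ b\succ c\succ d\succ e\) becomes more probable when the teammates support one another.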

By contrast, the likelihood function for observation \(d\succ a\succ c\succ b\succ e\) would be

\[\begin{equation} \frac{p_d}{p_a + p_b + p_c + p_d + p_e}\cdot \frac{p_a}{p_a + p_b + p_c + \mu p_e}\cdot \frac{\lambda p_c}{\lambda p_b + \lambda p_c + \mu p_e}\cdot \frac{\lambda p_b}{\lambda p_b + \mu p_e} \end{equation}\]

where this likelihood function reflects the mutual support for equivalence class \(\left\lbrace d,e\right\rbrace\). Note that the final term reflects the fact that competitors \(b\) and \(e\) have their support active when vying for fourth place, as members of both their teams have finished at this point.

This probability model could easily be modified to account for specific circumstances. The cheering effect could be asymmetrical (with \(a\) helping \(b\) but not vice versa, for example). The effect could operate only on specific ordered pairs. Or perhaps the effect might have a finite lifetime: if \(a\) places \(n^\mathrm{th}\), then the cheering effect is active only for competitors placing \((n+r)^\mathrm{th}\) or above.
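The finite-lifetime variant can be sketched in plain R (hypothetical code, not hyper2 API; finite_cheer and its arguments are illustrative). A finisher activates support for his remaining teammates only for the next \(r\) finishing positions:

```r
# v: finishing order; p: named strengths; e: named equivalence classes;
# h: support term per class; r: lifetime of the cheering effect.
finite_cheer <- function(v, p, e, h, r){
  finished_at <- setNames(rep(NA_integer_, length(v)), v)
  L <- 1
  for(i in seq_along(v)){
    remaining <- v[i:length(v)]
    w <- sapply(remaining, function(comp){
      mates <- v[e[v] == e[comp] & v != comp]        # comp's teammates
      boosted <- any(finished_at[mates] >= i - r, na.rm = TRUE)
      if(boosted) h[e[comp]]*p[comp] else p[comp]    # boost if a recent finisher
    })
    L <- L * w[1]/sum(w)     # w[1] is the weight of the actual finisher
    finished_at[v[i]] <- i
  }
  unname(L)
}
p <- c(a=0.3, b=0.25, c=0.2, d=0.15, e=0.1)
eclass <- c(a=1, b=1, c=1, d=2, e=2)
finite_cheer(c("a","b","c","d","e"), p, eclass, h=c(2,1), r=5)  # full lifetime
finite_cheer(c("a","b","c","d","e"), p, eclass, h=c(2,1), r=0)  # effect never active
```

With \(r\) at least the field size this reproduces the cheering model above; with \(r=0\) (or all support terms equal to 1) it collapses to standard Plackett-Luce.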

1 Package idiom

We can investigate the red bus-blue bus phenomenon (as discussed, in a slightly different context, in inst/red_bus_blue_bus.Rmd). Here, we consider a person who is given the choice of five transport methods: car (C), train (T), red bus (RB), blue bus (BB), and walking (W).

Now, he does not really care what colour the bus is. If we ask him to rank his options, it is highly probable that he will put RB and BB consecutively (because they are essentially indistinguishable). Can we quantify the strength of this effect? To do this, we define a bespoke function RB_BB_LF() which returns a hyper3 log-likelihood function corresponding to repeated observations of our commuter’s reported ranks for the five options:

`RB_BB_LF` <- function(s){
    ec <- c(C=1,T=2,RB=3,BB=3,W=4) # equivalence classes
    h <- c(1,1,s,1)                # strength of support
    (
        cheering3(v=c("RB","BB","C" ,"T","W"), e=ec, h=h)*3 +
        cheering3(v=c("W" ,"BB","RB","C","T"), e=ec, h=h)*7 +
        cheering3(v=c("W" ,"RB","BB","C","T"), e=ec, h=h)*8 +
        cheering3(v=c("W" ,"BB","RB","T","C"), e=ec, h=h)*7 +
        cheering3(v=c("W" ,"RB","BB","C","T"), e=ec, h=h)*7 +
        cheering3(v=c("RB","BB","C" ,"T","W"), e=ec, h=h)*3 +
        cheering3(v=c("RB","BB","C" ,"T","W"), e=ec, h=h)*3 +
        cheering3(v=c("BB","RB","T" ,"C","W"), e=ec, h=h)*2 +
        cheering3(v=c("T" ,"BB","RB","C","W"), e=ec, h=h)*2 +
        cheering3(v=c("W" ,"BB","RB","T","C"), e=ec, h=h)*4 +
        cheering3(v=c("C" ,"RB","BB","W","T"), e=ec, h=h)*4 +
        cheering3(v=c("BB","C" ,"RB","T","W"), e=ec, h=h)*3
    )
}

Above, we see from the function body that he reported RB,BB,C,T,W three times [first row], W,BB,RB,C,T seven times [second row], and so on; perhaps his ranking depends on the weather or how tired he is on any given day. Observe that in almost every case he ranks RB and BB consecutively. Function RB_BB_LF() takes an argument s that quantifies the perceived similarity between RB and BB. For example:

(H <- RB_BB_LF(s=1.8888))
## log( (BB=1)^25 * (BB=1, C=1, RB=1, T=1)^-33 * (BB=1, C=1, RB=1, T=1,
## W=1)^-53 * (BB=1, C=1, RB=1, W=1)^-2 * (BB=1, RB=1, T=1, W=1)^-4 *
## (BB=1.8888)^28 * (BB=1.8888, C=1, T=1)^-15 * (BB=1.8888, C=1, T=1,
## W=1)^-9 * (BB=1.8888, T=1, W=1)^-4 * (C=1)^42 * (C=1, RB=1.8888,
## T=1)^-18 * (C=1, RB=1.8888, T=1, W=1)^-5 * (C=1, RB=1.8888, W=1)^-2 *
## (C=1, T=1)^-33 * (C=1, T=1, W=1)^-11 * (C=1, W=1)^-4 * (RB=1)^28 *
## (RB=1.8888)^25 * (RB=1.8888, T=1, W=1)^-3 * (T=1)^27 * (T=1, W=1)^-16 *
## (W=1)^37)
(mH <- maxp(H,n=1))
##       BB        C       RB        T        W 
## 0.324658 0.104333 0.320627 0.058544 0.191838

Now to find a profile likelihood function for s:

o <- function(s){maxp(RB_BB_LF(s),give=TRUE,n=1)$likes} # optimand
s <- exp(seq(from=log(1.3),to=log(147),len=17)) # putative similarity measures
L <- sapply(s,o)
L <- L-max(L)

We can plot these:

plot(s,L,type="b")
abline(h=c(0,-2))
abline(v=1)

plot(log(s),L,type="b")
abline(h=c(0,-2))
abline(v=0)

And formally maximize the likelihood:

(osup <- optimize(o,c(6,10),maximum=TRUE))
## $maximum
## [1] 9.9999
## 
## $objective
## [1] -191.34

So a likelihood ratio test of the null that \(s=1\) would be:

(suppdiff <- o(osup$maximum) - o(1))
## [1] 20.21

This easily satisfies Edwards’s two-units-of-support criterion; Wilks’s theorem gives us an asymptotic \(p\)-value:

pchisq(suppdiff*2,df=1,lower.tail=FALSE)
## [1] 2.0488e-10

Now we evaluate the maximum likelihood estimate for the strengths at the optimal value of \(s\):

maxHmax <- maxp(RB_BB_LF(s = osup$maximum))
maxHmax
##       BB        C       RB        T        W 
## 0.269163 0.139642 0.267252 0.078503 0.245440

2 University ranking analysis

Here we use a dataset of university rankings, timesData.csv, taken from https://github.com/arnaudbenard/university-ranking/blob/master/timesData.csv and stored locally as timesData.txt.

a <- read.table("timesData.txt", sep=",", header=TRUE)
wanted <- c("California Institute of Technology", "Harvard University",
"Massachusetts Institute of Technology", "Princeton University",
"Stanford University", "University of Cambridge", "University of Oxford")
names(wanted) <- c("cal","harv","mass","prin","stan","cam","ox")

a <- a[a$university_name %in% wanted,]
a <- cbind(a,"top7rank"=0)
for(y in unique(a$year)){
    a[a$year==y, "top7rank"] <- order(
        as.numeric(a[a$year==y, "world_rank"]) +
            a[a$year==y, "research"]/1e6,
        decreasing=TRUE)
}
a <- a[,c("top7rank","university_name","year")]
a <- reshape(a,idvar="university_name",timevar="year",direction="wide")
for(i in seq_len(nrow(a))){
   a$university_name[i] <- names(which(wanted == a$university_name[i]))
}
rownames(a) <- a$university_name
a <- a[,-1]
colnames(a) <- paste("Y",2011:2016,sep="")
a
##      Y2011 Y2012 Y2013 Y2014 Y2015 Y2016
## harv     6     6     4     6     6     2
## cal      7     7     7     7     7     7
## mass     5     1     2     2     2     3
## stan     4     5     6     4     4     5
## prin     3     3     3     3     1     1
## cam      2     2     1     1     3     4
## ox       1     4     5     5     5     6
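In the ranking code above, the research/1e6 term is a tiebreaker: it perturbs tied values of world_rank by a negligible amount so that order() can separate them. A toy illustration in plain R (the numbers are invented):

```r
wr  <- c(5, 3, 3, 1)      # world ranks; positions 2 and 3 are tied
res <- c(90, 80, 95, 99)  # research scores break the tie
order(wr + res/1e6, decreasing = TRUE)  # tied pair ordered by research score
```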
H <- suppfun(ordertable(a))
equalp.test(H)
## 
##  Constrained support maximization
## 
## data:  H
## null hypothesis: cal = cam = harv = mass = ox = prin = stan
## null estimate:
##     cal     cam    harv    mass      ox    prin    stan 
## 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 
## (argmax, constrained optimization)
## Support for null:  -51.151 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##      cal      cam     harv     mass       ox     prin     stan 
## 0.000001 0.348942 0.015828 0.265173 0.038985 0.287311 0.043760 
## (argmax, free optimization)
## Support for alternative:  -29.415 + K
## 
## degrees of freedom: 6
## support difference = 21.736
## p-value: 9.4034e-08
samep.test(H,c("ox","cam"))
## 
##  Constrained support maximization
## 
## data:  H
## null hypothesis: ox = cam
## null estimate:
##        cal        cam       harv       mass         ox       prin       stan 
## 1.0002e-06 1.3346e-01 2.9711e-02 2.8104e-01 1.3346e-01 3.4667e-01 7.5669e-02 
## (argmax, constrained optimization)
## Support for null:  -33.153 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##      cal      cam     harv     mass       ox     prin     stan 
## 0.000001 0.348942 0.015828 0.265173 0.038985 0.287311 0.043760 
## (argmax, free optimization)
## Support for alternative:  -29.415 + K
## 
## degrees of freedom: 1
## support difference = 3.7385
## p-value: 0.0062492
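The p-values reported above come from Wilks’s theorem: twice the support difference is asymptotically \(\chi^2\) distributed with the stated degrees of freedom. We can check the samep.test() result directly in plain R:

```r
# support difference 3.7385 on 1 degree of freedom (ox = cam test above)
pchisq(2*3.7385, df = 1, lower.tail = FALSE)
```

This matches the reported p-value of about 0.0062.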

We now move to hyper3 idiom:

H3 <- function(oxcam){
  out <- hyper3()
  for(i in seq_len(ncol(a))){
    jj <- rep("",nrow(a))
    jj[a[,i]] <- rownames(a)
    out <- out + cheering3(v=jj,
                           e=c(ox=1, cam=1, prin=2, stan=3, mass=4, harv=5, cal=6),
                           help=c(oxcam, 1, 1, 1, 1, 1))
  }
  return(out)
}
o <- function(oxcam){maxp(H3(oxcam),give=TRUE,n=1)$likes}
oc <- exp(seq(from=log(0.5),to=log(5),len=15))
L <- sapply(oc,o)
L <- L - max(L)
plot(log(oc),L,type="b")
abline(v=0)

3 Five nations championship

The five nations rugby championship was held from 1910 to 1999 and file five_nations.txt shows the order statistic for England (E), Scotland (S), Ireland (I), Wales (W), and France (F).

https://en.wikipedia.org/wiki/Six_Nations_Championship

Here is a hyper2 analysis:

a <- as.matrix(read.table("five_nations.txt",header=FALSE))
head(a)
##      V1     V2  V3  V4  V5  V6 
## [1,] "1910" "E" "W" "S" "I" "F"
## [2,] "1911" "W" "I" "E" "F" "S"
## [3,] "1912" "E" "I" "S" "W" "F"
## [4,] "1913" "E" "W" "S" "I" "F"
## [5,] "1914" "E" "W" "I" "S" "F"
## [6,] "1920" "W" "S" "E" "F" "I"
H <- hyper2()
for(i in seq_len(nrow(a))){
  H <- H + race(a[i,-1])
}
mH <- maxp(H)
pie(mH)

equalp.test(H)
## 
##  Constrained support maximization
## 
## data:  H
## null hypothesis: E = F = I = S = W
## null estimate:
##   E   F   I   S   W 
## 0.2 0.2 0.2 0.2 0.2 
## (argmax, constrained optimization)
## Support for null:  -335.12 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##       E       F       I       S       W 
## 0.22044 0.21524 0.15607 0.15988 0.24837 
## (argmax, free optimization)
## Support for alternative:  -331.03 + K
## 
## degrees of freedom: 4
## support difference = 4.0955
## p-value: 0.08483

Now use hyper3 to see whether teams do better following a win:

rugby <- function(lambda){
  H3a <- hyper3()
  for(i in seq(from=2,to=nrow(a))){
        last_year_winner <- a[i-1,2] # e.g. "W" or "E"
        H3a <- H3a + ordervec2supp3a(a[i,-1],nonfinishers=NULL,helped=last_year_winner,lambda=lambda)
  }
  return(H3a)
}
rugby(1.888)
## log( (E=1)^38 * (E=1, F=1)^-7 * (E=1, F=1, I=1)^-6 * (E=1, F=1, I=1,
## S=1)^-23 * (E=1, F=1, I=1, S=1, W=1)^-69 * (E=1, F=1, I=1, W=1)^-6 *
## (E=1, F=1, S=1)^-4 * (E=1, F=1, S=1, W=1)^-6 * (E=1, F=1, W=1)^-4 *
## (E=1, I=1)^-7 * (E=1, I=1, S=1)^-11 * (E=1, I=1, S=1, W=1)^-16 * (E=1,
## I=1, W=1)^-4 * (E=1, S=1)^-9 * (E=1, S=1, W=1)^-7 * (E=1, W=1)^-2 *
## (E=1.888)^18 * (F=1)^42 * (F=1, I=1)^-4 * (F=1, I=1, S=1)^-12 * (F=1,
## I=1, S=1, W=1)^-18 * (F=1, I=1, W=1)^-5 * (F=1, S=1)^-10 * (F=1, S=1,
## W=1)^-5 * (F=1, W=1)^-5 * (F=1.888)^16 * (I=1)^45 * (I=1, S=1)^-11 *
## (I=1, S=1, W=1)^-11 * (I=1, W=1)^-10 * (I=1.888)^6 * (S=1)^47 * (S=1,
## W=1)^-4 * (S=1.888)^6 * (W=1)^35 * (W=1.888)^23)
rugby(1.111111)
## log( (E=1)^38 * (E=1, F=1)^-7 * (E=1, F=1, I=1)^-6 * (E=1, F=1, I=1,
## S=1)^-23 * (E=1, F=1, I=1, S=1, W=1)^-69 * (E=1, F=1, I=1, W=1)^-6 *
## (E=1, F=1, S=1)^-4 * (E=1, F=1, S=1, W=1)^-6 * (E=1, F=1, W=1)^-4 *
## (E=1, I=1)^-7 * (E=1, I=1, S=1)^-11 * (E=1, I=1, S=1, W=1)^-16 * (E=1,
## I=1, W=1)^-4 * (E=1, S=1)^-9 * (E=1, S=1, W=1)^-7 * (E=1, W=1)^-2 *
## (E=1.1111)^18 * (F=1)^42 * (F=1, I=1)^-4 * (F=1, I=1, S=1)^-12 * (F=1,
## I=1, S=1, W=1)^-18 * (F=1, I=1, W=1)^-5 * (F=1, S=1)^-10 * (F=1, S=1,
## W=1)^-5 * (F=1, W=1)^-5 * (F=1.1111)^16 * (I=1)^45 * (I=1, S=1)^-11 *
## (I=1, S=1, W=1)^-11 * (I=1, W=1)^-10 * (I=1.1111)^6 * (S=1)^47 * (S=1,
## W=1)^-4 * (S=1.1111)^6 * (W=1)^35 * (W=1.1111)^23)
maxp(rugby(1.8),n=1,give=TRUE)
## $par
##       E       F       I       S 
## 0.21677 0.22212 0.15621 0.15881 
## 
## $value
## [1] -285.72
## 
## $counts
## function gradient 
##       35        8 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
## 
## $outer.iterations
## [1] 2
## 
## $barrier.value
## [1] 0.00015929
## 
## $likes
## [1] -285.72
## 
## $evaluate
##       E       F       I       S       W 
## 0.21677 0.22212 0.15621 0.15881 0.24610
maxp(rugby(1.9),n=1,give=TRUE)
## $par
##       E       F       I       S 
## 0.21677 0.22212 0.15621 0.15881 
## 
## $value
## [1] -281.98
## 
## $counts
## function gradient 
##       35        8 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
## 
## $outer.iterations
## [1] 2
## 
## $barrier.value
## [1] 0.00015929
## 
## $likes
## [1] -281.98
## 
## $evaluate
##       E       F       I       S       W 
## 0.21677 0.22212 0.15621 0.15881 0.24610
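The two evaluations above show the support increasing from \(\lambda=1.8\) to \(\lambda=1.9\). Following the profile-likelihood idiom of the earlier sections, one could scan a grid of \(\lambda\) values; a sketch, assuming the rugby() function defined above (not run here; the grid endpoints are illustrative):

```r
o <- function(lambda){maxp(rugby(lambda), give=TRUE, n=1)$likes}  # optimand
lam <- exp(seq(from=log(0.8), to=log(2.5), len=15))  # putative lambda values
L <- sapply(lam, o)
plot(log(lam), L - max(L), type="b")
abline(v=0)   # lambda = 1: no benefit from winning the previous year
abline(h=-2)  # Edwards's two-units-of-support criterion
```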

References

Hankin, R. K. S. 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.