To cite the hyper2 package in publications, please use Hankin (2017). Here we will sum probabilities across a sample space and verify that the total is constant. In theory the sum of the probabilities of all members of the sample space is 1, but in practice [and certainly in hyper2 idiom], the likelihood calculation includes an indeterminate constant which means that all we can say is that the total probability is not a function of the parameters of the distribution.

Here I present some relatively informal and unstructured thoughts on this issue from a numerical and hyper2 perspective.

M <- apply(partitions::multiset(c(1,1,2,2,3)),2,function(x){letters[x]})
colnames(M) <- rep("",ncol(M))   # tidier and more compact appearance
M <- noquote(M)
M
##                                                                 
## [1,] a a a a a a a a a a a a b b b b b b b b b b b b c c c c c c
## [2,] a a a b b b b b b c c c a a a a a a b b b c c c a a a b b b
## [3,] b b c a a b b c c a b b a a b b c c a a c a a b a b b a a b
## [4,] b c b b c a c a b b a b b c a c a b a c a a b a b a b a b a
## [5,] c b b c b c a b a b b a c b c a b a c a a b a a b b a b a a

Above, we use multiset() to enumerate the sample space of five competitors comprising two twins denoted a, two twins denoted b, and a singleton c. This document discusses the issue of whether the twins are identical twins and hence indistinguishable. Matrix M has \({5\choose2\,2\,1}=30\) columns, one per possible result. We will calculate the likelihood of each column and sum them to get the probability of observing any one of the 30 order statistics [the observations are mutually exclusive, so \(\operatorname{Prob}\left(x_1\cup\cdots\cup x_n\right) = \operatorname{Prob}\left(x_1\right)+\cdots +\operatorname{Prob}\left(x_n\right)\)]:

tfun <- function(help,strength,M){
  out <- 0
  for(i in seq_len(ncol(M))){
     vec <- M[,i]
     out <- out +
       loglik(strength,cheering3(v=vec,e=c(a=1,b=2,c=3),help=help),log=FALSE)
  }
  return(out)
}
print(tfun(help=(1:3)/10, strength= c(a=0.1,b=0.3,c=0.6),M))
## [1] 0.25
print(tfun(help=(7:5)/22, strength= c(a=0.3,b=0.2,c=0.5),M))
## [1] 0.25

Above we see that the sum of the likelihoods is constant. The value turns out to be \(\frac{1}{1!2!2!}=0.25\). This is because the likelihood function (being Plackett-Luce) considers the observation to be one of the \(5!=120\) ways of ordering five distinct competitors. Matrix M has only \(\frac{5!}{2!2!1!}=30\) columns, each one of which corresponds to four different observations, all of which have the same probability. Much of the confusion arises from the practice of identifying a competitor with his strength; we write “\(p_a\)’’ [or sometimes just “\(a\)’’] to signify both a particular competitor, and his Bradley-Terry strength.

To explain the factor of \(1/4\) discussed above we need some new notation. We have five competitors who I will call \(\mathbf{do}\), \(\mathbf{re}\), \(\mathbf{me}\), \(\mathbf{fa}\), and \(\mathbf{sol}\). Now \(p_X\) means “the Bradley Terry strength of \(X\)’’. The setup here is that \(\mathbf{do}\) and \(\mathbf{re}\) are twins, and that \(\mathbf{me}\) and \(\mathbf{fa}\) are twins (and \(\mathbf{sol}\) is a singleton). We have that the Bradley-Terry strength of \(\mathbf{do}\) and \(\mathbf{re}\) is a, the strength of \(\mathbf{me}\) and \(\mathbf{fa}\) is b, and the strength of \(\mathbf{sol}\) is c.

Consider, for example, the column of M reading a c b b a. In theory, this means the competitor who came first had strength a, the competitor who came second had strength c. So the order statistic is one of the \(2!2!1!=4\) following orders:

\[\mathbf{do}\succ\mathbf{sol}\succ\mathbf{me}\succ\mathbf{fa}\succ\mathbf{re}\] \[\mathbf{do}\succ\mathbf{sol}\succ\mathbf{fa}\succ\mathbf{me}\succ\mathbf{re}\] \[\mathbf{re}\succ\mathbf{sol}\succ\mathbf{me}\succ\mathbf{fa}\succ\mathbf{do}\] \[\mathbf{re}\succ\mathbf{sol}\succ\mathbf{fa}\succ\mathbf{me}\succ\mathbf{do}\]

Above we see that \(\mathbf{do}\) and \(\mathbf{re}\) come first and last (or last and first), and \(\mathbf{me}\) and \(\mathbf{fa}\) come third and fourth (or fourth and third), while \(\mathbf{sol}\) comes second. All four possibilities have the same probability and all four are admissible order statistics for the five solfeggio syllables.

But the hyper2 package [specifically the loglik() and cheering3() functions] consider a c b b a to mean something different. It considers it to be an observation of

\[\mathbf{do}\succ\mathbf{sol}\succ\mathbf{me}\succ\mathbf{fa}\succ\mathbf{re}\]

followed by an identification of \(\mathbf{do}\) with (BT strength) a, \(\mathbf{sol}\) with strength c, and so on. Any of the four possibilities above would be equally admissible, and they all have the same probability. So to get “real’’ probabilities [real in the sense that summing the probabilities over the columns in matrix M], we have to multiply by four to include all the admissible order statistics.

The difference between these two interpretations does not matter from a likelihood perspective as likelihood is defined only up to a multiplicative constant.

0.1 A more exacting numerical test

rs <- rp(100,dirichlet(c(a=1,b=1,c=1))) # random strengths
colnames(rs) <- letters[1:3]
head(rs)
##             a       b       c
## [1,] 0.333333 0.33333 0.33333
## [2,] 0.333333 0.33333 0.33333
## [3,] 0.249770 0.49286 0.25737
## [4,] 0.095775 0.40000 0.50422
## [5,] 0.169608 0.45758 0.37281
## [6,] 0.245967 0.37768 0.37635
f <- function(s){
  jj <- s
  names(jj) <- letters[1:3]
  tfun(help=1:3,strength=jj,M=M)
}
x <- apply(rs,1,f)
hist(x-mean(x))

Above we see very tight clustering about the mean value of 0.25.

References

Hankin, R. K. S. 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.