cheering3() probability model for order statistics: a verification with the multiset() function of the partitions packageTo cite the hyper2 package in publications, please use Hankin (2017).
Here we will sum probabilities across a sample space and verify that
the total is constant. In theory the sum of the probabilities of all
members of the sample space is 1, but in practice [and certainly in
hyper2 idiom], the likelihood calculation includes an indeterminate
constant which means that all we can say is that the total probability
is not a function of the parameters of the distribution.
Here I present some relatively informal and unstructured thoughts on
this issue from a numerical and hyper2 perspective.
M <- apply(partitions::multiset(c(1,1,2,2,3)),2,function(x){letters[x]})
colnames(M) <- rep("",ncol(M)) # tidier and more compact appearance
M <- noquote(M)
M
##
## [1,] a a a a a a a a a a a a b b b b b b b b b b b b c c c c c c
## [2,] a a a b b b b b b c c c a a a a a a b b b c c c a a a b b b
## [3,] b b c a a b b c c a b b a a b b c c a a c a a b a b b a a b
## [4,] b c b b c a c a b b a b b c a c a b a c a a b a b a b a b a
## [5,] c b b c b c a b a b b a c b c a b a c a a b a a b b a b a a
Above, we use multiset() to enumerate the sample space of five
competitors comprising two twins denoted a, two twins denoted b,
and a singleton c. This document discusses the issue of whether the
twins are identical twins and hence indistinguishable. Matrix M has
\({5\choose2\,2\,1}=30\) columns, one per possible result. We will
calculate the likelihood of each column and sum them to get the
probability of observing any one of the 30 order statistics [the
observations are mutually exclusive, so
\(\operatorname{Prob}\left(x_1\cup\cdots\cup x_n\right) =
\operatorname{Prob}\left(x_1\right)+\cdots
+\operatorname{Prob}\left(x_n\right)\)]:
tfun <- function(help,strength,M){
out <- 0
for(i in seq_len(ncol(M))){
vec <- M[,i]
out <- out +
loglik(strength,cheering3(v=vec,e=c(a=1,b=2,c=3),help=help),log=FALSE)
}
return(out)
}
print(tfun(help=(1:3)/10, strength= c(a=0.1,b=0.3,c=0.6),M))
## [1] 0.25
print(tfun(help=(7:5)/22, strength= c(a=0.3,b=0.2,c=0.5),M))
## [1] 0.25
Above we see that the sum of the likelihoods is constant. The value
turns out to be \(\frac{1}{1!2!2!}=0.25\). This is because the
likelihood function (being Plackett-Luce) considers the observation to
be one of the \(5!=120\) ways of ordering five distinct competitors.
Matrix M has only \(\frac{5!}{2!2!1!}=30\) columns, each one of which
corresponds to four different observations, all of which have the same
probability. Much of the confusion arises from the practice of
identifying a competitor with his strength; we write “\(p_a\)’’ [or
sometimes just “\(a\)’’] to signify both a particular competitor, and
his Bradley-Terry strength.
To explain the factor of \(1/4\) discussed above we need some new
notation. We have five competitors who I will call \(\mathbf{do}\),
\(\mathbf{re}\), \(\mathbf{me}\), \(\mathbf{fa}\), and \(\mathbf{sol}\). Now
\(p_X\) means “the Bradley Terry strength of \(X\)’’. The setup here is
that \(\mathbf{do}\) and \(\mathbf{re}\) are twins, and that \(\mathbf{me}\)
and \(\mathbf{fa}\) are twins (and \(\mathbf{sol}\) is a singleton). We
have that the Bradley-Terry strength of \(\mathbf{do}\) and
\(\mathbf{re}\) is a, the strength of \(\mathbf{me}\) and \(\mathbf{fa}\)
is b, and the strength of \(\mathbf{sol}\) is c.
Consider, for example, the column of M reading a c b b a. In
theory, this means the competitor who came first had strength a, the
competitor who came second had strength c. So the order statistic
is one of the \(2!2!1!=4\) following orders:
\[\mathbf{do}\succ\mathbf{sol}\succ\mathbf{me}\succ\mathbf{fa}\succ\mathbf{re}\] \[\mathbf{do}\succ\mathbf{sol}\succ\mathbf{fa}\succ\mathbf{me}\succ\mathbf{re}\] \[\mathbf{re}\succ\mathbf{sol}\succ\mathbf{me}\succ\mathbf{fa}\succ\mathbf{do}\] \[\mathbf{re}\succ\mathbf{sol}\succ\mathbf{fa}\succ\mathbf{me}\succ\mathbf{do}\]
Above we see that \(\mathbf{do}\) and \(\mathbf{re}\) come first and last (or last and first), and \(\mathbf{me}\) and \(\mathbf{fa}\) come third and fourth (or fourth and third), while \(\mathbf{sol}\) comes second. All four possibilities have the same probability and all four are admissible order statistics for the five solfeggio syllables.
But the hyper2 package [specifically the loglik() and
cheering3() functions] consider a c b b a to mean something
different. It considers it to be an observation of
\[\mathbf{do}\succ\mathbf{sol}\succ\mathbf{me}\succ\mathbf{fa}\succ\mathbf{re}\]
followed by an identification of \(\mathbf{do}\) with (BT strength) a,
\(\mathbf{sol}\) with strength c, and so on. Any of the four
possibilities above would be equally admissible, and they all have the
same probability. So to get “real’’ probabilities [real in the sense
that summing the probabilities over the columns in matrix M], we
have to multiply by four to include all the admissible order
statistics.
The difference between these two interpretations does not matter from a likelihood perspective as likelihood is defined only up to a multiplicative constant.
rs <- rp(100,dirichlet(c(a=1,b=1,c=1))) # random strengths
colnames(rs) <- letters[1:3]
head(rs)
## a b c
## [1,] 0.333333 0.33333 0.33333
## [2,] 0.333333 0.33333 0.33333
## [3,] 0.249770 0.49286 0.25737
## [4,] 0.095775 0.40000 0.50422
## [5,] 0.169608 0.45758 0.37281
## [6,] 0.245967 0.37768 0.37635
f <- function(s){
jj <- s
names(jj) <- letters[1:3]
tfun(help=1:3,strength=jj,M=M)
}
x <- apply(rs,1,f)
hist(x-mean(x))
Above we see very tight clustering about the mean value of 0.25.
hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.