The weighted hyperdirichlet distribution: the hyper3 class

To cite the hyper2 package in publications, please use Hankin (2017). Objects of class hyper3 are a generalization of hyper2 objects that allow the brackets to contain weighted probabilities. Likelihood functions are defined on non-negative numbers \(p_1,\ldots, p_n\) subject to the unit-sum constraint \(\sum p_i=1\). Given known weights \(w^i_j\) with \(1\leq i\leq j\) we have

\[\mathcal{L}\left(p_1,\ldots, p_n\right)=\prod_j\left(\sum_{i=1}^n w^i_jp_i\right)^{n_j}.\]

As a motivating example, suppose two teams (players) with Bradley-Terry strengths \(p_1,p_2\) play football where we quantify the home-ground advantage with a term \(\lambda\). If \(p_1\) plays at home \(a+b\) times with \(a\) wins and \(b\) losses, and plays away [so \(p_2\) is at home] \(c+d\) times with \(c\) wins and \(d\) losses, then a sensible likelihood function might be

\[\mathcal{L}(p_1,p_2,\lambda;a,b,c,d)= \left(\frac{\lambda p_1}{\lambda p_1 + p_2}\right)^{a} \left(\frac{p_2 }{\lambda p_1 + p_2}\right)^{b} \left(\frac{p_1 }{p_1 + \lambda p_2}\right)^{c} \left(\frac{\lambda p_2}{p_1 + \lambda p_2}\right)^{d}. \]

where we understand that \(p_1+p_2=1\), \(p_1,p_2,\lambda\geqslant 0\). Elementary techniques allow us to identify a maximum and we find

\[ \hat{p_1}=\frac{\sqrt{ac}}{\sqrt{ac}+\sqrt{bd}}\qquad \hat{p_2}=\frac{\sqrt{bd}}{\sqrt{ac}+\sqrt{bd}}\qquad \hat{\lambda}=\sqrt{\frac{ad}{bc}} \]

If we consider \(p_1\) at home and \(p_2\) away, we have \(a+b\) matches. Using the maximum likelihood estimates \(\hat{p_1},\hat{p_2},\hat{\lambda}\), the expected number of wins by \(p_1\) is

\[ (a+b)\frac{\hat{\lambda}\hat{p_1}}{\hat{\lambda}\hat{p_1} + \hat{p_2}} = (a+b)\cdot \frac{ \sqrt{\frac{ad}{bc}}\frac{\sqrt{ac}}{\sqrt{ac}+\sqrt{bd}} }{ \sqrt{\frac{ad}{bc}}\frac{\sqrt{ac}}{\sqrt{ac}+\sqrt{bd}} + \frac{\sqrt{bd}}{\sqrt{ac}+\sqrt{bd}} } =\cdots=a \]

Also the expected number of matches won by \(p_2\) when \(p_1\) is at home can be shown to be \(b\), and similarly when \(p_2\) is at home, the expected number of victories is \(c,d\) respectively. Thus, at least in the two-team case, the expectations under the ML estimates are exactly the observations.

`hyper2` idiom for likelihood functions

Likelihoods of this form are easily specified using package idiom.

If \(a=6,b=2,c=7,d=1\) and \(\lambda=1.3\) [assumed for the moment to be known], appropriate package idiom might be:

library("hyper2",quietly=TRUE)
A <- 6 ; B <- 2 ; C <- 7 ; D <- 1 # upper-case to avoid conflicts
H <- hyper3()
H[c(p1=1.3)]      %<>% inc(A) 
H[c(p2=1)]        %<>% inc(B) 
H[c(p1=1.3,p2=1)] %<>% dec(A+B)
H[c(p1=1)]        %<>% inc(C) 
H[c(p2=1.3)]      %<>% inc(D) 
H[c(p1=1,p2=1.3)] %<>% dec(C+D)
H

## log( (p1=1)^7 * (p1=1, p2=1.3)^-8 * (p1=1.3)^6 * (p1=1.3, p2=1)^-8 *
## (p2=1)^2 * (p2=1.3)^1)

Bespoke function home_away3() achieves the same result with a more convenient input:

M <- matrix(c(NA, 6+2i ,1+7i, NA),2,2,byrow=TRUE)
teams <- c("p1","p2")
dimnames(M) <- list("@home" = teams,"@away"=teams)
dimnames(M) <- list("@home" = teams,"@away"=teams)
print(M)

##      @away
## @home   p1   p2
##    p1   NA 6+2i
##    p2 1+7i   NA

home_away3(M,lambda=1.3)

## log( (p1=1)^7 * (p1=1, p2=1.3)^-8 * (p1=1.3)^6 * (p1=1.3, p2=1)^-8 *
## (p2=1)^2 * (p2=1.3)^1)

Above, object M is a \(2\times 2\) complex matrix, with real parts being home wins and imaginary parts being away wins. See how home_away3(M) returns the same likelihood function as the explicit calculation performed by hand previously. Now, we can calculate maximum likelihood estimates \(\hat{p_1},\hat{p_2}\) of \(p_1\) and \(p_2\) using numerical optimization techniques, maxp() in package idiom:

maxp(H)

##      p1      p2 
## 0.81575 0.18425

Further, we can test whether the players are in fact of equal strength:

equalp.test(H)

## 
##  Constrained support maximization
## 
## data:  H
## null hypothesis: p1 = p2
## null estimate:
##  p1  p2 
## 0.5 0.5 
## (argmax, constrained optimization)
## Support for null:  -11.49 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##      p1      p2 
## 0.81575 0.18425 
## (argmax, free optimization)
## Support for alternative:  -8.067 + K
## 
## degrees of freedom: 1
## support difference = 3.423
## p-value: 0.0088838

Showing convincingly that we may reject the null that \(p_1=p_2\) and assert that the players do in fact differ in strength.

Another motivating example might come from Plackett-Luce likelihood functions for order statistics. Suppose we have three players with nonnegative strengths \(p_\mathrm{M},p_\mathrm{RB},p_\mathrm{F}\), with unit sum. These players might be conceptualised as F1 constructors (here Mercedes (M), Red Bull (RB), and Ferrari (F) for the sake of argument). Each constructor fields two cars and the order statistic might be

\[ RB\succ M\succ F\succ F\succ RB\succ M\]

indicating that the finishing order was Red Bull first, Mercedes second, Ferrari third, and so on (this was in fact observed at the 2021 Emilia Romagna Grand Prix). The Placket-Luce likelihood function would be

\[ \frac{p_\mathrm{RB}}{p_\mathrm{RB}+p_\mathrm{M}+p_\mathrm{F}+p_\mathrm{F}+p_\mathrm{RB}+p_\mathrm{M}}\cdot \frac{p_\mathrm{M}}{ p_\mathrm{M}+p_\mathrm{F}+p_\mathrm{F}+p_\mathrm{RB}+p_\mathrm{M}}\cdot \frac{p_\mathrm{F}}{ p_\mathrm{F}+p_\mathrm{F}+p_\mathrm{RB}+p_\mathrm{M}}\cdot \frac{p_\mathrm{F}}{ p_\mathrm{F}+p_\mathrm{RB}+p_\mathrm{M}}\cdot \frac{p_\mathrm{RB}}{ p_\mathrm{RB}+p_\mathrm{M}}\cdot \frac{p_\mathrm{M}}{ p_\mathrm{M}} \]

or, in a form more suitable for the package,

\[ \frac{p_\mathrm{RB}}{2p_\mathrm{RB}+ 2p_\mathrm{M}+ 2p_\mathrm{F}}\cdot \frac{p_\mathrm{M}}{ p_\mathrm{RB}+ 2p_\mathrm{M}+ 2p_\mathrm{F}}\cdot \frac{p_\mathrm{F}}{ p_\mathrm{RB}+ p_\mathrm{M}+ 2p_\mathrm{F}}\cdot \frac{p_\mathrm{F}}{ p_\mathrm{RB}+ p_\mathrm{M}+ p_\mathrm{F}}\cdot \frac{p_\mathrm{RB}}{ p_\mathrm{RB}+ p_\mathrm{M}}\]

We could proceed as follows:

a <- hyper3()
a[c(R=1)] %<>% inc
a[c(R=2,M=2,F=2)] %<>% dec
a[c(M=1)] %<>% inc
a[c(R=1,M=2,F=2)] %<>% dec
a[c(F=1)] %<>% inc
a[c(R=1,M=1,F=2)] %<>% dec
a[c(F=1)] %<>% inc
a[c(R=1,M=1,F=1)] %<>% dec
a[c(R=1)] %<>% inc
a[c(R=1,M=1    )] %<>% dec
a

## log( (F=1)^2 * (F=1, M=1, R=1)^-1 * (F=2, M=1, R=1)^-1 * (F=2, M=2,
## R=1)^-1 * (F=2, M=2, R=2)^-1 * (M=1)^1 * (M=1, R=1)^-1 * (R=1)^2)

Or maybe more logically

b <- hyper3()
b["R"] %<>% inc ; b[c("R","M","F","F","R","M")] %<>% dec
b["M"] %<>% inc ; b[c(    "M","F","F","R","M")] %<>% dec
b["F"] %<>% inc ; b[c(        "F","F","R","M")] %<>% dec
b["F"] %<>% inc ; b[c(            "F","R","M")] %<>% dec
b["R"] %<>% inc ; b[c(                "R","M")] %<>% dec

a==b

## [1] TRUE

However, bespoke function ordervec2supp3() performs all this in one step:

ordervec2supp3(c("RB","M","F","F","RB","M"))

## log( (F=1)^2 * (F=1, M=1, RB=1)^-1 * (F=2, M=1, RB=1)^-1 * (F=2, M=2,
## RB=1)^-1 * (F=2, M=2, RB=2)^-1 * (M=1)^1 * (M=1, RB=1)^-1 * (RB=1)^2)

And then

ma <- maxp(a)
ma

##       F       M       R 
## 0.43073 0.17538 0.39390

Evidence for different strengths?

equalp.test(a)

## 
##  Constrained support maximization
## 
## data:  a
## null hypothesis: F = M = R
## null estimate:
##       F       M       R 
## 0.33333 0.33333 0.33333 
## (argmax, constrained optimization)
## Support for null:  -6.5793 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##       F       M       R 
## 0.43073 0.17538 0.39390 
## (argmax, free optimization)
## Support for alternative:  -6.2505 + K
## 
## degrees of freedom: 2
## support difference = 0.32879
## p-value: 0.71979

No evidence agains the null of equal strengths. Further:

hyper2::gradient(a,ma)  # should be small at the evaluate

## [1] 0.00087972 0.00191719

References

Hankin, R. K. S. 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.

The weighted hyperdirichlet distribution: the `hyper3` class

R. K. S. Hankin

`hyper2` idiom for likelihood functions

References

The weighted hyperdirichlet distribution: the hyper3 class

R. K. S. Hankin

hyper2 idiom for likelihood functions

References

The weighted hyperdirichlet distribution: the `hyper3` class

`hyper2` idiom for likelihood functions