To cite the hyper2 package in publications, please use Hankin (2017). Here we analyse results from the men’s javelin, 2020 Summer Olympics.

javelin_table <- as.attemptstable(read.table("javelin.txt", header=TRUE))
a <- javelin_table # saves typing
options("bold_personal_best" = FALSE)
javelin_table
## An attemptstable:
##           throw1 throw2 throw3 throw4 throw5 throw6
## Chopra     87.03  87.58  76.79      X      X  84.24
## Vadlejch   83.98      X      X  82.86  86.67      X
## Vesely     79.73  80.30  85.44      X  84.98      X
## Weber      85.30  77.90  78.00  83.10  85.15  75.72
## Nadeem     82.40      X  84.62  82.91  81.98      X
## Katkavets  82.49  81.03  83.71  79.24      X      X
## Mardare    81.16  81.73  82.84  81.90  83.30  81.09
## Etelatalo  78.43  76.59  83.28  79.20  79.99  83.05

Above, we see that Chopra threw 87.03 on his first throw, 87.58 on his second, and so on (all measurements are in meters). Vadlejch threw 83.98 on his first throw, etc. Print option bold_personal_best controls whether the personal (row-wise) best attempt is printed in bold. This does not work in rmarkdown documents but does work on the terminal and Rstudio console. We may convert javelin_table to a named vector with elements being the throw distances, and names being the competitors:

javelin_vector <- as.vector(javelin_table)
javelin_vector
##    Chopra    Chopra  Vadlejch    Vesely     Weber     Weber    Vesely    Nadeem 
##     87.58     87.03     86.67     85.44     85.30     85.15     84.98     84.62 
##    Chopra  Vadlejch Katkavets   Mardare Etelatalo     Weber Etelatalo    Nadeem 
##     84.24     83.98     83.71     83.30     83.28     83.10     83.05     82.91 
##  Vadlejch   Mardare Katkavets    Nadeem    Nadeem   Mardare   Mardare   Mardare 
##     82.86     82.84     82.49     82.40     81.98     81.90     81.73     81.16 
##   Mardare Katkavets    Vesely Etelatalo    Vesely Katkavets Etelatalo Etelatalo 
##     81.09     81.03     80.30     79.99     79.73     79.24     79.20     78.43 
##     Weber     Weber    Chopra Etelatalo     Weber  Vadlejch    Nadeem  Vadlejch 
##     78.00     77.90     76.79     76.59     75.72        NA        NA        NA 
##    Chopra    Vesely    Chopra Katkavets  Vadlejch    Vesely    Nadeem Katkavets 
##        NA        NA        NA        NA        NA        NA        NA        NA
o <- javelin_vector # saves typing

Above we see that Chopra threw the longest and second-longest throws of 87.58 and 87.03 respectively; Vadlejch threw the third-longest throw of 86.67, and so on. NA entries correspond to no-throws. It is easy to plot the data in visual form:

n <- length(levels(as.factor(names(o))))
plot(o, pch=16, col=rainbow(n)[as.factor(names(o))])
legend("topright", pch=16, col=rainbow(n), legend=levels(as.factor(names(o))))

Now convert the attempts table to a hyper3 object, using function . However, note that there are two reasonable interpretations: firstly, we simply ignore no-throws [in the package docs, these are sometimes referred to as “DNF”, or “Did not finish”, following Formula 1 terminology] and to do this we pass dnf.last=FALSE:

javelin <- suppfun(javelin_table, dnf.last=FALSE)
javelin_maxp  <- maxp(javelin)
javelin_maxp
##     Chopra  Etelatalo  Katkavets    Mardare     Nadeem   Vadlejch     Vesely 
## 0.09304820 0.04818052 0.09285335 0.11734164 0.17303031 0.32062833 0.11402735 
##      Weber 
## 0.04089030

Secondly, we may treat no-throws as being inferior to any actual measured throw. I don’t think this is realistic: noone cares about no-throws and certainly any model in which a thrower with five no-throws and one excellent throw is given a small strength is defective. However, there are some circumstances in which this might be reasonable [Formula 1 being one example] and this is implemented by specifying dnf.last=TRUE:

javelin2 <- suppfun(javelin_table, dnf.last=TRUE)
javelin2_maxp <- maxp(javelin2)

Investigation:

rbind(javelin_maxp, javelin2_maxp)
##                  Chopra  Etelatalo  Katkavets   Mardare    Nadeem   Vadlejch
## javelin_maxp  0.0930482 0.04818052 0.09285335 0.1173416 0.1730303 0.32062833
## javelin2_maxp 0.1057128 0.14550689 0.08912494 0.2232310 0.1049285 0.06967077
##                   Vesely     Weber
## javelin_maxp  0.11402735 0.0408903
## javelin2_maxp 0.09806042 0.1637647
pie(javelin_maxp)

pie(javelin2_maxp)

dotchart(javelin_maxp, pch=16)

dotchart(javelin2_maxp, pch=16)

ordertransplot(javelin_maxp, javelin2_maxp)

equalp.test(javelin)
## 
##  Constrained support maximization
## 
## data:  javelin
## null hypothesis: Chopra = Etelatalo = Katkavets = Mardare = Nadeem = Vadlejch = Vesely = Weber
## null estimate:
##    Chopra Etelatalo Katkavets   Mardare    Nadeem  Vadlejch    Vesely     Weber 
##     0.125     0.125     0.125     0.125     0.125     0.125     0.125     0.125 
## (argmax, constrained optimization)
## Support for null:  -99.33061 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##     Chopra  Etelatalo  Katkavets    Mardare     Nadeem   Vadlejch     Vesely 
## 0.09304820 0.04818052 0.09285335 0.11734164 0.17303031 0.32062833 0.11402735 
##      Weber 
## 0.04089030 
## (argmax, free optimization)
## Support for alternative:  -94.89693 + K
## 
## degrees of freedom: 7
## support difference = 4.433682
## p-value: 0.2623196
equalp.test(javelin2)
## 
##  Constrained support maximization
## 
## data:  javelin2
## null hypothesis: Chopra = Etelatalo = Katkavets = Mardare = Nadeem = Vadlejch = Vesely = Weber
## null estimate:
##    Chopra Etelatalo Katkavets   Mardare    Nadeem  Vadlejch    Vesely     Weber 
##     0.125     0.125     0.125     0.125     0.125     0.125     0.125     0.125 
## (argmax, constrained optimization)
## Support for null:  -123.1716 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##     Chopra  Etelatalo  Katkavets    Mardare     Nadeem   Vadlejch     Vesely 
## 0.10571282 0.14550689 0.08912494 0.22323096 0.10492853 0.06967077 0.09806042 
##      Weber 
## 0.16376468 
## (argmax, free optimization)
## Support for alternative:  -121.0856 + K
## 
## degrees of freedom: 7
## support difference = 2.086065
## p-value: 0.7597517

1 Discussion: maximum throw vs Bradley-Terry philosophy

The competition is won by the player who has the longest throw: the sufficient statistic would be the maximum of the six attempts. Bradley-Terry cannot handle this if different players have different variance: short throws are disregarded.

As an example, suppose two throwers have normally distributed distances; thrower A has a smaller mean distance than thrower B but a larger variance. Numerically:

set.seed(0)
thrower_A <- round(rnorm(6, 80.2, 2.6), 2)
thrower_B <- round(rnorm(6, 80.3, 1.3), 2)
D <- as.attemptstable(as.data.frame(rbind(thrower_A, thrower_B)))
D
## An attemptstable:
##              V1    V2    V3    V4    V5    V6
## thrower_A 83.48 79.35 83.66 83.51 81.28 76.20
## thrower_B 79.09 79.92 80.29 83.43 81.29 79.26
apply(D, 1, max)
## thrower_A thrower_B 
##     83.66     83.43

Above we see that thrower A wins, despite having a smaller mean distance.

2 Some thoughts on quantifying Chopra’s dominance

Chopra won the competition by virtue of his longest throw of 87.58 being longer than anyone else’s throw. Observe that his Bradley-Terry strength is small:

(M <- maxp(javelin))
##     Chopra  Etelatalo  Katkavets    Mardare     Nadeem   Vadlejch     Vesely 
## 0.09304820 0.04818052 0.09285335 0.11734164 0.17303031 0.32062833 0.11402735 
##      Weber 
## 0.04089030

By that criterion, Mardare wins, but not significantly:

specificp.test(javelin2, "Mardare", 1/8)
## 
##  Constrained support maximization
## 
## data:  H
## null hypothesis: sum p_i=1, Mardare = 0.125
## null estimate:
##     Chopra  Etelatalo  Katkavets    Mardare     Nadeem   Vadlejch     Vesely 
## 0.12281812 0.16225363 0.10161935 0.12500000 0.11871655 0.08132233 0.11190384 
##      Weber 
## 0.17636618 
## (argmax, constrained optimization)
## Support for null:  -122.0673 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##     Chopra  Etelatalo  Katkavets    Mardare     Nadeem   Vadlejch     Vesely 
## 0.10571282 0.14550689 0.08912494 0.22323096 0.10492853 0.06967077 0.09806042 
##      Weber 
## 0.16376468 
## (argmax, free optimization)
## Support for alternative:  -121.0856 + K
## 
## degrees of freedom: 1
## support difference = 0.9817391
## p-value: 0.1611418 (two sided)

It seems that Mardare’s consistent performance gives him BT strength, but not a long longest throw. How to modify BT to account for this?

2.1 Some thoughts on quantifying the longest throw

In general, consider a random variable \(X\) with PDF \(f(x)\) and CDF \(F(x)\). Write \(x_{(n)} = \max(X_1,\ldots,X_n)\) for the maximum of \(n\) observations. In the case where the \(X_i\) are independent and distributed the same as \(X\), we have that the CDF of \(x_{(n)}\) is \(F(x)^n\) and the PDF is \(f(x)F(x)^{n-1}\).

Consider, similarly, random variable \(y\) with PDF \(g(y)\) and CDF \(G(y)\), with the maximum \(y_{(m)}\) of \(m\) independent observations having CDF \(G(y)^m\). We ask some simple questions. Firstly, what is the probability \(P\) that \(x_{(n)}\) exceeds \(y_{(m)}\)? If \(X,Y\) are independent we have

\[\begin{eqnarray} P &=& \int_{x=-\infty}^{x=+\infty} \int_{y=-\infty}^{y=x} [F(x)^n]'[G(y)^m]'dy\,dx\\ &=& \int_{x=-\infty}^{x=+\infty} [F(x)^n]' \left[ \int_{y=-\infty}^{y=x} [G(y)^m]'\,dy\right]\,dx\\ &=& \int_{x=-\infty}^{x=+\infty} [F(x)^n]'G(x)^m\,dx\\ \end{eqnarray}\]

In the special case of \(F(x)=G(x)\) we have

\[\begin{eqnarray} P&=& \int_{x=-\infty}^{x=+\infty} [F(x)^n]'F(x)^m\,dx\\&=& \int_{x=-\infty}^{x=+\infty} nf(x)F(x)^{m+n}\,dx\\&=& \frac{n}{n+m}\int_{x=-\infty}^{x=+\infty} [F(x)^{n+m}]'\,dx\\&=& \frac{n}{n+m} \end{eqnarray}\]

(Above, the “trick” is to realise that if \(F(x)=G(x)\), then the integrand \(n[F(x)^n]'F(x)^m\) is easily integrated over the whole real line, being equal to \(n\frac{[F(x)^{n+m}]'}{n+m}\), giving \(\frac{n}{n+m}\). There does not seem to be any straightforward way to generalize this trick.

Package dataset

Following lines create javelin.rda, residing in the data/ directory of the package.

save(javelin, javelin2, javelin_table, javelin_vector, javelin_maxp, javelin2_maxp, file="javelin.rda")

References

https://en.wikipedia.org/w/index.php?title=Athletics_at_the_2020_Summer_Olympics_%E2%80%93_Men%27s_javelin_throw&oldid=1051014985

Hankin, R. K. S. 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.