To cite the hyper2 package in publications, please use Hankin (2017). Here we analyse results from the men’s javelin, 2020 Summer Olympics.

javelin_table <- as.attemptstable(read.table("javelin.txt", header=TRUE))
a <- javelin_table # saves typing
options("bold_personal_best" = FALSE)
javelin_table
## An attemptstable:
##           throw1 throw2 throw3 throw4 throw5 throw6
## Chopra     87.03  87.58  76.79      X      X  84.24
## Vadlejch   83.98      X      X  82.86  86.67      X
## Vesely     79.73  80.30  85.44      X  84.98      X
## Weber      85.30  77.90  78.00  83.10  85.15  75.72
## Nadeem     82.40      X  84.62  82.91  81.98      X
## Katkavets  82.49  81.03  83.71  79.24      X      X
## Mardare    81.16  81.73  82.84  81.90  83.30  81.09
## Etelatalo  78.43  76.59  83.28  79.20  79.99  83.05

Above, we see that Chopra threw 87.03 on his first throw, 87.58 on his second, and so on (all measurements are in meters). Vadlejch threw 83.98 on his first throw, etc. Print option bold_personal_best controls whether the personal (row-wise) best attempt is printed in bold. This does not work in rmarkdown documents but does work on the terminal and Rstudio console. We may convert javelin_table to a named vector with elements being the throw distances, and names being the competitors:

javelin_vector <- as.vector(javelin_table)
javelin_vector
##    Chopra  Vadlejch    Vesely     Weber    Nadeem Katkavets   Mardare Etelatalo 
##     87.03     83.98     79.73     85.30     82.40     82.49     81.16     78.43 
##    Chopra  Vadlejch    Vesely     Weber    Nadeem Katkavets   Mardare Etelatalo 
##     87.58        NA     80.30     77.90        NA     81.03     81.73     76.59 
##    Chopra  Vadlejch    Vesely     Weber    Nadeem Katkavets   Mardare Etelatalo 
##     76.79        NA     85.44     78.00     84.62     83.71     82.84     83.28 
##    Chopra  Vadlejch    Vesely     Weber    Nadeem Katkavets   Mardare Etelatalo 
##        NA     82.86        NA     83.10     82.91     79.24     81.90     79.20 
##    Chopra  Vadlejch    Vesely     Weber    Nadeem Katkavets   Mardare Etelatalo 
##        NA     86.67     84.98     85.15     81.98        NA     83.30     79.99 
##    Chopra  Vadlejch    Vesely     Weber    Nadeem Katkavets   Mardare Etelatalo 
##     84.24        NA        NA     75.72        NA        NA     81.09     83.05
o <- javelin_vector # saves typing

Above we see that Chopra threw the longest and second-longest throws of 87.58 and 87.03 respectively; Vadlejch threw the third-longest throw of 86.67, and so on. NA entries correspond to no-throws. It is easy to plot the data in visual form:

n <- length(levels(as.factor(names(o))))
plot(o, pch=16, col=rainbow(n)[as.factor(names(o))])
legend("topright", pch=16, col=rainbow(n), legend=levels(as.factor(names(o))))

Now convert the attempts table to a hyper3 object, using function . However, note that there are two reasonable interpretations: firstly, we simply ignore no-throws [in the package docs, these are sometimes referred to as “DNF”, or “Did not finish”, following Formula 1 terminology] and to do this we pass nothrow_loses=TRUE:

javelin <- suppfun(javelin_table, nothrow_loses=TRUE)
javelin_maxp  <- maxp(javelin)
javelin_maxp
##     Chopra  Etelatalo  Katkavets    Mardare     Nadeem   Vadlejch     Vesely 
## 0.10571282 0.14550689 0.08912494 0.22323096 0.10492853 0.06967077 0.09806042 
##      Weber 
## 0.16376468

Secondly, we may treat no-throws as being inferior to any actual measured throw. I don’t think this is realistic: no-one cares about no-throws and certainly any model in which a thrower with five no-throws and one excellent throw is given a small strength is defective. However, there are some circumstances in which this might be reasonable [Formula 1 being one example] and this is implemented by specifying nothrow_loses=TRUE:

javelin2 <- suppfun(javelin_table, nothrow_loses=TRUE)
javelin2_maxp <- maxp(javelin2)

Investigation:

rbind(javelin_maxp, javelin2_maxp)
##                  Chopra Etelatalo  Katkavets  Mardare    Nadeem   Vadlejch
## javelin_maxp  0.1057128 0.1455069 0.08912494 0.223231 0.1049285 0.06967077
## javelin2_maxp 0.1057128 0.1455069 0.08912494 0.223231 0.1049285 0.06967077
##                   Vesely     Weber
## javelin_maxp  0.09806042 0.1637647
## javelin2_maxp 0.09806042 0.1637647
pie(javelin_maxp)

pie(javelin2_maxp)
dotchart(javelin_maxp, pch=16)

dotchart(javelin2_maxp, pch=16)
ordertransplot(javelin_maxp, javelin2_maxp)

equalp.test(javelin)
## 
##  Constrained support maximization
## 
## data:  javelin
## null hypothesis: Chopra = Etelatalo = Katkavets = Mardare = Nadeem = Vadlejch = Vesely = Weber
## null estimate:
##    Chopra Etelatalo Katkavets   Mardare    Nadeem  Vadlejch    Vesely     Weber 
##     0.125     0.125     0.125     0.125     0.125     0.125     0.125     0.125 
## (argmax, constrained optimization)
## Support for null:  -123.1716 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##     Chopra  Etelatalo  Katkavets    Mardare     Nadeem   Vadlejch     Vesely 
## 0.10571282 0.14550689 0.08912494 0.22323096 0.10492853 0.06967077 0.09806042 
##      Weber 
## 0.16376468 
## (argmax, free optimization)
## Support for alternative:  -121.0856 + K
## 
## degrees of freedom: 7
## support difference = 2.086065
## p-value: 0.7597517
equalp.test(javelin2)
## 
##  Constrained support maximization
## 
## data:  javelin2
## null hypothesis: Chopra = Etelatalo = Katkavets = Mardare = Nadeem = Vadlejch = Vesely = Weber
## null estimate:
##    Chopra Etelatalo Katkavets   Mardare    Nadeem  Vadlejch    Vesely     Weber 
##     0.125     0.125     0.125     0.125     0.125     0.125     0.125     0.125 
## (argmax, constrained optimization)
## Support for null:  -123.1716 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##     Chopra  Etelatalo  Katkavets    Mardare     Nadeem   Vadlejch     Vesely 
## 0.10571282 0.14550689 0.08912494 0.22323096 0.10492853 0.06967077 0.09806042 
##      Weber 
## 0.16376468 
## (argmax, free optimization)
## Support for alternative:  -121.0856 + K
## 
## degrees of freedom: 7
## support difference = 2.086065
## p-value: 0.7597517

1 Discussion: maximum throw vs Bradley-Terry philosophy

The competition is won by the player who has the longest throw: the sufficient statistic would be the maximum of the six attempts. Bradley-Terry cannot handle this if different players have different variance: short throws are disregarded.

As an example, suppose two throwers have normally distributed distances; thrower A has a smaller mean distance than thrower B but a larger variance. Numerically:

set.seed(0)
thrower_A <- round(rnorm(6, 80.2, 2.6), 2)
thrower_B <- round(rnorm(6, 80.3, 1.3), 2)
D <- as.attemptstable(as.data.frame(rbind(thrower_A, thrower_B)))
D
## An attemptstable:
##              V1    V2    V3    V4    V5    V6
## thrower_A 83.48 79.35 83.66 83.51 81.28 76.20
## thrower_B 79.09 79.92 80.29 83.43 81.29 79.26
apply(D, 1, max)
## thrower_A thrower_B 
##     83.66     83.43

Above we see that thrower A wins, despite having a smaller mean distance.

2 Some thoughts on quantifying Chopra’s dominance

Chopra won the competition by virtue of his longest throw of 87.58 being longer than anyone else’s throw. Observe that his Bradley-Terry strength is small:

(M <- maxp(javelin))
##     Chopra  Etelatalo  Katkavets    Mardare     Nadeem   Vadlejch     Vesely 
## 0.10571282 0.14550689 0.08912494 0.22323096 0.10492853 0.06967077 0.09806042 
##      Weber 
## 0.16376468

By that criterion, Mardare wins, but not significantly:

specificp.test(javelin2, "Mardare", 1/8)
## 
##  Constrained support maximization
## 
## data:  H
## null hypothesis: sum p_i=1, Mardare = 0.125
## null estimate:
##     Chopra  Etelatalo  Katkavets    Mardare     Nadeem   Vadlejch     Vesely 
## 0.12281812 0.16225363 0.10161935 0.12500000 0.11871655 0.08132233 0.11190384 
##      Weber 
## 0.17636618 
## (argmax, constrained optimization)
## Support for null:  -122.0673 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##     Chopra  Etelatalo  Katkavets    Mardare     Nadeem   Vadlejch     Vesely 
## 0.10571282 0.14550689 0.08912494 0.22323096 0.10492853 0.06967077 0.09806042 
##      Weber 
## 0.16376468 
## (argmax, free optimization)
## Support for alternative:  -121.0856 + K
## 
## degrees of freedom: 1
## support difference = 0.9817391
## p-value: 0.1611418 (two sided)

It seems that Mardare’s consistent performance gives him BT strength, but not a long longest throw. How to modify BT to account for this?

2.1 Some thoughts on argument nothrow_loses

Converting an attemptstable to a likelihood function with suppfun() takes Boolean argument nothrow_loses, via function attemptstable2supp().

a <- rattemptstable(3,6,0)
a[] <- c(10,9,8,rep(c(NA,8,9),4),10,9,8)
a
## An attemptstable:
##   throw1 throw2 throw3 throw4 throw5 throw6
## a     10      X      X      X      X     10
## b      9      8      8      8      8      9
## c      8      9      9      9      9      8

Above, we see a simple attemptstable object. Competitor a has many no-throws but his two actual throws are the best in the field. We can generate a likelihood function in two different ways, and then maximize the likelihood. First we set nothrow_loses=TRUE, so no-throws count as losing to any real throw. If this is the case, competitor a wins twice and loses four times:

maxp(suppfun(a, nothrow_loses=TRUE))
##          a          b          c 
## 0.08978683 0.39085552 0.51935765

Above, we see that a has the low estimated strength of about 0.09, reasonable on the grounds that he lost four times out of six.

maxp(suppfun(a, nothrow_loses=FALSE))
##            a            b            c 
## 9.999902e-01 4.182447e-06 5.585664e-06

Above, we see the effect of setting nothrow_loses=FALSE. Now a has the higher estimated strength of 0.99+, reasonable on the grounds that every time he competed (in an informative trial), he won.

2.2 Some thoughts on quantifying the longest throw

In general, consider a random variable \(X\) with PDF \(f(x)\) and CDF \(F(x)\). Write \(x_{(n)} = \max(X_1,\ldots,X_n)\) for the maximum of \(n\) observations. In the case where the \(X_i\) are independent and distributed the same as \(X\), we have that the CDF of \(x_{(n)}\) is \(F(x)^n\) and the PDF is \(f(x)F(x)^{n-1}\).

Consider, similarly, random variable \(y\) with PDF \(g(y)\) and CDF \(G(y)\), with the maximum \(y_{(m)}\) of \(m\) independent observations having CDF \(G(y)^m\). We ask some simple questions. Firstly, what is the probability \(P\) that \(x_{(n)}\) exceeds \(y_{(m)}\)? If \(X,Y\) are independent we have

\[\begin{eqnarray} P &=& \int_{x=-\infty}^{x=+\infty} \int_{y=-\infty}^{y=x} [F(x)^n]'[G(y)^m]'dy\,dx\\ &=& \int_{x=-\infty}^{x=+\infty} [F(x)^n]' \left[ \int_{y=-\infty}^{y=x} [G(y)^m]'\,dy\right]\,dx\\ &=& \int_{x=-\infty}^{x=+\infty} [F(x)^n]'G(x)^m\,dx\\ \end{eqnarray}\]

In the special case of \(F(x)=G(x)\) we have

\[\begin{eqnarray} P&=& \int_{x=-\infty}^{x=+\infty} [F(x)^n]'F(x)^m\,dx\\&=& \int_{x=-\infty}^{x=+\infty} nf(x)F(x)^{m+n}\,dx\\&=& \frac{n}{n+m}\int_{x=-\infty}^{x=+\infty} [F(x)^{n+m}]'\,dx\\&=& \frac{n}{n+m} \end{eqnarray}\]

(Above, the “trick” is to realise that if \(F(x)=G(x)\), then the integrand \(n[F(x)^n]'F(x)^m\) is easily integrated over the whole real line, being equal to \(n\frac{[F(x)^{n+m}]'}{n+m}\), giving \(\frac{n}{n+m}\). There does not seem to be any straightforward way to generalize this trick.

Package dataset

Following lines create javelin.rda, residing in the data/ directory of the package.

save(javelin, javelin2, javelin_table, javelin_vector, javelin_maxp, javelin2_maxp, file="javelin.rda")

References

https://en.wikipedia.org/w/index.php?title=Athletics_at_the_2020_Summer_Olympics_%E2%80%93_Men%27s_javelin_throw&oldid=1051014985

Hankin, R. K. S. 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.