To cite the hyper2 package in publications, please use Hankin (2017). This short document analyses the rowing dataset: sculls in the 2016 olympics. It is not really standalone and is here as part of a consistent dataset generation mechanism used in the hyper2 package. See the hyper2 vignette for a more formal discussion. In Olympic sculling, up to six individual competitors race a small boat called a scull over a course of length \(2\,\mathrm{km}\); the object is to cross the finishing line first. Note that actual timings are irrelevant, given the model, as the sufficient statistic is the order in which competitors cross the finishing line. The 2016 Summer Olympics is a case in point: the gold and silver medallists finished less than \(\sim 5\) milliseconds apart, corresponding to a lead of \(\sim 2.5\,\mathrm{cm}\). The package dataset is file rowing.txt, here are the first three lines:

readLines("rowing.txt")[1:3]
## [1] "fournier cabrera bhokanal saensuk kelmelis teilemb"
## [2] "drysdale molnar esquivel garcia khafaji monasterio"
## [3] "obreno szymczyk rosso kholmirzayev gambour"

We can process this file using standard R functions.

rowing_table <- sapply(readLines("rowing.txt"),strsplit," ")
names(rowing_table) <- NULL  # names are pointless here
rowing_table[1:3]
## [[1]]
## [1] "fournier" "cabrera"  "bhokanal" "saensuk"  "kelmelis" "teilemb" 
## 
## [[2]]
## [1] "drysdale"   "molnar"     "esquivel"   "garcia"     "khafaji"    "monasterio"
## 
## [[3]]
## [1] "obreno"       "szymczyk"     "rosso"        "kholmirzayev" "gambour"

The first line shows that fournier came first, cabrera second, and so on. We can convert this dataset into a Plackett-Luce likelihood function as follows:

rowing <- Reduce("+",sapply(rowing_table,race))
summary(rowing)
## A hyper2 object of size 32.
## pnames:  banna bhokanal boudina cabrera campbell dongyong drysdale esquivel fournier gambour garcia grant hoff kelmelis khafaji kholmirzayev martin memo molnar monasterio obreno peebles rivarola rosso saensuk shcharbachenia synek szymczyk taieb teilemb yakovlev zambrano 
## Number of brackets: 137 
## Sum of powers: 0 
## 
## Table of bracket lengths:
##  1  2  3  4  5  6 
## 32 22 23 24 21 15 
## 
## Table of powers:
##  -3  -2  -1   1   2   3   4   5 
##   1   2 102   3   3   7  16   3

Object rowing is a loglikelihood function on the strengths of the 32 competitors. Our first step is to find the evaluate:

rowing_maxp <- maxp(rowing)
rowing_maxp
##          banna       bhokanal        boudina        cabrera       campbell       dongyong 
##     1.6203e-02     1.1179e-02     8.6617e-05     3.5607e-02     1.9001e-02     1.4935e-03 
##       drysdale       esquivel       fournier        gambour         garcia          grant 
##     3.9452e-01     1.5030e-03     4.4669e-02     1.0026e-06     2.2405e-04     2.6153e-02 
##           hoff       kelmelis        khafaji   kholmirzayev         martin           memo 
##     1.0504e-02     9.7357e-04     1.0890e-04     1.6090e-04     1.1207e-01     2.0449e-03 
##         molnar     monasterio         obreno        peebles       rivarola          rosso 
##     1.0112e-02     1.0544e-05     8.4125e-02     2.7493e-04     2.6797e-05     3.8616e-03 
##        saensuk shcharbachenia          synek       szymczyk          taieb        teilemb 
##     2.2736e-04     3.7770e-02     1.7246e-01     1.4547e-02     6.5507e-05     4.0192e-06 
##       yakovlev       zambrano 
##     1.0000e-06     9.6636e-06

and we may display it using a dot chart:

dot chart of results in the 2016 Olympic Sculls

Figure 1: dot chart of results in the 2016 Olympic Sculls

dot chart of results in the 2016 Olympic Sculls (log scale)

Figure 2: dot chart of results in the 2016 Olympic Sculls (log scale)

consistency(rowing)
consistency check for rowing function

Figure 3: consistency check for rowing function

Figure 1 shows that many competitors have very small strength and figure 2 shows that many competitors have zero strength to numerical precision. Looking at the original dataset we can see why: observe that gambour came last in every race except for race number 7 (won by boudina). In race 7, he beat only yaklovlev. But we see that yaklovlev came last in every race except race 20, where he beat only gambour. Thus the maximum likelihood estimate for gambour + yaklovlev would be zero, because these two competitors always lose to everyone else. It is thus reasonable to eliminate gambour and yaklovlev from the dataset. We may apply this reasoning recursively and end up with the following dataset:

show("rowing_minimal.txt")
## [1] "rowing_minimal.txt"

(also, we have removed drysdale, as he won everything and his ML strength would be 1). We may then analyse this:

rowing_minimal_table <- sapply(readLines("rowing_minimal.txt"),strsplit," ")
names(rowing_minimal_table) <- NULL  
rowing_minimal  <- Reduce("+",sapply(rowing_minimal_table,race))
summary(rowing_minimal)
## A hyper2 object of size 25.
## pnames:  banna bhokanal boudina cabrera campbell dongyong esquivel fournier garcia grant hoff kelmelis khafaji kholmirzayev martin memo molnar obreno peebles rivarola rosso saensuk shcharbachenia synek szymczyk 
## Number of brackets: 102 
## Sum of powers: 0 
## 
## Table of bracket lengths:
##  1  2  3  4  5  6 
## 25 18 19 17 14  9 
## 
## Table of powers:
## -4 -2 -1  1  2  3  4  5 
##  1  2 74  3  1  8 12  1

This is a much more manageable dataset, much easier to study and less cluttered, as competitors with zero estimated strength are not present.

rowing_minimal_maxp <- maxp(rowing_minimal)
rowing_minimal_maxp
##          banna       bhokanal        boudina        cabrera       campbell       dongyong 
##     1.3367e-02     3.3078e-03     3.7661e-06     3.3973e-02     1.7276e-02     2.0032e-04 
##       esquivel       fournier         garcia          grant           hoff       kelmelis 
##     2.2999e-04     5.7353e-02     1.2822e-05     2.0118e-02     9.1992e-03     8.0410e-05 
##        khafaji   kholmirzayev         martin           memo         molnar         obreno 
##     6.9772e-06     1.2491e-05     2.7153e-01     5.3597e-04     4.2569e-03     1.2824e-01 
##        peebles       rivarola          rosso        saensuk shcharbachenia          synek 
##     5.9694e-06     1.0000e-06     1.0429e-03     6.1653e-06     5.7130e-02     3.7082e-01 
##       szymczyk 
##     1.1282e-02
dotchart(rowing_minimal_maxp)
Minimal dataset, maximum likelihood estimate of strength

Figure 4: Minimal dataset, maximum likelihood estimate of strength

dotchart(log10(rowing_minimal_maxp))
Minimal dataset, maximum likelihood estimate of strength

Figure 5: Minimal dataset, maximum likelihood estimate of strength

Package dataset

Following lines create rowing.rda, residing in the data/ directory of the package.

save(rowing,rowing_minimal,rowing_maxp,rowing_minimal_maxp,rowing_table,rowing_minimal_table,file="rowing.rda")

References

Hankin, R. K. S. 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.