To cite the hyper2 package in publications, please use Hankin (2017).
This short document analyses the rowing dataset: sculls in the 2016
olympics. It is not really standalone and is here as part of a
consistent dataset generation mechanism used in the hyper2 package.
See the hyper2 vignette for a more formal discussion. In Olympic
sculling, up to six individual competitors race a small boat called a
scull over a course of length \(2\,\mathrm{km}\); the object is to cross
the finishing line first. Note that actual timings are irrelevant,
given the model, as the sufficient statistic is the order in which
competitors cross the finishing line. The 2016 Summer Olympics is a
case in point: the gold and silver medallists finished less than \(\sim
5\) milliseconds apart, corresponding to a lead of \(\sim
2.5\,\mathrm{cm}\). The package dataset is file rowing.txt, here are
the first three lines:
readLines("rowing.txt")[1:3]
## [1] "fournier cabrera bhokanal saensuk kelmelis teilemb"
## [2] "drysdale molnar esquivel garcia khafaji monasterio"
## [3] "obreno szymczyk rosso kholmirzayev gambour"
We can process this file using standard R functions.
rowing_table <- sapply(readLines("rowing.txt"),strsplit," ")
names(rowing_table) <- NULL # names are pointless here
rowing_table[1:3]
## [[1]]
## [1] "fournier" "cabrera" "bhokanal" "saensuk" "kelmelis" "teilemb"
##
## [[2]]
## [1] "drysdale" "molnar" "esquivel" "garcia" "khafaji" "monasterio"
##
## [[3]]
## [1] "obreno" "szymczyk" "rosso" "kholmirzayev" "gambour"
The first line shows that fournier came first, cabrera second, and
so on. We can convert this dataset into a Plackett-Luce likelihood
function as follows:
rowing <- Reduce("+",sapply(rowing_table,race))
summary(rowing)
## A hyper2 object of size 32.
## pnames: banna bhokanal boudina cabrera campbell dongyong drysdale esquivel fournier gambour garcia grant hoff kelmelis khafaji kholmirzayev martin memo molnar monasterio obreno peebles rivarola rosso saensuk shcharbachenia synek szymczyk taieb teilemb yakovlev zambrano
## Number of brackets: 137
## Sum of powers: 0
##
## Table of bracket lengths:
## 1 2 3 4 5 6
## 32 22 23 24 21 15
##
## Table of powers:
## -3 -2 -1 1 2 3 4 5
## 1 2 102 3 3 7 16 3
Object rowing is a loglikelihood function on the strengths of the 32
competitors. Our first step is to find the evaluate:
rowing_maxp <- maxp(rowing)
rowing_maxp
## banna bhokanal boudina cabrera campbell dongyong
## 1.6203e-02 1.1179e-02 8.6617e-05 3.5607e-02 1.9001e-02 1.4935e-03
## drysdale esquivel fournier gambour garcia grant
## 3.9452e-01 1.5030e-03 4.4669e-02 1.0026e-06 2.2405e-04 2.6153e-02
## hoff kelmelis khafaji kholmirzayev martin memo
## 1.0504e-02 9.7357e-04 1.0890e-04 1.6090e-04 1.1207e-01 2.0449e-03
## molnar monasterio obreno peebles rivarola rosso
## 1.0112e-02 1.0544e-05 8.4125e-02 2.7493e-04 2.6797e-05 3.8616e-03
## saensuk shcharbachenia synek szymczyk taieb teilemb
## 2.2736e-04 3.7770e-02 1.7246e-01 1.4547e-02 6.5507e-05 4.0192e-06
## yakovlev zambrano
## 1.0000e-06 9.6636e-06
and we may display it using a dot chart:
Figure 1: dot chart of results in the 2016 Olympic Sculls
Figure 2: dot chart of results in the 2016 Olympic Sculls (log scale)
consistency(rowing)
Figure 3: consistency check for rowing function
Figure 1 shows that many competitors have
very small strength and figure 2 shows
that many competitors have zero strength to numerical precision.
Looking at the original dataset we can see why: observe that gambour
came last in every race except for race number 7 (won by boudina).
In race 7, he beat only yaklovlev. But we see that yaklovlev came
last in every race except race 20, where he beat only gambour. Thus
the maximum likelihood estimate for gambour + yaklovlev would be
zero, because these two competitors always lose to everyone else. It
is thus reasonable to eliminate gambour and yaklovlev from the
dataset. We may apply this reasoning recursively and end up with the
following dataset:
show("rowing_minimal.txt")
## [1] "rowing_minimal.txt"
(also, we have removed drysdale, as he won everything and his ML
strength would be 1). We may then analyse this:
rowing_minimal_table <- sapply(readLines("rowing_minimal.txt"),strsplit," ")
names(rowing_minimal_table) <- NULL
rowing_minimal <- Reduce("+",sapply(rowing_minimal_table,race))
summary(rowing_minimal)
## A hyper2 object of size 25.
## pnames: banna bhokanal boudina cabrera campbell dongyong esquivel fournier garcia grant hoff kelmelis khafaji kholmirzayev martin memo molnar obreno peebles rivarola rosso saensuk shcharbachenia synek szymczyk
## Number of brackets: 102
## Sum of powers: 0
##
## Table of bracket lengths:
## 1 2 3 4 5 6
## 25 18 19 17 14 9
##
## Table of powers:
## -4 -2 -1 1 2 3 4 5
## 1 2 74 3 1 8 12 1
This is a much more manageable dataset, much easier to study and less cluttered, as competitors with zero estimated strength are not present.
rowing_minimal_maxp <- maxp(rowing_minimal)
rowing_minimal_maxp
## banna bhokanal boudina cabrera campbell dongyong
## 1.3367e-02 3.3078e-03 3.7661e-06 3.3973e-02 1.7276e-02 2.0032e-04
## esquivel fournier garcia grant hoff kelmelis
## 2.2999e-04 5.7353e-02 1.2822e-05 2.0118e-02 9.1992e-03 8.0410e-05
## khafaji kholmirzayev martin memo molnar obreno
## 6.9772e-06 1.2491e-05 2.7153e-01 5.3597e-04 4.2569e-03 1.2824e-01
## peebles rivarola rosso saensuk shcharbachenia synek
## 5.9694e-06 1.0000e-06 1.0429e-03 6.1653e-06 5.7130e-02 3.7082e-01
## szymczyk
## 1.1282e-02
dotchart(rowing_minimal_maxp)
Figure 4: Minimal dataset, maximum likelihood estimate of strength
dotchart(log10(rowing_minimal_maxp))
Figure 5: Minimal dataset, maximum likelihood estimate of strength
Following lines create rowing.rda, residing in the data/ directory
of the package.
save(rowing,rowing_minimal,rowing_maxp,rowing_minimal_maxp,rowing_table,rowing_minimal_table,file="rowing.rda")
hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.