(takes about five minutes to run without cache)

To cite the hyper2 package in publications, please use Hankin (2017). Suppose there are five people in a family. Each day for five days, one person cooks a meal. After the fifth day, each person considers the four meals cooked by other family members, and puts them in order of preference. This process is repeated and here I consider how to interpret such data.

cooking_table <- as.matrix(read.table("cooking_weekly.txt", header=TRUE))
head(cooking_table)
##      robin lesley alice zac annabel
## W01a     0      4     3   2       1
## W01b     3      0     1   4       2
## W01c     3      1     0   4       2
## W01d     2      3     4   0       1
## W01e     3      2     1   4       0
## W02a     0      1     4   2       3
tail(cooking_table)
##      robin lesley alice zac annabel
## W14e     3      2     1   4       0
## W15a     0      3     1   2       4
## W15b     3      0     2   4       1
## W15c     3      2     0   4       1
## W15d     3      1     4   0       2
## W15e     3      1     2   4       0

Thus from the first row we find that one judge considered Lesley to be fourth best, Alice third, Zac second and Annabel best (this must have been Robin: no-one is allowed to vote for themselves and a zero means no vote). This is the transpose of an order table such as skating.txt. In this context a zero contains no information (unlike the formula 1 tables in which zeros signify did not finish). Document curling.Rmd contains a discussion and here we will use the same method. First we will consider the support function for the first row only:

(day1 <- cooking_table[1,])
##   robin  lesley   alice     zac annabel 
##       0       4       3       2       1
(H_row1 <- suppfun(names(sort(day1[day1 > 0]))))
## log( alice * (alice + annabel + lesley + zac)^-1 * (alice + lesley)^-1
## * (alice + lesley + zac)^-1 * annabel * zac)

In the above, we calculate the Plackett-Luce likelihood function; observe that it is uninformative about competitor number 1, which I show in two ways. Firstly:

p <- c(0.3,0.2,0.1,0.3)
# Now create p2 which is the same but competitor 1 is stronger:
p2 <- p
alpha <- 1.3
beta <- (1-alpha*p[1])/(1-p[1])
p2[1] <- p2[1]*alpha   # person 1 becomes stronger...
p2[-1] <- p2[-1]*beta  # ...and everyone else becomes weaker

loglik(indep(p),H_row1)
## [1] -2.5903
loglik(indep(p2),H_row1)
## [1] -2.8111

Complete dataset

It is easy to iterate over the rows of the dataset:

H <- hyper2()

for(i in seq_len(nrow(cooking_table))){  # iterate over rows
jj <- cooking_table[i,] 
    H <- H + suppfun(sort(jj[jj > 0]))
}
H
## log(alice^53 * (alice + annabel)^-4 * (alice + annabel + lesley)^-7 *
## (alice + annabel + lesley + robin)^-11 * (alice + annabel + lesley +
## zac)^-14 * (alice + annabel + robin)^-2 * (alice + annabel + robin +
## zac)^-15 * (alice + annabel + zac)^-6 * (alice + lesley)^-9 * (alice +
## lesley + robin)^-5 * (alice + lesley + robin + zac)^-15 * (alice +
## lesley + zac)^-6 * (alice + robin)^-4 * (alice + robin + zac)^-11 *
## (alice + zac)^-2 * annabel^45 * (annabel + lesley)^-6 * (annabel +
## lesley + robin)^-2 * (annabel + lesley + robin + zac)^-14 * (annabel +
## lesley + zac)^-4 * (annabel + robin + zac)^-10 * (annabel + zac)^-6 *
## lesley^51 * (lesley + robin)^-2 * (lesley + robin + zac)^-21 * (lesley
## + zac)^-8 * robin^44 * (robin + zac)^-33 * zac^24)

NOTE: converting cooking_table to an order table is not correct, for the zero is interpreted as “did not finish”, but the correct interpretation is “did not take part”: the difference is that DNF implies lower trength, but the likelihood function is uninformative about DNTP players. Even so, we may use ordertable2supp():

tc <- t(cooking_table)
tc[,1:20]
##         W01a W01b W01c W01d W01e W02a W02b W02c W02d W02e W03a W03b W03c W03d
## robin      0    3    3    2    3    0    4    4    4    4    0    4    0    3
## lesley     4    0    1    3    2    1    0    1    3    1    1    0    1    0
## alice      3    1    0    4    1    4    3    0    1    2    3    1    0    2
## zac        2    4    4    0    4    2    2    3    0    3    2    3    3    0
## annabel    1    2    2    1    0    3    1    2    2    0    4    2    2    1
##         W03e W04a W04b W04c W04d W04e
## robin      4    0    1    3    1    3
## lesley     2    1    0    2    3    2
## alice      1    2    3    0    4    1
## zac        3    3    4    4    0    4
## annabel    0    4    2    1    2    0

(compare file curling.Rmd in which we iterate over columns not rows). Standard techniques may be used:

equalp.test(H)
## 
##  Constrained support maximization
## 
## data:  H
## null hypothesis: alice = annabel = lesley = robin = zac
## null estimate:
##   alice annabel  lesley   robin     zac 
##     0.2     0.2     0.2     0.2     0.2 
## (argmax, constrained optimization)
## Support for null:  -228.24 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##   alice annabel  lesley   robin     zac 
## 0.31862 0.28910 0.23125 0.10859 0.05244 
## (argmax, free optimization)
## Support for alternative:  -198.93 + K
## 
## degrees of freedom: 4
## support difference = 29.315
## p-value: 5.6285e-12
mH <- maxp(H)
mH
##   alice annabel  lesley   robin     zac 
## 0.31862 0.28910 0.23125 0.10859 0.05244
pie(mH)

Preferential voting

It seems that the boys tend to prefer the boys’ cooking and the girls prefer the girls’ cooking. But simple things first:

a <- cooking_table # saves typing
a <- a[a[,4]>0,]  # just lines where zac does not vote
jj <- table(robin_votes=a[,1]==0,zac_last=a[,4]==4)
jj
##            zac_last
## robin_votes FALSE TRUE
##       FALSE    12   32
##       TRUE     13    2

The table above shows that Robin cast a total of 15 votes for Zac, and everyone else combined cast a total of 44 votes for Zac. Everyone else voted Zac last a total of 32 times and higher than last 12 times:

fisher.test(jj,alternative="less")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  jj
## p-value = 7.7e-05
## alternative hypothesis: true odds ratio is less than 1
## 95 percent confidence interval:
##  0.0000 0.2677
## sample estimates:
## odds ratio 
##   0.061033

So it looks like Robin is less likely to vote Zac last than everyone else. Does he do likewise for me?

a <- cooking_table # saves typing
a <- a[a[,1]>0,]  # just lines where robin does not vote
jj <- table(zac_votes=a[,4]==0,robin_last=a[,1]==4)
jj
##          robin_last
## zac_votes FALSE TRUE
##     FALSE    33   11
##     TRUE     13    1
fisher.test(jj,alternative="less")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  jj
## p-value = 0.14
## alternative hypothesis: true odds ratio is less than 1
## 95 percent confidence interval:
##  0.0000 1.5442
## sample estimates:
## odds ratio 
##    0.23527

No evidence for that.

We can do slightly better, and test a more general hypothesis, specifically that the girls systematically vote other girls higher, and boys systematically vote boys higher, than warranted. To test this hypothesis, we either posit two reified entities, girl and boy. Entity girl helps a girl when another girl is voting for her, and entity boy helps a boy when another boy is voting for him.

Or we use multipliers and hyper3 idiom (not yet done)

References

Hankin, R. K. S. 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.