(takes about five minutes to run without cache)
To cite the hyper2 package in publications, please use
Hankin (2017). Suppose there are five people in a family. Each day
for five days, one person cooks a meal. After the fifth day, each
person considers the four meals cooked by other family members, and
puts them in order of preference. This process is repeated and here I
consider how to interpret such data.
cooking_table <- as.matrix(read.table("cooking_weekly.txt", header=TRUE))
head(cooking_table)
## robin lesley alice zac annabel
## W01a 0 4 3 2 1
## W01b 3 0 1 4 2
## W01c 3 1 0 4 2
## W01d 2 3 4 0 1
## W01e 3 2 1 4 0
## W02a 0 1 4 2 3
tail(cooking_table)
## robin lesley alice zac annabel
## W14e 3 2 1 4 0
## W15a 0 3 1 2 4
## W15b 3 0 2 4 1
## W15c 3 2 0 4 1
## W15d 3 1 4 0 2
## W15e 3 1 2 4 0
Thus from the first row we find that one judge considered Lesley to be
fourth best, Alice third, Zac second and Annabel best (this must have
been Robin: no-one is allowed to vote for themselves and a zero means
no vote). This is the transpose of an order table such as
skating.txt. In this context a zero contains no information (unlike
the formula 1 tables in which zeros signify did not finish). Document
curling.Rmd contains a discussion and here we will use the same
method. First we will consider the support function for the first row only:
(day1 <- cooking_table[1,])
## robin lesley alice zac annabel
## 0 4 3 2 1
(H_row1 <- suppfun(names(sort(day1[day1 > 0]))))
## log( alice * (alice + annabel + lesley + zac)^-1 * (alice + lesley)^-1
## * (alice + lesley + zac)^-1 * annabel * zac)
In the above, we calculate the Plackett-Luce likelihood function; observe that it is uninformative about competitor number 1, which I show in two ways. Firstly:
p <- c(0.3,0.2,0.1,0.3)
# Now create p2 which is the same but competitor 1 is stronger:
p2 <- p
alpha <- 1.3
beta <- (1-alpha*p[1])/(1-p[1])
p2[1] <- p2[1]*alpha # person 1 becomes stronger...
p2[-1] <- p2[-1]*beta # ...and everyone else becomes weaker
loglik(indep(p),H_row1)
## [1] -2.5903
loglik(indep(p2),H_row1)
## [1] -2.8111
It is easy to iterate over the rows of the dataset:
H <- hyper2()
for(i in seq_len(nrow(cooking_table))){ # iterate over rows
jj <- cooking_table[i,]
H <- H + suppfun(sort(jj[jj > 0]))
}
H
## log(alice^53 * (alice + annabel)^-4 * (alice + annabel + lesley)^-7 *
## (alice + annabel + lesley + robin)^-11 * (alice + annabel + lesley +
## zac)^-14 * (alice + annabel + robin)^-2 * (alice + annabel + robin +
## zac)^-15 * (alice + annabel + zac)^-6 * (alice + lesley)^-9 * (alice +
## lesley + robin)^-5 * (alice + lesley + robin + zac)^-15 * (alice +
## lesley + zac)^-6 * (alice + robin)^-4 * (alice + robin + zac)^-11 *
## (alice + zac)^-2 * annabel^45 * (annabel + lesley)^-6 * (annabel +
## lesley + robin)^-2 * (annabel + lesley + robin + zac)^-14 * (annabel +
## lesley + zac)^-4 * (annabel + robin + zac)^-10 * (annabel + zac)^-6 *
## lesley^51 * (lesley + robin)^-2 * (lesley + robin + zac)^-21 * (lesley
## + zac)^-8 * robin^44 * (robin + zac)^-33 * zac^24)
NOTE: converting cooking_table to an order table is not correct,
for the zero is interpreted as “did not finish”, but the correct
interpretation is “did not take part”: the difference is that DNF
implies lower trength, but the likelihood function is uninformative
about DNTP players. Even so, we may use ordertable2supp():
tc <- t(cooking_table)
tc[,1:20]
## W01a W01b W01c W01d W01e W02a W02b W02c W02d W02e W03a W03b W03c W03d
## robin 0 3 3 2 3 0 4 4 4 4 0 4 0 3
## lesley 4 0 1 3 2 1 0 1 3 1 1 0 1 0
## alice 3 1 0 4 1 4 3 0 1 2 3 1 0 2
## zac 2 4 4 0 4 2 2 3 0 3 2 3 3 0
## annabel 1 2 2 1 0 3 1 2 2 0 4 2 2 1
## W03e W04a W04b W04c W04d W04e
## robin 4 0 1 3 1 3
## lesley 2 1 0 2 3 2
## alice 1 2 3 0 4 1
## zac 3 3 4 4 0 4
## annabel 0 4 2 1 2 0
(compare file curling.Rmd in which we iterate over columns not rows).
Standard techniques may be used:
equalp.test(H)
##
## Constrained support maximization
##
## data: H
## null hypothesis: alice = annabel = lesley = robin = zac
## null estimate:
## alice annabel lesley robin zac
## 0.2 0.2 0.2 0.2 0.2
## (argmax, constrained optimization)
## Support for null: -228.24 + K
##
## alternative hypothesis: sum p_i=1
## alternative estimate:
## alice annabel lesley robin zac
## 0.31862 0.28910 0.23125 0.10859 0.05244
## (argmax, free optimization)
## Support for alternative: -198.93 + K
##
## degrees of freedom: 4
## support difference = 29.315
## p-value: 5.6285e-12
mH <- maxp(H)
mH
## alice annabel lesley robin zac
## 0.31862 0.28910 0.23125 0.10859 0.05244
pie(mH)
It seems that the boys tend to prefer the boys’ cooking and the girls prefer the girls’ cooking. But simple things first:
a <- cooking_table # saves typing
a <- a[a[,4]>0,] # just lines where zac does not vote
jj <- table(robin_votes=a[,1]==0,zac_last=a[,4]==4)
jj
## zac_last
## robin_votes FALSE TRUE
## FALSE 12 32
## TRUE 13 2
The table above shows that Robin cast a total of 15 votes for Zac, and everyone else combined cast a total of 44 votes for Zac. Everyone else voted Zac last a total of 32 times and higher than last 12 times:
fisher.test(jj,alternative="less")
##
## Fisher's Exact Test for Count Data
##
## data: jj
## p-value = 7.7e-05
## alternative hypothesis: true odds ratio is less than 1
## 95 percent confidence interval:
## 0.0000 0.2677
## sample estimates:
## odds ratio
## 0.061033
So it looks like Robin is less likely to vote Zac last than everyone else. Does he do likewise for me?
a <- cooking_table # saves typing
a <- a[a[,1]>0,] # just lines where robin does not vote
jj <- table(zac_votes=a[,4]==0,robin_last=a[,1]==4)
jj
## robin_last
## zac_votes FALSE TRUE
## FALSE 33 11
## TRUE 13 1
fisher.test(jj,alternative="less")
##
## Fisher's Exact Test for Count Data
##
## data: jj
## p-value = 0.14
## alternative hypothesis: true odds ratio is less than 1
## 95 percent confidence interval:
## 0.0000 1.5442
## sample estimates:
## odds ratio
## 0.23527
No evidence for that.
We can do slightly better, and test a more general hypothesis,
specifically that the girls systematically vote other girls higher,
and boys systematically vote boys higher, than warranted. To test
this hypothesis, we either posit two reified entities, girl and boy.
Entity girl helps a girl when another girl is voting for her, and
entity boy helps a boy when another boy is voting for him.
Or we use multipliers and hyper3 idiom (not yet done)
hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.