(takes about fifteen minutes to run without cache; the
value of carcinoma_maxp is calculated separately [discussed below],
and that takes a few days to run).
To cite the hyper2 package in publications, please use
Hankin (2017). This is a short analysis of the carcinoma dataset
presented by Agresti, table 13.1, p542. Seven pathologists classified
each of 118 slides on the presence or absence of carcinoma in the
uterine cervix.
Here, the object of inference is the set of pathologists. The probability model is as follows. To each pathologist \(i,1\leq i \leq 7\), we assign a non-negative Bradley-Terry strength and require \(\sum p_i=1\). Given that two pathologists \(i\) and \(j\) differ in their diagnosis then the probability of \(i\) diagnosing the presence, and \(j\) diagnosing the absence, of carcinoma is given by
\[\begin{equation}\tag{1} \operatorname{Prob}(i^+,j^-) = \frac{p_i}{p_i+p_j},\qquad \operatorname{Prob}(i^-,j^+) = \frac{p_j}{p_i+p_j} \end{equation}\]
(here superscripts \(+\) and \(-\) refer to a positive and negative
diagnosis respectively). Operationally the way we implement this in
hyper2 idiom is to posit an (unobservable) ranking on the
clinicians: we place the clinicians in order from most convinced of
the presence of carcinoma to the least convinced. Thus a ranking of
DACBEGF means that clinician D was most convinced of the presence
of carcinoma and F least convinced. Given this, and given a single
slide, our actual observation might be that D and A diagnosed
presence, and the others diagnosed absence, of carcinoma. The
corresponding entry would be
A B C D E F G
T F F T F F F
We ask what rankings are consistent with this observation. Any
ranking with A and D ahead of BCEFG would be admissible. In
hyper2 notation we would say
\[\left\lbrace A,D\right\rbrace\succ\left\lbrace B,C,E,F,G\right\rbrace\]
Any ranking where A and D occupy the first two ranks and
B,C,E,F,G ranks three to seven, would be admissible. We
would have \(2!5!=240\) rankings. The probability of the actual ranking
being one of these is simply the sum of these 48 as the rankings are
mutually exclusive.
Each ranking has a Plackett-Luce likelihood function which is created
in hyper2 idiom using the bespoke race() function. Here, we use
generalized grouped rank likelihood (ggrl()) to create an lsl
object, as used by Hankin 2019 for the MasterChef analysis. This
furnishes a Bradley-Terry model for the clinicians’ strength that
recovers equation (1).
carcinoma_table <- read.table("carcinoma.txt",header=TRUE)
carcinoma_table[,1:7] <- carcinoma_table[,1:7] > 0
carcinoma_table
## A B C D E F G n q1 q2 q3
## 1 FALSE FALSE FALSE FALSE FALSE FALSE FALSE 34 1.1 23.0 33.8
## 2 FALSE FALSE FALSE FALSE TRUE FALSE FALSE 2 1.6 6.6 2.0
## 3 FALSE TRUE FALSE FALSE FALSE FALSE FALSE 6 2.2 12.7 6.3
## 4 FALSE TRUE FALSE FALSE FALSE FALSE TRUE 1 2.8 1.7 1.5
## 5 FALSE TRUE FALSE FALSE TRUE FALSE FALSE 4 3.3 3.6 3.0
## 6 FALSE TRUE FALSE FALSE TRUE FALSE TRUE 5 4.2 0.5 4.7
## 7 TRUE FALSE FALSE FALSE FALSE FALSE FALSE 2 1.4 3.0 2.1
## 8 TRUE FALSE TRUE FALSE TRUE FALSE TRUE 1 1.6 0.2 0.2
## 9 TRUE TRUE FALSE FALSE FALSE FALSE FALSE 2 2.8 1.7 1.3
## 10 TRUE TRUE FALSE FALSE FALSE FALSE TRUE 1 3.5 0.3 1.6
## 11 TRUE TRUE FALSE FALSE TRUE FALSE FALSE 2 4.2 0.5 2.9
## 12 TRUE TRUE FALSE FALSE TRUE FALSE TRUE 7 5.3 3.7 6.5
## 13 TRUE TRUE FALSE FALSE TRUE TRUE TRUE 1 1.4 2.6 1.4
## 14 TRUE TRUE FALSE TRUE FALSE FALSE TRUE 1 1.3 0.1 0.1
## 15 TRUE TRUE FALSE TRUE TRUE FALSE TRUE 2 2.0 4.3 2.6
## 16 TRUE TRUE FALSE TRUE TRUE TRUE TRUE 3 0.5 3.1 2.0
## 17 TRUE TRUE TRUE FALSE TRUE FALSE TRUE 13 3.3 11.5 9.6
## 18 TRUE TRUE TRUE FALSE TRUE TRUE TRUE 5 0.9 8.4 8.7
## 19 TRUE TRUE TRUE TRUE TRUE FALSE TRUE 10 1.2 13.5 13.6
## 20 TRUE TRUE TRUE TRUE TRUE TRUE TRUE 16 0.3 9.9 12.3
wanted <- !(apply(carcinoma_table[,1:7],1,sum) %in% c(0,7))
carcinoma_table_short <- carcinoma_table[wanted,]
W <- hyper2(pnames=LETTERS[1:7])
Wi <- list()
for(i in seq_len(nrow(carcinoma_table_short))){
jj <- unlist(carcinoma_table_short[i,1:7])
negative <- names(jj)[jj==0]
positive <- names(jj)[jj==1]
Wi[[i]] <- ggrl(W,positive,negative)
}
carcinoma <- lsl(Wi,powers= carcinoma_table_short$n)
There are several natural BT strengths to consider. First, we could simply count how many times each clinician diagnoses carcinoma:
carcinoma_count <- colSums(sweep(carcinoma_table_short[,1:7],1,carcinoma_table_short$n,"*"))
names(carcinoma_count) <- colnames(carcinoma_table_short)[1:7]
carcinoma_count <- carcinoma_count/sum(carcinoma_count)
carcinoma_count
## A B C D E F G
## 0.183824 0.231618 0.106618 0.058824 0.202206 0.033088 0.183824
lc <- loglik_lsl(carcinoma_count,carcinoma)
Secondly, Zipf:
z <- zipf(7)
names(z) <- LETTERS[1:7]
z
## A B C D E F G
## 0.385675 0.192837 0.128558 0.096419 0.077135 0.064279 0.055096
lz <- loglik_lsl(z,carcinoma) # takes about 30 seconds to run
Thirdly, equal probabilities:
e <- equalp(W)
e
## A B C D E F G
## 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286
le <- loglik_lsl(e,carcinoma) # takes about 30 seconds to run
The maximum likelihood estimate is somewhat more demanding. To find the maximum likelihood estimate we would use:
maxp_lsl(carcinoma,startp=indep(carcinoma_count), control=list(trace=100))
but this takes many hours to run. It gives the following:
carcinoma_maxp <-
c(A = 0.118778759012282, B = 0.542873969223425, C = 0.00654044305270073,
D = 0.00242693714895262, E = 0.202821434317779, F = 0.00130596252816986,
G = 0.125252494716691)
carcinoma_maxp
## A B C D E F G
## 0.1187788 0.5428740 0.0065404 0.0024269 0.2028214 0.0013060 0.1252525
lmax <- loglik_lsl(carcinoma_maxp,carcinoma)
[of course, if carcinoma_maxp is available, we would use this as a
starting point, viz
maxp_lsl(carcinoma,startp=indep(carcinoma_maxp))]. With this, we
are now in a position to compare these four points:
(results <- c(count=lc, zipf=lz, equal=le,maxp=lmax))
## count zipf equal maxp
## -105.668 -172.314 -184.950 -70.822
results <- results - max(results)
results
## count zipf equal maxp
## -34.846 -101.493 -114.129 0.000
We can use Wilks here to assess the null of carcinoma_count. We
have \(-2\log(\Lambda/\Lambda_0)\sim\chi^2_6\), giving us a \(p\)-value of
pchisq(-2*results[1],df=6,lower.tail=FALSE)
## count
## 4.7282e-13
What is the difference between carcinoma_count and carcinoma_maxp?
carcinoma_count
## A B C D E F G
## 0.183824 0.231618 0.106618 0.058824 0.202206 0.033088 0.183824
carcinoma_maxp
## A B C D E F G
## 0.1187788 0.5428740 0.0065404 0.0024269 0.2028214 0.0013060 0.1252525
ordertransplot(carcinoma_count,carcinoma_maxp)
ordertransplot(log(carcinoma_count),log(carcinoma_maxp),plotlims=c(-8,0))
save(carcinoma_table,carcinoma, carcinoma_count, carcinoma_maxp,file="carcinoma.rda")
Following lines create carcinoma.rda, residing in the data/
directory of the package.
Agresti, table 13.1, p542.
Landis, J. R., and G. G. Koch. 1977. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 33: 363-374.
hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.