Analysing a carcinoma dataset with the hyper2 package

(takes about fifteen minutes to run without cache; the value of carcinoma_maxp is calculated separately [discussed below], and that takes a few days to run).

To cite the hyper2 package in publications, please use Hankin (2017). This is a short analysis of the carcinoma dataset presented by Agresti, table 13.1, p542. Seven pathologists classified each of 118 slides on the presence or absence of carcinoma in the uterine cervix.

Here, the object of inference is the set of pathologists. The probability model is as follows. To each pathologist \(i,1\leq i \leq 7\), we assign a non-negative Bradley-Terry strength and require \(\sum p_i=1\). Given that two pathologists \(i\) and \(j\) differ in their diagnosis then the probability of \(i\) diagnosing the presence, and \(j\) diagnosing the absence, of carcinoma is given by

\[\begin{equation}\tag{1} \operatorname{Prob}(i^+,j^-) = \frac{p_i}{p_i+p_j},\qquad \operatorname{Prob}(i^-,j^+) = \frac{p_j}{p_i+p_j} \end{equation}\]

(here superscripts \(+\) and \(-\) refer to a positive and negative diagnosis respectively). Operationally the way we implement this in hyper2 idiom is to posit an (unobservable) ranking on the clinicians: we place the clinicians in order from most convinced of the presence of carcinoma to the least convinced. Thus a ranking of DACBEGF means that clinician D was most convinced of the presence of carcinoma and F least convinced. Given this, and given a single slide, our actual observation might be that D and A diagnosed presence, and the others diagnosed absence, of carcinoma. The corresponding entry would be

A B C D E F G
T F F T F F F

We ask what rankings are consistent with this observation. Any ranking with A and D ahead of BCEFG would be admissible. In hyper2 notation we would say

\[\left\lbrace A,D\right\rbrace\succ\left\lbrace B,C,E,F,G\right\rbrace\]

Any ranking where A and D occupy the first two ranks and B,C,E,F,G ranks three to seven, would be admissible. We would have \(2!5!=240\) rankings. The probability of the actual ranking being one of these is simply the sum of these 48 as the rankings are mutually exclusive.

Each ranking has a Plackett-Luce likelihood function which is created in hyper2 idiom using the bespoke race() function. Here, we use generalized grouped rank likelihood (ggrl()) to create an lsl object, as used by Hankin 2019 for the MasterChef analysis. This furnishes a Bradley-Terry model for the clinicians’ strength that recovers equation (1).

carcinoma_table <- read.table("carcinoma.txt",header=TRUE)
carcinoma_table[,1:7] <- carcinoma_table[,1:7] > 0
carcinoma_table

##        A     B     C     D     E     F     G  n  q1   q2   q3
## 1  FALSE FALSE FALSE FALSE FALSE FALSE FALSE 34 1.1 23.0 33.8
## 2  FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  2 1.6  6.6  2.0
## 3  FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  6 2.2 12.7  6.3
## 4  FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  1 2.8  1.7  1.5
## 5  FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  4 3.3  3.6  3.0
## 6  FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE  5 4.2  0.5  4.7
## 7   TRUE FALSE FALSE FALSE FALSE FALSE FALSE  2 1.4  3.0  2.1
## 8   TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  1 1.6  0.2  0.2
## 9   TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  2 2.8  1.7  1.3
## 10  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  1 3.5  0.3  1.6
## 11  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE  2 4.2  0.5  2.9
## 12  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE  7 5.3  3.7  6.5
## 13  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  1 1.4  2.6  1.4
## 14  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  1 1.3  0.1  0.1
## 15  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  2 2.0  4.3  2.6
## 16  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  3 0.5  3.1  2.0
## 17  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE 13 3.3 11.5  9.6
## 18  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  5 0.9  8.4  8.7
## 19  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE 10 1.2 13.5 13.6
## 20  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE 16 0.3  9.9 12.3

wanted <- !(apply(carcinoma_table[,1:7],1,sum) %in% c(0,7))
carcinoma_table_short <- carcinoma_table[wanted,]

W <- hyper2(pnames=LETTERS[1:7])
Wi <- list()

for(i in seq_len(nrow(carcinoma_table_short))){
      jj <- unlist(carcinoma_table_short[i,1:7])
      negative <- names(jj)[jj==0]
      positive <- names(jj)[jj==1]
      Wi[[i]] <- ggrl(W,positive,negative)
}
carcinoma <- lsl(Wi,powers= carcinoma_table_short$n)

There are several natural BT strengths to consider. First, we could simply count how many times each clinician diagnoses carcinoma:

carcinoma_count <- colSums(sweep(carcinoma_table_short[,1:7],1,carcinoma_table_short$n,"*"))
names(carcinoma_count) <- colnames(carcinoma_table_short)[1:7]
carcinoma_count <- carcinoma_count/sum(carcinoma_count)
carcinoma_count

##        A        B        C        D        E        F        G 
## 0.183824 0.231618 0.106618 0.058824 0.202206 0.033088 0.183824

lc <- loglik_lsl(carcinoma_count,carcinoma)

Secondly, Zipf:

z <- zipf(7)
names(z) <- LETTERS[1:7]
z

##        A        B        C        D        E        F        G 
## 0.385675 0.192837 0.128558 0.096419 0.077135 0.064279 0.055096

lz <- loglik_lsl(z,carcinoma)  # takes about 30 seconds to run

Thirdly, equal probabilities:

e <- equalp(W)
e

##       A       B       C       D       E       F       G 
## 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286 0.14286

le <- loglik_lsl(e,carcinoma) # takes about 30 seconds to run

The maximum likelihood estimate is somewhat more demanding. To find the maximum likelihood estimate we would use:

maxp_lsl(carcinoma,startp=indep(carcinoma_count), control=list(trace=100))

but this takes many hours to run. It gives the following:

carcinoma_maxp <- 
c(A = 0.118778759012282, B = 0.542873969223425, C = 0.00654044305270073, 
D = 0.00242693714895262, E = 0.202821434317779, F = 0.00130596252816986, 
G = 0.125252494716691)
carcinoma_maxp

##         A         B         C         D         E         F         G 
## 0.1187788 0.5428740 0.0065404 0.0024269 0.2028214 0.0013060 0.1252525

lmax <- loglik_lsl(carcinoma_maxp,carcinoma)

[of course, if carcinoma_maxp is available, we would use this as a starting point, viz maxp_lsl(carcinoma,startp=indep(carcinoma_maxp))]. With this, we are now in a position to compare these four points:

(results <- c(count=lc, zipf=lz, equal=le,maxp=lmax))

##    count     zipf    equal     maxp 
## -105.668 -172.314 -184.950  -70.822

results <- results - max(results)
results

##    count     zipf    equal     maxp 
##  -34.846 -101.493 -114.129    0.000

We can use Wilks here to assess the null of carcinoma_count. We have \(-2\log(\Lambda/\Lambda_0)\sim\chi^2_6\), giving us a \(p\)-value of

pchisq(-2*results[1],df=6,lower.tail=FALSE)

##      count 
## 4.7282e-13

What is the difference between carcinoma_count and carcinoma_maxp?

carcinoma_count

##        A        B        C        D        E        F        G 
## 0.183824 0.231618 0.106618 0.058824 0.202206 0.033088 0.183824

carcinoma_maxp

##         A         B         C         D         E         F         G 
## 0.1187788 0.5428740 0.0065404 0.0024269 0.2028214 0.0013060 0.1252525

ordertransplot(carcinoma_count,carcinoma_maxp)

ordertransplot(log(carcinoma_count),log(carcinoma_maxp),plotlims=c(-8,0))

save(carcinoma_table,carcinoma, carcinoma_count, carcinoma_maxp,file="carcinoma.rda")

Package dataset

Following lines create carcinoma.rda, residing in the data/ directory of the package.

References

Agresti, table 13.1, p542.

Landis, J. R., and G. G. Koch. 1977. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 33: 363-374.

Hankin, R. K. S. 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.

Analysing a carcinoma dataset with the hyper2 package

R. K. S. Hankin

Package dataset

References