To cite the hyper2 package in publications, please use Hankin (2017). This short document is based on Altham and Hankin (2010), which discussed an interesting dataset originally presented by Lin et. al. (2009). It uses more sophisticated idiom furnished by the hyper2 package, which is an improved and efficient upgrade of the primitive hyperdirichlet package. The dataset arose from 69 medical malpractice claims and refers to the two surgeon–reviewers’ answers to the question “was there a communication breakdown in the hand-off between physicians caring for the patient?”. The rows of Table 1 correspond to the answers that were given by reviewer 1, and the columns to the answers given by reviewer 2.

handover_table <- as.matrix(read.table("handover.txt"))
names(dimnames(handover_table)) <- c("reviewer1","reviewer2")
handover_table[3,3] <- NA # structural zero
handover_table
##          reviewer2
## reviewer1 Yes No Missing
##   Yes      26  1       2
##   No        5 18       9
##   Missing   4  4      NA

Thus we see that in 26 cases, both reviewers answered “yes”, and in 18 both said “no”. The table also records partial responses that include observations missed by one or both reviewers. Responses missed by both reviewers are uninformative and automatically rejected: structural zeros, in aylmer terminology. We first apply the Aylmer test to assess whether the two reviewers give the same proportion of “yes” responses:

> aylmer.test(a,alternative=function(a){a[1,2]-a[2,1]})

    Aylmer functional test for count data

data:  a
p-value = 0.169
alternative hypothesis: test function exceeds observed

> 

(Compare the McNemar test which gives an exact \(p\)-value of \(7/64\simeq 0.1094\)). We follow the previous workers and assume that cases are missing at random and seek a likelihood function on \(\theta_{11},\theta_{10},\theta_{01},\theta_{00}\) subject to \(\theta_{ij}\geq 0,\sum_{i,j \in\left\lbrace 0,1\right\rbrace}\theta_{ij}=1\). Here “1” means “yes” and “0” means “no”, so \(\theta_{01}\) is the probability of reviewer 1 responding “no” and reviewer 2 responding “yes”.

handover <- hyper2()
handover["t11"] %<>% inc(handover_table[1,1])
handover["t10"] %<>% inc(handover_table[1,2])
handover["t01"] %<>% inc(handover_table[2,1])
handover["t00"] %<>% inc(handover_table[2,2])

handover[c("t11","t10")] %<>% inc(handover_table[1,3])  # yes-yes plus yes-no  = 2 
handover[c("t01","t00")] %<>% inc(handover_table[2,3])  #  no-yes plus  no-no  = 9 
handover[c("t11","t01")] %<>% inc(handover_table[3,1])  # yes-yes plus  no-yes = 4 
handover[c("t10","t00")] %<>% inc(handover_table[3,2])  # yes-no  plus  no-no  = 4 
handover <- hyper2::balance(handover)
handover
## log(t00^18 * (t00 + t01)^9 * (t00 + t01 + t10 + t11)^-69 * (t00 +
## t10)^4 * t01^5 * (t01 + t11)^4 * t10 * (t10 + t11)^2 * t11^26)
handover_maxp <- maxp(handover)
handover_maxp
##      t00      t01      t10      t11 
## 0.419549 0.111276 0.017987 0.451188

We may simply test the hypothesis that \(\theta_{01} = \theta_{10}\):

samep.test(handover,c("t01","t10"))
## 
##  Constrained support maximization
## 
## data:  handover
## null hypothesis: t01 = t10
## null estimate:
##      t00      t01      t10      t11 
## 0.425931 0.060232 0.060232 0.453605 
## (argmax, constrained optimization)
## Support for null:  -66.144 + K
## 
## alternative hypothesis:  sum p_i=1 
## alternative estimate:
##      t00      t01      t10      t11 
## 0.419549 0.111276 0.017987 0.451188 
## (argmax, free optimization)
## Support for alternative:  -64.145 + K
## 
## degrees of freedom: 1
## support difference = 1.9991
## p-value: 0.045548

Alternatively we might use the Bayesian paradigm and calculate the probability of \(\theta_{01} > \theta_{10}\):

probability(handover,disallowed=function(p){p[2]<p[3]},tol=0.05)
## [1] 0.9998

the result differing slightly from the previous value, probably due to numerical issues.

We can now consider the distribution of \(\psi=\theta_{10}/(\theta_{10}+\theta_{01})\) under the posterior distribution induced by the complete cases:

jj <- rp(300000,H=handover)
head(jj)
##          t00     t01     t10     t11
## [1,] 0.25000 0.25000 0.25000 0.25000
## [2,] 0.25000 0.25000 0.25000 0.25000
## [3,] 0.25000 0.25000 0.25000 0.25000
## [4,] 0.25000 0.25000 0.25000 0.25000
## [5,] 0.25000 0.25000 0.25000 0.25000
## [6,] 0.22008 0.20885 0.27522 0.29585
psifun <- function(x){x[3]/(x[2]+x[3])}
logoddsfun <- function(x){log(x[2]/x[3])}
psi <- apply(jj,1,psifun)
log_odds <- apply(jj,1,logoddsfun)
psi_complete <- rbeta(length(psi),2,6)

We can now reproduce the four graphics given by Altham and Hankin:

Sample of \(\psi=\theta_{10}/(\theta_{10}+\theta_{01})\) under the posterior induced by the whole dataset (histogram), together with the posterior distribution of \(\psi\) induced by the complete cases only (a beta distribution with parameters 2 and 6).

Sample of $\psi=\theta_{10}/(\theta_{10}+\theta_{01})$ under the posterior induced  by the whole dataset (histogram), together with the posterior distribution of $\psi$ induced by the complete cases only (a beta distribution with parameters 2 and 6)

Figure 1: Sample of \(\psi=\theta_{10}/(\theta_{10}+\theta_{01})\) under the posterior induced by the whole dataset (histogram), together with the posterior distribution of \(\psi\) induced by the complete cases only (a beta distribution with parameters 2 and 6)

Quantile-quantile plot of $\psi=\theta_{10}/(\theta_{10}+\theta_{01})$ under the posterior induced  by the whole dataset against the posterior distribution of $\psi$ induced by the complete cases only (a beta distribution with parameters 2 and 6)

Figure 2: Quantile-quantile plot of \(\psi=\theta_{10}/(\theta_{10}+\theta_{01})\) under the posterior induced by the whole dataset against the posterior distribution of \(\psi\) induced by the complete cases only (a beta distribution with parameters 2 and 6)

Empirical cumulative distribution function of $\psi=\theta_{10}/(\theta_{10}+\theta_{01})$ under the posterior induced  by the whole dataset, and that of $\psi$ induced by the complete cases only (a beta distribution with parameters 2 and 6)

Figure 3: Empirical cumulative distribution function of \(\psi=\theta_{10}/(\theta_{10}+\theta_{01})\) under the posterior induced by the whole dataset, and that of \(\psi\) induced by the complete cases only (a beta distribution with parameters 2 and 6)

QQ plot of $\phi=\log(\theta_{01}/\theta_{10})$

Figure 4: QQ plot of \(\phi=\log(\theta_{01}/\theta_{10})\)

Figures 1 to 4 show different aspects of the dataset.

Package dataset

Following lines create handover.rda, residing in the data/ directory of the package.

save(handover_table,handover_maxp,handover,file="handover.rda")

References

Altham, P. M. E., and R. K. S. Hankin. 2010. “Using Recently Developed Software on a 2x2 Table of Matched Pairs with Incompletely Classified Data.” Journal of the Royal Statistical Society, Series C 59 (2): 377–79.
Hankin, R. K. S. 2017. “Partial Rank Data with the hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.