To cite the hyper2 package in publications, please use Hankin (2017).
This short document is based on Altham and Hankin (2010), which
discussed an interesting dataset originally presented by Lin
et. al. (2009). It uses more sophisticated idiom furnished by the
hyper2 package, which is an improved and efficient upgrade of the
primitive hyperdirichlet package. The dataset arose from 69 medical
malpractice claims and refers to the two surgeon–reviewers’ answers to
the question “was there a communication breakdown in the hand-off
between physicians caring for the patient?”. The rows of Table 1
correspond to the answers that were given by reviewer 1, and the
columns to the answers given by reviewer 2.
handover_table <- as.matrix(read.table("handover.txt"))
names(dimnames(handover_table)) <- c("reviewer1","reviewer2")
handover_table[3,3] <- NA # structural zero
handover_table
## reviewer2
## reviewer1 Yes No Missing
## Yes 26 1 2
## No 5 18 9
## Missing 4 4 NA
Thus we see that in 26 cases, both reviewers answered “yes”, and in 18
both said “no”. The table also records partial responses that include
observations missed by one or both reviewers. Responses missed by
both reviewers are uninformative and automatically rejected:
structural zeros, in aylmer terminology. We first apply the Aylmer
test to assess whether the two reviewers give the same proportion of
“yes” responses:
> aylmer.test(a,alternative=function(a){a[1,2]-a[2,1]})
Aylmer functional test for count data
data: a
p-value = 0.169
alternative hypothesis: test function exceeds observed
>
(Compare the McNemar test which gives an exact \(p\)-value of \(7/64\simeq 0.1094\)). We follow the previous workers and assume that cases are missing at random and seek a likelihood function on \(\theta_{11},\theta_{10},\theta_{01},\theta_{00}\) subject to \(\theta_{ij}\geq 0,\sum_{i,j \in\left\lbrace 0,1\right\rbrace}\theta_{ij}=1\). Here “1” means “yes” and “0” means “no”, so \(\theta_{01}\) is the probability of reviewer 1 responding “no” and reviewer 2 responding “yes”.
handover <- hyper2()
handover["t11"] %<>% inc(handover_table[1,1])
handover["t10"] %<>% inc(handover_table[1,2])
handover["t01"] %<>% inc(handover_table[2,1])
handover["t00"] %<>% inc(handover_table[2,2])
handover[c("t11","t10")] %<>% inc(handover_table[1,3]) # yes-yes plus yes-no = 2
handover[c("t01","t00")] %<>% inc(handover_table[2,3]) # no-yes plus no-no = 9
handover[c("t11","t01")] %<>% inc(handover_table[3,1]) # yes-yes plus no-yes = 4
handover[c("t10","t00")] %<>% inc(handover_table[3,2]) # yes-no plus no-no = 4
handover <- hyper2::balance(handover)
handover
## log(t00^18 * (t00 + t01)^9 * (t00 + t01 + t10 + t11)^-69 * (t00 +
## t10)^4 * t01^5 * (t01 + t11)^4 * t10 * (t10 + t11)^2 * t11^26)
handover_maxp <- maxp(handover)
handover_maxp
## t00 t01 t10 t11
## 0.419549 0.111276 0.017987 0.451188
We may simply test the hypothesis that \(\theta_{01} = \theta_{10}\):
samep.test(handover,c("t01","t10"))
##
## Constrained support maximization
##
## data: handover
## null hypothesis: t01 = t10
## null estimate:
## t00 t01 t10 t11
## 0.425931 0.060232 0.060232 0.453605
## (argmax, constrained optimization)
## Support for null: -66.144 + K
##
## alternative hypothesis: sum p_i=1
## alternative estimate:
## t00 t01 t10 t11
## 0.419549 0.111276 0.017987 0.451188
## (argmax, free optimization)
## Support for alternative: -64.145 + K
##
## degrees of freedom: 1
## support difference = 1.9991
## p-value: 0.045548
Alternatively we might use the Bayesian paradigm and calculate the probability of \(\theta_{01} > \theta_{10}\):
probability(handover,disallowed=function(p){p[2]<p[3]},tol=0.05)
## [1] 0.9998
the result differing slightly from the previous value, probably due to numerical issues.
We can now consider the distribution of \(\psi=\theta_{10}/(\theta_{10}+\theta_{01})\) under the posterior distribution induced by the complete cases:
jj <- rp(300000,H=handover)
head(jj)
## t00 t01 t10 t11
## [1,] 0.25000 0.25000 0.25000 0.25000
## [2,] 0.25000 0.25000 0.25000 0.25000
## [3,] 0.25000 0.25000 0.25000 0.25000
## [4,] 0.25000 0.25000 0.25000 0.25000
## [5,] 0.25000 0.25000 0.25000 0.25000
## [6,] 0.22008 0.20885 0.27522 0.29585
psifun <- function(x){x[3]/(x[2]+x[3])}
logoddsfun <- function(x){log(x[2]/x[3])}
psi <- apply(jj,1,psifun)
log_odds <- apply(jj,1,logoddsfun)
psi_complete <- rbeta(length(psi),2,6)
We can now reproduce the four graphics given by Altham and Hankin:
Sample of \(\psi=\theta_{10}/(\theta_{10}+\theta_{01})\) under the posterior induced by the whole dataset (histogram), together with the posterior distribution of \(\psi\) induced by the complete cases only (a beta distribution with parameters 2 and 6).
Figure 1: Sample of \(\psi=\theta_{10}/(\theta_{10}+\theta_{01})\) under the posterior induced by the whole dataset (histogram), together with the posterior distribution of \(\psi\) induced by the complete cases only (a beta distribution with parameters 2 and 6)
Figure 2: Quantile-quantile plot of \(\psi=\theta_{10}/(\theta_{10}+\theta_{01})\) under the posterior induced by the whole dataset against the posterior distribution of \(\psi\) induced by the complete cases only (a beta distribution with parameters 2 and 6)
Figure 3: Empirical cumulative distribution function of \(\psi=\theta_{10}/(\theta_{10}+\theta_{01})\) under the posterior induced by the whole dataset, and that of \(\psi\) induced by the complete cases only (a beta distribution with parameters 2 and 6)
Figure 4: QQ plot of \(\phi=\log(\theta_{01}/\theta_{10})\)
Figures 1 to 4 show different aspects of the dataset.
Following lines create handover.rda, residing in the data/ directory of the package.
save(handover_table,handover_maxp,handover,file="handover.rda")
hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.