(takes about 30 minutes to run without cache, most
of the time taken processing the maximi chunk)
To cite the hyper2 package in publications, please use
Hankin (2017). The file eurovision.txt, used below, is copied
from “Eurovision Song Contest 2009,” Wikipedia, accessed May 13, 2018.
It refers to semi-final 1. More documentation is given in
eurovision.Rd [type help(euro2009) at the R prompt].
First we specify the matrix as appearing in the Wikipedia page:
eurovision_table <- as.matrix(read.table("eurovision.txt"))
eurovision_table
## ME CZ BE BY SW AM AD CH TR IL BG IS MK RO FI PT MT BA DE UK
## ME NA 0 0 3 0 5 1 2 5 1 0 0 8 0 0 1 6 10 2 0
## CZ 0 NA 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## BE 0 0 NA 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## BY 2 1 0 NA 1 4 0 0 0 4 1 1 6 0 4 0 1 0 0 0
## SW 0 6 4 7 NA 8 7 4 4 7 0 10 3 4 10 8 8 4 4 7
## AM 4 12 10 10 5 NA 0 1 10 10 8 2 2 8 1 0 0 1 10 5
## AD 0 0 0 0 0 0 NA 0 1 0 0 0 0 0 0 4 3 0 0 0
## CH 0 0 0 2 2 0 2 NA 0 0 0 0 0 0 5 2 0 2 0 0
## TR 8 5 12 6 7 10 5 12 NA 6 12 7 12 12 7 5 10 12 12 12
## IL 5 4 3 4 6 7 8 5 3 NA 4 6 1 3 6 0 4 0 5 1
## BG 0 0 0 0 0 0 0 0 2 0 NA 0 5 0 0 0 0 0 0 0
## IS 7 10 7 12 12 12 10 7 8 12 6 NA 4 10 12 12 12 7 6 8
## MK 10 3 0 0 0 0 0 6 6 0 10 0 NA 2 0 0 0 8 0 0
## RO 6 0 2 1 0 2 4 0 7 8 5 4 7 NA 0 10 2 6 1 2
## FI 3 0 1 0 10 0 3 0 0 0 0 12 0 1 NA 3 5 0 0 4
## PT 0 2 6 0 3 0 12 10 0 2 2 8 0 7 2 NA 0 3 7 6
## MT 1 7 8 8 4 3 6 3 0 5 3 5 0 6 3 6 NA 5 3 10
## BA 12 8 5 5 8 6 0 8 12 3 7 3 10 5 8 7 7 NA 8 3
Each row corresponds to a contestant and each column to a judge. Note
that the matrix is not square: the last two columns correspond to
Germany and the UK, who were judges but not competitors. I have
removed the first column from the Wikipedia table (which gave the
total of the points), replaced self-voting entries with NA, and
replaced blanks with 0. Thus the first row corresponds to votes
cast for ME, Montenegro. We see that BY (Belarus) awarded them
three points, AM (Armenia) five points, and so on. The first column
corresponds to the points awarded by Monetenegro. We see that they
placed gave Belarus (BY) two points, Armenia (AM) four points, and
so on. Their favourite was Bosnia and Herzegovina (BA) to whom they
gave twelve points. The points system used was:
points <- c(12,10,8,7,6,5,4,3,2,1)
Variable points gives the number of points awarded, under Eurovision
rules, to voters’ first, second, third, etc choice. Points for any
competitor are added and the winner is the competitor with the most
points. However, in the hyper2 model, the numerical values
themselves do not affect the likelihood function; only the order of
the voters’ preferences matters. The following R idiom translates
wiki_matrix into a form suitable for analysis with hyper2:
preference <- eurovision_table*0
for(i in seq_along(points)){ preference[eurovision_table == points[i]] <- i }
countries <- data.frame(
fullname =
c("Montenegro", "Czech rep", "Belgium", "Belarus", "Sweden",
"Armenia", "Andorra", "Switzerland", "Turkey", "Israel",
"Bulgaria", "Iceland", "Macedonia", "Romania", "Finland",
"Portugal", "Malta", "Bosnia Herz", "Germany", "UK"),
abbreviation = c("ME","CZ","BE", "BY", "SW", "AM", "AD", "CH",
"TR", "IL", "BG", "IS", "MK", "RO", "FI", "PT",
"MT", "BA", "DE", "UK")
)
if(abbreviated){
jj <- countries$abbreviation
} else {
jj <- countries$fullname
}
competitors <- as.character(jj[1:18])
colnames(preference) <- jj # voters; 20 countries (18 + DE + UK)
rownames(preference) <- competitors
rownames(eurovision_table) <- jj[1:18]
colnames(eurovision_table) <- jj
In the above, matrix preference records voters’ first, second,
third, etc choice. A zero entry means no points (nul punkte!) and
NA means that voter was forbidden from voting for that player
(countries cannot vote for themselves). The competitors were the
first 18 countries (the last two countries,
Germany and the UK, voted but did not compete).
preference
## ME CZ BE BY SW AM AD CH TR IL BG IS MK RO FI PT MT BA DE UK
## ME NA 0 0 8 0 6 10 9 6 10 0 0 3 0 0 10 5 2 9 0
## CZ 0 NA 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## BE 0 0 NA 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## BY 9 10 0 NA 10 7 0 0 0 7 10 10 5 0 7 0 10 0 0 0
## SW 0 5 7 4 NA 3 4 7 7 4 0 2 8 7 2 3 3 7 7 4
## AM 7 1 2 2 6 NA 0 10 2 2 3 9 9 3 10 0 0 10 2 6
## AD 0 0 0 0 0 0 NA 0 10 0 0 0 0 0 0 7 8 0 0 0
## CH 0 0 0 9 9 0 9 NA 0 0 0 0 0 0 6 9 0 9 0 0
## TR 3 6 1 5 4 2 6 1 NA 5 1 4 1 1 4 6 2 1 1 1
## IL 6 7 8 7 5 4 3 6 8 NA 7 5 10 8 5 0 7 0 6 10
## BG 0 0 0 0 0 0 0 0 9 0 NA 0 6 0 0 0 0 0 0 0
## IS 4 2 4 1 1 1 2 4 3 1 5 NA 7 2 1 1 1 4 5 3
## MK 2 8 0 0 0 0 0 5 5 0 2 0 NA 9 0 0 0 3 0 0
## RO 5 0 9 10 0 9 7 0 4 3 6 7 4 NA 0 2 9 5 10 9
## FI 8 0 10 0 2 0 8 0 0 0 0 1 0 10 NA 8 6 0 0 7
## PT 0 9 5 0 8 0 1 2 0 9 9 3 0 4 9 NA 0 8 4 5
## MT 10 4 3 3 7 8 5 8 0 6 8 6 0 5 8 5 NA 6 8 2
## BA 1 3 6 6 3 5 0 3 1 8 4 8 2 6 3 4 4 NA 3 8
Now, take the first column of preference. This represents the order
of preferences of Montenegro (ME). Their favourite was [last row]
Bosnia & Herzegovina (BA), who they gave rank 1 to (that is, 12
points). Their second favourite was Macedonia (MK), their third was
Turkey (TR), and so on. They were not allowed to vote for themselves,
which is why the first row is NA. So the order was: BA (first),
MK (second) , TR (third), IS (fourth), RO (fifth), IL
(sixth), AM (seventh), FI (eigthth), BY (ninth), MT (tenth).
The other countries (CZ, BE, SW, CH, BG, PT) came joint
last and did not receive any points from Montenegro. Note the final
two rows corresponding to votes cast by Germany and the UK, who did
not compete and therefore cast votes for all competitors.
We need to convert matrix preference into a likelihood function,
here eurovision:
eurovision <- hyper2()
for(i in seq_len(ncol(preference))){ # cycle through the rows; each row is a voter
d <- preference[,i,drop=TRUE]
eurovision <- eurovision + suppfun(d[!is.na(d)])
} # i loop closes
eurovision_maxp <- maxp(eurovision)
eurovision_maxp
## AD AM BA BE BG BY CH
## 4.1652e-03 4.9610e-02 1.0126e-01 1.3499e-03 2.7491e-03 1.5750e-02 8.8204e-03
## CZ FI IL IS ME MK MT
## 1.0028e-06 1.5648e-02 5.0215e-02 2.7924e-01 1.9261e-02 1.2034e-02 5.7317e-02
## PT RO SW TR
## 3.0281e-02 3.3604e-02 7.7127e-02 2.4156e-01
pie(eurovision_maxp)
Figure 1: Maximum likelihood strengths of competitors in the 2009 Eurovision song contest, semifinal 1
Observe the small estimated strength for Czechoslovakia (CZ) at
about 1e-06. This is consistent with the second row of preference
which shows that noone gave them any points.
consistency(eurovision)
Figure 2: consistency check: maximum likelihood obtained two different ways. Exact agreement corresponds to points lying on the diagonal line. Note the small estimated strength of Czechoslovakia
Now we can check null of equal strengths:
equalp.test(eurovision)
##
## Constrained support maximization
##
## data: eurovision
## null hypothesis: AD = AM = BA = BE = BG = BY = CH = CZ = FI = IL = IS = ME = MK = MT = PT = RO = SW = TR
## null estimate:
## AD AM BA BE BG BY CH CZ
## 0.055556 0.055556 0.055556 0.055556 0.055556 0.055556 0.055556 0.055556
## FI IL IS ME MK MT PT RO
## 0.055556 0.055556 0.055556 0.055556 0.055556 0.055556 0.055556 0.055556
## SW TR
## 0.055556 0.055556
## (argmax, constrained optimization)
## Support for null: -501.22 + K
##
## alternative hypothesis: sum p_i=1
## alternative estimate:
## AD AM BA BE BG BY CH
## 4.1652e-03 4.9610e-02 1.0126e-01 1.3499e-03 2.7491e-03 1.5750e-02 8.8204e-03
## CZ FI IL IS ME MK MT
## 1.0028e-06 1.5648e-02 5.0215e-02 2.7924e-01 1.9261e-02 1.2034e-02 5.7317e-02
## PT RO SW TR
## 3.0281e-02 3.3604e-02 7.7127e-02 2.4156e-01
## (argmax, free optimization)
## Support for alternative: -378.92 + K
##
## degrees of freedom: 17
## support difference = 122.3
## p-value: 2.6467e-42
Thus the difference is about 122 units of support, surely significant.
points_scored <- rowSums(eurovision_table,na.rm=TRUE) # points
ox <- rank(-points_scored)
oy <- rank(-eurovision_maxp) # m = MLE
oyp <- ordertrans(oy,names(ox))
par(pty='s') # square plot
plot(ox,oyp,asp=1,pty='s',xlim=c(0,18),ylim=c(0,18),pch=16,
xlab="official order",ylab="my order",main="Eurovision 2009")
par(xpd=TRUE) # allow country names to appear outside plotting region
for(i in seq_along(ox)){text(ox[i],oyp[i],names(ox)[i],pos=4,col='gray') }
par(xpd=FALSE) # stop diagonal line from protruding beyond plotting region
abline(0,1)
hyper3 idiomFirst we use hyper2:
a <- read.table("euro_ranks.txt",header=TRUE)
jj <- table(a$country)
wanted_countries <- names(jj[jj >= 10])
a <- a[a$country %in% wanted_countries,]
H <- hyper2()
for(y in unique(a$year)){
H <- H + race((a$country)[a$year == y])
}
summary(H)
## A hyper2 object of size 27.
## pnames: Albania Armenia Azerbaijan BosHerz Cyprus Denmark Estonia Finland France Germany Greece Hungary Iceland Israel Italy Lithuania Malta Moldova Netherlands Norway Romania Russia Serbia Spain Sweden UK Ukraine
## Number of brackets: 357
## Sum of powers: 0
##
## Table of bracket lengths:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## 27 13 17 19 19 19 19 19 19 19 19 19 18 18 18 18 17 15 12 5 4 3 1
##
## Table of powers:
## -4 -3 -2 -1 9 10 11 12 13 14 15 16 17
## 1 1 3 325 2 5 6 2 3 3 1 1 4
Find the evaluate:
mH <- maxp(H)
dotchart(mH)
Now use hyper3 and follow RB_BB_LF() in
plackett_luce_monster.Rmd. The blocs follow those of Price 2020.
scandinavia <- c("Denmark","Finland","Iceland","Norway","Sweden")
balkans <- c("Italy","Serbia","Albania","BosHerz") # red
eastern <- c("Russia","Ukraine","Moldova","Armenia","Azerbaijan", "Malta","Israel") # purple
black <- c("Hungary","Romania","Greece","Cyprus") # orange
iberia <- c("Spain","Germany") # blue
northern <- c("Denmark", "Estonia", "Finland", "France", "Iceland", "Lithuania", "Netherlands", "Norway", "Sweden", "UK") # green
l <- list(balkans=balkans,eastern=eastern,black=black,iberia=iberia,northern=northern)
ab <- c(balkans,eastern,black,iberia,northern)
stopifnot(all(ab %in% wanted_countries))
stopifnot(all(wanted_countries %in% ab))
list2ec <- function(l){
jj <- stack(setNames(l,seq_along(l)))
out <- as.integer(c(jj$ind))
names(out) <- jj$values
return(out)
}
bloc <- function(lam){
out <- hyper3()
e <- list2ec(l)
for(y in unique(a$year)){
v <- a$country[a$year==y]
ec <- e[names(e) %in% v]
out <- out + cheering3(v=v, e=ec,h=rep(lam,length(unique(e))))
}
return(out)
}
o <- function(lam){ # maximize function o()
H <- bloc(lam)
return(maxp(H,give=TRUE,n=1)$likes)
}
lam <- seq(from=0.6,to=2,len=20)
L <- sapply(lam,o)
plot(lam,L-max(L),type="b",pch=16)
abline(h=c(0,-2))
abline(v=1)
Following lines create eurovision.rda, residing in the data/ directory of the package.
save(eurovision_table,eurovision_maxp,eurovision,file="eurovision.rda")
A. Price 2020. “Identifying voting blocs in the Eurovision Song Contest”. Towards Data Science,
https://medium.com/towards-data-science/identifying-voting-blocs-in-the-eurovision-song-contest-4792065fc337
hyper2 Package: Likelihood Functions for Generalized Bradley-Terry Models.” The R Journal 9 (2): 429–39.