Jester dataset — jester • hyper2

A likelihood function for the Jester datasets

data(jester)

Details

Object jester is a likelihood function for the 91 jokes rated by the first 150 respondents in file jester_dataset_1_3.zip, taken from Goldberg et al. (2001). Object jester_maxp is the result of running maxp(jester). The results table of (nearly) all jokes and respondents is given as jester_table, in which each row is a joke and each column a respondent.

The dataset is interesting because it has been analysed for patterns by many workers, including Goldberg; here I assume that all the respondents behave identically (but randomly). It is included here because it is a very severe numerical challenge in the context of the hyper2 package. I am not convinced that jester_maxp is even close to the true evaluate.

Objects jester, jester_table, and jester_maxp can be generated by running the script inst/jester.Rmd, which includes some further technical documentation. The script takes about 10 minutes to run.

References

Eigentaste: A Constant Time Collaborative Filtering Algorithm. Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Information Retrieval, 4(2), 133-151. July 2001.

Examples


# maxp(jester)  # takes too long

# Note that the possibly poor identification of the evaluate
#  nevertheless allows us to reject the null of equality:

(LAM <- -2*(loglik(equalp(jester),jester)-loglik(jester_maxp,jester)))
#> [1] 696.279
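# The strengths sum to one, so the alternative has size(jester)-1
#  free parameters and the null of equality has none:
pval <- pchisq(LAM,df=size(jester)-1,lower.tail=FALSE)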
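
A statistic this large gives a p-value too small to read comfortably on the ordinary scale, so it can help to work with its logarithm. The following is a minimal sketch in base R only, not part of the package. It copies the statistic from the printed output above and assumes 90 degrees of freedom (91 jokes, less one for the unit-sum constraint); the names df_assumed, log_pval and log10_pval are illustrative.

```r
# Likelihood-ratio statistic, copied from the printed output above
LAM <- 696.279

# Assumption: one strength per joke (91 jokes), minus one for the
# unit-sum constraint on the strengths
df_assumed <- 91 - 1

# Upper-tail chi-square probability on the natural-log scale;
# log.p=TRUE avoids any loss of precision for very small tails
log_pval <- pchisq(LAM, df = df_assumed, lower.tail = FALSE, log.p = TRUE)

# Same quantity expressed as a power of ten
log10_pval <- log_pval / log(10)
log10_pval
```

If size(jester) differs from 91, substitute size(jester)-1 for df_assumed; the conclusion that the null of equality is rejected is unlikely to be sensitive to this choice.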