Differential calculus — gradient • hyper2

Given a hyper2 object and a point in probability space, function gradient() returns the gradient of the log-likelihood; function hessian() returns the bordered Hessian matrix. By default, both functions are evaluated at the maximum likelihood estimate for $p$ , as given by maxp().

gradient(H, probs=indep(maxp(H)))
hessian(H,probs=indep(maxp(H)),border=TRUE)
hessian_lowlevel(L, powers, probs, pnames,n) 
is_ok_hessian(M, give=TRUE)

Arguments

H: A hyper2 object
L,powers,n: Components of a hyper2 object
probs: A vector of probabilities
pnames: Character vector of names
border: Boolean, with default TRUE meaning to return the bordered Hessian and FALSE meaning to return the Hessian (warning: this option does not respect the unit sum constraint)
M: A bordered Hessian matrix, understood to have a single constraint (the unit sum) at the last row and column; the output of hessian(border=TRUE)
give: Boolean with default FALSE meaning for function is_ok_hessian() to return whether or not M corresponds to a negative-definite matrix, and TRUE meaning to return more details

Details

Function gradient() returns the gradient of the log-likelihood function. If the hyper2 object is of size $n$ , then argument probs may be a vector of length $n-1$ or $n$ ; in the former case it is interpreted as indep(p). In both cases, the returned gradient is a vector of length $n-1$ . The function returns the derivative of the loglikelihood with respect to the $n-1$ independent components of $\left(p_1,\ldots,p_n\right)$ , namely $\left(p_1,\ldots,p_{n-1}\right)$ . The fillup value $p_n$ is calculated as $1-\left(p_1+\cdots + p_{n-1}\right)$ .

Function gradientn() returns the gradient of the loglikelihood function but ignores the unit sum constraint. If the hyper2 object is of size $n$ , then argument probs must be a vector of length $n$ , and the function returns a named vector of length $n$ . The last element of the vector is not treated differently from the others; all $n$ elements are treated as independent. The sum need not equal one.

Function hessian() returns the bordered Hessian, a matrix of size $n+1\times n+1$ , which is useful when using Lagrange's method of undetermined multipliers. The first row and column correspond to the unit sum constraint, $\sum p_1=1$ . Row and column names of the matrix are the pnames() of the hyper2 object, plus “usc” for “Unit Sum Constraint”.

The unit sum constraint borders could have been added with idiom magic::adiag(0,pad=1,hess), which might be preferable.

Function is_ok_hessian() returns the result of the second derivative test for the maximum likelihood estimate being a local maximum on the constraint hypersurface. This is a generalization of the usual unconstrained problem, for which the test is the Hessian's being negative-definite.

Function hessian_lowlevel() is a low-level helper function that calls the C++ routine.

Further examples and discussion is given in file inst/gradient.Rmd. See also the discussion at maxp on the different optimization routines available.

Value

Function gradient() returns a vector of length $n-1$ with entries being the gradient of the log-likelihood with respect to the $n-1$ independent components of $\left(p_1,\ldots,p_n\right)$ , namely $\left(p_1,\ldots,p_{n-1}\right)$ . The fillup value $p_n$ is calculated as $1-\left(p_1,\ldots,p_{n-1}\right)$ .

If argument border is TRUE, function hessian() returns an $n$ -by- $n$ matrix of second derivatives; the borders are as returned by gradient(). If border is FALSE, ignore the fillup value and return an $n-1$ -by- $n-1$ matrix.

Calling hessian() at the evaluate will not return exact zeros for the constraint on the fillup value; gradient() is used and this does not return exactly zeros at the evaluate.

Author

Robin K. S. Hankin

Examples


data(chess)
p <- c(1/2,1/3)
delta <- rnorm(2)/1e5  # delta needs to be quite small

deltaL  <- loglik(p+delta,chess) - loglik(p,chess)
deltaLn <- sum(delta*gradient(chess,p + delta/2))   # numeric

deltaL - deltaLn  # should be small [zero to first order]
#> [1] 5.110577e-12

H <- hessian(icons)
is_ok_hessian(H)
#> [1]  7.126371e-02 -2.728246e+01  1.824722e+04 -7.317948e+06