gradient.Rd
Given a hyper2
object and a point in probability space,
function gradient()
returns the gradient of the log-likelihood;
function hessian()
returns the bordered Hessian matrix. By
default, both functions are evaluated at the maximum likelihood estimate
for \(p\), as given by maxp()
.
A hyper2
object
Components of a hyper2
object
A vector of probabilities
Character vector of names
Boolean, with default TRUE
meaning to return the
bordered Hessian and FALSE
meaning to return the Hessian
(warning: this option does not respect the unit sum constraint)
A bordered Hessian matrix, understood to have a single
constraint (the unit sum) at the last row and column; the output of
hessian(border=TRUE)
Boolean with default FALSE
meaning for function
is_ok_hessian()
to return whether or not M
corresponds to a negative-definite matrix, and TRUE
meaning
to return more details
Function gradient()
returns the gradient of the log-likelihood
function. If the hyper2
object is of size \(n\), then argument
probs
may be a vector of length \(n-1\) or \(n\); in the
former case it is interpreted as indep(p)
. In both cases, the
returned gradient is a vector of length \(n-1\).
The function returns the derivative of the loglikelihood with respect to
the \(n-1\) independent components of
\(\left(p_1,\ldots,p_n\right)\), namely
\(\left(p_1,\ldots,p_{n-1}\right)\). The fillup
value \(p_n\) is calculated as
\(1-\left(p_1+\cdots + p_{n-1}\right)\).
Function gradientn()
returns the gradient of the loglikelihood
function but ignores the unit sum constraint. If the hyper2
object is of size \(n\), then argument probs
must be a vector
of length \(n\), and the function returns a named vector of length
\(n\). The last element of the vector is not treated differently from
the others; all \(n\) elements are treated as independent. The sum
need not equal one.
Function hessian()
returns the bordered Hessian, a matrix
of size \(n+1\times n+1\), which is useful when using
Lagrange's method of undetermined multipliers. The first row and column
correspond to the unit sum constraint, \(\sum p_1=1\).
Row and column names of the matrix are the pnames()
of the
hyper2
object, plus “usc
” for “Unit Sum
Constraint”.
The unit sum constraint borders could have been added with idiom
magic::adiag(0,pad=1,hess)
, which might be preferable.
Function is_ok_hessian()
returns the result of the second
derivative test for the maximum likelihood estimate being a local
maximum on the constraint hypersurface. This is a generalization of the
usual unconstrained problem, for which the test is the Hessian's being
negative-definite.
Function hessian_lowlevel()
is a low-level helper function that
calls the C++ routine.
Further examples and discussion is given in file
inst/gradient.Rmd
. See also the discussion at maxp on the
different optimization routines available.
Function gradient()
returns a vector of length \(n-1\) with
entries being the gradient of the log-likelihood with respect to the
\(n-1\) independent components of
\(\left(p_1,\ldots,p_n\right)\), namely
\(\left(p_1,\ldots,p_{n-1}\right)\). The fillup
value \(p_n\) is calculated as
\(1-\left(p_1,\ldots,p_{n-1}\right)\).
If argument border
is TRUE
, function hessian()
returns an \(n\)-by-\(n\) matrix of second derivatives; the borders
are as returned by gradient()
. If border
is FALSE
,
ignore the fillup value and return an \(n-1\)-by-\(n-1\) matrix.
Calling hessian()
at the evaluate will not return exact zeros for
the constraint on the fillup value; gradient()
is used and this
does not return exactly zeros at the evaluate.
data(chess)
p <- c(1/2,1/3)
delta <- rnorm(2)/1e5 # delta needs to be quite small
deltaL <- loglik(p+delta,chess) - loglik(p,chess)
deltaLn <- sum(delta*gradient(chess,p + delta/2)) # numeric
deltaL - deltaLn # should be small [zero to first order]
#> [1] 4.251371e-13
H <- hessian(icons)
is_ok_hessian(H)
#> [1] 7.126371e-02 -2.728246e+01 1.824722e+04 -7.317948e+06