Simpson's diversity index
simpson.Rd
Simpson's diversity index
Details
Returns the Simpson index \(D\): the probability that two randomly sampled individuals belong to different species.
There is some confusion as to the precise definition: some authors specify that the two individuals are necessarily distinct (ie sampling without replacement), and some do not.
Simpson (1949) assumed sampling without replacement and gave
$$ 1-\frac{\sum_{i=1}^Sn_i\left(n_i-1\right)}{J(J-1)} $$ in our notation.
He and Hu (2005) assumed sampling with replacement: $$ 1-\frac{\sum_{i=1}^Sn_i^2}{J^2}. $$
The difference is largely academic but is most pronounced when many species occur with low counts (ie close to 1).
References
S. P. Hubbell 2001. “The Unified Neutral Theory of Biodiversity”. Princeton University Press.
F. He and X.-S. Hu 2005. “Hubbell's Fundamental Biodiversity Parameter and the Simpson Diversity Index”. Ecology Letters, volume 8, pp386-390. doi:
10.1111/j.1461-0248.2005.00729.x
E. H. Simpson 1949. “Measurement of diversity”, Nature, volume 163, p688
Examples
data(butterflies)
D <- simpson(butterflies)
theta <- optimal.prob(butterflies)*2*no.of.ind(butterflies)
# compare theta with D/(1-D) (should be roughly equal; see He & Hu 2005):
theta
#> [1] 9.989579
D/(1-D)
#> [1] 11.32517
# Second argument pedantic in practice.
# Mostly, the difference is small:
simpson(butterflies,FALSE) - simpson(butterflies,TRUE)
#> [1] 0.002443791
# Most extreme example:
x <- count(c(1,1))
simpson(x,TRUE)
#> [1] 0.5
simpson(x,FALSE)
#> [1] 1