A very short function that returns the probabilities given by Zipf's law, a harmonic rank-probability distribution. Formally,

$$p(i)=\frac{i^{-1}}{\sum_{j=1}^{N} j^{-1}},\qquad i=1,\ldots,N$$

The volleyball dataset might reasonably be assumed to follow Zipf's law, but this hypothesis can be rejected at the \(5\%\) level (p-value of about 0.042); see the examples.
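The formula above can be evaluated directly in base R. The following is a minimal sketch, not the package's own source: the name `zipf_sketch` is hypothetical, and it accepts only an integer `n` (it does not handle hyper2 objects).

```r
# Hypothetical helper illustrating the formula; not the hyper2 implementation.
zipf_sketch <- function(n) {
  w <- 1 / seq_len(n)   # unnormalized weights 1, 1/2, ..., 1/n
  w / sum(w)            # divide by the harmonic sum so the result sums to one
}

p <- zipf_sketch(6)
print(p)       # first element is 20/49, since 1 + 1/2 + ... + 1/6 = 49/20
print(sum(p))  # the probabilities sum to one
```

For n = 6 the normalizing constant is 49/20, so the first probability is 20/49, about 0.408, which agrees with the first value printed for `zipf(icons)` in the examples.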

zipf(n)

Arguments

n

Integer; if a hyper2 object is supplied, it is interpreted as size(n)

Value

Returns a numeric vector of rank probabilities, in decreasing order, summing to one

Author

Robin K. S. Hankin

Examples

zipf(icons)
#>         NB          L         PB        THC         OA       WAIS 
#> 0.40816327 0.20408163 0.13605442 0.10204082 0.08163265 0.06802721 
knownp.test(volleyball,zipf(volleyball))
#> 
#> 	Constrained support maximization
#> 
#> data:  volleyball
#> null hypothesis: p1 = 0.3534858, p2 = 0.1767429, p3 = 0.1178286, p4 = 0.0883714, p5 = 0.0706972, p6 = 0.0589143, p7 = 0.050498, p8 = 0.0441857, p9 = 0.0392762
#> null estimate:
#>         p1         p2         p3         p4         p5         p6         p7 
#> 0.35348576 0.17674288 0.11782859 0.08837144 0.07069715 0.05891429 0.05049797 
#>         p8         p9 
#> 0.04418572 0.03927620 
#> (argmax, constrained optimization)
#> Support for null:  -33.33813 + K
#> 
#> alternative hypothesis:  sum p_i=1 
#> alternative estimate:
#>           p1           p2           p3           p4           p5           p6 
#> 4.174629e-01 7.590639e-02 4.352590e-01 1.051670e-06 9.634895e-04 1.016641e-06 
#>           p7           p8           p9 
#> 1.000184e-06 2.904115e-02 4.136399e-02 
#> (argmax, free optimization)
#> Support for alternative:  -25.32838 + K
#> 
#> degrees of freedom: 8
#> support difference = 8.009745
#> p-value: 0.04210199 
#>
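The reported p-value can be reproduced from the printed support difference and degrees of freedom. The sketch below assumes the test refers twice the support difference to a chi-squared distribution with 8 degrees of freedom; this assumption is inferred from the numbers in the output above, not taken from the package source.

```r
# Values copied from the printed test output above.
support_difference <- 8.009745
degrees_of_freedom <- 8

# Assumption: twice the support difference is compared with a chi-squared
# distribution; pchisq() is the base R (stats) chi-squared distribution function.
p_value <- pchisq(2 * support_difference, df = degrees_of_freedom,
                  lower.tail = FALSE)
print(p_value)  # approximately 0.0421, below the 5% threshold
```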