We denote the raw empirical moment by $$ m_j = \frac1n \sum_{i=1}^n x_i^j, $$ and the centered empirical moment by $$ \mu_j = \frac1n \sum_{i=1}^n (x_i-m_1)^j. $$ Starting values are computed in R/util-startarg.R. We give below the starting values for discrete and continuous distributions and refer to the bibliography section for details.
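As a minimal sketch of the two moments used throughout, the helpers below compute them in plain Python. The function names `raw_moment` and `centered_moment` are hypothetical and only serve this illustration; the centered moment is taken to be the usual average of (x_i − m₁)^j.

```python
def raw_moment(x, j):
    """Raw empirical moment m_j = (1/n) * sum(x_i ** j)."""
    return sum(xi ** j for xi in x) / len(x)


def centered_moment(x, j):
    """Centered empirical moment mu_j = (1/n) * sum((x_i - m_1) ** j)."""
    m1 = raw_moment(x, 1)
    return sum((xi - m1) ** j for xi in x) / len(x)


sample = [1.0, 2.0, 3.0, 4.0]
m1 = raw_moment(sample, 1)
m2 = raw_moment(sample, 2)
mu2 = centered_moment(sample, 2)  # equals m2 - m1 ** 2
```

Note that μ₂ divides by n, not n − 1, so it is the biased empirical variance.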

1. Discrete distributions

1.1. Base R distributions

1.1.1. Geometric distribution

The MME is used: p̂ = 1/(1 + m₁).

1.1.2. Negative binomial distribution

The MME is used: n̂ = m₁²/(μ₂ − m₁).

1.1.3. Poisson distribution

Both the MME and the MLE are λ̂ = m₁.

1.1.4. Binomial distribution

The MME is used: Var[X]/E[X] = 1 − p ⇒ p̂ = 1 − μ₂/m₁. The size parameter is n̂ = ⌈max(max_i x_i, m₁/p̂)⌉.
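The four moment-based rules of this subsection can be sketched as follows. This is an illustration under stated assumptions, not the code of R/util-startarg.R: the function names and the returned dictionary keys are hypothetical, and no safeguard is included for samples where the formulas break down (for instance μ₂ ≤ m₁ for the negative binomial, or μ₂ ≥ m₁ for the binomial).

```python
import math


def moments(x):
    """Return (m1, mu2): the sample mean and the centered second moment."""
    n = len(x)
    m1 = sum(x) / n
    mu2 = sum((xi - m1) ** 2 for xi in x) / n
    return m1, mu2


def start_geom(x):
    m1, _ = moments(x)
    return {"prob": 1 / (1 + m1)}


def start_nbinom(x):
    # Only meaningful for overdispersed data (mu2 > m1).
    m1, mu2 = moments(x)
    return {"size": m1 ** 2 / (mu2 - m1)}


def start_pois(x):
    m1, _ = moments(x)
    return {"lambda": m1}


def start_binom(x):
    # Only meaningful for underdispersed data (mu2 < m1), so that 0 < prob < 1.
    m1, mu2 = moments(x)
    prob = 1 - mu2 / m1
    size = math.ceil(max(max(x), m1 / prob))
    return {"size": size, "prob": prob}
```

For example, `start_binom([2, 3, 3, 4, 2, 4, 3, 3])` uses m₁ = 3 and μ₂ = 0.5, so p̂ = 5/6 and the size is the larger of the sample maximum 4 and m₁/p̂ = 3.6.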

1.2. Logarithmic distribution

The expectation simplifies for small values of p, since log(1 − p) ≈ −p: $$ E[X] = -\frac{1}{\log(1-p)}\frac{p}{1-p} \approx -\frac{1}{-p}\frac{p}{1-p} =\frac{1}{1-p}. $$ So the initial estimate is p̂ = 1 − 1/m₁.

1.3. Zero truncated distributions

These distributions are the distribution of X|X > 0 when X follows a particular discrete distribution. Hence the initial estimates are the ones used for base R distributions on the sample x − 1.

1.4. Zero modified distributions

The MLE of the probability parameter is the empirical mass at 0 $\hat p_0=\frac1n \sum_i 1_{x_i=0}$. For other estimators we use the classical estimator with probability parameter 1 − p̂₀.

1.5. Poisson inverse Gaussian distribution

The first two moments are E[X] = μ, Var[X] = μ + ϕμ³. So the initial estimates are μ̂ = m₁, ϕ̂ = (μ₂ − m₁)/m₁³.

2. Continuous distributions

2.1. Normal distribution

The MLE is the MME so we use the empirical mean and variance.

2.2. Lognormal distribution

The log sample follows a normal distribution, so we use the normal initial values on the log sample.

2.3. Beta distribution (of the first kind)

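The density function for a beta ℬe(a, b) is $$ f_X(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1}(1-x)^{b-1}. $$ The initial estimate is the MME $$ \hat a = m_1 \delta, \quad \hat b = (1-m_1)\delta, \quad \delta = \frac{m_1(1-m_1)}{\mu_2}-1. $$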
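The beta moment estimator can be sketched as below, assuming the textbook matching of the mean a/(a + b) and the variance ab/((a + b)²(a + b + 1)) with m₁ and μ₂. The function name `start_beta` is hypothetical and the sample is assumed to lie strictly inside (0, 1).

```python
def start_beta(x):
    """Moment-matching starting values (a, b) for a beta sample in (0, 1)."""
    n = len(x)
    m1 = sum(x) / n
    mu2 = sum((xi - m1) ** 2 for xi in x) / n
    # delta estimates a + b; it is positive when mu2 < m1 * (1 - m1).
    delta = m1 * (1 - m1) / mu2 - 1
    return m1 * delta, (1 - m1) * delta


a_hat, b_hat = start_beta([0.2, 0.4, 0.6, 0.8])
```

On this sample m₁ = 0.5 and μ₂ = 0.05, so δ = 4 and both shape estimates equal 2.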

2.4. Other continuous distributions in `actuar`

2.4.1. Log-gamma

Use the gamma initial values on the sample log(x).

2.4.2. Gumbel

The distribution function is $$ F(x) = \exp\left(-\exp\left(-\frac{x-\alpha}{\theta}\right)\right). $$ Let q₁ and q₃ be the first and the third quartiles, with p₁ = 1/4 and p₃ = 3/4. $$ \left\{\begin{array}{l} -\theta\log(-\log(p_1)) = q_1-\alpha \\ -\theta\log(-\log(p_3)) = q_3-\alpha \end{array}\right. \Leftrightarrow \left\{\begin{array}{l} -\theta\log(-\log(p_1))+\theta\log(-\log(p_3)) = q_1-q_3 \\ \alpha= \theta\log(-\log(p_3)) + q_3 \end{array}\right. \Leftrightarrow \left\{\begin{array}{l} \theta= \frac{q_1-q_3}{\log(-\log(p_3)) - \log(-\log(p_1))} \\ \alpha= \theta\log(-\log(p_3)) + q_3 \end{array}\right. $$ Using the median q₂ for the location parameter α yields the initial estimates $$ \hat\theta= \frac{q_1-q_3}{\log(\log(4/3)) - \log(\log(4))}, \quad \hat\alpha = \hat\theta\log(\log(2)) + q_2. $$
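The quartile matching above can be sketched as follows. The function names are hypothetical, and the choice of `statistics.quantiles(..., method="inclusive")` for the empirical quartiles is an assumption of this sketch; the package may compute quantiles differently.

```python
import math
import statistics


def gumbel_from_quartiles(q1, q2, q3):
    """Return (alpha, theta) from the three quartiles of a Gumbel sample."""
    theta = (q1 - q3) / (math.log(math.log(4 / 3)) - math.log(math.log(4)))
    alpha = theta * math.log(math.log(2)) + q2
    return alpha, theta


def start_gumbel(x):
    # Empirical quartiles (assumed quantile rule, see the lead-in).
    q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return gumbel_from_quartiles(q1, q2, q3)
```

Feeding the exact quartiles of a Gumbel distribution, q_p = α − θ log(−log p), into `gumbel_from_quartiles` returns α and θ exactly, which is a quick sanity check of the formulas.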

2.4.3. Inverse Gaussian distribution

The moments of this distribution are E[X] = μ, Var[X] = μ³ϕ. Hence the initial estimates are μ̂ = m₁, ϕ̂ = μ₂/m₁³.

2.4.4. Generalized beta

This is the distribution of X = θB^{1/τ} when B is beta distributed ℬe(a, b). The moments are $$ E[X] = \theta \beta(a+1/\tau, b)/\beta(a,b) = \theta \frac{\Gamma(a+1/\tau)}{\Gamma(a)}\frac{\Gamma(a+b)}{\Gamma(a+b+1/\tau)}, $$ $$ E[X^2] = \theta^2 \frac{\Gamma(a+2/\tau)}{\Gamma(a)}\frac{\Gamma(a+b)}{\Gamma(a+b+2/\tau)}. $$ Hence for large values of τ, we have $$ E[X^2] /E[X] = \theta \frac{\Gamma(a+2/\tau)}{\Gamma(a+b+2/\tau)} \frac{\Gamma(a+b+1/\tau)}{\Gamma(a+1/\tau)} \approx \theta. $$ Note that the MLE of θ is the maximum. We use $$ \hat\tau=3, \quad \hat\theta = \frac{m_2}{m_1}\max_i x_i 1_{m_2>m_1} +\frac{m_1}{m_2}\max_i x_i 1_{m_2\leq m_1}, $$ then we use the beta initial estimate on the sample $(\frac{x_i}{\hat\theta})^{\hat\tau}$.

2.5. Feller-Pareto family

The Feller-Pareto distribution is the distribution of X = μ + θ(1/B − 1)^{1/γ} when B follows a beta distribution with shape parameters α and τ. See details at https://doi.org/10.18637/jss.v103.i06. Hence, letting Y = (X − μ)/θ, we have $$ \frac{Y^\gamma}{1+Y^\gamma} = 1-B, $$ which is beta distributed with shape parameters τ and α. For γ close to 1, $\frac{Y}{1+Y} = \frac{X-\mu}{\theta+X-\mu}$ is therefore approximately beta distributed with shape parameters τ and α.

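The log-likelihood is $$ \mathcal L(\mu,\theta,\alpha,\gamma,\tau) = (\tau\gamma - 1) \sum_{i} \log\left(\frac{x_i-\mu}\theta\right) - (\alpha+\tau)\sum_i \log\left(1+\left(\frac{x_i-\mu}\theta\right)^\gamma\right) + n\log(\gamma) - n\log(\theta) - n\log(\beta(\alpha,\tau)). $$ The MLE of μ is the minimum.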

Consider the gradient of the log-likelihood with respect to θ, α, γ, τ. Cancelling the first component of the score for γ = α = 2, we get $$ -(2\tau - 1) \sum_{i} \frac{x_i}{\theta(x_i-\mu)} + (2+\tau)\sum_i \frac{2x_i(x_i-\mu)}{\theta^3\left(1+\left(\frac{x_i-\mu}\theta\right)^2\right)} = \frac{n}{\theta} \Leftrightarrow -(2\tau - 1)\theta^2\frac1n \sum_{i} \frac{x_i}{x_i-\mu} + (2+\tau) \frac1n\sum_i \frac{2x_i(x_i-\mu)}{1+\left(\frac{x_i-\mu}\theta\right)^2} = \theta^2 $$ $$ \Leftrightarrow (2+\tau) \frac1n\sum_i \frac{2x_i(x_i-\mu)}{1+\left(\frac{x_i-\mu}\theta\right)^2} = \theta^2\left(1 + (2\tau - 1)\frac1n \sum_{i} \frac{x_i}{x_i-\mu}\right) \Leftrightarrow \sqrt{ \frac{(2+\tau) \frac1n\sum_i \frac{2x_i(x_i-\mu)}{1+\left(\frac{x_i-\mu}\theta\right)^2} }{1 + (2\tau - 1)\frac1n \sum_{i} \frac{x_i}{x_i-\mu}} } = \theta. $$ Neglecting the unknown value of τ and the θ in the denominator gives the initial estimate θ̂, with μ̂ set slightly below the sample minimum as for Pareto IV. Initial values of τ and α are then obtained on the sample (z_i)_i, with z_i = y_i/(1 + y_i) and y_i = (x_i − μ̂)/θ̂, using the MME-based initial values of a beta distribution (Section 2.3).

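Cancelling the last component of the gradient leads to $$ (\gamma - 1) \frac1n\sum_{i} \log\left(\frac{x_i-\mu}\theta\right) - \frac1n\sum_i \log\left(1+\left(\frac{x_i-\mu}\theta\right)^\gamma\right) = \psi(\tau) - \psi(\alpha+\tau) \Leftrightarrow (\gamma - 1) \frac1n\sum_{i} \log\left(\frac{x_i-\mu}\theta\right) = \psi(\tau) - \psi(\alpha+\tau) +\frac1n\sum_i \log\left(1+\left(\frac{x_i-\mu}\theta\right)^\gamma\right). $$ Neglecting the value of γ on the right-hand side (that is, taking γ = 1 in the exponent), we obtain $$ \hat\gamma = 1 + \frac{\psi(\tau) - \psi(\alpha+\tau) +\frac1n\sum_i \log\left(1+\frac{x_i-\mu}\theta\right)}{\frac1n\sum_{i} \log\left(\frac{x_i-\mu}\theta\right)}. $$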

2.5.1. Transformed beta

This is the Feller-Pareto with μ = 0. So, with γ = α = 2, the first component of the Feller-Pareto gradient simplifies to $$ -(2\tau - 1) \sum_{i} \frac{x_i}{\theta x_i} + (2+\tau)\sum_i \frac{2x_i^2}{\theta^3\left(1+\left(\frac{x_i}\theta\right)^2\right)} = \frac{n}{\theta} \Leftrightarrow -(2\tau - 1) \theta^2 + (2+\tau)\frac1n\sum_i \frac{2x_i^2}{1+\left(\frac{x_i}\theta\right)^2} = \theta^2 $$ $$ \Leftrightarrow \theta^2=\frac{2+\tau}{2\tau}\frac1n\sum_i \frac{2x_i^2}{1+\left(\frac{x_i}\theta\right)^2}. $$ Neglecting the unknown value of τ and the θ in the denominator, we get $$ \hat\theta = \sqrt{\frac1n\sum_i \frac{2x_i^2}{1+x_i^2}}. $$ Initial values of τ and α are obtained on the sample (z_i)_i, with z_i = y_i/(1 + y_i) and y_i = x_i/θ̂, using the MME-based initial values of a beta distribution (Section 2.3). Similar to Feller-Pareto, we set $$ \hat\gamma = 1 + \frac{\psi(\tau) - \psi(\alpha+\tau) +\frac1n\sum_i \log\left(1+\frac{x_i}{\hat\theta}\right)}{\frac1n\sum_{i} \log\left(\frac{x_i}{\hat\theta}\right)}. $$

2.5.2. Generalized Pareto

This is the Feller-Pareto with μ = 0, γ = 1. So, with α = 2, the first component of the Feller-Pareto gradient simplifies to $$ -(\tau - 1) \frac{n}{\theta} + (2+\tau)\sum_i \frac{x_i}{\theta^2\left(1+\frac{x_i}\theta\right)} = \frac{n}{\theta} \Leftrightarrow -(\tau - 1) \theta + (2+\tau)\frac1n\sum_i \frac{x_i}{1+\frac{x_i}\theta} = \theta. $$ Neglecting the unknown value of τ and the θ in the denominator leads to $$ \hat\theta = \frac1n\sum_i \frac{x_i}{1+x_i}. $$

Initial values of τ and α are obtained on the sample (z_i)_i, with z_i = y_i/(1 + y_i) and y_i = x_i/θ̂, using the MME-based initial values of a beta distribution (Section 2.3).

2.5.3. Burr

Burr is a Feller-Pareto distribution with μ = 0, τ = 1.

The survival function is 1 − F(x) = (1 + (x/θ)^γ)^{−α}. Using the median q₂, we have log(1/2) = −α log(1 + (q₂/θ)^γ). The initial value is $$ \hat\alpha = \frac{\log(2)}{\log\left(1+(q_2/\hat\theta)^{\hat\gamma}\right)}, $$ evaluated at the initial values θ̂ and γ̂ of the other two parameters.

With γ = α = 2, τ = 1, μ = 0, the first component of the Feller-Pareto gradient simplifies to $$ -\frac{n}{\theta} + 3\sum_i \frac{2x_i\left(\frac{x_i}\theta\right)}{\theta^2\left(1+\left(\frac{x_i}\theta\right)^2\right)} = \frac{n}{\theta} \Leftrightarrow \theta^2 = \frac32\,\frac1n\sum_i \frac{2x_i^2}{1+\left(\frac{x_i}\theta\right)^2}. $$ Neglecting the θ in the denominator, we get $$ \hat\theta = \sqrt{\frac3n\sum_i \frac{x_i^2}{1+x_i^2}}. $$ For γ̂, we use the Feller-Pareto estimate of γ with τ = 1, α = 2 and the previous θ̂.

2.5.4. Loglogistic

Loglogistic is a Feller-Pareto distribution with μ = 0, τ = 1, α = 1. The survival function is 1 − F(x) = (1 + (x/θ)^γ)⁻¹. So $$ \frac1{1-F(x)}-1 = (x/\theta)^\gamma \Leftrightarrow \log\left(\frac{F(x)}{1-F(x)}\right) = \gamma\log(x/\theta). $$ Let q₁ and q₃ be the first and the third quartiles, so that F(q₁) = 1/4 and F(q₃) = 3/4. $$ \log\left(\frac{1/4}{3/4}\right)= \gamma\log(q_1/\theta), \ \log\left(\frac{3/4}{1/4}\right)= \gamma\log(q_3/\theta) \Leftrightarrow -\log(3)= \gamma\log(q_1/\theta), \ \log(3)= \gamma\log(q_3/\theta). $$ The difference of the previous equations simplifies to $$ \hat\gamma=\frac{2\log(3)}{\log(q_3/q_1)}. $$ The sum of the previous equations is 0 = γ log(q₁) + γ log(q₃) − 2γ log(θ), so θ̂ = √(q₁q₃).
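The logit relation log(F(x)/(1 − F(x))) = γ log(x/θ) can be solved for any pair of quantiles of orders p and 1 − p, which gives the sketch below. The function name `start_llogis` and its signature are hypothetical; with p = 1/4 the odds are 1/3 and 3, so the numerator is 2 log(3).

```python
import math


def start_llogis(q_low, q_high, p=0.25):
    """Return (gamma, theta) from the quantiles of order p and 1 - p.

    Difference of the two logit equations: 2*log((1-p)/p) = gamma*log(q_high/q_low).
    Sum of the two logit equations: log(theta) = (log(q_low) + log(q_high)) / 2.
    """
    gamma = 2 * math.log((1 - p) / p) / math.log(q_high / q_low)
    theta = math.sqrt(q_low * q_high)
    return gamma, theta
```

With the exact quartiles of a loglogistic distribution, q_p = θ (p/(1 − p))^{1/γ}, the function returns γ and θ exactly.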

2.5.5. Paralogistic

Paralogistic is a Feller-Pareto distribution with μ = 0, τ = 1, α = γ. The survival function is 1 − F(x) = (1 + (x/θ)^α)^{−α}. So log(1 − F(x)) = −α log(1 + (x/θ)^α). The log-likelihood is $$ \mathcal L(\theta,\alpha) = (\alpha - 1) \sum_i \log\left(\frac{x_i}\theta\right) - (\alpha+1)\sum_i \log\left(1+\left(\frac{x_i}\theta\right)^\alpha\right) + 2n\log(\alpha) - n\log(\theta). $$ The gradient with respect to θ, α is $$ \begin{pmatrix} ( \alpha - 1)\frac{-n}{\theta} - (\alpha+1)\sum_i \frac{-x_i\alpha(x_i/\theta)^{\alpha-1}}{\theta^2\left(1+\left(\frac{x_i}\theta\right)^\alpha\right)} - \frac{n}{\theta} \\ \sum_{i} \log\left(\frac{ \frac{x_i}\theta}{1+\left(\frac{x_i}\theta\right)^\alpha }\right) - (\alpha+1)\sum_i \frac{\left(\frac{x_i}\theta\right)^\alpha \log(x_i/\theta)}{1+\left(\frac{x_i}\theta\right)^\alpha} + \frac{2n}{\alpha} \end{pmatrix}. $$ The first component cancels when $$ - (\alpha+1)\sum_i \frac{-x_i\alpha(x_i/\theta)^{\alpha-1}}{\theta^2\left(1+\left(\frac{x_i}\theta\right)^\alpha\right)} = \frac{\alpha n}{\theta} \Leftrightarrow (\alpha+1)\frac1n\sum_i \frac{ x_i^{\alpha}}{1+\left(\frac{x_i}\theta\right)^\alpha} = \theta^\alpha. $$ The second component cancels when $$ \frac1n\sum_{i} \log\left(\frac{ \frac{x_i}\theta}{1+\left(\frac{x_i}\theta\right)^\alpha }\right) = -\frac2\alpha +(\alpha+1)\frac1n\sum_i \frac{\left(\frac{x_i}\theta\right)^\alpha \log(x_i/\theta)}{1+\left(\frac{x_i}\theta\right)^\alpha}. $$ Choosing θ = 1, α = 2 in the sums leads to $$ \frac1n\sum_{i} \log\left(\frac{x_i}{1+x_i^2 }\right) - \frac1n\sum_i \frac{x_i^2\log(x_i)}{1+x_i^2} = -\frac2\alpha +\alpha\frac1n\sum_i \frac{x_i^2\log(x_i)}{1+x_i^2}. $$ The initial estimators are obtained from these two conditions.

2.5.6. Inverse Burr

Use the Burr estimate on the sample 1/x, then invert the scale parameter.

2.5.7. Inverse paralogistic

Use the paralogistic estimate on the sample 1/x, then invert the scale parameter.

2.5.8. Inverse Pareto

Use the Pareto estimate on the sample 1/x, then invert the scale parameter.

2.5.9. Pareto IV

The survival function is $$ 1-F(x) = \left(1+ \left(\frac{x-\mu}{\theta}\right)^{\gamma} \right)^{-\alpha}, $$ see ?Pareto4 in actuar.

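The first and third quartiles q₁ and q₃ verify $$ \left(\left(\frac34\right)^{-1/\alpha}-1\right)^{1/\gamma} = \frac{q_1-\mu}{\theta}, \quad \left(\left(\frac14\right)^{-1/\alpha}-1\right)^{1/\gamma} = \frac{q_3-\mu}{\theta}. $$ Hence, taking the ratio and the difference of these equations, we get two useful relations $$ \gamma = \frac{\log\left(\frac{(\frac43)^{1/\alpha}-1}{4^{1/\alpha}-1}\right)}{\log\left(\frac{q_1-\mu}{q_3-\mu}\right)}, \quad \theta = \frac{q_1-q_3}{\left(\left(\frac43\right)^{1/\alpha}-1\right)^{1/\gamma} - \left(4^{1/\alpha}-1\right)^{1/\gamma}}. $$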

The log-likelihood of a Pareto IV sample (see Equation (5.2.94) of Arnold (2015) updated with Goulet et al. notation) is $$ \mathcal L(\mu,\theta,\gamma,\alpha) = (\gamma -1) \sum_i \log\left(\frac{x_i-\mu}{\theta}\right) -(\alpha+1)\sum_i \log\left(1+ \left(\frac{x_i-\mu}{\theta}\right)^{\gamma}\right) +n\log(\gamma) -n\log(\theta)+n\log(\alpha). $$ Cancelling the derivative of ℒ(μ, θ, γ, α) with respect to α leads to $$ \alpha = \frac{n}{\sum_i \log\left(1+ \left(\frac{x_i-\mu}{\theta}\right)^{\gamma}\right)}. $$

The MLE of the threshold parameter μ is the minimum. So the initial estimate μ̂ is set slightly below the sample minimum, with a margin controlled by ϵ = 0.05, so that all observations are strictly above it.

Initial parameter estimation is μ̂, α^⋆ = 2, γ̂ from the quartile relation for γ with α^⋆, θ̂ from the quartile relation for θ with α^⋆ and γ̂, α̂ from the likelihood equation for α with μ̂, θ̂ and γ̂.
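The chain of estimates can be sketched as follows, solving the two quartile equations for γ (their ratio) and θ (their difference), and the α-component of the score for α. The function names are hypothetical, and μ is passed in as an argument because this sketch does not assume a particular formula for μ̂.

```python
import math


def pareto4_gamma(q1, q3, mu, alpha):
    """gamma from the ratio of the two quartile equations, for a given alpha."""
    a = (4 / 3) ** (1 / alpha) - 1
    b = 4 ** (1 / alpha) - 1
    return math.log(a / b) / math.log((q1 - mu) / (q3 - mu))


def pareto4_theta(q1, q3, alpha, gamma):
    """theta from the difference of the two quartile equations."""
    a = (4 / 3) ** (1 / alpha) - 1
    b = 4 ** (1 / alpha) - 1
    return (q1 - q3) / (a ** (1 / gamma) - b ** (1 / gamma))


def pareto4_alpha(x, mu, theta, gamma):
    """alpha cancelling the derivative of the log-likelihood in alpha."""
    s = sum(math.log1p(((xi - mu) / theta) ** gamma) for xi in x)
    return len(x) / s
```

A typical call sequence would be `g = pareto4_gamma(q1, q3, mu_hat, 2.0)`, then `t = pareto4_theta(q1, q3, 2.0, g)`, then `pareto4_alpha(x, mu_hat, t, g)`, where `mu_hat` must be strictly below every observation.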

2.5.10. Pareto III

Pareto III corresponds to Pareto IV with α = 1.

Initial parameter estimation is μ̂, γ̂ from the Pareto IV quartile relation for γ with α = 1, θ̂ from the Pareto IV quartile relation for θ with α = 1 and γ̂.

2.5.11. Pareto II

Pareto II corresponds to Pareto IV with γ = 1.

Initial parameter estimation is μ̂, α^⋆ = 2, θ̂ from the Pareto IV quartile relation for θ with α^⋆ and γ = 1, α̂ from the Pareto IV likelihood equation for α with μ̂, θ̂ and γ = 1.

2.5.12. Pareto I

Pareto I corresponds to Pareto IV with γ = 1, μ = θ.

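The MLE is $$ \hat\mu = \min_i x_i, \quad \hat\alpha = \left(\frac1n \sum_{i=1}^n \log(x_i/\hat\mu)\right)^{-1}. $$

This can be rewritten with the geometric mean of the sample $G_n = (\prod_{i=1}^n x_i)^{1/n}$ as α̂ = 1/log(G_n/μ̂).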

Initial parameter estimation is μ̂, α̂ from the MLE above.
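As an illustration, the Pareto I maximum likelihood estimator and its geometric-mean form can be written as below. Both function names are hypothetical. The shape estimate is the reciprocal of the mean log-ratio to the minimum, so it is undefined when all observations are equal.

```python
import math


def pareto1_mle(x):
    """Return (mu, alpha): sample minimum and n / sum(log(x_i / mu))."""
    mu = min(x)
    alpha = len(x) / sum(math.log(xi / mu) for xi in x)
    return mu, alpha


def pareto1_alpha_from_geometric_mean(x):
    """Same shape estimate written as 1 / log(G_n / mu)."""
    mu = min(x)
    g_n = math.exp(sum(math.log(xi) for xi in x) / len(x))
    return 1 / math.log(g_n / mu)
```

For the sample `[1, 2, 4]` the geometric mean is 2 and the minimum is 1, so both functions give 1/log(2).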

2.5.13. Pareto

Pareto corresponds to Pareto IV with γ = 1, μ = 0.

Initial parameter estimation is α^⋆ = max(2, 2(m₂ − m₁²)/(m₂ − 2m₁²)), where m_i is the empirical raw moment of order i, θ̂ from the Pareto IV quartile relation for θ with α^⋆, γ = 1 and μ = 0, α̂ from the Pareto IV likelihood equation for α with μ = 0, θ̂ and γ = 1.
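The value α^⋆ comes from matching E[X] = θ/(α − 1) and E[X²] = 2θ²/((α − 1)(α − 2)), which gives α = 2(m₂ − m₁²)/(m₂ − 2m₁²). A minimal sketch follows; the function name is hypothetical, and returning the floor value 2 when m₂ − 2m₁² ≤ 0 is an assumption of this sketch, not something stated in the text.

```python
def pareto_alpha_star(x):
    """Preliminary shape value alpha* = max(2, 2*(m2 - m1^2)/(m2 - 2*m1^2))."""
    n = len(x)
    m1 = sum(x) / n
    m2 = sum(xi ** 2 for xi in x) / n
    denom = m2 - 2 * m1 ** 2
    if denom <= 0:
        # Assumption of this sketch: the moment equation has no valid
        # solution here, so fall back to the floor value 2.
        return 2.0
    return max(2.0, 2 * (m2 - m1 ** 2) / denom)
```

For the sample `[1, 1, 1, 9]` we have m₁ = 3 and m₂ = 21, so the ratio is 2 × 12 / 3 = 8.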

2.6. Transformed gamma family

2.6.1. Transformed gamma distribution

The log-likelihood is given by ℒ(α, τ, θ) = n log(τ) + ατ ∑_i log(x_i/θ) − ∑_i (x_i/θ)^τ − ∑_i log(x_i) − n log(Γ(α)). The gradient with respect to α, τ, θ is given by $$ \begin{pmatrix} \tau\sum_i \log(x_i/\theta) - n\psi(\alpha) \\ n/\tau + \alpha\sum_i \log(x_i/\theta) -\sum_i (x_i/\theta)^{\tau} \log(x_i/\theta) \\ -n\alpha\tau /\theta +\sum_i \tau \frac{x_i}{\theta^2}(x_i/\theta)^{\tau-1} \end{pmatrix}. $$ We compute the moment-estimator as in gamma, α̂ = m₁²/μ₂, θ̂ = μ₂/m₁. Then cancelling the first component of the gradient we set $$ \hat\tau = \frac{\psi(\hat\alpha)}{\frac1n\sum_i \log(x_i/\hat\theta) }. $$
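The three starting values can be sketched as below, assuming the usual gamma moment matching (mean αθ, variance αθ²) for α̂ and θ̂. The function names are hypothetical. Since the Python standard library has no digamma function, a small approximation based on the recurrence ψ(x) = ψ(x + 1) − 1/x and the standard asymptotic series is included; this is a convenience of the sketch.

```python
import math


def digamma(x):
    """Approximate psi(x) for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def start_trgamma(x):
    """Return (alpha, tau, theta) starting values for a positive sample."""
    n = len(x)
    m1 = sum(x) / n
    mu2 = sum((xi - m1) ** 2 for xi in x) / n
    alpha = m1 ** 2 / mu2
    theta = mu2 / m1
    # tau cancels the first gradient component at (alpha, theta).
    mean_log = sum(math.log(xi / theta) for xi in x) / n
    tau = digamma(alpha) / mean_log
    return alpha, tau, theta
```

No safeguard is included for a zero or negative mean log-ratio, in which case τ̂ would not be a valid positive starting value.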

2.6.2. Gamma distribution

Transformed gamma with τ = 1.

We compute the moment-estimator given by α̂ = m₁²/μ₂, θ̂ = μ₂/m₁.

2.6.3. Weibull distribution

Transformed gamma with α = 1.

Let $\tilde m=\frac1n\sum_i \log(x_i)$ and $\tilde v=\frac1n\sum_i (\log(x_i) - \tilde m)^2$. We use an approximate MME τ̂ = 1.2/sqrt(ṽ), θ̂ = exp(m̃ + 0.572/τ̂). Alternatively, we can use the distribution function F(x) = 1 − e^{−(x/θ)^τ} ⇒ log(−log(1 − F(x))) = τ log(x) − τ log(θ). Hence the QME for Weibull is $$ \tilde\tau = \frac{ \log(-\log(1-p_1)) - \log(-\log(1-p_2)) }{ \log(x_1) - \log(x_2) }, \quad \tilde\theta = x_3/(-\log(1-p_3))^{1/\tilde\tau} $$ with p₁ = 1/4, p₂ = 3/4, p₃ = 1/2, and x_i the corresponding empirical quantiles.

Initial parameters are τ̃ and θ̃, unless the empirical quantiles x₁ and x₂ are equal, in which case we use τ̂ and θ̂.
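The selection rule between the two Weibull estimators can be sketched as follows. The function names are hypothetical, and the use of `statistics.quantiles(..., method="inclusive")` for the empirical quantiles is an assumption of this sketch.

```python
import math
import statistics


def weibull_mme(x):
    """Approximate moment estimator on the log sample: returns (tau, theta)."""
    n = len(x)
    logs = [math.log(xi) for xi in x]
    m = sum(logs) / n
    v = sum((li - m) ** 2 for li in logs) / n
    tau = 1.2 / math.sqrt(v)
    theta = math.exp(m + 0.572 / tau)
    return tau, theta


def weibull_qme(x1, x2, x3, p1=0.25, p2=0.75, p3=0.5):
    """Quantile matching: x1, x2, x3 are the quantiles of order p1, p2, p3."""
    num = math.log(-math.log(1 - p1)) - math.log(-math.log(1 - p2))
    tau = num / (math.log(x1) - math.log(x2))
    theta = x3 / (-math.log(1 - p3)) ** (1 / tau)
    return tau, theta


def start_weibull(x):
    # statistics.quantiles returns the cut points in increasing order:
    # first quartile (x1), median (x3), third quartile (x2).
    x1, x3, x2 = statistics.quantiles(x, n=4, method="inclusive")
    if x1 == x2:
        return weibull_mme(x)
    return weibull_qme(x1, x2, x3)
```

The fallback matters for heavily tied samples, where the first and third quartiles coincide and the quantile estimator would divide by zero.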

2.6.4. Exponential distribution

The MLE is the MME λ̂ = 1/m₁.

2.7. Inverse transformed gamma family

2.7.1. Inverse transformed gamma distribution

Same as the transformed gamma distribution on the sample (1/x_i)_i, then invert the scale parameter.

2.7.2. Inverse gamma distribution

We compute the moment-estimator as α̂ = (2m₂ − m₁²)/(m₂ − m₁²), θ̂ = m₁m₂/(m₂ − m₁²).
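These two formulas follow from matching E[X] = θ/(α − 1) and E[X²] = θ²/((α − 1)(α − 2)) with m₁ and m₂. A minimal sketch, with the hypothetical function name `start_invgamma`:

```python
def start_invgamma(x):
    """Moment-matching starting values (alpha, theta) for an inverse gamma."""
    n = len(x)
    m1 = sum(x) / n
    m2 = sum(xi ** 2 for xi in x) / n
    v = m2 - m1 ** 2  # empirical variance, must be positive
    alpha = (2 * m2 - m1 ** 2) / v
    theta = m1 * m2 / v
    return alpha, theta


alpha_hat, theta_hat = start_invgamma([1.0, 2.0, 3.0, 4.0])
```

On this sample m₁ = 2.5 and m₂ = 7.5, which gives α̂ = 7 and θ̂ = 15; the implied mean θ̂/(α̂ − 1) is 2.5 again. Because m₂ > m₁², the estimate α̂ always exceeds 2, consistent with a finite second moment.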

2.7.3. Inverse Weibull distribution

We use the QME.

2.7.4. Inverse exponential

Same as the exponential distribution on the sample (1/x_i)_i, then invert the rate parameter.

3. Bibliography

3.1. General books

N. L. Johnson, S. Kotz, N. Balakrishnan (1994). Continuous univariate distributions, Volume 1, Wiley.
N. L. Johnson, S. Kotz, N. Balakrishnan (1995). Continuous univariate distributions, Volume 2, Wiley.
N. L. Johnson, A. W. Kemp, S. Kotz (2008). Univariate discrete distributions, Wiley.
G. Wimmer (1999). Thesaurus of univariate discrete probability distributions.

3.2. Books dedicated to a distribution family

M. Ahsanullah, B.M. Golam Kibria, M. Shakil (2014). Normal and Student’s t Distributions and Their Applications, Springer.
B. C. Arnold (2015). Pareto Distributions, Chapman and Hall.
A. Azzalini (2013). The Skew-Normal and Related Families.
N. Balakrishnan (2014). Handbook of the Logistic Distribution, CRC Press.

3.3. Books with applications

C. Forbes, M. Evans, N. Hastings, B. Peacock (2011). Statistical Distributions, Wiley.
Z. A. Karian, E. J. Dudewicz, K. Shimizu (2010). Handbook of Fitting Statistical Distributions with R, CRC Press.
K. Krishnamoorthy (2015). Handbook of Statistical Distributions with Applications, Chapman and Hall.
S. Klugman, H. Panjer, G. Willmot (2019). Loss Models: From Data to Decisions, 5th ed., Wiley.

Starting values used in fitdistrplus