Starting values used in fitdistrplus

We denote the raw empirical moments by $$ m_j = \frac1n \sum_{i=1}^n x_i^j, $$ and the centered empirical moments by $$ \mu_j = \frac1n \sum_{i=1}^n (x_i-m_1)^j. $$ Starting values are computed in R/util-startarg.R. We give below the starting values for discrete and continuous distributions and refer to the bibliography section for details.
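For instance, these empirical moments could be computed in R as follows (a minimal sketch; the helper names are illustrative and not those of R/util-startarg.R):

```r
# raw empirical moment m_j = 1/n * sum_i x_i^j
memp <- function(x, j) mean(x^j)

# centered empirical moment mu_j = 1/n * sum_i (x_i - m_1)^j
mcemp <- function(x, j) mean((x - mean(x))^j)
```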

1. Discrete distributions

1.1. Base R distributions

1.1.1. Geometric distribution

The MME is used: $\hat p = 1/(1 + m_1)$.

1.1.2. Negative binomial distribution

The MME is used: $\hat n = m_1^2/(\mu_2 - m_1)$.

1.1.3. Poisson distribution

Both the MME and the MLE give $\hat\lambda = m_1$.

1.1.4. Binomial distribution

The MME is used: $Var[X]/E[X] = 1 - p$ implies $\hat p = 1 - \mu_2/m_1$. The size parameter is $\hat n = \lceil\max(\max_i x_i, m_1/\hat p)\rceil$.
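The four MME-based starting values above can be gathered in a minimal R sketch (function names are illustrative; the list components follow the base R parameter names, and completing the negative binomial with mu = m1 is our assumption):

```r
# geometric: p-hat = 1/(1 + m1)
start_geom <- function(x) list(prob = 1 / (1 + mean(x)))

# negative binomial: size-hat = m1^2/(mu2 - m1), requires overdispersion (mu2 > m1)
start_nbinom <- function(x) {
  m1 <- mean(x); mu2 <- mean((x - m1)^2)
  list(size = m1^2 / (mu2 - m1), mu = m1)
}

# Poisson: lambda-hat = m1 (both MME and MLE)
start_pois <- function(x) list(lambda = mean(x))

# binomial: p-hat = 1 - mu2/m1, then n-hat = ceiling(max(max(x), m1/p-hat))
start_binom <- function(x) {
  m1 <- mean(x); mu2 <- mean((x - m1)^2)
  p <- 1 - mu2 / m1
  list(size = ceiling(max(max(x), m1 / p)), prob = p)
}
```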

1.2. Logarithmic distribution

The expectation simplifies for small values of p: $$ E[X] = -\frac{1}{\log(1-p)}\frac{p}{1-p} \approx -\frac{1}{-p}\frac{p}{1-p} =\frac{1}{1-p}. $$ So the initial estimate is $\hat p = 1 - 1/m_1$.
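In R this is a one-liner (illustrative sketch):

```r
# logarithmic: p-hat = 1 - 1/m1, from E[X] ~ 1/(1-p) for small p
start_logarithmic <- function(x) list(prob = 1 - 1 / mean(x))
```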

1.3. Zero truncated distributions

These distributions are the distributions of X|X > 0 when X follows a particular discrete distribution. Hence the initial estimates are the ones used for the corresponding base R distribution on the sample x − 1.

1.4. Zero modified distributions

The MLE of the probability parameter is the empirical mass at 0, $\hat p_0=\frac1n \sum_i 1_{x_i=0}$. For the other parameters we use the classical estimators with probability parameter $1 - \hat p_0$.

1.5. Poisson inverse Gaussian distribution

The first two moments are E[X] = μ, Var[X] = μ + ϕμ³. So the initial estimates are $\hat\mu = m_1$, $\hat\phi = (\mu_2 - m_1)/m_1^3$.
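A sketch in R (we assume the mean/dispersion argument names of actuar's ?dpoisinvgauss):

```r
# Poisson inverse Gaussian: mu-hat = m1, phi-hat = (mu2 - m1)/m1^3
start_poisinvgauss <- function(x) {
  m1 <- mean(x); mu2 <- mean((x - m1)^2)
  list(mean = m1, dispersion = (mu2 - m1) / m1^3)
}
```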

2. Continuous distributions

2.1. Normal distribution

The MLE is the MME so we use the empirical mean and variance.

2.2. Lognormal distribution

The log sample follows a normal distribution, so we use the normal starting values on the log sample.

2.3. Beta distribution (of the first kind)

The density function of a beta distribution $\mathcal Be(a, b)$ is $$ f_X(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1}(1-x)^{b-1}. $$ The initial estimate is the MME $$ \hat a = m_1\left(\frac{m_1(1-m_1)}{\mu_2}-1\right), \quad \hat b = (1-m_1)\left(\frac{m_1(1-m_1)}{\mu_2}-1\right). $$
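In R, this MME could be sketched as follows (shape1/shape2 are the base R names for a and b):

```r
# beta MME: delta = m1*(1-m1)/mu2 - 1, a-hat = m1*delta, b-hat = (1-m1)*delta
start_beta <- function(x) {
  m1 <- mean(x); mu2 <- mean((x - m1)^2)
  delta <- m1 * (1 - m1) / mu2 - 1
  list(shape1 = m1 * delta, shape2 = (1 - m1) * delta)
}
```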

2.4. Other continuous distribution in actuar

2.4.1. Log-gamma

Use the gamma initial values on the sample $\log(x)$.

2.4.2. Gumbel

The distribution function is $$ F(x) = \exp\left(-\exp\left(-\frac{x-\alpha}{\theta}\right)\right). $$ Let q1 and q3 be the first and the third quartiles, corresponding to probabilities p1 = 1/4 and p3 = 3/4. $$ \left\{\begin{array}{l} -\theta\log(-\log(p_1)) = q_1-\alpha \\ -\theta\log(-\log(p_3)) = q_3-\alpha \end{array}\right. \Leftrightarrow \left\{\begin{array}{l} -\theta\log(-\log(p_1))+\theta\log(-\log(p_3)) = q_1-q_3 \\ \alpha= \theta\log(-\log(p_3)) + q_3 \end{array}\right. \Leftrightarrow \left\{\begin{array}{l} \theta= \frac{q_1-q_3}{\log(-\log(p_3)) - \log(-\log(p_1))} \\ \alpha= \theta\log(-\log(p_3)) + q_3 \end{array}\right. $$ Using the median q2 for the location parameter α yields the initial estimates $$ \hat\theta= \frac{q_1-q_3}{\log(\log(4/3)) - \log(\log(4))}, \quad \hat\alpha = \hat\theta\log(\log(2)) + q_2. $$
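A minimal R sketch of these quantile-based Gumbel starting values (the alpha/scale component names are illustrative):

```r
# Gumbel: theta-hat from the quartile equations, alpha-hat from the median
start_gumbel <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  theta <- (q[1] - q[3]) / (log(log(4/3)) - log(log(4)))
  alpha <- theta * log(log(2)) + q[2]
  list(alpha = alpha, scale = theta)
}
```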

2.4.3. Inverse Gaussian distribution

The moments of this distribution are E[X] = μ, Var[X] = μ³ϕ. Hence the initial estimates are $\hat\mu = m_1$, $\hat\phi = \mu_2/m_1^3$.

2.4.4. Generalized beta

This is the distribution of $\theta X^{1/\tau}$ when X is beta distributed $\mathcal Be(a, b)$. The moments are $$ E[X] = \theta \beta(a+1/\tau, b)/\beta(a,b) = \theta \frac{\Gamma(a+1/\tau)}{\Gamma(a)}\frac{\Gamma(a+b)}{\Gamma(a+b+1/\tau)}, $$ $$ E[X^2] = \theta^2 \frac{\Gamma(a+2/\tau)}{\Gamma(a)}\frac{\Gamma(a+b)}{\Gamma(a+b+2/\tau)}. $$ Hence for large values of τ, we have $$ E[X^2] /E[X] = \theta \frac{\Gamma(a+2/\tau)}{\Gamma(a+b+2/\tau)} \frac{\Gamma(a+b+1/\tau)}{\Gamma(a+1/\tau)} \approx \theta. $$ Note that the MLE of θ is the maximum. We use $$ \hat\tau=3, \quad \hat\theta = \frac{m_2}{m_1}\max_i x_i 1_{m_2>m_1} +\frac{m_1}{m_2}\max_i x_i 1_{m_2\leq m_1}. $$ Then we use the beta initial estimates on the sample $(\frac{x_i}{\hat\theta})^{\hat\tau}$.

2.5. Feller-Pareto family

The Feller-Pareto distribution is the distribution of $X = \mu + \theta(1/B - 1)^{1/\gamma}$ where B follows a beta distribution with shape parameters α and τ; see details at https://doi.org/10.18637/jss.v103.i06. Hence, letting Y = (X − μ)/θ, we have $$ \frac{Y}{1+Y} = \frac{X-\mu}{\theta+X-\mu} = (1-B)^{1/\gamma}. $$ For γ close to 1, $\frac{Y}{1+Y}$ is approximately beta distributed with shape parameters τ and α.

The log-likelihood is $$ \mathcal L(\mu,\theta,\gamma,\alpha,\tau) = n\log(\gamma) + (\gamma\tau-1)\sum_i \log\left(\frac{x_i-\mu}{\theta}\right) - (\alpha+\tau)\sum_i \log\left(1+\left(\frac{x_i-\mu}{\theta}\right)^{\gamma}\right) - n\log(\theta) - n\log(\beta(\alpha,\tau)). $$ The MLE of μ is the minimum.

The gradient of the log-likelihood with respect to θ, α, γ, τ follows (@ref(eq:fellerparetogradient)). Cancelling the first component of the score for γ = α = 2, we get $$ -(2\tau - 1) \sum_{i} \frac{x_i}{\theta(x_i-\mu)} + (2+\tau)\sum_i \frac{2x_i(x_i-\mu)}{\theta^3(1+(\frac{x_i-\mu}{\theta})^2)} = \frac{n}{\theta} \Leftrightarrow -(2\tau - 1)\theta^2\frac1n \sum_{i} \frac{x_i}{x_i-\mu} + (2+\tau) \frac1n\sum_i \frac{2x_i(x_i-\mu)}{1+(\frac{x_i-\mu}{\theta})^2} = \theta^2 $$ $$ \Leftrightarrow (2+\tau) \frac1n\sum_i \frac{2x_i(x_i-\mu)}{1+(\frac{x_i-\mu}{\theta})^2} = (2\tau - 1)\theta^2\left(\frac1n \sum_{i} \frac{x_i}{x_i-\mu} -1\right) \Leftrightarrow \sqrt{ \frac{(2+\tau) \frac1n\sum_i \frac{2x_i(x_i-\mu)}{1+(\frac{x_i-\mu}{\theta})^2} }{(2\tau - 1)\left(\frac1n \sum_{i} \frac{x_i}{x_i-\mu} -1\right)} } = \theta. $$ Neglecting the unknown value of τ and the denominator in θ, we get an initial estimate $\hat\theta$, with $\hat\mu$ set as in (@ref(eq:pareto4muinit)). Initial values of τ, α are obtained on the sample $(z_i)_i$, $z_i = y_i/(1+y_i)$, $y_i = (x_i-\hat\mu)/\hat\theta$, with the initial values of a beta distribution based on the MME (@ref(eq:betaguessestimator)).

Cancelling the last component of the gradient leads to $$ (\gamma - 1) \frac1n\sum_{i} \log\left(\frac{x_i-\mu}{\theta}\right) - \frac1n\sum_i \log\left(1+\left(\frac{x_i-\mu}{\theta}\right)^{\gamma}\right) = \psi(\tau) - \psi(\alpha+\tau) \Leftrightarrow (\gamma - 1) \frac1n\sum_{i} \log\left(\frac{x_i-\mu}{\theta}\right) = \psi(\tau) - \psi(\alpha+\tau) +\frac1n\sum_i \log\left(1+\left(\frac{x_i-\mu}{\theta}\right)^{\gamma}\right). $$ Neglecting the value of γ on the right-hand side (that is, setting γ = 1 there), we obtain $$ \hat\gamma = 1 + \frac{\psi(\tau) - \psi(\alpha+\tau) +\frac1n\sum_i \log\left(1+\frac{x_i-\mu}{\theta}\right)}{\frac1n\sum_{i} \log\left(\frac{x_i-\mu}{\theta}\right)}. $$

2.5.1. Transformed beta

This is the Feller-Pareto with μ = 0. So the first component of @ref(eq:fellerparetogradient) simplifies, with γ = α = 2, to $$ -(2\tau - 1) \frac{n}{\theta} + (2+\tau)\sum_i \frac{2x_i^2}{\theta^3(1+(\frac{x_i}{\theta})^2)} = \frac{n}{\theta} \Leftrightarrow -(2\tau - 1) \theta^2 + (2+\tau)\frac1n\sum_i \frac{2x_i^2}{1+(\frac{x_i}{\theta})^2} = \theta^2 $$ $$ \Leftrightarrow \theta^2=\frac{2+\tau}{2\tau}\frac1n\sum_i \frac{2x_i^2}{1+(\frac{x_i}{\theta})^2}. $$ Neglecting the unknown value of τ and the denominator in θ, we get an initial estimate $\hat\theta$. Initial values of τ, α are obtained on the sample $(z_i)_i$, $z_i = y_i/(1+y_i)$, $y_i = x_i/\hat\theta$, with the initial values of a beta distribution based on the MME (@ref(eq:betaguessestimator)). Similar to the Feller-Pareto case, we set $\hat\gamma$ by @ref(eq:fellerparetogammahat) with μ = 0.

2.5.2. Generalized Pareto

This is the Feller-Pareto with μ = 0, γ = 1. So the first component of @ref(eq:fellerparetogradient) simplifies, with α = 2, to $$ -(\tau - 1) \frac{n}{\theta} + (2+\tau)\sum_i \frac{x_i}{\theta^2(1+\frac{x_i}{\theta})} = \frac{n}{\theta} \Leftrightarrow -(\tau - 1) \theta + (2+\tau)\frac1n\sum_i \frac{x_i}{1+\frac{x_i}{\theta}} = \theta. $$ Neglecting the unknown value of τ leads to an initial estimate $\hat\theta$.

Initial values of τ, α are obtained on the sample $(z_i)_i$, $z_i = y_i/(1+y_i)$, $y_i = x_i/\hat\theta$, with the initial values of a beta distribution based on the MME (@ref(eq:betaguessestimator)).

2.5.3. Burr

Burr is a Feller-Pareto distribution with μ = 0, τ = 1.

The survival function is $1 - F(x) = (1 + (x/\theta)^{\gamma})^{-\alpha}$. Using the median q2, we have $\log(1/2) = -\alpha\log(1 + (q_2/\theta)^{\gamma})$, so the initial value is $$ \hat\alpha = \frac{\log(2)}{\log(1 + (q_2/\theta)^{\gamma})}. $$

So the first component of @ref(eq:fellerparetogradient) simplifies, with γ = α = 2, τ = 1, μ = 0, to $$ - \frac{n}{\theta} + 3\sum_i \frac{2x_i(\frac{x_i}{\theta})}{\theta^2(1+(\frac{x_i}{\theta})^2)} = \frac{n}{\theta} \Leftrightarrow \frac1n\sum_i \frac{2x_i^2}{\theta^2(1+(\frac{x_i}{\theta})^2)} = \frac23. $$ Neglecting the unknown value in the denominator in θ, we get an initial estimate $\hat\theta$. For $\hat\gamma$ we use @ref(eq:fellerparetogammahat) with τ = 1, α = 2 and the previous $\hat\theta$.

2.5.4. Loglogistic

Loglogistic is a Feller-Pareto distribution with μ = 0, τ = 1, α = 1. The survival function is $1 - F(x) = (1 + (x/\theta)^{\gamma})^{-1}$. So $$ \frac1{1-F(x)}-1 = (x/\theta)^\gamma \Leftrightarrow \log\left(\frac{F(x)}{1-F(x)}\right) = \gamma\log(x/\theta). $$ Let q1 and q3 be the first and the third quartiles. $$ \log\left(\frac{1/4}{3/4}\right)= \gamma\log(q_1/\theta), \quad \log\left(\frac{3/4}{1/4}\right)= \gamma\log(q_3/\theta) \Leftrightarrow -\log(3)= \gamma\log(q_1/\theta), \quad \log(3)= \gamma\log(q_3/\theta). $$ The difference of the previous equations yields $$ \hat\gamma=\frac{2\log(3)}{\log(q_3/q_1)}, $$ while their sum gives $0 = \gamma\log(q_1) + \gamma\log(q_3) - 2\gamma\log(\theta)$, that is $\hat\theta = \sqrt{q_1 q_3}$.
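A minimal R sketch (shape/scale as in actuar's ?dllogis, assumed here):

```r
# loglogistic: gamma-hat = 2*log(3)/log(q3/q1), theta-hat = sqrt(q1*q3)
start_llogis <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  list(shape = 2 * log(3) / log(q[2] / q[1]),
       scale = sqrt(q[1] * q[2]))
}
```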

2.5.5. Paralogistic

Paralogistic is a Feller-Pareto distribution with μ = 0, τ = 1, α = γ. The survival function is $1 - F(x) = (1 + (x/\theta)^{\alpha})^{-\alpha}$, so $\log(1 - F(x)) = -\alpha\log(1 + (x/\theta)^{\alpha})$. The log-likelihood is $$ \mathcal L(\theta,\alpha) = 2n\log(\alpha) + (\alpha-1)\sum_i \log(x_i/\theta) - (\alpha+1)\sum_i \log\left(1+\left(\frac{x_i}{\theta}\right)^{\alpha}\right) - n\log(\theta). $$ The gradient with respect to θ, α is $$ \begin{pmatrix} (\alpha - 1)\frac{-n}{\theta} - (\alpha+1)\sum_i \frac{-x_i\alpha(x_i/\theta)^{\alpha-1}}{\theta^2(1+(\frac{x_i}{\theta})^{\alpha})} - n/\theta \\ \sum_{i} \log\left(\frac{ \frac{x_i}{\theta}}{1+(\frac{x_i}{\theta})^{\alpha} }\right) - (\alpha+1)\sum_i \frac{(\frac{x_i}{\theta})^{\alpha} \log(x_i/\theta)}{1+(\frac{x_i}{\theta})^{\alpha}} + 2n/\alpha \\ \end{pmatrix}. $$ The first component cancels when $$ (\alpha+1)\sum_i \frac{x_i\alpha(x_i/\theta)^{\alpha-1}}{\theta^2(1+(\frac{x_i}{\theta})^{\alpha})} = \frac{\alpha n}{\theta} \Leftrightarrow (\alpha+1)\frac1n\sum_i \frac{x_i^{\alpha}}{1+(\frac{x_i}{\theta})^{\alpha}} = \theta^{\alpha}. $$ The second component cancels when $$ \frac1n\sum_{i} \log\left(\frac{ \frac{x_i}{\theta}}{1+(\frac{x_i}{\theta})^{\alpha} }\right) = -2/\alpha +(\alpha+1)\frac1n\sum_i \frac{(\frac{x_i}{\theta})^{\alpha} \log(x_i/\theta)}{1+(\frac{x_i}{\theta})^{\alpha}}. $$ Choosing θ = 1, α = 2 in the sums leads to $$ \frac1n\sum_{i} \log\left(\frac{x_i}{1+x_i^2}\right) - \frac1n\sum_i \frac{x_i^2\log(x_i)}{1+x_i^2} = -2/\alpha +\alpha\frac1n\sum_i \frac{x_i^2\log(x_i)}{1+x_i^2}. $$ The initial estimator $\hat\alpha$ is the positive root of this equation, which is quadratic in α, and $\hat\theta$ is then deduced from the first cancelled component with $\alpha = \hat\alpha$.

2.5.6. Inverse Burr

Use the Burr estimates on the sample 1/x.

2.5.7. Inverse paralogistic

Use the paralogistic estimates on the sample 1/x.

2.5.8. Inverse Pareto

Use the Pareto estimates on the sample 1/x.

2.5.9. Pareto IV

The survival function is $$ 1-F(x) = \left(1+ \left(\frac{x-\mu}{\theta}\right)^{\gamma} \right)^{-\alpha}, $$ see ?Pareto4 in actuar.

The first and third quartiles q1 and q3 verify $$ \left(\left(\frac34\right)^{-1/\alpha}-1\right)^{1/\gamma} = \frac{q_1-\mu}{\theta}, \quad \left(\left(\frac14\right)^{-1/\alpha}-1\right)^{1/\gamma} = \frac{q_3-\mu}{\theta}. $$ Hence we get two useful relations $$ \gamma = \frac{\log\left(\frac{(1/4)^{-1/\alpha}-1}{(3/4)^{-1/\alpha}-1}\right)}{\log\left(\frac{q_3-\mu}{q_1-\mu}\right)}, \quad \theta = \frac{q_3-\mu}{\left((1/4)^{-1/\alpha}-1\right)^{1/\gamma}}. $$

The log-likelihood of a Pareto IV sample (see Equation (5.2.94) of Arnold (2015), updated with the notation of Goulet et al.) is $$ \mathcal L(\mu,\theta,\gamma,\alpha) = (\gamma -1) \sum_i \log\left(\frac{x_i-\mu}{\theta}\right) -(\alpha+1)\sum_i \log\left(1+ \left(\frac{x_i-\mu}{\theta}\right)^{\gamma}\right) +n\log(\gamma) -n\log(\theta)+n\log(\alpha). $$ Cancelling the derivative of ℒ(μ, θ, γ, α) with respect to α leads to $$ \hat\alpha = \frac{n}{\sum_i \log\left(1+ \left(\frac{x_i-\mu}{\theta}\right)^{\gamma}\right)}. $$

The MLE of the threshold parameter μ is the minimum. So the initial estimate μ̂ is taken slightly below the minimum, so that all observations are strictly above it, with a relative margin ϵ = 0.05.

The initial parameter estimates are μ̂, α = 2, γ̂ from @ref(eq:pareto4gammarelation) with α, θ̂ from @ref(eq:pareto4thetarelation) with α and γ̂, and α̂ from @ref(eq:pareto4alpharelation) with μ̂, θ̂ and γ̂.
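The whole chain could be sketched in R as follows (the exact form of the shift below the minimum and the min/shape1/shape2/scale names of actuar's ?dpareto4 are assumptions):

```r
# Pareto IV starting values: mu-hat below the minimum, then gamma, theta, alpha
start_pareto4 <- function(x, eps = 0.05) {
  mu <- min(x) * (1 - eps)       # assumed form of the shift (positive sample)
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  alpha0 <- 2                    # pilot value of alpha
  gam <- log(((1/4)^(-1/alpha0) - 1) / ((3/4)^(-1/alpha0) - 1)) /
    log((q[2] - mu) / (q[1] - mu))
  theta <- (q[2] - mu) / ((1/4)^(-1/alpha0) - 1)^(1/gam)
  alpha <- 1 / mean(log(1 + ((x - mu) / theta)^gam))
  list(min = mu, shape1 = alpha, shape2 = gam, scale = theta)
}
```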

2.5.10. Pareto III

Pareto III corresponds to Pareto IV with α = 1.

The initial parameter estimates are μ̂, γ̂ from @ref(eq:pareto4gammarelation) with α = 1, and θ̂ from @ref(eq:pareto4thetarelation) with α = 1 and γ̂.

2.5.11. Pareto II

Pareto II corresponds to Pareto IV with γ = 1.

The initial parameter estimates are μ̂, α = 2, θ̂ from @ref(eq:pareto4thetarelation) with α and γ = 1, and α̂ from @ref(eq:pareto4alpharelation) with μ̂, θ̂ and γ = 1.

2.5.12. Pareto I

Pareto I corresponds to Pareto IV with γ = 1, μ = θ.

The MLE is $$ \hat\mu = \min_{1\leq i\leq n} x_i, \qquad \hat\alpha = \frac{n}{\sum_{i=1}^n \log(x_i/\hat\mu)}. $$

This can be rewritten with the geometric mean of the sample $G_n = (\prod_{i=1}^n X_i)^{1/n}$ as $\hat\alpha = 1/\log(G_n/\hat\mu)$.

The initial parameter estimates are μ̂ and α̂ from the MLE above.
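In R, the Pareto I starting values reduce to two lines (shape/min as in actuar's ?dpareto1, assumed here):

```r
# Pareto I MLE: mu-hat = min(x), alpha-hat = 1/mean(log(x/mu-hat)) = 1/log(Gn/mu-hat)
start_pareto1 <- function(x) {
  mu <- min(x)
  list(shape = 1 / mean(log(x / mu)), min = mu)
}
```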

2.5.13. Pareto

Pareto corresponds to Pareto IV with γ = 1, μ = 0.

The initial parameter estimates are $\alpha = \max(2, 2(m_2 - m_1^2)/(m_2 - 2m_1^2))$, where $m_i$ denotes the empirical raw moment of order i, θ̂ from @ref(eq:pareto4thetarelation) with α, γ = 1 and μ = 0, and α̂ from @ref(eq:pareto4alpharelation) with μ = 0, θ̂ and γ = 1.

2.6. Transformed gamma family

2.6.1. Transformed gamma distribution

The log-likelihood is given by $$ \mathcal L(\alpha, \tau, \theta) = n\log(\tau) + \alpha\tau\sum_i\log(x_i/\theta) - \sum_i (x_i/\theta)^{\tau} - \sum_i\log(x_i) - n\log(\Gamma(\alpha)). $$ The gradient with respect to α, τ, θ is given by $$ \begin{pmatrix} \tau\sum_i \log(x_i/\theta) - n\psi(\alpha) \\ n/\tau + \alpha\sum_i \log(x_i/\theta) -\sum_i (x_i/\theta)^{\tau} \log(x_i/\theta) \\ -n\alpha\tau /\theta +\sum_i \tau \frac{x_i}{\theta^2}(x_i/\theta)^{\tau-1} \end{pmatrix}. $$ We compute the moment-estimators as in the gamma case: $\hat\alpha = m_1^2/\mu_2$, $\hat\theta = \mu_2/m_1$. Then, cancelling the first component of the gradient, we set $$ \hat\tau = \frac{\psi(\hat\alpha)}{\frac1n\sum_i \log(x_i/\hat\theta) }. $$
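These three steps translate into a short R sketch (shape1 = α, shape2 = τ, scale = θ, as in actuar's ?dtrgamma, assumed here):

```r
# transformed gamma: gamma-type MME for (alpha, theta), then tau from the score
start_trgamma <- function(x) {
  m1 <- mean(x); mu2 <- mean((x - m1)^2)
  alpha <- m1^2 / mu2
  theta <- mu2 / m1
  tau <- digamma(alpha) / mean(log(x / theta))
  list(shape1 = alpha, shape2 = tau, scale = theta)
}
```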

2.6.2. Gamma distribution

Transformed gamma with τ = 1.

We compute the moment-estimators given by $\hat\alpha = m_1^2/\mu_2$, $\hat\theta = \mu_2/m_1$.

2.6.3. Weibull distribution

Transformed gamma with α = 1.

Let $\tilde m=\frac1n\sum_i \log(x_i)$ and $\tilde v=\frac1n\sum_i (\log(x_i) - \tilde m)^2$. We use an approximate MME $$ \hat\tau = 1.2/\sqrt{\tilde v}, \quad \hat\theta = \exp(\tilde m + 0.572/\hat\tau). $$ Alternatively, we can use the distribution function $F(x) = 1 - e^{-(x/\theta)^{\tau}}$, so that $\log(-\log(1 - F(x))) = \tau\log(x) - \tau\log(\theta)$. Hence the QME for the Weibull distribution is $$ \tilde\tau = \frac{ \log(-\log(1-p_1)) - \log(-\log(1-p_2)) }{ \log(x_1) - \log(x_2) }, \quad \tilde\theta = x_3/(-\log(1-p_3))^{1/\tilde\tau}, $$ with p1 = 1/4, p2 = 3/4, p3 = 1/2 and x1, x2, x3 the corresponding empirical quantiles.

Initial parameters are τ̃ and θ̃, unless the empirical quantiles satisfy x1 = x2, in which case we use τ̂ and θ̂.
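Both rules combine into a short R sketch mirroring the fallback logic above (base R shape/scale names):

```r
# Weibull: QME from the quartiles and the median, log-moment MME as fallback
start_weibull <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75, 0.5), names = FALSE)
  if (q[1] < q[2]) {             # QME: tau-tilde, theta-tilde
    tau <- (log(-log(1 - 1/4)) - log(-log(1 - 3/4))) / log(q[1] / q[2])
    theta <- q[3] / (-log(1 - 1/2))^(1/tau)
  } else {                       # fallback: approximate MME on the log sample
    mt <- mean(log(x)); vt <- mean((log(x) - mt)^2)
    tau <- 1.2 / sqrt(vt)
    theta <- exp(mt + 0.572 / tau)
  }
  list(shape = tau, scale = theta)
}
```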

2.6.4. Exponential distribution

Both the MLE and the MME give $\hat\lambda = 1/m_1$.

2.7. Inverse transformed gamma family

2.7.1. Inverse transformed gamma distribution

Same as the transformed gamma distribution on the sample $(1/x_i)_i$.

2.7.2. Inverse gamma distribution

We compute the moment-estimators as $\hat\alpha = (2m_2 - m_1^2)/(m_2 - m_1^2)$, $\hat\theta = m_1 m_2/(m_2 - m_1^2)$.
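A direct R sketch of this moment matching (shape/scale as in actuar's ?dinvgamma, assumed here):

```r
# inverse gamma MME from the raw moments m1, m2
start_invgamma <- function(x) {
  m1 <- mean(x); m2 <- mean(x^2)
  list(shape = (2 * m2 - m1^2) / (m2 - m1^2),
       scale = m1 * m2 / (m2 - m1^2))
}
```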

2.7.3. Inverse Weibull distribution

We use the QME.

2.7.4. Inverse exponential

Same as the exponential distribution on the sample $(1/x_i)_i$.

3. Bibliography

3.1. General books

  • N. L. Johnson, S. Kotz, N. Balakrishnan (1994). Continuous univariate distributions, Volume 1, Wiley.
  • N. L. Johnson, S. Kotz, N. Balakrishnan (1995). Continuous univariate distributions, Volume 2, Wiley.
  • N. L. Johnson, A. W. Kemp, S. Kotz (2008). Univariate discrete distributions, Wiley.
  • G. Wimmer, G. Altmann (1999). Thesaurus of Univariate Discrete Probability Distributions, Stamm.

3.2. Books dedicated to a distribution family

  • M. Ahsanullah, B.M. Golam Kibria, M. Shakil (2014). Normal and Student’s t Distributions and Their Applications, Springer.
  • B. C. Arnold (2015). Pareto Distributions, 2nd ed., Chapman and Hall/CRC.
  • A. Azzalini (2013). The Skew-Normal and Related Families, Cambridge University Press.
  • N. Balakrishnan (2014). Handbook of the Logistic Distribution, CRC Press.

3.3. Books with applications

  • C. Forbes, M. Evans, N. Hastings, B. Peacock (2011). Statistical Distributions, Wiley.
  • Z. A. Karian, E. J. Dudewicz, K. Shimizu (2010). Handbook of Fitting Statistical Distributions with R, CRC Press.
  • K. Krishnamoorthy (2015). Handbook of Statistical Distributions with Applications, Chapman and Hall.
  • S. Klugman, H. Panjer, G. Willmot (2019). Loss Models: From Data to Decisions, 5th ed., John Wiley & Sons.