---
title: Starting values used in fitdistrplus
author: Marie Laure Delignette Muller, Christophe Dutang
date: '`r Sys.Date()`'
output:
  bookdown::html_document2:
    base_format: rmarkdown::html_vignette
    fig_caption: yes
    toc: true
    number_sections: yes
link-citations: true
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Starting values used in fitdistrplus}
  %\VignetteEncoding{UTF-8}
  \usepackage[utf8]{inputenc}
pkgdown:
  as_is: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

We denote the $j$-th raw empirical moment by
$$
m_j = \frac1n \sum_{i=1}^n x_i^j,
$$
and the $j$-th centered empirical moment by
$$
\mu_j = \frac1n \sum_{i=1}^n (x_i-m_1)^j.
$$
Starting values are computed in `R/util-startarg.R`. We give below the starting values for discrete and continuous distributions and refer to the bibliography sections for details.

# Discrete distributions

## Base R distributions

### Geometric distribution

The MME is used: $\hat p=1/(1+m_1)$.

### Negative binomial distribution

The MME is used: $\hat n = m_1^2/(\mu_2-m_1)$.

### Poisson distribution

Both the MME and the MLE are given by $\hat \lambda = m_1$.

### Binomial distribution

The MME is used:
$$
Var[X]/E[X] = 1-p \Rightarrow \hat p = 1- \mu_2/m_1.
$$
The size parameter is
$$
\hat n = \lceil\max(\max_i x_i, m_1/\hat p)\rceil.
$$

## Logarithmic distribution

The expectation simplifies for small values of $p$:
$$
E[X] = -\frac{1}{\log(1-p)}\frac{p}{1-p}
\approx -\frac{1}{-p}\frac{p}{1-p}
= \frac{1}{1-p}.
$$
So the initial estimate is
$$
\hat p = 1-1/m_1.
$$

## Zero-truncated distributions

These distributions are the distributions of $X\vert X>0$ when $X$ follows a given discrete distribution. Hence the initial estimates are the ones used for the corresponding base R distribution, applied to the shifted sample $x-1$.

## Zero-modified distributions

The MLE of the probability parameter is the empirical mass at 0, $\hat p_0=\frac1n \sum_i 1_{x_i=0}$. For the other parameters, we use the classical estimators with probability parameter $1-\hat p_0$.

## Poisson inverse Gaussian distribution

The first two moments are
$$
E[X]=\mu, \quad Var[X] = \mu+\phi\mu^3.
$$
So the initial estimates are
$$
\hat\mu=m_1, \quad \hat\phi = (\mu_2 - m_1)/m_1^3.
$$
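These moment computations are easy to reproduce. Below is a minimal R sketch on a simulated sample; it only illustrates the formulas above and is not the actual code of `R/util-startarg.R`.

```{r discrete-start}
set.seed(1234)
x <- rnbinom(100, size = 5, prob = 0.4)  # hypothetical overdispersed count sample
m1 <- mean(x)
mu2 <- mean((x - m1)^2)
c(geom_prob   = 1/(1 + m1),       # geometric
  nbinom_size = m1^2/(mu2 - m1),  # negative binomial
  pois_lambda = m1)               # Poisson
# binomial: only meaningful for underdispersed data (mu2 < m1)
p <- 1 - mu2/m1
if (p > 0) c(binom_prob = p, binom_size = ceiling(max(max(x), m1/p)))
```

Note that a single sample cannot suit every model: here the data are overdispersed, so the binomial probability estimate is negative and the corresponding branch is skipped.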
# Continuous distributions

## Normal distribution

The MLE coincides with the MME, so we use the empirical mean and variance.

## Lognormal distribution

The log sample follows a normal distribution, so we use the normal estimates on the log sample.

## Beta distribution (of the first kind)

The density function of a beta distribution $\mathcal Be(a,b)$ is
$$
f_X(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1}(1-x)^{b-1}.
$$
The initial estimate is the MME
\begin{equation}
\hat a = m_1 \delta, \quad \hat b = (1-m_1)\delta, \quad \delta = \frac{m_1(1-m_1)}{\mu_2}-1.
(\#eq:betaguessestimator)
\end{equation}

## Other continuous distributions in `actuar`

### Log-gamma

We use the gamma initial values on the sample $\log(x)$.

### Gumbel

The distribution function is
$$
F(x) = \exp\left(-\exp\left(-\frac{x-\alpha}{\theta}\right)\right).
$$
Let $q_1$ and $q_3$ be the first and third quartiles, of respective probabilities $p_1=1/4$ and $p_3=3/4$. Then
$$
\left\{\begin{array}{l}
-\theta\log(-\log(p_1)) = q_1-\alpha \\
-\theta\log(-\log(p_3)) = q_3-\alpha
\end{array}\right.
\Leftrightarrow
\left\{\begin{array}{l}
-\theta\log(-\log(p_1))+\theta\log(-\log(p_3)) = q_1-q_3 \\
\alpha = \theta\log(-\log(p_3)) + q_3
\end{array}\right.
\Leftrightarrow
\left\{\begin{array}{l}
\theta = \frac{q_1-q_3}{\log(-\log(p_3)) - \log(-\log(p_1))} \\
\alpha = \theta\log(-\log(p_3)) + q_3
\end{array}\right..
$$
Using the median $q_2$ for the location parameter $\alpha$ yields the initial estimates
$$
\hat\theta = \frac{q_1-q_3}{\log(\log(4/3)) - \log(\log(4))}, \quad
\hat\alpha = \hat\theta\log(\log(2)) + q_2.
$$

### Inverse Gaussian distribution

The moments of this distribution are
$$
E[X] = \mu, \quad Var[X] = \mu^3\phi.
$$
Hence the initial estimates are $\hat\mu=m_1$, $\hat\phi=\mu_2/m_1^3$.

### Generalized beta

This is the distribution of $\theta X^{1/\tau}$ when $X$ is beta distributed $\mathcal Be(a,b)$. The moments are
$$
E[X] = \theta \beta(a+1/\tau, b)/\beta(a,b) = \theta \frac{\Gamma(a+1/\tau)}{\Gamma(a)}\frac{\Gamma(a+b)}{\Gamma(a+b+1/\tau)},
$$
$$
E[X^2] = \theta^2 \frac{\Gamma(a+2/\tau)}{\Gamma(a)}\frac{\Gamma(a+b)}{\Gamma(a+b+2/\tau)}.
$$
Hence for large values of $\tau$, we have
$$
E[X^2]/E[X] = \theta \frac{\Gamma(a+2/\tau)}{\Gamma(a+b+2/\tau)} \frac{\Gamma(a+b+1/\tau)}{\Gamma(a+1/\tau)} \approx \theta.
$$
Note that the MLE of $\theta$ would be the sample maximum, since $\theta$ is the upper bound of the support. We use
$$
\hat\tau=3, \quad
\hat\theta = \frac{m_2}{m_1}\max_i x_i 1_{m_2>m_1} + \frac{m_1}{m_2}\max_i x_i 1_{m_2\leq m_1}.
$$
Then we use the beta initial estimates \@ref(eq:betaguessestimator) on the transformed sample $(\frac{x_i}{\hat\theta})^{\hat\tau}$.
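This two-stage initialization can be sketched as follows in R; the simulated sample and the parameter names (those of `actuar`'s `dgenbeta`) are illustrative assumptions.

```{r genbeta-start}
set.seed(1234)
# hypothetical generalized beta sample: theta * B^(1/tau) with B ~ Be(2, 3)
x <- 2 * rbeta(200, shape1 = 2, shape2 = 3)^(1/3)
m1 <- mean(x); m2 <- mean(x^2)
# scale and power initial values
theta <- if (m2 > m1) m2/m1 * max(x) else m1/m2 * max(x)
tau <- 3
# beta MME (eq:betaguessestimator) on the transformed sample
z <- (x/theta)^tau
m1z <- mean(z); mu2z <- mean((z - m1z)^2)
delta <- m1z * (1 - m1z)/mu2z - 1
c(shape1 = m1z * delta, shape2 = (1 - m1z) * delta, shape3 = tau, scale = theta)
```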
## Feller-Pareto family

The Feller-Pareto distribution is the distribution of $X=\mu+\theta(1/B-1)^{1/\gamma}$ when $B$ follows a beta distribution with shape parameters $\alpha$ and $\tau$; see `?dfpareto` in `actuar` for details. Let $Y = (X-\mu)/\theta$. We have
$$
\frac{Y}{1+Y} = \frac{X-\mu}{\theta+X-\mu},
$$
and since $Y^\gamma = 1/B-1$, for $\gamma$ close to 1, $\frac{Y}{1+Y} \approx 1-B$ is approximately beta distributed with shape parameters $\tau$ and $\alpha$.

The log-likelihood is
\begin{equation}
\mathcal L(\mu, \theta, \alpha, \gamma, \tau)
= (\tau \gamma - 1) \sum_{i} \log\left(\frac{x_i-\mu}\theta\right)
- (\alpha+\tau)\sum_i \log\left(1+\left(\frac{x_i-\mu}\theta\right)^\gamma\right)
+ n\log(\gamma) - n\log(\theta) - n\log(\beta(\alpha,\tau)).
(\#eq:fellerparetologlik)
\end{equation}
The MLE of $\mu$ is the sample minimum. The gradient with respect to $\theta, \alpha, \gamma, \tau$ is
\begin{equation}
\nabla \mathcal L(\mu, \theta, \alpha, \gamma, \tau) =
\begin{pmatrix}
-(\tau \gamma - 1) \sum_{i} \frac{x_i}{\theta(x_i-\mu)} + (\alpha+\tau)\sum_i \frac{x_i\gamma(\frac{x_i-\mu}\theta)^{\gamma-1}}{\theta^2(1+(\frac{x_i-\mu}\theta)^\gamma)} - n/\theta \\
- \sum_i \log(1+(\frac{x_i-\mu}\theta)^\gamma) - n(\psi(\tau) - \psi(\alpha+\tau)) \\
(\tau - 1) \sum_{i} \log(\frac{x_i-\mu}\theta) - (\alpha+\tau)\sum_i \frac{(\frac{x_i-\mu}\theta)^\gamma}{1+(\frac{x_i-\mu}\theta)^\gamma}\log(\frac{x_i-\mu}\theta) + n/\gamma \\
(\gamma - 1) \sum_{i} \log(\frac{x_i-\mu}\theta) - \sum_i \log(1+(\frac{x_i-\mu}\theta)^\gamma) - n(\psi(\tau) - \psi(\alpha+\tau))
\end{pmatrix}.
(\#eq:fellerparetogradient)
\end{equation}
Cancelling the first component of the score for $\gamma=\alpha=2$, we get
$$
-(2\tau - 1) \sum_{i} \frac{x_i}{\theta(x_i-\mu)} + (2+\tau)\sum_i \frac{2x_i(x_i-\mu)}{\theta^3(1+(\frac{x_i-\mu}\theta)^2)} = \frac{n}{\theta}
\Leftrightarrow
-(2\tau - 1)\theta^2\frac1n \sum_{i} \frac{x_i}{x_i-\mu} + (2+\tau) \frac1n\sum_i \frac{2x_i(x_i-\mu)}{1+(\frac{x_i-\mu}\theta)^2} = \theta^2
$$
$$
\Leftrightarrow
(2+\tau) \frac1n\sum_i \frac{2x_i(x_i-\mu)}{1+(\frac{x_i-\mu}\theta)^2} = (2\tau - 1)\theta^2\left(\frac1n \sum_{i} \frac{x_i}{x_i-\mu} - 1\right)
\Leftrightarrow
\theta = \sqrt{ \frac{(2+\tau) \frac1n\sum_i \frac{2x_i(x_i-\mu)}{1+(\frac{x_i-\mu}\theta)^2} }{(2\tau - 1)\left(\frac1n \sum_{i} \frac{x_i}{x_i-\mu} - 1\right)} }.
$$
Neglecting the unknown value of $\tau$ and the $\theta$ in the denominator, we get, with $\hat\mu$ set by \@ref(eq:pareto4muinit),
\begin{equation}
\hat\theta = \sqrt{ \frac{ \frac1n\sum_i \frac{2x_i(x_i-\hat\mu)}{1+(x_i-\hat\mu)^2} }{ \frac1n \sum_{i} \frac{x_i}{x_i-\hat\mu} - 1 } }.
(\#eq:fellerparetothetahat)
\end{equation}
Initial values of $\tau$ and $\alpha$ are obtained on the sample $(z_i)_i$,
$$
z_i = y_i/(1+y_i), \quad y_i = (x_i - \hat\mu)/\hat\theta,
$$
with the beta initial values based on the MME \@ref(eq:betaguessestimator).

Cancelling the last component of the gradient leads to
$$
(\gamma - 1) \frac1n\sum_{i} \log\left(\frac{x_i-\mu}\theta\right) - \frac1n\sum_i \log\left(1+\left(\frac{x_i-\mu}\theta\right)^\gamma\right) = \psi(\tau) - \psi(\alpha+\tau)
$$
$$
\Leftrightarrow
(\gamma - 1) \frac1n\sum_{i} \log\left(\frac{x_i-\mu}\theta\right) = \psi(\tau) - \psi(\alpha+\tau) + \frac1n\sum_i \log\left(1+\left(\frac{x_i-\mu}\theta\right)^\gamma\right).
$$
Neglecting the exponent $\gamma$ on the right-hand side, we obtain
\begin{equation}
\hat\gamma = 1+ \frac{ \psi(\tau) - \psi(\alpha+\tau) + \frac1n\sum_i \log(1+\frac{x_i-\mu}\theta) }{ \frac1n\sum_{i} \log(\frac{x_i-\mu}\theta) }.
(\#eq:fellerparetogammahat)
\end{equation}

### Transformed beta

This is the Feller-Pareto distribution with $\mu=0$. With $\gamma=\alpha=2$, the first component of \@ref(eq:fellerparetogradient) simplifies to
$$
-(2\tau - 1) \frac{n}{\theta} + (2+\tau)\sum_i \frac{2x_i^2}{\theta^3(1+(\frac{x_i}\theta)^2)} = \frac{n}{\theta}
\Leftrightarrow
-(2\tau - 1) \theta^2 + (2+\tau)\frac1n\sum_i \frac{2x_i^2}{1+(\frac{x_i}\theta)^2} = \theta^2
$$
$$
\Leftrightarrow
\theta^2=\frac{2+\tau}{2\tau}\frac1n\sum_i \frac{2x_i^2}{1+(\frac{x_i}\theta)^2}.
$$
Neglecting the unknown value of $\tau$ and the $\theta$ in the denominator, we get
\begin{equation}
\hat\theta = \sqrt{ \frac1n\sum_i \frac{2x_i^2}{1+x_i^2} }.
(\#eq:trbetathetahat)
\end{equation}
Initial values of $\tau$ and $\alpha$ are obtained on the sample $(z_i)_i$,
$$
z_i = y_i/(1+y_i), \quad y_i = x_i/\hat\theta,
$$
with the beta initial values based on the MME \@ref(eq:betaguessestimator). Similarly to the Feller-Pareto case, we set
\begin{equation}
\hat\gamma = 1+ \frac{ \psi(\tau) - \psi(\alpha+\tau) + \frac1n\sum_i \log(1+\frac{x_i}\theta) }{ \frac1n\sum_{i} \log(\frac{x_i}\theta) }.
(\#eq:trbetagammahat)
\end{equation}

### Generalized Pareto

This is the Feller-Pareto distribution with $\mu=0$ and $\gamma=1$. With $\alpha=2$, the first component of \@ref(eq:fellerparetogradient) simplifies to
$$
-(\tau - 1) \frac{n}{\theta} + (2+\tau)\sum_i \frac{x_i}{\theta^2(1+\frac{x_i}\theta)} = n/\theta
\Leftrightarrow
-(\tau - 1) \theta + (2+\tau)\frac1n\sum_i \frac{x_i}{1+\frac{x_i}\theta} = \theta.
$$
Neglecting the unknown value of $\tau$ and the $\theta$ in the denominator leads to
\begin{equation}
\hat\theta = \frac1n\sum_i \frac{x_i}{1+x_i}.
(\#eq:generalizedparetotheta)
\end{equation}
Initial values of $\tau$ and $\alpha$ are obtained on the sample $(z_i)_i$,
$$
z_i = y_i/(1+y_i), \quad y_i = x_i/\hat\theta,
$$
with the beta initial values based on the MME \@ref(eq:betaguessestimator).
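In R, the generalized Pareto initialization \@ref(eq:generalizedparetotheta) reads as the short sketch below, on an arbitrary simulated positive sample.

```{r genpareto-start}
set.seed(1234)
x <- rexp(200, rate = 0.5)        # hypothetical positive sample
theta <- mean(x/(1 + x))
z <- (x/theta)/(1 + x/theta)
# beta MME (eq:betaguessestimator) on z: the first shape plays the role
# of tau, the second the role of alpha
m1z <- mean(z); mu2z <- mean((z - m1z)^2)
delta <- m1z * (1 - m1z)/mu2z - 1
c(tau = m1z * delta, alpha = (1 - m1z) * delta, theta = theta)
```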
### Burr

The Burr distribution is a Feller-Pareto distribution with $\mu=0$ and $\tau=1$. The survival function is
$$
1-F(x) = (1+(x/\theta)^\gamma)^{-\alpha}.
$$
Using the median $q_2$, we have
$$
\log(1/2) = - \alpha \log(1+(q_2/\theta)^\gamma),
$$
which gives the relation
\begin{equation}
\alpha = \frac{\log(2)}{\log(1+(q_2/\theta)^\gamma)}.
(\#eq:burralpharelation)
\end{equation}
With $\gamma=\alpha=2$, $\tau=1$ and $\mu=0$, the first component of \@ref(eq:fellerparetogradient) simplifies to
$$
- n/\theta + 3\sum_i \frac{2x_i(\frac{x_i}\theta)}{\theta^2(1+(\frac{x_i}\theta)^2)} = n/\theta
\Leftrightarrow
\theta^2\frac1n\sum_i \frac{2x_i(\frac{x_i}\theta)}{1+(\frac{x_i}\theta)^2} = 2/3.
$$
Neglecting the unknown values in the denominator in $\theta$, we get
\begin{equation}
\hat\theta = \sqrt{ \frac{2}{3 \frac1n\sum_i \frac{2x_i^2}{1+x_i^2} } }.
(\#eq:burrthetahat)
\end{equation}
We use for $\hat\gamma$ the relation \@ref(eq:fellerparetogammahat) with $\tau=1$, $\alpha=2$ and the previous $\hat\theta$.

### Loglogistic

The loglogistic distribution is a Feller-Pareto distribution with $\mu=0$, $\tau=1$ and $\alpha=1$. The survival function is
$$
1-F(x) = (1+(x/\theta)^\gamma)^{-1}.
$$
So
$$
\frac1{1-F(x)}-1 = (x/\theta)^\gamma
\Leftrightarrow
\log\left(\frac{F(x)}{1-F(x)}\right) = \gamma\log(x/\theta).
$$
Let $q_1$ and $q_3$ be the quantiles of order $1/3$ and $2/3$:
$$
\log\left(\frac{1/3}{2/3}\right) = \gamma\log(q_1/\theta), \quad
\log\left(\frac{2/3}{1/3}\right) = \gamma\log(q_3/\theta)
\Leftrightarrow
-\log(2) = \gamma\log(q_1/\theta), \quad \log(2) = \gamma\log(q_3/\theta).
$$
The difference of the previous equations simplifies to
$$
\hat\gamma=\frac{2\log(2)}{\log(q_3/q_1)},
$$
while their sum,
$$
0 = \gamma\log(q_1)+\gamma\log(q_3) - 2\gamma\log(\theta),
$$
leads to
\begin{equation}
\hat\theta = e^{\frac12\log(q_1q_3)} = \sqrt{q_1 q_3}.
(\#eq:llogisthetahat)
\end{equation}

### Paralogistic

The paralogistic distribution is a Feller-Pareto distribution with $\mu=0$, $\tau=1$ and $\alpha=\gamma$. The survival function is
$$
1-F(x) = (1+(x/\theta)^\alpha)^{-\alpha},
$$
so that
$$
\log(1-F(x)) = -\alpha \log(1+(x/\theta)^\alpha).
$$
The log-likelihood is
\begin{equation}
\mathcal L(\theta, \alpha) = (\alpha - 1) \sum_{i} \log\left(\frac{x_i}\theta\right) - (\alpha+1)\sum_i \log\left(1+\left(\frac{x_i}\theta\right)^\alpha\right) + 2n\log(\alpha) - n\log(\theta).
(\#eq:paralogisloglik)
\end{equation}
The gradient with respect to $\theta$ and $\alpha$ is
$$
\begin{pmatrix}
(\alpha - 1)\frac{-n}{\theta} - (\alpha+1)\sum_i \frac{-x_i\alpha(x_i/\theta)^{\alpha-1}}{1+(\frac{x_i}\theta)^\alpha} - n/\theta \\
\sum_{i} \log\left(\frac{ \frac{x_i}\theta}{1+(\frac{x_i}\theta)^\alpha }\right) - (\alpha+1)\sum_i \frac{(\frac{x_i}\theta)^\alpha \log(x_i/\theta)}{1+(\frac{x_i}\theta)^\alpha} + 2n/\alpha
\end{pmatrix}.
$$
The first component cancels when
$$
- (\alpha+1)\sum_i \frac{-x_i\alpha(x_i/\theta)^{\alpha-1}}{1+(\frac{x_i}\theta)^\alpha} = \alpha n/\theta
\Leftrightarrow
(\alpha+1)\frac1n\sum_i \frac{x_i^{\alpha+1}}{1+(\frac{x_i}\theta)^\alpha} = \theta^\alpha.
$$
The second component cancels when
$$
\frac1n\sum_{i} \log\left(\frac{ \frac{x_i}\theta}{1+(\frac{x_i}\theta)^\alpha }\right) = -2/\alpha + (\alpha+1)\frac1n\sum_i \frac{(\frac{x_i}\theta)^\alpha \log(x_i/\theta)}{1+(\frac{x_i}\theta)^\alpha}.
$$
Choosing $\theta=1$ and $\alpha=2$ in the sums leads to
$$
\frac1n\sum_{i} \log\left(\frac{x_i}{1+x_i^2}\right) - \frac1n\sum_i \frac{x_i^2\log(x_i)}{1+x_i^2} = -2/\alpha + \alpha\frac1n\sum_i \frac{x_i^2\log(x_i)}{1+x_i^2}.
$$
The initial estimators are
\begin{equation}
\hat\alpha = \frac{ \frac1n\sum_{i} \log(\frac{x_i}{1+x_i^2}) - \frac1n\sum_i \frac{x_i^2\log(x_i)}{1+x_i^2} }{ \frac1n\sum_i \frac{x_i^2\log(x_i)}{1+x_i^2} - 2 },
(\#eq:paralogisalphahat)
\end{equation}
\begin{equation}
\hat\theta = (\hat\alpha+1)\frac1n\sum_i \frac{x_i^{\hat\alpha+1}}{1+x_i^{\hat\alpha}}.
(\#eq:paralogisthetahat)
\end{equation}
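The loglogistic quantile matching translates directly into R; the sketch below simulates a sample by inversion of the distribution function, with arbitrary true values $\theta=2$ and $\gamma=3$.

```{r llogis-start}
set.seed(1234)
theta0 <- 2; gamma0 <- 3
u <- runif(200)
x <- theta0 * (u/(1 - u))^(1/gamma0)  # loglogistic simulation by inversion
q1 <- unname(quantile(x, 1/3))
q3 <- unname(quantile(x, 2/3))
c(gamma = 2*log(2)/log(q3/q1), theta = sqrt(q1*q3))
```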
### Inverse Burr

We use the Burr initial values on the sample $1/x$.

### Inverse paralogistic

We use the paralogistic initial values on the sample $1/x$.

### Inverse Pareto

We use the Pareto initial values on the sample $1/x$.

### Pareto IV

The survival function is
$$
1-F(x) = \left(1+ \left(\frac{x-\mu}{\theta}\right)^{\gamma} \right)^{-\alpha};
$$
see `?Pareto4` in `actuar`. The first and third quartiles $q_1$ and $q_3$ verify
$$
\left(\left(\frac34\right)^{-1/\alpha}-1\right)^{1/\gamma} = \frac{q_1-\mu}{\theta}, \quad
\left(\left(\frac14\right)^{-1/\alpha}-1\right)^{1/\gamma} = \frac{q_3-\mu}{\theta}.
$$
Hence we get two useful relations
\begin{equation}
\gamma = \frac{ \log\left( \frac{ (\frac43)^{1/\alpha}-1 }{ 4^{1/\alpha}-1 } \right) }{ \log\left(\frac{q_1-\mu}{q_3-\mu}\right) },
(\#eq:pareto4gammarelation)
\end{equation}
\begin{equation}
\theta = \frac{q_1- q_3 }{ ((\frac43)^{1/\alpha}-1)^{1/\gamma} - (4^{1/\alpha}-1)^{1/\gamma} }.
(\#eq:pareto4thetarelation)
\end{equation}
The log-likelihood of a Pareto IV sample (see Equation (5.2.94) of Arnold (2015), updated with the notation of Goulet et al.) is
$$
\mathcal L(\mu,\theta,\gamma,\alpha) = (\gamma -1) \sum_i \log\left(\frac{x_i-\mu}{\theta}\right) -(\alpha+1)\sum_i \log\left(1+ \left(\frac{x_i-\mu}{\theta}\right)^{\gamma}\right) +n\log(\gamma) -n\log(\theta)+n\log(\alpha).
$$
Cancelling the derivative of $\mathcal L(\mu,\theta,\gamma,\alpha)$ with respect to $\alpha$ leads to
\begin{equation}
\alpha = n/\sum_i \log\left(1+ \left(\frac{x_i-\mu}{\theta}\right)^{\gamma}\right).
(\#eq:pareto4alpharelation)
\end{equation}
The MLE of the threshold parameter $\mu$ is the sample minimum. So the initial estimate is taken slightly under the minimum, so that all observations are strictly above it:
\begin{equation}
\hat\mu = \left\{ \begin{array}{ll}
(1+\epsilon) \min_i x_i & \text{if } \min_i x_i <0, \\
(1-\epsilon)\min_i x_i & \text{if } \min_i x_i \geq 0,
\end{array} \right.
(\#eq:pareto4muinit)
\end{equation}
where $\epsilon=0.05$. The initial parameter estimation proceeds as follows (see the sketch after this list):

- $\hat\mu$ from \@ref(eq:pareto4muinit),
- $\alpha^\star = 2$,
- $\hat\gamma$ from \@ref(eq:pareto4gammarelation) with $\alpha^\star$,
- $\hat\theta$ from \@ref(eq:pareto4thetarelation) with $\alpha^\star$ and $\hat\gamma$,
- $\hat\alpha$ from \@ref(eq:pareto4alpharelation) with $\hat\mu$, $\hat\theta$ and $\hat\gamma$.
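The whole Pareto IV initialization chain can be sketched in R as follows; the sample is simulated by inversion of the survival function with arbitrary true values.

```{r pareto4-start}
set.seed(1234)
# hypothetical Pareto IV sample (mu = 1, theta = 2, gamma = 1.5, alpha = 3)
u <- runif(200)
x <- 1 + 2 * (u^(-1/3) - 1)^(1/1.5)
eps <- 0.05
mu <- if (min(x) < 0) (1 + eps) * min(x) else (1 - eps) * min(x)
alpha_star <- 2
q1 <- unname(quantile(x, 1/4)); q3 <- unname(quantile(x, 3/4))
gam <- log(((4/3)^(1/alpha_star) - 1)/(4^(1/alpha_star) - 1)) /
  log((q1 - mu)/(q3 - mu))
theta <- (q1 - q3) /
  (((4/3)^(1/alpha_star) - 1)^(1/gam) - (4^(1/alpha_star) - 1)^(1/gam))
alpha <- 1/mean(log(1 + ((x - mu)/theta)^gam))
c(mu = mu, gamma = gam, theta = theta, alpha = alpha)
```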
### Pareto III

Pareto III corresponds to Pareto IV with $\alpha=1$:
\begin{equation}
\gamma = \frac{ \log\left( \frac{ \frac43-1 }{ 4-1 } \right) }{ \log\left(\frac{q_1-\mu}{q_3-\mu}\right) },
(\#eq:pareto3gammarelation)
\end{equation}
\begin{equation}
\theta = \frac{q_1- q_3 }{ (\frac13)^{1/\gamma} - 3^{1/\gamma} }.
(\#eq:pareto3thetarelation)
\end{equation}
The initial parameter estimation is

- $\hat\mu$ from \@ref(eq:pareto4muinit),
- $\hat\gamma$ from \@ref(eq:pareto3gammarelation),
- $\hat\theta$ from \@ref(eq:pareto3thetarelation) with $\hat\gamma$.

### Pareto II

Pareto II corresponds to Pareto IV with $\gamma=1$:
\begin{equation}
\theta = \frac{q_1- q_3 }{ (\frac43)^{1/\alpha} - 4^{1/\alpha} }.
(\#eq:pareto2thetarelation)
\end{equation}
The initial parameter estimation is

- $\hat\mu$ from \@ref(eq:pareto4muinit),
- $\alpha^\star = 2$,
- $\hat\theta$ from \@ref(eq:pareto2thetarelation) with $\alpha^\star$,
- $\hat\alpha$ from \@ref(eq:pareto4alpharelation) with $\hat\mu$, $\hat\theta$ and $\gamma=1$.

### Pareto I

Pareto I corresponds to Pareto IV with $\gamma=1$ and $\mu=\theta$. The MLE is
\begin{equation}
\hat\mu = \min_i X_i, \quad
\hat\alpha = \left(\frac1n \sum_{i=1}^n \log(X_i/\hat\mu) \right)^{-1}.
(\#eq:pareto1alphamurelation)
\end{equation}
This can be rewritten with the geometric mean of the sample, $G_n = (\prod_{i=1}^n X_i)^{1/n}$, as
$$
\hat\alpha = \left(\log(G_n/\hat\mu)\right)^{-1}.
$$
The initial parameter estimation is $\hat\mu$ and $\hat\alpha$ from \@ref(eq:pareto1alphamurelation).

### Pareto

Pareto corresponds to Pareto IV with $\gamma=1$ and $\mu=0$:
\begin{equation}
\theta = \frac{q_1- q_3 }{ (\frac43)^{1/\alpha} - 4^{1/\alpha} }.
(\#eq:paretothetarelation)
\end{equation}
The initial parameter estimation is

- $\alpha^\star = \max(2, 2(m_2-m_1^2)/(m_2-2m_1^2))$, where $m_j$ is the empirical raw moment of order $j$,
- $\hat\theta$ from \@ref(eq:paretothetarelation) with $\alpha^\star$,
- $\hat\alpha$ from \@ref(eq:pareto4alpharelation) with $\mu=0$, $\hat\theta$ and $\gamma=1$.

## Transformed gamma family

### Transformed gamma distribution

The log-likelihood is given by
$$
\mathcal L(\alpha,\tau,\theta) = n\log(\tau) + \alpha\tau\sum_i \log(x_i/\theta) -\sum_i (x_i/\theta)^\tau - \sum_i\log(x_i) - n\log(\Gamma(\alpha)).
$$
The gradient with respect to $\alpha,\tau,\theta$ is given by
$$
\begin{pmatrix}
\tau\sum_i \log(x_i/\theta) - n\psi(\alpha) \\
n/\tau + \alpha\sum_i \log(x_i/\theta) -\sum_i (x_i/\theta)^{\tau} \log(x_i/\theta) \\
-n\alpha\tau/\theta +\tau\sum_i \frac{x_i}{\theta^2}(x_i/\theta)^{\tau-1}
\end{pmatrix}.
$$
We compute the moment estimator as for the gamma distribution \@ref(eq:gammarelation):
$$
\hat\alpha = m_1^2/\mu_2, \quad \hat\theta= \mu_2/m_1.
$$
Then, cancelling the first component of the gradient, we set
$$
\hat\tau = \frac{\psi(\hat\alpha)}{\frac1n\sum_i \log(x_i/\hat\theta) }.
$$

### Gamma distribution

This is the transformed gamma distribution with $\tau=1$. We compute the moment estimator given by
\begin{equation}
\hat\alpha = m_1^2/\mu_2, \quad \hat\theta= \mu_2/m_1.
(\#eq:gammarelation)
\end{equation}

### Weibull distribution

This is the transformed gamma distribution with $\alpha=1$. Let $\tilde m=\frac1n\sum_i \log(x_i)$ and $\tilde v=\frac1n\sum_i (\log(x_i) - \tilde m)^2$. We use an approximate MME
$$
\hat\tau = 1.2/\sqrt{\tilde v}, \quad \hat\theta = \exp(\tilde m + 0.572/\hat \tau).
$$
Alternatively, we can use the distribution function
$$
F(x) = 1 - e^{-(x/\theta)^\tau}
\Rightarrow
\log(-\log(1-F(x))) = \tau\log(x) - \tau\log(\theta).
$$
Hence the QME for the Weibull distribution is
$$
\tilde\tau = \frac{ \log(-\log(1-p_1)) - \log(-\log(1-p_2)) }{ \log(x_1) - \log(x_2) }, \quad
\tilde\theta = x_3/(-\log(1-p_3))^{1/\tilde\tau},
$$
with $p_1=1/4$, $p_2=3/4$, $p_3=1/2$ and $x_1, x_2, x_3$ the corresponding empirical quantiles. The initial parameters are $\tilde\tau$ and $\tilde\theta$, unless the empirical quantiles $x_1$ and $x_2$ are equal, in which case we use $\hat\tau$ and $\hat\theta$. Both variants appear in the sketch below.
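Both Weibull initializations take a few lines of R; the sketch below computes the approximate MME and the QME side by side on a simulated sample with arbitrary true parameters.

```{r weibull-start}
set.seed(1234)
x <- rweibull(200, shape = 2, scale = 3)
# approximate MME from the log sample
mlog <- mean(log(x)); vlog <- mean((log(x) - mlog)^2)
tau_mme <- 1.2/sqrt(vlog)
theta_mme <- exp(mlog + 0.572/tau_mme)
# QME from the quartiles and the median
p <- c(1/4, 3/4, 1/2)
xq <- unname(quantile(x, p))
tau_qme <- (log(-log(1 - p[1])) - log(-log(1 - p[2]))) /
  (log(xq[1]) - log(xq[2]))
theta_qme <- xq[3]/(-log(1 - p[3]))^(1/tau_qme)
c(tau_mme = tau_mme, theta_mme = theta_mme,
  tau_qme = tau_qme, theta_qme = theta_qme)
```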
### Exponential distribution

The MLE coincides with the MME: $\hat\lambda = 1/m_1$.

## Inverse transformed gamma family

### Inverse transformed gamma distribution

Same as the transformed gamma distribution, applied to the sample $(1/x_i)_i$.

### Inverse gamma distribution

We compute the moment estimator as
$$
\hat\alpha = (2m_2-m_1^2)/(m_2-m_1^2), \quad \hat\theta= m_1m_2/(m_2-m_1^2).
$$

### Inverse Weibull distribution

We use the QME.

### Inverse exponential

Same as the exponential distribution, applied to the sample $(1/x_i)_i$.

# Bibliography

## General books

- N. L. Johnson, S. Kotz, N. Balakrishnan (1994). Continuous univariate distributions, Volume 1, Wiley.
- N. L. Johnson, S. Kotz, N. Balakrishnan (1995). Continuous univariate distributions, Volume 2, Wiley.
- N. L. Johnson, A. W. Kemp, S. Kotz (2008). Univariate discrete distributions, Wiley.
- G. Wimmer (1999). Thesaurus of univariate discrete probability distributions.

## Books dedicated to a distribution family

- M. Ahsanullah, B. M. Golam Kibria, M. Shakil (2014). Normal and Student's t Distributions and Their Applications, Springer.
- B. C. Arnold (2015). Pareto Distributions, 2nd ed., Chapman and Hall.
- A. Azzalini (2013). The Skew-Normal and Related Families, Cambridge University Press.
- N. Balakrishnan (2014). Handbook of the Logistic Distribution, CRC Press.

## Books with applications

- C. Forbes, M. Evans, N. Hastings, B. Peacock (2011). Statistical Distributions, Wiley.
- Z. A. Karian, E. J. Dudewicz, K. Shimizu (2010). Handbook of Fitting Statistical Distributions with R, CRC Press.
- K. Krishnamoorthy (2015). Handbook of Statistical Distributions with Applications, Chapman and Hall.
- S. Klugman, H. Panjer, G. Willmot (2019). Loss Models: From Data to Decisions, 5th ed., John Wiley & Sons.