The likelihood is just the joint probability of the data $x_1, \dots, x_n$ given model parameters $\theta$, but viewed as a function of $\theta$, i.e.

$$L(\theta) = p(x_1, \dots, x_n \mid \theta)$$

One way to do Maximum Likelihood Estimation is to minimize the negative log likelihood.

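That is,

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta) = \arg\min_{\theta} \left[ -\log L(\theta) \right]$$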

Log Likelihood

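It makes the math much easier. Assuming the data points are independent and identically distributed, the log turns the product into a sum:

$$\log L(\theta) = \log \prod_{i=1}^{n} p(x_i \mid \theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta)$$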

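We do this because the product of many probabilities, $\prod_{i=1}^{n} p(x_i \mid \theta)$, might be an extremely small number that underflows in floating point, so we add log-probabilities instead of multiplying probabilities.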
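The underflow problem is easy to reproduce. The following is a minimal sketch, assuming a hypothetical dataset of 2000 observations that the model each assigns probability 0.5; the numbers are made up purely for illustration.

```python
import math

# Hypothetical per-observation probabilities p(x_i | theta); values are illustrative.
probs = [0.5] * 2000

# Multiplying the probabilities directly: 0.5**2000 is far below the smallest
# positive float64, so the running product collapses to exactly 0.0.
product = 1.0
for p in probs:
    product *= p

# Summing the log-probabilities instead keeps the result finite and usable.
log_likelihood = sum(math.log(p) for p in probs)

print(product)         # 0.0
print(log_likelihood)  # roughly -1386.29
```

Once the product has collapsed to zero, every parameter setting looks equally bad, whereas the summed log-probabilities still distinguish between them.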

Log-likelihood (Average):

$$\frac{1}{n} \log L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log p(x_i \mid \theta)$$

Negative log-likelihood (Average):

$$-\frac{1}{n} \log L(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \log p(x_i \mid \theta)$$
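To make the averaged negative log-likelihood concrete, here is a small sketch of Maximum Likelihood Estimation for a Bernoulli (coin-flip) model. The model choice, the `average_nll` helper, the data, and the grid search are all assumptions made for illustration; a real problem would typically use a closed-form solution or a gradient-based optimizer.

```python
import math

def average_nll(theta, data):
    """Average negative log-likelihood of 0/1 data under a Bernoulli(theta) model."""
    total = 0.0
    for x in data:
        # p(x | theta) is theta for a 1 and (1 - theta) for a 0.
        p = theta if x == 1 else 1.0 - theta
        total += math.log(p)
    return -total / len(data)

# Hypothetical coin flips: 7 ones out of 10.
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# Grid search over theta in (0, 1); the endpoints are excluded because log(0) is undefined.
grid = [k / 1000 for k in range(1, 1000)]
theta_hat = min(grid, key=lambda t: average_nll(t, data))

print(theta_hat)  # 0.7, which matches the sample mean of the data
```

For the Bernoulli model the minimizer coincides with the fraction of ones in the data, which gives a quick sanity check on the search.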