The poor rabbit chased by Python and Anaconda :p

Classification Evaluation Metrics: pros and cons

Let’s go over the common classification evaluation metrics: what they mean and how to use them.

Accuracy

Meaning:

Correct predictions / all examples

pros:

  • easy to explain

cons:

  • works poorly with unbalanced data (see the sketch at the end of this section)
  • cannot express uncertainty about an individual prediction
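
A minimal sketch with scikit-learn (the toy labels below are made up for illustration) of why accuracy misleads on unbalanced data: a model that always predicts the majority class still scores 95%.

    import numpy as np
    from sklearn.metrics import accuracy_score

    # toy unbalanced labels: 95 negatives, 5 positives (illustrative values)
    y_true = np.array([0] * 95 + [1] * 5)

    # a "model" that always predicts the majority class
    y_pred = np.zeros_like(y_true)

    # accuracy = correct predictions / all examples
    print(accuracy_score(y_true, y_pred))  # 0.95, yet it never finds a positive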

F1 score

Meaning:

the harmonic mean of precision and recall, so it balances the two

pros:

  • can be used for multi-class/multi-label problems by choosing the averaging method (see the sketch at the end of this section):
      • micro: count the total true positives, false negatives and false positives globally across labels
      • macro: compute F1 for each label, then take the unweighted mean
      • weighted: like macro, but the mean is weighted by each label’s support

cons:
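
A minimal scikit-learn sketch (the labels are toy values for illustration) comparing the three averaging methods:

    from sklearn.metrics import f1_score

    # toy multi-class labels and predictions (illustrative values)
    y_true = [0, 1, 2, 2, 1, 0]
    y_pred = [0, 2, 2, 2, 1, 1]

    # micro: count total true positives, false negatives and false positives globally
    print(f1_score(y_true, y_pred, average="micro"))
    # macro: F1 per label, then the unweighted mean
    print(f1_score(y_true, y_pred, average="macro"))
    # weighted: F1 per label, mean weighted by each label's support
    print(f1_score(y_true, y_pred, average="weighted"))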

Area Under the Receiver Operating Characteristic Curve (ROC AUC)

Meaning:

The ROC (Receiver Operating Characteristic) curve shows how well the model can distinguish between the two classes. The area under it (ROC AUC) therefore indicates how well the predicted probabilities of the positive class are separated from those of the negative class.

pros:

  • independent of the response rate (the proportion of positives in the data)

cons:

  • based on the ranking of the predicted probabilities, not on their actual values (see the sketch at the end of this section)
  • so the predictions cannot be interpreted as calibrated probabilities
  • can be misleading when the data is imbalanced (highly skewed)
  • an increase in AUC does not necessarily reflect a better classifier; it can simply be a side-effect of having many negative examples
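
A minimal scikit-learn sketch (toy labels and scores for illustration) of the first con: since ROC AUC only uses the ranking of the scores, any monotonic rescaling of the predictions leaves it unchanged.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # toy labels and predicted scores (illustrative values)
    y_true  = np.array([0, 0, 1, 1, 0, 1])
    y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70])

    # ROC AUC only looks at how the scores rank the examples...
    print(roc_auc_score(y_true, y_score))
    # ...so a monotonic transform of the scores gives exactly the same AUC,
    # even though the transformed values are useless as probabilities
    print(roc_auc_score(y_true, y_score ** 4))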

Brier Score

Meaning:

the mean squared difference between the predicted probability and the actual outcome, i.e. how close the predictions are to what really happened. The lower, the closer (see the sketch at the end of this section).

pros:

  • a great complement to ROC AUC: it is sensitive to the actual scale (calibration) of the predicted probabilities, not just their ranking

cons:
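
A minimal scikit-learn sketch (toy labels and probabilities for illustration) of the Brier score as the mean squared gap between predicted probability and outcome:

    from sklearn.metrics import brier_score_loss

    # toy labels and predicted probabilities (illustrative values)
    y_true = [0, 1, 1, 0, 1]
    y_prob = [0.1, 0.9, 0.8, 0.3, 0.6]

    # Brier score = mean of (predicted probability - actual outcome)^2; lower is better
    print(brier_score_loss(y_true, y_prob))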

Log Loss

Meaning:

the negative average of the log of the corrected predicted probabilities (the probability assigned to the true class) of each instance. The lower the better, but there is no absolute reference value; what counts as a good score depends on the problem.

pros:

  • estimates can be interpreted as probabilities

cons:

  • if you have a lot of predictions near the boundaries (probabilities close to 0 or 1), the metric becomes very sensitive to false positives and false negatives, since confident mistakes are penalised heavily
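
A minimal scikit-learn sketch (toy values for illustration) of that sensitivity: one confidently wrong prediction blows up the score.

    from sklearn.metrics import log_loss

    # two true positives; the predicted probabilities are toy values for illustration
    y_true = [1, 1]

    # a mildly wrong prediction (0.4 for a true positive) costs little...
    print(log_loss(y_true, [0.6, 0.4], labels=[0, 1]))
    # ...but a confidently wrong one (0.01) is penalised very heavily
    print(log_loss(y_true, [0.6, 0.01], labels=[0, 1]))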