I don't think there is solid evidence that one is better than the other. These algorithms are built on different methods and have different hyper-parameters to tune. So I think the right approach is to understand the pros and cons of each and recall them when solving your specific problem.


SVM

  • Kernel based: maps a problem that is not linearly separable into a higher-dimensional space where a linear separator can work
  • Good for high-dimensional feature spaces
  • Slow on large training sets
  • Hard to interpret
  • Hard to find a good kernel
  • Not robust to noisy training sets
  • Sensitive to missing data
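
To make the kernel point concrete, here is a small sketch in plain Python. It does not train an SVM. It only checks, on a made-up XOR-style data set, that the degree-2 polynomial kernel equals a dot product in a mapped feature space, and that a linear rule separates the classes there. The names phi, poly_kernel, the data, and the weight vector w are all my own choices for illustration.

```python
import math

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, computed without building phi explicitly."""
    return dot(x, z) ** 2

# Hypothetical XOR-style data: no straight line in the plane separates the classes.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]

# The kernel value matches an ordinary dot product in the mapped space
# (abs_tol absorbs floating-point rounding from sqrt(2)).
kernel_matches = all(
    math.isclose(poly_kernel(x, z), dot(phi(x), phi(z)), abs_tol=1e-9)
    for x in points
    for z in points
)

# In the mapped space a linear rule works: w = (0, 1, 0), bias 0 (hand-picked).
w = (0.0, 1.0, 0.0)
predictions = [1 if dot(w, phi(x)) > 0 else -1 for x in points]

print(kernel_matches)          # whether kernel and explicit map agree
print(predictions == labels)   # whether the linear rule separates the classes
```

The hard part in practice is the one listed above: here the right kernel was known in advance, while on real data it has to be searched for.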

Decision tree

  • Good for handling non-linear problems
  • Handles categorical data well
  • easy to interpret
  • can handle some noisy data
  • overfits easily
  • does not work so well with a large feature space
  • does not work so well when some features are highly correlated
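
A rough sketch of why a tree is easy to interpret: each node is just a threshold on one feature, chosen to make the two sides as pure as possible. The code below picks a single split by weighted Gini impurity (the CART-style criterion, which I am assuming here; other criteria exist). It is one node, not a full tree, and the data and function names are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best split."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical data: (hours_studied, cups_of_coffee) -> exam result
rows = [(1, 3), (2, 1), (3, 4), (6, 2), (7, 5), (8, 1)]
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]

feature, threshold, impurity = best_split(rows, labels)
print(f"if feature[{feature}] <= {threshold}: fail, else: pass (gini={impurity})")
```

The same greedy search is also where the overfitting comes from: if you keep splitting until every leaf is pure, the tree will happily carve out a leaf for each noisy sample.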

Logistic regression

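  • Assumptions: the log-odds of the outcome are linear in the features, observations are independent (no or little auto-correlation), and there is little or no multicollinearity. Multivariate normality and homoscedasticity (meaning the residuals have equal variance across the regression line) are assumptions of linear regression and are not required here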
  • handles linear problems well
  • robust to noise; L1 or L2 regularization can be used for model selection and to avoid overfitting
  • computationally efficient
  • variations with regularization:
    • Lasso (L1): penalizes the absolute size of coefficients, offers automatic feature selection
    • Ridge (L2): penalizes the squared size of coefficients, shrinks coefficients toward zero without removing them
    • Elastic Net: combination of Lasso and Ridge
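
The difference between the L1 and L2 penalties is easiest to see on a single coefficient. The sketch below is a simplification, not a logistic regression fit: it assumes a one-dimensional quadratic loss around an unpenalized value z and applies the closed-form minimizer for each penalty. The function names, the coefficient values, and lam are hypothetical.

```python
def l1_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w): soft thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small coefficients are set exactly to zero

def l2_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return z / (1.0 + lam)

unpenalized = [2.0, -0.3, 0.1, -1.5]  # hypothetical coefficient values
lam = 0.5                             # hypothetical penalty strength

lasso_like = [l1_shrink(z, lam) for z in unpenalized]
ridge_like = [l2_shrink(z, lam) for z in unpenalized]

print(lasso_like)  # the two small coefficients become exactly 0.0
print(ridge_like)  # every coefficient gets smaller, none becomes zero
```

This is why Lasso is described as feature selection and Ridge only as shrinkage: the L1 penalty zeroes out weak coefficients, while the L2 penalty scales all of them down.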

Naive Bayes

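  • computationally efficient when the number of features P is large: because features are assumed conditionally independent given the class, each one is estimated separately, which alleviates the curse of dimensionality
  • works surprisingly well in some cases even when the conditional independence assumption doesn’t hold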
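
For illustration, here is a minimal categorical Naive Bayes in plain Python with Laplace smoothing. It is a sketch under my own assumptions (the toy spam data, the function names, and alpha=1.0 are made up), but it shows why the cost grows only linearly with P: the score is one independent term per feature.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing alpha."""
    class_counts = Counter(labels)
    counts = defaultdict(Counter)  # counts[(feature index, class)][value]
    values = defaultdict(set)      # distinct values seen per feature
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            counts[(j, y)][v] += 1
            values[j].add(v)
    return class_counts, counts, values, alpha

def predict_nb(model, row):
    """Return the class with the highest log posterior score."""
    class_counts, counts, values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)       # log prior
        for j, v in enumerate(row):         # one independent term per feature
            num = counts[(j, y)][v] + alpha
            den = n_y + alpha * len(values[j])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = y, score
    return best_label

# Hypothetical data: (contains "free", contains "meeting") -> label
rows = [("yes", "no"), ("yes", "no"), ("yes", "yes"),
        ("no", "yes"), ("no", "yes"), ("no", "no")]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_nb(rows, labels)
print(predict_nb(model, ("yes", "no")))
print(predict_nb(model, ("no", "yes")))
```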