Available models

Help extending these lists is welcome. Note the current limitations:

  • The models are built and tested assuming n > p; when this does not hold, tricks can be used to speed up computations, but these have not been implemented yet.
  • CV-aware code is not implemented yet (i.e. code that reuses computations when fitting over a grid of hyperparameters); "meta" functionalities such as One-vs-All or Cross-Validation are left to other packages such as MLJ.
  • No support yet for sparse matrices.
  • Stochastic solvers have not yet been implemented.
  • All computations are assumed to be done in Float64.

Regression models

| Regressors | Formulation¹ | Available solvers | Comments |
| :--- | :--- | :--- | :--- |
| OLS & Ridge | L2Loss + 0/L2 | Analytical² or CG³ | |
| Lasso & Elastic-Net | L2Loss + 0/L2 + L1 | (F)ISTA⁴ | |
| Robust 0/L2 | RobustLoss⁵ + 0/L2 | Newton, NewtonCG, LBFGS, IWLS-CG⁶ | no scale⁷ |
| Robust L1/EN | RobustLoss + 0/L2 + L1 | (F)ISTA | |
| Quantile⁸ + 0/L2 | RobustLoss + 0/L2 | LBFGS, IWLS-CG | |
| Quantile L1/EN | RobustLoss + 0/L2 + L1 | (F)ISTA | |
  1. "0" stands for no penalty
  2. Analytical means the solution is computed in "one shot" using the \ solver.
  3. CG = conjugate gradient
  4. (Accelerated) Proximal Gradient Descent
  5. Huber, Andrews, Bisquare, Logistic, Fair and Talwar weighting functions are available.
  6. Iteratively Re-weighted Least Squares, where each linear system is solved iteratively via CG.
  7. In other packages such as Scikit-Learn, a scale factor is estimated along with the parameters. This is somewhat ad hoc, corresponds more to a statistical perspective, and does not work well with penalties; we recommend using cross-validation to set the parameter of the Huber loss instead.
  8. Includes as special case the least absolute deviation (LAD) regression when δ=0.5.
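To make footnote 4 concrete, here is a minimal, non-accelerated ISTA iteration for the Lasso objective 0.5‖Xθ − y‖² + λ‖θ‖₁, written as a generic NumPy sketch (all names are illustrative; this is not the package's implementation):

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ‖.‖₁ (elementwise soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lam, step=None, iters=500):
    # minimise 0.5‖Xθ - y‖² + lam‖θ‖₁ via proximal gradient descent
    n, p = X.shape
    if step is None:
        # fixed step 1/L, where L = ‖X‖₂² is the Lipschitz constant
        # of the gradient of the smooth part
        step = 1.0 / np.linalg.norm(X, 2) ** 2
    theta = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y)   # gradient of the smooth part
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta
```

The accelerated variant (FISTA) applies the same proximal step at an extrapolated point; the soft-thresholding function above is exactly the proximal operator of the L1 penalty.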

Classification models

| Classifiers | Formulation | Available solvers | Comments |
| :--- | :--- | :--- | :--- |
| Logistic 0/L2 | LogisticLoss + 0/L2 | Newton, Newton-CG, LBFGS | yᵢ ∈ {±1} |
| Logistic L1/EN | LogisticLoss + 0/L2 + L1 | (F)ISTA | yᵢ ∈ {±1} |
| Multinomial 0/L2 | MultinomialLoss + 0/L2 | Newton-CG, LBFGS | yᵢ ∈ {1, ..., c} |
| Multinomial L1/EN | MultinomialLoss + 0/L2 + L1 | ISTA, FISTA | yᵢ ∈ {1, ..., c} |
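As a sanity check on the conventions in the table above, the logistic case with yᵢ ∈ {±1} corresponds to the objective L(θ) = Σᵢ log(1 + exp(−yᵢ xᵢᵀθ)) + (λ/2)‖θ‖². A hedged NumPy sketch of this loss and its gradient (illustrative code, not the package's implementation):

```python
import numpy as np

def logistic_objective(theta, X, y, lam):
    # Σ log(1 + exp(-yᵢ xᵢ'θ)) + (λ/2)‖θ‖², with labels yᵢ ∈ {-1, +1}
    margins = y * (X @ theta)
    return np.logaddexp(0.0, -margins).sum() + 0.5 * lam * theta @ theta

def logistic_gradient(theta, X, y, lam):
    # ∇L(θ) = -Σ yᵢ σ(-yᵢ xᵢ'θ) xᵢ + λθ, where σ is the sigmoid
    s = 1.0 / (1.0 + np.exp(y * (X @ theta)))
    return -X.T @ (y * s) + lam * theta
```

A finite-difference check of the gradient against the objective is an easy way to validate such sign and label conventions.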

Unless otherwise specified:

  • Newton-like solvers use Hager-Zhang line search (default in Optim.jl)
  • ISTA, FISTA solvers use backtracking line search and a shrinkage factor of β=0.8
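The backtracking rule used with ISTA/FISTA can be sketched as follows: try a step, and shrink it by the factor β until the standard sufficient-decrease condition for a proximal-gradient step holds. This is a generic sketch under that assumption (names are illustrative; the package's exact condition may differ):

```python
import numpy as np

def backtracking_prox_step(theta, f, grad_f, prox, t0=1.0, beta=0.8):
    """One proximal-gradient step with backtracking line search.

    The step size t is shrunk by beta until
        f(z) <= f(θ) + ∇f(θ)·(z - θ) + ‖z - θ‖² / (2t),
    where z = prox(θ - t ∇f(θ), t).
    """
    g = grad_f(theta)
    f0 = f(theta)
    t = t0
    while True:
        z = prox(theta - t * g, t)
        d = z - theta
        if f(z) <= f0 + g @ d + (d @ d) / (2.0 * t):
            return z, t
        t *= beta   # shrinkage factor, β = 0.8 as above
```

With f the smooth part of the objective and prox the proximal operator of the penalty (the identity map when there is no L1 term), iterating this step yields ISTA; FISTA applies the same step at an extrapolated point.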

Note: these models were all tested for correctness whenever a direct comparison with another package was possible, usually by comparing the objective function values at the returned coefficients (cf. the tests):

  • (against scikit-learn): Lasso, Elastic-Net, Logistic (L1/L2/EN), Multinomial (L1/L2/EN)
  • (against quantreg): Quantile (0/L1)

Systematic timing benchmarks have not been run yet, but they are planned (see this issue).