

A model type for constructing a CatBoost regressor, based on CatBoost.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

CatBoostRegressor = @load CatBoostRegressor pkg=CatBoost

Do model = CatBoostRegressor() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in CatBoostRegressor(iterations=...).
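As a minimal sketch of overriding a default, the snippet below constructs a model with a non-default `iterations` and then changes it in place. It assumes MLJ and CatBoost.jl are installed; `verbosity=0` is only there to silence the loading message, and the values 50 and 100 are arbitrary.

```julia
using MLJ

# Load the model type; verbosity=0 suppresses the import message
CatBoostRegressor = @load CatBoostRegressor pkg=CatBoost verbosity=0

# Override the default number of boosting iterations at construction
model = CatBoostRegressor(iterations=50)

# Hyper-parameters are ordinary fields, so they can be inspected and
# changed later without rebuilding the model
model.iterations = 100
```

Changing a field after a machine has been trained takes effect at the next `fit!` call on that machine.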

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, X, y)

where

  • X: any table of input features (e.g., a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, Finite, Textual; check column scitypes with schema(X). Textual columns are passed to catboost as text_features, Multiclass columns are passed to catboost as cat_features, and OrderedFactor columns are converted to integers.
  • y: the target, which can be any AbstractVector whose element scitype is Continuous; check the scitype with scitype(y).

Train the machine with fit!(mach, rows=...).
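The scitype checks mentioned above can be sketched as follows. The table and its column names are hypothetical; the point is that a column of plain strings is read as Textual, and has to be coerced to Multiclass if it should be treated as a categorical feature.

```julia
using MLJ  # provides schema, scitype and coerce

# Hypothetical raw data: `department` starts out as plain strings
X = (
    duration = [1.5, 4.1, 5.0, 6.7],
    n_phone_calls = [4, 5, 6, 7],
    department = ["acc", "ops", "acc", "ops"],
)
y = [2.0, 4.0, 6.0, 7.0]

schema(X)    # lists the scitype of every column
scitype(y)   # AbstractVector{Continuous}

# Reinterpret `department` as an unordered categorical column
X = coerce(X, :department => Multiclass)
schema(X)
```

Coercing before constructing the machine keeps the choice between text and categorical handling explicit rather than dependent on how the data happened to be loaded.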


Hyper-parameters

For details on the catboost hyper-parameters, see the Python CatBoost documentation.

Operations

  • predict(mach, Xnew): return predictions of the target given new features Xnew having the same scitype as X above.

Accessor functions

  • feature_importances(mach): return a vector of feature importances, as feature::Symbol => importance::Real pairs

Fitted parameters

The fields of fitted_params(mach) are:

  • model: The Python CatBoostRegressor model


Report

The fields of report(mach) are:

  • feature_importances: Vector{Pair{Symbol, Float64}} of feature importances
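Because the importances are a plain vector of pairs, they can be ranked or looked up with base Julia alone. The sketch below uses made-up values in the same `Vector{Pair{Symbol, Float64}}` shape; with a trained machine the vector would come from `feature_importances(mach)` instead.

```julia
# Hypothetical importances (illustrative values, not real model output)
importances = [:duration => 62.5, :n_phone_calls => 30.0, :department => 7.5]

# Rank features from most to least important
ranked = sort(importances; by = last, rev = true)

# Convert to a Dict for lookup by feature name
lookup = Dict(importances)
lookup[:duration]
```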


Examples

using CatBoost.MLJCatBoostInterface
using MLJ

# Toy feature table with a continuous, a count, and a categorical column
X = (
    duration = [1.5, 4.1, 5.0, 6.7],
    n_phone_calls = [4, 5, 6, 7],
    department = coerce(["acc", "ops", "acc", "ops"], Multiclass),
)
y = [2.0, 4.0, 6.0, 7.0]

model = CatBoostRegressor(iterations=5)
mach = machine(model, X, y)
fit!(mach)  # train the machine before predicting
preds = predict(mach, X)

See also catboost and the unwrapped model type CatBoost.CatBoostRegressor.