RandomForestRegressor
A model type for constructing a CART random forest regressor, based on DecisionTree.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
Do model = RandomForestRegressor() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in RandomForestRegressor(max_depth=...).
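As a minimal sketch of typical usage (the particular values below are arbitrary illustrations, not recommended settings), hyper-parameters can be set at construction or mutated on the instance afterwards:

using MLJ
RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
model = RandomForestRegressor(max_depth=6)  ## set at construction
model.n_trees = 50                          ## or mutate the field afterwards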
RandomForestRegressor implements the standard Random Forest algorithm, originally published in Breiman, L. (2001): "Random Forests", Machine Learning, vol. 45, pp. 5–32.
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X, y)
where
- X: any table of input features (eg, a DataFrame) whose columns each have one of the following element scitypes: Continuous, Count, or <:OrderedFactor; check column scitypes with schema(X)
- y: the target, which can be any AbstractVector whose element scitype is Continuous; check the scitype with scitype(y)
Train the machine with fit!(mach, rows=...).
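For illustration, a minimal sketch of binding and training on a column table; the column names x1 and x2 and the row range are placeholders, and model is assumed to be the RandomForestRegressor instance constructed above:

X = (x1 = rand(100), x2 = rand(100))  ## any Tables.jl-compatible table with Continuous columns
y = rand(100)                         ## Continuous target
schema(X)                             ## inspect the element scitypes
mach = machine(model, X, y)
fit!(mach, rows=1:80)                 ## train on a subset of rows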
Hyperparameters
- max_depth=-1: max depth of the decision tree (-1=any)
- min_samples_leaf=1: min number of samples each leaf needs to have
- min_samples_split=2: min number of samples needed for a split
- min_purity_increase=0: min purity needed for a split
- n_subfeatures=-1: number of features to select at random (0 for all, -1 for square root of number of features)
- n_trees=10: number of trees to train
- sampling_fraction=0.7: fraction of samples to train each tree on
- feature_importance: method to use for computing feature importances. One of (:impurity, :split)
- rng=Random.GLOBAL_RNG: random number generator or seed
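As a sketch only (the values below are illustrative, not tuned recommendations), several hyper-parameters can be overridden at once, with an integer seed passed to rng for reproducibility:

forest = RandomForestRegressor(
    n_trees=100,
    sampling_fraction=0.8,
    min_samples_leaf=2,
    rng=123)  ## integer seed or an AbstractRNG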
Operations
predict(mach, Xnew): return predictions of the target given new features Xnew having the same scitype as X above.
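For example (a sketch; Xnew here is a hypothetical two-column table with the same scitypes as the training features, and mach is the trained machine from above):

Xnew = (x1 = rand(3), x2 = rand(3))
yhat = predict(mach, Xnew)  ## vector of point predictions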
Fitted parameters
The fields of fitted_params(mach) are:
forest: the Ensemble object returned by the core DecisionTree.jl algorithm
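For instance, assuming the Ensemble object stores its fitted trees in a trees field (as in current DecisionTree.jl), the raw forest can be inspected directly (a sketch, not part of the MLJ API):

raw_forest = fitted_params(mach).forest
length(raw_forest.trees)  ## number of fitted trees (assumes a `trees` field)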
Report
The fields of report(mach) are:
features: the names of the features encountered in training
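For example, with a two-feature table as in the sketch above:

report(mach).features  ## e.g. [:x1, :x2]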
Accessor functions
feature_importances(mach) returns a vector of (feature::Symbol => importance) pairs; the type of importance is determined by the hyperparameter feature_importance (see above)
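For example, to rank features by importance (a sketch; since the accessor returns Pair objects, sorting on last orders by the importance values):

fi = feature_importances(mach)
sort(fi, by=last, rev=true)  ## most important features first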
Examples
using MLJ
Forest = @load RandomForestRegressor pkg=DecisionTree
forest = Forest(max_depth=4, min_samples_split=3)
X, y = make_regression(100, 2) ## synthetic data
mach = machine(forest, X, y) |> fit!
Xnew, _ = make_regression(3, 2)
yhat = predict(mach, Xnew) ## new predictions
fitted_params(mach).forest ## raw `Ensemble` object from DecisionTree.jl
feature_importances(mach)
See also DecisionTree.jl and the unwrapped model type MLJDecisionTreeInterface.DecisionTree.RandomForestRegressor.