RandomForestRegressor

mutable struct RandomForestRegressor <: MLJModelInterface.Deterministic

A simple Random Forest model for regression with support for Missing data, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • n_trees::Int64: Number of (decision) trees in the forest [def: 30]
  • max_depth::Int64: The maximum depth the tree is allowed to reach. When this is reached the node is forced to become a leaf [def: 0, i.e. no limits]
  • min_gain::Float64: The minimum information gain to allow for a node's partition [def: 0]
  • min_records::Int64: The minimum number of records a node must hold to be considered for partitioning [def: 2]
  • max_features::Int64: The maximum number of (random) features to consider at each partitioning [def: 0, i.e. square root of the data dimension]
  • splitting_criterion::Function: The function used to compute the information gain of a specific partition. This is done by measuring the difference between the "impurity" of the labels of the parent node and that of the two child nodes, weighted by their respective number of items [def: variance]. Either variance or a custom function; it can also be an anonymous function.
  • β::Float64: Parameter that regulates the weights used to score each tree, to be (optionally) used in prediction, based on the error of the individual trees computed on the records on which they have not been trained. Higher values favour "better" trees, but values that are too high will cause overfitting [def: 0, i.e. uniform weights]
  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]
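
A custom splitting_criterion can be sketched as follows, assuming (as the built-in variance does) that the criterion maps a vector of labels to a scalar impurity score; mad_impurity is a hypothetical name introduced here for illustration only:

```julia
using Statistics

# Hypothetical custom impurity: mean absolute deviation of the labels,
# instead of the default variance. Lower score = "purer" node.
mad_impurity(y) = mean(abs.(y .- mean(y)))

mad_impurity([1.0, 2.0, 3.0])   # 2/3: mean is 2, deviations are [1, 0, 1]

# It could then (assuming BetaML is loaded) be passed at construction:
# model = RandomForestRegressor(splitting_criterion = mad_impurity)
```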

Example:

julia> using MLJ

julia> X, y        = @load_boston;

julia> modelType   = @load RandomForestRegressor pkg = "BetaML" verbosity=0
BetaML.Trees.RandomForestRegressor

julia> model       = modelType()
RandomForestRegressor(
  n_trees = 30, 
  max_depth = 0, 
  min_gain = 0.0, 
  min_records = 2, 
  max_features = 0, 
  splitting_criterion = BetaML.Utils.variance, 
  β = 0.0, 
  rng = Random._GLOBAL_RNG())

julia> mach        = machine(model, X, y);

julia> fit!(mach);
[ Info: Training machine(RandomForestRegressor(n_trees = 30, …), …).

julia> ŷ           = predict(mach, X);

julia> hcat(y,ŷ)
506×2 Matrix{Float64}:
 24.0  25.8433
 21.6  22.4317
 34.7  35.5742
 33.4  33.9233
  ⋮    
 23.9  24.42
 22.0  22.4433
 11.9  15.5833
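
To see why a higher β favours "better" trees, here is a purely illustrative weighting scheme (not necessarily BetaML's exact formula): each tree's prediction is weighted by a factor decreasing exponentially in its out-of-sample error, so β = 0 recovers the plain mean while large β concentrates weight on low-error trees.

```julia
# Hypothetical β-weighted aggregation of per-tree predictions.
# weight_i ∝ exp(-β * error_i); β = 0 gives uniform weights.
function weighted_pred(preds, errs, β)
    w = exp.(-β .* errs)
    return sum(w .* preds) / sum(w)
end

tree_preds = [24.0, 26.0, 30.0]   # predictions of three trees
oob_errors = [0.5, 1.0, 4.0]      # errors on records not used in training

weighted_pred(tree_preds, oob_errors, 0.0)   # uniform mean ≈ 26.67
weighted_pred(tree_preds, oob_errors, 2.0)   # pulled toward the low-error trees
```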