EnsembleModel

EnsembleModel(model,
              atomic_weights=Float64[],
              bagging_fraction=0.8,
              n=100,
              rng=GLOBAL_RNG,
              acceleration=CPU1(),
              out_of_bag_measure=[])

Create a model for training an ensemble of n clones of model, with optional bagging. Ensembling is useful if fit!(machine(atom, data...)) does not create identical models on repeated calls (ie, is a stochastic model, such as a decision tree with randomized node selection criteria), or if bagging_fraction is set to a value less than 1.0, or both.

Here the atomic model must support targets with scitype AbstractVector{<:Finite} (single-target classifiers) or AbstractVector{<:Continuous} (single-target regressors).

If rng is an integer, then MersenneTwister(rng) is the random number generator used for bagging. Otherwise some AbstractRNG object is expected.

The atomic predictions are optionally weighted according to the vector atomic_weights (to allow for external optimization) except in the case that model is a Deterministic classifier, in which case atomic_weights are ignored.

The ensemble model is Deterministic or Probabilistic, according to the corresponding supertype of atom. In the case of deterministic classifiers (target_scitype(atom) <: Abstract{<:Finite}), the predictions are majority votes, and for regressors (target_scitype(atom)<: AbstractVector{<:Continuous}) they are ordinary averages. Probabilistic predictions are obtained by averaging the atomic probability distribution/mass functions; in particular, for regressors, the ensemble prediction on each input pattern has the type MixtureModel{VF,VS,D} from the Distributions.jl package, where D is the type of predicted distribution for atom.

Specify acceleration=CPUProcesses() for distributed computing, or CPUThreads() for multithreading.

If a single measure or non-empty vector of measures is specified by out_of_bag_measure, then out-of-bag estimates of performance are written to the training report (call report on the trained machine wrapping the ensemble model).

Important: If per-observation or class weights w (not to be confused with atomic weights) are specified when constructing a machine for the ensemble model, as in mach = machine(ensemble_model, X, y, w), then w is used by any measures specified in out_of_bag_measure that support them.