RandomOversampler
Construct a random oversampling model with the given hyper-parameters.
RandomOversampler
A model type for constructing a random oversampler, based on Imbalance.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
RandomOversampler = @load RandomOversampler pkg=Imbalance
Do model = RandomOversampler() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in RandomOversampler(ratios=...).
RandomOversampler implements naive oversampling by repeating existing observations with replacement.
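As a rough illustration of what "repeating existing observations with replacement" means, the sketch below grows a single class by drawing extra row indices uniformly at random from the rows that already belong to it. It uses only the Julia standard library; oversample_indices is a hypothetical helper written for this illustration and is not the Imbalance.jl implementation.

```julia
using Random

# Hypothetical helper (not part of Imbalance.jl): grow one class to `target`
# observations by drawing extra row indices uniformly with replacement.
function oversample_indices(rng::AbstractRNG, class_idx::Vector{Int}, target::Int)
    n_extra = max(target - length(class_idx), 0)   # never remove observations
    extra = rand(rng, class_idx, n_extra)          # sampling with replacement
    return vcat(class_idx, extra)                  # originals are always kept
end

rng = MersenneTwister(42)
y = ["a", "a", "a", "a", "b", "b"]
minority = findall(==("b"), y)                     # rows 5 and 6
idx = oversample_indices(rng, minority, 4)         # four indices, all "b" rows
y_b_over = y[idx]
```

Because the extra indices are drawn from the class's own rows, every new observation is an exact copy of an existing one, which is why the method is described as naive.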
Training data
In MLJ or MLJBase, wrap the model in a machine by mach = machine(model)
There is no need to provide any data here because the model is a static transformer.
Likewise, there is no need to fit!(mach).
For default values of the hyper-parameters, the model can be constructed with model = RandomOversampler()
Hyperparameters
ratios=1.0: A parameter that controls the amount of oversampling to be done for each class
- Can be a float, in which case each class is oversampled to the size of the majority class times the float. By default, all classes are oversampled to the size of the majority class
- Can be a dictionary mapping each class label to the float ratio for that class
rng::Union{AbstractRNG, Integer}=default_rng(): Either an AbstractRNG object or an Integer seed to be used with Xoshiro if the Julia VERSION supports it. Otherwise, uses MersenneTwister.
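To make the effect of ratios concrete, the following sketch computes the class sizes one would expect after oversampling. It is written in plain Julia under two assumptions that the text above does not state: target sizes are rounded to the nearest integer, and classes missing from a dictionary are left unchanged. target_counts is a hypothetical helper, not an Imbalance.jl function.

```julia
# Hypothetical helper (not part of Imbalance.jl): expected size of each class
# after oversampling, for `ratios` given as a float or a per-class Dict.
function target_counts(y::AbstractVector, ratios::Union{Real,AbstractDict}=1.0)
    counts = Dict{eltype(y),Int}()
    for label in y
        counts[label] = get(counts, label, 0) + 1
    end
    majority = maximum(values(counts))
    targets = Dict{eltype(y),Int}()
    for (label, n) in counts
        # Assumption: a class absent from the Dict keeps its original size.
        r = ratios isa AbstractDict ? get(ratios, label, 0.0) : ratios
        # Assumption: round to the nearest integer; never shrink a class.
        targets[label] = max(n, round(Int, r * majority))
    end
    return targets
end

# Class sizes taken from the Example section: 48, 19 and 33 observations.
y = vcat(fill(0, 48), fill(1, 19), fill(2, 33))
targets = target_counts(y, Dict(0 => 1.0, 1 => 0.9, 2 => 0.8))
```

With a majority class of 48 observations, ratios of 0.9 and 0.8 give targets of 43 and 38, which agrees with the class counts reported in the Example section.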
Transform Inputs
X: A matrix of real numbers or a table with element scitypes that subtype Union{Finite, Infinite}. Elements in nominal columns should subtype Finite (i.e., have scitype OrderedFactor or Multiclass) and elements in continuous columns should subtype Infinite (i.e., have scitype Count or Continuous).
y: An abstract vector of labels (e.g., strings) that correspond to the observations in X
Transform Outputs
Xover: A matrix or table, depending on whether the input X is a matrix or table respectively, that includes the original data and the new observations due to oversampling.
yover: An abstract vector of labels corresponding to Xover
Operations
transform(mach, X, y): resample the data X and y using RandomOversampler, returning both the new and original observations
Example
using MLJ
import Imbalance
## set probability of each class
class_probs = [0.5, 0.2, 0.3]
num_rows, num_continuous_feats = 100, 5
## generate a table and categorical vector accordingly
X, y = Imbalance.generate_imbalanced_data(num_rows, num_continuous_feats;
                                          class_probs, rng=42)
julia> Imbalance.checkbalance(y)
1: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 19 (39.6%)
2: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 33 (68.8%)
0: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 48 (100.0%)
## load RandomOversampler
RandomOversampler = @load RandomOversampler pkg=Imbalance
## construct the model and wrap it in a machine
oversampler = RandomOversampler(ratios=Dict(0=>1.0, 1=>0.9, 2=>0.8), rng=42)
mach = machine(oversampler)
## provide the data to transform (there is nothing to fit)
Xover, yover = transform(mach, X, y)
julia> Imbalance.checkbalance(yover)
2: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 38 (79.2%)
1: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 43 (89.6%)
0: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 48 (100.0%)