RandomUndersampler
Initiate a random undersampling model with the given hyper-parameters.
RandomUndersampler
A model type for constructing a random undersampler, based on Imbalance.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
RandomUndersampler = @load RandomUndersampler pkg=Imbalance
Do model = RandomUndersampler() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in RandomUndersampler(ratios=...).
RandomUndersampler implements naive undersampling by randomly removing existing observations.
Training data
In MLJ or MLJBase, wrap the model in a machine by mach = machine(model).
There is no need to provide any data here because the model is a static transformer.
Likewise, there is no need to fit!(mach).
For default values of the hyper-parameters, the model can be constructed by model = RandomUndersampler().
Hyperparameters
ratios=1.0: A parameter that controls the amount of undersampling to be done for each class.
- Can be a float, in which case each class is undersampled to the size of the minority class times the float. By default, all classes are undersampled to the size of the minority class.
- Can be a dictionary mapping each class label to the float ratio for that class.
rng::Union{AbstractRNG, Integer}=default_rng(): Either an AbstractRNG object or an Integer seed to be used with Xoshiro if the Julia VERSION supports it. Otherwise, uses MersenneTwister.
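The two hyper-parameters above can be illustrated with a minimal, self-contained sketch in plain Julia. It is not the Imbalance.jl implementation: naive_undersample is a hypothetical helper, and the rounding of ratio times minority size, as well as the cap at the current class size, are assumptions made only for this illustration. It shows how a float or a per-class dictionary of ratios might translate into the number of rows kept per class, and how a seeded RNG makes the selection reproducible.

```julia
using Random

# Hypothetical helper (not part of Imbalance.jl): for each class, randomly keep
# about ratio * (minority class size) rows, never more than the class contains.
function naive_undersample(y::AbstractVector, ratios; rng::AbstractRNG=Random.default_rng())
    # group row indices by class label
    idx_by_class = Dict{eltype(y), Vector{Int}}()
    for (i, label) in enumerate(y)
        push!(get!(idx_by_class, label, Int[]), i)
    end
    n_minority = minimum(length, values(idx_by_class))
    keep = Int[]
    for (label, idx) in idx_by_class
        # a single float applies to every class; a dictionary is looked up per label
        ratio = ratios isa AbstractDict ? ratios[label] : ratios
        # rounding and the cap at the class size are assumptions of this sketch
        n_keep = min(length(idx), round(Int, ratio * n_minority))
        append!(keep, shuffle(rng, idx)[1:n_keep])
    end
    return sort!(keep)
end

# toy labels: 10 of "a", 6 of "b", 4 of "c" (the minority class)
y = vcat(fill("a", 10), fill("b", 6), fill("c", 4))
rng = Xoshiro(42)   # needs Julia 1.7 or later; MersenneTwister(42) works on older versions
keep = naive_undersample(y, Dict("a" => 1.5, "b" => 1.0, "c" => 1.0); rng)
y_under = y[keep]
```

With these toy labels, class "a" is reduced to 1.5 times the minority size and the other two classes to the minority size. Passing a single float such as 1.0 instead of the dictionary would apply the same ratio to every class.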
Transform Inputs
X: A matrix of real numbers or a table with element scitypes that subtype Union{Finite, Infinite}. Elements in nominal columns should subtype Finite (i.e., have scitype OrderedFactor or Multiclass) and elements in continuous columns should subtype Infinite (i.e., have scitype Count or Continuous).
y: An abstract vector of labels (e.g., strings) that correspond to the observations in X.
Transform Outputs
X_under: A matrix or table that includes the data after undersampling, depending on whether the input X is a matrix or table, respectively.
y_under: An abstract vector of labels corresponding to X_under.
Operations
transform(mach, X, y): resample the data X and y using RandomUndersampler, returning the observations that remain after undersampling.
Example
using MLJ
import Imbalance
## set probability of each class
class_probs = [0.5, 0.2, 0.3]
num_rows, num_continuous_feats = 100, 5
## generate a table and categorical vector accordingly
X, y = Imbalance.generate_imbalanced_data(num_rows, num_continuous_feats;
                                          class_probs, rng=42)
julia> Imbalance.checkbalance(y; ref="minority")
1: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 19 (100.0%)
2: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 33 (173.7%)
0: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 48 (252.6%)
## load RandomUndersampler
RandomUndersampler = @load RandomUndersampler pkg=Imbalance
## construct the model and wrap it in a machine
undersampler = RandomUndersampler(ratios=Dict(0=>1.0, 1=>1.0, 2=>1.0),
                                  rng=42)
mach = machine(undersampler)
## provide the data to transform (there is nothing to fit)
X_under, y_under = transform(mach, X, y)
julia> Imbalance.checkbalance(y_under; ref="minority")
0: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 19 (100.0%)
2: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 19 (100.0%)
1: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 19 (100.0%)