BorderlineSMOTE1

Construct a BorderlineSMOTE1 model with the given hyper-parameters.


A model type for constructing a BorderlineSMOTE1 oversampler, based on Imbalance.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

BorderlineSMOTE1 = @load BorderlineSMOTE1 pkg=Imbalance

Do model = BorderlineSMOTE1() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in BorderlineSMOTE1(m=...).

BorderlineSMOTE1 implements the BorderlineSMOTE1 algorithm to correct for class imbalance, as in Han, H., Wang, W.-Y., & Mao, B.-H. (2005). Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In D.S. Huang, X.-P. Zhang, & G.-B. Huang (Eds.), Advances in Intelligent Computing (pp. 878-887). Springer.
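To make the algorithm concrete, here is a minimal Julia sketch of Borderline-SMOTE1 as described by Han et al. (2005), assuming dense numeric features stored as the rows of a matrix and a brute-force Euclidean neighbor search. The helper names `nearest`, `danger_points` and `borderline_smote1` are hypothetical and are not part of Imbalance.jl's API; the package's own implementation may differ in details such as tie-breaking and how noise points are handled.

```julia
using Random
using LinearAlgebra: norm

# Indices of the (up to) `m` rows of `X` nearest to row `i`, searched among
# `pool` and excluding `i` itself (brute-force Euclidean distance).
function nearest(X::AbstractMatrix, i::Integer, pool::AbstractVector{<:Integer}, m::Integer)
    cand = filter(j -> j != i, pool)
    d = [norm(X[i, :] .- X[j, :]) for j in cand]
    return cand[sortperm(d)[1:min(m, length(cand))]]
end

# DANGER set for class `c`: points of class `c` for which at least half, but
# not all, of their `m` nearest neighbors (in the whole data) belong to other
# classes. Points whose neighbors all come from other classes count as noise.
function danger_points(X::AbstractMatrix, y::AbstractVector, c, m::Integer)
    danger = Int[]
    for i in findall(==(c), y)
        nn = nearest(X, i, 1:size(X, 1), m)
        n_other = count(j -> y[j] != c, nn)
        if length(nn) / 2 <= n_other < length(nn)
            push!(danger, i)
        end
    end
    return danger
end

# Generate `n_new` synthetic rows for class `c`: pick a random DANGER point,
# pick one of its `k` nearest neighbors from class `c`, and interpolate at a
# uniformly random position on the segment between the two points.
function borderline_smote1(X::AbstractMatrix, y::AbstractVector, c, n_new::Integer;
                           m::Integer=5, k::Integer=5,
                           rng::AbstractRNG=Random.default_rng())
    danger = danger_points(X, y, c, m)
    isempty(danger) && return zeros(0, size(X, 2))   # no borderline points
    class_idx = findall(==(c), y)
    Xnew = zeros(n_new, size(X, 2))
    for r in 1:n_new
        i = rand(rng, danger)
        j = rand(rng, nearest(X, i, class_idx, k))
        Xnew[r, :] = X[i, :] .+ rand(rng) .* (X[j, :] .- X[i, :])
    end
    return Xnew
end

# Toy data with one feature: class 0 at x = 0..3, class 1 at x = 3.5, 4, 10, 11.
X = reshape([0.0, 1.0, 2.0, 3.0, 3.5, 4.0, 10.0, 11.0], :, 1)
y = [0, 0, 0, 0, 1, 1, 1, 1]
danger_points(X, y, 1, 3)   # [5, 6]: only x = 3.5 and x = 4.0 are borderline
Xnew = borderline_smote1(X, y, 1, 4; m=3, k=1, rng=MersenneTwister(0))
```

With `k=1`, every synthetic point falls between x = 3.5 and x = 4.0, so the new minority samples are concentrated near the class boundary rather than around the safe points at x = 10 and x = 11. That focus on the border is what distinguishes Borderline-SMOTE1 from plain SMOTE.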

Training data

In MLJ or MLJBase, wrap the model in a machine by

mach = machine(model)

There is no need to provide any data here because the model is a static transformer.

Likewise, there is no need to fit!(mach).

With default hyper-parameter values, the model can be constructed with

model = BorderlineSMOTE1()

Hyperparameters

  • m::Integer=5: The number of neighbors to consider while checking the BorderlineSMOTE1 condition (whether a point lies near the boundary with other classes). It should satisfy 0 < m < N, where N is the number of observations in the data; it is automatically set to N-1 if N ≤ m.

  • k::Integer=5: The number of nearest neighbors to consider in the SMOTE part of the algorithm. It should satisfy 0 < k < n, where n is the number of observations in the smallest class; for any class with l ≤ k points, it is automatically set to l-1.

  • ratios=1.0: A parameter that controls the amount of oversampling to be done for each class.

    • Can be a float, in which case each class is oversampled to the size of the majority class times the float. By default, all classes are oversampled to the size of the majority class.
    • Can be a dictionary mapping each class label to the float ratio for that class.
  • rng::Union{AbstractRNG, Integer}=default_rng(): Either an AbstractRNG object or an Integer seed, which is used with Xoshiro if the Julia VERSION supports it and with MersenneTwister otherwise.

  • verbosity::Integer=1: When greater than 0, information about the points that will participate in oversampling is logged.
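The `ratios` arithmetic can be checked by hand. The sketch below uses a hypothetical helper `extra_rows` (not part of Imbalance.jl) to compute how many synthetic rows each class would need, assuming the target size is rounded to the nearest integer and that classes already at or above their target receive no new rows. Applied to the class counts from the Example section (490, 200 and 310 rows), `Dict(0=>1.0, 1=>0.9, 2=>0.8)` gives targets of 490, 441 and 392 rows, which match the counts `checkbalance` reports after oversampling.

```julia
# Hypothetical helper (not Imbalance.jl API): number of synthetic rows per class
# so that each class reaches round(ratio * majority size). Assumes a Dict covers
# every class; classes already at or above their target get 0 new rows.
function extra_rows(y::AbstractVector, ratios::Union{Real, AbstractDict})
    counts = Dict(c => count(==(c), y) for c in unique(y))
    n_major = maximum(values(counts))
    ratio_of(c) = ratios isa AbstractDict ? ratios[c] : ratios
    return Dict(c => max(0, round(Int, ratio_of(c) * n_major) - n) for (c, n) in counts)
end

# Class sizes from the Example section: 490 (class 0), 200 (class 1), 310 (class 2).
y = vcat(fill(0, 490), fill(1, 200), fill(2, 310))
extra_rows(y, Dict(0 => 1.0, 1 => 0.9, 2 => 0.8))   # 0 => 0, 1 => 241, 2 => 82
extra_rows(y, 1.0)                                   # 0 => 0, 1 => 290, 2 => 180
```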

Transform Inputs

  • X: A matrix or table of floats where each row is an observation from the dataset
  • y: An abstract vector of labels (e.g., strings) that correspond to the observations in X

Transform Outputs

  • Xover: A matrix or table, matching the type of the input X, that includes the original data and the new observations generated by oversampling.
  • yover: An abstract vector of labels corresponding to Xover

Operations

  • transform(mach, X, y): resample the data X and y using BorderlineSMOTE1, returning both the new and original observations

Example

using MLJ
import Imbalance

## set probability of each class
class_probs = [0.5, 0.2, 0.3]                         
num_rows, num_continuous_feats = 1000, 5
## generate a table and categorical vector accordingly
X, y = Imbalance.generate_imbalanced_data(num_rows, num_continuous_feats; 
                                stds=[0.1 0.1 0.1], min_sep=0.01, class_probs, rng=42)            

julia> Imbalance.checkbalance(y)
1: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 200 (40.8%) 
2: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 310 (63.3%) 
0: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 490 (100.0%) 

## load BorderlineSMOTE1
BorderlineSMOTE1 = @load BorderlineSMOTE1 pkg=Imbalance

## construct the oversampler and wrap it in a machine
oversampler = BorderlineSMOTE1(m=3, k=5, ratios=Dict(0=>1.0, 1=> 0.9, 2=>0.8), rng=42)
mach = machine(oversampler)

## provide the data to transform (there is nothing to fit)
Xover, yover = transform(mach, X, y)


julia> Imbalance.checkbalance(yover)
2: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 392 (80.0%) 
1: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 441 (90.0%) 
0: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 490 (100.0%)