API
Models
NearestNeighborModels.KNNClassifier
— Type
KNNClassifier
A model type for constructing a K-nearest neighbor classifier, based on NearestNeighborModels.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
KNNClassifier = @load KNNClassifier pkg=NearestNeighborModels
Do model = KNNClassifier()
to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in KNNClassifier(K=...)
.
KNNClassifier implements the K-nearest neighbors classifier, a non-parametric algorithm that predicts a discrete class distribution for a new point by taking a vote over the classes of its k nearest points. Each neighbor's vote is assigned a weight based on the proximity of that neighbor to the test point, according to a specified distance metric.
For more information about the weighting kernels, see the paper by Geler et al., Comparison of different weighting schemes for the kNN classifier on time-series data.
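The vote described above can be sketched in plain Julia (an illustration of the idea only, not the package's internal code):

```julia
# Weighted k-NN vote: each neighbor contributes its weight to its class,
# and the class with the largest total weight wins.
function weighted_vote(classes::AbstractVector, weights::AbstractVector{<:Real})
    scores = Dict{eltype(classes), Float64}()
    for (c, w) in zip(classes, weights)
        scores[c] = get(scores, c, 0.0) + w
    end
    return argmax(scores)  # key (class) with the highest accumulated weight
end

neighbor_classes = ["setosa", "versicolor", "setosa"]
neighbor_weights = [0.5, 1.0, 0.4]  # e.g. proximity-based kernel weights
weighted_vote(neighbor_classes, neighbor_weights)
```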
Training data
In MLJ or MLJBase, bind an instance model
to data with
mach = machine(model, X, y)
OR
mach = machine(model, X, y, w)
Here:
- X is any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X).
- y is the target, which can be any AbstractVector whose element scitype is <:Finite (<:Multiclass or <:OrderedFactor will do); check the scitype with scitype(y).
- w is the observation weights, which can be either nothing (default) or an AbstractVector whose element scitype is Count or Continuous. This is different from the weights kernel, which is a model hyper-parameter; see below.
Train the machine using fit!(mach, rows=...)
.
Hyper-parameters
- K::Int = 5: number of neighbors.
- algorithm::Symbol = :kdtree: one of (:kdtree, :brutetree, :balltree).
- metric::Metric = Euclidean(): any Metric from Distances.jl for the distance between points. For algorithm = :kdtree, only metrics which are instances of Union{Distances.Chebyshev, Distances.Cityblock, Distances.Euclidean, Distances.Minkowski, Distances.WeightedCityblock, Distances.WeightedEuclidean, Distances.WeightedMinkowski} are supported.
- leafsize::Int = algorithm == :brutetree ? 0 : 10: determines the number of points at which to stop splitting the tree. This option is ignored and always taken as 0 for algorithm = :brutetree, since brutetree isn't actually a tree.
- reorder::Bool = true: if true, points which are close in distance are placed close in memory. In this case, a copy of the original data is made so that the original data is left unmodified. Setting this to true can significantly improve performance of the specified algorithm (except :brutetree). This option is ignored and always taken as false for algorithm = :brutetree.
- weights::KNNKernel = Uniform(): kernel used in assigning weights to the k-nearest neighbors for each observation. An instance of one of the types in list_kernels(). User-defined weighting functions can be passed by wrapping the function in a UserDefinedKernel kernel (do ?NearestNeighborModels.UserDefinedKernel for more info). If observation weights w are passed during machine construction, then the weight assigned to each neighbor vote is the product of the kernel-generated weight for that neighbor and the corresponding observation weight.
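As an illustration of kernel weighting, the inverse-distance scheme (what an Inverse() kernel computes for one test point) can be sketched as follows; this is an illustration, not the package's implementation:

```julia
# Inverse-distance weighting: closer neighbors receive larger weights.
inverse_weights(dists) = 1 ./ dists

dists = [0.5, 1.0, 2.0]     # distances to the 3 nearest neighbors
w = inverse_weights(dists)  # the closest neighbor gets the largest weight
```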
Operations
- predict(mach, Xnew): return predictions of the target given features Xnew, which should have the same scitype as X above. Predictions are probabilistic but uncalibrated.
- predict_mode(mach, Xnew): return the modes of the probabilistic predictions returned above.
Fitted parameters
The fields of fitted_params(mach)
are:
tree: an instance of either KDTree, BruteTree or BallTree, depending on the value of the algorithm hyper-parameter (see the hyper-parameters section above). These are data structures that store the training data in a form that makes nearest neighbor searches on test points faster.
Examples
using MLJ
KNNClassifier = @load KNNClassifier pkg=NearestNeighborModels
X, y = @load_crabs; # a table and a vector from the crabs dataset
# view possible kernels
NearestNeighborModels.list_kernels()
# KNNClassifier instantiation
model = KNNClassifier(weights = NearestNeighborModels.Inverse())
mach = machine(model, X, y) |> fit! # wrap model and required data in an MLJ machine and fit
y_hat = predict(mach, X)
labels = predict_mode(mach, X)
See also MultitargetKNNClassifier
NearestNeighborModels.KNNRegressor
— Type
KNNRegressor
A model type for constructing a K-nearest neighbor regressor, based on NearestNeighborModels.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels
Do model = KNNRegressor()
to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in KNNRegressor(K=...)
.
KNNRegressor implements the K-nearest neighbors regressor, a non-parametric algorithm that predicts the response associated with a new point by taking a weighted average of the responses of the K nearest points.
Training data
In MLJ or MLJBase, bind an instance model
to data with
mach = machine(model, X, y)
OR
mach = machine(model, X, y, w)
Here:
- X is any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X).
- y is the target, which can be any AbstractVector whose element scitype is Continuous; check the scitype with scitype(y).
- w is the observation weights, which can be either nothing (default) or an AbstractVector whose element scitype is Count or Continuous. This is different from the weights kernel, which is a model hyper-parameter; see below.
Train the machine using fit!(mach, rows=...)
.
Hyper-parameters
- K::Int = 5: number of neighbors.
- algorithm::Symbol = :kdtree: one of (:kdtree, :brutetree, :balltree).
- metric::Metric = Euclidean(): any Metric from Distances.jl for the distance between points. For algorithm = :kdtree, only metrics which are instances of Union{Distances.Chebyshev, Distances.Cityblock, Distances.Euclidean, Distances.Minkowski, Distances.WeightedCityblock, Distances.WeightedEuclidean, Distances.WeightedMinkowski} are supported.
- leafsize::Int = algorithm == :brutetree ? 0 : 10: determines the number of points at which to stop splitting the tree. This option is ignored and always taken as 0 for algorithm = :brutetree, since brutetree isn't actually a tree.
- reorder::Bool = true: if true, points which are close in distance are placed close in memory. In this case, a copy of the original data is made so that the original data is left unmodified. Setting this to true can significantly improve performance of the specified algorithm (except :brutetree). This option is ignored and always taken as false for algorithm = :brutetree.
- weights::KNNKernel = Uniform(): kernel used in assigning weights to the k-nearest neighbors for each observation. An instance of one of the types in list_kernels(). User-defined weighting functions can be passed by wrapping the function in a UserDefinedKernel kernel (do ?NearestNeighborModels.UserDefinedKernel for more info). If observation weights w are passed during machine construction, then the weight assigned to each neighbor vote is the product of the kernel-generated weight for that neighbor and the corresponding observation weight.
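The regressor's prediction rule can be sketched as a weighted average (illustrative only; inverse-distance weights are assumed here):

```julia
# Weighted k-NN regression: the prediction is the weighted average of the
# responses of the k nearest neighbors.
weighted_mean(responses, weights) = sum(responses .* weights) / sum(weights)

responses = [10.0, 12.0, 20.0]  # responses of the 3 nearest neighbors
weights   = [2.0, 1.0, 0.5]     # e.g. 1 ./ distances
weighted_mean(responses, weights)
```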
Operations
- predict(mach, Xnew): return predictions of the target given features Xnew, which should have the same scitype as X above.
Fitted parameters
The fields of fitted_params(mach)
are:
tree: an instance of either KDTree, BruteTree or BallTree, depending on the value of the algorithm hyper-parameter (see the hyper-parameters section above). These are data structures that store the training data in a form that makes nearest neighbor searches on test points faster.
Examples
using MLJ
KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels
X, y = @load_boston; # a table and a vector from the Boston housing dataset
# view possible kernels
NearestNeighborModels.list_kernels()
model = KNNRegressor(weights = NearestNeighborModels.Inverse()) # KNNRegressor instantiation
mach = machine(model, X, y) |> fit! # wrap model and required data in an MLJ machine and fit
y_hat = predict(mach, X)
See also MultitargetKNNRegressor
NearestNeighborModels.MultitargetKNNRegressor
— Type
MultitargetKNNRegressor
A model type for constructing a multitarget K-nearest neighbor regressor, based on NearestNeighborModels.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
MultitargetKNNRegressor = @load MultitargetKNNRegressor pkg=NearestNeighborModels
Do model = MultitargetKNNRegressor()
to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in MultitargetKNNRegressor(K=...)
.
Multi-target K-Nearest Neighbors regressor (MultitargetKNNRegressor) is a variation of KNNRegressor
that assumes the target variable is vector-valued with Continuous
components. (Target data must be presented as a table, however.)
Training data
In MLJ or MLJBase, bind an instance model
to data with
mach = machine(model, X, y)
OR
mach = machine(model, X, y, w)
Here:
- X is any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X).
- y is the target, which can be any table of responses whose element scitype is Continuous; check column scitypes with schema(y).
- w is the observation weights, which can be either nothing (default) or an AbstractVector whose element scitype is Count or Continuous. This is different from the weights kernel, which is a model hyper-parameter; see below.
Train the machine using fit!(mach, rows=...)
.
Hyper-parameters
- K::Int = 5: number of neighbors.
- algorithm::Symbol = :kdtree: one of (:kdtree, :brutetree, :balltree).
- metric::Metric = Euclidean(): any Metric from Distances.jl for the distance between points. For algorithm = :kdtree, only metrics which are instances of Union{Distances.Chebyshev, Distances.Cityblock, Distances.Euclidean, Distances.Minkowski, Distances.WeightedCityblock, Distances.WeightedEuclidean, Distances.WeightedMinkowski} are supported.
- leafsize::Int = algorithm == :brutetree ? 0 : 10: determines the number of points at which to stop splitting the tree. This option is ignored and always taken as 0 for algorithm = :brutetree, since brutetree isn't actually a tree.
- reorder::Bool = true: if true, points which are close in distance are placed close in memory. In this case, a copy of the original data is made so that the original data is left unmodified. Setting this to true can significantly improve performance of the specified algorithm (except :brutetree). This option is ignored and always taken as false for algorithm = :brutetree.
- weights::KNNKernel = Uniform(): kernel used in assigning weights to the k-nearest neighbors for each observation. An instance of one of the types in list_kernels(). User-defined weighting functions can be passed by wrapping the function in a UserDefinedKernel kernel (do ?NearestNeighborModels.UserDefinedKernel for more info). If observation weights w are passed during machine construction, then the weight assigned to each neighbor vote is the product of the kernel-generated weight for that neighbor and the corresponding observation weight.
Operations
- predict(mach, Xnew): return predictions of the target given features Xnew, which should have the same scitype as X above.
Fitted parameters
The fields of fitted_params(mach)
are:
tree: an instance of either KDTree, BruteTree or BallTree, depending on the value of the algorithm hyper-parameter (see the hyper-parameters section above). These are data structures that store the training data in a form that makes nearest neighbor searches on test points faster.
Examples
using MLJ
# Create Data
X, y = make_regression(10, 5, n_targets=2)
# load MultitargetKNNRegressor
MultitargetKNNRegressor = @load MultitargetKNNRegressor pkg=NearestNeighborModels
# view possible kernels
NearestNeighborModels.list_kernels()
# MultitargetKNNRegressor instantiation
model = MultitargetKNNRegressor(weights = NearestNeighborModels.Inverse())
# Wrap model and required data in an MLJ machine and fit.
mach = machine(model, X, y) |> fit!
# Predict
y_hat = predict(mach, X)
See also KNNRegressor
NearestNeighborModels.MultitargetKNNClassifier
— Type
MultitargetKNNClassifier
A model type for constructing a multitarget K-nearest neighbor classifier, based on NearestNeighborModels.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
MultitargetKNNClassifier = @load MultitargetKNNClassifier pkg=NearestNeighborModels
Do model = MultitargetKNNClassifier()
to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in MultitargetKNNClassifier(K=...)
.
Multi-target K-Nearest Neighbors Classifier (MultitargetKNNClassifier) is a variation of KNNClassifier
that assumes the target variable is vector-valued with Multiclass
or OrderedFactor
components. (Target data must be presented as a table, however.)
Training data
In MLJ or MLJBase, bind an instance model
to data with
mach = machine(model, X, y)
OR
mach = machine(model, X, y, w)
Here:
- X is any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X).
- y is the target, which can be any table of responses whose element scitype is <:Finite (<:Multiclass or <:OrderedFactor will do); check the column scitypes with schema(y). Each column of y is assumed to belong to a common categorical pool.
- w is the observation weights, which can be either nothing (default) or an AbstractVector whose element scitype is Count or Continuous. This is different from the weights kernel, which is a model hyper-parameter; see below.
Train the machine using fit!(mach, rows=...)
.
Hyper-parameters
- K::Int = 5: number of neighbors.
- algorithm::Symbol = :kdtree: one of (:kdtree, :brutetree, :balltree).
- metric::Metric = Euclidean(): any Metric from Distances.jl for the distance between points. For algorithm = :kdtree, only metrics which are instances of Union{Distances.Chebyshev, Distances.Cityblock, Distances.Euclidean, Distances.Minkowski, Distances.WeightedCityblock, Distances.WeightedEuclidean, Distances.WeightedMinkowski} are supported.
- leafsize::Int = algorithm == :brutetree ? 0 : 10: determines the number of points at which to stop splitting the tree. This option is ignored and always taken as 0 for algorithm = :brutetree, since brutetree isn't actually a tree.
- reorder::Bool = true: if true, points which are close in distance are placed close in memory. In this case, a copy of the original data is made so that the original data is left unmodified. Setting this to true can significantly improve performance of the specified algorithm (except :brutetree). This option is ignored and always taken as false for algorithm = :brutetree.
- weights::KNNKernel = Uniform(): kernel used in assigning weights to the k-nearest neighbors for each observation. An instance of one of the types in list_kernels(). User-defined weighting functions can be passed by wrapping the function in a UserDefinedKernel kernel (do ?NearestNeighborModels.UserDefinedKernel for more info). If observation weights w are passed during machine construction, then the weight assigned to each neighbor vote is the product of the kernel-generated weight for that neighbor and the corresponding observation weight.
- output_type::Type{<:MultiUnivariateFinite} = DictTable: one of (ColumnTable, DictTable); the type of table to use for predictions. Setting to ColumnTable might improve performance for narrow tables, while setting to DictTable improves performance for wide tables.
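The two layouts can be pictured roughly as follows. The shapes below are assumptions for illustration; the actual ColumnTable and DictTable types are defined by the package and Tables.jl:

```julia
# A column table is (roughly) a named tuple of column vectors; a dict table
# maps column names to column vectors. Both expose columns by name.
column_table = (y1 = [0.2, 0.8], y2 = [0.6, 0.4])
dict_table   = Dict(:y1 => [0.2, 0.8], :y2 => [0.6, 0.4])
```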
Operations
- predict(mach, Xnew): return predictions of the target given features Xnew, which should have the same scitype as X above. Predictions are either a ColumnTable or a DictTable of UnivariateFiniteVector columns, depending on the value set for the output_type parameter discussed above. The probabilistic predictions are uncalibrated.
- predict_mode(mach, Xnew): return the modes of each column of the table of probabilistic predictions returned above.
Fitted parameters
The fields of fitted_params(mach)
are:
tree: an instance of either KDTree, BruteTree or BallTree, depending on the value of the algorithm hyper-parameter (see the hyper-parameters section above). These are data structures that store the training data in a form that makes nearest neighbor searches on test points faster.
Examples
using MLJ, StableRNGs
# set rng for reproducibility
rng = StableRNG(10)
# Dataset generation
n, p = 10, 3
X = table(randn(rng, n, p)) # feature table
fruit, color = categorical(["apple", "orange"]), categorical(["blue", "green"])
y = [(fruit = rand(rng, fruit), color = rand(rng, color)) for _ in 1:n] # target_table
# Each column in y has a common categorical pool as expected
selectcols(y, :fruit) # categorical array
selectcols(y, :color) # categorical array
# Load MultitargetKNNClassifier
MultitargetKNNClassifier = @load MultitargetKNNClassifier pkg=NearestNeighborModels
# view possible kernels
NearestNeighborModels.list_kernels()
# MultitargetKNNClassifier instantiation
model = MultitargetKNNClassifier(K=3, weights = NearestNeighborModels.Inverse())
# wrap model and required data in an MLJ machine and fit
mach = machine(model, X, y) |> fit!
# predict
y_hat = predict(mach, X)
labels = predict_mode(mach, X)
See also KNNClassifier
Kernels
NearestNeighborModels.KNNKernel
— Type
KNNKernel
Abstract supertype for all weighting kernels.
NearestNeighborModels.list_kernels
— Function
list_kernels()
Lists all implemented KNN weighting kernels.
NearestNeighborModels.DualU
— Type
DualU()
Assigns the closest neighbor a weight of 1 and the furthest neighbor a weight of 0; the weights of the others are scaled in between by a mapping.
For more information see the paper by Geler et al., Comparison of different weighting schemes for the kNN classifier on time-series data.
see also: DualD
NearestNeighborModels.DualD
— Type
DualD()
Assigns the closest neighbor a weight of 1 and the furthest neighbor a weight of 0; the weights of the others are scaled in between by a mapping.
For more information see the paper by Geler et al., Comparison of different weighting schemes for the kNN classifier on time-series data.
see also: DualU
NearestNeighborModels.Dudani
— Type
Dudani()
Assigns the closest neighbor a weight of 1 and the furthest neighbor a weight of 0; the weights of the others are scaled in between by a linear mapping.
For more information see the paper by Geler et al., Comparison of different weighting schemes for the kNN classifier on time-series data.
NearestNeighborModels.Fibonacci
— Type
Fibonacci()
Assigns neighbors weights corresponding to Fibonacci numbers, starting from the furthest neighbor: i.e. the furthest neighbor gets a weight of 1, the second furthest a weight of 1, the third furthest a weight of 2, and so on.
For more information see the paper by Geler et al., Comparison of different weighting schemes for the kNN classifier on time-series data.
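The scheme described above can be sketched as follows (illustrative only, not the package's code):

```julia
# Fibonacci weighting for k neighbors: build the Fibonacci sequence from the
# furthest neighbor, then reverse so index 1 corresponds to the closest one.
function fibonacci_weights(k::Int)
    fib = ones(Int, k)
    for i in 3:k
        fib[i] = fib[i-1] + fib[i-2]
    end
    return reverse(fib)  # index 1 = closest neighbor, which gets the largest weight
end

fibonacci_weights(5)
```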
NearestNeighborModels.Inverse
— Type
Inverse()
Assigns each neighbor a weight equal to the inverse of the corresponding distance of that neighbor.
For more information see the paper by Geler et al., Comparison of different weighting schemes for the kNN classifier on time-series data.
see also: ISquared
NearestNeighborModels.ISquared
— Type
ISquared()
Assigns each neighbor a weight equal to the inverse of the corresponding squared distance of that neighbor.
For more information see the paper by Geler et al., Comparison of different weighting schemes for the kNN classifier on time-series data.
NearestNeighborModels.Macleod
— Type
Macleod(;a::Real= 0.0)
Assigns the closest neighbor a weight of 1 and the furthest neighbor a weight of 0; the weights of the others are scaled in between by a linear mapping.
For more information see the paper by Geler et al., Comparison of different weighting schemes for the kNN classifier on time-series data.
NearestNeighborModels.Rank
— Type
Rank()
Assigns each neighbor a weight equal to its rank, such that the closest neighbor gets a weight of 1 and the Kth closest neighbor gets a weight of K.
For more information see the paper by Geler et al., Comparison of different weighting schemes for the kNN classifier on time-series data.
see also: ReciprocalRank
NearestNeighborModels.ReciprocalRank
— Type
ReciprocalRank(;a::Real= 0.0)
Assigns each neighbor a weight equal to the reciprocal of its rank: i.e. the closest neighbor gets a weight of 1 and the Kth closest neighbor gets a weight of 1/K.
For more information see the paper by Geler et al., Comparison of different weighting schemes for the kNN classifier on time-series data.
see also: Rank
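The reciprocal-rank scheme is easy to sketch for neighbors ordered closest-first (illustrative only, not the package's code):

```julia
# Reciprocal-rank weighting: the i-th closest neighbor gets weight 1/i.
reciprocal_rank_weights(k::Int) = 1 ./ collect(1:k)

reciprocal_rank_weights(4)  # weights for the 4 nearest neighbors
```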
NearestNeighborModels.UDK
— Type
UDK
Alias for UserDefinedKernel
NearestNeighborModels.Uniform
— Type
Uniform()
Assigns each of the k-nearest neighbors an equal weight.
NearestNeighborModels.UserDefinedKernel
— Type
UserDefinedKernel(;func::Function = x->nothing, sort::Bool=false)
Wrap a user-defined nearest neighbors weighting function func as a KNNKernel.
Keywords
- func: user-defined nearest neighbors weighting function. The function should have the signature func(dists_matrix)::Union{Nothing, <:AbstractMatrix}. The dists_matrix is an n by K nearest neighbors distances matrix, where n is the number of samples in the test dataset and K is the number of neighbors. func should output either nothing or an AbstractMatrix of the same shape as dists_matrix. If func(dists_matrix) returns nothing, then all k-nearest neighbors in each row are assigned equal weights.
- sort: if true, requests that the dists_matrix be sorted before being passed to func. The sort is done in a manner that puts the k-nearest neighbors in each row of dists_matrix in ascending order.
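A minimal sketch of a function matching this signature; the inverse-square weighting here is just an example, and eps() is added to guard against zero distances:

```julia
# A user-defined weighting function: takes the n-by-K distances matrix and
# returns a weights matrix of the same shape (or nothing for uniform weights).
inverse_square(dists_matrix::AbstractMatrix) = 1 ./ (dists_matrix .^ 2 .+ eps())

dists = [1.0 2.0; 0.5 4.0]  # 2 test points, K = 2 neighbors each
w = inverse_square(dists)
size(w) == size(dists)      # the shape must be preserved
```

Such a function could then be wrapped as, e.g., UserDefinedKernel(func = inverse_square, sort = true).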
NearestNeighborModels.Zavreal
— Type
Zavreal(;s::Real = 0.0, a::Real=1.0)
Assigns each neighbor an exponential weight given by $e^{-α ⋅ d_i^{β}}$, where α and β are constants and dᵢ is the distance of the given neighbor.
For more information see the paper by Geler et al., Comparison of different weighting schemes for the kNN classifier on time-series data.
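The weight formula can be evaluated directly (a sketch; the Greek keyword names follow the formula, not the constructor's keyword arguments):

```julia
# Exponential distance weighting: w(d) = exp(-α * d^β).
zavreal_weight(d; α = 1.0, β = 1.0) = exp(-α * d^β)

zavreal_weight.([0.0, 1.0, 2.0])  # weights decay with distance
```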