MultitargetKNNRegressor
A model type for constructing a multitarget K-nearest neighbor regressor, based on NearestNeighborModels.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
MultitargetKNNRegressor = @load MultitargetKNNRegressor pkg=NearestNeighborModels
Do model = MultitargetKNNRegressor()
to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in MultitargetKNNRegressor(K=...).
Multi-target K-Nearest Neighbors regressor (MultitargetKNNRegressor) is a variation of KNNRegressor
that assumes the target variable is vector-valued with Continuous
components. (Target data must be presented as a table, however.)
Training data
In MLJ or MLJBase, bind an instance model
to data with
mach = machine(model, X, y)
OR
mach = machine(model, X, y, w)
Here:
X is any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X).
y is the target, which can be any table of responses whose element scitype is Continuous; check column scitypes with schema(y).
w is the observation weights, which can be either nothing (default) or an AbstractVector whose element scitype is Count or Continuous. This is different from the weights kernel, which is a hyper-parameter of the model; see below.
Train the machine using fit!(mach, rows=...).
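The weighted form of training can be sketched as follows. This is a minimal illustration, assuming MLJ and NearestNeighborModels are installed; the data and weight values are synthetic and chosen only for demonstration.

```julia
using MLJ

MultitargetKNNRegressor = @load MultitargetKNNRegressor pkg=NearestNeighborModels

## Synthetic data: a table X of Continuous features and a table y with two
## Continuous target columns.
X, y = make_regression(100, 3, n_targets=2)

## Observation weights: an AbstractVector with Continuous element scitype.
w = rand(100)

model = MultitargetKNNRegressor()
mach = machine(model, X, y, w)   # bind model, data, and weights
fit!(mach)
```

Omitting the third argument (or passing nothing) trains without observation weights.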
Hyper-parameters
K::Int = 5: number of neighbors
algorithm::Symbol = :kdtree: one of (:kdtree, :brutetree, :balltree)
metric::Metric = Euclidean(): any Metric from Distances.jl for the distance between points. For algorithm = :kdtree, only metrics which are instances of Union{Distances.Chebyshev, Distances.Cityblock, Distances.Euclidean, Distances.Minkowski, Distances.WeightedCityblock, Distances.WeightedEuclidean, Distances.WeightedMinkowski} are supported.
leafsize::Int = 10: determines the number of points at which to stop splitting the tree. This option is ignored and always taken as 0 for algorithm = :brutetree, since brutetree isn't actually a tree.
reorder::Bool = true: if true, then points which are close in distance are placed close in memory. In this case, a copy of the original data will be made so that the original data is left unmodified. Setting this to true can significantly improve performance of the specified algorithm (except :brutetree). This option is ignored and always taken as false for algorithm = :brutetree.
weights::KNNKernel = Uniform(): kernel used in assigning weights to the k-nearest neighbors for each observation. An instance of one of the types in list_kernels(). User-defined weighting functions can be passed by wrapping the function in a UserDefinedKernel kernel (do ?NearestNeighborModels.UserDefinedKernel for more info). If observation weights w are passed during machine construction, then the weight assigned to each neighbor vote is the product of the kernel-generated weight for that neighbor and the corresponding observation weight.
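The hyper-parameters above can be combined freely at construction time. The particular settings below (a city-block metric, a ball tree, and a distance-weighted Inverse kernel) are illustrative choices, not recommendations:

```julia
using MLJ
import Distances

MultitargetKNNRegressor = @load MultitargetKNNRegressor pkg=NearestNeighborModels

model = MultitargetKNNRegressor(
    K = 3,                                      # use the 3 nearest neighbors
    algorithm = :balltree,                      # ball tree search structure
    metric = Distances.Cityblock(),             # city-block (L1) distance
    weights = NearestNeighborModels.Inverse(),  # weight neighbors by inverse distance
)
```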
Operations
predict(mach, Xnew): Return predictions of the target given features Xnew, which should have the same scitype as X above.
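A minimal sketch of the predict operation, assuming a machine trained on synthetic data as above; note that the returned predictions form a table with one column per target:

```julia
using MLJ

MultitargetKNNRegressor = @load MultitargetKNNRegressor pkg=NearestNeighborModels
X, y = make_regression(50, 3, n_targets=2)
mach = machine(MultitargetKNNRegressor(), X, y) |> fit!

## New data must have the same scitype as X above.
Xnew, _ = make_regression(5, 3, n_targets=2)
y_hat = predict(mach, Xnew)   # a table with one column per target
```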
Fitted parameters
The fields of fitted_params(mach)
are:
tree: An instance of either KDTree, BruteTree or BallTree, depending on the value of the algorithm hyper-parameter (see the hyper-parameters section above). These are data structures that store the training data in a form that makes nearest-neighbor searches on test data points faster.
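Accessing the fitted search structure can be sketched as below; with the default algorithm = :kdtree, the tree field holds a KDTree:

```julia
using MLJ

MultitargetKNNRegressor = @load MultitargetKNNRegressor pkg=NearestNeighborModels
X, y = make_regression(50, 3, n_targets=2)
mach = machine(MultitargetKNNRegressor(), X, y) |> fit!

## The search structure built during training (a KDTree by default).
tree = fitted_params(mach).tree
```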
Examples
using MLJ
## Create Data
X, y = make_regression(10, 5, n_targets=2)
## load MultitargetKNNRegressor
MultitargetKNNRegressor = @load MultitargetKNNRegressor pkg=NearestNeighborModels
## view possible kernels
NearestNeighborModels.list_kernels()
## MultitargetKNNRegressor instantiation
model = MultitargetKNNRegressor(weights = NearestNeighborModels.Inverse())
## Wrap model and required data in an MLJ machine and fit.
mach = machine(model, X, y) |> fit!
## Predict
y_hat = predict(mach, X)
See also KNNRegressor