KernelPCA
A model type for constructing a kernel principal component analysis model, based on MultivariateStats.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using

```julia
KernelPCA = @load KernelPCA pkg=MultivariateStats
```

Do model = KernelPCA() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in KernelPCA(maxoutdim=...).
In kernel PCA the linear operations of ordinary principal component analysis are performed in a reproducing kernel Hilbert space.
Training data
In MLJ or MLJBase, bind an instance model to data with
```julia
mach = machine(model, X)
```

Here:

- X is any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X).
Train the machine using fit!(mach, rows=...).
Hyper-parameters
- maxoutdim=0: Controls the dimension (number of columns) of the output, outdim. Specifically, outdim = min(n, indim, maxoutdim), where n is the number of observations and indim the input dimension.
- kernel::Function=(x,y)->x'y: The kernel function, which takes two vector arguments x and y and returns a scalar value. Defaults to the dot product of x and y.
- solver::Symbol=:eig: solver to use for the eigenvalues; one of :eig (default, uses LinearAlgebra.eigen) or :eigs (uses Arpack.eigs).
- inverse::Bool=true: perform calculations needed for inverse transform.
- beta::Real=1.0: strength of the ridge regression that learns the inverse transform when inverse is true.
- tol::Real=0.0: Convergence tolerance for the eigenvalue solver.
- maxiter::Int=300: maximum number of iterations for the eigenvalue solver.
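For illustration, any function of two vectors returning a scalar may be supplied as the kernel, provided it defines a valid (positive semi-definite) kernel. Here is a sketch using a degree-2 polynomial kernel; the helper name poly_kernel is ours, not part of the API:

```julia
using MLJ

KernelPCA = @load KernelPCA pkg=MultivariateStats

## degree-2 polynomial kernel; the constant `c` weights lower-order terms:
poly_kernel(c) = (x, y) -> (x'y + c)^2

## construct a model using the custom kernel and the default eigensolver:
model = KernelPCA(maxoutdim=3, kernel=poly_kernel(1.0), solver=:eig)
```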
Operations
- transform(mach, Xnew): Return a lower dimensional projection of the input Xnew, which should have the same scitype as X above.
- inverse_transform(mach, Xsmall): For a dimension-reduced table Xsmall, such as returned by transform, reconstruct a table, having the same number of columns as the original training data X, that transforms to Xsmall. Mathematically, inverse_transform is a right-inverse for the PCA projection map, whose image is orthogonal to the kernel of that map. In particular, if Xsmall = transform(mach, Xnew), then inverse_transform(Xsmall) is only an approximation to Xnew.
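Assuming inverse=true (the default), the round trip through these two operations can be sketched as follows; as noted above, the reconstruction is only approximate:

```julia
using MLJ

KernelPCA = @load KernelPCA pkg=MultivariateStats

X, _ = @load_iris  ## 4-feature table
mach = machine(KernelPCA(maxoutdim=2), X) |> fit!

Xsmall = transform(mach, X)                ## table with 2 columns
Xapprox = inverse_transform(mach, Xsmall)  ## 4-column table approximating X
```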
Fitted parameters
The fields of fitted_params(mach) are:
- projection: Returns the projection matrix, which has size (indim, outdim), where indim and outdim are the number of features of the input and output respectively.
Report
The fields of report(mach) are:
- indim: Dimension (number of columns) of the training data and new data to be transformed.
- outdim: Dimension of transformed data.
- principalvars: The variance of the principal components.
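As a sketch, the fitted parameters and report of a trained machine might be inspected like this (the commented values assume the 4-feature iris data reduced to 2 dimensions):

```julia
using MLJ

KernelPCA = @load KernelPCA pkg=MultivariateStats

X, _ = @load_iris
mach = machine(KernelPCA(maxoutdim=2), X) |> fit!

P = fitted_params(mach).projection  ## the learned projection matrix

r = report(mach)
r.indim          ## 4 (columns of the training data)
r.outdim         ## 2 (columns of the transformed data)
r.principalvars  ## variances of the 2 principal components
```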
Examples
```julia
using MLJ
using LinearAlgebra

KernelPCA = @load KernelPCA pkg=MultivariateStats

X, y = @load_iris ## a table and a vector

## radial basis function (RBF) kernel:
function rbf_kernel(length_scale)
    return (x, y) -> exp(-norm(x - y)^2 / (2 * length_scale^2))
end

model = KernelPCA(maxoutdim=2, kernel=rbf_kernel(1))
mach = machine(model, X) |> fit!

Xproj = transform(mach, X)
```

See also PCA, ICA, FactorAnalysis, PPCA