PPCA
A model type for constructing a probabilistic PCA model, based on MultivariateStats.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
PPCA = @load PPCA pkg=MultivariateStats
Do model = PPCA() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in PPCA(maxoutdim=...).
Probabilistic principal component analysis is a dimension-reduction algorithm which represents a constrained form of the Gaussian distribution, in which the number of free parameters can be restricted while still allowing the model to capture the dominant correlations in a data set. It is expressed as the maximum likelihood solution of a probabilistic latent variable model. For details, see Bishop, C. M. (2006): Pattern Recognition and Machine Learning.
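As a reminder of what the latent variable model looks like, here is a sketch in the notation of Bishop (2006). The symbols are assumptions introduced for illustration and are not part of the MLJ interface: $D$ is the input dimension (indim), $M$ the latent dimension (outdim), $W$ a $D \times M$ matrix, $\mu$ the mean and $\sigma^2$ the isotropic noise variance.

```latex
\begin{align}
  z &\sim \mathcal{N}(0,\, I_M), \\
  x \mid z &\sim \mathcal{N}(W z + \mu,\; \sigma^2 I_D), \\
  x &\sim \mathcal{N}(\mu,\; W W^\top + \sigma^2 I_D).
\end{align}
% Free parameters in the covariance W W^T + sigma^2 I_D,
% after discounting rotations of the latent space:
%   D M + 1 - M (M - 1) / 2
% compared with D (D + 1) / 2 for an unconstrained covariance matrix.
```

Choosing $M < D$ is what restricts the number of free parameters while the columns of $W$ still capture the dominant correlations.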
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X)
Here:
X is any table of input features (e.g., a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X).
Train the machine using fit!(mach, rows=...).
Hyper-parameters
maxoutdim=0: Controls the dimension (number of columns) of the output, outdim. Specifically, outdim = min(n, indim, maxoutdim), where n is the number of observations and indim the input dimension.
method::Symbol=:ml: The method to use to solve the problem, one of :ml, :em, :bayes.
maxiter::Int=1000: The maximum number of iterations.
tol::Real=1e-6: The convergence tolerance.
mean::Union{Nothing, Real, Vector{Float64}}=nothing: If nothing, centering will be computed and applied; if set to 0, no centering is applied (data is assumed pre-centered); if a vector, the centering is done with that vector.
Operations
transform(mach, Xnew): Return a lower dimensional projection of the input Xnew, which should have the same scitype as X above.
inverse_transform(mach, Xsmall): For a dimension-reduced table Xsmall, such as returned by transform, reconstruct a table, having the same number of columns as the original training data X, that transforms to Xsmall. Mathematically, inverse_transform is a right-inverse for the PCA projection map, whose image is orthogonal to the kernel of that map. In particular, if Xsmall = transform(mach, Xnew), then inverse_transform(mach, Xsmall) is only an approximation to Xnew.
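The round trip described above can be sketched as follows. This is a minimal illustration on the iris data, assuming MLJ and MLJMultivariateStatsInterface are installed; the variable names (Xsmall, Xapprox, max_err) are chosen here for illustration only.

```julia
using MLJ

# Load the model type (assumes MLJMultivariateStatsInterface is installed)
PPCA = @load PPCA pkg=MultivariateStats verbosity=0

X, _ = @load_iris                      # a table with 4 Continuous columns
mach = machine(PPCA(maxoutdim=2), X)
fit!(mach, verbosity=0)

Xsmall  = transform(mach, X)                # reduced table
Xapprox = inverse_transform(mach, Xsmall)   # reconstruction in the original space

# Number of columns before and after reconstruction
ncols_small  = length(schema(Xsmall).names)
ncols_approx = length(schema(Xapprox).names)

# Largest absolute reconstruction error; generally non-zero,
# because the projection discards information
max_err = maximum(abs.(MLJ.matrix(X) .- MLJ.matrix(Xapprox)))
```

The reconstruction has as many columns as X, but it only approximates X, which is why max_err is worth inspecting before relying on inverse_transform.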
Fitted parameters
The fields of fitted_params(mach) are:
projection: Returns the projection matrix, which has size (indim, outdim), where indim and outdim are the number of features of the input and output respectively. Each column of the projection matrix corresponds to a principal component.
Report
The fields of report(mach) are:
indim: Dimension (number of columns) of the training data and new data to be transformed.
outdim: Dimension of transformed data.
tvat: The variance of the components.
loadings: The model's loadings matrix. A matrix of size (indim, outdim) where indim and outdim are as defined above.
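To check the shapes listed above on a concrete machine, something like the following should work. It is a sketch on the iris data, assuming MLJ and MLJMultivariateStatsInterface are installed; only the projection, indim, outdim and loadings fields documented above are accessed, and the names P and r are illustrative.

```julia
using MLJ

# Load the model type (assumes MLJMultivariateStatsInterface is installed)
PPCA = @load PPCA pkg=MultivariateStats verbosity=0

X, _ = @load_iris
mach = machine(PPCA(maxoutdim=2), X)
fit!(mach, verbosity=0)

P = fitted_params(mach).projection   # projection matrix
r = report(mach)                     # named tuple of report fields

proj_size     = size(P)              # expected to be (indim, outdim)
dims          = (r.indim, r.outdim)
loadings_size = size(r.loadings)     # documented as (indim, outdim)
```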
Examples
using MLJ
PPCA = @load PPCA pkg=MultivariateStats
X, y = @load_iris ## a table and a vector
model = PPCA(maxoutdim=2)
mach = machine(model, X) |> fit!
Xproj = transform(mach, X)
See also KernelPCA, ICA, FactorAnalysis, PCA