PCA

A model type for constructing a PCA (principal component analysis) model, based on MultivariateStats.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
```julia
PCA = @load PCA pkg=MultivariateStats
```

Do `model = PCA()` to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in `PCA(maxoutdim=...)`.
Principal component analysis learns a linear projection onto a lower dimensional space while preserving most of the initial variance seen in the training data.
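The idea can be sketched in a few lines of plain Julia (stdlib only; this illustrates the mathematics, not MLJ's or MultivariateStats.jl's actual implementation): center the data, take the SVD, and project onto the leading right-singular vectors.

```julia
using LinearAlgebra, Statistics

X = [1.0 2.0; 2.0 4.1; 3.0 5.9; 4.0 8.2]  # 4 observations, 2 features
Xc = X .- mean(X, dims=1)                 # center each column

F = svd(Xc)
P = F.V[:, 1:1]                           # basis for a 1-dimensional subspace
Z = Xc * P                                # lower dimensional representation

# fraction of the total variance captured by the first component
ratio = F.S[1]^2 / sum(abs2, F.S)
```

Because the two features here are nearly collinear, a single component captures almost all of the variance, so `ratio` is close to 1.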
Training data
In MLJ or MLJBase, bind an instance model to data with
```julia
mach = machine(model, X)
```

Here:

- `X` is any table of input features (eg, a `DataFrame`) whose columns are of scitype `Continuous`; check column scitypes with `schema(X)`.
Train the machine using `fit!(mach, rows=...)`.
Hyper-parameters
- `maxoutdim=0`: Together with `variance_ratio`, controls the output dimension `outdim` chosen by the model. Specifically, suppose that `k` is the smallest integer such that retaining the `k` most significant principal components accounts for `variance_ratio` of the total variance in the training data. Then `outdim = min(k, maxoutdim)`. If `maxoutdim=0` (default) then the effective `maxoutdim` is `min(n, indim - 1)`, where `n` is the number of observations and `indim` the number of features in the training data.

- `variance_ratio::Float64=0.99`: The ratio of variance preserved after the transformation.

- `method=:auto`: The method to use to solve the problem. Choices are:

  - `:svd`: Singular Value Decomposition of the matrix.

  - `:cov`: Covariance matrix decomposition.

  - `:auto`: Use `:cov` if the matrix's first dimension is smaller than its second dimension, and otherwise use `:svd`.
- `mean=nothing`: if `nothing`, centering will be computed and applied; if set to `0`, no centering is performed (data is assumed pre-centered); if a vector is passed, the centering is done with that vector.
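The interplay between `variance_ratio` and `maxoutdim` can be sketched in plain Julia (an illustration only, not MLJ's internal code; the per-component variances below are made up):

```julia
# Hypothetical per-component variances (eigenvalues of the covariance
# matrix), sorted in decreasing order:
pvars = [4.0, 3.0, 2.0, 0.5, 0.5]
variance_ratio = 0.99
maxoutdim = 3

# k = smallest number of leading components whose cumulative variance
# reaches `variance_ratio` of the total
cum = cumsum(pvars) ./ sum(pvars)   # [0.4, 0.7, 0.9, 0.95, 1.0]
k = findfirst(>=(variance_ratio), cum)

# the model then caps this at maxoutdim
outdim = min(k, maxoutdim)
```

With these numbers, all five components are needed to reach 99% of the variance (`k == 5`), but `maxoutdim=3` caps the output dimension at 3.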
Operations
- `transform(mach, Xnew)`: Return a lower dimensional projection of the input `Xnew`, which should have the same scitype as `X` above.

- `inverse_transform(mach, Xsmall)`: For a dimension-reduced table `Xsmall`, such as returned by `transform`, reconstruct a table, having the same number of columns as the original training data `X`, that transforms to `Xsmall`. Mathematically, `inverse_transform` is a right-inverse for the PCA projection map, whose image is orthogonal to the kernel of that map. In particular, if `Xsmall = transform(mach, Xnew)`, then `inverse_transform(Xsmall)` is only an approximation to `Xnew`.
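The right-inverse property can be seen directly in plain linear algebra (a sketch, not MLJ's API): projecting onto `outdim < indim` components discards the variance orthogonal to the retained subspace, so reconstruction is only approximate, yet re-projecting the reconstruction recovers the reduced representation exactly.

```julia
using LinearAlgebra, Statistics

X = [1.0 2.0 0.1; 2.0 4.0 -0.2; 3.0 6.1 0.3; 4.0 7.9 -0.1]
μ = mean(X, dims=1)
Xc = X .- μ

P = svd(Xc).V[:, 1:2]       # keep 2 of the 3 components

Z = Xc * P                  # analogue of transform
Xrec = Z * P' .+ μ          # analogue of inverse_transform: approximates X

# right-inverse: transforming the reconstruction recovers Z exactly,
# since P'P = I for an orthonormal basis P
Zagain = (Xrec .- μ) * P
```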
Fitted parameters
The fields of fitted_params(mach) are:
- `projection`: Returns the projection matrix, which has size `(indim, outdim)`, where `indim` and `outdim` are the number of features of the input and output respectively.
Report
The fields of report(mach) are:
- `indim`: Dimension (number of columns) of the training data and new data to be transformed.
- `outdim = min(n, indim, maxoutdim)` is the output dimension; here `n` is the number of observations.
- `tprincipalvar`: Total variance of the principal components.
- `tresidualvar`: Total residual variance.
- `tvar`: Total observation variance (principal + residual variance).
- `mean`: The mean of the untransformed training data, of length `indim`.
- `principalvars`: The variance of the principal components. An AbstractVector of length `outdim`.
- `loadings`: The model's loadings, weights for each variable used when calculating principal components. A matrix of size (`indim`, `outdim`), where `indim` and `outdim` are as defined above.
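How the variance-related fields relate to one another can be sketched from the eigenvalues of the covariance matrix (plain Julia; the variable names mirror the report fields above but this is not MLJ's code):

```julia
using LinearAlgebra, Statistics

X = [1.0 2.0 0.5; 2.0 3.9 1.1; 3.0 6.2 1.4; 4.0 8.1 2.2; 5.0 9.8 2.4]
Xc = X .- mean(X, dims=1)

# per-component variances = eigenvalues of the covariance, decreasing
λ = reverse(eigvals(Symmetric(cov(Xc))))
outdim = 2

principalvars = λ[1:outdim]        # variance of each retained component
tprincipalvar = sum(principalvars) # variance retained
tvar = sum(λ)                      # total observation variance
tresidualvar = tvar - tprincipalvar # variance discarded by the projection
```

Note that `tvar` equals the sum of the per-column sample variances (the trace of the covariance matrix), so `tprincipalvar + tresidualvar` always accounts for all of it.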
Examples
```julia
using MLJ

PCA = @load PCA pkg=MultivariateStats

X, y = @load_iris ## a table and a vector
model = PCA(maxoutdim=2)
mach = machine(model, X) |> fit!

Xproj = transform(mach, X)
```

See also KernelPCA, ICA, FactorAnalysis, PPCA