# Machines
Recall from Getting Started that a machine binds a model (i.e., a choice of algorithm + hyperparameters) to data (see more at Constructing machines below). A machine is also the object storing learned parameters. Under the hood, calling `fit!` on a machine calls either `MLJBase.fit` or `MLJBase.update`, depending on the machine's internal state (as recorded in private fields `old_model` and `old_rows`). These lower-level `fit` and `update` methods, which are not ordinarily called directly by the user, dispatch on the model and a view of the data defined by the optional `rows` keyword argument of `fit!` (all rows by default).
## Warm restarts
If a model `update` method has been implemented for the model, calls to `fit!` will avoid redundant calculations for certain kinds of model mutations. The main use case is increasing an iteration parameter, such as the number of epochs in a neural network. To test whether `SomeIterativeModel` supports this feature, check that `iteration_parameter(SomeIterativeModel)` is different from `nothing`.
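For example, a quick sketch of this check (a bare decision tree is not iterative, while the ensemble wrapper used below is expected to advertise `:n` as its iteration parameter):

```julia
using MLJ

Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
iteration_parameter(Tree())                       # nothing: no warm restarts
iteration_parameter(EnsembleModel(model=Tree()))  # expected: :n
```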
```julia
tree = (@load DecisionTreeClassifier pkg=DecisionTree verbosity=0)()
forest = EnsembleModel(model=tree, n=10);
X, y = @load_iris;
mach = machine(forest, X, y)
fit!(mach, verbosity=2);
```

```
trained Machine; caches model-specific representations of data
  model: ProbabilisticEnsembleModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args:
    1:  Source @165 ⏎ Table{AbstractVector{Continuous}}
    2:  Source @701 ⏎ AbstractVector{Multiclass{3}}
```
Generally, changing a hyperparameter triggers retraining on subsequent calls to `fit!`:
```julia-repl
julia> forest.bagging_fraction = 0.5;

julia> fit!(mach, verbosity=2);
[ Info: Updating machine(ProbabilisticEnsembleModel(model = DecisionTreeClassifier(max_depth = -1, …), …), …).
[ Info: Truncating existing ensemble.
```
However, for this iterative model, increasing the iteration parameter only adds models to the existing ensemble:
```julia-repl
julia> forest.n = 15;

julia> fit!(mach, verbosity=2);
[ Info: Updating machine(ProbabilisticEnsembleModel(model = DecisionTreeClassifier(max_depth = -1, …), …), …).
[ Info: Building on existing ensemble of length 10
[ Info: One hash per new atom trained: #####
```
Call `fit!` again without making a change and no retraining occurs:
```julia-repl
julia> fit!(mach);
[ Info: Not retraining machine(ProbabilisticEnsembleModel(model = DecisionTreeClassifier(max_depth = -1, …), …), …). Use `force=true` to force.
```
However, retraining can be forced:
```julia-repl
julia> fit!(mach, force=true);
[ Info: Training machine(ProbabilisticEnsembleModel(model = DecisionTreeClassifier(max_depth = -1, …), …), …).
```
Retraining is also triggered if the view of the data changes:
```julia-repl
julia> fit!(mach, rows=1:100);
[ Info: Training machine(ProbabilisticEnsembleModel(model = DecisionTreeClassifier(max_depth = -1, …), …), …).

julia> fit!(mach, rows=1:100);
[ Info: Not retraining machine(ProbabilisticEnsembleModel(model = DecisionTreeClassifier(max_depth = -1, …), …), …). Use `force=true` to force.
```
If an iterative model exposes its iteration parameter as a hyperparameter, and it implements the warm restart behavior above, then it can be wrapped in a "control strategy", like an early stopping criterion. See Controlling Iterative Models for details.
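For instance, the forest above could be wrapped in MLJ's `IteratedModel` with an early-stopping control. A sketch only, continuing from the example above and assuming the ensemble's `n` is registered as its iteration parameter; the control settings are arbitrary:

```julia
using MLJ

iterated_forest = IteratedModel(
    model=forest,  # the ensemble from the example above
    resampling=Holdout(fraction_train=0.7),
    measure=log_loss,
    controls=[Step(5), Patience(3), NumberLimit(50)],
)
mach = machine(iterated_forest, X, y)
fit!(mach)  # adds atoms in steps of 5 until a stopping criterion fires
```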
## Inspecting machines
There are two principal methods for inspecting the outcomes of training in MLJ. To obtain a named tuple describing the learned parameters (in a user-friendly way where possible) use `fitted_params(mach)`. All other training-related outcomes are inspected with `report(mach)`.
```julia
X, y = @load_iris
pca = (@load PCA verbosity=0)()
mach = machine(pca, X)
fit!(mach)
```

```
trained Machine; caches model-specific representations of data
  model: PCA(maxoutdim = 0, …)
  args:
    1:  Source @549 ⏎ Table{AbstractVector{Continuous}}
```
```julia-repl
julia> fitted_params(mach)
(projection = [-0.36158967738145 0.6565398832858296 0.5809972798276162; 0.08226888989221415 0.7297123713264985 -0.5964180879380994; -0.8565721052905275 -0.175767403428653 -0.07252407548695988; -0.3588439262482158 -0.07470647013503479 -0.5490609107266099],)

julia> report(mach)
(indim = 4, outdim = 3, tprincipalvar = 4.545608248041779, tresidualvar = 0.023683027126000233, tvar = 4.569291275167779, mean = [5.843333333333334, 3.0540000000000003, 3.758666666666667, 1.198666666666667], principalvars = [4.224840768320109, 0.24224357162751498, 0.0785239080941545], loadings = [-0.7432265175592332 0.3231374133069471 0.16280774164399525; 0.16909891062391016 0.3591516283038468 -0.16712897864451629; -1.7606340630732822 -0.0865096325959021 -0.02032278180089568; -0.73758278605778 -0.03676921407410996 -0.15385849470227703],)
```
`MLJModelInterface.fitted_params` — Method

```julia
fitted_params(mach)
```

Return the learned parameters for a machine `mach` that has been `fit!`, for example the coefficients in a linear model.

This is a named tuple and human-readable if possible.

If `mach` is a machine for a composite model, such as a model constructed using the pipeline syntax `model1 |> model2 |> ...`, then the returned named tuple has the composite type's field names as keys. The corresponding value is the fitted parameters for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)
```julia-repl
julia> using MLJ

julia> @load LogisticClassifier pkg=MLJLinearModels

julia> X, y = @load_crabs;

julia> pipe = Standardizer() |> LogisticClassifier();

julia> mach = machine(pipe, X, y) |> fit!;

julia> fitted_params(mach).logistic_classifier
(classes = CategoricalArrays.CategoricalValue{String,UInt32}["B", "O"],
 coefs = Pair{Symbol,Float64}[:FL => 3.7095037897680405, :RW => 0.1135739140854546, :CL => -1.6036892745322038, :CW => -4.415667573486482, :BD => 3.238476051092471],
 intercept = 0.0883301599726305,)
```
See also `report`.
`MLJBase.report` — Method

```julia
report(mach)
```

Return the report for a machine `mach` that has been `fit!`, for example the coefficients in a linear model.

This is a named tuple and human-readable if possible.

If `mach` is a machine for a composite model, such as a model constructed using the pipeline syntax `model1 |> model2 |> ...`, then the returned named tuple has the composite type's field names as keys. The corresponding value is the report for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)
```julia-repl
julia> using MLJ

julia> @load LinearBinaryClassifier pkg=GLM

julia> X, y = @load_crabs;

julia> pipe = Standardizer() |> LinearBinaryClassifier();

julia> mach = machine(pipe, X, y) |> fit!;

julia> report(mach).linear_binary_classifier
(deviance = 3.8893386087844543e-7,
 dof_residual = 195.0,
 stderror = [18954.83496713119, 6502.845740757159, 48484.240246060406, 34971.131004997274, 20654.82322484894, 2111.1294584763386],
 vcov = [3.592857686311793e8 9.122732393971942e6 … -8.454645589364915e7 5.38856837634321e6; 9.122732393971942e6 4.228700272808351e7 … -4.978433790526467e7 -8.442545425533723e6; … ; -8.454645589364915e7 -4.978433790526467e7 … 4.2662172244975924e8 2.1799125705781363e7; 5.38856837634321e6 -8.442545425533723e6 … 2.1799125705781363e7 4.456867590446599e6],)
```
See also `fitted_params`.
## Training losses and feature importances
Training losses and feature importances, if reported by a model, will be available in the machine's report (see above). However, there are also direct access methods where supported:
```julia
training_losses(mach::Machine) -> vector_of_losses
```

Here `vector_of_losses` will be in historical order (most recent loss last). This kind of access is supported for `model = mach.model` if `supports_training_losses(model) == true`.
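A guarded access pattern might look like the following sketch (the choice of `EvoTreeRegressor` is illustrative only; the guard decides whether losses are actually available):

```julia
using MLJ

Booster = @load EvoTreeRegressor pkg=EvoTrees verbosity=0
model = Booster()
X, y = make_regression(100, 3)
mach = fit!(machine(model, X, y), verbosity=0)

# only models with supports_training_losses(model) == true provide this:
if supports_training_losses(model)
    training_losses(mach)  # historical order, most recent loss last
end
```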
```julia
feature_importances(mach::Machine) -> vector_of_pairs
```

Here `vector_of_pairs` is a vector of elements of the form `feature => importance_value`, where `feature` is a symbol. For example, `vector_of_pairs = [:gender => 0.23, :height => 0.7, :weight => 0.1]`. If a model does not support feature importances for some model hyperparameters, every `importance_value` will be zero. This kind of access is supported for `model = mach.model` if `reports_feature_importances(model) == true`.
If a model can report multiple types of feature importances, then there will be a model hyper-parameter controlling the active type.
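For example, a minimal sketch (assuming `DecisionTreeClassifier` reports impurity-based importances; the values in the comment are placeholders):

```julia
using MLJ

Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
tree = Tree()
X, y = @load_iris

if reports_feature_importances(tree)
    mach = fit!(machine(tree, X, y), verbosity=0)
    feature_importances(mach)  # e.g. [:petal_length => 0.9, :petal_width => 0.08, ...]
end
```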
## Constructing machines
A machine is constructed with the syntax `machine(model, args...)` where the possibilities for `args` (called training arguments) are summarized in the table below. Here `X` and `y` represent inputs and target, respectively, and `Xout` is the output of a `transform` call. Machines for supervised models may have additional training arguments, such as a vector of per-observation weights (in which case `supports_weights(model) == true`).
| model supertype | machine constructor calls | operation calls (first compulsory) |
|---|---|---|
| `Deterministic <: Supervised` | `machine(model, X, y, extras...)` | `predict(mach, Xnew)`, `transform(mach, Xnew)`, `inverse_transform(mach, Xout)` |
| `Probabilistic <: Supervised` | `machine(model, X, y, extras...)` | `predict(mach, Xnew)`, `predict_mean(mach, Xnew)`, `predict_median(mach, Xnew)`, `predict_mode(mach, Xnew)`, `transform(mach, Xnew)`, `inverse_transform(mach, Xout)` |
| `Unsupervised` (except `Static`) | `machine(model, X)` | `transform(mach, Xnew)`, `inverse_transform(mach, Xout)`, `predict(mach, Xnew)` |
| `Static` | `machine(model)` | `transform(mach, Xnews...)`, `inverse_transform(mach, Xout)` |
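For example, a supervised machine with per-observation weights as an extra training argument might be constructed as in this sketch (`ConstantClassifier` is a stand-in, the weights are made up, and the guard checks the model actually supports them):

```julia
using MLJ

X, y = @load_iris
w = rand(length(y))  # made-up per-observation weights

model = ConstantClassifier()
if supports_weights(model)
    mach = machine(model, X, y, w)  # weights passed as an extra argument
    fit!(mach, verbosity=0)
end
```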
All operations on machines (`predict`, `transform`, etc.) have exactly one argument (`Xnew` or `Xout` above) after `mach`, the machine instance. An exception is a machine bound to a `Static` model, which can have any number of arguments after `mach`. For more on `Static` transformers (which have no training arguments) see Static transformers.
A machine is reconstructed from a file using the syntax `machine("my_machine.jlso")`, or `machine("my_machine.jlso", args...)` if retraining using new data. See Saving machines below.
### Lowering memory demands
For large data sets, you may be able to save memory by suppressing data caching that some models perform to increase speed. To do this, specify `cache=false`, as in

```julia
machine(model, X, y, cache=false)
```
### Constructing machines in learning networks
Instead of data `X`, `y`, etc., the `machine` constructor is provided `Node` or `Source` objects ("dynamic data") when building a learning network. See Learning Networks for more on this advanced feature.
## Saving machines
Users can save and restore MLJ machines using any external serialization package by suitably preparing their `Machine` object and applying a post-processing step to the deserialized object. This is explained under Using an arbitrary serializer below. However, if a user is happy to use Julia's standard library Serialization module, there is a simplified workflow, described first.

The usual serialization provisos apply. For example, when deserializing, all code on which the serialized object depends must also be loaded. If a hyperparameter happens to be a user-defined function, then that function must be defined at deserialization time. And you should only deserialize objects from trusted sources.
### Using Julia's native serializer
`MLJModelInterface.save` — Function

```julia
MLJ.save(filename, mach::Machine)
MLJ.save(io, mach::Machine)
MLJBase.save(filename, mach::Machine)
MLJBase.save(io, mach::Machine)
```

Serialize the machine `mach` to a file with path `filename`, or to an input/output stream `io` (at least `IOBuffer` instances are supported), using the Serialization module.

To serialize using a different format, see `serializable`.

Machines are deserialized using the `machine` constructor, as shown in the example below.

The implementation of `save` for machines changed in MLJ 0.18 (MLJBase 0.20). A machine saved with an older version of MLJ can only be restored using that older version.
**Example**

```julia
using MLJ
Tree = @load DecisionTreeClassifier
X, y = @load_iris
mach = fit!(machine(Tree(), X, y))

MLJ.save("tree.jls", mach)
mach_predict_only = machine("tree.jls")
predict(mach_predict_only, X)

# using a buffer:
io = IOBuffer()
MLJ.save(io, mach)
seekstart(io)
predict_only_mach = machine(io)
predict(predict_only_mach, X)
```
Maliciously constructed JLS files, like pickles and most other general-purpose serialization formats, can allow arbitrary code execution during loading. This means it is possible for someone to use a JLS file that looks like a serialized MLJ machine as a Trojan horse.
See also `serializable`, `machine`.
```julia
MLJ.save(mach)
MLJBase.save(mach)
```

Save the current machine as an artifact at the location associated with `default_logger`.
### Using an arbitrary serializer
Since machines contain training data, serializing a machine directly is not recommended. Also, the learned parameters of models implemented in a language other than Julia may not have persistent representations, which means serializing them is useless. To address these two issues, users:
1. Call `serializable(mach)` on the machine `mach` they wish to save (to remove data and create persistent learned parameters).
2. Serialize the returned object using `SomeSerializationPkg`.

To restore the original machine (minus training data) they:

1. Deserialize using `SomeSerializationPkg` to obtain a new object `mach`.
2. Call `restore!(mach)` to ensure `mach` can be used to predict or transform new data.
`MLJBase.serializable` — Function

```julia
serializable(mach::Machine)
```

Returns a shallow copy of the machine to make it serializable. In particular, all training data is removed and, if necessary, learned parameters are replaced with persistent representations.

Any general-purpose Julia serializer may be applied to the output of `serializable` (e.g., JLSO, BSON, JLD), but you must call `restore!(mach)` on the deserialized object `mach` before using it. See the example below.

If using Julia's standard Serialization library, a shorter workflow is available using the `MLJBase.save` (or `MLJ.save`) method.

A machine returned by `serializable` is characterized by the property `mach.state == -1`.
**Example using JLSO**

```julia
using MLJ
using JLSO
Tree = @load DecisionTreeClassifier
tree = Tree()
X, y = @load_iris
mach = fit!(machine(tree, X, y))

# This machine can now be serialized:
smach = serializable(mach)
JLSO.save("machine.jlso", :machine => smach)

# Deserialize and restore learned parameters to usable form:
loaded_mach = JLSO.load("machine.jlso")[:machine]
restore!(loaded_mach)

predict(loaded_mach, X)
predict(mach, X)
```
See also `restore!`, `MLJBase.save`.
`MLJBase.restore!` — Function

```julia
restore!(mach::Machine)
```

Restore the state of a machine that is currently serializable, but which may not be otherwise usable. For such a machine, `mach`, one has `mach.state == -1`. Intended for restoring deserialized machine objects to a usable form.

For an example see `serializable`.
## Internals
For a supervised machine, the `predict` method calls a lower-level `MLJBase.predict` method, dispatched on the underlying model and the `fitresult` (see below). To see `predict` in action, as well as its unsupervised cousins `transform` and `inverse_transform`, see Getting Started.
A `Machine` instance has several fields which, except for `model`, the user should not directly access; these include:

- `model` - the struct containing the hyperparameters to be used in calls to `fit!`
- `fitresult` - the learned parameters in a raw form, initially undefined
- `args` - a tuple of the data, each element wrapped in a source node; see Learning Networks (in the supervised learning example above, `args = (source(X), source(y))`)
- `report` - outputs of training not encoded in `fitresult` (e.g., feature rankings), initially undefined
- `old_model` - a deep copy of the model used in the last call to `fit!`
- `old_rows` - a copy of the row indices used in the last call to `fit!`
- `cache`
The interested reader can learn more about machine internals by examining the simplified code excerpt in Internals.
## API Reference
`MLJBase.machine` — Function

```julia
machine(model, args...; cache=true, scitype_check_level=1)
```

Construct a `Machine` object binding a `model`, storing hyper-parameters of some machine learning algorithm, to some data, `args`. Calling `fit!` on a `Machine` instance `mach` stores outcomes of applying the algorithm in `mach`, which can be inspected using `fitted_params(mach)` (learned parameters) and `report(mach)` (other outcomes). This in turn enables generalization to new data using operations such as `predict` or `transform`:
```julia
using MLJModels
X, y = make_regression()

PCA = @load PCA pkg=MultivariateStats
model = PCA()
mach = machine(model, X)
fit!(mach, rows=1:50)
transform(mach, selectrows(X, 51:100)) # or transform(mach, rows=51:100)

DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
model = DecisionTreeRegressor()
mach = machine(model, X, y)
fit!(mach, rows=1:50)
predict(mach, selectrows(X, 51:100)) # or predict(mach, rows=51:100)
```
Specify `cache=false` to prioritize memory management over speed.

When building a learning network, `Node` objects can be substituted for the concrete data, but no type or dimension checks are applied.
**Checks on the types of training data**
A model articulates its data requirements using scientific types, i.e., using the `scitype` function instead of the `typeof` function.
If `scitype_check_level > 0` then the scitype of each `arg` in `args` is computed, and this is compared with the scitypes expected by the model, unless `args` contains `Unknown` scitypes and `scitype_check_level < 4`, in which case no further action is taken. Whether warnings are issued or errors thrown depends on the level. For details, see `default_scitype_check_level`, a method to inspect or change the default level (`1` at startup).
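For instance, binding a model that expects a `Continuous` target to a `Multiclass` target should trip the check (a sketch; at the default level a warning is issued rather than an error, and `ConstantRegressor` is used purely for illustration):

```julia
using MLJ

X, y = @load_iris  # y has scitype AbstractVector{Multiclass{3}}

# expected to warn: ConstantRegressor wants a Continuous target
mach = machine(ConstantRegressor(), X, y)

# suppress the check for this machine only:
mach = machine(ConstantRegressor(), X, y; scitype_check_level=0)
```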
**Machines with model placeholders**
A symbol can be substituted for a model in machine constructors to act as a placeholder for a model specified at training time. The symbol must be the field name for a struct whose corresponding value is a model, as shown in the following example:
```julia
mutable struct MyComposite
    transformer
    classifier
end

my_composite = MyComposite(Standardizer(), ConstantClassifier())

X, y = make_blobs()
mach = machine(:classifier, X, y)
fit!(mach, composite=my_composite)
```
The last two lines are equivalent to
```julia
mach = machine(ConstantClassifier(), X, y)
fit!(mach)
```
Delaying model specification is used when exporting learning networks as new stand-alone model types. See `prefit` and the MLJ documentation on learning networks.
See also `fit!`, `default_scitype_check_level`, `MLJBase.save`, `serializable`.
`StatsAPI.fit!` — Function

```julia
fit!(mach::Machine; rows=nothing, verbosity=1, force=false, composite=nothing)
```

Fit the machine `mach`. In the case that `mach` has `Node` arguments, first train all other machines on which `mach` depends.

To attempt to fit a machine without touching any other machine, use `fit_only!`. For more on options and the internal logic of fitting, see `fit_only!`.
```julia
fit!(N::Node;
     rows=nothing,
     verbosity=1,
     force=false,
     acceleration=CPU1())
```

Train all machines required to call the node `N`, in an appropriate order, but parallelizing where possible using the specified `acceleration` mode. These machines are those returned by `machines(N)`.
Supported modes of `acceleration`: `CPU1()`, `CPUThreads()`.
`MLJBase.fit_only!` — Function

```julia
MLJBase.fit_only!(
    mach::Machine;
    rows=nothing,
    verbosity=1,
    force=false,
    composite=nothing,
)
```
Without mutating any other machine on which it may depend, perform one of the following actions to the machine `mach`, using the data and model bound to it, and restricting the data to `rows` if specified:
1. *Ab initio training.* Ignoring any previous learned parameters and cache, compute and store new learned parameters. Increment `mach.state`.
2. *Training update.* Making use of previous learned parameters and/or cache, replace or mutate existing learned parameters. The effect is the same (or nearly the same) as in ab initio training, but may be faster or use less memory, assuming the model supports an update option (implements `MLJBase.update`). Increment `mach.state`.
3. *No-operation.* Leave existing learned parameters untouched. Do not increment `mach.state`.
If the model, `model`, bound to `mach` is a symbol, then instead perform the action using the true model given by `getproperty(composite, model)`. See also `machine`.
**Training action logic**
For the action to be a no-operation, either `mach.frozen == true` or none of the following apply:
1. `mach` has never been trained (`mach.state == 0`).
2. `force == true`.
3. The `state` of some other machine on which `mach` depends has changed since the last time `mach` was trained (i.e., the last time `mach.state` was incremented).
4. The specified `rows` have changed since the last retraining and `mach.model` does not have `Static` type.
5. `mach.model` is a model and different from the last model used for training, but has the same type.
6. `mach.model` is a model but has a type different from the last model used for training.
7. `mach.model` is a symbol and `getproperty(composite, mach.model)` is different from the last model used for training, but has the same type.
8. `mach.model` is a symbol and `getproperty(composite, mach.model)` has a different type from the last model used for training.
In any of the cases (1)-(4), (6), or (8), `mach` is trained ab initio. If (5) or (7) is true, then a training update is applied.
To freeze or unfreeze `mach`, use `freeze!(mach)` or `thaw!(mach)`.
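For example (a small sketch; `ConstantClassifier` just provides a cheap machine to freeze):

```julia
using MLJ

X, y = @load_iris
mach = fit!(machine(ConstantClassifier(), X, y), verbosity=0)

freeze!(mach)  # fit! is now a no-operation
fit!(mach)     # leaves learned parameters untouched
thaw!(mach)    # the usual training action logic applies again
```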
**Implementation details**
The data to which a machine is bound is stored in `mach.args`. Each element of `args` is either a `Node` object or, in the case that concrete data was bound to the machine, concrete data wrapped in a `Source` node. In all cases, to obtain concrete data for actual training, each argument `N` is called, as in `N()` or `N(rows=rows)`, and either `MLJBase.fit` (ab initio training) or `MLJBase.update` (training update) is dispatched on `mach.model` and this data. See the "Adding models for general use" section of the MLJ documentation for more on these lower-level training methods.
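To make the wrapping concrete, here is a small sketch (field access is shown for illustration only; `ConstantClassifier` is a stand-in):

```julia
using MLJ

X, y = @load_iris
mach = machine(ConstantClassifier(), X, y)

Xnode, ynode = mach.args  # each a Source node wrapping the bound data
Xnode()                   # calling a node returns the wrapped data, X
Xnode(rows=1:50)          # or the view restricted to the specified rows
```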