Utilities
Machines
Base.replace — Method
replace(mach::Machine, field1 => value1, field2 => value2, ...)
Private method.
Return a shallow copy of the machine mach with the specified field replacements. Undefined field values are preserved. Unspecified fields have identically equal values, with the exception of mach.fit_okay, which is always a new instance of Channel{Bool}(1).
The following example returns a machine with no traces of training data (but also removes any upstream dependencies in a learning network):
replace(mach, :args => (), :data => (), :data_resampled_data => (), :cache => nothing)
MLJBase.age — Method
age(mach::Machine)
Return an integer representing the number of times mach has been trained or updated. For more detail, see the discussion of training logic at fit_only!.
MLJBase.ancestors — Method
ancestors(mach::Machine; self=false)
All ancestors of mach, including mach if self=true.
MLJBase.default_scitype_check_level — Function
default_scitype_check_level()
Return the current global default value for scientific type checking when constructing machines.
default_scitype_check_level(i::Integer)
Set the global default value for scientific type checking to i.
The effect of the scitype_check_level option in calls of the form machine(model, data, scitype_check_level=...) is summarized below:
scitype_check_level | Inspect scitypes? | If Unknown in scitypes | If other scitype mismatch
--- | --- | --- | ---
0 | × | |
1 (value at startup) | ✓ | | warning
2 | ✓ | warning | warning
3 | ✓ | warning | error
4 | ✓ | error | error
See also machine.
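As a hedged illustration of these levels (Standardizer and the synthetic table here are stand-ins for any model/data pair):

```julia
using MLJ

X = (x = rand(10),)   # a table with Continuous scitype
model = Standardizer()

machine(model, X)                          # uses the global default check level
machine(model, X; scitype_check_level=3)   # warn on Unknown, error on other mismatches
default_scitype_check_level(2)             # change the global default
```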
MLJBase.fit_only! — Method
MLJBase.fit_only!(
    mach::Machine;
    rows=nothing,
    verbosity=1,
    force=false,
    composite=nothing,
)
Without mutating any other machine on which it may depend, perform one of the following actions to the machine mach, using the data and model bound to it, and restricting the data to rows if specified:
- Ab initio training. Ignoring any previous learned parameters and cache, compute and store new learned parameters. Increment mach.state.
- Training update. Making use of previous learned parameters and/or cache, replace or mutate existing learned parameters. The effect is the same (or nearly the same) as in ab initio training, but may be faster or use less memory, assuming the model supports an update option (implements MLJBase.update). Increment mach.state.
- No-operation. Leave existing learned parameters untouched. Do not increment mach.state.
If the model, model, bound to mach is a symbol, then instead perform the action using the true model given by getproperty(composite, model). See also machine.
Training action logic
For the action to be a no-operation, either mach.frozen == true or none of the following apply:
1. mach has never been trained (mach.state == 0).
2. force == true.
3. The state of some other machine on which mach depends has changed since the last time mach was trained (i.e., the last time mach.state was incremented).
4. The specified rows have changed since the last retraining and mach.model does not have Static type.
5. mach.model is a model and different from the last model used for training, but has the same type.
6. mach.model is a model but has a type different from the last model used for training.
7. mach.model is a symbol and getproperty(composite, mach.model) is different from the last model used for training, but has the same type.
8. mach.model is a symbol and getproperty(composite, mach.model) has a different type from the last model used for training.
In any of the cases (1) - (4), (6), or (8), mach is trained ab initio. If (5) or (7) is true, then a training update is applied.
To freeze or unfreeze mach, use freeze!(mach) or thaw!(mach).
Implementation details
The data to which a machine is bound is stored in mach.args. Each element of args is either a Node object or, in the case that concrete data was bound to the machine, concrete data wrapped in a Source node. In all cases, to obtain concrete data for actual training, each argument N is called, as in N() or N(rows=rows), and either MLJBase.fit (ab initio training) or MLJBase.update (training update) is dispatched on mach.model and this data. See the "Adding models for general use" section of the MLJ documentation for more on these lower-level training methods.
MLJBase.freeze! — Method
freeze!(mach)
Freeze the machine mach so that it will never be retrained (unless thawed).
See also thaw!.
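A minimal sketch of the freeze/thaw cycle, assuming mach is any machine already bound to data:

```julia
fit!(mach)      # train as usual
freeze!(mach)
fit!(mach)      # no-op: the machine is frozen
thaw!(mach)
fit!(mach)      # retraining is possible again
```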
MLJBase.last_model — Method
last_model(mach::Machine)
Return the last model used to train the machine mach. This is a bona fide model, even if mach.model is a symbol.
Returns nothing if mach has not been trained.
MLJBase.machine — Function
machine(model, args...; cache=true, scitype_check_level=1)
Construct a Machine object binding a model, storing hyper-parameters of some machine learning algorithm, to some data, args. Calling fit! on a Machine instance mach stores outcomes of applying the algorithm in mach, which can be inspected using fitted_params(mach) (learned parameters) and report(mach) (other outcomes). This in turn enables generalization to new data using operations such as predict or transform:
using MLJModels
X, y = make_regression()
PCA = @load PCA pkg=MultivariateStats
model = PCA()
mach = machine(model, X)
fit!(mach, rows=1:50)
transform(mach, selectrows(X, 51:100)) # or transform(mach, rows=51:100)
DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
model = DecisionTreeRegressor()
mach = machine(model, X, y)
fit!(mach, rows=1:50)
predict(mach, selectrows(X, 51:100)) # or predict(mach, rows=51:100)
Specify cache=false to prioritize memory management over speed.
When building a learning network, Node objects can be substituted for the concrete data, but no type or dimension checks are applied.
Checks on the types of training data
A model articulates its data requirements using scientific types, i.e., using the scitype function instead of the typeof function.
If scitype_check_level > 0 then the scitype of each arg in args is computed, and this is compared with the scitypes expected by the model, unless args contains Unknown scitypes and scitype_check_level < 4, in which case no further action is taken. Whether warnings are issued or errors thrown depends on the level. For details, see default_scitype_check_level, a method to inspect or change the default level (1 at startup).
Machines with model placeholders
A symbol can be substituted for a model in machine constructors to act as a placeholder for a model specified at training time. The symbol must be the field name for a struct whose corresponding value is a model, as shown in the following example:
mutable struct MyComposite
transformer
classifier
end
my_composite = MyComposite(Standardizer(), ConstantClassifier())
X, y = make_blobs()
mach = machine(:classifier, X, y)
fit!(mach, composite=my_composite)
The last two lines are equivalent to
mach = machine(ConstantClassifier(), X, y)
fit!(mach)
Delaying model specification is used when exporting learning networks as new stand-alone model types. See prefit and the MLJ documentation on learning networks.
See also fit!, default_scitype_check_level, MLJBase.save, serializable.
MLJBase.machine — Method
machine(file::Union{String, IO})
Rebuild from a file a machine that has been serialized using the default Serialization module.
MLJBase.report — Method
report(mach)
Return the report for a machine mach that has been fit!, for example the coefficients in a linear model.
This is a named tuple and human-readable if possible.
If mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the report for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)
julia> using MLJ
julia> @load LinearBinaryClassifier pkg=GLM
julia> X, y = @load_crabs;
julia> pipe = Standardizer() |> LinearBinaryClassifier();
julia> mach = machine(pipe, X, y) |> fit!;
julia> report(mach).linear_binary_classifier
(deviance = 3.8893386087844543e-7,
dof_residual = 195.0,
stderror = [18954.83496713119, 6502.845740757159, 48484.240246060406, 34971.131004997274, 20654.82322484894, 2111.1294584763386],
vcov = [3.592857686311793e8 9.122732393971942e6 … -8.454645589364915e7 5.38856837634321e6; 9.122732393971942e6 4.228700272808351e7 … -4.978433790526467e7 -8.442545425533723e6; … ; -8.454645589364915e7 -4.978433790526467e7 … 4.2662172244975924e8 2.1799125705781363e7; 5.38856837634321e6 -8.442545425533723e6 … 2.1799125705781363e7 4.456867590446599e6],)
See also fitted_params.
MLJBase.report_given_method — Method
report_given_method(mach::Machine)
Same as report(mach) but broken down by the method (fit, predict, etc) that contributed the report.
A specialized method intended for learning network applications.
The return value is a dictionary keyed on the symbol representing the method (:fit, :predict, etc), with values the reports contributed by those methods.
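A hedged sketch of inspecting the breakdown, assuming mach is a fitted machine as in the earlier examples:

```julia
d = report_given_method(mach)
keys(d)    # symbols such as :fit and :predict, depending on the model
d[:fit]    # the portion of the report contributed by fit
```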
MLJBase.restore! — Function
restore!(mach::Machine)
Restore the state of a machine that is currently serializable but which may not be otherwise usable. For such a machine, mach, one has mach.state == -1. Intended for restoring deserialized machine objects to a usable form.
For an example see serializable.
MLJBase.serializable — Method
serializable(mach::Machine)
Returns a shallow copy of the machine to make it serializable. In particular, all training data is removed and, if necessary, learned parameters are replaced with persistent representations.
Any general purpose Julia serializer may be applied to the output of serializable (e.g., JLSO, BSON, JLD), but you must call restore!(mach) on the deserialized object mach before using it. See the example below.
If using Julia's standard Serialization library, a shorter workflow is available using the MLJBase.save (or MLJ.save) method.
A machine returned by serializable is characterized by the property mach.state == -1.
Example using JLSO
using MLJ
using JLSO
Tree = @load DecisionTreeClassifier
tree = Tree()
X, y = @load_iris
mach = fit!(machine(tree, X, y))
# This machine can now be serialized
smach = serializable(mach)
JLSO.save("machine.jlso", :machine => smach)
# Deserialize and restore learned parameters to useable form:
loaded_mach = JLSO.load("machine.jlso")[:machine]
restore!(loaded_mach)
predict(loaded_mach, X)
predict(mach, X)
See also restore!, MLJBase.save.
MLJBase.thaw! — Method
thaw!(mach)
Unfreeze the machine mach so that it can be retrained.
See also freeze!.
MLJModelInterface.feature_importances — Method
feature_importances(mach::Machine)
Return a list of feature => importance pairs for a fitted machine, mach, for supported models. Otherwise return nothing.
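A hedged sketch, assuming the DecisionTree.jl interface is installed (its tree models support impurity-based importances):

```julia
using MLJ

Tree = @load DecisionTreeClassifier pkg=DecisionTree
X, y = @load_iris
mach = fit!(machine(Tree(), X, y))
feature_importances(mach)   # a vector of feature => importance pairs
```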
MLJModelInterface.fitted_params — Method
fitted_params(mach)
Return the learned parameters for a machine mach that has been fit!, for example the coefficients in a linear model.
This is a named tuple and human-readable if possible.
If mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the fitted parameters for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)
julia> using MLJ
julia> @load LogisticClassifier pkg=MLJLinearModels
julia> X, y = @load_crabs;
julia> pipe = Standardizer() |> LogisticClassifier();
julia> mach = machine(pipe, X, y) |> fit!;
julia> fitted_params(mach).logistic_classifier
(classes = CategoricalArrays.CategoricalValue{String,UInt32}["B", "O"],
coefs = Pair{Symbol,Float64}[:FL => 3.7095037897680405, :RW => 0.1135739140854546, :CL => -1.6036892745322038, :CW => -4.415667573486482, :BD => 3.238476051092471],
intercept = 0.0883301599726305,)
See also report.
MLJModelInterface.save — Method
MLJ.save(mach)
MLJBase.save(mach)
Save the current machine as an artifact at the location associated with default_logger.
MLJModelInterface.save — Method
MLJ.save(filename, mach::Machine)
MLJ.save(io, mach::Machine)
MLJBase.save(filename, mach::Machine)
MLJBase.save(io, mach::Machine)
Serialize the machine mach to a file with path filename, or to an input/output stream io (at least IOBuffer instances are supported), using the Serialization module.
To serialize using a different format, see serializable.
Machines are deserialized using the machine constructor as shown in the example below.
The implementation of save for machines changed in MLJ 0.18 (MLJBase 0.20). A machine saved using an older version of MLJ can only be restored using an older version.
Example
using MLJ
Tree = @load DecisionTreeClassifier
X, y = @load_iris
mach = fit!(machine(Tree(), X, y))
MLJ.save("tree.jls", mach)
mach_predict_only = machine("tree.jls")
predict(mach_predict_only, X)
# using a buffer:
io = IOBuffer()
MLJ.save(io, mach)
seekstart(io)
predict_only_mach = machine(io)
predict(predict_only_mach, X)
Maliciously constructed JLS files, like pickles and most other general-purpose serialization formats, can allow for arbitrary code execution during loading. This means it is possible for someone to use a JLS file that looks like a serialized MLJ machine as a Trojan horse.
See also serializable, machine.
StatsAPI.fit! — Method
fit!(mach::Machine; rows=nothing, verbosity=1, force=false, composite=nothing)
Fit the machine mach. In the case that mach has Node arguments, first train all other machines on which mach depends.
To attempt to fit a machine without touching any other machine, use fit_only!. For more on options and the internal logic of fitting, see fit_only!.
Parameter Inspection
Show
MLJBase._recursive_show — Method
_recursive_show(stream, object, current_depth, depth)
Private method.
Generate a table of the properties of the MLJType object, displaying each property value by calling the method _show on it. The behaviour of _show(stream, f) is as follows:
- If f is itself an MLJType object, then its short form is shown and _recursive_show generates a separate table for each of its properties (and so on, up to a depth of argument depth).
- Otherwise f is displayed as "(omitted T)" where T = typeof(f), unless istoobig(f) is false (the istoobig fall-back for arbitrary types being true). In the latter case, the long (i.e., MIME"plain/text") form of f is shown. To override this behaviour, overload the _show method for the type in question.
MLJBase.abbreviated — Method
abbreviated(n)
Display abbreviated versions of integers.
MLJBase.color_off — Method
color_off()
Suppress color and bold output at the REPL for displaying MLJ objects.
MLJBase.color_on — Method
color_on()
Enable color and bold output at the REPL, for enhanced display of MLJ objects.
MLJBase.handle — Method
handle(X)
Return the abbreviated object id (as a string), or its registered handle (as a string) if this exists.
MLJBase.@constant — Macro
@constant x = value
Private method (used in testing).
Equivalent to const x = value but registers the binding thus:
MLJBase.HANDLE_GIVEN_ID[objectid(value)] = :x
Registered objects get displayed using the variable name to which they were bound in calls to show(x), etc.
As with any const declaration, binding x to a new value of the same type is not prevented, and the registration will not be updated.
MLJBase.@more — Macro
@more
Entered at the REPL, equivalent to show(ans, 100). Use to get a recursive description of all properties of the last REPL value.
Utility functions
MLJBase._permute_rows — Method
_permute_rows(obj, perm)
Internal function to return a vector or matrix with permuted rows given the permutation perm.
MLJBase.available_name — Method
available_name(modl::Module, name::Symbol)
Function to replace, if necessary, a given name with a modified one that ensures it is not the name of any existing object in the global scope of modl. Modifications are created with numerical suffixes.
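A hedged sketch of how this behaves (the exact suffix depends on what is already defined in the target module):

```julia
using MLJBase

module Sandbox end

MLJBase.available_name(Sandbox, :model)  # :model is free, so the name comes back unchanged
Sandbox.eval(:(model = 1))               # now :model is taken in Sandbox
MLJBase.available_name(Sandbox, :model)  # a modified name with a numerical suffix, e.g. :model2
```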
MLJBase.check_same_nrows — Method
check_same_nrows(X, Y)
Internal function to check that two objects, each a vector or a matrix, have the same number of rows.
MLJBase.chunks — Method
chunks(range, n)
Split an AbstractRange into n subranges of approximately equal length.
Example
julia> collect(chunks(1:5, 2))
2-element Vector{UnitRange{Int64}}:
1:3
4:5
Private method.
MLJBase.flat_values — Method
flat_values(t::NamedTuple)
View a nested named tuple t as a tree and return, as a tuple, the values at the leaves, in the order they appear in the original tuple.
julia> t = (X = (x = 1, y = 2), Y = 3);
julia> flat_values(t)
(1, 2, 3)
MLJBase.generate_name! — Method
generate_name!(M, existing_names; only=Union{Function,Type}, substitute=:f)
Given a type M (e.g., MyEvenInteger{N}) return a symbolic, snake-case, representation of the type name (such as my_even_integer). The symbol is pushed to existing_names, which must be an AbstractVector to which a Symbol can be pushed.
If the snake-case representation already exists in existing_names, a suitable integer is appended to the name.
If only is specified, then the operation is restricted to those M for which M isa only. In all other cases the symbolic name is generated using substitute as the base symbol.
julia> existing_names = [];
julia> generate_name!(Vector{Int}, existing_names)
:vector
julia> generate_name!(Vector{Int}, existing_names)
:vector2
julia> generate_name!(AbstractFloat, existing_names)
:abstract_float
julia> generate_name!(Int, existing_names, only=Array, substitute=:not_array)
:not_array
julia> generate_name!(Int, existing_names, only=Array, substitute=:not_array)
:not_array2
MLJBase.guess_model_target_observation_scitype — Method
guess_model_target_observation_scitype(model)
Private method.
Try to infer a lowest upper bound on the scitype of target observations acceptable to model, by inspecting target_scitype(model). Return Unknown if unable to draw a reliable inference.
The observation scitype for a table is here understood as the scitype of a row converted to a vector.
MLJBase.guess_observation_scitype — Method
guess_observation_scitype(y)
Private method.
If y is an AbstractArray, return the scitype of y[:, :, ..., :, 1]. If y is a table, return the scitype of the first row, converted to a vector, unless this row has missing elements, in which case return Unknown.
In all other cases, return Unknown.
julia> guess_observation_scitype([missing, 1, 2, 3])
Union{Missing, Count}
julia> guess_observation_scitype(rand(3, 2))
AbstractVector{Continuous}
julia> guess_observation_scitype((x=rand(3), y=rand(Bool, 3)))
AbstractVector{Union{Continuous, Count}}
julia> guess_observation_scitype((x=[missing, 1, 2], y=[1, 2, 3]))
Unknown
MLJBase.init_rng — Method
init_rng(rng)
Create an AbstractRNG from rng. If rng is a non-negative Integer, it returns a MersenneTwister random number generator seeded with rng; if rng is an AbstractRNG object it returns rng; otherwise it throws an error.
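For example (a sketch; comments describe the behaviour stated above, not verified output):

```julia
using MLJBase

rng = MLJBase.init_rng(42)      # a MersenneTwister seeded with 42
MLJBase.init_rng(rng) === rng   # an AbstractRNG is passed through unchanged
# MLJBase.init_rng(-1)          # a negative integer throws an error
```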
MLJBase.observation — Method
observation(S)
Private method.
Tries to infer the per-observation scitype from the scitype of S, when S is known to be the scitype of some container with multiple observations. Return Unknown if unable to draw a reliable inference.
The observation scitype for a table is here understood as the scitype of a row converted to a vector.
MLJBase.prepend — Method
MLJBase.prepend(::Symbol, ::Union{Symbol,Expr,Nothing})
For prepending symbols in expressions like :(y.w) and :(x1.x2.x3).
julia> prepend(:x, :y)
:(x.y)
julia> prepend(:x, :(y.z))
:(x.y.z)
julia> prepend(:w, ans)
:(w.x.y.z)
If the second argument is nothing, then nothing is returned.
MLJBase.recursive_getproperty — Method
recursive_getproperty(object, nested_name::Expr)
Call getproperty recursively on object to extract the value of some nested property, as in the following example:
julia> object = (X = (x = 1, y = 2), Y = 3);
julia> recursive_getproperty(object, :(X.y))
2
MLJBase.recursive_setproperty! — Method
recursive_setproperty!(object, nested_name::Expr, value)
Set a nested property of an object to value, as in the following example:
julia> mutable struct Foo
X
Y
end
julia> mutable struct Bar
x
y
end
julia> object = Foo(Bar(1, 2), 3)
Foo(Bar(1, 2), 3)
julia> recursive_setproperty!(object, :(X.y), 42)
42
julia> object
Foo(Bar(1, 42), 3)
MLJBase.sequence_string — Method
sequence_string(itr, n=3)
Return a "sequence" string from the first n elements generated by itr.
julia> MLJBase.sequence_string(1:10, 4)
"1, 2, 3, 4, ..."
Private method.
MLJBase.shuffle_rows — Method
shuffle_rows(X::AbstractVecOrMat, Y::AbstractVecOrMat; rng::AbstractRNG=Random.GLOBAL_RNG)
Return row-shuffled vectors or matrices using a random permutation of X and Y. An optional random number generator can be specified using the rng argument.
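A sketch of typical usage (the same permutation is applied to both arguments, so rows stay aligned):

```julia
using MLJBase, Random

X = [1 2; 3 4; 5 6]
y = [10, 20, 30]
Xs, ys = MLJBase.shuffle_rows(X, y; rng=MersenneTwister(0))
# Xs[i, :] and ys[i] still correspond to the same original row
```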
MLJBase.unwind — Method
unwind(iterators...)
Represent all possible combinations of values generated by iterators as rows of a matrix A. In more detail, A has one column for each iterator in iterators and one row for each distinct possible combination of values taken on by the iterators. Elements in the first column cycle fastest, those in the last column slowest.
Example
julia> iterators = ([1, 2], ["a","b"], ["x", "y", "z"]);
julia> MLJBase.unwind(iterators...)
12×3 Matrix{Any}:
1 "a" "x"
2 "a" "x"
1 "b" "x"
2 "b" "x"
1 "a" "y"
2 "a" "y"
1 "b" "y"
2 "b" "y"
1 "a" "z"
2 "a" "z"
1 "b" "z"
2 "b" "z"