Utilities

Machines

Base.replaceMethod
replace(mach::Machine, field1 => value1, field2 => value2, ...)

Private method.

Return a shallow copy of the machine mach with the specified field replacements. Undefined field values are preserved. Unspecified fields have identically equal values, with the exception of mach.fit_okay, which is always a new instance Channel{Bool}(1).

The following example returns a machine with no traces of training data (but also removes any upstream dependencies in a learning network):

replace(mach, :args => (), :data => (), :data_resampled_data => (), :cache => nothing)
source
MLJBase.ageMethod
age(mach::Machine)

Return an integer representing the number of times mach has been trained or updated. For more detail, see the discussion of training logic at fit_only!.

source
MLJBase.ancestorsMethod
ancestors(mach::Machine; self=false)

All ancestors of mach, including mach if self=true.
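
For example, in a learning network a machine's ancestors are the machines it depends on through its node arguments. A minimal sketch (assuming MLJ is loaded; Standardizer and ConstantClassifier are the built-in models re-exported by MLJ):

```julia
using MLJ

X, y = make_blobs()
Xs, ys = source(X), source(y)

mach1 = machine(Standardizer(), Xs)
W = transform(mach1, Xs)
mach2 = machine(ConstantClassifier(), W, ys)

MLJBase.ancestors(mach2)             # includes mach1, but not mach2
MLJBase.ancestors(mach2; self=true)  # additionally includes mach2
```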

source
MLJBase.default_scitype_check_levelFunction
default_scitype_check_level()

Return the current global default value for scientific type checking when constructing machines.

default_scitype_check_level(i::Integer)

Set the global default value for scientific type checking to i.

The effect of the scitype_check_level option in calls of the form machine(model, data, scitype_check_level=...) is summarized below:

scitype_check_level    Inspect scitypes?    If Unknown in scitypes    If other scitype mismatch
0                      ×
1 (value at startup)   ✓                                              warning
2                      ✓                    warning                   warning
3                      ✓                    warning                   error
4                      ✓                    error                     error

See also machine
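
For example, to raise the default strictness globally, or to override it for a single machine (a sketch, assuming MLJ is loaded):

```julia
using MLJ

default_scitype_check_level()    # inspect the current default (1 at startup)
default_scitype_check_level(3)   # warn on Unknown scitypes, error on other mismatches

# Override for one machine, leaving the global default unchanged:
X, y = make_regression()
mach = machine(ConstantRegressor(), X, y; scitype_check_level=4)
```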

source
MLJBase.fit_only!Method
MLJBase.fit_only!(
    mach::Machine;
    rows=nothing,
    verbosity=1,
    force=false,
    composite=nothing,
)

Without mutating any other machine on which it may depend, perform one of the following actions to the machine mach, using the data and model bound to it, and restricting the data to rows if specified:

  • Ab initio training. Ignoring any previous learned parameters and cache, compute and store new learned parameters. Increment mach.state.

  • Training update. Making use of previous learned parameters and/or cache, replace or mutate existing learned parameters. The effect is the same (or nearly the same) as in ab initio training, but may be faster or use less memory, assuming the model supports an update option (implements MLJBase.update). Increment mach.state.

  • No-operation. Leave existing learned parameters untouched. Do not increment mach.state.

If the model, model, bound to mach is a symbol, then instead perform the action using the true model given by getproperty(composite, model). See also machine.

Training action logic

For the action to be a no-operation, either mach.frozen == true or none of the following apply:

  1. mach has never been trained (mach.state == 0).

  2. force == true.

  3. The state of some other machine on which mach depends has changed since the last time mach was trained (ie, the last time mach.state was incremented).

  4. The specified rows have changed since the last retraining and mach.model does not have Static type.

  5. mach.model is a model and different from the last model used for training, but has the same type.

  6. mach.model is a model but has a type different from the last model used for training.

  7. mach.model is a symbol and getproperty(composite, mach.model) is different from the last model used for training, but has the same type.

  8. mach.model is a symbol and getproperty(composite, mach.model) has a different type from the last model used for training.

In any of the cases (1) - (4), (6), or (8), mach is trained ab initio. If (5) or (7) is true, then a training update is applied.

To freeze or unfreeze mach, use freeze!(mach) or thaw!(mach).
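
The interaction of freezing with the training action logic can be sketched as follows (assuming MLJ is loaded):

```julia
using MLJ

X, y = make_regression()
mach = machine(ConstantRegressor(), X, y)
fit!(mach)              # ab initio training; mach.state becomes 1

freeze!(mach)
fit!(mach)              # no-operation; mach.state is unchanged

thaw!(mach)
fit!(mach; force=true)  # ab initio retraining; mach.state becomes 2
```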

Implementation details

The data to which a machine is bound is stored in mach.args. Each element of args is either a Node object, or, in the case that concrete data was bound to the machine, it is concrete data wrapped in a Source node. In all cases, to obtain concrete data for actual training, each argument N is called, as in N() or N(rows=rows), and either MLJBase.fit (ab initio training) or MLJBase.update (training update) is dispatched on mach.model and this data. See the "Adding models for general use" section of the MLJ documentation for more on these lower-level training methods.

source
MLJBase.freeze!Method
freeze!(mach)

Freeze the machine mach so that it will never be retrained (unless thawed).

See also thaw!.

source
MLJBase.last_modelMethod
last_model(mach::Machine)

Return the last model used to train the machine mach. This is a bona fide model, even if mach.model is a symbol.

Returns nothing if mach has not been trained.

source
MLJBase.machineFunction
machine(model, args...; cache=true, scitype_check_level=1)

Construct a Machine object binding a model, storing hyper-parameters of some machine learning algorithm, to some data, args. Calling fit! on a Machine instance mach stores outcomes of applying the algorithm in mach, which can be inspected using fitted_params(mach) (learned parameters) and report(mach) (other outcomes). This in turn enables generalization to new data using operations such as predict or transform:

using MLJModels
X, y = make_regression()

PCA = @load PCA pkg=MultivariateStats
model = PCA()
mach = machine(model, X)
fit!(mach, rows=1:50)
transform(mach, selectrows(X, 51:100)) # or transform(mach, rows=51:100)

DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
model = DecisionTreeRegressor()
mach = machine(model, X, y)
fit!(mach, rows=1:50)
predict(mach, selectrows(X, 51:100)) # or predict(mach, rows=51:100)

Specify cache=false to prioritize memory management over speed.

When building a learning network, Node objects can be substituted for the concrete data but no type or dimension checks are applied.

Checks on the types of training data

A model articulates its data requirements using scientific types, i.e., using the scitype function instead of the typeof function.

If scitype_check_level > 0 then the scitype of each arg in args is computed, and this is compared with the scitypes expected by the model, unless args contains Unknown scitypes and scitype_check_level < 4, in which case no further action is taken. Whether warnings are issued or errors thrown depends on the level. For details, see default_scitype_check_level, a method to inspect or change the default level (1 at startup).

Machines with model placeholders

A symbol can be substituted for a model in machine constructors to act as a placeholder for a model specified at training time. The symbol must be the field name for a struct whose corresponding value is a model, as shown in the following example:

mutable struct MyComposite
    transformer
    classifier
end

my_composite = MyComposite(Standardizer(), ConstantClassifier())

X, y = make_blobs()
mach = machine(:classifier, X, y)
fit!(mach, composite=my_composite)

The last two lines are equivalent to

mach = machine(ConstantClassifier(), X, y)
fit!(mach)

Delaying model specification is used when exporting learning networks as new stand-alone model types. See prefit and the MLJ documentation on learning networks.

See also fit!, default_scitype_check_level, MLJBase.save, serializable.

source
MLJBase.machineMethod
machine(file::Union{String, IO})

Rebuild from a file a machine that has been serialized using the default Serialization module.

source
MLJBase.reportMethod
report(mach)

Return the report for a machine mach that has been fit!, for example the coefficients in a linear model.

This is a named tuple and human-readable if possible.

If mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the report for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)

julia> using MLJ
julia> @load LinearBinaryClassifier pkg=GLM
julia> X, y = @load_crabs;
julia> pipe = Standardizer() |> LinearBinaryClassifier();
julia> mach = machine(pipe, X, y) |> fit!;

julia> report(mach).linear_binary_classifier
(deviance = 3.8893386087844543e-7,
 dof_residual = 195.0,
 stderror = [18954.83496713119, 6502.845740757159, 48484.240246060406, 34971.131004997274, 20654.82322484894, 2111.1294584763386],
 vcov = [3.592857686311793e8 9.122732393971942e6 … -8.454645589364915e7 5.38856837634321e6; 9.122732393971942e6 4.228700272808351e7 … -4.978433790526467e7 -8.442545425533723e6; … ; -8.454645589364915e7 -4.978433790526467e7 … 4.2662172244975924e8 2.1799125705781363e7; 5.38856837634321e6 -8.442545425533723e6 … 2.1799125705781363e7 4.456867590446599e6],)

See also fitted_params

source
MLJBase.report_given_methodMethod
report_given_method(mach::Machine)

Same as report(mach) but broken down by the method (fit, predict, etc) that contributed the report.

A specialized method intended for learning network applications.

The return value is a dictionary keyed on the symbol representing the method (:fit, :predict, etc), the values being the reports contributed by that method.
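
A sketch of typical usage, for mach some fitted machine (the available keys depend on which methods actually contributed to the report):

```julia
d = report_given_method(mach)  # e.g., a Dict with keys :fit, :predict, ...
d[:fit]                        # just the part of the report contributed by `fit`
```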

source
MLJBase.restore!Function
restore!(mach::Machine)

Restore the state of a machine that is currently serializable but which may not be otherwise usable. For such a machine, mach, one has mach.state == -1. Intended for restoring deserialized machine objects to a usable form.

For an example see serializable.

source
MLJBase.serializableMethod
serializable(mach::Machine)

Returns a shallow copy of the machine to make it serializable. In particular, all training data is removed and, if necessary, learned parameters are replaced with persistent representations.

Any general purpose Julia serializer may be applied to the output of serializable (eg, JLSO, BSON, JLD) but you must call restore!(mach) on the deserialised object mach before using it. See the example below.

If using Julia's standard Serialization library, a shorter workflow is available using the MLJBase.save (or MLJ.save) method.

A machine returned by serializable is characterized by the property mach.state == -1.

Example using JLSO

using MLJ
using JLSO
Tree = @load DecisionTreeClassifier
tree = Tree()
X, y = @load_iris
mach = fit!(machine(tree, X, y))

# This machine can now be serialized
smach = serializable(mach)
JLSO.save("machine.jlso", :machine => smach)

# Deserialize and restore learned parameters to useable form:
loaded_mach = JLSO.load("machine.jlso")[:machine]
restore!(loaded_mach)

predict(loaded_mach, X)
predict(mach, X)

See also restore!, MLJBase.save.

source
MLJModelInterface.fitted_paramsMethod
fitted_params(mach)

Return the learned parameters for a machine mach that has been fit!, for example the coefficients in a linear model.

This is a named tuple and human-readable if possible.

If mach is a machine for a composite model, such as a model constructed using the pipeline syntax model1 |> model2 |> ..., then the returned named tuple has the composite type's field names as keys. The corresponding value is the fitted parameters for the machine in the underlying learning network bound to that model. (If multiple machines share the same model, then the value is a vector.)

julia> using MLJ
julia> @load LogisticClassifier pkg=MLJLinearModels
julia> X, y = @load_crabs;
julia> pipe = Standardizer() |> LogisticClassifier();
julia> mach = machine(pipe, X, y) |> fit!;

julia> fitted_params(mach).logistic_classifier
(classes = CategoricalArrays.CategoricalValue{String,UInt32}["B", "O"],
 coefs = Pair{Symbol,Float64}[:FL => 3.7095037897680405, :RW => 0.1135739140854546, :CL => -1.6036892745322038, :CW => -4.415667573486482, :BD => 3.238476051092471],
 intercept = 0.0883301599726305,)

See also report

source
MLJModelInterface.saveMethod
MLJ.save(mach)
MLJBase.save(mach)

Save the current machine as an artifact at the location associated with the current default_logger.

source
MLJModelInterface.saveMethod
MLJ.save(filename, mach::Machine)
MLJ.save(io, mach::Machine)

MLJBase.save(filename, mach::Machine)
MLJBase.save(io, mach::Machine)

Serialize the machine mach to a file with path filename, or to an input/output stream io (at least IOBuffer instances are supported) using the Serialization module.

To serialize using a different format, see serializable.

Machines are deserialized using the machine constructor as shown in the example below.

Note

The implementation of save for machines changed in MLJ 0.18 (MLJBase 0.20). A machine saved with an older version of MLJ can only be restored using that older version.

Example

using MLJ
Tree = @load DecisionTreeClassifier
X, y = @load_iris
mach = fit!(machine(Tree(), X, y))

MLJ.save("tree.jls", mach)
mach_predict_only = machine("tree.jls")
predict(mach_predict_only, X)

# using a buffer:
io = IOBuffer()
MLJ.save(io, mach)
seekstart(io)
predict_only_mach = machine(io)
predict(predict_only_mach, X)
Only load files from trusted sources

Maliciously constructed JLS files, like pickles and most other general-purpose serialization formats, can allow arbitrary code execution during loading. This means it is possible for someone to use a JLS file that looks like a serialized MLJ machine as a Trojan horse.

See also serializable, machine.

source
StatsAPI.fit!Method
fit!(mach::Machine; rows=nothing, verbosity=1, force=false, composite=nothing)

Fit the machine mach. In the case that mach has Node arguments, first train all other machines on which mach depends.

To attempt to fit a machine without touching any other machine, use fit_only!. For more on options and the internal logic of fitting, see fit_only!.

source

Parameter Inspection

Show

MLJBase._recursive_showMethod
_recursive_show(stream, object, current_depth, depth)

Private method.

Generate a table of the properties of the MLJType object, displaying each property value by calling the method _show on it. The behaviour of _show(stream, f) is as follows:

  1. If f is itself a MLJType object, then its short form is shown and _recursive_show generates a separate table for each of its properties (and so on, up to a depth of argument depth).

  2. Otherwise f is displayed as "(omitted T)" where T = typeof(f), unless istoobig(f) is false (the istoobig fall-back for arbitrary types being true). In the latter case, the long (ie, MIME"text/plain") form of f is shown. To override this behaviour, overload the _show method for the type in question.

source
MLJBase.color_offMethod
color_off()

Suppress color and bold output at the REPL for displaying MLJ objects.

source
MLJBase.color_onMethod
color_on()

Enable color and bold output at the REPL, for enhanced display of MLJ objects.

source
MLJBase.handleMethod
handle(X)

Return the abbreviated object id (as a string), or its registered handle (as a string) if one exists.

source
MLJBase.@constantMacro
@constant x = value

Private method (used in testing).

Equivalent to const x = value but registers the binding thus:

MLJBase.HANDLE_GIVEN_ID[objectid(value)] = :x

Registered objects get displayed using the variable name to which they were bound in calls to show(x), etc.

Warning

As with any const declaration, binding x to a new value of the same type is not prevented, and the registration will not be updated.

source
MLJBase.@moreMacro
@more

Entered at the REPL, equivalent to show(ans, 100). Use to get a recursive description of all properties of the last REPL value.

source

Utility functions

MLJBase._permute_rowsMethod
_permute_rows(obj, perm)

Internal function to return a vector or matrix with permuted rows given the permutation perm.

source
MLJBase.available_nameMethod
available_name(modl::Module, name::Symbol)

Function to replace, if necessary, a given name with a modified one that ensures it is not the name of any existing object in the global scope of modl. Modifications are created with numerical suffixes.
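
A hypothetical sketch (the names model and model2 are assumed to be already defined in Main):

```julia
model = 1
model2 = 2

# :model and :model2 are taken, so a numerical suffix is appended:
MLJBase.available_name(Main, :model)  # a fresh symbol, such as :model3
```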

source
MLJBase.check_same_nrowsMethod
check_same_nrows(X, Y)

Internal function to check two objects, each a vector or a matrix, have the same number of rows.

source
MLJBase.chunksMethod
chunks(range, n)

Split an AbstractRange into n subranges of approximately equal length.

Example

julia> collect(chunks(1:5, 2))
2-element Vector{UnitRange{Int64}}:
 1:3
 4:5

Private method

source
MLJBase.flat_valuesMethod
flat_values(t::NamedTuple)

View a nested named tuple t as a tree and return, as a tuple, the values at the leaves, in the order they appear in the original tuple.

julia> t = (X = (x = 1, y = 2), Y = 3);
julia> flat_values(t)
(1, 2, 3)
source
MLJBase.generate_name!Method
generate_name!(M, existing_names; only=Union{Function,Type}, substitute=:f)

Given a type M (e.g., MyEvenInteger{N}) return a symbolic, snake-case, representation of the type name (such as my_even_integer). The symbol is pushed to existing_names, which must be an AbstractVector to which a Symbol can be pushed.

If the snake-case representation already exists in existing_names a suitable integer is appended to the name.

If only is specified, then the operation is restricted to those M for which M isa only. In all other cases the symbolic name is generated using substitute as the base symbol.

julia> existing_names = [];
julia> generate_name!(Vector{Int}, existing_names)
:vector

julia> generate_name!(Vector{Int}, existing_names)
:vector2

julia> generate_name!(AbstractFloat, existing_names)
:abstract_float

julia> generate_name!(Int, existing_names, only=Array, substitute=:not_array)
:not_array

julia> generate_name!(Int, existing_names, only=Array, substitute=:not_array)
:not_array2
source
MLJBase.guess_model_target_observation_scitypeMethod
guess_model_target_observation_scitype(model)

Private method

Try to infer a lowest upper bound on the scitype of target observations acceptable to model, by inspecting target_scitype(model). Return Unknown if unable to draw a reliable inference.

The observation scitype for a table is here understood as the scitype of a row converted to a vector.

source
MLJBase.guess_observation_scitypeMethod
guess_observation_scitype(y)

Private method.

If y is an AbstractArray, return the scitype of y[:, :, ..., :, 1]. If y is a table, return the scitype of the first row, converted to a vector, unless this row has missing elements, in which case return Unknown.

In all other cases, Unknown.

julia> guess_observation_scitype([missing, 1, 2, 3])
Union{Missing, Count}

julia> guess_observation_scitype(rand(3, 2))
AbstractVector{Continuous}

julia> guess_observation_scitype((x=rand(3), y=rand(Bool, 3)))
AbstractVector{Union{Continuous, Count}}

julia> guess_observation_scitype((x=[missing, 1, 2], y=[1, 2, 3]))
Unknown
source
MLJBase.init_rngMethod
init_rng(rng)

Create an AbstractRNG from rng. If rng is a non-negative Integer, return a MersenneTwister random number generator seeded with rng; if rng is an AbstractRNG object, return rng; otherwise throw an error.
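
A sketch of the behaviour described above:

```julia
using Random

MLJBase.init_rng(123)                        # a MersenneTwister seeded with 123
MLJBase.init_rng(Random.MersenneTwister(7))  # returned unchanged
MLJBase.init_rng(-1)                         # throws an error
```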

source
MLJBase.observationMethod
observation(S)

Private method.

Tries to infer the per-observation scitype from the scitype of S, when S is known to be the scitype of some container with multiple observations. Return Unknown if unable to draw a reliable inference.

The observation scitype for a table is here understood as the scitype of a row converted to a vector.

source
MLJBase.prependMethod
MLJBase.prepend(::Symbol, ::Union{Symbol,Expr,Nothing})

For prepending symbols in expressions like :(y.w) and :(x1.x2.x3).

julia> prepend(:x, :y)
:(x.y)

julia> prepend(:x, :(y.z))
:(x.y.z)

julia> prepend(:w, ans)
:(w.x.y.z)

If the second argument is nothing, then nothing is returned.

source
MLJBase.recursive_getpropertyMethod
recursive_getproperty(object, nested_name::Expr)

Call getproperty recursively on object to extract the value of some nested property, as in the following example:

julia> object = (X = (x = 1, y = 2), Y = 3);
julia> recursive_getproperty(object, :(X.y))
2
source
MLJBase.recursive_setproperty!Method
recursive_setproperty!(object, nested_name::Expr, value)

Set a nested property of an object to value, as in the following example:

julia> mutable struct Foo
           X
           Y
       end

julia> mutable struct Bar
           x
           y
       end

julia> object = Foo(Bar(1, 2), 3)
Foo(Bar(1, 2), 3)

julia> recursive_setproperty!(object, :(X.y), 42)
42

julia> object
Foo(Bar(1, 42), 3)
source
MLJBase.sequence_stringMethod
sequence_string(itr, n=3)

Return a "sequence" string from the first n elements generated by itr.

julia> MLJBase.sequence_string(1:10, 4)
"1, 2, 3, 4, ..."

Private method.

source
MLJBase.shuffle_rowsMethod
shuffle_rows(X::AbstractVecOrMat,
             Y::AbstractVecOrMat;
             rng::AbstractRNG=Random.GLOBAL_RNG)

Return X and Y with their rows shuffled, using a common random permutation. An optional random number generator can be specified using the rng argument.
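
For example, using a seeded generator for reproducibility:

```julia
using Random

X = [1 10; 2 20; 3 30]
y = [1, 2, 3]

Xs, ys = MLJBase.shuffle_rows(X, y; rng=MersenneTwister(0))
# The same permutation is applied to both, so rows of Xs stay aligned with ys.
```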

source
MLJBase.unwindMethod
unwind(iterators...)

Represent all possible combinations of values generated by iterators as rows of a matrix A. In more detail, A has one column for each iterator in iterators and one row for each distinct possible combination of values taken on by the iterators. Elements in the first column cycle fastest, those in the last column slowest.

Example

julia> iterators = ([1, 2], ["a","b"], ["x", "y", "z"]);
julia> MLJBase.unwind(iterators...)
12×3 Matrix{Any}:
 1  "a"  "x"
 2  "a"  "x"
 1  "b"  "x"
 2  "b"  "x"
 1  "a"  "y"
 2  "a"  "y"
 1  "b"  "y"
 2  "b"  "y"
 1  "a"  "z"
 2  "a"  "z"
 1  "b"  "z"
 2  "b"  "z"
source