API
Functions
MLJ.EnsembleModel — Method

EnsembleModel(atom=nothing, weights=Float64[], bagging_fraction=0.8, rng=GLOBAL_RNG, n=100, parallel=true, out_of_bag_measure=[])
Create a model for training an ensemble of n learners, with optional bagging, each with associated model atom. Ensembling is useful if fit!(machine(atom, data...)) does not create identical models on repeated calls (i.e., is a stochastic model, such as a decision tree with randomized node selection criteria), or if bagging_fraction is set to a value less than 1.0, or both. The constructor fails if no atom is specified.
If rng is an integer, then MersenneTwister(rng) is the random number generator used for bagging. Otherwise some AbstractRNG object is expected.
Predictions are weighted according to the vector weights (to allow for external optimization) except in the case that atom is a Deterministic classifier. Uniform weights are used if weights has zero length.
The ensemble model is Deterministic or Probabilistic, according to the corresponding supertype of atom. In the case of deterministic classifiers (target_scitype_union(atom) <: Union{Multiclass,FiniteOrderedFactor}), the predictions are majority votes, and for regressors (target_scitype_union(atom) <: Continuous) they are ordinary averages. Probabilistic predictions are obtained by averaging the atomic probability distribution/mass functions; in particular, for regressors, the ensemble prediction on each input pattern has the type MixtureModel{VF,VS,D} from the Distributions.jl package, where D is the type of predicted distribution for atom.
If a single measure, or a non-empty vector of measures, is specified by out_of_bag_measure, then out-of-bag estimates of performance are reported.
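For example, the following sketch (reusing the datanow() data and built-in RidgeRegressor appearing in the learning_curve! example below) builds a bagged ensemble with an out-of-bag performance estimate:

using MLJ
X, y = datanow()
atom = RidgeRegressor()
# 50 ridge regressors, each trained on a random 70% of the rows:
ensemble = EnsembleModel(atom=atom, n=50, bagging_fraction=0.7, out_of_bag_measure=rms)
mach = machine(ensemble, X, y)
fit!(mach)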
MLJ.TunedModel — Method

tuned_model = TunedModel(; model=nothing,
                         tuning=Grid(),
                         resampling=Holdout(),
                         measure=nothing,
                         operation=predict,
                         nested_ranges=NamedTuple(),
                         minimize=true,
                         full_report=true)
Construct a model wrapper for hyperparameter optimization of a supervised learner.
Calling fit!(mach) on a machine mach = machine(tuned_model, X, y) will: (i) instigate a search, over clones of model with the hyperparameter mutations specified by nested_ranges, for the model optimizing the specified measure, according to evaluations carried out using the specified tuning strategy and resampling strategy; and (ii) fit a machine, mach_optimal = mach.fitresult, wrapping the optimal model object in all the provided data X, y. Calling predict(mach, Xnew) then returns predictions on Xnew of the machine mach_optimal.
If measure is a score, rather than a loss, specify minimize=false.
The optimal clone of model is accessible as fitted_params(mach).best_model. In the case of two-parameter tuning, a Plots.jl plot of performance estimates is returned by plot(mach) or heatmap(mach).
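A minimal sketch, assuming the built-in RidgeRegressor and the range constructor documented under Base.range below:

using MLJ
X, y = datanow()
model = RidgeRegressor()
r = range(model, :lambda, lower=0.01, upper=10, scale=:log10)
tuned_model = TunedModel(model=model, tuning=Grid(), resampling=Holdout(),
                         measure=rms, nested_ranges=(lambda=r,))
mach = machine(tuned_model, X, y)
fit!(mach)                      # search the grid, then refit the best clone
fitted_params(mach).best_model  # the optimal clone of model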
MLJ.evaluate! — Method

evaluate!(mach, resampling=CV(), measure=nothing, operation=predict, verbosity=1)
Estimate the performance of a machine mach using the specified resampling strategy (defaulting to 6-fold cross-validation) and measure, which can be a single measure or a vector.
Although evaluate! is mutating, mach.model and mach.args are preserved.
Resampling and testing is based exclusively on the data in rows, when specified.
If no measure is specified, then default_measure(mach.model) is used; if this default is nothing, an error is thrown.
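A minimal sketch, borrowing the data and model from the examples below:

using MLJ
X, y = datanow()
mach = machine(RidgeRegressor(), X, y)
# six-fold cross-validated estimates of two losses:
evaluate!(mach, resampling=CV(), measure=[rms, rmsp])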
MLJ.iterator — Method

iterator(model::Model, param_iterators::NamedTuple)
Iterator over all models of type typeof(model) defined by param_iterators.
Each name in the nested :name => value pairs of param_iterators should be the name of a (possibly nested) field of model; and each element of flat_values(param_iterators) (the corresponding final values) is an iterator over values of one of those fields.
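For illustration, a sketch assuming the built-in RidgeRegressor, which has a lambda field:

model = RidgeRegressor()
# one clone of model for each value in the supplied iterator:
models = iterator(model, (lambda=[0.1, 1.0, 10.0],))
length(collect(models))   # 3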
See also: iterator, params
MLJ.learning_curve! — Method

curve = learning_curve!(mach; resolution=30, resampling=Holdout(), measure=rms, operation=predict, nested_range=nothing, n=1)
Given a supervised machine mach, returns a named tuple of objects needed to generate a plot of performance measurements, as a function of the single hyperparameter specified in nested_range. The tuple curve has the following keys: :parameter_name, :parameter_scale, :parameter_values, :measurements.
For n not equal to 1, multiple curves are computed, and the value of curve.measurements is an array, with one column for each run. This is useful in the case of models with indeterminate fit-results, such as a random forest.
using MLJ
X, y = datanow()
atom = RidgeRegressor()
ensemble = EnsembleModel(atom=atom)
mach = machine(ensemble, X, y)
r_lambda = range(atom, :lambda, lower=0.1, upper=100, scale=:log10)
curve = MLJ.learning_curve!(mach; nested_range=(atom=(lambda=r_lambda,),))
using Plots
plot(curve.parameter_values, curve.measurements, xlab=curve.parameter_name, xscale=curve.parameter_scale)
Smart fitting applies. For example, if the model is an ensemble model and the hyperparameter is n, then atomic models are progressively added to the ensemble, not recomputed from scratch for each new value of n.
atom.lambda=1.0
r_n = range(ensemble, :n, lower=2, upper=150)
curves = MLJ.learning_curve!(mach; nested_range=(n=r_n,), verbosity=3, n=5)
plot(curves.parameter_values, curves.measurements, xlab=curves.parameter_name)
MLJ.rmsp — Method

Root mean squared percentage loss.
MLJ.sources — Method

sources(N)
Return a list of all ultimate sources of a node N.
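A sketch in the context of a learning network (the RidgeRegressor and the data X, y are assumptions, as elsewhere on this page):

Xs = source(X)             # wrap the data in source nodes
ys = source(y)
mach = machine(RidgeRegressor(), Xs, ys)
yhat = predict(mach, Xs)   # a node, not eager predictions
sources(yhat)              # the ultimate sources, here Xs and ys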
See also: node, source
MLJBase.info — Method

info(model, pkg=nothing)
Return the dictionary of metadata associated with model::String. If more than one package implements model, then pkg::String will need to be specified.
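For example (the model and package names here are illustrative; any registered model will do):

julia> info("DecisionTreeClassifier", pkg="DecisionTree")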
StatsBase.fit! — Method

fit!(mach::Machine; rows=nothing, verbosity=1)
Train the machine mach using the algorithm and hyperparameters specified by mach.model, using those rows of the wrapped data having indices in rows.
fit!(mach::NodalMachine; rows=nothing, verbosity=1)
A nodal machine is trained in the same way as a regular machine, with one difference: instead of training the model on the wrapped data indexed on rows, it is trained on the wrapped nodes called on rows, with calling being a recursive operation on nodes within a learning network.
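A minimal sketch for the regular case, with data and model as in the examples above:

X, y = datanow()
mach = machine(RidgeRegressor(), X, y)
fit!(mach, rows=1:50)   # train on the first 50 rows only
fit!(mach)              # retrain, this time on all rows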
MLJ.@curve — Macro

@curve

The code

@curve var range code
evaluates code, replacing appearances of var therein with each value in range. The range and corresponding evaluations are returned as a tuple of arrays. For example,
@curve x 1:3 (x^2 + 1)
evaluates to
([1,2,3], [2, 5, 10])
This is convenient for plotting functions using, e.g., the Plots package:

plot(@curve x 1:3 (x^2 + 1))
A macro @pcurve parallelizes the same behaviour. A two-variable implementation is also available, operating as in the following example:
julia> @curve x [1,2,3] y [7,8] (x + y)
([1,2,3],[7 8],[8.0 9.0; 9.0 10.0; 10.0 11.0])
julia> ans[3]
3×2 Array{Float64,2}:
8.0 9.0
9.0 10.0
10.0 11.0
N.B. The second range is returned as a row vector for consistency with the output matrix. This is also helpful when plotting, as in:
julia> u1, u2, A = @curve x range(0, stop=1, length=100) α [1,2,3] x^α
julia> u2 = map(u2) do α "α = "*string(α) end
julia> plot(u1, A, label=u2)
which generates three superimposed plots - of the functions x, x^2 and x^3 - each labelled with the exponent α = 1, 2, 3 in the legend.
MLJ.SimpleDeterministicCompositeModel — Type

SimpleDeterministicCompositeModel(; regressor=ConstantRegressor(),
                                  transformer=FeatureSelector())

Construct a composite model consisting of a transformer (Unsupervised model) followed by a Deterministic model. Mainly intended for internal testing.
Base.copy — Function

copy(params::NamedTuple, values=nothing)
Return a copy of params with new values. That is, flat_values(copy(params, values)) == values is true, while the nested keys remain unchanged.

If values is not specified, a deep copy is returned.
Base.merge! — Method

merge!(tape1, tape2)
Incrementally appends to tape1 all elements in tape2, excluding any element previously added (or any element of tape1 in its initial state).
Base.range — Method

r = range(model, :hyper; values=nothing)
Defines a NominalRange object for a field hyper of model. Note that r is not directly iterable, but iterator(r) iterates over values.
r = range(model, :hyper; upper=nothing, lower=nothing, scale=:linear)
Defines a NumericRange object for a field hyper of model. Note that r is not directly iterable, but iterator(r, n) iterates over n values between lower and upper, according to the specified scale. The supported scales are :linear, :log, :log10, :log2. Values for Integer types are rounded (with duplicate values removed, resulting in possibly fewer than n values).
Alternatively, if a function f is provided as scale, then iterator(r, n) iterates over the values [f(x1), f(x2), ..., f(xn)], where x1, x2, ..., xn are linearly spaced between lower and upper.
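For example, assuming the built-in RidgeRegressor with its lambda field:

model = RidgeRegressor()
r = range(model, :lambda, lower=0.001, upper=1.0, scale=:log10)
iterator(r, 3)   # three log-spaced values, approximately [0.001, 0.032, 1.0]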
See also: iterator
MLJ.flat_keys — Method

flat_keys(params::NamedTuple)
Use dot-concatenation to express each possibly nested key of params in string form.
Example
julia> flat_keys((A=(x=2, y=3), B=9))
["A.x", "A.y", "B"]
MLJ.get_type — Method

get_type(T, field::Symbol)
Returns the type of the field field of the DataType T. Not a type-stable function.
MLJ.scale — Method

MLJ.scale(r::ParamRange)
Return the scale associated with the ParamRange object r. The possible return values are :none (for a NominalRange), :linear, :log, :log10, :log2, or :custom (if r.scale is a function).
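For instance, continuing the sketch under Base.range above:

r = range(model, :lambda, lower=0.001, upper=1.0, scale=:log10)
MLJ.scale(r)   # :log10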
MLJ.unwind — Method

unwind(iterators...)
Represent all possible combinations of values generated by iterators as rows of a matrix A. In more detail, A has one column for each iterator in iterators and one row for each distinct possible combination of values taken on by the iterators. Elements in the first column cycle fastest, those in the last column slowest.
Example
julia> iterators = ([1, 2], ["a","b"], ["x", "y", "z"]);
julia> MLJ.unwind(iterators...)
12×3 Array{Any,2}:
1 "a" "x"
2 "a" "x"
1 "b" "x"
2 "b" "x"
1 "a" "y"
2 "a" "y"
1 "b" "y"
2 "b" "y"
1 "a" "z"
2 "a" "z"
1 "b" "z"
2 "b" "z"
MLJBase.StratifiedKFold — Type

StratifiedKFold(strata, k)

Struct for stratified k-fold cross-validation: provide the strata and the number of partitions k, then collect the object to obtain the indices. Taken from MLBase (https://github.com/JuliaStats/MLBase.jl).
MLJBase.SupervisedTask — Method

task = SupervisedTask(data=nothing, is_probabilistic=false, target=nothing, ignore=Symbol[], verbosity=1)
Construct a supervised learning task with input features X and target y, where: y is the column vector from data named target, if this is a single symbol, or a vector of tuples, if target is a vector; X consists of all remaining columns of data not named in ignore, and is a table unless it has only one column, in which case it is a vector.
task = SupervisedTask(X, y; is_probabilistic=false, input_is_multivariate=true, verbosity=1)
A more customizable constructor, this returns a supervised learning task with input features X and target y, where: X must be a table or vector, according to whether it is multivariate or univariate, while y must be a vector whose elements are scalars, or tuples of scalars (of constant length for ordinary multivariate predictions, and of variable length for sequence prediction). Table rows must correspond to patterns and columns to features.
X, y = task()
Returns the input X and target y of the task, also available as task.X and task.y.
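A sketch using a small named-tuple table (the column names are hypothetical):

data = (income=[50.0, 90.0, 60.0], age=[25, 40, 31], credit=[:good, :bad, :good])
task = SupervisedTask(data=data, target=:credit, ignore=[:age])
X, y = task()   # only :income remains, so X is a vector; y = data.credit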
MLJBase.UnivariateNominal — Type

UnivariateNominal(prob_given_level)
A discrete univariate distribution whose finite support is the set of keys of the provided dictionary, prob_given_level. The dictionary values specify the corresponding probabilities, which must be nonnegative and sum to one.
UnivariateNominal(levels, p)
A discrete univariate distribution whose finite support is the elements of the vector levels, and whose corresponding probabilities are the elements of the vector p.
levels(d::UnivariateNominal)
Return the levels of d.
d = UnivariateNominal(["yes", "no", "maybe"], [0.1, 0.2, 0.7])
pdf(d, "no") # 0.2
mode(d) # "maybe"
rand(d, 5) # ["maybe", "no", "maybe", "maybe", "no"]
d = fit(UnivariateNominal, ["maybe", "no", "maybe", "yes"])
pdf(d, "maybe") ≈ 0.5 # true
levels(d) # ["yes", "no", "maybe"]
If v is a CategoricalVector, then fit(UnivariateNominal, v) includes all levels in the pool of v in its support, assigning unseen levels probability zero.
MLJBase.UnsupervisedTask — Method

task = UnsupervisedTask(data=nothing, ignore=Symbol[], verbosity=1)
Construct an unsupervised learning task with given input data, which should be a table or, in the case of univariate inputs, a single vector. Rows of data must correspond to patterns and columns to features. Columns in data whose names appear in ignore are ignored.
X = task()

Return the input data in a form to be used in models.
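For example, again with a hypothetical named-tuple table:

data = (height=[1.7, 1.8, 1.6], weight=[60.0, 80.0, 55.0], id=[1, 2, 3])
task = UnsupervisedTask(data=data, ignore=[:id])
X = task()   # table with columns :height and :weight only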
MLJBase.container_type — Method

container_type(X)
Return :table, :sparse, or :other, according to whether X is a supported table format, a supported sparse table format, or something else.
The first two formats, together with abstract vectors, support the MLJBase accessor methods selectrows, selectcols, select, nrows, schema, and union_scitypes.
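For instance, a named tuple of vectors is one supported table format:

julia> container_type((x=[1, 2, 3], y=["a", "b", "c"]))
:table
julia> container_type("not a table")
:other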
MLJBase.datanow — Method

Get some supervised data now!!
MLJBase.fitresult_type — Method

MLJBase.fitresult_type(m)

Returns the fitresult type of any supervised model (or model type) m, as declared in the model mutable struct declaration.
MLJBase.load_ames — Method

Load the full version of the well-known Ames Housing task.

MLJBase.load_boston — Method

Load a well-known public regression dataset with nominal features.

MLJBase.load_crabs — Method

Load a well-known crab classification dataset with nominal features.

MLJBase.load_iris — Method

Load a well-known public classification task with nominal features.

MLJBase.load_reduced_ames — Method

Load a reduced version of the well-known Ames Housing task, having six numerical and six categorical features.
MLJBase.matrix — Method

MLJBase.matrix(X)
Convert a table source X into a Matrix; or, if X is an AbstractMatrix, return X. Optimized for column-based sources.
If instead X is a sparse table, then a SparseMatrixCSC object is returned. The integer relabelling of column names follows the lexicographic ordering (as indicated by schema(X).names).
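For example, with a named-tuple table:

julia> MLJBase.matrix((x=[1, 2, 3], y=[4, 5, 6]))
3×2 Array{Int64,2}:
 1  4
 2  5
 3  6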
MLJBase.nrows — Method

nrows(X)
Return the number of rows in a table, sparse table, or abstract vector.
MLJBase.params — Method

params(m)
Recursively convert any object of subtype MLJType into a named tuple, keyed on the fields of m. The named tuple is possibly nested because params is recursively applied to the field values, which themselves might be MLJType objects.
Used, in particular, in the case that m is a model, to inspect its nested hyperparameters:
julia> params(EnsembleModel(atom=ConstantClassifier()))
(atom = (target_type = Bool,),
weights = Float64[],
bagging_fraction = 0.8,
rng_seed = 0,
n = 100,
parallel = true,)
MLJBase.partition — Method

partition(rows::AbstractVector{Int}, fractions...; shuffle=false)
Splits the vector rows into a tuple of vectors whose lengths are given by the corresponding fractions of length(rows). The last fraction is not provided, as it is inferred from the preceding ones. So, for example,
julia> partition(1:1000, 0.2, 0.7)
(1:200, 201:900, 901:1000)
MLJBase.schema — Method

schema(X)
Returns a struct with properties names and types, with the obvious meanings. Here X is any table or sparse table.
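For example (the output shapes shown are indicative):

julia> s = schema((x=[1, 2], y=["a", "b"]))
julia> s.names
(:x, :y)
julia> s.types
(Int64, String)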
MLJBase.scitype — Method

scitype(x)
Return the scientific type for scalar values that object x can represent. If x is a tuple, then Tuple{scitype.(x)...} is returned.
julia> scitype(4.5)
Continuous
julia> scitype("book")
Unknown
julia> scitype((1, 4.5))
Tuple{Count,Continuous}
julia> using CategoricalArrays
julia> v = categorical([:m, :f, :f])
julia> scitype(v[1])
Multiclass{2}
MLJBase.scitype_union — Method

scitype_union(A)
Return the type union, over all elements x generated by the iterable A, of scitype(x).
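For example (note the Any element type in the second call, which prevents promotion of the integer to a float):

julia> scitype_union([4.5, 2.0])
Continuous
julia> scitype_union(Any[1, 4.5])
Union{Count, Continuous}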
MLJBase.scitypes — Method

scitypes(X)
Returns a named tuple keyed on the column names of the table X, with values the corresponding scitype unions over a column's entries.
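For instance (indicative output):

julia> scitypes((age=[25, 40], height=[1.7, 1.8]))
(age = Count, height = Continuous)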
MLJBase.select — Method

select(X, r, c)
Select the element of a table or sparse table at row r and column c. In the case of sparse data with no entry for the key (r, c), zero or missing is returned, depending on the value type.
See also: selectrows, selectcols
MLJBase.selectcols — Method

selectcols(X, c)
Select single or multiple columns from any table or sparse table X. If c is an abstract vector of integers or symbols, then the object returned is a table of the preferred sink type of typeof(X). If c is a single integer or symbol, then a Vector or CategoricalVector is returned.
MLJBase.selectrows — Method

selectrows(X, r)
Select single or multiple rows from any table, sparse table, or abstract vector X. If X is tabular, the object returned is a table of the preferred sink type of typeof(X), even if a single row is selected.
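The three accessors together, on a named-tuple table:

X = (x=[1, 2, 3], y=["a", "b", "c"])
selectrows(X, 1:2)   # table containing the first two rows
selectcols(X, :y)    # the vector ["a", "b", "c"]
select(X, 2, :y)     # "b"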
MLJBase.table — Method

MLJBase.table(cols; prototype=cols)
Convert a named tuple of vectors cols into a table. The table type returned is the "preferred sink type" for prototype (see the Tables.jl documentation).
MLJBase.table(X::AbstractMatrix; names=nothing, prototype=nothing)
Convert an abstract matrix X into a table with names (a tuple of symbols) as column names, or with labels (:x1, :x2, ..., :xn), where n = size(X, 2), if names is not specified. If prototype=nothing, then a named tuple of vectors is returned.
Equivalent to table(cols, prototype=prototype), where cols is the named tuple of columns of X, with keys(cols) = names.
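A round trip between the two representations:

cols = (x=[1.0, 2.0], y=[3.0, 4.0])
A = MLJBase.matrix(MLJBase.table(cols))   # 2×2 matrix of the column data
t = MLJBase.table(A, names=(:x, :y))      # back to a table with the original names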
MLJBase.@constant — Macro

@constant x = value
Equivalent to const x = value but registers the binding thus:

MLJBase.HANDLE_GIVEN_ID[objectid(value)] = :x
Registered objects get displayed using the variable name to which they were bound in calls to show(x), etc.
WARNING: As with any const declaration, binding x to a new value of the same type is not prevented, and the registration will not be updated.
MLJBase.@more — Macro

@more

Entered at the REPL, equivalent to show(ans, 100). Use to get a recursive description of all fields of the last REPL value.
MLJBase._cummulative — Method

_cummulative(d::UnivariateNominal)
Return the cumulative probability vector [0, ..., 1] for the distribution d, using whatever ordering is used in the dictionary d.prob_given_level. Used only to implement random sampling from d.
MLJBase._rand — Method

_rand(p_cummulative)
Randomly sample the distribution with discrete support 1:n which has cumulative probability vector p_cummulative = [0, ..., 1] (of length n+1). Does not check the first and last elements of p_cummulative, but does not use them either.
MLJBase._recursive_show — Method

_recursive_show(stream, object, current_depth, depth)
Generate a table of the field values of the MLJType object, displaying each value by calling the method _show on it. The behaviour of _show(stream, f) is as follows:
- If f is itself an MLJType object, then its short form is shown, and _recursive_show generates a separate table for each of its field values (and so on, up to a depth of argument depth).
- Otherwise f is displayed as "(omitted T)", where T = typeof(f), unless istoobig(f) is false (the istoobig fall-back for arbitrary types being true). In the latter case, the long (i.e., MIME"plain/text") form of f is shown. To override this behaviour, overload the _show method for the type in question.
MLJBase.abbreviated — Method

To display abbreviated versions of integers.
MLJBase.handle — Method

Return the abbreviated object id (as a string), or its registered handle (as a string) if this exists.
Index
MLJ.SimpleDeterministicCompositeModel
MLJ.Transformers.FeatureSelector
MLJ.Transformers.OneHotEncoder
MLJ.Transformers.Standardizer
MLJ.Transformers.UnivariateBoxCoxTransformer
MLJ.Transformers.UnivariateStandardizer
MLJ.node
MLJBase.CategoricalDecoder
MLJBase.StratifiedKFold
MLJBase.SupervisedTask
MLJBase.SupervisedTask
MLJBase.UnivariateNominal
MLJBase.UnsupervisedTask
MLJBase.UnsupervisedTask
Base.copy
Base.merge!
Base.range
MLJ.EnsembleModel
MLJ.TunedModel
MLJ.evaluate!
MLJ.flat_keys
MLJ.get_type
MLJ.iterator
MLJ.learning_curve!
MLJ.localmodels
MLJ.models
MLJ.rmsp
MLJ.scale
MLJ.source
MLJ.sources
MLJ.sources
MLJ.unwind
MLJBase._cummulative
MLJBase._rand
MLJBase._recursive_show
MLJBase.abbreviated
MLJBase.container_type
MLJBase.datanow
MLJBase.fitresult_type
MLJBase.handle
MLJBase.info
MLJBase.load_ames
MLJBase.load_boston
MLJBase.load_crabs
MLJBase.load_iris
MLJBase.load_reduced_ames
MLJBase.matrix
MLJBase.matrix
MLJBase.nrows
MLJBase.nrows
MLJBase.params
MLJBase.partition
MLJBase.schema
MLJBase.schema
MLJBase.scitype
MLJBase.scitype
MLJBase.scitype_union
MLJBase.scitype_union
MLJBase.scitypes
MLJBase.scitypes
MLJBase.select
MLJBase.select
MLJBase.selectcols
MLJBase.selectcols
MLJBase.selectrows
MLJBase.selectrows
MLJBase.table
MLJBase.table
StatsBase.fit!
StatsBase.fit!
MLJ.@curve
MLJBase.@constant
MLJBase.@more