API

Functions

MLJ.EnsembleModel — Method.

EnsembleModel(atom=nothing, 
              weights=Float64[],
              bagging_fraction=0.8,
              rng=GLOBAL_RNG, n=100,
              parallel=true,
              out_of_bag_measure=[])

Create a model for training an ensemble of n learners, with optional bagging, each with associated model atom. Ensembling is useful if fit!(machine(atom, data...)) does not create identical models on repeated calls (i.e., atom is a stochastic model, such as a decision tree with randomized node selection criteria), if bagging_fraction is set to a value less than 1.0, or both. The constructor fails if no atom is specified.

If rng is an integer, then MersenneTwister(rng) is the random number generator used for bagging. Otherwise, rng should be an AbstractRNG object.
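
The bagging step can be pictured with the following minimal Julia sketch. It assumes that each bag consists of round(Int, bagging_fraction * n) distinct rows drawn without replacement (an assumption this docstring does not confirm), with the remaining rows treated as out-of-bag. bag_rows is a hypothetical helper, not part of MLJ.

```julia
using Random

# Hypothetical helper (not part of MLJ): choose the rows for one bag and the
# complementary out-of-bag rows. Sampling WITHOUT replacement is an assumption.
function bag_rows(n_rows::Integer, bagging_fraction::Real, rng)
    # Documented convention: an integer rng seeds a MersenneTwister.
    if rng isa Integer
        rng = MersenneTwister(rng)
    end
    n_bag = round(Int, bagging_fraction * n_rows)
    # Take a prefix of a random permutation, then sort for readability.
    in_bag = sort(randperm(rng, n_rows)[1:n_bag])
    out_of_bag = setdiff(1:n_rows, in_bag)
    return in_bag, out_of_bag
end

in_bag, oob = bag_rows(10, 0.8, 123)   # 8 in-bag rows, 2 out-of-bag rows
```

Passing the same integer seed reproduces the same bag, which is the point of accepting an integer rng.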

Predictions are weighted according to the vector weights (to allow for external optimization), except when atom is a Deterministic classifier. Uniform weights are used if weights has zero length.

The ensemble model is Deterministic or Probabilistic, according to the corresponding supertype of atom. In the case of deterministic classifiers (target_scitype_union(atom) <: Finite), the predictions are majority votes, and for regressors (target_scitype_union(atom) <: Continuous) they are ordinary averages. Probabilistic predictions are obtained by averaging the atomic probability distribution/mass functions; in particular, for regressors, the ensemble prediction on each input pattern has the type MixtureModel{VF,VS,D} from the Distributions.jl package, where D is the type of predicted distribution for atom.
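
As a rough illustration of the deterministic aggregation rules above, the sketch below combines the atomic predictions for a single input pattern. The helper names majority_vote and weighted_average are hypothetical, and normalizing by the sum of the weights is an assumption; probabilistic atoms, whose distributions are averaged into a mixture, are not covered.

```julia
# Hypothetical helpers (not MLJ's implementation) showing how the atomic
# predictions for ONE input pattern might be combined.

# Deterministic classifier: majority vote; weights are ignored, as documented.
# Ties are broken arbitrarily.
function majority_vote(labels::AbstractVector)
    counts = Dict{eltype(labels),Int}()
    for label in labels
        counts[label] = get(counts, label, 0) + 1
    end
    best = first(labels)
    for (label, c) in counts
        c > counts[best] && (best = label)
    end
    return best
end

# Deterministic regressor: weighted average; empty weights mean uniform weights.
# Normalizing by sum(w) is an assumption, so the weights need not sum to 1.
function weighted_average(preds::AbstractVector{<:Real},
                          weights::AbstractVector{<:Real}=Float64[])
    w = isempty(weights) ? ones(length(preds)) : weights
    return sum(w .* preds) / sum(w)
end

majority_vote(["yes", "no", "yes"])          # "yes"
weighted_average([1.0, 2.0, 3.0])            # 2.0
weighted_average([1.0, 3.0], [3.0, 1.0])     # 1.5
```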

If a single measure or a non-empty vector of measures is specified by out_of_bag_measure, then out-of-bag estimates of performance are reported.

MLJ.TunedModel — Method.

tuned_model = TunedModel(; model=nothing,
                         tuning=Grid(),
                         resampling=Holdout(),
                         measure=nothing,
                         operation=predict,
                         nested_ranges=NamedTuple(),
                         minimize=true,
                         full_report=true)

Construct a model wrapper for hyperparameter optimization of a supervised learner.

Calling fit!(mach) on a machine mach=machine(tuned_model, X, y) or mach=machine(tuned_model, task) will: (i) instigate a search, over clones of model with the hyperparameter mutations specified by nested_ranges, for the model optimizing the specified measure, using evaluations carried out according to the specified tuning and resampling strategies; and (ii) fit a machine, mach_optimal = fitted_params(mach).best_model, wrapping the optimal model object in all the provided data X, y (or in task). Calling predict(mach, Xnew) then returns the predictions of mach_optimal on Xnew.

If measure is a score, rather than a loss, specify minimize=false.

In the case of two-parameter tuning, a Plots.jl plot of performance estimates is returned by plot(mach) or heatmap(mach).
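
The search in step (i) can be sketched, under simplifying assumptions, as an exhaustive grid search. In the hypothetical function grid_search below, evaluate_params stands in for the resampled estimate of the measure for one hyperparameter setting; this is not MLJ's TunedModel implementation, only an illustration of how minimize changes which setting wins.

```julia
# Hypothetical sketch of exhaustive grid search (not MLJ's TunedModel).
# `evaluate_params` stands in for "estimate the measure for this setting
# using the resampling strategy"; here it is just a toy function.
function grid_search(evaluate_params, grid; minimize::Bool=true)
    best_params = nothing
    best_value = minimize ? Inf : -Inf
    for params in grid
        value = evaluate_params(params)
        if minimize ? value < best_value : value > best_value
            best_params, best_value = params, value
        end
    end
    return best_params, best_value
end

grid = [(lambda = l,) for l in 0.0:0.1:1.0]

# A loss is minimized (minimize=true, the default) ...
best, loss = grid_search(p -> (p.lambda - 0.3)^2, grid)

# ... whereas a score is maximized, so pass minimize=false.
best_score, score = grid_search(p -> -(p.lambda - 0.3)^2, grid; minimize=false)
```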

MLJ.coerce — Method.

coerce(d::Dict, X)

Return a copy of the table X with columns named in the keys of d coerced to have scitype_union equal to the corresponding value.

MLJ.coerce — Method.

coerce(T, v::AbstractVector)

Coerce the machine types of elements of v to ensure the returned vector has T as its scitype_union, or Union{Missing,T}, if v has missing values.

julia> v = coerce(Continuous, [1, missing, 5])
3-element Array{Union{Missing, Float64},1}:
 1.0     
 missing
 5.0  

julia> scitype_union(v)
Union{Missing,Continuous}

See also scitype, scitype_union, scitypes

MLJ.evaluate! — Method.

evaluate!(mach, resampling=CV(), measure=nothing, operation=predict, force=false, verbosity=1)

Estimate the performance of a machine mach using the specified resampling strategy (defaulting to 6-fold cross-validation) and measure, which can be a single measure or vector.

Although evaluate! is mutating, mach.model and mach.args are preserved.

Resampling and testing are based exclusively on the data in rows, when rows is specified.

If no measure is specified, then default_measure(mach.model) is used; if this default is nothing, an error is thrown.
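
For intuition about the default CV() strategy, here is a minimal k-fold cross-validation sketch in plain Julia. It assumes contiguous, unshuffled folds; kfold_pairs, cross_validate, and the toy model and measure are hypothetical and are not MLJ functions.

```julia
using Statistics: mean

# Hypothetical sketch of k-fold cross-validation (not MLJ's CV): rows are split
# into nfolds contiguous folds; each fold is the test set exactly once.
function kfold_pairs(rows::AbstractVector{<:Integer}, nfolds::Integer)
    n = length(rows)
    bounds = [round(Int, k * n / nfolds) for k in 0:nfolds]
    folds = [rows[bounds[k]+1:bounds[k+1]] for k in 1:nfolds]
    return [(setdiff(rows, test), test) for test in folds]
end

# fit_predict(Xtrain, ytrain, Xtest) -> predictions; measure(ŷ, y) -> number.
function cross_validate(fit_predict, measure, X, y; nfolds::Integer=6)
    scores = Float64[]
    for (train, test) in kfold_pairs(collect(eachindex(y)), nfolds)
        ŷ = fit_predict(X[train], y[train], X[test])
        push!(scores, measure(ŷ, y[test]))
    end
    return mean(scores), scores
end

# Toy example: a constant predictor scored by root-mean-square error.
X = collect(1.0:12.0)
y = 2 .* X
constant_model(Xtr, ytr, Xte) = fill(mean(ytr), length(Xte))
rms_error(ŷ, y) = sqrt(mean((ŷ .- y) .^ 2))
avg, per_fold = cross_validate(constant_model, rms_error, X, y)
```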

MLJ.iterator — Method.

iterator(model::Model, param_iterators::NamedTuple)

Iterator over all models of type typeof(model) defined by param_iterators.

Each name in the nested :name => value pairs of param_iterators should be the name of a (possibly nested) field of model; and each element of flat_values(param_iterators) (the corresponding final values) is an iterator over values of one of those fields.

See also: origins, source
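
The idea behind iterator, enumerating every combination of the supplied value iterators, can be sketched for the flat (non-nested) case as follows. param_grid is a hypothetical name, the output is plain NamedTuples rather than model instances, and nested fields are deliberately not handled.

```julia
# Hypothetical sketch (not MLJ's iterator): for a FLAT NamedTuple mapping
# hyperparameter names to iterators of values, enumerate every combination.
# The nested-field case handled by iterator is not covered here.
function param_grid(param_iterators::NamedTuple)
    names = keys(param_iterators)
    return (NamedTuple{names}(combo)
            for combo in Iterators.product(values(param_iterators)...))
end

grid = vec(collect(param_grid((max_depth = 1:3, criterion = [:gini, :entropy]))))
# 6 combinations, starting with (max_depth = 1, criterion = :gini)
```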

MLJ.supervised — Method.

task = supervised(data=nothing, 
                  types=Dict(), 
                  target=nothing,  
                  ignore=Symbol[], 
                  is_probabilistic=false, 
                  verbosity=1)

Construct a supervised learning task with input features X and target y, where: y is the column vector from data named target, if this is a single symbol, or a vector of tuples, if target is a vector; X consists of all remaining columns of data not named in ignore, and is a table unless it has only one column, in which case it is a vector.

The data types of elements in a column of data named as a key of the dictionary types are coerced to have a scientific type given by the corresponding value. Possible values are Continuous, Multiclass, OrderedFactor and Count. So, for example, types=Dict(:x1=>Count) means elements of the column of data named :x1 will be coerced into integers (whose scitypes are always Count).
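
A simplified sketch of how data might be split into X and y, assuming data is a column table stored as a NamedTuple of vectors and target is a single symbol. split_target is hypothetical; type coercion via types and vector-valued targets are not covered.

```julia
# Hypothetical sketch (not MLJ's supervised): split a column table stored as a
# NamedTuple of equal-length vectors into inputs X and a single target y.
# Multi-column targets and type coercion are not covered.
function split_target(data::NamedTuple, target::Symbol; ignore::Vector{Symbol}=Symbol[])
    y = data[target]
    input_names = Tuple(n for n in keys(data) if n != target && !(n in ignore))
    X = NamedTuple{input_names}(Tuple(data[n] for n in input_names))
    # As documented, a single remaining column is returned as a vector.
    return (length(input_names) == 1 ? X[1] : X), y
end

data = (x1 = [1, 2, 3], x2 = [0.5, 0.1, 0.9], id = [10, 11, 12], price = [3.0, 4.5, 6.1])
X, y = split_target(data, :price; ignore = [:id])   # X has columns :x1 and :x2
```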

task = supervised(X, y; 
                  input_is_multivariate=true, 
                  is_probabilistic=false, 
                  verbosity=1)

A more customizable constructor, this returns a supervised learning task with input features X and target y, where: X must be a table or vector, according to whether it is multivariate or univariate, while y must be a vector whose elements are scalars, or tuples of scalars (of constant length for ordinary multivariate predictions, and of variable length for sequence prediction). Table rows must correspond to patterns and columns to features. Type coercion is not available for this constructor (but see also coerce).

X, y = task()

Returns the input X and target y of the task, also available as task.X and task.y.

MLJ.unsupervised — Method.

task = unsupervised(data=nothing, types=Dict(), ignore=Symbol[], verbosity=1)

Construct an unsupervised learning task with given input data, which should be a table or, in the case of univariate inputs, a single vector.

Rows of data must correspond to patterns and columns to features. Columns in data whose names appear in ignore are ignored.

X = task()

Return the input data in a form suitable for use by models.

MLJBase.info — Method.

info(model, pkg=nothing)

Return the dictionary of metadata associated with model::String. If more than one package implements model, then pkg::String must be specified.

MLJ.@curve — Macro.

@curve

The code,

@curve var range code

evaluates code, replacing appearances of var therein with each value in range. The range and corresponding evaluations are returned as a tuple of arrays. For example,

@curve x 1:3 (x^2 + 1)

evaluates to

([1,2,3], [2, 5, 10])

This is convenient for plotting functions using, e.g., the Plots package:

plot(@curve x 1:3 (x^2 + 1))

A macro @pcurve parallelizes the same behaviour. A two-variable implementation is also available, operating as in the following example:

julia> @curve x [1,2,3] y [7,8] (x + y)
([1,2,3],[7 8],[8.0 9.0; 9.0 10.0; 10.0 11.0])

julia> ans[3]
3×2 Array{Float64,2}:
  8.0   9.0
  9.0  10.0
 10.0  11.0

N.B. The second range is returned as a row vector for consistency with the output matrix. This is also helpful when plotting, as in:

julia> u1, u2, A = @curve x range(0, stop=1, length=100) α [1,2,3] x^α
julia> u2 = map(u2) do α; "α = " * string(α); end
julia> plot(u1, A, label=u2)

which generates three superimposed plots (of the functions x, x^2 and x^3), each labeled with its exponent α = 1, 2, 3 in the legend.
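
If a plain function is more convenient than a macro, the behaviour described above can be approximated as below. curve is a hypothetical function, not part of MLJ; unlike @curve, it takes the expression as an anonymous function of the range variable(s), and it does not parallelize like @pcurve.

```julia
# Hypothetical function analogues of @curve (not part of MLJ): the code is
# passed as a function of the range variable(s) rather than as an expression.
curve(f, r) = (collect(r), [f(x) for x in r])

function curve(f, r1, r2)
    # Evaluate f on the grid r1 × r2; rows follow r1, columns follow r2.
    A = [float(f(x, y)) for x in r1, y in r2]
    # Return the second range as a row vector, matching the macro's output.
    return collect(r1), permutedims(collect(r2)), A
end

curve(x -> x^2 + 1, 1:3)              # ([1, 2, 3], [2, 5, 10])
u1, u2, A = curve((x, y) -> x + y, [1, 2, 3], [7, 8])
# A == [8.0 9.0; 9.0 10.0; 10.0 11.0] and u2 == [7 8]
```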

MLJ.@from_network — Macro.

@from_network NewCompositeModel(fld1=model1, fld2=model2, ...) <= (Xs, N)
@from_network NewCompositeModel(fld1=model1, fld2=model2, ...) <= (Xs, ys, N)

Create, respectively, a new stand-alone unsupervised or supervised model type NewCompositeModel using a learning network as a blueprint. Here Xs, ys and N refer to the input source node, the target source node and the terminating node of the network, respectively. The model type NewCompositeModel is equipped with fields named :fld1, :fld2, ..., which correspond to the component models model1, model2, ... appearing in the network (which must therefore be elements of models(N)). Deep copies of the specified component models are used as default values in an automatically generated keyword constructor for NewCompositeModel.

Return value: A new NewCompositeModel instance, with default field values.

For details and examples refer to the "Learning Networks" section of the documentation.

MLJ.SimpleDeterministicCompositeModel — Type.

SimpleDeterministicCompositeModel(; regressor=ConstantRegressor(),
                                    transformer=FeatureSelector())

Construct a composite model consisting of a transformer (Unsupervised model) followed by a Deterministic model. Mainly intended for internal testing.

Base.copy — Function.

copy(params::NamedTuple, values=nothing)

Return a copy of params with new values. That is, flat_values(copy(params, values)) == values is true, while the nested keys remain unchanged.

If values is not specified, a deep copy is returned.
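
The contract flat_values(copy(params, values)) == values can be illustrated with the following sketch for nested NamedTuples. flat_values here is a local reimplementation for illustration only, and copy_params is a hypothetical stand-in for Base.copy(params, values); neither is MLJ's code.

```julia
# Hypothetical sketch of the documented contract for nested NamedTuples
# (not MLJ's code): flat_values lists leaf values in order, and copy_params
# swaps in new leaf values while keeping the nested keys unchanged.
flat_values(x) = (x,)
flat_values(nt::NamedTuple) =
    foldl((acc, v) -> (acc..., flat_values(v)...), values(nt); init = ())

# Rebuild x from vals starting at index i; returns (rebuilt, next index).
_rebuild(x, vals, i) = (vals[i], i + 1)
function _rebuild(nt::NamedTuple, vals, i)
    parts = Any[]
    for v in values(nt)
        part, i = _rebuild(v, vals, i)
        push!(parts, part)
    end
    return NamedTuple{keys(nt)}(Tuple(parts)), i
end

function copy_params(params::NamedTuple, vals = nothing)
    # No values given: return a deep copy, as documented for copy.
    vals === nothing && return deepcopy(params)
    rebuilt, next = _rebuild(params, vals, 1)
    next == length(vals) + 1 || throw(ArgumentError("expected $(next - 1) values"))
    return rebuilt
end

params = (a = 1, b = (c = 2.0, d = :x))
new_params = copy_params(params, (5, 3.5, :y))   # (a = 5, b = (c = 3.5, d = :y))
```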

Base.range — Method.

r = range(model, :hyper; values=nothing)

Defines a NominalRange object for a field hyper of model, assuming the field is not a subtype of Real. Note that r is not directly iterable, but iterator(r) iterates over its values.

r = range(model, :hyper; upper=nothing, lower=nothing, scale=:linear)

Defines a NumericRange object for a Real field hyper of model. Note that r is not directly iterable, but iterator(r, n) iterates over n values between lower and upper, according to the specified scale. The supported scales are :linear, :log, :log10 and :log2. Values for Integer types are rounded, with duplicate values removed, possibly resulting in fewer than n values.

Alternatively, if a function f is provided as scale, then iterator(r, n) iterates over the values [f(x1), f(x2), ... , f(xn)], where x1, x2, ..., xn are linearly spaced between lower and upper.
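
The scale options above can be illustrated with a minimal sketch of the values iterator(r, n) is described as producing, assuming points are evenly spaced on the transformed (for example, log10) axis and then mapped back. scaled_values and integer_values are hypothetical helpers, and MLJ's exact spacing may differ.

```julia
# Hypothetical sketch of the documented NumericRange scales (not MLJ's code).
function scaled_values(lower::Real, upper::Real, n::Integer, scale)
    if scale isa Function
        # Custom scale: apply f to n linearly spaced points.
        return map(scale, range(lower, stop = upper, length = n))
    end
    if scale == :linear
        fwd, back = identity, identity
    elseif scale == :log
        fwd, back = log, exp
    elseif scale == :log10
        fwd, back = log10, x -> 10.0^x
    elseif scale == :log2
        fwd, back = log2, x -> 2.0^x
    else
        throw(ArgumentError("unsupported scale: $scale"))
    end
    # Space points evenly on the transformed axis, then map them back.
    return back.(range(fwd(lower), stop = fwd(upper), length = n))
end

# Integer hyperparameters: round and drop duplicates (possibly fewer than n).
integer_values(lower, upper, n, scale) =
    unique(round.(Int, scaled_values(lower, upper, n, scale)))

scaled_values(1, 100, 3, :log10)       # ≈ [1.0, 10.0, 100.0]
integer_values(1, 10, 10, :log10)      # [1, 2, 3, 4, 5, 6, 8, 10]
```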

Index

MLJ.SimpleDeterministicCompositeModel
MLJ.Transformers.FeatureSelector
MLJ.Transformers.OneHotEncoder
MLJ.Transformers.Standardizer
MLJ.Transformers.UnivariateBoxCoxTransformer
MLJ.Transformers.UnivariateStandardizer
MLJ.node
MLJBase.StratifiedKFold
MLJBase.UnivariateFinite
Base.copy
Base.range
Base.replace
MLJ.EnsembleModel
MLJ.TunedModel
MLJ.coerce
MLJ.coerce
MLJ.coerce
MLJ.evaluate!
MLJ.flat_keys
MLJ.get_type
MLJ.iterator
MLJ.learning_curve!
MLJ.localmodels
MLJ.machines
MLJ.models
MLJ.models
MLJ.origins
MLJ.rebind!
MLJ.reset!
MLJ.rmsp
MLJ.scale
MLJ.set_params!
MLJ.source
MLJ.sources
MLJ.sources
MLJ.supervised
MLJ.supervised
MLJ.tree
MLJ.unsupervised
MLJ.unsupervised
MLJ.unwind
MLJBase._cummulative
MLJBase._rand
MLJBase._recursive_show
MLJBase.abbreviated
MLJBase.classes
MLJBase.classes
MLJBase.color_off
MLJBase.color_on
MLJBase.container_type
MLJBase.datanow
MLJBase.decoder
MLJBase.decoder
MLJBase.handle
MLJBase.info
MLJBase.int
MLJBase.int
MLJBase.load_ames
MLJBase.load_boston
MLJBase.load_crabs
MLJBase.load_iris
MLJBase.load_reduced_ames
MLJBase.matrix
MLJBase.matrix
MLJBase.nrows
MLJBase.nrows
MLJBase.params
MLJBase.partition
MLJBase.schema
MLJBase.schema
MLJBase.scitype
MLJBase.scitype
MLJBase.scitype_union
MLJBase.scitype_union
MLJBase.scitypes
MLJBase.scitypes
MLJBase.select
MLJBase.select
MLJBase.selectcols
MLJBase.selectcols
MLJBase.selectrows
MLJBase.selectrows
MLJBase.table
MLJBase.table
StatsBase.fit!
StatsBase.fit!
MLJ.@curve
MLJ.@from_network
MLJBase.@constant
MLJBase.@more
MLJBase.@set_defaults