MLJ News
News for MLJ and its satellite packages: MLJBase, MLJModels, and ScientificTypes.
Latest release notes
MLJ (general users)
MLJBase | MLJModels | ScientificTypes (mainly for developers)
News
Note: New patch releases are no longer being announced below. Refer to the links above for complete release notes.
30 Oct 2019
MLJModels 0.5.3 released.
MLJBase 0.7.2 released.
22 Oct 2019
MLJ 0.5.1 released.
21 Oct 2019
MLJBase 0.7.1 released.
ScientificTypes 0.2.2 released.
MLJModels 0.5.2 released.
17 Oct 2019
MLJBase 0.7 released.
11 Oct 2019
MLJModels 0.5.1 released.
30 Sep 2019
MLJ 0.5 released.
29 Sep 2019
MLJModels 0.5 released.
26 Sep 2019
MLJBase 0.6 released.
Older release notes
MLJ 0.4.0
(Enhancement) Update to MLJBase 0.5.0 and MLJModels 0.4.0. In particular, this considerably expands the list of wrapped scikit-learn models available to the MLJ user:
- ScikitLearn.jl
  - SVM: `SVMClassifier`, `SVMRegressor`, `SVMNuClassifier`, `SVMNuRegressor`, `SVMLClassifier`, `SVMLRegressor`
  - Linear Models (regressors): `ARDRegressor`, `BayesianRidgeRegressor`, `ElasticNetRegressor`, `ElasticNetCVRegressor`, `HuberRegressor`, `LarsRegressor`, `LarsCVRegressor`, `LassoRegressor`, `LassoCVRegressor`, `LassoLarsRegressor`, `LassoLarsCVRegressor`, `LassoLarsICRegressor`, `LinearRegressor`, `OrthogonalMatchingPursuitRegressor`, `OrthogonalMatchingPursuitCVRegressor`, `PassiveAggressiveRegressor`, `RidgeRegressor`, `RidgeCVRegressor`, `SGDRegressor`, `TheilSenRegressor`
- (Enhancement) The macro `@pipeline` allows one to construct linear (non-branching) pipeline composite models with one line of code. One may include static transformations (ordinary functions) in the pipeline, as well as target transformations for the supervised case (when one component model is supervised).
- (Breaking) Source nodes (type `Source`) now have a `kind` field, which is either `:input`, `:target` or `:other`, with `:input` the default value in the `source` constructor. If building a learning network, and the network is to be exported as a standalone model, it is now necessary to tag the source nodes accordingly, as in `Xs = source(X)` and `ys = source(y, kind=:target)`.
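In code, tagging source nodes for an exportable network might look like this (a sketch against the v0.4 API, assuming `using MLJ`; the data is made up for illustration):

```julia
using MLJ

# hypothetical data, for illustration only:
X = (age = [23.0, 45.0, 31.0], height = [1.7, 1.6, 1.8])
y = [150.0, 160.0, 155.0]

# tag the source nodes by kind, as now required for export:
Xs = source(X)                   # kind=:input is the default
ys = source(y, kind=:target)
```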
- (Breaking) By virtue of the preceding change, the syntax for exporting a learning network is simplified. Do `?@from_network` for details. Also, one now uses `fitresults(N)` instead of `fitresults(N, X, y)` and `fitresults(N, X)` when exporting a learning network `N` "by hand"; see the updated manual for details.
- (Breaking) One must explicitly state if a supervised learning network being exported with `@from_network` is probabilistic by adding `is_probabilistic=true` to the macro expression. Previously, this information was unreliably inferred from the network.
- (Enhancement) Add a macro-free method for loading model code into an arbitrary module. Do `?load` for details.
- (Enhancement) `@load` now returns a model instance with default hyperparameters (instead of `nothing`), as in `tree_model = @load DecisionTreeRegressor`.
- (Breaking) `info("PCA")` now returns a named tuple, instead of a dictionary, of the properties of the model named "PCA".
- (Breaking) The list returned by `models(conditional)` is now a list of complete metadata entries (named tuples, as returned by `info`). An entry `proxy` appears in the list exactly when `conditional(proxy) == true`. Model queries are simplified; for example, `models() do model model.is_supervised && model.is_pure_julia end` finds all pure-Julia supervised models.
- (Bug fix) Introduce new private methods to avoid relying on MLJBase type piracy (MLJBase #30).
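The model query in the last example above, formatted for readability (a sketch assuming `using MLJ` with the model registry available):

```julia
using MLJ

# all pure-Julia supervised models in the registry:
models() do model
    model.is_supervised && model.is_pure_julia
end
```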
- (Enhancement) If `composite` is a learning network exported as a model, and `m = machine(composite, args...)`, then `report(m)` returns the reports for each machine in the learning network, and similarly for `fitted_params(m)`.
- (Enhancement) `MLJ.table`, `vcat` and `hcat` are now overloaded for `AbstractNode`, so that they can be used immediately in defining learning networks. For example, if `X = source(rand(20, 3))` and `y = source(rand(20))`, then `MLJ.table(X)` and `vcat(y, y)` both make sense and define new nodes.
- (Enhancement) `pretty(X)` prints a pretty version of any table `X`, complete with types and scitype annotations. Do `?pretty` for options. A wrap of `pretty_table` from `PrettyTables.jl`.
- (Enhancement) `std` is re-exported from `Statistics`.
- (Enhancement) The manual and MLJ cheatsheet have been updated.
- Performance measures have been migrated to MLJBase, while the model registry and the model load/search facilities have migrated to MLJModels. As the relevant methods are re-exported to MLJ, this is unlikely to affect many users.
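The `AbstractNode` overloadings described above can be exercised as follows (a sketch against the v0.4 API, assuming `using MLJ`):

```julia
using MLJ

X = source(rand(20, 3))   # node wrapping a 20×3 matrix
y = source(rand(20))      # node wrapping a vector

Xtable = MLJ.table(X)     # new node: tabular view of X
ystack = vcat(y, y)       # new node: concatenation of y with itself
```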
MLJModels 0.4.0
(Enhancement) Add a number of scikit-learn model wraps. See the above MLJ 0.4.0 release notes for a detailed list.
The following have all been migrated to MLJModels from MLJ:
- MLJ's built-in models (e.g., basic transformers such as `OneHotEncoder`)
- The model registry metadata (src/registry/METADATA.toml)
- The metadata `@update` facility for administrator registration of new models
- The `@load` macro and `load` function for loading code for a registered model
- The `models` and `localmodels` model-search functions
- The `info` command for returning the metadata entry of a model
- (Breaking) MLJBase v0.5.0, which introduces some changes and additions to model traits, is a requirement, meaning the format of metadata has changed.
- (Breaking) The `model` method for retrieving model metadata has been renamed back to `info`, but continues to return a named tuple. (The `MLJBase.info` method, returning the dictionary form of the metadata, is now called `MLJBase.info_dict`.)
MLJBase 0.5.0
Bump ScientificTypes requirement to v0.2.0
(Enhancement) The performance measures API (built-in measures + adaptor for external measures) has been migrated from MLJ to MLJBase.
- (Breaking) `info`, which returns a dictionary (needed for TOML serialization), is renamed to `info_dict`. In this way "info" is reserved for a method in MLJModels/MLJ that returns a more convenient named tuple.
- (Breaking) The `is_probabilistic` model trait is replaced with `prediction_type`, which can have the values `:deterministic`, `:probabilistic` or `:interval`, to allow for models predicting real intervals, and for consistency with the measures API.
- (Bug fix, mildly breaking) The `package_license` model trait is now included in `info_dict` in the case of unsupervised models.
- (Enhancement, mildly breaking) Add new model traits `hyperparameters`, `hyperparameter_types`, `docstring`, and `implemented_operations` (`fit`, `predict`, `inverse_transform`, etc.) (#36, #37, #38)
- (Enhancement) The `MLJBase.table` and `MLJBase.matrix` operations are now direct wraps of the corresponding `Tables.jl` operations, for improved performance. In particular, `MLJBase.matrix(MLJBase.table(A))` is essentially a non-operation, and one can pass `MLJBase.matrix` the keyword argument `transpose=...`.
- (Breaking) The built-in dataset methods `load_iris`, `load_boston`, `load_ames`, `load_reduced_ames` and `load_crabs` return a raw `DataFrame`, instead of an `MLJTask` object, and continue to require `import CSV` to become available. However, macro versions `@load_iris`, etc., are always available, automatically triggering `import CSV`; these macros return a tuple `(X, y)` of input `DataFrame` and target vector `y`, with scitypes appropriately coerced. (MLJ #224)
- (Enhancement) `selectrows` now works for matrices. Needed to allow matrices as "node type" in MLJ learning networks; see MLJ #209.
- (Bug fix) Fix problem with `==` for `MLJType` objects (#35)
- (Breaking) Update requirement on ScientificTypes.jl to v0.2.0, to mitigate a bug with coercion of column scitypes for tables that are also `AbstractVector`s, and to make `coerce` more convenient.
- (Enhancement) Add new method `unpack` for splitting tables, as in `y, X = unpack(df, ==(:target), !=(:dummy))`. See the doc-string for details.
- (Bug fix) Remove type piracy in `get/setproperty!` (#30)
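A minimal `unpack` sketch (assuming `using MLJBase` and `DataFrames`; the column names here are hypothetical):

```julia
using MLJBase, DataFrames

df = DataFrame(x1 = rand(5), target = rand(5), dummy = rand(5))

# y gets the :target column, as a vector; X gets the remaining
# columns, except :dummy:
y, X = unpack(df, ==(:target), !=(:dummy))
```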
ScientificTypes 0.2.0
- (Breaking) The argument order is switched in `coerce` methods. So now use `coerce(v, T)` for a vector `v` and scientific type `T`, and `coerce(X, d)` for a table `X` and dictionary `d`.
- (Feature) You can now call `coerce` on tables without needing to wrap specs in a dictionary, as in `coerce(X, :age => Continuous, :ncalls => Count)`.
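Both calling patterns, sketched (assuming `using ScientificTypes` at v0.2; the column names are invented for illustration):

```julia
using ScientificTypes

v = [1, 2, 3]
vc = coerce(v, Continuous)   # vector form: coerce(v, T)

X = (age = [23, 45, 31], ncalls = [2, 7, 4])
Xc = coerce(X, :age => Continuous, :ncalls => Count)   # table form, no dictionary
```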
ScientificTypes 0.1.3
MLJ 0.3.0
Introduction of traits for measures (loss functions, etc.); see the top of /src/measures.jl for definitions. This:
- allows the user to use loss functions from LossFunctions.jl
- enables improved measure checks and error-message reporting for measures
- allows `evaluate!` to report per-observation measures when available (for later use by Bayesian optimizers, for example)
- allows support for sample-weighted measures, playing nicely with the rest of the API
Improvements to resampling:
- The `evaluate!` method now reports per-observation measures when available
- Sample weights can be passed to `evaluate!` for use by measures that support weights
- The user can pass a list of train/evaluation pairs of row indices directly to `evaluate!`, in place of a `ResamplingStrategy` object
- Implementing a new `ResamplingStrategy` is now straightforward (see docs)
- One can call `evaluate` (no exclamation mark) directly on model + data, without first constructing a machine, if desired
Doc-strings and the manual have been revised and updated. The manual includes a new section "Tuning models", and extra material under "Learning networks" explaining how to export learning networks as stand-alone models using the `@from_network` macro.
Improved checks and error-reporting for binding models to data in machines.
(Breaking) CSV is now an optional dependency, which means you now need to import CSV before you can load tasks with `load_boston()`, `load_iris()`, `load_crabs()`, `load_ames()` or `load_reduced_ames()`.
Added a `schema` method for tables (re-exported from ScientificTypes.jl). It returns a named tuple with keys `:names`, `:types`, `:scitypes` and `:nrows`.
(Breaking) Eliminate the `scitypes` method. The scientific types of a table are now returned as part of the ScientificTypes `schema` method (see above).
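For example (a sketch, assuming `schema` is in scope via `using MLJ`; the table is invented):

```julia
using MLJ

X = (age = [23, 45], height = [1.7, 1.6])
sch = schema(X)

sch.names      # the column names
sch.scitypes   # the scientific type of each column
sch.nrows      # the number of rows
```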
MLJModels 0.3.0
MLJBase v0.4.0
ScientificTypes 0.1.2
- New package to which the scientific types API has been moved (from MLJBase).
MLJBase v0.3.0
- (Breaking) Make CSV an optional dependency. To use `load_iris()`, `load_ames()`, etc., one must first import CSV.
MLJBase v0.2.4
Add ColorImage and GreyImage scitypes.
Overload the `in` method for subtypes of `Model` (a method-signature ambiguity was apparently causing Julia crashes in an untagged commit; now resolved).
MLJ v0.2.5
Add MLJ cheatsheet
Allow `models` to query specific traits, in addition to tasks. Query `?models` for details.
Add `@from_network` macro for exporting learning networks as models (experimental).
MLJModels v0.2.4
- Add compatibility requirement MLJBase="0.2.3"
MLJBase v0.2.3
- Small changes to the definitions of `==` and `isequal` for `MLJType` objects. In particular, fields that are random number generators may change state without affecting an object's `==` equivalence class.
- Add `@set_defaults` macro for generating keyword constructors for `Model` subtypes.
- Add abstract type `UnsupervisedNetwork <: Unsupervised`.
MLJ v0.2.3
Fixed a bug in the `models(::MLJTask)` method which excluded some relevant models. (#153)
Fixed some broken links to tour.ipynb.
MLJ v0.2.2
Resolved these issues:
- Specifying new rows in calls to `fit!` on a `Node` not triggering retraining. (#147)
- `fit!` of a `Node` sometimes calls `update` on a model when it should call `fit` on the model. [(#146)](https://github.com/alan-turing-institute/MLJ.jl/issues/146)
- Error running the tour.ipynb notebook. [(#140)](https://github.com/alan-turing-institute/MLJ.jl/issues/140)
- For reproducibility, include a Manifest.toml file with all examples. [(#137)](https://github.com/alan-turing-institute/MLJ.jl/issues/137)
- Activated Coveralls code coverage. (#131)
Removed local version of MultivariateStats (now in MLJModels, see below).
Minor changes to OneHotEncoder, in line with scitype philosophy.
MLJBase v0.2.2
Fix some minor bugs.
Added compatibility requirement CSV v0.5 or higher, to allow removal of the `allowmissing` keyword in `CSV.read`, which is to be deprecated.
Announcement: MLJ tutorial and development sprint
- Details here. Applications close May 29th, 5pm (GMT + 1 = London)
MLJModels v0.2.3
- The following support vector machine models from LIBSVM.jl have been added: EpsilonSVR, LinearSVC, NuSVR, NuSVC, SVC, OneClassSVM.
MLJModels v0.2.2
- MultivariateStats models RidgeRegressor and PCA migrated here from MLJ. Addresses: MLJ #125.
MLJModels v0.2.1
- ScikitLearn wraps ElasticNet and ElasticNetCV now available (and registered at MLJRegistry). Resolves: MLJ #112
MLJ v0.2.1
- Fix a bug and a related problem in the "Getting Started" docs: [#126](https://github.com/alan-turing-institute/MLJ.jl/issues/126).
MLJBase 0.2.0, MLJModels 0.2.0, MLJ 0.2.0
- Model API refactored to resolve #93 and #119 and hence simplify the model interface. This breaks all implementations of supervised models, and some scitype methods. However, for the regular user the effects are restricted to: (i) no more `target_type` hyperparameter for some models; (ii) `Deterministic{Node}` is now `DeterministicNetwork` and `Probabilistic{Node}` is now `ProbabilisticNetwork` when exporting learning networks as models.
- New feature: Task constructors now allow the user to explicitly specify scitypes of features/target. There is a `coerce` method for vectors and tables for the user who wants to do this manually. Resolves: #119
Official registered versions of MLJBase 0.1.1, MLJModels 0.1.1, MLJ 0.1.1 released
- Minor revisions to the repos, doc updates, and a small breaking change around scitype method names and associated traits. Resolves: #119
unversioned commits 12 April 2019 (around 00:10, GMT)
- Added out-of-bag estimates for performance in homogeneous ensembles. Resolves: #77
unversioned commits 11 April 2019 (before noon, GMT)
- Removed dependency on unregistered package TOML.jl (using Pkg.TOML instead). Resolves #113
unversioned commits 8 April 2019 (some time after 20:00 GMT)
Addition of XGBoost models XGBoostRegressor, XGBoostClassifier and XGBoostCount. Resolves #65.
Documentation reorganized as GitHub pages. Includes some additions but still a work in progress.
unversioned commits 1 March 2019 (some time after 03:50 GMT)
Addition of a "scientific type" hierarchy, including `Continuous`, `Discrete`, `Multiclass`, and `Other` subtypes of `Found` (to complement `Missing`). See Getting Started for more on this. Resolves: #86
Revamp of model traits to take advantage of scientific types, with `output_kind` replaced with `target_scitype_union` and `input_kind` replaced with `input_scitype`. Also, `output_quantity` is dropped, `input_quantity` is replaced with the `Bool`-valued `input_is_multivariate`, and `is_pure_julia` is made `Bool`-valued. Trait definitions in all model implementations and affected meta-algorithms have been updated. Related: #81
Substantial update of the core guide Adding New Models to reflect the above changes and in response to new model-implementer queries. Some design "decisions" regarding multivariate targets are now explicit there.
The order of the `y` and `yhat` arguments of measures (aka loss functions) has been reversed. Progress on: #91
Update of Standardizer and OneHotEncoder to mesh with the new scitypes.
New improved task constructors infer task metadata from data scitypes. This brings us close to a simple implementation of basic task-model matching. Query the doc-strings for `SupervisedTask` and `UnsupervisedTask` for details. Machines can now dispatch on tasks instead of `X` and `y`. A task, `task`, is now callable: `task()` returns `(X, y)` for supervised models, and `X` for unsupervised models. Progress on: #86
The data in the `load_ames()` test task has been replaced by the full data set, and `load_reduced_ames()` now loads a reduced set.