OpenML Integration
OpenML provides an integration platform for carrying out and comparing machine learning solutions across a broad collection of public datasets and software platforms. Integration of MLJ with OpenML is a work in progress.
Loading IRIS Dataset
As an example, we will load the iris dataset using OpenML.load(taskID).
using MLJ.MLJBase
Task ID
OpenML.load requires the task ID of the dataset to be loaded. This ID can be found on the OpenML website. The task ID for the iris dataset is 61, as listed on its OpenML page.
julia> rowtable = OpenML.load(61)
150-element Array{NamedTuple{(:sepallength, :sepalwidth, :petallength, :petalwidth, :class),Tuple{Float64,Float64,Float64,Float64,SubString{String}}},1}:
(sepallength = 5.1, sepalwidth = 3.5, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 4.9, sepalwidth = 3.0, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 4.7, sepalwidth = 3.2, petallength = 1.3, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 4.6, sepalwidth = 3.1, petallength = 1.5, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 5.0, sepalwidth = 3.6, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 5.4, sepalwidth = 3.9, petallength = 1.7, petalwidth = 0.4, class = "Iris-setosa")
(sepallength = 4.6, sepalwidth = 3.4, petallength = 1.4, petalwidth = 0.3, class = "Iris-setosa")
(sepallength = 5.0, sepalwidth = 3.4, petallength = 1.5, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 4.4, sepalwidth = 2.9, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 4.9, sepalwidth = 3.1, petallength = 1.5, petalwidth = 0.1, class = "Iris-setosa")
⋮
(sepallength = 6.9, sepalwidth = 3.1, petallength = 5.1, petalwidth = 2.3, class = "Iris-virginica")
(sepallength = 5.8, sepalwidth = 2.7, petallength = 5.1, petalwidth = 1.9, class = "Iris-virginica")
(sepallength = 6.8, sepalwidth = 3.2, petallength = 5.9, petalwidth = 2.3, class = "Iris-virginica")
(sepallength = 6.7, sepalwidth = 3.3, petallength = 5.7, petalwidth = 2.5, class = "Iris-virginica")
(sepallength = 6.7, sepalwidth = 3.0, petallength = 5.2, petalwidth = 2.3, class = "Iris-virginica")
(sepallength = 6.3, sepalwidth = 2.5, petallength = 5.0, petalwidth = 1.9, class = "Iris-virginica")
(sepallength = 6.5, sepalwidth = 3.0, petallength = 5.2, petalwidth = 2.0, class = "Iris-virginica")
(sepallength = 6.2, sepalwidth = 3.4, petallength = 5.4, petalwidth = 2.3, class = "Iris-virginica")
(sepallength = 5.9, sepalwidth = 3.0, petallength = 5.1, petalwidth = 1.8, class = "Iris-virginica")
Converting to DataFrame
julia> using DataFrames
julia> df = DataFrame(rowtable)
150×5 DataFrame
Row │ sepallength sepalwidth petallength petalwidth class
│ Float64 Float64 Float64 Float64 SubStrin…
─────┼──────────────────────────────────────────────────────────────────
1 │ 5.1 3.5 1.4 0.2 Iris-setosa
2 │ 4.9 3.0 1.4 0.2 Iris-setosa
3 │ 4.7 3.2 1.3 0.2 Iris-setosa
4 │ 4.6 3.1 1.5 0.2 Iris-setosa
5 │ 5.0 3.6 1.4 0.2 Iris-setosa
6 │ 5.4 3.9 1.7 0.4 Iris-setosa
7 │ 4.6 3.4 1.4 0.3 Iris-setosa
8 │ 5.0 3.4 1.5 0.2 Iris-setosa
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
144 │ 6.8 3.2 5.9 2.3 Iris-virginica
145 │ 6.7 3.3 5.7 2.5 Iris-virginica
146 │ 6.7 3.0 5.2 2.3 Iris-virginica
147 │ 6.3 2.5 5.0 1.9 Iris-virginica
148 │ 6.5 3.0 5.2 2.0 Iris-virginica
149 │ 6.2 3.4 5.4 2.3 Iris-virginica
150 │ 5.9 3.0 5.1 1.8 Iris-virginica
135 rows omitted
julia> df2 = coerce(df, :class=>Multiclass)
150×5 DataFrame
Row │ sepallength sepalwidth petallength petalwidth class
│ Float64 Float64 Float64 Float64 Cat…
─────┼──────────────────────────────────────────────────────────────────
1 │ 5.1 3.5 1.4 0.2 Iris-setosa
2 │ 4.9 3.0 1.4 0.2 Iris-setosa
3 │ 4.7 3.2 1.3 0.2 Iris-setosa
4 │ 4.6 3.1 1.5 0.2 Iris-setosa
5 │ 5.0 3.6 1.4 0.2 Iris-setosa
6 │ 5.4 3.9 1.7 0.4 Iris-setosa
7 │ 4.6 3.4 1.4 0.3 Iris-setosa
8 │ 5.0 3.4 1.5 0.2 Iris-setosa
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
144 │ 6.8 3.2 5.9 2.3 Iris-virginica
145 │ 6.7 3.3 5.7 2.5 Iris-virginica
146 │ 6.7 3.0 5.2 2.3 Iris-virginica
147 │ 6.3 2.5 5.0 1.9 Iris-virginica
148 │ 6.5 3.0 5.2 2.0 Iris-virginica
149 │ 6.2 3.4 5.4 2.3 Iris-virginica
150 │ 5.9 3.0 5.1 1.8 Iris-virginica
135 rows omitted
MLJBase.OpenML.load — Function
OpenML.load(id)
Load the OpenML dataset with the specified id, from those listed on the OpenML site.
Returns a "row table", i.e., a Vector of identically typed NamedTuples. A row table is compatible with the Tables.jl interface and can therefore be readily converted to other compatible formats. For example:
using DataFrames
rowtable = OpenML.load(61);
df = DataFrame(rowtable);
df2 = coerce(df, :class=>Multiclass)
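To make the notion of a "row table" concrete without downloading anything, the following minimal sketch builds a small one by hand using only Base Julia. The three rows and the variable names (rows, colnames, sepal_lengths, counts) are illustrative assumptions, not output of OpenML.load; the values simply mirror a few of the rows displayed earlier, with fewer columns.

```julia
# A hand-built "row table": a Vector of identically typed NamedTuples.
# Illustrative data only; nothing here is fetched from OpenML.
rows = [
    (sepallength = 5.1, sepalwidth = 3.5, class = "Iris-setosa"),
    (sepallength = 4.9, sepalwidth = 3.0, class = "Iris-setosa"),
    (sepallength = 5.9, sepalwidth = 3.0, class = "Iris-virginica"),
]

# Each row is a NamedTuple, so its fields are accessed by name.
first_row = rows[1]
first_row.sepallength

# The column names are the keys shared by every row.
colnames = keys(first_row)

# A single column can be extracted with a comprehension over the rows.
sepal_lengths = [r.sepallength for r in rows]

# Count the rows per class, again using only Base.
counts = Dict{String,Int}()
for r in rows
    counts[r.class] = get(counts, r.class, 0) + 1
end
```

Because every row has the same field names and types, the vector has a single concrete element type, which is the property that lets Tables.jl-compatible packages such as DataFrames consume it directly.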