OpenML Integration
OpenML provides an integration platform for carrying out and comparing machine learning solutions across a broad collection of public datasets and software platforms. Integration of MLJ with OpenML is a work in progress.
Loading IRIS Dataset
As an example, we will load the iris dataset using OpenML.load(taskID).
using MLJ.MLJBase
Task ID
OpenML.load requires the task ID of the dataset to be loaded. This ID can be found on the OpenML website. The task ID for the iris dataset is 61, as listed on its OpenML page.
julia> rowtable = OpenML.load(61)
150-element Array{NamedTuple{(:sepallength, :sepalwidth, :petallength, :petalwidth, :class),Tuple{Float64,Float64,Float64,Float64,SubString{String}}},1}:
(sepallength = 5.1, sepalwidth = 3.5, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 4.9, sepalwidth = 3.0, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 4.7, sepalwidth = 3.2, petallength = 1.3, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 4.6, sepalwidth = 3.1, petallength = 1.5, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 5.0, sepalwidth = 3.6, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 5.4, sepalwidth = 3.9, petallength = 1.7, petalwidth = 0.4, class = "Iris-setosa")
(sepallength = 4.6, sepalwidth = 3.4, petallength = 1.4, petalwidth = 0.3, class = "Iris-setosa")
(sepallength = 5.0, sepalwidth = 3.4, petallength = 1.5, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 4.4, sepalwidth = 2.9, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa")
(sepallength = 4.9, sepalwidth = 3.1, petallength = 1.5, petalwidth = 0.1, class = "Iris-setosa")
⋮
(sepallength = 6.9, sepalwidth = 3.1, petallength = 5.1, petalwidth = 2.3, class = "Iris-virginica")
(sepallength = 5.8, sepalwidth = 2.7, petallength = 5.1, petalwidth = 1.9, class = "Iris-virginica")
(sepallength = 6.8, sepalwidth = 3.2, petallength = 5.9, petalwidth = 2.3, class = "Iris-virginica")
(sepallength = 6.7, sepalwidth = 3.3, petallength = 5.7, petalwidth = 2.5, class = "Iris-virginica")
(sepallength = 6.7, sepalwidth = 3.0, petallength = 5.2, petalwidth = 2.3, class = "Iris-virginica")
(sepallength = 6.3, sepalwidth = 2.5, petallength = 5.0, petalwidth = 1.9, class = "Iris-virginica")
(sepallength = 6.5, sepalwidth = 3.0, petallength = 5.2, petalwidth = 2.0, class = "Iris-virginica")
(sepallength = 6.2, sepalwidth = 3.4, petallength = 5.4, petalwidth = 2.3, class = "Iris-virginica")
(sepallength = 5.9, sepalwidth = 3.0, petallength = 5.1, petalwidth = 1.8, class = "Iris-virginica")
Converting to DataFrame
julia> using DataFrames
julia> df = DataFrame(rowtable)
150×5 DataFrames.DataFrame
│ Row │ sepallength │ sepalwidth │ petallength │ petalwidth │ class │
│ │ Float64 │ Float64 │ Float64 │ Float64 │ SubStrin… │
├─────┼─────────────┼────────────┼─────────────┼────────────┼────────────────┤
│ 1 │ 5.1 │ 3.5 │ 1.4 │ 0.2 │ Iris-setosa │
│ 2 │ 4.9 │ 3.0 │ 1.4 │ 0.2 │ Iris-setosa │
│ 3 │ 4.7 │ 3.2 │ 1.3 │ 0.2 │ Iris-setosa │
│ 4 │ 4.6 │ 3.1 │ 1.5 │ 0.2 │ Iris-setosa │
│ 5 │ 5.0 │ 3.6 │ 1.4 │ 0.2 │ Iris-setosa │
│ 6 │ 5.4 │ 3.9 │ 1.7 │ 0.4 │ Iris-setosa │
│ 7 │ 4.6 │ 3.4 │ 1.4 │ 0.3 │ Iris-setosa │
⋮
│ 143 │ 5.8 │ 2.7 │ 5.1 │ 1.9 │ Iris-virginica │
│ 144 │ 6.8 │ 3.2 │ 5.9 │ 2.3 │ Iris-virginica │
│ 145 │ 6.7 │ 3.3 │ 5.7 │ 2.5 │ Iris-virginica │
│ 146 │ 6.7 │ 3.0 │ 5.2 │ 2.3 │ Iris-virginica │
│ 147 │ 6.3 │ 2.5 │ 5.0 │ 1.9 │ Iris-virginica │
│ 148 │ 6.5 │ 3.0 │ 5.2 │ 2.0 │ Iris-virginica │
│ 149 │ 6.2 │ 3.4 │ 5.4 │ 2.3 │ Iris-virginica │
│ 150 │ 5.9 │ 3.0 │ 5.1 │ 1.8 │ Iris-virginica │
julia> df2 = coerce(df, :class=>Multiclass)
150×5 DataFrames.DataFrame
│ Row │ sepallength │ sepalwidth │ petallength │ petalwidth │ class │
│ │ Float64 │ Float64 │ Float64 │ Float64 │ Categorical… │
├─────┼─────────────┼────────────┼─────────────┼────────────┼────────────────┤
│ 1 │ 5.1 │ 3.5 │ 1.4 │ 0.2 │ Iris-setosa │
│ 2 │ 4.9 │ 3.0 │ 1.4 │ 0.2 │ Iris-setosa │
│ 3 │ 4.7 │ 3.2 │ 1.3 │ 0.2 │ Iris-setosa │
│ 4 │ 4.6 │ 3.1 │ 1.5 │ 0.2 │ Iris-setosa │
│ 5 │ 5.0 │ 3.6 │ 1.4 │ 0.2 │ Iris-setosa │
│ 6 │ 5.4 │ 3.9 │ 1.7 │ 0.4 │ Iris-setosa │
│ 7 │ 4.6 │ 3.4 │ 1.4 │ 0.3 │ Iris-setosa │
⋮
│ 143 │ 5.8 │ 2.7 │ 5.1 │ 1.9 │ Iris-virginica │
│ 144 │ 6.8 │ 3.2 │ 5.9 │ 2.3 │ Iris-virginica │
│ 145 │ 6.7 │ 3.3 │ 5.7 │ 2.5 │ Iris-virginica │
│ 146 │ 6.7 │ 3.0 │ 5.2 │ 2.3 │ Iris-virginica │
│ 147 │ 6.3 │ 2.5 │ 5.0 │ 1.9 │ Iris-virginica │
│ 148 │ 6.5 │ 3.0 │ 5.2 │ 2.0 │ Iris-virginica │
│ 149 │ 6.2 │ 3.4 │ 5.4 │ 2.3 │ Iris-virginica │
│ 150 │ 5.9 │ 3.0 │ 5.1 │ 1.8 │ Iris-virginica │
MLJBase.OpenML.load — Function
OpenML.load(id)
Load the OpenML dataset with specified id, from those listed on the OpenML site.
Returns a "row table", i.e., a Vector of identically typed NamedTuples. A row table is compatible with the Tables.jl interface and can therefore be readily converted to other compatible formats. For example:
using DataFrames
rowtable = OpenML.load(61);
df = DataFrame(rowtable);
df2 = coerce(df, :class=>Multiclass)
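To make the notion of a row table concrete, the following sketch builds a small one by hand in plain Julia, with no packages and no network access. It is an illustration only, not output from OpenML.load: the three rows are copied from the iris listing above, and the class field is an ordinary String here rather than the SubString{String} shown in that output. The variable names are illustrative.

```julia
# A hand-built row table: a Vector of identically typed NamedTuples.
# The values mirror the first three iris rows shown earlier on this page.
rowtable = [
    (sepallength = 5.1, sepalwidth = 3.5, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa"),
    (sepallength = 4.9, sepalwidth = 3.0, petallength = 1.4, petalwidth = 0.2, class = "Iris-setosa"),
    (sepallength = 4.7, sepalwidth = 3.2, petallength = 1.3, petalwidth = 0.2, class = "Iris-setosa"),
]

# Each element of the vector is one row; fields are accessed by name.
first_row = rowtable[1]
first_row.sepallength            # 5.1

# Every row has the same field names, so any row gives the column names.
column_names = keys(first_row)

# A column is recovered by collecting one field across all rows.
sepal_lengths = [row.sepallength for row in rowtable]
```

Because every row has the same type, the column names and element types are known from the vector alone, which is what lets Tables.jl-compatible sinks such as DataFrame consume a row table directly.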