Tutorial

This tutorial walks through the core features of DearDiary: initializing the database, creating projects and experiments, logging parameters and metrics, storing resources, and using the REST API and the remote client. Let's get started!

Requirements

To run this tutorial, you need DearDiary itself plus the following packages:

MLJ.jl - A machine learning framework for Julia
MLJDecisionTreeInterface.jl - Decision tree models for MLJ
JLSO.jl - Julia Serialized Object file format
DataFrames.jl - For handling tabular data

You can install these packages using Julia's package manager. Open the Julia REPL and run:

using Pkg
Pkg.add("MLJ")
Pkg.add("MLJDecisionTreeInterface")
Pkg.add("JLSO")
Pkg.add("DataFrames")

Loading the Data

First, we load the iris dataset, split it into shuffled training (80%) and test (20%) sets, and separate the target column from the features.

using MLJ
using JLSO
using DataFrames
using DearDiary

iris = DataFrames.DataFrame(load_iris())
train, test = partition(iris, 0.8, shuffle=true)

train_y, train_X = unpack(train, ==(:target))
test_y, test_X = unpack(test, ==(:target))

Initializing the database

Before we start tracking our experiments, we need to initialize the database where the experiment data will be stored.

DearDiary.initialize_database()

This will create a local SQLite database file named deardiary.db in the current directory.

Creating a new project and experiment

Projects help you organize your experiments. Let's create a new project for our iris classification experiment.

julia> project_id, _ = create_project("Tutorial project")
(id = 1, status = DearDiary.Created)

Once we have a project, we can create an experiment within that project.

julia> experiment_id, _ = create_experiment(project_id, DearDiary.IN_PROGRESS, "Iris classification experiment")
(id = 1, status = DearDiary.Created)

Note

If something goes wrong during project or experiment creation, the functions return nothing together with a marker type indicating the kind of error. The marker types are listed in the Miscellaneous section of the documentation.
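One way to act on that return convention is to unwrap the id and stop early when it is missing. The following is a minimal, self-contained sketch and does not call the library: DemoCreated, DemoFailed, demo_create, and require_id are hypothetical names used only to mimic a call that returns an id and a status marker.

```julia
# Hypothetical marker types standing in for the library's own status markers.
struct DemoCreated end
struct DemoFailed end

# Stand-in for a create_* call: returns (id, status), with id = nothing on failure.
demo_create(name::AbstractString) =
    isempty(name) ? (id = nothing, status = DemoFailed) : (id = 1, status = DemoCreated)

# Unwrap the id, or stop early with a message naming the marker type.
function require_id(result, what::AbstractString)
    id, status = result
    isnothing(id) && error("could not create $what (status: $status)")
    return id
end

project_id = require_id(demo_create("Tutorial project"), "project")
```

Checking the id right after each creation call keeps a failed project from silently propagating nothing into later create_experiment or create_iteration calls.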

Training the model and tracking the experiment

Now we are ready to train a machine learning model and track the experiment using the library. We will use a decision tree classifier for this example.

DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
dtc = DecisionTreeClassifier()
max_depth_range = range(dtc, :max_depth, lower=2, upper=10, scale=:linear)

model = TunedModel(
    model=dtc,
    resampling=CV(),
    tuning=Grid(),
    range=max_depth_range,
    measure=[accuracy, log_loss, misclassification_rate, brier_score],
)

ProbabilisticTunedModel(
  model = DecisionTreeClassifier(
        max_depth = -1, 
        min_samples_leaf = 1, 
        min_samples_split = 2, 
        min_purity_increase = 0.0, 
        n_subfeatures = 0, 
        post_prune = false, 
        merge_purity_threshold = 1.0, 
        display_depth = 5, 
        feature_importance = :impurity, 
        rng = Random.TaskLocalRNG()), 
  tuning = Grid(
        goal = nothing, 
        resolution = 10, 
        shuffle = true, 
        rng = Random.TaskLocalRNG()), 
  resampling = CV(
        nfolds = 6, 
        shuffle = false, 
        rng = Random.TaskLocalRNG()), 
  measure = StatisticalMeasuresBase.FussyMeasure[Accuracy(), LogLoss(tol = 2.22045e-16), MisclassificationRate(), BrierScore()], 
  weights = nothing, 
  class_weights = nothing, 
  operation = nothing, 
  range = NumericRange(2 ≤ max_depth ≤ 10; origin=6.0, unit=4.0), 
  selection_heuristic = MLJTuning.NaiveSelection(nothing), 
  train_best = true, 
  repeats = 1, 
  n = nothing, 
  acceleration = ComputationalResources.CPU1{Nothing}(nothing), 
  acceleration_resampling = ComputationalResources.CPU1{Nothing}(nothing), 
  check_measure = true, 
  cache = true, 
  compact_history = true, 
  logger = nothing)

julia> mach = machine(model, train_X, train_y)
untrained Machine; does not cache data
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args:
    1:	Source @410 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
    2:	Source @573 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}

julia> fit!(mach)
[ Info: Training machine(ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …), …).
[ Info: Attempting to evaluate 9 models.
Evaluating over 9 metamodels: 100%[=========================] Time: 0:00:13
trained Machine; does not cache data
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args:
    1:	Source @410 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
    2:	Source @573 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}

After training the model, we can log the results to the database: one iteration per evaluated max_depth, with the depth stored as a parameter and the four measures stored as metrics.

model_values = report(mach).history .|> (x -> (x.measure, x.measurement, x.model.max_depth))

for (measure, measurements, max_depth) in model_values
    iteration_id, _ = create_iteration(experiment_id)
    create_parameter(iteration_id, "max_depth", max_depth)

    measures_names = [split(x |> string, "(") |> first for x in measure]
    metrics_at_step = Dict(
        name => value for (name, value) in zip(measures_names, measurements)
    )
    log_metrics(iteration_id, metrics_at_step)
end
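The metric keys in the loop above are derived by cutting each measure's printed form at the first opening parenthesis. A small sketch of just that step on plain strings, assuming the measures print as they do in the tuned model summary (for example LogLoss(tol = 2.22045e-16)); the variable names and the numeric values are illustrative placeholders, not results.

```julia
# Printed forms of the measures, as they appear in the tuned model summary.
printed_measures = [
    "Accuracy()",
    "LogLoss(tol = 2.22045e-16)",
    "MisclassificationRate()",
    "BrierScore()",
]
# Placeholder measurements, one per measure (illustrative values only).
measurements = [0.95, 1.80, 0.05, -0.10]

# Keep everything before the first "(" and convert the SubString to a String.
measure_names = [String(first(split(s, "("))) for s in printed_measures]

# Pair each name with its measurement, as the logging loop does.
metrics_at_step = Dict(name => value for (name, value) in zip(measure_names, measurements))
```

Converting to String is optional here; it only ensures the dictionary keys are plain strings rather than SubString views.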

Each create_metric or log_metrics call appends to a per-(iteration, key) series. The server auto-assigns step (max(step) + 1) and recorded_at (now()) when you don't pass them, so logging the same key repeatedly forms a chronological time series — exactly what a training loop produces over epochs:

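# `iteration_id` comes from an earlier create_iteration call; `train_loss` and
# `train_accuracy` are placeholders for your own per-epoch functions.
for epoch in 1:10
    log_metrics(iteration_id, Dict("loss" => train_loss(epoch), "acc" => train_accuracy(epoch)))
end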
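To make the auto-assignment rule concrete, here is a toy in-memory model of a per-key series. It is only a sketch of the described behaviour, not the server's implementation: MetricPoint, next_step, and log_metric! are hypothetical names, and starting an empty series at step 0 is an assumption.

```julia
using Dates

# One logged point of a metric series (hypothetical, in-memory only).
struct MetricPoint
    key::String
    value::Float64
    step::Int
    recorded_at::DateTime
end

# Next step for a key: max(step) + 1, or 0 when the key has no points yet (assumed).
function next_step(series::Vector{MetricPoint}, key::AbstractString)
    steps = [p.step for p in series if p.key == key]
    return isempty(steps) ? 0 : maximum(steps) + 1
end

# Append a point, filling in step and recorded_at only when they are not passed.
function log_metric!(series::Vector{MetricPoint}, key, value; step=nothing, recorded_at=nothing)
    s = isnothing(step) ? next_step(series, key) : step
    t = isnothing(recorded_at) ? now() : recorded_at
    push!(series, MetricPoint(key, value, s, t))
    return series
end

series = MetricPoint[]
for loss in (0.9, 0.6, 0.4)
    log_metric!(series, "loss", loss)
end
log_metric!(series, "acc", 0.8)

loss_steps = [p.step for p in series if p.key == "loss"]
```

Because steps are counted per key, the "acc" point starts its own series independently of how many "loss" points were logged.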

Viewing the logged data

You can retrieve and check the logged data from the database to ensure everything was logged correctly.

julia> iteration = last(get_iterations(experiment_id)) # Checking only the last iteration
DearDiary.Iteration
 ├ id = 9
 ├ experiment_id = 1
 ├ notes = ""
 ├ created_date = 2026-06-06T18:34:58.465
 ├ end_date = nothing
 ├ parent_iteration_id = nothing
 ├ status_id = 1
 ├ error_message = ""
 ├ julia_version = ""
 ├ git_sha = ""
 ├ git_dirty = false
 ├ entrypoint = ""
 ├ project_toml = ""
 └ manifest_toml = ""

julia> get_parameters(iteration.id)
1-element Vector{DearDiary.Parameter}:
DearDiary.Parameter
 ├ id = 9
 ├ iteration_id = 9
 ├ key = "max_depth"
 └ value = "5"

julia> get_metrics(iteration.id)
4-element Vector{DearDiary.Metric}:
DearDiary.Metric
 ├ id = 33
 ├ iteration_id = 9
 ├ key = "MisclassificationRate"
 ├ value = 0.049999999999999996
 ├ step = 0
 └ recorded_at = 2026-06-06T18:34:58.465
DearDiary.Metric
 ├ id = 34
 ├ iteration_id = 9
 ├ key = "Accuracy"
 ├ value = 0.9500000000000001
 ├ step = 0
 └ recorded_at = 2026-06-06T18:34:58.465
DearDiary.Metric
 ├ id = 35
 ├ iteration_id = 9
 ├ key = "LogLoss"
 ├ value = 1.8021826694558578
 ├ step = 0
 └ recorded_at = 2026-06-06T18:34:58.465
DearDiary.Metric
 ├ id = 36
 ├ iteration_id = 9
 ├ key = "BrierScore"
 ├ value = -0.09999999999999999
 ├ step = 0
 └ recorded_at = 2026-06-06T18:34:58.465

Save and load the trained model

You can save serialized objects, files, or any other resources related to your experiments.

smach = serializable(mach)
io = IOBuffer()
JLSO.save(io, :machine => smach)

bytes = take!(io)

julia> resource_id, _ = create_resource(experiment_id, "Iris DTC MLJ Machine", bytes)
(id = 1, status = DearDiary.Created)

Then you can load the model back when needed.

julia> resource = get_resource(resource_id)
DearDiary.Resource
 ├ id = 1
 ├ experiment_id = 1
 ├ name = "Iris DTC MLJ Machine"
 ├ description = ""
 ├ data = UInt8[0x5f, 0x51, 0x00, …, 0x75, 0x00, 0x00]
 ├ created_date = 2026-06-06T18:35:07.724
 ├ updated_date = nothing
 ├ backend = "sqlite"
 ├ uri = ""
 ├ size_bytes = 25124
 └ content_hash = "cfc39422b9312715694b170e54d4dfaa84ca88d4978b0dfcf2a2bf12eaa33c01"

get_resource is intended for inspecting the artifact's metadata; fetch the raw bytes with read_resource_data.

io = IOBuffer(read_resource_data(resource_id))
loaded_mach = JLSO.load(io)[:machine]

serializable Machine
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args:

julia> restore!(loaded_mach)
trained Machine; does not cache data
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args:
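The save and load steps follow a general pattern: serialize an object into an in-memory buffer, store the bytes, then rebuild the object from those bytes. The sketch below shows that round trip with Julia's Serialization standard library swapped in for JLSO, on an illustrative payload rather than a trained machine. The SHA-256 digest is included only as an assumption about what a 64-character content hash such as the one shown above might be; the tutorial does not state which algorithm the library uses.

```julia
using Serialization
using SHA

# Illustrative payload standing in for a serializable machine.
payload = Dict("max_depth" => 5, "classes" => ["setosa", "versicolor", "virginica"])

# Serialize into an in-memory buffer and take the raw bytes.
io = IOBuffer()
serialize(io, payload)
bytes = take!(io)

# Metadata one might record next to the bytes (the hash algorithm is an assumption).
size_bytes = length(bytes)
digest = bytes2hex(sha256(bytes))

# Rebuild the object from the stored bytes.
restored = deserialize(IOBuffer(bytes))
```

Recomputing the digest after fetching the bytes and comparing it with the recorded one is a cheap way to detect a corrupted or truncated artifact before deserializing it.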

Built-in REST API

The library also provides a built-in REST API to allow the outside world to interact with your projects. You can start the API server using the following command:

DearDiary.run()

This will start the API server on http://localhost:9000. You can customize the settings by providing a .env file with the configuration options. For more details, refer to the REST API section of the documentation.

Logging from a remote training script

When the training script runs on a different machine from the server, use the bundled Julia client. Every CRUD verb shown above gains a Client-aware method, and the with_iteration helper auto-finalises an iteration on both success and exception.

using DearDiary

client = DearDiary.connect(
    "http://server.example:9000"; username="alice", password="secret",
)

project_id = create_project(client, "Iris classification")
experiment_id = create_experiment(
    client, project_id, DearDiary.IN_PROGRESS, "Decision tree sweep",
)

with_iteration(client, experiment_id) do iter
    create_parameter(client, iter.id, "max_depth", 4)
    create_metric(client, iter.id, "accuracy", 0.96)
end
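The finalise-on-success-and-exception behaviour of a helper like with_iteration can be pictured as a do-block wrapper around try/catch. This is a self-contained sketch of that pattern, not the client's implementation: DemoIteration, with_demo_iteration, and the status symbols are hypothetical.

```julia
# Hypothetical in-memory iteration record.
mutable struct DemoIteration
    id::Int
    status::Symbol
    error_message::String
end

# Run `f` on the iteration and record the outcome whether it returns or throws.
function with_demo_iteration(f, iter::DemoIteration)
    try
        result = f(iter)
        iter.status = :finished
        return result
    catch err
        iter.status = :failed
        iter.error_message = sprint(showerror, err)
        rethrow()  # the caller still sees the original exception
    end
end

# Taking the function as the first argument enables do-block syntax.
iter = DemoIteration(1, :in_progress, "")
doubled = with_demo_iteration(iter) do it
    it.id * 2
end
```

Rethrowing after recording the failure keeps the training script's own error handling intact while ensuring the iteration is never left in an in-progress state.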

See the Client reference for the full list of helpers.

Conclusion

And that's it! You have successfully completed the tutorial and learned how to use the core features of this library. You can now track your machine learning experiments effectively. For more advanced features and options, refer to the rest of the documentation.