Tutorial
This tutorial walks you through the core features of DearDiary. By the end, you will know how to create projects and experiments, log parameters and metrics, store a trained model as a resource, and reach your data through the REST API and the Julia client. Let's get started!
Requirements
To run this tutorial, you need to have the following packages installed:
- MLJ.jl - A machine learning framework for Julia
- MLJDecisionTreeInterface.jl - Decision tree models for MLJ
- JLSO.jl - Julia Serialized Object file format
- DataFrames.jl - For handling tabular data
You can install these packages using Julia's package manager. Open the Julia REPL and run:
using Pkg
Pkg.add("MLJ")
Pkg.add("MLJDecisionTreeInterface")
Pkg.add("JLSO")
Pkg.add("DataFrames")
Loading the Data
First, we need to load the dataset that we will be using for this tutorial.
using MLJ
using JLSO
using DataFrames
using DearDiary
iris = DataFrames.DataFrame(load_iris())
train, test = partition(iris, 0.8, shuffle=true)
train_y, train_X = unpack(train, ==(:target))
test_y, test_X = unpack(test, ==(:target))
Initializing the database
Before we start tracking our experiments, we need to initialize the database where the experiment data will be stored.
DearDiary.initialize_database()
This will create a local SQLite database file named deardiary.db in the current directory.
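One way to confirm that initialization worked is to check that the file exists. The sketch below assumes the default file name deardiary.db mentioned above and that the working directory has not changed since initialize_database() was called. It uses only Julia's Base functions.

```julia
# Path where the tutorial says the SQLite file is created
# (assumes the default name and the current working directory).
db_path = joinpath(pwd(), "deardiary.db")

if isfile(db_path)
    println("Database found at ", db_path, " (", filesize(db_path), " bytes)")
else
    println("No database at ", db_path, "; run DearDiary.initialize_database() first")
end
```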
Creating a new project and experiment
Projects help you organize your experiments. Let's create a new project for our iris classification experiment.
julia> project_id, _ = create_project("Tutorial project")
(id = 1, status = DearDiary.Created)
Once we have a project, we can create an experiment within that project.
julia> experiment_id, _ = create_experiment(project_id, DearDiary.IN_PROGRESS, "Iris classification experiment")
(id = 1, status = DearDiary.Created)
If something goes wrong during project or experiment creation, the functions return nothing and a marker type indicating the kind of error. The marker types are listed in the Miscellaneous section of the documentation.
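Because a failed call returns nothing in place of an ID, it can help to check the result before using it. The following is a minimal sketch: require_id is a hypothetical helper, not part of DearDiary, and the stand-in results only mimic the (id, status) shape shown in the REPL output above. The status symbols are placeholders, not real marker types.

```julia
# Hypothetical helper (not part of DearDiary): unwrap the ID from an
# (id, status) result, or raise an error that includes the status marker.
function require_id(result, what::AbstractString)
    id, status = result
    isnothing(id) && error("could not create $what (status: $status)")
    return id
end

# Stand-in results shaped like the REPL output above; real code would pass
# the return value of create_project(...) or create_experiment(...).
ok = (id = 1, status = :Created)
failed = (id = nothing, status = :SomeErrorMarker)

project_id = require_id(ok, "project")

try
    require_id(failed, "experiment")
catch err
    println("caught: ", sprint(showerror, err))
end
```

With the real functions, this would read project_id = require_id(create_project("Tutorial project"), "project"), so a failure stops the script early instead of surfacing later as an invalid ID.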
Training the model and tracking the experiment
Now we are ready to train a machine learning model and track the experiment using the library. We will use a decision tree classifier for this example.
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
dtc = DecisionTreeClassifier()
max_depth_range = range(dtc, :max_depth, lower=2, upper=10, scale=:linear)
model = TunedModel(
    model=dtc,
    resampling=CV(),
    tuning=Grid(),
    range=max_depth_range,
    measure=[accuracy, log_loss, misclassification_rate, brier_score],
)
ProbabilisticTunedModel(
  model = DecisionTreeClassifier(
        max_depth = -1,
        min_samples_leaf = 1,
        min_samples_split = 2,
        min_purity_increase = 0.0,
        n_subfeatures = 0,
        post_prune = false,
        merge_purity_threshold = 1.0,
        display_depth = 5,
        feature_importance = :impurity,
        rng = Random.TaskLocalRNG()),
  tuning = Grid(
        goal = nothing,
        resolution = 10,
        shuffle = true,
        rng = Random.TaskLocalRNG()),
  resampling = CV(
        nfolds = 6,
        shuffle = false,
        rng = Random.TaskLocalRNG()),
  measure = StatisticalMeasuresBase.FussyMeasure[Accuracy(), LogLoss(tol = 2.22045e-16), MisclassificationRate(), BrierScore()],
  weights = nothing,
  class_weights = nothing,
  operation = nothing,
  range = NumericRange(2 ≤ max_depth ≤ 10; origin=6.0, unit=4.0),
  selection_heuristic = MLJTuning.NaiveSelection(nothing),
  train_best = true,
  repeats = 1,
  n = nothing,
  acceleration = ComputationalResources.CPU1{Nothing}(nothing),
  acceleration_resampling = ComputationalResources.CPU1{Nothing}(nothing),
  check_measure = true,
  cache = true,
  compact_history = true,
  logger = nothing)
julia> mach = machine(model, train_X, train_y)
untrained Machine; does not cache data
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args:
    1: Source @410 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
    2: Source @573 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
julia> fit!(mach)
[ Info: Training machine(ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …), …).
[ Info: Attempting to evaluate 9 models.
Evaluating over 9 metamodels: 0%[> ] ETA: N/A
Evaluating over 9 metamodels: 11%[==> ] ETA: 0:01:43
Evaluating over 9 metamodels: 22%[=====> ] ETA: 0:00:46
Evaluating over 9 metamodels: 33%[========> ] ETA: 0:00:26
Evaluating over 9 metamodels: 100%[=========================] Time: 0:00:13
trained Machine; does not cache data
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args:
    1: Source @410 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
    2: Source @573 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
After training the model, we can log the results of the experiment to the database.
model_values = report(mach).history .|> (x -> (x.measure, x.measurement, x.model.max_depth))
for (measure, measurements, max_depth) in model_values
    iteration_id, _ = create_iteration(experiment_id)
    create_parameter(iteration_id, "max_depth", max_depth)
    measures_names = [split(x |> string, "(") |> first for x in measure]
    metrics_at_step = Dict(
        name => value for (name, value) in zip(measures_names, measurements)
    )
    log_metrics(iteration_id, metrics_at_step)
end
Each create_metric or log_metrics call appends to a per-(iteration, key) series. The server auto-assigns step (max(step) + 1) and recorded_at (now()) when you don't pass them, so logging the same key repeatedly forms a chronological time series, which is exactly what a training loop produces over epochs:
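# train_loss and accuracy stand in for your own per-epoch functions;
# iteration_id is the ID of an iteration created with create_iteration.
for epoch in 1:10
    log_metrics(iteration_id, Dict("loss" => train_loss(epoch), "acc" => accuracy(epoch)))
end
Viewing the logged data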
You can retrieve and check the logged data from the database to ensure everything was logged correctly.
julia> iteration = last(get_iterations(experiment_id)) # Checking only the last iteration
DearDiary.Iteration
  ├ id = 9
  ├ experiment_id = 1
  ├ notes = ""
  ├ created_date = 2026-06-06T18:34:58.465
  ├ end_date = nothing
  ├ parent_iteration_id = nothing
  ├ status_id = 1
  ├ error_message = ""
  ├ julia_version = ""
  ├ git_sha = ""
  ├ git_dirty = false
  ├ entrypoint = ""
  ├ project_toml = ""
  └ manifest_toml = ""
julia> get_parameters(iteration.id)
1-element Vector{DearDiary.Parameter}:
 DearDiary.Parameter
  ├ id = 9
  ├ iteration_id = 9
  ├ key = "max_depth"
  └ value = "5"
julia> get_metrics(iteration.id)
4-element Vector{DearDiary.Metric}:
 DearDiary.Metric
  ├ id = 33
  ├ iteration_id = 9
  ├ key = "MisclassificationRate"
  ├ value = 0.049999999999999996
  ├ step = 0
  └ recorded_at = 2026-06-06T18:34:58.465
 DearDiary.Metric
  ├ id = 34
  ├ iteration_id = 9
  ├ key = "Accuracy"
  ├ value = 0.9500000000000001
  ├ step = 0
  └ recorded_at = 2026-06-06T18:34:58.465
 DearDiary.Metric
  ├ id = 35
  ├ iteration_id = 9
  ├ key = "LogLoss"
  ├ value = 1.8021826694558578
  ├ step = 0
  └ recorded_at = 2026-06-06T18:34:58.465
 DearDiary.Metric
  ├ id = 36
  ├ iteration_id = 9
  ├ key = "BrierScore"
  ├ value = -0.09999999999999999
  ├ step = 0
  └ recorded_at = 2026-06-06T18:34:58.465
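When the same key has been logged several times, the records returned for an iteration can be regrouped into one series per key, ordered by step. The sketch below is illustrative only: metric_series is a hypothetical helper, and the stand-in records carry made-up values with the same key, value and step fields that the output above shows for DearDiary.Metric.

```julia
# Stand-in records with the same field names as DearDiary.Metric;
# real code would use get_metrics(iteration.id) instead. Values are made up.
metrics = [
    (key = "loss", value = 0.52, step = 1),
    (key = "Accuracy", value = 0.95, step = 0),
    (key = "loss", value = 0.91, step = 0),
    (key = "loss", value = 0.40, step = 2),
]

# Hypothetical helper: group values by key, ordered by step,
# so that each key maps to its chronological series.
function metric_series(metrics)
    series = Dict{String,Vector{Float64}}()
    for m in sort(metrics; by = m -> m.step)
        push!(get!(series, m.key, Float64[]), m.value)
    end
    return series
end

series = metric_series(metrics)
println(series["loss"])
```

Sorting by step rather than by insertion order keeps the series correct even if the records come back in a different order.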
Save and load the trained model
You can save serialized objects, files, or any other resources related to your experiments.
smach = serializable(mach)
io = IOBuffer()
JLSO.save(io, :machine => smach)
bytes = take!(io)
julia> resource_id, _ = create_resource(experiment_id, "Iris DTC MLJ Machine", bytes)
(id = 1, status = DearDiary.Created)
Then you can load the model back when needed.
julia> resource = get_resource(resource_id)
DearDiary.Resource
  ├ id = 1
  ├ experiment_id = 1
  ├ name = "Iris DTC MLJ Machine"
  ├ description = ""
  ├ data = UInt8[0x5f, 0x51, 0x00, …, 0x75, 0x00, 0x00]
  ├ created_date = 2026-06-06T18:35:07.724
  ├ updated_date = nothing
  ├ backend = "sqlite"
  ├ uri = ""
  ├ size_bytes = 25124
  └ content_hash = "cfc39422b9312715694b170e54d4dfaa84ca88d4978b0dfcf2a2bf12eaa33c01"
The metadata response describes the artifact only. To fetch the raw bytes, use read_resource_data.
io = IOBuffer(read_resource_data(resource_id))
loaded_mach = JLSO.load(io)[:machine]
serializable Machine
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args:
julia> restore!(loaded_mach)
trained Machine; does not cache data
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args:
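The resource metadata shown earlier includes a content_hash field, which suggests an integrity check before deserializing downloaded bytes. The sketch below rests on an assumption the documentation does not confirm: that content_hash is the lowercase hex SHA-256 digest of the raw bytes (its 64 hex characters are consistent with that). matches_hash is a hypothetical helper, and SHA is Julia's standard-library module.

```julia
using SHA  # Julia standard library

# Assumption: content_hash is the lowercase hex SHA-256 digest of the raw
# bytes. The documentation does not confirm this.
function matches_hash(bytes::AbstractVector{UInt8}, expected_hex::AbstractString)
    return bytes2hex(sha256(bytes)) == lowercase(expected_hex)
end

# Stand-in payload; real code would compare read_resource_data(resource_id)
# against resource.content_hash.
payload = Vector{UInt8}("abc")
digest = bytes2hex(sha256(payload))
println(matches_hash(payload, digest))
```

If the assumption holds, matches_hash(read_resource_data(resource_id), resource.content_hash) would return true for an intact artifact.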
Built-in REST API
The library also provides a built-in REST API to allow the outside world to interact with your projects. You can start the API server using the following command:
DearDiary.run(;)
This will start the API server on http://localhost:9000. You can customize the settings by providing a .env file containing the configuration options. For more details, refer to the REST API section of the documentation.
Logging from a remote training script
When the training script runs on a different machine from the server, use the bundled Julia client. Every CRUD verb shown above gains a Client-aware method, and the with_iteration helper auto-finalises an iteration on both success and exception.
using DearDiary
client = DearDiary.connect(
    "http://server.example:9000"; username="alice", password="secret",
)
project_id = create_project(client, "Iris classification")
experiment_id = create_experiment(
    client, project_id, DearDiary.IN_PROGRESS, "Decision tree sweep",
)
with_iteration(client, experiment_id) do iter
    create_parameter(client, iter.id, "max_depth", 4)
    create_metric(client, iter.id, "accuracy", 0.96)
end
See the Client reference for the full list of helpers.
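To illustrate what "auto-finalises on both success and exception" means for a do-block helper such as with_iteration, here is a minimal sketch of the general pattern. with_tracking is a hypothetical stand-in that records events in a plain vector. It is not DearDiary's implementation and makes no claim about how with_iteration works internally.

```julia
# Hypothetical stand-in for a do-block helper: it records a final state
# whether the body returns normally or throws.
function with_tracking(f, log::Vector{String})
    push!(log, "started")
    try
        result = f()
        push!(log, "finished")
        return result
    catch err
        push!(log, "failed: " * sprint(showerror, err))
        rethrow()  # the caller still sees the original exception
    end
end

# Success path: the body's value is returned and the run is marked finished.
ok_log = String[]
value = with_tracking(ok_log) do
    42
end

# Failure path: the run is marked failed, then the exception propagates.
bad_log = String[]
try
    with_tracking(bad_log) do
        error("boom")
    end
catch
    # handled here only so the example keeps running
end

println(ok_log)
println(bad_log)
```

The point of the pattern is that a crashed training run is still marked as ended instead of being left looking as if it were in progress.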
Conclusion
And that's it! You have successfully completed the tutorial and learned how to use the core features of this library. You can now track your machine learning experiments effectively. For more advanced features and options, refer to the rest of the documentation.