Lab 2
This is a very brief and rough primer if you're new to Julia and wondering how to do simple things that are relevant for data analysis.
Defining a vector
x = [1, 3, 2, 5]
@show x
@show length(x)
x = [1, 3, 2, 5]
length(x) = 4
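A few more ways to build and grow vectors. This is a small sketch that uses only functions from Base (eltype, push!, collect, zeros), and the values are arbitrary:

```julia
x = [1, 3, 2, 5]
@show eltype(x)         # Int64 on a 64-bit machine, inferred from the integer literals
push!(x, 7)             # appends in place; x is now [1, 3, 2, 5, 7]
r = collect(1:2:9)      # turns the range 1, 3, ..., 9 into a vector: [1, 3, 5, 7, 9]
w = zeros(3)            # a vector of three Float64 zeros
```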
Operations between vectors
y = [4, 5, 6, 1]
z = x .+ y # elementwise operation
4-element Vector{Int64}:
5
8
8
6
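The dot also works with other operators and between a vector and a scalar. This sketch reuses the x and y from above. dot comes from the LinearAlgebra standard library:

```julia
using LinearAlgebra

x = [1, 3, 2, 5]
y = [4, 5, 6, 1]
p = x .* y          # elementwise product: [4, 15, 12, 5]
s = x .+ 1          # the scalar 1 is added to every element: [2, 4, 3, 6]
d1 = sum(x .* y)    # inner product built from the elementwise product: 36
d2 = dot(x, y)      # the same inner product via LinearAlgebra: 36
```

Without the dot, x * y between two vectors is not elementwise and throws a MethodError.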
Defining a matrix
X = [1 2; 3 4]
2×2 Matrix{Int64}:
1 2
3 4
You can also build a matrix from a vector using reshape:
X = reshape([1, 2, 3, 4], 2, 2)
2×2 Matrix{Int64}:
1 3
2 4
Be careful, though: reshape fills the matrix column by column (Julia arrays are stored in column-major order), so to get the same result as before you need to permute the dimensions:
X = permutedims(reshape([1, 2, 3, 4], 2, 2))
2×2 Matrix{Int64}:
1 2
3 4
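The same trick works for non-square shapes. To fill an m×n matrix row by row, you can reshape to n×m and then permute. A small sketch with arbitrary values:

```julia
v = [1, 2, 3, 4, 5, 6]

A = reshape(v, 2, 3)                  # column-major fill: [1 3 5; 2 4 6]
B = permutedims(reshape(v, 3, 2))     # row-major fill:    [1 2 3; 4 5 6]

vec(A) == v                           # true: vec reads the elements back in column order
```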
Function calls can be chained with the pipe operator |>, so the above can also be written as:
X = reshape([1,2,3,4], 2, 2) |> permutedims
2×2 Matrix{Int64}:
1 2
3 4
You don't have to write code this way, but we will sometimes use it in these tutorials.
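Since x |> f is simply f(x), a function that needs extra arguments has to be wrapped in an anonymous function before it can be piped. A minimal sketch with the same values as above:

```julia
X1 = permutedims(reshape([1, 2, 3, 4], 2, 2))                 # nested calls
X2 = [1, 2, 3, 4] |> (v -> reshape(v, 2, 2)) |> permutedims   # the same steps, piped
total = [1, 2, 3, 4] |> sum                                    # same as sum([1, 2, 3, 4]), i.e. 10
```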
Julia offers a wealth of functions for simple math operations:
x = 4
@show x^2
@show sqrt(x)
x ^ 2 = 16
sqrt(x) = 2.0
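A few more everyday operations, all from Base. Note that / between integers always returns a floating-point number, while div and % give integer division and remainder:

```julia
x = 4
@show x / 2      # 2.0
@show div(x, 3)  # 1
@show x % 3      # 1
@show abs(-x)    # 4
@show exp(0)     # 1.0
@show log(1)     # 0.0, natural logarithm
```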
Elementwise operations on a collection can be done with the dot syntax:
sqrt.([4, 9, 16])
3-element Vector{Float64}:
2.0
3.0
4.0
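The dot works with any function, including your own, and the @. macro adds a dot to every call and operator in an expression. In this sketch f is a hypothetical helper defined only for the illustration:

```julia
f(t) = t^2 + 1          # hypothetical helper, defined only for this example
v = [1.0, 2.0, 3.0]
a = f.(v)               # [2.0, 5.0, 10.0]
b = @. 2v + 1           # same as 2 .* v .+ 1: [3.0, 5.0, 7.0]
```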
The packages Statistics (from the standard library) and StatsBase offer a number of useful functions for statistics:
using Statistics, StatsBase
Note that if you don't have StatsBase installed, you can add it with:
using Pkg; Pkg.add("StatsBase")
Now let's compute some simple statistics:
x = randn(1_000) # 1_000 points iid from a N(0, 1)
μ = mean(x)
σ = std(x)
@show (μ, σ)
(μ, σ) = (0.021443736346958762, 0.9845672909964411)
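The numbers above change on every run because randn draws fresh random values. The sketch below assumes you want reproducible draws (the seed 123 is an arbitrary choice, so your values will not match the output above) and tries a few more functions from Statistics and StatsBase on a small hand-made vector whose results are easy to check:

```julia
using Random, Statistics, StatsBase

Random.seed!(123)                    # arbitrary seed: randn now gives the same draws on each run
x = randn(1_000)

y = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]   # small vector so the results can be checked by hand
@show median(y)                      # 4.5, the average of the two middle values
@show var(y)                         # 37/6 ≈ 6.1667, sample variance (divides by n - 1)
@show var(y; corrected=false)        # 185/36 ≈ 5.1389, divides by n instead
@show mode(y)                        # 4.0, the most frequent value (StatsBase)
@show countmap(y)                    # occurrences of each value (StatsBase)
```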
Indexing starts at 1; use : to select the full range along a dimension:
X = [1 2; 3 4; 5 6]
@show X[1, 2]
@show X[:, 1]
@show X[1, :]
@show X[[1, 2], [1, 2]]
X[1, 2] = 2
X[:, 1] = [1, 3, 5]
X[1, :] = [1, 2]
X[[1, 2], [1, 2]] = [1 2; 3 4]
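A few more indexing patterns, sketched on the same X: end refers to the last index along a dimension, ranges select contiguous slices, and a Boolean mask selects the matching elements (returned in column-major order):

```julia
X = [1 2; 3 4; 5 6]
@show X[end, :]     # last row: [5, 6]
@show X[2:end, 1]   # rows 2 to 3 of column 1: [3, 5]
@show X[X .> 3]     # elements greater than 3, in column order: [5, 4, 6]
X[1, 1] = 10        # indexing on the left-hand side modifies X in place
```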
size gives the dimensions as (number of rows, number of columns):
size(X)
(3, 2)
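size can also return a single dimension, while length counts all the elements. A short sketch on the same X:

```julia
X = [1 2; 3 4; 5 6]
@show size(X, 1)   # 3, number of rows
@show size(X, 2)   # 2, number of columns
@show length(X)    # 6, total number of elements
@show ndims(X)     # 2, a matrix has two dimensions
```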
There are many ways to load data in Julia; a convenient one is the CSV package:
using CSV
Many datasets are available via the RDatasets package:
using RDatasets
Finally, the DataFrames package makes it easy to manipulate tabular data:
using DataFrames
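To read your own data from disk, CSV.read takes a file path and the table type to build. The sketch below first writes a tiny file to a temporary path so that it runs anywhere; the column names and values in that file are made up for the illustration:

```julia
using CSV, DataFrames

# Write a small, made-up CSV file to a temporary location
path = tempname() * ".csv"
write(path, "mpg,cylinders\n18.0,8\n15.0,8\n24.0,4\n")

df = CSV.read(path, DataFrame)   # parse the file into a DataFrame
```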
Let's load some data from RDatasets (you can list the available datasets with RDatasets.datasets()).
auto = dataset("ISLR", "Auto")
first(auto, 3)
3×9 DataFrame
Row │ MPG Cylinders Displacement Horsepower Weight Acceleration Year Origin Name
│ Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 String
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 18.0 8.0 307.0 130.0 3504.0 12.0 70.0 1.0 chevrolet chevelle malibu
2 │ 15.0 8.0 350.0 165.0 3693.0 11.5 70.0 1.0 buick skylark 320
3 │ 18.0 8.0 318.0 150.0 3436.0 11.0 70.0 1.0 plymouth satellite
The describe function gives a quick summary of the data:
describe(auto, :mean, :median, :std)
9×4 DataFrame
Row │ variable mean median std
│ Symbol Union… Union… Union…
─────┼─────────────────────────────────────────
1 │ MPG 23.4459 22.75 7.80501
2 │ Cylinders 5.47194 4.0 1.70578
3 │ Displacement 194.412 151.0 104.644
4 │ Horsepower 104.469 93.5 38.4912
5 │ Weight 2977.58 2803.5 849.403
6 │ Acceleration 15.5413 15.5 2.75886
7 │ Year 75.9796 76.0 3.68374
8 │ Origin 1.57653 1.0 0.805518
9 │ Name
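describe also accepts other summary statistics, for example :min, :max and :nmissing (the number of missing values). A sketch on a small made-up DataFrame, so it does not depend on RDatasets:

```julia
using DataFrames

df = DataFrame(MPG = [18.0, 15.0, 24.0], Cylinders = [8, 8, 4])
d = describe(df, :min, :max, :nmissing)   # one row per column of df
```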
To retrieve the column names, you can use names:
names(auto)
9-element Vector{String}:
"MPG"
"Cylinders"
"Displacement"
"Horsepower"
"Weight"
"Acceleration"
"Year"
"Origin"
"Name"
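Accessing columns can be done in different ways:
mpg = auto.MPG        # by name, as a property
mpg = auto[:, 1]      # by column position (returns a copy)
mpg = auto[:, :MPG]   # by name, as a Symbol (returns a copy)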
mpg |> mean
23.44591836734694
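These ways of accessing a column differ in one respect worth knowing: df.MPG and df[!, :MPG] return the stored column itself, while df[:, :MPG] returns a copy, so modifying the copy leaves the DataFrame unchanged. A sketch on a small made-up DataFrame:

```julia
using DataFrames

df = DataFrame(MPG = [18.0, 15.0, 24.0], Cylinders = [8, 8, 4])

a = df.MPG        # the stored column, no copy
b = df[!, :MPG]   # also the stored column, no copy
c = df[:, :MPG]   # a fresh copy of the column
c[1] = 0.0        # changes the copy only; df.MPG[1] is still 18.0
```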
To get the dimensions, you can use size, nrow and ncol:
@show size(auto)
@show nrow(auto)
@show ncol(auto)
size(auto) = (392, 9)
nrow(auto) = 392
ncol(auto) = 9
For more detailed tutorials on Julia basics, consider the Learn X in Y Minutes tutorial for Julia.
There are multiple libraries that can be used to plot things in Julia:
Plots.jl, which supports multiple plotting backends;
Gadfly.jl, influenced by the grammar of graphics and ggplot2;
PyPlot.jl, essentially a Julia interface to Python's matplotlib;
PGFPlotsX.jl and PGFPlots, which use the LaTeX package pgfplots.
In these tutorials we use Plots.jl, but you could of course use another package.
using Plots
plot(mpg, size=(800,600), linewidth=2, legend=false)
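plot draws the values against their index as a line. To look at the distribution of a single variable, or at the relation between two variables, histogram and scatter from Plots.jl are often more informative. A sketch on synthetic data (the seed and the linear relation are arbitrary choices):

```julia
using Plots, Random

Random.seed!(1)
x = randn(200)
y = 2 .* x .+ 0.5 .* randn(200)   # synthetic data: a linear trend plus noise

p1 = histogram(x, bins=20, legend=false, xlabel="x", ylabel="count")
p2 = scatter(x, y, legend=false, xlabel="x", ylabel="y")
plot(p1, p2, layout=(1, 2), size=(800, 300))   # both panels side by side
```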