Lab 2

To ensure code in this tutorial runs as shown, download the tutorial project folder and follow these instructions.

If you have questions or suggestions about this tutorial, please open an issue here.

Basic commands

This is a very brief and rough primer if you're new to Julia and wondering how to do simple things that are relevant for data analysis.

Defining a vector

x = [1, 3, 2, 5]
@show x
@show length(x)
x = [1, 3, 2, 5]
length(x) = 4

Operations between vectors

y = [4, 5, 6, 1]
z = x .+ y # elementwise operation
4-element Vector{Int64}:
 5
 8
 8
 6
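As a small sketch of how the dot syntax extends to other operators (the vectors are the ones defined above, repeated so the snippet runs on its own):

```julia
x = [1, 3, 2, 5]
y = [4, 5, 6, 1]

# Dotted operators apply the operation to each pair of elements
@show x .- y       # elementwise difference
@show x .* y       # elementwise product
@show x .+ 10      # a scalar is broadcast against every element

# Without the dot, * between two vectors is not defined;
# summing the elementwise product gives the inner product
@show sum(x .* y)
```

The undotted `x + y` also works for vectors of equal length, but the dotted form is the one that generalises to scalars and to arbitrary functions.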

Defining a matrix

X = [1  2; 3 4]
2×2 Matrix{Int64}:
 1  2
 3  4

You can also build a matrix from a vector:

X = reshape([1, 2, 3, 4], 2, 2)
2×2 Matrix{Int64}:
 1  3
 2  4

Be careful, though: reshape fills the matrix column by column, so to get the same result as before you need to permute the dimensions:

X = permutedims(reshape([1, 2, 3, 4], 2, 2))
2×2 Matrix{Int64}:
 1  2
 3  4
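The column-by-column fill can be checked directly. The following is a minimal sketch using only Base functions; the variable names `v`, `A` and `B` are chosen here for illustration.

```julia
v = [1, 2, 3, 4]

# reshape fills column by column (column-major order)
A = reshape(v, 2, 2)

# vec flattens in the same column-major order, so it recovers v
@show vec(A) == v

# permutedims swaps rows and columns, giving the row-by-row layout
B = permutedims(A)
@show B == [1 2; 3 4]

# reshape shares memory with v, so changing v also changes A ...
v[1] = 100
@show A[1, 1]
# ... while permutedims allocated a new matrix, so B is unaffected
@show B[1, 1]
```

The memory sharing is worth keeping in mind: if you need an independent matrix, call `copy` on the reshaped result.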

Function calls can be chained with the |> (pipe) operator, which passes the value on its left to the function on its right, so the above can also be written:

X = reshape([1,2,3,4], 2, 2) |> permutedims
2×2 Matrix{Int64}:
 1  2
 3  4

You don't have to do that, of course, but we will sometimes use it in these tutorials.
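A short sketch of how pipes compose, assuming nothing beyond Base. When a step needs extra arguments, one option is to wrap it in an anonymous function, which must be parenthesised so the pipe does not get absorbed into its body.

```julia
v = [1, 2, 3, 4]

# f(x) and x |> f are equivalent
@show (v |> sum) == sum(v)

# Pipes chain left to right; the anonymous function supplies the extra arguments
X = v |> (a -> reshape(a, 2, 2)) |> permutedims
@show X

# The same computation written as nested calls
@show X == permutedims(reshape(v, 2, 2))
```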

There's a wealth of functions available for simple math operations

x = 4
@show x^2
@show sqrt(x)
x ^ 2 = 16
sqrt(x) = 2.0

Elementwise operations on a collection can be done with the dot syntax:

sqrt.([4, 9, 16])
3-element Vector{Float64}:
 2.0
 3.0
 4.0
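The dot syntax is not limited to built-in functions. As an illustration, `standardize` below is a hypothetical helper defined only for this example:

```julia
# A hypothetical helper, defined only for this illustration
standardize(v, m, s) = (v - m) / s

data = [4.0, 9.0, 16.0]

# A dot after the function name applies it to every element;
# the scalars 9.0 and 2.0 are reused for each one
scaled = standardize.(data, 9.0, 2.0)
@show scaled

# The @. macro adds dots to every call and operator in the expression
shifted = @. sqrt(data) + 1
@show shifted
```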

The packages Statistics (from the standard library) and StatsBase offer a number of useful functions for statistics:

using Statistics, StatsBase

Note that if you don't have StatsBase, you can add it by running using Pkg; Pkg.add("StatsBase"). Right, let's now compute some simple statistics:

x = randn(1_000) # 1_000 points iid from a N(0, 1)
μ = mean(x)
σ = std(x)
@show (μ, σ)
(μ, σ) = (0.01753762456170407, 0.9707741088066123)
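A few more summary statistics, sketched with the standard-library Statistics package only. The seed is an arbitrary choice made here so that repeated runs give the same draw; the values you see will still differ from the output above.

```julia
using Statistics, Random

Random.seed!(42)                  # arbitrary seed, only for reproducibility
x = randn(1_000)

@show median(x)
@show var(x)                      # sample variance, equal to std(x)^2
@show quantile(x, [0.25, 0.75])   # first and third quartiles
@show extrema(x)                  # (minimum, maximum)
```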

Indexing starts at 1; use : to select the full range along a dimension:

X = [1 2; 3 4; 5 6]
@show X[1, 2]
@show X[:, 1]
@show X[1, :]
@show X[[1, 2], [1, 2]]
X[1, 2] = 2
X[:, 1] = [1, 3, 5]
X[1, :] = [1, 2]
X[[1, 2], [1, 2]] = [1 2; 3 4]
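A few further indexing patterns that come up often in data analysis, as a sketch on the same matrix (redefined here so the snippet is self-contained):

```julia
X = [1 2; 3 4; 5 6]

@show X[end, 1]            # end refers to the last index along that dimension
@show X[2:3, :]            # a range selects consecutive rows
@show X[X[:, 1] .> 1, :]   # a Boolean mask keeps rows whose first entry exceeds 1

# Slices such as X[:, 1] are copies; @view gives a window onto X instead
col = @view X[:, 1]
col[1] = 10
@show X[1, 1]              # X was modified through the view
```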

size gives the dimensions as a tuple (nrows, ncolumns):

size(X)
(3, 2)
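Related queries, shown as a brief sketch; the names `nrows` and `ncols` are just illustrative variable names.

```julia
X = [1 2; 3 4; 5 6]

nrows, ncols = size(X)   # unpack the tuple into two variables
@show size(X, 1)         # number of rows only
@show size(X, 2)         # number of columns only
@show length(X)          # total number of elements
@show ndims(X)           # number of dimensions
```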

Loading data

There are many ways to load data in Julia; a convenient one is the CSV package.

using CSV

Many datasets are available via the RDatasets package

using RDatasets

And finally, the DataFrames package makes it easy to manipulate data:

using DataFrames
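As a minimal sketch of how CSV and DataFrames fit together, the snippet below writes a tiny file with made-up values to a temporary directory and reads it back. The file name `cars.csv` and its contents are assumptions for illustration only, and both packages are assumed to be installed.

```julia
using CSV, DataFrames

# Write a tiny CSV file to a temporary location (made-up values)
path = joinpath(mktempdir(), "cars.csv")
write(path, "name,mpg\nfoo,18.0\nbar,15.5\n")

# CSV.read parses the file and materialises it as the requested table type
df = CSV.read(path, DataFrame)
@show size(df)
@show names(df)

# CSV.write saves a table back to disk
CSV.write(path, df)
```

With a file of your own, only the `path` needs to change.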

Let's load some data from RDatasets (the full list of datasets is available here).

auto = dataset("ISLR", "Auto")
first(auto, 3)
3×9 DataFrame
 Row │ MPG      Cylinders  Displacement  Horsepower  Weight   Acceleration  Year     Origin   Name
     │ Float64  Float64    Float64       Float64     Float64  Float64       Float64  Float64  String
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │    18.0        8.0         307.0       130.0   3504.0          12.0     70.0      1.0  chevrolet chevelle malibu
   2 │    15.0        8.0         350.0       165.0   3693.0          11.5     70.0      1.0  buick skylark 320
   3 │    18.0        8.0         318.0       150.0   3436.0          11.0     70.0      1.0  plymouth satellite

The describe function gives a quick summary of each column:

describe(auto, :mean, :median, :std)
9×4 DataFrame
 Row │ variable      mean     median  std
     │ Symbol        Union…   Union…  Union…
─────┼─────────────────────────────────────────
   1 │ MPG           23.4459  22.75   7.80501
   2 │ Cylinders     5.47194  4.0     1.70578
   3 │ Displacement  194.412  151.0   104.644
   4 │ Horsepower    104.469  93.5    38.4912
   5 │ Weight        2977.58  2803.5  849.403
   6 │ Acceleration  15.5413  15.5    2.75886
   7 │ Year          75.9796  76.0    3.68374
   8 │ Origin        1.57653  1.0     0.805518
   9 │ Name

To retrieve column names, you can use names:

names(auto)
9-element Vector{String}:
 "MPG"
 "Cylinders"
 "Displacement"
 "Horsepower"
 "Weight"
 "Acceleration"
 "Year"
 "Origin"
 "Name"

Accessing columns can be done in different ways:

mpg = auto.MPG
mpg = auto[:, 1]
mpg = auto[:, :MPG]
mpg |> mean
23.44591836734694
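These access forms differ in whether they copy the column. The sketch below uses a small made-up table in place of `auto` so that it runs without RDatasets; it assumes the DataFrames package is installed.

```julia
using DataFrames

# A small made-up table standing in for `auto`
df = DataFrame(MPG = [18.0, 15.0, 18.0], Name = ["a", "b", "c"])

col_ref  = df.MPG        # the stored column itself, no copy
col_bang = df[!, :MPG]   # same object as df.MPG
col_copy = df[:, :MPG]   # a fresh copy of the column

col_copy[1] = 0.0        # modifying the copy leaves df unchanged
@show df.MPG[1]

col_ref[1] = 99.0        # modifying the stored column changes df
@show df.MPG[1]
```

Use the copying form when you plan to modify the vector and want the original table left intact.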

To get the dimensions, you can use size, nrow and ncol:

@show size(auto)
@show nrow(auto)
@show ncol(auto)
size(auto) = (392, 9)
nrow(auto) = 392
ncol(auto) = 9

For more detailed tutorials on basic data wrangling in Julia, consider

Plotting data

Several libraries can be used for plotting in Julia. In these tutorials we use PyPlot, but you could of course use another package.

using PyPlot

figure(figsize=(8,6))
plot(mpg)
[Figure: Simple plot]