Reference

Types
Methods

Types

Finite
Multiclass
OrderedFactor
Infinite
Continuous
Count
Image
ColorImage
GrayImage
Sampleable
Density
Table
Textual

ScientificTypesBase.Finite — Type

Finite{N}

Scientific type for scalar, categorical data taking on one of N possible discrete values, which may or may not have a natural ordering.

Subtypes: Multiclass{N}, OrderedFactor{N}

Aliases: Binary==Finite{2}. Binary data can be unordered (Multiclass{2}) or ordered (OrderedFactor{2}).
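
The relationships between Finite, its subtypes and the Binary alias can be checked directly at the REPL. The following is a minimal sketch; it assumes that loading ScientificTypes makes the types defined in ScientificTypesBase available.

```julia
using ScientificTypes  # assumption: re-exports the ScientificTypesBase types

# `Binary` is an alias, not a separate type
Binary == Finite{2}            # true

# Both kinds of two-class data are subtypes of `Binary`
Multiclass{2} <: Binary        # true
OrderedFactor{2} <: Binary     # true

# The parameter N must match: three-class data is not binary
Multiclass{3} <: Binary        # false
Multiclass{3} <: Finite        # true: `Finite` without a parameter covers every N
```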

See also scitype.

ScientificTypesBase.Multiclass — Type

Multiclass{N}

Scientific type for scalar, categorical data with N possible values but no natural ordering for those classes (nominal data).

Examples: gender, team member, model number, product color, ethnicity, zipcode

Supertype: Finite{N}

See also scitype.

ScientificTypesBase.OrderedFactor — Type

OrderedFactor{N}

Scientific type for scalar, categorical data with N possible values with a natural ordering (ordinal data).

Includes the binary data scientific type OrderedFactor{2}, which applies whenever it is natural to designate a "positive" class, for example by a standard convention (e.g., "is toxic", "is an anomaly", "has the disease"). The "positive" class is the maximal class under the ordering. The distinction matters because it disambiguates statistical metrics such as "number of true positives" and "recall".

Examples: letter grade in an exam, education level, number of stars in a review, safe/toxic, inlier/outlier, rejected/accepted.
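
To see which class ends up as the "positive" one, it can help to inspect the level ordering after coercion. The sketch below assumes that coerce builds a categorical vector whose string levels default to lexicographic order, and uses levels and levels! from CategoricalArrays; the data is made up for illustration.

```julia
using ScientificTypes, CategoricalArrays

# Hypothetical labels, for illustration only
y = coerce(["safe", "toxic", "safe", "safe"], OrderedFactor)

scitype(y)    # AbstractVector{OrderedFactor{2}}
levels(y)     # ["safe", "toxic"]
maximum(y)    # "toxic" is the maximal class, hence the "positive" one

# If the default order does not suit the problem, reorder the levels explicitly
levels!(y, ["toxic", "safe"])
maximum(y)    # now "safe"
```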

Supertype: Finite{N}

See also scitype.

ScientificTypesBase.Infinite — Type

Infinite

Scientific type for scalar data with an intrinsic order, but of unbounded nature, either discrete or continuous.

Subtypes: Continuous, Count

See also scitype.

ScientificTypesBase.Continuous — Type

Continuous

Scientific type for continuous scalar data.

Examples: height, age, blood pressure, weight, temperature.

Supertype: Infinite

See also scitype.

ScientificTypesBase.Count — Type

Count

Scientific type for discrete, ordered data, of unbounded nature.

Examples: number of phone calls per hour, number of building occupants, number of earthquakes per year over 6 on the Richter scale, number of unsaturated carbon-carbon bonds in a molecule.

Supertype: Infinite

See also scitype.

ScientificTypesBase.Image — Type

Image{W,H}

Scientific type for image data, where W is the width and H the height.

Subtypes: GrayImage{W,H}, ColorImage{W,H}

See also scitype.

ScientificTypesBase.ColorImage — Type

ColorImage{W,H}

Scientific type for a color image, where W is the width and H the height.

Supertype: Image{W,H}

See also scitype.

ScientificTypesBase.GrayImage — Type

GrayImage{W,H}

Scientific type for a grey-scale image, where W is the width and H the height.

Supertype: Image{W,H}

See also scitype.

ScientificTypesBase.Sampleable — Type

Sampleable{Ω}

Scientific type for an object, such as a probability distribution, that can be sampled. Each individual sample x will satisfy scitype(x) <: Ω.

Subtype: Density{Ω}

See also scitype.

ScientificTypesBase.Density — Type

Density{Ω}

Scientific type for an object representing a probability density function or probability mass function and, more generally, for any probability measure that is absolutely continuous with respect to some standard measure on the sample space. Elements x of the sample space will satisfy scitype(x) <: Ω. Objects of this type can, at least in principle, be sampled.

Supertype: Sampleable{Ω}
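
The subtype relationship holds parameter by parameter, which matters when writing a constraint that should accept several sample spaces. A small sketch using type comparisons only (no distribution objects are constructed, and it assumes Sampleable and Density are available after loading ScientificTypes):

```julia
using ScientificTypes

# A density over Ω is a sampleable over the same Ω
Density{Continuous} <: Sampleable{Continuous}   # true

# The parameter is invariant: Count and Continuous do not mix
Density{Count} <: Sampleable{Continuous}        # false

# Use `<:` in the parameter to accept a family of sample spaces
Density{Count} <: Sampleable{<:Infinite}        # true
```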

See also scitype.

ScientificTypesBase.Table — Type

Table{K}

Scientific type for tabular data. Here K is the union of the scitypes of the columns (not the union of the element scitypes of the columns).
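
Because K is a union of column scitypes, a constraint on a table is written in terms of vectors rather than elements. The following sketch reuses a small named-tuple table; the subtype patterns are one way to express such constraints, not the only one.

```julia
using ScientificTypes

X = (ncalls = [1, 2, 4], mean_delay = [2.0, 5.7, 6.0])

# K is a union of AbstractVector{...} types, one per distinct column scitype
scitype(X)

# Constrain the columns: every column must hold Infinite data
scitype(X) <: Table{<:AbstractVector{<:Infinite}}     # true

# Fails, because the `ncalls` column has element scitype Count
scitype(X) <: Table{<:AbstractVector{<:Continuous}}   # false
```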

See also scitype.

ScientificTypesBase.Textual — Type

Textual

Scientific type for text data playing some linguistic role, for example in sentiment analysis. This is to be contrasted with text used simply to label classes of a categorical variable; see instead Finite.

Examples: survey questions with discursive answers, text to be translated into a new language, vocabularies, email messages.

See also scitype.

Methods

ScientificTypes.scitype — Function

scitype(X)

Return the scientific type (interpretation) of X, as distinct from its machine type. Atomic scientific types (Continuous, Multiclass, etc.) are mostly abstract types defined in the package ScientificTypesBase.jl. Scientific types do not ordinarily have instances.

Note

Third-party packages may extend the behavior of scitype: objects that previously had Unknown scitype may no longer do so.

To display the active scientific type hierarchy (excluding Missing and Nothing), call scitype().

Examples

julia> scitype(3.14)
Continuous

julia> scitype([1, 2, missing])
AbstractVector{Union{Missing, Count}}

julia> scitype((5, "beige"))
Tuple{Count, Textual}

julia> using CategoricalArrays

julia> table = (gender = categorical(['M', 'M', 'F', 'M', 'F']),
     ndevices = [1, 3, 2, 3, 2])

julia> scitype(table)
Table{Union{AbstractVector{Count}, AbstractVector{Multiclass{2}}}}

Column scitypes of a table can also be inspected with schema.

The behavior of scitype is detailed in the ScientificTypes documentation. Key features of the default behavior are:

Any AbstractFloat has scitype Continuous <: Infinite.
Any Integer has scitype Count <: Infinite.
Any CategoricalValue x has scitype Multiclass <: Finite or OrderedFactor <: Finite, depending on the value of isordered(x).
Strings and Chars do not have scitype Multiclass or OrderedFactor; they have scitypes Textual and Unknown respectively.
The scientific types of nothing and missing are Nothing and Missing, Julia types that are also regarded as scientific.
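
The default behavior listed above can be spot-checked on scalar values. This is a sketch with made-up inputs; the categorical vectors are built with categorical from CategoricalArrays.

```julia
using ScientificTypes, CategoricalArrays

scitype(1.5)        # Continuous
scitype(7)          # Count
scitype("beige")    # Textual
scitype('M')        # Unknown
scitype(missing)    # Missing
scitype(nothing)    # Nothing

# For categorical values the scitype follows `isordered`
u = categorical(["a", "b", "a"])                 # unordered
o = categorical(["a", "b", "a"], ordered=true)   # ordered
scitype(u[1])       # Multiclass{2}
scitype(o[1])       # OrderedFactor{2}
```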

See also coerce, autotype, schema.

scitype(; io=stdout)

Print to io the scitype hierarchy, beginning at Found (and so excluding Missing and Nothing).

Note that third-party packages can extend the hierarchy, so the output is not static.

ScientificTypes.schema — Function

schema(X)

Inspect the column types and scitypes of a tabular object. Returns nothing if the column types and/or scitypes can't be inspected.

Example

X = (ncalls=[1, 2, 4], mean_delay=[2.0, 5.7, 6.0])
schema(X)
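
The returned object can also be queried programmatically. The sketch below reads the scitypes property used in the coerce examples, plus names and types, which are assumed here to be available in the same way.

```julia
using ScientificTypes

X = (ncalls = [1, 2, 4], mean_delay = [2.0, 5.7, 6.0])
s = schema(X)

s.names       # (:ncalls, :mean_delay)
s.scitypes    # (Count, Continuous)
s.types       # machine types, e.g. (Int64, Float64) on a 64-bit system
```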

ScientificTypes.coerce — Function

coerce(A, S)

Return a new version of the array A whose scientific element type is S.

julia> v = coerce([3, 7, 5], Continuous)
3-element Vector{Float64}:
 3.0
 7.0
 5.0

julia> scitype(v)
AbstractVector{Continuous}

coerce(X, specs...; tight=false, verbosity=1)

Given a table X, return a copy of X, ensuring that the element scitypes of the columns match the new specification, specs. There are three valid specifications:

(i) one or more column_name=>Scitype pairs:

coerce(X, col1=>Scitype1, col2=>Scitype2, ... ; verbosity=1)

(ii) one or more OldScitype=>NewScitype pairs (OldScitype covering both the OldScitype and Union{Missing,OldScitype} cases):

coerce(X, OldScitype1=>NewScitype1, OldScitype2=>NewScitype2, ... ; verbosity=1)

(iii) a dictionary of scientific types keyed on column names:

coerce(X, d::AbstractDict{<:ColKey, <:Type}; verbosity=1)

where ColKey = Union{Symbol,AbstractString}.

Examples

Specifying column_name=>Scitype pairs:

using ScientificTypes, CategoricalArrays, DataFrames, Tables
X = DataFrame(name=["Siri", "Robo", "Alexa", "Cortana"],
              height=[152, missing, 148, 163],
              rating=[1, 5, 2, 1])
Xc = coerce(X, :name=>Multiclass, :height=>Continuous, :rating=>OrderedFactor)
# height contains a missing value, so its element scitype keeps the Missing part
schema(Xc).scitypes # (Multiclass{4}, Union{Missing, Continuous}, OrderedFactor{3})

Specifying OldScitype=>NewScitype pairs:

X  = (x = [1, 2, 3],
      y = rand(3),
      z = [10, 20, 30])
Xc = coerce(X, Count=>Continuous)
schema(Xc).scitypes # (Continuous, Continuous, Continuous)
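
Specifying a dictionary keyed on column names, form (iii), is not covered by the examples above. A minimal sketch, with a made-up table and an arbitrary choice of target scitypes:

```julia
using ScientificTypes

X = (x = [1, 2, 3],
     y = rand(3),
     z = [10, 20, 30])

# Keys are column names; columns not mentioned are left unchanged
spec = Dict(:x => Continuous, :z => OrderedFactor)
Xc = coerce(X, spec)
schema(Xc).scitypes # (Continuous, Continuous, OrderedFactor{3})
```

The dictionary form is convenient when the specification is computed rather than typed by hand, for instance when it comes from autotype.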

coerce(image::AbstractArray{<:Real, N}, I)

Given an array called image representing one or more images, return a transformed version of the data so as to enforce an appropriate scientific interpretation I:

single or collection?	N	I	scitype of result
single	2	GrayImage	GrayImage{W,H}
single	3	ColorImage	ColorImage{W,H}
collection	3	GrayImage	AbstractVector{<:GrayImage}
collection	4 (W x H x {1} x C)	GrayImage	AbstractVector{<:GrayImage}
collection	4	ColorImage	AbstractVector{<:ColorImage}

imgs = rand(10, 10, 3, 5)
v = coerce(imgs, ColorImage)

julia> typeof(v)
Vector{Matrix{ColorTypes.RGB{Float64}}}

julia> scitype(v)
AbstractVector{ColorImage{10, 10}}

ScientificTypes.autotype — Function

autotype(X; kw...)

Return a dictionary of suggested scitypes for each column of X, which may be a table or an array, based on a set of rules.

Kwargs

only_changes=true: if true, return only a dictionary of the names for which applying autotype differs from just using the ambient convention. When coercing with autotype, only_changes should be true.
rules=(:few_to_finite,): the set of rules to apply.
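
A typical workflow passes the suggestions straight to coerce. The sketch below uses a made-up table and the default keyword values. Which Finite subtype is suggested for a given column depends on the rule's heuristics, so no specific output is shown.

```julia
using ScientificTypes

# Hypothetical data: `flag` has only 2 distinct values across 10 rows
X = (flag = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1],
     x    = rand(10))

suggestions = autotype(X)      # only_changes=true, rules=(:few_to_finite,)
Xc = coerce(X, suggestions)    # dictionary specification, form (iii) of coerce
schema(Xc).scitypes
```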
