Reference

Types

ScientificTypesBase.Finite - Type
Finite{N}

Scientific type for scalar, categorical data taking on one of N possible discrete values, which may or may not have a natural ordering.

Subtypes: Multiclass{N}, OrderedFactor{N}

Aliases: Binary==Finite{2}. Binary data can be unordered (Multiclass{2}) or ordered (OrderedFactor{2}).

See also scitype.
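
The relationships listed above can be checked directly with Julia's subtype operator. This is a minimal sketch, assuming ScientificTypes is installed and re-exports these types from ScientificTypesBase; the variable names are illustrative only.

```julia
using ScientificTypes  # assumed to re-export Finite, Multiclass, OrderedFactor, Binary

# Both categorical scitypes with N classes sit under Finite{N}.
unordered_is_finite = Multiclass{3} <: Finite{3}
ordered_is_finite = OrderedFactor{3} <: Finite{3}

# Binary is an alias for Finite{2}, so it covers both binary flavours.
binary_is_alias = Binary == Finite{2}
binary_covers_both = Multiclass{2} <: Binary && OrderedFactor{2} <: Binary

# The class count is part of the type, so a mismatched N is not a subtype.
mismatched = Multiclass{3} <: Finite{2}

@show unordered_is_finite ordered_is_finite binary_is_alias binary_covers_both mismatched
```

All checks except the last should print true; the last is false because N differs.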

ScientificTypesBase.Multiclass - Type
Multiclass{N}

Scientific type for scalar, categorical data with N possible values but no natural ordering for those classes (nominal data).

Examples: gender, team member, model number, product color, ethnicity, zipcode

Supertype: Finite{N}

See also scitype.

ScientificTypesBase.OrderedFactor - Type
OrderedFactor{N}

Scientific type for scalar, categorical data with N possible values with a natural ordering (ordinal data).

Includes the binary data scientific type OrderedFactor{2}, applying whenever it is natural to assign a "positive" class, for example, by a standard convention (e.g., "is toxic", "is an anomaly", "has the disease"). The "positive" class is the maximal class under the ordering. The distinction is important to disambiguate statistical metrics such as "number of true positives", "recall", etc.

Examples: letter grade in an exam, education level, number of stars in a review, safe/toxic, inlier/outlier, rejected/accepted.

Supertype: Finite{N}

See also scitype.
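
To illustrate the "positive class is the maximal class" convention, here is a small sketch using CategoricalArrays, where the ordering is given by the level order. It assumes the default level order is the sorted order of the labels; the data and variable names are made up for illustration.

```julia
using ScientificTypes, CategoricalArrays

# Ordered categorical vector; with sorted default levels, "safe" < "toxic".
y = categorical(["safe", "toxic", "safe", "safe"], ordered=true)
y_scitype = scitype(y)             # AbstractVector{OrderedFactor{2}}
positive_y = levels(y)[end]        # the maximal class, read as the "positive" class

# Sorted order would put "rejected" last; reorder so "accepted" is maximal.
z = categorical(["accepted", "rejected", "accepted"], ordered=true)
levels!(z, ["rejected", "accepted"])
positive_z = levels(z)[end]

@show y_scitype positive_y positive_z
```

Setting the level order explicitly, rather than relying on the default, makes the choice of positive class visible to anyone reading the code.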

ScientificTypesBase.Count - Type
Count

Scientific type for discrete, ordered data, of unbounded nature.

Examples: number of phone calls per hour, number of building occupants, number of earthquakes per year over 6 on the Richter scale, number of unsaturated carbon-carbon bonds in a molecule.

Supertype: Infinite

See also scitype.
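
As a brief illustration, under the default convention integer data is interpreted as Count. The sketch below uses made-up hourly call counts and assumes only that ScientificTypes is loaded.

```julia
using ScientificTypes

calls_per_hour = [3, 0, 7, 12]   # hypothetical data

element_scitype = scitype(calls_per_hour[1])   # Count
vector_scitype = scitype(calls_per_hour)       # AbstractVector{Count}
count_is_infinite = Count <: Infinite          # true

@show element_scitype vector_scitype count_is_infinite
```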

ScientificTypesBase.Sampleable - Type
Sampleable{Ω}

Scientific type for an object, such as a probability distribution, that can be sampled. Each individual sample x will satisfy scitype(x) <: Ω.

Subtype: Density{Ω}

See also scitype.

ScientificTypesBase.Density - Type
Density{Ω}

Scientific type for an object representing a probability density function or probability mass function, and more generally, for any probability measure that is absolutely continuous with respect to some standard measure on the sample space. Elements x of the sample space will satisfy scitype(x) <: Ω. Objects of this type can, at least in principle, be sampled.

Supertype: Sampleable{Ω}

See also scitype.
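
The supertype relationship can be checked without constructing any distribution object. The sketch below only compares the abstract types, assuming they are exported by ScientificTypes; it does not show how a particular distribution package assigns these scitypes.

```julia
using ScientificTypes

# A density over continuous samples is, in particular, sampleable.
density_is_sampleable = Density{Continuous} <: Sampleable{Continuous}

# The sample-space parameter must match exactly (type parameters are invariant).
mixed_parameters = Density{Count} <: Sampleable{Continuous}

@show density_is_sampleable mixed_parameters
```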

ScientificTypesBase.Table - Type
Table{K}

Scientific type for tabular data. Here K is the union of the scitypes of the columns (not the union of the element scitypes of the columns).

See also scitype.
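
The following sketch shows what K looks like for a small column table, and how a table scitype can be tested against a set of allowed element scitypes. It assumes the Table(...) helper constructor from ScientificTypesBase is available; the table itself is invented for illustration.

```julia
using ScientificTypes

# A named tuple of vectors is treated as a column table.
X = (x = [1.0, 2.0, 3.0], y = [1, 2, 3])

table_scitype = scitype(X)
# K is a union of column scitypes, i.e. of AbstractVector{...} types.
expected = Table{Union{AbstractVector{Continuous}, AbstractVector{Count}}}

# Table(S1, S2, ...) is assumed to match tables whose columns all have
# element scitype under one of S1, S2, ...
only_numeric = table_scitype <: Table(Continuous, Count)
only_continuous = table_scitype <: Table(Continuous)

@show table_scitype only_numeric only_continuous
```

Here only_continuous is false because the y column has element scitype Count.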

ScientificTypesBase.Textual - Type
Textual

Scientific type for text data playing some linguistic role, for example in sentiment analysis. This is to be contrasted with text used simply to label classes of a categorical variable; see instead Finite.

Examples: survey questions with discursive answers, text to be translated into a new language, vocabularies, email messages.

See also scitype.
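
The contrast between linguistic text and class labels can be sketched as follows. Strings are interpreted as Textual by default; when the strings are really labels, coercing to Multiclass is one way to record that. The example data is hypothetical.

```julia
using ScientificTypes

review = "The film was quietly brilliant."
review_scitype = scitype(review)             # Textual

# Strings used only as class labels are still Textual by default ...
labels = ["spam", "ham", "spam"]
labels_scitype = scitype(labels)             # AbstractVector{Textual}

# ... so they need an explicit coercion to be treated as categorical.
as_classes = coerce(labels, Multiclass)
classes_scitype = scitype(as_classes)        # AbstractVector{Multiclass{2}}

@show review_scitype labels_scitype classes_scitype
```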


Methods

ScientificTypes.scitype - Function
scitype(X)

Return the scientific type (interpretation) of X, as distinct from its machine type. Atomic scientific types (Continuous, Multiclass, etc.) are mostly abstract types defined in the package ScientificTypesBase.jl. Scientific types do not ordinarily have instances.

Note

Third-party packages may extend the behavior of scitype: objects previously having Unknown scitype may no longer do so.

To display the active scientific type hierarchy (excluding Missing and Nothing), call scitype().

Examples

julia> scitype(3.14)
Continuous

julia> scitype([1, 2, missing])
AbstractVector{Union{Missing, Count}}

julia> scitype((5, "beige"))
Tuple{Count, Textual}

julia> using CategoricalArrays

julia> table = (gender = categorical(['M', 'M', 'F', 'M', 'F']),
     ndevices = [1, 3, 2, 3, 2])

julia> scitype(table)
Table{Union{AbstractVector{Count}, AbstractVector{Multiclass{2}}}}

Column scitypes of a table can also be inspected with schema.

The behavior of scitype is detailed in the ScientificTypes documentation. Key features of the default behavior are:

  • Any AbstractFloat has scitype Continuous <: Infinite.

  • Any Integer has scitype Count <: Infinite.

  • Any CategoricalValue x has scitype Multiclass <: Finite or OrderedFactor <: Finite, depending on the value of isordered(x).

  • Strings and Chars do not have scitype Multiclass or OrderedFactor; they have scitypes Textual and Unknown respectively.

  • The scientific types of nothing and missing are Nothing and Missing, Julia types that are also regarded as scientific.
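
The defaults listed above can be spot-checked on scalar values. This is a sketch under the default convention, assuming no third-party package has extended scitype; the values are arbitrary.

```julia
using ScientificTypes, CategoricalArrays

float_scitype = scitype(1.5f0)      # any AbstractFloat, here a Float32
integer_scitype = scitype(0x2a)     # any Integer, here a UInt8
string_scitype = scitype("a")
char_scitype = scitype('a')
missing_scitype = scitype(missing)
nothing_scitype = scitype(nothing)

# For categorical values the result depends on isordered.
unordered = categorical(["a", "b"])
ordered = categorical(["a", "b"], ordered=true)
unordered_scitype = scitype(unordered[1])
ordered_scitype = scitype(ordered[1])

@show float_scitype integer_scitype string_scitype char_scitype
@show missing_scitype nothing_scitype unordered_scitype ordered_scitype
```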

See also coerce, autotype, schema.

scitype(; io=stdout)

Print to io the scitype hierarchy, beginning at Found (and so excluding Missing and Nothing).

Note that third-party packages can extend the hierarchy, so the output is not static.

ScientificTypes.schema - Function
schema(X)

Inspect the column types and scitypes of a tabular object. Returns nothing if the column types and/or scitypes can't be inspected.

Example

X = (ncalls=[1, 2, 4], mean_delay=[2.0, 5.7, 6.0])
schema(X)
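
The returned schema can also be queried programmatically. The sketch below assumes the schema object exposes names, types and scitypes properties (the scitypes property is used elsewhere in this reference; the other two are assumed here).

```julia
using ScientificTypes

X = (ncalls=[1, 2, 4], mean_delay=[2.0, 5.7, 6.0])
s = schema(X)

column_names = s.names          # column names as symbols
machine_types = s.types         # element (machine) types of the columns
scientific_types = s.scitypes   # element scitypes of the columns

@show column_names machine_types scientific_types
```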

ScientificTypes.coerce - Function
coerce(A, S)

Return a new version of the array A whose scientific element type is S.

julia> v = coerce([3, 7, 5], Continuous)
3-element Vector{Float64}:
 3.0
 7.0
 5.0

julia> scitype(v)
AbstractVector{Continuous}
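
When the array contains missing values, the target scitype can include Missing explicitly. A minimal sketch with illustrative data:

```julia
using ScientificTypes

raw = [1, missing, 3]

# Ask for a scientific element type that allows missing values.
w = coerce(raw, Union{Missing, Continuous})

w_scitype = scitype(w)     # AbstractVector{Union{Missing, Continuous}}
@show w_scitype
```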
coerce(X, specs...; tight=false, verbosity=1)

Given a table X, return a copy of X, ensuring that the element scitypes of the columns match the new specification, specs. There are three valid specifications:

(i) one or more column_name=>Scitype pairs:

coerce(X, col1=>Scitype1, col2=>Scitype2, ... ; verbosity=1)

(ii) one or more OldScitype=>NewScitype pairs (OldScitype covering both the OldScitype and Union{Missing,OldScitype} cases):

coerce(X, OldScitype1=>NewScitype1, OldScitype2=>NewScitype2, ... ; verbosity=1)

(iii) a dictionary of scientific types keyed on column names:

coerce(X, d::AbstractDict{<:ColKey, <:Type}; verbosity=1)

where ColKey = Union{Symbol,AbstractString}.
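
Specification (iii) has no example among those that follow, so here is a minimal sketch of the dictionary form. The table and column names are invented for illustration; columns not mentioned in the dictionary are expected to keep their scitype.

```julia
using ScientificTypes

X = (age = [23, 45, 31], height = [1.70, 1.82, 1.65])

# Dictionary keyed on column names; only :age is to be re-interpreted.
spec = Dict(:age => Continuous)
Xc = coerce(X, spec)

new_scitypes = schema(Xc).scitypes
@show new_scitypes
```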

Examples

Specifying column_name=>Scitype pairs:

using CategoricalArrays, DataFrames, Tables
X = DataFrame(name=["Siri", "Robo", "Alexa", "Cortana"],
              height=[152, missing, 148, 163],
              rating=[1, 5, 2, 1])
Xc = coerce(X, :name=>Multiclass, :height=>Continuous, :rating=>OrderedFactor)
schema(Xc).scitypes # (Multiclass{4}, Union{Missing, Continuous}, OrderedFactor{3})

Specifying OldScitype=>NewScitype pairs:

X  = (x = [1, 2, 3],
      y = rand(3),
      z = [10, 20, 30])
Xc = coerce(X, Count=>Continuous)
schema(Xc).scitypes # (Continuous, Continuous, Continuous)

coerce(image::AbstractArray{<:Real, N}, I)

Given an array called image representing one or more images, return a transformed version of the data so as to enforce an appropriate scientific interpretation I:

single or collection? | N                     | I          | scitype of result
single                | 2                     | GrayImage  | GrayImage{W,H}
single                | 3                     | ColorImage | ColorImage{W,H}
collection            | 3                     | GrayImage  | AbstractVector{<:GrayImage}
collection            | 4 (W x H x {1} x C)   | GrayImage  | AbstractVector{<:GrayImage}
collection            | 4                     | ColorImage | AbstractVector{<:ColorImage}

imgs = rand(10, 10, 3, 5)
v = coerce(imgs, ColorImage)

julia> typeof(v)
Vector{Matrix{ColorTypes.RGB{Float64}}}

julia> scitype(v)
AbstractVector{ColorImage{10, 10}}
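
The single-image case from the first row of the table can be sketched in the same way. This assumes a plain matrix of reals is accepted as a single grayscale image, as the table indicates; the random data is purely illustrative.

```julia
using ScientificTypes

img = rand(10, 10)               # one 10 x 10 image, N = 2
g = coerce(img, GrayImage)

gray_scitype = scitype(g)        # GrayImage{10, 10}
@show gray_scitype
```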

ScientificTypes.autotype - Function
autotype(X; kw...)

Return a dictionary of suggested scitypes for each column of X, a table or an array, based on rules.

Kwargs

  • only_changes=true: if true, return only a dictionary of the names for which applying autotype differs from just using the ambient convention. When coercing with autotype, only_changes should be true.
  • rules=(:few_to_finite,): the set of rules to apply.
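
A typical workflow is to feed the suggestions straight into coerce. The sketch below uses an invented table in which one integer column has only a few distinct values; which columns the :few_to_finite rule actually flags depends on its internal heuristics, so no particular output is claimed here.

```julia
using ScientificTypes

# Hypothetical data: `id` is unique per row, `grade` takes few distinct values.
X = (id = collect(1:100), grade = repeat([1, 2], 50))

# Suggested scitypes, keyed on column name (only columns that would change).
suggestions = autotype(X)

# Apply the suggestions; columns without a suggestion are left as they are.
Xc = coerce(X, suggestions)

@show suggestions
@show schema(Xc).scitypes
```

Inspecting the suggestions before coercing is worthwhile, since a heuristic rule can propose a categorical interpretation for a column that is genuinely a count.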